Archive for July 2009

    Posted by Cedric Huesler JUL 30, 2009

    Posted in announcements Comment 1

    Same procedure as every year... not exactly! This year's Customer Summit - happening in October 2009 - will be held in two locations. For the folks in Europe and Asia we meet in Zurich, and for our friends in the US and Canada we meet in beautiful Chicago.

    Further, we have combined the business and technical gatherings: the two-day conference agenda is packed with keynotes in the morning and separate business and technical sessions in the afternoon.

    We also use the opportunity of the Customer Summit to present the latest product advancements and announce new releases.

    This year's product highlights:

    • In-depth presentation of version 5.3 of the CQ5 platform (that includes WCM, DAM and Social Collaboration)
    • Unveiling of CRX 2.0: the JCR 2.0 (JSR-283) content repository
    • Presentation of the 2010 roadmap
    • and one more thing

    Would you like to share your CQ5 story? We have reserved speaking slots for partners, developers and customers to share their experience. Check out the Call for Speakers page to learn more.

    The Customer Summit is a great opportunity to meet the community, exchange ideas and network with implementation partners and Day employees. We look forward to seeing you in Zurich and Chicago.

    Visit the Summit Site for the detailed agenda and registration.

    Posted by Michael Marth JUL 29, 2009

    Posted in data first and lotd

    A couple of days ago, I wrote about the NoSQL movement and my conviction that data storage models will soon be better fitted to the data at hand (rather than having all data shoehorned into a relational model). It turns out that Scott Leberknight has come up with a nice buzzword for this phenomenon: "Polyglot Persistence". Quote from a presentation abstract of his:

    Polyglot persistence is all about considering your persistence requirements and selecting a persistence mechanism that best meets those requirements, as opposed to selecting an RDBMS as the default choice.

    InfoQ has more information on his talk. From the article:

    The types of data managed in the applications is very different as well. It can be either Structured (relational data), Semi-Structured (for example, documents in a medical records system) or Unstructured (audio/video stream).

    (SCNR: if you have all of the above you might want to look at content repositories).

    Related to this: For an introduction to ACID vs BASE you might also enjoy the talk "Drop ACID and think about data" from PyCon.

    Posted by Michael Marth JUL 28, 2009

    Posted in jcr

    Congratulations to the eXo folks for their latest release of the eXo JCR implementation. Apart from the standardized JSR-170 features they provide an Extensions API that looks interesting. Extensions can be used to trigger actions when the content changes. That sounds like JCR Observations, but the extensions get called when the changes happen in the session, whereas observations only fire for persisted content. Therefore, this allows for "before update"-like functionality. Nice!

    Posted by Jean-Michel Pittet JUL 23, 2009

    Posted in cq5

    I'd like to thank the dozens of customers who joined us immediately after the release of CQ5 version 5.2 and with whom we've been able to work. It has been, and still is, a fantastic collaboration that spans all continents. Readers of this blog know the brands we have announced in the past, and more are to come. Our customers have provided us with excellent feedback, good challenges and excellent collaboration that helped us to increase CQ5's already high level of fun-of-use, stability, scalability, and performance. Fun-of-use, not just for the authors and content editors, but also for our developers and system administrators, in line with the four target groups for whom we strive to deliver excellence.

    Version 5.2.1 is now available for download. It is a maintenance and convenience release. The most important improvements according to your feedback have been rolled up into it. As such, version 5.2.1 delivers on one of the pillars of our product strategy: "tight customer and partner focus".

    We continue working on the other two aspects of our product strategy: technology & product leadership and standards & open source leadership for the content industry with our planned CQ 5.3 release as well as JCR 2.0 (JSR-283).

    For more on those two, please join us at the "Ignite 2009" customer summits in Europe and the US. The sign-up page will be up very soon.

    Posted by Michael Marth JUL 21, 2009

    Posted in dynamic languages, lotd and sling

    Yesterday, a Python framework for GAE development called Pyxer got updated to version 0.7.2. I mention this because in a well-hidden corner of Pyxer is a Python wrapper for Sling development: check out /src/pyxer/sling/sling.py. This class encapsulates the Sling HTTP API in a Python class (at least some of the API, but it gives you a start).

    If you want to give it a spin, be aware that the Pyxer class sets the default Sling port to 7777 instead of 8888.

    Posted by Michael Marth JUL 20, 2009

    Posted in jackrabbit, jcr, jsr-283 and lotd

    Jukka Zitting, committer in the Apache Jackrabbit project, has published a post on the status of the JSR-283 implementation in Jackrabbit. For a quick summary of JSR-283 see spec lead David Nuescheler's recent post.

    Posted by Michael Marth JUL 16, 2009

    Posted in data first, ecm and wcm Comments 9

    OK, I admit it, declaring that "the RDBMS is dead" is a meme that has been going around the software industry for a while. Remember the object-oriented databases that were supposed to replace the relational ones? Well, guess who is still here. However, despite the RDBMS's amazing survival skills I would like to propose a related prediction:

    I believe that the year 2009 will go down in history as the year when the "relational model default" ended. The term "relational model default" was coined by me to describe a peculiar thing that goes on in application development: start talking to your average application developer about some arbitrary business requirement and chances are that simultaneously he mentally constructs a relational model to fit those requirements.

    That relational approach to modeling your problem may or may not be suitable. The real problem is that all too often this default does not get challenged. As a consequence, whatever the fitting data model might be, it gets shoehorned into tables and relations.

    This default "thinking" has not yet changed for the masses, but I believe that it has changed for the early adopters (which means that inevitably it will change for the masses in some years).

    I see the default changing from:

    "I need to store some data, i.e. I need a relational database"

    to:

    "I need to store something, let me see the data to decide how to store it."

    The most concrete and visible manifestation of the rising interest in non-relational data stores is the "NoSQL" movement. NoSQL denotes a group of people interested in exploring and comparing alternatives to traditional relational data stores like MySQL or Postgres. The inaugural get-together has been covered in Computerworld; see also Johan Oskarsson's post, and there is, of course, a hashtag.

    Other than the NoSQL group I have a second data point to offer: there is a Cambrian Explosion happening in terms of projects exploring non-relational data stores. During the Cambrian Explosion a major diversification of organisms took place. Similarly a plethora of new projects that explore alternatives to relational models continue to gain interest. Here is an incomplete list:

    AllegroGraph, Amazon's SimpleDB, Cassandra, CouchDB, Dynomite, Google's App Engine datastore, HBase, Hypertable, Kai, MemcacheDB, MongoDB, Neo4j, OpenRDF, Project Voldemort, Redis, Ringo, Scalaris, ThruDB, Tokyo Cabinet (and Tokyo Tyrant and LightCloud)

    Last, but certainly not least, there are Apache Jackrabbit and Apache Sling.

    From my perspective there are three main areas of innovation in this Cambrian Explosion of data stores:

    1. Models
    In the relational model you break down your data into tables and relations. This model implies that the data is somewhat tabular. However, in some cases the data simply is not tabular.

    Consider web content, which is hierarchical and mixes fine-granular data with binary files (this model is implemented in Jackrabbit). Other (not mutually exclusive) alternative models are document-oriented, key-value pairs, or Graphs/RDF.

    One very important aspect of many alternative models is that they are schemaless. That means that they accommodate Data First approaches, where it is not required to define the data structure before one can actually store any data. This enables agile approaches to software development in the short term as well as more flexibility in the long-term evolution of business requirements.

    Without defining a data structure first it is not possible to store anything at all in an RDBMS. This fact is probably one of the root causes of the relational default thinking. An RDBMS-based developer simply cannot develop anything without thinking about table structure.
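
    To make the Data First idea concrete, here is a minimal sketch of a schemaless, hierarchical content tree in plain Java. The `ContentNode` and `DataFirstDemo` classes, their methods and the sample paths are hypothetical and invented for illustration; this is not the JCR or Jackrabbit API, only a sketch of the modeling idea under the assumption that properties are simple key-value pairs.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical schemaless, hierarchical content node (NOT the JCR API).
final class ContentNode {
    private final String name;
    private final Map<String, Object> properties = new LinkedHashMap<>();
    private final Map<String, ContentNode> children = new LinkedHashMap<>();

    ContentNode(String name) {
        this.name = name;
    }

    String name() {
        return name;
    }

    // Any property can be set at any time; no structure is declared up front.
    ContentNode set(String key, Object value) {
        properties.put(key, value);
        return this;
    }

    Object get(String key) {
        return properties.get(key);
    }

    // Returns the named child, creating it on first access.
    ContentNode child(String childName) {
        return children.computeIfAbsent(childName, ContentNode::new);
    }

    // Resolves a relative path such as "en/products"; returns null if absent.
    ContentNode resolve(String path) {
        ContentNode current = this;
        for (String segment : path.split("/")) {
            if (segment.isEmpty()) {
                continue;
            }
            current = current.children.get(segment);
            if (current == null) {
                return null;
            }
        }
        return current;
    }
}

final class DataFirstDemo {
    static ContentNode buildSite() {
        ContentNode root = new ContentNode("content");
        ContentNode page = root.child("en").child("products");
        page.set("title", "Products").set("lastModified", 1247702400000L);
        // A binary lives next to fine-grained properties in the same tree
        // (the bytes below are placeholders, not a real image).
        page.child("logo").set("mimeType", "image/png").set("data", new byte[] {1, 2, 3});
        return root;
    }

    public static void main(String[] args) {
        ContentNode root = buildSite();
        System.out.println(root.resolve("en/products").get("title"));
    }
}
```

    Note that nothing like a CREATE TABLE step precedes the first write: the structure emerges from the data that is stored.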

    2. Scalability
    A second area of innovation is scalability. This can be broken down into two parts. One is scalability achieved by distributing the data store across separate machines, the approach pioneered by Google. As opposed to classical clustering of RDBMSs, the number of machines considered is in the hundreds rather than the tens. Obviously, different trade-offs regarding consistency and availability of individual cluster nodes must be made when architecting for such a high number of cluster nodes. Eventual consistency is one of the interesting concepts invented in this space.

    While the commoditization of server hardware triggered this first approach to scalability, a second area is related to the rise of multi-core processors. For a number of years CPUs have not gotten faster, but rather the number of cores has increased. There is no explicit contradiction in running a classical RDBMS on a multi-core machine and even having the RDBMS take advantage of the cores. However, it seems to me that the SQL language is a poor fit for queries in a multi-core environment when compared with alternatives such as Map/Reduce, which is parallel by design.
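
    As a rough illustration of "parallel by design", the following sketch counts words in map/reduce style using Java's parallel streams. It is a single-machine stand-in, not a distributed Map/Reduce framework, and the class name `WordCountMapReduce` and the sample documents are assumptions made for the example.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

final class WordCountMapReduce {
    // Map phase: each document is split into words independently of all others.
    // Reduce phase: per-word counts are merged. Counting is associative, so
    // partial results from different cores can be combined in any order.
    static Map<String, Long> countWords(List<String> documents) {
        return documents.parallelStream()
                .flatMap(doc -> Arrays.stream(doc.toLowerCase().split("\\s+")))
                .filter(word -> !word.isEmpty())
                .collect(Collectors.groupingByConcurrent(
                        Function.identity(), Collectors.counting()));
    }

    public static void main(String[] args) {
        List<String> docs = Arrays.asList("content is king", "data first", "content first");
        System.out.println(countWords(docs));
    }
}
```

    The point of the sketch is that neither phase depends on the order in which documents are processed, which is what lets the runtime spread the work across cores without any change to the query itself.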

    3. Web
    The third area of innovation revolves around the fact that the web is the dominant paradigm for computing in our time. This is also acknowledged by the two considerations discussed above. However, a third one is that HTTP is used for accessing the data. Other types of connectivity, typically implemented as JDBC or ODBC drivers, are no longer needed. In many cases the data store exposes its resources through a RESTful API. An obvious benefit is the ubiquitous availability of clients, including the browser itself. The classical RDBMS approach involving a dedicated driver looks like a client-server architecture mindset in comparison (I wrote about this 1.5 years ago).
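
    A small sketch of what "HTTP instead of a driver" can look like from Java, using the standard `java.net.http` client classes (Java 11 or later). The request is only built, not sent. The host, port and content path are assumptions for illustration, as is the convention of appending `.json` to a resource path to ask for a JSON rendering, which is how Sling's default GET handling behaves as far as I know.

```java
import java.net.URI;
import java.net.http.HttpRequest;

final class RestContentRequest {
    // Builds (but does not send) a GET for a resource's JSON representation.
    // No vendor-specific driver is involved, only a URL and an HTTP verb.
    static HttpRequest jsonGet(String baseUrl, String resourcePath) {
        return HttpRequest.newBuilder(URI.create(baseUrl + resourcePath + ".json"))
                .header("Accept", "application/json")
                .GET()
                .build();
    }

    public static void main(String[] args) {
        // Hypothetical local instance and content path.
        HttpRequest request = jsonGet("http://localhost:8888", "/content/en/products");
        System.out.println(request.method() + " " + request.uri());
    }
}
```

    The same URL could just as well be opened in a browser or fetched with a command-line HTTP client, which is the ubiquity argument in a nutshell.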

    At this point let me re-iterate that RDBMSs are here to stay, just like mainframes never went away. Moreover, a couple of the innovation areas cited above are not that new at all, especially, when it comes to non-relational data models (for example, I recently dug into the foundations of the Lotus Notes document store and came out very impressed). However, it is only now that the relational model default will disappear.

    What about content management systems?

    Considering the content management system industry as a whole I am extremely happy about this shift away from RDBMSs. Especially the model aspect is crucial: RDBMSs embody a fundamentally wrong model for content. There are varying opinions in the industry about what "content" really is, but one thing is more or less universally accepted: it is (at least partially) unstructured. Well, RDBMSs are designed for structured data. Duh.

    So why are there one gazillion LAMP-based CMSs? I blame the relational model default. But as this default vanishes we will see more and more CMSs that are not based on an RDBMS (see the Jackrabbit wiki for a list of JCR-based ones, as well as the recent PHP-based JCR implementations, Jackalope and the one for Typo3, or the Midgard content repository).

    Don't laugh, but I truly envision a better (CMS) world once more CMSs are built upon proper tools and not forced into a relational model anymore. It will be a better world for developers and consequently for the CMS users.

    What about Day?

    REST and content repositories were invented and evangelized by Day's Chief Scientist Roy and Day's CTO David years ago already. So it is no surprise that Day's content management systems are in excellent shape with respect to these considerations. CQ5 is built upon Apache Jackrabbit, i.e. a data store that implements a content-centric model, and Apache Sling, a web framework designed to be RESTful right from the start.

    When it comes to scaling: a week ago we gave a live demonstration of how to install and cluster CQ5 on Amazon's EC2 service. But expect even more exciting news in this area.

    Posted by Michael Marth JUL 13, 2009

    Posted in apache Comment 1

    Day's Chief Scientist Roy Fielding has been elected as one of the members of the Apache Software Foundation's new Board of Directors. Congratulations! Roy had been a member of the Board before, see Shane Curcuru's nice timeline of the Board members.

    On the occasion of the ASF's 10th birthday we published an interview with Roy and Bertrand Delacretaz (who served on the Board during the last term) about the foundation's past and future.

    Posted by Michael Marth JUL 08, 2009

    Posted in cq5, fun and idle

    My colleague Lars Trieloff has gone through a fun exercise: he used Nat Pryce's software Project Painter and Synesketch to visualize how happy the developers of our software are. Project Painter analyzes the emotions expressed in the comments of the software and generates an image from that. The results: they are happy and, in the case of Sling, a bit surprised as well (see the Synesketch wiki for how to interpret the images).

    Apache Felix

    Apache Jackrabbit

    Apache Sling

    CQ5

    Posted by Freddy Mallet JUL 01, 2009

    Posted in quality and sling Comment 1

    This is a cross-post of Freddy's analysis at the Sonar site. We use Sonar internally at Day to track and improve the quality of all our software. Also check out Nemo which is Sonar's platform for analysing various other FOSS projects.


    A few weeks ago Michael Marth, who runs dev.day.com (Day’s developer portal), asked us if we could put together our impressions on the code quality of Apache Sling using Sonar. We thought it would be valuable to share the result of this exercise with the community.

    Apache Sling in a few words

    “Apache Sling is an innovative web framework that is intended to bring back the fun to web development. It uses all those nice cool and new technologies that make up a state-of-the-art framework.” This is Apache Sling in five bullets:

    • REST based web framework
    • Content-driven, using a JCR content repository
    • Powered by OSGi
    • Scripting inside, multiple languages
    • Apache Open Source project

    Some size indications of the project

    • 40 Maven modules
    • 70,707 lines of code
    • 731 Java classes
    • and 23,043 lines of Javadoc

    The strengths in terms of quality

    • A project that you get and compile with no difficulty by running two commands:
      1. svn checkout https://svn.apache.org/repos/asf/sling/trunk/
      2. mvn clean install
      This sounds obvious but is not always the case :-)
    • Amongst 130,172 physical lines, only 0.9% are involved in a duplication
    • 46.4% of the public API is documented with a Javadoc block

    The weaknesses

    • Only 9% of the source code is covered by 338 unit tests
    • Average cyclomatic complexity by method (excluding getters and setters) is greater than 3 (3.2).
      That is kind of a warning saying “your methods are taking on too many responsibilities and should be refactored”. This warning is confirmed by other metrics: 394 methods have a complexity greater than 7 and 86 methods have more than 50 statements. What is true at method level is also partially confirmed at class level, as 60 classes have a Fan Out Complexity (the number of other classes referenced by a class) greater than 20.
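
    For readers unfamiliar with the metric, here is a small sketch of how cyclomatic complexity is commonly counted: one for the method itself plus one per decision point. The class and method below are hypothetical and are not taken from Sling; tools also differ slightly in what they count, so the total assumes the common convention in which `&&` and `||` each add a branch.

```java
final class ComplexityExample {
    // Decision points: for, if, &&, else-if = 4, plus 1 for the method itself.
    // Under the assumed counting convention this method has complexity 5,
    // i.e. already above the 3.2 average reported for the project.
    static String classify(int[] values, int limit) {
        int largeEven = 0;
        int negative = 0;
        for (int value : values) {                    // +1
            if (value > limit && value % 2 == 0) {    // +1 for if, +1 for &&
                largeEven++;
            } else if (value < 0) {                   // +1
                negative++;
            }
        }
        return largeEven + " large-even, " + negative + " negative";
    }

    public static void main(String[] args) {
        System.out.println(classify(new int[] {10, 3, -1, 12}, 5));
    }
}
```

    Extracting the two conditions into small, well-named helper methods would spread the same logic over several methods of lower complexity each, which is the kind of refactoring the warning asks for.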

    Bad programming practices that should be improved

    • 198 times, method parameters are reassigned in the body of the method
    • 68 times, local variables are defined and hide class fields
    • 28 times, a NullPointerException is thrown when an IllegalArgumentException would be more suitable
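
    The three practices listed above can be shown side by side in a short sketch. The `AccountSketch` class is hypothetical and not taken from the Sling code base; it only illustrates why such findings are flagged, following the analysis' preference for an illegal-argument exception over a NullPointerException.

```java
final class AccountSketch {
    private int balance;

    // Discouraged version showing all three findings at once.
    void depositBad(Integer amount) {
        if (amount == null) {
            throw new NullPointerException("amount"); // hides that the argument is invalid
        }
        amount = Math.abs(amount);    // method parameter reassigned
        int balance = this.balance;   // local variable hides the field
        balance += amount;            // updates only the local copy: the deposit is lost
    }

    // Cleaner version: validate the argument, leave the parameter alone,
    // and update the field directly.
    void deposit(Integer amount) {
        if (amount == null || amount <= 0) {
            throw new IllegalArgumentException("amount must be a positive number");
        }
        balance += amount;
    }

    int balance() {
        return balance;
    }

    public static void main(String[] args) {
        AccountSketch account = new AccountSketch();
        account.depositBad(50);
        account.deposit(50);
        System.out.println(account.balance());
    }
}
```

    The hidden field is the most dangerous of the three here: the code compiles and runs, yet `depositBad` never changes the account.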

    Potential bugs that should be quickly analyzed

    • Correctness - An apparent infinite recursive loop: there is an apparent infinite recursive loop in org.apache.sling.scripting.jsp.jasper.runtime.JspContextWrapper.include(String, boolean)
    • Multithreaded correctness - Unsynchronized get method, synchronized set method: org.apache.sling.scripting.jsp.jasper.compiler.JspRuntimeContext.getJspReloadCount() is unsynchronized, org.apache.sling.scripting.jsp.jasper.compiler.JspRuntimeContext.setJspReloadCount(int) is synchronized
    • Multithreaded correctness - Method calls Thread.sleep() with a lock held: org.apache.sling.event.impl.JobEventHandler.runJobQueue(String, JobBlockingQueue) calls Thread.sleep() with a lock held
    • Malicious code vulnerability - Field is a mutable array: org.apache.sling.jcr.webdav.impl.servlets.SlingWebDavServlet.COLLECTION_TYPES_DEFAULT is a mutable array
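
    Two of these bug patterns are easy to reproduce in isolation. The sketch below uses a hypothetical `ReloadCounter` class with invented field names, not the Sling classes named above. It shows why a `public static final` array is still mutable, and what a consistently synchronized getter/setter pair looks like.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

final class ReloadCounter {
    // Risky: "final" protects the reference only; any caller can replace elements.
    public static final String[] TYPES_MUTABLE = {"folder", "collection"};

    // Safer: expose an unmodifiable list instead of the array itself.
    public static final List<String> TYPES =
            Collections.unmodifiableList(Arrays.asList("folder", "collection"));

    private int reloadCount;

    // Getter and setter synchronize on the same lock, so a reader is guaranteed
    // to see the value most recently written by another thread.
    public synchronized int getReloadCount() {
        return reloadCount;
    }

    public synchronized void setReloadCount(int count) {
        this.reloadCount = count;
    }

    public static void main(String[] args) {
        TYPES_MUTABLE[0] = "changed";  // compiles and succeeds silently
        System.out.println(Arrays.toString(TYPES_MUTABLE));
    }
}
```

    With only the setter synchronized, as in the finding above, a reading thread has no such visibility guarantee and may keep seeing a stale value.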

    This analysis was done with the intention of giving a high-level overview of the current state of the project. Where should you start if tomorrow you wake up with a single idea in mind: “Improving the quality of the Apache Sling project!”?

    • With cyclomatic complexities of 428, 385 and 343 respectively, the classes Generator, Parser and XMLEncodingDetector should be refactored first. Unsurprisingly, the Generator.java file has the greatest number of duplicated lines (154) and rule violations (109)
    • With a cyclomatic complexity of 43 and no unit tests, the method ModifyAceServlet.handleOperation(..) is what we call “a crappy method” :-)

    More information on the code quality of the project is available on Nemo.