Archive for 'May 2011'

    Posted by Cedric Huesler MAY 11, 2011

    Posted in conferences

    A good five months ago, the acquisition by Adobe was officially approved and the engineering teams could go full steam ahead. It's time to show the first results of where we are heading with the CRX content repository and the CQ5 platform.

    For this purpose, we are going to be at the JAX developer conference in San Jose from June 20-23. In addition to developer-focused sessions on stage (to be announced), we are going to have a booth with experts and notebooks loaded with all the goodness.

    How about a meet-up of CRX/CQ5 experts during these days in San Jose?

    Jax Java Conference - San Jose 2011

     

    Learn more about the JAX conference: http://jaxconf.com/

    Don't forget to use the discount code JAXADOBE to get 25% off on registration.

    Posted by Cedric Huesler MAY 10, 2011

    Posted in cq5, crx and tutorials Comments 8

    Back in April, Gabriel Walt and I recorded two sessions for CQ5 and CRX developers, both based on CRX 2.2 and CQ 5.4, which were released earlier this year.

    [Photo: Gabriel, on air in our office in Basel, sharing insight into how mobile site rendering works with CQ 5.4]

    Both sessions run around 60 minutes and were recorded with Adobe Connect.

    The first session is for people new to CRX and the "Content Repository". It is a good way to get started, with coding examples. We cover:

    • What is a Content Repository and how does it work
    • Comparison to relational databases
    • Benefits of the OSGi platform
    • Open Source projects included in CRX
    • Demo of building an app with CRX
    • Deployment options and clustering

    Sign-in to watch recording of CRX Session

    You can download CRX 2.2 with the free Developer Edition license from day.com.

    The second session is for experienced CQ5 developers who have worked with previous releases of CQ5. We touch on some of the new features in 5.4, explain the concepts, and show demos. We cover:

    • Mobile Device Capability and Device Group
    • Configurable roll-out configuration for LiveCopy
    • Improve page speed with ClientLibs
    • HTML5 Video Component and Transcoding Profiles
    • Workflow-based reverse-replication & user generated content moderation
    • Integration with SiteCatalyst and custom events tracking

    Sign-in to watch recording of CQ5 Session

    Enjoy.

    Note: The login to watch the recordings uses your "Adobe ID". If you already have an Adobe ID because you are an Adobe customer (e.g. using Creative Suite or other products), use that account. If you don't have an Adobe ID yet, take a few minutes to create one. Going forward, we are going to use Adobe ID to make material available to you.

    Posted by Kas Thomas MAY 02, 2011

    Posted in contentmodels, crx gems, development, jackrabbit, java content repository, javascript, jcr and performance Comments 4

    Adobe CRX is an extremely versatile content store that handles a wide range of content types (structured and unstructured) and can reliably store many millions of objects. In fact, the system's ultimate storage limits are set not by CRX itself but by the underlying persistence manager. You can choose from a number of different types of persistence (DB2, MySQL, Oracle, TarPM; see the documentation), each with its own particular limitations.

    In general, the default TarPM persistence manager gives better performance than most RDBMS alternatives for the typical CRX use cases (involving web content and user management). But in certain situations, with certain use cases, performance with TarPM can take a hit. The most common problem? Big Flat Lists.

    Although read performance remains good, write performance can suffer when you need to store, say, thousands of sibling nodes under one parent node. This has to do with the fact that TarPM is an append-only store: objects are immutable and are never overwritten in place, only written again as new copies. As a result, the cost of adding (or updating) node number N-thousand-plus-one can be quite high.

    Of course, the answer is to divide and conquer: Break the nodes up into smaller groups, preferably hierarchical groups.

    Suppose you have a large number of users whose user data you want to store in CRX, and you'd like to be able to store users by name. The naive way (we'll keep the example simple and assume no name collisions) would be to store Joe Smith under a node named /users/joe_smith, Lee Jones under /users/lee_jones, and so on. But after a thousand names or so, performance will start to suffer noticeably as new entries are written to the repository. Far better performance will result if container nodes (buckets) are created for each letter of the alphabet and for each last name, so that you can add Joe Smith as /users/S/Smith/Joe, for example.
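
    As a rough illustration of the bucketing scheme just described, the path for a user could be derived as follows. This is only a sketch: the function name bucketedUserPath is made up for this example, and it assumes plain "First Last" names with no collisions, as in the text.

```javascript
// Build a bucketed path of the form /users/<Initial>/<LastName>/<FirstName>.
// Hypothetical helper; assumes a simple "First Last" name with no collisions.
function bucketedUserPath(fullName) {
  var parts = fullName.trim().split(/\s+/);
  if (parts.length < 2) {
    throw new Error("expected a first and a last name: " + fullName);
  }
  var firstName = parts[0];
  var lastName = parts[parts.length - 1];
  // The first bucket level is the upper-cased initial of the last name.
  var initial = lastName.charAt(0).toUpperCase();
  return "/users/" + initial + "/" + lastName + "/" + firstName;
}

console.log(bucketedUserPath("Joe Smith")); // /users/S/Smith/Joe
console.log(bucketedUserPath("Lee Jones")); // /users/J/Jones/Lee
```

    With this layout no bucket grows with the total number of users: the top level holds at most one node per initial, and each last-name node holds only the people sharing that name.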

    A more sophisticated approach would be to hash user IDs and chunk the hash to form an ad-hoc hierarchy. For example, "Joe Smith" might give a hash of ab12cd34. The user data for Joe Smith can be stored at users/ab/12/cd/34. When the time comes to look up data for Joe Smith, you would first hash the name (to obtain ab12cd34), then create the necessary path from the hash, and look up the data.
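
    The hash-and-chunk idea can be sketched in a few lines. The choice of hash here is an assumption made purely for illustration: a tiny 32-bit FNV-1a over the string's UTF-16 code units, picked because it needs no libraries and yields eight hex digits like the ab12cd34 example. The names fnv1a32Hex and hashedUserPath are hypothetical, and a real implementation would also need a strategy for hash collisions.

```javascript
// 32-bit FNV-1a, used only as a small, dependency-free stand-in hash.
// It works on UTF-16 code units, which is a simplification.
function fnv1a32Hex(text) {
  var hash = 0x811c9dc5; // FNV offset basis
  for (var i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193); // multiply by the FNV prime in 32 bits
  }
  // Force the value to unsigned, then left-pad to 8 hex digits.
  return ("00000000" + (hash >>> 0).toString(16)).slice(-8);
}

// Split the 8-digit hash into four 2-digit path segments,
// e.g. ab12cd34 -> users/ab/12/cd/34.
function hashedUserPath(userId) {
  var segments = fnv1a32Hex(userId).match(/../g);
  return "users/" + segments.join("/");
}

console.log(hashedUserPath("Joe Smith"));
```

    Because each level is two hex digits, no node in this scheme can have more than 256 children, however many users are stored. Looking a user up repeats the same computation to arrive at the same path.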

    As it turns out, the Jackrabbit API (which of course is built into CRX) offers yet another alternative for efficient hierarchical storage of arbitrary data, in the form of the BTreeManager. This class provides B+ tree-like behavior in allocating subtrees of nodes that are always balanced, with a fixed limit on how many siblings any given node can have. (You provide the limit as an argument in the constructor.)

    I wrote a very short test script (in ECMAScript) to show how the BTreeManager operates, as shown below:

    <html>
    <body>
    <%
    /* Create a new BTreeManager instance rooted at the current node.
       A node is split when its number of children exceeds 40, and the split
       is done such that each new parent node has >= 10 child nodes. Keys are
       ordered according to the natural order of java.lang.String. The final
       argument (true) enables auto-save. */
    var treeManager = new Packages.org.apache.jackrabbit.commons.flat.BTreeManager(
        this.currentNode, 10, 40,
        Packages.org.apache.jackrabbit.commons.flat.Rank.comparableComparator(),
        true);

    // Create a new NodeSequence backed by that tree manager
    var nodes = Packages.org.apache.jackrabbit.commons.flat.ItemSequence.createNodeSequence(treeManager);

    var totalNodes = 100;

    // Record the start time (in milliseconds) for some simple profiling
    var start = new Date().getTime();

    // Add the nodes; the tree manager decides where each one is placed
    for (var i = 0; i < totalNodes; i++) {
        nodes.addNode("MyNode" + i,
            Packages.javax.jcr.nodetype.NodeType.NT_UNSTRUCTURED);
    }

    var end = new Date().getTime();
     
     %>
     
    <%= "Total time: " + (end - start) + " millisecs" %>
    </body>
    </html>

    I called this script tree.esp and placed it under /apps/tree in CRX, then created a dummy node under /content and gave the dummy node a sling:resourceType of "tree" (to trigger the script when navigating to content/dummyNode.tree).

    The performance benefits of BTreeManager are notable. On my (decrepit Dell) laptop, adding 100 nodes as a flat list took 1.6 seconds (which includes about 200 milliseconds for servlet compilation). Adding 1000 nodes as a flat list (no B-tree) took 22 seconds. Adding 5000 nodes took 289 seconds. Note that adding five times as many entries took about 13 times as long.

    By contrast, using BTreeManager (set to a maximum sibling breadth of 40), adding 1000 nodes took 14 seconds and adding 5000 took 86 seconds. (Five times the data takes roughly six times as long, which is close to linear.)
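
    To make the scaling comparison concrete, the slowdown factors can be computed directly from the timings quoted above. This is just arithmetic on those four numbers; the variable and function names are made up for this sketch.

```javascript
// Timings in seconds, copied from the measurements quoted in the text.
var timings = {
  flat:  { 1000: 22, 5000: 289 },
  btree: { 1000: 14, 5000: 86 }
};

// Ratio of the 5000-node time to the 1000-node time.
// A value of 5 would mean perfectly linear scaling.
function slowdown(layout) {
  return timings[layout][5000] / timings[layout][1000];
}

console.log("flat list:    x" + slowdown("flat").toFixed(1));  // x13.1
console.log("BTreeManager: x" + slowdown("btree").toFixed(1)); // x6.1
```

    The flat list slows down far faster than the amount of data grows, while the B-tree layout stays close to the linear factor of 5.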

    The real lesson here is: If your content is hierarchical (or can be made to look hierarchical), by all means capitalize on that fact! Don't try to treat your content as a Big Flat List, especially if you'll be doing a lot of updates. (If you're doing mostly reads and few writes, on the other hand, it doesn't much matter.) Introducing a bit of hierarchy to your content organization scheme will go a long way toward promoting fast update performance.

    (Many thanks to Felix Meschberger and Marcel Reutegger for input into this blog.)