Entries filed under 'jcr'

    Posted by Kas Thomas MAY 02, 2011

    Posted in contentmodels, crx gems, development, jackrabbit, java content repository, javascript, jcr and performance Comments 4

    Adobe CRX is an extremely versatile content store that handles a wide range of content types (structured and unstructured) and can reliably store many millions of objects. In fact, the system's ultimate storage limits are set not by CRX itself but by the underlying persistence manager. You can choose from a number of different types of persistence (DB2, MySQL, Oracle, TarPM; see documentation here), each with its own particular limitations.

    In general, the default TarPM persistence manager gives better performance than most RDBMS alternatives for the typical CRX use cases (involving web content and user management). But in certain situations, with certain use cases, performance with TarPM can take a hit. The most common problem? Big Flat Lists.

    Although read performance remains good, write performance can suffer in the case where you need to store, say, thousands of sibling nodes under one parent node. This has to do with the fact that TarPM is an append-only store in which objects are immutable and never overwritten, only rewritten. What it means is that the cost of adding (or updating) Node No. N-thousand-plus-one can be quite high.

    Of course, the answer is to divide and conquer: Break the nodes up into smaller groups, preferably hierarchical groups.

    Suppose you have a large number of users whose user data you want to store in CRX, and you'd like to be able to store users by name. The naive way (we'll keep the example simple and assume no name collisions) would be to store Joe Smith under a node named /users/joe_smith, Lee Jones under /users/lee_jones, etc. But after a thousand names or so, performance will start to suffer noticeably as new entries are written to the repository. Far better performance will result if container nodes (buckets) are created for each letter of the alphabet, and for each last name, so that you can add Joe Smith as /users/S/Smith/Joe, for example.
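
    The letter-and-last-name bucketing described above can be sketched in a few lines of plain Java. This is only an illustration of how the path is built: the class and method names are made up for this example, and no repository calls are involved.

```java
import java.util.Locale;

class UserBuckets {
    /** Builds a bucketed path such as /users/S/Smith/Joe from a first and last name. */
    static String bucketPath(String firstName, String lastName) {
        if (firstName.isEmpty() || lastName.isEmpty()) {
            throw new IllegalArgumentException("names must not be empty");
        }
        // The first bucket level is the upper-cased initial of the last name.
        String letter = lastName.substring(0, 1).toUpperCase(Locale.ROOT);
        return "/users/" + letter + "/" + lastName + "/" + firstName;
    }

    public static void main(String[] args) {
        System.out.println(bucketPath("Joe", "Smith")); // /users/S/Smith/Joe
        System.out.println(bucketPath("Lee", "Jones")); // /users/J/Jones/Lee
    }
}
```

    Like the example in the text, the sketch assumes no name collisions. Popular initials or last names can still produce large buckets.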

    A more sophisticated approach would be to hash user IDs and chunk the hash to form an ad-hoc hierarchy. For example, "Joe Smith" might give a hash of ab12cd34. The user data for Joe Smith can be stored at users/ab/12/cd/34. When the time comes to look up data for Joe Smith, you would first hash the name (to obtain ab12cd34), then create the necessary path from the hash, and look up the data.
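
    Here is a minimal sketch of the hash-and-chunk idea in plain Java. CRC32 is used only because it yields exactly eight hex digits; it is an assumption of this example, not a recommendation, and the class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

class HashedUserPath {
    /** Returns an 8-digit lowercase hex hash of the user ID (CRC32, chosen for brevity). */
    static String hash(String userId) {
        CRC32 crc = new CRC32();
        crc.update(userId.getBytes(StandardCharsets.UTF_8));
        return String.format("%08x", crc.getValue());
    }

    /** Splits a hex hash into two-character chunks: ab12cd34 becomes users/ab/12/cd/34. */
    static String toPath(String hexHash) {
        StringBuilder path = new StringBuilder("users");
        for (int i = 0; i < hexHash.length(); i += 2) {
            path.append('/').append(hexHash, i, Math.min(i + 2, hexHash.length()));
        }
        return path.toString();
    }

    public static void main(String[] args) {
        System.out.println(toPath("ab12cd34"));        // users/ab/12/cd/34
        System.out.println(toPath(hash("Joe Smith"))); // the same path on every run
    }
}
```

    Because the hash is deterministic, the same lookup path can be rebuilt from the name at read time. A real deployment would also need a policy for two names that hash to the same value.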

    As it turns out, the Jackrabbit API (which of course is built into CRX) offers yet another alternative for efficient hierarchical storage of arbitrary data, in the form of the BTreeManager. This class provides B+ tree-like behavior in allocating subtrees of nodes that are always balanced, with a fixed limit on how many siblings any given node can have. (You provide the limit as an argument in the constructor.)

    I wrote a very short test script (in ECMAScript) to show how the BTreeManager operates, as shown below:

    <html>
    <body>
    <%
    /* Create a new BTreeManager instance rooted at the current node.
       A node is split when its number of children exceeds 40, and the
       split is done such that each new parent node has >= 10 child nodes.
       Keys are ordered according to the natural order of java.lang.String.
       The last argument (true) enables auto-save. */
    var treeManager = new Packages.org.apache.jackrabbit.commons.flat.BTreeManager(
        this.currentNode, 10, 40,
        Packages.org.apache.jackrabbit.commons.flat.Rank.comparableComparator(),
        true);

    // Create a new NodeSequence with that tree manager
    var nodes = Packages.org.apache.jackrabbit.commons.flat.ItemSequence.createNodeSequence(treeManager);

    var totalNodes = 100;

    // Do some profiling (1 * Date coerces the date to milliseconds):
    var start = 1 * new Date();

    // Add the nodes through the sequence so the tree manager can place them
    for (var i = 0; i < totalNodes; i++) {
        nodes.addNode("MyNode" + i,
            Packages.javax.jcr.nodetype.NodeType.NT_UNSTRUCTURED);
    }

    var end = 1 * new Date();
     
     %>
     
    <%= "Total time: " + (end - start) + " millisecs" %>
    </body>
    </html>

    I called this script tree.esp and placed it under /apps/tree in CRX, then created a dummy node under /content and gave the dummy node a sling:resourceType of "tree" (to trigger the script when navigating to content/dummyNode.tree).

    The performance benefits of BTreeManager are notable. On my (decrepit Dell) laptop, adding 100 nodes as a flat list took 1.6 seconds (which includes about 200 milliseconds for servlet compilation). Adding 1000 nodes as a flat list (no B-tree) took 22 seconds. Adding 5000 nodes took 289 seconds. Note that adding five times as many entries took about 13 times as long.

    By contrast, using BTreeManager (set to a maximum sibling breadth of 40), adding 1000 nodes took 14 seconds and adding 5000 took 86 seconds. (Five times the data takes roughly six times as long, which is much closer to linear scaling.)
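
    To get a feel for why a bounded sibling count keeps writes cheap, the following plain-Java sketch computes how many levels an idealized, fully packed tree needs when no node may have more than a given number of children. It is a back-of-the-envelope model only. It does not call BTreeManager, whose real trees are less densely packed because of splitting, and the class and method names are hypothetical.

```java
class FanOut {
    /**
     * Smallest number of levels below the root such that maxChildren^levels >= items,
     * i.e. the depth of a fully packed tree with at most maxChildren siblings per node.
     */
    static int minLevels(long items, int maxChildren) {
        if (items < 1 || maxChildren < 2) {
            throw new IllegalArgumentException("need items >= 1 and maxChildren >= 2");
        }
        int levels = 1;
        long capacity = maxChildren;
        while (capacity < items) {
            capacity *= maxChildren; // each extra level multiplies capacity by the fan-out
            levels++;
        }
        return levels;
    }

    public static void main(String[] args) {
        System.out.println(minLevels(1000, 40)); // 2
        System.out.println(minLevels(5000, 40)); // 3
    }
}
```

    With a limit of 40, even 5000 items fit within three levels in this model, so each write touches only small parent nodes instead of one parent with thousands of children.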

    The real lesson here is: If your content is hierarchical (or can be made to look hierarchical), by all means capitalize on that fact! Don't try to treat your content as a Big Flat List, especially if you'll be doing a lot of updates. (If you're doing mostly reads and few writes, on the other hand, it doesn't much matter.) Introducing a bit of hierarchy to your content organization scheme will go a long way toward promoting fast update performance.

    (Many thanks to Felix Meschberger and Marcel Reutegger for input into this blog.)

    Posted by Bertrand Delacretaz DEC 15, 2010

    Posted in apache, java content repository, jcr and open source Comments 11

    Written by David Nuescheler and Bertrand Delacretaz

    The Apache Software Foundation (ASF) recently announced (https://blogs.apache.org/foundation/entry/the_asf_resigns_from_the) that it is leaving the Executive Committee of the JCP (http://www.jcp.org/) and that it will be "removing all official representatives from any and all JSRs".

    In this post, we present our perspective on the impact of this decision for Java in general, and more specifically on JCR, the Java Content Repository API on which our products are based.

    Impact on future Java API specs

    As David's graph below shows, JCP activity has been going down to very low levels in recent years.

    [Figure: New JSRs submitted, 1998-2010]

    This might be due to uncertainty about the JCP's future since the Apache/Sun dispute started (http://www.apache.org/jcp/sunopenletterfaq.html) in 2006, but also very probably to a lack of need.

    The Java language is a mature one, which has found its (large) niche. Do you really care about improvements in Java SE 7 and 8? We don't see much enthusiasm and the vast majority of programmers seem to be happy with Java SE 5 or 6.

    In the API space, OSGi, for example, brings real innovation independently of the JCP. The OSGi Alliance (http://www.osgi.org/) that manages it is working fine. As with any API spec that's outside of the JCP, OSGi is not allowed to use package names starting with javax, but...did you even notice that? It doesn't make much of a difference.

    Many Java APIs are also being developed outside of the JCP, at the ASF, in other open source organizations and in industry consortiums. Their specification processes might not always be as formalized as the JCP's, but as long as one gets a versioned set of documented Java interfaces that a group of experts agrees on, along with a test suite, people are happy. This looks sufficient to us for the evolution (as opposed to a revolution, which is not needed) of the Java ecosystem.

    So, on one hand, the need for new Java APIs is not as big as it used to be, and on the other hand, there are other places where APIs can be developed.

    Even a totally inactive JCP wouldn't have a serious impact on future Java APIs, in our opinion.

    Impact on Java in the content management space

    We don't think the JCP and the ASF going separate ways will have any impact on enterprise software in general and in the content management space in particular.

    There may even be surprisingly little impact on Java-based Apache projects. Apache Jackrabbit, which has been widely adopted as infrastructure for many content management related projects, will continue its development as planned.

    Beyond Jackrabbit and Apache Sling, there are a large number of Java based content management projects outside the ASF which are not impacted, and we continue to see a vibrant Java content management community.

    Impact on JCR

    The JCP puts some minimum licensing requirements on spec leads. Beyond those, licensing is up to the spec lead of each JSR, so it varies from one JSR to another.

    As the spec lead for JCR (JSR-170, JSR-283 and ongoing work on JSR-333) we opted for the most open licensing we could use for those JSRs:

    The JCR APIs are also available from the central Maven repository under http://repo2.maven.org/maven2/javax/jcr, without requiring any click-through or other agreement.

    The official test suites (TCK) for those JSR specs have been contributed to the Apache Jackrabbit project under the Apache CCLA (http://www.apache.org/licenses/cla-corporate.txt) and as such are freely available to anyone under the permissive and business-friendly Apache License.

    Our ongoing work on JSR-333, the next release of the JCR API, is also unaffected by the ASF leaving the JCP. We are acting as Day/Adobe employees or individuals there, not as representatives of the Apache Software Foundation.

    Impact on CQ5 and CRX

    As we don't see any impact on JCR, we are not planning any changes concerning CQ5 and CRX in this area. JCR continues to gain momentum in the WCM industry and beyond, so we are looking forward to an even broader use of JCR.

    Conclusions

    In bad pun mode, we could say: nothing new under the Oracle.

    Apache leaving the JCP is a step that was discussed for years, that helps clarify the situation and might help Oracle be more explicit about their plans for Java. People might need to find other places than the JCP to create new specifications, but our work on existing and in-process JSRs is not affected.

    We're happy to have invested lots of energy in keeping the JCR specs that we're leading as open as possible. This helps the community understand that one can produce specifications, reference implementations and TCKs in an open manner, even within the JCP. Openness always pays, when it comes to creating sustainable ecosystems.

    Full disclosure: David Nuescheler is a member of the ASF and Spec Lead for JSR 170, 283 and 333, and Bertrand Delacretaz is a member and board member of the ASF. So consider the above as educated but personal opinions, not wearing any particular ASF or JCP hat.

    Posted by Kas Thomas DEC 04, 2010

    Posted in jcr, news and php Comment 1

    David Buchmann of Liip reports that the Jackalope project has all but finished porting the Java Content Repository API (JCR) to PHP. Work on PHPCR began last year. Buchmann says: "With lots of input from Benjamin Eberlei and discussions on the Symfony2 cmf project mailing lists, I stripped all Java specific stuff out of the PHPCR API, making it more PHP. Most notably, we got rid of all the elements that are only relevant in strongly typed languages. Plus PHPCR now specifies to use the standard PHP iterators instead of specific classes that could not be used in foreach. If you had a look at the earlier interfaces, you will notice that we now use the PHP 5.3 namespaces. A full list of the changes is documented in 'from JCR to PHPCR'."

    While PHPCR defines an API, Jackalope is an implementation of that API. "In the last few weeks," Buchmann notes, "things got a real boost, as a full team at Liip is completing the implementation. We aim to have a beta release of Jackalope ready by the end of the year. Our implementation talks to the Java Jackrabbit backend for data storage. This is a quite performant setup, plus it allows to access data in existing Jackrabbit-based products. Ideas exist how we could write a PHP-only storage layer, but for now, we focus on creating a fully working implementation with Jackrabbit."

    Buchmann says first performance tests are quite promising. He notes: "Chregu did a couple of performance tests and found that Jackrabbit scales really well. Having 15 Jackrabbits share one database backend with 350'000 nodes and doing requests scaled linearly and the database did not get overloaded."

    For more details, see David Buchmann's post here.

    Posted by Jean-Christophe Kautzmann SEP 13, 2010

    Posted in java content repository, jcr, jsr-283, sling and tutorial Comment 1

    When you develop an application on top of a JCR repository, you eventually need to access the following basic objects: the repository, a workspace, a session, a node, a property and some managers to perform operations like versioning, querying or controlling the access to the repository.
    I've put together some code samples here that show how to do that.

    Getting the repository

    One way to access the repository is to use the JcrUtils.getRepository(Map) utility method of the jackrabbit-jcr-commons library:

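    // imports: java.util.HashMap, java.util.Map, javax.jcr.Repository,
    //          org.apache.jackrabbit.commons.JcrUtils
    Map<String, String> parameters = new HashMap<String, String>();
    parameters.put(..., ...);
    Repository repository = JcrUtils.getRepository(parameters);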

    Other ways to access the repository are described here.

    If you are using Sling, you can reference an existing service, called SlingRepository, by using the @scr.reference annotation as follows in your class:

    /** @scr.reference */
    private SlingRepository repository;


    Getting a session and a workspace

    Once you have accessed the repository, you can define a session:

    SimpleCredentials credentials = new SimpleCredentials(userID, password);
    Session session = repository.login(credentials);

    You can also define an administrative session by using the loginAdministrative() method of the SlingRepository interface:

    Session adminSession = repository.loginAdministrative(null);

    When using Sling, to get the user session within a servlet, you can use the following code (e.g. within the doGet() method):

    Session session = req.getResourceResolver().adaptTo(Session.class);
    // req is the SlingHttpServletRequest object passed to the servlet

    Note: all opened sessions should be properly closed when they are no longer needed. The recommended pattern is:

    Session session = repository.login(...);
    try {
      // use the session
    } finally {
      session.logout();
    }

    The only time you don't need this is when you're adapting the SlingHttpServletRequest instance to a Session, as Sling will automatically close that session once the request has been fully processed.

    To get a workspace:

    Workspace workspace = session.getWorkspace();        


    Getting a node, a property and a value

    To get a node:

    Node node = session.getNode(pathToNode);

    To get a property:

    Property property = session.getProperty(pathToProperty);

    To get the value of a single value property:

    if (!property.isMultiple()) {
        Value value = property.getValue();
        String myStringValue = value.getString();
    }

    To get the values of a multiple value property:

    if (property.isMultiple()) {
        Value[] values = property.getValues();
        for (int i = 0; i < values.length; i++) {
            Value value = values[i];
            // assuming the values are strings
            String myStringValue = value.getString();
        }
    }

    Getting managers

    To get managers for the features introduced by the JCR 2.0 specifications like locking, node type management, observation, querying, versioning, access control or retention:

    LockManager lockManager = workspace.getLockManager();
    NodeTypeManager nodeTypeManager = workspace.getNodeTypeManager();
    ObservationManager observationManager = workspace.getObservationManager();
    QueryManager queryManager = workspace.getQueryManager();
    VersionManager versionManager = workspace.getVersionManager();
    AccessControlManager accessControlManager = session.getAccessControlManager();
    RetentionManager retentionManager = session.getRetentionManager();

    For more details please refer to the JCR 2.0 API javadocs or to the JSR-283 specifications that define the JCR API.

    Posted by Kas Thomas AUG 16, 2010

    Posted in content management, crx, http, jcr, request and sling Comments 2

    The first version of this post originally was published here.

    One of the things that gives Apache Sling a great deal of power and flexibility is the way it resolves script URLs. Consider a request for the URL

    /content/corporate/jobs/developer.html

    First, Sling will look in the repository for a file at exactly this location. If such a file is found, it will be streamed out as is. But if there is no file to be found Sling will look for a repository node located at:

    /content/corporate/jobs/developer

    (and will return 404 if no such node exists). If the node is found, Sling then looks for a special property on that node named "sling:resourceType", which (if present) determines the resource type for that node. Sling will look under /apps (then /libs) to find a script that applies to the resource type. Let's consider a very simple example. Suppose that the resource type for the above node is "hr/job". In that case, Sling will look for a script called /apps/hr/job/job.jsp or /apps/hr/job/job.esp. (The .esp extension is for ECMAScript server pages.) However, if such a script doesn't exist, Sling will then look for /apps/hr/job/GET.jsp (or .esp) to service the GET request. Sling will also count /apps/hr/job/html.jsp (or .esp) as a match, if it finds it.

    Where things get interesting is when selectors are used in the target path. In content-centric applications, the same content (the same JCR nodes, in Sling) must often be displayed in different variants (e.g., as a teaser view versus a detail view). This can be accomplished through extra name steps called "selectors." For example:

    /content/corporate/jobs/developer.detail.html

    In this case, .detail is a selector. Sling will look for a script at /apps/hr/job/job.detail.esp. But /apps/hr/job/job.detail.html.esp will also work.

    It's possible to use multiple selectors in a resource URL. For example, consider:

    /content/corporate/jobs/developer.print.a4.html

    In this case, there are two selectors (.print and .a4) as well as a file extension (html). How does Sling know where to start looking for a matching script? Assume for this example that the resource type is "hr/jobs". It turns out that if a file called a4.html.jsp exists under a path of /apps/hr/jobs/print/, it will be chosen before any other scripts that might match. If such a file doesn't exist but there happens to be a file, html.jsp, under /apps/hr/jobs/print/a4/, that file would be chosen next.
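
    As a rough illustration of how such a URL breaks down into resource path, selectors and extension, here is a small plain-Java sketch. It is not Sling's own decomposition logic: it assumes the resource name contains no dots and that there is no suffix after the extension, and the class name is hypothetical.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

class SlingUrlParts {
    final String resourcePath;
    final List<String> selectors;
    final String extension;

    SlingUrlParts(String resourcePath, List<String> selectors, String extension) {
        this.resourcePath = resourcePath;
        this.selectors = selectors;
        this.extension = extension;
    }

    /** Splits at the first dot after the last slash; the last dot-part is the extension. */
    static SlingUrlParts parse(String url) {
        int lastSlash = url.lastIndexOf('/');
        int firstDot = url.indexOf('.', lastSlash + 1);
        if (firstDot < 0) {
            // No dot in the last path segment: no selectors and no extension.
            return new SlingUrlParts(url, Collections.<String>emptyList(), "");
        }
        String[] parts = url.substring(firstDot + 1).split("\\.");
        List<String> selectors = Arrays.asList(parts).subList(0, parts.length - 1);
        return new SlingUrlParts(url.substring(0, firstDot), selectors, parts[parts.length - 1]);
    }

    public static void main(String[] args) {
        SlingUrlParts p = parse("/content/corporate/jobs/developer.print.a4.html");
        System.out.println(p.resourcePath); // /content/corporate/jobs/developer
        System.out.println(p.selectors);    // [print, a4]
        System.out.println(p.extension);    // html
    }
}
```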

    Assuming all of the following scripts exist in the proper locations, they would be accessed in the order of preference shown:

    /apps/hr/jobs/print/a4.html.jsp
    /apps/hr/jobs/print/a4/html.jsp
    /apps/hr/jobs/print/a4.jsp
    /apps/hr/jobs/print.html.jsp
    /apps/hr/jobs/print.jsp
    /apps/hr/jobs/html.jsp
    /apps/hr/jobs/jobs.jsp
    /apps/hr/jobs/GET.jsp

    This precedence order is somewhat at odds with the example given in SLING-387. In particular, a script named print.a4.GET.html.jsp never gets chosen (nor does print.a4.html.jsp). Whether this is by design or constitutes a bug has yet to be determined. But in any case, the above precedence behavior has been verified.
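
    The "first match wins" behavior of the list above can be modeled with a short plain-Java sketch. The candidate order is simply hard-coded from the list in this post; this is not Sling's resolver, and the class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Optional;
import java.util.Set;

class ScriptPrecedence {
    /** Candidate scripts, most preferred first, copied from the list in the post. */
    static final List<String> CANDIDATES = Arrays.asList(
        "/apps/hr/jobs/print/a4.html.jsp",
        "/apps/hr/jobs/print/a4/html.jsp",
        "/apps/hr/jobs/print/a4.jsp",
        "/apps/hr/jobs/print.html.jsp",
        "/apps/hr/jobs/print.jsp",
        "/apps/hr/jobs/html.jsp",
        "/apps/hr/jobs/jobs.jsp",
        "/apps/hr/jobs/GET.jsp");

    /** Returns the most preferred candidate that actually exists, if any. */
    static Optional<String> choose(Set<String> existingScripts) {
        return CANDIDATES.stream().filter(existingScripts::contains).findFirst();
    }

    public static void main(String[] args) {
        Set<String> existing = new HashSet<>(Arrays.asList(
            "/apps/hr/jobs/GET.jsp",
            "/apps/hr/jobs/html.jsp",
            "/apps/hr/jobs/print.jsp"));
        System.out.println(choose(existing).orElse("no match")); // /apps/hr/jobs/print.jsp
    }
}
```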

    For more information on Sling script resolution, be sure to consult the (excellent) Sling Cheat Sheet as well as Michael Marth's previous post on this topic. (Many thanks to Robin Bussell at Day Software for pointing out the correct script precedence order.)