Entries by Jukka Zitting

    Posted by Jukka Zitting JUN 11, 2010

    Earlier this week I attended the Berlin Buzzwords conference that focused on scalable open source technologies for storing and searching things. These are all core topics for the CRX content platform, so the chance to meet people from various different projects and to compare notes on how people approach different problems was very interesting.

    One of the key takeaways from the conference was that while there's a lot of innovation going on in the NoSQL space, most of the projects achieve their improvements by dropping one or more key features from more traditional storage solutions. Most notably many solutions have fairly limited built-in search capabilities. While the advances are nice, the lack of features does mean more integration work for applications on top of the underlying technologies. A good example of this is Steven Noels' nice presentation on building CMS systems on top of NoSQL technologies. I believe application platforms like CRX have much to offer here.

    My contribution to the conference program was in the form of two presentations. The first one was a reminder of the hierarchical model as a nice alternative in between the document and graph databases that many NoSQL projects choose to implement.

    The second presentation was about the Apache Tika project, which we've been helping with to support the integration work needed for full-text indexing and metadata extraction across all kinds of file formats.

    Posted by Jukka Zitting JUN 04, 2010

    The J2EE Connector Architecture (JCA) is a mechanism by which applications can access all kinds of information systems in a controlled and coordinated manner. The JCA support included in application servers like WebLogic and JBoss takes care of managing things like connections, transactions and connection security on behalf of a client application. To do this, the application server uses a JCA connector for the information system being accessed.

    JCA is quite useful in many environments with complex integration requirements, so thanks to a community contribution, the Apache Jackrabbit project has been shipping a JCA connector since the 1.0 release. However, the Jackrabbit JCA connector was originally designed to run the Jackrabbit repository in embedded mode within the connector itself, only providing standard JCR API access to client applications. This made the connector design unsuitable for repositories like CRX that include a full suite of web-based management and editing tools that need access to repository internals. We worked around this issue with custom solutions for some CRX 1.x customers, but an improved JCA connector design was clearly needed.

    Instead of implementing something for just CRX, we wanted to make a generic JCA connector for all JCR implementations and release it as open source. This way we'll return the favor to the community that contributed the original JCA connector code and will benefit from any improvements and fixes by the external contributors.

    To improve the connector design, we turned to the RepositoryFactory interface introduced in JCR 2.0 and the Repository URI work we had done earlier. With these tools we could turn the existing Jackrabbit JCA connector from a Jackrabbit-specific tool to one that supports all JCR 2.0 implementations. This work was tracked in JCR-2555 and released as a part of Jackrabbit 2.1.0. You can find the resulting JCA resource archive (rar) file on the Jackrabbit download page.

    To use the JCA connector with CRX on, for example, the JBoss application server, you first need to install the CRX webapp following the normal installation documentation. Then deploy the Jackrabbit JCA rar file without modifications. Finally, connect these two resources by deploying a connection factory descriptor like the one shown below:

        <config-property name="repositoryURI"
        <config-property name="bindSessionToTransaction"

    NOTE: The repositoryURI configuration property has been split across multiple lines for display. It needs to be all on one line in an actual descriptor file.

    Set the jndi-name to the path under which you want the JCA-managed Repository instance to be available in the default JNDI context of the application server. Note also that you need to adjust the rar-name setting if you've deployed a different version of the Jackrabbit JCA rar file.
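
    Once the descriptor is deployed, application code could obtain the managed instance with a plain JNDI lookup. The sketch below is only an illustration under assumptions: the class name JndiLookupSketch and the name "jcr/placeholder" are made up, the exact JNDI prefix depends on the application server, and the result is returned as Object only so that the snippet compiles without the JCR API jar. Real code would cast the result to javax.jcr.Repository.

```java
import javax.naming.Context;
import javax.naming.InitialContext;
import javax.naming.NamingException;

// Hypothetical helper, not part of Jackrabbit or CRX.
final class JndiLookupSketch {

    // Looks up whatever is bound under jndiName in the default JNDI context.
    // Outside an application server there is normally no JNDI provider, so
    // the lookup is expected to fail with a NamingException.
    static Object lookup(String jndiName) throws NamingException {
        Context context = new InitialContext();
        try {
            return context.lookup(jndiName);
        } finally {
            context.close();
        }
    }

    public static void main(String[] args) {
        try {
            // "jcr/placeholder" is a made-up name; use your own jndi-name setting.
            System.out.println("Found: " + lookup("jcr/placeholder"));
        } catch (NamingException e) {
            System.out.println("Lookup failed: " + e);
        }
    }
}
```

    Inside the application server the same lookup should return the Repository instance managed by the connector, so the client never has to deal with connection or transaction setup itself.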

    The diagram below illustrates how these different components work together:



    When you need to replicate this setup over multiple application servers, you can even package these components together with your application inside an enterprise archive (ear) for simple deployment of your entire setup. But that's a topic for another post...

    Posted by Jukka Zitting MAY 28, 2010

    The Repository object is the front door of the JCR API. Whenever a client application needs to access a content repository, it first needs access to a Repository instance through which all the other JCR functionality can be accessed. However, JCR 1.0 didn't specify where and how such a Repository instance can be found or created, so applications had to use things like JNDI lookups or implementation-specific initialization code to bootstrap access to a content repository.

    One of the goals of JCR 2.0 was to specify a standard solution to this bootstrap problem. To achieve this goal, the expert group specified the RepositoryFactory interface that allows an application to convert a Map of configuration options to the corresponding Repository instance. The recommended way to access a RepositoryFactory instance is to use the ServiceLoader class in Java 6 or the equivalent but oddly located ServiceRegistry class in Java 5. As a last resort one can also instantiate a known RepositoryFactory implementation class using the specified public zero-argument constructor. The following code snippet shows how this works:

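    Map parameters = new HashMap();
    parameters.put(..., ...);

    Repository repository = null;
    // Discover the available factories (java.util.ServiceLoader, Java 6)
    Iterator<RepositoryFactory> iterator =
        ServiceLoader.load(RepositoryFactory.class).iterator();
    // Use the first factory that understands the parameters
    while (repository == null && iterator.hasNext()) {
        repository = iterator.next().getRepository(parameters);
    }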

    The RepositoryFactory approach works pretty well, but it's a bit verbose as shown above. To avoid having to rewrite this piece of code in all JCR client applications, we implemented a simple JcrUtils.getRepository(Map) utility method in the jackrabbit-jcr-commons library. With this utility method you can simplify the above code to:

    Map parameters = new HashMap();
    parameters.put(..., ...);

    Repository repository = JcrUtils.getRepository(parameters);

    That's already pretty nice, but in practice we found that having to keep track of the map of configuration options is often a bit complicated. For example it's unnecessarily difficult to specify the configuration options as command line arguments, in servlet configuration or in a GUI configuration dialog. Wouldn't it be great if we could narrow the configuration down to a single string, like the database URIs used in JDBC? We thought so, and set out to implement the concept of a repository URI.

    To do this we defined a specific repository configuration parameter "org.apache.jackrabbit.repository.uri". All repository factories that want to support repository URIs should look for this parameter in the Map instance given to the getRepository() call. We modified all the repository factories in Jackrabbit and CRX to support this parameter and added the JcrUtils.getRepository(String) utility method to further simplify client code. For example:

    // WebDAV remoting access to a CRX server

    // RMI remoting access to a CRX server (with RMI enabled)

    The same mechanism also works for JNDI URLs or file:// URLs to local embedded repositories. You only need to make sure that you have all the correct libraries in your classpath. Accessing a JCR repository has never been easier!
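
    To make the mechanism more concrete, here is a rough sketch of how factories could inspect the repository URI parameter and how a string-based helper could wrap the map-based lookup. It uses stand-in types so that it compiles without the JCR or Jackrabbit jars: the Factory interface, the forScheme, lookup and lookupByUri methods, the String "repository" handles and the example host name are all hypothetical. Only the parameter name comes from the text above. Returning null for parameters that a factory does not understand mirrors the RepositoryFactory contract.

```java
import java.net.URI;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Stand-in sketch; this is not the javax.jcr or Jackrabbit API.
final class RepositoryUriSketch {

    // Parameter name quoted in the post.
    static final String REPOSITORY_URI = "org.apache.jackrabbit.repository.uri";

    // Hypothetical stand-in for javax.jcr.RepositoryFactory.
    interface Factory {
        // Returns a repository handle, or null if the parameters are not understood.
        String getRepository(Map<String, String> parameters);
    }

    // Hypothetical factory that only accepts repository URIs with the given scheme.
    static Factory forScheme(String scheme) {
        return parameters -> {
            String value = parameters.get(REPOSITORY_URI);
            if (value == null) {
                return null; // no URI given, so this factory declines
            }
            String actual = URI.create(value).getScheme();
            return scheme.equalsIgnoreCase(actual)
                    ? scheme + " repository at " + value
                    : null;
        };
    }

    // Asks each factory in turn and returns the first non-null result, or null.
    static String lookup(List<Factory> factories, Map<String, String> parameters) {
        for (Factory factory : factories) {
            String repository = factory.getRepository(parameters);
            if (repository != null) {
                return repository;
            }
        }
        return null;
    }

    // Wraps a single URI string into the parameter map before the lookup.
    static String lookupByUri(List<Factory> factories, String uri) {
        Map<String, String> parameters = new HashMap<>();
        parameters.put(REPOSITORY_URI, uri);
        return lookup(factories, parameters);
    }

    public static void main(String[] args) {
        List<Factory> factories = Arrays.asList(forScheme("http"), forScheme("rmi"));
        // The host name is a placeholder, not a real CRX address.
        System.out.println(lookupByUri(factories, "rmi://repository.example/placeholder"));
    }
}
```

    In this sketch an unsupported URI simply yields null. A real client helper would more likely report the failure with an exception, so treat the null return as a simplification.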