Entries filed under 'ask the community'

    Posted by Michael Marth NOV 04, 2009

    Posted in ask the community and sling

    The Tuberculosis Project is one of the Sling users registered on the Sling user wiki page. This is an interview with developer Audrey Colbrant, who worked on the project.

    Audrey, can you please tell us a bit about the TibTec Tuberculosis Project? What are the project's aims and background?

    The TB project is developed by TibTec, a nonprofit technology center based in Dharamsala (India) and directed by M. Phuntsok DORJEE. The aim of the project was to build a system to monitor tuberculosis among Tibetan communities in India, Nepal and Bhutan. Thanks to technology advances in mobile and web computing, it is now possible to design a recording and reporting web portal supporting the WHO DOTS protocol.

    The project to monitor tuberculosis among Tibetan communities in India was born a year ago thanks to four actors: the DoH (Department of Health, Tibetan Government in Exile), Tibetan Delek Hospital (Gangchen Kyishong - India), AISPO (Italian Association for Solidarity Of Persons), and the Johns Hopkins University (USA). TibTec is working on a system for these four actors.

    The main goal of the project is to build a simple, low-cost and versatile framework so that communities all over the world can benefit from it. The system can also be easily customized for other uses, since it is based on open source software.

    If you want to take a look at the architecture, follow the guide.

    So how did you end up using Sling? Did you compare Sling against some other frameworks?

    The implementation of the TB project was part of my master's project in computer science at my university. Jacques Lemordant, a researcher in the WAM project at INRIA, had been in contact with M. Dorjee, CEO of TibTec, for several years. Together they defined the outline of the project and chose the most efficient technologies to use.

    Sling was chosen because we are very familiar with XML technologies (RELAX NG, XPath, XSLT...) and with hierarchical representations of data.

    Another point was that we wanted to access data from Android (Apache HTTP client), and a full REST API was the simplest way to access a JCR and manipulate data represented as trees. Since XML is very well supported on Android, Sling is a perfect match with Android for designing an agile mobile web framework.

    Sling is also part of a course in mobile and web technologies at the master's level at the Joseph Fourier University in Grenoble.
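
    To make the REST point concrete: Sling's default GET servlet can render a resource as JSON when `.json` is appended to its path, optionally with a depth selector such as `.1.json` or `.infinity.json`. The sketch below only builds such URLs. The host, port and content path are hypothetical and not taken from the TB project.

```java
import java.net.URI;

/** Builds URLs for Sling's default JSON rendering of a resource (illustrative only). */
final class SlingUrls {
    private SlingUrls() {}

    /**
     * @param base  server base URL, e.g. "http://localhost:8080" (hypothetical)
     * @param path  absolute resource path in the repository
     * @param depth levels of children to include; a negative value means "infinity"
     */
    static URI jsonUrl(String base, String path, int depth) {
        if (!path.startsWith("/")) {
            throw new IllegalArgumentException("path must be absolute: " + path);
        }
        String selector = depth < 0 ? "infinity" : Integer.toString(depth);
        return URI.create(base + path + "." + selector + ".json");
    }

    public static void main(String[] args) {
        // Hypothetical content path; a client would issue an HTTP GET against this URL.
        System.out.println(jsonUrl("http://localhost:8080", "/content/tb/patients", 1));
        // prints http://localhost:8080/content/tb/patients.1.json
    }
}
```

    Any HTTP client, including the one on Android, could then fetch the resulting URL and parse the tree it returns.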

    Now that you have completed an implementation project with Sling are there any lessons learned you would like to share with the community?

    The Sling approach is fairly new and I had not seen a comparable approach before. The concept is simple, but it takes a little time to get used to working with it. So never give up: solutions come slowly with perseverance.

    If you had one free wish from the Sling committers...

    Sling is a very interesting and powerful way to work with resources but difficult to handle for Sling beginners when you have a full and composite website to implement, mostly because of the lack of information on the internet.

    The hardest part, which gave me a lot of headaches, was finding the right syntax to use, since it changes according to the technologies you combine.

    So I think it would be helpful to have more tutorials on the syntax to use in each case, on what is better to do or to avoid, and on the choices to make when programming (for example, I faced choices about how to protect access to the repository and about which kind of link is better to use: reference or path, etc.).

    It would also be good to complete all the links on this useful webpage.

    Posted by Michael Marth JUL 31, 2008

    Posted in ask the community and jcr

    Recently, I blogged about Brix, which is a new open source CMS based on Apache Wicket and Jackrabbit. Brix has been developed by the Inertia Beverage Group. Inertia's Chief Strategy Officer Paul Mabray sat down with the dev team and passed on my questions about JCR and Jackrabbit:

    Paul Mabray: So tell me a bit about each of you - your background, hobbies, etc.

    Matej Knopp: Hi, I am Matej. I’m a key independent consultant from Slovakia. My number one project for Inertia is working on Brix and enjoying every minute of it. I’m also an Apache Wicket enthusiast and committer. Obviously I have a severe lack of free time.

    Paul: Do you do anything else?

    Matej: Whatever little time I find away from the keyboard I try to read as much as I can.

    Patrick Angeles: I guess, aside from you Paul, I am the veteran at Inertia. I am the VP of Technology and lead our SaaS development for a software solution that connects wineries to their customers. I’ve been in tech for over 10 years working for big companies like Bankers Trust and Acquire Systems. Food, wine and my family are the only things I have time to do outside of work.

    Igor Vaynberg: I’m the Principal Software Engineer at Inertia, residing in sunny Sacramento, California. I’m the guy you brought in to help build the next-generation enterprise web application that will help bridge the gap between wineries and their customers more easily.

    Paul: Tell me more.

    Igor: Well, my love for writing software was sparked when I received a Sinclair Z80 on my tenth birthday and I’ve been doing it ever since. I do have the same hobbies as you such as playing with my kids, taking my beautiful wife to fancy dinners - that should get me points at home - and fragging my friends at Halo. I also still find time to work on software. My number one outlet is the Apache Wicket framework, which I really believe to be the best way to build web applications.

    Paul: And I am the founder of Inertia and now Chief Strategy Officer in charge of product development and business development. Just like Patrick I only have time for the family outside of work. But I love what I do and when this team brought forth the concept to make Brix open source, I was 200% in support of the initiative.

    Igor/Patrick: COME ON

    Paul: OK, I frag a few people on Halo when I get a spare minute.

    Paul: So Michael asked us a question - Brix uses Apache Jackrabbit as its content repository. Back in the days when you made this decision what potential choices were you considering and what influenced your decision towards Jackrabbit?

    Igor: From the start, one of our major requirements was the ability for our users to edit content using their favorite desktop editor. We picked SVN and JCR as two possible solutions: SVN with its local check-in/check-out, and JCR with its WebDAV support. We started off with SVN because it had more powerful versioning features, such as merging. But we quickly ran into features we needed that were missing and would be too expensive to implement ourselves: referential integrity and an ASLv2-licensed Java connector library are the two that jump to mind (we were planning to open-source Brix from the start under ASLv2).

    Patrick: That is why we switched to JCR. Jackrabbit seemed to be the most actively developed and complete JCR implementation, and since it was already released under ASLv2 it was not a very hard choice. A bonus for us is that two of our Brix developers are also Apache committers, and since Inertia is not shy about contributing to open source we saw an opportunity to improve Jackrabbit and give back to its community as we went along.

    Q: Did your expectations regarding the JCR come true or did you have to overcome some difficulties you did not expect? Were there any pleasant or not so pleasant surprises after working with Jackrabbit for a while?

    Matej: The most pleasant surprise was how much functionality Jackrabbit/JCR brought to the table: indexing, referential integrity, WebDAV, workspaces, versioning, and the list keeps going on. The API is consistent and easy to use.

    Igor: Don’t forget our experience with Jackrabbit/JCR was not perfect. It seems that the modus operandi of the JCR community is to use a repository with a small, fixed number of workspaces, which did not mesh well with our needs. Brix makes extensive use of JCR workspaces for multi-site support, site snapshots, and publishing workflow. This means that we needed to create and delete workspaces on the fly as well as have an easy way to search across them via some index.

    The first problem we ran into was that the workspace creation event was not replicated across the Jackrabbit cluster. We have already patched Jackrabbit to support this feature; the patch is available in Jackrabbit’s issue tracker under JCR-1677.

    Patrick: The second, and more severe, problem was the way Jackrabbit manages database connections. Currently each Jackrabbit workspace keeps an open connection to the database. We are planning to host over 300 client sites on Brix in the near future. Brix’s publishing workflow requires 3 workspaces per site: development, staging, and production. Without even counting workspaces created for snapshots, we are up to 900 workspaces and thus 900 open database connections. This simply does not scale. Unfortunately, this does not look like a high priority for the Jackrabbit community, because seemingly the most common use case is a repository with a small number of workspaces. Nonetheless, we are working on a patch that would allow for connection pooling. A first pass of the patch is available in the Jackrabbit issue tracker under JCR-1456.
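
    The arithmetic and the pooling idea can be sketched as follows. This is a minimal illustration, not the JCR-1456 patch: the class name, the pool size and the use of plain tokens in place of real database connections are all assumptions made for the example.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

/** Illustrative bounded pool; the "connections" are plain tokens, not real JDBC connections. */
final class WorkspaceConnectionPool {
    private final BlockingQueue<Object> idle;

    WorkspaceConnectionPool(int size) {
        idle = new ArrayBlockingQueue<>(size);
        for (int i = 0; i < size; i++) {
            idle.add(new Object());
        }
    }

    /** Connections needed when every workspace keeps its own connection open. */
    static int dedicatedConnections(int sites, int workspacesPerSite) {
        return sites * workspacesPerSite;
    }

    /** Borrows an idle connection, or returns null when the pool is exhausted. */
    Object tryBorrow() {
        return idle.poll();
    }

    /** Returns a borrowed connection so that another workspace can use it. */
    void giveBack(Object connection) {
        idle.offer(connection);
    }

    int idleCount() {
        return idle.size();
    }

    public static void main(String[] args) {
        System.out.println(dedicatedConnections(300, 3)); // prints 900, as in the interview
        WorkspaceConnectionPool pool = new WorkspaceConnectionPool(20); // size is an arbitrary assumption
        Object connection = pool.tryBorrow();
        pool.giveBack(connection);
        System.out.println(pool.idleCount()); // prints 20
    }
}
```

    With a shared pool, the number of open connections is bounded by the pool size rather than by the number of workspaces.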

    Q: One often discussed aspect of building content management systems on top of JCRs is the question of whether object-content mapping should be used to wrap collections of JCR nodes into application-level objects. What is your view on this question?

    Igor: Brix’s object model is very simple, so using some kind of object-content mapping would have been overkill for us. We do have a wrapper for the JCR API, but that was meant for other things (see below).

    Q: In Brix you have wrapped JCR nodes and sessions in Brix' own wrapper classes. In the wrappers, API calls are intercepted and events are generated. Why was this design decision made rather than using JCR's native Observation mechanisms?

    Patrick: Our JCR API wrapper is two levels deep.

    The bottom layer is an event layer. It provides events for before and after a change, as well as an ability to queue the events in a list and batch process them later. We also group and normalize the events so it is possible to know what is about to be changed before the change is applied. JCR’s native observation mechanism is limited to only observing changes that have already happened.
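
    A before/after event layer with optional queueing might look roughly like the sketch below. It is not Brix's actual code: the class, the `Listener` interface and the use of plain path strings are hypothetical names chosen for illustration.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative event layer that reports a change both before and after it is applied. */
final class ChangeEventLayer {
    enum Phase { BEFORE, AFTER }

    /** Hypothetical listener interface. */
    interface Listener {
        void onEvent(Phase phase, String path);
    }

    private static final class Pending {
        final Phase phase;
        final String path;
        Pending(Phase phase, String path) { this.phase = phase; this.path = path; }
    }

    private final List<Listener> listeners = new ArrayList<>();
    private final List<Pending> queue = new ArrayList<>();
    private boolean queueing;

    void addListener(Listener listener) { listeners.add(listener); }

    /** When true, events are collected in a list instead of being dispatched immediately. */
    void setQueueing(boolean queueing) { this.queueing = queueing; }

    /** Emits a BEFORE event, applies the change, then emits an AFTER event. */
    void apply(String path, Runnable change) {
        emit(Phase.BEFORE, path);
        change.run();
        emit(Phase.AFTER, path);
    }

    /** Dispatches all queued events in order and returns how many were processed. */
    int flush() {
        List<Pending> batch = new ArrayList<>(queue);
        queue.clear();
        for (Pending p : batch) dispatch(p.phase, p.path);
        return batch.size();
    }

    private void emit(Phase phase, String path) {
        if (queueing) queue.add(new Pending(phase, path));
        else dispatch(phase, path);
    }

    private void dispatch(Phase phase, String path) {
        for (Listener l : listeners) l.onEvent(phase, path);
    }
}
```

    The point of the BEFORE phase is that a listener can react while the old state is still in place, which an after-the-fact observation mechanism cannot offer.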

    Matej: The top wrapper has two basic jobs:

    Translate JCR’s checked exception model into unchecked while providing a pluggable and centralized way for exception handling strategies. Because object-oriented code that uses JCR is sprinkled among many classes, the checked exception model becomes extremely viral. Also, since a lot of exceptions are unrecoverable from Brix’s perspective, there is very little point in having them be checked.

    The second job of this wrapper is to allow node wrapping. Instead of using a full-blown and complex object-content-mapping system we provide a simple mechanism for having nodes wrapped based on their brix:nodeType property. This makes it possible to write code like this:
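
    As a rough illustration of this first job, the sketch below funnels every repository call through one method that converts a checked exception into an unchecked one using a pluggable strategy. The names are hypothetical, and `CheckedRepoException` is only a stand-in for a checked exception such as javax.jcr.RepositoryException, so that the example compiles without the JCR API.

```java
import java.util.function.Function;

/** Illustrative wrapper that converts checked repository exceptions into unchecked ones. */
final class UncheckedRepository {

    /** Stand-in for a checked exception such as javax.jcr.RepositoryException. */
    static class CheckedRepoException extends Exception {
        CheckedRepoException(String message) { super(message); }
    }

    /** A repository call that may throw the checked exception. */
    interface RepoCall<T> {
        T run() throws CheckedRepoException;
    }

    // Pluggable, centralized strategy that decides which unchecked exception to throw.
    private final Function<CheckedRepoException, RuntimeException> translator;

    UncheckedRepository(Function<CheckedRepoException, RuntimeException> translator) {
        this.translator = translator;
    }

    /** Runs the call; callers no longer need try/catch or a throws clause. */
    <T> T execute(RepoCall<T> call) {
        try {
            return call.run();
        } catch (CheckedRepoException e) {
            throw translator.apply(e);
        }
    }
}
```

    Calling code then reads `repo.execute(() -> ...)` without declaring any checked exception, and the handling policy can be swapped in one place.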

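    // The Brix session wrapper hands out nodes already wrapped according to
    // their brix:nodeType property, so a page node can be cast to PageNode.
    Node n = session.getRootNode().getNode("page.html");
    PageNode page = (PageNode) n;
    page.setTitle("Page Title");
    page.setMarkup("Page Content");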
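
    One way such type-based wrapping could be arranged is a registry of wrapper factories keyed by the node type value. The sketch below is an assumption about the general technique, not Brix's implementation: `SimpleNode`, the `title` property and the `example:page` type value are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

/** Illustrative registry that wraps nodes according to their brix:nodeType property. */
final class NodeWrapping {

    /** Hypothetical minimal node: just a bag of string properties. */
    static class SimpleNode {
        final Map<String, String> properties;
        SimpleNode() { this(new HashMap<>()); }
        SimpleNode(Map<String, String> properties) { this.properties = properties; }
    }

    /** Typed wrapper that shares the raw node's properties and adds page accessors. */
    static final class PageNode extends SimpleNode {
        PageNode(SimpleNode raw) { super(raw.properties); }
        void setTitle(String title) { properties.put("title", title); }
        String getTitle() { return properties.get("title"); }
    }

    private final Map<String, Function<SimpleNode, ? extends SimpleNode>> wrappers = new HashMap<>();

    void register(String nodeType, Function<SimpleNode, ? extends SimpleNode> factory) {
        wrappers.put(nodeType, factory);
    }

    /** Returns a typed wrapper when one is registered for the node's type, else the raw node. */
    SimpleNode wrap(SimpleNode raw) {
        String type = raw.properties.get("brix:nodeType");
        Function<SimpleNode, ? extends SimpleNode> factory = wrappers.get(type);
        return factory == null ? raw : factory.apply(raw);
    }
}
```

    After `register("example:page", PageNode::new)`, any node whose brix:nodeType equals that value comes back from `wrap` as a `PageNode`, which is what makes a cast like the one above possible.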
    

    Q: Brix' WebDAV server has the nice feature that uploaded resources are automatically converted into Brix artifacts. How does that work?

    Igor: This is done by intercepting node events using the aforementioned wrapper and modifying the nodes before they are saved based on rules defined in Brix. There is also some magic to handle varying WebDAV client behaviors, for example: when a new file is created via Coda it will first create an Untitled file and immediately rename it to its given name.

    Q: If you had one wish regarding JCR and the JCR community, what would it be?

    Patrick: We would wish the JCR community a very long and productive future.

    Paul: I hope that the JCR community sees Brix as a useful piece of software and contributes to its health and success.

    Posted by Michael Marth FEB 27, 2008

    Posted in ask the community and jcr

    David Dossot has just released the JCR transport for Mule. For the uninitiated: Mule is an open source Enterprise Service Bus. The JCR transport allows users to attach a JCR to Mule so that content is routed between the JCR and other sources of information. I think this is a great piece of software that will be useful in many enterprise projects where systems integration plays a big role. I was happy that I could ask David a couple of questions about JCR development:

    Q: David, congratulations on the release of the JCR transport. I looked at the list of transports on MuleForge and saw that your JCR transport is one of only two projects that have reached production level - you were fast. Did you find JCR to be a good fit to Mule?

    Thank you! The JCR Transport was one of the first projects to be hosted on MuleForge, so it had a little more time for maturing than the other ones. In fact, the very first version of the transport went out at the end of July 2007. This new version, with full support for reading and writing from JCR, took seven months and a couple of milestone releases to be finalized.

    JCR has proven to be an incredibly good fit for Mule. I was initially not convinced that it would make sense to go further than a simple Observations-based event listener, but comments on The Server Side and Mule's users mailing list convinced me to go for a full-fledged transport. Indeed, it proved to be a good idea: the capacity to read and write to JCR from an ESB opens up new possibilities of content-oriented integration scenarios, thanks to the solid and comprehensive architecture of Mule combined with the huge number of both officially and community supported transports.

    Even if the current version of the transport does not support transactions, users can already enjoy advanced functionalities that leverage diverse JCR features like: multi-syntax queries, content streaming and custom node types.

    Q: I believe the JCR transport for Mule relies a lot on JCR observations. Would you like to share some of your experiences on working with the Observations API with us?

    The Observations mechanism is a simple yet powerful way to monitor changes in a JCR. It was very straightforward to create a Mule message receiver that leverages Observations for creating messages the ESB can then propagate to whatever component or endpoint needs to receive them.

    There are several aspects of the Event object that are somewhat challenging for the ESB, the main one being that it is a "connected" object (bound to the current Session), which can be problematic if you send it over to an asynchronous or distant consumer. This is why I had to create an optional transformer that "detaches" an Event, making it a fully resolved serializable object better suited for use outside of the JCR context.
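
    The detaching idea can be sketched as copying the event's values into a plain serializable snapshot. This is not the transport's actual transformer: `ConnectedEvent` is a hypothetical stand-in that mirrors the `getType`, `getPath` and `getUserID` accessors of a JCR event so the example compiles without the JCR API.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

/** Illustrative "detach" step: copy a session-bound event into a serializable snapshot. */
final class EventDetacher {

    /** Hypothetical stand-in for a session-bound observation event. */
    interface ConnectedEvent {
        int getType();
        String getPath();
        String getUserID();
    }

    /** Plain snapshot that holds no reference to the session. */
    static final class DetachedEvent implements Serializable {
        private static final long serialVersionUID = 1L;
        final int type;
        final String path;
        final String userId;

        DetachedEvent(int type, String path, String userId) {
            this.type = type;
            this.path = path;
            this.userId = userId;
        }
    }

    /** Resolves all values eagerly, while the session is still available. */
    static DetachedEvent detach(ConnectedEvent event) {
        return new DetachedEvent(event.getType(), event.getPath(), event.getUserID());
    }

    /** Serializes and deserializes the snapshot, as sending it to a remote consumer would. */
    static DetachedEvent roundTrip(DetachedEvent event) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(event);
            }
            try (ObjectInputStream in =
                     new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return (DetachedEvent) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

    Because the snapshot contains only primitive and string fields, it survives serialization and can be handed to an asynchronous consumer after the session has been closed.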

    Finally, if I had one comment on Observations for the upcoming JSR-283, I would ask for the possibility to listen to only one sub-level below a node path, as currently it is either the whole sub-tree or a single node that can be observed.
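
    Until such an option exists, one workaround is to register a deep listener and discard events that are not exactly one level below the observed path. The helper below is a hypothetical sketch of that filter working on plain path strings; it is not part of the transport. Note that property events would carry the property's own path, so they sit one level deeper than the node they belong to.

```java
/** Illustrative filter that keeps only paths exactly one level below a parent path. */
final class DirectChildFilter {
    private DirectChildFilter() {}

    /** Returns true when eventPath is a direct child of parentPath. */
    static boolean isDirectChild(String parentPath, String eventPath) {
        String prefix = parentPath.endsWith("/") ? parentPath : parentPath + "/";
        if (!eventPath.startsWith(prefix) || eventPath.length() == prefix.length()) {
            return false;
        }
        // No further separator after the prefix means exactly one more path segment.
        return eventPath.indexOf('/', prefix.length()) < 0;
    }

    public static void main(String[] args) {
        System.out.println(isDirectChild("/content", "/content/a"));   // prints true
        System.out.println(isDirectChild("/content", "/content/a/b")); // prints false
    }
}
```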

    Q: What tools did you use for your JCR-related development?

    My main Java development environment is Eclipse Europa on Kubuntu Gutsy Gibbon, complemented with the usual plug-ins: FindBugs, CheckStyle, EclEmma, Subclipse... I use Jackrabbit's TransientRepository for all the integration tests and the transport examples: it is really convenient as you have full control over the initial state of the repository before running each test.

    I am also using Maven for my builds, which is made easy thanks to Mule's modularization in a wealth of specialized Maven artifacts. It is to be noted that Sun's opposition to having javax binaries distributed in the central repository could have complicated the build but, thanks to Day and their legacy Maven repository that contains the JCR archive, anyone can access the required dependency without first installing it by hand!

    On the server side, I have been leveraging the great tools offered by MuleSource to the community via its MuleForge platform, which has proven to be a successful place for fostering Mule related extensions. These tools comprise Subversion, the Atlassian full suite (Bamboo, Jira, Confluence, Fisheye), forums and mailing lists.

    Q: For the benefit of the JCR community let me ask you about your general experiences with JCR. What did you learn about JCR architecture or technology? What did you miss? What took you by surprise?

    I first started exploring JCR in 2006, when helping a client I was contracting for to investigate alternatives to their proprietary and expensive document management system. Since then, it seems that I cannot escape this technology, as I am now working with Communique/CRX!

    From the beginning I have been impressed by the clarity of the JCR API, something that is alas rare enough to be mentioned. But JCR is a twofold specification as it not only specifies an API, but also mandates pre-defined mixin and primary node types. I initially thought that, for the sake of cross-vendor portability, JCR should go further in normalizing content models (maybe by deprecating nt:unstructured and by forcing users to stick to a limited set of mixins and node types), but I realized it was as naive as wanting to normalize, say, standard database models in the JDBC specification!

    I must add that "David's Model", from Jackrabbit's wiki, has been a great source of inspiration for the way the Mule JCR transport handles custom node types.

    Q: Coming back to Mule: did you get some user feedback on the JCR transport? Are you aware of any real-world use cases you can share?

    As with all open source projects, users tend to only manifest themselves when they face an issue or need a feature. So far, none of this has happened so I do not know who, out there, is using or experimenting with the transport. As JCR got listed by Carlos E. Perez as one of the top 5 Java technologies to learn in 2008, I am pretty sure this transport will get more traction in the coming months!

    Posted by Michael Marth JAN 16, 2008

    Posted in ask the community and jcr

    Here's the next part of my journey to retrieve some of the hidden JCR community knowledge. Encouraged by the interesting insights of the Mindquarry team I approached Cédric Damioli who is a Product Manager at Anyware Technologies based in Toulouse. Cédric has built the open source CMS Ametys on top of Apache Jackrabbit and Cocoon. It leverages OSWorkflow as a workflow engine and uses DocBook internally (nice architecture!).

    Here's what Cédric had to say about building Ametys:

    Q: Cédric, when you were designing your persistence architecture for Ametys what potential choices were you considering and what influenced your decision towards Jackrabbit? Did you consider XML databases like the Mindquarry team?

    Back in 2004, when we made the Jackrabbit choice, we were designing the architecture of the version 2.0 of our CMS (the name Ametys was born only last year). The 1.x versions persisted their documents in a CVS repository. While this may sound somewhat odd, it appeared to be really nice: things like versioning or document hierarchy were handled natively, and thanks to a Java bridge between Cocoon and the CVS repository, the source code was quite light and easy to understand.

    But there were also some important drawbacks: a CVS repository can't handle custom metadata, the time to access a single document grows with the overall size of the repository, and the installation of the CMS was very intrusive on the target server.

    So we decided to switch to a new persistence architecture, with the same benefits as the CVS server, but without the same limitations.

    We considered three technologies:

    • Subversion, as the natural successor of CVS
    • WebDAV/Slide
    • JCR/Jackrabbit

    The JCR spec was not even final, and Jackrabbit was still under incubation with no public release, but three facts made the choice obvious for me:

    • In 2004 Stefano Mazzocchi wrote a paper about a new technology called JCR and I remembered this article a few months later
    • Sylvain Wallez, former Cocoon PMC chairman and our R&D director, is an Apache member. As such he was part of the JCR 1.0 expert group and an early Jackrabbit committer, which seemed to be quite a good guarantee to me.
    • And last, but certainly not least, the JCR spec is very, very good! It contains all concepts I wanted to have in a content repository.

    Q: Let us know how the reality check worked out. Did your expectations regarding the JCR come true or did you have to overcome some difficulties you did not expect? Were there any pleasant or not so pleasant surprises after working with Jackrabbit for a while?

    For this question I have to clearly distinguish the spec and its implementation. While it was great to learn to work with JCR, the use of Jackrabbit in a production environment was not that great one or two years ago.

    JCR, I would say, has the pros and cons of a young technology. JCR 1.0 handles nodetypes, hierarchies and versioning very well, but its search capabilities are limited in a real-world application.

    About Jackrabbit: the spec is well implemented, but it lacks administrative tools and APIs. For example, it is impossible to programmatically inspect the contents of a PersistenceManager in order to detect or repair inconsistencies. It is also impossible to programmatically reindex a workspace or the whole repository. Moreover, there's no real backup/restore solution or monitoring application.

    So yes, the JCR choice was good, and all content related tasks are well designed and implemented, but I did not anticipate that the needs of my customers would go far beyond.

    Q: I am still trying to find my favourite JCR tool. What tools did you use for your JCR-related development work?

    In early 2005, we used the Jackrabbit XML PersistenceManager and I used to crawl the repository with... vi or notepad. Now that the community has grown, some cool tools exist. We mainly use the web-based JCR-explorer and JCR Controller which is an Eclipse app.

    Q: You surely must have learned some important lessons about JCR from building Ametys. Would you like to share some of them with us?

    JCR is very powerful by itself. At the beginning, we made the choice to hide the JCR API behind our own Repository API, in order to give CMS developers more flexibility. Two years later, no other implementation of this proprietary API exists and our developers had to learn yet another API. So we were wrong: the content model is well thought through and the API is easily learnt. There is no need to map the API onto another one, like we used to do with JDBC.

    Q: If you had one wish regarding JCR and the JCR community, what would it be?

    The Jackrabbit community is healthy, and I hope it will stay that way. One could expect to have more visibility into the JCR expert group work, but I know the JCP rules are quite strict.

    Regarding functionality: better query features and more admin tools and APIs. That would definitely make JCR and Jackrabbit rock!

    Posted by Michael Marth DEC 18, 2007

    Posted in ask the community and jcr

    In my opinion, there is a certain shortage of established community wisdom in the JCR world. There is not too much information on topics like architectural design patterns, decision guidelines, implementation gotchas and pitfalls etc. Therefore, I thought I should go out and ask some of the JCR users what they learned. This is the first part of this endeavor, which will result in a mini-series of blog posts tagged "ask the community".

    Well, actually, for this first post I did not have to go out at all. Only a couple of days ago, Lars Trieloff, Alexander Klimetschek and Alexander Saar joined Day. The three of them constituted the development team of Mindquarry, which is a collaboration suite based on Apache Jackrabbit. So I went to ask them about their experiences with JCR.

    Q: Back at the time when you were designing your persistence architecture what potential choices were you considering and what influenced your decision towards Jackrabbit?

    L.T.: We had decided on Apache Cocoon as a web framework and REST as an architecture model and were looking for a storage that allowed us to store XML files, binary documents and folders and that provided querying, transactions, and a standardized API. At first we had a look at XML database systems, but found no open source XML database that matched our requirements. When we found Jackrabbit and JCR, we knew it could provide what we needed for storing binary data and folders, but that we needed to invest some time to allow for querying of XML files.

    A.S.: From the beginning we decided that Mindquarry has to be very flexible in terms of data storage and structure. With that in mind we decided that relational databases would not have the capabilities to provide what we need on the development side. Sure, they perform very well if you know your data model, but as a collaboration solution that allows community members to create content, you cannot rely on a certain data model and you surely don't want to restrict your users.
    But what are the alternatives? As Lars says, we first looked at several open source XML databases, but none of them provided what we needed, like XPath, versioning, transactions and so on. After a while we found Jackrabbit, which provides all the flexibility we needed plus the standard features like versioning and transactions I expect from a good data storage.

    A.K.: That's true. We wanted the schema flexibility of an XML database including versioning, but unfortunately the open source XML DBs either didn't offer all the features we needed or were not maintained anymore. So we came across JCR and with a bit of thinking we saw that it provides exactly what we need.

    Q: Did your expectations regarding the JCR come true or did you have to overcome some difficulties you did not expect? Were there any pleasant or not so pleasant surprises after working with Jackrabbit for a while?

    L.T.: First we had to overcome the difficulty we did expect: JCR is not an XML database, but it was fairly easy to develop a one-to-one mapping of JCR node types and XML elements, attributes and text nodes. The next issue for us was the XPath implementation in Jackrabbit, which only supported a very limited subset of XPath 1.0, while we were expecting full XPath 1.0, if not XQuery support. Fortunately we could solve this relatively quickly by implementing a custom Query Handler that is based on Jaxen, an open source XPath engine.

    A.K.: In my opinion, the most unpleasant surprise was the FileSystem persistence manager, because we didn't know about its limitations. In the end the main problems were the resulting lack of scalability and the missing import and export that includes version history, which we needed for full backups of customer data in our hosting environment. Eventually we used a PostgreSQL DB with a BundleDBPersistenceManager and did the backup with SQL dumps.

    A.S.: I also think at that time it was a very optimistic decision and we didn't really realize how much this would affect our application. But once we started extending Jackrabbit for our needs, e.g. with a direct mapping of XML elements to nodes, we realized how much flexibility we could get.
    One thing that was missing was proper tool support and best practices on how to design content-centric applications. This issue gets much more attention today and I think this will help others to start with such an approach.
    The biggest problem was probably the lack of high performance persistence managers in the open source Jackrabbit version. We started with the file system persistence manager, but soon we had problems with too many inodes on Linux file systems. Then we switched to DB persistence management, but this has introduced another layer for the application. While Jackrabbit itself is a kind of database, it shouldn't be necessary to run it on another database.
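
    A one-to-one mapping of this kind can be illustrated with a small sketch that turns elements into nodes, attributes into properties and text into child nodes. It is not Mindquarry's code: `TreeNode` is a hypothetical in-memory stand-in for a repository node, and the `#text` and `value` names are arbitrary choices for the example.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

/** Illustrative mapping of XML elements, attributes and text onto a node tree. */
final class XmlToTree {

    /** Hypothetical in-memory stand-in for a repository node. */
    static final class TreeNode {
        final String name;
        final Map<String, String> properties = new LinkedHashMap<>();
        final List<TreeNode> children = new ArrayList<>();
        TreeNode(String name) { this.name = name; }
    }

    static TreeNode parse(String xml) {
        try {
            Element root = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml))).getDocumentElement();
            return map(root);
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse XML", e);
        }
    }

    private static TreeNode map(Element element) {
        TreeNode node = new TreeNode(element.getNodeName());       // element -> node
        NamedNodeMap attributes = element.getAttributes();
        for (int i = 0; i < attributes.getLength(); i++) {          // attribute -> property
            Node attribute = attributes.item(i);
            node.properties.put(attribute.getNodeName(), attribute.getNodeValue());
        }
        NodeList kids = element.getChildNodes();
        for (int i = 0; i < kids.getLength(); i++) {
            Node kid = kids.item(i);
            if (kid.getNodeType() == Node.ELEMENT_NODE) {
                node.children.add(map((Element) kid));
            } else if (kid.getNodeType() == Node.TEXT_NODE
                    && !kid.getNodeValue().trim().isEmpty()) {      // text -> child node
                TreeNode text = new TreeNode("#text");
                text.properties.put("value", kid.getNodeValue());
                node.children.add(text);
            }
        }
        return node;
    }
}
```

    Keeping text as its own child node preserves the order of mixed content, at the cost of one extra node per text run.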

    Q: What tools did you use for your JCR-related development work?

    A.S.: Mostly a JCR command line client and a small Eclipse plugin that allows browsing Jackrabbit repositories.

    Q: You surely must have learned some important lessons about JCR from building such a complex product as Mindquarry. Would you like to share some of them with us?

    L.T.: The most important lesson I learned about JCR is that it provides a very powerful framework for content-centric applications, and with powerful JCR implementations such as Apache Jackrabbit or CRX you get much more than just an API. But this powerful framework should also have implications on your application design, because the more you abstract from JCR, by mapping content to XML, to relations, or through object-relational mapping, the more you lose of the power and flexibility that JCR gives you.

    A.K.: Using a tree-based content structure now looks much more intuitive to me compared to relational database schemas. But the JCR API has a very fine-grained data access, which is a performance issue if the database (aka Jackrabbit) lies on a different server. So an important point to solve is the implementation of getting a set of nodes with one query across the wire, so that the individual access to subnodes and properties doesn't involve network access anymore.
    On the other hand it's useful to have even simpler APIs on top of JCR, e.g. for dynamic languages. I think Lars has already done such a thing for microsling. But until those two things are solved by Jackrabbit and co. in general, the content access model is very, very good. No need for additional object-to-relational or other mappings.

    A.S.: I agree, you should not try to handle your JCR based content by another abstraction like objects and relations. Take it as it is, unstructured content, and live with it. This will reduce efforts and at the end of the day it will make you more flexible.

    Q: If you had one wish regarding JCR and the JCR community, what would it be?

    A.S.: Better file system based persistence managers for the open source Jackrabbit, XQuery support

    L.T.: Yes, regarding JCR my wish would be better query support and this means XQuery. XQuery is a very powerful language when it comes to querying tree-structured data such as XML, but it is not limited to XML. Regarding the JCR community, I see it as too fragmented: we have communities for the various open source JCR implementations and JCR-related projects like Jackrabbit, Alfresco, Exo Platform, we have vendor communities, but not enough sharing of best practices when it comes to application architecture and content modeling. Especially content modeling is a topic where we could get the maximum benefit from a standardized API, because it would allow for content mash-ups at the repository level.

    A.K.: I also think XQuery would help in retrieving those sets of nodes I described for improving performance. And a set of tools like the one available for CRX, e.g. a full-featured web-based JCR browser with editing capabilities, would improve development enormously.