Archive for January 2008

    Posted by Michael Marth JAN 31, 2008

    Posted in jcr and tutorial Comments 4

    Currently, I am working on importing several mailing lists into my Java Content Repository (with the ultimate aim of making them searchable on dev.day.com). While this is not really a complex thing to do, I still thought I might give a pointer to help others who want to do something similar.

    There is a sandbox project of Apache James (the mail server) where Jukka Zitting coded an integration of James with a JCR backend. An easy way to get started with JCR/mail integration is to reuse Jukka's JCRStoreBean (find it here in Apache's svn). The method of interest is:

    public void storeMessage(Message message)
            throws MessagingException, RepositoryException {
        try {
            // Create an nt:file node for the message, then import the mail into it.
            Node node = createNode(parent, getMessageName(message), "nt:file");
            importEntity(message, node);
            parent.save();
        } catch (IOException e) {
            throw new MessagingException("Could not read message", e);
        }
    }
    

    It stores the passed javax.mail.Message. Before using it, make sure to set the parent node property of the JCRStoreBean. Through the parent node, JCRStoreBean has access to the repository you want to use.

    Jukka has structured a mail node such that the node has a child named "content". This child node contains properties like "to", "from" as well as the actual mail body which is stored as a binary attachment (or multiple in case of multi-part mails). Here is an example node:

    If you use the code you need to keep the Apache copyright notice around somewhere.

    Posted by Michael Duerig JAN 31, 2008

    Posted in dynamic languages, jcr, rad and tutorial Comment 1

    In my last post I showed how to define a JCR tree structure with Scala. I introduced two new operators, which allow the code to be written such that it resembles an ASCII art drawing of the tree. In this post I discuss the goals I had in mind when designing these operators. This should help in understanding how these operators actually work and serve as preparation for a later post where I will show how properties fit into the picture.

    My goals were:

    1. To design an integrated mechanism for building JCR tree structures in an expressive and concise way. The code should resemble the tree structure as closely as possible.
    2. Don't use esoteric magic. That is, the mechanism should be a clean, rigorous abstraction, which does not expose strange artefacts. It should be applicable without having to deal with its internals while still being relatively easy to comprehend if necessary.

    I think I reached the first goal. ASCII art wasn't my primary intent. Rather, it turned out that way after I dismissed various other approaches, most of which grossly violated the second goal. The second goal is more interesting and deserves a more detailed explanation. Along the way I explore the details of how these operators work and which Scala features they employ.

    The ¦- operator is for adding nodes to a parent node:

    n ¦- "child1"
    n ¦- ("child2", "nt:file")
    

    The first line adds a node called 'child1' of type nt:unstructured to the parent node n. The second line adds a node called 'child2' of type nt:file to the same node n. There are several notable points here. First, (more or less) arbitrary symbols can be used as identifiers in Scala, so ¦- is just a fancy name for a method. Further, Scala allows for infix notation. Removing the syntactic sugar reveals plain old method calls on n.

    n.¦-("child1")
    n.¦-("child2", "nt:file")
    

    But wait, n is of type Node and Node does not have a method called ¦-. At this point Scala's Pimp my library design pattern is used to retroactively extend the Node class with additional methods. (This is similar to extension methods in C#. See also my post JCR with Scala for further details).

    implicit def extendNode(node: Node) = 
        new NodeExtender(node)
    
    protected class NodeExtender(node: Node) {
      ( ... )
    	 
      def -+[T](f: Node => T) = f(node)
    
      def ¦-(relPath: String) = node.addNode(relPath)
      def ¦-(relPath: String, tyqe: String) = 
          node.addNode(relPath, tyqe)
    }
    

    With these extensions in place, ¦- can be called on a Node instance as if it were an actual method of the Node class. From the code we also see that ¦- is just an alias for Node's addNode method.

    Let's turn to the second operator -+ now. It is used for adding new branches:

    root ¦- "Movies" -+ { n: Node =>
                    n ¦- "Pulp Fiction"  
                    n ¦- "Casablanca"  
                    n ¦- "The Godfather"}
    

    This code first adds a child node 'Movies' to the 'root' node. It then adds the three nodes 'Pulp Fiction', 'Casablanca', and 'The Godfather', respectively to the 'Movies' node. Let's again de-sugar it:

    root.¦-("Movies").-+({ n: Node =>
                     n ¦- "Pulp Fiction"  
                     n ¦- "Casablanca"  
                     n ¦- "The Godfather"})
    

    First the ¦- operator is called on root, passing the string "Movies" as argument. Then the -+ operator is called on the node returned by ¦-, passing an anonymous function as argument. Taking a look at the definition of the -+ operator reveals the remaining details of what is going on. -+ takes as argument a function, which takes an argument of type Node and returns an arbitrary type T. It then applies that function to the node -+ was called on. So here the 'Movies' node is passed as argument for the parameter n of the anonymous function. The latter eventually adds the three nodes 'Pulp Fiction', 'Casablanca', and 'The Godfather' by invoking ¦- on n.

    So while there is some magic involved here, I think it is definitely not esoteric. The operators are implemented by means of standard Scala features and patterns. There are next to no side effects. Although leveraging Scala's implicits might result in unwanted automatic conversions in rare cases, that risk can be minimized by carefully restricting the scope of the Node extensions.
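    For readers more at home in Java, the mechanics of the two operators can be sketched without any Scala sugar. This is only an illustration under assumptions: TreeNode, add and branch are hypothetical stand-ins for javax.jcr.Node, ¦- and -+, and the real JCR API is not used at all.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Hypothetical stand-in for javax.jcr.Node; it only tracks a name and children.
final class TreeNode {
    private final String name;
    private final List<TreeNode> children = new ArrayList<>();

    TreeNode(String name) {
        this.name = name;
    }

    String name() {
        return name;
    }

    List<TreeNode> children() {
        return children;
    }

    // Plays the role of ¦-: adds a child and returns it, as Node.addNode does.
    TreeNode add(String childName) {
        TreeNode child = new TreeNode(childName);
        children.add(child);
        return child;
    }

    // Plays the role of -+: applies f to this node and returns f's result.
    <T> T branch(Function<TreeNode, T> f) {
        return f.apply(this);
    }
}

final class OperatorSketch {
    // Rebuilds the Movies example from the post on a fresh root node.
    static TreeNode buildMovies() {
        TreeNode root = new TreeNode("root");
        root.add("Movies").branch(n -> {
            n.add("Pulp Fiction");
            n.add("Casablanca");
            return n.add("The Godfather");
        });
        return root;
    }

    public static void main(String[] args) {
        TreeNode movies = buildMovies().children().get(0);
        System.out.println(movies.name() + " has "
                + movies.children().size() + " children");
    }
}
```

    The point of the sketch is that branch does nothing more than hand its receiver to the function it is given, which is why the block after the operator always sees the node that was just added.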

    Posted by Michael Marth JAN 30, 2008

    Posted in jcr, link of the day, osgi and sling Add comment

    Carlos Perez has posted his list of the top five Java-based technologies to learn in 2008. They include OSGi, JCR, and Groovy. Sling gets mentioned as well.

    To quote Carlos:

    Reality check, not all data fits well within a relational database.

    Nuff said. (There is also a discussion about this post on TSS)

    Posted by Christian Sprecher JAN 30, 2008

    Posted in dynamic languages, jcr, rad and tutorial Comments 5

    I was rather impressed by Michael's samples on Scala and JCR. Myself, I am currently fiddling around with a Groovy/Grails project and a JCR backend. There has been some work within Grails to offer JCR as a persistence alternative to Hibernate/RDBMS. While this effort (which is currently stalled) is surely interesting in itself (a Grails CMS backed by a JCR might really be a killer app!), there is another fascinating technique in the Groovy ecosystem: Builders.

    Builders are the Groovy implementation of the pattern of the same name. If you don't remember this pattern, let's put it simply: Builders allow you to create stuff in a declarative manner. The Groovy incarnation is especially well suited for tree-like structures. One of the most prominent implementations is the SwingBuilder (to learn more about Builders follow the link, the example there is strikingly simple).

    So, JCR... trees... builders... you see where it goes: it is easy to create a JCR tree builder with Groovy. But first, let's have a look at a test script:

    import javax.jcr.Node
    import javax.jcr.Repository
    import javax.jcr.Session
    import javax.jcr.SimpleCredentials
    import org.apache.jackrabbit.core.TransientRepository

    class JCRBuilderTests extends GroovyTestCase {
        static Session session

        static {
            Repository repository = new TransientRepository()
            //Session session = repository.login()
            session = repository.login(
                new SimpleCredentials("username", "password".toCharArray()))
        }

        void testBasicStuff() {
            JCRBuilder builder = new JCRBuilder()
            builder.session = session
            try {
                def node = builder.blog('title': 'The Depressed') {
                    entry() {
                        first('title': 'Monday again') {
                            attachment('nt:file') {
                                'jcr:content'('nt:resource',
                                    ['jcr:mimeType': 'text/plain',
                                     'jcr:data': 'Monday morning ...again',
                                     'jcr:lastModified': Calendar.instance])
                            }
                        }
                    }
                }
                println "node:  ${node.name}"
                // each assert starts on the same line as its expression
                assert session.rootNode.getNode('blog').
                    getProperty('title').string == 'The Depressed'
                assert session.rootNode.
                    getNode('blog/entry/first/attachment').
                    primaryNodeType.name == 'nt:file'
            } finally {
                session.rootNode.getNode('blog').remove()
                session.save()
            }
        }
    }
    

    The testBasicStuff method shows how a builder is used. Two things need to be mentioned:

    1. builder.blog('title':'The Depressed') {
      means that a node named 'blog' is created and then an attribute named title is defined.
    2. attachment('nt:file') { ...
      means that a node with the primary type nt:file is defined

    Now to the meat. The builder itself looks like this:

    
    import javax.jcr.Session
    import javax.jcr.Node

    class JCRBuilder extends BuilderSupport {
        Session session
        Node parentNode

        public JCRBuilder() {
            super()
        }

        //trigger: foo()
        //return: Node
        def createNode(name) {
            checkParentNode()
            parentNode = parentNode.addNode(name)
            return parentNode
        }

        //trigger: foo('x')
        //return: Object
        // creates node with value=nodetype
        def createNode(name, value) {
            checkParentNode()
            parentNode = parentNode.addNode(name, (String) value)
            return parentNode
        }

        //trigger: foo(a:1)
        //return: Object
        def createNode(name, Map attributes) {
            Node node = createNode(name)
            attributes.each { key, value ->
                node.setProperty(key, value)
            }
            return node
        }

        //trigger: foo(a:1, 'x')
        //return: Object
        def createNode(name, Map attributes, value) {
            checkParentNode()
            Node node = createNode(name, value)
            attributes.each { key, myvalue ->
                node.setProperty(key, myvalue)
            }
            // return the new node, like the other overloads do
            return node
        }

        //trigger: createNode(...) finished
        void setParent(parent, child) {
        }

        //trigger: recursive closure call finished
        void nodeCompleted(parent, node) {
            session.save()
            parentNode = parentNode.getParent()
        }

        private checkParentNode() {
            if (parentNode == null) parentNode = session.getRootNode()
        }
    }
    

    This is a most basic form of a possible builder, but it hopefully clarifies the whole concept nevertheless. Groovy builders are a superb way to create hierarchical object trees, hence a "natural" way to create JCR tree fragments. Of course there might be better ways to create a JCR builder, with more of a DSL-like approach. The example above is not perfect, and one major issue remains: after executing the test for the second time, an exception "java.lang.IllegalStateException: Index already present" is thrown. A free beer to the one who figures this one out :)
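    The bookkeeping in the builder boils down to a cursor that moves down when a node is created and back up when its closure has finished. A minimal sketch of just that idea in plain Java follows. BuiltNode and CursorBuilder are hypothetical names, the nodes live in memory only, and neither Groovy's BuilderSupport nor the JCR API is involved.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical in-memory node; stands in for javax.jcr.Node in this sketch.
final class BuiltNode {
    final String name;
    final BuiltNode parent;
    final List<BuiltNode> children = new ArrayList<>();

    BuiltNode(String name, BuiltNode parent) {
        this.name = name;
        this.parent = parent;
    }

    // Absolute path of this node, e.g. /blog/entry; the root is "/".
    String path() {
        if (parent == null) {
            return "/";
        }
        String prefix = parent.parent == null ? "" : parent.path();
        return prefix + "/" + name;
    }
}

final class CursorBuilder {
    final BuiltNode root = new BuiltNode("", null);
    private BuiltNode cursor = root;

    // Creates a child under the cursor, runs body with the cursor on the
    // child, then moves the cursor back up to the child's parent.
    BuiltNode node(String name, Runnable body) {
        BuiltNode child = new BuiltNode(name, cursor);
        cursor.children.add(child);
        cursor = child;               // descend, as createNode does
        try {
            body.run();
        } finally {
            cursor = child.parent;    // ascend, as nodeCompleted does
        }
        return child;
    }

    // Leaf variant: a node without nested content.
    BuiltNode node(String name) {
        return node(name, () -> { });
    }
}

final class CursorBuilderDemo {
    // Mirrors the nesting of the blog example from the test script.
    static BuiltNode buildBlog(CursorBuilder b) {
        return b.node("blog", () ->
            b.node("entry", () ->
                b.node("first", () ->
                    b.node("attachment"))));
    }

    public static void main(String[] args) {
        BuiltNode blog = buildBlog(new CursorBuilder());
        BuiltNode attachment =
            blog.children.get(0).children.get(0).children.get(0);
        System.out.println(attachment.path());
    }
}
```

    Unlike the Groovy version, this sketch restores the cursor in a finally block, so a failing nested block cannot leave the builder pointing at a half-built subtree.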

    Posted by Lars Trieloff JAN 25, 2008

    Posted in documentation, jcr and modelling Comments 2

    Since content-centric applications are content-driven, modeling the content structure is the most crucial part when documenting the architecture of your application. A big part of the general architecture is usually determined by the framework you choose to use: if you are using Sling, it is Content-Behavior-Appearance; if you are using Apache Cocoon, it is content pipelining, and so on. What makes your application special is the content structure or the content model. As understanding the content structure is a crucial part of communicating the architecture of your application, you should spend a considerable amount of time on designing, documenting and communicating the content structure to other developers. In JCR, content has two general properties that deserve documentation. The first is the location of nodes in the content tree. The most straightforward approach to documenting this is simply expressing the tree structure in a diagram like the one below, or using a JCR repository browser like the CRX explorer that comes with Day's CRX repository or the open source tool JCR Explorer.

    There are multiple downsides connected with this approach. First, these autogenerated tree models communicate the importance and relation of portions of the content tree poorly, as they can only express parent-child relationships and, to a certain degree, node types. Secondly, as the tree grows, it becomes increasingly complex and confusing to the observer. If you really care about communicating your content structure, then drive structure documentation; do not let it happen.

    The second aspect of content modeling for JCR is the node type. JCR has a complex node typing system that allows multiple inheritance, mixins, child-nodes and references. For real-world application documentation three approaches can be found:

    • using standard CND notation - this is the most obvious approach as you have to write the CND files anyway and it provides a very compact notation that is able to express every aspect of the node type. Unfortunately, this CND notation is optimized for writes, not readability or comprehensibility. In order to make it easy to understand, the following two approaches are being used.
    • automatically generated HTML nodetype documentation, using a tool like Jackrabbit-NTdoc, which basically takes the node type definitions and automatically translates them into a number of HTML pages that are browsable similar to Javadoc and document every aspect that can be found in the node type definition.
    • ad-hoc graphical notations. These notations often are inspired by UML or entity relationship diagrams, but seldom reused or documented. While they are more readable than the CND notation or browsable HTML documentation, the lack of standardization and meta-documentation makes them hardly portable.

    A main advantage of these graphical notations however is that you as the architect can decide what is important, what is related and what is obvious and does not need to be documented at a high level. This again shows that you should drive your content model documentation and not let it happen.

    The notation proposed below uses a combination of a graphical treemap notation for describing the content tree and a UML-class-diagram inspired notation for documenting node types, node type inheritance and node references. A main advantage of this notation, besides the re-use of existing notations like UML or Fundamental Modeling Concepts (FMC), is that it offers a connection between tree structure and node type.

    The upper part of the chart features an example content tree in treemap notation. Speaking in FMC terms, this content tree is a set of nested places and this nesting can be driven by the architect in order to express relation (places are next to each other), containment (one place in another) and importance (place is bigger). You can even "zoom in" parts of the chart to explain content structure more in-depth. A good example for variable content can be found in /apps/wiki/themes where any number of themes can be stored, but two "default" and "extra" are mentioned as examples.

    This treemap structure is both visually compelling and compact, so it can be combined with the UML-inspired node type notation at the bottom of the chart. This notation uses UML class diagrams to express node types (bold font, shaded background) and mixins (italic font, white background). Node types can have three types of relations: inheritance, containment and reference. For inheritance the default solid line with a hollow triangle arrowhead at the super type is used. For child nodes and associations a basic "association" line without arrowheads is used. For the cardinality of relationships: as there is only one parent node or referencing node, only the cardinality indicator at the child or referenced node type is used. Here we use a syntax inspired by simple regular expressions, where * means any number of nodes, + means at least one node, n means exactly n nodes, and so on.
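    The three cardinality indicators named above can be read as a simple predicate over the number of child (or referenced) nodes. A minimal sketch, assuming only those three forms: the class and method names are hypothetical, and any further indicators covered by "and so on" are deliberately not handled.

```java
// Hypothetical helper: checks a node count against a cardinality indicator.
final class CardinalityCheck {
    private CardinalityCheck() { }

    // indicator is "*" (any number), "+" (at least one) or a number n (exactly n).
    static boolean allows(String indicator, int nodeCount) {
        if (nodeCount < 0) {
            throw new IllegalArgumentException("nodeCount must not be negative");
        }
        switch (indicator) {
            case "*":
                return true;
            case "+":
                return nodeCount >= 1;
            default:
                // Anything else is read as an exact count; Integer.parseInt
                // throws NumberFormatException for text that is not a number.
                return nodeCount == Integer.parseInt(indicator);
        }
    }

    public static void main(String[] args) {
        System.out.println(allows("*", 0));
        System.out.println(allows("+", 0));
        System.out.println(allows("2", 2));
    }
}
```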

    Using a dotted line you can map node types to places in the treemap where this node type can be used.

    To sum it up, the proposed notation is a tool that helps in understanding and communicating content-centric software systems. It is not intended to be used to automatically generate code or to be generated automatically from code; instead it is a second description of your software system that lives beside the code of your system (as the primary description) and is suited for technical communication with humans.

    Posted by Michael Marth JAN 23, 2008

    Posted in announcements, dev.day.com and link of the day Add comment

    Every now and then I post a "Link of the day", which is often a pointer to some other blog or article I find interesting. However, today I had too many links I wanted to post (four of them). So I thought, heck, I might as well set up a Digg clone for link sharing. Here it is (it is called Daigg).

    Anyone can sign up, post links and vote on links. The upcoming links (aka new and getting interest) are also listed on the right sidebar of this blog.

    In case you wonder about the name: it is a combination of Day and Digg, but also a word in Swiss German (Basel edition) and means "dough".

    Posted by Michael Duerig JAN 23, 2008

    Posted in dynamic languages, jcr, rad and tutorial Comments 2

    In a previous post I showed how to access a JCR compliant repository with Scala. This time I show how to add child nodes to a given node. Again I leverage Scala's implicit conversion mechanism to retroactively extend the Node class. A working example of the following code is available from my personal blog.

    My goal was to define an operator for adding nodes in such a way that the tree structure of the nodes becomes apparent just from looking at the code. Optimally the code would look similar to an ASCII art drawing of a file system hierarchy.

    ├───mobile
    ├───monads
    ├───parsing
    │   └───lambda
    │       └───test
    ├───pilib
    ├───tcpoly
    │   ├───collection
    │   └───monads
    └───xml
        └───phonebook
    

    To achieve this I had to come up with a chainable mechanism to recursively add child nodes to a parent node. Two operators are involved in this mechanism: one for adding nodes and another for opening a new branch. The operator for adding a child node to a parent node is just an alias for JCR's addNode method:

    def ¦-(relPath: String) = node.addNode(relPath)
    def ¦-(relPath: String, tyqe: String) =
                   node.addNode(relPath, tyqe)
    

    The operator for opening a new branch is a bit trickier:

    def -+[T](f: Node => T) = f(node)
    

    It takes a lambda expression and applies it to the node it was called on. That way a block of code following the -+ operator receives a binding for its parent node. It can then use this binding to recursively add further nodes and branches. With these operators in place adding new nodes to a parent node can be expressed in a very concise way:

    def addSubtree(n: Node) {  
      n ¦- "1" -+ { n: Node =>
              n ¦- "i"  
              n ¦- "ii"  
              n ¦- "iii"  
              n ¦- "iv"  
              n ¦- "v"
      }
      n ¦- "2"
      n ¦- "3" -+ { n: Node =>
              n ¦- "i"  
              n ¦- "ii" -+ { n: Node =>
                       n ¦- "a"
                       n ¦- "b"
                       n ¦- "c"
              }
              n ¦- "iii"  
              n ¦- "iv"  
              n ¦- "v"
      }
      n ¦- "4"
      n ¦- "5"
      n ¦- ("6", "nt:file")
    }
    

    Extending this mechanism to single-valued properties is straightforward. Handling multi-valued properties in a consistent way (i.e. allowing the property's value to be passed as a sequence) turns out to be much more difficult. I will write more about this in a later post.

    Posted by David Nuescheler JAN 18, 2008

    Posted in atom, atompub, jcr, jsr-170, jsr-283 and webdav Comments 9

    There has been a bit of discussion about JCR and Atom in the blogosphere this week. Things got started by Adrian Sutton's post "Atom Is The New JCR", then there was Dan Diephouse ("Atom has not replaced JCR it has supplemented it"), then Sam Ruby wrote (1 line) about it.

    These posts reminded me of the time 5 years ago. Back then, people started just about every discussion around JCR with the question "We have WebDAV, why do we need JCR?" to the point where I would include the slides on the relationship between JCR and WebDAV to preempt the questioning. Luckily it worked, after a couple of months the whole debate was gone...

    ...the good news is, what applied to the relation between JCR and WebDAV also applies to the relation between JCR and Atom. And I get to recycle my slides ;)

    I should mention that I see Atom Publishing Protocol (which I will refer to as "Atom" throughout this post) as a simplified version (probably for the right reasons) of WebDAV & related specifications (which I will refer to as "WebDAV" throughout this post).

    So, let's look at some aspects of how Atom and JCR are related:

    1. Master of the obvious.

    1. JCR is an API, Atom is a protocol.
      We need both an API and a protocol. Nobody would dispute the need for the Servlet API on the basis of the existence of the HTTP protocol.
    2. Functional goals.
      As mentioned, I think that Atom can be compared to WebDAV. Both are protocols, both are REST-ish XML over HTTP.
      However, from a functional perspective WebDAV is much broader than Atom, and JCR goes even beyond that.

      Features          Atom  WebDAV  JCR
      Read              x     x       x
      Write             x     x       x
      Query                   x       x
      Fulltext Search         x       x
      Access Control          x       x
      Workspace Mgmt          x       x
      Versioning              x       x
      Locking                 x       x
      Observation                     x
      Transactions                    x
      Retention & RM                  x

    The above table doesn't mean that Atom is inferior to JCR - they just have different goals and scopes. The simplicity of Atom is great, and its extensibility prevents you from being too constrained when exchanging content. But JCR has a different focus, and so does WebDAV.

    2. "Content Silos" and Integration

    JCR provides standards-based integration that allows a vendor to implement a minimal (sub)set of features and still cover a large set of applications. JCR calls this "Level 1 compliance".

    In the JCR expert group we had a long discussion on what features should go into "Level 1" to be able to enable as many small and simple applications as possible, and we chose Read & Search.

    I am convinced that if you are integrating into a third-party repository, search is very important. I completely understand why search is not specified in Atom. It is a very painful process to get various vendors to agree on query syntax and semantics.

    3. Content Repositories are Application Development Infrastructure.

    JCR also provides infrastructure for application developers who need features like access control, full-fledged versioning or fulltext search, or who simply need to deal efficiently with large streamed binaries or an arbitrarily sized hierarchy of information.

    I personally tend to compare Content Repositories to Relational Databases or Filesystems.

    4. Atom + JCR.

    Apache Jackrabbit has for a long time exposed WebDAV servers: a general-purpose, filesystem-style WebDAV access and a complete remoting of the JCR API through WebDAV. Judging by the great deal of interest in the broader Jackrabbit community, I am convinced that it will be a matter of weeks until we see an Atom layer on top of JCR.

    The fact that people already built a backing store for Apache Abdera (Apache's Atom "Server") using JCR (Apache Jackrabbit) very much shows that it is clearly JCR + Atom, not JCR vs. Atom.

    I think Atom and JCR are a very natural fit, and an Atom over JCR implementation makes a lot of sense as part of an Apache project, be it a part of Abdera, Jackrabbit or even Sling.

    Posted by Michael Marth JAN 18, 2008

    Posted in jcr and link of the day Add comment

    If you read this blog you probably know this already, but: Jackrabbit 1.4 has been released! Congratulations to the project team!

    Jukka Zitting has the announcement on his blog and provides some further insights on TheServerSide's forum.

    Posted by Michael Marth JAN 18, 2008

    Posted in jcr, microjax, microsling, rest, ria, ujax and usling Comments 7

    Ever since Gmail entered the Internet we have seen an ever increasing interest in the "client". Before Gmail (in the beginning of this century) the best practice for web applications was HTML pages that were dynamically generated on the server-side. The high level system architecture of such an application would usually consist of a data base, an application server of some kind that generates the HTML and the browser.

    Fast forward to 2008: the world has changed. Let me highlight some differences to 5 years ago.

    • The user experience of web applications is more of a concern and has been improved a lot. Consider Flickr’s drag-and-drop interface or iGoogle. While they were not technically impossible, few considered implementing such a UI back then. In general, this means that more code is executed on the client (aka AJAX).
    • Servers routinely pump out pure data (rather than markup) in formats like RSS or JSON. This data can be integrated and displayed on the client (see the whole mash-ups phenomenon). From the point of view of the application developer all he does is write the client.

    These developments have an impact on the high level system architecture: the client actually does something (rather than just render documents). And more: the client consumes data from the server (rather than documents to render).

    So where does the "business logic" reside now? There is a continuum of possibilities: on one extreme it resides exclusively on the server (this is the "old style"). Then there are AJAX applications, where usually some logic resides on the server and some on the client. And by now, there is also the other extreme: all logic resides on the client (Bob Buffone has a nice slide on this continuum, see his presentation, slide 9).

    The clients are alright

    Let us look at the extreme case "all logic on the client". It might seem an unnatural thing to do (especially if you spent the last years coding server-executed JSPs or PHP scripts). But consider the technologies used for building Rich Internet Applications like Flex or Silverlight. If you use such a technology it is quite obvious that the client exchanges nothing but raw data with the server. Who in his right mind would let the server dynamically generate parts of a Flex application? Now, if you think about it, with an AJAX client the situation is not so different.

    So in summary, it has become a perfectly valid possibility for an application design to place all logic onto the client.

    The architectural change

    If a certain amount of business logic resides on the server writing an application invariably means server programming. The server is your application.

    However, in an architecture where the server just persists data and serves it to a client where the application logic resides, this is not necessarily the case anymore. If there is no application logic on the server then the server is just a piece of standard infrastructure much like an RDBMS is today (most people do not code business logic into their data base. Oh well, yes, some do). It just sits there, but there is no coding to be done on it.

    Meet microjax

    This type of application architecture is what David Nuescheler is aiming at with microjax. Let the client do the UI (because the client-side technologies are so much better at that) and let the server serve the data.

    In microjax you have a (JCR) content repository and a very thin layer on top that reads and writes JSON. This means that the server-side is not split into the classical database/application server combo. Your data is your web server. The mapping between data and JSON output is implicit and does not need any explicit coding. The access rights are in your JCR (on "row level" in db lingo).
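    To make "implicit mapping" a little more concrete, here is a rough sketch of the idea in plain Java: a node is modelled as a map of properties, child nodes are nested maps, and the JSON falls out of the structure without any per-type code. Everything here is an assumption for illustration. ContentToJson is a hypothetical class, and neither the output layout nor the code reflects what microjax actually does.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical sketch: renders nested maps (node -> properties/children) as JSON.
final class ContentToJson {
    private ContentToJson() { }

    // Each map entry becomes a JSON member; nested maps become nested objects.
    static String toJson(Map<String, ?> node) {
        StringJoiner members = new StringJoiner(",", "{", "}");
        for (Map.Entry<String, ?> entry : node.entrySet()) {
            members.add(quote(entry.getKey()) + ":" + valueToJson(entry.getValue()));
        }
        return members.toString();
    }

    private static String valueToJson(Object value) {
        if (value == null) {
            return "null";
        }
        if (value instanceof Map) {
            @SuppressWarnings("unchecked")
            Map<String, ?> child = (Map<String, ?>) value;
            return toJson(child);
        }
        if (value instanceof Integer || value instanceof Long || value instanceof Boolean) {
            return value.toString();
        }
        // Everything else (including other number types) is written as a string.
        return quote(value.toString());
    }

    // Wraps s in double quotes, escaping quotes, backslashes and control characters.
    private static String quote(String s) {
        StringBuilder sb = new StringBuilder("\"");
        for (char c : s.toCharArray()) {
            if (c == '"' || c == '\\') {
                sb.append('\\').append(c);
            } else if (c < 0x20) {
                sb.append(String.format("\\u%04x", (int) c));
            } else {
                sb.append(c);
            }
        }
        return sb.append('"').toString();
    }

    public static void main(String[] args) {
        Map<String, Object> entry = new LinkedHashMap<>();
        entry.put("title", "Monday again");
        Map<String, Object> blog = new LinkedHashMap<>();
        blog.put("title", "The Depressed");
        blog.put("entry", entry);
        System.out.println(toJson(blog));
    }
}
```

    The design point is that nothing in the sketch knows about blogs or entries: whatever structure is stored is what gets served.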

    Let me highlight some implications of this:

    • There is no need for server-side scripts.
    • There is no need for db drivers. A REST interface is the "driver".
    • There is less need for server-side CPUs. The clients bring their own.
    • Web application development requires only HTML and Javascript skills (no PHP, JSP, ASP, …), which are around in abundance, and the artificial boundary between web designers and the server developers who translate their work into server code is gone.

    See here to get started with microjax.

    In the blogosphere

    Who else is on this case? Peter Svensson has been exploring similar ideas (see his blog post The End of Web Frameworks and the follow-up Alternatives to Server-side Web Frameworks).

    Even more, for his project Mashupstation Peter has come up with these interesting rules for application development:

    1. All functionality that possibly can be implemented on the client side shall be implemented on the client side.
    2. All communication with the server middleware shall be constrained to service-interfaces; For instance REST.
    3. No part of the client shall be evoked, generated or templated from the server-side. This rules out in-line conditional HTML in JSP, ASP, PHP, etc.
    4. The logic making up the server middleware will be implementing only the following functionality:
    a. Security
    b. Database access
    c. Cross-domain proxying for remote resources implementing only a) and b).

    These rules could have been taken from the microjax textbook it seems to me.

    Back to the continuum

    As I described above, the "all logic on the client" case is an extreme case of a continuum. But this extreme case might not be the right choice for each and every application. Luckily, microjax is part of microsling. Microsling allows application developers to execute server-side scripts that are implemented e.g. in Javascript. As such, the application developer can freely decide how much logic shall reside on the server and how much shall be on the client. Since both environments are coded in the same programming language moving logic back and forth is rather seamless (to get started with microsling see here).

    IBM "gets it"

    From the high level architecture view microjax and CouchDB are very similar. Both combine persistence (without prior structure) and a web server. Now, IBM has recently bought CouchDB.

    IBM employee Patrick Mueller blogged about this purchase:

    It's a web server and a database. And to me, this is the most interesting point. Just as we've seen client programs start to embed web browsing technology (like iTunes), there's really no reason why server programs like a database shouldn't be able to embed a web server.

    Apparently, Big Blue considers the architectural change described above to be relevant.

    As a side-note: while microjax and CouchDB are similar in ideas, they differ in maturity. JCR implementations like Apache Jackrabbit (upon which microjax is based) have been in production use for years. Plus, there are tools for using them efficiently.

    And another note: the "without prior structure" bracket above is further explored here.

    Posted by Michael Marth JAN 16, 2008

    Posted in ask the community and jcr Add comment

    Here's the next part of my journey to retrieve some of the hidden JCR community knowledge. Encouraged by the interesting insights of the Mindquarry team I approached Cédric Damioli who is a Product Manager at Anyware Technologies based in Toulouse. Cédric has built the open source CMS Ametys on top of Apache Jackrabbit and Cocoon. It leverages OSWorkflow as a workflow engine and uses DocBook internally (nice architecture!).

    Here's what Cédric had to say about building Ametys:

    Q: Cédric, when you were designing your persistence architecture for Ametys what potential choices were you considering and what influenced your decision towards Jackrabbit? Did you consider XML databases like the Mindquarry team?

    Back in 2004, when we made the Jackrabbit choice, we were designing the architecture of the version 2.0 of our CMS (the name Ametys was born only last year). The 1.x versions persisted their documents in a CVS repository. While this may sound somewhat odd, it appeared to be really nice: things like versioning or documents hierarchy were handled natively, and thanks to a Java bridge between Cocoon and the CVS repository, the source code was quite light and easy to understand.

    But there were also some important drawbacks: a CVS repository can't handle custom metadata, the time to access a single document grows with the overall size of the repository and the installation of the CMS was very intrusive on the target server.

    So we decided to switch to a new persistence architecture, with the same benefits as the CVS server, but without the same limitations.

    We considered three technologies:

    • Subversion, as the natural successor of CVS
    • WebDAV/Slide
    • JCR/Jackrabbit

    The JCR spec was not even final, and Jackrabbit was still under incubation with no public release, but three facts made the choice obvious for me:

    • In 2004 Stefano Mazzocchi wrote a paper about a new technology called JCR and I remembered this article a few months later
    • Sylvain Wallez, former Cocoon PMC chairman and our R&D director, is an Apache member. As such he was part of the JCR 1.0 expert group and an early Jackrabbit committer, which seemed to be quite a good guarantee to me.
    • And last, but certainly not least, the JCR spec is very, very good! It contains all concepts I wanted to have in a content repository.

    Q: Let us know how the reality check worked out. Did your expectations regarding the JCR come true or did you have to overcome some difficulties you did not expect? Were there any pleasant or not so pleasant surprises after working with Jackrabbit for a while?

    For this question I have to clearly distinguish the spec and its implementation. While it was great to learn to work with JCR, the use of Jackrabbit in a production environment was not that great one or two years ago.

    JCR, I would say, has the pros and cons of a young technology. JCR 1.0 handles nodetypes, hierarchies and versioning very well, but its search capabilities are limited in a real-world application.

    About Jackrabbit: the spec is well implemented, but it lacks administrative tools and APIs. For example, it is impossible to programmatically inspect the contents of a PersistenceManager in order to detect or repair inconsistencies. It is also impossible to programmatically reindex a workspace or the whole repository. Moreover, there's no real backup/restore solution or monitoring application.

    So yes, the JCR choice was good, and all content related tasks are well designed and implemented, but I did not anticipate that the needs of my customers would go far beyond.

    Q: I am still trying to find my favourite JCR tool. What tools did you use for your JCR-related development work?

    In early 2005, we used the Jackrabbit XML PersistenceManager and I used to crawl the repository with... vi or notepad. Now that the community has grown, some cool tools exist. We mainly use the web-based JCR-explorer and JCR Controller which is an Eclipse app.

    Q: You surely must have learned some important lessons about JCR from building Ametys. Would you like to share some of them with us?

    JCR is very powerful by itself. At the beginning, we made the choice to hide the JCR API behind our own Repository API, in order to give CMS developers more flexibility. Two years later, no other implementation of this proprietary API exists and our developers had to learn yet another API. So we were wrong: the content model is well thought through and the API is easily learnt. There is no need to map the API onto another one, like we used to do with JDBC.

    Q: If you had one wish regarding JCR and the JCR community, what would it be?

    The Jackrabbit community is healthy, and I hope it will stay that way. One could expect to have more visibility into the JCR expert group work, but I know the JCP rules are quite strict.

    Regarding functionality: better query features and more admin tools and APIs. That would definitely make JCR and Jackrabbit rock!

    Posted by Michael Marth JAN 16, 2008

    Posted in link of the day and rest Add comment

    Although Roy Fielding is Day's Chief Scientist we have not posted a lot about REST on this blog so far. Well, I'd like to start to change this with this link of the day: "Will Enterprise Architects Get Any REST in 2008?" (on Baseline).

    On the REST vs. SOAP debate Roy is quoted as:

    But Fielding says framing the question as simplicity versus sophistication is a false choice. "In reality, it requires the most sophistication to create a simple system, whereas it's trivial to create a system that nobody understands," he says. "Really what REST does is constrain the application developer or the designer of these systems into following a pattern that makes every interface simple. If you do that, later on, you can recombine these interfaces in simple and interesting ways."

    Posted by David Nuescheler JAN 11, 2008

    Posted in jcr, jsr-170 and jsr-283 Add comment

    When I speak at conferences and other occasions about JCR the question of adoption is asked frequently.
    People usually think of adoption as "getting as many repository vendors as possible to adhere to the specification". This is probably because a lot of people see JCR primarily as a vehicle to integrate content from various repository vendors.

    It is important to mention that JCR has been very successful in that respect.
    To illustrate that please check my slide below for repository vendors that I am aware of that can be accessed through JCR, hence are JCR compliant to a specific level (please feel free to let me know of repositories that I am not aware of).

    This means, for example, that you can hook up Microsoft SharePoint to a standardized Java interface and virtualize the content with any of the other JCR compliant repositories.

    While this integration aspect certainly is a big part of the mission of JCR, it is clearly not the only goal and most certainly not the only way how JCR impacts the market towards standardization.

    JCR frees the application developers

    Application developers no longer have to implement their own repositories, but can use off-the-shelf, standards-compliant repositories such as Apache Jackrabbit.
    Every application developer that does not build their own repository is "one repository down".

    This changes the game drastically and makes the application developers a lot more nimble, not having to implement features like access control, versioning, hierarchies, or multi-value properties that really every application that I can think of requires.

    It is very encouraging to see the quick rate at which new JCR applications are being developed.

    Ultimately the applications that exist on top of JCR will drive the adoption and will fuel the consolidation of the JCR infrastructure market.
    The important question is not "How many repositories are compliant" but "How many applications are using JCR as their backing store".

    I always like to use relational databases as a parallel when it comes to the evolution of an infrastructure market.
    It was SQL as the initial relational standard that allowed application developers to build applications without having to ship their embedded proprietary database with the application and that finally led to the consolidation that we have today in the database market.

    It is important to understand that this cycle from chaos, through standardization, to infrastructure and finally to commodity usually takes around 10-25 years and we are at the very beginning with respect to JCR.



    Posted by Michael Marth JAN 11, 2008

    Posted in announcements and dev.day.com Add comment

    We are happy to announce the latest addition to dev.day.com: PlanetDay. PlanetDay is an aggregation of public weblogs written by employees of Day Software in their spare time. So you might encounter posts about software development as well as cartoons or random ramblings (so far, it's mostly the former).

    I'd like to thank Lars Trieloff for helping me in setting this up.

    Posted by Michael Duerig JAN 10, 2008

    Posted in dynamic languages, jcr, rad and tutorial Comment 1

    Having seen Scala being mentioned in a recent comment I thought I'd give it a try with JCR.

    First I installed the Scala Plugin for Eclipse. Alternatively I could have also used Scala's command line tools. Then I created a new Scala project from within the Eclipse IDE and put the Jackrabbit jars on the classpath.

    Now I set out to implement Hop 3 from the article First Hops with Jackrabbit. I therefore downloaded the test.xml file and put it into a subfolder of my Scala project called data. Translating the Java code to Scala is straightforward. However, I decided that I wanted to benefit from some of Scala's goodies, which required a little more work. Essentially I wanted to be able to use for comprehensions to iterate over the child nodes and properties of a given node. Let's take a look at the code first. I will comment on the crucial parts afterward.

    package com.day.scalademo;
    
    import javax.jcr.{Node, Property, SimpleCredentials, 
      ImportUUIDBehavior}
    import org.apache.jackrabbit.core.TransientRepository
    import java.io.FileInputStream
    
    object Hop3 {
    
      def main(args: Array[String]) {
        val repository = new TransientRepository
        val session = repository.login(
          new SimpleCredentials("username", 
          "password".toCharArray));
            
        try {
          val root = session.getRootNode
    
          if (!root.hasNode("importxml")) {
            print("Importing xml... ")
    
            val node = root.addNode("importxml", 
              "nt:unstructured")
            val xml = new 
              FileInputStream("data/test.xml")
            session.importXML("/importxml", xml, 
              ImportUUIDBehavior.IMPORT_UUID_CREATE_NEW)
            xml.close
    
            session.save
            println("done.")
          }
    
          dump(root.getNode("importxml"))
        } 
        finally {
          session.logout
        }    
      }
            
      private def dump(node: Node) {
        import Extensions._
        println(node.getPath)
    
        for (property <- node.properties) {
          if (property.getDefinition.isMultiple)      
            for(value <- property.getValues)
              println(property.getPath + " = " + value)
          else 
            println(property.getPath + " = " +
              property.getString)
        }
            
        for (child <- node.childNodes)
          dump(child)
      }
    }
    
    object Extensions {
      implicit def extendNode(node: Node) =
        new NodeExtender(node)    
    
      class NodeExtender(node: Node) {
    
        def childNodes = new Iterator[Node] {
          val it = node.getNodes
          def hasNext = it.hasNext
          def next = it.nextNode
        }
        
        def properties = new Iterator[Property] {
          val it = node.getProperties
          def hasNext = it.hasNext
          def next = it.nextProperty
        }
      }
    }
    

    The main method is a straightforward translation from Java. It acquires a Repository instance, imports the content of test.xml if it hasn't been imported before, and finally outputs the subtree generated from the import by a call to the dump method.

    The dump method takes a Node argument and recursively outputs its properties and its child nodes. There are two noteworthy things here: the import directive and the for loops. This form of import basically imports all definitions from the Extensions object. The for loops iterate over the child nodes and properties of a Node. But note that Node objects have neither a childNodes nor a properties member, and nevertheless exactly these are accessed here. Why didn't I use the getNodes and getProperties members? The reason is that the compiler translates a for loop into calls to methods such as foreach, which scala.Iterator[T] provides for some type T. The methods getNodes and getProperties return JCR's own Java iterators (NodeIterator and PropertyIterator), which do not offer these methods.

    I employed Scala's implicit conversion mechanism* to emulate something like C#'s extension methods. Definitions in scope which are marked with the implicit keyword are candidates for implicit type conversion. In our case we import everything from the Extensions object, so extendNode is in scope at the location of the for loops. Since the Scala compiler cannot find the properties and childNodes members on Node, it applies extendNode, passing the current node as argument. extendNode returns a NodeExtender which has properties and childNodes methods which return a scala.Iterator[Property] and a scala.Iterator[Node], respectively. Both iterators simply delegate to the iterators returned by the getProperties and the getNodes methods of the underlying node.
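
    For comparison, the delegation at the heart of NodeExtender can also be sketched in plain Java, where a for-each loop needs a typed Iterable rather than a bare iterator. This is only a minimal sketch under stated assumptions: the class TypedIteration and its methods typed and toList are made up for illustration, and a list of strings stands in for what getNodes would return, so that the snippet runs without Jackrabbit on the classpath.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Hypothetical helper for illustration; not part of JCR or Jackrabbit.
final class TypedIteration {

    // Wraps an untyped iterator in a typed Iterable so that it can be used
    // in a for-each loop. Every call simply delegates to the wrapped
    // iterator, so the returned Iterable can be traversed only once.
    static <T> Iterable<T> typed(final Iterator<?> raw, final Class<T> type) {
        return () -> new Iterator<T>() {
            @Override public boolean hasNext() { return raw.hasNext(); }
            @Override public T next() { return type.cast(raw.next()); }
        };
    }

    // Collects the elements of an Iterable into a list.
    static <T> List<T> toList(Iterable<T> items) {
        List<T> result = new ArrayList<>();
        for (T item : items) {
            result.add(item);
        }
        return result;
    }

    public static void main(String[] args) {
        // Stand-in for node.getNodes(): an iterator with unknown element type.
        Iterator<?> raw = Arrays.asList("/importxml/a", "/importxml/b").iterator();
        for (String path : typed(raw, String.class)) {
            System.out.println(path);
        }
    }
}
```

    The difference to the Scala version is that the wrapper has to be called explicitly here, whereas the implicit conversion lets the compiler insert that call.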

    Using Scala for JCR programming seems very promising. It adds little to no overhead compared to accessing the repository from Java. Moreover, it possesses highly sophisticated features which lead to elegant solutions for many common programming tasks. I will try to elaborate more on using JCR from Scala soon.

    (*) <rant>This is in some ways similar to implicit type conversion in C++ but done right (i.e. more powerful and safer).</rant>

    Posted by Michael Marth JAN 09, 2008

    Posted in dev.day.com, microsling, sling and usling Add comment

    Lately, I have been a bit quiet about Sling (apologies to the readers who are waiting for part 2 of "How this blog is built"). The reason is that there have been quite a few discussions (and changes) regarding its architecture.

    One architectural bit that has been ironed out is the separation of Sling and microsling. It has been decided that they get merged again.

    Sling developer Bertrand Delacretaz has nicely summed up this merge in the corresponding ticket description (on Sling's bug tracker). Unlike most tickets it is definitely worth a read.

    Especially useful is that Bertrand clearly defines the design goals of microsling (2.0):

    µsling 2.0 is a preconfigured instance of Sling, meant to allow web developers to test drive Sling by building scripted web and REST applications backed by a JCR repository.

    The µsling 2.0 distribution only requires a Java 5 VM to run, no installation is needed. Fifteen minutes should be enough to start µsling and understand the basic concepts, based on self-guiding examples. µsling should ideally be delivered as a single runnable jar file.

    Java programming is not required to build web and REST applications with µsling 2.0: both server-side and client-side javascript code and presentation templates can be used to process HTTP requests. Other scripting and templating languages (JSP and BSF-supported ones) can be plugged in easily.

    Posted by Lars Trieloff JAN 09, 2008

    Posted in graph, news, open and social Comment 1

    In the last two days we have seen two exciting pieces of news: Google and Facebook joining DataPortability.org, and Google, IBM and Verisign agreeing to support OpenID. Together with Apache's Shindig, an open source implementation of Google's OpenSocial engine (see this Youtube Video for an interview with Brian McCallister who explains what Shindig and OpenSocial are), we are witnessing what will evolve into true social network portability.

    • With OpenID you are able to transfer your identity from one network to another. No need to enter the same information about yourself over and over again. You are even free to create multiple identities if you would like to separate some aspects of your digital life.
    • With DataPortability.org you are able to transfer the social graph from one network to another. This means you do not have to find your friends on each new network again and again.
    • And with OpenSocial you have application portability. If there is a nice widget in one network that you would like to embed into another network - no problem with OpenSocial.

    As a consequence, the costs of joining yet another social network will shrink dramatically. As you have portability of identity, social graph and applications, you can start cherry-picking by joining many specialized networks, selecting the parts of each application that are most useful and aggregating them in your main OpenSocial container. But this also changes the rules of the game for social network vendors. You do not have to build up your user community from scratch, and you do not have to convince your users to jump the high sign-up-and-give-away-your-private-information barrier anymore. Instead, you can create a highly specialized niche social network that serves only a small fraction of the population or that has only a few, highly specific use-cases. This will lead to the generation of thousands of niche social networks, some standalone, some embedded into larger websites, but all able to interchange users, social graphs and widgets with each other.

    As for the big players Google and Facebook: They will have to compete to be the best platform for running these widgets. Facebook can benefit from the large number of existing Facebook applications and the really neat integration into the site from a usability point of view, but Google's hold on the desktop with Google Toolbar and the ability to display desktop widgets on the web and vice versa could lead to a completely new way to see the web.

    Posted by Michael Marth JAN 09, 2008

    Posted in link of the day, sling, ujax and usling Add comment

    Link of the day: My colleague Lars Trieloff has written a nice and easy-to-understand overview article about Apache Sling and microjax.

    Posted by Michael Marth JAN 09, 2008

    Posted in link of the day, sling, ujax and usling Add comment

    OK, I just came across another introduction to Apache Sling so here's another link of the day :) The author of this blog (called "So Limited") has also published a number of posts on JSR-170.

    Posted by Michael Marth JAN 04, 2008

    Posted in dynamic languages, jcr, rad and tutorial Add comment

    Given a) the current interest in dynamic languages and b) the comments after the last post I would like to show how easy it is to use a Java Content Repository from JRuby.

    First, download and install JRuby. Make sure you got it right by executing

    jruby -v

    from the command line.

    Also download the Jackrabbit jars and put them on the classpath. An easy way to do this is to copy them into JRuby's lib directory.

    Now, let's access our Jackrabbit repository through Ruby code. I have taken the examples Hop 1 and Hop 2 from the Jackrabbit introduction and translated the Java code straight into Ruby:

    require 'java'
    include_class('java.lang.String') {|package,name| "J#{name}" }
    include_class 'javax.jcr.Repository'
    include_class 'javax.jcr.SimpleCredentials'
    include_class 'org.apache.jackrabbit.core.TransientRepository'

    repository = TransientRepository.new
    session = repository.login(SimpleCredentials.new("admin", JString.new("admin").toCharArray))
    name = repository.getDescriptor(Repository::REP_NAME_DESC);
    user = session.getUserID
    puts "logged in as " + user + " in " + name

    root = session.getRootNode
    hello = root.addNode("hello")
    world = hello.addNode("world")
    world.setProperty("message", "Hello, World!")
    session.save

    node = root.getNode("hello/world")
    puts node.getPath
    puts node.getProperty("message").getString

    root.getNode("hello").remove
    session.save
    session.logout

    Save this as a file jcrTest.rb and execute

    jruby jcrTest.rb

    You should get something like:

    C:\>jruby jcrTest.rb
    log4j:WARN No appenders could be found for logger (org.apache.jackrabbit.core.TransientRepository).
    log4j:WARN Please initialize the log4j system properly.
    logged in as admin in Jackrabbit
    /hello/world
    Hello, World!

    Don't worry about the log4j warnings (or fix them according to the Jackrabbit introduction mentioned above). The important bit is that the script gets a repository session, writes some content and deletes it again.

    The code is really just a one-to-one translation of the Jackrabbit samples, so see the original page for explanations. In case you are confused about the JString import in the second line: it avoids the name clash between Ruby strings and Java strings. For further information have a look at Ola Bini's intro to JRuby.

    We can also use JRuby for something more interesting: an interactive JCR console. Ruby (and JRuby) have the console irb (and jirb) that dynamically executes Ruby code. A session that connects to a JCR and writes into it could look like this:

    C:\>\dev-tools\jruby-1.0.3\bin\jirb
    irb(main):004:0> repository = TransientRepository.new
    => #<Java::OrgApacheJackrabbitCore...
    irb(main):005:0> session = repository.login(SimpleCredentials.new("admin", JString.new("admin").toCharArray))
    log4j:WARN No appenders could be found for logger (org.apache.jackrabbit.core.TransientRepository).
    log4j:WARN Please initialize the log4j system properly.
    => #<Java::OrgApacheJackrabbitCore...
    irb(main):006:0> user = session.getUserID
    => "admin"
    irb(main):007:0> root = session.getRootNode
    => #<Java::OrgApacheJackrabbitCore...
    irb(main):008:0> hello = root.addNode("hello")
    => #<Java::OrgApacheJackrabbitCore...
    irb(main):009:0> world = hello.addNode("world")
    => #<Java::OrgApacheJackrabbitCore...
    irb(main):010:0> world.setProperty("message", "Hello, World!")
    => #<Java::OrgApacheJackrabbitCore...
    irb(main):011:0> session.save
    => nil
    irb(main):012:0> session.logout
    => nil
    irb(main):013:0> exit

    (For better legibility I have shortened some of jirb's output and set the input lines to bold.)

    Posted by Michael Marth JAN 02, 2008

    Posted in fud busting and jcr Comments 4

    CMSWatch has commented on JCR and how important it is that content repositories can be accessed through various languages. I wholeheartedly agree. However, they also write:

    But if the API mandates the use of one particular language (such as Java), the Holy Grail of universality immediately takes a hit. Not everyone uses Java, or wants to.

    The API itself does not mandate a particular language (if anything, you might have trouble with type conversions). In fact there are a number of projects for integration of JCRs and other languages.

    Integration of a JSR-170 compliant repository and a language other than Java can come in 3 flavours (that I am aware of):

    1. Implementation of the API in another language, i.e. implementation of the spec
    2. Accessing a Java-based JSR-170 implementation through an adapter that maps Java and another language
    3. Accessing a Java-based JSR-170 implementation through REST (i.e. http)

    The first route is taken by the open source CMS Typo3. In the upcoming version 5, Typo3 will use its own PHP-based implementation of a JSR-170 compliant content repository. More information about this approach is available here:

    The second approach has been chosen by another popular open source CMS: Midgard. While Midgard is written in PHP, it makes its content available through the JCR API. JNI is used for the integration.

    Further examples of this approach are

    • Getting access to a JCR from .Net (C#) has been described here
    • The Java Content Repository Ice Connector allows access to a JCR from C#, VB, C++, Python, PHP.
    • The PHP-Java-Bridge has been used in the Typo3 project recently to get access to a Java-based JCR from PHP (before the port mentioned above).

    The third way of cross-language access to a JCR is to access the repository via a RESTful API. This is implemented for example in microjax. In microjax the language of choice is Javascript and the data exchange format is JSON.
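
    To give an idea of how little a client needs for this third flavour, here is a minimal sketch of building such an HTTP request. It is written in Java only to stay close to the other samples on this blog; any language with an HTTP client would do. The class name NodeRequest, the host and port, the node path and the .json suffix are all assumptions for illustration and not a description of microjax's actual URL layout.

```java
import java.net.URI;
import java.net.http.HttpRequest;

// Hypothetical helper for illustration; the URL layout is an assumption.
final class NodeRequest {

    // Builds (but does not send) a GET request that asks for a JSON
    // rendering of the node at the given repository path.
    static HttpRequest forNode(String baseUrl, String nodePath) {
        URI uri = URI.create(baseUrl + nodePath + ".json");
        return HttpRequest.newBuilder(uri)
                .header("Accept", "application/json")
                .GET()
                .build();
    }

    public static void main(String[] args) {
        // Assumed server address and node path, chosen only for the example.
        HttpRequest request = forNode("http://localhost:8080", "/hello/world");
        System.out.println(request.method() + " " + request.uri());
    }
}
```

    Actually sending the request takes an HttpClient (the java.net.http package requires Java 11 or later) and a running server; the JSON in the response would then be parsed with whatever library the client language offers.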

    Are you aware of other endeavours in this area? Please leave a comment.