Since content-centric applications are content-driven, modeling the content structure is the most crucial part of documenting the architecture of your application. A big part of the general architecture is usually determined by the framework you chose: if you are using Sling, it is Content-Behavior-Appearance; if you are using Apache Cocoon, it is content pipelining; and so on. What makes your application special is its content structure, or content model. Because understanding the content structure is crucial for communicating the architecture of your application, you should spend a considerable amount of time designing, documenting and communicating it to other developers.
In JCR, content has two general properties that deserve documentation. The first is the location of nodes in the content tree. The most straightforward way to document this is to express the tree structure in a diagram like the one below, or to use a JCR repository browser such as the CRX explorer that comes with Day's CRX repository or the open source tool JCR Explorer.
This approach has several downsides. First, these autogenerated tree models communicate the importance and relation of portions of the content tree poorly, as they can only express parent-child relationships and, to a certain degree, node types. Second, as the tree grows, it becomes increasingly complex and confusing to the observer. If you really care about communicating your content structure, drive the structure documentation; do not just let it happen.
The second aspect of content modeling for JCR is the node type. JCR has a complex node typing system that allows multiple inheritance, mixins, child nodes and references. In real-world application documentation, three approaches can be found:
- using standard CND notation. This is the most obvious approach, as you have to write the CND files anyway, and it provides a very compact notation that can express every aspect of a node type. Unfortunately, CND is optimized for writing, not for readability or comprehensibility. To make node types easier to understand, the following two approaches are also used.
- automatically generated HTML node type documentation, using a tool like Jackrabbit-NTdoc, which takes the node type definitions and translates them into a set of HTML pages that can be browsed much like Javadoc and that document every aspect found in the node type definition.
- ad-hoc graphical notations. These notations are often inspired by UML or entity-relationship diagrams, but are seldom reused or documented. While they are more readable than CND notation or browsable HTML documentation, the lack of standardization and meta-documentation makes them hardly portable.
A main advantage of these graphical notations, however, is that you as the architect can decide what is important, what is related, and what is obvious enough that it does not need to be documented at a high level. This again shows that you should drive your content model documentation and not just let it happen.
The notation proposed below combines a graphical treemap notation for describing the content tree with a UML-class-diagram-inspired notation for documenting node types, node type inheritance and node references. Besides reusing existing notations like UML and Fundamental Modeling Concepts (FMC), a main advantage of this notation is that it offers a connection between tree structure and node type.
The upper part of the chart features an example content tree in treemap notation. In FMC terms, this content tree is a set of nested places, and the architect can drive this nesting to express relation (places are next to each other), containment (one place inside another) and importance (a place is bigger). You can even "zoom in" on parts of the chart to explain the content structure in more depth. A good example of variable content can be found in /apps/wiki/themes, where any number of themes can be stored, but only two, "default" and "extra", are shown as examples.
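The same idea of a curated tree with selective zoom can be sketched in code as a plain indented outline. The following is a minimal illustration only: `Place` and `ContentOutline` are hypothetical helper classes, not part of the JCR API, and the `example` flag and the `maxDepth` parameter are assumptions made here to mimic example entries for variable content and the "zoom" level.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for a place in the content tree; not part of the JCR API. */
final class Place {
    final String name;
    final boolean example;   // true if this place only illustrates variable content
    final List<Place> children = new ArrayList<>();

    Place(String name, boolean example) {
        this.name = name;
        this.example = example;
    }

    Place add(Place child) {
        children.add(child);
        return this;
    }
}

final class ContentOutline {
    private ContentOutline() {}

    /** Renders the tree as an indented outline, eliding everything below maxDepth. */
    static String render(Place root, int maxDepth) {
        StringBuilder out = new StringBuilder();
        render(root, 0, maxDepth, out);
        return out.toString();
    }

    private static void render(Place place, int depth, int maxDepth, StringBuilder out) {
        out.append("  ".repeat(depth)).append(place.name);
        if (place.example) {
            out.append(" (example)");
        }
        out.append('\n');
        if (place.children.isEmpty()) {
            return;
        }
        if (depth >= maxDepth) {
            // Not zoomed in this far: show that content exists without listing it.
            out.append("  ".repeat(depth + 1)).append("...\n");
            return;
        }
        for (Place child : place.children) {
            render(child, depth + 1, maxDepth, out);
        }
    }

    public static void main(String[] args) {
        Place themes = new Place("themes", false)
                .add(new Place("default", true))
                .add(new Place("extra", true));
        Place root = new Place("apps", false)
                .add(new Place("wiki", false).add(themes));

        System.out.print(render(root, 2));  // overview: children of themes are elided
        System.out.print(render(root, 3));  // zoomed in: themes lists its two examples
    }
}
```

Because the outline is built by hand rather than dumped from the repository, the author decides which places appear and how deep each view goes, which is the point of driving the documentation.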
This treemap structure is both visually compelling and compact, so it can be combined with the UML-inspired node type notation at the bottom of the chart. This notation uses UML class diagrams to express node types (bold font, shaded background) and mixins (italic font, white background). Node types can have three types of relations: inheritance, containment and reference. For inheritance, the default solid line with a hollow triangle arrowhead at the supertype is used. For child nodes and references, a basic "association" line without arrowheads is used. As there is only one parent node or referencing node, only the cardinality indicator at the child or referenced node type is used. Here we use a syntax inspired by simple regular expressions, where * means any number of nodes, + means at least one node, n means exactly n nodes, and so on.
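To make the cardinality indicators concrete, here is a small sketch that checks whether a given number of child (or referenced) nodes satisfies an indicator. It covers only the three forms named above (`*`, `+` and a literal count n); any further indicators implied by "and so on" are not guessed at and are rejected. The class and method names are hypothetical.

```java
/** Hypothetical helper that interprets the cardinality indicators of the notation. */
final class CardinalityIndicator {
    private CardinalityIndicator() {}

    /**
     * Returns true if nodeCount satisfies the indicator.
     * Supported: "*" (any number), "+" (at least one), digits such as "2" (exactly n).
     */
    static boolean allows(String indicator, int nodeCount) {
        if (nodeCount < 0) {
            throw new IllegalArgumentException("nodeCount must not be negative");
        }
        switch (indicator) {
            case "*":
                return true;
            case "+":
                return nodeCount >= 1;
            default:
                // Anything else must be a literal count such as "1" or "3".
                if (!indicator.matches("\\d+")) {
                    throw new IllegalArgumentException("Unsupported indicator: " + indicator);
                }
                return nodeCount == Integer.parseInt(indicator);
        }
    }

    public static void main(String[] args) {
        System.out.println(allows("*", 0));  // true: zero nodes are fine
        System.out.println(allows("+", 0));  // false: at least one node is required
        System.out.println(allows("2", 2));  // true: exactly two nodes
    }
}
```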
Using a dotted line you can map node types to places in the treemap where this node type can be used.
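One way to picture these dotted lines is as a lookup table from places to node types. The sketch below is an illustration under stated assumptions: `PlaceTypeMapping` is a hypothetical class, the node type names `wiki:app` and `wiki:theme` are invented for the example, and the trailing `/*` convention for "any node stored under this place" is an assumption of this sketch, not part of the notation.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical lookup table from places in the content tree to node types. */
final class PlaceTypeMapping {
    private final Map<String, String> typeByPlace = new LinkedHashMap<>();

    /** Records one "dotted line": the node type that may be used at the given place. */
    PlaceTypeMapping map(String place, String nodeType) {
        typeByPlace.put(place, nodeType);
        return this;
    }

    /** Returns the documented node type for a path, or null if the place is not mapped. */
    String typeFor(String path) {
        String exact = typeByPlace.get(path);
        if (exact != null) {
            return exact;
        }
        int slash = path.lastIndexOf('/');
        if (slash < 0) {
            return null;
        }
        // Fall back to a wildcard entry on the parent place (variable content).
        return typeByPlace.get(path.substring(0, slash) + "/*");
    }

    public static void main(String[] args) {
        PlaceTypeMapping mapping = new PlaceTypeMapping()
                .map("/apps/wiki", "wiki:app")             // hypothetical node type
                .map("/apps/wiki/themes/*", "wiki:theme"); // hypothetical node type

        System.out.println(mapping.typeFor("/apps/wiki"));
        System.out.println(mapping.typeFor("/apps/wiki/themes/default"));
        System.out.println(mapping.typeFor("/content/unmapped"));
    }
}
```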
To sum up, the proposed notation is a tool that helps in understanding and communicating content-centric software systems. It is not intended to be used to generate code automatically, or to be generated automatically from code. Instead, it is a second description of your software system that lives beside the code (the primary description) and is suited to technical communication with humans.