Reviewing code quality of Apache Sling using Sonar

This is a cross-post of Freddy's analysis at the Sonar site. We use Sonar internally at Day to track and improve the quality of all our software. Also check out Nemo, Sonar's platform for analysing various other FOSS projects.


A few weeks ago Michael Marth, who runs dev.day.com (Day’s developer portal), asked us if we could put together our impressions on the code quality of Apache Sling using Sonar. We thought it would be valuable to share the result of this exercise with the community.

Apache Sling in a few words

“Apache Sling is an innovative web framework that is intended to bring back the fun to web development. It uses all those nice cool and new technologies that make up a state-of-the-art framework. This is Apache Sling in five bullets:

  • REST based web framework

  • Content-driven, using a JCR content repository

  • Powered by OSGi

  • Scripting inside, multiple languages

  • Apache Open Source project”

Some size indications of the project
  • 40 Maven modules

  • 70,707 lines of code

  • 731 Java classes

  • and 23,043 lines of Javadoc

The strengths in terms of quality
  • A project that you get and compile with no difficulty by running two commands:
    1. svn checkout https://svn.apache.org/repos/asf/sling/trunk/
    2. mvn clean install
    This sounds obvious, but it is not always the case :-)

  • Of 130,172 physical lines, only 0.9% are involved in duplication

  • 46.4% of the public API is documented with a Javadoc block

The weaknesses
  • Only 9% of the source code is covered by 338 unit tests

  • Average cyclomatic complexity per method (excluding getters and setters) is greater than 3 (3.2).
    This is a warning sign that says “your methods are taking on too many responsibilities and should be refactored”. The warning is confirmed by other metrics: 394 methods have a complexity greater than 7, and 86 methods have more than 50 statements. What is true at method level is also partially confirmed at class level, as 60 classes have a Fan Out Complexity (the number of other classes referenced by a class) greater than 20.
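To make the complexity figures more tangible, here is a small Java sketch. The class and method names are invented for illustration and do not come from Sling. It assumes the common counting convention of one point for the method itself plus one for each branch, loop, or boolean operator; tools differ in the details, so treat the numbers in the comments as approximate.

```java
import java.util.List;

class ComplexityExample {

    // Everything in one method: 1 (method) + for + if + && + if = 5.
    static int countLongWords(List<String> words, int minLength) {
        int count = 0;
        for (String word : words) {                       // +1
            if (word != null && !word.isEmpty()) {        // +1 (if), +1 (&&)
                if (word.length() >= minLength) {         // +1
                    count++;
                }
            }
        }
        return count;
    }

    // Extracted predicate: 1 (method) + two && operators = 3.
    static boolean isLongWord(String word, int minLength) {
        return word != null && !word.isEmpty() && word.length() >= minLength;
    }

    // Refactored loop: 1 (method) + for + if = 3.
    static int countLongWordsRefactored(List<String> words, int minLength) {
        int count = 0;
        for (String word : words) {                       // +1
            if (isLongWord(word, minLength)) {            // +1
                count++;
            }
        }
        return count;
    }
}
```

Both variants return the same result; the refactoring only spreads the decision points over two smaller methods, which is what lowers the per-method average.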

Bad programming practices that should be improved
  • 198 times, method parameters are reassigned in the body of the method

  • 68 times, local variables are defined and hide class fields

  • 28 times, a NullPointerException is thrown when an IllegalArgumentException would be more suitable
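The three practices listed above can be illustrated with a short Java sketch. All names here are hypothetical and not taken from the Sling code base; the "preferred" variants simply follow the rules as the analysis states them.

```java
class PracticeExample {
    private int timeout = 30;

    // Discouraged: reassigns its parameter and throws NullPointerException for a bad argument.
    static String normalizeDiscouraged(String path) {
        if (path == null) {
            throw new NullPointerException("path");
        }
        path = path.trim();            // parameter reassigned
        return path;
    }

    // Preferred: parameter stays untouched, bad input is reported as an illegal argument.
    static String normalize(final String path) {
        if (path == null) {
            throw new IllegalArgumentException("path must not be null");
        }
        final String trimmed = path.trim();
        return trimmed;
    }

    // Discouraged: the local variable hides the field, so the field is silently ignored.
    int effectiveTimeoutDiscouraged(int override) {
        int timeout = override > 0 ? override : 10;
        return timeout;
    }

    // Preferred: a distinct local name makes the use of the field explicit.
    int effectiveTimeout(int override) {
        int chosen = override > 0 ? override : this.timeout;
        return chosen;
    }
}
```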

Potential bugs that should be quickly analyzed
  • Correctness - An apparent infinite recursive loop: there is an apparent infinite recursive loop in org.apache.sling.scripting.jsp.jasper.runtime.JspContextWrapper.include(String, boolean)

  • Multithreaded correctness - Unsynchronized get method, synchronized set method: org.apache.sling.scripting.jsp.jasper.compiler.JspRuntimeContext.getJspReloadCount() is unsynchronized, org.apache.sling.scripting.jsp.jasper.compiler.JspRuntimeContext.setJspReloadCount(int) is synchronized

  • Multithreaded correctness - Method calls Thread.sleep() with a lock held: org.apache.sling.event.impl.JobEventHandler.runJobQueue(String, JobBlockingQueue) calls Thread.sleep() with a lock held

  • Malicious code vulnerability - Field is a mutable array: org.apache.sling.jcr.webdav.impl.servlets.SlingWebDavServlet.COLLECTION_TYPES_DEFAULT is a mutable array
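Two of these findings follow patterns that are easy to show in isolation. The Java sketch below is not the Sling code: the class names and the array contents are placeholders chosen for illustration, and it only shows one possible way to address each warning.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Getter and setter are both synchronized, so a reader always sees the latest write.
class ReloadStats {
    private int reloadCount;

    synchronized void setReloadCount(int count) {
        this.reloadCount = count;
    }

    synchronized int getReloadCount() {
        return reloadCount;
    }
}

// A "public static final String[]" can still have its elements overwritten by any caller.
// Keeping the array private and handing out copies or a read-only view avoids that.
class CollectionTypes {
    // Placeholder values, not the real defaults.
    private static final String[] DEFAULTS = {"example:typeA", "example:typeB"};

    static final List<String> DEFAULT_LIST =
            Collections.unmodifiableList(Arrays.asList(DEFAULTS.clone()));

    static String[] defaults() {
        return DEFAULTS.clone();   // defensive copy
    }
}
```

Callers that modify the array returned by defaults() only change their own copy, and attempts to modify DEFAULT_LIST fail with an UnsupportedOperationException.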

This analysis was done with the intention of giving a summary overview of the current state of the project. Where should you start if you wake up tomorrow with a single idea in mind: “Improve the quality of the Apache Sling project!”?

  • With cyclomatic complexities of 428, 385 and 343 respectively, the classes Generator, Parser and XMLEncodingDetector should be refactored first. Unsurprisingly, the Generator.java file has the greatest number of duplicated lines (154) and rule violations (109)

  • With a cyclomatic complexity of 43 and no unit tests, the method ModifyAceServlet.handleOperation(..) is what we call “a crappy method” :-)

More information on the code quality of the project is available on Nemo.

[LOTD] Content Structure in a CMS

Via Seth Gottlieb I have found this really good presentation on content modelling for content management systems by Deane Barker:

Cleve Gibbon's recent well-written series on content modelling is closely related to Deane's presentation. Make sure to have a look if you are interested in that area.

I really like Deane's presentation, at least when I look at it from the paradigm that proper content management needs a-priori content modelling. Lately, I have come to question this idea, but that shall be the topic of a different post.

Jazoon 09 slides

The slides shown in my Jazoon talk are now online:

Jazoon 09 Slides

In case you missed the Day talks at the Jazoon conference, please find the slides below:

Thomas Mueller: Testing Zen


Thomas Mueller: Java Persistence Frameworks


Michael Marth and Michael Dürig: Scalable Agile Web Development: REST meets JCR meets OSGI


Michael Dürig and Michael Marth: Building RESTful Web Applications with Scala for Sling


[ANN] Upcoming Cloud-Computing Events (July 7 and 9)

The introduction of the simplified clustering in CRX 1.4.1 (back in January 2009) kick-started the efforts to make CRX and CQ5 easily deployable into the Amazon cloud computing infrastructure.

A lot has happened since then. Most importantly, we got in touch with customers who have exactly the challenges we were looking to solve with cloud-based deployments. One of these is the ability to scale infrastructure for peak usage without the cost of running all servers all the time. Of course, to set a good example, we run our own sites, such as www.day.com, on EC2.

Next week, we would like to give you an update of what we did so far and share the plans for the future.

On July 7th (Tuesday) we have a half-day seminar in London. Sarah Burnett from Butler Group and I will discuss the advantages of cloud computing infrastructure and the use cases that make the best use of it. This is a great event to familiarize yourself with the cloud computing topic and learn how you can apply it to your content management initiatives. Join the free seminar in London. Sign up here.

On July 9th (Thursday) I'm going to broadcast my speech and discussions from London via a Webinar using WebEx. Feel free to sign up to get the details for joining the Webinar.

Personally, I believe the most exciting part of the cloud computing era is the new ways to solve challenges once you accept that almost unlimited computing resources are at your disposal (at a fairly decent price).

Scala for Sling @ Jazoon 09


Yesterday I gave a presentation at Jazoon 09 about using Scala for scripting RESTful web applications with Apache Sling.

In the session I showed how to take advantage of Scala to create RESTful web applications with Apache Sling. I demonstrated how to use its DSL capability and support for XML literals to create type-safe web site templates. In contrast to conventional web site template mechanisms (e.g. JSP), this does not rely on a pre-processor but rather uses pure Scala code.

Session slides and support material are available here. The support material contains a fully working demo application. A Scala scripting bundle for Sling is also included.

Posted in Uncategorized Tagged: JCR, Scala, Sling

Puzzle: implement this (solution)


Well, I wasn’t aware of Ticket #1737 when I was trying to find a solution to the problem from my previous post. Thanks to Jorge Ortiz for pointing this out. However, I reviewed my approach to solving this and didn’t find severe limitations. Maybe someone else does…

When I initially stumbled on this, I remembered that existential types were introduced into Scala for coping with Java’s raw types. But there is an additional twist here: we need to tell the compiler that our MyIterator implementation actually ‘is an instance of a raw type’. So combining existential types with self types led me to the following solution:

class MyIterator extends Iterator2 {
  this: java.util.Iterator[_] =>
  def hasNext = true
  def remove = throw new Error
  def next = "infinity"
}

We can now safely use instances of MyIterator.

  def test1(it: MyIterator) = {
    println(it.next)
  }

  def test2(it: java.util.Iterator[_]) = {
    println(it.next)
  }

  val it = new MyIterator
  val v: String = it.next
  println(v)

  test1(it)
  test2(it)

The approach using existential types in combination with self types makes sure that values returned from the next method always are typed correctly.

Posted in Uncategorized Tagged: Puzzle, Scala

Jazoon talk on "Scalable Agile Web Development"



On Thursday, I will give a talk at the Jazoon conference in Zurich. It will be about Apache Sling, the web framework for content-centric applications. The agenda is:

Scalable Agile Web Development: REST meets JCR meets OSGI

This session is a very hands-on lab that shows how a real web application is developed from scratch in a very agile fashion, leveraging a heavyweight, enterprise-ready back end yet allowing for unprecedented agility in building REST-style web applications. Thinking of a classic J2EE stack, this may sound like a contradiction.

Agility of development begins with the amount of tooling and setup we need to get started, so expect to see the entire walk-through from installation of the server software to the development of a complete application within the time constraints of the session.

Agenda:
(1) Web architecture, think outside the box.
(2) Meet: Apache Sling.
(3) Building a real-life webapp from scratch.

The full conference agenda is here. I shall also help Michael Dürig with his session on Scala and Sling.

[ANN] Day engineers at the OSGi DevCon (updated)

Starting today the conference OSGi DevCon Europe 2009 takes place in Zurich (in association with the Jazoon conference). Two talks will be given by Day's OSGi experts:
Felix Meschberger: Declarative Services: Dependency Injection OSGi style

2009-06-22, 11:20

Applications in general and OSGi applications in particular consist of a host of different modules and services which need to be bound together to form the actual application. In a traditional application, services are generally bound by calling factories, instantiating the service classes, or accessing a registry of services. In recent years a new buzzword entered the arena: Dependency Injection. With dependency injection, services are provided to the service clients as they become available. Likewise, configuration is injected into the services; that is, services do not have to manage their configuration themselves. The OSGi specification for dependency injection is the Declarative Services specification: the components are declared and indicate what services they use and require, and may in addition be provided with configuration. This talk shows the benefit of using Declarative Services and how the Apache Felix Maven SCR Plugin simplifies the service declaration even more.

Update: find Felix' slides below



Bertrand Delacretaz: Tales from the OSGi trenches

2009-06-22, 14:60

In this talk we share our experience of using OSGi for a major rewrite of Day's family of content management products. After more than two years working with OSGi, the impact on our products, developers, customers and service people is very high, in a positive way. OSGi is no silver bullet either. The extreme modularization and dynamic service deployment features of OSGi make our products much more robust and maintainable, but the costs associated with changing people's way of thinking about code and modules, and with testing and debugging highly dynamic systems, must not be underestimated. Based on real-life code samples, we will show how OSGi is used at several levels in our products, from low-level interactions with the framework to very simple creation of (compiled or scripted) services. We will also present some of the automated testing techniques used in our project. Sharing our experience will help you decide if OSGi is for you, and more importantly at which level you should use it.

Puzzle: implement this


This is something I stumbled on recently when trying to implement javax.jcr.NodeIterator in Scala.

Assume you are using a library which exports an Iterator2 interface:

public interface Iterator2 extends java.util.Iterator {}

Note that Iterator is a raw type and Iterator2 does not take any type parameters. So how would you implement Iterator2 in Scala?

Here is a start:

class MyIterator extends Iterator2 {
  def hasNext = false
  def remove = throw new Error
  def next: Nothing = throw new Error
}

But if the next method should return an actual value, what would its return type be? It turns out that any type other than Nothing results in a compiler error:

error overriding method next in trait Iterator of type ()E;
method next has incompatible type ()Any

So how would you implement Iterator2?

Posted in Uncategorized Tagged: JCR, Puzzle, Scala

Sling graduates from Incubator

Excellent news: Apache Sling has just graduated from the Apache incubator and is now a top-level project. ASF Board member and Sling committer Bertrand Delacretaz announced the news on the Sling mailing list:

I'm pleased to announce that our Board of Directors, at yesterday's meeting, approved the graduation of Sling as a top-level project. I abstained from that vote (as working on Sling is part of my job), so it's not my fault ;-)

Felix Meschberger is the chair of the new PMC, composed of

* Alexandru Popescu <apopescu>
* Bertrand Delacretaz <bdelacretaz>
* Christophe Lombart <clombart>
* Carsten Ziegeler <cziegeler>
* Felix Meschberger <fmeschbe>
* Gianugo Rabellino <gianugo>
* Padraic Hannon <hannonpi>
* Juan José Vázquez Delgado <juanjo>
* Karl Pauls <pauls>
* Vidar Ramdal <vramdal>

Congratulations! The graduation is well deserved.

[ANN] Meet us at Jazoon 09

This year's Jazoon conference takes place next week in Zurich, Switzerland. Some of the presentations will be given by speakers from Day. Hope to see you there:

Thomas Mueller: Testing Zen

Technical short talk, Tuesday, 2009-06-23, 16:30

Test-driven development not only improves code quality, it also lets you refactor or replace legacy source code with little risk. Testing also saves time because the earlier a problem is found, the faster and easier it can be corrected.

Unfortunately writing tests is boring. There are many possible use cases that should be tested. Writing tests for every feature is a lot of work, and manually writing tests for all possible combinations is almost impossible.

However, with fuzz testing (randomized testing), you don't need to write much and still get very good results. This talk explains what a typical randomized test looks like, and how to combine it with other techniques such as unit tests, integration tests, and measuring code coverage.
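As an illustration of the idea (not taken from the talk itself), a randomized test could look roughly like the Java sketch below. It assumes a hand-written insertion sort as a stand-in for the code under test and uses Arrays.sort as the trusted reference; a fixed seed keeps any failure reproducible.

```java
import java.util.Arrays;
import java.util.Random;

class RandomizedSortTest {

    // Code under test: a hand-written insertion sort (stand-in for real application code).
    static int[] insertionSort(int[] input) {
        int[] a = input.clone();
        for (int i = 1; i < a.length; i++) {
            int key = a[i];
            int j = i - 1;
            while (j >= 0 && a[j] > key) {
                a[j + 1] = a[j];
                j--;
            }
            a[j + 1] = key;
        }
        return a;
    }

    // Generates random inputs and compares the result against the reference.
    // Returns the number of failing rounds; the seed makes every run repeatable.
    static int run(long seed, int rounds) {
        Random random = new Random(seed);
        int failures = 0;
        for (int round = 0; round < rounds; round++) {
            int[] data = new int[random.nextInt(20)];
            for (int i = 0; i < data.length; i++) {
                data[i] = random.nextInt(100) - 50;
            }
            int[] expected = data.clone();
            Arrays.sort(expected);
            if (!Arrays.equals(expected, insertionSort(data))) {
                failures++;
                System.out.println("seed " + seed + ", round " + round
                        + " failed for input " + Arrays.toString(data));
            }
        }
        return failures;
    }
}
```

A call such as RandomizedSortTest.run(42L, 1000) exercises a thousand generated cases with very little test code, which is the trade-off the abstract describes.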

Automated unit tests should run quickly. Many applications use a database, and in many cases this database is the bottleneck when running tests. This talk shows how an embedded in-memory Java database speeds up your tests, and can speed up development as well.

Thomas Mueller: Java Persistence Frameworks

Technical short talk, Wednesday, 2009-06-24, 16:00

In Java, there are many ways to access relational databases. Today, there are almost as many persistence frameworks as there are web frameworks.

Some frameworks are based on a standardized interface like JDO and JPA, some have their own API. Some just simplify using JDBC and SQL. Most frameworks need XML configuration, but not all of them. Some provide a string based query language similar to SQL, while some newer ones don't.

Today's main technologies are:

- SQL using the JDBC API
- JPA (especially Hibernate, JPOX, OpenJPA)
- Apache iBATIS
- JDO (JPOX)
- Apache Cayenne
- Apache Commons DBUtils

We list the market share of each technology, and discuss the key differences.

Last year a whole new breed of persistence frameworks appeared: Frameworks with a fluent API and an integrated DSL (domain specific language); frameworks that don't need strings for dynamic queries. This new technology was strongly influenced by Microsoft's LINQ. Those frameworks support compile-time type-checking, auto-complete in the IDE, and protect against code injection. They better bridge the "object-relational impedance mismatch" than older frameworks, but they are not ready for prime time yet.

SQL injection is the biggest server-side security vulnerability today. SQL injection is a subset of code injection. Unfortunately, most persistence frameworks don't protect against code injection. The talk gives examples of how to inject code and how to protect against it.
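The following Java sketch is not from the talk; it merely illustrates the kind of injection the abstract refers to, using plain JDBC. The table and column names (users, id, name) are made up for the example. The first method shows how concatenating user input changes the meaning of the statement; the second binds the value as a parameter so the SQL text stays fixed.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

class InjectionExample {

    // Vulnerable: the user input becomes part of the SQL text.
    static String unsafeQuery(String userName) {
        return "SELECT id FROM users WHERE name = '" + userName + "'";
    }

    // Safer: the SQL text is constant and the value is bound as a parameter.
    static boolean userExists(Connection connection, String userName) throws SQLException {
        String sql = "SELECT id FROM users WHERE name = ?";
        try (PreparedStatement statement = connection.prepareStatement(sql)) {
            statement.setString(1, userName);
            try (ResultSet result = statement.executeQuery()) {
                return result.next();
            }
        }
    }
}
```

With the input x' OR '1'='1, unsafeQuery yields a WHERE clause that matches every row, whereas the parameterized version treats the whole input as a single name value.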

Michael Marth: Scalable Agile Web Development: REST meets JCR meets OSGI

Technical long talk, Thursday, 2009-06-25, 13:30

This session is a very hands-on lab that shows how a real web application is developed from scratch in a very agile fashion, leveraging a heavyweight, enterprise-ready back end yet allowing for unprecedented agility in building REST-style web applications. Thinking of a classic J2EE stack, this may sound like a contradiction.

Agility of development begins with the amount of tooling and setup we need to get started, so expect to see the entire walk-through from installation of the server software to the development of a complete application within the time constraints of the session.

Agenda:
(1) Web architecture, think outside the box.
(2) Meet: Apache Sling.
(3) Building a real-life webapp from scratch.

Michael Dürig and Michael Marth: Building RESTful Web Applications with Scala for Sling

Jazoon Cutting-Edge, Thursday, 2009-06-25, 16:00

In this session we demonstrate how to build RESTful web applications for the Sling framework using the Scala programming language.

Apache Sling is a web application framework which eases development of content centric applications. Sling is based on REST principles and uses a JCR content repository (JSR-170/JSR-283) for storage. Based on the JSR 223 specification (Scripting for the Java Platform) it integrates various scripting languages as OSGi bundles.

Scala is a scalable programming language for the JVM which is fully interoperable with Java. It is designed to express common programming patterns in a concise, elegant, and type safe way. Scala smoothly bridges the gap between object oriented and functional paradigms. Despite being strongly typed, Scala has the touch and feel of a genuine scripting language. It has the ability to infer types of expressions rather than relying on the programmer to explicitly declare them. Scala thus combines the best of the two worlds: flexible scripting and strong tool support e.g. documentation, safe refactoring and fail fast compilation. Its flexible syntax lets programmers easily define their own internal DSLs, effectively extending the language without leaving it.

In our session, we show how to take advantage of Scala to create RESTful web applications with Sling. We use its DSL capability and support for XML literals to create type safe web site templates. In contrast to conventional web site template mechanisms (e.g. JSP), we do not rely on a preprocessor but rather use pure Scala code.

Session outline:
- Short introduction to Apache Sling. Just enough to get everyone started.
- Short overview of Scala and its relevant features.
- Demonstration of how Scala can be used with Sling to create RESTful web applications.
- Q&A

[LOTD] Spring JCR Extension 0.9

Salvatore Incandela, committer of the Spring extension for JCR, left a comment on a previous post on dev.day.com where he announces the release of version 0.9. Since many of Day's customers and partners use Spring (and JCR), I am happy to know that the extension is actively maintained.

(thanks for the pointer, Salvatore)

The single mailing list dream


The ASF uses a (way too) large number of mailing lists for all its internal and project communications.

Having crosscutting discussions is quite hard – for example, many projects use OSGi these days, and the only way for them to share their OSGi experience would be to create yet another list, or to subscribe to all of each other’s lists, which means a lot more traffic to manage.

One of my current technical dreams is to have a single list for all of the ASF, using tags to define the audience and visibility of messages – a la Twitter hashtags.

A message about the maven-scr-plugin on the Sling list, for example, would be tagged

#sling #osgi #maven-scr-plugin #scr #public

so that people subscribing to the #osgi and #scr tags, for example, would see it.

Another obvious use case is to easily ignore all discussions about a given topic (like #budget maybe? ;-), in a reliable way and without losing other communications within the same group.

I’m not sure how to implement this today (particularly the access control part for things like the #asf-private tag), but that would in my opinion be a huge improvement on what we have now.
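The matching part of this idea, at least, is simple to sketch. The Java example below is purely hypothetical: it models a subscriber with a set of followed tags and a set of muted tags, and decides whether a message's tag line should be delivered. Access control for private tags is deliberately left out.

```java
import java.util.HashSet;
import java.util.Set;

class TagSubscription {
    private final Set<String> followed;
    private final Set<String> muted;

    // Both sets are expected to contain lower-case tags including the leading '#'.
    TagSubscription(Set<String> followed, Set<String> muted) {
        this.followed = new HashSet<>(followed);
        this.muted = new HashSet<>(muted);
    }

    // Splits a line such as "#sling #osgi #public" into a set of lower-case tags.
    static Set<String> parseTags(String tagLine) {
        Set<String> tags = new HashSet<>();
        for (String token : tagLine.trim().split("\\s+")) {
            if (token.startsWith("#") && token.length() > 1) {
                tags.add(token.toLowerCase());
            }
        }
        return tags;
    }

    // A muted tag always wins; otherwise one followed tag is enough for delivery.
    boolean wants(String tagLine) {
        Set<String> tags = parseTags(tagLine);
        for (String tag : tags) {
            if (muted.contains(tag)) {
                return false;
            }
        }
        for (String tag : tags) {
            if (followed.contains(tag)) {
                return true;
            }
        }
        return false;
    }
}
```

A subscriber following #osgi and #scr while muting #budget would receive the maven-scr-plugin message from the example above, but not a message that also carries #budget.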

Note on nodes and files

Reading "Content Integration Standards -- CMIS, JSR-170, JSR-283" I stumbled over the quote

The conclusion was that JSR-170 is not the best fit for WCM with its nodes and properties equaling to folders and files. In Devnani’s opinion, it fits document management better.

Apparently, there is some confusion around what nodes and properties are in a Java Content Repository. Let's shed some light on this:

First of all, there are two specific node types defined in JSR-170: files and folders. These node types behave as you would expect from a file system. For example, the node type "file" has specific properties for the creation date and the binary stream.

If your content is nothing but files and folders (for example, because your application is document management), you would use these node types. As such, in Java Content Repositories, files, folders and document-management-type applications are special cases of a more generic case.

This more generic case consists of node types that have different properties from the ones described above. In the most flexible case, the "unstructured" node type, the node can accommodate properties of any type (string, date, binary, boolean, etc.). One node could, for example, have two binary stream properties (thumbnail and full-length). This covers exactly the web content management use case, where content is stored in a fine-granular model. The fine-granular model enables the presentation layer to do things like displaying an article in teaser format.

In summary the situation is like this:

  1. Files and folders are special node types in JCR.

  2. On a modelling level DM is a special case of WCM (I recognize that on an application requirements level DM and WCM diverge).

As a consequence I would rewrite the quote from above as:

JSR-170 is a very good fit for WCM because the nodes and properties store content in a fine-granular way. Since folders and files are just special node types it can be equally well used for document management.

The Jackrabbit Wiki has a list of JCR applications that are either in the content management or the document management domain. As a side-note: JCR is flexible enough to provide infrastructure for completely different application domains as well, here is a list of some of them.

Slick JCR Explorer based on jQuery and Sling

Renaud Richardet (who won a prize in last year's JCR cup) has posted a very slick repository explorer to the Sling Jira. It is based on jQuery and has the usual explorer features like CRUD for nodes and properties. However, what is really neat: the actual code is about 100 lines of JavaScript (including blank lines).

If you want to try it out, copy the files into /apps/sling and make sure that the MIME types of the HTML files are correct (e.g. by fixing the property using CRX Explorer :). Then point your browser to http://localhost:7402/apps/sling/servlet/default/explorer.esp (the port depends on your installation, of course).

Commits per weekday and hour


The punchcard graphs at GitHub are a nice way to quickly detect the rough geographical distribution (or nighttime coding habits) of the key contributors of an open source project. Here are a few selected examples from the ASF.

Apache HTTP Server

Apache Maven (core)

Apache Jackrabbit

IKS Update: Requirements Workshop

Last week the first requirements workshop of the IKS project (Interactive Knowledge Stack) took place. Bertrand has described the project's setup and goals in a previous post, but in a nutshell:

The goal of this integrating project, partly funded by the European Commission, is to create an open source technology platform for semantically enhanced content management systems.

The purpose of this workshop was to identify CMS use cases out of this high-level goal. The project consortium consists of academic institutions and commercial CMS vendors that contribute to open source CMSs like Day or the companies contributing to Midgard or OpenCMS. However, for this task (and the public discussion to follow in the future) the project's consortium members were joined by numerous representatives of various open source content management systems. For example, there were John Norman of Sakai, Jahia's Stephane Croissier, Justin Cormack of squiz, one of the original Joomla founders Johan Janssens, Plone's Raphael Ritz, and Arne Blankerts of fCMS to name only a few (here's the complete list). It is rare to see that much CMS competence in one room.

The Salzburg Research team did a fantastic job at moderating the various thoughts and ideas about use cases and requirements. As a consequence there are now a number of projects already under way:

  • A semantic search engine proposed by Bertrand. Amongst other things this search engine will be helpful to benchmark the data generated by our CMSs

  • A common ontology for CMSs

  • Henri Bergius has suggested implementing a semantically enhanced rich text editor (and will lead the project). Think "insert person" instead of "insert link".

I was also impressed by the demos shown by SRDC. A semantically enhanced search engine was demonstrated that finds documents containing the text "Angela Merkel" when queried for "German Chancellor". DFKI showed a newsroom-type application where search facets were generated out of the news items' extracted information.

If you are interested in following this project, sign up to the IKS community mailing list. The discussions on use cases and requirements are still in full swing, so your voice will be heard.

More impressions about the workshop have been blogged by Bertrand and Stephane Croissier, Henri Bergius has Quaiku'ed the workshop, the Twitter hashtag is #iks-project, and Salzburg Research has written about it here.

Back from a great IKS project meeting


I’m on my way back from Salzburg where the Salzburg Research team organized a great meeting for the IKS project. Flawless organization as usual, thanks and congrats!

Today’s requirements workshop featured an impressive collection of very powerful brains (and nice people to hold them ;-) including, besides the usual IKS suspects, representatives from more than twenty CMS communities and companies.

I was a bit worried at first that IKS, being mostly in a requirements definition phase, didn’t have much to show to those people, but today’s brainstorming went very well, and the results exceed my expectations.

The most important result for me is agreeing to set up a prototype semantically enhanced search engine that will use metadata and RDFa embedded in web pages to index content. This will provide the IKS community with a testbed for semantically enhanced websites, and allow us to demonstrate the usefulness of embedded semantic information by making full use of it for searching instead of just enhancing the display of search results. The extracted data might also be very useful for our academic partners to run experiments on real-life data that we’re familiar with. We might not have to write lots of code to set up such a search engine, but it’s important to have our own thing that people can also run behind firewalls, if needed, to run experiments on private data.

The second result that I’m excited about is agreeing to work together on a prototype of a semantic rich text content editor, where you’ll get functions like insert person or insert company besides the usual insert link and insert image functions. This will allow us to start making our customers more aware of the importance of semantic markup, in a way that’s not too different from what they’re doing now.

Last but not least in my list of results-that-got-me-excited-about-all-this is agreeing on the creation of a list of simple user stories that demonstrate what IKS is about, in a very simple and understandable way, while allowing us to define use cases and features that might be challenging to implement today.

More complete information about the meeting should be available from the IKS project blog in the next few days, make sure to subscribe to that. For now Bergie (who suggested the semantic editor project) has been taking notes on Quaiku if you’re eager to learn more.

To take part in (or just follow) these projects, subscribe to the IKS mailing list, which is going to be our communications hub.

Hope to see you there – in a week from now, as next week is my cycling-in-France/offline holiday. Looking forward to getting more familiar with the 29er before the next, more off-road trip in a few weeks.

JCR - CMIS comparison

On the CQ5 tour stop in Milan I had the pleasure of talking about CMIS and JCR for the first time since I started serving as the official JCR / CMIS Liaison.

This brought the opportunity to compare and differentiate the two efforts with a unique legitimacy. There seems to be a desire to discuss the relationship between CMIS and JCR, similar to the desire to discuss JCR and WebDAV or, more recently, JCR and Atom. So far, JCR and CMIS have been compared in what I would call "a less educated fashion", for example in the CMIS v0.5 spec draft. Most of those comparisons have been corrected in the meantime.

Feel free to view the entire slideshow here, but here are the main points:

API vs. Protocol

JCR specifies an API (Application Programming Interface) while CMIS specifies two protocol bindings. Much like the Servlet API in Java and the HTTP protocol are complementary, so are JCR and CMIS. Similar arguments have been made for Atom and WebDAV.

Focused Model vs. Generally Applicable Model

JCR specifies a very general model based on Nodes and Properties that lends itself to the implementation of specific domain models. On the other hand, CMIS specifies such a specific domain model for document management.

The CMIS domain model can easily be implemented in a JCR model (see next point). Some people could say that if one were to implement CMIS from scratch a JCR repository would be the ideal starting point as it provides the perfect infrastructure to do that.

I think it is one of the most important assets of CMIS that it exposes the domain model of document management. It will be helpful in further standard discussions to have an established consensus on the domain model amongst the document management vendors.

Every JCR repository is a CMIS repository

Based on this type of compatibility between CMIS and JCR, Apache Chemistry (currently in incubation) implements CMIS on top of JCR (amongst other things). This turns every JCR-compliant repository into a CMIS-compliant repository without any development effort. Even better, Chemistry gets bootstrapped with numerous existing JCR repositories and connectors. These repositories can thus expose CMIS right from the start.

Interop vs. Infrastructure

Something that I learned during the years of specifying JCR: There is always a tension in a specification to address both "interop" and "infrastructure" needs. Interop enables different repositories to be compatible on some level, whereas infrastructure provides users with a platform to build upon.

There were tendencies in the JCR expert group to support users of the API in a way that they could build real-life applications. Hence, the infrastructure aspect was always a very important aspect. Because of that, the least common denominator aspects that come with "interop" were only a part of the equation.

For CMIS, offering general purpose infrastructure is a stated non-goal. CMIS is only concerned with "interop".

Consequently, JCR is very successful in terms of the number of users and applications built on top of it, while I believe that it is the goal of CMIS to be very successful in terms of the number of implementations.

In terms of perception I think CMIS reduces the expectation on JCR to be a least common denominator interop spec. I welcome that because both specification efforts will be able to evolve in a more agile fashion when they focus on "interop" and "infrastructure" needs respectively.

In summary, I am very excited to have a specification both on the protocol and on the API level addressing the needs of a more open standards based content management landscape.

[ANN] Public CMIS server at cmis.day.com

You might be aware that we are running a public instance of CRX (Day's JSR-170 compliant repository) at http://jcr.day.com (the login is admin/admin). As of today there is also the CMIS interface (Atom binding) to that repository publicly available at http://cmis.day.com. The underlying code is the version from the CMIS plugfest, but expect frequent updates. Feedback is appreciated (either as a comment on this post or to mmarth (at) day (dot) com).

Would you trust a pirate?


Apparently they’re now setting up a Pirate Party also in Finland. I guess it’s good to have a political force that questions the appropriateness of traditional copyright in the digital world. However, as a knowledge worker I’m not that excited about drastic changes in the protection of immaterial rights.

Anyway, my appreciation for the movement in Finland went down considerably when I saw their spokesman in the news today. When asked about the main goals of the new party he only mentioned freedom of speech and protection of privacy. Did he just forget the massive overhaul of copyright and patent laws that they’re primarily after?

Wrangling mime.types

One of the chores that I do for the Apache HTTP server project, every three months or so, is to slog through the IANA media type registry to see what new media types have been registered and add them to the mime.types configuration file. This is one of the few things I do that is almost all pain for little or no gain. It takes hours to do it right because IANA has gone out of their way to make the registry impossible to process automatically via simple scripts. I don’t even get the pleasure of “changing the world” in some meaningful way, since Apache doesn’t update mime.types automatically when installed to an existing configuration.

BTW, if you are responsible for an existing Apache installation, please copy the current mime.types configuration file and install it manually: your users will thank you later (or at least not gripe as much about unsupported media types).

IANA is a quaint off-shoot of the Internet Engineering Task Force that, much like the IETF, is still stuck in the 1980s. One would think that a task like “maintain a registry of all media types” so that Internet software can communicate would lead to something that is comprehensible by software. Instead, what IANA has provided is a collection of FTP directories containing a subset of private registry templates, each in the original (random) submitted format, and nine separate inconsistently-formatted index.html files that actually contain the registered types.

The first thought that any Web developer has when they look at the registry is that it should be laid out as a resource space by type. That is, each directory under “media-types” would be a major type (e.g., application, text, etc.) and then each file within those directories would correspond to exactly one subtype (e.g., html, plain, csv, etc.). Such a design would be easy to process automatically and fits with the organization’s desire to serve everything via both FTP and HTTP. Sadly, that is not the case. Most of the private registrations have some sort of like-named file within the expected directory to contain its registration template, but the names do not always correspond exactly to the subtype and the contents are whatever random text was submitted (rather than some consistent format that could be extracted). What’s worse, however, is that the standardized types do not have any corresponding file; instead, the type’s entry in the index may have some sort of link to the RFC or external specification that defines that type.

grumble

The second thought of any Web developer would be “oh, I’ll just have to process the index files to extract the media type fields.” Good luck. The HTML is not well-formed (even by HTML standards). It uses arbitrarily-created tables to contain the actual registry information. There is no consistency across the files in terms of the number of table columns, nor any column headers to indicate what they mean. There is no mark-up to distinguish the registry cells from other whitespace-arranging layout cells. And the registered types are occasionally wrapped in inconsistently-targeted anchors for links to the aforementioned template files.

grumble GRUMBLE

Okay, so the really stubborn Web developers think that maybe a browser can grok this tag soup and generate the table in some reasonably consistent fashion, which can then be screen-scraped to get the relevant information. Nope. It doesn’t even render the same on different browsers. In any case, the index files don’t contain the relevant information: the most important information (aside from the type name) is the unique filename extension(s) that are supposed to be used for files of that type. For that information, we have to follow the link to the registry template file, or RFC containing one or more template files, and look for the optional form field for indicating extensions. Most of the time, the field is empty or just plain wrong (i.e., almost all XML-based formats suggest that the filename extension is .xml, in spite of the fact that the only reason to supply an extension is so that all files of that extension can be mapped to that specific type).

sigh

And, perhaps the most annoying thing of all: the index files are obviously being generated from some other data source that is not part of the public registry.

Normally, what I am left with is a semi-manual procedure. I keep a mirror of the registry files on my laptop and, each time I need to do an update, I pull down a new mirror and run a diff between the old and new index files. I then manually look up the registry template for file extensions or, if that fails, do a web search for what the deployed software already does. I then do a larger Web search for documentation that various companies have published about their unregistered file types, since I’ve given up on the idea that companies like Adobe, Microsoft, and Sun will ever register their own types before deploying some half-baked experimental names that we are stuck with forever due to backwards-compatibility concerns.

Unfortunately, yesterday I messed up that normal procedure. I forgot that I had started to do the update a month ago by pulling down a new mirror, but hadn’t made the changes yet. So I blew away my last-update-point before doing the diff.

groan

After reliving all of the above steps, I ended up with a new semi-manual procedure:

# mirror the registry, then dump each per-type index.html as text (csh syntax)
wget -m ftp://ftp.iana.org/assignments/media-types/
cd ftp.iana.org/assignments/media-types
foreach typ (`find * -type d -print`)
   links -dump $typ/index.html | \
      perl -p -e "s|^\s+|$typ/|;" >> mtypes.txt
end
# manually edit mtypes.txt to remove the garbage lines
# print each registered type (first field) that mime.types does not mention yet
foreach typ (`cut -d ' ' -f 1 mtypes.txt`)
   grep -q -i -F "$typ" mime.types || echo $typ
end

That gave me a list of new registered types that were not already present in mime.types. I still had to go through the list manually, add each type to its location within mime.types, and search for its corresponding file extension within the registry templates. As usual, most of the types either had no file extension (typical for types that are only expected to be used within message envelopes) or non-unique extensions that can’t be added to the configuration file because they would override some other (more common) type.
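For readers who would rather not use csh, the comparison step above can be sketched in Python. This is only an illustration under assumptions: the function names and the sample data are made up, the registry dump is assumed to have already been cleaned by hand, and the match is exact on the type name rather than the substring match that grep -F performs.

```python
def load_known_types(mime_types_text):
    """Collect the media types named in a mime.types file."""
    known = set()
    for line in mime_types_text.splitlines():
        # Some versions of the file list types without extensions as
        # commented-out lines, so strip a leading "#" before looking.
        fields = line.lstrip("# \t").split()
        if fields and "/" in fields[0]:
            known.add(fields[0].lower())
    return known


def new_types(registry_text, mime_types_text):
    """Return registered types (first field per line) missing from mime.types."""
    known = load_known_types(mime_types_text)
    missing = []
    for line in registry_text.splitlines():
        fields = line.split()
        if fields and fields[0].lower() not in known:
            missing.append(fields[0])
    return missing


# Made-up sample data; "application/example-new" is a hypothetical type.
registry = "application/atom+xml [ref]\napplication/example-new [ref]\ntext/csv [ref]\n"
mime_types = "# sample file\napplication/atom+xml\t\tatom\n# text/csv\n"

print(new_types(registry, mime_types))  # ['application/example-new']
```

The manual part of the job, finding a usable file extension for each new type, is not something this sketch attempts.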

Please, IANA folks, fix your registries so that they can be read by automated processes. Do not tell me that I have to write an RFC to specify how you store the registry files. The existing mess was not determined by an RFC, so you are free to fix it without a new RFC. If you have software generating the current registry, then I will be more than happy to fix it for you if you provide me with the source code. At the very least, include a text/csv export of whatever database you are using to construct the awful index files within the current registry.

Why am I bothering with all this? Because media types are the only means we have for an HTTP sender to express the intent for processing a given message payload. While some people have claimed that recipients should sniff the data format for type information, the fact is that all data formats correspond to multiple media types. Sniffing a media type is therefore inherently impossible: at best, it can indicate when a data format does not match the indicated media type; at worst, it breaks correct configurations and creates security holes. In any case, sniffing cannot determine the sender’s intent.

The intent can only be expressed by sending the right Content-Type for a given resource. The resource owner needs to configure their resource correctly. Even though Apache provides at least five different ways to set the media type, most authors still rely on the installed file extension mappings for representations that are not dynamically-generated. Hence, most will rely on whatever mime.types file has been installed by their webmaster, even if it hasn’t been updated in ten years.
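To make the extension mapping concrete, here is a small Python sketch that reads mime.types-style lines ("type ext1 ext2 ...") into a lookup table and reports extensions claimed by more than one type. The function names and sample lines are hypothetical, and letting the later line win is an assumption of this sketch, not a statement about what Apache itself does.

```python
def extension_map(mime_types_text):
    """Build {extension: media type} plus a record of conflicting claims."""
    mapping = {}
    conflicts = {}
    for line in mime_types_text.splitlines():
        fields = line.split("#", 1)[0].split()  # drop comments
        if len(fields) < 2:
            continue  # blank line, comment, or a type without extensions
        media_type = fields[0]
        for ext in fields[1:]:
            ext = ext.lower()
            if ext in mapping and mapping[ext] != media_type:
                conflicts.setdefault(ext, [mapping[ext]]).append(media_type)
            mapping[ext] = media_type  # assumption: the later line wins
    return mapping, conflicts


def content_type_for(filename, mapping, default="application/octet-stream"):
    """Pick a Content-Type for a static file from its extension alone."""
    ext = filename.rsplit(".", 1)[-1].lower() if "." in filename else ""
    return mapping.get(ext, default)


# Made-up sample; "application/example+xml" is a hypothetical type.
sample = "text/html html htm\napplication/xml xml\napplication/example+xml xml\n"
mapping, conflicts = extension_map(sample)

print(content_type_for("index.HTML", mapping))
print(conflicts)
```

The conflict report shows why a generic extension such as .xml cannot be handed to a more specific type without overriding the more common one.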

How old is your mime.types file?

[ANN] New Release of Apache Sling 5

Cross-post of Carsten's announcement on the Sling list

Bringing Back the Fun - Reloaded

Apache Sling brings back the fun to Java developers and makes the life of a web developer much easier. It combines current state of the art technologies and methods like OSGi, REST, scripting, and JCR.

The main focus of Sling deals with the important task of bringing your content into the web and providing a platform to manage/update the content in a REST style.

Sling is built into OSGi bundles and therefore benefits from all advantages of OSGi. On the development side a scripting layer (using Apache BSF) allows you to use any scripting language with Sling (of course you can use plain old Java, too). And on top of this, Sling helps in developing an application in a RESTful way.

As the first web framework dedicated to JSR-170 Java Content Repositories, Sling makes it very simple to implement simple applications, while providing an enterprise-level framework for more complex applications. Underneath the covers Apache Jackrabbit is used for the repository implementation.

Download the new release, Apache Sling 5, today and give it a try!

Apache Sling currently comes in four flavors:

  • A standalone application (a jar containing everything to get started with Sling)

  • A web application (just drop this into your favorite web container)

  • The full source package (interested in reading the source?)

  • Maven Artifacts (available through the Apache Incubator Repository)

For more information, please visit the Apache Sling web site at http://incubator.apache.org/sling
or go directly to the download site at
http://incubator.apache.org/sling/site/downloads.cgi

The Apache Sling Community

Sling, POST and Extensions

While implementing a Trackback feature on Apache Sling I stumbled across a few details that I would like to share. Maybe it saves other Sling dabblers a minute or two.

Trackback is a specification developed by Six Apart for sending pings between websites when one site comments on or refers to another. Wikipedia defines it like this:

Trackbacks are used primarily to facilitate communication between blogs; if a blogger writes a new entry commenting on, or referring to, an entry found at another blog, and both blogging tools support the TrackBack protocol, then the commenting blogger can notify the other blog with a "TrackBack ping"; the receiving blog will typically display summaries of, and links to, all the commenting entries below the original entry. This allows for conversations spanning several blogs that readers can easily follow.

Trackbacks work via HTTP POSTs and require a specific response. For the latter reason Sling's DefaultPostServlet cannot be used; a custom servlet must be implemented instead (this has the implicit benefit that further custom logic like sping detection can easily be placed in the servlet).

A good starting point for looking up how to implement your own Sling servlet is the test servlets in the Launchpad. The Maven plugin and annotation make it easy to specify for which node types, selectors, or extensions the servlet shall be invoked, e.g.:

/** Example/test Sling Servlet registered with two selectors
 * 
 * @scr.component immediate="true" metatype="no"
 * @scr.service interface="javax.servlet.Servlet"
 * 
 * @scr.property name="service.description" value="Default Query Servlet"
 * @scr.property name="service.vendor" value="The Apache Software Foundation"
 * 
 * Register this servlet for the default resource type and two selectors:
 * @scr.property name="sling.servlet.resourceTypes"
 *               value="sling/servlet/default"
 *               
 * @scr.property name="sling.servlet.selectors"
 *               values.1 = "TEST_SEL_1"
 *               values.2 = "TEST_SEL_2"
 *                
 * @scr.property name="sling.servlet.extensions"
 *               value = "txt"
*/

So, that is the easy part. However, here are the above-mentioned gotchas you should be aware of:

  • For registering the servlet for the node type nt:unstructured the corresponding annotation's value must be nt/unstructured (well, this is obvious from the example above, but I did not see it right away).

  • When the method is POST, extensions are ignored. Selectors work though.

  • However, POSTs to mynode.myselector will not work. You need to POST to mynode.myselector.html (no matter what mime type you return). Therefore, you might want to make sure that you explicitly set the response type: e.g. response.setContentType("text/xml")

  • Last, if the Sling instance you happen to use is a CQ5 author system be aware that the default ACLs do not grant read access for anonymous users. Therefore, you need to add credentials to your POST in that case.
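From the client side, the gotchas above can be illustrated with a short Python sketch that builds (but does not send) a Trackback ping. The host, node path, selector name and the use of HTTP Basic authentication are assumptions for illustration only; the form fields and the `<response><error>0</error></response>` success reply follow the Trackback specification.

```python
import base64
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET


def build_trackback_ping(node_url, selector, entry_url, title,
                         user=None, password=None):
    """Build a POST request for a Trackback ping against a Sling node."""
    # POST to node.selector.html: the extension is needed for the servlet
    # to be reached, even though the reply will be XML.
    target = f"{node_url}.{selector}.html"
    body = urllib.parse.urlencode({"url": entry_url, "title": title})
    request = urllib.request.Request(target, data=body.encode("utf-8"),
                                     method="POST")
    request.add_header("Content-Type",
                       "application/x-www-form-urlencoded; charset=utf-8")
    if user is not None:
        # Assumption: credentials are passed via HTTP Basic authentication.
        token = base64.b64encode(f"{user}:{password}".encode("utf-8"))
        request.add_header("Authorization", "Basic " + token.decode("ascii"))
    return request


def ping_succeeded(response_xml):
    """A Trackback reply signals success with <error>0</error>."""
    return ET.fromstring(response_xml).findtext("error") == "0"


# Hypothetical host, node and selector names.
ping = build_trackback_ping("http://localhost:8080/content/blog/mynode",
                            "trackback", "http://example.org/post", "Hello",
                            user="admin", password="admin")
print(ping.get_method(), ping.full_url)
```

Sending the request is a matter of passing `ping` to `urllib.request.urlopen` and feeding the reply body to `ping_succeeded`.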

[LOTD] Jackalope and OSGi-centric Sling tutorial

Two noteworthy new items in the blogosphere:

  1. The Jackalope project (a JCR implementation for PHP) I mentioned previously is now up to three team members - see their announcement and the enhanced project description on the Liip blog.

  2. Aaron Zeckoski of the Sakai project has written a beginner's introduction to Apache Sling that takes a different approach than usual - rather than the RAD aspects of Sling it focuses on OSGi.

Midgard: Where it all began


On Friday we celebrated the tenth anniversary of the Midgard project. The celebration took the form of a very nice gala evening with good food and drinks with live music, show and of course some speeches. I was asked to deliver a few words about how it all began for Midgard.

Here’s my speech, reconstructed from my draft notes and edited for the web audience:

We were a group of teenagers and young adults doing historical re-enactment and live action role playing games. One evening in early -97 we were sitting in a bus, returning from the woods with all our viking gear on. Bergie said to me: “Hey Yaro”, as I was known as Yaroslav at the time. “Hey Yaro”, he said, “you’re over 18 and you have a driver’s license. Would you like to take a dozen teenagers on a trip to Norway and back?” Even back then Bergie was the one with big dreams and the power to inspire people. I had the skills required to make those dreams happen but not yet enough experience to tell that we perhaps should think twice. So I just answered: “Sounds cool, let’s do it!” That’s pretty much what happened also with Midgard.

The trip to Norway went well for us and was followed by a number of other adventures. One of them was our quest to build a better web site for our group. It was -97 and the web was booming. The de facto web publishing technology was FTP, that people used to push static HTML to a web server. Geocities was a major cool thing as it allowed you to publish your static HTML for free. We however had bigger plans and our own server running in the closet of a friendly internet company. And we were publishing lots of stuff: news, photos, articles, etc. Quite a few people were actively contributing new content to the web site.

Our first serious attempt at better managing the site was based on technologies called SGML and DSSSL. For the technically minded: nowadays you’d use XML and XSLT for similar tasks. We used this system to “cook” our content into nicely formatted HTML that was then served to the world. It worked pretty well, but was hopelessly too complex for almost all of our contributors. This was a time when people were only just discovering the Internet. Most of our contributors were teenagers who were using the net from libraries or schools. Internet connections with modems were only just finding their ways to normal households. Even FTP was often out of the question, so there was little hope of making the heavy SGML tooling work as well as we’d like.

We wanted a system that could be managed entirely through the browser. Not just the content you saw on the web site, but the layout templates and even the functional code used to list pages or to handle the forms for adding or modifying content. The system should allow you to build an entire web site, including all the administration interfaces, without any other tooling than a web browser. Such systems simply didn’t exist at the time and in fact they’re pretty rare even today.

So we had to build our own system. We looked at a number of potential platforms for something like this, and the LAMP stack seemed like a good fit. Our server already ran Linux and, like pretty much everyone, we used the Apache web server. We hadn’t used PHP or MySQL before, but they were getting some good press and were easy enough to get started with. In fact we hadn’t done much of anything when we started: we hadn’t done Apache modules, we hadn’t extended (or even written!) PHP, and at the time I had only read about relational databases. As we used to say: “How hard can it be?” We didn’t know, and so we just did it.

The result of our efforts was called Midgard. We had used it to power our web site for about a year when Bergie was hired to build a new web site for a Finnish tech company. Midgard seemed like a good fit for that need, and we figured that also other people might find the system useful. Open source was cool and we wanted to join the movement so we decided to publish Midgard as open source. After nights spent researching licensing options, writing press releases, creating the project web site and setting up mailing lists and public CVS access we were finally ready to publish Midgard 1.0 to the world. That happened exactly ten years ago.

The 1.0 release was like the Land Rover it was named for. The magnificent car from -62, that we used on many of our trips, was really cool and when it worked, it did so very well. However every now and then it required some “manual help” to get it started or to keep it going. This was also the case for Midgard 1.0. The first external installation that I know of was done on a Solaris platform and required a few days’ worth of help and patches delivered over the mailing list before it was up and running. Much of that early feedback and experience was reflected in Midgard 1.1 that was our first release that people were actually managing to install and run without direct assistance. That started the growth of the Midgard community.

Meanwhile I had also been hired by the same company where Bergie worked, and much of our work there resulted in improvements to Midgard. Together with the feedback and early contributions we were getting from the mailing lists this made Midgard 1.2 already a pretty solid piece of software. It was fairly straightforward to install (at the standards of the time), it performed well and it had most of the functionality that you’d need to run a moderately complex web site.

And the results were showing. We were getting increasing traffic on the mailing lists, some companies would start offering Midgard support and the number of Midgard-based sites around the world was growing. One of my earliest concrete rewards for doing open source was a bottle of quality whiskey that some Midgard user from Germany sent me with a note saying: “Thanks for Midgard!” The whiskey is long gone, but I still treasure the memory. A few years later Bergie and a few other friends and Midgard developers went on to start their own company based on Midgard. I was tempted to join them, but at the time my life was taking a different route and I gradually left Midgard to pursue other things.

Seeing the Midgard project take off and build a life of its own has been a very inspiring process for me. Having your first open source project become so successful is pretty amazing and also quite humbling. Looking at all the things Midgard is today fills me with pride, not of what I’ve done, but of what you, the Midgard community, have accomplished. Thank you for that. Especially I’d like to thank my long time friend and co-conspirator in starting the Midgard project. Bergie, without your dreams and refusal to take “no” as an answer we wouldn’t be here today. Thank you.

CMIS Technical Committee

It has been a while since I have been on a standards committee (the last one was OMTP), but I have now joined the Technical Committee of CMIS: Content Management Interoperability Services. Better interop is certainly something the CMS world is in dire need of.

Faster testing with the Maven CLI plugin


Although it’s not that new, I discovered Don Brown’s Maven CLI plugin only this morning, and played with mojavelinux’s enhanced version which supports -D parameters and profiles, among other things.

The great thing is to be able to run a simple test or test -D MyTest command quickly. You first start Maven with mvn cli:execute-phase, which gives you a maven2> command prompt to start Maven lifecycle phases. As Maven is already started, phases run much quicker than when starting from scratch.

In my experiments, the test command ran about five times faster than using mvn -o test, but the difference depends how fast your tests are, of course.

To set up the plugin, I’m adding the following to my settings.xml, so as to not interfere with projects’ POMs, as the CLI is more an environment feature than a project thing:

<!--
  mvn settings.xml that enable the CLI plugin described at
  http://tinyurl.com/maven-cli-plugin
  (For example "mvn cli:execute-phase")
-->
<settings>
  <pluginGroups>
    <pluginGroup>org.twdata.maven</pluginGroup>
  </pluginGroups>
  <profiles>
    <profile>
      <id>cli-plugin</id>
      <activation>
        <activeByDefault>true</activeByDefault>
      </activation>
      <pluginRepositories>
        <pluginRepository>
          <id>repository.jboss.org</id>
          <name>JBoss Repository</name>
          <url>http://repository.jboss.org/maven2</url>
        </pluginRepository>
      </pluginRepositories>
    </profile>
  </profiles>
</settings>

Find more info on the mojavelinux page.

Great tool – thanks Don Brown and mojavelinux!

[ANN] Chemistry list added to discussion groups

Just a quick announcement: I have added the mailing list of the Apache Chemistry project to the dev.day.com discussion groups (there is also the CMIS technical committee list). The Chemistry project proposal can be read here.

[LOTD] CMIS Plugfest, JCR and Star Spec Leads

Here's a post-weekend round-up of content-centric news:

David Nuescheler has been named "Star Spec Lead" for his work on JSR-170 and JSR-283. He joins this year's other two Star Spec leads Ed Burns of Sun and Mike Milikich of Motorola.

The JBoss DNA project has released version 0.4. I wrote a bit about DNA before, but was not aware that they are implementing their own (level 2 compliant) JCR implementation. Moreover, the federated approach they take looks interesting to me.

Julian Wraith has posted experimental code to hook up the Tridion CMS with Day's CRX - essentially making it possible to use JCR as another storage option in Tridion next to file system and RDBMS.

More on last week's CMIS plugfest: Serge Huber, CTO of Jahia, has shared his impressions on day 1 and day 2 as well as posted videos of the CMIS client presentations.

Live from the CMIS Plugfest: Day 2

Day 2 of the CMIS plugfest just ended. As I blogged yesterday, we tried to connect as many client implementations with as many server implementations as we could. The results are displayed in the matrix below: "C" means being able to connect, "R" able to read, "W" able to write, and "W+S" write and search.

So, all in all we have tested 31 client/server combinations, most ATOM-based and 4 with SOAP. All tests were based on the spec version 0.6.1. I am quite happy with these results, especially because many servers and clients were updated to the latest spec version (or even implemented from scratch!) during the plug fest. Cedric Huesler has compiled a collection of screenshots of CMIS clients in action (all experimental and subject to changes):

Also, today the Apache Chemistry project (Apache's CMIS implementation) has been accepted in the ASF's Incubator. Congratulations!

Live from the CMIS Plugfest: Day 1

Day 1 of the CMIS plugfest is just getting into the beer-oriented phase. But before I leave I would like to share some basics of what we are up to:

Today's participants are: Berry van Halderen (Hippo), Cedric Huesler (Day), Dave Caruana (Alfresco), David Nuescheler (Day), Dominique Pfister (Day), Florent Guillaume (Nuxeo), Florian Mueller (OpenText), Jens Huebel (OpenText), Martin Hermes (SAP), Paul Goetz (SAP), Serge Huber (Jahia), Ugo Cei (SourceSense), Volker John (Saperion), and me.

By now all clients and servers are running on version 0.6(.1) of the CMIS spec. For the Atom binding we have as clients:

  • the Javascript client from the Apache Chemistry project

  • the Java client from the Apache Chemistry project

  • Alfresco

  • SAP

  • Shane Johnson's Flex-based CMIS Explorer

  • the CMIS Explorer portlet from Sourcesense

Servers with Atom bindings are:

  • Day CRX

  • Nuxeo

  • OpenText

  • Alfresco

That gives us 28 combinations to have fun with already. On top of that we have SOAP-based clients from OpenText and SAP (and the same list of servers).

We are checking which client can read from, write to and query which server (and tweaking both ends to make things work). Results tomorrow... :)

For getting live updates as we progress look for #plugfest on Twitter

more pictures here

minimeme.org says: Hello world!

Today, I am happy to announce that minimeme.org is finally "officially" going live. minimeme is a news aggregator focused on tech and software development news.

minimeme was born out of a personal frustration of mine: each morning I would skim through my feed reader only to find the relevant items twice or more times. On the other hand the signal to noise ratio of many feeds was way too low. I felt like a machine trying to retrieve the important items. So I decided to build a machine to do that for me.

There is no human intervention in the news selection - it is all done by a bias-free, neutral algorithm. Hence the claim "little Switzerland of tech news": minimeme is supposed to be neutral like Switzerland.

Having tested the algorithm for a couple of months I believe minimeme is now stable enough to be officially let loose. On top of the two currently implemented sections "dev" (feed) and "valley" (feed) there is a Twitter account you might like to follow. "dev" covers software development aspects from Ruby to CSS to REST. In the "valley" section you will find news from Google to startups to gadgets.

For the future I plan to add other topics as well as look into some recommendation algorithms. Let me know on the feedback forum which features you would like to see.

CMIS could be the MIDI interface of content management…


MIDI – the Musical Instrument Digital Interface – was created back in 1982 by a consortium of musical equipment manufacturers including, if I remember correctly, Roland, Yamaha, Sequential Circuits, Korg, Oberheim (I’ve got a Matrix 6 to sell BTW ;-), maybe Ensoniq (did they exist already?) and others. Companies that were fiercely competing in the market, individualistic industry leaders who agreed to get together to create a bigger market for their instruments and equipment.

My diploma work as an electronics engineer was about MIDI, in 1983 – I created a MIDI output interface that could be retrofitted into accordions. The spec was not final at the time (or at least I couldn’t get a final version – that was before the web of course), all I had in terms of specs were a few magazine articles, a Yamaha DX7 and one of the first Korg synths to have MIDI. Both synths had slightly different implementations, and some compatibility problems, as can be expected from an early and not yet widespread spec.

What’s happening with CMIS today sounds quite similar: competing vendors finally agreeing on an interoperability spec, even if it’s limited to a lowest common denominator. If this works as with MIDI, we’re in for some exciting times – the few years after 1982 saw a boom in MIDI-related electronic instruments and systems, as suddenly all kinds of equipment from different companies could talk together.

MIDI had serious shortcomings: a slow transmission rate, serial transmission meaning each note in a thick chord is delayed by nearly one millisecond, and somewhat limited data ranges for some real-time controllers. But the basic idea was great, let’s get something done that allows our instruments to talk together in a usable fashion, even if it’s not perfect. MIDI has survived until today, 27 years later, which is quite amazing for such a standard. It’s been tweaked and workarounds (including hardware extensions) have been used to adapt it to evolving needs, and often travels via USB or other fast channels today, but it’s still here, and the impact on the music equipment industry is still visible.
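The "nearly one millisecond" figure can be checked with a little arithmetic. The sketch below assumes the standard MIDI line rate of 31,250 bits per second, ten bits on the wire per byte (start bit, eight data bits, stop bit) and a full three-byte note-on message per note, i.e. it ignores the running-status shortcut; the function name is made up.

```python
MIDI_BAUD = 31_250            # bits per second on a MIDI cable
BITS_PER_BYTE_ON_WIRE = 10    # start bit + 8 data bits + stop bit
NOTE_ON_BYTES = 3             # status, key number, velocity


def note_on_delay_ms(position_in_chord):
    """Time until the n-th note (0-based) of a chord is fully transmitted."""
    bits = (position_in_chord + 1) * NOTE_ON_BYTES * BITS_PER_BYTE_ON_WIRE
    return bits / MIDI_BAUD * 1000


for position in (0, 1, 7):
    print(position, round(note_on_delay_ms(position), 2), "ms")
```

Each note costs about 0.96 ms, so under these assumptions the last note of an eight-note chord arrives roughly 7.7 ms after the key press.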

I must admit that I was quite disappointed with the CMIS spec when I first looked at it, especially due to the so-called REST bindings which aren’t too RESTful. And CMIS seems to consider a “document” as the unit of content, whereas JCR converts like myself prefer to work at a more atomic level. And don’t tell me that hierarchies are a bad thing in managing content – you might want to ignore them in some cases, but micro-trees are a great way of organizing atoms of content.

Nevertheless, seeing the enthusiasm around the soon-to-be-incubating Apache Chemistry project (that link should work in a few days, how’s that for buzz building?) made me think about MIDI, and how amazing it was at the time that “commercial enemies” could get together to do something that finally benefitted the whole industry.

I still don’t understand why WebDAV can’t do the job if this is about documents, and still prefer JCR for actual work with content (considering that everything is content), but I’m starting to think that CMIS might make a big difference. It will need a test suite for that of course – software engineers know that interoperability without test suites can’t work – and this week’s CMIS plugfest is a good step in this direction. I’ll be around on Thursday, looking forward to it!

[LOTD] Project Jackalope: JCR for PHP

There is an exciting JCR project going on in the PHP world: Christian Stocker of Liip is busy on project Jackalope which aims to make a Jackrabbit repository available from PHP.

The first tweets about the project's start appeared only 3 weeks ago, so I am surprised that there is already a Getting Started tutorial.

Make sure to have a look at the file test.php. Nice, isn't it?

Content Technology at the ApacheCon US 2009


I’m putting together a plan for a Content Technology track at the ApacheCon US 2009 in Oakland later this year. The original plan for the track was focused on JCR and related stuff, but there’s some interest in expanding the scope to cover a wider range of things related to content management and web publishing.

The track proposal has been discussed on the Jackrabbit and Sling mailing lists, and people from POI and Lenya have chimed in with interest. I also contacted Wicket, Cocoon, JSPWiki and Roller about their interest, and the initial feedback seems good. Any other projects I should be contacting?

I’m not sure how this works for the conference planners, who are probably facing some real deadlines in terms of fixing the conference schedule and contacting selected speakers. Let’s see how it all plays out.

Update: Added JSPWiki and Roller.

cq5 content models: the tags

In software engineering, modularity often leads to hard choices when it comes to how big or small things should be. In a JCR content repository, the question is: how granular should my content be? A more granular structure contains more information, but too much granularity might slow things down.

Inside a JCR node, we can create a simple or complex hierarchy of content atoms and metadata. But how far should we go? Should we think in terms of files, mini-databases, or simple name-value pairs?

JCR beginners often have a hard time figuring out the best content models for their problem, so we thought we'd share some of our experience here.

Starting with this post, we will explain some of the cq5 content structures. Without going into theoretical details - just by describing and explaining those structures.

Today, we'll have a look at the cq5 tags, used as semi-structured metadata, mostly for content pages. In cq5, tags like stockphotography/animals/birds can be added to content pages. Tags belong to namespaces (stockphotography in our example), and can be arranged hierarchically within their namespace.

cq5 tags - user view

Looking at the tags from the cq5 site admin console, we see a simple tree of concepts, grouped in namespaces (Marketing) and categories (Interest). Each tag has a unique TagID, visible in the first column on the right, that will later be used to connect content with those tags.

Nothing surprising here, except maybe the fact that our tags live in a hierarchical space, as opposed to a flat one. This creates simple namespaces for our tags, allowing several "worlds" of tags to be combined without conflicts.

How are we going to store this in JCR? In cq5, the tags are stored as a tree of JCR nodes, with a structure similar to the above one, using the cq:Tag node type. The content model simply reflects the reality of the tags and their natural organization.

The cq:Tag node type

Here's the definition of the cq:Tag node type in CQ5.2:

The tag node is required to have a sling:resourceType property with a default value of tagging/tag. That property is used by the Sling rendering system to select the appropriate components to render the tags, in the cq5 site admin console for example.

The node can contain nt:base child nodes which have the cq:Tag type by default. The cq:Tag node can also contain any number of additional ("residual" in JCRspeak) properties, single or multi-valued.

The cq:Tag node type also uses the mix:title mixin, which defines two optional String properties, jcr:title and jcr:description. The jcr:title property is used to allow tags to be renamed without changing their identifier. The cq5 user interface displays the jcr:title value, which can change over time, but it's the path of the cq:Tag node that is used as the tag identifier.

There's no specific node type for tag namespaces: a cq:Tag node that doesn't have a cq:Tag parent is considered as being a namespace. In cq5, tag definitions are stored under /etc/tags, and that node is not a cq:Tag, so cq:Tag child nodes like /etc/tags/marketing define tag namespaces.

At Day we like to keep things open whenever possible: the cq:Tag node type is not designed to put strong constraints on the content, and that's in line with David's model rule #1:

Data First, Structure Later. Maybe

We haven't reached the maybe stage yet. The cq:Tag node type is clearly here to help, not to restrict what we can do.

Tags content model

Switching to the CRX Explorer, we notice that the tree structure under /etc/tags simply maps the namespace/category/tag structure of our tags. Nothing surprising again, and that's a good thing. Obvious content structures will help others understand what we're doing.

Looking at the properties of the /etc/tags/stockphotography/animals/baby_animals node, we see that the TagID that's visible in the cq5 site admin console is not explicitly stored in the content - it is simply defined by the storage path of the tag node under /etc/tags, to avoid redundant information.
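To make the path-to-TagID idea concrete, here is a minimal sketch in Python (chosen only for brevity, it is not cq5 code). It assumes the namespace:category/tag form that appears in cq:tags values; the function names and the handling of a bare namespace are hypothetical.

```python
TAGS_ROOT = "/etc/tags"  # storage root for tag definitions, as described above


def tag_id_from_path(path: str) -> str:
    """Derive a TagID from the storage path of a tag node.

    Assumption: the first segment below /etc/tags is the namespace and the
    rest is the local part, joined as 'namespace:local/part'.
    """
    prefix = TAGS_ROOT + "/"
    if not path.startswith(prefix):
        raise ValueError(f"not a tag path: {path!r}")
    segments = [s for s in path[len(prefix):].split("/") if s]
    if not segments:
        raise ValueError(f"no namespace in path: {path!r}")
    namespace, local = segments[0], "/".join(segments[1:])
    # Assumption: a bare namespace node maps to 'namespace:' with empty local part.
    return f"{namespace}:{local}"


def path_from_tag_id(tag_id: str) -> str:
    """Inverse mapping: turn a TagID back into its storage path."""
    namespace, sep, local = tag_id.partition(":")
    if not sep or not namespace:
        raise ValueError(f"not a TagID: {tag_id!r}")
    return "/".join(part for part in (TAGS_ROOT, namespace, local) if part)


if __name__ == "__main__":
    print(tag_id_from_path("/etc/tags/stockphotography/animals/baby_animals"))
    print(path_from_tag_id("marketing:interest/business"))
```

Because the identifier is computed from the path, nothing has to be kept in sync when a tag's jcr:title changes.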

Don't you love the Principle of Least Surprise?

At this point you're probably thinking that all this is quite obvious - and you're right! The beauty of a JCR content repository is that you can in most cases store information without any structural transformations. Tags are items grouped in namespaces and categories, so a tree of namespace/category/tag nodes makes perfect sense, and is largely self-explanatory.

Tagging content

To tag content, we simply add a multi-value cq:tags property to the jcr:content nodes of cq5 pages, or to other pieces of content. A page might have:

cq:tags = 
[
	marketing:interest/business,
	marketing:interest/investor,
	marketing:interest/services
]

if it was tagged with the business, investor, services tags of the interest category of the marketing namespace.

We don't use JCR references, but simply store paths in properties, as this gives us more flexibility when restructuring things. It's hard to say what will happen to those tags, and to the very concept of tagging, over the expected lifetime of our product, so we accept potentially dangling references (and cope with them at the application level) to gain content agility.
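Coping with dangling references at the application level can be as simple as filtering a page's tag values against the tags that currently exist. The following Python sketch illustrates the idea only; the function name and the in-memory set of known TagIDs are assumptions, not the cq5 implementation.

```python
def split_tags(cq_tags, existing_tag_ids):
    """Split a page's cq:tags values into (live, dangling) lists, keeping order."""
    existing = set(existing_tag_ids)
    live = [tag for tag in cq_tags if tag in existing]
    dangling = [tag for tag in cq_tags if tag not in existing]
    return live, dangling


if __name__ == "__main__":
    # Hypothetical situation: the 'services' tag has been removed or moved.
    page_tags = [
        "marketing:interest/business",
        "marketing:interest/investor",
        "marketing:interest/services",
    ]
    known = {"marketing:interest/business", "marketing:interest/investor"}
    live, dangling = split_tags(page_tags, known)
    print("render:", live)
    print("ignore or report:", dangling)
```

The page content itself is left untouched; the application simply decides what to do with values that no longer resolve.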

Coda

That's it for now! We hope to write more about our content models in the near future, to help our readers see how simple JCR content models can be - and should be.

As usual, feedback is very welcome - let us know if this information is useful to you!

Rapid Development with Apache Sling using an IDE

Apache Sling allows you to use scripts to render your content. Usually these scripts are stored in the repository at a specific path. While it is possible to directly edit a script in the repository (using WebDAV), this editing happens outside of your project’s context. But fortunately there is a better way!
Usually your project consists of modules (or a single module) which are deployed as OSGi bundles. To get your scripts into the repository, you add the scripts as resources to your project and use the initial content feature from Apache Sling to add the scripts to the repository.
So you usually end up with your module (bundle) checked out in your IDE (Eclipse for example), and you do your initial development there (developing your OSGi services and scripts). For testing purposes you deploy the bundle (using the Maven Sling Plugin) which copies your scripts into the repository. From here they get picked up by Sling. If you now edit your scripts directly in the repository you have to take care to synchronize the changes with your checked out project in your IDE, which can be an error-prone and annoying task. Or you can edit the scripts in your IDE and then either redeploy your bundle or manually copy the scripts via WebDAV - which doesn’t make the process easier.
Fortunately Sling provides some tooling which makes these extra steps obsolete - actually we have had this feature for a long time now but I always forgot to write about it… Of course the following is only interesting for you if you’re using Maven for your project.
Now, imagine your scripts are for your own node types which use the namespace prefix “myapp”, so you have a “src/main/resources/SLING-INF/content” (this is a convention for initial content) directory in your project. This content directory now contains a sub directory “libs” (or “apps” or any other configured search path) with your scripts. Underneath “libs” you have the “myapp” folder with a folder for each node type, and this folder then contains your scripts (it’s really easier than it sounds when described in text).
You’ll add a configuration for the Maven Bundle Plugin for adding the initial content header to your pom.xml:

<Sling-Initial-Content>
SLING-INF/content/libs/myapp;overwrite:=true;path:=/libs/myapp
</Sling-Initial-Content>

This basically copies the contents of the “/libs/myapp” folder on bundle install into the repository at the same location. On each update the contents get overwritten.
Now add the Maven Sling Plugin to your pom.xml:

<plugin>
  <groupId>org.apache.sling</groupId>
  <artifactId>maven-sling-plugin</artifactId>
  <version>2.0.3-incubator-SNAPSHOT</version>
  <executions>
    <execution>
      <id>install-bundle</id>
      <goals>
        <goal>validate</goal>
        <goal>install</goal>
      </goals>
      <configuration>
        <mountByFS>true</mountByFS>
      </configuration>
    </execution>
  </executions>
</plugin>

In the configuration above you can spot two new features of the latest plugin version: the validate goal will validate all *.json files and, more importantly for our topic, the configuration “mountByFS” with the value “true”. Apache Sling has an extension bundle called file system provider (aka fsresource) which allows you to mount a file system path into the resource tree. So basically you can point for example the Sling resource path /myphotos to some directory on your server containing photos. This allows you to directly use files with Sling without copying them into the repository. Once you have installed this bundle into Sling and use the above configuration, each time you build your bundle with Maven and do a “mvn install”, the Maven Sling plugin will create a fsresource configuration for your initial content. In our case the Sling resource path “/libs/myapp” points to the file system directory “/src/main/resources/SLING-INF/content/libs/myapp”. So once you’ve done an initial install of your bundle, you can directly change the scripts in your IDE in your project. The changes become active immediately, with no need to sync and no need to copy. This makes development turnarounds for scripting much shorter as there is no turnaround any more.
The whole thing comes with a little drawback - with the configuration from above, your build fails if no Sling instance is reachable. So you should use a Maven profile for this configuration.

JAX, Apache Sling and Accidental Complexity

This year the number one Java conference in Germany, the JAX 09, is great as always. With nearly the same number of attendees it’s crowded in the positive sense - although you have to go to a session in time to get a good seat. I’ve seen a lot of highlights today - but first was my own presentation about JCR, OSGi, Scripting and REST - which presented the great Apache Sling web framework. Apache Sling uses Apache Felix as the OSGi framework and Apache Jackrabbit for the content repository. My session was very well attended, more than I expected, and while preparing the talk I noticed that I actually have more presentation material for JCR/Jackrabbit and OSGi/Felix than for Sling itself, although Sling is of course the topic of the session.
I think this shows that Sling is really a very clever framework, leveraging existing stuff and combining it in a productive way - this goes hand in hand with the great keynote tonight from Neal Ford - now, I can’t summarize it for you, but one of the key points was: don’t make things more complex than they should be (just because someone thinks that it makes sense or is cool). Look around and make use of available stuff - even if that doesn’t look cool to your developers. Most of the time is spent on accidental complexity which in the end causes projects to fail.
I’ve been in the software world for over 20 years now and Neal is spot on with his message; I’ve seen a lot of projects fail because of this or because of one (or more) of the anti-patterns Neal mentioned.
Unfortunately I missed the sessions from Neal’s colleague Ted Neward which I enjoyed last year, but I saw an interesting session about the tools in Spring IDE for Spring, OSGi and Spring DM - now this looks pretty cool to me; I’m not in favour of using Spring as I think that there are more lightweight :) solutions when it comes to OSGi - but on the other hand, given the fact that everyone and his dog knows Spring and given such good tooling with Spring IDE, it’s really hard to advocate something else. With the upcoming OSGi blueprint specification this will also have an impact on OSGi. And in the light of Neal’s talk, what could possibly be the reasons to not use this stuff? It would be great if the tooling would work with any OSGi framework and not just the Spring server.
But these are just excerpts from the day - and it’s just the first day; I’m looking forward to tomorrow when the OSGi day takes place!
During the breaks I’m trying to finish some stuff for the upcoming Sling release - we hope to get it out in the next weeks. So stay tuned.

Oracle buys MySQL (as part of Sun) – a great time to have another look at content repositories!


Lots of noise (and some gems) about the acquisition of Sun by Oracle on Twitter today. But contrary to Oracle’s content servers, Twitter seems to be holding up quite well.

I half-jokingly added my own noise saying that now’s a good time for people worried about MySQL’s future to switch to JCR, and Bergie agrees!

Rereading this post, what follows sounds a bit like marketingspeak, but it’s not – just enthusiasm!

We’ve been discussing the similarities between Midgard and JCR earlier this year with him and also with Jukka, and it’s amazing to see how close the models of Midgard and JCR are. With their TYPO3CR, Typo3 also agree that the JCR model is extremely well suited for content storage and manipulation. Midgard2 doesn’t use the JCR APIs, but as mentioned above the concepts are very similar.

Having made the move myself from wire-some-object-relational-stuff-on-top-of-sql-and-suffer-forever to JCR as an API that’s designed from the ground up to work with granular content, including observation, unstructured nodes and many other nice features, I’m not going back!

If you’re working with content (and yes, everything is content anyway), and started wondering about the future of MySQL today, now might be a good time to take another look at JCR. Apache Jackrabbit has been making huge progress in the last two years with respect to performance and reliability, and Apache Sling makes it much easier than before to get started with JCR, mostly due to its HTTP/JSON storage and query interface which takes the J out of JCR.

Never had so many (meaningful) replies and retweets on Twitter before today – but I started by wondering why CMIS wants to reinvent WebDAV, so no wonder. We’ll save that one for another time I guess.

IKS project: the first three months

It's been three months since I started working with the IKS project team, where Day is involved as an industrial partner among an impressive group of both academic and industrial teams.

The goal of this integrating project, partly funded by the European Commission, is to create an open source technology platform for semantically enhanced content management systems.

Now what's that? There are many ways to semantically enhance content management systems, including many scenarios that would be easy to explain to users but are still hard to implement today. Finding similarities between blocks of text written in different languages, for example, is an obvious use case that's next to impossible to implement reliably today for arbitrary content. Would that be useful? You bet - but only if the signal to noise ratio is on par with what you get from Google searches today, sufficiently good for the human user to do the final selection.

That's only one example, and talking to users of our products leads to a number of scenarios where semantics would help. Suggesting tags based on content? Some CMSes already do that, but is that good enough to be useful? Rarely. Tagging images automatically based on their content or graphical features? Now that would be cool, and that's probably possible today, but not yet as a mainstream feature.

After spending a few days with the IKS partners during two project meetings, I think creating a framework that allows semantic algorithms to be plugged in, and provides a simple RESTful interface to its features, would be very valuable. We're planning to work for four years on IKS, and during that time semantic algorithms will improve, so whatever we create needs to allow new and improved tools to be plugged in.

In my opinion, CMS vendors will not change a significant part of their technology stack just to get semantic features - so it is important for IKS to provide tools that can be interfaced with existing software. In our discussions with IKS project members, we have mentioned Apache Solr as an excellent example of making the power of Lucene available to the programming masses, through a simple HTTP interface.

Whatever tools IKS makes available should in my opinion provide simple HTTP/JSON or HTTP/XML interfaces to their services, based on REST principles, to allow them to be used from any programming language and technology stack. Integrating semantic features in Lucene or Solr would be the best, of course, but that doesn't seem too easy to do now, so we might want to first create some prototypes and then look at possible synergies.

Those first three months working (part-time) for the IKS project have been very interesting for me, and I have a good feeling about the group. To put it simply (and half jokingly), the academic partners seem to have really nice semantic tools but don't really know what to do with them, and we industrial partners have needs in this area but don't know how to design and implement the required features. We just have to mix both together to create great things!

With thirteen project partners, we'll all need to push hard for concrete things to emerge quickly. Judging from our first meetings and workshops, the will is there to create something, and if that can help people be less scared of "semantic features", by making those more accessible, the project will be a success.

One important thing on which we have agreed early on, is to work in an open source way right from the start, and make some noise about our results as soon as we have something concrete to play with. Incubating parts of the IKS stack at the Apache Software Foundation might be the best way to grow a community both inside and outside of the IKS team - I hope that happens as soon as we identify the components for which that would make sense.

Looking forward to all of that. For now, my immediate goals are to learn more about Linked Data techniques and tools, and to find out how far Apache UIMA could help in extracting semantic information from arbitrary content. We'll have more about IKS here as the work progresses.

Ready to serve requests ...

In the Apache Sling project we have an interesting problem: Knowing when the application has finished its startup.

In a traditional application, you know when the system has finished its startup. For example, a servlet container knows it has finished the startup when all web applications have been started.

In Apache Sling, the situation is a bit different: Apache Sling is an extensible system, where extensions may simply be added by adding more bundles. "Easy", you say, "just wait for all bundles to have been started and you know when the application is ready". True, but there is a catch.

To extend Apache Sling, you register services with the OSGi registry. "Still easy", you might say. Right, if the services are all started by bundle activators, we still can depend on having all bundles started for the system to be ready.

Again, this is only part of the story: Some services depend on other services. So the dependent services may only be started when the dependencies get resolved. This is where the trouble starts.

To help solve the dependency issues in a simple way, we employ OSGi Declarative Services. It is a great tool for defining components and services, having the dependency requirements enforced, and getting dependency injection and configuration support and ... much more.

"What does it cost?", you say. Well, we buy this functionality with a lot of asynchronicity: When all bundles have been started, not all components may have been activated and not all services may have been registered.

Now, when is the application ready? I cannot easily tell.

One approach could be to have a special service to watch out for a configurable list of services to be available. When all services are available and after the framework has started, the service signals Application Ready. As soon as one of the services goes away, the service might signal Application Not Ready.
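The approach can be sketched as a small state machine. The Python below only models the logic and does not use the OSGi API; the class, method names and service names are hypothetical, and in a real bundle the notifications would come from framework and service events.

```python
class ReadinessTracker:
    """Signals readiness once the framework has started and all
    configured services are available (a model, not OSGi code)."""

    def __init__(self, required_services):
        self._required = frozenset(required_services)  # the configurable list
        self._available = set()
        self._framework_started = False
        self._ready = False
        self.signals = []  # history of emitted signals, for illustration

    @property
    def ready(self):
        return self._ready

    def framework_started(self):
        self._framework_started = True
        self._update()

    def service_registered(self, name):
        self._available.add(name)
        self._update()

    def service_unregistered(self, name):
        self._available.discard(name)
        self._update()

    def _update(self):
        # Ready only if the framework is up AND every required service is there.
        ready = self._framework_started and self._required <= self._available
        if ready != self._ready:
            self._ready = ready
            self.signals.append(
                "Application Ready" if ready else "Application Not Ready"
            )


if __name__ == "__main__":
    tracker = ReadinessTracker({"repository", "script-resolver"})
    tracker.service_registered("repository")
    tracker.framework_started()           # still waiting for script-resolver
    tracker.service_registered("script-resolver")
    tracker.service_unregistered("repository")
    print(tracker.signals)
```

The sketch makes the open question visible: everything depends on what goes into the list of required services.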

The real question arising now is: What services are required for the application to be considered ready? Can we come up with such a list? How do we manage this list in light of more services to come, which might be considered vital?

Any input would be appreciated ;-)

Dependency Injection in OSGi

The OSGi framework and its compendium services provide a whole lot of fun to build applications. Defining bundles is a cool way to cut the big job into pieces and enjoy the coolness of separation of concerns, just like the old Romans said: Divide et Impera!

One interesting compendium specification is the Declarative Services Specification. This specification tries, and IMHO succeeds very well, to bring some of the cool stuff of Spring, namely Dependency Injection, into the OSGi world. Just like the application descriptors in Spring you have component descriptors in Declarative Services.

Using a component descriptor, you define the following properties of a component:


  • The name of the component and whether it is activated immediately or not

  • Whether the component is a service and the service interfaces to register the component with

  • Which other services are used by the component. These services may be injected (bound in OSGi speak) or may be looked up. There is also the notion of mandatory and optional services, which provides the functionality to delay the component activation until the mandatory service becomes available.

  • Configuration properties. Some properties may be injected by the descriptor itself. But at the same time, configuration properties may also be overwritten by configuration from the Configuration Admin Service. Thus the configuration of components may even be very dynamic.



The good news for the XML-haters like me: Over in the Apache Felix project we have a Maven 2 plugin which takes annotations (JavaDoc or Java 5 Annotations) from your Component classes and builds the descriptors on your behalf.

So the next time, you are looking for dependency injection, you might want to consider OSGi and Declarative Services ;-)

Just for completeness, here is a list of other frameworks providing some sort of dependency injection:



In the end all work more or less the same, in that they provide some abstraction layer on top of the basic OSGi framework functionality: the Service Registry. This is really the greatest thing of all and IMHO shows the cleverness of the OSGi Framework specification: With just three basic layers (modularity, lifecycle and the service registry), you get the whole world in your hand to build flexible, modular and extensible applications.

[LOTW] Launch of the week: University of Phoenix

Can I re-tweet on a blog? Hmm, let's try:

RT @jaykerger www.phoenix.edu is officially live on CQ5! Congrats to the team!

Congratulations, guys!

The latest meme: 10 things about me


Ok…so I got tagged by Irina Guseva on this one, started by Kas Thomas.

No meme ID though, or is it 42a4263e9ae40c23da79bd43370fd814 ?

Anyway, here we go…

1-5: I already took part in such a meme, with only five things, which are still valid.

6: I was recently fired by the band where I was playing drums, not playing well enough for their new and improved musical goals. You can’t win every time I guess ;-)

7: I have three kids aged 17 to 21. Some people complain about losing the cuteness factor of babies when they grow up, but I’ve been enjoying all phases until now. You just have to constantly adjust to everything ;-)

8: I’m selling my motorbike, meaning that I’ll be without one for the first time in about 15 years. The idea is to rent fun bikes from time to time, and buy the Right One later.

9: Twenty-four hours a day is clearly not enough when you like cycling, mountainbiking, sailing, skiing, hiking, playing music, DIY work, go-karting, motorcycling and barbecues in remote places. Lots of plans for retirement.

10: I enjoy working in the content management space because it’s like love songs: the basic story has been the same for ages, but we seem to constantly find new ways to tell it. I usually hate love songs by the way, but that’s another story.

Tagging @michaelmarth and @alexkli.

The Lifetime of a CMS Installation

(cross-posting from here)

CMS analyst Janus Boye has blogged about the expected lifetime of a CMS installation, i.e. for how long an installed CMS can be expected to be in production. His guess is a lifetime of 3 years. In the blog's comments Janus and I got into a discussion about the accuracy of that guess where he asked Day to publish actual real data about this topic.

I like this idea because publishing this data provides a benefit to our potential new customers: a reliable indicator (without any hand-waving or gut feelings) of the CMS's lifetime that can be used in business plans.

The data

The data I have used is taken from Day's support contracts. Only customer data from outside of Europe was used (simply because it was available to me). This selection is likely to bias the results towards shorter lifetimes as Day's oldest customers are based in Europe. The basic assumption is that the lifetime of the CMS is equivalent to the duration of the support contract. The end point used for each contract period is the date up to which the contract is paid for as of today.


You might argue that there could be customers that have a contract but do not actually use the product anymore, which could in fact be the case (I do not know of any). On the other hand, I am aware of customers that still use the product and have terminated their support contract. Therefore, in order to reduce selection bias I did not remove any data points due to this particular consideration.


Each customer was counted once for each product purchased, i.e. a customer that has two distinct support contracts for CRX and CQ was sampled twice. I discarded all OEM contracts because of their different nature (they would skew the result towards longer lifetimes). Finally, I also dropped a data point where the support contract was cancelled because the customer went out of business altogether.


I believe that this data set is sufficiently unbiased to provide meaningful results with respect to the question of the lifetime of a customer's CQ/CRX installation.

The Method

Luckily for Day, the data is what is called "right censored". That means that it is unknown for how long an existing support contract will go on - actually the majority of the available data points are right censored.

The scientific discipline that is concerned with analyzing data of this kind is called "survival analysis". One is interested in the survival function, which gives for each point in time the probability that the event of interest (here, the end of a support contract) has not yet occurred. The survival function is a property of a random variable, i.e. it needs to be estimated (in the statistical sense of the word).


One well-known estimator for the survival function is the Kaplan-Meier estimator (which is non-parametric, i.e. there are no underlying assumptions about the distribution of the data). In a nutshell:


The Kaplan-Meier estimate of the survival function, S_hat(t), corresponds to the non-parametric MLE estimate of S(t). The resulting estimate is a step function that has jumps at observed event times, t_i. In general, it is assumed the t_i are ordered: 0 < t_1 < t_2 < ... < ∞. If the number of individuals with an observed event time t_i is d_i, and the number of individuals at risk (i.e., who have not experienced the event) at a time before t_i is Y_i, then the Kaplan-Meier estimate of the survival function and its estimated variance are given by:
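In the notation of the paragraph above, the textbook product-limit form of the estimate and Greenwood's formula for its variance read as follows (restated here as the standard expressions):

```latex
\hat{S}(t) = \prod_{t_i \le t} \left( 1 - \frac{d_i}{Y_i} \right),
\qquad
\widehat{\operatorname{Var}}\bigl[\hat{S}(t)\bigr]
  = \hat{S}(t)^2 \sum_{t_i \le t} \frac{d_i}{Y_i \, (Y_i - d_i)}
```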

The quantity of interest is the mean survival time (and its respective estimate) which is given by:

Because S(t) may not converge to zero, the estimate may diverge. Therefore the integral is only taken up to a finite number τ. A reasonable choice of τ is the largest observed or censored time.
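
Assuming the usual definition of the mean survival time restricted to a finite cut-off (the symbol τ for the cut-off is an assumption here), the quantity and its estimate can be written as:

```latex
\mu_\tau = \int_0^{\tau} S(t)\,dt,
\qquad
\hat{\mu}_\tau = \int_0^{\tau} \hat{S}(t)\,dt
```

Because the Kaplan-Meier estimate is a step function, the second integral reduces to a finite sum of rectangle areas between consecutive event times.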

Results

Resisting a geek's urge to implement the estimator myself, I used the freely available R to calculate the results. Here is a plot of the Kaplan-Meier estimate for the survival function with 95% confidence bounds (time is in days):

And finally, the estimated value for the mean survival time, i.e. the estimated lifetime of a Day CMS installation is: 2453 days with a standard deviation of 154 days. That's about 6.7 years. Mind you, this result is likely to be lower than if the whole customer base had been analyzed.
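
For readers who do want to see the mechanics, here is a minimal JavaScript sketch of the Kaplan-Meier estimate and the mean survival time up to a cut-off. It is not the R code used for the results above. The function names `kaplanMeier` and `restrictedMean` and the four sample contracts are made up purely for illustration.

```javascript
// Each observation is { time, event }:
//   event = true  -> the contract ended at `time` (observed event)
//   event = false -> the contract was still running at `time` (right censored)
function kaplanMeier(observations) {
  const sorted = [...observations].sort((a, b) => a.time - b.time);
  const steps = [];
  let atRisk = sorted.length;
  let survival = 1;
  let i = 0;
  while (i < sorted.length) {
    const t = sorted[i].time;
    let events = 0;
    let leaving = 0;
    // Group all observations that share the same time.
    while (i < sorted.length && sorted[i].time === t) {
      if (sorted[i].event) events += 1;
      leaving += 1;
      i += 1;
    }
    // The estimate only jumps at times with at least one observed event.
    if (events > 0) {
      survival *= 1 - events / atRisk;
      steps.push({ time: t, survival });
    }
    atRisk -= leaving;
  }
  return steps;
}

// Area under the step function from 0 to the cut-off tau.
function restrictedMean(steps, tau) {
  let area = 0;
  let lastTime = 0;
  let lastSurvival = 1;
  for (const step of steps) {
    if (step.time >= tau) break;
    area += lastSurvival * (step.time - lastTime);
    lastTime = step.time;
    lastSurvival = step.survival;
  }
  return area + lastSurvival * (tau - lastTime);
}

// Made-up sample: two ended contracts, two still running (times in days).
const sampleContracts = [
  { time: 100, event: true },
  { time: 200, event: false },
  { time: 300, event: true },
  { time: 400, event: false },
];
const sampleSteps = kaplanMeier(sampleContracts);
console.log(sampleSteps);                      // jumps to 0.75 at day 100 and 0.375 at day 300
console.log(restrictedMean(sampleSteps, 400)); // 287.5
```

Censored contracts never trigger a jump, but they do shrink the number at risk, which is how the estimator uses the information that a contract has lasted at least that long.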

RESTful Architectures: what's in it for the business?

This is a guest post by Juerg Meier who runs restfulness.info where he describes a prototypical enterprise working on the principles of ROA. He blogs at blog.restfulness.info.

In my professional work, I have had to explain the advantages of Resource Oriented Architectures (ROA) to a number of customers, some of them with a technical background, but most with a business background. Granted, you don’t want to provide the link to Roy’s dissertation to a business person… nor tell them what great infrastructure we have to implement the architecture. Personally, I came to a conclusion which I am glad to share here. This article tries to elaborate on what, IMHO, the number one business driver for ROA is.

Back at BEA eWorld 2002, BEA co-founder/CEO Alfred Chuang had somebody steal his car during his keynote. The car was found again, courtesy of a car-locating system that communicated with registered cars via the latest technology hype of the coming era: web services. The car was hot, the show was fun, but one big question remained: what difference did the web services make? To make this story short: none. The same could have been achieved with any other RPC-based technology. From the viewpoint of the business – in this case the car owner – web services did not add any specific value, so business can remain essentially agnostic about the current communications fashions in IT. IT is simply perceived as a cost center.

With almost 3 decades in IT, I have seen many of those fashions come and go: bare-bones socket connections, DCE (yes, I do mean distributed computing environment), transaction monitors, CORBA, component models such as EJBs and DCOM, WebServices... Have any of these evolutionary RPC steps made us, information systems developers, more efficient? Not much comes to mind. It was generally more of an exercise in moving bytes from A to B in the most hyped fashion.

Even corporate intranet projects were driven by rather weird, heavyweight technologies like portal/portlets or "document-driven collaboration solutions". In particular, they appeared overly complex in comparison to the system that has been growing at light speed for 15 years now: the WWW. Moreover, it has exactly the qualities we’ve been looking for in enterprises: scalability, flexibility, reliability, speed. In 2009, the prime time for the "Enterprise Web" seems to have come, thanks to progress in web-oriented middleware technologies and standards, but also owing to a common gut feeling that WS* have somehow failed to fulfil their promises.

So, what's the ROA advantage?

Many will happily cry: Web 2.0 support! I won’t. Sure, information exchange can improve with blogs and wikis, thereby smoothing the biggest nightmare of the information age, I mean the tons of emails that flood our mailboxes every day.

But I think that ad-hoc information generation is the exception rather than the rule in the enterprise. The typical enterprise is highly process-driven, hopefully by well-defined processes, but more often than not by ones that have been defined informally. Many of these processes, even the well-defined ones, are not well implemented: they lack access to related information, either intentionally (too costly to access data on different systems) or unintentionally (process designers have been unaware of other relevant information). Both reasons may have their origins in one of RPC’s major deficiencies: services are still not coupled loosely enough and they lack a truly overall addressing scheme.

It is primarily the latter where I see the biggest advantage of ROA, and that advantage has a name: URI, the Uniform Resource Identifier. With it, we are able to give any chunk of information a unique identity and a location on an enterprise-wide level. The "chunks" can be anything, from small to large, from a database record to an image in some DMS, from pure data encapsulations to function-only resources. Consequently, we give process designers the opportunity to integrate any relevant information into their "process space". And vice-versa, the process’ output will be just as easily available to others, no matter where else in the enterprise the consumer resides.

Thus, Resource Oriented Architectures promote an enterprise addressing scheme that is truly re-usable. Instead of building thousands of point-to-point, tightly-coupled connections between systems, as in RPC-based architectures, business analysts are asked to consume information out of large, business-oriented classification trees, and to store their outcomes back into some specific position in this tree.

Consider an old-fashioned cash withdrawal at a teller’s window. This process consumes base information about you, your account, the teller, and may require viewing an image containing the scan of your signature for verifying your identity. Each of these information fragments might come from different systems, i.e. different information silos, but the addressing/classification scheme acts as glue here.

On the other end, the process produces information about your withdrawal, which might include the data (time, amount, …), and perhaps a physical receipt you had to sign. These artefacts can be stored in a well defined place in the information tree, e.g. at //mybank/privateCustomers/4711/accounts/checking/transactions/20090510-1/details, and .../20090510-1/receipt respectively.
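
To make the addressing idea concrete, here is a small JavaScript sketch that derives such addresses from business identifiers. The helper `transactionUri` and its parameters are hypothetical. Only the tree layout is taken from the example path above.

```javascript
// Hypothetical helper: builds the address of one artefact of a withdrawal.
// The segment names mirror the example path in the text; everything else
// (function name, parameter names) is an illustrative assumption.
function transactionUri(customerId, account, transactionId, part) {
  const segments = [
    "mybank", "privateCustomers", customerId,
    "accounts", account,
    "transactions", transactionId, part,
  ];
  // Encode each segment so that unusual characters cannot break the path.
  return "//" + segments.map((segment) => encodeURIComponent(segment)).join("/");
}

console.log(transactionUri("4711", "checking", "20090510-1", "details"));
// //mybank/privateCustomers/4711/accounts/checking/transactions/20090510-1/details
console.log(transactionUri("4711", "checking", "20090510-1", "receipt"));
// //mybank/privateCustomers/4711/accounts/checking/transactions/20090510-1/receipt
```

Any process that knows the customer, the account and the transaction can work out where the details and the receipt live, without a dedicated point-to-point interface to the system that produced them.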

As you may have noticed, I have not mentioned IT in the description above. REST comes with inherent concepts that help the enterprise to run its business more simply and with more transparency.

The concept of an overall addressing scheme that makes information directly available via uniform resource identifiers comes as a direct advantage to businesses.

Of course, the engine that empowers this uniform access has to come from IT. But unlike its RPC-style predecessors, this is not something that happens under the covers in some obscure manner; the introduction of ROA impacts directly on how businesses and their knowledge workers will operate.

This is the main message we have to carry to the management and business levels of companies. It is still noteworthy, though, that the technical implementation of ROA can happen on relatively simple and low-priced infrastructure. At the same time, we can simplify our SW stack by leaving out many unnecessary SW layers. This might be considered a side effect; however, it improves the Return-On-Investment of ROA, a "hard" argument that gets much attention these days.

Video: open source collaboration tools are good for you

Bertrand Delacretaz's talk on open source tools at the OpenExpo (mentioned previously) is now available as a video.

The slidedeck can be downloaded here.

[LOTD] JCR developer portal (built with Sling)

Our partners eForce have just come out with a new JCR-focused developer portal called JCRDEV. The covered topics include Apache Sling as well as JCR-based CMSs. The site itself is built on Sling.

Alongside the web site there is also a Twitter account to follow and a LinkedIn group to join.

[ANN] CMIS list added to discussion groups

The CMIS Technical Committee's mailing list has been added to dev.day.com's discussion groups. Check it out here. The list is, of course, searchable.

[ANN] CRX 1.4.2 Released

Along with the release of CQ5.2 we have released an update of CRX, which is now available in version 1.4.2. Apart from bug fixes there are improvements in the areas of Quickstart, virtual repository and search. Also, CRX is now optionally supported on Amazon EC2 virtual machines. Please see the release notes for full details.

Here's the download of the Developer Edition and the Documentation Pack.

The Art of Mining a Folksonomy

As you all know, CQ5 supports tagging and taxonomies, both side by side. Taxonomies are great, because they allow multi-dimensional classification of content, but sometimes there are things that do not fit into the taxonomy. This is where it comes in handy that you can just type and add a new tag to the standard tag namespace folksonomy. Using this feedback from the folksonomy you can enhance and improve your taxonomy. But what happens if you do not start with a neatly organized taxonomy, but with a wild-west folksonomy that has been created by numerous authors and you want to bring order into the chaos?

Actually, you are in a very good position. Starting with data first gives you the ability to come up with a meaningful taxonomy that is relevant to your content at the first attempt. Using a folksonomy as a starting point to create a taxonomy is what I (and others) call "Folksonomy Mining". As an illustration of how to use my Folksonomy Mining technique, I will be using the folksonomy created by the last.fm community.

  1. Start with a folksonomy of viable size in a well-defined domain. You need at least 1000 tagged items and the domain should not be all-encompassing like "web pages on the internet". The last.fm folksonomy is certainly the right size and with music we have a domain model that is restricted enough.

  2. Get the 100 most popular tags out of the folksonomy. With CQ5 tagging you have the "count" column that tells you how popular a tag is. With last.fm, there is an API method for that. (100 tags)

  3. Remove obvious duplicates. "favorite", "favorites", "favourites" and "favourite" for instance need to be merged.

  4. Create dimensions for groups of similar tags. Examples that I can find in the last.fm folksonomy are: time (60s, 70s, 80s, 90s, 00s), origin (american, australian, british), mood (ambient, atmospheric, chillout), vocals (female vocalists, male vocalists, instrumental), ownership (albums i own, seen live), origin (soundtrack, covers, live, remix), season (christmas, summer), preference (amazing, awesome, beautiful)

  5. There is usually low co-occurrence between different items in the same dimension.

  6. For more complex dimensions such as genre (rock, pop, country, folk) you might want to create sub- and super-categories. For example rock-metal-death metal-brutal death metal (yes, this is part of the top 100)

  7. There is usually high co-occurrence between super- and sub-categories. And the super-category usually has more entries than the sub-category.

  8. Fill in "holes" in the taxonomy. For instance the time dimension: add 30s, 40s, 50s. In the season dimension add: spring, fall, winter

  9. For categories with many sub-categories add grouping categories where they become helpful, for instance in the origin dimension it might help to add american, european, asian, african to group origins.

  10. A number of tags will remain uncategorized. Just leave them that way, and leave the folksonomy open, so that new tags can be added over time.

With these ten simple rules you have managed to grow trees out of the tag cloud and added structure where needed that can be re-used in query builders and other places of the system.
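
Some of the rules lend themselves to a bit of scripting. The following JavaScript sketch covers rules 2 and 3 (popularity count with merged duplicates) and the co-occurrence check behind rules 5 and 7. It does not use the CQ5 or last.fm APIs: the tagged items are a made-up in-memory array, and the names `normalizeTag`, `topTags` and `coOccurrence` are purely illustrative.

```javascript
// Rule 3: map spelling variants onto one canonical tag (hypothetical table).
const CANONICAL_TAGS = new Map([
  ["favorites", "favorite"],
  ["favourites", "favorite"],
  ["favourite", "favorite"],
]);

function normalizeTag(tag) {
  const key = tag.trim().toLowerCase();
  return CANONICAL_TAGS.get(key) || key;
}

// Rule 2: count how many items carry each tag and keep the most popular ones.
function topTags(items, limit) {
  const counts = new Map();
  for (const tags of items) {
    for (const tag of new Set(tags.map(normalizeTag))) {
      counts.set(tag, (counts.get(tag) || 0) + 1);
    }
  }
  return [...counts.entries()]
    .sort((a, b) => b[1] - a[1] || a[0].localeCompare(b[0]))
    .slice(0, limit);
}

// Rules 5 and 7: fraction of the items tagged `a` that are also tagged `b`.
function coOccurrence(items, a, b) {
  let withA = 0;
  let withBoth = 0;
  for (const tags of items) {
    const set = new Set(tags.map(normalizeTag));
    if (set.has(a)) {
      withA += 1;
      if (set.has(b)) withBoth += 1;
    }
  }
  return withA === 0 ? 0 : withBoth / withA;
}

// Made-up sample: each inner array holds the tags of one item.
const taggedItems = [
  ["rock", "metal", "favorites"],
  ["rock", "metal", "death metal"],
  ["rock", "Favourite"],
  ["60s", "rock"],
  ["80s", "pop"],
];
console.log(topTags(taggedItems, 2));                    // [["rock", 4], ["favorite", 2]]
console.log(coOccurrence(taggedItems, "metal", "rock")); // 1: every "metal" item is also "rock"
console.log(coOccurrence(taggedItems, "60s", "80s"));    // 0: same dimension, no overlap
```

A value near 1 in one direction and a clearly lower value in the other hints at a sub-category and its super-category. Values near 0 in both directions between popular tags suggest two entries of the same dimension.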

Invitation to CMIS PlugFest

Dear CMIS community,

As per the "official invitation" ;) email below we are hosting a CMIS PlugFest in Basel. Since this is not constrained by OASIS TC membership we would like to make this as open as possible and invite other interested parties as well. Come along, bring your CMIS implementation and see if it interoperates with others.

Please let me know if you intend to join us (david(at)day.com). If you have not been to Basel before see Roy Fielding's "Visitor's Guide to Basel".

Regards,
David

---------- Forwarded message ----------

From: David Nuescheler
Date: Sun, Apr 5, 2009 at 12:58 PM
Subject: Official Open CMIS PlugFest Invitation April 29-30 in Basel, Switzerland
To: "cmis@lists.oasis-open.org"

Dear TC Members,

Please find below the "official" invitation to the frequently discussed CMIS Plug Fest at the Day offices in Basel, Switzerland in April.

We would like to invite all CMIS TC members and other interested parties to an open PlugFest at our Offices in Basel Switzerland. (http://is.gd/qPs0)

Agenda:
---
April 29-2009
9-17h Setup and Connectivity Setup, Smoke Tests
19h- CMIS Dinner

April 30-2009
9-16h Actual Testing and compliance reporting
16h-17h Wrap-up, Next steps
---

We will provide wireless access to the Internet and we will also have inbound HTTP access through a reverse proxy ready for people to do remote testing to our servers.

For people who would like to check in remotely, we will use the following webex conference.

---< CMIS Plugfest >----------------------------------------------------------

Wednesday, April 29, 2009, 9h00 CET, 03:00 am EST, 12:00 am PST
Meeting Number: 709 533 911 (Password: day)
https://day.webex.com/day/j.php?ED=119801362&UID=0&PW=50b835524b&RT=MiMyMg%3D%3D
Dial-in: USA +1 (718) 354 1382
More Dial-in numbers:
https://day.webex.com/day/globalcallin.php?serviceType=MC&ED=110021337&tollFree=1

--------------------------------------------------------------------------------------

For planning purposes it would be great if you could let me know (either on this list or directly) by the beginning of next week if you plan to attend and if you have any further questions please feel free to ask.

regards,

david

One month, five languages


The past month was probably the first time in about 20 years when the number of natural languages I used was greater than the number of programming languages I wrote code in. I’ve never thought of myself as much of a language person, but here I am actively using five different languages! Here’s a list of the languages in order of my fluency.

Finnish

Of course. I was in Finland twice in the past month and every other day or so I spend a lot of time on Skype talking Finnish with Kikka. I read Finnish news every day, and keep in contact with my Finnish friends mostly through various Internet channels.

My main concern with my Finnish is that nowadays I don’t do much serious writing in Finnish. Of course I write letters, postcards and email to friends and family, but that’s about it. I used to be a fairly good writer (grammatically, etc., not so much artistically), but now I think my skills are rapidly eroding.

English

English is currently the language I use most actively. I speak it daily at work and elsewhere. I read and write piles of email in English every day. All the code and documentation that I read and write is in English, just like the various tech and world affairs sites and blogs I follow.

Even though I understand English well and can get myself understood with little trouble, I still don’t think I’m particularly good with the language. As they say: The universal language of the world is not English; the truly universal language is bad English. The last time I actually studied English was in high school 15 years ago, so I believe I would really benefit from taking some more advanced courses on the finer points of the language.

Swedish

Learning Swedish is mandatory in Finland, so I spent ten years studying the language at school. Thus I have a reasonably strong theoretical background in the language, but since I very rarely use it anywhere my practical skills aren’t that great. Prodded by Kikka to do something about that, I recently bought and started reading Conn Iggulden’s book Stäppens Krigare (Wolf of the Plains) in Swedish. The first 20 or so pages were a struggle, but then it all came back to me and now I’m going strong at around page 200 and can barely set the book aside.

The funny thing about the Swedish I’ve learned is that it’s not really what they speak in Sweden, but rather a dialect spoken only by a small Swedish-speaking minority in Finland. I have a feeling that I’m going to end up with something similar, just on a larger scale, also for German…

French

I’ve never been too enthusiastic about learning languages, so in high school I dropped French (which I had studied for two years earlier) in favor of more math and physics. I did some more French courses at the university to fill up the mandatory language studies, but I’ve never really mastered the language. However, I have relatives in France and Morocco, so I do have a “live” connection to the language that I’ve lately tried to keep up through occasional visits.

My latest visit was a few weeks ago when I took the TGV train from Basel for a quick weekend visit to Paris. During the visit I tried to speak as much French as I could, and was able to keep up reasonably well when people around me were speaking French.

German

Last but not least. I started actively learning German when I moved to Switzerland about half a year ago. First I used an online course, and after finishing it I’ve now been taking an evening course with a real teacher and a group of seven students. It’s hard work, especially since the Swiss German I hear around me every day is quite different from the Standard German I’m learning at the course.

I can increasingly well manage simple shopping and restaurant interactions in German, and I try to read (or at least browse) the local newspapers every day. I’ve also started using the German Wikipedia as my first source of any non-technical trivia. I go there a few times a week and only switch to the English counterpart when I can’t figure out some specific details.

I guess my studies are starting to take effect, as my first germanism already found its way into a tweet I posted yesterday. Earlier this week I also had my first dream in German! In my dream I continued doing the German exercises that I had been doing when I fell asleep…

What’s missing?

All the languages I’m using are (originally) European. I’d really love a chance to brush up my Japanese (I studied it for a while at the university) or learn the basics of Mandarin (and Arabic would be cool too), but I guess that for the next few years I’ll be too busy getting up to speed with German to even consider doing something new.

Speaking of Switzerland… best April Fools’ joke of 2009


Swiss Tourism’s mountain cleaners movie is by far my favorite April Fools’ joke / marketing gimmick of 2009. I’m sure tons of people will be going back to their website next year on April first to discover the next one. Thanks Marie for pointing me to it.

I quit blogging

Personal blogging is dead. It has been succeeded by microblogging and lifestreaming on one end and corporate and professional blogging on the other end. Look at the Technorati Top 100 blogs: how many personal blogs, in the sense of a blog in 2002, can you find? Hardly one or two. The reason is that the world has changed a lot since 2002 when I started blogging (first on blogger.com, not acquired by Google yet, then on Movable Type, when it was still a side project of a husband and wife team, then on Roller, before it became an Apache project). The world has changed, personal blogging is dead and so is this blog.

This does not mean that I will quit blogging altogether - you will find my posts on dev.day.com (corporate blogging) and on gettingsoftware.posterous.com (think of it as an extended version of Twitter). weblogs.goshaky.com will remain for the time being, but I will redirect all requests to the blog's start page to gettingsoftware.posterous.com, my new macro-microblog.

Custom Validation Function In Dialogs


Sometimes it is desirable to fit a particular input widget in a CQ5 WCM dialog with a custom validation. Imagine you have two password fields and you would like to validate that the passwords match prior to saving the dialog.

The Dialog
The following is a simple dialog that features two input widgets for passwords:

<?xml version="1.0" encoding="UTF-8"?>
<jcr:root xmlns:cq="http://www.day.com/jcr/cq/1.0" xmlns:jcr="http://www.jcp.org/jcr/1.0"
    jcr:primaryType="cq:Dialog"
    xtype="panel">
    <items jcr:primaryType="cq:WidgetCollection">
        <password
            jcr:primaryType="cq:Widget"
            fieldLabel="Password"
            name="./password"
            allowBlank="false"
            xtype="password"/>
        <passwordConfirm
            jcr:primaryType="cq:Widget"
            fieldLabel="Confirm Password"
            name="./password"
            allowBlank="false"
            validator="function(value) { return verifyPasswords.call(this, value); }"
            xtype="password"/>
    </items>
</jcr:root>

Notice the validator property on the passwordConfirm input. It specifies an inline function that is evaluated and delegates to a second function, which you can easily define in a custom JavaScript file included in the head of your page in authoring mode.

Not all widgets support a validator property. You can verify support for a validator property by consulting the Ext JS API documentation. Check the Ext – form branch, e.g. TextField, of which the Password widget is an extension.

The Validator Function
The validation function itself is quite simple in this case. It runs within the scope of the widget, so you have access to the widget’s object tree and variables. The Ext JS API documentation for the TextField states the following for the validator configuration property:

“A custom validation function to be called during field validation (defaults to null). If specified, this function will be called only after the built-in validations (allowBlank, minLength, maxLength) and any configured vtype all return true. This function will be passed the current field value and expected to return boolean true if the value is valid or a string error message if invalid.”

As such the function might look as follows in your custom JS file:

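function verifyPasswords(value) {
	// Expects "this" to be the confirm field, i.e. the function has to be
	// invoked in the widget's scope. Index 2 is taken from the original
	// example; it depends on where the first password field sits in the
	// container of your dialog.
	var pwd = this.ownerCt.items.get(2).getRawValue();
	if (pwd === value) {
		return true;
	}
	// A returned string marks the field invalid and is shown as the error.
	return CQ.I18n.getMessage("The passwords do not match.");
}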

Some Important XTypes Supporting a Validator

  • datefield
  • numberfield
  • password
  • textarea
  • textfield

You can check the widget sources under /libs/cq/widgets/source/** to determine the widget hierarchy for an xtype, and whether any of the involved types support the validator property based on the Ext JS API documentation.

Validation With Regular Expression
Dialog fields also support validation using regular expressions. See the following example:

<contentOwner
    jcr:primaryType="cq:Widget"
    fieldLabel="Content Owner Email"
    name="./contentOwner"
    regex="/^[a-zA-Z][\w\.-]*[a-zA-Z0-9]@[a-zA-Z0-9][\w\.-]*[a-zA-Z0-9]\.[a-zA-Z][a-zA-Z\.]*[a-zA-Z]$/"
    regexText="Please enter a valid email address."
    xtype="textfield"/>
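
To see which values such a pattern accepts, it can be tried out in plain JavaScript outside of CQ5. The sketch below copies the pattern and the message from the example above. The helper `checkEmail` is hypothetical and only mimics the "true or error message" contract described earlier; it is not how Ext JS applies the regex internally.

```javascript
// The pattern from the dialog definition above, as a RegExp literal.
const emailPattern = /^[a-zA-Z][\w\.-]*[a-zA-Z0-9]@[a-zA-Z0-9][\w\.-]*[a-zA-Z0-9]\.[a-zA-Z][a-zA-Z\.]*[a-zA-Z]$/;

// Hypothetical helper: returns true for a valid value, otherwise the message.
function checkEmail(value) {
  return emailPattern.test(value) ? true : "Please enter a valid email address.";
}

console.log(checkEmail("jane.doe@example.com")); // true
console.log(checkEmail("jane@example"));         // "Please enter a valid email address."
console.log(checkEmail("a@b.com"));              // "Please enter a valid email address."
```

Note that this particular pattern requires at least two characters before the @ sign and a letter as the first character, so some short but legitimate addresses are rejected.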

Opening a Page in CQ5 WCM without Content Finder


In a CQ5 WCM authoring environment, when you open a page via double-click, by default the Content Finder is loaded and the actual content page is displayed in a frame. While this is desirable in most cases, there are pages whose functionality does not require the presence of the Content Finder, e.g. tool templates that don’t allow content creation, such as adding new paragraphs.

In order to prevent a page opening in the Content Finder view, choose from the following two options:

  • On the page component’s definition, set the following property:
    cq:defaultView="html"
    This is probably the preferable way, because doing without the Content Finder usually applies to all pages based on a template, and thus to its page component.
  • On any content page, e.g. on /content/geometrixx/jcr:content, set the following property:
    cq:defaultView="html"
    This is the way to set a specific exception where the page component would otherwise allow the Content Finder view (implicitly: cq:defaultView="contentfinder").

Open Source Collaboration Tools are Good for You – relooked and live tomorrow!


I have relooked and slightly expanded this presentation for tomorrow at OpenExpo in Bern – the main addition is a discussion of the fear of making mistakes in public.

Talking to attendees last week at ApacheCon showed that people often struggle to introduce these tools and the open way of working in their companies. It seems that fear can be an important blocking factor, and people are rarely explicitly aware of it.

(See how serious I am? This is April 1st and I’m not even making lame jokes!)

Update: the video is available on YouTube as part of the OpenExpo channel.

Welcome CQ5 DAM, welcome CQ5 Social Collaboration

Today, our CQ5 Digital Asset Management and CQ5 Social Collaboration become available with the launch of CQ 5.2. This is a great day for me, because the two products that I have been working on for the last one and a half years are going to market.

With both CQ DAM 5.2 and CQ Social Collaboration 5.2 I have been standing on the shoulders of giants, who have enabled me to implement a bold vision of Digital Asset Management and Social Collaboration integrated into Web Content Management in such a short time, following our CQ5.1 release last November. These giants are our developer team and our product marketing team, who have done a marvelous job of getting this product to countless demos, POCs, beta projects, press and analyst briefings and who have implemented one cool feature after the other.

But standing on the shoulders of giants also means reusing a great platform to build upon. We did not have to think a single second about clustering, backup, scaling, permissions, LDAP integration, workflow, development environment, because all this is at some level provided out of the box by CRX (and its underlying Apache Jackrabbit and Apache Sling foundations) and the CQ5 Platform we share with CQ5 Web Content Management.

I would like to take the opportunity to shed the light on two smaller features, one in CQ5 DAM and one in Social Collaboration that we typically do not include in demos, but that offer a huge potential for everyone who sees our products as a platform to realize his own ideas.

Feature number one is the workflow launcher we are using especially for CQ5 DAM. This workflow launcher will listen for repository events at a certain part of your content repository, for example the DAM bulk upload hot zone or a Sharepoint repository we are monitoring through our connectors. Once a new file is being created here, we launch the workflow that will take care of the actual processing. For the end user this means: more flexibility in creating complex workflow-driven processing solutions for digital assets. And it means a reduction of magic, because everything that will happen to your assets is visible (in the Workflow Models tab) and the reason why it happens is visible in the workflow launcher tab. Not attempting to do magic and to appear as magician should be a goal for every software engineer.

The second feature is even less graphic (guess why we are not demoing it), but it is one of the small pieces that makes Social Collaboration so great. We call it the 'Feed Importer', but it actually does a lot more. The idea of the feed importer is to poll a remote resource at a specified interval, to parse it and to create nodes in the content repository that represent the content of the remote resource. We are using this at two places right now - for subscribing to remote iCalendar files just like you do in Apple's iCal or Google Calendar and in the blog, where you can aggregate existing blogs in one 'planet blog' that for instance contains all bloggers of the JCR community. This auto-blogging-feature has been in the blog since Advanced Collaboration 1.1, but with Social Collaboration 5.2 there is a proper and powerful API that can be used in customer projects as well. Needless to say, in order to implement a new parser & importer all you have to do is implement one OSGi component and I expect it to be not too long till we see Twitter-mashups on CQ5-powered websites that are using the Feed Importer.

So thank you very much for all the help with the 5.2 release. I am looking forward to seeing many CQ5-powered community websites and public asset repositories soon.

JSR-283 enters Proposed Final Draft Stage and gets restructured

I am very pleased to announce that JSR-283 has reached the proposed final draft stage. The specification is now available for download.

The most notable change from the public review stage is a very exciting one: the specification has been reorganized into two separate parts:

  • the repository model and

  • its Java bindings.

This should allow for much simpler consumption and implementation of the spec in other language environments.

The notable extensions of JSR-283 from JCR 1.0 have been discussed previously; see InfoQ for a summary. My personal favourites are:

  1. Query extensions, mainly around extended support for SQL, specifically JOINs. We also introduced Java bindings for the Query Object Model that allow for easier "query wizards" and, last but not least, "prepared" queries.

  2. Access Control Management to go beyond the introspection that is already specified in JCR v1.0.

  3. Retention Policy & Hold Support to enable records management applications sitting on top of JCR repositories in a standardized fashion.

  4. Simple versioning to provide for repositories that only support linear versioning. Versioning extensions around "Baselines" and "Activities" to cover the full configuration management spectrum.

  5. Lifecycle Management to make it easy to hook content into a process engine.

  6. Standardized Nodetype Registration that allows applications to register and manage their nodetypes with the repository.

  7. New property types and new nodetypes to enhance application interoperability around common meta data.

  8. Workspace Management to allow for creation and deletion of workspaces in a repository.

  9. Shareable nodes that allow the tree in a content repository workspace to become a more implicit network.

  10. Journalling Observation that allows offline/polling applications to find out what happened in a content repository since they last checked.
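
As an illustration of the last item, the following Python sketch models the journalling idea conceptually: changes are appended to a journal, and a client that was offline asks for everything after the time it last checked. It is not the JSR-283 API; the class and method names, the event kinds and the integer timestamps are assumptions chosen for the example.

```python
from typing import List, NamedTuple


class Event(NamedTuple):
    timestamp: int  # e.g. milliseconds since the epoch
    kind: str       # hypothetical event kind
    path: str


class EventJournal:
    """Append-only record of repository changes, in the order they happened."""

    def __init__(self) -> None:
        self._events: List[Event] = []

    def record(self, timestamp: int, kind: str, path: str) -> None:
        self._events.append(Event(timestamp, kind, path))

    def since(self, timestamp: int) -> List[Event]:
        """Return all events recorded strictly after the given time."""
        return [e for e in self._events if e.timestamp > timestamp]


journal = EventJournal()
journal.record(100, "NODE_ADDED", "/a")
journal.record(200, "PROPERTY_CHANGED", "/a/title")
journal.record(300, "NODE_REMOVED", "/b")

last_checked = 100                    # the client last polled at time 100
missed = journal.since(last_checked)  # the two events it has not seen yet
```

The difference from ordinary observation is that the client does not need to be connected when the change happens; it only needs to remember when it last looked.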

As next steps we will implement the RI (reference implementation) and the TCK (Technology Compatibility Kit) in the open as part of the Apache Jackrabbit project.

Please send any comments to jsr-283-comments@jcp.org

Screencast: CQ5.2 in Fully Social Mode

One of the many totally cool features in CQ5.2 SocialCollab is that you can optionally run it in "Fully Social Mode". In this mode a number of social tweaks are performed in the underlying content infrastructure. For example:

  • In Fully Social Mode workflow tasks come to your inbox as "invitations"

  • all paragraphs are restricted to a maximum of 140 characters

  • there is an additional field "relationship status" in your user profile page

Not convinced yet? Have a look at the screencast.

Update: check this post's publication date before you click ...

The ASF is the Switzerland of Open Source


By popular demand (two people – that’s about 100% of my readership!), here’s an essay similar to my lightning talk of last week at ApacheCon: The Apache Software Foundation is the Switzerland of Open Source.

This is based on my rough notes and failing memory, so it won’t be exactly the same thing.

Did you know that I am an ASF analyst? I didn’t know either, found out last week at James Governor’s great keynote. You had to be there ;-)

So, tongue firmly set in cheek, here we go!

The Apache Software Foundation (ASF) is a lot like Switzerland. Having been involved in the former for the last nine years, and in the latter for all of my life (which means…a few years), I can rightly consider myself an expert in comparing them.

There are lots of similarities, but Switzerland is a lot older, 708 years to be precise. Switzerland was founded by people from what we call the primitive cantons: Uri, Schwytz and Unterwald. City people often consider folks from those places as being old guys with obsolete ideas, and don’t really pay attention to them. The ASF has founders. Don’t get me wrong – I didn’t say the comparison applies.

Switzerland is multilingual, and we don’t always understand each other well. It also has a solid Roeschtigraben that separates the German- and French-speaking parts. A wall of Roeschti…like between the C and Java camps of the ASF?

Switzerland comprises 26 cantons, each with their own laws. This sometimes creates funny contradictions and loopholes that people can take advantage of when moving between cantons, or when living in one and working in another (tax deals anyone?). The ASF comprises a number of PMCs, each with their own rules and regulations. This sometimes…well, you get the idea!

Switzerland has the fondue, a slimy mix of stuff that people try to share among themselves in a fair way. The ASF has a famous members’ discussion list, which is also a bit slimy at times.

Switzerland has beautiful, bright mountains that you can see from far away. The ASF has the HTTP server, Hadoop, and other world famous projects. Switzerland also has dark, remote, deep valleys where almost no one wants to go. The ASF…well, let’s not go there…

According to Wikipedia, the Swiss Alps constitute an extreme environment. Lots of places in the ASF are like that…and Wikipedia goes on to say that the climate varies a lot between localities in Switzerland. You probably see what I mean!

Switzerland is not part of the European Community. We are the best anyway, why should we care? Lots of parallels here.

Contrary to the ASF, Switzerland does not have an Incubator and does not accept new cantons. Swiss folks are not that crazy.

Some Swiss cantons still use the Landsgemeinde to vote. People gather on the village’s main square and vote by raising their hands. A raised hand means +1, and an unraised hand means -1 (or something like that). The ASF’s voting rules are more precise here! It’s only relatively recently that women started having the right to vote in the Landsgemeinde, and people used to have to show their military sword to be accepted. Do you have an iCLA on file?

Switzerland does not have a real boss. The citizens are the boss, at least in theory, and to handle current affairs we have a federal council of 7 people. Some Swiss citizens think that those people just talk a lot and don’t do anything useful. I won’t make any parallels with the ASF’s board of directors, of course ;-)

Day talks at ApacheCon EU 09

ApacheCon EU 09 is history, but there's a second chance to see and hear Bertrand Delacretaz' talk "Open Source Collaboration Tools are Good For You!": Bertrand will be at the OpenExpo in Berne this week. In the meantime, here are the ApacheCon talk slide decks of Day's engineers:





Maven meetup report


A few days late, here’s a quick report on what I managed to do this Monday here at the ApacheCon EU. As mentioned earlier, I arrived at the conference hotel on Monday evening and headed straight for the Maven meetup.

Maven meetup

The meetup was already in progress when I arrived, but I managed to catch a part of a presentation about the Eclipse integration that just keeps getting better. Nowadays it’s so easy to import and manage Maven projects in Eclipse that I get really annoyed every time I need to manually set things up for projects with Ant builds.

Other interesting topics covered were Maven archetypes and the release plugin. I’ve been thinking for a long time about creating some archetypes to help set up new JCR client applications. We should probably also do something similar for setting up new Sling bundles.

The release plugin demo was interesting, though I’m not so sure if I agree with all the conventions and assumptions that the plugin makes. On a related note, we should configure the GPG plugin for the Maven build in Jackrabbit.

We talked a bit about Maven 2.1.0 and the upcoming 3.0 release.  I’m already pretty happy with the recent Maven 2.0.x releases, so we’ll probably take a while before upgrading, but it’s good to hear that things are progressing on multiple fronts. We also briefly touched on the differences between the Maven and OSGi dependency models and the ways to better bridge the two worlds.

In summary, the meetup was really interesting and gave me a better idea of what’s up in the Maven land. Thanks to everyone involved!

Chops, ribs and beer

After the meetup a few of us headed out to Amsterdam city center for some food and drinks. Monday evening wasn’t perhaps the best time to go out as we needed to wander around looking for places that would be open long enough. Anyway, we found some “interesting” places to visit before returning to the hotel in the early hours. Good times.

CMS Vendor Meme Roundup

About a week ago we started the CMS Vendor Meme. It's time for a little roundup on how the meme got around:

The number of responses that were published since last week completely blew me away. So far, the vendors that have responded are (in chronological order):

Magnolia, Alfresco, Jahia, Escenic, GX, CoreMedia, Infopark, dotCMS, Midgard, Vignette, Nuxeo, OpenText, EPiServer, Sitecore, Interwoven, Alterian, Hippo, Ektron, Knowledge Tree, and ez Systems

Also, the meme received quite some attention from CMS users and analysts: on Twitter look for the hashtags #cmsmeme and #realitycheck. In the blogosphere Irina Guseva picked it up first, Jon Marks brilliantly commented on the meme as it evolved (here, here and here), Julian Wraith kept an eye on the scores (also correcting the scores of the vendors that did not add up correctly) and commented on individual responses, Bertrand Delacretaz suggested the ID and Juerg Stuker blogged about it in German. Make sure you also check out the discussions: in some of the blog posts' comments Kas Thomas, the author of the original list, showed up. And, finally, if you are the type of person that enjoys extensive stats: Cedric Huesler has collected all sorts of data about the answers and compiled this spreadsheet.

This has been a thoroughly enjoyable exercise so far, but I believe that there is real value in it as well. As Kas Thomas wrote: the responses reveal the vendor's DNA.

Oh, almost forgot. 9c56d0fcf93175d70e1c9b9d188167cf

Update: added Ektron to the list - thanks Kas for the hint. (30/03/09).

Update: synched the list with Jon's and added Knowledge Tree and ez Systems (07/04/09).

Tales from the OSGi trenches


My Tales from the OSGi trenches presentation today at ApacheCon went well; the timing was surprisingly good given that I gave this talk for the first time.

People can certainly relate to the issues that we’ve been facing with OSGi, and the realization is that the large majority of them can be linked to lack of developer education and lack of documentation and examples.

Things will get better, but my conclusions page already has a lot more smileys than monsters!

Apache Software Foundation turns 10 today

The Apache Software Foundation (ASF) celebrates its tenth birthday today. On March 25 in 1999 Roy Fielding signed the foundation's incorporation papers. Happy birthday!

A lot of successful and industry-changing software has been developed within the ASF in the last 10 years: see here for some of the ASF's highlights and the very well written "How the ASF works" page.

Day has strong relations with the ASF: we heavily contribute to and base our products upon the projects Jackrabbit and Sling. Moreover, many of Day's employees are involved in other Apache projects or serve in other ASF functions (see here for more information about Day's open source activities).

Therefore, I was more than happy to take this opportunity for an interview with Roy Fielding and Bertrand Delacretaz where they share some of their insights about the ASF's past and future. Day's Chief Scientist Roy is co-founder of the ASF and its former chairman. Roy also wrote the Apache License 2.0. Currently, he is V.P. for the Apache HTTP Server project. Bertrand (Senior Software Developer at Day) is a member of the Apache Software Foundation and an active committer on the Sling and Tika projects. He is also a committer, though currently inactive, on the Cocoon, FOP, and Solr projects. Bertrand is also on the ASF's board of directors.

Roy Fielding

Founding the Apache Software Foundation was extraordinary for an open source project at that time (in 1999). The history pages of the ASF describe the reasons for founding the ASF as a kind of defensive act to protect individuals against legal threats. Was this the only reason or were there other ideas involved?

There were several reasons for incorporating. Although the Apache Group had been successful operating as a group of individuals, it was becoming increasingly difficult to deal with the outside world without a formal legal entity. We wanted to protect the Apache brand name against abuse by other organizations. People wanted to donate money to the project, but we had no way to accept it other than as individual income. Some of us wanted to organize a conference, which later became ApacheCon '98, but the liability issues required that it be produced by one of the companies that employed some of our volunteers. Most of all, it was the fact that we couldn't make non-technical decisions quickly: we actually started to discuss incorporation in September 1996, two and a half years before I signed those final papers.

The final straw came in 1998, when folks at IBM contacted the Apache Group in private to investigate contributing as part of the team of developers (IBM's first exposure to open source development). Before we could talk, however, IBM demanded that each of us sign a non-disclosure agreement, which is standard practice for any large public company. It would have been easy to do so if we were a legal entity, but instead we had to obtain written signatures from every single person in the core group, located all over the world.

An aspect that sets the ASF apart from many open source efforts is the invention of "incubation of projects" and, even more revolutionary, the "retirement of projects". What was the background of coming up with these concepts? Were there concrete projects that triggered this idea?

The Apache way of developing software is ideal for collaborations among developers regardless of their individual employers. Once we had incorporated and formalized the licensing agreements, the ASF grew very quickly as new projects were proposed. At first, the board created projects that covered entire areas, such as Java (later renamed Apache Jakarta), XML, and Perl. The Jakarta project, in particular, attracted hundreds of developers that wanted to contribute their own libraries and applications to Apache, and so each project started to act like the board and created sub-projects of their own.

However, there is more to the Apache way of development than just starting a project and letting the developers run wild. We have learned over the years that each team must be organized for peer review and formally vote on all releases, which requires some cultural learning on the part of new developers. Likewise, we need to ensure that we have sufficient legal paperwork for any software that is licensed to the foundation for further development, since copyright laws are applicable for seventy years (or more), far beyond the normal memory of any project. The early projects grew so fast, however, that it was impossible for the experienced Apache developers to keep track of the growth and mentor new volunteers.

The Incubator was the board's way of adjusting the organization to that rapid growth: by forcing all new projects to learn about the Apache way of development and actively engage ASF members as new project mentors, we found a way to grow the organization at the same rate that we were growing projects and volunteers. Of course, there are also many negatives to that approach: it can often seem to new projects that Apache is dominated by bureaucracy and process, since much of that process is only needed when conflicts emerge.

You wrote the Apache License 2.0. One key feature of the license is its commercial friendliness. Was that an explicit design decision of yours or is it a by-product of other design considerations you had in mind?

Apache has always used a commercial-friendly license, since we were founded by a diverse group of individuals from both academia and industry. Many of us had prior experience with the BSD license, which does not place restrictions on other software distributed with the product. Furthermore, since one of our original goals was to enable standards-based interoperability on the Web, we encouraged the other software-producing organizations to use our code instead of their own (often buggy and nonstandard) implementations.

This openness to collaboration among developers, no matter why they are interested in participating, is one of the key virtues that make Apache projects both fun and educational. Our licensing goal was to ensure that the final work product would be usable and redistributable by anyone, for any purpose, but without any warranty. That is why we used a variant of the BSD license for the first five years.

The 2.0 license came about as an attempt to clarify both what the license grants under copyright law and what the contributors had agreed to grant under both copyright and patent laws. We wanted to protect ourselves and our users against deliberate contribution of works under "submarine patents" or other licenses that were more restrictive than our own. We also wanted to enable compatibility with GPL-based free software by providing an innovative way to acknowledge the original developers (if they so desired) without exceeding the restrictions in the GPL.

Ultimately, we measure the success of our license by how many organizations and individuals can safely redistribute our software without entangling the Apache developers in lawsuits. So far, we've been 100% successful.

Looking back at the ASF's history: what surprises you the most? What did you least expect to happen?

I think what has been most surprising is the ability of Apache projects to persist and evolve as the individual volunteers change their interests and move on to other things. In every way, Apache is entirely dependent on the imagination and effort of individuals. We are often working on problems that are only indirectly perceived by everyone else, and each is only working on Apache part-time.

We've had at least six different technical leaders within the HTTP server project over the past fourteen years, each of whom led the group in implementing massive improvements to the software in relatively short periods of time and then moved on to other problems within or outside the ASF. We have designed an organization that is able to make use of the vast numbers of volunteers on the Internet, and yet remain cohesive enough to retain its unique project culture and freedom to innovate.

I think what I least expected to happen is that I would still be involved in the project after all these years. I have been less involved in the code than I was in the past, but I still managed to fix a few issues this year and look forward to trying out a massive redesign in the coming year.

Over the years I've stretched into the various roles of just being one of the hackers to becoming the standards cop, from resolving technical conflicts into codifying the voting rules, from people management into project historian, and from release manager into semi-retirement status (only to come back again). To incorporate the ASF, I had to become familiar with corporate law and learn to communicate with corporate lawyers, create bylaws and a board to enforce them, learn how to be a Chairman, and teach a bunch of techies how to use the organization to make non-technical decisions in a business-like manner, preferably without turning the foundation into another work environment. The 2.0 licensing work required an understanding of copyright and patent law, starting Apache Jackrabbit meant learning how to be a mentor for contributors (and co-workers) new to Apache, and becoming chair of my old HTTP server project has been a challenge to motivate the revival of a project that had become complacent and overburdened by its own success. I am proud of each of those roles, though there is plenty of work still left to do.

Staying involved in Apache has proven to be the most consistent educational experience of my career -- I learn something new every week, especially when new volunteers enter the project and add their own perspectives and experience.

ASF's software has been integrated in endless commercial offerings. How should interested companies give back to the ASF? Day's approach of paying wages for open source developers is only one way. Straight payments, sponsoring events, but also opening patents or evangelizing the open source idea come to my mind. What are your thoughts on this?

First, I always encourage companies in our industry to hire the developers on Apache projects. How could they possibly go wrong in hiring a person who develops software for fun and has already proven their ability to produce peer-reviewed quality software as part of a self-selecting and meritocratic team?

However, there are many other ways to directly contribute to the Apache Software Foundation that are explained on the ASF website. The ASF is a nonprofit US corporation (IRS 501(c)(3) charity), so individual donations are often tax-deductible. In addition, we have a corporate sponsorship program that contributes the bulk of funds needed to run our Internet infrastructure and the other non-sexy parts of running a major foundation.

In terms of evangelizing open source, I actually encourage folks to do that within other organizations, such as OSI or the EFF, rather than at Apache. I think what Apache does best is to lead by example rather than by publicity or marketing. We do that by producing collaborative software development projects in a safe and consistent fashion, by producing software that is better than any single person or company could produce on its own, and by doing all that while having fun.

Bertrand Delacretaz

Thanks for your questions. Please let me note that the answers below are my personal opinion, and do not necessarily represent an "official" position of the ASF. Nobody voted +1 on those yet ;-)

Bertrand, like most of Day's engineers you are a long-time Apache committer. Currently, you are most active in the Apache Sling project. To what extent would you say the "Apache way" of software development influences development within Day in terms of process and product?

The Apache Way is very present in our development team's way of working, at Day. We use similar collaboration tools internally as in our Apache projects (source code control, issue trackers, mailing lists, wikis, continuous integration, etc.), decision-making principles are similar (including the "+1 / +0 / -1" way of voting), and in general all the necessary project information is available in self service mode for our developers. No need to run around the office to find out who wrote a particular piece of code or why it is needed - all this info is available in the "central knowledge base" that these collaboration tools manage without much intervention on our part. Among other benefits, this allows us to work efficiently from anywhere and at any time. We can also make much better use of face-to-face meetings, when they happen, without needing to first exchange that basic day-to-day information which is flowing automatically all the time.

You are one of the nine members of the board of directors and that job probably gets you to look at many aspects of Apache projects. What are the common struggles for projects and what are the upcoming challenges for the ASF as a whole?

I would say that, as a general rule, the ASF's projects have many more successes than struggles. Most of our projects are doing very well, thanks to a very good understanding, among developers and users, of what it takes for their communities to thrive. The overall well-being of our projects has improved in the last few years, in my opinion, thanks to developers getting more familiar with the open source way of working. That's remarkable, considering people who collaborate in our projects often come from different cultural backgrounds, speak different languages and usually meet only rarely, if ever. That would usually be a recipe for failure, but our communities work very well!

There are challenges related to our growth, and to the fact that the foundation has grown into a federation of projects that don't always know each other very well. It is the members' and the board's duty to make sure our values are actively promoted in all of the ASF's projects, and so far that has worked quite well. But staying true to our values despite the kind of growth that we're experiencing does require constant vigilance and regular adjustments to the way we communicate and share those values.

The ASF has grown from 200 committers to about 2000 (according to Wednesday's State of the Feather at ApacheCon). Do you expect this trend of rather strong growth to continue? What would you consider the limits of organizational scalability of the foundation - if there are any?

The trend will certainly continue, and the federated way in which the ASF is structured makes it possible to grow to larger numbers. The board of directors currently reviews about 20-30 project reports every month, to keep the oversight, and that works as we do trust our projects, try not to interfere with them unless really needed, and ask them for reports on specific topics (mostly community and development events, and issues that might require board attention). The members of the foundation (about 300 of them) do a very good job in keeping an eye on things at all levels and raising alarms when needed, which is not very often.

We are making some minor organizational changes, like hiring outside help for administrative tasks, to cope with this growth, but from a structural point of view the ASF is ready to grow more. As long as we have enough active members that we can trust, and sustainable mechanisms to allow the board of directors to keep the oversight as we have now, we should be fine.

There are some Apache projects that collaborate with Google's summer of code program. Do further plans exist within the ASF to assist in the education of the young about open source development?

We don't have specific plans, as far as I know, but the people who lead the Summer of Code effort in the ASF are members of the Press Relations Committee, which is also tasked with outreach in general. I have talked to many of our members and committers who are willing to help youngsters get on board and understand our way of working, so we can probably expect more activity in this area in the future.

Projecting your views of the ASF into the future, say, 5-10 years: what do you think will change, what will stay the same?

The ASF projects are mostly about web infrastructure, and I don't see that changing soon. Like developers in the last few years, I think more companies will learn how to collaborate efficiently with the ASF, while respecting our basic principles, and that will help them and us work together in a sustainable way in the long term.

The Apache Way is here to stay, even though tools will evolve; distributed source code control is probably the next important thing that will happen in our toolset, and although that will change some aspects of how we work, I don't expect a revolution.

It is impossible to say exactly how the ASF will look in ten years, but I'm quite convinced that we will stay true to our basic principles, and continue to be a successful neutral ground for people and companies to collaborate on creating great software, often with a global impact.

All this looks like a bright future, and the amount of dedication that I see in so many volunteers involved in the ASF gives me every reason to be optimistic about it.

Hello world


This blog shall be about experiences and musings resulting from my daily work with the products of Day Software, such as CQ and CRX.

ApacheCon plans


It’s ApacheCon time again. I’ll be flying to Amsterdam later today, and will probably be pretty busy for the entire week. Some highlights:

Monday

  • Maven meetup. I’ll probably arrive at the conference hotel just in time for the Maven meetup, where I’m hoping to catch up with the latest news from the Maven land.

Tuesday

  • Git hacking. During the Hackathon on Tuesday I hope to get together with Grzegorz and anyone else interested in setting up git.apache.org.
  • Commons Compress. There’s some useful code in the Commons Compress component that I hope to use in Apache Tika. If I have time during the Hackathon I want to help push the component towards its first release.
  • CMIS / Chemistry update. I’ve been meaning to check out the CMIS code that Florent Guillaume has been working on recently. I’d love to get the effort better integrated into Jackrabbit.
  • Commons XML. I’ve been gathering some JAXP utility code to a new XML library in the Commons sandbox. I hope to spend some time pushing more code there and perhaps discussing the concept with some interested people.
  • Juuso lab. I have lots of new ideas about RDF processing and Prolog. Hoping to turn those into working code.
  • Lucene meetup. Catching up with the latest in Lucene and telling people about Tika and the Lucene integration we have in Jackrabbit. Unfortunately I only have one hour to spend here before the JCR meetup starts.
  • JCR meetup. Starting at 8pm, the JCR meetup is one of the key highlights of the conference for me. We’ll be covering stuff related to the Jackrabbit and Sling projects. You’re welcome to join us (sign up here) if you’re interested in the latest news from the content repository world.

Wednesday

And lots of other stuff, too much to keep track of…

Ready for ApacheCon Europe 2009



I’ll be giving three talks next week at ApacheCon, on OSGi, Apache Sling and Open Source collaboration tools.

Ruwan Linton’s OSGi talk, which is scheduled after mine on Wednesday, also presents practical experiences with OSGi. I’m looking forward to comparing our experiences, and people should probably attend both talks to get the whole picture.

I’m also very much looking forward to meeting new people and old friends there, including the Jackrabbit/Sling folks at Tuesday’s JCR/Jackrabbit/Sling meetup.

Before that I’ll be in Rome for a meeting of the IKS project, talking about requirements and use cases for semantically enhanced CMSes. Looks like a packed but very interesting week ahead – lots of context switches though ;-)

Update: forgot to mention Carsten Ziegeler’s Embrace OSGi – A Developer’s Quickstart presentation, which comes right before mine – attending that one will also help put mine in context, as I won’t cover the basics of OSGi.

It is okay to use POST

Tim Bray’s article on RESTful Casuistry revisits an odd meme in the REST debates that I’ve been meaning to discredit for a while.

Some people think that REST suggests not to use POST for updates. Search my dissertation and you won’t find any mention of CRUD or POST. The only mention of PUT is in regard to HTTP’s lack of write-back caching. The main reason for my lack of specificity is that the methods defined by HTTP are part of the Web’s architecture definition, not the REST architectural style. Specific method definitions (aside from the retrieval:resource duality of GET) simply don’t matter to the REST architectural style, so it is difficult to have a style discussion about them. The only thing REST requires of methods is that they be uniformly defined for all resources (i.e., so that intermediaries don’t have to know the resource type in order to understand the meaning of the request). As long as the method is being used according to its own definition, REST doesn’t have much to say about it.

For example, it isn’t RESTful to use GET to perform unsafe operations because that would violate the definition of the GET method in HTTP, which would in turn mislead intermediaries and spiders.  It isn’t RESTful to use POST for information retrieval when that information corresponds to a potential resource, because that usage prevents safe reusability and the network-effect of having a URI. But why shouldn’t you use POST to perform an update? Hypertext can tell the client which method to use when the action being taken is unsafe. PUT is necessary when there is no hypertext telling the client what to do, but lacking hypertext isn’t particularly RESTful.

POST only becomes an issue when it is used in a situation for which some other method is ideally suited: e.g., retrieval of information that should be a representation of some resource (GET), complete replacement of a representation (PUT), or any of the other standardized methods that tell intermediaries something more valuable than “this may change something.” The other methods are more valuable to intermediaries because they say something about how failures can be automatically handled and how intermediate caches can optimize their behavior. POST does not have those characteristics, but that doesn’t mean we can live without it. POST serves many useful purposes in HTTP, including the general purpose of “this action isn’t worth standardizing.”

I think the anti-POST meme got started because of all the arguments against tunneling other protocols via HTTP’s POST (e.g., SOAP, RSS, IPP, etc.). Somewhere along the line people started equating the REST arguments of “don’t violate HTTP’s method definitions” and “always use GET for retrieval because that forces the resource to have a URI” with the paper tiger of “POST is bad.”

Please, let’s move on. We don’t need to use PUT for every state change in HTTP. REST has never said that we should.

What matters is that every important resource have a URI, therein allowing representations of that resource to be obtained using GET.  If the deployment state is an important resource, then I would expect it to have states for undeployed, deployment requested, deployed, and undeployment requested. The advantage of those states is that other clients looking at the resource at the same time would be properly informed, which is just good design for UI feedback. However, I doubt that Tim’s application would consider that an important resource on its own, since the deployment state in isolation (separate from the thing being deployed) is not a very interesting or reusable resource.

Personally, I would just use POST for that button. The API can compensate for the use of POST by responding with the statement that the client should refresh its representation of the larger resource state. In other words, I would return a 303 response that redirected back to the VM status, so that the client would know that the state has changed.
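
As a rough illustration of that last suggestion, here is a minimal sketch of a POST handler that answers with 303 See Other, using the HTTP server bundled with the JDK. The URIs (/vm/42 and /vm/42/deploy), the class name and the method names are hypothetical and are not taken from Tim's API; the deployment itself is left as a comment.

```java
import com.sun.net.httpserver.HttpExchange;
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.net.InetSocketAddress;

public class DeployButtonSketch {
    // Hypothetical URIs, chosen only for this illustration.
    public static final String STATUS_URI = "/vm/42";
    public static final String ACTION_URI = "/vm/42/deploy";

    /** Status code for a request to the action URI: 303 after a POST, 405 otherwise. */
    public static int statusFor(String method) {
        return "POST".equals(method) ? 303 : 405;
    }

    /** Handles the "deploy" button and points the client back at the VM status. */
    public static void handleDeploy(HttpExchange exchange) throws IOException {
        int status = statusFor(exchange.getRequestMethod());
        if (status == 303) {
            // ... trigger the deployment here (omitted in this sketch) ...
            // 303 See Other: the client should GET the larger resource again.
            exchange.getResponseHeaders().set("Location", STATUS_URI);
        } else {
            exchange.getResponseHeaders().set("Allow", "POST");
        }
        exchange.sendResponseHeaders(status, -1); // -1 means: no response body
        exchange.close();
    }

    /** Starts the server on the given port (0 picks a free port). */
    public static HttpServer start(int port) throws IOException {
        HttpServer server = HttpServer.create(new InetSocketAddress(port), 0);
        server.createContext(ACTION_URI, DeployButtonSketch::handleDeploy);
        server.start();
        return server;
    }
}
```

A client that follows the redirect ends up doing a plain GET on the VM status, which is exactly the refresh described above.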

[LOTD] Joerg Hoh's CQ Blog

Today's Link Of The Day is Joerg Hoh's excellent Communique blog "Things on a content management system - Tips and tricks for Day Communique".

In his latest post Joerg publishes a little Perl script to help determine average response times (and provide a visual indicator of the system's performance status).

Although the blog started only recently there is already a good number of performance-related posts. All of them are worth the read since they clearly reflect Joerg's first hand experiences in tuning CQ (so do his hands-on sys mgmt topics like: how to lock out the users).

@davidnuescheler you might want to check out Joerg's take on your 5 rules for performance tuning.

The CMS vendor meme


Yesterday my colleague Michael Marth launched a cool CMS vendor meme, challenging other vendors to self-evaluate their products according to the we-get-it-checklist suggested by Kas Thomas.

Many vendors have already responded. Except those who don’t know about Twitter or blogs, of course. You don’t want to buy from them anyway ;-)

To help people find pages related to this meme around the web, I suggest adding the string 9c56d0fcf93175d70e1c9b9d188167cf to such pages, so that a Google query can find them all.

As I said on the dev.day.com post, this number is the md5 of some great software. The first person to tell me which file that is gets a free beer or equivalent beverage!

Introducing the 'CMS Vendor Meme'

In the last few weeks the "7 things about me" meme has been all over the blogosphere (here are some random examples). In case you missed it, here's how it works: a blogger reveals 7 previously unknown things about himself. That permits him to "tag" other bloggers, i.e. to publicly challenge them to reveal 7 things as well. If the tagged blogger accepts the challenge, he:

  1. blogs about the 7 things he wants to reveal,

  2. provides a backlink to the blogger that tagged him,

  3. and tags some other bloggers he wants to challenge.

It is in this great tradition that we herewith introduce: the

"CMS Vendor" Meme

Tadaa!

The rules:

  • A CMS vendor is challenged to honestly answer all items on the "Reality checklist for vendors" suggested by CMSWatch's Kas Thomas (aka the "we-get-it checklist for vendors").

  • If possible, the vendor should supply screenshots, links or other means to make it easy to verify the answers.

  • The answers also need to be supplied in a short form of one to three stars (denoting "no", "sort-of", "yes").

  • Answering all questions on his blog allows the vendor to tag some other WCMS vendors.

  • A tagged vendor should provide a link back to the blog that tagged him.

So here we go:

1. Our software comes with an installer program.

Sure, one double click: installed. One folder removed: uninstalled. There are no strings attached. (see the screencast here)

2. Installing or uninstalling our software does not require a reboot of your machine.

Of course not. Installing is one double-click only, no reboot needed.

3. You can choose your locale and language at install time, and never have to see English again after that.

Well, there is really no install time. But after that, you can change the language in the preferences.

Hmm. Not perfect. Will settle for "sort-of".

4. Eval versions of the latest edition(s) of our software are always available for download from the company website.

CRX (our content infrastructure platform upon which our WCM, DAM, and Social Collaboration applications are built) is available for free download by developers for non-production use. Our CQ WCM, DAM, and Social Collaboration applications are available upon request under an eval license. 2 stars only.

5. Our WCM software comes with a fully templated "sample web site" and sample workflows, which work out-of-the-box.

Yes

6. We ship a tutorial.

Yes, it is part of the help files.

7. You can raise a support issue via a button, link, or menu command in our administrative interface.

Ermhh.... Good idea :). Well, in CRX there is a direct link to the mailing list where support is provided. But I'll settle for 1 star.

8. All help files and documentation for the product are laid down as part of the install.

Yep.

9. We run our entire company website using the latest version of our own WCM products.

Of course (we updated www.day.com to CQ5 on the day of the release). dev.day.com is running on CRX and Sling at the moment (will be running on CQ SocialCollab when that's released)

10. Our salespeople understand how our products work.

As part of our product launch process, no product goes GA without our internal Sales, Consulting, and Support staff having downloaded, installed, and trained on the new product capabilities.

I know this for sure because I did parts of the training.

11. Our software does what we say it does.

;)

12. We don't charge extra for our SDK.

There is no extra charge for CQDE (our IDE for CQ development). For lower level JCR development anyone can download our Eclipse plugin. Finally, CQ5 is built upon Sling which is open source, hence the APIs are free and open.

13. Our licensing model is simple enough for a 5-year-old to understand.

Make it a nine year old, but we are working on it... however, it is much easier than the usual industry level enterprise pricing... A "sort-of".

14. We have one price sheet for all customers.

Same pricelist, different currencies though.

15. Our top executives are on Skype, Twitter, or some similar channel, and: Feel free to contact them directly at any time.

Absolutely, all email addresses are on the web. The addresses also work as Jabber/GTalk addresses. Moreover, there is:

David Nuescheler (CTO): Slideshare, Xing, LinkedIn, david.nuescheler on Skype

Roy Fielding (Chief Scientist): Slideshare, @fielding on Twitter, Blog, LinkedIn

Kevin Cochrane (CMO): @kevinc2003 on Twitter, LinkedIn, kevinvcochrane on Skype, YIM: kevinvcochrane, Facebook: kevinc2003


So our final score is:

 

And we are tagging: OpenText, Coremedia, Interwoven, Vignette (where's your blog?), Fatwire (where's your blog?), Nuxeo, Magnolia and Tridion (where's your blog?)

Come on, guys. Don't be shy.

Update: adding the meme ID 9c56d0fcf93175d70e1c9b9d188167cf suggested by Bertrand. Google is our friend.


Screencast: installation, cluster setup, backup and restore in CQ5

How many times do you set up server software? Once! ... hmm.. wait. On my laptop, on the dev server, the load testing setup, live servers. We think that tasks such as installation, updates, backup and cluster setup should be easy, so you can do them anytime you need to (and you will be surprised how many times that is, once it's just a few clicks). In the unlikely event that you need to recover from a total disaster - such as a data loss or hardware defect - we added a few more tweaks to help you out. But what do I write here.. just watch the screencast for yourself....

Update: the screencast is valid for CQ5 and CRX, so if you want to give those features a quick try, download CRX.

Tweet your app: a 140 characters web app

It was only yesterday evening that I became aware of the "140 Characters Webapp Challenge" - just when it was closed. The challenge consisted of, well, writing a web app with a 140 characters code base (140 characters is the limit imposed on Twitter's status updates).

I could not resist giving it a try anyway. Naturally, using Sling. So here's the result: a micro-Twitter app that lets you update your status and displays the last status update in 136 characters (so there is even some space left for commenting the code):

<form method="POST"><input type="hidden" name=":redirect" value="t.html"><input name="w"><input type="submit"></form><%=currentNode.w%>

(the line needs to be put in /apps/t/html.esp and there needs to be an unstructured node at /content/t. The generated html is not for purists.).

If you can beat that (or have another funky sub-140 characters Sling app): tweet the code to @daysoftware or leave a comment with a link.


(from Geek and Poke)

Looking for use cases for a semantically enhanced CMS


Day is participating as an industrial partner in the Interactive Knowledge project, which aims to provide an open source technology platform for semantically enhanced content management systems.

We are starting to collect use cases for a semantically enhanced CMS - although I’m not 100% sure what semantically enhanced means (and I assume that means different things to different people), I have started with use cases like the following:

When I drop an image of a house in my content, the system allows me to see images of similar houses, and pages that talk about houses.

When I start writing a new piece of content, the system optionally shows me similar content that’s already in the repository, even if written in other languages.

The system allows me to formulate queries like “recent pages that talk about houses to rent in the French part of Switzerland”.

If you have additional ideas for such use cases, or examples of systems that provide such features, I’m all ears!

Portals Meetup at ApacheCon Europe

If you’re attending the ApacheCon Europe in Amsterdam in two weeks and if you’re interested in portals or portal related stuff (and I guess we all are interested in portals, right?), then you should join the portals meetup and add yourself here.
Even if you’re not into portals but are interested in visiting a zoo, there are a lot of hippos and dinosaurs (yes, really!) there. So it’s fun for everyone :)

Day talks at ApacheCon EU 09

This year's ApacheCon EU will take place in Amsterdam in two weeks. If you are there, check out the talks given by Day's engineers:

Bertrand Delacretaz: Rapid JCR applications development with Sling: Apache Sling is a scriptable applications layer, built on the Apache Felix OSGi framework, that provides a RESTful interface to a JCR content repository. In this talk, we'll see how Sling enables rapid development of JCR-based content applications, by leveraging the JSR 223 scripting framework along with the rich set of OSGi components provided by Sling. We will create a simple application from scratch in a few minutes, and explain a more complex multimedia application that does a lot with few lines of code. This talk will help you get started with Sling and understand how the different components fit together.

Open Source Collaboration Tools are Good For You!: What are the core requirements for a set of team collaboration tools? Looking at how ASF project communities collaborate online, we have identified four core drivers that help these projects succeed. We will show how the collaboration tools used by the ASF can allow any project team to move from an "ask around the office" collaboration model to our efficient "distributed self service information" model, while focusing on those core drivers to avoid being distracted by the tools themselves. Our analysis will help you estimate the effort and expected benefits of such a move.

Tales from the OSGi trenches: In this talk we share our experience with the Apache Felix OSGi framework, used for a major rewrite of Day's family of content management products. After more than two years working with OSGi, the impact on our products, developers, customers and service people is very high, in a positive way. OSGi is no silver bullet either. The extreme modularization and dynamic service deployment features of OSGi make our products much more robust and maintainable, but the costs associated with changing people's way of thinking about code and modules, and with testing and debugging highly dynamic systems, must not be underestimated. Based on real-life code samples, we will show how OSGi is used at several levels in our products, from low-level interactions with the framework to very simple creation of (compiled or scripted) services. We will also present some of the automated testing techniques used in our project. Sharing our experience will help you decide if OSGi is for you, and more importantly at which level you should use it.

Regarding OSGi: if you use OSGi and have some (positive or negative) feedback to share, Bertrand is eager to hear from you.

Carsten Ziegeler: Embrace OSGi - A Developer's Quickstart: The first choice for highly modular, dynamic, and extensible applications is the OSGi technology. The theory sounds very tempting, but what about the real world? Starting with the basics of OSGi, this session focuses on practical examples, tools, and procedures for a rapid adoption of OSGi in your own projects. Learn how to avoid the typical traps and how to use the advantages of OSGi.

Meet Carsten at the Portals Meetup.

Felix Meschberger: Scripting your Java Application with BSF 3.0: One very important functionality of modern extensible applications is support for developing such extensions in any scripting language. Many scripting languages available today provide some sort of Java integration, but each integration is different, making it very difficult for the vendor of the application to support more than one scripting language. Enter the Java Scripting API as defined in JSR-223. This API provides support for standardized integration of scripting languages in Java applications. Bindings already exist for a number of scripting languages such as Groovy, JavaScript, Python, Ruby, Tcl. This session will show how easy it is to add scripting support to a Java application using the Java Scripting API and thus support whatever scripting language the user of the application likes to use. Practical demonstrations using Apache BSF 3.0 as the Java Scripting API implementation and Apache Sling as a Java application to be scripted will show how easy it is to add scripting support and to add scripting languages quickly and at runtime without even restarting the application.

Jukka Zitting: Content storage with Apache Jackrabbit: Hierarchical database or a transactional file system? Apache Jackrabbit combines some of the best features of relational databases and traditional file systems to implement a flexible high level storage solution for a wide range of applications. Jackrabbit, a fully conforming implementation of the Content Repository for Java Technology API (JCR), comes packed with features like full text search, versioning, and transactions. Built-in HTTP mappings with WebDAV extensions make content stored in Jackrabbit easily accessible in network environments. This presentation introduces you to the key concepts of JCR and shows how to use Apache Jackrabbit and related projects to build various types of content applications like wiki and blog engines, email archives or image galleries. Special emphasis is placed on a data-first approach to content design that helps make your applications extensible with little or no extra code.

Jackrabbit SVN Disco

Here's a useless but nice-to-look-at visualization of the Apache Jackrabbit svn commits. It was generated using "code_swarm". The dots denote committed files and the names are the developers' login names:

This visualization, [...], shows the history of commits in a software project. [...] Both developers and files are represented as moving elements. When a developer commits a file, it lights up and flies towards that developer. Files are colored according to their purpose, such as whether they are source code or a document. If files or developers have not been active for a while, they will fade away. A histogram at the bottom keeps a reminder of what has come before.

The Quicktime rendition looks best, but here's also a Flash version and an avi. Be warned if you are in an office: there is music (creative commons track from SuperNova)

Conference Season - ApacheCon Europe and JAX

In just two weeks the conference season for 2009 starts. This spring I’ll speak at the ApacheCon in Amsterdam about OSGi and some of the stuff we have at Apache for using OSGi within your own project.
Later, in April, it’s JAX time again - and again with a new location. My talk for the JAX is about the great Apache Sling project (JCR, REST, OSGi etc.). I’m happy to see that OSGi is one of the key topics for the conference with a lot of famous people there :)
Going to these conferences for several years now, I still think that the chance to meet people and talk with them is one of the greatest things - and especially these two conferences provide excellent opportunities. So don’t miss them! See you in Amsterdam and/or Mainz, Germany :)

LinkedHashMap's hidden (?) features

Recently I discovered two very nice features of the java.util.LinkedHashMap: accessOrder and removeEldestEntry(Entry). These features combined let you implement simple LRU caches in under two minutes.

accessOrder

The accessOrder flag is set when creating the LinkedHashMap instance using the LinkedHashMap(int initialCapacity, float loadFactor, boolean accessOrder) constructor. This boolean flag specifies how the entries in the map are ordered:


accessOrder=true

The elements are ordered according to their access: when iterating over the map, the least recently accessed entry is returned first and the most recently accessed entry is returned last. Only the get, put, and putAll methods influence this ordering.

accessOrder=false

The elements are ordered according to their insertion. This is the default if any of the other LinkedHashMap constructors is used. In this ordering read access to the map has no influence on element ordering.
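
To make the difference concrete, here is a small sketch (the class and method names are made up for this example) that fills two maps the same way, reads one key and then lists the keys in iteration order. In access-order mode the iteration runs from the least recently accessed entry to the most recently accessed one, so the key that was just read moves to the end.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class AccessOrderDemo {
    /** Inserts a, b, c, then reads "a" and returns the keys in iteration order. */
    public static List<String> keysAfterReadingA(boolean accessOrder) {
        Map<String, Integer> map = new LinkedHashMap<>(16, 0.75f, accessOrder);
        map.put("a", 1);
        map.put("b", 2);
        map.put("c", 3);
        map.get("a"); // a plain read; it only reorders the map in access-order mode
        return new ArrayList<>(map.keySet());
    }

    public static void main(String[] args) {
        System.out.println(keysAfterReadingA(true));  // [b, c, a]
        System.out.println(keysAfterReadingA(false)); // [a, b, c]
    }
}
```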



removeEldestEntry(Entry)

The second feature of interest is the removeEldestEntry(Entry) method. This method is called with the eldest entry whenever an element is added to the map. Eldest means the element which is returned first when iterating over the map. So the notion of eldest is influenced by the accessOrder set on the map. In its default implementation, removeEldestEntry just returns false to indicate that nothing should happen. An extension of the LinkedHashMap may override the default implementation to do whatever is required:


  • If the implementation decides to remove the eldest element for any reason, say a size limitation, it just returns true and the eldest element is removed from the map.

  • The implementation may also decide to modify the map itself in some way or another. But in this case, the implementation should return false, otherwise the eldest element will still be removed.




A simple LRU Cache

Taking the two features together, a very simple LRU Cache may be implemented in just a few lines of code:


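import java.util.LinkedHashMap;
import java.util.Map;

public class LRUCache<K, V> extends LinkedHashMap<K, V> {
    private final int limit;

    public LRUCache(int limit) {
        // default initial capacity and load factor; true = access order
        super(16, 0.75f, true);
        this.limit = limit;
    }

    // called after each insertion; returning true evicts the eldest entry
    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > limit;
    }
}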


The mechanism is very easy: The LRUCache(int) constructor initializes the map with the default initial size and load factor and sets the map into accessOrder mode. The removeEldestEntry method just checks the current map size (after the addition of a new entry) against the limit and returns true if the limit has been exceeded.

A real world implementation would of course have to check and handle the limit value on the constructor.
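
As one possible way to handle the limit, the sketch below rejects limits smaller than 1 and adds a short usage example. The class name BoundedLruCache and the choice to throw an IllegalArgumentException are assumptions made for this illustration, other policies (such as falling back to a default limit) would work just as well.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class BoundedLruCache<K, V> extends LinkedHashMap<K, V> {
    private static final long serialVersionUID = 1L;
    private final int limit;

    public BoundedLruCache(int limit) {
        super(16, 0.75f, true); // access-order mode
        if (limit < 1) {
            throw new IllegalArgumentException("limit must be at least 1, got " + limit);
        }
        this.limit = limit;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > limit;
    }

    public static void main(String[] args) {
        BoundedLruCache<String, String> cache = new BoundedLruCache<>(2);
        cache.put("a", "A");
        cache.put("b", "B");
        cache.get("a");      // "a" is now the most recently used entry
        cache.put("c", "C"); // exceeds the limit, so "b" (least recently used) is evicted
        System.out.println(cache.keySet()); // [a, c]
    }
}
```

Note that, like LinkedHashMap itself, this cache is not synchronized, so concurrent use needs external locking or a wrapper such as Collections.synchronizedMap.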


To see a LinkedHashMap based LRU Cache in action, have a look at the BundleResourceCache.BundleResourceMap. This implements a simple entry cache to speed access to OSGi Bundle entries. To not waste memory, the size of the cache is limited.

Google Summer Of Code 2009 - Real Soon Now!


The 2009 logo is fantastic, isn’t it? Flower power is not dead apparently.

Google Summer Of Code 2009 starts soon: open source organizations can sign up starting March 9th (a very important date planet-wide anyway), and students can sign up starting March 23rd.

An A4 flyer is available to display in your school, or anywhere geeks graze.

Assuming we’re accepted as an organization, projects of the Apache Software Foundation will be listed on our wiki. I’m probably going to suggest one or two Sling-related projects.

Philip Johnson’s video presentation (below) gives a good overview of the program and of its requirements, for students.


HATEOAS in 3 lines

Stefan Tilkov brilliantly sums up the out-of-band information problem in REST's HATEOAS constraint on the REST-discuss group:

Given the representation contains

<link rel="some-concept" ref="/some-uri">

you don't hardcode the string "/some-uri" into your client, but rather the string "some-concept".
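
A minimal sketch of what that means for client code: the client knows only the relation name and looks up the URI in whatever representation the server sent. The class and method names are hypothetical, the attribute name "ref" simply mirrors the snippet quoted above, and the regular expression is a shortcut for this example (a real client would use a proper parser for its media type).

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LinkRelClient {
    // Matches <link rel="..." ref="..."> exactly as written in the quoted snippet.
    private static final Pattern LINK =
            Pattern.compile("<link\\s+rel=\"([^\"]*)\"\\s+ref=\"([^\"]*)\"\\s*/?>");

    /** Collects all links of a representation, keyed by their relation name. */
    public static Map<String, String> linksByRel(String representation) {
        Map<String, String> links = new HashMap<>();
        Matcher m = LINK.matcher(representation);
        while (m.find()) {
            links.put(m.group(1), m.group(2));
        }
        return links;
    }

    /** The only string the client hardcodes is the relation name. */
    public static String uriFor(String representation, String rel) {
        String uri = linksByRel(representation).get(rel);
        if (uri == null) {
            throw new IllegalStateException("no link with rel=" + rel);
        }
        return uri;
    }

    public static void main(String[] args) {
        String today = "<link rel=\"some-concept\" ref=\"/some-uri\">";
        String tomorrow = "<link rel=\"some-concept\" ref=\"/moved-elsewhere\">";
        System.out.println(uriFor(today, "some-concept"));    // /some-uri
        System.out.println(uriFor(tomorrow, "some-concept")); // /moved-elsewhere
    }
}
```

If the server moves the resource, the client keeps working without a code change, because it never knew the URI in the first place.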

Sling news roundup: OpenID, Scala, and more

Here are some Sling news highlights from the last weeks:

There have been two interesting contributions in the authentication and authorization area:

It would be cool to see them combined such that users logged in through OpenID became repository-based users on the fly (rather than be known as a technical user to the repository).

Notable news from the "Give them enough rope" department include:

Last, but not least I was delighted to see that Sling and JCR are being taught in a university course in Grenoble (thanks to @keepthebyte for the tip).

Interview with Roy Fielding

Day's Chief Scientist Roy Fielding will speak at the SofTech 2009 conference in Milan (26 March). In this context Roy has been interviewed by Alessandro Giacchino about REST, SOA, JCR, and content management.

The Italian version of the interview has been published in the ToolNews magazine (here's a PDF of the article). Find the English version below (as a teaser: the interview contains the first definition of the term "SOA" that I can grok).

First of all, you contributed to the definition of HTTP, REST and URI. What took you to Day Software, a small Swiss company working on content management solutions? Are you still working on the Java Content Repository, or are you working on a new concept or project?

Day Software has a much more international breadth and perspective than most companies. I met David Nuescheler, Day's CTO, in December 2001, when he was first exploring the idea of a standard Java API for content repositories and wanted my advice on the JSR process. After talking over dinner, it was clear to both of us that we shared a vision of how the Web architecture could be utilized by software and the role that standards development can play in establishing future infrastructure.

The REST architectural style is about making use of standard data formats and an "everything is a resource" conceptual view in order to tie together network-based applications into a reusable system of long-term resources.

Content management software is about managing the content of standard data formats and an "everything is content" conceptual view to tie together all of the content generated by an enterprise into a reusable system of long-term assets.

It is, in fact, exactly the same problem space with only a minor change in terminology and a realization that the software is content too. A content management system is an ideal RESTful application development environment if it remains true to the architecture of the Web.

I helped David with the process associated with starting and evangelizing the Java Content Repository (JCR) API in its first two years, but the technical work of JCR has been accomplished by David's work and that of our outstanding developers in Basel. My role has been to grease the gears and make it possible for Day to expand that work into open development projects at the Apache Software Foundation.

My current work is split between representing Day at various conferences/meetings around the world and developing the next generation of Web standards. I am currently revising the standard for HTTP within the IETF and am chair of the Apache HTTP server project, an important component not just for all of our customers but for the entire Web. I am also working on a longer-term project to create a new protocol that may eventually replace HTTP.

According to some analysts, instead of booming on the market and evolving to 2.0, SOA seems to be in a "hold" position, slowing down all development and investment in it. Do you agree with this? If yes, why do you think this happened? If no, what do you think the next drivers will be? Do you see any relation between SOA and Cloud Computing?

"SOA" has become all things to all people, and thus has no useful definition other than "stuff that you can access on servers." I don't think there has been any reduction in services. If anything, I find that there has been a huge move by companies onto outsourced services such as Gmail, Salesforce, and Amazon. I expect that trend to continue.

What has been moved to a "hold" position is the antiquated architecture of tightly coupled services that was promoted as "Web Services" (in spite of the fact that it had nothing to do with the Web) and SOAP. I think customers finally reached the end of their tolerance for software pixie dust frameworks that did not interoperate across multiple vendors. The architectures were insufficiently constrained for interoperability and reuse.

Cloud computing is another term that is often being misused. In my opinion, it means "outsourcing your data centers." There is significant value in doing so for many customers that do not want to be in the business of managing servers, power, etc. It also represents a significant challenge to enterprise software developers, since we need to incorporate secured access, remote deployment, and high-availability management into the core software architecture.

That is why Day's products have an emphasis on easy installation and automatic clustering support: we see the cloud computing environment to be the dominant platform for our customers, whether that "cloud" be in the form of a third-party, like Amazon Web Services, or an internally managed network like those currently being used by some of our larger customers.

Most of your projects have been around Java and Open Source. Now, it happens that one of the best supporters of REST is Microsoft, while you are working for a company that "uses" Open Source but sells commercial products. How do you think the borders between these two totally different go-to-market models will move in the future?

Day Software doesn't just "use" open source. We actively create and participate in the major infrastructure projects that are important to our customers. For example, Apache Jackrabbit, Apache Sling, and Apache Felix are major components of CQ5 (our flagship product for web content management). Likewise, Apache HTTP Server and Apache APR are used by all of our customers, often without knowing it, and our active participation in these communities allows us to leverage the experience and brainpower of hundreds of developers.

We develop our open source products under the banner of the Apache Software Foundation, rather than hosting our own open source projects, because we have found that a richer community of software developers can be formed around a nonprofit foundation like Apache. It allows everyone, even some of our competitors in the content management industry, to participate on an equal footing and ensures that the software won't be subject to any single company's moods. In turn, a larger community means more bugs are found and fixed quickly, more features are added by others, and more leverage is obtained for our commercial products that depend on open source.

The only software that Day has not open sourced is our application software, which is often highly integrated with specific customer needs. In addition to paying our salaries and supporting all of those open source infrastructure costs, keeping the application software private allows us to incorporate third-party products, such as components that integrate with legacy data sources and professionally designed user interface modules.

Microsoft has a quite different model. Microsoft gains much of its platform sales based on developer mindshare: their primary revenue stream is a proprietary infrastructure platform and the only way they can keep people buying that platform is to encourage developers to program for it. Microsoft's recent interest in REST is therefore just a reaction to where they see the developers moving -- it is a reaction to having lost the mindshare associated with Web Services.

What do you think about "Web 2.0", considering it in terms of Social Networking and new client paradigms (Ajax, Silverlight), where everything becomes "content" for information, transactions and relations? Is it ready to fulfill its promises, or should we wait for Web 3.0, 4.0 and so on? What will be "the next big thing"?

Web 2.0 is just a marketing phrase for the second wave in venture capital funding. I think that has clearly ended, at least within the Silicon Valley environment. The funny thing about these version numbers is that the "2.0" paradigms (AJAX, javascript, CSS, etc.) are all technologies invented back in 1997. It has only been the sad state of browser software that held back those advances for so long, and it only took one big company (Google, via Google Maps) to turn that around.

The Web architecture is currently on version 3.2, rapidly moving to 3.3, and my work is on 4.0. The biggest changes in the coming year will be in the nature of authentication systems and security, both of which haven't been improved significantly since 1995.

Did I miss the "right" question? Which one do you think it should have been?

I need to keep something in reserve for my presentation. See you in Milan.

The misuse of the term "RESTful" in the Rails community

Today I went to a talk at the local Ruby on Rails group. The speaker was quite clueful. He had even implemented his own DSL to describe his business problem. Obviously, the guy was not a noobie in Ruby.

However, what really turned me off was his usage of the word "RESTful". For him, it seemed to be a way to describe the inner workings of his application, like, say, "separation of concerns". RoR guys are generally not the most clueless people, but nobody in the audience challenged him about this. It seemed to be the generally accepted usage of the term in the Rails community.

This made me think that DHH and Rails have done two things to REST:
  • First, they greatly helped to evangelize the term "RESTful"
  • Second, they hijacked the meaning of the term and changed it from "architectural style" to "application architecture"
As it happens, I listened to a podcast from the Pragmatic Programmers on my way home. It was about the .NET Ruby implementation and, of course, Rails and consequently REST were brought up. One of the speakers said that he was only introduced to REST through Rails. He went on to explain REST in a way that confused the hell out of me, but the essence was along the lines of "http is good". If the Rails community is fuzzy about what REST is, people who get it second hand from them are as well.

I believe that part of the misunderstanding is that the term "architectural style" (as opposed to "architecture") is not understood well enough in the development community. However, Roy Fielding has written a brilliant post about the difference between an architectural style and an architecture: "On Software Architecture".
Web implementations are not equivalent to Web architecture and Web architecture is not equivalent to the REST style.
RESTful-Rails-people please have a look at that post.

PS: Ted Neward had some predictions for 2009 (as I silently predicted, nobody cared that I did not make any predictions for 2009). One of them just came to my mind (emphasis mine):
Roy Fielding will officially disown most of the "REST"ful authors and software packages available. Nobody will care--or worse, somebody looking to make a name for themselves will proclaim that Roy "doesn't really understand REST". And they'll be right--Roy doesn't understand what they consider to be REST, and the fact that he created the term will be of no importance anymore. Being "REST"ful will equate to "I did it myself!", complete with expectations of a gold star and a lollipop.

Open Source Collaboration Tools - at OpenExpo, Bern, April 2nd


The program for OpenExpo 2009 Bern is just out; I’ll be giving my "Open Source Collaboration Tools are Good For You!" talk on April 2nd.

I’ve given this talk a few times in various places already, and it often leads to interesting conversations; we’ll see if that works in Switzerland as well! I must not forget to mention that I can understand questions in German, as many people are more likely to ask questions in their native language.

Does OSGi work for you?


I’m looking for additional input for my "Tales from the OSGi trenches" talk, at ApacheCon EU 2009 next month in Amsterdam.

My main angle for this talk is how the move to OSGi changes the way developers and customers work. Day’s complete product line is based on OSGi (using Apache Felix and Apache Sling), and this has a tremendous impact on how our developers work. Users of our products, depending on the level at which they decide to interact with them, can also reap big benefits from OSGi’s modularity and service-oriented features.

However, while OSGi might look like a silver bullet on paper, rethinking modularity and services has an important impact on the way people work, and on how we test our systems.

For this talk, I intend to describe the impact that OSGi has on our ways of working, including the potential downsides, or misuses, of extreme modularity and extreme dynamic behavior of services and components.

I’d be very happy to include other people’s opinions (converging or not) in my talk, so let me know if you have similar experiences to share. Either in comments here, or by mail, bdelacretaz () apache.org. All contributions will be duly acknowledged, of course!