John Resig of jQuery fame has written an interesting article about Genetify, a JavaScript library by Greg Dingle for A/B testing websites. Wikipedia explains A/B testing as:
A/B testing, or split testing, is a method of advertising testing by which a baseline control sample is compared to a variety of single-variable test samples in order to improve response rates. A classic direct mail tactic, this method has been recently adopted within the interactive space to test tactics such as banner ads, emails and landing pages.
Significant improvements can be seen through testing elements like copy text, layouts, images and colors. However, not all elements produce the same improvements, and by looking at the results from different tests, it is possible to identify those elements that consistently tend to produce the greatest improvements.
In the context of a web page one might, for example, change the colors or the texts, display each variation to a subset of the site's visitors, and determine the most successful variant by the number of page views or items sold.
There are two things to note about Genetify. First, it takes this process to the client, i.e. the served HTML page already contains all possible variants and a particular variant is chosen on the client side. Second, over time the optimal variation will be shown more often than suboptimal versions. This is the "genetic" part (as in Genetic Algorithms).
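To illustrate what "shown more often over time" can mean, here is a minimal sketch of weighted random selection. It is not Genetify's actual algorithm; the function name `pickVariant`, the `weight` field, and the example weights are assumptions made up for this illustration.

```javascript
// Pick one variant with probability proportional to its weight.
// `random` is injectable so the choice can be made deterministic in tests.
function pickVariant(variants, random = Math.random) {
  const total = variants.reduce((sum, v) => sum + v.weight, 0);
  let threshold = random() * total;
  for (const v of variants) {
    threshold -= v.weight;
    if (threshold < 0) {
      return v.name;
    }
  }
  // Guard against floating-point rounding: fall back to the last variant.
  return variants[variants.length - 1].name;
}

// Hypothetical weights, e.g. derived from earlier goal counts. Keeping every
// weight above zero means unproven variants still get shown occasionally.
const handVariants = [
  { name: "rock", weight: 6 },
  { name: "paper", weight: 3 },
  { name: "scissors", weight: 1 },
];

console.log(pickVariant(handVariants));
```

With these weights "rock" would be chosen roughly six times out of ten, while "scissors" still appears now and then, so a variant that starts badly can recover.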
John provides a good overview of the library and also points to Genetify's instructive demo. After John's post, Genetify's author Greg Dingle open-sourced Genetify on GitHub, including a PHP/RDBMS-based backend. This is announced and discussed in the comments of John's post. In another comment of that thread Rob Howell says:
Also, would be very cool to see it integrated server-side into a decent CMS.
Hmm, I happen to know a decent CMS, so I had a look at how Genetify could be ported (to Apache Sling actually, which makes it suitable for CQ5 or any other Sling-based web application):
Originally, I planned to simply re-implement the PHP backend and leave the JS untouched. But I realized that the style of interaction between the JS script and the PHP backend was so far out of tune with how one would design the interaction in a RESTful framework like Sling that I decided to tweak the JS script as well. As such, this exercise became more interesting in the sense that some differences between PHP/RDBMS backends (I should rather say: the way PHP-based backends are usually designed) and Sling/JCR backends became visible.
The first difference concerns recording "variants" and "goals". The variants are the permutations of the genes that are shown to a specific user. The goals are the desired outcomes that are to be measured, like buying something. Both need to be persisted, obviously. In the original version both are recorded by sending a GET request to the backend. I changed this to the (arguably more "correct") POST method. The original version sends a random number parameter with each request. As far as I understand the code, this is needed to get around caching issues. Using POST makes it possible to drop this parameter. What's more, Sling requires no backend code at all for writing a new node when the request is sent using the POST method.
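As a rough sketch of what such a recording request could look like, the function below only builds a description of the POST request and sends nothing. The function name `buildRecordRequest`, the path `/content/gen/variants`, and the property names are assumptions for illustration and are not taken from the Genetify script. The trailing `/*` relies on the Sling POST servlet convention of creating a new child node with a generated name.

```javascript
// Build the description of a POST request that records one shown variant.
// Unlike the original GET-based recording, no random cache-busting
// parameter is appended.
function buildRecordRequest(parentPath, properties) {
  return {
    // In Sling, POSTing to "<parent>/*" asks for a new child node.
    url: parentPath + "/*",
    method: "POST",
    headers: { "Content-Type": "application/x-www-form-urlencoded" },
    // Each property becomes one form field, i.e. one node property.
    body: new URLSearchParams(properties).toString(),
  };
}

// Hypothetical parent path and properties.
const recordRequest = buildRecordRequest("/content/gen/variants", {
  domain: "example.com",
  page: "/index.html",
});

console.log(recordRequest.method, recordRequest.url);
console.log(recordRequest.body);
```

The resulting object could be handed to `fetch` or an `XMLHttpRequest` wrapper. It is kept separate here so the request shape can be inspected without a running Sling instance.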
The second change involves the layout of the stored data. In an RDBMS-based system one (obviously) puts the different entities into different tables (which need to be defined beforehand). In a JCR-based system one possible, if not the natural, approach is to use the hierarchy - and potentially not define any node types at all, as I did. Since I store all variants and goals in nodes of type nt:unstructured, there is no need to define a data schema or the like beforehand. One can start writing into the empty repository.
For example, each variant is stored in one node of type nt:unstructured that holds all its properties, such as the domain on which the variant was shown. The actual genes are stored in a child node below it. A similar approach is taken for the goals: there is a node for each goal (named after the goal) with child nodes for the achieved goals.
It is actually possible to create a node hierarchy like this in one POST request by simply setting the parameter names accordingly. (This approach is also used in the blog sample application, where a blog post can have an attachment which is stored as a child node of the blog post's node.)
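A minimal sketch of such parameter names: the helper below flattens a nested object into relative-path names of the kind the Sling POST servlet accepts, where a name like `./genes/hand` sets the property `hand` on a child node `genes`. The helper name `flattenToParams`, the child node name `genes`, and the property names are assumptions for illustration and not necessarily the ones used in the port.

```javascript
// Turn a nested object into Sling-style parameter names. Nested objects
// become child nodes, plain values become properties.
function flattenToParams(node, prefix = ".") {
  const params = {};
  for (const [key, value] of Object.entries(node)) {
    const path = prefix + "/" + key;
    if (value !== null && typeof value === "object") {
      Object.assign(params, flattenToParams(value, path));
    } else {
      params[path] = String(value);
    }
  }
  return params;
}

// Hypothetical variant: two properties on the variant node itself and a
// "genes" child node carrying the gene values.
const exampleVariant = {
  domain: "example.com",
  page: "/index.html",
  genes: { hand: "rock", color: "red" },
};

const exampleParams = flattenToParams(exampleVariant);
console.log(exampleParams);
```

Sent as form fields of a single POST, these four parameters would create the variant node and its `genes` child in one request.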
As said above, this part did not involve server-side scripting. However, the Genetify JS script not only writes the goals, but also retrieves information about the previous performance of the genes when it starts (in order to lean towards more successful genes in the long run). I have (hopefully correctly) reverse-engineered the PHP scripts that generate this response and written an ESP script (server-side JS) that should do the same. It should be noted that the original Genetify server-side scripts do a lot more error checking which is not implemented in the ESP.
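The kind of aggregation involved can be sketched roughly as follows. This is neither the original PHP logic nor the ESP script. The data shapes (`id`, `genes`, `variantId`) and the function name `summarize` are assumptions chosen to keep the example small.

```javascript
// For every gene value, count how often it was shown and how many goals
// were achieved by variants that contained it.
function summarize(variants, achievedGoals) {
  // Number of achieved goals per variant id.
  const goalsByVariant = new Map();
  for (const goal of achievedGoals) {
    const count = goalsByVariant.get(goal.variantId) || 0;
    goalsByVariant.set(goal.variantId, count + 1);
  }

  const stats = {};
  for (const variant of variants) {
    for (const [gene, value] of Object.entries(variant.genes)) {
      const key = gene + "=" + value;
      const entry = stats[key] || (stats[key] = { shown: 0, goals: 0 });
      entry.shown += 1;
      entry.goals += goalsByVariant.get(variant.id) || 0;
    }
  }
  return stats;
}

// Hypothetical recorded data.
const shownVariants = [
  { id: "v1", genes: { hand: "rock" } },
  { id: "v2", genes: { hand: "paper" } },
  { id: "v3", genes: { hand: "rock" } },
];
const reachedGoals = [{ variantId: "v1" }, { variantId: "v3" }, { variantId: "v3" }];

console.log(summarize(shownVariants, reachedGoals));
```

A table like this is enough for the client to weight successful gene values more heavily on the next page view.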
If you want to check out the Sling'ed Genetify version, grab the attached zip file, unzip it into your CRX repository at /apps/gen and point your browser to http://localhost:7402/apps/gen/index.html. The upper part of the page displays the values of two genes (the first one is "rock", "paper", or "scissors"). If you click the "vary" link below, the genes will change (the cookie that would normally keep the same state in a particular browser is switched off for development). Clicking one of the two links further down, "want it!" or "badly!", will be counted as an achieved goal for the genes that are currently displayed. If you click one of them and reload the page afterwards, the stats table will have changed. The stats table represents the success of particular genes on a particular page. To start over, just delete the results stored in /content/gen.
While it's fun to look at how to do things in Sling and how little code is needed to get things running, it needs to be said that the approach presented above will not scale very well. For one, all variations are stored flat, i.e. without a hierarchy. Since each page view creates a variation, the number of child nodes will quickly become much too large to be handled efficiently. The second scaling problem is the calculation of the previous results, which will take much too long as well. Both problems could be remedied by another JCR-typical approach: observation. A listener for /content/gen could be registered to move old variations into a properly structured archive as well as pre-calculate the previous results table.
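One possible shape for such a "properly structured archive" is a date-based hierarchy, which keeps the number of children per node small. The sketch below only computes the target path. The listener itself, which would use the JCR observation API, is not shown. The `archive` node name, the year/month/day layout, and the function name `archivePath` are assumptions, not part of the port.

```javascript
// Compute a date-based archive path for a variation node, so that nodes
// are spread over year/month/day folders instead of one flat list.
function archivePath(root, createdAt, nodeName) {
  const pad = (n) => String(n).padStart(2, "0");
  return [
    root,
    "archive",
    createdAt.getUTCFullYear(),
    pad(createdAt.getUTCMonth() + 1), // getUTCMonth() is zero-based
    pad(createdAt.getUTCDate()),
    nodeName,
  ].join("/");
}

// Hypothetical variation node created on 15 June 2009 (UTC).
const archived = archivePath("/content/gen", new Date(Date.UTC(2009, 5, 15, 10, 30)), "v42");
console.log(archived);
```

If a single day still produces too many nodes, a further level (hour, or a hash prefix of the node name) could be added in the same way.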