Entries filed under 'development'

    Posted by Kas Thomas SEP 09, 2011

    Posted in cq5, development, documentation and fise Comments 8

    Last time, I reviewed the basics of the powerful Clickstream Cloud feature of Adobe WEM (formerly CQ5), which is the feature whereby, if you type Ctrl-Alt-C, you get a popup summary of various bits of contextual information about the user, the user's browser, and the page the user is currently visiting (see illustration further below).

    As with almost everything else in WEM, the Clickstream Cloud can be customized relatively easily, because the code for the Cloud is easily accessible (and modifiable) in the repository.

    If you go in the repository under /libs/cq/personalization/clientlib/source/shared (best done in CRXDE Lite: just aim your browser at http://localhost:4502/crx/de/index.jsp#/crx.default/jcr%3aroot/libs/cq/personalization/clientlib/source/shared), you'll see a half dozen *.js files that govern the Clickstream Cloud's basic behavior, and if you look under /libs/cq/personalization/clientlib/source/clickstreamcloud, you'll find the *.js files that contain code for the various session stores that manage the information fields displayed in the cloud dialog. There's also a js.txt file at /libs/cq/personalization/clientlib/js.txt that governs how all these *.js files are loaded.

    As a very simple example of customization of the Clickstream Cloud, let us suppose that you wanted to add a timestamp to the cloud dialog under "Surfer information" as shown below:

    Clickstream Cloud Dialog

    Notice the part, under Surfer Information, where it says "Thu Sep 08 2011 17:09:45," etc. This information was added as a result of custom code.

    There are a couple of ways to do this. One way would be to alter the setSurferInfoInitialData() function in config.json.jsp (which is located in a somewhat obscure place, namely /libs/cq/personalization/components/clickstreamcloud/command/config/). You might be very tempted to do this since that's the function where the user's IP address, for example (which appears under Surfer Information), is set. But making a change in this function would actually be a bad thing to do, for a number of reasons. First, you're dealing with a core WEM file. And you're making hard-coded changes to it. There's no guarantee that this file will stay unmodified (or even continue to exist) in future versions of WEM, and by putting custom code in it, you've created a maintenance nightmare.

    A better alternative is to create your own separate file, perhaps called custom.js, and place it under /libs/cq/personalization/clientlib/source/clickstreamcloud/. The content of custom.js is simply:
     
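    // Once the Clickstream Cloud configuration has loaded, add a "timestamp"
    // property to the Surfer Information store.
    CQ_Analytics.CCM.addListener("configloaded", function() {
        CQ_Analytics.SurferInfoMgr.setProperty("timestamp", new Date());
    }, CQ_Analytics.SurferInfoMgr);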

    To ensure that custom.js loads at runtime, you do need to make a change to the aforementioned js.txt file (namely, the one at /libs/cq/personalization/clientlib/js.txt). Just add the line "clickstreamcloud/custom.js" to the end of the file.

    Now you should be able to go to a new page (or reload the current page) in WEM, type Ctrl-Alt-C, and see the timestamp information in the Surfer Information portion of the Clickstream Cloud dialog.

    What's neat is that if you now click the Edit link in the upper right portion of the Cloud dialog, then click the Surfer Information tab of the dialog that pops up, you'll see timestamp info among the editable fields of the dialog:

    [Screenshot: Surfer Information tab of the edit dialog, with the timestamp field]

    For more information on the Clickstream Cloud API (including how you can create your own custom session store), see the documentation here.

    Posted by Kas Thomas MAY 02, 2011

    Posted in contentmodels, crx gems, development, jackrabbit, java content repository, javascript, jcr and performance Comments 4

    Adobe CRX is an extremely versatile content store that can handle a wide range of content types (structured and unstructured) and can reliably store many millions of objects. In fact, the system's ultimate storage limits are set not by CRX itself but by the underlying persistence manager. You can choose from a number of different types of persistence (DB2, MySQL, Oracle, TarPM; see documentation here), each with its own particular limitations.

    In general, the default TarPM persistence manager gives better performance than most RDBMS alternatives for the typical CRX use cases (involving web content and user management). But in certain situations, with certain use cases, performance with TarPM can take a hit. The most common problem? Big Flat Lists.

    Although read performance remains good, write performance can suffer in the case where you need to store, say, thousands of sibling nodes under one parent node. This has to do with the fact that TarPM is an append-only store in which objects are immutable and never overwritten, only rewritten. What it means is that the cost of adding (or updating) Node No. N-thousand-plus-one can be quite high.

    Of course, the answer is to divide and conquer: Break the nodes up into smaller groups, preferably hierarchical groups.

    Suppose you have a large number of users whose user-data you want to store in CRX, and you'd like to be able to store users by name. The naive way (we'll keep the example simple and assume no name collisions) would be to store Joe Smith under a node named users/joe_smith, Lee Jones under users/lee_jones, etc. But after a thousand names or so, performance will start to suffer noticeably as new entries are written to the repository. Far better performance will result if container nodes (buckets) are created for each letter of the alphabet, and for each Last Name, so that you can add Joe Smith as /users/S/Smith/Joe, for example.
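
    The bucketing scheme just described can be sketched in a few lines of JavaScript. This is only an illustration: the function name bucketPathForUser is made up for this example, and it assumes (as above) that there are no name collisions and that the names contain no characters that are illegal in JCR node names.

```javascript
// Hypothetical helper: builds a bucketed repository path for a user,
// following the /users/<initial>/<LastName>/<FirstName> layout.
function bucketPathForUser(firstName, lastName) {
    if (!firstName || !lastName) {
        throw new Error("Both a first name and a last name are required");
    }
    // The upper-cased first letter of the last name is the top-level bucket.
    var initial = lastName.charAt(0).toUpperCase();
    return ["", "users", initial, lastName, firstName].join("/");
}

console.log(bucketPathForUser("Joe", "Smith")); // /users/S/Smith/Joe
```

    With this layout, no single parent ever has to hold the whole user population: the top level has at most one child per letter, and each last-name node holds only the people who share that last name.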

    A more sophisticated approach would be to hash user IDs and chunk the hash to form an ad-hoc hierarchy. For example, "Joe Smith" might give a hash of ab12cd34. The user data for Joe Smith can be stored at users/ab/12/cd/34. When the time comes to look up data for Joe Smith, you would first hash the name (to obtain ab12cd34), then create the necessary path from the hash, and look up the data.
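
    Here is a minimal sketch of the hash-and-chunk idea. The choice of hash is an assumption made purely for illustration: the example uses 32-bit FNV-1a because it is short enough to show inline, and any stable hash would do. The function names (fnv1a32, toHex8, chunkHash, pathForUser) are hypothetical, and "Joe Smith" will not actually hash to ab12cd34.

```javascript
// 32-bit FNV-1a: a simple non-cryptographic hash, used here only as an example.
function fnv1a32(text) {
    var hash = 0x811c9dc5;                        // FNV offset basis
    for (var i = 0; i < text.length; i++) {
        hash ^= text.charCodeAt(i);               // mixes in UTF-16 code units, not bytes
        hash = Math.imul(hash, 0x01000193) >>> 0; // multiply by the FNV prime, keep 32 bits
    }
    return hash >>> 0;
}

// Format a 32-bit number as exactly eight lowercase hex digits.
function toHex8(n) {
    return ("00000000" + n.toString(16)).slice(-8);
}

// Split a hex string into two-character chunks and join them under a root.
function chunkHash(hex, root) {
    var parts = hex.match(/.{1,2}/g);
    return [root].concat(parts).join("/");
}

function pathForUser(userId) {
    return chunkHash(toHex8(fnv1a32(userId)), "users");
}

console.log(chunkHash("ab12cd34", "users")); // users/ab/12/cd/34
console.log(pathForUser("Joe Smith"));       // users/ followed by four two-digit hex segments
```

    Two hex digits per level means each container can have at most 256 children, regardless of how the names themselves are distributed.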

    As it turns out, the Jackrabbit API (which of course is built into CRX) offers yet another alternative for efficient hierarchical storage of arbitrary data, in the form of the BTreeManager. This class provides B+ tree-like behavior in allocating subtrees of nodes that are always balanced, with a fixed limit on how many siblings any given node can have. (You provide the limit as an argument in the constructor.)

    I wrote a very short test script (in ECMAScript) to show how the BTreeManager operates, as shown below:

    <html>
    <body>
    <%
    /* Create a new TreeManager instance rooted at the current node.
    Splitting of nodes takes place when the number of children of a node
    exceeds 40 and is done such that each new parent node has >= 10 child
    nodes. Keys are ordered according to the natural order of
    java.lang.String. The last argument (true) turns on auto-save. */

    var treeManager = new Packages.org.apache.jackrabbit.commons.flat.BTreeManager(
        this.currentNode, 10, 40,
        Packages.org.apache.jackrabbit.commons.flat.Rank.comparableComparator(),
        true);

    // Create a new NodeSequence with that tree manager
    var nodes = Packages.org.apache.jackrabbit.commons.flat.ItemSequence.createNodeSequence(treeManager);

    var totalNodes = 100;

    // Do some profiling: record the start time in milliseconds
    var start = new Date().getTime();

    // Add totalNodes nodes to the sequence
    for (var i = 0; i < totalNodes; i++) {
        nodes.addNode("MyNode" + i,
            Packages.javax.jcr.nodetype.NodeType.NT_UNSTRUCTURED);
    }

    var end = new Date().getTime();
     
     %>
     
    <%= "Total time: " + (end - start) + " millisecs" %>
    </body>
    </html>

    I called this script tree.esp and placed it under /apps/tree in CRX, then created a dummy node under /content and gave the dummy node a sling:resourceType of "tree" (to trigger the script when navigating to content/dummyNode.tree).

    The performance benefits of BTreeManager are notable. On my (decrepit Dell) laptop, adding 100 nodes as a flat list took 1.6 seconds (which includes about 200 milliseconds for servlet compilation). Adding 1000 nodes as a flat list (no B-tree) took 22 seconds. Adding 5000 nodes took 289 seconds. Note that adding five times as many entries took about 13 times as long.

    By contrast, using BTreeManager (set to a maximum sibling breadth of 40), adding 1000 nodes took 14 seconds and adding 5000 took 86 seconds. (Five times the data takes roughly six times as long, which is much closer to linear scaling.)
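
    For anyone who wants to check the scaling arithmetic, the ratios can be computed directly from the timings quoted above. The variable and function names here are just for illustration.

```javascript
// Timings reported above, in seconds, keyed by the number of nodes added.
var flatList = { 1000: 22, 5000: 289 };
var btree    = { 1000: 14, 5000: 86 };

// How many times longer the 5000-node run took than the 1000-node run.
function slowdown(timings) {
    return timings[5000] / timings[1000];
}

console.log("Flat list:    " + slowdown(flatList).toFixed(1) + "x"); // 13.1x
console.log("BTreeManager: " + slowdown(btree).toFixed(1) + "x");    // 6.1x
```

    Perfectly linear scaling would give 5.0x for five times the data, so the B-tree run is close to linear while the flat list is far from it.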

    The real lesson here is: If your content is hierarchical (or can be made to look hierarchical), by all means capitalize on that fact! Don't try to treat your content as a Big Flat List, especially if you'll be doing a lot of updates. (If you're doing mostly reads and few writes, on the other hand, it doesn't much matter.) Introducing a bit of hierarchy to your content organization scheme will go a long way toward promoting fast update performance.

    (Many thanks to Felix Meschberger and Marcel Reutegger for input into this blog.)

    Posted by Ben Peter JAN 28, 2011

    Posted in ajax, cms, development and performance Comment 1

    This post is cross-posted here.

    In your typical CMS setup, most of the content is actively managed as such. But you often come across scenarios where other data needs to appear on a page, e.g. prices or product data that are provided by external sources and change on their own schedule.

    There are various ways to handle such a requirement, each with its own upsides and downsides.

    One obvious way would be to access the data from within the CMS domain, i.e. build a component that reaches out to the data source and renders the appropriate data accordingly. It’s straightforward to implement, the only variation being the complexity of data access (which is involved anyway).
    The one issue with this approach is that it will leave your page uncacheable to make sure that the data is always up to date. Every request to the page needs to hit the Publisher so that the component can reach out to the data source and pull the most recent data. Caching that page on a Dispatcher or CDN level is out of the question.

    If you want a page that doesn’t hit the Publisher and can be cached by the Dispatcher and on the CDN, you can take a slightly different approach: build a component that in edit mode will allow you to pull updated data from the data source, and store it as part of the page’s content. In publish mode the data will be rendered just as the rest of the page’s content.
    The issue with data updates is not eliminated, but it’s now pushed to the authoring side. On the publisher, the page is fully cacheable, but you need to make sure that whenever the data changes, the page is activated. That can happen automatically, if you have technical ways to be notified of data changes, or can be organizational (read: phone call). Although often not a problem technically, automated updates are frequently impossible because the pages involved contain other content that may or may not be ready for activation and still needs to be checked by a human. Or they are simply part of a review and approval workflow that is not certain to complete within the time that the data is allowed to be out of date on the public-facing systems.
    From an implementation perspective, this option is slightly more complex than the previous option.

    If there is a need to update the data in the page in a fully automated fashion, there are more options available that merge content and data not at the time the page is baked, but on the webserver or in the browser.

    Bringing data and content together in the browser is easily done through AJAX requests, as long as you can expose the data source in a way that gives you the right data per page, for example as JSON. For both performance and information control reasons, you want to put a layer on top of the data source that will not simply spit out all of the data, but just the data that are required for that particular page.
    This approach works well, is very simple to implement, and matches the second approach in terms of cacheability: the page can be cached at all levels. A request for the page can be offloaded at a CDN layer and from the Dispatcher cache. Only an activation of the page due to content changes will require these layers to be purged or invalidated. The delivery layer on top of the data source can follow its own caching strategy.
    The approach may be inappropriate if the display of data must not be deferred until the request for the data is complete, or if it must not depend on JavaScript being available. While today such restrictions are typically not considered important from a user experience point of view, legal requirements can often enforce that a page be either displayed completely with accurate information and independent of JavaScript, or not at all.
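
    To make the browser-side option more concrete, here is a small sketch of its two halves: a delivery layer that hands out only the records and fields a page needs, and the merge step an AJAX callback would perform. Everything in it is an assumption for illustration: the product data, the field names, the {{sku.field}} placeholder syntax, and the function names dataForPage and fillPlaceholders are all made up.

```javascript
// Hypothetical product data as the external source might hold it.
var ALL_PRODUCTS = {
    "sku-1": { name: "Widget", price: "9.99", cost: "4.10" },
    "sku-2": { name: "Gadget", price: "19.99", cost: "8.75" },
    "sku-3": { name: "Gizmo", price: "4.50", cost: "1.90" }
};

// Delivery layer: return only the SKUs and fields a given page needs,
// so internal fields such as "cost" never leave the server.
function dataForPage(source, skus, fields) {
    var result = {};
    skus.forEach(function (sku) {
        if (!source.hasOwnProperty(sku)) return; // skip unknown SKUs
        result[sku] = {};
        fields.forEach(function (field) {
            result[sku][field] = source[sku][field];
        });
    });
    return result;
}

// Browser side: what the AJAX callback would do with the parsed JSON.
// Placeholders with no matching data are left untouched.
function fillPlaceholders(html, data) {
    return html.replace(/\{\{([\w-]+)\.(\w+)\}\}/g, function (match, sku, field) {
        return data[sku] && data[sku][field] !== undefined ? data[sku][field] : match;
    });
}

var payload = JSON.stringify(dataForPage(ALL_PRODUCTS, ["sku-2"], ["name", "price"]));
console.log(payload);
// {"sku-2":{"name":"Gadget","price":"19.99"}}
console.log(fillPlaceholders("<b>{{sku-2.name}}</b>: {{sku-2.price}}", JSON.parse(payload)));
// <b>Gadget</b>: 19.99
```

    The cached page carries only the placeholders, so it never needs to be invalidated when a price changes.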

    That leads to the fourth option that’s available which allows for good cacheability, data accuracy and user agent compatibility. Instead of aggregating in the browser, the aggregation is performed on the webserver.
    For that, a layer is built on top of the data source that renders HTML fragments that go into the page. The respective CQ component does nothing but render an appropriate SSI statement that fetches the HTML fragment from the data rendering layer and plugs it into the page. As typically SSI does not allow you to include remote sources, a reverse proxy is required to make the data available as a local path if the data source is not deployed within the same virtual host.
    That leaves the page cacheable on the publisher: the page including the SSI statement is pulled from the dispatcher cache, SSI statements are evaluated, and the page is then returned to the requesting layer (the CDN, a proxy, or a browser). It cannot, however, be cached at the CDN level, because the webserver needs to re-evaluate the SSI statements on each request to keep the data consistent.
    In terms of implementation this option is pretty simple, but you want to make sure you have someone handy who knows your webserver well.
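
    As an illustration of how little the component has to do in this option, the following sketch builds the SSI statement for a given fragment. It assumes Apache-style SSI syntax (the include directive with a virtual attribute); the helper name ssiInclude and the /data/prices/ path, which stands for whatever local path your reverse proxy maps to the data rendering layer, are hypothetical.

```javascript
// Hypothetical helper a component could use to emit an SSI include directive.
function ssiInclude(localPath) {
    // SSI typically includes only local resources, so insist on a local absolute path.
    if (localPath.charAt(0) !== "/") {
        throw new Error("Expected a local absolute path, got: " + localPath);
    }
    // Refuse characters that would break out of the attribute or the comment.
    if (/["\r\n]|-->/.test(localPath)) {
        throw new Error("Unsafe characters in path: " + localPath);
    }
    return '<!--#include virtual="' + localPath + '" -->';
}

console.log(ssiInclude("/data/prices/sku-2.html"));
// <!--#include virtual="/data/prices/sku-2.html" -->
```

    The directive is what gets stored in the dispatcher cache; the webserver replaces it with the current HTML fragment each time the page is served.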

    For completeness’ sake, there’s an option that has similar characteristics to the first option in terms of cacheability and process interdependencies but is slightly more complex to implement. If you can afford to hit the Publisher on each request and your data source has a hierarchical structure, you can choose to make it visible to Sling as resources within the repository. This is typically only worth the effort if that data is used in many different scenarios and if it is semantically part of your content, but not managed as such because it comes from an established source that is outside of the system’s domain.

    Which of these options is appropriate depends on the concrete situation. None of them can be generally ruled out or recommended, although it may be considered bad manners to introduce interdependencies between content publication and data update processes, which the first two and the fifth option do.


    Posted by Kas Thomas AUG 25, 2010

    Posted in ajax, crx, crx gems, development, javascript, rest and sling Comment 1

    In previous posts, I've shown how to load movie data into CRX and how to render data for individual movies via HTML, SVG, and PDF. What I'd like to do now is show how easy it is to build interactivity into an app using a bit of AJAX combined with Sling's support for RESTful XPath-based search.

    It turns out that all we have to do to query the repository for, say, all nodes that have a value of "Hitchcock" under the property named "Director" is put together an XPath expression like

    //*[jcr:contains(@Director,'hitchcock')]

    and pass it to Sling in a URL that looks like:

    http://localhost:7402/content.query.json?queryType=xpath&statement=//*[jcr:contains(@Director,'hitchcock')]

    (assuming the repository is on port 7402 of localhost). This request will invoke a Lucene search of all nodes stored under the /content subtree. The results will come back as a JSON-formatted array:

    [
        {
            "name": "notorious",
            "jcr:path": "/content/films/notorious",
            "jcr:score": 3331
        },
        {
            "name": "under_capricorn",
            "jcr:path": "/content/films/under_capricorn",
            "jcr:score": 3331
        },
                . . .
    ]

    This is perfect, because it means we can use the JSON data to populate a dropdown menu (a "select" control in an HTML form) showing the names of films; and we can arrange things so that when the user clicks a "Show Details" button, the form updates to show detail information (title, director, year, genre, actor, actress, etc.) for the film in question. To get the detail information, of course, we can perform a behind-the-scenes AJAX query to the server. I already showed, in a previous post, how to render detail information for a given movie in an HTML page. All we really need to do at this point is put that HTML page into its own iframe, and (right next to it) add search controls to the page.
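
    Before looking at the full form, here is the round trip in miniature: build the query URL, then turn the JSON hits into entries for the dropdown. It is a sketch rather than the form's actual code. The function names buildQueryUrl, titleFromNodeName and hitsToOptions are made up for this example, the response text is simply the sample shown above, and the statement is URL-encoded, which is advisable because the XPath expression contains brackets and quotes.

```javascript
// Build the Sling query URL for a single property/term pair.
function buildQueryUrl(baseUrl, property, term) {
    var statement = "//*[jcr:contains(@" + property + ",'" + term + "')]";
    return baseUrl + "/content.query.json?queryType=xpath&statement=" +
        encodeURIComponent(statement);
}

// Turn a node name such as "under_capricorn" into "Under Capricorn".
function titleFromNodeName(name) {
    return name.split("_").map(function (word) {
        return word.charAt(0).toUpperCase() + word.substring(1);
    }).join(" ");
}

// Map query hits to { value, label } pairs suitable for <option> elements.
function hitsToOptions(hits) {
    return hits.map(function (hit) {
        return { value: hit["jcr:path"], label: titleFromNodeName(hit.name) };
    });
}

var url = buildQueryUrl("http://localhost:7402", "Director", "hitchcock");
console.log(url);

// Sample response text, copied from the JSON shown above.
var responseText =
    '[{"name":"notorious","jcr:path":"/content/films/notorious","jcr:score":3331},' +
    '{"name":"under_capricorn","jcr:path":"/content/films/under_capricorn","jcr:score":3331}]';

var options = hitsToOptions(JSON.parse(responseText));
console.log(options[1].label + " -> " + options[1].value);
// Under Capricorn -> /content/films/under_capricorn
```

    The value of each option is the node's repository path, which is all that is needed later to request the detail page for that film.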

    The following form shows one possible way of handling things.

    [Screenshot: the movie search form]

     

    Basically, we have an HTML form in which there are two action buttons: One is a Search button ("Search Films by") that initiates an XPath-based search of the repository based on a user-chosen criterion of Title, Director, Year, Genre, Actor, or Actress. The other is a Show Details button, underneath a picklist of films. Clicking the Search button populates the picklist with hits. When the user chooses a hit from the list and clicks Show Details, the left side of the page updates with detail information.

    The form consists of about 200 lines of JavaScript and markup, as follows:

    <html>
    <head>
    <script>

    var CRX_BASE_URL = "http://localhost:7402";

    function addEventListeners( ) {

            // pressing Enter in the query field runs the search
            document.getElementById( "_Query_" ).addEventListener(
                "keypress", function( e ) {
                    if ( 13 == e.keyCode )
                        handleClick( null );
                },
                false );

            document.getElementById( "_QueryButton_" ).addEventListener(
                "click", handleClick, false );

            document.getElementById( "_Fetch_" ).addEventListener(
                "click", handleFetch, false );

    }

    function getSearchMode( ) {

            return document.getElementById("_Select_").value;
    }

    function handleFetch( e ) {

            var list = document.getElementById("_Hits_");

            if (list.value) {
                    var url =  CRX_BASE_URL + list.value + ".html";
                    var iframe = document.getElementsByTagName("iframe")[0];

                    // force a reload of the iframe:
                    iframe.src = url;
            }
    }

    // get user's input and call server
    function handleClick( e ) {

            var userData =
            document.getElementById( "_Query_" ).value;

            if ( !userData )
                return;   // nothing to do

            var CRX_QUERY_PATH = "/content.query.json?queryType=xpath&statement=";
            var GETheader = {
                    "Accept": "application/json"
            };

            // URL-encode the XPath statement so spaces, quotes and brackets survive
            var query = encodeURIComponent( createXPathQuery( userData ) );
            var url = CRX_BASE_URL + CRX_QUERY_PATH + query;

            myHttpGet( url, GETheader, handleResponse ); // hit server
    }


    function myHttpGet( url, header, handler ) {

            try {
                    // keep the request local and hand it to the handler explicitly
                    var request = new XMLHttpRequest();
                    request.open("GET", url, true);
                    for (var name in header)
                        request.setRequestHeader( name, header[name] );
                    request.onreadystatechange = function( ) {
                        handler( request );
                    };

                    request.send(null);
            }
            catch(e ) {
                    alert("Problem sending request: " + e.toString());
            }
    }

    function handleResponse( request ) {

            // readyState 4 means the response is complete
            if (request.readyState == 4) {
                    showResults( request );
            }
    }

    function showResults( request )  {

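            var json = request.responseText;

            var hits = null;
            try {
                    // JSON.parse is safer than eval for data coming from the server
                    hits = JSON.parse( json );
            }
            catch( e ) {
                    hits = null;
            }

            if ( null == hits ) {
                    alert( "No hits were found." );
                    return;
            }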

            display( hits );
    }


    function display( hits ) {

            var div = document.getElementById( "_Hits_" );

            if ( null == div )
                throw( "Problem getting div for hitlist." );

            showHitCount( hits.length );

            var markup = "";

            for (var i = 0; i < hits.length; i++) {
                    markup += "<option value=\"" + hits[i][ "jcr:path" ] + "\">";
                    markup += fixName( hits[i].name );
                    markup += "</option>";
            }
            div.innerHTML = markup;
    }

    function fixName( name ) {
            var tmp = name.split("_");
            for (var i = 0; i < tmp.length; i++)
                tmp[i] = capitalize( tmp[i] );
            return tmp.join(" ");
    }

    function capitalize(a) {

            return typeof a[0] == 'undefined'?
               "":a[0].toUpperCase() + a.substring(1);

    }

    function showHitCount( numberOfHits ) {
            var div = document.getElementById( "_hitcount_" );
            if ( null != div )
                div.innerHTML = ("Total hits: " + numberOfHits).italics();
    }

    // build xpath query url
    function createXPathQuery( userString ) {

            var xpathTerms = [];

            var querySemantics = " and ";

            // trim leading & trailing spaces off query
            var terms = userString.replace(/^\s+/,"").replace(/\s+$/,"");

            // split on whitespace
            terms = terms.split(/\s+/);

            var _queryBasis =  "//*[_#_]/@location";
            var mode = getSearchMode( );

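            for ( var i = 0; i < terms.length; i++ ) {
                // double any single quotes so they cannot end the XPath string literal
                var term = terms[i].replace( /'/g, "''" );
                xpathTerms.push( "jcr:contains(@" + mode + ",'" + term + "')" );
            }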

            var query = _queryBasis.replace( '_#_', xpathTerms.join( querySemantics ) );

            return query;
    }

    </script>
    </head>


    <body onload="addEventListeners()">

    <iframe width="380" height="410" style="border:none" src="http://localhost:7402/content/films/wild_at_heart.html"></iframe>

    <span style="font-size:small;position:absolute;right:80px;top:7px;">
    <input type="text"     id="_Query_" size="25"/>
    <input type="button"   id="_QueryButton_" value="Search Films by:"/>

    <select id="_Select_">
    <option value="Title">Title</option>
    <option value="Director">Director</option>
    <option value="Actor">Actor</option>
    <option value="Actress">Actress</option>
    <option value="Year">Year</option>
    <option value="Subject">Genre</option>
    </select>
    <br/>

    <select id="_Hits_" size="10"></select><div id="_hitcount_"></div>
    <br/>
    <input type="button" id="_Fetch_" value="Show Details"/>


    </span>
    <div id="hitlist"></div>
    </body>
    </html>
     

    This form (movieForm.html), along with the data for 1700 films (and scripts and PDF files discussed in prior posts), is available in the zip file below, which can also be downloaded from Day Package Share. After installing the package, go to http://localhost:7402/apps/films/movieForm.html to see the form in action (assuming your CRX is on port 7402).

    * MovieApp-1.zip
    Sample code and data for MovieApp.

    Posted by Alexander Saar AUG 16, 2010

    Posted in crx, crx gems and development Comments 2

    CRXDE Lite is a web-based repository browser for CRX's JCR repository and a development environment for the CQ5 Platform in CRX, which is based on the Apache Sling content delivery and development platform and the Apache Felix OSGi runtime framework.

    In contrast to the CRX 1.x Content Explorer, which maintains a server-side CRX session, CRXDE Lite handles all modifications directly within the browser and uses the JCR remoting interface to retrieve content and persist changes.

    This article looks behind the scenes at how this rich set of functionality was implemented in the browser. CRXDE Lite functionality and tips and tricks for using it were presented in the previous blog entry on CRXDE Lite.

    CRXDE Lite Design Goal. The most important design goal for CRXDE Lite was providing rich functionality with a near-desktop experience in a web application. There were three main architectural decisions we had to make while designing CRXDE Lite:

    • Which JavaScript framework to use for the user interface?
    • Where to host the CRXDE Lite web application?
    • How to implement remote access to the JCR repository and server-side features from the browser user interface code?

    For the user interface framework, the natural choice was the ExtJS library, which provides a good user experience and is used in the CQ5 Platform hosted in CRX. It also has a good internal architecture, separating the underlying model from the view.

    For the deployment model, we decided to host the CRXDE Lite application in CRX's web application. This minimizes the dependencies on other parts of the system, like the OSGi container and the Apache Sling content delivery platform. CRXDE Lite is also available when Apache Sling is not running, which helps in cases of system troubleshooting, recovery, etc.

    Transient space architecture. We considered a number of approaches to providing a remote interface to the JCR repository for the in-browser implementation of the user interface. In the end we decided to leverage CRX's JCR Remoting Server, based on the Jackrabbit JCR WebDAV Server, which provides an end-point for remotely accessing the repository. The JCR remoting protocol, extending WebDAV and adding DAVEx batch operations, is used as one of the possible remoting layers in the overall Apache Jackrabbit client-server SPI Architecture, upon which CRX is built. As the protocol is based on HTTP and uses the JSON format, it is a good match for user interface code written in the browser's JavaScript.

    The only somewhat challenging thing left to do was to implement a JCR remoting client on the browser side. We implemented a simplified JCR transient state layer (client code) in JavaScript, leveraging ExtJS model classes.


    ExtJS provides a good separation of model and view. In ExtJS, list-style content, like the properties of a node, is stored in records which are maintained by so-called stores. Stores are responsible for retrieving and persisting the content and define the serialization format and the server endpoint. For tree-style data, like JCR nodes, ExtJS defines a tree node type that handles the common properties of a node like display text, parent node or child nodes. When a tree node is assigned to a tree, the rendering of that node is delegated to a separate configurable view class. Retrieval of data is managed by an instance of the tree loader class.

    For CRXDE Lite we make use of both: JCR nodes are represented as tree nodes, and properties of a node are handled as records. To integrate with the JCR remoting server endpoint, a custom tree loader was implemented that is able to deal with the JSON format used by JCR remoting and that creates nodes and records for the properties of a node.

    The records for properties can be displayed and edited in the property grid at the bottom. Once a record is edited, it is automatically marked dirty, so we can easily find modified records when we want to persist our changes.

    Handling content changes. If a new node is created, it gets marked as transient and gets added to a list of transient nodes that is maintained by the store. As with dirty records, this makes it easy to find new nodes in order to persist them. Deleting a node that is already persisted does not remove it automatically from the tree but just hides the corresponding node and all its children. This makes it easy to revert such changes by just displaying the node again. We don't need to keep the path somewhere. If changes are persisted, those nodes are deleted on the server first, and only if this was successful are they deleted locally.

    When the changes that a user has made are to be persisted, the transient storage generates a multi-part message body that is sent via an AJAX call. Once all changes are persisted and the call returns successfully, all dirty flags are removed from the corresponding records, the list of transient nodes gets cleared and deleted nodes are removed from the tree.
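
    The bookkeeping described above can be sketched in plain JavaScript, without ExtJS. To be clear, this is not the CRXDE Lite code: TransientStore and all of its methods are hypothetical, the send callback merely stands in for the AJAX call to the remoting endpoint, and dropping a never-saved node outright on delete is an assumption of this sketch.

```javascript
// Hypothetical, simplified transient-state tracker (not the CRXDE Lite implementation).
function TransientStore() {
    this.dirtyRecords = {};   // path -> edited property values
    this.transientNodes = []; // paths of nodes created but not yet saved
    this.hiddenNodes = [];    // paths of persisted nodes hidden pending deletion
}

TransientStore.prototype.editRecord = function (path, values) {
    this.dirtyRecords[path] = values; // marks the record dirty
};

TransientStore.prototype.createNode = function (path) {
    this.transientNodes.push(path);
};

TransientStore.prototype.deleteNode = function (path) {
    var i = this.transientNodes.indexOf(path);
    if (i !== -1) {
        this.transientNodes.splice(i, 1); // assumption: never saved, so just forget it
    } else {
        this.hiddenNodes.push(path);      // persisted: hide it until the next save
    }
};

TransientStore.prototype.revertDelete = function (path) {
    var i = this.hiddenNodes.indexOf(path);
    if (i !== -1) this.hiddenNodes.splice(i, 1); // simply show the node again
};

// send(changes, done) stands in for the AJAX call; done(true) signals success.
TransientStore.prototype.save = function (send) {
    var self = this;
    var changes = {
        remove: self.hiddenNodes.slice(),
        add: self.transientNodes.slice(),
        modify: self.dirtyRecords
    };
    send(changes, function (success) {
        if (!success) return;     // keep local state so the user can retry
        self.dirtyRecords = {};   // clear dirty flags
        self.transientNodes = []; // new nodes are now persisted
        self.hiddenNodes = [];    // deleted nodes can leave the tree
    });
};

var store = new TransientStore();
store.createNode("/content/new");
store.editRecord("/content/old/title", { value: "Hello" });
store.deleteNode("/content/obsolete");
store.save(function (changes, done) {
    console.log(JSON.stringify(changes.remove)); // ["/content/obsolete"]
    done(true);
});
console.log(store.transientNodes.length); // 0
```

    Local state is only cleared in the success callback, which mirrors the rule that nodes are removed locally only after the server has confirmed the deletion.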

    Note: The current version of CRXDE Lite persists all changes since the last save operation at once, which is similar to saving the changes made in a CRX Session. Future versions might offer finer-grained saving, so that you can save single files as in a desktop IDE. While this is technically possible with JCR remoting, it will require some more research on display models and user feedback. Imagine you want to save a file whose parent is a transient node and not stored in the repository yet. In this case you need either to persist the parent automatically or to prevent saving only that file, in which case we need some means to find the parent that can be saved.

    Plugins. CRXDE Lite architecture is internally based on a plugin concept. The plugin architecture helped us to develop and maintain CRXDE Lite code in a clean, modular manner.

    Plugins are plain JavaScript files that are loaded during the CRXDE Lite loading phase. The implementor of a plugin is responsible for registering the plugin with the statically available plugin registry. This can be done by calling the following method:

    CRX.PluginRegistry.reg(ID, CRX.ide.MyAction);

    The first parameter has to be the ID of the widget that will be extended (e.g., the ID of a menu or toolbar) and the second will be the plugin class.

    Each plugin has to provide 2 static methods:

     

    • canHandle(context): check if plugin is active for the current context (e.g., menu item active for selected node)
    • getInstance(context, args): return instance of the plugin
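
    To show how the two static methods fit together, here is a toy registry and plugin. This is purely a sketch under assumptions: PluginRegistry below is a stand-in written for this example (not CRX.PluginRegistry itself), and instancesFor, DeleteNodeAction, the "toolsMenu" ID and the selectedPath property of the context are all hypothetical.

```javascript
// Hypothetical stand-in for a plugin registry keyed by widget ID.
var PluginRegistry = {
    plugins: {},
    // Register a plugin class for the widget (menu, toolbar, ...) it extends.
    reg: function (widgetId, pluginClass) {
        if (!this.plugins[widgetId]) this.plugins[widgetId] = [];
        this.plugins[widgetId].push(pluginClass);
    },
    // Ask each registered plugin whether it applies, then instantiate those that do.
    instancesFor: function (widgetId, context, args) {
        return (this.plugins[widgetId] || [])
            .filter(function (plugin) { return plugin.canHandle(context); })
            .map(function (plugin) { return plugin.getInstance(context, args); });
    }
};

// A hypothetical plugin: an action that is only offered for non-root nodes.
function DeleteNodeAction(context) {
    this.path = context.selectedPath;
}
DeleteNodeAction.canHandle = function (context) {
    return !!context.selectedPath && context.selectedPath !== "/";
};
DeleteNodeAction.getInstance = function (context, args) {
    return new DeleteNodeAction(context);
};

PluginRegistry.reg("toolsMenu", DeleteNodeAction);

console.log(PluginRegistry.instancesFor("toolsMenu", { selectedPath: "/content/foo" }).length); // 1
console.log(PluginRegistry.instancesFor("toolsMenu", { selectedPath: "/" }).length);            // 0
```

    Keeping canHandle static means the registry can decide whether a plugin applies without having to instantiate it first.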

    Note: The CRXDE Lite Plugin API is not yet a supported CRXDE Lite feature as of CRX 2.1 and is only used internally by the implementation. It may change or be removed in future versions.

    At the moment plugins are not loaded from the repository but have to reside in the web-app's working directory and be added to index.jsp manually. One possible extension in the future would be to allow loading of the plugins directly from the repository, so you could modify and adapt CRXDE Lite according to your needs.