Entries filed under 'tutorial'

    Posted by Michael Marth DEC 28, 2009

    Posted in cq5, iks-project and tutorial

    As part of the IKS project, each CMS vendor completes a couple of benchmarks in order to establish a baseline against which future semantic improvements can be measured. For benchmark 3, "Workflow Service", Bertrand and I chose to implement the task "Create a multi-channel (email, SMS, instant messaging, Twitter,...) notification service for workflow transitions". We have created an automated workflow step that can be inserted into a custom workflow and sends either an e-mail, a direct message on Twitter, or a chat message on GTalk/Jabber. The message's payload is the path of the content node in the workflow plus an optional custom text.

    Below is a description of how this functionality was implemented in CQ5. The complete code is attached to this post as a CQ5 package. I will outline some of the considerations and gotchas regarding this particular feature, but some of them apply to CQ5 development in general as well. For development I used CRXDE Lite (the web-based IDE available at /crxde of your CQ5 installation) and a beta version of the upcoming CQ5 release 5.3. It is probably helpful to install the package (see the setup section below) and read the code alongside this post.

    OSGi services

    A good way to hook up external services like Twitter is to create a custom OSGi service that exposes only the business functionality and hides the internal classes. Moreover, it is good practice to provide a Java interface and to separate the implementation of the service from it (this allows replacing the implementation without affecting the service's consumers). The services will show up in the Sling configuration console at /system/console/configMgr, which allows the administrator to configure each service's private parameters at deployment time (in our case the Twitter account credentials and the Jabber user credentials). The service reads its configuration like this:

    /** @scr.property */
    public static final String GTALK_USER = "gtalk.service.user";
    /** @scr.property */
    public static final String GTALK_PASSWORD = "gtalk.service.password";

    private String user;
    private String password;

    // Called when the component is activated; reads the values entered in /system/console/configMgr
    protected void activate(ComponentContext context) {
        Dictionary config = context.getProperties();
        user = (String) config.get(GTALK_USER);
        password = (String) config.get(GTALK_PASSWORD);
    }
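    To illustrate the interface/implementation split, here is a minimal plain-Java sketch. GtalkNotifier, GtalkNotifierImpl and sendMessage are hypothetical names (the attached bundle's interface and method names may differ), and activate(ComponentContext) is replaced by a method that takes the configuration Dictionary directly, so the sketch runs without an OSGi container.

```java
import java.util.Dictionary;
import java.util.Hashtable;

// Hypothetical public interface: consumers only see the business operation.
interface GtalkNotifier {
    void sendMessage(String recipient, String text);
}

// Hypothetical implementation; in a bundle it would live in a non-exported
// impl package so it can be swapped without affecting consumers.
class GtalkNotifierImpl implements GtalkNotifier {
    static final String GTALK_USER = "gtalk.service.user";
    static final String GTALK_PASSWORD = "gtalk.service.password";

    private String user;
    private String password;

    // Stand-in for activate(ComponentContext): receives the same kind of
    // Dictionary that context.getProperties() returns.
    void activate(Dictionary<String, Object> config) {
        user = (String) config.get(GTALK_USER);
        password = (String) config.get(GTALK_PASSWORD);
        if (user == null || password == null) {
            throw new IllegalStateException("GTalk credentials are not configured");
        }
    }

    String getUser() {
        return user;
    }

    @Override
    public void sendMessage(String recipient, String text) {
        // A real implementation would log in with user/password (e.g. via Smack)
        // and send a chat message; this sketch only prints it.
        System.out.println(user + " -> " + recipient + ": " + text);
    }
}

class GtalkNotifierDemo {
    public static void main(String[] args) {
        Dictionary<String, Object> config = new Hashtable<>();
        config.put(GtalkNotifierImpl.GTALK_USER, "notifier@example.com");
        config.put(GtalkNotifierImpl.GTALK_PASSWORD, "secret");

        GtalkNotifierImpl impl = new GtalkNotifierImpl();
        impl.activate(config);

        GtalkNotifier notifier = impl; // callers depend on the interface only
        notifier.sendMessage("user@gmail.com", "/content/mysite/en");
    }
}
```

    Failing fast in activate() when credentials are missing is a design choice of this sketch; it surfaces a misconfiguration at deployment time rather than when the first message is sent.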

    3rd party libraries

    To use Twitter and Jabber, I utilized the open source libraries Twitter4J and Smack, respectively. With CRXDE (Lite) it is very simple to include such 3rd party jars in a custom OSGi bundle: just drop them into the bundle's /libs folder, and CRXDE will embed them when building the bundle. Compilation and deployment are done by executing "Build Bundle" (right-click on the .bnd file in the bundle root).

    A note on 3rd party jars' dependencies

    It might well be that the bundle compiles and deploys but does not start. Check the OSGi console at /system/console/bundles to find out whether your bundle's state is "Active" (good) or just "Installed" (not good). The latter happens, for example, when an embedded jar depends on other jars that are not embedded. In that case, check the bundle's details page in the Sling console to find out which dependencies are missing, and either add them to /libs as well or remove them from the OSGi imports (the second option is only safe if the code that needs those packages is never executed; otherwise class loading fails at runtime). Removing imports is done by editing the .bnd file's Import-Package directive, e.g.

    Import-Package: !com.sun.syndication.*, !dalvik.system, *
    

    Workflow action

    The last piece needed is a workflow step that can be added to a custom workflow. For that purpose one simply needs to create a class that implements the interface JavaProcessExt. Its execute method receives the workflow's payload; from there it is trivial to obtain the services described above and pass them the content. CQ workflow actions can be customized for each particular workflow they are used in. I use this feature to configure the accounts to which a message shall be sent (the format is explained in the setup section below). The customization string is passed to the execute method as well: comma-separated values arrive as a String[].
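    The dispatching logic of such a step can be sketched in plain Java. Since the exact JavaProcessExt signature and the CQ workflow objects are not shown in this post, the sketch assumes the payload path and the argument array are passed in directly. Notifier and NotificationStep are hypothetical names; the arguments follow the channel,recipient,message format described in the setup section.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

// Hypothetical abstraction over the e-mail, Twitter and GTalk services.
interface Notifier {
    void send(String recipient, String text);
}

// Plain-Java stand-in for a JavaProcessExt implementation (hypothetical):
// the payload path and the step arguments are passed in directly.
class NotificationStep {
    private final Map<String, Notifier> notifiers = new HashMap<>();

    void register(String channel, Notifier notifier) {
        notifiers.put(channel, notifier);
    }

    // args: channel, recipient, optional message (already split on commas).
    String execute(String payloadPath, String[] args) {
        if (args == null || args.length < 2) {
            throw new IllegalArgumentException("expected channel,recipient[,message]");
        }
        Notifier notifier = notifiers.get(args[0].trim());
        if (notifier == null) {
            throw new IllegalArgumentException("unknown channel: " + args[0]);
        }
        // Re-join the remaining arguments so a message containing commas survives.
        String message = args.length > 2
                ? String.join(",", Arrays.copyOfRange(args, 2, args.length)).trim()
                : "";
        // The content item's path is appended to the optional message.
        String text = message.isEmpty() ? payloadPath : message + " " + payloadPath;
        notifier.send(args[1].trim(), text);
        return text;
    }
}

class NotificationStepDemo {
    public static void main(String[] ignored) {
        NotificationStep step = new NotificationStep();
        step.register("gtalk", (to, text) -> System.out.println("GTalk to " + to + ": " + text));
        step.register("dm", (to, text) -> System.out.println("Twitter DM to " + to + ": " + text));
        step.execute("/content/mysite/en", new String[] {"gtalk", "user@gmail.com", "please review"});
        // prints: GTalk to user@gmail.com: please review /content/mysite/en
    }
}
```

    Re-joining args[2..] is a choice made in this sketch: because the configuration string is split on commas, a custom message containing commas would otherwise be truncated.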

    Setting up the package

    To get this up and running, download the attached CQ5 package and install it through the package manager. In the Sling console, configure the services com.day.cq.mailer.impl.MailerService, com.day.iks.service.impl.TwitterServiceImpl and com.day.iks.service.impl.GtalkServiceImpl. For Twitter and GTalk you need to supply the credentials of the (technical) user that shall send the DMs or chat messages, respectively. For e-mail you need to configure your mail server.

    Next, create a custom workflow in the CQ5 workflow section and add the workflow action (name). The configuration options are:

    • for sending an e-mail: email,user@mydomain.com,some_message
    • for sending a direct message on Twitter: dm,twitter_user,some_message
    • for sending a chat message on Gtalk: gtalk,user@gmail.com,some_message

    The content item's path is appended to the (optional) message.

    Here is an example for GTalk:

    In the cases of Twitter DM and GTalk, make sure that the recipient has opted in to receive messages from the technical user you have configured as the sender.

    Posted by Michael Marth NOV 12, 2009

    Posted in cq5, iks-project and tutorial

    CQ5 search comes with some improvements over JCR's search capabilities, e.g. adapting result rankings to what users choose, or faceted search. Within the IKS project, Bertrand and I have experimented with another possibility: link-based ranking, i.e. adjusting search results based on the text of the links pointing to a page. For example: if page A links to page B with the link text "lorem ipsum", then page B should get a higher ranking when a user searches for "lorem ipsum". This is essentially what Google does, but we wanted to apply it to internal links (within the same site) only.

    To give away the result up front: for many web sites the results will probably not improve dramatically, because there are not enough internal links. However, it might help for some projects, so our implementation approach is described below in case you want to give it a try.

    To extract links from a node, we opted for parsing the complete rendered HTML presentation of the page rather than looking only at the rich text properties of a single node. That way we could also catch links generated programmatically by templates. So we ended up setting up a little spider on the publish server that retrieves the HTML representations of all pages. The spider is deployed as an OSGi bundle within the server, so it gets the locations of all pages from an internal repository query. For each page the HTML is retrieved and parsed. The links found are stored as child nodes below the page that is linked to. In the example from above: if page A links to page B with the link text "lorem ipsum", then page B gets a child node with the properties source=A and text="lorem ipsum". Implemented this way, we could use the Jackrabbit indexer without further changes.
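    The core of such a spider, turning a page's rendered HTML into (source, link text) entries grouped by link target, might look like the following sketch. It is not the attached code: for brevity it uses a regular expression instead of a proper HTML parser, treats only hrefs starting with a single / as internal, and assumes a hypothetical URL-to-path mapping (stripping .html). Writing the entries as JCR child nodes is left out.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class BacklinkExtractor {
    // One incoming link: the page it comes from and its link text.
    static final class Backlink {
        final String source;
        final String text;

        Backlink(String source, String text) {
            this.source = source;
            this.text = text;
        }

        @Override
        public String toString() {
            return source + " -> \"" + text + "\"";
        }
    }

    // Simplification: regex instead of an HTML parser; double-quoted href only.
    private static final Pattern LINK = Pattern.compile(
            "<a\\s[^>]*href=\"([^\"]+)\"[^>]*>(.*?)</a>",
            Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    // Records every internal link in sourceHtml under the page it points to.
    static void extract(String sourcePath, String sourceHtml,
                        Map<String, List<Backlink>> backlinksByTarget) {
        Matcher m = LINK.matcher(sourceHtml);
        while (m.find()) {
            String href = m.group(1);
            if (!href.startsWith("/") || href.startsWith("//")) {
                continue; // external or protocol-relative link
            }
            // Hypothetical URL-to-path mapping: drop a trailing .html
            String target = href.replaceFirst("\\.html$", "");
            // Strip nested markup such as <b> from the link text
            String text = m.group(2).replaceAll("<[^>]*>", "").trim();
            // Ignore empty link texts and links of a page to itself
            if (!text.isEmpty() && !target.equals(sourcePath)) {
                backlinksByTarget.computeIfAbsent(target, k -> new ArrayList<>())
                        .add(new Backlink(sourcePath, text));
            }
        }
    }

    public static void main(String[] args) {
        Map<String, List<Backlink>> backlinks = new LinkedHashMap<>();
        extract("/content/a",
                "<p>See <a href=\"/content/b.html\">lorem ipsum</a> and "
                        + "<a href=\"http://example.com/\">elsewhere</a></p>",
                backlinks);
        System.out.println(backlinks);
        // prints: {/content/b=[/content/a -> "lorem ipsum"]}
    }
}
```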

    We have also implemented a JCR observer that catches changes to pages and updates the corresponding link nodes. Template updates are not caught yet.

    The sources are attached to this post. The Java program can be used as a standalone application or deployed as an OSGi bundle. The standalone program takes a couple of optional arguments, e.g. for running a full initial spidering pass or for deleting all link nodes found. In case you want to give it a try, please be aware that:

    • The standalone program requires RMI to be enabled on the repository, which is not the case by default (the code uses port 1235).
    • The searches must take into account the new properties of the link nodes. One possibility is to re-configure the Jackrabbit indexing, which in CQ5 is done in the crx-quickstart/server/runtime/0/_crx/WEB-INF/classes/indexing_config.xml file, by adding:
      <index-rule nodeType="nt:unstructured"
        condition="parent::backlinks/@jcr:primaryType='{http://www.jcp.org/jcr/nt/1.0}unstructured'">
        <property boost="5.0">linkedText</property>
      </index-rule>
      

    The boost factor in this configuration can be adjusted to give links a proper weight relative to the other properties of a node.
    To trigger reindexing, delete these directories:
    crx-quickstart/repository/repository/index
    crx-quickstart/repository/workspaces/crx.default/index
    crx-quickstart/repository/workspaces/crx.system/index
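    Deleting the three directories can be scripted; below is a small Java sketch using java.nio.file. It assumes the CQ5 instance is stopped while it runs (the repository keeps its index files open) and relies on Jackrabbit rebuilding missing index directories on the next start, which is what this reindexing step depends on. IndexReset is a hypothetical name.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Comparator;
import java.util.stream.Stream;

class IndexReset {
    // The index directories listed above, relative to the CQ5 installation directory.
    static final String[] INDEX_DIRS = {
        "crx-quickstart/repository/repository/index",
        "crx-quickstart/repository/workspaces/crx.default/index",
        "crx-quickstart/repository/workspaces/crx.system/index"
    };

    // Recursively deletes each index directory that exists; returns how many were removed.
    static int deleteIndexes(Path installDir) throws IOException {
        int removed = 0;
        for (String dir : INDEX_DIRS) {
            Path index = installDir.resolve(dir);
            if (!Files.isDirectory(index)) {
                continue;
            }
            try (Stream<Path> walk = Files.walk(index)) {
                // Reverse order visits children before their parent directories.
                for (Path p : (Iterable<Path>) walk.sorted(Comparator.reverseOrder())::iterator) {
                    Files.delete(p);
                }
            }
            removed++;
        }
        return removed;
    }

    public static void main(String[] args) throws IOException {
        Path installDir = Paths.get(args.length > 0 ? args[0] : ".");
        System.out.println("Removed " + deleteIndexes(installDir) + " index directories");
    }
}
```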

    Results

    We tested the approach on the content of our corporate website (a rather small content corpus). Overall, the search results improved slightly, but not much (although we did not spend a lot of time tweaking the boost factor). As stated above, I believe that corporate websites in general will not benefit much from link-based ranking, as the majority of their links often just reflect the navigation (i.e. the hierarchical structure of the site) and therefore provide little additional information. On the other hand, there is no harm in using links for search relevance either.

    Alternative approach

    Marcel Reutegger (the MAN when it comes to JCR searches) gave a lot of great input to our experiment (thanks a lot for this). He also hinted at what an alternative implementation could look like: using an output filter, which can process HTML content as it is being generated. In CQ5 the validity of links is already checked that way, so storing them would fit naturally there. He also suggested storing the links not below the pages themselves, but in a separate part of the repository. A background processing job could then aggregate these links and eventually write the most relevant keywords into the page nodes.

    Posted by Michael Marth JAN 20, 2009

    Posted in jackrabbit, jcr, tools and tutorial

    There is an interesting new piece in the Jackrabbit sandbox: Jukka Zitting has committed a JDBC-to-JCR bridge. This bridge acts as a JDBC driver and thus allows users to connect to a JCR repository through JDBC. I was very happy to see this because I have one or two use cases where it comes in very handy: for example, I would like to use a standard reporting tool to produce reports on JCR content. These tools work best with relational data and therefore need JDBC connections (*).

    On the Jackrabbit mailing list, Jukka has outlined how to use the driver and how it works. Apart from Jukka's instructions, it is useful to know that the driver internally bundles the Apache Derby DB, so SQL queries are restricted to what Derby can deal with.

    Want to get your feet wet?

    To get started, check out Jackrabbit 1.5.2, compile it with mvn clean install and start the fabulous new standalone server in jackrabbit-standalone\target with

    java -jar jackrabbit-standalone-1.5.2.jar

    Afterwards hit http://localhost:8080 and populate the repository with some documents.

    You also need to check out the driver from the Jackrabbit sandbox and build it with mvn package. Beware, here's a gotcha: you need to use Java 5 (yes, I did not know it still existed either). On Windows I additionally encountered a problem when JDBC connections get closed and a temp file cannot be deleted. This can be remedied by commenting out the line FileUtils.deleteDirectory(tmp) in the close() method of JCRConnection.java and deleting the temp files manually later.

    Once you have compiled it, put the driver on your classpath and you are ready to hit Jackrabbit with some old-school JDBC program. Here's an example:

    package com.day.samples;

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.ResultSet;
    import java.sql.ResultSetMetaData;
    import java.sql.Statement;

    public class JCRDriverTest {
        public static void main(String[] args) throws Exception {
            Class.forName("org.apache.jackrabbit.jdbc.JCRDriver");
            Connection connection = DriverManager
                    .getConnection("jdbc:jcr:http://localhost:8080/rmi");
            try {
                Statement statement = connection.createStatement();
                try {
                    ResultSet resultSet = statement
                            .executeQuery("SELECT a.jcr_path as path, a.jcr_primaryType as type FROM NT_FILE as a");
                    try {
                        ResultSetMetaData rsMetaData = resultSet.getMetaData();
                        int numberOfColumns = rsMetaData.getColumnCount();
                        System.out.println("Number of Columns=" + numberOfColumns);
                        // get the column names; column indexes start from 1(!)
                        for (int i = 1; i <= numberOfColumns; i++) {
                            String columnName = rsMetaData.getColumnName(i);
                            String columnType = rsMetaData.getColumnTypeName(i);
                            // print each column's name and SQL type
                            System.out.println("column name=" + columnName
                                    + " type=" + columnType);
                        }
                        while (resultSet.next()) {
                            System.out.println(resultSet.getString(1) + " "
                                    + resultSet.getString(2));
                        }
                    } finally {
                        resultSet.close();
                    }
                } finally {
                    statement.close();
                }
            } finally {
                connection.close();
            }
        }
    }

    The connection string is "jdbc:jcr:http://localhost:8080/rmi". All nodes are arranged in views where the table name corresponds to the node type (i.e. all nodes of type nt:file are accessible in the view nt_file). ResultSetMetaData is available as well. Quite complex queries are possible; for example, have a look at this query from Jukka's test code:

    SELECT a.jcr_path as path, a.jcr_primaryType as type, COUNT(*) as children
    FROM nt_base AS a
    JOIN nt_base AS b
      ON (a.jcr_path || '/' = SUBSTR(b.jcr_path, 1, LENGTH(a.jcr_path) + 1))
    GROUP BY a.jcr_path, a.jcr_primaryType
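    The naming rule described above (node type nt:file becomes view nt_file, and properties such as jcr:path become columns such as jcr_path) can be captured in a small helper when queries are built programmatically. This is a sketch that assumes the colon-to-underscore replacement is the whole rule; the driver might map names containing other special characters differently, and JcrSqlNames is a hypothetical name.

```java
class JcrSqlNames {
    // Assumption based on the examples above (nt:file -> nt_file,
    // jcr:path -> jcr_path): the namespace colon becomes an underscore.
    static String toSqlName(String jcrName) {
        return jcrName.replace(':', '_');
    }

    // Builds a simple query listing path and primary type for one node type.
    static String listQuery(String nodeType) {
        return "SELECT a." + toSqlName("jcr:path") + " AS path, a."
                + toSqlName("jcr:primaryType") + " AS type FROM "
                + toSqlName(nodeType) + " AS a";
    }

    public static void main(String[] args) {
        System.out.println(listQuery("nt:file"));
        // prints: SELECT a.jcr_path AS path, a.jcr_primaryType AS type FROM nt_file AS a
    }
}
```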

    Reporting

    Now that the JCR repository looks like an RDBMS, it is possible to throw a reporting tool at it. I took iReport 3.1.3 (a GUI for JasperReports), installed the JDBC driver and created a report that produces a pie chart of the top-level domains from which the documents were downloaded (the path of each document corresponds to its URL, so the JCR path can be evaluated for this report).
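    The post does not show how the report derives the top-level domains. As a rough plain-Java sketch of that aggregation, assuming document paths of the form /<host>/<rest of the URL> (a hypothetical layout; the actual repository structure may differ), counting documents per top-level domain could look like this:

```java
import java.util.Arrays;
import java.util.Map;
import java.util.TreeMap;

class TopLevelDomainReport {
    // Assumes paths like /www.example.org/docs/a.pdf, i.e. the first path
    // segment is the host name of the original URL.
    static String topLevelDomain(String jcrPath) {
        String[] segments = jcrPath.split("/");
        if (segments.length < 2 || segments[1].isEmpty()) {
            return null;
        }
        String host = segments[1];
        int dot = host.lastIndexOf('.');
        return dot >= 0 ? host.substring(dot + 1).toLowerCase() : null;
    }

    // Counts documents per top-level domain: the data behind the pie chart.
    static Map<String, Integer> countByTld(Iterable<String> paths) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String path : paths) {
            String tld = topLevelDomain(path);
            if (tld != null) {
                counts.merge(tld, 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        System.out.println(countByTld(Arrays.asList(
                "/www.example.org/a.pdf", "/example.com/docs/b.pdf", "/www.example.com/c.doc")));
        // prints: {com=2, org=1}
    }
}
```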

     

    It should be noted that the driver is "in the sandbox", which means it is nowhere near production-ready. If you are interested in using it and run into problems, the Jackrabbit mailing list is a good place to turn to.

    (*) Actually, for completeness, some reporting tools also work with XML data so one could also use JCR's XML export or Sling's XML rendering if available.

    Posted by Michael Marth OCT 21, 2008

    Posted in ria, sling and tutorial

    While Adobe Flex has some drawbacks, like a broken implementation of HTTP, it also has its virtues. One of them is that it makes it easier for taste-challenged developers like me to come up with decent user interfaces. So here's a little post on building Flex UIs for Sling (confined to reading content from the repository). The example app I would like to discuss is a slideshow whose images are retrieved from a Sling-powered JCR repository (where they are stored as regular files, so that new images can be added through WebDAV). The images are displayed using a (simplified) Ken Burns effect.

    This is the Flex app in action:

    (does anyone else think that zooming into the snail is somewhat scary?)

    In order to retrieve the images from Sling I looked at two integration strategies: client-side and server-side.

    Client-side

    There is Sling's JavaScript client library at /system/sling.js. Amongst other things, this library allows read access to the content (it is used, for example, in the JSTs of the CRX Quickstart sample Firststeps). Flex can interface with JavaScript (via ExternalInterface), so one can call the Sling library's method for retrieving the content (as JSON), pass the JSON object back into the Flex app, parse it and retrieve the images from there.

    The content is assumed to reside in a single folder of the repository. The folder's path is passed to the Flex app as a FlashVar, which can be read in the Flex app like this:

    var contentPath : String = Application.application.parameters.contentPath;

    In the ActionScript code I have (thinly) wrapped some of the Sling client library's methods like this:

    public function getContent(path : String, maxLevel : int = 0, filter : Boolean = false) : Object {
      return ExternalInterface.call("Sling.getContent", path, maxLevel, filter);
    }

    Passing the content root folder to the Sling client lib will give us the JSON response which we can parse like this:

    var sling : Sling = new Sling();
    var c : Object = sling.getContent(contentPath, 3);
    for (var a : String in c) {
      if (a.indexOf(":") == -1 && a.charAt(0) != "." && a != "desktop.ini" && a != "Thumbs.db") {
        // this is a slide
        ...
        var slide : Slide = new Slide();
        slide.imageUrl = contentPath + "/" + a;
        ...
      }
    }

    After that, the rest is pure ActionScript code that knows nothing about Sling or any repository.

    It is possible to remove all traces of Sling from the ActionScript code (this might be desirable if, e.g., a Flex coder is not familiar with Sling). This is achieved by not using the Sling client library but rather retrieving the content in JSON format directly from within Flex (through the content's URL with a .json extension, i.e. http://somehost/path/to/content.3.json to retrieve the content 3 levels deep). Since Flex has no built-in JSON capabilities, something like the as3corelib library can be used. One disadvantage of this approach is that the existing functionality of the Sling client library has to be re-implemented.
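    The URL convention used in that approach is simple enough to build in any client. A small Java sketch (SlingJsonUrl and jsonUrl are hypothetical names) that appends the depth selector and the .json extension to a content path:

```java
class SlingJsonUrl {
    // Builds e.g. http://somehost/path/to/content.3.json, which asks Sling's
    // default JSON rendering for the resource and its children 3 levels deep.
    static String jsonUrl(String baseUrl, String contentPath, int depth) {
        if (depth < 0) {
            throw new IllegalArgumentException("depth must not be negative");
        }
        String base = baseUrl.endsWith("/")
                ? baseUrl.substring(0, baseUrl.length() - 1)
                : baseUrl;
        String path = contentPath.startsWith("/") ? contentPath : "/" + contentPath;
        return base + path + "." + depth + ".json";
    }

    public static void main(String[] args) {
        System.out.println(jsonUrl("http://somehost", "/path/to/content", 3));
        // prints: http://somehost/path/to/content.3.json
    }
}
```

    Sling also accepts infinity as the depth selector, but for deep content trees a bounded depth keeps the response small.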

    Server-side

    Another possibility is to pass all image links as FlashVars on the server side. This method passes all required URLs to the Flex app when the app is loaded, so additional requests for content and the subsequent JSON parsing are not necessary. However, the FlashVars still have to be parsed, of course:

    private function onCreationComplete() : void {
      for (var i : String in Application.application.parameters) {
        prepareData(Application.application.parameters[i]);
      }
    }

    private function prepareData(imageUrl : String) : void {
      var slide : Slide = new Slide();
      slide.imageUrl = imageUrl;
      ...
    }

    The FlashVars are set on the server side in an .esp script:

    var flashvars = {
      <%
      i = 0;
      var children = currentNode.getNodes();
      for (child in children) {
        i++;
        if (child.charAt(0) != "." && child != "desktop.ini" && child != "Thumbs.db") {
          %>url<%=i%>:"<%=currentNode.getPath()%>/<%=child%>",<%
        }
      }
      %>
    };

    The server-side solution appears to require less code, mainly because there is less parsing to be done. However, once the content structure becomes "richer" (e.g. by including image captions and links) or acquires some sort of hierarchy, the JSON-based approach looks more intuitive to me.

    The example code

    To run the sample, import the attached CRX content package, which contains the compiled code and some content. Then hit http://localhost:7402/content/sexyflexy/albums/album1.clientside.html for the client-side example and http://localhost:7402/content/sexyflexy/albums/album1.serverside.html for the server-side example.

    There is also a Flex Builder project file that contains the Flex sources.

    In case you wonder about the weird if-clauses when looking at the image file names: when you drop images into a WebDAV-mounted repository folder, various operating systems additionally create miscellaneous files that need to be ignored (thanks, Steve and Bill).
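    The filtering rules from the ActionScript and ESP snippets can be expressed as a single predicate. A Java sketch (SlideFilter is a hypothetical name) that also skips names containing a colon, as the client-side code does to ignore JCR properties such as jcr:primaryType in the JSON response; hidden files include, for example, the .DS_Store and ._ files created by macOS:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class SlideFilter {
    private static final List<String> IGNORED_FILES = Arrays.asList("desktop.ini", "Thumbs.db");

    // True if a child name should be shown as a slide.
    static boolean isSlide(String name) {
        return !name.isEmpty()
                && name.indexOf(':') == -1        // JCR properties such as jcr:primaryType
                && !name.startsWith(".")          // hidden files, e.g. .DS_Store or ._snail.jpg
                && !IGNORED_FILES.contains(name); // Windows Explorer helper files
    }

    static List<String> slides(List<String> childNames) {
        List<String> result = new ArrayList<>();
        for (String name : childNames) {
            if (isSlide(name)) {
                result.add(name);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(slides(Arrays.asList(
                "jcr:primaryType", "snail.jpg", ".DS_Store", "._snail.jpg", "Thumbs.db", "beach.jpg")));
        // prints: [snail.jpg, beach.jpg]
    }
}
```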

    I do not know too much about other RIA technologies like Silverlight and JavaFX, but I suspect that similar approaches should be possible. If you know more, please leave a comment.

    The example images are taken from: http://openphoto.net/download/index.html?image_id=16690, http://openphoto.net/download/index.html?image_id=18186, and http://openphoto.net/download/index.html?image_id=10035.

    Posted by Michael Marth OCT 15, 2008

    Posted in osgi and tutorial

    Currently, I am moving this blog onto the latest version of Sling. Part of this effort is the migration of the comment spam checker into an OSGi bundle (mostly, that means wrangling with Maven). Here are two little bits of information I came across along the way. Maybe they can be useful to someone.

    The actual backend services used for comment verification are Akismet and Typepad's Anti Spam. I took David Czarnecki's Akismet-Java library, which wraps the respective REST APIs of these services (both service providers actually use the same API).

    The trouble with David's code is that it uses commons-httpclient, which depends on commons-logging. That clashes with Sling's use of log4j (note to the Java community: how could we get into this logging mess?). I found the solution to this annoying problem in Sling's parent pom.xml. Here's the relevant bit:

    <dependency>
      <groupId>commons-httpclient</groupId>
      <artifactId>commons-httpclient</artifactId>
      <version>3.1</version>
      <scope>provided</scope>
      <exclusions>
        <exclusion>
          <groupId>commons-logging</groupId>
          <artifactId>commons-logging</artifactId>
        </exclusion>
      </exclusions>
    </dependency>
    

    The second thing I would like to point out: it is quite simple to make OSGi bundles configurable through the web console (at http://localhost:7402/system/console).

    This is useful e.g. for configuring the API key of the above-mentioned service providers. In order to expose a property in the console use the annotation @scr.property:

    /** @scr.property */
    public static final String PARAM_API_KEY = "akismet.service.api.key"; 
    

    Other types like integer or boolean can also be used:

    /** @scr.property value="0" type="Integer" options 0="Akismet" 1="Typepad" */
    public static final String PARAM_SERVICE_PROVIDER = "akismet.service.provider";  
    

    The values are read in the service's setup method:

    Object key = configuration.get(PARAM_API_KEY);
    if (key != null) {
        this.apiKey = key.toString();
    }
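    Configuration values may arrive with a different type than declared (e.g. a String where an Integer is expected), so it can be worth reading them defensively and falling back to defaults. A plain-Java sketch, assuming the configuration is available as a Dictionary as in the snippet above; SpamCheckerConfig and configure are hypothetical names, while the property names and the 0/1 provider options are the ones from this post.

```java
import java.util.Dictionary;
import java.util.Hashtable;

class SpamCheckerConfig {
    // Property names as exposed via @scr.property above.
    static final String PARAM_API_KEY = "akismet.service.api.key";
    static final String PARAM_SERVICE_PROVIDER = "akismet.service.provider";
    static final int PROVIDER_AKISMET = 0;
    static final int PROVIDER_TYPEPAD = 1;

    String apiKey;
    int serviceProvider = PROVIDER_AKISMET;

    // Hypothetical setup method: reads values defensively, keeping defaults.
    void configure(Dictionary<String, ?> configuration) {
        Object key = configuration.get(PARAM_API_KEY);
        if (key != null) {
            this.apiKey = key.toString();
        }
        // Accept an Integer (as declared) or a numeric String.
        Object provider = configuration.get(PARAM_SERVICE_PROVIDER);
        if (provider instanceof Number) {
            this.serviceProvider = ((Number) provider).intValue();
        } else if (provider != null) {
            try {
                this.serviceProvider = Integer.parseInt(provider.toString().trim());
            } catch (NumberFormatException e) {
                // not a number: keep the default provider (Akismet)
            }
        }
        // Unknown option values also fall back to Akismet.
        if (serviceProvider != PROVIDER_AKISMET && serviceProvider != PROVIDER_TYPEPAD) {
            serviceProvider = PROVIDER_AKISMET;
        }
    }

    public static void main(String[] args) {
        Dictionary<String, Object> props = new Hashtable<>();
        props.put(PARAM_API_KEY, "0123abcd");
        props.put(PARAM_SERVICE_PROVIDER, "1");
        SpamCheckerConfig config = new SpamCheckerConfig();
        config.configure(props);
        System.out.println(config.apiKey + " / provider " + config.serviceProvider);
        // prints: 0123abcd / provider 1
    }
}
```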
    

    As usual, the full sources are attached.