
Entries filed under 'everything is content'

    Posted by Michael Marth APR 18, 2008

    Posted in crx quickstart, data first, davids model, everything is content, graph, jcr, open, rest, sling, social and tutorial Comments 4

    If you follow emerging Internet standards, you will have come across OpenSocial, the Google-led spec for social network applications. Major supporters are MySpace, LinkedIn, XING, Google's own Orkut, Hi5 and others. The Apache Software Foundation's implementation of this spec is called Apache Shindig. It is a container (runtime) for OpenSocial applications, which are called gadgets.

    In my opinion OpenSocial and Apache Sling are a good technical fit for at least two reasons:

    1. On a raw technology level, both use the same building blocks, e.g. JavaScript: Sling uses JS on the server side for .esp templates and on the client side for JST templates, and OpenSocial gadgets are coded in JS as well. Moreover, related technologies like JSON, feeds and REST are supported by both.
    2. On a more conceptual level: as a spec that must work across a number of different social networks, OpenSocial makes most of the information accessible through its API optional, i.e. it is up to the container whether the data is returned or not. This fits well with the unstructured, "Data First" approach that Sling (or rather, the underlying JCR) enables.

    In this blog post I would like to show Apache Shindig (Apache's OpenSocial container implementation) and CRX Quickstart (a bundle of Apache Sling and Day's JCR-compliant repository) working together.

    Installation

    In this screencast I have shown how to install CRX Quickstart: double-click on its icon (CRX Quickstart is not available yet, but it will be very soon). Strictly speaking, you do not need CRX Quickstart for the examples below; it all works with "plain" Sling as well.

    Installing Shindig is a tad more complicated and is described on Shindig's web site. You need to check out the sources, do a Maven build (I used revision 648157 for this example) and start Shindig's Jetty server on port 8080 with:

    mvn jetty:run-war

    Once Shindig is running, open http://localhost:8080/gadgets/files/samplecontainer/samplecontainer.html. You should get a kind of gadget console that looks like this:

     

    Shindig comes with an example implementation of a social network. By default it runs the "Hello World" example gadget located at http://localhost:8080/gadgets/files/samplecontainer/examples/SocialHelloWorld.xml. (By the way, Shindig comes with some example data, so don't worry if you have no friends: Shindig has some imaginary ones for you.)

    Friends are Content

    What I would like to do is grab the gadget viewer's friends and all the available data about them, and store this data in the repository. For this purpose I have written a little gadget (see below) and saved it in my JCR repository at /apps/friends/friendsaver.html. By default the repository runs on port 7402, so when I point the gadget console to http://localhost:7402/apps/friends/friendsaver.html I get:

     

    The gadget retrieves the viewer's friends and displays them in HTML. Moreover, in the background, the viewer's data and the available friend data are posted to my repository. In the Content Explorer the result looks like this:

     

    Hey, remember the "Everything is Content" mantra? Well, your imaginary friends are content, too.

    Please note that this works without setting up any schema or any other configuration of the repository. I ran it on an out-of-the-box CRX Quickstart (see also this screencast and this post about Data First). Node properties are created only for the fields that are actually sent.

    The Gadget Code

    The gadget is completely standard OpenSocial code; no surprises here. In onLoadFriends() the viewer's friends (variable viewerFriends) are iterated and displayed in HTML. For each opensocial.Person object the function createFriendNode() is called, which sends an HTTP POST request to the repository to persist the person. The available opensocial.Person.Field data is sent as POST parameters (in the code only gender and the first phone number are implemented) and is thus persisted as node properties. I want to leverage the repository's hierarchy and store the friends as child nodes below the viewer (see David's model, rule 2). Here's the relevant snippet:

    /**
     * Request for friend information.
     */
    function getData() {
      var req = opensocial.newDataRequest();
      req.add(req.newFetchPersonRequest(
          opensocial.DataRequest.PersonId.VIEWER), 'viewer');
      req.add(req.newFetchPeopleRequest(
          opensocial.DataRequest.Group.VIEWER_FRIENDS), 'viewerFriends');
      req.send(onLoadFriends);
    }

    /**
     * Parses the response to the friend request.
     * @param {Object} dataResponse Friend information that was requested.
     */
    function onLoadFriends(dataResponse) {
      var viewer = dataResponse.get('viewer').getData();
      var html = 'Friends of ' + viewer.getDisplayName() + ':<br><ul>';
      // The viewer becomes a node directly below /content/friends.
      createFriendNode(viewer);
      var viewerFriends = dataResponse.get('viewerFriends').getData();
      viewerFriends.each(function(person) {
        html += '<li>' + person.getDisplayName() + '</li>';
        // Each friend becomes a child node of the viewer's node.
        createFriendNode(person, viewer);
      });
      html += '</ul>';
      document.getElementById('message').innerHTML = html;
    }

    /**
     * POSTs a person to the repository. A URL ending in "/*" tells Sling's
     * POST servlet to create a new node; each POST parameter becomes a property.
     */
    function createFriendNode(person, parent) {
      var url = "http://localhost:7402/content/friends/";
      if (parent) {
        url += sanitizeId(parent.getId()) + "/*";
      } else {
        url += "*";
      }

      var params = {};
      params[gadgets.io.RequestParameters.CONTENT_TYPE] = gadgets.io.ContentType.TEXT;
      params[gadgets.io.RequestParameters.METHOD] = gadgets.io.MethodType.POST;

      // Values are URL-encoded so that e.g. spaces in display names survive.
      var postParams = 'name=' + encodeURIComponent(sanitizeId(person.getId()))
          + '&fullname=' + encodeURIComponent(person.getDisplayName());
      // Lets Sling render the node with scripts below /apps/friends (e.g. html.esp).
      postParams += '&sling:resourceType=friends';
      var phones = person.getField(opensocial.Person.Field.PHONE_NUMBERS);
      if (phones && phones.length > 0) {
        postParams += '&phone=' + encodeURIComponent(
            phones[0].getField(opensocial.Phone.Field.NUMBER));
      }
      var gender = person.getField(opensocial.Person.Field.GENDER);
      if (gender) {
        postParams += '&gender=' + encodeURIComponent(gender.getKey());
      }
      // I could add more fields here...

      params[gadgets.io.RequestParameters.POST_DATA] = postParams;
      gadgets.io.makeRequest(url, null, params);
    }

    // Replaces every "." (not just the first) so the ID is usable as a node name.
    function sanitizeId(id) {
      return id.replace(/\./g, "_");
    }

    gadgets.util.registerOnLoadHandler(getData);
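Because the gadget depends on the opensocial.* and gadgets.* APIs, it only runs inside a container like Shindig. As a minimal sketch, the part that decides where a person goes in the repository and which POST parameters are sent can be written as plain JavaScript and tried anywhere. The helper buildFriendPost and its plain-object input (id, displayName and the optional phone and gender, standing in for an opensocial.Person) are hypothetical, as are the sample IDs john.doe and jane.doe; the sketch also assumes the nodes should carry sling:resourceType=friends so that a script under /apps/friends can render them.

```javascript
// Hypothetical helper: computes the Sling POST target and form body for a
// person, mirroring the logic of createFriendNode() without gadgets.*.
// Input is a plain object { id, displayName, phone?, gender? } standing in
// for an opensocial.Person (an assumption, not the OpenSocial API).
const REPO_BASE = "http://localhost:7402/content/friends/";

// Replace every "." so the OpenSocial ID can serve as a JCR node name.
function sanitizeId(id) {
  return id.replace(/\./g, "_");
}

function buildFriendPost(person, parent) {
  // "/*" asks Sling's POST servlet to create a new child node: friends go
  // below their viewer's node, the viewer goes directly below /content/friends.
  const url = parent
    ? REPO_BASE + sanitizeId(parent.id) + "/*"
    : REPO_BASE + "*";

  const form = new URLSearchParams();
  form.append("name", sanitizeId(person.id));
  form.append("fullname", person.displayName);
  // Assumed resource type, so that /apps/friends/html.esp renders the node.
  form.append("sling:resourceType", "friends");
  // Optional fields: only sent (and thus only stored) when they are present.
  if (person.phone) form.append("phone", person.phone);
  if (person.gender) form.append("gender", person.gender);

  // URLSearchParams produces an application/x-www-form-urlencoded body.
  return { url, body: form.toString() };
}

// Example: a viewer and one friend without a phone number.
const viewer = { id: "john.doe", displayName: "John Doe" };
const friend = { id: "jane.doe", displayName: "Jane Doe", gender: "FEMALE" };
console.log(buildFriendPost(viewer));
console.log(buildFriendPost(friend, viewer));
```

Only fields that are present end up in the body, which is why the repository ends up with properties only for the data a container actually returned.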

    Round-Tripping

    Now that the friends are stored in the repository, each one has a URL. A friend can be displayed in a simple HTML page with, e.g., server-side JavaScript. Storing this template in the repository at /apps/friends/html.esp

    <html>
      <body>
        <h1><%= currentNode["fullname"] %></h1>
        <ul>
          <li>gender: <%= currentNode["gender"] %></li>
          <li>phone: <%= currentNode["phone"] %></li>
        </ul>
      </body>
    </html>

    renders the node at http://localhost:7402/content/friends/john_doe/jane_doe.html as a simple HTML page. Sling selects the script based on the node's sling:resourceType property, so this template is only used for nodes whose sling:resourceType is "friends".
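Why does /apps/friends/html.esp render jane_doe.html? As a rough sketch of Sling's default script lookup: the request path is split into the resource path and the extension, the node's sling:resourceType (or, if absent, its primary node type with ":" turned into "/") names a folder below /apps, and a script named after the extension is looked for there. The function resolveScript and the in-memory node map are hypothetical; the sketch ignores selectors, request methods, /libs, resource super types and the fallback to Sling's default servlets (which is what serves a node's plain .json rendering).

```javascript
// Simplified model of Sling's default script lookup for a GET request.
// Hypothetical helper: real Sling also handles selectors, /libs, super types.
function resolveScript(requestPath, nodes) {
  // Split "/content/.../jane_doe.html" into resource path and extension.
  const lastSlash = requestPath.lastIndexOf("/");
  const firstDot = requestPath.indexOf(".", lastSlash);
  if (firstDot < 0) return null;
  const resourcePath = requestPath.slice(0, firstDot);
  const extension = requestPath.slice(requestPath.lastIndexOf(".") + 1);

  const node = nodes[resourcePath];
  if (!node) return null;

  // Without sling:resourceType the primary node type is used, e.g.
  // "nt:unstructured" maps to the script folder "nt/unstructured".
  const resourceType = node["sling:resourceType"] || node["jcr:primaryType"];
  const scriptFolder = resourceType.replace(/:/g, "/");
  return {
    resourcePath,
    extension,
    script: "/apps/" + scriptFolder + "/" + extension + ".esp",
  };
}

// Illustrative repository content: one node with, one without a resource type.
const repo = {
  "/content/friends/john_doe/jane_doe": {
    "jcr:primaryType": "nt:unstructured",
    "sling:resourceType": "friends",
    fullname: "Jane Doe",
  },
  "/content/friends/john_doe": { "jcr:primaryType": "nt:unstructured", fullname: "John Doe" },
};

console.log(resolveScript("/content/friends/john_doe/jane_doe.html", repo).script); // /apps/friends/html.esp
console.log(resolveScript("/content/friends/john_doe.html", repo).script); // /apps/nt/unstructured/html.esp
```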

    But this is only half the fun. It is much more interesting to retrieve the friends' data in another OpenSocial gadget. This can easily be done without any repository-side code, as Sling natively supports the JSON format. For example, the URL http://localhost:7402/content/friends/john_doe/jane_doe.json returns this node in JSON format. This way, we can easily access the friend nodes from a gadget containing this snippet:

    function makeCRXRequest() {
      var params = {};
      params[gadgets.io.RequestParameters.CONTENT_TYPE] = gadgets.io.ContentType.JSON;
      // Requesting the node with the .json extension returns it as JSON.
      var url = "http://localhost:7402/content/friends/john_doe/"
          + encodeURIComponent(document.getElementById("person_name").value)
          + ".json";
      gadgets.io.makeRequest(url, response, params);
    }

    function response(person) {
      // With ContentType.JSON the parsed node properties are in person.data.
      var html = "";
      html += "name: " + person.data.fullname + "<br/>";
      html += "phone: " + person.data.phone + "<br/>";
      html += "gender: " + person.data.gender + "<br/>";
      document.getElementById('content_div').innerHTML = html;
    }
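Since a friend node only has the properties that were actually sent, response() prints "undefined" when, for example, a friend has no phone number, and it writes repository values into innerHTML without escaping. A minimal, more defensive renderer could look like this; renderFriend and escapeHtml are hypothetical helpers that response() would call with person.data, and the property names are the ones used in the POST (fullname, phone, gender).

```javascript
// Escape the characters that are significant in HTML text content.
function escapeHtml(value) {
  return String(value)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;");
}

// Render only the properties that exist on the node: in a "Data First"
// repository any of them may be missing. `node` is the parsed Sling JSON.
function renderFriend(node) {
  const labels = [["fullname", "name"], ["phone", "phone"], ["gender", "gender"]];
  return labels
    .filter(([prop]) => node[prop] !== undefined)
    .map(([prop, label]) => label + ": " + escapeHtml(node[prop]) + "<br/>")
    .join("");
}

// Illustrative node without a phone property.
const jane = JSON.parse(
  '{"jcr:primaryType":"nt:unstructured","name":"jane_doe","fullname":"Jane Doe","gender":"FEMALE"}'
);
console.log(renderFriend(jane)); // name: Jane Doe<br/>gender: FEMALE<br/>
```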

    The gadget in action looks like this:

     

    This little hack could be the starting point for a cross-social network phone book application.

    Final remarks

    I hope I have shown that Sling and Shindig go really well together. In particular, being able to use the JCR repository as a backend without any coding on the repository side looks tempting to me. Maybe at some point Sling will even be able to run OpenSocial gadgets natively.

    In this post I concentrated on frontend integration technologies. But OpenSocial will soon add a REST API next to its JS API. For Shindig, the implementation of this REST API is likely to be based on Apache Abdera, which can optionally use JCR for persistence. So there will be additional points of contact.

    Posted by Michael Marth FEB 15, 2008

    Posted in atom, atompub, everything is content, jcr and jsr-170 Comment 1

    JBoss has started a new project called DNA. As far as I understand it, DNA is a new implementation of the concepts behind the MetaMatrix software they bought last year. They describe it as:

    a repository and set of tools that make it easy to capture, version, analyze, and understand the fundamental building blocks of information

    Take a look at the architecture diagram of DNA. There are a number of interesting aspects to this project:

    • DNA uses Java Content Repositories for information storage

      JBoss DNA manages its information in JCR repositories
      I could not find any information about the actual implementation that is used, though.
    • More importantly, in DNA JCR is also seen as an API to all other sorts of existing content:

      Integrate multiple JCR repositories. Use relational databases. Access applications and services. JBoss DNA can federate and integration information from multiple JCR repositories, external databases, applications and services - all in real time without having to make copies.
      This is very much in line with the original vision of JSR-170: "JCR the API" is more important than "JCR the implementation". It is also very similar to the "everything is content" vision Day has been sharing for a very long time.
    • One last aspect I want to mention is displayed in the upper part of the architecture diagram: access to the repository is possible through HTTP, REST and ATOM (as well as JDBC!). The JCR-ATOM love affair seems to have a new offspring.
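The federation idea quoted above, one repository-style API over several sources without copying data, can be illustrated with a toy read-only facade that mounts sources under path prefixes and routes each read to the most specific mount. This is not JBoss DNA's design or API: the Federation class, its mount/read methods and the two sample sources are hypothetical, and each source is assumed to be any object with a get(relativePath) method.

```javascript
// Toy federation: one path namespace over several independent content sources.
class Federation {
  constructor() {
    this.mounts = []; // [prefix, source] pairs
  }

  // Mount a source (any object with get(relativePath)) under a path prefix.
  mount(prefix, source) {
    this.mounts.push([prefix.replace(/\/$/, ""), source]);
    // Longest prefix first, so "/crm/customers" wins over "/".
    this.mounts.sort((a, b) => b[0].length - a[0].length);
    return this;
  }

  // Route a read to the most specific mount; nothing is copied.
  read(path) {
    for (const [prefix, source] of this.mounts) {
      if (path === prefix || path.startsWith(prefix + "/")) {
        return source.get(path.slice(prefix.length) || "/");
      }
    }
    return undefined;
  }
}

// Hypothetical sources: an in-memory "repository" and a stand-in for a table.
const docs = { "/docs/readme": { title: "Readme" } };
const repository = { get: (p) => docs[p] };
const customerTable = {
  rows: { 42: { name: "ACME" } },
  get(p) {
    return this.rows[p.split("/").pop()];
  },
};

const fed = new Federation().mount("/", repository).mount("/crm/customers", customerTable);
console.log(fed.read("/docs/readme"));
console.log(fed.read("/crm/customers/42"));
```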

    It looks to me as if this project might produce pieces of infrastructure that are useful far beyond the use case it currently targets.

    Posted by Michael Marth DEC 20, 2007

    Posted in everything is content, jcr and rest Comments 2

    A while ago Stefano Mazzocchi wrote an excellent post titled "Data First vs. Structure First". In it he describes a strategy called "Data First", where the data of an information system is, well, not structured in advance; instead, its structure is allowed to emerge over time.

    He proclaims that:

    1. Data First is how we learn and how languages evolve. We build rules, models, abstractions and categories in our minds after we have collected information, not before. This is why it's easier to learn a computer language from examples than from its theory, or a natural language by just being exposed to it instead of knowing all rules and exceptions.

    2. Data First is more incrementally reversible, complexity in the system is added more gradually and it's more easily rolled back.

    3. Because of the above, Data First's Return on Investment is more immediately perceivable, thus lends itself to be more easily bootstrappable.

    He also gives these real-life examples of Data First approaches:

    But look around now: the examples of 'data emergence' are multiplying and we use them every day. Google's PageRank, Amazon's co-shopping, Citeseer's co-citation, del.icio.us and Flickr co-tagging, Clusty clustering, these are all examples of systems that try to make structure emerge from data, instead of imposing the structure and pretend that people fill it up with data.

    The opposite approach is Structure First. Stefano asks:

    But then, one might ask, why is everybody so obsessed with design and order? Why is it so hard to believe that self-organization could be used outside the biological realm as a way to manage complex information systems?

    One important thing can be noted:

    On a local time-scale and once established, "Structure First" systems are more efficient.

    This is a great and thought-provoking post, because I am, like many others, trained to think about data in terms of structures (first). But I realize that this way of thinking can also be a limitation in what can be achieved.

    I would actually like to add one more aspect to Stefano's question of why we are "so obsessed with design and order": our tools. In many developers' minds, thinking about data is equivalent to mentally setting up tables and rows in a relational model. To a large extent, it is our tools that shape our thinking.

    But there are tools that do NOT force us to structure the data in advance or, even better, that allow us to add as much structure as we like. As you might expect on this blog, one such tool is a Java Content Repository like CRX. In a JCR you can go the full structure route and fully define node types, but you can also leave all your data unstructured (as David suggests in his model) or do anything in between. That is why I have been suggesting that JCRs are well suited for rapid application development: the structure is allowed to emerge as you go along.
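To make "the structure is allowed to emerge" more concrete: with unstructured nodes you can store records first and only later check which properties actually occur, and how often, before deciding whether to formalize them in a node type. The sketch below does that over plain objects standing in for nt:unstructured nodes; inferStructure, the "mandatory candidate" heuristic and the sample records are hypothetical, not part of the JCR API.

```javascript
// Summarize which properties occur across schema-less records, how often,
// and with which JavaScript value types.
function inferStructure(records) {
  const summary = {};
  for (const record of records) {
    for (const [prop, value] of Object.entries(record)) {
      const entry = summary[prop] || (summary[prop] = { count: 0, types: new Set() });
      entry.count += 1;
      entry.types.add(Array.isArray(value) ? "array" : typeof value);
    }
  }
  // A property present in every record is a candidate for a mandatory
  // property if a node type is defined later.
  for (const entry of Object.values(summary)) {
    entry.mandatoryCandidate = entry.count === records.length;
    entry.types = [...entry.types];
  }
  return summary;
}

// Illustrative data stored "data first", without any schema.
const friends = [
  { fullname: "Jane Doe", gender: "FEMALE" },
  { fullname: "George Doe", phone: "555-1234" },
  { fullname: "Mario Rossi", phone: "555-9876", gender: "MALE" },
];
console.log(inferStructure(friends));
```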

    (see Stefano again:)

    But there is more: we all know that a complete mess is not a very good way to find stuff, so "data first" has to imply "structure later" to be able to achieve any useful capacity to manage information. Here is where things broke down in the past: not many believed that useful structures could emerge out of collected data.

    Now, I am pleased to see that these ideas are gaining traction within the IT industry. Just recently, two alternative implementations of these concepts have surfaced:

    Amazon SimpleDB

    Like all Amazon web services, SimpleDB is a large (massively scalable, I presume) hosted service. Amazon describes it as a spreadsheet, but to me it looks more like a hash map. Importantly, the value side of each key-value pair can hold multiple attributes:

    In Amazon SimpleDB, to add the items above, you would PUT the three itemIDs into your domain along with the attribute-value pairs for each of the items. Without the specific syntax, it would look something like this:

    - PUT (item, 123), (description, sweater), (color, blue), (color, red)
    - PUT (item, 456), (description, dress shirt), (color, white), (color, blue)
    - PUT (item, 789), (description, shoes), (color, black), (material, leather)

    Amazon SimpleDB differs from tables of traditional databases in several important ways. First, you have the flexibility to easily go back later on and add new attributes that only apply to certain items - for example, sleeve length for dress shirts. Additionally there is no need to pre-define data types.[...]

    Amazon SimpleDB automatically indexes all of your data, enabling you to easily query for an item based on attributes and their values. In the above example, you could submit a query for items where (color = blue AND description = dress shirt), and Amazon SimpleDB would quickly return item 456 as the result.

    Note that there is no schema or data structure to set up. In fact, setting one up is not even possible (unlike in a JCR).
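The item model quoted above can be sketched in memory: each item maps attribute names to sets of values (so an item can be both blue and red), and a query is a conjunction of attribute = value conditions. This only illustrates the data model with Amazon's three example items; the ItemDomain class, its put/query methods and the "sleeve length" value are hypothetical and are not the SimpleDB API or query syntax.

```javascript
// In-memory model of schema-less items with multi-valued attributes.
class ItemDomain {
  constructor() {
    this.items = new Map(); // itemId -> Map(attribute -> Set(values))
  }

  // Add attribute-value pairs to an item; repeating an attribute adds a value.
  put(itemId, pairs) {
    const attrs = this.items.get(itemId) || new Map();
    for (const [attr, value] of pairs) {
      if (!attrs.has(attr)) attrs.set(attr, new Set());
      attrs.get(attr).add(value);
    }
    this.items.set(itemId, attrs);
  }

  // Return the IDs of items matching ALL given attribute = value conditions.
  query(conditions) {
    const hits = [];
    for (const [itemId, attrs] of this.items) {
      const matches = Object.entries(conditions).every(
        ([attr, value]) => attrs.has(attr) && attrs.get(attr).has(value)
      );
      if (matches) hits.push(itemId);
    }
    return hits;
  }
}

const domain = new ItemDomain();
domain.put("123", [["description", "sweater"], ["color", "blue"], ["color", "red"]]);
domain.put("456", [["description", "dress shirt"], ["color", "white"], ["color", "blue"]]);
domain.put("789", [["description", "shoes"], ["color", "black"], ["material", "leather"]]);
// A new attribute added later to just one item, no schema change needed.
domain.put("456", [["sleeve length", "long"]]);

console.log(domain.query({ color: "blue", description: "dress shirt" })); // [ '456' ]
```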

    David Dossot had the same idea I had when I stumbled across this: there should be a JCR interface to SimpleDB.

    I would personally be interested in a JCR adapter for SimpleDB: this would enable a semantically meaningful data storage layer to be plugged on top of the Amazon service. Think about massively distributed content management system...

    CouchDB

    If you put big corporate Amazon at one end of the IT spectrum, you might put CouchDB at quite the opposite end: it is an experimental, geeky project in alpha state. It describes itself as follows:

    What CouchDB is

    - A document database server, accessible via a RESTful JSON API.
    - Ad-hoc and schema-free with a flat address space.

    And further:

    Unlike SQL databases which are designed to store and report on highly structured, interrelated data, CouchDB is designed to store and report on large amounts of semi-structured, document oriented data.[...].

    In an SQL database, as needs evolve the schema and storage of the existing data must be updated. This often causes problems as new needs arise that simply weren't anticipated in the initial database designs, and makes distributed "upgrades" a problem for every host that needs to go through a schema update.

    With CouchDB, no schema is enforced, so new document types with new meaning can be safely added alongside the old. [...]

    You get the picture. The key word is "no schema" again.
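What "new document types with new meaning can be safely added alongside the old" means for application code: documents written before and after a change coexist in the store, and readers tolerate both shapes instead of the data being migrated. A minimal sketch with plain objects standing in for JSON documents; the field names (phone, phones) and the phoneNumbers helper are hypothetical, and none of this is CouchDB's API.

```javascript
// Two generations of contact documents living side by side, no migration.
const docs = [
  { _id: "c1", type: "contact", name: "Jane Doe", phone: "555-1234" }, // older shape
  { _id: "c2", type: "contact", name: "John Doe", phones: ["555-0001", "555-0002"] }, // newer shape
  { _id: "c3", type: "contact", name: "George Doe" }, // no phone at all
];

// A tolerant reader: normalizes either shape to an array of numbers.
function phoneNumbers(doc) {
  if (Array.isArray(doc.phones)) return doc.phones;
  if (typeof doc.phone === "string") return [doc.phone];
  return [];
}

for (const doc of docs) {
  console.log(doc.name, phoneNumbers(doc));
}
```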

    I welcome these new(*) approaches to storing data. While they will certainly not make relational databases obsolete, they will broaden our minds when it comes to thinking about data. And they provide an additional tool in our tool chest.

    (*) Well, "new". JCRs have been around for quite a while; the rest of the industry has just woken up. I am tempted to quote "Imitation is the sincerest form of flattery" :)

    REST

    While we are on the subject of "watching industry trends": it should also be noted that the two persistence technologies above both expose a REST interface to applications. For JCRs this is implemented through Apache Sling or Microjax.

    While this is not a real surprise given REST's success, it is still worth noting. Compare it to the situation a few years ago, when accessing data invariably meant installing a driver and opening a socket connection.

    Update (3/1/2008)

    Seems like IBM has "bought" CouchDB and plans to donate the code to Apache.

    Posted by Lars Trieloff NOV 28, 2007

    Posted in cms, everything is content and graph

    Tim Berners-Lee recently coined the term "The Graph". The idea here is that computer networks undergo a conceptual evolution. The starting point, the Net (the Internet), connects computers, abstracting away the cables between them and allowing the creation of networked applications. The most successful networked application became known as the Web (World Wide Web) and provided another abstraction: it is not the computers we care about, it is the documents (resources) and the links between them (hyperlinks). This web is what most content management systems and content repositories care about; they manage documents, after all.

    The Graph is nothing other than the realization that we actually do not care about documents (or even the computers where these documents are stored), but about the things that are described by the documents. A document can describe a person, an idea, a place. And in the same way as people are connected to each other, as ideas are connected to people and people to places, documents can be used to describe these connections by means of hyperlinks or more advanced technologies like RDF.

    To rephrase this thought: the graph is not about the documents and their connections, but about the things described in the documents and their relations, about the documents' contents. The graph is about content and content relations.
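One way to picture "content and content relations" is as subject-predicate-object statements, the shape RDF uses: the things described in documents become nodes, and relations such as "knows" or "livesIn" become edges that can be followed. The sketch below is a toy in-memory triple list with a one-hop lookup and a breadth-first traversal; the identifiers and predicates are made up for illustration, and it is not an RDF library.

```javascript
// Toy graph of things (people, places, ideas) and their relations.
const triples = [
  ["person:jane", "knows", "person:john"],
  ["person:john", "knows", "person:george"],
  ["person:jane", "livesIn", "place:basel"],
  ["person:john", "worksOn", "idea:the-graph"],
];

// All objects related to `subject` via `predicate` (one hop).
function related(subject, predicate) {
  return triples
    .filter(([s, p]) => s === subject && p === predicate)
    .map(([, , o]) => o);
}

// Everything reachable from `start` by repeatedly following `predicate`
// (breadth-first, each thing visited once).
function reachable(start, predicate) {
  const seen = new Set();
  const queue = [start];
  while (queue.length > 0) {
    const current = queue.shift();
    for (const next of related(current, predicate)) {
      if (!seen.has(next) && next !== start) {
        seen.add(next);
        queue.push(next);
      }
    }
  }
  return [...seen];
}

console.log(related("person:jane", "livesIn")); // [ 'place:basel' ]
console.log(reachable("person:jane", "knows")); // [ 'person:john', 'person:george' ]
```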

    If you are a user of a content management system or a content repository, if you are developing applications that deal with content, or if you are a vendor of a content repository or content management system, there are some relevant implications for you:
    Your content repository should be about content, not about data or documents. It should allow you to deal with content in all representations, from highly structured to unstructured. It should allow you to unify access across all content stores. It should allow you to expose your content to the outside, allowing you to build a stronger graph.

    If you are looking for this kind of content repository, you should have a look at CRX and its open source companion Apache Jackrabbit.