Entries by Bertrand Delacretaz

    Posted by Bertrand Delacretaz JUN 28, 2010

    Comment 1

    Here are my slides from last week's excellent TransferSummit conference in Oxford. Kudos to the OSSWatch team for flawless organisation, as usual!

    Why do we open source our infrastructure code? It's mostly about sustainability and quick high quality feedback for our core components.

    I don't like much text in my slides, so this is a very synthetic view of things; I hope you get the idea.

    See also http://day.com/about/opensource

    Posted by Bertrand Delacretaz JUN 21, 2010

    Posted in fise Comments 2

    In the last few weeks, a number of discussions around FISE with IKS consortium and community members have helped clarify my vision of a semantic engine that sits alongside content management systems to provide semantic functionality to them.

    For now, this is just my own view on things; we will need to check how well this matches the goals of IKS and its community. The main goal of IKS is to create a revolutionary semantic CMS technology stack, but this is more evolutionary, an add-on that CMS vendors like us, who are not looking at changing their technology stack in the near future, can use alongside their existing software.

    With this in mind, here's my vision for FISE:

    • A RESTful engine that provides semantic functionality as an add-on to existing CMS.
    • A minimal core that uses plugins for most everything (the current prototype is built on OSGi, like Sling and CQ5).
    • Provides semantic lifting (which means extracting semantic information from raw content and existing metadata) using pluggable semantic engines: recognize persons, places and other entities in text, map geographic locations, intelligent metadata extraction from images and other multimedia content to enable semantic similarity searches, etc.
    • Provides a search engine for that semantic metadata (SPARQL endpoint in the current prototype).
    • Provides extension points to plug in semantic reasoners and other fancy experimental components, while keeping the core as simple as possible.
    • Use-case driven: nothing gets added to the FISE core without a concrete use case to justify it. 
    • RESTful interface uses existing standards as much as possible, and might extend them where it makes sense. Semantic CMIS and/or Atom come to mind here. Extensions should be contributed to those standards where possible.
    • Open source, open development, reproducible builds, continuous integration, readable automated tests to validate and describe functionality, etc. 
    • Should be an Apache project, to build a community and survive the 4-year lifespan of the IKS project itself.

    Would such a semantic engine add value to your CMS or content applications? Does the above vision match your expectations?

    Please speak up, either in the comments below, on your blogs, on the FISE mailing list or tomorrow at the IKS early adopters workshop if you're there. Your opinion will help us shape this exciting project!

    Many thanks to the members of the burgeoning FISE community for bringing it from a vague idea to where it stands now: we have working code, far from perfect but an excellent catalyst for clarifying our vision and needs. Let's take this to the next level!

    Posted by Bertrand Delacretaz JUN 18, 2010

    Posted in fise and iks-project Comment 1

    Next week in Salzburg, the IKS Early Adopter's Workshop brings together a number of CMS vendors, to discuss what we've been doing at IKS for more than a year now, and what comes next.

    As my involvement in IKS mostly revolves around the FISE prototype semantic engine, I'll present it and explain how it can help CMS vendors graft value-adding semantic features onto their CMSes. In a RESTful way of course - we don't want you to do open heart surgery on your CMS at this point.

    Finding image similarities with FISE

    Here's a use case that I think summarizes what FISE could do, medium-term, to help CMS vendors manage their content in more semantic ways. I won't scare you with RDF, ontologies and the like: at this level we're just looking at providing valuable features to our users, without requiring them to learn anything new. There might well be RDF, ontologies and SPARQL queries under the hood, but at our level we don't care, this is just about the user story.

    Here's a picture that I took on a trip to Iceland a few years ago. Typical Icelandic house with typical big Icelandic four-wheel drive vehicle (unlike in many places, you actually need those there, believe me) parked in front, with a canoe on top. Kinda makes you want to live there if you like wide open spaces.

    Icelandic House

    Now, here's a drawing by young Bertrand which has much the same content, at the semantic level: a big car in front of a house, with a boat on top of the car. Not too stylish, but the same basic information is in there. Smiling sun of course, which you might get in Iceland every ten minutes in between showers...

    Young Bertrand's House Drawing

    For our eyes and brain it is trivial to see that both images describe a similar scene. However, I doubt your CMS or digital asset management system would consider them as similar. You need a good semantic understanding of them to find out that they pretty much tell the same story - it's not just about the raw bits.

    That's where FISE comes into play. We don't have all the required semantic analysis algorithms in FISE for this use case today, but the current infrastructure would (mostly) enable it if we had them.

    FISE allows you to plug in such algorithms, using a simple Java EnhancementEngine interface. Based on OSGi, FISE makes it possible to mix and match a wide range of Java libraries without conflicts, allowing pre-existing or new analysis modules to collaborate. Analyzers written in other languages can be integrated using either native language integration or remote access, ideally over HTTP.

    Image analysis scenario

    Here's how FISE would help find out that our images are similar:

    • A JPEG engine extracts the EXIF metadata from the images if present.
    • A text-based entity extraction engine looks at that metadata and, if the images have a good title or description, connects them with some well-known entities. For example, if the image titles are "House in Iceland" and "The Big Car in front of Dad's House", it yields Country=Iceland and Contains=House for the first one, and Contains=House and Contains=Car for the second one.
    • A shape-based entity recognition engine adds metadata like Contains=Car and Contains=House for both images.
    • A graphical analysis engine adds metadata like Style=Photo for the first and Style=Drawing and Style=Childish for the second image, due to its strong primary colors and ragged lines.
    • A similarity search engine integrated in FISE can then find out that both images contain similar objects, so they can be considered similar even though the style of image is very different. You could also search for childish drawings of houses, and then get a link to the nicer photo besides young Bertrand's drawing.

    The role of FISE is to coordinate the various analyzers, combine and store their results, make them searchable and provide a RESTful interface to all this.

    FISE is the integration engine that makes such scenarios possible once analyzers of sufficiently good quality are available. As usual, the sum is greater than its parts, so being able to combine various such analyzers should lead to very valuable results, even with imperfect analyzers.
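
The similarity step of the scenario above can be sketched in a few lines of Java. This is only an illustration, not FISE code: the class name TagSimilarity is made up, the tags are the Key=Value strings used in the scenario, and the Jaccard index (shared tags divided by all tags) is just one plausible way to compare them. Style= tags are left out on purpose, so that a photo and a drawing of the same scene can still match.

```java
import java.util.HashSet;
import java.util.Set;

/** Hypothetical sketch: compares two images by the entity tags their analyzers produced. */
final class TagSimilarity {

    /** Jaccard index: size of the intersection divided by size of the union. */
    static double jaccard(Set<String> a, Set<String> b) {
        if (a.isEmpty() && b.isEmpty()) {
            return 0.0; // no tags at all: treat as "nothing in common"
        }
        Set<String> intersection = new HashSet<>(a);
        intersection.retainAll(b);
        Set<String> union = new HashSet<>(a);
        union.addAll(b);
        return (double) intersection.size() / union.size();
    }

    public static void main(String[] args) {
        // Tags taken from the scenario above; Style= tags are deliberately ignored
        Set<String> photo = Set.of("Contains=House", "Contains=Car", "Country=Iceland");
        Set<String> drawing = Set.of("Contains=House", "Contains=Car");
        // Two of the three distinct tags are shared
        System.out.println(jaccard(photo, drawing));
    }
}
```

A real similarity engine would probably weight tags by the confidence of the analyzer that produced them; the point here is only that combined metadata makes the comparison trivial.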

    Orchestration and intents

    What's currently missing in FISE is a way of orchestrating the enhancement engines: currently they only run in a configurable sequence, without real interactions between them.

    We'll have to discuss this on the FISE mailing list, but right now I'm thinking that something similar to the Android Intents mechanism, where an engine broadcasts information about what it has found so that other engines can build upon that information, might be well suited to that problem. The orchestrator would start by broadcasting an "analyze incoming content" intent, to which a few engines would respond. The engines in turn broadcast intents like "enhance title and description", "analyze image content" etc. and the orchestrator keeps going, iteratively, until there are no outstanding intents left.
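
To make the idea more concrete, here is a minimal Java sketch of such an orchestration loop. Nothing in it exists in FISE today: IntentOrchestrator, IntentEngine, the "compare shapes" intent and the maxIntents safety limit are all assumptions made for the example, and intents are plain strings to keep it short.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/** Hypothetical sketch of intent-driven orchestration; none of these types exist in FISE. */
final class IntentOrchestrator {

    /** An engine reacts to one intent and may return follow-up intents. */
    interface IntentEngine {
        List<String> handle(String intent);
    }

    private final List<IntentEngine> engines = new ArrayList<>();

    void register(IntentEngine engine) {
        engines.add(engine);
    }

    /**
     * Broadcasts intents until none are outstanding and returns them in
     * processing order. maxIntents is an assumed guard against engines
     * that keep triggering each other forever.
     */
    List<String> run(String initialIntent, int maxIntents) {
        Deque<String> pending = new ArrayDeque<>();
        List<String> processed = new ArrayList<>();
        pending.add(initialIntent);
        while (!pending.isEmpty() && processed.size() < maxIntents) {
            String intent = pending.poll();
            processed.add(intent);
            for (IntentEngine engine : engines) {
                pending.addAll(engine.handle(intent)); // follow-ups join the queue
            }
        }
        return processed;
    }

    public static void main(String[] args) {
        IntentOrchestrator orchestrator = new IntentOrchestrator();
        orchestrator.register(intent -> intent.equals("analyze incoming content")
                ? List.of("enhance title and description", "analyze image content")
                : List.of());
        orchestrator.register(intent -> intent.equals("analyze image content")
                ? List.of("compare shapes") // made-up follow-up intent
                : List.of());
        System.out.println(orchestrator.run("analyze incoming content", 100));
    }
}
```

Whether intents should carry data, and how engines would declare which intents they respond to, are exactly the kind of questions to settle on the mailing list.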

    That analysis might take some time, depending on which analyzers are used, but the FISE design allows for asynchronous computing of metadata as well. In some cases, involving humans in parts of the analysis (à la Mechanical Turk) might be the best way to get meaningful results, at least until Kurzweil's Singularity hits us. Asynchronous analysis would then be required, and FISE would have to be able to say "I have some metadata for your content already, and more is supposed to come at some point". This is foreseen in the current FISE design but not yet fully specified nor implemented.

    Coda

    I think this image similarity use case is a good way to explain what FISE is about, and will help validate the FISE design.

    FISE is on a very good track to making such things possible, while keeping things simple from the CMS integrator's point of view, thanks to its RESTful interface. The design needs some refinements, and we'll almost certainly get some good input about that next week at the workshop - looking forward to it!

    Posted by Bertrand Delacretaz MAY 20, 2010

    Posted in fise, iks-project and screencast Add comment

    Here's a recording of my presentation of IKS FISE to my colleagues today.

    As you can imagine, convincing Day folks of the value of a RESTful interface is quite easy. None of them sent me code for that killer EnhancementEngine yet, but I'm hopeful ;-)

    IKS is organizing a FISE hackathon next week during the General Assembly. I hope to see you there, and remote participation via IRC is also welcome.

    IKS FISE

    Posted by Bertrand Delacretaz APR 01, 2010

    Posted in fise, iks-project and rest Comments 2

    I'm happy to report that the first IKS FISE (pronounced like an Aussie would say "phase") Hackathon, held in Furtwangen earlier this week, has been a success.

    We have implemented a very simple "content enhancement server" to which you PUT or POST content using HTTP requests. The server uses a series of "enhancement engines", plug-ins that can enhance the content with automatically generated tags, entities based on natural language recognition, etc.

    All in a very simple way for now, but the important thing is that we have demonstrated our vision of a very simple RESTful engine for semantic enhancement of content, in the form of working prototype software. The services API is extremely simple, and building the system as OSGi services makes it very easy to plug in new enhancement engines.

    To whet your appetite, here's a quick walkthrough.

    Starting the FISE server is the hardest part for now: it's all source code that you need to build yourself, including snapshots, so don't try it unless you're very familiar with building bleeding edge Java software.

    Once that's done, however, using FISE is very simple. To add content to it, use an HTTP PUT request like

    $ curl -H Content-Type:text/plain -T data/text-examples/obama-signing.txt http://localhost:8181/fise/obama

    which, if all goes well, returns a status code 200 and the ID of the FISE content that was stored (/obama in this case).

    Then, make a GET request on the same URL to get the metadata of that piece of content, generated by the currently active FISE EnhancementEngines:

    $ curl http://localhost:8181/fise/obama
    **ContentItem:/obama
    **Metadata:
    </obama>
    <http://rdfs.org/sioc/ns#related_to>
    <http://dbpedia.org/resource/Texas_Health_Resources>

    </obama>
    <http://rdfs.org/sioc/ns#related_to>
    <http://dbpedia.org/resource/Richard_Gottfried>
    ....more RDF triples
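
For CMS code written in Java, the same two calls could look roughly like the sketch below, which uses the JDK's own java.net.http API (Java 11 or later). The FiseRequests class and its method names are made up for the example; the URL and content type are the ones from the curl commands above. The sketch only builds the requests, so it runs without a FISE server.

```java
import java.net.URI;
import java.net.http.HttpRequest;

/** Sketch of the two curl calls above; class and method names are hypothetical. */
final class FiseRequests {

    /** PUT plain text to {base}/{id}, like the curl -T upload. */
    static HttpRequest store(String base, String id, String text) {
        return HttpRequest.newBuilder(URI.create(base + "/" + id))
                .header("Content-Type", "text/plain")
                .PUT(HttpRequest.BodyPublishers.ofString(text))
                .build();
    }

    /** GET the metadata generated for the same content item. */
    static HttpRequest metadata(String base, String id) {
        return HttpRequest.newBuilder(URI.create(base + "/" + id)).GET().build();
    }

    public static void main(String[] args) {
        String base = "http://localhost:8181/fise";
        HttpRequest put = store(base, "obama", "text of the article goes here");
        HttpRequest get = metadata(base, "obama");
        // Only prints what would be sent; no server is contacted
        System.out.println(put.method() + " " + put.uri());
        System.out.println(get.method() + " " + get.uri());
    }
}
```

To actually send a request, pass it to HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString()) and check the status code of the response.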

    The FISE architecture allows for each engine to suggest running its content enhancement operations in asynchronous mode, which can be very helpful for analyzing large items.

    So, from the CMS developer point of view, integrating FISE is very simple. Queries are handled in the same way, using HTTP GET requests. The current prototype runs either in standalone mode (in-memory storage, no queries) or on top of Apache Clerezza which provides persistence and SPARQL queries.

    On the other side of things, supplying new enhancement engines or wrapping existing ones to make them available in FISE is also very simple - one just needs to implement the following interface (shown in simplified form here, constant declarations removed for brevity):

    public interface EnhancementEngine {
        /** Can this engine enhance supplied content item? */
        int canEnhance(ContentItem ci);

        /** Enhance supplied item's metadata */
        void computeEnhancements(ContentItem ci);
    }

    To create a new engine, one just needs to create an OSGi service that implements this interface, and register it with the FISE runtime.
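
As an illustration, a toy engine might look like the sketch below. Please read it with care: the ContentItem methods, the two constants and their values, and the IcelandTagEngine and SimpleItem classes are all assumptions made so that the example compiles on its own; the real FISE types are richer and store RDF rather than strings. The OSGi service registration is left out.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Minimal stand-in so the sketch compiles on its own; the real FISE ContentItem differs. */
interface ContentItem {
    String getMimeType();
    String getText();
    List<String> getMetadata();
}

/** Simplified interface from above, with assumed constants (the originals were omitted). */
interface EnhancementEngine {
    int CANNOT_ENHANCE = 0; // assumed value
    int CAN_ENHANCE = 1;    // assumed value

    int canEnhance(ContentItem ci);

    void computeEnhancements(ContentItem ci);
}

/** Toy engine: tags plain-text items that mention Iceland. */
final class IcelandTagEngine implements EnhancementEngine {
    @Override
    public int canEnhance(ContentItem ci) {
        return "text/plain".equals(ci.getMimeType()) ? CAN_ENHANCE : CANNOT_ENHANCE;
    }

    @Override
    public void computeEnhancements(ContentItem ci) {
        if (ci.getText().toLowerCase(Locale.ROOT).contains("iceland")) {
            ci.getMetadata().add("Country=Iceland");
        }
    }
}

/** Hypothetical in-memory content item used only to exercise the engine. */
final class SimpleItem implements ContentItem {
    private final String mimeType;
    private final String text;
    private final List<String> metadata = new ArrayList<>();

    SimpleItem(String mimeType, String text) {
        this.mimeType = mimeType;
        this.text = text;
    }

    @Override public String getMimeType() { return mimeType; }
    @Override public String getText() { return text; }
    @Override public List<String> getMetadata() { return metadata; }
}

final class EngineDemo {
    public static void main(String[] args) {
        SimpleItem item = new SimpleItem("text/plain", "House in Iceland");
        EnhancementEngine engine = new IcelandTagEngine();
        if (engine.canEnhance(item) != EnhancementEngine.CANNOT_ENHANCE) {
            engine.computeEnhancements(item);
        }
        System.out.println(item.getMetadata());
    }
}
```

The two-step contract is what matters here: the runtime first asks each engine whether it can handle an item, and only then lets it add metadata.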

    FISE is in my opinion a very exciting development for the IKS project, fulfilling our hopes of creating an interface between CMS developers, who can use FISE easily from the HTTP side, and semantic researchers, who can provide new EnhancementEngines.

    This week's hackathon has more than met the goals set in my presentation at the last IKS workshop in Rome - looking forward to where FISE will lead us!

    More info at http://wiki.iks-project.eu/index.php/FISE