Tassilo Pellegrini

Is OpenCalais becoming a Search Engine?

From the very beginning I was wondering what Reuters is going to do with all the data generated by OpenCalais. So I took a moment to browse through the Privacy Statement (formerly their Terms Of Use) and came across an enlightening paragraph:

We may build a search capability in the future. This capability would allow users to search the metadata repository and receive back a list of entries that match that search criteria. Unless you have authorized it via an API parameter, this list would not include the original metadata contained in the document but would expose the URL and description of the original document if you have provided it to us. If you do not want your content included in the search functionality you should indicate so in the appropriate area of the API. If you want to maximize the exposure of your content on the web you should not opt out of inclusion in the search functionality.

Though hypothetical in wording, this paragraph makes it very clear: entering the search market is definitely an option. But they go even one step further.

We may build a syndication capability in the future. This capability would allow us to generate feeds of content that match certain selection criteria based on the metadata. As with search, unless you have authorized it via an API parameter, these feeds will not expose the original metadata contained in the document but would expose the URL and description of the original document if you have provided it to us. If you do not want your content included in the syndication functionality you should indicate so in the appropriate area of the API. If you want to maximize the exposure of your content on the web you should not opt out of inclusion in the syndication functionality.

This sounds to me like a content reselling business. In this regard it might be interesting to take a look at the latest developments from IPTC: a policy standard called ACAP, which stands for Automated Content Access Protocol. It is designed to express access policies for robots on content items. Coupling ACAP with the (hypothetical) search capabilities of OpenCalais could result in a major commercial distribution engine, especially for traditional media content owners, and all the more so with the following marketing capabilities in mind:

We may build other products in the future based on statistical or other analysis of the metadata, such as trend analysis, emerging topics or others. In no case will these products expose the original document’s metadata.

Finally a business model for the Semantic Web? Whatever … smart guys, great service!

Tassilo Pellegrini

Is Reuters unleashing the Semantic Web?

Open Calais, a new and smart API from Reuters, finally tackles what critics consider the greatest obstacle to the Semantic Web: it takes the metadata burden off the end user by providing an automatic meta-tagging tool. The principle behind Open Calais is simple: put in some unstructured text and get nicely structured RDF data in return. Backed by powerful text mining and machine learning techniques, the API automatically detects entities such as persons, events, countries and other facts.

Open Calais takes account of the fact that the added value of content is hidden in its structure. Uncovering that structure and representing it in an interoperable format makes existing resources more programmable and reusable.
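To make the "text in, RDF out" principle a bit more tangible, here is a minimal sketch in Python. It is not the Open Calais API and does not call it: the tiny lookup table (`GAZETTEER`), the `example.org` namespace and the function names are all assumptions made up for illustration, standing in for the text mining a real service would do. It only shows what it means to turn a plain sentence into N-Triples-style statements.

```python
import re
from urllib.parse import quote

# Hypothetical lookup table: a stand-in for the text mining and
# machine learning models a real tagging service would use.
GAZETTEER = {
    "Reuters": "Company",
    "Vienna": "City",
    "Austria": "Country",
}

BASE = "http://example.org/"  # placeholder namespace, not a real vocabulary
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def tag_entities(text):
    """Return (name, type) pairs for lookup-table entries found as whole words."""
    found = []
    for name, entity_type in GAZETTEER.items():
        if re.search(r"\b" + re.escape(name) + r"\b", text):
            found.append((name, entity_type))
    return found


def to_ntriples(doc_url, text):
    """Serialize the detected entities as N-Triples lines about the document."""
    lines = []
    for name, entity_type in tag_entities(text):
        entity = BASE + "entity/" + quote(name)
        # The document mentions the entity ...
        lines.append(f"<{doc_url}> <{BASE}mentions> <{entity}> .")
        # ... and the entity has a type.
        lines.append(f"<{entity}> <{RDF_TYPE}> <{BASE}type/{entity_type}> .")
    return lines


if __name__ == "__main__":
    sample = "Reuters opened a new office in Vienna."
    for line in to_ntriples("http://example.org/doc/1", sample):
        print(line)
```

Once the mentions are expressed as triples, any RDF-aware tool can query or merge them, which is the "programmable and reusable" part.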

But what is in it for Reuters? Nothing less than the biggest structured content repository on the web. Shouldn't we talk about this little fact as well?

For more information, look up our current newsletter or subscribe to our monthly Semantic Web update.