Semantic Web Company

The Semantic Puzzle

Open World Assumptions


Calais, Zemanta or textwise?

July 07, 2009 By: Andreas Blumauer Category: Mashups & Web services, Text Mining 2 Comments →

Besides the W3C’s Linked Data initiative, it was semantic services like Calais, Zemanta or textwise that made the advantages of the Semantic Web visible to a broader community in the last few months.

Each of those services follows a slightly different approach, but in a nutshell, they all offer an API that provides “similarity search” across social media or enhances enterprise information management.

Like a magic bullet, those services offer relief from information overload and seem to be becoming a kind of “Semantic Web killer application”.

If you’re familiar with one or more of those services, drop a comment and let us know what you have experienced so far, or whether you can think of any applications or further developments you would like to see around these kinds of services.

If you are not familiar with this stuff, for a quick demo go to

The widget uses text from this blog to calculate similar stuff from the web.



Tom Tague on Open Calais 4

January 29, 2009 By: Thomas Schandl Category: Companies & Institutions, Linked Data & Open Data, Mashups & Web services, Text Mining, Tools & Software No Comments →

The recent release of Open Calais v4 offers exciting new possibilities by making a great contribution to Linked Data efforts.

Previous releases of Thomson Reuters’ Open Calais web service already produced promising results by extracting named entities, facts and events from user-submitted content, especially news articles. Now these extracted concepts come with a URI and are linked into the LOD cloud, specifically to DBpedia, Freebase, MusicBrainz, the CIA World Factbook and others.

On this occasion Tom Tague, vice president of the Calais creators ClearForest, answered questions the Semantic Web Company had about the goals of Open Calais. 

The latest release of Open Calais produces metadata conforming to linked data principles. You provide this great service free to everyone via your web service.
What led to that decision, and what benefits does it bring Thomson Reuters?

Thomson Reuters has the largest trusted content sources in the world – but we don’t have all the content in the world. We believe that the world is going to want to integrate highly managed and trustworthy content assets such as those provided by Thomson Reuters with the low latency, highly diverse content exploding on the web. Fundamentally what we’re trying to achieve is nearly effortless interoperability of content between any two partners – Calais enables this by extracting the semantic metadata buried in your content but then takes it a step further. By linking those semantic elements to the Linked Data cloud we are setting the stage for the dramatic enhancement of any content source – and we hope that many will choose Thomson Reuters as one of the methods for enhancing that content.

With Open Calais you seem to use a hybrid business model that integrates end users into value creation through a form of enterprise collaboration.
Do you think such a business model is viable in the long run, and what has your experience been so far?

As of right now Calais isn’t truly a “Business”. It’s a strategic initiative that’s setting at least a piece of the stage for the Linked Content Economy. Our goal is to understand how this new content economy is going to evolve and to make certain that we have a leadership position as it moves from a concept to reality.

Apart from the thousands of users submitting content to Open Calais, there is also a community of developers making their own applications around your core app. How important are the social dynamics of the Open Source community for the success of Open Calais?

Extraordinarily important. Calais is a web service – which means it’s relevant to about 0.0001% of the population. We are absolutely reliant on the creativity, energy and domain expertise of our developer community to translate Calais from a technology to an end-user relevant capability. And – as a user-driven project we also rely on our developers and users to give us feedback on what they like, what they don’t and where they think we should head.

What are your plans for offering your service in German?

We hope to get there in 2009. We’ve released basic French and are gearing up for additional languages in the coming year.

Thank you, Tom, for your answers! We look forward to more applications like Semantic Proxy and Linked Facts that demonstrate the great potential of the Calais engine.


The Semantic Web becomes mainstream, again.

December 05, 2008 By: Andreas Blumauer Category: Enterprise 2.0, Literature & Publications No Comments →

The roll-out of Semantic Web technologies seems to be entering the next stage. And it will be a quiet (r)evolution, like the open source movement was. Two examples: next year’s JAX in Mainz, Germany, will have its first Semantic Web track. The organisers say that “the Semantic Web is going to conquer the business market soon” – we will see whether it turns out to be that martial.

Another example: one of the biggest open source magazines in Germany, t3n, has recently published a new issue with many stories around the Semantic Web. Editor-in-chief Jan Christe says: “We have constantly stumbled upon semantic web related stuff when we scanned the news, so we decided to set a focus on this topic.”

The Semantic Web is tangible now. Christe says: “Applications like OpenCalais, Zemanta or Tagaroo show the end-users what’s really in it for them.” It is also nice to see that the Semantic Web is no longer reduced to “search”: t3n’s new issue also has interesting articles about Linked Data, for instance Sören Auer’s “How to develop Semantic Web Applications”.

So, as a conclusion: Paul Miller’s wait for the “Semantic Web in Business” (a great blog post!) is over. It won’t be found in heavy books, but rather in the open source community and sometimes in light-weight magazines.

Yes, we can!


Multimedia in the Web of Data – Annotating and Interlinking Photos, Music, Multimedia [WOD-PD]

October 23, 2008 By: Jana Herwig Category: Conferences & Events, Internet & Media, Linked Data & Open Data, Mashups & Web services, Social Software 4 Comments →

The Web of Data Practitioners Days concluded with the session on Multimedia in the Web of Data, the first part of which was led by Ansgar Scherp (University of Koblenz-Landau, Germany).

Multimedia content, as Ansgar pointed out, is hardly annotated, badly organized, and hardly ever looked at again – just think of the 300-something pics you might take on an average weekend getaway and never touch again. Annotating multimedia content requires a lot of work and dedication, so most of the time these pictures eventually disappear into the “digital shoe box” that is your photo management software.

The most obvious remedy is to annotate content as early as possible, ideally at the moment of creation and right on your portable camera (formerly known as: mobile phone :). Ansgar suggested providing incentives to encourage picture annotation – professionals could, for instance, receive a higher financial reward if they deliver pictures that are already annotated. And of course there are ‘Games with a purpose’ such as Google Image Labeler, where players tag images in pairs, with and against each other, and are rewarded with the entertainment factor of the game.

The slide below shows what has happened (or will happen) to the process of creating photo books in the digital age and the age of mashups:

[Image: Ansgar Scherp's slides]

After all, this is the age of the social semantic web, so why not try and (re-)use the content, structure and contexts that other users have already created on the web? Content augmentation, for the scope that Ansgar is concerned with, consists in the reuse of content and structures (e.g. from sources such as Flickr, Wikipedia and GeoNames), made possible through the definition of rules, e.g.:

  • If there are two or fewer pictures on a page*
  • then automatically augment the page with additional photos using location information.

* Page here means a page in the album you are currently working on – you probably took a picture of yourself and your friend in Paris, and even though you went to the Centre Pompidou, you forgot to actually take a pic of the building itself – well, let the web be your library!
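
To make the augmentation rule above a bit more concrete, here is a minimal Python sketch. It is not Ansgar's implementation: the `Photo` and `Page` classes, the `augment_page` function and the idea of matching photos by a plain location string are all assumptions made for illustration, and the list of web photos stands in for whatever a source like Flickr would return.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Photo:
    title: str
    location: str  # hypothetical: a plain place name such as "Paris"


@dataclass
class Page:
    location: str
    photos: list[Photo] = field(default_factory=list)


def augment_page(page: Page, web_photos: list[Photo], max_photos: int = 2) -> Page:
    """If the page holds two or fewer photos, add web photos from the same location."""
    if len(page.photos) > max_photos:
        return page  # the rule does not fire, the page stays as it is
    extras = [
        p for p in web_photos
        if p.location == page.location and p not in page.photos
    ]
    return Page(page.location, page.photos + extras)


page = Page("Paris", [Photo("Us at the cafe", "Paris")])
web = [Photo("Centre Pompidou", "Paris"), Photo("Colosseum", "Rome")]
print([p.title for p in augment_page(page, web).photos])
```

In a real tool the location match would probably rely on geo-coordinates or GeoNames identifiers rather than on string equality, but the shape of the rule (a condition on the page, then an action that pulls in external content) stays the same.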

So the goal is clear: develop a procedure for applying automatic content augmentation in the creation of good photo books.

But what makes a ‘good’ photo book anyway? Here are some of the results of a structural analysis of real, human-created photobooks conducted at CeWe Color:

  • % of photos with faces: 36%
  • Number of album pages: 16.96
  • Photos per page: 6.69
  • Text fields per page: 1.45
  • % of pages with text: 87%

There are many rules that can be derived from the structural analysis and applied in turn to the creation of photobooks, e.g. rules like this one:

  • If the text is located in the upper third of a page
  • if the font size is equal to or larger than 16 points
  • if the number of words is less than 10
  • if there is no caption on the page that has a bigger font size
  • then this page is the title
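
The title rule above translates almost line by line into code. The following Python sketch is only an illustration under a few assumptions: the `TextField` class and the `is_title_page` function are made-up names, the vertical position is assumed to be a fraction of the page height measured at the top edge of the text field, and "caption" is taken to mean any other text field on the same page.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class TextField:
    text: str
    top: float        # assumed: top edge as a fraction of page height (0.0 = page top)
    font_size: float  # in points


def is_title_page(candidate: TextField, captions: list[TextField]) -> bool:
    """Check the four conditions of the title rule for one text field."""
    in_upper_third = candidate.top < 1 / 3
    large_font = candidate.font_size >= 16
    few_words = len(candidate.text.split()) < 10
    no_bigger_caption = all(c.font_size <= candidate.font_size for c in captions)
    return in_upper_third and large_font and few_words and no_bigger_caption


title = TextField("Our Weekend in Paris", top=0.1, font_size=24)
caption = TextField("Centre Pompidou", top=0.8, font_size=12)
print(is_title_page(title, [caption]))
```

Keeping each condition in its own named variable makes it easy to tune a single threshold, such as the 16 points, when the structural analysis is repeated on a different set of photobooks.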

Ansgar recommended xSmart, which he described as a “context-driven authoring tool for page-based multimedia presentations.”

Ansgar’s presentation was followed by two more: one by Yves Raimond on Interlinking Music on the Web of Data, and one on Interlinking Multimedia. Despite my best intentions, I did not manage to cover these two in detail, but at least I gathered the links to relevant resources from all three sessions…


Is OpenCalais becoming a Search Engine?

March 31, 2008 By: Tassilo Pellegrini Category: Mashups & Web services, Privacy & Information Ethics, Search Engines 5 Comments →


From the very beginning I was wondering what Reuters is going to do with all the data generated by OpenCalais. So I took a moment and browsed through the Privacy Statement (formerly their Terms Of Use), stumbling upon an enlightening paragraph:

We may build a search capability in the future. This capability would allow users to search the metadata repository and receive back a list of entries that match that search criteria. Unless you have authorized it via an API parameter, this list would not include the original metadata contained in the document but would expose the URL and description of the original document if you have provided it to us. If you do not want your content included in the search functionality you should indicate so in the appropriate area of the API. If you want to maximize the exposure of your content on the web you should not opt out of inclusion in the search functionality.

Although hypothetical in wording, this paragraph states it very clearly: engagement in the search market is definitely an option. But they go even one step further.

We may build a syndication capability in the future. This capability would allow us to generate feeds of content that match certain selection criteria based on the metadata. As with search, unless you have authorized it via an API parameter, these feeds will not expose the original metadata contained in the document but would expose the URL and description of the original document if you have provided it to us. If you do not want your content included in the syndication functionality you should indicate so in the appropriate area of the API. If you want to maximize the exposure of your content on the web you should not opt out of inclusion in the syndication functionality.

This sounds to me like a content reselling business. In this regard it might be interesting to take a look at the latest developments from IPTC: a policy standard called ACAP, which stands for Automated Content Access Protocol. It’s designed to express access policies for robots on content items. Coupling ACAP with the (hypothetical) search capabilities of OpenCalais could result in a major commercial distribution engine, especially for traditional media content owners, and all the more so with the following marketing capabilities in mind:

We may build other products in the future based on statistical or other analysis of the metadata, such as trend analysis, emerging topics or others. In no case will these products expose the original document’s metadata.

Finally a business model for the Semantic Web? Whatever … smart guys, great service!


Is Reuters unleashing the Semantic Web?

February 12, 2008 By: Tassilo Pellegrini Category: Social Software, Text Mining 1 Comment →

Open Calais – a new and smart API from Reuters – finally tackles what critics say is the greatest obstacle to the Semantic Web: it takes the metadata burden off the end user by providing an automatic meta-tagging tool. The principle behind Open Calais is simple: put in some unstructured text and get nicely structured RDF data in return. Backed by powerful text mining and machine learning techniques, the API automatically detects entities like persons, events, countries and other facts.

Open Calais takes account of the fact that the added value of content is hidden in its structure. Uncovering that structure and representing it in an interoperable format makes existing resources more programmable and reusable.
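
To illustrate the "text in, RDF out" principle, here is a deliberately naive Python sketch. It is not how Open Calais works and does not call its API: it replaces the text mining and machine learning with a tiny hard-coded lookup table, and the `GAZETTEER` entries, the `example.org` URIs and the `extract_triples` function are all hypothetical. Only the `rdf:type` and `rdfs:label` property URIs are real W3C vocabulary.

```python
from __future__ import annotations

import re

# Hypothetical lookup table: surface form -> (entity type, entity URI).
GAZETTEER = {
    "Reuters": ("Company", "http://example.org/entity/Reuters"),
    "Paris": ("City", "http://example.org/entity/Paris"),
}

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"
TYPE_NS = "http://example.org/type/"  # made-up namespace for entity types


def extract_triples(text: str) -> list[str]:
    """Return N-Triples lines for every gazetteer entry that occurs in the text."""
    triples = []
    for name, (entity_type, uri) in GAZETTEER.items():
        if re.search(rf"\b{re.escape(name)}\b", text):
            triples.append(f"<{uri}> <{RDF_TYPE}> <{TYPE_NS}{entity_type}> .")
            triples.append(f'<{uri}> <{RDFS_LABEL}> "{name}" .')
    return triples


for line in extract_triples("Reuters opened an office in Paris."):
    print(line)
```

Once the entities are expressed as triples with URIs, any RDF-aware tool can merge them with data from other sources, which is the "programmable and reusable" part of the argument.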

But what is in it for Reuters? Nothing less than the biggest structured content repository on the web. Shouldn’t we talk about this little fact as well?

For more information, see our current newsletter or subscribe to our monthly Semantic Web update.
