Andreas Blumauer

Calais, Zemanta or textwise?

Besides the W3C’s Linked Data initiative, it was semantic services like Calais, Zemanta and textwise that made the advantages of the Semantic Web visible to a broader community over the last few months.

Each of those services follows a slightly different approach, but in a nutshell: they all offer an API that provides “similarity search” for social media or that enhances enterprise information management.

Like a magic bullet, those services offer relief from information overload and seem to be becoming a kind of “semantic web killer application”.

If you’re familiar with one or more of those services, drop a comment and let us know what you have experienced so far, or whether you can think of any applications or further developments you would like to see around these kinds of services.

If you are not familiar with this stuff, for a quick demo go to

The widget uses text from this blog to calculate similar stuff from the web.


Thomas Schandl

Tom Tague on Open Calais 4

The recent release of Open Calais v4 offers exciting new possibilities by making a great contribution to Linked Data efforts.

Previous releases of Thomson Reuters’ Open Calais web service already produced promising results by extracting named entities, facts and events from user-submitted content – especially news articles. Now these extracted concepts come with a URI and are linked into the LOD cloud – specifically to DBpedia, Freebase, MusicBrainz, the CIA World Factbook and others.

On this occasion, Tom Tague, vice president of ClearForest, the creators of Calais, answered the Semantic Web Company’s questions about the goals of Open Calais.

The latest release of Open Calais produces metadata conforming to linked data principles. You provide this great service free to everyone via your web service.
What led to that decision, and what are the benefits for Thomson Reuters?

Thomson Reuters has the largest trusted content sources in the world – but we don’t have all the content in the world. We believe that the world is going to want to integrate highly managed and trustworthy content assets such as those provided by Thomson Reuters with the low latency, highly diverse content exploding on the web. Fundamentally what we’re trying to achieve is nearly effortless interoperability of content between any two partners – Calais enables this by extracting the semantic metadata buried in your content but then takes it a step further. By linking those semantic elements to the Linked Data cloud we are setting the stage for the dramatic enhancement of any content source – and we hope that many will choose Thomson Reuters as one of the methods for enhancing that content.

It seems with Open Calais you use a hybrid business model, which integrates end users in a form of enterprise collaboration into value creation.
Do you think such a business model is viable in the long run, and what has your experience been so far?

As of right now Calais isn’t truly a “Business”. It’s a strategic initiative that’s setting at least a piece of the stage for the Linked Content Economy. Our goal is to understand how this new content economy is going to evolve and to make certain that we have a leadership position as it moves from a concept to reality.

Apart from the thousands of users submitting content to Open Calais, there is also a community of developers making their own applications around your core app. How important are the social dynamics of the Open Source community for the success of Open Calais?

Extraordinarily important. Calais is a web service – which means it’s relevant to about 0.0001% of the population. We are absolutely reliant on the creativity, energy and domain expertise of our developer community to translate Calais from a technology to an end-user relevant capability. And – as a user-driven project we also rely on our developers and users to give us feedback on what they like, what they don’t and where they think we should head.

What are your plans for offering your service in German?

We hope to get there in 2009. We’ve released basic French and are gearing up for additional languages in the coming year.

Thank you, Tom, for your answers! We look forward to more applications like Semantic Proxy and Linked Facts that demonstrate the great potential of the Calais engine.

Andreas Blumauer

The Semantic Web becomes mainstream, again.

The roll-out of semantic web technologies seems to be entering the next stage. And it will be a quiet (r)evolution, like the open source movement was. Two examples: Next year’s JAX in Mainz, Germany, will have its first Semantic Web track. The organisers say that “the Semantic Web is going to conquer the business market soon” – we will see whether it will really be that martial.

Another example: one of the biggest open source magazines in Germany, t3n, has recently published a new issue with many stories around the Semantic Web. Editor-in-chief Jan Christe says: “We have constantly stumbled upon semantic web related stuff when we scanned the news, so we decided to set a focus on this topic.”

The Semantic Web is tangible now – Christe says: “Applications like OpenCalais, Zemanta or Tagaroo show the end-users what’s really in it for them.” And it is also nice to see that the Semantic Web is no longer reduced to “search”: t3n’s new issue also has interesting articles about Linked Data, for instance Sören Auer’s “How to develop Semantic Web Applications”.

So, in conclusion: Paul Miller’s wait for the “Semantic Web in Business” (a great blog post!) has come to an end. It won’t be found in heavy books, but rather in the open source community and sometimes in light-weight magazines.

Yes, we can!

Jana Herwig

Multimedia in the Web of Data – Annotating and Interlinking Photos, Music, Multimedia [WOD-PD]

The Web of Data Practitioners Days concluded with the session on Multimedia in the Web of Data, the first part of which was led by Ansgar Scherp (University of Koblenz-Landau, Germany).

Multimedia content, as Ansgar pointed out, is rarely annotated, badly organized, and hardly ever looked at again – just think of the 300-something pics you might take on an average weekend getaway, and which you never touch again. Annotating multimedia content requires a lot of work and dedication – but most of the time, these pictures eventually disappear in the “digital shoe box” that is your photo management software.

The most obvious remedy is to annotate content as early as possible, ideally when creating the content, ideally already on your portable camera (formerly known as: mobile phone :). Ansgar suggested providing incentives to encourage picture annotation – professionals could, for instance, receive a higher financial reward if they deliver pictures that are already annotated. And of course there are ‘Games with a purpose’ such as Google Image Labeler, where players tag images in pairs, with and against each other, and are rewarded with the entertainment factor of the game.

The slide below shows what has happened (or will happen) to the process of creating photo books in the digital age and the age of mashups:

Ansgar Scherp's slides

After all, this is the age of the social semantic web, so why not try and (re-)use the content, structure and contexts that other users have already created on the web? Content augmentation, for the scope that Ansgar is concerned with, consists of reusing content and structures (e.g. from sources such as Flickr, Wikipedia and GeoNames), made possible through the definition of rules, e.g.:

  • If there are two or fewer pictures on a page*
  • then automatically augment the page with additional photos using location information.

* Page here means a page in the album you are currently working on – you probably took a picture of yourself and your friend in Paris, and even though you went to the Centre Pompidou, you forgot to actually take a pic of the building itself – well, let the web be your library!
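The augmentation rule above could be sketched roughly as follows. This is only an illustration, not the implementation Ansgar presented: the `Page` class, the `find_photos` callback (a stand-in for a lookup against a source such as Flickr) and the `target` number of photos per page are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Page:
    """Hypothetical album page: photo identifiers plus an optional location."""
    photos: List[str] = field(default_factory=list)
    location: Optional[str] = None


def augment_page(
    page: Page,
    find_photos: Callable[[str, int], List[str]],
    max_sparse: int = 2,   # "two or fewer pictures" from the rule
    target: int = 4,       # assumed fill level, not taken from the talk
) -> Page:
    """Return the page, augmented with photos for its location if it is sparse."""
    # The rule only fires for sparse pages that carry location information.
    if len(page.photos) > max_sparse or page.location is None:
        return page
    missing = target - len(page.photos)
    # find_photos is a placeholder for a web lookup by location.
    extra = list(find_photos(page.location, missing))[:missing]
    return Page(photos=page.photos + extra, location=page.location)
```

Passing the lookup in as a function keeps the rule itself independent of any particular photo source, so the same rule could be tried against different services.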

So the goal is clear: develop a procedure for applying automatic content augmentation in the creation of good photo books.

But what makes a ‘good’ photo book anyway? Here are some of the results of a structural analysis of real, human-created photobooks conducted at CeWe Color:

  • % of photos with faces: 36%
  • Number of album pages: 16.96
  • Photos per page: 6.69
  • Text fields per page: 1.45
  • % of pages with text: 87%

Many rules can be derived from the structural analysis and applied in turn to the creation of photo books, such as this one:

  • If the text is located in the upper third of a page
  • if the font size is equal to or larger than 16 points
  • if the number of words is less than 10
  • if there is no caption on the page that has a bigger font size
  • then this page is the title
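As a rough sketch, the conditions of this rule can be written as a single predicate. The `TextField` class and its attributes (in particular `y_center`, a vertical position from 0.0 at the top of the page to 1.0 at the bottom) are hypothetical names chosen for the example, and "caption" is interpreted here as any other text field on the same page.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class TextField:
    """Hypothetical text field on an album page."""
    text: str
    font_size_pt: float
    y_center: float  # 0.0 = top of the page, 1.0 = bottom (assumed convention)


def is_title(candidate: TextField, page_fields: Sequence[TextField]) -> bool:
    """Check the four conditions of the title rule for one text field."""
    in_upper_third = candidate.y_center < 1 / 3
    large_enough = candidate.font_size_pt >= 16
    short_enough = len(candidate.text.split()) < 10
    # No other text field on the page may use a bigger font size.
    no_bigger_text = all(
        other.font_size_pt <= candidate.font_size_pt
        for other in page_fields
        if other is not candidate
    )
    return in_upper_third and large_enough and short_enough and no_bigger_text
```

For instance, a 24-point "Paris 2008" near the top of a page would satisfy the rule, while a 12-point caption at the bottom of the same page would not.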

Ansgar recommended xSmart, which he described as a “context-driven authoring tool for page-based multimedia presentations.”

Ansgar’s presentation was followed by two more: one by Yves Raimond on Interlinking Music on the Web of Data, and one on Interlinking Multimedia. Despite my best intentions, I did not manage to cover these two in detail, but at least I gathered the links to relevant resources from all three sessions…