Semantic Web Company

The Semantic Puzzle

Open World Assumptions

Archive for the ‘Text Mining’ Category

Calais, Zemanta or textwise?

July 07, 2009 By: Andreas Blumauer Category: Mashups & Web services, Text Mining 2 Comments →

Besides W3C’s Linked Data initiative, it was semantic services like Calais, Zemanta or textwise that made the advantages of the Semantic Web visible to a broader community in the last few months.

Each of those services follows a slightly different approach, but in a nutshell: they all offer an API that provides “similarity search” around social media or helps to enhance enterprise information management.

Like a magic bullet, those services offer relief from information overload and seem to be becoming a kind of “Semantic Web killer application”.

If you’re familiar with one or more of those services, drop a comment and let us know what you’ve experienced so far, or whether you can think of any applications or further developments you would like to see around these kinds of services.

If you are not familiar with this stuff, for a quick demo go to

The widget uses text from this blog to calculate similar stuff from the web.



Keep the Semantic Web trusty

March 13, 2009 By: Thomas Thurner Category: Corporate Semantic Web, Mashups & Web services, Politics, Privacy & Information Ethics, Text Mining 1 Comment →

Image: Tim Berners-Lee at a podcast interview (via Wikipedia)

In recent days we have had a lot of discussions here at Semantic Web Company about how the Semantic Web (call it Web 3.0 if you like) will develop. Several stakeholders already see that the technical realisation of Web 3.0 brings a potential danger with it: it is already possible to build applications and mashups with semantic technologies that seriously erode privacy and information ethics. Without a fundamental discussion of the ethical framework for technologies like linked data, text mining, biometric systems and geo systems in combination with the web of data, the whole domain is in danger of suffering the same fate as genetic engineering some years ago.

For public opinion on the Semantic Web, it is crucial to address the inherent risks regarding privacy and ethics. I also see Tim Berners-Lee’s statement yesterday in this context: “W3C wants to help make sure data use is appropriate,” he said. Berners-Lee, who is director of W3C, said in an interview on Wednesday that the teams working on the Semantic Web project are making sure that privacy principles are included in its architecture: “The Semantic Web project is developing systems which will answer where data came from and where it’s going to – the system will be architectured for a set of appropriate uses.”

It may be an important step towards keeping the Semantic Web trustworthy in the eyes of the public that the W3C has privacy and information ethics on its agenda and that people like Berners-Lee put their reputation behind it. But it is just as crucial to build this awareness on the corporate side. Only if everyone in the domain shares a common ethical understanding will public opinion focus on the future potential of the Semantic Web rather than fear it.


Tom Tague on Open Calais 4

January 29, 2009 By: Thomas Schandl Category: Companies & Institutions, Linked Data & Open Data, Mashups & Web services, Text Mining, Tools & Software No Comments →

The recent release of Open Calais v4 offers exciting new possibilities and makes a great contribution to Linked Data efforts.

Previous releases of Thomson Reuters’ Open Calais web service already produced promising results by extracting named entities, facts and events from user-submitted content, especially news articles. Now these extracted concepts come with a URI and are linked into the LOD cloud, specifically to DBpedia, Freebase, MusicBrainz, the CIA World Factbook and others.

On this occasion Tom Tague, vice president of the Calais creators ClearForest, answered questions the Semantic Web Company had about the goals of Open Calais. 

The latest release of Open Calais produces metadata conforming to linked data principles. You provide this great service free to everyone via your web service.
What led to that decision, and what are the benefits for Thomson Reuters?

Thomson Reuters has the largest trusted content sources in the world – but we don’t have all the content in the world. We believe that the world is going to want to integrate highly managed and trustworthy content assets such as those provided by Thomson Reuters with the low latency, highly diverse content exploding on the web. Fundamentally what we’re trying to achieve is nearly effortless interoperability of content between any two partners – Calais enables this by extracting the semantic metadata buried in your content but then takes it a step further. By linking those semantic elements to the Linked Data cloud we are setting the stage for the dramatic enhancement of any content source – and we hope that many will choose Thomson Reuters as one of the methods for enhancing that content.

With Open Calais you seem to use a hybrid business model that integrates end users into value creation through a form of enterprise collaboration.
Do you think such a business model is viable in the long run, and what has your experience been so far?

As of right now Calais isn’t truly a “Business”. It’s a strategic initiative that’s setting at least a piece of the stage for the Linked Content Economy. Our goal is to understand how this new content economy is going to evolve and to make certain that we have a leadership position as it moves from a concept to reality.

Apart from the thousands of users submitting content to Open Calais, there is also a community of developers making their own applications around your core app. How important are the social dynamics of the Open Source community for the success of Open Calais?

Extraordinarily important. Calais is a web service – which means it’s relevant to about 0.0001% of the population. We are absolutely reliant on the creativity, energy and domain expertise of our developer community to translate Calais from a technology to an end-user relevant capability. And – as a user-driven project we also rely on our developers and users to give us feedback on what they like, what they don’t and where they think we should head.

What are your plans for offering your service in German?

We hope to get there in 2009. We’ve released basic French and are gearing up for additional languages in the coming year.

Thank you, Tom, for your answers! We look forward to more applications like Semantic Proxy and Linked Facts that demonstrate the great potential of the Calais engine.


Examples of how the Semantic Web may be monetized

January 28, 2009 By: Thomas Thurner Category: Mashups & Web services, Search Engines, Text Mining 9 Comments →

Two new services based on semantic technologies launched recently: Jinni, a recommendation portal for movies and TV shows, and BooRah’s Restaurant Reputation Report. Both are good examples of how the Semantic Web may be monetized.

Jinni provides recommendations in response to a free-text search. It applies Natural Language Processing to plot, mood, style, setting, soundtrack and more, in combination with an ontology created by film professionals (according to Jinni). When it launched in December, Jinni had 10,000 movie, TV and video titles.

In Jinni you don’t need to know the exact title, actor, director, location or year of production to get a result. You can simply enter a phrase describing the mood, genre or setting of the movie, and you will be guided through an assisted search that narrows the results until you find what you want. Alternatively, if you are looking for a movie and have only a vague idea of the plot, you can describe the plot in your own words. As Jinni also offers APIs for Internet and TV content providers, you can go directly to an online store to download or purchase the movie.

Another idea for developing business-oriented Semantic Web services comes from BooRah. BooRah is a service that provides restaurant owners with reports on positive and negative reviews of the food, service and ambiance at their restaurants. To do so, the service monitors negative and positive trends across hundreds of online review sites. Restaurant owners can subscribe to receive a PDF of their monthly report for an introductory price of $15 and a regular price of $25 per month. These PDFs come with charts, trends, rankings, summaries and some quotes from users, month by month. The reports may enable restaurant owners to react and improve their services in the specific area. A simple but straightforward way to make money with semantic technologies.


Integrating Information Extraction into the KiWi-System: a proposal from Brno

June 27, 2008 By: Jana Herwig Category: Social Software, Text Mining 1 Comment →

Semantic technology isn’t about technology: its noble concern is to make the life and work of people easier. Yesterday, Marek Schmidt and Petr Knoth, both working on PhD projects in Natural Language Processing (NLP) at Brno University, introduced their vision of how Information Extraction could be integrated into the KiWi-System.

First off: What is Information Extraction? In natural language processing, “information extraction (IE) is a type of information retrieval whose goal is to automatically extract structured information, i.e. categorized and contextually and semantically well-defined data from a certain domain, from unstructured machine-readable documents” (Wikipedia). Marek and Petr’s vision for using IE in KiWi is to support the user in the creation of semantic annotations.

Annotations and Ontology

The image above illustrates their vision: If a user, for instance, enters the text “Hello, I am the best expert in Java around the Sun” into the content editor, structured information is extracted, analyzed on the fly and returned as suggestions. Through the application of reasoning on existing annotations and on further information that is available on the system – e.g. relevant domain ontologies, but also information about the user himself – the system will be able to infer new statements: E.g. the system will be able to infer that Bill Rodgers has Java programming skills, even though this information has never been explicitly stated in the knowledge base.
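
The kind of inference described above can be sketched in a few lines of Python. This is only an illustration, not KiWi’s reasoner: the predicate names (`authorOf`, `claimsExpertiseIn`, `hasSkill`), the identifier `note-1` and the single rule are hypothetical, chosen to mirror the Bill Rodgers example.

```python
# Toy forward-chaining over (subject, predicate, object) triples.
# All predicate names and the rule are hypothetical, for illustration only.

def infer_skills(facts):
    """Return the facts plus every hasSkill statement derivable from them.

    Rule: if a person authored a text, and information extraction found that
    the text claims expertise in a topic, then the person has that skill.
    """
    known = set(facts)
    changed = True
    while changed:  # repeat until no new statement can be derived
        changed = False
        for (person, p1, text) in list(known):
            if p1 != "authorOf":
                continue
            for (t, p2, topic) in list(known):
                if t == text and p2 == "claimsExpertiseIn":
                    new = (person, "hasSkill", topic)
                    if new not in known:
                        known.add(new)
                        changed = True
    return known


facts = {
    ("Bill Rodgers", "authorOf", "note-1"),    # known about the user
    ("note-1", "claimsExpertiseIn", "Java"),   # suggested by extraction
}
derived = infer_skills(facts) - facts
print(derived)
```

The derived statement was never entered by anyone; it follows from one annotation suggested by extraction plus one fact the system already holds about the user.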


KIWI Project Partners, Pt.1: LMU

March 13, 2008 By: Jana Herwig Category: Companies & Institutions, Text Mining 1 Comment →

“My expectations for this project are that we are going to acquire experience in system development, learn more about social software, deepen our knowledge of information extraction and gain more insight into personalisation adaptation.” This unassuming statement does not come from me (who is definitely going to learn a lot through KIWI), but from François Bry, Professor in the Teaching and Research Unit ‘Programming and Modelling Languages’ in the Dept. of Computer Science at Ludwig-Maximilians-Universität Munich. The unit’s areas of expertise are Automated Reasoning, Rule-based Query Languages, Event-Condition-Action Rule Languages and Web Information Systems. Naturally, this is also where they are contributing to KIWI, i.e. by developing enabling technologies for reasoning, querying and reason maintenance. You have probably already heard of REWERSE – Reasoning on the Web, another project that François and his team have been involved with.

The people on François’ KIWI team are: Norbert Eisinger, a senior researcher at the research unit for programming and modelling languages; Klara Weiand, a German doctoral student who did her master’s thesis in the Netherlands; Jakub Kotowski, a doctoral student hailing from Prague; and Ingeborg v. Troschke, supporting the team as an administrative assistant. Jakub wrote his master’s thesis about ontology engineering at Charles University, Prague. While working at Sun he developed a prototype for a Semantic Web based project tracking tool. Klara focused on Artificial Intelligence, Computational Linguistics and Cognitive Psychology when studying toward a BSc at the University of Osnabrück, and pursued this interest further with an MSc in Artificial Intelligence with a focus on Language and Speech Processing at the University of Amsterdam.

The goals of LMU’s contribution to KIWI are:

  • to develop a rule-based language that can be used by wiki users to specify queries and derivation rules, ideally in a simple and intuitive way
  • to develop a reason maintenance component for this language that gives users an opportunity to understand why derivations exist and that allows for versioning of updates of the knowledge base
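
To give a feel for what “understanding why derivations exist” can mean, here is a minimal Python sketch. It is not LMU’s language or component: the rule, the predicate names (`worksOn`, `uses`, `knows`) and the sample data are all hypothetical, and the whole derivation is naively recomputed after an update, whereas a real reason maintenance component would work incrementally.

```python
# Toy reason maintenance: every derived fact keeps the facts that justify it.
# The rule and all names are hypothetical, for illustration only.

def derive(base_facts):
    """Apply one rule; return {derived_fact: [supporting fact sets]}.

    Rule: (x, worksOn, p) and (p, uses, t)  =>  (x, knows, t)
    """
    justifications = {}
    for (x, p1, project) in base_facts:
        if p1 != "worksOn":
            continue
        for (p, p2, tech) in base_facts:
            if p == project and p2 == "uses":
                derived = (x, "knows", tech)
                support = frozenset({(x, p1, project), (p, p2, tech)})
                justifications.setdefault(derived, []).append(support)
    return justifications


def explain(fact, justifications):
    """List each alternative justification of a fact, sorted for stable output."""
    return [sorted(support) for support in justifications.get(fact, [])]


base = {
    ("alice", "worksOn", "tracker"),
    ("tracker", "uses", "RDF"),
}
why = derive(base)
print(explain(("alice", "knows", "RDF"), why))

# Retract one base fact and derive again: the conclusion loses its support.
after_retraction = derive(base - {("tracker", "uses", "RDF")})
print(after_retraction)
```

Keeping the supporting facts next to each conclusion is what lets a user ask “why?” and lets the system drop conclusions whose support has been retracted.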

Semantic for the Zend Framework with OpenCalais

February 29, 2008 By: Rene Kapusta Category: Text Mining No Comments →

Semantic Blogging: Prototype of a Rich Text Editor with some semantic content flavour.

This example demonstrates how to use the Yahoo! YUI Editor and the OpenCalais API as a Zend Framework service to flavour your content with some semantics.


Is Reuters unleashing the Semantic Web?

February 12, 2008 By: Tassilo Pellegrini Category: Social Software, Text Mining 1 Comment →

Open Calais, a new and smart API from Reuters, finally tackles what critics call the greatest obstacle to the Semantic Web: it takes the metadata burden off the end user by providing an automatic meta-tagging tool. The principle behind Open Calais is simple: put in some unstructured text and get nicely structured RDF data in return. Backed by powerful text mining and machine learning techniques, the API automatically detects entities such as persons, events and countries, along with other facts.
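
The “text in, RDF out” principle can be illustrated with a deliberately naive Python sketch. It does not call Open Calais and does not reproduce its method: it swaps in a simple dictionary lookup for the text mining and machine learning mentioned above, and the gazetteer entries, the `example.org` URIs and the function name `tag_to_ntriples` are all hypothetical. Only the `rdf:type` property URI is the standard one.

```python
import re

# Hypothetical gazetteer: surface form -> (entity type, URI). Not Calais data.
GAZETTEER = {
    "Reuters": ("Company", "http://example.org/entity/reuters"),
    "Vienna": ("City", "http://example.org/entity/vienna"),
}
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
TYPE_BASE = "http://example.org/type/"  # hypothetical vocabulary


def tag_to_ntriples(text):
    """Return one N-Triples line per gazetteer entity that occurs in the text."""
    lines = []
    for name, (etype, uri) in sorted(GAZETTEER.items()):
        # Whole-word match so "Reuters" does not match inside another word.
        if re.search(r"\b" + re.escape(name) + r"\b", text):
            lines.append(f"<{uri}> <{RDF_TYPE}> <{TYPE_BASE}{etype}> .")
    return lines


for line in tag_to_ntriples("Reuters opened a new office in Vienna."):
    print(line)
```

Each printed line is a subject, predicate and object in N-Triples syntax, which is the sense in which unstructured text becomes structured, machine-readable data.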

Open Calais takes account of the fact that the added value of content is hidden in its structure. Uncovering that structure and representing it in an interoperable format makes existing resources more programmable and reusable.

But what is in it for Reuters? Nothing less than the biggest structured content repository on the web. Shouldn’t we talk about this little fact as well?

For more information look up our current newsletter or subscribe for a monthly Semantic Web update.
