Semantic Web Company

The Semantic Puzzle

Open World Assumptions

Archive for the ‘Search Engines’

Why SKOS thesauri matter – the next generation of semantic technologies

August 31, 2010 By: Andreas Blumauer Category: Search Engines, Semantic Web Applications, Text Mining, Tools & Software No Comments →

Many so-called “semantic technologies” still do nothing more than pure statistical analysis of text. This is certainly better than simple full-text search, but there is still plenty of room to improve search, especially for more sophisticated applications such as “similarity search”: finding documents similar to a given one to enable cross-reading or recommendation systems.

Providers of first-generation semantic technologies calculate rather basic “semantic networks” by co-occurrence analysis, which sometimes yields disappointing results. Bearing in mind that Google just bought a company (“Google buys Metaweb”) which has been working on one of the largest knowledge bases in the world, we can assume that some of the last miles towards a semantic search engine can be covered by applying thesauri or other structured knowledge bases.
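To make the limitation concrete, here is a minimal sketch of co-occurrence counting. It is not the method of any particular vendor; the sample sentences and the function name are made up for illustration, and real systems would add tokenization, stop-word handling and statistical weighting.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(documents):
    """Count how often two distinct terms appear in the same document."""
    counts = Counter()
    for text in documents:
        # Sort the unique terms so every pair has one canonical order.
        terms = sorted(set(text.lower().split()))
        for pair in combinations(terms, 2):
            counts[pair] += 1
    return counts


# Illustrative sample documents (assumption, not from the post).
docs = [
    "inflation raises interest rates",
    "central bank raises interest rates",
    "monetary policy of the central bank",
]
counts = cooccurrence_counts(docs)
print(counts[("interest", "rates")])    # pair shared by the first two documents
print(counts[("inflation", "policy")])  # these terms never share a document
```

In this toy corpus "inflation" and "policy" never appear together, so a purely statistical network has no edge between them, although a thesaurus would relate both to monetary policy.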

The PoolParty team recently developed a demo application that shows how thesauri improve search results on top of second-generation semantic technologies. With PoolParty, SKOS-based controlled vocabularies can be managed and enriched with linked data. The PoolParty Tag & Content Recommender analyzes virtually any text or website to recommend corresponding tags, concepts from (in this case) STW (Standard-Thesaurus Wirtschaft) and DBpedia, and the respective articles from Wikipedia.

STW, which was developed by the German National Library of Economics (ZBW), provides vocabulary on any economic subject: about 6,000 standardized subject headings and about 18,000 entry terms to support individual keywords.

This background knowledge is used in this demo app to improve the search for similar documents dramatically:

Similarity between two documents can be calculated not only on a key-phrase basis but also on a conceptual basis. Even if two documents do not have a single word or phrase in common, they can be identified as “similar documents”.

This is possible because thousands of important relations between economic subjects are represented in the domain-specific thesaurus. In this particular case, the best results are therefore achieved with documents from economics (for instance from Econstor), but for other recommender systems thesauri from other domains can of course be used instead of STW.
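The idea of conceptual similarity can be sketched in a few lines. This is only an illustration under assumptions: the tiny thesaurus below is invented (it is not taken from STW), Jaccard overlap is a stand-in for whatever measure the demo actually uses, and only broader relations are followed.

```python
# Toy thesaurus: concept -> broader concepts (hypothetical entries, not from STW).
BROADER = {
    "interest rate": {"monetary policy"},
    "money supply": {"monetary policy"},
    "monetary policy": {"economic policy"},
    "wheat price": {"agricultural market"},
}


def expand(concepts, depth=2):
    """Return the concepts plus their broader concepts up to `depth` levels."""
    result = set(concepts)
    frontier = set(concepts)
    for _ in range(depth):
        frontier = {b for c in frontier for b in BROADER.get(c, ())} - result
        result |= frontier
    return result


def jaccard(a, b):
    """Jaccard overlap of two sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


doc_a = {"interest rate"}   # tags of a first document
doc_b = {"money supply"}    # tags of a second document, no tag in common
doc_c = {"wheat price"}     # tags of an unrelated document

print(jaccard(doc_a, doc_b))                  # 0.0: no shared tag
print(jaccard(expand(doc_a), expand(doc_b)))  # 0.5: shared broader concepts
print(jaccard(expand(doc_a), expand(doc_c)))  # 0.0: still unrelated
```

The first two documents share no tag, yet after expansion both reach "monetary policy" and "economic policy", which is exactly the case of similar documents without a common word.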

This approach, too, can be improved, and that development is underway: SKOS thesauri enriched with Linked Data do an even better job. Such third-generation semantic technologies are currently being developed by the LASSO project and the LOD2 project, two innovative projects in the area of linked data and the semantic web.

What if the biggest web company bought one of the central semantic web players?

July 17, 2010 By: Andreas Blumauer Category: Companies & Institutions, Search Engines 4 Comments →

Well, exactly this happened yesterday: Google bought Metaweb, the provider of Freebase. Freebase is an important hub in the linked data cloud, providing 12 million entities with uniform resource identifiers, most of them linked to other semantic web datasets like DBpedia or the New York Times. For example, Google's page on Freebase offers a rich source of machine-readable facts about the company.

What does this mean for the Semantic Web community, which has been working on a smarter web for the last decade?
Well, a lot… First of all, it's good to hear that Google will continue to develop Freebase as a free and open database for everyone, saying “… we would be delighted if other web companies use and contribute to the data.”

Until yesterday, many companies were still not fully convinced that the Semantic Web would play a central role in the further development of the Internet. Now the game has changed. The entity-driven approach to developing web applications has only just begun.

We will keep on reporting and discussing how Google will influence the development of the Semantic Web. And if I had one free wish: please add RDF(a) to the Freebase widgets!

Linking Open Data to Thesaurus Management

February 16, 2010 By: Tassilo Pellegrini Category: Corporate Semantic Web, Knowledge Management, Linked Data & Open Data, Search Engines, Semantic Web Applications, Software Development 2 Comments →

The Vienna-based company punkt. netServices is just about to release a demo version of their PoolParty service, a SKOS-based thesaurus management tool with linked data capabilities. I had the chance to pre-read a white paper and test their service. Here is a brief overview. You can also try a demo.

Purpose

PoolParty was conceived to facilitate various applications, such as

  • Semantic search engines
  • Recommender systems (similarity search)
  • Corporate bookmarking
  • Annotation- & tag recommender systems
  • Autocomplete services and faceted browsing.

These use cases can be achieved either by using PoolParty stand-alone or by integrating it with existing enterprise search engines, document management systems or enterprise wikis.

Thesaurus Management

PoolParty aims to be easy to use for people without a strong Semantic Web background or special technical skills. The GUI is entirely web-based and uses AJAX, so the user can, for example, quickly merge two concepts via drag & drop. An overview of the thesaurus is available as a tree or a graph view of the concepts.

poolparty-blueskin

PoolParty also helps to semi-automatically add concepts to a thesaurus as it can be used to analyse documents (e.g. web pages or PDF files) relevant to a thesaurus’ domain in order to glean candidate terms. This is done by the key-phrase extractor of KEA. The extracted terms can be selected by the user, thereby becoming “free concepts” which later can be integrated into the thesaurus, turning them into “approved concepts”.

Documents can be searched in various ways: by keyword search in the full text, by searching for their tags, or by semantic search and similarity search. Semantic search takes into account not only a concept's preferred label but also its synonyms and the labels of its related concepts. The user can manually remove query terms used in semantic search, and the boost values for the various relations considered can be adjusted. The recommendation mechanism for calculating document similarity works the same way.
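A rough sketch of such a boosted query expansion might look like the following. Everything concrete in it is an assumption made for illustration: the thesaurus entry, the boost values and the function names are hypothetical, and the naive substring matching stands in for a real search index.

```python
# Hypothetical thesaurus entry and boost values; the post does not give
# PoolParty's actual defaults.
THESAURUS = {
    "cocktail": {
        "synonym": ["mixed drink"],
        "related": ["bartender", "liqueur"],
    }
}
BOOSTS = {"preferred": 1.0, "synonym": 0.8, "related": 0.4}


def expand_query(pref_label, removed=()):
    """Return {term: boost} for a concept, skipping terms the user removed."""
    entry = THESAURUS.get(pref_label, {})
    weighted = {pref_label: BOOSTS["preferred"]}
    for relation in ("synonym", "related"):
        for label in entry.get(relation, []):
            # Keep the first (highest-priority) boost if a label repeats.
            weighted.setdefault(label, BOOSTS[relation])
    return {term: w for term, w in weighted.items() if term not in removed}


def score(document, weighted_terms):
    """Sum the boosts of all query terms found in the text (naive substring match)."""
    text = document.lower()
    return sum(w for term, w in weighted_terms.items() if term in text)


# The user searches for "cocktail" and manually removes the term "liqueur".
query = expand_query("cocktail", removed={"liqueur"})
print(query)
print(score("The bartender served a mixed drink.", query))
```

The example document never mentions "cocktail" itself, but it still receives a score through the synonym and the related concept; lowering the "related" boost would reduce the influence of such indirect matches.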

PoolParty by default also publishes a Semantic Wiki version of its thesauri, which provides an alternative way to browse and edit concepts. Through this feature anyone can get read access to a thesaurus, and optionally also edit, add or delete labels of concepts. Search and autocomplete functions are available here as well. The Wiki's XHTML source is also enriched with RDFa, thereby exposing all RDF metadata associated with a concept to be picked up by RDF search engines and crawlers. (See two examples: Cocktail thesaurus, Standard Thesaurus for Economics)

PoolParty also supports the import of thesauri in SKOS (including several consistency checks) or Zthes format. Those functionalities can also be consumed as stand-alone web services via PoolParty SKOS Services. Additionally, lists of concepts and their labels can be imported via CSV files.

Linked (Open) Data

PoolParty not only publishes its thesauri as Linked Open Data (in addition to a SPARQL endpoint), but it also consumes LOD in order to expand thesauri with information from LOD sources.

Concepts in the thesaurus can be linked to e.g. DBpedia via a service like Georgi Kobilarov's DBpedia lookup service, which takes the label of a concept and returns possible matching candidates. The system suggests relevant resources from DBpedia and the user can select the one that matches the concept from their thesaurus, thereby creating a skos:exactMatch relation between the concept URI in PoolParty and the DBpedia URI. The same approach can be used to link to other SKOS thesauri available as Linked Data.

poolparty-lod

Other triples can also be retrieved from the target data source, e.g. the DBpedia abstract can become a skos:definition and geographical coordinates can be imported and be used to display the location of a concept on the map, where appropriate. The DBpedia category information may also be used to retrieve additional concepts of that category as siblings of the concept in focus, in order to populate the thesaurus.
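The resulting triples can be sketched as plain N-Triples lines. This is only an illustration: the concept URI, the candidate list and the abstract texts are made up (a real candidate list would come from the lookup service), and the helper functions are hypothetical rather than part of PoolParty. Only the SKOS namespace and the two property names are taken from the SKOS vocabulary.

```python
SKOS = "http://www.w3.org/2004/02/skos/core#"


def uri_triple(subject, predicate, obj):
    """Serialize a triple whose object is a URI as one N-Triples line."""
    return f"<{subject}> <{predicate}> <{obj}> ."


def literal_triple(subject, predicate, text, lang):
    """Serialize a triple whose object is a language-tagged literal."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return f'<{subject}> <{predicate}> "{escaped}"@{lang} .'


def link_concept(concept_uri, candidates, chosen_index):
    """Build skos:exactMatch and skos:definition triples for the chosen candidate."""
    chosen = candidates[chosen_index]
    return [
        uri_triple(concept_uri, SKOS + "exactMatch", chosen["uri"]),
        literal_triple(concept_uri, SKOS + "definition", chosen["abstract"], "en"),
    ]


# Hypothetical lookup result with illustrative abstracts.
candidates = [
    {"uri": "http://dbpedia.org/resource/Cocktail",
     "abstract": 'A cocktail is a "mixed drink".'},
    {"uri": "http://dbpedia.org/resource/Cocktail_(1988_film)",
     "abstract": "Cocktail is a 1988 film."},
]

# The user picks the first candidate for the (hypothetical) thesaurus concept.
lines = link_concept("http://example.org/thesaurus/cocktail", candidates, 0)
for line in lines:
    print(line)
```

Keeping the user's choice as an explicit index mirrors the workflow described above: the system only suggests candidates, and the link is created once a person confirms the match.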

PoolParty is capable of importing a SKOS thesaurus from a Linked Data server, and may also receive updates to thesauri imported this way. This feature has been implemented in the course of the KiWi project funded by the European Commission. KiWi also contains SKOS thesauri and exposes them as LOD. Both systems can read a thesaurus via the other's LOD interfaces and may write it to their own store. This is facilitated by special Linked Data URIs that return e.g. all the top-concepts of a thesaurus, with pointers to the URIs of their narrower concepts, which allow other systems to retrieve a complete thesaurus through iterative dereferencing of concept URIs.
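Iterative dereferencing amounts to a graph traversal that starts at the top-concept list. The sketch below replaces HTTP with an in-memory dictionary so that it runs as is; the URIs, the response shape and the function names are assumptions for illustration, not the actual PoolParty or KiWi interfaces.

```python
from collections import deque

# In-memory stand-in for a Linked Data server: URI -> description.
# All URIs and the response shape are hypothetical.
FAKE_SERVER = {
    "http://example.org/thesaurus/top": {
        "narrower": ["http://example.org/thesaurus/drinks"]},
    "http://example.org/thesaurus/drinks": {
        "label": "Drinks",
        "narrower": ["http://example.org/thesaurus/cocktail",
                     "http://example.org/thesaurus/beer"]},
    "http://example.org/thesaurus/cocktail": {"label": "Cocktail", "narrower": []},
    "http://example.org/thesaurus/beer": {"label": "Beer", "narrower": []},
}


def dereference(uri):
    """Stand-in for an HTTP GET on a concept URI."""
    return FAKE_SERVER[uri]


def fetch_thesaurus(top_concepts_uri, fetch=dereference):
    """Walk from the top-concept list down through narrower links, breadth first."""
    store = {}
    queue = deque(fetch(top_concepts_uri)["narrower"])
    while queue:
        uri = queue.popleft()
        if uri in store:  # a polyhierarchy may reach the same concept twice
            continue
        store[uri] = fetch(uri)
        queue.extend(store[uri]["narrower"])
    return store


thesaurus = fetch_thesaurus("http://example.org/thesaurus/top")
print(sorted(concept["label"] for concept in thesaurus.values()))
```

Passing the fetch function as a parameter keeps the traversal independent of the transport, so the same loop would work with a real HTTP client that parses RDF responses.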

Additionally, KiWi and PoolParty publish lists of concepts created, modified, merged or deleted within user-specified time frames. With this information the systems can learn about updates to one of their thesauri in an external system. They can then compare the versions of concepts in both stores and may write the corresponding updates to their own store.

This means each system decides autonomously which data it accepts, and there is no risk of a system pushing data that might lead to inconsistencies into an external store. Data transfer and communication are achieved using REST/HTTP; no other protocols or middleware are necessary. Also, no rights management for each external system is needed, which would otherwise have to be configured separately for each source.
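The pull-based update described here can be sketched as follows. The record layout, the integer timestamps and the function names are hypothetical, and merges are left out for brevity; the point is only that the receiving side compares versions and decides for itself what to write.

```python
def changes_since(change_log, since):
    """Remote side: list the changes recorded after `since` (integer timestamp)."""
    return [change for change in change_log if change["time"] > since]


def pull_updates(local_store, remote_changes, accept):
    """Local side: apply only changes that are newer and that `accept` approves."""
    applied = []
    for change in remote_changes:
        uri = change["uri"]
        local = local_store.get(uri)
        if local is not None and local["time"] >= change["time"]:
            continue  # the local version is at least as recent
        if not accept(change):
            continue  # the receiving system decides autonomously
        if change["action"] == "deleted":
            local_store.pop(uri, None)
        else:
            local_store[uri] = {"label": change["label"], "time": change["time"]}
        applied.append(uri)
    return applied


# Hypothetical change list published by the external system.
remote_log = [
    {"uri": "ex:cocktail", "action": "modified", "label": "Cocktail", "time": 5},
    {"uri": "ex:beer", "action": "created", "label": "Beer", "time": 7},
    {"uri": "ex:wine", "action": "deleted", "label": None, "time": 9},
]
local = {
    "ex:cocktail": {"label": "Cocktails", "time": 6},
    "ex:wine": {"label": "Wine", "time": 2},
}

# Local policy in this example: never accept remote deletions.
applied = pull_updates(local, changes_since(remote_log, 4),
                       accept=lambda change: change["action"] != "deleted")
print(applied)
print(sorted(local))
```

Because nothing is ever pushed, the external system needs no write permission on the local store, which matches the point about rights management made above.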

Technology

The software is written in Java and utilizes the SAIL API, so it can be used with various triple stores. The thesaurus management itself (viewing, creating and editing SKOS concepts and their relationships) can be done in an AJAX Frontend based on Yahoo User Interface (YUI). Editing of labels can alternatively be done in a Wiki style HTML frontend. For key-phrase extraction from documents PoolParty uses a modified version of the KEA 5 API, which is extended for the use of controlled vocabularies stored in a SAIL Repository (this module is available under GNU GPL). The analysed documents can be stored and indexed in Lucene/Solr or any other (enterprise) search system along with extracted and semantically related concepts.

1000-and-one pulldowns

May 12, 2009 By: Thomas Thurner Category: Internet & Media, Knowledge Management, Search Engines 2 Comments →

Fortunately, semantic search techniques have now found their way into knowledge-providing theme portals. Nearly once a week a new knowledge portal with built-in semantic search pops up. They deal with environmental issues, health care, the economy and so on. These sites are good examples of how the vision of a knowledge web is fostered by semantic technologies. Alongside general knowledge portals like Wolfram Alpha, such focused approaches will be great showcases for “a” semantic web (even if they are not based on “the” RDF semantic web) over the next few months.

But the potential of these semantic theme portals is often severely reduced by their poor usability. You get lost in categories and flags; you get puzzled by pulldowns, mouseovers and embedded hierarchies; sometimes it's a mess of 1001 functions. You need to understand the underlying semantic concept to orient yourself within these applications, and that is not the point of the exercise. Search has to be easy.

To show the potential of semantic technologies, we need good examples, which offer good usability. This is a call to everyone to provide such examples.

See my favorites:

  • NextBio, a platform that enables life science researchers to search, discover, and share knowledge locked within public and proprietary data
  • reegle, the Search Engine for Renewable Energy and Energy Efficiency
  • CultureSampo, a Finnish cultural heritage platform for institutional organizations as well as private citizens

Linked Data is not owl:sameAs Semantic Web

March 30, 2009 By: Andreas Blumauer Category: Linked Data & Open Data, Search Engines 3 Comments →

While some people work heavily on extending the semantic web infrastructure, like Talis Connected Commons or OpenLink's Amazon EC2 instantiation, others have started to bring the semantic web closer to developers and therefore to a much broader audience: they offer search facilities or Linked Data navigators like OpenLink's Entity Finder or DERI's VisiNav.

These kinds of applications should not be confused with “semantic web” end-user applications like Google's Wonderwheel or INTSPEI's Cloudlet. Adding some semantics to existing user interfaces can be helpful, and users are obviously ready for such experiments. This is NOT the innovation the semantic web will bring, but it is a very important step to be taken in parallel with the linked data initiative.

Let's take a look at Cloudlet: this tool is an easy-to-use, free Firefox extension that adds context-sensitive tag clouds to the most popular search engines and helps people navigate their search results more efficiently. The previous version of Search Cloudlet worked with Google and Yahoo; the new version also works with Twitter. It adds Tag Clouds, Author Clouds, Recipient Clouds and Hashtag Clouds to Twitter search, Twitter user profiles and home pages. See some reviews on this popular tool.

Cloudlet is a child of the Web. INTSPEI has learned all the lessons of Web 2.0, especially how to promote ideas using the blogosphere and how to identify market trends as early as possible, and it generates obvious added value for its users. Sure, it doesn't make use of linked data yet, but as a typical representative of the fast-growing “semantic search evolution” it reminds me of Chris Welty's famous insight: “In the Semantic Web, it is not the Semantic which is new, it is the Web which is new.”

Web 1.0 was the WWW without tons of network effects. Web 2.0 changed that a lot.

Linked Data is not the Semantic Web; it's the foundation for it. From a software developer's or an IT architect's perspective it might seem as if the two concepts were the same. But this community represents a very small percentage of all web users.

So where is the user's Web in the Linked Data architecture? Looking at TimBL's Linked Data principles, one can clearly see that this is a “Web” for developers.

But things evolve. And some Web companies will jump on the bandwagon and will, for instance, improve their tagclouds, their semantic search, their recommender systems (Twine?) or their similarity search a lot by making use of linked data.

Just as semantic search is becoming mainstream right now (call it “semantic search 2.0”), linked data will become part of a lot of mainstream applications (in about three years, I guess). Linked data will generate tons of new network effects, maybe even new business models; it won't be avant-garde anymore. It will be part of the Semantic Web.

the next google

March 25, 2009 By: Thomas Thurner Category: Search Engines, Software Development, Tools & Software No Comments →

Maybe you have noticed it already: this morning something new appeared in Google's search engine interface, a set of related search suggestions based on your query. Google described the enhancement as follows:

Starting today, we’re deploying a new technology that can better understand associations and concepts related to your search, and one of its first applications lets us offer you even more useful related searches (the terms found at the bottom, and sometimes at the top, of the search results page).

I tried it. If you type in “time travel” you also get search proposals like “theory of relativity time travel” or “wormhole time travel”. Google announced that the service is available in various languages. A direct test in German is a little disillusioning: searching for “zeit reise” (the same concept as above, in German) leads to alternative searches like “reisen 50er jahren” (travel in the 1950s) and “reisen im mittelalter” (travel in the Middle Ages).

Even if this semantic-like extension of the basic search function still needs some tuning, the point is getting clearer: Google, too, is working to get more meaning into its search algorithms. Parts of the semantic methodology are finding their way into mainstream services like search engines, as we saw with Wolfram Alpha some days ago. So keep your eyes open: maybe tomorrow morning you'll find another piece of the semantic puzzle embedded in one of your favorite web apps.

Google and the Semantic Web: About Quad Stores and URIs

March 20, 2009 By: Andreas Blumauer Category: Internet & Media, Search Engines, Vocabularies & Languages 6 Comments →

Just recently Google launched another interesting service called “In Quotes”. It delivers quotes from stories linked to from Google News and users can compare opinions of e.g. politicians in a very comfortable way.

A closer look at the system shows that every person whose quotes are listed has a URI: Barack Obama has the identifier (“qsid”) tPjE5CDNzMicmM.

It seems like “qsid” stands for “Quad Store ID” which would perfectly support such a URI based system.

Is Google slowly approaching the Semantic Web?

Why Wolfram Alpha won´t replace Google

March 12, 2009 By: Andreas Blumauer Category: Search Engines 7 Comments →

If Nova Spivack and Doug Lenat are positive about what they have seen of Wolfram Alpha, I am also close to being convinced that the internet community won't be disappointed by Alpha's first release. Just remember the hype caused by Cuil's PR strategy of spreading news about its first release throughout the blogosphere; today scarcely anybody talks about that engine anymore.

From everything I have read about Wolfram Alpha, one thing can be stated: Wolfram Alpha will be a perfect addition to traditional search engines like Google, but it will never replace them. For example, in the first paragraph of this post I used Google services like “Google Blog Search” or “Google Trends” to back up some of my statements (in a broader sense: to give answers to those who want to know why this is my opinion). Alpha won't deliver such services, but it will do other things much better than Google. Doug Lenat:

At one extreme is, say, Google, which responds to almost anything like a faithful puppy bringing in the morning newspaper without understanding much of anything it’s fetching (recognizing words in what it returns, often leading to amusing or hair-raising inappropriate “ads” being displayed, and leading to tons of false positives and false negatives).  At the other extreme is, say, Cyc, which only can answer a small fraction of user queries, but can answer ones that require common sense (not just common sense queries like “Do surgeons often operate on themselves?”, but ones where the logical application of such knowledge is required to correctly disambiguate and parse the user’s query containing pronouns, elisions, ambiguous words, ellipsis, and so on) and where every piece of the query and every piece of the answer is as deeply understood as, say, arithmetic.  Wolfram Alpha is somewhere around the geometric mean of those two extremes.

Search engines or question answering machines (QA) which understand the meaning of the query and/or of the result are not completely new and some of them are really useful like good old START.

But the point is: in many cases of information need, people can't formulate the right question.

Why didn't START become the default search engine if it can even answer questions? I think the USP of Alpha will be that it can give the right answer to more questions than any QA machine before it. But the real “search engine revolution” won't happen until engines are able to help users formulate the proper questions and interpret the results correctly. Therefore we need to rethink some search paradigms from scratch.

Semantic-like tools to pimp your blog

March 09, 2009 By: Thomas Thurner Category: Mashups & Web services, Search Engines, Tools & Software 1 Comment →

More and more tools are currently appearing in the Web 2.0 domain that bring semantic technologies into bloggers' everyday lives. Zemanta was certainly a breakthrough in the annotation of blog entries. I'm running this service on my private and my corporate blog. It is easy to integrate into every common blog software and it really saves time in my daily work. Unfortunately, it is available only for English blogs.

Another service which came up recently is Quintura, which provides search capabilities for your own blog with a visual map of tags or hints, based on an index built from your blog entries. It is easy to customize to your blog's style through a simple interface. Quintura offers code snippets to copy into your blog post or sidebar. Even if it is not a semantic search engine in the narrow sense, Quintura provides a fine semantic-like interface for meaning-sensitive search. See how Quintura is integrated into The Semantic Puzzle in our sidebar.

Springer´s new semantic search engine

March 04, 2009 By: Andreas Blumauer Category: Literature & Publications, Search Engines 1 Comment →

Just recently Springer came up with AuthorMapper, a great new tool to explore the scientific world, see trends on a map and find related articles etc.:

AuthorMapper, an online tool for visualizing scientific research, enables document discovery based on author locations and geographic maps. Integrating content and mapping technology, AuthorMapper provides an easy-to-use, dynamic interface that allows you to:

  • Explore patterns in scientific research
  • Identify new and historic literature trends
  • Discover wider relationships
  • Locate other experts in your field

Let's have a look at the global map of the “Semantic Web World” (at least the scientific part of it):

bild-1
