The Semantic Puzzle

Tassilo Pellegrini

Linking Open Data to Thesaurus Management

The Vienna-based company punkt. netServices is just about to release a demo version of their PoolPartyWeb based ontology manager which can serve as a central hub for your knowledge organization. With PoolParty you can organize and maintain knowledge models based on widely accepted specifications like RDF, SPARQL and SKOS. service, a SKOSSimple Knowledge Organization System (SKOS) is a family of formal languages designed for representation of thesauri, classification schemes, taxonomies, subject-heading systems, or any other type of structured controlled vocabulary. SKOS is built upon RDF and RDFS, and its main objective is to ...-based thesaurus management tool with linked data capabilities. I had the chance to pre-read a white paper and test their service. Here is a brief overview. You can also try a demo.

Purpose

Poolparty was conceived to facilitate various applications like

  • Semantic search engines
  • Recommender systems (similarity search)
  • Corporate bookmarking
  • AnnotationAn annotation is notes that you make to yourself while you are reading information in a book, document, online record, video, software code or other information, "in the margin", or perhaps just underlined or highlighted passages. Annotated bibliographies, give descriptions about how each source ...- & tag recommender systems
  • Autocomplete services and facetted browsing.

These use cases can be either achieved by using PoolParty stand-alone or by integrating it with existing Enterprise Search Engines and Document Management Systems or Enterprise Wikis.

ThesaurusA thesaurus is a book that lists words grouped together according to similarity of meaning, in contrast to a dictionary, which contains definitions and pronunciations. The largest thesaurus in the world is the Historical Thesaurus of the Oxford English Dictionary, which contains more than ... Management

PoolParty is aiming to be easy to use for people without a strong Semantic Web background or special technical skills. The GUI is entirely web-based and utilizes AJAX so the user can e.g. quickly merge two concepts via drag & drop. An overview over the thesaurus can be gained with a tree or a graph view on the concepts.

poolparty-blueskin

PoolParty also helps to semi-automatically add concepts to a thesaurus as it can be used to analyse documents (e.g. web pages or PDF files) relevant to a thesaurus’ domain in order to glean candidate terms. This is done by the key-phrase extractor of KEA. The extracted terms can be selected by the user, thereby becoming “free concepts” which later can be integrated into the thesaurus, turning them into “approved concepts”.

Documents can be searched in various ways – either by keyword search in the full text, by searching for their tags or by semantic search and similarity search. The latter takes not only a concept’s preferred label into account, but also its synonyms and the labels of its related concepts are considered in the search. The user might manually remove query terms used in semantic searchPHP module for Drupal which allows for the building of search interfaces, indexes, and data synchronization using RDF stores. (http://www.openrdf.org/contrib.jsp). Boost values for the various relations considered in semantic search may also be adjusted. In the same way the recommendation mechanism for document similarity calculation works.

PoolParty by default also publishes a Semantic Wiki version of its thesauri, which provides an alternative way to browse and edit concepts. Through this feature anyone can get read access to a thesaurus, and optionally also edit, add or delete labels of concepts. Search and autocomplete functions are available here as well. The Wiki’s XHTML source is also enriched with RDFa, thereby exposing all RDF metadata associated with a concept to be picked up by RDF search engines and crawlers. (See two examples: Cocktail thesaurusStandard Thesaurus for Economics)

PoolParty also supports the import of thesauri in SKOS (including several consistency checks) or Zthes format. Those functionalities can also be consumed as stand-alone web services via PoolParty SKOS Services. Additionaly, lists of concepts and their labels can also be imported via CSV files.

Linked (Open) Data

PoolParty not only publishes its thesauri as Linked Open Data (in addition to a SPARQL endpoint), but it also consumes LOD in order to expand thesauri with information from LOD sources.

Concepts in the thesaurus can be linked to e.g. DBpediaDBpedia is a project aiming to extract structured information from the information created as part of the Wikipedia project. This structured information is then made available on the World Wide Web. DBpedia allows users to query relationships and properties associated with Wikipedia resources, ...  via a service like Georgi Kobilarov‘s DBpedia lookup service, which takes the label of a concept and returns possible matching candidates. The system suggests relevant resources from DBpedia and the user can select the one that matches the concept from his thesaurus, thereby creating a skos:exactMatch relation between the concept URI in PoolParty and the DBpedia URI. The same approach can be used to link to other SKOS thesauri available as Linked Data.

poolparty-lod

Other triples can also be retrieved from the target data source, e.g. the DBpedia abstract can become a skos:definition and geographical coordinates can be imported and be used to display the location of a concept on the map, where appropriate. The DBpedia category information may also be used to retrieve additional concepts of that category as siblings of the concept in focus, in order to populate the thesaurus.

PoolParty is capable of importing a SKOS thesaurus from a Linked Data server, and may also receive updates to thesauri imported this way. This feature has been implemented in the course of the KiWi  project funded by the European Commission. KiWiEU-funded project combining the wiki philosophy with methods of the Semantic Web, aiming to develop a new approach to knowledge management. (http://www.kiwi-project.eu/index.php/about) also contains SKOS thesauri and exposes them as LOD. Both systems can read a thesaurus via the other’s LOD interfaces and may write it to their own store. This is facilitated by special Linked Data URIs that return e.g. all the top-concepts of a thesaurus, with pointers to the URIs of their narrower concepts, which allow other systems to retrieve a complete thesaurus through iterative dereferencing of concept URIs.

Additionally KiWi and PoolParty publish lists of concepts created, modified, merged or deleted within user specified time-frames. With this information the systems can learn about updates to one of their thesauri in an external system. They then can compare the versions of concepts in both stores and may write according updates to their own store.

This means each system decides autonomously which data it accepts and there is no risk of a system pushing data that might lead to inconsistencies into an external store. Data transfer and communication are achieved using RESTRepresentational State Transfer (REST) is a style of software architecture for distributed hypermedia systems such as the World Wide Web. The term Representational State Transfer (REST) was introduced and defined in 2000 by Roy Fielding in his doctoral dissertation. Fielding is one of the .../HTTP, no other protocols or middleware are necessary. Also no rights management for each external systems is needed, which otherwise would have to be configured separately for each source.

Technology

The software is written in Java and utilizes the SAIL API, so it can be used with various triple stores. The thesaurus management itself (viewing, creating and editing SKOS concepts and their relationships) can be done in an AJAX Frontend based on Yahoo User Interface (YUI). Editing of labels can alternatively be done in a Wiki style HTML frontend. For key-phrase extraction from documents PoolParty uses a modified version of the KEA 5 APIAn application programming interface (API) is an interface implemented by a software program to enable interaction with other software, similar to the way a user interface facilitates interaction between humans and computers. APIs are implemented by applications, libraries and operating systems ..., which is extended for the use of controlled vocabularies stored in a SAILCollection of interfaces defining an API for RDF repositories. (http://www.openrdf.org/doc/sesame/api/org/openrdf/sesame/sail/package-summary.html) Repository (this module is available under GNU GPL). The analysed documents can be stored and indexed in Lucene/Solr or any other (enterprise) search system along with extracted and semantically related concepts.

Reblog this post [with Zemanta]
This entry was posted in Corporate Semantic Web, Knowledge Management, Linked Data & Open Data, Search Engines, Semantic Web Applications, Software Development and tagged , , , , , , , , by Tassilo Pellegrini. Bookmark the permalink.
Tassilo Pellegrini

About Tassilo Pellegrini

From Wikipedia, the free encyclopedia: Prof. (FH) Dr. Tassilo Pellegrini (born 1974) studied International Trade, Communication Science and Political Science at the University of Salzburg and University of Málaga. Since end of 2007 he is running the New Media Division at the University of Applied Sciences in St. Pölten. He obtained his master degree in 1999 from the University of Salzburg on the topic of telecommunications policy in the European Union, which was followed by a PhD in 2010 on the topic of bounded policy-learning in the European Union with a focus on intellectual property policies. His current research encompasses economic effects of internet regulation with respect to market structure and basic civil rights. He is member of the International Network for Information Ethics (INIE), the African Network of Information Ethics (ANIE) and the Deutsche Gesellschaft für Publizistik und Kommunikationswissenschaft (DGPUK). Beside his specialisation in policy research and media economics Tassilo Pellegrini has worked on semantic technologies and the Semantic Web. He is co-founder and Head of Division Research and Development of the Semantic Web Company in Vienna, co-editor of the first German textbook on Semantic Web and Conference Chair of the annual I-SEMANTICS conference series founded in 2005.

3 thoughts on “Linking Open Data to Thesaurus Management

  1. Pingback: Linking Open Data to Thesaurus Management « INSEMTIVES

  2. Could it be that Europe is taking the lead in the semantic web?
    And the US is lagging behind fooled by Twitter, Facebook and iPhone Apps?

  3. Alan Green said:

    > Could it be that Europe is taking the lead in the semantic web? <
    Judging from the excellent demonstration of PoolParty at the ISKO UK conference this week (http://www.iskouk.org/conf2011/index.htm), and the presence of other SemWeb companies like Mondeca in the market, I would say so.

Leave a Reply

Your email address will not be published. Required fields are marked *

*

You may use these HTML tags and attributes: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <strike> <strong>