Helmut Nagy

Geological Survey Austria launches thesaurus project

Throughout the last year, the Semantic Web Company team has supported the Geological Survey of Austria (GBA) in setting up its thesaurus project. It started with a workshop in summer 2010, where we discussed use cases for semantic web technologies as a means to fulfill the INSPIRE directive. Now, in fall 2011, GBA has published its first thesauri as Linked Data using PoolParty’s new Linked Data front-end.

The Thesaurus Project of the GBA aims to create controlled vocabularies for the semantic harmonization of map-based geodata. The content-related work of this project is overseen by the Thesaurus Editorial Team, which consists of domain experts from the Geological Survey of Austria. By developing semantically and technically interoperable geodata, the Geological Survey of Austria fulfills its legal obligations under the EU Directive 2007/2/EC (INSPIRE) and the national “Geodateninfrastrukturgesetz” (GeoDIG).

Marcus Ebner, from the GBA Thesaurus Editorial Team

The thesauri were built with the PoolParty Thesaurus Manager, so they are all based on SKOS and fully compliant with the Linked Data principles. Beyond the standard SKOS implementation, the data model was extended with Dublin Core terms for additional metadata and with custom sub-properties of skos:related that give related-concept links more specific semantics. In particular, considerable effort went into attaching bibliographic references to every concept in the data set using dcterms:source, which supports reuse by the scientific community and incorporation into domain-specific data sets. In addition, rdfs:subPropertyOf was used to express how international geological time scales map onto regional concepts.
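To make the data model a little more concrete, here is a minimal sketch using only the Python standard library, with triples represented as plain tuples. All URIs under `http://example.org/gba/` (the concepts, the `correlatesWith` property and the reference) are hypothetical placeholders, not GBA's actual identifiers. The function applies the RDFS sub-property rule: a triple that uses a sub-property also entails the same triple with its super-property, which is one reason a custom sub-property of skos:related remains usable by generic SKOS tools.

```python
# Real namespace URIs for SKOS, RDFS and Dublin Core terms.
SKOS = "http://www.w3.org/2004/02/skos/core#"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"
DCT = "http://purl.org/dc/terms/"
# Hypothetical namespace for this sketch; not GBA's actual URIs.
EX = "http://example.org/gba/"

# A graph is a set of (subject, predicate, object) tuples.
triples = {
    # Hypothetical custom relation declared as a sub-property of skos:related.
    (EX + "correlatesWith", RDFS + "subPropertyOf", SKOS + "related"),
    (EX + "concept/Jurassic", SKOS + "prefLabel", "Jurassic"),
    # Bibliographic reference attached with dcterms:source (placeholder URI).
    (EX + "concept/Jurassic", DCT + "source", EX + "reference/example-publication"),
    # International time-scale concept linked to a hypothetical regional unit.
    (EX + "concept/Jurassic", EX + "correlatesWith", EX + "concept/RegionalUnit"),
}


def infer_super_properties(graph):
    """Apply the RDFS sub-property rule until no new triples appear:
    if (p rdfs:subPropertyOf q) and (s p o), then also (s q o)."""
    inferred = set(graph)
    while True:
        sub_of = {(s, o) for s, p, o in inferred if p == RDFS + "subPropertyOf"}
        new = {
            (s, sup, o)
            for s, p, o in inferred
            for sub, sup in sub_of
            if p == sub
        }
        if new <= inferred:
            return inferred
        inferred |= new


closure = infer_super_properties(triples)
# Generic SKOS tools now also see a plain skos:related link.
print((EX + "concept/Jurassic", SKOS + "related", EX + "concept/RegionalUnit") in closure)  # True
```

A triple store with RDFS reasoning would perform this entailment itself; the loop only shows what that reasoning adds.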

Currently four thesauri have been published. All are available in English and German and can be used under the CC BY-SA license. Mappings to DBpedia have also been made.

With the new PoolParty release (3.0), the Linked Data front-end has been redesigned and is now highly customizable and extensible. In the GBA Thesaurus Project it serves as the publishing interface for the controlled vocabularies, providing both a machine-readable RDF version and a custom HTML version for convenient browsing and searching.

GBA Linked Data frontend

All in all, it’s satisfying to see a project we’ve supported and worked on for some time come to life, and we are looking forward to its next steps.

P.S.: Thanks to Marcus Ebner from GBA for his contribution to this blog post.

Tassilo Pellegrini

I-SEMANTICS 2011: Best Paper Award & Triplification Challenge Winners

This year the I-SEMANTICS conference awarded prizes for the best scientific paper and the most promising triplifications.

The best paper award went to Pablo N. Mendes, Max Jakob, Andrés García-Silva and Christian Bizer for their contribution DBpedia Spotlight: Shedding Light on the Web of Documents.

Abstract: The paper impressively shows how Linked Open Data can be utilized as background knowledge within document-oriented applications such as search and faceted browsing. As a step towards interconnecting the Web of Documents with the Web of Data, the authors developed DBpedia Spotlight, a system for automatically annotating text documents with DBpedia URIs. DBpedia Spotlight allows users to configure the annotations to their specific needs through the DBpedia Ontology and quality measures such as prominence, topical pertinence, contextual ambiguity and disambiguation confidence. They compare their approach with the state of the art in disambiguation and evaluate their results against three baselines and six publicly available annotation systems, demonstrating the competitiveness of the system. DBpedia Spotlight is shared as open source and deployed as a Web Service freely available for public use.
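Since DBpedia Spotlight is offered as a public Web Service, a minimal client sketch may help. It assumes the `annotate` endpoint URL below (the service location has changed since 2011, so check the current DBpedia Spotlight documentation) and the `text`, `confidence` and `support` parameters, which, as we understand the service, correspond to the disambiguation-confidence and prominence measures named in the abstract. The JSON field names `Resources`, `@URI` and `@surfaceForm` are likewise assumptions to verify against a live response.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Assumed public endpoint; the service URL has changed since 2011, so check
# the current DBpedia Spotlight documentation before relying on it.
SPOTLIGHT_ENDPOINT = "https://api.dbpedia-spotlight.org/en/annotate"


def build_annotate_url(text, confidence=0.5, support=0):
    """Build a GET URL asking Spotlight to annotate `text`."""
    params = {"text": text, "confidence": confidence, "support": support}
    return SPOTLIGHT_ENDPOINT + "?" + urlencode(params)


def extract_uris(response_json):
    """Return (surface form, DBpedia URI) pairs from a JSON annotate response."""
    data = json.loads(response_json)
    # A missing "Resources" key is treated as "nothing annotated".
    return [(r["@surfaceForm"], r["@URI"]) for r in data.get("Resources", [])]


def annotate(text, confidence=0.5):
    """Call the live service (network access required)."""
    request = Request(build_annotate_url(text, confidence),
                      headers={"Accept": "application/json"})
    with urlopen(request, timeout=30) as response:
        return extract_uris(response.read().decode("utf-8"))
```

Raising `confidence` trades recall for precision: fewer, but more reliably disambiguated, annotations.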

For the 4th time, I-SEMANTICS hosted the Triplification Challenge, an event aimed at stimulating the availability of large quantities of RDF data and showcasing practical applications built on top of them. The Challenge consisted of a general “open data track” and a dedicated “open government data track”, with one winner selected in each. The prize money of 1000 Euro per track was sponsored by Wolters Kluwer Germany.
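“Triplification” means turning existing, typically tabular or relational, data into RDF triples. The sketch below is one simple way to do it with the Python standard library, not the method of any particular challenge entry: each CSV row becomes a resource, each non-empty column a property, and the output is N-Triples. The base URI and the `class/` and `property/` paths are hypothetical.

```python
import csv
import io
from urllib.parse import quote

# Hypothetical base URI for the generated resources and properties.
BASE = "http://example.org/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def escape_literal(value):
    """Escape a string for use as an N-Triples literal."""
    return (value.replace("\\", "\\\\").replace('"', '\\"')
                 .replace("\n", "\\n").replace("\r", "\\r"))


def triplify(csv_text, entity_type):
    """Turn each CSV row into N-Triples lines; the first column is the row key."""
    reader = csv.DictReader(io.StringIO(csv_text))
    key_field = reader.fieldnames[0]
    lines = []
    for row in reader:
        subject = f"<{BASE}{quote(entity_type)}/{quote(row[key_field])}>"
        lines.append(f"{subject} <{RDF_TYPE}> <{BASE}class/{quote(entity_type)}> .")
        for field, value in row.items():
            if field == key_field or not value:
                continue  # skip the key itself and empty cells
            lines.append(f'{subject} <{BASE}property/{quote(field)}> "{escape_literal(value)}" .')
    return lines


print("\n".join(triplify("id,name,founded\nt1,Sample Town,1850\n", "town")))
```

A real triplification would map columns to established vocabularies and typed literals rather than minting a generic `property/` namespace.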

The “open data track” award went to Daniel Garijo, Boris Villazón and Oscar Corcho for their contribution A Provenance-Aware Linked Data Application for Trip Management and Organization.

Abstract: The authors present El Viajero, an application for exploiting, managing and organizing Linked Data in the domain of news and blogs about travelling. El Viajero makes use of several heterogeneous datasets to help users to plan future trips, and relies on the Open Provenance Model for modeling the provenance information of the resources.
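Provenance modelling pays off when you can ask where a resource came from. As a rough illustration (not El Viajero's implementation), the sketch below follows a derivation edge transitively. The predicate is modelled loosely on the Open Provenance Model's “wasDerivedFrom” relation, and all URIs, including the namespace, are hypothetical placeholders.

```python
# Hypothetical namespace; the predicate is modelled loosely on OPM's
# "wasDerivedFrom" edge and is not an official vocabulary term.
EX = "http://example.org/trip/"
WAS_DERIVED_FROM = EX + "wasDerivedFrom"

graph = [
    (EX + "itinerary/1", WAS_DERIVED_FROM, EX + "poi/cathedral"),
    (EX + "poi/cathedral", WAS_DERIVED_FROM, EX + "blogpost/42"),
]


def lineage(resource, triples):
    """Return every resource that `resource` was (transitively) derived from."""
    sources = {}
    for s, p, o in triples:
        if p == WAS_DERIVED_FROM:
            sources.setdefault(s, []).append(o)
    seen, stack = [], [resource]
    while stack:
        current = stack.pop()
        for origin in sources.get(current, []):
            if origin not in seen:  # also guards against cycles
                seen.append(origin)
                stack.append(origin)
    return seen


print(lineage(EX + "itinerary/1", graph))
```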

The “open government data track” award went to John Erickson, Yongmei Shi, Li Ding, Eric Rozell, Jin Zheng and Jim Hendler for their contribution TWC International Open Government Dataset Catalog.

Abstract: The TWC International Open Government Dataset Catalog (IOGDC) integrates a diverse selection of more than 70 government dataset catalogs from around the world. IOGDC demonstrates a practical dataset catalog metadata model for integrating diverse dataset catalogs collected from the real world and linking those catalogs into the Linked Data Cloud. IOGDC’s faceted browsing and search interface provides a scalable and reconfigurable solution for finding and browsing open government datasets which also offers a compelling demonstration of the value of a common metadata model for open government dataset catalogs. We believe that the vocabulary choices demonstrated by IOGDC highlight the potential for useful Linked Data applications to be created from open government catalogs and will encourage the adoption of such a standard worldwide.
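Faceted browsing is one place where a common metadata model shows its value: once every catalog record uses the same fields, facet counts can be computed uniformly. The sketch below illustrates the idea under stated assumptions; the records and the field names `country` and `format` are hypothetical, not IOGDC's actual vocabulary.

```python
from collections import Counter

# Hypothetical catalog records, all described with one shared set of fields.
datasets = [
    {"title": "Dataset A", "country": "UK", "format": "CSV"},
    {"title": "Dataset B", "country": "UK", "format": "XLS"},
    {"title": "Dataset C", "country": "US", "format": "CSV"},
]


def facet_counts(records, facet, selected=None):
    """Count values of `facet` among the records that match every selected facet value."""
    selected = selected or {}
    matching = [r for r in records
                if all(r.get(key) == value for key, value in selected.items())]
    return Counter(r[facet] for r in matching if facet in r)


print(facet_counts(datasets, "format"))                     # across all catalogs
print(facet_counts(datasets, "format", {"country": "UK"}))  # after picking a country
```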

All papers are available in the ACM Digital Library.

We thank all participants for their contributions and wish the winners all the best for their future work!

Thomas Schandl

PoolParty 3.0 and its all new Linked Data framework

The new major release of PoolParty boasts new Linked Data capabilities that further unlock the potential of the Semantic Web to improve your metadata management, enhance your data with external knowledge and ease data integration within your organization and with your partners.

In PoolParty 3.0 we created a Linked Data interlinking editor, made it easier than ever to add your own lookup and interlinking services (even for non-RDF sources), and made the Linked Data publishing front-end fully customizable with regard to design, layout and which parts of your content are displayed.

But let’s start at the beginning:

Step 1 – Hook into the Linked Data Cloud!

In the era of the rapidly growing Linked Data Cloud your knowledge models don’t need to stay isolated from the outside world anymore. Simply use PoolParty’s new and improved lookup service to find matching resources from the Linked Open Data Cloud (e.g. from DBpedia).

Imagine having different data models that all refer to the same product categories and world regions. Once you have them represented in PoolParty, you can use its lookup service to find matching resources in the Linked Data Cloud. This gives you globally used identifiers for your product categories and regions, usually in the form of a URI like http://dbpedia.org/resource/Berlin. Such shared identifiers ease internal data integration, aid data exchange with partners or customers, and enable hassle-free distributed management of knowledge models.
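PoolParty's lookup service does this for you. To show the underlying idea, here is a hand-rolled sketch (not PoolParty's implementation) that looks up DBpedia resources whose `rdfs:label` exactly matches a concept label, using a SPARQL query sent via the `query` parameter of the SPARQL 1.1 Protocol. Exact label matching is a simplifying assumption; real lookup services also rank fuzzy and partial matches.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

DBPEDIA_SPARQL = "https://dbpedia.org/sparql"


def label_lookup_query(label, lang="en", limit=10):
    """SPARQL query for resources whose rdfs:label exactly matches `label`."""
    escaped = label.replace("\\", "\\\\").replace('"', '\\"')
    return (
        "PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>\n"
        "SELECT DISTINCT ?resource WHERE {\n"
        f'  ?resource rdfs:label "{escaped}"@{lang} .\n'
        f"}} LIMIT {limit}"
    )


def lookup_url(label, lang="en"):
    """GET URL using the `query` parameter of the SPARQL 1.1 Protocol."""
    return DBPEDIA_SPARQL + "?" + urlencode({"query": label_lookup_query(label, lang)})


def lookup(label, lang="en"):
    """Run the query against the live endpoint (network access required)."""
    request = Request(lookup_url(label, lang),
                      headers={"Accept": "application/sparql-results+json"})
    with urlopen(request, timeout=30) as response:
        results = json.load(response)
    return [b["resource"]["value"] for b in results["results"]["bindings"]]
```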

Image 1: Lookup of concept ‘Austria’ and selection of properties and values to be imported

With PoolParty 3.0 we increased the number of included lookup services: DBpedia, GeoNames, WordNet, UMBEL, YAGO, Freebase, Sindice, dmoz and LCSH. BBC Wildlife, Enis and GEMET are available on request.

Step 2 – Pull in Semantic Data!

There is a vast amount of Linked Data out there just waiting to be leveraged for thesaurus creation and extension. To that end, we took a close look at our interlinking module and enhanced it so that it becomes more of a Linked Data editor.

Once you have a base thesaurus in PoolParty and hooked a couple of your concepts into the cloud as described above, you can proceed to pull in the good stuff that comes with the Linked Data resources you have found.

Image 2: Imported Linked Data for concept ‘London’

As you can see in the image above, you can extend your local thesaurus with labels, definitions and all kinds of other information, for example, in the case of countries, their population, GDP, spoken languages, famous people born there, newspaper articles about the political situation, and so on.

PoolParty 3.0 takes this approach a couple of steps further. Not only can you specify which of your local concepts corresponds to which Linked Data resource and grab all the semantic information that comes with it; you can now also selectively pick the data items you are interested in and even transform the predicates they originally came with. Just switch them to whatever custom properties you have created or want to re-use from any ontology (see an example in Image 1).
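Conceptually, “pick and transform” amounts to a filter plus a predicate mapping. The sketch below assumes a simple dictionary from source predicates to target predicates; the local concept URI, the `example.org/myontology/` property and the placeholder values are hypothetical, and mapping dbo:abstract to skos:definition is just one illustrative choice, not a PoolParty default.

```python
DBO = "http://dbpedia.org/ontology/"
# Hypothetical custom ontology and local thesaurus concept.
MY = "http://example.org/myontology/"
LOCAL_CONCEPT = "http://example.org/thesaurus/London"

# Triples as they might arrive from a lookup (values are placeholders).
imported = [
    ("http://dbpedia.org/resource/London", DBO + "abstract", "<abstract text>"),
    ("http://dbpedia.org/resource/London", DBO + "populationTotal", "<number>"),
    ("http://dbpedia.org/resource/London", DBO + "wikiPageID", "<id>"),
]

# Only predicates listed here are kept, each rewritten to the chosen target.
predicate_map = {
    DBO + "abstract": "http://www.w3.org/2004/02/skos/core#definition",
    DBO + "populationTotal": MY + "population",
}


def select_and_map(triples, local_concept, mapping):
    """Keep triples whose predicate is in `mapping`, re-attached to the local concept."""
    return [(local_concept, mapping[p], o) for _, p, o in triples if p in mapping]


for triple in select_and_map(imported, LOCAL_CONCEPT, predicate_map):
    print(triple)
```

Here dbo:wikiPageID is dropped because it is not in the mapping.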

In this way you can easily enrich your own knowledge models with external information, which in turn can be used for better content recommendation, easier data integration and improved search services.

Step 3 – Publish your Linked Data in Style

Previous PoolParty versions already offered the possibility to instantly publish your thesauri, taxonomies or vocabularies and display their concepts as HTML while additionally providing machine-readable RDF versions of them. This means that anyone using PoolParty’s intuitive GUI can become a W3C-standards-compliant Linked Data publisher without having to know anything about Semantic Web technicalities.
Of course, you don’t need to publish all your valuable models: just choose the parts that can safely be shared with the public and keep everything else behind your firewall, available only to you and trusted partners!
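Serving both HTML for people and RDF for machines from the same concept URI is commonly done with HTTP content negotiation. How PoolParty does this is not described here, so the sketch below only illustrates the general pattern under stated assumptions: a hypothetical front-end offering three media types, simplified q-value parsing, and HTML as the fallback for wildcards.

```python
# Media types this hypothetical front-end can serve; HTML is the fallback.
OFFERS = ["text/html", "application/rdf+xml", "text/turtle"]
DEFAULT = "text/html"


def choose_representation(accept_header):
    """Pick the offered media type with the highest q-value in an Accept header.

    Simplified: wildcards such as */* fall back to HTML, q=0 means "not
    acceptable", and ties keep the type listed first in the header.
    """
    best, best_q = DEFAULT, 0.0
    for part in accept_header.split(","):
        pieces = [piece.strip() for piece in part.split(";")]
        media_type, q = pieces[0].lower(), 1.0
        for param in pieces[1:]:
            if param.startswith("q="):
                try:
                    q = float(param[2:])
                except ValueError:
                    q = 0.0
        if media_type in OFFERS and q > best_q:
            best, best_q = media_type, q
    return best


print(choose_representation("text/turtle;q=0.5, text/html;q=0.9"))
```

A browser sending `text/html` gets the HTML page, while a Linked Data client asking for `text/turtle` or `application/rdf+xml` gets RDF from the same URI.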

In this new release, the design of all pages on the Linked Data front-end is under your full control. You can use your own style sheets and create views on your data with Velocity templates. It is even possible to develop project- and thesaurus-specific templates and layouts, so that each can have an individual look and display different predicates and their values.
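PoolParty uses Velocity, a Java templating engine. The sketch below does not use Velocity; it borrows Python's `string.Template` only to illustrate the idea of project-specific templates and predicate selection with a default fallback. The project id, template markup and predicate names are all hypothetical.

```python
from html import escape
from string import Template

# Fallback layout used by any project without its own template.
DEFAULT_TEMPLATE = Template("<h1>$label</h1>\n<dl>$properties</dl>")
DEFAULT_PREDICATES = ["definition"]

# Hypothetical per-project overrides keyed by a project id.
PROJECT_TEMPLATES = {
    "geo-thesaurus": Template("<h1 class=\"geo\">$label</h1>\n<dl class=\"geo\">$properties</dl>"),
}
PROJECT_PREDICATES = {"geo-thesaurus": ["definition", "source"]}


def render_concept(project, concept):
    """Render a concept dict with the project's template and predicate selection."""
    template = PROJECT_TEMPLATES.get(project, DEFAULT_TEMPLATE)
    shown = PROJECT_PREDICATES.get(project, DEFAULT_PREDICATES)
    properties = "".join(
        f"<dt>{escape(name)}</dt><dd>{escape(str(concept[name]))}</dd>"
        for name in shown
        if name in concept
    )
    return template.substitute(label=escape(concept["label"]), properties=properties)


print(render_concept("geo-thesaurus", {"label": "Granite", "definition": "A rock", "source": "Ref 1"}))
```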

Take a look at PoolParty’s standard Linked Data front-end!

The following images show a PoolParty default Linked Data page and a custom-made Linked Data page of a PoolParty concept that has some DBpedia info imported.

Image 3: PoolParty default Linked Data page

Image 4: Custom Linked Data page of ScOT thesaurus (courtesy of Educational Services Australia)

Step 4 – Unlock new Linked Data Sources

With PoolParty 3.0 you are in no way limited to DBpedia, Freebase, Geonames and the other lookup services that PoolParty provides out of the box: you can add your own non-Semantic Web data sources to the mix, thereby enabling you to boldly go where no Linked Data tool has gone before.

Maybe you have a product thesaurus and want to specify which products are related to patents that can be found with Google Patents?
Or you want to interlink concepts from a company taxonomy with related articles from the Guardian’s search service or any other newspaper that provides a search API?

None of those sources is available as RDF, so how can you easily re-use them as data sources for Linked Data-style interlinking? For such cases PoolParty introduces the Unified Lookup API, which makes it easy to turn almost any third-party Web API into a source for interlinking your concepts with external resources as described above.
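The actual interface of PoolParty's Unified Lookup API is not shown here, so the following is only a sketch of the general adapter idea it describes: each third-party API gets a small class that builds a search URL and converts the native response into uniform candidates, which can then become RDF links. `LookupSource`, `NewsSearchSource`, the endpoint and the JSON shape are all hypothetical, and using rdfs:seeAlso as the link predicate is an assumption.

```python
import json
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import List
from urllib.parse import urlencode

RDFS_SEE_ALSO = "http://www.w3.org/2000/01/rdf-schema#seeAlso"


@dataclass
class Candidate:
    uri: str
    label: str


class LookupSource(ABC):
    """Hypothetical adapter interface turning a Web API into interlinking candidates."""

    @abstractmethod
    def query_url(self, term: str) -> str:
        """URL that searches the third-party API for `term`."""

    @abstractmethod
    def parse(self, raw_response: str) -> List[Candidate]:
        """Convert the API's native response into candidates."""


class NewsSearchSource(LookupSource):
    """Adapter for an imagined JSON news API: {"results": [{"url": ..., "headline": ...}]}."""

    endpoint = "https://news.example.com/search"  # placeholder, not a real service

    def query_url(self, term):
        return self.endpoint + "?" + urlencode({"q": term})

    def parse(self, raw_response):
        data = json.loads(raw_response)
        return [Candidate(item["url"], item["headline"]) for item in data.get("results", [])]


def to_triples(concept_uri, candidates, predicate=RDFS_SEE_ALSO):
    """Express accepted candidates as RDF links from a thesaurus concept."""
    return [(concept_uri, predicate, c.uri) for c in candidates]
```

Adding another source then only means writing another small adapter; the interlinking logic stays the same.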

This makes it possible to interlink your concepts with many kinds of data out there, be it New York Times articles, UN data, synonym services, abbreviations, press releases, legal information or any other Web API important for your knowledge domain.

That being said, if you have suggestions for additional lookup services that you think are interesting, let us know!

To get a first-hand impression of the new PoolParty, just apply for a demo account!

Andreas Blumauer

“Thesaurus-based search engines will become mainstream in the near future”

The results of the survey “Do controlled vocabularies matter?”, conducted by the Semantic Web Company from May to June 2011, are now public. Over 150 participants from 27 countries paint a picture of current and future usage of controlled vocabularies.

Here are three of the most interesting outcomes of the questionnaire; the full report can be viewed and downloaded on Issuu:

Do you think enterprises and other organizations can significantly benefit from using Linked Data?

The answer is a clear YES. A follow-up question also reveals that organisations of all sizes hold roughly the same opinion about Linked Data. Only a few people think that Linked Data is a “niche thing”. Overall, over 90% of the participants think that most or at least some organisations can benefit from using Linked Data.

Do you think that search engines which utilize thesauri to improve results will become mainstream?

The results for this question are striking: two thirds of the participants think that thesaurus-based search already is, or will soon become, mainstream. Scepticism towards this development seems to be low.
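One common way a search engine uses a thesaurus is query expansion: a search term is resolved to its concept, then expanded with that concept's synonyms and, optionally, its narrower concepts. The sketch below shows this under stated assumptions; the tiny in-memory thesaurus is hypothetical, and in practice the labels and relations would come from SKOS `prefLabel`, `altLabel` and `narrower`.

```python
# Hypothetical mini-thesaurus: preferred label -> alternative labels and narrower concepts.
THESAURUS = {
    "renewable energy": {"alt": ["green energy"], "narrower": ["solar energy", "wind energy"]},
    "solar energy": {"alt": ["solar power"], "narrower": []},
    "wind energy": {"alt": ["wind power"], "narrower": []},
}


def expand_query(term, depth=1):
    """Expand `term` with synonyms and narrower concepts' labels, `depth` levels down."""
    term = term.lower()
    # Resolve an alternative label to its preferred label.
    pref = next((p for p, e in THESAURUS.items() if term == p or term in e["alt"]), None)
    if pref is None:
        return [term]  # unknown term: search for it unchanged
    expanded, frontier = [], [pref]
    for _ in range(depth + 1):
        next_frontier = []
        for concept in frontier:
            entry = THESAURUS.get(concept, {"alt": [], "narrower": []})
            for label in [concept] + entry["alt"]:
                if label not in expanded:
                    expanded.append(label)
            next_frontier.extend(entry["narrower"])
        frontier = next_frontier
    return expanded


print(expand_query("green energy"))
```

A search for “green energy” then also matches documents that only mention “wind power”, which is the kind of recall improvement thesaurus-based search aims for.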

How important is the usage of standards like SKOS for controlled vocabularies?

The results speak for themselves: the majority of the participants are convinced that standards like SKOS are important for their daily work. In August 2009 the W3C published SKOS as a Recommendation; now, nearly two years later, the standard appears to have been well received. 48.7% stated that standards like SKOS are very important and 29.1% voted for “relevant”, so together 77.8% rate them as at least relevant.

Overall, the survey shows that the Semantic Web community has done a great job of convincing the controlled-vocabulary community of the benefits of SKOS and Linked Data. On the other hand, only 3-5% are aware of SPARQL as a valuable means of building standard APIs around controlled vocabularies, which can lower the cost of implementing such knowledge organization systems.
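To illustrate what “SPARQL as a standard API” can mean in practice: any SKOS thesaurus exposed through a SPARQL endpoint can be queried the same way, with no custom API to build. The sketch below assumes a hypothetical endpoint URL; the query uses the standard SKOS properties `skos:narrower` and `skos:prefLabel`, and the parser reads the standard SPARQL JSON results format.

```python
import json
from urllib.parse import urlencode

# Hypothetical SPARQL endpoint of a published SKOS thesaurus.
ENDPOINT = "https://thesaurus.example.org/sparql"
FORBIDDEN_IRI_CHARS = set('<>"{}|\\^` ')


def narrower_query(concept_uri, lang="en"):
    """SPARQL query listing a concept's narrower concepts with labels in `lang`."""
    if FORBIDDEN_IRI_CHARS & set(concept_uri):
        raise ValueError("not a valid IRI: " + concept_uri)
    return (
        "PREFIX skos: <http://www.w3.org/2004/02/skos/core#>\n"
        "SELECT ?narrower ?label WHERE {\n"
        f"  <{concept_uri}> skos:narrower ?narrower .\n"
        "  ?narrower skos:prefLabel ?label .\n"
        f'  FILTER(lang(?label) = "{lang}")\n'
        "}"
    )


def request_url(query):
    """GET URL for the SPARQL 1.1 Protocol `query` parameter."""
    return ENDPOINT + "?" + urlencode({"query": query})


def parse_bindings(results_json):
    """Extract (narrower URI, label) pairs from SPARQL JSON results."""
    data = json.loads(results_json)
    return [(b["narrower"]["value"], b["label"]["value"]) for b in data["results"]["bindings"]]
```

The same few lines work against any SKOS vocabulary behind a SPARQL endpoint, which is where the potential cost savings come from.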

Many thanks to all participants of this survey!