Helmut Nagy

The ESA vocabulary site – Making Publishing and Reusing Vocabularies Easier

Reviewing the interview we conducted with Les Kneebone (project manager of the vocabulary projects at Education Services Australia) in November 2010, we can see that ESA was one of the early adopters of SKOS as a standard for thesaurus development. Les said then: “We had already identified SKOS as an important standard for ScOT so it was natural to select PoolParty as our new thesaurus management tool”. Around a year later, ESA’s vocabulary site went online with PoolParty as its basis.

We asked Les to comment on his statement from last year and he confirmed that SKOS continues to be central to the ESA vocabulary business model and that it has also been important for ESA that PoolParty has been flexible enough to support continued publication of non-RDF formats, especially IMS VDEX.

In the course of this project it became more and more obvious that SKOS can be used not only as yet another format for publishing thesauri but also as a unified model for building thesauri in general. This approach made several improvements to ESA’s vocabulary development model and maintenance process possible. Since all data is stored as RDF in a triple store, and SKOS and RDF are flexible formats that support interoperability and interchange of data, many manual transformations that were previously required are no longer needed. All other systems that use the vocabularies are fed dynamically by PoolParty, which offers the data in the formats they need (see image below).

Changes in ESA’s vocabulary development model

Les states that while some manual processes still exist to support legacy systems, PoolParty ensures the integrity and richness of ESA data. Support and customizations for legacy systems can be achieved in the confidence that the linked-data capabilities are centrally managed and stored in the PoolParty triple store.
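
The idea of keeping one central model and deriving every publication format from it can be illustrated with a small sketch. This is not PoolParty code: the Concept class, the example.org namespace and the sample concepts are made up for illustration, and the XML output is only loosely inspired by VDEX, not a conforming IMS VDEX document. Labels are also not escaped, which a real serializer would have to do.

```python
from dataclasses import dataclass, field
from xml.etree import ElementTree as ET

# Hypothetical namespace, used for illustration only.
BASE = "http://example.org/vocab/"


@dataclass
class Concept:
    """One thesaurus concept in the single, central model."""
    local_id: str
    pref_label: str
    lang: str = "en"
    broader: list[str] = field(default_factory=list)


def to_turtle(concepts):
    """Serialize the concepts as SKOS in Turtle syntax."""
    lines = ["@prefix skos: <http://www.w3.org/2004/02/skos/core#> .", ""]
    for c in concepts:
        parts = [
            f"<{BASE}{c.local_id}> a skos:Concept",
            f'    skos:prefLabel "{c.pref_label}"@{c.lang}',
        ]
        parts += [f"    skos:broader <{BASE}{b}>" for b in c.broader]
        lines.append(" ;\n".join(parts) + " .")
    return "\n".join(lines)


def to_simple_xml(concepts):
    """Serialize the same concepts as a flat, VDEX-inspired term list."""
    root = ET.Element("vocabulary")
    for c in concepts:
        term = ET.SubElement(root, "term")
        ET.SubElement(term, "termIdentifier").text = c.local_id
        caption = ET.SubElement(term, "caption", {"language": c.lang})
        caption.text = c.pref_label
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    model = [
        Concept("maths", "Mathematics"),
        Concept("algebra", "Algebra", broader=["maths"]),
    ]
    print(to_turtle(model))
    print(to_simple_xml(model))
```

Both outputs are produced from the same list of concepts, so an edit made once in the model shows up in every format without a manual transformation step.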

From the publishing perspective, the previous vocabulary publishing site has been replaced by the PoolParty Linked Data Frontend (LD-Frontend), which has been customized for this project to offer more flexibility in the display and layout of the data. Similar to the frontend for the Austrian Geological Survey mentioned in a previous blog post, the LD-Frontend has been adapted to the ESA styleguide, and the HTML view of the data has been made more user-friendly (see screenshot below).

From ESA’s perspective, Les commented that for the vocabulary manager, edits to the frontend styles and templates are intuitive and can be tested in staging environments. He also stated that support is important for publishing, and that SWC was very responsive.

Example ESA linked data frontend

Of course we asked Les to give a preview of the next steps for ESA. He stated that they include language translation projects so that its vocabularies, especially Schools Online Thesaurus (ScOT), can be accessed by wider markets and by students of other languages. He also stated that PoolParty handles multi-lingual thesauri very well.

We here at SWC are glad to see PoolParty used in more and more applications and usage scenarios. We are looking forward to the next steps in this project and to seeing how the data offered by the ESA vocabulary site is used in other applications.

Thanks to Les Kneebone from ESA for his contribution to this blog post.

Andreas Blumauer

Introducing SKOSsy – generate thesauri on the fly!

Imagine you could generate any thesaurus you like, for nearly any knowledge domain you can think of, in fairly good quality! Sounds impossible? Reminds you of all the promises made by text mining software which generates “semantic nets” from scratch?

Let me introduce you to SKOSsy. I will explain what this web service can do for you:

SKOSsy generates SKOS-based thesauri in German or English for a domain you are interested in. Not any domain, but nearly any: SKOSsy extracts data from DBpedia, so it can cover anything that is in DBpedia. Thus, SKOSsy works well whenever a first seed thesaurus needs to be generated for a certain organisation or project. If you load the automatically generated thesaurus into an editor like PoolParty Thesaurus Manager (PPT), you can start to enrich the knowledge model with additional concepts, relations and links to other LOD sources. You don’t have to start your thesaurus project from scratch.

Let me give you an example: Imagine you are working for a company which is an international plant builder and you would like to index several thousands of documents the “semantic way”. You have to walk through the following steps:

  1. Identify proper categories in Wikipedia/DBpedia which describe best what your business or your domain is all about. Those categories should contain pages / resources which are related to the documents you would like to index. For example: http://dbpedia.org/resource/Category:Metalworking or http://dbpedia.org/resource/Category:Industrial_automation
  2. After you have selected proper categories, SKOSsy will traverse DBpedia for you, collect all resources together with their hierarchical and non-hierarchical relations, alternative labels, definitions and other properties, and put them together as a valid SKOS thesaurus; this step takes a couple of minutes. (Find the resulting vocabulary here)
  3. Load the resulting thesaurus into PPT, explore it, improve it and enrich it with additional facts.
  4. After you’re done, you can generate a tailor-made text extractor by using PoolParty Extractor (PPX), which is the second component of the PoolParty product family.
  5. With PPX and its extraction model especially curated for your special use case you can extract named entities from your documents automatically and index your documents in a meaningful way.
  6. After a few seconds your semantic search engine is ready to be used. PoolParty Semantic Search (PPS) which is the third PoolParty component will offer some nice facilities like categorized auto-complete, faceted search, content recommendation (similarity search) and smart search suggestions to ease your life as a knowledge worker.
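
Step 2 is essentially a graph traversal. The sketch below shows the general idea on a toy, in-memory stand-in for DBpedia. It is not SKOSsy’s actual algorithm, which is not described here: the category and resource names are made-up sample data, and the function name and depth limit are assumptions. In the real DBpedia data, resources are linked to categories via dcterms:subject and categories to their parent categories via skos:broader.

```python
from collections import deque

# Toy stand-in for DBpedia: all names below are made-up sample data.
SUBCATEGORIES = {
    "Category:Metalworking": ["Category:Welding", "Category:Casting"],
    "Category:Welding": [],
    "Category:Casting": [],
}
MEMBERS = {
    "Category:Metalworking": ["Forging"],
    "Category:Welding": ["Arc_welding", "Spot_welding"],
    "Category:Casting": ["Die_casting"],
}


def collect_thesaurus(seeds, max_depth=2):
    """Walk breadth-first from the seed categories.

    Returns (concepts, broader): a set of concept names and a set of
    (narrower, broader) pairs that could be emitted as skos:broader.
    """
    concepts, broader = set(), set()
    queue = deque((seed, 0) for seed in seeds)
    while queue:
        category, depth = queue.popleft()
        if category in concepts:
            continue  # already visited; guards against category cycles
        concepts.add(category)
        # Every page in the category becomes a concept below it.
        for resource in MEMBERS.get(category, []):
            concepts.add(resource)
            broader.add((resource, category))
        # Descend into subcategories until the depth limit is reached.
        if depth < max_depth:
            for sub in SUBCATEGORIES.get(category, []):
                broader.add((sub, category))
                queue.append((sub, depth + 1))
    return concepts, broader


if __name__ == "__main__":
    found, relations = collect_thesaurus(["Category:Metalworking"])
    for narrower, wider in sorted(relations):
        print(f"{narrower} skos:broader {wider}")
```

The depth limit matters in practice: Wikipedia’s category graph is large and loosely organised, so an unbounded walk from a broad seed category would quickly leave the domain of interest.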

We have constantly discussed the application of thesauri and other knowledge models to improve search over the last years. Many people understood straight away why thesaurus-based search is most often much better than search algorithms purely based on statistics. Of course, the main counterargument was always that the costs of establishing a “good-enough” thesaurus, let alone a “high-quality” one, are too high.

With SKOSsy in place those kinds of arguments become weaker and weaker. To sum up,

  • SKOSsy makes heavy use of Linked Data sources, especially DBpedia
  • SKOSsy can generate SKOS thesauri for virtually any domain within a few minutes
  • Such thesauri can be improved, curated and extended to one’s individual needs, but they usually serve as “good-enough” knowledge models for any semantic search application you like
  • SKOSsy-based semantic search usually outperforms search algorithms based on statistics, since the underlying thesauri contain high-quality information about relations, labels and disambiguation
  • SKOSsy works perfectly together with the PoolParty product family

If you are interested in the results produced by SKOSsy, just send us a short note about your domain or your project and we will send you an invitation to become a beta tester or prepare a demo for you.

Tassilo Pellegrini

Looking back at I-SEMANTICS 2011

For the 7th time, I-SEMANTICS, the International Conference on Semantic Systems, took place in Graz, presenting the latest research outcomes and industry-ready applications to the wider public. Co-located with I-KNOW, the 11th International Conference on Knowledge Technologies, the event proved once again that interest in semantic information processing is high and of increasing practical relevance.

Participants by Country

More than 70 scientific and 40 industry presentations provided a sound overview of current technological and organisational trends in various areas of semantic computing such as text mining, information retrieval, visual analytics, semantic content engineering, social semantic web and linked data. The last topic in particular appeared in many different contexts, showing that the linked data paradigm is gaining traction as a horizontal topic that crosses domains and communities.

Participants by Sector

One of the conference’s unique characteristics is the large number of attendees from industrial domains, who are looking for inspiration and solutions to practical problems on the one hand, and for ways to diversify their business on the other. In this respect, the applied scientific approach of I-SEMANTICS / I-KNOW has proven to be a valid way to scrutinize academic research for its reusability in industrial settings, transfer knowledge and skills between both communities and provide incentives for cooperative research and project engagement.

This cooperative spirit was also represented by the four keynote speakers, who took a deliberately practical approach to show how high-level research fertilizes organizational reflexivity and triggers change for sustainability on a societal, cultural and economic level. Daniel A. Keim, Professor at the Computer Science Department of the University of Konstanz, Germany, showed how visual analytics has a stake in solving organizational problems. Gloria Mark, Professor at the University of California, Irvine, USA, talked about the challenges that derive from informational multi-tasking. Stefan Rueger, Professor at the Knowledge Media Institute of The Open University, United Kingdom, gave a talk about “potential, automation and limits of knowledge discovery in the web”, and Christian Dirschl, Head of Content Strategy Department at Wolters Kluwer Germany, gave an insight into how one of the global players in legal publishing is utilizing linked data and semantic web technologies to prepare for the next step in web-based business diversification.

The next I-SEMANTICS will take place from September 5-7, 2012 in Graz again. Hope to see you there and enjoy the impressions …

www.flickr.com

Thomas Schandl

Interview on Enhancing Semantic Web applications with Linguistic Information

John McCrae (Uni Bielefeld), Elena Montiel-Ponsoda (Universidad Politécnica de Madrid) and Tobias Wunner (DERI Galway) will hold a tutorial at ESWC 2011 with the title “Enriching the Semantic Web with Linguistic Information”. We had a chance to talk to them beforehand:

Can you please tell us about the aims and purpose of your tutorial and the importance of incorporating linguistic information in the Semantic Web?

With the continuing growth of linked data and semantic technologies, the incorporation of linguistic descriptions into Semantic Web resources has become a challenging issue. The integration of linguistic information, especially on a multilingual level, could greatly benefit Natural Language Processing (NLP) applications. Furthermore, the continuing growth of ontologies for semantic modeling and the use of terminological resources to add human language descriptions has raised the issue of how to add linguistic information to ontologies and linked data vocabularies, and how to represent models of lexical and terminological information in a way which is compatible with Semantic Web standards. Prominent examples are multilingual language tags in RDF Schema and SKOS’s success in bringing terminological information to the Semantic Web.

In the tutorial we would like to discuss trends and novel models such as lemon – the lexicon model for ontologies – to show possible future directions. The tutorial is targeted at researchers and practitioners interested in learning how to enrich ontologies with linguistic information in one or several natural languages, and at NLP tool developers interested in understanding how Semantic Web resources can be leveraged for NLP. There will be two hands-on sessions in this tutorial.

Why did you choose to use PoolParty thesaurus management system in your tutorial?

To create terminology models on the web there are only a few tools available, and these are often very technical and not straightforward for non-experts to use. We found that PoolParty, in contrast to other SKOS editors, has an attractive and usable interface. In addition, the web-based interface was preferable because it did not require the participants to download software. The immediate publishing of linked data is also more compatible with linked data principles, and the tool has similarities to our own tools for working with lemon.

Thank you for this interview!