Thomas Thurner

SKOSsy-Lottery: Free Pass to Semantic Tech & Business Conference, Berlin

As the PoolParty team will be present at SemTechBiz Berlin 2012 (February 6-7), we want you to join us. This is why we have set up a little lottery to give away a full conference pass (€795) together with our unique PoolParty Cocktail Shaker as a set.

How to enter the SKOSsy-lottery:

  • Leave a comment on this post describing which type of thesaurus you are interested in. One comment per person.
  • All comments must be submitted before Jan 25, 2012.
  • The winners will be selected at random.

Together with our PoolParty Suite, we are ready to present SKOSsy at our booth in the SemTechBiz Berlin 2012 exhibition area. SKOSsy is a handy tool that generates SKOS-based seed thesauri in German or English by extracting data from DBpedia. See our finger exercise on a thesaurus describing the world of Alan Turing – done with SKOSsy.

Let us know which knowledge realm you are interested in and join the lottery now. Good luck, and see you in Berlin.

 

Helmut Nagy

The ESA vocabulary site – Making Publishing and Reusing Vocabularies Easier

Reviewing the interview we conducted with Les Kneebone (project manager of the vocabulary projects at Education Services Australia) in November 2010, we can see that ESA was one of the early adopters of SKOS as a standard for thesaurus development. Les said then: “We had already identified SKOS as an important standard for ScOT so it was natural to select PoolParty as our new thesaurus management tool”. Around a year later, ESA's vocabulary site went online with PoolParty as its basis.

We asked Les to comment on his statement from last year and he confirmed that SKOS continues to be central to the ESA vocabulary business model and that it has also been important for ESA that PoolParty has been flexible enough to support continued publication of non-RDF formats, especially IMS VDEX.

In the course of this project it became more and more obvious that SKOS can be used not merely as yet another format for publishing thesauri, but as a unified model for building thesauri in general. This approach made several improvements to ESA's vocabulary development model and maintenance process possible. All data is stored as RDF in a triple store, and SKOS and RDF are flexible formats that support interoperability and interchangeability of data. As a result, many manual transformations that were previously required are no longer needed, and all other systems using the vocabularies are fed dynamically by PoolParty, which offers the data in the formats they need (see image below).
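To illustrate the idea of one model feeding several output formats, here is a minimal Python sketch. It is not how PoolParty is implemented: the concept URIs under `http://example.org/vocab/`, the labels, and the two helper functions are hypothetical and only show how a single set of SKOS triples can be turned into both an RDF serialization and a flat, non-RDF term list.

```python
import json

SKOS = "http://www.w3.org/2004/02/skos/core#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# Hypothetical concept URIs; real ESA identifiers are not given in this post.
BASE = "http://example.org/vocab/"

# Each triple is (subject, predicate, object, object_is_literal).
triples = [
    (BASE + "algebra", RDF_TYPE, SKOS + "Concept", False),
    (BASE + "algebra", SKOS + "prefLabel", "Algebra", True),
    (BASE + "algebra", SKOS + "broader", BASE + "mathematics", False),
    (BASE + "mathematics", RDF_TYPE, SKOS + "Concept", False),
    (BASE + "mathematics", SKOS + "prefLabel", "Mathematics", True),
]


def to_ntriples(triples):
    """Serialize the triples as N-Triples, one statement per line.

    Labels are assumed to contain no characters that need escaping.
    """
    lines = []
    for s, p, o, is_literal in triples:
        obj = '"%s"@en' % o if is_literal else "<%s>" % o
        lines.append("<%s> <%s> %s ." % (s, p, obj))
    return "\n".join(lines)


def to_term_list(triples):
    """Derive a flat term list (label plus broader label) from the same triples."""
    labels = {s: o for s, p, o, _ in triples if p == SKOS + "prefLabel"}
    broader = {s: o for s, p, o, _ in triples if p == SKOS + "broader"}
    return [
        {"term": labels[s], "broader": labels.get(broader.get(s))}
        for s in sorted(labels)
    ]


print(to_ntriples(triples))
print(json.dumps(to_term_list(triples), indent=2))
```

Both outputs are derived from the same list of triples, so a change to a concept only has to be made once, which is the point of treating SKOS as the underlying model rather than as one export format among many.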

Changes in ESA’s vocabulary development model

Les states that while some manual processes still exist to support legacy systems, PoolParty ensures the integrity and richness of ESA data. Support and customizations for legacy systems can be achieved in the confidence that the linked-data capabilities are centrally managed and stored in the PoolParty triple store.

From the publishing perspective, the previous vocabulary publishing site has been replaced by the PoolParty Linked Data Frontend (LD-Frontend), which has been customized especially for this project to offer more flexibility in the display and layout of the data. Similar to the frontend for the Austrian Geological Survey mentioned in a previous blog post, the LD-Frontend has been adapted to the ESA style guide, and the display of the data in the HTML view of the frontend has been made more user-friendly (see screenshot below).

From ESA’s perspective, Les commented that for the vocabulary manager, edits to the frontend styles and templates are intuitive and can be tested in staging environments. He also stated that support is important for publishing, and that SWC was very responsive.

Example ESA linked data frontend

Of course we asked Les to give a preview of the next steps for ESA. He stated that they include language translation projects so that its vocabularies, especially Schools Online Thesaurus (ScOT), can be accessed by wider markets and by students of other languages. He also stated that PoolParty handles multi-lingual thesauri very well.

We here at SWC are glad to see PoolParty used in more and more applications and usage scenarios. We are looking forward to the next steps that will be taken in this project and to seeing how the data offered by the ESA vocabulary site is used in other applications.

Thanks to Les Kneebone from ESA for his contribution to this blog post.

Andreas Blumauer

rNews and its benefits for publishers

Last Wednesday at the Open House event of the Semantic Web Company in Vienna, Evan Sandhaus, Lead Semantic Architect at the NY Times, gave a comprehensive and entertaining introduction to rNews and its potential benefits for publishers.

Evan Sandhaus (from left to right) busy preparing his talk in the kitchen of SWC, together with Andreas Blumauer (SWC) and Leo Sauermann (Gnowsis). Mr. Sandhaus in action.

rNews is an RDFa vocabulary, which is basically a carefully selected subset of the very rich IPTC vocabulary plus some additional elements that came up during the standardization process. It is now available in version 1.0 and – according to Evan – actively supported by schema.org.

As shown above, the data model of rNews is really simple and centered around two classes: the NewsItem and the Concept. This deliberate simplicity is a major advancement compared to standards like NewsML (whose complexity probably prevented it from reaching critical uptake in the news industry). But due to the functional extensions attributed to RDFa, rNews might also be considered more complex than hNews, the microformat equivalent issued by the IPTC in 2009.
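To give a rough feeling for what such markup could look like, here is a small Python sketch that renders one news item with RDFa attributes. Treat it as an illustration under assumptions rather than a reference: the namespace URI and the property names (`headline`, `articleBody`, `about`, `name`) are written from memory and should be checked against the rNews 1.0 specification, and the function `news_item_rdfa` is purely hypothetical.

```python
from html import escape

# Assumed namespace and property names; verify against the rNews 1.0 specification.
RNEWS_NS = "http://iptc.org/std/rNews/2011-10-07#"


def news_item_rdfa(headline, body, concepts):
    """Return an HTML fragment with RDFa attributes for one news item.

    The outer element is typed as a NewsItem; every entry in `concepts`
    becomes a nested Concept linked via an (assumed) `about` relation.
    """
    concept_spans = "\n".join(
        '  <span rel="rnews:about"><span typeof="rnews:Concept" '
        'property="rnews:name">%s</span></span>' % escape(c)
        for c in concepts
    )
    return (
        '<div xmlns:rnews="%s" typeof="rnews:NewsItem">\n'
        '  <h1 property="rnews:headline">%s</h1>\n'
        '  <div property="rnews:articleBody">%s</div>\n'
        "%s\n"
        "</div>"
    ) % (RNEWS_NS, escape(headline), escape(body), concept_spans)


print(news_item_rdfa("Semantic Web & publishing", "Text of the article.", ["Vienna", "rNews"]))
```

The sketch mirrors the two-class model: one NewsItem that carries the visible content, and any number of Concepts that describe what the item is about.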

Evan mentioned three scenarios that might drive the uptake of rNews for the benefit of news publishers:

1) Better news search

rNews allows you to make explicit and differentiate the various elements of a document, such as title, author, text body and picture, thus giving the publisher better control over what to expose to indexers and web crawlers. This might not just improve the display of rich snippets in the search results of Google and other search engines, but also allow the automated population of faceted search and metadata-based similarity search.

2) Better ad placement

As rNews can be applied to any kind of news-relevant media irrespective of its format (graphics, audio, video, etc.), the metadata can be used to avoid “unfortunate juxtapositions” between editorial content and ads. Hence, media agencies could profit from this additional data by fuelling their matching algorithms and gaining better insight into the specific context of content items.

3) Better analytics

By improving the semantic granularity of a news item, this additional information can be used to carry web analytics beyond the page level and provide better insight into usage patterns. The additional data can be applied for visualization and exploration purposes, e.g. for search engine optimization, sentiment detection and many more.

This is just a small fraction of the things rNews could be used for. All in all it is exciting to see that the IPTC has finally started to provide publishers with a standard that is relatively easy to implement and helps them to overcome the obstacles of existing technologies without disrupting existing publishing workflows. In multi-sided markets like the news industry this might be a crucial success factor!

 

Andreas Blumauer

Controlled vocabularies: “Data integration is king”

Just recently a survey about “Controlled vocabularies” and their significance for enterprise information management was started. To date, 143 participants have responded and completed the survey at least partially. To give a first example of the findings, I would like to take a closer look at the question: What are the main application areas of controlled vocabularies from your perspective?

The intermediate result is a bit surprising: it's not “Semantic Search” or “Support of multilingual applications” that was considered to be the most important application. Instead, it turned out that “Data Integration” is king:



The bar graph shows the weighted value of each application candidate (a value of 1.0 would mean 100% agreement that this is an important application area of controlled vocabularies). Regarding the top candidate, “data integration”:

  • 57.4% said “very important”
  • 29.8% “relevant”
  • 7.4% “somewhat relevant”
  • 2.1% “not relevant”
  • 3.2% “Don't know”
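How such a distribution could be condensed into a single value between 0 and 1 is sketched below in Python. Please note that the weights are an assumption made for this illustration only (1 for “very important”, 2/3 for “relevant”, 1/3 for “somewhat relevant”, 0 otherwise). The weighting actually used for the chart is not described in this post, so the number printed here need not match the bar in the graph.

```python
# Assumed weights: the post does not state how the chart values were computed.
WEIGHTS = {
    "very important": 1.0,
    "relevant": 2 / 3,
    "somewhat relevant": 1 / 3,
    "not relevant": 0.0,
    "don't know": 0.0,
}


def weighted_score(shares):
    """Combine percentage shares per answer into one score between 0 and 1."""
    return sum(WEIGHTS[answer] * pct for answer, pct in shares.items()) / 100.0


# Shares for "data integration" as listed above (in percent).
data_integration = {
    "very important": 57.4,
    "relevant": 29.8,
    "somewhat relevant": 7.4,
    "not relevant": 2.1,
    "don't know": 3.2,
}

print(round(weighted_score(data_integration), 3))
```

With these weights, a candidate that every participant rated “very important” would score exactly 1.0, which is consistent with the reading of the scale given above.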

If you don't think this should be the final result, please help to get a better overview of what's going on in the controlled vocabulary community. The survey is open until May 18th, 2011 – all participants will gain access to a report with the results within the following month. The most interesting results will be made public on this blog.