Thomas Schandl

Linked data based thesaurus management in collaborative settings

The creation and management of controlled vocabularies in companies often takes place in a distributed manner. Different departments in different branch offices often prefer to create their own vocabularies rather than contribute to one large, central knowledge model.

How to model divergent views on one concept?

Such a central model is not only much harder to manage; there is also the general problem that different departments, such as marketing, quality assurance and R&D, will have divergent views on the model and its concepts. These different perspectives on one and the same concept are hard to unify in a single model.

Think of a company that sells mobile phones and wants to create a model of its line of products. It wants to utilize this model in the context of its online shop as well as in the context of its user support forum. While the structure of the model (i.e. the relationships between the products) might be very similar or the same in both contexts, there will be differences in which properties of the products are actually relevant in the respective contexts.

In the marketing department's model there might be a concept for a “Phantastax StamiMaxx” cell phone with the definition “The StamiMaxx has a powerful battery and is great for professionals who travel a lot”. Marketing might relate it to the manufacturer “ACME Corporation” and to several concepts representing features like “Android OS”, “Multi-touch touchscreen”, etc.
The very same phone has different properties that are interesting from the quality assurance department’s perspective. The QA team might call it by a more specific name like “Phantastax i3000 StamiMaxx S”, give it a different definition like “3G cell phone implementing the new WTF3000 protocol, …” and relate it to concepts representing known problems and their solutions.

Now the company faces the task of integrating these different models, since it is not desirable to maintain a bunch of isolated models within one company.

Support of collaborative work on distributed models

To support this kind of collaborative work on distributed knowledge models, we would like to link the concepts of the models, just as we link documents on the World Wide Web. Fortunately, the Simple Knowledge Organization System (SKOS) offers mapping properties that can be used to define relationships between concepts from different knowledge models.

For example, if we want to say that the concept “Phantastax StamiMaxx” in the product line thesaurus refers to the same real-world entity as the concept “Phantastax i3000 StamiMaxx S” in the quality assurance thesaurus, we can use skos:exactMatch. If we want to express that the concepts are merely similar, skos:closeMatch can be used instead.
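As a minimal sketch, the two statements above can be written as RDF triples in N-Triples syntax. The `http://example.com/...` namespaces and the concept identifiers are hypothetical placeholders; only the SKOS namespace and the mapping property names come from the SKOS standard.

```python
# Minimal sketch: write SKOS mapping links between two thesauri as N-Triples.
# The example.com namespaces and concept IDs are hypothetical placeholders.
SKOS = "http://www.w3.org/2004/02/skos/core#"
PRODUCTS = "http://example.com/products/"  # hypothetical product line thesaurus
QA = "http://example.com/qa/"              # hypothetical quality assurance thesaurus

MAPPING_PROPERTIES = {"exactMatch", "closeMatch", "broadMatch", "narrowMatch", "relatedMatch"}


def mapping_triple(subject: str, relation: str, obj: str) -> str:
    """Return one N-Triples statement linking two concept URIs via a SKOS mapping property."""
    if relation not in MAPPING_PROPERTIES:
        raise ValueError(f"not a SKOS mapping property: {relation}")
    return f"<{subject}> <{SKOS}{relation}> <{obj}> ."


# Both concepts denote the same phone, so the link is an exact match.
same_phone = mapping_triple(PRODUCTS + "PhantastaxStamiMaxx", "exactMatch",
                            QA + "PhantastaxI3000StamiMaxxS")
# A hypothetical concept that is only similar gets a close match.
similar_phone = mapping_triple(PRODUCTS + "PhantastaxStamiMaxx", "closeMatch",
                               QA + "PhantastaxI3000")

print(same_phone)
print(similar_phone)
```

Plain string formatting keeps the sketch dependency-free; an RDF library would additionally take care of escaping and serialization formats such as Turtle.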

The other SKOS mapping properties express hierarchical (skos:broadMatch, skos:narrowMatch) or associative (skos:relatedMatch) mapping relations between concepts from different concept schemes. With them we can say, for instance, that my “Samsung Galaxy” concept has a skos:broadMatch “Smartphone” in the product line vocabulary and a skos:relatedMatch “ACME Corporation” in a controlled vocabulary about tech companies.
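The SKOS Reference also gives these mappings some logical properties: skos:exactMatch, skos:closeMatch and skos:relatedMatch are symmetric, skos:broadMatch and skos:narrowMatch are inverses of each other, and skos:exactMatch is additionally transitive. A sketch of what those rules imply for links like the ones above could look as follows; the prefixed concept names (`mine:`, `products:`, `tech:`, `qa:`, `support:`) are hypothetical, and the `support:` link exists only to demonstrate transitivity.

```python
# Sketch: compute the triples implied by SKOS mapping-property semantics.
# Concept names are hypothetical prefixed labels, not real URIs.
Triple = tuple[str, str, str]

SYMMETRIC = {"exactMatch", "closeMatch", "relatedMatch"}
INVERSE = {"broadMatch": "narrowMatch", "narrowMatch": "broadMatch"}


def infer(triples: set[Triple]) -> set[Triple]:
    """Add symmetric, inverse and exactMatch-transitive triples until nothing changes."""
    result = set(triples)
    while True:
        new: set[Triple] = set()
        for s, p, o in result:
            if p in SYMMETRIC:
                new.add((o, p, s))
            if p in INVERSE:
                new.add((o, INVERSE[p], s))
            if p == "exactMatch":
                # Only exactMatch is transitive; closeMatch deliberately is not.
                for s2, p2, o2 in result:
                    if p2 == "exactMatch" and s2 == o and o2 != s:
                        new.add((s, "exactMatch", o2))
        if new <= result:  # fixpoint reached
            return result
        result |= new


links = {
    ("mine:SamsungGalaxy", "broadMatch", "products:Smartphone"),
    ("mine:SamsungGalaxy", "relatedMatch", "tech:ACMECorporation"),
    ("products:PhantastaxStamiMaxx", "exactMatch", "qa:PhantastaxI3000StamiMaxxS"),
    ("qa:PhantastaxI3000StamiMaxxS", "exactMatch", "support:StamiMaxxS"),
}
closure = infer(links)
for triple in sorted(closure - links):
    print(triple)
```

Keeping closeMatch non-transitive matters in practice: chaining “merely similar” links would quickly drift to concepts that are no longer similar at all.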

Modularisation of knowledge models

In this way, SKOS thesaurus management systems like PoolParty make it possible to modularise knowledge models, to represent concepts in their different contexts and consequently to enable collaborative work on those models. The marketing guy can work on his model, with concept properties focused on sales, without disrupting the quality assurance expert's work on her own thesaurus. Later, one or both of them can create the skos:exactMatch link between the concepts that denote the same thing, as seen in the “Exact Matching Concepts” box in the screenshot of PoolParty below.
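Because the link only connects the two concepts, neither thesaurus has to give up its own view. A minimal sketch, assuming each thesaurus is held as a plain dictionary keyed by hypothetical concept IDs (`mkt:StamiMaxx`, `qa:i3000`), shows how an application could assemble both perspectives for each exactMatch pair without overwriting either one.

```python
# Sketch: assemble the departments' views of concepts linked by skos:exactMatch,
# keeping each thesaurus' properties separate. Concept IDs are hypothetical.
marketing = {
    "mkt:StamiMaxx": {
        "prefLabel": "Phantastax StamiMaxx",
        "definition": "The StamiMaxx has a powerful battery and is great "
                      "for professionals who travel a lot",
        "related": ["ACME Corporation", "Android OS", "Multi-touch touchscreen"],
    }
}
quality_assurance = {
    "qa:i3000": {
        "prefLabel": "Phantastax i3000 StamiMaxx S",
        "definition": "3G cell phone implementing the new WTF3000 protocol, …",
    }
}
exact_matches = [("mkt:StamiMaxx", "qa:i3000")]


def integrated_view(links, *thesauri):
    """Map each exactMatch pair to {concept_id: properties} from whichever thesaurus holds it."""
    views = {}
    for pair in links:
        view = {}
        for concept in pair:
            for thesaurus in thesauri:
                if concept in thesaurus:
                    view[concept] = thesaurus[concept]
        views[pair] = view
    return views


views = integrated_view(exact_matches, marketing, quality_assurance)
for concept, props in views[("mkt:StamiMaxx", "qa:i3000")].items():
    print(concept, "->", props["prefLabel"])
```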

Enrich your knowledge: Get connected with the LOD Cloud

Going a step further, the models could be connected to external knowledge, e.g. a source from the Linked Open Data (LOD) cloud. Once we establish links to LOD hubs like DBpedia, we can import additional information about our concepts, or use those links to establish whether similar concepts from different models really refer to the same real-world resource.
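One way to use such hub links, sketched here under the assumption that each in-house concept already carries a link to a DBpedia resource, is to propose exactMatch candidates whenever concepts from different schemes point to the same hub resource. The concept IDs and their links are hypothetical, and the output is meant as suggestions for a human editor, not as automatic assertions.

```python
# Sketch: propose skos:exactMatch candidates from shared links to an LOD hub.
# Concept IDs (scheme prefix before ':') and their DBpedia links are hypothetical.
from collections import defaultdict
from itertools import combinations

lod_links = {
    "products:Smartphone": "http://dbpedia.org/resource/Smartphone",
    "support:SmartPhones": "http://dbpedia.org/resource/Smartphone",
    "products:Tablet": "http://dbpedia.org/resource/Tablet_computer",
}


def candidate_exact_matches(links: dict[str, str]) -> list[tuple[str, str, str]]:
    by_hub = defaultdict(list)
    for concept, hub_uri in links.items():
        by_hub[hub_uri].append(concept)
    candidates = []
    for hub_uri, concepts in by_hub.items():
        for a, b in combinations(sorted(concepts), 2):
            # Only suggest links across different schemes, not within one scheme.
            if a.split(":")[0] != b.split(":")[0]:
                candidates.append((a, b, hub_uri))
    return candidates


candidates = candidate_exact_matches(lod_links)
for a, b, hub in candidates:
    print(f"{a} exactMatch? {b} (both link to {hub})")
```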

Andreas Blumauer

Les Kneebone: “Semantic web technologies are one solution to linking education data in Australia”

Les Kneebone is Project Manager at Education Services Australia Ltd.
Among other projects, he is responsible for the Schools Online Thesaurus (ScOT).

The PoolParty team asked Les a couple of questions about thesaurus management, linked data and the semantic web. Here is a short summary of the interview:

Why did you choose thesauri to organize your information? What kind of problems are you able to solve with this approach?

A thesaurus approach was chosen rather than a subject headings approach because we assumed (and continue to assume) that post-coordinate indexing will drive vocabulary-assisted discovery.
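In post-coordinate indexing, a resource is indexed with individual concepts, and those concepts are combined only at search time (typically with a Boolean AND), instead of being pre-combined into compound subject headings at indexing time. A minimal sketch with hypothetical resource IDs and concept labels (not actual ScOT data):

```python
# Sketch of post-coordinate retrieval: resources are indexed with single concepts,
# and concepts are combined only when searching. IDs and labels are hypothetical.
from collections import defaultdict

index: dict[str, set[str]] = defaultdict(set)  # concept -> resource ids


def add_resource(resource_id: str, concepts: list[str]) -> None:
    for concept in concepts:
        index[concept].add(resource_id)


def search(*concepts: str) -> set[str]:
    """Return resources indexed with ALL given concepts (Boolean AND)."""
    if not concepts:
        return set()
    # index.get avoids creating empty entries for unknown concepts
    return set.intersection(*(index.get(c, set()) for c in concepts))


add_resource("r1", ["Water cycle", "Climate"])
add_resource("r2", ["Water cycle", "Geography"])
add_resource("r3", ["Climate", "Geography"])

print(search("Water cycle", "Climate"))
```

A pre-coordinate system would only find r1 if a compound heading such as “Water cycle, Climate” had been assigned in advance; here any combination of concepts can be searched.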

What role do SKOS and/or Linked Data play in achieving your goals?

ScOT concepts are now published as URIs. This approach solves the problem of different ScOT versions in disparate systems.

What are the most important values you generate for your stakeholders? What kind of applications can be built or have been built on top of your thesauri?

The Achievement Standards Network (ASN) provides a model for profiling curriculum statements and linking those statements to education resources using various RDF vocabularies. By profiling curriculum statements and linking them to learning resources, more precise matching is achieved.

What are the most important arguments to use Semantic Web standards and linked data, especially in education?

The Australian education sector is characterized by many disparate systems in different education jurisdictions. Semantic web technologies are one solution to linking education data in Australia.

Why did you choose PoolParty to manage your thesauri?

We had already identified SKOS as an important standard for ScOT, so it was natural to select PoolParty as our new thesaurus management tool.

What are your future plans and next steps? How do you manage to get your thesauri used, how are you going to build an “eco-system” around your work? (Do you plan to publish ScOT on the LOD cloud? Under which licenses?)

Our vocabularies are currently for non-commercial use and we don’t anticipate any change to the license at this stage. The ScOT license requires attribution, permits derivatives that must be shared, and is for non-commercial use.

Read the full interview here.

Tassilo Pellegrini

Marrying ARML with Linked Data

First of all, since ARML (Augmented Reality Markup Language) is based on KML, and KML uses “Placemarks” (which all have corresponding identifiers) as basic entities, these placemarks could be identified quite easily via URIs within the W3C Resource Description Framework (RDF).

Another basic concept of KML is the “Point”. Geo RDF provides properties like “geo:long” and “geo:lat”, which express the longitude and latitude of a POI and thus make it possible to uniquely identify points on a map using RDF standards.

Thus, it is possible to map the geo conventions of ARML to those of the Semantic Web, which are mainly based on Geo RDF.
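A minimal sketch of that mapping, assuming KML 2.2 input and a hypothetical `http://example.com/placemark/` namespace for minting placemark URIs. It reads each Placemark, uses its `id` for the URI and writes geo:lat/geo:long triples from the W3C Basic Geo vocabulary; the coordinates are approximate and only illustrative. Note that KML lists coordinates with longitude first, then latitude.

```python
# Sketch: convert KML Placemarks into N-Triples using the W3C Basic Geo vocabulary.
# BASE is a hypothetical namespace; literals are not escaped (fine for simple names).
import xml.etree.ElementTree as ET

KML = "http://www.opengis.net/kml/2.2"
NS = {"kml": KML}
GEO = "http://www.w3.org/2003/01/geo/wgs84_pos#"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"
BASE = "http://example.com/placemark/"  # hypothetical URI namespace for placemarks

kml_doc = """<kml xmlns="http://www.opengis.net/kml/2.2">
  <Placemark id="salzburg-cathedral">
    <name>Salzburg Cathedral</name>
    <Point><coordinates>13.0466,47.7979,0</coordinates></Point>
  </Placemark>
</kml>"""


def placemarks_to_ntriples(kml_text: str) -> list[str]:
    root = ET.fromstring(kml_text)
    triples = []
    for placemark in root.iter(f"{{{KML}}}Placemark"):
        coords = placemark.findtext("kml:Point/kml:coordinates", namespaces=NS)
        if placemark.get("id") is None or coords is None:
            continue  # no identifier to mint a URI from, or no point geometry
        uri = f"<{BASE}{placemark.get('id')}>"
        # KML order is longitude,latitude[,altitude]
        lon, lat = coords.strip().split(",")[:2]
        triples.append(f'{uri} <{GEO}lat> "{lat}" .')
        triples.append(f'{uri} <{GEO}long> "{lon}" .')
        name = placemark.findtext("kml:name", namespaces=NS)
        if name:
            triples.append(f'{uri} <{RDFS_LABEL}> "{name}" .')
    return triples


triples = placemarks_to_ntriples(kml_doc)
print("\n".join(triples))
```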

As soon as a placemark has received a URI, it is also possible to expose it as linked data and interlink it with repositories like GeoNames, DBpedia or LinkedGeoData (which is based on OpenStreetMap) to generate Linked Geodata.

ARML makes it possible to relate a “Provider” to a “Placemark”. Thus, it is also possible to use a URI to describe a provider and link it to a placemark using the triple structure inherent to RDF.

OpenARML/Wikitude uses tags to describe things. These tags are currently represented as literals (strings) separated by commas, which poses the obstacle that they can hardly be processed by machines. With RDF, each tag would be assigned a URI, changing it from a literal into a resource, which can then be represented in SKOS/RDF, another Semantic Web specification of the W3C.
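A sketch of that conversion, assuming a hypothetical `http://example.com/tags/` namespace, a simple slug rule for minting URIs, English labels, and dcterms:subject as the property linking a placemark to its tag concepts (one reasonable choice, not something ARML prescribes):

```python
# Sketch: turn a comma-separated tag string into SKOS concepts with URIs.
# TAG_BASE, the slug rule, the @en language tag and dcterms:subject are assumptions.
import re

SKOS = "http://www.w3.org/2004/02/skos/core#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
DCT_SUBJECT = "http://purl.org/dc/terms/subject"
TAG_BASE = "http://example.com/tags/"  # hypothetical tag namespace


def slug(label: str) -> str:
    """Derive a URI-safe local name from a tag label."""
    return re.sub(r"[^a-z0-9]+", "-", label.strip().lower()).strip("-")


def tags_to_skos(tag_string: str, placemark_uri: str) -> list[str]:
    triples = []
    for label in (t.strip() for t in tag_string.split(",")):
        if not label:
            continue  # skip empty entries such as "a,,b"
        concept = f"<{TAG_BASE}{slug(label)}>"
        triples.append(f"{concept} <{RDF_TYPE}> <{SKOS}Concept> .")
        triples.append(f'{concept} <{SKOS}prefLabel> "{label}"@en .')
        # The placemark now points to a resource instead of carrying a string.
        triples.append(f"<{placemark_uri}> <{DCT_SUBJECT}> {concept} .")
    return triples


tag_triples = tags_to_skos("Church, Baroque architecture,,sightseeing",
                           "http://example.com/placemark/salzburg-cathedral")
print("\n".join(tag_triples))
```

Once tags are concepts, they can be organized further with skos:broader or mapped to other vocabularies, which a comma-separated string cannot support.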

ARML/Wikitude also offers attributes to describe POIs, such as phone, URL, email and attachment, all of which could be represented using de facto Semantic Web standards like FOAF or SIOC.
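As a sketch, assuming the POI attributes arrive as a plain dictionary with the hypothetical keys `phone`, `email` and `url`, they could be mapped to foaf:phone, foaf:mbox and foaf:homepage. FOAF models phone numbers and mailboxes as `tel:` and `mailto:` URIs rather than strings, which is what makes them linkable. The attachment attribute is left out here.

```python
# Sketch: map POI attributes to FOAF properties as N-Triples.
# The attribute keys and the example POI values are hypothetical.
FOAF = "http://xmlns.com/foaf/0.1/"


def poi_to_foaf(poi_uri: str, attrs: dict[str, str]) -> list[str]:
    triples = []
    if "phone" in attrs:
        # FOAF identifies phones with tel: URIs; drop spaces from the number
        tel = "tel:" + attrs["phone"].replace(" ", "")
        triples.append(f"<{poi_uri}> <{FOAF}phone> <{tel}> .")
    if "email" in attrs:
        triples.append(f"<{poi_uri}> <{FOAF}mbox> <mailto:{attrs['email']}> .")
    if "url" in attrs:
        triples.append(f"<{poi_uri}> <{FOAF}homepage> <{attrs['url']}> .")
    return triples


poi = {"phone": "+43 662 000000", "email": "info@example.com", "url": "http://example.com/"}
foaf_triples = poi_to_foaf("http://example.com/placemark/salzburg-cathedral", poi)
print("\n".join(foaf_triples))
```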

Summing up, ARML/Wikitude documents could be transformed relatively easily into valid RDF / Linked Data graphs. This could help to enrich AR applications with data from the LOD (Linked Open Data) cloud. Conversely, data generated by ARML applications could be exposed as Linked Data.

As a pragmatic approach, we recommend deriving the corresponding DBpedia URIs from existing Wikipedia URLs, which would directly turn ARML placemarks into resources that are part of the existing LOD cloud.
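A minimal sketch of that derivation, assuming English Wikipedia URLs and the `http://dbpedia.org/resource/` namespace; other language editions have their own DBpedia chapters and are not handled here, nor is normalization of percent-encoded titles.

```python
# Sketch: derive a DBpedia resource URI from an English Wikipedia article URL.
# Assumes English Wikipedia and the http://dbpedia.org/resource/ namespace.
from urllib.parse import urlparse

WIKIPEDIA_HOSTS = {"en.wikipedia.org", "en.m.wikipedia.org"}
DBPEDIA_RESOURCE = "http://dbpedia.org/resource/"


def wikipedia_to_dbpedia(url: str) -> str:
    parsed = urlparse(url)
    if parsed.netloc not in WIKIPEDIA_HOSTS or not parsed.path.startswith("/wiki/"):
        raise ValueError(f"not an English Wikipedia article URL: {url}")
    # The underscore-separated article title becomes the resource name;
    # query strings and #fragments are ignored because only the path is used.
    title = parsed.path[len("/wiki/"):]
    return DBPEDIA_RESOURCE + title


print(wikipedia_to_dbpedia("https://en.wikipedia.org/wiki/Salzburg_Cathedral"))
```

Titles that are only Wikipedia redirects may still need to be resolved to their target resource before being stored on a placemark.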

As soon as placemarks are mapped to DBpedia, additional metadata can be added to them. This opens up totally new perspectives on content enrichment in ARML environments and enables new and exciting AR applications.

We want to thank Martin Lechner from Salzburg-based Mobilizy for the fruitful discussions we have had on this topic so far.

BTW: check out the paper by Reynolds et al. (2010) from DERI, “Exploiting Linked Open Data for Mobile Augmented Reality”.

Andreas Blumauer

Why SKOS thesauri matter – the next generation of semantic technologies

As a matter of fact, a lot of “semantic technologies” still around do nothing more than pure statistical analysis of text. Sure, this is better than simple full-text search, but there are still plenty of opportunities to improve search, especially for more sophisticated applications like “similarity search”: the search for similar documents that enables cross-reading or recommendation systems.

Providers of first-generation semantic technologies calculate rather basic “semantic networks” by co-occurrence analysis, which sometimes leads to disappointing results. Bearing in mind that Google just bought a company (“Google buys Metaweb”) which has been working on one of the largest knowledge bases in the world, we could assume that some of the last miles towards a semantic search engine can be covered by applying thesauri or other structured knowledge bases.
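To make the limitation concrete, here is a minimal sketch of such a co-occurrence network: two terms are linked when they appear in the same document, weighted by how often that happens. The documents are invented examples and the tokenization is deliberately naive (lowercase words, a tiny stopword list).

```python
# Sketch of a "first generation" co-occurrence network over term pairs.
# Documents are invented examples; tokenization is deliberately naive.
import re
from collections import Counter
from itertools import combinations

STOPWORDS = {"the", "a", "of", "and", "in", "to", "on"}


def cooccurrence(docs: list[str]) -> Counter:
    pairs: Counter = Counter()
    for doc in docs:
        terms = sorted({w for w in re.findall(r"[a-z]+", doc.lower()) if w not in STOPWORDS})
        pairs.update(combinations(terms, 2))  # each unordered pair once per document
    return pairs


docs = [
    "Inflation and interest rates",
    "Interest rates and bank lending",
    "Inflation in the euro area",
]
net = cooccurrence(docs)
print(net.most_common(3))
```

The network only knows which words appeared together; “bank” and “inflation” stay unconnected here although an economist would consider them closely related, which is exactly the kind of knowledge a thesaurus supplies.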

The PoolParty team recently developed a demo application that shows how thesauri improve search results on top of second-generation semantic technologies. With PoolParty, SKOS-based controlled vocabularies can be managed and enriched with linked data. The PoolParty Tag & Content Recommender analyzes virtually any text or website and recommends corresponding tags: concepts from (in this case) the STW (Standard Thesaurus für Wirtschaft), DBpedia, and related articles from Wikipedia.

The STW, developed by the German National Library of Economics (ZBW), provides vocabulary on any economic subject: about 6,000 standardized subject headings and about 18,000 entry terms to support individual keywords.
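Entry terms are what let a recommender map varying wordings onto one standardized heading. A minimal sketch of that idea, assuming a tiny hypothetical mini-thesaurus (not actual STW data) and simple whole-phrase matching; it is not how the PoolParty recommender is implemented.

```python
# Sketch: recommend thesaurus concepts for a text by matching headings and their
# entry terms. The mini-thesaurus is a hypothetical excerpt, not actual STW data.
import re
from collections import Counter

THESAURUS = {
    "Inflation": ["price increase", "rising prices"],
    "Monetary policy": ["central bank policy"],
    "Unemployment": ["joblessness", "jobless rate"],
}


def recommend(text: str, thesaurus: dict[str, list[str]] = THESAURUS) -> list[tuple[str, int]]:
    text = text.lower()
    scores: Counter = Counter()
    for concept, entry_terms in thesaurus.items():
        for label in [concept, *entry_terms]:
            # whole-phrase match on word boundaries
            hits = len(re.findall(r"\b" + re.escape(label.lower()) + r"\b", text))
            if hits:
                scores[concept] += hits
    return scores.most_common()


suggestions = recommend("Rising prices worry central bank policy makers; inflation is up.")
print(suggestions)
```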

This background knowledge is used in this demo app to improve the search for similar documents dramatically:

Similarity between two documents can be calculated not only on a key-phrase basis but also on a rather conceptual basis. Even if two documents do not have one single word or phrase in common they can be identified as “similar documents”.

This works because thousands of important relations between economic subjects are represented in the domain-specific thesaurus. Thus, in this special case the best results are achieved with documents from economics (for instance from EconStor), but of course other recommender systems can use thesauri from other domains instead of the STW.
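The idea of conceptual similarity can be sketched as follows, assuming each document has already been reduced to thesaurus concepts: the concept sets are expanded with related concepts from the thesaurus and then compared with Jaccard similarity. The relations are hypothetical, not taken from STW, and Jaccard is just one simple choice of measure.

```python
# Sketch: conceptual document similarity via thesaurus expansion + Jaccard.
# The concept relations are hypothetical examples, not STW data.
RELATIONS = {
    "Inflation": {"Price level", "Monetary policy"},
    "Central bank": {"Monetary policy", "Banking"},
    "Interest rate": {"Monetary policy"},
}


def expand(concepts: set[str]) -> set[str]:
    """Add the thesaurus neighbours of every concept in the set."""
    expanded = set(concepts)
    for concept in concepts:
        expanded |= RELATIONS.get(concept, set())
    return expanded


def jaccard(a: set[str], b: set[str]) -> float:
    union = a | b
    return len(a & b) / len(union) if union else 0.0


doc_a = {"Inflation"}
doc_b = {"Central bank", "Interest rate"}
plain = jaccard(doc_a, doc_b)                        # no concept in common
conceptual = jaccard(expand(doc_a), expand(doc_b))   # linked via "Monetary policy"
print(plain, round(conceptual, 3))
```

Without the thesaurus the two documents look unrelated; after expansion they share “Monetary policy”, which is the effect described above.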

Nevertheless, this approach can also be improved, and that development is underway: SKOS thesauri enriched with Linked Data do an even better job. This kind of third-generation semantic technology is currently being developed by the LASSO and LOD2 projects, two innovative projects in the area of linked data and the semantic web.