Thomas Schandl

Linked data based thesaurus management in collaborative settings

The creation and management of controlled vocabularies in companies often takes place in a distributed manner. Different departments in different branch offices often prefer to create their own vocabularies rather than maintain one large central knowledge model to which everyone contributes.

How to model divergent views on one concept?

Such a central model is not only much harder to manage; there is also the general problem that different departments such as marketing, quality assurance and R&D will have divergent views on the model and its concepts. These different perspectives on one and the same concept are hard to unify in a single model.

Think of a company that sells mobile phones and wants to create a model of its line of products. It wants to utilize this model in the context of its online shop as well as in the context of its user support forum. While the structure of the model (i.e. the relationships between the products) might be very similar or the same in both contexts, there will be differences in which properties of the products are actually relevant in the respective contexts.

In the model of the marketing department there might be a concept for a “Phantastax StamiMaxx” cell phone with the definition “The StamiMaxx has a powerful battery and is great for professionals who travel a lot”. They might relate it to the manufacturer “ACME Corporation” and to several concepts representing different features like “Android OS”, “Multi-touch touchscreen”, etc.
The very same phone has different properties that are interesting from the Quality Assurance department’s perspective. They might call it by a more specific name like “Phantastax i3000 StamiMaxx S”, have a different definition for it like “3G cell phone implementing the new WTF3000 protocol, …” and relate it to concepts representing known problems and their solutions.

Now the company faces the task of integrating these different models, as it is not desirable to use a bunch of isolated models within one company.

Support of collaborative work on distributed models

To support this kind of collaborative work on distributed knowledge models, we would like to link the concepts of the models, just as we link documents in the World Wide Web. Fortunately, the Simple Knowledge Organisation System (SKOS) offers mapping properties that can be used to define relationships between concepts from different knowledge models.

For example, if we want to say that the concept “Phantastax StamiMaxx” in the product line thesaurus refers to the same real world entity as the concept “Phantastax i3000 StamiMaxx S” in the Quality Assurance thesaurus, we can use skos:exactMatch to express that. If we want to express that the concepts are merely similar, skos:closeMatch could be used.

The other SKOS mapping properties express a hierarchical (narrowMatch, broadMatch) or an associative (relatedMatch) mapping relation between concepts from different concept schemes. With those we can say that my Samsung Galaxy concept has a skos:broadMatch “Smartphone” in the product line vocabulary and a skos:relatedMatch “ACME Corporation” in a controlled vocabulary about Tech companies.
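
The mapping statements described above can be sketched in a few lines of Python. This is only an illustration, not how PoolParty stores its data: the `http://example.org/...` URIs, the concept identifiers and the helper names `mapping` and `to_ntriples` are assumptions made for this example. Only the SKOS namespace and the five mapping property names come from the SKOS vocabulary itself.

```python
# Namespace of the SKOS vocabulary.
SKOS = "http://www.w3.org/2004/02/skos/core#"

# Hypothetical base URIs for the three vocabularies (assumptions for this sketch).
PRODUCTS = "http://example.org/products/"
QA = "http://example.org/qa/"
COMPANIES = "http://example.org/companies/"

# The five SKOS mapping properties mentioned in the text.
MAPPING_PROPERTIES = {
    "exactMatch", "closeMatch", "broadMatch", "narrowMatch", "relatedMatch",
}


def mapping(subject, prop, obj):
    """Return one (subject, predicate, object) triple for a SKOS mapping."""
    if prop not in MAPPING_PROPERTIES:
        raise ValueError(f"not a SKOS mapping property: {prop}")
    return (subject, SKOS + prop, obj)


def to_ntriples(triples):
    """Serialise triples whose terms are all URIs as N-Triples lines."""
    return "\n".join(f"<{s}> <{p}> <{o}> ." for s, p, o in triples)


triples = [
    # Marketing and QA describe the same real world phone.
    mapping(PRODUCTS + "PhantastaxStamiMaxx", "exactMatch",
            QA + "PhantastaxI3000StamiMaxxS"),
    # The QA concept has a broader concept in the product line vocabulary.
    mapping(QA + "PhantastaxI3000StamiMaxxS", "broadMatch",
            PRODUCTS + "Smartphone"),
    # ... and an associative link into a vocabulary about tech companies.
    mapping(QA + "PhantastaxI3000StamiMaxxS", "relatedMatch",
            COMPANIES + "ACMECorporation"),
]

print(to_ntriples(triples))
```

Each department keeps its own concept URI; only the mapping triples tie the vocabularies together, so either side can be edited without touching the other.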

Modularisation of knowledge models

In this way SKOS thesaurus management systems like PoolParty make it possible to modularise knowledge models, represent concepts in their different contexts and consequently enable collaborative work on those models: the marketing guy can work on his model with the concept properties focused on sales without disrupting the work of the quality assurance expert on her own thesaurus. Later, one or both of them can create the skos:exactMatch link between the concepts that are the same, as seen in the “Exact Matching Concepts” box in the PoolParty screenshot below.

Enrich your knowledge: Get connected with the LOD Cloud

Going a step further, the models could be connected to external knowledge, e.g. a source from the Linked Open Data (LOD) Cloud. Once we establish links to LOD hubs like DBpedia, we can import additional information for our concepts or use these links to establish whether similar concepts from different models really refer to the same real world resource.
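
As a rough sketch of that last idea: if concepts from different models carry skos:exactMatch links to the same external resource, they are candidates for being linked to each other (SKOS defines exactMatch as symmetric and transitive). The URIs below are hypothetical; in particular, the DBpedia resource for the fictional phone does not exist and only stands in for a real LOD entry. The function name `concepts_sharing_a_target` is likewise an assumption for this example.

```python
from collections import defaultdict

SKOS_EXACT_MATCH = "http://www.w3.org/2004/02/skos/core#exactMatch"

# Hypothetical links from two internal thesauri to an external LOD hub.
links = [
    ("http://example.org/products/PhantastaxStamiMaxx", SKOS_EXACT_MATCH,
     "http://dbpedia.org/resource/Phantastax_StamiMaxx"),
    ("http://example.org/qa/PhantastaxI3000StamiMaxxS", SKOS_EXACT_MATCH,
     "http://dbpedia.org/resource/Phantastax_StamiMaxx"),
    ("http://example.org/products/Smartphone", SKOS_EXACT_MATCH,
     "http://dbpedia.org/resource/Smartphone"),
]


def concepts_sharing_a_target(triples):
    """Group concepts by the external resource they exactly match.

    Only external resources matched by more than one concept are returned,
    since those are the cases where internal concepts may coincide.
    """
    by_target = defaultdict(set)
    for subject, predicate, obj in triples:
        if predicate == SKOS_EXACT_MATCH:
            by_target[obj].add(subject)
    return {
        target: sorted(concepts)
        for target, concepts in by_target.items()
        if len(concepts) > 1
    }


for target, concepts in concepts_sharing_a_target(links).items():
    print(target, "is matched by:", ", ".join(concepts))
```

The output is best treated as a list of suggestions for a human editor to review, not as an automatic merge.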

Thomas Schandl

Report of Linked Data Camp Vienna

Earlier this month the first ever Linked Data Camp took place in Vienna at the Quartier für Digitale Kunst. This two-day event attracted about 35 people to discuss and jointly work on novel applications for the Web of Data.

The first day started off with a keynote by Richard Cyganiak from DERI Galway’s Linked Data Research Center. He talked about the technical challenges that have to be overcome to allow for more Linked Data applications over heterogeneous RDF data. These challenges revolve around discovery of and access to Linked Data, identifier and schema reconciliation, data fusion, quality assessment, aggregation, analytics and mining.
As Richard pointed out, the good news is “that linked data makes it possible that different people do the different steps, e.g., the publisher can help doing the identifier reconciliation by publishing sameAs links, and 3rd parties can help with access by providing a single SPARQL store over multiple related but independent datasets.” Check out the transcript or slides for Richard’s talk.

Linked Data Camp Vienna Working Groups

After this keynote, participants presented their topics of interest in Lightning Talks, and working groups formed. Some of their outcomes can be found online:
One group worked on the topic of “Dataset Dynamics”. As data in Linked Data sets changes, clients that depend on this data need to be notified about the changes. You can read about their proposed solutions here.
Another group had a go at “Expert search and profiling on the Semantic Web”; their discussions are summarized in this blog post.
Andreas Langegger demonstrated XLWrap, which is a versatile RDF wrapper for spreadsheets. A lot of feature requests from participants came up (see here), so he and others worked on this handy application.

On day 2 Leigh Dodds from Talis talked about “Rights Statements on the Web of Data” (slides and transcript). Leigh raised awareness of the issue that the majority of LOD sources do not have licensing information associated with their data. This of course conflicts with the proposed openness of Linked “Open” Data, as it is doubtful whether these sources can be used for commercial purposes.

The organizers from the universities of Linz and Vienna, Joanneum Research, Gnowsis, DERI Galway, STI Innsbruck and the Semantic Web Company would like to thank all participants for making the camp a success! As with VoCamps anyone can organize a Linked Data Camp, so we hope for more camps in 2010!

Tassilo Pellegrini

Linked Data Flows: A new picture to illustrate the “openness” we mean

(Original post taken from “About the Social Semantic Web”)

A lot of activities around Linking Open Data (“LOD”) and the associated data sets, which are nicely visualised as a “cloud”, have been going on for quite a while now. It is exciting to see how the rather academic “Semantic Web” and all the work associated with this disruptive technology can now be transformed into real business use cases.

What I have observed in the last few months, especially in business communities, is the following:

  • “Linked Data” sounds interesting for the business people because the phrase creates a lot of associations in a second or two; also the database crowd seems to be attracted by this web-based approach of data integration
  • “Web of Data” is somehow misleading because many people think that this will be a new web which replaces something else. Same story with the “Semantic Web”
  • “Linking Open Data” sounds dangerous and not trustworthy to many companies

For insiders it is clear that the “openness” of data, especially in commercial settings, can be controlled and in many cases has to be controlled, i.e. by defining the right licensing models. But here we are still at the beginning, as a workshop at ISWC 2009 has illustrated.

Anyway, looking at the characteristics of Linked Data Flows, they can be one-way or mutual. In some cases data from companies will be put into the cloud and can be opened up for many purposes; in other use cases it will stay inside the company’s boundaries. In other scenarios only (open) data from the web will be consumed and linked with corporate data, but no data will be exposed to the world (except the fact that data was consumed by an entity).

And of course: On many other occasions datasets and repositories will be opened up partly depending on the CCs (or similar, not yet defined attributes) and the underlying privacy regulations one wants to use.

This makes clear that LOD / Linking Open Data is just one detail of a bigger picture. Since companies (and governments) play a crucial role in developing the whole infrastructure, we need to draw a new picture that illustrates the various Linked Data Flows in a better way:

[Image: linkeddataworld]

Concluding from this, the best thing would be to talk about Linked Data in general and just refer to Linking Open Data in the right context. Although they may know better, business people still associate the term “open” with “free” and “dubious provenance”. And given the fact that hardly anybody has given hard evidence on the ROI of open business models, the “open argument” counts for little in a time of decreasing economic prosperity.

So what would be critical to get the Linked Data thing running is to provide the corresponding business and licensing models for your Linked Data strategy. But this includes having a good understanding of the assets you want to capitalize on. Given the fact that metadata assets are still a novel and vastly unexplored business field, which so far lacks a regulated supply and demand structure, there are still lots of structural obstacles that hinder the uptake of Linked Data. Providing more of the same in a laissez faire mode, as TimBL criticized at this year’s Web 2.0 Summit, might be inspiring for the in-crowd, but it might not be sufficient to build a linked data business.

Thomas Thurner

55 people enjoyed the first Semantic Web Meetup in Vienna

Yesterday’s first “Semantic Web Meetup” attracted 55 attendees, who joined in for presenting, talking and socialising. Approximately one year after the series of Semantic Web Meetups started in NYC, there is now also a vital community gathering in Vienna. Besides an inside view of brand-new ideas and developments from Austria’s semweb labs in presentations and lightning talks, Steve Sandhouse of the New York Times joined in via web meeting to give an insight into the NY Times’s Semantic Web efforts, which, as he explained, have a history of about 100 years by now.

In conclusion: a good start for the First Vienna Semantic Web Meetup, which may have paved the way for a next meeting in the very near future. In the meantime, here are some pictures of the venue to amuse those who were there and to inspire new people to join: www.meetup.com
