Semantic Web Company

The Semantic Puzzle

Open World Assumptions


Linking Open Data to Thesaurus Management

February 16, 2010 By: Tassilo Pellegrini Category: Corporate Semantic Web, Knowledge Management, Linked Data & Open Data, Search Engines, Semantic Web Applications, Software Development

The Vienna-based company punkt. netServices is just about to release a demo version of their PoolParty service, a SKOS-based thesaurus management tool with linked data capabilities. I had the chance to pre-read a white paper and test their service. Here is a brief overview. You can also try a demo.

Purpose

PoolParty was conceived to facilitate various applications such as

  • Semantic search engines
  • Recommender systems (similarity search)
  • Corporate bookmarking
  • Annotation- & tag recommender systems
  • Autocomplete services and faceted browsing.

These use cases can be achieved either by using PoolParty stand-alone or by integrating it with existing enterprise search engines, document management systems or enterprise wikis.

Thesaurus Management

PoolParty aims to be easy to use for people without a strong Semantic Web background or special technical skills. The GUI is entirely web-based and uses AJAX, so the user can, e.g., quickly merge two concepts via drag & drop. An overview of the thesaurus can be gained from a tree view or a graph view of the concepts.
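
The post does not say how a merge is recorded in the data. One common SKOS convention, assumed here purely for illustration, is to keep the target concept, demote the merged concept's labels to alternative labels of the target, and redirect every relation that pointed at the merged concept. A minimal Python sketch over a hypothetical in-memory model (the `Concept` class and `merge_concepts` function are not PoolParty's API):

```python
from dataclasses import dataclass, field


@dataclass
class Concept:
    # Hypothetical in-memory model of a SKOS concept.
    uri: str
    pref_label: str
    alt_labels: set = field(default_factory=set)
    broader: set = field(default_factory=set)   # URIs of broader concepts
    related: set = field(default_factory=set)   # URIs of related concepts


def merge_concepts(thesaurus: dict, target_uri: str, source_uri: str) -> None:
    """Merge the concept at source_uri into target_uri, modifying thesaurus in place."""
    target = thesaurus[target_uri]
    source = thesaurus.pop(source_uri)
    # The source's labels survive as alternative labels of the target.
    target.alt_labels |= {source.pref_label} | source.alt_labels
    target.alt_labels.discard(target.pref_label)
    # Take over the source's own relations, avoiding self-references.
    target.broader |= source.broader - {target_uri}
    target.related |= source.related - {target_uri}
    # Redirect references held by all remaining concepts.
    for concept in thesaurus.values():
        for links in (concept.broader, concept.related):
            if source_uri in links:
                links.discard(source_uri)
                if concept.uri != target_uri:
                    links.add(target_uri)
```

Dropping the source URI entirely is a simplification; a published Linked Data thesaurus would more likely keep it resolvable and point it to the surviving concept.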


PoolParty also helps to add concepts to a thesaurus semi-automatically: it can analyse documents (e.g. web pages or PDF files) relevant to a thesaurus’ domain in order to glean candidate terms. This is done with the key-phrase extractor KEA. The user can select among the extracted terms, which thereby become “free concepts” that can later be integrated into the thesaurus, turning them into “approved concepts”.
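
PoolParty uses KEA for this step, and KEA's own ranking is not reproduced here. The much simpler sketch below only illustrates the workflow: count word n-grams in a document and propose as “free concepts” the frequent ones that are not already labels in the thesaurus. The stop-word list, the n-gram length and the frequency threshold are arbitrary assumptions.

```python
import re
from collections import Counter

# Deliberately tiny stop-word list, chosen only for the illustration.
STOP_WORDS = {"a", "an", "and", "the", "of", "in", "on", "for", "to", "is", "are", "with"}


def candidate_terms(text, known_labels, max_n=2, min_count=2):
    """Return (term, count) pairs for frequent n-grams not yet in the thesaurus."""
    tokens = re.findall(r"[a-z][a-z\-]+", text.lower())
    counts = Counter()
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            gram = tokens[i:i + n]
            # Skip n-grams that start or end with a stop word.
            if gram[0] in STOP_WORDS or gram[-1] in STOP_WORDS:
                continue
            counts[" ".join(gram)] += 1
    known = {label.lower() for label in known_labels}
    return [(term, c) for term, c in counts.most_common()
            if c >= min_count and term not in known]
```

The user would then review this list and accept some of the terms, which is the point where the candidates become free concepts.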

Documents can be searched in various ways: by keyword search in the full text, by their tags, or by semantic search and similarity search. Semantic search takes into account not only a concept’s preferred label but also its synonyms and the labels of its related concepts. The user can manually remove query terms used in semantic search, and the boost values for the various relations it considers can be adjusted. The recommendation mechanism for calculating document similarity works in the same way.
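
To make the query expansion concrete, here is a sketch that turns a concept's preferred label, synonyms and related-concept labels into a boosted query in Lucene's classic query syntax (`"phrase"^boost`, combined with `OR`). The relation names, the default boost values and the dictionary shape of `concept` are assumptions for illustration, not PoolParty's configuration.

```python
# Hypothetical default boosts per relation; the post only says they are adjustable.
DEFAULT_BOOSTS = {"prefLabel": 1.0, "altLabel": 0.8, "related": 0.4}


def expand_query(concept, boosts=DEFAULT_BOOSTS, removed=()):
    """Build a boosted Lucene query from a concept's labels.

    `concept` is a dict with "prefLabel" (str), "altLabel" (list of str) and
    "related" (list of str: preferred labels of related concepts).
    `removed` lists labels the user has chosen to drop from the query.
    """
    weighted = [(concept["prefLabel"], boosts["prefLabel"])]
    weighted += [(label, boosts["altLabel"]) for label in concept.get("altLabel", [])]
    weighted += [(label, boosts["related"]) for label in concept.get("related", [])]
    clauses, seen = [], set()
    for label, boost in weighted:
        if label in removed or label in seen:
            continue
        seen.add(label)
        # Escape backslashes and quotes inside the quoted phrase.
        escaped = label.replace("\\", "\\\\").replace('"', '\\"')
        clauses.append(f'"{escaped}"^{boost}')
    return " OR ".join(clauses)


concept = {"prefLabel": "Mojito", "altLabel": ["Mohito"], "related": ["Rum", "Mint"]}
print(expand_query(concept, removed={"Mint"}))
# "Mojito"^1.0 OR "Mohito"^0.8 OR "Rum"^0.4
```

Raising the `related` boost would let related concepts weigh more in the ranking, which corresponds to the adjustable boost values mentioned above.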

PoolParty by default also publishes a Semantic Wiki version of its thesauri, which provides an alternative way to browse and edit concepts. Through this feature anyone can get read access to a thesaurus and, optionally, also edit, add or delete labels of concepts. Search and autocomplete functions are available here as well. The Wiki’s XHTML source is also enriched with RDFa, thereby exposing all RDF metadata associated with a concept so that it can be picked up by RDF search engines and crawlers. (See two examples: Cocktail thesaurus, Standard Thesaurus for Economics.)
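
As an illustration of what “enriched with RDFa” means, this sketch renders a concept's labels as an XHTML fragment carrying RDFa 1.1 attributes (`prefix`, `about`, `typeof`, `property`) with SKOS terms, so that an RDFa-aware crawler can extract the same triples that sit behind the page. The markup layout and the function name `concept_to_rdfa` are assumptions; the post does not show PoolParty's wiki templates.

```python
from html import escape

SKOS = "http://www.w3.org/2004/02/skos/core#"


def concept_to_rdfa(uri, pref_label, alt_labels=(), lang="en"):
    """Render a SKOS concept as an XHTML fragment annotated with RDFa 1.1."""
    parts = [
        f'<div prefix="skos: {SKOS}" about="{escape(uri)}" typeof="skos:Concept">',
        f'  <h1 property="skos:prefLabel" xml:lang="{lang}">{escape(pref_label)}</h1>',
    ]
    if alt_labels:
        parts.append("  <ul>")
        for label in alt_labels:
            parts.append(f'    <li property="skos:altLabel" xml:lang="{lang}">{escape(label)}</li>')
        parts.append("  </ul>")
    parts.append("</div>")
    return "\n".join(parts)


print(concept_to_rdfa("http://example.org/cocktails/mojito", "Mojito", ["Mohito"]))
```

A crawler reading this fragment would obtain a `skos:Concept` with one preferred and one alternative label, both tagged as English.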

PoolParty also supports the import of thesauri in SKOS (including several consistency checks) or Zthes format. These functionalities can also be consumed as stand-alone web services via PoolParty SKOS Services. Additionally, lists of concepts and their labels can be imported via CSV files.
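
The post does not specify the CSV layout. Assuming, hypothetically, one row per concept with the columns `uri`, `prefLabel` and a `|`-separated `altLabels`, a CSV import into SKOS could look like the following sketch, which writes Turtle using only the standard library.

```python
import csv
import io


def turtle_literal(text, lang):
    # Escape backslashes, quotes and line breaks for a Turtle string literal.
    escaped = (text.replace("\\", "\\\\").replace('"', '\\"')
                   .replace("\n", "\\n").replace("\r", "\\r"))
    return f'"{escaped}"@{lang}'


def csv_to_skos(csv_text, lang="en"):
    """Convert rows of uri,prefLabel,altLabels (|-separated) into SKOS Turtle."""
    lines = ["@prefix skos: <http://www.w3.org/2004/02/skos/core#> .", ""]
    for row in csv.DictReader(io.StringIO(csv_text)):
        props = [f"skos:prefLabel {turtle_literal(row['prefLabel'].strip(), lang)}"]
        alt_column = row.get("altLabels") or ""
        for alt in filter(None, (a.strip() for a in alt_column.split("|"))):
            props.append(f"skos:altLabel {turtle_literal(alt, lang)}")
        body = " ;\n    ".join(props)
        lines.append(f"<{row['uri'].strip()}> a skos:Concept ;\n    {body} .")
    return "\n".join(lines)


data = "uri,prefLabel,altLabels\nhttp://example.org/c/1,Mojito,Mohito\nhttp://example.org/c/2,Daiquiri,\n"
print(csv_to_skos(data))
```

The sketch does not validate URIs or detect duplicate labels; a real import would presumably run the same consistency checks as the SKOS import.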

Linked (Open) Data

PoolParty not only publishes its thesauri as Linked Open Data (in addition to a SPARQL endpoint), but it also consumes LOD in order to expand thesauri with information from LOD sources.

Concepts in the thesaurus can be linked to DBpedia, for example, via a service like Georgi Kobilarov’s DBpedia lookup service, which takes the label of a concept and returns possible matching candidates. The system suggests relevant resources from DBpedia, and the user can select the one that matches the concept in their thesaurus, thereby creating a skos:exactMatch relation between the concept URI in PoolParty and the DBpedia URI. The same approach can be used to link to other SKOS thesauri available as Linked Data.
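
The linking step can be sketched as follows. The lookup service is not called here; its response is simulated by a hypothetical list of `(label, uri)` candidates. The code ranks the candidates by string similarity (using `difflib.SequenceMatcher`, an arbitrary choice, not what the lookup service does) and, once the user has picked one, emits the `skos:exactMatch` statement in N-Triples form. The thesaurus URI `http://example.org/thesaurus/vienna` is made up.

```python
from difflib import SequenceMatcher

EXACT_MATCH = "http://www.w3.org/2004/02/skos/core#exactMatch"


def rank_candidates(label, candidates):
    """Sort (candidate_label, uri) pairs by similarity to the concept label."""
    def score(candidate):
        return SequenceMatcher(None, label.lower(), candidate[0].lower()).ratio()
    return sorted(candidates, key=score, reverse=True)


def exact_match_triple(concept_uri, selected_uri):
    """N-Triples statement linking a thesaurus concept to the chosen resource."""
    return f"<{concept_uri}> <{EXACT_MATCH}> <{selected_uri}> ."


candidates = [
    ("Vienna Philharmonic", "http://dbpedia.org/resource/Vienna_Philharmonic"),
    ("Vienna", "http://dbpedia.org/resource/Vienna"),
]
ranked = rank_candidates("Vienna", candidates)
# In PoolParty the user makes the final choice; here we simply take the top hit.
print(exact_match_triple("http://example.org/thesaurus/vienna", ranked[0][1]))
```

Ranking only orders the suggestions; keeping the human confirmation step is what prevents wrong `exactMatch` links.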


Other triples can also be retrieved from the target data source: e.g. the DBpedia abstract can become a skos:definition, and geographical coordinates can be imported and used to display the location of a concept on a map, where appropriate. The DBpedia category information may also be used to retrieve additional concepts of that category as siblings of the concept in focus, in order to populate the thesaurus.
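
Here is a sketch of that mapping step. It assumes the DBpedia description has already been fetched into a plain dictionary keyed by property URI. DBpedia does use `dbo:abstract` and the W3C WGS84 `geo:lat`/`geo:long` properties, but the input shape (language-tagged literals as `(text, lang)` tuples) and the function name `enrich_concept` are assumptions. Only the abstract in the requested language is copied, and coordinates are kept only when both are present.

```python
DBO_ABSTRACT = "http://dbpedia.org/ontology/abstract"
GEO_LAT = "http://www.w3.org/2003/01/geo/wgs84_pos#lat"
GEO_LONG = "http://www.w3.org/2003/01/geo/wgs84_pos#long"


def enrich_concept(dbpedia_props, lang="en"):
    """Pick values from a DBpedia resource to attach to a thesaurus concept.

    `dbpedia_props` maps property URIs to lists of values; language-tagged
    literals are (text, lang) tuples, numeric values are numbers.
    Returns a dict with an optional "definition" and an optional "location".
    """
    result = {}
    for value in dbpedia_props.get(DBO_ABSTRACT, []):
        if isinstance(value, tuple) and value[1] == lang:
            result["definition"] = value[0]  # to be stored as skos:definition
            break
    lats = dbpedia_props.get(GEO_LAT, [])
    longs = dbpedia_props.get(GEO_LONG, [])
    if lats and longs:
        # (latitude, longitude) pair for the map view.
        result["location"] = (float(lats[0]), float(longs[0]))
    return result
```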

PoolParty is capable of importing a SKOS thesaurus from a Linked Data server and may also receive updates to thesauri imported this way. This feature has been implemented in the course of the KiWi project, funded by the European Commission. KiWi also contains SKOS thesauri and exposes them as LOD. Both systems can read a thesaurus via the other’s LOD interfaces and may write it to their own store. This is facilitated by special Linked Data URIs that return, e.g., all the top concepts of a thesaurus with pointers to the URIs of their narrower concepts, which allow other systems to retrieve a complete thesaurus by iteratively dereferencing concept URIs.
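
Retrieving a complete thesaurus this way amounts to a breadth-first traversal that starts at the top concepts and follows the pointers to narrower concepts. In the sketch below, the hypothetical `dereference` callable stands in for an HTTP GET on a concept URI (requesting RDF) and returns that concept's narrower URIs. It is replaced by a dictionary lookup so that the example runs offline.

```python
from collections import deque


def harvest_thesaurus(top_concepts, dereference):
    """Collect every concept reachable from the top concepts via skos:narrower.

    `dereference(uri)` is a stand-in for fetching a concept's Linked Data
    description; it must return an iterable of narrower concept URIs.
    Returns concept URIs in breadth-first order.
    """
    seen = set(top_concepts)
    order = []
    queue = deque(top_concepts)
    while queue:
        uri = queue.popleft()
        order.append(uri)
        for narrower in dereference(uri):
            # Guards against revisiting concepts (poly-hierarchies, cycles).
            if narrower not in seen:
                seen.add(narrower)
                queue.append(narrower)
    return order


# Offline stand-in for the remote server: concept URI -> narrower URIs.
graph = {
    "ex:drinks": ["ex:cocktails", "ex:beer"],
    "ex:cocktails": ["ex:mojito", "ex:shandy"],
    "ex:beer": ["ex:shandy"],
}
print(harvest_thesaurus(["ex:drinks"], lambda uri: graph.get(uri, [])))
```

The `seen` set matters because SKOS allows a concept to have several broader concepts, so the same URI can be reached along more than one path.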

Additionally, KiWi and PoolParty publish lists of concepts created, modified, merged or deleted within user-specified time frames. With this information, each system can learn about updates that were made in the other system to one of its thesauri. It can then compare the versions of the concepts in both stores and may write the corresponding updates to its own store.
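
The consuming side of this exchange can be sketched as follows, under assumed data shapes: the remote change list is a list of `(concept_uri, change_type, timestamp)` entries, and the local store keeps a last-modified timestamp per concept. A change is taken over only if the remote version is newer and a local policy hook accepts it, so each system decides for itself what it writes to its own store. The names `plan_updates` and `accept` are hypothetical, and conflict handling beyond “newer version wins” is left out.

```python
from datetime import datetime


def plan_updates(change_list, local_modified, accept=lambda uri, change: True):
    """Decide which remote changes to apply to the local store.

    change_list: iterable of (uri, change_type, remote_timestamp), where
        change_type is "created", "modified", "merged" or "deleted".
    local_modified: dict mapping concept URI -> local last-modified datetime.
    accept: local policy hook; returning False rejects a change.
    Returns (uri, change_type) pairs to apply, oldest change first.
    """
    plan = []
    for uri, change, remote_ts in sorted(change_list, key=lambda c: c[2]):
        local_ts = local_modified.get(uri)
        if local_ts is not None and local_ts >= remote_ts:
            continue  # the local copy is at least as recent; keep it
        if accept(uri, change):
            plan.append((uri, change))
    return plan


changes = [
    ("ex:mojito", "modified", datetime(2010, 2, 10)),
    ("ex:rum", "deleted", datetime(2010, 2, 12)),
    ("ex:gin", "created", datetime(2010, 2, 11)),
]
local = {"ex:mojito": datetime(2010, 2, 14), "ex:rum": datetime(2010, 1, 5)}
print(plan_updates(changes, local))
# [('ex:gin', 'created'), ('ex:rum', 'deleted')]
```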

This means each system decides autonomously which data it accepts, and there is no risk of one system pushing data into an external store that might lead to inconsistencies there. Data transfer and communication are achieved using REST/HTTP; no other protocols or middleware are necessary. Nor is rights management needed for each external system, which would otherwise have to be configured separately for each source.

Technology

The software is written in Java and uses the SAIL API, so it can be used with various triple stores. The thesaurus management itself (viewing, creating and editing SKOS concepts and their relationships) is done in an AJAX front end based on the Yahoo User Interface library (YUI). Labels can alternatively be edited in a wiki-style HTML front end. For key-phrase extraction from documents, PoolParty uses a modified version of the KEA 5 API, extended to use controlled vocabularies stored in a SAIL repository (this module is available under the GNU GPL). The analysed documents can be stored and indexed in Lucene/Solr or any other (enterprise) search system, along with the extracted and semantically related concepts.
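
The post does not show the KEA extension itself. As a rough illustration of what matching documents against a controlled vocabulary involves, this sketch looks up every word n-gram of a text in an index built from the preferred and alternative labels of SKOS concepts, so that synonyms resolve to the same concept URI. The index layout and function names are assumptions, and the sketch does no stemming or statistical ranking.

```python
import re


def build_label_index(concepts):
    """Map lower-cased labels to concept URIs.

    `concepts` maps a concept URI to a dict with "prefLabel" (str) and an
    optional "altLabel" (list of str).
    """
    index = {}
    for uri, labels in concepts.items():
        for label in [labels["prefLabel"], *labels.get("altLabel", [])]:
            index[label.lower()] = uri
    return index


def match_concepts(text, index, max_n=3):
    """Return the set of concept URIs whose labels occur in the text."""
    tokens = re.findall(r"\w+", text.lower())
    found = set()
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            uri = index.get(" ".join(tokens[i:i + n]))
            if uri is not None:
                found.add(uri)
    return found


concepts = {
    "ex:lod": {"prefLabel": "Linked Open Data", "altLabel": ["LOD"]},
    "ex:skos": {"prefLabel": "SKOS"},
}
index = build_label_index(concepts)
print(match_concepts("PoolParty publishes SKOS thesauri as LOD.", index))
```

Because tokens are split on word characters, labels containing punctuation (e.g. hyphens) would need the same normalisation applied when the index is built.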


Jordan S. Hatcher: “Why we can’t use the same open licensing approach for databases as we do for content and software.”

January 14, 2010 By: Tassilo Pellegrini Category: Linked Data & Open Data, Miscellaneous, Politics

Jordan S. Hatcher is, among other things, a lawyer, academic, and entrepreneur working on intellectual property and Internet law issues in the UK and worldwide. He is heavily involved in the Open Data Commons initiative. Last month he gave me an interview on IPR issues associated with data licensing. His brief answer to the question of why data needs a separate licensing framework:

The answer to me is that database and data are different.  They’re different legally and different practically in what consumers and producers of open data want to do with it.  They’re also different in what the future looks like in terms of things like linked data.

Read the details in the full interview.


I-SEMANTICS 2010: Call for Papers

January 04, 2010 By: Tassilo Pellegrini Category: Calls & Competitions, Conferences & Events, Linked Data & Open Data

From September 1 – 3, 2010, I-SEMANTICS, the 6th International Conference on Semantic Systems, will take place in Graz, Austria. This year’s focus is “Towards a Web of Linked Data”. As a conference aiming to bring together science and industry, I-SEMANTICS encourages both scientific (research/application) and industrial contributions.

Additionally I-SEMANTICS will host the 2nd Pragmatic Web Track and the 3rd Triplification Challenge.

The combined CfP for I-SEMANTICS, Pragmatic Web Track & Triplification Challenge is available here.


Report of Linked Data Camp Vienna

December 15, 2009 By: Thomas Schandl Category: Conferences & Events, Linked Data & Open Data

Earlier this month the first ever Linked Data Camp took place in Vienna at the Quartier für Digitale Kunst. This two day event attracted about 35 people to discuss and to jointly work on novel applications for the Web of Data.

The first day started off with a keynote by Richard Cyganiak from DERI Galway’s Linked Data Research Center. He talked about the technical challenges that have to be overcome to allow for more Linked Data applications over heterogeneous RDF data. These challenges revolve around discovery of and access to Linked Data, identifier and schema reconciliation, data fusion, quality assessment, aggregation, analytics and mining.
As Richard pointed out, the good news is “that linked data makes it possible that different people do the different steps, e.g., the publisher can help doing the identifier reconciliation by publishing sameAs links, and 3rd parties can help with access by providing a single SPARQL store over multiple related but independent datasets.” Check out the transcript or slides for Richard’s talk.
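
Published sameAs links let a consumer reconcile identifiers: all URIs connected by `owl:sameAs`, directly or transitively, denote the same thing. A standard way to compute these equivalence classes is a union-find (disjoint-set) structure. The sketch below works over a hypothetical list of `(uri, uri)` pairs; it illustrates the idea from the talk and is not code from it. The `example.org` URIs are made up.

```python
def sameas_clusters(links):
    """Group URIs into equivalence classes from (uri_a, uri_b) owl:sameAs pairs."""
    parent = {}

    def find(uri):
        parent.setdefault(uri, uri)
        # Path halving keeps the trees shallow.
        while parent[uri] != uri:
            parent[uri] = parent[parent[uri]]
            uri = parent[uri]
        return uri

    for a, b in links:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    clusters = {}
    for uri in parent:
        clusters.setdefault(find(uri), set()).add(uri)
    return list(clusters.values())


links = [
    ("http://dbpedia.org/resource/Vienna", "http://example.org/geo/vienna"),
    ("http://example.org/city/wien", "http://dbpedia.org/resource/Vienna"),
    ("http://example.org/a", "http://example.org/b"),
]
for cluster in sameas_clusters(links):
    print(sorted(cluster))
```

Note that owl:sameAs is symmetric and transitive, which is exactly what the union of sets models; links that are published carelessly therefore merge more than intended, one reason why quality assessment appears on the list of challenges.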

Linked Data Camp Vienna Working Groups

After this keynote, participants presented their topics of interest in lightning talks and formed working groups. Some of their outcomes can be found online:
  • One group worked on the topic of “Dataset Dynamics”. As the data in Linked Data sets changes, clients that depend on that data need to be notified about these changes. You can read about their proposed solutions here.
  • Another group had a go at “Expert search and profiling on the Semantic Web”; their discussions are summarized in this blog post.
  • Andreas Langegger demonstrated XLWrap, a versatile RDF wrapper for spreadsheets. Participants came up with a lot of feature requests (see here), so he and others worked on this handy application.

On day 2, Leigh Dodds from Talis talked about “Rights Statements on the Web of Data” (slides and transcript). Leigh raised awareness of the issue that the majority of LOD sources do not have licensing information associated with their data. This of course conflicts with the proposed openness of Linked “Open” Data, as it is doubtful whether these sources can be used for commercial purposes.

The organizers from the universities of Linz and Vienna, Joanneum Research, Gnowsis, DERI Galway, STI Innsbruck and the Semantic Web Company would like to thank all participants for making the camp a success! As with VoCamps anyone can organize a Linked Data Camp, so we hope for more camps in 2010!


George Anadiotis: “Linked Data brings value by offering an alternative approach to lightweight data integration and mashups.”

December 10, 2009 By: Tassilo Pellegrini Category: Linked Data & Open Data, Mashups & Web services, Semantic Web Applications, Software Development, Tools & Software, Vocabularies & Languages

George Anadiotis is an expert on artificial intelligence with academic roots at the Vrije Universiteit, Amsterdam. In February 2009 he took up the position of R&D Director at the Greek technology company IMC. I met him in September at I-SEMANTICS 2009, where he and his team contributed to the Triplification Challenge. In their paper Linked Data for the Masses they pondered the pragmatic value of Linked Data from an inbound and outbound perspective. In his words:

We started experimenting with the technical infrastructure needed and created some proof-of-concept applications. Part of this work was enabling Linked Data access for the front-end infrastructure we used, Liferay portal. We decided on the appropriate vocabularies for the type of content we wanted to publish (FOAF, SIOC and MOAT mainly), delved on the internals of Liferay and used D2R to map its relational database to the vocabularies of choice, also using techniques to improve performance as much as possible. Since Liferay itself is also based on the notion of communities, we thought our work would be more widely applicable and useful, so we chose to submit it for review at the Triplification Challenge and make it available to the community as open source software. Our applications have gradually matured and are about to be deployed in our commercial projects, while at the same time we are now making the Liferay Linked Data Module available as a Sourceforge project and we are working with Liferay management in order to disseminate this effort to the community and also include it in a future release of the software.

Read the full interview here.


Linked Data Flows: A new picture to illustrate the “openness” we mean

October 28, 2009 By: Tassilo Pellegrini Category: Corporate Semantic Web, Linked Data & Open Data

(Original post taken from “About the Social Semantic Web“)

A lot of activities around Linking Open Data (“LOD”) and the associated data sets, which are nicely visualised as a “cloud”, have been going on for quite a while now. It is exciting to see how the rather academic “Semantic Web” and all the work associated with this disruptive technology can now be transformed into real business use cases.

What I have observed in the last few months, especially in business communities, is the following:

  • “Linked Data” sounds interesting for the business people because the phrase creates a lot of associations in a second or two; also the database crowd seems to be attracted by this web-based approach of data integration
  • “Web of Data” is somewhat misleading because many people think that this will be a new web which replaces something else. The same goes for the “Semantic Web”
  • “Linking Open Data” sounds dangerous and not trustworthy to many companies

For insiders it is clear that the “openness” of data, especially in commercial settings, can be controlled and in many cases has to be controlled, e.g. by defining the right licensing models. But here we are still at the beginning, as a workshop at ISWC 2009 has illustrated.

Linked Data Flows can be one-way or mutual. In some cases data from companies will be put into the cloud and opened up for many purposes; in other use cases it will stay inside corporate boundaries. In yet other scenarios only (open) data from the web will be consumed and linked with corporate data, but no data will be exposed to the world (except the fact that data was consumed by an entity).

And of course, on many other occasions datasets and repositories will be partly opened up, depending on the Creative Commons licenses (or similar, not yet defined attributes) and the underlying privacy regulations one wants to apply.

This makes clear that LOD / Linking Open Data is just one detail of a bigger picture. Since companies (and governments) play a crucial role in developing the whole infrastructure, we need to draw a new picture that better illustrates the various Linked Data Flows:

[Figure: Linked Data Flows]

The conclusion is that it would be best to talk about Linked Data in general and to refer to Linking Open Data only in the right context. Despite knowing better, business people still associate the term “open” with “free” and “dubious provenance”. And since hardly anybody has provided hard evidence of the ROI of open business models, the “open argument” counts for little in a time of decreasing economic prosperity.

So what is critical to get Linked Data up and running is to provide the corresponding business and licensing models for your Linked Data strategy. This includes having a good understanding of the assets you want to capitalize on. Given that metadata assets are still a novel and vastly unexplored business field, which so far lacks a regulated supply and demand structure, there are still lots of structural obstacles that hinder the uptake of Linked Data. Providing more of the same in laissez-faire mode, as TimBL criticized at this year’s Web 2.0 Summit, might be inspiring for the in-crowd, but it might not be sufficient to build a Linked Data business.


Topic Maps and the Semantic Web

October 16, 2009 By: Tassilo Pellegrini Category: Conferences & Events, Miscellaneous, Tools & Software

From November 11 – 13, 2009, the relationship between Topic Maps and the Semantic Web will be one of the big issues at the 5th International Conference on Topic Maps, taking place in Leipzig, Germany. When asked about this relationship, conference organizer Lutz Maicher says:

With the vision of the web of data Topic Maps and the Semantic Web move closer over time. Anywhere URIs represent subjects, structured statements are gathered around them. In this context I see subj3ct.com as an interesting venture. This recently launched service provides URIs for 15 million subjects to be used in structured data. Naturally, linked data hubs like dbpedia or geonames.org are part of it. The crowd is invited to contribute to this collection, also the Topic Maps Lab provides several feeds to register new URIs. Subj3ct.com turns out to be an infrastructure technology for Web 3.0 applications, regardless whether they are based on Topic Maps or other Semantic Web technologies.

Through this convergence the uniqueness of each technology sharpens. Reasoning is the strong point of the Semantic Web. But the strengths of Topic Maps are semantic portals and the global federation of facts around subjects. Bringing together all and even contradictory information about each subject – and not building reasoning-ready consistent models of the world – is built into the genes of Topic Maps.

Read the full interview here.


December 2009: Austria will be the hot spot of the Semantic Web World

September 24, 2009 By: Andreas Blumauer Category: Conferences & Events, Linked Data & Open Data

There will be a series of events around the Semantic Web & Linked Data in Vienna and Graz at the end of this year. This is a comprehensive list of all of these events, which might help you decide to come to Austria:

Linked Data Camp

The Jackson tribute might have been a flop for Vienna, but this is history! So, what are you waiting for – come & join the party!


55 people enjoyed the first semantic web meetup in Vienna

July 17, 2009 By: Thomas Thurner Category: Conferences & Events

Yesterday’s first “semantic web meetup” attracted 55 attendees who came to present, talk and socialise. Approximately one year after the series of semantic web meetups started in NYC, there is now also a vital community gathering in Vienna. Besides presentations and lightning talks giving an inside view of brand-new ideas and developments from Austria’s semweb labs, Steve Sandhouse of the New York Times joined in via web meeting to give an insight into the New York Times’s Semantic Web efforts, which, as he explained, have a back-history of about 100 years.

In conclusion: a good start for the first Vienna Semantic Web Meetup, which may pave the way for another meeting in the near future. In the meantime, here are some pictures of the venue to amuse those who were there and to inspire new people to join: www.meetup.com


Calais, Zemanta or textwise?

July 07, 2009 By: Andreas Blumauer Category: Mashups & Web services, Text Mining

Besides W3C’s Linked Data Initiative, it was semantic services like Calais, Zemanta or textwise that made the advantages of the Semantic Web visible to a broader community in the last few months.

Each of these services follows a slightly different approach, but in a nutshell: they all offer an API to provide “similarity search” around social media or to enhance enterprise information management.

Like a magic bullet, these services offer relief from information overload and seem to be becoming a kind of “semantic web killer application”.

If you’re familiar with one or more of these services, drop a comment and let us know what you have experienced so far, or whether you can think of any applications or further developments you would like to see around these kinds of services.

If you are not familiar with this stuff, for a quick demo go to

The widget uses text from this blog to calculate similar stuff from the web.
