Semantic Web Company

The Semantic Puzzle

Open World Assumptions

Linking Open Data to Thesaurus Management

February 16, 2010 By: Tassilo Pellegrini Category: Corporate Semantic Web, Knowledge Management, Linked Data & Open Data, Search Engines, Semantic Web Applications, Software Development

The Vienna-based company punkt. netServices is just about to release a demo version of their PoolParty service, a SKOS-based thesaurus management tool with linked data capabilities. I had the chance to pre-read a white paper and test their service. Here is a brief overview. You can also try a demo.

Purpose

PoolParty was conceived to support applications such as:

  • Semantic search engines
  • Recommender systems (similarity search)
  • Corporate bookmarking
  • Annotation and tag recommender systems
  • Autocomplete services and faceted browsing

These use cases can be achieved either by using PoolParty stand-alone or by integrating it with existing enterprise search engines, document management systems or enterprise wikis.

Thesaurus Management

PoolParty aims to be easy to use for people without a strong Semantic Web background or special technical skills. The GUI is entirely web-based and uses AJAX, so the user can, for example, quickly merge two concepts via drag & drop. A tree view or a graph view of the concepts provides an overview of the thesaurus.

PoolParty also helps to semi-automatically add concepts to a thesaurus as it can be used to analyse documents (e.g. web pages or PDF files) relevant to a thesaurus’ domain in order to glean candidate terms. This is done by the key-phrase extractor of KEA. The extracted terms can be selected by the user, thereby becoming “free concepts” which later can be integrated into the thesaurus, turning them into “approved concepts”.

Documents can be searched in various ways: by keyword search in the full text, by searching for their tags, or by semantic search and similarity search. Semantic search takes into account not only a concept’s preferred label but also its synonyms and the labels of its related concepts. The user can manually remove query terms used in semantic search, and the boost values for the various relations considered may also be adjusted. The recommendation mechanism, which calculates document similarity, works in the same way.
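To make the idea of boosted query expansion more concrete, here is a minimal Python sketch. It is not PoolParty's actual implementation: the `Concept` class, the `expand_query` and `score` functions and the default boost values are all assumptions made for illustration. It only shows how a preferred label, synonyms and related concepts could contribute differently weighted query terms, and how a user could remove individual terms.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Concept:
    """A simplified stand-in for a SKOS concept (illustrative only)."""
    pref_label: str
    alt_labels: list[str] = field(default_factory=list)
    related: list[Concept] = field(default_factory=list)


# Hypothetical boost values per relation type; the real defaults are not known.
DEFAULT_BOOSTS = {"pref": 1.0, "alt": 0.8, "related": 0.5}


def expand_query(concept, boosts=DEFAULT_BOOSTS, removed=frozenset()):
    """Return a {term: boost} mapping for the concept and its neighbourhood.

    Terms listed in `removed` are skipped, mimicking a user who manually
    drops query terms. If a term is reachable via several relations, the
    highest boost wins.
    """
    terms = {}

    def add(term, boost):
        if term not in removed:
            terms[term] = max(boost, terms.get(term, 0.0))

    add(concept.pref_label, boosts["pref"])
    for label in concept.alt_labels:
        add(label, boosts["alt"])
    for neighbour in concept.related:
        add(neighbour.pref_label, boosts["related"])
    return terms


def score(document_text, terms):
    """Naive score: sum the boosts of all terms that occur in the text."""
    text = document_text.lower()
    return sum(boost for term, boost in terms.items() if term.lower() in text)


rum = Concept("Rum")
mojito = Concept("Mojito", alt_labels=["Mojito cocktail"], related=[rum])
query_terms = expand_query(mojito)
```

A real search backend would hand such weighted terms to its own query syntax; the substring matching in `score` is only there to keep the sketch self-contained.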

PoolParty also publishes a Semantic Wiki version of its thesauri by default, which provides an alternative way to browse and edit concepts. Through this feature anyone can get read access to a thesaurus, and optionally also edit, add or delete labels of concepts. Search and autocomplete functions are available here as well. The Wiki’s XHTML source is also enriched with RDFa, thereby exposing all RDF metadata associated with a concept to be picked up by RDF search engines and crawlers. (See two examples: Cocktail thesaurus, Standard Thesaurus for Economics.)

PoolParty also supports the import of thesauri in SKOS (including several consistency checks) or Zthes format. These functions can also be consumed as stand-alone web services via PoolParty SKOS Services. Additionally, lists of concepts and their labels can be imported via CSV files.

Linked (Open) Data

PoolParty not only publishes its thesauri as Linked Open Data (in addition to a SPARQL endpoint), but it also consumes LOD in order to expand thesauri with information from LOD sources.

Concepts in the thesaurus can be linked to, for example, DBpedia via a service like Georgi Kobilarov’s DBpedia lookup service, which takes the label of a concept and returns possible matching candidates. The system suggests relevant resources from DBpedia and the user selects the one that matches the concept from their thesaurus, thereby creating a skos:exactMatch relation between the concept URI in PoolParty and the DBpedia URI. The same approach can be used to link to other SKOS thesauri available as Linked Data.
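The linking step can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the candidate list is hard-coded rather than fetched from the lookup service, the concept URI under example.org is made up, and the helper names `exact_match_triple` and `link_concept` are hypothetical. The one fixed piece is the skos:exactMatch property URI from the SKOS vocabulary.

```python
SKOS_EXACT_MATCH = "http://www.w3.org/2004/02/skos/core#exactMatch"


def exact_match_triple(concept_uri, target_uri):
    """Serialise a skos:exactMatch link as a single N-Triples line."""
    return f"<{concept_uri}> <{SKOS_EXACT_MATCH}> <{target_uri}> ."


def link_concept(concept_uri, candidates, chosen_index):
    """Link a concept to the candidate the user picked from the suggestions."""
    _label, target_uri = candidates[chosen_index]
    return exact_match_triple(concept_uri, target_uri)


# Illustrative (label, URI) suggestions for the label "Mojito";
# these values are not actual output of the lookup service.
candidates = [
    ("Mojito", "http://dbpedia.org/resource/Mojito"),
    ("Mojito (disambiguation)", "http://dbpedia.org/resource/Mojito_(disambiguation)"),
]

# The user confirms the first suggestion for a made-up thesaurus concept.
triple = link_concept("http://example.org/thesaurus/mojito", candidates, 0)
```

The resulting line can be loaded into any store that accepts N-Triples.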

Other triples can also be retrieved from the target data source: for example, the DBpedia abstract can become a skos:definition, and geographical coordinates can be imported and used to display the location of a concept on a map, where appropriate. The DBpedia category information may also be used to retrieve additional concepts of that category as siblings of the concept in focus, in order to populate the thesaurus.

PoolParty is capable of importing a SKOS thesaurus from a Linked Data server, and may also receive updates to thesauri imported this way. This feature was implemented in the course of the KiWi project, funded by the European Commission. KiWi also contains SKOS thesauri and exposes them as LOD. Both systems can read a thesaurus via the other’s LOD interfaces and may write it to their own store. This is facilitated by special Linked Data URIs that return, for example, all the top concepts of a thesaurus, with pointers to the URIs of their narrower concepts, which allow other systems to retrieve a complete thesaurus through iterative dereferencing of concept URIs.
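Iterative dereferencing is essentially a graph traversal. The following Python sketch shows the idea with a breadth-first walk over narrower-concept pointers. Everything here is an assumption for illustration: the `REMOTE` dictionary stands in for a Linked Data server, the `ex:` identifiers are made up, and in a real client `dereference` would issue an HTTP GET and parse the returned RDF.

```python
from collections import deque

# In-memory stand-in for a remote Linked Data server (hypothetical data).
REMOTE = {
    "ex:drinks": {"label": "Drinks", "narrower": ["ex:cocktails", "ex:wine"]},
    "ex:cocktails": {"label": "Cocktails", "narrower": ["ex:mojito"]},
    "ex:wine": {"label": "Wine", "narrower": []},
    "ex:mojito": {"label": "Mojito", "narrower": []},
}


def dereference(uri):
    """Pretend to look up a concept URI and return its description."""
    return REMOTE[uri]


def harvest_thesaurus(top_concept_uris, dereference):
    """Collect all concepts reachable from the top concepts.

    Each concept is dereferenced once; its narrower concepts are queued
    for later lookup. Already visited URIs are skipped, so cycles or
    concepts with several broader concepts do not cause repeated work.
    """
    harvested = {}
    queue = deque(top_concept_uris)
    while queue:
        uri = queue.popleft()
        if uri in harvested:
            continue
        description = dereference(uri)
        harvested[uri] = description
        queue.extend(description.get("narrower", []))
    return harvested


thesaurus = harvest_thesaurus(["ex:drinks"], dereference)
```

After the walk, `thesaurus` holds one entry per concept, which the importing system could then write to its own store.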

Additionally, KiWi and PoolParty publish lists of concepts created, modified, merged or deleted within user-specified time frames. With this information the systems can learn about updates to one of their thesauri in an external system. They can then compare the versions of concepts in both stores and write the corresponding updates to their own store.

This means each system decides autonomously which data it accepts, and there is no risk of a system pushing data that might lead to inconsistencies into an external store. Data transfer and communication are achieved using REST/HTTP; no other protocols or middleware are necessary. Also, no rights management for external systems is needed, which would otherwise have to be configured separately for each source.
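The pull-based update cycle described above could look roughly like the following Python sketch. It is a simplification under assumptions: the shape of a change-list entry (`uri` and `kind`), the `pull_updates` function and the `accept` policy callback are all hypothetical, and stores are modelled as plain dictionaries. The point is that the receiving system fetches, compares and decides on its own; nothing is pushed into it.

```python
def pull_updates(local_store, change_list, fetch_remote, accept):
    """Apply remote changes to the local store, one concept at a time.

    local_store  -- dict mapping concept URI to its local description
    change_list  -- entries like {"uri": ..., "kind": "modified"} as
                    published by the external system (assumed format)
    fetch_remote -- callable returning the remote description for a URI
    accept       -- local policy: accept(uri, local, remote) -> bool,
                    where remote is None for a deletion
    Returns the list of URIs that were actually changed locally.
    """
    applied = []
    for change in change_list:
        uri = change["uri"]
        remote = None if change["kind"] == "deleted" else fetch_remote(uri)
        local = local_store.get(uri)
        # Skip changes that make no difference or that local policy rejects.
        if local == remote or not accept(uri, local, remote):
            continue
        if remote is None:
            del local_store[uri]
        else:
            local_store[uri] = remote
        applied.append(uri)
    return applied


local = {"ex:a": {"label": "A"}, "ex:b": {"label": "B"}, "ex:c": {"label": "C"}}
remote = {"ex:a": {"label": "A (new)"}, "ex:c": {"label": "C"}, "ex:d": {"label": "D"}}
changes = [
    {"uri": "ex:a", "kind": "modified"},
    {"uri": "ex:b", "kind": "deleted"},
    {"uri": "ex:c", "kind": "modified"},
    {"uri": "ex:d", "kind": "created"},
]


def never_delete(uri, old, new):
    """Example policy: take over new and changed concepts, ignore deletions."""
    return new is not None


applied = pull_updates(local, changes, remote.__getitem__, never_delete)
```

Because the policy lives entirely on the receiving side, a different system could pull the same change list and accept a different subset.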

Technology

The software is written in Java and utilizes the SAIL API, so it can be used with various triple stores. The thesaurus management itself (viewing, creating and editing SKOS concepts and their relationships) can be done in an AJAX Frontend based on Yahoo User Interface (YUI). Editing of labels can alternatively be done in a Wiki style HTML frontend. For key-phrase extraction from documents PoolParty uses a modified version of the KEA 5 API, which is extended for the use of controlled vocabularies stored in a SAIL Repository (this module is available under GNU GPL). The analysed documents can be stored and indexed in Lucene/Solr or any other (enterprise) search system along with extracted and semantically related concepts.

A new Semantic Web journal – with an open review process

January 17, 2010 By: Pascal Hitzler Category: Literature & Publications

A new journal was launched yesterday, called “Semantic Web – Interoperability, Usability, Applicability.” The publisher is IOS Press, which is already active in the Semantic Web area, e.g. by means of its journal “Applied Ontology,” its book series “Studies on the Semantic Web,” and a considerable number of Semantic Web publications in its series “Frontiers in Artificial Intelligence and Applications” (and, not to forget, a frequent physical presence at major Semantic Web conferences).

Since I am one of the editors-in-chief (the other one is Krzysztof Janowicz), I prefer to refrain from discussing the rationale behind launching (yet another) Semantic Web journal. Let’s just say that a growing community requires a growing communication infrastructure, and let history deal with the rest …

But I’d like to point out that we have made a very conscious decision to run the journal under an open and transparent review process, with non-anonymous reviews which are made publicly available on the journal homepage. And any researcher – not only those explicitly asked to review – can add reviews to submitted papers and thus influence the transparent decision process. We’ve already received a lot of positive feedback about this set-up, and we’re looking forward to seeing it in motion.

Besides the types of papers one usually finds in journals, such as traditional research papers and surveys, the journal will also sport short papers on ontologies, tools, and applications.

We’re looking forward to your contributions to this new and exciting endeavour!

Pascal Hitzler

Jordan S. Hatcher: “Why we can’t use the same open licensing approach for databases as we do for content and software.”

January 14, 2010 By: Tassilo Pellegrini Category: Linked Data & Open Data, Miscellaneous, Politics

Jordan S. Hatcher is, among other things, a lawyer, academic, and entrepreneur working on Intellectual Property and Internet law issues in the UK and worldwide. He is heavily involved in the Open Data Commons initiative. Last month he gave me an interview on IPR issues associated with data licensing. His brief answer to the question of why data needs a separate licensing framework:

The answer to me is that database and data are different.  They’re different legally and different practically in what consumers and producers of open data want to do with it.  They’re also different in what the future looks like in terms of things like linked data.

Read the details in the full interview.

Pew Research investigates the Internet in 2020

January 08, 2010 By: Tassilo Pellegrini Category: Uncategorized

I found this survey via an O’Reilly blog post. Some questions are quite trivial, but Pew also asks about the impact of the Semantic Web in 2020.

Take your chance: If you’d like to take the survey, you can currently visit http://www.facebook.com/l/c6596;survey.confirmit.com/wix2/p1075078513.aspx and enter PIN 2000.

I-SEMANTICS 2010: Call for Papers

January 04, 2010 By: Tassilo Pellegrini Category: Calls & Competitions, Conferences & Events, Linked Data & Open Data

From September 1 – 3, 2010, I-SEMANTICS, the 6th international conference on semantic systems, will take place in Graz, Austria. This year’s focus is “Towards a Web of Linked Data”. As a conference aiming to bring together science and industry, I-SEMANTICS encourages both scientific (research/application) and industrial contributions.

Additionally I-SEMANTICS will host the 2nd Pragmatic Web Track and the 3rd Triplification Challenge.

The combined CfP for I-SEMANTICS, Pragmatic Web Track & Triplification Challenge is available here.

Report of Linked Data Camp Vienna

December 15, 2009 By: Thomas Schandl Category: Conferences & Events, Linked Data & Open Data

Earlier this month the first ever Linked Data Camp took place in Vienna at the Quartier für Digitale Kunst. This two-day event attracted about 35 people to discuss and jointly work on novel applications for the Web of Data.

The first day started off with a keynote by Richard Cyganiak from DERI Galway’s Linked Data Research Center. He talked about the technical challenges that have to be overcome to allow for more Linked Data applications over heterogeneous RDF data. These challenges revolve around discovery of and access to Linked Data, identifier and schema reconciliation, data fusion, quality assessment, aggregation, analytics and mining.
As Richard pointed out, the good news is “that linked data makes it possible that different people do the different steps, e.g., the publisher can help doing the identifier reconciliation by publishing sameAs links, and 3rd parties can help with access by providing a single SPARQL store over multiple related but independent datasets.” Check out the transcript or slides for Richard’s talk.

Linked Data Camp Vienna Working Groups

After this keynote, participants presented their topics of interest in Lightning Talks and working groups formed. Some of their outcomes can be found online:

  • One group worked on the topic of “Dataset Dynamics”. As data in Linked Data sets changes, clients that depend on the data need to be notified about these changes. You can read about their proposed solutions here.
  • Another group had a go at “Expert search and profiling on the Semantic Web”. Their discussions are summarized in this blog post.
  • Andreas Langegger demonstrated XLWrap, a versatile RDF wrapper for spreadsheets. Participants came up with many feature requests (see here), so he and others worked on this handy application.

On day 2 Leigh Dodds from Talis talked about “Rights Statements on the Web of Data” (slides and transcript). Leigh raised awareness of the issue that the majority of LOD sources do not have licensing information associated with their data. This of course conflicts with the proposed openness of Linked “Open” Data, as it is doubtful whether these sources can be used for commercial purposes.

The organizers from the universities of Linz and Vienna, Joanneum Research, Gnowsis, DERI Galway, STI Innsbruck and the Semantic Web Company would like to thank all participants for making the camp a success! As with VoCamps anyone can organize a Linked Data Camp, so we hope for more camps in 2010!

George Anadiotis: “Linked Data brings value by offering an alternative approach to lightweight data integration and mashups.”

December 10, 2009 By: Tassilo Pellegrini Category: Linked Data & Open Data, Mashups & Web services, Semantic Web Applications, Software Development, Tools & Software, Vocabularies & Languages

George Anadiotis is an expert on artificial intelligence with academic roots at the Vrije Universiteit, Amsterdam. In February 2009 he took the position of R&D Director at the Greek technology company IMC. I met him in September at I-SEMANTICS 2009, where he and his team contributed to the Triplification Challenge. In their paper Linked Data for the Masses they pondered the pragmatic value of Linked Data from an inbound and outbound perspective. In his words:

We started experimenting with the technical infrastructure needed and created some proof-of-concept applications. Part of this work was enabling Linked Data access for the front-end infrastructure we used, Liferay portal. We decided on the appropriate vocabularies for the type of content we wanted to publish (FOAF, SIOC and MOAT mainly), delved on the internals of Liferay and used D2R to map its relational database to the vocabularies of choice, also using techniques to improve performance as much as possible. Since Liferay itself is also based on the notion of communities, we thought our work would be more widely applicable and useful, so we chose to submit it for review at the Triplification Challenge and make it available to the community as open source software. Our applications have gradually matured and are about to be deployed in our commercial projects, while at the same time we are now making the Liferay Linked Data Module available as a Sourceforge project and we are working with Liferay management in order to disseminate this effort to the community and also include it in a future release of the software.

Read the full interview here.

Solve the Semantic Puzzle!

December 01, 2009 By: Andreas Blumauer Category: Conferences & Events, Linked Data & Open Data

Yesterday’s Vienna Semantic Web Meetup was quite a big success. About 55 attendees came and enjoyed the great atmosphere at the “Museumsquartier” / “Quartier for Digital Culture”.

The event took place just after the first day of the “Linked Data Camp Vienna”, where quite a lot of important questions around linked data were tackled, e.g.: Which role will linked data consolidators play in the future?

After a day full of “insider” discussions it was also a pleasure to meet people from “outside” the linked data community to re-check whether the semantic puzzle is still worth solving.

Lyndon Nixon (STI): “Clear guidelines on how to best make use of Linked Open Data by enterprises is needed”

November 13, 2009 By: Andreas Blumauer Category: Conferences & Events

The European Semantic Technology Conference 2009 will take place in Vienna at the beginning of December 2009. Andreas Blumauer (Semantic Web Company) talked with Lyndon Nixon who is the program advisor of this conference:

SWC: By its own account, the European Semantic Technology Conference brings together the smartest minds in Semantic Technologies. What will be the highlights of this year’s conference?

There are many highlights this year! We have a full program of presentations, workshops, panels and a keynote by Susie Stephens from Johnson&Johnson. On top of this, the first day will see the first ever ESTC Innovation Seed Camp, where entrepreneurs and young start-ups are invited to pitch their ideas to a panel of venture capitalists and there will be cash prizes! Besides the main program, an open demo space will continually offer new showcases of semantic technologies and products, while a networking zone gives attendees a relaxed space to do business away from the conference bustle. We will also be holding matchmaking sessions, where attendees can schedule one-to-one meetings with other attendees organized by a handy online tool. Finally, in line with ESTC’s focus on semantics and innovation – during the two days we will give participants the chance to check out two innovative conference tools: an electronic vCard exchange and the “Web Comparator”. So, too much to explain, but you can get full information on every aspect of the conference at its webpage www.estc2009.com.

SWC: How would you describe the state of the art in semantic technology business especially in Europe?

We are at a very exciting period in the enterprise uptake of semantic technologies, which can be seen in the growth in attendance at events such as ESTC. Semantic technologies are finally maturing and can be used away from toy examples in real, critical business processes. Technologies are being standardized and tools aligned to those standards, while progress is being made in supporting the sorts of extensions that businesses need (see OWL 2, or SPARQL 1.1). This year, the case studies and the business applications that will be presented are going to reflect that. We are still inside the early adopter phase in semantic uptake, with the critical mass of companies still checking out semantics at a research and prototyping level. However, the balance is shifting and enabling the technology transfer to real business projects is key; ESTC’s focus on direct contact between the technology vendors and the business clients – the networking zone, the open demo space, or the matchmaking sessions – is a reflection of the importance of an event such as ESTC to bring these two groups together.

SWC: One of the big issues at the moment is Linking Open Data. How do you perceive this development and how can you start?

 

Yes, Linked Open Data is an interesting development, making a significant amount of semantic data about a broad subject range available to everyone. I think it has real value in the research community where large data sources have been needed. For industry, I would say its value is less straightforward: the data is not always so clean and care needs to be taken before building business applications on top of it. Clear guidelines on how to best make use of Linked Open Data by enterprises is needed. ESTC picks up on this in its program this year: we will have an expert panel precisely on this subject! Of course, leveraging the Linked Data Cloud in the enterprise is already a topic for many organisations – one of our paper sessions is on Linked Data and I am sure there will be plenty of mentions of it elsewhere during the conference!

SWC: In recent years large IT companies and system integrators have rather been playing around with semantic web technologies than identifying the semantic web as a market opportunity. Do you think that this situation has changed already?

I think a lot of these companies are being cautious – the Semantic Web was so hyped to industry in its first years that a certain level of cynicism grew. Now that we have, in my opinion, very real and valuable tools and technologies built on semantics, the companies are being careful in how to present this to the market. There are already many very encouraging examples of semantics making inroads in key markets where data heterogeneity, integration, and management have become key issues: I think Health Care and Life Sciences is simply the market which is being most open about it (the ESTC keynote speaker Susie Stephens will also report on semantic technologies in this market). There are further examples we just don’t know about, because companies don’t want to let their competitors know, or mention that it is semantics which is being used.

SWC: Please add another statement which is important for you!

ESTC 2009 will be *the* meeting place in Europe this year for semantic technology vendors and users – don’t miss it!

Demozone for semantic applications launched

November 10, 2009 By: Thomas Schandl Category: Semantic Web Applications, Tools & Software, Videos & Tutorials

The Semantic Web Company compiled a suite of some of the best semantic web applications and put them in one place for you to try out: The SWC Demozone.

We selected tools pertaining to the different application areas of the Semantic Web – be it for finding, creating, linking and/or publishing information.

The showcased applications and services so far are:

Have a look at the demos and try them out for yourself – we provided explanations and links to screencasts teaching you how to use them.

We will add more demos in the future. If you are the owner of or a contributor to an application that you’d like to see showcased in the demozone, please drop us a line and we’ll try to add a demo for your software.
