Semantic Web Company

The Semantic Puzzle

Open World Assumptions

Archive for the ‘Corporate Semantic Web’

Linking Open Data to Thesaurus Management

February 16, 2010 By: Tassilo Pellegrini Category: Corporate Semantic Web, Knowledge Management, Linked Data & Open Data, Search Engines, Semantic Web Applications, Software Development

The Vienna-based company punkt. netServices is about to release a demo version of its PoolParty service, a SKOS-based thesaurus management tool with linked data capabilities. I had the chance to pre-read a white paper and test the service. Here is a brief overview. You can also try a demo.

Purpose

PoolParty was conceived to support applications such as:

  • Semantic search engines
  • Recommender systems (similarity search)
  • Corporate bookmarking
  • Annotation- & tag recommender systems
  • Autocomplete services and faceted browsing

These use cases can be achieved either by using PoolParty stand-alone or by integrating it with existing enterprise search engines, document management systems or enterprise wikis.

Thesaurus Management

PoolParty aims to be easy to use for people without a strong Semantic Web background or special technical skills. The GUI is entirely web-based and uses AJAX, so the user can, for example, quickly merge two concepts via drag & drop. A tree view and a graph view of the concepts give an overview of the thesaurus.

PoolParty also helps to add concepts to a thesaurus semi-automatically: it can analyse documents (e.g. web pages or PDF files) relevant to the thesaurus’ domain in order to glean candidate terms. This is done with the KEA key-phrase extractor. The user can select among the extracted terms, which thereby become “free concepts”; these can later be integrated into the thesaurus, turning them into “approved concepts”.

Documents can be searched in various ways: by keyword search in the full text, by searching for their tags, or by semantic search and similarity search. The latter takes into account not only a concept’s preferred label but also its synonyms and the labels of its related concepts. The user can manually remove query terms used in semantic search, and the boost values for the various relations considered in semantic search can be adjusted as well. The recommendation mechanism that calculates document similarity works in the same way.
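
The concept-based query expansion described in the previous paragraph can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not PoolParty's implementation: the `Concept` class, the default boost values and the naive substring scoring are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Concept:
    pref_label: str
    alt_labels: list = field(default_factory=list)       # synonyms
    related_labels: list = field(default_factory=list)   # labels of related concepts


# Hypothetical boost values per relation; in the described system they are user-adjustable.
DEFAULT_BOOSTS = {"pref": 1.0, "alt": 0.8, "related": 0.4}


def expand_query(concept, boosts=DEFAULT_BOOSTS, removed=frozenset()):
    """Turn one concept into weighted query terms, skipping terms the user removed."""
    weighted = {}
    relations = (("pref", [concept.pref_label]),
                 ("alt", concept.alt_labels),
                 ("related", concept.related_labels))
    for relation, labels in relations:
        for label in labels:
            term = label.lower()
            if term in removed:
                continue
            # If a label appears under several relations, keep its highest boost.
            weighted[term] = max(weighted.get(term, 0.0), boosts[relation])
    return weighted


def score(document, weighted_terms):
    """Naive relevance: sum the boosts of all terms occurring as substrings of the text."""
    text = document.lower()
    return sum(boost for term, boost in weighted_terms.items() if term in text)


martini = Concept("Martini", alt_labels=["Dry Martini"], related_labels=["Gin", "Vermouth"])
terms = expand_query(martini, removed={"vermouth"})   # the user dropped "vermouth"
docs = ["A dry martini needs gin.", "Vermouth basics"]
ranked = sorted(docs, key=lambda d: score(d, terms), reverse=True)
```

Raising or lowering a value in `DEFAULT_BOOSTS` changes how strongly synonyms or related concepts influence the ranking, which mirrors the adjustable boost values mentioned in the post.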

By default, PoolParty also publishes a Semantic Wiki version of its thesauri, which provides an alternative way to browse and edit concepts. Through this feature anyone can get read access to a thesaurus and, optionally, also edit, add or delete labels of concepts. Search and autocomplete functions are available here as well. The wiki’s XHTML source is also enriched with RDFa, exposing all RDF metadata associated with a concept so that it can be picked up by RDF search engines and crawlers. (See two examples: the Cocktail thesaurus and the Standard Thesaurus for Economics.)
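
To illustrate what “enriched with RDFa” means in practice, here is a minimal Python sketch that renders a SKOS concept as an XHTML fragment carrying RDFa 1.0 attributes (`about`, `typeof`, `property`, `rel`). The markup layout, the function name and the example.org URIs are hypothetical; the post does not describe the actual wiki templates.

```python
from html import escape

SKOS_NS = "http://www.w3.org/2004/02/skos/core#"


def concept_to_rdfa(uri, pref_label, alt_labels=(), broader=()):
    """Render a concept as an XHTML fragment with RDFa attributes (hypothetical layout)."""
    lines = [
        f'<div xmlns:skos="{SKOS_NS}" about="{escape(uri)}" typeof="skos:Concept">',
        f'  <h1 property="skos:prefLabel">{escape(pref_label)}</h1>',
    ]
    for alt in alt_labels:
        lines.append(f'  <span property="skos:altLabel">{escape(alt)}</span>')
    for target in broader:
        # rel + href expresses a link (skos:broader) from the concept to another resource
        lines.append(f'  <a rel="skos:broader" href="{escape(target)}">{escape(target)}</a>')
    lines.append("</div>")
    return "\n".join(lines)


html_fragment = concept_to_rdfa(
    "http://example.org/cocktails/martini",          # hypothetical concept URI
    "Martini",
    alt_labels=["Dry Martini"],
    broader=["http://example.org/cocktails/gin-cocktail"],
)
print(html_fragment)
```

An RDFa-aware crawler reading this fragment would extract triples such as the concept's type, its preferred and alternative labels and its broader link, without needing a separate RDF file.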

PoolParty also supports importing thesauri in SKOS (including several consistency checks) or Zthes format. These functions can also be consumed as stand-alone web services via PoolParty SKOS Services. Additionally, lists of concepts and their labels can be imported from CSV files.
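
The post does not specify PoolParty's CSV layout, so the following Python sketch simply assumes one: columns `id`, `prefLabel` and an optional pipe-separated `altLabels`. It shows how such rows could be turned into SKOS concepts serialized as N-Triples; the column names, base URI and English language tag are illustrative assumptions.

```python
import csv
import io

SKOS = "http://www.w3.org/2004/02/skos/core#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def literal(text, lang="en"):
    """Serialize a language-tagged N-Triples literal (escaping backslashes and quotes only)."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"@{lang}'


def csv_to_ntriples(csv_text, base_uri):
    """Assumed columns: id, prefLabel, altLabels (pipe-separated, optional)."""
    triples = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        subject = f"<{base_uri}{row['id']}>"
        triples.append(f"{subject} <{RDF_TYPE}> <{SKOS}Concept> .")
        triples.append(f"{subject} <{SKOS}prefLabel> {literal(row['prefLabel'])} .")
        alt_cell = row.get("altLabels") or ""
        for alt in (a.strip() for a in alt_cell.split("|")):
            if alt:  # skip empty cells
                triples.append(f"{subject} <{SKOS}altLabel> {literal(alt)} .")
    return triples


data = "id,prefLabel,altLabels\nmartini,Martini,Dry Martini|Gin Martini\nmojito,Mojito,\n"
triples = csv_to_ntriples(data, "http://example.org/cocktails/")
print("\n".join(triples))
```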

Linked (Open) Data

PoolParty not only publishes its thesauri as Linked Open Data (in addition to a SPARQL endpoint), but it also consumes LOD in order to expand thesauri with information from LOD sources.

Concepts in the thesaurus can be linked to, e.g., DBpedia via a service like Georgi Kobilarov’s DBpedia lookup service, which takes the label of a concept and returns possible matching candidates. The system suggests relevant resources from DBpedia, and the user can select the one that matches the concept in their thesaurus, thereby creating a skos:exactMatch relation between the concept URI in PoolParty and the DBpedia URI. The same approach can be used to link to other SKOS thesauri available as Linked Data.
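
A rough Python sketch of the suggest-then-confirm linking step follows. The candidate list is hard-coded instead of calling the DBpedia lookup service (whose API is not described here), the URIs are illustrative, and the string-similarity ranking via `difflib` is a stand-in heuristic, not the lookup service's own ranking.

```python
from difflib import SequenceMatcher

SKOS_EXACT_MATCH = "http://www.w3.org/2004/02/skos/core#exactMatch"


def rank_candidates(label, candidates):
    """Order (uri, label) candidates by string similarity to the concept label."""
    def similarity(candidate):
        return SequenceMatcher(None, label.lower(), candidate[1].lower()).ratio()
    return sorted(candidates, key=similarity, reverse=True)


def exact_match_triple(concept_uri, selected_uri):
    """Create the link only for the candidate the user confirmed."""
    return f"<{concept_uri}> <{SKOS_EXACT_MATCH}> <{selected_uri}> ."


# Illustrative candidates, as a lookup service might return them (no network call here).
candidates = [
    ("http://dbpedia.org/resource/Martini_(cocktail)", "Martini (cocktail)"),
    ("http://dbpedia.org/resource/Martini_%26_Rossi", "Martini & Rossi"),
]
suggestions = rank_candidates("Martini", candidates)
# The heuristic ranks the vermouth brand first, so a human makes the final choice.
chosen = candidates[0][0]
link = exact_match_triple("http://example.org/cocktails/martini", chosen)
```

The example deliberately shows why the user stays in the loop: a purely string-based suggestion can prefer the wrong resource.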

Other triples can also be retrieved from the target data source: e.g. the DBpedia abstract can become a skos:definition, and geographical coordinates can be imported and used to display the location of a concept on a map, where appropriate. The DBpedia category information may also be used to retrieve additional concepts from the same category as siblings of the concept in focus, in order to populate the thesaurus.
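
The enrichment step can be sketched as a mapping from a few well-known predicates of the linked resource onto the local concept. This is a simplified Python illustration: the remote triples are assumed to be already fetched, the dict-based concept record is hypothetical, and the Vienna data is illustrative only.

```python
DBO_ABSTRACT = "http://dbpedia.org/ontology/abstract"
GEO_LAT = "http://www.w3.org/2003/01/geo/wgs84_pos#lat"
GEO_LONG = "http://www.w3.org/2003/01/geo/wgs84_pos#long"
DCT_SUBJECT = "http://purl.org/dc/terms/subject"


def enrich(concept, remote_triples, lang="en"):
    """Copy selected facts from a linked resource into a copy of a local concept record.

    remote_triples: (predicate, value, language-or-None) tuples already fetched
    for the linked resource.
    """
    enriched = dict(concept)
    lat = lon = None
    categories = []
    for predicate, value, value_lang in remote_triples:
        if predicate == DBO_ABSTRACT and value_lang == lang:
            # Use the abstract as skos:definition unless the concept already has one.
            enriched.setdefault("skos:definition", value)
        elif predicate == GEO_LAT:
            lat = float(value)
        elif predicate == GEO_LONG:
            lon = float(value)
        elif predicate == DCT_SUBJECT:
            categories.append(value)
    if lat is not None and lon is not None:
        enriched["coordinates"] = (lat, lon)   # e.g. for placing a marker on a map
    enriched["categories"] = categories        # starting point for sibling concepts
    return enriched


remote = [
    (DBO_ABSTRACT, "Vienna is the capital of Austria.", "en"),
    (DBO_ABSTRACT, "Wien ist die Hauptstadt Österreichs.", "de"),
    (GEO_LAT, "48.2", None),
    (GEO_LONG, "16.37", None),
    (DCT_SUBJECT, "http://dbpedia.org/resource/Category:Capitals_in_Europe", None),
]
vienna = enrich({"skos:prefLabel": "Vienna"}, remote)
```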

PoolParty can import a SKOS thesaurus from a Linked Data server and may also receive updates to thesauri imported this way. This feature was implemented in the course of the KiWi project, funded by the European Commission. KiWi also contains SKOS thesauri and exposes them as LOD. Each system can read a thesaurus via the other’s LOD interface and write it to its own store. This is facilitated by special Linked Data URIs that return, e.g., all the top concepts of a thesaurus with pointers to the URIs of their narrower concepts, which allows other systems to retrieve a complete thesaurus by iteratively dereferencing concept URIs.
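
Retrieving a complete thesaurus through iterative dereferencing amounts to a breadth-first traversal from the top concepts along the narrower links. In this minimal Python sketch, `dereference` stands in for an HTTP GET plus RDF parsing, and the in-memory `FAKE_WEB` dictionary and its `ex:` URIs are hypothetical.

```python
from collections import deque


def harvest_thesaurus(entry_uri, dereference):
    """Breadth-first retrieval of a whole thesaurus through iterative dereferencing.

    dereference(uri) returns a parsed description: a dict that may contain
    "topConcepts" and/or "narrower" lists of URIs.
    """
    descriptions = {}
    queue = deque(dereference(entry_uri).get("topConcepts", []))
    while queue:
        uri = queue.popleft()
        if uri in descriptions:        # already fetched via another path
            continue
        description = dereference(uri)
        descriptions[uri] = description
        queue.extend(description.get("narrower", []))
    return descriptions


# A tiny in-memory "web" standing in for Linked Data URIs (hypothetical URIs).
FAKE_WEB = {
    "ex:topConcepts": {"topConcepts": ["ex:drinks"]},
    "ex:drinks": {"prefLabel": "Drinks", "narrower": ["ex:cocktails", "ex:spirits"]},
    "ex:cocktails": {"prefLabel": "Cocktails", "narrower": ["ex:martini"]},
    "ex:spirits": {"prefLabel": "Spirits", "narrower": ["ex:gin"]},
    "ex:gin": {"prefLabel": "Gin", "narrower": []},
    "ex:martini": {"prefLabel": "Martini"},
}
thesaurus = harvest_thesaurus("ex:topConcepts", FAKE_WEB.__getitem__)
```

The `descriptions` check keeps the traversal from fetching a concept twice when it is reachable from more than one broader concept.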

Additionally, KiWi and PoolParty publish lists of concepts created, modified, merged or deleted within user-specified time frames. With this information, each system can learn about updates that were made in the other system to one of its thesauri. It can then compare the versions of the concepts in both stores and may write the corresponding updates to its own store.

This means each system decides autonomously which data it accepts, and there is no risk of one system pushing data into an external store that might lead to inconsistencies there. Data transfer and communication use REST/HTTP; no other protocols or middleware are necessary. Nor is rights management needed for each external system, which would otherwise have to be configured separately for each source.
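
The pull-based synchronisation described in the last two paragraphs might look like this in a simplified Python sketch. The change-list format, the dict-based stores and the `accept` policy callback are assumptions for illustration; the point is that the receiving system reads the other side's change list and decides locally what to write.

```python
from datetime import datetime


def changes_between(change_log, start, end):
    """Entries of a published change list that fall into [start, end)."""
    return [c for c in change_log if start <= c["time"] < end]


def pull_updates(local_store, remote_store, change_log, start, end, accept):
    """Pull-based sync: the receiving system decides which remote changes to apply.

    remote_store.get(uri) returns None for concepts deleted or merged away remotely.
    accept(uri, local_version, remote_version) is the local acceptance policy.
    """
    applied = []
    for change in changes_between(change_log, start, end):
        uri = change["uri"]
        local_version = local_store.get(uri)
        remote_version = remote_store.get(uri)
        if local_version == remote_version:
            continue                              # both stores already agree
        if not accept(uri, local_version, remote_version):
            continue                              # rejected by local policy
        if remote_version is None:
            local_store.pop(uri, None)
        else:
            local_store[uri] = remote_version
        applied.append(uri)
    return applied


local = {"c1": {"pref": "Gin"}, "c2": {"pref": "Rum"}}
remote = {"c1": {"pref": "Gin"}, "c3": {"pref": "Vodka"}}
log = [
    {"uri": "c1", "action": "modified", "time": datetime(2010, 2, 1)},
    {"uri": "c2", "action": "deleted", "time": datetime(2010, 2, 10)},
    {"uri": "c3", "action": "created", "time": datetime(2010, 2, 12)},
]
applied = pull_updates(local, remote, log, datetime(2010, 2, 5), datetime(2010, 2, 15),
                       accept=lambda uri, old, new: True)
```

Because nothing is ever written into the other system's store, an inconsistent or unwanted change can simply be rejected by the local `accept` policy.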

Technology

The software is written in Java and uses the SAIL API, so it can be used with various triple stores. Thesaurus management itself (viewing, creating and editing SKOS concepts and their relationships) is done in an AJAX front end based on the Yahoo User Interface library (YUI). Labels can alternatively be edited in a wiki-style HTML front end. For key-phrase extraction from documents, PoolParty uses a modified version of the KEA 5 API, extended to use controlled vocabularies stored in a SAIL repository (this module is available under the GNU GPL). The analysed documents can be stored and indexed in Lucene/Solr or any other (enterprise) search system, along with the extracted and semantically related concepts.
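
The core idea of indexing documents against a controlled vocabulary, i.e. matching thesaurus labels in the text and mapping them to concepts, can be sketched in Python as below. This is a deliberately simplified, dictionary-based stand-in and not KEA: KEA additionally ranks candidate phrases with a trained model, which this sketch skips. The `ex:` concept URIs are hypothetical.

```python
import re
from collections import Counter


def tokenize(text):
    return re.findall(r"\w+", text.lower())


def build_index(thesaurus):
    """Map each label (as a token tuple) to its concept URI; thesaurus: {uri: [labels]}."""
    return {tuple(tokenize(label)): uri
            for uri, labels in thesaurus.items() for label in labels}


def assign_concepts(text, index, top_n=5):
    """Greedy longest match of thesaurus labels in the text, counted per concept."""
    tokens = tokenize(text)
    max_len = max((len(key) for key in index), default=0)
    counts = Counter()
    i = 0
    while i < len(tokens):
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            uri = index.get(tuple(tokens[i:i + n]))
            if uri is not None:
                counts[uri] += 1
                i += n          # skip past the matched label
                break
        else:
            i += 1              # no label starts at this token
    return counts.most_common(top_n)


index = build_index({
    "ex:dry_martini": ["Dry Martini", "Martini"],
    "ex:gin": ["Gin"],
})
concepts = assign_concepts("A dry martini is mostly gin. Martini lovers disagree.", index)
```

Preferring the longest label at each position means “dry martini” is counted once rather than also counting the embedded “martini”.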

Linked Data Flows: A new picture to illustrate the “openness” we mean

October 28, 2009 By: Tassilo Pellegrini Category: Corporate Semantic Web, Linked Data & Open Data

(Original post taken from “About the Social Semantic Web”)

Activities around Linking Open Data (“LOD”) and the associated data sets, which are nicely visualised as a “cloud”, have been going on for quite a while now. It is exciting to see how the rather academic “Semantic Web” and all the work associated with this disruptive technology can now be transformed into real business use cases.

What I have observed in the last few months, especially in business communities, is the following:

  • “Linked Data” sounds interesting to business people because the phrase creates a lot of associations in a second or two; the database crowd also seems to be attracted by this web-based approach to data integration
  • “Web of Data” is somewhat misleading because many people think that this will be a new web which replaces something else. The same goes for the “Semantic Web”
  • “Linking Open Data” sounds dangerous and not trustworthy to many companies

For insiders it is clear that the “openness” of data, especially in commercial settings, can be controlled and in many cases has to be controlled, e.g. by defining the right licensing models. But here we are still at the beginning, as a workshop at ISWC 2009 illustrated.

Anyway, looking at the characteristics of Linked Data Flows, they can be one-way or mutual. In some cases data from companies will be put into the cloud and opened up for many purposes; in other use cases it will stay inside the company’s boundaries. In yet other scenarios only (open) data from the web will be consumed and linked with corporate data, but no data will be exposed to the world (except for the fact that data was consumed by an entity).

And of course, on many other occasions datasets and repositories will be partly opened up, depending on the Creative Commons licences (or similar, not yet defined attributes) one wants to use and on the underlying privacy regulations.

This makes clear that LOD / Linking Open Data is just one detail of a bigger picture. Since companies (and governments) play a crucial role in developing the whole infrastructure, we need to draw a new picture that better illustrates the various Linked Data Flows:

[Figure: Linked Data Flows (linkeddataworld)]

The conclusion is that the best thing would be to talk about Linked Data in general and to refer to Linking Open Data only in the right context. Despite better knowledge, business people still associate the term “open” with “free” and “dubious provenance”. And given that hardly anybody has provided hard evidence of the ROI of open business models, the “open argument” counts for little in a time of decreasing economic prosperity.

So what is critical to getting Linked Data off the ground is to provide the corresponding business and licensing models for your Linked Data strategy. This includes having a good understanding of the assets you want to capitalise on. Given that metadata assets are still a novel and largely unexplored business field, which so far lacks a regulated supply and demand structure, there are still many structural obstacles that hinder the uptake of Linked Data. Providing more of the same in laissez-faire mode, as TimBL criticised at this year’s Web 2.0 Summit, might be inspiring for the in-crowd, but it might not be sufficient to build a Linked Data business.

Webinars about Business Use of Semantic Technologies

September 10, 2009 By: Thomas Schandl Category: Corporate Semantic Web, Enterprise 2.0, Knowledge Management, Linked Data & Open Data, Semantic Web Applications, Videos & Tutorials

The Semantic Web Company has created a series of online seminars (aka webinars) for you to acquire basic and practical knowledge about the methodologies, technologies and standards of the Semantic Web. In 90-minute sessions we will cover the business aspects of topics such as content engineering, Knowledge Management, business intelligence, e-Business and more.

In order to allow for a high level of interaction, attendance is limited to ten participants, and ample time is reserved for questions and discussion with our experts. Each webinar works as a stand-alone module, so you can pick and choose some of them or book the whole series of 6 webinars.

We’ll kick off with a session about Semantic Wikis on Thursday 22nd of October. A German-language version will be held at 9 a.m.; alternatively, you can attend an English version at 6 p.m. CET.

Each Thursday we cover a different topic such as Semantic Search, Corporate Thesaurus Management, Text Mining on the Corporate Semantic Web, Linking Open Data and Semantic Advertising.

To participate, you only need broadband internet access, Windows or a Mac, and a fairly up-to-date browser. For detailed system requirements, see the webinar overview.

We hope to talk to you in one or more of these sessions!

New W3C Rule Interchange Format (W3C RIF) standard published

July 28, 2009 By: Tassilo Pellegrini Category: Corporate Semantic Web, Vocabularies & Languages

The W3C Working Group on the Rule Interchange Format (RIF) has recently launched a new standard for the interchange of rules. Several members of the Corporate Semantic Web Working Group at Freie Universität Berlin were heavily involved. An interview on the practical aspects of RIF will follow in August.

REMINDER: Berlin Semantic Web Meetup & Industry Day

June 05, 2009 By: Tassilo Pellegrini Category: Conferences & Events, Corporate Semantic Web

A new Semantic Web Meetup will take place in Berlin on June 19, 2009, starting at 17:30. Please register to participate.

Before the Meetup, from 09:30 until 17:30, the Corporate Semantic Web Working Group of FU Berlin and the Semantic Web Company will hold a training workshop on the application of Semantic Web technologies in corporate settings. The workshop will be held in German.

Further details about place, time and the program can be found behind this link.

loomp supports structured annotation in corporate settings

April 20, 2009 By: Tassilo Pellegrini Category: Corporate Semantic Web, Enterprise 2.0, Knowledge Management, Tools & Software

Markus Luczak-Rösch and his team from FU Berlin have published loomp, a WYSIWYG annotation tool especially designed for in-house use. loomp aims at the Corporate Semantic Web market, providing a semantic application with low entry barriers and high usability, designed for non-techies.

When asked about the concrete application area Markus says:

We have found various use cases, especially in knowledge- and content-intensive domains. The most interesting one is the journalists’ use case. Consider journalists who research and write articles and editors who revise and publish the journalists’ work.

Journalists research specific topics on demand and access various information sources for this purpose, e.g. websites, books, related articles and human informants. Only a few journalists use digital devices for this task, and even fewer use information management systems. To transfer the finished article to the responsible editor at the publishing house, people use free-text documents and email. Finally, an editor revises and releases the articles for his department. loomp can help journalists to manage their notes, interview logs, references, addresses, etc. loomp helps to link an article to its information sources.

Read the full interview here.

Chris Bizer talks about the commercial opportunities of linked data

April 17, 2009 By: Tassilo Pellegrini Category: Corporate Semantic Web, Linked Data & Open Data, Mashups & Web services, Privacy & Information Ethics

In a recent interview, Prof. Chris Bizer from FU Berlin gave some insights into the commercial opportunities of linked data. In the short run he predicts three application areas:

I think we will see a growing number of applications that use data from the public Web as background knowledge to offer better search capabilities and to augment local content with additional content from the Web of Data.
[...]
Besides the classic search engines, there might also be market opportunities for new search engines that specialize in Linked Data. [...] This will allow them to sell access to cleaned views on the Data Web and to become central components within Linked Data applications.
[...]
Within the corporate market, there is interest in using Linked Data as a lightweight, pay-as-you-go data integration technology.

Additionally, Chris comments on the latest developments in the area of triple stores and D2RQ, and on the need for more privacy awareness and information accountability in an increasingly interlinked world.

Read the full interview on our homepage.

BBC Music relaunch: Linked Data goes Business?

April 08, 2009 By: Andreas Blumauer Category: Corporate Semantic Web, Linked Data & Open Data

Since SWC is involved in a couple of semantic web projects in the media industry, I was keeping an eye out for the BBC Music relaunch. Now the new platform is online, and from an end user’s perspective the new system offers comfortable ways to navigate through the world of music: bands, their members, biographies and outgoing links (e.g. to Wikipedia or MySpace) are retrieved from MusicBrainz and mashed up with BBC blogs, playlists or reviews.

Matthew Shorter, interactive editor for music at the BBC, told silicon.com:

We’re kind of on a journey of moving from what’s effectively a magazine/print publication-based metaphor around web publishing…to a world where we recognise that that’s not the way that people use the web.

No doubt: Linked Data is a great deal for end users, but what’s in it for the providers, in this case the BBC?

From a media company’s perspective, Shorter mentioned a handful of interesting arguments for why linked data could be useful:

  1. reusing data from MusicBrainz and Wikipedia also provides better value for the licence payer as the BBC isn’t wasting resources reproducing data already in the public domain
  2. from an SEO point of view, once we start generating a lot of meaningful links among our pages, then we’re going to improve the find-ability of our content via web search
  3. by having as open a platform as we can, then our hope at least is that people will pick up that content and do things with it and we’ll benefit from incoming links as a result

This could be summarised as follows (by adding a fourth item):

  1. re-use existing data
  2. increase find-ability
  3. extend your eco-system
  4. understand users’ interests

Since linked data can help providers to understand their users more profoundly, thanks to the more granular way in which information is offered in the linked data world (paradigm shift: page versus linked data), I’d like to ask a short, value-free question: which side of the internet will drive business in the future, the visible web or the deep web? Was linked data designed only for the visible web?

Keep the Semantic Web trusty

March 13, 2009 By: Thomas Thurner Category: Corporate Semantic Web, Mashups & Web services, Politics, Privacy & Information Ethics, Text Mining

In recent days, here at Semantic Web Company, we have had a lot of discussions on how the future of the Semantic Web (call it Web 3.0 if you like) will develop. Several stakeholders already see that the technical realisation of Web 3.0 will also bring a potential danger: the present possibility of creating applications and mashups with semantic technologies that are a real drain on privacy and information ethics. Without an underpinning discussion about the ethical framework for technologies like linked data, text mining, biometric systems and geo-systems in combination with the web of data, the whole domain is in danger of being doomed, like genetic engineering some years ago.

For public opinion of the Semantic Web, it is crucial to address the inherent risks regarding privacy and ethics. In this context I also see Tim Berners-Lee’s statement yesterday that “W3C wants to help make sure data use is appropriate.” Berners-Lee, who is director of W3C, said in an interview on Wednesday that the teams working on the Semantic Web project are making sure that privacy principles are included in its architecture: “The Semantic Web project is developing systems which will answer where data came from and where it’s going to — the system will be architectured for a set of appropriate uses.”

It may be an important step in keeping the further development of the Semantic Web trusty in the eyes of the public that the W3C has privacy and information ethics on its agenda and that people like Berners-Lee stake their reputation on it. But it is also crucial to build this awareness on the corporate side. Only if everyone within the domain follows a common ethical understanding will public opinion focus on the future potential of the Semantic Web rather than fear it.

Enterprise Search goes Open Source

February 19, 2009 By: Thomas Thurner Category: Corporate Semantic Web, Enterprise 2.0, Knowledge Management, Tools & Software

In his recent interview, Andreas Blumauer (SWC) asked Mario Lenz from the German-based knowledge management solution provider EMPOLIS about their open source initiative SMILA. As Lenz explained, SMILA operates in a domain of various approaches and already established solutions re. Enterprise Resource Planning systems. He therefore sees SMILA’s USP in “a standardized way of representing, accessing and managing unstructured data, which does not exist today. Rather, each vendor ships his own, proprietary solution. SMILA’s goals are to define and implement such a standard infrastructure framework and to establish a community bringing it forward.”

Besides providing insight into many aspects of the initiative, the interview offers thoughts on what connected business models for providing services could look like.
