Thomas Thurner

KiWi Software Package Released – Call for KiWi Snow Camp

The 14th of October 2010 was a very special date for the KiWi project: after more than two and a half years of development, version 1.0 of the semantic collaborative knowledge management software was published. To celebrate, the project organized a release party in the planetarium in Vienna, Austria. It was a fine evening that featured speeches by Ross Gardler (Vice President Community, Apache Software Foundation) and David Ayers (Free Software Foundation Europe), followed by a demonstration of KiWi by Sebastian Schaffert (KiWi Project Lead).

KiWi, the Open Source development platform for building Semantic Social Media Applications, offers features required for Social Media applications such as versioning, (semantic) tagging, rich text editing, easy linking, rating and commenting, as well as advanced “smart” services such as recommendations, rule-based reasoning, information extraction, intelligent search and querying, a sophisticated social reputation system, vocabulary management, and rich visualisation.

To make sure that KiWi does not die after the EC-funded period ends, the project is working to form a community. The release party was thus also an opportunity to get in touch with the project team. Another opportunity to get in touch with the software and the developers behind it comes in February next year, when the KiWi Snow Camp will take place somewhere in the Salzburg mountains.

The KiWi project sponsors tickets to participate in the camp for all those who

  • have a good idea on how semantic technologies can make social media hit the target
  • and are inspired by the possibilities of the KiWi platform.

Together with the KiWi team, participants will meet in February 2011 in Salzburg’s mountains to develop ideas, program, discuss and build amazing new pieces of code, and of course enjoy the skiing experience. Not to mention receiving recognition from others in the open source communities and within the broader Semantic Web community.

How do I get my trip to the KiWi Snow Camp?

You will need to register as a participant for the KiWi Developer Challenge. Please email kiwimail@kiwi-community.eu to register your intention to participate in the Challenge; if you are not already registered on the KiWi Community site, please do so and include a brief biography.

Visit the KiWi Snow Camp page for more details…


Tassilo Pellegrini

Interview with Marco Neumann: “It’s definitely an exciting time to be on the Semantic Web!”

Marco Neumann is an Information Scientist and CEO of KONA, a consulting and technology service company based in New York City. The Semantic Web activist is an invited expert to the W3C HTML 5 working group. He recently started a discussion on the challenges and difficulties of bringing the Semantic Web into business. SWC asked him for some additional comments.

Marco, you recently initiated a discussion in a Google Group on the difficulty of changing Semantic Web standards. What was the background of the discussion? Where do you perceive a need for action?

It’s not so much about changing these existing standards as about the challenge of bringing them into the world of practitioners and standards developers. The language used in W3C recommendations quite frequently requires advanced topic knowledge and familiarity with the jargon of the discussion about the respective technologies. I recently discussed this with a senior standards maven at the W3C and got the answer that the recommendations can’t be changed retrospectively and that they are intended to be used primarily by vendors for implementation purposes.

Well, this might be the case, but I also got the impression that Tim Berners-Lee’s objective for the W3C is primarily to meet the needs of a larger community. And the W3C took this into account for most of the Semantic Web recommendations in the past. Something I still find amazing is the fact that the work process at the W3C is partially, and the recommendations are entirely, publicly accessible. Still, we definitely need more and better tools to work with Semantic Web data, higher quality documentation and, last but not least, more user adoption on the web.

Critics of the Semantic Web often refer to the slow uptake of Semantic Web standards by industry. Is standards adoption actually a valid and sufficient metric to evaluate the maturity of a standard? What would be needed to accelerate the uptake?

I think we might see a scenario similar to the uptake of HTML in the early 90s: a relatively small number of technology mavens will pave the way towards making the Semantic Web more attractive as a technology solution for a wide range of applications and will successfully publish open data before we see business application developers make use of Semantic Web standards.

The availability of trustworthy and quality-approved RDF data is crucial for the success of the Semantic Web. Given the fact that the aggregation business on the WWW is highly concentrated, the corresponding formula is simple: If Google just consumes but does not give back RDF, the Semantic Web won’t scale. Do you agree?

Yes and no. Yes, we need better and more semantic data on the Web, but we will also need better ways to deal with trust in a lightweight and web-friendly fashion. I currently see a number of semi-automated approaches emerging that could scale on the web. Examples are distributed user-based recommendation systems to validate authenticity, open Wikipedia-style community evaluation and content curation à la Freebase. Increased public accountability for data producers might be an interesting avenue as well. With regard to Google, I’d say web search engines will go where the web goes. A problem I might see arising is that web search engines will initially develop their own standards to deal with the emerging Semantic Web and confuse users on the web, or might pursue a time-consuming power play with the W3C. I see a little bit of that in the current discussion in the HTML 5 working group.

As we know from the social sciences, technological standards are necessary but always incomplete and unsatisfactory. From a standards design and outreach perspective: What would it take to make the Semantic Web flourish?

I’m not sure if we really know all that much about the laws of innovation and the evolution of technology standards at this point. If we draw from the short experience with the World Wide Web, I would come to the conclusion that innovation takes place in small to medium-sized teams that pursue an independent vision of how services should be delivered and how the technology should be designed. In addition, Tim Berners-Lee encourages the production of lots and lots of data to bootstrap the Semantic Web and create a pull for services in the industry. And indeed we really see some traction, for example with the Linked Open Data and Open Government initiatives. It’s definitely an exciting time to be on the Semantic Web!

About Marco Neumann

Marco Neumann is an Information Scientist and CEO of KONA, a consulting and technology service company based in New York City. KONA provides semantic technologies for business solutions and adds value to products and services in a highly networked economy. In addition, Marco currently acts as an Invited Expert to the W3C on the HTML 5 working group and is the director of the global semantic social network lotico.com.

Andreas Blumauer

TuQS QuadStore combines the best of two worlds

A new QuadStore which combines the best of two worlds (Lucene/Fulltext search engines & TripleStores/RDF/SPARQL) is out and can be evaluated online.

TuQS offers the following features:

  • SAIL accessible
  • True QuadStore with GraphSupport
  • HighSpeed regex SPARQL filters
  • Userrights on TripleBasis
  • Extendable to a QuintStore (or more generally to an n-Store)
  • Cachable SPARQL Queries for further speed improvement
  • Clusterable
  • Federationable
  • FullTextSearchable

Even quite complex queries run at high speed, e.g.:

SELECT ?s ?o
WHERE {
?s <http://www.w3.org/2004/02/skos/core#definition> ?o .
?o <http://www.turnguard.com/tuqs/function#BooleanTerm> 'Computer AND (java* OR HTML)'
}
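
For readers who want to try such a query from a script, the following is a minimal Python sketch of how it could be wrapped in a standard SPARQL Protocol GET request. The endpoint URL is a placeholder (the post does not state the real address), and the request is only built, not sent.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Placeholder endpoint: an assumption, not the real TuQS address.
ENDPOINT = "http://example.org/tuqs/sparql"

QUERY = """
SELECT ?s ?o
WHERE {
  ?s <http://www.w3.org/2004/02/skos/core#definition> ?o .
  ?o <http://www.turnguard.com/tuqs/function#BooleanTerm> 'Computer AND (java* OR HTML)'
}
"""


def build_sparql_request(endpoint: str, query: str) -> Request:
    """Build (but do not send) a SPARQL Protocol query request via GET."""
    url = endpoint + "?" + urlencode({"query": query})
    # Ask for the standard SPARQL JSON results format.
    return Request(url, headers={"Accept": "application/sparql-results+json"})


request = build_sparql_request(ENDPOINT, QUERY)
print(request.full_url)
```

Passing the request to `urllib.request.urlopen` would execute it against a real endpoint; whether TuQS accepts exactly this form is not confirmed by the post.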

The best starting point to find out what is special about TuQS is here: just click the sample queries on the right side and see how fast they perform, even on very simple hardware.

Next steps: The developer of TuQS, Jürgen Jakobitsch (aka Turnguard), is currently working on SAIL inferencing.

Tassilo Pellegrini

Linking Open Data to Thesaurus Management

The Vienna-based company punkt. netServices is just about to release a demo version of their PoolParty service, a SKOS-based thesaurus management tool with linked data capabilities. I had the chance to pre-read a white paper and test their service. Here is a brief overview. You can also try a demo.

Purpose

PoolParty was conceived to facilitate various applications like

  • Semantic search engines
  • Recommender systems (similarity search)
  • Corporate bookmarking
  • Annotation- & tag recommender systems
  • Autocomplete services and facetted browsing.

These use cases can be either achieved by using PoolParty stand-alone or by integrating it with existing Enterprise Search Engines and Document Management Systems or Enterprise Wikis.

Thesaurus Management

PoolParty aims to be easy to use for people without a strong Semantic Web background or special technical skills. The GUI is entirely web-based and utilizes AJAX, so the user can e.g. quickly merge two concepts via drag & drop. An overview of the thesaurus can be gained with a tree or a graph view of the concepts.

PoolParty also helps to semi-automatically add concepts to a thesaurus as it can be used to analyse documents (e.g. web pages or PDF files) relevant to a thesaurus’ domain in order to glean candidate terms. This is done by the key-phrase extractor of KEA. The extracted terms can be selected by the user, thereby becoming “free concepts” which later can be integrated into the thesaurus, turning them into “approved concepts”.

Documents can be searched in various ways: by keyword search in the full text, by searching for their tags, or by semantic search and similarity search. Semantic search takes into account not only a concept’s preferred label, but also its synonyms and the labels of its related concepts. The user can manually remove query terms used in semantic search, and the boost values for the various relations considered may also be adjusted. The recommendation mechanism for document similarity calculation works in the same way.
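
The query expansion described above can be illustrated with a small Python sketch. It is not PoolParty's implementation: the `Concept` class, the boost values and the naive substring scoring are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Concept:
    pref_label: str
    alt_labels: list = field(default_factory=list)  # synonyms
    related: list = field(default_factory=list)     # related Concept objects


# Hypothetical boost values per relation type; the real defaults are not given.
DEFAULT_BOOSTS = {"prefLabel": 1.0, "altLabel": 0.8, "related": 0.5}


def expand_query(concept, boosts=None, removed=frozenset()):
    """Expand a concept into weighted query terms, skipping removed terms."""
    boosts = boosts or DEFAULT_BOOSTS
    terms = {}

    def add(label, weight):
        if label not in removed:
            terms[label] = max(weight, terms.get(label, 0.0))

    add(concept.pref_label, boosts["prefLabel"])
    for label in concept.alt_labels:
        add(label, boosts["altLabel"])
    for other in concept.related:
        for label in [other.pref_label, *other.alt_labels]:
            add(label, boosts["related"])
    return terms


def score(document_text, terms):
    """Naive scoring: sum the boosts of all terms found in the text."""
    text = document_text.lower()
    return sum(w for label, w in terms.items() if label.lower() in text)


cocktail = Concept("Cocktail", ["Mixed drink"], [Concept("Rum")])
terms = expand_query(cocktail)
print(terms)
print(score("A mixed drink with rum", terms))
```

Calling `expand_query(cocktail, removed={"Rum"})` mimics the user removing a term, and passing a different `boosts` dictionary mimics adjusting the boost values.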

PoolParty by default also publishes a Semantic Wiki version of its thesauri, which provides an alternative way to browse and edit concepts. Through this feature anyone can get read access to a thesaurus, and optionally also edit, add or delete labels of concepts. Search and autocomplete functions are available here as well. The Wiki’s XHTML source is also enriched with RDFa, thereby exposing all RDF metadata associated with a concept to be picked up by RDF search engines and crawlers. (See two examples: Cocktail thesaurus, Standard Thesaurus for Economics)

PoolParty also supports the import of thesauri in SKOS (including several consistency checks) or Zthes format. These functionalities can also be consumed as stand-alone web services via PoolParty SKOS Services. Additionally, lists of concepts and their labels can be imported via CSV files.

Linked (Open) Data

PoolParty not only publishes its thesauri as Linked Open Data (in addition to a SPARQL endpoint), but it also consumes LOD in order to expand thesauri with information from LOD sources.

Concepts in the thesaurus can be linked to e.g. DBpedia via a service like Georgi Kobilarov’s DBpedia lookup service, which takes the label of a concept and returns possible matching candidates. The system suggests relevant resources from DBpedia and the user can select the one that matches the concept from their thesaurus, thereby creating a skos:exactMatch relation between the concept URI in PoolParty and the DBpedia URI. The same approach can be used to link to other SKOS thesauri available as Linked Data.

Other triples can also be retrieved from the target data source, e.g. the DBpedia abstract can become a skos:definition and geographical coordinates can be imported and be used to display the location of a concept on the map, where appropriate. The DBpedia category information may also be used to retrieve additional concepts of that category as siblings of the concept in focus, in order to populate the thesaurus.
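
As a rough illustration of what such a link amounts to in RDF terms, the Python sketch below emits the resulting statements as N-Triples. The concept URI and the abstract text are placeholders, and the helper names are hypothetical; this is not how PoolParty itself stores the data.

```python
SKOS = "http://www.w3.org/2004/02/skos/core#"


def escape_literal(text):
    """Escape a string for use inside an N-Triples literal."""
    return text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def link_triples(concept_uri, dbpedia_uri, abstract=None, lang="en"):
    """Return N-Triples lines linking a thesaurus concept to DBpedia."""
    lines = [f"<{concept_uri}> <{SKOS}exactMatch> <{dbpedia_uri}> ."]
    if abstract is not None:
        # The imported DBpedia abstract becomes the concept's skos:definition.
        literal = escape_literal(abstract)
        lines.append(f'<{concept_uri}> <{SKOS}definition> "{literal}"@{lang} .')
    return lines


triples = link_triples(
    "http://example.org/thesaurus/Mojito",   # placeholder concept URI
    "http://dbpedia.org/resource/Mojito",
    abstract='Placeholder abstract with a "quoted" word.',
)
print("\n".join(triples))
```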

PoolParty is capable of importing a SKOS thesaurus from a Linked Data server, and may also receive updates to thesauri imported this way. This feature has been implemented in the course of the KiWi project funded by the European Commission. KiWi also contains SKOS thesauri and exposes them as LOD. Both systems can read a thesaurus via the other’s LOD interface and may write it to their own store. This is facilitated by special Linked Data URIs that return e.g. all the top concepts of a thesaurus, with pointers to the URIs of their narrower concepts, which allows other systems to retrieve a complete thesaurus through iterative dereferencing of concept URIs.
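
The iterative dereferencing can be sketched as a simple breadth-first traversal. In the Python example below, the `dereference` function, the dictionary layout of a "document" and all URIs are assumptions; a real client would issue HTTP requests and parse RDF instead of reading from an in-memory dictionary.

```python
from collections import deque

# In-memory stand-in for a Linked Data server (all URIs are made up).
FAKE_SERVER = {
    "http://example.org/t/top": {"topConcepts": ["http://example.org/t/A",
                                                 "http://example.org/t/B"]},
    "http://example.org/t/A": {"label": "Cocktail",
                               "narrower": ["http://example.org/t/C"]},
    "http://example.org/t/B": {"label": "Rum drink",
                               "narrower": ["http://example.org/t/C"]},
    "http://example.org/t/C": {"label": "Mojito", "narrower": []},
}

calls = []


def dereference(uri):
    """Pretend to fetch a URI; records each call for inspection."""
    calls.append(uri)
    return FAKE_SERVER[uri]


def fetch_thesaurus(top_concepts_uri, dereference):
    """Collect every concept reachable from the top concepts."""
    queue = deque(dereference(top_concepts_uri)["topConcepts"])
    concepts = {}
    while queue:
        uri = queue.popleft()
        if uri in concepts:  # already fetched via another broader concept
            continue
        doc = dereference(uri)
        concepts[uri] = doc
        queue.extend(doc.get("narrower", []))
    return concepts


thesaurus = fetch_thesaurus("http://example.org/t/top", dereference)
print(sorted(doc["label"] for doc in thesaurus.values()))
```

The `concepts` dictionary doubles as a visited set, so a concept with two broader concepts is fetched only once.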

Additionally, KiWi and PoolParty publish lists of concepts created, modified, merged or deleted within user-specified time frames. With this information the systems can learn about updates to one of their thesauri in an external system. They can then compare the versions of concepts in both stores and may write the corresponding updates to their own store.

This means each system decides autonomously which data it accepts, and there is no risk of a system pushing data that might lead to inconsistencies into an external store. Data transfer and communication are achieved using REST/HTTP; no other protocols or middleware are necessary. Also, no rights management for external systems is needed, which otherwise would have to be configured separately for each source.
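
A pull-based update of this kind could look roughly like the following Python sketch. It only covers created and modified concepts (not merges or deletions), and the `ConceptVersion` record, the timestamp comparison and the `accept` policy are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConceptVersion:
    uri: str
    label: str
    modified: str  # ISO 8601 UTC timestamp; same format compares as text


def pull_updates(local, remote_changes, accept=lambda change: True):
    """Apply remote changes that are newer than local and pass `accept`."""
    applied = []
    for change in remote_changes:
        current = local.get(change.uri)
        if current is not None and current.modified >= change.modified:
            continue  # local version is at least as recent
        if not accept(change):
            continue  # the receiving system decides what it takes
        local[change.uri] = change
        applied.append(change.uri)
    return applied


A, B, C = ("http://example.org/t/A", "http://example.org/t/B",
           "http://example.org/t/C")
local_store = {
    A: ConceptVersion(A, "Mojito", "2010-10-01T00:00:00Z"),
    C: ConceptVersion(C, "Rum", "2010-10-20T00:00:00Z"),
}
changes = [
    ConceptVersion(A, "Mojito (cocktail)", "2010-10-14T00:00:00Z"),  # newer
    ConceptVersion(B, "", "2010-10-14T00:00:00Z"),                   # rejected
    ConceptVersion(C, "Rhum", "2010-10-05T00:00:00Z"),               # stale
]
applied = pull_updates(local_store, changes, accept=lambda c: c.label != "")
print(applied)
```

Because the receiving side runs the loop and owns the `accept` policy, the publishing system never writes into a foreign store.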

Technology

The software is written in Java and utilizes the SAIL API, so it can be used with various triple stores. The thesaurus management itself (viewing, creating and editing SKOS concepts and their relationships) can be done in an AJAX Frontend based on Yahoo User Interface (YUI). Editing of labels can alternatively be done in a Wiki style HTML frontend. For key-phrase extraction from documents PoolParty uses a modified version of the KEA 5 API, which is extended for the use of controlled vocabularies stored in a SAIL Repository (this module is available under GNU GPL). The analysed documents can be stored and indexed in Lucene/Solr or any other (enterprise) search system along with extracted and semantically related concepts.
