Thomas Thurner

Cultural heritage and the Semantic Web

The Semantic Web is still suffering from a lack of data. To get the network effects we expect from the Semantic Web, quality content still needs to be opened up to it. One field where such an opening to the RDF world should happen is cultural heritage. Because works, people, history and references are distributed across various places, archives, libraries and data holders, a Semantic Web approach seems well suited to resolving many of the questions involved in making the world's cultural heritage available.

Europeana is one such promising project. It is funded by the European Commission under the eContent+ programme, as part of the i2010 policy, and is a partnership of 100 representatives of heritage and knowledge organisations and IT experts from throughout Europe. Over the last two years, Europeana’s prototype has been built, both technically and in terms of connecting content from various European museums, governmental organisations and art foundations. Two million books, maps, recordings, photographs, archival documents and paintings can be found at Europeana. With financial support from the European Commission, this figure is to be raised to 10 million entries by 2010, an effort which will cost approximately 350 million euros.

Under the lead of Stefan Gradmann (University of Hamburg), semantic technologies are being implemented both within the framework and towards the wider Semantic Web. Although the beta version of Europeana that is now running focuses on traditional browsing and search algorithms, an additional semantic Europeana prototype gives some insight into Europeana's further development towards a well-integrated Semantic Web service. So, hopefully, we can expect big content networks to be connected to the LOD cloud soon.

Projects like Europeana will make their way towards a rich web of data. Hopefully this is not a development that only public institutions follow. Commercial initiatives dealing with cultural heritage (say, Google) should also consider connecting their harvested data to a bigger Semantic Web.

Jana Herwig

Linked Data @ TRIPLE-I: Measuring the size of a fact, not of a fiction

The TRIPLE-I 2008 conference ended three days ago, yet there are a couple of loose ends I’d still like to tie up. First of all: Linked Data. Tom Heath was invited to give a keynote on “Humans and the Web of Data” – there are a variety of roles in which people may come across Tom and his LOD related work:

He administrates the site LinkedData.org (on behalf of the Linked Data community), he is the creator of Revyu.com (“Review anything!”), which won him the 1st prize in the Semantic Web Challenge 2007, he was a co-organizer of the Linked Data on the Web Workshop at this year’s World Wide Web conference in Beijing, and he was an interviewee in my 12 seconds definitions mission @ TRIPLE-I – see his micro definition of Linked Data in the vid below. (To learn more about Tom and the different roles he fulfils, look here).


Tom Heath explains Linked Data TRIPLE-I 2008 on 12seconds.tv

His keynote was not so much an introduction to Linked Data (I should expect that a conference like TRIPLE-I/I-Semantics would typically attract people who at least have an idea of what Linked Data is about), but rather a confirmation that the Web of Data is no longer a fiction, but a fact. One of the often cited proofs is the growth of the LOD dataset cloud over the last year, as shown in the image below (clicky for biggy, visualization created by Richard Cyganiak).

At the same time – and this was accordingly acknowledged by a later presentation given by Wolfgang Halb which had been prepared collaboratively by Tom, Wolfgang, Michael Hausenblas and Yves Raimond – it’s not just the sheer number of triples on the web that counts. Over the course of one year, the efforts of the Linked Data community (who seek to populate the web with open data, data in RDF) generated 4 billion triples – but only 3 million interlinks.

Their paper was an attempt to measure the size of the Semantic Web based on interlinks. A brief excerpt from the conclusion:

We have identified two different types of datasets, namely single-point-of-access datasets (such as DBpedia), and distributed datasets (e.g. the FOAF-o-sphere). At least for the single-point-of-access datasets it seems that automatic interlinking yields a high number of semantic links, however of rather shallow quality. Our finding was that not only the number of triples is relevant, but also how the datasets both internally and externally are interlinked. Based on this observation we will further research into other types of Semantic Web data and propose a metric for gauging it, based on the quality and quantity of the semantic links. We expect similar mechanisms (for example regarding automatic interlinking) to take place on the Semantic Web.
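
To make the difference between triples and interlinks concrete, here is a minimal Python sketch. It assumes that a triple counts as an interlink when its subject and object are URIs on different hosts. That is a simplification for illustration, not the metric used in the paper, and the sample triples and example.org/example.net hosts are made up.

```python
from urllib.parse import urlparse


def host(term):
    """Return the host of a URI term, or None for plain literals."""
    if not term.startswith(("http://", "https://")):
        return None
    return urlparse(term).netloc


def count_interlinks(triples):
    """Count triples whose subject and object are URIs on different hosts."""
    total = 0
    for subject, _predicate, obj in triples:
        subject_host, object_host = host(subject), host(obj)
        if subject_host and object_host and subject_host != object_host:
            total += 1
    return total


# Hypothetical data: one literal, one internal link, one link across datasets.
SAMPLE_TRIPLES = [
    ("http://data.example.org/city/1",
     "http://www.w3.org/2000/01/rdf-schema#label",
     "Graz"),
    ("http://data.example.org/city/1",
     "http://data.example.org/vocab/country",
     "http://data.example.org/country/7"),
    ("http://data.example.org/city/1",
     "http://www.w3.org/2002/07/owl#sameAs",
     "http://geo.example.net/place/42"),
]

if __name__ == "__main__":
    print(len(SAMPLE_TRIPLES), "triples,",
          count_interlinks(SAMPLE_TRIPLES), "interlinks")
```

Even in this toy dataset, only one of the three triples actually connects two datasets, which is the kind of gap the 4 billion vs. 3 million figures point to.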

Another point raised by Tom in his keynote was the issue of trust: According to his research, there are five parameters that influence whether or not we trust a source or recommendation on the web: experience, expertise, impartiality (we don’t trust a travel agent, because we can’t help but believe that she is mainly going to recommend the offers of her ‘favourite’ clients), affinity, and track record, with experience, expertise and affinity being the most important ones. A semantic people search engine Tom presented, Hoonoh.com (currently in alpha), thus allows users to weight search results according to these three criteria.
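
As a rough illustration of what weighting results by these criteria could look like, the following Python sketch ranks candidate sources by a weighted average of experience, expertise and affinity scores. The scoring scheme, the names and the numbers are all hypothetical; nothing here reflects how Hoonoh.com actually computes its rankings.

```python
def trust_score(scores, weights):
    """Weighted average of per-criterion scores (each assumed to be in 0..1)."""
    total_weight = sum(weights.values())
    if total_weight == 0:
        raise ValueError("at least one weight must be non-zero")
    weighted = sum(weights[name] * scores.get(name, 0.0) for name in weights)
    return weighted / total_weight


def rank(candidates, weights):
    """Return candidate names ordered from most to least trusted."""
    return sorted(
        candidates,
        key=lambda name: trust_score(candidates[name], weights),
        reverse=True,
    )


# Hypothetical people and scores, purely for illustration.
CANDIDATES = {
    "alice": {"experience": 0.9, "expertise": 0.4, "affinity": 0.2},
    "bob": {"experience": 0.3, "expertise": 0.8, "affinity": 0.7},
}

if __name__ == "__main__":
    # A user who cares mostly about expertise...
    print(rank(CANDIDATES, {"experience": 1, "expertise": 3, "affinity": 1}))
    # ...and one who cares mostly about first-hand experience.
    print(rank(CANDIDATES, {"experience": 3, "expertise": 1, "affinity": 1}))
```

The point of the sketch is only that the same set of candidates can come out in a different order depending on which criteria the person searching considers most important.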

Tom’s concluding statement emphasized that linking data makes sense not for its own sake, but for the sake of being at the service of humans: “A web of machine-readable data is even more interesting from a human than from a machine perspective,” for instance in search engines like Hoonoh.com.

Jana Herwig

A good data browser allows you to navigate the knowledge space by car

Or so I would like to paraphrase David Huynh’s words that I read today on the W3C’s Semantic Web mailing list, where he wrote in response to Michiel Hildebrand:

It’s very perceptive of you to ask about the tasks that Parallax is presumed to address, and who the users are. I don’t have a specific answer beside “browsing graph of data more efficiently”.

I tend to think that contemporary graph-based data browsers either fly the user at 50,000 feet and show her the whole world in one window below (render a huge data graph as a huge visual graph), or leave her at the street level to wander around on foot (single resource view). I’m just wishing to provide her a car. Perhaps the good thing is that the car doesn’t come with a destination built in. (It’d be quite bad in real life if you need different cars to go grocery shopping and to go to work, for example.)

I quite like this metaphor he uses to describe the motivation behind Parallax, the UI prototype David designed as a novel way to browse Freebase data. It also ties in nicely with a wish made by Richard Cyganiak in an interview with him we published yesterday:

On the top of my wish list would be a really good data browser. The current crop of data browsers for RDF, such as Tabulator, Disco and the OpenLink browser, are still very basic and geeky. I hope for some sort of “Excel for Web data”, an application that allows me to browse through different datasets, find the bits that are relevant to my problem, and lets me slice and dice and correlate the data in different ways. I think such an app would be key to the kind of serendipitous reuse I mentioned earlier.

In the mailing list post cited above, David pointed to the Spellbound blog, where Jeanne Kramer-Smyth published a showcase of faceted browsing across Olympic Games facts using Freebase Parallax. She suggested that Parallax would be particularly useful for exploring connected information:

Now take this idea to the world of archives and libraries, OPACs and finding aids and imagine the sorts of questions you can start asking. Yes – it does depend on the data being connected, but that is happening more and more all the time. The promise of the semantic web is structured data everywhere we turn.

Image via Wiki Commons

Jana Herwig

SWC’s Matthias Samwald contributes to W3C notes

Early June saw the release of two notes drafted by the Semantic Web Health Care and Life Sciences (HCLS) Interest Group within the W3C. One of the contributors, and editor of one note, is Matthias Samwald, a project coordinator at SWC, who is a member of this SIG and who has worked on several Semantic Web projects for the Yale Center for Medical Informatics (USA), Science Commons (USA) and DERI Galway (Ireland).

A Prototype Knowledge Base for the Life Sciences
W3C Interest Group Note 4 June 2008
Editors: M. Scott Marshall, Eric Prud’hommeaux
Contributors: Alan Ruttenberg, Jonathan Rees, Susie Stephens, Matthias Samwald, Kei-Hoi Cheung
Abstract: The prototype we describe is a biomedical knowledge base, constructed for a demonstration at Banff WWW2007, that integrates 15 distinct data sources using currently available Semantic Web technologies such as the W3C standard Web Ontology Language [OWL] and Resource Description Framework [RDF]. This report outlines which resources were integrated, how the knowledge base was constructed using free and open source triple store technology, how it can be queried using the W3C Recommended RDF query language SPARQL [SPARQL], and what resources and inferences are involved in answering complex queries. While the utility of the knowledge base is illustrated by identifying a set of genes involved in Alzheimer’s Disease, the approach described here can be applied to any use case that integrates data from multiple domains.

Experiences with the conversion of SenseLab databases to RDF/OWL
W3C Interest Group Note 4 June 2008
Editors: Matthias Samwald, Kei-Hoi Cheung
Contributors: Alan Ruttenberg, Huajun Chen
Abstract: One of the challenges facing Semantic Web for Health Care and Life Sciences is that of converting relational databases into Semantic Web format. The issues and the steps involved in such a conversion have not been well documented. To this end, we have created this document to describe the process of converting SenseLab databases into OWL. SenseLab is a collection of relational (Oracle) databases for neuroscientific research. The conversion of these databases into RDF/OWL format is an important step towards realizing the benefits of Semantic Web in integrative neuroscience research. This document describes how we represented some of the SenseLab databases in Resource Description Framework (RDF) and Web Ontology Language (OWL), and discusses the advantages and disadvantages of these representations. Our OWL representation is based on the reuse and extension of existing standard OWL ontologies developed in the biomedical ontology communities. The purpose of this document is to share our implementation experience with the community.
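
The basic idea of such a conversion can be sketched in a few lines of Python: each row becomes a resource, and each column becomes a property of that resource. The table, the column names and the example.org namespace below are invented for illustration and are not taken from SenseLab or from the note, which describes a richer mapping onto existing OWL ontologies.

```python
import sqlite3

BASE = "http://example.org/neuro/"  # hypothetical namespace, not SenseLab's


def escape_literal(value):
    """Escape a value for use inside an N-Triples string literal."""
    text = str(value)
    return text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def rows_to_ntriples(conn, table, key_column):
    """Map every row to a subject URI and every non-null column to one triple.

    Table and column names are trusted here; do not pass untrusted input.
    """
    cursor = conn.execute(f'SELECT * FROM "{table}" ORDER BY "{key_column}"')
    columns = [description[0] for description in cursor.description]
    lines = []
    for row in cursor:
        record = dict(zip(columns, row))
        subject = f"<{BASE}{table}/{record[key_column]}>"
        for column, value in record.items():
            if column == key_column or value is None:
                continue  # the key is in the URI; NULLs produce no triple
            predicate = f"<{BASE}{table}#{column}>"
            lines.append(f'{subject} {predicate} "{escape_literal(value)}" .')
    return lines


def demo():
    """Build a tiny in-memory table (made-up data) and convert it."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE neuron (id INTEGER PRIMARY KEY, name TEXT, region TEXT)"
    )
    conn.execute("INSERT INTO neuron VALUES (1, 'Purkinje cell', 'cerebellum')")
    conn.execute("INSERT INTO neuron VALUES (2, 'Mitral cell', NULL)")
    return rows_to_ntriples(conn, "neuron", "id")


if __name__ == "__main__":
    for line in demo():
        print(line)
```

A direct row-to-triple mapping like this yields plain RDF with string literals only. Typing the values, turning foreign keys into links between resources and aligning the properties with shared ontologies are the harder steps.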
