Jana Herwig

Just released: UMBEL – A New Vocabulary for the Semantic Web

News has reached me this morning that UMBEL has now been publicly released! UMBEL is a new vocabulary for the Semantic Web – I first learned about it when Andreas Blumauer returned from LinkedData Planet, where he had met up with Mike Bergman from Zitgist LLC, who are working on UMBEL.

Here is the release announcement Mike communicated via email yesterday:

UMBEL (Upper Mapping and Binding Exchange Layer) [1] is a lightweight ontology for relating Web content and data to a standard set of 20,000 subject concepts. Based on OpenCyc [2], these subject concepts have defined relationships between them, and can act as semantic binding nodes for any data or Web content. A further 1.5 million named entities have been extracted from Wikipedia and mapped to the UMBEL reference structure with cross-links to YAGO [3] and DBpedia [4]. The system can easily be extended with additional dictionaries of named entities, including ones specific to enterprises or domains.

UMBEL is provided as open source under the Creative Commons 3.0 Attribution-Share Alike license. The complete ontology with all subject concepts, definitions, terms and relationships can be freely downloaded [see 5]. All subject concepts and named entities are available as Linked Data [see 5]. Five volumes of documentation [5] are also available.

The release is accompanied by about a dozen Web services [6] for using or manipulating UMBEL, along with a new introductory slide show [7]. Additional release information may be found on Fred’s [8] or my [9] separate blog postings. We welcome those with interest or suggestions for improvements to do so through the UMBEL discussion forum [10]. We will shortly be putting easier services online for such input.

So, enjoy! We look forward to your commentary, suggestions and putting UMBEL under production-grade stress. We know we will be doing the same!

Regards, Mike

Great release! They have also given us access to a media-oriented article which you can read on our portal.

Jana Herwig

SWC’s Matthias Samwald contributes to W3C notes

Early June saw the release of two notes drafted by the Semantic Web Health Care and Life Sciences (HCLS) Interest Group within the W3C. One of the contributors, and editor of one note, is Matthias Samwald, a project coordinator at SWC, who is a member of this SIG and who has worked on several Semantic Web projects for the Yale Center for Medical Informatics (USA), Science Commons (USA) and DERI Galway (Ireland).

A Prototype Knowledge Base for the Life Sciences
W3C Interest Group Note 4 June 2008
Editors: M. Scott Marshall, Eric Prud’hommeaux
Contributors: Alan Ruttenberg, Jonathan Rees, Susie Stephens, Matthias Samwald, Kei-Hoi Cheung
Abstract: The prototype we describe is a biomedical knowledge base, constructed for a demonstration at Banff WWW2007, that integrates 15 distinct data sources using currently available Semantic Web technologies such as the W3C standard Web Ontology Language [OWL] and Resource Description Framework [RDF]. This report outlines which resources were integrated, how the knowledge base was constructed using free and open source triple store technology, how it can be queried using the W3C Recommended RDF query language SPARQL [SPARQL], and what resources and inferences are involved in answering complex queries. While the utility of the knowledge base is illustrated by identifying a set of genes involved in Alzheimer’s Disease, the approach described here can be applied to any use case that integrates data from multiple domains.

Experiences with the conversion of SenseLab databases to RDF/OWL
W3C Interest Group Note 4 June 2008
Editors: Matthias Samwald, Kei-Hoi Cheung
Contributors: Alan Ruttenberg, Huajun Chen
Abstract: One of the challenges facing Semantic Web for Health Care and Life Sciences is that of converting relational databases into Semantic Web format. The issues and the steps involved in such a conversion have not been well documented. To this end, we have created this document to describe the process of converting SenseLab databases into OWL. SenseLab is a collection of relational (Oracle) databases for neuroscientific research. The conversion of these databases into RDF/OWL format is an important step towards realizing the benefits of Semantic Web in integrative neuroscience research. This document describes how we represented some of the SenseLab databases in Resource Description Framework (RDF) and Web Ontology Language (OWL), and discusses the advantages and disadvantages of these representations. Our OWL representation is based on the reuse and extension of existing standard OWL ontologies developed in the biomedical ontology communities. The purpose of this document is to share our implementation experience with the community.

Jana Herwig

Combining Closed and Open Data Classification Mechanisms in an Extended Thesaurus

In the next session, Rolf Sint gave us insights into his approach to combining closed and open data classification mechanisms, which is informed by the findings of his master’s thesis. Probably the most widely used retrieval method for digital content is full-text search; Google’s and Yahoo’s indexing methods, for instance, rely on it. For this method to work, the search words must be contained in the content, which leads to obvious problems with synonyms, ambiguities or the differing vocabularies of different languages. The advantages are that full-text search is easy to use and that no maintenance is required, as this responsibility rests with the content providers.

On the other end of the spectrum, within open data classification mechanisms, we have social tagging. Tagging (in general) means that a user assigns labels to content items. The advantage here is that content is immediately classified; as such, tagging is an easy way to provide metadata for content, in particular as the user does not have to think about (arbitrary, system-dictated) structures. However, this leads to problems if singulars and plurals are used side by side, if synonyms are used, if spelling mistakes occur, and so on. With tags, the exact same spelling has to be used if items are to be assigned to the same group. But if done collectively (and that is what social tagging is about), the wisdom of crowds can improve the signal-to-noise ratio significantly – see the miracle of the tag cloud.

What Rolf proposed in his thesis was to combine the two approaches. In his design, he used an extended thesaurus as an instrument to achieve vocabulary control – we’re looking at an extended thesaurus here, because it’s not simply built around a taxonomy, but expanded by tags that were assigned by users and integrated using a vocabulary management tool.
[Image: Extended Thesaurus]

This extended thesaurus can be applied in multiple ways.
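To make the idea of vocabulary control a bit more tangible, here is a minimal Python sketch of how free user tags might be folded into a controlled vocabulary. It is not taken from Rolf's thesis: the class `ExtendedThesaurus`, its methods and the sample tags are all hypothetical names chosen for illustration, and a real vocabulary management tool would do far more (hierarchies, languages, review workflows).

```python
from collections import defaultdict


class ExtendedThesaurus:
    """Toy thesaurus: every known label points to one preferred term."""

    def __init__(self):
        # cleaned label (lowercase, trimmed) -> preferred term
        self._preferred = {}

    def add_concept(self, preferred, variants=()):
        """Register a controlled term plus known synonyms or plural forms."""
        for label in (preferred, *variants):
            self._preferred[label.strip().lower()] = preferred

    def add_user_tag(self, tag, preferred):
        """Integrate a free user tag as a variant of an existing concept."""
        if preferred not in self._preferred.values():
            raise KeyError(f"unknown concept: {preferred}")
        self._preferred[tag.strip().lower()] = preferred

    def resolve(self, tag):
        """Return the preferred term, or the cleaned tag itself if unmapped."""
        key = tag.strip().lower()
        return self._preferred.get(key, key)


def group_by_concept(taggings, thesaurus):
    """Group (item, tag) pairs under the concept each tag resolves to."""
    groups = defaultdict(set)
    for item, tag in taggings:
        groups[thesaurus.resolve(tag)].add(item)
    return dict(groups)


thesaurus = ExtendedThesaurus()
thesaurus.add_concept("hotel", variants=["hotels", "inn"])
thesaurus.add_user_tag("hotell", "hotel")  # a misspelling seen in user tags

taggings = [
    ("page1", "Hotels"),
    ("page2", "hotell"),
    ("page3", "inn"),
    ("page4", "wiki"),
]
groups = group_by_concept(taggings, thesaurus)
```

With plain tags, the first three pages would end up in three different groups; resolved through the thesaurus, they all land under `hotel`, while the unmapped tag `wiki` simply stays as it is.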

Jana Herwig

Usage Data Model Day in the KiWi Project

Yesterday we dealt with reports, user interaction and interface questions; today is usage data model day (or morning) in the KiWi – Knowledge in a Wiki – project. A usage data model is an abstract conceptualization of the data as perceived by the user (and not by the developer/implementer); at the same time, it is not immediately concerned with the visualization of data on screen. François Bry gave us an overview of the proposed core concepts and objects, which are currently: content item, tag (and tagging), link, rule, user, and access right.

There is no need for me to repeat his full presentation, as François had already made it available in advance on the KiWi project wiki. Nonetheless, I’d like to highlight a few aspects:

A content item is to be understood as a slight generalisation of a wiki page: every wiki page is a content item, but not every content item is a wiki page, and content items that are not wiki pages are part of a wiki page. This could include, for instance, media content such as pictures, diagrams or tables. This modularization (content items within pages) meets the requirement in the proposal that KiWi pages must be composable.

Consequently, not only wiki pages but content items too must be taggable (which takes us to: tagging). Furthermore, it was proposed to make a distinction between atomic tags (short; consisting of a tag name and an associated content item instead of a description) and structured tags (that are made up of atomic tags), as well as between explicit tags (that are applied by users) and implicit tags (that are generated on the basis of rules that have been defined by users).
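For readers who think better in code, here is a rough Python sketch of how content items and the explicit/implicit distinction could be modelled. This is my own reading of the concepts, not KiWi's actual data model: all class, field and function names are hypothetical, and I assume that a content item without a parent is a wiki page and that a rule can be modelled as a plain function.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass(eq=False)
class ContentItem:
    """A wiki page, or a fragment (picture, table, ...) nested in a page."""
    title: str
    parent: Optional["ContentItem"] = field(default=None, repr=False)
    children: List["ContentItem"] = field(default_factory=list)
    taggings: List["Tagging"] = field(default_factory=list)

    def is_wiki_page(self) -> bool:
        # Assumption: a content item without a parent is a wiki page.
        return self.parent is None

    def embed(self, child: "ContentItem") -> "ContentItem":
        """Make `child` a part of this item (pages are composable)."""
        child.parent = self
        self.children.append(child)
        return child


@dataclass(eq=False)
class AtomicTag:
    name: str
    described_by: ContentItem  # a content item takes the place of a description


@dataclass(eq=False)
class Tagging:
    tag: AtomicTag
    explicit: bool  # True: applied by a user; False: generated by a rule


# Assumption: a user-defined rule is a function returning a tag or None.
Rule = Callable[[ContentItem], Optional[AtomicTag]]


def apply_rules(item: ContentItem, rules: List[Rule]) -> None:
    """Attach an implicit tagging for every rule that fires on `item`."""
    for rule in rules:
        tag = rule(item)
        if tag is not None:
            item.taggings.append(Tagging(tag, explicit=False))


page = ContentItem("Vienna trip")
photo = page.embed(ContentItem("Hotel photo"))

hotel = AtomicTag("hotel", ContentItem("About the tag 'hotel'"))
media = AtomicTag("media", ContentItem("About the tag 'media'"))

# Explicit tag: a user applies it directly.
photo.taggings.append(Tagging(hotel, explicit=True))


# Implicit tag: a (hypothetical) rule tags every embedded item as media.
def tag_embedded_as_media(item: ContentItem) -> Optional[AtomicTag]:
    return media if not item.is_wiki_page() else None


apply_rules(photo, [tag_embedded_as_media])
apply_rules(page, [tag_embedded_as_media])
```

After this runs, the photo carries one explicit tag (`hotel`) and one implicit tag (`media`), while the page itself carries none. Because each atomic tag points to a content item rather than a text description, that content item can be tagged in turn.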

To illustrate this distinction, I’ll paste in a few explanations from François’ wiki report:

The tags assigned to the content item of an atomic tag T can be seen as tags assigned to the atomic tag T itself. Tagging of tags in this way can serve, for example, to distinguish between the atomic tag “hotel” in English and the same atomic tag “hotel” in French or to group or classify tags. [...] A structured tag is built up from atomic tags. [...] Examples of structured tags are as follows:

hotel(3stars downtown)
hotel(location(downtown))
hotel(comfortable)
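The notation in these examples is regular enough to be parsed mechanically. The following Python sketch shows one way to turn such a string into a nested structure. The grammar is my own guess from the three examples (a name, optionally followed by whitespace-separated parts in parentheses), and `StructuredTag` and `parse_tag` are hypothetical names, not part of the KiWi proposal.

```python
import re
from dataclasses import dataclass, field
from typing import List, Tuple

# A token is a single parenthesis or a run of other non-space characters.
TOKEN = re.compile(r"\s*([()]|[^\s()]+)")


@dataclass
class StructuredTag:
    name: str
    parts: List["StructuredTag"] = field(default_factory=list)

    def is_atomic(self) -> bool:
        return not self.parts


def parse_tag(text: str) -> StructuredTag:
    """Parse e.g. 'hotel(location(downtown))' into nested StructuredTags."""
    tokens = TOKEN.findall(text)
    tag, pos = _parse(tokens, 0)
    if pos != len(tokens):
        raise ValueError(f"unexpected token {tokens[pos]!r}")
    return tag


def _parse(tokens: List[str], pos: int) -> Tuple[StructuredTag, int]:
    if pos >= len(tokens) or tokens[pos] in ("(", ")"):
        raise ValueError("expected a tag name")
    tag = StructuredTag(tokens[pos])
    pos += 1
    if pos < len(tokens) and tokens[pos] == "(":
        pos += 1  # consume "("
        while pos < len(tokens) and tokens[pos] != ")":
            part, pos = _parse(tokens, pos)
            tag.parts.append(part)
        if pos >= len(tokens):
            raise ValueError("missing closing parenthesis")
        pos += 1  # consume ")"
    return tag, pos


flat = parse_tag("hotel(3stars downtown)")
nested = parse_tag("hotel(location(downtown))")
simple = parse_tag("hotel(comfortable)")
```

Here `flat` has two atomic parts, `3stars` and `downtown`, whereas `nested` has a single part `location`, which itself contains `downtown`. A tag without parentheses parses to a `StructuredTag` with no parts, i.e. an atomic tag.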

A heated debate ensued (which I quite like, because that is the point where our own, as yet unchallenged assumptions are exposed), in particular with regard to the implementation of structured tags: Wouldn’t it raise the cognitive barrier too high if users were required to enter complicated tags?

Much was clarified with the agreement that users may use structured tags, but that this wouldn’t be a requirement. Using complex tags (e.g. a structured tag that includes dates or deadlines) might make sense to a particular set of users (e.g. project managers in the Logica use case) – and whether a software feature is going to be used (successfully) depends primarily on whether the user sees a benefit in it. Also: The concept of structured tags within the data model does not yet say anything about the way they will be represented on screen – in most cases, users won’t see a hotel(location(downtown)) spelled out.

On to the coffee break!

[Image: Physical tagging on a tree, by Jean Etienne Poirrier]
