Jana Herwig

Session 4: Using the Web of Data [WOD-PD]

This morning’s first session was dedicated to Using the Web of Data, or, as Alan Dix put it: “In the end, it’s not about data – it’s about use!” Alan and Richard Cyganiak were the keynoters for this session.

Alan Dix is a Professor at the Computing Department of Lancaster University, and author (with Janet Finlay, Gregory Abowd, and Russel Beale) of Human-Computer Interaction.

To start with, Alan pointed to the two sides of achieving the web of data: Firstly generating the web of data (a billion triples, as mighty as this may sound, is actually tiny, says Alan) and then, secondly, accessing the web of data.

Alan Dix giving a talk

With regard to generating the Web of Data, Alan distinguished between top down and bottom up approaches, counting to the former the creation of the web of data from legacy sources (i.e. where you take existing data and semantically lift them, e.g. from structured data) or web scraping such as DBpedia‘s extraction of data from Wikipedia.

N.B.: This notion of ‘top-down’ does not imply a hierarchical relationship, but rather means that there is already a plan for what is going to be put on the web of data (e.g. ‘all semi-structured information on Wikipedia’ or ‘dataset XY from project Z’). The bottom-up idea here implies that data is added as the result of an action, or interaction, as the user/s go, e.g. relationships are created as the user expands his or her social network. For instance on Amazon, user interaction is used to generate semantics: People do not tell Amazon what they like, they simply buy it.

Having relationships of course does not imply yet that these relationships are part of the Semantic Web. Or, as Alan put it, “why should I be RDFizing my online presence if none of my friends are?”

Please take a look at the PDF of the Alan’s slides (2,4 MB) – what I cannot reproduce here is a chart he developed, which was very useful for describing current scenarios on the web and which posed a twofold question:

Does a website/platform have the web of data implemented? YES/NO
Is the web of data on ta website/platform apparent to the user? YES/NO

The possible combinations (YES/YES, YES/NO, NO/YES, NO/NO) provide a good heuristic tool for describing what is currently available, with and without the Semantic Web. Take, for instance, the shiny interface of Talis’ Project Cenote: Cenote’s vision is to “make library data visible in many contexts, inside and outside of the library, making the data much more accessible and visible to a wider audience – benefiting current and potential users of library services wherever they are.” On Cenote, the user doesn’t see that it’s got the Web of Dat in it – it is actually implemented, but not in a way that is apparent to the user.

On the other end of the spectrum, you have a platform like Facebook: Alan referred to Facebook as “the user’s own web of data”, i.e. web of relationships: The user is aware of these relationships (they actually shape his interaction and communication with the site), and the (numerous!) apps on Facebook continually add relationships, but, regrettably, insulated from one another and not using RDF (and don’t you try to take data out of Facebook!).

Two examples of public data that Alan cited and that grow as people/institutions add data do them are Freebase (the “open database of the world’s information” – see previous posts on this blog about Freebase) and Swivel. Swivel allows people, institutions, anyone to upload and explore data, also featuring official data sources such as (links go to their Swivel pages): New York Federal Reserve Bank, UNESCO Institute for Statistics, DukeResearch or EUROSTAT. According to Alan, there is already more data on Swivel now than in the whole Linked Data cloud.

Alan also mentioned the Social Graph API – o yesterday evening Luca Hammer (one of the web 2.0 people who had joined the Open Hacking Session) introduced me to the WordPress Plugin “Meet your commenters” – Meet you commenters uses Social Graph to find social relations on the web, and adds these data to the commenter profiles it creates in WordPress.

Two Christmas crackersImage via WikipediaOn a different note: I took sometime today to explore Alan’s homepage and found the cute Christmas Cracker’s application which was first developed in 1999 and which is now also available on Facebook. As trivial as it may sound at first – sending virtual Christmas Crackers (with more than 5000 possible combinations!) is a good showcase for developing Human Interaction Scenarios, and a number of papers have been written about the application. Here is the casestudy which Alan recommends to begin with: Designing experience – virtual Christmas Crackers.

The abstract and a list of links to all websites and demos Alan discussed can be found here. Full reference: A. Dix and R. Cyganiak (2008). Using the Web of Data. Keynote at WOD-PD 2008 | Web of Data Practitioners Days, Vienna, Austria – Oct 22-23, 2008. http://www.hcibook.com/alan/papers/WOD-PD-2008/

Even if you have not met Richard Cyganiak in person, you have certainly come across one of his creations: The Linked Data Cloud. Richard is a research assistant at DERI Galway. In his demo, he gave us the opportunity to gain hands on experience, introducing a tool he dubbed Snorql, which is basically an easier to use version of a SPARQL-endpoint, as it already has the required prefixes ‘pre-installed’:

Using the Snorql interface, we could explore the dataset we had created collaboratively during Keith Alexander and Yves Raimond’s session. Writing SPARQL queries manually can be a challenge, but is next to impossible if you (like me) don’t know the syntax. But today we could just copy and paste all the queries from a website Richard had put up prior to his session – thanks a lot for the excellent preparation and demonstration!

Richard also showed a couple of RDF browsers in action, e.g. the Tabulator Plugin (“a Firefox extension which allows Firefox to handle data as well as documents”), or the Marbles Linked Data browser which is running right on beckr.org/marbles; enter, for instance http://api.talis.com/stores/wod-pd-sandbox/items/People/JanaHerwig (learn more about Marbles here).

Thank you, Alan and Richard – the combination of talk and demo was indeed a perfect intro towards using the Web of Data.

Reblog this post [with Zemanta]
Pascal Hitzler

ISWC is near

Reporting from Karlsruhe. Final preparations are under way for an exciting week full of Semantic Web.

  • Saturday, October 25th: ISWC08 Doctoral Consortium
  • Sunday, October 26th and Monday, October 27th: ISWC08 Tutorials and Workshops. I’m co-organising one of them, on Monday, called Nature Inspired Reasoning for the Semantic Web, which will feature two invited speakers from natural computing (Jürgen Branke, University of Karlsruhe) and cognitive science (Kai-Uwe Kühnberger, University of Osnabrück), with the goal to provide bridges between their areas of research and Semantic Web reasoning.
  • In parallel, on Sunday, October 26th and Monday, October 27th, there will be the co-located Fifth International Workshop on OWL – Experiences and Directions. Theorists and practitioners will gather to discuss the current state of OWL and future directions in making it an even more successful paradigm for the Semantic Web. If you’re at all interested in OWL, you don’t want to miss it.
  • Tuesday, October 28th to Thursday, October 30th are the main days of ISWC08 with an exciting technical program, featuring keynotes by Ramesh Jain, UCI, John Giannandrea, Metaweb Technologies Inc. and  Stefan Decker, DERI. Winners of the Semantic Web Challenge and the Billion Triple Challenge will be awarded on Wednesday afternoon. The complete programme booklet is available online.
  • Friday, October 31st and Saturday, November 1st, the Second International Conference on Web Reasoning and Rule Systems, RR2008, will take place at the same venue. One of the keynote talks, given by Michael Kifer, will be joint with RuleML-2008 in Orlando, Florida via teleconference!

So I expect to see you all soon in Karlsruhe. And in case you haven’t planned the trip yet – there’s still time enough to pack your suitcase and come join us for the events.

Jana Herwig

Bringing (Legacy) Data to the Web [WOD-PD]

The third session at WOD-PD was dedicated to “Bringing (Legacy) Data on the Web“, and led by Sören Auer (University of Leipzig, Germany) and Orri Erling (OpenLink Software) .

Sören Auer giving a talkSören Auer described the difference between the Web 1.0, 2.0 and 3.0 as follows: On the Web 1.0, you had many websites that provided unstructured, mainly textual content. On the Web 2.0, you have a few large websites that are specialised on specific content types. And, finally, on the Web 3.0, there are many websites which contain, and are able to semantically syndicate, arbitrarily structured content.

So why would we need another web? What you cannot do with the current web is finding answers to seemingly complex, yet in reality pretty mundane question such as: Where in Leipzig do I find an apartment that is close to bilingual, German-French child care facilities? Are there any ERP service providers which have offices in Vienna and Berlin? Who are the researchers in South-East Asia currently working on database related topics?

Sören further discussed three of the present means of bringing relation data to the web: Triplify (a web application plugin that exposes data from relational databases in RDF), D2RQ (a declarative language to describe mappings between relational database schemata and OWL/RDFS ontologies, developed at Free University Berlin), and Virtuoso Universal Server (a middleware and database engine hybrid delivering for instance data integration for SQL, RDF, XML, Web Services). With respect to Triplify, Sören – who is Triplify’s founder and main developer at AKSW Uni Leipzig – showed and discussed the configuration for WordPress 2.1., which can be found here (click here for more configurations, e.g. for Joomla, OpenConf and Drupal). The next aim for Triplify is to become an integral part in enduser web app distibutions.

And important question raised by Sören was: How do next generation search engines know that something has changed on the web of data? He suggested three approaches:

  1. Always try to crawl everything (this may sound silly – but that’s actually what is happening on the current web)
  2. Ping a central update notification service – e.g. PingTheSemanticWeb.com – which works as a showcase, but will probably not scale if the data web gets really deployed.
  3. Each linked data endpoint publishes an update log – e.g. with Triplify, as a special folder inside the Triplify namespace, e.g. http://example.com/Triplify/update

Also discussed by Sören and worth checking out is Reuters’ Semantic proxy – the demo went live in late September.

Orri Erling, as the lead developer of the Virtuoso Team, addressed the issue of mapping relational databases to RDF with OpenLink Virtuoso. In his talk, he addressed the pros and cons of RDF data warehouse:

Pros

  • Even query performance across all data
  • Possibility of forward-chaining inference
  • Some SPARQL features may be better supported, e.g. Unspecified predicates

Cons

  • Keeping data up-to-date
  • Complex set up, needs dedicated servers: you don’t build them on a whim

Orri Erling giving a talkWhat Virtuoso delivers is mapping of SPARQL to SQL against any existing schema (whether stored in Virtuoso or elsewhere); a physical quad-store (quad as in quadruple; not as in quad-bike :) ; and Federated/local Relational Data Base Management Systems (RDBMS).

A more detailed discussion of the requirements for Relational-to-RDF Mapping is available on Orri’s blog, where he discusses it in the light of his own experience. A power point presentation of a previous talk he gave to the W3C RDB2RDF Incubator Group can be downloaded here: Mapping Relational Databases to RDF with OpenLink Virtuoso (PPT, 115KB). His summary of the group discussions around the same topic, Requirements for Relational to RDF Mapping, can be found here.

Orri also showed the Virtuoso billion triples demo which, according to the corresponding blogpost, “is being worked on at the time of submission and may be shown online by appointment.” The demo was a submission to the Billion Triples Challenge.

Reblog this post [with Zemanta]
Jana Herwig

Semantic Desktop, Lifting and Human Language Technology [WOD-PD]

The next session at WOD-PD was given by Leo Sauermann (German Research Center for Artificial Intelligence DFKI, Germany), and Brian Davis (DERI Galway, Ireland). Leo introduced the idea of the Semantic Desktop, and more specifically, the Nepomuk Social Semantic Desktop. There’s good article about Nepomuk on Linux.com, written by Bruce Byfield on August 26, 2008, from which I quote the following, enlightening passages:

Ansgar Bernardi, deputy head of the Knowledge Management Department at Deutsches Forschungszentrum für Künstliche Intelligenz (DFKI, or the German Research Center for Artificial Intelligence) and Nepomuk’s coordinator, explains, “The basic problem that we all face nowadays is how to handle vast amounts of information at a sensible rate.” [...] “The point is, you have a vast amount of information on your desktop, hidden in files, hidden in emails, hidden in the names and structures of your folders. Nepomuk gives a standard way to handle such information.”

At a high level of generalization, Nepomuk has three main aspects, according to Bernardi. First, there is a standard framework for annotating pieces of information so that connections can be made between them. Second, there are ontologies, the sets of “documented shared understanding” or common concepts that can be defined for particular types of information, such as bio-science or computer desktop use. Finally, there are the tools for making or using the annotations and ontologies, what Bernardi calls the “workspaces that connect to other workspaces and help you in your day to day activities of collecting information, structuring it, making sense of it, and creating new information and communicating it.”

Leo has provided the relevant download links for those who “want to get their hands dirty” with Nepomuk (as he put it) on his blog. Leo Sauermann and Ansgar Bernardi also contributed an article about the Semantic Desktop to the recently published Social Semantic Web volume – a preview of the article is available here (in German – I’m sorry!).

Brian Davis‘ part of the talk focused on Lifting and Human Language Technology (HLT) for the Semantic Desktop – Semantic Lifting means to capture semantics and translate them into ontologies. Human language technology (HLT), in its broadest sense, can be described as computational methods for processing and manipulating language (for instance text analysis).

One of the goals of the Semantic Desktop is speech act detection for email – speech act here as defined by John Searle. At its most basic definition, a speech act is simply an utterance, but is also often understood more specifically as an illocutionary act (which is a term introduced by John L. Austin in How to do things with words), or a ‘performative utterance’, meaning that by saying something, one actually does something. For instance, the sentence “Please have the document ready for Workshop 1.” contains an instruction: It informs the reader about the requirements for a particular event, and asks him or her to meet these requirements.

Brian also introduced Roundtrip Ontology Authoring (ROA), which is a process that allows non-expert users to author or amend an ontology by using simple, easy to learn, controlled natural language. The process is a combination of Controlled Language for Information Extraction (CLIE) and Text Generation which is developed on top of GATE. ROA is documented on the the Nepomuk website; for further information about CLIE, read this article by Valentin Tablan, Tamara Polajnar, Hamish Cunningham and Kalina Bontcheva: User-friendly ontology authoring using a controlled language (PDF, 64 KB).

Reblog this post [with Zemanta]