Jana Herwig

The Day after Freebase went RDF

So what’s been happening in the blogosphere after John Giannandrea’s keynote at ISWC and the revelation that Freebase now produces Linked Data from an RDF service?

Tetherless World sums up the Freebase facts (e.g. 156,000,000 assertions made; 1370 published types; 75 domains; graph model, identity, web based) and further points out that ontology creation “is a social process, and both freebase and semantic wiki are tools that enable users to create ontological vocabulary without worrying too much on building a comprehensive ontology.”

Inkdroid notes that the RDF service release “is important news because Freebase is an active community of content creators, creating rich data-centric descriptions with a wiki style interface, fancy data loaders, and useful machine APIs.” This is followed by a quick and handy tutorial on how to get machine-readable data back from Freebase by requesting a Freebase URI. Conclusion:

So why is this important? Because following your nose in HTML is what enabled companies like Lycos, AltaVista, Yahoo and Google to be born. It allowed for agents to be able to crawl the web of documents and build indexes of the data to allow people to find what they want (hopefully). Being able to link data in this way allows us to harvest data assets across organizational boundaries and merge them together. It’s early days still, but seeing an organization like Freebase get it is pretty exciting.

Yves Raimond was the first to wonder on the public W3C LOD mailing list: “now, to see whether it links to other datasets :-) ” – the idea of having linked data without the linkage would indeed seem like love’s labour lost. Semantic Focus / James Simmons seconds: “One downside is the data doesn’t appear to link to external resources, in a sense walling itself in. It should be trivial to link the topics that came from Wikipedia back to Wikipedia as well as DBpedia (which would be killer, by the way).” This is followed up by a later post, where James expresses concerns regarding the relationship between DBpedia and Freebase: “Freebase may see a drop in userbase growth and participation if it becomes a mirror of DBpedia (or vice-versa) and the popularity once garnered by one project may shift towards the other, or away entirely.”
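
To make the linkage point a bit more concrete: a link between two datasets is itself just one more triple. The following is a minimal sketch and not anything Freebase or DBpedia actually ship. The helper `same_as_triple` is hypothetical, and the two namespace prefixes are assumptions based on the URI patterns both projects use publicly. Nothing is fetched from either service.

```python
from urllib.parse import quote

# Assumed namespace patterns (illustrative only; neither service is contacted).
FREEBASE_NS = "http://rdf.freebase.com/ns/"
DBPEDIA_NS = "http://dbpedia.org/resource/"
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"


def same_as_triple(freebase_id: str, wikipedia_title: str) -> str:
    """Return one N-Triples line stating that two URIs name the same thing."""
    # DBpedia resource names follow Wikipedia titles, with spaces as underscores.
    dbpedia_name = quote(wikipedia_title.replace(" ", "_"), safe="_()',")
    return (
        f"<{FREEBASE_NS}{freebase_id}> "
        f"<{OWL_SAME_AS}> "
        f"<{DBPEDIA_NS}{dbpedia_name}> ."
    )


# The topic ID "en.blade_runner" is an assumed example identifier.
print(same_as_triple("en.blade_runner", "Blade Runner"))
```

A data browser that understands owl:sameAs could then follow such a statement from one dataset into the other, which is exactly the kind of linkage Yves and James were asking for.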

More News / Andrew Newman puts the Freebase RDF service release in context with the submissions to the Billion Triples Challenge: Cathrin Weiss’ “250 million triples on your iphone” entry, iMoCo, as well as DBpedia and SemaPlorer, developed at the University of Koblenz:

DBPedia stood out because it was the only one that allowed you to write data to the Semantic Web rather than just read the carefully prepared triples. For a similar reason I though SemaPlorer was good because they tried to do more than just the standard triples but went that extra bit further by making it more generic like integrating flickr. But they were all excellent, all of them showing what you get with a billion or more triples and inferencing.

That combined with the guys at Freebase making all of their data available as RDF and it was a big day for the Semantic Web.

ARQtick / AndyS plays a bit with the Blade Runner example cited by Freebase: he takes a look at the graph, looks for interesting properties and extracts author names.

N.B. If you want to follow ARQtick’s example: use the Linked Data browser plugin Tabulator or go to the Marbles site to view the RDF – without a data browser you’ll be redirected to the HTML page. You will also need it to make sense of rdf.freebase.com.
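
Why the redirect to the HTML page? Linked Data servers typically decide what to send based on the Accept header of the request, and an ordinary browser asks for HTML. Here is a minimal sketch of what a data browser does differently. The helper name `rdf_request` is hypothetical, the Blade Runner URI is an assumed example, and the request is only built, not sent, since I cannot promise the endpoint answers.

```python
from urllib.request import Request


def rdf_request(uri: str) -> Request:
    """Build (but do not send) a GET request that asks for RDF/XML, not HTML."""
    return Request(uri, headers={"Accept": "application/rdf+xml"})


# Assumed example URI; pass the request to urllib.request.urlopen() to send it.
req = rdf_request("http://rdf.freebase.com/ns/en.blade_runner")
print(req.get_method(), req.full_url)
print("Accept:", req.get_header("Accept"))
```

Tabulator and Marbles send a header along these lines for you, which is why they get the RDF while a plain browser gets the web page.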

Jana Herwig

Session 4: Using the Web of Data [WOD-PD]

This morning’s first session was dedicated to Using the Web of Data, or, as Alan Dix put it: “In the end, it’s not about data – it’s about use!” Alan and Richard Cyganiak were the keynoters for this session.

Alan Dix is a Professor at the Computing Department of Lancaster University, and author (with Janet Finlay, Gregory Abowd, and Russell Beale) of Human-Computer Interaction.

To start with, Alan pointed to the two sides of achieving the web of data: Firstly generating the web of data (a billion triples, as mighty as this may sound, is actually tiny, says Alan) and then, secondly, accessing the web of data.

[Photo: Alan Dix giving a talk]

With regard to generating the Web of Data, Alan distinguished between top-down and bottom-up approaches. Among the former he counted the creation of the web of data from legacy sources (i.e. where you take existing data and semantically lift them, e.g. from structured data) and web scraping such as DBpedia’s extraction of data from Wikipedia.

N.B.: This notion of ‘top-down’ does not imply a hierarchical relationship, but rather means that there is already a plan for what is going to be put on the web of data (e.g. ‘all semi-structured information on Wikipedia’ or ‘dataset XY from project Z’). The bottom-up idea here implies that data is added as the result of an action, or interaction, as the user/s go, e.g. relationships are created as the user expands his or her social network. For instance on Amazon, user interaction is used to generate semantics: People do not tell Amazon what they like, they simply buy it.

Having relationships of course does not imply yet that these relationships are part of the Semantic Web. Or, as Alan put it, “why should I be RDFizing my online presence if none of my friends are?”

Please take a look at the PDF of Alan’s slides (2.4 MB) – what I cannot reproduce here is a chart he developed, which was very useful for describing current scenarios on the web and which posed a twofold question:

Does a website/platform have the web of data implemented? YES/NO
Is the web of data on a website/platform apparent to the user? YES/NO

The possible combinations (YES/YES, YES/NO, NO/YES, NO/NO) provide a good heuristic tool for describing what is currently available, with and without the Semantic Web. Take, for instance, the shiny interface of Talis’ Project Cenote: Cenote’s vision is to “make library data visible in many contexts, inside and outside of the library, making the data much more accessible and visible to a wider audience – benefiting current and potential users of library services wherever they are.” On Cenote, the user doesn’t see that it’s got the Web of Data in it – it is actually implemented, but not in a way that is apparent to the user.

On the other end of the spectrum, you have a platform like Facebook: Alan referred to Facebook as “the user’s own web of data”, i.e. web of relationships: The user is aware of these relationships (they actually shape his interaction and communication with the site), and the (numerous!) apps on Facebook continually add relationships, but, regrettably, insulated from one another and not using RDF (and don’t you try to take data out of Facebook!).

Two examples of public data that Alan cited and that grow as people/institutions add data to them are Freebase (the “open database of the world’s information” – see previous posts on this blog about Freebase) and Swivel. Swivel allows people, institutions, anyone to upload and explore data, also featuring official data sources such as (links go to their Swivel pages): New York Federal Reserve Bank, UNESCO Institute for Statistics, DukeResearch or EUROSTAT. According to Alan, there is already more data on Swivel now than in the whole Linked Data cloud.

Alan also mentioned the Social Graph API. Only yesterday evening, Luca Hammer (one of the web 2.0 people who had joined the Open Hacking Session) introduced me to the WordPress plugin “Meet your commenters”, which uses the Social Graph API to find social relations on the web and adds these data to the commenter profiles it creates in WordPress.

On a different note: I took some time today to explore Alan’s homepage and found the cute Christmas Crackers application, which was first developed in 1999 and which is now also available on Facebook. As trivial as it may sound at first – sending virtual Christmas Crackers (with more than 5000 possible combinations!) is a good showcase for developing Human Interaction Scenarios, and a number of papers have been written about the application. Here is the case study which Alan recommends to begin with: Designing experience – virtual Christmas Crackers.

The abstract and a list of links to all websites and demos Alan discussed can be found here. Full reference: A. Dix and R. Cyganiak (2008). Using the Web of Data. Keynote at WOD-PD 2008 | Web of Data Practitioners Days, Vienna, Austria – Oct 22-23, 2008. http://www.hcibook.com/alan/papers/WOD-PD-2008/

Even if you have not met Richard Cyganiak in person, you have certainly come across one of his creations: The Linked Data Cloud. Richard is a research assistant at DERI Galway. In his demo, he gave us the opportunity to gain hands-on experience, introducing a tool he dubbed Snorql, which is basically an easier-to-use front end to a SPARQL endpoint, as it already has the required prefixes ‘pre-installed’.

Using the Snorql interface, we could explore the dataset we had created collaboratively during Keith Alexander and Yves Raimond’s session. Writing SPARQL queries manually can be a challenge, and it is next to impossible if you (like me) don’t know the syntax. But today we could just copy and paste all the queries from a website Richard had put up prior to his session – thanks a lot for the excellent preparation and demonstration!
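
For those who, like me, wondered what ‘pre-installed’ prefixes actually save you: every SPARQL query has to declare the namespaces it abbreviates, and a front end can simply put those declarations in front of whatever you type. The following is a rough sketch of that idea and not Snorql’s actual code. The prefix table, the helper `with_prefixes` and the sample query are all assumptions for illustration, not the queries from Richard’s session.

```python
# Hypothetical prefix table; a real installation would be configured per dataset.
PREFIXES = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "foaf": "http://xmlns.com/foaf/0.1/",
}


def with_prefixes(query_body: str) -> str:
    """Prepend one PREFIX declaration per known namespace to a SPARQL query."""
    header = "\n".join(
        f"PREFIX {name}: <{iri}>" for name, iri in PREFIXES.items()
    )
    return f"{header}\n{query_body.strip()}"


# The user only types the short form; the declarations are added for them.
query = with_prefixes("""
SELECT ?person ?name
WHERE { ?person a foaf:Person ; foaf:name ?name }
LIMIT 10
""")
print(query)
```

The printed text is what would be sent to the SPARQL endpoint; without the PREFIX lines, the endpoint would not know what `foaf:` stands for.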

Richard also showed a couple of RDF browsers in action, e.g. the Tabulator Plugin (“a Firefox extension which allows Firefox to handle data as well as documents”), or the Marbles Linked Data browser which is running right on beckr.org/marbles; enter, for instance http://api.talis.com/stores/wod-pd-sandbox/items/People/JanaHerwig (learn more about Marbles here).

Thank you, Alan and Richard – the combination of talk and demo was indeed a perfect introduction to using the Web of Data.

Andreas Blumauer

Why mockups are essential for designing semantic applications

Applications based on semantic technologies offer new ways to discover, browse and explore information – this is an established fact in the SemWeb community. But how can we (as semantic web “insiders”) communicate these potential benefits to a typical end-user who has never heard about “faceted search” before – which doesn’t mean that he or she wouldn’t love intelligent user interfaces if they were in place?

One answer lies in using mockups, which are, on the one hand, an indispensable instrument for prototyping user interfaces, but also valuable when it comes to explaining the workings of an application to an end-user, an audience of interested researchers or a client.

And when it comes to explaining a search engine or search widget, mockups are even more important, as all of us, and end-users in particular, are often unable to think of search interfaces other than in terms of Google.

We have become so googlified that hardly anyone can think of different ways of searching for information than Google has offered for many years now: Put a couple of words in a text box, click a button and scroll through a list of titles and summaries. Repeat until you’re done, or try a new search and repeat. Wow!

Although even Google has started recently to implement a little bit of semantics by offering an auto-complete functionality on google.com (on some local versions like Google Austria this feature is still not available), even the most basic concepts for an intelligent search interface are still not part of common sense thinking.

Admittedly, there are people who get irritated instantly by complex user interfaces like David Huynh’s Freebase Parallax. “This is only for experts!” is their response. But in a corporate setting, complex queries are part of our daily business – they are just not supported by common search engines (the only exception being data mining solutions). That doesn’t mean that we don’t need them.

Where is the way out of this dilemma?

  • Don’t tell, but SHOW the end-users how semantic technologies can enhance search & browse experiences
  • Do not use terms like SPARQL or RDF
  • Create a simple mockup that illustrates the points you want to make
  • You’re not a designer? Use tools like Balsamiq – Try it now!

Here is an example of a mockup of a semantically enhanced expert finder:

This kind of mockup is essential for the requirements engineering phase of any project where search is a bit more than a text box, a button and a bunch of documents.

Jana Herwig

A good data browser allows you to navigate the knowledge space by car

Or so I would like to paraphrase David Huynh’s words that I read today on the W3C’s Semantic Web mailing list, where he wrote in response to Michiel Hildebrand:

It’s very perceptive of you to ask about the tasks that Parallax is presumed to address, and who the users are. I don’t have a specific answer beside “browsing graph of data more efficiently”.

I tend to think that contemporary graph-based data browsers either fly the user at 50,000 feet and show her the whole world in one window below (render a huge data graph as a huge visual graph), or leave her at the street level to wander around on foot (single resource view). I’m just wishing to provide her a car. Perhaps the good thing is that the car doesn’t come with a destination built in. (It’d be quite bad in real life if you need different cars to go grocery shopping and to go to work, for example.)

I quite like this metaphor he uses to describe the motivation behind Parallax, the UI prototype David designed as a novel way to browse Freebase data. It also ties in nicely with a wish made by Richard Cyganiak in an interview with him we published yesterday:

On the top of my wish list would be a really good data browser. The current crop of data browsers for RDF, such as Tabulator, Disco and the OpenLink browser, are still very basic and geeky. I hope for some sort of “Excel for Web data”, an application that allows me to browse through different datasets, find the bits that are relevant to my problem, and lets me slice and dice and correlate the data in different ways. I think such an app would be key to the kind of serendipitous reuse I mentioned earlier.

In the mailing list post cited above, David pointed to the Spellbound blog, where Jeanne Kramer-Smyth published a showcase of faceted browsing across Olympic Games facts using Freebase Parallax and suggested that Parallax would be particularly useful for exploring connected information:

Now take this idea to the world of archives and libraries, OPACs and finding aids and imagine the sorts of questions you can start asking. Yes – it does depend on the data being connected, but that is happening more and more all the time. The promise of the semantic web is structured data everywhere we turn.

