Andreas Blumauer

Linked Data in Enterprises – some ideas for business models

This morning I wrote a short blog post philosophizing about linked data and its value for enterprises. I asked a couple of questions, and at its core I was wondering: “Which services and key players will drive the web of data in the next few months?”

In the meantime I had the pleasure of listening to Talis’ Semantic Web Gang podcast (January 2009, with Tom Tague from Calais), and some answers came to mind:

  1. Some service providers will offer the highest accuracy for the links or tags (and the “things” behind them) that they provide for a given resource or document, as Open Calais does. In the podcast, Tom Tague stressed repeatedly how important disambiguation is for achieving the highest quality.
  2. Some will provide end-points for a given “thing” such as a company or a person, in addition to free ones like DBpedia, but they will always try to refer to established URIs such as those in DBpedia or Open Calais (e.g. IBM’s URI at Calais). These companies will provide more facts, for example about a person, than are currently available for free. They will build on the LOD infrastructure and will live in symbiosis with group number 3. They will control who gets the additional facts, but they will build on exactly the same interoperable framework as the “Linking Open Data” community.
  3. Some companies will build applications on top of the linked data infrastructure. They know two things: who has the best end-points for a complex “thing” made up of several atomic things (which must already exist in the web of data), and who is interested in such a mashup.

My prediction: one possible business model will be much the same as the one iTunes is built on today. You can listen to a song for free, but only for a few seconds; if you want more, you pay 99 cents.

If you want to know a little about Werner Faymann (Austria’s chancellor), you go to an application that makes use of DBpedia (or the like), starting at http://dbpedia.org/page/Werner_Faymann.

If you pay 99 cents (or a bit more…), you get even more facts about Mr. Faymann, nicely mashed up with other facts from the LOD cloud and with special content from other linked data sources. All of this can be produced at relatively low cost thanks to the high interoperability the Semantic Web provides, which we owe to the W3C and the whole community.

Andreas Blumauer

OntoWiki Kick-off in Leipzig

Virtuoso + DBpedia + OntoWiki, together with several industry-relevant use cases: that’s roughly the formula of the OntoWiki project, which was launched yesterday in Leipzig.

Sören Auer and his team from AKSW at the University of Leipzig are the coordinators of this EU-funded project, which supports the development of innovative software products. All industry partners are SMEs offering services in fields such as e-learning, e-tourism and business intelligence. The Leipzig team and OpenLink Software will work on integrating OntoWiki and Virtuoso.

The first day of the meeting was, of course, dedicated to socializing and getting to know each other. The project team turned out to be a well-chosen mix, and in the evening we aimed higher, quite literally: we enjoyed a great view over Leipzig from the top of the town’s highest building.

On the second day of the meeting, Orri Erling, Program Manager at OpenLink Software, came up with a pretty straightforward idea: why shouldn’t we provide OntoWiki as a Linked Data browser, e.g. on top of DBpedia? This could be one possible outcome of the project.

Some other use cases that already make use of the existing OntoWiki system were demonstrated: take a look at Vakantieland (…and start planning your holidays in the Netherlands) and at LinkedGeoData, where a nice user interface can be tried out.

The kick-off meeting will continue with two workshops, one on semantic technologies and one on application development with the OntoWiki framework. Thanks to Sören and his team for hosting this event so well!

Andreas Blumauer

DBpedia, UMBEL & the Future Web’s Ecology – interview with Mike Bergman & Sören Auer

The Linked Open Data infrastructure is maturing tremendously: the recent release of UMBEL’s web service and the incorporation of UMBEL classes in DBpedia are yet another confirmation of this exciting process. Having known and met both key players in various contexts, DBpedia co-initiator, Triplify main developer and head of the AKSW research group Sören Auer on the one hand, and UMBEL editor and Zitgist CEO Mike Bergman on the other, I felt it was time to talk to them together and pick their brains in a dialogue. The (first) result is the interview you can find below. As not everyone can be expected to be familiar with both projects, here is some background to get you started (you can also go directly to the interview):


DBpedia has become the largest RDF repository of encyclopaedic knowledge, extracting structured information from Wikipedia and making it available on the Web of Data. UMBEL, on the other hand, provides a lightweight, OpenCyc-based ontology structure for relating Web content and data to a standard set of subject concepts, currently comprising some 20,000 concepts. In the Linked Data Cloud, DBpedia and UMBEL map and cross-reference each other.

In practice this means that UMBEL provides classes describing the concepts of which “things” are members. For instance, named entities from Wikipedia such as “John F. Kennedy” are mapped to subject concepts such as Leader, Person, Administrator and Graduate, with broader and equivalent classes in CYC and FOAF and broader subject concepts within UMBEL. A link is set to Wikipedia, as well as a ‘same as’ reference to DBpedia. A class structure enables faceted browsing and extraction, inferencing, and navigation and discovery for all datasets linked to that structure.

DBpedia, in turn, returns properties of ‘John F. Kennedy’ (e.g. abstracts in the available Wikipedia languages, biographical information such as birth date and place, alma mater, predecessors and successors) and ‘same as’ references, e.g. to the JFK entry in Freebase (which recently released its RDF service) and to the aforementioned page in UMBEL. Furthermore, DBpedia maps the URI to the available RDF types, for instance foaf:Person or yago:AssassinatedAmericanPoliticians, and, once again, to UMBEL’s subject concepts Person, Administrator, Graduate and Leader.
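To make the ‘same as’ cross-referencing a little more concrete, here is a minimal sketch that models a handful of triples as Python tuples and follows owl:sameAs links to merge the facts that different sources hold about one entity. Apart from the standard rdf:type, owl:sameAs and foaf:Person terms and the DBpedia resource name, all identifiers (the ex-fb:, ex-umbel: and ex: names) are hypothetical placeholders, not the URIs actually published by Freebase or UMBEL; a real application would fetch such triples over HTTP or from a SPARQL endpoint.

```python
from collections import defaultdict

# Hypothetical, shortened triples: each source describes JFK under its own URI.
# Only rdf:type, owl:sameAs and foaf:Person are real vocabulary terms here.
TRIPLES = [
    ("dbpedia:John_F._Kennedy", "rdf:type", "foaf:Person"),
    ("dbpedia:John_F._Kennedy", "owl:sameAs", "ex-fb:jfk"),
    ("dbpedia:John_F._Kennedy", "owl:sameAs", "ex-umbel:jfk"),
    ("ex-umbel:jfk", "rdf:type", "ex-umbel:Leader"),
    ("ex-fb:jfk", "ex:profession", "ex:Politician"),
]


def same_as_closure(triples, start):
    """Return every URI reachable from start via owl:sameAs links."""
    graph = defaultdict(set)
    for s, p, o in triples:
        if p == "owl:sameAs":
            graph[s].add(o)
            graph[o].add(s)  # owl:sameAs is symmetric, so link both ways
    seen, stack = {start}, [start]
    while stack:  # depth-first traversal of the sameAs graph
        node = stack.pop()
        for nxt in graph[node]:
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


def merged_facts(triples, start):
    """Collect (predicate, object) pairs stated about any alias of start."""
    aliases = same_as_closure(triples, start)
    return sorted({(p, o) for s, p, o in triples
                   if s in aliases and p != "owl:sameAs"})


if __name__ == "__main__":
    for predicate, obj in merged_facts(TRIPLES, "ex-fb:jfk"):
        print(predicate, obj)
```

Starting from the (hypothetical) Freebase URI, the merge also picks up the type stated in DBpedia and the subject concept stated in UMBEL, which is the practical payoff of sources referring to each other rather than minting isolated identifiers.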

Due to its reliance on Wikipedia, DBpedia does a great job of covering a breadth of knowledge as wide as the interests of the people participating in Wikipedia. Its strength lies in named entities, i.e. entities such as persons, organizations and locations, which have a proper name but are not necessarily part of a particular, acknowledged domain or discipline. UMBEL’s most apparent advantage, on the other hand, is its reliance on OpenCyc, which brings the strong inferencing and logic capabilities of the CYC knowledge base to the Web of Data as well. DBpedia is a community project started by the University of Leipzig, Free University Berlin and OpenLink Software, while the open and free UMBEL is developed and hosted by Zitgist, again with support from OpenLink Software.

Now, particularly with the recent release of Zitgist’s web service endpoints and the incorporation of UMBEL classes in DBpedia, questions arise about the relationship between the two projects and about the role of OpenLink Software going forward. To draw a distinction:

One could say that DBpedia’s goal is to lower the barrier for web developers and end-users in the actual use of the semantic web, while UMBEL aims at bringing “order to the chaos” that is inherent to user-generated, collective knowledge.

Would you agree with this description? And is this a contradiction at all, or the kind of dynamic the Semantic Web community has been waiting for?

Mike Bergman: Yes, I would agree with this description, though we have tried many others. For example, in various past writings we have described UMBEL as a roadmap, or middleware, or a backbone, or a concept ontology, or an ‘infocline’, or a meta layer for metadata, and others. Today, what I tend to use, particularly in reference to DBpedia, is the TBox-ABox distinction from computer science and description logics. UMBEL is more of a class or structural and concept relationships schema, a TBox, while DBpedia is more of an instance and entity layer with attributes, an ABox. I think they are pretty complementary…
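The TBox-ABox distinction Mike mentions can be illustrated with a minimal sketch: the TBox holds schema-level subclass axioms, the ABox holds instance-level assertions, and combining the two lets you infer every class an entity belongs to, which is also what faceted browsing over a class structure relies on. The concept hierarchy below (including “Agent”) and the entity “Alice_Example” are hypothetical illustrations, not UMBEL’s or DBpedia’s actual data, and real description-logic reasoners handle far more than transitive subclass links.

```python
# TBox: schema-level axioms, mapping each class to its direct superclasses.
# Hypothetical hierarchy for illustration only.
TBOX = {
    "Leader": {"Person"},
    "Administrator": {"Person"},
    "Graduate": {"Person"},
    "Person": {"Agent"},
}

# ABox: instance-level assertions, mapping each entity to its asserted classes.
# "Alice_Example" is a hypothetical entity.
ABOX = {
    "John_F._Kennedy": {"Leader", "Administrator", "Graduate"},
    "Alice_Example": {"Graduate"},
}


def superclasses(cls, tbox):
    """Return all transitive superclasses reachable from cls in the TBox."""
    result, stack = set(), [cls]
    while stack:
        current = stack.pop()
        for parent in tbox.get(current, set()):
            if parent not in result:
                result.add(parent)
                stack.append(parent)
    return result


def inferred_types(entity, tbox, abox):
    """Asserted classes (ABox) plus every class entailed by the TBox."""
    types = set(abox.get(entity, set()))
    for cls in list(types):
        types |= superclasses(cls, tbox)
    return types


def instances_of(cls, tbox, abox):
    """A simple facet: all entities whose inferred types include cls."""
    return sorted(e for e in abox if cls in inferred_types(e, tbox, abox))


if __name__ == "__main__":
    print(sorted(inferred_types("John_F._Kennedy", TBOX, ABOX)))
    print(instances_of("Person", TBOX, ABOX))
```

Neither entity is asserted to be a Person, yet both show up under the Person facet, because the class knowledge lives in the TBox and the facts about individuals live in the ABox.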

Jana Herwig

Session 4: Using the Web of Data [WOD-PD]

This morning’s first session was dedicated to Using the Web of Data, or, as Alan Dix put it: “In the end, it’s not about data – it’s about use!” Alan and Richard Cyganiak were the keynote speakers for this session.

Alan Dix is a Professor at the Computing Department of Lancaster University and co-author (with Janet Finlay, Gregory Abowd, and Russell Beale) of the textbook Human-Computer Interaction.

To start with, Alan pointed to the two sides of achieving the web of data: first, generating it (a billion triples, as mighty as this may sound, is actually tiny, says Alan) and, second, accessing it.


With regard to generating the Web of Data, Alan distinguished between top-down and bottom-up approaches. He counted among the former the creation of the web of data from legacy sources (i.e. taking existing data, e.g. structured data, and semantically lifting it) and web scraping, such as DBpedia’s extraction of data from Wikipedia.

N.B.: This notion of ‘top-down’ does not imply a hierarchical relationship; rather, it means that there is already a plan for what is going to be put on the web of data (e.g. ‘all semi-structured information on Wikipedia’ or ‘dataset XY from project Z’). The bottom-up idea implies that data is added as the result of an action or interaction, as users go along; for example, relationships are created as a user expands his or her social network. On Amazon, for instance, user interaction is used to generate semantics: people do not tell Amazon what they like, they simply buy it.

Having relationships, of course, does not yet imply that these relationships are part of the Semantic Web. Or, as Alan put it, “why should I be RDFizing my online presence if none of my friends are?”

Please take a look at Alan’s slides (PDF, 2.4 MB). What I cannot reproduce here is a chart he developed, which was very useful for describing current scenarios on the web and which posed a twofold question:

Does a website/platform have the web of data implemented? YES/NO
Is the web of data on a website/platform apparent to the user? YES/NO

The possible combinations (YES/YES, YES/NO, NO/YES, NO/NO) provide a good heuristic tool for describing what is currently available, with and without the Semantic Web. Take, for instance, the shiny interface of Talis’ Project Cenote, whose vision is to “make library data visible in many contexts, inside and outside of the library, making the data much more accessible and visible to a wider audience – benefiting current and potential users of library services wherever they are.” On Cenote, the user doesn’t see that it’s got the Web of Data in it: it is actually implemented, but not in a way that is apparent to the user.

At the other end of the spectrum, you have a platform like Facebook: Alan referred to Facebook as “the user’s own web of data”, i.e. web of relationships. The user is aware of these relationships (they actually shape his or her interaction and communication with the site), and the (numerous!) apps on Facebook continually add relationships, but, regrettably, these remain insulated from one another and don’t use RDF (and don’t you try to take data out of Facebook!).

Two examples of public data that Alan cited, which grow as people and institutions add data to them, are Freebase (the “open database of the world’s information”; see previous posts on this blog about Freebase) and Swivel. Swivel allows anyone, from individuals to institutions, to upload and explore data, and it also features official data sources such as the New York Federal Reserve Bank, the UNESCO Institute for Statistics, DukeResearch and EUROSTAT. According to Alan, there is already more data on Swivel than in the whole Linked Data cloud.

Alan also mentioned Google’s Social Graph API. Incidentally, yesterday evening Luca Hammer (one of the Web 2.0 people who had joined the Open Hacking Session) introduced me to the WordPress plugin “Meet your commenters”, which uses the Social Graph API to find social relations on the web and adds this data to the commenter profiles it creates in WordPress.

On a different note: I took some time today to explore Alan’s homepage and found the cute Christmas Crackers application, which was first developed in 1999 and is now also available on Facebook. As trivial as it may sound at first, sending virtual Christmas crackers (with more than 5000 possible combinations!) is a good showcase for developing human interaction scenarios, and a number of papers have been written about the application. Here is the case study Alan recommends to begin with: Designing experience – virtual Christmas Crackers.

The abstract and a list of links to all the websites and demos Alan discussed can be found on the keynote page. Full reference: A. Dix and R. Cyganiak (2008). Using the Web of Data. Keynote at WOD-PD 2008 | Web of Data Practitioners Days, Vienna, Austria – Oct 22-23, 2008. http://www.hcibook.com/alan/papers/WOD-PD-2008/

Even if you have not met Richard Cyganiak in person, you have certainly come across one of his creations: the Linked Data Cloud. Richard is a research assistant at DERI Galway. In his demo, he gave us the opportunity to gain hands-on experience, introducing a tool he dubbed Snorql, which is basically an easier-to-use front end to a SPARQL endpoint, as the required prefixes are already ‘pre-installed’.

Using the Snorql interface, we could explore the dataset we had created collaboratively during Keith Alexander and Yves Raimond’s session. Writing SPARQL queries by hand can be a challenge, and it is next to impossible if you (like me) don’t know the syntax. But today we could simply copy and paste all the queries from a website Richard had put up prior to his session. Thanks a lot for the excellent preparation and demonstration!
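What Snorql’s ‘pre-installed’ prefixes save you from typing can be sketched roughly as follows: a helper that prepends the SPARQL PREFIX declarations a query body actually uses. This is only an illustration of the idea, not Snorql’s own code. The prefix table is an assumed selection of four common namespaces (Snorql’s configured defaults may differ), and the query is a hypothetical example, not one of Richard’s prepared queries.

```python
import re

# Assumed selection of common namespaces; Snorql's actual defaults may differ.
PREFIXES = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "owl": "http://www.w3.org/2002/07/owl#",
    "foaf": "http://xmlns.com/foaf/0.1/",
}

# Hypothetical query body written without any PREFIX lines.
EXAMPLE_QUERY = """SELECT ?name WHERE {
  ?person rdf:type foaf:Person ;
          foaf:name ?name .
}"""


def with_prefixes(body, prefixes=PREFIXES):
    """Prepend PREFIX declarations for every known prefix used in body."""
    # Candidate prefixed names look like "foaf:name"; unknown ones
    # (e.g. "http" inside a full URI) are simply ignored below.
    used = set(re.findall(r"\b([A-Za-z][\w-]*):", body))
    header = [f"PREFIX {p}: <{uri}>"
              for p, uri in sorted(prefixes.items()) if p in used]
    return "\n".join(header + [body])


if __name__ == "__main__":
    print(with_prefixes(EXAMPLE_QUERY))
```

Declaring only the prefixes a query uses keeps the generated text short; a front end could just as well always emit the full table, since unused PREFIX declarations are harmless in SPARQL.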

Richard also showed a couple of RDF browsers in action, e.g. the Tabulator plugin (“a Firefox extension which allows Firefox to handle data as well as documents”) and the Marbles Linked Data browser, which is running at beckr.org/marbles; try entering, for instance, http://api.talis.com/stores/wod-pd-sandbox/items/People/JanaHerwig.

Thank you, Alan and Richard – the combination of talk and demo was indeed a perfect intro towards using the Web of Data.
