Last week I had an encounter with a social scientist (within an academic setting) who argued that discussing the Semantic Web would not make sense for him (as a social scientist), because of the present lack of social practices in that field… (*jaw-dropping*) I could not persuade him with the argument that the Linked Data cloud itself was the result of a social practice – the view he had of the Semantic Web (which I assume was not an uneducated one) even led him to deny that developments like DBpedia, Twine, Revyu, or the use of metadata in general had anything to do with the Semantic Web.
And this is a big challenge.
On the one hand, it is a good thing that there are social scientists who at least have a certain notion of the Semantic Web – on the other, it seems as if all the exciting ideas and developments that have taken place in the last few years have failed to reach those who were sensitized to the SemWeb project when the idea was first conceived. I do not mean to make a statement about social scientists here, but rather about the need to communicate what has since happened to the original idea outside of one’s own community as well.
Btw: In its current issue, quarterly (German-language) magazine t3n is featuring a Web 3.0 and Applied Semantic Web topic as its opener. And that is a good sign, too!
The Linked Open Data infrastructure is in a tremendous process of maturing – the recent release of UMBEL’s web service AND the incorporation of UMBEL classes in DBpedia are yet another confirmation of this exciting process. Knowing and having met DBpedia co-initiator, Triplify main developer and head of the AKSW research group Sören Auer and UMBEL editor and Zitgist CEO Mike Bergman in various contexts, I felt it was time to talk to and pick the brains of both these key players in a dialog situation. The (first) result is the interview you can find below. As not everyone can be expected to be familiar with both projects, here is some background to get you started (you can also go directly to the interview):
Sören Auer (image above), Mike Bergman (image below)
DBpedia has become the largest RDF repository for encyclopaedic knowledge, extracting structured information from Wikipedia and making it available on the Web of Data. UMBEL, on the other hand, provides an OpenCyc-based, light-weight ontology structure for relating Web content and data to a standard set of subject concepts, currently comprising some 20,000 concepts. In the Linked Data Cloud, DBpedia and UMBEL map and cross-reference each other.
In practice this means that UMBEL provides classes to describe the concepts of which “things” are members. For instance, named entities from Wikipedia such as “John F. Kennedy” are mapped to subject concepts such as Leader, Person, Administrator and Graduate, with broader and equivalent classes in CYC and FOAF and broader subject concepts within UMBEL. A link is set to Wikipedia, as well as a ‘same as’ reference to DBpedia. A class structure enables faceted browsing and extraction, inferencing, and navigation and discovery for all datasets linked to that structure.
DBpedia, in turn, returns properties of ‘John F. Kennedy’ (e.g. abstracts in available Wikipedia languages, demographic information such as birth date and place, alma mater, predecessors and successors), and ‘same as’ references, e.g., to the JFK entry in Freebase (which recently released its RDF service) and the aforementioned page in UMBEL. Furthermore, DBpedia maps the URI to available RDF types, for instance foaf:Person or yago:AssassinatedAmericanPoliticians and, once again, to UMBEL’s subject concepts Person, Administrator, Graduate and Leader.
Due to its reliance on Wikipedia, DBpedia does a great job of covering a range of knowledge as broad as the interests of the people participating in Wikipedia; its strength lies in the area of named entities, i.e. entities such as persons, organizations and locations, which have a proper name but are not necessarily part of a particular, acknowledged domain or discipline. UMBEL, on the other hand, has as its most apparent advantage its reliance on OpenCyc and with that the strong inferencing and logic capabilities of the CYC knowledge base, which are thus also brought to the Web of Data. DBpedia is a community project started by the University of Leipzig, Free University Berlin and OpenLink Software, while the open and free UMBEL is developed and hosted by Zitgist with support from, again, OpenLink Software.
Now, and in particular with the recent release of Zitgist’s web service endpoints and with the incorporation of UMBEL classes in DBpedia, questions arise as to the relationship of the two projects, and regarding the role of OpenLink Software in the further process. To draw a distinction:
One could say that DBpedia’s goal is to lower the barrier for web developers and end-users in the actual use of the semantic web, while UMBEL aims at bringing “order to the chaos” that is inherent to user-generated, collective knowledge.
Would you agree with this description – and is it a contradiction at all or the kind of dynamic the Semantic Web community has been waiting for?
Mike Bergman: Yes, I would agree with this description, though we have tried many others. For example, in various writings in the past, we have described UMBEL as a roadmap, or middleware, or a backbone, or a concept ontology, or an ‘infocline’, or a meta layer for metadata, and others. Today, what I tend to use, particularly in reference to DBpedia, is the TBox-ABox distinction in computer science and description logics. UMBEL is more of a class or structural and concept relationships schema – a TBox – while DBpedia is more of an instance and entity layer with attributes – an ABox. I think they are pretty complementary…
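To make the TBox/ABox distinction a bit more tangible, here is a minimal Python sketch built around the John F. Kennedy example from the introduction. It is an illustration only: the subclass links, the dictionary layout and the function names are assumptions made for this example and do not reflect UMBEL's or DBpedia's actual data model.

```python
# Illustrative only: class names follow the John F. Kennedy example above;
# the subclass links are assumptions, not UMBEL's actual hierarchy.

# TBox: schema-level statements (class -> direct superclasses).
TBOX = {
    "Leader": {"Person"},
    "Administrator": {"Person"},
    "Graduate": {"Person"},
    "Person": set(),
}

# ABox: instance-level statements (entity -> asserted classes and attributes).
ABOX = {
    "John_F._Kennedy": {
        "types": {"Leader", "Graduate"},
        "attributes": {"birthPlace": "Brookline, Massachusetts"},
    },
}


def superclasses(cls, tbox):
    """Return cls plus every class reachable via subclass links."""
    seen = set()
    stack = [cls]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(tbox.get(current, ()))
    return seen


def inferred_types(entity, tbox, abox):
    """Combine the asserted ABox types with their TBox superclasses."""
    result = set()
    for cls in abox[entity]["types"]:
        result |= superclasses(cls, tbox)
    return result


print(sorted(inferred_types("John_F._Kennedy", TBOX, ABOX)))
```

The point of the split is that the entity record never states "Person" itself; that type only appears once the schema layer is consulted.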
I’m not going to explicitly comment on the panel discussion at ISWC08, entitled An OWL 2 Far? Let’s simply say it was controversial. I don’t mind controversial panels. In fact, I think that few things are more boring than a panel where all panelists more or less agree. But at the same time, at the ISWC08 panel, I think, an important message got lost, namely that we really need reasoning for the Semantic Web, and that we need diversity in reasoning. (Admittedly, some people said so, but I think the message didn’t really get through.)
So, instead, let me give you some web search problems. They all came up in my real life, so they are not artificially created. It seems to me that the Semantic Web should make answering them easier, but with the existing web resources, they are really difficult.
Find all papers having received best paper awards at ISWC conferences. I did that today, and it took me more than 30 minutes. And I’m not sure if I got all of them – indeed I would have missed one of them if I hadn’t known beforehand about that specific paper having received the award. Isn’t this a typical Semantic Web problem? (The results of my search are further below.)
There’s an owl-like bird in southern German woods, and in colloquial German it’s called Käuzchen. Try to find out the English name for this bird. I actually failed, though I think I got close to the answer when I merged web search with an external knowledge base (in the form of a biologist I happen to know). And actually, simply going to Wikipedia and clicking on the English link is not enough, since I’m not looking for the Strix genus of owls, but rather for a particular bird …
Who is this researcher with the Russian-looking name who worked on resolution-based methods for the description logic EL? This also looks like a typical Semantic Search problem, which shouldn’t be too difficult if you have the corresponding knowledge (and background knowledge) available. I admit I failed on this one using traditional methods (unless you consider it a traditional method to ask Franz Baader by email about it).
Are lobsters spiders? I.e. are lobsters classified as spiders by biologists? This one is actually tougher than you would think using traditional methods. Should be easy using Semantic Web knowledge bases and some simple reasoning, shouldn’t it?
For all these tasks (and many others), it seems to be apparent that Semantic Web Reasoning – and the availability of corresponding knowledge bases – would make the finding of answers much easier. The current reality of the Semantic Web is still quite a bit away from this. But we’re working on it.
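The lobster question gives a feel for what "some simple reasoning" could look like. Below is a minimal Python sketch, assuming a hand-written and heavily simplified taxonomy (a real knowledge base would contain many more ranks and taxa); the `PARENT` table and all function names are made up for this illustration.

```python
# Simplified, hand-written taxonomy (child taxon -> parent taxon).
# Assumption for illustration: many intermediate ranks are left out.
PARENT = {
    "Lobster": "Decapoda",
    "Decapoda": "Crustacea",
    "Crustacea": "Arthropoda",
    "Spider": "Arachnida",
    "Arachnida": "Chelicerata",
    "Chelicerata": "Arthropoda",
}


def ancestors(taxon):
    """Follow parent links upwards and return the chain of ancestors."""
    chain = []
    while taxon in PARENT:
        taxon = PARENT[taxon]
        chain.append(taxon)
    return chain


def is_a(taxon, candidate):
    """True if candidate is the taxon itself or one of its ancestors."""
    return candidate == taxon or candidate in ancestors(taxon)


def common_ancestor(a, b):
    """Return the closest taxon that both a and b belong to, if any."""
    b_line = set([b] + ancestors(b))
    for taxon in [a] + ancestors(a):
        if taxon in b_line:
            return taxon
    return None


print(is_a("Lobster", "Spider"))             # False
print(common_ancestor("Lobster", "Spider"))  # Arthropoda
```

So under this toy taxonomy the answer is no: lobsters and spiders are both arthropods, but they sit on different branches.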
Finally, as promised, the results of my inquiry about the ISWC best paper awards:
So why did I dig these awards out? Because I noticed that among these 6 papers there are 3 which are explicitly concerned with OWL. And the 2007 paper involves RDF inferencing. Talk about the importance of reasoning for the Semantic Web …
Author: Pascal Hitzler, AIFB, University of Karlsruhe (TH), Germany
The TRIPLE-I conference in Graz today started with a keynote by Henry Lieberman, research scientist at the MIT Media Laboratory. Given that, nominally, at least a third of the conference is dedicated to knowledge management, Lieberman introduced an important, often overlooked aspect of knowledge management right at the beginning: Managing knowledge that everybody knows already.
Knowledge management typically aims at knowledge that people do not know yet, e.g. (tacit) knowledge that people have acquired in a project and that is supposed to be made explicit and accessible to other people who don’t yet have this knowledge.
An intriguing MIT project I hadn’t yet heard about which Henry Lieberman introduced is the Common sense knowledge base Open Mind Common Sense – anyone can sign up to it and contribute. A total of 203 knowledge facts have, for instance, been accumulated about the concept “apple”, including facts such as these:
→ An apple is red
→ An apple is green
→ Apples grow in trees
→ an apple are food.
→ An apple has a core
→ An apple can fall from a tree
→ An apple is a type of fruit
Similar concepts offered are “egg, potato, steak, bread, spinach, frozen food, butter, appl [sic], leftover, grape”. The process of adding knowledge is guided by a list of questions that help to conceptualize and structure the knowledge, e.g.
MadeOf: What is it made of?
IsA: What kind of thing is it?
UsedFor: What do you use it for?
CapableOf: What can it do?
PartOf: What is it part of?
DefinedAs: How do you define it?
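One way to picture such guided contributions is as simple (concept, relation, value) statements. The Python sketch below is only an assumption about how this could be stored and queried; it says nothing about how Open Mind Common Sense is actually implemented. The relation names come from the question list above, the sample facts paraphrase the apple facts quoted earlier, and `build_index` and `answer` are hypothetical helper names.

```python
from collections import defaultdict

# Sample statements paraphrasing the apple facts quoted above,
# expressed with relation names from the question list.
ASSERTIONS = [
    ("apple", "IsA", "fruit"),
    ("apple", "IsA", "food"),
    ("apple", "CapableOf", "fall from a tree"),
    ("core", "PartOf", "apple"),
]


def build_index(assertions):
    """Group the statements by concept and then by relation."""
    index = defaultdict(lambda: defaultdict(list))
    for concept, relation, value in assertions:
        index[concept][relation].append(value)
    return index


def answer(index, concept, relation):
    """Answer a guiding question such as 'What kind of thing is it?'."""
    return index.get(concept, {}).get(relation, [])


index = build_index(ASSERTIONS)
print(answer(index, "apple", "IsA"))        # What kind of thing is it?
print(answer(index, "apple", "CapableOf"))  # What can it do?
print(answer(index, "core", "PartOf"))      # What is it part of?
```

Each guiding question then maps to one lookup, and a question nobody has answered yet simply returns an empty list.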
But what are the roles that common sense knowledge can play in interactive applications? Henry Lieberman suggested that, using common sense knowledge, a system can e.g. anticipate what a user is most likely to do, or it can at least make the most likely things easiest to do, e.g. by providing a map from goals to concrete actions in the interface, or by integrating appropriate applications.
Lieberman furthermore introduced a couple of tools which illustrated these benefits, e.g. the prototype for an Event Minder for improved scheduling driven by common sense knowledge. Entering a statement such as “Lunch with Charlie at Miracle next Friday” would for instance calculate the date of ‘next Friday’, call up a calendar application and also a web service to get directions for getting to Miracle.
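The date calculation in that example can be sketched in a few lines of Python. This is a guess at the general idea, not the Event Minder's actual logic: the sketch assumes that "next Friday" means the first Friday strictly after the current day (people do disagree on this), and the function names and the example date are made up for illustration.

```python
import re
from datetime import date, timedelta

WEEKDAYS = ["monday", "tuesday", "wednesday", "thursday",
            "friday", "saturday", "sunday"]


def next_weekday(today, weekday_name):
    """Return the first date strictly after `today` that falls on the weekday."""
    target = WEEKDAYS.index(weekday_name.lower())
    days_ahead = (target - today.weekday() - 1) % 7 + 1  # always 1..7
    return today + timedelta(days=days_ahead)


def resolve_next_weekday(text, today):
    """Find a phrase like 'next Friday' in the text and turn it into a date."""
    match = re.search(r"next (\w+day)", text, re.IGNORECASE)
    if not match or match.group(1).lower() not in WEEKDAYS:
        return None
    return next_weekday(today, match.group(1))


# Arbitrary example date: 3 September 2008 was a Wednesday.
today = date(2008, 9, 3)
print(resolve_next_weekday("Lunch with Charlie at Miracle next Friday", today))
```

Working out who Charlie is or where Miracle is located is of course the harder, common-sense part, and is not attempted here.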
Regarding the difference between CYC (the common sense knowledge ontology) and MIT’s common knowledge base Open Mind Common Sense: CYC is an ontology organized by experts with a broader and deeper knowledge – the common knowledge base grants access to anyone and has, for instance, also information about kittens that might not be that relevant to experts. At this stage, there is no mapping to CYC.
Henry Lieberman’s keynote tied in nicely with a presentation by Andrew S. Gordon about “Envisioning with Weblogs”. According to Andrew Gordon, there have been three waves in the 50-year history of common sense knowledge in artificial intelligence:
First wave: Logical formalizations of common sense knowledge (e.g. CYC)
Second wave: volunteer contributions from web communities (e.g. Open Mind Common Sense)
Third wave: Knowledge acquisition from the social web (e.g. Envisioning with Weblogs)
First off, what is envisioning? Andrew Gordon described it as a form of reasoning about states and events in time and space, generating answers to questions such as “What’s happening in the world right now?”, or “What is going on in the audience’s mind right now?”, or “How did this person get into the room?”, or “What am I going to have for dinner tonight?”
At the Institute for Creative Technologies (University of Southern California), Andrew is involved in a project called Story Representation and Management, which among other things, is doing research on story interpretation, i.e. “techniques for integrating automated commonsense inference into the processing of narrative text documents, and methodologies for creating very large scale commonsense knowledge bases.”
One of the paths towards the creation of this knowledge base is gathering up stories on weblogs. But can we really gather up all stories ever written in a weblog? In the research conducted and cited by Andrew (Gordon 2007), 4.5 million stories, made up of 66.6 million sentences and 1.06 billion words, were extracted from weblogs.
In Gordon’s recipe for envisioning with weblogs, the retrieval of the closest situation provides the best results. Take for instance the quest of formalizing this particular problem in common sense physical reasoning: cracking an egg into a bowl (as described by Morgenstern 1998, Lifschitz 1998, Shanahan 1998).
There are so many things to be considered: Is the bowl big enough? What if the bowl is made of cardboard? What if the egg is hardboiled? Common sense knowledge in stories on weblogs does offer many answers, for instance this story from Amit Asaravala – which also generates further knowledge as to what would happen to a person who does this:
Seeing the little weirdo reminded me of one Saturday morning, a year or so ago, when I cracked an egg into a bowl and found three yolks inside. After tossing the triplets, I cracked another egg from the batch and found yet another three yolks jiggling up at me. Another egg, another trio of blondes.
This continued through all twelve eggs — I kid you not.
Though the episode had me thoroughly creeped out, I must say that I am somewhat intrigued by the thought that, on some farm somewhere, there is a crotchety old hen that consistently lays triple-yolkers.
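The idea of retrieving the closest situation can be illustrated with a very small Python sketch. Note that this swaps in plain word-overlap (Jaccard) similarity as a stand-in; it is not the method from Gordon's research, which the talk summary here does not spell out. The first story is a sentence from the quote above, the other two are invented toy sentences, and all function names are hypothetical.

```python
import re

# First entry: a sentence from the story quoted above.
# The other two are invented toy sentences for contrast.
STORIES = [
    "I cracked an egg into a bowl and found three yolks inside.",
    "We drove to the lake and went fishing at dawn.",
    "The bowl fell off the table and broke.",
]


def tokens(text):
    """Lower-case the text and return its set of words."""
    return set(re.findall(r"[a-z]+", text.lower()))


def jaccard(a, b):
    """Word-overlap similarity: shared words divided by all words."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def closest_story(query, stories):
    """Return the story whose word set overlaps most with the query."""
    query_tokens = tokens(query)
    return max(stories, key=lambda story: jaccard(query_tokens, tokens(story)))


print(closest_story("cracking an egg into a bowl", STORIES))
```

With millions of stories one would need something far more scalable and linguistically aware than this, but the principle stays the same: find the stories that look most like the situation at hand and read off what happened next.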
In the following discussion, some people wondered if weblogs aren’t an unreliable source for a common sense knowledge database. Andrew, however, doubted that the distinction between true and false, or between true and fictitious, really mattered. Instead he suggested that in 99% of cases the same physical reasoning applies in, say, the Star Wars universe as in the real world.
Common sense knowledge is not about the velocity of spacecraft crossing the Milky Way, it’s about what happens if Leia punches Han.
Which is yet another point supporting the view that common sense knowledge is so obvious that most of the time we don’t even know we know it. And that’s a challenge to knowledge management.
Oh, and something very nice happened to me today: While I sat in our booth preparing this blog post, someone approached me very politely saying that he had read my name somewhere before, on some blog. Turns out this person – Stefano Bertolo, Project Officer at the Information Society Directorate of the European Commission – has in the past also left a comment on the Flickr page of our “Escape from the Data Silo” logo (which can be used freely by anyone under a CC license). It’s a small world, thanks to Social Media:-) We had a nice conversation at our booth, during which he also recommended the NeON project: Lifecycle Support for Networked Ontologies: a recommendation which I herewith pass on to you, reader of this blog:-)
P.S. There were many more interesting talks and sessions, but the scope of this blogpost is, sadly, limited by the rules of physics: I could only attend one talk at a time.