Jana Herwig

TRIPLE-I 2008: First Day Filled by Commonsense Knowledge

The TRIPLE-I conference in Graz started today with a keynote by Henry Lieberman, research scientist at the MIT Media Laboratory. Given that, nominally, at least a third of the conference is dedicated to knowledge management, Lieberman introduced an important, often overlooked aspect of knowledge management right at the beginning: managing knowledge that everybody knows already.

Knowledge management typically aims at knowledge that people do not know yet, e.g. (tacit) knowledge that people have acquired in a project and that is supposed to be made explicit and accessible to other people who don’t yet have this knowledge.

But what about the knowledge that everybody knows without them knowing they need to know it? Such as that an apple is a type of fruit, and is green and is red? Common sense knowledge?

I boldly asked Henry Lieberman for a 12-second definition of Common Sense Knowledge, a challenge he met with perfect precision:


Henry Lieberman defines Common Sense Knowledge on 12seconds.tv

An intriguing MIT project that Henry Lieberman introduced, and that I hadn’t yet heard about, is the common sense knowledge base Open Mind Common Sense: anyone can sign up to it and contribute. A total of 203 facts have, for instance, been accumulated about the concept “apple”, including facts such as these:

→ An apple is red
→ An apple is green
→ Apples grow in trees
→ an apple are food.
→ An apple has a core
→ An apple can fall from a tree
→ An apple is a type of fruit

The similar concepts offered are “egg, potato, steak, bread, spinach, frozen food, butter, appl [sic], leftover, grape”. The process of adding knowledge is guided by a list of questions that help to conceptualize and structure the knowledge, e.g.

MadeOf: What is it made of?
IsA: What kind of thing is it?
UsedFor: What do you use it for?
CapableOf: What can it do?
PartOf: What is it part of?
DefinedAs: How do you define it?
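To make the structure behind these questions more tangible, here is a minimal Python sketch that stores facts as (concept, relation, value) entries, using only the six relation names listed above. The `Assertion` and `KnowledgeBase` names are my own invention for illustration. This is not how Open Mind Common Sense is actually implemented, just one plausible way to picture it.

```python
from collections import defaultdict
from typing import NamedTuple, Optional

# Relation names and guiding questions as listed in the post.
QUESTIONS = {
    "MadeOf": "What is it made of?",
    "IsA": "What kind of thing is it?",
    "UsedFor": "What do you use it for?",
    "CapableOf": "What can it do?",
    "PartOf": "What is it part of?",
    "DefinedAs": "How do you define it?",
}


class Assertion(NamedTuple):
    """One common sense fact (hypothetical structure, for illustration)."""
    concept: str
    relation: str
    value: str


class KnowledgeBase:
    """A toy in-memory store of assertions, indexed by concept."""

    def __init__(self) -> None:
        self._by_concept = defaultdict(list)

    def add(self, concept: str, relation: str, value: str) -> Assertion:
        # Only accept relations that have a guiding question.
        if relation not in QUESTIONS:
            raise ValueError(f"unknown relation: {relation}")
        assertion = Assertion(concept.lower(), relation, value)
        self._by_concept[assertion.concept].append(assertion)
        return assertion

    def about(self, concept: str, relation: Optional[str] = None) -> list:
        # Return all facts about a concept, optionally filtered by relation.
        facts = self._by_concept.get(concept.lower(), [])
        return [a for a in facts if relation is None or a.relation == relation]


kb = KnowledgeBase()
kb.add("apple", "IsA", "fruit")                   # "An apple is a type of fruit"
kb.add("apple", "CapableOf", "fall from a tree")  # "An apple can fall from a tree"

for fact in kb.about("apple"):
    print(f"{fact.concept} {fact.relation} {fact.value}")
```

Restricting contributions to a fixed set of relations is what turns free-form sentences into something a program can query, e.g. "everything an apple is capable of".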

But what roles can common sense knowledge play in interactive applications? Henry Lieberman suggested that, using common sense knowledge, a system can anticipate what a user is most likely to do, or at least make the most likely things the easiest to do, e.g. by providing a map from goals to concrete actions in the interface, or by integrating appropriate applications.

Lieberman furthermore introduced a couple of tools which illustrated these benefits, e.g. the prototype for an Event Minder for improved scheduling driven by common sense knowledge. Entering a statement such as “Lunch with Charlie at Miracle next Friday” would, for instance, prompt the system to calculate the date of ‘next Friday’, call up a calendar application and also query a web service for directions to Miracle.
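The date calculation is the easiest part of this to picture in code. The following Python sketch resolves a weekday name relative to a given day. The `next_weekday` function is hypothetical and not taken from the Event Minder prototype, and it assumes that ‘next Friday’ means the first Friday strictly after today, which is only one of the readings people use in practice.

```python
from datetime import date, timedelta

WEEKDAYS = ["monday", "tuesday", "wednesday", "thursday",
            "friday", "saturday", "sunday"]


def next_weekday(today: date, weekday_name: str) -> date:
    """Return the first date strictly after `today` that falls on `weekday_name`.

    Assumption: "next Friday" said on a Friday means the Friday one week later.
    """
    target = WEEKDAYS.index(weekday_name.lower())  # Monday is 0, as in date.weekday()
    # Days to add, always in the range 1..7 so that today itself is never returned.
    days_ahead = (target - today.weekday() - 1) % 7 + 1
    return today + timedelta(days=days_ahead)


# Example: resolving "next Friday" relative to a Wednesday.
print(next_weekday(date(2008, 9, 3), "Friday"))
```

A real scheduling assistant would additionally need common sense about, say, lunch usually happening around noon, which is exactly the kind of knowledge a plain date library does not have.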

Regarding the difference between CYC (the common sense knowledge ontology) and MIT’s common sense knowledge base Open Mind Common Sense: CYC is an ontology organized by experts with a broader and deeper knowledge, whereas Open Mind Common Sense grants access to anyone and, for instance, also contains information about kittens that might not be that relevant to experts. At this stage, there is no mapping to CYC.

Henry Lieberman’s keynote tied in nicely with a presentation by Andrew S. Gordon about “Envisioning with Weblogs”. According to Andrew Gordon, there have been three waves in the 50-year history of common sense knowledge in artificial intelligence:

First wave: Logical formalizations of common sense knowledge (e.g. CYC)
Second wave: Volunteer contributions from web communities (e.g. Open Mind Common Sense)
Third wave: Knowledge acquisition from the social web (e.g. Envisioning with Weblogs)

First off, what is envisioning? Andrew Gordon described it as a form of reasoning about states and events in time and space, generating answers to questions such as “What’s happening in the world right now?”, or “What is going on in the audience’s mind right now?”, or “How did this person get into the room?”, or “What am I going to have for dinner tonight?”

At the Institute for Creative Technologies (University of Southern California), Andrew is involved in a project called Story Representation and Management, which, among other things, conducts research on story interpretation, i.e. “techniques for integrating automated commonsense inference into the processing of narrative text documents, and methodologies for creating very large scale commonsense knowledge bases.”

One of the paths towards the creation of this knowledge base is gathering up stories on weblogs. But can we really gather up all stories ever written in a weblog? In the research conducted and cited by Andrew (Gordon 2007), 4.5 million stories, made up of 66.6 million sentences and 1.06 billion words, were extracted from weblogs.

In Gordon’s recipe for envisioning with weblogs, the retrieval of the closest situation provides the best results. Take for instance the quest to formalize this particular problem in common sense physical reasoning: cracking an egg into a bowl (as described by Morgenstern 1998, Lifschitz 1998, Shanahan 1998).

There are so many things to be considered: Is the bowl big enough? What if the bowl is made of cardboard? What if the egg is hard-boiled? Common sense knowledge in stories on weblogs does offer many answers, for instance this story from Amit Asaravala, which also generates further knowledge as to what would happen to a person who does this:

Seeing the little weirdo reminded me of one Saturday morning, a year or so ago, when I cracked an egg into a bowl and found three yolks inside. After tossing the triplets, I cracked another egg from the batch and found yet another three yolks jiggling up at me. Another egg, another trio of blondes.

This continued through all twelve eggs — I kid you not.

Though the episode had me thoroughly creeped out, I must say that I am somewhat intrigued by the thought that, on some farm somewhere, there is a crotchety old hen that consistently lays triple-yolkers.
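How a story like this one could be found for a given situation can be pictured with a toy Python sketch. The talk as summarized here does not say which retrieval method Gordon actually uses, so the bag-of-words cosine similarity below is my own stand-in technique. The function names are hypothetical, and apart from the first sentence (taken from the story above) the tiny “corpus” is made up for illustration.

```python
import math
import re
from collections import Counter


def bag_of_words(text: str) -> Counter:
    """Lower-case the text and count its word tokens."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def closest_story(situation: str, stories: list) -> str:
    """Return the story whose wording is most similar to the situation."""
    query = bag_of_words(situation)
    return max(stories, key=lambda s: cosine(query, bag_of_words(s)))


# Toy corpus: the first sentence is from the quoted story, the rest are invented.
stories = [
    "I cracked an egg into a bowl and found three yolks inside.",
    "An apple can fall from a tree.",
    "Leia punches Han.",
]

print(closest_story("cracking an egg into a bowl", stories))
```

With millions of stories, a real system would need proper indexing and a better notion of similarity than raw word overlap, but the principle of "find the nearest described situation and read off what happened next" stays the same.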

In the following discussion, some people wondered if weblogs aren’t an unreliable source for a common sense knowledge database. Andrew, however, doubted that the distinction between true and false, or between true and fictitious, really matters. Instead, he suggested that in 99% of cases the same physical reasoning applies in, say, the Star Wars universe as in the real world.

Common sense knowledge is not about the velocity of spacecraft crossing the Milky Way, it’s about what happens if Leia punches Han.

Which is yet another point supporting the idea that common sense knowledge is so obvious that most of the time we don’t even know we know it. And that’s a challenge to knowledge management.

Oh, and something very nice happened to me today: While I sat in our booth preparing this blog post, someone approached me very politely, saying that he had read my name somewhere before, on some blog. It turns out that this person, Stefano Bertolo, Project Officer at the Information Society Directorate of the European Commission, has in the past also left a comment on the Flickr page of our “Escape from the Data Silo” logo (which can be used freely by anyone under a CC license). It’s a small world, thanks to Social Media:-) We had a nice conversation at our booth, during which he also recommended the NeON project (Lifecycle Support for Networked Ontologies), a recommendation which I hereby pass on to you, reader of this blog:-)

P.S. There were many more interesting talks and sessions, but the scope of this blogpost is, sadly, limited by the rules of physics: I could only attend one talk at a time.

Jana Herwig

Common vs. Marginalized Knowledge – a Potential Showstopper for the Semantic Web?

Earlier today I published an interview that my colleague Marion Fugléwicz-Bren conducted with Corinna Bath from the Institute for Advanced Studies in Science, Technology and Society (IAS-TS). Corinna Bath is a researcher with a focus on gender studies in computer science; she has been working specifically towards a methodology for de-gendering IT design processes and is now also turning towards the Semantic Web. Now that CYC seems to be coming into wider, or renewed, use (e.g. Zitgist’s UMBEL is deriving its subject concepts and relationships from OpenCYC), it was interesting for me to read her remarks about the CYC project and specifically the research undertaken by Alison Adam in this context:

Alison Adam analyzed the well-known ontology CYC that was built to capture common sense knowledge from the 1980s on. Her criticism focussed on the built-in assumption that we would all share a consensus reality: “be it a professor, a waitress, a six-year old child, or even a lawyer” (Lenat and Guha 1990). She revealed that the knowing subject implicitly assumed by the system is a white, middle-class male professional.

Hence, in contrast to its own agenda, CYC ignores minority views and quieter voices, and allows the dominant voice to speak for everyone, which seems highly problematic. Other studies give more evidence for the highly problematic prerequisite of computer science modelling that rests on the Cartesian epistemology. Even the modelling concepts themselves should be questioned, as Cecile Crutzen suggests, since e.g. the class concept and the inheritance concept fail to represent social processes, because of limited formal expressiveness for conflict, change and fluidity. Such an ontology abstracts from human sociality, situated action and real meaning construction processes.

This also made me think about my own role within and attachment to the Semantic Web Community – from a professional point of view, I see myself as a sort of mouthpiece for the Semantic Web (at least within the professional community that I am a part of), and while I am convinced that the movement is going to see its big break within the next five years, I don’t see myself as playing a significant role in it. And I’m always inclined to leave all the ‘hard stuff’, i.e. all the technology-related questions to the ‘boys’ in our team.

But one of the good things about the Semantic Web is that it is actually EASY to understand. I’ve also been told by Henry Story, for instance, that N3 (Notation3, a shorthand non-XML serialization of Resource Description Framework models) is relatively easy to learn; and since I am one of the few women I know (sadly) who actually know what an ontology is, maybe it is about time that I learned to model one myself.

Because we cannot expect that white, middle-class male professionals are going to be able to explore the feminine or queer knowledge in this world and mold it into a common knowledge base. Even if marginalized voices can hardly expect the hegemony to advocate their cause, the Semantic Web project itself is at stake if some voices, views and knowledge are excluded. This could indeed be a showstopper for the Semantic Web, not immediately on a technology level, but with regard to meeting the societal goals of its own agenda.

Read the entire interview with Corinna Bath here.

Alison Adam’s cited work is contained in: Building Large Knowledge-Based Systems: Representation and Inference in the Cyc Project (D.B. Lenat and R.V. Guha 1990).

Jana Herwig

Just released: UMBEL – A New Vocabulary for the Semantic Web

News has reached me this morning that UMBEL has now been publicly released! UMBEL is a new vocabulary for the Semantic Web. I first learned about it when Andreas Blumauer returned from LinkedData Planet, where he had met up with Mike Bergman from Zitgist LLC, who are working on UMBEL.

Here is the release announcement Mike communicated via email yesterday:

UMBEL (Upper Mapping and Binding Exchange Layer) [1] is a lightweight ontology for relating Web content and data to a standard set of 20,000 subject concepts. Based on OpenCyc [2], these subject concepts have defined relationships between them, and can act as semantic binding nodes for any data or Web content. A further 1.5 million named entities have been extracted from Wikipedia and mapped to the UMBEL reference structure with cross-links to YAGO [3] and DBpedia [4]. The system can easily be extended with additional dictionaries of named entities, including ones specific to enterprises or domains.

UMBEL is provided as open source under the Creative Commons 3.0 Attribution-Share Alike license. The complete ontology with all subject concepts, definitions, terms and relationships can be freely downloaded [see 5]. All subject concepts and named entities are available as Linked Data [see 5]. Five volumes of documentation [5] are also available.

The release is accompanied by about a dozen Web services [6] for using or manipulating UMBEL, along with a new introductory slide show [7]. Additional release information may be found on Fred’s [8] or my [9] separate blog postings. We welcome those with interest or suggestions for improvements to do so through the UMBEL discussion forum [10]. We will shortly be putting easier services online for such input.

So, enjoy! We look forward to your commentary, suggestions and putting UMBEL under production-grade stress. We know we will be doing the same!

Regards, Mike

Great release! They have also given us access to a media-oriented article which you can read on our portal.