Jana Herwig

The Wild vs The Orderly: Folksonomies and Semantics (TRIPLE-I 2008)

This second day of TRIPLE-I 2008 was my personal folksonomy day, even though the theme had already been set yesterday by Andreas Hotho’s invited talk “Extracting Semantics from Folksonomies”, the opening lecture of the workshop “Knowledge acquisition from the Social Web.”

Andreas Hotho directs the Bibsonomy project at Kassel University’s Knowledge and Data Engineering research group; Bibsonomy is a social bookmark and publication sharing system catering especially to researchers who, in addition to bookmarking, also wish to manage publications. Among other interesting things, Bibsonomy supports the import of bookmarks from del.icio.us, Firefox bookmarks and local BibTeX files. Being a project led by a university’s computer science department, Bibsonomy is at the same time the result, the object and a stimulus for research in the area of tagging and folksonomies. Andreas describes this double appeal of folksonomies to both ordinary people and researchers in a 12-second vlog post:


Andreas Hotho’s statement about folksonomies and research (see www.bibsonomy.org) on 12seconds.tv

One of the outcomes of the research into folksonomies is FolkRank, a search algorithm that exploits the structure of folksonomies; the name reveals that it was inspired by PageRank, but as the graph of folksonomy structures does not correspond to the web graph, some adaptations had to be made. The specifics of these adaptations can be found in an online article by Andreas and his colleagues: “FolkRank: A Ranking Algorithm for Folksonomies” (PDF, 268 KB).

Andreas Hotho’s talk more specifically addressed the search for methods to identify tags which describe the same concept (or a more specific / a more general concept respectively) within a folksonomy. He suggested two approaches:

  1. Applying measures directly to folksonomy statistics, which allows each tag to be described as a vector; e.g. co-occurrence frequency and FolkRank could serve as similarity measures (both tend to favor high-frequency tags), as could a cosine measure (which is more likely to produce “siblings”)
  2. Looking up tags in an external thesaurus/vocabulary (for instance achieving semantic grounding by mapping a tag and its most similar tags to WordNet synsets)
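
To make the first approach a bit more tangible, here is a minimal sketch of how co-occurrence and cosine similarity can behave differently. It is not the method from the talk or from Bibsonomy: the toy posts, the function names and the choice of co-occurrence counts as vector entries are all my own assumptions for illustration.

```python
from collections import Counter, defaultdict
from itertools import combinations
from math import sqrt

# Hypothetical toy data: each post is the set of tags one user gave one resource.
posts = [
    {"web", "javascript", "ajax"},
    {"web", "js", "ajax"},
    {"web", "javascript", "programming"},
    {"web", "js", "programming"},
    {"web", "design"},
]


def cooccurrence_vectors(posts):
    """Describe each tag as a vector: how often it shares a post with every other tag."""
    vectors = defaultdict(Counter)
    for tags in posts:
        for a, b in combinations(sorted(tags), 2):
            vectors[a][b] += 1
            vectors[b][a] += 1
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as Counters."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm = sqrt(sum(x * x for x in u.values())) * sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0


vectors = cooccurrence_vectors(posts)


def most_cooccurring(tag):
    """The tag that appears together with `tag` most often."""
    return vectors[tag].most_common(1)[0][0]


def most_cosine_similar(tag):
    """The tag whose co-occurrence vector points in the most similar direction."""
    others = (t for t in vectors if t != tag)
    return max(others, key=lambda t: cosine(vectors[tag], vectors[t]))


print(most_cooccurring("javascript"))
print(most_cosine_similar("javascript"))
```

In this toy data, plain co-occurrence pairs “javascript” with the very frequent tag “web”, while the cosine measure picks “js”, a tag that never appears together with “javascript” but is used in the same contexts.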

The future areas of interest within folksonomy research that Andreas proposed were trend detection, tag recommendation, detecting spam (a major challenge!), logsonomies (i.e. the structure of search engine query log files) and learning synsets, hierarchies, and structures of folksonomies. Andreas Hotho can be contacted via his homepage if you have any further questions regarding Bibsonomy, FolkRank or this piece of research.

Another presentation dedicated to folksonomies – and the presentation that won my personal presentation design award – was “Seeding, Weeding, Fertilizing – Different Tag Gardening Activities for Folksonomy Maintenance and Enrichment” by Katrin Weller and Isabella Peters, both from the Dept. of Information Science at Heinrich Heine University in Düsseldorf. The entire presentation was designed to match the CI of Tagcare, a tag gardening tool that is hopefully going to go online soon.

The term “Tag Gardening” was borrowed from James Governor who wrote in a 2006 blogpost:

“Like plants or animals, tags evolve in an emergent fashion, open to hybridisation. Stewardship can help grow and put roots down.

Helping the darwinian process is tag gardening.

Tag gardening is about taking tags in the wild and tending to them, or identifying a wild tag that will do well in your south facing IT garden. I am talking about domestication here.

Just like there are professional bloggers i am pretty sure some parties will emerge that get paid for their abilities.”

I seriously hope that the latter is going to come true, even though I have the feeling that most providers will continue to consider user input and effort pro bono work!

Katrin Weller’s intro (Isabella Peters had excused herself) focused on the well-known problems with tags and folksonomies, e.g.:

  • spelling variants, synonyms, abbreviations, different natural languages
  • ad hoc or personal functions of tags other than content description (e.g. “toread”, “@Henry”, “nicepic”)
  • flatness of tag clouds which allows for browsing by popularity, but not by semantic interrelations

She further distinguished three levels where tag or tag cloud improvement becomes relevant:

  • single document vs document collection level
  • single user vs collaborative level
  • intra- and cross-platform level (e.g. different tagging conventions, tag separation with comma or blank space, etc.)

To push the gardening metaphor even further, Katrin presented their ideas of weeding, seeding, fertilizing etc.:

Weeding
The weeds in this case are “bad” tags like spam or misspelled tags (weed: any plant that crowds out cultivated plants)
Aim: enhancing recall and a consistent indexing vocabulary
Achieved by: type-ahead functionality, editing functionalities, natural language processing, user guidelines for indexing and retrieval, nomination of authorized users as gardeners
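
As a very rough illustration of what automated weeding could look like, the following sketch merges spelling variants that differ only in case or separators. This is my own assumption of a simple approach, not anything shown in the presentation or part of TagCare; the tag counts and function names are made up, and real misspellings would need proper natural language processing.

```python
import re
from collections import Counter, defaultdict

# Hypothetical tag usage counts, invented for illustration.
tag_counts = Counter({
    "semanticweb": 40,
    "SemanticWeb": 12,
    "semantic-web": 7,
    "semantic_web": 3,
    "folksonomy": 25,
    "Folksonomy": 4,
})


def normalize(tag):
    """Lowercase the tag and drop spaces, hyphens and underscores."""
    return re.sub(r"[\s_\-]+", "", tag.lower())


def weed(tag_counts):
    """Map every variant to the most frequently used spelling in its group."""
    groups = defaultdict(Counter)
    for tag, count in tag_counts.items():
        groups[normalize(tag)][tag] += count
    mapping = {}
    for variants in groups.values():
        canonical = variants.most_common(1)[0][0]
        for variant in variants:
            mapping[variant] = canonical
    return mapping


def merged_counts(tag_counts):
    """Add up the counts of all variants under their canonical spelling."""
    mapping = weed(tag_counts)
    merged = Counter()
    for tag, count in tag_counts.items():
        merged[mapping[tag]] += count
    return merged


print(merged_counts(tag_counts))
```

Choosing the most frequent spelling as the canonical one lets the community’s own usage decide, which seems in keeping with the gardening idea; a human gardener could still overrule the result.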

Seeding
Seeding in folksonomies means supplementing frequently used tags with more specific tags (called “baby tags” or “seedlings” by Katrin Weller; seedling: young plant or tree grown from a seed)

Landscaping
The idea of landscaping here means to create “flower beds” through identifying species of tags, e.g. by similarity.
Aim: enhancing precision and expressiveness

Fertilizing
Fertilizing in this context means to combine folksonomies with other knowledge organization systems (KOS): thesauri, controlled vocabularies, ontologies, etc. (fertilizer: any substance such as manure or a mixture of nitrates used to make soil more fertile). Fertilizing might work both ways, Katrin suggested: a folksonomy might be fertilized with the semantic structure of a KOS, or a KOS enhanced by terms from a folksonomy.

And finally TagCare: The ambitious plan is to have a system that lets users import tag clouds from Flickr, del.icio.us and Bibsonomy, clean up inconsistencies between tags, add hierarchical structure to the tag clouds and view tag statistics, and that probably also offers community features such as calibrating one’s tags with those of the chief gardener or activating collaborative spam elimination. It is going to be a free service, and if you want to be notified when it goes live, you might want to send an email to Katrin.

This full-service proposal for tag gardening does of course sound brilliant – yet is it going to be feasible, on a technical level? In the post-presentation discussion, somebody mentioned Faviki, which relies on DBpedia concepts to solidify the tag cloud. It didn’t exactly seem as though the TagCare team had already thought along these (semantic web) lines, even though this perfectly corresponded to their ‘Fertilizing’ idea. But if TagCare solely relies on good human gardeners, how long will it take until they have gained a big enough community to stimulate someone’s altruism? The idea of tag gardening of course is beautiful, and I am curious to learn more about the technology it is going to use.

Other folksonomy and tag related presentations that I was unable to attend or am unable to describe now, after the 10th hour of my 2nd day at TRIPLE-I, with a band performing folklore music involving yodeling and probably Schuhplattler right outside of this room:

  • Quality Metrics for Tags of Broad Folksonomies (Celine Van Damme, Martin Hepp, Tanguy Coenen, University of Brussels, Universität der Bundeswehr München)
  • Providing Multi Source Tag Recommendations in a Social Resource Sharing Platform (Martin Memmel, Michael Kockler, Rafael Schirru, German Research Centre for Artificial Intelligence DFKI)
  • Semantic Tagging and Inference in Online Communities (Yildirim Ahmet, Üsküdarli Suzan, Boğaziçi University)
  • Using Visual Features to Improve Tag Suggestions in Image Sharing Sites (Mathias Lux, Oge Marques, Arthur Pitman, Klagenfurt University)
  • Harnessing Wikipedia for Smart Tags Clustering (Maria Grineva, Maxim Grinev, Denis Turdakov, Pavel Velikhov, Russian Academy of Sciences)

Please leave a comment if you think that any of the above needs correction.

EDIT: I got the chance to record another 12 seconds definition (and am thinking of setting up a video glossary for the Semantic Web now): Rolf Sint from Salzburg Research explains what folksonomies are and why folksonomies and ontologies go together well in 12 seconds! Rolf is also involved in the KiWi project, which aims to develop a wiki-based knowledge management system boosted by semantic technologies.


Rolf Sint explains folksonomies and their relation to ontologies on 12seconds.tv

Jana Herwig

TRIPLE-I 2008: First Day Filled by Commonsense Knowledge

The TRIPLE-I conference in Graz started today with a keynote by Henry Lieberman, research scientist at the MIT Media Laboratory. Given that, nominally, at least a third of the conference is dedicated to knowledge management, Lieberman introduced an important, often overlooked aspect of knowledge management right at the beginning: Managing knowledge that everybody knows already.

Knowledge management typically aims at knowledge that people do not know yet, e.g. (tacit) knowledge that people have acquired in a project and that is supposed to be made explicit and accessible to other people who don’t yet have this knowledge.

But what about the knowledge that everybody knows without them knowing they need to know it? Such as that an apple is a type of fruit, and is green and is red? Common sense knowledge?

I boldly asked Henry Lieberman for a 12-second definition of Common Sense Knowledge, a challenge he met with perfect precision:


Henry Lieberman defines Common Sense Knowledge on 12seconds.tv

An intriguing MIT project that Henry Lieberman introduced, and that I hadn’t yet heard about, is the common sense knowledge base Open Mind Common Sense – anyone can sign up and contribute. A total of 203 knowledge facts have, for instance, been accumulated about the concept “apple”, including facts such as these:

→ An apple is red
→ An apple is green
→ Apples grow in trees
→ an apple are food.
→ An apple has a core
→ An apple can fall from a tree
→ An apple is a type of fruit

The similar concepts offered are “egg, potato, steak, bread, spinach, frozen food, butter, appl [sic], leftover, grape”. The process of adding knowledge is guided by a list of questions that help contributors conceptualize and structure the knowledge, e.g.

MadeOf
What is it made of?
IsA
What kind of thing is it?
UsedFor
What do you use it for?
CapableOf
What can it do?
PartOf
What is it part of?
DefinedAs
How do you define it?
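
The relation types and guiding questions above lend themselves to a simple concept/relation/value structure. The following is only a sketch of that idea under my own assumptions; it is not the actual data model or API of Open Mind Common Sense, and the names `Assertion`, `prompt` and `answers` are invented for illustration.

```python
from collections import namedtuple

# One piece of common sense knowledge: (concept, relation, value).
Assertion = namedtuple("Assertion", "concept relation value")

# Relation types and the guiding questions listed above.
QUESTIONS = {
    "MadeOf": "What is it made of?",
    "IsA": "What kind of thing is it?",
    "UsedFor": "What do you use it for?",
    "CapableOf": "What can it do?",
    "PartOf": "What is it part of?",
    "DefinedAs": "How do you define it?",
}

# A few of the apple facts quoted earlier, encoded by hand.
assertions = [
    Assertion("apple", "IsA", "fruit"),
    Assertion("apple", "IsA", "food"),
    Assertion("apple", "CapableOf", "fall from a tree"),
]


def prompt(concept, relation):
    """Build the question a contributor would be asked about a concept."""
    return f"{concept}: {QUESTIONS[relation]}"


def answers(concept, relation):
    """Return all known values for a concept and relation type."""
    return [a.value for a in assertions
            if a.concept == concept and a.relation == relation]


print(prompt("apple", "IsA"))
print(answers("apple", "IsA"))
```

Keeping the relation type explicit is what turns free-text sentences into something a program can query.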

But what are the roles that common sense knowledge can play in interactive applications? Henry Lieberman suggested that, using common sense knowledge, a system can for example anticipate what a user is most likely to do, or at least make the most likely things the easiest to do, e.g. by providing a map from goals to concrete actions in the interface, or by integrating appropriate applications.

Lieberman furthermore introduced a couple of tools which illustrated these benefits, e.g. the prototype for an Event Minder for improved scheduling driven by common sense knowledge. Entering a statement such as “Lunch with Charlie at Miracle next Friday” would for instance calculate the date of ‘next Friday’, call up a calendar application and also a web service to get directions for getting to Miracle.
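
The date calculation in that example is easy to sketch. The code below is not how the Event Minder prototype works, which I do not know; it is a minimal illustration that assumes a fixed sentence pattern and reads “next Friday” as the nearest Friday strictly after the reference date. The function names and the reference date are arbitrary choices of mine.

```python
import re
from datetime import date, timedelta

WEEKDAYS = ["monday", "tuesday", "wednesday", "thursday",
            "friday", "saturday", "sunday"]


def next_weekday(reference, weekday_name):
    """Return the first date strictly after `reference` that falls on `weekday_name`."""
    target = WEEKDAYS.index(weekday_name.lower())
    days_ahead = (target - reference.weekday() - 1) % 7 + 1  # always 1..7
    return reference + timedelta(days=days_ahead)


def parse_event(text, reference):
    """Parse statements of the form '<what> with <who> at <where> next <weekday>'."""
    pattern = r"(?P<what>.+?) with (?P<who>.+?) at (?P<where>.+?) next (?P<day>\w+)$"
    match = re.match(pattern, text)
    if match is None:
        return None
    return {
        "what": match["what"],
        "who": match["who"],
        "where": match["where"],
        "date": next_weekday(reference, match["day"]),
    }


# Arbitrary reference date for the example (a Wednesday).
event = parse_event("Lunch with Charlie at Miracle next Friday", date(2008, 9, 3))
print(event)
```

The harder, common sense part is everything this sketch leaves out: knowing that lunch happens around noon, that Miracle is a place one needs directions to, and so on.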

Regarding the difference between CYC (the common sense knowledge ontology) and MIT’s common sense knowledge base Open Mind Common Sense: CYC is an ontology organized by experts, with broader and deeper knowledge, whereas Open Mind Common Sense is open to anyone and also holds, for instance, information about kittens that might not be that relevant to experts. At this stage, there is no mapping to CYC.

Henry Lieberman’s keynote tied in nicely with a presentation by Andrew S. Gordon about “Envisioning with Weblogs”. According to Andrew Gordon, there have been three waves in the 50-year history of common sense knowledge in artificial intelligence:

First wave: Logical formalizations of common sense knowledge (e.g. CYC)
Second wave: Volunteer contributions from web communities (e.g. Open Mind Common Sense)
Third wave: Knowledge acquisition from the social web (e.g. Envisioning with Weblogs)

First off, what is envisioning? Andrew Gordon described it as a form of reasoning about states and events in time and space, generating answers to questions such as “What’s happening in the world right now?”, or “What is going on in the audience’s mind right now?”, or “How did this person get into the room?”, or “What am I going to have for dinner tonight?”

At the Institute for Creative Technologies (University of Southern California), Andrew is involved in a project called Story Representation and Management, which among other things, is doing research on story interpretation, i.e. “techniques for integrating automated commonsense inference into the processing of narrative text documents, and methodologies for creating very large scale commonsense knowledge bases.”

One of the paths towards the creation of this knowledge base is gathering up stories on weblogs. But can we really gather up all stories ever written in a weblog? In the research conducted and cited by Andrew (Gordon 2007), 4.5 million stories, made up of 66.6 million sentences and 1.06 billion words, were extracted from weblogs.

In Gordon’s recipe for envisioning with weblogs, the retrieval of the closest situation provides the best results. Take for instance the quest of formalizing this particular problem in common sense physical reasoning: cracking an egg into a bowl (as described by Morgenstern 1998, Lifschitz 1998, Shanahan 1998).

There are so many things to be considered: Is the bowl big enough? What if the bowl is made of cardboard? What if the egg is hard-boiled? Common sense knowledge in stories on weblogs does offer many answers, for instance this story from Amit Asaravala – which also generates further knowledge as to what would happen to a person who does this:

Seeing the little weirdo reminded me of one Saturday morning, a year or so ago, when I cracked an egg into a bowl and found three yolks inside. After tossing the triplets, I cracked another egg from the batch and found yet another three yolks jiggling up at me. Another egg, another trio of blondes.

This continued through all twelve eggs — I kid you not.

Though the episode had me thoroughly creeped out, I must say that I am somewhat intrigued by the thought that, on some farm somewhere, there is a crotchety old hen that consistently lays triple-yolkers.

In the following discussion, some people wondered if weblogs aren’t an unreliable source for a common sense knowledge database. Andrew, however, doubted that the distinction between true and false, or between true and fictitious, really mattered. Instead he suggested that in 99% of cases the same physical reasoning applies in, say, the Star Wars universe as in the real world.

Common sense knowledge is not about the velocity of spacecraft crossing the Milky Way, it’s about what happens if Leia punches Han.

Which is yet another point sustaining that common sense knowledge is so obvious that most of the time we don’t even know we know it. And that’s a challenge to knowledge management.

Oh, and something very nice happened to me today: While I sat in our booth preparing this blog post, someone approached me very politely, saying that he had read my name somewhere before, on some blog. Turns out this person – Stefano Bertolo, Project Officer at the Information Society Directorate of the European Commission – has in the past also left a comment on the Flickr page of our “Escape from the Data Silo” logo (which can be used freely by anyone under a CC license). It’s a small world, thanks to Social Media:-) We had a nice conversation at our booth, during which he also recommended the NeON project: Lifecycle Support for Networked Ontologies: a recommendation which I herewith pass on to you, reader of this blog:-)

P.S. There were many more interesting talks and sessions, but the scope of this blogpost is, sadly, limited by the rules of physics: I could only attend one talk at a time.
