Jana Herwig

The Wild vs The Orderly: Folksonomies and Semantics (TRIPLE-I 2008)

This second day of TRIPLE-I 2008 was my personal folksonomy day, even though the theme had already been set yesterday by Andreas Hotho’s invited talk “Extracting Semantics from Folksonomies”, the opening lecture of the workshop “Knowledge acquisition from the Social Web.”

Andreas Hotho directs the Bibsonomy project at Kassel University’s Knowledge and Data Engineering research group. Bibsonomy is a social bookmark and publication sharing system aimed especially at researchers who, in addition to bookmarking, also wish to manage publications. Among other things, Bibsonomy supports importing bookmarks from del.icio.us, Firefox bookmarks and local BibTeX files. As a project led by a university’s computer science department, Bibsonomy is at the same time the result, the object and a stimulus of research in the area of tagging and folksonomies. Andreas describes this double appeal of folksonomies, to ordinary people and researchers alike, in a 12-second vlog post:


Andreas Hotho’s statement about folksonomies and research (see www.bibsonomy.org) on 12seconds.tv

One of the outcomes of the research into folksonomies is FolkRank, a ranking algorithm that exploits the structure of folksonomies. As the name suggests, it was inspired by PageRank, but since the folksonomy graph does not have the same structure as the web graph, some adaptations had to be made. The specifics of these adaptations can be found in an online article by Andreas and his colleagues: “FolkRank: A Ranking Algorithm for Folksonomies” (PDF, 268 KB).
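The core idea can be sketched in a few lines of Python. This is a toy illustration under our own assumptions, not the Bibsonomy code: each tag assignment (user, tag, resource) links its three nodes in an undirected, weighted graph; weight is spread PageRank-style (w ← d·Aw + (1 − d)·p); and the FolkRank score is taken as the difference between a run whose preference vector p favours a chosen tag and a run with a uniform p. The damping value d = 0.7, the size of the boost and all function names (`build_graph`, `spread`, `folkrank`) are illustrative choices; the article cited above gives the actual parameters.

```python
from collections import defaultdict


def build_graph(assignments):
    """Build an undirected, weighted graph from (user, tag, resource) triples.

    Each triple links its three nodes pairwise; repeated co-occurrence
    increases the edge weight.
    """
    graph = defaultdict(lambda: defaultdict(float))
    for user, tag, resource in assignments:
        nodes = [("user", user), ("tag", tag), ("res", resource)]
        for i in range(3):
            for j in range(i + 1, 3):
                graph[nodes[i]][nodes[j]] += 1.0
                graph[nodes[j]][nodes[i]] += 1.0
    return graph


def spread(graph, preference, d=0.7, iterations=50):
    """PageRank-style weight spreading: w <- d * A w + (1 - d) * p.

    Every node passes its weight on to its neighbours in proportion to the
    edge weights; p is the normalised preference vector (default weight 1).
    """
    nodes = list(graph)
    total = sum(preference.get(n, 1.0) for n in nodes)
    p = {n: preference.get(n, 1.0) / total for n in nodes}
    degree = {n: sum(graph[n].values()) for n in nodes}
    w = dict(p)
    for _ in range(iterations):
        new_w = {n: (1 - d) * p[n] for n in nodes}
        for u in nodes:
            for v, weight in graph[u].items():
                new_w[v] += d * w[u] * weight / degree[u]
        w = new_w
    return w


def folkrank(graph, preferred, boost=None):
    """Score = ranking with preference for `preferred` minus ranking without."""
    if boost is None:
        boost = float(len(graph))
    baseline = spread(graph, {})
    biased = spread(graph, {n: 1.0 + boost for n in preferred})
    return {n: biased[n] - baseline[n] for n in graph}


# Invented toy data
assignments = [
    ("alice", "semanticweb", "paper1"),
    ("alice", "ontology", "paper1"),
    ("bob", "semanticweb", "paper2"),
    ("bob", "folksonomy", "paper2"),
    ("carol", "cooking", "recipe1"),
]
graph = build_graph(assignments)
scores = folkrank(graph, [("tag", "semanticweb")])
tags = sorted((n for n in scores if n[0] == "tag"), key=scores.get, reverse=True)
print([name for _, name in tags])
```

On this toy data the preferred tag comes out on top, while “cooking”, which sits in a part of the graph unconnected to the preferred tag, ends up last.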

More specifically, Andreas Hotho’s talk addressed methods for identifying tags within a folksonomy that describe the same concept (or a more specific or more general one). He suggested two approaches:

  1. Applying measures directly to folksonomy statistics, describing each tag as a vector: co-occurrence frequency and FolkRank can serve as similarity measures (both tend to favour high-frequency tags), as can the cosine measure (which is more likely to produce “siblings”)
  2. Looking up tags in an external thesaurus or vocabulary (for instance, achieving semantic grounding by mapping a tag and its most similar tags to WordNet synsets)
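To make the difference between the two kinds of measures concrete, here is a small Python sketch on invented posts. It is our own illustration, not Hotho’s code: co-occurrence simply counts how often two tags are assigned to the same post, whereas the cosine variant represents each tag as a vector of its co-occurrence counts with all other tags (leaving out the two tags being compared) and compares those vectors. The toy data, the dimension-exclusion choice and the function names are assumptions.

```python
from collections import Counter
from itertools import combinations
from math import sqrt


def cooccurrence(posts):
    """Count how often each unordered pair of tags appears on the same post."""
    counts = Counter()
    for tags in posts:
        for a, b in combinations(sorted(tags), 2):
            counts[(a, b)] += 1
    return counts


def cooc(counts, a, b):
    """Co-occurrence count of tags a and b (order-independent)."""
    return counts[tuple(sorted((a, b)))]


def cosine(counts, vocabulary, a, b):
    """Cosine similarity of the co-occurrence vectors of tags a and b.

    The dimensions for a and b themselves are left out, so the score
    reflects shared context rather than direct co-occurrence.
    """
    dims = [t for t in vocabulary if t not in (a, b)]
    va = [cooc(counts, a, t) for t in dims]
    vb = [cooc(counts, b, t) for t in dims]
    dot = sum(x * y for x, y in zip(va, vb))
    norm = sqrt(sum(x * x for x in va)) * sqrt(sum(y * y for y in vb))
    return dot / norm if norm else 0.0


posts = [
    {"programming", "java"},
    {"programming", "python"},
    {"programming", "java", "tutorial"},
    {"programming", "python", "tutorial"},
    {"programming", "web"},
]
vocabulary = sorted(set().union(*posts))
counts = cooccurrence(posts)
others = [t for t in vocabulary if t != "java"]
by_cooc = max(others, key=lambda t: cooc(counts, "java", t))
by_cosine = max(others, key=lambda t: cosine(counts, vocabulary, "java", t))
print(by_cooc, by_cosine)  # programming python
```

On this data, co-occurrence pairs “java” with the more general, high-frequency tag “programming”, while the cosine measure pairs it with its sibling “python”, which never appears together with “java” but shares the same contexts.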

Among the future areas of interest for folksonomy research, Andreas proposed trend detection, tag recommendation, spam detection (a major challenge!), logsonomies (i.e. the structure of search engine query log files) and learning synsets, hierarchies and structures of folksonomies. If you have any further questions regarding Bibsonomy, FolkRank or this research, Andreas Hotho can be contacted via his homepage.

Another presentation dedicated to folksonomies – and the presentation that won my personal presentation design award – was “Seeding, Weeding, Fertilizing – Different Tag Gardening Activities for Folksonomy Maintenance and Enrichment” by Katrin Weller and Isabella Peters, both from the Dept. of Information Science at Heinrich Heine University in Düsseldorf. The entire presentation was designed to match the corporate identity (CI) of TagCare, a tag gardening tool that will hopefully go online soon.

The term “tag gardening” was borrowed from James Governor, who wrote in a 2006 blog post:

“Like plants or animals, tags evolve in an emergent fashion, open to hybridisation. Stewardship can help grow and put roots down.

Helping the darwinian process is tag gardening.

Tag gardening is about taking tags in the wild and tending to them, or identifying a wild tag that will do well in your south facing IT garden. I am talking about domestication here.

Just like there are professional bloggers i am pretty sure some parties will emerge that get paid for their abilities.”

I seriously hope that the latter will come true, even though I have the feeling that most providers will continue to regard user input and effort as pro bono work!

Katrin Weller’s introduction (Isabella Peters had excused herself) focused on the well-known problems with tags and folksonomies, e.g.:

  • spelling variants, synonyms, abbreviations, different natural languages
  • ad hoc or personal functions of tags other than content description (e.g. “toread”, “@Henry”, “nicepic”)
  • flatness of tag clouds, which allows for browsing by popularity but not by semantic interrelations

She further distinguished three levels where tag or tag cloud improvement becomes relevant:

  • single document vs. document collection level
  • single user vs. collaborative level
  • intra- and cross-platform level (e.g. different tagging conventions, tag separation with comma or blank space, etc.)
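The cross-platform problem of tag separators can be illustrated with a small normalization step. The Python sketch below assumes two hypothetical conventions, comma-separated and whitespace-separated tag input, and lower-cases the result; the function name and the hyphen-joining rule are our own choices, not anything presented in the talk.

```python
def parse_tags(raw, separator):
    """Split a raw tag string according to a (hypothetical) platform convention.

    separator: "comma" for comma-separated input, "space" for
    whitespace-separated input.
    """
    if separator == "comma":
        parts = raw.split(",")
    elif separator == "space":
        parts = raw.split()
    else:
        raise ValueError(f"unknown separator: {separator!r}")
    # Trim, lower-case and join multi-word tags with a hyphen
    tags = ["-".join(part.split()).lower() for part in parts]
    return [t for t in tags if t]


print(parse_tags("Semantic Web, folksonomy,  toread", "comma"))
# ['semantic-web', 'folksonomy', 'toread']
print(parse_tags("semantic_web folksonomy toread", "space"))
# ['semantic_web', 'folksonomy', 'toread']
```

Even after this step, “semantic-web” and “semantic_web” remain different strings, which is exactly the kind of variant the gardening activities are meant to deal with.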

To push the gardening metaphor even further, Katrin presented their ideas of weeding, seeding, fertilizing, etc.:

Weeding
The weeds in this case are “bad” tags like spam or misspelled tags (weed: any plant that crowds out cultivated plants)
Aim: enhancing recall and a consistent indexing vocabulary
Achieved by: type-ahead functionality, editing functionalities, natural language processing, user guidelines for indexing and retrieval, nomination of authorized users as gardeners
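Two of the weeding aids listed above, type-ahead and spotting misspellings, can be sketched with Python’s standard library. This is a hypothetical illustration, not TagCare’s implementation: `difflib.get_close_matches` stands in for the natural language processing mentioned in the talk, and the frequency threshold `min_count` and the similarity cutoff of 0.8 are arbitrary assumptions.

```python
import difflib
from collections import Counter


def type_ahead(prefix, tag_counts, limit=3):
    """Suggest existing tags starting with `prefix`, most frequent first."""
    prefix = prefix.lower()
    matches = [t for t in tag_counts if t.startswith(prefix)]
    return sorted(matches, key=lambda t: (-tag_counts[t], t))[:limit]


def suggest_correction(tag, tag_counts, min_count=2, cutoff=0.8):
    """Propose a frequent, similarly spelled tag for a rare tag, or None."""
    if tag_counts[tag] >= min_count:
        return None  # common enough to leave alone
    frequent = [t for t, c in tag_counts.items() if c >= min_count and t != tag]
    close = difflib.get_close_matches(tag, frequent, n=1, cutoff=cutoff)
    return close[0] if close else None


tag_counts = Counter({"folksonomy": 12, "folksonomies": 4, "folksonmy": 1,
                      "semanticweb": 9, "tagging": 7})
print(type_ahead("folk", tag_counts))  # ['folksonomy', 'folksonomies', 'folksonmy']
print(suggest_correction("folksonmy", tag_counts))  # folksonomy
```

Offering frequent tags while the user types prevents new variants from appearing in the first place; the correction step flags rare variants that already exist as weeding candidates for a human gardener.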

Seeding
Seeding in folksonomies means expanding frequently used tags with more specific tags (called “baby tags” or “seedlings” by Katrin Weller; seedling: young plant or tree grown from a seed)
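How could seedlings be found automatically? One option, a swapped-in heuristic rather than anything presented in the talk, is co-occurrence subsumption: a tag is treated as a candidate child of a frequent tag if most posts carrying the child also carry the parent, but not the other way round. The threshold of 0.8, the toy posts and the function name are assumptions for illustration.

```python
from collections import Counter


def seedling_candidates(posts, parent, threshold=0.8):
    """Return tags that look like more specific children of `parent`.

    A tag counts as a candidate if most posts carrying it also carry
    `parent` (P(parent | child) >= threshold), while the reverse
    conditional probability is lower.
    """
    freq = Counter(t for tags in posts for t in set(tags))
    together = Counter()
    for tags in posts:
        if parent in tags:
            for t in set(tags) - {parent}:
                together[t] += 1
    candidates = []
    for child, n in together.items():
        p_parent_given_child = n / freq[child]
        p_child_given_parent = n / freq[parent]
        if (p_parent_given_child >= threshold
                and p_child_given_parent < p_parent_given_child):
            candidates.append(child)
    return sorted(candidates)


posts = [
    {"python", "programming"},
    {"python", "programming", "django"},
    {"java", "programming"},
    {"programming", "tutorial"},
    {"tutorial", "cooking"},
    {"tutorial", "knitting"},
]
print(seedling_candidates(posts, "programming"))  # ['django', 'java', 'python']
```

“tutorial” is not proposed, because it is used in many posts without “programming”.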

Landscaping
Landscaping here means creating “flower beds” by identifying species of tags, e.g. by similarity.
Aim: enhancing precision and expressiveness
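As a sketch of landscaping by similarity, tags can be grouped into “flower beds” by linking every pair whose sets of posts overlap strongly (Jaccard similarity) and collecting the connected groups with a small union-find structure. Both the measure and the threshold of 0.5 are our own assumptions; the talk only named similarity as one possible criterion.

```python
def jaccard(a, b):
    """Overlap of two sets of post ids."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def flower_beds(posts, threshold=0.5):
    """Group tags whose post sets overlap strongly (single-link grouping)."""
    occurrences = {}
    for post_id, tags in enumerate(posts):
        for t in tags:
            occurrences.setdefault(t, set()).add(post_id)
    tags = sorted(occurrences)
    root_of = {t: t for t in tags}  # union-find forest

    def find(t):
        while root_of[t] != t:
            root_of[t] = root_of[root_of[t]]  # path halving
            t = root_of[t]
        return t

    for i, a in enumerate(tags):
        for b in tags[i + 1:]:
            if jaccard(occurrences[a], occurrences[b]) >= threshold:
                root_of[find(a)] = find(b)
    beds = {}
    for t in tags:
        beds.setdefault(find(t), []).append(t)
    return sorted(beds.values())


posts = [
    {"flickr", "photo"},
    {"photo", "camera"},
    {"flickr", "photo", "camera"},
    {"recipe", "cooking"},
    {"cooking", "recipe", "pasta"},
]
print(flower_beds(posts))
# [['camera', 'flickr', 'photo'], ['cooking', 'pasta', 'recipe']]
```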

Fertilizing
Fertilizing in this context means combining folksonomies with other knowledge organization systems (KOS): thesauri, controlled vocabularies, ontologies, etc. (fertilizer: any substance such as manure or a mixture of nitrates used to make soil more fertile). Fertilizing might work both ways, Katrin suggested: a folksonomy might be fertilized with the semantic structure of a KOS, or a KOS enriched with terms from a folksonomy.
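Fertilizing in both directions can be sketched against a tiny hypothetical KOS, here a Python dictionary of preferred terms with synonyms and broader terms: known tags are enriched with the KOS structure, and frequent tags that the KOS does not yet contain are proposed as new candidate terms. The thesaurus content, the threshold and the function names are invented for illustration.

```python
# Hypothetical mini-KOS: preferred term -> broader term and synonyms
THESAURUS = {
    "semantic web": {"broader": "world wide web", "synonyms": ["semweb"]},
    "folksonomy": {"broader": "knowledge organization",
                   "synonyms": ["social tagging"]},
}


def build_lookup(thesaurus):
    """Map every preferred term and synonym to its preferred term."""
    lookup = {}
    for preferred, entry in thesaurus.items():
        lookup[preferred] = preferred
        for synonym in entry["synonyms"]:
            lookup[synonym] = preferred
    return lookup


def fertilize_folksonomy(tags, thesaurus):
    """KOS -> folksonomy: attach preferred and broader term to known tags."""
    lookup = build_lookup(thesaurus)
    enriched = {}
    for tag in tags:
        preferred = lookup.get(tag.lower())
        if preferred is not None:
            enriched[tag] = (preferred, thesaurus[preferred]["broader"])
    return enriched


def fertilize_kos(tag_counts, thesaurus, min_count=3):
    """Folksonomy -> KOS: propose frequent tags the KOS does not know yet."""
    lookup = build_lookup(thesaurus)
    return sorted(t for t, c in tag_counts.items()
                  if c >= min_count and t.lower() not in lookup)


print(fertilize_folksonomy(["semweb", "Folksonomy", "toread"], THESAURUS))
print(fertilize_kos({"semweb": 5, "tag gardening": 4, "nicepic": 1}, THESAURUS))
# ['tag gardening']
```

The personal tag “toread” is simply left without enrichment, and the rare tag “nicepic” is not proposed to the KOS.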

And finally, TagCare: the ambitious plan is to build a system that lets users import tag clouds from Flickr, del.icio.us and Bibsonomy, clean up dissimilarities between tags, add hierarchical structure to the tag clouds and view tag statistics, probably along with community features such as calibrating one’s tags with those of the chief gardener or collaborative spam elimination. It is going to be a free service; if you want to be notified when it goes live, you might want to send an email to Katrin.

This full-service proposal for tag gardening does of course sound brilliant, but is it going to be feasible on a technical level? In the post-presentation discussion, somebody mentioned Faviki, which relies on DBpedia concepts to solidify the tag cloud. It didn’t exactly seem as though the TagCare team had already thought along these (Semantic Web) lines, even though this corresponds perfectly to their ‘Fertilizing’ idea. And if TagCare relies solely on good human gardeners, how long will it take to gain a community big enough to stimulate someone’s altruism? Of course, the idea of tag gardening is beautiful, and I am curious to learn more about the technology it is going to use.

Other folksonomy- and tag-related presentations that I was unable to attend, or am unable to describe now after the 10th hour of my 2nd day at TRIPLE-I, with a band performing folklore music involving yodeling and probably Schuhplattler right outside this room:

  • Quality Metrics for Tags of Broad Folksonomies (Celine Van Damme, Martin Hepp, Tanguy Coenen, University of Brussels, Universität der Bundeswehr München)
  • Providing Multi Source Tag Recommendations in a Social Resource Sharing Platform (Martin Memmel, Michael Kockler, Rafael Schirru, German Research Centre for Artificial Intelligence DFKI)
  • Semantic Tagging and Inference in Online Communities (Ahmet Yildirim, Suzan Üsküdarli, Boğaziçi University)
  • Using Visual Features to Improve Tag Suggestions in Image Sharing Sites (Mathias Lux, Oge Marques, Arthur Pitman, Klagenfurt University)
  • Harnessing Wikipedia for Smart Tags Clustering (Maria Grineva, Maxim Grinev, Denis Turdakov, Pavel Velikhov, Russian Academy of Sciences)

Please leave a comment if you think that any of the above needs correction.

EDIT: I got the chance to record another 12-second definition (and am now thinking of setting up a video glossary for the Semantic Web): Rolf Sint from Salzburg Research explains in 12 seconds what folksonomies are and why folksonomies and ontologies go together well! Rolf is also involved in the KiWi project, which aims to develop a wiki-based knowledge management system boosted by semantic technologies.


Rolf Sint explains folksonomies and their relation to ontologies on 12seconds.tv

Jana Herwig

Java’s Inner Sanctum: A Visit to Sun Microsystems’ Usability Lab in Prague

The walls in room 3328, the observation room at Sun Microsystems’ usability lab in Prague, are painted a subdued blue. The paint swallows all the light, ensuring that the testing scenario is not disturbed by curious guests like us, the KiWi project team members who were granted the privilege of a tour of the inner sanctum of Sun’s developer den. Through the one-way mirror, we can see a rosy-cheeked developer talking to himself in Czech, interrupted by little sips from a Coke bottle. He does not see us. The fact that very few of us understand Czech gives the situation an even more experimental appeal.

The new usability lab at Sun Microsystems, Prague

Jakub Franc, the cognitive psychologist in charge of the study design, explains to us that Sun relies on the think-aloud method and observation in most of its test cases, rather than analyzing data from biofeedback sensors or eye-tracking devices. “Eye-tracking is good for testing the usability of web sites,” he says, “but for our purposes, the think aloud method, where the test person describes what he does and thinks, has greater benefits to offer.” The authenticity of the tasks performed in the study is key: the developer behind the sound-proof glass wall is currently busy importing his own PHP application into NetBeans, Sun’s open source development environment, while the interaction designers and developers who created the tested module observe. A typical testing scenario lasts about 90 minutes, with the final 20 minutes consisting of an interview. “I always tell the testers that it’s not their fault if they fail to perform a task,” says Jakub. “If they fail, it’s the product’s fault. After all, that’s why we’re testing it.”

Before a software product is tested in its design or redesign phase, ideal candidates are identified based on questionnaires sent out to people in the tester database. The database includes users of open source software as well as users of competing products. The ideal test sample represents the whole spectrum of the target group, from expert to newbie, and its members need not be open source enthusiasts: “We offer a relatively high reward of 1000 CZK* as we want testers from all levels and backgrounds and not just the volunteering enthusiasts.”

Until Sun Microsystems moved into its new building in 2006, it collaborated with the Department of Computer Science at Czech Technical University (CTU), where they set up the very first usability lab in the Czech Republic in 2004. The deal was that Sun would supply the equipment and know-how, and CTU the space and construction. Both institutions shared the facility until, after three years, all usage rights and equipment were transferred to CTU. One of the features of the new lab is the one-way mirror; the previous lab relied on video observation. “From our experience, despite the fact that some participants feel less comfortable in this set-up, it makes a difference to observers”, writes Jiri Mzourek on his blog (one of the many Sun blogs), “they feel more connected to the participants”.

Jakub Franc, cognitive psychologist and usability researcher

Even though there is now an in-house usability lab at Sun, the collaboration between Sun and CTU continues, in particular in research and design projects. Students participate in projects led by Sun that focus on Sun products, learning about research methodology as well as gaining experience in project management in a real business environment. Jakub Franc also gives seminars in cognitive psychology and research methodology to CTU students, and is himself pursuing a PhD in environmental psychology – a relatively new discipline which, according to Jakub, deals with questions such as: “How should buildings be designed so that people are not getting lost in them? What recreational areas help people to recover from daily stress? What kinds of front gardens discourage burglars from invading the place?” In other words: Jakub studies the cognitive parameters of the usability of real objects.

Once the KiWi/Sun use case enters the evaluation stage, the KiWi team will again be given access to the lab, this time not as visitors but as observers, witnessing how usable the KiWi-Wiki system really is for interested users. We are looking forward to the experience, and we thank the designers of the lab for implementing a sound-proof wall, just in case the KiWis get emotional!

*) worth about 2 monthly passes for the metro in Prague, or 40 beers in a good pub

Jana Herwig

Integrating Information Extraction into the KiWi-System: a proposal from Brno

Semantic technology isn’t about technology: its noble concern is to make people’s lives and work easier. Yesterday, Marek Schmidt and Petr Knoth, both working on PhD projects in Natural Language Processing (NLP) at Brno University, introduced their vision of how Information Extraction could be integrated into the KiWi-System.

First off: What is Information Extraction? In natural language processing, “information extraction (IE) is a type of information retrieval whose goal is to automatically extract structured information, i.e. categorized and contextually and semantically well-defined data from a certain domain, from unstructured machine-readable documents” (Wikipedia). Marek and Petr’s vision for using IE in KiWi is to support the user in the creation of semantic annotations.

Annotations and Ontology

The image above illustrates their vision: if a user, for instance, enters the text “Hello, I am the best expert in Java around the Sun” into the content editor, structured information is extracted, analyzed on the fly and returned as suggestions. By applying reasoning to existing annotations and to further information available in the system (e.g. relevant domain ontologies, but also information about the user himself), the system will be able to infer new statements: e.g. if the text was entered by the user Bill Rodgers, the system will be able to infer that Bill Rodgers has Java programming skills, even though this has never been explicitly stated in the knowledge base.
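A deliberately simplistic Python sketch of the extract-then-infer idea follows. It is not Marek and Petr’s system: a single regular expression stands in for real information extraction, a one-entry dictionary stands in for a domain ontology, and one hand-written rule stands in for the reasoner. All names, including the predicates `expertIn` and `hasSkill`, are hypothetical, and we assume, as in the example, that the text was typed by the user Bill Rodgers.

```python
import re

# Hypothetical stand-in for a domain ontology: entity -> class
ONTOLOGY = {"Java": "ProgrammingLanguage"}

EXPERT_PATTERN = re.compile(r"\bexpert in (\w+)", re.IGNORECASE)


def extract(author, text):
    """Turn 'expert in X' phrases into (author, 'expertIn', X) triples."""
    return [(author, "expertIn", m.group(1)) for m in EXPERT_PATTERN.finditer(text)]


def infer(triples, ontology):
    """Rule: expertIn(x, y) and y is a ProgrammingLanguage => hasSkill(x, 'y programming')."""
    inferred = []
    for subject, predicate, obj in triples:
        if predicate == "expertIn" and ontology.get(obj) == "ProgrammingLanguage":
            inferred.append((subject, "hasSkill", f"{obj} programming"))
    return inferred


# Assumption: the text was typed by the user Bill Rodgers
triples = extract("Bill Rodgers", "Hello, I am the best expert in Java around the Sun")
print(triples)                   # [('Bill Rodgers', 'expertIn', 'Java')]
print(infer(triples, ONTOLOGY))  # [('Bill Rodgers', 'hasSkill', 'Java programming')]
```

The inferred triple is never stated in the text itself; it only follows from combining the extracted annotation with the ontology, which is the point of the proposal.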

Jana Herwig

Combining Closed and Open Data Classification Mechanisms in an Extended Thesaurus

In the next session, Rolf Sint gave us insights into his approach to combining closed and open data classification mechanisms, which builds on the findings of his master’s thesis. Probably the most widely used retrieval method for digital content is full-text search; Google’s and Yahoo’s indexing methods, for instance, rely on it. For this method to work, the search words must be contained in the content, which leads to obvious problems with synonyms, ambiguities and the different vocabularies of different languages. The advantages are that full-text search is easy to use and that no maintenance is required, as this responsibility rests with the content providers.

At the other end of the spectrum, among the open data classification mechanisms, we have social tagging. Tagging (in general) means that a user assigns labels to content items. The advantage here is that content is classified immediately; tagging is thus an easy way to provide metadata for content, in particular because the user does not have to think about (arbitrary, system-dictated) structures. However, this leads to problems when singular and plural forms are used side by side, when synonyms are used, when spelling mistakes occur, and so on: with tags, exactly the same spelling has to be used if items are to be assigned to the same group. But if tagging is done collectively (and that is what social tagging is about), the wisdom of crowds can improve the signal-to-noise ratio significantly – see the miracle of the tag cloud.

What Rolf proposed in his thesis was to combine the two approaches. In his design, he used an extended thesaurus as an instrument for vocabulary control. It is called an extended thesaurus because it is not simply built around a taxonomy, but expanded by tags that users assigned and that were integrated using a vocabulary management tool.
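The benefit of such an extended thesaurus for retrieval can be sketched as query expansion. In this hypothetical Python example, each preferred term carries controlled synonyms plus user tags that a vocabulary manager has attached to it; a plain full-text search for “bike” misses a document about “bikes”, while the expanded search finds it. The data structure, the documents and the function names are assumptions, not Rolf’s actual design.

```python
# Hypothetical extended thesaurus: preferred term -> controlled synonyms
# plus user tags that a vocabulary manager attached to the same concept
EXTENDED_THESAURUS = {
    "car": {"synonyms": ["automobile"], "tags": ["cars", "auto", "carz"]},
    "bicycle": {"synonyms": ["cycle"], "tags": ["bike", "bikes"]},
}


def expand(query, thesaurus):
    """Return the query term plus every form linked to the same concept."""
    q = query.lower()
    for preferred, forms in thesaurus.items():
        variants = {preferred, *forms["synonyms"], *forms["tags"]}
        if q in variants:
            return variants
    return {q}


def search(query, documents, thesaurus=None):
    """Naive full-text search over word tokens, optionally with expansion."""
    terms = expand(query, thesaurus) if thesaurus else {query.lower()}
    return sorted(doc_id for doc_id, text in documents.items()
                  if terms & set(text.lower().split()))


documents = {
    1: "Selling my old automobile",
    2: "Pictures of bikes in Amsterdam",
    3: "New car prices in 2008",
}
print(search("car", documents))                       # [3]
print(search("car", documents, EXTENDED_THESAURUS))   # [1, 3]
print(search("bike", documents))                      # []
print(search("bike", documents, EXTENDED_THESAURUS))  # [2]
```

The synonym and plural problems described above disappear for every form the thesaurus knows, which is also why it needs ongoing maintenance as new tags appear.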
Extended Thesaurus

This extended thesaurus can be applied in multiple ways.