Semantic Web Company

The Semantic Puzzle

Open World Assumptions

Extending Google: First Look at SemantiFind

September 23, 2008 By: Jana Herwig Category: Collective Intelligence, Search Engines, Tools & Software

Just stumbled upon SemantiFind via T3N, and then upon the review on ReadWriteWeb from last Thursday.

What’s it about? SemantiFind is an IE and FF browser plug-in that extends Google’s search functionality, most notably through a typeahead feature that allows you to refine your search results before hitting ‘enter’. ReadWriteWeb wasn’t too impressed though:

Unfortunately, SemantiFind is one of those tools that’s good in theory, but not so good in practice. When performing some test searches, results were not as precise as they should have been. For example, in the above-mentioned search for “Georgia,” a search for the U.S. state returned Google results for the country as well.

Ambiguities due to homonyms such as Georgia vs Georgia, or Java vs Java are among the faves of people who are trying to pitch a semantic tool to you – but I really wonder whether the effects of homonyms aren’t highly overrated? How often do people really search for these, and in particular search for these without context, i.e. further search terms such as in ‘Georgia Tech’, ‘Georgia war’, ‘Java Coffee’ or ‘Java bugs’?

I must say I was quite impressed by the choice of search terms offered, and if you (like me) are easy prey for the serendipity effect, then SemantiFind can please and distract you endlessly. Here is a preview of what appears if you enter ‘serendipity’ – please note the preview of possible descriptions and definitions which you get on the Google homepage with the plugin:

Once you pick a term it turns into a kind of button (just slightly annoying: you cannot edit a term after it’s turned into a button, but would have to delete the whole thing and type again if you want to change your search query):

And then, what happens? On the search results page, you see results filtered by SemantiFind’s user-generated, user-approved labels on top of the other search results – which irritated me at first as it comes across as a search engine within the search engine. Admittedly: I’d rather sift through 13 results than through 10,900,00 search results (even though I never make it to the end of Google’s search list anyway; does anybody?) – but does the article about trees doing their best work with thermostats at 70° really deserve the second rank in SemantiFind’s list of recommended search results?

So while I agree with RWW that this “just goes to show why search engines that rely on people to filter the results might not work. Human error shouldn’t be a factor in web searches”, I am still quite fond of the suggestions and definition previews. I would probably use SemantiFind regularly if they allowed me to configure the plugin in such a way that I’d get the suggestions on the input page, but not the recommended results on the results page.

What’s the source of these results anyway? SemantiFind’s recommended results seem to rely entirely on input generated by users – to add input, you need to install their toolbar and start adding labels to websites; if a website has been labeled before, you can confirm or reject existing labels. What’s nice: a label recommender (presumably the same one that’s used for search queries) reduces ambiguity. What’s curious: You can also browse the pages you have already labeled in what they call your “catalogue” – which makes the service even more reminiscent of a bookmarking service, and which makes me wonder whether one shouldn’t link this with a del.icio.us/Mr.Wong/Bibsonomy/Faviki account (Faviki would probably be the best, considering their tag recommendations are based on DBpedia, and considering that Faviki just added 1 million new tags and now holds more than 5 million tags across all languages).

Questions that remain: I’d really like to know how they maintain their list of suggested labels – ambiguity, typos, plural forms, i.e. the usual folksonomy issues, must be a big challenge. Also, I’d like to know where they get their definitions in the preview from – from Google? Or are these user-generated as well? There must, after all, be some use for the “request a new definition” form?

Too bad they don’t have a blog to which one could send a trackback, and there is not much on their company page either.

My ants won’t join your storm, I’ve already set them free

September 15, 2008 By: Jana Herwig Category: Social Software, Tools & Software

So, AntStorm. After Appscout’s report that they had “never seen a service that brings social bookmarking and semantic search together the way AntStorm does,” and as I have this little project of listing all available semantic search engines, I thought I might as well check it out (yes, the list is in perpetual need of an update – those semantic developers are nearly too fast to keep up with).

Actually, Antstorm has very little to do with the semantic web, and a lot with the social web – but expect no folksonomies. First thing to do for you at Antstorm: They’re asking you to import your bookmarks; ideally, you would already have them in neatly arranged folders, labeled appropriately, and then Antstorm would convert these folders into what they call “trails”, which other users can follow. You can keep trails private, of course, but it doesn’t seem as if you can also keep selected bookmarks within these trails private. Hmpf.

And hey, wait: Is there anybody in the age of del.icio.us who still keeps her bookmarks on a computer? I don’t, except the ones that I need half a dozen or more times a day, e.g. the login to the corporate CMS or webmail, and these are not the links that anybody outside of my work context could benefit from. Importing bookmarks from del.icio.us is, however, not part of the AntStorm package – what you can do is to automatically add new links to del.icio.us as well by checking a box “Add to delicious” – but as you cannot add tags to a link on AntStorm, I wonder of what use an untagged bookmark could be on del.icio.us?

Things might get a little more interesting if you decide to add links to a group as well: A group on AntStorm is a community of editors who collaboratively manage trails related to the interests of their group. Any group member can suggest new links – the group decides by voting for or against it whether these will be added or not. Collaborative filtering, alright – I wonder, however, how many users you’d have to have in a group a.k.a. microniche to receive results that matter.

I failed to find out what the appeal of AntStorm could be – as all my bookmarks are either on del.icio.us, Bibsonomy (imported from del.icio.us) or CiteULike (for all things academic), I don’t have any browser bookmarks left to get me started on AntStorm. AntStorm’s sales copy – “Have you ever needed a bookmark and realized it was on some other computer? Or have you ever wanted to save a bookmark, but you weren’t on your primary computer?” – would have convinced me in 2004, but I’ve already unleashed all my bookmarks. What they call trails looks suspiciously like yet another difficult-to-manage folder structure to me. Of course I am biased, but I just don’t see how a collaborative link suggestion tool could work without tagging – or maybe I just didn’t find it?

Anybody with a few stationary bookmarks left – please set them free on AntStorm, maybe you’ll find out what they’re really good for. I clicked around a bit and skim-viewed their How-to-Video (9 min 18 sec!).

They promise that a share of the earnings generated by users will go to charity, and that’s always a good thing. Also, their logo is cute (even if not web 2.0 shiny) and I quite like the idea of a storm of ants.

The Wild vs The Orderly: Folksonomies and Semantics (TRIPLE-I 2008)

September 04, 2008 By: Jana Herwig Category: Collective Intelligence, Search Engines, Social Software, Vocabularies & Languages

This second day of TRIPLE-I 2008 was my personal folksonomy day, even though the theme was already set yesterday, with Andreas Hotho’s invited talk about “Extracting Semantics from Folksonomies” which was the opening lecture of the workshop “Knowledge acquisition from the Social Web.”

Andreas Hotho is directing the Bibsonomy project at Kassel University’s Knowledge and Data Engineering research group; Bibsonomy is a social bookmark and publication sharing system catering especially for researchers who, next to bookmarking, also wish to manage publications. Among other interesting things, Bibsonomy supports the import of bookmarks from del.icio.us, Firefox bookmarks and local BibTeX files. Being a project led by a university’s computer science department, Bibsonomy is at the same time the result, the object and a stimulus for research in the area of tagging and folksonomies. Andreas describes this double appeal of folksonomies to both ordinary people and researchers in a 12-second vlog post:


Andreas Hotho’s statement about folksonomies and research (see www.bibsonomy.org) on 12seconds.tv

One of the outcomes of the research into folksonomies is FolkRank, a search algorithm that exploits the structure of folksonomies; the name reveals that it was inspired by PageRank, but as the graph of folksonomy structures does not correspond to the web graph, some adaptations had to be made. The specifics of these adaptations can be found in an online article by Andreas and his colleagues: “FolkRank: A Ranking Algorithm for Folksonomies” (PDF, 268 KB).

Andreas Hotho’s talk more specifically addressed the search for methods to identify tags which describe the same concept (or a more specific / a more general concept respectively) within a folksonomy. He suggested two approaches:

  1. Applying measures directly to folksonomy statistics, which allows each tag to be described as a vector; e.g. co-occurrence frequency and FolkRank could serve as similarity measures (both tend to favour high-frequency tags), as could a cosine measure (which is more likely to produce “siblings”)
  2. Looking up tags in an external thesaurus/vocabulary (for instance achieving semantic grounding by mapping a tag and its most similar tags to WordNet synsets)
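
To make the difference between the first approach's measures more tangible, here is a minimal sketch in Python. It is not taken from the talk or from Bibsonomy: the toy posts, the function names and the choice of plain co-occurrence counts as vector entries are all assumptions made for illustration, and FolkRank itself is not implemented.

```python
from collections import Counter
from itertools import combinations
from math import sqrt

# Hypothetical toy data: each post is the set of tags one user gave one resource.
posts = [
    {"python", "programming", "tutorial"},
    {"python", "programming", "code"},
    {"java", "programming", "tutorial"},
    {"java", "programming", "code"},
    {"java", "coffee"},
]

def cooccurrence_vectors(posts):
    """Map each tag to a Counter of the tags it appears together with."""
    vectors = {}
    for tags in posts:
        for a, b in combinations(sorted(tags), 2):
            vectors.setdefault(a, Counter())[b] += 1
            vectors.setdefault(b, Counter())[a] += 1
    return vectors

def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm = sqrt(sum(x * x for x in u.values())) * sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0

vectors = cooccurrence_vectors(posts)

def most_cooccurring(tag):
    """The tag that appears most often together with `tag`."""
    return vectors[tag].most_common(1)[0][0]

def most_cosine_similar(tag):
    """The tag whose co-occurrence profile is closest to that of `tag`."""
    others = (t for t in vectors if t != tag)
    return max(others, key=lambda t: cosine(vectors[tag], vectors[t]))

print(most_cooccurring("python"))     # the frequent, more general tag
print(most_cosine_similar("python"))  # a tag used in similar contexts
```

On this toy data, raw co-occurrence pairs "python" with the high-frequency tag "programming", while the cosine measure pairs it with "java", a tag that is used in the same contexts without ever appearing next to it.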

Future areas of interest within folksonomy research Andreas proposed were trend detection, tag recommendation, detecting spam (a major challenge!), logsonomies (i.e. the structure of search engine query log files) and learning synsets, hierarchies, and structures of folksonomies. Andreas Hotho can be contacted via his homepage, if you have any further questions regarding Bibsonomy, FolkRank or this present piece of research.

Another presentation dedicated to folksonomies – and the presentation that won my personal presentation design award – was “Seeding, Weeding, Fertilizing – Different Tag Gardening Activities for Folksonomy Maintenance and Enrichment” by Katrin Weller and Isabella Peters, both from the Dept. of Information Science at Heinrich Heine University in Düsseldorf. The entire presentation was designed to match the corporate identity (CI) of TagCare, a tag gardening tool that is hopefully going to go online soon.

The term “Tag Gardening” was borrowed from James Governor, who wrote in a 2006 blog post:

“Like plants or animals, tags evolve in an emergent fashion, open to hybridisation. Stewardship can help grow and put roots down.

Helping the darwinian process is tag gardening.

Tag gardening is about taking tags in the wild and tending to them, or identifying a wild tag that will do well in your south facing IT garden. I am talking about domestication here.

Just like there are professional bloggers i am pretty sure some parties will emerge that get paid for their abilities.”

I seriously hope that the latter is going to come true, even though I have the feeling that most providers will continue to consider user input and effort pro bono work!

Katrin Weller’s intro (Isabella Peters had excused herself) focused on the well-known problems with tags and folksonomies, e.g.:

  • spelling variants, synonyms, abbreviations, different natural languages
  • ad hoc or personal functions of tags other than content description (e.g. “toread”, “@Henry”, “nicepic”)
  • flatness of tag clouds which allows for browsing by popularity, but not by semantic interrelations

She further distinguished three levels where tag or tag cloud improvement becomes relevant:

  • single document vs document collection level
  • single user vs collaborative level
  • intra- and cross-platform level (e.g. different tagging conventions, tag separation with comma or blank space, etc.)

To push the gardening metaphor even further, Katrin presented their ideas of weeding, seeding, fertilizing etc.:

Weeding
The weeds in this case are “bad” tags like spam or misspelled tags (weed: any plant that crowds out cultivated plants)
Aim: enhancing recall and a consistent indexing vocabulary
Achieved by: type-ahead functionality, editing functionalities, natural language processing, user guidelines for indexing and retrieval, nomination of authorized users as gardeners
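
As a rough illustration of what automated weeding could look like, here is a minimal Python sketch. It is not how TagCare works (no technical details were presented); the sample tags, the `weed` function and the similarity cutoff are assumptions. It lowercases tags and folds rare spellings into the most frequent close spelling, using the standard library's `difflib`.

```python
from collections import Counter
from difflib import get_close_matches

# Hypothetical raw tags as users might have typed them.
raw_tags = ["Semantic", "semantic", "semantic", "semantc", "folksonomy",
            "folksonomy", "folksonomies", "folksonmy", "toread"]

def weed(tags, cutoff=0.8):
    """Map each raw tag to a canonical form: the most frequent close spelling."""
    counts = Counter(t.strip().lower() for t in tags)
    canonical = []   # accepted spellings, most frequent first
    mapping = {}
    for tag, _ in counts.most_common():
        # Compare against spellings that were accepted earlier (i.e. more frequent ones).
        match = get_close_matches(tag, canonical, n=1, cutoff=cutoff)
        mapping[tag] = match[0] if match else tag
        if not match:
            canonical.append(tag)
    return mapping

for raw, clean in weed(raw_tags).items():
    print(f"{raw} -> {clean}")
```

A purely string-based approach like this catches typos and simple plurals, but it cannot tell synonyms or spam apart, which is presumably where the user guidelines and human gardeners mentioned above come in.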

Seeding
Seeding in folksonomies means to expand frequently used tags by more specific tags (called “baby tags” or “seedlings” by Katrin Weller; seedling: young plant or tree grown from a seed)

Landscaping
The idea of landscaping here means to create “flower beds” through identifying species of tags, e.g. by similarity.
Aim: enhancing precision and expressiveness

Fertilizing
Fertilizing in this context means to combine folksonomies with other knowledge organization systems (KOS): thesauri, controlled vocabularies, ontologies, etc. (fertilizer: any substance such as manure or a mixture of nitrates used to make soil more fertile). Fertilizing might work both ways, Katrin suggested: a folksonomy might be fertilized with the semantic structure of a KOS, or a KOS enhanced by terms from a folksonomy.

And finally TagCare: The ambitious plan is to have a system that allows users to import tag clouds from Flickr, del.icio.us and Bibsonomy, clean out dissimilarities between tags, add hierarchical structure to the tag clouds, view tag statistics and probably also use community features, such as calibrating one’s tags with those of the chief gardener or activating collaborative spam elimination. It is going to be a free service, and if you want to be notified when it goes live, you might want to send an email to Katrin.

This full-service proposal for tag gardening does of course sound brilliant – yet is it going to be feasible, on a technical level? In the post-presentation discussion, somebody mentioned Faviki, which relies on DBpedia concepts to solidify the tag cloud. It didn’t exactly seem as though the TagCare team had already thought along these (semantic web) lines, even though this perfectly corresponded to their ‘Fertilizing’ idea. But if TagCare solely relies on good human gardeners, how long will it take until they have gained a big enough community to stimulate someone’s altruism? The idea of tag gardening of course is beautiful, and I am curious to learn more about the technology it is going to use.

Other folksonomy and tag related presentations that I was unable to attend or am unable to describe now, after the 10th hour of my 2nd day at TRIPLE-I, with a band performing folklore music involving yodeling and probably Schuhplattler right outside of this room:

  • Quality Metrics for Tags of Broad Folksonomies (Celine Van Damme, Martin Hepp, Tanguy Coenen, University of Brussels, Universität der Bundeswehr München)
  • Providing Multi Source Tag Recommendations in a Social Resource Sharing Platform (Martin Memmel, Michael Kockler, Rafael Schirru, German Research Centre for Artificial Intelligence DFKI)
  • Semantic Tagging and Inference in Online Communities (Ahmet Yildirim, Suzan Üsküdarli, Boğaziçi University)
  • Using Visual Features to Improve Tag Suggestions in Image Sharing Sites (Mathias Lux, Oge Marques, Arthur Pitman, Klagenfurt University)
  • Harnessing Wikipedia for Smart Tags Clustering (Maria Grineva, Maxim Grinev, Denis Turdakov, Pavel Velikhov, Russian Academy of Sciences)

Please leave a comment if you think that any of the above needs correction.

EDIT: I got the chance to record another 12 seconds definition (and am thinking of setting up a video glossary for the Semantic Web now): Rolf Sint from Salzburg Research explains what folksonomies are and why folksonomies and ontologies go together well in 12 seconds! Rolf is also involved in the KiWi project, which aims to develop a wiki-based knowledge management system boosted by semantic technologies.


Rolf Sint explains folksonomies and their relation to ontologies on 12seconds.tv

Combining Closed and Open Data Classification Mechanisms in an Extended Thesaurus

June 26, 2008 By: Jana Herwig Category: Ontology Engineering, Social Software

In the next session, Rolf Sint gave us insights into his approach to the combination of closed and open data classification mechanisms, which is informed by the findings of his master’s thesis. Probably the most widely used retrieval method for digital content is full-text search; Google and Yahoo’s indexing methods, for instance, rely on full-text search. For this method to work, the search terms must appear in the content itself, leading to obvious problems with synonyms, ambiguities or the different lexical inventory of different languages. Advantages are that full-text search is easy to use, and that no maintenance is required as this responsibility rests with the content providers.

On the other end of the spectrum, within open data classification mechanisms, we have social tagging. Tagging (in general) means that a user assigns labels to content items. The advantage here is that content is immediately classified; as such, tagging is an easy way to provide metadata for content, in particular as the user does not have to think about (arbitrary, system-dictated) structures. However, this leads to problems if singulars and plurals are used simultaneously, if synonyms are used, if spelling mistakes occur, etc. With tags, the exact same spelling has to be used if items are to be assigned to the same group. But if done collectively (and that is what social tagging is about), the wisdom of crowds can improve the signal-to-noise ratio significantly – see the miracle of the tag cloud.

What Rolf proposed in his thesis was to combine the two approaches. In his design, he used an extended thesaurus as an instrument to achieve vocabulary control – we’re looking at an extended thesaurus here, because it’s not simply built around a taxonomy, but expanded by tags that were assigned by users and integrated using a vocabulary management tool.
[Image: Extended Thesaurus]

This extended thesaurus can be applied in multiple ways. (more…)
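
To give a rough idea of what such a structure might look like in code, here is a minimal Python sketch. It is not Rolf's actual design, which is not described in detail here; the class names, fields and sample terms are all hypothetical. The controlled part is a small taxonomy of preferred terms with synonyms, and the open part consists of user tags that a vocabulary manager attaches to existing concepts.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Concept:
    """One thesaurus entry: a preferred label plus controlled and user-added terms."""
    preferred: str
    broader: Optional[str] = None                   # parent concept in the taxonomy
    synonyms: set = field(default_factory=set)      # editor-controlled terms
    user_tags: set = field(default_factory=set)     # free tags integrated later

class ExtendedThesaurus:
    def __init__(self):
        self.concepts = {}   # preferred label -> Concept
        self.lookup = {}     # any known term (lowercased) -> preferred label

    def add_concept(self, preferred, broader=None, synonyms=()):
        """Closed side: register a controlled concept."""
        self.concepts[preferred] = Concept(preferred, broader, set(synonyms))
        for term in (preferred, *synonyms):
            self.lookup[term.lower()] = preferred

    def integrate_tag(self, tag, preferred):
        """Open side: attach a free user tag to an existing concept."""
        self.concepts[preferred].user_tags.add(tag)
        self.lookup[tag.lower()] = preferred

    def resolve(self, term):
        """Return the preferred label for a term, or None if it is unknown."""
        return self.lookup.get(term.lower())

    def broader_chain(self, term):
        """Walk up the taxonomy from a term to its root concept."""
        chain, current = [], self.resolve(term)
        while current is not None:
            chain.append(current)
            current = self.concepts[current].broader
        return chain

thesaurus = ExtendedThesaurus()
thesaurus.add_concept("Vehicle")
thesaurus.add_concept("Car", broader="Vehicle", synonyms=["automobile"])
thesaurus.integrate_tag("cars", "Car")   # a user tag merged into the vocabulary

print(thesaurus.resolve("Cars"))
print(thesaurus.broader_chain("cars"))
```

With a lookup like this, a search for a plural or a synonym can be routed to the same concept, which is exactly the kind of problem free tagging alone leaves open.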

Usage Data Model Day in the KiWi Project

June 26, 2008 By: Jana Herwig Category: Ontology Engineering

Yesterday we dealt with reports, user interaction and interface questions; today is usage data model day (or morning) in the KiWi (Knowledge in a Wiki) project. A usage data model is an abstract conceptualization of the data as perceived by the user (and not by the developer/implementer) – at the same time, it is not immediately concerned with the visualization of data on screen. François Bry gave us an overview of the proposed core concepts and objects, which currently are: content item, tag (and tagging), link, rule, user, and access right.

There is no need for me to repeat his full presentation, as François had already made it available in advance on the KiWi project wiki. Nonetheless, I’d like to highlight a few aspects:

A content item is to be understood as a slight generalisation of a wiki page: Every wiki page is a content item, but not every content item is a wiki page, and content items that are not wiki pages are part of a wiki page. This could include, for instance, media content such as pictures, diagrams or tables. This modularization (content items within pages) meets the requirement that KiWi pages must be composable.

Consequently, not only wiki pages but content items too must be taggable (which takes us to: tagging). Furthermore, it was proposed to make a distinction between atomic tags (short; consisting of a tag name and an associated content item instead of a description) and structured tags (that are made up of atomic tags), as well as between explicit tags (that are applied by users) and implicit tags (that are generated on the basis of rules that have been defined by users).

To illustrate this distinction, I’ll paste in a few explanations from François’ wiki report:

The tags assigned to the content item of an atomic tag T can be seen as tags assigned to the atomic tag T itself. Tagging of tags in this way can serve, for example, to distinguish between the atomic “hotel” in English and the same atomic tag “hotel” in French or to group or classify tags. [...] A structured tag is build up from atomic tags. [...] Examples of structured tags are as follows:

hotel(3stars downtown)
hotel(location(downtown))
hotel(comfortable)
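
The nesting in these examples can be made explicit with a small parser. The following Python sketch is only an illustration: the grammar (a name, optionally followed by whitespace-separated child tags in parentheses) is inferred from the three examples above, and the `(name, children)` tuple representation is an assumption, not KiWi's data model. In particular, it ignores the content item that is associated with each atomic tag.

```python
import re

# A token is either a single parenthesis or a run of other non-space characters.
TOKEN = re.compile(r"[()]|[^\s()]+")

def parse_tag(text):
    """Parse e.g. 'hotel(location(downtown))' into a (name, [children]) tuple."""
    tokens = TOKEN.findall(text)
    pos = 0

    def parse():
        nonlocal pos
        if pos >= len(tokens) or tokens[pos] in ("(", ")"):
            raise ValueError(f"expected a tag name in {text!r}")
        name = tokens[pos]
        pos += 1
        children = []
        if pos < len(tokens) and tokens[pos] == "(":
            pos += 1                                   # consume "("
            while pos < len(tokens) and tokens[pos] != ")":
                children.append(parse())
            if pos >= len(tokens):
                raise ValueError(f"missing ')' in {text!r}")
            pos += 1                                   # consume ")"
        return (name, children)

    tree = parse()
    if pos != len(tokens):
        raise ValueError(f"unexpected trailing input in {text!r}")
    return tree

for example in ["hotel(3stars downtown)", "hotel(location(downtown))", "hotel(comfortable)"]:
    print(parse_tag(example))
```

An atomic tag such as "hotel" on its own parses to a name with an empty list of children, so atomic and structured tags share one representation.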

A heated debate ensued (which I quite like, because that is the point where our own, as yet unchallenged assumptions are exposed), in particular with regard to the implementation of structured tags: Wouldn’t it raise the cognitive barrier too high if users were required to enter complicated tags?

Much was clarified with the agreement that users may use structured tags, but that this wouldn’t be a requirement. Using complex tags (e.g. a structured tag that includes dates or deadlines) might make sense to a particular set of users (e.g. project managers in the Logica use case) – and whether a software feature is going to be used (successfully) or not depends primarily on whether the user sees a benefit in it. Also: The concept of structured tags within the data model does not yet say anything about the way they will be represented on screen – in most cases, users won’t see a hotel(location(downtown)) spelled out.

On to the coffee break!

[Image: Physical tagging on a tree, by Jean Etienne Poirrier]

Semantic Tagging with Faviki

June 11, 2008 By: Jana Herwig Category: Tools & Software

In May, a new bookmarking service, Faviki, started which, unlike other bookmarking services, comes to the public semantically enhanced. ReadWriteWeb already had a first look at it and described it as follows:

Faviki is a new social bookmarking tool that offers something that services like Ma.gnolia, del.icio.us, and Diigo do not – semantic tagging capabilities. What this means is that instead of having users haphazardly entering in tags to describe the links they save, Faviki will suggest tags to be used instead. However, unlike other services, Faviki’s suggestions don’t just come from a community of users and their tagging history, but from structured information extracted straight out of the Wikipedia database. Faviki’s backend uses DBpedia, a community-maintained database created by extracting structured info from Wikipedia and turning that into a database which you can query.

What Faviki does, from a user’s perspective, is to suggest tags based on Wikipedia/DBpedia terms – one of the side effects of this procedure being that e.g. “Safety (disambiguation)” can also be chosen as a possible tag – I am not so sure yet whether this is an option that makes sense (although one can probably argue that it does no harm either, because people should be smart enough not to use such tags). And as the above screenshot of Faviki’s tag cloud reveals, it currently seems to be mainly used by people who are interested in the semantic web and search engines (with semantic search being the most promising area of application of semantic technologies). It’s probably going to take a while (if ever) before Faviki reaches a user base as diverse as the one suggested by del.icio.us’ tag cloud – but then again: Maybe Faviki isn’t going to need that, as it doesn’t rely on collective tagging, but already benefits from Wikipedia’s diversity of entries!

[Image: del.icio.us tag cloud]

As ReadWriteWeb also regretted: It’s a pity that there is currently no way to import tags from del.icio.us or other services to Faviki. Who is going to win the bookmarking race? Del.icio.us has the advantage of a broad user base, and many users already have their networks of fellow bookmarkers which they probably wouldn’t want to give up (I personally wouldn’t). Bibsonomy has the advantage of an extra feature that allows you to bookmark publications and later export them as a uniformly formatted bibliography. If I could make a wish, I’d rather have a service that brings together the best of Faviki, Bibsonomy AND del.icio.us!

Related Websites:
Faviki Blog on Wordpress.com
del.icio.us tag cloud
