Maybe you have noticed it already: this morning something new appeared in Google’s search interface, namely a set of related search suggestions based on your search query. Google described the enhancement as follows:
Starting today, we’re deploying a new technology that can better understand associations and concepts related to your search, and one of its first applications lets us offer you even more useful related searches (the terms found at the bottom, and sometimes at the top, of the search results page).
I tried it. If you type in “time travel”, you also get search proposals like “theory of relativity time travel” or “wormhole time travel”. Google announced that the service is available in various languages. A direct test with German is a little disillusioning: searching for “zeit reise” (the same concept as above, in German) leads to alternative searches like “reisen 50er jahren” (travel in the 50s) and “reisen im mittelalter” (travel in the Middle Ages).
Even if this semantic-like extension of the basic search function still needs some tuning, the point is getting clearer: Google, too, is working to get more meaningful results out of its search algorithms. Parts of the semantic methodology are finding their way into mainstream services like search engines – as we saw with Wolfram Alpha some days ago. So keep your eyes open – maybe tomorrow morning you’ll find another piece of the semantic puzzle embedded in one of your favorite web apps.
More and more tools are currently coming up in the Web 2.0 domain that bring semantic technologies into bloggers’ everyday lives. Zemanta was certainly a breakthrough in the annotation of blog entries. I’m running this service on my private and my corporate blog. It is easy to integrate into every common blog software, and it really saves time in my daily work. Unfortunately, it is available only for English blogs.
Another service which came up recently is Quintura, which provides search capabilities for your own blog with a visual map of tags or hints based on an index built from your blog entries. It is easy to customize to your blog’s style through a simple interface. Quintura offers code snippets to copy into your blog post or sidebar. Even if it is not a semantic search engine in the narrow sense, Quintura provides a fine semantic-like interface for meaning-sensitive search. See how Quintura is implemented in The Semantic Puzzle in our sidebar.
Hello Monday! I am a bit tired today as I did not really have a weekend but spent it in a rather intellectually stimulating fashion, attending BarCamp Vienna held on the premises of HP in the 12th district. My head is still buzzing from all the input!
Originally, the plan had been to have a marketing-themed BarCamp, but thanks to the bottom-up approach towards scheduling typical for BarCamps, that didn’t quite come to pass (greatly appreciated also that this wasn’t enforced by the organizers, thank you!). Two of the sessions I attended are relevant to the Social Semantic Web:
One was held by Alexander Kirk about the latest improvements in Factolex, a collaborative, micro-content encyclopedia based on facts; I hear that Factolex will receive further semantic enhancements in the near future, so I’ll write a longer blog post about it then. One feature Alex showed and which impressed me considerably was the distributed way in which one can add further facts to Factolex now: On any webpage, highlight a word or phrase (e.g. “President of the European commission”) and then click on the bookmarklet. Factolex is automatically going to check whether it knows the term already and either creates a new one or adds a fact to an existing term. The source will be added automatically – pretty nifty!
Another project that does not yet have a name and that is currently in stealth mode was presented by Christian Zeidler: Social Enhanced Search on del.icio.us. The project addresses a well known del.icio.us problem: You can search your bookmarks, i.e. search the tags and possibly definitions you might have added – yet all too often this only leads to the problem that your search query does not match the tags you once assigned. Being able to search the full text of the saved page would improve the scenario considerably – and this is exactly the approach Christian’s project takes.
To begin with, he built his own search index using Lucene, an open source, full-featured text search engine library written in Java. Of course it doesn’t crawl the whole web – just the pages you have added to your del.icio.us account. Instead of building one index for every user, Christian decided to have one large search index, which also avoids indexing the same page twice when several users have bookmarked it – the current index, based on 800 pages, doesn’t exceed a size of 3 MB, which seems rather reasonable.
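To make the "one large index for everybody" idea a bit more tangible, here is a minimal sketch in Python. It is not Christian's code and it does not use Lucene; the class name `SharedBookmarkIndex` and its methods are my own invention for illustration. It only shows the principle: a page's text is indexed once, and a separate map remembers which users bookmarked it.

```python
import re
from collections import defaultdict


class SharedBookmarkIndex:
    """Toy full-text index shared by all users (hypothetical, not Lucene)."""

    def __init__(self):
        self.postings = defaultdict(set)  # term -> URLs whose text contains it
        self.owners = defaultdict(set)    # URL -> users who bookmarked it

    def add(self, user, url, text):
        # Index the page text only the first time the URL shows up,
        # so a page bookmarked by many users is stored once.
        if url not in self.owners:
            for term in re.findall(r"\w+", text.lower()):
                self.postings[term].add(url)
        self.owners[url].add(user)

    def search(self, user, query):
        # Return the user's own bookmarks whose text contains every query term.
        terms = re.findall(r"\w+", query.lower())
        if not terms:
            return []
        hits = set.intersection(*(self.postings.get(t, set()) for t in terms))
        return sorted(url for url in hits if user in self.owners[url])
```

Searching a friend's bookmarks would then only mean checking a different set of owners against the same index, which is presumably part of the appeal of not keeping one index per user.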
Apart from your own bookmarks, the plan is to also allow searching the bookmarks of your friends on del.icio.us, giving your search a social perspective. How many friends do you have on Facebook, how many on del.icio.us? It’s about half a dozen on del.icio.us for me, so I guess that “friendship” here really stands for particular topics and interests – this social perspective thing might actually work for enhanced searches, I think.
What other means are there to weight and rank search results? Somebody raised the issue of customization, i.e. let the user define which weight he’d like to give the results of which friend. I completely agree with Christian when he said he doesn’t believe people want customization, as conscious, user-initiated customization efforts are often (considered) too high. Instead, the system must learn from the data, e.g. prefer the results of friends whose results you use the most often.
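How might "learning from the data" look in practice? The following is purely my own sketch under simple assumptions, not anything shown in the session: the system keeps a log of which friend's bookmark the user actually opened, turns that into weights, and boosts the text score of each result accordingly. The function names and the `1 + weight` boost are hypothetical choices.

```python
from collections import Counter


def friend_weights(click_log):
    """click_log: list of friend names, one entry per result the user opened."""
    counts = Counter(click_log)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {friend: n / total for friend, n in counts.items()}


def rank(results, weights):
    """results: list of (url, friend, text_score); returns URLs, best first."""
    scored = [
        (text_score * (1.0 + weights.get(friend, 0.0)), url)
        for url, friend, text_score in results
    ]
    # Sort by boosted score; friends never clicked keep their plain text score.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [url for _, url in scored]
```

No settings dialog is needed here: the weights shift on their own as the user keeps clicking.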
Another useful feature that is already in place is that you can add any RSS feed to your search index as well – this is indeed very neat. And finally, in addition and as a point of reference, the prototype displayed the Lucene-based results in one column, and Yahoo! Search BOSS results in another column. Not surprisingly, the Search BOSS results were rather general, and the Lucene-based results rather specific – and that specificity is what you’d expect from searching your own bookmarks.
What’s it about? SemantiFind is an IE and FF browser plug-in that extends Google’s search functionalities, most notably through a typeahead functionality that allows you to refine your search results before hitting ‘enter’. ReadWriteWeb wasn’t too impressed though:
Unfortunately, SemantiFind is one of those tools that’s good in theory, but not so good in practice. When performing some test searches, results were not as precise as they should have been. For example, in the above-mentioned search for “Georgia,” a search for the U.S. state returned Google results for the country as well.
Ambiguities due to homonyms such as Georgia vs. Georgia or Java vs. Java are among the faves of people who are trying to pitch a semantic tool to you – but I really wonder whether the effects of homonyms aren’t highly overrated? How often do people really search for these, and in particular search for these without context, i.e. further search terms such as in ‘Georgia Tech’, ‘Georgia war’, ‘Java Coffee’ or ‘Java bugs’?
I must say I was quite impressed by the choice of search terms offered, and if you (like me) are easy prey for the serendipity effect, then SemantiFind can please and distract you endlessly. Here is a preview of what appears if you enter ‘serendipity’ – please note the preview of possible descriptions and definitions which you get on the Google homepage with the plugin:
Once you pick a term it turns into a kind of button (just slightly annoying: you cannot edit a term after it’s turned into a button, but would have to delete the whole thing and type again if you want to change your search query):
And then, what happens? On the search results page, you see results filtered by SemantiFind’s user-generated, user-approved labels on top of the other search results – which irritated me at first as it comes across as a search engine within the search engine. Admittedly: I’d rather sift through 13 results than through 10,900,00 search results (even though I never make it to the end of Google’s search list anyway; does anybody?) – but does the article about trees doing their best work with thermostats at 70° really deserve the second rank in SemantiFind’s list of recommended search results?
So while I agree with RWW that this “just goes to show why search engines that rely on people to filter the results might not work. Human error shouldn’t be a factor in web searches”, I am still quite fond of the suggestions and definition previews. I would probably use SemantiFind regularly if they allowed me to configure the plugin in such a way that I’d get the suggestions on the input page, but not the recommended results on the results page.
What’s the source of these results anyway? SemantiFind’s recommended results seem to rely entirely on input generated by users – to add input, you need to install their toolbar and start adding labels to websites; if a website has been labeled before, you can confirm or reject existing labels. What’s nice: a label recommender (presumably the same one that’s used for search queries) reduces ambiguity. What’s curious: You can also browse the pages you have already labeled in what they call your “catalogue” – which makes the service even more reminiscent of a bookmarking service, and which makes me wonder whether one shouldn’t possibly link this with a del.icio.us/Mr.Wong/Bibsonomy/Faviki account (Faviki would probably be the best, considering their tag recommendations are based on DBpedia, and considering that Faviki just added 1 million new tags and now holds more than 5 million tags across all languages).
Questions that remain: I’d really like to know how they maintain their list of suggested labels – ambiguity, typos, plural forms, i.e. the usual folksonomy issues, must be a big challenge. Also, I’d like to know where they get their definitions in the preview from – from Google? Or are these user-generated as well? There must, after all, be some use for the “request a new definition” form?
Too bad they don’t have a blog to which one could send a trackback, and there is nothing much on their company page either.
Applications based on semantic technologies offer new ways to discover, browse and explore information – this is an established fact in the SemWeb community. But how can we (as semantic web “insiders”) communicate these potential benefits to a typical end-user who has never heard about “faceted search” before – which doesn’t mean that he or she wouldn’t love intelligent user interfaces if they were in place?
One answer lies in using mockups, which are, on the one hand, an indispensable instrument for prototyping user interfaces, but also valuable when it comes to explaining the workings of an application to an end-user, an audience of interested researchers or a client.
And when it comes to explaining a search engine or search widget, mockups are even more important, as we all and in particular end-users are often unable to think of search interfaces other than in terms of Google.
We have become so googlified that hardly anyone can think of different ways of searching for information than Google has offered for many years now: Put a couple of words in a text box, click a button and scroll through a list of titles and summaries. Repeat until you’re done, or try a new search and repeat. Wow!
Although even Google has started recently to implement a little bit of semantics by offering an auto-complete functionality on google.com (on some local versions like Google Austria this feature is still not available), even the most basic concepts for an intelligent search interface are still not part of common sense thinking.
Admittedly, there are people who get irritated instantly by complex user interfaces like David Huynh’s Freebase Parallax. “This is only for experts!” is their response. But in a corporate setting, complex queries are part of our daily business – they are just not supported by common search engines (the only exception being data mining solutions). That doesn’t mean that we don’t need them.
Where is the way out of this dilemma?
Don’t tell, but SHOW the end-users how semantic technologies can enhance search & browse experiences
Do not use terms like SPARQL or RDF
Create a simple mockup that illustrates the points you want to make
You’re not a designer? Use tools like Balsamiq – Try it now!
Here is an example for a mockup of a semantically enhanced expert finder:
These kinds of mockups are essential for the requirements engineering phase of any project where search is a bit more than a text box, a button and a bunch of documents.
This second day of TRIPLE-I 2008 was my personal folksonomy day, even though the theme was already set yesterday, with Andreas Hotho’s invited talk about “Extracting Semantics from Folksonomies” which was the opening lecture of the workshop “Knowledge acquisition from the Social Web.”
Andreas Hotho is directing the Bibsonomy project at Kassel University’s Knowledge and Data Engineering research group; Bibsonomy is a social bookmark and publication sharing system catering especially for researchers who, next to bookmarking, also wish to manage publications. Next to other interesting things, Bibsonomy supports the import of bookmarks from del.icio.us, Firefox bookmarks and local BibTeX files. Being a project led by a university’s computer science department, Bibsonomy is at the same time the result, the object and a stimulus for research in the area of tagging and folksonomies. Andreas describes this double appeal of folksonomies to both ordinary people and researchers in a 12-second vlog post:
One of the outcomes of the research into folksonomies is FolkRank, a search algorithm that exploits the structure of folksonomies; the name reveals that it was inspired by PageRank, but as the graph of folksonomy structures does not correspond to the web graph, some adaptations had to be made. The specifics of these adaptations can be found in an online article by Andreas and his colleagues: “FolkRank: A Ranking Algorithm for Folksonomies” (PDF, 268 KB).
Andreas Hotho’s talk more specifically addressed the search for methods to identify tags which describe the same concept (or a more specific / a more general concept respectively) within a folksonomy. He suggested two approaches:
Applying measures directly to folksonomy statistics, with each tag described as a vector: co-occurrence frequency and FolkRank could serve as similarity measures (both tend to favor high-frequency tags), as could a cosine measure (which is more likely to produce “siblings”)
Looking up tags in an external thesaurus/vocabulary (for instance achieving semantic grounding by mapping a tag and its most similar tags to WordNet synsets)
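For readers who would like to see the first approach in miniature, here is a small Python sketch of my own (it is not taken from the talk or from the Bibsonomy code base, and the function names are made up): each tag is described by a vector of the tags it co-occurs with, and two tags are compared by the cosine of their vectors.

```python
from collections import Counter, defaultdict
from itertools import permutations
from math import sqrt


def cooccurrence_vectors(posts):
    """posts: iterable of tag sets, one per bookmark.

    Returns a dict mapping each tag to a Counter of the tags it co-occurs with.
    """
    vectors = defaultdict(Counter)
    for tags in posts:
        for a, b in permutations(set(tags), 2):
            vectors[a][b] += 1
    return vectors


def cosine(u, v):
    """Cosine similarity of two sparse count vectors (Counters)."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm = sqrt(sum(x * x for x in u.values())) * sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0
```

With three toy posts tagged {java, programming}, {python, programming} and {java, coffee}, "java" and "python" never appear together, yet their vectors overlap on "programming", so the cosine measure sees them as related. That is the "sibling" effect mentioned above: tags that are used in the same contexts rather than side by side.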
Future areas of interest within folksonomy research Andreas proposed were trend detection, tag recommendation, detecting spam (a major challenge!), logsonomies (i.e. the structure of search engine query log files) and learning synsets, hierarchies, and structures of folksonomies. Andreas Hotho can be contacted via his homepage, if you have any further questions regarding Bibsonomy, FolkRank or this present piece of research.
“Like plants or animals, tags evolve in an emergent fashion, open to hybridisation. Stewardship can help grow and put roots down.
Helping the darwinian process is tag gardening.
Tag gardening is about taking tags in the wild and tending to them, or identifying a wild tag that will do well in your south facing IT garden. I am talking about domestication here.
Just like there are professional bloggers i am pretty sure some parties will emerge that get paid for their abilities.”
I seriously hope that the latter is going to come true, even though I have the feeling that most providers will continue to consider user input and effort pro bono work!
Katrin Weller’s intro (Isabella Peters had excused herself) focused on the well-known problems with tags and folksonomies, e.g.:
spelling variants, synonyms, abbreviations, different natural languages
adhoc or personal functions of tags other than content description (e.g. “toread”, “@Henry”, “nicepic”)
flatness of tag clouds which allows for browsing by popularity, but not by semantic interrelations
She further distinguished three levels where tag or tag cloud improvement becomes relevant:
single document vs document collection level
single user vs collaborative level
intra- and cross-platform level (e.g. different tagging conventions, tag separation with comma or blank space, etc)
To push the gardening metaphor even further, Katrin presented us their ideas of weeding, seeding, fertilizing etc.:
Weeding
The weeds in this case are “bad” tags like spam or misspelled tags (weed: any plant that crowds out cultivated plants)
Aim: enhancing recall and a consistent indexing vocabulary
Achieved by: type-ahead functionality, editing functionalities, natural language processing, user guidelines for indexing and retrieval, nomination of authorized users as gardeners
Seeding
Seeding in folksonomies means to expand frequently used tags by more specific tags (called “baby tags” or “seedlings” by Katrin Weller; seedling: young plant or tree grown from a seed)
Landscaping
The idea of landscaping here means to create “flower beds” through identifying species of tags, e.g. by similarity.
Aim: enhancing precision and expressiveness
Fertilizing
Fertilizing in this context means to combine folksonomies with other knowledge organization systems (KOS): thesauri, controlled vocabularies, ontologies, etc. (fertilizer: any substance such as manure or a mixture of nitrates used to make soil more fertile). Fertilizing might work both ways, Katrin suggested: a folksonomy might be fertilized with the semantic structure of a KOS, or a KOS enhanced by terms from a folksonomy.
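Of the gardening activities above, weeding is probably the easiest one to picture in code. The sketch below is only my own illustration and says nothing about how TagCare will actually work: it splits a raw tag string according to a platform's separator convention, and maps likely misspellings onto a given vocabulary using Python's standard `difflib` module. The function names, the vocabulary and the 0.8 cutoff are assumptions.

```python
import difflib


def split_tags(raw, separator=None):
    """Split a raw tag string; separator=None means blanks, ',' means commas."""
    return [part.strip().lower() for part in raw.split(separator) if part.strip()]


def weed(tags, vocabulary, cutoff=0.8):
    """Replace each tag by its closest vocabulary entry, if one is close enough."""
    cleaned = []
    for tag in tags:
        match = difflib.get_close_matches(tag, vocabulary, n=1, cutoff=cutoff)
        # Tags without a close match (e.g. personal tags) are left untouched.
        cleaned.append(match[0] if match else tag)
    return cleaned
```

For example, `weed(["folksonmy", "toread"], ["folksonomy", "ontology"])` corrects the first tag and leaves the personal tag "toread" alone. A real gardener would of course want to confirm such corrections rather than apply them blindly.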
And finally TagCare: The ambitious plan is to have a system that allows you to import tag clouds from Flickr, del.icio.us and Bibsonomy, cleanse out dissimilarities between tags, add hierarchical structure to the tag clouds, view tag statistics and probably also use community features, such as calibrating one’s tags with those of the chief gardener or activating collaborative spam elimination. It is going to be a free service, and if you want to be notified when it goes live, you might want to send an email to Katrin.
This full-service proposal for tag gardening does of course sound brilliant – yet is it going to be feasible, on a technical level? In the post-presentation discussion, somebody mentioned Faviki, which relies on DBpedia concepts to solidify the tag cloud. It didn’t exactly seem as though the TagCare team had already thought along these (semantic web) lines, even though this perfectly corresponded to their ‘Fertilizing’ idea. But if TagCare solely relies on good human gardeners, how long will it take until they have gained a big enough community to stimulate someone’s altruism? The idea of tag gardening of course is beautiful, and I am curious to learn more about the technology it is going to use.
Other folksonomy and tag related presentations that I was unable to attend or am unable to describe now, after the 10th hour of my 2nd day at TRIPLE-I, with a band performing folklore music involving yodeling and probably Schuhplattler right outside of this room:
Providing Multi Source Tag Recommendations in a Social Resource Sharing Platform (Martin Memmel, Michael Kockler, Rafael Schirru, German Research Centre for Artificial Intelligence DFKI)
Semantic Tagging and Inference in Online Communities (Yildirim Ahmet, Üsküdarli Suzan, Boğaziçi University)
Using Visual Features to Improve Tag Suggestions in Image Sharing Sites (Mathias Lux, Oge Marques, Arthur Pitman, Klagenfurt University)
Harnessing Wikipedia for Smart Tags Clustering (Maria Grineva, Maxim Grinev, Denis Turdakov, Pavel Velikhov, Russian Academy of Sciences)
Please leave a comment if you think that any of the above needs correction.
EDIT: I got the chance to record another 12-second definition (and am thinking of setting up a video glossary for the Semantic Web now): Rolf Sint from Salzburg Research explains what folksonomies are and why folksonomies and ontologies go together well in 12 seconds! Rolf is also involved in the KiWi project, which aims to develop a wiki-based knowledge management system boosted by semantic technologies.
Sweet Tools is a comprehensive collection of tools and applications for the Semantic Web. It is maintained by Mike Bergman with help from the Semantic Web Company.