Jana Herwig

Extending Google: First Look at SemantiFind

Just stumbled upon SemantiFind via T3N, and then upon the review on ReadWriteWeb from Thursday last week.

What’s it about? SemantiFind is an IE and FF browser plug-in that extends Google’s search functionality, most notably through a typeahead feature that allows you to refine your search query before hitting ‘enter’. ReadWriteWeb wasn’t too impressed though:

Unfortunately, SemantiFind is one of those tools that’s good in theory, but not so good in practice. When performing some test searches, results were not as precise as they should have been. For example, in the above-mentioned search for “Georgia,” a search for the U.S. state returned Google results for the country as well.

Ambiguities due to homonyms such as Georgia vs Georgia, or Java vs Java are among the faves of people who are trying to pitch a semantic tool to you – but I really wonder whether the effects of homonyms aren’t highly overrated? How often do people really search for these, and in particular search for these without context, i.e. further search terms such as in ‘Georgia Tech’, ‘Georgia war’, ‘Java Coffee’ or ‘Java bugs’?

I must say I was quite impressed by the choice of search terms offered, and if you (like me) are easy prey for the serendipity effect, then SemantiFind can please and distract you endlessly. Here is a preview of what appears if you enter ‘serendipity’ – please note the preview of possible descriptions and definitions which you get on the Google homepage with the plugin:

Once you pick a term it turns into a kind of button (just slightly annoying: you cannot edit a term after it’s turned into a button, but would have to delete the whole thing and type again if you want to change your search query):

And then, what happens? On the search results page, you see results filtered by SemantiFind’s user-generated, user-approved labels on top of the other search results – which irritated me at first as it comes across as a search engine within the search engine. Admittedly: I’d rather sift through 13 results than through 10,900,00 search results (even though I never make it to the end of Google’s search list anyway; does anybody?) – but does the article about trees doing their best work with thermostats at 70° really deserve the second rank in SemantiFind’s list of recommended search results?

So while I agree with RWW that this “just goes to show why search engines that rely on people to filter the results might not work. Human error shouldn’t be a factor in web searches”, I am still quite fond of the suggestions and definition previews. I would probably use SemantiFind regularly if they allowed me to configure the plugin in such a way that I’d get the suggestions on the input page, but not the recommended results on the results page.

What’s the source of these results anyway? SemantiFind’s recommended results seem to rely entirely on input generated by users – to add input, you need to install their toolbar and start adding labels to websites; if a website has been labeled before, you can confirm or reject existing labels. What’s nice: a label recommender (presumably the same one that’s used for search queries) reduces ambiguity. What’s curious: you can also browse the pages you have already labeled in what they call your “catalogue” – which makes the service even more reminiscent of a bookmarking service, and which makes me wonder whether one shouldn’t link this with a del.icio.us/Mr.Wong/Bibsonomy/Faviki account (Faviki would probably be the best, considering their tag recommendations are based on DBpedia, and considering that Faviki just added 1 million new tags and now holds more than 5 million tags across all languages).

Questions that remain: I’d really like to know how they maintain their list of suggested labels – ambiguity, typos, plural forms, i.e. the usual folksonomy issues, must be a big challenge. Also, I’d like to know where they get their definitions in the preview from – from Google? Or are these user-generated as well? There must, after all, be some use for the “request a new definition” form?

Too bad they don’t have a blog to which one could send a trackback, and there is not much on their company page either.

Matthias Samwald

Packing my bags for VoCamp Oxford

(by Matthias Samwald)

I am packing my bags once again: The first VoCamp (hosted at Oxford University, UK) is about to start this week. So, what is a VoCamp supposed to be? The official definition reads like this: “A VoCamp is a series (hopefully) of informal events where people can spend some dedicated time creating lightweight vocabularies/ontologies for the Semantic Web/Web of Data. The emphasis of the event(s) is not on creating the perfect ontology in a particular domain, but on creating vocabs that are good enough for people to start using for publishing data on the Web.”

I have always thought that the lack of widely established vocabularies/ontologies has been very damaging to the development of the Semantic Web. The VoCamp initiative could help change this situation for the better, so I really hope that this is the start of a long series of events.

My main topics of interest are: 1) Associative Tags; 2) Agreement, Disagreement, discourse; 3) Corporate Semantic Web; 4) “Are upper level ontologies/vocabularies not so bad after all?”; 5) “Cleaner schemas and ontologies”. These interests are motivated partly by use cases from the “KiWi – Knowledge in a Wiki” EU project, and partly by developments in the area of biomedical research at DERI Galway and the W3C Interest Group for Health Care and Life Science. Details below.

__Associative Tags__

Tagging is one of the key components of the ‘Web 2.0’, and Semantic Web technologies will help to make tagging even more powerful. Schemas such as SCOT or MOAT have already been established, and make it possible to ‘tag’ not only with simple strings, but with entities. These entities (such as concepts described in SKOS) can be associated with clear semantics and can be further described with RDF statements, to describe hierarchies of entities, or to link entities to rich data sources such as DBpedia. This enables sophisticated data integration and cross-data-source queries that would not have been possible with simple, string-based tags.

On the other hand, Semantic Web developers can learn from the simplicity that has made tagging so successful. Creating useful tags is very simple, and good user interfaces can make it simpler still with features such as autocompletion and tag recommendation. This simplicity should serve as a role model for many Semantic Web applications.

Specifically, I am interested in what I call ‘associative tags’ (aTags): bundles of tags/entities/concepts that can be used for the simple representation of facts. The primary intention of creating aTags is not the categorization of the document, but the representation of the key facts inside the document. Key facts in the biomedical domain might be, for example,

“Protein A interacts with protein B” (which can be represented with an aTag consisting of the three entities “Protein A”, “Molecular interaction” and “Protein B”) or

“Overexpression of protein A in tissue B is the cause of disease C” (an aTag consisting of the four entities “Overexpression”, “Protein A”, “Tissue B” and “Disease C”).

Once the aTags from these different sources are aggregated, it is possible to pose a query such as “show me molecules that are associated with molecules that are associated with disease C”, yielding “protein A” as an answer. Hierarchies (in the form of rdfs:subClassOf and skos:narrower) can be used to expand queries based on background knowledge (e.g., that “disease D” is a subclass of “disease C”).
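To make the idea a bit more tangible, here is a minimal Python sketch of how aggregated aTags could be queried, including hierarchy-based query expansion. It is only an illustration under my own assumptions: aTags are modelled as plain sets of entity names, the query is a simpler one-hop variant (“molecules associated with a disease”), and the third aTag, the `MOLECULES` set and all function names are invented for this example rather than taken from any existing schema or tool.

```python
# Hypothetical background knowledge: child -> parent,
# in the spirit of rdfs:subClassOf / skos:narrower.
SUBCLASS_OF = {"Disease D": "Disease C"}

# Hypothetical typing information: which entities count as molecules.
MOLECULES = {"Protein A", "Protein B", "Protein E"}

# Aggregated aTags, each modelled as an unordered bundle of entities.
ATAGS = [
    frozenset({"Protein A", "Molecular interaction", "Protein B"}),
    frozenset({"Overexpression", "Protein A", "Tissue B", "Disease C"}),
    frozenset({"Protein E", "Disease D"}),  # invented for illustration
]


def narrower_or_equal(concept):
    """Return the concept plus everything that is transitively narrower."""
    result = {concept}
    changed = True
    while changed:
        changed = False
        for child, parent in SUBCLASS_OF.items():
            if parent in result and child not in result:
                result.add(child)
                changed = True
    return result


def molecules_associated_with(concept, expand=True):
    """Molecules that share at least one aTag with the concept.

    With expand=True, aTags mentioning a narrower concept also match.
    """
    targets = narrower_or_equal(concept) if expand else {concept}
    found = set()
    for atag in ATAGS:
        if atag & targets:
            found |= (atag & MOLECULES) - targets
    return found


print(molecules_associated_with("Disease C", expand=False))
print(molecules_associated_with("Disease C"))
```

Without expansion only the aTag that mentions “Disease C” directly matches; with expansion the invented “Disease D” aTag contributes as well, which is exactly the kind of benefit background knowledge is supposed to bring.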

In many cases (especially with some ontologies in the biomedical domain), creating such associative tags can be much simpler than the creation of ‘real’ statements, i.e., relations between individuals and property restrictions of classes.

__Agreement, Disagreement, discourse__

Many people in the Semantic Web community are interested in the representation of argumentation structures on the web. For example: stating that one snippet of text contains statements that are in disagreement with another snippet of text, which is in agreement with yet another snippet of text. This can be of use for many knowledge domains, such as news articles, biomedical publications or reports submitted to a software bug tracker. Of special interest in this context are extensions of established schemas, especially SIOC. There is also another ontology called SWAN that is specifically tailored to the biomedical domain, and efforts to align SWAN with SIOC have started recently.
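As a rough illustration of what such an argumentation structure boils down to, the following Python sketch stores agreement and disagreement between snippets as simple triples and looks up the snippets related to a given one. The relation names `agreesWith` and `disagreesWith` are placeholders of my own, not actual SIOC or SWAN terms, and treating both relations as symmetric is an assumption made purely for the example.

```python
# Placeholder relation names; NOT actual SIOC or SWAN vocabulary terms.
AGREES = "agreesWith"
DISAGREES = "disagreesWith"

# (subject, relation, object) statements about text snippets,
# mirroring the example in the text above.
STATEMENTS = [
    ("snippet1", DISAGREES, "snippet2"),
    ("snippet2", AGREES, "snippet3"),
]


def related(snippet, relation):
    """Snippets linked to `snippet` by `relation`, in either direction.

    Assumes the relation is symmetric (an assumption of this sketch).
    """
    result = set()
    for subject, predicate, obj in STATEMENTS:
        if predicate != relation:
            continue
        if subject == snippet:
            result.add(obj)
        elif obj == snippet:
            result.add(subject)
    return result


print(related("snippet2", DISAGREES))
print(related("snippet2", AGREES))
```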

__Corporate Semantic Web__

As Semantic Web technologies are finally getting mature enough to allow industrial uptake, it is becoming clear that ontologies for describing organizational structures and business processes are still lacking maturity. FOAF allows us to represent basic information about persons, organizations and their relationships, but lacks vocabulary for stating that one person is the boss of another person, that a project consists of several subtasks, et cetera. While there are some small projects that try to create such schemas/ontologies, a widely accepted solution does not seem to be in sight at the moment.

__Are upper level ontologies/vocabularies not so bad after all?__

FOAF seemingly tried it a long time ago – foaf:Person is a subclass of “http://xmlns.com/wordnet/1.6/Person”, foaf:Document of “http://xmlns.com/wordnet/1.6/Document”, and so on. Linking to external schemas/ontologies (or making use of their classes and properties directly) can definitely help in facilitating semantic interoperability. For a long time, many web developers were very skeptical about such ‘top-down’ approaches to data integration, but recently the recognition of the potential value of such resources seems to be increasing. In parallel, the last one to two years have brought us some very large upper ontologies that are available as linked data, such as:

  • Wordnet 2.0, hosted by the W3C
  • Yago/DBpedia
  • OpenCyc (now with new URIs)
  • UMBEL (derived from OpenCyc and others).

I think the practice of re-using and linking to such upper ontologies should become popular (again). It helps in creating a highly interlinked Semantic Web, and helps to avoid re-inventing the wheel for each new schema/ontology. This linking should not be done post-hoc, but should be a central part of the early stages of vocabulary/ontology/data creation.

__Cleaner schemas and ontologies__

Working with established ontologies and schemas in ontology editors can be a chore. Most have dependencies on other ontologies, but don’t use owl:imports. Most use an awkward mix of OWL statements and RDF(S), resulting in ontologies that are OWL Full. Many require some OWL reasoning to make use of sameAs statements and inverse properties, but at the same time reasoning is complicated because the ontologies are OWL Full or even contain logical inconsistencies. Often enough, there seems to be no practical reason for the design choices that caused the trouble: some minor changes can turn a messy OWL Full ontology into an OWL Lite or OWL DL ontology. At the moment, many different working groups have created local versions of schemas such as FOAF or Dublin Core that are valid OWL DL to fix that problem.

It doesn’t have to be this way.

Trying to adhere to OWL Lite/DL and adding owl:imports statements can help build cleaner, modular and more sustainable ontologies, and does not require significant additional effort during the creation of ontologies. Maybe we can find a consensus that this would be a worthwhile goal, and develop plans towards reaching that goal.

Jana Herwig

Tag Recommender Evaluation – Anyone Can Participate

The IWIS Group at the Dept. of Computer Science, University of Aalborg, Denmark, has just opened up the evaluation of a tag recommender system they are building; the component is to be part of the wiki-based, semantic knowledge management system KiWi (itself based on IkeWiki). Anyone interested in participating, please send an email to Fred Durão at fred@cs.aau.dk.

Hi,
We are conducting an evaluation of a tag based recommender system with personalization we have developed here at the IWIS group at Aalborg University (http://iwis.cs.aau.dk) and in the context of KIWI project (http://www.kiwi-project.eu). We would be very grateful if you could help us with this task.

The recommender system is based on a set of algorithms we are evaluating. Later we are planning to plug it into the KIWI system and develop an appropriate user interface for it. Currently, we are evaluating it based on Delicious data (tags and content). The recommendations will be processed by our recommender system based on the tags you placed in Delicious.

As personalization is a crucial aspect to us, we will give you a generated username and password to log onto the Delicious Web site. Therefore please send an e-mail back to us saying that you would like to participate. You only have to tag a minimum of 10 web sites of your preference. Tag as much as you can!

Afterwards we are going to email you a list of recommendations for web sites that you might be interested in. These are computed by our recommender system. We will ask you to mark each recommendation with YES if the recommended site suits your preference or NO if it does not.

The achieved results will be published to all participants after the end of the analysis.

People interested in participating in this evaluation, please send an email to fred@cs.aau.dk.

Best regards,

Fred Durão and Peter Dolog

Here is a link to the FAQ.

Andreas Blumauer

Why mockups are essential for designing semantic applications

Applications based on semantic technologies offer new ways to discover, browse and explore information – this is an established fact in the SemWeb community. But how can we (as semantic web “insiders”) communicate these potential benefits to a typical end-user who has never heard about “faceted search” before – which doesn’t mean that he or she wouldn’t love intelligent user interfaces if they were in place?

One answer lies in using mockups, which are, on the one hand, an indispensable instrument for prototyping user interfaces, but also valuable when it comes to explaining the workings of an application to an end-user, an audience of interested researchers or a client.

And when it comes to explaining a search engine or search widget, mockups are even more important, as all of us, and end-users in particular, are often unable to think of search interfaces other than in terms of Google.

We have become so googlified that hardly anyone can think of different ways of searching for information than Google has offered for many years now: Put a couple of words in a text box, click a button and scroll through a list of titles and summaries. Repeat until you’re done, or try a new search and repeat. Wow!

Although even Google has recently started to implement a little bit of semantics by offering an auto-complete functionality on google.com (on some local versions like Google Austria this feature is still not available), even the most basic concepts for an intelligent search interface are still not part of common-sense thinking.

Admittedly, there are people who get irritated instantly by complex user interfaces like David Huynh’s Freebase Parallax. “This is only for experts!” is their response. But in a corporate setting, complex queries are part of our daily business – they are just not supported by common search engines (the only exception being data mining solutions). But that doesn’t mean that we don’t need them.

Where is the way out of this dilemma?

  • Don’t tell, but SHOW the end-users how semantic technologies can enhance search & browse experiences
  • Do not use terms like SPARQL or RDF
  • Create a simple mockup that illustrates the points you want to make
  • You’re not a designer? Use tools like Balsamiq – Try it now!

Here is an example of a mockup of a semantically enhanced expert finder:

These kinds of mockups are essential for the requirements engineering phase of any project where search is a bit more than a text box, a button and a bunch of documents.
