Jana Herwig

Semantic Desktop, Lifting and Human Language Technology [WOD-PD]

The next session at WOD-PD was given by Leo Sauermann (German Research Center for Artificial Intelligence DFKI, Germany), and Brian Davis (DERI Galway, Ireland). Leo introduced the idea of the Semantic Desktop, and more specifically, the Nepomuk Social Semantic Desktop. There’s a good article about Nepomuk on Linux.com, written by Bruce Byfield on August 26, 2008, from which I quote the following enlightening passages:

Ansgar Bernardi, deputy head of the Knowledge Management Department at Deutsches Forschungszentrum für Künstliche Intelligenz (DFKI, or the German Research Center for Artificial Intelligence) and Nepomuk’s coordinator, explains, “The basic problem that we all face nowadays is how to handle vast amounts of information at a sensible rate.” [...] “The point is, you have a vast amount of information on your desktop, hidden in files, hidden in emails, hidden in the names and structures of your folders. Nepomuk gives a standard way to handle such information.”

At a high level of generalization, Nepomuk has three main aspects, according to Bernardi. First, there is a standard framework for annotating pieces of information so that connections can be made between them. Second, there are ontologies, the sets of “documented shared understanding” or common concepts that can be defined for particular types of information, such as bio-science or computer desktop use. Finally, there are the tools for making or using the annotations and ontologies, what Bernardi calls the “workspaces that connect to other workspaces and help you in your day to day activities of collecting information, structuring it, making sense of it, and creating new information and communicating it.”

Leo has provided the relevant download links for those who “want to get their hands dirty” with Nepomuk (as he put it) on his blog. Leo Sauermann and Ansgar Bernardi also contributed an article about the Semantic Desktop to the recently published Social Semantic Web volume – a preview of the article is available here (in German – I’m sorry!).

Brian Davis’ part of the talk focused on Lifting and Human Language Technology (HLT) for the Semantic Desktop. Semantic lifting means capturing semantics and translating them into ontologies. Human language technology (HLT), in its broadest sense, can be described as computational methods for processing and manipulating language (for instance text analysis).

One of the goals of the Semantic Desktop is speech act detection for email – speech act here as defined by John Searle. At its most basic definition, a speech act is simply an utterance, but is also often understood more specifically as an illocutionary act (which is a term introduced by John L. Austin in How to do things with words), or a ‘performative utterance’, meaning that by saying something, one actually does something. For instance, the sentence “Please have the document ready for Workshop 1.” contains an instruction: It informs the reader about the requirements for a particular event, and asks him or her to meet these requirements.

Brian also introduced Roundtrip Ontology Authoring (ROA), a process that allows non-expert users to author or amend an ontology using a simple, easy-to-learn controlled natural language. The process combines Controlled Language for Information Extraction (CLIE) and Text Generation, and is developed on top of GATE. ROA is documented on the Nepomuk website; for further information about CLIE, read this article by Valentin Tablan, Tamara Polajnar, Hamish Cunningham and Kalina Bontcheva: User-friendly ontology authoring using a controlled language (PDF, 64 KB).

Jana Herwig

Web of Data Practitioners Days, 1st Session: Tweaking Turtles [WOD-PD]

Good morning from Vienna:) The Web of Data Practitioners Days really kicked off with a bang today – with Michael Hausenblas doing a strip! Only to expose the Semantic Web t-shirt he wore underneath his smart suit and tie, of course, but he really got the attention of attendees at 9:15 in the morning:)

First session – Web of Data 101 by Yves Raimond and Keith Alexander – explained the implications of the move from a Web of Documents to a Web of Data: With the Semantic Web architecture, data can be made explicit on the web. Data here means not only data contained in documents, but data describing persons, cities, bands, events, finally arriving at the “Web of Things” (see also this presentation by Dave Raggett, W3C, – PDF 2,7 MB). The Web of Data wouldn’t be a Web if the data weren’t interlinked – here is an overview of the principles of Linked Data:

  • always use URIs as names for things
  • more specifically, use HTTP URIs so that people can look up those names on the web
  • when someone looks up a URI, provide useful RDF information (RDF is the data model used for data on the web of data)
  • include RDF statements that link to other URIs (otherwise it wouldn’t be a web).
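
To make the four principles a little more concrete, here is a minimal Python sketch. It is a toy, not a real Linked Data client: the `WEB` dictionary stands in for HTTP lookups, and the `http://example.org/` URIs as well as the function names are made up for illustration.

```python
# Toy "web of data": looking up a URI returns RDF-style triples about it.
# WEB stands in for real HTTP lookups; the example.org URIs are hypothetical.
FOAF_KNOWS = "http://xmlns.com/foaf/0.1/knows"
FOAF_NAME = "http://xmlns.com/foaf/0.1/name"

ALICE = "http://example.org/people/alice"
BOB = "http://example.org/people/bob"

WEB = {
    ALICE: [
        (ALICE, FOAF_NAME, "Alice"),
        (ALICE, FOAF_KNOWS, BOB),  # a link to another URI
    ],
    BOB: [
        (BOB, FOAF_NAME, "Bob"),
    ],
}


def look_up(uri):
    """Principle 3: a lookup returns useful statements about the thing."""
    return WEB.get(uri, [])


def linked_uris(uri):
    """Principle 4: collect the other HTTP URIs a description links to."""
    return [
        obj
        for _, _, obj in look_up(uri)
        if obj.startswith("http://") and obj != uri
    ]


def crawl(start):
    """Follow links from one URI to every description reachable from it."""
    seen, queue = [], [start]
    while queue:
        uri = queue.pop(0)
        if uri in seen:
            continue
        seen.append(uri)
        queue.extend(linked_uris(uri))
    return seen
```

Starting from Alice, `crawl(ALICE)` also reaches Bob, purely because her description links to his URI. Without that one `foaf:knows` statement the two descriptions would be isolated documents rather than a web.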

Please also watch out for what is already happening and is going to happen in the future on www.bbc.co.uk/music/beta. This beta site is powered by MusicBrainz, the open content music database that is also part of the Linked Data cloud. Yves is collaborating with the BBC in the Programmes ontology project, the aim of which is to provide a simple vocabulary for describing programmes.

Yves’ intro was followed by a Turtle hacking session led by Keith Alexander. Turtle is a serialisation format for RDF, i.e. a format in which you can write RDF statements. The Turtle session is documented here on Keith’s Talis website. Even though I copied and pasted most of the code, I didn’t manage to produce a piece of valid code in N3 right away (i.e. not valid according to this validator). It only worked after I had removed the statements about who I know or what I am interested in – without these connections, what remains is a bit boring, I guess. But this looks like I managed to post at least something to the test store!

EDIT: The problem was that I had terminated the statements too soon, with a dot where a semicolon should have been; the demo didn’t allow me to overwrite the first post to the store, but here is my FOAF self-description in Turtle:

@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix people: <http://api.talis.com/stores/wod-pd-sandbox/items/People/> .

people:JanaHerwig a foaf:Person ;
    foaf:name "Jana Herwig" ;
    foaf:nick "digiom" ;
    foaf:homepage <http://digiom.wordpress.com> ;
    owl:sameAs <http://dbtune.org/last-fm/jezobeljones> ;
    foaf:knows people:MichaelHausenblas, people:YvesRaimond, people:WolfgangHalb ;
    foaf:topic_interest <http://dbpedia.org/resource/Semantic_Web>, <http://dbpedia.org/resource/Web>, <http://dbpedia.org/resource/Popular_Culture>, <http://dbpedia.org/resource/Lolcat> .

Achieved with zero Semantic coding skills – the Web of Data cannot be so hard to achieve:)
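
The punctuation rule behind that fix is easy to state: a comma separates several objects of the same predicate, a semicolon continues with the same subject, and only the last statement ends with a dot. Here is a rough Python sketch of that rule. The helper `to_turtle` is hypothetical and written just for this illustration; it assumes the terms are already valid Turtle and does no escaping or validation.

```python
def to_turtle(subject, properties):
    """Serialise one subject's statements using Turtle punctuation.

    properties: list of (predicate, [objects]) pairs, already written as
    Turtle terms (prefixed names, <IRIs> or "quoted literals").
    """
    if not properties:
        raise ValueError("need at least one predicate/object pair")
    lines = []
    for predicate, objects in properties:
        # A comma separates several objects of the same predicate.
        lines.append(f"    {predicate} {', '.join(objects)}")
    # A semicolon continues with the same subject; only the very last
    # statement is closed with a dot.
    return subject + "\n" + " ;\n".join(lines) + " .\n"


example = to_turtle(
    "people:JanaHerwig",
    [
        ("a", ["foaf:Person"]),
        ("foaf:name", ['"Jana Herwig"']),
        ("foaf:knows", ["people:MichaelHausenblas", "people:YvesRaimond"]),
    ],
)
print(example)
```

Ending an earlier line with a dot instead of a semicolon closes the description of `people:JanaHerwig`, so the next line starts with a predicate that has no subject, which is the kind of input a validator rejects.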

EDIT: Did do the update, too – just posted my first SPARQL query to this endpoint. Are the results going to be preserved in this link? Here is the query “by foot”:

PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX people: <http://api.talis.com/stores/wod-pd-sandbox/items/People/>
DESCRIBE people:JanaHerwig
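
On the question of whether a link can preserve a query: with the SPARQL protocol, a query sent via GET travels in a URL-encoded `query` parameter, so the link itself carries the full query text. A minimal Python sketch, assuming a placeholder endpoint (`http://example.org/sparql` is made up, not the address of the store used in the session):

```python
from urllib.parse import urlencode

# Hypothetical endpoint; substitute the SPARQL service of your own store.
ENDPOINT = "http://example.org/sparql"

QUERY = """\
PREFIX people: <http://api.talis.com/stores/wod-pd-sandbox/items/People/>
DESCRIBE people:JanaHerwig
"""


def sparql_get_url(endpoint, query):
    """Build the GET URL for a query: the query text is passed in a
    URL-encoded parameter named "query"."""
    return endpoint + "?" + urlencode({"query": query})


url = sparql_get_url(ENDPOINT, QUERY)
print(url)
```

The request itself is deliberately not sent, so the sketch runs offline. Such a link re-runs the query each time it is opened, so the results reflect whatever is in the store at that moment.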

Jana Herwig

Danny Ayers: “The Semantic Web is the path of least resistance”

The Web of Data Practitioners Days are approaching, giving me the opportunity to do an advance interview with Danny Ayers, Semantic Web evangelist, Community Platform manager at Talis, Web of Things everything (I think). I’d just like to extract two or three points here; you can read the whole interview on our website. First, something that’s noteworthy to me as it says something about the patterns of technological evolution in general:

Looking back a few years, I don’t think many people working on the Web could have predicted the remarkable rise of blogging, the revival of DHTML and ancient Internet Explorer tricks such as Ajax, online social networks, Wikis, the whole Web 2.0 thing. It’s worth noting that these developments have been consistent with Tim Berners-Lee’s vision of the Web as a system in which people are the key component.

Shifting to the Semantic Web perspective, for a long time I have believed this approach is on track simply because it offers improvements to the Web for which there are no obvious alternative techniques. Personally, I was relatively late to realise what those improvements really were – moving from a Web of Documents to a more general Web of Data. Expressed like that, and looking at existing Web architecture, the Semantic Web is the path of least resistance.

Remember? AJAX, when it cropped up and caused a big buzz in 2005, was nothing new, it was just a new term for an old thing, i.e. the Internet Explorer tricks Danny mentions (see also A Brief History of AJAX: “Browser asynchronous hacks have been possible since 1996, when Internet Explorer introduced the IFRAME tag, passing through a number of techniques such as pixel gifs, Netscape layers, Microsoft Remote Scripting, Java/JavaScript gateways, stylesheet hacks, image/cookies, and most recently the XMLHttpRequest.”)

Sometimes it takes a while until someone (society, industry, what have you) starts to notice that this or that could actually be useful. Sometimes technologies that everybody thinks are silly become a huge success: think text messages!

And sometimes you have a great (piece of) technology and it just never really catches on, and if that is the case, then mostly because some forces in the market (trusts, monopolies, corporations who force you to use their software/technology at a ridiculous price, people who would do anything they can to undo the natural laws of the digital world) won’t let it happen. What happened to Video 2000 and Betamax? Nixed by JVC’s licensing strategies for VHS. Just wanted to make this point before moving on to the next quote. Danny:

Regarding possible obstacles, there are many ways the Web could suffer, probably most dangerous being interventions from national governments or commercial interests, tilting the table on which we build these systems – such as software patents and threats to net neutrality. The Web works because it’s more or less the same to everyone, everywhere.

So if you think that the Web should continue to be the same to everyone, everywhere, if you would like to liaise with other people interested in the SemWeb and the Web of Data, but most importantly, if you do not know a whole lot about the SemWeb yet but would like to learn more, then please come and do attend the Web of Data Practitioners Days in Vienna, Oct 22-23.

It is going to start with a “Web of Data 101”, i.e. a low-threshold introduction to Semantic Technology in the context of the Web, given by Keith Alexander (Talis, UK) and Yves Raimond (Queen Mary University of London, UK). Here is the full program. Please note that there is also a registration deadline (6 Oct 2008!).

Jana Herwig

♪♫♪No Milk Today♫♪♪ – New Ways of Finding Music for Vegans

Shortly before Yves Raimond, a researcher at Queen Mary University of London with a focus on metadata for musical resources, won the 2nd prize in the Triplification Challenge, he talked to us about new ways of finding music using the infrastructure of the web of data. If you ever catch anyone again complaining about the lack of persuasive showcases of the Semantic Web, please direct them to this interview with Yves! Quote:

I think there is something quite frustrating about music recommender systems at the moment though. First, they do not explain how a particular recommendation was derived. I would really like them to tell me “I recommended this track because the harmonies are similar to other tracks you liked according to such and such criteria”. I think I would place more trust in a recommender system that actually explains recommendations, like a friend would do.

Another frustration is that we now have a really huge music-related web of data, created within the scope of the Linking Open Data project, which is not used at all by current recommender systems.

We started some work with Alexandre Passant, driven by these two frustrations. Using all these interlinked data for recommendation purposes allows us to break free from the traditional ‘information barriers’, and use all sorts of data as a basis for a musical recommendation.

For example, using the datasets currently available and interlinked on the web, you can already provide recommendations such as “You’re interested in intentional living and the Beastie Boys? Did you know that B.B. King is a vegetarian, as is Adam Yauch, who is a member of the Beastie Boys?”

Last.fm, are you listening? The full interview can be found here.
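
The kind of explained recommendation Yves describes in the quote above can be sketched in a few lines of Python. This is only a toy under loud assumptions: the three facts are the ones from his example, while the property names (`member_of`, `diet`) and the matching logic are invented here and are not his actual method or data model.

```python
# Toy triples taken from the example in the quote; the property names
# ("member_of", "diet") are made up for this sketch.
TRIPLES = [
    ("Adam Yauch", "member_of", "Beastie Boys"),
    ("Adam Yauch", "diet", "vegetarian"),
    ("B.B. King", "diet", "vegetarian"),
]


def artists_of(band):
    """Return everyone stated to be a member of the given band."""
    return [s for s, p, o in TRIPLES if p == "member_of" and o == band]


def recommend(liked_band):
    """Suggest other artists who share a property value with a member of
    the liked band, and say which shared fact led to the suggestion."""
    suggestions = []
    for member in artists_of(liked_band):
        for s, p, o in TRIPLES:
            if s != member or p == "member_of":
                continue
            for other, p2, o2 in TRIPLES:
                if other != member and (p2, o2) == (p, o):
                    reason = f"{other} and {member} ({liked_band}) share {p}: {o}"
                    suggestions.append((other, reason))
    return suggestions


for artist, reason in recommend("Beastie Boys"):
    print(artist, "<-", reason)
```

Every suggestion comes with the fact that produced it, which is the kind of explanation the quote asks recommender systems to give.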

Yves is also going to be a keynote speaker at the Web of Data Practitioners Days, Oct 22-23, here in Vienna, where you’ll have the chance to discuss the issue of LOD-based music recommendation with him in greater detail.

Other highlights of the program: Web of Data 101 (interested SemWeb beginners: please attend!), an Open Hacking Session, and keynotes from Danny Ayers and Keith Alexander, Richard Cyganiak, Ansgar Scherp, Alan Dix, Leo Sauermann, Sören Auer and Tassilo Pellegrini. The website is at webofdata.info.

Other news of the day: Physicists can’t dance, but hasthelargehadroncolliderdestroyedtheworldyet.com?
