Semantic Web Company

The Semantic Puzzle

Open World Assumptions


Multimedia in the Web of Data – Annotating and Interlinking Photos, Music, Multimedia [WOD-PD]

October 23, 2008 By: Jana Herwig Category: Conferences & Events, Internet & Media, Linked Data & Open Data, Mashups & Web services, Social Software 4 Comments →

The Web of Data Practitioners Days concluded with the session on Multimedia in the Web of Data, the first part of which was led by Ansgar Scherp (University of Koblenz-Landau, Germany).

Multimedia content, as Ansgar pointed out, is rarely annotated, badly organized, and hardly ever looked at again – just think of the 300-odd pics you might take on an average weekend getaway and never touch again. Annotating multimedia content requires a lot of work and dedication, so most of the time these pictures eventually disappear in the “digital shoe box” that is your photo management software.

The most obvious remedy is to annotate content as early as possible, ideally at the moment of creation and directly on your portable camera (formerly known as: mobile phone:) Ansgar suggested providing incentives to encourage picture annotation – professionals could, for instance, receive a higher financial reward if they deliver pictures that are already annotated. And of course there are ‘Games with a purpose’ such as Google Image Labeler, where players tag images in pairs, with and against each other, and are rewarded with the entertainment factor of the game.

The slide below shows what has happened (or will happen) to the process of creating photo books in the digital age and the age of mashups:

Ansgar Scherp's slides

After all, this is the age of the social semantic web, so why not try and (re-)use the content, structure and contexts that other users have already created on the web? Content augmentation, within the scope that Ansgar is concerned with, consists in the reuse of content and structures (e.g. from sources such as Flickr, Wikipedia and Geonames), made possible through the definition of rules, e.g.:

  • If there are two or fewer pictures on a page*
  • then automatically augment the page with additional photos using location information.

* Page here means a page in the album you are currently working on – you probably took a picture of yourself and your friend in Paris, and even though you went to the Centre Pompidou, you forgot to actually take a pic of the building itself – well, let the web be your library!
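To make the rule a bit more concrete, here is a minimal Python sketch of it. This is not Ansgar's implementation: the `Page` class, the `find_photos` callback, the `target` of four photos per page and the `fake_lookup` stand-in are all assumptions made for illustration. A real version would query a source such as Flickr by location instead.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Page:
    """One album page (hypothetical structure, not from the talk)."""
    photos: List[str] = field(default_factory=list)
    location: Optional[str] = None  # place name attached to the page, if any


def augment_page(page: Page,
                 find_photos: Callable[[str, int], List[str]],
                 max_own_photos: int = 2,
                 target: int = 4) -> Page:
    """If the page has two or fewer photos, add photos found via its location."""
    if len(page.photos) > max_own_photos or page.location is None:
        return page  # rule does not fire, page stays untouched
    missing = target - len(page.photos)
    extra = find_photos(page.location, missing)
    return Page(photos=page.photos + extra[:missing], location=page.location)


def fake_lookup(location: str, limit: int) -> List[str]:
    """Stand-in for a real photo search; returns placeholder file names."""
    return [f"{location}-web-{i}.jpg" for i in range(1, limit + 1)]


page = Page(photos=["us-in-paris.jpg"], location="Centre Pompidou")
augmented = augment_page(page, fake_lookup)
print(augmented.photos)
```

Passing the lookup in as a function keeps the rule itself independent of where the extra photos come from.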

So the goal is clear: develop a procedure for applying automatic content augmentation in the creation of good photo books.

But what makes a ‘good’ photo book anyway? Here are some of the results of a structural analysis of real, human-created photobooks conducted at CeWe Color:

  • % of photos with faces: 36%
  • Number of album pages: 16.96
  • Photos per page: 6.69
  • Text fields per page: 1.45
  • % of pages with text: 87%

Many rules can be derived from such a structural analysis and then applied in the creation of photo books, e.g. rules like this one:

  • If the text is located in the upper third of a page
  • if the font size is equal to or larger than 16 points
  • if the number of words is less than 10
  • if there is no caption on the page that has a bigger font size
  • then this page is the title
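The title rule above translates almost directly into a predicate. The following Python sketch is only one possible reading of it: the `TextField` structure and the choice to measure position by the vertical centre of the text field are my assumptions, not something stated in the talk.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class TextField:
    """A text field on an album page (hypothetical structure)."""
    text: str
    font_size: float  # in points
    y_center: float   # vertical position: 0.0 = top of page, 1.0 = bottom


def is_title_page(fields: List[TextField]) -> bool:
    """True if some text field satisfies all four conditions of the rule."""
    largest = max((f.font_size for f in fields), default=0.0)
    return any(
        f.y_center <= 1 / 3             # located in the upper third
        and f.font_size >= 16           # at least 16 points
        and len(f.text.split()) < 10    # fewer than 10 words
        and f.font_size >= largest      # no other text with a bigger font
        for f in fields
    )


title = [TextField("Our weekend in Paris", 24, 0.15),
         TextField("October 2008", 12, 0.9)]
body = [TextField("Us in front of the Centre Pompidou", 11, 0.8)]
print(is_title_page(title), is_title_page(body))
```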

Ansgar recommended xSmart, which he described as a “context-driven authoring tool for page-based multimedia presentations.”

Ansgar’s presentation was followed by two more: one by Yves Raimond on Interlinking Music on the Web of Data, and one on Interlinking Multimedia. In spite of my best intentions, I did not manage to cover these two in detail, but at least I gathered the links to relevant resources from all three sessions…


Web of Data Practitioners Days, 1st Session: Tweaking Turtles [WOD-PD]

October 22, 2008 By: Jana Herwig Category: Conferences & Events, Linked Data & Open Data 7 Comments →

Good morning from Vienna:) The Web of Data Practitioners Days really kicked off with a bang today – with Michael Hausenblas doing a strip! Only to expose the Semantic Web t-shirt he wore underneath his smart suit and tie, of course, but he really got the attention of attendees at 9:15 in the morning:)

The first session – Web of Data 101 by Yves Raimond and Keith Alexander – explained the implications of the move from a Web of Documents to a Web of Data: with the Semantic Web architecture, data can be made explicit on the web. Data here means not only data contained in documents, but also data describing persons, cities, bands and events, finally arriving at the “Web of Things” (see also this presentation by Dave Raggett, W3C – PDF, 2.7 MB). The Web of Data wouldn’t be a Web if the data weren’t interlinked – here is an overview of the principles of Linked Data:

  • always use URIs as names for things
  • more specifically, use HTTP URIs so that people can look up those names on the web
  • when someone looks up a URI, provide useful RDF information (RDF is the data model used for data on the web of data)
  • include RDF statements that link to other URIs (otherwise it wouldn’t be a web).
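What "looking up a URI" can mean in practice: a client asks for the URI over HTTP and says that it would like RDF back. The Python sketch below only builds such a request and does not send it. The use of an `Accept` header (content negotiation) and the media types listed are my assumptions about a typical setup, not something covered in the session, and whether a given server honours them is up to that server.

```python
import urllib.request

# Media types a client might ask for when it wants RDF (assumed preference order).
RDF_ACCEPT = "text/turtle, application/rdf+xml;q=0.9"


def build_rdf_request(uri: str) -> urllib.request.Request:
    """Build (but do not send) an HTTP GET request that asks for RDF."""
    return urllib.request.Request(uri, headers={"Accept": RDF_ACCEPT})


req = build_rdf_request("http://dbpedia.org/resource/Semantic_Web")
print(req.get_method(), req.full_url)
print("Accept:", req.get_header("Accept"))
```

Handing `req` to `urllib.request.urlopen` would perform the actual lookup.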

Please also watch out for what is already happening and is going to happen in the future on www.bbc.co.uk/music/beta. This beta site is powered by MusicBrainz, the open content music database that is also part of the Linked Data cloud. Yves is collaborating with the BBC in the Programmes ontology project, the aim of which is to provide a simple vocabulary for describing programmes.

Yves’ intro was followed by a Turtle hacking session led by Keith Alexander. Turtle is a serialisation format for RDF, i.e. a format in which you can write RDF statements. The Turtle session is documented here on Keith’s Talis website. Even though I copied and pasted most of the code, I didn’t manage to produce a valid piece of N3 right away (i.e. it did not pass this validator). It only worked after I had removed the statements about who I know or what I am interested in – without these connections, what remains is a bit boring, I guess. But it looks like I managed to post at least something to the test store!

EDIT: The problem was that I had terminated the statements too soon, with a dot where a semicolon should have been. The demo didn’t allow me to overwrite the first post to the store, but here is my FOAF self-description in Turtle:

@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix people: <http://api.talis.com/stores/wod-pd-sandbox/items/People/> .

# ";" keeps the subject for the next predicate, "," adds another object, "." ends the description.
people:JanaHerwig a foaf:Person ;
    foaf:name "Jana Herwig" ;
    foaf:nick "digiom" ;
    foaf:homepage <http://digiom.wordpress.com> ;
    owl:sameAs <http://dbtune.org/last-fm/jezobeljones> ;
    foaf:knows people:MichaelHausenblas, people:YvesRaimond, people:WolfgangHalb ;
    foaf:topic_interest <http://dbpedia.org/resource/Semantic_Web>, <http://dbpedia.org/resource/Web>, <http://dbpedia.org/resource/Popular_Culture>, <http://dbpedia.org/resource/Lolcat> .

Achieved with zero Semantic coding skills – the Web of Data cannot be so hard to achieve:)
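For anyone tripping over the same dot-versus-semicolon problem: in Turtle, a semicolon repeats the subject, a comma repeats subject and predicate, and only the dot ends the description. The Python sketch below does not parse Turtle; it just expands a hand-written dictionary version of my description (an assumption-laden stand-in, with `expand` being a made-up helper) to show how many plain subject–predicate–object statements hide behind the shorthand.

```python
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]


def expand(subject: str, predicate_objects: Dict[str, List[str]]) -> List[Triple]:
    """Spell out every (subject, predicate, object) statement explicitly."""
    return [(subject, predicate, obj)
            for predicate, objects in predicate_objects.items()
            for obj in objects]


description = {
    "rdf:type": ["foaf:Person"],  # written as "a" in Turtle
    "foaf:name": ['"Jana Herwig"'],
    "foaf:nick": ['"digiom"'],
    "foaf:homepage": ["<http://digiom.wordpress.com>"],
    "owl:sameAs": ["<http://dbtune.org/last-fm/jezobeljones>"],
    "foaf:knows": ["people:MichaelHausenblas", "people:YvesRaimond",
                   "people:WolfgangHalb"],
    "foaf:topic_interest": ["<http://dbpedia.org/resource/Semantic_Web>",
                            "<http://dbpedia.org/resource/Web>",
                            "<http://dbpedia.org/resource/Popular_Culture>",
                            "<http://dbpedia.org/resource/Lolcat>"],
}

triples = expand("people:JanaHerwig", description)
for triple in triples:
    print(" ".join(triple), ".")
print(len(triples), "statements")
```

Ending a line with a dot too early cuts the following predicates off from their subject, which is exactly what the validator complained about.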

EDIT: Did do the update, too – just posted my first SPARQL query to this endpoint. Are the results going to be preserved in this link? Here is the query “by foot”:

PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX people: <http://api.talis.com/stores/wod-pd-sandbox/items/People/>
DESCRIBE people:JanaHerwig
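If you would rather send such a query from a script than from a web form, the SPARQL protocol lets you pass it as a `query` parameter in a GET request. Here is a minimal Python sketch that only builds the URL. Note that `ENDPOINT` is a placeholder I made up, not the address of the sandbox store, and `sparql_get_url` is a hypothetical helper.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Placeholder endpoint (assumption): replace with the real SPARQL endpoint URL.
ENDPOINT = "http://example.org/sparql"

QUERY = """PREFIX people: <http://api.talis.com/stores/wod-pd-sandbox/items/People/>
DESCRIBE people:JanaHerwig"""


def sparql_get_url(endpoint: str, query: str) -> str:
    """URL-encode the query into the 'query' parameter of a GET request."""
    return endpoint + "?" + urlencode({"query": query})


url = sparql_get_url(ENDPOINT, QUERY)
print(url)

# Decoding the URL again gives back the original query text.
decoded = parse_qs(urlsplit(url).query)["query"][0]
print(decoded == QUERY)
```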


♪♫♪No Milk Today♫♪♪ – New Ways of Finding Music for Vegans

September 11, 2008 By: Jana Herwig Category: Conferences & Events, Linked Data & Open Data, Mashups & Web services 1 Comment →

Shortly before Yves Raimond, a researcher at Queen Mary University of London with a focus on metadata for musical resources, won the 2nd prize in the Triplification Challenge, he talked to us about new ways of finding music using the infrastructure of the web of data. If you ever catch anyone again complaining about the lack of persuasive showcases of the Semantic Web, please direct them to this interview with Yves! Quote:

I think there is something quite frustrating about music recommender systems at the moment though. First, they do not explain how a particular recommendation was derived. I would really like them to tell me “I recommended this track because the harmonies are similar to other tracks you liked according to such and such criteria”. I think I would place more trust in a recommender system that actually explains recommendations, like a friend would do.

Another frustration is that we now have a really huge music-related web of data, created within the scope of the Linking Open Data project, which is not used at all by current recommender systems.

We started some work with Alexandre Passant, driven by these two frustrations. Using all these interlinked data for recommendation purposes allows us to break free from the traditional ‘information barriers’, and use all sorts of data as a basis for a musical recommendation.

For example, using the datasets currently available and interlinked on the web, you can already provide recommendations such as “You’re interested in intentional living and the Beastie Boys? Did you know that B.B. King is a vegetarian, as is Adam Yauch, who is a member of the Beastie Boys?”
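To get a feeling for how such an explained recommendation could fall out of interlinked data, here is a toy Python sketch. It is not the system Yves and Alexandre are working on: the three statements are simply the facts from the quote above, and the property names `member_of` and `diet` as well as the function `explain_links` are invented for illustration.

```python
from typing import List, Tuple

Triple = Tuple[str, str, str]

# The facts mentioned in the quote, written as simple statements.
triples: List[Triple] = [
    ("Adam Yauch", "member_of", "Beastie Boys"),
    ("Adam Yauch", "diet", "vegetarian"),
    ("B.B. King", "diet", "vegetarian"),
]


def explain_links(liked_band: str, data: List[Triple]) -> List[str]:
    """Find artists sharing a property with a member of a band you like."""
    members = {s for s, p, o in data if p == "member_of" and o == liked_band}
    explanations = []
    for member in sorted(members):
        facts = {(p, o) for s, p, o in data if s == member and p != "member_of"}
        for s, p, o in data:
            if s not in members and (p, o) in facts:
                explanations.append(
                    f"{s} shares {p}={o} with {member}, a member of {liked_band}")
    return explanations


for line in explain_links("Beastie Boys", triples):
    print(line)
```

The point is that the explanation comes for free: the path through the data that produced the recommendation is the explanation.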

Last.fm, are you listening? The full interview can be found here.

Yves is also going to be a keynote speaker at the Web of Data Practitioners Days, Oct 22-23, here in Vienna, where you’ll have the chance to discuss the issue of LOD-based music recommendation with him in greater detail.

Other highlights of the program: Web of Data 101 (interested SemWeb beginners: please attend!), an Open Hacking Session, and keynotes from Danny Ayers and Keith Alexander, Richard Cyganiak, Ansgar Scherp, Alan Dix, Leo Sauermann, Sören Auer and Tassilo Pellegrini. The URL of the website is webofdata.info.

Other news of the day: Physicists can’t dance, but hasthelargehadroncolliderdestroyedtheworldyet.com?


Linked Data @ TRIPLE-I: Measuring the size of a fact, not of a fiction

September 08, 2008 By: Jana Herwig Category: Conferences & Events, Linked Data & Open Data No Comments →

The TRIPLE-I 2008 conference ended three days ago, yet there are a couple of loose ends I’d still like to tie up. First of all: Linked Data. Tom Heath was invited to give a keynote on “Humans and the Web of Data” – there are a variety of roles in which people may come across Tom and his LOD related work:

He administers the site LinkedData.org (on behalf of the Linked Data community); he is the creator of Revyu.com (“Review anything!”), which won him the 1st prize in the Semantic Web Challenge 2007; he was a co-organizer of the Linked Data on the Web Workshop at this year’s World Wide Web conference in Beijing; and he was an interviewee in my 12-second definitions mission @ TRIPLE-I – see his micro definition of Linked Data in the vid below. (To learn more about Tom and the different roles he fulfils, look here.)


Tom Heath explains Linked Data TRIPLE-I 2008 on 12seconds.tv

His keynote was not so much an introduction to Linked Data (I should expect that a conference like TRIPLE-I/I-Semantics would typically attract people who at least have an idea of what Linked Data is about), but rather a confirmation that the Web of Data is no longer a fiction, but a fact. One of the often cited proofs is the growth of the LOD dataset cloud over the last year, as shown in the image below (clicky for biggy, visualization created by Richard Cyganiak).

At the same time – and this was accordingly acknowledged by a later presentation given by Wolfgang Halb which had been prepared collaboratively by Tom, Wolfgang, Michael Hausenblas and Yves Raimond – it’s not just the sheer number of triples on the web that counts. Over the course of one year, the efforts of the Linked Data community (who seek to populate the web with open data, data in RDF) generated 4 billion triples – but only 3 million interlinks.

Their paper was an attempt to measure the size of the Semantic Web based on interlinks. A brief excerpt from the conclusion:

We have identified two different types of datasets, namely single-point-of-access datasets (such as DBpedia), and distributed datasets (e.g. the FOAF-o-sphere). At least for the single-point-of-access datasets it seems that automatic interlinking yields a high number of semantic links, however of rather shallow quality. Our finding was that not only the number of triples is relevant, but also how the datasets both internally and externally are interlinked. Based on this observation we will further research into other types of Semantic Web data and propose a metric for gauging it, based on the quality and quantity of the semantic links. We expect similar mechanisms (for example regarding automatic interlinking) to take place on the Semantic Web.
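To put the two numbers mentioned above side by side: 3 million interlinks among 4 billion triples is well under a tenth of a percent. The Python sketch below does that arithmetic and also shows one naive way of counting interlinks in a handful of statements. Treating a triple as an interlink when subject and object live on different hosts is my simplification, not the metric from the paper, and the sample statements are taken from my own FOAF description.

```python
from typing import List, Tuple
from urllib.parse import urlsplit

Triple = Tuple[str, str, str]


def is_interlink(subject: str, obj: str) -> bool:
    """Naive test (assumption): subject and object are URIs on different hosts."""
    s_host = urlsplit(subject).netloc
    o_host = urlsplit(obj).netloc
    return bool(s_host) and bool(o_host) and s_host != o_host


ME = "http://api.talis.com/stores/wod-pd-sandbox/items/People/JanaHerwig"
sample: List[Triple] = [
    (ME, "http://www.w3.org/2002/07/owl#sameAs",
     "http://dbtune.org/last-fm/jezobeljones"),
    (ME, "http://xmlns.com/foaf/0.1/knows",
     "http://api.talis.com/stores/wod-pd-sandbox/items/People/MichaelHausenblas"),
    (ME, "http://xmlns.com/foaf/0.1/topic_interest",
     "http://dbpedia.org/resource/Lolcat"),
    (ME, "http://xmlns.com/foaf/0.1/name", "Jana Herwig"),  # literal, not a link
]

interlinks = [t for t in sample if is_interlink(t[0], t[2])]
print(len(interlinks), "interlinks in", len(sample), "triples")

# Figures quoted in the post: 3 million interlinks among 4 billion triples.
ratio = 3_000_000 / 4_000_000_000
print(f"share of interlinks: {ratio:.4%}")
```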

Another point raised by Tom in his keynote was the issue of trust. According to his research, five parameters influence whether or not we trust a source or recommendation on the web: experience, expertise, impartiality (we don’t trust a travel agent, because we can’t help but believe that she is mainly going to recommend the offers of her ‘favourite’ clients), affinity, and track record, with experience, expertise and affinity being the most important ones. A semantic people search engine Tom presented, Hoonoh.com (currently in alpha), thus allows users to weight search results according to these three criteria.
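How might weighting results by these three criteria work? I don't know how Hoonoh.com does it internally, so the following Python sketch is just one plausible reading: a weighted average over per-person scores. The candidates, their scores and the weights are all made up for illustration.

```python
from typing import Dict, List, Tuple

Scores = Dict[str, float]


def trust_score(scores: Scores, weights: Scores) -> float:
    """Weighted average of the criteria scores (all values assumed in 0..1)."""
    total = sum(weights.values())
    if total == 0:
        return 0.0
    return sum(w * scores.get(k, 0.0) for k, w in weights.items()) / total


def rank(candidates: Dict[str, Scores], weights: Scores) -> List[Tuple[str, float]]:
    """Order candidates from most to least trusted under the given weights."""
    scored = [(name, trust_score(s, weights)) for name, s in candidates.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Invented example data, purely illustrative.
candidates = {
    "person_a": {"experience": 0.9, "expertise": 0.2, "affinity": 0.1},
    "person_b": {"experience": 0.3, "expertise": 0.8, "affinity": 0.6},
}

expertise_first = rank(candidates, {"experience": 1, "expertise": 3, "affinity": 1})
experience_first = rank(candidates, {"experience": 3, "expertise": 1, "affinity": 1})
print([name for name, _ in expertise_first])
print([name for name, _ in experience_first])
```

Shifting the weights changes who comes out on top, which is the whole idea of letting the searcher decide what kind of trust matters for the question at hand.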

Tom’s concluding statement emphasized that Linking Data makes sense not for its own sake, but for the sake of serving humans: “A web of machine-readable data is even more interesting from a human than from a machine perspective,” for instance in search engines like Hoonoh.com.
