Andreas Blumauer

Seevl: Explore the cultural universe based on semantic web technologies

Just recently Alexandre Passant from DERI Galway went public with a new web service called seevl. First impressions after test driving the system reveal that the seevl team is keeping the promises they have made: “Seevl reinvents music discovery. We provide new ways to explore the cultural and musical universe of your favorite artists and to discover new ones by understanding how they are connected. In addition, we let you comment every piece of data about them.”

I was talking with Alexandre and asked a couple of questions:

Q: seevl.net aims to offer a new way of music recommendations. What exactly can the user expect from it?
The main idea is to offer context around the recommendations, whereas existing systems are opaque or rely on collaborative filtering techniques, so that a user knows why he could / should like X if he’s browsing a page about Y. We hope (and we’ve seen it from our user feedback so far) that it can help to discover new bands and hidden connections.

Q: Yes, indeed this is something new. But maybe this is too complicated for typical users. Should this brilliant feature somehow be hidden, working just like a magic button?
So far, we include this in the “why is related” button, but we’re constantly working on the UI / UX. Also, we only provide text for now, but are working on dataviz interfaces.

Q: seevl offers a Web API for developers. It seems like you don’t use semantic web standards for that?
We use content-negotiation to provide machine-readable data for every page (search results, entity description, related artists, etc.). If by non-SW standards you mean non-RDF, indeed, we provide JSON instead of RDF/XML or N3, etc. But our JSON integrates URIs that you can dereference and follows an approach similar to other existing RDF-JSON serialisations. So, why JSON, you may ask. Because our developer target is music hackers, and all APIs from this community (last.fm, echonest, etc.) offer JSON, not RDF. Learning a new JSON schema takes 5 min, learning RDF takes much more.
But we believe that a JSON-RDF serialisation combines the best of both worlds. Actually, we could say we provide our data using standards (we’re giving back a graph that follows the RDF abstract model, with links to dereferenceable URIs) but not in a (so far) standardised serialisation.
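
To make the idea concrete, here is a minimal Python sketch of what a client might do: ask for JSON through the HTTP Accept header, then pick out the URIs embedded in the response so they can be dereferenced in turn. It is an illustration only, not seevl's actual API: the entity URL, the JSON keys and the payload shape are all assumptions made up for this example, and no network call is made.

```python
import json
import urllib.request

# Hypothetical entity URL; the real seevl URL scheme is not described here.
ENTITY_URL = "http://example.org/entity/some-artist"


def build_json_request(url):
    """Build a request that asks for JSON via the HTTP Accept header."""
    return urllib.request.Request(url, headers={"Accept": "application/json"})


def collect_uris(node):
    """Recursively collect every http(s) URI found in a parsed JSON document."""
    if isinstance(node, dict):
        return [u for value in node.values() for u in collect_uris(value)]
    if isinstance(node, list):
        return [u for item in node for u in collect_uris(item)]
    if isinstance(node, str) and node.startswith(("http://", "https://")):
        return [node]
    return []


# Made-up payload in the spirit of an RDF-JSON serialisation (assumed shape).
SAMPLE = json.loads("""
{
  "uri": "http://example.org/entity/some-artist",
  "name": "Some Artist",
  "related": [
    {"uri": "http://example.org/entity/other-artist", "name": "Other Artist"}
  ]
}
""")

request = build_json_request(ENTITY_URL)  # prepared only, never sent
uris = collect_uris(SAMPLE)
```

Each URI in `uris` could be requested the same way, which is what lets a plain JSON response still behave like a graph of linked resources.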

Q: I agree. But in the mid term I would additionally go for SPARQL. A lot of people are learning SPARQL at the moment.
Yes, we have to measure the cost / ROI. Complete SPARQL can lead to complex queries, that’s why they are somehow hidden behind our search interface (which basically constructs a controlled SPARQL query). But that could be something provided to advanced customers.
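
A search interface that constructs a controlled SPARQL query could look roughly like the Python sketch below. It is not seevl's implementation: the whitelist of searchable fields, the result cap and the query shape are assumptions chosen for illustration. The point is that callers can only pick a permitted field and a value, never write arbitrary SPARQL.

```python
PREFIXES = (
    "PREFIX foaf: <http://xmlns.com/foaf/0.1/>\n"
    "PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>\n"
    "PREFIX mo: <http://purl.org/ontology/mo/>\n"
)

# Only these search fields are exposed (an assumed whitelist for this sketch).
ALLOWED_FIELDS = {
    "name": "foaf:name",
    "label": "rdfs:label",
}

MAX_LIMIT = 50  # assumed cap that keeps queries cheap


def escape_literal(text):
    """Escape backslashes, double quotes and line breaks for a SPARQL string literal."""
    return (
        text.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )


def build_query(field, value, limit=10):
    """Build a fixed-shape SELECT query from a whitelisted field and a value."""
    if field not in ALLOWED_FIELDS:
        raise ValueError(f"unsupported search field: {field}")
    limit = max(1, min(int(limit), MAX_LIMIT))
    return (
        PREFIXES
        + "SELECT ?artist WHERE {\n"
        + "  ?artist a mo:MusicArtist ;\n"
        + f'          {ALLOWED_FIELDS[field]} "{escape_literal(value)}" .\n'
        + "}\n"
        + f"LIMIT {limit}"
    )


query = build_query("name", 'The "Band"', limit=500)
```

Because the graph pattern is fixed and the limit is capped, the cost of every query stays predictable, which is the trade-off against exposing a full SPARQL endpoint.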

Q: seevl.net is based on linked data sets like DBpedia, MusicBrainz or Freebase. Is seevl itself offering Linked (Open) Data? I can also see heavy use of the Open Graph protocol. What could a Facebook application of seevl look like?
Yes, we provide our data back at http://developers.seevl.net. We’re using the Music Ontology and a bit of other models (FOAF, etc.). So far, the OGP markup is used for Facebook likes – but we are looking at other things that could be built on top of this.

Q: Which business model are you following? Can one integrate your service into one's own shop? Would you offer this as a cloud service? For how much?
We’ll have B2C (new features on the website are coming soon) and a B2B freemium model. We’re currently identifying how many calls we can support as part of the free calls per day (so that will indeed be cloud-based, our architecture is on EC2). So, integration of our service / data in shop websites, etc. is definitely what we’d like to see and to feature in our upcoming app gallery! The only requirement for data reuse is attribution and linking back to the service.

Thanks Alex, and I wish you and your team all the best with seevl.net!

 

Thomas Thurner

Vienna Semantic Web Meetup – the next season

Started in mid-2009, the Vienna Semantic Web Meetup (VSWM) is now entering its third year. Hosted by various partners, from media to culture and from corporate to academic, this regular gathering now counts over 200 members. In keeping with VSWM tradition, guests from abroad drop by to give input and new insights. The next season of VSWM will continue this mixture of international connection and informal meeting by putting two upcoming topics on the agenda.

Digital Identity on the Semantic Web
Thursday, April 7, 2011

While recent developments in ICT make it easier for companies and consumers to reach each other, they can also scatter your personal information more widely, making life easier for criminals. On the other hand, public institutions and government agencies are collecting personal data too. So personal data is processed without the consent (or even the knowledge) of the respective citizen. As we know, leaks in this field may expose sensitive personal data as well. The misuse of personal data can be restricted – this is a challenge to both the technological and the legal domains. This meetup takes a look at how Semantic Web technologies can live up to their responsibility in this emerging field.

  • Christof Tschohl (BIM)
    Ludwig Boltzmann Institute for Human Rights
  • Mischa Tuffield (Garlik)
    A Standards-based, Open and Privacy-aware Social Web (W3C)

>> read more, and register for free

Portals, Apps and Visualizations for Open Government Data
Wednesday, June 15, 2011

Picking up Keith Andrews' suggestion, this is a MeetUp focusing on tools, services and projects dealing with visualization, app creation and portals/catalogs for Open [Government] Data. As this MeetUp is on the eve of Austria's first Open Government Data Conference (OGD2011), we expect to meet experts and enthusiasts from Austria and abroad.

  • Keith Andrews (IICM)
    Institute for Information Processing and Computer Supported New Media at Graz University of Technology
  • Andreas Blumauer (SWC)
    Storing, searching, serving Open Government Data – getting an overview on the growing market for open data solutions

>> read more, and register for free



Tassilo Pellegrini

Interview with Georgi Kobilarov: “I believe that data publishing must happen in a distributed style.”

Uberblic.org connects structured data from the web. The Berlin-based inventor Georgi Kobilarov gives a brief insight into the mashup service and talks about the challenges of building applications upon linked data.

You have recently published the service uberblic.org, a Linked Data mashup editor. What was your motivation to develop this tool?

Uberblic.org provides an integrated view of web data. Our goal is to integrate all the structured data on the web, and give web developers a single point of access to that reconciled data. More than that, we will open up the tools we use to manage the data sources to the community, so that people can help us curate that repository of free data. We re-publish all the data we import as Linked Data, under the licenses of the original data publishers.

Some of the data sources we import are available in the Linked Open Data cloud as well, but many are not. Linked Data is an elegant way to publish data in a distributed way on the web, but consuming it from that distributed cloud is – at least – impractical. In every real-world application using linked data from the web I’ve seen, organizations built up internal copies of the cloud, and often even reconciled linked data sources. They build their own Linked Data proxies. Uberblic.org helps those users by providing one public proxy for data from the web. Many of our sources get monitored for data changes, and the corresponding data in uberblic is updated in real-time.


Can you give us a brief insight into how the tool works? What technology is it built on?

My company, Uberblic Labs, has developed a data integration platform that we use to power uberblic.org. We call it the Uberblic Platform (the name uberblic is derived from the German “Überblick” – English “overview”). This platform enables us to do the full process of “data fusion”: Importing and converting external data sources, mapping the data schemas to a central ontology, filtering out data errors, automatically suggesting duplicates to the user, and merging data from different sources into a single, reconciled representation.
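
The stages named above (schema mapping, error filtering, duplicate suggestion, merging) can be pictured with a toy Python sketch. It says nothing about how the Uberblic Platform is actually implemented: the source names, field names, validity rule and name-based duplicate key are all hypothetical, and a real system would let a user confirm suggested duplicates before merging.

```python
# Hypothetical mappings from each source's field names to a central schema.
SCHEMA_MAPPINGS = {
    "source_a": {"title": "name", "born": "birth_year"},
    "source_b": {"label": "name", "birthYear": "birth_year"},
}


def map_record(source, record):
    """Rename mapped fields to central property names; drop unmapped fields."""
    mapping = SCHEMA_MAPPINGS[source]
    return {mapping[k]: v for k, v in record.items() if k in mapping}


def is_valid(record):
    """Reject obvious errors: empty name, or a birth year that is not an int."""
    name = record.get("name")
    year = record.get("birth_year")
    return bool(name and name.strip()) and (year is None or isinstance(year, int))


def suggest_duplicates(records):
    """Group records whose normalised names match; return groups of 2 or more."""
    groups = {}
    for record in records:
        key = record["name"].strip().lower()
        groups.setdefault(key, []).append(record)
    return [group for group in groups.values() if len(group) > 1]


def merge(group):
    """Merge a duplicate group; the first value seen for each property wins."""
    merged = {}
    for record in group:
        for key, value in record.items():
            merged.setdefault(key, value)
    return merged


# Made-up input records from two made-up sources.
records = [
    map_record("source_a", {"title": "Example Artist", "born": 1970}),
    map_record("source_b", {"label": "example artist ", "birthYear": 1970, "extra": "x"}),
    map_record("source_b", {"label": "", "birthYear": 1900}),
]
valid = [r for r in records if is_valid(r)]
candidates = suggest_duplicates(valid)
merged = [merge(group) for group in candidates]
```

Here the third record is filtered out for its empty name, and the remaining two are suggested as duplicates and merged into a single representation.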

Structured and semi-structured data from the web is an excellent use case for our software platform, since there we come across all the interesting cases of real-world data heterogeneity. But what I think is especially powerful and yet missing in other Linked Data projects I know, is the ability to subscribe to update-feeds. We do that extensively, fetching updates in real-time from Wikipedia and the like.

Our platform is built in Scala and runs on a cluster of machines, with workers communicating through a messaging system. We developed an RDF storage layer on top of a distributed key-value store for storing all provenance information used in the extraction process, currently around 100 million named graphs for uberblic.org. That storage layer does not directly provide SPARQL access, so we push all the output data into a SPARQL endpoint hosted by Talis as well.

What have been the biggest challenges in tackling the integration issues of dispersed data?

It was quite a steep learning curve to do Linked Data not only in an academic environment, but in a reliable, industry-strength set-up. In academia, there was always the excuse that things are just research prototypes. Now that excuse is gone. That’s also where it becomes necessary to manually clean up data. And there are two ways to do that: Either you enable the users to change facts directly in your repository after you have imported the external data (that is what Freebase does), or you facilitate clean-up cycles in the original data source and fetch these updates in real-time. That is what we do.

I believe that data publishing must happen in a distributed style, because then each data source gets taken care of by a specialized group of people using specialized tools. And it’s what you see not only on the web, but also inside organizations and enterprises. But consuming data through centralized APIs is more than just convenient. We all use Google or another search engine as a central access point to web pages which are published in a distributed way all over the web, don’t we? Can you imagine today researching a topic on the web without the centralization power of search engines, just by following links across web sites, like in the old days?

When we built the Uberblic Platform, some of the things I imagined to be large headaches, like schema mapping, turned out to work really well. Those pathological cases you often see in academic “challenges” are – well – pathological. It’s not necessary to solve them fully automatically through super-intelligent algorithms. Much more important than the sophistication of your algorithms are well-designed workflows, so that the user becomes a part of the solution. And that’s not about crowd-sourcing or swarm intelligence: the editorial curation of schema mappings and object reconciliation can be done by a small team of people, if they have the right set of tools.

What are the next plans with uberblic.org? Where will the journey go?

Uberblic.org will continue to integrate more interesting and useful data sources from the web, and we will start making more APIs available to web developers to build their applications on top. We are also looking for partners who are interested in developing applications and have been struggling in the past to get the cross-source data from the web they need.

The work on improving uberblic.org will also benefit our Uberblic Platform, and hence our clients who use that same software for integrating organizational data sources with each other and with the web of data.

About Georgi Kobilarov

Georgi is founder and managing director of Uberblic Labs, a company based in Berlin specialized in Linked Data integration. He worked as a research associate in the Web-based Systems Group at Freie Universität Berlin and as a visiting researcher at Hewlett Packard Labs Bristol. As co-founder and lead developer of DBpedia, he was also a day-one contributor to the Linking Open Data project. Georgi is consulting with the BBC on several Linked Data related projects. He organizes the Web of Data Meetup London, a bi-yearly gathering of the UK Linked Data community. Georgi graduated with a Diplom in business administration from Freie Universität Berlin and has many years of work experience as a software developer. Visit his blog: http://blog.georgikobilarov.com

Tassilo Pellegrini

George Anadiotis: “Linked Data brings value by offering an alternative approach to lightweight data integration and mashups.”

George Anadiotis is an expert on artificial intelligence with academic roots at the Vrije Universiteit, Amsterdam. In February 2009 he took the position of R&D Director at the Greek technology company IMC. I met him in September at I-SEMANTICS 2009, where he and his team contributed to the Triplification Challenge. In their paper Linked Data for the Masses they pondered the pragmatic value of Linked Data from an inbound and outbound perspective. In his words:

We started experimenting with the technical infrastructure needed and created some proof-of-concept applications. Part of this work was enabling Linked Data access for the front-end infrastructure we used, Liferay portal. We decided on the appropriate vocabularies for the type of content we wanted to publish (FOAF, SIOC and MOAT mainly), delved into the internals of Liferay and used D2R to map its relational database to the vocabularies of choice, also using techniques to improve performance as much as possible. Since Liferay itself is also based on the notion of communities, we thought our work would be more widely applicable and useful, so we chose to submit it for review at the Triplification Challenge and make it available to the community as open source software. Our applications have gradually matured and are about to be deployed in our commercial projects, while at the same time we are now making the Liferay Linked Data Module available as a Sourceforge project and we are working with Liferay management in order to disseminate this effort to the community and also include it in a future release of the software.

Read the full interview here.
