Andreas Blumauer

Seevl: Explore the cultural universe based on semantic web technologies

Just recently Alexandre Passant from DERI Galway went public with a new web service called seevl. First impressions after test driving the system reveal that the seevl team is keeping the promises they have made: “Seevl reinvents music discovery. We provide new ways to explore the cultural and musical universe of your favorite artists and to discover new ones by understanding how they are connected. In addition, we let you comment every piece of data about them.”

I talked with Alexandre and asked him a couple of questions:

Q: seevl.net aims to offer a new way of music recommendations. What exactly can the user expect from it?
The main idea is to offer context around the recommendations, whereas existing systems are opaque or rely on collaborative filtering techniques. That way a user knows why he could or should like X if he’s browsing a page about Y. We hope (and we’ve seen it from our user feedback so far) that it can help to discover new bands and hidden connections.

Q: Yes, this is indeed something new. But it might be too complicated for typical users. Shouldn’t this brilliant feature be hidden somehow, working just like a magic button?
So far, we include this in the “why is related” button, but we’re constantly working on the UI / UX. Also, we only provide text for now, but are working on dataviz interfaces.

Q: seevl offers a Web API for developers. It seems you don’t use semantic web standards for that?
We use content negotiation to provide machine-readable data for every page (search results, entity description, related artists, etc.). If by non-SW standards you mean non-RDF, indeed, we provide JSON instead of RDF/XML or N3, etc. But our JSON integrates URIs that you can dereference and follows an approach similar to other existing RDF-JSON serialisations. So, why JSON, you may ask. Because our developer target is music hackers, and all APIs from this community (last.fm, echonest, etc.) offer JSON, not RDF. Learning a new JSON schema takes 5 min, learning RDF takes much more.
But we believe that a JSON-RDF serialisation combines the best of both worlds. Actually, we could say we provide our data using standards (we’re giving back a graph that follows the RDF abstract model, with links to dereferenceable URIs) but not in a (so far) standardised serialisation.
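
As a rough illustration of the approach described in this answer, the following Python sketch builds a request that asks for JSON via the Accept header and then walks a JSON document to collect the URIs a client could dereference next. The entity URL, the sample document and its field names are invented for illustration and are not seevl's actual schema.

```python
import json
from urllib.request import Request


def build_json_request(page_url):
    """Build (without sending) a GET request that asks for JSON via content negotiation."""
    return Request(page_url, headers={"Accept": "application/json"})


def collect_uris(node):
    """Recursively collect every string value that looks like an HTTP(S) URI."""
    if isinstance(node, dict):
        return [uri for value in node.values() for uri in collect_uris(value)]
    if isinstance(node, list):
        return [uri for item in node for uri in collect_uris(item)]
    if isinstance(node, str) and node.startswith(("http://", "https://")):
        return [node]
    return []


# Hypothetical page URL and response body; the real field names may differ.
request = build_json_request("http://seevl.net/entity/example")
SAMPLE_RESPONSE = json.loads("""
{
  "uri": "http://example.org/artist/a",
  "name": "Artist A",
  "related": [
    {"uri": "http://example.org/artist/b", "name": "Artist B"}
  ]
}
""")
linked_uris = collect_uris(SAMPLE_RESPONSE)
```

Each URI in `linked_uris` could be requested in the same way, which is what lets a plain JSON document behave like a node in a graph.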

Q: I agree. But in the mid-term I would additionally go for SPARQL. A lot of people are learning SPARQL at the moment.
Yes, we have to measure the cost / ROI. Complete SPARQL can lead to complex queries; that’s why they are somewhat hidden behind our search interface (which basically constructs a controlled SPARQL query). But that could be something provided to advanced customers.
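
One way to picture such a "controlled" query is a small builder that only accepts whitelisted relations and caps the result size, so callers never submit free-form SPARQL. This is a sketch under assumptions, not seevl's implementation: the relation names, vocabulary URIs and the limit of 50 are all hypothetical.

```python
import re

# Hypothetical whitelist: search facet name -> predicate URI.
ALLOWED_RELATIONS = {
    "genre": "http://example.org/vocab/genre",
    "origin": "http://example.org/vocab/origin",
}

# Conservative check: an absolute HTTP(S) URI without characters that could break out of <...>.
SAFE_URI = re.compile(r'https?://[^\s<>"{}|\\^`]+')

MAX_LIMIT = 50  # assumed cap to keep queries cheap


def build_controlled_query(relation, value_uri, limit=10):
    """Return a SPARQL SELECT query built only from validated parts."""
    if relation not in ALLOWED_RELATIONS:
        raise ValueError(f"unsupported relation: {relation!r}")
    if not SAFE_URI.fullmatch(value_uri):
        raise ValueError(f"not a safe URI: {value_uri!r}")
    limit = max(1, min(int(limit), MAX_LIMIT))
    return (
        "SELECT ?artist WHERE { "
        f"?artist <{ALLOWED_RELATIONS[relation]}> <{value_uri}> . "
        "} "
        f"LIMIT {limit}"
    )


query = build_controlled_query("genre", "http://example.org/genre/rock", limit=500)
```

The user only picks a facet and a value; the shape and cost of the query stay under the service's control.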

Q: seevl.net is based on linked data sets like DBpedia, MusicBrainz or Freebase. Is seevl itself offering Linked (Open) Data? I can also see heavy use of the Open Graph protocol. What could a Facebook application of seevl look like?
Yes, we provide our data back at http://developers.seevl.net. We’re using the Music Ontology and a bit of other models (FOAF, etc.). So far, the OGP markup is used for Facebook likes – but we are looking at other things that could be built on top of this.

Q: Which business model are you following? Can one integrate your service into one’s own shop? Would you offer this as a cloud service? For how much?
We’ll have B2C (new features on the website are coming soon) and a B2B freemium model. We’re currently identifying how many calls we can support as part of the free calls per day (so that will indeed be cloud-based, our architecture is on EC2). So, integration of our service / data in shop websites, etc. is definitely what we’d like to see and to feature in our upcoming app gallery! The only requirement for data reuse is attribution and linking back to the service.

Thanks Alex, and I wish you and your team all the best with seevl.net!

 

Thomas Thurner

The hype, the hope and the LOD2: Sören Auer engaged in the next generation LOD

The pan-European project LOD2 is one of the biggest projects dealing with linked data. Scientists, programmers and software architects in various European countries are working on the next generation of linked open data. In a series of interviews I’m presenting people working on and with LOD2. As a start, I had the chance to talk to Sören Auer, head of the LOD2 project.

Thomas Thurner: Over the recent years the LOD movement gained tremendous momentum. As one of the key players in this area how do you perceive this development? Hype or hope?

Sören Auer: From my point of view the momentum LOD gained is deserved. We should strive for a Web which is more decentralized, democratic, participatory, transparent and inclusive. Linked Open Data is from my point of view a key technological building block on this road. However, a lot of work is ahead of us. LOD has to find its way directly into mainstream technology such as CMSes, Search Engines, Web Applications and Mash-Ups, and we have to show users and stakeholders the direct added value of this technology.

Thomas Thurner: What is the current state of the LOD cloud from a technological point of view? Where do you see room for improvement?

Sören Auer: Currently, the technological state of LOD seems to be comparable to the early days of the Web. We are still able to draw maps/clouds of the LOD datasets, and data links are still sparse and difficult to maintain. This reminds me a lot of the early days of the Web, where we also had problems with broken links (the infamous 404). Later, after content management systems and Web applications automated link generation and maintenance, this improved a lot, and I hope we are on the same road with LOD technologies finding their way into more and more Web systems.

Thomas Thurner: How is the LOD2 project addressing these issues? What are the project’s key objectives?

Sören Auer: LOD2 is addressing these issues in three ways: First, we develop new research approaches highly relevant for LOD, for example, for Linked Data management, automatic data linking as well as Linked Data enrichment and quality improvement. Second, we implement and integrate these approaches into specialized tools (e.g. SILK, OntoWiki, Virtuoso and DL-Learner) forming together the integrated LOD2 stack. The LOD2 stack can be used by data publishers for the whole life-cycle of Linked Data management ranging from extraction over linking, authoring, enrichment to exploration & search.

Thomas Thurner: What do you think are the most important factors to bring LOD to the masses?

Sören Auer: From my point of view the key factor here is that we manage to integrate the large number of tools and approaches for supporting the Linked Data life-cycle stages in a synergistic way, where each aspect adds value and triggers a number of other improvements. For example, the establishing of a new data link has a direct effect on search & exploration of Linked Data. We have to directly show these kinds of benefits to users so they receive instant gratification for contributions to the Web of Data. Semantic Wikis, such as Semantic MediaWiki and OntoWiki, are already nicely working in this direction. An application with an enormous potential to bring LOD to the masses would be the creation of a distributed, social semantic network. With OpenID, WebID, FOAF and Semantic Pingback most of the building blocks are available, but the final step of integrating these into an easy-to-use social networking application still has to be done.

Thomas Thurner: Compared to other semantic web approaches linked data principles seem to be rather easy to understand. On the other hand some argue that the “linked data cloud” is a big heap of data which cannot be used for professional purposes. What is your point of view?

Sören Auer: Of course the currently available data is not useful for all potential usage scenarios. However, already now Linked Data can be used for many interesting applications: For example, we just completed the development of a prototype for a large search engine, where users searching are assisted with comprehensive background information obtained from the Linked Data Web. For this use case, information available as Linked Data is already very valuable and useful. The criticism of LOD being a “heap of data” also reminds me a lot of the early days of the Web, where people raised similar criticisms for the Web being a medium of un-professionalism. Later it turned out that, of course there is a lot of amateurism, but as Wikipedia impressively demonstrates the working together of many amateurs with the right tools can in the end outperform few professionals.

Thomas Thurner: Linked Data could also become a new paradigm for light-weight enterprise data integration. What are the biggest obstacles today for linked data to being accepted by the business community?

Sören Auer: Using Linked Data for data integration in large enterprises has enormous potential. Just last week I was invited to a workshop with the IT department of one of the top car makers, and the people responsible for data integration there were extremely excited about the opportunities of Linked Data in a large heterogeneous enterprise with more than 3000 different backend systems. Linked Data technologies can easily fill the gap between unstructured Intranet search and expensive & complicated Service-oriented Architectures. Compared to SOA, Linked Data is a pay-as-you-go strategy, where data integration can be performed incrementally and in sync with the requirements and evolution of the data structures in the enterprise. In order to realize this vision, we need to continue the maturation of enterprise Linked Data tools; the availability of PoolParty, Sindice Enterprise Edition, Virtuoso and TopBraid is already an important step in that direction.

Thomas Thurner: Automatic mechanisms to curate linked data and to make alignments between datasets possible play a crucial role for the next phase of linked data economics. Which technologies will play a central role? What will be the most critical point – do you see a “wisdom of the crowd” playing a role in this game?

Sören Auer: Definitely! Tapping the wisdom of the crowd for mapping & linking has a huge potential, which is currently unused. We started working in that direction with DBpedia Live and the DBpedia mapping Wiki. In order to make it really easy for people to contribute, we have to dramatically lower the barrier to contributing to the alignment process. In LOD2 we also plan to enable users to create mappings and links between datasets by simply giving examples of correct links and evaluating some automatically generated ones.

Thomas Thurner: At the moment governments all around the world are starting to publish open data, and more and more stakeholders are starting to understand the benefit of open linked data. On the other hand, enterprises haven’t even started with this topic. What dynamics could trigger projects in sectors like the financial industry that make use of open data principles?

Sören Auer: Making statistical and financial information available in structured form and as Linked Data could have an enormous impact in this regard. With the DataCube vocabulary effort a first step in this direction was made, but it would be nice if this vocabulary got an official stamp from a standardization organization such as W3C. Since the benefit of publishing statistical and financial data in structured form, e.g. as Linked Data, is most visible when done by many, this could also be facilitated by government regulations and industry best practices.

About INFAI

The Institute for Applied Computer Science (InfAI) at Universität Leipzig hosts research groups in service sciences, knowledge engineering and management as well as natural language processing. The approximately 20 researchers of the Agile Knowledge Engineering and Semantic Web (AKSW) research group at InfAI headed by Dr. Sören Auer are establishing theoretical results and scalable implementations for the field. Particular emphasis is given to areas such as ontology creation and manipulation, knowledge extraction, ontology learning and information & data integration on the Semantic Data Web. The implemented tools and services (such as DBpedia, OntoWiki, DL-Learner and LinkedGeoData) developed by the group enjoy considerable popularity.

About Sören Auer

Dr. Sören Auer leads the research group Agile Knowledge Engineering and Semantic Web (AKSW) at Universität Leipzig. His research interests include semantic data web technologies, knowledge representation, engineering & management, usability, agile methodologies as well as databases and information systems. He aims to combine strong theoretical results with high-impact practical applications. Sören is author of over 50 peer-reviewed scientific publications resulting in a Hirsch index of 15. Sören is leading the large-scale integrated EU-FP7-ICT research project “LOD2 – Creating Knowledge out of Interlinked Data”. Sören is founder (respectively co-founder) of several high-impact research and community projects such as the Wikipedia semantification project DBpedia or the social Semantic Web toolkit OntoWiki. He is co-organiser of several workshops, programme chair of I-Semantics 2008, OKCON 2010, ESWC 2010 and ICWE 2011, area editor of the Semantic Web Journal, serves as an expert for industry, the European Commission, the W3C and is member of the advisory board of the Open Knowledge Foundation.

Thomas Schandl

Linked data based thesaurus management in collaborative settings

The creation and management of controlled vocabularies in companies often takes place in a distributed manner. Different departments in different branch offices often prefer to create their own vocabularies rather than maintain one large central knowledge model to which everyone contributes.

How to model divergent views on one concept?

Such a central model is not only much harder to manage, but there is also the general problem that different departments like marketing, quality assurance, R&D, etc. will have divergent views on the model and its concepts. These different perspectives on one and the same concept are hard to unify in a single model.

Think of a company that sells mobile phones and wants to create a model of its line of products. It wants to utilize this model in the context of its online shop as well as in the context of its user support forum. While the structure of the model (i.e. the relationships between the products) might be very similar or the same in both contexts, there will be differences in which properties of the products are actually relevant in the respective contexts.

In the model of the marketing department there might be a concept for a “Phantastax StamiMaxx” cell phone with a definition “The StamiMaxx has a powerful battery and is great for professionals who travel a lot”. They might relate it to manufacturer “ACME Corporation” and to several concepts representing different features like “Android OS”, “Multi-touch touchscreen”, etc.
The very same phone has different properties that are interesting from the Quality Assurance department’s perspective. They might call it by a more specific name like “Phantastax i3000 StamiMaxx S”, have a different definition for it like “3G cell phone implementing the new WTF3000 protocol, …” and relate it to concepts representing known problems and their solutions.

Now they face the task of integrating these different models, as it is not desirable to use a bunch of isolated models within one company.

Support of collaborative work on distributed models

To support this kind of collaborative work on distributed knowledge models, we would like to link the concepts of the models, just as we link documents in the World Wide Web. Fortunately the Simple Knowledge Organisation System (SKOS) offers mapping properties that can be used to define relationships between concepts from different knowledge models.

E.g. when we want to say that concept “Phantastax StamiMaxx” in the product line thesaurus refers to the same real world entity as concept “Phantastax i3000 StamiMaxx S” in the Quality Assurance thesaurus, then we can use skos:exactMatch to express that. If we want to express that the concepts are merely similar, skos:closeMatch could be used.

The other SKOS mapping properties express a hierarchical (narrowMatch, broadMatch) or an associative (relatedMatch) mapping relation between concepts from different concept schemes. With those we can say that my Samsung Galaxy concept has a skos:broadMatch “Smartphone” in the product line vocabulary and a skos:relatedMatch “ACME Corporation” in a controlled vocabulary about Tech companies.
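
To make the mapping properties concrete, here is a minimal Python sketch that writes such links as N-Triples statements. The SKOS namespace and the five property names come from the SKOS standard; the concept URIs under example.org are invented for illustration, and a real thesaurus system would use its own identifiers.

```python
SKOS = "http://www.w3.org/2004/02/skos/core#"

# The five SKOS mapping properties for linking concepts across concept schemes.
MAPPING_PROPERTIES = {
    "exactMatch",
    "closeMatch",
    "broadMatch",
    "narrowMatch",
    "relatedMatch",
}


def mapping_triple(source_uri, mapping_property, target_uri):
    """Return one N-Triples statement linking two concepts with a SKOS mapping property."""
    if mapping_property not in MAPPING_PROPERTIES:
        raise ValueError(f"not a SKOS mapping property: {mapping_property!r}")
    return f"<{source_uri}> <{SKOS}{mapping_property}> <{target_uri}> ."


# Hypothetical concept URIs for the marketing and Quality Assurance thesauri.
MARKETING_PHONE = "http://example.org/products/PhantastaxStamiMaxx"
QA_PHONE = "http://example.org/qa/Phantastax-i3000-StamiMaxx-S"
SMARTPHONE = "http://example.org/products/Smartphone"

triples = [
    mapping_triple(MARKETING_PHONE, "exactMatch", QA_PHONE),
    mapping_triple(QA_PHONE, "broadMatch", SMARTPHONE),
]
print("\n".join(triples))
```

Because each statement lives outside both thesauri, either department can keep editing its own model while the links are maintained separately.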

Modularisation of knowledge models

In this way SKOS thesaurus management systems like PoolParty make it possible to modularise knowledge models, represent concepts in their different contexts and consequently enable collaborative work on those models: The marketing guy can work on his model with the concept properties focused on sales without disrupting the work of the quality assurance expert on her own thesaurus. Later one or both of them can create the skos:exactMatch link between the concepts that are the same, as seen in the “Exact Matching Concepts” box in the screenshot of PoolParty below.

Enrich your knowledge: Get connected with the LOD Cloud

Going a step further the models could be connected to external knowledge, e.g. a source from the Linked Open Data (LOD) Cloud. Once we establish links to LOD hubs like DBpedia, we can import additional information for their concepts or use it to establish whether similar concepts from different models really refer to the same real world resource.

Andreas Blumauer

Florian Bauer: I like to view “linked data” as a “single worldwide API”

Florian Bauer is REEEP’s Operations and IT Director, responsible for the overall operational management of the organisation, the product management of reegle (the search engine for renewable energy and energy efficiency) and the management of the IT landscape of REEEP.

The PoolParty team had the chance to talk with Florian about reegle, the information gateway on clean energy.

Could you please give us a brief overview of reegle? What goals are you pursuing with this platform?

The main aim of the reegle information gateway (http://www.reegle.info) is to provide a one-stop gateway to comprehensive, high-quality and up-to-date information on clean energy. By making this information accessible to stakeholders in the field around the world, and by presenting it in a user-friendly and intuitive format, reegle directly helps to facilitate the transition to low-carbon energy.

The website provides information on renewable energy, energy efficiency and climate change and their various sub-sectors at a global level, and some reegle services actually combine raw data sets from several different sources, put these datasets into context and thus provide enriched information.

reegle is an offshoot of the Renewable Energy & Energy Efficiency Partnership (REEEP), a non-profit, specialist change agent aiming to catalyze the market for renewable energy and energy efficiency, with a primary focus on emerging markets and developing countries.

The new reegle data portal (data.reegle.info), launched in 2011, has established reegle as a publisher and consumer of Linked Open Data in the energy sector. It provides key clean energy datasets free for re-use using Linked Open Data W3C standards.

reegle consists of two components: one is the semantic search engine (http://www.reegle.info/), the other is the linked data portal (http://data.reegle.info/) – What are your target groups, and which typical problems of the clean energy domain can you solve with these services?

For reegle.info, our target groups are primarily project developers, financiers and government policy-makers. These users can access high-quality information on clean energy-related issues with the set of tools we provide: a special web search, a catalogue of more than 1700 key stakeholders, a map view for geographical browsing, a clean energy glossary, and an energy country profiles function.

The energy country profiles are typical of what we’re trying to achieve. Here, we take information from many different providers and combine it all to present one comprehensive information dossier on renewable energy and energy efficiency in that particular country. This means that in one location you have the country’s most important energy-related information ranging from key statistics, and current regulations to key players in the energy field in both public and private sectors.

For our data portal, the target group is a more technical one: primarily IT developers and open data specialists who want to create new mash-ups and integrate data from reegle into other websites. One of the first using these reegle data sets is the OpenEI.org website, another key portal in the energy field.

Open data is not the same as linked open data. Why did you choose to build your services around W3C’s linked data paradigm and/or standards like RDF?

Tim Berners-Lee once mentioned that he likes to compare the progressive ways of offering data with the “stars system” used to rate hotels. You get:

* for making data public (in any format)
** for machine-readable formats (structured data)
*** if the data is offered in a non-proprietary format
**** if you use URIs to identify things, so people can point to your datasets
***** for linking to other people’s data to provide context

So, as you can imagine, our goal is for reegle to be firmly in the 5-star category, and to establish reegle as an avant-garde tool in energy data.
I also like to view “linked data” as a “single worldwide API”. If the old web was like a huge book, the new semantic web is like a huge database, and SPARQL is the way to ask for information – by sending a query through the SPARQL Endpoint. RDF is the language that offers all possibilities to describe a given dataset with all of the necessary information, including any links to other datasets. Therefore RDF data and SPARQL endpoints provide a powerful tool to find and filter datasets and are crucial, basic parts of the semantic web’s architectural layers. On reegle, the SPARQL endpoint and the description of the structure of our RDF files are online on our clean energy open data portal.
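
To give a feel for what "sending a query through the SPARQL endpoint" means in practice, here is a small Python sketch that encodes a query into a GET URL using the standard `query` parameter of the SPARQL protocol. The endpoint address and the query itself are placeholders, not reegle's real endpoint or data model, and nothing is sent over the network.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Placeholder endpoint; substitute the address published on the data portal.
ENDPOINT = "http://example.org/sparql"

# Generic query: list ten things together with their labels.
QUERY = """
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?thing ?label WHERE {
  ?thing rdfs:label ?label .
} LIMIT 10
"""


def sparql_get_url(endpoint, query):
    """Build the URL for a SPARQL query sent via HTTP GET."""
    return endpoint + "?" + urlencode({"query": query})


url = sparql_get_url(ENDPOINT, QUERY)

# Decoding the URL again recovers the original query text.
decoded_query = parse_qs(urlsplit(url).query)["query"][0]
```

Any HTTP client can fetch such a URL, typically with an Accept header naming the desired result format, which is why the same mechanism works as an API across datasets.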

You also decided to build a SKOS-based domain thesaurus for clean energy, which now plays an important role in improving the search experience at reegle.
What experience have you gained so far from this effort? Which obstacles did you have to overcome?

The SKOS-based renewable energy thesaurus can be seen as the “heart” of reegle as it provides the basis for a lot of related services in reegle, including the refinement suggestions for search results, the auto-completion options and the glossary links between defined terms and their synonyms and related terms.

We decided to use SKOS because we think it is the best language for building a formal and controlled vocabulary for thesauri in a semantic web context, without adding too much complexity. Although it is a simple language, you really still need IT experts to use it to build a thesaurus – domain experts with additional IT skills (hard to find!).

So in our case, we decided to use a scalable and easy-to-use thesaurus server called “PoolParty”. Using this system drastically reduced the complexity, and allowed us to concentrate on the actual building of the thesaurus with our domain experts, and to spend less time on transferring the knowledge into data sets.

What are your future plans with reegle?

Currently we’re working on restructuring the site to better highlight our new added-value services such as the clean energy country profiles. We are also planning to further develop our thesaurus to include climate-compatible development terms and we’ll soon release a WordPress plug-in to insert this thesaurus into clean energy blogs. One of the most exciting projects we are currently working on is the development of “dossier pages”, where we will provide relevant information on several topics mashed up on one page using semantic web technologies. This is part of the EU-funded SCMS (“semantic content management system”) project.