Thomas Thurner

The hype, the hope and the LOD2: Sören Auer engaged in the next generation LOD

The pan-European project LOD2 is one of the biggest projects dealing with linked data. Scientists, programmers and software architects in various European countries are working on the next generation of linked open data. In a series of interviews I’m presenting people working on and with LOD2. As a start, I had the chance to talk to Sören Auer, head of the LOD2 project.

Thomas Thurner: Over recent years the LOD movement has gained tremendous momentum. As one of the key players in this area, how do you perceive this development? Hype or hope?

Sören Auer: From my point of view the momentum LOD gained is deserved. We should strive for a Web that is more decentralized, democratic, participatory, transparent and inclusive. Linked Open Data is, from my point of view, a key technological building block on this road. However, a lot of work is ahead of us. LOD has to find its way directly into mainstream technology such as CMSes, search engines, Web applications and mash-ups, and we have to show users and stakeholders the direct added value of this technology.

Thomas Thurner: What is the current state of the LOD cloud from a technological point of view? Where do you see room for improvement?

Sören Auer: Currently, the technological state of LOD seems to be comparable to the early days of the Web. We are still able to draw maps/clouds of the LOD datasets, and data links are still sparse and difficult to maintain. This reminds me a lot of the early days of the Web, where we also had problems with broken links (the infamous 404). Later, after content management systems and Web applications automated link generation and maintenance, this improved a lot, and I hope we are on the same road with LOD technologies finding their way into more and more Web systems.

Thomas Thurner: How is the LOD2 project addressing these issues? What are the project’s key objectives?

Sören Auer: LOD2 is addressing these issues in three ways: First, we develop new research approaches highly relevant for LOD, for example, for Linked Data management, automatic data linking as well as Linked Data enrichment and quality improvement. Second, we implement and integrate these approaches into specialized tools (e.g. Silk, OntoWiki, Virtuoso and DL-Learner), which together form the integrated LOD2 stack. The LOD2 stack can be used by data publishers for the whole life-cycle of Linked Data management, ranging from extraction through linking, authoring and enrichment to exploration & search.

Thomas Thurner: What do you think are the most important factors to bring LOD to the masses?

Sören Auer: From my point of view the key factor here is that we manage to integrate the large number of tools and approaches for supporting the Linked Data life-cycle stages in a synergistic way, where each aspect adds value and triggers a number of other improvements. For example, establishing a new data link has a direct effect on search & exploration of Linked Data. We have to directly show these kinds of benefits to users so they receive an instant gratification for contributions to the Web of Data. Semantic wikis, such as Semantic MediaWiki and OntoWiki, are already working nicely in this direction. An application with enormous potential to bring LOD to the masses would be a distributed, social semantic network. With OpenID, WebID, FOAF and Semantic Pingback most of the building blocks are available, but the final step of integrating these into an easy-to-use social networking application still has to be done.

Thomas Thurner: Compared to other semantic web approaches linked data principles seem to be rather easy to understand. On the other hand some argue that the “linked data cloud” is a big heap of data which cannot be used for professional purposes. What is your point of view?

Sören Auer: Of course the currently available data is not useful for all potential usage scenarios. However, Linked Data can already be used for many interesting applications: For example, we just completed the development of a prototype for a large search engine, where users are assisted in their searches with comprehensive background information obtained from the Linked Data Web. For this use case, information available as Linked Data is already very valuable and useful. The criticism of LOD being a “heap of data” also reminds me a lot of the early days of the Web, when people raised similar criticisms of the Web being a medium of unprofessionalism. Later it turned out that, of course, there is a lot of amateurism, but as Wikipedia impressively demonstrates, many amateurs working together with the right tools can in the end outperform a few professionals.

Thomas Thurner: Linked Data could also become a new paradigm for light-weight enterprise data integration. What are the biggest obstacles today to linked data being accepted by the business community?

Sören Auer: Using Linked Data for data integration in large enterprises has an enormous potential. Just last week I was invited to a workshop with the IT department of one of the top car makers, and the people responsible there for data integration were extremely excited about the opportunities of Linked Data in a large heterogeneous enterprise with more than 3000 different backend systems. Linked Data technologies can easily fill the gap between unstructured intranet search and expensive & complicated service-oriented architectures. Compared to SOA, Linked Data is a pay-as-you-go strategy, where data integration can be performed incrementally and in sync with the requirements and evolution of the data structures in the enterprise. In order to realize this vision, we need to continue the maturation of enterprise Linked Data tools – the availability of PoolParty, Sindice Enterprise Edition, Virtuoso and TopBraid is already an important step in that direction.

Thomas Thurner: Automatic mechanisms to curate linked data and to make alignments between datasets possible play a crucial role for the next phase of linked data economics. Which technologies will play a central role? What will be the most critical point – do you see a “wisdom of the crowd” playing a role in this game?

Sören Auer: Definitely! Tapping the wisdom of the crowd for mapping & linking has a huge potential, which is currently unused. We started working in that direction with DBpedia Live and the DBpedia mapping wiki. In order to make it really easy for people to contribute, we have to dramatically lower the barrier to contributing to the alignment process. In LOD2 we also plan to enable users to create mappings and links between datasets by simply giving examples of correct links and evaluating some automatically generated ones.

Thomas Thurner: At the moment governments all around the world are starting to publish open data, and more and more stakeholders are starting to understand the benefit of open linked data. On the other hand, enterprises haven’t even started with this topic. What could be the dynamics that will trigger projects making use of open data principles in industry sectors such as the financial industry?

Sören Auer: Making statistical and financial information available in structured form and as Linked Data could have an enormous impact in this regard. With the DataCube vocabulary effort a first step in this direction was made, but it would be nice if this vocabulary got an official stamp from a standardization organization such as W3C. Since the benefit of publishing statistical and financial data in structured form, e.g. as Linked Data, is most visible when done by many, this could also be facilitated by government regulations and industry best practices.

About INFAI

The Institute for Applied Computer Science (InfAI) at Universität Leipzig hosts research groups in service sciences, knowledge engineering and management as well as natural language processing. The approximately 20 researchers of the Agile Knowledge Engineering and Semantic Web (AKSW) research group at InfAI headed by Dr. Sören Auer are establishing theoretical results and scalable implementations for the field. Particular emphasis is given to areas such as ontology creation and manipulation, knowledge extraction, ontology learning and information & data integration on the Semantic Data Web. The implemented tools and services (such as DBpedia, OntoWiki, DL-Learner and LinkedGeoData) developed by the group enjoy considerable popularity.

About Sören Auer

Dr. Sören Auer leads the research group Agile Knowledge Engineering and Semantic Web (AKSW) at Universität Leipzig. His research interests include semantic data web technologies, knowledge representation, engineering & management, usability, agile methodologies as well as databases and information systems. He aims to combine strong theoretical results with high-impact practical applications. Sören is author of over 50 peer-reviewed scientific publications resulting in a Hirsch index of 15. Sören is leading the large-scale integrated EU-FP7-ICT research project “LOD2 – Creating Knowledge out of Interlinked Data”. Sören is founder or co-founder of several high-impact research and community projects such as the Wikipedia semantification project DBpedia and the social Semantic Web toolkit OntoWiki. He is co-organiser of several workshops, programme chair of I-Semantics 2008, OKCON 2010, ESWC 2010 and ICWE 2011, area editor of the Semantic Web Journal, serves as an expert for industry, the European Commission and the W3C, and is a member of the advisory board of the Open Knowledge Foundation.

Thomas Thurner

Paneuropean Open Government Data Survey – join now!

The LOD2 project is currently circulating a survey aimed at people interested in open government data. If you are interested in government information (whether as a publisher, producer, reuser or consumer), the LOD2 team would be very grateful for 10-15 minutes of your time to let them know what you would like to see from the technology developed by LOD2.


You can find the survey at survey.lod2.eu
The survey will be open until the 17th December 2010.

Any help in forwarding this to relevant colleagues, suggestions for people this should be sent to, and any blogging/tweeting would be very much appreciated, to make sure as many potentially interested people as possible have the opportunity to respond! If you have any questions or issues about the survey please don’t hesitate to contact Martin Kaltenböck <m.kaltenboeck –at– semantic-web.at> or Thomas Thurner <t.thurner–at– semantic-web.at>.

Tassilo Pellegrini

LOD2 Kick Off Meeting in Leipzig

From September 6 – 8, 2010 we kicked off the LOD2 project in Leipzig / Germany. LOD2 is funded by the European Commission within the 7th Framework Programme (Grant Agreement No. 257943) and consists of 10 partners from 7 countries. Its main aim is to integrate and syndicate linked data with large-scale, existing applications and showcase the benefits in three application scenarios: 1) Media & Publishing, 2) Enterprise Data Management and 3) Open Government Data. The resulting tools, methods and data sets have the potential to change the Web as we know it today. (You can download the project flyer here.)

The first day was dedicated to the general introduction of the project partners which are Universität Leipzig (Germany), Centrum Wiskunde & Informatica (Netherlands), National University of Ireland in Galway (Ireland), Freie Universität Berlin (Germany), OpenLink Software (United Kingdom), Semantic Web Company (Austria), TenForce (Belgium), Exalead (France), Wolters Kluwer Deutschland (Germany) and Open Knowledge Foundation (United Kingdom). Below you see a picture of the kick off team.

During the morning of the second day a first introduction to the technical components took place. The picture below shows an abstraction of the LOD2 high level architecture.

Orri Erling and Hugh Williams from OpenLink introduced Virtuoso, which will be used as one of the storage technologies in the LOD2 stack. The second knowledge store technology will be MonetDB introduced by Peter Boncz from CWI. Both systems will also be used as a kind of benchmark laboratory for hosting and querying linked data.

Christian Bizer from FU Berlin talked about Silk and D2R. In combination they will be used to discover relationships and similarities between entities within different linked data sources – generally called identity resolution.
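To give a rough idea of what identity resolution involves, the sketch below compares entity labels from two toy datasets and emits `owl:sameAs` link candidates above a similarity threshold. It does not use Silk or D2R; the example URIs, the labels, the helper name `find_same_as_links` and the 0.85 threshold are assumptions made up for illustration.

```python
from difflib import SequenceMatcher

OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"


def normalize(label):
    # Lower-case and collapse whitespace so trivial differences do not matter.
    return " ".join(label.lower().split())


def label_similarity(a, b):
    # Character-based similarity in [0, 1] from the standard library.
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def find_same_as_links(source, target, threshold=0.85):
    """Return (source_uri, owl:sameAs, target_uri) triples for best matches.

    `source` and `target` map entity URIs to labels. The threshold is an
    arbitrary, hypothetical value; real tools combine several comparisons.
    """
    links = []
    for s_uri, s_label in source.items():
        best_uri, best_score = None, 0.0
        for t_uri, t_label in target.items():
            score = label_similarity(s_label, t_label)
            if score > best_score:
                best_uri, best_score = t_uri, score
        if best_uri is not None and best_score >= threshold:
            links.append((s_uri, OWL_SAME_AS, best_uri))
    return links


# Toy data: both URIs and labels are invented for this example.
dataset_a = {
    "http://example.org/a/leipzig": "Leipzig",
    "http://example.org/a/galway": "Galway ",
}
dataset_b = {
    "http://example.org/b/city/1": "leipzig",
    "http://example.org/b/city/2": "Berlin",
}

links = find_same_as_links(dataset_a, dataset_b)
for triple in links:
    print(triple)
```

Only the two Leipzig entries are linked here; the Galway entry has no sufficiently similar counterpart and is left alone.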

Giovanni Tummarello from DERI introduced Sindice and Sig.ma, focusing on how to update, validate and reuse data that is available on the web and how to support the production of professional, collaboratively governed linked data, especially for enterprise use. Besides that, an important aspect will be how to handle the large amounts of generated data. According to Giovanni, scaling the infrastructure and using appropriate hardware will be central to bringing the Sindice index into enterprise stacks, e.g. as an approach for lightweight data consolidation.

Norman Heino from AKSW, University of Leipzig, introduced OntoWiki and Semantic Pingback. OntoWiki will be used at the interface layer for producing, annotating, browsing and querying linked data and presenting it to the end user in various GUIs. Semantic Pingback’s aim is to interlink the Web 2.0 with the Semantic Web by backwards compatible RPCs (remote procedure calls). It detects new typed or untyped external links, manages the GET and POST commands and takes care of server autodiscovery.
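Server autodiscovery can be pictured roughly as follows. This is a minimal sketch that assumes the conventions of the original Pingback mechanism (an `X-Pingback` HTTP header or a `<link rel="pingback">` element), with which Semantic Pingback is described above as being backwards compatible. The function name `discover_pingback_endpoint` and the example URLs are hypothetical, and the sketch works on already fetched headers and HTML rather than making network requests.

```python
from html.parser import HTMLParser


class PingbackLinkParser(HTMLParser):
    """Collects the href of the first <link rel="pingback"> element."""

    def __init__(self):
        super().__init__()
        self.endpoint = None

    def handle_starttag(self, tag, attrs):
        if tag != "link" or self.endpoint is not None:
            return
        attr = dict(attrs)
        # rel may hold several space-separated values, or be missing entirely.
        rels = (attr.get("rel") or "").lower().split()
        if "pingback" in rels and attr.get("href"):
            self.endpoint = attr["href"]


def discover_pingback_endpoint(headers, html):
    """Return the pingback endpoint announced by a resource, or None.

    `headers` is a dict of HTTP response headers and `html` the response
    body; the HTTP header takes precedence over the HTML link element.
    """
    for name, value in headers.items():
        if name.lower() == "x-pingback":
            return value
    parser = PingbackLinkParser()
    parser.feed(html)
    return parser.endpoint


# Hypothetical target page that announces its endpoint in the HTML head.
page = (
    '<html><head><link rel="pingback" '
    'href="http://example.org/pingback"></head><body></body></html>'
)
print(discover_pingback_endpoint({}, page))
```

A linking application would call something like this for each external link it detects and then notify the discovered endpoint about the new link.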

Andreas Blumauer from Semantic Web Company demonstrated PoolParty as a smart editor for metadata in enterprise stacks. Like OntoWiki, PoolParty also addresses the interface level of LOD2, especially when it comes to generating, editing and linking metadata to documents, primarily based on SKOS. PoolParty deliberately uses thesauri as a mapping layer to discover similarities between documents, generate tag recommendations for their annotation and publish the vocabularies used as Linked Data.

In the afternoon we continued with individual breakout sessions to discuss work package interdependencies and start profiling the use cases and requirements engineering in more detail.

The third day started with an introduction by Stefano Bertolo – the responsible scientific project officer from the EC side for the LOD2 project – who pointed out that the LOD2 project is an important one for the European Web of Data and that the EC is, among other things, especially interested in the Open Government Data use case of LOD2.

After this introduction, the three use cases were presented: A) Jonathan Gray (OKFN) spoke about the Open Gov Data use case, followed by B) Amar-Djalil MEZAOUR (Exalead) on the Linked Business Data use case and C) Christian Dirschl (Wolters Kluwer) on the LOD in the publishing & media industry use case.

Central to the success of LOD2 will be a smart handling of all the integration issues which will come up in the course of the project. Here TenForce, an integration specialist from Belgium, will have the lead. CEO Bastiaan Deblieck gave a detailed outlook on the methodologies and presented a comprehensive overview of how the integration issues will be approached from a Scrum perspective.

After a presentation about LOD2 project dissemination, training and community building activities by Martin Kaltenböck (Semantic Web Company) there were several discussions going on until the successful kick off meeting was closed by project lead Sören Auer (Universität Leipzig) at 4:00 pm on 8 September 2010.

Updated news information can be accessed on the LOD2 project website as well as on the LOD2project twitter stream (and on twitter using #lod2)…

Stay tuned!

Andreas Blumauer

Why SKOS thesauri matter – the next generation of semantic technologies

As a matter of fact, there are still a lot of “semantic technologies” around which do nothing more than pure statistical analysis of text. Sure, this is better than simple full text search, but there are still quite a lot of opportunities to improve search, especially when it comes to more sophisticated applications like “similarity search”: the search for similar documents to enable cross-reading or recommendation systems.

Providers of first generation semantic technologies calculate rather basic “semantic networks” by co-occurrence analysis, which sometimes yields disappointing results. Bearing in mind that Google just bought a company (“Google buys Metaweb”) which has been working on one of the largest knowledge bases in the world, we could assume that some of the last miles towards a semantic search engine can be achieved by applying thesauri or other structured knowledge bases.

A demo application was recently developed by the PoolParty team where one can find out how thesauri improve search results on top of second generation semantic technologies. With PoolParty, SKOS based controlled vocabularies can be managed and also enriched with linked data. The PoolParty Tag & Content Recommender analyzes virtually any text or website to recommend corresponding tags, concepts from (in this case) STW (Standard Thesaurus für Wirtschaft) and DBpedia, and respective articles from Wikipedia.

STW, which was developed by the German National Library of Economics (ZBW), provides vocabulary on any economic subject: about 6,000 standardized subject headings and about 18,000 entry terms to support individual keywords.

This background knowledge is used in this demo app to improve the search for similar documents dramatically:

Similarity between two documents can be calculated not only on a key-phrase basis but also on a rather conceptual basis. Even if two documents do not have one single word or phrase in common they can be identified as “similar documents”.

This can be achieved because thousands of important relations between economic subjects are represented in the domain-specific thesaurus. Thus, in this special case the best results are achieved with documents from economics (for instance from Econstor), but of course for other recommender systems thesauri from other domains can be used instead of STW.
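The idea of concept-based similarity can be sketched in a few lines. The following is a toy illustration, not the PoolParty implementation: the mini-thesaurus, the `ex:` concept identifiers and the helper names are invented for this example, and simple substring matching plus Jaccard overlap stand in for whatever extraction and scoring the demo app actually uses.

```python
# Hypothetical mini-thesaurus: entry terms (in SKOS, preferred and
# alternative labels) mapped to invented concept identifiers.
ENTRY_TERMS = {
    "inflation": "ex:Inflation",
    "price increase": "ex:Inflation",
    "central bank": "ex:CentralBank",
    "monetary authority": "ex:CentralBank",
    "interest rate": "ex:InterestRate",
}

# Hypothetical broader relations between concepts (skos:broader in SKOS).
BROADER = {
    "ex:Inflation": "ex:MonetaryPolicy",
    "ex:CentralBank": "ex:MonetaryPolicy",
    "ex:InterestRate": "ex:MonetaryPolicy",
}


def extract_concepts(text):
    # Naive substring matching of entry terms, then add the broader concept.
    text = " ".join(text.lower().split())
    concepts = set()
    for term, concept in ENTRY_TERMS.items():
        if term in text:
            concepts.add(concept)
            broader = BROADER.get(concept)
            if broader:
                concepts.add(broader)
    return concepts


def jaccard(a, b):
    # Overlap of two sets, defined as 0.0 when both are empty.
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


doc1 = "Inflation worries the central bank"
doc2 = "Monetary authority reacts to price increase"

# Keyword view: compare the plain word sets of both documents.
keyword_similarity = jaccard(set(doc1.lower().split()), set(doc2.lower().split()))

# Conceptual view: compare the thesaurus concepts found in both documents.
concept_similarity = jaccard(extract_concepts(doc1), extract_concepts(doc2))

print(keyword_similarity, concept_similarity)
```

The two sentences share no word at all, yet they map to the same concepts in the toy thesaurus, so only the conceptual comparison identifies them as similar.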

Nevertheless, this approach can also be improved, and this development is underway: SKOS thesauri enriched with Linked Data do an even better job. Third generation semantic technologies of this kind are currently being developed by the LASSO project and the LOD2 project, two innovative projects in the area of linked data and the semantic web.