Thomas Thurner

The hype, the hope and the LOD2: Sören Auer engaged in the next generation of LOD

The pan-European project LOD2 is one of the biggest projects dealing with linked data. Scientists, programmers and software architects in various European countries are working on the next generation of linked open data. In a series of interviews I’m presenting people working on and with LOD2. As a start, I had the chance to talk to Sören Auer, head of the LOD2 project.

Thomas Thurner: In recent years the LOD movement has gained tremendous momentum. As one of the key players in this area, how do you perceive this development? Hype or hope?

Sören Auer: From my point of view, the momentum LOD has gained is deserved. We should strive for a Web that is more decentralized, democratic, participatory, transparent and inclusive, and I see Linked Open Data as a key technological building block on this road. However, a lot of work lies ahead of us. LOD has to find its way directly into mainstream technology such as CMSs, search engines, Web applications and mash-ups, and we have to show users and stakeholders the direct added value of this technology.

Thomas Thurner: What is the current state of the LOD cloud from a technological point of view? Where do you see room for improvement?

Sören Auer: Currently, the technological state of LOD seems comparable to the early days of the Web. We are still able to draw maps/clouds of all LOD datasets, and data links are still sparse and difficult to maintain. This is reminiscent of the early Web, where we also had problems with broken links (the infamous 404). Later, once content management systems and Web applications automated link generation and maintenance, the situation improved a lot, and I hope we are on the same road, with LOD technologies finding their way into more and more Web systems.

Thomas Thurner: How is the LOD2 project addressing these issues? What are the project’s key objectives?

Sören Auer: LOD2 addresses these issues in three ways. First, we develop new research approaches highly relevant for LOD, for example for Linked Data management, automatic data linking, and Linked Data enrichment and quality improvement. Second, we implement and integrate these approaches into specialized tools (e.g. SILK, OntoWiki, Virtuoso and DL-Learner) that together form the integrated LOD2 stack. The LOD2 stack can be used by data publishers for the whole life-cycle of Linked Data management, ranging from extraction through linking, authoring and enrichment to exploration & search.

Thomas Thurner: What do you think are the most important factors to bring LOD to the masses?

Sören Auer: From my point of view, the key factor here is that we manage to integrate the large number of tools and approaches supporting the stages of the Linked Data life-cycle in a synergistic way, where each aspect adds value and triggers a number of other improvements. For example, establishing a new data link has a direct effect on search and exploration of Linked Data. We have to show these kinds of benefits to users directly, so that they receive instant gratification for their contributions to the Web of Data. Semantic wikis such as Semantic MediaWiki and OntoWiki are already working nicely in this direction. An application with enormous potential to bring LOD to the masses would be a distributed, social semantic network. With OpenID, WebID, FOAF and Semantic Pingback, most of the building blocks are available, but the final step of integrating them into an easy-to-use social networking application still has to be taken.

Thomas Thurner: Compared to other semantic web approaches, linked data principles seem rather easy to understand. On the other hand, some argue that the “linked data cloud” is a big heap of data that cannot be used for professional purposes. What is your point of view?

Sören Auer: Of course, the currently available data is not useful for all potential usage scenarios. However, Linked Data can already be used for many interesting applications. For example, we just completed the development of a prototype for a large search engine, in which users are assisted during their searches with comprehensive background information obtained from the Linked Data Web. For this use case, the information available as Linked Data is already very valuable and useful. The criticism of LOD being a “heap of data” also reminds me a lot of the early days of the Web, when people similarly criticized the Web as a medium of unprofessionalism. Later it turned out that, of course, there is a lot of amateurism, but, as Wikipedia impressively demonstrates, many amateurs working together with the right tools can in the end outperform a few professionals.

Thomas Thurner: Linked Data could also become a new paradigm for light-weight enterprise data integration. What are the biggest obstacles today to linked data being accepted by the business community?

Sören Auer: Using Linked Data for data integration in large enterprises has enormous potential. Just last week I was invited to a workshop with the IT department of one of the top car makers, and the people responsible for data integration there were extremely excited about the opportunities Linked Data offers in a large, heterogeneous enterprise with more than 3,000 different backend systems. Linked Data technologies can easily fill the gap between unstructured intranet search and expensive, complicated Service-oriented Architectures. Compared to SOA, Linked Data is a pay-as-you-go strategy, where data integration can be performed incrementally and in sync with the requirements and the evolution of the data structures in the enterprise. To realize this vision, we need to continue the maturation of enterprise Linked Data tools; the availability of PoolParty, Sindice Enterprise Edition, Virtuoso and TopBraid is already an important step in that direction.

Thomas Thurner: Automatic mechanisms for curating linked data and for aligning datasets play a crucial role in the next phase of linked data economics. Which technologies will play a central role? What will be the most critical point? Do you see a “wisdom of the crowd” playing a role in this game?

Sören Auer: Definitely! Tapping the wisdom of the crowd for mapping & linking has huge potential, which is currently unused. We started working in that direction with DBpedia Live and the DBpedia mapping wiki. To make it really easy for people to contribute, we have to dramatically lower the barrier to taking part in the alignment process. In LOD2 we also plan to enable users to create mappings and links between datasets simply by giving examples of correct links and evaluating some automatically generated ones.
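The idea of deriving links from a handful of user-supplied examples can be illustrated with a deliberately simple Python sketch: candidate links are scored by label similarity (using the standard library's `difflib.SequenceMatcher`), a single acceptance threshold is learned from examples that users marked as correct or incorrect, and the resulting candidates are offered back to users for confirmation. This is a generic, hypothetical illustration, not the approach actually planned in LOD2; the function names and sample data are assumptions, and link discovery tools such as SILK typically combine several similarity measures rather than one.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Case-insensitive string similarity in [0, 1] between two labels."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def learn_threshold(examples):
    """Choose the threshold that classifies the most labelled examples correctly.

    examples: (label_a, label_b, is_correct_link) tuples supplied by users.
    Every observed similarity score is tried as a candidate threshold.
    """
    scored = [(similarity(a, b), ok) for a, b, ok in examples]
    best_threshold, best_hits = 1.0, -1
    for threshold, _ in scored:
        hits = sum((score >= threshold) == ok for score, ok in scored)
        if hits > best_hits:
            best_threshold, best_hits = threshold, hits
    return best_threshold


def propose_links(source, target, threshold):
    """Suggest links between two {uri: label} dicts, best candidates first.

    The suggestions are meant to be shown to users for confirmation or rejection;
    their verdicts can be fed back into learn_threshold as new examples.
    """
    candidates = [
        (s_uri, t_uri, similarity(s_label, t_label))
        for s_uri, s_label in source.items()
        for t_uri, t_label in target.items()
    ]
    return sorted((c for c in candidates if c[2] >= threshold),
                  key=lambda c: c[2], reverse=True)


if __name__ == "__main__":
    examples = [
        ("Leipzig", "Leipzig", True),
        ("Berlin", "Berlin, Germany", True),
        ("Leipzig", "Lisbon", False),
    ]
    t = learn_threshold(examples)
    print(propose_links({"ex:Leipzig": "Leipzig"},
                        {"dbr:Leipzig": "Leipzig", "dbr:Lisbon": "Lisbon"}, t))
```

Learning only a threshold keeps the example readable; the feedback loop (propose, let users judge, re-learn) is the part that mirrors the interview.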

Thomas Thurner: At the moment governments all around the world are starting to publish open data, and more and more stakeholders are beginning to understand the benefits of open linked data. Enterprises, on the other hand, haven’t even started with this topic. What dynamics could trigger projects that make use of open data principles in industry sectors such as the financial industry?

Sören Auer: Making statistical and financial information available in structured form and as Linked Data could have an enormous impact in this regard. The DataCube vocabulary effort was a first step in this direction, but it would be good if this vocabulary received the official stamp of a standardization organization such as the W3C. Since the benefit of publishing statistical and financial data in structured form, e.g. as Linked Data, is most visible when many publishers do it, this could also be facilitated by government regulations and industry best practices.

About InfAI

The Institute for Applied Computer Science (InfAI) at Universität Leipzig hosts research groups in service sciences, knowledge engineering and management, and natural language processing. The approximately 20 researchers of the Agile Knowledge Engineering and Semantic Web (AKSW) research group at InfAI, headed by Dr. Sören Auer, are establishing theoretical results and scalable implementations for the field. Particular emphasis is given to areas such as ontology creation and manipulation, knowledge extraction, ontology learning, and information & data integration on the Semantic Data Web. The tools and services developed by the group (such as DBpedia, OntoWiki, DL-Learner and LinkedGeoData) enjoy considerable popularity.

About Sören Auer

Dr. Sören Auer leads the research group Agile Knowledge Engineering and Semantic Web (AKSW) at Universität Leipzig. His research interests include semantic data web technologies, knowledge representation, engineering & management, usability, agile methodologies as well as databases and information systems. He aims to combine strong theoretical results with high-impact practical applications. Sören is the author of over 50 peer-reviewed scientific publications, with a Hirsch index of 15. He leads the large-scale integrated EU-FP7-ICT research project “LOD2 – Creating Knowledge out of Interlinked Data”. He is the founder or co-founder of several high-impact research and community projects, such as the Wikipedia semantification project DBpedia and the social Semantic Web toolkit OntoWiki. He is co-organiser of several workshops, programme chair of I-Semantics 2008, OKCON 2010, ESWC 2010 and ICWE 2011, and area editor of the Semantic Web Journal. He also serves as an expert for industry, the European Commission and the W3C, and is a member of the advisory board of the Open Knowledge Foundation.

Thomas Schandl

Using Triplify to expose the semantics of a site

Recently the SWC took a thorough look at Triplify, a tool for mapping the contents of a relational DB to RDF, and in the course of this we convinced ourselves of Triplify’s ease of use and its powerful capabilities.
We take this opportunity to give an account of the philosophy behind Triplify and how it is used; we also had the chance to interview its creator, Sören Auer.


A common objection from critics of the semantic web is that regular users or webmasters won’t go to the trouble of marking up their content or whole web sites with RDF.
While it is obvious that nobody is going to decorate their web pages with hand-carved RDF triples, it is also apparent that many of the current web’s pages are generated by transforming information from relational databases into HTML pages, which are perfectly suited for human consumption but lose most of the machine-readable semantics of the underlying data.

As the information in the relational databases is highly structured and contains rich semantics, it is only natural to also use the already existing structured data to generate RDF representations of the same information.

Triplify is all about this approach of bootstrapping data for the semantic web. It targets web applications built on PHP and MySQL.
Triplify consists of a lightweight PHP script and a configuration file. The configuration file maps the columns of an application’s relational database to appropriate RDF classes and properties.

In many cases, a site administrator who wants to export her site’s content as RDF only has to save Triplify, together with a premade configuration file for her site’s application, into the right folder, since for many popular applications such as WordPress, Joomla! and phpBB all the work has already been done.
Once installed, Triplify can be used to generate a dump of the site’s complete RDF graph, or to publish Linked Data, since the RDF graph of each of the site’s main concepts is provided under its own URL: e.g. the semantic description of the user with the ID 123 can be accessed at http://yoursite.com/triplify/user/123.
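The URL scheme just described can be sketched as a small resolver that decides whether a request asks for the full dump, for all resources of one concept, or for a single resource. This is a hypothetical Python illustration of the routing idea only; `resolve`, its return convention, the `concepts` parameter and the handling of URLs without an ID are assumptions, not Triplify's actual PHP behaviour.

```python
from urllib.parse import urlparse


def resolve(url, concepts, base_path="/triplify/"):
    """Map a request URL to (concept, resource_id).

    Hypothetical convention:
      .../triplify/          -> (None, None)       complete RDF dump
      .../triplify/user      -> ("user", None)     all resources of one concept
      .../triplify/user/123  -> ("user", "123")    a single resource (Linked Data)
    """
    # Normalise to exactly one trailing slash so ".../triplify" and ".../triplify/" match.
    path = urlparse(url).path.rstrip("/") + "/"
    if not path.startswith(base_path):
        raise ValueError(f"URL is not under {base_path}: {url}")
    parts = [p for p in path[len(base_path):].split("/") if p]
    if not parts:
        return None, None
    concept = parts[0]
    if concept not in concepts:
        raise KeyError(f"no mapping configured for concept {concept!r}")
    resource_id = parts[1] if len(parts) > 1 else None
    return concept, resource_id


if __name__ == "__main__":
    configured = {"user", "post"}  # hypothetical set of mapped concepts
    print(resolve("http://yoursite.com/triplify/user/123", configured))  # ('user', '123')
```

Giving every resource its own dereferenceable URL is what turns a plain RDF export into Linked Data: other sites can link to `.../triplify/user/123` directly.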

If no configuration exists for an application, it is fairly easy to create one yourself.
All one has to do is look at the app’s database schema, find appropriate classes and properties from well-known ontologies, and write MySQL queries that fetch the data from the relational database and map it to those RDF classes or properties.
An example of a query that takes the data from a table describing the users of a CMS:
"SELECT id, name AS 'foaf:name', url AS 'foaf:homepage', short_description AS 'dc:abstract' FROM user_table",

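To make the mapping idea concrete, here is a minimal Python sketch that runs the query above against an in-memory SQLite table (standing in for MySQL) and turns each row into N-Triples. It is not Triplify's PHP implementation: the function names, the rule that the first column supplies the ID for the subject URI, the choice of which properties take URIs rather than literals, the namespace chosen for `dc:` and the sample row are all assumptions for illustration.

```python
import sqlite3

# Namespaces for the prefixes used as column aliases. "dc:abstract" is a DCMI
# terms property, so "dc" is mapped to the dcterms namespace here.
PREFIXES = {
    "foaf": "http://xmlns.com/foaf/0.1/",
    "dc": "http://purl.org/dc/terms/",
}
# Assumption: these properties take a resource (URI) rather than a literal.
URI_PROPERTIES = {"foaf:homepage"}
BASE = "http://yoursite.com/triplify/"


def expand(qname):
    """Turn a prefixed name such as 'foaf:name' into a full property URI."""
    prefix, local = qname.split(":", 1)
    return PREFIXES[prefix] + local


def literal(value):
    """Serialise a value as an N-Triples literal, escaping backslashes, quotes and newlines."""
    text = str(value).replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return f'"{text}"'


def rows_to_ntriples(conn, concept, query):
    """Run a Triplify-style query and emit one N-Triples line per non-NULL cell.

    Assumes the first column is the row ID used to mint the subject URI and every
    other column is aliased to a prefixed property name.
    """
    cursor = conn.execute(query)
    properties = [col[0] for col in cursor.description[1:]]
    triples = []
    for row_id, *values in cursor:
        subject = f"<{BASE}{concept}/{row_id}>"
        for prop, value in zip(properties, values):
            if value is None:
                continue  # NULL columns produce no triple
            obj = f"<{value}>" if prop in URI_PROPERTIES else literal(value)
            triples.append(f"{subject} <{expand(prop)}> {obj} .")
    return triples


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE user_table (id INTEGER, name TEXT, url TEXT, short_description TEXT)")
    conn.execute("INSERT INTO user_table VALUES (123, 'Alice', 'http://alice.example.org/', 'Site editor')")
    query = ("SELECT id, name AS 'foaf:name', url AS 'foaf:homepage', "
             "short_description AS 'dc:abstract' FROM user_table")
    for line in rows_to_ntriples(conn, "user", query):
        print(line)
```

The column aliases carry the whole mapping: changing `AS 'foaf:name'` to another property changes the generated triples without touching any code.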
Triplify’s creator, Sören Auer, kindly gave us the opportunity for an interview:

Triplify is very easy to configure for web developers. For which scenarios would you recommend Triplify, and in which situations might other approaches to semantifying your data be more suitable?

As you already mentioned, Triplify was primarily developed for Web applications written in PHP. These usually have a relatively small and simple set of tables. Triplify creates complete RDF exports, Linked Data or JSON, but does not include SPARQL endpoint functionality. When SPARQL is required, you are better off with D2R Server or Virtuoso’s RDF Views.

Triplify creates semantic representations of the data in relational databases. Do you think there would also be a benefit in the inverse approach, i.e. creating an application that parses triples and writes them to a relational DB according to a mapping file?

In certain scenarios this might make sense, but in most cases I think the database schema has to be developed separately. Database schemata contain more storage- and retrieval-oriented information, for example about data indexing. Vocabularies and ontologies, on the other hand, represent information at a conceptually higher level and are more flexible than databases with regard to the evolution of information structures.

Are there plans for further development of Triplify?

Sure. We want to add SPARQL support and possibly port Triplify to other scripting languages such as Ruby and Python.

Thank you, Sören. We will stay tuned for news about your great application and look forward to the Triplification Challenge 2009!

Thomas Schandl

OntoWiki Workshop

Days 3 and 4 of the OntoWiki KickOff Meeting in Leipzig consisted of workshops on semantic technologies and OntoWiki development.

Just as the overall organization of the project meeting was very good, Sebastian Dietzold, Sebastian Hellmann, Michael Martin and Jörg Unbehauen did a really good job of getting the ideas behind key concepts of the semantic web across in several introductory SemWeb presentations. Their talks about various technologies from the semantic web stack, such as URIs, RDF and its serialisations, RDFS, SPARQL and some related tools, were well suited to bringing people who are relatively new to the semantic web up to speed. Links to the presentation slides will be available on the project page in the coming days.

Later, Jens Lehmann outlined what is new in OWL 2, e.g. profiles (OWL 2 EL, QL and RL), which are subsets of OWL 2 that offer different trade-offs between expressivity and reasoning efficiency.

The last day started with Sören Auer’s presentation of his group’s semantic wiki OpenResearch, a site that pools information on conferences, journals and scientists. OpenResearch is built with Semantic MediaWiki (SMW), just like our Social Semantic Web wiki.

While SMW is a very useful tool, as it lowers the entry barriers for using semantic wikis, Sören also pointed out that OntoWiki provides some important features that SMW lacks:

  • SMW doesn’t use SPARQL for its queries, but a less powerful custom query language, whereas OntoWiki has full SPARQL support.
  • OntoWiki’s UI has many widgets that support the user when entering data or new properties on a page (e.g. an autocomplete feature that suggests properties).
  • With SMW, changes to the wiki’s semantic structure often entail manual changes to many, many pages. With OntoWiki it is easy to, e.g., change properties at any time.

For the new version of OntoWiki, Sören and his team use the Zend framework and are developing the Erfurt API to store and access RDF data. The Erfurt API supports SPARQL, versioning, caching and RDF-based authentication/access control. It abstracts different stores using the adapter pattern, so it can be used with Virtuoso and with any other store for which Zend_Db provides an interface (MySQL, Oracle, PostgreSQL, etc.); the team is also working on an interface for Redland. Slides are available for Philipp Frischmuth’s Erfurt API presentation and for Norman Heino’s Zend & OntoWiki Application Framework presentation, as is the API documentation.
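The adapter pattern mentioned above can be illustrated with a short Python sketch (Erfurt itself is written in PHP on top of the Zend framework). The application talks to one abstract interface, and each backend supplies its own adapter; here an in-memory set and SQLite stand in for real stores such as Virtuoso or a Zend_Db database. All class and method names are hypothetical and do not reflect Erfurt's actual API.

```python
import sqlite3
from abc import ABC, abstractmethod


class StoreAdapter(ABC):
    """Interface the application codes against (hypothetical; not Erfurt's API)."""

    @abstractmethod
    def add_triple(self, s, p, o):
        """Store one triple; adding a duplicate has no effect."""

    @abstractmethod
    def match(self, s=None, p=None, o=None):
        """Return matching triples as a sorted list; None acts as a wildcard."""


class InMemoryAdapter(StoreAdapter):
    """Keeps triples in a Python set."""

    def __init__(self):
        self._triples = set()

    def add_triple(self, s, p, o):
        self._triples.add((s, p, o))

    def match(self, s=None, p=None, o=None):
        pattern = (s, p, o)
        return sorted(t for t in self._triples
                      if all(q is None or q == v for q, v in zip(pattern, t)))


class SqliteAdapter(StoreAdapter):
    """Stores triples in one relational table, standing in for an SQL backend."""

    def __init__(self, conn):
        self._conn = conn
        conn.execute("CREATE TABLE IF NOT EXISTS triples "
                     "(s TEXT, p TEXT, o TEXT, UNIQUE (s, p, o))")

    def add_triple(self, s, p, o):
        self._conn.execute("INSERT OR IGNORE INTO triples VALUES (?, ?, ?)", (s, p, o))

    def match(self, s=None, p=None, o=None):
        clauses, params = [], []
        for column, value in (("s", s), ("p", p), ("o", o)):
            if value is not None:
                clauses.append(f"{column} = ?")
                params.append(value)
        where = " WHERE " + " AND ".join(clauses) if clauses else ""
        rows = self._conn.execute(f"SELECT s, p, o FROM triples{where} ORDER BY s, p, o", params)
        return [tuple(row) for row in rows]


def labels(store):
    """Application code: unchanged whichever adapter is plugged in."""
    return [o for _, _, o in store.match(p="rdfs:label")]


if __name__ == "__main__":
    for store in (InMemoryAdapter(), SqliteAdapter(sqlite3.connect(":memory:"))):
        store.add_triple("ex:a", "rdfs:label", "A")
        store.add_triple("ex:b", "rdfs:label", "B")
        print(type(store).__name__, labels(store))
```

Adding support for another store then means writing one more adapter class; `labels` and any other application code stay untouched.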

Julian Jöris demonstrated how Selenium is used for acceptance testing. Selenium is a very promising testing framework for web applications, with which one can, e.g., record interactions with different browsers and run them automatically as tests. It has a Firefox extension for recording macros and is integrated with PHPUnit.

Finally, we had a very good discussion about our conX-OntoWiki integration use case and application ideas, so we left Leipzig looking forward to the coming co-operation in the project.

Andreas Blumauer

The Semantic Web becomes mainstream, again.

The roll-out of semantic web technologies seems to be entering the next stage, and it will be a quiet (r)evolution, as the open source movement was. Two examples: next year’s JAX in Mainz, Germany, will have its first Semantic Web track. The organisers say that “the Semantic Web is going to conquer the business market soon”; we will see whether it turns out to be that martial.

Another example: t3n, one of the biggest open source magazines in Germany, has recently published a new issue with many stories about the Semantic Web. Editor-in-chief Jan Christe says: “We have constantly stumbled upon semantic web related stuff when we scanned the news, so we decided to set a focus on this topic.”

The Semantic Web is tangible now. Christe says: “Applications like OpenCalais, Zemanta or Tagaroo show the end-users what’s really in it for them.” It is also nice to see that the semantic web is no longer reduced to “search”: t3n’s new issue also has interesting articles about Linked Data, for instance Sören Auer’s “How to develop Semantic Web Applications”.

So, in conclusion: Paul Miller’s wait for the “Semantic Web in Business” (a great blog post!) is over. It won’t be found in heavy books, but rather in the open source community and sometimes in light-weight magazines.

Yes, we can!