The Semantic Puzzle

Jana Herwig

Bringing (Legacy) Data to the Web [WOD-PD]

The third session at WOD-PD was dedicated to “Bringing (Legacy) Data on the Web” and was led by Sören Auer (University of Leipzig, Germany) and Orri Erling (OpenLink Software).

Sören Auer described the difference between Web 1.0, 2.0 and 3.0 as follows: On Web 1.0, you had many websites that provided unstructured, mainly textual content. On Web 2.0, you have a few large websites that specialise in specific content types. And, finally, on Web 3.0, there are many websites which contain, and are able to semantically syndicate, arbitrarily structured content.

So why would we need another web? What you cannot do with the current web is find answers to seemingly complex, yet in reality pretty mundane, questions such as: Where in Leipzig do I find an apartment that is close to bilingual, German-French child care facilities? Are there any ERP service providers which have offices in Vienna and Berlin? Who are the researchers in South-East Asia currently working on database-related topics?

Sören then discussed three current means of bringing relational data to the web: Triplify (a web application plugin that exposes data from relational databases in RDF), D2RQ (a declarative language for describing mappings between relational database schemata and OWL/RDFS ontologies, developed at Free University Berlin), and Virtuoso Universal Server (a middleware and database engine hybrid that delivers, for instance, data integration for SQL, RDF, XML and Web Services). With respect to Triplify, Sören, who is Triplify’s founder and main developer at AKSW Uni Leipzig, showed and discussed the configuration for WordPress 2.1, which can be found here (click here for more configurations, e.g. for Joomla, OpenConf and Drupal). The next aim for Triplify is to become an integral part of end-user web app distributions.
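To make the idea of exposing relational rows as RDF more concrete, here is a minimal sketch in Python. It is not Triplify's actual configuration or output format. The `posts` table, the base URI and the vocabulary namespace are hypothetical, and every non-key column is simply emitted as a plain string literal in N-Triples syntax.

```python
import sqlite3

BASE = "http://example.com/triplify/"  # hypothetical base URI for resources
VOCAB = "http://example.com/vocab/"    # hypothetical vocabulary namespace


def escape_literal(value):
    """Escape a value for use inside an N-Triples string literal."""
    text = str(value)
    return (text.replace("\\", "\\\\")
                .replace('"', '\\"')
                .replace("\n", "\\n")
                .replace("\r", "\\r"))


def rows_to_ntriples(conn, table, id_column):
    """Yield one N-Triples line per non-NULL, non-key column of each row.

    The table name is interpolated into the SQL, so it must come from
    trusted configuration, never from user input.
    """
    cursor = conn.execute(f"SELECT * FROM {table}")
    columns = [description[0] for description in cursor.description]
    for row in cursor:
        record = dict(zip(columns, row))
        subject = f"<{BASE}{table}/{record[id_column]}>"
        for column, value in record.items():
            if column == id_column or value is None:
                continue
            yield f'{subject} <{VOCAB}{column}> "{escape_literal(value)}" .'


def demo():
    """Build a tiny in-memory database and return its triples as a list."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE posts (id INTEGER PRIMARY KEY, title TEXT, author TEXT)"
    )
    conn.execute("INSERT INTO posts VALUES (1, 'Hello \"web\"', 'jana')")
    return list(rows_to_ntriples(conn, "posts", "id"))


if __name__ == "__main__":
    for line in demo():
        print(line)
```

Each row becomes a resource whose URI is derived from the table name and primary key, and each column becomes a property of that resource. A real mapping would also need typed literals and links between resources, both of which this sketch leaves out.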

An important question raised by Sören was: How do next generation search engines know that something has changed on the web of data? He suggested three approaches:

  1. Always try to crawl everything (this may sound silly – but that’s actually what is happening on the current web)
  2. Ping a central update notification service – e.g. PingTheSemanticWeb.com – which works as a showcase, but will probably not scale if the data web really gets deployed.
  3. Each linked data endpoint publishes an update log – e.g. with Triplify, as a special folder inside the Triplify namespace, e.g. http://example.com/Triplify/update
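The third approach can be sketched from the consumer's side. The post does not describe the format of the update log, so the layout below (one line per change, an ISO 8601 timestamp and a resource URI separated by a tab) is purely an assumption for illustration, as are the function names.

```python
from datetime import datetime, timezone


def parse_update_log(text):
    """Parse an assumed log format: '<ISO timestamp>\\t<resource URI>' per line."""
    entries = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        stamp, uri = line.split("\t", 1)
        entries.append((datetime.fromisoformat(stamp), uri))
    return entries


def changed_since(entries, last_crawl):
    """Return each URI changed after last_crawl once, oldest change first."""
    seen = set()
    result = []
    for stamp, uri in sorted(entries):
        if stamp > last_crawl and uri not in seen:
            seen.add(uri)
            result.append(uri)
    return result


if __name__ == "__main__":
    log = (
        "2008-10-01T10:00:00+00:00\thttp://example.com/a\n"
        "2008-10-02T10:00:00+00:00\thttp://example.com/b\n"
    )
    last_crawl = datetime(2008, 10, 1, 12, 0, tzinfo=timezone.utc)
    print(changed_since(parse_update_log(log), last_crawl))
```

With such a log, a crawler only has to re-fetch the resources listed after its previous visit instead of crawling the whole endpoint again.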

Also discussed by Sören and worth checking out is Reuters’ Semantic proxy – the demo went live in late September.

Orri Erling, lead developer of the Virtuoso team, addressed the issue of mapping relational databases to RDF with OpenLink Virtuoso. In his talk, he weighed the pros and cons of an RDF data warehouse:

Pros

  • Even query performance across all data
  • Possibility of forward-chaining inference
  • Some SPARQL features may be better supported, e.g. unspecified predicates

Cons

  • Keeping data up-to-date
  • Complex set up, needs dedicated servers: you don’t build them on a whim

What Virtuoso delivers is a mapping of SPARQL to SQL against any existing schema (whether stored in Virtuoso or elsewhere); a physical quad store (quad as in quadruple, not as in quad bike); and federated/local relational database management systems (RDBMS).
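The general idea of answering a triple pattern from an existing relational schema can be illustrated with a toy translator. This is not Virtuoso's mapping language or query engine. The `offices` table, the `MAPPING` dictionary and the URIs are hypothetical, and only a single pattern with a fixed predicate is handled.

```python
import sqlite3

# Hypothetical mapping: predicate URI -> (table, key column, value column).
MAPPING = {
    "http://example.com/vocab/city": ("offices", "id", "city"),
}
SUBJECT_PREFIX = "http://example.com/offices/"  # hypothetical subject URI prefix


def pattern_to_sql(predicate, obj=None):
    """Translate the pattern (?s, predicate, obj or ?o) into SQL plus parameters."""
    table, key, column = MAPPING[predicate]
    sql = f"SELECT {key}, {column} FROM {table}"
    params = ()
    if obj is not None:
        sql += f" WHERE {column} = ?"
        params = (obj,)
    sql += f" ORDER BY {key}"
    return sql, params


def match(conn, predicate, obj=None):
    """Run the translated query and return the rows as (s, p, o) triples."""
    sql, params = pattern_to_sql(predicate, obj)
    return [
        (f"{SUBJECT_PREFIX}{key}", predicate, value)
        for key, value in conn.execute(sql, params)
    ]


def build_demo_db():
    """Create a small in-memory relational table to query against."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE offices (id INTEGER PRIMARY KEY, city TEXT)")
    conn.executemany(
        "INSERT INTO offices VALUES (?, ?)", [(1, "Vienna"), (2, "Berlin")]
    )
    return conn


if __name__ == "__main__":
    conn = build_demo_db()
    print(match(conn, "http://example.com/vocab/city", "Vienna"))
```

Because the data never leaves the relational tables, there is nothing to keep in sync. The trade-off is that every pattern has to be resolvable through the mapping, which is where a physical quad store can be more flexible.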

A more detailed discussion of the requirements for Relational-to-RDF Mapping is available on Orri’s blog, where he discusses it in the light of his own experience. A PowerPoint presentation from a previous talk he gave to the W3C RDB2RDF Incubator Group can be downloaded here: Mapping Relational Databases to RDF with OpenLink Virtuoso (PPT, 115KB). His summary of the group discussions around the same topic, Requirements for Relational to RDF Mapping, can be found here.

Orri also showed the Virtuoso billion triples demo which, according to the corresponding blog post, “is being worked on at the time of submission and may be shown online by appointment.” The demo was a submission to the Billion Triples Challenge.
