Semantic Web Company

The Semantic Puzzle

Open World Assumptions


Using Triplify to expose the semantics of a site

April 20, 2009 By: Thomas Schandl Category: Calls & Competitions, Linked Data & Open Data

Recently the SWC took a thorough look at Triplify, a tool for mapping the contents of a relational DB to RDF, and came away convinced of its ease of use and its potent capabilities.
We take this opportunity to give an account of the philosophy behind Triplify and of how it is used. We also had the chance to interview its creator, Sören Auer.


A common objection from critics of the semantic web is that regular users or webmasters won’t go to the trouble of marking up their content or whole web sites with RDF.
While it is obvious that nobody is going to decorate their web pages with hand-carved RDF triples, it is also apparent that a lot of the current web’s pages are generated by transforming information from relational databases to HTML pages, which are perfectly suited for human consumption, but which suffer from a big loss of machine-readable semantics.

As the information in the relational databases is highly structured and contains rich semantics, it is only natural to also use the already existing structured data to generate RDF representations of the same information.

Triplify is all about this approach of bootstrapping data for the semantic web. It does this for web applications which are built on PHP and MySQL.
Triplify consists of a lightweight PHP script and a configuration file. The latter is used to do the mapping of the columns of an application’s relational database to appropriate RDF classes and properties.

In many cases a site administrator who wants to export her site’s content as RDF only has to save Triplify, together with a premade configuration file for her site’s application, into the right folder: for many popular applications like Wordpress, Joomla! or phpBB all the work has already been done.
Once installed, Triplify can be used to generate a dump of the site’s complete RDF graph, or to generate Linked Data, since the RDF graph of each of the site’s main concepts is provided under its own URL. For example, the semantic description of a user with the ID 123 can be accessed under http://yoursite.com/triplify/user/123.

If no configuration for an application exists, it is fairly easy to create one yourself.
All one has to do is look at the app’s database schema, find appropriate classes and properties from well-known ontologies, and create MySQL queries that grab the data from the relational database and map it to RDF classes or properties.
An example of a query that takes the data from a table describing the users of a CMS:
"SELECT id, name AS 'foaf:name', url AS 'foaf:homepage', short_description AS 'dc:abstract' FROM user_table",
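To make the column-alias convention more concrete, here is a minimal sketch in Python of how such a query result could be turned into triples. It is not Triplify's actual code (Triplify is a PHP script). It assumes that the first column identifies the resource and that every other column alias names the RDF property. The in-memory SQLite table, the sample row and the `rows_to_triples` helper are hypothetical stand-ins for a real MySQL-backed application.

```python
import sqlite3

# Base URI following the URL pattern mentioned in the text (illustrative only).
BASE = "http://yoursite.com/triplify/user/"

# The query from the article, unchanged.
QUERY = ("SELECT id, name AS 'foaf:name', url AS 'foaf:homepage', "
         "short_description AS 'dc:abstract' FROM user_table")


def rows_to_triples(conn, query, base):
    """Turn each result row into (subject, predicate, object) tuples.

    Assumption: the first column identifies the resource, and each
    remaining column alias is the RDF property for that column's value.
    """
    cursor = conn.execute(query)
    columns = [description[0] for description in cursor.description]
    triples = []
    for row in cursor:
        subject = f"{base}{row[0]}"
        for predicate, value in zip(columns[1:], row[1:]):
            if value is not None:  # empty cells produce no triple
                triples.append((subject, predicate, value))
    return triples


def demo():
    """Build a tiny hypothetical user table and map it to triples."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE user_table "
        "(id INTEGER, name TEXT, url TEXT, short_description TEXT)"
    )
    conn.execute(
        "INSERT INTO user_table VALUES "
        "(123, 'Alice', 'http://example.org/alice', NULL)"
    )
    return rows_to_triples(conn, QUERY, BASE)


if __name__ == "__main__":
    for triple in demo():
        print(triple)
```

The point of the sketch is that the mapping lives entirely in the SQL aliases, so no separate mapping language is needed.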

Triplify’s creator Sören Auer kindly gave us the opportunity for an interview:

Triplify is very easy to configure for web developers. For which scenarios would you recommend to use Triplify, and in which situations other approaches of semantifying your data might be more suitable?

As you already mentioned, Triplify was primarily developed for Web applications written in PHP. These usually have a relatively small and simple set of tables. Triplify creates complete RDF exports, Linked Data or JSON, but does not include SPARQL endpoint functionality. When SPARQL is required you are better off with D2R Server or Virtuoso’s RDF views.

Triplify creates semantic representations of the data in relational databases. Do you think there would also be a benefit in the inverse approach, i.e. creating an application that parses triples and writes them to a relational DB according to a mapping file?

In certain scenarios this might make sense, but in most cases I think the database schema has to be developed separately. Database schemata contain more storage- and retrieval-oriented information, for example about data indexing. Vocabularies and ontologies, on the other hand, represent information on a conceptually higher level and are more flexible than databases with regard to the evolution of the information structures.

Are there plans for further development of Triplify?

Sure. We want to add SPARQL support and possibly port Triplify to other scripting languages such as Ruby and Python.

Thank you Sören, we will stay tuned for news about your great application and look forward to the Triplification Challenge 2009!


OntoWiki Workshop

December 09, 2008 By: Thomas Schandl Category: Software Development, Tools & Software

Days 3 and 4 of the OntoWiki KickOff Meeting in Leipzig consisted of workshops on semantic technologies and OntoWiki development.

The overall organization of the project meeting was very good, and Sebastian Dietzold, Sebastian Hellmann, Michael Martin and Jörg Unbehauen likewise did a really good job of putting across the ideas behind key concepts of the semantic web in several introductory SemWeb presentations. Their talks about various technologies from the semantic web stack, such as URIs, RDF and its serialisations, RDFS, SPARQL and some related tools, were well suited to bringing people who are relatively new to the semantic web up to speed. Links to the presentation slides will be available on the project page in the coming days.

Later Jens Lehmann outlined what is new in OWL 2, e.g. profiles, which are subsets of OWL 2 that provide different degrees of expressivity and reasoning efficiency.

The last day started with Sören Auer’s presentation of their semantic wiki OpenResearch, a site where information on conferences, journals and scientists is pooled. OpenResearch is built with Semantic MediaWiki (SMW), just like our Social Semantic Web wiki.

While SMW is a very useful tool, as it lowers the entry barriers for using semantic wikis, Sören also pointed out that in comparison OntoWiki provides some important features that SMW doesn’t have:

  • SMW doesn’t use SPARQL for its queries, but a less powerful custom query language, whereas OntoWiki has full SPARQL support.
  • OntoWiki’s UI has many widgets that support the user when entering data or new properties on a page (e.g. there is an autocomplete feature for suggesting properties).
  • With SMW, changes to the wiki’s semantic structure often entail manual changes to many, many pages. With OntoWiki it is easy to change properties, for example, at any time.

For the new version of OntoWiki Sören and his team use the Zend framework and are developing the Erfurt API to store and access RDF data. The Erfurt API supports SPARQL, versioning, caching and RDF-based authentication/access control. It abstracts different stores using the adapter pattern, so it can be used with Virtuoso and with any other store that has an interface provided by Zend_Db (MySQL, Oracle, PostgreSQL, etc.); the team is also working on an interface for Redland. Find the slides for Philipp Frischmuth’s Erfurt API presentation here, the API documentation here and Norman Heino’s Zend & OntoWiki Application Framework presentation here.

Julian Jöris demonstrated how Selenium is used for acceptance testing. This is a very promising testing framework for web applications, where one can e.g. record interactions with different browsers and automatically run them as tests. Selenium has a Firefox extension to record macros and is integrated with PHPUnit.

Finally we had a very good discussion about our conX-OntoWiki integration use case and application ideas, so we left Leipzig with a pleasant anticipation of the coming co-operation in the project.


The Semantic Web becomes mainstream, again.

December 05, 2008 By: Andreas Blumauer Category: Enterprise 2.0, Literature & Publications

The roll-out of semantic web technologies seems to be entering the next stage. And it will be a quiet (r)evolution, like the open source movement was. Two examples: Next year’s JAX in Mainz/Germany will have its first Semantic Web track. Organisers say that “the Semantic Web is going to conquer the business market soon” – we will see if it will be that martial.

Another example: One of the biggest Open Source magazines in Germany, t3n, has recently published a new issue with many stories around the Semantic Web. Editor in chief Jan Christe says: “We have constantly stumbled upon semantic web related stuff when we scanned the news, so we decided to set a focus on this topic.”

The Semantic Web is tangible now – Christe says: “Applications like OpenCalais, Zemanta or Tagaroo show the end-users what’s really in it for them.” And it is also nice to see that the semantic web won’t be reduced to “search” anymore: t3n’s new issue also has interesting articles about Linked Data, for instance Sören Auer’s “How to develop Semantic Web Applications”.

So, as a conclusion: Paul Miller’s waiting for the “Semantic Web in Business” (a great blog post!) has come to an end. It won’t be found in heavy books, but rather in the open source community and sometimes in light-weight magazines.

Yes, we can!


DBpedia, UMBEL & the Future Web’s Ecology – interview with Mike Bergman & Sören Auer

November 10, 2008 By: Andreas Blumauer Category: Linked Data & Open Data, Mashups & Web services, Ontology Engineering

The Linked Open Data infrastructure is in a tremendous process of maturing – the recent release of UMBEL’s webservice AND the incorporation of UMBEL classes in DBpedia are yet another confirmation of this exciting process. Knowing and having met DBpedia co-initiator, Triplify main developer and head of the AKSW research group Sören Auer and UMBEL editor and Zitgist CEO Mike Bergman in various contexts, I felt it was time to talk to and pick the brains of both these key players in a dialog situation. The (first) result is the interview you can find below. As not everyone can be expected to be familiar with both projects, here is some background to get you started (you can also go directly to the interview):


DBpedia has become the largest RDF repository for encyclopaedic knowledge, extracting structured information from Wikipedia and making it available on the Web of Data. UMBEL, on the other hand, provides an OpenCyc-based, light-weight ontology structure for relating Web content and data to a standard set of subject concepts, of which there are currently about 20,000. In the Linked Data Cloud, DBpedia and UMBEL map and cross-reference each other.

In practice this means that UMBEL provides classes to describe the concepts of which “things” are members. For instance, named entities from Wikipedia such as “John F. Kennedy” are mapped to subject concepts such as Leader, Person, Administrator and Graduate, with broader and equivalent classes in CYC and FOAF and broader subject concepts within UMBEL. A link is set to Wikipedia, as well as a ‘same as’ reference to DBpedia. A class structure enables faceted browsing and extraction, inferencing, and navigation and discovery for all datasets linked to that structure.

DBpedia, in turn, returns properties of ‘John F. Kennedy’ (e.g. abstracts in available Wikipedia languages, demographic information such as birth date and place, alma mater, predecessors and successors), and ‘same as’ references, e.g., to the JFK entry in Freebase (which recently released its RDF service) and the aforementioned page in UMBEL. Furthermore, DBpedia maps the URI to available RDF types, for instance foaf:Person or yago:AssassinatedAmericanPoliticians and, once again, to UMBEL’s subject concepts Person, Administrator, Graduate and Leader.

Due to its reliance on Wikipedia, DBpedia does a great job of covering a range of knowledge as broad as the interests of the people participating in Wikipedia. Its strength lies in the area of named entities, i.e. entities such as persons, organizations and locations, which have a proper name but are not necessarily part of a particular, acknowledged domain or discipline. UMBEL’s most apparent advantage, on the other hand, is its reliance on OpenCyc, which brings the strong inferencing and logic capabilities of the CYC knowledge base to the Web of Data. DBpedia is a community project started by the University of Leipzig, Free University Berlin and OpenLink Software, while the open and free UMBEL is developed and hosted by Zitgist with support from, again, OpenLink Software.

Now, and in particular with the recent release of Zitgist’s web service endpoints and with the incorporation of UMBEL classes in DBpedia, questions arise as to the relationship between the two projects, and regarding the role of OpenLink Software in the further process. To draw a distinction:

One could say that DBpedia’s goal is to lower the barrier for web developers and end-users in the actual use of the semantic web, while UMBEL aims at bringing “order to the chaos” that is inherent to user-generated, collective knowledge.

Would you agree with this description – and is it a contradiction at all or the kind of dynamic the Semantic Web community has been waiting for?

Mike Bergman: Yes, I would agree with this description, though we have tried many others. For example, in various writings in the past, we have described UMBEL as a roadmap, or middleware, or a backbone, or a concept ontology, or an ‘infocline’, or a meta layer for metadata, and others. Today, what I tend to use, particularly in reference to DBpedia, is the TBox-ABox distinction in computer science and description logics. UMBEL is more of a class or structural and concept relationships schema (a TBox), while DBpedia is more of an instance and entity layer with attributes (an ABox). I think they are pretty complementary…
(more…)


Bringing (Legacy) Data to the Web [WOD-PD]

October 22, 2008 By: Jana Herwig Category: Conferences & Events, Linked Data & Open Data

The third session at WOD-PD was dedicated to “Bringing (Legacy) Data on the Web”, and led by Sören Auer (University of Leipzig, Germany) and Orri Erling (OpenLink Software).

Sören Auer described the difference between the Web 1.0, 2.0 and 3.0 as follows: On the Web 1.0, you had many websites that provided unstructured, mainly textual content. On the Web 2.0, you have a few large websites that are specialised in specific content types. And, finally, on the Web 3.0, there are many websites which contain, and are able to semantically syndicate, arbitrarily structured content.

So why would we need another web? What you cannot do with the current web is find answers to seemingly complex, yet in reality pretty mundane questions such as: Where in Leipzig do I find an apartment that is close to bilingual, German-French child care facilities? Are there any ERP service providers which have offices in Vienna and Berlin? Who are the researchers in South-East Asia currently working on database related topics?

Sören further discussed three of the present means of bringing relational data to the web: Triplify (a web application plugin that exposes data from relational databases in RDF), D2RQ (a declarative language to describe mappings between relational database schemata and OWL/RDFS ontologies, developed at Free University Berlin), and Virtuoso Universal Server (a middleware and database engine hybrid delivering, for instance, data integration for SQL, RDF, XML and Web Services). With respect to Triplify, Sören – who is Triplify’s founder and main developer at AKSW Uni Leipzig – showed and discussed the configuration for Wordpress 2.1, which can be found here (click here for more configurations, e.g. for Joomla, OpenConf and Drupal). The next aim for Triplify is to become an integral part of end-user web app distributions.

An important question raised by Sören was: How do next generation search engines know that something has changed on the web of data? He suggested three approaches:

  1. Always try to crawl everything (this may sound silly – but that’s actually what is happening on the current web)
  2. Ping a central update notification service – e.g. PingTheSemanticWeb.com – which works as a showcase, but will probably not scale once the data web is widely deployed.
  3. Each linked data endpoint publishes an update log – e.g. with Triplify, as a special folder inside the Triplify namespace, e.g. http://example.com/Triplify/update
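The third approach can be sketched roughly as follows, from the point of view of a crawler. The log format used here (pairs of resource URI and change timestamp) is purely an assumption for illustration and is not Triplify's actual update log format. The `changed_since` helper and the example URIs are hypothetical as well.

```python
from datetime import datetime


def changed_since(update_log, last_crawl):
    """Return the URIs changed after last_crawl, oldest change first.

    update_log is an iterable of (uri, changed_at) pairs; a URI may
    appear several times, in which case only its latest change counts.
    """
    latest = {}
    for uri, changed_at in update_log:
        if changed_at > last_crawl:
            latest[uri] = max(changed_at, latest.get(uri, changed_at))
    return sorted(latest, key=lambda uri: latest[uri])


def demo():
    """Run the helper on a small, made-up update log."""
    log = [
        ("http://example.com/triplify/user/123", datetime(2008, 10, 20)),
        ("http://example.com/triplify/post/7", datetime(2008, 10, 22, 9)),
        ("http://example.com/triplify/user/123", datetime(2008, 10, 22, 12)),
        ("http://example.com/triplify/post/3", datetime(2008, 10, 1)),
    ]
    return changed_since(log, last_crawl=datetime(2008, 10, 21))


if __name__ == "__main__":
    for uri in demo():
        print(uri)
```

With such a log a crawler only has to re-fetch the resources returned by the helper, instead of crawling the whole site again as in the first approach.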

Also discussed by Sören and worth checking out is Reuters’ Semantic proxy – the demo went live in late September.

Orri Erling, the lead developer of the Virtuoso team, addressed the issue of mapping relational databases to RDF with OpenLink Virtuoso. In his talk, he discussed the pros and cons of an RDF data warehouse:

Pros

  • Even query performance across all data
  • Possibility of forward-chaining inference
  • Some SPARQL features may be better supported, e.g. unspecified predicates

Cons

  • Keeping data up-to-date
  • Complex set up, needs dedicated servers: you don’t build them on a whim

What Virtuoso delivers is a mapping of SPARQL to SQL against any existing schema (whether stored in Virtuoso or elsewhere); a physical quad store (quad as in quadruple, not as in quad-bike); and federated/local Relational Database Management Systems (RDBMS).

A more detailed discussion of the requirements for Relational-to-RDF Mapping is available on Orri’s blog, where he discusses it in the light of his own experience. A PowerPoint presentation of a previous talk he gave to the W3C RDB2RDF Incubator Group can be downloaded here: Mapping Relational Databases to RDF with OpenLink Virtuoso (PPT, 115KB). His summary of the group discussions around the same topic, Requirements for Relational to RDF Mapping, can be found here.

Orri also showed the Virtuoso billion triples demo which, according to the corresponding blogpost, “is being worked on at the time of submission and may be shown online by appointment.” The demo was a submission to the Billion Triples Challenge.
