Thomas Schandl

Interview on Enhancing Semantic Web applications with Linguistic Information

John McCrae (Uni Bielefeld), Elena Montiel-Ponsoda (Universidad Politécnica de Madrid) and Tobias Wunner (DERI Galway) will hold a tutorial at ESWC 2011 titled “Enriching the Semantic Web with Linguistic Information”. We had a chance to talk to them beforehand:

Can you please tell us about the aims and purpose of your tutorial and the importance of incorporating linguistic information in the Semantic Web?

With the continuing growth of linked data and semantic technologies, the incorporation of linguistic descriptions into Semantic Web resources has become a challenging issue. The integration of linguistic information, especially at a multilingual level, could greatly benefit Natural Language Processing (NLP) applications. Furthermore, the continuing growth of ontologies for semantic modeling, and the use of terminological resources to add human-language descriptions, have raised the question of how to add linguistic information to ontologies and linked data vocabularies, and how to represent models of lexical and terminological information in a way that is compatible with Semantic Web standards. Prominent examples are the language tags on multilingual labels in RDF Schema and the success of SKOS in bringing terminological information to the Semantic Web.
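To make the idea of language-tagged labels concrete, here is a minimal sketch in Python (standard library only) that serializes a SKOS concept with multilingual skos:prefLabel values as N-Triples. The concept URI, the labels and the helper names are hypothetical; a real application would more likely use an RDF library.

```python
# Minimal sketch: emit N-Triples for a SKOS concept with multilingual labels.
# The concept URI, labels and function names are hypothetical examples.
SKOS_PREF_LABEL = "http://www.w3.org/2004/02/skos/core#prefLabel"


def escape_literal(text: str) -> str:
    """Escape backslashes, quotes and line breaks as required in N-Triples literals."""
    return (text.replace("\\", "\\\\")   # backslash first, so later escapes stay intact
                .replace('"', '\\"')
                .replace("\n", "\\n")
                .replace("\r", "\\r"))


def pref_label_triples(concept_uri: str, labels: dict[str, str]) -> list[str]:
    """Return one N-Triples line per (language tag, label) pair."""
    return [
        f'<{concept_uri}> <{SKOS_PREF_LABEL}> "{escape_literal(label)}"@{lang} .'
        for lang, label in labels.items()
    ]


labels = {"en": "river", "de": "Fluss", "es": "río"}
for line in pref_label_triples("http://example.org/concept/river", labels):
    print(line)
```

Each label keeps its own language tag, so an NLP tool can pick the German or Spanish label for the same concept without any extra lookup.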

In the tutorial we would like to discuss trends and novel models such as lemon, the lexicon model for ontologies, to show possible future directions. The tutorial is targeted at researchers and practitioners who want to learn how to enrich ontologies with linguistic information in one or several natural languages, and at NLP tool developers interested in understanding how Semantic Web resources can be leveraged for NLP. There will be two hands-on sessions in this tutorial.

Why did you choose to use the PoolParty thesaurus management system in your tutorial?

Only a few tools are available for creating terminology models on the web, and they are often very technical and not straightforward for non-experts to use. We found that PoolParty, in contrast to other SKOS editors, has an attractive and usable interface. In addition, the web-based interface was preferable: it did not require the participants to download software, its immediate publishing of linked data is more in keeping with linked data principles, and the tool has similarities to our own tools for working with lemon.

Thank you for this interview!

Thomas Schandl

Drupal and the Semantic Web – Interview with Stéphane Corlosquet

Stéphane Corlosquet has been the main driving force in incorporating Semantic Web capabilities into Drupal. In the recent release of Drupal 7, Semantic Web technologies became part of the core of this popular CMS, which is used to power at least 1% of all the world’s web sites.

Drupal is the leading CMS when it comes to implementing Semantic Web standards. What are the reasons for this, what makes Drupal such a good fit for Semantic Web technologies?

Historically, Drupal has been known for web standards compliance. It supported the RDF-based aggregation format RSS 1.0 as early as 2001, and later moved to RSS 2.0. The Drupal community prides itself on valid HTML, not only in the code generated by Drupal itself, but also by taking the extra step of automatically fixing faulty HTML entered by its users. Drupal has been using XHTML since version 4.0 in 2002. The next logical step beyond XHTML was to add a layer of semantics with RDFa, a W3C Recommendation published in 2008.

There are definitely many reasons that contributed to the addition of RDFa to Drupal 7. The first is the Drupal project lead, Dries Buytaert, who is passionate about the web and open source. Second, the growing Drupal community is very web savvy and includes many experts from different backgrounds such as accessibility, CSS, HTML and security. As a result, every release of Drupal includes many of the latest standards. The community meets twice a year at conferences (DrupalCons), and these events play a great role in hashing out which technologies or designs will be incorporated into the next version of Drupal. Thanks to the flexibility of its internal architecture, Drupal is able to keep up with the latest web standards. Content in Drupal is highly structured, and site administrators get a user interface to build the site structure they want, using entity types, content types, fields and taxonomies for categorization. As for other CMSs, Joomla!’s community appears to be more fragmented, with core software that is not as extensible as Drupal’s, and WordPress is more of a blogging platform, so turning it into a full-blown CMS can be challenging. Both WordPress and Joomla! are in fact adapting the concept of Drupal’s Content Construction Kit (CCK) to their software, but they have not yet reached the same level of maturity as Drupal.

A common objection to the adoption of Semantic Web technologies is that the learning curve is steep and that it is too complicated for many web developers to get into. How can Drupal 7 change that? Which features will it offer to the average website operator?

Semantic Web technologies don’t have to be complicated when applied to simple use cases! We purposely chose only a subset of Semantic Web technologies to integrate into the core of Drupal, keeping the learning curve for Drupal developers and users as low as possible. The main technology is RDFa, which includes the notions of vocabularies (a schema, or collection of attributes) and Compact URIs (CURIEs), which make authoring RDFa easier. In fact, some web developers might have come across these notions before when working with Dublin Core in meta tags such as dc:title or dc:date.
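A CURIE is simply a prefix plus a local name, and a prefix mapping turns it into a full URI. The sketch below illustrates that expansion in Python; it is not Drupal code, and the prefix table is an assumption for illustration (it binds dc to the DCMI Terms namespace, whereas older meta-tag usage often bound it to the Dublin Core elements namespace).

```python
# Minimal sketch of CURIE expansion: prefix + ":" + reference -> absolute URI.
# The prefix table is an illustrative assumption, not Drupal's exact configuration.
PREFIXES = {
    "dc": "http://purl.org/dc/terms/",
    "foaf": "http://xmlns.com/foaf/0.1/",
    "sioc": "http://rdfs.org/sioc/ns#",
}


def expand_curie(curie: str, prefixes: dict[str, str] = PREFIXES) -> str:
    """Expand a CURIE such as 'dc:title' into an absolute URI."""
    prefix, sep, reference = curie.partition(":")
    if not sep or prefix not in prefixes:
        raise ValueError(f"unknown or missing prefix in {curie!r}")
    return prefixes[prefix] + reference


print(expand_curie("dc:title"))  # http://purl.org/dc/terms/title
```

The author writes the short form once per attribute, while any consumer can still resolve it to the same globally unique URI.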

Which benefits will web site owners get when they switch to a semantics enabled Drupal 7?

Google and Bing increasingly rely on machine-readable structured data from the websites that they crawl. The design of Drupal 7 embeds semantic metadata that makes machine-to-machine (M2M) search native to a Drupal 7 website. RDFa can add value by giving search engines more details, such as the latitude and longitude of a venue for display on a map, or dates in ISO format for localization and proper display in search results for different countries.
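As a rough illustration of the markup described above, the following Python sketch renders a hypothetical event as an RDFa 1.0 style fragment carrying an ISO 8601 date and a latitude/longitude pair. It is not the markup Drupal 7 actually emits; the vocabulary choices (DCMI Terms, W3C Basic Geo) and the function name are assumptions.

```python
# Minimal sketch: render an event as an RDFa-annotated XHTML fragment.
# Vocabularies (DCMI Terms, W3C Basic Geo) and the event data are illustrative assumptions.
from datetime import datetime, timezone
from html import escape


def event_rdfa(title: str, start: datetime, lat: float, lon: float) -> str:
    """Return a fragment with machine-readable title, ISO date and position."""
    iso_date = start.isoformat()  # ISO 8601, independent of the reader's locale
    return (
        '<div xmlns:dc="http://purl.org/dc/terms/" '
        'xmlns:geo="http://www.w3.org/2003/01/geo/wgs84_pos#">\n'
        f'  <h2 property="dc:title">{escape(title)}</h2>\n'
        # Human-readable text for people, ISO value in @content for machines
        f'  <span property="dc:date" content="{iso_date}">'
        f'{start.strftime("%d %B %Y")}</span>\n'
        f'  <span property="geo:lat" content="{lat}"></span>\n'
        f'  <span property="geo:long" content="{lon}"></span>\n'
        '</div>'
    )


print(event_rdfa("Example event",
                 datetime(2011, 5, 29, 9, 0, tzinfo=timezone.utc),
                 48.2082, 16.3738))
```

Keeping the ISO value in the content attribute lets a search engine localize the date itself, while visitors still see a formatted date.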

What are your hopes regarding the development of other applications that either provide or consume data from D7 sites? Which improvements of standards, best practices or (lightweight) ontologies in the Semantic Web community would you like to see?

Services like Sig.ma are already able to collect semantic data from different sources and display it in new ways in the form of mash-ups. Eventually, these services that consume semantic data will not be Drupal-specific, as more platforms jump on the Semantic Web bandwagon. What I hope to see as improvements or best practices in the future are more well-maintained vocabularies. Many of the existing vocabularies are over-engineered, and some fail to dereference properly. There is also some work to be done to improve the tooling available to web developers, and to introduce the simple concepts of Linked Data to web developers via easy-to-read documentation.

Thank you for this interview, Stéphane!

Thomas Thurner

Semantic technologies for non-SQL-writers

Andreas Blumauer (Semantic Web Company) talked with Brian Donnelly about a new system on the market called “Semantic Discovery System” (SDS), which helps users run sophisticated queries across existing datasets. They also discussed why complex scripts or triple stores should no longer be exposed to end users.

SDS does what semantic web enterprises have promised for years: it is an application that allows users to formulate sophisticated questions about their datasets and get back data without writing SQL statements or going down to the level of OWL concepts.

SDS leaves the data in its original format and does not transform it into a triple store. Through a graphical desktop application, which uses OWL and SPARQL under the hood, it then lets users formulate questions on these datasets. In Donnelly’s words, it is a software engine aimed “at business people with a tool as easy to use as Excel or Mind Manager – with zero need to know or care about OWL, SPARQL”.

Time will tell whether Donnelly’s “Semantic Discovery System” becomes a semantic web killer application. In any case, it seems to be a good step towards bringing semantic technologies out of the techies’ corner and onto the desktops of business users.

Read the full interview at www.semantic-web.at

Thomas Schandl

Tom Tague on Open Calais 4

The recent release of Open Calais v4 offers exciting new possibilities and makes a great contribution to Linked Data efforts.

Previous releases of Thomson Reuters’ Open Calais web service already produced promising results by extracting named entities, facts and events from user-submitted content, especially news articles. Now these extracted concepts come with a URI and are linked into the LOD cloud, specifically to DBpedia, Freebase, MusicBrainz, the CIA World Factbook and others.

On this occasion, Tom Tague, vice president of ClearForest, the creators of Calais, answered the Semantic Web Company’s questions about the goals of Open Calais.

The latest release of Open Calais produces metadata conforming to linked data principles. You provide this great service free to everyone via your web service.
What led to that decision, which benefits are there for Thomson Reuters?

Thomson Reuters has the largest trusted content sources in the world – but we don’t have all the content in the world. We believe that the world is going to want to integrate highly managed and trustworthy content assets such as those provided by Thomson Reuters with the low latency, highly diverse content exploding on the web. Fundamentally what we’re trying to achieve is nearly effortless interoperability of content between any two partners – Calais enables this by extracting the semantic metadata buried in your content but then takes it a step further. By linking those semantic elements to the Linked Data cloud we are setting the stage for the dramatic enhancement of any content source – and we hope that many will choose Thomson Reuters as one of the methods for enhancing that content.

It seems that with Open Calais you use a hybrid business model, which integrates end users into value creation in a form of enterprise collaboration.
Do you think such a business model is viable in the long run, and what has your experience been so far?

As of right now Calais isn’t truly a “Business”. It’s a strategic initiative that’s setting at least a piece of the stage for the Linked Content Economy. Our goal is to understand how this new content economy is going to evolve and to make certain that we have a leadership position as it moves from concept to reality.

Apart from the thousands of users submitting content to Open Calais, there is also a community of developers making their own applications around your core app. How important are the social dynamics of the Open Source community for the success of Open Calais?

Extraordinarily important. Calais is a web service – which means it’s relevant to about 0.0001% of the population. We are absolutely reliant on the creativity, energy and domain expertise of our developer community to translate Calais from a technology to an end-user relevant capability. And – as a user-driven project we also rely on our developers and users to give us feedback on what they like, what they don’t and where they think we should head.

What are your plans regarding offering your service in German?

We hope to get there in 2009. We’ve released basic French and are gearing up for additional languages in the coming year.

Thank you, Tom, for your answers! We look forward to more applications like Semantic Proxy and Linked Facts that demonstrate the great potential of the Calais engine.