Thomas Thurner

Automatic Semantic Tagging for Drupal CMS launched

REEEP [1] and CTCN [2] have recently launched Climate Tagger, a new tool that automatically scans, labels, sorts and catalogues datasets and document collections. Climate Tagger now includes a Drupal module for the automatic annotation of Drupal content nodes. It is aimed at knowledge-driven organizations in the climate and development arenas, giving them automated functionality to streamline, catalogue and link their Climate Compatible Development data and information resources.

Climate Tagger

Climate Tagger for Drupal is a simple, free and easy-to-use way to integrate the well-known Reegle Tagging API [3] into any web site based on the Drupal Content Management System [5]. The API was originally developed in 2011 with the support of CDKN [4] and is now part of the Climate Tagger suite as the Climate Tagger API. Climate Tagger is backed by the expansive Climate Compatible Development Thesaurus, which was developed by experts in multiple fields and is continuously updated to remain current (explore the thesaurus at http://www.reegle.info/glossary). The thesaurus is available in English, French, Spanish, German and Portuguese, so it can connect content published in these languages across different portals.

Climate Tagger for Drupal can be fine-tuned to the individual (and existing) configuration of any Drupal 7 installation by:

  • determining which content types and fields will be automatically tagged
  • scheduling “batch jobs” for automatic updating, including for already existing content, with the option to re-tag all content or to tag only with new concepts added through a thesaurus expansion or update
  • automatically limiting and managing the volume of tag results based on individually chosen scoring thresholds
  • blending automatic with manual tagging
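The threshold and blending options above can be illustrated with a minimal Python sketch. It assumes, hypothetically, that the tagging service returns each suggested concept with a numeric relevance score on a 0 to 100 scale; the `Tag` type, the `select_tags` helper, the `max_tags` cap and the sample labels are illustrative assumptions, not the Climate Tagger module's actual API or settings.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tag:
    """A concept suggested by the tagging service (hypothetical shape)."""
    label: str
    score: float  # assumed relevance score on a 0-100 scale


def select_tags(suggested, min_score, max_tags, manual=()):
    """Blend manual tags with the best automatic suggestions.

    Suggestions scoring below min_score are dropped, labels already tagged
    manually are skipped, and at most max_tags automatic tags are added.
    """
    manual = list(manual)
    candidates = [t for t in suggested if t.score >= min_score and t.label not in manual]
    # Highest score first; ties broken alphabetically so the result is stable.
    candidates.sort(key=lambda t: (-t.score, t.label))
    return manual + [t.label for t in candidates[:max_tags]]


# Hypothetical suggestions for one content node.
suggested = [
    Tag("Solar energy", 82.0),
    Tag("Renewable energy", 91.5),
    Tag("Biomass", 40.0),
    Tag("Energy access", 67.0),
]
print(select_tags(suggested, min_score=60, max_tags=2, manual=["Solar energy"]))
```

Here the manually chosen "Solar energy" is kept, "Biomass" falls below the threshold, and the two highest-scoring remaining concepts are appended. Applying the cap only to automatic additions means manual tags never crowd out new suggestions.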

“Climate Tagger [6] brings together the semantic power of Semantic Web Company’s PoolParty Semantic Suite [7] with the domain expertise of REEEP and CTCN, resulting in an automatic annotation module for Drupal 7 with an accuracy never seen before,” states Martin Kaltenböck, Managing Partner of Semantic Web Company [8], which acts as the technology provider behind the module.

“Climate Tagger is the result of a shared commitment to breaking down the ‘information silos’ that exist in the climate compatible development community, and to providing concrete solutions that can be implemented right now, anywhere,” said REEEP Director General Martin Hiller. “Together with CTCN and SWC, we have laid the foundations for a system that can be continuously improved and expanded to bring new sectors, systems and organizations into the climate knowledge community.”

For the Open Data and Linked Open Data communities, a Climate Tagger plugin for CKAN [9] has also been published. It was developed by NREL [10] with CTCN’s support and harnesses the same taxonomy and expert-vetted thesaurus behind Climate Tagger, so using these tools together helps connect open data to climate compatible content.

REEEP Director General Martin Hiller and CTCN Director Jukka Uosukainen will be talking about Climate Tagger at the COP20 side event hosted by the Climate Knowledge Brokers Group in Lima [11], Peru, on Monday, December 1st at 4:45pm.

Further reading and downloads

About REEEP

REEEP invests in clean energy markets in developing countries to lower CO2 emissions and build prosperity. Based on a strategic portfolio of high-impact projects, REEEP works to generate energy access, improve lives and economic opportunities, build sustainable markets, and combat climate change.

REEEP understands market change from a practice, policy and financial perspective. We monitor, evaluate and learn from our portfolio to understand opportunities and barriers to success within markets. These insights then influence policy, increase public and private investment, and inform our portfolio strategy to build scale within and replication across markets. REEEP is committed to open access to knowledge to support entrepreneurship, innovation and policy improvements to empower market shifts across the developing world.

About the CTCN

The Climate Technology Centre & Network facilitates the transfer of climate technologies by providing technical assistance, improving access to technology knowledge, and fostering collaboration among climate technology stakeholders. The CTCN is the operational arm of the UNFCCC Technology Mechanism and is hosted by the United Nations Environment Programme (UNEP) in collaboration with the United Nations Industrial Development Organization (UNIDO) and 11 independent, regional organizations with expertise in climate technologies.

About Semantic Web Company

Semantic Web Company (SWC, http://www.semantic-web.at) is a technology provider headquartered in Vienna (Austria). SWC supports organizations from all industrial sectors worldwide to improve their information and data management. Its products have outstanding capabilities to extract meaning from structured and unstructured data using linked data technologies.

Thomas Schandl

Drupal and the Semantic Web – Interview with Stéphane Corlosquet

Stéphane Corlosquet has been the main driving force behind incorporating Semantic Web capabilities into Drupal. With the recent release of Drupal 7, Semantic Web technologies became part of the core of this popular CMS, which powers at least 1% of all the world’s web sites.

Drupal is the leading CMS when it comes to implementing Semantic Web standards. What are the reasons for this, what makes Drupal such a good fit for Semantic Web technologies?

Historically, Drupal has been known for complying with web standards. It supported RSS 1.0, the RDF-based aggregation format, as early as 2001, and later moved to RSS 2.0. The Drupal community prides itself on valid HTML, not only in the code generated by Drupal itself, but also by taking the extra step of automatically fixing faulty HTML entered by its users. Drupal has used XHTML since version 4.0 in 2002. The next logical step beyond XHTML was to add a layer of semantics with RDFa, a W3C Recommendation published in 2008.

Many reasons contributed to the addition of RDFa to Drupal 7. The first comes from the Drupal project lead, Dries Buytaert, who is passionate about the web and open source. Second, the growing Drupal community is very web-savvy and includes many experts from different backgrounds in accessibility, CSS, HTML, security and more. As a result, every release of Drupal incorporates many of the latest standards. The community meets twice a year at conferences (DrupalCons), and these events play a major role in hashing out which technologies and designs will be incorporated into the next version of Drupal. Thanks to the flexibility of its internal architecture, Drupal is able to keep up with the latest web standards. Content in Drupal is highly structured, and site administrators get a user interface to build the site structure they want, using entity types, content types, fields and taxonomies for categorization.

As for other CMSs, Joomla!’s community appears to be more fragmented, and its core software is not as extensible as Drupal’s, while WordPress is more of a blogging platform, so turning it into a full-blown CMS can be challenging. Both WordPress and Joomla! are in fact adapting the concept of Drupal’s Content Construction Kit (CCK) to their software, but they have not yet reached the same level of maturity as Drupal.

A common objection to the adoption of Semantic Web technologies is that the learning curve is steep and that they are too complicated for many web developers to get into. How can Drupal 7 change that? Which features accessible to the average web site operator will it offer?

Semantic Web technologies don’t have to be complicated when applied to simple use cases! We purposely chose only a subset of Semantic Web technologies to integrate into the core of Drupal, keeping the learning curve for Drupal developers and users as low as possible. The main technology is RDFa, which includes the notions of vocabularies (a schema, or collection of attributes) and Compact URIs (CURIEs), which make authoring RDFa easier. In fact, some web developers might have come across these notions before when working with Dublin Core in meta tags such as dc:title or dc:date.
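To make the CURIE idea concrete, here is a minimal Python sketch of how a CURIE such as dc:title is expanded into a full URI: the IRI bound to the prefix is concatenated with the part after the colon. The prefix table is an illustrative assumption (the dc prefix is bound here to the DCMI Terms namespace), and real RDFa processors also handle default vocabularies, safe CURIEs in square brackets and other cases this sketch ignores.

```python
# Illustrative prefix bindings; an RDFa page declares its own.
PREFIXES = {
    "dc": "http://purl.org/dc/terms/",
    "foaf": "http://xmlns.com/foaf/0.1/",
    "sioc": "http://rdfs.org/sioc/ns#",
    "xsd": "http://www.w3.org/2001/XMLSchema#",
}


def expand_curie(curie, prefixes=PREFIXES):
    """Expand 'prefix:reference' into a full URI using the prefix table."""
    prefix, sep, reference = curie.partition(":")
    if not sep:
        raise ValueError(f"not a CURIE: {curie!r}")
    try:
        return prefixes[prefix] + reference
    except KeyError:
        raise ValueError(f"unknown prefix: {prefix!r}") from None


for curie in ("dc:title", "dc:date", "foaf:name"):
    print(curie, "->", expand_curie(curie))
```

The short form is what authors write in markup; the expanded URI is what a consuming application actually compares, which is why two sites using different prefix names for the same namespace still describe the same property.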

Which benefits will web site owners get when they switch to a semantics enabled Drupal 7?

Google and Bing increasingly rely on machine-readable structured data from the websites they crawl. Drupal 7 is designed to embed semantic metadata, which makes machine-to-machine (M2M) search native for a Drupal 7 website. RDFa can add value by giving search engines more details, such as the latitude and longitude of a venue for display on a map, or a date in ISO format for localization and proper display in search results for different countries.
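As an illustration of the ISO date point, the Python sketch below renders a publication date as RDFa: the visible text can follow any local convention, while the `content` attribute carries an ISO 8601 value that a crawler can parse unambiguously. The `rdfa_date` helper and the exact markup are assumptions for illustration, not the markup Drupal 7 itself generates, although they use the standard RDFa `property`, `content` and `datatype` attributes.

```python
from datetime import datetime, timezone
from html import escape


def rdfa_date(when, display_format="%d %B %Y", prop="dc:date"):
    """Return a <span> whose visible text is localized but whose content is ISO 8601."""
    iso = when.isoformat()  # e.g. 2011-01-05T14:30:00+00:00
    text = when.strftime(display_format)
    return (
        f'<span property="{escape(prop)}" content="{escape(iso)}" '
        f'datatype="xsd:dateTime">{escape(text)}</span>'
    )


published = datetime(2011, 1, 5, 14, 30, tzinfo=timezone.utc)
print(rdfa_date(published))                          # day-month-year display
print(rdfa_date(published, display_format="%m/%d/%Y"))  # US-style display
```

Both spans carry the same machine-readable value, so a search engine can reformat the date for each country regardless of how the page displays it. The same pattern would apply to latitude and longitude properties on a venue.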

What are your hopes regarding the development of other applications that either provide or consume data from D7 sites? Which improvements of standards, best practices or (lightweight) ontologies in the Semantic Web community would you like to see?

Services like Sig.ma are already able to collect semantic data from different sources and display it in new ways in the form of mash-ups. Eventually, the services that consume semantic data will not be Drupal-specific, as more platforms jump on the Semantic Web bandwagon. The improvement I most hope to see in the future is more well-maintained vocabularies: many existing vocabularies are over-engineered, and some fail to dereference properly. There is also work to be done to improve the tooling available to web developers, and to introduce them to the simple concepts of Linked Data through easy-to-read documentation.
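One way to check whether a vocabulary term dereferences properly is to request its URI with an `Accept` header asking for RDF and see whether the server answers with an RDF media type rather than only HTML. The Python sketch below builds such a request and classifies a response's `Content-Type`; the helper names and the list of media types are assumptions, and a thorough check would also parse the returned document and confirm it actually describes the requested term.

```python
from urllib.request import Request, urlopen

# Media types commonly used for RDF serializations.
RDF_TYPES = (
    "application/rdf+xml",
    "text/turtle",
    "application/n-triples",
    "application/ld+json",
)


def rdf_request(uri):
    """Build a GET request that prefers an RDF representation over HTML."""
    accept = ", ".join(RDF_TYPES) + ", text/html;q=0.1"
    return Request(uri, headers={"Accept": accept})


def is_rdf_response(content_type):
    """True if a Content-Type header value names an RDF serialization."""
    media_type = content_type.split(";", 1)[0].strip().lower()
    return media_type in RDF_TYPES


def dereferences_to_rdf(uri, timeout=10):
    """Fetch uri (urllib follows redirects such as 303 See Other) and test the reply."""
    with urlopen(rdf_request(uri), timeout=timeout) as response:
        return is_rdf_response(response.headers.get("Content-Type", ""))


# No network access here; just show the negotiated Accept header.
print(rdf_request("http://xmlns.com/foaf/0.1/name").get_header("Accept"))
```

Calling `dereferences_to_rdf` on a term URI needs network access; a vocabulary that returns only HTML, or a 404, for every `Accept` header is one that "fails to dereference" in the sense above.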

Thank you for this interview, Stéphane!

Andreas Koller

Semantic Web and Drupal

Yesterday I was at a meeting of the Drupal Austria community at the Nelsons/TU Vienna. Drupal is a powerful free and open source CMS written in PHP. We talked about the Semantic Web and how to integrate semantic technology into Drupal as a module. Our special guest was Stéphane Corlosquet from DERI Galway. He came straight from the airport to join our meeting at half past nine in the evening. After a quick dinner (a hamburger), he presented his thoughts on how Drupal data could be described in RDF. The attached diagram shows the current Drupal data structure and the proposed RDF schema.

Drupal RDF Schema
But how could this be realized? Which framework would fit the requirements? One possible solution is ARC, a flexible RDF system for Semantic Web and PHP developers that offers easy RDF and SPARQL support for LAMP systems. Stéphane told us that he has only just begun testing whether ARC is suitable for achieving these goals. I’m curious to see what the planned ARC-based architecture will produce. Could Drupal, in combination with the RDF module, become the killer CMS application? I’m sure these guys are on the right track. Thank you all for a very pleasant evening yesterday. I hope to see you again soon.
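To give a feel for what describing Drupal data in RDF means, here is a minimal Python sketch (not ARC, which is a PHP library) that turns a hypothetical Drupal node into N-Triples and answers a single triple-pattern query, the building block of a SPARQL query. The node fields, the `http://example.org/` base URI and the property mapping (`sioc:Item`, `dc:title`, `sioc:has_creator`) are illustrative assumptions, not the schema shown in the diagram.

```python
from dataclasses import dataclass

SIOC = "http://rdfs.org/sioc/ns#"
DC = "http://purl.org/dc/terms/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
BASE = "http://example.org/"  # hypothetical site base URI


@dataclass(frozen=True)
class Literal:
    """A plain RDF literal; URIs are represented as ordinary strings."""
    value: str


def node_to_triples(node):
    """Map a (hypothetical) Drupal node dict to (subject, predicate, object) triples."""
    subject = f"{BASE}node/{node['nid']}"
    return [
        (subject, RDF_TYPE, SIOC + "Item"),
        (subject, DC + "title", Literal(node["title"])),
        (subject, SIOC + "has_creator", f"{BASE}user/{node['uid']}"),
    ]


def term_nt(term):
    """Serialize one term in N-Triples syntax."""
    if isinstance(term, Literal):
        escaped = term.value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
        return f'"{escaped}"'
    return f"<{term}>"


def to_ntriples(triples):
    return "\n".join(f"{term_nt(s)} {term_nt(p)} {term_nt(o)} ." for s, p, o in triples)


def match(triples, s=None, p=None, o=None):
    """Triple-pattern match: None acts as a variable, like ?x in SPARQL."""
    return [
        t for t in triples
        if (s is None or t[0] == s) and (p is None or t[1] == p) and (o is None or t[2] == o)
    ]


node = {"nid": 1, "title": "Drupal meets RDF", "uid": 7}
triples = node_to_triples(node)
print(to_ntriples(triples))
# Which resources did user 7 create?
print([s for s, _, _ in match(triples, p=SIOC + "has_creator", o=f"{BASE}user/7")])
```

A store such as ARC does the same job at scale: it persists triples like these and evaluates full SPARQL queries, which combine many such patterns, over them.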