The Semantic Puzzle

Thomas Thurner

Automatic Semantic Tagging for Drupal CMS launched

REEEP [1] and CTCN [2] have recently launched Climate Tagger, a new tool to automatically scan, label, sort and catalogue datasets and document collections. Climate Tagger now incorporates a Drupal module for automatic annotation of Drupal content nodes. Climate Tagger addresses knowledge-driven organizations in the climate and development arenas, providing automated functionality to streamline, catalogue and link their Climate Compatible Development data and information resources.

Climate Tagger

Climate Tagger for Drupal is a simple, FREE and easy-to-use way to integrate the well-known Reegle Tagging API [3] (originally developed in 2011 with the support of CDKN [4], and now part of the Climate Tagger suite as the Climate Tagger API) into any website based on the Drupal Content Management System [5]. Climate Tagger is backed by the expansive Climate Compatible Development Thesaurus, developed by experts in multiple fields and continuously updated to remain current (explore the thesaurus at http://www.reegle.info/glossary). The thesaurus is available in English, French, Spanish, German and Portuguese, and can connect content on different portals published in these different languages.

Climate Tagger for Drupal can be fine-tuned to the individual (and existing) configuration of any Drupal 7 installation by:

  • determining which content types and fields will be automatically tagged
  • scheduling “batch jobs” for automatic updating (including existing content, with the option to re-tag all content or to tag only with new concepts found via a thesaurus expansion or update)
  • automatically limiting and managing the volume of tag results based on individually chosen scoring thresholds
  • blending with manual tagging
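
The last two options can be illustrated with a minimal Python sketch. It is not the module's actual code: the `Tag` class, the `select_tags` function, the 0-100 score scale and the sample tags are all assumptions made for this example. It only shows the general idea of cutting automatic suggestions at a scoring threshold, capping their number and blending them with manually assigned tags.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tag:
    label: str
    score: float  # relevance score; a 0-100 scale is assumed here


def select_tags(suggested, manual_labels, min_score=50.0, max_tags=5):
    """Blend manual tags with the best-scoring automatic suggestions."""
    # Keep only suggestions at or above the threshold, best first,
    # and cap the number of automatic tags at max_tags.
    kept = sorted(
        (t for t in suggested if t.score >= min_score),
        key=lambda t: t.score,
        reverse=True,
    )[:max_tags]
    # Manual tags always survive; automatic ones are appended without duplicates.
    result = list(manual_labels)
    for tag in kept:
        if tag.label not in result:
            result.append(tag.label)
    return result


# Hypothetical suggestions for one content node.
suggested = [
    Tag("solar energy", 92.0),
    Tag("energy access", 71.5),
    Tag("rural electrification", 48.0),
    Tag("climate finance", 12.0),
]

print(select_tags(suggested, manual_labels=["energy access"], min_score=50.0))
# ['energy access', 'solar energy']
```

Raising `min_score` trades recall for precision, which is the kind of tuning the threshold setting is meant for.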

“Climate Tagger [6] brings together the semantic power of Semantic Web Company’s PoolParty Semantic Suite [7] with the domain expertise of REEEP and CTCN, resulting in an automatic annotation module for Drupal 7 with an accuracy never seen before,” states Martin Kaltenböck, Managing Partner of Semantic Web Company [8], which acts as the technology provider behind the module.

“Climate Tagger is the result of a shared commitment to breaking down the ‘information silos’ that exist in the climate compatible development community, and to provide concrete solutions that can be implemented right now, anywhere,” said REEEP Director General Martin Hiller. “Together with CTCN and SWC [we have] laid the foundations for a system that can be continuously improved and expanded to bring new sectors, systems and organizations into the climate knowledge community.”

For the Open Data and Linked Open Data communities, a Climate Tagger plugin for CKAN (http://www.ckan.net) [9] has also been published. It was developed by NREL [10] with CTCN’s support and harnesses the same taxonomy and expert-vetted thesaurus behind Climate Tagger, helping connect open data to climate compatible content through the simultaneous use of these tools.

REEEP Director General Martin Hiller and CTCN Director Jukka Uosukainen will be talking about Climate Tagger at the COP20 side event hosted by the Climate Knowledge Brokers Group in Lima [11], Peru, on Monday, December 1st at 4:45pm.

Further reading and downloads

About REEEP:

REEEP invests in clean energy markets in developing countries to lower CO2 emissions and build prosperity. Based on a strategic portfolio of high-impact projects, REEEP works to generate energy access, improve lives and economic opportunities, build sustainable markets, and combat climate change.

REEEP understands market change from a practice, policy and financial perspective. We monitor, evaluate and learn from our portfolio to understand opportunities and barriers to success within markets. These insights then influence policy, increase public and private investment, and inform our portfolio strategy to build scale within and replication across markets. REEEP is committed to open access to knowledge to support entrepreneurship, innovation and policy improvements to empower market shifts across the developing world.

About the CTCN

The Climate Technology Centre & Network facilitates the transfer of climate technologies by providing technical assistance, improving access to technology knowledge, and fostering collaboration among climate technology stakeholders. The CTCN is the operational arm of the UNFCCC Technology Mechanism and is hosted by the United Nations Environment Programme (UNEP) in collaboration with the United Nations Industrial Development Organization (UNIDO) and 11 independent, regional organizations with expertise in climate technologies.

About Semantic Web Company

Semantic Web Company (SWC, http://www.semantic-web.at) is a technology provider headquartered in Vienna (Austria). SWC supports organizations from all industrial sectors worldwide to improve their information and data management. Its products have outstanding capabilities to extract meaning from structured and unstructured data by making use of linked data technologies.

Tassilo Pellegrini

Introducing the Linked Data Business Cube

With the increasing availability of semantic data on the World Wide Web and its reutilization for commercial purposes, questions arise about the economic value of interlinked data and the business models that can be built on top of it. The Linked Data Business Cube provides a systematic approach to conceptualizing business models for Linked Data assets. Similar to an OLAP cube, the Linked Data Business Cube provides an integrated view of stakeholders (x-axis), revenue models (y-axis) and Linked Data assets (z-axis), thus allowing a systematic investigation of the specificities of various Linked Data business models.

Linked Data Business Cube_Full

 

Mapping Revenue Models to Linked Data Assets

By mapping revenue models to Linked Data assets we can modify the Linked Data Business Cube as illustrated in the figure below.

Linked Data Business Cube_Revenue-Type

The figure indicates that the opportunities to derive direct revenues rise with the business value of a resource. Assets that are easily substitutable generate little incentive for direct revenues but can be used to trigger indirect revenues. This basically applies to instance data and metadata. On the other hand, assets that are unique and difficult to imitate and substitute, i.e. in terms of the competence and investments necessary to provide the service, carry the highest potential for direct revenues. This applies to assets like content, service and technology. Generally speaking, the higher the value proposition of an asset – in terms of added value – the higher the willingness to pay.

Ontologies seem to function as a “mediating layer” between “low-incentive assets” and “high-incentive assets”. This means that ontologies as a precondition for the provision and utilization of Linked Data can be capitalized in a variety of ways, depending on the business strategy of the Linked Data provider.

It is important to note that each revenue model has specific merits and flaws and requires certain preconditions to work properly. Additionally, revenue models often occur in combination, as they are functionally complementary.
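
The mapping described in this section can be written down as a small lookup table. The Python sketch below is only an illustration: the `Revenue` enum, the `ASSET_REVENUE` table and the `assets_for` helper are names invented for this example, and the strict direct/indirect assignment is a simplification of what the text describes as a matter of degree.

```python
from enum import Enum


class Revenue(Enum):
    DIRECT = "direct"
    INDIRECT = "indirect"


# Asset types as named in the text; the assignment is a simplified reading
# of the section above, not an exact reproduction of the cube.
ASSET_REVENUE = {
    "instance data": {Revenue.INDIRECT},
    "metadata": {Revenue.INDIRECT},
    "ontology": {Revenue.DIRECT, Revenue.INDIRECT},  # the "mediating layer"
    "content": {Revenue.DIRECT},
    "service": {Revenue.DIRECT},
    "technology": {Revenue.DIRECT},
}


def assets_for(revenue):
    """Return the asset types for which a revenue type is a plausible fit."""
    return sorted(a for a, kinds in ASSET_REVENUE.items() if revenue in kinds)


print(assets_for(Revenue.DIRECT))
# ['content', 'ontology', 'service', 'technology']
print(assets_for(Revenue.INDIRECT))
# ['instance data', 'metadata', 'ontology']
```

Ontologies show up in both lists, which mirrors their role as a mediating layer between low-incentive and high-incentive assets.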

Mapping Revenue Models to Stakeholders

A Linked Data ecosystem usually comprises several stakeholders that engage in the value creation process. The cube can help us to elaborate the most reasonable business model for each stakeholder.

Linked Data Business Cube_Stakeholders

Summing up, Linked Data generates new business opportunities, but the commercialization of Linked Data is highly context-specific. Revenue models change in accordance with the various assets involved and the stakeholders who make use of them. Knowing these circumstances is crucial to establishing successful business models, but doing so requires a holistic and interconnected understanding of the value creation process and of the specific benefits and limitations Linked Data generates at each step of the value chain.

Read more: Asset Creation and Commercialization of Interlinked Data

Thomas Thurner

SEMANTiCS – the emergence of a European Marketplace for the Semantic Web

The SEMANTiCS conference celebrated its 10th anniversary this September in Leipzig, and this year’s edition opened a new age for the Semantic Web in Europe: a marketplace for the next generation of semantic technologies was born.

semantics-2014-leipzig

As Phil Archer stated in his keynote, the Semantic Web is now mature, and academia and industry can be proud of the achievements so far. Exactly that fact set the theme for the conference: real-world use cases demonstrated by industry representatives, new and already running applied projects presented by the leading consortia in the field, and a lively academic community showing the next ideas and developments in the field. So this year’s SEMANTiCS conference brought together the European community in Semantic Web technology, from both academia and industry.

  • Papers and Presentations: 45 (50% of them industry talks)
  • Posters: 10 (out of 22)
  • A marketplace with 11 permanent booths
  • Presented Vocabularies at the 1st Vocabulary Carnival: 24
  • Attendance: 225
  • Geographic Coverage: 21 countries

This year’s SEMANTiCS was co-located and connected with a couple of other related events, like the German ISKO, the Multilingual Linked Open Data for Enterprises (MLODE 2014) and the 2nd DBpedia Community Meeting 2014. These well-connected gatherings brought people together and allowed transdisciplinary exchange.

To sum up: this year’s SEMANTiCS has opened up new perspectives on semantic technologies when it comes to

  • industry use
  • problem solving capacity
  • next generation development
  • knowledge about top companies, institutes and people in the sector

Andreas Blumauer

From Taxonomies over Ontologies to Knowledge Graphs

With the rise of linked data and the semantic web, concepts and terms like ‘ontology’, ‘vocabulary’, ‘thesaurus’ or ‘taxonomy’ are being picked up frequently by information managers, search engine specialists or data engineers to describe ‘knowledge models’ in general. In many cases the terms are used without any specific meaning, which brings a lot of people to the basic question:

What are the differences between a taxonomy, a thesaurus, an ontology and a knowledge graph?

This article aims to shed light on this discussion by guiding you through an example that starts from a taxonomy, introduces an ontology and finally exposes a knowledge graph (linked data graph) to be used as the basis for semantic applications.

1. Taxonomies and thesauri

Taxonomies and thesauri are closely related kinds of controlled vocabularies that describe relations between concepts and their labels, including synonyms, most often in various languages. Such structures can be used as a basis for domain-specific entity extraction or text categorization services. Here is an example of a taxonomy about the Apollo programme, created with PoolParty Thesaurus Server:

Apollo programme taxonomy

The nodes of a taxonomy represent various types of ‘things’ (so-called ‘resources’): the topmost level (orange) is the root node of the taxonomy, purple nodes are so-called ‘concept schemes’, followed by ‘top concepts’ (dark green) and ordinary ‘concepts’ (light green).

In 2009 W3C introduced the Simple Knowledge Organization System (SKOS) as a standard for the creation and publication of taxonomies and thesauri. The SKOS ontology comprises only a few classes and properties. The most important types of resources are Concept, ConceptScheme and Collection. Hierarchical relations between concepts are ‘broader’ and its inverse ‘narrower’. Thesauri most often also cover non-hierarchical relations between concepts, like the symmetric property ‘related’. Every concept has at least one ‘preferred label’ and can have numerous synonyms (‘alternative labels’). Whereas a taxonomy could be envisaged as a tree, thesauri most often have polyhierarchies: a concept can be the child node of more than one node. A thesaurus should therefore be envisaged as a network (graph) of nodes rather than a simple tree, since it includes polyhierarchical and also non-hierarchical relations between concepts.
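
To make these SKOS notions concrete, here is a minimal sketch in plain Python. It does not use an RDF library. The helper names (`Concept`, `add`, `relate`, `narrower`) and the sample concepts such as ‘Crewed missions’ and ‘Lunar landings’ are invented for illustration and are not taken from the taxonomy shown in the figure. The sketch shows preferred and alternative labels, ‘broader’ with its derived inverse ‘narrower’, the symmetric ‘related’ property, and a polyhierarchy.

```python
from dataclasses import dataclass, field


@dataclass
class Concept:
    pref_label: str                               # skos:prefLabel
    alt_labels: set = field(default_factory=set)  # skos:altLabel (synonyms)
    broader: set = field(default_factory=set)     # labels of parent concepts
    related: set = field(default_factory=set)     # non-hierarchical links


concepts = {}


def add(pref_label, alt_labels=(), broader=()):
    concepts[pref_label] = Concept(pref_label, set(alt_labels), set(broader))


def relate(a, b):
    """'related' is symmetric, so record the link in both directions."""
    concepts[a].related.add(b)
    concepts[b].related.add(a)


def narrower(label):
    """'narrower' is the inverse of 'broader', so it can be derived."""
    return sorted(c.pref_label for c in concepts.values() if label in c.broader)


# Hypothetical sample concepts.
add("Missions")
add("Crewed missions", broader=["Missions"])
add("Lunar landings", broader=["Missions"])
# Polyhierarchy: 'Apollo 11' is a child of two different nodes.
add("Apollo 11", alt_labels=["Apollo XI"], broader=["Crewed missions", "Lunar landings"])
add("Neil Armstrong", alt_labels=["Neil A. Armstrong"])
relate("Neil Armstrong", "Apollo 11")

print(narrower("Missions"))                   # ['Crewed missions', 'Lunar landings']
print(sorted(concepts["Apollo 11"].broader))  # ['Crewed missions', 'Lunar landings']
print(sorted(concepts["Apollo 11"].related))  # ['Neil Armstrong']
```

Because ‘Apollo 11’ has two parents, the structure is a graph rather than a tree, which is the difference between a thesaurus and a strict taxonomy.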

2. Ontologies

Ontologies are perceived as being complex in contrast to the rather simple taxonomies and thesauri. The limitations of taxonomies and SKOS-based vocabularies in general become obvious as soon as one tries to describe a specific relation between two concepts: ‘Neil Armstrong’ is not only unspecifically ‘related’ to ‘Apollo 11’, he was ‘commander of’ this particular Apollo mission. Therefore we have to extend the SKOS ontology by two classes (‘Astronaut’ and ‘Mission’) and the property ‘commander of’, which is the inverse of ‘commanded by’.

Apollo ontology relations

The SKOS concept with the preferred label ‘Buzz Aldrin’ has to be classified as an ‘Astronaut’ in order to be described by specific relations and attributes like ‘is lunar module pilot of’ or ‘birthDate’. Introducing additional ontologies to expand the expressivity of SKOS-based vocabularies follows the ‘pay-as-you-go’ strategy of the linked data community. The PoolParty knowledge modelling approach suggests starting with SKOS and then extending this simple knowledge model with other knowledge graphs, ontologies, annotated documents and legacy data. This paradigm can be remembered by a rule named ‘Start SKOS, grow big’.
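
The extension described in this section can be sketched with plain subject-predicate-object tuples. This is an illustrative Python sketch, not PoolParty’s data model: `INVERSE_OF`, `with_inverses` and `objects` are hypothetical helpers, and the generic ‘type’ predicate stands in for class membership.

```python
# Properties that declare an inverse, as in the 'commander of' example.
INVERSE_OF = {
    "commander of": "commanded by",
    "commanded by": "commander of",
}

triples = {
    ("Neil Armstrong", "type", "Astronaut"),
    ("Buzz Aldrin", "type", "Astronaut"),
    ("Apollo 11", "type", "Mission"),
    ("Neil Armstrong", "commander of", "Apollo 11"),
    ("Buzz Aldrin", "is lunar module pilot of", "Apollo 11"),
}


def with_inverses(facts):
    """Add the inverse statement for every property that declares one."""
    inferred = set(facts)
    for s, p, o in facts:
        if p in INVERSE_OF:
            inferred.add((o, INVERSE_OF[p], s))
    return inferred


def objects(facts, subject, predicate):
    """Return all objects stated for a given subject and predicate."""
    return sorted(o for s, p, o in facts if s == subject and p == predicate)


graph = with_inverses(triples)
print(objects(graph, "Apollo 11", "commanded by"))  # ['Neil Armstrong']
```

The statement ‘Apollo 11 commanded by Neil Armstrong’ was never entered by hand. It follows from the inverse declaration, which is the kind of specific, typed relation that plain ‘related’ links cannot express.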

3. Knowledge Graphs

Knowledge graphs are all around (e.g. DBpedia, Freebase, etc.). Based on W3C’s Semantic Web standards, such graphs can be used to further enrich your SKOS knowledge models. In combination with an ontology, specific knowledge about a certain resource can be obtained with a simple SPARQL query. As an example, the fact that Neil Armstrong was born on August 5th, 1930 can be retrieved from DBpedia. Watch this YouTube video, which demonstrates how ‘linked data harvesting’ works with PoolParty.
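
A query of that kind might look as follows. This Python sketch uses only the standard library and is not part of PoolParty. It assumes the public DBpedia endpoint at https://dbpedia.org/sparql and the `dbo:birthDate` property. The helper names `request_url` and `fetch_birth_dates` are invented for this example. The network call is wrapped in a function and not executed by default, since the result depends on the live service.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed: the public DBpedia SPARQL endpoint.
ENDPOINT = "https://dbpedia.org/sparql"

QUERY = """
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX dbr: <http://dbpedia.org/resource/>

SELECT ?birthDate WHERE {
  dbr:Neil_Armstrong dbo:birthDate ?birthDate .
}
"""


def request_url(query, endpoint=ENDPOINT):
    """Build a GET URL that asks the endpoint for JSON results."""
    params = {"query": query, "format": "application/sparql-results+json"}
    return endpoint + "?" + urlencode(params)


def fetch_birth_dates(query=QUERY):
    """Run the query; needs network access, so it is not called here."""
    with urlopen(request_url(query)) as response:
        data = json.load(response)
    return [b["birthDate"]["value"] for b in data["results"]["bindings"]]


# Only the request URL is built here; call fetch_birth_dates() to run it.
print(request_url(QUERY))
```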

Knowledge graphs could be envisaged as a network of all kinds of things which are relevant to a specific domain or to an organization. They are not limited to abstract concepts and relations but can also contain instances of things like documents and datasets.

Why should I transform my content and data into a large knowledge graph?

The answer is simple: to be able to make complex queries over the entirety of all kinds of information. By breaking up the data silos, there is a high probability that query results become more valid.
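
A toy example shows why merging helps. In the Python sketch below, the two ‘silos’, the file names and the `documents_about` helper are all hypothetical. Neither silo alone can answer the question ‘which documents are about the Apollo programme?’, but their union can.

```python
# Each "silo" holds facts the other does not know about (hypothetical data).
taxonomy = {
    ("Apollo 11", "broader", "Apollo programme"),
    ("Apollo 12", "broader", "Apollo programme"),
}
documents = {
    ("report-17.pdf", "mentions", "Apollo 11"),
    ("memo-03.docx", "mentions", "Apollo 12"),
    ("minutes-08.txt", "mentions", "Budget planning"),
}

# Breaking up the silos: one graph containing both sets of statements.
graph = taxonomy | documents


def documents_about(facts, topic):
    """Find documents mentioning a topic or one of its direct narrower concepts."""
    narrower = {s for s, p, o in facts if p == "broader" and o == topic}
    targets = narrower | {topic}
    return sorted(s for s, p, o in facts if p == "mentions" and o in targets)


print(documents_about(documents, "Apollo programme"))  # []
print(documents_about(graph, "Apollo programme"))
# ['memo-03.docx', 'report-17.pdf']
```

The sketch follows only one level of the hierarchy. A real linked data store would typically express the same question as a SPARQL query over the merged graph.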

With PoolParty Semantic Integrator, content and documents from SharePoint, Confluence, Drupal etc. can be transformed automatically to integrate them into enterprise knowledge graphs.

Taxonomies, thesauri, ontologies, linked data graphs including enterprise content and legacy data – all kinds of information could become part of an enterprise knowledge graph which can be stored in a linked data warehouse. Based on technologies like Virtuoso, such data warehouses have the ability to serve as a complex question answering system with excellent performance and scalability.

4. Conclusion

In the early days of the semantic web, we constantly discussed whether taxonomies, ontologies or linked data graphs would be part of the solution. Again and again, discussions like ‘Did the current data-driven world kill ontologies?’ are being led. My proposal is: try to combine all of those. Embrace every method which makes meaningful information out of data. Stop denouncing communities which don’t follow one or the other aspect of the semantic web (e.g. reasoning or SKOS). Let’s put the pieces together – together!