Enabling and managing interoperability at the data and service level is a key strategic issue in networked knowledge organization systems (KOSs) and a growing concern in effective data management. But why do we need “semantic” interoperability, and how can we achieve it?
Interoperability vs. Integration
The concept of (data) interoperability is best understood in contrast to (data) integration. Integration refers to a process in which formerly distinct data sources and their representation models are merged into one newly consolidated data source. Interoperability, by contrast, preserves the structural separation of knowledge sources and their representation models, but allows connectivity and interactivity between these sources through deliberately defined overlaps in the representation models. Interoperable data sources are designed to provide interfaces for connectivity, so that data can be shared and integrated on top of a common data model while the original principles of data and knowledge representation remain intact. Interoperability is thus an efficient means to improve and ease the integration of data and knowledge sources.
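A minimal Turtle sketch can illustrate such a deliberately defined overlap. All URIs below are hypothetical; the point is that each source keeps its own identifier scheme while sharing a common vocabulary (here Dublin Core terms) and an explicit link that bridges the two:

```turtle
@prefix dct: <http://purl.org/dc/terms/> .
@prefix owl: <http://www.w3.org/2002/07/owl#> .

# Source A keeps its own identifier scheme and representation model ...
<http://example.org/sourceA/doc42>
    dct:title   "Energy efficiency report" ;
    dct:subject <http://example.org/sourceA/topics/buildings> .

# ... and so does source B. Both describe their resources with the
# shared Dublin Core terms, and an explicit link marks the overlap,
# so the sources stay separate but remain connectable.
<http://example.com/sourceB/item-7>
    dct:title  "Report on energy efficiency" ;
    owl:sameAs <http://example.org/sourceA/doc42> .
```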
Three levels of interoperability
When designing interoperable KOSs it is important to distinguish between structural, syntactic and semantic interoperability (Galinski 2006):
- Structural interoperability is achieved by representing metadata using a shared data model like the DCMI Abstract Model or RDF (Resource Description Framework).
- Syntactic interoperability is achieved by serializing data in a shared mark-up language like XML, Turtle or N3.
- Semantic interoperability is achieved by using a shared terminology or controlled vocabulary to label and classify metadata terms and relations.
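A small, hypothetical Turtle snippet can illustrate all three levels at once: RDF provides the shared data model (structural), Turtle the shared serialization (syntactic), and controlled vocabularies such as Dublin Core and SKOS the shared semantics. All example.org URIs are invented for illustration:

```turtle
@prefix dct:  <http://purl.org/dc/terms/> .
@prefix skos: <http://www.w3.org/2004/02/skos/core#> .

# Structural: each statement is an RDF triple (subject-predicate-object).
# Syntactic:  the triples are serialized in Turtle, a shared mark-up.
# Semantic:   dct:subject and the SKOS concept come from shared,
#             controlled vocabularies, so other systems can interpret them.
<http://example.org/doc/123>
    dct:title   "Building energy performance" ;
    dct:subject <http://example.org/concepts/energy-efficiency> .

<http://example.org/concepts/energy-efficiency>
    a skos:Concept ;
    skos:prefLabel "Energy efficiency"@en .
```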
Because metadata standards carry a lot of intrinsic legacy, it is sometimes very difficult to achieve interoperability at all three levels mentioned above. Metadata formats and models have grown historically: they are usually the result of community decision processes, often highly formalized for specific functional purposes, and frequently rigid and difficult to change by design. A clear understanding and documentation of a metadata format's application profile is therefore a precondition for enabling interoperability at all three levels. Semantic Web standards do a really good job in this respect.
In the next post, we will take a look at various KOSs and how they differ with respect to expressivity, scope and target group.
The reduction of greenhouse gas emissions is one of the big global challenges of the coming decades. (Linked) Open Data on this multi-domain challenge is key for addressing the issues in policy, construction, energy efficiency, production and the like. Today – on World Environment Day 2014 – a new (linked open) data initiative contributes to this effort: GBPN’s Data Endpoint for Building Energy Performance Scenarios.
GBPN (the Global Buildings Performance Network) provides the full data set of a recent global scenario analysis for saving energy in the building sector worldwide, projected from 2005 to 2050. The multidimensional dataset includes parameters like housing types, building vintages and energy uses for various climate zones and regions, and is freely available for full use and re-use as open data under the CC-BY 3.0 France license.
To make exploration easy, the Semantic Web Company has developed an interactive query and filtering tool which allows users to create graphs and tables by slicing this multidimensional data cube. Selected results can be exported as open data in open formats (RDF and CSV) and also queried via a provided SPARQL endpoint (a semantic-web-based data API). A built-in query builder makes SPARQL easy to use, learn and understand – for advanced users as well as non-experts and beginners.
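The kind of slicing the filtering tool performs can be expressed directly in SPARQL. The following query is only a sketch: the ex: namespace, dimension properties and measure are hypothetical stand-ins for the actual GBPN data model, which follows the RDF Data Cube Vocabulary:

```sparql
PREFIX qb: <http://purl.org/linked-data/cube#>
PREFIX ex: <http://example.org/gbpn#>   # hypothetical namespace

# Select one slice of the cube: energy use per year for a fixed
# region and climate zone, ordered for plotting as a graph.
SELECT ?year ?energyUse
WHERE {
  ?obs a qb:Observation ;
       ex:refYear     ?year ;
       ex:region      ex:Europe ;
       ex:climateZone ex:Moderate ;
       ex:energyUse   ?energyUse .
}
ORDER BY ?year
```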
The LOD-based information and data system is part of the Semantic Web Company’s recent PoolParty Semantic Drupal developments. It is based on OpenLink’s Virtuoso 7 QuadStore, which holds and computes ~235 million triples, and it makes use of the RDF ETL tool UnifiedViews as well as D2R Server for RDF conversion. The underlying GBPN ontology runs on PoolParty 4.2 and also powers a domain-specific news aggregator realized with SWC’s sOnr webminer.
Together with other energy-efficiency-related Linked Open Data initiatives like REEEP, NREL and BPIE, GBPN’s recent initiative contributes to a broader availability of data supporting action against global warming, as Dr. Peter Graham, Executive Director of GBPN, emphasized: “…data and modelling of building energy use has long been difficult or expensive to access – yet it is critical to policy development and investment in low-energy buildings. With the release of the BEPS open data model, GBPN are providing free access to the world’s best aggregated data analyses on building energy performance.”
The Linked Open Data (LOD) is modelled using the RDF Data Cube Vocabulary (a W3C recommendation), with 17 dimensions in the cube. In total, 235 million triples are available in RDF, including links to DBpedia and Geonames, linking indicators such as years, climate zones, regions, building types and user scenarios.
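To give an idea of what a single cell of such a data cube looks like in RDF, here is a hedged sketch of one observation. Only the qb: terms come from the W3C vocabulary; the ex: properties, resources and values are hypothetical illustrations, not the actual GBPN model:

```turtle
@prefix qb: <http://purl.org/linked-data/cube#> .
@prefix ex: <http://example.org/gbpn#> .  # hypothetical namespace

# One observation: a single data point fixed by its dimension values.
# The region dimension links out to Geonames, as the dataset links
# its indicators to DBpedia and Geonames.
ex:obs-2030-eu-residential
    a qb:Observation ;
    qb:dataSet      ex:beps-dataset ;
    ex:refYear      2030 ;
    ex:region       <http://sws.geonames.org/6255148/> ;  # Europe in Geonames
    ex:buildingType ex:Residential ;
    ex:energyUse    1234.5 .                              # hypothetical value
```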
Two live demos of PoolParty Semantic Integrator demonstrate new ways to retrieve information based on linked data technologies
Linked data graphs can be used to annotate and categorize documents. By transforming text into RDF graphs and linking them with LOD sources like DBpedia, Geonames and MeSH, completely new ways of querying large document repositories become possible.
An online demo illustrates these principles: imagine you are an information officer at the Global Health Observatory of the World Health Organisation. You inform policy makers about the global situation in specific disease areas in order to direct support to the health programs that need it. For your research you need data about disease prevalence in relation to socioeconomic factors.
Datasets and technology
About 160,000 scientific abstracts from PubMed, linked to three different disease categories, were collected. The abstracts were automatically annotated with PoolParty Extractor, based on terms from the Medical Subject Headings (MeSH) and Geonames that are organized in a SKOS thesaurus managed with PoolParty Thesaurus Server. The abstracts were then transformed to RDF and stored in a Virtuoso RDF store, where they can easily be combined with large linked data sources like DBpedia, Geonames or YAGO. The use of linked data makes it easy to, for example, group annotated countries by their Human Development Index (HDI). The hierarchical structure of the thesaurus was used to collect all concepts connected to a specific disease.
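A query of the kind described above might look as follows. This is a sketch only: the ex: annotation property is a hypothetical stand-in for the extractor's output, and the DBpedia HDI property name is illustrative rather than guaranteed:

```sparql
PREFIX owl: <http://www.w3.org/2002/07/owl#>
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX ex:  <http://example.org/annotations#>  # hypothetical namespace

# Count annotated abstracts per country, together with the country's
# Human Development Index taken from DBpedia via an owl:sameAs link.
SELECT ?country ?hdi (COUNT(?abstract) AS ?abstracts)
WHERE {
  ?abstract ex:mentionsCountry ?country .        # annotation from the extractor
  ?country  owl:sameAs         ?dbpediaCountry .
  ?dbpediaCountry dbo:humanDevelopmentIndex ?hdi .  # property name illustrative
}
GROUP BY ?country ?hdi
ORDER BY DESC(?hdi)
```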
The demo was developed using the sgvizler library to visualize SPARQL results; AngularJS was used to dynamically replace variables in SPARQL query templates.
Another example of linked data based search in the field of renewable energy can be tried out here.
GotoWebinar, March 20: Semantic Web for Developers – building semantic applications with PoolParty
This webinar gives insights into software development based on semantic web standards. We will give a short overview of frequently used standards (SPARQL, SKOS and RDF), application scenarios and technologies (OpenRDF, Virtuoso, Solr/Lucene), and we will give live demos of how to use PoolParty technologies to build a variety of semantic applications.
Part 1 (15min):
Short introduction to semantic web and linked data standards, overview of typical application scenarios
Part 2 (30min):
PoolParty architecture, components and APIs: making use of the linked data front end, SPARQL endpoint, PoolParty reports, thesaurus API (PPT API), extractor API (PPX API) and semantic search API (PPS API). For each API, an example will be shown, including the returned formats and how to make use of them in a programming language like PHP.
Part 3 (10min):
Putting the pieces together: Combine the APIs to build
– a semantic search engine
– a content recommender
– a linked data mashup
Part 4 (min. 5min):
Questions and answers
Register now: https://www4.gotomeeting.com/register/774130327