The Semantic Puzzle

Andreas Blumauer

Linked data based search: Make use of linked data to provide means for complex queries

Two live demos of PoolParty Semantic Integrator demonstrate new ways to retrieve information based on linked data technologies


Linked data graphs can be used to annotate and categorize documents. By transforming text into RDF graphs and linking them with Linked Open Data (LOD) sources such as DBpedia, GeoNames, MeSH etc., completely new ways of querying large document repositories become possible.
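To make the annotation idea concrete, here is a minimal Python sketch. It is not PoolParty's actual pipeline. It matches thesaurus labels in a text and emits N-Triples that link the document to thesaurus concepts, and those concepts to DBpedia resources via `owl:sameAs`. The example.org URIs, the `annotate` helper and the naive substring matching are hypothetical stand-ins for a real extractor. `dcterms:subject` and `owl:sameAs` are standard vocabulary terms, but the demo may well use different properties.

```python
# Minimal sketch: annotate a text with thesaurus concepts and emit N-Triples.
# All example.org URIs are hypothetical placeholders; naive substring matching
# stands in for a real entity extractor.

DCT_SUBJECT = "http://purl.org/dc/terms/subject"
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"


def nt(s, p, o):
    """Format one N-Triples line whose subject, predicate and object are IRIs."""
    return f"<{s}> <{p}> <{o}> ."


def annotate(doc_uri, text, thesaurus):
    """Return triples linking doc_uri to every concept whose label occurs in text.

    thesaurus maps a lower-case label to (concept_uri, linked_data_uri_or_None).
    Labels are processed in alphabetical order so the output is deterministic.
    """
    lowered = text.lower()
    triples = []
    for label, (concept, external) in sorted(thesaurus.items()):
        if label in lowered:
            triples.append(nt(doc_uri, DCT_SUBJECT, concept))
            if external:
                # Link the local concept to a LOD resource to provide context.
                triples.append(nt(concept, OWL_SAME_AS, external))
    return triples


thesaurus = {
    "malaria": ("http://example.org/thesaurus/Malaria",
                "http://dbpedia.org/resource/Malaria"),
    "kenya": ("http://example.org/thesaurus/Kenya",
              "http://dbpedia.org/resource/Kenya"),
}
abstract = "Prevalence of malaria among children in rural Kenya."

for line in annotate("http://example.org/doc/1", abstract, thesaurus):
    print(line)
```

Once such triples sit in a triple store next to DBpedia or GeoNames data, any property of the linked resource becomes available as a query facet for the documents.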

An online demo illustrates these principles: imagine you are an information officer at the Global Health Observatory of the World Health Organisation. You inform policy makers about the global situation in specific disease areas, so that support can be directed to the health programmes that need it. For your research, you need data on disease prevalence in relation to socioeconomic factors.

Datasets and technology

About 160,000 scientific abstracts from PubMed, linked to three different disease categories, were collected. The abstracts were automatically annotated with PoolParty Extractor, based on terms from the Medical Subject Headings (MeSH) and GeoNames that are organized in a SKOS thesaurus managed with PoolParty Thesaurus Server. The abstracts were then transformed to RDF and stored in the Virtuoso RDF store. From there, it is easy to combine these data sets within the triple store with large linked data sources such as DBpedia, GeoNames or YAGO. Linked data makes it easy, for example, to group the annotated countries by their Human Development Index (HDI). The hierarchical structure of the thesaurus was used to collect all concepts that are connected to a specific disease.
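The two query-side steps described above, expanding a disease into all of its narrower thesaurus concepts and grouping the matching documents by HDI, can be sketched in plain Python over in-memory data. In a triple store one would more typically express the expansion in SPARQL 1.1 with a property path such as `skos:narrower*`. The hierarchy fragment, the placeholder countries, their HDI groups and the `expand` / `count_by_hdi` helpers are all invented for illustration and are not the demo's data or code.

```python
from collections import defaultdict, deque

# Hypothetical SKOS fragment (concept -> its skos:narrower concepts),
# loosely modeled on MeSH headings; not taken from the demo's thesaurus.
NARROWER = {
    "Infectious diseases": ["Malaria", "Tuberculosis"],
    "Malaria": ["Malaria, Falciparum", "Malaria, Vivax"],
    "Tuberculosis": ["Tuberculosis, Pulmonary"],
}

# Hypothetical annotated abstracts and HDI groups for placeholder countries.
DOCS = [
    {"concepts": {"Malaria, Vivax"}, "countries": {"Country A"}},
    {"concepts": {"Tuberculosis, Pulmonary"}, "countries": {"Country A", "Country B"}},
    {"concepts": {"Diabetes"}, "countries": {"Country B"}},
]
HDI_GROUP = {"Country A": "low", "Country B": "high"}


def expand(concept, narrower):
    """Return concept plus every concept reachable via skos:narrower (breadth-first)."""
    seen = {concept}
    queue = deque([concept])
    while queue:
        current = queue.popleft()
        for child in narrower.get(current, []):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return seen


def count_by_hdi(docs, disease_concepts, hdi_group):
    """Count documents mentioning any disease concept, per HDI group of their countries.

    A document annotated with several countries in the same group is counted
    once for that group; countries without a known group go to "unknown".
    """
    counts = defaultdict(int)
    for doc in docs:
        if doc["concepts"] & disease_concepts:
            groups = {hdi_group.get(c, "unknown") for c in doc["countries"]}
            for group in groups:
                counts[group] += 1
    return dict(counts)


print(count_by_hdi(DOCS, expand("Infectious diseases", NARROWER), HDI_GROUP))
```

Expanding the hierarchy before matching is what lets a query for a broad disease area also find abstracts that were only annotated with a more specific concept.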

The demo was built with the sgvizler library, which visualizes SPARQL results, and with AngularJS, which was used to dynamically replace variables in SPARQL query templates.
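The query-template idea can be illustrated without AngularJS. The following sketch, written in Python rather than JavaScript, fills a placeholder in a hypothetical SPARQL template with a user-supplied label, escaping it first so it remains a valid SPARQL string literal. The template text, the `build_query` and `sparql_string` helpers and the choice of `skos:prefLabel` and `dcterms:subject` are assumptions for illustration, not the demo's actual queries.

```python
from string import Template

# Hypothetical query template: $concept_label is filled in from user input.
# skos:narrower* (a SPARQL 1.1 property path) also matches narrower concepts.
QUERY_TEMPLATE = Template("""PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
PREFIX dcterms: <http://purl.org/dc/terms/>
SELECT ?doc WHERE {
  ?top skos:prefLabel "$concept_label"@en .
  ?top skos:narrower* ?concept .
  ?doc dcterms:subject ?concept .
}""")


def sparql_string(value):
    """Escape characters that may not appear unescaped inside a "..." SPARQL literal."""
    return (value.replace("\\", "\\\\")   # backslash first, so later escapes survive
                 .replace('"', '\\"')
                 .replace("\n", "\\n")
                 .replace("\r", "\\r"))


def build_query(label):
    """Return the template with the escaped label substituted for $concept_label."""
    return QUERY_TEMPLATE.substitute(concept_label=sparql_string(label))


print(build_query("Malaria"))
```

Escaping the value before substitution keeps user input from breaking out of the string literal and altering the query structure.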

Another example of linked data based search in the field of renewable energy can be tried out here.