Martin Kaltenböck

GBPN Knowledge Platform using Semantic Technologies and Linked Open Data launched

The brand-new, web-based GBPN Knowledge Platform was launched on 21 February 2013. It helps the building sector effectively reduce its impact on climate change!

It has been designed as a participative knowledge and data hub for harvesting, sharing and curating best-practice policies in building energy performance globally. Available in English and soon in Mandarin, this new web-based tool of the Global Buildings Performance Network (GBPN) aims to stimulate collective research and analysis from experts worldwide to promote better decision-making and help the building sector effectively reduce its impact on climate change. To sustain and accelerate change in the building sector, the GBPN encourages open and transparent access to good-quality, verifiable data. The data can be used and re-used in HTML, PDF and machine-readable raw data (CSV) formats, provided under a Creative Commons Attribution (CC-BY 3.0 FR) license.

The GBPN Knowledge Platform is built on the Drupal CMS and seamlessly connected with the PoolParty Semantic Information Management Platform of Semantic Web Company. The platform thereby makes use of semantic technologies and Linked Open Data (LOD) principles and techniques under the hood. Much of the data available in the various GBPN tools is provided as (linked) open data under a Creative Commons Attribution license. The Semantic Web Company is responsible for the conceptual design and technical implementation of the GBPN Knowledge Platform.

What follows is an overview and description of the most important features, tools and services of the information management system.


Andreas Blumauer

State-of-the-art Text Mining: PoolParty Extractor 2.1.1 released

PoolParty Extractor (PPX) is part of the PoolParty product family and forms the basis for state-of-the-art text mining applications.

The idea behind PPX is to underpin automatic text mining algorithms with domain-specific knowledge from thesauri and linked data sources. This is the precondition for extracting meaning from unstructured information more precisely and with higher performance. PoolParty Extractor supports the following application scenarios:

  • automatic document categorisation
  • named entity extraction based on concepts from thesauri or other knowledge models
  • text analysis to improve semantic indexing
  • automatic transformation of unstructured text to an RDF based linked data source
  • linking and enrichment of text with structured data from databases or XML-documents
  • extended indexing by using inflected forms of words and by splitting of compound words
  • generation and continuous improvement of thesauri by text corpus analysis
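
To make the thesaurus-based extraction scenario more concrete, here is a minimal Python sketch of matching thesaurus labels in a text. It is not PPX code and does not use the PPX API: the `THESAURUS` dictionary, the example.org URIs and both function names are assumptions made up for illustration.

```python
import re

# Hypothetical mini-thesaurus: concept URI -> list of labels.
THESAURUS = {
    "http://example.org/concept/energy-efficiency": [
        "energy efficiency",
        "energy performance",
    ],
    "http://example.org/concept/building-code": [
        "building code",
        "building regulation",
    ],
}


def build_matcher(thesaurus):
    """Compile one regex over all labels and map each label back to its concept."""
    label_to_concept = {
        label.lower(): uri
        for uri, labels in thesaurus.items()
        for label in labels
    }
    # Try longer labels first so they win over shorter overlapping ones.
    alternatives = sorted(label_to_concept, key=len, reverse=True)
    pattern = re.compile(
        r"\b(?:" + "|".join(re.escape(a) for a in alternatives) + r")\b",
        re.IGNORECASE,
    )
    return pattern, label_to_concept


def extract_concepts(text, thesaurus):
    """Return a dict of concept URI -> number of label matches found in the text."""
    pattern, label_to_concept = build_matcher(thesaurus)
    counts = {}
    for match in pattern.finditer(text):
        uri = label_to_concept[match.group(0).lower()]
        counts[uri] = counts.get(uri, 0) + 1
    return counts
```

Because synonyms resolve to the same concept URI, a text that mentions both "energy performance" and "energy efficiency" is counted twice for one concept rather than once for two unrelated strings. This is the kind of benefit a knowledge model adds over plain keyword matching.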

PoolParty Extractor can be integrated smoothly with third-party systems such as CMS, DMS, communication platforms, wikis etc. PPX is fully based on Java and provides an HTTP API. Integrations with SharePoint, Confluence, WordPress and others exist; please tell us about your use case!

The latest release 2.1.1 of PPX further extends the capabilities to extract meaning from text with high precision and high performance:

  • Use of tf-idf (term frequency-inverse document frequency)
    • Creation of a text corpus for tf-idf
    • Use of tf-idf calculation during extraction
    • Corpus / thesaurus alignment
      • Show missing concepts
      • Show unused concepts
  • Use regular expressions to match specific patterns in texts
  • Use parts of the thesaurus as dynamic components for regular expressions
  • Calculate inflected forms (at the moment for German)
    • Word forms are added to the extraction model and used during extraction
    • List of inflected forms can be imported to thesaurus
  • Split compound words (at the moment for German)
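
For readers unfamiliar with tf-idf, the following Python sketch shows the textbook calculation against a reference corpus. It is an illustration only: the post does not document how PPX weights or normalises terms, so the formula `tf * log(N / df)`, the whitespace tokenizer and all function names are assumptions.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace; a stand-in for real linguistic analysis."""
    return text.lower().split()


def document_frequencies(corpus):
    """Count, for each term, how many corpus documents contain it."""
    df = Counter()
    for doc in corpus:
        df.update(set(tokenize(doc)))
    return df


def tf_idf(text, corpus):
    """Score each term of `text` as tf * log(N / df).

    Terms that never occur in the corpus are skipped, since their
    document frequency is zero and the idf would be undefined.
    """
    df = document_frequencies(corpus)
    n_docs = len(corpus)
    tokens = tokenize(text)
    tf = Counter(tokens)
    return {
        term: (count / len(tokens)) * math.log(n_docs / df[term])
        for term, count in tf.items()
        if df[term] > 0
    }
```

A term that appears in every corpus document gets a score of zero, while a term that is frequent in the analysed text but rare in the corpus scores high. That is why a domain-specific text corpus helps separate characteristic terms from general vocabulary during extraction.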

PPX can be tested online as a web service. Please send us a short note describing your interest and we will provide further details.

Andreas Blumauer

PoolParty Thesaurus Manager 3.1 with auto-population feature was presented at SemTechBiz 2012 in San Francisco

A new PoolParty Thesaurus Manager (PPT) release was presented at this year's Semantic Technology & Business Conference in San Francisco: Version 3.1.0 is a major release offering many new functionalities and improvements, including auto-population of thesauri and linked data knowledge models.

The main new features are:

  • Autopopulation of Thesauri from DBpedia
    The Skossy functionality has been integrated into PPT. You can assign DBpedia categories to concepts and then autopopulate your thesaurus based on data from DBpedia.

  • Linked Data Based Synonym and Translation Service
    You can add labels (pref, alt, hidden) to the concepts of your thesaurus based on suggestions for synonyms and translations provided by data from DBpedia.
  • ADMS Description for Projects
    Metadata for PoolParty projects can now be published according to the Asset Description Metadata Schema (ADMS) developed by the joinup project of the European Union.

  • Windows Theme
    A new theme has been added based on the Windows GUI guidelines.

Andreas Koller from Semantic Web Company: “SemTechBiz 2012 was a great success for us, we had a lot of talks with people from various industries at our booth. Demonstrating how building knowledge models on top of linked data sources can improve text mining for example, attracted wide interest. We enjoyed the whole conference, the location and the support from the organization team.”

To get an overview of all changes made in Release 3.1.0, take a look at the Release Notes.

Andreas Blumauer

Exploiting Big Data: Linked Data and SKOS

Yesterday I gave a webinar on the role SKOS plays in the linked data game. Just the day before, I discovered an interesting white paper published by Fujitsu which clearly states that linked data and SKOS are excellent approaches to ‘create additional value in linking and exploiting big data for business benefit’.

I had at least five scenarios in mind in which SKOS and linked data in general can be combined. Take a look at the slides or watch the video to find out …

  • how to publish SKOS thesauri as linked data
  • how to generate SKOS from LOD sources like DBpedia
  • how to make use of SKOS thesauri for entity extraction & content enrichment from LOD sources
  • how to use linked data mechanisms for collaborative thesaurus management
  • how to use SKOS for linked data alignment & better disambiguation
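
As a rough illustration of the first point, a SKOS concept published as linked data is just a handful of RDF statements about a URI. The Python sketch below writes one concept in Turtle syntax; the function name, the example.org URIs and the labels are assumptions, and a real project would more likely use an RDF library or a thesaurus manager than string formatting.

```python
PREFIX = "@prefix skos: <http://www.w3.org/2004/02/skos/core#> .\n\n"


def skos_concept_turtle(uri, pref_label, alt_labels=(), broader=None, lang="en"):
    """Serialize one SKOS concept as a Turtle snippet.

    Simplification: labels are inserted verbatim, so they must not
    contain double quotes or backslashes.
    """
    lines = [f"<{uri}> a skos:Concept ;"]
    lines.append(f'    skos:prefLabel "{pref_label}"@{lang} ;')
    for alt in alt_labels:
        lines.append(f'    skos:altLabel "{alt}"@{lang} ;')
    if broader:
        lines.append(f"    skos:broader <{broader}> ;")
    # Turn the trailing ";" of the last statement into the terminating ".".
    lines[-1] = lines[-1][:-1] + "."
    return "\n".join(lines)


document = PREFIX + skos_concept_turtle(
    "http://example.org/concept/thesaurus",  # hypothetical concept URI
    "thesaurus",
    alt_labels=["controlled vocabulary"],
    broader="http://example.org/concept/knowledge-model",  # hypothetical
)
```

The `skos:broader` link is what turns an isolated term into part of a navigable hierarchy, and because every concept has its own URI, other datasets can point at it.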