The Semantic Puzzle

Andreas Blumauer

PoolParty PowerTagging – bringing semantics to enterprises

PoolParty PowerTagging (PPP) is on its way: by extending Confluence's label management, it will soon support new application scenarios that make use of content recommendation and semantic indexing. PPP will be published at this year's Atlassian Summit and at SemTechBiz in San Francisco at the beginning of June.

The Problem: weak semantics

Tagging is still not a very popular task, especially in corporate environments. Many users don't see the benefit of creating metadata to describe the actual content. A typical argument against social tagging is that there are too many words for the same thing: “Even if I tag very diligently, my colleagues won't necessarily find my pages, because they will use different words to search for the content. I don't have enough time to insert ‘New York City’, ‘NYC’, ‘Big Apple’ etc. as labels”.

The result: the tagging facilities of enterprise software platforms like Confluence are rarely used and don't help to index content at all. Search is mostly based on classical full-text indexing. Semantic search, as seen more and more often on the WWW, has still not entered the enterprise realm.

The Solution: thesaurus-based indexing

W3C's Semantic Web technology stack provides the means to define controlled vocabularies like thesauri, and as a result more and more tools and data make use of standards like SKOS. Tagging based on thesauri means that concepts, rather than mere labels, are attached to pages and documents. Labels like ‘New York City’, ‘NYC’ and ‘Big Apple’ refer to the same concept, so it should be sufficient to use any one of these terms for labeling; all the other names of that concept should then be attached automatically.
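The idea of attaching a concept instead of a bare label can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not PowerTagging's implementation: the `Concept` and `Thesaurus` classes and the example.org URI are hypothetical, and a real SKOS thesaurus would carry `skos:prefLabel` and `skos:altLabel` values in RDF rather than in Python objects.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Concept:
    """Hypothetical stand-in for a SKOS concept: a preferred label plus alternative labels."""
    uri: str
    pref_label: str
    alt_labels: tuple = ()

    def all_labels(self):
        return [self.pref_label, *self.alt_labels]


class Thesaurus:
    def __init__(self, concepts):
        # Map every label (case-insensitively) to the concept it names.
        self._by_label = {}
        for concept in concepts:
            for label in concept.all_labels():
                self._by_label[label.casefold()] = concept

    def tag(self, label):
        """Return every name of the concept behind `label`, or the bare label if unknown."""
        concept = self._by_label.get(label.casefold())
        return concept.all_labels() if concept else [label]


# Illustrative data only: the URI is a placeholder.
nyc = Concept("http://example.org/thesaurus/new-york-city",
              "New York City", ("NYC", "Big Apple"))
thesaurus = Thesaurus([nyc])
print(thesaurus.tag("nyc"))  # ['New York City', 'NYC', 'Big Apple']
```

Whichever synonym a user types, the page ends up carrying all names of the concept, which is what makes later searches independent of the exact wording.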

PoolParty PowerTagging is able to analyse each Confluence page and automatically insert concepts from a thesaurus together with all of their names. Users can curate all suggested tags, or they can index their spaces automatically, resulting in a semantic index which makes search more comfortable than ever before.
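A toy version of this pipeline (detect thesaurus labels in page text, record the matched concepts in an index, then resolve a search term to its concept) might look like the following. It assumes a hypothetical in-memory thesaurus keyed by placeholder URIs and a plain dictionary as the index; it does not reflect how PowerTagging or Confluence's Lucene-based search work internally.

```python
import re
from collections import defaultdict

# Hypothetical mini-thesaurus: concept URI -> all of its names (preferred label first).
THESAURUS = {
    "http://example.org/thesaurus/new-york-city": ["New York City", "NYC", "Big Apple"],
    "http://example.org/thesaurus/confluence": ["Confluence", "Atlassian Confluence"],
}


def extract_concepts(text, thesaurus=THESAURUS):
    """Return the URIs of all concepts whose labels occur in the text as whole words."""
    found = set()
    for uri, labels in thesaurus.items():
        for label in labels:
            if re.search(r"\b" + re.escape(label) + r"\b", text, re.IGNORECASE):
                found.add(uri)
                break
    return found


def build_index(pages, thesaurus=THESAURUS):
    """Build a semantic index: concept URI -> set of page ids tagged with it."""
    index = defaultdict(set)
    for page_id, text in pages.items():
        for uri in extract_concepts(text, thesaurus):
            index[uri].add(page_id)
    return index


def semantic_search(query, index, thesaurus=THESAURUS):
    """Resolve the query to a concept via any of its labels, then look it up in the index."""
    for uri, labels in thesaurus.items():
        if query.casefold() in (label.casefold() for label in labels):
            return sorted(index.get(uri, set()))
    return []


pages = {
    "trip-report": "Notes from our visit to New York City.",
    "wiki-howto": "How to edit pages in Confluence.",
}
index = build_index(pages)
print(semantic_search("Big Apple", index))  # ['trip-report']
```

Label matching with regular expressions is deliberately naive here: it ignores inflection and ambiguity, which is where a curated thesaurus and human curation of the suggested tags matter.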

Usage: enhanced collaboration with enterprise knowledge models

There are two main application scenarios which can be realised on top of Confluence and its PowerTagging extension:

  • Semantic Search: fully integrated with Confluence's built-in Lucene-based search facility, PowerTagging means users no longer have to type in search phrases literally. Even if only ‘New York City’ appears verbatim on a page, a search for ‘Big Apple’ or ‘NYC’ will find it. This feature is especially interesting for domains in which many technical terms or abbreviations are commonly used, or for enterprises in multilingual environments.
  • Content recommendation: identifying similar and semantically matching content is a crucial task, especially in larger Confluence instances. Imagine you're working for a recruiting company and you would like to match a new open position against all the people in your applicant database. Or imagine you're working on technical documentation and could automatically provide your customers with further reading. Or imagine you're working on a slide deck and could see instantly whether some of your colleagues have worked on similar issues recently.
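Once pages are tagged with concepts, one simple way to score semantically matching content is the Jaccard similarity of their concept sets: the number of shared concepts divided by the number of distinct concepts on both pages. The sketch applies it to the recruiting example; the page ids and concept names are made up, and this is not necessarily the measure PowerTagging uses.

```python
def jaccard(a, b):
    """Shared concepts divided by all distinct concepts (0.0 if both sets are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def recommend(target_id, tagged_pages, top_n=3, min_score=0.0):
    """Rank other pages by the concept overlap they have with the target page."""
    target = tagged_pages[target_id]
    scored = [
        (jaccard(target, concepts), page_id)
        for page_id, concepts in tagged_pages.items()
        if page_id != target_id
    ]
    # Drop pages without any overlap, then sort by score (highest first), then by id.
    scored = [(score, page_id) for score, page_id in scored if score > min_score]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [(page_id, round(score, 2)) for score, page_id in scored[:top_n]]


# Hypothetical pages, each reduced to the set of thesaurus concepts attached to it.
tagged = {
    "position-java-dev": {"java", "spring", "sparql"},
    "applicant-anna": {"java", "spring", "python"},
    "applicant-ben": {"sparql", "skos"},
    "applicant-carl": {"marketing"},
}
print(recommend("position-java-dev", tagged))
# [('applicant-anna', 0.5), ('applicant-ben', 0.25)]
```

Because the sets contain concepts rather than raw words, two pages that describe the same skill with different synonyms still count as overlapping.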

Don't reinvent the wheel again and again. Save time and money. PPP will help you fulfil these tasks and create rich content more efficiently than ever before. You can link similar content within Confluence automatically, and you can even fetch further reading from the Web, for example from Wikipedia.

If you are interested in trying out PowerTagging, please drop us a note and we will be happy to support you!

Andreas Blumauer

Exploiting Big Data: Linked Data and SKOS

Yesterday I gave a webinar on the question of which role SKOS plays in the linked data game. Just the day before, I had discovered an interesting white paper published by Fujitsu which clearly states that linked data and SKOS are excellent approaches to ‘create additional value in linking and exploiting big data for business benefit’.

I had at least five scenarios in mind in which SKOS and linked data in general can be combined. Take a look at the slides or watch the video to find out …

  • how to publish SKOS thesauri as linked data
  • how to generate SKOS from LOD sources like DBpedia
  • how to make use of SKOS thesauri for entity extraction & content enrichment from LOD sources
  • how to use linked data mechanisms for collaborative thesaurus management
  • how to use SKOS for linked data alignment & better disambiguation
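For the first scenario, publishing a thesaurus as linked data starts with serializing its concepts in an RDF syntax. The following sketch writes a small SKOS concept scheme as Turtle using only the standard library. The URIs are placeholders and the input format (a list of dictionaries) is hypothetical; in practice an RDF library such as rdflib would normally handle serialization, and making the concept URIs dereferenceable over HTTP is a separate step not shown here.

```python
SKOS_PREFIX = "@prefix skos: <http://www.w3.org/2004/02/skos/core#> ."


def turtle_literal(text, lang):
    """Escape a string as a language-tagged Turtle literal."""
    escaped = (text.replace("\\", "\\\\").replace('"', '\\"')
                   .replace("\n", "\\n"))
    return f'"{escaped}"@{lang}'


def concepts_to_turtle(scheme_uri, concepts, lang="en"):
    """Serialize a flat list of concept dicts as a SKOS concept scheme in Turtle."""
    lines = [SKOS_PREFIX, "", f"<{scheme_uri}> a skos:ConceptScheme ."]
    for c in concepts:
        lines += ["", f"<{c['uri']}> a skos:Concept ;",
                  f"    skos:inScheme <{scheme_uri}> ;"]
        for alt in c.get("alt", []):
            lines.append(f"    skos:altLabel {turtle_literal(alt, lang)} ;")
        for broader in c.get("broader", []):
            lines.append(f"    skos:broader <{broader}> ;")
        # The preferred label closes the statement with a full stop.
        lines.append(f"    skos:prefLabel {turtle_literal(c['pref'], lang)} .")
    return "\n".join(lines)


# Hypothetical example data with placeholder URIs.
scheme = "http://example.org/thesaurus"
concepts = [
    {"uri": "http://example.org/thesaurus/city", "pref": "City"},
    {"uri": "http://example.org/thesaurus/new-york-city", "pref": "New York City",
     "alt": ["NYC", "Big Apple"], "broader": ["http://example.org/thesaurus/city"]},
]
print(concepts_to_turtle(scheme, concepts))
```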
Martin Kaltenböck

reegle.info – linked (open) energy data cloud

Access to the latest high quality information on renewables, energy efficiency and climate change is fundamental to the acceleration of the clean energy marketplace, facilitating investments, promoting new legislation and regulations and broadening interest and knowledge in the sector.

reegle.info acts as a unique clean energy information portal, targeting specific stakeholders including governments, project developers, businesses, financiers, NGOs, academia, international organizations and civil society. Alongside comprehensive country energy profiles, energy statistics and a directory of relevant stakeholders it also offers the clean energy search and an extensive glossary. There is also an insightful clean energy blog with interesting and up-to-date background information.

As reegle.info provides relevant clean energy data from several key open energy data sources, for instance OpenEI, World Bank Data or the UK Open Data Portal, the reegle.info Information Gateway has a strong need for efficient and automated data management mechanisms and technologies! Therefore REEEP (the Renewable Energy and Energy Efficiency Partnership) decided to use Linked Open Data (LOD).

Linked Open Data gives reegle.info a powerful means of sustainable data management and data integration. This is how the current reegle.info linked (open) energy data cloud came into being; it looks as follows:

The figure of the reegle.info linked (open) energy data cloud above shows the model behind the scenes of the reegle.info clean energy information gateway, giving an insight into the sources and the connections / links between the various sources and data sets.

For the realisation of the Linked Open Data-based reegle.info system, the following software components are in use:

Taking this direction makes the reegle.info key clean energy portal very flexible for future expansions in data integration and data management, as new data sets from various data sources can be added!

By the way, reegle.info is very open too: all data generated by REEEP is available via a SPARQL endpoint on the reegle.info data portal, free for re-use under the UK Open Government Data license!
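Querying a SPARQL endpoint like this needs nothing beyond the Python standard library. The sketch follows the SPARQL 1.1 Protocol (query via HTTP GET with a `query` parameter) and parses the standard SPARQL JSON results format. The endpoint URL is a placeholder, not the real reegle.info address, and the example query assumes the data contains SKOS concepts with preferred labels; adjust both to what the data portal actually documents.

```python
import json
import urllib.parse
import urllib.request

# Placeholder: replace with the SPARQL endpoint URL published on the reegle.info data portal.
ENDPOINT = "https://example.org/sparql"

# Assumes the data uses SKOS concepts with preferred labels.
QUERY = """PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
SELECT ?concept ?label WHERE {
  ?concept a skos:Concept ;
           skos:prefLabel ?label .
} LIMIT 10"""


def build_request(endpoint, query):
    """Build a SPARQL 1.1 Protocol query-via-GET request that asks for JSON results."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    return urllib.request.Request(
        url, headers={"Accept": "application/sparql-results+json"})


def rows(results_json):
    """Flatten a SPARQL JSON results document into a list of {variable: value} dicts."""
    doc = json.loads(results_json)
    return [{var: term["value"] for var, term in binding.items()}
            for binding in doc["results"]["bindings"]]


def run_query(endpoint=ENDPOINT, query=QUERY, timeout=30):
    """Send the query over HTTP and return the result rows (needs network access)."""
    with urllib.request.urlopen(build_request(endpoint, query), timeout=timeout) as resp:
        return rows(resp.read().decode("utf-8"))
```

With the real endpoint URL filled in, `run_query()` returns a list of dictionaries such as `{"concept": ..., "label": ...}`, one per result row.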

Try it out and make use of free high quality clean energy data!

Helmut Nagy

LOD2 Plenary Vienna (March 2012) – 3rd day – afternoon session

Promising title. After two and a half days (well, for almost all of us), we entered the final phase of the plenary: two and a half days of intense and interesting discussions, catching up with all that has been done so far and planning what should happen in the next half year. But there were still two sessions ahead of us.

The afternoon started with a discussion of WP9, the “Open Government Data” use case. First, Uroš Milošević from Institut Mihajlo Pupin (IMP) reported on the Serbian CKAN project, which already holds some data from the Statistical Office of Serbia. Tools from the LOD2 stack have been and will be used for this project as well. Sounds great!

Then Irina Bolychevsky of OKFN continued the session, announcing that a better integration between CKAN and the LOD2 Stack should be built to get more RDF into publicdata.eu. Good idea! We collected ideas for the integration and talked, for example, about a wizard for generating RDF from .csv files (ULEI is working on something like that). An integration of Google Refine (http://code.google.com/p/google-refine/) was also discussed. The consortium decided to run an extraction sprint, transforming a (yet to be defined) number of interesting data sets from CKAN (http://www.ckan.net) to RDF.
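The CSV-to-RDF wizard idea boils down to a mapping from table rows to triples. Here is a deliberately naive sketch, not the ULEI tool: each row becomes a resource named after a key column, every other column becomes a property, and every non-empty cell becomes a plain literal in N-Triples. The base URI and the `resource/` and `property/` URI patterns are hypothetical; a real wizard would let users choose vocabularies, datatypes and links to existing resources.

```python
import csv
import io
import urllib.parse


def ntriples_literal(value):
    """Escape a plain string as an N-Triples literal."""
    escaped = (value.replace("\\", "\\\\").replace('"', '\\"')
                    .replace("\n", "\\n").replace("\r", "\\r"))
    return f'"{escaped}"'


def csv_to_ntriples(csv_text, base, key_column):
    """Turn each CSV row into triples: one subject per row, one predicate per column."""
    triples = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        subject = f"<{base}resource/{urllib.parse.quote(row[key_column], safe='')}>"
        for column, value in row.items():
            # Skip the key column itself and empty or missing cells.
            if column == key_column or not value:
                continue
            predicate = f"<{base}property/{urllib.parse.quote(column, safe='')}>"
            triples.append(f"{subject} {predicate} {ntriples_literal(value)} .")
    return "\n".join(triples)


# Hypothetical input: a tiny CSV table describing one data set.
data = "id,title,format\nds-1,Example dataset,CSV\n"
print(csv_to_ntriples(data, "http://example.org/", "id"))
```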

Finally, we discussed whether linked data is a (or the) solution for CKAN to find data, find related data, etc. Well, I think the people in the consortium are pretty sure it is (I'm not so sure the people from OKFN are). Irina and Mark from OKFN invited everyone to provide input to the use case.

This session ended with a presentation on WP9a by Jindřich Mynarz from UEP and Martin Nečaský from CU. They are developing a distributed marketplace for public contracts. An ontology for public contracts has been developed and is open for review on Google Code. The next step will be a web application for filing/creating public contracts in RDF as linked data, using tools from the stack. So all in all, pretty good progress in WP9.

The third day and the plenary ended with Martin Kaltenböck from SWC and Sören Auer, our project lead from ULEI, presenting WP10-11-12: Dissemination, Exploitation and Project Management. First, we voted for our next plenary to be in Cambridge (hosted by OKFN). Past dissemination activities had already been presented on day one, so Martin reminded us all to write blog posts about all the great things we are doing in LOD2. The next big dissemination activity, and also a good opportunity to meet people from the consortium, will be the European Data Forum on June 6-7 in Copenhagen.

And that was pretty much it. I enjoyed, as I hope all the others did, three days with a bunch of great people from all over Europe working on a great project. As always it was intense, but it was also fun. Hope everyone had a safe trip home!
