The Semantic Puzzle

Andreas Blumauer

PoolParty PowerTagging – bringing semantics to enterprises

PoolParty PowerTagging (PPP) is on its way: by extending Confluence’s label management, it will soon support new application scenarios that make use of content recommendation and semantic indexing. PPP will be launched at this year’s Atlassian Summit and at SemTechBiz in San Francisco at the beginning of June.

The Problem: weak semantics

Tagging is still not a very popular task, especially in corporate environments. Many users don’t see the benefit of creating metadata to describe the actual content. A typical counter-argument to social tagging is that there are too many words for the same thing: “Even if I tag very diligently, my colleagues won’t necessarily find my pages, because they will use different words to search for the content. I don’t have enough time to insert ‘New York City’, ‘NYC’, ‘Big Apple’ etc. as labels.”

The result: the tagging facilities of enterprise software platforms like Confluence are rarely used and don’t help to index content at all. Search is mostly based on classical full-text indexing. Semantic search, as seen more and more on the WWW, has still not entered the enterprise realm.

The Solution: thesaurus-based indexing

W3C’s Semantic Web technology stack provides the means to define controlled vocabularies such as thesauri, and a growing number of tools and datasets now make use of standards like SKOS. Tagging based on thesauri means attaching concepts to pages and documents rather than just putting labels on them. Labels like ‘New York City’, ‘NYC’ and ‘Big Apple’ refer to the same concept, so it should be sufficient to use one of these terms for labeling; all the other names of that concept should then be attached automatically.
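To make the difference between a concept and a plain label concrete, here is a minimal Python sketch, assuming a tiny in-memory thesaurus. `Concept`, `build_label_index` and `tag` are hypothetical names chosen for illustration, not part of PoolParty or any SKOS library, and the `example.org` URI is made up. `skos:prefLabel` and `skos:altLabel` are the standard SKOS properties for a concept’s preferred and alternative names; `to_turtle` shows roughly how such a concept is written in Turtle, assuming the `skos:` prefix is declared elsewhere.

```python
from dataclasses import dataclass, field


@dataclass
class Concept:
    """A thesaurus concept in the spirit of SKOS: one preferred label plus alternatives."""
    uri: str
    pref_label: str
    alt_labels: list[str] = field(default_factory=list)

    def all_labels(self) -> list[str]:
        # Preferred label first, then every alternative name.
        return [self.pref_label, *self.alt_labels]

    def to_turtle(self) -> str:
        # Serialize as SKOS in Turtle syntax (assumes '@prefix skos:' is declared elsewhere).
        lines = [f"<{self.uri}> a skos:Concept ;",
                 f'    skos:prefLabel "{self.pref_label}"@en']
        for alt in self.alt_labels:
            lines[-1] += " ;"  # previous statement continues
            lines.append(f'    skos:altLabel "{alt}"@en')
        lines[-1] += " ."  # close the last statement
        return "\n".join(lines)


def build_label_index(concepts: list[Concept]) -> dict[str, Concept]:
    # Map every label (case-folded) to its concept, so any synonym resolves to the same concept.
    return {label.casefold(): c for c in concepts for label in c.all_labels()}


def tag(label: str, index: dict[str, Concept]) -> list[str]:
    # Tagging with one name attaches the whole concept, i.e. all of its names.
    # Labels not found in the thesaurus stay plain labels.
    concept = index.get(label.casefold())
    return concept.all_labels() if concept else [label]
```

With `nyc = Concept("http://example.org/thesaurus/new-york-city", "New York City", ["NYC", "Big Apple"])` indexed, `tag("nyc", index)` returns all three names of the concept, so a user only has to type one of them.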

PoolParty PowerTagging can analyse each Confluence page and automatically insert concepts from a thesaurus together with all of their names. Users can curate all suggested tags, or they can index their spaces automatically, resulting in a semantic index that makes search more comfortable than ever before.
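The post does not describe how PPP analyses a page, so the following Python sketch only illustrates the general idea under simple assumptions: a dictionary-based thesaurus (`THESAURUS`, hypothetical) and whole-word, case-insensitive label matching. `suggest_concepts` stands in for the curated mode (suggestions a user can accept or reject) and `auto_index` for unattended indexing of a whole space; both names are made up for this example.

```python
import re

# Hypothetical mini-thesaurus: concept URI -> all names (preferred label first).
THESAURUS = {
    "http://example.org/thesaurus/new-york-city": ["New York City", "NYC", "Big Apple"],
    "http://example.org/thesaurus/confluence": ["Confluence", "Atlassian Confluence"],
}


def suggest_concepts(text: str, thesaurus: dict[str, list[str]]) -> dict[str, list[str]]:
    """Return every concept with at least one name in the text, mapped to all of its names."""
    suggestions = {}
    for uri, labels in thesaurus.items():
        for label in labels:
            # Whole-word, case-insensitive match, so 'NYC' does not fire inside 'NYCE'.
            if re.search(rf"\b{re.escape(label)}\b", text, flags=re.IGNORECASE):
                suggestions[uri] = labels  # attach the concept with all its names
                break
    return suggestions


def auto_index(pages: dict[str, str], thesaurus: dict[str, list[str]]) -> dict[str, set[str]]:
    # Unattended mode: tag every page without curation, producing page title -> concept URIs.
    return {title: set(suggest_concepts(body, thesaurus)) for title, body in pages.items()}
```

Matching on word boundaries keeps short labels such as ‘NYC’ from firing inside longer words; a production tagger would additionally have to deal with inflected forms, ambiguous labels and overlapping matches.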

Usage: enhanced collaboration with enterprise knowledge models

There are two main application scenarios which can be realised on top of Confluence and its PowerTagging extension:

  • Semantic search: PPP is fully integrated with Confluence’s built-in Lucene-based search facility, so users no longer have to type in search phrases literally. Even if a page mentions only ‘New York City’ word for word, searching for ‘Big Apple’ or ‘NYC’ will still find it. This feature is especially interesting for domains in which many technical terms or abbreviations are commonly used, and for enterprises in multilingual environments.
  • Content recommendation: Identifying similar and semantically matching content is a crucial task, especially in larger Confluence instances. Imagine you’re working for a recruiting company and would like to match a new open position with all people in your applicant database. Or imagine you’re working on technical documentation and can automatically provide your customers with further reading. Or imagine you’re working on a slide deck and see instantly whether some of your colleagues have recently worked on similar issues.
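For the semantic search scenario, one simple way to get the described behaviour is query expansion: the search term is replaced by all names of its concept before matching. The Python sketch below assumes hypothetical synonym rings (including an English/German pair, ‘Vienna’/‘Wien’, to hint at the multilingual case) and uses plain regular expressions as a stand-in for Confluence’s Lucene index; it does not show how PPP actually hooks into Lucene.

```python
import re

# Hypothetical synonym rings taken from a thesaurus: each set holds all names of one concept.
SYNONYM_RINGS = [
    {"New York City", "NYC", "Big Apple"},
    {"Vienna", "Wien"},  # English and German name of the same concept
]


def expand_query(query: str) -> set[str]:
    """Expand a query term to all names of its concept; unknown terms stay as they are."""
    for ring in SYNONYM_RINGS:
        if query.casefold() in {name.casefold() for name in ring}:
            return set(ring)
    return {query}


def semantic_search(query: str, pages: dict[str, str]) -> list[str]:
    # A page matches if its text contains ANY name of the query's concept (an OR query).
    terms = expand_query(query)
    pattern = re.compile(
        "|".join(rf"\b{re.escape(t)}\b" for t in sorted(terms)), re.IGNORECASE
    )
    return sorted(title for title, body in pages.items() if pattern.search(body))
```

A search for ‘Big Apple’ then finds a page that only mentions ‘New York City’, and a search for ‘Vienna’ finds a page that only says ‘Wien’.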

Don’t reinvent the wheel again and again. Save time and money. PPP will help you accomplish these tasks while creating rich content more efficiently than ever before. You can link similar content within Confluence automatically, and you can even fetch further reading from the WWW, for example from Wikipedia.
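The post does not say how PPP measures similarity between pages. A common, easily explained choice, used here purely as an assumption, is the Jaccard coefficient over the sets of concepts attached to two pages, |A ∩ B| / |A ∪ B|. `recommend` and the `ex:` concept identifiers are hypothetical; the intended use mirrors the recruiting scenario, ranking applicant pages against an open position.

```python
def jaccard(a: set[str], b: set[str]) -> float:
    # Share of concepts two pages have in common, from 0.0 (none) to 1.0 (identical).
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def recommend(
    page: str,
    index: dict[str, set[str]],
    top_n: int = 3,
    min_score: float = 0.0,
) -> list[tuple[str, float]]:
    """Rank the other pages by the overlap of their concepts with `page`."""
    target = index[page]
    scored = [
        (other, jaccard(target, concepts))
        for other, concepts in index.items()
        if other != page
    ]
    # Drop pages that share nothing (or too little) with the target.
    scored = [(other, score) for other, score in scored if score > min_score]
    # Highest similarity first; ties broken alphabetically for a stable order.
    scored.sort(key=lambda item: (-item[1], item[0]))
    return scored[:top_n]
```

For a job page tagged `ex:java`, `ex:spring` and `ex:vienna`, an applicant tagged `ex:java`, `ex:spring` and `ex:munich` scores 2/4 = 0.5. Because concepts rather than raw words are compared, pages that use different synonyms for the same thing still count as similar.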

If you are interested in trying out PowerTagging, please drop us a note and we will be happy to support you!