Thomas Thurner

Automatic Semantic Tagging for Drupal CMS launched

REEEP [1] and CTCN [2] have recently launched Climate Tagger, a new tool to automatically scan, label, sort and catalogue datasets and document collections. Climate Tagger now incorporates a Drupal Module for automatic annotation of Drupal content nodes. Climate Tagger addresses knowledge-driven organizations in the climate and development arenas, providing automated functionality to streamline, catalogue and link their Climate Compatible Development data and information resources.

Climate Tagger

Climate Tagger for Drupal is a simple, free and easy-to-use way to integrate the well-known Reegle Tagging API [3] into any website based on the Drupal Content Management System [5]. The API was originally developed in 2011 with the support of CDKN [4] and is now part of the Climate Tagger suite as the Climate Tagger API. Climate Tagger is backed by the expansive Climate Compatible Development Thesaurus, developed by experts in multiple fields and continuously updated to remain current (explore the thesaurus at http://www.reegle.info/glossary). The thesaurus is available in English, French, Spanish, German and Portuguese, and can connect content on different portals published in these languages.

Climate Tagger for Drupal can be fine-tuned to the individual (and existing) configuration of any Drupal 7 installation by:

  • determining which content types and fields will be automatically tagged
  • scheduling “batch jobs” for automatic updating (including for existing content, with the option either to re-tag all content or to tag only with new concepts found via a thesaurus expansion or update)
  • automatically limiting and managing the volume of tag results based on individually chosen scoring thresholds
  • blending with manual tagging
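
How score thresholds and manual tagging can work together is easiest to see in a small sketch. The following Python is illustrative only and is not the module's code: the `SuggestedTag` type, the `select_tags` function, the 0-100 score scale and the default threshold and limit are all assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SuggestedTag:
    label: str
    score: float  # relevance score; the 0-100 scale is an assumption


def select_tags(suggested, manual, threshold=50.0, limit=10):
    """Keep the best automatic tags above a threshold, then blend in manual tags."""
    # Drop low-scoring suggestions, then keep only the highest-scoring ones.
    kept = sorted(
        (t for t in suggested if t.score >= threshold),
        key=lambda t: t.score,
        reverse=True,
    )[:limit]
    # Manual tags always survive; automatic ones are added without duplicates.
    result = list(manual)
    seen = {label.lower() for label in manual}
    for tag in kept:
        if tag.label.lower() not in seen:
            result.append(tag.label)
            seen.add(tag.label.lower())
    return result


# Hypothetical suggestions for one content node.
suggestions = [
    SuggestedTag("solar energy", 92.0),
    SuggestedTag("energy policy", 61.5),
    SuggestedTag("transport", 12.0),
]
print(select_tags(suggestions, manual=["Solar energy", "Kenya"]))
```

Raising the threshold trades coverage for precision, while the limit caps how many automatic tags a single node can receive.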

“Climate Tagger [6] brings together the semantic power of Semantic Web Company’s PoolParty Semantic Suite [7] with the domain expertise of REEEP and CTCN, resulting in an automatic annotation module for Drupal 7 with an accuracy never seen before,” states Martin Kaltenböck, Managing Partner of Semantic Web Company [8], which acts as the technology provider behind the module.

“Climate Tagger is the result of a shared commitment to breaking down the ‘information silos’ that exist in the climate compatible development community, and to providing concrete solutions that can be implemented right now, anywhere,” said REEEP Director General Martin Hiller. “Together with CTCN and SWC, we have laid the foundations for a system that can be continuously improved and expanded to bring new sectors, systems and organizations into the climate knowledge community.”

For the Open Data and Linked Open Data communities, a Climate Tagger plugin for CKAN [9] has also been published. It was developed by NREL [10] with CTCN’s support and harnesses the same taxonomy and expert-vetted thesaurus behind Climate Tagger, helping connect open data to climate compatible content through the simultaneous use of these tools.

REEEP Director General Martin Hiller and CTCN Director Jukka Uosukainen will be talking about Climate Tagger at the COP20 side event hosted by the Climate Knowledge Brokers Group in Lima [11], Peru, on Monday, December 1st at 4:45pm.

Further reading and downloads

About REEEP

REEEP invests in clean energy markets in developing countries to lower CO2 emissions and build prosperity. Based on a strategic portfolio of high-impact projects, REEEP works to generate energy access, improve lives and economic opportunities, build sustainable markets, and combat climate change.

REEEP understands market change from a practice, policy and financial perspective. We monitor, evaluate and learn from our portfolio to understand opportunities and barriers to success within markets. These insights then influence policy, increase public and private investment, and inform our portfolio strategy to build scale within and replication across markets. REEEP is committed to open access to knowledge to support entrepreneurship, innovation and policy improvements to empower market shifts across the developing world.

About the CTCN

The Climate Technology Centre & Network facilitates the transfer of climate technologies by providing technical assistance, improving access to technology knowledge, and fostering collaboration among climate technology stakeholders. The CTCN is the operational arm of the UNFCCC Technology Mechanism and is hosted by the United Nations Environment Programme (UNEP) in collaboration with the United Nations Industrial Development Organization (UNIDO) and 11 independent, regional organizations with expertise in climate technologies.

About Semantic Web Company

Semantic Web Company (SWC, http://www.semantic-web.at) is a technology provider headquartered in Vienna (Austria). SWC supports organizations from all industrial sectors worldwide to improve their information and data management. Their products have outstanding capabilities to extract meaning from structured and unstructured data by making use of linked data technologies.

Thomas Thurner

Semantic Web driven tagging tool makes clean energy content searchable and findable!

New reegle API will tag online resources automatically – and suggest related content.

A new cost-free tagging tool is now available to anyone who provides online resources in the clean energy field. This API (application programming interface), developed by the Semantic Web Company, will automatically tag documents and web content that cover renewable energy, energy efficiency and climate-relevant topics according to reegle’s well-maintained Clean Energy and Climate Change Thesaurus. It can also suggest related documents from the growing pool of content that has already been indexed using the tool.

“Tagging” means that, when integrated into a website, this API will automatically scan the site’s content, identify specific terms, concepts and geographic mentions, and then apply tags to each so that all resources connected with the site are searchable online.
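
To give a rough feel for thesaurus-based tagging, here is a toy sketch in Python. It is not how the reegle API is implemented: the three-entry `THESAURUS`, the `tag_text` function and the simple hit counting are assumptions made for illustration, and the real service relies on a far richer thesaurus and extraction pipeline.

```python
import re
from collections import Counter

# Hypothetical mini-thesaurus: preferred concept label -> surface forms.
THESAURUS = {
    "renewable energy": ["renewable energy", "renewables"],
    "energy efficiency": ["energy efficiency", "efficient energy use"],
    "Kenya": ["kenya"],  # stands in for a geographic mention
}


def tag_text(text, thesaurus=THESAURUS):
    """Return (concept, hit count) pairs for thesaurus terms found in text."""
    lowered = text.lower()
    hits = Counter()
    for concept, forms in thesaurus.items():
        for form in forms:
            # Word boundaries avoid matching inside longer words.
            pattern = r"\b" + re.escape(form) + r"\b"
            hits[concept] += len(re.findall(pattern, lowered))
    # Most frequent concepts first; concepts without hits are dropped.
    return [(concept, n) for concept, n in hits.most_common() if n > 0]


sample = "Kenya promotes renewables; renewable energy and energy efficiency go together."
print(tag_text(sample))
```

Because every surface form maps back to one preferred label, two sites that phrase things differently still end up with the same tag, which is the consistency benefit described in this post.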

“By automating the tagging process, we can help ensure that content is classified in a consistent way across the entire sector, based on our Clean Energy Thesaurus,” notes Florian Bauer, Operations & IT Director of REEEP. “This will help make major depositories of existing information open and accessible, and help promote clean, low-carbon development in the process.”

In addition to tagging, the API can also make suggestions for related reading from the web resources already indexed, thus enriching the content of any website. “Sharing your own indexed resources with the API content pool can increase the outreach of your documents hugely,” recommends Denise Recheis, expert in knowledge management at reegle.

Try out service

The tool is available at http://api.reegle.info, where you can try out the API on the spot. Simply cut and paste a block of text, and a demonstration will show all of the concepts, terms and categories that the tool automatically generates.

Free API key

On this site, web developers can register to get a free API key for each project, with no limit on the number of keys. Once logged in, developers will find a request builder on the dashboard to help them build the necessary code. The service is available in five different languages: English, French, Spanish, Portuguese and German. The API returns results in RDF/XML and JSON.

About REEEP

The reegle tagging API project is a collaborative effort with NREL (OpenEI), weADAPT and IDS (eldis), and was made possible by support from the CDKN Innovation Fund. For further information about the reegle tagging API, contact REEEP’s Thesaurus and Knowledge Manager, Denise Recheis.

Jana Herwig

The Wild vs The Orderly: Folksonomies and Semantics (TRIPLE-I 2008)

This second day of TRIPLE-I 2008 was my personal folksonomy day, even though the theme had already been set yesterday with Andreas Hotho’s invited talk about “Extracting Semantics from Folksonomies”, which was the opening lecture of the workshop “Knowledge acquisition from the Social Web.”

Andreas Hotho directs the Bibsonomy project at Kassel University’s Knowledge and Data Engineering research group. Bibsonomy is a social bookmark and publication sharing system that caters especially to researchers who, in addition to bookmarking, also wish to manage publications. Among other interesting things, Bibsonomy supports the import of bookmarks from del.icio.us, Firefox bookmarks and local BibTeX files. Being a project led by a university’s computer science department, Bibsonomy is at the same time the result, the object and a stimulus for research in the area of tagging and folksonomies. Andreas describes this double appeal of folksonomies to both ordinary people and researchers in a 12-second vlog post:


Andreas Hotho’s statement about folksonomies and research (see www.bibsonomy.org) on 12seconds.tv

One of the outcomes of the research into folksonomies is FolkRank, a search algorithm that exploits the structure of folksonomies; the name reveals that it was inspired by PageRank, but as the graph of folksonomy structures does not correspond to the web graph, some adaptations had to be made. The specifics of these adaptations can be found in an online article by Andreas and his colleagues: “FolkRank: A Ranking Algorithm for Folksonomies” (PDF, 268 KB).

Andreas Hotho’s talk more specifically addressed the search for methods to identify tags which describe the same concept (or a more specific / a more general concept respectively) within a folksonomy. He suggested two approaches:

  1. Applying measures directly to folksonomy statistics, which allows tags to be described as vectors; e.g. co-occurrence frequency and FolkRank could serve as similarity measures (both tend to favour high-frequency tags), as could a cosine measure (which is more likely to produce “siblings”)
  2. Looking up tags in an external thesaurus/vocabulary (for instance, achieving semantic grounding by mapping a tag and its most similar tags to WordNet synsets)
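
The difference between raw co-occurrence and a cosine measure in the first approach can be seen in a tiny example. This is my own minimal sketch, not code from the talk or from Bibsonomy: the four `POSTS` are made-up data, and describing each tag by its co-occurrence counts with other tags is only one possible way to build the vectors.

```python
import math
from collections import defaultdict
from itertools import combinations

# Hypothetical posts: each is the set of tags one user gave one resource.
POSTS = [
    {"python", "programming", "tutorial"},
    {"python", "programming", "code"},
    {"java", "programming", "tutorial"},
    {"java", "programming", "code"},
]


def cooccurrence_vectors(posts):
    """Map each tag to a sparse vector of co-occurrence counts with other tags."""
    vectors = defaultdict(lambda: defaultdict(int))
    for post in posts:
        for a, b in combinations(sorted(post), 2):
            vectors[a][b] += 1
            vectors[b][a] += 1
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * v.get(key, 0) for key, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


vecs = cooccurrence_vectors(POSTS)
# Raw co-occurrence: "python" is closest to the frequent, general tag.
print(vecs["python"]["programming"], vecs["python"].get("java", 0))
# Cosine: "python" and "java" never co-occur but share the same context.
print(cosine(vecs["python"], vecs["java"]), cosine(vecs["python"], vecs["programming"]))
```

In this toy data, co-occurrence pulls “python” towards the high-frequency tag “programming”, whereas the cosine measure rates “java” as the closest tag, i.e. a sibling.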

The future areas of interest within folksonomy research that Andreas proposed were trend detection, tag recommendation, detecting spam (a major challenge!), logsonomies (i.e. the structure of search engine query log files) and learning synsets, hierarchies, and structures of folksonomies. If you have any further questions regarding Bibsonomy, FolkRank or this piece of research, Andreas Hotho can be contacted via his homepage.

Another presentation dedicated to folksonomies – and the presentation that won my personal presentation design award – was “Seeding, Weeding, Fertilizing – Different Tag Gardening Activities for Folksonomy Maintenance and Enrichment” by Katrin Weller and Isabella Peters, both from the Dept. of Information Science at Heinrich Heine University in Düsseldorf. The entire presentation was designed to match the CI of Tagcare, a tag gardening tool that is hopefully going to go online soon.

The term “Tag Gardening” was borrowed from James Governor who wrote in a 2006 blogpost:

“Like plants or animals, tags evolve in an emergent fashion, open to hybridisation. Stewardship can help grow and put roots down.

Helping the darwinian process is tag gardening.

Tag gardening is about taking tags in the wild and tending to them, or identifying a wild tag that will do well in your south facing IT garden. I am talking about domestication here.

Just like there are professional bloggers i am pretty sure some parties will emerge that get paid for their abilities.”

I seriously hope that the latter is going to come true, even though I have the feeling that most providers will continue to consider user input and effort pro bono work!

Katrin Weller’s intro (Isabella Peters had excused herself) focused on the well-known problems with tags and folksonomies, e.g.:

  • spelling variants, synonyms, abbreviations, different natural languages
  • ad hoc or personal functions of tags other than content description (e.g. “toread”, “@Henry”, “nicepic”)
  • flatness of tag clouds which allows for browsing by popularity, but not by semantic interrelations

She further distinguished three levels where tag or tag cloud improvement becomes relevant:

  • single document vs document collection level
  • single user vs collaborative level
  • intra- and cross-platform level (e.g. different tagging conventions, tag separation with comma or blank space, etc.)

To push the gardening metaphor even further, Katrin presented their ideas of weeding, seeding, fertilizing etc.:

Weeding
The weeds in this case are “bad” tags like spam or misspelled tags (weed: any plant that crowds out cultivated plants)
Aim: enhancing recall and a consistent indexing vocabulary
Achieved by: type-ahead functionality, editing functionalities, natural language processing, user guidelines for indexing and retrieval, nomination of authorized users as gardeners
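
A very small part of weeding can be automated with plain string normalization. The sketch below is my own illustration and not part of TagCare: the `VARIANTS` map, the `split_tags` and `weed` functions and the rule “split on commas if there are any, otherwise on blanks” are assumptions chosen to show spelling variants and differing separator conventions being merged.

```python
import re
from collections import Counter

# Hypothetical variant map that a human "gardener" might maintain.
VARIANTS = {"folksonomies": "folksonomy", "web2.0": "web20", "web-2.0": "web20"}


def split_tags(raw):
    """Split a raw tag string on commas if present, otherwise on whitespace."""
    parts = raw.split(",") if "," in raw else raw.split()
    return [p.strip() for p in parts if p.strip()]


def weed(raw_tag_strings, variants=VARIANTS):
    """Lower-case tags, merge known variants, and count the cleaned tags."""
    counts = Counter()
    for raw in raw_tag_strings:
        for tag in split_tags(raw):
            # Collapse repeated whitespace inside multi-word tags.
            tag = re.sub(r"\s+", " ", tag.lower())
            counts[variants.get(tag, tag)] += 1
    return counts


print(weed(["Folksonomy, Web2.0", "folksonomies web-2.0", "semantic web, folksonomy"]))
```

Spam and genuinely ambiguous tags cannot be caught this way, which is presumably where the authorized gardeners come in.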

Seeding
Seeding in folksonomies means expanding frequently used tags with more specific tags (called “baby tags” or “seedlings” by Katrin Weller; seedling: young plant or tree grown from a seed)

Landscaping
The idea of landscaping here means to create “flower beds” through identifying species of tags, e.g. by similarity.
Aim: enhancing precision and expressiveness

Fertilizing
Fertilizing in this context means to combine folksonomies with other knowledge organization systems (KOS): thesauri, controlled vocabularies, ontologies, etc. (fertilizer: any substance such as manure or a mixture of nitrates used to make soil more fertile). Fertilizing might work both ways, Katrin suggested: a folksonomy might be fertilized with the semantic structure of a KOS, or a KOS enhanced by terms from a folksonomy.
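
As a rough illustration of the first direction (folksonomy enriched by a KOS), the following Python sketch maps flat tags onto a tiny hierarchy. The `KOS` fragment and the functions `find_concept`, `broader_chain` and `fertilize` are hypothetical and not taken from the presentation.

```python
# Hypothetical KOS fragment: concept -> (alternative labels, broader concept).
KOS = {
    "renewable energy": ({"renewables", "green energy"}, "energy"),
    "solar energy": ({"solar", "solar power"}, "renewable energy"),
    "energy": (set(), None),
}


def find_concept(tag, kos=KOS):
    """Return the KOS concept whose preferred or alternative label matches tag."""
    needle = tag.lower()
    for concept, (alt_labels, _broader) in kos.items():
        if needle == concept or needle in alt_labels:
            return concept
    return None


def broader_chain(concept, kos=KOS):
    """Walk up the hierarchy from concept to its top-level ancestor."""
    chain = []
    parent = kos[concept][1]
    while parent is not None:
        chain.append(parent)
        parent = kos[parent][1]
    return chain


def fertilize(tags, kos=KOS):
    """Attach a concept and its broader terms to each tag; None if unmatched."""
    enriched = {}
    for tag in tags:
        concept = find_concept(tag, kos)
        enriched[tag] = (concept, broader_chain(concept, kos)) if concept else None
    return enriched


print(fertilize(["solar", "Renewables", "toread"]))
```

Tags that come back as `None` are the interesting ones for the opposite direction: frequent unmatched tags could be reviewed as candidate terms for the KOS.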

And finally TagCare: the ambitious plan is to have a system that allows users to import tag clouds from Flickr, del.icio.us and Bibsonomy, cleanse out dissimilarities between tags, add hierarchical structure to the tag clouds, view tag statistics and probably also use community features, such as calibrating one’s tags with those of the chief gardener or activating collaborative spam elimination. It is going to be a free service, and if you want to be notified when it goes live, you might want to send an email to Katrin.

This full-service proposal for tag gardening does of course sound brilliant – yet is it going to be feasible, on a technical level? In the post-presentation discussion, somebody mentioned Faviki, which relies on DBpedia concepts to solidify the tag cloud. It didn’t exactly seem as though the TagCare team had already thought along these (semantic web) lines, even though this perfectly corresponded to their ‘Fertilizing’ idea. But if TagCare solely relies on good human gardeners, how long will it take until they have gained a big enough community to stimulate someone’s altruism? The idea of tag gardening of course is beautiful, and I am curious to learn more about the technology it is going to use.

Other folksonomy and tag related presentations that I was unable to attend or am unable to describe now, after the 10th hour of my 2nd day at TRIPLE-I, with a band performing folklore music involving yodeling and probably Schuhplattler right outside of this room:

  • Quality Metrics for Tags of Broad Folksonomies (Celine Van Damme, Martin Hepp, Tanguy Coenen, University of Brussels, Universität der Bundeswehr München)
  • Providing Multi Source Tag Recommendations in a Social Resource Sharing Platform (Martin Memmel, Michael Kockler, Rafael Schirru, German Research Centre for Artificial Intelligence DFKI)
  • Semantic Tagging and Inference in Online Communities (Ahmet Yildirim, Suzan Üsküdarli, Boğaziçi University)
  • Using Visual Features to Improve Tag Suggestions in Image Sharing Sites (Mathias Lux, Oge Marques, Arthur Pitman, Klagenfurt University)
  • Harnessing Wikipedia for Smart Tags Clustering (Maria Grineva, Maxim Grinev, Denis Turdakov, Pavel Velikhov, Russian Academy of Sciences)

Please leave a comment if you think that any of the above needs correction.

EDIT: I got the chance to record another 12 seconds definition (and am thinking of setting up a video glossary for the Semantic Web now): Rolf Sint from Salzburg Research explains what folksonomies are and why folksonomies and ontologies go together well in 12 seconds! Rolf is also involved in the KiWi project, which aims to develop a wiki-based knowledge management system boosted by semantic technologies.


Rolf Sint explains folksonomies and their relation to ontologies on 12seconds.tv

Jana Herwig

Semantic Tagging with Faviki

In May, a new bookmarking service called Faviki launched which, unlike other bookmarking services, comes to the public semantically enhanced. ReadWriteWeb has already had a first look at it and described it as follows:

Faviki is a new social bookmarking tool that offers something that services like Ma.gnolia, del.icio.us, and Diigo do not – semantic tagging capabilities. What this means is that instead of having users haphazardly entering in tags to describe the links they save, Faviki will suggest tags to be used instead. However, unlike other services, Faviki’s suggestions don’t just come from a community of users and their tagging history, but from structured information extracted straight out of the Wikipedia database. Faviki’s backend uses DBpedia, a community-maintained database created by extracting structured info from Wikipedia and turning that into a database which you can query.

Faviki tag cloud

What Faviki does, from a user’s perspective, is suggest tags based on Wikipedia/DBpedia terms. One side effect of this procedure is that, for example, “Safety (disambiguation)” can also be chosen as a tag. I am not sure yet whether this option makes sense (although one can probably argue that it does no harm either, because people should be smart enough not to use such tags). And as the above screenshot of Faviki’s tag cloud reveals, it currently seems to be used mainly by people who are interested in the semantic web and search engines (with semantic search being the most promising area of application of semantic technologies). It’s probably going to take a while (if it ever happens) before Faviki reaches a user base as diverse as the one that can be guessed from del.icio.us’ tag cloud. But then again: maybe Faviki isn’t going to need that, as it doesn’t rely on collective tagging, but already benefits from Wikipedia’s diversity of entries!

delicious tag cloud

As ReadWriteWeb also regretted, it’s a pity that there is currently no way to import tags from del.icio.us or other services into Faviki. Who is going to win the bookmarking race? Del.icio.us has the advantage of a broad user base, and many users already have their networks of fellow bookmarkers which they probably wouldn’t want to give up (I personally wouldn’t). Bibsonomy has the advantage of an extra feature that allows users to bookmark publications and later export them as a uniformly formatted bibliography. If I could make a wish, I’d rather have a service that brings together the best of Faviki, Bibsonomy AND del.icio.us!

Related Websites:
Faviki Blog on WordPress.com
del.icio.us tag cloud
