Andreas Blumauer

Automatic text analytics using DBpedia and PoolParty – A Live Demo

Let me show you which steps have to be taken to generate a high-quality text mining application, ready to be used to annotate and categorize any kind of text or document covering nearly any domain. With our thesaurus-based approach to text mining, your documents can also be linked to the world of linked (open) data: enrich your documents with data from the LOD cloud!

Step 1. Generate a thesaurus by using a linked data source like DBpedia

As recently reported, SWC has developed a tool called SKOSsy which can be used to extract seed thesauri from DBpedia. In our example I will generate a knowledge model describing the domain of “digital photography”. This step took around 15 minutes.

Step 2. Load the thesaurus into PoolParty and improve it to your needs

After the seed thesaurus has been loaded into PoolParty Thesaurus Manager, you have many possibilities to enhance the knowledge model further: add more categories, synonyms, relations etc. In this example I use the seed thesaurus without any further improvements. This step took approximately 2 minutes.

Step 3. Generate an automatic text extractor on top of your thesaurus

This step took a couple of seconds and produced a fast and reliable text mining application on top of PoolParty Extractor, ready to be used to enrich your documents with data from the LOD cloud.
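To make the idea of thesaurus-based extraction more concrete, here is a minimal sketch in Python. It is not PoolParty Extractor's implementation or API. It only illustrates the general principle of matching thesaurus labels in a text and returning the linked concepts. The tiny thesaurus, its labels, the DBpedia-style URIs and the function names are all assumptions made for illustration.

```python
import re
from collections import Counter

# Hypothetical miniature thesaurus (illustration only): each concept URI
# maps to its preferred label followed by alternative labels (synonyms).
THESAURUS = {
    "http://dbpedia.org/resource/Digital_camera": ["digital camera", "digicam"],
    "http://dbpedia.org/resource/Image_sensor": ["image sensor", "imaging sensor"],
    "http://dbpedia.org/resource/Shutter_speed": ["shutter speed", "exposure time"],
}


def build_matcher(thesaurus):
    """Compile one regex over all labels and return it with a label -> URI map."""
    label_to_uri = {}
    for uri, labels in thesaurus.items():
        for label in labels:
            label_to_uri[label.lower()] = uri
    # Longer labels come first so that they win over shorter overlapping ones.
    alternatives = sorted(label_to_uri, key=len, reverse=True)
    pattern = re.compile(
        r"\b(?:" + "|".join(re.escape(a) for a in alternatives) + r")\b",
        re.IGNORECASE,
    )
    return pattern, label_to_uri


def extract_concepts(text, thesaurus):
    """Return (concept URI, number of mentions) pairs, most frequent first."""
    pattern, label_to_uri = build_matcher(thesaurus)
    counts = Counter(
        label_to_uri[match.group(0).lower()] for match in pattern.finditer(text)
    )
    return counts.most_common()


if __name__ == "__main__":
    sample = (
        "A digicam with a larger image sensor allows a faster shutter speed. "
        "Every digital camera benefits from this."
    )
    for uri, mentions in extract_concepts(sample, THESAURUS):
        print(mentions, uri)
```

Because synonyms resolve to the same concept URI, "digicam" and "digital camera" count as two mentions of one concept. A production extractor would additionally handle things this sketch leaves out, such as inflected word forms and ambiguous labels.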

You can try it out here: PPX Live-Demo

To try the extractor yourself, please take a look at the image above, which shows a proper configuration. You have to insert the following UUID in the form: d35d4ddb-adc3-4ea5-b027-deacac03e391

Since our example is all about ‘digital photography’, we recommend using text samples (or fragments) like these to test the quality of PPX-based text analytics:

Let us know what you think about this straightforward approach and about the quality of the results. We believe that thesaurus-based text mining is in many cases an alternative to other approaches, especially if you want to enrich your content with information from the upcoming web of data.

Of course we would be happy to generate other demos in the areas of your interest! Just get in contact with us by using our contact form.

Andreas Blumauer

Linked Open Data: The Essentials – A quick start guide for decision makers

Together with REEEP (Renewable Energy and Energy Efficiency Partnership) the Semantic Web Company (SWC) has composed a fundamental publication on the topic of Linked Open Data.

Linked Open Data: The Essentials provides answers to the following key questions:

  • What do the terms Open Data, Open Government Data and Linked Open Data actually mean, and what are the differences between them?
  • What do I need to take into account in developing a LOD strategy?
  • What does my organisation need to do technically in order to open up and publish its datasets?
  • How can I make sure the data is accessible and digestible for others?
  • How can I add value to my own data sets by consuming LOD from others?
  • What can be learned from existing best practices?
  • What are the key potentials of sharing and consuming open datasets?

Read more about this publication and find out how to obtain a copy.

Andreas Blumauer

Going to SEMTECHBIZ Berlin 2012

I went to London last September to visit SemTechBiz UK and to represent the Semantic Web Company and PoolParty technologies in the exhibition area of this excellent conference. I had tons of interesting conversations at our booth and, although I never found time to attend any talk, I once again learned a lot about customers' needs.

Compared to ISWC or ESWC, two other major conferences in the area of the semantic web, SemTechBiz is clearly the place to go if you're interested in semantic web applications. Especially in the last three years we have observed continuous growth in the acceptance of and demand for semantic web technologies in various industries. For many information professionals and IT managers it has become clearer than ever before that semantic web applications can solve several well-known problems in the areas of enterprise search, data integration, business intelligence and knowledge management.

Thus it was great news for us to have another SemTechBiz conference in place – this time in Berlin, which is one of the most vibrant cities in the world when it comes to innovative web technologies like linked data or open data. And again we will “explore how semantic solutions and linked data are being embraced throughout companies across a diverse range of disciplines and business categories”.

We hope to meet you at SemTechBiz Berlin 2012 (February 6-7). The PoolParty team will be present as Gold Sponsor and is looking forward to meeting you in the exhibition area to talk about your semantic web applications.

Andreas Blumauer

I-Semantics: Get in touch with Europe's Linked Data community!

In September 2012 I-Semantics will take place for the 8th time. With more than 400 participants every year, the conference is one of the largest in Europe in the field of semantic systems and the semantic web. It is held concurrently with the I-KNOW Conference on Knowledge Management and Knowledge Technologies.

I-Semantics is a conference aiming to bring together science and industry:

  • To address the needs and interests of industry, the iPraxis track presents enterprise solutions that deal with semantic processing of data and/or information in areas like Linked Data, Data Publishing, Semantic Search, Recommendation Services, Sentiment Detection, Search Engine Add-Ons, Thesaurus and/or Ontology Management, Text Mining, Data Mining and any related fields.
  • In the exhibition area I-SEMANTICS 2012 will offer its participants a unique platform either to present latest and leading edge developments or to catch up with the developments of most innovative IT technologies, content applications, knowledge management trends and emerging market opportunities.
  • For the first time in 2012 we will bring to you the I-CHALLENGE, consisting of the Best Paper Award, the Best Poster Award, the Best PhD Paper and the Linked Data Cup.
  • I-SEMANTICS 2012 proceedings will be published in the digital library of the ACM ICP Series and will contain all accepted papers from the Research & Application track and the I-CHALLENGE. The topics of interest for research and application papers include (but are not limited to): The Web of Data, Quality of Semantic Data on the Web, Corporate Semantic Web, Semantic Content Engineering, Semantic Multimedia and (Linked) Data Ecosystems & Markets.

Website: I-Semantics 2012
