<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>The Semantic Puzzle&#187; Search Engines</title>
	<atom:link href="http://blog.semantic-web.at/category/search-engines/feed/" rel="self" type="application/rss+xml" />
	<link>http://blog.semantic-web.at</link>
	<description>Open World Assumptions</description>
	<lastBuildDate>Thu, 02 Feb 2012 14:26:25 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3.1</generator>
		<item>
		<title>Introducing SKOSsy &#8211; generate thesauri on the fly!</title>
		<link>http://blog.semantic-web.at/2011/11/29/introducing-skossy-generate-thesauri-on-the-fly/</link>
		<comments>http://blog.semantic-web.at/2011/11/29/introducing-skossy-generate-thesauri-on-the-fly/#comments</comments>
		<pubDate>Tue, 29 Nov 2011 15:52:53 +0000</pubDate>
		<dc:creator>Andreas Blumauer</dc:creator>
				<category><![CDATA[Linked Data & Open Data]]></category>
		<category><![CDATA[Ontology Engineering]]></category>
		<category><![CDATA[Search Engines]]></category>
		<category><![CDATA[Text Mining]]></category>
		<category><![CDATA[Tools & Software]]></category>
		<category><![CDATA[dbpedia]]></category>
		<category><![CDATA[Knowledge representation and reasoning]]></category>
		<category><![CDATA[PoolParty]]></category>
		<category><![CDATA[Semantic Search]]></category>
		<category><![CDATA[Semantics]]></category>
		<category><![CDATA[Simple Knowledge Organization System]]></category>
		<category><![CDATA[Thesaurus]]></category>
		<category><![CDATA[Web service]]></category>

		<guid isPermaLink="false">http://blog.semantic-web.at/?p=2574</guid>
		<description><![CDATA[Imagine you could generate a thesaurus of fairly good quality for nearly any knowledge domain you can think of! Sounds impossible? Does it remind you of all the promises made by text mining software that generates &#8220;semantic nets&#8221; &#8230; <a href="http://blog.semantic-web.at/2011/11/29/introducing-skossy-generate-thesauri-on-the-fly/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>Imagine you could generate a thesaurus of fairly good quality for nearly any knowledge domain you can think of! <a href="http://blog.semantic-web.at/wp-content/uploads/2011/11/skossy3.png"><img class="alignright size-full wp-image-2573" title="skossy3" src="http://blog.semantic-web.at/wp-content/uploads/2011/11/skossy3.png" alt="" width="203" height="207" /></a>Sounds impossible? Does it remind you of all the promises made by text mining software that generates &#8220;semantic nets&#8221; from scratch?</p>
<p><strong>Let me introduce you to SKOSsy</strong>. I will explain what this web service can do for you:</p>
<p><strong>SKOSsy generates SKOS-based thesauri in German or English for a domain you are interested in</strong>. Not every domain, but nearly every one: SKOSsy extracts data from DBpedia, so it can cover anything that is in DBpedia. Thus, SKOSsy works well whenever a first <strong>seed thesaurus</strong> needs to be generated for a certain organisation or project. If you load the automatically generated thesaurus into an editor like <a href="http://poolparty.biz/products/poolparty-thesaurus-manager/" target="_blank">PoolParty Thesaurus Manager</a> (PPT), you can start to enrich the knowledge model with additional concepts, relations and links to other LOD sources. In other words, you don&#8217;t have to start your thesaurus project from scratch.</p>
<p><strong>Let me give you an example</strong>: Imagine you are working for a company that is an international plant builder, and you would like to index several thousand documents the &#8220;semantic way&#8221;. You would walk through the following steps:</p>
<ol>
<li><strong>Identify suitable categories in Wikipedia/DBpedia</strong> that best describe what your business or your domain is all about. Those categories should contain pages / resources related to the documents you would like to index. For example: <a href="http://dbpedia.org/resource/Category:Metalworking" target="_blank">http://dbpedia.org/resource/Category:Metalworking</a> or <a href="http://dbpedia.org/resource/Category:Industrial_automation" target="_blank">http://dbpedia.org/resource/Category:Industrial_automation</a></li>
<li>Once you have selected suitable categories, <strong>SKOSsy will traverse DBpedia for you and collect all resources</strong>, their hierarchical and non-hierarchical relations, alternative labels, definitions and other properties, and assemble them into a valid SKOS thesaurus; this step takes a couple of minutes. (<a href="http://prod.poolparty.punkt.at/PoolParty/wiki/plantbuilding">Find the resulting vocabulary here</a>)</li>
<li><strong>Load the resulting thesaurus into PPT</strong>, explore it, improve it and enrich it with additional facts.</li>
<li>When you are done, you can <strong>generate a tailor-made text extractor</strong> using <a href="http://poolparty.biz/products/poolparty-extractor/" target="_blank">PoolParty Extractor</a> (PPX), the second component of the PoolParty product family.</li>
<li>With PPX and an extraction model curated for your particular use case, you can <strong>extract named entities</strong> from your documents automatically and <strong>index your documents in a meaningful way.</strong></li>
<li>After a few seconds <strong>your semantic search engine is ready to be used</strong>. <a href="http://poolparty.biz/products/poolparty-semantic-search/" target="_blank">PoolParty Semantic Search</a> (PPS), the third PoolParty component, offers features such as categorized auto-complete, faceted search, content recommendation (similarity search) and smart search suggestions to make your life as a knowledge worker easier.</li>
</ol>
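<p>Step 2 is essentially a graph traversal. The following Python sketch illustrates the idea under stated assumptions; it is not SKOSsy&#8217;s actual code. A tiny hard-coded excerpt stands in for DBpedia (the <code>SUBCATEGORIES</code> and <code>MEMBERS</code> dictionaries and the <code>ex:</code> prefix are hypothetical). The walk is breadth-first with a depth limit, categories and the pages filed under them become <code>skos:Concept</code>s, and the hierarchy is expressed with <code>skos:broader</code>. Triples are returned as plain tuples rather than serialized to RDF.</p>
<pre>
```python
from collections import deque

# Hypothetical excerpt of DBpedia's category graph: category -> subcategories.
# (In DBpedia a subcategory points to its parent category via skos:broader.)
SUBCATEGORIES = {
    "Metalworking": ["Welding", "Machining"],
    "Machining": ["Milling"],
}

# Hypothetical category -> member pages (dct:subject links in DBpedia).
MEMBERS = {
    "Welding": ["Arc_welding", "Spot_welding"],
    "Machining": ["Lathe"],
    "Milling": ["Milling_cutter"],
}


def to_label(name):
    """Turn a DBpedia-style local name into a readable prefLabel."""
    return name.replace("_", " ")


def build_thesaurus(seed, max_depth=3):
    """Walk the category graph breadth-first from `seed` and return SKOS
    statements as (subject, predicate, object) triples with prefixed names."""
    triples = []
    seen_categories = {seed}
    seen_pages = set()
    queue = deque([(seed, 0)])
    while queue:
        category, depth = queue.popleft()
        concept = "ex:" + category  # "ex:" is a hypothetical namespace prefix
        triples.append((concept, "rdf:type", "skos:Concept"))
        triples.append((concept, "skos:prefLabel", to_label(category)))
        # Pages filed under a category become narrower concepts of it.
        for page in MEMBERS.get(category, []):
            child = "ex:" + page
            if page not in seen_pages:
                seen_pages.add(page)
                triples.append((child, "rdf:type", "skos:Concept"))
                triples.append((child, "skos:prefLabel", to_label(page)))
            triples.append((child, "skos:broader", concept))
        if depth >= max_depth:
            continue  # do not descend below the depth limit
        for sub in SUBCATEGORIES.get(category, []):
            triples.append(("ex:" + sub, "skos:broader", concept))
            if sub not in seen_categories:  # guards against category cycles
                seen_categories.add(sub)
                queue.append((sub, depth + 1))
    return triples
```
</pre>
<p>A real harvester would fill the two dictionaries from DBpedia and would also collect alternative labels and definitions. The depth limit and the visited sets matter because the category graph is large and not strictly a tree.</p>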
<p>Over the last few years we have repeatedly discussed how thesauri and other knowledge models can improve search. Many people understood straight away why <strong>thesaurus-based search is usually much better than search algorithms based purely on statistics</strong>. The main counter-argument, of course, has always been: &#8220;the costs of establishing a &#8216;good-enough&#8217; thesaurus, let alone a &#8216;high-quality&#8217; one, are too high&#8221;.</p>
<p><strong>With SKOSsy in place those kinds of arguments become weaker and weaker</strong>. To sum up,</p>
<ul>
<li>SKOSsy makes heavy use of Linked Data sources, especially DBpedia</li>
<li>SKOSsy can generate SKOS thesauri for virtually any domain within a few minutes</li>
<li>Such thesauri can be improved, curated and extended to suit individual needs, but they usually already serve as &#8220;good-enough&#8221; knowledge models for any semantic search application you like</li>
<li>SKOSsy-based semantic search usually outperforms search algorithms based on statistics, since the thesauri contain high-quality information about relations, labels and disambiguation</li>
<li>SKOSsy works perfectly together with the <a href="http://poolparty.biz/products/" target="_blank">PoolParty product family</a></li>
</ul>
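<p>One reason thesauri help is that a single query can be matched against every label of the concept it names and of that concept&#8217;s narrower concepts. The sketch below shows this kind of query expansion in Python. The <code>THESAURUS</code> dictionary is a hypothetical stand-in for a SKOS thesaurus, and the code illustrates the general technique, not how PoolParty Semantic Search is implemented.</p>
<pre>
```python
# Hypothetical mini-thesaurus: concept id -> labels and narrower concepts.
THESAURUS = {
    "welding": {
        "pref": "Welding",
        "alt": ["weld", "metal joining"],
        "narrower": ["arc_welding"],
    },
    "arc_welding": {
        "pref": "Arc welding",
        "alt": ["electric arc welding"],
        "narrower": [],
    },
}


def find_concept(term):
    """Return the id of the concept with a matching pref- or altLabel."""
    needle = term.lower()
    for cid, concept in THESAURUS.items():
        labels = [concept["pref"]] + concept["alt"]
        if needle in (label.lower() for label in labels):
            return cid
    return None


def expand_query(term):
    """Expand a search term with the synonyms of its concept and the
    labels of all (transitively) narrower concepts."""
    cid = find_concept(term)
    if cid is None:
        return [term]  # unknown term: fall back to plain keyword search
    terms, stack, visited = [], [cid], set()
    while stack:
        current = stack.pop()
        if current in visited:
            continue
        visited.add(current)
        concept = THESAURUS[current]
        terms.append(concept["pref"])
        terms.extend(concept["alt"])
        stack.extend(concept["narrower"])
    return terms
```
</pre>
<p>With this, a search for &#8220;weld&#8221; also matches documents that only mention &#8220;electric arc welding&#8221;, a connection that an index built purely on term statistics would not make on its own.</p>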
<p><strong>If you are interested in the results produced by SKOSsy</strong>, just <a href="http://poolparty.biz/company-contact/" target="_blank">send us a short note about your domain or your project</a> and we will send you an <strong>invitation as beta-tester</strong> or <strong>prepare a demo for you</strong>.</p>
<h6 class="zemanta-related-title" style="font-size: 1em;">Related articles</h6>
<ul class="zemanta-article-ul">
<li class="zemanta-article-ul-li"><a href="http://blog.semantic-web.at/2011/10/17/geological-survey-austria-launches-thesaurus-project/">Geological Survey Austria launches thesaurus project</a> (semantic-web.at)</li>
<li class="zemanta-article-ul-li"><a href="http://blog.semantic-web.at/2011/09/04/poolparty-3-0-and-its-all-new-linked-data-framework/">PoolParty 3.0 and its all new Linked Data framework</a> (semantic-web.at)</li>
<li class="zemanta-article-ul-li"><a href="http://poolparty.punkt.at/demozone/">PoolParty DemoZone Content Extractor Semantic Search Thesaurus Manager</a> (poolparty.punkt.at)</li>
<li class="zemanta-article-ul-li"><a href="http://stackoverflow.com/questions/7927391/query-dbpedia-for-multiple-keywords">Query DBpedia for multiple keywords</a> (stackoverflow.com)</li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://blog.semantic-web.at/2011/11/29/introducing-skossy-generate-thesauri-on-the-fly/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>&#8220;Thesaurus based search engines will become main stream in the near future&#8221;</title>
		<link>http://blog.semantic-web.at/2011/06/26/thesaurus-based-search-engines-will-become-main-stream-in-the-near-future/</link>
		<comments>http://blog.semantic-web.at/2011/06/26/thesaurus-based-search-engines-will-become-main-stream-in-the-near-future/#comments</comments>
		<pubDate>Sun, 26 Jun 2011 08:19:52 +0000</pubDate>
		<dc:creator>Andreas Blumauer</dc:creator>
				<category><![CDATA[Linked Data & Open Data]]></category>
		<category><![CDATA[Search Engines]]></category>
		<category><![CDATA[Semantics & Philosophy]]></category>
		<category><![CDATA[Controlled Vocabulary]]></category>
		<category><![CDATA[SKOS]]></category>
		<category><![CDATA[survey]]></category>

		<guid isPermaLink="false">http://blog.semantic-web.at/?p=2186</guid>
		<description><![CDATA[The results of the survey titled &#8220;Do controlled vocabularies matter?&#8221;, conducted by Semantic Web Company from May to June 2011, are now public. Over 150 participants from 27 countries paint a picture of current and future usage &#8230; <a href="http://blog.semantic-web.at/2011/06/26/thesaurus-based-search-engines-will-become-main-stream-in-the-near-future/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>The results of the survey titled &#8220;Do controlled vocabularies matter?&#8221;, conducted by Semantic Web Company from May to June 2011, are now public. Over 150 participants from 27 countries paint a picture of current and future usage of controlled vocabularies.</p>
<p>Here are three of the most interesting outcomes of this questionnaire &#8211; the <a href="http://issuu.com/andreas_blumauer/docs/survey_do_controlled_vocabularies_matter_2011_june" target="_blank">whole report can be found and downloaded on issuu</a>:</p>
<blockquote><p><strong>Do you think enterprises and other organizations can significantly benefit from using Linked Data?</strong></p></blockquote>
<p><a href="http://blog.semantic-web.at/wp-content/uploads/2011/06/linked_data_benefit.jpg"><img class="alignleft size-medium wp-image-2187" title="linked data benefit" src="http://blog.semantic-web.at/wp-content/uploads/2011/06/linked_data_benefit-300x117.jpg" alt="" width="300" height="117" /></a>The answer is a clear<strong> YES. </strong>A follow-up question also reveals that organisations of all sizes hold about the same opinion on linked data. Only a few people think that linked data is a &#8220;niche thing&#8221;. Overall, <strong>more than 90% of the participants</strong> think that <strong>most, or at least some, organisations can benefit from using linked data.</strong></p>
<blockquote><p><strong>Do you think that search engines which utilize thesauri to improve results will become main-stream?</strong></p></blockquote>
<p><a href="http://blog.semantic-web.at/wp-content/uploads/2011/06/thesaurus_based_search.jpg"><img class="alignleft size-medium wp-image-2193" title="thesaurus_based_search" src="http://blog.semantic-web.at/wp-content/uploads/2011/06/thesaurus_based_search-300x112.jpg" alt="" width="300" height="112" /></a>The results of this question are amazing: <strong>two thirds</strong> of the participants think that <strong>thesaurus-based search</strong> is already main-stream or will become so in the near future. Scepticism towards this development seems to be low; at the very least, a clear majority thinks that <strong>thesaurus based search engines will become main stream in the near future.</strong></p>
<p>&nbsp;</p>
<blockquote><p><strong>How important is the usage of standards like SKOS for controlled vocabularies?</strong></p></blockquote>
<p><a href="http://blog.semantic-web.at/wp-content/uploads/2011/06/importance-of-skos.jpg"><img class="alignleft size-medium wp-image-2200" title="importance of skos" src="http://blog.semantic-web.at/wp-content/uploads/2011/06/importance-of-skos-300x111.jpg" alt="" width="300" height="111" /></a>The results speak for themselves. The majority of the participants are convinced that standards like SKOS are important for their daily work. In August 2009 the W3C announced the new SKOS standard; now, nearly two years later, the standard appears to be well established. <strong>48.7% stated that standards like SKOS are very important and 29.1% voted for “relevant”</strong>, 77.8% in total.</p>
<p>&nbsp;</p>
<p>As an overall result of the survey it can be stated: <em>The Semantic Web community has done a great job of convincing the controlled vocabulary community of the benefits of SKOS and linked data; on the other hand, only 3-5% are aware of SPARQL as a valuable means of building standard APIs around controlled vocabularies, which lowers the costs of implementing such knowledge organization systems.</em></p>
<p>Many thanks to all participants of this survey!</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.semantic-web.at/2011/06/26/thesaurus-based-search-engines-will-become-main-stream-in-the-near-future/feed/</wfw:commentRss>
		<slash:comments>9</slash:comments>
		</item>
		<item>
		<title>Florian Bauer: I like to view “linked data” as a “single worldwide API”</title>
		<link>http://blog.semantic-web.at/2011/03/16/florian-bauer-i-like-to-view-%e2%80%9clinked-data%e2%80%9d-as-a-%e2%80%9csingle-worldwide-api%e2%80%9d/</link>
		<comments>http://blog.semantic-web.at/2011/03/16/florian-bauer-i-like-to-view-%e2%80%9clinked-data%e2%80%9d-as-a-%e2%80%9csingle-worldwide-api%e2%80%9d/#comments</comments>
		<pubDate>Wed, 16 Mar 2011 14:18:15 +0000</pubDate>
		<dc:creator>Andreas Blumauer</dc:creator>
				<category><![CDATA[Linked Data & Open Data]]></category>
		<category><![CDATA[Open Government Data]]></category>
		<category><![CDATA[Search Engines]]></category>
		<category><![CDATA[Tools & Software]]></category>
		<category><![CDATA[clean energy]]></category>
		<category><![CDATA[REEEP]]></category>
		<category><![CDATA[reegle]]></category>
		<category><![CDATA[Semantic Search]]></category>

		<guid isPermaLink="false">http://blog.semantic-web.at/?p=2017</guid>
		<description><![CDATA[Florian Bauer is REEEP&#8217;s Operations and IT Director, responsible for the overall operational management of the organisation, the product management of reegle (the search engine for renewable energy and energy efficiency) and the management of the IT landscape of REEEP. &#8230; <a href="http://blog.semantic-web.at/2011/03/16/florian-bauer-i-like-to-view-%e2%80%9clinked-data%e2%80%9d-as-a-%e2%80%9csingle-worldwide-api%e2%80%9d/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p><span><strong><a href="http://blog.semantic-web.at/wp-content/uploads/2011/03/Florain-Bauer-REEEP.jpg"><img title="Florain Bauer REEEP" src="http://blog.semantic-web.at/wp-content/uploads/2011/03/Florain-Bauer-REEEP.jpg" alt="Florian Bauer" hspace="5" width="150" height="197" align="left" /></a>Florian Bauer</strong> is <a href="http://www.reeep.org" target="_blank">REEEP&#8217;s</a> Operations and IT Director, responsible for the overall operational management of the organisation, the product management of reegle (the search engine for renewable energy and energy efficiency) and the management of the IT       landscape of REEEP.</span></p>
<p>The PoolParty team had the chance to talk with Florian about reegle, the information gateway on clean energy.</p>
<p><strong><em>Could you please give us a brief overview of reegle &#8211; what are the targets you are pursuing with this platform?</em></strong></p>
<p>The main aim of the reegle information gateway (<a href="http://www.reegle.info" target="_blank">http://www.reegle.info</a>) is to provide a one-stop gateway to comprehensive, high-quality and up-to-date information on clean energy.  By making this information accessible to stakeholders in the field around the world, and by presenting it in a user-friendly and intuitive format, reegle directly helps to facilitate the transition to low-carbon energy.</p>
<p>The website provides information on renewable energy, energy efficiency and climate change and their various sub-sectors at a global level, and some reegle services actually combine raw data sets from several different sources, put these datasets into context and thus provide enriched information.</p>
<p>reegle is an offshoot of the Renewable Energy &amp; Energy Efficiency Partnership (<a href="http://www.reeep.org" target="_blank">REEEP</a>), a non-profit, specialist change agent aiming to catalyze the market for renewable energy and energy efficiency, with a primary focus on emerging markets and developing countries.</p>
<p>The new reegle data portal (<a href="http://data.reegle.info" target="_blank">data.reegle.info</a>), launched in 2011, has established reegle as a publisher and consumer of Linked Open Data in the energy sector. It provides key clean energy datasets, free for re-use, based on the W3C&#8217;s Linked Open Data standards.</p>
<p><strong><em>reegle consists of two components: one is the semantic search engine (<a href="http://www.reegle.info/" target="_blank">http://www.reegle.info/</a>), the other is the linked data portal (<a href="http://data.reegle.info/" target="_blank">http://data.reegle.info/</a>) &#8211; What are your target groups, and which typical problems of the clean energy domain can you solve with these services?</em></strong></p>
<p>For reegle.info, our target groups are primarily project developers, financiers and government policy-makers. These users can access high-quality information on clean energy-related issues with the set of tools we provide:  a special web search, a catalogue of more than 1700 key stakeholders, a map view for geographical browsing, a clean energy glossary, and an <a href="http://www.reegle.info/countries" target="_blank">energy country profiles</a> function.</p>
<p>The energy country profiles are typical of what we’re trying to achieve.  Here, we take information from many different providers and combine it all to present one comprehensive information dossier on renewable energy and energy efficiency in that particular country.  This means that in one location you have the country’s most important energy-related information, ranging from key statistics and current regulations to key players in the energy field in both the public and private sectors.</p>
<p>For our data portal, the target group is a more technical one:  primarily IT developers and open data specialists who want to create new mash-ups and integrate data from reegle into other websites. One of the first to use these reegle datasets is the <a href="http://OpenEI.org" target="_blank">OpenEI.org</a> website, another key portal in the energy field.</p>
<p><strong><em>Open data is not the same as linked open data. Why did you choose to build your services around the W3C&#8217;s linked data paradigm and/or standards like RDF?</em></strong></p>
<p>Tim Berners-Lee once mentioned that he likes to compare the progressive ways of offering data with the “stars system” used to rate hotels. You get:</p>
<p>* for making data public (in any format)<br />
** for machine-readable formats (structured data)<br />
*** if the data is offered in a non-proprietary format<br />
**** if you use URIs to identify things, so people can point to your datasets<br />
***** for linking to other people’s data to provide context</p>
<p>So, as you can imagine, our goal is for reegle to be firmly in the 5-star category, and to establish reegle as an avant-garde tool in energy data.<br />
I also like to view “linked data” as a “single worldwide API”.  If the old web was like a huge book, the new semantic web is like a huge database, and SPARQL is the way to ask for information: you send a query to the SPARQL endpoint. RDF is the language that makes it possible to describe a given dataset with all the necessary information, including any links to other datasets. RDF data and SPARQL endpoints are therefore powerful tools for finding and filtering datasets, and they are crucial, foundational parts of the semantic web’s architectural layers. For reegle, the SPARQL endpoint and a description of the structure of our RDF files are available on our <a href="http://data.reegle.info/" target="_blank">clean energy open data portal</a>.</p>
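<p>Asking for information by sending a query to a SPARQL endpoint means, in the simplest case, an HTTP GET request as defined by the SPARQL 1.1 Protocol. The Python sketch below only builds such a request with the standard library. The endpoint URL is a placeholder (the actual reegle endpoint address is not given here), and the query simply asks for any ten triples.</p>
<pre>
```python
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical endpoint address; take the real one from the data portal.
ENDPOINT = "http://example.org/sparql"

# The simplest possible query: any ten triples in the store.
QUERY = "SELECT ?s ?p ?o WHERE { ?s ?p ?o } LIMIT 10"


def build_sparql_request(endpoint, query):
    """Build a SPARQL Protocol query via GET: the query string travels
    URL-encoded in the 'query' parameter."""
    url = endpoint + "?" + urlencode({"query": query})
    # Ask for results in the standard SPARQL 1.1 JSON results format.
    return Request(url, headers={"Accept": "application/sparql-results+json"})


req = build_sparql_request(ENDPOINT, QUERY)
# Nothing is sent here; urllib.request.urlopen(req) would send the request.
```
</pre>
<p>The JSON answer lists one entry per query solution under <code>results</code> / <code>bindings</code>, so any client that speaks HTTP and JSON can consume the data.</p>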
<p><strong><em>You also decided to build a SKOS-based domain thesaurus for clean energy, which now plays an important role in improving the search experience at reegle.<br />
What experience have you gained so far from this effort? What obstacles did you have to overcome?</em></strong></p>
<p>The SKOS-based renewable energy thesaurus can be seen as the “heart” of reegle as it provides the basis for a lot of related services in reegle, including the refinement suggestions for search results, the auto-completion options and the glossary links between defined terms and their synonyms and related terms.</p>
<p>We decided to use SKOS because we think it is the best language for building a formal, controlled vocabulary for thesauri in a semantic web context without adding too much complexity. Although it is a simple language, you still need IT experts to build a thesaurus with it, or rather domain experts with additional IT skills (hard to find!).</p>
<p>So in our case, we decided to use a scalable and easy-to-use thesaurus server called “<a href="http://poolparty.punkt.at/" target="_blank">PoolParty</a>”. Using this system drastically reduced the complexity, and allowed us to concentrate on the actual building of the thesaurus with our domain experts, and to spend less time on transferring the knowledge into data sets.</p>
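<p>The auto-completion mentioned above can be pictured as prefix matching over all preferred and alternative labels, so that typing any synonym leads to the same concept. A minimal Python sketch, assuming a hypothetical in-memory <code>LABELS</code> excerpt rather than reegle&#8217;s real thesaurus or code:</p>
<pre>
```python
# Hypothetical excerpt of a clean energy thesaurus: concept id -> labels
# (the first entry plays the role of the prefLabel, the rest are altLabels).
LABELS = {
    "solar_pv": ["Photovoltaics", "Solar PV", "PV"],
    "solar_thermal": ["Solar thermal energy", "Solar heating"],
    "wind": ["Wind power", "Wind energy"],
}


def autocomplete(prefix, limit=5):
    """Suggest (label, concept id) pairs whose label starts with `prefix`.
    Matching alternative labels lets users reach a concept via any synonym."""
    needle = prefix.lower()
    hits = []
    for cid, labels in LABELS.items():
        for label in labels:
            if label.lower().startswith(needle):
                hits.append((label, cid))
    hits.sort(key=lambda hit: hit[0].lower())  # case-insensitive ordering
    return hits[:limit]
```
</pre>
<p>Because each suggestion carries its concept id, a search front end can resolve &#8220;Solar PV&#8221; and &#8220;Photovoltaics&#8221; to the same concept.</p>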
<p><strong><em>What are your future plans with reegle?</em></strong></p>
<p>Currently we’re working on restructuring the site to better highlight our new added-value services such as the clean energy country profiles. We are also planning to extend our thesaurus with climate-compatible development terms, and we’ll soon release a WordPress plug-in to bring this thesaurus into clean energy blogs. One of the most exciting projects we are currently working on is the development of “dossier pages”, where we will mash up relevant information on several topics on a single page using semantic web technologies. This is part of the EU-funded <a href="http://www.scms.eu/" target="_blank">SCMS</a> (“semantic content management system”) project.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.semantic-web.at/2011/03/16/florian-bauer-i-like-to-view-%e2%80%9clinked-data%e2%80%9d-as-a-%e2%80%9csingle-worldwide-api%e2%80%9d/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Hjalmar Gislason: &#8220;What I call the emerging field of Data Market.&#8221;</title>
		<link>http://blog.semantic-web.at/2011/03/03/hjalmar-gislason-what-i-call-the-emerging-field-of-data-market/</link>
		<comments>http://blog.semantic-web.at/2011/03/03/hjalmar-gislason-what-i-call-the-emerging-field-of-data-market/#comments</comments>
		<pubDate>Thu, 03 Mar 2011 18:33:48 +0000</pubDate>
		<dc:creator>Thomas Thurner</dc:creator>
				<category><![CDATA[Linked Data & Open Data]]></category>
		<category><![CDATA[Open Government Data]]></category>
		<category><![CDATA[Search Engines]]></category>
		<category><![CDATA[Tools & Software]]></category>
		<category><![CDATA[Linked Data]]></category>
		<category><![CDATA[startup]]></category>

		<guid isPermaLink="false">http://blog.semantic-web.at/?p=1991</guid>
		<description><![CDATA[Open Government Data, and Open Data provided by the corporate sector, are stimulating an emerging market segment: Commercial Open Data Services. The Icelandic startup datamarket.com is one of the emerging companies in this field. Thomas Thurner from Semantic Web Company had &#8230; <a href="http://blog.semantic-web.at/2011/03/03/hjalmar-gislason-what-i-call-the-emerging-field-of-data-market/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p><img align="right" hspace="5" title="Hjalmar Gislason" src="http://blog.semantic-web.at/wp-content/uploads/2011/03/hjalli2.jpg" alt="" width="125" height="167" />Open Government Data, and Open Data provided by the corporate sector, are stimulating an emerging market segment: Commercial Open Data Services. The Icelandic startup datamarket.com is one of the emerging companies in this field. Thomas Thurner from Semantic Web Company had the chance to talk to Hjalmar Gislason, founder and CEO of datamarket.com.<br />
<code> </code></p>
<div><strong>Semantic Puzzle:</strong> What&#8217;s the business idea behind <a href="http://datamarket.com/" target="_blank">datamarket.com</a>? Whom do you expect to pay for what?</div>
<div><strong>Hjalmar Gislason</strong><strong>: </strong>From the end-user perspective it&#8217;s easiest to describe <a href="http://datamarket.com/" target="_blank">datamarket.com</a> as a search engine for statistical data, a &#8220;Google for statistics&#8221; if you will. Any data that is already openly and freely available out there will still be open and free on DataMarket, just easier to find, use, compare and download from a single source. While the audience for a search engine for statistical content is obviously way smaller than for text content, a significant part of that audience is business users, looking for data for business reasons. This means that there are more direct and lucrative ways to monetize the usage than simply contextual ads, especially reselling access to premium data. This is a market that already turns over billions of dollars annually, but is as far from the &#8220;2.0 world&#8221; as one could possibly imagine (think <a href="http://bloomberg.com" target="_blank">Bloomberg</a>, <a href="http://reuters.com" target="_blank">Reuters</a>, <a href="http://factset.com" target="_blank">FactSet</a>). We believe there is an opportunity to disrupt a part of their business with a freemium approach, and furthermore to open up the data market by reaching a business audience outside the narrowly defined financial user base that these companies cater to. There is data out there, free and premium alike, that can help almost any business make better plans and decisions. Connecting people and businesses to the data that they need will release phenomenal value. Tapping into just a fraction of that will be a hugely successful business for those who get it right.</div>
<p><code> </code></p>
<div><strong>Semantic Puzzle:</strong> Can you tell me a bit about the technological framework behind <a href="http://datamarket.com/" target="_blank">datamarket.com</a>? How is content from third parties fed into the system, and which APIs do you use? As you mainly provide XLS and CSV, have you thought about also providing data as XML in the future?</div>
<div><strong>Hjalmar Gislason</strong><strong>: </strong>The backend system is written in Python. We read data from the sources in many different formats, ranging from Excel files and even scraped web pages to proprietary APIs and Web Services. The data is then stored in a normalized format in a Postgres database that we&#8217;re using in a pretty unique way to be able to efficiently store the billions of time series and fact values that the system will eventually hold (currently at around <a href="http://blog.datamarket.com/2011/01/23/13-thousand-data-sets-100-million-time-series-600-million-facts" target="_blank">100 million time series and 600 million fact values</a>). The web site is also written in Python, using the <a href="http://www.djangoproject.com" target="_blank">Django</a> framework, but also making use of a lot of JavaScript libraries (and a bunch of our own code) to allow for an exciting user experience. We&#8217;re currently using a Flash-based solution called <a href="http://amcharts.com" target="_blank">amCharts</a> for the charts, but have already taken some steps to replace that with our own solution that we&#8217;ve written on top of the excellent <a href="http://vis.stanford.edu/protovis" target="_blank">Protovis</a> visualization library. While you are right that the export formats we provide for end users are XLS, CSV and images (for exporting the graphs), our <a href="http://datamarket.com/p/api/" target="_blank">REST-ful API</a> actually supports XML and JSON formats as well. So we already provide data as XML.</div>
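<p>Offering the same data in several formats is easiest when every format is rendered from one normalized record. The Python sketch below shows that pattern for JSON and XML using only the standard library; the record layout and the element names are hypothetical and are not taken from the DataMarket API.</p>
<pre>
```python
import json
import xml.etree.ElementTree as ET

# Hypothetical normalized time series; field names are illustrative only.
series = {
    "id": "example-series",
    "title": "Example series",
    "values": [("2009", 1.5), ("2010", 2.0)],
}


def to_json(s):
    """Serialize the time series as a JSON document."""
    return json.dumps({
        "id": s["id"],
        "title": s["title"],
        "values": [{"period": p, "value": v} for p, v in s["values"]],
    })


def to_xml(s):
    """Serialize the same time series as an XML document."""
    root = ET.Element("series", id=s["id"])
    ET.SubElement(root, "title").text = s["title"]
    for period, value in s["values"]:
        ET.SubElement(root, "value", period=period).text = str(value)
    return ET.tostring(root, encoding="unicode")
```
</pre>
<p>Adding a further export format then means adding one more serializer, while the stored data stays untouched.</p>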
<p><code> </code></p>
<div><strong>Semantic Puzzle: </strong>As you surely know, Tim Berners-Lee has proposed a 5-star scheme for OGD providers. Where do you see your own service in this framework?</div>
<div><strong>Hjalmar Gislason</strong><strong>: </strong>Any fact value, time series and data set on DataMarket is &#8220;addressable&#8221; with a direct URL using our API. In that sense, all the data on DataMarket is four-star data according to Berners-Lee&#8217;s definition. In many cases we&#8217;re integrating data that is only one- or two-star data, so just by integrating it into our system we&#8217;ve moved it a few notches up that ladder. In some cases we&#8217;ve even been helping organizations publish data for the first time, taking the data from 0 to 4 stars in one go. We&#8217;ve been toying around with several ideas that would take &#8211; or enable users to take &#8211; the data all the way to 5-star status, but that&#8217;s still just on the drawing board.</div>
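<p>In Berners-Lee&#8217;s scheme the stars are cumulative (public, machine-readable, non-proprietary format, URIs, links to other data), which is why URL-addressable data that does not link to other datasets ends up at four stars. A small Python sketch of that rule; the <code>Publication</code> fields are a hypothetical checklist, not an official test:</p>
<pre>
```python
from dataclasses import dataclass


@dataclass
class Publication:
    """Hypothetical checklist describing how a dataset is published."""
    public: bool            # on the web, in any format
    machine_readable: bool  # structured data, e.g. a spreadsheet
    non_proprietary: bool   # open format, e.g. CSV instead of XLS
    uses_uris: bool         # things are identified by URIs
    links_to_others: bool   # links to other people's data for context


def star_rating(pub):
    """Return 0-5 stars. The scheme is cumulative: a level only counts
    if every level below it is also met."""
    levels = [pub.public, pub.machine_readable, pub.non_proprietary,
              pub.uses_uris, pub.links_to_others]
    stars = 0
    for met in levels:
        if not met:
            break
        stars += 1
    return stars
```
</pre>
<p>An Excel file on the web thus rates two stars, and the same table published as CSV rates three.</p>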
<p><code> </code></p>
<div><strong>Semantic Puzzle: </strong>You re-use a lot of open data coming from the Icelandic government. Is there also a state-owned data portal for Iceland, or is your service a &#8220;commercial replacement&#8221; for such a public effort?</div>
<div><strong>Hjalmar Gislason</strong><strong>: </strong>There is no government-operated data portal in Iceland, and to my knowledge there are no plans for implementing one yet. Sadly there are several more pressing issues in terms of eGovernment here that take higher priority. We don&#8217;t see our efforts as a replacement for such a portal, but we have managed to fulfill a small part of that role when it comes to statistical data. We&#8217;ve also been really vocal about the benefits of open data and, among other things, were influential in launching an open data wiki, <a href="http://opingogn.net/" target="_blank">opingogn.net</a> (Icelandic only), which explains the concepts with examples and use cases and attempts to list as many sources of government data as possible in a directory. There is some movement, but as an open data enthusiast I&#8217;d really like to see things happening faster. As a matter of fact I think there are reasons for Iceland to be extra enthusiastic about open data to <a href="http://blog.datamarket.com/2009/08/26/iceland-restoring-trust-through-open-data-and-brutal-transparency/" target="_blank">increase transparency and restore trust</a> after the crash of the banks and the economic system in 2008.</div>
<div><strong>Semantic Puzzle: </strong>Many commercial open data services (Socrata, Factual, Google &#8230;) are emerging at the moment. How do you think this market segment will develop over the coming months and years, and what do you see as the crucial factors for such a business?</div>
<div><strong>Hjalmar Gislason</strong><strong>: </strong>I&#8217;ve written quite a lot about the developments in this industry on our blog. One of the things I&#8217;ve written most about is what I call the &#8220;<a href="http://blog.datamarket.com/2011/02/25/the-emerging-field-of-data-markets-our-competitive-landscape/" target="_blank">emerging field of data markets</a>&#8221;. I define &#8220;data markets&#8221; as &#8220;Services that make it easy to find data from a range of secondary data sources, then consume or acquire the data in a usable – and often unified – format.&#8221; Many of these services are trying to create marketplaces for data, envisioning that data providers can offer their data sets for sale to data seekers. As there are several players in this space already, I believe we’ll see many of them try to differentiate themselves in 2011 by focusing on specific types of data. There are definitely opportunities in building specialized data markets for geospatial data, for statistics and for enormous scientific data sets – to name a few types – and each comes with its own challenges, target audiences and preferred approaches. In the spirit of doing one thing and doing it well, I think most of these projects will want to succeed in one such segment of the market before generalizing – or consolidating.</div>
<p><strong>The interviewee: </strong>Hjalmar is a successful entrepreneur who has founded three startups in the gaming, mobile and web sectors since 1996. Prior to launching DataMarket, Hjalmar worked on new media and business development for companies in the Skipti Group (owners of Iceland Telecom) after they acquired his search startup, Spurl. Hjalmar offers a mix of business, strategy and technical expertise.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.semantic-web.at/2011/03/03/hjalmar-gislason-what-i-call-the-emerging-field-of-data-market/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Why SKOS thesauri matter &#8211; the next generation of semantic technologies</title>
		<link>http://blog.semantic-web.at/2010/08/31/why-skos-thesauri-matter-the-next-generation-of-semantic-technologies/</link>
		<comments>http://blog.semantic-web.at/2010/08/31/why-skos-thesauri-matter-the-next-generation-of-semantic-technologies/#comments</comments>
		<pubDate>Tue, 31 Aug 2010 04:44:19 +0000</pubDate>
		<dc:creator>Andreas Blumauer</dc:creator>
				<category><![CDATA[Search Engines]]></category>
		<category><![CDATA[Semantic Web Applications]]></category>
		<category><![CDATA[Text Mining]]></category>
		<category><![CDATA[Tools & Software]]></category>
		<category><![CDATA[lasso]]></category>
		<category><![CDATA[lod2]]></category>
		<category><![CDATA[PoolParty]]></category>
		<category><![CDATA[recommender system]]></category>
		<category><![CDATA[similarity search]]></category>

		<guid isPermaLink="false">http://blog.semantic-web.at/?p=1683</guid>
		<description><![CDATA[As a matter of fact still a lot of &#8220;semantic technologies&#8221; are around which do nothing else than pure statistical analysis of text. Sure, this is better than simple full text search but there are still quite a lot of &#8230; <a href="http://blog.semantic-web.at/2010/08/31/why-skos-thesauri-matter-the-next-generation-of-semantic-technologies/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>As a matter of fact still a lot of &#8220;semantic technologies&#8221; are around which do nothing else than pure statistical analysis of text. Sure, this is better than simple full text search but there are still quite a lot of opportunities to improve search, especially when it comes to more sophisticated applications like &#8220;similarity search&#8221;, the search for similar documents to enable cross-reading or recommendation systems.</p>
<p>Providers of <strong>first generation semantic technologies</strong> compute rather basic &#8220;semantic networks&#8221; through co-occurrence analysis, which sometimes yields disappointing results. Bearing in mind that Google has just bought a company (&#8220;<a href="http://techcrunch.com/2010/07/16/google-acquires-metaweb-to-make-search-smarter/" target="_blank">Google buys Metaweb</a>&#8221;) that has been working on one of the largest knowledge bases in the world, we can assume that some of the last miles towards a semantic search engine can be covered by applying thesauri or other structured knowledge bases.</p>
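<p>To make the co-occurrence idea concrete, here is a minimal Python sketch that is not taken from any particular product: two terms are linked whenever they appear together in at least <code>min_count</code> documents. The documents and the threshold are made up for illustration.</p>

```python
from collections import Counter
from itertools import combinations


def cooccurrence_network(documents, min_count=2):
    """Weighted term graph: edge (a, b) = number of documents containing both."""
    counts = Counter()
    for doc in documents:
        terms = sorted(set(doc))          # count each pair once per document
        for a, b in combinations(terms, 2):
            counts[(a, b)] += 1
    # drop weak edges seen in fewer than min_count documents
    return {pair: n for pair, n in counts.items() if n >= min_count}


docs = [
    ["bank", "interest", "loan"],
    ["bank", "river", "fishing"],
    ["bank", "interest", "inflation"],
]
print(cooccurrence_network(docs))
# {('bank', 'interest'): 2}
```

<p>The edge only records that &#8220;bank&#8221; and &#8220;interest&#8221; appear together; nothing in the counts tells the financial &#8220;bank&#8221; apart from the river &#8220;bank&#8221;, which is the kind of knowledge a thesaurus adds.</p>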
<p><a href="http://blog.semantic-web.at/wp-content/uploads/2010/08/PoolParty-DemoZone-Screenshot.png"><img class="alignleft size-medium wp-image-1687" title="PoolParty DemoZone Screenshot" src="http://blog.semantic-web.at/wp-content/uploads/2010/08/PoolParty-DemoZone-Screenshot-300x219.png" alt="" width="300" height="219" /></a></p>
<p>The <a href="http://twitter.com/PoolParty_Team" target="_blank">PoolParty team</a> recently developed a <a href="http://poolparty.punkt.at/demozone" target="_blank">demo application</a> that shows how thesauri improve search results on top of <strong>second generation semantic technologies</strong>. With <a href="http://poolparty.punkt.at/" target="_blank">PoolParty</a>, SKOS-based controlled vocabularies can be managed and enriched with linked data. The PoolParty Tag &amp; Content Recommender analyzes virtually any text or website and recommends corresponding tags, concepts from (in this case) <a href="http://zbw.eu/stw/">STW (Standard Thesaurus für Wirtschaft)</a> and <a href="http://dbpedia.org">DBpedia</a>, and the respective Wikipedia articles.</p>
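<p>The post does not describe how the Tag &amp; Content Recommender works internally. As a rough, hypothetical sketch of the general idea, a recommender can look for a thesaurus&#8217; preferred and alternative labels in a text and suggest the matching concepts. The concept IDs below are invented and are not real STW identifiers.</p>

```python
import re

# Hypothetical mini-thesaurus: concept ID -> preferred and alternative labels
THESAURUS = {
    "stw:inflation": {"pref": "Inflation", "alt": ["rising prices"]},
    "stw:monetary_policy": {"pref": "Monetary policy", "alt": ["central bank policy"]},
    "stw:unemployment": {"pref": "Unemployment", "alt": ["joblessness"]},
}


def recommend_concepts(text, thesaurus=THESAURUS):
    """Return (concept ID, number of label hits) pairs, most frequent first."""
    lowered = text.lower()
    hits = {}
    for concept, labels in thesaurus.items():
        for label in [labels["pref"]] + labels["alt"]:
            # whole-word, case-insensitive match of each label
            pattern = r"\b" + re.escape(label.lower()) + r"\b"
            n = len(re.findall(pattern, lowered))
            if n:
                hits[concept] = hits.get(concept, 0) + n
    return sorted(hits.items(), key=lambda kv: kv[1], reverse=True)


text = "Rising prices worry the central bank. Central bank policy reacts to inflation."
print(recommend_concepts(text))
# [('stw:inflation', 2), ('stw:monetary_policy', 1)]
```

<p>Because synonyms such as &#8220;rising prices&#8221; map to the same concept as &#8220;Inflation&#8221;, the suggestion is a concept rather than a bare keyword.</p>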
<p>STW, which was developed by the <a href="http://www.zbw.eu/" target="_blank">German National Library of Economics</a> (ZBW), provides vocabulary for every economic subject: about 6,000 standardized subject headings and about 18,000 entry terms to support individual keywords.</p>
<p>The demo app uses this background knowledge to dramatically improve the search for similar documents:</p>
<blockquote><p>Similarity between two documents can be calculated not only on the basis of key phrases but also on a conceptual basis. Even if two documents do not have a single word or phrase in common, they can be identified as &#8220;similar documents&#8221;.</p></blockquote>
<p>This works because thousands of important relations between economic subjects are represented in the domain-specific thesaurus. In this particular case, the best results are therefore achieved with economics documents (for instance from <a href="http://www.econstor.eu/">Econstor</a>), but other recommender systems can of course use thesauri from other domains instead of STW.</p>
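<p>One simple way to picture &#8220;conceptual&#8221; similarity is to expand each document&#8217;s concepts with their related concepts from a thesaurus before comparing them. The sketch below assumes a tiny hand-made relation table and uses Jaccard overlap as the similarity measure; it illustrates the principle and is not the measure used in the PoolParty demo.</p>

```python
# Hypothetical thesaurus relations (think skos:related / skos:broader links)
RELATED = {
    "inflation": {"price level", "monetary policy"},
    "interest rate": {"monetary policy", "credit"},
    "central bank": {"monetary policy"},
}


def expand(concepts, relations=RELATED):
    """Add all directly related concepts to a document's concept set."""
    expanded = set(concepts)
    for concept in concepts:
        expanded.update(relations.get(concept, set()))
    return expanded


def jaccard(a, b):
    """Size of the intersection divided by size of the union."""
    union = a.union(b)
    return len(a.intersection(b)) / len(union) if union else 0.0


doc_a = {"inflation"}
doc_b = {"interest rate", "central bank"}
print(jaccard(doc_a, doc_b))                  # 0.0: no concept in common
print(jaccard(expand(doc_a), expand(doc_b)))  # 1/6: both now contain "monetary policy"
```

<p>Without the thesaurus the two documents share nothing; after expansion they overlap in &#8220;monetary policy&#8221; and are judged at least somewhat similar.</p>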
<p><a href="http://blog.semantic-web.at/wp-content/uploads/2010/08/skos_hand.png"><img class="alignleft size-medium wp-image-1694" title="skos_hand" src="http://blog.semantic-web.at/wp-content/uploads/2010/08/skos_hand-272x300.png" alt="" width="272" height="300" /></a></p>
<p>Even this approach can be improved, and that development is underway: SKOS thesauri enriched with Linked Data do an even better job. This kind of <strong>third generation semantic technology</strong> is currently being developed by the <a href="http://www.lassoproject.org/" target="_blank">LASSO project</a> and the <a href="http://bit.ly/dcDlda" target="_blank">LOD2 project</a>, two innovative projects in the area of linked data and the semantic web.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.semantic-web.at/2010/08/31/why-skos-thesauri-matter-the-next-generation-of-semantic-technologies/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>What if the biggest web company bought one of the central semantic web players?</title>
		<link>http://blog.semantic-web.at/2010/07/17/what-if-the-biggest-web-company-bought-one-of-the-central-semantic-web-players/</link>
		<comments>http://blog.semantic-web.at/2010/07/17/what-if-the-biggest-web-company-bought-one-of-the-central-semantic-web-players/#comments</comments>
		<pubDate>Sat, 17 Jul 2010 10:47:38 +0000</pubDate>
		<dc:creator>Andreas Blumauer</dc:creator>
				<category><![CDATA[Companies & Institutions]]></category>
		<category><![CDATA[Search Engines]]></category>
		<category><![CDATA[Freebase]]></category>
		<category><![CDATA[Google]]></category>
		<category><![CDATA[Metaweb Technologies]]></category>

		<guid isPermaLink="false">http://blog.semantic-web.at/?p=1656</guid>
		<description><![CDATA[Well, exactly this happened yesterday: Google bought Metaweb &#8211; provider of Freebase. Freebase is an important hub in the linked data cloud providing 12 million entities with uniform resource identifiers most of them linked to other semantic web datasets like &#8230; <a href="http://blog.semantic-web.at/2010/07/17/what-if-the-biggest-web-company-bought-one-of-the-central-semantic-web-players/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>Well, exactly this happened yesterday: <a href="http://googleblog.blogspot.com/2010/07/deeper-understanding-with-metaweb.html">Google bought Metaweb</a> &#8211; provider of <a href="http://www.freebase.com/">Freebase</a>. Freebase is an important hub in the linked data cloud providing 12 million entities with uniform resource identifiers most of them linked to other semantic web datasets like <a href="http://dbpedia.org">DBpedia</a> or <a href="http://data.nytimes.com/" target="_blank">New York Times</a>. For example: <a href="http://www.freebase.com/view/en/google" target="_blank">Google´s page on Freebase</a> offers a rich source for <a href="http://rdf.freebase.com/rdf/en.google" target="_blank">machine-readable facts</a> around this company.</p>
<p><em>What does this mean for the Semantic Web community, which has been working on a smarter web over the last decade?</em><br />
Well, a lot&#8230; First of all, it&#8217;s good to hear that Google will continue to develop Freebase as a free and open database for everyone, saying &#8220;&#8230; we would be delighted if other web companies use and contribute to the data.&#8221;</p>
<p>Until yesterday, many companies were not fully convinced that the Semantic Web would play a central role in the further development of the Internet. Now the game has changed. The entity-driven approach to developing web applications has only just begun:</p>
<p><object width="500" height="306"><param name="movie" value="http://www.youtube.com/v/TJfrNo3Z-DU&amp;hl=en_US&amp;fs=1?rel=0&amp;hd=1"></param><param name="allowFullScreen" value="true"></param><param name="allowscriptaccess" value="always"></param><embed src="http://www.youtube.com/v/TJfrNo3Z-DU&amp;hl=en_US&amp;fs=1?rel=0&amp;hd=1" type="application/x-shockwave-flash" allowscriptaccess="always" allowfullscreen="true" width="500" height="306"></embed></object></p>
<p>We will keep reporting on and discussing how Google influences the development of the Semantic Web &#8211; and if I had one wish: please add RDF(a) to the Freebase widgets!</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.semantic-web.at/2010/07/17/what-if-the-biggest-web-company-bought-one-of-the-central-semantic-web-players/feed/</wfw:commentRss>
		<slash:comments>5</slash:comments>
		</item>
		<item>
		<title>Linking Open Data to Thesaurus Management</title>
		<link>http://blog.semantic-web.at/2010/02/16/linking-open-data-to-thesaurus-management/</link>
		<comments>http://blog.semantic-web.at/2010/02/16/linking-open-data-to-thesaurus-management/#comments</comments>
		<pubDate>Tue, 16 Feb 2010 16:23:11 +0000</pubDate>
		<dc:creator>Tassilo Pellegrini</dc:creator>
				<category><![CDATA[Corporate Semantic Web]]></category>
		<category><![CDATA[Knowledge Management]]></category>
		<category><![CDATA[Linked Data & Open Data]]></category>
		<category><![CDATA[Search Engines]]></category>
		<category><![CDATA[Semantic Web Applications]]></category>
		<category><![CDATA[Software Development]]></category>
		<category><![CDATA[dbpedia]]></category>
		<category><![CDATA[KIWI]]></category>
		<category><![CDATA[kiwiknows]]></category>
		<category><![CDATA[Linked Data]]></category>
		<category><![CDATA[PoolParty]]></category>
		<category><![CDATA[RDFa]]></category>
		<category><![CDATA[Semantic Web]]></category>
		<category><![CDATA[Simple Knowledge Organization System]]></category>
		<category><![CDATA[SKOS]]></category>

		<guid isPermaLink="false">http://blog.semantic-web.at/?p=1430</guid>
		<description><![CDATA[The Vienna-based company punkt. netServices is just about to release a demo version of their PoolParty service, a SKOS-based thesaurus management tool with linked data capabilities. I had the chance to pre-read a white paper and test their service. Here &#8230; <a href="http://blog.semantic-web.at/2010/02/16/linking-open-data-to-thesaurus-management/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p><a href="http://blog.semantic-web.at/wp-content/uploads/2010/02/poolparty-logo.jpg"><img class="alignleft size-full wp-image-1466" title="poolparty-logo" src="http://blog.semantic-web.at/wp-content/uploads/2010/02/poolparty-logo-e1266070425356.jpg" alt="" width="261" height="95" /></a>The Vienna-based company <a href="http://www.punkt.at" target="_blank">punkt. netServices</a> is just about to release a demo version of their PoolParty service, a SKOS-based thesaurus management tool with linked data capabilities. I had the chance to pre-read a white paper and test their service. Here is a brief overview. You can also try a <a href="http://poolparty.punkt.at/PoolParty/" target="_blank">demo</a>.</p>
<p><strong>Purpose</strong></p>
<p>PoolParty was designed to support various applications, such as</p>
<ul>
<li> Semantic search engines</li>
<li> Recommender systems (similarity search)</li>
<li> Corporate bookmarking</li>
<li> Annotation- &amp; tag recommender systems</li>
<li> Autocomplete services and faceted browsing</li>
</ul>
<p>These use cases can be achieved either by using PoolParty stand-alone or by integrating it with existing enterprise search engines, document management systems or enterprise wikis.</p>
<p><strong>Thesaurus Management</strong></p>
<p>PoolParty aims to be easy to use for people without a strong Semantic Web background or special technical skills. The GUI is entirely web-based and uses AJAX, so the user can, for example, quickly merge two concepts via drag &amp; drop. A tree view or a graph view of the concepts provides an overview of the thesaurus.</p>
<p><a href="http://blog.semantic-web.at/wp-content/uploads/2010/02/poolparty-blueskin.jpg"><img title="poolparty-blueskin" src="http://blog.semantic-web.at/wp-content/uploads/2010/02/poolparty-blueskin.jpg" alt="poolparty-blueskin" width="504" height="263" /></a></p>
<p>PoolParty also helps to add concepts to a thesaurus semi-automatically: it can analyse documents (e.g. web pages or PDF files) relevant to a thesaurus&#8217; domain in order to glean candidate terms, using the key-phrase extractor of <a href="http://www.nzdl.org/Kea/index.html">KEA</a>. The user can select extracted terms, which thereby become &#8220;free concepts&#8221;; these can later be integrated into the thesaurus, turning them into &#8220;approved concepts&#8221;.</p>
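<p>The candidate-term step can be illustrated with a deliberately simplified Python sketch. KEA itself learns a model from features such as TF&#215;IDF and the position of a phrase&#8217;s first occurrence; the function below only ranks phrases by frequency and first occurrence, so treat it as an illustration of the workflow, not as KEA. The stopword list and the <code>select</code>/<code>approve</code> helpers modelling &#8220;free&#8221; and &#8220;approved&#8221; concepts are hypothetical.</p>

```python
import re
from collections import Counter

STOPWORDS = {"the", "of", "and", "a", "an", "in", "to", "is", "for", "on"}


def candidate_terms(text, max_len=3, top_n=5):
    """Rough candidate-term extraction in the spirit of KEA (not KEA itself).

    Candidates are phrases of 1..max_len words that neither start nor end
    with a stopword. They are ranked by frequency; ties go to the phrase
    that occurs first in the text.
    """
    words = re.findall(r"[a-z]+", text.lower())
    counts, first_pos = Counter(), {}
    for i in range(len(words)):
        for n in range(1, max_len + 1):
            gram = words[i:i + n]
            if len(gram) != n or gram[0] in STOPWORDS or gram[-1] in STOPWORDS:
                continue
            phrase = " ".join(gram)
            counts[phrase] += 1
            first_pos.setdefault(phrase, i)
    ranked = sorted(counts, key=lambda p: (-counts[p], first_pos[p]))
    return ranked[:top_n]


# Hypothetical review workflow: selected terms become "free", then "approved"
def select(status, terms):
    for term in terms:
        status.setdefault(term, "free")


def approve(status, term):
    if status.get(term) == "free":
        status[term] = "approved"


text = "Renewable energy policy. Renewable energy subsidies and energy efficiency."
print(candidate_terms(text))
# ['energy', 'renewable', 'renewable energy', 'renewable energy policy', 'energy policy']
status = {}
select(status, ["renewable energy", "energy policy"])
approve(status, "renewable energy")
print(status)
# {'renewable energy': 'approved', 'energy policy': 'free'}
```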
<p>Documents can be searched in various ways: by keyword search in the full text, by searching for their tags, or by semantic search and similarity search. Semantic search takes into account not only a concept&#8217;s preferred label but also its synonyms and the labels of its related concepts. The user can manually remove query terms used in semantic search, and the boost values for the various relations considered in semantic search can also be adjusted. The recommendation mechanism for calculating document similarity works in the same way.</p>
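<p>Here is a minimal sketch of such query expansion, assuming a concept record with a preferred label, synonyms and related labels. The boost values are made up (the post only says they are adjustable), and the output uses Lucene&#8217;s <code>"phrase"^boost</code> query syntax simply because Lucene/Solr is one of the search back ends the post mentions; PoolParty&#8217;s actual query format is not described.</p>

```python
# Hypothetical concept record; keys loosely follow SKOS property names
CONCEPT = {
    "prefLabel": "inflation",
    "altLabel": ["rising prices", "price increase"],
    "related": ["monetary policy", "purchasing power"],  # labels of related concepts
}

# Made-up default boosts per relation; the post only says they are adjustable
DEFAULT_BOOSTS = {"prefLabel": 3.0, "altLabel": 2.0, "related": 1.0}


def expand_query(concept, boosts=DEFAULT_BOOSTS, removed=()):
    """Weighted query terms for a concept, minus terms the user removed.

    A boost of 0 switches a relation off entirely.
    """
    terms = [(concept["prefLabel"], boosts["prefLabel"])]
    terms += [(label, boosts["altLabel"]) for label in concept["altLabel"]]
    terms += [(label, boosts["related"]) for label in concept["related"]]
    return [(t, b) for t, b in terms if t not in removed and b > 0]


def to_lucene(terms):
    """Render weighted phrases using Lucene's "phrase"^boost query syntax."""
    return " OR ".join(f'"{t}"^{b}' for t, b in terms)


print(to_lucene(expand_query(CONCEPT, removed={"price increase"})))
# "inflation"^3.0 OR "rising prices"^2.0 OR "monetary policy"^1.0 OR "purchasing power"^1.0
```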
<p>By default, PoolParty also publishes a Semantic Wiki version of its thesauri, which provides an alternative way to browse and edit concepts. Through this feature, anyone can get read access to a thesaurus and, optionally, also edit, add or delete concept labels. Search and autocomplete functions are available here as well. The wiki&#8217;s XHTML source is also enriched with RDFa, so that all RDF metadata associated with a concept can be picked up by RDF search engines and crawlers (see two examples: <a href="http://poolparty.punkt.at/PoolParty/HTMLFrontEnd/urn:uuid:1D64A764-CBCE-0001-6148-DA20F637144F/" target="_blank">Cocktail thesaurus</a> and <a href="http://poolparty.punkt.at/PoolParty/HTMLFrontEnd/urn:uuid:1D649E15-C6CC-0001-C311-60702F00C880/?URI=http%3A%2F%2Fzbw.eu%2Fstw" target="_blank">Standard Thesaurus for Economics</a>).</p>
<p style="text-align: center;"><a href="http://blog.semantic-web.at/wp-content/uploads/2010/02/PoolParty-Wiki-Frontend.png"><img class="aligncenter size-full wp-image-1468" title="PoolParty Wiki Frontend" src="http://blog.semantic-web.at/wp-content/uploads/2010/02/PoolParty-Wiki-Frontend.png" alt=""  /></a></p>
<p>PoolParty also supports importing thesauri in SKOS format (including several consistency checks) or in <a href="http://zthes.z3950.org/" target="_blank">Zthes</a> format. These functions can also be consumed as stand-alone web services via <a href="http://demo.semantic-web.at:8080/SkosServices/index" target="_blank">PoolParty SKOS Services</a>. Additionally, lists of concepts and their labels can be imported from CSV files.</p>
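<p>The post does not list which consistency checks PoolParty runs. As a hedged illustration, the sketch below loads labels from a CSV file with a hypothetical layout (one row per label) and checks two integrity conditions from the SKOS Reference: a concept has at most one <code>skos:prefLabel</code> per language tag, and the same label may not be both its <code>skos:prefLabel</code> and its <code>skos:altLabel</code>.</p>

```python
import csv
import io
from collections import defaultdict

# Hypothetical CSV layout: one row per label
CSV_DATA = """id,type,label,lang
c1,pref,Mojito,en
c1,alt,Mojito cocktail,en
c2,pref,Daiquiri,en
c2,pref,Daiquiri classic,en
c3,pref,Caipirinha,en
c3,alt,Caipirinha,en
"""


def load_concepts(text):
    """Group CSV rows into {concept id: {"pref": [...], "alt": [...]}}."""
    concepts = defaultdict(lambda: {"pref": [], "alt": []})
    for row in csv.DictReader(io.StringIO(text)):
        concepts[row["id"]][row["type"]].append((row["label"], row["lang"]))
    return concepts


def check_concepts(concepts):
    """Check two SKOS integrity conditions: at most one prefLabel per
    language tag, and no label that is both prefLabel and altLabel."""
    problems = []
    for cid, labels in concepts.items():
        langs = [lang for _, lang in labels["pref"]]
        for lang in sorted(set(langs)):
            if langs.count(lang) > 1:
                problems.append(f"{cid}: several prefLabels in '{lang}'")
        for label, _ in sorted(set(labels["pref"]).intersection(labels["alt"])):
            problems.append(f"{cid}: '{label}' is both prefLabel and altLabel")
    return problems


for problem in check_concepts(load_concepts(CSV_DATA)):
    print(problem)
# c2: several prefLabels in 'en'
# c3: 'Caipirinha' is both prefLabel and altLabel
```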
<p><strong>Linked (Open) Data</strong></p>
<p>PoolParty not only publishes its thesauri as Linked Open Data (in addition to a SPARQL endpoint), but it also consumes LOD in order to expand thesauri with information from LOD sources.</p>
<p>Concepts in the thesaurus can be linked to, for example, DBpedia via a service like <a href="http://www.georgikobilarov.com/">Georgi Kobilarov</a>&#8217;s <a href="http://lookup.dbpedia.org/" target="_blank">DBpedia lookup service</a>, which takes the label of a concept and returns possible matching candidates. The system suggests relevant resources from DBpedia, and the user selects the one that matches the concept in their thesaurus, thereby creating a skos:exactMatch relation between the concept URI in PoolParty and the DBpedia URI. The same approach can be used to link to other SKOS thesauri available as Linked Data.</p>
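<p>The linking step can be sketched as follows. The lookup service is replaced by a hard-coded dictionary (a real client would query it over HTTP), candidates are ordered with Python&#8217;s <code>difflib</code> string similarity (an assumption, not how the lookup service ranks results), and the hypothetical <code>choose</code> callback stands in for the user&#8217;s selection. The concept URI and the candidate URIs are illustrative.</p>

```python
from difflib import SequenceMatcher

SKOS_EXACT_MATCH = "http://www.w3.org/2004/02/skos/core#exactMatch"

# Stand-in for the lookup service: label -> candidate URIs (illustrative).
# A real client would query the lookup service over HTTP instead.
FAKE_LOOKUP = {
    "mojito": [
        "http://dbpedia.org/resource/Mojito_(film)",
        "http://dbpedia.org/resource/Mojito",
    ],
}


def rank_candidates(label, candidates):
    """Order candidate URIs by string similarity of their local name to the label."""
    def score(uri):
        local_name = uri.rsplit("/", 1)[-1].replace("_", " ").lower()
        return SequenceMatcher(None, label.lower(), local_name).ratio()
    return sorted(candidates, key=score, reverse=True)


def link(concept_uri, label, choose=lambda ranked: ranked[0]):
    """Suggest candidates; `choose` plays the user picking the right one."""
    ranked = rank_candidates(label, FAKE_LOOKUP.get(label.lower(), []))
    if not ranked:
        return None
    return (concept_uri, SKOS_EXACT_MATCH, choose(ranked))


print(link("http://example.org/cocktails/mojito", "Mojito"))
# ('http://example.org/cocktails/mojito', 'http://www.w3.org/2004/02/skos/core#exactMatch', 'http://dbpedia.org/resource/Mojito')
```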
<p><a href="http://blog.semantic-web.at/wp-content/uploads/2010/02/poolparty-lod.jpg"><img title="poolparty-lod" src="http://blog.semantic-web.at/wp-content/uploads/2010/02/poolparty-lod.jpg" alt="poolparty-lod" width="630" height="265" /></a></p>
<p>Other triples can also be retrieved from the target data source: for example, the DBpedia abstract can become a skos:definition, and geographical coordinates can be imported and used to display the location of a concept on a map where appropriate. DBpedia category information can also be used to retrieve additional concepts from the same category as siblings of the concept in focus, in order to populate the thesaurus.</p>
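<p>A minimal sketch of this mapping, assuming the facts about the linked DBpedia resource have already been fetched as (predicate, value, language) tuples. The predicate URIs are the DBpedia ontology&#8217;s <code>abstract</code> property and the W3C WGS84 <code>lat</code>/<code>long</code> properties that DBpedia uses; the concept URI, the abstracts and the coordinate values are illustrative.</p>

```python
DBO_ABSTRACT = "http://dbpedia.org/ontology/abstract"
GEO_LAT = "http://www.w3.org/2003/01/geo/wgs84_pos#lat"
GEO_LONG = "http://www.w3.org/2003/01/geo/wgs84_pos#long"
SKOS_DEFINITION = "http://www.w3.org/2004/02/skos/core#definition"


def enrich(concept_uri, facts, lang="en"):
    """Turn facts about a matched DBpedia resource into triples for a concept.

    facts: (predicate URI, value, language tag or None) tuples that have
    already been retrieved for the linked resource (retrieval not shown).
    """
    new_triples, coords = [], {}
    for predicate, value, value_lang in facts:
        if predicate == DBO_ABSTRACT and value_lang == lang:
            # the abstract in the thesaurus language becomes a definition
            new_triples.append((concept_uri, SKOS_DEFINITION, value))
        elif predicate in (GEO_LAT, GEO_LONG):
            coords[predicate] = float(value)
    # only keep a location if both latitude and longitude are known
    if len(coords) == 2:
        new_triples += [(concept_uri, p, v) for p, v in coords.items()]
    return new_triples


facts = [
    (DBO_ABSTRACT, "Vienna is the capital of Austria.", "en"),
    (DBO_ABSTRACT, "Wien ist die Bundeshauptstadt.", "de"),
    (GEO_LAT, "48.2083", None),
    (GEO_LONG, "16.3731", None),
]
for triple in enrich("http://example.org/places/vienna", facts):
    print(triple)  # one skos:definition triple, then latitude and longitude
```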
<p>PoolParty can import a SKOS thesaurus from a Linked Data server and can also receive updates to thesauri imported this way. This feature was implemented in the course of the <a href="http://www.kiwi-project.eu/" target="_blank">KiWi project</a>, which is funded by the European Commission. KiWi also contains SKOS thesauri and exposes them as LOD. Each system can read a thesaurus via the other&#8217;s LOD interface and write it to its own store. This is facilitated by special Linked Data URIs that return, for example, all the top concepts of a thesaurus with pointers to the URIs of their narrower concepts; this allows other systems to retrieve a complete thesaurus by iteratively dereferencing concept URIs.</p>
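<p>Iterative dereferencing amounts to a graph traversal that starts at the top concepts. In the sketch below, HTTP is replaced by a dictionary that maps each (hypothetical, example.org) concept URI to the description a server might return; a real client would fetch each URI with an RDF <code>Accept</code> header. The traversal is breadth-first and remembers visited URIs, so a concept with several broader concepts is fetched only once.</p>

```python
from collections import deque

BASE = "http://example.org/thesaurus/"   # hypothetical namespace

# Stand-in for HTTP: what dereferencing each concept URI would return.
FAKE_WEB = {
    BASE + "economics": {"prefLabel": "Economics",
                         "narrower": [BASE + "money", BASE + "labour"]},
    BASE + "money": {"prefLabel": "Money", "narrower": [BASE + "inflation"]},
    BASE + "labour": {"prefLabel": "Labour",
                      "narrower": [BASE + "wages", BASE + "inflation"]},
    BASE + "inflation": {"prefLabel": "Inflation"},
    BASE + "wages": {"prefLabel": "Wages"},
}


def harvest(top_concepts, dereference):
    """Fetch a whole thesaurus by following narrower links breadth-first.

    Each URI is dereferenced once, even if it has several broader concepts.
    """
    seen = set(top_concepts)
    queue = deque(top_concepts)
    thesaurus = {}
    while queue:
        uri = queue.popleft()
        description = dereference(uri)
        thesaurus[uri] = description
        for child in description.get("narrower", []):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return thesaurus


result = harvest([BASE + "economics"], lambda uri: FAKE_WEB.get(uri, {}))
print([d["prefLabel"] for d in result.values()])
# ['Economics', 'Money', 'Labour', 'Inflation', 'Wages']
```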
<p>Additionally, KiWi and PoolParty publish lists of concepts created, modified, merged or deleted within user-specified time frames. From this information, each system can learn about updates made in the other system to one of its thesauri. It can then compare the versions of the concepts in both stores and write the corresponding updates to its own store.</p>
<p>This means that each system decides autonomously which data it accepts, and there is no risk of one system pushing data that might cause inconsistencies into an external store. Data transfer and communication use REST/HTTP; no other protocols or middleware are necessary. Nor is any rights management for external systems needed, which would otherwise have to be configured separately for each source.</p>
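<p>The pull-based synchronisation can be sketched like this, under loudly stated assumptions: the change-list format (<code>uri</code>, <code>action</code>, <code>time</code>) is hypothetical, both stores are plain dictionaries, &#8220;merged&#8221; is treated like &#8220;modified&#8221;, and the <code>accept</code> callback represents the local system&#8217;s own policy. The key property from the post is preserved: only the local system ever writes to its own store.</p>

```python
from datetime import datetime


def changes_since(change_list, since):
    """Entries of a published change list that fall inside the time frame."""
    return [c for c in change_list if c["time"] >= since]


def pull_updates(local, remote, change_list, since, accept=lambda change: True):
    """Pull-based sync: the *local* system decides what it writes.

    local, remote: dicts mapping concept URI -> concept description.
    Returns the URIs whose local copy was changed.
    """
    applied = []
    for change in changes_since(change_list, since):
        uri = change["uri"]
        if not accept(change):
            continue                       # local policy rejects this update
        if change["action"] == "deleted":
            if local.pop(uri, None) is not None:
                applied.append(uri)
        else:                              # created, modified or merged
            remote_version = remote.get(uri)
            if remote_version is not None and remote_version != local.get(uri):
                local[uri] = remote_version
                applied.append(uri)
    return applied


t0, t1 = datetime(2010, 2, 1), datetime(2010, 2, 15)
local = {"c1": {"pref": "Inflation"}, "c2": {"pref": "Wages"}, "c3": {"pref": "Money"}}
remote = {"c1": {"pref": "Inflation rate"}, "c3": {"pref": "Money"}, "c4": {"pref": "Prices"}}
change_list = [
    {"uri": "c0", "action": "modified", "time": t0},   # outside the time frame
    {"uri": "c1", "action": "modified", "time": t1},
    {"uri": "c2", "action": "deleted", "time": t1},
    {"uri": "c3", "action": "modified", "time": t1},   # same content on both sides
    {"uri": "c4", "action": "created", "time": t1},
]
print(pull_updates(local, remote, change_list, since=datetime(2010, 2, 10)))
# ['c1', 'c2', 'c4']
```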
<p><strong>Technology</strong></p>
<p>The software is written in Java and uses the <a href="http://www.openrdf.org/doc/sesame2/system/ch05.html" target="_blank">SAIL API</a>, so it can be used with various triple stores. The thesaurus management itself (viewing, creating and editing SKOS concepts and their relationships) is done in an AJAX frontend based on the <a href="http://developer.yahoo.com/yui/" target="_blank">Yahoo User Interface (YUI)</a> library. Labels can alternatively be edited in a wiki-style HTML frontend. For key-phrase extraction from documents, PoolParty uses a modified version of the <a href="http://www.nzdl.org/Kea/" target="_blank">KEA</a> 5 API, which has been extended to use controlled vocabularies stored in a SAIL repository (this module is available under the GNU GPL). The analysed documents, along with the extracted and semantically related concepts, can be stored and indexed in <a href="http://en.wikipedia.org/wiki/Lucene" target="_blank">Lucene</a>/<a href="http://en.wikipedia.org/wiki/Solr" target="_blank">Solr</a> or any other (enterprise) search system.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.semantic-web.at/2010/02/16/linking-open-data-to-thesaurus-management/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>1000-and-one pulldowns</title>
		<link>http://blog.semantic-web.at/2009/05/12/1000-and-one-pulldowns/</link>
		<comments>http://blog.semantic-web.at/2009/05/12/1000-and-one-pulldowns/#comments</comments>
		<pubDate>Tue, 12 May 2009 09:38:39 +0000</pubDate>
		<dc:creator>Thomas Thurner</dc:creator>
				<category><![CDATA[Internet & Media]]></category>
		<category><![CDATA[Knowledge Management]]></category>
		<category><![CDATA[Search Engines]]></category>
		<category><![CDATA[Knowledge representation]]></category>
		<category><![CDATA[Linked Data]]></category>
		<category><![CDATA[Semantic Web]]></category>
		<category><![CDATA[Usability]]></category>
		<category><![CDATA[wolfram alpha]]></category>

		<guid isPermaLink="false">http://blog.semantic-web.at/?p=950</guid>
		<description><![CDATA[Image by wocrig via Flickr Luckily, times have come, where semantic search techniques have found their way to enhance knowledge providing theme portals. Nearly once a week a new knowledge portal with built-in semantic search pops up. They deal with &#8230; <a href="http://blog.semantic-web.at/2009/05/12/1000-and-one-pulldowns/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<div class="zemanta-img" style="margin: 1em; display: block;">
<div>
<dl class="wp-caption alignright">
<dt class="wp-caption-dt"><a href="http://www.flickr.com/photos/22857422@N03/3052628550"><img title="Personalisation interface" src="http://farm4.static.flickr.com/3168/3052628550_fd2612118c_m.jpg" alt="Personalisation interface" /></a></dt>
<dd class="wp-caption-dd zemanta-img-attribution" style="font-size: 0.8em;">Image by <a href="http://www.flickr.com/photos/22857422@N03/3052628550">wocrig</a> via Flickr</dd>
</dl>
</div>
</div>
<p>Luckily, the time has come when semantic search techniques are finding their way into knowledge-providing theme portals. Nearly once a week, a new knowledge portal with built-in semantic search pops up, dealing with environmental issues, health care, the economy and so on. These sites are good examples of how semantic technologies foster the vision of a knowledge web. Over the next few months, such focused approaches will be great showcases for &#8220;a&#8221; <a class="zem_slink" title="Semantic Web" rel="wikipedia" href="http://en.wikipedia.org/wiki/Semantic_Web">semantic web</a> (even if they are not based on &#8220;the&#8221; RDF semantic web), alongside general knowledge portals like Wolfram Alpha.</p>
<p>But the potential of these semantic theme portals is often substantially reduced by their poor <a class="zem_slink" title="Usability" rel="wikipedia" href="http://en.wikipedia.org/wiki/Usability">usability</a>. You get lost in categories and flags; you get puzzled by pulldowns, mouseovers and embedded hierarchies; sometimes it&#8217;s a mess of 1001 functions. You need to understand the underlying semantic concept to find your way around these applications, and that is not the point of the exercise. Search has to be easy.</p>
<p>To show the potential of semantic technologies, we need good examples that offer good usability. This is a call to everyone to provide such examples.</p>
<p>Here are my favorites:</p>
<ul>
<li><a href="http://www.nextbio.com">NextBio</a>,  a platform that enables life science researchers to search, discover, and share knowledge locked within public and proprietary data</li>
<li><a href="http://www.reegle.info/">reegle</a>, the Search Engine for Renewable Energy and Energy Efficiency</li>
<li><a href="http://www.kulttuurisampo.fi">CultureSampo</a>,  a Finnish cultural heritage platform for institutional organizations as well as private citizens</li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://blog.semantic-web.at/2009/05/12/1000-and-one-pulldowns/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Linked Data is not owl:sameAs Semantic Web</title>
		<link>http://blog.semantic-web.at/2009/03/30/linked-data-is-not-owlsameas-semantic-web/</link>
		<comments>http://blog.semantic-web.at/2009/03/30/linked-data-is-not-owlsameas-semantic-web/#comments</comments>
		<pubDate>Mon, 30 Mar 2009 08:21:59 +0000</pubDate>
		<dc:creator>Andreas Blumauer</dc:creator>
				<category><![CDATA[Linked Data & Open Data]]></category>
		<category><![CDATA[Search Engines]]></category>
		<category><![CDATA[cloudlet]]></category>
		<category><![CDATA[Google]]></category>
		<category><![CDATA[linking open data]]></category>
		<category><![CDATA[OpenLink Software]]></category>
		<category><![CDATA[Tagcloud]]></category>
		<category><![CDATA[Talis]]></category>
		<category><![CDATA[Tim Berners-Lee]]></category>
		<category><![CDATA[wonder wheel]]></category>

		<guid isPermaLink="false">http://blog.semantic-web.at/?p=759</guid>
		<description><![CDATA[While some people work heavily on the extension of the semantic web infrastructure, like Talis Connected Commons or OpenLink´s Amazon EC2 Instantiation others have started to bring the semantic web closer to the developers and therefore to a much broader &#8230; <a href="http://blog.semantic-web.at/2009/03/30/linked-data-is-not-owlsameas-semantic-web/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p><img class="alignright size-full wp-image-762" title="twitter_cloudlet" src="http://blog.semantic-web.at/wp-content/uploads/2009/03/twitter_cloudlet.jpg" alt="twitter_cloudlet" width="251" height="244" />While some people work heavily on the extension of the semantic web infrastructure, like <a href="http://blogs.talis.com/n2/cc" target="_blank">Talis Connected Commons</a> or <a href="http://virtuoso.openlinksw.com/wiki/main/Main/VirtInstallationEC2" target="_blank">OpenLink´s Amazon EC2 Instantiation</a> others have started to bring the semantic web closer to the developers and therefore to a much broader audience: They offer search facilities or Linked Data Navigators like <a href="http://lod.openlinksw.com/" target="_blank">OpenLink´s Entity Finder</a> or <a href="http://visinav.deri.org/" target="_blank">DERI´s VisiNav</a>.</p>
<p>These kinds of applications should not be confused with &#8220;semantic web&#8221; end-user applications like <a href="http://searchengineland.com/google-wonder-wheel-17093" target="_blank">Google&#8217;s Wonder Wheel</a> or <a href="http://www.crunchbase.com/company/intspei" target="_blank">INTSPEI&#8217;s</a> <a href="http://www.intspei.com/Products/SearchCloudlet.aspx" target="_blank">Cloudlet</a>. Adding some semantics to existing user interfaces can be helpful, and users are obviously ready for such experiments. Of course, this is NOT the innovation the semantic web will bring, but it is a very important step to be taken in parallel with the <a href="http://esw.w3.org/topic/SweoIG/TaskForces/CommunityProjects/LinkingOpenData" target="_blank">linked data initiative</a>.</p>
<p>Let's take a look at Cloudlet. This tool is an easy-to-use <a href="http://www.getcloudlet.com" target="_blank">free Firefox extension</a> that adds context-sensitive tag clouds to the most popular search engines and helps people navigate their search results more efficiently. The previous version of Search Cloudlet worked with Google and Yahoo; the new version also works with Twitter. It adds Tag Clouds, Author Clouds, Recipient Clouds and Hashtag Clouds to Twitter search, Twitter user profiles and home pages. See <a href="http://www.getcloudlet.com/swm.php?page=reviews" target="_blank">some reviews</a> of this popular tool.</p>
<p>Cloudlet is a child of the Web. INTSPEI has learned all the lessons of Web 2.0, especially how to promote ideas through the blogosphere and how to identify market trends as early as possible, and the tool generates obvious added value for its users. Sure, it doesn't make use of linked data yet, but as a typical representative of the fast-growing &#8220;semantic search evolution&#8221; it reminds me of <a href="http://domino.research.ibm.com/comm/research_people.nsf/pages/welty.index.html" target="_blank">Chris Welty</a>'s famous insight: &#8220;In the <em>Semantic Web</em>, it is not the <em>Semantic</em> which is new, it is the <em>Web</em> which is new.&#8221;</p>
<p>Web 1.0 was the WWW without many network effects. Web 2.0 changed that a lot.</p>
<p>Linked Data is not the Semantic Web; it's the foundation for it. From a software developer's or an IT architect's perspective, the two concepts might seem to be the same. But this community represents a very small percentage of all web users.</p>
<p>So where is the User's Web in the Linked Data architecture? Looking at <a href="http://www.w3.org/DesignIssues/LinkedData.html" target="_blank">TimBL's Linked Data principles</a>, one can clearly see that this is a &#8220;Web&#8221; for developers.</p>
<p>But things evolve. Some Web companies will jump on the bandwagon and, for instance, substantially improve their tag clouds, semantic search, recommender systems (Twine?) or similarity search by making use of linked data.</p>
<p>Just as semantic search (or call it &#8220;semantic search 2.0&#8221;) is becoming mainstream right now, linked data will, I guess within about three years, become part of many mainstream applications. Linked data will generate tons of new network effects, maybe even new business models, and it won't be avant-garde anymore. It will be part of the Semantic <em>Web</em>.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.semantic-web.at/2009/03/30/linked-data-is-not-owlsameas-semantic-web/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>the next google</title>
		<link>http://blog.semantic-web.at/2009/03/25/the-next-google/</link>
		<comments>http://blog.semantic-web.at/2009/03/25/the-next-google/#comments</comments>
		<pubDate>Wed, 25 Mar 2009 15:05:35 +0000</pubDate>
		<dc:creator>Thomas Thurner</dc:creator>
				<category><![CDATA[Search Engines]]></category>
		<category><![CDATA[Software Development]]></category>
		<category><![CDATA[Tools & Software]]></category>
		<category><![CDATA[Google]]></category>
		<category><![CDATA[Search Engine]]></category>
		<category><![CDATA[Web search engine]]></category>

		<guid isPermaLink="false">http://blog.semantic-web.at/?p=751</guid>
		<description><![CDATA[Maybe you have noticed it already. This morning something new appeared in Google&#8217;s search engine interface: a set of related search suggestions based on your query. Google described this enhancement as follows: Starting today, we&#8217;re deploying &#8230; <a href="http://blog.semantic-web.at/2009/03/25/the-next-google/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<div class="zemanta-img" style="margin: 1em; display: block;">
<div>
<dl class="wp-caption alignright" style="width: 212px;">
<dt class="wp-caption-dt"><a href="http://en.wikipedia.org/wiki/Image:Google1998.png"><img title="Google in 1998" src="http://upload.wikimedia.org/wikipedia/en/thumb/b/b7/Google1998.png/202px-Google1998.png" alt="Google in 1998" width="202" height="112" /></a></dt>
<dd class="wp-caption-dd zemanta-img-attribution" style="font-size: 0.8em;">Image via <a href="http://en.wikipedia.org/wiki/Image:Google1998.png">Wikipedia</a></dd>
</dl>
</div>
</div>
<p>Maybe you have noticed it already. This morning something new appeared in Google&#8217;s search engine interface: a set of related search suggestions based on your query. <span><span class="IL_SPAN"><a title="NASDAQ: GOOG" rel="stockexchange" href="http://news.techwhack.com/10098-google-search-enhancements">Google</a></span> described this enhancement as follows: </span></p>
<blockquote><p>Starting today, we&#8217;re deploying a new technology that can better understand associations and concepts related to your search, and one of its first applications lets us offer you even more useful related searches (the terms found at the bottom, and sometimes at the top, of the search results page).</p></blockquote>
<p>I tried it. If you type in &#8220;time travel&#8221;, you also get suggestions like &#8220;theory of relativity time travel&#8221; or &#8220;wormhole time travel&#8221;. Google announced that the service is available in various languages. A direct test in German is a little disillusioning: searching for &#8220;zeit reise&#8221; (the same concept as above, in German) leads to alternative searches like &#8220;reisen 50er jahren&#8221; (travel in the 1950s) and &#8220;reisen im mittelalter&#8221; (travel in the Middle Ages).</p>
<p>Even if this semantic-like extension of the basic search function still needs some tuning, the point is getting clearer: Google, too, is working to bring more meaningful results into its search algorithms. Parts of the semantic methodology are finding their way into mainstream services like <a class="zem_slink" title="Web search engine" rel="wikipedia" href="http://en.wikipedia.org/wiki/Web_search_engine">search engines</a>, as we saw with <a href="http://www.wolframalpha.com">Wolfram Alpha</a> a few days ago. So keep your eyes open: maybe tomorrow morning you&#8217;ll find another piece of the semantic puzzle embedded in one of your favorite web apps.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.semantic-web.at/2009/03/25/the-next-google/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>

