Andreas Blumauer

Florian Bauer: I like to view “linked data” as a “single worldwide API”

Florian Bauer is REEEP’s Operations and IT Director, responsible for the overall operational management of the organisation, the product management of reegle (the search engine for renewable energy and energy efficiency) and the management of the IT landscape of REEEP.

The PoolParty team had the chance to talk with Florian about reegle, the information gateway on clean energy.

Could you please give us a brief overview of reegle? What goals are you pursuing with this platform?

The main aim of the reegle information gateway (http://www.reegle.info) is to provide a one-stop gateway to comprehensive, high-quality and up-to-date information on clean energy. By making this information accessible to stakeholders in the field around the world, and by presenting it in a user-friendly and intuitive format, reegle directly helps to facilitate the transition to low-carbon energy.

The website provides information on renewable energy, energy efficiency and climate change, and their various sub-sectors, at a global level. Some reegle services also combine raw datasets from several different sources and put them into context, thus providing enriched information.

reegle is an offshoot of the Renewable Energy & Energy Efficiency Partnership (REEEP), a non-profit, specialist change agent aiming to catalyze the market for renewable energy and energy efficiency, with a primary focus on emerging markets and developing countries.

The new reegle data portal (data.reegle.info), launched in 2011, has established reegle as a publisher and consumer of Linked Open Data in the energy sector. It provides key clean energy datasets, free for re-use and published according to the W3C Linked Open Data standards.

reegle consists of two components: the semantic search engine (http://www.reegle.info/) and the linked data portal (http://data.reegle.info/). What are your target groups, and which typical problems of the clean energy domain can you solve with these services?

For reegle.info, our target groups are primarily project developers, financiers and government policy-makers. These users can access high-quality information on clean energy-related issues with the set of tools we provide: a special web search, a catalogue of more than 1700 key stakeholders, a map view for geographical browsing, a clean energy glossary, and an energy country profiles function.

The energy country profiles are typical of what we’re trying to achieve. Here, we take information from many different providers and combine it all to present one comprehensive information dossier on renewable energy and energy efficiency in that particular country. This means that in one location you have the country’s most important energy-related information, ranging from key statistics and current regulations to key players in the energy field in both the public and private sectors.

For our data portal, the target group is a more technical one: primarily IT developers and open data specialists who want to create new mash-ups and integrate data from reegle into other websites. One of the first sites to use these reegle datasets is OpenEI.org, another key portal in the energy field.

Open data is not the same as linked open data. Why did you choose to build your services around the W3C’s linked data paradigm and/or standards like RDF?

Tim Berners-Lee once mentioned that he likes to compare the progressive stages of publishing data with the star system used to rate hotels. You get:

* for making data public (in any format)
** for machine-readable formats (structured data)
*** if the data is offered in a non-proprietary format
**** if you use URIs to identify things, so people can point to your datasets
***** for linking to other people’s data to provide context
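The scheme is cumulative: each star presupposes all the ones below it. A minimal Python sketch of that rule follows; the `Dataset` flags are hypothetical names for the five criteria listed above, not part of any official rating tool.

```python
from dataclasses import dataclass


@dataclass
class Dataset:
    # Hypothetical flags mirroring the five criteria of the star scheme.
    public: bool = False       # *     published on the web, in any format
    structured: bool = False   # **    machine-readable, structured data
    open_format: bool = False  # ***   non-proprietary format
    uses_uris: bool = False    # ****  URIs identify things
    links_out: bool = False    # ***** links to other people's data


def star_rating(ds: Dataset) -> int:
    """Return 0-5 stars; a star only counts if every lower star is also earned."""
    criteria = [ds.public, ds.structured, ds.open_format, ds.uses_uris, ds.links_out]
    stars = 0
    for met in criteria:
        if not met:
            break
        stars += 1
    return stars


# A public CSV file: structured and non-proprietary, but no URIs and no links.
csv_dump = Dataset(public=True, structured=True, open_format=True)
# Linked data that also points to external datasets.
linked = Dataset(True, True, True, True, True)

print(star_rating(csv_dump))  # 3
print(star_rating(linked))    # 5
```

Note that a dataset with outgoing links but no structured format still earns only one star in this sketch, which reflects the ladder-like reading of the scheme.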

So, as you can imagine, our goal is for reegle to be firmly in the 5-star category, and to establish reegle as an avant-garde tool in energy data.

I also like to view “linked data” as a “single worldwide API”. If the old web was like a huge book, the new semantic web is like a huge database, and SPARQL is the way to ask it for information, by sending a query to a SPARQL endpoint. RDF is the language that lets you describe a given dataset with all the necessary information, including any links to other datasets. RDF data and SPARQL endpoints therefore provide a powerful tool to find and filter datasets, and they are crucial base layers of the semantic web’s architecture. For reegle, the SPARQL endpoint and a description of the structure of our RDF files are available on our clean energy open data portal.
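As a rough illustration of sending a query to a SPARQL endpoint, the Python sketch below builds a SPARQL 1.1 Protocol GET request (the query travels in the `query` URL parameter) and extracts values from a response in the W3C SPARQL JSON results format. The endpoint URL is a placeholder, not reegle’s actual address, and the query, which lists English SKOS preferred labels, assumes the target data uses SKOS.

```python
import json
import urllib.parse
import urllib.request

# Placeholder only: substitute the SPARQL endpoint published on the data portal.
ENDPOINT = "https://example.org/sparql"

QUERY = """
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
SELECT ?concept ?label WHERE {
  ?concept skos:prefLabel ?label .
  FILTER (lang(?label) = "en")
} LIMIT 10
"""


def build_request(endpoint: str, query: str) -> urllib.request.Request:
    """Build a SPARQL 1.1 Protocol 'query via GET' request that asks for JSON results."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    return urllib.request.Request(
        url, headers={"Accept": "application/sparql-results+json"}
    )


def parse_bindings(body: str, var: str) -> list:
    """Return the values bound to one variable in a SPARQL JSON results document."""
    data = json.loads(body)
    return [row[var]["value"] for row in data["results"]["bindings"] if var in row]


req = build_request(ENDPOINT, QUERY)
print(req.full_url)
# Against a real endpoint, the request could be sent with:
#   body = urllib.request.urlopen(req).read().decode("utf-8")
#   labels = parse_bindings(body, "label")
```

Using plain HTTP and JSON keeps the example dependency-free; in practice a dedicated RDF or SPARQL client library would usually handle encoding and result parsing.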

You also decided to build a SKOS-based domain thesaurus for clean energy, which now plays an important role in improving the search experience at reegle.
What have you learned from this effort so far? Which obstacles did you have to overcome?

The SKOS-based renewable energy thesaurus can be seen as the “heart” of reegle as it provides the basis for a lot of related services in reegle, including the refinement suggestions for search results, the auto-completion options and the glossary links between defined terms and their synonyms and related terms.
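To show how a SKOS thesaurus can drive auto-completion, synonym handling and refinement suggestions, here is a minimal in-memory Python sketch. The concepts, IDs and helper functions are hypothetical and far simpler than what a real thesaurus server such as PoolParty provides; the fields merely mirror the SKOS properties `prefLabel`, `altLabel`, `broader`, `narrower` and `related`.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Concept:
    # Fields mirror skos:prefLabel, skos:altLabel, skos:broader,
    # skos:narrower and skos:related.
    pref_label: str
    alt_labels: list = field(default_factory=list)
    broader: list = field(default_factory=list)
    narrower: list = field(default_factory=list)
    related: list = field(default_factory=list)


# Illustrative concepts only; not taken from the actual reegle thesaurus.
THESAURUS = {
    "c1": Concept("solar energy", ["solar power"], narrower=["c2", "c3"]),
    "c2": Concept("photovoltaics", ["PV", "solar cells"], broader=["c1"], related=["c3"]),
    "c3": Concept("solar thermal energy", ["solar heating"], broader=["c1"], related=["c2"]),
}


def labels(c: Concept) -> list:
    """All labels of a concept, preferred label first."""
    return [c.pref_label, *c.alt_labels]


def autocomplete(prefix: str) -> list:
    """Suggest preferred labels for concepts with any label starting with prefix."""
    p = prefix.lower()
    hits = {c.pref_label for c in THESAURUS.values()
            if any(lbl.lower().startswith(p) for lbl in labels(c))}
    return sorted(hits)


def resolve(term: str) -> Optional[str]:
    """Map a preferred label or synonym (altLabel) to its concept ID."""
    t = term.lower()
    for cid, c in THESAURUS.items():
        if t in (lbl.lower() for lbl in labels(c)):
            return cid
    return None


def refinements(term: str) -> list:
    """Suggest narrower and related concepts to refine a search for term."""
    cid = resolve(term)
    if cid is None:
        return []
    c = THESAURUS[cid]
    return [THESAURUS[x].pref_label for x in c.narrower + c.related]


print(autocomplete("sol"))         # ['photovoltaics', 'solar energy', 'solar thermal energy']
print(refinements("solar power"))  # ['photovoltaics', 'solar thermal energy']
```

Because a search for the synonym “solar power” resolves to the same concept as “solar energy”, both queries receive the same refinement suggestions.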

We decided to use SKOS because we think it is the best language for building a formal, controlled vocabulary for thesauri in a semantic web context without adding too much complexity. Although SKOS is a simple language, building a thesaurus with it still requires people who combine domain expertise with IT skills, and they are hard to find!

So in our case, we decided to use a scalable and easy-to-use thesaurus server called “PoolParty”. Using this system drastically reduced the complexity, and allowed us to concentrate on the actual building of the thesaurus with our domain experts, and to spend less time on transferring the knowledge into data sets.

What are your future plans with reegle?

Currently we’re working on restructuring the site to better highlight our new value-added services, such as the clean energy country profiles. We are also planning to extend our thesaurus to include climate-compatible development terms, and we’ll soon release a WordPress plug-in to bring this thesaurus into clean energy blogs. One of the most exciting projects we are currently working on is the development of “dossier pages”, where we will use semantic web technologies to mash up relevant information on several topics on a single page. This is part of the EU-funded SCMS (“semantic content management system”) project.

Thomas Thurner

Hjalmar Gislason: “What I call the emerging field of Data Market.”

Open Government Data, and Open Data provided by the corporate sector, are stimulating a new market segment: commercial Open Data services. The Icelandic startup datamarket.com is one of the emerging companies in this field. Thomas Thurner from Semantic Web Company had the chance to talk to Hjalmar Gislason, founder and CEO of datamarket.com.

Semantic Puzzle: What’s the business idea behind datamarket.com? Who do you expect to pay, and for what?
Hjalmar Gislason: From the end-user perspective it’s easiest to describe datamarket.com as a search engine for statistical data, a “Google for statistics” if you will. Any data that is already openly and freely available out there will remain open and free on DataMarket, just easier to find, use, compare and download from a single source. While the audience for a search engine for statistical content is obviously much smaller than for text content, a significant part of that audience is business users looking for data for business reasons. This means that there are more direct and lucrative ways to monetize usage than contextual ads, especially reselling access to premium data. This is a market that already turns over billions of dollars annually, but is as far from the “2.0 world” as one could possibly imagine (think Bloomberg, Reuters, FactSet). We believe there is an opportunity to disrupt part of their business with a freemium approach, and furthermore to open up the data market by reaching a business audience outside the narrowly defined financial user base that these companies cater to. There is data out there, free and premium alike, that can help almost any business make better plans and decisions. Connecting people and businesses to the data they need will release phenomenal value. Tapping into just a fraction of that will be a hugely successful business for those who get it right.

Semantic Puzzle: Can you tell me a bit about the technological framework behind datamarket.com? How is content from third parties fed into the system, and which APIs do you use? As you mainly provide XLS and CSV, have you considered also providing data as XML in the future?
Hjalmar Gislason: The backend system is written in Python. We read data from the sources in various formats, ranging from Excel files and even scraped web pages to proprietary APIs and web services. The data is then stored in a normalized format in a Postgres database that we’re using in a pretty unique way, so that we can efficiently store the billions of time series and fact values that the system will eventually hold (currently around 100 million time series and 600 million fact values). The web site is also written in Python, using the Django framework, but it also makes use of a lot of JavaScript libraries (and a bunch of our own code) to allow for an exciting user experience. We’re currently using a Flash-based solution called amCharts for the charts, but have already taken some steps to replace it with our own solution, written on top of the excellent Protovis visualization library. While you are right that the export formats we provide for end users are XLS, CSV and images (for exporting the graphs), our RESTful API actually supports XML and JSON formats as well. So we already provide data as XML.
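For readers curious what a “normalized format” for time series might look like, the sketch below stores datasets, series and individual fact values in separate tables and serializes one series to JSON and XML. It is an assumption-laden illustration: `sqlite3` stands in for Postgres, the schema and sample values are invented, and it says nothing about the specialized storage techniques DataMarket actually uses.

```python
import json
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical normalized layout: one row per fact value, keyed by series and period.
SCHEMA = """
CREATE TABLE dataset (id INTEGER PRIMARY KEY, title TEXT NOT NULL);
CREATE TABLE series  (id INTEGER PRIMARY KEY,
                      dataset_id INTEGER NOT NULL REFERENCES dataset(id),
                      name TEXT NOT NULL);
CREATE TABLE fact    (series_id INTEGER NOT NULL REFERENCES series(id),
                      period TEXT NOT NULL,
                      value REAL,
                      PRIMARY KEY (series_id, period));
"""


def load_demo(conn):
    """Create the schema and insert one made-up series."""
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO dataset VALUES (1, 'Example dataset')")
    conn.execute("INSERT INTO series VALUES (10, 1, 'Example series')")
    conn.executemany("INSERT INTO fact VALUES (10, ?, ?)",
                     [("2008", 1.5), ("2009", 2.0), ("2010", 2.5)])  # invented values


def fetch_series(conn, series_id):
    """Reassemble one time series from the normalized tables."""
    name = conn.execute("SELECT name FROM series WHERE id = ?",
                        (series_id,)).fetchone()[0]
    rows = conn.execute("SELECT period, value FROM fact WHERE series_id = ? "
                        "ORDER BY period", (series_id,)).fetchall()
    return {"id": series_id, "name": name,
            "values": [{"period": p, "value": v} for p, v in rows]}


def to_json(series):
    return json.dumps(series)


def to_xml(series):
    root = ET.Element("series", id=str(series["id"]), name=series["name"])
    for point in series["values"]:
        ET.SubElement(root, "value", period=point["period"]).text = str(point["value"])
    return ET.tostring(root, encoding="unicode")


conn = sqlite3.connect(":memory:")
load_demo(conn)
s = fetch_series(conn, 10)
print(to_json(s))
print(to_xml(s))
```

Keeping one row per fact value makes it straightforward to serve the same series in several output formats, since every exporter works from the same reassembled structure.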

Semantic Puzzle: You surely know Tim Berners-Lee’s 5-star scheme for OGD providers. Where do you see your own service in this framework?
Hjalmar Gislason: Any fact value, time series and data set on DataMarket is “addressable” with a direct URL using our API. In that sense, all the data on DataMarket is four-star data according to Berners-Lee’s definition. In many cases we’re integrating data that is only one- or two-star data, so just by integrating it into our system we’ve moved it a few notches up that ladder. In some cases we’ve even been helping organizations publish data for the first time, taking the data from 0 to 4 stars in one go. We’ve been toying with several ideas that would take the data all the way to 5-star status, or enable users to do so, but that’s still on the drawing board.

Semantic Puzzle: You re-use a lot of open data coming from the Icelandic government. Is there also a state-run data portal for Iceland, or is your service a “commercial replacement” for such a public effort?
Hjalmar Gislason: There is no government-operated data portal in Iceland, and to my knowledge there are no plans to implement one yet. Sadly, several more pressing eGovernment issues here take higher priority. We don’t see our efforts as a replacement for such a portal, but we have managed to fulfill a small part of that role when it comes to statistical data. We’ve also been really vocal about the benefits of open data, and among other things we were influential in launching an open data wiki, opingogn.net (Icelandic only), that explains the concepts with examples and use cases and attempts to list as many sources of government data as possible in a directory. There is some movement, but as an open data enthusiast I’d really like to see things happen faster. As a matter of fact, I think Iceland has reasons to be extra enthusiastic about open data, to increase transparency and restore trust after the collapse of the banks and the economic system in 2008.

Semantic Puzzle: A lot of commercial Open Data services (Socrata, Factual, Google …) are evolving at the moment. How do you think this market segment will develop over the coming months and years, and what do you see as the crucial factors for such a business?
Hjalmar Gislason: I’ve been writing quite a lot about developments in this industry on our blog. One of the things I’ve written about most is what I call the emerging field of “data markets”. I define “data markets” as “services that make it easy to find data from a range of secondary data sources, then consume or acquire the data in a usable – and often unified – format.” Many of these services are trying to create marketplaces for data, envisioning that data providers can offer their data sets for sale to data seekers. As there are several players in this space already, I believe we’ll see many of them try to differentiate themselves in 2011 by focusing on specific types of data. There are definitely opportunities in building specialized data markets for geospatial data, for statistics and for enormous scientific data sets, to name a few types, and each comes with its own challenges, target audiences and preferred approaches. In the spirit of doing one thing and doing it well, I think most of these projects will want to succeed in one such segment of the market before generalizing or consolidating.


The interviewee: Hjalmar is a successful entrepreneur who has founded three startups in the gaming, mobile and web sectors since 1996. Prior to launching DataMarket, Hjalmar worked on new media and business development for companies in the Skipti Group (owners of Iceland Telecom) after the group acquired his search startup, Spurl. Hjalmar offers a mix of business, strategy and technical expertise.

Thomas Thurner

Vienna Semantic Web Meetup – the next season

Started in mid-2009, the Vienna Semantic Web Meetup (VSWM) is now entering its third year. Hosted by various partners, from media to culture and from corporate to academic, this regular gathering now counts over 200 members. As is good tradition at VSWM, visitors from abroad drop by to share their input and new insights. The next season of VSWM will continue this mix of international connections and informal meetings, putting two emerging topics on the agenda.

Digital Identity on the Semantic Web
Thursday, April 7, 2011

While recent developments in ICT make it easier for companies and consumers to reach each other, they can also scatter your personal information more widely, making life easier for criminals. At the same time, public institutions and government agencies are collecting personal data too, so personal data is processed without the consent (or even the knowledge) of the citizens concerned. As we know, leaks in this field may expose sensitive personal data as well. Restricting the misuse of personal data is a challenge for both the technological and the legal domain. This meetup looks at how Semantic Web technologies can take on their share of responsibility in this emerging field.

  • Christof Tschohl (BIM)
    Ludwig Boltzmann Institute for Human Rights
  • Mischa Tuffield (Garlik)
    A Standards-based, Open and Privacy-aware Social Web (W3C)

>> read more, and register for free

Portals, Apps and Visualizations for Open Government Data
Wednesday, June 15, 2011

Picking up Keith Andrews’ suggestion, this MeetUp focuses on tools, services and projects dealing with visualization, app creation and portals/catalogs for Open [Government] Data. As this MeetUp takes place on the eve of Austria’s first Open Government Data conference (OGD2011), we expect to meet experts and enthusiasts from Austria and abroad.

  • Keith Andrews (IICM)
    Institute for Information Processing and Computer Supported New Media at Graz University of Technology
  • Andreas Blumauer (SWC)
    Storing, searching, serving Open Government Data – getting an overview on the growing market for open data solutions

>> read more, and register for free



Thomas Thurner

EU report on the requirements for a pan-European Open Government Data Portal

The recently published report on an expert hearing held in Luxembourg this November provides a snapshot of the discussion on whether a central open data infrastructure makes sense. The expert group lists several positive effects, such as EU-wide comparability of some government data sets and the portal’s role as a motor for national and regional initiatives. The report stresses several times that swift progress in turning these plans into reality is crucial for success.

Read more at: Report – Technical workshop on the goals and requirements for a pan-European data portal