Helmut Nagy

The ESA vocabulary site – Making Publishing and Reusing Vocabularies Easier

Looking back at the interview we conducted in November 2010 with Les Kneebone (project manager of the vocabulary projects at Education Services Australia), we can see that ESA was one of the early adopters of SKOS as a standard for thesaurus development. Les said then: “We had already identified SKOS as an important standard for ScOT so it was natural to select PoolParty as our new thesaurus management tool”. Around a year later, ESA’s vocabulary site went online with PoolParty as its basis.

We asked Les to comment on his statement from last year and he confirmed that SKOS continues to be central to the ESA vocabulary business model and that it has also been important for ESA that PoolParty has been flexible enough to support continued publication of non-RDF formats, especially IMS VDEX.

In the course of this project it became more and more obvious that SKOS can be used not only as yet another format for publishing thesauri but also as a unified model for building thesauri in general. This approach made several improvements to ESA’s vocabulary development model and maintenance process possible. Since all data is stored as RDF in a triple store, and SKOS and RDF are flexible formats that support interoperability and interchange of data, many manual transformations that previously had to be done are no longer needed. All other systems using the vocabularies are fed dynamically by PoolParty, which offers the data in the formats they need (see image below).

Changes in ESA’s vocabulary development model

Les states that while some manual processes still exist to support legacy systems, PoolParty ensures the integrity and richness of ESA data. Support and customizations for legacy systems can be achieved in the confidence that the linked-data capabilities are centrally managed and stored in the PoolParty triple store.

From the publishing perspective, the previous vocabulary publishing site has been replaced by the PoolParty Linked Data Frontend (LD-Frontend), which has been customized especially for this project to offer more flexibility in the display and layout of the data. Similar to the frontend for the Austrian Geological Survey mentioned in a previous blog post, the LD-Frontend has been adapted to the ESA styleguide, and the HTML view of the data has been made more user-friendly (see screenshot below).

From ESA’s perspective, Les commented that for the vocabulary manager, edits to the frontend styles and templates are intuitive and can be tested in staging environments. He also stated that support is important when publishing, and that SWC was very responsive.

Example ESA linked data frontend

Of course we asked Les to give a preview of the next steps for ESA. He stated that they include language translation projects so that its vocabularies, especially Schools Online Thesaurus (ScOT), can be accessed by wider markets and by students of other languages. He also stated that PoolParty handles multi-lingual thesauri very well.

We here at SWC are glad to see PoolParty used in more and more applications and usage scenarios. We are looking forward to the next steps in this project and to seeing how the data offered by the ESA vocabulary site is used in other applications.

Thanks to Les Kneebone from ESA for his contribution to this blog post.

Thomas Thurner

data.wien.gv.at – the process to Vienna’s open data portal

On 17 May 2011 the time had come: the first Open Government Data (OGD) portal of a public administration in Austria was launched. It was the capital, Vienna, that took this courageous and important step and thereby assumed the role of a pioneer in the area of open data in our country. Hopefully it will act as a model for communities, cities, states and the federal government. (It should also be mentioned that the Open Commons Region Linz was the first city government in Austria to announce a data portal, even before Vienna; its launch date will be September 2011.)

http://data.wien.gv.at is a well-executed first step in the area of Open Government Data for a modern and open City of Vienna. Open human- and machine-readable data in several formats and from several categories (e.g. population, education, budget, leisure time and many more) are now available for re-use. What is more, they are available under the Creative Commons CC-BY-3.0 license.

The road to 17 May 2011 started about one year earlier, at least from the point of view of the Austrian (and Viennese) open data community. On 8 April 2010 a group of linked open data enthusiasts (representatives of universities, companies and civil society) invited interested people to the 1st Open Government Data Meetup at the OCG (Austrian Computer Society) in Vienna. Rufus Pollock of the Open Knowledge Foundation spoke on site in Vienna, and Stefano Bertolo of the European Commission joined via Skype. They shed light on Open Government Data, at that time a very new topic for Austria and Vienna, and presented their experiences and best practices in the field to about 60 participants. Interest was very high, also on the part of the media, and so the meetup created both a basic interest in the topic and a first broad base of information in Vienna.
Afterwards everything moved quickly until 17 May 2011. (Even if one year seems a long time, I think it was an enormous achievement by all parties involved to manage so much in only one year!) After the MeetUp, OGD Austria was founded, an initiative whose objective is to open (linked) non-personal government data in Austria in human- and machine-readable formats for re-use, and to do this together with politics, administration, civil society and industry. Other initiatives such as open3, established institutions in the area of administration research such as KDZ – Zentrum für Verwaltungsforschung, the Danube University Krems and Joanneum Research, companies like the Semantic Web Company and Compass Verlag, and above all many representatives of civil society interested in Open Government Data (it is important to say that Vienna has a very active creative scene and web 2.0 community) worked together to push the field of open data in Vienna and Austria.

In June 2010 the Semantic Web Company (SWC), with support from the above-mentioned institutions, submitted a proposal to the technology agency of the City of Vienna (ZIT) to develop and implement a bundle of awareness-building measures in the field of Open Government Data in Austria: the OGD2011 project was born. The approval of this project (partly funded by ZIT) certainly helped a great deal to inform the relevant stakeholders (politics, public administration, civil society, industry, academia and media) during this period and to build awareness of the power, the potential and the challenges of Open Government Data, as well as of the important concrete steps towards it.

The following measures have been or will be implemented in the course of OGD2011:

  • Open OGD Austria Stammtisch every second month (meetup, so far only in Vienna)
  • 4 Stakeholder Workshops (politics, administration, civil society, industry) in February 2011 to identify, evaluate and discuss the requirements for Open Government Data in Austria from the viewpoint of the respective stakeholder group
  • Publication of the OGD Digest Austria, with information on Open Data in Austria and internationally, in print and PDF (4 editions available so far)
  • Set up and operation of a mailing list as well as a XING group
  • Organisation of an open MeetUp on OGD on 15 June 2011 in Vienna
  • Set up and operation of open wiki spaces for the collection and provision of relevant information in the field of Open Data
  • OGD2011 Conference on 16 June 2011 in Vienna
  • And very important: about 40-50 bilateral talks with representatives of politics and public administration in Vienna about OGD to raise awareness and clarify misconceptions
  • Networking with international initiatives on the topic of open data such as the Open Data Network (Germany), the Open Knowledge Foundation (UK) or the ePSIplattform (just to name a few) to ensure continuous exchange, both on content and on the process for an Open Government Data strategy, to learn from each other and to support each other…
  • Furthermore, in July/August 2011 the Open Government Data White Book Austria will be published as a fundamental work on open data in Austria

Although the OGD2011 project is designed for the whole of Austria, the participants at the workshops and events were mainly from Vienna. This is not really surprising, as most Austrian public bodies are located in Vienna and Vienna, as both city and state, has a special status in Austria.

In November 2010 another very important step followed: the political YES to Open Data in Vienna, in the government programme of the new red-green coalition. Without such a commitment, an Open Government Data strategy is nearly impossible to implement.

For the implementation of data.wien.gv.at, the City of Vienna received support from the EU project LOD2. In the course of the LOD2 Publink Consultancy Services, LOD2 provided consulting on the following topics: Open (Government) Data, Linked Open Data, licenses and business models, as well as data sheets, meta data and URL schemas.

I think that, overall, the following factors have been crucial for the success of the Open Government Data movement in Vienna so far:

  • Broad awareness raising among all involved stakeholder groups
  • Collaboration of all stakeholders and establishment of an open dialogue between these groups
  • Political commitment on the highest level
  • High interest as well as engagement on the side of the public administration at the City of Vienna
  • High interest and support by the media – most of all by the Open Data Blog of futurezone
  • Support of the OGD2011 project by ZIT to enable a basic funding for concrete activities and measures
  • Building of a strong community for Open Data and therefore permanent presence of the topic in the public
  • Evaluation and representation of potentials and opportunities – but also of existing risks – of Open Government Data in Vienna
  • Exchange of knowledge and experiences with international initiatives to learn from each other and share best practices
  • Intense analysis of: licenses, meta data, data description (data governance) and a very well done implementation of phase 1 of data.wien.gv.at by the City of Vienna (with support by LOD2 et al.)

But phase one of data.wien.gv.at can only be a start. The City of Vienna has already announced a continuous exchange between the public administration and the community on the further development of the data portal (today, 26 May 2011, we had the first meeting, with about 50 participants and about two hours of really good discussions). In addition, an online survey is planned for summer 2011 (to ask the public about concrete data needs), and an open data challenge based on Viennese Open Government Data is planned for the end of 2011. There will also be developments in the scope of the provided data sets (more data will be opened) and in the provision of additional data formats and interfaces (along the lines of the EC and the UK, the City of Vienna wants to follow the path of Linked Open Government Data)….

… I am very curious to see how the process of Open Government Data in Vienna will continue from here in 2011 and 2012!

Additional Links: http://www.wien.gv.at/english/politics-administration/open-data.html

 

Author Martin Kaltenböck is CFO of the Semantic Web Company Wien and co-founder and member of the executive board of the OGD Austria

 

Andreas Blumauer

Florian Bauer: I like to view “linked data” as a “single worldwide API”

Florian Bauer is REEEP’s Operations and IT Director, responsible for the overall operational management of the organisation, the product management of reegle (the search engine for renewable energy and energy efficiency) and the management of the IT landscape of REEEP.

The PoolParty team had the chance to talk with Florian about reegle, the information gateway on clean energy.

Could you please give us a brief overview of reegle? What goals are you pursuing with this platform?

The main aim of the reegle information gateway (http://www.reegle.info) is to provide a one-stop gateway to comprehensive, high-quality and up-to-date information on clean energy. By making this information accessible to stakeholders in the field around the world, and by presenting it in a user-friendly and intuitive format, reegle directly helps to facilitate the transition to low-carbon energy.

The website provides information on renewable energy, energy efficiency and climate change and their various sub-sectors at a global level, and some reegle services actually combine raw data sets from several different sources, put these datasets into context and thus provide enriched information.

reegle is an offshoot of the Renewable Energy & Energy Efficiency Partnership (REEEP), a non-profit, specialist change agent aiming to catalyze the market for renewable energy and energy efficiency, with a primary focus on emerging markets and developing countries.

The new reegle data portal (data.reegle.info), launched in 2011, has established reegle as a publisher and consumer of Linked Open Data in the energy sector. It provides key clean energy datasets free for re-use using Linked Open Data W3C standards.

reegle consists of two components: one is the semantic search engine (http://www.reegle.info/), the other is the linked data portal (http://data.reegle.info/) – What are your target groups, and which typical problems of the clean energy domain can you solve with these services?

For reegle.info, our target groups are primarily project developers, financiers and government policy-makers. These users can access high-quality information on clean energy-related issues with the set of tools we provide: a special web search, a catalogue of more than 1700 key stakeholders, a map view for geographical browsing, a clean energy glossary, and an energy country profiles function.

The energy country profiles are typical of what we’re trying to achieve. Here, we take information from many different providers and combine it all to present one comprehensive information dossier on renewable energy and energy efficiency in that particular country. This means that in one location you have the country’s most important energy-related information ranging from key statistics, and current regulations to key players in the energy field in both public and private sectors.

For our data portal, the target group is a more technical one: primarily IT developers and open data specialists who want to create new mash-ups and integrate data from reegle into other websites. One of the first users of these reegle data sets is the OpenEI.org website, another key portal in the energy field.

Open data is not the same as linked open data. Why did you choose to build your services around W3C’s linked data paradigm and/or standards like RDF?

Tim Berners-Lee once mentioned that he likes to compare the progressive ways of offering data with the “stars system” used to rate hotels. You get:

* for making data public (in any format)
** for machine-readable formats (structured data)
*** if the data is offered in a non-proprietary format
**** if you use URIs to identify things, so people can point to your datasets
***** for linking to other people’s data to provide context
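
As a rough illustration of how the star levels build on one another, the scheme can be sketched in a few lines of Python. The `Dataset` class and its field names are hypothetical and chosen only for this example; the sketch assumes that each star requires all previous ones.

```python
from dataclasses import dataclass


@dataclass
class Dataset:
    """Hypothetical description of a published dataset (illustration only)."""
    public: bool = False            # * made public, in any format
    machine_readable: bool = False  # ** structured, machine-readable data
    open_format: bool = False       # *** non-proprietary format
    uses_uris: bool = False         # **** URIs identify things
    linked: bool = False            # ***** links to other people's data


def star_rating(ds: Dataset) -> int:
    """Count stars cumulatively: stop at the first level that is not met."""
    levels = [ds.public, ds.machine_readable, ds.open_format,
              ds.uses_uris, ds.linked]
    stars = 0
    for satisfied in levels:
        if not satisfied:
            break
        stars += 1
    return stars
```

Under this assumption, a dataset that is public and in an open format but not machine-readable would still count as one star, because the second level is not met.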

So, as you can imagine, our goal is for reegle to be firmly in the 5-star category, and to establish reegle as an avant-garde tool in energy data.
I also like to view “linked data” as a “single worldwide API”. If the old web was like a huge book, the new semantic web is like a huge database, and SPARQL is the way to ask for information – by sending a query through the SPARQL Endpoint. RDF is the language that offers all possibilities to describe a given dataset with all of the necessary information, including any links to other datasets. Therefore RDF data and SPARQL endpoints provide a powerful tool to find and filter datasets and are crucial, foundational parts of the semantic web’s architectural layers. On reegle, the SPARQL endpoint and the description of the structure of our RDF files are online on our clean energy open data portal.
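
To make the idea of “asking for information through a SPARQL endpoint” a little more concrete, here is a minimal Python sketch using only the standard library. The endpoint URL and the query are placeholders invented for this example, not reegle’s actual endpoint or data model; the request shape (a `query` parameter on a GET request) and the JSON result layout follow the W3C SPARQL protocol and results format.

```python
import json
import urllib.parse
import urllib.request

# Hypothetical endpoint, used only for illustration.
ENDPOINT = "http://example.org/sparql"

# Example query: a few SKOS concepts with their English preferred labels.
QUERY = """
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
SELECT ?concept ?label WHERE {
  ?concept skos:prefLabel ?label .
  FILTER (lang(?label) = "en")
} LIMIT 5
"""


def build_request(endpoint: str, query: str) -> urllib.request.Request:
    """Build a SPARQL protocol GET request that asks for JSON results."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    return urllib.request.Request(
        url, headers={"Accept": "application/sparql-results+json"}
    )


def rows(result_json: str) -> list:
    """Flatten SPARQL JSON results into plain {variable: value} dicts."""
    data = json.loads(result_json)
    return [
        {var: cell["value"] for var, cell in binding.items()}
        for binding in data["results"]["bindings"]
    ]
```

Against a real endpoint, the request would be sent with `urllib.request.urlopen(build_request(ENDPOINT, QUERY))` and the decoded response body passed to `rows`.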

You also decided to build a SKOS-based domain thesaurus for clean energy, which now plays an important role in improving the search experience at reegle.
What experience have you gained so far from this effort? Which obstacles did you have to overcome?

The SKOS-based renewable energy thesaurus can be seen as the “heart” of reegle as it provides the basis for a lot of related services in reegle, including the refinement suggestions for search results, the auto-completion options and the glossary links between defined terms and their synonyms and related terms.
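
How a thesaurus can drive auto-completion and refinement suggestions can be sketched roughly as follows. This is a simplified in-memory stand-in written for illustration, not reegle’s or PoolParty’s implementation; the `Concept` class mirrors the SKOS properties `skos:prefLabel`, `skos:altLabel` and `skos:related`, and the three sample concepts are made up.

```python
from dataclasses import dataclass, field


@dataclass
class Concept:
    """Tiny stand-in for a skos:Concept (illustrative data only)."""
    pref_label: str                                 # skos:prefLabel
    alt_labels: list = field(default_factory=list)  # skos:altLabel (synonyms)
    related: list = field(default_factory=list)     # skos:related, by prefLabel


THESAURUS = [
    Concept("solar energy", ["solar power"], ["photovoltaics"]),
    Concept("photovoltaics", ["PV"], ["solar energy"]),
    Concept("wind energy", ["wind power"], []),
]


def autocomplete(prefix: str) -> list:
    """Preferred labels of concepts that have any label starting with prefix."""
    p = prefix.lower()
    hits = []
    for concept in THESAURUS:
        labels = [concept.pref_label] + concept.alt_labels
        if any(label.lower().startswith(p) for label in labels):
            hits.append(concept.pref_label)
    return hits


def refinements(term: str) -> list:
    """Related concepts for a term matching a preferred or alternative label."""
    t = term.lower()
    for concept in THESAURUS:
        labels = [concept.pref_label] + concept.alt_labels
        if t in [label.lower() for label in labels]:
            return list(concept.related)
    return []
```

Because synonyms are stored as alternative labels of the same concept, typing “pv” or “solar power” leads to the same preferred terms as typing the preferred label itself.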

We decided to use SKOS because we think it is the best language for building a formal and controlled vocabulary for thesauri in a semantic web context, without adding too much complexity. Although it is a simple language, you really still need IT experts to use it to build a thesaurus – domain experts with additional IT skills (hard to find!).

So in our case, we decided to use a scalable and easy-to-use thesaurus server called “PoolParty”. Using this system drastically reduced the complexity, and allowed us to concentrate on the actual building of the thesaurus with our domain experts, and to spend less time on transferring the knowledge into data sets.

What are your future plans with reegle?

Currently we’re working on restructuring the site to better highlight our new added-value services such as the clean energy country profiles. We are also planning to further develop our thesaurus to include climate-compatible development terms and we’ll soon release a WordPress plug-in to insert this thesaurus into clean energy blogs. One of the most exciting projects we are currently working on is the development of “dossier pages”, where we will provide relevant information on several topics mashed up on one page using semantic web technologies. This is part of the EU-funded SCMS (“semantic content management system”) project.

Thomas Thurner

Hjalmar Gislason: “What I call the emerging field of Data Market.”

Open Government Data, and Open Data provided by the corporate sector, stimulate an upcoming market segment: Commercial Open Data Services. The Icelandic startup datamarket.com is one of the emerging companies in this field. Thomas Thurner from Semantic Web Company had the chance to talk to Hjalmar Gislason, founder and CEO of datamarket.com.

Semantic Puzzle: What’s the business idea behind datamarket.com? Whom do you expect to pay for what?
Hjalmar Gislason: From the end-user perspective it’s easiest to describe datamarket.com as a search engine for statistical data, a “Google for statistics” if you will. Any data that is already available open and for free out there will still be open and free on DataMarket, just easier to find, use, compare and download from a single source. While the audience for a search engine for statistical content is obviously way smaller than for text content, a significant part of that audience is business users, looking for data for business reasons. This means that there are more direct and lucrative methods to monetize the usage than simply contextual ads – especially in reselling access to premium data. This is a market that already turns over billions of dollars annually, but is as far from any of the “2.0 world” as one could possibly imagine (think Bloomberg, Reuters, FactSet). We believe there is an opportunity to disrupt a part of their business with a freemium approach, and furthermore open up the data market by reaching a business audience outside the narrowly defined financial user base that these companies cater to. There is data out there – free and premium alike – that can help almost any business make better plans and decisions. Connecting people and businesses to the data that they need will release phenomenal value. Tapping into just a fraction of that will be a hugely successful business for those that get it right.

Semantic Puzzle: Can you tell me a bit about the technological framework behind datamarket.com? How is the content from third parties fed into the system, and which APIs do you use? As you mainly provide XLS and CSV, have you thought about also providing data as XML in future?
Hjalmar Gislason: The backend system is written in Python. We read data from the sources in various different formats, ranging from Excel files and even scraping of web pages to proprietary APIs and Web Services. The data is then stored in a normalized format in a Postgres database that we’re using in a pretty unique way to be able to efficiently store the billions of time series and fact values that the system will eventually hold (currently at around 100 million time series and 600 million fact values). The web site is also written in Python, using the Django framework, but also making use of a lot of JavaScript libraries (and a bunch of our own code) to allow for an exciting user experience. We’re currently using a Flash-based solution called amCharts for the charts, but have already taken some steps to replace that with our own solution that we’ve written on top of the excellent Protovis visualization library. While you are right that the export formats we provide for end users are XLS, CSV and images (for exporting the graphs), our RESTful API actually supports XML and JSON formats as well. So we already provide data as XML.
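
As a rough sketch of what serving one normalized time series in both JSON and XML can look like, the following Python example uses only the standard library. The field names, element names and sample values are invented for illustration and do not reflect DataMarket’s actual API or data.

```python
import json
import xml.etree.ElementTree as ET

# Made-up time series in a simple normalized form: (period, value) pairs.
SERIES = {"name": "example series", "values": [("2009", 1.5), ("2010", 2.0)]}


def to_json(series: dict) -> str:
    """Serialize the series as a JSON document."""
    return json.dumps({
        "name": series["name"],
        "values": [{"period": p, "value": v} for p, v in series["values"]],
    })


def to_xml(series: dict) -> str:
    """Serialize the same series as an XML document."""
    root = ET.Element("series", name=series["name"])
    for period, value in series["values"]:
        ET.SubElement(root, "value", period=period).text = str(value)
    return ET.tostring(root, encoding="unicode")
```

Both functions read from the same internal structure, so adding a further export format only means adding another serializer.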

Semantic Puzzle: You surely know Tim Berners-Lee’s 5-star scheme for OGD providers. Where do you see your own service in this framework?
Hjalmar Gislason: Any fact value, time series and data set on DataMarket is “addressable” with a direct URL using our API. In that sense, all the data on DataMarket is four-star data according to Berners-Lee’s definition. In many cases we’re integrating to data that is only one or two star data, so just by integrating it into our system we’ve moved it a few notches up that ladder. In some cases we’ve even been helping organizations publishing data for the first time, taking the data from 0 to 4 stars in one go. We’ve been toying around with several ideas that would take – or enable users to take – the data all the way to 5-star status, but that’s still just on the drawing table.

Semantic Puzzle: You re-use a lot of Open Data coming from the Icelandic government. Is there also a state-owned data portal for Iceland, or is your service a “commercial replacement” for such a public effort?
Hjalmar Gislason: There is no government-operated data portal in Iceland, and to my knowledge there are no plans for implementing one yet. Sadly there are several more pressing issues in terms of eGovernment here that take higher priority. We don’t see our efforts as a replacement for such a portal, but we have managed to fulfill a little part of that role when it comes to statistical data. We’ve also been really vocal about the benefits of open data and among other things been influential in launching an open data wiki - opingogn.net (Icelandic only) – that explains the concepts with examples and use cases and attempts to list as many sources of government data as possible in a directory. There is some movement, but as an open data enthusiast I’d really like to see things happening faster. As a matter of fact I think there are reasons for Iceland to be extra enthusiastic about open data to increase transparency and restore trust after the crash of the banks and the economic system in 2008.

Semantic Puzzle: A lot of commercial Open Data Services (Socrata, Factual, Google …) are evolving at the moment. How do you think this market segment will develop in the coming months and years, and what do you see as the crucial factors for such a business?
Hjalmar Gislason: I’ve been writing quite a lot about the developments in this industry on our blog. One of the things I’ve written the most about is what I call the “Emerging field of Data Market”. I define “data markets” as “Services that make it easy to find data from a range of secondary data sources, then consume or acquire the data in a usable – and often unified – format.” Many of these services are trying to create marketplaces for data, envisioning that data providers can offer their data sets for sale to data seekers. As there are several players in this space already, I believe we’ll see many of them try to differentiate themselves in 2011 by focusing on specific types of data. There are definitely opportunities in building specialized data markets for geospatial data, for statistics and for enormous scientific data sets – to name a few types – and each comes with their own challenges, target audiences and preferred approaches. In the spirit of doing one thing and doing it well, I think most of these projects will want to see success in one such segment of the market before generalizing – or consolidating.


The interviewee: Hjalmar is a successful entrepreneur, founder of three startups in the gaming, mobile and web sectors since 1996. Prior to launching DataMarket, Hjalmar worked on new media and business development for companies in the Skipti Group (owners of Iceland Telecom) after their acquisition of his search startup – Spurl. Hjalmar offers a mix of business, strategy and technical expertise.