Thomas Thurner

data.wien.gv.at – the process to Vienna’s open data portal

On 17 May 2011 the time had come: the first Open Government Data (OGD) portal of a public administration in Austria was launched. It was the capital, Vienna, that took this courageous and important step and thereby took on the role of a pioneer in the area of open data in our country. Hopefully it will act as a model for communities, cities, states and the federal government. (It should also be mentioned that the Open Commons Region Linz was the first city government in Austria to announce a data portal, even before Vienna; its launch date will be September 2011.)

http://data.wien.gv.at is a well-executed first step in the area of Open Government Data for a modern and open City of Vienna. Open human- and machine-readable data in several formats and from several categories (e.g. population, education, budget, leisure time and many more) are now available for re-use. On top of that, they are available under the Creative Commons CC-BY-3.0 license.

The road to 17 May 2011 started about one year earlier, at least from the point of view of the Austrian (and Viennese) open data community: on 8 April 2010 a group of linked open data enthusiasts (representatives of universities, companies and civil society) invited interested people to the 1st Open Government Data Meetup at the OCG (Austrian Computer Society) in Vienna. Rufus Pollock of the Open Knowledge Foundation spoke on site in Vienna, and Stefano Bertolo of the European Commission joined via Skype. Together they shed light on Open Government Data, at that time a very new topic for Austria and Vienna, and presented their experiences and best practices in the field to about 60 participants. Interest was very high, also on the part of the media, so the event established a basic interest and a first broad base of information in Vienna.
Afterwards everything went quickly until 17 May 2011 (and even if one year seems a long time, I do think it was an enormous achievement by all involved parties to manage so much in only one year!). After the MeetUp mentioned above, OGD Austria was founded: an initiative whose objective is to open (linked, non-personal) government data in Austria in human- and machine-readable formats for re-use, and to do this together with politics, administration, civil society and industry. Other initiatives such as open3, established institutions in the area of administration research such as KDZ – Zentrum für Verwaltungsforschung, the Danube University of Krems and Joanneum Research, companies such as the Semantic Web Company and Compass Verlag, and above all many representatives of civil society who were interested in Open Government Data (it is important to say that Vienna has a very active creative scene and web 2.0 community) worked together to push the field of open data in Vienna and Austria.

In June 2010 the Semantic Web Company (SWC), with support from the institutions mentioned above, submitted a proposal to the technology agency of the City of Vienna (ZIT) to design and implement a bundle of awareness-building measures in the field of Open Government Data in Austria: the OGD2011 project was born. The approval of this project (partly funded by ZIT) certainly helped a lot to inform the relevant stakeholders (politics, public administration, civil society, industry, academia and media) during this period and to build awareness of the power and potential of Open Government Data, as well as of its challenges and the important concrete steps involved!

The following measures have been or will be implemented in the course of OGD2011:

  • Open OGD Austria Stammtisch every second month (meetup, until today only in Vienna)
  • 4 Stakeholder Workshops (politics, administration, civil society, industry) in February 2011 to identify, evaluate and discuss the requirements for Open Government Data in Austria from the viewpoint of the respective stakeholder group
  • Publishing of the OGD Digest Austria: information about Open Data in Austria and internationally, in print & PDF (4 editions available to date)
  • Set up and operation of a mailing list as well as a XING group
  • Organisation of an open MeetUp on OGD on 15 June 2011 in Vienna
  • Set up and operation of open wiki spaces for collection of information and provision of relevant information in the field of Open Data
  • OGD2011 Conference on 16 June 2011 in Vienna
  • And very important: about 40-50 bilateral talks with politicians and representatives of public administration in Vienna about OGD to raise awareness and clarify misconceptions
  • Networking with international initiatives on the topic of open data such as the Open Data Network (Germany), the Open Knowledge Foundation (UK) or the ePSIplatform (just to name a few) to ensure continuous exchange on the topic, both on content and on the process for an Open Government Data strategy, in order to learn from each other and to support each other…
  • Furthermore in July/August 2011 the Open Government Data White Book Austria will be published as a fundamental work on open data in Austria

Although the OGD2011 project covers the whole of Austria, the participants at the workshops and events were mainly from Vienna. This is not really surprising, as most Austrian public bodies are located in Vienna and the City and State of Vienna has a special status in Austria.

In November 2010 another very important step followed, because without an Open Government Data strategy open data is nearly impossible to implement: the political YES to Open Data in Vienna in the government programme of the new red-green coalition.

For the implementation of data.wien.gv.at the City of Vienna received support from the EU project LOD2. In the course of the LOD2 Publink Consultancy Services, LOD2 advised on the following topics: Open (Government) Data, Linked Open Data, licenses and business models, as well as data sheets, metadata and URL schemas.

I think that, overall, the following factors were crucial for the success of the Open Government Data movement in Vienna so far:

  • Broad awareness raising among all involved stakeholder groups
  • Collaboration of all stakeholders and establishment of an open dialogue between these groups
  • Political commitment at the highest level
  • High interest as well as engagement on the side of the public administration at the City of Vienna
  • High interest and support by the media – most of all by the Open Data Blog of futurezone
  • Support of the OGD2011 project by ZIT to enable a basic funding for concrete activities and measures
  • Building of a strong community for Open Data and therefore permanent presence of the topic in the public
  • Evaluation and representation of potentials and opportunities – but also of existing risks – of Open Government Data in Vienna
  • Exchange of knowledge and experiences with international initiatives to learn from each other and share best practices in both directions
  • Intense analysis of licenses, metadata and data description (data governance), and a very well done implementation of phase 1 of data.wien.gv.at by the City of Vienna (with support by LOD2 et al.)

But this phase one of data.wien.gv.at can only be a start. The City of Vienna has already announced a continuous exchange between the public administration and the community on the further development of the data portal (and today, on 26 May 2011, we had the first meeting, with about 50 participants and really very fine discussions lasting about two hours). Furthermore, an online survey is planned for summer 2011 (to ask the public about concrete data needs), and an open data challenge based on Viennese Open Government Data is planned for the end of 2011. The scope of the provided data sets will also grow (more data will be opened), and additional data formats and interfaces will be provided (along the lines of the EC and the UK, the City of Vienna wants to follow the path of Linked Open Government Data)….

… I am absolutely curious about how the process of Open Government Data in Vienna will go on from here in 2011 and 2012!

Additional Links: http://www.wien.gv.at/english/politics-administration/open-data.html

 

Author Martin Kaltenböck is CFO of the Semantic Web Company Wien and co-founder and member of the executive board of the OGD Austria

 

Thomas Thurner

The hype, the hope and the LOD2: Sören Auer engaged in the next generation LOD

The pan-European project LOD2 is one of the biggest projects dealing with linked data. Scientists, programmers and software architects in various European countries are working on the next generation of linked open data. In a series of interviews I’m presenting people working on and with LOD2. As a start, I had the chance to talk to Sören Auer, head of the LOD2 project.

Thomas Thurner: Over the recent years the LOD movement gained tremendous momentum. As one of the key players in this area how do you perceive this development? Hype or hope?

Sören Auer: From my point of view the momentum LOD gained is deserved. We should strive for a Web which is more decentralized, democratic, participatory, transparent and inclusive. Linked Open Data is from my point of view a key technological building block on this road. However, a lot of work is ahead of us. LOD has to find its way directly into mainstream technology such as CMSes, search engines, Web applications and mash-ups, and we have to show users and stakeholders the direct added value of this technology.

Thomas Thurner: What is the current state of the LOD cloud from a technological point of view? Where do you see room for improvement?

Sören Auer: Currently, the technological state of LOD seems to be comparable to the early days of the Web. We are still able to draw maps/clouds of the LOD datasets, and data links are still sparse and difficult to maintain. This reminds me a lot of the early days of the Web, where we also had problems with broken links (the infamous 404). Later, after content management systems and Web applications automated link generation and maintenance, this improved a lot, and I hope we are on the same road with LOD technologies finding their way into more and more Web systems.

Thomas Thurner: How is the LOD2 project addressing these issues? What are the project’s key objectives?

Sören Auer: LOD2 is addressing these issues in three ways: First, we develop new research approaches highly relevant for LOD, for example for Linked Data management, automatic data linking, and Linked Data enrichment and quality improvement. Second, we implement and integrate these approaches into specialized tools (e.g. SILK, OntoWiki, Virtuoso and DL-Learner), which together form the integrated LOD2 stack. The LOD2 stack can be used by data publishers for the whole life-cycle of Linked Data management, ranging from extraction through linking, authoring and enrichment to exploration & search.

Thomas Thurner: What do you think are the most important factors to bring LOD to the masses?

Sören Auer: From my point of view the key factor here is that we manage to integrate the large number of tools and approaches for supporting the Linked Data life-cycle stages in a synergistic way, where each aspect adds value and triggers a number of other improvements. For example, establishing a new data link has a direct effect on search & exploration of Linked Data. We have to show these kinds of benefits directly to users, so that they receive instant gratification for contributions to the Web of Data. Semantic wikis, such as Semantic MediaWiki and OntoWiki, are already working nicely in this direction. An application with enormous potential to bring LOD to the masses would be a distributed, social semantic network. With OpenId, WebId, FOAF and Semantic Pingback most of the building blocks are available, but the final step of integrating these into an easy-to-use social networking application still has to be done.

Thomas Thurner: Compared to other semantic web approaches linked data principles seem to be rather easy to understand. On the other hand some argue that the “linked data cloud” is a big heap of data which cannot be used for professional purposes. What is your point of view?

Sören Auer: Of course the currently available data is not useful for all potential usage scenarios. However, Linked Data can already be used for many interesting applications: for example, we just completed the development of a prototype for a large search engine, where users are assisted in their searches with comprehensive background information obtained from the Linked Data Web. For this use case, information available as Linked Data is already very valuable and useful. The criticism of LOD being a “heap of data” also reminds me a lot of the early days of the Web, when people raised similar criticisms of the Web being a medium of un-professionalism. Later it turned out that there is of course a lot of amateurism, but, as Wikipedia impressively demonstrates, many amateurs working together with the right tools can in the end outperform a few professionals.

Thomas Thurner: Linked Data could also become a new paradigm for light-weight enterprise data integration. What are the biggest obstacles today to linked data being accepted by the business community?

Sören Auer: Using Linked Data for data integration in large enterprises has enormous potential. Just last week I was invited to a workshop with the IT department of one of the top car makers, and the people responsible for data integration there were extremely excited about the opportunities of Linked Data in a large heterogeneous enterprise with more than 3000 different backend systems. Linked Data technologies can easily fill the gap between unstructured Intranet search and expensive & complicated Service-oriented Architectures. Compared to SOA, Linked Data is a pay-as-you-go strategy, where data integration can be performed incrementally and in sync with the requirements and evolution of the data structures in the enterprise. In order to realize this vision, we need to continue the maturation of enterprise Linked Data tools; the availability of PoolParty, Sindice Enterprise Edition, Virtuoso and TopBraid is already an important step in that direction.

Thomas Thurner: Automatic mechanisms to curate linked data and to make alignments between datasets possible play a crucial role for the next phase of linked data economics. Which technologies will play a central role? What will be the most critical point – do you see a “wisdom of the crowd” playing a role in this game?

Sören Auer: Definitely! Tapping the wisdom of the crowd for mapping & linking has huge potential, which is currently unused. We started working in that direction with DBpedia Live and the DBpedia mapping wiki. To make it really easy for people to contribute, we have to dramatically lower the barrier to taking part in the alignment process. In LOD2 we also plan to enable users to create mappings and links between datasets by simply giving examples of correct links and evaluating some automatically generated ones.

Thomas Thurner: At the moment governments all around the world are starting to publish open data, and more and more stakeholders are starting to understand the benefit of open linked data. On the other hand, enterprises haven’t even started with this topic. What dynamics could trigger projects that make use of open data principles in industry sectors such as the financial industry?

Sören Auer: Making statistical and financial information available in structured form and as Linked Data could have an enormous impact in this regard. With the DataCube vocabulary effort a first step in this direction was made, but it would be nice if this vocabulary received an official stamp from a standardization organization such as the W3C. Since the benefit of publishing statistical and financial data in structured form, e.g. as Linked Data, is most visible when done by many, this could also be facilitated by government regulations and industry best practices.

About INFAI

The Institute for Applied Computer Science (InfAI) at Universität Leipzig hosts research groups in service sciences, knowledge engineering and management as well as natural language processing. The approximately 20 researchers of the Agile Knowledge Engineering and Semantic Web (AKSW) research group at InfAI headed by Dr. Sören Auer are establishing theoretical results and scalable implementations for the field. Particular emphasis is given to areas such as ontology creation and manipulation, knowledge extraction, ontology learning and information & data integration on the Semantic Data Web. The implemented tools and services (such as DBpedia, OntoWiki, DL-Learner and LinkedGeoData) developed by the group enjoy considerable popularity.

About Sören Auer

Dr. Sören Auer leads the research group Agile Knowledge Engineering and Semantic Web (AKSW) at Universität Leipzig. His research interests include semantic data web technologies, knowledge representation, engineering & management, usability, agile methodologies as well as databases and information systems. He aims to combine strong theoretical results with high-impact practical applications. Sören is the author of over 50 peer-reviewed scientific publications, resulting in a Hirsch index of 15. Sören is leading the large-scale integrated EU-FP7-ICT research project “LOD2 – Creating Knowledge out of Interlinked Data”. Sören is founder (respectively co-founder) of several high-impact research and community projects such as the Wikipedia semantification project DBpedia or the social Semantic Web toolkit OntoWiki. He is co-organiser of several workshops, programme chair of I-Semantics 2008, OKCON 2010, ESWC 2010 and ICWE 2011, area editor of the Semantic Web Journal, serves as an expert for industry, the European Commission and the W3C, and is a member of the advisory board of the Open Knowledge Foundation.

Andreas Blumauer

Florian Bauer: I like to view “linked data” as a “single worldwide API”

Florian Bauer is REEEP’s Operations and IT Director, responsible for the overall operational management of the organisation, the product management of reegle (the search engine for renewable energy and energy efficiency) and the management of the IT landscape of REEEP.

The PoolParty team had the chance to talk with Florian about reegle, the information gateway on clean energy.

Could you please give us a brief overview over reegle – what are the targets you are pursuing with this platform?

The main aim of the reegle information gateway (http://www.reegle.info) is to provide a one-stop gateway to comprehensive, high-quality and up-to-date information on clean energy. By making this information accessible to stakeholders in the field around the world, and by presenting it in a user-friendly and intuitive format, reegle directly helps to facilitate the transition to low-carbon energy.

The website provides information on renewable energy, energy efficiency and climate change and their various sub-sectors at a global level, and some reegle services actually combine raw data sets from several different sources, put these datasets into context and thus provide enriched information.

reegle is an offshoot of the Renewable Energy & Energy Efficiency Partnership (REEEP), a non-profit, specialist change agent aiming to catalyze the market for renewable energy and energy efficiency, with a primary focus on emerging markets and developing countries.

The new reegle data portal (data.reegle.info), launched in 2011, has established reegle as a publisher and consumer of Linked Open Data in the energy sector. It provides key clean energy datasets free for re-use using Linked Open Data W3C standards.

reegle consists of two components: one is the semantic search engine (http://www.reegle.info/), the other is the linked data portal (http://data.reegle.info/) – What are your target groups, and which typical problems of the clean energy domain can you solve with these services?

For reegle.info, our target groups are primarily project developers, financiers and government policy-makers. These users can access high-quality information on clean energy-related issues with the set of tools we provide: a special web search, a catalogue of more than 1700 key stakeholders, a map view for geographical browsing, a clean energy glossary, and an energy country profiles function.

The energy country profiles are typical of what we’re trying to achieve. Here, we take information from many different providers and combine it all to present one comprehensive information dossier on renewable energy and energy efficiency in that particular country. This means that in one location you have the country’s most important energy-related information ranging from key statistics, and current regulations to key players in the energy field in both public and private sectors.

For our data portal, the target group is a more technical one: primarily IT developers and open data specialists who want to create new mash-ups and integrate data from reegle into other websites. One of the first users of these reegle data sets is the OpenEI.org website, another key portal in the energy field.

Open data is not the same as linked open data. Why did you choose to build your services around W3C´s linked data paradigm and/or standards like RDF?

Tim Berners-Lee once mentioned that he likes to compare the progressive ways of offering data with the “stars system” used to rate hotels. You get:

* for making data public (in any format)
** for machine-readable formats (structured data)
*** if the data is offered in a non-proprietary format
**** if you use URIs to identify things, so people can point to your datasets
***** for linking to other people’s data to provide context

So, as you can imagine, our goal is for reegle to be firmly in the 5-star category, and to establish reegle as an avant-garde tool in energy data.
I also like to view “linked data” as a “single worldwide API”. If the old web was like a huge book, the new semantic web is like a huge database, and SPARQL is the way to ask for information, by sending a query through the SPARQL endpoint. RDF is the language that offers all possibilities to describe a given dataset with all of the necessary information, including any links to other datasets. Therefore RDF data and SPARQL endpoints provide a powerful tool to find and filter datasets and are crucial, basic parts of the semantic web’s architectural layers. On reegle, the SPARQL endpoint and the description of the structure of our RDF files are online on our clean energy open data portal.
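
As a rough illustration of what "sending a query through the SPARQL endpoint" can look like, here is a minimal Python sketch. It only builds the request URL and sends nothing over the network. The endpoint address is a placeholder and an assumption, not reegle's actual endpoint, and the query is a generic one that lists a few triples. Passing the query text in a `query` URL parameter follows the standard SPARQL protocol.

```python
from urllib.parse import urlencode

# Placeholder endpoint (assumption for illustration, not a real service).
ENDPOINT = "http://example.org/sparql"

# Generic query: return five arbitrary subject/predicate/object triples.
QUERY = "SELECT ?s ?p ?o WHERE { ?s ?p ?o } LIMIT 5"


def build_sparql_url(endpoint: str, query: str) -> str:
    """Return a GET URL carrying the query in the 'query' parameter."""
    return endpoint + "?" + urlencode({"query": query})


url = build_sparql_url(ENDPOINT, QUERY)
print(url)
```

A client would then fetch this URL and ask for a result format such as JSON or XML via the HTTP `Accept` header; the exact formats on offer depend on the endpoint.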

You also decided to build a SKOS-based domain thesaurus for clean energy, which now plays an important role in improving the search experience at reegle. What experience have you gained so far from this effort? Which obstacles did you have to overcome?

The SKOS-based renewable energy thesaurus can be seen as the “heart” of reegle as it provides the basis for a lot of related services in reegle, including the refinement suggestions for search results, the auto-completion options and the glossary links between defined terms and their synonyms and related terms.
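
To make the idea concrete, the following Python sketch shows how a thesaurus with preferred labels, synonyms and related terms could drive auto-completion and glossary suggestions. It is only an illustration under stated assumptions: the three sample concepts are invented, the field names merely mirror the SKOS properties `skos:prefLabel`, `skos:altLabel` and `skos:related`, and nothing here reflects how reegle or PoolParty are actually implemented.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Concept:
    pref_label: str                                       # mirrors skos:prefLabel
    alt_labels: List[str] = field(default_factory=list)   # mirrors skos:altLabel
    related: List[str] = field(default_factory=list)      # mirrors skos:related


# Invented sample concepts, for illustration only.
THESAURUS = [
    Concept("solar energy", ["solar power"], ["photovoltaics"]),
    Concept("photovoltaics", ["PV"], ["solar energy"]),
    Concept("wind energy", ["wind power"]),
]


def autocomplete(prefix: str) -> List[str]:
    """Preferred labels of concepts with any label starting with the prefix."""
    p = prefix.lower()
    hits = set()
    for concept in THESAURUS:
        for label in [concept.pref_label, *concept.alt_labels]:
            if label.lower().startswith(p):
                hits.add(concept.pref_label)
    return sorted(hits)


def suggestions(term: str) -> Dict[str, List[str]]:
    """Synonyms and related terms for a term matched against any label."""
    t = term.lower()
    for concept in THESAURUS:
        labels = [concept.pref_label, *concept.alt_labels]
        if t in (label.lower() for label in labels):
            synonyms = [label for label in labels if label.lower() != t]
            return {"synonyms": synonyms, "related": list(concept.related)}
    return {"synonyms": [], "related": []}


print(autocomplete("sol"))
print(suggestions("PV"))
```

Because synonyms point back to one preferred label, a search for "solar power" and one for "solar energy" can be treated as the same concept, which is what makes refinement suggestions possible.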

We decided to use SKOS because we think it is the best language for building a formal and controlled vocabulary for thesauri in a semantic web context, without adding too much complexity. Although it is a simple language, you really still need IT experts to use it to build a thesaurus – domain experts with additional IT skills (hard to find!).

So in our case, we decided to use a scalable and easy-to-use thesaurus server called “PoolParty”. Using this system drastically reduced the complexity, and allowed us to concentrate on the actual building of the thesaurus with our domain experts, and to spend less time on transferring the knowledge into data sets.

What are your future plans with reegle?

Currently we’re working on restructuring the site to better highlight our new added-value services such as the clean energy country profiles. We are also planning to further develop our thesaurus to include climate-compatible development terms, and we’ll soon release a WordPress plug-in to insert this thesaurus into clean energy blogs. One of the most exciting projects we are currently working on is the development of “dossier pages”, where we will provide relevant information on several topics mashed up on one page using semantic web technologies. This is part of the EU-funded SCMS (“semantic content management system”) project.

Thomas Thurner

Hjalmar Gislason: “What I call the emerging field of Data Market.”

Open Government Data, and Open Data provided by the corporate sector, are stimulating an upcoming market segment: Commercial Open Data Services. The Icelandic startup datamarket.com is one of the emerging companies in this field. Thomas Thurner from Semantic Web Company had the chance to talk to Hjalmar Gislason, founder and CEO of datamarket.com.

Semantic Puzzle: What’s the business idea behind datamarket.com? Whom do you expect to pay for what?
Hjalmar Gislason: From the end-user perspective it’s easiest to describe datamarket.com as a search engine for statistical data, a “Google for statistics” if you will. Any data that is already available open and for free out there will still be open and free on DataMarket, just easier to find, use, compare and download from a single source. While the audience for a search engine for statistical content is obviously way smaller than for text content, a significant part of that audience is business users, looking for data for business reasons. This means that there are more direct and lucrative methods to monetize the usage than simply contextual ads, especially in reselling access to premium data. This is a market that already turns over billions of dollars annually, but is as far from any of the “2.0 world” as one could possibly imagine (think Bloomberg, Reuters, FactSet). We believe there is an opportunity to disrupt a part of their business with a freemium approach, and furthermore open up the data market by reaching a business audience outside the narrowly defined financial user base that these companies cater to. There is data out there, free and premium alike, that can help almost any business make better plans and decisions. Connecting people and businesses to the data that they need will release phenomenal value. Tapping into just a fraction of that will be a hugely successful business for those that get it right.

Semantic Puzzle: Can you tell me a bit about the technological framework behind datamarket.com? How is the content from third parties fed into the system, and which APIs do you use? As you mainly provide XLS and CSV, have you thought about providing data as XML as well in future?
Hjalmar Gislason: The backend system is written in Python. We read data from the sources in various different formats, ranging from Excel files and even scraping of web pages to proprietary APIs and Web Services. The data is then stored in a normalized format in a Postgres database that we’re using in a pretty unique way to be able to efficiently store the billions of time series and fact values that the system will eventually hold (currently at around 100 million time series and 600 million fact values). The web site is also written in Python, using the Django framework, but also making use of a lot of JavaScript libraries (and a bunch of our own code) to allow for an exciting user experience. We’re currently using a Flash-based solution called amCharts for the charts, but have already taken some steps to replace that with our own solution that we’ve written on top of the excellent Protovis visualization library. While you are right that the export formats we provide for end users are XLS, CSV and images (for exporting the graphs), our RESTful API actually supports XML and JSON formats as well. So we already provide data as XML.
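
The step of reading source data in some tabular format and storing it in a normalized form can be pictured with a small Python sketch. This is only an illustration under stated assumptions: the CSV sample, the `series`/`period`/`value` field names and the `normalize` helper are invented here and do not describe DataMarket's actual schema or code. The sketch turns a "wide" table (one column per period) into one fact value per row, then serializes the result as JSON.

```python
import csv
import io
import json

# Invented sample input: one row per series, one column per period.
SAMPLE = "region,2009,2010\nNorth,1.5,2.0\nSouth,3.0,\n"


def normalize(csv_text: str, series_column: str) -> list:
    """Turn a wide CSV table into a flat list of fact values."""
    facts = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        series = row[series_column]
        for column, raw in row.items():
            # Skip the series name itself and empty (missing) cells.
            if column == series_column or raw == "":
                continue
            facts.append({"series": series, "period": column, "value": float(raw)})
    return facts


facts = normalize(SAMPLE, "region")
as_json = json.dumps(facts)  # same facts, ready for a JSON export
print(as_json)
```

Once every source is reduced to the same flat shape, exporting to CSV, XML or JSON becomes a matter of serializing one structure in different ways.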

Semantic Puzzle: You certainly know Tim Berners-Lee’s 5-star scheme for OGD providers. Where do you see your own service in this framework?
Hjalmar Gislason: Any fact value, time series and data set on DataMarket is “addressable” with a direct URL using our API. In that sense, all the data on DataMarket is four-star data according to Berners-Lee’s definition. In many cases we’re integrating data that is only one- or two-star data, so just by integrating it into our system we’ve moved it a few notches up that ladder. In some cases we’ve even been helping organizations publish data for the first time, taking the data from 0 to 4 stars in one go. We’ve been toying around with several ideas that would take, or enable users to take, the data all the way to 5-star status, but that’s still just on the drawing board.
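
The cumulative nature of the star scheme (each star presupposes the ones before it) can be expressed in a few lines of Python. This is a hedged sketch: the `star_rating` function and its parameter names are invented for illustration, and the five criteria are a paraphrase of the commonly cited scheme (public, machine-readable, non-proprietary format, URIs, links to other data).

```python
def star_rating(public: bool, machine_readable: bool, non_proprietary: bool,
                uses_uris: bool, linked: bool) -> int:
    """Count stars in order; the first unmet criterion stops the count."""
    criteria = [public, machine_readable, non_proprietary, uses_uris, linked]
    stars = 0
    for met in criteria:
        if not met:
            break
        stars += 1
    return stars


# Data that is addressable by URL but not yet linked to other datasets.
print(star_rating(True, True, True, True, False))
```

Under this reading, data that is public and addressable by URL but not yet linked to other datasets stops at four stars, which matches the self-assessment in the answer above.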

Semantic Puzzle: You re-use a lot of Open Data coming from the Icelandic government. Is there also a state-owned data portal for Iceland, or is your service a “commercial replacement” for such a public effort?
Hjalmar Gislason: There is no government-operated data portal in Iceland, and to my knowledge there are no plans for implementing one yet. Sadly there are several more pressing issues in terms of eGovernment here that take higher priority. We don’t see our efforts as a replacement for such a portal, but we have managed to fulfill a little part of that role when it comes to statistical data. We’ve also been really vocal about the benefits of open data and, among other things, been influential in launching an open data wiki, opingogn.net (Icelandic only), that explains the concepts with examples and use cases and attempts to list as many sources of government data as possible in a directory. There is some movement, but as an open data enthusiast I’d really like to see things happening faster. As a matter of fact I think there are reasons for Iceland to be extra enthusiastic about open data to increase transparency and restore trust after the crash of the banks and the economic system in 2008.

Semantic Puzzle: A lot of commercial Open Data Services (Socrata, Factual, Google …) are evolving at the moment. How do you think this market segment will develop in the coming months and years, and what do you see as the crucial factors for such a business?
Hjalmar Gislason: I’ve been writing quite a lot about the developments in this industry on our blog. One of the things I’ve written the most about is what I call the “emerging field of Data Market”. I define “data markets” as “Services that make it easy to find data from a range of secondary data sources, then consume or acquire the data in a usable – and often unified – format.” Many of these services are trying to create marketplaces for data, envisioning that data providers can offer their data sets for sale to data seekers. As there are several players in this space already, I believe we’ll see many of them try to differentiate themselves in 2011 by focusing on specific types of data. There are definitely opportunities in building specialized data markets for geospatial data, for statistics and for enormous scientific data sets, to name a few types, and each comes with its own challenges, target audiences and preferred approaches. In the spirit of doing one thing and doing it well, I think most of these projects will want to see success in one such segment of the market before generalizing or consolidating.


The interviewee: Hjalmar is a successful entrepreneur, founder of three startups in the gaming, mobile and web sectors since 1996. Prior to launching DataMarket, Hjalmar worked on new media and business development for companies in the Skipti Group (owners of Iceland Telecom) after their acquisition of his search startup – Spurl. Hjalmar offers a mix of business, strategy and technical expertise.