Semantic Web Company

The Semantic Puzzle

Open World Assumptions

subscribe RSS

Archive for the ‘Tools & Software’

Why SKOS thesauri matter – the next generation of semantic technologies

August 31, 2010 By: Andreas Blumauer Category: Search Engines, Semantic Web Applications, Text Mining, Tools & Software No Comments →

As a matter of fact still a lot of “semantic technologies” are around which do nothing else than pure statistical analysis of text. Sure, this is better than simple full text search but there are still quite a lot of opportunities to improve search, especially when it comes to more sophisticated applications like “similarity search”, the search for similar documents to enable cross-reading or recommendation systems.

Providers of first generation semantic technologies calculate rather basic “semantic networks” by co-occurency analysis which results sometimes in  disappointing results. Bearing in mind that Google just bought a company (“Google buys Metaweb“) which has been working on one of the largest knowledge bases in the world, we could assume that some of the last miles towards a semantic search engine can be achieved by applying thesauri or other structured knowledge bases.

A demo application was recently developed by PoolParty team where one can find out how thesauri will improve search results on top of second generation semantic technologies. With PoolParty SKOS based controlled vocabularies can be managed and also can be enriched with linked data. PoolParty Tag & Content Recommender analyzes virtually any text or website to recommend corresponding tags, concepts from (in this case) STW (Standard Thesaurus für Wirtschaft), DBpedia and respective articles from Wikipedia.

STW which was developed by the German National Library of Economics (ZBW) provides vocabulary on any economic subject: about 6,000 standardized subject headings and about 18,000 entry terms to support individual keywords.

This background knowledge is used in this demo app to improve the search for similar documents dramatically:

Similarity between two documents can be calculated not only on a key-phrase basis but also on a rather conceptual basis. Even if two documents do not have one single word or phrase in common they can be identified as “similar documents”.

This can be achieved because thousands of important relations between economic subjects are represented in the domain specific thesaurus. Thus, in this special case best results are achieved with documents from economics (for instance from Econstor) but of course for other recommender systems thesauri from other domains can be used instead of STW.

Nevertheless, also this approach can be improved and this development is underway: SKOS thesauri enriched with Linked Data do an even better job. This kind of third generation semantic technologies are currently developed by LASSO project and LOD2 project, two innovative projects in the area of linked data and the semantic web.

Sphere: Related Content

Stella Dextre Clarke & Alan Gilchrist about the “Future of Knowledge Organization on the Web”

June 21, 2010 By: Andreas Blumauer Category: Linked Data & Open Data, Tools & Software, Vocabularies & Languages 1 Comment →

Semantic Web Company (SWC) had the pleasure and the opportunity to talk with two internationally recognised experts in the fields of information management and knowledge organization: Alan Gilchrist and Stella Dextre Clarke. SWC asked some questions about the “Future of Knowledge Organization on the Web & Linked Data” on the occasion of an event of the same name organised by ISKO UK which will take place on September 14, 2010 in London.

1. Alan, you are one of the leading experts in the field of thesaurus construction. Organising knowledge in a (worldwide) Semantic Web is a rather young discipline compared to your domain. What do you think can the Semantic Web community learn from “traditional” thesaurus management and vice versa?

You put inverted commas round the word traditional, but it might be more appropriate to put them round the word thesaurus! So long as words are used in information retrieval and in information sharing, different forms of structured vocabularies will be required, and many of the fundamental principles of thesaurus construction are still valid for their construction. Of course, the “traditional” thesaurus has mutated since the days when it was used only for controlled indexing and retrieval; and now, with the many enrichments possible it can be viewed as an ontology (in one of the definitions of this word). What remains a difficulty is to create a generalisable typology of associative relationships, though this is, of course, possible in relatively closed systems. In short, structured vocabularies with broadly thesaurus formats will be a necessary component in the web stack.

2. Stella, as a consultant you are specialized in the design and implementation of knowledge structures for information retrieval applications. In the last few months we have seen that SKOS can serve as a significant building block to link “traditional” thesaurus management to knowledge structures from the semantic web. Can you see that this development is market-driven, is there a significant growth of demand for solutions built around SKOS?

This question sounds surprisingly sceptical about the growth of SKOS. I guess the dizzying speed of phenomena like Facebook and Twitter has fuelled expectations of tools springing up overnight like mushrooms, fully formed and ready to eat. But actually it takes time, not just for the tools to be fashioned, but for the potential market to develop an understanding of what they can do and what will happen next when they are used.

Applications for SKOS are springing up all the time, as fast as people can grow the skills and vision to deploy them. At the moment the market, or shall we say the power-base, seems to be with the academic sector and allied not-for-profit organisations. This will spread progressively through the public to the private sector, as enterprises find ways of adapting their business models. The main hurdles to overcome could be intellectual property rights and the need for compilers of databases to keep earning their living.

3. Alan, constructing thesauri for the semantic web also means that one has to make the “open world assumption”. In which sense does this change the way to manage thesauri, keep them growing and assure quality? Can you see new, upcoming methodologies to do that?

Everything changes with the “open world assumption”! Following on from my answer to the previous question, it seems clear that one manifestation of the thesaurus will be found in those systems that support interoperability, such as federated searching or metadata registries. Even with simple thesaurus management software, it is possible to construct a “master vocabulary” or “word bank” to support different applications within an enterprise; thereby promoting interoperability. More sophisticated software is already available (though not very widely); more will be needed and, doubtless, will be created.

A more formal answer to both questions will be found in a new standard – ISO 25964, currently being prepared on the basis of BS 8723. The two fundamental features of these two standards are (1) the thesaurus as a theoretical and practical basis for the construction of structured vocabularies for information retieval and (2) the growing and vital need for interoperability between systems and the intelligent mapping of the vocabularies used by those systems.

4. Stella, just recently at ESWC 2010, Sean Bechhofer was asked during his keynote why there are so few SKOS tools on the market. What do you think are the reasons for this? Are there still shortcomings of the SKOS specification compared to other existing thesaurus standards? (see also: http://www.eswc2010.org/program-menu/keynote-speakers/155-sean-bechhofer & http://www.slideshare.net/seanb/skos-past-present-and-future )

Regarding the speed of development, see my reply above. As to shortcomings, did you note in one of Bechhofer’s slides: “Standardisation is necessarily a compromise: Everyone equally unhappy = success!” The SKOS development team took a conscious decision to keep the schema sufficiently simple that it could be applicable to as many different types of KOS as possible.  On the downside, this means SKOS is unsatisfactory for conveying sophisticated features of some thesauri and classification schemes. But by keeping the entry barrier low, more widespread use has been encouraged.

By way of illustration, compare SKOS with the data model and XML schema of BS 8723. This schema is comparatively specialized, with the aim of enabling exchange of any thesaurus carrying any or all of the features recommended in the standard. And incidentally, this data model and schema will have some further capabilities added when published in the forthcoming standard ISO 25964. SKOS does not provide for a number of features in these standards (such as compound equivalence). But the schemas in BS 8723 and ISO 25964 are designed for thesaurus developers to share their work, rather than for easy publication on the Web, and will never have so many users or associated tools as SKOS.

So I believe that SKOS has done well to accept compromises that encourage generalisation although they might not suit some specialists. That said, I do regret one of its weaknesses in the context of mapping. Compound equivalence mappings (that is to say, where Concept A in one vocabulary maps to a combination of Concepts  B and C in another) are very commonly needed when extending a search across multiple databases, and the SKOS mapping properties do not currently allow for them. Perhaps there will be some provision in future?

5. Stella, Alan, in September ISKO UK will organise an event on “The Future of Knowledge Organisation on the Web”. “Linked Data” seems to be a promising approach to organise knowledge in large scale environments.
Could you imagine that SKOS as a small subset of semantic web specifications will play a central role in this environment since it is quite intuitively comprehensible by virtually any knowledge worker or do you rather think SKOS is too simple (or too complex)? (see also: http://poolparty.punkt.at/using-skos-as-an-interface-to-the-linked-data-cloud )

Stella: Of course SKOS will have a central role (whether or not every knowledge worker finds it as intuitive as you suppose). “Linked Data” will find even wider applicability. ISKO-UK (the organiser of the meeting in London on 14 September) has a mission not just to spread the word about both these technologies, but to build bridges between the several communities who must share their expertise and data to build more exciting applications. We’re expecting an audience of over 100 at this low-cost event.

Alan: Yes, of course, just as all the tools in the web stack will be necessary if semantic web technologies are to be effective. But it is obvious that we are dealing with complexities of a higher order than ever before. Any structured vocabulary is an “artificial language” which, while acknowledging many aspects of theoretical linguistics is forced to be pragmatic in its construction. Consequently, it would not be surprising if SKOS is seen to be “catching up”, and this became apparent in the work of BS 8723 when thesaurus models using UML were being constructed. There remains much work to be done on all fronts.

Stella Dextre Clarke is an independent consultant specializing in the design and implementation of thesauri and other knowledge organization structures. She currently leads ISO NP 25964, the project to update and revise the international standards for thesauri. Previously she was the Convenor of the Working Group which developed BS 8723. In 2006 she won the Tony Kent Strix Award for outstanding achievement in information retrieval, in recognition for her development work on IPSV (Integrated Public Sector Vocabulary), as well as on the vocabulary standards. She is a Fellow of the Chartered Institute of Library and Information Professionals.

Alan Gilchrist has been a consultant for many years in the fields of information management and information architecture, specialising in the vocabulary aspects of information retrieval. He is co-author, with Jean Aitchison and David Bawden of Thesaurus Construction and Use, now in its fourth edition. In 1979 he founded and edited the Journal of Information Science, and is now Editor Emeritus. He has an Honorary Degree (D. Litt.) from the University of Brighton and is an Honorary Fellow of the Chartered Institute of Librarians and Information Professionals.

Sphere: Related Content

Kingsley Idehen: “By declaring its context, Linked Data can be made more easily reusable by others”

June 16, 2010 By: Andreas Blumauer Category: Corporate Semantic Web, Enterprise 2.0, Linked Data & Open Data, Tools & Software No Comments →

Semantic Web Company talked with Kingsley Idehen who is CEO of OpenLink Software and probably one of the most profound experts on data integration issues about “Linked Data”.

The interview covers questions like:

  • How can Linked Data help to make companies more productive?
  • Do you think that the Linked Data Initiative can build upon a stable architecture or will it face more and more problems the bigger the “cloud” will grow?
  • What´s the ultimate argument for an Enterprise Architect to use languages like SPARQL at least in addition to SQL?
  • How will a “Real Time Semantic Web” change the whole game?
  • How will the “Semantic Web” be called in 10 years? Will there still be a “Semantic Web”?

Read the full version of the interview here.

Sphere: Related Content

Interview with Georgi Kobilarov: “I believe that data publishing must happen in a distributed style.”

March 26, 2010 By: Tassilo Pellegrini Category: Linked Data & Open Data, Mashups & Web services, Semantic Web Applications, Tools & Software 1 Comment →

Uberblic.org connects structured data from the web. The Berlin-based inventor Georgi Kobilarov gives a brief insight into the mashup service and talks about the challenges when it comes to build applications upon linked data.

You have recently published the service uberblic.org, a Linked Data mashup editor. What was your motivation to develop this tool?

Uberblic.org provides an integrated view of web data. Our goal is to integrate all the structured data on the web, and give web-developers a single point to access to that reconciled data. More than that, we will open up the tools we use to manage the data sources to the community, so that the people can help us curating that repository of free data. We re-publish all the data we import as Linked Data, under the licenses of the original data publishers.

Some of the data sources we import are available in the Linked Open Data cloud as well, but many are not. Linked Data is an elegant way to publish data in a distributed way on the web, but consuming it from that distributed cloud is – at least – impractical. In every real-world application using linked data from the web I’ve seen, organizations built up internal copies of the cloud, and often even reconcile linked data sources. They build their own Linked Data proxies. Uberblic.org helps those users by providing one public proxy for data from the web. Many of our sources get monitored for data changes, and the according data in uberblic is updated in real-time.

uberblic

Can you give us a brief insight how the tool works? What technology is is built on?

My company, Uberblic Labs, has developed a data integration platform that we use to power uberblic.org. We call it the Uberblic Platform (the name uberblic is derived from the German “Überblick” – English “overview”). This platform enables us to do the full process of “data fusion”: Importing and converting external data sources, mapping the data schemas to a central ontology, filtering out data errors, automatically suggesting duplicates to the user, and merging data from different sources into a single, reconciled representation.

Structured and semi-structured data from the web is an excellent use case for our software platform, since there we come across all the interesting cases of real-world data heterogeneity. But what I think is especially powerful and yet missing in other Linked Data projects I know, is the ability to subscribe to update-feeds. We do that extensively, fetching updates in real-time from Wikipedia and the like.

Our platform is built in Scala and runs a on cluster of machines, with workers communicating through a messaging system. We developed an RDF storage layer on top of a distributed key-values store for storing all provenance information used in the extraction process, currently around 100 million named graphs for uberblic.org. That storage layer does not directly provide SPARQL access, so we push all the output data into a SPARQL endpoint hosted by Talis as well.

What have been the biggest challenges in tackling the integration issues of dispersed data?

It was quite a steep learning curve to do Linked Data not only in an academic environment, but in a reliable, industry-strength set-up. In academia, there was always the excuse that things are just research prototypes. Now that excuse is gone. That’s also where it becomes necessary to manually clean up data. And there are two ways to do that: Either you enable the users to change facts directly in your repository after you have imported the external data (that is what Freebase does), or you facilitate clean-up cycles in the original data source and fetch these updates in real-time. That is what we do.

I believe that data publishing must happen in a distributed style, because then each data source gets taken care of by a specialized group of people using specialized tools. And it’s what you see not only on the web, but also inside organizations and enterprises. But consuming data trough centralized APIs is more than just convenient. We all use Google
or another search engine as a central access point to web pages which are published in a distributed way all over the web, don’t we? Can you imagine today researching a topic on the web without the centralization power of search engines, just by following links across web sites, like in the old days?

When we built the Uberblic Platform, some of the things I imagined to be large headaches, like schema mapping, turned out to work really well. Those pathologic cases you often see in academic “challenges” are – well – pathologic. It’s not necessary to solve them fully automatically through super-intelligent algorithms. Much more important than the sophistication of your algorithms are well designed workflows so that the user becomes a part of the solution. And that’s not about crowd-sourcing or swarm intelligence, the editorial curating of schema mappings and object reconciliation can be done just by a small team of people. If they have the right set of tools.

What are the next plans with uberblic.org? Where will the journey go?

Uberblic.org will continue to integrate more interesting and useful data sources from the web, and we will start making more APIs available to web developers to build their applications on top. We are also looking for partners who are interested in developing applications and have been struggling in the past to get the cross-source data from the web they need.

The work on improving uberblic.org will also benefit our Uberblic Platform, and hence our clients who use that same software for integrating organizational data sources with each other and with the web of data.

About Georgi Kobilarov

Georgi is founder and managing director of Uberblic Labs, a company based in Berlin specialized in Linked Data integration. He worked as a research associate in the Web-based Systems Group at Freie Universität Berlin and as a visiting researcher at Hewlett Packard Labs Bristol. As co-founder and lead developer of DBpedia, he was also a day-one contributor to the Linking Open Data project. Georgi is consulting with the BBC on several Linked Data related projects. He organizes the Web of Data Meetup London, a bi-yearly gathering of the UK Linked Data community. Georgi graduated with a Diplom in business administration from Freie Universität Berlin and has many years of work experience as a software developer. Visit his blog: http://blog.georgikobilarov.com

Sphere: Related Content

TuQS QuadStore combines the best of two worlds

March 22, 2010 By: Andreas Blumauer Category: Software Development, Tools & Software 4 Comments →

A new QuadStore which combines the best of two worlds (Lucene/Fulltext search engines & TripleStores/RDF/SPARQL) is out and can be evaluated online.

TuQs offers the following feature:

  • SAIL accessible
  • True QuadStore with GraphSupport
  • HighSpeed regex SPARQL filters
  • Userrights on TripleBasis
  • Extendable to a QuintStore (or more generally to an n-Store)
  • Cachable SPARQL Queries for further speed improvement
  • Clusterable
  • Federationable
  • FullTextSearchable

Some queries are really complex and high-speed, e.g.:

SELECT ?s ?o
WHERE {
?s <http://www.w3.org/2004/02/skos/core#definition> ?o .
?o <http://www.turnguard.com/tuqs/function#BooleanTerm> ‘Computer AND (java* OR HTML)’
}

The best starting point to find out, what´s the speciality of TuQS is here: Just click the sample queries on the right side and see how fast they perform even on very simple hardware.

Next steps: The developer of TuQS, Jürgen Jakobitsch (aka Turnguard), is currently working on SAIL inferencing.

Sphere: Related Content

George Anadiotis: “Linked Data brings value by offering an alternative approach to lightweight data integration and mashups.”

December 10, 2009 By: Tassilo Pellegrini Category: Linked Data & Open Data, Mashups & Web services, Semantic Web Applications, Software Development, Tools & Software, Vocabularies & Languages No Comments →

george-imcGeorge Anadiotis is an expert on artificial intelligence with academic roots at the Vrije Universiteit, Amsterdam. In February 2009 he took the position as R&D Director at the Greek technology company IMC. I met him in September at I-SEMANTICS 2009 where he and his team contributed to the Triplification Challenge. In their paper Linked Data for the Masses they were pondering about the pragmatic value of Linked Data from an inbound and outbound perspective.  In his words:

We started experimenting with the technical infrastructure needed and created some proof-of-concept applications. Part of this work was enabling Linked Data access for the front-end infrastructure we used, Liferay portal. We decided on the appropriate vocabularies for the type of content we wanted to publish (FOAF, SIOC and MOAT mainly), delved on the internals of Liferay and used D2R to map its relational database to the vocabularies of choice, also using techniques to improve performance as much as possible. Since Liferay itself is also based on the notion of communities, we thought our work would be more widely applicable and useful, so we chose to submit it for review at the Triplification Challenge and make it available to the community as open source software. Our applications have gradually matured and are about to be deployed in our commercial projects, while at the same time we are now making the Liferay Linked Data Module available as a Sourceforge project and we are working with Liferay management in order to disseminate this effort to the community and also include it in a future release of the software.

Read the full interview here.

Reblog this post [with Zemanta]
Sphere: Related Content

Demozone for semantic applications launched

November 10, 2009 By: Thomas Schandl Category: Semantic Web Applications, Tools & Software, Videos & Tutorials 3 Comments →

The Semantic Web Company compiled a suite of some of the best semantic web applications and put them in one place for you to try out: The SWC Demozone.

swc demozone logo

We selected tools pertaining to the different application areas of the Semantic Web – be it for finding, creating, linking and/or publishing information.

The showcased applications and services so far are:

Have a look at the demos and try them out for yourself – we provided explanations and links to screencasts teaching you how to use them.

We will add more demos in the future. If you are the owner of or a contributer to an application that you’d like to see showcased in the demozone, too, please drop us a line and we’ll try to add a demo for your software.

Sphere: Related Content

Topic Maps and the Semantic Web

October 16, 2009 By: Tassilo Pellegrini Category: Conferences & Events, Miscellaneous, Tools & Software 1 Comment →

tmraFrom November 11 – 13, 2009 this will be one of the big issues at the 5th International Conference on Topic Maps taking place in Leipzig/Germany. When asked about the relationship between TM and SemWeb conference organizer Lutz Maicher says:

With the vision of the web of data Topic Maps and the Semantic Web move closer over time. Anywhere URIs represent subjects, structured statements are gathered around them. In this context I see subj3ct.com as an interesting ventures. This recently launched service provides URIs for 15 million subjects to be used in structured data. Naturally, linked data hubs like dbpedia or geonames.org are part of it. The crowd is invited to contribute to this collection, also the Topic Maps Lab provides several feeds to register new URIs. Subj3ct.com turns out to be an infrastructure technology for Web 3.0 applications, regardless whether they are based on Topic Maps or other Semantic Web technologies.

Through this convergence the uniqueness of each technology sharpens. Reasoning is the strong point of the Semantic Web. But the strength of Topic Maps are semantic portals and the global federation of facts around subjects. Bringing together all and even contradictory information about each subject – and not building reasoning-ready consistent models of the world – is built into the genes of Topic Maps.

Read the full interview here.

Reblog this post [with Zemanta]
Sphere: Related Content

Attending TopQuadrant’s SemWeb Technology Training

October 14, 2009 By: Thomas Schandl Category: Companies & Institutions, Tools & Software No Comments →

There’s a lot to know about semantic standards, languages, technologies and their application, so last week I attended TopQuadrant’s first European training from Oct 5th to 9th in Amsterdam.

We kicked off with Eddy Vanderlinden elaborating on the lessons he learned from 30 years of work in the financial sector. He outlined how improvements could be achieved by using data models relying on semantic web standards. You can read about his ideas in this essay.

TQ’s chief scientist Dean Allemang then continued with his talk “Enabling Creativity at the Edge”. “The edge” refers to the boundary between an information system and the real world, where the end users of a system work. As business needs change faster and faster, the people working at the edge need to be able to adapt the company’s applications on their own and shape them to their everyday needs.

Dean Allemang

Dean Allemang

Nowadays end user often achieve this kind of creativity on the edge by using self-made spreadsheets. The problem with that is their lack of interoperability. These data from different spreadsheets, databases, reports, etc. are often connected through business processes that rely on repetitive and error prone human processing, like copying things from a spreadsheet to a database, creating a report and pasting its result into another system, and so on.

The result is a complex system with many heterogenous parts and an organisation that cannot possibly know what it knows.

As a solution Dean proposed to “think outside the table” and go beyond the relational database way of orgranising data. This of course can be achieved by integrating the data using semantic technologies. TopQuadrant’s software offers possibilities to do just that, and makes it possible to create highly customizable dashboards and applications that all rely on the same data.

During the following days we learned about the ins and out of using semantic standards and languages and tried out TopBraid tools in several hands-on excercises. The TopBraid Suite is a very powerful, commercial toolkit. It includes TopBraid Composer, Live and Ensemble. Composer is a semantic web modeling and application developement tool, that uses the Eclipse framework. TopBraid Live is a server for semantic applications built with TopBraid Ensemble. Ensemble is a graphical application assembly toolkit, that enables end users to create custom apps that run in a browser and use RDF data and data models – thereby allowing for the above mentioned “creativity at the edge”.

I am very impressed with the capabilities of these tools, they enable the user to realize manifold possibilities that come with using semantic web standards – and that without programming. You can see some of these tools in action and learn about applying semantic standards in a series of webcasts from Semantic Universe. For the latter topic you might also attend one of our webinars.

On the last day Dean coverd several case studies, like connecting ontologies to legacy data sources (using e.g. D2RQ inside Composer), applying semantic technologies to the customer service management of a larger retailer or using ontologies in Federal Enterprise Architecture.

All in all I am very happy to have attended TopQuadrant’s training and hope they will establish a successful series of trainings in Europe just as they did in the US.

Sphere: Related Content

Metaweb´s Jamie Taylor: “Freebase provides a large and user extensible vocabulary for RDF/RDFa”

May 18, 2009 By: Andreas Blumauer Category: Linked Data & Open Data, Semantic Web Applications, Tools & Software No Comments →

Jamie Taylor, Metaweb

Jamie Taylor, Metaweb

Andreas Blumauer from Semantic Web Company (SWC) talked with Jamie Taylor, Minister of Information at Metaweb Technologies Inc. about Freebase & Linked Data and Google´s announcement to use RDFa.

SWC: At ISWC 2008 Freebase became “officially” part of the LOD Cloud. What exactly has changed since that time?

Jamie: Since Freebase is a community writable semantic database, the addition of the RDF interface allows anyone to publish data into the LOD cloud. LOD Applications can access any Freebase Topic through the RDF interface by constructing a URI from the Freebase identifier.  But perhaps more importantly, because entities in Freebase can be annotated with multiple identifiers, Freebase Topics can be retrieved by constructed URIs using the identifiers used by other systems and data sets.
For instance, the movie Blade Runner can be referred to as http://rdf.freebase.com/ns/en.blade_runner, but it can also be referenced as http://rdf.freebase.com/ns/authority.netflix.movie.70053131 using the Netflix identifier, http://rdf.freebase.com/ns/authority.imdb.title.tt0083658 using the IMDB identifier, or as http://rdf.freebase.com/ns/wikipedia.en.Dangerous_Days using a Wikipedia wikiword (which in this case is a Wikipedia redirect to the wikiword Blade_Runner).
Freebase also provides a user maintained mapping of how these identifiers can be used to address resources in other LOD systems. The sameas.freebase.com schema can tell an LOD user that the Freebase Blade Runner Topic can also be found in DBpedia using Wikipedia identifiers or how musical artists can be found at the BBC using Musicbrainz identifiers.  In fact, the Freebase RDF interface uses the sameas.freebase.com schema to create the owl:sameAs links in the RDF output allowing the user community to expand the interconnections between Freebase and the LOD Cloud.
Linked Data providers are also using the strong identifiers in Freebase to identify entities such as companies and locations in their own data sets.  When they find an entity that is not represented in Freebase, they simply add the entity to Freebase and use the newly minted Freebase identifier.  This permits anyone using their data to understand how their entities relates to any of the more than 5 million things interconnected within Freebase.

The RDF interface can also be used to reference the Freebase type system, giving LOD data set providers vocabularies across a wide range of subject areas.  And because anyone can expand Freebase’s data model, data providers can use our schema development tools to build and extend these vocabularies to suite their needs.
Freebase was not designed for ephemeral or fast changing data, like weather conditions or stock ticks.  But this type of information is well suited for publication as Linked Data.  Freebase entities representing a location or company can be annotated with references to LOD services that provide these types of volatile data.  Similarly, Linked Data provides a great way to disseminate very fined grained information that might be associated with a scientific study or financial report.  Linked Data provides a seemless transition from Freebase, where a user (or application) can run a query with constraints that run across a wide range of types to find entities of interest along with the LOD services that provide access to temporal or high resolution data not available in Freebase.
We recently demonstrated MQL Extensions which allows the Metaweb Query Language to use data from other systems as a part of the query constraint and result set.  While MQL Extensions are user extensible and work with a wide array of systems,  this capability makes the connection between Freebase and the LOD Cloud even more transparent.
For example, because US companies that are registered with the SEC are annotated CIK code in Freebase and the sameas.freebase.com schema indicates that the CIK annotation can be used to create a URI that is dereferencable at rdfabout.com, it is possible to write a MQL query that asks who is on the board of financial services companies that trade on NASDAQ and are  headquartered in California (and using another MQL Extension, you can ask for their stock price as well!)

SWC: Many organisations are very interested in Linking Open Data now but they are still not sure if they can benefit from publishing data on the web – what´s your experience so far?

Jamie: Linked Open Data provides a simple, standard way for organizations to distribute structured data.  For most organizations, providing access to data is another important outlet to announce the availability of higher value services.  For organizations involved in building or selling physical goods, the bits representing what they provide are not the goods themselves, but a way of attracting potential customers.  Making catalogs and specification sheets available in electronic form, so other applications can connect buyers to their physical goods is simply an effective marketing system.  Even for firms involved in electronic services, providing access to open structured data is generally a lead-in to value added services.  For instance, if I ran a service collecting hard-to-find information about manufacturing relationships between medium sized businesses, I would publish open company profiles covering things like market size, industry, location for the medium-sized businesses I tracked, so potential users the premium data would know I had the coverage they were looking for.

SWC: Just recently Google has announced to use RDFa to enhance their search results. What do you think?

Jamie: We are excited about Google’s announcement. Yahoo’s use of RDFa for Search Monkey and Google’s announcement gives RDFa users tangible benefits. The Search Monkey team was very quick to realize that because users can create data models in Freebase, and because the elements of those models all have strong RDF identifiers, Freebase provides a large and user extensible vocabulary for RDF/RDFa (see the list of vocabularies). When a user wants to create a Search Monkey application that works with their film review site, they need not invent a new vocabulary (that will probably be used only once),  they can use the Freebase Film Domain vocabulary which supports over 63,000 instances in Freebase alone.
Similarly, with over 5 Million well described Topics in Freebase and over 14,000,000 Named Objects (Topics, images, musical tracks and documents) when a user wants to unambiguously identify a subject or object in RDF/RDFa, Freebase has an extremely large collection of identifiers to draw from.  These cover people, places, companies, movies, music, books and wide variety of other subjects.  If Freebase doesn’t have the entity the user is looking for, they can of course add it themselves and make use of the identifier immediately. I think this is why Google used some Freebase identifiers in their examples. We hope that with Yahoo and Google’s support for RDFa the web will become a strongly annotated source of data which can support a wide range of user applications.

SWC: Thank you, Jamie!

Reblog this post [with Zemanta]
Sphere: Related Content