Last Wednesday at the Open House event of the Semantic Web Company in Vienna, Evan Sandhaus, Lead Semantic Architect at the NY Times, gave a comprehensive and entertaining introduction to rNews and its potential benefits for publishers.
[Photo: Evan Sandhaus (from left to right) busy preparing his talk in the kitchen of SWC, together with Andreas Blumauer (SWC) and Leo Sauermann (Gnowsis).]
[Photo: Mr. Sandhaus in action.]
rNews is an RDFa vocabulary. It is basically a carefully selected subset of the very rich IPTC vocabulary, plus some additional elements that came up during the standardization process. It is now available in version 1.0 and, according to Evan, actively supported by schema.org.
As shown above, the data model of rNews is really simple and centered on two classes: the NewsItem and the Concept. This deliberate simplicity is a major advance over standards like NewsML, whose complexity probably kept it from reaching critical mass in the news industry. But because of the functional extensions that come with RDFa, rNews might also be considered more complex than hNews, the microformat equivalent issued by the IPTC in 2009.
Evan mentioned three scenarios that might drive the uptake of rNews for the benefit of news publishers:
1) Better news search
rNews allows you to make explicit and differentiate the various elements of a document, such as title, author, text body and pictures, thus giving the publisher better control over what to expose to indexers and web crawlers. This might not only improve the display of rich snippets in the search results of Google and other search engines, but also allow faceted search and metadata-based similarity search to be populated automatically.
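To make the idea of differentiating document elements more concrete, here is a minimal sketch in Python that wraps a headline, an author and a body in RDFa attributes. It was not part of Evan's talk. The namespace URL and the property names (headline, creator, articleBody) are assumptions recalled from the rNews 1.0 vocabulary and should be checked against the IPTC specification; render_article is a hypothetical helper name.

```python
from html import escape

# Assumption: namespace of the rNews 1.0 vocabulary; verify against the IPTC spec.
RNEWS_VOCAB = "http://iptc.org/std/rNews/2011-10-07#"


def render_article(headline, author, body):
    """Return an HTML fragment whose elements carry RDFa attributes.

    The property names used here are assumptions, not taken from the talk.
    """
    return (
        f'<div vocab="{RNEWS_VOCAB}" typeof="NewsItem">\n'
        f'  <h1 property="headline">{escape(headline)}</h1>\n'
        f'  <span property="creator">{escape(author)}</span>\n'
        f'  <div property="articleBody">{escape(body)}</div>\n'
        f'</div>'
    )


print(render_article("rNews reaches version 1.0", "Jane Doe", "Body text ..."))
```

A crawler that understands RDFa can then pick out the headline or the author directly instead of guessing them from the page layout.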
2) Better ad placement
As rNews can be applied to any kind of news-relevant media irrespective of its format (graphics, audio, video, etc.), the metadata can be used to avoid “unfortunate juxtapositions” between editorial content and ads. Media agencies could profit from this additional data by feeding it into their matching algorithms and gaining better insight into the specific context of each content item.
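How such a matching step might use the metadata can be sketched in a few lines of Python. This is purely illustrative: the function safe_ads, the ad records and the "avoid" lists are hypothetical, and a real ad server would be far more elaborate. The only assumption is that the concepts attached to a news item are available as plain strings.

```python
def safe_ads(article_concepts, ads):
    """Keep the ads whose blocked topics do not overlap the article's concepts."""
    topics = set(article_concepts)
    return [ad["name"] for ad in ads if not topics & set(ad["avoid"])]


# Hypothetical ad inventory; each ad lists the topics it must not appear next to.
ads = [
    {"name": "airline-sale", "avoid": {"plane crash", "strike"}},
    {"name": "coffee-brand", "avoid": {"food recall"}},
]

# Concepts as they might be extracted from the markup of one news item.
print(safe_ads({"plane crash", "aviation"}, ads))  # ['coffee-brand']
```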
3) Better analytics
By improving the semantic granularity of a news item, this additional information can be used to carry web analytics beyond the page level and provide better insight into usage patterns. The additional data can also be applied for visualization and exploration purposes, e.g. for search engine optimization, sentiment detection and many more.
This is just a small fraction of the things rNews could be used for. All in all, it is exciting to see that the IPTC has finally started to provide publishers with a standard that is relatively easy to implement and helps them overcome the obstacles of existing technologies without disrupting existing publishing workflows. In multi-sided markets like the news industry this might be a crucial success factor!



Hi Andreas,
I can’t agree with the assertion that NewsML-G2 is much more complex than rNews: it has a NewsItem and a Concept Item (sounds very similar, doesn’t it?), plus a Package Item and a Planning Item. One of the key actions in the IPTC development process was to synchronize both models as much as possible. But rNews and NewsML-G2 have different use cases: the first is made for marking up news on a web page, while the latter is made for news exchange in a B2B context, and NewsML-G2 has some additional properties for the features this requires. Just to add: a couple of news agencies have NewsML-G2 feeds. I agree that this raises much less public awareness than a standard for the markup of a web page.
Hi Michael, thanks for clarifying this. There are already many B2B solutions based on semantic web standards “behind the firewall”. Not many people take notice of this, and they still ask: “Where is the semantic web?” It's quite often not visible.
Hi Andreas, just to add: when NewsML-G2 was developed, one of the design requirements was to make the XML serialization fully RDF compatible: each metadata property represents the predicate of a triple, the subject of this triple must be clearly defined, and the object of the triple can be a web resource or a literal. As the IPTC discussed in Vienna last Friday, a next step would be to have a comprehensive news ontology for the B2B environment. Btw: the IPTC is supporting the W3C Semantic News Community Group at http://www.w3.org/community/semnews/ and anybody interested in the semantic enhancement of news is invited.