Roughly two and a half years ago, the Linking Open Data community established itself. The basic idea of this initiative is to publish data on the web in a machine-understandable format (RDF and RDFa) and to allow these data sources to be linked. Instead of the web of pages we have today, i.e., a web in which human-readable pages are linked, we would get a web of linked data objects. Such a web would make it easier to find information and to browse the web using new kinds of user interfaces. Further, it would allow publishers to better monetize their content.
The difference between the current web and a web of data is best explained with an example. Today, a typical online store consists of pages describing different products. Such a page contains all the information a human reader requires, but the information is represented in a way that makes it hard to process automatically. In a web of data, the online store would instead publish data resources that represent the products. These resources can be retrieved in different representations: a browser would show you a product page, while a web crawler would get a machine-understandable representation of the same product.
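To make the idea of one resource with several representations more concrete, here is a minimal sketch in Python. It assumes the client states what it wants through an HTTP Accept header, which is a common way to do this but not the only one. The product record, the example.org vocabulary, the URI, and the function names are all made up for illustration, and the header parsing is deliberately simplified (q-values are ignored).

```python
# Hypothetical product record; names and values are illustrative only.
PRODUCT = {"name": "Example Phone", "price": "199.00", "manufacturer": "Example Corp"}
PRODUCT_URI = "http://shop.example/product/1"  # made-up URI


def render_html(product):
    """Human-readable representation, as a browser would request it."""
    name = product["name"]
    price = product["price"]
    maker = product["manufacturer"]
    return f"<h1>{name}</h1><p>Price: {price}</p><p>By {maker}</p>"


def render_turtle(product, uri):
    """Machine-readable representation using a made-up example.org vocabulary."""
    name = product["name"]
    price = product["price"]
    maker = product["manufacturer"]
    return (
        f"<{uri}> a <http://example.org/vocab#Product> ;\n"
        f'    <http://example.org/vocab#name> "{name}" ;\n'
        f'    <http://example.org/vocab#price> "{price}" ;\n'
        f'    <http://example.org/vocab#manufacturer> "{maker}" .\n'
    )


def negotiate(accept_header, product, uri):
    """Pick a representation from a simplified Accept header."""
    media_types = [part.split(";")[0].strip() for part in accept_header.split(",")]
    for media_type in media_types:
        if media_type == "text/turtle":
            return "text/turtle", render_turtle(product, uri)
        if media_type in ("text/html", "*/*"):
            return "text/html", render_html(product)
    # Fall back to the human-readable page when nothing matches.
    return "text/html", render_html(product)
```

A browser-like client sending `text/html` gets the product page, while a crawler sending `text/turtle` gets the structured description of the same resource.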
In this latter representation, all the important information would be available in a structured way. The crawler could “understand” that the resource it just found is a product with a specific price, made by a specific manufacturer, and so on. The product could in turn be linked to a resource identifying its manufacturer. If the crawler found a second product by the same manufacturer, it would be able to recognize that and organize the information accordingly.
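What the crawler does with such structured data can be sketched with plain (subject, predicate, object) triples, the basic shape of RDF statements. This is only an illustration under stated assumptions: the URIs and the abbreviated `ex:` property names are hypothetical, and a real crawler would use an RDF library and full vocabulary URIs.

```python
from collections import defaultdict

# Hypothetical triples a crawler might have collected from two product resources.
TRIPLES = [
    ("http://shop.example/product/1", "rdf:type", "ex:Product"),
    ("http://shop.example/product/1", "ex:price", "199.00"),
    ("http://shop.example/product/1", "ex:manufacturer", "http://maker.example/id/acme"),
    ("http://shop.example/product/2", "rdf:type", "ex:Product"),
    ("http://shop.example/product/2", "ex:price", "49.00"),
    ("http://shop.example/product/2", "ex:manufacturer", "http://maker.example/id/acme"),
]


def products_by_manufacturer(triples):
    """Group product URIs by the manufacturer resource they link to."""
    # First find every subject that is declared to be a product.
    products = {s for s, p, o in triples if p == "rdf:type" and o == "ex:Product"}
    grouped = defaultdict(list)
    for s, p, o in triples:
        if p == "ex:manufacturer" and s in products:
            grouped[o].append(s)
    return dict(grouped)
```

Because both products point to the same manufacturer URI, the crawler can tell they belong together without parsing any page text.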
The advantage of this web of data is simply the ability to write software that can better index, process, and aggregate the information on the web. The crawler above would know when different products are manufactured by the same company and could provide better search and browsing functionality than current search engines. Theoretically, such a crawler could even understand that two products in different online stores are actually the same. The Motorola Droid, for example, might be offered as a resource in the web of data by both Amazon and Best Buy. Additionally, Motorola might have its own resource representing this specific mobile phone. In the web of data we could add links between these different resources that state their equivalence, allowing search engines to provide improved comparisons of different offers.
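Equivalence links of this kind are commonly expressed with the `owl:sameAs` property. The following sketch shows how a search engine could, in principle, merge resources connected by such links and line up the offers for one product. It is a simplified illustration, not a description of how any real search engine works: the URIs and prices are invented, and the grouping uses a small union-find structure chosen here for brevity.

```python
# Hypothetical equivalence links between store resources and a manufacturer resource.
SAME_AS = [
    ("http://shop-a.example/item/droid", "http://maker.example/phones/droid"),
    ("http://shop-b.example/sku/4711", "http://maker.example/phones/droid"),
]

# Invented prices, keyed by the store resource that makes the offer.
OFFERS = {
    "http://shop-a.example/item/droid": 199.99,
    "http://shop-b.example/sku/4711": 189.99,
}


def equivalence_classes(same_as_links):
    """Group URIs that are connected, directly or indirectly, by equivalence links."""
    parent = {}

    def find(uri):
        parent.setdefault(uri, uri)
        while parent[uri] != uri:
            parent[uri] = parent[parent[uri]]  # shorten the path as we walk up
            uri = parent[uri]
        return uri

    for a, b in same_as_links:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    groups = {}
    for uri in list(parent):
        groups.setdefault(find(uri), set()).add(uri)
    return list(groups.values())


def compare_offers(same_as_links, offers):
    """For each group of equivalent resources, list its offers, cheapest first."""
    comparisons = []
    for group in equivalence_classes(same_as_links):
        priced = sorted((offers[uri], uri) for uri in group if uri in offers)
        if len(priced) > 1:
            comparisons.append(priced)
    return comparisons
```

Note that the two store resources are never linked to each other directly; they end up in the same group only because both link to the manufacturer's resource.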
Although this idea might sound pretty academic and like a light version of the Semantic Web (it actually emerged from the Semantic Web community), something interesting happened: the number of organizations publishing their data as linked data has exploded. Even businesses have started to adopt the idea. Examples are the BBC, Best Buy, and very recently the New York Times. Further, large public bodies such as the Library of Congress and the Swedish Royal Library publish parts of their catalogs and thesauri as linked data. Even governments have started to release statistical data in an appropriate, machine-readable format. And finally, Google and Yahoo have also started to crawl Linked Data embedded in web pages (using a standard called RDFa) and to use this data to better display and rank search results.
But why is this interesting for the industry? Why should a company like the New York Times publish its metadata, data that was certainly expensive to produce? Why do the BBC or Best Buy invest money to publish linked data? The answer is simple: visibility and attraction.

DBpedia is a community effort to extract structured information from Wikipedia and to make this information available on the Web.
What is interesting here is that linking the pieces together on the web is becoming simpler. When the New York Times uses Wikipedia resources to annotate its articles, the links back from Wikipedia come for free. Wikipedia would just need to know that the New York Times uses Wikipedia resources for annotation and add the New York Times as an additional data source. This could be done automatically, or even by the user while browsing, as shown by recent search engine prototypes such as sig.ma.
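Why the back links come "for free" can be seen from a small sketch: once the annotations are published as (article, topic) links, the reverse direction is just an inverted index over the same data. The URIs and the function name below are hypothetical and only serve as an illustration.

```python
from collections import defaultdict

# Hypothetical annotation links: each article points to the topic resource it is about.
ANNOTATIONS = [
    ("http://news.example/articles/1", "http://encyclopedia.example/resource/Chicago"),
    ("http://news.example/articles/2", "http://encyclopedia.example/resource/Chicago"),
    ("http://news.example/articles/2", "http://encyclopedia.example/resource/Motorola"),
]


def backlinks(annotations):
    """Invert article -> topic links into topic -> list of articles."""
    index = defaultdict(list)
    for article, topic in annotations:
        index[topic].append(article)
    return dict(index)
```

The publisher of the topic resources needs no extra data from the news site beyond the annotation links it already publishes.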
The idea of the web of data is truly a chance for the media industry to become part of the web instead of being a set of islands in it. But there are still a lot of open questions: How can the links be generated efficiently? Would Wikipedia want a share of the revenue generated by these links? Does an online store really want to help search engines compare its prices with those of its competitors? Although these questions have not been answered yet, the industry's evident interest suggests that the idea is attractive to it.
An interesting statistic that shows the potential of Linked Data was presented by Best Buy at the Search Engine Strategies conference 2009 in Chicago. They mentioned that, within 8 weeks of exposing their products as Linked Data, the respective product pages ranked higher in Google than the traditional Best Buy pages. These Linked Data pages have different URLs and are much younger and less established, but they rank higher nonetheless. Furthermore, they reported a 30% increase in traffic and a 15% higher click-through rate.
So, evidently Google rewards the use of machine-understandable metadata in web pages by ranking them higher. To view the results yourself, just follow this link to the Google search page: Search for Ferris Bueller on Best Buy. The first hit you see is the RDFa and Linked Data version, while the second one is the much more established traditional web page.
To come back to the start of this blog post, it is apparent that an idea developed by an academic community has gained enough attention in the industry to be adopted. It seems that the industry is interested in being visible and in providing innovative services to its customers. As I have learned in many conversations and at several conferences, there is a strong need to stay visible on the web. The smaller or more focused a company is, the more vital visibility becomes. Specifically in the domain in which I am active, the media industry, traditional providers are struggling. New competitors on the web are far more visible, and if you don’t already know a specific provider, you might have a hard time finding it nowadays.
Linked Data, more intelligent search engines, innovative services on the web, and links to other sources of information are the way to avoid ending up as an island in the web and to really become a part of it. The Linked Data community is pushing the idea of the web of data forward, collecting best practices and advice. If you are not already linked, you should think about it. A good place to start is the Linked Data website.
Edited by Freddy Snijder
Tags: linked data, media industry, RDF, RDFa, search engines, semantic web, web of data



