Data marketplace… is it too big a thing to be tackled as a whole?

This is my second post on data marketplaces… unfortunately triggered by the bad news that Talis is winding Kasabi down. There are a number of good posts discussing this and what it means for the Semantic Web and Linked Data efforts. I'd like to share my ideas here, focusing on the data marketplace side of the story.

In his blog post, Tim Hodson wrote:

So we were too early. We had a vision for easy data flow into and out of organisations, where everyone can find what they need in the form that they need it through the use of linked data and APIs, and where those data streams could be monetized and data layers could add value to your datasets

The previous quote aptly captures the essential aspects of data marketplaces. In its richest form, a data marketplace enables buying/selling access to quality data provided by different publishers (essential aspects are in bold).

Tim went on to say:

Other organisations besides Talis, sharing similar visions, have all had to change the way they present themselves as they realise that the market is simply not ready for something so new.

So I looked at a number of existing data marketplaces to see how they present themselves. It is hard to pin down exactly what a data marketplace is, so I am including these mainly based on Paul Miller’s podcasts:

  • sells lists crawled from the Web as downloadable files.
  • Datafiniti: sells data crawled from the Web through an SQL-like interface.
  • Microsoft Azure Data Marketplace: sells data from a number of publishers via API access based on OData.
  • Infochimps: sells data from a number of publishers via a mix of downloads and API access.
  • sells only numeric data provided by a number of publishers. It focuses mainly on visualization but also provides API access.
  • Factual: collects data (mainly related to locations) and sells API access to the data.
  • Kasabi: sells API access to data from different publishers.

From the list above,, Azure, Infochimps and Kasabi fit the more specific definition of a data marketplace, i.e. they provide API access to data provided by different publishers. These two functionalities have implications:

  1. Supporting different publishers calls for a managed hosted service (a place for any publisher to put its data).
  2. API access calls for cleansing and modeling any included data.

Selling simple access to collected data (e.g. downloadable crawled lists) doesn’t involve either of the two challenges above (or involves a simpler version of them). Providing data hosting services (i.e. database-as-a-service) doesn’t necessarily involve data cleansing and modeling, as these only affect the owner of the data, who is mostly its only user. Both domains, collect-and-sell-data and database-as-a-service, seem to be doing fine and enjoying a good market. On the other hand, if we look at data marketplaces, it is clear that they don’t present themselves as pure data marketplaces (not anymore, at least):

==> sells the platform as well, specialises in numbers and focuses on visualization.

Infochimps ==> calls itself “Big Data Platform for the Cloud”

Azure Data Marketplace ==> is still a pure marketplace but as part of the Microsoft Azure Cloud Platform.

All of this makes me wonder: is a data marketplace too big a thing to be tackled now? Is the market not ready? Are the technology and tools not ready? Are marketplaces not selling themselves well? Should we give up on the idea of having a marketplace for data?

I am just having a hard time trying to understand…

P.S. All the best to the great Kasabi team… I learned a lot from you!

