Promises of Data Marketplaces and How We Can Evaluate Them

One of the questions I was interested in while listening to the excellent series of podcasts by Paul Miller on data marketplaces was: why would people pay to access data? Put differently: what value do data marketplaces offer?

Here is a compiled list of benefits that data marketplaces promise:

  • Discoverability: through a central place where datasets are described and can be found.
  • Easy access to the data: for example, by providing API access to the data.
  • Easy publishing: off-the-shelf infrastructure.
  • Commercialisation: easy buying and selling of data.
  • Better data quality: providing curated and maintained datasets.
  • Value-added data: having all the datasets in one place enables users (or the marketplace provider) to draw new insights, remix datasets and derive new ones.

A natural follow-up question is: how can we evaluate the extent to which data marketplaces are fulfilling their promises? With the growing belief that data should be made available for free, it is important for data marketplaces to make clear the additional value they offer. Ironically, perhaps, this can prove very helpful to the open data movement, as the quality complaints that usually accompany open data can be addressed by marketplaces at a non-prohibitive cost to the consumer. I believe that an empirical study of the existing data marketplaces can reveal interesting insights and lessons. I don’t have a clear idea of how to evaluate the impact that data marketplaces have achieved with respect to their potential benefits, but here are a few sketchy ideas:

  • Discoverability: do data markets enhance metadata description of datasets? provide an API to search for datasets? standardise metadata description? etc…
  • Easy access to the data: this boils down to evaluating the access method provided (mostly an API) along with service quality metrics such as availability, performance, etc. An interesting idea I came across in this paper (PDF) is that the prevailing charge-per-transaction model hinders ease of access, as clients might have to cache results to avoid paying for the same request twice. Data licensing is also related to ease of access, and data marketplaces have the potential to foster convergence on a small, but sufficient, set of data licenses.
  • Easy publishing: evaluating the set of services the data market provides for publishers
  • Commercialisation: what percentage of the datasets on a marketplace are not free? are there datasets available for sale on a marketplace but not anywhere else (i.e. their commercialisation is enabled solely by the existence of the marketplace)?
  • Better data quality: did marketplaces enhance the quality of (open) data available elsewhere?
  • Value-added data: can users meaningfully remix existing datasets? is there a market-wide query engine? are there new datasets provided by the data marketplace through drawing insights from or remixing a number of existing datasets?
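
To make the commercialisation questions above a little more concrete, here is a minimal Python sketch of how the two figures could be computed from a marketplace catalogue. Everything in it is an assumption made for illustration: the `Dataset` record, its `price` and `available_elsewhere` fields, and the sample catalogue are hypothetical and do not come from any real marketplace or API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Dataset:
    """Hypothetical catalogue entry; the fields are illustrative assumptions."""
    name: str
    price: float               # 0.0 means the dataset is free
    available_elsewhere: bool  # True if it can also be obtained outside the marketplace


def paid_share(catalogue):
    """Percentage of datasets in the catalogue that are not free."""
    if not catalogue:
        return 0.0
    paid = sum(1 for d in catalogue if d.price > 0)
    return 100.0 * paid / len(catalogue)


def exclusive_paid(catalogue):
    """Names of paid datasets that are only available through the marketplace."""
    return [d.name for d in catalogue if d.price > 0 and not d.available_elsewhere]


# Made-up sample data, purely for illustration.
catalogue = [
    Dataset("weather-history", 0.0, True),
    Dataset("retail-footfall", 49.0, False),
    Dataset("company-registry", 10.0, True),
    Dataset("census-extract", 0.0, True),
]

print(f"{paid_share(catalogue):.1f}% of datasets are not free")
print("Sold only here:", exclusive_paid(catalogue))
```

The hard part in practice would be filling in the `available_elsewhere` flag, which probably requires manual checking against other catalogues rather than anything a marketplace exposes directly.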

One of the biggest challenges here is that the term “data marketplace” is still used in a very loose manner, which risks ending up comparing apples with oranges. However, a carefully designed comparison can prove vital in advancing the current state of the art. I’d be very glad to hear your ideas and feedback on this.