Introduction
Most data in today’s world is controlled by large private and public organisations that in practice regard the data they collect as their own private property. They share this data only within narrowly defined parameters, so the value of reusing it is available only to themselves.
Our sponsors and a community of experts and data stakeholders commissioned this project to develop an alternative model for scaling up safe reuse of data.
Over a six-month period we explored a wide range of ideas, issues, and use cases around the Data Commons concept and summarised the results of that conversation in this blueprint. This blueprint details the proposed “Data Commons” model. It is available now as a snapshot of where we have landed at this point in time, and for potential data reuse interests to take forward into prototyping and testing.
Our sponsors agreed at the outset that this paper should be available under a Creative Commons licence to further the general open data movement, to support others with similar ideas or facing similar challenges, and to aid transparency and public scrutiny of the proposal as it is developed. We hope this conversation, and the network of interest around this project, continue to develop and build on the approaches suggested here.
A Data Commons, simply put, is a way that communities can agree on how to share their data, add to the value of their data over time, and manage the risks of its integration and reuse. Through the establishment of a Data Commons, a wider group of potential data reusers can realise more of the value for themselves and their communities safely and in a way that is high-trust and mitigates the risk of misuse.
The document describes how a data integration and reuse solution founded on commons principles can enable individuals and organisations to work together to more effectively share, reuse, and integrate data in a high-value and safe way.
Any commons is formed by a Community of Interest for mutual benefit. Overall, our conclusion is that the work of enabling high-value, high-trust data sharing is largely community-forming work (rather than technology work). The challenge of data reuse is the challenge of managing interests, and that is a relationship challenge, not a technology challenge.
Too often we see people leaping to technical point solutions without first laying a solid foundation of social protocols for how and when data will be used. There is some great new technology (such as blockchain) that may unlock exciting new potential. But all of that is pointless unless communities of practice, often with diverse and divergent interests, can work together to establish collective rules for a shared common-pool resource. Unless all parties feel good about sharing their data, they will be unlikely to do so. Attempts at coercion lead to poor data or no data. A model where data is fenced off as private property reinforces silos of competing interests rather than data integration or sharing.
Enabling high-value, high-trust data sharing is largely community-forming work
This blueprint is our first attempt at a fundamental rethink of how data reuse might be enabled. This is a rapidly emerging field of practice and we are still very much at the stage of feeling our way forwards, but there is a lot of great work being done that we can build on. Besides the community that was formed to explore this blueprint, we refer both to other communities internationally who are embarking on a similar journey for similar reasons, and to a wealth of examples of proto-commons-based ways of enabling data sharing. Here we present our first version of the blueprint – with, we hope, enough substance to get people interested in taking this from theory into practice in the coming year.
Section One examines the risks and benefits of data reuse and concludes that the central challenge is building trust into the system. In doing so, we explore existing practice around data integration and examine why it is hard to scale or fails to lead to comprehensive data integration.
Section Two introduces the six design principles and objectives of an alternative model for enabling data integration and reuse. Here we introduce the core ideas behind the Data Commons approach.
Section Three introduces the first part of the work of building a Data Commons: community-forming and co-design of the community protocols that underpin the social contract governing the commons. This section introduces a framework for thinking about data interests – who needs to be at the table? – then introduces the notion of a “stack of protocols” that needs to be co-designed by those interests to allow the commons to function effectively.
Section Four outlines the second part of building the commons: kick-starting use of the commons to drive value.
The appendix contains further notes about two Data Commons case studies: one for person data and one for biosphere data. These are not fully developed case studies but reflect some of the Data Commons-aligned thinking applied to particular classes of data.
We hope that by proposing this alternative model for enabling data reuse we can reset the debate. It is a false dilemma that we must either be coercive to harvest value from data, or give up on the value proposition because the risk is too great. That only appears to be a dilemma when you don’t directly address trust. There will be higher value for a wider Community of Interest where there is higher trust.