The emerging challenge to integrate and reuse data
Data integration and reuse is widely recognised as a way to drive social, economic, and scientific outcomes that improve people’s lives. However, it is not without risk.
We believe that current models that focus on data ownership, minimal contributor consent or control, and short-term commercial benefit do not work as a means of encouraging data integration or reuse. The fundamental challenge is enabling trust through forming relationships that enable (or disable) data reuse. There is more value to be gained in a high-trust model than a low-trust one.
Technology and digital media are transforming the world we live in, and offering us a potentially far more responsive, effective, transparent, and accountable approach to business, civil society, and government.
This transformation is fuelled by unprecedented amounts of data and information. While ‘big data’ promises a more prosperous, just, and equitable society, as with any innovation there are both risks and benefits. Big data can just as easily be used to steal intellectual property, erode commercial interests, and misappropriate hard-won academic research data. State sector use of big data can intimidate citizens and, intentionally or unintentionally, target marginalised communities.
Practices for the safe management of personal, commercial, creative, and scientific knowledge have built up over the last hundred years. These practices around privacy, commercial sensitivity and secrecy, and intellectual property balance personal and public interests, and commercial and public goods. For example, in health research, bioethics practices inform how to obtain consent and undertake research safely to make scientific progress. Intellectual property law seeks (not always successfully) to balance openness and shared discovery with the need to allow commercial reward for investment in discovery or artistic creation.
These regimes work for the most part as an evolving system that enables data production and use, but they are woefully unfit for what is happening now. In today’s world, data is networked, easily transmitted, and copied at almost zero marginal cost. What is more, with the advent of digitisation and wireless sensing, the cost of initially capturing data has also dropped.
In the past, when thinking about information sharing, we were largely referring to finished knowledge products such as scientific papers, patents, songs, or movies. But what happens when people want to integrate and reuse the low-level raw data? There may be value to be found, not just in the scientific paper but also in reusing the individual genome that was gathered as part of the research; or not just in information about your overall income, but in each financial transaction you make.
To understand how value arises from sharing data, and also the risk and how to manage it, it’s helpful to distinguish clearly between data sharing, data reuse, and data integration.
Data sharing is simply the transfer of data between actors. A doctor may share your data with the surgeon in hospital to co-manage your health condition. The additional value from sharing arises because the data is then reused or integrated with other data. But reuse or integration may also carry risks – and this is where the challenge in sharing arises.
Data reuse is what happens when shared data is used for another purpose – for something that was not intended when the data was first collected. Sometimes data is not shared with another person, but is repurposed by the original holder of the data. I may have collected your email address for the purpose of providing an email service, and now I want to target advertising of other products to you using that email. Reuse of data also includes sharing it with somebody else who repurposes it. If a doctor shares your medical information with the government to help understand benefit liability, then that is repurposing it. Data reuse is what drives both potential for value and many of the concerns about risk of sharing.
Data integration is what happens when we link bits of data together to understand the relationship between them. An example might be to join your personal health data to information about your lifestyle to better understand your health risks. Another example might be a scientist linking environmental DNA samples with data about pest numbers in a national park to get a more complete picture of the future biodiversity of the park. Integrating data can let us answer questions that we previously couldn’t answer – it creates a bigger picture of what is going on.
Reuse, repurposing, integration: they are all aspects of the basic idea that the value of data lies in its use and that further value can be added by its reuse. The value and the risks both stem from the insight gained by reusing or integrating data. By integrating and reusing data, I can do new things, make different decisions, and automate decisions. In this document the focus is on data integration and reuse, hereafter termed Reuse.