One choice for a Data Commons is between a “point solution” – one tailored to the specific requirements of a narrow community and specific reuse application – and a “protocol solution” – one employing rules and specifications with more general applicability.
Many solutions today involve hardwired attempts to share data. Examples include the Statistics New Zealand IDI and most of the data integration projects within government, or the Loyalty New Zealand Fly Buys programme integrating personal data across fifty or more New Zealand companies. The same focus on bespoke solutions applies to Google or Facebook.
Programming and technology can hardwire virtually anything. So there is always the option to just build the specific Commons solution: buy a big computer and tailor individual data gathering and integration solutions to the available data, then build the high-trust, inclusive, participant-controlled data reuse system in this “box” or “cloud”. But this costs more, is inflexible, and is fragile. A point solution might be easier to prototype, but is often harder to scale. It also has a built-in tendency to centralise, and so to control (though this can arguably be mitigated).
We think the better opportunity is to create a protocol-based generalisable approach.
Technically, a protocol-based approach (as opposed to point solutions) usually builds more innovation, lower costs, scalability, and inclusion into the solution from the ground up. A protocol-based approach to the commons will be slightly slower to start, but far cheaper, more innovative and flexible, and more scalable in the longer run. It is a better solution for distributing control (mitigating some central actor tendencies). It also makes the solution scalable at low cost, since new interests are merely adding themselves to the network of other actors who adhere to the protocol.
But there are more basic reasons for looking for generalisable solutions. The most basic has to do with the nature of data: the more opportunities for reuse and integration of data we can create, the better we understand the world we live in and the more value we are likely to add to our society. Each specific instance of a Community of Interest sharing and reusing data sits within a much larger data ecosystem, made up of all the ways in which the data we generate reflects the complex interconnections of the social, economic, and physical world in which we live. As we link that data together, we will find more connections – expected and unexpected.
So the rules for a Data Commons ought to be designed to facilitate scalability – widening the parameters of an existing Community of Interest, and trading or sharing data between Communities of Interest – to increase the opportunities for making these connections.
At a technical level the solutions are available. A transparently published protocol will include things like “Application Programming Interfaces” and “Metadata standards” for the technology and data layers respectively. Protocols for both are being developed or have already been adopted. More important for our concept of the generalisable Data Commons, however, are the higher layers of these protocols.
The proposal here is to create a method for people to form data sharing arrangements: not a “point solution” that is the single place where this happens, but a “market” where high-trust data integration and reuse arrangements can easily emerge, prosper, and be terminated between parties. So the approach might also include higher-level constitutional protocols: how we respect, manage, and make decisions about data reuse, and how we manage the community. The more general – and widely accepted – these rules are, the easier it will be to facilitate both scaling up of, and trade between, Communities of Interest.
In addition, being protocol-based builds “distribution-by-design” into the DNA of the data reuse ecosystem.
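To make the contrast with a point solution concrete, the following is a minimal, purely illustrative sketch in Python of how new participants might join a network simply by adhering to a shared protocol. Everything in it is an assumption made for illustration: the names `Participant`, `Network`, `describe`, and `permits`, the two example participants, and the metadata fields are hypothetical and do not come from any existing Data Commons specification.

```python
from dataclasses import dataclass, field
from typing import Protocol, runtime_checkable


@runtime_checkable
class Participant(Protocol):
    """Hypothetical protocol: any object with these two methods can join."""

    def describe(self) -> dict:
        """Return metadata describing the data this participant holds."""
        ...

    def permits(self, purpose: str) -> bool:
        """Say whether this participant allows reuse for a stated purpose."""
        ...


@dataclass
class Network:
    """A registry with no knowledge of any participant's internals."""

    members: list = field(default_factory=list)

    def join(self, candidate: object) -> None:
        # The only entry requirement is adherence to the protocol.
        if not isinstance(candidate, Participant):
            raise TypeError("candidate does not implement the protocol")
        self.members.append(candidate)

    def willing(self, purpose: str) -> list:
        # Each participant decides for itself; control stays distributed.
        return [m.describe()["name"] for m in self.members if m.permits(purpose)]


@dataclass
class HealthClinic:
    """Illustrative participant that keeps its own list of allowed purposes."""

    allowed: set

    def describe(self) -> dict:
        return {"name": "clinic", "fields": ["visit_date", "region"]}

    def permits(self, purpose: str) -> bool:
        return purpose in self.allowed


class TransportSurvey:
    """A second, unrelated participant; it shares no code with the first."""

    def describe(self) -> dict:
        return {"name": "transport-survey", "fields": ["mode", "region"]}

    def permits(self, purpose: str) -> bool:
        return purpose == "research"
```

In this sketch, adding a third participant requires no change to `Network`: the newcomer only has to provide the two methods. A point solution, by contrast, would typically need new integration code written centrally for each new data source.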