Buzzword Soup: Combining AI and blockchains

How combining two technologies will upend data governance

First published in 2017.

Right, we have a data problem. Too few organisations can get their hands on data, and those that can have got too much personal data. Even once you have some lovely data, it’s a struggle to share or aggregate it across business units, countries, or software systems. As much as we think we are awash with data, we aren’t. Data is actually siloed at every turn, either through regulation, technical limitations, or lack of tools to un-silo it.

Even when organisations are set up to share data and have a culture of sharing and collaboration, privacy laws and data protection legislation have had a chilling effect on data sharing. The Health Insurance Portability and Accountability Act (HIPAA) in the US and the Data Protection Act in the UK explicitly state what data can and cannot be shared. But these are complicated policies. The technical difficulty of implementing data policies, combined with the degradation of user experience those implementations produce, makes IT systems costly, inefficient, and unpleasant to use. On top of which, the security theatre that surrounds them only adds to the insecurity of participating in such systems. It’s just not worth it for people to go through the hassle of sharing data.

Even when sharing is encouraged internally, technically possible with APIs, and legislation is favourable, current data infrastructure makes it difficult to monetise data. The best that happens is that external developers or citizens get free access to data. Lovely stuff for the user of the data, not so much for the publisher. There is no good way to get paid appropriately for published data. Some content can be published using an open-source license, sure, and attribution is great, but it doesn’t pay the bills. Dual-licensing for software is possible. See Oracle’s MySQL database, which is dual-licensed under a commercial proprietary license and under the GPLv2 license. But in most cases licensing is costly, difficult to enforce and inefficient. Licensing of personal data is officially non-existent. Despite that, companies still make money from the value of this data, particularly in developing countries. Around the world doctors use Facebook’s WhatsApp to send medical reports, nurses use Gmail to provide remedial advice, and that data, while not licensed, is published to advertisers as users uncaringly share it.

Today there is no economic incentive for individuals to do anything other than give data away for free and for corporations to hoard it.

Blockchains and data exchanges are a solution

Blockchains are terrible databases

First and foremost, despite my clickbait-y headline I want to get this out of the way: public blockchains are in most ways worse than existing databases. They are slower, have less storage, are extremely energy-inefficient and in most cases less private. But these design choices are made to improve one feature: decentralisation. By decentralisation I mean the elimination of a central administrator, which leads to extreme fault tolerance, increased data integrity and tamper-evidence. The removal of centralised control and vulnerable centralised data repositories has trade-offs which make most blockchains unsuitable for a ton of use cases that have been spoken about. But in cases where security and tamper-evidence are more important than throughput, speed, capacity, and stable governance, well then public blockchains are well worth exploring.

So the question becomes: how important is security to individuals, corporates and Governments, right? The bull case for blockchains is that it matters a lot and it will matter more in the future. If security continues to be an afterthought, well existing databases are cheaper, faster and more convenient, so why bother with a blockchain?

Well, I tend to believe that security will become more important. Things like the Yahoo! or Equifax hacks certainly shine a light on the vulnerability of centralised data providers but tbh I really don’t think individuals are going to demand change. People are going mad for Amazon Echos and Dots and sticking them in every room, yet very few people are asking: what data is actually being collected? Where is it being stored? Is it encrypted? How can it be combined with other datasets? Security and data protection matter far more to business and to Government, and the so-called Internet of Things will force the change.

Blockchains will help manage the sharing of data

Never has so much data been available for collection and analysis. Connected cars are throwing off vast amounts of data. The challenge is that every single stakeholder wants access for their own purposes: car makers want it to improve the driving experience; tire makers want it to see how their tires perform; city administrations want it for traffic prediction; and software makers want it to improve their self-driving software. As sensors are embedded in all sorts of everyday objects, everybody is fighting over who ‘owns’ the data. This is fighting yesterday’s war. Blockchains can provide an open, shared data layer in which all stakeholders have access to data.

Sure, not every bit of data will need a fully decentralised blockchain with proof-of-work. In most cases a simple distributed ledger with a Merkle tree will suffice (see DeepMind Health Verifiable Data Audit). Much of the data could even be stored off-chain, with only a hash of it recorded on-chain as a tamper-evident link. Regardless of the blockchain flavour, cryptographically-secured distributed ledgers offer a better alternative to centralised databases. Of course, an assumption here is that blockchains don’t suffer the familiar fate of incompatibility between competing blockchains. The community does seem to be fully behind blockchain-connecting projects like Polkadot, Cosmos, Atomic Swap and AION. These services combined with zero-knowledge proofs mean data can be shared privately on public ledgers. At this point, we are close to the ideal of a globally shared database with easy and, ideally, public permissions.
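To make the off-chain data plus on-chain hash idea concrete, here is a minimal sketch in Python. It is not how DeepMind or any particular chain does it. The record contents and names like `off_chain_records` are made up for illustration, and duplicating the last hash on odd-sized levels is just one common convention I am assuming.

```python
import hashlib


def sha256(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def merkle_root(records: list[bytes]) -> bytes:
    """Hash every record, then fold the hashes pairwise until one root remains."""
    if not records:
        raise ValueError("need at least one record")
    level = [sha256(r) for r in records]
    while len(level) > 1:
        if len(level) % 2 == 1:
            level.append(level[-1])  # assumption: duplicate the last hash on odd levels
        level = [sha256(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]


# Off-chain: the actual records live in an ordinary database or file store.
off_chain_records = [b"reading-1", b"reading-2", b"reading-3"]

# On-chain: only this 32-byte digest would need to be published.
on_chain_root = merkle_root(off_chain_records)

# Anyone holding the records can recompute the root and compare.
untouched = merkle_root(off_chain_records) == on_chain_root
tampered = merkle_root([b"reading-1", b"reading-X", b"reading-3"]) == on_chain_root
```

Changing a single record changes the root, which is all the tamper-evidence a shared ledger needs to provide. The bulky data never has to touch the chain.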

Add in data exchanges…

Now, the final piece. Data exchanges like the Ocean Protocol bring together data buyers and sellers (also including bots and devices). As explained, today data is either given away for free or sits underutilised because people and organisations have no way to monetise it. A blockchain-based data exchange can enforce data quality standards, ownership and usage rules, and pay sellers to rent or sell data. A data exchange provides the missing component to a shared ledger: a business model. People and organisations can easily earn money from their data.
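As a rough illustration of those three jobs (quality standards, usage rules, paying sellers), here is a toy in-memory exchange in Python. It is not Ocean Protocol’s design or API. The `Exchange` and `Listing` classes, the quality threshold and the token amounts are all hypothetical, and a real exchange would run this logic in smart contracts rather than in a Python object.

```python
from dataclasses import dataclass, field


@dataclass
class Listing:
    seller: str
    price: int                # tokens per access (hypothetical unit)
    allowed_uses: set[str]    # usage rules chosen by the seller
    quality_score: float      # 0.0 to 1.0, however the exchange measures it


@dataclass
class Exchange:
    min_quality: float = 0.8
    listings: dict[str, Listing] = field(default_factory=dict)
    balances: dict[str, int] = field(default_factory=dict)

    def publish(self, dataset_id: str, listing: Listing) -> bool:
        """Accept a listing only if it meets the quality standard."""
        if listing.quality_score < self.min_quality:
            return False
        self.listings[dataset_id] = listing
        return True

    def buy_access(self, buyer: str, dataset_id: str, use: str) -> bool:
        """Enforce the seller's usage rules, then move tokens from buyer to seller."""
        listing = self.listings.get(dataset_id)
        if listing is None or use not in listing.allowed_uses:
            return False
        if self.balances.get(buyer, 0) < listing.price:
            return False
        self.balances[buyer] -= listing.price
        self.balances[listing.seller] = self.balances.get(listing.seller, 0) + listing.price
        return True


exchange = Exchange(balances={"lab": 10})
exchange.publish("watch-data", Listing("alice", 3, {"research"}, 0.9))
ok = exchange.buy_access("lab", "watch-data", "research")          # allowed use
blocked = exchange.buy_access("lab", "watch-data", "advertising")  # not allowed
```

Here Alice gets paid for the research request and the advertising request is refused, with no licensing department in sight.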

Sure, people won’t just stop using Google or Facebook tomorrow. The value they provide is far too great. But these new networks will change the conversation. The public will begin reading news stories about how they can be paid when people download their pictures or paid when they upload their smart watch data.

The End of AI Platform Monopolies

A global data sharing and monetization network

Right, so where were we? Oh yeah, we now have a global network of interconnected blockchains and DLTs that share value seamlessly with easy-to-use data exchanges. Hopefully, as the industry begins to focus on usability and user design, we will be in a world in which anybody can publish data at the press of a button or with a voice command. Payments in Bitcoin or other tokens are seamless and automated based on rules coded into smart contracts. For the average user, all they have done is agree to conditionally share data, as they do today with Facebook and other systems, and next thing they know they have tokens to spend however they want. They can convert to a national currency or merrily purchase their preferred goods and services.

How does this lead to the end of AI platform monopolies? Well in 2017, the only thing that matters is data. Platforms like Google, Facebook and Baidu collect data to feed their AI algorithms (specifically their deep learning algorithms), which improves their products. Better products bring more customers and engagement, which in turn generates more data.

When AI is the driver of product improvements, proprietary data is the most valuable asset for platforms. In fact, access to proprietary data sets is one of the most important assets for an AI startup. The way to think about it is that data is the supply and AI algorithms are the demand. And deep learning models are hungry.

Reducing the value of data

Here is the knockout: blockchains aggregate the supply side for free (almost) for all. Of course, there will be some transaction fees and other friction points, but compared to existing data infrastructure an open, shared data layer essentially commoditizes data. Or at the very least makes proprietary datasets much less valuable.

Firms that control supply (data) no longer dominate markets. Data stops being a moat and a competitive advantage. Now the demand side becomes the most valuable place in the ecosystem. This is where the customer relationship is won with trust. Well, trust, and a simple, easy-to-use interface, maybe a conversational or voice UX. The only thing that matters in 2020: the customer relationship (the EU’s General Data Protection Regulation, or GDPR, will reinforce this).

Blockchain-enabled AI

A global incentive-aligned shared data layer

A second, longer-term implication of a global shared data layer is blockchain-enabled AI. Agents (not even particularly intelligent agents) can use blockchains as a ‘substrate’, as Fred Ehrsam put it. Deploying and using agents on blockchains rather than using proprietary tools and platforms like Facebook Messenger will be more attractive to developers, users and regulators.

For developers, first, they have access to a vast amount of free data (on public chains anyway), data that they would never be able to buy or generate themselves at the start. Second, that data is structured and high-quality (right now, just transaction data, but increasingly all sorts of value store and exchange). Third, native automation tools in smart contracts and, hopefully very soon, production sidechains make it easier to build reasonable agents that can perform reasonably complex actions. Finally, developers and companies that deploy agents have a native payment channel to be paid almost immediately based on all sorts of conditions like usage or user utility. The business models with tokens and smart contracts are not limited to up-front payment or a paywall. All sorts of new business models will be available for experimental developers.

Users benefit because, unlike in any other environment, they will have direct access to token capital, investment and real interest in the system. When users use a Facebook Messenger bot, they get some utility. When they use an agent on a blockchain they can be rewarded or paid with tokens. Depending on the token economics, a user can ‘own’ a stake in the agent or company behind the agent. The more the user uses or evangelises the product, the stronger the product and underlying blockchain get. Network effects with a direct monetary reward thrown in. In a sense, a user is no longer a passive consumer of a service; they are a stakeholder. This model begins to look more and more like a digital cooperative.

The last stakeholder, and potentially the deciding factor, will be regulators and Governments that demand some element of control or access to AI algorithms. The public and political tide is turning against technology companies. Certainly many Governments around the world are waking up to the power amassed by large US-based tech firms through their exploitation of data. Without overselling it, it seems to me that an open-source, auditable data structure would be an ideal technical solution for regulators that want a window into AI decision making and the data used to train models. This would at the very least allow scrutiny of training data to check for bias, as well as potentially providing an audit trail for exploration if an agent makes a bad decision. It’s not a leap to imagine regulators actually mandating the use of a public blockchain or demanding a node in private networks for auditability of AI.

Combined with autonomous software

If this scenario plays out you have more developers, more users and happy regulators. There are many different descriptions, but I like Autonomous Economic Agents (AEAs). These new types of decentralised AI are the logical next step when autonomous agents start using blockchains. The level of human involvement with the agents will vary; some AIs can be managed by traditional organisations, others will be managed by decentralised autonomous organisations (DAOs). Regardless of the human involvement, the fact is AIs will be accumulating tokens (seen another way, wealth). For example, an autonomous vehicle can be paid in tokens for rides and can pay for re-charging and servicing with tokens. Or an AI DAO could manage a neighbourhood distributed energy grid in which energy is exchanged using smart contracts based on real-time supply and demand.
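The autonomous vehicle example is simple enough to sketch as a toy simulation in Python. Everything here is assumed for illustration: the `VehicleAgent` class, the fare, the battery drain and the recharge price are invented numbers, and a real agent would settle these payments through smart contracts rather than by updating a local counter.

```python
from dataclasses import dataclass


@dataclass
class VehicleAgent:
    tokens: int = 0
    battery: int = 100        # percent
    ride_fare: int = 5        # tokens earned per ride (made-up figure)
    ride_cost: int = 30       # battery percent used per ride (made-up figure)
    recharge_price: int = 2   # tokens paid per full recharge (made-up figure)

    def give_ride(self) -> bool:
        """Earn the fare if there is enough battery for the trip."""
        if self.battery < self.ride_cost:
            return False
        self.battery -= self.ride_cost
        self.tokens += self.ride_fare
        return True

    def recharge(self) -> bool:
        """Pay for a full recharge out of the agent's own token balance."""
        if self.tokens < self.recharge_price:
            return False
        self.tokens -= self.recharge_price
        self.battery = 100
        return True

    def step(self) -> None:
        # Take a ride when possible, otherwise spend tokens on a recharge.
        if not self.give_ride():
            self.recharge()


agent = VehicleAgent()
for _ in range(8):
    agent.step()
```

With these numbers the agent earns more per charge cycle than it spends, so its balance keeps growing with no human in the loop.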

I don’t think many people have truly thought through the implications of this. An entity that is neither human nor controlled by humans will have the ability to acquire resources and wealth. When people talk about exponential growth, this is exactly what they are talking about. Society and politics are simply not ready to even begin a discussion about these sorts of issues. Can an autonomous agent generate wealth? What is the optimal level of taxation that doesn’t act as a disincentive to activity? We already have issues collecting taxes as it is. Who will collect taxes from an AI DAO, and how?

Blockchain-enabled AI might seem pie in the sky. But unlike, say, artificial general intelligence (AGI), we know exactly the problems that need to be solved to bring this vision to reality. There are already rudimentary versions of these agents available today. Blockchains combined with artificial intelligence are more than just a technical innovation: they are an economic paradigm shift. The political philosophy written in the next 10 years will be as important as the socialist and labour movement of the late 20th century.