Towards Advanced Artificial Intelligence using Blockchain Technologies

Jean-Louis Marechaux, IVADO Labs

IEEE Blockchain Technical Briefs, March 2019

A blockchain is a peer-to-peer network built on a shared distributed ledger, with self-executing business rules (smart contracts), in which information is validated through a consensus mechanism. Information logged in a blockchain can never be modified or erased (the immutability aspect of a blockchain). New blocks created over time are digitally signed and added to the end of the chain. A blockchain can be public (anyone can join the network) or permissioned (only vetted participants are accepted). Multiple industries are considering blockchain to address pain points that are difficult to solve with other technologies, specifically around the management of trusted transactions. The potential of blockchain is huge, but the most obvious use cases are around collaboration, traceability, automation, and security.

Artificial intelligence (AI) is meant to simulate human intelligence in order to interpret events and automate the decision-making process. Machine learning (ML), a subset of AI, is a discipline that focuses on training systems instead of programming them. An AI model learns from massive amounts of historical data, provides predictive services based on past behaviors, and improves over time to become more accurate and efficient. AI is becoming popular in multiple industries because the science behind it has recently evolved, but also because new technologies are unlocking the potential of machine learning.

Blockchain mostly emerged from Bitcoin, the decentralized cryptocurrency that appeared in 2009. Although blockchain was initially used to record cryptocurrency transactions in a public ledger, it has since been successfully applied to other domains that are not related to financial assets. Non-cryptocurrency use of blockchain is becoming a game changer in supply chain, government agencies, accounting, and human resources, and blockchain solutions have been deployed by Walmart, Maersk, the UN Refugee Agency, and the Government of Estonia [1][2][3][4]. Because artificial intelligence is about the automation of cognitive processes and blockchain is about the automation of transactions, there are specific scenarios where both technologies can be combined. A blockchain network can provide a decentralized platform to support some advanced AI capabilities.

Blockchain and Data Confidentiality for AI Models

As more and more regulations around the world enforce data privacy (e.g. HIPAA, GDPR, PIPEDA), sensitive information management is a major concern for most AI initiatives. How can we deal safely with large amounts of private data in order to train our models? Data anonymization is frequently applied for privacy protection, but the approach may sometimes remove useful training information from datasets.

Another way to enforce privacy in machine learning is to feed models with encrypted data. This approach relies on a relatively new technique called homomorphic encryption (HE) [5], where computation can be done on ciphertext. IBM and Microsoft have released homomorphic encryption libraries (respectively HElib and SEAL), and at the NeurIPS conference in 2018, Intel announced a tool to support AI training on encrypted data (HE-Transformer, based on SEAL) [6].
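To make "computation on ciphertext" concrete, the following is a minimal sketch using the Paillier cryptosystem, a classic additively homomorphic scheme. It is not the scheme implemented by HElib or SEAL, which are based on lattice cryptography, and the tiny hard-coded primes are an assumption chosen for readability only; they provide no security. All function names are illustrative.

```python
import math
import secrets


def keygen(p=1789, q=1861):
    """Build a toy Paillier key pair from two small primes (insecure, demo only)."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p-1, q-1)
    g = n + 1
    # mu is the modular inverse of L(g^lam mod n^2), where L(x) = (x - 1) // n.
    # pow(x, -1, n) requires Python 3.8 or later.
    mu = pow((pow(g, lam, n * n) - 1) // n, -1, n)
    return (n, g), (lam, mu)


def encrypt(pub, m):
    """Encrypt an integer 0 <= m < n with fresh randomness r."""
    n, g = pub
    while True:
        r = secrets.randbelow(n - 1) + 1
        if math.gcd(r, n) == 1:
            break
    return (pow(g, m, n * n) * pow(r, n, n * n)) % (n * n)


def decrypt(pub, priv, c):
    n, _ = pub
    lam, mu = priv
    return ((pow(c, lam, n * n) - 1) // n * mu) % n


def add_encrypted(pub, c1, c2):
    """Multiplying ciphertexts adds the underlying plaintexts."""
    n, _ = pub
    return (c1 * c2) % (n * n)


def scale_encrypted(pub, c, k):
    """Raising a ciphertext to the power k multiplies the plaintext by k."""
    n, _ = pub
    return pow(c, k, n * n)


pub, priv = keygen()
total = add_encrypted(pub, encrypt(pub, 20), encrypt(pub, 22))
print(decrypt(pub, priv, total))  # the sum is computed without decrypting the inputs
```

Additions and multiplications by known constants are enough to evaluate a linear model on encrypted features. Training on encrypted data, as targeted by the tools mentioned above, requires richer schemes that also support multiplication between ciphertexts.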

From a blockchain perspective, security is also based on cryptographic mechanisms. Some blockchain platforms are already exploring advanced techniques to leverage homomorphic encryption [7]. Using such blockchain and homomorphic encryption technologies, models can be trained without exposing the underlying data, which ensures privacy and confidentiality throughout the machine learning process. The decentralized platform could then be used to support model training and provide privacy-preserving data for machine learning.

Blockchain to Prevent Data Corruption

Another problem in AI is ensuring that models are trained on relevant data. The quality of a model depends on the quality of the input data. This leads to a major security concern in the AI world: we must ensure that training datasets are not corrupted. If the data used to train a model is modified over time by a malicious actor, AI systems can become flawed and invalid (model bias). Therefore, consistency and traceability of training datasets are crucial.

To address potential data corruption in machine learning, blockchain technologies can be used at multiple levels. Let's first consider the modifications made to a dataset over time. Each such modification can be captured as a blockchain transaction in the ledger. The blockchain ledger will always keep a record of who updated the data (through the digital identity of the party involved). Moreover, the blockchain consensus mechanism ensures that a modification is collectively approved before it is added to the ledger. The distributed aspect of a blockchain ledger makes it more secure than a centralized repository. Suspicious modifications can be quarantined and eliminated so that the blockchain only contains a historical record of approved dataset updates.

Figure 1. Immutable list of dataset modifications. Each transaction relates to a data modification.
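The idea of an approved, tamper-evident list of dataset modifications can be sketched as a hash-chained log. This is a single-process illustration under simplifying assumptions: a strict-majority vote stands in for a real consensus protocol, authors are plain strings rather than verified digital identities, and the class and field names are hypothetical.

```python
import hashlib
import json

GENESIS_HASH = "0" * 64


def entry_hash(entry):
    """Hash an entry's content (every field except its own hash)."""
    payload = {k: v for k, v in entry.items() if k != "hash"}
    return hashlib.sha256(json.dumps(payload, sort_keys=True).encode()).hexdigest()


class DatasetLedger:
    def __init__(self, validators):
        self.validators = set(validators)
        self.entries = []

    def propose(self, author, dataset_id, change, approvals):
        """Append a modification only if a strict majority of validators approve."""
        valid_votes = self.validators & set(approvals)
        if 2 * len(valid_votes) <= len(self.validators):
            return False  # rejected: the change never enters the ledger
        entry = {
            "index": len(self.entries),
            "author": author,
            "dataset_id": dataset_id,
            "change": change,
            "approved_by": sorted(valid_votes),
            "prev_hash": self.entries[-1]["hash"] if self.entries else GENESIS_HASH,
        }
        entry["hash"] = entry_hash(entry)
        self.entries.append(entry)
        return True

    def is_intact(self):
        """Recompute every hash and link; any edit to a past entry breaks the chain."""
        prev = GENESIS_HASH
        for entry in self.entries:
            if entry["prev_hash"] != prev or entry["hash"] != entry_hash(entry):
                return False
            prev = entry["hash"]
        return True


ledger = DatasetLedger(["org_a", "org_b", "org_c"])
ledger.propose("org_a", "sales-2018", "add 500 rows", approvals=["org_a", "org_b"])
ledger.propose("mallory", "sales-2018", "flip labels", approvals=["org_c", "mallory"])
print(len(ledger.entries), ledger.is_intact())
```

In this sketch the second proposal gathers only one vote from a recognized validator, so it is rejected and leaves no trace in the approved history.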

Now let's consider another aspect of data corruption. Typically, during AI training, a model uses historical data to calculate parameters. The estimated parameters are then saved and used later for prediction. Model parameters are just a set of calculated values, and they are not automatically linked to the original dataset. This means that if we discover that data is corrupted long after the model is trained, it is extremely difficult to know which estimated parameters have been corrupted as well, and what influence the corrupted data had on the model output. But if AI training is treated as a blockchain transaction, then the ledger will store valuable traceability information. During model training, a transaction record is created to store contextual information in the ledger, such as the type of model being trained (e.g. logistic regression, k-means, random forest), the dataset used (source and version), and the values of the parameters (before and after the training). If it is discovered at some point that data was corrupted or invalid, it is then possible to know which training runs used the dataset. A blockchain can be leveraged to improve dataset traceability and ML models’ consistency.

Figure 2. A blockchain for training history
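The training record described above could take a shape like the following. This is a sketch only: the field names, the content-hash helper, and the sample data are assumptions, and a plain Python list stands in for the blockchain ledger.

```python
import hashlib
from dataclasses import dataclass


def fingerprint(rows):
    """Content hash that identifies one exact version of a dataset."""
    digest = hashlib.sha256()
    for row in rows:
        digest.update(repr(row).encode())
    return digest.hexdigest()


@dataclass(frozen=True)
class TrainingRecord:
    """Context stored in the ledger for one training run."""
    model_type: str        # e.g. "logistic regression"
    dataset_source: str
    dataset_version: str
    dataset_hash: str
    params_before: tuple
    params_after: tuple


def affected_trainings(ledger, corrupted_hash):
    """Return every training run that consumed the corrupted dataset version."""
    return [rec for rec in ledger if rec.dataset_hash == corrupted_hash]


# Hypothetical data: two versions of a dataset and three training runs.
v1 = [(1.0, 0), (2.0, 1)]
v2 = [(1.0, 0), (2.0, 1), (9.9, 0)]  # later found to contain a bad row
ledger = [
    TrainingRecord("logistic regression", "crm-export", "v1", fingerprint(v1), (0.0,), (0.8,)),
    TrainingRecord("logistic regression", "crm-export", "v2", fingerprint(v2), (0.8,), (0.3,)),
    TrainingRecord("random forest", "crm-export", "v2", fingerprint(v2), (), ()),
]
suspect = affected_trainings(ledger, fingerprint(v2))
for rec in suspect:
    print(rec.model_type, rec.dataset_version, rec.params_before, rec.params_after)
```

Because each record keeps the parameters before and after training, the last known-good parameters for an affected model can be read from `params_before` of its first suspect run.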

Blockchain for Federated Learning

In a typical machine learning initiative, data needs to be collected and gathered in order to train a model. It is quite common to combine multiple datasets from multiple sources, and to enrich private business data with public data (weather, traffic, social media, events). This ML process becomes unrealistic when information owners want to prevent others from downloading datasets. The reason could be to ensure data privacy, to comply with specific regulations, or to keep a competitive advantage.

Federated Learning [8] is a collaborative form of machine learning that could address this issue. Instead of downloading and combining datasets to train a single model, multiple models are trained at the source, and then these models are combined to create the final model. A blockchain can provide a platform to support federated learning on distributed data sources. Multiple blockchain nodes can each access their own local data and train a model. The training outcomes of the participants are then assembled to create a global model. This distributed training process ensures that only data owners have direct access to private or sensitive information. It also allows models to be trained on datasets that would not have been available in a centralized learning environment.
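The train-locally-then-combine loop can be sketched in a few lines. The text does not specify how local models are combined; this example assumes a sample-weighted parameter average in the style of federated averaging, applied to a toy linear model y = w*x + b. The node datasets, learning rate, and round count are made up for illustration, and no blockchain is involved: the aggregation step simply shows what a coordinating layer would compute.

```python
def local_train(model, data, lr=0.1, steps=5):
    """Run a few gradient-descent steps on one node's private (x, y) pairs."""
    w, b = model
    n = len(data)
    for _ in range(steps):
        grad_w = sum(2 * (w * x + b - y) * x for x, y in data) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in data) / n
        w, b = w - lr * grad_w, b - lr * grad_b
    return w, b


def aggregate(updates):
    """Average the local models, weighted by each node's sample count."""
    total = sum(n for _, n in updates)
    w = sum(model[0] * n for model, n in updates) / total
    b = sum(model[1] * n for model, n in updates) / total
    return w, b


def federated_fit(node_datasets, rounds=400):
    model = (0.0, 0.0)
    for _ in range(rounds):
        # Only model parameters leave each node; raw data stays local.
        updates = [(local_train(model, data), len(data)) for data in node_datasets]
        model = aggregate(updates)
    return model


def make_data(xs):
    """Hypothetical noise-free samples of y = 2x + 1."""
    return [(x, 2.0 * x + 1.0) for x in xs]


nodes = [
    make_data([0.0, 0.2, 0.4]),
    make_data([0.5, 0.7]),
    make_data([0.1, 0.6, 0.9, 1.0]),
]
w, b = federated_fit(nodes)
print(round(w, 3), round(b, 3))
```

Each node sees only a narrow slice of the input range, yet the averaged model recovers the shared relationship, which is the benefit the paragraph above describes.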

Blockchain for Explainable AI

Machine learning systems are usually quite opaque, based on a “black box” that consumes data to provide a result without really explaining the rationale behind the process. Explainable AI (XAI) [9] is a recent field of interest in the AI world, where the idea is to provide more transparency to increase users’ confidence in AI systems. Many people believe that XAI is needed for widespread adoption of AI because we humans tend to distrust what we don’t understand. Moreover, with the advent of AI-powered systems in several industries, it is becoming critical to trace and explain AI decisions from a legal and ethical perspective. XAI is not an easy concept because explainability is difficult to define and is quite subjective. As a user of an AI system, what do I really need to understand? Should it be the basic building blocks of the cognitive process, or the specific underlying mathematical models? There is no universal answer to this question, and each AI consumer, depending on the situation, may be looking for different information.

A blockchain provides a system of proof where transactions are logged, timestamped, and signed. Although it is not an answer to all XAI needs, blockchain can at least be used to provide some traceability for AI-powered systems. In a blockchain-enabled environment, it is possible to link a specific AI output to all the different steps involved in the decision process, or to trace back to the training and test datasets in order to understand which specific piece of information was the most influential on the end result. A blockchain could provide transparency, reproducibility, and traceability for better AI explainability and governance.
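Linking an AI output back to the steps that produced it can be illustrated with signed, timestamped records that point to their parent step. This is a sketch under stated assumptions: an HMAC with a shared key stands in for the asymmetric digital signatures a real blockchain would use, a Python list stands in for the ledger, and the step names, timestamps, and details are hypothetical.

```python
import hashlib
import hmac
import json


def sign(record, key):
    """Keyed digest over the record content (a stand-in for a digital signature)."""
    payload = json.dumps(record, sort_keys=True).encode()
    return hmac.new(key, payload, hashlib.sha256).hexdigest()


def log_step(ledger, key, step, detail, timestamp, parent=None):
    """Append one signed, timestamped step and return its identifier."""
    record = {
        "id": len(ledger),
        "step": step,
        "detail": detail,
        "timestamp": timestamp,
        "parent": parent,
    }
    ledger.append({"record": record, "signature": sign(record, key)})
    return record["id"]


def trace(ledger, key, record_id):
    """Walk from an output back to its origins, checking each signature on the way."""
    path = []
    while record_id is not None:
        entry = ledger[record_id]
        if not hmac.compare_digest(entry["signature"], sign(entry["record"], key)):
            raise ValueError(f"record {record_id} failed its signature check")
        path.append(entry["record"]["step"])
        record_id = entry["record"]["parent"]
    return path


key = b"demo-key"  # hypothetical shared secret, for illustration only
ledger = []
ds = log_step(ledger, key, "dataset", "loans-2018 v3", "2019-01-10T09:00:00Z")
tr = log_step(ledger, key, "training", "random forest", "2019-01-11T14:30:00Z", parent=ds)
pr = log_step(ledger, key, "prediction", "application 1042 declined", "2019-02-01T08:15:00Z", parent=tr)
print(trace(ledger, key, pr))
```

Such a trail shows which model and dataset version stood behind a decision. It does not by itself reveal which piece of information was most influential; that still requires a dedicated explanation technique applied to the traced artifacts.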

Figure 3. Blockchain for advanced AI capabilities

Blockchain is a promising platform to support advanced AI capabilities (see Figure 3). Some core capabilities of a blockchain framework, such as the distributed immutable ledger, the peer-to-peer network, and smart contracts, can be leveraged to enable collaboration, traceability, automation, and security. As AI matures, it can leverage blockchain technologies to improve sensitive information management, model consistency, distributed training, and overall explainability.

References

[1] Maersk and IBM Introduce TradeLens Blockchain Shipping Solution, https://www.maersk.com/-/media/ml/press/2018/20180809/final---maersk-and-ibm-introduce-tradelens-blockchain-shipping-solution.pdf. Accessed March 2, 2019.

[2] Walmart Food Safety, https://corporate.walmart.com/our-story/global-ethics-compliance/food-safety. Accessed March 2, 2019.

[3] Russ Juskalian, “Inside the Jordan refugee camp that runs on blockchain”, MIT Technology Review, https://www.technologyreview.com/s/610806/inside-the-jordan-refugee-camp-that-runs-on-blockchain/. Accessed March 2, 2019.

[4] Estonian blockchain technology, https://e-estonia.com/wp-content/uploads/faq-a4-v02-blockchain.pdf. Accessed March 2, 2019.

[5] C. Gentry, S. Halevi, "Implementing Gentry’s Fully-Homomorphic Encryption Scheme". Available: https://researcher.watson.ibm.com/researcher/files/us-shaih/fhe-implementation.pdf, 2010.

[6] F. Boemer, Y. Lao, C. Wierzynski, "nGraph-HE: A Graph Compiler for Deep Learning on Homomorphically Encrypted data", Available: https://arxiv.org/pdf/1810.10121.pdf, 2018.

[7] G. Zyskind, O. Nathan, A. Pentland, "Enigma: Decentralized Computation Platform with Guaranteed Privacy", MIT Media Lab, https://enigma.co/enigma_full.pdf

[8] J. Konečný et al., "Federated Learning: Strategies for Improving Communication Efficiency", Available: https://www.maths.ed.ac.uk/~prichtar/papers/federated_communication_NIPS16.pdf

[9] DARPA, "Explainable Artificial Intelligence (XAI)". Accessed February 10, 2019.

Jean-Louis Marechaux is a Fellow Architect at IVADO Labs with a focus on AI-powered solutions for the supply chain industry. Until 2018, he was the Head of Technology at JDA Labs, the R&D entity of JDA Software. In this role, he led teams and research initiatives to assess the viability and applicability of emerging technologies to the supply chain and retail industries. Jean-Louis was first involved in software delivery projects in 1997. As part of development teams, he first developed applications and later acted as a coding architect. Through many projects in multiple industries (banking, finance, transportation, government, software), Jean-Louis acquired extensive skills in web technologies and Agile software development. He then worked for more than 10 years at IBM Software Group, where he held positions such as Software Engineer, Solution Architect, Community of Practice Lead, and Worldwide Cloud Advisor.

As part of his technical evangelist role at IBM, Jean-Louis has been a speaker at many conferences, and published several articles on emerging technologies. He is a member of IEEE, the IEEE Computer Society, and the IEEE Blockchain Community.

Editor:

Andrew Lippman is a Senior Scientist at MIT and founding associate director of the MIT Media Lab. He received his BS and MS at MIT, and his PhD at EPFL, Lausanne. He has worked for 45 years on personal computing, networking, and interactive systems. In the 1980s he directed the “Movie-Map” project that presaged Google's Street View. He helped pioneer visual computing and communications systems such as MPEG and digital HDTV. His current research group addresses Viral Communications, systems that are often peer-to-peer and can grow organically through adoption rather than a priori agreement. He has studied blockchains and digital currency for six years. Some recent work involves developing personal networks for social action and a blockchain-based identity control system for medical records.