Auto-translation of Regulatory Documents into Smart Contracts

Olivia Choudhury; Murtaza Dhuliawala; Nicholas Fay; Nolan Rudolph; Issa Sylla; Noor Fairoza; Daniel Gruen; and Amar Das, IBM Research, Cambridge, MA

IEEE Blockchain Technical Briefs, September 2018



The evolution of Blockchain 2.0 expanded the scope of this emerging technology beyond cryptocurrency by introducing smart contracts. Although the notion of a smart contract was conceived by Nick Szabo twenty years ago [1], its first widely adopted implementation came with the Ethereum blockchain, specified in 2014 [2]. Smart contracts are self-executing computer programs that implement a set of functionalities, based on business rules, to validate transactions in a blockchain network. Such rules are found in contractual agreements. For example, they define and enforce the terms of a contract between parties, maintain a strict and cohesive schedule of deadlines, and allow changes to the original rules given the consent of the parties involved. Smart contracts can automate such complex business logic by embedding, verifying, and enforcing the contractual clauses of an agreement without intermediaries. However, translating business rules written for regulatory purposes into a smart contract that specifies blockchain transactions can be challenging and time-consuming. Moreover, business rules are often reused across different contracts, so effort is spent redundantly generating similar smart contracts.

To address these challenges, we have developed a framework that automatically generates smart contracts from domain-specific business rules in regulatory documents. Such an infrastructure can reduce the expertise, time, and cost required to specify smart contracts, and can also support reproducibility. Our framework comprises two parts: (a) extraction of business rules from documents using machine learning and natural language processing techniques, and (b) conversion of the extracted rules into smart contract functionalities using domain knowledge, formally represented as ontologies and semantic rules.

To demonstrate our framework, we consider the use case of clinical trials, which involve complex, multi-party interactions. A clinical trial protocol, the equivalent of a business agreement, defines a list of pre-approved activities and required actions that must be satisfied by the intended stakeholders. One of the major components of a protocol is the schedule of activities (SOA), a tabular representation of the activities that must be accomplished at each study visit or within an allowable window. We show how our framework extracts SOA rules from a protocol and embeds them into a smart contract for subsequent enforcement and validation.

System Design

Extracting business rules from agreements

As the first step, we employ machine learning and natural language processing (NLP) techniques to extract relevant information that could become rules or constraints for the smart contract. We leverage IBM Watson's suite of cognitive services for entity extraction and map the extracted entities to the rules that apply to them. In some cases, we also need specialized modules for extracting temporal information, so that constraints are time-bound when required. We use optical character recognition (OCR) and other computer vision techniques to extract SOA tables from PDFs of clinical trial protocols. The extracted information is then processed with NLP to recover its semantics, focusing on which activities must be accomplished at a given visit and when the visits occur. Through a human-in-the-loop visual interface, we verify the semantics extracted from the table, such as when visits should occur, which lab tests and procedures must be conducted at those visits, and any modifications to a visit. This verified information is then abstracted and passed to the next step of the framework, as shown in Figure 1. Further details of the extraction of business rules from documents can be found in [3].
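The step from an extracted SOA table to rule-like records can be illustrated with a minimal Go sketch. It assumes that the OCR stage has already produced a grid of cell strings, that the first row lists visits, that the second row holds timing cells written as "Day N" or "Day N ± T", and that an "X" marks an activity required at a visit. These conventions, along with the names parseSOA, parseWindow, Window, and Requirement, are hypothetical. The pipeline described in [3] handles far messier tables and relies on NLP and human verification.

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
	"strings"
)

// Window is the allowable study-day range for a visit, e.g. "Day 14 ± 3" -> {11 17}.
type Window struct{ MinDay, MaxDay int }

// Requirement states that an activity must be performed at a given visit.
type Requirement struct{ Visit, Activity string }

// windowRe matches the assumed timing-cell format "Day N" or "Day N ± T".
var windowRe = regexp.MustCompile(`^Day\s+(\d+)(?:\s*±\s*(\d+))?$`)

// parseWindow converts a timing cell into an allowable window of study days.
func parseWindow(cell string) (Window, error) {
	m := windowRe.FindStringSubmatch(strings.TrimSpace(cell))
	if m == nil {
		return Window{}, fmt.Errorf("unrecognized timing cell %q", cell)
	}
	day, _ := strconv.Atoi(m[1]) // the regex guarantees digits
	tol := 0
	if m[2] != "" {
		tol, _ = strconv.Atoi(m[2])
	}
	return Window{MinDay: day - tol, MaxDay: day + tol}, nil
}

// parseSOA reads an OCR'd SOA grid: row 0 holds visit names (column 0 is a label),
// row 1 holds timing cells, and each later row is an activity name followed by
// cells where "X" marks the visits at which that activity is required.
func parseSOA(grid [][]string) (map[string]Window, []Requirement, error) {
	if len(grid) < 2 || len(grid[0]) < 2 {
		return nil, nil, fmt.Errorf("SOA grid needs a visit row and a timing row")
	}
	visits := grid[0][1:]
	windows := make(map[string]Window, len(visits))
	for i, v := range visits {
		if i+1 >= len(grid[1]) {
			return nil, nil, fmt.Errorf("no timing cell for visit %q", v)
		}
		w, err := parseWindow(grid[1][i+1])
		if err != nil {
			return nil, nil, err
		}
		windows[v] = w
	}
	var reqs []Requirement
	for _, row := range grid[2:] {
		for i := 1; i < len(row) && i <= len(visits); i++ {
			if strings.EqualFold(strings.TrimSpace(row[i]), "x") {
				reqs = append(reqs, Requirement{Visit: visits[i-1], Activity: row[0]})
			}
		}
	}
	return windows, reqs, nil
}

func main() {
	grid := [][]string{
		{"Activity", "Screening", "Visit 2"},
		{"Timing", "Day 1", "Day 14 ± 3"},
		{"Vital signs", "X", "X"},
		{"ECG", "X", ""},
	}
	windows, reqs, err := parseSOA(grid)
	if err != nil {
		panic(err)
	}
	fmt.Println(windows["Visit 2"]) // {11 17}
	fmt.Println(reqs)
}
```

The output is the kind of structured, time-bound constraint that the next step models formally. An activity is required at a visit, and that visit must fall within a window of study days.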

Embedding business rules into smart contracts

We use standardized knowledge representations, such as ontologies and semantic rules, to model the information extracted in the previous step. An ontology conceptualizes the knowledge of a domain as classes (concepts in the domain), individuals (instances of a class), properties (common characteristics of instances), and relationships (between classes). We design a clinical trial ontology using the widely used Web Ontology Language (OWL) [4]. We then use the Semantic Web Rule Language (SWRL) [5] to express the extracted rules or constraints. SWRL rules are built on OWL concepts and are Horn-like [6]. A Horn clause contains at most one positive literal, which makes it equivalent to an implication from a conjunction of antecedent atoms to at most one consequent atom. The clinical trial ontology and its associated semantic rules provide a knowledge base that can be further exploited for reasoning and inference.
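The following Go sketch illustrates the Horn-clause semantics. It applies rules of the form body → head to a set of ground facts by naive forward chaining, repeating until no new facts are derived. It is not an OWL or SWRL engine: it ignores OWL class semantics and SWRL built-ins. The predicates Visit, includesProcedure, LabTest, and requiresSample are hypothetical examples, not terms from the authors' clinical trial ontology.

```go
package main

import (
	"fmt"
	"strings"
)

// Atom is a predicate applied to arguments; arguments starting with "?" are variables.
type Atom struct {
	Pred string
	Args []string
}

// Rule is a Horn clause: the conjunction of Body atoms implies Head (Head variables must occur in Body).
type Rule struct {
	Body []Atom
	Head Atom
}

// A builds an Atom (it only keeps the example short); key renders a ground atom
// canonically so the knowledge base can deduplicate facts.
func A(pred string, args ...string) Atom { return Atom{Pred: pred, Args: args} }
func key(a Atom) string                  { return a.Pred + "(" + strings.Join(a.Args, ",") + ")" }

// match extends binding so that pattern p equals the ground fact f.
func match(p, f Atom, binding map[string]string) (map[string]string, bool) {
	if p.Pred != f.Pred || len(p.Args) != len(f.Args) {
		return nil, false
	}
	b := make(map[string]string, len(binding)+len(p.Args))
	for k, v := range binding {
		b[k] = v
	}
	for i, arg := range p.Args {
		isVar := strings.HasPrefix(arg, "?")
		if v, bound := b[arg]; (!isVar && arg != f.Args[i]) || (bound && v != f.Args[i]) {
			return nil, false // constant mismatch, or variable bound to another value
		}
		if isVar {
			b[arg] = f.Args[i]
		}
	}
	return b, true
}

// substitute replaces bound variables in a with their values.
func substitute(a Atom, b map[string]string) Atom {
	args := make([]string, len(a.Args))
	for i, arg := range a.Args {
		args[i] = arg
		if v, ok := b[arg]; ok {
			args[i] = v
		}
	}
	return Atom{Pred: a.Pred, Args: args}
}

// infer performs naive forward chaining: apply every rule until no new fact is derived.
func infer(facts []Atom, rules []Rule) map[string]Atom {
	kb := make(map[string]Atom)
	for _, f := range facts {
		kb[key(f)] = f
	}
	for changed := true; changed; {
		changed = false
		for _, r := range rules {
			bindings := []map[string]string{{}}
			for _, p := range r.Body { // join the body atoms left to right
				var next []map[string]string
				for _, b := range bindings {
					for _, f := range kb {
						if nb, ok := match(p, f, b); ok {
							next = append(next, nb)
						}
					}
				}
				bindings = next
			}
			for _, b := range bindings {
				h := substitute(r.Head, b)
				if _, seen := kb[key(h)]; !seen {
					kb[key(h)] = h
					changed = true
				}
			}
		}
	}
	return kb
}

func main() {
	facts := []Atom{A("Visit", "visit2"), A("includesProcedure", "visit2", "cbc"), A("LabTest", "cbc")}
	// Visit(?v) ^ includesProcedure(?v, ?p) ^ LabTest(?p) -> requiresSample(?v, ?p)
	rule := Rule{Body: []Atom{A("Visit", "?v"), A("includesProcedure", "?v", "?p"), A("LabTest", "?p")},
		Head: A("requiresSample", "?v", "?p")}
	_, ok := infer(facts, []Rule{rule})["requiresSample(visit2,cbc)"]
	fmt.Println(ok) // true
}
```

The rules contain no function symbols, so the set of derivable facts is finite and the loop terminates. Practical reasoners use more efficient strategies, such as semi-naive evaluation or the Rete algorithm.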

The constraints expressed as semantic rules must be incorporated into, and enforced by, the smart contract. To achieve this, we devise a context-free grammar to parse the required constraints from the rules. For a given domain, such as clinical trials, we create a smart contract template that stipulates the functionalities, based on rules derived from the ontology and protocol. This template serves as the skeleton for generating the final smart contract used in the blockchain network. We represent the source code of this template as a hierarchical tree structure called an abstract syntax tree (AST). The AST is traversed depth-first and modified to add the constraints parsed from the semantic rules. Once updated, the AST is converted back into source code, yielding a new smart contract with the parsed constraints embedded. This process is illustrated in Figure 1. Details of this procedure can be found in [7].
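The grammar used in [7] is not given here. As a hypothetical illustration, the Go sketch below defines a small context-free grammar for a SWRL-like rule syntax and parses it with a recursive-descent parser that has one method per nonterminal. The parser produces the antecedent and consequent atoms, from which a constraint such as a study-day bound can be read off. The grammar, the predicates hasStudyDay and outOfWindow, and the function ParseRule are assumptions. swrlb:greaterThan is modeled on the SWRL built-in of that name.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// Atom is one predicate application, e.g. swrlb:greaterThan(?d, 17).
type Atom struct {
	Pred string
	Args []string
}

// Hypothetical grammar for SWRL-like rule text (not the grammar from [7]):
//
//	rule := atom { ("^" | "∧") atom } "->" atom
//	atom := NAME "(" term { "," term } ")"
//	term := [ "?" ] NAME
var (
	tokenRe = regexp.MustCompile(`->|[\w:.]+|\S`) // "->", a NAME, or one punctuation rune
	nameRe  = regexp.MustCompile(`^[\w:.]+$`)     // NAME: letters, digits, '_', ':', '.'
)

// parser is a recursive-descent parser with one method per nonterminal.
type parser struct {
	toks []string
	pos  int
}

// next consumes and returns the next token, or "" at the end of input.
func (p *parser) next() string {
	if p.pos >= len(p.toks) {
		return ""
	}
	p.pos++
	return p.toks[p.pos-1]
}

// atom parses NAME "(" term { "," term } ")".
func (p *parser) atom() (Atom, error) {
	a := Atom{Pred: p.next()}
	if !nameRe.MatchString(a.Pred) || p.next() != "(" {
		return a, fmt.Errorf("malformed atom near token %d", p.pos)
	}
	for {
		term := p.next()
		if term == "?" { // variable: "?" followed by a NAME
			term += p.next()
		}
		if !nameRe.MatchString(strings.TrimPrefix(term, "?")) {
			return a, fmt.Errorf("bad term %q near token %d", term, p.pos)
		}
		a.Args = append(a.Args, term)
		switch p.next() {
		case ",": // another term follows
		case ")":
			return a, nil
		default:
			return a, fmt.Errorf("expected ',' or ')' near token %d", p.pos)
		}
	}
}

// ParseRule parses a rule into its antecedent (body) atoms and its consequent (head) atom.
func ParseRule(s string) ([]Atom, Atom, error) {
	p := &parser{toks: tokenRe.FindAllString(s, -1)}
	var body []Atom
	for {
		a, err := p.atom()
		if err != nil {
			return nil, Atom{}, err
		}
		body = append(body, a)
		if sep := p.next(); sep == "->" {
			break
		} else if sep != "^" && sep != "∧" {
			return nil, Atom{}, fmt.Errorf("expected '^' or '->', got %q", sep)
		}
	}
	head, err := p.atom()
	if err == nil && p.pos != len(p.toks) {
		err = fmt.Errorf("unexpected trailing input at token %d", p.pos)
	}
	if err != nil {
		return nil, Atom{}, err
	}
	return body, head, nil
}

func main() {
	body, head, err := ParseRule("Visit(?v) ^ hasStudyDay(?v, ?d) ^ swrlb:greaterThan(?d, 17) -> outOfWindow(?v)")
	fmt.Println(body, head, err)
}
```

A grammar-based parser rejects malformed rules with a located error rather than silently producing a wrong constraint. This matters when the rule text comes from an upstream extraction step.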


Figure 1: System design of the framework for auto-generating a smart contract from domain-specific business rules.

Discussion and Future Work

Due to the limited availability of labeled data, our current method of constraint extraction relies on hand-crafted features and rules. As shown in [3], we achieve a precision of 0.95 across 20 training and test protocols, but predictive performance could be further improved by creating and leveraging a larger annotated training set. Such a dataset would also help derive data-driven rules, making the system more robust and better able to capture complex rules and relationships. To remove the need for domain expertise in designing ontologies and semantic rules, we also plan to develop an automated approach to building domain-specific knowledge bases. For demonstration purposes, we have used clinical trials as the use case and Go as the programming language. However, the framework is not tied to either and can be applied to other use cases and programming languages.

Current efforts to generate smart contracts require significant technical expertise, time, and cost, and they do not support reproducibility within a given application domain. Our novel framework, based on machine learning, NLP, ontologies, semantic rules, and ASTs, extracts business rules from regulatory documents and incorporates these constraints into a smart contract. Such automation will reduce the inherent complexity associated with blockchain and smart contracts and encourage their adoption across different applications.


[1] N. Szabo, “The idea of smart contracts,” Nick Szabo's Papers and Concise Tutorials, vol. 6, 1997.

[2] G. Wood, “Ethereum: A secure decentralised generalised transaction ledger,” Ethereum Project Yellow Paper, vol. 151, pp. 1-32, 2014.

[3] M. Dhuliawala, N. Fay, D. Gruen, and A. Das, “What Happens When? Interpreting Schedule of Activity Tables in Clinical Trial Documents,” ACM-BCB’18: 9th ACM International Conference on Bioinformatics, Computational Biology and Health Informatics (In Press), 2018.

[4] D. L. McGuinness and F. Van Harmelen, “OWL web ontology language overview,” W3C Recommendation, 2004.

[5] I. Horrocks, P. F. Patel-Schneider, H. Boley, S. Tabet, B. Grosof, and M. Dean, “SWRL: A semantic web rule language combining OWL and RuleML,” W3C Member Submission, 2004.

[6] A. Horn, “On sentences which are true of direct unions of algebras,” The Journal of Symbolic Logic, vol. 16, no. 1, pp. 14-21, 1951.

[7] O. Choudhury, N. Rudolph, I. Sylla, N. Fairoza, and A. Das, “Auto-Generation of Smart Contracts from Domain-specific Ontologies and Semantic Rules,” IEEE Blockchain Conference, 2018.



Olivia Choudhury is a postdoctoral researcher at IBM Research, Cambridge, MA, USA. She received her PhD in Computer Science and Engineering from the University of Notre Dame, USA. Her research interests include blockchain technology, federated learning, healthcare informatics, genomics, and distributed computing.



Murtaza Dhuliawala is a research software engineer at IBM Research, Cambridge, MA, USA. He received his Master's degree in Computer Science, with a specialization in Artificial Intelligence and Machine Learning, from the Georgia Institute of Technology, USA. His research and past experience span machine learning, deep learning, NLP, interactive storytelling, healthcare research, blockchain, finance applications, and the development of cognitive and AI applications.


Nicholas Fay is a healthcare data scientist at IBM Research, Cambridge, MA, USA. He received his Master's degree in Computer Science from Rensselaer Polytechnic Institute, USA. His previous research focused on network science and social media analytics. His current research interests include applications of machine learning, NLP, and wearables in healthcare. He is also interested in blockchain, IoT, and the application platforms surrounding them.


Nolan Rudolph is a software engineer. He received his Bachelor of Science in Computer Science and Engineering from Ohio State University, USA. While at IBM Research, he explored the applicability of blockchain technology in healthcare. He now applies data science and programmatic decisioning in the adtech industry through his work at Dataxu.



Issa Sylla is a research software engineer at IBM Research, Cambridge, MA, USA. He received his Bachelor of Arts in Middle Eastern Studies from Dartmouth College, USA. His research spans geographic disparities in clinical trials, applications of blockchain technology in healthcare, and machine learning.



Noor Fairoza is a DevOps Engineer at IBM Research, Cambridge, MA, USA. She received her Master's degree in Telecommunication Systems and Management from Northeastern University, USA. Her research interests include applications of blockchain technology in healthcare, network engineering, and the application of DevOps and Agile practices in research.



Daniel Gruen is a Research Staff Member at IBM Research, Cambridge, MA, USA. He received his PhD in Cognitive Science from UCSD, USA. He works on the design of AI-based tools that let practitioners and strategic decision-makers seamlessly incorporate insights from big data, analytics, visualization, and cognitive systems. His current work focuses on health-related applications, including automatic video understanding and summarization.



Amar Das is the Director of IBM's Learning Health Systems team. He received his MD and PhD in Biomedical Informatics from Stanford University. His research focuses on developing new statistical, computational, organizational, and regulatory approaches to the assessment, deployment, and adoption of healthcare solutions. Prior to joining IBM Research, he was a faculty member at Stanford University Medical School and the Geisel School of Medicine at Dartmouth.



Claire-Isabelle Carlier is an Enterprise Architect at Brookfield Renewable Partners, where she advises operating businesses across the globe on their strategic technology planning. Her role involves supporting IT-OT convergence and the adoption of new technologies. She became interested in blockchain in 2015, shortly after the Ethereum platform launched and the ideas of decentralized applications and smart contracts began spreading. Since joining Brookfield in 2017, she has closely followed the evolution of use cases for the energy sector and asset management, and the related market landscape of vendors and products. As a member of the IEEE Smart Cities Technical Committee, she has also been researching how blockchain could make cities smarter for citizens, organizations, and municipalities.


