# HealthHash: A Blockchain-based Framework For Electronic Health Records With Sharing Mechanism Joint By Federated Learning
## Abstract
In this project, we propose a blockchain-based framework for sharing healthcare records[1][2] and authorizing access to them in a trusted, secure, and decentralized manner, without involving a centralized trusted entity or third party. Our work provides a front-end page for interacting with the blockchain, which uses Ethereum smart contracts to govern the access-permission functions among patients and hospital staff. Moreover, our solution uses IPFS (InterPlanetary File System) to store healthcare records on a decentralized file system. Further, we integrate blockchain-based federated learning into the application: models are trained on authorized healthcare data, and participants share the results by exchanging model weights under a federated learning framework[3].
## Introduction
Access to healthcare services across multiple hospitals or clinics has become more common for diagnosis and treatment, as healthcare services become more specialized and patient mobility increases. By knowing a patient's medical history, physicians can make clinical decisions quickly and achieve more accurate, safe, and effective diagnoses. However, electronic health records (EHRs) are private and highly sensitive, and there is currently no secure and trusted system for sharing them. Hence, most EHRs still have to be shared by fax or email, which makes medical records difficult to obtain and share. Another problem arises when medical data is shared between hospitals. Although such sharing can promote the development of medical technology, security problems caused by centralized management keep emerging in the medical system. How to share medical data in a way that promotes the development of medical treatment, without leaking personal information or compromising patient privacy, is a major problem faced by the current medical system.
## Related Work
#### Federated learning
Federated learning addresses two problems of training on a single machine: insufficient computing power and scarcity of data. Multiple nodes collaborate on training the model at hand, each using its own local data. They then share their model updates, which yields a single model trained on the union of their local data.
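The aggregation step described above can be illustrated with a minimal sketch. It assumes a FedAvg-style rule in which each node's weight vector is averaged in proportion to its local sample count; the text does not state which aggregation rule is used, and the function name `federated_average` is hypothetical.

```python
from typing import List, Sequence


def federated_average(
    updates: Sequence[Sequence[float]],
    sample_counts: Sequence[int],
) -> List[float]:
    """Merge per-node weight vectors into one model.

    Assumption: FedAvg-style weighting, where each node contributes in
    proportion to the number of local samples it trained on.
    """
    if not updates or len(updates) != len(sample_counts):
        raise ValueError("need exactly one sample count per update")
    total = sum(sample_counts)
    if total <= 0:
        raise ValueError("total sample count must be positive")

    dim = len(updates[0])
    merged = [0.0] * dim
    for weights, count in zip(updates, sample_counts):
        if len(weights) != dim:
            raise ValueError("all updates must have the same length")
        for j, w in enumerate(weights):
            merged[j] += w * count / total
    return merged


# Two nodes; the second holds three times as much data as the first,
# so its weights dominate the merged model.
global_weights = federated_average([[1.0, 2.0], [3.0, 4.0]], [1, 3])
```

Only the weight vectors and sample counts leave each node in this sketch; the raw local data never does.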
#### Smart contract
A smart contract is a piece of code that performs tasks on the blockchain. The code is executed when users send transactions to it. Smart contracts run directly on the blockchain, which protects them from tampering and alteration.
#### IPFS
IPFS is a protocol that uses a peer-to-peer network for data storage. Every file stored on IPFS is identified by a cryptographically generated hash of its content. This hash is unique to the content, so any alteration of the data changes the identifier and can be detected. Only the hash needs to be stored by the decentralized application, which avoids expensive storage and computation on the blockchain itself. As a result, IPFS is a favorable choice for storing critical and sensitive data, provided the data is encrypted before upload, since the content hash protects integrity rather than confidentiality.
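The idea of content addressing can be sketched with a toy in-memory store. This is not the IPFS API: real IPFS identifiers (CIDs) use a multihash encoding and files are split into chunks, whereas this sketch assumes a plain SHA-256 hex digest of the whole file. The class `ContentStore` is hypothetical.

```python
import hashlib
from typing import Dict


class ContentStore:
    """Toy content-addressed store (illustrative only, not the IPFS API)."""

    def __init__(self) -> None:
        self._blocks: Dict[str, bytes] = {}

    def put(self, data: bytes) -> str:
        # The identifier is derived from the content itself.
        digest = hashlib.sha256(data).hexdigest()
        self._blocks[digest] = data
        return digest

    def get(self, digest: str) -> bytes:
        data = self._blocks[digest]
        # Re-hash on read: altered content no longer matches its identifier.
        if hashlib.sha256(data).hexdigest() != digest:
            raise ValueError("content does not match its identifier")
        return data


store = ContentStore()
record_id = store.put(b"encrypted EHR bytes")
assert store.get(record_id) == b"encrypted EHR bytes"
```

Because the identifier is a short fixed-size string, it is the only value that would need to be written to the blockchain.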
## Problem Formulation and Our Solution
In this section, we introduce our approach to designing a patient-centric, knowledge-sharing system. To that end, we define two principal components that serve as the basis for the features and solutions employed in the system.
Furthermore, we present a blockchain-based federated learning framework that tackles common challenges in decentralized federated learning and in medical data sharing. This framework makes it possible to use medical data for development while protecting the privacy of its owner.
### Manage EHR using Smart Contract
EHR management faces many technical difficulties. For instance, central medical servers are limited in capacity, susceptible to single points of failure, and vulnerable to insider attacks. Patients often do not even know exactly where their sensitive data is stored or how it is shared. An EHR system should ensure the confidentiality, integrity, and availability of health information while making sure that healthcare providers and other authorized individuals have access to it.
These properties can be achieved through smart contracts with identity-based access control on Ethereum, thanks to its immutability, transparency, security, and incentive mechanisms. On the other hand, EHRs contain sensitive personal data (e.g., patients' medical histories), so storing and distributing them is challenging. Storing EHRs directly on Ethereum through a smart contract is not feasible because of the high gas fees, so we adopt a hybrid approach that integrates IPFS into our system as the storage service. Further, we encrypt each EHR with AES-256 to protect its contents against cyber-attacks.
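The access-control logic that such a smart contract enforces can be sketched in plain Python. This is a simulation under stated assumptions, not the project's contract: an actual Ethereum contract would typically be written in Solidity, and every name here (`EhrRegistry`, `add_record`, `grant`, `revoke`, `content_id_for`) is hypothetical. The `sender` argument stands in for the transaction sender's address, and the registry stores only the IPFS identifier of an already-encrypted record; the AES-256 step is omitted.

```python
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass
class RecordEntry:
    owner: str                       # patient who registered the record
    content_id: str                  # identifier of the encrypted file off-chain
    authorized: Set[str] = field(default_factory=set)


class EhrRegistry:
    """In-memory stand-in for an access-control contract (hypothetical)."""

    def __init__(self) -> None:
        self._records: Dict[str, RecordEntry] = {}

    def add_record(self, sender: str, record_id: str, content_id: str) -> None:
        if record_id in self._records:
            raise ValueError("record already registered")
        self._records[record_id] = RecordEntry(owner=sender, content_id=content_id)

    def grant(self, sender: str, record_id: str, grantee: str) -> None:
        self._owned_by(sender, record_id).authorized.add(grantee)

    def revoke(self, sender: str, record_id: str, grantee: str) -> None:
        self._owned_by(sender, record_id).authorized.discard(grantee)

    def content_id_for(self, sender: str, record_id: str) -> str:
        entry = self._records[record_id]
        if sender != entry.owner and sender not in entry.authorized:
            raise PermissionError("sender is not authorized for this record")
        return entry.content_id

    def _owned_by(self, sender: str, record_id: str) -> RecordEntry:
        entry = self._records[record_id]
        if sender != entry.owner:
            raise PermissionError("only the record owner may change permissions")
        return entry


registry = EhrRegistry()
registry.add_record("patient-1", "rec-1", "example-content-id")
registry.grant("patient-1", "rec-1", "doctor-7")
pointer = registry.content_id_for("doctor-7", "rec-1")
```

Keeping only a pointer and a permission set in the registry reflects the hybrid design described above: the bulky record lives off-chain, and the on-chain state stays small.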
### Knowledge Sharing with Privacy-Preserving under Blockchain Security
When a patient authorizes a hospital to access an EHR, this does not mean that the authorized doctor or hospital may abuse that trust. Therefore, how information is exchanged between authorized identities becomes a significant problem.
Federated learning has distinct privacy advantages over data-center training on persisted data: instead of exchanging whole medical records, participants exchange only the minimal updates necessary to improve a particular model. However, federated learning alone cannot guarantee the security of the model under a decentralized architecture, so we add a private blockchain to facilitate uploading and tracking updates, to reward trainers and validators, and to make the updates immutable and secure.
### Blockchain-based Federated Learning

##### System overview
Our federated learning system is driven by two main roles: trainers and validators. Each validator is assigned several trainers, which pass their trained models to the validator for validation via an HTTPS endpoint. Each validator selects the model of the trainer with the highest trust score among its trainers and enters it in competition with the models of the other validators. The validators decide which model to adopt by voting consensus. The winning model is saved on the blockchain and broadcast. The other validators can then fetch the latest model from the blockchain and pass it to the trainers beneath them, which synchronizes the model while maintaining its quality.
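One round of this flow can be sketched as two small steps: each validator picks a candidate, then the validators vote. The sketch assumes a strict-majority vote and alphabetical tie-breaking, neither of which is specified in the text; the function names `pick_candidate` and `elect_model` are hypothetical.

```python
from collections import Counter
from typing import Dict, Optional


def pick_candidate(trust_scores: Dict[str, float]) -> str:
    """Return the trainer with the highest trust score.

    Assumption: ties are broken by trainer name, so the result is deterministic.
    """
    if not trust_scores:
        raise ValueError("validator has no trainers")
    return max(sorted(trust_scores), key=trust_scores.__getitem__)


def elect_model(votes: Dict[str, str]) -> Optional[str]:
    """Return the model backed by a strict majority of validators, else None.

    `votes` maps a validator id to the model id it voted for.
    """
    if not votes:
        return None
    model, count = Counter(votes.values()).most_common(1)[0]
    return model if count * 2 > len(votes) else None


# Each validator nominates the model of its most trusted trainer.
candidate = pick_candidate({"trainer-a": 0.2, "trainer-b": 0.5, "trainer-c": 0.3})

# Validators then vote among the nominated models.
winner = elect_model({"val-1": "model-x", "val-2": "model-x", "val-3": "model-y"})
```

When `elect_model` returns `None`, no model has enough support and the round would have to be repeated or resolved by another rule.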
##### Blockchain's Role in our framework
The blockchain is the main fabric of communication and model synchronization across all validator nodes. It is used to facilitate uploading and tracking updates and to make the updates immutable and secure. To ensure model consistency across all validators, our consensus is based on voting and elections that use Practical Byzantine Fault Tolerance (pBFT).
The framework is built on top of Exonum, a blockchain framework that uses a variant of the pBFT consensus algorithm. The primary reason for this choice is that Exonum's voting-based consensus is much more lightweight than typical proof-of-work (PoW) consensus. This matters even more given how computationally heavy model training already is.
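The fault-tolerance arithmetic behind pBFT-style voting can be sketched as follows. It relies on the standard bound that a network of n validators tolerates f Byzantine validators only if n ≥ 3f + 1, and on the usual threshold that a decision needs votes from more than two-thirds of the validators. The helper names are hypothetical and are not part of any Exonum API.

```python
def max_faulty(n_validators: int) -> int:
    """Largest number of Byzantine validators tolerated, from n >= 3f + 1."""
    if n_validators < 1:
        raise ValueError("need at least one validator")
    return (n_validators - 1) // 3


def quorum_size(n_validators: int) -> int:
    """Smallest vote count that is strictly more than two-thirds of n."""
    if n_validators < 1:
        raise ValueError("need at least one validator")
    return (2 * n_validators) // 3 + 1


def is_committed(votes_for: int, n_validators: int) -> bool:
    """True when a model version has gathered a quorum of validator votes."""
    return votes_for >= quorum_size(n_validators)


# With four validators, one may be faulty and three votes are required.
tolerated = max_faulty(4)
needed = quorum_size(4)
```

Each round therefore costs only a few vote messages per validator, in contrast to the hashing work that PoW requires.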
##### Trust Score mechanism
We employ a reward-penalty policy to decide which model the validator should choose. Each trainer $i$ is assigned a trust score $\varphi_i$ such that:
\begin{gather}
0 \le \varphi_i \le 1\\
\sum_{i=1}^{n} \varphi_i = 1
\end{gather}
A trainer's trust score is used as the weight factor for that trainer's updates in the federated learning algorithm. Trust scores are adjusted based on validation results, which lets validators control the impact of a trainer's updates on the model according to its trust level. In addition, this creates competition in the system, since any rise in one trainer's score causes a decline in the other trainers' scores. Upon receiving a trainer's update, the validator adds the trainer's gradients to the latest model weights and computes the validation score of the result. It then updates the trainer's trust score depending on whether the update led to an improvement or a decline.
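A minimal sketch of such a reward-penalty update follows. The exact update rule is not given in the text, so this sketch assumes a multiplicative reward or penalty followed by renormalization, which keeps every score in [0, 1] and the sum at 1. The function names and the `step` parameter are hypothetical.

```python
from typing import Dict, List, Sequence


def apply_update(
    model: Sequence[float], gradients: Sequence[float], trust: float
) -> List[float]:
    """Add a trainer's gradients to the model, scaled by its trust score."""
    if len(model) != len(gradients):
        raise ValueError("model and gradients must have the same length")
    return [w + trust * g for w, g in zip(model, gradients)]


def update_trust(
    scores: Dict[str, float], trainer: str, improved: bool, step: float = 0.1
) -> Dict[str, float]:
    """Reward or penalize one trainer, then renormalize so scores sum to 1.

    Assumption: a multiplicative step; the source only states that scores
    rise on improvement and fall on decline.
    """
    if not 0.0 < step < 1.0:
        raise ValueError("step must be between 0 and 1")
    raw = dict(scores)
    raw[trainer] *= (1.0 + step) if improved else (1.0 - step)
    total = sum(raw.values())
    if total <= 0.0:
        raise ValueError("trust scores must have a positive sum")
    return {name: value / total for name, value in raw.items()}


scores = {"trainer-a": 0.5, "trainer-b": 0.5}
candidate_model = apply_update([0.0, 1.0], [0.2, -0.4], scores["trainer-a"])
# Suppose validation improved: trainer-a gains trust, trainer-b loses some.
scores = update_trust(scores, "trainer-a", improved=True)
```

The renormalization is what produces the competition described above: rewarding one trainer necessarily lowers the relative weight of all the others.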
## Results
#### System Architecture

#### Blockchain based Federated learning

## Discussion
### Trainers-to-Validators Ratio
There is a trade-off between the number of trainers and the number of validators. Increasing the number of trainers with good-quality data should improve model performance, so a greedy design would make every node a trainer. However, this has a major drawback: relying on a single validator means that the validator must be a trusted entity. Relying on multiple validators increases the trustworthiness of the model because the validators must reach consensus. Still, we want to avoid having too many validators, because this wastes computing resources that could have been used for training and lengthens the time it takes to release a new version of the model.
### How To Collate Granted Data
We formulate our system with two components: the permission mechanism and the information-sharing mechanism. Open issues remain in how to collate the data granted by each patient for federated training, how to store this data, and how to destroy the collated data once the patient revokes permission. We leave these issues to future work.
## References
[1] Ayesha Shahnaz, Usman Qamar, Ayesha Khalid. Using Blockchain for Electronic Health Records.
[2] Q. Gan and Q. Cao. Adoption of electronic health record system: Multiple theoretical perspectives. Jan. 2014, pp. 2716–2724.
[3] Mohamed Ghanem, Fadi Dawoud, Habiba Gamal, Eslam Soliman, Tamer El-Batt, Hossam Sharara. FLoBC: A Decentralized Blockchain-Based Federated Learning Framework.