# A Collaboration Platform for AI Training

### *What is this and why do we need it?*

People working on Deep Learning (DL) projects usually rely on public datasets such as ImageNet, COCO, or TIMIT to train their network models for research purposes. In general, the more data you have, the more accurate the model becomes. This means that most of the time we face the same issue: we either need more data to improve the loss and accuracy of our DL model, or we have to develop a deeper network with a limited dataset.

Companies like Facebook and Google collect large amounts of personal data by offering free applications and free disk space, which encourage people to upload or save personal data such as photos, text, or files to the Internet. Because so much of this data is uploaded by you, these tech companies learn more about your actions and habits than anyone else.

Data is the oil of the AI world, but we don't want to lose control of our personal data. How can we keep control of our datasets and still help improve AI models?

### *Yes, we can combine a Blockchain solution and Deep Learning together!*

A public Blockchain provides a way to claim ownership of the data we generate through the Blockchain transaction log. We can also give read-only permission to others who provide computing resources to help train an AI model, without losing control of our own data, such as health information or the heart rate data generated by an Apple Watch (watchOS) **[Figure 1]**. An AI model trained with data we own can predict our behavior more accurately, or send us a message in advance if it observes an abnormal case. Likewise, with an AI camera, we can use our own personal photos to help the camera tell whether a thief or a family member is knocking on the door.

```mermaid
graph LR
AIresearch[AI research] -->SM1{Smart Contract}
SmartOracle((Smart Oracle))
SM1 -->|resource eval| D[Miner Pool 1]
SM1 -->|resource eval| E[Miner Pool 2]
SM1 -->|resource eval| F[Miner Pool 3]
SM1 -- result validate --- SmartOracle
```
Figure 1

---

With this AI collaboration platform on Blockchain, we also host a Smart Contract for renting hardware resources (GPUs, FPGAs, or other accelerators) provided by Miner Pools to train Deep Learning models. AI researchers pay for this with Tokens or ETH. Miners on a Blockchain are usually already equipped with multiple GPU cards as their default configuration, so each miner can decide whether to act as a miner for Blockchain transaction log calculation or to provide GPU/FPGA/accelerator devices as computing power for training DL network models **[Figure 2]**. In the end, the smart contract is executed automatically if a condition (e.g. model accuracy) meets the AI researcher's criteria **[Figure 3]**.

---

```mermaid
graph LR
MinerPool1[Miner Pool]
MinerPool1 -->|mining| D[Miner 1]
MinerPool1 -->|mining| E[Miner 2]
MinerPool1 -->|training| F[GPU/FPGA miner 1]
MinerPool1 -->|training| G[DL accelerator miner 2]
```
Figure 2

```mermaid
graph LR
Resource1[GPU/FPGA/AI Hardware renter]
Resource1 -->|launch| A[DL framework on Docker]
Resource1 -->|DL model file| MinerPool1[Miner Pool]
SmartOracle((Smart Oracle)) -->|Eval accuracy| MinerPool2[Miner Pool]
```
Figure 3

---

## Design:

In this whitepaper's design, we deploy a Smart Contract on the Blockchain, which lets the Miner Pool know the total GAS or computing power needed to train the DL model provided by an AI researcher or company. The AI researcher or company provides the algorithm/model or a Docker image for the specific purpose. The dataset can be either public or private. The Miner Pool also collects inventory information about the GPU IOPs capability offered by each miner, dispatches requests, and coordinates DL training among miners (e.g. multi-host/multi-GPGPU or single-GPGPU training).
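
The inventory and dispatch step can be illustrated with a minimal Python sketch. This is not the platform's implementation, since the miner pool design is still TBD. The `Miner` and `TrainingRequest` schemas, the `capability_per_gpu` score, and the selection policy (prefer the smallest single miner that fits, otherwise combine the largest miners for multi-host training) are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Miner:
    """One miner's advertised training capacity (hypothetical schema)."""
    miner_id: str
    gpu_count: int
    capability_per_gpu: float  # assumed benchmark score reported to the pool

    @property
    def total_capability(self) -> float:
        return self.gpu_count * self.capability_per_gpu


@dataclass
class TrainingRequest:
    """A training job submitted by an AI researcher (hypothetical schema)."""
    job_id: str
    required_capability: float


def dispatch(inventory: List[Miner], request: TrainingRequest) -> Optional[List[Miner]]:
    """Select miners for a job from the pool inventory.

    Assumed policy: use the smallest single miner that covers the request;
    otherwise combine the largest miners (multi-host training) until the
    requirement is covered. Returns None if the pool cannot cover it.
    """
    capable = [m for m in inventory
               if m.total_capability >= request.required_capability]
    if capable:
        return [min(capable, key=lambda m: m.total_capability)]

    chosen: List[Miner] = []
    remaining = request.required_capability
    for miner in sorted(inventory, key=lambda m: m.total_capability, reverse=True):
        chosen.append(miner)
        remaining -= miner.total_capability
        if remaining <= 0:
            return chosen
    return None


# Example inventory with made-up capability scores.
pool = [
    Miner("miner-1", gpu_count=1, capability_per_gpu=10.0),
    Miner("miner-2", gpu_count=4, capability_per_gpu=10.0),
    Miner("miner-3", gpu_count=2, capability_per_gpu=10.0),
]
single_host = dispatch(pool, TrainingRequest("job-a", required_capability=15.0))
multi_host = dispatch(pool, TrainingRequest("job-b", required_capability=55.0))
```

In this example, `job-a` fits on a single miner, while `job-b` exceeds every individual miner and is spread across several hosts. A real pool would also need to account for miner availability, network bandwidth between hosts, and pricing, none of which are modeled here.
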
Besides, the miner pool may use IPFS as a central file system where miners upload trained model files (e.g. ckpt or HDF5 files) and evaluate their accuracy and loss values. This information can then be queried by the Smart Oracle to check whether the thresholds meet the originator's needs.

---

* ### Smart Contracts & Oracle design (TBD)

Blockchains can't access data outside the chain network, which is why we need an Oracle as a data service provider for smart contracts on the blockchain: the oracle obtains the data a contract needs and pushes it onto the blockchain. Data needed by smart contracts could be weather, payment results, etc. When a particular value is reached, the smart contract changes its state and executes its programmatically predefined algorithms, automatically triggering an event on the blockchain.

- [x] Public dataset training
- [ ] Private dataset training (bitmark)

---

* ### Miner pool design (TBD)

---

* ### DL model Training

In this section, we show an example of how a miner can train a DL model by using a Dockerfile or by pulling a Docker image from Docker Hub. Here is an example repository that runs a TensorFlow environment on NVIDIA GPUs with CUDA, using an NMT example:

https://github.com/chesterkuo/DeepLearning_Docker

* ### DL model benchmark

To evaluate each miner's hardware and software stack, we need a benchmark tool that measures core DNN operations, such as RNN and convolution performance, on each stack. This helps determine each miner pool's capability and resources. It is the same idea as comparing Bitcoin hash rates: the higher the score, the better for DL training.

https://github.com/baidu-research/DeepBench

---

#### Reference:

- BitMark: https://bitmark.com/
- Golem Network: https://golem.network/
- Oracle types: https://www.ledger.fr/2016/08/31/hardware-pythias-bridging-the-real-world-to-the-blockchain/
- What are oracles: https://blog.oraclize.it/understanding-oracles-99055c9c9f7b

#### Contributor:

- Chester Kuo (chester.kuo@gmail.com)
- Jeff Lee (cvkkjeff@gmail.com)
- Seven Bai (sevenbai@gmail.com)