NVIDIA / Parabricks / GenomeWorks
===

###### tags: `Parabricks-v3.1`
###### tags: `Genomics`, `NVIDIA`, `Clara`, `Parabricks`, `Secondary Analysis`

<br>

[TOC]

<br>

## [Github] [clara-parabricks](https://github.com/clara-parabricks) / [GenomeWorks](https://github.com/clara-parabricks/GenomeWorks)

### [Overview](https://github.com/clara-parabricks/GenomeWorks#overview)
- A GPU-accelerated library for biological sequence analysis
- It has 3 modules:
    - **cudamapper**
        - CUDA-accelerated sequence-to-sequence mapping
    - **cudapoa**
        - CUDA-accelerated partial order alignment
    - **cudaaligner**
        - CUDA-accelerated pairwise sequence alignment

### [Download the full source code](https://github.com/clara-parabricks/GenomeWorks#clone-genomeworks)
- #### Latest release
    ```bash
    git clone --recursive -b master \
        https://github.com/clara-parabricks/GenomeWorks.git
    ```
    - At the time of writing, the latest release is GenomeWorks-0.5.1
- #### Development version
    ```bash
    git clone --recursive \
        https://github.com/clara-parabricks/GenomeWorks.git
    ```

### System requirements (basic tools for building from source)
- #### Ubuntu 16.04 or Ubuntu 18.04
    ```shell
    $ lsb_release -a
    No LSB modules are available.
    Distributor ID: Ubuntu
    Description:    Ubuntu 16.04.7 LTS
    Release:        16.04
    Codename:       xenial
    ```
- #### CUDA 9.0+ (see NVIDIA's official CUDA installation instructions)
    ```shell
    $ nvcc --version
    nvcc: NVIDIA (R) Cuda compiler driver
    Copyright (c) 2005-2019 NVIDIA Corporation
    Built on Wed_Oct_23_19:24:38_PDT_2019
    Cuda compilation tools, release 10.2, V10.2.89
    ```
- #### gcc/g++ 5.4.0+
    ```shell
    $ gcc --version
    gcc (Ubuntu 5.4.0-6ubuntu1~16.04.12) 5.4.0 20160609
    Copyright (C) 2015 Free Software Foundation, Inc.
    This is free software; see the source for copying conditions.  There is NO
    warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
    ```
- #### Python 3.6.7+
    ```bash
    $ python --version
    Python 3.6.12
    ```
- #### CMake (>= 3.0)
    ```bash
    $ cmake --version
    cmake version 3.18.2

    CMake suite maintained and supported by Kitware (kitware.com/cmake).
    ```
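    You can also check the minimum versions listed above against the `--version` outputs captured here. The helper below is a minimal sketch. It assumes the output formats shown above, and the names `MINIMUMS`, `PATTERNS`, `parse_version` and `meets_minimum` are hypothetical, made up for this note. Versions are compared as integer tuples rather than strings, because as strings `"3.6.12" < "3.6.7"`.

    ```python
    import re

    # Minimum versions taken from the requirement list above (hypothetical table).
    MINIMUMS = {
        "nvcc": (9, 0),
        "gcc": (5, 4, 0),
        "python": (3, 6, 7),
        "cmake": (3, 0),
    }

    # Regexes that pull the version number out of each tool's `--version`
    # output, assuming the output formats shown above.
    PATTERNS = {
        "nvcc": r"release (\d+(?:\.\d+)*)",
        "gcc": r"^gcc \(.*?\) (\d+(?:\.\d+)*)",
        "python": r"^Python (\d+(?:\.\d+)*)",
        "cmake": r"^cmake version (\d+(?:\.\d+)*)",
    }


    def parse_version(tool, output):
        """Return the version found in `output` as a tuple of ints, e.g. (10, 2)."""
        match = re.search(PATTERNS[tool], output, re.MULTILINE)
        if match is None:
            raise ValueError("no version number found for " + tool)
        return tuple(int(part) for part in match.group(1).split("."))


    def meets_minimum(tool, output):
        """Compare numerically, so 3.6.12 counts as newer than 3.6.7."""
        return parse_version(tool, output) >= MINIMUMS[tool]


    # Example with the nvcc output captured above.
    nvcc_output = "Cuda compilation tools, release 10.2, V10.2.89"
    print(parse_version("nvcc", nvcc_output), meets_minimum("nvcc", nvcc_output))
    # prints: (10, 2) True
    ```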
    - The cmake that `apt` installs on Ubuntu 16.04 is not new enough:
        ```bash
        $ sudo apt install cmake
        $ cmake --version
        cmake version 3.5.1
        ```
    - [How to upgrade cmake in Ubuntu [duplicate]](https://askubuntu.com/questions/829310/how-to-upgrade-cmake-in-ubuntu)
        - Check the current cmake version: ```cmake --version```
        - Remove the old cmake: ```sudo apt remove cmake```
        - Download the latest cmake installer
            - https://cmake.org/download/
            - Linux x86_64
                - [cmake-3.18.2-Linux-x86_64.sh](https://github.com/Kitware/CMake/releases/download/v3.18.2/cmake-3.18.2-Linux-x86_64.sh)
            - Place it in ```/opt/```
        - Run the installer from ```/opt/```, so that it extracts there. Accept the license, and answer `Y` when it asks whether to include the `cmake-3.18.2-Linux-x86_64` subdirectory: ```cd /opt && sudo bash cmake-3.18.2-Linux-x86_64.sh```
        - Create global commands (symbolic links): ```sudo ln -s /opt/cmake-3.18.2-Linux-x86_64/bin/* /usr/local/bin```

### [C++ package: build the C++ source and install it](https://github.com/clara-parabricks/GenomeWorks#genomeworks-setup)
- #### Build & install (what gets installed?)
    ```bash
    mkdir build
    cd build
    cmake .. -DCMAKE_BUILD_TYPE=Release -DCMAKE_INSTALL_PREFIX=install -Dgw_cuda_gen_all_arch=OFF
    make -j install
    ```
    - Output
        ```
        GenomeWorks$ tree build/install
        build/install
        ├── benchmarks
        │   ├── cudaaligner
        │   │   └── README.md
        │   └── cudapoa
        │       └── README.md
        ├── bin
        │   ├── cudamapper
        │   └── cudapoa
        ├── cmake
        │   ├── cudaaligner.cmake
        │   ├── cudaaligner-release.cmake
        │   ├── cudamapper.cmake
        │   ├── cudamapper-release.cmake
        │   ├── cudapoa.cmake
        │   ├── cudapoa-release.cmake
        │   ├── gwbase.cmake
        │   ├── gwbase-release.cmake
        │   ├── gwio.cmake
        │   └── gwio-release.cmake
        ├── include
        │   └── claraparabricks
        │       └── genomeworks
        │           ├── cudaaligner
        │           │   ├── aligner.hpp
        │           │   ├── alignment.hpp
        │           │   └── cudaaligner.hpp
        │           ├── cudamapper
        │           │   ├── cudamapper.hpp
        │           │   ├── index.hpp
        │           │   ├── matcher.hpp
        │           │   ├── overlapper.hpp
        │           │   ├── sketch_element.hpp
        │           │   ├── types.hpp
        │           │   └── utils.hpp
        │           ├── cudapoa
        │           │   ├── batch.hpp
        │           │   ├── cudapoa.hpp
        │           │   └── utils.hpp
        │           ├── gw_config.hpp
        │           ├── io
        │           │   └── fasta_parser.hpp
        │           ├── logging
        │           │   └── logging.hpp
        │           ├── types.hpp
        │           ├── utils
        │           │   ├── allocator.hpp
        │           │   ├── cudasort.cuh
        │           │   ├── cudautils.hpp
        │           │   ├── device_buffer.hpp
        │           │   ├── device_preallocated_allocator.cuh
        │           │   ├── exceptions.hpp
        │           │   ├── genomeutils.hpp
        │           │   ├── graph.hpp
        │           │   ├── limits.cuh
        │           │   ├── mathutils.hpp
        │           │   ├── pinned_host_vector.hpp
        │           │   ├── signed_integer_utils.hpp
        │           │   ├── stringutils.hpp
        │           │   └── threadsafe_containers.hpp
        │           └── version.hpp
        ├── lib
        │   ├── libcudaaligner.a
        │   ├── libcudamapper.a
        │   ├── libcudapoa.a
        │   ├── libgwbase.a
        │   └── libgwio.a
        └── samples
            ├── sample_cudaaligner
            ├── sample_cudamapper
            └── sample_cudapoa

        16 directories, 54 files
        ```
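    The `lib/*.a` files in the tree above are static libraries, that is, `ar` archives that bundle `.o` object files. The usual way to list their members is `ar t build/install/lib/libgwbase.a`. The sketch below does the same in plain Python to show what the format looks like. It assumes the common GNU/System V `ar` layout: the 8-byte `!<arch>\n` magic, 60-byte member headers, and an optional `//` long-name table. It does not handle the BSD `#1/` naming variant, and the function name `list_archive_members` is hypothetical.

    ```python
    def list_archive_members(data: bytes) -> list:
        """Return the object-file names stored in a GNU/System V `ar` archive."""
        magic = b"!<arch>\n"
        if not data.startswith(magic):
            raise ValueError("not an ar archive")
        pos = len(magic)
        long_names = b""
        members = []
        while pos + 60 <= len(data):
            header = data[pos:pos + 60]  # fixed-size 60-byte member header
            if header[58:60] != b"`\n":
                raise ValueError("corrupt member header at offset %d" % pos)
            name = header[0:16].decode("ascii").rstrip()
            size = int(header[48:58].decode("ascii").strip())
            body = data[pos + 60:pos + 60 + size]
            if name == "//":
                long_names = body  # GNU table of long file names
            elif name in ("/", "/SYM64/"):
                pass  # GNU symbol index, not an object file
            elif name.startswith("/") and name[1:].isdigit():
                offset = int(name[1:])  # "/123" points into the long-name table
                end = long_names.index(b"/\n", offset)
                members.append(long_names[offset:end].decode())
            else:
                members.append(name.rstrip("/"))  # GNU appends "/" to short names
            pos += 60 + size + (size % 2)  # member data is padded to an even length
        return members


    # Usage (path taken from the tree above):
    # with open("build/install/lib/libgwbase.a", "rb") as f:
    #     print(list_archive_members(f.read()))
    ```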
    - Two command-line tools: cudamapper and cudapoa
    - Plus static libraries (`.a` files), header files, and 3 demo programs (`samples/`)
    - A `.c`/`.cpp` file compiles into a `.o` object file, and several `.o` files are bundled (archived with `ar`) into one `.a` static library

<br>

- #### Package as a deb (for portability and easy installation)
    ```bash
    GenomeWorks/build$ make package
    ```
    - GenomeWorks-0.5.1-Linux.deb
        ```bash
        GenomeWorks$ find -name "*.deb"
        ./build/_CPack_Packages/Linux/DEB/GenomeWorks-0.5.1-Linux.deb
        ./build/GenomeWorks-0.5.1-Linux.deb
        ```
    - Output
        ```
        GenomeWorks$ tree build/_CPack_Packages/Linux/DEB/
        build/_CPack_Packages/Linux/DEB/
        ├── GenomeWorks-0.5.1-Linux
        │   ├── control
        │   ├── control.tar.gz
        │   ├── data.tar.gz
        │   ├── debian-binary
        │   ├── md5sums
        │   └── usr
        │       └── local
        │           └── GenomeWorks-0.5.1
        │               ├─ (same structure as the build/install folder)
        └── GenomeWorks-0.5.1-Linux.deb

        20 directories, 60 files
        ```

### Python package
- #### The Python bindings are written mainly in Cython
    - `.pyx`: Cython source files (analogous to `.c` files in C)
    - `.pxd`: Cython declaration files (analogous to `.h` header files in C)
- #### There are 3 ways to install it
    - Install the public package directly with pip
    - Install from source ([script](https://github.com/clara-parabricks/GenomeWorks/blob/dev-v0.6.0/pygenomeworks/setup_pygenomeworks.py)); `-h` lists the script's options
        ```bash
        GenomeWorks$ cd pygenomeworks
        pygenomeworks$ python setup_pygenomeworks.py -h
        ```
    - Build a wheel package and install it
        ```bash
        GenomeWorks$ cd pygenomeworks
        pygenomeworks$ pip install -r requirements.txt
        pygenomeworks$ python setup_pygenomeworks.py --create_wheel_only

        # Inspect the wheel package
        pygenomeworks$ ls genomeworks_wheel/
        genomeworks-0.5.1-cp36-cp36m-linux_x86_64.whl

        # Install the wheel package
        pygenomeworks$ pip install \
            genomeworks_wheel/genomeworks-0.5.1-cp36-cp36m-linux_x86_64.whl

        # List the packages installed by pip
        pygenomeworks$ pip list
        Package            Version
        ------------------ -------
        ...
        genomeworks        0.5.1
        ...
        ```
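    The tags in the wheel filename above determine where the wheel can be installed. Under the PEP 427 naming convention, a wheel is named `{name}-{version}(-{build tag})?-{python tag}-{abi tag}-{platform tag}.whl`. So `cp36-cp36m-linux_x86_64` means this wheel targets CPython 3.6 on 64-bit Linux only, and pip will reject it as unsupported anywhere else. The parser below is a minimal sketch with hypothetical names (`WheelTags`, `parse_wheel_filename`). For real use, the `packaging` project ships a more complete parser.

    ```python
    from typing import NamedTuple


    class WheelTags(NamedTuple):
        name: str
        version: str
        build: str  # optional build tag, "" when absent
        python: str  # e.g. "cp36" = CPython 3.6
        abi: str  # e.g. "cp36m" = CPython 3.6 ABI built with pymalloc
        platform: str  # e.g. "linux_x86_64"


    def parse_wheel_filename(filename: str) -> WheelTags:
        """Split a PEP 427 wheel filename into its dash-separated components."""
        if not filename.endswith(".whl"):
            raise ValueError("not a wheel file: " + filename)
        parts = filename[: -len(".whl")].split("-")
        if len(parts) == 5:
            name, version, python, abi, platform = parts
            build = ""
        elif len(parts) == 6:
            name, version, build, python, abi, platform = parts
        else:
            raise ValueError("unexpected wheel filename: " + filename)
        return WheelTags(name, version, build, python, abi, platform)


    print(parse_wheel_filename("genomeworks-0.5.1-cp36-cp36m-linux_x86_64.whl"))
    # prints: WheelTags(name='genomeworks', version='0.5.1', build='', python='cp36', abi='cp36m', platform='linux_x86_64')
    ```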
- #### Testing the Python package
    - One-line test
        ```python
        from genomeworks import cudaaligner
        ```
    - Run the [demo program](https://github.com/clara-parabricks/GenomeWorks/blob/dev-v0.6.0/pygenomeworks/samples/sample_cudaaligner)
        ```bash
        GenomeWorks$ cd pygenomeworks/samples
        samples$ ./sample_cudaaligner  # or: python sample_cudaaligner
        Generating data...
        Data generation complete.
        Aligned sequences till 99
        Aligned sequences till 199
        Aligned sequences till 299
        Aligned sequences till 399
        Aligned sequences till 499
        Aligned sequences till 599
        Aligned sequences till 699
        Aligned sequences till 799
        Aligned sequences till 899
        Aligned sequences till 998
        ```
        Or pass the ```-p``` flag:
        ```
        samples$ ./sample_cudaaligner -p
        ```
- #### Core of the genomeworks package
    - Wraps the C++ library: the `.pxd` files declare the C++ headers that Cython binds to
        ```python
        from genomeworks import cudaaligner
        ```
    - `genomeworks/cudaaligner/cudaaligner.pxd` references
        - ```"claraparabricks/genomeworks/cudaaligner/cudaaligner.hpp"```
        - ```"claraparabricks/genomeworks/cudaaligner/alignment.hpp"```
        - ```"claraparabricks/genomeworks/cudaaligner/aligner.hpp"```
        - ```#include <cuda_runtime_api.h>```
            - ```/usr/local/cuda/include/cuda_runtime_api.h```
            - Locate "cuda_runtime_api.h" (the `-path` pattern is quoted so the shell does not expand it):
                ```
                $ sudo find / \
                    -not -path "/var/lib/docker/*" \
                    -name "cuda_runtime_api.h" \
                    2>/dev/null
                ```

<br>
<hr>
<br>

## [cudamapper](https://github.com/clara-parabricks/GenomeWorks/tree/dev-v0.6.0/cudamapper)
- Module function
    - Provides minimizer-based, GPU-accelerated approximate mapping
- Test data
    - [GenomeWorks/cudamapper/data](https://github.com/clara-parabricks/GenomeWorks/tree/dev-v0.6.0/cudamapper/data)
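
cudamapper's mapping is based on minimizers. The sketch below is a rough CPU illustration of the idea only, not GenomeWorks' implementation. For every window of `w` consecutive `k`-mers, it keeps the smallest one. Two sequences that share an exact stretch of at least `w + k - 1` bases are then guaranteed to share a minimizer, which is what makes minimizers usable as seeds for approximate mapping. For readability, this sketch orders k-mers lexicographically. Real mappers usually order them by a hash and use canonical k-mers (the smaller of a k-mer and its reverse complement). The function name `minimizers` is hypothetical.

```python
def minimizers(seq: str, k: int, w: int) -> list:
    """Return (position, k-mer) pairs for the (w, k)-minimizers of seq.

    For each window of w consecutive k-mers, the smallest k-mer is picked
    (lexicographic order, leftmost on ties). Adjacent windows that pick the
    same k-mer occurrence are reported only once.
    """
    if k <= 0 or w <= 0:
        raise ValueError("k and w must be positive")
    kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    selected = []
    for start in range(len(kmers) - w + 1):
        window = range(start, start + w)
        best = min(window, key=lambda i: (kmers[i], i))  # leftmost smallest k-mer
        # Selected positions never move left as the window slides, so comparing
        # with the last selected position is enough to drop duplicates.
        if not selected or selected[-1][0] != best:
            selected.append((best, kmers[best]))
    return selected


print(minimizers("ACGTAC", k=2, w=2))
# prints: [(0, 'AC'), (1, 'CG'), (2, 'GT'), (4, 'AC')]
```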