# DBT安裝簡略指引 (含VM掛載以及linux command line) # 安裝dbt的前置作業 ## 掛載VM (Windows系統) ### 安裝 Virtualbox, Vagrant, Git Bash + [Virtualbox](https://www.virtualbox.org/wiki/Downloads), + [Vagrant](https://www.vagrantup.com/), + [Git Bash](https://git-scm.com/downloads) + C槽下新增 workspace 資料夾,設定成快速存取 + workspace資料夾下右鍵 git bash,即可進入終端機 (黑螢幕) ### Vagrant初次啟動 (終端機) + 在終端機輸入 ```console vagrant init # workspace資料夾會新增一個vagrantfile文字檔 ``` + 更動 Vagrantfile 檔案內容,刪除原本內容,貼上下面這段: ```ruby= # -*- mode: ruby -*- # vi: set ft=ruby : Vagrant.configure("2") do |config| # Maybe I will try vagrant box from https://roboxes.org/ later. config.vm.box = "bento/ubuntu-20.04" config.vm.box_version = "202112.19.0" # for vagrant plugin vagrant-disksize config.disksize.size = '50GB' # All configurations use 10.20.30.40 as entry. config.vm.network "private_network", ip: "10.20.30.40" # Create a forwarded port mapping which allows access to a specific port config.vm.network "forwarded_port", guest: 3000, host: 3000 config.vm.network "forwarded_port", guest: 10080, host: 10080 config.vm.provider "virtualbox" do |vb| vb.cpus = "4" vb.memory = "8192" end # Let timezone in guest be same as host # Install timezone plugin: # $ vagrant plugin install vagrant-timezone if Vagrant.has_plugin?("vagrant-timezone") config.timezone.value = "Asia/Taipei" end end ``` + 回黑螢幕畫面輸入: ```console vagrant up # 喚醒vagrant (i.e.開機) ``` + It's very common for poping out error messages after the **vagrant up** command, please read the hint on your screen, and follow the instruction. 1. unknonw cononfigureation section 'disksize' + solution ```console # Unknown configuration section 'disksize' vagrant plugin install vagrant-disksize ``` 2. Not in a hypervisor...... + solution: ask IT for help ```console Not in a hypervisor partition # VT-x is disabled in the BIOS for all CPU modes # Details: code E_FAIL (0x80004005), component ConsoleWrap, interface IConsole Please ask IT for help ``` + 靈活使用 ubuntu command line + vim ~/.bashrc ## 安裝python, postgres 以及dbt 1. python & pip ```console sudo apt install python3-pip # install python and pip ``` + If error messages pop out: + #E: Failed to fetch http://us.archive.ubuntu.com/ubuntu/pool/main/l/linux/linux-libc-dev_5.4.0-91.102_amd64.deb 404 Not Found [IP: 91.189.91.38 80] + #E: Unable to fetch some archives, maybe run apt-get update or try with --fix-missing? + Try the following ```console sudo apt-get sudo apt install python3-pip ``` ```console pip # test if pip is installed or not ``` 2. 安裝postgres ```console sudo apt update sudo apt install postgresql postgresql-contrib sudo systemctl start postgresql@12-main sudo -i -u postgres #進入postgres測試是否安裝成功 exit #離開 ``` 3. 安裝dbt ```console pip install dbt-postgres ``` + 會出現 warning message (not on PATH...) + solution: 用vim編輯器打開 .bashrc ,修改PATH變數 ```console vim ~/.bashrc ``` 按<kbd>i</kbd>進入編輯模式 ```vim #依照warning message 修改PATH variable PATH=$PATH:/home/vagrant/.local/bin # 或是在最後一行加上 export PATH=$PATH:/home/vagrant/.local/bin ``` 按<kbd>:</kbd><kbd>q</kbd>離開 ```console # source 修改後的 .bashrc script source ~/.bashrc # 確認能不能執行 dbt -h ``` ## 其他工具安裝 (optional) ### 安裝指令 + **tree** ```console sudo apt-get install tree # install tree ``` + **cq** + purpose: yaml轉csv ```bash= wget https://github.com/markus-wa/cq/releases/download/v0.1.3/cq- native-linux #下載cq ls -x cq-native-linux # 查看cq檔案權限 chmod +x cq-native-linux # 將cq檔案變更成可執行 cp --help # 確認有無成功 mv cq-native-linux .local/bin/cq # move to a directory ``` + **vim yaml formatter** + purpose: 方便yaml檔編輯 (convenient for dbt) + Step1: install plugin ```console curl -fLo ~/.vim/autoload/plug.vim --create-dirs https://raw.githubusercontent.com/junegunn/vim-plug/master/plug.vim ``` + Step2: creat a file called **.vimrc** under root directory ```console vagrant@vagrant:~$ vim .vimrc ``` + Step3: copy and paste the following codes to .vimrc ```vim autocmd FileType yaml setlocal ts=2 sts=2 sw=2 expandtab indentkeys-=0# indentkeys-=<:> foldmethod=indent nofoldenable call plug#begin() Plug 'tarekbecker/vim-yaml-formatter' call plug#end() let g:yaml_formatter_indent_collection=1 ``` # 啟動DBT專案 + 學習資源 (tutorial) ```bash= # sudo apt update # sudo apt install gh git clone https://github.com/dbt-labs/dbt-learn-demo.git git clone https://github.com/josephmachado/simple_dbt_project.git git clone https://github.com/dbt-labs/jaffle_shop.git ``` ## 創建 dbt 專案 ### 0. Environment and profile setting + create a postgres database ```console= # 切換到使用者postgres vagrant@vagrant:~$ sudo -i -u postgres # 進入到postgres資料庫內部 postgres@vagrant:~$ psql ``` ```sql= --建立database create database database_name ; ``` 按<kbd>\ </kbd><kbd>q</kbd>離開 ```console= #切換回原本的使用者 postgres@vagrant:~$ exit ``` + create a new direcotry called .dbt then create profiles.yml ```console vagrant@vagrant:~$ mkdir .dbt vagrant@vagrant:~$ vim profile.yml ``` 按<kbd>:</kbd><kbd>w</kbd><kbd>q</kbd>離開 ### 1. Set up a dbt project ```console vagrant@vagrant:~$ dbt init your_project_name ``` ### 2. Edit profiles.yml ```yaml= # example profiles.yml jaffle_shop: target: dev ouputs: dev: type: postgres host: localhost user: postgres password: postgres port: 5432 dbname: jaffle_shop schema: dbt_alice threads: 4 ``` + dbt debug warning/error messages: + dbt_project.yml variable changes + password changes ```yaml= # dbt_project.yml name: 'jaffle_shop' version: '0.1' profile: 'jaffle_shop' config-version: 2 ## "source-paths" becomes "model-paths" ## model-paths: ["models"] analysis-paths: ["analysis"] test-paths: ["tests"] ## "data-paths" becomes "seed-paths" ## seed-paths: ["data"] macro-paths: ["macros"] target-path: "target" clean-targets: - "target" - "dbt_modules" - "logs" models: jaffle_shop: materialized: table staging: materialized: view ``` ```bash= # password changes $ sudo -u postgres psql -c "ALTER USER postgres PASSWORD 'postgres';" ``` ```bash= # 產出html文件 $ dbt docs generate # host with server $ dbt docs serve # http://10.20.30.40:8080 ``` # Git repository and Git command ## 環境設定 ### config設定 ```command # 設定信箱 vagrant@vagrant:~$ git config --global user.email "h.h.chien@yandex.com" # 設定名稱 vagrant@vagrant:~$ git config --global user.name "Kevin" ``` ### ssh Key設定 + purpose:允許現在的主機與github server 通訊 + Step: + Go to this [website](https://docs.github.com/en/authentication/connecting-to-github-with-ssh/generating-a-new-ssh-key-and-adding-it-to-the-ssh-agent) for ssh generating and git settings + or follow the steps below ```command vagrant@vagrant:~$ ssh-keygen -t ed25519 -C "your-email@domain.com" #生成ssh key: private key and public key,ouput的圖型不重要 vagrant@vagrant:~$ eval "$(ssh-agent -s)" #啟動 ssh-agent vagrant@vagrant:~$ ssh-add ~/.ssh/id_ed25519 #確認ssh agent 能否正常運作 vagrant@vagrant:~$ cat ~/.ssh/id_ed25519.pub #印出public key,並將結果貼到git網站設定裡面的ssh上 vagrant@vagrant:~$ ssh -T git@github.com #測試是否連線成功 ``` ## 專案啟動 + 在本機啟動該專案的git版本控制 ``` console ## 將目前的路徑作為專案 vagrant@vagrant:~/your_project_dicretory$ git init ## 所有的檔案 add vagrant@vagrant:~/your_project_dicretory$ git add * ## commit vagrant@vagrant:~/your_project_dicretory$ git commit -m "type any message you want" ## 將Master分支改名為main分支 git branch -M main ``` + github 遠端儲存以及版本控制 + 先在 github 上建立該repository + 然後在終端機輸入 ```console vagrant@vagrant:~/your_project_dicretory$ git remote add origin git@github.com:your_user_name/your_project_name.git vagrant@vagrant:~/your_project_dicretory$ git push --set-upstream origin main ``` ```bash= ## 不需要同步的檔案 (個資、數據etc) vim .gitignore ## 在github上建立專案 ``` ## 常用git 指令 ### 分支相關 ```console git branch -a #確認現有全部有哪些分支 git branch new_branch_name #開設新分支 git checkout branch_name #切換分支 # 如果github上有main & develop兩個分支,可以直接切換,不需要先branch git chekcout develop # 從github上的main分支複製檔案下來 git pull # 將檔案推上github上的main分支 git push # 合併分支 git merge branch_name git rebase branch_name ``` ### add and commit ```console git add -A git commit -m 'any message' ``` # DBT (Data Build Tool) + 學習資源 (tutorial) ```bash= # sudo apt update # sudo apt install gh git clone https://github.com/dbt-labs/dbt-learn-demo.git git clone https://github.com/josephmachado/simple_dbt_project.git git clone https://github.com/dbt-labs/jaffle_shop.git ``` ## 創建 dbt 專案 ### 0. Environment and profile setting + create a database ```sql= create database database_name ; ``` + create a new direcotry called .dbt then create profiles.yml ```console vagrant@vagrant:~$ mkdir .dbt vagrant@vagrant:~$ vim profile.yml ``` 按<kbd>:</kbd><kbd>w</kbd><kbd>q</kbd>離開 ### 1. Set up a dbt project ```console vagrant@vagrant:~$ dbt init your_project_name ``` ### 2. Edit profiles.yml ```yaml= # example profiles.yml jaffle_shop: target: dev ouputs: dev: type: postgres host: localhost user: postgres password: postgres port: 5432 dbname: jaffle_shop schema: dbt_alice threads: 4 ``` + dbt debug warning/error messages: + dbt_project.yml variable changes + password changes ```yaml= # dbt_project.yml name: 'jaffle_shop' version: '0.1' profile: 'jaffle_shop' config-version: 2 ## "source-paths" becomes "model-paths" ## model-paths: ["models"] analysis-paths: ["analysis"] test-paths: ["tests"] ## "data-paths" becomes "seed-paths" ## seed-paths: ["data"] macro-paths: ["macros"] target-path: "target" clean-targets: - "target" - "dbt_modules" - "logs" models: jaffle_shop: materialized: table staging: materialized: view ``` ```bash= # password changes $ sudo -u postgres psql -c "ALTER USER postgres PASSWORD 'postgres';" ``` ```bash= # 產出html文件 $ dbt docs generate # host with server $ dbt docs serve # http://10.20.30.40:8080 ``` ## 小金庫計畫 ### Init a project and Create database ```bash= # init a project $ dbt init bank # To see what we should do in the next step $ dbt debug # create and modify the dbt_project.yml and profile .yml $ vim project, pofile # create a database called bank $ createdb bank ``` ### Ubuntu, vim 以及 git 指令 + Ubuntu ```bash= # linux # 返回上一層 $ cd .. ``` + vim ```vim # vim ## 開啟兩個垂直並列視窗 ## :Vex ## 儲存並離開 ## :wq ## 顯示行位 :set nu #進入編輯模式 i #離開編輯模式 esc #離開vim :q # 強制結束vim :q! https://stackoverflow.com/questions/27635787/vi-changing-tabs-to-comma-in-last-n-columns-of-a-file #1,$(全部) # s (尋找) # \t # g (global) :1,$ s:\t:,:g ``` + postgresql ```sql= ``` + vagrant ```bash= $ vagrant upload source destination ``` ```dbt= dbt --full-refresh ``` ## Knowledge about database + operating database + data warehouse + star schema ```console cat aaa.tsv cat aaa.tsv|tr-% > bbb.tsv ``` # Operational Database ## Edit the form of csv file ``` bash= # Change the format of csv (old regisform1) cat input.csv | sed -e '1,4d' | awk ' BEGIN {FS=","; OFS=","} {print $3, $5, $11, $13, $2}' # Change the format of csv (old regisform2) cat input.csv | awk ' BEGIN {OFS = ","} {print $3, $5, $11, $13, $2}' # Change the file to a sql query cat input2.csv | sed -e "s/,/','/g" | sed -e "s/^./'&/g" | sed -e "s/$/'/g" > input3.csv cat input3.csv | awk 'BEGIN {FS=","; OFS=","} {print "INSERT INTO students (id, student_name, student_birthday, telephone, classroom_id) VALUES (DEFAULT," $1, $2, $3, $5");"} ' # awk 編輯 FS = ","; OFS = "," ``` ```bash= # Create an (operatioinal) database createdb kuops_dev # 第一次 執行 clojure lienen vim dev-config.edn leinen run # 把 Lauren的 table schema migrate 到自己的schema leinen run migrate # check postgres -i -u postgres psql # insert sudo -i -u postgres psql -d kuops_dev -f input4.csv PGOPTIONS=--search_path=dbt_develop psql -v ON_ERROR_STOP=1 -d validator -f ops_student.sql ``` ```sql= /* Database Management */ create user username ; /* create user */ CREATE GROUP marketing WITH USER jonathan, david; grant connect on database database_name to username ; grant usage, create on schema schema_name to username ; GRANT ALL PRIVILEGES ON ALL TABLES IN SCHEMA schema_name TO username; ``` + 批量刪除 ```sql= do $$ declare add integer := (SELECT count(*) FROM operation.ops_enrollment WHERE date(created_at) = current_date); begin while add > 0 loop raise notice 'Delete 10 rows'; delete from operation.ops_enrollment where enrollment_id in (SELECT enrollment_id FROM operation.ops_enrollment WHERE date(created_at) = current_date limit 10); add := add-10; end loop; end$$ ; ```