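###### tags: `Hadoop`

# CDH Deployment Steps

:information_source: What is CDH:
**Apache Hadoop itself is lightweight and easy to deploy, but as the number of required components grows, managing and monitoring the dependencies among the ecosystem components becomes difficult. Cloudera therefore developed CDH (Cloudera's Distribution Including Apache Hadoop), which packages the entire Hadoop ecosystem and is also open source.**

![](https://i.imgur.com/x9dpay6.png)
![](https://i.imgur.com/E86gCRS.png)

:+1: The graphical interface makes management and monitoring more convenient, and changing parameters is easier.

:::danger
Notes
**Cluster and single-node deployments work in much the same way; CDH packages and assigns everything for you.**
1. This example uses 5 VMs created on Google GCP
2. This example uses CentOS 7
3. Each VM in this example has 8 cores (including hyper-threading), 16 GB RAM, and a 60 GB HDD
4. This example uses CDH 6.3
5. **You are free to choose your own Linux version, hardware configuration, and CDH version**
6. Note: CentOS 7 on GCP already ships with NTP software pointed at Google's NTP servers, so no further setup is needed. If you use another environment that has no NTP, be sure to configure it
:::

:information_source: **For detailed hardware requirements, deployment on each mainstream Linux distribution, and details of the different CDH versions, [see the official Cloudera website](https://docs.cloudera.com/documentation/enterprise/6/release-notes/topics/rg_hardware_requirements.html).**

:information_source: **The three major public clouds are recommended (all offer a free tier):**
1. Google GCP is the recommended place to practice: it is the simplest and most intuitive, and most importantly it has a server region in Taiwan (very fast network)
2. AWS EC2 requires some advanced settings and is slightly more complex
3. Microsoft Azure has a counterintuitive interface

* For how to create a VM on each of the three public clouds, see [GCP](https://hackmd.io/@JeffWen/gcp-vm), [EC2](https://hackmd.io/@JeffWen/aws-vm), and [Azure](https://hackmd.io/3ud2dx4USCqP5rln80NGHw)

* Linux version
![](https://i.imgur.com/MAemVtW.png)
CentOS Linux release 7.8.2003 (Core)
* Install OpenJDK
![](https://i.imgur.com/vaBnxqz.png)
* OpenJDK version
![](https://i.imgur.com/jeVVf3V.png)
* Set JAVA_HOME
![](https://i.imgur.com/6OV48Es.png)
* Disable the firewall
![](https://i.imgur.com/vFRdZGB.png)
:warning: Note: this deployment assumes the CDH ports are open on the internal network only. You must put a firewall in front of the external network; otherwise port 8088 (the YARN ResourceManager web UI) is likely to be hit by cryptomining attacks.
* Disable SELinux
![](https://i.imgur.com/gYoamu7.png)
* Set the Taiwan time zone
![](https://i.imgur.com/oSlXMUS.png)
* Use the Google time server
![](https://i.imgur.com/p2HYqWL.png)
:information_source: The Google time server shown here is only available on GCP; in other environments, use a different time server.
* Set the root password
![](https://i.imgur.com/ECe5DQD.png)
:information_source: Once passwordless login is set up, root is no longer used for management.
* Edit the SSH configuration
![](https://i.imgur.com/P3vMFY6.png)
* Edit the SSH configuration (continued)
![](https://i.imgur.com/6MNKwm9.png)
* Restart SSH
![](https://i.imgur.com/LKn5Gzf.png)
* Disable Transparent Hugepages and apply the other settings
![](https://i.imgur.com/ffbm957.png)
```bash=
# Set swappiness now and persist it across reboots
echo 0 > /proc/sys/vm/swappiness
echo "vm.swappiness=0" >> /etc/sysctl.conf
echo "echo 0 > /proc/sys/vm/swappiness" >> /etc/rc.d/rc.local
# Disable Transparent Hugepages (lasts until the next reboot)
echo never > /sys/kernel/mm/transparent_hugepage/enabled
echo never > /sys/kernel/mm/transparent_hugepage/defrag
# Raise the open-file and process limits for all users
echo "* soft nofile 128000" >> /etc/security/limits.conf
echo "* hard nofile 128000" >> /etc/security/limits.conf
echo "* soft nproc 128000" >> /etc/security/limits.conf
echo "* hard nproc 128000" >> /etc/security/limits.conf
# Apply the new limits to the current shell
ulimit -SHn 128000
ulimit -SHu 128000
```
* Set up the hosts list
![](https://i.imgur.com/5Coas5h.png)
* Add the MySQL 5.7 repository
![](https://i.imgur.com/i734AOw.png)
* Install MySQL
![](https://i.imgur.com/jQ1q8xA.png)
* Start MySQL
![](https://i.imgur.com/0rWYsC2.png)
* Show the randomly generated root password
![](https://i.imgur.com/UxskQ8A.png)
```bash
grep 'temporary password' /var/log/mysqld.log
```
* Log in to MySQL as root
![](https://i.imgur.com/OMgJywl.png)
* Ecosystem accounts and how they log in
![](https://i.imgur.com/uj73U3q.png)
```sql=
/* Example: create the accounts, password, and database */
CREATE USER 'hive'@'localhost' IDENTIFIED BY 'hive12345678';
CREATE USER 'hive'@'%' IDENTIFIED BY 'hive12345678';
CREATE DATABASE metastore DEFAULT CHARACTER SET utf8 DEFAULT COLLATE utf8_general_ci;
```
:information_source: Database accounts officially recommended for the ecosystem components
![](https://i.imgur.com/FmZLfVv.png)

----
###### Add CM
* Add the CM repository
![](https://i.imgur.com/zidEbFZ.png)
* Install CM
![](https://i.imgur.com/6qJg1Jf.png)

---
###### Custom version
https://archive.cloudera.com/cm6/
* Manually download a CM version
![](https://i.imgur.com/DG7Ff0h.png)
* Install CM
![](https://i.imgur.com/i8m8ss4.png)
```bash=
wget https://archive.cloudera.com/cm6/6.3.1/redhat7/yum/RPMS/x86_64/cloudera-manager-server-6.3.1-1466458.el7.x86_64.rpm
wget https://archive.cloudera.com/cm6/6.3.1/redhat7/yum/RPMS/x86_64/cloudera-manager-daemons-6.3.1-1466458.el7.x86_64.rpm
# Install both packages in one transaction so yum can resolve the server's dependency on the daemons package
yum localinstall -y cloudera-manager-daemons-*.rpm cloudera-manager-server-*.rpm
```

---
* Download the MySQL connector
![](https://i.imgur.com/uS36Cpe.png)
```bash=
# Download and unpack MySQL Connector/J 5.1.46
wget https://dev.mysql.com/get/Downloads/Connector-J/mysql-connector-java-5.1.46.tar.gz
tar zxvf mysql-connector-java-5.1.46.tar.gz
# Copy the driver to /usr/share/java under a version-less file name
mkdir -p /usr/share/java/
cd mysql-connector-java-5.1.46
cp mysql-connector-java-5.1.46-bin.jar /usr/share/java/mysql-connector-java.jar
```
* Connect CM to MySQL
```bash
/opt/cloudera/cm/schema/scm_prepare_database.sh mysql scm scm
# "scm" can be replaced with whichever MySQL account you want CM to use; that account must have its own dedicated database in MySQL
# /opt/cloudera/cm/schema/scm_prepare_database.sh <databaseType> <databaseName> <databaseUser>
```
![](https://i.imgur.com/C7YGGBE.png)
* Start the Cloudera Manager (CM) server
![](https://i.imgur.com/pv2HsN9.png)
:information_source: Once port 7180 is listening, you can log in to the CM server.

---
* In a browser, enter the CM server's IP (or hostname) followed by port 7180, e.g. `192.168.1.1:7180` or `Hostname:7180`
:warning: If you deploy on one of the three public clouds, remember to open port 7180 in the firewall rules
:warning: Never open port 8088 to all IPs; restrict it to specific IPs!
* The default login username and password are both `admin`
![](https://i.imgur.com/dNuBGMn.png)
* Welcome screen
![](https://i.imgur.com/vp53ysv.png)
* Accept the license
![](https://i.imgur.com/v3uZVSa.png)
* Select the edition you need
![](https://i.imgur.com/Ela9kCW.png)
* Click Next
![](https://i.imgur.com/XPbnejT.png)
* Set the cluster name
![](https://i.imgur.com/QaQDbrk.png)
* Search for the machines listed in the hosts file
![](https://i.imgur.com/u2w6Qew.png)
:information_source: hosts reference
![](https://i.imgur.com/dm6kmu5.png)
* Select the CM Agent repository and the CDH version
![](https://i.imgur.com/DwVYFeh.png)
:::success
1. There are two kinds of repository, local and public. With a public repository, packages are downloaded from the internet and then distributed to each node. With a local repository, you download the packages first and run a local server that serves them to the other nodes.
2. If you do not have many nodes, downloading and installing in advance is recommended to save time.
3. [How to use a local repository (official site)](https://docs.cloudera.com/documentation/enterprise/6/6.3/topics/cm_ig_create_local_package_repo.html)
:::
* OpenJDK is recommended
![](https://i.imgur.com/Ri5dSR5.png)
:information_source: Java version overview
![](https://i.imgur.com/21zGl5J.png)
* Set up passwordless login for the cluster
![](https://i.imgur.com/wwDKSAz.png)
:::warning
1. Make sure root login is enabled in the SSH configuration
![](https://i.imgur.com/c3ioja2.png)
![](https://i.imgur.com/v4dVS8f.png)
:information_source: Root is only needed during deployment; afterwards, each component is managed by its own dedicated account.
:::
* Install the agent on each node
![](https://i.imgur.com/RohrRoo.png)
* Distribute and activate CDH on each node
![](https://i.imgur.com/Cqy8j7a.png)
* Cluster inspection
![](https://i.imgur.com/t8lJgFH.png)
:::danger
1. Disabling Transparent Hugepages
2. Disable the tuned Service
3. Setting the vm.swappiness Linux Kernel Parameter
![](https://i.imgur.com/PVCxogu.png)
:::
* Customize the cluster's ecosystem services
![](https://i.imgur.com/BI36ems.png)
* Assign roles as needed
![](https://i.imgur.com/yttjUqs.png)
:information_source: HA: 2 (master) + 3 (worker); placing ZooKeeper on the workers is recommended
![](https://i.imgur.com/QrGGTgu.png)
* Set up the metadata databases required by the ecosystem packages
![](https://i.imgur.com/SkmoYfP.png)
:information_source: Officially recommended database names and accounts
![](https://i.imgur.com/3D3gF6W.png)
* After setup, the first run is executed (NameNode format, creation of the Hadoop ecosystem accounts, and so on)
![](https://i.imgur.com/X1bjdjL.png)
* Deployment complete (adjust each component's parameters afterwards as needed)
![](https://i.imgur.com/Rbn4Ufb.png)

---
#### Testing with the Pi program
```bash
sudo -u hdfs hadoop jar /opt/cloudera/parcels/CDH/lib/hadoop-mapreduce/hadoop-mapreduce-examples.jar pi 10 30
```

### Appendix - HA Configuration
#### NameNode HA
* Deploy an odd number of ZooKeeper and JournalNode instances

:information_source: JournalNodes (JNs) hold the edit log written by the active NameNode so that the standby NameNode can read it and stay in sync
:information_source: ZooKeeper (ZK) is used to implement automatic failover
:::danger
Apache Hadoop official site:
There must be at least 3 JournalNode daemons, since edit log modifications must be written to a majority of JNs. This will allow the system to tolerate the failure of a single machine. You may also run more than 3 JournalNodes, but in order to actually increase the number of failures the system can tolerate, you should run an odd number of JNs, (i.e. 3, 5, 7, etc.). Note that when running with N JournalNodes, the system can tolerate at most (N - 1) / 2 failures and continue to function normally.
![](https://i.imgur.com/lZHAwsR.png)
:::
1. Enable HDFS HA
![](https://i.imgur.com/EYdFaYd.png)
2. Set the Nameservice name
![](https://i.imgur.com/6GjClcV.png)
3. Select the nodes for the Standby NameNode and the JournalNodes
![](https://i.imgur.com/firLUOb.png)
:information_source: Reference layout
![](https://i.imgur.com/f1sPyUA.png)
4. Set the edit log directory, enable automatic failover, and configure the shared edits
![](https://i.imgur.com/HJF3WDI.png)
5. Apply the settings
![](https://i.imgur.com/RAkUKMM.png)

#### ResourceManager HA
1. Enable YARN HA
![](https://i.imgur.com/rfxPUf9.png)
2. Select the Standby ResourceManager
![](https://i.imgur.com/s50qUBY.png)
3. Apply the settings
![](https://i.imgur.com/GmmFrZb.png)

### Appendix - Notes for Single-Node Deployment
1. After deployment, change the HDFS `dfs.replication` setting to 1 and restart
![](https://i.imgur.com/24WXxYd.png)
2. After the restart, run `sudo -u hdfs hadoop fs -setrep -R -w 1 /`
:information_source: This changes everything under every directory from the default of 3 replicas to 1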
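
The single-node note above sets `dfs.replication` to 1. As a rough sketch that is not part of the original steps, the snippet below compares the configured default with the value you expect. The helper name `check_replication` is hypothetical. `hdfs getconf -confKey` reads the client configuration on the host where it runs, so this assumes the updated client configuration has already been deployed to that host.

```bash
#!/usr/bin/env bash
# Sketch only: check_replication is a hypothetical helper, not a CDH tool.
# It compares an observed replication factor ($1) with the expected one ($2).
check_replication() {
  local actual="$1" expected="$2"
  if [ "$actual" = "$expected" ]; then
    echo "OK: dfs.replication=$actual"
  else
    echo "MISMATCH: dfs.replication=$actual, expected $expected"
    return 1
  fi
}

# Query the configuration only when the hdfs client is installed on this host.
if command -v hdfs >/dev/null 2>&1; then
  check_replication "$(hdfs getconf -confKey dfs.replication)" 1
fi
```

To spot-check a single file after the `-setrep` step, `hadoop fs -stat %r /path/to/file` prints that file's replication factor (the path here is a placeholder).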