###### tags: `Hadoop`
# Notes on Building a Cluster
## Hadoop
Background: see https://hackmd.io/@JeffWen/HADOOP
* Hadoop's three modes:
    1. Standalone Mode: no background Java daemons are started; this mode is suited to development, testing, and debugging.
    2. Pseudo-Distributed Mode: all of Hadoop's Java daemons run on a single node, simulating a small cluster. [Setup steps here](https://hackmd.io/@JeffWen/bdsevm).
    3. Fully-Distributed Mode: Hadoop's Java daemons run across several hosts.

| Property | Standalone | Pseudo-distributed | Fully distributed |
|:---:|:--------:|:--------:|:--------:|
| fs.defaultFS | file:/// | hdfs:/// | hdfs:/// |
| dfs.replication | N/A | 1 | 3 |
| mapreduce.framework.name | N/A | yarn | yarn |
| yarn.resourcemanager.hostname | N/A | localhost | resourcemanager |
| yarn.nodemanager.aux-services | N/A | mapreduce_shuffle | mapreduce_shuffle |

### <Big>Outline</Big>
:::warning
**1. [Hadoop cluster base setup](#base)
2. [Installing Spark and Jupyter](#spark)
3. [High-availability (HA) cluster setup](#ha)
4. [Normal cluster startup/shutdown procedure](#normal)
5. [HA cluster startup/shutdown procedure](#HA)
6. [Hadoop lazy scripts](#shell)
7. [SparkR setup: see here](https://hackmd.io/@JeffWen/sparkR)**
:::

<h3 id="base">Hadoop cluster base setup</h3>

0. Preparation
    1. Find an unused range on the DHCP server's subnet
       ![](https://i.imgur.com/Q5y7bho.png)
       Addresses 30-34 of the subnet will serve as the IPs of the five machines.
       :information_source: [DHCP primer](http://linux.vbird.org/linux_server/0340dhcp.php)
    2. Add the machines to the Windows hosts file
       ![](https://i.imgur.com/fGOHVZa.png)
       ![](https://i.imgur.com/f1eDQfo.png)
       :information_source: [hosts file primer](https://zh.wikipedia.org/wiki/Hosts%E6%96%87%E4%BB%B6)
    3. Have an Ubuntu 18.04 server ready; see [Ubuntu 18.04 LTS Server installation made easy~~](https://hackmd.io/@JeffWen/ByVQYR2M8)
    4. Resources per machine: 4 CPUs, 8 GB RAM; 1 NameNode (nna), 1 ResourceManager (rma), 3 workers

---

1. Disable IPv6 (**as root**)
    1. Check the network and listening sockets (switch to root)
    ```bash=
    ip addr show
    lsof -nPi
    ```
    ![](https://i.imgur.com/mB24bEw.png)
    2. Edit the boot configuration
    ```bash
    nano /etc/default/grub
    ```
    ![](https://i.imgur.com/Ruf4tH9.png)
    3. Regenerate the boot configuration
    ```bash
    update-grub
    # update-grub2
    ```
    ![](https://i.imgur.com/J9Nd9Wr.png)
    4. Reboot
    ```bash
    reboot
    ```
    5. Check that IPv6 is now disabled
    ```bash=
    ip addr show
    lsof -nPi
    ```

---

2. Install pip (**as root**)
    1. Install the Python development package
    ```bash=
    sudo apt update
    sudo apt install python3-dev
    ```
    2. Install pip
    ```bash=
    # Fetch the latest pip bootstrap script
    wget https://bootstrap.pypa.io/get-pip.py
    python3 get-pip.py
    ```

---

3. Create the hadoop account (**as root**)
    1. Add the hadoop user
    ```bash
    sudo adduser hadoop
    ```
    2. Verify it was created
    ```bash=
    grep 'hadoop' /etc/passwd
    grep 'hadoop' /etc/group
    grep 'hadoop' /etc/shadow
    ls -l /home
    ```
    ![](https://i.imgur.com/NrHWEY3.png)
4. Install OpenJDK 8 (**as root**)
    1. Update the package index
    ```bash
    apt update
    ```
    2. Install OpenJDK
    ```bash
    apt install openjdk-8-jdk
    ```
    3. Confirm the JDK and JRE versions
    ```bash=
    java -version
    javac -version
    ```
    ![](https://i.imgur.com/G0RMOdB.png)
    4. Create an environment script for OpenJDK
    ```bash
    nano /etc/profile.d/jdk.sh
    ```
    5. Set the OpenJDK environment variable
    ```bash
    export JAVA_HOME='/usr/lib/jvm/java-8-openjdk-amd64'
    ```
    ![](https://i.imgur.com/NRulRia.png)
    6. Reload the script and check that the setting took effect
    ```bash
    source /etc/profile.d/jdk.sh
    # . /etc/profile.d/jdk.sh
    ```
    ![](https://i.imgur.com/3jtuRKc.png)
5. Set up passwordless login (**as hadoop**)
    1. Switch to the hadoop account
    ```bash
    su - hadoop
    ```
    2. Generate an SSH key pair
    ```bash
    ssh-keygen -t rsa
    ```
    ![](https://i.imgur.com/HHdLYDn.png)
    ![](https://i.imgur.com/BRa0XU2.png)
    3. Copy the public key to the hadoop account
    ```bash
    ssh-copy-id hadoop@localhost
    ```
    ![](https://i.imgur.com/qJ5gDVs.png)
    4. Test the passwordless login (no password prompt means it works)
    ```bash
    ssh hadoop@localhost
    ```
    ![](https://i.imgur.com/0nz0spb.png)
    :warning: **Remember to exit right away!!!**
    ~~5. Push your public key to the other machines (skip this on a single-machine playground)~~
    :warning: **This is not the canonical approach; it grants write access**
    ```bash=
    ssh-copy-id hadoop@192.168.XX.XXX
    ssh-copy-id hadoop@192.168.XX.XXX
    ssh-copy-id hadoop@192.168.XX.XXX
    ```

---

6. Create the Linux hosts file (**as root**)
```bash=
nano /etc/hosts
```
![](https://i.imgur.com/yunaC1S.png)

---
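Before cloning this machine later on (step 15), it is worth a quick local sanity pass over the prerequisites above. A minimal sketch, not part of the original procedure, run as the hadoop user:

```bash
# Local pre-clone sanity check.
ip addr show | grep -q inet6 && echo "IPv6 still enabled" || echo "IPv6 is off"
java -version 2>&1 | head -n 1          # expect: openjdk version "1.8.0_..."
ssh -o BatchMode=yes hadoop@localhost true && echo "passwordless SSH OK"
```

Anything caught here is far cheaper to fix now than on five cloned machines.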
7. Download and install Hadoop (**as root**)
    1. Download
    ```bash=
    cd
    wget http://ftp.tc.edu.tw/pub/Apache/hadoop/common/hadoop-3.2.1/hadoop-3.2.1.tar.gz
    ```
    :information_desk_person: If the mirror link is dead, download from the [official site~~](https://hadoop.apache.org/releases.html)
    2. Extract
    ```bash
    tar -tvf hadoop-3.2.1.tar.gz   # inspect the archive contents first
    tar -xvf hadoop-3.2.1.tar.gz -C /usr/local
    ```
    3. Rename
    ```bash
    mv /usr/local/hadoop-3.2.1 /usr/local/hadoop
    ```
    4. Change the owner of the directory and its files
    ```bash
    chown -R hadoop:hadoop /usr/local/hadoop
    ```

---

8. Set the hadoop user's environment variables (**as hadoop**)
    1. Edit .bashrc
    ```bash
    nano ~/.bashrc
    ```
    ![](https://i.imgur.com/FdzPaih.png)
    ```bash=
    # Set HADOOP_HOME
    export HADOOP_HOME=/usr/local/hadoop
    # Set HADOOP_MAPRED_HOME
    export HADOOP_MAPRED_HOME=${HADOOP_HOME}
    # Add Hadoop bin and sbin directory to PATH
    export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
    ```
    2. Reload the file
    ```bash
    source ~/.bashrc
    # . .bashrc
    ```
    3. Check the environment variables
    ![](https://i.imgur.com/mLRFOgM.png)

---

9. Edit the Hadoop runtime environment script (**as hadoop**)
```bash
nano /usr/local/hadoop/etc/hadoop/hadoop-env.sh
```
![](https://i.imgur.com/SaJCOJS.png)
```bash=
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
export HADOOP_HOME=/usr/local/hadoop
export HADOOP_CONF_DIR=/usr/local/hadoop/etc/hadoop
```

---

10. Edit Hadoop core-site.xml (**as hadoop**)
```bash
nano /usr/local/hadoop/etc/hadoop/core-site.xml
```
![](https://i.imgur.com/ECXvLk6.png)
```xml=
<property>
    <name>hadoop.tmp.dir</name>
    <value>/home/hadoop/data</value>
    <description>Temporary Directory.</description>
</property>
<property>
    <name>fs.defaultFS</name>
    <value>hdfs://test30.example.org</value>
    <description>Use HDFS as file storage engine</description>
</property>
```
:information_source: Since 3.2.0, Hadoop ships a config syntax checker
```bash
hadoop conftest
```
![](https://i.imgur.com/floyZJQ.png)

---

11. Edit Hadoop mapred-site.xml (**as hadoop**)
![](https://i.imgur.com/83zJ63V.png)
The sizing follows the Hortonworks recommendations (see the sketch after this step)
```bash
nano /usr/local/hadoop/etc/hadoop/mapred-site.xml
```
![](https://i.imgur.com/IXQ4srT.png)
```xml=
<property>
    <name>mapreduce.map.memory.mb</name>
    <value>2048</value>
</property>
<property>
    <name>mapreduce.map.java.opts</name>
    <value>-Xmx1638m</value>
</property>
<property>
    <name>mapreduce.reduce.memory.mb</name>
    <value>4096</value>
</property>
<property>
    <name>mapreduce.reduce.java.opts</name>
    <value>-Xmx3276m</value>
</property>
<property>
    <name>yarn.app.mapreduce.am.resource.mb</name>
    <value>4096</value>
</property>
<property>
    <name>yarn.app.mapreduce.am.command-opts</name>
    <value>-Xmx3276m</value>
</property>
<property>
    <name>mapreduce.task.io.sort.mb</name>
    <value>819</value>
</property>
<property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
</property>
<property>
    <name>mapreduce.jobhistory.address</name>
    <value>test32.example.org:10020</value>
</property>
<property>
    <name>mapreduce.jobhistory.webapp.address</name>
    <value>test32.example.org:19888</value>
</property>
```

---
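The `-Xmx` values above are not arbitrary: the Hortonworks guideline sets each JVM heap to roughly 0.8x its container size, and `mapreduce.task.io.sort.mb` to roughly 0.4x the map container. A quick arithmetic check of the numbers used here:

```bash
# Heap = 0.8 * container: 2048 MB -> -Xmx1638m, 4096 MB -> -Xmx3276m.
for container_mb in 2048 4096; do
    echo "${container_mb} MB container -> -Xmx$(( container_mb * 8 / 10 ))m"
done
# Sort buffer = 0.4 * map container: 2048 MB -> 819 MB.
echo "io.sort.mb -> $(( 2048 * 4 / 10 ))"
```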
12. Edit Hadoop yarn-site.xml (**as hadoop**)
```bash
nano /usr/local/hadoop/etc/hadoop/yarn-site.xml
```
![](https://i.imgur.com/5rMard1.png)
```xml=
<property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
</property>
<property>
    <name>yarn.nodemanager.env-whitelist</name>
    <value>JAVA_HOME,HADOOP_COMMON_HOME,HADOOP_HDFS_HOME,HADOOP_CONF_DIR,CLASSPATH_PREPEND_DISTCACHE,HADOOP_YARN_HOME,HADOOP_MAPRED_HOME</value>
</property>
<property>
    <name>yarn.scheduler.minimum-allocation-mb</name>
    <value>2048</value>
</property>
<property>
    <name>yarn.scheduler.maximum-allocation-mb</name>
    <value>6144</value>
</property>
<property>
    <name>yarn.nodemanager.resource.memory-mb</name>
    <value>6144</value>
</property>
<property>
    <name>yarn.nodemanager.resource.detect-hardware-capabilities</name>
    <value>true</value>
</property>
<property>
    <name>yarn.nodemanager.resource.cpu-vcores</name>
    <value>3</value>
</property>
<property>
    <name>yarn.resourcemanager.hostname</name>
    <value>test31.example.org</value>
</property>
<property>
    <name>yarn.resourcemanager.scheduler.class</name>
    <value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler</value>
</property>
```

---

13. Edit Hadoop hdfs-site.xml (**as hadoop**)
```bash
nano /usr/local/hadoop/etc/hadoop/hdfs-site.xml
```
![](https://i.imgur.com/CfFZs01.png)
```xml=
<property>
    <name>dfs.permissions.superusergroup</name>
    <value>hadoop</value>
    <description>The name of the group of super-users. The value should be a single group name.</description>
</property>
```

---

14. Create the Hadoop workers file (**as root**)
```bash
nano /usr/local/hadoop/etc/hadoop/workers
```
![](https://i.imgur.com/ZCqlC08.png)

---

15. Clone the machine (**as root**)
    * Pre-clone checklist:
        * IPv6 disabled
        * hadoop account created
        * passwordless login working
        * hosts file created
        * workers file created
        * OpenSSH server installed
        * OpenJDK 8 installed and configured
        * Hadoop downloaded, with environment variables and configs done
        * ***Double-check everything before cloning, or you will redo a lot of work...***
    1. Copy the VM folder and rename it
    ![](https://i.imgur.com/tqaMaER.png)
    2. Change the UUID (to match the folder name)
    ![](https://i.imgur.com/SBuwnVW.png)
    3. At first boot choose "I Copied It" (a new MAC address is generated automatically)
    ![](https://i.imgur.com/ABYpY39.png)
    4. Edit the cloud.cfg file
    ```bash
    sudo nano /etc/cloud/cloud.cfg
    ```
    ![](https://i.imgur.com/b1PzJnl.png)
    5. Change the hostname
    ```bash
    hostnamectl set-hostname <HOSTNAME>   # pick your own HOSTNAME
    ```
    ![](https://i.imgur.com/pBBw6Dm.png)
    6. Edit 50-cloud-init.yaml to change the IP
    ```bash
    sudo nano /etc/netplan/50-cloud-init.yaml
    ```
    ![](https://i.imgur.com/partsPW.png)
    7. Apply the network settings
    ```bash
    sudo netplan apply
    ```
    ![](https://i.imgur.com/zQmkAJ9.png)
    8. Reboot (repeat for as many machines as you have...)
    ```bash
    reboot
    ```
    * Boot all the machines
    ![](https://i.imgur.com/LkwuHZ6.jpg)

---

16. Format the NameNode (**as hadoop**)
```bash
hdfs namenode -format   # on the NameNode machine only
```
![](https://i.imgur.com/HyyduLX.png)

---

17. Start HDFS (**as hadoop**)
```bash=
start-dfs.sh   # on the NameNode machine only
```
![](https://i.imgur.com/CR533V3.png)
![](https://i.imgur.com/0nYeeC3.png)
![](https://i.imgur.com/pUFSslT.png)
http://test30.example.org:9870

---

18. Start YARN (**as hadoop**)
```bash=
start-yarn.sh   # on the ResourceManager machine only
```
![](https://i.imgur.com/08Bk1Wp.png)
![](https://i.imgur.com/Vf3xnh2.png)
![](https://i.imgur.com/GrkR9jR.png)
http://test31.example.org:8088/

---

19. Start the History Server (**as hadoop**)
```bash=
mapred --daemon start historyserver   # on the History Server machine only
# mr-jobhistory-daemon.sh start historyserver (deprecated)
```
![](https://i.imgur.com/h20b78M.png)
![](https://i.imgur.com/Rj7fFw1.png)

---
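With every daemon up, one pass of `jps` over all five nodes confirms the layout before running any job. A sketch, assuming the nodes are test30 through test34 as in the URLs above and that passwordless SSH is in place:

```bash
# List the running Hadoop daemons on every node in one pass.
for host in test30 test31 test32 test33 test34; do
    echo "== ${host} =="
    ssh "hadoop@${host}" 'jps | grep -v Jps'
done
```

Expected result: NameNode on test30, ResourceManager on test31, JobHistoryServer on test32 (per mapred-site.xml), and DataNode plus NodeManager on the three workers.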
20. Run a pi job to test MapReduce (**as hadoop**)
```bash=
hadoop jar /usr/local/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.2.1.jar pi 30 100
```
![](https://i.imgur.com/2TAltZW.png)
![](https://i.imgur.com/3hPwPwP.png)
![](https://i.imgur.com/LcgnWeG.png)
![](https://i.imgur.com/0B4yjNo.png)
The hadoop user's HDFS directory is created automatically
![](https://i.imgur.com/HQfxlPk.png)
<Big>Congratulations, the first stage (basic Hadoop setup) is done~~~</Big>

---

<h3 id="spark">Installing Spark and Jupyter</h3>

0. First make sure all cluster services are running
**:information_source: See the [normal cluster startup/shutdown procedure](#normal)**
1. Download and install Spark (**as root**)
    1. Download
    ```bash=
    cd
    wget http://ftp.tc.edu.tw/pub/Apache/spark/spark-2.4.4/spark-2.4.4-bin-hadoop2.7.tgz
    ```
    2. Extract
    ```bash
    tar -xvf spark-2.4.4-bin-hadoop2.7.tgz -C /usr/local
    ```
    3. Rename
    ```bash
    mv /usr/local/spark-2.4.4-bin-hadoop2.7 /usr/local/spark
    ```
    4. Change the owner of the Spark directory and its files
    ```bash
    chown -R hadoop:hadoop /usr/local/spark
    ```

---

2. Set the Spark environment variables (**as hadoop**)
    1. Edit .bashrc
    ```bash
    nano ~/.bashrc
    ```
    ![](https://i.imgur.com/yxskMpp.png)
    2. Reload the file
    ```bash
    source ~/.bashrc
    # . .bashrc
    ```
    3. Check the environment variables
    ![](https://i.imgur.com/VaUwRiU.png)

---

3. Edit the Spark runtime environment script (**as hadoop**)
    1. Create spark-env.sh from the template
    ```bash
    cp /usr/local/spark/conf/spark-env.sh.template /usr/local/spark/conf/spark-env.sh
    ```
    2. Edit the spark-env script
    ```bash
    nano /usr/local/spark/conf/spark-env.sh
    ```
    ![](https://i.imgur.com/ij1RPYu.png)

---

4. Run a pi job to test Spark (**as hadoop**)
```bash=
cd $SPARK_HOME
./bin/spark-submit --class org.apache.spark.examples.SparkPi \
    --master yarn \
    --deploy-mode cluster \
    --driver-memory 1g \
    --executor-memory 1g \
    --executor-cores 1 \
    --num-executors 3 \
    --queue default \
    examples/jars/spark-examples*.jar \
    100
```
![](https://i.imgur.com/RZABqfU.png)
![](https://i.imgur.com/bPOqpn1.png)
![](https://i.imgur.com/oMoDOw3.png)
![](https://i.imgur.com/2DuS7x0.png)

---

* Spark clearly outruns MapReduce (same pi job, 100 samples)
![](https://i.imgur.com/dHCrGcS.png)
MapReduce took 3 minutes 11 seconds; Spark took 14 seconds

---

5. Stop Spark from uploading the jar files to HDFS on every run (**as hadoop**)
![](https://i.imgur.com/roEKIYC.png)
![](https://i.imgur.com/5zK9vjK.png)
* *Uploading that many files on every run is a nuisance...*
    1. Create an HDFS directory for the jars
    ```bash
    hdfs dfs -mkdir -p /user/spark/share/jars
    ```
    2. Upload the jars to HDFS
    ```bash
    hdfs dfs -put $SPARK_HOME/jars/* /user/spark/share/jars/
    ```
    ![](https://i.imgur.com/kyfu3b4.png)
    3. Confirm all the jars were uploaded
    ```bash
    hdfs dfs -ls /user/spark/share/jars | wc -l
    ```
    ![](https://i.imgur.com/DDyxtSP.png)
    4. Point spark-defaults.conf at the HDFS path
    ```bash=
    cp /usr/local/spark/conf/spark-defaults.conf.template /usr/local/spark/conf/spark-defaults.conf
    nano /usr/local/spark/conf/spark-defaults.conf
    ```
    ![](https://i.imgur.com/1asowvX.png)
    5. Run a pi job to check
    ![](https://i.imgur.com/UJycGf3.png)
    A stream of "Not copying" messages means it worked
    ![](https://i.imgur.com/lGHepEw.png)
    Runtime dropped by one second

---

6. Use the PySpark shell (**as hadoop**)
    1. Use Spark's README as the test input
    ![](https://i.imgur.com/mSoTsEH.png)
    2. Open the pyspark shell
    ```bash=
    cd $SPARK_HOME
    ./bin/pyspark --master yarn --deploy-mode client --num-executors 1 --executor-cores 1
    ```
    ![](https://i.imgur.com/P2v6Go8.png)
    ![](https://i.imgur.com/3dSZ52L.png)
    3. Run some code
    ![](https://i.imgur.com/seyBcxi.png)

---

7. Install the Jupyter and PySpark packages (**as root**)
    1. Install the pyspark package
    ```bash=
    pip3 install pyspark
    ```
    2. Install the Jupyter packages
    ```bash=
    pip3 install jupyterlab
    ```

---
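The same README test can also be run non-interactively, which is handy for checking the Spark-on-YARN setup from a script. A sketch, assuming the README was first uploaded to the hadoop user's HDFS home directory (`hdfs dfs -put $SPARK_HOME/README.md`) as in the screenshots above:

```bash
# Non-interactive PySpark smoke test against YARN.
cat > /tmp/smoke_test.py <<'EOF'
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("smoke-test").getOrCreate()
lines = spark.read.text("README.md")   # relative path resolves to the HDFS home dir
print("lines mentioning Spark:", lines.filter(lines.value.contains("Spark")).count())
spark.stop()
EOF
$SPARK_HOME/bin/spark-submit --master yarn --deploy-mode client /tmp/smoke_test.py
```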
8. Remote access and a password for Jupyter (**as a regular user**)
    1. Generate the Jupyter config file
    ```bash
    jupyter notebook --generate-config
    ```
    ![](https://i.imgur.com/E1GRhbo.png)
    2. Edit the config file
    ```bash
    nano .jupyter/jupyter_notebook_config.py
    ```
    3. Open the login address to all interfaces
    ```bash
    c.NotebookApp.ip = '0.0.0.0'
    ```
    ![](https://i.imgur.com/YlJI0fJ.png)
    4. Generate a password
    ```bash
    jupyter notebook password
    ```
    ![](https://i.imgur.com/6jj04m4.png)
    5. Start Notebook or Lab
    ```bash
    jupyter notebook
    # jupyter lab
    ```
    ![](https://i.imgur.com/2VF72ym.png)
    **You can now sign in from a browser**
    ![](https://i.imgur.com/oBMg1F8.jpg)
    ![](https://i.imgur.com/pvPGI7l.jpg)
    *~~Or sign in from a phone and type code there... madness~~*
<Big>Congratulations, the second stage (application installs) is done~~~</Big>

---
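The steps above start Jupyter on its own; if you would rather have the PySpark shell open straight into JupyterLab, one common arrangement (an addition here, not part of the original steps) uses Spark's driver-Python variables:

```bash
# Launch the PySpark shell with JupyterLab as the driver REPL.
# PYSPARK_DRIVER_PYTHON(_OPTS) are standard Spark knobs, not from this note.
export PYSPARK_DRIVER_PYTHON=jupyter
export PYSPARK_DRIVER_PYTHON_OPTS='lab --ip=0.0.0.0'
$SPARK_HOME/bin/pyspark --master yarn --deploy-mode client
```

Notebooks opened this way should already have `spark` and `sc` pointed at the YARN cluster.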
<h3 id="ha">High-availability (HA) cluster setup</h3>

:::danger
HA is not required, and the steps are a bit more involved; if your machines are not that powerful, weigh it up for yourself
:::
:warning: Preparation:
1. The three worker machines double as the JournalNodes and the ZooKeeper ensemble
2. The original NameNode machine gains a standby ResourceManager
3. The original ResourceManager machine gains a standby NameNode

---

0. First make sure all cluster services are stopped
**:information_source: [HA cluster startup/shutdown procedure](#HA)**
1. Add to hdfs-site.xml, then scp it to the other machines (**as hadoop**)
```bash
nano /usr/local/hadoop/etc/hadoop/hdfs-site.xml
```
![](https://i.imgur.com/GNS3rEL.png)
```xml=
<property>
    <name>dfs.nameservices</name>
    <value>nncluster</value>
</property>
<property>
    <name>dfs.ha.namenodes.nncluster</name>
    <value>nn1,nn2</value>
</property>
<property>
    <name>dfs.namenode.rpc-address.nncluster.nn1</name>
    <value>test30.example.org:8020</value>
</property>
<property>
    <name>dfs.namenode.http-address.nncluster.nn1</name>
    <value>test30.example.org:9870</value>
</property>
<property>
    <name>dfs.namenode.rpc-address.nncluster.nn2</name>
    <value>test31.example.org:8020</value>
</property>
<property>
    <name>dfs.namenode.http-address.nncluster.nn2</name>
    <value>test31.example.org:9870</value>
</property>
<property>
    <name>dfs.namenode.shared.edits.dir</name>
    <value>qjournal://test32.example.org:8485;test33.example.org:8485;test34.example.org:8485/nncluster</value>
</property>
<property>
    <name>dfs.journalnode.edits.dir</name>
    <value>/home/hadoop/journalnode</value>
</property>
<property>
    <name>dfs.ha.fencing.methods</name>
    <value>shell(/bin/true)</value>
</property>
<property>
    <name>dfs.client.failover.proxy.provider.nncluster</name>
    <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
</property>
```
:information_source: scp example
```bash
scp /usr/local/hadoop/etc/hadoop/hdfs-site.xml hadoop@test31:/usr/local/hadoop/etc/hadoop
```
![](https://i.imgur.com/aCEx6Q9.png)
2. Correct core-site.xml, then scp it to the other machines (**as hadoop**)
```bash
nano /usr/local/hadoop/etc/hadoop/core-site.xml
```
![](https://i.imgur.com/Azjkjts.png)
```xml=
<property>
    <name>fs.defaultFS</name>
    <value>hdfs://nncluster</value>
</property>
```
3. Create the journalnode directory on the three JournalNode machines (**as hadoop**)
```bash
mkdir ~/journalnode
```
4. Start the JournalNodes and confirm with jps (**as hadoop**)
```bash
hdfs --daemon start journalnode
```
![](https://i.imgur.com/b1kSI1P.png)
5. On the active NameNode only (**as hadoop**)
```bash
hdfs namenode -initializeSharedEdits
```
![](https://i.imgur.com/xW5B5gC.png)
![](https://i.imgur.com/OXOt7UN.png)
:warning: **Make sure "Successfully started new epoch 1" appears**
:information_desk_person: *If this cluster is brand new and has never been used, format it first!!!!!*
```bash
hdfs namenode -format
```
6. Start the first NameNode (**as hadoop**)
```bash
hdfs --daemon start namenode
```
![](https://i.imgur.com/xdJyQFV.png)
7. Copy the metadata to the second NameNode (**as hadoop**)
```bash
hdfs namenode -bootstrapStandby
```
![](https://i.imgur.com/0l6kVnO.png)
![](https://i.imgur.com/O1vrqwg.png)
:warning: **Make sure "has been successfully formatted" appears**
8. Start the second NameNode (**as hadoop**)
```bash
hdfs --daemon start namenode
```
![](https://i.imgur.com/3KeeCTa.png)
9. Stop all the NameNodes, then start them again (**as hadoop**)
```bash=
stop-dfs.sh
start-dfs.sh
```
![](https://i.imgur.com/MMinzyz.png)
:information_desk_person: **Both NameNodes and the three JournalNodes stop and start together**
10. Activate the first NameNode and check the states (**as hadoop**)
```bash=
hdfs haadmin -transitionToActive nn1
hdfs haadmin -getServiceState nn1
hdfs haadmin -getServiceState nn2
```
![](https://i.imgur.com/v1YsD2J.png)
![](https://i.imgur.com/D4V8QSD.png)
![](https://i.imgur.com/UpHbVKb.png)
11. Start YARN (**as hadoop**)
```bash
start-yarn.sh
```
![](https://i.imgur.com/xWkBrLl.png)
12. Start the Job History Server (**as hadoop**)
```bash
mapred --daemon start historyserver
```
![](https://i.imgur.com/TVU2ucE.png)
13. Switch the active NameNode over (**as hadoop**)
```bash=
hdfs haadmin -transitionToStandby nn1
hdfs haadmin -getServiceState nn1
hdfs haadmin -getServiceState nn2
hdfs haadmin -transitionToActive nn2
hdfs haadmin -getServiceState nn2
```
![](https://i.imgur.com/aJ9SloD.png)
![](https://i.imgur.com/KbRR8DB.png)
![](https://i.imgur.com/wuY18oT.png)
14. Run a pi job to check that the newly active NameNode works (**as hadoop**)
```bash
hadoop jar /usr/local/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.2.1.jar pi 30 100
```
![](https://i.imgur.com/nCIQeQS.png)
![](https://i.imgur.com/kS10yNp.png)
15. Download and install ZooKeeper (on all three ZooKeeper machines) (**as root**)
    1. Download ZooKeeper
    ```bash
    wget http://ftp.tc.edu.tw/pub/Apache/zookeeper/zookeeper-3.5.6/apache-zookeeper-3.5.6-bin.tar.gz
    ```
    2. Extract
    ```bash
    tar -xvf apache-zookeeper-3.5.6-bin.tar.gz -C /usr/local
    ```
    3. Rename
    ```bash
    mv /usr/local/apache-zookeeper-3.5.6-bin /usr/local/zookeeper
    ```
    4. Change the owner
    ```bash
    chown -R hadoop:hadoop /usr/local/zookeeper
    ```
16. Copy zoo_sample.cfg and edit zoo.cfg (you can scp it to the other two machines) (**as hadoop**)
```bash=
cp /usr/local/zookeeper/conf/zoo_sample.cfg /usr/local/zookeeper/conf/zoo.cfg
nano /usr/local/zookeeper/conf/zoo.cfg
```
![](https://i.imgur.com/8xmNuIt.png)
![](https://i.imgur.com/aOTGRb7.png)
```bash=
dataDir=/usr/local/zookeeper/zoodata   # edit
admin.serverPort=8010   # add
server.1=test32.example.org:2888:3888   # add
server.2=test33.example.org:2888:3888   # add
server.3=test34.example.org:2888:3888   # add
```
17. Edit zkEnv.sh (you can scp it to the other two machines) (**as hadoop**)
```bash=
nano /usr/local/zookeeper/bin/zkEnv.sh
```
![](https://i.imgur.com/ZOYfch7.png)
```bash=
# add
ZOO_LOG_DIR="/usr/local/zookeeper/logs"
ZOO_LOG4J_PROP="INFO,ROLLINGFILE"
```
18. Create the data directory and the myid file (**as hadoop**)
```bash=
mkdir /usr/local/zookeeper/zoodata
echo "1" > /usr/local/zookeeper/zoodata/myid   # on the first zookeeper machine
echo "2" > /usr/local/zookeeper/zoodata/myid   # on the second zookeeper machine
echo "3" > /usr/local/zookeeper/zoodata/myid   # on the third zookeeper machine
```
:warning: Each myid must match the zoo.cfg settings
![](https://i.imgur.com/AWlnjAS.png)
19. Update the environment variables (**as hadoop**)
    1. Edit .bashrc
    ```bash
    nano ~/.bashrc
    ```
    2. Add the variables
    ```bash=
    export ZOOKEEPER_HOME=/usr/local/zookeeper
    export PATH=$PATH:$ZOOKEEPER_HOME/bin
    ```
    ![](https://i.imgur.com/JfKYvX2.png)
    3. Reload
    ```bash
    source ~/.bashrc
    # . ~/.bashrc
    ```
20. Start ZooKeeper (on all three machines)
```bash=
zkServer.sh start
zkServer.sh status
jps
```
![](https://i.imgur.com/8U2BwW6.png)
:information_desk_person: While only one node is up, status reports "It is probably not running.", which just means it has no other zookeepers to talk to yet
21. Stop the services in the following order
```bash=
# Stop the History Server
mapred --daemon stop historyserver
# Stop the ResourceManager
stop-yarn.sh
# Stop the NameNodes
stop-dfs.sh
```
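Before wiring HDFS to the ensemble, it helps to confirm that a leader was actually elected. A sketch, assuming the three ZooKeeper machines are test32 through test34 as in zoo.cfg (the full path sidesteps PATH not being loaded over non-interactive SSH):

```bash
# Ask each ensemble member for its role; expect one leader and two followers.
for host in test32 test33 test34; do
    echo "== ${host} =="
    ssh "hadoop@${host}" '/usr/local/zookeeper/bin/zkServer.sh status 2>&1 | grep -i mode'
done
```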
22. Add to hdfs-site.xml, then scp it to the other machines (**as hadoop**)
```bash
nano /usr/local/hadoop/etc/hadoop/hdfs-site.xml
```
![](https://i.imgur.com/Tpppyzd.png)
```xml=
<!-- add -->
<property>
    <name>dfs.ha.automatic-failover.enabled</name>
    <value>true</value>
</property>
```
23. Add to core-site.xml, then scp it to the other machines (**as hadoop**)
```bash
nano /usr/local/hadoop/etc/hadoop/core-site.xml
```
![](https://i.imgur.com/RwrXFIk.png)
```xml=
<!-- add (the quorum hosts are the three ZooKeeper machines from zoo.cfg) -->
<property>
    <name>ha.zookeeper.quorum</name>
    <value>test32.example.org:2181,test33.example.org:2181,test34.example.org:2181</value>
</property>
```
24. On the NameNode only (**as hadoop**)
```bash
hdfs zkfc -formatZK
```
![](https://i.imgur.com/c3X1cZa.png)
:information_desk_person: Make sure "Successfully created /hadoop-ha/nncluster in ZK" appears
25. Start the NameNodes (NameNode only) (**as hadoop**)
```bash
start-dfs.sh
```
![](https://i.imgur.com/PXk8Lwq.png)
:information_desk_person: The DFSZKFailoverController service starts automatically
26. Test automatic NameNode failover (NameNode only) (**as hadoop**)
```bash=
hdfs --daemon stop namenode
hdfs haadmin -getServiceState nn1
hdfs haadmin -getServiceState nn2
```
![](https://i.imgur.com/Y6H2Ts4.png)
27. Add to and delete from yarn-site.xml, then scp it to the other machines (**as hadoop**)
![](https://i.imgur.com/2542fsi.png)
```xml=
<!-- delete this property -->
<property>
    <name>yarn.resourcemanager.hostname</name>
    <value>test31.example.org</value>
</property>
<!-- add these properties -->
<property>
    <name>yarn.resourcemanager.ha.enabled</name>
    <value>true</value>
</property>
<property>
    <name>yarn.resourcemanager.cluster-id</name>
    <value>rmcluster</value>
</property>
<property>
    <name>yarn.resourcemanager.ha.rm-ids</name>
    <value>rm1,rm2</value>
</property>
<property>
    <name>yarn.resourcemanager.hostname.rm1</name>
    <value>test31.example.org</value>
</property>
<property>
    <name>yarn.resourcemanager.hostname.rm2</name>
    <value>test30.example.org</value>
</property>
<property>
    <name>yarn.resourcemanager.webapp.address.rm1</name>
    <value>test31.example.org:8088</value>
</property>
<property>
    <name>yarn.resourcemanager.webapp.address.rm2</name>
    <value>test30.example.org:8088</value>
</property>
<property>
    <name>yarn.resourcemanager.recovery.enabled</name>
    <value>true</value>
</property>
<property>
    <name>yarn.resourcemanager.store.class</name>
    <value>org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore</value>
</property>
<property>
    <name>yarn.resourcemanager.zk-address</name>
    <value>test32.example.org:2181,test33.example.org:2181,test34.example.org:2181</value>
</property>
```
28. Start the services in the following order (**as hadoop**)
```bash=
# Start the ResourceManagers
start-yarn.sh
# Start the History Server
mapred --daemon start historyserver
```
29. Test automatic ResourceManager failover (ResourceManager only) (**as hadoop**)
```bash
yarn --daemon stop resourcemanager
yarn rmadmin -getServiceState rm1
yarn rmadmin -getServiceState rm2
```
![](https://i.imgur.com/FkLwtmR.png)
30. If you set spark-defaults.conf to load the jars from HDFS, remember to update it
```bash
nano /usr/local/spark/conf/spark-defaults.conf
```
![](https://i.imgur.com/JLblhkX.png)
<Big>Congratulations, you've finished the last stage: the high-availability (HA) setup~~~</Big>

<h3 id="normal">Normal cluster startup/shutdown procedure</h3>

1. Basic cluster startup
```bash=
# Start HDFS (from the NameNode)
start-dfs.sh
# Start YARN (from the ResourceManager)
start-yarn.sh
# Start the History Server
mapred --daemon start historyserver
```
2. Basic cluster shutdown
```bash=
# Stop the History Server
mapred --daemon stop historyserver
# Stop YARN (from the ResourceManager)
stop-yarn.sh
# Stop HDFS (from the NameNode)
stop-dfs.sh
```

<h3 id="HA">HA cluster startup/shutdown procedure</h3>

1. HA cluster startup
```bash=
# Start ZooKeeper
zkServer.sh start
# Start HDFS (from the NameNode)
start-dfs.sh
# Start YARN (from the ResourceManager)
start-yarn.sh
# Start the History Server
mapred --daemon start historyserver
```
2. HA cluster shutdown
```bash=
# Stop the History Server
mapred --daemon stop historyserver
# Stop YARN (from the ResourceManager)
stop-yarn.sh
# Stop HDFS (from the NameNode)
stop-dfs.sh
# Stop ZooKeeper
zkServer.sh stop
```
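Note that each of these commands runs on its own machine: `zkServer.sh` on each of the three ZooKeeper nodes, `mapred` on the History Server node, the rest from a NameNode. As a one-shot wrapper in the spirit of the lazy scripts below (a sketch, assuming the host layout used throughout and that it runs from test30):

```bash
#!/bin/bash
# HA startup in one pass. Full paths are used because ~/.bashrc
# is not read over non-interactive SSH on Ubuntu.
for host in test32 test33 test34; do
    ssh "hadoop@${host}" '/usr/local/zookeeper/bin/zkServer.sh start'
done
/usr/local/hadoop/sbin/start-dfs.sh
/usr/local/hadoop/sbin/start-yarn.sh
ssh "hadoop@test32" '/usr/local/hadoop/bin/mapred --daemon start historyserver'
```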
<h3 id="shell">Hadoop lazy scripts</h3>

0. [**Introduction to the Hadoop lazy scripts**](https://github.com/JeffWen0105/wen/tree/master/iiiEduBdse/release/hadoop)
1. [One-command cluster startup](https://github.com/JeffWen0105/wen/blob/master/iiiEduBdse/release/hadoop/hadoopStartBdse12)
2. [One-command cluster shutdown](https://github.com/JeffWen0105/wen/blob/master/iiiEduBdse/release/hadoop/hadoopStopBdse12)
3. [Cluster status check](https://github.com/JeffWen0105/wen/blob/master/iiiEduBdse/release/hadoop/nodecheckBdse12)
4. [Cluster safe remote copy](https://github.com/JeffWen0105/wen/blob/master/iiiEduBdse/release/hadoop/safeScpBdse12)
5. [Cluster remote copy](https://github.com/JeffWen0105/wen/blob/master/iiiEduBdse/release/hadoop/scpBdse12)
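The real scripts live at the links above. For a flavor of what the remote-copy one does, here is a minimal stand-in (hypothetical, not the linked script) that pushes a single file to every other node at the same path:

```bash
#!/bin/bash
# Usage: ./clusterscp.sh /usr/local/hadoop/etc/hadoop/hdfs-site.xml
file="$1"
for host in test31 test32 test33 test34; do
    scp "${file}" "hadoop@${host}:${file}"
done
```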
