wget https://dlcdn.apache.org/hadoop/common/hadoop-3.3.6/hadoop-3.3.6.tar.gz
tar -xzvf hadoop-3.3.6.tar.gz

### Installing Hadoop
- Environment variables (output of `env | grep HADOOP`):
```
env | grep HADOOP
```
![image](https://hackmd.io/_uploads/B1U8P2S6T.png)
- Java version
```
java -version
```
![image](https://hackmd.io/_uploads/SkgdvhH6a.png)
- Hadoop version
```
/usr/local/hadoop/bin/hadoop version
```
![image](https://hackmd.io/_uploads/SJBnwnBaT.png)
- Your SSH public/private keys
![image](https://hackmd.io/_uploads/SJ1eeSV6T.png)
- Passwordless SSH login (once you are working in industry, remember never to share your private key publicly!)
![image](https://hackmd.io/_uploads/H18SWHVpT.png)
- Windows public/private keys
![image](https://hackmd.io/_uploads/BkxvfB46p.png)
- cat authorized_keys
![image](https://hackmd.io/_uploads/HJUlGrETT.png)

### hadoop-mapreduce-examples (pi) Usage

Run 1: 15 map tasks, each processing 10000 samples

hadoop jar /usr/local/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.3.6.jar pi 15 10000

output: Estimated value of Pi is 3.14125333333333333333

![image](https://hackmd.io/_uploads/Hkj8aSNa6.png)

Run 2: 30 map tasks, each processing 20000 samples

hadoop jar /usr/local/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.3.6.jar pi 30 20000

output: Estimated value of Pi is 3.14164666666666666667

![image](https://hackmd.io/_uploads/SkZtpHNp6.png)

Conclusion: the more map tasks and samples, the more accurate the Pi estimate!

### Pseudo-Distributed
Set up Hadoop in Pseudo-Distributed mode and submit:
1. your Hadoop configuration files
2. the start-all.sh output
![image](https://hackmd.io/_uploads/HyvbIvNpa.png)
3. the jps output
4. the netstat output
![image](https://hackmd.io/_uploads/ByZnXwV6a.png)
5. the listening ports
![image](https://hackmd.io/_uploads/rkmGBw4Ta.png)

#### core-site.xml
```
<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://localhost:9000</value>
    </property>
</configuration>
```
![image](https://hackmd.io/_uploads/SkKEzLVTa.png)

#### hdfs-site.xml
Example settings:
- dfs.replication: 1
- dfs.namenode.name.dir: file:///home/hadoop/hadoopdata/hdfs/namenode
- dfs.datanode.data.dir: file:///home/hadoop/hadoopdata/hdfs/datanode
```
<configuration>
    <property>
        <name>dfs.replication</name>
        <value>1</value>
    </property>
    <property>
        <name>dfs.namenode.name.dir</name>
        <value>file:///home/hadoop/hadoopdata/hdfs/namenode</value>
    </property>
    <property>
        <name>dfs.datanode.data.dir</name>
        <value>file:///home/hadoop/hadoopdata/hdfs/datanode</value>
    </property>
</configuration>
```

#### mapred-site.xml
Example settings:
- mapreduce.framework.name: yarn
- yarn.app.mapreduce.am.env: HADOOP_MAPRED_HOME=$HADOOP_HOME
- mapreduce.map.env: HADOOP_MAPRED_HOME=$HADOOP_HOME
- mapreduce.reduce.env: HADOOP_MAPRED_HOME=$HADOOP_HOME
```
<configuration>
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
    <property>
        <name>yarn.app.mapreduce.am.env</name>
        <value>HADOOP_MAPRED_HOME=$HADOOP_HOME</value>
    </property>
    <property>
        <name>mapreduce.map.env</name>
        <value>HADOOP_MAPRED_HOME=$HADOOP_HOME</value>
    </property>
    <property>
        <name>mapreduce.reduce.env</name>
        <value>HADOOP_MAPRED_HOME=$HADOOP_HOME</value>
    </property>
</configuration>
```

#### yarn-site.xml
YARN configuration:
- name: yarn.nodemanager.aux-services
- value: mapreduce_shuffle
```
<configuration>
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
</configuration>
```

### Homework - hadoop-mapreduce-examples (grep)

hadoop jar /usr/local/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.3.6.jar grep ~/input ~/output 'dfs[a-z.]+'

Looking up the hadoop usage:
![image](https://hackmd.io/_uploads/rkh3gTrap.png)
```bash=
hadoop jar /usr/local/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.3.6.jar grep
```
![image](https://hackmd.io/_uploads/SkvsbTra6.png)
```bash=
ls ~/output/
```
![image](https://hackmd.io/_uploads/rketb6STa.png)
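The grep example emits `count` / `match` pairs. A minimal sketch for inspecting the result, assuming the output landed in `~/output/part-r-00000` as the `ls ~/output/` listing above suggests:

```bash
# Show the matches found by the grep job, highest count first.
# (The grep example already sorts by count in a second MapReduce pass,
# so this is just a sanity check on the final file.)
cat ~/output/part-r-00000 | sort -k1,1 -nr | head
```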
### Workshop - NameNode web interface (overview)
![image](https://hackmd.io/_uploads/BJru9wHpT.png)
- NameNode: page URL and screenshot
http://10.167.218.125:9870/dfshealth.html#tab-overview
![image](https://hackmd.io/_uploads/Sk1q9DSpp.png)
- Live Nodes: page URL and a screenshot of the Volume Directory. Find the Http Address column and click it (the hostname needs to be replaced with the IP address).
http://10.167.218.125:9864/datanode.html
![image](https://hackmd.io/_uploads/ryfSwYHpp.png)
- NameNode Storage: find the Storage Directory
![image](https://hackmd.io/_uploads/HkDMHFrTa.png)
- Recall how this directory was created, and the command originally used to create it
![image](https://hackmd.io/_uploads/HJ_3LtHpp.png)

### NameNode Logs
- Task - Log locations: find the following log URLs on the Utilities page
![image](https://hackmd.io/_uploads/B13fjFBap.png)

ResourceManager log
http://10.167.218.125:9864/logs/hadoop-hadoop-resourcemanager-kao.log
NodeManager log
http://10.167.218.125:9864/logs/hadoop-hadoop-nodemanager-kao.log
NameNode log
http://10.167.218.125:9864/logs/hadoop-hadoop-namenode-kao.log
DataNode log
http://10.167.218.125:9864/logs/hadoop-hadoop-datanode-kao.log

- Task - ResourceManager log:
```
2024-03-02 16:33:50,753 INFO org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: STARTUP_MSG: Starting ResourceManager
2024-03-02 16:33:51,208 INFO org.apache.hadoop.conf.Configuration: found resource core-site.xml at file:/usr/local/hadoop/etc/hadoop/core-site.xml
2024-03-02 16:33:51,357 INFO org.apache.hadoop.conf.Configuration: found resource yarn-site.xml at file:/usr/local/hadoop/etc/hadoop/yarn-site.xml
2024-03-02 16:33:51,488 INFO org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Using Scheduler: org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler
2024-03-02 16:33:52,212 INFO org.apache.hadoop.conf.Configuration: found resource capacity-scheduler.xml at file:/usr/local/hadoop/etc/hadoop/capacity-scheduler.xml
2024-03-02 16:33:56,579 INFO org.apache.hadoop.ipc.Server: Listener at 0.0.0.0:8033
2024-03-02 16:33:57,591 INFO org.apache.hadoop.ipc.Server: Listener at 0.0.0.0:8031
2024-03-02 16:33:58,463 INFO org.apache.hadoop.ipc.Server: Listener at 0.0.0.0:8032
2024-03-02 16:34:00,025 INFO org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Transitioned to active state
```

1. Which configuration files did the ResourceManager load during startup? List the file names.
    - core-site.xml (16:33:51,208)
    - yarn-site.xml (16:33:51,357)
    - capacity-scheduler.xml (16:33:52,212)
2. After starting, which ports is the ResourceManager listening on?
    - 8033 (16:33:56,579)
    - 8031 (16:33:57,591)
    - 8032 (16:33:58,463)
3. From the log, how do you know the ResourceManager successfully transitioned to the active state?
    - The line "ResourceManager: Transitioned to active state" (16:34:00,025)

### YARN web UI
http://10.167.218.144:8088/cluster
![image](https://hackmd.io/_uploads/HyEVUTHT6.png)
![image](https://hackmd.io/_uploads/SkLAUTBaT.png)
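As an aside, the applications shown in the YARN web UI can also be listed from the command line, which is handy when the web UI is unreachable. A minimal sketch using the standard `yarn` CLI; the application ID below is a hypothetical placeholder, substitute one from the list:

```bash
# List all applications known to the ResourceManager, in any state.
yarn application -list -appStates ALL

# Show the status (and tracking URL) of a single application;
# application_1709000000000_0001 is a made-up example ID.
yarn application -status application_1709000000000_0001
```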
### HDFS
- Scenario: As a data engineer, you are using a Hadoop cluster in Pseudo-Distributed mode to process and analyze text data. Your task is to use Hadoop's wordcount example application to count word frequencies across a batch of text files. To organize the job, you decide to create a directory structure named project_wordcount: the input text files go in a raw_data directory, and the wordcount results are saved to a processed_data directory.
- Tasks:
1. Upload the steps you ran.
2. Upload the results from the processed_data directory.
3. Paste the link to the application submitted to YARN.
- Hint:
1. The local input data (sample_text.txt):
```
Hadoop is an open-source framework that allows to store and process big data in a distributed environment. Hadoop enables data processing over large clusters of computers.
```
- Expected output:
```
hdfs dfs -cat /project_wordcount/processed_data/part-r-00000
Hadoop	2
a	1
allows	1
an	1
and	1
big	1
clusters	1
computers.	1
data	2
distributed	1
enables	1
environment.	1
framework	1
in	1
is	1
large	1
of	1
open-source	1
over	1
process	1
processing	1
store	1
that	1
to	1
```
```bash=
hdfs dfs -mkdir -p /project_wordcount/rawdata
# Upload the local sample file (this step is implied between mkdir and ls)
hdfs dfs -put sample_text.txt /project_wordcount/rawdata/
hdfs dfs -ls /project_wordcount/rawdata
hadoop jar /usr/local/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.3.6.jar wordcount /project_wordcount/rawdata/sample_text.txt /project_wordcount/processed_data
```
![image](https://hackmd.io/_uploads/B1xw9aBap.png)
![image](https://hackmd.io/_uploads/S1cvcpBTp.png)

### Hadoop Pseudo-Distributed error counts
- Scenario
In a fast-paced, data-driven business environment, keeping Hadoop services healthy and stable is critical. Operations teams regularly need to analyze system logs in depth to identify and prevent potential problems. To make this process more efficient, you have been tasked with using Hadoop's own capabilities to automate the analysis.
- Task
Your company wants an in-depth analysis of the Hadoop services' runtime status to assess their health. As a preparation step, extract basic status information from the system logs. Specifically:
1. Use the first log file on the Utilities -> Logs page as the input data source.
2. Use the grep example to count all log entries that begin with "ERROR".
3. If there are no "ERROR" entries, look for "WARN" and then "INFO" entries, in that order.
- Expected deliverables
1. Log file: the original log file you selected.
2. Reduce command: the MapReduce program or script used to process the log file.
3. Output file: the analysis results, showing the counts for "ERROR", "WARN", and "INFO" entries.

Procedure
```
# Write the matching log entries to a txt file
cat /usr/local/hadoop/logs/hadoop-hadoop-datanode-master.log | grep ERROR > datanode_log_ERROR.txt &&
# Create the HDFS directory
hdfs dfs -mkdir -p /user/hadoop/project_wordcount/HW_raw_data
# Inspect the HDFS directory
hdfs dfs -ls /user/hadoop/project_wordcount/HW_raw_data &&
# Put the file into HDFS
hdfs dfs -put /usr/local/hadoop/logs/datanode_log_ERROR.txt /user/hadoop/project_wordcount/HW_raw_data &&
hadoop jar /usr/local/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.3.6.jar grep /user/hadoop/project_wordcount/HW_raw_data /user/hadoop/project_wordcount/HW_processed_data 'ERROR' &&
hdfs dfs -cat /user/hadoop/project_wordcount/HW_processed_data/part-r-00000
```
![image](https://hackmd.io/_uploads/Hy24pknTp.png)
![image](https://hackmd.io/_uploads/HkL43k3aT.png)

### Fully-Distributed-Topology
Bind hostnames to IP addresses. Configure on all three hosts (master, slave1, slave2):

ssh hadoop@10.167.218.144
```
sudo vi /etc/hosts

10.167.218.144 master
10.167.218.181 slave1
10.167.218.151 slave2
```
Teacher's public key
```
ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABgQC8ev4KiOYZJXpL8rnHY1a9BQIUeJTdPQRTWG/xdekqx96CpQTJcHYlBh6NZ9J2Rbwy5ngtoM+dqsLGa4OSJN67wCqypP6V2f7cg7PTD+ljv3JXo3kd6EE1jpH8SgehOBELpJocfX8qJhhAWBFlKPWSXwNnSpeLqMD9fgn8BGvahsSxjD0oxjv8X8nSc/ncWVdAszRBtabQIlQHnELuWBpJjITiZMDGV0ZWeS0DxCaVvHRSpV3z6kqDpG50VrO9vPQN5vKO9nmVa+0tkP+BHD8d2MxlRXoRe/qyUFYCnyvqy70F23X5ZA40d7tcwQkU8QnNjxE/a5/w50v7/s4zAPEJJGmUtaz7+g1+bIFC4GkvxAghT/Pvyy42ZciTaOP3LgVP3o6wtG/mIrw1U/0cFVic+V+MQIiqG7fMEBJ3fBcwcY8rRI/bsi/Zw6tfJY654lBcsdeiEk6gri1/9qYgD8ZaakcqDP7ev9ij5MTIbofzmvs6VLxBcje4xdvPwYVIGd0= hadoop@ntc
```
master's public key
```
ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABgQC9YJrONsV/jzpUnnpzmTKwwLZkTVR1tNAadSM/fMP1q0JiEedFphXnzMbrdzrUrs/YzwbhLwSrFdEuQQH/2DZ5ECdnXH0XCT8zVSFD1ua+0oVS9GFv6PHhm6884VLuOOnf6mUFe5z3Bc21TPtae038uj/ptSMHXXeQLa6RqXcXQqjrX5Qolu45qOCafo8MumJ4HuiUPA6Jm48qFkSMUhNao9CPtenf7SddTzEIKbLRhG+26Uf4CeZaMdk5mRkeUi1wWVIBz0bwPoMb0N+KtEsORkMM5X3j+kXZ53yCze8aoN6bdmuFFqg2kmU7di8hq/iTL4RiCIEkbp4oN8APHuuv9od2AjtM+COShLSDwzhOTaHAHItAXxQiwmcLWwDNTYJyDP7KIdi5mylx+HmHL7XdSB68rBmkfITeQ9QvWF2EZU9nyi2KtAVE1eiuiPuIs+UWvMHIOcmYG2jDAfIjAmEG7cVbm/ADKLdAxnf7Q9iHAdZUunoWV6HNhawZ+XX0eu0= hadoop@kao
```
slave1's public key
```
ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABgQDEzXiiSGW8A2VYq+YZJoY6VrK18Jt1oaI+X8nep/KHGx1trcDa9xF2YHGsdPUJyxtP9JhPANLaqt/crU2tjvUwRDVxgI0hvjmwuVVSYtdqCy3qBaTgVU6LodWLUK9AtiKfiEUDUXyuVP5vGG6RG4aDGDWUTTMWTVJgW2G5aT9g0riAOQ7YHppOM0ddlg5ega948cz7L/kGOUwX7yoTp+PTPDu+bUHpKIV+S1s/enDotoisw3tR4lFvN8Zys+E57BpuFVtm3WU2j1j8OysHmLxrQWAd/L3RKSB8gd9DbFfwKetEm78Y+RBdYrdZfWvK44JYb6my/RN5juLfifNOWCsslg9otczSxI66ki63si0YM07OBZ/0KmGq+QKz6nKSPPOJkpP+z89lBj3ZSjJBm2gLbwc0eNbzoQ9QXxt0nM9XaE6AgiH+uxcdkiOReyfXEejGJQCRZuxhtBHCdCEcXUEyseXOliVr3y8ubyD8nz3Eq8PWGaIMW73RnQQWMJJ4yWU= hadoop@slave1
```
slave2's public key
```
ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABgQDCHGPkb2Zk0Pdj348EgNb9ztfkXvt3G9ZSP0X1SLOj2o6WrUVqqASXWs9AwkMESsHkJ5bH1ttPtAIN+MWZLwTRSwqhFIpPGX7rAe2H8XUcy5hncLCDzNpauHe4kS7q6ge4i1vrn5AZFR+srqln6wd1oYjPsOW87FsPh63crgameH3jXB5ix5TeEtO/s2ovIjYTkZemw4IPIdZs6QFDUWg8GUwHTounm0SD+ELGeSON0pa9Nwhh3zIj9oFvDFaCo9Z9hMXJmHIxtezNDkzSDPJ66OhrRPhF1mysQ5PG68WH6wC6jy/anQaxIKJJjOXFW8TLKEYqQrM3D4vJRTthNLfGMENcL5usEMYEh+BkfGvjzVkVM1VosRvJ3t7T7jyo8ePcYvjjnNll5f+Cx7txOLjpr8ZIuEWkec7HtjyRXFTTisr9sVWDTuzApkJ/8mGKhlGB8PT9P7wIW3kQRO+wvuPnLqp/kw/fNCHnEDYX9MGVCaagVSVdvMq+0CXptbNX/78= hadoop@slave2
```
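Once every host's public key is present in every other host's `~/.ssh/authorized_keys`, passwordless SSH should work in all directions. A minimal verification sketch run from the master, assuming the /etc/hosts entries above:

```bash
# BatchMode=yes makes ssh fail instead of prompting for a password,
# so any host that still requires a password is reported immediately.
for h in master slave1 slave2; do
  ssh -o BatchMode=yes hadoop@"$h" hostname || echo "passwordless login to $h failed"
done
```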
### Fully-Distributed Deploy (Single-Host)
#### core-site.xml
vim $HADOOP_HOME/etc/hadoop/core-site.xml
```
<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://master:9000</value>
    </property>
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/usr/local/hadoop/tmp</value>
    </property>
</configuration>
```
```
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
  Licensed under the Apache License, Version 2.0 (the "License");
  you may not use this file except in compliance with the License.
  You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License. See accompanying LICENSE file.
-->

<!-- Put site-specific property overrides in this file. -->

<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://master:9000</value>
    </property>
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/usr/local/hadoop/tmp</value>
    </property>
</configuration>
```

#### hdfs-site.xml
vim $HADOOP_HOME/etc/hadoop/hdfs-site.xml
```
<configuration>
    <property>
        <name>dfs.replication</name>
        <value>2</value>
    </property>
    <property>
        <name>dfs.namenode.name.dir</name>
        <value>/usr/local/hadoop/hdfs/name</value>
    </property>
    <property>
        <name>dfs.datanode.data.dir</name>
        <value>/usr/local/hadoop/hdfs/data</value>
    </property>
</configuration>
```
```
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
  Licensed under the Apache License, Version 2.0 (the "License");
  you may not use this file except in compliance with the License.
  You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License. See accompanying LICENSE file.
-->

<configuration>
    <property>
        <name>dfs.replication</name>
        <value>2</value>
    </property>
    <property>
        <name>dfs.namenode.name.dir</name>
        <value>/usr/local/hadoop/hdfs/name</value>
    </property>
    <property>
        <name>dfs.datanode.data.dir</name>
        <value>/usr/local/hadoop/hdfs/data</value>
    </property>
</configuration>
```
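A quick sketch for checking that the values above are actually picked up by the running configuration (run after restarting the daemons), using the standard `hdfs getconf` command:

```bash
# Print the effective value of a single configuration key.
hdfs getconf -confKey dfs.replication
hdfs getconf -confKey fs.defaultFS
```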
#### yarn-site.xml
vim $HADOOP_HOME/etc/hadoop/yarn-site.xml
```
<configuration>
    <property>
        <name>yarn.resourcemanager.hostname</name>
        <value>master</value>
    </property>
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
    <property>
        <name>yarn.resourcemanager.resource-tracker.address</name>
        <value>master:8031</value>
    </property>
    <property>
        <name>yarn.resourcemanager.scheduler.address</name>
        <value>master:8030</value>
    </property>
    <property>
        <name>yarn.resourcemanager.address</name>
        <value>master:8032</value>
    </property>
    <property>
        <name>yarn.resourcemanager.admin.address</name>
        <value>master:8033</value>
    </property>
    <property>
        <name>yarn.resourcemanager.webapp.address</name>
        <value>master:8088</value>
    </property>
</configuration>
```
![image](https://hackmd.io/_uploads/r1-MWkvTT.png)

#### mapred-site.xml
vim $HADOOP_HOME/etc/hadoop/mapred-site.xml

This version, as dictated in class, will not run (the env values are missing the `HADOOP_MAPRED_HOME=` variable name):
```
<configuration>
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
    <property>
        <name>yarn.app.mapreduce.am.env</name>
        <value>/usr/local/hadoop</value>
    </property>
    <property>
        <name>mapreduce.map.env</name>
        <value>/usr/local/hadoop</value>
    </property>
    <property>
        <name>mapreduce.reduce.env</name>
        <value>/usr/local/hadoop</value>
    </property>
</configuration>
```
Corrected version:
```
<configuration>
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
    <property>
        <name>yarn.app.mapreduce.am.env</name>
        <value>HADOOP_MAPRED_HOME=/usr/local/hadoop</value>
    </property>
    <property>
        <name>mapreduce.map.env</name>
        <value>HADOOP_MAPRED_HOME=/usr/local/hadoop</value>
    </property>
    <property>
        <name>mapreduce.reduce.env</name>
        <value>HADOOP_MAPRED_HOME=/usr/local/hadoop</value>
    </property>
</configuration>
```

#### Specifying workers
On every machine, create the following file to specify the workers:

/usr/local/hadoop/etc/hadoop/workers
```
vim /usr/local/hadoop/etc/hadoop/workers

slave1
slave2
```
Deploy over SSH: from the master, copy the configuration files to slave1 and slave2
```bash=
slave1:
scp /usr/local/hadoop/etc/hadoop/* hadoop@slave1:/usr/local/hadoop/etc/hadoop/
slave2:
scp /usr/local/hadoop/etc/hadoop/* hadoop@slave2:/usr/local/hadoop/etc/hadoop/
```
![image](https://hackmd.io/_uploads/Bk_58yP6a.png)

#### Problem
master is missing its DataNode and NodeManager; slave2 is missing its NodeManager
![image](https://hackmd.io/_uploads/Hycl4bwTT.png)

#### Fix
The steps below are consolidated into the sketch after this list:
1. Delete the data directories and recreate them
2. stop-all.sh
3. Reformat the NameNode
4. Remove the /tmp contents
5. start-all.sh
6. watch jps — monitor which daemons come up
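A minimal sketch of that reset procedure, assuming the HDFS directories configured in hdfs-site.xml above (/usr/local/hadoop/hdfs/name and /usr/local/hadoop/hdfs/data). Note this is destructive: it wipes all HDFS data, and the cleanup must be repeated on every node before formatting.

```bash
# Stop all daemons first so nothing holds the old directories open.
/usr/local/hadoop/sbin/stop-all.sh

# Wipe temp state, logs, and HDFS metadata/block directories.
rm -rf /tmp/hadoop-* /usr/local/hadoop/tmp/* /usr/local/hadoop/logs/*
rm -rf /usr/local/hadoop/hdfs/name/* /usr/local/hadoop/hdfs/data/*

# Reformat the NameNode (master only), then bring everything back up.
/usr/local/hadoop/bin/hdfs namenode -format
/usr/local/hadoop/sbin/start-all.sh
watch jps   # Ctrl-C to exit once all expected daemons appear
```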
#### Expected result: tasks are dispatched to the worker (slave) nodes
master
![image](https://hackmd.io/_uploads/HysCuNwT6.png)
![image](https://hackmd.io/_uploads/r1uit4D6T.png)
slave1
![image](https://hackmd.io/_uploads/rJD7FNvp6.png)
![image](https://hackmd.io/_uploads/BymtKNPap.png)
slave2
![image](https://hackmd.io/_uploads/rkOrFNwap.png)
![image](https://hackmd.io/_uploads/ByKDYVD6p.png)
Datanodes tab (localhost:9870)
![image](https://hackmd.io/_uploads/rkGTvNw6p.png)

scp /usr/local/hadoop/etc/hadoop/* hadoop@wei:/usr/local/hadoop/etc/hadoop/
scp /usr/local/hadoop/etc/hadoop/* hadoop@slave1:/usr/local/hadoop/etc/hadoop/

scp ~/.ssh/authorized_keys hadoop@wei:~/.ssh/authorized_keys
scp ~/.ssh/authorized_keys hadoop@slave1:~/.ssh/authorized_keys
scp ~/.ssh/authorized_keys hadoop@slave2:~/.ssh/authorized_keys

/etc/hosts must also be identical on every machine
![image](https://hackmd.io/_uploads/SJaHLX2TT.png)
![image](https://hackmd.io/_uploads/SkYLL7naa.png)
![image](https://hackmd.io/_uploads/SJrvLX2aT.png)

### Troubleshooting jps
If jps looks wrong (e.g. "port in use" errors), run stop-all.sh and then `pkill -9 java` to clean out any leftover Java processes.

### mapper
mapper.py
```python=
import sys

# Emit "word,1" for every word read from stdin
for line in sys.stdin:
    line = line.strip()  # strip leading/trailing whitespace
    words = line.split()
    for word in words:
        print(word + "," + "1")
```
```
echo "Deer Bear River Car Car River Deer Car Bear" | python3 mapper.py
```
![image](https://hackmd.io/_uploads/HkNkUV6TT.png)

### reducer
reducer.py
```python=
import sys

# Collect all "word,count" pairs from stdin
line_input = []
for line in sys.stdin:
    line = line.strip()  # strip leading/trailing whitespace
    arr_line = line.split(",")
    line_input.append(arr_line)

result = {}
for item in line_input:
    key = item[0]             # the word is the key
    count = int(item[1])      # parse the count as an integer
    if key in result:
        result[key] += count  # key already seen: accumulate
    else:
        result[key] = count   # first occurrence: initialize

# Print the aggregated counts
for key, value in result.items():
    print(f"{key},{value}")
```
```bash=
echo "Deer Bear River Car Car River Deer Car Bear" | python3 mapper.py | python3 reducer.py
```
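In a real Hadoop Streaming job, the framework sorts the mapper output by key before it reaches the reducer (the shuffle phase). The local pipeline above only works because this reducer accumulates counts in a dict; a sketch that mimics Hadoop's behaviour more faithfully inserts a sort between the two stages:

```bash
# Simulate the shuffle: sort mapper output by the comma-separated key,
# as Hadoop Streaming does between the map and reduce phases.
echo "Deer Bear River Car Car River Deer Car Bear" \
  | python3 mapper.py \
  | sort -t, -k1,1 \
  | python3 reducer.py
```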
### Testing the Python MapReduce on Hadoop
```
/usr/local/hadoop/bin/hadoop jar '/usr/local/hadoop/share/hadoop/tools/lib/hadoop-streaming-3.3.6.jar' \
-files ~/mapper.py,~/reducer.py \
-mapper 'python3 mapper.py' \
-reducer 'python3 reducer.py' \
-input hdfs:<path to input file> \
-output hdfs:<path to output>
```
Upload the input file:
```
echo "Deer Bear River Car Car River Deer Car Bear" > input.txt
hdfs dfs -ls /
hdfs dfs -mkdir /input
hdfs dfs -put input.txt /input/
hdfs dfs -ls /input
```
Run MapReduce. File locations:
/home/hadoop/mapper.py
/home/hadoop/reducer.py
```
/usr/local/hadoop/bin/hadoop jar '/usr/local/hadoop/share/hadoop/tools/lib/hadoop-streaming-3.3.6.jar' \
-files /home/hadoop/mapper.py,/home/hadoop/reducer.py \
-mapper 'python3 mapper.py' \
-reducer 'python3 reducer.py' \
-input hdfs:/input/input.txt \
-output hdfs:/result
```
Answer:
```
/usr/local/hadoop/bin/hadoop jar '/usr/local/hadoop/share/hadoop/tools/lib/hadoop-streaming-3.3.6.jar' \
-files /home/hadoop/workshop/mina/mapper.py,/home/hadoop/workshop/mina/reducer.py \
-mapper 'python3 mapper.py' \
-reducer 'python3 reducer.py' \
-input hdfs:/input/input.txt \
-output hdfs:/result_mina
```
```
hdfs dfs -ls /result_mina
hdfs dfs -cat /result_mina/part-00000
```
![image](https://hackmd.io/_uploads/H1YUpHTTT.png)

ssh hadoop@10.167.218.144
ssh hadoop@10.167.218.181
ssh hadoop@10.167.218.151

### Reconfiguring the slaves
Step 1: stop Hadoop

/usr/local/hadoop/sbin/stop-all.sh

Step 2: print your public key to give to the others

cat ~/.ssh/id_rsa.pub

Step 3: add the other hosts' public keys

echo '<other host public key>' >> ~/.ssh/authorized_keys

Step 4: map each IP to an easy-to-recognize hostname (do this on the master and every slave)

sudo vim /etc/hosts

Step 5: add the new slaves to the worker list, using the hostnames from the previous step

vim /usr/local/hadoop/etc/hadoop/workers

Step 6: remove temp data and old state, then format HDFS

rm -rf ~/hadoopdata/hdfs/*
rm -rf /tmp/*
rm -rf /usr/local/hadoop/tmp/*
rm -rf /usr/local/hadoop/logs/*
rm -rf /usr/local/hadoop/hdfs/name/*
rm -rf /usr/local/hadoop/hdfs/data/*
mkdir -p ~/hadoopdata/hdfs/{namenode,datanode}
/usr/local/hadoop/bin/hdfs namenode -format -force

Also run these two:
rm -rf ~/.ssh/known_hosts
pkill -9 java

Step 7: copy the configuration to the other machines

scp /usr/local/hadoop/etc/hadoop/* hadoop@slave1:/usr/local/hadoop/etc/hadoop/
scp /usr/local/hadoop/etc/hadoop/* hadoop@slave2:/usr/local/hadoop/etc/hadoop/
scp /etc/hosts [user_name]@[server_name]:/etc/hosts

Note: the workers file (in /usr/local/hadoop/etc/hadoop/) and authorized_keys must also be scp'd to the slaves.

Step 8: start Hadoop (only the master needs to do this)

/usr/local/hadoop/sbin/start-all.sh
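Once start-all.sh completes on the master, a quick check with the standard Hadoop CLI confirms that every slave actually rejoined the cluster:

```bash
# HDFS view: how many DataNodes are live, and which hosts are they?
hdfs dfsadmin -report | grep -E 'Live datanodes|Name:'

# YARN view: which NodeManagers have registered with the ResourceManager?
yarn node -list
```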