wget https://dlcdn.apache.org/hadoop/common/hadoop-3.3.6/hadoop-3.3.6.tar.gz
tar -xzvf hadoop-3.3.6.tar.gz
### Install Hadoop
- Environment variables (screenshot of the env | grep HADOOP output)
```
env | grep HADOOP
```

- Java version
```
java -version
```

- Hadoop version
```
/usr/local/hadoop/bin/hadoop version
```

- Your SSH public/private key pair

- Screenshot of a passwordless SSH login (once you are on the job, remember not to share your private key publicly!) — see the key-setup sketch below

- The Windows public and private keys

- cat authorized_keys output
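
A minimal sketch of how the passwordless login above can be set up, assuming an RSA key and the default ~/.ssh paths (adjust the key type and the remote host to your environment):
```bash=
# generate a key pair (press Enter to accept the defaults; empty passphrase for lab use)
ssh-keygen -t rsa -b 4096
# append the public key to the authorized_keys of the account you want to log in to
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
chmod 600 ~/.ssh/authorized_keys
# test: this should no longer ask for a password
ssh localhost 'echo passwordless login OK'
```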

### hadoop-mapreduce-examples (pi)
Usage
Run 1:
15 map tasks, each map task sampling 10000 points
hadoop jar /usr/local/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.3.6.jar pi 15 10000
output: Estimated value of Pi is 3.14125333333333333333

Run 2:
30 map tasks, each map task sampling 20000 points
hadoop jar /usr/local/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.3.6.jar pi 30 20000
output: Estimated value of Pi is 3.14164666666666666667

Conclusion
The more map tasks and samples per task, the more accurate the estimated value of pi.
### Pseudo-Distributed
Set up Hadoop in Pseudo-Distributed mode.
Submit your:
1. Hadoop configuration files
2. start-all.sh output
3. jps output
4. netstat output
5. ports in use

(a format/start/verify command sketch follows the four configuration files below)

#### core-site.xml
```
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://localhost:9000</value>
</property>
</configuration>
```

#### hdfs-site.xml
Example values:
dfs.replication: 1
dfs.namenode.name.dir: file:///home/hadoop/hadoopdata/hdfs/namenode
dfs.datanode.data.dir: file:///home/hadoop/hadoopdata/hdfs/datanode
```
<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>file:///home/hadoop/hadoopdata/hdfs/namenode</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>file:///home/hadoop/hadoopdata/hdfs/datanode</value>
</property>
</configuration>
```
#### mapred-site.xml
Example values:
mapreduce.framework.name: yarn
yarn.app.mapreduce.am.env: HADOOP_MAPRED_HOME=$HADOOP_HOME
mapreduce.map.env:HADOOP_MAPRED_HOME=$HADOOP_HOME
mapreduce.reduce.env:HADOOP_MAPRED_HOME=$HADOOP_HOME
```
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
<property>
<name>yarn.app.mapreduce.am.env</name>
<value>HADOOP_MAPRED_HOME=$HADOOP_HOME</value>
</property>
<property>
<name>mapreduce.map.env</name>
<value>HADOOP_MAPRED_HOME=$HADOOP_HOME</value>
</property>
<property>
<name>mapreduce.reduce.env</name>
<value>HADOOP_MAPRED_HOME=$HADOOP_HOME</value>
</property>
</configuration>
```
#### yarn-site.xml
YARN configuration:
name: yarn.nodemanager.aux-services
value: mapreduce_shuffle
```
<configuration>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
</configuration>
```
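
With the four configuration files above in place, a minimal sketch of the usual format / start / verify sequence (paths assume HADOOP_HOME=/usr/local/hadoop; the netstat filter is just one way to list the listening ports):
```bash=
# one-time: format the NameNode (this creates the NameNode metadata directory configured above)
/usr/local/hadoop/bin/hdfs namenode -format
# start HDFS and YARN
/usr/local/hadoop/sbin/start-all.sh
# verify the daemons are up
jps
# list the ports the Hadoop JVMs are listening on
netstat -tlnp | grep java
```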
### Homework - hadoop-mapreduce-examples (grep)
hadoop jar /usr/local/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.3.6.jar grep ~/input ~/output 'dfs[a-z.]+'
Checking the usage of the grep example:

```bash=
hadoop jar /usr/local/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.3.6.jar grep
```

```bash=
ls ~/output/
```
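
For reference, a sketch of the full workflow around the grep example above, following the pattern in the Hadoop quickstart (assuming the bundled config XML files are used as sample input, the job writes to the local filesystem as the ls ~/output/ above suggests, and ~/output does not exist yet):
```bash=
# prepare some input files
mkdir -p ~/input
cp /usr/local/hadoop/etc/hadoop/*.xml ~/input
# run the grep example: count strings matching the regex
hadoop jar /usr/local/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.3.6.jar grep ~/input ~/output 'dfs[a-z.]+'
# inspect the result
cat ~/output/part-r-00000
```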

### Workshop - NameNode web interface (overview)

- NameNode: web URL and screenshot
http://10.167.218.125:9870/dfshealth.html#tab-overview

- Live Nodes: web URL and a screenshot of the Volume Directory
Find the Http Address field and click it (replace the hostname with the IP address)
http://10.167.218.125:9864/datanode.html

- NameNode Storage: find the Storage Directory

- Recall how this directory was created and which command created it originally (see the sketch below)
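
A small sketch for the last two items: read the configured storage directory and recall that it is created when the NameNode is formatted (paths and property names as configured earlier in this note):
```bash=
# show the directory backing "NameNode Storage" on the web UI
hdfs getconf -confKey dfs.namenode.name.dir
# this directory was created by the initial format command:
# hdfs namenode -format
ls ~/hadoopdata/hdfs/namenode/current
```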

### NameNode Logs
- Task - log locations:
On the Utilities page, find the URLs of the following logs.

Resource Manager log
http://10.167.218.125:9864/logs/hadoop-hadoop-resourcemanager-kao.log
Node Manager log
http://10.167.218.125:9864/logs/hadoop-hadoop-nodemanager-kao.log
NameNode log
http://10.167.218.125:9864/logs/hadoop-hadoop-namenode-kao.log
DataNode log
http://10.167.218.125:9864/logs/hadoop-hadoop-datanode-kao.log
- Task - Resource Manager log:
```bash=
2024-03-02 16:33:50,753 INFO org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: STARTUP_MSG: Starting ResourceManager
2024-03-02 16:33:51,208 INFO org.apache.hadoop.conf.Configuration: found resource core-site.xml at file:/usr/local/hadoop/etc/hadoop/core-site.xml
2024-03-02 16:33:51,357 INFO org.apache.hadoop.conf.Configuration: found resource yarn-site.xml at file:/usr/local/hadoop/etc/hadoop/yarn-site.xml
2024-03-02 16:33:51,488 INFO org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Using Scheduler: org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler
2024-03-02 16:33:52,212 INFO org.apache.hadoop.conf.Configuration: found resource capacity-scheduler.xml at file:/usr/local/hadoop/etc/hadoop/capacity-scheduler.xml
2024-03-02 16:33:56,579 INFO org.apache.hadoop.ipc.Server: Listener at 0.0.0.0:8033
2024-03-02 16:33:57,591 INFO org.apache.hadoop.ipc.Server: Listener at 0.0.0.0:8031
2024-03-02 16:33:58,463 INFO org.apache.hadoop.ipc.Server: Listener at 0.0.0.0:8032
2024-03-02 16:34:00,025 INFO org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Transitioned to active state
```
1. Which configuration files were loaded while the ResourceManager was starting up? List the file names.
core-site.xml 2024-03-02 16:33:51,208
yarn-site.xml 2024-03-02 16:33:51,357
capacity-scheduler.xml 2024-03-02 16:33:52,212
2. After starting, which ports is the ResourceManager listening on?
8033 2024-03-02 16:33:56,579
8031 2024-03-02 16:33:57,591
8032 2024-03-02 16:33:58,463
3. From the log, how do you know the ResourceManager has successfully transitioned to the active state?
The line "ResourceManager: Transitioned to active state" at 2024-03-02 16:34:00,025
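
A sketch of pulling those three answers straight out of the log file with grep (the path is assumed from the standard log directory and the URLs above; adjust the file name to your host):
```bash=
LOG=/usr/local/hadoop/logs/hadoop-hadoop-resourcemanager-kao.log
# 1. configuration files loaded at startup
grep 'found resource' "$LOG"
# 2. ports the ResourceManager listens on
grep 'Listener at' "$LOG"
# 3. confirmation it became active
grep 'Transitioned to active state' "$LOG"
```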
### YARN web UI
http://10.167.218.144:8088/cluster


### HDFS
- Scenario:
As a data engineer, you are using a Pseudo-Distributed Hadoop cluster to process and analyze text data.
Your task is to use Hadoop's wordcount example application to count how often each word appears in a batch of text files.
To organize this processing task, you decide to create a directory structure named project_wordcount, keep the input text files in the raw_data directory, and save the wordcount results to the processed_data directory.
- Task:
1. Upload the steps you ran
2. Upload the results from the processed_data directory
3. Paste the link to the App submitted on YARN (see the yarn application sketch below)
- Hints:
1. The local data to process (sample_text.txt):
```
Hadoop is an open-source framework that allows to store and process big data in a distributed environment.
Hadoop enables data processing over large clusters of computers.
```
- Expected output:
```
hdfs dfs -cat /project_wordcount/processed_data/part-r-00000
Hadoop 2
a 1
allows 1
an 1
and 1
big 1
clusters 1
computers. 1
data 2
distributed 1
enables 1
environment. 1
framework 1
in 1
is 1
large 1
of 1
open-source 1
over 1
process 1
processing 1
store 1
that 1
to 1
```
```bash=
hdfs dfs -mkdir -p /project_wordcount/rawdata
# upload the local sample_text.txt first (assumed to be in the current directory)
hdfs dfs -put sample_text.txt /project_wordcount/rawdata/
hdfs dfs -ls /project_wordcount/rawdata
hadoop jar /usr/local/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.3.6.jar wordcount /project_wordcount/rawdata/sample_text.txt /project_wordcount/processed_data
```
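
To grab the YARN application link asked for in the task, one option is to list applications from the command line; the web UI link has the form http://<resourcemanager>:8088/cluster/app/<application id> (a sketch):
```bash=
# list finished applications; the output includes each application's tracking URL
yarn application -list -appStates FINISHED
```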


### Hadoop Pseudo-Distributed error counts
- Scenario
In a fast-paced, data-driven business environment, keeping Hadoop services healthy and stable is critical. The operations team often needs to analyze system logs in depth to identify and prevent potential problems. To make this process more efficient, you have been tasked with using Hadoop itself to automate the analysis.
- Task
Your company wants you to analyze the running state of the Hadoop services to assess their health. As a preparation step, we first need to extract basic status information from the system logs. Specifically, you should:
1. Use the first log file on the Utilities -> Logs page as the input data source.
2. Use the grep example to count the total number of log entries that start with "ERROR".
3. If there are no "ERROR" entries, look for "WARN" and then "INFO" entries, in that order.
- Expected uploads
1. Log file: the original log file you selected.
2. Reduce command: the MapReduce program or script used to process the log file.
3. Output file: the analysis output showing the counts of "ERROR", "WARN", and "INFO" entries.
Procedure
```
# write the log lines of interest into a text file
cat /usr/local/hadoop/logs/hadoop-hadoop-datanode-master.log | grep ERROR > datanode_log_ERROR.txt
# create the HDFS directory
hdfs dfs -mkdir -p /user/hadoop/project_wordcount/HW_raw_data
# check the HDFS directory
hdfs dfs -ls /user/hadoop/project_wordcount/HW_raw_data
# put the file into HDFS
hdfs dfs -put /usr/local/hadoop/logs/datanode_log_ERROR.txt /user/hadoop/project_wordcount/HW_raw_data
# run the grep example to count 'ERROR'
hadoop jar /usr/local/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.3.6.jar grep /user/hadoop/project_wordcount/HW_raw_data /user/hadoop/project_wordcount/HW_processed_data 'ERROR'
# print the result
hdfs dfs -cat /user/hadoop/project_wordcount/HW_processed_data/part-r-00000
```
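
As a quick sanity check of the MapReduce result, the same counts can be taken locally with grep -c, using the same plain-text filters as the pipeline above (log path assumed from above):
```bash=
LOG=/usr/local/hadoop/logs/hadoop-hadoop-datanode-master.log
grep -c ERROR "$LOG"   # number of ERROR entries (same filter as above)
grep -c WARN  "$LOG"   # fallback: WARN entries
grep -c INFO  "$LOG"   # fallback: INFO entries
```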


### Fully-Distributed-Topology
Bind hostnames to IP addresses:
Configure this on all three hosts (Master, Slave1, Slave2):
ssh hadoop@10.167.218.144
```
sudo vi /etc/hosts
10.167.218.144 master
10.167.218.181 slave1
10.167.218.151 slave2
```
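
After editing /etc/hosts on all three machines, a quick sketch to confirm that the names resolve and that passwordless SSH works from the master (hostnames as defined above):
```bash=
for host in master slave1 slave2; do
    ping -c 1 "$host"      # name resolution / reachability
    ssh "$host" hostname   # passwordless login and the remote hostname
done
```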
Teacher's public key
```
ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABgQC8ev4KiOYZJXpL8rnHY1a9BQIUeJTdPQRTWG/xdekqx96CpQTJcHYlBh6NZ9J2Rbwy5ngtoM+dqsLGa4OSJN67wCqypP6V2f7cg7PTD+ljv3JXo3kd6EE1jpH8SgehOBELpJocfX8qJhhAWBFlKPWSXwNnSpeLqMD9fgn8BGvahsSxjD0oxjv8X8nSc/ncWVdAszRBtabQIlQHnELuWBpJjITiZMDGV0ZWeS0DxCaVvHRSpV3z6kqDpG50VrO9vPQN5vKO9nmVa+0tkP+BHD8d2MxlRXoRe/qyUFYCnyvqy70F23X5ZA40d7tcwQkU8QnNjxE/a5/w50v7/s4zAPEJJGmUtaz7+g1+bIFC4GkvxAghT/Pvyy42ZciTaOP3LgVP3o6wtG/mIrw1U/0cFVic+V+MQIiqG7fMEBJ3fBcwcY8rRI/bsi/Zw6tfJY654lBcsdeiEk6gri1/9qYgD8ZaakcqDP7ev9ij5MTIbofzmvs6VLxBcje4xdvPwYVIGd0= hadoop@ntc
```
master public key
```
ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABgQC9YJrONsV/jzpUnnpzmTKwwLZkTVR1tNAadSM/fMP1q0JiEedFphXnzMbrdzrUrs/YzwbhLwSrFdEuQQH/2DZ5ECdnXH0XCT8zVSFD1ua+0oVS9GFv6PHhm6884VLuOOnf6mUFe5z3Bc21TPtae038uj/ptSMHXXeQLa6RqXcXQqjrX5Qolu45qOCafo8MumJ4HuiUPA6Jm48qFkSMUhNao9CPtenf7SddTzEIKbLRhG+26Uf4CeZaMdk5mRkeUi1wWVIBz0bwPoMb0N+KtEsORkMM5X3j+kXZ53yCze8aoN6bdmuFFqg2kmU7di8hq/iTL4RiCIEkbp4oN8APHuuv9od2AjtM+COShLSDwzhOTaHAHItAXxQiwmcLWwDNTYJyDP7KIdi5mylx+HmHL7XdSB68rBmkfITeQ9QvWF2EZU9nyi2KtAVE1eiuiPuIs+UWvMHIOcmYG2jDAfIjAmEG7cVbm/ADKLdAxnf7Q9iHAdZUunoWV6HNhawZ+XX0eu0= hadoop@kao
```
slave1 public key
```
ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABgQDEzXiiSGW8A2VYq+YZJoY6VrK18Jt1oaI+X8nep/KHGx1trcDa9xF2YHGsdPUJyxtP9JhPANLaqt/crU2tjvUwRDVxgI0hvjmwuVVSYtdqCy3qBaTgVU6LodWLUK9AtiKfiEUDUXyuVP5vGG6RG4aDGDWUTTMWTVJgW2G5aT9g0riAOQ7YHppOM0ddlg5ega948cz7L/kGOUwX7yoTp+PTPDu+bUHpKIV+S1s/enDotoisw3tR4lFvN8Zys+E57BpuFVtm3WU2j1j8OysHmLxrQWAd/L3RKSB8gd9DbFfwKetEm78Y+RBdYrdZfWvK44JYb6my/RN5juLfifNOWCsslg9otczSxI66ki63si0YM07OBZ/0KmGq+QKz6nKSPPOJkpP+z89lBj3ZSjJBm2gLbwc0eNbzoQ9QXxt0nM9XaE6AgiH+uxcdkiOReyfXEejGJQCRZuxhtBHCdCEcXUEyseXOliVr3y8ubyD8nz3Eq8PWGaIMW73RnQQWMJJ4yWU= hadoop@slave1
```
slave2 public key
```
ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABgQDCHGPkb2Zk0Pdj348EgNb9ztfkXvt3G9ZSP0X1SLOj2o6WrUVqqASXWs9AwkMESsHkJ5bH1ttPtAIN+MWZLwTRSwqhFIpPGX7rAe2H8XUcy5hncLCDzNpauHe4kS7q6ge4i1vrn5AZFR+srqln6wd1oYjPsOW87FsPh63crgameH3jXB5ix5TeEtO/s2ovIjYTkZemw4IPIdZs6QFDUWg8GUwHTounm0SD+ELGeSON0pa9Nwhh3zIj9oFvDFaCo9Z9hMXJmHIxtezNDkzSDPJ66OhrRPhF1mysQ5PG68WH6wC6jy/anQaxIKJJjOXFW8TLKEYqQrM3D4vJRTthNLfGMENcL5usEMYEh+BkfGvjzVkVM1VosRvJ3t7T7jyo8ePcYvjjnNll5f+Cx7txOLjpr8ZIuEWkec7HtjyRXFTTisr9sVWDTuzApkJ/8mGKhlGB8PT9P7wIW3kQRO+wvuPnLqp/kw/fNCHnEDYX9MGVCaagVSVdvMq+0CXptbNX/78= hadoop@slave2
```
### Fully-Distributed Deploy (Single-Host)
#### core-site.xml
vim $HADOOP_HOME/etc/hadoop/core-site.xml
```
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://master:9000</value>
</property>
<property>
<name>fs.tmp.dir</name>
<value>/usr/local/hadoop/tmp</value>
</property>
</configuration>
```
```
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. See accompanying LICENSE file.
-->
<!-- Put site-specific property overrides in this file. -->
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://master:9000</value>
</property>
<property>
<name>fs.tmp.dir</name>
<value>/usr/local/hadoop/tmp</value>
</property>
</configuration>
```
#### hdfs-site.xml
vim $HADOOP_HOME/etc/hadoop/hdfs-site.xml
```
<configuration>
<property>
<name>dfs.replication</name>
<value>2</value>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>/usr/local/hadoop/hdfs/name</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>/usr/local/hadoop/hdfs/data</value>
</property>
</configuration>
```
```
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and
limitations under the License. See accompanying LICENSE file.
-->
<configuration>
<property>
<name>dfs.replication</name>
<value>2</value>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>/usr/local/hadoop/hdfs/name</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>/usr/local/hadoop/hdfs/data</value>
</property>
</configuration>
```
#### yarn-site.xml
vim $HADOOP_HOME/etc/hadoop/yarn-site.xml
```
<configuration>
<property>
<name>yarn.resourcemanager.hostname</name>
<value>master</value>
</property>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.resourcemanager.resource-tracker.address</name>
<value>master:8031</value>
</property>
<property>
<name>yarn.resourcemanager.scheduler.address</name>
<value>master:8030</value>
</property>
<property>
<name>yarn.resourcemanager.address</name>
<value>master:8032</value>
</property>
<property>
<name>yarn.resourcemanager.admin.address</name>
<value>master:8033</value>
</property>
<property>
<name>yarn.resourcemanager.webapp.address</name>
<value>master:8088</value>
</property>
</configuration>
```

#### mapred-site.xml
vim $HADOOP_HOME/etc/hadoop/mapred-site.xml
This version will not run (the env values are missing the HADOOP_MAPRED_HOME= prefix):
```
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
<property>
<name>yarn.app.mapreduce.am.env</name>
<value>/usr/local/hadoop</value>
</property>
<property>
<name>mapreduce.map.env</name>
<value>/usr/local/hadoop</value>
</property>
<property>
<name>mapreduce.reduce.env</name>
<value>/usr/local/hadoop</value>
</property>
</configuration>
```
Change it to this instead:
```
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
<property>
<name>yarn.app.mapreduce.am.env</name>
<value>HADOOP_MAPRED_HOME=/usr/local/hadoop</value>
</property>
<property>
<name>mapreduce.map.env</name>
<value>HADOOP_MAPRED_HOME=/usr/local/hadoop</value>
</property>
<property>
<name>mapreduce.reduce.env</name>
<value>HADOOP_MAPRED_HOME=/usr/local/hadoop</value>
</property>
</configuration>
```
#### Specify the workers:
On every machine, create the following file to specify the workers.
Path: /usr/local/hadoop/etc/hadoop/workers
Edit it with vim /usr/local/hadoop/etc/hadoop/workers; its contents:
```
slave1
slave2
```
Deploy over SSH:
On the Master machine, copy the configuration files to slave1 and slave2 (a quick consistency check follows the block below).
```bash=
# slave1:
scp /usr/local/hadoop/etc/hadoop/* hadoop@slave1:/usr/local/hadoop/etc/hadoop/
# slave2:
scp /usr/local/hadoop/etc/hadoop/* hadoop@slave2:/usr/local/hadoop/etc/hadoop/
```
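
A small sketch to confirm the copies match, comparing checksums of one of the files on the master and on each slave (any of the *-site.xml files works):
```bash=
for host in slave1 slave2; do
    echo "== $host =="
    ssh "$host" md5sum /usr/local/hadoop/etc/hadoop/core-site.xml
done
md5sum /usr/local/hadoop/etc/hadoop/core-site.xml   # local value to compare against
```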

#### Problems encountered
master was missing the DataNode and NodeManager daemons
slave2 was missing the NodeManager daemon

#### Fix
1. Delete the old configuration/data directories and recreate them
2. stop-all.sh
3. Re-format the NameNode
4. Remove the files under /tmp
5. start-all.sh
6. watch jps: watch which daemons come up (see the command sketch below)
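
A sketch of that fix as concrete commands, matching the cleanup steps used later in this note (paths assume the configuration above; rm -rf is destructive, so double-check the paths):
```bash=
/usr/local/hadoop/sbin/stop-all.sh
# clear old HDFS data, temp files and logs on every node
rm -rf /tmp/*
rm -rf /usr/local/hadoop/tmp/* /usr/local/hadoop/logs/*
rm -rf /usr/local/hadoop/hdfs/name/* /usr/local/hadoop/hdfs/data/*
# re-format the NameNode on the master only
/usr/local/hadoop/bin/hdfs namenode -format
/usr/local/hadoop/sbin/start-all.sh
# watch which daemons come up
watch jps
```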
#### Expected result: work is dispatched to the slaves
master


slave1


slave2


Datanode(localhost:9870)

scp /usr/local/hadoop/etc/hadoop/* hadoop@wei:/usr/local/hadoop/etc/hadoop/
scp /usr/local/hadoop/etc/hadoop/* hadoop@slave1:/usr/local/hadoop/etc/hadoop/
scp ~/.ssh/authorized_keys hadoop@wei:~/.ssh/authorized_keys
scp ~/.ssh/authorized_keys hadoop@slave1:~/.ssh/authorized_keys
scp ~/.ssh/authorized_keys hadoop@slave2:~/.ssh/authorized_keys
/etc/hosts must also be identical on every node



### jps problems
stop-all.sh
pkill -9 java (cleans out any leftover Java processes so jps shows a clean state)
This also clears "port in use" errors when restarting.
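
If a daemon still refuses to start with a "port in use" error, a sketch for finding the leftover process (ss is assumed to be available; 9000 is just an example port):
```bash=
/usr/local/hadoop/sbin/stop-all.sh
pkill -9 java            # remove any leftover Hadoop JVMs
ss -tlnp | grep 9000     # check whether anything still holds the port
```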
### mapper
mapper.py
```python=
import sys

# read lines from standard input and emit "word,1" for every word
for line in sys.stdin:
    line = line.strip()  # strip leading/trailing whitespace
    words = line.split()
    for word in words:
        print(word + "," + "1")
```
```
echo "Deer Bear River Car Car River Deer Car Bear" | python3 mapper.py
```

### reducer
reducer.py
```python=
import sys

# collect all "word,count" pairs from standard input
line_input = []
for line in sys.stdin:
    line = line.strip()  # strip leading/trailing whitespace
    arr_line = line.split(",")
    line_input.append(arr_line)

# aggregate the counts per word
result = {}
for item in line_input:
    key = item[0]          # the word is the key
    count = int(item[1])   # the count, converted to an integer
    if key in result:
        result[key] += count  # key already seen: add to its count
    else:
        result[key] = count   # first occurrence of this key
# print the results
for key, value in result.items():
    print(f"{key},{value}")
```
```bash=
echo "Deer Bear River Car Car River Deer Car Bear" | python3 mapper.py | python3 reducer.py
```
### Testing the Python MapReduce on Hadoop
```
/usr/local/hadoop/bin/hadoop jar '/usr/local/hadoop/share/hadoop/tools/lib/hadoop-streaming-3.3.6.jar' \
-files ~/mapper.py,~/reducer.py \
-mapper 'python3 mapper.py' \
-reducer 'python3 reducer.py' \
-input hdfs:<path to input file> \
-output hdfs:<path to output>
```
Upload the input file:
```
echo "Deer Bear River Car Car River Deer Car Bear" > input.txt
hdfs dfs -ls /
hdfs dfs -mkdir /input
hdfs dfs -put input.txt /input/
hdfs dfs -ls /input
```
Run MapReduce
File locations:
/home/hadoop/mapper.py
/home/hadoop/reducer.py
```
/usr/local/hadoop/bin/hadoop jar '/usr/local/hadoop/share/hadoop/tools/lib/hadoop-streaming-3.3.6.jar' \
-files /home/hadoop/mapper.py,/home/hadoop/reducer.py \
-mapper 'python3 mapper.py' \
-reducer 'python3 reducer.py' \
-input hdfs:/input/input.txt \
-output hdfs:/result
```
Answer:
```
/usr/local/hadoop/bin/hadoop jar '/usr/local/hadoop/share/hadoop/tools/lib/hadoop-streaming-3.3.6.jar' \
-files /home/hadoop/workshop/mina/mapper.py,/home/hadoop/workshop/mina/reducer.py \
-mapper 'python3 mapper.py' \
-reducer 'python3 reducer.py' \
-input hdfs:/input/input.txt \
-output hdfs:/result_mina
```
```
hdfs dfs -ls /result_mina
hdfs dfs -cat /result_mina/part-00000
```
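
MapReduce refuses to overwrite an existing output directory, so if the streaming job has to be rerun, remove the old result first (a sketch, using the output path from above):
```bash=
hdfs dfs -rm -r /result_mina
```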

ssh hadoop@10.167.218.144
ssh hadoop@10.167.218.181
ssh hadoop@10.167.218.151
### Reconfiguring the slaves
Step 1: stop Hadoop
/usr/local/hadoop/sbin/stop-all.sh
Step 2: print your public key so it can be given to the others
cat ~/.ssh/id_rsa.pub
Step 3: add the other machines' public keys
echo '<public key of the other machine>' >> ~/.ssh/authorized_keys
Step 4: map each IP to an easy-to-recognize hostname (do this on the master and on every slave)
sudo vim /etc/hosts
Step 5: add the new slaves to the workers list, using the hostnames from the previous step
vim /usr/local/hadoop/etc/hadoop/workers
Step 6: remove temporary data, old state and logs, then format HDFS
rm -rf ~/hadoopdata/hdfs/*
rm -rf /tmp/*
rm -rf /usr/local/hadoop/tmp/*
rm -rf /usr/local/hadoop/logs/*
rm -rf /usr/local/hadoop/hdfs/name/*
rm -rf /usr/local/hadoop/hdfs/data/*
mkdir -p ~/hadoopdata/hdfs/{namenode,datanode}
/usr/local/hadoop/bin/hdfs namenode -format -y
Also run these two:
rm -rf ~/.ssh/known_hosts
pkill -9 java
Step 7: copy the configuration files to the other machines
scp /usr/local/hadoop/etc/hadoop/* hadoop@slave1:/usr/local/hadoop/etc/hadoop/
scp /usr/local/hadoop/etc/hadoop/* hadoop@slave2:/usr/local/hadoop/etc/hadoop/
scp /etc/hosts [user_name]@[server_name]:/etc/hosts
Note:
the workers file (in /usr/local/hadoop/etc/hadoop/) and authorized_keys also need to be copied to the slaves with scp
Step 8: start Hadoop
/usr/local/hadoop/sbin/start-all.sh
Note: only the master needs to run this (a quick verification sketch follows)
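
A short verification sketch for after start-all.sh, checking from the master that both slaves registered (standard HDFS/YARN admin commands):
```bash=
jps                     # daemons running on the master
hdfs dfsadmin -report   # DataNodes that joined HDFS
yarn node -list         # NodeManagers that joined YARN
```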