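# Installing Hadoop
## Environment
#### Installation environment:
- We will install Hadoop on Ubuntu 22.04 Server
- Installing it inside VirtualBox is recommended
- This makes it easy to set up multiple nodes during class
#### Ubuntu download: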
- [Get Ubuntu Server](https://ubuntu.com/download/server)
#### VirtualBox settings (minimum requirements):
- Storage: 20 GB
- RAM: 2 GB
## Installation:
#### SSH key pair login:
- The nodes need to log in to one another using SSH key pairs
- [How To Configure SSH Key-Based Authentication on a Linux Server](https://www.digitalocean.com/community/tutorials/how-to-configure-ssh-key-based-authentication-on-a-linux-server)
#### Installing Java:
- Hadoop is built with Java, so the Java runtime environment has to be installed first
```bash=
sudo apt update
sudo apt-get upgrade
sudo apt install default-jdk
```
- Verify your Java version
```bash=
$ java -version
openjdk version "11.0.22" 2024-01-16
OpenJDK Runtime Environment (build 11.0.22+7-post-Ubuntu-0ubuntu222.04.1)
OpenJDK 64-Bit Server VM (build 11.0.22+7-post-Ubuntu-0ubuntu222.04.1, mixed mode, sharing)
```
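If you would rather check the Java version in a script than by eye, a minimal sketch could parse the first line of `java -version`. The helper name `java_major` is hypothetical, and the parsing assumes the `version "X.Y.Z"` format shown in the output above.

```bash
# Hypothetical helper: extract the major version from a line such as
#   openjdk version "11.0.22" 2024-01-16
# Assumes the quoted version format shown above; Java 8 reports "1.8.0_x".
java_major() {
    local version
    version=$(printf '%s\n' "$1" | sed -n 's/.*version "\([^"]*\)".*/\1/p')
    case "$version" in
        1.*) printf '%s\n' "$version" | cut -d. -f2 ;;  # legacy 1.x scheme
        *)   printf '%s\n' "$version" | cut -d. -f1 ;;
    esac
}

# java -version prints to stderr, so redirect it before reading the first line.
if command -v java >/dev/null 2>&1; then
    first_line=$(java -version 2>&1 | head -n 1)
    echo "Java major version: $(java_major "$first_line")"
fi
```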
#### Installing SSH:
```bash=
sudo apt install ssh
```
#### Adding the hadoop user:
```bash=
sudo adduser hadoop
```
#### Making hadoop a sudoer:
- [visudo: configuring which users can use sudo](https://dchesmis.blogspot.com/2018/05/visudosudo.html)
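As an alternative to editing the sudoers file by hand, Ubuntu's default configuration grants sudo rights to members of the `sudo` group, so adding the user to that group is usually enough. The sketch below assumes that default configuration and is run from an account that already has sudo rights; the helper names `add_sudoer` and `is_sudoer` are hypothetical.

```bash
# Hypothetical helper: add a user to Ubuntu's "sudo" group.
# Assumes the default sudoers file, which grants sudo to that group.
add_sudoer() {
    sudo usermod -aG sudo "$1"
}

# Hypothetical helper: succeed if the user's groups include "sudo".
is_sudoer() {
    id -nG "$1" | tr ' ' '\n' | grep -qx sudo
}
```

For example, `add_sudoer hadoop && is_sudoer hadoop && echo "hadoop can use sudo"`. The new group membership applies to the user's next login session.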
#### Switching to the hadoop user:
```bash=
su - hadoop
```
#### Generating the SSH key:
```bash=
ssh-keygen -t rsa
```
#### Configuring SSH login permissions:
```bash=
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
chmod 640 ~/.ssh/authorized_keys
```
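Re-running the `cat ... >>` command above appends the same key again each time. A minimal sketch of an idempotent variant follows; the helper name `authorize_key` is hypothetical, and the stricter `600` mode is an assumption that differs from the `640` used above.

```bash
# Hypothetical helper: append a public key to an authorized_keys file
# only if that exact line is not already present.
authorize_key() {
    local pubkey_file=$1
    local auth_file=$2
    mkdir -p "$(dirname "$auth_file")"
    touch "$auth_file"
    # -x: match whole lines, -F: fixed strings, -f: read patterns from file
    if ! grep -qxF -f "$pubkey_file" "$auth_file"; then
        cat "$pubkey_file" >> "$auth_file"
    fi
    # Assumption: owner-only access (600) instead of the 640 used above.
    chmod 600 "$auth_file"
}
```

For example, `authorize_key ~/.ssh/id_rsa.pub ~/.ssh/authorized_keys` can be run repeatedly without creating duplicate entries.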
#### SSH to the localhost (should not ask for a password):
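The interactive check is simply `ssh localhost`, which should log in without prompting for a password. For a scripted check, one possible sketch uses OpenSSH's `BatchMode` option so that `ssh` fails instead of prompting; the helper name `can_ssh_without_password` is hypothetical.

```bash
# Hypothetical helper: succeed only if SSH to the host works without a
# password prompt. BatchMode=yes makes ssh fail instead of asking.
can_ssh_without_password() {
    ssh -o BatchMode=yes -o ConnectTimeout=5 "$1" true
}
```

For example, `can_ssh_without_password localhost && echo "passwordless login OK"`. Batch mode does not answer the host-key confirmation either, so connect once interactively with `ssh localhost` first.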
#### Download Hadoop:
- Download Hadoop
- [Apache Hadoop official site](https://hadoop.apache.org/releases.html)
```bash=
$ wget https://dlcdn.apache.org/hadoop/common/hadoop-<version>/hadoop-<version>.tar.gz
```
#### Decompression:
- Extract the archive
```bash=
$ tar -xzvf hadoop-<version>.tar.gz
```
#### Move the Hadoop binaries:
- Move the extracted files to a fixed directory
```bash=
$ sudo mv hadoop-<version> /usr/local/hadoop
```
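The `<version>` placeholder in the download, extract, and move commands can be filled in from a single shell variable. The sketch below is a dry run that only prints the resulting commands; `3.3.6` is an assumption taken from the `hadoop version` output shown later in this guide, and the variable names are hypothetical.

```bash
# Assumption: 3.3.6 is used only because it is the version that appears in
# this guide's `hadoop version` output; substitute the release you chose.
HADOOP_VERSION=3.3.6

# Names derived from the version, matching the <version> placeholders above.
HADOOP_DIR="hadoop-${HADOOP_VERSION}"
HADOOP_TARBALL="${HADOOP_DIR}.tar.gz"
HADOOP_URL="https://dlcdn.apache.org/hadoop/common/${HADOOP_DIR}/${HADOOP_TARBALL}"

# Dry run: print the three commands instead of executing them.
echo "wget ${HADOOP_URL}"
echo "tar -xzvf ${HADOOP_TARBALL}"
echo "sudo mv ${HADOOP_DIR} /usr/local/hadoop"
```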
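- Run the Hadoop you downloaded to check that it works
- If you see `JAVA_HOME is not set and could not be found`, follow the sections below, from `Configuring Hadoop’s Java Home` through the step that edits the Hadoop configuration file (`hadoop-env.sh`)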
```
$ /usr/local/hadoop/bin/hadoop
ERROR: JAVA_HOME is not set and could not be found.
```
## Configuring Hadoop’s Java Home
- Hadoop needs to know where your Java runtime environment is
#### Find your Java path:
```bash=
$ readlink -f /usr/bin/java | sed "s:bin/java::"
/usr/lib/jvm/java-11-openjdk-amd64/
```
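The `sed` command above leaves a trailing slash on the path. If you want the value without it, one possible sketch strips the last two path components with `dirname` instead; the helper name `java_home_from_binary` is hypothetical.

```bash
# Hypothetical helper: given the resolved path of the java binary
# (.../bin/java), print the directory two levels up, with no trailing slash.
java_home_from_binary() {
    dirname "$(dirname "$1")"
}

# Resolve the symlink chain behind the java command, if Java is installed.
if command -v java >/dev/null 2>&1; then
    java_home_from_binary "$(readlink -f "$(command -v java)")"
fi
```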
#### Set the environment variables:
- Open ~/.bashrc
```bash=
vi ~/.bashrc
```
- Append the following at the end
```bash=
export JAVA_HOME=/usr/lib/jvm/java-11-openjdk-amd64
export HADOOP_HOME=/usr/local/hadoop
export HADOOP_INSTALL=$HADOOP_HOME
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export HADOOP_YARN_HOME=$HADOOP_HOME
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export PATH=$PATH:$HADOOP_HOME/sbin:$HADOOP_HOME/bin
export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib/native"
```
- Load the environment variables
```bash=
source ~/.bashrc
```
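After reloading `~/.bashrc`, a quick way to confirm that the variables were exported is to look each one up with `printenv`. This is a minimal sketch; the helper name `missing_vars` is hypothetical, and the three names checked are just a sample of the variables set above.

```bash
# Hypothetical helper: print the names of the listed variables that are
# not exported in the current environment.
missing_vars() {
    local name
    for name in "$@"; do
        if ! printenv "$name" >/dev/null; then
            echo "$name"
        fi
    done
}

missing=$(missing_vars JAVA_HOME HADOOP_HOME HADOOP_COMMON_LIB_NATIVE_DIR)
if [ -n "$missing" ]; then
    echo "Not set: $missing"
else
    echo "All variables are set"
fi
```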
#### Edit your Hadoop configuration file:
```bash=
$ vi $HADOOP_HOME/etc/hadoop/hadoop-env.sh
```
- Set `JAVA_HOME` in this file:
```
export JAVA_HOME=/usr/lib/jvm/java-11-openjdk-amd64
```
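If you prefer not to edit the file in `vi`, the same setting could be written from the shell. The sketch below replaces an existing active `export JAVA_HOME=` line or appends one when none is present; the helper name `set_java_home` is hypothetical, and `sed -i` assumes GNU sed as shipped with Ubuntu.

```bash
# Hypothetical helper: point the active "export JAVA_HOME=..." line in the
# given file at the given directory, appending the line if none exists.
set_java_home() {
    local env_file=$1
    local java_home=$2
    if grep -q '^export JAVA_HOME=' "$env_file"; then
        # "|" is the sed delimiter because the path itself contains slashes.
        sed -i "s|^export JAVA_HOME=.*|export JAVA_HOME=${java_home}|" "$env_file"
    else
        echo "export JAVA_HOME=${java_home}" >> "$env_file"
    fi
}
```

For example, `set_java_home "$HADOOP_HOME/etc/hadoop/hadoop-env.sh" /usr/lib/jvm/java-11-openjdk-amd64`. Commented-out lines such as `# export JAVA_HOME=` are left untouched.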
## Running Hadoop
```bash=
$ /usr/local/hadoop/bin/hadoop version
Hadoop 3.3.6
Source code repository https://github.com/apache/hadoop -r abe5358143720085498613d399be3bbf01e0f131
Compiled by ubuntu on 2022-03-20T01:18Z
Compiled with protoc 2.5.0
From source with checksum 39bb14faec14b3aa25388a6d7c345fe8
This command was run using /usr/local/hadoop/share/hadoop/common/hadoop-common-3.3.6.jar
```
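## Workshop - Installing Hadoop
#### Task:
- Now install Hadoop on your own Ubuntu machine!
- Please upload your:
    1. Environment variables (output of `env | grep HADOOP`)
    2. Java version
    3. Hadoop version
    4. Screenshot of the passwordless SSH login
    5. Your SSH public and private keys (once you are working in industry, remember never to share your private key publicly!)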
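To gather the first three items in one go, a small script along these lines could write them to a text file. This is only a sketch: the file name `hadoop-workshop-report.txt` and the `REPORT` variable are hypothetical, and the SSH keys are deliberately left out of the report.

```bash
# Hypothetical script: collect workshop items 1-3 into one text file.
# SSH keys are deliberately NOT collected here.
REPORT=${REPORT:-hadoop-workshop-report.txt}

{
    echo "== Environment variables =="
    env | grep HADOOP || echo "(no HADOOP variables set)"
    echo "== Java version =="
    java -version 2>&1 || echo "(java not found)"
    echo "== Hadoop version =="
    hadoop version 2>&1 || echo "(hadoop not found)"
} > "$REPORT"

echo "Wrote $REPORT"
```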
#### If you have any problems, please provide:
1. All of the results from the task above
2. Your IP address / username
3. Be ready to receive my public key