# Hadoop Installation

## Environment

#### Installation environment:
- We will install Hadoop on Ubuntu 22.04 Server
- We recommend installing it in VirtualBox
- This makes it easy to set up multiple nodes in class

#### Ubuntu download:
- [Get Ubuntu Server](https://ubuntu.com/download/server)

#### VirtualBox settings (minimum requirements):
- Storage: 20 GB
- RAM: 2 GB

## Installation:

#### SSH key pair login:
- The nodes need to log in to one another using SSH key pairs
- [How To Configure SSH Key-Based Authentication on a Linux Server](https://www.digitalocean.com/community/tutorials/how-to-configure-ssh-key-based-authentication-on-a-linux-server)

#### Installing Java:
- Hadoop is built on Java, so install a Java runtime environment first
```bash=
sudo apt update
sudo apt upgrade
sudo apt install default-jdk
```
- Verify your Java version
```bash=
$ java -version
openjdk version "11.0.22" 2024-01-16
OpenJDK Runtime Environment (build 11.0.22+7-post-Ubuntu-0ubuntu222.04.1)
OpenJDK 64-Bit Server VM (build 11.0.22+7-post-Ubuntu-0ubuntu222.04.1, mixed mode, sharing)
```

#### Install SSH:
```bash=
sudo apt install ssh
```

#### Add the hadoop user:
```bash=
sudo adduser hadoop
```

#### Add hadoop as a sudoer:
- [visudo: configuring which users can use sudo (in Chinese)](https://dchesmis.blogspot.com/2018/05/visudosudo.html)

#### Switch to the hadoop user:
```bash=
su - hadoop
```

#### Generate an SSH key:
```bash=
ssh-keygen -t rsa
```
<kbd>![image](https://hackmd.io/_uploads/rk6ZcJep6.png) </kbd>

#### Set SSH login permissions:
```bash=
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
# sshd(8) recommends that authorized_keys be readable and writable by the owner only
chmod 600 ~/.ssh/authorized_keys
```

#### SSH to the localhost (should not ask for a password):
- Run `ssh localhost`; the login should succeed without a password prompt

<kbd>![image](https://hackmd.io/_uploads/rkJ_qyep6.png) </kbd>

#### Download Hadoop:
- Download Hadoop
- [Apache Hadoop official site](https://hadoop.apache.org/releases.html)
- Replace `<version>` with the release number you chose on the releases page
```bash=
wget https://dlcdn.apache.org/hadoop/common/hadoop-<version>/hadoop-<version>.tar.gz
```

#### Decompression:
- Extract the installation archive
```bash=
tar -xzvf hadoop-<version>.tar.gz
```

#### Move Hadoop binary file:
- Move the extracted directory to a fixed location
```bash=
sudo mv hadoop-<version> /usr/local/hadoop
```
- Run it to check that the Hadoop you downloaded works
- If you see `JAVA_HOME is not set and could not be found`
- follow the `Configuring Hadoop’s Java Home` section below, through the step that edits `hadoop-env.sh`
```
$ /usr/local/hadoop/bin/hadoop
ERROR: JAVA_HOME is not set and could not be found.
```

## Configuring Hadoop’s Java Home
- Hadoop needs to know where your Java runtime is installed

#### Find your Java path:
```bash=
$ readlink -f /usr/bin/java | sed "s:bin/java::"
/usr/lib/jvm/java-11-openjdk-amd64/
```

#### Set environment variables:
- Open ~/.bashrc
```bash=
vi ~/.bashrc
```
- Append the following at the end
```bash=
export JAVA_HOME=/usr/lib/jvm/java-11-openjdk-amd64
export HADOOP_HOME=/usr/local/hadoop
export HADOOP_INSTALL=$HADOOP_HOME
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export HADOOP_YARN_HOME=$HADOOP_HOME
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export PATH=$PATH:$HADOOP_HOME/sbin:$HADOOP_HOME/bin
export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib/native"
```
- Load the environment variables
```bash=
source ~/.bashrc
```

#### Edit your Hadoop configuration file:
```bash=
vi $HADOOP_HOME/etc/hadoop/hadoop-env.sh
```
- Set JAVA_HOME to the path found above
```
export JAVA_HOME=/usr/lib/jvm/java-11-openjdk-amd64
```

## Running Hadoop
- Sample output; the revision, build date, and checksum lines vary with the release you installed
```bash=
$ /usr/local/hadoop/bin/hadoop version
Hadoop 3.3.6
Source code repository https://github.com/apache/hadoop -r abe5358143720085498613d399be3bbf01e0f131
Compiled by ubuntu on 2022-03-20T01:18Z
Compiled with protoc 2.5.0
From source with checksum 39bb14faec14b3aa25388a6d7c345fe8
This command was run using /usr/local/hadoop/share/hadoop/common/hadoop-common-3.3.6.jar
```

## Workshop - Install Hadoop

#### Exercise:
- Now install Hadoop on your own Ubuntu machine!
- Please upload your:
    1. Environment variables: (screenshot of the `env | grep HADOOP` output)
    2. Java version
    3. Hadoop version
    4. Screenshot of the passwordless SSH login
    5. Your SSH public and private keys (once you are working in industry, remember never to share your private key publicly!)

#### If you have any problems, please provide:
1. All of the results from the exercise above
2. Your IP address / username
3. Be ready to receive my public key
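
The text outputs requested in the workshop (items 1 to 3, plus a non-interactive check for item 4) could be gathered with a small script like the one below. This is only a sketch: the function names `section`, `hadoop_env`, and `collect_report`, the `--run` argument, and the report layout are all made up for illustration and are not part of Hadoop. It assumes `java`, `hadoop`, and `ssh` are on your `PATH` and that you have already logged in to `localhost` once so its host key is known. It does not replace the screenshots the exercise asks for.

```bash
#!/bin/sh
# Sketch: print the workshop outputs as one text report.
# All function names here are illustrative, not part of Hadoop.

# Print a section title, then run the given command if it can be found.
section() {
  title=$1
  shift
  echo "== ${title} =="
  if command -v "$1" >/dev/null 2>&1; then
    "$@" 2>&1   # java -version writes to stderr, so merge it into stdout
  else
    echo "(command not found: $1)"
  fi
  echo
}

# Print every environment variable that mentions HADOOP.
hadoop_env() {
  env | grep HADOOP || echo "(no HADOOP variables set)"
}

collect_report() {
  section "Environment variables" hadoop_env
  section "Java version" java -version
  section "Hadoop version" hadoop version
  # BatchMode=yes makes ssh fail instead of prompting for a password.
  section "Passwordless SSH" ssh -o BatchMode=yes -o ConnectTimeout=5 localhost echo "SSH login OK"
}

# Only produce the report when asked, so the file can also be sourced.
if [ "${1:-}" = "--run" ]; then
  collect_report
fi
```

Saved under a name of your choice, for example `collect.sh`, it could be run as `sh collect.sh --run > report.txt`. A missing tool shows up as a `(command not found: ...)` line rather than stopping the script, which makes it easier to see which installation step was skipped.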