---
tags: Spark
title: Basic Setting
---
# Ubuntu 20.04.1 Hadoop 3.2.1 Spark 3.0.1 -- Single Node (Standalone)
---
[TOC]
---
## Ubuntu 20.04.1 LTS
[Website of Ubuntu Desktop](https://ubuntu.com/download/desktop)
[Download Link](https://ubuntu.com/download/desktop/thank-you?version=20.04.1&architecture=amd64)
## update
```
sudo apt-get update
sudo apt-get upgrade
sudo apt-get autoremove
```
## Hadoop 3.2.1
[Hadoop 3.2.1](https://www.apache.org/dyn/closer.cgi/hadoop/common/hadoop-3.2.1/hadoop-3.2.1.tar.gz)
[Download Link](https://downloads.apache.org/hadoop/common/hadoop-3.2.1/hadoop-3.2.1.tar.gz)
`wget https://downloads.apache.org/hadoop/common/hadoop-3.2.1/hadoop-3.2.1.tar.gz`
`sudo tar -xvf hadoop-3.2.1.tar.gz`
## Spark 3.0.1
[Spark 3.0.1](https://spark.apache.org/downloads.html)
[Download Link w/ hadoop 3.2 prebuild](https://downloads.apache.org/spark/spark-3.0.1/spark-3.0.1-bin-hadoop3.2.tgz)
`wget https://downloads.apache.org/spark/spark-3.0.1/spark-3.0.1-bin-hadoop3.2.tgz`
`sudo tar -xvf spark-3.0.1-bin-hadoop3.2.tgz`
## java
Check whether Java is already installed:
`java -version`
If not, install OpenJDK 8:
`sudo apt install openjdk-8-jre-headless`
`sudo apt install openjdk-8-jdk-headless`
Locate the Java installation:
`update-alternatives --display java`
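If more than one JDK ends up installed, you can pick OpenJDK 8 as the default interactively (a small sketch; the menu entries depend on what is installed on your machine):
```
# Select the java-8-openjdk-amd64 entry from the menu
sudo update-alternatives --config java
# Confirm the active version
java -version
```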
## ssh, pdsh
Install:
`sudo apt install ssh`
`sudo apt install pdsh`
Generate key:
```
ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
chmod 600 ~/.ssh/authorized_keys
```
try ssh:
`ssh localhost`
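If the key setup worked, the login should not ask for a password. A quick non-interactive check (assuming the default SSH port):
```
# BatchMode disables password prompts, so this fails instead of hanging if key auth is broken
ssh -o BatchMode=yes localhost true && echo "passwordless ssh OK"
```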
## folder setting
`cd /home/master/cluster`
The prompt changes from `pcdm@master:~$` to `pcdm@master:~/cluster$`.
`pcdm@master:~/cluster$ sudo mv hadoop-3.2.1 hadoop`
`pcdm@master:~/cluster$ sudo mv spark-3.0.1-bin-hadoop3.2 spark`
Be careful here: if you created a separate user with `adduser`, use that user's name in the `chown` commands below.
`pcdm@master:~/cluster$ sudo chown -R pcdm:master hadoop`
`pcdm@master:~/cluster$ sudo chown -R pcdm:master spark`
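A quick check that the ownership change took effect (expects the `pcdm` user and `master` group used above):
```
# hadoop/ and spark/ should now be owned by pcdm:master instead of root
ls -l ~/cluster
```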
## .bashrc setting
`sudo gedit ~/.bashrc`
```
export PDSH_RCMD_TYPE=ssh
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
export CLASSPATH=$JAVA_HOME/lib
export HADOOP_INSTALL=/home/master/cluster/hadoop
export PATH=$PATH:$HADOOP_INSTALL/bin
export PATH=$PATH:$HADOOP_INSTALL/sbin
export HADOOP_MAPRED_HOME=$HADOOP_INSTALL
export HADOOP_COMMON_HOME=$HADOOP_INSTALL
export YARN_HOME=$HADOOP_INSTALL
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_INSTALL/lib/native
export HADOOP_OPTS="-Djava.library.path=$HADOOP_INSTALL/lib"
export HADOOP_CLASSPATH=$(hadoop classpath)
export SPARK_HOME=/home/master/cluster/spark
export PATH=$PATH:$SPARK_HOME/bin
export PATH=$PATH:$SPARK_HOME/sbin
```
Make the `python` command point to Python 3:
`sudo ln -s /usr/bin/python3 /usr/bin/python`
Don't forget to reload the file:
`source ~/.bashrc`
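A few sanity checks that the variables resolve; these only work if the paths above match where the tarballs were actually extracted:
```
echo $JAVA_HOME           # /usr/lib/jvm/java-8-openjdk-amd64
hadoop version            # should report 3.2.1
spark-submit --version    # should report 3.0.1
```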
## Hadoop Setting (with Yarn)
core-site.xml:
`cd ~/cluster/hadoop/etc/hadoop/`
`sudo gedit core-site.xml`
```
<!-- location of the NameNode -->
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://localhost:9000</value>
</property>
</configuration>
```
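To confirm Hadoop is reading this file, you can ask it to echo the setting back (assumes `hdfs` is already on the PATH from the `.bashrc` step; no daemons need to be running):
```
# Should print hdfs://localhost:9000
hdfs getconf -confKey fs.defaultFS
```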
hdfs-site.xml:
Create the local directories that HDFS will use:
```
mkdir -p ~/cluster/hdfs/namenode ~/cluster/hdfs/datanode
```
Edit the file:
`cd ~/cluster/hadoop/etc/hadoop/`
`sudo gedit hdfs-site.xml`
```
<configuration>
<!-- number of HDFS block replicas -->
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
<!-- NameNode storage directory -->
<property>
<name>dfs.namenode.name.dir</name>
<value>file:/home/master/cluster/hdfs/namenode</value>
</property>
<!-- DataNode storage directory -->
<property>
<name>dfs.datanode.data.dir</name>
<value>file:/home/master/cluster/hdfs/datanode</value>
</property>
</configuration>
```
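Once HDFS is up and running (see Execution_hadoop below), a quick way to check that the single DataNode registered and reports capacity from the directory configured here:
```
# Expect exactly one live DataNode on this single-node setup
hdfs dfsadmin -report
```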
mapred-site.xml:
`sudo gedit mapred-site.xml`
```
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
<property>
<name>mapreduce.application.classpath</name>
<value>$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/*:$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/lib/*</value>
</property>
</configuration>
```
yarn-site.xml:
`sudo gedit yarn-site.xml`
```
<configuration>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.env-whitelist</name>
<value>JAVA_HOME,HADOOP_COMMON_HOME,HADOOP_HDFS_HOME,HADOOP_CONF_DIR,CLASSPATH_PREPEND_DISTCACHE,HADOOP_YARN_HOME,HADOOP_MAPRED_HOME</value>
</property>
</configuration>
```
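After YARN is started (see Execution_hadoop below), this should list exactly one RUNNING NodeManager; it is only a check, not part of the setup:
```
yarn node -list
```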
## Execution_hadoop
Format the filesystem (first run only):
`hdfs namenode -format`
Start dfs, yarn:
`start-dfs.sh`
`start-yarn.sh`
or `start-all.sh` (both Hadoop and Spark ship a script with this name; with the PATH order in `.bashrc` above, the Hadoop one is found first)
Stop dfs, yarn:
`stop-dfs.sh`
`stop-yarn.sh`
or `stop-all.sh`
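After `start-dfs.sh` and `start-yarn.sh`, `jps` should show NameNode, DataNode, SecondaryNameNode, ResourceManager and NodeManager, and the web UIs are at http://localhost:9870 (HDFS) and http://localhost:8088 (YARN) by default. A smoke test with the bundled MapReduce example (the jar path assumes the stock Hadoop 3.2.1 tarball layout used above):
```
jps
# Estimate pi with 2 map tasks x 5 samples each; runs on YARN per mapred-site.xml
hadoop jar $HADOOP_INSTALL/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.2.1.jar pi 2 5
```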
## Execution_spark
Master:
`start-master.sh`
Slaves:
`start-slaves.sh`
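With the master and workers up, the standalone master web UI is at http://localhost:8080 by default and accepts jobs on port 7077. A minimal smoke test with the bundled SparkPi example (the jar name assumes the Spark 3.0.1 / Scala 2.12 prebuilt package; if `localhost` is refused, use the `spark://...` URL shown at the top of the master web UI):
```
spark-submit \
  --master spark://localhost:7077 \
  --class org.apache.spark.examples.SparkPi \
  $SPARK_HOME/examples/jars/spark-examples_2.12-3.0.1.jar 10
```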
## How to turn off INFO logging in Spark
First, copy the template (run this inside the Spark directory):
`cp conf/log4j.properties.template conf/log4j.properties`
Then change
`log4j.rootCategory=INFO, console`
to
`log4j.rootCategory=WARN, console`
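The same edit as a one-liner, assuming the file was copied into the Spark `conf/` directory as above:
```
sed -i 's/^log4j.rootCategory=INFO, console/log4j.rootCategory=WARN, console/' $SPARK_HOME/conf/log4j.properties
```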