# Installing the Hadoop Stack using Docker
Follow the instructions below to install Hadoop and the other course tools on Docker in a few simple steps.
Make sure you do all these steps on your **host OS**, not inside a VM: Docker already behaves much like a VM in itself, so nesting it inside another VM adds needless overhead.
1. Download Git from [this website](https://git-scm.com/downloads) and install it. Just click `Next` for all steps in the installation.
2. Download and install Docker Desktop from [this website](https://www.docker.com/products/docker-desktop/). You might have to restart your PC after the installation.
3. Start the Docker Desktop application (which starts the Docker process in the background).
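Before going further, it's worth confirming that Docker is actually up. The `hello-world` image is Docker's own test image, so this check is safe to run:
```bash
docker --version        # prints the installed Docker client version
docker run hello-world  # pulls Docker's test image and prints a greeting if the daemon is running
```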
From here, you have two options: get a prebuilt image or build the image yourself.
---
## Using a prebuilt image
1. Open a new terminal on your PC, and run the command matching your CPU architecture to pull the appropriate image from Docker Hub:
```bash
# amd64 (Intel / AMD x86_64 CPUs)
docker pull silicoflare/hadoop:amd
# arm64 (Apple Silicon, e.g. Mac M series)
docker pull silicoflare/hadoop:arm
```
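To confirm the pull succeeded, list your local images; an entry for `silicoflare/hadoop` with the tag you pulled (`amd` or `arm`) should appear. If it does, you can skip straight to the **Using the image** section below.
```bash
docker images silicoflare/hadoop  # shows repository, tag, image ID, and size
```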
---
## Build the image yourself
1. Open a new terminal on your PC, and type the following command to clone my installation repository. Make sure you run only the command matching your CPU architecture.
```bash
# amd64 (Intel / AMD x86_64 CPUs)
git clone -b amd --single-branch https://github.com/silicoflare/docker-hadoop
# arm64 (Apple Silicon, e.g. Mac M series)
git clone -b arm --single-branch https://github.com/silicoflare/docker-hadoop
```
2. Navigate into the newly created directory:
```bash
cd docker-hadoop
```
3. Start the Docker build of the image. Remember, this process can take anywhere from 15 to 30 minutes depending on your internet speed. You might also need `sudo` if any permission errors arise.
```bash
docker build -t hadoop .
```
4. Wait for the build to finish.
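As a quick check that the build succeeded, the image should now appear in your local list under the `hadoop` tag passed to `-t`:
```bash
docker images hadoop  # should show a "hadoop" repository with the "latest" tag
```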
---
## Using the image
Once you have built or pulled the image, it is time to create a container from it. The following command creates a container, maps the required ports, and opens a terminal inside the container. Make sure you replace `SRN` in the command with your SRN in caps.
```bash
docker run -it -p 9870:9870 -p 8088:8088 -p 9864:9864 --name SRN hadoop
```
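For reference, the mapped ports match Hadoop 3's default web interfaces: 9870 for the HDFS NameNode UI, 8088 for the YARN ResourceManager UI, and 9864 for the DataNode UI. Once the daemons are running (after the `init` step below), a quick reachability check from the host might look like this (port numbers assume Hadoop 3 defaults):
```bash
curl -s -o /dev/null -w "NameNode UI: %{http_code}\n" http://localhost:9870
curl -s -o /dev/null -w "ResourceManager UI: %{http_code}\n" http://localhost:8088
```
A `200` on each line means the UI is also reachable from your browser at the same address.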
Once the container is created, a shell like this shows up:
```
root@6aaa78189146:/#
```
Type `init` and press Enter. This stops any running processes, formats the HDFS NameNode, and starts all the processes again. After everything completes, type `jps` and check that around seven processes are listed.
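For reference, a typical `jps` listing on a single-node Hadoop setup looks roughly like this; the process IDs and the exact set of daemons will vary with the image:
```
2101 NameNode
2233 DataNode
2410 SecondaryNameNode
2587 ResourceManager
2701 NodeManager
2899 JobHistoryServer
3012 Jps
```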
This completes the installation of all the tools required for the Big Data course. Just type `exit` to leave the container once you are done.
The next time you want to reopen the container, just open Docker Desktop, open a new terminal, and type the following:
```bash
docker start -ai SRN
```
Once the container's shell opens, just type `restart` to restart all the processes.
## Tips to Remember
- If you get permission errors while using Docker commands, use `sudo` before the commands
- If you get an error that says `docker daemon is not running`, make sure you start Docker Desktop and try again
- To copy files from the current directory into the root directory of the container (see the end-to-end example after this list):
```bash
docker cp ./filename SRN:/
```
- To copy files from the container to the current directory:
```bash
docker cp SRN:/path/to/file .
```
- If you get an error that says `port already allocated`, run `docker ps` to see which containers are holding the ports, then stop the conflicting one with `docker stop <container-name>`.
- If you get an error that says `request returned Internal Server Error`, your Docker build most likely did not complete successfully. Run the Docker build command again.
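As an end-to-end sketch of the copy workflow referenced above: the file name `data.csv` is purely illustrative, and `SRN` stands for your container's name, as before.
```bash
# On the host: copy a local file into the container's root directory
docker cp ./data.csv SRN:/

# Inside the container's shell: load the file into HDFS and verify
# (assumes the HDFS daemons are running after `init`)
hdfs dfs -put /data.csv /
hdfs dfs -ls /

# Back on the host: copy a file out of the container
docker cp SRN:/data.csv .
```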
If you still have doubts after this, just contact me at my email: [suraj.b.m555@gmail.com](mailto:suraj.b.m555@gmail.com), or add a comment on this page.
---
## Testing the tools
The latest versions of the following tools are all preinstalled in this image:
- hdfs
- pig
- hbase
- hive
- flume-ng
- sqoop
- zookeeper
- spark
- kafka
- postgresql
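To smoke-test the stack, you can print each tool's version from the container shell. These are the standard version commands for each tool; the Kafka and ZooKeeper launch scripts live in install-specific directories, so they are omitted here:
```bash
hdfs version            # Hadoop / HDFS
pig -version            # Apache Pig
hbase version           # Apache HBase
hive --version          # Apache Hive
flume-ng version        # Apache Flume
sqoop version           # Apache Sqoop
spark-submit --version  # Apache Spark
psql --version          # PostgreSQL client
```
If each command prints a version banner without errors, the corresponding tool is installed and on the `PATH`.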