# Installing the Hadoop Stack using Docker

Follow the instructions below to install Hadoop and other tools on Docker in a few simple steps. Make sure you do all these steps on your **host OS**, not inside a VM: Docker acts like a VM in itself, so a VM inside a VM doesn't make sense.

1. Download Git from [this website](https://git-scm.com/downloads) and install it. Just click `Next` for all steps in the installation.
2. Download and install Docker Desktop from [this website](https://www.docker.com/products/docker-desktop/). You might have to restart your PC after the installation.
3. Start the Docker Desktop application (which starts the Docker daemon in the background).

From here, you have two options: get a prebuilt image or build the image yourself.

---

## Using a prebuilt image

1. Open a new terminal on your PC, and run the command for your CPU architecture to pull the appropriate image from Docker Hub:

```bash
# x86-64 (Intel/AMD CPUs)
docker pull silicoflare/hadoop:amd

# ARM (Apple Silicon Macs, M series)
docker pull silicoflare/hadoop:arm
```

---

## Building the image yourself

1. Open a new terminal on your PC, and run the following command to clone my installation repository. Make sure you run only the command for your CPU architecture:

```bash
# x86-64 (Intel/AMD CPUs)
git clone -b amd --single-branch https://github.com/silicoflare/docker-hadoop

# ARM (Apple Silicon Macs, M series)
git clone -b arm --single-branch https://github.com/silicoflare/docker-hadoop
```

2. Navigate into the newly created directory:

```bash
cd docker-hadoop
```

3. Start the Docker build of the image. This process can take anywhere from 15 to 30 minutes depending on your internet speed. You might also have to use `sudo` if permission errors arise.

```bash
docker build -t hadoop .
```

4. Wait for the build to finish.

---

## Using the image

Once you have built or pulled the image, it is time to create a container from it. The following command creates a container, maps the required ports, and opens a terminal inside the container. Make sure you replace `SRN` in the command with your SRN in caps.

```bash
docker run -it -p 9870:9870 -p 8088:8088 -p 9864:9864 --name SRN hadoop
```

(Ports 9870, 8088 and 9864 are the Hadoop 3.x web UIs for the HDFS NameNode, the YARN ResourceManager and the DataNode, respectively.)

Once the container is created and a shell like this shows up:

```
root@6aaa78189146:/#
```

type `init` and press Enter. This stops all running processes, formats the HDFS NameNode, and starts all processes again. After everything completes, type `jps` and check that around 7 processes are listed. This completes the installation of all tools required for the Big Data course. Just type `exit` to leave the container once you are done.

From then on, to reopen the container, just open Docker Desktop, open a new terminal, and run:

```bash
docker start -ai SRN
```

Once the container's shell opens, just type `restart` to restart all processes.

# Tips to Remember

- If you get permission errors while running Docker commands, prefix them with `sudo`.
- If you get an error that says `docker daemon is not running`, make sure Docker Desktop is started and try again.
- To copy files from the current directory into the root directory of the container:

```bash
docker cp ./filename SRN:/
```

- To copy files from the container to the current directory:

```bash
docker cp SRN:/path/to/file .
```

- If you get an error that says `port already allocated`, run `docker ps`, check which containers are running, and stop them.
- If you get an error that says `request returned Internal Server Error`, your Docker build was not successful. Run the Docker build command again.
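As a worked example that ties the two `docker cp` tips together, here is one way to move a file into the container and load it into HDFS. This is a sketch with placeholder names: `data.csv` is a hypothetical file in your current directory, `/input` is an arbitrary HDFS path, and `SRN` stands for your container name.

```bash
# On the host, while the container exists (created with docker run above):
docker cp ./data.csv SRN:/

# Inside the container's shell (docker start -ai SRN):
hdfs dfs -mkdir -p /input        # create a directory in HDFS
hdfs dfs -put /data.csv /input/  # upload the copied file into HDFS
hdfs dfs -ls /input              # confirm the file arrived
```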
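One more convenience, not part of the original steps: if the container is already running (for example, in another terminal) and you want a second shell inside it, `docker exec` opens one without restarting anything. This assumes `bash` is available in the container, which is true for typical Debian/Ubuntu-based Hadoop images.

```bash
# Open an additional interactive shell in the already-running container
docker exec -it SRN bash
```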
If you still have doubts after this, just contact me by email at [suraj.b.m555@gmail.com](mailto:suraj.b.m555@gmail.com), or add a comment on this page.

---

## Testing the tools

The latest versions of the following tools are all preinstalled in this image:

- hdfs
- pig
- hbase
- hive
- flume-ng
- sqoop
- zookeeper
- spark
- kafka
- postgresql
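If you want to verify the tools yourself, the version checks below are one quick way to do it from inside the container. This is a sketch that assumes each tool's standard launcher script is on the container's `PATH`; if a particular command is missing in this image, look in that tool's installation directory instead.

```bash
# Print version info for each tool (run inside the container's shell).
hdfs version
pig -version
hbase version
hive --version
flume-ng version
sqoop version
spark-submit --version
psql --version

# ZooKeeper and Kafka are easiest to check via jps: if the image starts
# them as daemons, look for QuorumPeerMain (ZooKeeper) and Kafka there.
```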