# Docker image with all required Big-Data tools
> Breaking your head trying to get all the big data tools set up?
> Well, here is a little surprise for you:
<hr>
## Required software:
* [Docker:link: ](https://www.docker.com/products/docker-desktop/)
* [git:link: ](https://git-scm.com/downloads)
<hr>
## Creating a container:
Please refer to the following post: [:link: click here]()
## First steps:
After you get the shell prompt `root@xxxxxx:/#`, run the following (only the first time, right after creating the container):
<pre>init</pre>
---
# Testing the tools:
## * HADOOP:
> ## Start with the following:
>> check for the directories that already exist:
>> <pre>hdfs dfs -ls /</pre>
>> create an input directory:
>> <pre>hdfs dfs -mkdir /input </pre>
>> transfer any file from your PC, or use the command below to fetch one of mine:
>> <pre>wget https://raw.githubusercontent.com/hermanumrao/c-tutorial/main/data_struct/tree_data_structure/AVL_trees.cpp</pre>
>> transfer this file to the Hadoop file system:
>> <pre>hdfs dfs -put AVL_trees.cpp /input/ </pre>
>> write mapper and reducer scripts to count the number of words;
>> let's say you name them mapper.py and reducer.py (a minimal sketch is given below)
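>> here is a minimal word-count sketch of the two scripts (just one possible implementation; the shebang lines matter because the scripts are executed directly):
>> <pre>#!/usr/bin/env python3
>> # mapper.py - emit "word\t1" for every whitespace-separated token read from stdin
>> import sys
>>
>> for line in sys.stdin:
>>     for word in line.strip().split():
>>         print(word + "\t1")</pre>
>> <pre>#!/usr/bin/env python3
>> # reducer.py - sum the counts per word; expects its input sorted by word
>> # (Hadoop's shuffle phase does the sorting for you; locally, pipe through `sort` first)
>> import sys
>>
>> current_word, current_count = None, 0
>> for line in sys.stdin:
>>     line = line.strip()
>>     if not line:
>>         continue
>>     word, count = line.split("\t", 1)
>>     if word == current_word:
>>         current_count += int(count)
>>     else:
>>         if current_word is not None:
>>             print(current_word + "\t" + str(current_count))
>>         current_word, current_count = word, int(count)
>>
>> if current_word is not None:
>>     print(current_word + "\t" + str(current_count))</pre>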
>> to test your mapper and reducer locally, run the following (the `sort` step stands in for Hadoop's shuffle phase):
>> <pre>cat AVL_trees.cpp | python3 mapper.py | sort | python3 reducer.py</pre>
>> then run the following to perform the same operation on HDFS:
>> <pre>chmod +x mapper.py
>> chmod +x reducer.py
>> hadoop jar /usr/local/hadoop/share/hadoop/tools/lib/hadoop-streaming-3.3.6.jar -input /input/AVL_trees.cpp -output /output -mapper /mapper.py -reducer /reducer.py</pre>
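>> if the job complains that it cannot find the scripts, one variant worth trying (an adjustment to the command above, not part of the original walkthrough) is to ship them with the job via the streaming `-file` option and refer to them by name:
>> <pre>hadoop jar /usr/local/hadoop/share/hadoop/tools/lib/hadoop-streaming-3.3.6.jar -file /mapper.py -file /reducer.py -input /input/AVL_trees.cpp -output /output -mapper mapper.py -reducer reducer.py</pre>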
>> finally check for the output:
>> <pre>hdfs dfs -cat /output/part-00000</pre>
## * HIVE:
>> start the hive shell by using:
>> <pre>hive</pre>
>> create a new database in the Hive shell:
>> <pre>create database if not exists testdatabase;</pre>
>> switch to the new database and create a table in it:
>> <pre>use testdatabase;
>> CREATE TABLE customers (custId INT, fName STRING, lName STRING, city STRING) ROW FORMAT DELIMITED FIELDS TERMINATED BY '|' STORED AS TEXTFILE;</pre>
>> finally, load data from HDFS into the table:
>> <pre>LOAD DATA INPATH '/user/' OVERWRITE INTO TABLE customers;</pre>
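>> to confirm the load worked, you can run a quick query, for example:
>> <pre>SELECT * FROM customers LIMIT 5;</pre>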
>> you can now quit Hive with `<ctrl>+c` (or by typing `quit;`)
## * PIG:
>> run the commands below to create a sample local file:
>> <pre>echo "001,Rajiv,Reddy,21,9848022337,Hyderabad" > /tmp/student_details.txt
>> echo "002,siddarth,Battacharya,22,9848022338,Kolkata" >> /tmp/student_details.txt
>> echo "003,Rajesh,Khanna,22,9848022339,Delhi" >> /tmp/student_details.txt
>> echo "004,Preethi,Agarwal,21,9848022330,Pune" >> /tmp/student_details.txt
>> echo "005,Trupthi,Mohanthy,23,9848022336,Bhuwaneshwar" >> /tmp/student_details.txt
>> echo "006,Archana,Mishra,23,9848022335,Chennai" >> /tmp/student_details.txt
>> echo "007,Komal,Nayak,24,9848022334,trivendram" >> /tmp/student_details.txt
>> echo "008,Bharathi,Nambiayar,24,9848022333,Chennai" >> /tmp/student_details.txt</pre>
>> now start Pig in local mode:
>> <pre>pig -x local</pre>
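>> once you get the `grunt>` prompt, here is a minimal sketch of loading and viewing the file created above (the schema is just an assumption based on the sample rows):
>> <pre>student_details = LOAD '/tmp/student_details.txt' USING PigStorage(',')
>>     AS (id:int, firstname:chararray, lastname:chararray, age:int, phone:chararray, city:chararray);
>> DUMP student_details;</pre>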
## * SQOOP:
>> to check the Sqoop installation, type:
>> <pre>sqoop-version</pre>
## * POSTGRES:
>> to start PostgreSQL and switch to the postgres user:
>> <pre>service postgresql start
>> su postgres</pre>
>> now your prompt should start with `postgres@xxxxx`
>> type:
>> <pre>psql</pre>
## * POSTGRES & SQOOP connection:
> ## Import data into Postgres
>> open another terminal on your machine and attach a second shell to the same container (add sudo in front if required):
>> <pre>docker exec -it anyname bash</pre>
>> also make sure psql is running in another terminal, using the commands given previously
>> let's start by creating a new table in Postgres:
>> <pre>ALTER USER postgres PASSWORD 'your_strong_password';
>> CREATE DATABASE testdb;
>> \c testdb
>> CREATE TABLE stud1(id INT, name TEXT);
>> SELECT * FROM stud1;
>> </pre>
>> at this point the table should be empty
>> now create a text file on the local filesystem inside the container:
>> <pre>echo "1,hello" > input.txt
>> echo "2,bye" >> input.txt
>> cat input.txt</pre>
>> let's add this file to HDFS:
>> <pre>hdfs dfs -put input.txt /tmp/</pre>
>> and finally, use Sqoop to push this file from HDFS into PostgreSQL (a Sqoop export):
>> <pre>sqoop export --connect jdbc:postgresql://localhost:5432/testdb --username postgres --password your_strong_password --table stud1 --export-dir /tmp/input.txt</pre>
>> finally, to check whether the transfer to Postgres was successful,
>> run the following on the Postgres terminal:
>> <pre>SELECT * FROM stud1;</pre>
> ## To export data from Postgres to HDFS
>> <pre>sqoop import --connect jdbc:postgresql://localhost:5432/testdb --username postgres --password your_strong_password --table stud1 -m 1
>>
>> hdfs dfs -cat stud1/part-m-00000</pre>
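>> (Sqoop writes the imported data to a directory named after the table under your HDFS home directory, which is why the relative path `stud1/part-m-00000` works here)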
## * FLUME:
## * KAFKA:
## * SPARK:
---
> ### Exiting container:
> to exit the container, just press `<ctrl>+d`
---
> ### Restarting the container:
> type the following in your local machine's terminal (add sudo if required):
> <pre>docker start anyname
> docker exec -it anyname bash</pre>
---
## Tips to Remember
* If you get permission errors while using Docker commands, prefix them with `sudo`.
* If you get an error that says the Docker daemon is not running, make sure you start Docker Desktop and try again using `docker start anyname`.
* To copy files from current directory into the root directory of the container:
<pre>docker cp ./filename anyname:/</pre>
* To copy files from container to current directory:
<pre>docker cp anyname:/path/to/file .</pre>
* If you get an error that says port already allocated, type `docker ps`, check the running containers and stop them using `docker stop anyname`.
* If you get an error that says request returned Internal Server Error, it means your Docker build was not successful. Make sure you run the Docker build command again.