# Docker image with all required Big-Data tools

> Breaking your head trying to get all the big data tools set up?
> Well, here is a little surprise for you:

<hr>

## Required software:

* [Docker :link:](https://www.docker.com/products/docker-desktop/)
* [git :link:](https://git-scm.com/downloads)

<hr>

## Creating a container:

Please refer to the following post: [:link: click here]()

## First steps:

After you get the shell prompt `root@xxxxxx:/#`, run the following (do this only after creating the container for the first time):

<pre>init</pre>

---

# Testing the tools:

## * HADOOP:

> ## Start with the following:
>> check for directories already created:
>> <pre>hdfs dfs -ls /</pre>
>> create an input directory:
>> <pre>hdfs dfs -mkdir /input</pre>
>> transfer any text file from your PC, or use the command below to grab one of mine:
>> <pre>wget https://raw.githubusercontent.com/hermanumrao/c-tutorial/main/data_struct/tree_data_structure/AVL_trees.cpp</pre>
>> transfer this file to the Hadoop file system:
>> <pre>hdfs dfs -put AVL_trees.cpp /input/</pre>
>> write mapper and reducer functions to count the number of words; let's say you name them mapper.py and reducer.py (a minimal sketch of both scripts is given right after this section)
>> to test your mapper and reducer programs locally, run the following:
>> <pre>cat AVL_trees.cpp | python3 mapper.py | python3 reducer.py</pre>
>> then run the following to perform the same operation on HDFS:
>> <pre>chmod +x mapper.py
>> chmod +x reducer.py
>> hadoop jar /usr/local/hadoop/share/hadoop/tools/lib/hadoop-streaming-3.3.6.jar -input /input/AVL_trees.cpp -output /output -mapper /mapper.py -reducer /reducer.py</pre>
>> finally, check the output:
>> <pre>hdfs dfs -cat /output/part-00000</pre>
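>> in case you need a starting point, here is a minimal word-count sketch of the two scripts (a sketch only, not something shipped with the image: any mapper/reducer pair that reads stdin and writes tab-separated `word count` lines to stdout works with Hadoop Streaming):
>> <pre>#!/usr/bin/env python3
>> # mapper.py -- emit "word(tab)1" for every whitespace-separated word on stdin
>> import sys
>> for line in sys.stdin:
>>     for word in line.strip().split():
>>         print(f"{word}\t1")</pre>
>> <pre>#!/usr/bin/env python3
>> # reducer.py -- sum the counts emitted by mapper.py
>> # (uses a Counter, so it works whether or not the input is sorted by key)
>> import sys
>> from collections import Counter
>> counts = Counter()
>> for line in sys.stdin:
>>     line = line.strip()
>>     if not line:
>>         continue
>>     word, count = line.split("\t")
>>     counts[word] += int(count)
>> for word, total in counts.items():
>>     print(f"{word}\t{total}")</pre>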
## * HIVE:

>> start the Hive shell using:
>> <pre>hive</pre>
>> create a new database in the Hive shell:
>> <pre>create database if not exists testdatabase;</pre>
>> and now create a table:
>> <pre>CREATE TABLE customers (custId INT, fName STRING, lName STRING, city STRING) ROW FORMAT DELIMITED FIELDS TERMINATED BY '|' STORED AS TEXTFILE;</pre>
>> finally:
>> <pre>LOAD DATA INPATH '/user/' overwrite into table customers;</pre>
>> you can now quit Hive using `<ctrl>+c`

## * PIG:

>> run the entire block below to create a local data file:
>> <pre>echo "001,Rajiv,Reddy,21,9848022337,Hyderabad" > /tmp/student_details.txt
>> echo "002,siddarth,Battacharya,22,9848022338,Kolkata" >> /tmp/student_details.txt
>> echo "003,Rajesh,Khanna,22,9848022339,Delhi" >> /tmp/student_details.txt
>> echo "004,Preethi,Agarwal,21,9848022330,Pune" >> /tmp/student_details.txt
>> echo "005,Trupthi,Mohanthy,23,9848022336,Bhuwaneshwar" >> /tmp/student_details.txt
>> echo "006,Archana,Mishra,23,9848022335,Chennai" >> /tmp/student_details.txt
>> echo "007,Komal,Nayak,24,9848022334,trivendram" >> /tmp/student_details.txt
>> echo "008,Bharathi,Nambiayar,24,9848022333,Chennai" >> /tmp/student_details.txt</pre>
>> now let us start Pig in local mode:
>> <pre>pig -x local</pre>

## * SQOOP:

>> to check the Sqoop installation, type:
>> <pre>sqoop-version</pre>

## * POSTGRES:

>> to start Postgres:
>> <pre>service postgresql start
>> su postgres</pre>
>> now your prompt should look like `postgres@xxxxx`
>> type:
>> <pre>psql</pre>

## * POSTGRES & SQOOP connection:

> ## Import stuff into Postgres
>> open another terminal and start a new Docker shell in it for the same container (add sudo in front if required):
>> <pre>docker exec -it anyname bash</pre>
>> also make sure psql is running in a terminal, using the commands given previously
>> let's start by creating a new table in Postgres:
>> <pre>ALTER USER postgres PASSWORD 'your_strong_password';
>> CREATE DATABASE testdb;
>> \c testdb
>> create table stud1(id int, name text);
>> Select * from stud1;</pre>
>> as of now, nothing should be in the table
>> now let us create a text file on our local system:
>> <pre>echo "1,hello" > input.txt
>> echo "2,bye" >> input.txt
>> cat input.txt</pre>
>> let's add this file to HDFS:
>> <pre>hdfs dfs -put input.txt /tmp/</pre>
>> and finally, let us use Sqoop to push this file into PostgreSQL:
>> <pre>sqoop export --connect jdbc:postgresql://localhost:5432/testdb --username postgres --password your_strong_password --table stud1 --export-dir /tmp/input.txt</pre>
>> finally, to check whether the import into Postgres was successful, run this on the Postgres terminal:
>> <pre>Select * from stud1;</pre>

> ## To export stuff from Postgres
>> <pre>sqoop import --connect jdbc:postgresql://localhost:5432/testdb --username postgres --password your_strong_password --table stud1 --m 1
>> hdfs dfs -cat stud1/part-m-00000</pre>

## * FLUME:

## * KAFKA:

## * SPARK:

---

> ### Exiting container:
> to exit the container, just press `<ctrl>+d`

---

> ### Restarting the container:
> type the following on your local machine's terminal (add sudo if required)
> <pre>docker start anyname
> docker exec -it anyname bash</pre>

---

## Tips to Remember

* If you get permission errors while using Docker commands, add `sudo` before the commands.
* If you get an error that says the Docker daemon is not running, make sure you start Docker Desktop and try again using `docker start anyname`.
* To copy files from the current directory into the root directory of the container: <pre>docker cp ./filename anyname:/</pre>
* To copy files from the container to the current directory: <pre>docker cp anyname:/path/to/file .</pre>
* If you get an error that says a port is already allocated, type `docker ps`, check the running containers, and stop them using `docker stop anyname`.
* If you get an error that says the request returned an Internal Server Error, it means your Docker build was not successful. Make sure you run the Docker build command again.
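For example (assuming the container is named `anyname` as above and that the scripts sit in your current host directory; adjust paths to wherever your files actually are), you could copy the `mapper.py` and `reducer.py` scripts from the Hadoop section into the container, and pull a file back out, like this:

<pre># host -> container: drop the scripts into the container's root directory
docker cp ./mapper.py anyname:/
docker cp ./reducer.py anyname:/
# container -> host: copy a file back to the current host directory
docker cp anyname:/AVL_trees.cpp .</pre>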