# Project 5 cmd

**Hint:** nano filename.java starts the editor; Control+O saves the file and Control+X exits nano.

**Copy and paste (transfer data between your machine and the cluster):**
$ sftp userID@heinz-jumbo.heinz.cmu.local
sftp> put MaxTemperature.java
sftp> get MaxTemperature.java

**Copy an existing directory and its contents to a new directory (-R: recursive):**
cp -R source_dir destination_dir

**List files on HDFS:**
$ hadoop dfs -ls /user/student169/input/

**Place an input file on HDFS for subsequent MapReduce processing:**
$ hadoop dfs -copyFromLocal /home/userID/input/1902.txt /user/userID/input/1902.txt

**View the content of a file on HDFS:**
$ hadoop dfs -cat /user/userID/input/testFile

**Remove a local directory of compiled Java classes (a compiled class lives in a subdirectory matching its Java package name):**
$ rm -r temperature_classes

**Copy a file within the local file system:**
cp /home/public/WordCount.java /home/student169/WordCount.java

**Copy a file from the local file system to HDFS:**
hadoop dfs -copyFromLocal /home/public/words.txt /user/student169/input/words.txt

## Start

**Connect to the cluster:**
ssh -l student169 heinz-jumbo.heinz.cmu.local
Password: Lehaoh77++

## Task0

```
# Create the task folders
cd Project5
mkdir Part_1
cd Part_1
mkdir Task0
cd Task0

# Create a directory for the compiled Java classes
mkdir task0_classes

# Copy WordCount.java into the task directory
cp /home/public/WordCount.java /home/student169/Project5/Part_1/Task0/WordCount.java

# Compile
javac -classpath /usr/local/hadoop/hadoop-core-1.2.1.jar:./task0_classes -d task0_classes WordCount.java

# Generate the jar file
jar -cvf task0_classes.jar -C task0_classes/ .

# Copy words.txt to HDFS
hadoop dfs -copyFromLocal /home/public/words.txt /user/student169/input/words.txt

# Remove any existing output directory (the job fails if it already exists)
hadoop dfs -rmr /user/student169/output

# Execute the MapReduce job
hadoop jar /home/student169/Project5/Part_1/Task0/task0_classes.jar org.myorg.WordCount /user/student169/input/words.txt /user/student169/output

# Check the input file
hadoop dfs -ls /user/student169/input/

# Check the output files
hadoop dfs -ls /user/student169/output/

# Check the output
hadoop dfs -cat /user/student169/output/part-r-00000
hadoop dfs -cat /user/student169/output/part-r-00001
hadoop dfs -cat /user/student169/output/part-r-00002

# Merge the output into a local file
hadoop dfs -getmerge /user/student169/output /home/student169/Project5/Part_1/Task0/Task0Output

# Check the merged output
cat ~/Project5/Part_1/Task0/Task0Output
```

## Task1

```bash!
# TODO: modify the Java file (no need to sort)

# Create the task folders
cd Project5
mkdir Part_1
cd Part_1
mkdir Task1
cd Task1

# Create a directory for the compiled Java classes
mkdir lettercounter_classes

# Create LetterCounter.java (a rough sketch of this file appears after this block)
nano LetterCounter.java
# 1. paste the code
# 2. Control+O
# 3. Enter
# 4. Control+X

# Compile
javac -classpath /usr/local/hadoop/hadoop-core-1.2.1.jar:./lettercounter_classes -d lettercounter_classes LetterCounter.java

# If compilation fails, remove the old files and start over
rm LetterCounter.java
rm lettercount.jar

# Generate the jar file
jar -cvf lettercount.jar -C lettercounter_classes/ .
```
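LetterCounter.java is typed in by hand above. For reference only, here is a minimal sketch of what a letter-counting job might look like, assuming it is simply a WordCount variant that emits one count per alphabetic character. The package and class name (org.myorg.LetterCounter) come from the execute command below; everything else is an assumption, not the actual assignment code.

```java
// Hypothetical sketch of LetterCounter.java (new mapreduce API, which matches the
// part-r-NNNNN output file names used in these notes). The real assignment code may differ.
package org.myorg;

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class LetterCounter {

    public static class LetterMapper extends Mapper<Object, Text, Text, IntWritable> {
        private final static IntWritable one = new IntWritable(1);
        private final Text letter = new Text();

        @Override
        public void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            // Emit (letter, 1) for every alphabetic character on the line.
            for (char c : value.toString().toLowerCase().toCharArray()) {
                if (Character.isLetter(c)) {
                    letter.set(String.valueOf(c));
                    context.write(letter, one);
                }
            }
        }
    }

    public static class SumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        private final IntWritable result = new IntWritable();

        @Override
        public void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();
            }
            result.set(sum);
            context.write(key, result);
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = new Job(conf, "letter count");   // Job.getInstance() is preferred on newer Hadoop
        job.setJarByClass(LetterCounter.class);
        job.setMapperClass(LetterMapper.class);
        job.setCombinerClass(SumReducer.class);
        job.setReducerClass(SumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```

The cluster appears to run three reduce tasks, which is why the commands below read part-r-00000 through part-r-00002.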
```bash!
# Remove any existing output directory
hadoop dfs -rmr /user/student169/output

# Execute the MapReduce job
hadoop jar /home/student169/Project5/Part_1/Task1/lettercount.jar org.myorg.LetterCounter /user/student169/input/words.txt /user/student169/output

# Check the output of the MapReduce job
hadoop dfs -cat /user/student169/output/part-r-00000
hadoop dfs -cat /user/student169/output/part-r-00001
hadoop dfs -cat /user/student169/output/part-r-00002

# Merge the output into a local file
hadoop dfs -getmerge /user/student169/output /home/student169/Project5/Part_1/Task1/Task1Output

# Check the output file
cat Task1Output

# Sort the merged file by count (descending) and save it to a different file
sort -k 2nr Task1Output > Task1OutputTest

# Delete the original file and rename the sorted one
rm Task1Output
mv Task1OutputTest Task1Output
cat Task1Output
```

## Task2

```bash!
# TODO: rename the Java file

# Under the Part_1 folder, create the Task2 folder
mkdir Task2
cd Task2

# Create a directory for the compiled Java classes
mkdir searchfact_classes

# Create SearchFact.java
nano SearchFact.java
# 1. paste the code
# 2. Control+O
# 3. Enter
# 4. Control+X

# Compile
javac -classpath /usr/local/hadoop/hadoop-core-1.2.1.jar:./searchfact_classes -d searchfact_classes SearchFact.java

# If compilation fails, remove the old files and start over
rm SearchFact.java
rm searchfact.jar

# Generate the jar file
jar -cvf searchfact.jar -C searchfact_classes/ .

# Remove any existing output directory
hadoop dfs -rmr /user/student169/output

# Execute the MapReduce job
hadoop jar /home/student169/Project5/Part_1/Task2/searchfact.jar org.myorg.SearchFact /user/student169/input/words.txt /user/student169/output

# Check the output of the MapReduce job
hadoop dfs -cat /user/student169/output/part-r-00000
hadoop dfs -cat /user/student169/output/part-r-00001
hadoop dfs -cat /user/student169/output/part-r-00002

# Merge the output into a local file
hadoop dfs -getmerge /user/student169/output /home/student169/Project5/Part_1/Task2/Task2Output

# Check the output file
cat Task2Output
```

## Task3

```bash!
# Under the Part_1 folder, create the Task3 folder
mkdir Task3
cd Task3

# Copy MaxTemperature.java, MaxTemperatureMapper.java and MaxTemperatureReducer.java into the task directory
# (a sketch of the textbook mapper these are based on appears after this block)
cp /home/public/MaxTemperature.java /home/student169/Project5/Part_1/Task3/MaxTemperature.java
cp /home/public/MaxTemperatureMapper.java /home/student169/Project5/Part_1/Task3/MaxTemperatureMapper.java
cp /home/public/MaxTemperatureReducer.java /home/student169/Project5/Part_1/Task3/MaxTemperatureReducer.java

# Create a directory for the compiled Java classes
mkdir temperature_classes

# Compile the three files
javac -classpath /usr/local/hadoop/hadoop-core-1.2.1.jar:./temperature_classes -d temperature_classes MaxTemperatureMapper.java
javac -classpath /usr/local/hadoop/hadoop-core-1.2.1.jar:./temperature_classes -d temperature_classes MaxTemperatureReducer.java
javac -classpath /usr/local/hadoop/hadoop-core-1.2.1.jar:./temperature_classes -d temperature_classes MaxTemperature.java

# Generate the jar file
jar -cvf temperature.jar -C temperature_classes/ .
```
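The MaxTemperature classes are copied from /home/public rather than written by hand. For orientation only, here is the well-known textbook-style mapper that this kind of assignment is usually based on, written against the old mapred API (consistent with the part-00000 output names in this task) and the fixed-width NCDC weather-record offsets; the actual course files may differ.

```java
// Hypothetical sketch of MaxTemperatureMapper.java in the classic textbook style;
// the package name is taken from the execute command below, the field offsets are
// the standard NCDC ones and are an assumption about the course data.
package edu.cmu.andrew.mm6;

import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;

public class MaxTemperatureMapper extends MapReduceBase
        implements Mapper<LongWritable, Text, Text, IntWritable> {

    private static final int MISSING = 9999;  // NCDC code for a missing reading

    @Override
    public void map(LongWritable key, Text value,
                    OutputCollector<Text, IntWritable> output, Reporter reporter)
            throws IOException {
        String line = value.toString();
        String year = line.substring(15, 19);          // year field of the fixed-width record
        int airTemperature;
        if (line.charAt(87) == '+') {                  // parseInt does not accept a leading '+'
            airTemperature = Integer.parseInt(line.substring(88, 92));
        } else {
            airTemperature = Integer.parseInt(line.substring(87, 92));
        }
        String quality = line.substring(92, 93);
        if (airTemperature != MISSING && quality.matches("[01459]")) {
            output.collect(new Text(year), new IntWritable(airTemperature));
        }
    }
}
```

The matching reducer just keeps the maximum reading per year, and Task 4's MinTemperature variant replaces that maximum with a minimum (Math.max becomes Math.min and Integer.MIN_VALUE becomes Integer.MAX_VALUE).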
```bash!
# Copy combinedYears.txt to HDFS
hadoop dfs -copyFromLocal /home/public/combinedYears.txt /user/student169/input/combinedYears.txt

# Check the content of combinedYears.txt
hadoop dfs -cat /user/student169/input/combinedYears.txt

# Remove any existing output directory
hadoop dfs -rmr /user/student169/output

# Execute the MapReduce job
hadoop jar /home/student169/Project5/Part_1/Task3/temperature.jar edu.cmu.andrew.mm6.MaxTemperature /user/student169/input/combinedYears.txt /user/student169/output

# Check the output of the MapReduce job
hadoop dfs -cat /user/student169/output/part-00000
hadoop dfs -cat /user/student169/output/part-00001
hadoop dfs -cat /user/student169/output/part-00002

# Merge the output into a local file
hadoop dfs -getmerge /user/student169/output /home/student169/Project5/Part_1/Task3/Task3Output
```

## Task4

```bash!
mkdir Task4
cd Task4

# Create the three modified Java files with nano
nano MinTemperature.java
nano MinTemperatureMapper.java
nano MinTemperatureReducer.java

mkdir temperature_classes

# Compile
javac -classpath /usr/local/hadoop/hadoop-core-1.2.1.jar:./temperature_classes -d temperature_classes MinTemperatureMapper.java
javac -classpath /usr/local/hadoop/hadoop-core-1.2.1.jar:./temperature_classes -d temperature_classes MinTemperatureReducer.java
javac -classpath /usr/local/hadoop/hadoop-core-1.2.1.jar:./temperature_classes -d temperature_classes MinTemperature.java

# Generate the jar file
jar -cvf mintemperature.jar -C temperature_classes/ .

# Remove any existing output directory
hadoop dfs -rmr /user/student169/output

# Execute the MapReduce job
hadoop jar /home/student169/Project5/Part_1/Task4/mintemperature.jar edu.cmu.andrew.mm6.MinTemperature /user/student169/input/combinedYears.txt /user/student169/output

# Check the output of the MapReduce job
hadoop dfs -cat /user/student169/output/part-00000
hadoop dfs -cat /user/student169/output/part-00001
hadoop dfs -cat /user/student169/output/part-00002

# Merge the output into a local file
hadoop dfs -getmerge /user/student169/output /home/student169/Project5/Part_1/Task4/Task4Output
```

## Task5

```bash!
mkdir Task5
cd Task5

# Copy the crime data file to the HDFS input directory
hadoop dfs -copyFromLocal /home/public/P1V.txt /user/student169/input/P1V.txt

# Write the Java file
nano CountCrime.java

# Create a directory for the compiled Java classes
mkdir countcrime_classes

# Compile
javac -classpath /usr/local/hadoop/hadoop-core-1.2.1.jar:./countcrime_classes -d countcrime_classes CountCrime.java

# Remove any existing output directory
hadoop dfs -rmr /user/student169/output

# Create the jar file
jar -cvf rapesplusrobberies.jar -C countcrime_classes/ .

# Execute the MapReduce job
hadoop jar /home/student169/Project5/Part_1/Task5/rapesplusrobberies.jar org.myorg.CountCrime /user/student169/input/P1V.txt /user/student169/output

# Check the output in HDFS
hadoop dfs -ls /user/student169/output/
hadoop dfs -cat /user/student169/output/part-r-00000
hadoop dfs -cat /user/student169/output/part-r-00001
hadoop dfs -cat /user/student169/output/part-r-00002

# Merge the output into a local file
hadoop dfs -getmerge /user/student169/output /home/student169/Project5/Part_1/Task5/Task5Output
```

## Task6

```bash!
mkdir Task6
cd Task6
mkdir oaklandcrimestats_classes

# Write the Java file (a rough sketch of its mapper appears after this block)
nano AssaultCrimeStats.java

# Compile
javac -classpath /usr/local/hadoop/hadoop-core-1.2.1.jar:./oaklandcrimestats_classes -d oaklandcrimestats_classes AssaultCrimeStats.java

# Create the jar file
jar -cvf oaklandcrimestats.jar -C oaklandcrimestats_classes/ .
```
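AssaultCrimeStats.java is typed in by hand above. As a reference only, here is a hypothetical mapper that counts crime records mentioning aggravated assault; the P1V.txt field layout and the exact statistic the task asks for are assumptions. The reducer and driver (registered as org.myorg.AssaultCrimeStats to match the jar and execute commands) would follow the same WordCount-style pattern as the Task 1 sketch.

```java
// Hypothetical mapper for the AssaultCrimeStats job. It only assumes that each
// P1V.txt line is one crime record containing an offense description somewhere
// on the line; adjust the filter to the real assignment requirements.
package org.myorg;

import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class AssaultCrimeStatsMapper extends Mapper<Object, Text, Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);

    @Override
    public void map(Object key, Text value, Context context)
            throws IOException, InterruptedException {
        // Count every record that mentions an aggravated assault.
        if (value.toString().toUpperCase().contains("AGGRAVATED ASSAULT")) {
            context.write(new Text("AGGRAVATED ASSAULT"), ONE);
        }
    }
}
```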
```bash!
# Remove any existing output directory
hadoop dfs -rmr /user/student169/output

# Execute the MapReduce job
hadoop jar /home/student169/Project5/Part_1/Task6/oaklandcrimestats.jar org.myorg.AssaultCrimeStats /user/student169/input/P1V.txt /user/student169/output

# Check the output in HDFS
hadoop dfs -ls /user/student169/output/
hadoop dfs -cat /user/student169/output/part-r-00000
hadoop dfs -cat /user/student169/output/part-r-00001
hadoop dfs -cat /user/student169/output/part-r-00002

# Merge the output into a local file
hadoop dfs -getmerge /user/student169/output /home/student169/Project5/Part_1/Task6/Task6Output
```

## Task7

```bash!
mkdir Task7
cd Task7
mkdir oaklandcrimestatskml_classes

# Copy the crime data file to the HDFS input directory
hadoop dfs -copyFromLocal /home/public/CrimeLatLonXYTabs.txt /user/student169/input/CrimeLatLonXYTabs.txt

# Create the Java file (a rough sketch of its mapper appears after this block)
nano AssaultCrimeStatsKML.java

# Compile
javac -classpath /usr/local/hadoop/hadoop-core-1.2.1.jar:./oaklandcrimestatskml_classes -d oaklandcrimestatskml_classes AssaultCrimeStatsKML.java

# Remove any existing output directory
hadoop dfs -rmr /user/student169/output

# Create the jar file
jar -cvf oaklandcrimestatskml.jar -C oaklandcrimestatskml_classes/ .

# Execute the MapReduce job
hadoop jar /home/student169/Project5/Part_1/Task7/oaklandcrimestatskml.jar org.myorg.AssaultCrimeStatsKML /user/student169/input/CrimeLatLonXYTabs.txt /user/student169/output

# Check the output in HDFS
hadoop dfs -ls /user/student169/output/
hadoop dfs -cat /user/student169/output/part-r-00000
hadoop dfs -cat /user/student169/output/part-r-00001
hadoop dfs -cat /user/student169/output/part-r-00002

# Merge the output into a local file
hadoop dfs -getmerge /user/student169/output /home/student169/Project5/Part_1/Task7/Task7Output
```
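AssaultCrimeStatsKML.java is also hand-written. Below is a hypothetical mapper sketch that keeps aggravated assaults near a reference point and emits one KML Placemark per match; every field index, the reference coordinates, and the 200 m radius are placeholders to fill in from the assignment handout, not values taken from it. A single reducer could then concatenate the placemarks and wrap them in a kml/Document element, with the driver following the same pattern as the earlier sketches but using Text output values.

```java
// Hypothetical mapper for the AssaultCrimeStatsKML job. The assumed tab-separated
// columns of CrimeLatLonXYTabs.txt (0 = X, 1 = Y, 4 = offense, 7 = latitude,
// 8 = longitude), the reference point, and the radius must be checked against the handout.
package org.myorg;

import java.io.IOException;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class AssaultCrimeStatsKMLMapper extends Mapper<Object, Text, Text, Text> {
    private static final double REF_X = 0.0;                 // placeholder: reference X from the handout
    private static final double REF_Y = 0.0;                 // placeholder: reference Y from the handout
    private static final double RADIUS_FEET = 200 * 3.28084; // placeholder: 200 metres expressed in feet

    @Override
    public void map(Object key, Text value, Context context)
            throws IOException, InterruptedException {
        String[] f = value.toString().split("\t");
        if (f.length < 9 || !f[4].toUpperCase().contains("AGGRAVATED ASSAULT")) {
            return;                                           // keep only aggravated assaults
        }
        try {
            double dx = Double.parseDouble(f[0]) - REF_X;
            double dy = Double.parseDouble(f[1]) - REF_Y;
            if (Math.sqrt(dx * dx + dy * dy) <= RADIUS_FEET) {
                String placemark = "<Placemark><name>" + f[4] + "</name><Point><coordinates>"
                        + f[8] + "," + f[7] + "</coordinates></Point></Placemark>";
                context.write(new Text("assaults"), new Text(placemark));
            }
        } catch (NumberFormatException e) {
            // Skip the header row or any malformed record.
        }
    }
}
```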