Big Data Lab 1

A few changes are needed before the Lab 1 code will run properly.

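Make sure jps lists 6 processes when you run it (on a single-node setup these are typically NameNode, DataNode, SecondaryNameNode, ResourceManager, NodeManager, and Jps itself)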
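
A minimal sketch for checking the count automatically. It assumes jps prints one line per process; the helper name check_jps_count is hypothetical and not part of the lab code.

```shell
# Hypothetical helper: reads jps output on stdin and compares the
# number of non-empty lines with the expected process count ($1).
check_jps_count() {
    expected="$1"
    # grep -c exits non-zero when it counts 0 lines, hence "|| true"
    actual=$(grep -c . || true)
    if [ "$actual" -eq "$expected" ]; then
        echo "OK: $actual processes"
    else
        echo "Expected $expected processes, found $actual"
        return 1
    fi
}

# Usage on the lab machine:
#   jps | check_jps_count 6
```

If the count is lower than 6, the names missing from the jps output show which daemon did not start.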


Install JDK 8

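  1. Update packages
sudo apt update -y
sudo apt upgrade -y
  2. Uninstall all other JDK versions (the patterns are quoted so the shell does not expand them against files in the current directory)
sudo apt remove --purge 'openjdk*' 'default-jdk*'
  3. Install JDK 8
sudo apt install -y openjdk-8-jdk
  4. Check the Java version (it should report 1.8)
java -version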
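
The version check can also be scripted. This is a sketch that relies on JDK 8 reporting a version string starting with 1.8; the helper name is_java8 is hypothetical.

```shell
# Hypothetical helper: succeeds if the version text on stdin reports Java 8.
# JDK 8 identifies itself with a version string that starts with 1.8.
is_java8() {
    grep -q 'version "1\.8\.'
}

# java -version writes to stderr, so redirect it into the pipe:
#   java -version 2>&1 | is_java8 && echo "JDK 8 is active"
```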

Install JUnit

sudo apt install -y junit

Edit mapred-site.xml

sudo nano $HADOOP_HOME/etc/hadoop/mapred-site.xml

Add the following inside the <configuration> tags:

<property>
    <name>yarn.app.mapreduce.am.env</name>
    <value>HADOOP_MAPRED_HOME=$HADOOP_HOME</value>
</property>
<property>
    <name>mapreduce.map.env</name>
    <value>HADOOP_MAPRED_HOME=$HADOOP_HOME</value>
</property>
<property>
    <name>mapreduce.reduce.env</name>
    <value>HADOOP_MAPRED_HOME=$HADOOP_HOME</value>
</property>
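
After saving the file, a quick check that all three properties are in place can look like the sketch below. The helper name check_mapred_env is hypothetical, and it assumes each <name> element sits on a single line as shown above.

```shell
# Hypothetical helper: verifies that the three env properties appear
# in the mapred-site.xml file passed as $1.
check_mapred_env() {
    file="$1"
    for name in yarn.app.mapreduce.am.env mapreduce.map.env mapreduce.reduce.env; do
        # -F treats the pattern as a literal string, so the dots are not wildcards
        if ! grep -qF "<name>$name</name>" "$file"; then
            echo "missing: $name"
            return 1
        fi
    done
    echo "all three properties found"
}

# Usage on the lab machine:
#   check_mapred_env "$HADOOP_HOME/etc/hadoop/mapred-site.xml"
```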


Task 2

  • Delete the 100 MB file each time before running the block size command
hdfs dfs -rm /L1/100MB.txt
  • Use the following block sizes (in bytes):
    1] 1048576 (1 MB)
    2] 2097152 (2 MB)
    3] 3145728 (3 MB)
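
To sanity-check the results, the expected number of blocks for each size can be worked out in advance. This sketch assumes the file is exactly 100 MiB (104857600 bytes); the helper name expected_blocks is hypothetical.

```shell
# Assumption: the test file is exactly 100 MiB.
file_bytes=$((100 * 1024 * 1024))

# Hypothetical helper: number of blocks = ceil(file_bytes / block_bytes),
# computed with integer arithmetic. $1 = file size, $2 = block size.
expected_blocks() {
    echo $(( ($1 + $2 - 1) / $2 ))
}

for block_bytes in 1048576 2097152 3145728; do
    echo "$block_bytes bytes -> $(expected_blocks "$file_bytes" "$block_bytes") blocks"
done
# Prints:
#   1048576 bytes -> 100 blocks
#   2097152 bytes -> 50 blocks
#   3145728 bytes -> 34 blocks
```

The actual count can be compared with what hdfs fsck /L1/100MB.txt -files -blocks reports after each upload.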