# Basic Hadoop Operations

```
sudo su
sbin/start-dfs.sh
```

This brings Hadoop up. You can check it with the following command:

```
jps
```

## YARN on a Single Node

Edit the following two files.

etc/hadoop/mapred-site.xml:

```
<configuration>
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
    <property>
        <name>mapreduce.application.classpath</name>
        <value>$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/*:$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/lib/*</value>
    </property>
</configuration>
```

etc/hadoop/yarn-site.xml:

```
<configuration>
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
    <property>
        <name>yarn.nodemanager.env-whitelist</name>
        <value>JAVA_HOME,HADOOP_COMMON_HOME,HADOOP_HDFS_HOME,HADOOP_CONF_DIR,CLASSPATH_PREPEND_DISTCACHE,HADOOP_YARN_HOME,HADOOP_HOME,PATH,LANG,TZ,HADOOP_MAPRED_HOME</value>
    </property>
</configuration>
```

Start the ResourceManager daemon and NodeManager daemon (run this inside the Hadoop installation directory):

```
sbin/start-yarn.sh
```

If it started successfully, you should see the following page at http://localhost:8088/

![](https://i.imgur.com/39jBSvE.png)

## Do MapReduce Jobs

Create WordCount.java:

```
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

  public static class TokenizerMapper
       extends Mapper<Object, Text, Text, IntWritable> {

    private final static IntWritable one = new IntWritable(1);
    private Text word = new Text();

    public void map(Object key, Text value, Context context
                    ) throws IOException, InterruptedException {
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(word, one);
      }
    }
  }

  public static class IntSumReducer
       extends Reducer<Text, IntWritable, Text, IntWritable> {
    private IntWritable result = new IntWritable();

    public void reduce(Text key, Iterable<IntWritable> values,
                       Context context
                       ) throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable val : values) {
        sum += val.get();
      }
      result.set(sum);
      context.write(key, result);
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Job job = Job.getInstance(conf, "word count");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setCombinerClass(IntSumReducer.class);
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
```
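Before running it on the cluster, it may help to see what the job computes: `TokenizerMapper` emits a `(word, 1)` pair for every whitespace-separated token, and `IntSumReducer` (used as both combiner and reducer) sums those pairs per word. Below is a minimal plain-Java sketch of the same logic, runnable without Hadoop, applied to the two sample input lines used later in this guide (`LocalWordCount` is a hypothetical name, not part of the job):

```
import java.util.StringTokenizer;
import java.util.TreeMap;

public class LocalWordCount {
  public static void main(String[] args) {
    // The same two sample lines used as file01 and file02 below.
    String[] lines = { "Hello World Bye World", "Hello Hadoop Goodbye Hadoop" };
    TreeMap<String, Integer> counts = new TreeMap<>();
    for (String line : lines) {
      // Same whitespace tokenization as TokenizerMapper.map() above.
      StringTokenizer itr = new StringTokenizer(line);
      while (itr.hasMoreTokens()) {
        // merge() plays the role of IntSumReducer: sum the 1s per word.
        counts.merge(itr.nextToken(), 1, Integer::sum);
      }
    }
    // Prints: Bye 1, Goodbye 1, Hadoop 2, Hello 2, World 2
    counts.forEach((word, count) -> System.out.println(word + "\t" + count));
  }
}
```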
Add the following to hadoop-3.3.1/etc/hadoop/hadoop-env.sh:

```
export JAVA_HOME=/usr/java/default
export PATH=${JAVA_HOME}/bin:${PATH}
export HADOOP_CLASSPATH=${JAVA_HOME}/lib/tools.jar
```

Compile the Java file and build the jar from the command line:

```
bin/hadoop com.sun.tools.javac.Main WordCount.java
jar cf wc.jar WordCount*.class
```

Run

```
bin/hadoop fs -ls
```

to list the files.

#### If Permission Denied appears

As shown below:

![](https://i.imgur.com/nDuKnZV.png)

You can prefix the commands above with:

```
sudo -u <the username shown before "supergroup">
```

#### If there is no problem

Nothing should be listed yet. First set up the directory structure (personal habit):

```
user
└─ <username>
   └─ wordcount
      ├─ input   (put the input files here)
      └─ output  (no need to create it; the job does)
```

Command to create the directories:

```
bin/hadoop fs -mkdir -p /user/<username>/wordcount/input
```

Next, create two input files on the local machine:

file01
```
Hello World Bye World
```

file02
```
Hello Hadoop Goodbye Hadoop
```

(`vim file01` is enough; no file extension is needed.)

Put them into HDFS with the following commands:

```
bin/hadoop fs -put <path to file01>/file01 <HDFS path created above>/input
bin/hadoop fs -put <path to file02>/file02 <HDFS path created above>/input
```

(Using my setup as the example, the HDFS path is /user/<username>/wordcount/input.)

You can check whether the files are there with:

```
bin/hadoop fs -ls /user/<username>/wordcount/input
```

![](https://i.imgur.com/MyruUXb.png)

Once the files are in place, run the job:

```
bin/hadoop jar wc.jar WordCount <HDFS path created above>/input <HDFS path created above>/output
```

If everything goes well, you will see the map and reduce progress (yes, it only shows 0% and then 100%), as shown below:

![](https://i.imgur.com/iccLJoZ.png)

Run

```
bin/hadoop fs -ls /user/<username>/wordcount/output
```

and you can see _SUCCESS and the result file:

![](https://i.imgur.com/JO3S40G.png)

Run

```
bin/hadoop fs -cat /user/<username>/wordcount/output/part-r-00000
```

to see the result:

![](https://i.imgur.com/B1wykwQ.png)

By the way, you can also see the successful job at http://localhost:8088/

![](https://i.imgur.com/hknqcxS.png)

When you are done, shut everything down:

```
sbin/stop-yarn.sh
sbin/stop-dfs.sh
```
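One caveat if you re-run the job later: `FileOutputFormat` refuses to start when the output directory already exists, so remove it first, e.g. with `bin/hadoop fs -rm -r /user/<username>/wordcount/output`. The same cleanup can also be done programmatically; below is a minimal sketch using Hadoop's `FileSystem` API (`CleanOutput` is a hypothetical helper class added here for illustration):

```
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Hypothetical helper: deletes an existing HDFS output directory so the
// WordCount job can be re-run with the same output path.
public class CleanOutput {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);
    Path out = new Path(args[0]); // e.g. /user/<username>/wordcount/output
    if (fs.exists(out)) {
      fs.delete(out, true); // true = delete recursively
    }
    fs.close();
  }
}
```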