# Hadoop Workshop

## [Computing the Total Spending of Each User](https://hackmd.io/@yillkid/r1f7HrFHc/https%3A%2F%2Fhackmd.io%2F%40yillkid%2FHJnI-0QBn)

* mapper.py

```py=
import sys

# Read all records from stdin into a list
stdInput = []
for line in sys.stdin:
    line = line.strip()  # strip leading/trailing whitespace
    stdInput.append(line)

# Sort the records by the user field (the part before the comma)
sorted_data = sorted(stdInput, key=lambda x: x.split(',')[0])

for result in sorted_data:
    print(result)
```

* reducer.py

```py=
import sys

# Read all records from stdin into a list
stdInput = []
for line in sys.stdin:
    line = line.strip()  # strip leading/trailing whitespace
    stdInput.append(line)

# Sum the amounts per user
output = {}
for entry in stdInput:
    user, value = entry.split(',')
    value = int(value)
    if user in output:
        output[user] += value
    else:
        output[user] = value

for key, value in output.items():
    print(f"{key},{value}")
```

Test the pipeline locally before submitting it to Hadoop (a made-up sample `input.txt` is sketched at the end of this section):

```
cat input.txt | python3 mapper.py | python3 reducer.py
```

![](https://hackmd.io/_uploads/rkVZp6NUh.png)

Upload the input file to HDFS:

```
/usr/local/hadoop/bin/hdfs dfs -put input.txt /workshop/
```

Run the job with Hadoop Streaming:

```
/usr/local/hadoop/bin/hadoop jar '/usr/local/hadoop/share/hadoop/tools/lib/hadoop-streaming-3.3.5.jar' \
    -mapper 'python3 mapper.py' \
    -file /home/samantha/mapper.py \
    -reducer 'python3 reducer.py' \
    -file /home/samantha/reducer.py \
    -input hdfs:/workshop/input.txt \
    -output hdfs:/workshop/output
```
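For reference, the local test above expects `input.txt` to hold `user,amount` records. The file below is a made-up example (the names and amounts are illustrative, not from the original workshop); piping it through both scripts yields the per-user totals shown:

```
$ cat input.txt
alice,100
bob,50
alice,200
bob,25
$ cat input.txt | python3 mapper.py | python3 reducer.py
alice,300
bob,75
```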
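Two HDFS housekeeping commands are worth knowing here: `-put` fails if the target directory does not exist yet, and Hadoop refuses to start a job whose `-output` directory already exists. Creating the directory up front and cleaning up between runs avoids both problems, and the finished result can be read straight out of the output directory:

```
# Create the target directory before the first -put
/usr/local/hadoop/bin/hdfs dfs -mkdir -p /workshop

# Remove a stale output directory before re-running the job
/usr/local/hadoop/bin/hdfs dfs -rm -r /workshop/output

# Read the result once the job has finished
/usr/local/hadoop/bin/hdfs dfs -cat /workshop/output/part-*
```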
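One design note: on a real cluster, Hadoop Streaming itself sorts the mapper output by key before handing it to the reducer, so the explicit sort in mapper.py and the reducer's in-memory dictionary are only needed to make the local pipe test work. A more conventional streaming pair, sketched below under the assumption that the input stays in the `user,amount` format, emits tab-separated key/value pairs and aggregates consecutive runs of the same key in constant memory:

* mapper.py

```py=
import sys

# Emit one "user<TAB>amount" pair per record; the tab is Hadoop
# Streaming's default key/value separator, and the framework sorts
# these pairs by key before the reducer sees them.
for line in sys.stdin:
    line = line.strip()
    if not line:
        continue
    user, value = line.split(',')
    print(f"{user}\t{value}")
```

* reducer.py

```py=
import sys

# Because input arrives grouped by key, we only track one running
# total at a time instead of buffering the whole dataset.
current_user = None
total = 0
for line in sys.stdin:
    user, value = line.strip().split('\t')
    if user != current_user:
        if current_user is not None:
            print(f"{current_user},{total}")
        current_user = user
        total = 0
    total += int(value)

if current_user is not None:
    print(f"{current_user},{total}")
```

For the local `cat | mapper | reducer` test, this variant needs a `sort` inserted between the two scripts, mirroring the sort the framework performs on the cluster.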