Big Data - Data Engineering Fundamentals
======================================

Short URL: https://bit.ly/2Tcf9S0
Cheat sheets: https://ufile.io/njrucldo
Slides: https://ufile.io/9gdul2ag
HDFS puzzle: https://ufile.io/8jmbxuin
PuTTY download: https://the.earth.li/~sgtatham/putty/latest/w32/putty.exe
MapReduce example (Word Count): https://wiki.apache.org/hadoop/WordCount
YARN example: https://github.com/hortonworks/simple-yarn-app/blob/master/src/main/java/com/hortonworks/simpleyarnapp/ApplicationMaster.java
Distributed consensus (ZooKeeper): https://www.confluent.io/blog/distributed-consensus-reloaded-apache-zookeeper-and-replication-in-kafka/

## Server List

vmniedersachsen-polizei-de001.westeurope.cloudapp.azure.com
vmniedersachsen-polizei-de002.westeurope.cloudapp.azure.com
vmniedersachsen-polizei-de003.westeurope.cloudapp.azure.com
vmniedersachsen-polizei-de004.westeurope.cloudapp.azure.com
vmniedersachsen-polizei-de005.westeurope.cloudapp.azure.com
vmniedersachsen-polizei-de006.westeurope.cloudapp.azure.com
vmniedersachsen-polizei-de007.westeurope.cloudapp.azure.com
vmniedersachsen-polizei-de008.westeurope.cloudapp.azure.com
vmniedersachsen-polizei-de009.westeurope.cloudapp.azure.com
vmniedersachsen-polizei-de010.westeurope.cloudapp.azure.com
vmniedersachsen-polizei-de011.westeurope.cloudapp.azure.com
vmniedersachsen-polizei-de012.westeurope.cloudapp.azure.com
vmniedersachsen-polizei-de013.westeurope.cloudapp.azure.com

### SSH Credentials

User: trainadm
Password: Train@Woodmark
Port: 22

## Ambari

[http://vmniedersachsen-polizei-de[001-013].westeurope.cloudapp.azure.com:8080](http://vmniedersachsen-polizei-de[001-013].westeurope.cloudapp.azure.com:8080)

User: admin
Password: admin

## Zeppelin

[http://vmniedersachsen-polizei-de[001-013].westeurope.cloudapp.azure.com:9995](http://vmniedersachsen-polizei-de[001-013].westeurope.cloudapp.azure.com:9995)

User: admin
Password: admin

## Spark Streaming

Zeppelin: `GIP_Aufgaben/Data Engineering/Twitter_SparkStreaming_Kafka`
Twitter credentials: https://hackmd.io/@ludo/SJj6mKIlI

## Nifi

[http://vmniedersachsen-polizei-de[001-013].westeurope.cloudapp.azure.com:9090/nifi](http://vmniedersachsen-polizei-de[001-013].westeurope.cloudapp.azure.com:9090/nifi)

## Sqoop

Export the Hive table `tweets_per_day` to PostgreSQL:

```
sqoop export \
  --connect jdbc:postgresql://localhost:5432/twitter \
  --username twitter \
  --password bigdata \
  --table tweets_per_day \
  --export-dir /apps/hive/warehouse/twitter.db/tweets_per_day \
  --fields-terminated-by '|'
```

Top 10 hashtags (HiveQL):

```
SELECT word, count(*) AS wcount
FROM solution_twitter.tweets
LATERAL VIEW explode(split(tweets.hashtag, ', ')) tweets AS word
GROUP BY word
ORDER BY wcount DESC
LIMIT 10;
```

## Kafka

```
cd /usr/hdp/current/kafka-broker/bin

## Produce

# Create a topic
./kafka-topics.sh --create --zookeeper hdp-sandbox.train.woodmark.de:2181 --replication-factor 1 --partitions 4 --topic testtopic

# List the existing topics
/usr/hdp/current/kafka-broker/bin/kafka-topics.sh --list --zookeeper hdp-sandbox.train.woodmark.de:2181

# Show the details of a specific topic
./kafka-topics.sh --describe --zookeeper hdp-sandbox.train.woodmark.de:2181 --topic testtopic

# Write a single message
echo "test line" | ./kafka-console-producer.sh --broker-list hdp-sandbox.train.woodmark.de:6667 --topic testtopic

# Write messages continuously (one message per line typed)
./kafka-console-producer.sh --broker-list hdp-sandbox.train.woodmark.de:6667 --topic testtopic

## Consume

# Show the contents of a topic (everything, from the beginning)
./kafka-console-consumer.sh --zookeeper hdp-sandbox.train.woodmark.de:2181 --topic twitter_topic --from-beginning

# Leave the Kafka console with Ctrl + C
```

## HDFS

```
# Run the following commands as the hdfs superuser
export HADOOP_USER_NAME=hdfs

# Create a local file and upload it to HDFS
echo "Hallo Welt" > a.txt
hdfs dfs -put a.txt /tmp/
hdfs dfs -ls /tmp/a.txt

# Create a target directory and move the file into it
hdfs dfs -mkdir -p /tmp/nifi/test
hdfs dfs -mv /tmp/a.txt /tmp/nifi/test
hdfs dfs -ls /tmp/nifi/test

# Print the file contents
hdfs dfs -text /tmp/nifi/test/a.txt

# Open up the permissions on the Nifi directory
hdfs dfs -chmod 777 /tmp/nifi
```

## Hive

```
beeline
!connect jdbc:hive2://localhost:10000/default
Enter username for jdbc:hive2://localhost:10000/default: <press Enter>
Enter password for jdbc:hive2://localhost:10000/default: <press Enter>
```
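
The hashtag query in the Sqoop section splits each `hashtag` value on `', '`, explodes the parts into rows, counts them per word and keeps the ten most frequent. To check that logic without a cluster, the following is a rough local sketch using only standard Unix tools. The sample data and the function names `sample_hashtags` and `top_hashtags` are hypothetical, and the alphabetical tie-break is an added assumption (the Hive query does not define an order among equal counts).

```shell
#!/usr/bin/env bash

# Hypothetical sample data: one line per tweet, hashtags separated by ", "
# (assumed to mirror the layout of the tweets.hashtag column).
sample_hashtags() {
  printf '%s\n' \
    'bigdata, hadoop' \
    'hadoop, spark, kafka' \
    'hadoop, spark'
}

# Reads hashtag lines on stdin and prints "word count" pairs.
top_hashtags() {
  # awk: split every line on ", " (explode/split) and count each word (GROUP BY)
  # sort: highest count first (ORDER BY wcount DESC), ties alphabetically (assumption)
  # head: keep the first ten rows (LIMIT 10)
  awk -F', ' '
    { for (i = 1; i <= NF; i++) if ($i != "") count[$i]++ }
    END { for (w in count) print w, count[w] }
  ' | sort -k2,2nr -k1,1 | head -n 10
}

sample_hashtags | top_hashtags
```

On the cluster, the same pipeline could presumably be fed from a real export, for example `hdfs dfs -text <file> | top_hashtags`, provided the file contains only the hashtag column.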