Big Data - Data Engineering Fundamentals
======================================

Date: December 2020

## Information

Current IPs: https://bit.ly/3lCsqOA
Short URL: https://bit.ly/3l7NQCQ
Cheat Sheets: [Here](https://teams.microsoft.com/_#/files/Allgemein?groupId=309bc066-c881-483b-8623-4758a03139ab&threadId=19%3A0c4235dbf1f4482dafbc122ac83ea581%40thread.tacv2&ctx=channel&context=General&rootfolder=%252Fsites%252FG-BigDataEngineeringSchulungPolizeiakademie%252FFreigegebene%2520Dokumente%252FGeneral)
Slides: tbd
Exercises: [Here](https://teams.microsoft.com/_#/files/Allgemein?groupId=309bc066-c881-483b-8623-4758a03139ab&threadId=19%3A0c4235dbf1f4482dafbc122ac83ea581%40thread.tacv2&ctx=channel&context=General&rootfolder=%252Fsites%252FG-BigDataEngineeringSchulungPolizeiakademie%252FFreigegebene%2520Dokumente%252FGeneral)
HDFS Puzzle: tbd
Putty Download: https://the.earth.li/~sgtatham/putty/latest/w32/putty.exe
MapReduce example (Word Count): https://cwiki.apache.org/confluence/display/HADOOP2/WordCount
YARN example: https://github.com/hortonworks/simple-yarn-app/blob/master/src/main/java/com/hortonworks/simpleyarnapp/ApplicationMaster.java
Distributed Consensus (Zookeeper): https://www.confluent.io/blog/distributed-consensus-reloaded-apache-zookeeper-and-replication-in-kafka/

## Communication

Microsoft Teams: https://bit.ly/2J5sU2t

### Miro:

Introduction: https://miro.com/app/board/o9J_ldNowwk=/
Retrospective: https://miro.com/app/board/o9J_ldNowz8=/

HDFS Puzzle:
- Puzzle 1: https://miro.com/app/board/o9J_ldN3scw=/
- Puzzle 2: https://miro.com/app/board/o9J_ldNoO_Q=/
- Puzzle 3: https://miro.com/app/board/o9J_ldN3sQM=/

## Server list

Name abbreviation | Server
------- | ------
Ga | vmniedersachsen-polizei-de001.westeurope.cloudapp.azure.com
Ho | vmniedersachsen-polizei-de002.westeurope.cloudapp.azure.com
Ja | vmniedersachsen-polizei-de003.westeurope.cloudapp.azure.com
Mi | vmniedersachsen-polizei-de004.westeurope.cloudapp.azure.com
Oe | vmniedersachsen-polizei-de005.westeurope.cloudapp.azure.com
Po | vmniedersachsen-polizei-de006.westeurope.cloudapp.azure.com
Sc | vmniedersachsen-polizei-de007.westeurope.cloudapp.azure.com
St | vmniedersachsen-polizei-de008.westeurope.cloudapp.azure.com
Wi | vmniedersachsen-polizei-de009.westeurope.cloudapp.azure.com
Ze | vmniedersachsen-polizei-de010.westeurope.cloudapp.azure.com

### SSH credentials

User: trainadm
Password: Train@Woodmark
Port: 22

## Ambari

[http://vmniedersachsen-polizei-de[001-010].westeurope.cloudapp.azure.com:8080](http://vmniedersachsen-polizei-de[001-010].westeurope.cloudapp.azure.com:8080)

User: admin
Password: admin

## Zeppelin

[http://vmniedersachsen-polizei-de[001-010].westeurope.cloudapp.azure.com:9995](http://vmniedersachsen-polizei-de[001-010].westeurope.cloudapp.azure.com:9995)

User: admin
Password: admin

## Spark Streaming

Zeppelin: `GIP_Aufgaben/Data Engineering/Twitter_SparkStreaming_Kafka`
Twitter-Credentials: https://bit.ly/3lf6XuO

## Nifi

[http://vmniedersachsen-polizei-de[001-010].westeurope.cloudapp.azure.com:9090/nifi](http://vmniedersachsen-polizei-de[001-010].westeurope.cloudapp.azure.com:9090/nifi)

## Sqoop

```
sqoop export \
  --connect jdbc:postgresql://localhost:5432/twitter \
  --username twitter \
  --password bigdata \
  --table tweets_per_day \
  --export-dir /apps/hive/warehouse/twitter.db/tweets_per_day \
  --fields-terminated-by '|'
```

```
SELECT word, count(*) as wcount
FROM solution_twitter.tweets
LATERAL VIEW explode(split(tweets.hashtag, ', ')) tweets as word
GROUP BY word
ORDER BY wcount DESC
LIMIT 10;
```

## Kafka

```
cd /usr/hdp/current/kafka-broker/bin

## Produce

# Create a topic
./kafka-topics.sh --create --zookeeper hdp-sandbox.train.woodmark.de:2181 --replication-factor 1 --partitions 4 --topic testtopic

# List the existing topics
/usr/hdp/current/kafka-broker/bin/kafka-topics.sh --list --zookeeper hdp-sandbox.train.woodmark.de:2181

# Show the details of a specific topic
./kafka-topics.sh --describe --zookeeper hdp-sandbox.train.woodmark.de:2181 --topic testtopic

# Write a single message
echo "test line" | ./kafka-console-producer.sh --broker-list hdp-sandbox.train.woodmark.de:6667 --topic testtopic

# Write messages continuously (one per input line)
./kafka-console-producer.sh --broker-list hdp-sandbox.train.woodmark.de:6667 --topic testtopic

## Consume

# Show the contents of a topic (everything)
sh kafka-console-consumer.sh --zookeeper hdp-sandbox.train.woodmark.de:2181 --topic twitter_topic --from-beginning

# Leave the Kafka context
# Ctrl + C
```

## HDFS

```
```

## Hive

```
beeline
!connect jdbc:hive2://localhost:10000/default
Enter username for jdbc:hive2://localhost:10000/default: <press Enter>
Enter password for jdbc:hive2://localhost:10000/default: <press Enter>
```
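
The `de[001-010]` placeholder in the Ambari, Zeppelin and Nifi URLs above stands for the ten hosts in the server list. As a small sketch of how to expand it, the helper below prints one URL per host for a given port and optional path. The function name `training_urls` and its arguments are made up for this illustration and are not part of the course material.

```shell
#!/bin/sh
# Hypothetical helper: expand the de[001-010] placeholder into ten URLs.
# $1 = port (e.g. 8080 for Ambari), $2 = optional path (e.g. /nifi).
training_urls() {
    port="$1"
    path="${2:-}"
    i=1
    while [ "$i" -le 10 ]; do
        # %03d pads the host number to three digits (001 ... 010)
        printf 'http://vmniedersachsen-polizei-de%03d.westeurope.cloudapp.azure.com:%s%s\n' \
            "$i" "$port" "$path"
        i=$((i + 1))
    done
}
```

For example, `training_urls 8080` lists the Ambari URLs, `training_urls 9995` the Zeppelin URLs, and `training_urls 9090 /nifi` the Nifi URLs.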