# ITA STEPES-BD
## Retrieving data from the Physicians (Médicos) segment using Apache Spark
- Checking connectivity with the cluster: Passed ✅
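A minimal connectivity smoke test can be reproduced as sketched below (the master URL is the one passed to `pyspark` further down; the app name is arbitrary and any successful result confirms that workers are reachable):
```
from pyspark.sql import SparkSession

# Connect to the standalone master used throughout this sprint and run a
# trivial job; getting a result back proves the cluster accepted the work.
session = (SparkSession.builder
           .master("spark://10.0.20.2:7077")
           .appName("connectivity-check")  # arbitrary name for this check
           .getOrCreate())
print(session.sparkContext.parallelize(range(100)).sum())  # expected: 4950
```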

- Checking the schema: Passed ✅
```
v8@ge72-apache:~/ITA/ELE-2-1/CE-229/ipbl2020/2_Médico/Sprint3$ pyspark --master spark://10.0.20.2:7077 --conf "spark.mongodb.input.uri=mongodb://172.17.0.1/StebesBd.physicians?readPreference=primaryPreferred" --packages org.mongodb.spark:mongo-spark-connector_2.11:2.4.2
Python 2.7.16 (default, Oct 10 2019, 22:02:15)
[GCC 8.3.0] on linux2
Type "help", "copyright", "credits" or "license" for more information.
Ivy Default Cache set to: /home/v8/.ivy2/cache
The jars for the packages stored in: /home/v8/.ivy2/jars
:: loading settings :: url = jar:file:/home/v8/Programs/spark-2.4.5-bin-hadoop2.7/jars/ivy-2.4.0.jar!/org/apache/ivy/core/settings/ivysettings.xml
org.mongodb.spark#mongo-spark-connector_2.11 added as a dependency
:: resolving dependencies :: org.apache.spark#spark-submit-parent-6a424786-a63e-4475-9efa-19c8f5ecc8b9;1.0
confs: [default]
found org.mongodb.spark#mongo-spark-connector_2.11;2.4.2 in central
found org.mongodb#mongo-java-driver;3.12.5 in central
:: resolution report :: resolve 201ms :: artifacts dl 4ms
:: modules in use:
org.mongodb#mongo-java-driver;3.12.5 from central in [default]
org.mongodb.spark#mongo-spark-connector_2.11;2.4.2 from central in [default]
	---------------------------------------------------------------------
	|                  |            modules            ||   artifacts   |
	|       conf       | number| search|dwnlded|evicted|| number|dwnlded|
	---------------------------------------------------------------------
	|      default     |   2   |   0   |   0   |   0   ||   2   |   0   |
	---------------------------------------------------------------------
:: retrieving :: org.apache.spark#spark-submit-parent-6a424786-a63e-4475-9efa-19c8f5ecc8b9
confs: [default]
0 artifacts copied, 2 already retrieved (0kB/11ms)
20/06/26 23:00:59 WARN Utils: Your hostname, ge72-apache resolves to a loopback address: 127.0.1.1; using 192.168.0.13 instead (on interface enp5s0)
20/06/26 23:00:59 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to another address
WARNING: An illegal reflective access operation has occurred
WARNING: Illegal reflective access by org.apache.spark.unsafe.Platform (file:/home/v8/Programs/spark-2.4.5-bin-hadoop2.7/jars/spark-unsafe_2.11-2.4.5.jar) to method java.nio.Bits.unaligned()
WARNING: Please consider reporting this to the maintainers of org.apache.spark.unsafe.Platform
WARNING: Use --illegal-access=warn to enable warnings of further illegal reflective access operations
WARNING: All illegal access operations will be denied in a future release
20/06/26 23:01:00 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
20/06/26 23:01:01 WARN Utils: Service 'SparkUI' could not bind on port 4040. Attempting port 4041.
20/06/26 23:01:01 WARN Utils: Service 'SparkUI' could not bind on port 4041. Attempting port 4042.
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /__ / .__/\_,_/_/ /_/\_\   version 2.4.5
      /_/
Using Python version 2.7.16 (default, Oct 10 2019 22:02:15)
SparkSession available as 'spark'.
>>> from pyspark.sql import SparkSession
>>> session = SparkSession.builder().getOrCreate()
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: 'Builder' object is not callable
>>> session = SparkSession.builder.getOrCreate()
>>> df = session.read.format("mongo").load()
[Stage 0:> (0 + 0) / 1]20/06/26 23:02:55 WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
>>> df.printSchema()
root
 |-- _id: struct (nullable = true)
 |    |-- oid: string (nullable = true)
 |-- cpf: double (nullable = true)
 |-- crm: string (nullable = true)
 |-- name: string (nullable = true)
 |-- privateKey: string (nullable = true)
 |-- publicKey: string (nullable = true)
```
The expected schema for the Physicians data was obtained.
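For reference, the same read can be reproduced non-interactively, as in the sketch below (it reuses the input URI passed on the `pyspark` command line above; the app name is arbitrary):
```
from pyspark.sql import SparkSession

# Build the session with the same Mongo input URI used in the pyspark
# invocation above ("stepes-bd-physicians" is an arbitrary app name).
session = (SparkSession.builder
           .appName("stepes-bd-physicians")
           .config("spark.mongodb.input.uri",
                   "mongodb://172.17.0.1/StebesBd.physicians?readPreference=primaryPreferred")
           .getOrCreate())

# "mongo" is the short format name registered by mongo-spark-connector.
df = session.read.format("mongo").load()
df.printSchema()
```
Note that `SparkSession.builder` is a property, not a method; calling it as `builder()` raises the `TypeError: 'Builder' object is not callable` seen in the transcript above.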
- Checking that basic queries work: Failed ❌
The following error occurred when attempting any query to retrieve the Physicians data:
```
>>> df.take(10)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/opt/spark/python/pyspark/sql/dataframe.py", line 574, in take
return self.limit(num).collect()
File "/opt/spark/python/pyspark/sql/dataframe.py", line 535, in collect
sock_info = self._jdf.collectToPython()
File "/opt/spark/python/lib/py4j-0.10.7-src.zip/py4j/java_gateway.py", line 1257, in __call__
File "/opt/spark/python/pyspark/sql/utils.py", line 79, in deco
raise IllegalArgumentException(s.split(': ', 1)[1], stackTrace)
pyspark.sql.utils.IllegalArgumentException: u'Unsupported class file major version 58'
```
This appears to be a JDK-related problem whose solution has not yet been found, given how hard it is to pinpoint the cause from the exception message alone and the lack of documentation on this kind of failure. As a hint, class file major version 58 corresponds to Java 14 bytecode, while Spark 2.4.x officially supports only Java 8, so a JDK version mismatch between the components involved is a likely suspect.
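As a first diagnostic step, the Java version seen by each side can be compared. The sketch below reads the driver-side JVM's version through PySpark's Py4J gateway (`_jvm` is an internal PySpark attribute, used here only for inspection; the worker JVMs would have to be checked on the worker machines themselves):
```
# Print the Java version of the driver-side JVM via the Py4J gateway.
# _jvm is an internal PySpark attribute; Py4J imports java.lang by
# default, so System is reachable without a fully qualified name.
sc = session.sparkContext
print(sc._jvm.System.getProperty("java.version"))
```
If this reports a version newer than 8, pointing `JAVA_HOME` at a Java 8 installation before launching `pyspark` would be the natural thing to try next.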