# ITA STEPES-BD
## Retrieving data from the Physicians (Médicos) segment using Apache Spark
- Checking connectivity with the cluster: Passed ✅
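A minimal connectivity smoke test can be reproduced as sketched below (the master URL is the one passed to `pyspark` further down; the app name is arbitrary and any successful result confirms that workers are reachable):
```
from pyspark.sql import SparkSession

# Connect to the standalone master used throughout this sprint and run a
# trivial job; getting a result back proves the cluster accepted the work.
session = (SparkSession.builder
           .master("spark://10.0.20.2:7077")
           .appName("connectivity-check")  # arbitrary name for this check
           .getOrCreate())
print(session.sparkContext.parallelize(range(100)).sum())  # expected: 4950
```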

- Checking the schema: Passed ✅
```
v8@ge72-apache:~/ITA/ELE-2-1/CE-229/ipbl2020/2_Médico/Sprint3$ pyspark --master spark://10.0.20.2:7077 --conf "spark.mongodb.input.uri=mongodb://172.17.0.1/StebesBd.physicians?readPreference=primaryPreferred" --packages org.mongodb.spark:mongo-spark-connector_2.11:2.4.2
Python 2.7.16 (default, Oct 10 2019, 22:02:15)
[GCC 8.3.0] on linux2
Type "help", "copyright", "credits" or "license" for more information.
Ivy Default Cache set to: /home/v8/.ivy2/cache
The jars for the packages stored in: /home/v8/.ivy2/jars
:: loading settings :: url = jar:file:/home/v8/Programs/spark-2.4.5-bin-hadoop2.7/jars/ivy-2.4.0.jar!/org/apache/ivy/core/settings/ivysettings.xml
org.mongodb.spark#mongo-spark-connector_2.11 added as a dependency
:: resolving dependencies :: org.apache.spark#spark-submit-parent-6a424786-a63e-4475-9efa-19c8f5ecc8b9;1.0
confs: [default]
found org.mongodb.spark#mongo-spark-connector_2.11;2.4.2 in central
found org.mongodb#mongo-java-driver;3.12.5 in central
:: resolution report :: resolve 201ms :: artifacts dl 4ms
:: modules in use:
org.mongodb#mongo-java-driver;3.12.5 from central in [default]
org.mongodb.spark#mongo-spark-connector_2.11;2.4.2 from central in [default]
	---------------------------------------------------------------------
	|                  |            modules            ||   artifacts   |
	|       conf       | number| search|dwnlded|evicted|| number|dwnlded|
	---------------------------------------------------------------------
	|      default     |   2   |   0   |   0   |   0   ||   2   |   0   |
	---------------------------------------------------------------------
:: retrieving :: org.apache.spark#spark-submit-parent-6a424786-a63e-4475-9efa-19c8f5ecc8b9
confs: [default]
0 artifacts copied, 2 already retrieved (0kB/11ms)
20/06/26 23:00:59 WARN Utils: Your hostname, ge72-apache resolves to a loopback address: 127.0.1.1; using 192.168.0.13 instead (on interface enp5s0)
20/06/26 23:00:59 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to another address
WARNING: An illegal reflective access operation has occurred
WARNING: Illegal reflective access by org.apache.spark.unsafe.Platform (file:/home/v8/Programs/spark-2.4.5-bin-hadoop2.7/jars/spark-unsafe_2.11-2.4.5.jar) to method java.nio.Bits.unaligned()
WARNING: Please consider reporting this to the maintainers of org.apache.spark.unsafe.Platform
WARNING: Use --illegal-access=warn to enable warnings of further illegal reflective access operations
WARNING: All illegal access operations will be denied in a future release
20/06/26 23:01:00 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
20/06/26 23:01:01 WARN Utils: Service 'SparkUI' could not bind on port 4040. Attempting port 4041.
20/06/26 23:01:01 WARN Utils: Service 'SparkUI' could not bind on port 4041. Attempting port 4042.
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /__ / .__/\_,_/_/ /_/\_\   version 2.4.5
      /_/
Using Python version 2.7.16 (default, Oct 10 2019 22:02:15)
SparkSession available as 'spark'.
>>> from pyspark.sql import SparkSession
>>> session = SparkSession.builder().getOrCreate()
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: 'Builder' object is not callable
>>> session = SparkSession.builder.getOrCreate()
>>> df = session.read.format("mongo").load()
[Stage 0:> (0 + 0) / 1]20/06/26 23:02:55 WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
>>> df.printSchema()
root
 |-- _id: struct (nullable = true)
 |    |-- oid: string (nullable = true)
 |-- cpf: double (nullable = true)
 |-- crm: string (nullable = true)
 |-- name: string (nullable = true)
 |-- privateKey: string (nullable = true)
 |-- publicKey: string (nullable = true)
```
The expected schema for the Physicians data was obtained.
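For reference, the same read can be reproduced non-interactively, as in the sketch below (it reuses the input URI passed on the `pyspark` command line above; the app name is arbitrary):
```
from pyspark.sql import SparkSession

# Build the session with the same Mongo input URI used in the pyspark
# invocation above ("stepes-bd-physicians" is an arbitrary app name).
session = (SparkSession.builder
           .appName("stepes-bd-physicians")
           .config("spark.mongodb.input.uri",
                   "mongodb://172.17.0.1/StebesBd.physicians?readPreference=primaryPreferred")
           .getOrCreate())

# "mongo" is the short format name registered by mongo-spark-connector.
df = session.read.format("mongo").load()
df.printSchema()
```
Note that `SparkSession.builder` is a property, not a method; calling it as `builder()` raises the `TypeError: 'Builder' object is not callable` seen in the transcript above.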
- Checking that basic queries work: Failed ❌
The following error occurred when attempting any query to retrieve the Physicians data:
```
>>> df.take(10)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/opt/spark/python/pyspark/sql/dataframe.py", line 574, in take
return self.limit(num).collect()
File "/opt/spark/python/pyspark/sql/dataframe.py", line 535, in collect
sock_info = self._jdf.collectToPython()
File "/opt/spark/python/lib/py4j-0.10.7-src.zip/py4j/java_gateway.py", line 1257, in __call__
File "/opt/spark/python/pyspark/sql/utils.py", line 79, in deco
raise IllegalArgumentException(s.split(': ', 1)[1], stackTrace)
pyspark.sql.utils.IllegalArgumentException: u'Unsupported class file major version 58'
```
This appears to be a JDK-related problem whose solution has not yet been found, given how hard it is to pinpoint the cause from the exception message alone and the lack of documentation on this kind of failure. As a hint, class file major version 58 corresponds to Java 14 bytecode, while Spark 2.4.x officially supports only Java 8, so a JDK version mismatch between the components involved is a likely suspect.
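As a first diagnostic step, the Java version seen by each side can be compared. The sketch below reads the driver-side JVM's version through PySpark's Py4J gateway (`_jvm` is an internal PySpark attribute, used here only for inspection; the worker JVMs would have to be checked on the worker machines themselves):
```
# Print the Java version of the driver-side JVM via the Py4J gateway.
# _jvm is an internal PySpark attribute; Py4J imports java.lang by
# default, so System is reachable without a fully qualified name.
sc = session.sparkContext
print(sc._jvm.System.getProperty("java.version"))
```
If this reports a version newer than 8, pointing `JAVA_HOME` at a Java 8 installation before launching `pyspark` would be the natural thing to try next.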