# How to Use JupyterHub on an EMR Cluster
* Step 0) Open the EMR page in the AWS console
* Step 1) Clone an existing cluster, e.g.: ds__base_impacto__20220711__1m4.large_3m3.xlarge
* [example](https://us-east-1.console.aws.amazon.com/elasticmapreduce/home?region=us-east-1#cluster-details:j-1SI0RL8POR3E1)
* Step 2) Wait for the cluster to initialize
* Step 3) Go to Application history and look for the JupyterHub application
* Step 4) Copy the URL and paste it into your browser
* login: jovyan / password: jupyter
```python
from pyspark.sql import SparkSession

def get_spark(memory: int, experiment: bool = True):
    """Create a PySpark session.

    :param memory: amount of memory (in GB) to use for the driver and each executor
    :param experiment: if True (the default), the session is used for
        experimentation and Arrow optimizations are disabled; if False,
        Arrow optimizations are enabled
    :type memory: int
    :type experiment: bool, optional
    :return: initialized PySpark session
    :rtype: SparkSession
    """
    spark = (
        SparkSession
        .builder
        .config('spark.sql.broadcastTimeout', '360000')
        .config('spark.serializer', 'org.apache.spark.serializer.KryoSerializer')
        .config('spark.sql.execution.arrow.pyspark.enabled', str(not experiment).lower())
        .config('spark.sql.execution.arrow.pyspark.fallback.enabled', str(not experiment).lower())
        .config('spark.driver.memory', f'{memory}G')
        .config('spark.executor.memory', f'{memory}G')
        .config('spark.driver.maxResultSize', '4G')
        .config('spark.sql.debug.maxToStringFields', 100)
        .config('spark.ui.showConsoleProgress', 'true')
        .config('spark.jars.packages', 'org.apache.hadoop:hadoop-aws:3.2.0')
        .getOrCreate()
    )
    spark.sparkContext.setLogLevel('ERROR')
    return spark
```
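The `experiment` flag can look inverted at first glance: a value of `True` *disables* Arrow, since `str(not experiment).lower()` is the string passed to the Spark config. A minimal sketch of just that mapping (the `arrow_flag` helper is illustrative, not part of the original function):

```python
def arrow_flag(experiment: bool) -> str:
    # Mirrors the expression used in get_spark: Arrow is enabled
    # only when the session is NOT for experiments.
    return str(not experiment).lower()

print(arrow_flag(True))   # experiment session -> 'false' (Arrow disabled)
print(arrow_flag(False))  # production-style session -> 'true' (Arrow enabled)
```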
Initial PySpark imports:
```python
import pyspark.sql.functions as f
from pyspark.sql.window import Window
from functools import reduce
import warnings
warnings.filterwarnings("ignore")
```
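The `functools.reduce` import is typically used in this kind of notebook to fold a list of DataFrames into one (e.g. `reduce(DataFrame.unionByName, dfs)`). The same left-fold pattern, sketched with plain Python lists so it runs without a Spark session (the sample data is illustrative):

```python
from functools import reduce

# In PySpark, the usual pattern would be:
#   merged = reduce(DataFrame.unionByName, dfs)
# The same left-fold with plain lists:
parts = [[1, 2], [3], [4, 5]]
merged = reduce(lambda acc, chunk: acc + chunk, parts)
print(merged)  # -> [1, 2, 3, 4, 5]
```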