# How to Use JupyterHub on the EMR Cluster

* Step 0) Open the AWS EMR console page
* Step 1) Clone a cluster, e.g. `ds__base_impacto__20220711__1m4.large_3m3.xlarge`
  * [example](https://us-east-1.console.aws.amazon.com/elasticmapreduce/home?region=us-east-1#cluster-details:j-1SI0RL8POR3E1)
* Step 2) Wait for the cluster to initialize
* Step 3) Go to Application history and look for the JupyterHub application
* Step 4) Copy the URL and paste it into the browser
  * login: `jovyan`, password: `jupyter`
  * ![](https://i.imgur.com/0z3SHWV.png)

```python
from pyspark.sql import SparkSession


def get_spark(memory: int, experiment: bool = True):
    """Creates a PySpark session.

    :param memory: amount of memory (in GB) to use for the driver and each executor
    :param experiment: flag indicating whether the session is used for
        experiments; Arrow optimizations are enabled only when this is False.
        Defaults to True
    :type memory: int
    :type experiment: bool, optional
    :return: initialized PySpark session
    :rtype: SparkSession
    """
    spark = (
        SparkSession
        .builder
        .config('spark.sql.broadcastTimeout', '360000')
        .config('spark.serializer', 'org.apache.spark.serializer.KryoSerializer')
        .config('spark.sql.execution.arrow.pyspark.enabled', str(not experiment).lower())
        .config('spark.sql.execution.arrow.pyspark.fallback.enabled', str(not experiment).lower())
        .config('spark.driver.memory', f'{memory}G')
        .config('spark.executor.memory', f'{memory}G')
        .config('spark.driver.maxResultSize', '4G')
        .config('spark.sql.debug.maxToStringFields', 100)
        .config('spark.ui.showConsoleProgress', 'true')
        .config("spark.jars.packages", "org.apache.hadoop:hadoop-aws:3.2.0")
        .getOrCreate()
    )
    spark.sparkContext.setLogLevel('ERROR')
    return spark
```

Initial imports (PySpark):

```python
import pyspark.sql.functions as f
from pyspark.sql.window import Window
from functools import reduce
import warnings

warnings.filterwarnings("ignore")
```
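Note how `get_spark` translates the `experiment` flag into the string values Spark expects for the Arrow configs: `str(not experiment).lower()` yields `'false'` for experiment sessions (Arrow disabled) and `'true'` otherwise. A minimal sketch of that conversion, using a hypothetical helper name `arrow_flag`, isolated here so it can be checked without a running cluster:

```python
def arrow_flag(experiment: bool) -> str:
    """Mirror the Arrow-config expression used in get_spark.

    Spark config values are strings, so the boolean is converted with
    str(...).lower(); the flag is inverted because Arrow should only be
    active outside of experiment sessions.
    """
    return str(not experiment).lower()


print(arrow_flag(True))   # experiment session -> 'false' (Arrow off)
print(arrow_flag(False))  # regular session    -> 'true'  (Arrow on)
```

The same string is passed to both `spark.sql.execution.arrow.pyspark.enabled` and its `fallback.enabled` counterpart, so the two settings always stay consistent.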