
Stage IV - Presentation & Delivery

Course: Big Data - IU S23
Author: Firas Jolha

Dataset


Prerequisites

  • Hortonworks Data Platform (HDP) is installed
  • Python 2.7 is installed
  • Pip 20.3.4 is installed

Objectives

  • Build a web dashboard in Streamlit

Install Streamlit

You can install Streamlit version 0.55.2 via pip by running the command:

pip install streamlit==0.55.2 --ignore-installed

The option --ignore-installed makes pip overwrite already-installed packages instead of uninstalling them first, which avoids errors with preinstalled packages that pip cannot uninstall.

Note: If pip cannot resolve the host name of the package index (a DNS error) when you install a new package in Python 2.7, add the line nameserver 8.8.8.8 to the file /etc/resolv.conf.

Build Streamlit app

For the project, the dashboard must display at least the data characteristics and the results of EDA and PDA, but try to build a cool dashboard beyond that minimum. First, import the streamlit package:

import streamlit as st

The analysis results are stored as CSV files, so we can read them into Pandas (or Spark) DataFrames as follows:

import pandas as pd

emps = pd.read_csv("data/emps.csv")
depts = pd.read_csv("data/depts.csv")
q1 = pd.read_csv("output/q1.csv")
q2 = pd.read_csv("output/q2.csv")
q3 = pd.read_csv("output/q3.csv")
q4 = pd.read_csv("output/q4.csv")
q5 = pd.read_csv("output/q5.csv")
q6 = pd.read_csv("output/q6.csv")
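Since the query outputs follow the pattern output/q1.csv, output/q2.csv, and so on, a loop can load them into one dictionary instead of one variable per file. This is a minimal sketch, assuming every result file sits in the output folder and is named q<N>.csv; the helper load_query_results is hypothetical and not part of the original project.

```python
import glob
import os

import pandas as pd


def load_query_results(folder="output"):
    """Read every q*.csv file in `folder` into a dict keyed by file stem.

    Hypothetical helper: assumes the result files are named q1.csv, q2.csv, ...
    Any other CSV whose name starts with 'q' would also be picked up.
    """
    results = {}
    for path in glob.glob(os.path.join(folder, "q*.csv")):
        # 'output/q3.csv' -> 'q3'
        name = os.path.splitext(os.path.basename(path))[0]
        results[name] = pd.read_csv(path)
    return results

# In the app (assumed usage):
# results = load_query_results("output")
# st.write(results["q1"])
```

Files that are missing simply do not appear in the dictionary, so a check such as 'q1' in results before displaying a section keeps the app from crashing while some queries are still unfinished.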

st.write

st.write displays information in your Streamlit app. It does different things depending on what you pass to it. Unlike other Streamlit commands, write() has some unique properties:

  • You can pass in multiple arguments, all of which will be written.
  • Its behavior depends on the input type; for example, a string is rendered as Markdown and a DataFrame is displayed as a table.
  • It returns None, so its "slot" in the app cannot be reused.

For example, we can write formatted text to the dashboard:

st.write("# Big Data Project  \n _Employee Salary_$^{Prediction}$ :sunglasses:  \n", "*Year*: **2023**")


  • st.write renders Markdown strings, including emoji shortcodes such as :sunglasses:.
  • It can also display a DataFrame, a Matplotlib figure, an Altair chart, etc.
  • To break a line with \n, add two spaces before it (a Markdown hard line break).
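The double-space rule is Markdown's hard line break: a line that ends with two spaces followed by \n is rendered as a new line instead of being merged into the previous one. The small sketch below builds such a string from a list of lines; hard_breaks is a hypothetical helper name, not a Streamlit function.

```python
def hard_breaks(lines):
    """Join lines with Markdown hard line breaks (two spaces + newline)."""
    return "  \n".join(lines)


text = hard_breaks(["# Big Data Project",
                    "_Employee Salary Prediction_",
                    "*Year*: **2023**"])
# In the app (assumed usage):
# st.write(text)
```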

We can display a dataframe as follows:

# Display the descriptive information of the dataframe
emps_description = emps.describe()
st.write(emps_description)


We can display Altair charts as follows:

import altair as alt
c = alt.Chart(emps).mark_circle().encode(
    x='ename', y='deptno', size='sal', color='sal', tooltip=['ename', 'deptno', 'sal'])
st.write(c)


Text elements

Streamlit provides a dedicated function for each kind of text element, although st.write can do many of the same jobs.

st.markdown

Display a string formatted as Markdown.

st.markdown("We can add equations such as $sin^2(x)+cos^2(x) = 1$")

st.divider is not available in Streamlit v0.55.2, but st.markdown("---") can be used to add dividers.

The function st.markdown(body, unsafe_allow_html = False) has an argument unsafe_allow_html which can be used to add HTML tags to the dashboard. By default, any HTML tags found in the body are escaped and therefore treated as plain text. This behavior can be turned off by setting this argument to True.

That said, the package authors strongly advise against it. It is hard to write secure HTML, so by using this argument you may be compromising your users' security. For this project only, it is fine to use it.

st.title

Display text in title formatting. Each document should have a single st.title(), although this is not enforced.

st.title("# Big Data Project  \n _Employee Salary_$^{Prediction}$ :sunglasses:")


In Streamlit 0.55.2, st.title does not render Markdown, so the Markdown syntax in the string above is shown as plain text. This function also does not change the page title shown in the browser tab.

st.header

Display text in header formatting.

st.header("Data Characteristics")

st.subheader

Display text in subheader formatting.

st.subheader("Emps table")


st.code

Display a code block with optional syntax highlighting.

st.code("SELECT * FROM employees WHERE deptno = 10;", language = 'sql')

st.text

Write fixed-width and preformatted text.

st.text("This is a text!")

st.latex

Display mathematical expressions formatted as LaTeX. Supported LaTeX functions are listed at KaTeX.org.

st.latex("sin^2(x)+cos^2(x)=1")

Data display elements

When you're working with data, it is extremely valuable to visualize that data quickly, interactively, and from multiple different angles. That's what Streamlit is actually built and optimized for.

There are two main functions for displaying DataFrames: st.dataframe displays a DataFrame as an interactive table, whereas st.table displays it as a static table.

st.dataframe(q1)
st.table(q1)
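Because st.table renders every row of the DataFrame as a static table, a long result can make the page very long. This minimal sketch truncates the frame before display and builds a caption describing the cut; table_preview and its 10-row default are illustrative choices, not Streamlit settings.

```python
import pandas as pd


def table_preview(df, max_rows=10):
    """Return the first `max_rows` rows of `df` and a caption describing the cut."""
    shown = df.head(max_rows)
    caption = "Showing {} of {} rows".format(len(shown), len(df))
    return shown, caption


df = pd.DataFrame({"deptno": range(25), "sal": range(1000, 1025)})
shown, caption = table_preview(df)
# In the app (assumed usage):
# st.table(shown)
# st.text(caption)
```

For exploring all rows, st.dataframe remains the better choice since its interactive table can be scrolled.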

Chart elements

It is recommended to build charts with the Altair or Matplotlib packages, since Streamlit's built-in chart functions offer only limited settings, and then display them via st.altair_chart or st.pyplot, respectively. You can also add CSS styles to your dashboard as follows:

st.markdown("<style>{}</style>".format(<YOUR_STYLE>), unsafe_allow_html = True)
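The <YOUR_STYLE> placeholder stands for ordinary CSS text. One way to keep larger style sheets readable is to build that text from a dictionary of selectors and properties. This is a minimal sketch; build_css is a hypothetical helper, and the example rules mirror the ones used in the image-centering example below.

```python
def build_css(rules):
    """Turn {selector: {property: value}} into a CSS string.

    Selectors and properties are emitted in sorted order so the output
    is deterministic (plain dicts are unordered in Python 2.7).
    """
    blocks = []
    for selector in sorted(rules):
        props = rules[selector]
        body = " ".join("{}: {};".format(k, props[k]) for k in sorted(props))
        blocks.append("{} {{ {} }}".format(selector, body))
    return "\n".join(blocks)


style = build_css({
    "body": {"background-color": "#eee"},
    ".fullScreenFrame > div": {"display": "flex", "justify-content": "center"},
})
# In the app (assumed usage):
# st.markdown("<style>{}</style>".format(style), unsafe_allow_html=True)
```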

Media elements

You can add images to the dashboard with the st.image function.

# To center the image
st.markdown("""<style>body {
    background-color: #eee;
}

.fullScreenFrame > div {
    display: flex;
    justify-content: center;
}
</style>""", unsafe_allow_html=True)

# set the image and the caption
st.image("https://i2.wp.com/hr-gazette.com/wp-content/uploads/2018/10/bigstock-Recruitment-Concept-Idea-Of-C-250362193.jpg", caption = "Employees and Departments", width=400)

Status elements

Streamlit provides a few methods that allow you to add animation to your apps. These animations include progress bars, status messages (like warnings), and celebratory balloons.

import time

with st.spinner('Wait for it...'):
    time.sleep(5)
st.balloons()
st.success('Done!')
st.error('This is an error')
st.warning('This is a warning')
st.info('This is a purely informational message')

progress_text = "Operation in progress. Please wait."
st.text(progress_text)
my_bar = st.progress(0)

for percent_complete in range(100):
    time.sleep(0.1)
    my_bar.progress(percent_complete + 1)
st.success("Done!")
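The loop above runs exactly 100 steps, which maps directly onto the 0 to 100 integer range passed to st.progress. When processing an arbitrary number of items, the completed fraction has to be scaled to that range. A minimal sketch with a hypothetical percent_done helper:

```python
def percent_done(done, total):
    """Map `done` out of `total` items onto the integer range 0..100."""
    if total <= 0:
        return 100  # an empty job counts as finished
    return int(round(100.0 * done / total))


items = ["q1", "q2", "q3"]
steps = [percent_done(i + 1, len(items)) for i in range(len(items))]
# In the app (assumed usage):
# my_bar = st.progress(0)
# for i, item in enumerate(items):
#     ...  # process item
#     my_bar.progress(percent_done(i + 1, len(items)))
```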

Input widgets

With widgets, Streamlit allows you to bake interactivity directly into your apps with buttons, sliders, text inputs, and more.

st.button

Display a button widget.

def clicked():
    st.write('Hello there!')
def unclicked():
    st.write('Goodbye')

if st.button('Say hello'):
    clicked()
else:
    unclicked()

st.checkbox

Display a checkbox widget.

def clicked():
    st.write('Great!')
def unclicked():
    st.write('It is fine!')

if st.checkbox('Do you agree?'):
    clicked()
else:
    unclicked()

st.radio and st.selectbox

st.radio displays a radio button widget.

genre = st.radio(
    "What\'s your favorite movie genre",
    ('Comedy', 'Drama', 'Documentary'))

if genre == 'Comedy':
    st.write('You selected comedy.')
else:
    st.write("You didn\'t select comedy.")

    
option = st.selectbox(
    'How would you like to be contacted?',
    ('Email', 'Home phone', 'Mobile phone'))

st.write('You selected:', option)
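In a dashboard, a selectbox is most useful when its value drives the data shown. This minimal sketch assumes the emps table has ename, deptno and sal columns, as in the Altair example above; filter_department and the sample rows are hypothetical and only illustrate the filtering step.

```python
import pandas as pd


def filter_department(emps, deptno):
    """Return the employees of one department, highest salary first."""
    subset = emps[emps["deptno"] == deptno]
    return subset.sort_values("sal", ascending=False)


# Hypothetical sample rows standing in for data/emps.csv
sample_emps = pd.DataFrame({
    "ename": ["KING", "CLARK", "SMITH", "FORD"],
    "deptno": [10, 10, 20, 20],
    "sal": [5000, 2450, 800, 3000],
})
departments = sorted(sample_emps["deptno"].unique())
# In the app (assumed usage, with the real emps table):
# deptno = st.selectbox("Department", departments)
# st.table(filter_department(emps, deptno))
```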

st.text_input and st.number_input

st.text_input displays a single-line text input widget. st.number_input displays a numeric input widget.

number = st.number_input('Insert a number')
st.write('The current number is ', number)

title = st.text_input('Movie title', 'Life of Brian')
st.write('The current movie title is', title)

st.date_input and st.time_input

st.date_input displays a date input widget. st.time_input displays a time input widget.

import datetime

d = st.date_input(
    "When\'s your birthday",
    datetime.date(2019, 7, 6))
st.write('Your birthday is:', d)

t = st.time_input('Set an alarm for', datetime.time(8, 45))
st.write('Alarm is set for', t)

Dashboard Example

st.markdown('---')
st.title("Big Data Project 2023")
st.markdown("""<style>body {
    background-color: #eee;
}

.fullScreenFrame > div {
    display: flex;
    justify-content: center;
}
</style>""", unsafe_allow_html=True)

st.image("https://i2.wp.com/hr-gazette.com/wp-content/uploads/2018/10/bigstock-Recruitment-Concept-Idea-Of-C-250362193.jpg", caption = "Employees and Departments", width=400)

#st.markdown("<p style='text-align: center; color: grey;'>Employees and Departments</p>", unsafe_allow_html=True)

st.markdown('---')
st.header('Descriptive Data Analysis')
st.subheader('Data Characteristics')
# shape is (rows, columns): rows are instances, columns are features
emps_dda = pd.DataFrame(data = [["Employees", emps.shape[1], emps.shape[0]], ["Departments", depts.shape[1], depts.shape[0]]], columns = ["Tables", "Features", "Instances"])
st.write(emps_dda)
st.markdown('`emps` table')
st.write(emps.describe())
st.markdown('`depts` table')
st.write(depts.describe())

st.subheader('Some samples from the data')
st.markdown('`emps` table')
st.write(emps.head(5))
st.markdown("`depts` table")
st.write(depts.head(5))

st.markdown('---')
st.header("Exploratory Data Analysis")
st.subheader('Q1')
st.text('The distribution of employees in departments')
st.bar_chart(q1)

st.subheader('Q2')
st.text('The average salary in departments')
st.table(q2)
st.line_chart(q2['sal_avg'], width=400)

st.markdown('---')
st.header('Predictive Data Analytics')
st.subheader('ML Model')
st.markdown('1. Linear Regression Model')
st.markdown('Settings of the model')
st.table(pd.DataFrame([['setting1', 1.0], ['setting2', 0.01], ['....','....']], columns = ['setting', 'value']))

st.markdown('2. SVR (Support Vector Regressor)')
st.markdown('Settings of the model')
st.table(pd.DataFrame([['setting1', 1.0], ['setting2', 'linear'], ['....','....']], columns = ['setting', 'value']))

st.subheader('Results')
st.text('Here you can display metrics you are using and values you got')
st.table(pd.DataFrame([]))
st.markdown('<center>Results table</center>', unsafe_allow_html = True)
st.subheader('Training vs. Error chart')
st.write("matplotlib or altair chart")
st.subheader('Prediction')
st.text('Given a sample, predict its value and display results in a table.')
st.text('Here you can use input elements but it is not mandatory')
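The Results table in the example is left empty. One way to fill it is to compute evaluation metrics from the true and predicted values of the test set. This is a minimal sketch, assuming a regression task evaluated with RMSE, MAE and R²; metrics_table is a hypothetical helper, and the values in the usage line are made up for illustration only.

```python
import numpy as np
import pandas as pd


def metrics_table(y_true, y_pred):
    """Build a two-column table of common regression metrics."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    errors = y_true - y_pred
    rmse = np.sqrt(np.mean(errors ** 2))
    mae = np.mean(np.abs(errors))
    # R^2 = 1 - SS_res / SS_tot
    r2 = 1.0 - np.sum(errors ** 2) / np.sum((y_true - y_true.mean()) ** 2)
    return pd.DataFrame([["RMSE", rmse], ["MAE", mae], ["R2", r2]],
                        columns=["metric", "value"])


# Made-up true/predicted salaries, for illustration only
table = metrics_table([3000, 1500, 800], [2800, 1600, 1000])
# In the app (assumed usage):
# st.table(table)
```

If the models were evaluated in Spark, the same table can be built from the metric values saved to a CSV file, in the same way as the EDA results.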

Run Streamlit

HDP comes with a list of custom forwarded ports; you can check them in the port-forwarding settings of VirtualBox or in the Docker port mappings.

We will use the first of these ports, 60000, for the Streamlit server. By default, Streamlit uses port 8501, but you can run it on a custom port by specifying the server port as follows:

streamlit run <streamlit_app.py> --server.port 60000

This runs the Streamlit app in <streamlit_app.py> on port 60000. Open localhost:60000 in a browser on your local machine to view the app.
