<style type="text/css">
.reveal .slides {
min-height: 100%;
width: 100%;
height: 96mm;
}
.reveal img, .reveal video, .reveal iframe {
width: 100%;
min-width: 100%;
max-height: 100%;
}
html, .reveal {
color: #CFC7BF;
}
ol, ul {
font-size: 3rem;
}
.reveal pre code {
display: block;
width:100%;
line-height: 1.5em;
border-radius: 0.9rem;
font-size: 2rem;
padding: 3rem;
overflow: auto;
}
</style>
---
# Career Accelerator in Data Science
## [Northeastern University](https://www.northeastern.edu/)
Data Analysis of Project Function
by Jacob Tessers
---
## Who I Am
- Originally from Southern California, I have lived in Utah for the past 17 years.
- I am married with no children.
- Currently working full-time while finishing this program.
- I am an avid gamer, though I also enjoy movies and learning.
- I hope to establish myself as a data analyst and further my learning to become a data scientist at some point down the line.
---
## Technical Abilities
- Some experience with Python before this course, which helped tremendously.
- Completed courses with DataCamp and Coursera.
- Salesforce, particularly Tableau
- SQL
- GitHub
---
## The Client
- Daryl Cecile (software developer at Capital One) is one of the founders of Project Function.
- Rizwana Khan is the other founder, but I didn't meet her.
- Since 2018, [Project Function](https://projectfunction.io/) has taught over 95 sessions in web development, design with Unity, and general coding.
- Friendly to both beginner and experienced tech learners.
- Sessions are free of charge.
- Strong focus on helping minorities succeed in tech.
---
## Coming up with Project Function

---
### Project Goals
- Explore the activities on Project Function’s online platform.
- See if students using the online platform perform better overall.
- Recommend improvements to the way data is recorded.
- Identify patterns in student behavior using a small data sample.
---
## Project Challenges
- Data came from a data warehouse and was messy.
- There were many irrelevant tables.
- There was duplicated data.
- Some data was incomplete.
- Much of the data was missing overall, as I was given only a small sample.
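---
A quick way to put numbers on these challenges is to count missing values and repeated rows. This is only a minimal sketch on made-up rows: the `sample` table and its column names are assumptions, not the actual warehouse data.

```python
import pandas as pd

# Hypothetical sample; the real tables and column names are not shown in this deck.
sample = pd.DataFrame({
    "user_id": [1, 2, 2, 3, None],
    "activity": ["Login", "Accessed Session 1", "Accessed Session 1", None, "Login"],
})

# Count missing values in each column.
missing_per_column = sample.isna().sum()

# Count rows that exactly repeat an earlier row.
duplicate_rows = int(sample.duplicated().sum())

print(missing_per_column)
print("duplicate rows:", duplicate_rows)
```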
---
##### Small Snapshot of Activity Log

---
### Tech Stack
- Google Sheets - Data came in Excel format
- [Pandas](https://pandas.pydata.org/pandas-docs/stable/)
- [Seaborn](https://seaborn.pydata.org/)
- [Plotly](https://plotly.com/)
- [Matplotlib](https://matplotlib.org/)
---
### Data Journey
- Which tables from the Google Sheet were relevant?
- Download as .csv.
- Push to Deepnote to query the data.
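---
Once a table is exported as .csv, loading it in a notebook could look roughly like the sketch below. The in-memory text stands in for the exported file, and the column names (`user_id`, `activity`, `date`) are assumptions for illustration.

```python
import io
import pandas as pd

# Stand-in for a CSV exported from the Google Sheet; the contents and
# column names here are hypothetical.
csv_text = io.StringIO(
    "user_id,activity,date\n"
    "1,Login,2021-03-01\n"
    "2,Accessed Session 1,2021-03-02\n"
)

# With a real export this would be pd.read_csv("<exported file>.csv").
# parse_dates converts the named column to datetime values while loading.
activity_log = pd.read_csv(csv_text, parse_dates=["date"])

print(activity_log.dtypes)
print(activity_log.head())
```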
---
### Data Cleaning
- Repetitive steps
- Had to use the `duplicated` and `drop_duplicates` methods a lot
<iframe title="Embedded cell output" src="https://embed.deepnote.com/229baaf3-d665-413f-81cc-b5ca6160d049/6dee76f1-37d5-4e2e-95f0-b53463d43763/2ddf998538d746deb38192a05eda0a78?height=371" height="371" width="800"/>
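---
For readers new to these two pandas methods, here is a small sketch of how they fit together. The rows are made up and the column names are assumptions; the embedded notebook cell shows the real data.

```python
import pandas as pd

# Hypothetical rows; the real activity log columns are not shown here.
activity_log = pd.DataFrame({
    "user_id": [1, 1, 2, 2],
    "activity": ["Login", "Login", "Login", "Accessed Session 1"],
})

# duplicated() flags every row that repeats an earlier one.
duplicate_mask = activity_log.duplicated()
print(activity_log[duplicate_mask])

# drop_duplicates() keeps the first occurrence of each repeated row.
deduplicated = activity_log.drop_duplicates().reset_index(drop=True)
print(deduplicated)
```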
---
#### Dropping Unnecessary Columns
<iframe title="Embedded cell output" src="https://embed.deepnote.com/8d15e87d-b710-4fff-80ef-ff0e4cb338ff/57808016-6f53-43ea-beb0-100cfe12df57/9add9f616e244b88a6886e337e404eeb?height=83" height="83" width="500"/>
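---
Dropping columns in pandas can be sketched as below. The table and the names `internal_note` and `legacy_id` are hypothetical stand-ins for whichever columns were irrelevant to the analysis.

```python
import pandas as pd

# Hypothetical table; "internal_note" and "legacy_id" stand in for columns
# that were not needed for the analysis.
users = pd.DataFrame({
    "user_id": [1, 2],
    "mode": ["Online", "In-Person"],
    "internal_note": ["", ""],
    "legacy_id": [None, None],
})

# Drop columns by name; errors="ignore" skips names that are not present,
# which helps when the same cleanup runs over several tables.
trimmed = users.drop(
    columns=["internal_note", "legacy_id", "not_a_column"],
    errors="ignore",
)
print(list(trimmed.columns))
```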
---
### Data Exploration
---
```python=
activity_log['activity'].unique()
```
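---
Alongside `unique()`, `value_counts()` shows how often each activity occurs, which can help spot near-duplicate labels. A minimal sketch on made-up activity names (the values are assumptions):

```python
import pandas as pd

# Hypothetical activity names for illustration only.
activity_log = pd.DataFrame({
    "activity": ["Login", "Accessed Session 1", "Login", "Accessed Session 2", "Login"],
})

# Distinct activity labels, in order of first appearance.
distinct_activities = activity_log["activity"].unique()

# Number of rows per label, most frequent first.
activity_counts = activity_log["activity"].value_counts()

print(distinct_activities)
print(activity_counts)
```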
---
### Renaming the Activity Column
There were many unique activities, several of them very similar, so I added a column that groups activities under an overview label, e.g., "Accessed a Session" for all entries related to accessing different sessions. Here is a small sample of how I used `.apply()` to do that.
```python=
activity_log['activity'].apply(
    lambda x: "Accessed a Session" if "Session" in x else x)
```
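---
To store the grouped labels as their own column, the same idea can be wrapped in a named function. This is a sketch under assumptions: the column name `overview` and the sample rows are hypothetical, and the missing-value check is an addition, since `"Session" in x` raises a `TypeError` when `x` is not a string.

```python
import pandas as pd

# Hypothetical rows, including one missing activity.
activity_log = pd.DataFrame({
    "activity": ["Accessed Session 1", "Accessed Session 2", "Login", None],
})

def to_overview(activity):
    """Map session-related activities to one label; leave the rest as-is."""
    # Missing values are not strings, so pass them through unchanged.
    if not isinstance(activity, str):
        return activity
    return "Accessed a Session" if "Session" in activity else activity

# "overview" is an assumed column name for the grouped labels.
activity_log["overview"] = activity_log["activity"].apply(to_overview)
print(activity_log)
```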
---
#### Interactive Visualization
---
<iframe title="Embedded cell output" src="https://embed.deepnote.com/229baaf3-d665-413f-81cc-b5ca6160d049/6dee76f1-37d5-4e2e-95f0-b53463d43763/c2517b0e15eb477495d9d13b9e80073a?height=601" height="600" width="700"/>
---
<iframe title="Embedded cell output" src="https://embed.deepnote.com/229baaf3-d665-413f-81cc-b5ca6160d049/6dee76f1-37d5-4e2e-95f0-b53463d43763/5cc9e11279094aa9aadcb3830671b060?height=601" height="601" width="700"/>
---
<iframe title="Embedded cell output" src="https://embed.deepnote.com/229baaf3-d665-413f-81cc-b5ca6160d049/6dee76f1-37d5-4e2e-95f0-b53463d43763/b49b12e0cf434f5db8c3130fef04ab36?height=601" height="601" width="800"/>
---
<iframe title="Embedded cell output" src="https://embed.deepnote.com/8d15e87d-b710-4fff-80ef-ff0e4cb338ff/6dee76f1-37d5-4e2e-95f0-b53463d43763/6b20f04b3a0f4908abcc13018d07836f?height=668" height="668" width="500"/>
---
### Personalizing the Data
By bringing the platform's user data into the analysis, I could see how involved each user was and distinguish between online and in-person participation.
---
<iframe title="Embedded cell output" src="https://embed.deepnote.com/8d15e87d-b710-4fff-80ef-ff0e4cb338ff/57808016-6f53-43ea-beb0-100cfe12df57/acb8442a3e284cc2a124a8b7305f6a72?height=676" height="676" width="700"/>
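---
One way such a per-user comparison could be computed is a merge followed by a group-by. The sketch below uses made-up tables, so the numbers are not project results, and the column names (`user_id`, `mode`) are assumptions.

```python
import pandas as pd

# Hypothetical tables; values and column names are for illustration only.
users = pd.DataFrame({
    "user_id": [1, 2, 3],
    "mode": ["Online", "Online", "In-Person"],
})
activity_log = pd.DataFrame({
    "user_id": [1, 2, 3, 3, 3],
    "activity": ["Login", "Login", "Login",
                 "Accessed Session 1", "Submitted Assignment"],
})

# Attach each user's participation mode to their logged activities.
merged = activity_log.merge(users, on="user_id", how="left")

# Count activities per user, then average those counts per mode.
per_user = (
    merged.groupby(["mode", "user_id"])
    .size()
    .rename("n_activities")
    .reset_index()
)
mean_by_mode = per_user.groupby("mode")["n_activities"].mean()
print(mean_by_mode)
```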
---
### Conclusion
- Data in a format makes it much easier to understand.
- Although more learners were online, and they did participate, in-person learners participated more overall.
- Factors could include:
  - Work schedules
  - Life or sickness
  - Fluctuation of interest
---
### Suggestions
- Record log-out events as timestamps in the activity log.
- Enforce uniqueness constraints on assignment submissions so that a resubmission overwrites the earlier duplicate.
- Collect a larger dataset to explore individual student behaviors.
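---
Until the platform enforces this itself, the effect of timestamps plus a uniqueness rule can be approximated at analysis time. This is a rough pandas sketch, not the platform's schema: the table and the column names `assignment_id` and `submitted_at` are assumptions.

```python
import pandas as pd

# Hypothetical submissions; user 1 submitted assignment A1 twice.
submissions = pd.DataFrame({
    "user_id": [1, 1, 2],
    "assignment_id": ["A1", "A1", "A1"],
    "submitted_at": ["2021-03-01 10:00", "2021-03-02 09:30", "2021-03-01 12:00"],
})

# Store times as real timestamps rather than plain text.
submissions["submitted_at"] = pd.to_datetime(submissions["submitted_at"])

# Keep only the most recent submission per (user, assignment) pair,
# mimicking a uniqueness constraint where a resubmission overwrites.
latest = (
    submissions.sort_values("submitted_at")
    .drop_duplicates(subset=["user_id", "assignment_id"], keep="last")
    .reset_index(drop=True)
)
print(latest)
```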