Prompt:
I want to create a cool data science project. My skills are in using Python, pandas, and seaborn. I'd like my project to really stand out and be impressive - I'm going to be presenting it to a panel of judges in a "shark tank" style format. The first step is discovering interesting data to analyze. Act as my data science mentor, and ask a few questions about my personal interests, and in turn suggest 5 great project ideas. Please respond in the following format:
Here’s what I got as a response:
My follow-up: I actually really like the first one; it seems particularly relevant to my needs, and interesting in a professional setting. Can you give me a few more hypotheses that I can explore, as well as some sample code to generate a few visualizations? What kind of predictive models would I be able to build using scikit-learn?
Visualization 2: Popular Programming Languages
This visualization shows the popularity of different programming languages in the SEA region over time.
Regarding predictive models using scikit-learn, there are several possibilities for this project. Here are a few examples:
Predicting Coding Bootcamp Enrollment: You can build a classification model to predict whether a person is likely to enroll in a coding bootcamp based on factors such as their educational background, work experience, and coding skills.
Predicting Tech Job Openings: Using regression techniques, you can create a model to predict the number of tech job openings in a specific country based on factors like the country's GDP, population, education indicators, and coding education metrics.
Predicting Coding Bootcamp Completion: You can build a binary classification model to predict whether a student will successfully complete a coding bootcamp based on various features such as demographics, enrollment details, and performance in pre-bootcamp assessments.
These are just a few examples of the predictive models you can build using scikit-learn. The choice of the model will depend on the specific question you want to answer and the available data. Remember to preprocess and prepare your data appropriately before training the models.
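As a rough illustration of the bootcamp-completion idea, here is a sketch of a binary classifier with preprocessing bundled into a scikit-learn `Pipeline`. The data is synthetic and the feature names (`age`, `assessment_score`, `enrollment_type`) are hypothetical, so the accuracy it prints says nothing about real students.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

rng = np.random.default_rng(0)
n = 400

# SYNTHETIC data with hypothetical columns, generated only so the code runs.
students = pd.DataFrame({
    "age": rng.integers(18, 50, size=n),
    "assessment_score": rng.uniform(0, 100, size=n),
    "enrollment_type": rng.choice(["full_time", "part_time"], size=n),
})
# Toy label: completion loosely follows the assessment score, plus noise.
noise = rng.normal(0, 10, size=n)
students["completed"] = (students["assessment_score"] + noise > 50).astype(int)

X = students.drop(columns="completed")
y = students["completed"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Scale numeric columns and one-hot encode the categorical one.
preprocess = ColumnTransformer([
    ("numeric", StandardScaler(), ["age", "assessment_score"]),
    ("categorical", OneHotEncoder(handle_unknown="ignore"), ["enrollment_type"]),
])

model = Pipeline([
    ("preprocess", preprocess),
    ("classifier", LogisticRegression(max_iter=1000)),
])
model.fit(X_train, y_train)

accuracy = model.score(X_test, y_test)
print(f"Held-out accuracy on synthetic data: {accuracy:.2f}")
```

Keeping the preprocessing inside the pipeline means the scaler and encoder are fit on the training split only, which avoids leaking test data. For the job-openings idea, the same structure should work with a regressor such as `LinearRegression` in place of the classifier.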
I hope this helps you get started with your project! Feel free to ask if you have any further questions.
Actually, it looks like we should use this dataset instead: https://www.kaggle.com/datasets/fccuser/2021-new-coder-survey. GPT hallucinated: https://www.kaggle.com/rtatman/coding-bootcamp-enrollment-stats doesn’t actually exist. Jesus.
(I used Bard.)
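A first look at the survey could go something like the sketch below. I have not checked the CSV's file name or columns, so `load_survey` and `column_overview` are hypothetical helpers, and the tiny inline sample (with made-up columns) only stands in for the real file so the code runs without the download.

```python
import io

import pandas as pd


def load_survey(source):
    """Read a survey CSV from a file path or a file-like object."""
    # low_memory=False reads the file in one pass, which avoids
    # mixed-dtype warnings on wide survey exports.
    return pd.read_csv(source, low_memory=False)


def column_overview(df):
    """Per-column dtype, share of missing values, and unique-value count."""
    overview = pd.DataFrame({
        "dtype": df.dtypes.astype(str),
        "missing_share": df.isna().mean(),
        "n_unique": df.nunique(),
    })
    return overview.sort_values("missing_share", ascending=False)


# Stand-in for the downloaded file; these columns and rows are made up.
sample_csv = io.StringIO(
    "age,country,hours_learning\n"
    "25,Singapore,10\n"
    "31,,5\n"
    ",Vietnam,20\n"
)

survey = load_survey(sample_csv)
overview = column_overview(survey)
print(overview)
```

With the real download, the call would be `load_survey("path/to/the/downloaded.csv")`. Sorting by missing share is a quick way to see which survey questions most respondents skipped before picking columns to analyze.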
There’s still quite a way to go, but I think we have some ideas. Now, we begin with the end in mind. What are some big questions we would want to answer, that a shark tank judge would care about? The idea is to deliver real business value, because your skills are useful.
The question is basically like this: