<img src="https://i.imgur.com/OhTn0sc.png" width="50%" /><br>

July 5, 2023

<img src="https://media0.giphy.com/media/3ov9jNziFTMfzSumAw/giphy.gif" />

Remember that assignments are *not graded*. Please follow the instructions below.

## Exercise 1 - GitHub Account

:::warning
:warning: Independently of this exercise, you will need a GitHub account for submitting homeworks/assignments and measuring progress. Might as well create one now, unless you already have one.
:::

If you have a GitHub account, then feel free to skip this exercise.

If you do not have a [GitHub](https://github.com/) account, then visit the site and [create an account](https://github.com/join). Register with your Andrew ID. If you do, make sure to [add an alternative (personal) email](https://docs.github.com/en/account-and-profile/setting-up-and-managing-your-personal-account-on-github/managing-email-preferences/adding-an-email-address-to-your-github-account) as well.

On Canvas, you will find a discussion board named `Submit and share your GitHub username`. Submit your username in that discussion board.

## Exercise 2 - GitHub repository

Go to [GitHub](https://github.com/) and create a new public repository

<img src="https://i.imgur.com/S681ws3.png" width="35%" /><br>

and name it `data-science`

![](https://hackmd.io/_uploads/SkGvMZ4tn.png)

and make sure to initialize it with an empty `README.md`

<img src="https://i.imgur.com/QRdubSB.png" width="75%" /><br>

## Exercise 3 - Google Drive directory

:::warning
:warning: You will need these folders to start working on the next assignments in lab.
:::

Go to [Google Drive](https://www.google.com/drive/) and log in with your Andrew credentials. Create a new folder

![](https://i.imgur.com/nw0TnMl.png)

and name it `Data Science`

![](https://hackmd.io/_uploads/ryUuw0fYh.png)

After you create this folder, create a subfolder inside it and name it `datasets`. We will be using these folders to upload public datasets.
## Exercise 4 - Netflix 2021

Go to [Kaggle.com](https://www.kaggle.com/datasets/syedmubarak/netflix-dataset-latest-2021) and download the file `Netflix Dataset Latest 2021.xlsx`.

Upload the file `Netflix Dataset Latest 2021.xlsx` to the folder `Data Science/` in Google Drive.

## Exercise 5 - Ten Simple Rules for Taking Advantage of Git and GitHub

Read the article so you have a better understanding of Git and GitHub as tools for version control.

[![](https://i.imgur.com/K6KMVSc.png)](https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1004947)

## Exercise 6 - Divide and conquer

In simple terms, *divide and conquer* is a problem-solving strategy that involves breaking a big problem into smaller, more manageable parts, solving each part separately, and then combining the solutions to solve the overall problem.

In the *Reading* section for today's lecture there are three papers

![](https://hackmd.io/_uploads/rkEvF0fKn.png)

Divide into three groups. Have each group read one of the papers, write a two-paragraph summary of it, and create a PowerPoint slide highlighting the most important points of the assigned paper.

## Exercise 7 - The Past, Present, and Future of the Data Science Notebook

In this project you will learn how to use Google Colab and Jupyter Lab.

If you have time, then listen to the episode [The Past, Present, and Future of the Data Science Notebook](https://www.datacamp.com/podcast/past-present-future-data-science-notebooks). How would you summarize this episode?

---

Copyright © 2023 Pittsburgh Supercomputing Center. All Rights Reserved.

The [Biomedical Applications Group](https://www.psc.edu/biomedical-applications/) at the [Pittsburgh Supercomputing Center](http://www.psc.edu) in the [Mellon College of Science](https://www.cmu.edu/mcs/) at [Carnegie Mellon University](http://www.cmu.edu).