# Development Setup

At this point, you should be ready to contribute to larger-scale Python projects, which require a more robust development setup on your local computer. This page covers the following tools you'll use for your project in the coming weeks:

- Miniconda
- Environments
- Version Control
- Visual Studio Code

Before you get started, make sure you know how to access your computer's terminal (Mac/Linux) or PowerShell (PC).
Once there, [here is a list of common commands](https://gist.github.com/bradtraversy/cc180de0edee05075a6139e42d5f28ce) to try out (**be careful, and avoid modifying files**).

## Step 1: Miniconda

**Do your best** to get through this first step before you come to class ... We should have time to settle any issues you have in class, but the class meeting will go a lot smoother for you if you have this all set up before we start!

As you've already seen, most of your work in this class (and data science in general) will be done using Python. But before working on any Python-based project, you'll need a Python [environment](https://mljar.com/blog/python-virtual-environment-explained). In this class, we use [*Mini*conda](https://docs.conda.io/en/latest/miniconda.html) to manage our environments (the full [Anaconda](https://docs.anaconda.com/free/anaconda/getting-started/what-is-distro) distribution is bloated and can get quite cumbersome). So, the first thing you'll need to do is **install Miniconda**. This will create your “base” environment.

1. First, navigate to the available **[Miniconda Installers](https://www.anaconda.com/download/success)** and download the appropriate *Graphical* installer for your OS. For example, I have an M2 MacBook, so I select `64-Bit (Apple silicon) Graphical Installer`. Open the installer, and follow the installation prompts.
   - To access conda, just open your command line terminal/shell. You should see `(base)` at the beginning of the prompt. *Note: Windows users may need to [open a specific PowerShell for Anaconda](https://docs.conda.io/projects/conda/en/stable/user-guide/getting-started.html#starting-conda).*
2. *(Recommended for macOS)* Go ahead and (re)install the latest version of the [Xcode](https://developer.apple.com/xcode/resources) developer tools from the [App Store](https://apps.apple.com/us/app/xcode/id497799835?mt=12). Note: you may need to update your system to the latest version beforehand.
3. *(Optional)* Install [Homebrew](https://brew.sh/) for various tools you may encounter in the future.
   - If you use the copy/paste Terminal install, you need to make sure you're an admin.
   - If you use the .pkg install, read the instructions at the end before closing the installer!
   - You *may* need to add the appropriate `eval ...` line to the bottom of your shell profile. For example, on an Apple silicon (M-series) Mac, I run `echo 'eval "$(/opt/homebrew/bin/brew shellenv)"' >> ~/.bash_profile`. If your shell is zsh (the macOS default), the profile file is `~/.zprofile` instead.
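
Once the installer finishes, one quick way to confirm that conda is reachable from your terminal is a short Python check. This is only a minimal sketch: `find_conda` and `conda_version` are hypothetical helper names, and it assumes the installer put `conda` on your `PATH`.

```python
import shutil
import subprocess


def find_conda():
    """Return the path to the conda executable on PATH, or None if not found."""
    return shutil.which("conda")


def conda_version(conda_path):
    """Ask conda for its version string by running `conda --version`."""
    result = subprocess.run(
        [conda_path, "--version"], capture_output=True, text=True, check=False
    )
    # Older conda releases printed the version on stderr, so fall back to it.
    return result.stdout.strip() or result.stderr.strip()


if __name__ == "__main__":
    path = find_conda()
    if path is None:
        print("conda was not found on PATH; try opening a new terminal.")
    else:
        print(f"Found conda at {path}: {conda_version(path)}")
```

If the script reports that conda was not found, closing and reopening the terminal is usually the first thing to try.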

## Step 2: Projects and Environments

Ideally, every data science project should have its own **[environment](https://mljar.com/blog/python-virtual-environment-explained)**, which is essentially a defined collection of Python packages for a Python project. For example, your weekly codespaces run in a separate environment, which gives you access to particular packages. There are actually many different environment management tools, but here we'll just discuss pip and venv.

Before we begin here, create a "projects" folder somewhere convenient on your computer. You can name the folder whatever you like; for example, I have a *“Code”* folder in my home directory (i.e., on a Mac, this is a folder in the same directory as *Movies* and *Pictures*, etc.). Here is where you’ll put all future data science projects, each one contained in a separate folder *within* your projects folder.

A project folder should have an [appropriate](https://gravitydept.com/blog/devising-a-git-repository-naming-convention) name that clearly alludes to the project. Further, the project's environment name, project folder name, and GitHub repository name should all be the same.

*Below, `<myenv>` should be replaced with this name.*
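
If you want one name that works for the environment, the folder, and the GitHub repository, a lowercase, hyphen-separated "slug" is one possible convention. The sketch below assumes that convention (the linked article describes others), and `project_slug` is a hypothetical helper, not part of any tool used in this class.

```python
import re


def project_slug(title):
    """Turn a free-form project title into a lowercase, hyphen-separated name."""
    # Replace every run of characters that is not a letter or digit with one hyphen.
    slug = re.sub(r"[^a-z0-9]+", "-", title.lower())
    # Drop hyphens left over at either end.
    return slug.strip("-")


print(project_slug("NYC Taxi Demand (v2)"))  # nyc-taxi-demand-v2
```

The result can then stand in for `<myenv>` everywhere below.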

### Pip Environment (recommended)

[Pip](https://pip.pypa.io/en/stable) is the [official](https://packaging.python.org/en/latest/guides/tool-recommendations) package installer for Python, and it is an industry standard for professional Python-based projects. Thankfully, we can create pip environments *using* Miniconda.

1. Start with a yml file, and place it in your project's folder.
   - **For web apps**, like the one you'll build in this class, start with something like [this](https://github.com/leontoddjohnson/dstools/blob/main/envs/env_app.yml).
2. Open the yml file in some kind of text editor:
   - Update the `name` field with your project/environment name.
   - Select a Python version (i.e., replace the `x`). I recommend using the [latest version in "security" status](https://devguide.python.org/versions).
   - Make sure it contains `ipykernel` so you can access Jupyter notebooks.
   - Add (or change) any packages you know you’ll need under `pip:` (and **only** under `pip:`).
3. Navigate to the location of the yml file in your terminal using `cd`. Or, you can drag the folder itself *onto* the terminal window to paste the location.
4. **From the base environment**, in the same place where the yml file is saved, run `conda env create -f environment.yml` to create your environment (replace `environment.yml` with your yml file's name if it differs).
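
To make the shape of that yml file concrete, here is a minimal sketch that builds one as text. It is not a copy of the linked template: the layout is the general conda environment file format, and the name `my-project`, the version `3.12`, and the package list are all placeholders you would replace.

```python
ENV_TEMPLATE = """\
name: {name}
dependencies:
  - python={python_version}
  - pip
  - pip:
{pip_lines}
"""


def render_env_yml(name, python_version, pip_packages):
    """Build the text of a conda environment file whose packages all sit under pip."""
    pip_lines = "\n".join(f"      - {pkg}" for pkg in pip_packages)
    return ENV_TEMPLATE.format(
        name=name, python_version=python_version, pip_lines=pip_lines
    )


# Placeholder values: swap in your own project name, Python version, and packages.
print(render_env_yml("my-project", "3.12", ["ipykernel", "pandas"]))
```

Saving the printed text as `environment.yml` in your project folder gives you a file that step 4 can read. Notice that everything except `python` and `pip` themselves lives under `pip:`.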

Now you should have access to this environment for coding! See below for instructions on installing packages as you do your work.

### venv Environment (optional)

Venv is another industry-standard, lightweight Python environment framework used for more robust Python coding projects. If you’re going to use [venv](https://docs.python.org/3/library/venv.html), make sure you completely deactivate Anaconda by running `conda deactivate` or `deactivate` until you can no longer see `(base)` in the terminal command line. Note that this option requires a bit more setup than what is listed here, depending on the project you're working on.

1. *(Only if you haven't installed Miniconda ...) Make sure you have [Python](https://www.python.org/downloads) installed.*
2. Create a “venvs” folder in your home directory (e.g., use `cd ~` then `mkdir venvs`).
3. Create your environment in that folder with `python3 -m venv ~/venvs/<myenv>`.
   - You may need to replace `python3` with `python`.
   - Technically, you can replace `~` with whatever path you prefer, but I think the home directory is usually best.
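4. Activate the environment with `source ~/venvs/<myenv>/bin/activate`. (In Windows PowerShell, the activation script is `Scripts\Activate.ps1` inside the environment folder rather than `bin/activate`.)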
   - If you want to get fancy, there are ways to [alias this command](https://linuxize.com/post/how-to-create-bash-aliases) making it easier for you in the future.
5. Set up ipykernel so this environment can be used by Jupyter Notebooks.
   - Run `pip install --upgrade pip`, then `pip install ipykernel`.
6. Do your work, and install as needed (see below).
   - *To return to Anaconda, you can just close the terminal and open a new one, or run `conda activate`.*
7. When needed, save your environment in a `requirements.txt` with `pip freeze > requirements.txt` (in your project folder directory). This can be used later to recreate the environment.
8. Deactivate this environment with `deactivate`.
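
The `python3 -m venv` step above can also be done from Python itself through the standard library's `venv` module. The sketch below is one way to do that; `create_env` and `activate_script` are hypothetical helper names, and the folder layout follows the `~/venvs/<myenv>` convention from this section.

```python
import sys
import venv
from pathlib import Path


def create_env(env_dir, with_pip=True):
    """Create a virtual environment at env_dir, like `python3 -m venv env_dir`."""
    env_dir = Path(env_dir).expanduser()
    venv.create(env_dir, with_pip=with_pip)
    return env_dir


def activate_script(env_dir):
    """Return the activation script path for the current operating system."""
    env_dir = Path(env_dir).expanduser()
    if sys.platform == "win32":
        # Windows environments keep their scripts in a "Scripts" folder.
        return env_dir / "Scripts" / "Activate.ps1"
    return env_dir / "bin" / "activate"
```

For example, `create_env("~/venvs/<myenv>")` builds the environment, and `activate_script("~/venvs/<myenv>")` tells you which file to `source` (or run in PowerShell) afterwards.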

### Installing Packages

As you do your work in Python, you'll find that you may need particular packages for your project. To install packages:

1. First, activate your environment from the command line with `conda activate <myenv>` (for a venv environment, use the `source` command from the previous section).
2. In pip and venv environments, **only use `pip install __` to install packages.**
   - Every time you install a project-based package, I recommend you [add it to your environment.yml file](https://docs.conda.io/projects/conda/en/latest/user-guide/tasks/manage-environments.html#creating-an-environment-file-manually) **under "pip:"**.
   - If you ever need to share (or recreate) your environment, it also helps to get into the habit of [exporting your environment.yml file](https://docs.conda.io/projects/conda/en/latest/user-guide/tasks/manage-environments.html#exporting-the-environment-yml-file) periodically.
3. You can deactivate your environment with `conda deactivate`.

A few notes:

- In the `(base)` environment, **only use `conda install`** (see [this](https://www.anaconda.com/using-pip-in-a-conda-environment/)).
- **For each new package install, read the documentation \*FIRST\*.** Some packages may require separate steps for install.
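
Because the right install command depends on which environment is active, it can help to check before you install anything. The sketch below reads two environment variables: `VIRTUAL_ENV`, which venv's activate script sets, and `CONDA_DEFAULT_ENV`, which conda sets to the active environment's name (or path). `active_environment` is a hypothetical helper name.

```python
import os
import sys
from pathlib import Path


def active_environment(environ=None):
    """Describe which environment the current shell has activated, if any."""
    environ = os.environ if environ is None else environ
    # venv's activate script sets VIRTUAL_ENV to the environment's folder.
    if environ.get("VIRTUAL_ENV"):
        return "venv", Path(environ["VIRTUAL_ENV"]).name
    # conda sets CONDA_DEFAULT_ENV for the active environment.
    if environ.get("CONDA_DEFAULT_ENV"):
        return "conda", environ["CONDA_DEFAULT_ENV"]
    return "none", None


kind, name = active_environment()
print(f"Active environment: {kind} {name}")
print(f"Python running this script: {sys.executable}")
```

If this reports `conda base`, that is your cue to stick with `conda install`, or to activate your project environment first.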

## Step 3: Version Control

### Git and GitHub

So far, we've been using Git to track changes *on* GitHub. Now, we will use it to track code locally on your computer. You may already have Git installed on your machine (on macOS, it comes with the Xcode developer tools from Step 1); run `git --version` to check, and if the command isn't found, install Git from [git-scm.com](https://git-scm.com/downloads). Before doing anything with Git, you'll need to declare your "user" information in the command line by running the following two lines:

```
git config --global user.email "github-email@example.com"
git config --global user.name "Your Name"
```

*Note: "github-email@example.com" must match the email address associated with your GitHub account.*
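
To double-check that both settings were saved, you can read them back. This is a minimal sketch that calls `git config --global --get` through Python's `subprocess` module; `git_global_setting` is a hypothetical helper name, and it returns `None` when Git is missing or the setting is unset.

```python
import subprocess


def git_global_setting(key):
    """Return the value of a global Git setting, or None if it is unset."""
    try:
        result = subprocess.run(
            ["git", "config", "--global", "--get", key],
            capture_output=True,
            text=True,
            check=False,
        )
    except FileNotFoundError:
        # Git itself is not installed or not on PATH.
        return None
    value = result.stdout.strip()
    return value or None


for key in ("user.name", "user.email"):
    print(f"{key}: {git_global_setting(key)}")
```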

Then, to initiate Git for a project, you can:

- **start locally and publish:** navigate to your project folder in the command line, and run `git init` to initialize the folder as a Git repository. Then, use VS Code (below) to publish this repository to GitHub.
- **start on GitHub and clone:** [create](https://docs.github.com/en/repositories/creating-and-managing-repositories/quickstart-for-repositories) a repository on GitHub, and then use VS Code (below) to [clone](https://github.com/git-guides/git-clone) it locally.

In this class, you can expect to manage all your [Git actions](https://www.atlassian.com/git/tutorials/setting-up-a-repository) within VS Code (see below); this includes pushing, pulling, etc. That said, you may need to use other [git commands](https://education.github.com/git-cheat-sheet-education.pdf) if things get hairy. And, if you accidentally commit/push something that should be ignored, refer to [this article on using git rm](https://www.30secondsofcode.org/git/s/purge-file/).
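
One way to reduce the chance of committing something that should be ignored is to create a `.gitignore` before your first commit. The sketch below writes a starter file; the pattern list is an assumed starting point for a Python data science project, not an official list, and `write_gitignore` is a hypothetical helper name.

```python
from pathlib import Path

# Assumed starter patterns: caches, notebook checkpoints, secrets, macOS clutter.
DEFAULT_PATTERNS = [
    "__pycache__/",
    "*.pyc",
    ".ipynb_checkpoints/",
    ".env",
    ".DS_Store",
]


def write_gitignore(project_dir, patterns=DEFAULT_PATTERNS):
    """Create .gitignore in project_dir, keeping any lines already there."""
    path = Path(project_dir) / ".gitignore"
    existing = path.read_text().splitlines() if path.exists() else []
    # Only append patterns that are not already listed.
    missing = [p for p in patterns if p not in existing]
    path.write_text("\n".join(existing + missing) + "\n")
    return path
```

Calling `write_gitignore(".")` from your project folder is safe to repeat, since patterns that are already listed are not added twice.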

### Recommendations

- **Use the [.gitignore file](https://www.atlassian.com/git/tutorials/saving-changes/gitignore). It is your friend.**
- (If you want to get fancy …) Have a working branch and a main branch.
  - Do all your work in the working branch, and then merge into the main branch when you have a working version.
- (If you want to get *really* fancy ...) Use [semantic versioning](https://semver.org/#semantic-versioning-200) (*"[major update].[minor update].[patch]"*, e.g., 1.1.0) and define [version releases](https://docs.github.com/en/repositories/releasing-projects-on-github/managing-releases-in-a-repository) as they come.
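
To see how the three numbers in a semantic version move, here is a small sketch. It handles only plain `major.minor.patch` strings (no pre-release or build suffixes), and `bump` is a hypothetical helper name.

```python
def bump(version, part):
    """Return the next semantic version after bumping 'major', 'minor', or 'patch'."""
    major, minor, patch = (int(piece) for piece in version.split("."))
    if part == "major":
        # A major bump resets both minor and patch.
        return f"{major + 1}.0.0"
    if part == "minor":
        # A minor bump resets patch.
        return f"{major}.{minor + 1}.0"
    if part == "patch":
        return f"{major}.{minor}.{patch + 1}"
    raise ValueError(f"unknown part: {part!r}")


print(bump("1.1.0", "patch"))  # 1.1.1
print(bump("1.1.0", "minor"))  # 1.2.0
print(bump("1.1.0", "major"))  # 2.0.0
```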

## Step 4: Visual Studio Code

An integrated development environment (IDE) is desktop software that makes coding in Python much easier, and it's the best way to manage the kind of coding you'll do with your groups. There are many options for IDEs (e.g., PyCharm, Atom, Spyder), but because it is easily the most widely used, **VS Code is the recommended IDE for this class**.

To get started:

1. Install [Visual Studio Code](https://code.visualstudio.com/). 
2. Open VS Code. On the left pane of the IDE window, use the button shaped like four boxes to install some useful [extensions](https://marketplace.visualstudio.com/) (use the search bar):
   - [GitHub Pull Requests and Issues tool](https://code.visualstudio.com/docs/sourcecontrol/github)
   - [GitHub Copilot](https://marketplace.visualstudio.com/items?itemName=GitHub.copilot)
   - Python *(this should install a few other related extensions)*
   - Jupyter *(this will also install a few other related extensions)*
   - autoDocstring (Python Docstring Generator)
   - Code Spell Checker
   - Markdown All in One
   - (recommended) Markdown Preview Enhanced
3. **(Optional)** Consider using the [code . command](https://code.visualstudio.com/docs/editor/command-line#_launching-from-command-line) to open VS Code from a project folder.
   - *macOS users need to follow an [extra step](https://code.visualstudio.com/docs/setup/mac#_launching-from-the-command-line)*
4. [Sign into GitHub](https://code.visualstudio.com/docs/sourcecontrol/intro-to-git#_set-up-git-in-vs-code) to use version control extensions.
5. Review the [VS Code Intro Videos](https://code.visualstudio.com/docs/getstarted/introvideos) and the [VS Code Tutorial](https://code.visualstudio.com/docs/getstarted/getting-started).
6. Open your project folder(s) as needed, and save your [workspace(s)](https://code.visualstudio.com/docs/editing/workspaces/workspaces) if you like.
7. Choose your environment as needed, and code away!
   - For [notebooks](https://code.visualstudio.com/docs/datascience/jupyter-notebooks#_create-or-open-a-jupyter-notebook), you'll [choose an appropriate kernel](https://code.visualstudio.com/docs/datascience/jupyter-kernel-management) as needed. *With pip, typically your environment will be a "Python" environment.*
   - Otherwise, you'll select an environment from the ["Interpreter" list in VS Code](https://code.visualstudio.com/docs/python/environments#_working-with-python-interpreters). *This is also accessible on the bottom-right side of the status bar in VS Code.*
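
After choosing a kernel or interpreter, you can confirm that VS Code picked the environment you expected by running a quick check in a notebook cell or script. This is a minimal sketch, and `interpreter_report` is a hypothetical helper name. The `executable` path should point inside your environment's folder.

```python
import importlib.util
import sys


def interpreter_report():
    """Summarize which Python is running and whether ipykernel is importable."""
    return {
        "executable": sys.executable,
        "prefix": sys.prefix,
        "python": ".".join(str(n) for n in sys.version_info[:3]),
        "has_ipykernel": importlib.util.find_spec("ipykernel") is not None,
    }


for field, value in interpreter_report().items():
    print(f"{field}: {value}")
```

If `has_ipykernel` comes back `False`, that environment probably won't show up as a notebook kernel until you install `ipykernel` in it.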