Python environments workshop

This document was created for a workshop at Harvard Medical School on Nov 14, 2022.
(It may have been edited since then, see top of the page for last change)

Overview

Python is an extremely popular, powerful, and flexible programming language and ecosystem. But it can be confusing for newcomers (and even those who have used Python for years) to understand exactly what it means to have Python and Python packages "installed" on a system.

The goal of this workshop & document is to demystify and answer the following questions:

What is Python? What does it mean to "install Python"?
Where does Python live on your system?
What are Python modules and packages?
When I pip install or conda install a package, where is it downloaded from, and where is it installed to?
When I import a package, where does Python look for it?
How can I reliably (re)create a Python environment so that I can reproduce what I did on one system on another?

:eyes: Note: There are many ways to set up Python and install packages (indeed, this is one of the reasons it can be so confusing!). The conda-based approach presented here is by no means the only way; this guide represents an "opinionated" approach to setting up Python environments that is commonly used in data science. Most of the concepts are the same in other environment management systems.

Terms

Python interpreter: The actual python executable that parses and runs human readable source code.

Type which python (mac/linux) or where python (windows) to show the path to the active python interpreter.
Module: An organizational unit of python code. Usually, a single file ending in .py that contains Python definitions and expressions.
Package: A collection of modules. Usually, this is a folder of python modules that also contains an __init__.py file. "Package" also frequently refers to an installable python library/application (e.g. numpy, matplotlib, pandas…)
Package Manager: A program that automates the installation, updating and removal of packages (e.g. pip, conda)
Virtual Environment: An isolated collection of packages, settings, and an associated python interpreter. Virtual environments allow many different collections of Python and packages to exist on the same system
Environment Manager: A program that automates the creation and deletion of virtual environments (e.g. conda, virtualenv, venv)
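To make these terms concrete, here is a small sketch you can run in any Python session. It only uses the standard library; json and shutil are just convenient examples I picked (one happens to be a package, the other a single-file module).

```python
import json    # a package: a folder of modules with an __init__.py
import shutil  # a module: a single .py file
import sys

# The interpreter that is running this code (compare with `which python`)
interpreter = sys.executable
print("interpreter:", interpreter)

# shutil.which searches PATH, much like `which`/`where` does in the shell
print("first python on PATH:", shutil.which("python"))

# Modules and packages both record the file they were loaded from
print("module file: ", shutil.__file__)
print("package file:", json.__file__)

# Only packages have a __path__ (the folder that holds their modules)
is_package = {
    "json": hasattr(json, "__path__"),
    "shutil": hasattr(shutil, "__path__"),
}
print(is_package)
```

If the interpreter printed here is not the one you expected, you are probably not in the environment you think you are in.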

More common terms and programs in the python ecosystem…

pip: A python package manager. By default, pip installs packages from pypi.org.
PyPI.org ("The python package index"): A repository of python packages (where pip looks for packages).
conda: An environment and package manager. By default, conda installs packages from anaconda.org. Note that conda can install both python and non-python packages. (see also mamba, a fast implementation of conda written in C++)
anaconda (the organization): The company behind the anaconda distribution, package index, and a number of other python ecosystem initiatives.
anaconda (the distribution): A distribution of software including a python executable, the conda program, and a few hundred pre-installed python packages in the base environment.
anaconda.org: a package index from Anaconda. When you conda install ... something, it searches & installs packages from anaconda.org
conda-forge: a channel on anaconda.org with a huge amount of packages relevant to the scientific python ecosystem, and an organization that facilitates the building of conda packages.
miniconda: (alternative to the anaconda distribution). A minimal installer for conda that does not contain all of the additional packages in the anaconda distribution (see also miniforge and mambaforge, which install conda and mamba respectively, and set conda-forge as the default channel.)

Core tools for setting up Python

There are three classes of tools that you'll want to be familiar with when using Python:

Something to install Python itself, preferably with multiple versions installed at the same time
Something to download and install Python packages.
Something to create and destroy virtual environments. (This is technically optional, but essentially mandatory in practice)

There are many tools that perform each of these tasks, and some tools perform multiple tasks. The following venn diagram shows where a few commonly used programs fit into these classes:

For our purposes, we will be using conda as an environment manager and a tool to install Python itself; and we will use both conda and pip to install Python packages.

❓ conda vs. mamba

Throughout this page, whenever I refer to conda as a command you can run on the command line, you can substitute the command/program mamba.

mamba is a reimplementation of the conda package manager in C++. It is much faster than conda in many cases, and – unlike conda – doesn't require Python itself, which removes a "bootstrapping" problem in some cases.

You can install mamba (using conda!) into your base environment:

conda install mamba -n base -c conda-forge

What is a virtual environment

A virtual environment is an isolated collection of packages, settings, and an associated Python interpreter. Having more than one allows many different collections to exist on the same system.

Why would I need more than 1 environment?

Conflicting package dependencies:

You install packageA
- (packageA depends on other packages including numpy<1.0)
Everything is going fine, but then…
You install packageB
- (packageB also depends on numpy, but requires numpy>1.1)
Now you have a broken environment, and packageA may no longer work

Environments allow you to use a different set of packages for different projects and applications.
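The conflict above can be sketched in a few lines of Python. This is only a toy model, not how pip or conda actually resolve dependencies: the list of numpy releases is made up, and parse_version is a hypothetical helper that only understands simple dotted numbers.

```python
def parse_version(text):
    """Turn a simple dotted version like '1.2' into a comparable tuple (1, 2)."""
    return tuple(int(part) for part in text.split("."))


# Made-up numpy releases to choose from
available = ["0.9", "1.0", "1.1", "1.2"]

# packageA requires numpy<1.0, packageB requires numpy>1.1
ok_for_a = [v for v in available if parse_version(v) < parse_version("1.0")]
ok_for_b = [v for v in available if parse_version(v) > parse_version("1.1")]

# A single environment holds one numpy, so it must satisfy both at once
ok_for_both = [v for v in available if v in ok_for_a and v in ok_for_b]

print("works for packageA:", ok_for_a)
print("works for packageB:", ok_for_b)
print("works for both:    ", ok_for_both)
```

No version satisfies both requirements, which is why packageA and packageB each need their own environment.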

Exercise 1: Installing Python via conda

:trophy: The goal of this section is to get the python interpreter installed.

When completed, you should:

✅ have a new mambaforge folder in your home directory
✅ be able to run conda (and/or mamba) from the terminal
✅ have a base environment with python installed.

There are many ways to get python installed. Here, we will jump to my recommended approach of installing python via conda.

Conda is an open-source package management system and environment management system that runs on Windows, macOS, and Linux. With it, you can create virtual environments and install python itself (along with many other python and even non-python programs!)

The most well-known way to install conda is via anaconda.com. However, we will install conda using "miniforge". Miniconda & Miniforge are much smaller distributions than the anaconda distribution. They provide the bare minimum required to get started with python conda-based virtual environments. Specifically, the installer will:

Install conda, python, and a couple of other packages useful for bootstrapping environments (like pip) into a new folder in your home folder.
Configure conda-forge as the default (and only) channel.
Optionally, install mamba (if you used mambaforge)

Note: miniforge is very similar to miniconda, except that it also sets up conda-forge as your default conda channel. We'll learn about conda channels below.

Install

Mac or Linux

In a terminal, run:

curl -L -O "https://github.com/conda-forge/miniforge/releases/latest/download/Mambaforge-$(uname)-$(uname -m).sh"
bash Mambaforge-$(uname)-$(uname -m).sh

Windows

Download the latest installer: https://github.com/conda-forge/miniforge/releases/latest/download/Mambaforge-Windows-x86_64.exe
Double-click it in the file browser.

Initialize?

During install, you will see a question like "Do you wish the installer to initialize conda/Miniforge/Mambaforge?". It's best to enter "yes" to this.

Why?

Key parts of conda's functionality require that it interact directly with the shell within which conda is being invoked. The conda activate and conda deactivate commands specifically are shell-level commands. That is, they affect the state (e.g. environment variables) of the shell context being interacted with. Other core commands, like conda create and conda install, also necessarily interact with the shell environment. They're therefore implemented in ways specific to each shell. Each shell must be configured to make use of them.

Initialization (the conda init command) makes changes to your system that are specific and customized for each shell. To see the specific files and locations on your system that will be affected before any changes are made, use the --dry-run flag. To see the exact changes that are being or will be made to each location, use the --verbose flag.

Where did it go?

Take a moment to make sure you know what just happened to your system above 👆.

By default anaconda/miniconda/miniforge/mambaforge will create a new folder in your home directory (e.g. ~/mambaforge, or ~/miniconda3)

Alternative approaches:

Just for the sake of completeness, here are some alternative methods that you will see recommended in various places, along with why I didn't use them here.

Installing from miniconda

Instructions

Download the version you want from https://docs.conda.io/en/latest/miniconda.html

Why I prefer miniforge

Miniforge and miniconda are very similar in that they both provide the bare minimum to get the conda environment and package manager installed. Miniforge also configures conda-forge as the default (and only) channel. Not having conda-forge in your configuration is a common source of problems for many newcomers.

Installing from python.org

Instructions

Download the version you want from https://www.python.org/downloads/

Why I prefer miniforge

While downloading from python.org is of course the "canonical" way to install Python, it will install it at the system level; and, by default, all installed packages will go into your "global" collection of packages.

Installing conda gets us python, and the machinery to create virtual environments all in one, and can install it into your home directory without any special permissions. You could accomplish a similar thing with python.org, pyenv, and/or venv/virtualenv… but conda very quickly gets us everything we need.

Installing from anaconda.com

Instructions

Click the download button on https://www.anaconda.com/, double click the installer and follow the prompts.

Why I prefer miniforge

While installing from anaconda.com does get us everything and more from miniconda, it is "bloated" in that it additionally comes with many hundreds of packages pre-installed in the base environment.

In most cases, you will want to create multiple environments with a collection of packages for your specific tasks. Anaconda provides a very quick way to get up and running with scientific python, but also comes at a very large package size, and obscures a few very basic details and best practices about (re)creating environments.

Installing on mac from Homebrew

Instructions

After installing Homebrew, run:

brew install python

Why I prefer miniforge

While homebrew is fantastic for programs that you only want 1 version of, it can be challenging for something like Python (where you often want python 3.7, 3.8, and 3.9 installed all at the same time). Also, similar to installing from python.org, a homebrew install can get you into trouble with global package installs if you're not careful.

Installing python with pyenv

Instructions

pyenv is a tool that lets you easily install and switch between multiple Python installations.

See installation guide here, and for an introduction to pyenv, see this blog post

Why I prefer miniforge

There's nothing wrong with using pyenv, it's very convenient and lighter weight than conda. Since I generally know that I will also want to be installing packages with conda, I tend to use conda for python as well.

But if you know you only want to install with pip, then pyenv can get you setup quickly, and also create virtual environments with pyenv virtualenv.

Exercise 2: Creating a virtual environment

:trophy: The goal of this section is to create (and delete) some virtual environments. When completed, you should know how to

✅ create a new virtual environment with conda with a specific version of Python
✅ activate and deactivate environments
✅ know which environment (and python interpreter) you're currently using
✅ delete an environment

If you retain one bit of advice today, let it be this:

:heart_eyes: Virtual environments are your best friend! Create and recreate them liberally :heart_eyes:

Environments allow you to experiment with various packages and versions without fear of breaking your entire system (and needing to reinstall everything). As you install packages over time, you will inevitably install something that doesn't "play well" with something else you've already installed. In some cases this can be hard to recover from. With virtual environments, you can just create a fresh environment and start again – without needing to do major surgery on your system.

Creating virtual environments

Reminder: mamba is a fast version of conda. I use it here in these examples, but if you don't have it installed, you can replace the "mamba" command with "conda".

# create a new empty environment named 'ENV_NAME'
mamba create --name ENV_NAME

# create a new empty environment named 'ENV_NAME'  (`-n` is short for `--name`)
mamba create -n ENV_NAME

You can also install things (using conda/mamba) in the same command that creates the new environment by adding a list of packages to install to the end of the command. For example, you'll usually want to create an environment with a specific version of python installed:

# create a new environment with the latest version of python
mamba create -n ENV_NAME python

# create a new environment with python 3.10
mamba create -n ENV_NAME python=3.10

Where did it go?

Take a moment to make sure you know what just happened 👆.

Calling conda create will result in a new folder in the envs folder in your conda installation (e.g. ~/<conda_folder>/envs/ENV_NAME)

Activating environments

We've now created an environment named ENV_NAME, but we aren't currently "using" it. To activate a virtual environment, use conda activate

# activate environment named 'ENV_NAME'
conda activate ENV_NAME

:eyes: you should see your prompt change to include (ENV_NAME) somewhere, indicating the active environment.

Now, when we run the command python (or any other command that in turn calls python), the specific interpreter that we installed into our environment will be used. To prove this to yourself – or if you ever want to double check which python is being used – type:

# on mac/linux
which python
# on windows cmd
where python

Note: You must activate environments each time you open a new terminal.

:question: What does activating an environment actually do?

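The main effect of calling conda activate ENV_NAME is to put the environment's executables at the front of your shell's PATH. It does this by prepending folders from inside the environment folder (a.k.a. the "prefix", which usually lives in ~/<conda_folder>/envs/ENV_NAME), such as prefix/bin on mac/linux.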

It will also update the CONDA_PREFIX and CONDA_DEFAULT_ENV environment variables to reflect your active environment's prefix and name.

# activate an environment
conda activate ENV_NAME
echo $PATH                    # windows: echo %PATH%
env | grep CONDA              # windows: set | findstr "CONDA" 

# deactivate and look again
conda deactivate
echo $PATH                    # windows: echo %PATH%
env | grep CONDA              # windows: set | findstr "CONDA"
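You can make the same check from inside Python, which is handy in a notebook or script where you can't easily run shell commands. This is a minimal sketch: describe_activation is a hypothetical helper name, and it only reports the three things discussed above.

```python
import os


def describe_activation(environ=None):
    """Summarize what `conda activate` changed in the given environment mapping."""
    if environ is None:
        environ = os.environ
    path_entries = environ.get("PATH", "").split(os.pathsep)
    return {
        # the first PATH entry is where the shell looks for `python` first
        "first_path_entry": path_entries[0],
        # these two are None when no conda environment is active
        "CONDA_PREFIX": environ.get("CONDA_PREFIX"),
        "CONDA_DEFAULT_ENV": environ.get("CONDA_DEFAULT_ENV"),
    }


for key, value in describe_activation().items():
    print(f"{key}: {value}")
```

Run it before and after conda activate ENV_NAME (in a new python session each time) to see the values change.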

Deactivating environments

To deactivate the environment, use conda deactivate:

# deactivate the currently active environment
conda deactivate

# or, explicitly activate the base environment
conda activate base

Deleting environments

To delete an environment permanently, first make sure to deactivate it, then enter:

conda env remove -n ENV_NAME

… the folder in ~/<conda_folder>/envs should be gone now.

Alternative environment managers

Conda is not the only game in town for managing virtual environments! You'll want to use conda if you're going to be using conda install to add packages, but if you know you don't need to install using conda, there are alternative environment managers like venv and virtualenv (both mentioned in the Terms section above).

Exercise 3: Installing packages

:trophy: The goal of this section is to learn how to install packages into the active environment

When completed, you should:

✅ know how to install with pip
✅ know how to install with conda
✅ know where to go to read more about specifying versions and sources

The extensive ecosystem of third-party scientific packages is a primary driver of the success of Python. (By "third-party" here, I mean packages and modules that don't ship with the Python standard library; packages like numpy, pandas, and matplotlib.) Most of the time, the first thing you'll do after creating an environment is to install some packages.

Installing packages with `pip`

To install with pip, use the install command:

pip install requests

pip can install packages from many different locations:

# install the current working directory
pip install .

# install a file that someone sent you
pip install some_local_file.whl

# install the bleeding edge dev version from some github repo
pip install git+https://github.com/psf/requests

Installing packages with `conda` or `mamba`

To install with conda or mamba, use the install command:

mamba install requests
# or
conda install requests

Often with conda, you will want to install from a specific channel (discussed below). You can either add channels permanently to your config, or specify channels at install time:

conda install -c conda-forge requests

Installing specific versions

Both pip and conda have a lot of ways to specify version constraints and package sources. See their respective documentation pages for details (or use pip install --help or conda install --help on the command line).

The most important thing to know is how to install a specific version:

pip install package==1.2.3
conda install package=1.2.3
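After installing, you may want to confirm which version actually ended up in the active environment. Here is a minimal sketch using the standard library's importlib.metadata (Python 3.8+); installed_version is a hypothetical helper name, and "requests" is just an example package.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package_name):
    """Return the installed version string, or None if the package is missing."""
    try:
        return version(package_name)
    except PackageNotFoundError:
        return None


print("requests:", installed_version("requests"))
```

This reads the metadata that pip and conda leave behind in site-packages, so it works without importing the package itself.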

Where do packages come from and install to?

This section discusses where pip and conda look for packages when you run install, and where those packages end up on your computer.

This part is generally a bit mysterious to new Python users, and it can be very illuminating to understand where packages are downloaded from, and where they go on your computer when you install them.

One of the main differences between installing a package using conda vs pip is the package repository that gets used (i.e. where the package is downloaded). It's useful to have a sense for where these programs look for packages when you use the install command.

Package sources: PyPI and anaconda.org

The two main package repositories are PyPI.org and anaconda.org

`pip` searches at PyPI.org

By default, pip searches for packages in the Python Package Index (PyPI; pronounced "pie-pee-eye", not "pie pie").

If you'd like to use a web browser to see what packages, versions and files are available, you could also search directly at https://pypi.org/.

As an example, if you search for numpy, it will lead you to the page dedicated to the numpy package. Clicking on release history will show you all versions available and their dates of release:

And clicking download files will show you the exact files that pip would be selecting from and installing if you were to type pip install numpy (more on "source distributions" and "binary distributions" later):

:information_source: while you won't usually manually go to PyPI.org to search/download a package, it's still educational – and sometimes useful - to view the index directly in a browser like this.

`conda` searches at anaconda.org

By default, conda/mamba searches for packages at anaconda.org. Here, however, things are a little more complicated than PyPI: conda has the concept of channels. Channels are the locations where packages are stored; if we search for numpy as we did above, this time we see a lot of entries:

Each entry above is numpy, built and distributed in a different channel. By default, packages are downloaded from the defaults channel; however, you'll almost always want to use the conda-forge channel.

The conda-forge channel

Conda-forge is an awesome community-driven collection of (~20K) packages, which are found in the conda-forge channel at anaconda.org. (The name "conda-forge" can also refer to the organization of open source contributors that maintains the channel.)

As mentioned above, to install a package from a specific channel, use the -c flag when installing:

conda install -c conda-forge PACKAGE_NAME

If you regularly install from a specific channel, like conda-forge, you can modify your channels list. For example, to add the conda-forge channel:

conda config --add channels conda-forge

(now you no longer need to use -c conda-forge every time you use conda)

Miniforge & Mambaforge

Adding conda-forge to your channels is so common, and so useful, that Miniforge – the installer we used above to install conda – was created. It is a minimal conda installer (just like Miniconda), with the added feature that conda-forge is set as the default channel. Hopefully you now understand why we used it!

Where do packages go when you install them?

:shrug: This may be one of the biggest mysteries for Python newcomers!

In most cases (though there are many exceptions), when you run pip install or conda install:

Packages will be added to your site-packages folder

Platform	Standard installation location
Mac/Linux	`prefix/lib/pythonX.Y/site-packages`
Windows	`prefix\Lib\site-packages`

(… where prefix will depend on the active virtual environment.)

In a "global" python installation without a virtual environment (:scream:), the site-packages folder will be something like /usr/local/lib/pythonX.Y/site-packages on Unix systems and C:\PythonXY\Lib\site-packages on Windows (i.e. prefix is /usr/local or C:\PythonXY).

If you have a conda virtual environment active, prefix will refer to your environment folder (e.g. ~/<conda_folder>/envs/ENV_NAME/)

To print your current prefix using Python:

python -c "import sys; print(sys.prefix)"

To print your site-packages folder location:

python -c "import site; print(site.getsitepackages())"

Some exceptions …

pip and conda don't always install to site-packages…

User installs

The --user flag makes pip install packages in your home directory instead, which doesn't require any special privileges.
```
pip install --user PACKAGE_NAME
```
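To see where --user installs would land for the current interpreter (and whether Python will actually look there), you can ask the standard library's site module. This is just a quick inspection sketch; the variable names are my own.

```python
import site
import sys

# Where `pip install --user` would put packages for this interpreter
user_site = site.getusersitepackages()

# Whether that folder is currently on the import search path
user_site_on_path = user_site in sys.path

print("environment prefix:   ", sys.prefix)
print("user site-packages:   ", user_site)
print("user site enabled:    ", site.ENABLE_USER_SITE)
print("user site on sys.path:", user_site_on_path)
```

Note that the user folder is outside of your environment's prefix: it is shared by every environment that uses the same Python version.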
:warning: User installs can be a cause of confusing environment problems, as they can result in the import of packages (or versions) that you didn't think you had installed in your environment. I try to avoid using --user installs, and delete them if I discover them on my system. See also the discussion of sys.path below for tips on finding where a package is being imported from.

Editable installs

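A common way to install packages you are actively developing is to pip install in "editable" mode, with -e/--editable. In this mode, the source files stay where they are and pip only adds a link to them in site-packages, so your edits take effect without reinstalling: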
```
# install the current working directory in "editable mode"
pip install -e .
```

Listing all package locations

A good way to show where all the packages in your environment are installed is pip list with the "verbose" flag -v added:

$ pip list -v

Package      Version    Editable location  Location                                           Installer
------------ ---------- ------------------ -------------------------------------------------- ---------
certifi      2022.9.24                     ~/mambaforge/envs/ttt/lib/python3.11/site-packages pip
nictool      0.1.0      ~/dev/self/nic     ~/mambaforge/envs/ttt/lib/python3.11/site-packages pip
numpy        1.23.4                        ~/mambaforge/envs/ttt/lib/python3.11/site-packages conda
pip          22.3.1                        ~/mambaforge/envs/ttt/lib/python3.11/site-packages
requests     2.28.1                        ~/.local/lib/python3.11/site-packages              pip
setuptools   65.5.1                        ~/mambaforge/envs/ttt/lib/python3.11/site-packages
wheel        0.38.3                        ~/mambaforge/envs/ttt/lib/python3.11/site-packages

In the output above:

nictool was installed locally in "editable" mode (see its Editable location)
numpy was installed using conda (everything else with pip)
requests was installed using --user (see its Location in ~/.local)
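If you'd rather get at the same information from Python, here is a rough sketch using the standard library's importlib.metadata. It is not how pip list is implemented; list_packages is a hypothetical helper, and it assumes installers record themselves in an INSTALLER file (which may be absent).

```python
from importlib.metadata import distributions


def list_packages():
    """Return (name, version, location, installer) for each installed distribution."""
    rows = []
    for dist in distributions():
        name = dist.metadata.get("Name", "")
        version = dist.metadata.get("Version", "")
        # the folder that holds this distribution's metadata (usually site-packages)
        location = str(dist.locate_file(""))
        # read_text returns None when the INSTALLER file is missing
        installer = (dist.read_text("INSTALLER") or "").strip()
        rows.append((name, version, location, installer))
    return sorted(rows, key=lambda row: row[0].lower())


for name, version, location, installer in list_packages():
    print(f"{name:<20} {version:<12} {installer:<6} {location}")
```

A package whose location is outside your environment's prefix (or whose installer differs from its neighbours) is a good first suspect when imports behave strangely.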

`pip` and `conda`

The difference between (and when to use) pip and conda is one of the most common questions/confusions I see.

These are both extensive, complicated tools, with a broad range of functionality, so it's hard to summarize quickly without glossing over important details… but here goes :smile:

pip installs Python packages (mostly from PyPI), and does not manage virtual environments. conda installs any package (including Python itself, not just Python packages) – mostly from anaconda.org – and can also manage virtual environments.

Summarized in a table:

	`conda`	`pip`
Manages	binaries	wheel or source
Can require local compiler	No	Yes
Package types	Any	Python only
Creates environments	Yes, built-in	No (use `venv`/`virtualenv`)
Strict dependency checks	Yes	No^*
Default package source	anaconda.org	PyPI

binaries? wheels?

To really understand the motivation for conda – and what "can require local compiler" means in the table above – one must understand a little about "compiled" binaries. This is a bit beyond the scope here, but here's a very brief intro for those interested:

Binary files and C Extensions

Python is an "interpreted language". Among other things, this generally means that the job of converting the human-readable source code into executable machine code is done on the machine executing the code. The developer ships a .py file.

By contrast, compiled languages like C/C++ are generally converted into machine code elsewhere (by the developer), for each platform being supported, and then shipped to the end user as (e.g.) an .exe file.

C extensions

Lower-level compiled languages like C often perform better than "pure" Python code. For this reason, it's very common for Python developers to write or generate small parts of their code (e.g. just the very frequently used functions) in C. These "C extensions" must then be compiled for each platform.

Many packages in the scientific python ecosystem have at least some compiled code.

If you've ever run pip install ... and seen a ton of text fly by with some big red "failed to compile" errors at the end, then you've seen what can happen when you try to install a package that includes C extensions that are not pre-compiled for each platform. Not all computers have the programs necessary to compile these extensions, and so when pip tries to install and compile these packages, they may fail.

"Wheels" are a binary distribution format that you'll see on PyPI that allow a developer to pre-compile their extensions for every platform they'd like, so that the end user doesn't need to compile it. A wheel can be simply unzipped and dropped into site-packages.

Conda doesn't use wheels, but a conda package achieves the same goal of distributing pre-compiled files so that the end-user needn't compile them.
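The platform a wheel was built for is encoded right in its filename, which is how pip decides which file to pick from the list on PyPI. Here is a small sketch that pulls a wheel filename apart; parse_wheel_filename is a hypothetical helper (not part of pip), and the two filenames are illustrative examples following the standard name-version-python-abi-platform pattern.

```python
def parse_wheel_filename(filename):
    """Split a wheel filename into its name, version, and compatibility tags."""
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel: {filename}")
    parts = filename[: -len(".whl")].split("-")
    # the last three parts are always the python, abi, and platform tags
    python_tag, abi_tag, platform_tag = parts[-3:]
    return {
        "name": parts[0],
        "version": parts[1],
        "python": python_tag,
        "abi": abi_tag,
        "platform": platform_tag,
    }


# A compiled wheel: built for one Python version and one platform
compiled = parse_wheel_filename("numpy-1.23.4-cp310-cp310-win_amd64.whl")
# A pure-Python wheel: works with any Python 3 on any platform
pure = parse_wheel_filename("requests-2.28.1-py3-none-any.whl")

print(compiled)
print(pure)
```

When no wheel matches your Python version and platform, pip falls back to the source distribution, and that is when a local compiler may be required.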

Exercise 4: A caveat when using both `pip` and `conda`

Most of the time, it is fine to use both pip install and conda install in the same environment. Sometimes, you don't have a choice: it is up to the package developer to make their packages available via conda and/or pip and you will find packages that are only available on pip, or only on conda.

However, you should be aware that there are cases where installing the same package from both package managers can cause problems (regardless of whether you install the package directly, or it gets installed as an indirect dependency).

Here's an example of something you might very reasonably do that would result in a broken environment.

# create a new environment with python
conda create -n doomed_env python=3.10

# activate it
conda activate doomed_env

# install spyder (which depends on pyqt) using pip
pip install spyder

# go ahead and launch spyder ... so far so good!
spyder

# now install pyqt from conda (or... one of many conda packages that depend on it!)
conda install pyqt

# try to launch spyder again...
spyder

Here's the error I see:

WARNING: You might be loading two sets of Qt binaries into the
same process. Check that all plugins are compiled against the
right Qt binaries. Export DYLD_PRINT_LIBRARIES=1 and check that
only one set of binaries are being loaded.

What happened here?

Without going into too much detail: both package managers (pip and conda) tried to install some stuff into the same folder (site-packages/PyQt5). However, they installed slightly different "parts" (different compiled binaries) resulting in a package that just can't run.

note: This won't always happen: this particular case was caused by the fact that the package is unfortunately called pyqt in conda, but pyqt5 in pip… making it even harder for the two programs to "work together". Moreover, the order in which you install things (i.e. pip-then-conda, vs conda-then-pip) can also affect whether you run into this.

The main lessons here:

In general, it's mostly ok to use pip and conda together.
However, be aware that you may occasionally find conflicts.
If you don't know why it broke, just create a new environment and try to install everything from either pip or conda.
If that's not possible (due to certain package distribution limitations), you might need to spend some time experimenting, then create a recipe for your specific package needs.

Exercise 5: `sys.path`

The goal of this section is to understand where Python looks for modules when you type import PACKAGE

We've learned that packages generally (but not always) end up in the site-packages folder in your environment. Let's now discuss where the Python interpreter looks for them when you import them. It's pretty simple:

Python searches for modules on sys.path

sys is an important module in the Python standard library (it will always be available to you).

Viewing `sys.path`

Let's look at sys.path:

Start a python interpreter:

python

Now, import sys and print out sys.path:

import sys
print(sys.path)

You'll see something like this:

[
    '/Users/talley/mambaforge/envs/ENV_NAME/bin',
    '/Users/talley/mambaforge/envs/ENV_NAME/lib/python310.zip',
    '/Users/talley/mambaforge/envs/ENV_NAME/lib/python3.10',
    '/Users/talley/mambaforge/envs/ENV_NAME/lib/python3.10/lib-dynload',
    '',
    '/Users/talley/mambaforge/envs/ENV_NAME/lib/python3.10/site-packages'
]

There are three particularly important entries in there.

.../ENV_NAME/lib/python3.10: This is where all of the standard library modules will be found.
'': This empty string refers to "the current working directory": which starts as the directory you were in when you launched python. If you want to see the current directory in python:
```
import os
print(os.getcwd())
```
.../ENV_NAME/lib/python3.10/site-packages: This is the site-packages folder we discussed above. Most of your installed packages should be there.
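If you ever want to know which of these sys.path entries a particular import would come from, you can ask the import system without actually importing anything. A minimal sketch using the standard library's importlib.util.find_spec; where_is is a hypothetical helper name, and the module names are just examples.

```python
import importlib.util


def where_is(module_name):
    """Return the file a top-level module would be imported from, or None if not found."""
    spec = importlib.util.find_spec(module_name)
    return None if spec is None else spec.origin


print("json:    ", where_is("json"))      # a standard library package
print("requests:", where_is("requests"))  # in site-packages, if installed
print("missing: ", where_is("surely_not_installed_xyz"))
```

This is also a quick way to spot a package that is being picked up from somewhere unexpected, like a --user install or the current directory.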

Importing modules in the current directory

Let's take advantage of the empty string entry '' in sys.path. Exit out of python (type exit()) and create a new file named mymodule.py with the following function:

# mymodule.py

def hello():
    print("hi!")

Now, start python again

python

Then import your new module and use the function

import mymodule

mymodule.hello()

Take home message: Any custom modules you've created in the current working directory may be imported directly as long as '' exists in sys.path.

Modifying `sys.path`

sys.path is not static: you can modify it like any Python list.

One reason you might want to do this is to add a folder of modules with some useful code that you've stored somewhere on your computer:

import sys

sys.path.append('/Users/talley/my_handy_python_stuff')

… Now I can import any python modules in /Users/talley/my_handy_python_stuff!

Don't get too carried away relying on modifying sys.path. If you have a set of custom code that you routinely use across many different environments, consider creating a proper python package that you can pip install as usual (remember, packages don't necessarily need to be public on PyPI: you can pip install from github, or locally as well)
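If you want to try modifying sys.path without touching any real folders, here is a self-contained sketch. It writes a throwaway module into a temporary folder, adds that folder to sys.path, imports the module, and then cleans up. The module name handy_demo_module is made up for this example.

```python
import importlib
import sys
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as folder:
    # create a tiny module in a folder that is NOT yet on sys.path
    module_file = Path(folder, "handy_demo_module.py")
    module_file.write_text("def hello():\n    return 'hi!'\n")

    # add the folder to the search path, then import as usual
    sys.path.append(folder)
    importlib.invalidate_caches()  # make sure the new file is noticed
    import handy_demo_module

    greeting = handy_demo_module.hello()
    print(greeting)
    print("imported from:", handy_demo_module.__file__)

    # tidy up so later imports are not affected
    sys.path.remove(folder)
```

The change to sys.path only lasts for the current Python session, which is one more reason a proper installable package is the better long-term solution.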

If you'd like to know all the nitty-gritty details of how sys.path gets initialized at runtime, see the Python documentation.

Exercise 6: Environment recipes

:trophy: The goal of this section is to learn strategies for recreating environments with a specific set of packages

When completed, you should:

✅ Know how to create and use requirement files in pip
✅ Know how to create and use environment.yml files in conda
✅ Understand the limitations of environment recipes
✅ Be aware of lock files (that comprehensively list the exact version of every package in an environment.)

Requirements.txt and environment.yml

Don't get attached to environments; create requirements files!

Environments are made to be broken and (re)created. Don't view an environment as something you worked hard to create "just right", and dread having to recreate. View them as a little sandbox that is isolated exactly so that it can be broken without messing up the rest of your system.

You may even occasionally need or want to uninstall the entire anaconda/miniconda/mambaforge folder, along with all of the environments you've made.

If you have an environment that you would be "sad" to lose, you should instead work to create a recipe for that environment that you can use to recreate the environment whenever necessary. Both pip and conda support this.

`pip` requirements files

Requirements files serve as a list of items to be installed by pip, when using pip install. Files that use this format are often called “pip requirements.txt files”, since requirements.txt is usually what these files are named (although, that is just a convention, not a requirement).

Each line of a requirements file supports the same requirements specifier syntax that you would use for pip install ...

# requirements.txt
numpy 
nd2[legacy]
urllib3 @ https://github.com/urllib3/urllib3/archive/refs/tags/1.26.8.zip

To install everything listed in a requirements file, use the -r flag with pip install

pip install -r requirements.txt

:eyes: this will not create a new environment, it will install everything in requirements.txt into the currently active environment.
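
Before installing, you may want to see which of the listed projects are already present in the active environment. The sketch below is a simplified illustration: it pulls the project name from the start of each line with a regular expression (it does not implement the full requirement specifier syntax) and looks up the installed version with importlib.metadata. The helper names are made up for this example.

```python
import re
from importlib import metadata


def requirement_name(line):
    """Return the project name at the start of a requirement line, or None."""
    line = line.split("#", 1)[0].strip()  # drop comments and whitespace
    match = re.match(r"[A-Za-z0-9][A-Za-z0-9._-]*", line)
    return match.group(0) if match else None


def installed_version(name):
    """Return the installed version of a project, or None if it is missing."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return None


# The same lines as the requirements.txt example above.
example_lines = [
    "# requirements.txt",
    "numpy",
    "nd2[legacy]",
    "urllib3 @ https://github.com/urllib3/urllib3/archive/refs/tags/1.26.8.zip",
]

for line in example_lines:
    name = requirement_name(line)
    if name is not None:
        print(f"{name}: {installed_version(name) or 'not installed'}")
```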

conda environment files

conda allows you to create a new environment from an environment file (conventionally, these are called environment.yml, but that is not a requirement).

Environment files have a specific structure (see full documentation).

#environment.yml
name: stats2
channels:
  - conda-forge
dependencies:
  - python=3.9
  - bokeh=2.4.2
  - pandas=1.4.4
  - flask=2.2.2

To create an environment from an environment file:

conda env create -f environment.yml

:eyes: the name of the new environment is determined by the name field in the file itself, but you can override it by appending -n MY_NAME

"Locked" dependencies (for really reproducible environments)

Sometimes you'd like to have some assurance that you will be able to recreate an environment "indefinitely" in the future (for example: to reproduce an analysis for a paper you wrote 10 years ago).

Now, it may appear that the environment.yml listed above would provide a fully "reproducible" setup that you could run years later to achieve the same thing.

But there's a problem:

Each of the dependencies we declared has its own dependencies.

(Did you notice how many other things got installed in the stats2 environment above?) … and it's completely possible that one of those subdependencies might release a version in the future that changes or breaks one of our direct dependencies, and the result of our code.

To create a comprehensive list of the pinned versions of every package in our environment, we can use conda env export:

conda env export > environment.lock.yml

… and if we later run conda env create -f environment.lock.yml, we should get an exact duplicate of our current environment (at least, as long as we're on the same operating system).
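
To get a feel for how many packages a "small" environment really contains, you can list every installed Python distribution with its exact version from inside Python. This is only a sketch (similar in spirit to pip freeze, and not a replacement for conda env export, since it cannot see non-Python conda packages); the function name frozen_requirements is made up here.

```python
from importlib import metadata


def frozen_requirements():
    """Return sorted 'name==version' strings for every installed distribution."""
    pins = set()
    for dist in metadata.distributions():
        name = dist.metadata.get("Name")
        if name:  # skip distributions with missing metadata
            pins.add(f"{name}=={dist.version}")
    return sorted(pins, key=str.lower)


pins = frozen_requirements()
print(f"{len(pins)} distributions installed, for example:")
print("\n".join(pins[:10]))
```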

`conda-lock`

Exported/locked environment files are great, but still have some practical difficulties & annoyances. They don't work "effortlessly" cross-platform, it can be hard to update one of the dependencies, and they can still be somewhat slow to solve and install (even though in theory you know exactly what packages are needed).

If you do this a lot, consider looking into conda-lock, which solves all of the above with a unified lockfile format.

# install conda-lock
conda install -c conda-forge conda-lock

# generate a multi-platform lockfile
conda-lock -f environment.yml -p osx-64 -p linux-64

Locking a pip-based environment

We won't go into them here, but there are also lockfile solutions for the pip ecosystem (that do not require using conda):

Tips & best practices

✅ always work in a virtual environment
✅ don't be afraid to wipe it and start over
✅ avoid installing into the conda base environment (and never with pip)
✅ create a "kitchen sink" environment rather than using base

When should you create a virtual environment?

Basically, all the time! :joy:

Whenever you'd like to experiment with new packages, or different versions of packages.
If you routinely work on various projects that aren't directly related, creating a small separate environment dedicated to each "task" or project can help avoid surprising behavior.
If you are ever experiencing "strange" behavior, or a package doesn't seem to be working as advertised, the very first thing you should do is create a fresh virtual environment, install only what you need, and see if the problem persists.
Before opening an issue with a software developer (e.g. on github.com or on image.sc) to say "this doesn't work", first make sure you can reproduce it in a clean environment.

Avoid installing stuff into `base` environment

You should (almost) never install things into your base conda environment. Do your work in another environment and leave the base environment only for dependencies that actually manage environments (like conda itself, or mamba, or other dev-related dependencies like conda-build, etc…)

In particular, try to never pip install anything into your base environment.
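
If you have scripts that install things, one possible safeguard is to check which conda environment is active before doing anything. The sketch below assumes that conda activate sets the CONDA_DEFAULT_ENV environment variable to the name of the active environment; both helper names are hypothetical.

```python
import os


def active_conda_env(environ=os.environ):
    """Return the name of the active conda environment, or None if unset."""
    # Assumption: `conda activate` sets CONDA_DEFAULT_ENV.
    return environ.get("CONDA_DEFAULT_ENV")


def is_base_env(environ=os.environ):
    """Return True if the active conda environment is named 'base'."""
    return active_conda_env(environ) == "base"


if is_base_env():
    print("You are in the base environment: consider activating another one first.")
else:
    print(f"Active conda environment: {active_conda_env()}")
```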

how to "unpip" your base environment

If you ever make a mess of your base environment with pip, and would like to restore
base to something like its original state, you can run this command (on unix systems)

conda activate base
conda list | grep pypi_0 | awk '{print $1;}' | xargs -I {} sh -c "pip uninstall -y {}";
conda install -y --revision 0
conda install mamba
mamba update -n base conda -y

A "kitchen sink" environment

Because frequently switching environments can be annoying, I like to create a "kitchen sink" environment (I call it "all") that I use for all of my generic tasks, and I install things into it with reckless abandon.

conda create -y -n all python
conda activate all

If you keep installing things into one environment, it will eventually break. At that point, just delete it and recreate it. To make it easier to re-create a complicated environment, use environment files (discussed above).

To help avoid installing into base, you might consider "auto-activating" this environment:

# in your ~/.zshrc or ~/.bash_profile
conda activate all

Integrated Development Environments (IDEs)

An "Integrated Development Environment" (IDE) is essentially a text editor designed for code. IDEs come with a ton of conveniences like autocompletion, syntax highlighting, debuggers, and lots more. They can be a little intimidating at first, but if you plan to do a lot of programming, the investment is well worth it: they will become an indispensable tool.

People get a little religious debating the merits of their favorite code editing programs :laughing: … so I'll refrain from attempting to list pros & cons of each of these. They all have a lot to offer, and you should try to get comfortable with one of these.

For whatever IDE you choose, you should definitely learn how to activate a specific python environment (see links for each IDE below)

VS Code

https://code.visualstudio.com/docs/python/environments
PyCharm

https://www.jetbrains.com/help/pycharm/conda-support-creating-conda-virtual-environment.html
spyder

https://docs.spyder-ide.org/current/faq.html#using-existing-environment
JupyterLab (not really an IDE, but much closer than Jupyter Notebook)

Generally, JupyterLab will be installed in the environment you want to use… but you can also use nb_conda_kernels to access other environments.

Jupyter Notebook?

While Jupyter Notebook is certainly very useful for sharing code with others, and for exploratory analysis, I would discourage you from thinking about Jupyter notebooks as "the place" where one goes to run some Python code. Notebooks are much harder to version control (i.e. in a git repository – they are complicated JSON files, not simple python files), and they discourage code reuse and organization (Notebooks are something of a dead end: people very rarely import from a notebook)

Definitely get comfortable writing python scripts, and using an interactive read-eval-print-loop (REPL) like IPython (or even the plain python prompt). It will pay off to be comfortable using python without needing to start up Jupyter Notebook.

Cheat sheet

The following is a summary of some of the commands we've discussed here, and what they do:

command	description
`conda` (or `mamba`)
`conda create -n ENV_NAME python=3.10`	create an environment named `ENV_NAME` (with Python 3.10 installed)
`conda env remove -n ENV_NAME`	remove environment named `ENV_NAME`
`conda info -e`	list all conda environments
`conda activate ENV_NAME`	activate env named `ENV_NAME`
`conda deactivate`	deactivate current environment
`conda install -c conda-forge numpy`	install `numpy` using conda (from the `conda-forge` channel) into the current environment
`conda install numpy==1.23.4`	install specific version of `numpy` using conda (using whatever channels are in your configuration)
`conda remove numpy`	uninstall `numpy` from the current environment
`conda list`	list all packages installed in the current environment
`conda config --add channels conda-forge`	add the conda forge channel to your config
`pip`
`pip install numpy`	install `numpy` (from PyPI) into the current environment
`pip install numpy -U`	install/update `numpy` to latest version (from PyPI)
`pip install numpy==1.23.4`	install specific version of `numpy` (from PyPI)
`pip uninstall numpy`	uninstall `numpy` from the current environment
`pip list`	List installed packages (add `-v` to show package locations and installer)

Good attributes of the sys module to be aware of:

attribute	description
`sys.path`	A list of strings that specifies the search path for modules.
`sys.executable`	A string giving the absolute path of the executable binary for the Python interpreter.
`sys.prefix`	A string giving the site-specific directory prefix where the platform independent Python files are installed
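
A quick way to answer "which Python am I actually running?" is to print these attributes together. The following is a small sketch; the function name interpreter_summary is only illustrative.

```python
import sys


def interpreter_summary():
    """Collect the sys attributes listed above into one dictionary."""
    return {
        "executable": sys.executable,  # path to the running interpreter
        "prefix": sys.prefix,          # root folder of the active environment
        "path": list(sys.path),        # copy of the module search path
    }


summary = interpreter_summary()
print("interpreter:", summary["executable"])
print("prefix:     ", summary["prefix"])
print("search path:")
for entry in summary["path"]:
    print("   ", repr(entry))
```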
