# Python environments workshop

:::warning
This document was created for a workshop at Harvard Medical School on **Nov 14, 2022**.
*(It may have been edited since then, see top of the page for last change)*
:::

**Table of contents**

[TOC]

## Overview

Python is an extremely popular, powerful, and flexible programming language and ecosystem. But it can be confusing for newcomers (and even those who have used Python for years) to understand exactly *what* it means to have Python and Python packages "installed" on a system. The goal of this workshop & document is to demystify and answer the following questions:

- What is Python? What does it mean to "install Python"?
- Where does Python *live* on your system?
- What are Python modules and packages?
- When I `pip install` or `conda install` a package, where does it download from? Where does it install to?
- When I `import` a package, where does Python look for it?
- How can I reliably (re)create a Python environment so that what I did on one system can be reproduced on another?

:::info
:eyes: **Note**: There are *many* ways to set up Python and install packages (indeed, this is one of the reasons it can be so confusing!). The conda-based approach presented here is by no means the only way, so this guide represents an "opinionated" way to approach setting up Python environments that is commonly used in data science. But most of the concepts are the same in other environment management systems.
:::

### Terms

::::warning
Python interpreter
~ The actual python executable that parses and runs human readable source code.
  > Type `which python` (mac/linux) or `where python` (windows) to show the path to the active python interpreter.

Module
~ An organizational unit of python code. Usually, a single file ending in `.py` that contains Python definitions and statements.

Package
~ A collection of modules. Usually, this is a folder of python modules that also contains an `__init__.py` file. "Package" also frequently refers to an installable python library/application (e.g. `numpy`, `matplotlib`, `pandas`...)

Package Manager
~ A program that automates the installation, updating and removal of packages (e.g. `pip`, `conda`)

Virtual Environment
~ An isolated collection of packages, settings, and an associated python interpreter. Virtual environments allow many different collections of Python and packages to exist on the same system.

Environment Manager
~ A program that automates the creation and deletion of virtual environments (e.g. `conda`, `virtualenv`, `venv`)

:::spoiler More common terms and programs in the python ecosystem...
- **pip**: A python package manager. By default, `pip` installs packages from pypi.org.
- **PyPI.org** ("The python package index"): A repository of python packages (where `pip` looks for packages).
- **conda**: An environment *and* package manager. By default, `conda` installs packages from anaconda.org. Note that conda can install both python and non-python packages. (see also [**mamba**](https://mamba.readthedocs.io/en/latest/), a fast implementation of conda written in C++)
- **anaconda (the organization)**: The company behind the anaconda distribution, package index, and a number of other python ecosystem initiatives.
- **anaconda (the distribution)**: A distribution of software including a `python` executable, the `conda` program, and a few hundred pre-installed python *packages* in the base environment.
- **anaconda.org**: a package index from Anaconda. When you `conda install ...` something, it searches & installs packages from anaconda.org
- **conda-forge**: a channel on anaconda.org with a huge number of packages relevant to the scientific python ecosystem, and an organization that facilitates the building of conda packages.
- **miniconda**: (alternative to the anaconda distribution). A minimal installer for `conda` that does not contain all of the additional packages in the anaconda distribution (see also **miniforge** and **mambaforge**, which install `conda` and `mamba` respectively, and set conda-forge as the default channel.)
:::
::::

### Core tools for setting up Python

There are three classes of tools that you'll want to be familiar with when using Python:

1. Something to install **Python itself**, preferably with multiple versions installed at the same time
2. Something to download and install **Python packages**.
3. Something to create and destroy **virtual environments**. (This is technically optional, but essentially mandatory in practice)

There are many tools that perform each of these tasks, and some tools perform multiple tasks. The following Venn diagram shows where a few commonly used programs fit into these classes:

![](https://i.imgur.com/FTmTtSK.png)

For our purposes, we will be using [**conda**](https://docs.conda.io/en/latest/) as an **environment manager** and a tool to **install Python** itself; and we will use both `conda` and `pip` to **install Python packages**.

:::warning
**❓ `conda` vs. `mamba`**

Throughout this page, whenever I refer to `conda` as a command you can run on the command line, you can substitute the command/program `mamba`. `mamba` is a reimplementation of the conda package manager in C++. It is *much* faster than `conda` in many cases, and – unlike conda – doesn't require Python itself, which removes a "bootstrapping" problem in some cases.

You can install mamba (using `conda`!) into your base environment:

```
conda install mamba -n base -c conda-forge
```
:::

### What is a virtual environment

A virtual environment is an isolated collection of **packages**, **settings**, and an associated **Python interpreter**, that allows multiple different collections to exist on the same system.

![](https://i.imgur.com/3xO9p6f.png)

::::info
:::spoiler **Why would I need more than 1 environment?**
Conflicting package dependencies:

1. You install **packageA**
    - (**packageA** depends on other packages including **numpy<1.0**)
1. Everything is going fine, but then...
1. You install **packageB**
    - (**packageB** also depends on numpy, but requires **numpy>1.1**)
1. Now you have a *broken environment*, and **packageA** may no longer work

Environments allow you to use a different set of packages for different projects and applications.
:::
::::

## Exercise 1: Installing Python via conda

:::success
:trophy: The goal of this section is to get the python interpreter installed. When completed, you should:

✅ have a new `mambaforge` folder in your home directory
✅ be able to run `conda` (and/or `mamba`) from the terminal
✅ have a base environment with python installed.
:::

There are many ways to get python installed. Here, we will jump to my recommended approach of installing python via conda.

[**Conda**](https://docs.conda.io/en/latest/) is an open-source package management system and environment management system that runs on Windows, macOS, and Linux. With it, you can create virtual environments and install python itself (along with many other python and even non-python programs!)

The most well-known way to install conda is via anaconda.com. However, we will install conda using "miniforge". [Miniconda](https://docs.conda.io/en/latest/miniconda.html) & [Miniforge](https://github.com/conda-forge/miniforge) are *much* smaller distributions than the anaconda distribution.
They provide the bare minimum required to get started with python conda-based virtual environments. Specifically, the installer will:

- Install `conda`, `python`, and a couple other packages useful for bootstrapping environments (like `pip`) into a new folder in your home folder.
- Configure `conda-forge` as the default (and only) channel.
- Optionally, install `mamba` (if you used `mambaforge`)

> Note: *miniforge* is very similar to *miniconda*, except that it also sets up conda-forge as your default conda channel. We'll learn about conda channels below.

### Install

#### Mac or Linux

In a terminal, run:

```shell
curl -L -O "https://github.com/conda-forge/miniforge/releases/latest/download/Mambaforge-$(uname)-$(uname -m).sh"
bash Mambaforge-$(uname)-$(uname -m).sh
```

#### Windows

1. Download the latest installer: https://github.com/conda-forge/miniforge/releases/latest/download/Mambaforge-Windows-x86_64.exe
1. Double click it in the file browser.

#### Initialize?

During install, you will see a question like "Do you wish the installer to initialize conda/Miniforge/Mambaforge?". It's best to enter "yes" to this.

::::info
:::spoiler Why?
Key parts of `conda`'s functionality require that it interact directly with the shell within which `conda` is being invoked. The `conda activate` and `conda deactivate` commands specifically are shell-level commands. That is, they affect the state (e.g. environment variables) of the shell context being interacted with. Other core commands, like `conda create` and `conda install`, also necessarily interact with the shell environment. They're therefore implemented in ways specific to each shell. Each shell must be configured to make use of them.

Initializing (which runs `conda init`) makes changes to your system that are specific and customized for each shell. To see the specific files and locations on your system that will be affected before any changes are made, run `conda init` with the `--dry-run` flag. To see the exact changes that are being or will be made to each location, use the `--verbose` flag.
:::
::::

:::warning
**Where did it go?**
Take a moment to make sure you know what just happened to your system above 👆. By default anaconda/miniconda/miniforge/mambaforge will create a new folder in your home directory (e.g. `~/mambaforge`, or `~/miniconda3`)
:::

### Alternative approaches:

Just for the sake of completeness, here are some alternative methods that you will see recommended in various places, along with why I didn't use them here.

:::spoiler Installing from miniconda
#### Instructions

Download the version you want from https://docs.conda.io/en/latest/miniconda.html

#### Why I prefer miniforge

Miniforge and miniconda are *very* similar in that they both provide the bare minimum to get the `conda` environment and package manager installed. Miniforge also configures `conda-forge` as the default (and only) channel. Not having conda-forge in your configuration is a common source of problems for many newcomers.
:::

:::spoiler Installing from python.org
#### Instructions

Download the version you want from https://www.python.org/downloads/

#### Why I prefer miniforge

While downloading from python.org is of course the "canonical" way to install Python, it will install it at the system level; and, by default, all installed packages will go into your "global" collection of packages. Installing `conda` gets us `python`, *and* the machinery to create virtual environments all in one, and can install it into your home directory without any special permissions. You could accomplish a similar thing with python.org, `pyenv`, and/or `venv/virtualenv`... but `conda` very quickly gets us everything we need.
:::

:::spoiler Installing from anaconda.com
#### Instructions

Click the download button on https://www.anaconda.com/, double click the installer and follow the prompts.

#### Why I prefer miniforge

While installing from `anaconda.com` does get us everything and more from `miniconda`, it is "bloated" in that it additionally comes with many [hundreds of packages](https://docs.anaconda.com/anaconda/packages/pkg-docs/) pre-installed in the base environment. In most cases, you will want to create multiple environments with a collection of packages for your specific tasks. Anaconda provides a very quick way to get up and running with scientific python, but also comes at a very large package size, and obscures a few very basic details and best practices about (re)creating environments.
:::

:::spoiler Installing on mac from Homebrew
#### Instructions

After [installing Homebrew](https://brew.sh/#install), run:

```sh
brew install python
```

#### Why I prefer miniforge

While homebrew is fantastic for programs that you only want 1 version of, it can be challenging for something like Python (where you often want python 3.7, 3.8, and 3.9 installed all at the same time). Also, similar to installing from python.org, a homebrew install can get you into trouble with global package installs if you're not careful.
:::

:::spoiler Installing python with pyenv
#### Instructions

[pyenv](https://github.com/pyenv/pyenv) is a tool that lets you easily install and switch between multiple Python installations. See installation guide [here](https://github.com/pyenv/pyenv#installation), and for an introduction to pyenv, see [this blog post](https://realpython.com/intro-to-pyenv/)

#### Why I prefer miniforge

There's nothing wrong with using `pyenv`, it's very convenient and lighter weight than conda. Since I generally know that I will *also* want to be installing packages with `conda`, I tend to use conda for python as well. But if you know you only want to install with `pip`, then `pyenv` can get you set up quickly, and also create virtual environments with `pyenv virtualenv`.
:::

## Exercise 2: Creating a virtual environment

:::success
:trophy: The goal of this section is to create (and delete) some virtual environments. When completed, you should know how to

✅ create a new virtual environment with `conda` with a specific version of Python
✅ activate and deactivate environments
✅ know which environment (and python interpreter) you're currently using
✅ delete an environment
:::

If you retain one bit of advice today, let it be this:

:::info
:heart_eyes: **Virtual environments are your best friend! Create and recreate them liberally** :heart_eyes:
:::

Environments allow you to experiment with various packages and versions without fear of breaking your entire system (and needing to reinstall everything). As you install packages over time, you will *inevitably* install something that doesn't "play well" with something else you've already installed. In some cases this can be hard to recover from. With virtual environments, you can just create a fresh environment and start again – without needing to do major surgery on your system.

### Creating virtual environments

:::warning
*Reminder*: `mamba` is a fast version of `conda`. I use it here in these examples, but if you don't have it installed, you can replace the "`mamba`" command with "`conda`".
:::

```bash
# create a new empty environment named 'ENV_NAME'
mamba create --name ENV_NAME

# create a new empty environment named 'ENV_NAME' (`-n` is short for `--name`)
mamba create -n ENV_NAME
```

You can also *install* things (using `conda`/`mamba`) in the same command that creates the new environment by adding a list of packages to install to the end of the command.
For example, you'll usually want to create an environment with a specific version of python installed:

```bash
# create a new environment with the latest version of python
mamba create -n ENV_NAME python

# create a new environment with python 3.10
mamba create -n ENV_NAME python=3.10
```

:::warning
**Where did it go?**
Take a moment to make sure you know what just happened 👆. Calling `conda create` will result in a new folder in the `envs` folder in your conda installation (e.g. `~/<conda_folder>/envs/ENV_NAME`)
:::

### Activating environments

We've now created an environment named `ENV_NAME`, but we aren't currently "using" it. To *activate* a virtual environment, use `conda activate`

```bash
# activate environment named 'ENV_NAME'
conda activate ENV_NAME
```

> :eyes: you should see your prompt change to include `(ENV_NAME)` somewhere, indicating the active environment.

Now, when we run the command `python` (or any other command that in turn calls `python`), the specific interpreter that we installed into our environment will be used. To prove this to yourself – or if you ever want to double check which `python` is being used – type:

```bash
# on mac/linux
which python

# on windows cmd
where python
```

*You must activate environments each time you open a new terminal.*

::::info
:::spoiler :question: **What does activating an environment actually *do*?**
The main effect of calling `conda activate ENV_NAME` is to add the environment folder (a.k.a. the "prefix", which usually lives in `~/<conda_folder>/envs/ENV_NAME`) to the front of your [shell's `PATH`](https://medium.com/@jalendport/what-exactly-is-your-shell-path-2f076f02deb4). It will also update the `CONDA_PREFIX` and `CONDA_DEFAULT_ENV` environment variables to reflect the active environment's prefix and name.

```bash
# activate an environment
conda activate ENV_NAME
echo $PATH  # windows: echo %PATH%
env | grep CONDA  # windows: set | findstr "CONDA"

# deactivate and look again
conda deactivate
echo $PATH  # windows: echo %PATH%
env | grep CONDA  # windows: set | findstr "CONDA"
```
:::
::::

### Deactivating environments

To *deactivate* the environment, use `conda deactivate`:

```bash
# deactivate the currently active environment
conda deactivate

# or, explicitly activate the base environment
conda activate base
```

### Deleting environments

To *delete* an environment permanently, first make sure to deactivate it, then enter:

```shell
conda env remove -n ENV_NAME
```

:::warning
... the folder in `~/<conda_folder>/envs` should be gone now.
:::

::::success
:::spoiler **Alternative environment managers**
Conda is not the only game in town for managing virtual environments! You'll want to use `conda` if you're going to be using `conda install` to add packages, but if you know you don't need to install using conda, there are alternative environment managers like:

- [Pipenv](https://pipenv.pypa.io/en/latest/)
- [`venv`](https://docs.python.org/3/library/venv.html)
- [`virtualenv`](https://virtualenv.pypa.io/en/latest/) and [`virtualenvwrapper`](https://virtualenvwrapper.readthedocs.io/en/latest/)
:::
::::

## Exercise 3: Installing packages

:::success
:trophy: The goal of this section is to learn how to install packages into the active environment. When completed, you should:

✅ know how to install with `pip`
✅ know how to install with `conda`
✅ know where to go to read more about specifying versions and sources
:::

The extensive ecosystem of third-party scientific packages is a primary driver of the success of Python. (By "third-party" here, I mean packages and modules that *don't* ship with the [Python standard library](https://docs.python.org/3/library/); packages like `numpy`, `pandas`, and `matplotlib`.)
Most of the time, the first thing you'll do after creating an environment is to install some packages.

### Installing packages with `pip`

To install with pip, use the `install` command:

```shell
pip install requests
```

`pip` can install packages from *many* different locations:

```bash
# install the current working directory
pip install .

# install a file that someone sent you
pip install some_local_file.whl

# install the bleeding edge dev version from some github repo
pip install git+https://github.com/psf/requests
```

### Installing packages with `conda` or `mamba`

To install with `conda` or `mamba`, use the `install` command:

```bash
mamba install requests
# or
conda install requests
```

Often with `conda`, you will want to install from a specific **channel** (discussed below). You can either add channels permanently to your config, or specify channels at install time:

```
conda install -c conda-forge requests
```

### Installing specific versions

Both `pip` and `conda` have a *lot* of ways to specify version constraints and package sources. See their respective documentation pages for details:

- [Pip Requirement Specifiers](https://pip.pypa.io/en/stable/reference/requirement-specifiers/)
- [Installing packages in conda](https://conda.io/projects/conda/en/latest/user-guide/tasks/manage-pkgs.html#installing-packages)

> (... or use `pip install --help` or `conda install --help` on the command line)

The most important thing to know is how to install a specific version:

```
pip install package==1.2.3
conda install package=1.2.3
```

## Where do packages come from and install to?

:::success
This section discusses where `pip` and `conda` look for packages when you run `install`, and where those packages end up on your computer.
:::

This part is generally a bit mysterious to new Python users, and it can be very illuminating to understand where packages are downloaded from, and where they go on your computer when you install them.

One of the main differences between installing a package using `conda` vs `pip` is the package repository that gets used (i.e. where the package is downloaded from). It's useful to have a sense for where these programs look for packages when you use the `install` command.

### Package sources: PyPI and anaconda.org

The two main package repositories are PyPI.org and anaconda.org

#### `pip` searches at PyPI.org

By default, `pip` searches for packages in [the Python Package Index](https://pypi.org/) (PyPI; pronounced "pie-pee-eye", not "pie pie"). If you'd like to use a web browser to see what packages, versions and files are available, you could also search directly at https://pypi.org/.

As an example, if you [search for numpy](https://pypi.org/search/?q=numpy), it will lead you to the page dedicated to the `numpy` package. Clicking on [release history](https://pypi.org/project/numpy/#history) will show you all versions available and their dates of release:

![versions on pypi](https://i.imgur.com/RTkGcwm.png)

And clicking [download files](https://pypi.org/project/numpy/#files) will show you the *exact* files that `pip` would be selecting from and installing if you were to type `pip install numpy` (more on "source distributions" and "binary distributions" later):

![download files on pypi](https://i.imgur.com/IDA7gvD.png)

:::warning
*:information_source: while you won't usually manually go to PyPI.org to search/download a package, it's still educational – and sometimes useful – to view the index directly in a browser like this.*
:::

#### `conda` searches at anaconda.org

By default, `conda`/`mamba` searches for packages at [anaconda.org](https://anaconda.org).
Here, however, things are a little more complicated than PyPI: `conda` has the concept of [**channels**](https://docs.conda.io/projects/conda/en/latest/user-guide/concepts/channels.html). Channels are the locations where packages are stored; if we [search for `numpy`](https://anaconda.org/search?q=numpy) as we did above, this time we see a *lot* of entries:

![](https://i.imgur.com/PC9Ks8f.png)

Each entry above is `numpy`, built and distributed in a different channel. By default, packages are downloaded from [the `defaults` channel](https://repo.anaconda.com/pkgs/); however: **you'll almost always want to use the conda-forge channel.**

#### The conda-forge channel

[Conda-forge](https://conda-forge.org/) is an awesome community-driven collection of (~20K) packages, which are found in the `conda-forge` channel at anaconda.org. (The name "conda-forge" can also refer to the organization of open source contributors that maintains the channel.)

As mentioned above, to install a package from a specific channel, use the `-c` flag when installing:

```shell
conda install -c conda-forge PACKAGE_NAME
```

If you regularly install from a specific channel, like `conda-forge`, you can [modify your channels list](https://docs.conda.io/projects/conda/en/latest/user-guide/configuration/use-condarc.html#config-channels). For example, to add the `conda-forge` channel:

```shell
conda config --add channels conda-forge
```

(now you no longer need to use `-c conda-forge` every time you use `conda`)

:::info
**Miniforge & Mambaforge**

Adding conda-forge to your channels is so common, and so useful, that [Miniforge](https://github.com/conda-forge/miniforge/) – the installer [we used above](#Exercise-1-Installing-Python-via-conda) to install `conda` – was created. It is a minimal conda installer (just like Miniconda), with the added feature that `conda-forge` is set as the default channel. Hopefully you now understand why we used it!
:::

### Where do packages go when you install them?
:shrug: This may be one of the biggest mysteries for Python newcomers! In most cases (though there are many exceptions), when you run `pip install` or `conda install`:

:::info
**Packages will be added to your `site-packages` folder**
:::

| Platform  | Standard installation location       |
| --------- | ------------------------------------ |
| Mac/Linux | `prefix/lib/pythonX.Y/site-packages` |
| Windows   | `prefix\Lib\site-packages`           |

(... where `prefix` will depend on the active virtual environment.)

In a "global" python installation without a virtual environment (:scream:) the full path will be something like `/usr/local/lib/pythonX.Y/site-packages` on Unix systems and `C:\PythonXY\Lib\site-packages` on Windows.

**If you have a conda virtual environment active, `prefix` will refer to your environment folder (e.g. `~/<conda_folder>/envs/ENV_NAME/`)**

:::warning
To print your current `prefix` using Python:

```bash
python -c "import sys; print(sys.prefix)"
```

To print your site-packages folder location:

```bash
python -c "import site; print(site.getsitepackages())"
```
:::

#### Some exceptions

... `pip` and `conda` don't *always* install to site packages...

- *User installs*

  The `--user` flag makes `pip` install packages in your home directory instead, which doesn't require any special privileges.

  ```
  pip install --user PACKAGE_NAME
  ```

  :::danger
  :warning: User installs can be a cause of confusing environment problems, as they can result in the import of packages (or versions) that you didn't think you had installed in your environment. I try to avoid using `--user` installs, and delete them if I discover them on my system. See also the discussion of `sys.path` below for tips on finding where a package is being imported from.
  :::

- *Editable installs*

  A common way to install packages you are actively *developing* is to `pip install` in "editable" mode, with `-e`/`--editable`:

  ```bash
  # install the current working directory in "editable mode"
  pip install -e .
  ```

### Listing all package locations

A good way to show where all the packages in your environment are installed is `pip list` with the "verbose" flag `-v` added:

```shell
$ pip list -v
Package      Version    Editable location  Location                                           Installer
------------ ---------- ------------------ -------------------------------------------------- ---------
certifi      2022.9.24                     ~/mambaforge/envs/ttt/lib/python3.11/site-packages pip
nictool      0.1.0      ~/dev/self/nic     ~/mambaforge/envs/ttt/lib/python3.11/site-packages pip
numpy        1.23.4                        ~/mambaforge/envs/ttt/lib/python3.11/site-packages conda
pip          22.3.1                        ~/mambaforge/envs/ttt/lib/python3.11/site-packages
requests     2.28.1                        ~/.local/lib/python3.11/site-packages              pip
setuptools   65.5.1                        ~/mambaforge/envs/ttt/lib/python3.11/site-packages
wheel        0.38.3                        ~/mambaforge/envs/ttt/lib/python3.11/site-packages
```

<small style="color: gray;">
<code>nictool</code> was installed locally in "editable" mode<br />
<code>numpy</code> was installed using conda (everything else with pip)<br />
<code>requests</code> was installed using <code>--user</code>
</small>

## `pip` and `conda`

The difference between (and when to use) `pip` and `conda` is one of the most common questions/confusions I see. These are both extensive, complicated tools, with a broad range of functionality, so it's hard to summarize quickly without glossing over important details... but here goes :smile:

:::info
`pip` installs *Python* packages (mostly from PyPI), and does not manage virtual environments.

`conda` installs *any* package (including Python itself, not just Python packages) – mostly from anaconda.org – and can *also* manage virtual environments.
:::

Summarized in a table:

| | `conda` | `pip` |
|-:| :---: | :-: |
|**Manages**| binaries | wheel or source |
|**Can require local compiler**| No | Yes |
|**Package types**| Any | Python only |
|**Creates environments**| Yes, built-in | No (use `venv`/`virtualenv`) |
|**Strict dependency checks** | Yes | No<sup>*</sup> |
|**Default package source**| anaconda.org | PyPI |

::::info
**binaries? wheels?**

To really understand the motivation for conda – and what "can require local compiler" means in the table above – one must understand a little about "compiled" binaries. This is a bit beyond the scope here, but here's a very brief intro for those interested:

:::spoiler Binary files and C Extensions
Python is an "interpreted language". Among other things, this generally means that the job of converting the human-readable source code into executable machine code is done *on* the machine executing the code. The developer ships a `.py` file.

![](https://i.imgur.com/FM5IyVl.png)

By contrast, compiled languages like C/C++ are generally converted into machine code *elsewhere* (by the developer), for each platform being supported, and then shipped to the end user as (e.g.) an `.exe` file.

![](https://i.imgur.com/GdAnJl3.png)

#### C extensions

Lower-level compiled languages like C often perform better than "pure" Python code. For that reason, it's very common for Python developers to write or generate small parts of their code (e.g. just the very frequently used functions) in C. These "[C extensions](https://docs.python.org/3/extending/extending.html)" must then be compiled for each platform. *Many* packages in the scientific python ecosystem have at least some compiled code.

If you've ever run `pip install ...` and seen a ton of text fly by with some big red "failed to compile" errors at the end, then you've seen what can happen when you try to install a package that includes C extensions that are *not* pre-compiled for your platform. Not all computers have the programs necessary to compile these extensions, and so when pip tries to install and compile these packages, they may fail.

"[Wheels](https://packaging.python.org/en/latest/specifications/binary-distribution-format/)" are a binary distribution format that you'll see on PyPI that allow a developer to *pre-compile* their extensions for every platform they'd like, so that the end user doesn't need to compile it. A wheel can be simply unzipped and dropped into `site-packages`.

Conda doesn't use wheels, but a conda package achieves the same goal of distributing pre-compiled files so that the end-user needn't compile them.
:::
::::

## Exercise 4: A caveat when using both `pip` and `conda`

:::success
Most of the time, it is fine to use both `pip install` and `conda install` in the same environment. Sometimes, you don't have a choice: it is up to the package developer to make their packages available via `conda` and/or `pip` and you will find packages that are only available on `pip`, or only on `conda`.

However, you should be aware that there are cases where installing the same package from both package managers can cause problems (regardless of whether *you* install the package, or it gets installed as an indirect dependency)
:::

Here's an example of something you might very reasonably do that would result in a broken environment.

```bash
# create a new environment with python
conda create -n doomed_env python=3.10

# activate it
conda activate doomed_env

# install spyder (which depends on pyqt) using pip
pip install spyder

# go ahead and launch spyder ... so far so good!
spyder

# now install pyqt from conda (or... one of many conda packages that depend on it!)
conda install pyqt

# try to launch spyder again...
spyder
```

Here's the error I see:

```
WARNING: You might be loading two sets of Qt binaries into the same process. Check that all plugins are compiled against the right Qt binaries. Export DYLD_PRINT_LIBRARIES=1 and check that only one set of binaries are being loaded.
```

**What happened here?**

Without going into too much detail: both package managers (`pip` and `conda`) tried to install some stuff into the *same* folder (`site-packages/PyQt5`). However, they installed slightly different "parts" (different compiled binaries) resulting in a package that just can't run.

> *note:* This won't always happen: this particular case was caused by the fact that the package is unfortunately called `pyqt` in conda, but `pyqt5` in pip... making it even harder for the two programs to "work together". Moreover, the *order* in which you install things (i.e. `pip`-then-`conda`, vs `conda`-then-`pip`) can also affect whether you run into this.

:::info
**The main lessons here**:

- in general, it's mostly ok to use `pip` and `conda` together
- however, be aware that you may *occasionally* find conflicts
- if you don't know why it broke, just create a new environment and try to install everything from *either* `pip` or `conda`.
- if that's not possible (due to certain package distribution limitations), you might need to spend some time experimenting, then create a [recipe](#Exercise-6-environment-recipes) for your specific package needs.
:::

## Exercise 5: `sys.path`

:::success
The goal of this section is to understand where Python looks for modules when you type `import PACKAGE`
:::

We've learned that packages generally (but not always) end up in the `site-packages` folder in your environment. Let's now discuss where the Python interpreter finds them when you `import` them. It's pretty simple:

:::info
**Python searches for modules on `sys.path`**
:::

`sys` is an important module in the [Python standard library](https://docs.python.org/3/library/) (it will always be available to you).
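
Because the search is driven by `sys.path`, you can ask Python where it *would* find a module instead of guessing. The sketch below uses the standard library's `importlib.util.find_spec`; the helper name `where_is` is made up for illustration, and it is only meant for top-level module names (the module name `surely_not_installed_xyz` is likewise just a placeholder for something that isn't installed).

```python
import importlib.util
import sys


def where_is(module_name):
    """Return where a top-level module would be imported from, or None."""
    spec = importlib.util.find_spec(module_name)
    if spec is None:
        return None  # not importable with the current sys.path
    # Usually a file path; built-in modules report a label instead of a path.
    return spec.origin


print("active prefix:", sys.prefix)
for name in ("json", "sys", "surely_not_installed_xyz"):
    print(name, "->", where_is(name))
```

A path inside your environment's `site-packages` means the package comes from that environment; a path somewhere else (for example under `~/.local`) is a hint that a `--user` install is being picked up.
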
### Viewing `sys.path`

Let's look at `sys.path`:

Start a python interpreter:

```shell
python
```

Now, import `sys` and print out `sys.path`:

```python
import sys
print(sys.path)
```

You'll see something like this:

```python
[
    '/Users/talley/mambaforge/envs/ENV_NAME/bin',
    '/Users/talley/mambaforge/envs/ENV_NAME/lib/python310.zip',
    '/Users/talley/mambaforge/envs/ENV_NAME/lib/python3.10',
    '/Users/talley/mambaforge/envs/ENV_NAME/lib/python3.10/lib-dynload',
    '',
    '/Users/talley/mambaforge/envs/ENV_NAME/lib/python3.10/site-packages'
]
```

There are three particularly important entries in there.

1. **`.../ENV_NAME/lib/python3.10`**: This is where all of the [standard library](https://docs.python.org/3/library/) modules will be found.
2. **`''`**: This **empty string** refers to "the current working directory", which starts as the directory you were in when you launched `python`. If you want to see the current directory in python:
   ```python
   import os
   print(os.getcwd())
   ```
3. **`.../ENV_NAME/lib/python3.10/site-packages`**: This is the `site-packages` folder we [discussed above](#Where-do-packages-go-when-you-install-them). Most of your installed packages should be there.

### Importing modules in the current directory.

Let's take advantage of the empty string entry `''` in `sys.path`. Exit out of python (type `exit()`) and create a new file named `mymodule.py` with the following function:

```python
# mymodule.py
def hello():
    print("hi!")
```

Now, start python again:

```shell
python
```

Then import your new module and use the function:

```python
import mymodule
mymodule.hello()
```

:::success
**Take home message:** Any custom modules you've created in the current working directory may be imported directly as long as `''` exists in `sys.path`.
:::

### Modifying `sys.path`

`sys.path` is not static: you can modify it like any Python list.
One reason you might want to do this is to add a folder of modules with some useful code that you've stored somewhere on your computer:

```python
import sys
sys.path.append('/Users/talley/my_handy_python_stuff')
```

... Now I can import any python modules in `/Users/talley/my_handy_python_stuff`!

:::warning
Don't get too carried away relying on modifying `sys.path`. If you have a set of custom code that you routinely use across many different environments, consider [creating a proper python package](https://packaging.python.org/en/latest/tutorials/packaging-projects/) that you can `pip install` as usual (remember, packages don't necessarily need to be public on PyPI: you can `pip install` from github, or locally as well).
:::

> *If you'd like to know all the nitty-gritty details of how `sys.path` gets initialized at runtime, see the [Python documentation](https://docs.python.org/3/library/sys_path_init.html).*

## Exercise 6: Environment recipes

:::success
:trophy: The goal of this section is to learn strategies for recreating environments with a specific set of packages.

When completed, you should:

✅ Know how to create and use requirement files in `pip`
✅ Know how to create and use `environment.yml` files in conda
✅ Understand the limitations of environment recipes
✅ Be aware of lock files (that comprehensively list the exact version of every package in an environment)
:::

### Requirements.txt and environment.yml

:::info
**Don't get attached to environments; create requirements files!**
:::

Environments are made to be broken and (re)created. Don't view an environment as something you worked hard to create "just right", and dread having to recreate. View them as a little sandbox that is isolated exactly *so that it can be* broken without messing up the rest of your system. You may even occasionally need or want to uninstall the *entire* anaconda/miniconda/mambaforge folder, along with all of the environments you've made.
If you have an environment that you would be "sad" to lose, you should instead work to create a *recipe* for that environment that you can use to recreate the environment whenever necessary. Both `pip` and `conda` support this.

#### `pip` requirements files

[Requirements files](https://pip.pypa.io/en/stable/reference/requirements-file-format/) serve as a list of items to be installed by pip when using `pip install`. Files that use this format are often called “pip requirements.txt files”, since `requirements.txt` is usually what these files are named (although that is just a convention, not a requirement).

Each line of a requirements file supports the same [requirements specifier syntax](https://pip.pypa.io/en/stable/reference/requirement-specifiers/) that you would use for `pip install ...`

```txt
# requirements.txt
numpy
nd2[legacy]
urllib3 @ https://github.com/urllib3/urllib3/archive/refs/tags/1.26.8.zip
```

To install everything listed in a requirements file, use the `-r` flag with `pip install`:

```shell
pip install -r requirements.txt
```

:::warning
:eyes: this will *not* create a new environment; it will install everything in `requirements.txt` into the currently active environment.
:::

#### conda environment files

`conda` allows you to create a new environment from an [environment file](https://conda.io/projects/conda/en/latest/user-guide/tasks/manage-environments.html#creating-an-environment-from-an-environment-yml-file) (conventionally, these are called `environment.yml`, but that is not a requirement).

Environment files have a specific structure ([see full documentation](https://conda.io/projects/conda/en/latest/user-guide/tasks/manage-environments.html#create-env-file-manually)).

```yaml
#environment.yml
name: stats2
channels:
  - conda-forge
dependencies:
  - python=3.9
  - bokeh=2.4.2
  - pandas=1.4.4
  - flask=2.2.2
```

To create an environment from an environment file:

```shell
conda env create -f environment.yml
```

:::warning
:eyes: the name of the new environment is determined by the `name` field in the file itself, but you can override it by appending `-n MY_NAME` to the command
:::

### "Locked" dependencies (for *really* reproducible environments)

Sometimes you'd like to have some assurance that you will be able to recreate an environment "indefinitely" in the future (for example: to reproduce an analysis for a paper you wrote 10 years ago). Now, it may appear like the `environment.yml` listed above would provide a fully "reproducible" setup that you could run years later to achieve the same thing.

But there's a problem: **Each of the dependencies we declared has its *own* dependencies.** (Did you notice how many other things got installed in the `stats2` environment above?) ... and it's completely possible that one of those subdependencies might release a version in the future that changes or breaks one of our direct dependencies and, with it, the result of our code.

To create a *comprehensive* list of the pinned versions of every package in our environment, we can use `conda env export`:

```shell
conda env export > environment.lock.yml
```

... and if we later run `conda env create -f environment.lock.yml`, we should get an exact duplicate of our current environment (at least, as long as we're on the same operating system).

### `conda-lock`

Exported/locked environment files are great, but still have some practical difficulties & annoyances. They don't work "effortlessly" cross-platform, it can be hard to update one of the dependencies, and they can still be somewhat slow to solve and install (even though in theory you know exactly what packages are needed).
If you do this a lot, consider looking into [conda-lock](https://conda-incubator.github.io/conda-lock/), which solves all of the above with a unified lockfile format.

```shell
# install conda-lock
conda install -c conda-forge conda-lock
# generate a multi-platform lockfile
conda-lock -f environment.yml -p osx-64 -p linux-64
```

:::warning
**Locking a pip-based environment**

We won't go into them here, but there are also lockfile solutions for the `pip` ecosystem (that do not require using conda):
- [pip-tools](https://pip-tools.readthedocs.io/en/latest/)
- [Pipenv](https://pipenv.pypa.io/en/latest/)
:::

## Tips & best practices

:::success
✅ always work in a virtual environment
✅ don't be afraid to wipe it and start over
✅ avoid installing into the `conda` base environment (and never with pip)
✅ create a "kitchen sink" environment rather than using `base`
:::

### When should you create a virtual environment?

Basically, all the time! :joy:

- Whenever you'd like to experiment with new packages, or different versions of packages.
- If you routinely work on various projects that aren't directly related, creating a small separate environment dedicated to each "task" or project can help avoid surprising behavior.
- If you are ever experiencing "strange" behavior, or a package doesn't seem to be working as advertised. The very first thing you should do is create a fresh virtual environment, install only what you need, and see if the problem persists.
- Before opening an issue with a software developer (e.g. on github.com or on image.sc) to say "this doesn't work", first make sure you can reproduce it in a clean environment.

### Avoid installing stuff into `base` environment

You should (almost) never install things into your base conda environment. Do your work in another environment and leave the base environment only for dependencies that actually manage environments (like `conda` itself, or `mamba`, or other dev-related dependencies like `conda-build`, etc...)
In particular, **try to never `pip` install anything into your base environment**.

::::danger
:::spoiler **how to "unpip" your base environment**
If you ever make a mess of your base environment with `pip`, and would like to restore base to something like its original state, you can run these commands (on unix systems):

```shell
conda activate base
conda list | grep pypi_0 | awk '{print $1;}' | xargs -I {} sh -c "pip uninstall -y {}";
conda install -y --revision 0
conda install mamba
mamba update -n base conda -y
```
:::
::::

### A "kitchen sink" environment

Because frequently switching environments can be annoying, I like to create a "kitchen sink" environment (I call it "`all`") that I use for all of my generic tasks, and I install things into it with reckless abandon.

```shell
conda create -y -n all python
conda activate all
```

If you keep installing things into one environment, it *will* eventually break. At that point, just delete it and recreate it. To make it easier to re-create a complicated environment, use [environment files](#Exercise-6-environment-recipes) (discussed above).

To help avoid installing into `base`, you might consider "auto-activating" this environment:

```bash
# in your ~/.zshrc or ~/.bash_profile
conda activate all
```

## Integrated Development Environments (IDEs)

An "Integrated Development Environment" (IDE) is essentially a text-editor designed for code. IDEs come with a ton of conveniences like autocompletion, syntax highlighting, debuggers, and lots more. They can be a little intimidating at first, but if you plan to do a lot of programming, the investment is well worth it: they will become an indispensable tool.

People get a little religious debating the merits of their favorite code editing programs :laughing: ... so I'll refrain from attempting to list pros & cons of each of these. They *all* have a lot to offer, and you *should* try to get comfortable with one of these.
For whatever IDE you choose, you should definitely learn how to activate a specific python environment (see links for each IDE below).

- [VS Code](https://code.visualstudio.com/)
  https://code.visualstudio.com/docs/python/environments
  ![vscode](https://i.imgur.com/oeDlY5A.png)
- [PyCharm](https://www.jetbrains.com/pycharm/)
  https://www.jetbrains.com/help/pycharm/conda-support-creating-conda-virtual-environment.html
  ![pycharm](https://i.imgur.com/2byTVRI.png)
- [spyder](https://www.spyder-ide.org/)
  https://docs.spyder-ide.org/current/faq.html#using-existing-environment
  ![spyder](https://i.imgur.com/Y96AJE9.jpg)
- [JupyterLab](https://jupyterlab.readthedocs.io/en/stable/) (not really an IDE, but much closer than Jupyter Notebook)
  Generally, JupyterLab will be installed *in* the environment you want to use... but you can also use [`nb_conda_kernels`](https://github.com/Anaconda-Platform/nb_conda_kernels) to access other environments.
  ![jupyterlab](https://i.imgur.com/vUY8ZgW.png)

:::info
**Jupyter Notebook?**

While Jupyter Notebook is certainly very useful for *sharing* code with others, and for exploratory analysis, I would discourage you from thinking about Jupyter notebooks as "the place" where one goes to run some Python code. Notebooks are much harder to version control (i.e. in a git repository – they are complicated JSON files, not simple python files), and they discourage code reuse and organization (Notebooks are something of a dead end: people very rarely [`import` from a notebook](https://jupyter-notebook.readthedocs.io/en/4.x/examples/Notebook/rstversions/Importing%20Notebooks.html)).

Definitely get comfortable writing python scripts, and using an interactive read-eval-print-loop (REPL) like [IPython](https://ipython.org/) (or even the plain `python` prompt). It will pay off to be comfortable using python without needing to start up Jupyter Notebook.
:::

## Cheat sheet

The following is a summary of some of the commands we've discussed here, and what they do:

| <div style="width:300px">command</div> | description |
|---------|------|
| **`conda`** (or `mamba`) | |
| `conda create -n ENV_NAME python=3.10` | create an environment named `ENV_NAME` (with Python 3.10 installed) |
| `conda env remove -n ENV_NAME` | remove environment named `ENV_NAME` |
| `conda info -e` | list all conda environments |
| `conda activate ENV_NAME` | activate env named `ENV_NAME` |
| `conda deactivate` | deactivate current environment |
| `conda install -c conda-forge numpy` | install `numpy` using conda (from the `conda-forge` channel) into the current environment |
| `conda install numpy==1.23.4` | install specific version of `numpy` using conda (using whatever channels are in your configuration) |
| `conda remove numpy` | uninstall `numpy` from the current environment |
| `conda list` | list all packages installed in the current environment |
| `conda config --add channels conda-forge` | add the conda-forge channel to your config |
| **`pip`** | |
| `pip install numpy` | install `numpy` (from PyPI) into the current environment |
| `pip install numpy -U` | install/update `numpy` to latest version (from PyPI) |
| `pip install numpy==1.23.4` | install specific version of `numpy` (from PyPI) |
| `pip uninstall numpy` | uninstall `numpy` from the current environment |
| `pip list` | list installed packages (add `-v` to show package locations and installer) |

Useful attributes of the `sys` module to be aware of:

| <div style="width:300px">attribute</div> | description |
|---------|------|
| `sys.path` | A list of strings that specifies the search path for modules. |
| `sys.executable` | A string giving the absolute path of the executable binary for the Python interpreter. |
| `sys.prefix` | A string giving the site-specific directory prefix where the platform independent Python files are installed. |
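
As a quick sanity check of the `sys` attributes in the table, the following small sketch prints them for whichever interpreter runs it. The function name `describe_environment` is hypothetical; only the three `sys` attributes come from the table.

```python
import sys


def describe_environment():
    """Collect the three sys attributes from the table into a dict."""
    return {
        "executable": sys.executable,  # path to the running python interpreter
        "prefix": sys.prefix,          # root folder of the active environment
        "path": list(sys.path),        # copy of the module search path
    }


info = describe_environment()
print("interpreter:", info["executable"])
print("prefix:     ", info["prefix"])
for entry in info["path"]:
    # repr() makes the empty-string entry (current directory) visible
    print("search path:", repr(entry))
```

Running this after `conda activate ENV_NAME` (and again after activating a different environment) is an easy way to confirm which interpreter and which `site-packages` folder you are actually using.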