# Collaborative Document
2021-06-14 Introduction to programming environments with conda.
Welcome to The Workshop Collaborative Document.
This Document is synchronized as you type, so that everyone viewing this page sees the same text. This allows you to collaborate seamlessly on documents.
----------------------------------------------------------------------------
## 👮Code of Conduct
* Participants are expected to follow these guidelines:
* Use welcoming and inclusive language.
* Be respectful of different viewpoints and experiences.
* Gracefully accept constructive criticism.
* Focus on what is best for the community.
* Show courtesy and respect towards other community members.
## ⚖️ License
All content is publicly available under the Creative Commons Attribution License: [creativecommons.org/licenses/by/4.0/](https://creativecommons.org/licenses/by/4.0/).
## 🙋Getting help
To ask a question, type `/hand` in the chat window.
To get help, type `/help` in the chat window.
You can ask questions in the document or chat window and helpers will try to help you.
## 🖥 Workshop website
[link](https://escience-academy.github.io/2021-06-14-conda/)
## 🛠 Setup
[link](https://escience-academy.github.io/2021-06-14-conda/setup/)
## Post-workshop survey
[link](https://www.surveymonkey.com/r/656DRSY)
## 👩🏫👩💻🎓 Instructors
Johan Hidding, Victor Azizi
## 🧑🙋 Helpers
Dafne van Kuppevelt, Sarah Alidoost
## 👩💻👩💼👨🔬🧑🔬🧑🚀🧙♂️🔧 Roll Call
Name/ pronouns (optional) / job, role / social media (twitter, github, ...) / background or interests (optional) / city
- Johan / he, him, his / RSE@Netherlands eScience Center / tw&gh:jhidding / Amersfoort
- Lieke de Boer / she, her / Community Manager eScience Center / neuroscience + open science / Berlin
- Dafne van Kuppevelt / she, her / Research Software Engineer / @dafnevk / machine learning, NLP, teaching / Utrecht
- Victor Azizi / he, him, his / Research Software Engineer / High performance computing / Utrecht
- Yifang Shi / she,her/ Scientific developer @ Amsterdam University / Remote sensing, LiDAR / Amsterdam
- Anne Stultiens / she, her / PhD candidate clinical chemistry / Chemist / Maastricht
- Damian Audley /he / Instrument Scientist / Detector development & calibration / SRON Netherlands Institute for Space Research / Groningen
- Sanne Rutten / MRI Data Scientist / PhD Cognitive Neuroscience / Maastricht
- Cristian Nogales/ he, him, his/ PhD candidate, Maastricht University / @cristianngcl / Maastricht
- Tousif Jamal / he, him, his / PhD Candidate, Radboud University / @t0us1f / computational neuroscience / Nijmegen
- Leonardo, PhD clinical neuroscience, Maas
- Leila Inigo / she , her/ PhD student, TU Delft
- n Movement Sciences/Amsterdam
- Richard van Dijk, Research software engineer, University Leiden, LIACS, data science.
- Robbin Romijnders / he, him, his / PhD student / Kiel University, Germany
- Na Chen / she, her/ PhD candidate / Wageningen University
- Willem Jan Vreeling, SRON Groningen, engineer
- Sarah Alidoost/ She, her/ RSE eScience Center
- Yahua Zi/ she,her/PhD candidate/@ziyahua (twitter)/Amsterdam
- Shengnan Liu/ she, her/Postdoc fellow/Erasmus MC, Rotterdam
## Icebreaker: introduce yourself in three words
- Johan: music loving astronomer
- Leonardo: tennis, neuroscience, food
- Victor: Climbing gardening programmer
- Lieke: Food, science, crafts
- Richard: I do icebreakers on Mars (cycling, running, books, ...)
- Dafne: outdoors, food, pregnant
- Anne: cycling, foodie, health
- Cristian: Adrenaline junkie boulderer
- Tousif: Football, Food, Travelling
- Robbin: football, cycling, reading
- Yifang: sport, trees, LiDAR
- Fangxin: sport science, move, jogging
- Sanne: Sport, Eurocup, reading
- Damian: space, electronics, programming
- Prateek: Satellites, Trees, Programming
- Yahua: Sport science, Yoga, Tennis
- Willem: cycling (repairing bicycle), swimming, reading
- Leila: dancing , science , love
## 🗓️ Agenda
09:00 Welcome and icebreaker
09:15 Introduction
09:45 Working with environments
10:45 Coffee Break
11:00 Using packages and channels
11:30 Coffee Break
11:45 Sharing environments
12:45 Wrap-up
13:00 END
## 🧠 Collaborative Notes
To check where your python executable is installed:
- Open a python console (e.g. by typing `python` in your terminal)
- Type:
```python
import sys
sys.prefix
```
And to see what is on the path:
```python=
sys.path
```
### Create directory
If you haven't done it yet, create a new `introduction-to-conda-for-data-scientists` directory on your Desktop in order to maintain a consistent workspace for all your conda environments.
For macOS and Linux, the following commands should create the directory:
```bash
$ cd ~/Desktop
$ mkdir introduction-to-conda-for-data-scientists
$ cd introduction-to-conda-for-data-scientists
```
For Windows users, you may need to reverse the direction of the slashes and run the commands from the Command Prompt:
~~~bash
> cd %USERPROFILE%\Desktop
> mkdir introduction-to-conda-for-data-scientists
> cd introduction-to-conda-for-data-scientists
~~~
## Environments
To create a new environment called `my-env` with a specific Python version and the dask package:
```bash
conda activate #activate base environment
# create new environment with python 3.6 and dask
conda create --name my-env python=3.6 dask
conda activate my-env #activate the newly created environment
```
You will now see `(my-env)` at the beginning of your prompt, indicating that you are in the environment. If you open python, you will see that it has the correct version number.
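A quick way to double-check (a minimal sketch; the exact path depends on where miniconda is installed):
```bash
# show which python is active and its version
which python      # e.g. ~/miniconda3/envs/my-env/bin/python
python --version  # should report Python 3.6.x
```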
If you are already in an environment, you can install new packages into it:
```bash=
conda create --name ml-env python=3.6 pandas scikit-learn
conda activate ml-env
conda install numba
```
Installing a new package can also update existing packages. To prevent that:
```bash=
conda install --freeze-installed numba
```
(Installing Python this way is a bit different, because all packages depend on python)
To see which versions exist:
```bash
conda search dask
```
To install a package into an environment without activating it first:
```
conda install --name=my-env dask
```
To see where conda environments 'live' on your system:
- By default they live in your home folder: `/Users/$USERNAME/miniconda3/envs` (macOS) or `C:\Users\$USERNAME\Anaconda3\envs` (Windows)
- `/home/$USER/miniconda3/envs` (Linux)
- We can see all environments by listing this folder with `ls` (macOS/Linux) or `dir` (Windows).
- Or we can run `conda info --envs` or `conda env list`.
To see how much space they take (in bash, on macOS/Linux):
```bash
cd ~/miniconda3/envs
du -hd 1
```
The base environment is the miniconda installation itself, so it is located one directory up, e.g. `/home/$USER/miniconda3`.
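You can also ask conda for the base location directly (output depends on your installation):
```bash
# print the path of the base (install) environment
conda info --base
```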
## Local conda environments
- Conda environments can be installed in a different place
- Local conda environments do not show up with `conda env list`
- Local environments make the code more reproducible
- Uses the `--prefix` argument instead of the `--name` argument
Now go back to our workshop directory, and create a project directory:
```bash=
cd ~/Desktop/introduction-to-conda-for-data-scientists/
mkdir project-dir
cd project-dir
# Now create conda env with prefix and R installed
conda create --prefix=./env r-base
```
This environment is now stored locally in our project directory.
To activate it, we need to specify the path:
```bash=
conda activate ./env
```
To shorten the prefix in your command prompt:
```bash=
conda config --set env_prompt '({name})'
```
### More commands
To deactivate an environment:
```bash=
conda deactivate
```
to list all environments:
```bash=
conda env list
```
To remove an environment:
```bash=
conda remove --name my-env --all
```
To remove a local environment, use `--prefix` instead of `--name`.
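For example (a sketch, assuming the local environment lives in `./env`):
```bash
# remove a local (prefix-based) environment and all of its packages
conda remove --prefix ./env --all
```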
### Packages and channels
If we search for a package:
```bash
conda search dask
```
We see which versions can be downloaded as a conda package. You can also see that each package is found in a certain channel. You can also search on the Anaconda website.
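You can also include an extra channel in the search, for example conda-forge (a sketch):
```bash
# search for dask, also looking in the conda-forge channel
conda search --channel conda-forge dask
```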
### What are conda packages?
A conda package is a compressed archive file (`.tar.bz2`) that contains:
* system-level libraries
* Python or other modules
* executable programs and other components
* metadata under the `info/` directory
* a collection of files that are installed directly into an `install` prefix.
Conda keeps track of the dependencies between packages and platforms; the conda package format is
identical across platforms and operating systems.
### Package Structure
All conda packages have a specific sub-directory structure inside the tarball file. There is a
`bin` directory that contains any binaries for the package; a `lib` directory containing the
relevant library files (i.e., the `.py` files); and an `info` directory containing package metadata.
As an example of Conda package structure consider the [Conda](https://pytorch.org/) package for
Python 3.6 version of PyTorch targeting a 64-bit Mac OS, `pytorch-1.1.0-py3.6_0.tar.bz2`.
```
.
├── bin
│   ├── convert-caffe2-to-onnx
│   └── convert-onnx-to-caffe2
├── info
│   ├── LICENSE.txt
│   ├── about.json
│   ├── files
│   ├── git
│   ├── has_prefix.json
│   ├── hash_input.json
│   ├── index.json
│   ├── paths.json
│   ├── recipe/
│   └── test/
└── lib
    └── python3.6
        └── site-packages
            ├── caffe2/
            ├── torch/
            └── torch-1.1.0-py3.6.egg-info/
```
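If you are curious, downloaded package archives are cached locally (typically under `~/miniconda3/pkgs/`), so you can inspect one yourself; the filename below is just the PyTorch example above and may not exist on your system:
```bash
# list the first entries of a cached package archive (path and filename are illustrative)
tar -tjf ~/miniconda3/pkgs/pytorch-1.1.0-py3.6_0.tar.bz2 | head
```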
### Channels
To install packages from a specific channel:
```
conda create --channel main --name ml-env pytorch
```
You may specify multiple channels for installing packages by passing the `--channel` argument multiple times; conda will adhere to the order you specify:
```bash=
conda install --channel conda-forge --channel bioconda scipy=1.6
```
conda-forge contains many more packages; it is curated, but not as strictly as the conda main channel.
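If you find yourself using conda-forge a lot, one option (a sketch; note that this changes your global conda configuration) is to add it to your channel list:
```bash
# add conda-forge to the channels conda searches by default
conda config --add channels conda-forge
```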
### Conda vs Pip
When using Conda to manage environments for your Python project, it is a good idea to install packages that are available via both Conda and Pip using Conda; however, there will always be cases where a package is only available via Pip, in which case you will need to use Pip. Many of the common pitfalls of using Conda and Pip together can be avoided by adopting the following practices.
- Always explicitly install pip in every Python-based Conda environment.
- Always be sure your desired environment is active before installing anything using pip.
- Prefer `python -m pip install` over `pip install`; never use `pip` with the `--user` argument.
Some Python packages are not on any of the conda channels but are available through pip. To use pip, we first need to have pip installed in the environment:
```bash=
conda activate ./env
conda install pip
```
You can check whether you are using the pip within conda:
```bash=
which pip
```
To install a package with pip (do not use the `--user` option!):
```bash=
pip install noodles
```
Or, to really make sure you use the right pip:
```bash=
python -m pip install noodles
```
To list all packages in the environment:
```bash=
conda list
```
This also shows the pip-installed packages.
### Sharing environments
Let's start with a new environment and create a new project directory
```bash=
conda create --name conda-workshop
conda activate conda-workshop
mkdir my-project
cd my-project
```
We create an environment file with your favorite text editor. Create a new file in this directory, e.g. with the nano editor:
```bash=
nano environment.yml
```
Give the file the following contents:
```
name: machine-learning-env
dependencies:
- ipython
- matplotlib
- pandas
- pip
- python
- scikit-learn
```
It is better to specify exact version numbers of the packages (usually the minor version number, not the patch number, so 3.1 instead of 3.1.1):
```
name: machine-learning-env
dependencies:
- ipython=7.13
- matplotlib=3.1
- pandas=1.0
- pip=20.0
- python=3.6
- scikit-learn=0.22
```
If you use git for version control, always add environment.yml to your git repository!
```bash=
git init
git add environment.yml
git commit -m "Create enviornment.yml"
```
(You can also use environment.yml within a Continuous Integration system, e.g. on GitHub! Check out our other courses to learn about version control and continuous integration.)
Now we can create an environment from this file:
```bash
conda env create --prefix ./env --file environment.yml
conda activate ./env
```
(In a git repository, we should not add `./env` to our repository! Add it to the `.gitignore` file.)
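A minimal way to do that from the command line (assuming the environment lives in `./env`):
```bash
# keep the local environment out of version control
echo "env/" >> .gitignore
git add .gitignore
git commit -m "Ignore local conda environment"
```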
Now if we install a new package in the environment, we can update the environment.yml file:
```bash
conda install dask
conda env export
```
If you want to export an environment that you are not in, use the `--name` option.
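For example (assuming the environment is called `machine-learning-env`):
```bash
# export an environment without activating it first
conda env export --name machine-learning-env
```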
To make a much cleaner version that only includes the packages that you explicitly installed:
```bash
conda env export --from-history
```
You can edit the exported file manually to give the environment a nicer name.
You can also specify the channels (these are included by default in the exported file).
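To save the exported specification directly to a file (a sketch; this will overwrite an existing environment.yml):
```bash
# write the minimal, history-based specification to environment.yml
conda env export --from-history > environment.yml
```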
Suppose we add a package to our environment file, for example:
```
name: machine-learning-env
dependencies:
- ipython=7.13
- matplotlib=3.1
- pandas=1.0
- pip=20.0
- python=3.6
- scikit-learn=0.22
- flask
```
Then we can update our environment with:
```bash=
conda env update --prefix ./env --file environment.yml --prune
```
To completely rebuild from scratch, force to overwrite ./env:
```bash=
$ conda env create --prefix ./env --file environment.yml --force
```
To add a pip package to our environment file:
```
name: machine-learning-env
dependencies:
- ipython=7.13
- matplotlib=3.1
- pandas=1.0
- pip=20.0
- python=3.6
- scikit-learn=0.22
- flask
- pip:
  - kaggle==1.5
  - yellowbrick==0.9
```
Notice the double equal sign for the pip packages!
## Exercises
### Exercise 1 (OLD!):
What would you use environments for? What are the benefits? What are the drawbacks? Have you ever experienced any real-life problems that would have been solved by environments?
- Room 1:
- Specific software packages for analyzing brain data
- Develop and run software simultaneously
- No need to downgrade the whole machine to run software
- Ease of sharing software with colleagues
- Orderly manage projects
- Being able to upgrade your system without breaking the software
- Room 2:
- avoid conflicts between libraries
- separate environments for projects that have different dependencies
- some packages we couldn't install outside of Conda
- Room 3:
- different versions of packages
- running on different machines or cpu / cores in one machine?
- no drawbacks! Maybe too flexible; then you lose the insight needed to keep all packages up-to-date?
- real-life experiments: sharing test results
- reproduce result with collaborators
- Room 4:
- conflict while using different versions of packages
- drawbacks: remember member
- different versions
- real-life experience: easily switch between different library versions, e.g. TensorFlow 1 and TensorFlow 2
## Exercise 2
1. Create a new environment called “machine-learning-env” with Python and the most current versions of IPython, Matplotlib, Pandas, Numba and Scikit-Learn.
2. How would you specify a specific version of python to work with in this environment ?
Link to conda commands: [Conda general commands](https://docs.conda.io/projects/conda/en/latest/commands.html)
Answers:
- Room 1:
conda create --name=machine-learning-env IPython Matplotlib Pandas Numba Scikit-Learn
conda create --name=machine-learning-env python=3.8 IPython Matplotlib Pandas Numba Scikit-Learn
- Room 2:
- Room 3:
- conda create --name=machine-learning-env python dask IPython Matplotlib Pandas Numba Scikit-Learn
- conda create --name=machine-learning-env python=2.7 dask IPython Matplotlib Pandas Numba Scikit-Learn
- Room 4:
1. $ conda create --name=machine-learning-env python ipython matplotlib pandas numba scikit-learn
2. Adding the specific version number after the package name
## Exercise 3
Dask provides advanced parallelism for data science workflows, enabling performance at scale for the core Python data science tools such as NumPy, Pandas, and Scikit-Learn. Have a read through the [official documentation](https://docs.conda.io/projects/conda/en/latest/commands.html) for the `conda install` command and see if you can figure out how to install Dask into the `machine-learning-env` that you created in the previous challenge.
- Anne: conda activate machine-learning-env + conda install dask
- Binosha
- Cristian: conda activate machine-learning-env --> conda install dask
- Damian: conda install dask
- Fangxin:
conda activate machine-learning-env
conda install dask
- Leila: `conda activate machine-learning-env`, `conda install dask`
- Leonardo: conda activate machine-learning-env ; conda install dask (/numba)
- Na
- conda activate machine-learning-env
- conda install dask
- Richard:
- conda activate ml-env (it works!)
- conda uninstall numba
- conda install numba
- conda deactivate ml-env
- conda uninstall --name=ml-env numba
- conda list --name=ml-env
- Robbin
- Sanne: conda install dask (after: activating the env)
- Tousif: conda activate machine-learning-env; conda install dask.
- Prateek
- Yahua: conda activate machine-learning-env
- Yifang 1:conda deactivate 2:conda activate machine-learning-env, 3: conda install dask
- Willem Jan: it all works
-
- Shengnan:
- conda activate machine-learning-env
- conda install dask
## Exercise 4
Like many projects, PyTorch has its own channel on Anaconda Cloud. This channel has several interesting packages, in particular pytorch (PyTorch core) and torchvision (datasets, transforms, and models specific to computer vision).
Create a new directory called my-computer-vision-project and then create a Python 3.6 environment in a sub-directory called env/ with the two packages listed above. Also include the most recent version of jupyterlab in your environment (so you have a nice UI) and matplotlib (so you can make plots).
- Anne
- mkdir my-computer-vision-project
- cd my-computer-vision-project/
- conda activate
- conda create --prefix=./env --channel=main pytorch python=3.6 jupyterlab matplotlib
- Binosha
- Cristian
- conda create --channel=pytorch --prefix=./env python=3.6 jupyterlab matplotlib
- Damian
- conda create --channel=pytorch --prefix=./env python=3.6 jupyterlab matplotlib
- conda activate ./env
- Fangxin
- Leila
-
- Leonardo:
mkdir my-computer-vision-project
cd my-computer-vision-project
conda deactivate
conda create --prefix=./env --channel=main pytorch python=3.6 --channel pytorch (or diff)
- Na
--conda create --prefix=./my-computer-vision-project python=3.6 jupyterlab matplotlib
- Richard
- conda create --prefix=./env python=3.6 --channel pytorch
- Robbin
- Sanne
- mkdir my-computer-vision-project
- cd my-computer-vision-project
- conda create -p=./env --channel=main pytorch python=3.6 pytorch torchvision jupyterlab matplotlib
- Tousif
- Prateek
- Yahua
- Yifang `mkdir my-computer-vision-project` `cd my-computer-vision-project` `conda create --name=env python=3.6 jupyterlab matplotlib`
- Willem it works
- Shengnan
- mkdir my_computer-vision-project
- cd my_computer-vision-project
- conda deactivate
- conda create --prefix=./env --channel=pytorch pytorch python=3.6 jupyter matplotlib
-
## Exercise: create a new environment from a yaml file
Create a new project directory and then create a new environment.yml file inside your project directory with the following contents.
```
name: scikit-learn-env
dependencies:
- ipython=7.13
- matplotlib=3.1
- pandas=1.0
- pip=20.0
- python=3.6
- scikit-learn=0.22
```
Now use this file to create a new Conda environment. Where is this new environment created? Using the same environment.yml file create a Conda environment as a sub-directory called env/ inside a newly created project directory. Compare the contents of the two environments.
- Anne
- Binosha
- Cristian
- code environment.yml
- conda env create --prefix ./scikit-env --file environment.yml
- Damian
- conda env create --prefix ./scikit-env --file ./environment.yaml
now works OK
- Fangxin
- Leila
- Leonardo
- Na
-- conda create --prefix./envs
-- cd envs
-- nano environment.yml
-- git init
-- git add environment.yml
-- conda env update --prefix./env --file environment.yml
- Richard:
- mkdir my-project
- cd my-project
- notepad environment2.yml
- git init
- git add ...
- git commit ...
- conda env create --prefix ./env2 --file environment.yml
-
- Robbin
- Sanne
- Tousif: mkdir my-computer-vision-project cd my-computer-vision-project conda create --name=env python=3.6 jupyterlab matplotlib
-
- Prateek
- Yahua
- Yifang
- Willem Jan: conda env create --name=johan --file enviro.yml; conda activate johan
-
-
- Shengnan
### Sharing environments
```
name: machine-learning-env
dependencies:
- ipython=7.13
- matplotlib=3.1
- pandas=1.0
- pip=20.0
- python=3.6
- scikit-learn=0.22
```
`conda env create --prefix ./env --file environment.yml`
`conda env export --name base`
`conda env export --name machine-learning-env --from-history`
## Tips and tops
Tips (what we can improve on):
- Test the commands in the course better (From Johan)
- It was perhaps very fast and confusing at the end, which I consider one of the most useful parts
Tops (what we did well):
- Very good combination: online doc and presentation on Zoom
- good authentic examples
## 📚 Resources
Link to conda commands: [Conda general commands](https://docs.conda.io/projects/conda/en/latest/commands.html)
[Workshop Survey](https://www.surveymonkey.com/r/656DRSY)
[eScience Center Digital Skills Programme](https://escience-academy.github.io/)