---
tags: Conda
---
https://carpentries-incubator.github.io/introduction-to-conda-for-data-scientists/
# Introduction to Conda for (Data) Scientists
## 8th January 2021 9-12.30 CET
> Note: **Be aware that below is a mix of how it is suggested in the material and how we will do it. To be cleaned to leave only the 'how we do it' after everyone agreed that we do it that way** [name=Samantha]
### Timetable (preliminary suggestion, to be tested)
> Note: below suggested to switch 'using packages and channels' with 'sharing environments'  [name=samantha] 
| Time | Lesson | Teacher |
| -------- | -------- | ------- |
|9-10 | Getting started, Working with Environments pt1 | Naoe, Samantha, Anne |
|10-10.15| Break | |
|10.15-11.15| Working with Environments pt2* , Using packages and channels pt1*| Anne |
|11.15-11.30| Break | |
|11.30 - 12.00| Using packages and channels pt2*, Sharing environments* | Anne, Samantha |
\* placement to be tested
### More detailed timetable (preliminary suggestion)
> Note: below suggested to switch 'using packages and channels' with 'sharing environments'  [name=samantha] 
> :+1: I think the switch makes sense[name=naoe]
| Time | Lesson, Exercise |
| -------- | -------- |
|**9-9.10**| **Welcome, Code of conduct, why we are here and what we do**|
| | **Getting started, Working with Environments pt1** |
|9.10 - 9.30| Getting started|
|9.30 - 9.45| Working with environments until end of installing packages into environments|
|9.45 - 10| 'Say hi to your breakoutroom friends', Creating, activating, deactivating exercises|
|**10-10.15**|**Break** |
| |**Working with Environments pt2, Using packages and channels pt1** |
|10.15 - 10.50| Working with environments to the end|
|10.50 - 11.05| Installing packages, listing, deleting exercises|
|11.05 - 11.15| Using packages and channels pt1  |
|**11.15-11.30**|**Break**|
||**Using packages and channels pt 2, Sharing environments**|
|11.30 - 11.50| Using packages and channels pt 2, channel exercise as demo|
|11.50 - 12.05| Sharing environments|
|12.05 - 12.20| Creating and updating from yml exercises|
|**12.20 - 12.30**| **Feedback, ending**|
## Exercise sessions
* 9.45 - 10: 'Say hi to your breakoutroom friends', Creating, activating, deactivating exercises
    * https://carpentries-incubator.github.io/introduction-to-conda-for-data-scientists/02-working-with-environments/index.html
    * Creating a new environment
    * Activate an existing environment by name
    * Deactivate the active environment
* 10.50 - 11.05: Installing packages, listing, deleting exercises
    * https://carpentries-incubator.github.io/introduction-to-conda-for-data-scientists/02-working-with-environments/index.html
    * Installing a package into a specific environment
    * Creating a new environment as a sub-directory within a project directory
    * Activate an existing environment by path
    * Listing the contents of a particular environment.
    * Delete an entire environment
* DEMO by the instructor: 
    * Specifying channels when installing packages 
    * ? Alternative syntax for installing packages from specific channels?
* 12.05 - 12.20: Creating and building from yml exercise
    * https://carpentries-incubator.github.io/introduction-to-conda-for-data-scientists/03-sharing-environments/index.html
    * Create a new environment from a YAML file.
    * Add Dask to the environment to scale up your analytics
* Optional: xx
## Times (as suggested in lesson material):
1. Getting started with Conda 10 + 5 min
2. Working with Environments 60 + 15 min
3. Sharing Environments 30 + 15 min
4. Using packages and channels 20 + 10 min
5. Managing GPU dependencies 45 + 15 min
## Times in our preliminary timetable (lesson + exercise)
1. Getting started with Conda 30 min
2. Working with Environments 95 min
3. Using packages and channels 30 min
4. Sharing Environments 30 min
## Teaching (as suggested in lesson material, ~~strike~~ is our suggestion to leave out):
> Note: all times and content as suggested in lesson material
**1. Getting started with Conda 10 min**
What is Conda?
Why should I use a package and environment management system?
Why use Conda (+pip)?
**2. Working with Environments 60 min**
What is a Conda environment
Creating environments
Activating an existing environment
Deactivate the current environment
Installing a package into an existing environment
Where do Conda environments live?
How do I specify a location for a Conda environment?
Listing existing environments
Listing the contents of an environment
Deleting entire environments
**3. Sharing Environments 30 min**
Working with environment files
~~Making Jupyter aware of your Conda environments~~
**4. Using packages and channels 20 min**
What are Conda packages?
How do I install a package from a specific channel?
What are Conda channels?
My package isn’t available on the defaults channel! What should I do?
What actually happens when I install packages?
~~5. (Managing GPU dependencies)~~
## Exercises:
(all times and content as suggested in lesson material)
> Note : checkboxes indicate suggestions on which exercises to do/demo [name=samantha] 
**1. Getting started with Conda  5 min**
- [x] Discussion benefits vs costs for having envs for each project
**2. Working with Environments  15 min**
- [x] Creating new environment
- [x] Activate environment
- [x] Deactivate active environment
- [x] Installing a package into a specific environment
- [ ] Installing packages into Conda environments using pip
- [x] Creating a new environment as a sub-directory within a project directory
    - See a note about `tensorflow` version. 
- [x] Activate an existing environment by path
- [ ] Conda can create environments for R projects too!
 
- [x] Delete an entire environment
**3. Sharing Environments  15 min**
- [x] Create a new environment from a YAML file.
    - `xgboost=1.0` does not work. See note below for further details.
- [x] Add Dask to the environment to scale up your analytics
    - not sure about this one. Helpful, not helpful, what do you think?
    - [name=Radovan] I think it's helpful because I didn't know how to modify an existing environment when adding files to it. This is a nice alternative to remove+recreate.
- [ ] Create a kernel for a Conda environment
**4. Using packages and channels 10 min**
- [x] Specifying channels when installing packages (demo)
- [ ] Alternative syntax for installing packages from specific channels
    - instead maybe an overview of different ways of specifying channels
        - conda-forge::xgboost
        - --channel conda-forge
        - adding conda-forge to defaults
        - ...? 
---
## Notes
* [name=Samantha]
    * mention that Anaconda is not the only way and how these are related
        * virtualenv
        * pip
        * docker
        * pipenv
        * venv
        * Poetry?
        * pyenv?
        * ...
        * this maybe in the beginning or the end??
        
    * how to use Anaconda Navigator, and what it's good for
    
    * show some troubleshooting?
        * check same package in other channel
        * check lower version
        * lower python version
        * search on https://anaconda.org/anaconda/
    * hints on how to fix 'could not satisfy dependencies'
        * downgrade python?
        * ...
    * a note on source activate ? 
        * https://docs.conda.io/projects/conda/en/latest/release-notes.html#id225
    * terminal always starts in base environment, how to not let it?
        * ```conda config --set auto_activate_base false``` or
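        * ```auto_activate_base: false``` in .condarc (```changeps1: False``` only hides the environment name in the prompt, it does not stop base from being activated)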
    * Where do Conda environments live?
        * MacOS, Linux: ```which python``` from within environment (/Users/username/anaconda/bin/python, /home/username/anaconda/bin/python)
        * Windows: ```where python``` from within env (C:\Users\username\Anaconda3\python.exe)
    * citing Anaconda in paper (change as needed)
        * Anaconda Software Distribution. Computer software. Vers. 2-2.4.0. Anaconda, Nov. 2016. Web. <https://anaconda.com>
        
    * condarc: https://conda.io/projects/conda/en/latest/user-guide/configuration/use-condarc.html#searching-for-condarc
    
    * there are a few comments on version control in the note boxes. Maybe mention the basics of it in the beginning?
    * how to create environment file without all dependencies but just eg pandas=1.2.3 and matplotlib=1.2.4 
        * -> conda env export --from-history (this also solves OS interoperability issue?)
    * summary in the end with everything important (similar to: https://docs.conda.io/projects/conda/en/latest/user-guide/tasks/manage-environments.html) 
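
For the "Where do Conda environments live?" point above, a platform-independent alternative to `which python` / `where python` is to ask the interpreter itself. This is only a sketch: the function name `describe_active_environment` is made up for illustration, and `CONDA_PREFIX` / `CONDA_DEFAULT_ENV` are the variables that `conda activate` sets, so both are `None` outside an activated Conda environment.

```python
import os
import sys


def describe_active_environment():
    """Report where the running Python lives and what conda says is active."""
    return {
        # Full path of the interpreter that is running this script.
        "python_executable": sys.executable,
        # Root directory of the environment the interpreter belongs to.
        "prefix": sys.prefix,
        # Set by `conda activate`; None if no Conda environment is active.
        "conda_prefix": os.environ.get("CONDA_PREFIX"),
        "conda_env_name": os.environ.get("CONDA_DEFAULT_ENV"),
    }


if __name__ == "__main__":
    for key, value in describe_active_environment().items():
        print(f"{key}: {value}")
```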
    
---
*  [name=Naoe]
    *  Setup
        *  Is everything (all the commands) in the following episodes done in the directory (workspace) made in the last part of the [setup](https://carpentries-incubator.github.io/introduction-to-conda-for-data-scientists/setup/)? [name=davidrpugh] Yes. I have added a note at the end of the setup instructions to make this clear.
        *  Some who installed Anaconda a while ago may run into problems because conda becomes unacceptably slow (which I experienced). In my case, uninstalling Anaconda completely and installing Miniconda instead solved the performance problem. [name=davidrpugh] Yes. I have added a note in the setup instructions mentioning this possibility and encouraging a fresh install of Anaconda/Miniconda.
        *  (Mac) [Installing miniconda](https://conda.io/projects/conda/en/latest/user-guide/install/macos.html) does not specify `activate`, but it seems both this installation guide and the lesson material ([ep2, "Returning to the base environment"](https://carpentries-incubator.github.io/introduction-to-conda-for-data-scientists/02-working-with-environments/index.html)) assume that `conda init` was run at some point of the installation process. I tried installing Miniconda using the shell script (not the pkg installer) but was not asked to run `conda init`. 
    
    *  Ep. 2: Exercise "Creating a new environment as a sub-directory within a project directory"
        *  The latest version of `tensorflow` that can be installed via `conda` is 2.0.0 ([name=naoe] failed with `tensorflow=2.1` as it is written). 
        *  It is, however, possible to install a higher version with `pip` (e.g. after installing 2.0.0 via `conda`, running `pip install --upgrade tensorflow` upgrades `tensorflow` to 2.4.0, but the system then seemed to hang. Probably this is related to ep.3 "While you should never version control the contents of your env/ environment sub-directory"?) 
        *  Ref: [Issue #40](https://github.com/carpentries-incubator/introduction-to-conda-for-data-scientists/issues/40)
        * [name=Samantha] on Ubuntu 16 with anaconda3 it works, **on macOS Mojave it does not**; on Windows 10 it took a few minutes to solve the environment but worked (with miniconda it would have needed to download 450MB)
        * [name=davidrpugh] Not sure what the best solution is here. Support for TF on Mac is pretty poor relative to Linux and Windows. Simplest approach would be to leave off the version number and allow Conda to pick the most recent version of TF available for the OS consistent with the constraints on the other versions. Alternatively, the focus of this episode and exercise is installing using `--prefix` into a sub-directory: complexity of TF is a distraction to this objective. Probably better to defer discussion of TF envs until the episode on GPUs. Thoughts?
    
    *  Ep. 3: Exercise "Create a new environment from a YAML file."
        *  `xgboost=1.0` is not found. Removing "=1.0" does not work either.
        *  `conda search xgboost` returns no match found for `xgboost`, but `*xgboost*` returns many variations, including `py-xgboost`.
        *  By changing `xgboost=1.0` to `py-xgboost`, it went fine.
        *  After this, `xgboost` was found in channel `conda-forge`
        *  XGBoost documentation's [installation instruction](https://xgboost.readthedocs.io/en/latest/build.html) seems to support `pip3 install xgboost`
        *  As of 29 Dec. 2020, no relevant issue has been filed.
        *  [name=Samantha] same here. py-xgboost without fixed version is found, not xgboost as is, only from conda-forge channel
        *  [name=davidrpugh] The reason that I didn't use `py-xgboost` is that it installs a very old version of `xgboost`. The best way to install `xgboost` with conda is to install it from the `conda-forge` channel. To avoid having to introduce the channels concept, I can simply not use `xgboost` in this exercise, as the point is to show how to create an env from an `environment.yml` file. Thoughts?
    *  Ep. 3: Exercise "Add Dask to the environment to scale up your analytics"
        *  the solution shows `conda env update --prefix ./env --file environment.yml --force`, but shouldn't it be `conda env update --prefix ./env --file environment.yml --prune`? If it is `conda env create`, then it should be `--force` at the end to make it from scratch?
            *  [name=Samantha] true, ```conda update``` does not have a ```--force``` argument, only ```conda create``` does. ```conda update``` only has ```--force-reinstall```, which may result in the same thing; ```--prune``` may then still be added to remove unnecessary packages. However, both ```conda create ... --force``` and ```conda update ... --prune``` would work here: the first one removes and builds everything from scratch, the second one just finds the packages that are not yet installed and adds them. The latter may result in unsatisfiable-dependency issues, so it may be better to suggest doing the whole thing from scratch (?) The difference between ```conda update... --force-reinstall``` and ```conda create ... --force``` is that the first one reinstalls all user-requested packages (all that are mentioned in the yml; packages that are no longer in the yml or that have been installed by conda install stay as they are), while the latter creates a completely new environment. Everything that is not in the environment.yml but for whatever reason was in the environment in question before will not be there anymore after this operation. 
            *  [name=davidrpugh] All true and very well said! I have fixed the typo in the solution to this exercise and provided the alternative solution using `conda env update` command.
        *  What is the difference between `conda install --prefix ./env dask=2.16` and specifying `dask` in `environment.yml` and update the environment? 
            *  [name=Samantha] as I understand it: for you, right now, there is no difference. But if you created the environment from yml, then you do conda install something later. This later installed package will not be in your yml (and with that, if the yml is under version control also not under version control) unless you create a new one (easy to forget). If you put it right away into the yml you are ready to share it without creating a new yml. 
            *  [name=davidrpugh] All true and very well said. I tend to avoid installing packages into existing envs directly (unless I am doing quick prototyping). Installing into existing envs may result in changes to the contents of the environment (unless you use the `--freeze-installed` option).
    *  Ep. 4
        *  What if a package written in environment.yml is not found via the `defaults` channel but can be found via `conda-forge`? There is a description of what to write when packages need to be installed via `pip`, but information is missing about how to change the channel in environment.yml. 
            *  [name=Samantha] this also does not work for me with '=' under the pip requirements, it suggests to use '==' instead, does it work for you with '=' as written in the exercise?
                *  [name=Naoe] No, '=' did not work for me, either. On the other hand, I understood how to specify a channel (e.g. `conda-forge`) in `environment.yml`
            * see also below
            * [name=davidrpugh] Yes. I have not put an explicit description of how to add channel priorities to environment.yml files. I have opened an [issue](https://github.com/carpentries-incubator/introduction-to-conda-for-data-scientists/issues/44) for this.
        *  As one way to specify a channel to install a package `channel::package=version` is used without explicit explanation in `kaggle` example under "My package isn’t available on the defaults channel! What should I do?" section. This syntax is explained within an exercise "Alternative syntax for installing packages from specific channels" at the bottom. If the command above is used in a demo, probably learners will wonder about the difference. 
        *  [name=davidrpugh] Probably the best solution here would be to change the exercise into a callout box.
- [name=Radovan]
  - "Working with Environments", "Activating an existing environment": I think it would be better if we activated and deactivated the environment that we just created (machine-learning-env) instead of the one that maybe only the instructor created.
    - Aha OK, later we practice this. I see. 
  - "use the deactivate command" -> "use the conda deactivate command"
  - "To simply return" -> "To return" (avoid all "simple/simply")
  - Creating an environment file: "if we intended this environment file to be used to create an environment inside a sub-directory call ./env of the project directory, then we should set then name key to null as follows"
    - I don't understand the "null". For me this works fine with a name and an environment file with a "null" name I find strange.
  - Formatting problems:
    - "A [conda package][conda-pkg-docs] is a compressed tarball"
    - "Again from the [Conda documentation][conda-channels-docs]" 
  - Suggestion to rename/explain "tarball" to "archive" (when I started learning Linux I did not understand the meaning of the word "tarball")
  - Flowchart in "What actually happens when I install packages?" too tiny
  - Missing (or I missed it?): how to specify channel in `environment.yml` since this is advocated earlier as the best practice compared to installing "interactively".
    - And now I see it was suggested by others further down in this document. 
---
## Issues (to be filed?)
### binder not working
done: https://github.com/carpentries-incubator/introduction-to-conda-for-data-scientists/issues/43
### xgboost:
* xgboost or xgboost=1.0 cannot be found in (at that point of the workshop) current/default channels
* it can be found in conda-forge channel (up to version 1.3)
* from anaconda channel it can be found via py-xgboost (version < 1.0)
*  In the xgboost documentation, installation via pip3 is recommended: https://xgboost.readthedocs.io/en/latest/build.html
suggestions to fix 
* using py-xgboost (no need to introduce channels yet, nor to go into too much detail about pip's requirements.txt)
* introducing channels first, then use conda-forge
* using pip from within environment.yml (to not have to introduce channels yet): new environment.yml for the exercise:
```
name: xgboost-env
dependencies:
  - ipython=7.13
  - matplotlib=3.1
  - pandas=1.0
  - pip=20.0
  - pip:
    - xgboost==1.0 
  - python=3.6
  - scikit-learn=0.22
```
>  :arrow_up: Naoe got `pip failed`. Part of the error message reads: "XGBoostLibraryNotFound: Cannot find XGBoost Library in the candidate path, did you install compilers and run build.sh in root path?". Using `=` instead of `==` (as the lesson shows in ep. 4) failed as well; the error message asked whether `==` was meant instead of `=`. Installing without specifying a version of xgboost went well, however. Using `conda-forge::xgboost=1.3` worked well, too. The installed xgboost was version 1.3.1, but specifying `xgboost==1.3` gave the error message "ERROR: Could not find a version that satisfies the requirement xgboost==1.3".
However, this may be confusing as well, since pip needs '==' instead of '=' (it also takes longer), and this approach is presented in the following episode's exercise (Installing via pip in environment.yml files).
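
One possible source of the `=` / `==` confusion: in conda, `xgboost=1.3` matches any 1.3.x release, while pip's `xgboost==1.3` asks for exactly 1.3. That may explain why `xgboost==1.3` was rejected although 1.3.1 had been installed. The pip spelling of a prefix match is `==1.3.*`. The helper below (hypothetical name `conda_spec_to_pip`, not part of conda or pip) sketches that translation for the simple `name=version` case only.

```python
def conda_spec_to_pip(spec):
    """Translate a simple conda 'name=version' spec into pip syntax."""
    name, separator, version = spec.partition("=")
    if not separator:
        # No version constraint at all, e.g. "xgboost".
        return name
    if version.startswith("="):
        # Conda's '==' is already an exact match, same as pip's '=='.
        return f"{name}=={version[1:]}"
    if version.endswith("*"):
        # Wildcard already written out, e.g. "xgboost=1.3.*".
        return f"{name}=={version}"
    # Conda's single '=' is a prefix match; pip needs an explicit '.*' for that.
    return f"{name}=={version}.*"


if __name__ == "__main__":
    for spec in ["xgboost", "xgboost=1.3", "xgboost==1.0"]:
        print(spec, "->", conda_spec_to_pip(spec))
```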
In dask exercise similar: dask-xgboost not in default channel, but in conda-forge or via pip https://github.com/dask/dask-xgboost
* [name=Samantha] suggests: using py-xgboost (easiest to grasp), but also hinting at using pip from within environment.yml (sometimes necessary) and, when channels come up, hinting that adding conda-forge to the default channels may be a solution. About the concern that way 2 is presented in the following episode, I suggest switching the episode ‘using packages and channels’ with ‘sharing environments’.
*  [name=Naoe] agrees with Samantha's suggestion above using py-xgboost. Also switching ep. 3 and 4, but "Installing via pip in environment.yml files", which is in ep 4 now, should wait until ep. 3. 
### conda update/conda create
* Ep3: ```conda update``` does not have a ```--force``` argument, but ```--force-reinstall```, which is different from ```conda create ... --force``` (the first only reinstalls what is given in the yml and leaves other packages (that may have been installed via conda install) as is; the latter removes everything and creates a new environment with only the packages mentioned in the yml (?))
* Tested updating env by removing scikit-learn from environment.yml and running the command ```conda env update --prefix ./env --file environment.yml --prune``` . It does not remove scikit-learn package. But it works for changing the version of the package. Are some of the packages downloaded by default?
### adding channels to environment.yml
adding an example for adding channels in environment.yml and also a hint that it's possible to add channels 'forever' to the default channels via e.g. condarc (tbd: find easiest, conda config?) 
```
name: xx
channels:
  - conda-forge
  - defaults
dependencies:
  - ...
```
### clarify the use of channels in examples
conda install conda-forge::kaggle=1.5 vs conda install -c/--channel conda-forge kaggle=1.5
* [name=Naoe] Is the `-c channel` syntax more often used when one wants to specify more than one channel, so that priority is given to the first (leftmost) one, while the `channel::package` syntax allows only one channel for installing the package? I am not sure if this is true, but if so, this should probably also be clarified. 
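
The two spellings above can be put side by side. The sketch only assembles the command lines as strings and does not call conda; the helper names are hypothetical. It illustrates that `--channel` can be repeated for several channels, while `channel::package` ties one package to one channel.

```python
def install_with_channel_flag(spec, channels):
    """Build `conda install --channel <ch> ... <spec>`, one flag per channel."""
    command = ["conda", "install"]
    for channel in channels:
        command += ["--channel", channel]
    return command + [spec]


def install_with_channel_prefix(spec, channel):
    """Build `conda install <channel>::<spec>`, naming one channel for this package."""
    return ["conda", "install", f"{channel}::{spec}"]


if __name__ == "__main__":
    print(" ".join(install_with_channel_flag("kaggle=1.5", ["conda-forge"])))
    print(" ".join(install_with_channel_prefix("kaggle=1.5", "conda-forge")))
```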
---
Chat 04.01
* switching ‘using packages and channels’ with ‘sharing environments’ ok? :heavy_check_mark: 
* 15 min in beginning for welcome and hackmd intro :heavy_check_mark: 
* no breakoutroom for first session :heavy_check_mark: 
* in beginning one line about self, adding programming language (OS in pre-workshop survey)
* first breakout room longer with introduction-round :heavy_check_mark: 
* PRs
    * ~~clarify exercises discussed above~~
    * ~~different ways of adding channels, also yml~~
    * ~~clarify conda update/create~~
    * ~~xgboost (leave out xgboost)~~
    * ~~small things as mentioned by Radovan above~~
    * clarify specify channel exercise
        * channel priority? first resolves dependencies
        * don't mix channels
* who teaches what :heavy_check_mark: 
* other options and how they co-exist :heavy_check_mark:
* leave out pip ex
* ~~pip/conda what to choose, how can they work together~~
* discussion points
    * why have yml
* Installation instructions 
    * link to CR instructions on workshop homepage
    * PR sent
___
Meeting with David 05.01.20 (notes)
* switch episodes 3 and 4 possible
-> we make PR (Anne knows what needs to be changed to make it possible?)
* multiple smaller PRs preferred
* pip issue
-> it is a big source for failure to debug
-> make more clear why pip in conda is a very good idea
-> target audience may not use conda for python
-> leave out in episode 1, explain more in episode 3
-> PR: clarify pip usage
* tensorflow issue
-> leave out tf for all episodes before 5
* additions welcome for generalizing episode 5 (after friday workshop)
* 2 future episodes on conda build for specific software and one on conda and docker 
* conda env export --from-history
-> good option
-> possible reproducibility issues (different low level packages for different OS, as it is picked by Conda)
* conda environment in jupyter
-> moving to separate episode in future
* binder
-> failsafe backstop, if all else failed for in-person workshops
-> online: encouraged to use binder
-> for friday: installation help on thursday -> encourage to install on own machine, but we can use binder as backup plan
* for the future: if >100 users, notify binder deploy to change kubernetes settings so that all users can use it at same time
* installation instructions: update according to SC gapminder or older/ code refinery have also anaconda installation instructions, anaconda/miniconda encouragement
-> miniconda only: pushes people to create many small environments, rather than one gigantormus environment
* feedback wanted:
    * after action report
    * interest: broadening appeal of these lessons
    * painpoints that seem to limit applicability 
    * opinionated stuff
    * debriefing meeting after workshop
 
---
## Feedback to share with David
* suggestion to make it a two halfday workshop
    * intro on day one as we did episodes 1-4 without the jupyter part
    * more advanced stuff such as episode 5 and jupyter and conda build on day 2
    * each day 
* lists of points to observe during the workshop (suggested by Toby)
    * amount of time used to teach each section
        * Ep1+2 until "Deactivate the current environment"= approx. 65 min
        * Rest of Ep2 from "Installing a package into an existing environment" = approx. 80 min
        * We somewhat rushed through Ep3 and Ep4 in the remaining time.
    * amount of time used for each exercise
        * 15 min for the first exercises
        * 5 min for the second set, which was not enough
        * 10 min (? correct me if Naoe is wrong)
    * technical issues that arose during installation
        * One learner did "SUDO" and destroyed (?) her PC environment. 
    * bugs or parts of the lesson code that didn’t work as expected
    * incorrect or missing exercise solutions
        * (already fixed, I think)
    * questions learners asked (and their answers)
        * Written in [HackMD note](https://hackmd.io/QL3ngJArRSm9QsaXeCH7ow#Introduction-to-conda-for-data-Scientists)
    * parts of the lesson that were confusing for learners
        * package management and environment management
        * "why pip"? in the first parts
* Feedback from Radovan (who served as a helper):
    * this was fun
    * material is really useful and good already
    * would be nice to have a cheatsheet/ summary of commands somewhere (sorry if it is already there and i did not see)
    * examples can use smaller libraries to not take forever
    * exercises can be made less "mechanically repetitive" and optional "more difficult" exercises can be added
* Samantha
    * using tabs for exercises (basic, advanced, more advanced), with the latter two being hidden and activated when needed by learner
    * also for examples, first give only basic environment with small packages, in tabs provide more useful ones (eg a machine learning environment)
    * agree with Radovan on cheatsheet (happy to create one :) )
    * moving pip to much later in the workshop (eg with channel introduction as own section)