Commit 60e0341

Updating conda section to bring it more in line with our guidance
1 parent c4fabe5 commit 60e0341

1 file changed

+31
-88
lines changed

book/course/conda.md

Lines changed: 31 additions & 88 deletions
@@ -563,7 +563,7 @@ Now, to update the environment from this file, we use the `update` subcommand:
563563
$ conda env update --file environment.yml --prune
564564
```
565565

566-
Note that the environment does **not** need to be active to do this.
566+
Note that the environment does **not** need to be active to do this. Pin the version of any library that you don't want updated (for example, `matplotlib=3.5.1`).
567567

568568
````{admonition} View full output
569569
:class: dropdown
@@ -608,6 +608,8 @@ Executing transaction: done
608608
609609
This ensures that we have an up-to-date record of what we have installed in our project folder.
610610
611+
The `--prune` flag removes installed packages that are no longer listed in `environment.yml`, and is key to keeping your `.conda` folder a reasonable size. **Please make sure you pass `--prune` to prevent environment bloat.**
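As a minimal sketch of how pinning and `--prune` fit together, the script below writes a small `environment.yml` into a temporary folder, lists which entries are pinned, and only calls Conda if you opt in. The file contents reuse the `data-sci-env` example; the `RUN_CONDA` variable and the temporary folder are assumptions made for illustration, not part of Conda.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical scratch folder so the sketch does not touch a real project.
workdir="$(mktemp -d)"

cat > "$workdir/environment.yml" <<'EOF'
name: data-sci-env
channels:
  - conda-forge
dependencies:
  - scikit-learn
  - matplotlib=3.5.1   # pinned: an update should leave this version alone
  - pandas=1.4.3
EOF

# List the entries that carry a version pin (name=version).
pinned="$(grep -E '^[[:space:]]*- [A-Za-z0-9_.-]+=' "$workdir/environment.yml" \
  | sed -E 's/^[[:space:]]*- //; s/[[:space:]]*#.*$//')"
echo "Pinned packages:"
echo "$pinned"

# Only touch a real environment when explicitly asked to (RUN_CONDA is an
# illustrative opt-in switch, not a Conda setting).
if command -v conda >/dev/null 2>&1 && [ "${RUN_CONDA:-0}" = "1" ]; then
  conda env update --file "$workdir/environment.yml" --prune
fi
```

Unpinned entries such as `scikit-learn` are free to move to a newer version when the update runs, while the pinned entries stay where they are.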
612+
611613
(removing-a-conda-environment)=
612614
### Removing a Conda environment
613615
@@ -668,118 +670,56 @@ You cannot undo deletion of an environment to the exact state it was in before d
668670
However, if you have exported details of your environment it is possible to recreate it.
669671
```
670672

671-
(sharing-conda-environments)=
672-
### Sharing Conda environments
673-
674-
If you need to share a Conda environment with others or between machines, it's possible to use Conda to export a file containing a specification of the packages installed in that environment.
675-
With this environment file and Conda installed on another device, it's possible to recreate the environment with the same specifications.
673+
(recording-conda-environments)=
674+
### Recording your Conda environments
676675

677-
Let's assume we want to share our `data-sci-env` Conda environment with others. To do this we first need to create the `environment.yml` file containing our environment specification.
678-
You can create a very detailed specification that includes operating system specific hashes with the command:
676+
Recording dependencies is crucial for reproducibility.
677+
To record the exact versions of all dependencies used in your project (as opposed to the limited list of packages you specified manually in your `environment.yml` file), run the following export command from inside your active Conda environment:
679678

680679
```bash
681680
$ conda activate data-sci-env
682681

683-
(data-sci-env)$ conda env export > environment.yml
682+
(data-sci-env)$ conda env export > env-record.yml
684683
```
685684

686-
Above, we activate the environment we want to create an `environment.yml` file from and then use the command `conda env export`.
687-
This outputs the environment specification to the standard output in the terminal so to capture and write this to a file we redirect the output to `environment.yml`.
688-
689-
This command also exports a line called `prefix:` specifying the directory location of the environment on your filesystem.
690-
This isn't required when sharing your environment and should be removed, you can do this manually or use `grep` when exporting your environment.
685+
This can be run as part of a batch job by including it in your submission script, so that the record is saved alongside your other output data files:
691686

692687
```bash
693-
(data-sci-env)$ conda env export | grep -v ^prefix: > environment.yml
688+
conda env export > /mnt/scratch/users/your-user-name/env-record.yaml
694689
```
695690

696-
We can share the `environment.yml` file with collaborators and/or commit the file to version control to ensure people can recreate the required Conda environment.
691+
**This exported environment file is mainly useful as a record for the sake of reproducibility, not for *reusability*. Your `environment.yml` file is a far better basis for rebuilding or sharing environments.**
697692

698-
You can recreate a Conda environment from a file with the following command:
693+
This record will include background library dependencies (libraries you did not explicitly install, that were loaded automatically) and details of builds. This file, while technically an `environment.yml` file, will likely not be able to rebuild your environment on a machine other than the machine it was created on.
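To make the machine-specific parts of such a record concrete, here is a small sketch that builds a made-up `env-record.yml` and picks out the build strings and the `prefix:` line. The file contents are illustrative assumptions (the `pandas` build string in particular is invented); a real record will list many more packages.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Illustrative record; a real `conda env export` is far longer.
workdir="$(mktemp -d)"
record="$workdir/env-record.yml"
cat > "$record" <<'EOF'
name: data-sci-env
channels:
  - conda-forge
dependencies:
  - python=3.10.4=h12debd9_0
  - pandas=1.4.3=py310_example_0
prefix: /home/your-user-name/.conda/envs/data-sci-env
EOF

# Entries of the form name=version=build carry a platform-specific build string.
with_builds="$(grep -cE '^[[:space:]]*- [^=]+=[^=]+=[^=]+$' "$record")"
echo "Entries with build strings: $with_builds"

# The prefix line records a path on the machine that made the export.
# Dropping it and the build strings shows what is left: name=version only.
trimmed="$(grep -v '^prefix:' "$record" \
  | sed -E 's/^([[:space:]]*- [^=]+=[^=]+)=[^=]+$/\1/')"
echo "$trimmed"
```

Trimming a record like this does not make it portable, because the background dependencies themselves can differ between platforms; it only shows which fields tie the file to one machine. Conda can also leave out build strings itself via `conda env export --no-builds`.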
699694

700-
```bash
701-
$ conda env create -f environment.yml
702-
```
703-
````{admonition} View full output
704-
:class: dropdown
705-
```
706-
Collecting package metadata (repodata.json): done
707-
Solving environment: done
708-
709-
710-
==> WARNING: A newer version of conda exists. <==
711-
current version: 4.12.0
712-
latest version: 4.14.0
713-
714-
Please update conda by running
715-
716-
$ conda update -n base -c defaults conda
717-
718-
719-
Preparing transaction: done
720-
Verifying transaction: done
721-
Executing transaction: done
722-
#
723-
# To activate this environment, use
724-
#
725-
# $ conda activate py39-env
726-
#
727-
# To deactivate an active environment, use
728-
#
729-
# $ conda deactivate
730-
```
731-
````
695+
It's important to consider the balance of reproducibility and portability: `conda env export` captures the exact specification of an environment including all installed packages, their dependencies and package hashes.
696+
Sometimes this level of detail should be included to ensure maximum reproducibility of a project, for example when validating results, but it's important to balance this against allowing people to reproduce your work on other systems. The next section covers portability, or *reusability*, in more detail.
732697

733-
Here we're specifying Conda create a new environment and using the `-f` option to specify that it creates the environment using a file with an environment specification.
734-
We pass the file path to the environment file as the argument following `-f`.
698+
(sharing-conda-environments)=
699+
### Sharing Conda environments
735700

736-
#### Creating a cross platform environment file
701+
The Conda `environment.yml` file is the key to sharing conda environments across systems.
737702

738-
As noted above using `conda env export` creates a highly specific environment file, this often causes difficulties when sharing environments across operating systems as the `environment.yml` contains operating system specific hashes for each package.
703+
If you created your Conda environment from a `.yml` file (and have kept it up-to-date by using it and the `update` command to install new packages), you can share this file with collaborators, and they can use the instructions above to create an environment from file.
739704

740-
There are two possible methods of creating a more flexible `environment.yml`.
705+
If you instead used the on-the-fly creation method and *don't* have an `environment.yml`, it will take a little more work. As stated in the previous section, `conda env export` exports all installed packages, their dependencies, and package hashes, and the resulting file is unlikely to install without error on a different system. So how can we produce a reusable `environment.yml` file?
741706

742-
##### 1. Using `conda env export --from-history`
707+
**If you followed the steps above and built your Conda environment from a `.yml` file, this step is not necessary. However, if you want to salvage, share, or back up an environment that you built using repeated `conda install package-name` commands, this allows you to create an `environment.yml` file.**
743708

744-
By default `conda env export` exports an environment's entire specification, including dependencies of packages you `conda install` and their associated hashes.
745-
If you use `conda env export --from-history` Conda only exports packages explicitly installed with `conda install`.
746-
It does not include dependencies of those packages and therefore allows different operating systems to more flexibly install package dependencies.
709+
Activate your environment and run a modified export:
747710

748-
For the above example with `data-sci-env` we would export a more flexible `environment.yml` with:
749711
```bash
750-
(data-sci-env)$ conda env export --from-history | grep -v ^prefix: > environment.yml
751-
```
752-
753-
##### 2. Manually create an `environment.yml`
712+
$ conda activate data-sci-env
754713

755-
The other option is to manually specify the `environment.yml` file.
756-
This is often more fiddly than just exporting an environment but can be preferable to ensure all the desired dependencies of your project are captured.
757-
Environment files are written in YAML, a markup language, and have the standard pattern of:
758-
```yaml
759-
name: data-sci-env
760-
channels:
761-
- defaults
762-
dependencies:
763-
- scikit-learn
764-
- matplotlib=3.5.1
765-
- pandas=1.4.3
714+
(data-sci-env)$ conda env export --from-history > environment_export.yml
766715
```
767-
Where you specify the environment name, a list of Conda channels used to install packages, and under dependencies a list of packages to be installed. You can also include version specification within the `environment.yml` allowing you to
768-
769-
Understanding the differences between ways to create environment files is important when you come to deciding on how best to share your project.
770-
It's important to consider the balance of reproducibility and portability, `conda env export` captures the exact specification of an environment including all installed packages, their dependencies and package hashes.
771-
Sometimes this level of detail should be included to ensure maximum reproducibility of a project, when looking to validate results, but it's important to also balance being able to allow people to reproduce your work on other systems.
772716

773-
774-
## Using Conda to install packages
775-
776-
With the Conda command line tool, searching for and installing packages can be performed with the following subcommands:
777-
- `conda search`
778-
- `conda install`
717+
This will export a list of only the libraries that you explicitly installed (and not all the background dependencies), and only the pinned versions you requested. This is not useful as a record of your exact environment, but is a good backup for rebuilding or sharing your environment. **Note that this will not add any pip dependencies.** We won't get into mixing in pip dependencies today, but please read our documentation on [how to export a reusable environment file including pip dependencies](https://arcdocs.leeds.ac.uk/aire/usage/dependency_management.html#pip-dependencies).
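One way to use such an export is to check whether a hand-maintained `environment.yml` has drifted from what was explicitly installed. The sketch below compares package names in two small sample files; the file contents and the helper name `list_deps` are hypothetical, chosen only for illustration.

```bash
#!/usr/bin/env bash
set -euo pipefail

workdir="$(mktemp -d)"

# Hand-maintained file (illustrative contents).
cat > "$workdir/environment.yml" <<'EOF'
name: data-sci-env
channels:
  - conda-forge
dependencies:
  - scikit-learn
  - matplotlib=3.5.1
EOF

# Stand-in for the output of `conda env export --from-history`.
cat > "$workdir/environment_export.yml" <<'EOF'
name: data-sci-env
channels:
  - conda-forge
dependencies:
  - scikit-learn
  - matplotlib=3.5.1
  - pandas=1.4.3
prefix: /home/your-user-name/.conda/envs/data-sci-env
EOF

# Print the package names (without version pins) listed under dependencies.
list_deps() {
  sed -n '/^dependencies:/,$p' "$1" \
    | grep -E '^[[:space:]]*- ' \
    | sed -E 's/^[[:space:]]*- //; s/[=<>].*$//' \
    | sort -u
}

# Packages that were explicitly installed but never written to environment.yml.
missing="$(comm -13 <(list_deps "$workdir/environment.yml") \
                    <(list_deps "$workdir/environment_export.yml"))"

if [ -n "$missing" ]; then
  echo "Installed but not listed in environment.yml:"
  echo "$missing"
else
  echo "environment.yml lists every explicitly installed package."
fi
```

Any names reported here are candidates to add to `environment.yml` before sharing it. The comparison ignores version pins and pip-installed packages.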
779718

780719
(searching-for-packages)=
781-
### Searching for packages
720+
## Using Conda to search for packages
782721

722+
We can use the `search` command in Conda to find available package versions:
783723
```bash
784724
$ conda search python
785725
```
@@ -934,8 +874,8 @@ python 3.10.4 h12debd9_0 pkgs/main
934874

935875
This command searches for packages based on the argument provided.
936876
It searches in package repositories called [Conda Channels](https://docs.conda.io/projects/conda/en/stable/user-guide/concepts/channels.html) which are remote websites where built Conda packages have been uploaded to.
937-
By default Conda uses the `defaults` channel which points to the Anaconda maintained package repository https://repo.anaconda.com/pkgs/main and https://repo.anaconda.com/pkgs/r.
938-
Other channels are also available such as [`conda-forge`](https://conda-forge.org/) and we can specify when installing packages or when searching which channels we wish to search.
877+
By default, Conda installed with Miniforge uses the [`conda-forge` channel](https://conda-forge.org/).
878+
If you are using a different install of Conda, you may need to specify this channel. Alternatively, you may need to point to the Bioconda channel.
939879

940880
```bash
941881
$ conda search 'python[channel=conda-forge]'
@@ -1496,6 +1436,8 @@ As you can see in the above example, removing one package may also lead to the r
14961436

14971437
With these changes made we can now install a newer version of pandas using `conda install`.
14981438

1439+
Of course, this can also be easily done by updating our `environment.yml` file to remove the package, and running the `update` command shown above with the flag `--prune`.
1440+
14991441
(updating-a-package)=
15001442
### Updating a package
15011443

@@ -1546,6 +1488,7 @@ Proceed ([y]/n)?
15461488

15471489
When requesting to update a package Conda will also update other dependencies of the package that you wish to update, and can potentially install new packages that are required.
15481490

1491+
Again, this can also be easily done by updating our `environment.yml` file to change the version of a specific package, and running the `update` command shown above with the flag `--prune`.
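The file-based route can be sketched as follows: edit the pin in `environment.yml`, then re-run the update. The sample file, the new version number, and the `RUN_CONDA` opt-in variable are assumptions for illustration; substitute the package and version you actually need.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical scratch copy of an environment file.
workdir="$(mktemp -d)"
env_file="$workdir/environment.yml"
cat > "$env_file" <<'EOF'
name: data-sci-env
channels:
  - conda-forge
dependencies:
  - scikit-learn
  - matplotlib=3.5.1
  - pandas=1.4.3
EOF

# Illustrative new pin; choose the version your project needs.
new_pin="pandas=1.5.3"

# Rewrite only the pandas entry, keeping its indentation.
sed -E "s/^([[:space:]]*- )pandas(=[^[:space:]]*)?[[:space:]]*$/\1${new_pin}/" \
  "$env_file" > "$env_file.new"
mv "$env_file.new" "$env_file"
grep -n 'pandas' "$env_file"

# Apply the change only when explicitly asked to (RUN_CONDA is an
# illustrative opt-in switch, not a Conda setting).
if command -v conda >/dev/null 2>&1 && [ "${RUN_CONDA:-0}" = "1" ]; then
  conda env update --file "$env_file" --prune
fi
```

Editing the file first means the change is recorded in the project folder (and in version control, if the file is committed) before the environment itself is modified.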
15491492

15501493
## Summary
15511494
