From c4fabe51841ba5196a5405aefefe44b10812d82a Mon Sep 17 00:00:00 2001 From: Maeve Murphy Quinlan Date: Mon, 9 Jun 2025 12:10:00 +0100 Subject: [PATCH 1/8] Updating conda section - work in progress - to include better direction on env yml files --- book/course/conda.md | 361 +++++++++++++++++++++++++++++++------------ 1 file changed, 263 insertions(+), 98 deletions(-) diff --git a/book/course/conda.md b/book/course/conda.md index 842ddb2..fd0dc58 100644 --- a/book/course/conda.md +++ b/book/course/conda.md @@ -61,17 +61,42 @@ This allows you to separate the dependencies of different projects cleanly so fo Conda makes it easy to switch between different environments and allows you to create and delete them as required. Conda environments also make it easier to share our environment setup between machines and with collaborators as we can export our environments into a text file. +If you want to find out more about [good dependency management practises in general, please read our documentation](https://arcdocs.leeds.ac.uk/aire/usage/dependency_management.html#dependency-management); we use this material to inform this session but take a more trial-and-error approach here. + ```{admonition} The base environment By default Conda includes the `base` environment. This contains a starting installation of Python and the dependencies of the Conda tool itself. + Therefore, it's **best practice** to not install packages into the `base` environment and create your own environments into which you install the tools you need. + +Installing into the `base` environment can lead to dependency conflicts and prevents you from using different versions of the same package for different projects.
``` + +### General guidelines for handling environments + +While following the steps below to build, experiment with, and then create a reproducible environment, you will hopefully notice the following key principles: + +- **In general, environments should be treated as disposable and rebuildable**: you should be able to tear down and rebuild your environment quickly and easily (of course, some larger environments with complex installations will be an exception to this rule). Ideally, *you won't have to rebuild*, but being able to will save you an awful lot of heartbreak if and when something goes wrong. We'll see how we can use an `environment.yml` file to do this. +- **Export your exact environment as metadata for analysis results**: it is useful to save a snapshot of your environment alongside any results or outputs produced in that specific environment. +- **Environments must be stored in your `home` directory and all research output must be stored in `/mnt/scratch/users`**: misuse of the system can affect performance for **all users** and will lead to your jobs being stopped. + + (creating-environments)= ### Creating environments +There are two main ways to create a fresh Conda environment: + 1. Creating directly from the command line with a list of required packages; + 2. Creating from an `environment.yml` file that lists required packages. + +We will step through examples of both, and compare both techniques. + +#### 1. On the fly creation + +If you have come across Conda before, this is likely the method of creating environments that you've encountered. + You can create an environment with Conda with the subcommand `conda create`. -When creating an environment we need to give it a name; we recommend giving it a name related to the project you're going to use the environment for. +When creating an environment we need to give it a name; we recommend giving it a name related to the project you're going building it to support.
In this example, we use the (unimaginative name) `py39-env` as we're going to be using Python 3.9; you can imagine that if you're working with multiple different versions of Python is could be useful to record this in the environment name, and prefix it with the project title. ```bash $ conda create --name py39-env python=3.9 @@ -292,13 +317,78 @@ Executing transaction: / $ conda install scikit-learn-intelex $ python -m sklearnex my_application.py +done +# +# To activate this environment, use +# +# $ conda activate data-sci-env +# +# To deactivate an active environment, use +# +# $ conda deactivate +``` +```` +With the above command we create a new environment without explicitly asking for Python. +However, because the packages we've specified need Python to run, Conda will install the highest version of Python compatible with them. -done +#### 2. Creation from an `environment.yml` file + +Instead of providing a list of packages as arguments to the Conda command, you can point Conda to a file that lists your dependencies. + +First, you need to create an environment file with the dependencies required, saved with the file extension `.yaml` or `.yml` (usually called `environment.yml`, but it doesn't *have* to be): + +```yml +name: data-sci-env +dependencies: +- scikit-learn +- matplotlib=3.5.1 +- pandas=1.4.2 +``` + +You'll notice that this list has the same dependencies as our earlier on-the-fly example (`conda create --name data-sci-env pandas=1.4.2 matplotlib=3.5.1 scikit-learn`). This file should be saved in the project directory.
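If you would rather create the file from the terminal than in a text editor, one option is a shell heredoc. This is a minimal sketch, assuming a bash-compatible shell and that you run it from inside your project directory; it writes the same three dependencies shown above and then prints the file back so you can check it.

```bash
# Write environment.yml into the current (project) directory.
# Quoting 'EOF' stops the shell from expanding anything inside the block.
cat > environment.yml << 'EOF'
name: data-sci-env
dependencies:
- scikit-learn
- matplotlib=3.5.1
- pandas=1.4.2
EOF

# Print the file back to check its contents
cat environment.yml
```

Note that `>` overwrites any existing `environment.yml` in that directory, so only use this to create a new file.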
+ +Then, we can create a new environment by simply pointing Conda at the environment file: + +```bash +$ conda env create -f environment.yml +``` + +*Note that this second example was run much more recently (2025) than the previous example; can you spot some key differences in the output below?* + + +````{admonition} View full output +:class: dropdown +``` +Retrieving notices: done +Channels: + - conda-forge +Platform: linux-64 +Collecting package metadata (repodata.json): done +Solving environment: done + + +==> WARNING: A newer version of conda exists. <== + current version: 25.3.1 + latest version: 25.5.1 + +Please update conda by running + + $ conda update -n base -c conda-forge conda + + + +Downloading and Extracting Packages: + +[... lots of information on installing packages ...] + +Preparing transaction: done +Verifying transaction: done +Executing transaction: done # # To activate this environment, use # -# $ conda activate data-sci-env2 +# $ conda activate data-sci-env # # To deactivate an active environment, use # @@ -307,12 +397,12 @@ done ```` With the above command we create a new environment but don't specify to install Python. -However, because we've specified Python packages which depend on Python being installed to run Conda will install the high version of Python suitable for these packages. +However, because the packages we've specified need Python to run, Conda will install the highest version of Python compatible with them. (activating-environments)= ### Activating environments -To use a Conda environment we need to activate it. +Regardless of how you created it, you need to activate a Conda environment before you can use it. Activating our environment does a number of steps that sets the terminal we're using up so that it can see all of the installed packages in the environment, making it ready for use.
```bash @@ -350,6 +440,174 @@ data-sci-env /home/home01/arcusers/.conda/envs/data-sci-env ``` ```` +(installing-packages)= +### Updating a Conda environment and installing new packages + +It's very likely that after creating an environment with a certain list of packages, you'll want to add other packages or change the version of a package you have installed. + +Earlier we created the `data-sci-env` environment and installed some useful data science packages. +We've discovered we also need the `statsmodels` package for some extra work we want to do, so we'll look at how to install this package within our existing environment. + +````{admonition} Searching for packages + +Conda has a command-line search functionality that we describe below in the section [Use Conda to search for a package](#); you can also use the [`conda-forge` repository](https://anaconda.org/conda-forge) or [`bioconda` repository](https://anaconda.org/bioconda) to search for packages. +```` + +Once you have the name (and possibly version) of the package you want to install, again there are two different ways to add these packages, much like there were two ways to create the environment to begin with. + +#### 1. On the fly installation of new packages + +You can add new packages directly from the command line using the `install` subcommand with the format `conda install PACKAGE`, where `PACKAGE` is the name of the package you wish to install. + +To install packages into an existing environment we need to activate it with the [subcommand shown above](activating-environments).
+ +```bash +$ conda activate data-sci-env + +(data-sci-env)$ conda install statsmodels +``` +````{admonition} View full output +:class: dropdown +``` +Collecting package metadata (current_repodata.json): done +Solving environment: done + +## Package Plan ## + + environment location: /home/home01/arcuser/.conda/envs/data-sci-env + + added / updated specs: + - statsmodels + + +The following packages will be downloaded: + + package | build + ---------------------------|----------------- + libopenblas-0.3.21 |pthreads_h78a6416_3 10.1 MB conda-forge + numpy-1.23.2 | py310h53a5b5f_0 7.1 MB conda-forge + pandas-1.4.4 | py310h769672d_0 12.5 MB conda-forge + patsy-0.5.2 | pyhd8ed1ab_0 188 KB conda-forge + pytz-2022.2.1 | pyhd8ed1ab_0 224 KB conda-forge + scipy-1.9.1 | py310hdfbd76f_0 26.2 MB conda-forge + statsmodels-0.13.2 | py310hde88566_0 11.2 MB conda-forge + ------------------------------------------------------------ + Total: 67.4 MB + +The following NEW packages will be INSTALLED: + + libblas conda-forge/linux-64::libblas-3.9.0-16_linux64_openblas + libcblas conda-forge/linux-64::libcblas-3.9.0-16_linux64_openblas + libgfortran-ng conda-forge/linux-64::libgfortran-ng-12.1.0-h69a702a_16 + libgfortran5 conda-forge/linux-64::libgfortran5-12.1.0-hdcd56e2_16 + liblapack conda-forge/linux-64::liblapack-3.9.0-16_linux64_openblas + libopenblas conda-forge/linux-64::libopenblas-0.3.21-pthreads_h78a6416_3 + libstdcxx-ng conda-forge/linux-64::libstdcxx-ng-12.1.0-ha89aaad_16 + numpy conda-forge/linux-64::numpy-1.23.2-py310h53a5b5f_0 + packaging conda-forge/noarch::packaging-21.3-pyhd8ed1ab_0 + pandas conda-forge/linux-64::pandas-1.4.4-py310h769672d_0 + patsy conda-forge/noarch::patsy-0.5.2-pyhd8ed1ab_0 + pyparsing conda-forge/noarch::pyparsing-3.0.9-pyhd8ed1ab_0 + python-dateutil conda-forge/noarch::python-dateutil-2.8.2-pyhd8ed1ab_0 + python_abi conda-forge/linux-64::python_abi-3.10-2_cp310 + pytz conda-forge/noarch::pytz-2022.2.1-pyhd8ed1ab_0 + scipy 
conda-forge/linux-64::scipy-1.9.1-py310hdfbd76f_0 + six conda-forge/noarch::six-1.16.0-pyh6c4a22f_0 + statsmodels conda-forge/linux-64::statsmodels-0.13.2-py310hde88566_0 + + +Proceed ([y]/n)? +``` +```` + +Conda will always ask whether we're happy to proceed with the installation, and lists all the other packages that will be installed or updated because our specified package requires them. +We confirm we wish to proceed by entering `y` and pressing Return. + +````{admonition} View full output +:class: dropdown +``` +Proceed ([y]/n)? y + +Downloading and Extracting Packages +pytz-2022.2.1 | 224 KB | ##################################### | 100% +libopenblas-0.3.21 | 10.1 MB | ##################################### | 100% +scipy-1.9.1 | 26.2 MB | ##################################### | 100% +patsy-0.5.2 | 188 KB | ##################################### | 100% +statsmodels-0.13.2 | 11.2 MB | ##################################### | 100% +pandas-1.4.4 | 12.5 MB | ##################################### | 100% +numpy-1.23.2 | 7.1 MB | ##################################### | 100% +Preparing transaction: done +Verifying transaction: done +Executing transaction: done +``` +```` + +This installs any packages that are currently not installed (Conda caches packages locally incase they are required by other packages, this speeds up installs but uses more disk space to maintain this cache). + +#### 2. Updating from an `environment.yml` file + +To update our environment using our environment file, we need to edit the `environment.yml` to include the new packages: + +```yml +name: data-sci-env +dependencies: +- scikit-learn +- matplotlib=3.5.1 +- pandas=1.4.2 +- statsmodels +``` + +Now, to update the environment from this file, we use the `update` subcommand: + +```bash +$ conda env update --file environment.yml --prune +``` + +Note that the environment does **not** need to be active to do this.
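The file above pins `matplotlib` and `pandas` with `=` but leaves `scikit-learn` and `statsmodels` unpinned, so an update may change their versions. As a rough guide to Conda's version-specifier syntax, a few common forms look like the fragment below. This is an illustrative sketch, not the file used in this session; the `scikit-learn` range is an arbitrary example, and the Conda documentation on package match specifications has the full rules.

```yml
name: data-sci-env
dependencies:
- matplotlib=3.5.1         # fuzzy pin: 3.5.1 or any 3.5.1.* version
- pandas==1.4.2            # exact pin: only version 1.4.2
- scikit-learn>=1.0,<1.2   # range: at least 1.0 but below 1.2 (example values)
- statsmodels              # unpinned: Conda may pick a newer version on update
```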
+ +````{admonition} View full output +:class: dropdown +``` +FutureWarning: `remote_definition` is deprecated and will be removed in 25.9. Use `conda env create --file=URL` instead. + action(self, namespace, argument_values, option_string)^ + +Channels: + - conda-forge +Platform: linux-64 +Collecting package metadata (repodata.json): done +Solving environment: done + + +==> WARNING: A newer version of conda exists. <== + current version: 25.3.1 + latest version: 25.5.1 + +Please update conda by running + + $ conda update -n base -c conda-forge conda + + + +Downloading and Extracting Packages: + +Preparing transaction: done +Verifying transaction: done +Executing transaction: done +# +# To activate this environment, use +# +# $ conda activate data-sci-env +# +# To deactivate an active environment, use +# +# $ conda deactivate +``` +```` + +*Note that there is a `FutureWarning` that can safely be ignored as it is not intended to flag use of `environment.yml` files.* + +This ensures that we have an up-to-date record of what we have installed in our project folder. + (removing-a-conda-environment)= ### Removing a Conda environment @@ -1179,99 +1437,6 @@ python 3.10.6 ha86cf86_0_cpython conda-forge ``` ```` -(installing-packages)= -### Installing packages - -Installing packages via Conda is performed using the `install` subcommand with the format `conda install PACKAGE`, where `PACKAGE` is the name of the package you wish to install. - -Earlier we created the `data-sci-env` and installed some useful data science packages. -We've discovered we also need the `statsmodels` package for some extra work we want to do so we'll look at using `conda install` to install this package within our existing environment. - -To install packages into an existing environment we need to activate it with the [subcommand shown above](activating-environments). 
- -```bash -$ conda activate data-sci-env - -(data-sci-env)$ conda install statsmodels -``` -````{admonition} View full output -:class: dropdown -``` -Collecting package metadata (current_repodata.json): done -Solving environment: done - -## Package Plan ## - - environment location: /home/home01/arcuser/.conda/envs/data-sci-env - - added / updated specs: - - statsmodels - - -The following packages will be downloaded: - - package | build - ---------------------------|----------------- - libopenblas-0.3.21 |pthreads_h78a6416_3 10.1 MB conda-forge - numpy-1.23.2 | py310h53a5b5f_0 7.1 MB conda-forge - pandas-1.4.4 | py310h769672d_0 12.5 MB conda-forge - patsy-0.5.2 | pyhd8ed1ab_0 188 KB conda-forge - pytz-2022.2.1 | pyhd8ed1ab_0 224 KB conda-forge - scipy-1.9.1 | py310hdfbd76f_0 26.2 MB conda-forge - statsmodels-0.13.2 | py310hde88566_0 11.2 MB conda-forge - ------------------------------------------------------------ - Total: 67.4 MB - -The following NEW packages will be INSTALLED: - - libblas conda-forge/linux-64::libblas-3.9.0-16_linux64_openblas - libcblas conda-forge/linux-64::libcblas-3.9.0-16_linux64_openblas - libgfortran-ng conda-forge/linux-64::libgfortran-ng-12.1.0-h69a702a_16 - libgfortran5 conda-forge/linux-64::libgfortran5-12.1.0-hdcd56e2_16 - liblapack conda-forge/linux-64::liblapack-3.9.0-16_linux64_openblas - libopenblas conda-forge/linux-64::libopenblas-0.3.21-pthreads_h78a6416_3 - libstdcxx-ng conda-forge/linux-64::libstdcxx-ng-12.1.0-ha89aaad_16 - numpy conda-forge/linux-64::numpy-1.23.2-py310h53a5b5f_0 - packaging conda-forge/noarch::packaging-21.3-pyhd8ed1ab_0 - pandas conda-forge/linux-64::pandas-1.4.4-py310h769672d_0 - patsy conda-forge/noarch::patsy-0.5.2-pyhd8ed1ab_0 - pyparsing conda-forge/noarch::pyparsing-3.0.9-pyhd8ed1ab_0 - python-dateutil conda-forge/noarch::python-dateutil-2.8.2-pyhd8ed1ab_0 - python_abi conda-forge/linux-64::python_abi-3.10-2_cp310 - pytz conda-forge/noarch::pytz-2022.2.1-pyhd8ed1ab_0 - scipy 
conda-forge/linux-64::scipy-1.9.1-py310hdfbd76f_0 - six conda-forge/noarch::six-1.16.0-pyh6c4a22f_0 - statsmodels conda-forge/linux-64::statsmodels-0.13.2-py310hde88566_0 - - -Proceed ([y]/n)? -``` -```` - -Conda will always prompt the user if we're happy to proceed with the installation and specifies all the other packages that will be installed or updated that are required for our specified package. -We confirm we wish to proceed by entering `y` and pressing Return. - -````{admonition} View full output -:class: dropdown -``` -Proceed ([y]/n)? y - -Downloading and Extracting Packages -pytz-2022.2.1 | 224 KB | ##################################### | 100% -libopenblas-0.3.21 | 10.1 MB | ##################################### | 100% -scipy-1.9.1 | 26.2 MB | ##################################### | 100% -patsy-0.5.2 | 188 KB | ##################################### | 100% -statsmodels-0.13.2 | 11.2 MB | ##################################### | 100% -pandas-1.4.4 | 12.5 MB | ##################################### | 100% -numpy-1.23.2 | 7.1 MB | ##################################### | 100% -Preparing transaction: done -Verifying transaction: done -Executing transaction: done -``` -```` - -This installs any packages that are currently not installed (Conda caches packages locally incase they are required by other packages, this speeds up installs but uses more disk space to maintain this cache). 
- (removing-packages)= ### Removing packages From 60e03417e54574d6bdc65963d1441de27e508637 Mon Sep 17 00:00:00 2001 From: Maeve Murphy Quinlan Date: Mon, 9 Jun 2025 15:28:57 +0100 Subject: [PATCH 2/8] Updating conda section to bring it more in line with our guidance --- book/course/conda.md | 119 +++++++++++-------------------------------- 1 file changed, 31 insertions(+), 88 deletions(-) diff --git a/book/course/conda.md b/book/course/conda.md index fd0dc58..08b26ee 100644 --- a/book/course/conda.md +++ b/book/course/conda.md @@ -563,7 +563,7 @@ Now, to update the environment from this file, we us the `update` subcommand: $ conda env update --file environment.yml --prune ``` -Note that the environment does **not** need to be active to do this. +Note that the environment does **not** need to be active to do this. You should pin the version of any library (such as `matplotlib=3.5.1`) that you don't want to be updated. ````{admonition} View full output :class: dropdown @@ -608,6 +608,8 @@ Executing transaction: done This ensures that we have an up-to-date record of what we have installed in our project folder. +The `--prune` flag removes installed packages that are not required by the dependencies listed in your `environment.yml` file, and is key to keeping your `.conda` folder a reasonable size. **Please make sure you use the `--prune` flag to prevent environment bloat.** + (removing-a-conda-environment)= ### Removing a Conda environment @@ -668,118 +670,56 @@ You cannot undo deletion of an environment to the exact state it was in before d However, if you have exported details of your environment it is possible to recreate it. ``` -(sharing-conda-environments)= -### Sharing Conda environments - -If you need to share a Conda environment with others or between machines its possible to use Conda to export a file containing a specification of packages installed in that environment. -With this environment file and Conda installed on another device its possible to recreate the environment with the same specifications.
+(recording-conda-environments)= +### Recording your Conda environments -Let's assume we want to share our `data-sci-env` Conda environment with others. To do this we first need to create the `environment.yml` file containing our environment specification. -You can create a very detailed specification that includes operating system specific hashes with the command: +Recording dependencies is crucial for reproducibility. +To record the exact versions of all dependencies used in your project (as opposed to the limited list you specified manually in your `environment.yml` file), run the following export command from inside your active Conda environment: ```bash $ conda activate data-sci-env -(data-sci-env)$ conda env export > environment.yml +(data-sci-env)$ conda env export > env-record.yml ``` -Above, we activate the environment we want to create an `environment.yml` file from and then use the command `conda env export`. -This outputs the environment specification to the standard output in the terminal so to capture and write this to a file we redirect the output to `environment.yml`. - -This command also exports a line called `prefix:` specifying the directory location of the environment on your filesystem. -This isn't required when sharing your environment and should be removed, you can do this manually or use `grep` when exporting your environment. +This can be run as part of a batch job and included in your submission script; so that it's saved out alongside your other output data files: ```bash -(data-sci-env)$ conda env export | grep -v ^prefix: > environment.yml +conda env export > /mnt/scratch/users/your-user-name/env-record.yaml ``` -We can share the `environment.yml` file with collaborators and/or commit the file to version control to ensure people can recreate the required Conda environment. +**This exported environment file is mainly useful as a record for the sake of reproducibility, not for *reusability*.
Your `environment.yml` file is a far better basis for rebuilding or sharing environments.** -You can recreate a Conda environment from a file with the following command: +This record will include background library dependencies (libraries you did not install explicitly but that were pulled in automatically) and details of builds. This file, while technically an `environment.yml` file, will likely fail to rebuild your environment on any machine other than the one it was created on. -```bash -$ conda env create -f environment.yml -``` -````{admonition} View full output -:class: dropdown -``` -Collecting package metadata (repodata.json): done -Solving environment: done - - -==> WARNING: A newer version of conda exists. <== - current version: 4.12.0 - latest version: 4.14.0 - -Please update conda by running - - $ conda update -n base -c defaults conda - - -Preparing transaction: done -Verifying transaction: done -Executing transaction: done -# -# To activate this environment, use -# -# $ conda activate py39-env -# -# To deactivate an active environment, use -# -# $ conda deactivate -``` -```` +It's important to consider the balance of reproducibility and portability: `conda env export` captures the exact specification of an environment including all installed packages, their dependencies and package hashes. +Sometimes this level of detail is needed to ensure maximum reproducibility of a project, for example when validating results, but it's important to balance this against allowing people to reproduce your work on other systems. The next section discusses portability, or *reusability*, in more detail. -Here we're specifying Conda create a new environment and using the `-f` option to specify that it creates the environment using a file with an environment specification. -We pass the file path to the environment file as the argument following `-f`.
+(sharing-conda-environments)= +### Sharing Conda environments -#### Creating a cross platform environment file +The Conda `environment.yml` file is the key to sharing Conda environments across systems. -As noted above using `conda env export` creates a highly specific environment file, this often causes difficulties when sharing environments across operating systems as the `environment.yml` contains operating system specific hashes for each package. +If you created your Conda environment from a `.yml` file (and have kept it up to date by adding new packages to it and running the `update` command), you can share this file with collaborators, and they can use the instructions above to create an environment from the file. -There are two possible methods of creating a more flexible `environment.yml`. +If you instead used the on-the-fly creation method and *don't* have an `environment.yml`, it will take a little more work. As we stated in the last section, `conda env export` exports all installed packages, their dependencies, and package hashes, and its output is unlikely to install without errors on a different system. So how can we produce a reusable `environment.yml` file? -##### 1. Using `conda env export --from-history` +**If you followed the above steps for building your Conda environment from a `.yml` file, this step is not necessary. However, if you want to salvage, share, or back up an environment that you built using repeated `conda install package-name` commands, the following export lets you create an `environment.yml` file.** -By default `conda env export` exports an environments entire specification, including dependencies of packages you `conda install` and their associated hashes. -If you use `conda env export --from-history` Conda only exports packages explicitly installed with `conda install`. -It does not include dependencies of those packages and therefore allows different operating systems to more flexibly install package dependencies.
+Activate your environment and run a modified export: -For the above example with `data-sci-env` we would export a more flexible `environment.yml` with: ```bash -(data-sci-env)$ conda env export --from-history | grep -v ^prefix: > environment.yml -``` - -##### 2. Manually create an `environment.yml` +$ conda activate data-sci-env -The other option is to manually specify the `environment.yml` file. -This is often more fiddly than just exporting an environment but can be preferable to ensure all the desired dependencies of your project are captured. -Environment files are written in YAML, a markup language, and have the standard pattern of: -```yaml -name: data-sci-env -channels: -- defaults -dependencies: -- scikit-learn -- matplotlib=3.5.1 -- pandas=1.4.3 +(data-sci-env)$ conda env export --from-history > environment_export.yml ``` -Where you specify the environment name, a list of Conda channels used to install packages, and under dependencies a list of packages to be installed. You can also include version specification within the `environment.yml` allowing you to - -Understanding the differences between weays to create environment files is important when you come to deciding on how best to share your project. -It's important to consider the balance of reproducibility and portability, `conda env export` captures the exact specification of an environment including all installed packages, their dependencies and package hashes. -Sometimes this level of detail should be included to ensure maximum reproduciblity of a project, when looking to validate results, but it's important to also balance being able to allow people to reproduce your work on other systems. 
- -## Using Conda to install packages - -With the Conda command line tool searching for and installing packages is can be performed with the following subcommands: -- `conda search` -- `conda install` +This exports only the packages that you explicitly installed (not all the background dependencies), with versions pinned only where you requested them. This is not useful as a record of your exact environment, but is a good backup for rebuilding or sharing your environment. **Note that this will not include any pip dependencies.** We won't get into mixing in pip dependencies today, but please read our documentation for [how to export a reusable environment file including pip dependencies](https://arcdocs.leeds.ac.uk/aire/usage/dependency_management.html#pip-dependencies). (searching-for-packages)= -### Searching for packages +## Using Conda to search for packages +We can use the `search` command in Conda to find available package versions: ```bash $ conda search python ``` @@ -934,8 +874,8 @@ python 3.10.4 h12debd9_0 pkgs/main This command searches for packages based on the argument provided. It searches in package repositories called [Conda Channels](https://docs.conda.io/projects/conda/en/stable/user-guide/concepts/channels.html) which are remote websites where built Conda packages have been uploaded to. -By default Conda uses the `defaults` channel which points to the Anaconda maintained package repository https://repo.anaconda.com/pkgs/main and https://repo.anaconda.com/pkgs/r. -Other channels are also available such as [`conda-forge`](https://conda-forge.org/) and we can specify when installing packages or when searching which channels we wish to search. +If Conda was installed with Miniforge, it uses the [`conda-forge` channel](https://conda-forge.org/) by default. +If you are using a different installation of Conda, you may need to specify this channel explicitly. For some packages, particularly bioinformatics tools, you may also need to point Conda at the Bioconda channel.
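In an `environment.yml` file, channels are listed under a `channels:` key, and channels listed earlier are generally preferred; on the command line, `conda create` and `conda install` accept `-c CHANNEL` instead (for example `conda install -c conda-forge statsmodels`). A sketch of the earlier `data-sci-env` file with channels added, assuming you want `conda-forge` first and `bioconda` as a second source (only list the channels your packages actually need):

```yml
name: data-sci-env
channels:
- conda-forge
- bioconda
dependencies:
- scikit-learn
- matplotlib=3.5.1
- pandas=1.4.2
- statsmodels
```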
```bash $ conda search 'python[channel=conda-forge]' @@ -1496,6 +1436,8 @@ As you can see in the above example, removing one package may also lead to the r With these changes made we can now install a newer version of pandas using `conda install`. +Of course, this can also be easily done by updating our `environment.yml` file to remove the package, and running the `update` command shown above with the flag `--prune`. + (updating-a-package)= ### Updating a package @@ -1546,6 +1488,7 @@ Proceed ([y]/n)? When requesting to update a package Conda will also update other dependencies of the package that you wish to update, and can potentially install new packages that are required. +Again, this can also be easily done by updating our `environment.yml` file to change the version of a specific package, and running the `update` command shown above with the flag `--prune`. ## Summary From 92203d662450c9de736b64f79f476f3d62513828 Mon Sep 17 00:00:00 2001 From: Maeve Murphy Quinlan Date: Mon, 9 Jun 2025 15:49:50 +0100 Subject: [PATCH 3/8] Update book/course/conda.md Co-authored-by: John Hodrien --- book/course/conda.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/book/course/conda.md b/book/course/conda.md index 08b26ee..1c17d39 100644 --- a/book/course/conda.md +++ b/book/course/conda.md @@ -61,7 +61,7 @@ This allows you to separate the dependencies of different projects cleanly so fo Conda makes it easy to switch between different environments and allows you to create and delete them as required. Conda environments also make it easier to share our environment setup between machines and with collaborators as we can export our environments into a text file. -If you want to find out more about [good dependency management practises in general, please read our documentation](https://arcdocs.leeds.ac.uk/aire/usage/dependency_management.html#dependency-management); we use this material to inform this session but take a more trial-and-error approach here. 
+If you want to find out more about [good dependency management practices in general, please read our documentation](https://arcdocs.leeds.ac.uk/aire/usage/dependency_management.html#dependency-management); we use this material to inform this session but take a more trial-and-error approach here. ```{admonition} The base environment By default Conda includes the `base` environment. From a184f47fd47c0cd3a5b81a8f0ec838c83a7c6b96 Mon Sep 17 00:00:00 2001 From: Maeve Murphy Quinlan Date: Mon, 9 Jun 2025 15:50:16 +0100 Subject: [PATCH 4/8] Update book/course/conda.md Co-authored-by: John Hodrien --- book/course/conda.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/book/course/conda.md b/book/course/conda.md index 1c17d39..d7c988e 100644 --- a/book/course/conda.md +++ b/book/course/conda.md @@ -96,7 +96,7 @@ We will step through examples of both, and compare both techniques. If you have come across Conda before, this is likely the method of creating environments that you've encountered. You can create an environment with Conda with the subcommand `conda create`. -When creating an environment we need to give it a name; we recommend giving it a name related to the project you're going building it to support. In this example, we use the (unimaginative name) `py39-env` as we're going to be using Python 3.9; you can imagine that if you're working with multiple different versions of Python is could be useful to record this in the environment name, and prefix it with the project title. +When creating an environment we need to give it a name; we recommend giving it a name related to the project you're building it to support. In this example, we use the (unimaginative) name `py39-env` as we're going to be using Python 3.9; you can imagine that if you're working with multiple different versions of Python it could be useful to record this in the environment name, and prefix it with the project title.
 
 ```bash
 $ conda create --name py39-env python=3.9

From 9f71b87180d6ea1b0d8686f85ac3bd2b47daf9c0 Mon Sep 17 00:00:00 2001
From: Maeve Murphy Quinlan
Date: Mon, 9 Jun 2025 15:53:39 +0100
Subject: [PATCH 5/8] Update book/course/conda.md

---
 book/course/conda.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/book/course/conda.md b/book/course/conda.md
index d7c988e..d0491f4 100644
--- a/book/course/conda.md
+++ b/book/course/conda.md
@@ -450,7 +450,7 @@ We've discovered we also need the `statsmodels` package for some extra work we w
 
 ````{admonition} Searching for packages
 
-Conda has a command-line search functionality that we describe below in the section [Use Conda to search for a package](#); you can also use the [`conda-forge` repository](https://anaconda.org/conda-forge) or [`bioconda` repository](https://anaconda.org/bioconda) to search for packages.
+Conda has a command-line search functionality that we describe below in the section [Use Conda to search for a package](#searching-for-packages); you can also use the [`conda-forge` repository](https://anaconda.org/conda-forge) or [`bioconda` repository](https://anaconda.org/bioconda) to search for packages.
 ````
 
 Once you have the name (and possibly version) of the package you want to install, again there are two different ways to add these packages, much like there were two ways to create the environment to begin with.
From 32e60dd60a5a4ac10d3ed6a36c57b5e6ab7c8ad0 Mon Sep 17 00:00:00 2001
From: Maeve Murphy Quinlan
Date: Mon, 9 Jun 2025 15:54:12 +0100
Subject: [PATCH 6/8] Update book/course/conda.md

Co-authored-by: John Hodrien
---
 book/course/conda.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/book/course/conda.md b/book/course/conda.md
index d0491f4..e107c07 100644
--- a/book/course/conda.md
+++ b/book/course/conda.md
@@ -542,7 +542,7 @@ Executing transaction: done
 ```
 ````
 
-This installs any packages that are currently not installed (Conda caches packages locally incase they are required by other packages, this speeds up installs but uses more disk space to maintain this cache).
+This installs any packages that are currently not installed (Conda caches packages locally in case they are required by other packages, this speeds up installs but uses more disk space to maintain this cache).
 
 #### 2. Updating from an `environment.yml` file
 

From bb218bd72ca78bd89a7d28c9f5b771fdb4a27489 Mon Sep 17 00:00:00 2001
From: Maeve Murphy Quinlan
Date: Mon, 9 Jun 2025 15:54:21 +0100
Subject: [PATCH 7/8] Update book/course/conda.md

Co-authored-by: John Hodrien
---
 book/course/conda.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/book/course/conda.md b/book/course/conda.md
index e107c07..b6009b9 100644
--- a/book/course/conda.md
+++ b/book/course/conda.md
@@ -685,7 +685,7 @@ $ conda activate data-sci-env
 This can be run as part of a batch job and included in your submission script; so that it's saved out alongside your other output data files:
 
 ```bash
-conda env export > /mnt/scratch/users/your-user-name/env-record.yaml
+conda env export > $SCRATCH/env-record.yaml
 ```
 
 **This exported environment file is mainly useful as a record for the sake of reproducibility, not for *reusability*. Your `environment.yml` file is a far better basis for rebuilding or sharing environments.**

From 9fad84633d2359e788b61e60380692eeca938db2 Mon Sep 17 00:00:00 2001
From: Maeve Murphy Quinlan
Date: Mon, 9 Jun 2025 15:56:31 +0100
Subject: [PATCH 8/8] Change yml to yaml

---
 book/course/conda.md | 44 ++++++++++++++++++++++----------------------
 1 file changed, 22 insertions(+), 22 deletions(-)

diff --git a/book/course/conda.md b/book/course/conda.md
index b6009b9..1781336 100644
--- a/book/course/conda.md
+++ b/book/course/conda.md
@@ -77,7 +77,7 @@ Installing into the `base` environment can lead to dependency conflicts and prev
 
 While following the steps below to build, experiment with, and then create a reproducible environment, you will hopefully notice the following key principles:
 
-- **In general, environments should be treated as disposable and rebuildable**: you should be able to tear down and rebuild your environment quickly and easily (of course, some larger environments with complex installations will be an exception to this rule). Ideally, *you won't have to rebuild*, but being able to will save you an awful lot of heartbreak if and when something goes wrong. We'll see how we can use an `environment.yml` file to do this.
+- **In general, environments should be treated as disposable and rebuildable**: you should be able to tear down and rebuild your environment quickly and easily (of course, some larger environments with complex installations will be an exception to this rule). Ideally, *you won't have to rebuild*, but being able to will save you an awful lot of heartbreak if and when something goes wrong. We'll see how we can use an `environment.yaml` file to do this.
 - **Export your exact environment as metadata for analysis results**: it is useful to save a snapshot of your environment to store along any results or outputs produced in that specific environment.
 - **Environments must be stored in your `home` directory and all research output must be stored in `/mnt/scratch/users`**: misuse of the system can affect performance for **all users** and will lead to your jobs being stopped.
 
@@ -87,7 +87,7 @@ While following the steps below to build, experiment with, and then create a rep
 
 There are two main ways to create a fresh Conda environment:
  1. Creating directly from the command line with a list of required packages;
- 2. Creating from an `environment.yml` file that lists required packages.
+ 2. Creating from an `environment.yaml` file that lists required packages.
 
 We will step through examples of both, and compare both techniques.
 
@@ -332,13 +332,13 @@ done
 With the above command we create a new environment but don't specify to install Python.
 However, because we've specified Python packages which depend on Python being installed to run Conda will install the highest version of Python suitable for these packages.
 
-#### 2. Creation from an `environment.yml` file
+#### 2. Creation from an `environment.yaml` file
 
 Instead of providing a list of packages as arguments to the Conda command, you can instead point Conda to a file that lists your dependencies.
 
-First, you need to create an environment file with the dependencies required, saved with the file extension `.yaml` or `.yml` (usually called `environment.yml`, but it doesn't *have* to be):
+First, you need to create an environment file with the dependencies required, saved with the file extension `.yaml` or `.yml` (usually called `environment.yaml`, but it doesn't *have* to be):
 
-```yml
+```yaml
 name: data-sci-env
 dependencies:
   - scikit-learn
@@ -351,7 +351,7 @@ You'll note that this list has the same dependencies as our on-the-fly example p
 Then, we can create a new environment by simply pointing Conda at the environment file:
 
 ```bash
-$ conda env create -f environment.yml
+$ conda env create -f environment.yaml
 ```
 
 *Note that this second example was run much more recently (2025) than the previous example; can you spot some key differences in the output below?*
@@ -544,11 +544,11 @@ Executing transaction: done
 
 This installs any packages that are currently not installed (Conda caches packages locally in case they are required by other packages, this speeds up installs but uses more disk space to maintain this cache).
 
-#### 2. Updating from an `environment.yml` file
+#### 2. Updating from an `environment.yaml` file
 
-To update our environment using our environment file, we need to edit the `environment.yml` to include the new packages:
+To update our environment using our environment file, we need to edit the `environment.yaml` to include the new packages:
 
-```yml
+```yaml
 name: data-sci-env
 dependencies:
   - scikit-learn
@@ -560,7 +560,7 @@ dependencies:
 Now, to update the environment from this file, we us the `update` subcommand:
 
 ```bash
-$ conda env update --file environment.yml --prune
+$ conda env update --file environment.yaml --prune
 ```
 
 Note that the environment does **not** need to be active to do this. You should pin any versions of libraries (such as `matplotlib=3.5.1`) that you don't want to update.
@@ -604,7 +604,7 @@ Executing transaction: done
 ```
 ````
 
-*Note that there is a `FutureWarning` that can safely be ignored as it is not intended to flag use of `environment.yml` files.*
+*Note that there is a `FutureWarning` that can safely be ignored as it is not intended to flag use of `environment.yaml` files.*
 
 This ensures that we have an up-to-date record of what we have installed in our project folder.
 
@@ -674,12 +674,12 @@ However, if you have exported details of your environment it is possible to recr
 ### Recording your Conda environments
 
 Recording dependencies is crucial for reproducibility.
-In order to record the exact versions of all dependencies used in your project (as opposed to the limited list you manually installed with your `envrionment.yml` file), from inside your active conda environment, you can run the following export command:
+In order to record the exact versions of all dependencies used in your project (as opposed to the limited list you manually installed with your `envrionment.yaml` file), from inside your active conda environment, you can run the following export command:
 
 ```bash
 $ conda activate data-sci-env
 
-(data-sci-env)$ conda env export > env-record.yml
+(data-sci-env)$ conda env export > env-record.yaml
 ```
 
 This can be run as part of a batch job and included in your submission script; so that it's saved out alongside your other output data files:
@@ -688,9 +688,9 @@ This can be run as part of a batch job and included in your submission script;
 conda env export > $SCRATCH/env-record.yaml
 ```
 
-**This exported environment file is mainly useful as a record for the sake of reproducibility, not for *reusability*. Your `environment.yml` file is a far better basis for rebuilding or sharing environments.**
+**This exported environment file is mainly useful as a record for the sake of reproducibility, not for *reusability*. Your `environment.yaml` file is a far better basis for rebuilding or sharing environments.**
 
-This record will include background library dependencies (libraries you did not explicitly install, that were loaded automatically) and details of builds. This file, while technically an `environment.yml` file, will likely not be able to rebuild your environment on a machine other than the machine it was created on.
+This record will include background library dependencies (libraries you did not explicitly install, that were loaded automatically) and details of builds. This file, while technically an `environment.yaml` file, will likely not be able to rebuild your environment on a machine other than the machine it was created on.
 
 It's important to consider the balance of reproducibility and portability: `conda env export` captures the exact specification of an environment including all installed packages, their dependencies and package hashes.
 Sometimes this level of detail should be included to ensure maximum reproduciblity of a project and when looking to validate results, but it's important to also balance being able to allow people to reproduce your work on other systems. The next section talks about portability or re*use*ability more.
@@ -698,20 +698,20 @@ Sometimes this level of detail should be included to ensure maximum reproducibli
 (sharing-conda-environments)=
 ### Sharing Conda environments
 
-The Conda `environment.yml` file is the key to sharing conda environments across systems.
+The Conda `environment.yaml` file is the key to sharing conda environments across systems.
 
-If you created your Conda environment from a `.yml` file (and have kept it up-to-date by using it and the `update` command to install new packages), you can share this file with collaborators, and they can use the instructions above to create an environment from file.
+If you created your Conda environment from a `.yaml` file (and have kept it up-to-date by using it and the `update` command to install new packages), you can share this file with collaborators, and they can use the instructions above to create an environment from file.
 
-If you instead used the on-the-fly creation method and *don't* have an `environment.yml`, it will take a little bit more work. As we stated in the last section, using `conda env export` will export all installed packages, their dependencies, and package hashes, and will be unlikely to install without error on a different system. So how can we produce a reuseable `environment.yml` file?
+If you instead used the on-the-fly creation method and *don't* have an `environment.yaml`, it will take a little bit more work. As we stated in the last section, using `conda env export` will export all installed packages, their dependencies, and package hashes, and will be unlikely to install without error on a different system. So how can we produce a reuseable `environment.yaml` file?
 
-**If you follow the above steps for building your conda environment from a `.yml` file, this step is not necessary. However, if you want to salvage, share, or back-up an environment that you built using repeated `conda install package-name` commands, this allows you to create an `environment.yml` file.**
+**If you follow the above steps for building your conda environment from a `.yaml` file, this step is not necessary. However, if you want to salvage, share, or back-up an environment that you built using repeated `conda install package-name` commands, this allows you to create an `environment.yaml` file.**
 
 Activate your environment and run a modified export:
 
 ```bash
 $ conda activate data-sci-env
 
-(data-sci-env)$ conda env export --from-history > environment_export.yml
+(data-sci-env)$ conda env export --from-history > environment_export.yaml
 ```
 
 This will export a list of only the libraries that you explicitly installed (and not all the background dependencies), and only the pinned versions you requested. This is not useful as a record of your exact environment, but is a good backup for rebuilding or sharing your environment. **Note that this will not add any pip dependencies: to find out more about pip dependencies.** We won't get into mixing in pip dependencies today, but please read our documentation for [how to export a reuseable environment file including pip dependencies](https://arcdocs.leeds.ac.uk/aire/usage/dependency_management.html#pip-dependencies).
@@ -1436,7 +1436,7 @@ As you can see in the above example, removing one package may also lead to the r
 
 With these changes made we can now install a newer version of pandas using `conda install`.
 
-Of course, this can also be easily done by updating our `environment.yml` file to remove the package, and running the `update` command shown above with the flag `--prune`.
+Of course, this can also be easily done by updating our `environment.yaml` file to remove the package, and running the `update` command shown above with the flag `--prune`.
 
 (updating-a-package)=
 ### Updating a package
@@ -1488,7 +1488,7 @@ Proceed ([y]/n)?
 
 When requesting to update a package Conda will also update other dependencies of the package that you wish to update, and can potentially install new packages that are required.
 
-Again, this can also be easily done by updating our `environment.yml` file to change the version of a specific package, and running the `update` command shown above with the flag `--prune`.
+Again, this can also be easily done by updating our `environment.yaml` file to change the version of a specific package, and running the `update` command shown above with the flag `--prune`.
 
 ## Summary
 