diff --git a/sites/docs/src/content/docs/contributing/documentation.md b/sites/docs/src/content/docs/contributing/documentation.md
index c8c09be455..0cf169e710 100644
--- a/sites/docs/src/content/docs/contributing/documentation.md
+++ b/sites/docs/src/content/docs/contributing/documentation.md
@@ -38,7 +38,7 @@ Before you start writing, familiarize yourself with these essential resources:
 
 ### Style guide
 
-The [style guide](../developers/documentation/style_guide) covers all the essential styling rules for nf-core documentation, including:
+The [style guide](../developers/documentation/style_guide.md) covers all the essential styling rules for nf-core documentation, including:
 
 - Voice and tone guidelines for conversational, concise writing
 - Grammar and punctuation rules (British English, active voice, Oxford comma)
diff --git a/sites/docs/src/content/docs/running/configuration/nextflow-for-your-system.md b/sites/docs/src/content/docs/running/configuration/nextflow-for-your-system.md
index 1313d60a28..27acabc617 100644
--- a/sites/docs/src/content/docs/running/configuration/nextflow-for-your-system.md
+++ b/sites/docs/src/content/docs/running/configuration/nextflow-for-your-system.md
@@ -9,17 +9,19 @@ This page shows you how to configure pipelines to match your system's capabiliti
 
 ## Workflow resources
 
-The base configuration of nf-core pipelines defines default resource allocations for each workflow step (e.g., in the [`base.config`](https://github.com/nf-core/rnaseq/blob/master/conf/base.config) file).
+The base configuration of nf-core pipelines defines default resource allocations for each workflow step (for example, in the [`base.config`](https://github.com/nf-core/tools/blob/main/nf_core/pipeline-template/conf/base.config) file).
 These default values are generous to accommodate diverse workloads across different users.
-Your jobs might receive more resources than needed, which can reduce system efficiency.
-You might also want to increase resources for specific tasks to maximise speed.
+However, your jobs might receive more resources than needed, which can reduce system efficiency.
+Conversely, you might also want to increase resources for specific tasks to maximise speed.
 Consider increasing resources if a pipeline step fails with a `Command exit status` of `137`.
 
-Pipelines configure tools to use available resources when possible (e.g., with `-p ${task.cpus}`), where `${task.cpus}` is dynamically set from the pipeline configuration.
+:::note
+Pipelines configure tools to use available resources when possible (e.g., with `-p ${task.cpus}`), where `${task.cpus}` is dynamically set from the pipeline configuration.
 Not all tools support dynamic resource configuration.
+:::
 
-Most process resources use process labels, as shown in this base configuration example:
+Most nf-core pipelines use process labels to define resource requirements for each module, as shown in this base configuration example:
 
 ```groovy
 process {
@@ -48,7 +50,7 @@ process {
 The `resourceLimits` list sets the absolute maximum resources any pipeline job can request (typically matching your machine's maximum available resources).
 The label blocks define the initial default resources each pipeline job requests.
 
-When a job runs out of memory, most nf-core pipelines retry the job and increase the resource request up to the `resourceLimits` maximum.
+When a job runs out of memory, most nf-core pipelines will attempt to retry the job and increase the resource request up to the `resourceLimits` maximum.
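+
+For example, the escalation typically relies on Nextflow's `task.attempt` counter. The snippet below is a simplified sketch of the idiom used in many pipelines' `conf/base.config`; the label name and numbers are illustrative, not taken from a specific pipeline:
+
+```groovy
+process {
+    // Retry on exit codes that usually indicate the job was killed for exceeding resources
+    errorStrategy = { task.exitStatus in ((130..145) + 104) ? 'retry' : 'finish' }
+    maxRetries    = 1
+
+    withLabel: process_high {
+        // The first attempt requests 72 GB; a retry doubles the request,
+        // always capped by the resourceLimits values.
+        cpus   = { 12    * task.attempt }
+        memory = { 72.GB * task.attempt }
+        time   = { 16.h  * task.attempt }
+    }
+}
+```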
 
 ### Customize process resources
 
@@ -56,7 +58,7 @@ When a job runs out of memory, most nf-core pipelines retry the job and increase
 Copy only the labels you want to change into your custom configuration file, not all labels.
 :::
 
-To set a fixed memory allocation for all large tasks across most nf-core pipelines (without increases during retries), add this to your custom configuration file:
+To set a fixed memory allocation for all large tasks across most nf-core pipelines (without increases during retries), add this to a custom Nextflow configuration file:
 
 ```groovy
 process {
@@ -66,6 +68,10 @@ process {
 }
 ```
 
+:::tip
+To find the default labels and resources of the pipeline you want to optimise, go to its GitHub repository and look at the file `conf/base.config`.
+:::
+
 You can target a specific process (job) name instead of a label using `withName`.
 Find process names in your console log when the pipeline runs.
 For example:
@@ -88,13 +94,9 @@ process {
 }
 ```
 
-:::info
-If you receive a warning about an unrecognised process selector, check that you specified the process name correctly.
-:::
-
 For more information, see the [Nextflow documentation](https://www.nextflow.io/docs/latest/config.html#process-selectors).
 
-After writing your [configuration file](#custom-configuration-files), supply it to your pipeline command with `-c`.
+After writing your [configuration file](#custom-configuration-files), supply it to your pipeline command with `-c path/to/custom.config`.
 
 :::warning
 Check your syntax carefully.
@@ -105,11 +107,16 @@ Use quotes with a space or no quotes with a dot: `"200 GB"` or `200.GB`.
 See the Nextflow documentation for [memory](https://www.nextflow.io/docs/latest/process.html#memory), [cpus](https://www.nextflow.io/docs/latest/process.html#cpus), and [time](https://www.nextflow.io/docs/latest/process.html#time).
 :::
 
+:::info
+If you receive a warning about an unrecognised process selector when running a pipeline, check that you specified the process name correctly.
+:::
+
 If the pipeline defaults need adjustment, contact the pipeline developers on Slack in the pipeline channel or submit a GitHub issue on the pipeline repository.
 
 ## Change your executor
 
-Nextflow pipelines run in local mode by default, executing jobs on the same system where Nextflow runs.
+Nextflow pipelines run in 'local' mode by default, executing jobs on the same system where Nextflow runs and assuming all the tools the pipeline needs are already available on the machine's `$PATH`.
+
 Most users need to specify an executor to tell Nextflow how to submit jobs to a job scheduler (e.g., SGE, LSF, Slurm, PBS, or AWS Batch).
 
 You can configure the executor in shared configuration profiles or in custom configuration files.
@@ -135,12 +142,12 @@ process {
 ```
 
 When a job exceeds the default memory request, Nextflow retries the job with increased memory.
-The memory increases with each retry until the job completes or reaches the `256.GB` limit.
-
-These parameters cap resource requests to prevent Nextflow from submitting jobs that exceed your system's capabilities.
+The memory increases with each retry until the job completes or reaches one of the `resourceLimits` values, such as `256.GB` for memory.
 
+:::warning
 Specifying resource limits does not increase the resources available to pipeline tasks.
 See [Tuning workflow resources](#tuning-workflow-resources) for more information.
+:::
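+
+As a usage sketch, launching a pipeline with such a custom configuration might look like this (the file name `mycluster.config` and the chosen profile are illustrative):
+
+```bash
+nextflow run nf-core/rnaseq \
+    -profile singularity \
+    -c mycluster.config \
+    --input samplesheet.csv \
+    --outdir results
+```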
 
 :::note{collapse title="Note on older nf-core pipelines"}
@@ -166,8 +173,10 @@ The `--max_<resource>` parameters represent the maximum for a single pipeline jo
 
 ## Customize Docker registries
 
-Most pipelines use `quay.io` as the default Docker registry for Docker and Podman images.
-When you specify a Docker container without a full URI, Nextflow pulls the image from `quay.io`.
+Most nf-core pipelines use `quay.io` as the default Docker registry for Docker and Podman images.
+In some cases, you may want to customise where a pipeline sources its images.
+
+By default, when you specify a Docker container without a full URI, Nextflow pulls the image from `quay.io`.
 
 For example, this container specification:
 
@@ -177,15 +186,11 @@ Pulls from `quay.io`, resulting in the full URI:
 
 - `quay.io/biocontainers/fastqc:0.11.7--4`
 
-If you specify a different `docker.registry` value, Nextflow uses that registry instead.
+If you specify a different `docker.registry` value in a configuration file, Nextflow uses that registry instead.
 For example, if you set `docker.registry = 'myregistry.com'`, the image pulls from:
 
 - `myregistry.com/biocontainers/fastqc:0.11.7--4`
 
-When you specify a full URI in the container specification, Nextflow ignores the `docker.registry` setting and pulls exactly as specified:
-
-- `docker.io/biocontainers/fastqc:v0.11.9_cv8`
-
 ## Update tool versions
 
 The [Nextflow DSL2](https://www.nextflow.io/docs/latest/dsl2.html) implementation of nf-core pipelines uses one container or Conda environment per process, which simplifies software dependency maintenance and updates.
@@ -232,13 +237,12 @@ You can override the default container by creating a custom configuration file a
 ```
 
 :::warning
-Pipeline developers provide no warranty when you update containers.
-Major changes in the container tool may break the pipeline.
+When you specify a full URI in the container specification, Nextflow ignores the `docker.registry` setting and pulls exactly as specified.
 :::
 
 :::warning
-Tool developers sometimes change version reporting between updates.
-Container updates may break version reporting within the pipeline and create missing values in MultiQC version tables.
+Pipeline developers provide no warranty when you update containers.
+Major changes in the container tool may break the pipeline or degrade its output (e.g., missing version information in MultiQC reports when a tool changes how it reports its version).
 :::
 
 ## Modifying tool arguments
diff --git a/sites/docs/src/content/docs/running/configuration/overview.md b/sites/docs/src/content/docs/running/configuration/overview.md
index a902f7b334..4546718895 100644
--- a/sites/docs/src/content/docs/running/configuration/overview.md
+++ b/sites/docs/src/content/docs/running/configuration/overview.md
@@ -21,11 +21,14 @@ For pipeline-specific parameters, see the pipeline documentation.
 
 ## Configuration options
 
-You can configure pipelines using three approaches:
+You can configure pipelines to your infrastructure using three approaches:
 
-1. [Default pipeline configuration profiles](#default-configuration-profiles)
-2. [Shared nf-core/configs configuration profiles](#shared-nf-coreconfigs)
-3. [Custom configuration files](#custom-configuration-files)
+1. [Default configuration profiles](#default-configuration-profiles)
+2. [Shared nf-core/configs](#shared-nf-coreconfigs)
+3. [Custom configuration files](#custom-configuration-files)
 
 :::warning{title="Do not edit the pipeline code to configure nf-core pipelines"}
 Editing pipeline defaults prevents you from updating to newer versions without overwriting your changes.
@@ -48,7 +51,7 @@ Use shared nf-core/configs when:
 
 Use custom configuration files when:
 
-- You need specific resource limits
+- You need pipeline-specific resource limits
 - Running on unique infrastructure
 - You are the only user of the pipeline
 
@@ -65,6 +68,7 @@ Order matters.
 Profiles load in sequence.
 Later profiles overwrite earlier ones.
 :::
 
 nf-core provides these basic profiles for container engines:
 
 - `docker`: Uses [Docker](http://docker.com/) and pulls software from quay.io
@@ -78,7 +82,7 @@ nf-core provides these basic profiles for container engines:
 Use Conda only as a last resort (that is, when you cannot run the pipeline with Docker or Singularity).
 :::
 
-Without a specified profile, the pipeline runs locally and expects all software to be installed and available on the `PATH`.
+Without a specified profile, the pipeline runs locally and expects all software to be installed and available on the `$PATH`.
 This approach is not recommended.
 
 Each pipeline includes `test` and `test_full` profiles.
@@ -97,9 +101,11 @@ If not, follow the repository instructions or the tutorial to add your cluster.
 
 ### Custom configuration files
 
-If you run the pipeline alone, create a local configuration file.
+If you run the pipeline alone on a local machine, create a local configuration file.
 
 Nextflow searches for configuration files in three locations:
 
 1. User's home directory: `~/.nextflow/config`
 2. Analysis working directory: `nextflow.config`
 3. Custom path on the command line: `-c path/to/config` (you can specify multiple files)
@@ -113,11 +119,15 @@ The loading order is:
 
 4. Each `-c` file in the order you specify
 5. Command line parameters (`--`)
 
 :::warning
 Parameters in `custom.config` files will not override defaults in `nextflow.config`.
 Use `-params-file` with YAML or JSON format instead.
 :::
 
 :::tip
 Generate a parameters file using the **Launch** button on the [nf-co.re website](https://nf-co.re/launch).
 :::
diff --git a/sites/docs/src/content/docs/running/reference-genomes.md b/sites/docs/src/content/docs/running/reference-genomes.md
index dcbbeb23c1..058167181d 100644
--- a/sites/docs/src/content/docs/running/reference-genomes.md
+++ b/sites/docs/src/content/docs/running/reference-genomes.md
@@ -7,9 +7,60 @@ shortTitle: Reference genomes
 
 Many nf-core pipelines use reference genomes for alignment, annotation, and similar tasks.
 This page describes available approaches for managing reference genomes.
+There are three main ways to use reference genomes with nf-core pipelines:
+
+- [Local copies of genomes](#local-copies-of-genomes): user-downloaded and self-managed
+- [AWS iGenomes](#aws-igenomes): Illumina-hosted pre-built reference genomes and indices
+- [Refgenie](#refgenie): programmatic genome asset management tool
+
+## Local copies of genomes
+
+Most genomics nf-core pipelines can start from just a FASTA and GTF file and create downstream reference assets (genome indices, interval files, etc.) as part of pipeline execution.
+
+Using GRCh38 as an example:
+
+1. Download the latest files:
+
+   ```bash
+   #!/bin/bash
+
+   VERSION=108
+   wget -L ftp://ftp.ensembl.org/pub/release-$VERSION/fasta/homo_sapiens/dna/Homo_sapiens.GRCh38.dna_sm.primary_assembly.fa.gz
+   wget -L ftp://ftp.ensembl.org/pub/release-$VERSION/gtf/homo_sapiens/Homo_sapiens.GRCh38.$VERSION.gtf.gz
+   ```
+
+2. Run the pipeline with `--save_reference` to generate indices:
+
+   ```bash
+   nextflow run \
+       nf-core/rnaseq \
+       --input samplesheet.csv \
+       --fasta Homo_sapiens.GRCh38.dna_sm.primary_assembly.fa.gz \
+       --gtf Homo_sapiens.GRCh38.108.gtf.gz \
+       --save_reference
+   ```
+
+   :::note
+   The pipeline will generate and save reference assets. For example, the STAR index will be stored in `<outdir>/genome/index/star`.
+   :::
+
+3. Move the generated assets to a central, persistent storage location for re-use in future runs.
+4. Use the pre-generated indices in future runs:
+
+   ```bash
+   nextflow run \
+       nf-core/rnaseq \
+       --input samplesheet.csv \
+       --fasta Homo_sapiens.GRCh38.dna_sm.primary_assembly.fa.gz \
+       --gtf Homo_sapiens.GRCh38.108.gtf.gz \
+       --star_index <path/to/star_index> \
+       --gene_bed <path/to/gene_bed>
+   ```
+
 ## AWS iGenomes
 
-AWS iGenomes is Illumina's centralized resource that organizes commonly used reference genome files in a consistent structure for multiple genomes:
+AWS iGenomes is Illumina's centralised resource that organises commonly used reference genome and pre-built index files in a consistent structure for multiple genomes.
+It provides the following benefits:
 
 - Hosted on AWS S3 through the [Registry of Open Data](https://registry.opendata.aws/aws-igenomes/)
 - Free to access and download
@@ -25,39 +76,37 @@ Consider using custom genomes for current annotations.
 
 :::warning{title="GRCh38 assembly issues"}
 GRCh38 in iGenomes comes from NCBI instead of Ensembl, not the masked Ensembl assembly.
-This can cause pipeline issues in some cases.
-See [nf-core/rnaseq issue #460](https://github.com/nf-core/rnaseq/issues/460) for details.
+This can cause pipeline issues in some cases. See [nf-core/rnaseq issue #460](https://github.com/nf-core/rnaseq/issues/460) for details.
 
-For GRCh38 with masked Ensembl assembly, use [Custom genomes](#custom-genomes).
+For GRCh38 with masked Ensembl assembly, use a [local copy of the genome](#local-copies-of-genomes).
 :::
 
 ### Use remote AWS iGenomes
 
-To use remote AWS iGenomes:
+To use remote AWS iGenomes in supported nf-core pipelines, supply the `--genome` flag to your pipeline (e.g., `--genome GRCh37`).
+On execution, the pipeline will then:
 
-1. Supply the `--genome` flag to your pipeline (e.g., `--genome GRCh37`).
-1. Pipeline automatically downloads required reference files.
-1. Reference genome parameters are auto-populated from `conf/igenomes.config`.
+1. Automatically download the required reference files.
+2. Auto-populate reference genome parameters from `conf/igenomes.config`.
    - Parameters like FASTA, GTF, and index paths are set automatically.
-1. Pipeline downloads only what it requires for that specific workflow.
+3. Download only the files it requires for that specific workflow.
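+
+For example, a minimal launch using remote iGenomes might look like this (a sketch; the samplesheet, output directory, and profile are placeholders):
+
+```bash
+nextflow run nf-core/rnaseq \
+    --input samplesheet.csv \
+    --outdir results \
+    --genome GRCh37 \
+    -profile docker
+```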
 
 :::tip
 Downloading reference genome files takes time and bandwidth.
-We recommend using a local copy when possible.
-See [Use local AWS iGenomes](#use-local-aws-igenomes) for more information.
+We recommend using a local copy when possible. See [Use local AWS iGenomes](#use-local-aws-igenomes) for more information.
 :::
 
 ### Use local AWS iGenomes
 
 To use local AWS iGenomes:
 
-1. Download the iGenomes reference files you need to a local directory.
-1. Set `params.igenomes_base` to your local iGenomes directory path.
+1. [Download](https://github.com/ewels/AWS-iGenomes?tab=readme-ov-file#download-script) the iGenomes reference files you need to a local directory.
+2. Set `--igenomes_base` to your local iGenomes directory path.
 
   :::warning
-   This path must reflect the structure defined in `conf/igenomes.config`.
+   This directory structure must reflect the structure defined in [`conf/igenomes.config`](https://github.com/nf-core/tools/blob/main/nf_core/pipeline-template/conf/igenomes.config).
   :::
 
-1. Pipeline will use local files instead of downloading from AWS.
+3. The pipeline will then use the local files instead of downloading them from AWS.
 
 ### Check annotation versions
 
@@ -69,7 +118,7 @@ To check the version of annotations used by AWS iGenomes:
 
   aws s3 cp --no-sign-request s3://ngi-igenomes/igenomes/Homo_sapiens/Ensembl/GRCh37/Annotation/README.txt .
   ```
 
-1. View the README to see annotation details:
+2. View the README to see annotation details:
 
   ```bash
   cat README.txt
  ```
@@ -85,62 +134,6 @@ To check the version of annotations used by AWS iGenomes:
 
 This confirms the annotations are from Ensembl release 75 (July 2015), which is significantly outdated.
 
-## Custom genomes
-
-Use custom genomes when AWS iGenomes doesn't meet your requirements.
-
-Custom genomes allow you to:
-
-- Use current genome annotations
-- Avoid repetitive index generation
-- Maintain full control over reference files
-- Achieve faster pipeline execution when indices are pre-generated
-
-### Use custom genomes
-
-Most genomics nf-core pipelines can start from just a FASTA and GTF file and create downstream reference assets (genome indices, interval files, etc.) as part of pipeline execution.
-
-Using GRCh38 as an example:
-
-1. Download the latest files:
-
-   ```bash
-   #!/bin/bash
-
-   VERSION=108
-   wget -L ftp://ftp.ensembl.org/pub/release-$VERSION/fasta/homo_sapiens/dna/Homo_sapiens.GRCh38.dna_sm.primary_assembly.fa.gz
-   wget -L ftp://ftp.ensembl.org/pub/release-$VERSION/gtf/homo_sapiens/Homo_sapiens.GRCh38.$VERSION.gtf.gz
-   ```
-
-1. Run pipeline with `--save_reference` to generate indices:
-
-   ```bash
-   nextflow run \
-       nf-core/rnaseq \
-       --input samplesheet.csv \
-       --fasta Homo_sapiens.GRCh38.dna_sm.primary_assembly.fa.gz \
-       --gtf Homo_sapiens.GRCh38.108.gtf.gz \
-       --save_reference
-   ```
-
-   :::note
-   The pipeline will generate and save reference assets.
-For example, the STAR index will be stored in `<outdir>/genome/index/star`.
-   :::
-
-1. Move generated assets to a central, persistent storage location for re-use in future runs.
-1. Use pre-generated indices in future runs.
-
-   ```bash
-   nextflow run \
-       nf-core/rnaseq \
-       --input samplesheet.csv \
-       --fasta Homo_sapiens.GRCh38.dna_sm.primary_assembly.fa.gz \
-       --gtf Homo_sapiens.GRCh38.108.gtf.gz \
-       --star_index <path/to/star_index> \
-       --gene_bed <path/to/gene_bed>
-   ```
-
 ## Refgenie
 
 Refgenie provides programmatic genome asset management as an alternative to manual file handling.
@@ -157,13 +150,15 @@ Refgenie allows you to:
 
 To use Refgenie:
 
 1. Install Refgenie following the [official documentation](http://refgenie.databio.org/).
-1. Initialize Refgenie.
+2. Initialize Refgenie.
 
   :::note
   Refgenie creates `~/.nextflow/nf-core/refgenie_genomes.config` and appends an `includeConfig` statement to `~/.nextflow/config` that references this file.
   :::
 
-1. Pull required genome assets. For example:
+3. Pull the required genome assets. For example:
 
   ```bash
   refgenie pull t7/fasta
   ```
 
   Asset paths are automatically added to `~/.nextflow/nf-core/refgenie_genomes.config`.
-For example:
+   For example:
 
   ```groovy title="refgenie_genomes.config"
   // This is a read-only config file managed by refgenie. Manual changes to this file will be overwritten.
@@ -186,7 +181,7 @@
   }
   ```
 
-1. Run your pipeline with the required genome. For example:
+4. Run your pipeline with the required genome. For example:
 
   :::bash
   nextflow run nf-core/<pipeline> --genome t7
diff --git a/sites/docs/src/content/docs/running/run-pipelines-offline.md b/sites/docs/src/content/docs/running/run-pipelines-offline.md
index 45b19ff9f0..6de990ceb4 100644
--- a/sites/docs/src/content/docs/running/run-pipelines-offline.md
+++ b/sites/docs/src/content/docs/running/run-pipelines-offline.md
@@ -21,21 +21,21 @@ Running pipelines offline requires three main components:
 
 To transfer Nextflow to an offline system:
 
 1. [Install Nextflow](https://nextflow.io/docs/latest/getstarted.html#installation) in an online environment.
-1. Run your pipeline locally.
+2. Run your pipeline locally.
 
   :::note
   Nextflow fetches the required plugins.
-It does not need to run to completion.
+   It does not need to run to completion.
   :::
 
-1. Copy the Nextflow binary and `$HOME/.nextflow` folder to your offline environment.
-1. In your Nextflow configuration file, specify each plugin (both name and version), including default plugins.
+3. Copy the Nextflow binary and `$HOME/.nextflow` folder to your offline environment.
+4. In your Nextflow configuration file, specify each plugin (both name and version), including default plugins.
 
   :::note
   This prevents Nextflow from trying to download newer versions of plugins.
   :::
 
-1. Add the following environment variable in your `~/.bashrc` file:
+5. Add the following environment variable to your `~/.bashrc` file:
 
   ```bash title=".bashrc"
   export NXF_OFFLINE='true'
@@ -55,24 +55,27 @@ To transfer pipeline code to an offline system:
 
   Add the argument `--container singularity` to fetch the singularity container(s).
   :::
 
-1. Transfer the `.tar.gz` file to your offline system and unpack it.
+2. Transfer the `.tar.gz` file to your offline system and unpack it.
 
   :::note
   The archive contains directories called:
 
-   - `workflow`: The pipeline files
-   - `config`: [nf-core/configs](https://github.com/nf-core/configs) files
-   - `singularity`: Singularity images (if you used `--container singularity`)
+   - `workflow/`: The pipeline files
+   - `config/`: [nf-core/configs](https://github.com/nf-core/configs) files
+   - `singularity/`: Singularity images (if you used `--container singularity`)
   :::
 
  :::tip
  If you are downloading _directly_ to the offline storage (e.g., a head node with internet access whilst compute nodes are offline), use the `--singularity-cache-only` option for `nf-core pipelines download` and set the `$NXF_SINGULARITY_CACHEDIR` environment variable.
-This reduces total disk space by downloading singularity images to the `$NXF_SINGULARITY_CACHEDIR` folder without copying them into the target downloaded pipeline folder.
+
+   This reduces total disk space by downloading singularity images to the `$NXF_SINGULARITY_CACHEDIR` folder without copying them into the target downloaded pipeline folder.
  :::
 
 ### Transfer reference genomes offline
 
 To use nf-core reference genomes offline, download and transfer them to your offline cluster.
-See [Reference genomes](./reference_genomes.md) for more information.
+See [Reference genomes](./reference-genomes.md) for more information.
 
 ## Additional resources