CMPipeline

Workflow Introduction

How to run EVC pipeline

Install Nextflow in a conda environment (the commands below assume it is named env_nf).
Make sure you have the following folders and files in your working directory:

main.nf
nextflow.config
conf/base.config
sample.csv
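
As a quick pre-flight check, the list above can be verified from the shell. This is a minimal sketch, not part of the pipeline: `check_required_files` is a hypothetical helper name, and it only tests that the four files exist, not that their contents are valid.

```shell
#!/usr/bin/env bash
# Sketch: confirm the files listed above are present before launching Nextflow.
# check_required_files is a hypothetical helper, not shipped with the pipeline.
check_required_files() {
    local dir="${1:-.}"   # directory to check; defaults to the current one
    local missing=0
    local f
    for f in main.nf nextflow.config conf/base.config sample.csv; do
        if [[ ! -f "$dir/$f" ]]; then
            echo "missing: $f"
            missing=1
        fi
    done
    # Exit status is 0 when everything was found, 1 otherwise.
    return "$missing"
}

# Usage (from the working directory):
#   check_required_files . && echo "all required files found"
```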

These shared files and folders are available in /tscc/nfs/home/amabbasi/restricted/microbiome_pipeline/CMPipeline

Next, download the human reference genomes used for filtration. We recommend GRCh38, T2T-CHM13v2.0, and all currently available pangenomes from the Human Pangenome Reference Consortium (HPRC). A download script is provided for convenience. After downloading, update the reference paths in main.nf.

bash scripts/download_references.sh

Next, build minimap2 indexes for the downloaded reference genomes. A script is provided for convenience. After building, update the index paths in main.nf.

bash scripts/create_minimap2_indexes.sh

Download the microbial databases for KrakenUniq and MetaPhlAn 4. Once they are downloaded, update the database paths in main.nf.

# KrakenUniq database
wget https://genome-idx.s3.amazonaws.com/kraken/uniq/krakendb-2023-08-08-MICROBIAL/kuniq_microbialdb_minus_kdb.20230808.tgz
wget https://genome-idx.s3.amazonaws.com/kraken/uniq/krakendb-2023-08-08-MICROBIAL/database.kdb

# MetaPhlAn database
wget http://cmprod1.cibio.unitn.it/biobakery4/metaphlan_databases/mpa_vJun23_CHOCOPhlAnSGB_202307.tar
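
The downloaded archives need to be unpacked before their paths can go into main.nf. The following is a sketch under stated assumptions: `unpack_into` is a hypothetical helper, the directory names `krakenuniq_db` and `metaphlan_db` are placeholders, and placing `database.kdb` next to the extracted KrakenUniq files is an assumption about the expected layout that should be checked against the archive contents.

```shell
#!/usr/bin/env bash
# Hypothetical helper: unpack an archive into a destination directory.
unpack_into() {
    local archive="$1" dest="$2"
    mkdir -p "$dest"
    # tar -xf detects gzip compression on its own in GNU tar and bsdtar,
    # so the same call handles both .tgz and .tar files.
    tar -xf "$archive" -C "$dest"
}

# Usage, with krakenuniq_db/ and metaphlan_db/ as assumed directory names:
#   unpack_into kuniq_microbialdb_minus_kdb.20230808.tgz krakenuniq_db
#   mv database.kdb krakenuniq_db/    # assumption: .kdb sits beside the rest
#   unpack_into mpa_vJun23_CHOCOPhlAnSGB_202307.tar metaphlan_db
```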

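Prepare your sample.csv file. It has a header row followed by one row per sample, giving the patient ID and the path to its BAM file (example format below):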

patient,bam
PD56137a,/tscc/nfs/home/amabbasi/restricted/microbiome_pipeline/test_data/PD56137a.unmapped.viral.bam
PD56137b,/tscc/nfs/home/amabbasi/restricted/microbiome_pipeline/test_data/PD56137b.unmapped.viral.bam
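
A malformed sample sheet or a mistyped BAM path is easier to catch before the run starts. The sketch below checks the header and each BAM path in the format shown above. `validate_sample_sheet` is a hypothetical helper, not part of the pipeline, and it assumes the sheet has exactly the two columns `patient` and `bam` with no quoting or embedded commas.

```shell
#!/usr/bin/env bash
# Hypothetical helper: sanity-check a sample sheet in the patient,bam format.
validate_sample_sheet() {
    local sheet="$1"
    local status=0
    local header patient bam
    # The first line must be the header shown in the example.
    IFS= read -r header < "$sheet" || true
    header="${header%$'\r'}"            # tolerate Windows line endings
    if [[ "$header" != "patient,bam" ]]; then
        echo "unexpected header: $header"
        return 1
    fi
    # Remaining lines: patient ID, then BAM path.
    while IFS=, read -r patient bam || [[ -n "$patient" ]]; do
        bam="${bam%$'\r'}"
        [[ -z "$patient" && -z "$bam" ]] && continue   # skip blank lines
        if [[ -z "$patient" || -z "$bam" ]]; then
            echo "incomplete row: $patient,$bam"
            status=1
        elif [[ ! -f "$bam" ]]; then
            echo "BAM not found for $patient: $bam"
            status=1
        fi
    done < <(tail -n +2 "$sheet")
    return "$status"
}

# Usage:
#   validate_sample_sheet sample.csv && echo "sample sheet looks fine"
```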

Request an interactive node, then run Nextflow from your working directory on that node:

# Request an interactive node
srun -N 1 -n 1 -c 8 --mem 125G -t 24:00:00 -p platinum -q hcp-ddp302 -A ddp302 --pty bash

# Activate your nextflow conda environment
conda activate env_nf

# Run nextflow
nextflow run main.nf

# If the pipeline terminates with an external error, or the interactive node is killed, request a new node and reactivate the environment as above, then resume the run:
nextflow run main.nf -resume

# Optionally, receive a notification email on completion with the -N flag:
nextflow run main.nf -N [email protected]

All process results and reports are written to the RESULT folder.
