- Install Nextflow in a dedicated conda environment (named env_nf in the commands below)
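One way to carry out this step is sketched below. It assumes conda is already installed and that Nextflow is taken from the Bioconda channel. The function name `create_nextflow_env` is hypothetical and not part of the pipeline; the default environment name `env_nf` is the one activated later in this guide.

```shell
# Hypothetical helper (not part of the pipeline): create a conda
# environment that contains Nextflow. The environment name defaults to
# env_nf, the name used by "conda activate env_nf" later on.
create_nextflow_env() {
    local env_name="${1:-env_nf}"
    if ! command -v conda >/dev/null 2>&1; then
        echo "conda not found on PATH" >&2
        return 1
    fi
    # Assumption: Nextflow is installed from Bioconda, with conda-forge
    # listed first to resolve its dependencies.
    conda create -y -n "$env_name" -c conda-forge -c bioconda nextflow
}
```

After sourcing the snippet, run `create_nextflow_env` (or `create_nextflow_env my_env` for a different name) and confirm the install with `conda activate env_nf` followed by `nextflow -version`.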
- Make sure you have the following files and folders in your working directory:
  - main.nf
  - nextflow.config
  - conf/base.config
  - sample.csv

  You can find these shared files and folders in
  /tscc/nfs/home/amabbasi/restricted/microbiome_pipeline/CMPipeline
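Before launching anything, it can help to confirm that the four files listed above are in place. The following is a minimal sketch; the function name `check_required_files` is hypothetical and not part of the pipeline.

```shell
# Hypothetical helper (not part of the pipeline): report every required
# pipeline file that is missing from the given directory (default: the
# current directory). Returns 0 only if all files are present.
check_required_files() {
    local dir="${1:-.}"
    local missing=0
    local f
    for f in main.nf nextflow.config conf/base.config sample.csv; do
        if [ ! -f "$dir/$f" ]; then
            echo "Missing: $dir/$f" >&2
            missing=1
        fi
    done
    return "$missing"
}
```

After sourcing the snippet, run `check_required_files .` from your working directory; it prints nothing and returns success when everything is present.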
- Next, download the human reference genomes to be used for filtration. We recommend GRCh38, T2T-CHM13v2.0, and all currently available pangenomes from the Human Pangenome Reference Consortium (HPRC). A download script is provided for convenience. After downloading, update the reference paths in main.nf.
bash scripts/download_references.sh
- Next, create Minimap2 indexes for the downloaded reference genomes. A script is provided for convenience. After building the indexes, update the index paths in main.nf.
bash scripts/create_minimap2_indexes.sh
- Download the microbial databases for KrakenUniq and MetaPhlAn 4. Once downloaded, update the database paths in main.nf.
# KrakenUniq database
wget https://genome-idx.s3.amazonaws.com/kraken/uniq/krakendb-2023-08-08-MICROBIAL/kuniq_microbialdb_minus_kdb.20230808.tgz
wget https://genome-idx.s3.amazonaws.com/kraken/uniq/krakendb-2023-08-08-MICROBIAL/database.kdb
# MetaPhlAn database
wget http://cmprod1.cibio.unitn.it/biobakery4/metaphlan_databases/mpa_vJun23_CHOCOPhlAnSGB_202307.tar
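Two of the downloads above are archives that need unpacking before their paths can be set in main.nf. The helper below is a sketch of that step; the function name `unpack_into` and any destination directory names are hypothetical, and the internal layout of each archive is not described here, so it is worth listing the contents with `tar -tf` first.

```shell
# Hypothetical helper (not part of the pipeline): unpack ARCHIVE into
# DEST, creating DEST if needed. The tar flags are chosen from the file
# extension so the call behaves the same across tar implementations.
unpack_into() {
    local archive="$1"
    local dest="$2"
    if [ ! -s "$archive" ]; then
        echo "Archive missing or empty: $archive" >&2
        return 1
    fi
    mkdir -p "$dest" || return 1
    case "$archive" in
        *.tgz|*.tar.gz) tar -xzf "$archive" -C "$dest" ;;
        *.tar)          tar -xf "$archive" -C "$dest" ;;
        *)
            echo "Unrecognized archive type: $archive" >&2
            return 1
            ;;
    esac
}
```

For example, `unpack_into kuniq_microbialdb_minus_kdb.20230808.tgz krakenuniq_db` and `unpack_into mpa_vJun23_CHOCOPhlAnSGB_202307.tar metaphlan_db`, where both directory names are placeholders. The separately downloaded database.kdb is assumed to belong alongside the extracted KrakenUniq files; verify this against the archive contents before updating main.nf.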
- Prepare your sample.csv file with one row per sample: a patient ID and the path to its BAM file (example format below):
patient,bam
PD56137a,/tscc/nfs/home/amabbasi/restricted/microbiome_pipeline/test_data/PD56137a.unmapped.viral.bam
PD56137b,/tscc/nfs/home/amabbasi/restricted/microbiome_pipeline/test_data/PD56137b.unmapped.viral.bam
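A malformed sample sheet or a mistyped BAM path is easier to catch before the pipeline starts. The sketch below checks the two-column format shown above; the function name `validate_sample_sheet` is hypothetical, and it assumes Unix line endings and paths without commas.

```shell
# Hypothetical helper (not part of the pipeline): check that the sample
# sheet starts with the "patient,bam" header and that every data row has
# a patient ID and a BAM path that exists on disk.
validate_sample_sheet() {
    local sheet="${1:-sample.csv}"
    local line_no=0 status=0 patient bam
    if [ ! -f "$sheet" ]; then
        echo "Sample sheet not found: $sheet" >&2
        return 1
    fi
    # The "|| [ -n ... ]" keeps a final line that lacks a trailing newline.
    while IFS=, read -r patient bam || [ -n "$patient" ]; do
        line_no=$((line_no + 1))
        if [ "$line_no" -eq 1 ]; then
            if [ "$patient,$bam" != "patient,bam" ]; then
                echo "Line 1: expected header 'patient,bam'" >&2
                status=1
            fi
        elif [ -z "$patient" ] && [ -z "$bam" ]; then
            continue  # skip blank lines
        elif [ -z "$patient" ] || [ ! -f "$bam" ]; then
            echo "Line $line_no: missing patient ID or BAM not found: $bam" >&2
            status=1
        fi
    done < "$sheet"
    if [ "$line_no" -eq 0 ]; then
        echo "Sample sheet is empty: $sheet" >&2
        status=1
    fi
    return "$status"
}
```

After sourcing the snippet, run `validate_sample_sheet sample.csv`; each problem is reported with its line number, and the return code is non-zero if any row fails.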
- Request an interactive node, then run Nextflow from your working directory on that node:
# Request an interactive node
srun -N 1 -n 1 -c 8 --mem 125G -t 24:00:00 -p platinum -q hcp-ddp302 -A ddp302 --pty bash
# Activate your nextflow conda environment
conda activate env_nf
# Run nextflow
nextflow run main.nf
# If the pipeline terminates with an external error, or the interactive node is killed, repeat the setup steps above (request a node, activate the environment) and resume the run with:
nextflow run main.nf -resume
# Optionally, receive a notification email on completion with the -N flag:
nextflow run main.nf -N YOUR_EMAIL_ADDRESS
- All process results and reports are stored in the RESULT folder.
