
ammalabbasi/CMPipeline

CMPipeline

Workflow Introduction

How to run EVC pipeline

  1. Install Nextflow in a conda environment (the commands below assume it is named env_nf)
  2. Make sure you have the following folders and files in your working directory:
  • main.nf
  • nextflow.config
  • conf/base.config
  • sample.csv

You can find these shared files and folders in /tscc/nfs/home/amabbasi/restricted/microbiome_pipeline/CMPipeline
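Before launching anything, it can help to confirm that the working directory is complete. The following is a minimal sketch of such a pre-flight check; the function name `check_pipeline_files` is hypothetical and not part of CMPipeline, and the file list is simply the one given above.

```shell
# Illustrative helper (not part of CMPipeline): report which of the
# required files are missing from a working directory.
check_pipeline_files() {
    workdir="${1:-.}"
    missing=0
    for f in main.nf nextflow.config conf/base.config sample.csv; do
        if [ ! -f "$workdir/$f" ]; then
            echo "missing: $f"
            missing=1
        fi
    done
    # Exit status 0 when everything is present, 1 otherwise.
    return "$missing"
}
```

For example, `check_pipeline_files . && echo "all files present"` prints the confirmation only when all four files exist.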

  3. Download the human reference genomes to be used for filtration. We recommend GRCh38, T2T-CHM13v2.0, and all currently available pangenomes from the Human Pangenome Reference Consortium (HPRC). A download script is provided for convenience. After downloading, update the reference paths in main.nf:
bash scripts/download_references.sh
  4. Create Minimap2 indexes for the downloaded reference genomes. A script is provided for convenience. After building the indexes, update the index paths in main.nf:
bash scripts/create_minimap2_indexes.sh
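For orientation, building a Minimap2 index generally takes the form `minimap2 -d output.mmi reference.fa`. The sketch below is a dry run that only prints one such command per FASTA file. It is not the repository's script: the function name `print_index_commands`, the `.fa` suffix, and the `.mmi` output naming are all assumptions made for illustration.

```shell
# Illustrative dry run (not the repository's script): print one minimap2
# indexing command per FASTA file found in a directory.
print_index_commands() {
    ref_dir="$1"
    for fa in "$ref_dir"/*.fa; do
        # If nothing matches, the glob stays literal; skip it.
        [ -e "$fa" ] || continue
        echo "minimap2 -d ${fa%.fa}.mmi $fa"
    done
}
```

Reviewing the printed commands before running them makes it easier to match the resulting index paths against those configured in main.nf.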
  5. Download the microbial databases for KrakenUniq and MetaPhlAn 4. Once downloaded, update the database paths in main.nf:
# KrakenUniq database
wget https://genome-idx.s3.amazonaws.com/kraken/uniq/krakendb-2023-08-08-MICROBIAL/kuniq_microbialdb_minus_kdb.20230808.tgz
wget https://genome-idx.s3.amazonaws.com/kraken/uniq/krakendb-2023-08-08-MICROBIAL/database.kdb

# Metaphlan database
wget http://cmprod1.cibio.unitn.it/biobakery4/metaphlan_databases/mpa_vJun23_CHOCOPhlAnSGB_202307.tar
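The downloads above are archives, so they will likely need unpacking before the paths in main.nf can point at them. A minimal sketch follows, assuming one directory per database. The helper name `extract_db` and the `db/...` directory names are hypothetical, and it is an assumption that KrakenUniq expects database.kdb next to the unpacked files, so check where the archive actually places its contents.

```shell
# Illustrative helper (not part of CMPipeline): unpack an archive into its
# own directory. tar -xf detects gzip compression automatically.
extract_db() {
    archive="$1"
    dest="$2"
    mkdir -p "$dest" && tar -xf "$archive" -C "$dest"
}

# Example calls, commented out because the archives are large.
# The db/ directory names are placeholders:
# extract_db kuniq_microbialdb_minus_kdb.20230808.tgz db/krakenuniq
# mv database.kdb db/krakenuniq/
# extract_db mpa_vJun23_CHOCOPhlAnSGB_202307.tar db/metaphlan
```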
  6. Prepare your sample.csv file. Example format:
patient,bam
PD56137a,/tscc/nfs/home/amabbasi/restricted/microbiome_pipeline/test_data/PD56137a.unmapped.viral.bam
PD56137b,/tscc/nfs/home/amabbasi/restricted/microbiome_pipeline/test_data/PD56137b.unmapped.viral.bam
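A wrong path in sample.csv usually only surfaces once the pipeline is already running, so a quick check beforehand may save a resubmission. The sketch below assumes the two-column `patient,bam` layout shown above; the function name `validate_samplesheet` is hypothetical and not part of CMPipeline.

```shell
# Illustrative helper (not part of CMPipeline): check that a sample sheet
# has the "patient,bam" header and that every listed BAM file exists.
validate_samplesheet() {
    sheet="$1"
    status=0
    header=$(head -n 1 "$sheet" | tr -d '\r')
    if [ "$header" != "patient,bam" ]; then
        echo "unexpected header: $header"
        status=1
    fi
    # Read the data rows (everything after the header) in the current shell
    # so that changes to $status survive the loop.
    while IFS=, read -r patient bam; do
        [ -n "$patient" ] || continue   # skip blank lines
        if [ ! -f "$bam" ]; then
            echo "missing BAM for $patient: $bam"
            status=1
        fi
    done <<EOF
$(tail -n +2 "$sheet" | tr -d '\r')
EOF
    return "$status"
}
```

Running `validate_samplesheet sample.csv` prints one line per problem and returns a non-zero exit status if any were found.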
  7. Request an interactive node, then run Nextflow from your working directory on that node:
# Request an interactive node
srun -N 1 -n 1 -c 8 --mem 125G -t 24:00:00 -p platinum -q hcp-ddp302 -A ddp302 --pty bash

# Activate your nextflow conda environment
conda activate env_nf

# Run nextflow
nextflow run main.nf

# If the pipeline terminates with an external error, or the interactive node is killed, repeat the setup steps above (request a node, activate the environment) and resume the run with:
nextflow run main.nf -resume

# Optionally, receive a notification email on completion with the -N flag:
nextflow run main.nf -N [email protected]
  8. All process results and reports are stored in the RESULT folder
