pksProfiler is a computational framework for the detection and quantitative profiling of polyketide synthase (PKS) gene signatures from unmapped high-throughput sequencing reads derived from cancer genomes and metagenomic datasets.
- Detection of PKS gene signatures in unmapped sequencing reads
- Quantification of PKS island gene abundances using two independent and complementary analytical approaches
- Assignment of putative microbial species of origin for detected PKS loci
- Install Nextflow in a conda environment (the commands below assume it is named env_nf)
- Make sure your working directory contains the following files and folders:
  - main.nf
  - nextflow.config
  - conf/base.config
  - sample.csv
The shared files and folders are available at:
/tscc/nfs/home/amabbasi/restricted/pksProfiler/
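Before moving on, it can help to confirm that the four files listed above are actually present. The following is a minimal sketch, not part of pksProfiler: the function name `check_required_files` is hypothetical, and it only tests for file existence, not content.

```bash
# Hypothetical helper (not shipped with pksProfiler): report which of the
# required pipeline files are missing from a directory.
# Usage: check_required_files [directory]   (defaults to the current directory)
check_required_files() {
    local dir="${1:-.}"
    local missing=0
    local f
    for f in main.nf nextflow.config conf/base.config sample.csv; do
        if [ ! -f "$dir/$f" ]; then
            echo "missing: $f"
            missing=1
        fi
    done
    # Exit status 0 means every file was found, 1 means at least one is missing.
    return "$missing"
}
```

Calling `check_required_files` from the working directory prints one line per missing file and returns a non-zero status if anything is absent.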
- Next, download the human reference genomes used for filtering. We recommend GRCh38, T2T-CHM13v2.0, and all currently available pangenomes from the Human Pangenome Reference Consortium (HPRC). A download script is provided for convenience. Update the reference paths in main.nf.
bash scripts/download_references.sh
- Next, create Minimap2 indexes for the downloaded reference genomes. A script is provided for convenience. Update the index paths in main.nf.
bash scripts/create_minimap2_indexes.sh
- Prepare your sample.csv file (example format below):
patient,bam
PD56137a,/tscc/nfs/home/amabbasi/restricted/microbiome_pipeline/test_data/PD56137a.unmapped.viral.bam
PD56137b,/tscc/nfs/home/amabbasi/restricted/microbiome_pipeline/test_data/PD56137b.unmapped.viral.bam
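A malformed sample sheet or a mistyped BAM path usually only surfaces once the pipeline is already running, so a quick check up front can save a node request. The sketch below assumes the two-column `patient,bam` layout shown above; the function name `validate_sample_sheet` is hypothetical and the check is not part of pksProfiler.

```bash
# Hypothetical helper (not shipped with pksProfiler): check the header and the
# BAM paths in a sample sheet.
# Usage: validate_sample_sheet [csv]   (defaults to sample.csv)
validate_sample_sheet() {
    local csv="${1:-sample.csv}"
    local header patient bam status=0
    if [ ! -f "$csv" ]; then
        echo "not found: $csv"
        return 1
    fi
    {
        # The first line must be exactly the two expected column names.
        IFS= read -r header
        if [ "$header" != "patient,bam" ]; then
            echo "bad header: $header"
            return 1
        fi
        # Every remaining row needs a patient ID and a path to an existing file.
        while IFS=, read -r patient bam || [ -n "$patient" ]; do
            # Skip completely empty lines.
            if [ -z "$patient" ] && [ -z "$bam" ]; then
                continue
            fi
            if [ -z "$patient" ] || [ ! -f "$bam" ]; then
                echo "bad row: $patient,$bam"
                status=1
            fi
        done
    } < "$csv"
    return "$status"
}
```

Running `validate_sample_sheet sample.csv` prints each offending row and returns a non-zero status if the header is wrong or any BAM file cannot be found.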
- Request an interactive node and run Nextflow from your working directory on that node:
# Request an interactive node
srun -N 1 -n 1 -c 8 --mem 125G -t 24:00:00 -p platinum -q hcp-ddp302 -A ddp302 --pty bash
# Activate your nextflow conda environment
conda activate env_nf
# Run nextflow
nextflow run main.nf
# If the pipeline terminates with an external error or the interactive node is killed, repeat the setup steps above and resume the run with:
nextflow run main.nf -resume
# Optionally, receive a notification email on completion with the -N flag:
nextflow run main.nf -N [email protected]
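The resume and notification options above can be combined in a single invocation. As a rough sketch, the hypothetical helper below (`nextflow_cmd`, not part of pksProfiler) prints the command to run: it adds `-resume` when a `.nextflow` directory from an earlier run exists in the current directory, which is an assumption about how a previous run is detected, and adds `-N` when an email address is given.

```bash
# Hypothetical helper (not shipped with pksProfiler): print the nextflow
# command for a fresh or resumed run, optionally with email notification.
# Usage: nextflow_cmd [email]
nextflow_cmd() {
    local email="${1:-}"
    local cmd="nextflow run main.nf"
    # Assumption: a .nextflow directory here means an earlier run left a cache.
    if [ -d .nextflow ]; then
        cmd="$cmd -resume"
    fi
    if [ -n "$email" ]; then
        cmd="$cmd -N $email"
    fi
    # Only print the command; the caller decides whether to execute it.
    echo "$cmd"
}
```

The helper only prints the command, so it can be inspected before being run on the interactive node.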
- All process results and reports are stored in the RESULT folder
