Skip to content

ammalabbasi/pksProfiler

Repository files navigation

pksProfiler

pksProfiler is a computational framework for the detection and quantitative profiling of polyketide synthase (PKS) gene signatures from unmapped high-throughput sequencing reads derived from cancer genomes and metagenomic datasets.


Key Features

  1. Detection of PKS gene signatures in unmapped sequencing reads
  2. Quantification of PKS island gene abundances using two independent and complementary analytical approaches
  3. Assignment of putative microbial species of origin for detected PKS loci

Workflow Schematic


🚀 Quick Start

How to run pksProfiler pipeline

  1. Install Nextflow as a conda environment
  2. Make sure you have the following folders and files in your working directory:
  • main.nf
  • nextflow.config
  • conf/base.config
  • sample.csv

You can find these shared files and folder in /tscc/nfs/home/amabbasi/restricted/pksProfiler/

  1. Next, download the human reference genomes to be used for filtration. We recommend GRCh38, T2T-CHM13v2.0, and all currently available pangenomes from the Human Pangenome Reference Consortium (HPRC). A download script is provided for convenience. Please update the reference paths in the main.nf
bash scripts/download_references.sh
  1. Next, create Minimap2 indexes for the previously downloaded reference genomes. A script is provided for convenience to build minimap2 indexes. Please update the index paths in the main.nf
bash scripts/create_minimap2_indexes.sh
  1. Prepare your sample.csv file (Example format below):
patient,bam
PD56137a,/tscc/nfs/home/amabbasi/restricted/microbiome_pipeline/test_data/PD56137a.unmapped.viral.bam
PD56137b,/tscc/nfs/home/amabbasi/restricted/microbiome_pipeline/test_data/PD56137b.unmapped.viral.bam
  1. Request an interactive node and run Nextflow in your working directory under an interactive node:
# Node requesting
srun -N 1 -n 1 -c 8 --mem 125G -t 24:00:00 -p platinum -q hcp-ddp302 -A ddp302 --pty bash

# Activate your nextflow conda environment
conda activate env_nf

# Run nextflow
nextflow run main.nf

# If your pipeline terminates with external error, or the interactive node is killed, you can resume your task after setting up the previous steps again with the following command:
nextflow run main.nf -resume

# Optionally, you can recieve an notifiction email on completion with -N flag:
nextflow run main.nf -N [email protected]
  1. Every process result and report will be stored in the RESULT folder

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published