Snakemake Workflow for CUT&RUN Upstream Analysis

This repository contains a modular and scalable Snakemake workflow for analyzing CUT&RUN (or ChIP-seq) data.

✨ Features

🧠 Automatic Sample Detection
Supports various naming conventions including _R1.fastq.gz, _1.fastq.gz, .fastq.gz, _R1_001.fastq.gz, etc.
🔁 SE/PE Mode Auto-Detection
Automatically routes samples through the correct pipeline depending on whether data is single-end or paired-end.
⚙️ Flexible and Configurable
A centralized config.yaml sets input paths, number of threads, STAR index, genome size, bin size, and more.
🧬 Multimapping Handling
Retains multi-mapping reads during STAR alignment and includes a post-mapping multimap_weight function that weights each alignment fractionally by its NH tag, so that multi-mapped reads are not over-counted during peak calling.
🚫 Blacklist Filtering (Optional)
When filter_blacklist: true is set in config.yaml, the ENCODE blacklist for the configured genome is downloaded automatically and passed to bamCoverage via --blackListFileName. This step replaces the older repeat-masking logic.
📊 BigWig Generation with Normalization
Converts BAM to bigWig using deeptools with and without normalization (e.g., RPKM), while excluding PCR duplicates (--samFlagExclude 1024).
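The automatic sample detection and SE/PE routing described above can be sketched in plain Python. This is not the workflow's detect_samples.smk: the regular expressions, the detect_samples function, and the returned dictionary layout are hypothetical, chosen only to show how the listed naming conventions map to a sample name and a read number, and how a sample with both R1 and R2 is classified as paired-end.

```python
import re
from collections import defaultdict

# Hypothetical patterns for the naming conventions listed above, most specific first.
# Each captures the sample name and, where present, the read number (1 or 2).
_PATTERNS = [
    re.compile(r"^(?P<sample>.+?)_R(?P<read>[12])_001\.fastq\.gz$"),
    re.compile(r"^(?P<sample>.+?)_R(?P<read>[12])\.fastq\.gz$"),
    re.compile(r"^(?P<sample>.+?)_(?P<read>[12])\.fastq\.gz$"),
    re.compile(r"^(?P<sample>.+?)\.fastq\.gz$"),  # no read tag: treated as read 1
]


def detect_samples(filenames):
    """Group FASTQ filenames by sample and label each sample "SE" or "PE"."""
    reads = defaultdict(dict)
    for name in filenames:
        for pattern in _PATTERNS:
            match = pattern.match(name)
            if match:
                # Patterns without a "read" group fall back to read 1.
                read = match.groupdict().get("read") or "1"
                reads[match.group("sample")][read] = name
                break  # first (most specific) matching pattern wins
    samples = {}
    for sample, files in reads.items():
        # Paired-end only if both mates were found for this sample.
        mode = "PE" if {"1", "2"}.issubset(files) else "SE"
        samples[sample] = {"mode": mode, "files": files}
    return samples
```

For example, detect_samples(["A_R1_001.fastq.gz", "A_R2_001.fastq.gz", "C.fastq.gz"]) labels A as PE and C as SE. Note that a single-end file named like rep_1.fastq.gz is ambiguous under these conventions and would be read as mate 1 of a sample called rep.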

🧬 Workflow Overview

Sample Detection
Automatically detects sample names based on filenames.
Quality Trimming
Uses fastp to trim adapters and remove low-quality reads.
Alignment
Aligns reads to the reference genome using STAR, retaining up to 100 multi-mapped hits.
Multimap Weighting
Applies fractional weighting to multi-mapped reads based on their NH tag values.
Blacklist Filtering (Optional)
Filters signal from known artefact regions via ENCODE blacklist when filter_blacklist is enabled.
BigWig Conversion
Generates normalized (RPKM) and unnormalized bigWig files for visualization.
Peak Calling
Uses MACS3 to call peaks from the aligned BAM files.
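The multimap weighting and RPKM normalization steps can be illustrated with a small pure-Python sketch. It assumes that multimap_weight gives each alignment a weight of 1/NH, which is one common reading of "fractional weighting based on NH"; the actual rule may differ. The RPKM formula follows deepTools' documented definition (reads per bin divided by bin length in kb times mapped reads in millions). In the workflow itself, bigWig signal is computed by bamCoverage, not by these hypothetical helpers.

```python
from collections import defaultdict


def weighted_bin_counts(alignments, bin_size):
    """Sum fractional read weights (assumed 1/NH) into fixed-width genomic bins.

    `alignments` is an iterable of (chrom, start, nh) tuples, where `start` is a
    0-based alignment start and `nh` is the alignment's NH tag value.
    """
    counts = defaultdict(float)
    for chrom, start, nh in alignments:
        # A read mapped to NH locations contributes 1/NH at each of them,
        # so its total contribution across the genome is 1.
        counts[(chrom, start // bin_size)] += 1.0 / max(nh, 1)
    return dict(counts)


def rpkm(counts, bin_size, total_reads):
    """RPKM = reads in bin / (bin length in kb * total mapped reads in millions)."""
    scale = (bin_size / 1e3) * (total_reads / 1e6)
    return {key: value / scale for key, value in counts.items()}
```

With one unique read at chr1:10 and one read with NH=2 aligned at chr1:20 and chr1:150, 100 bp bins give weights of 1.5 for bin ("chr1", 0) and 0.5 for bin ("chr1", 1); the weights sum to 2, matching the two sequenced reads.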

🚀 Quick Start

Clone the repository:

git clone https://github.com/Shall-We-Dance/CUTRUN_smk.git
cd CUTRUN_smk

Edit ./config/config.yaml to specify your paths and parameters.
Activate your Snakemake environment and run the pipeline:

snakemake --use-conda --cores 16
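Before a long run, it can help to sanity-check the edited config.yaml. The sketch below is illustrative only: apart from filter_blacklist, the key names in REQUIRED_KEYS are hypothetical stand-ins for the parameters listed under Features (input paths, threads, STAR index, genome, genome size, bin size) and must be adjusted to match the real config/config.yaml.

```python
# Hypothetical key names; only filter_blacklist is documented in this README.
REQUIRED_KEYS = {"input_dir", "threads", "star_index", "genome", "genome_size", "bin_size"}


def check_config(config):
    """Return a list of human-readable problems found in a parsed config dict."""
    problems = [f"missing key: {key}" for key in sorted(REQUIRED_KEYS.difference(config))]
    for key in ("threads", "bin_size"):
        value = config.get(key)
        # bool is a subclass of int in Python, so reject it explicitly.
        if value is not None and (
            isinstance(value, bool) or not isinstance(value, int) or value <= 0
        ):
            problems.append(f"{key} must be a positive integer, got {value!r}")
    if not isinstance(config.get("filter_blacklist", False), bool):
        problems.append("filter_blacklist must be true or false")
    return problems
```

When a Snakefile declares a configfile, Snakemake exposes the parsed YAML as the global config dictionary, so a check like this could run at the top of the Snakefile and raise an error if the returned list is non-empty.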

📁 Project Structure

CUTRUN_smk/
├── config/
│   └── config.yaml              # Main configuration file
│
├── workflow/
│   ├── Snakefile                # Entry point Snakefile
│   ├── rules/                   # Modular rule files
│   │   ├── fastp.smk
│   │   ├── star.smk
│   │   ├── macs3.smk
│   │   ├── bam_to_bigwig.smk
│   │   └── detect_samples.smk
│   └── envs/                    # Conda environments
│       ├── fastp.yaml
│       ├── star.yaml
│       ├── macs3.yaml
│       ├── bedtools.yaml
│       └── deeptools.yaml
│
├── results/                     # Final and intermediate output files
│   ├── fastp/
│   ├── star/
│   └── bigwig/
│
├── logs/                        # Log files for each step
│   ├── fastp/
│   ├── star/
│   └── bigwig/
│
├── resources/                   # Resource files for each step
│   └── blacklist/
│
├── LICENSE
└── README.md

📝 Notes

STAR genome index must be prebuilt using STAR --runMode genomeGenerate.
For blacklist filtering, the genome name must match one recognized by ENCODE (e.g., hg38, mm10).
samtools, deeptools, and other tools automatically scale to the number of available threads (default: max/4).
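The blacklist logic can be illustrated with a minimal interval-overlap sketch. In the workflow, filtering is done by bamCoverage through --blackListFileName; the helpers below (load_blacklist, overlaps_blacklist, check_genome) are hypothetical and only show the underlying idea. SUPPORTED_GENOMES contains just the two builds named in this README as an illustrative set, not ENCODE's full list.

```python
import bisect

# Illustrative set taken from the README's examples; not ENCODE's complete list.
SUPPORTED_GENOMES = {"hg38", "mm10"}


def check_genome(genome):
    """Raise if the configured genome name is outside the illustrative set."""
    if genome not in SUPPORTED_GENOMES:
        raise ValueError(f"genome {genome!r} is not in SUPPORTED_GENOMES")


def load_blacklist(intervals):
    """Group (chrom, start, end) intervals by chromosome, sorted and merged."""
    by_chrom = {}
    for chrom, start, end in intervals:
        by_chrom.setdefault(chrom, []).append((start, end))
    for chrom, regions in by_chrom.items():
        regions.sort()
        merged = []
        for start, end in regions:
            if merged and start <= merged[-1][1]:
                # Overlapping or touching: extend the previous region.
                merged[-1] = (merged[-1][0], max(merged[-1][1], end))
            else:
                merged.append((start, end))
        by_chrom[chrom] = merged
    return by_chrom


def overlaps_blacklist(blacklist, chrom, start, end):
    """True if the half-open interval [start, end) overlaps a blacklisted region."""
    regions = blacklist.get(chrom, [])
    # Regions before index i start before `end`; since they are merged and
    # non-overlapping, only the last of them can still reach past `start`.
    i = bisect.bisect_left(regions, (end,))
    return i > 0 and regions[i - 1][1] > start
```

Merging first keeps each query to a single binary search, which matters when checking every bin of a genome-wide track.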

License

MIT License

Contact

For questions, issues, or contributions, please open an issue or pull request on GitHub.
