This repository contains all of the code used for bioinformatics processing, statistical analysis and figure preparation for "Bacterial Community Structure Shapes the Vaginal Resistome During Pregnancy", authored by Nassim Boutouchent, Agnes Baud, Asmaa Tazi, Luce Landraud, InSPIRe Consortium, Laurent Mandelbrot, Claire Poyart and Sean P. Kennedy.
This repository is divided into two main components:

- Scripts and workflow for functional gene detection and quantification (MoonCrater): a Snakemake workflow used to detect and quantify functional genes, including antibiotic resistance genes (ARGs), from shotgun metagenomic data using the ResFinder database. This workflow also includes an independent script (`KMA_kraken2_link.py`) used to link functional gene annotations to taxonomic assignments produced by Kraken2.

  Files:

  | File | Information |
  |---|---|
  | `snakefile` | MoonCrater workflow (functional gene detection and counting) |
  | `config.yml` | MoonCrater config file |
  | `KMA_count.py` | Script used to generate ARG count tables from KMA output |
  | `KMA_kraken2_link.py` | Script linking functional gene annotations to taxonomic annotations produced by Kraken2 |
- Downstream analyses and data used for the Vaginal Resistome paper

  `InSPIRe_VaginalResistome_paper/`: this folder includes all analysis notebooks and datasets required to reproduce the results presented in the main manuscript.
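The `KMA_count.py` entry in the table above generates ARG count tables from KMA output. The sketch below only illustrates the final step of such a script, merging per-sample gene read counts into one gene × sample table; it is not the repository's implementation. The functions `build_count_table` and `table_to_csv`, the input dictionary layout and the `gene` column name are hypothetical.

```python
import csv
import io


def build_count_table(per_sample_counts):
    """Merge per-sample gene read counts into a gene x sample table.

    per_sample_counts (hypothetical layout): {sample name: {gene name: read count}}.
    Genes not detected in a sample get a count of 0.
    Returns (header, rows); each row is [gene, count in sample 1, ...].
    """
    samples = sorted(per_sample_counts)
    genes = sorted({gene for counts in per_sample_counts.values() for gene in counts})
    header = ["gene"] + samples
    rows = [[gene] + [per_sample_counts[s].get(gene, 0) for s in samples]
            for gene in genes]
    return header, rows


def table_to_csv(header, rows):
    """Serialise the table as CSV text (in practice, write it to a file)."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(header)
    writer.writerows(rows)
    return buffer.getvalue()
```

Filling absent genes with 0 keeps every sample column the same length, which is what downstream diversity or differential-abundance analyses generally expect from a count matrix.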
Note: Taxonomic assignments and the species-level count table were previously reported by Baud et al., 2023 (https://doi.org/10.1038/s41598-023-36126-z).
Supplementary data of the study are provided as rendered HTML pages via the Quarto_site branch, allowing interactive exploration of figures, tables, and statistical reports: https://motleystate.github.io/InSPIRe_VagResist/
MoonCrater is a Snakemake-based workflow used in our study to identify and quantify antibiotic resistance genes from metagenomic data using the ResFinder database. It maps reads against the KMA-indexed ResFinder database and produces gene count tables for resistome analysis.

Requirements:
- ResFinder database indexed with KMA (https://github.com/cadms/resfinder)
- Python
- Snakemake
- KMA (https://bitbucket.org/genomicepidemiology/kma/src/master/)
Clone the repository:

```bash
git clone https://github.com/motleystate/inspire_VagResist.git
```

Before running MoonCrater, edit the `config.yml` file to fit your data:
```yaml
paths:
  output_dir: "path/to/output_directory"  # output directory for MoonCrater's results
  db_resfinder: "path/to/db_resfinder"    # ResFinder database (KMA-indexed)

# NB: to be detected, the samples must follow the pattern:
#   * if paired: {sample_name}{pair_prefix}{read}{suffix}
#   * if single: {sample_name}{suffix}
input_data:
  source_dir: "path/to/reads_files"  # input directory with read files
  paired: True                       # set to False if the samples aren't paired
  pair_prefix: "_R"
  suffix: "_trimmed_nohost.fastq.gz"
  format: "fastq"  # format of the input files. Allowed: "fasta" and "fastq" (including compressed versions)

kma_count:
  coverage: 60
  identity: 80
```
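The naming pattern in the config comments determines which files are picked up as samples. The sketch below shows one way such detection could work; it is an illustration, not MoonCrater's actual Snakefile logic. The function `detect_samples` is hypothetical, and it assumes that `{read}` is the mate number (`1` or `2`) and that every paired sample must have both mates.

```python
import re
from pathlib import Path


def detect_samples(filenames, paired=True, pair_prefix="_R",
                   suffix="_trimmed_nohost.fastq.gz"):
    """Group read files into samples using the config.yml naming pattern.

    paired: {sample_name}{pair_prefix}{read}{suffix}
    single: {sample_name}{suffix}
    Returns a dict mapping each sample name to its sorted file names.
    """
    if paired:
        # Assumption: {read} is the mate number, 1 or 2.
        pattern = re.compile(
            r"^(?P<sample>.+)" + re.escape(pair_prefix) + r"(?P<read>[12])"
            + re.escape(suffix) + r"$"
        )
    else:
        pattern = re.compile(r"^(?P<sample>.+)" + re.escape(suffix) + r"$")

    samples = {}
    for path in filenames:
        name = Path(path).name
        match = pattern.match(name)
        if match is None:
            continue  # files that do not follow the pattern are not detected
        samples.setdefault(match.group("sample"), []).append(name)

    if paired:
        # Assumption: every paired sample should have exactly one file per mate.
        incomplete = sorted(s for s, files in samples.items() if len(files) != 2)
        if incomplete:
            raise ValueError(f"incomplete read pairs for samples: {incomplete}")

    return {sample: sorted(files) for sample, files in samples.items()}
```

For example, `detect_samples(os.listdir(source_dir))` with the default values above would group `S01_R1_trimmed_nohost.fastq.gz` and `S01_R2_trimmed_nohost.fastq.gz` under sample `S01` and ignore files that do not end with the configured suffix.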
Run MoonCrater:

```bash
snakemake -c N -j J
```

where N is the number of threads and J is the number of jobs that can be executed in parallel.
MoonCrater requires a minimum of 12 threads to run.
NB: `run_resfinder.py` is provided as part of the ResFinder tool.
Output:

- `gene_abundance_table_Cov_ID.csv`: read-count table for antibiotic resistance genes.
- `read_mapping_report.csv`: per-sample read processing summary report, with the following columns:

| Column name | Description |
|---|---|
| Total_Reads_Before | Total number of read pairs detected before filtering. |
| Total_Reads_After | Total number of read pairs retained and assigned to a gene after coverage and identity filtering. |
| Reads_Multi_Hits | Number of read pairs mapping to more than one gene. |
| Multi_Hit_Same_Score | Number of read pairs for which multiple genes had identical KMA scores. |
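To make the four report columns concrete, the sketch below derives them from per-read-pair hits. It is a hedged illustration, not the workflow's code: the `Hit` record and its fields, the function `mapping_report`, the inclusive (`>=`) thresholds, and the choice to count multi-hits only among hits that pass the coverage/identity filters are all assumptions; MoonCrater may compute these values differently.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    """One read-pair-to-gene hit (hypothetical record)."""
    gene: str
    coverage: float  # % coverage (hypothetical field)
    identity: float  # % identity (hypothetical field)
    score: int       # KMA alignment score


def mapping_report(hits_by_read, coverage=60, identity=80):
    """Summarise {read pair ID: [Hit, ...]} into the read_mapping_report.csv columns."""
    report = {
        "Total_Reads_Before": len(hits_by_read),  # read pairs with any hit
        "Total_Reads_After": 0,
        "Reads_Multi_Hits": 0,
        "Multi_Hit_Same_Score": 0,
    }
    for hits in hits_by_read.values():
        kept = [h for h in hits if h.coverage >= coverage and h.identity >= identity]
        if not kept:
            continue  # read pair removed by the coverage/identity filter
        report["Total_Reads_After"] += 1
        if len({h.gene for h in kept}) > 1:
            report["Reads_Multi_Hits"] += 1
            best = max(h.score for h in kept)
            # Several distinct genes share the best score: ambiguous assignment.
            if len({h.gene for h in kept if h.score == best}) > 1:
                report["Multi_Hit_Same_Score"] += 1
    return report
```

Reporting `Multi_Hit_Same_Score` separately is useful because those read pairs cannot be assigned to a single gene on score alone, so they indicate how much of a sample's count depends on the tie-breaking rule.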
The `KMA_kraken2_link.py` script links the functional annotations produced by MoonCrater to the taxonomic assignments generated by Kraken2:
```bash
python3 KMA_kraken2_link.py \
  --resfinder_dir path/to/ResFinder_mapping_out \
  --kraken_dir path/to/kraken_mapping_files \
  --output_dir path/to/output \
  --coverage 60 \
  --identity 80
```

- `--resfinder_dir`: directory containing the KMA mapping outputs.
- `--kraken_dir`: directory containing the Kraken2 mapping output files used for taxonomic assignment. Input mapping file names are expected to start with the sample name and end with the `.kraken` extension: `{sample_name}*.kraken`.
- `--output_dir`: output directory for the species–ARG association table.
- `--coverage`: minimum coverage threshold (%) applied to resistance gene detection.
- `--identity`: minimum identity threshold (%) applied to resistance gene detection.
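The core idea of the linking step is a join on read IDs: each read pair assigned to an ARG is looked up in the Kraken2 output, and (taxon, gene) pairs are counted. The sketch below illustrates that join under stated assumptions; it is not the script's implementation. It parses Kraken2's standard tab-separated output (classified flag, read ID, taxon, length, k-mer LCA mapping). The functions `normalise_read_id`, `parse_kraken2` and `link_args_to_taxa` are hypothetical, as are the `read_to_gene` input (reads already filtered by coverage/identity), the stripping of `/1` and `/2` mate suffixes, and the `"unclassified"` label.

```python
from collections import Counter


def normalise_read_id(read_id):
    """Keep the first whitespace-delimited token and drop a /1 or /2 mate suffix.

    Assumption: KMA and Kraken2 outputs may write mate suffixes differently.
    """
    read_id = read_id.split()[0]
    return read_id[:-2] if read_id.endswith(("/1", "/2")) else read_id


def parse_kraken2(lines):
    """Map read ID -> taxon from Kraken2 standard output lines.

    Tab-separated fields: C/U flag, read ID, taxon (taxid, or "name (taxid N)"
    with --use-names), sequence length, k-mer LCA mapping.
    Unclassified (U) reads are skipped.
    """
    taxa = {}
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3 or fields[0] != "C":
            continue
        taxa[normalise_read_id(fields[1])] = fields[2]
    return taxa


def link_args_to_taxa(read_to_gene, kraken_lines):
    """Count (taxon, gene) pairs for reads already assigned to an ARG.

    read_to_gene (hypothetical): {read ID: gene} for reads that passed the
    coverage/identity filters. Reads without a Kraken2 classification are
    counted as "unclassified".
    """
    taxa = parse_kraken2(kraken_lines)
    pairs = Counter()
    for read_id, gene in read_to_gene.items():
        taxon = taxa.get(normalise_read_id(read_id), "unclassified")
        pairs[(taxon, gene)] += 1
    return pairs
```

Keeping unclassified reads as an explicit category, rather than dropping them, preserves the total ARG read count per sample, so the species–ARG table can be reconciled with the gene abundance table.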
References:

- Camacho C, Coulouris G, Avagyan V, Ma N, Papadopoulos J, Bealer K, Madden TL. BLAST+: architecture and applications. BMC Bioinformatics 2009; 10:421.
- Clausen PTLC, Aarestrup FM, Lund O. Rapid and precise alignment of raw reads against redundant databases with KMA. BMC Bioinformatics 2018; 19:307.
- Wood DE, Lu J, Langmead B. Improved metagenomic analysis with Kraken 2. Genome Biol 2019; 20:257. https://doi.org/10.1186/s13059-019-1891-0