Code used for metagenomic analysis of the vaginal resistome and the associated website (InSPIRe Cohort).

motleystate/InSPIRe_VagResist

InSPIRe Vaginal Resistome Paper

This repository contains all of the code used for bioinformatics processing, statistical analysis and figure preparation for "Bacterial Community Structure Shapes the Vaginal Resistome During Pregnancy", by Nassim Boutouchent, Agnes Baud, Asmaa Tazi, Luce Landraud, InSPIRe Consortium, Laurent Mandelbrot, Claire Poyart and Sean P. Kennedy.

Repository content

This repository is divided into two main components:

  1. Scripts and workflow for functional gene detection and quantification (MoonCrater): a Snakemake workflow used to detect and quantify functional genes, including antibiotic resistance genes (ARGs), from shotgun metagenomic data using the ResFinder database. This workflow also includes an independent script (KMA_kraken2_link.py) that links functional gene annotations to taxonomic assignments produced by Kraken2.

    Files:

| File | Description |
|---|---|
| `snakefile` | MoonCrater workflow (functional gene detection and counting) |
| `config.yml` | MoonCrater configuration file |
| `KMA_count.py` | Script used to generate ARG count tables from KMA output |
| `KMA_kraken2_link.py` | Script linking functional gene annotations to taxonomic annotations produced by Kraken2 |

  2. Downstream analyses and data used for the Vaginal Resistome paper
    InSPIRe_VaginalResistome_paper/: this folder contains all analysis notebooks and datasets required to reproduce the results presented in the main manuscript.

Note: Taxonomic assignments and the species-level count table were previously reported by Baud et al., 2023 (https://doi.org/10.1038/s41598-023-36126-z).

Supplementary data for the study are provided as rendered HTML pages on the Quarto_site branch, allowing interactive exploration of figures, tables and statistical reports: https://motleystate.github.io/InSPIRe_VagResist/

MoonCrater: a metagenomic shotgun read-count workflow for antibiotic resistance genes

MoonCrater is a Snakemake-based workflow used in our study to identify and quantify antibiotic resistance genes (ARGs) from metagenomic data using the ResFinder database. It maps reads against the KMA-indexed ResFinder database and produces gene count tables for resistome analysis.
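To illustrate the shape of a gene count table, the sketch below merges per-sample read counts into a single genes × samples CSV. It is a minimal sketch, not the code of MoonCrater or KMA_count.py: it assumes per-sample `{gene: read_count}` dictionaries have already been extracted from the KMA output, and the function name `build_count_table` and the gene names are hypothetical.

```python
import csv
import io
from typing import Mapping


def build_count_table(per_sample: Mapping[str, Mapping[str, int]]) -> str:
    """Merge per-sample {gene: read_count} dicts into a genes x samples CSV string.

    Rows are genes, columns are samples; a gene absent from a sample gets 0.
    """
    samples = sorted(per_sample)
    genes = sorted({gene for counts in per_sample.values() for gene in counts})

    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["gene", *samples])
    for gene in genes:
        writer.writerow([gene, *(per_sample[s].get(gene, 0) for s in samples)])
    return buf.getvalue()


if __name__ == "__main__":
    # Hypothetical per-sample read counts; gene names are illustrative only.
    counts = {
        "S1": {"tet(M)": 12, "erm(B)": 3},
        "S2": {"tet(M)": 5},
    }
    print(build_count_table(counts), end="")
    # Prints:
    # gene,S1,S2
    # erm(B),3,0
    # tet(M),12,5
```

Filling absent genes with 0 keeps every sample column the same length, which is what downstream resistome statistics usually expect from a count matrix.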

Environment

Usage

Clone the repository:

```shell
git clone https://github.com/motleystate/inspire_VagResist.git
```

Before running MoonCrater, edit config.yml to match your data:

```yaml
paths:
  output_dir: "path/to/output_directory"  # output directory for MoonCrater results
  db_resfinder: "path/to/db_resfinder"    # ResFinder database (KMA-indexed)

# NB: to be detected, sample files must follow the pattern:
#   * if paired: {sample_name}{pair_prefix}{read}{suffix}
#   * if single: {sample_name}{suffix}
input_data:
  source_dir: "path/to/reads_files"       # input directory containing the read files
  paired: True                            # set to False if the samples are not paired
  pair_prefix: "_R"
  suffix: "_trimmed_nohost.fastq.gz"
  format: "fastq"                         # input format: "fasta" or "fastq" (compressed versions are also accepted)

kma_count:
  coverage: 60                            # minimum coverage (%)
  identity: 80                            # minimum identity (%)
```
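The naming rule above can be checked before launching the workflow. The following minimal sketch groups read files by sample name using the same pattern; it is a hypothetical helper (`detect_samples`), not part of MoonCrater, and it assumes the read labels are `1` and `2` for paired files. The workflow itself may detect samples differently (for instance with Snakemake's `glob_wildcards`).

```python
import re
from typing import Dict, Iterable, List


def detect_samples(
    filenames: Iterable[str],
    suffix: str,
    paired: bool = True,
    pair_prefix: str = "_R",
) -> Dict[str, List[str]]:
    """Group read files by sample name using the config.yml naming pattern.

    paired: {sample_name}{pair_prefix}{read}{suffix}, read assumed to be 1 or 2
    single: {sample_name}{suffix}
    Files that do not match the pattern are ignored.
    """
    if paired:
        regex = (r"^(?P<sample>.+)" + re.escape(pair_prefix)
                 + r"(?P<read>[12])" + re.escape(suffix) + r"$")
    else:
        regex = r"^(?P<sample>.+)" + re.escape(suffix) + r"$"
    pattern = re.compile(regex)

    samples: Dict[str, List[str]] = {}
    for name in sorted(filenames):
        match = pattern.match(name)
        if match:
            samples.setdefault(match.group("sample"), []).append(name)
    return samples


if __name__ == "__main__":
    files = [
        "S1_R1_trimmed_nohost.fastq.gz",
        "S1_R2_trimmed_nohost.fastq.gz",
        "S2_R1_trimmed_nohost.fastq.gz",
        "S2_R2_trimmed_nohost.fastq.gz",
        "notes.txt",  # ignored: does not follow the pattern
    ]
    # Groups the R1/R2 files under "S1" and "S2"; notes.txt is dropped.
    print(detect_samples(files, suffix="_trimmed_nohost.fastq.gz"))
```

Files that do not match are ignored, mirroring the note that samples must follow the pattern to be detected; with `paired: False`, a file named `S1_R1_trimmed_nohost.fastq.gz` would instead be read as sample `S1_R1`.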

Run MoonCrater:

```shell
snakemake -c N -j J
```

where N is the number of threads and J is the number of jobs that can be executed in parallel. MoonCrater requires a minimum of 12 threads to run.

NB: run_resfinder.py is provided as part of the ResFinder tool and is not included in this repository.

Outputs:

- gene_abundance_table_Cov_ID.csv: read-count table of antibiotic resistance genes.
- read_mapping_report.csv: per-sample read-processing summary report, with the following columns:

| Column name | Description |
|---|---|
| Total_Reads_Before | Total number of read pairs detected before filtering. |
| Total_Reads_After | Total number of read pairs retained and assigned to a gene after coverage and identity filtering. |
| Reads_Multi_Hits | Number of read pairs mapping to more than one gene. |
| Multi_Hit_Same_Score | Number of read pairs for which multiple genes had identical KMA scores. |
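The report columns can be read as counts over read pairs before and after filtering. The sketch below shows one plausible way to compute them; it is an illustration under stated assumptions, not MoonCrater's implementation. It assumes each read pair comes with a list of candidate hits carrying a KMA-style score plus the coverage and identity (%) of the matched gene, applies the `kma_count` thresholds from config.yml, counts Reads_Multi_Hits and Multi_Hit_Same_Score among retained read pairs only, and interprets "identical scores" as a tie for the best score. The `Hit` class and `mapping_report` function are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Hit:
    """One candidate gene hit for a read pair (hypothetical structure)."""
    gene: str
    score: int        # KMA-style alignment score (assumed field)
    coverage: float   # % coverage of the matched gene (assumed field)
    identity: float   # % identity to the matched gene (assumed field)


def mapping_report(reads: Dict[str, List[Hit]],
                   min_cov: float = 60, min_id: float = 80) -> Dict[str, int]:
    """Compute read_mapping_report-style counts for one sample."""
    report = {"Total_Reads_Before": len(reads), "Total_Reads_After": 0,
              "Reads_Multi_Hits": 0, "Multi_Hit_Same_Score": 0}
    for hits in reads.values():
        # Keep only hits passing both thresholds.
        kept = [h for h in hits if h.coverage >= min_cov and h.identity >= min_id]
        if not kept:
            continue  # read pair not assigned to any gene after filtering
        report["Total_Reads_After"] += 1
        if len({h.gene for h in kept}) > 1:
            report["Reads_Multi_Hits"] += 1
            best = max(h.score for h in kept)
            # Several distinct genes share the best score: a tie.
            if len({h.gene for h in kept if h.score == best}) > 1:
                report["Multi_Hit_Same_Score"] += 1
    return report


if __name__ == "__main__":
    # Hypothetical read pairs with hits (gene, score, coverage %, identity %).
    reads = {
        "pair1": [Hit("tet(M)", 100, 95.0, 99.0)],
        "pair2": [Hit("tet(M)", 90, 95.0, 99.0), Hit("tet(O)", 90, 90.0, 97.0)],
        "pair3": [Hit("erm(B)", 80, 95.0, 99.0), Hit("erm(C)", 60, 70.0, 85.0)],
        "pair4": [Hit("mef(A)", 70, 40.0, 99.0)],  # fails the 60 % coverage threshold
    }
    print(mapping_report(reads))
    # {'Total_Reads_Before': 4, 'Total_Reads_After': 3, 'Reads_Multi_Hits': 2, 'Multi_Hit_Same_Score': 1}
```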

Species-resolved associations with antibiotic resistance

The KMA_kraken2_link.py script links the functional annotations produced by MoonCrater to the taxonomic assignments generated by Kraken2.

```shell
python3 KMA_kraken2_link.py \
  --resfinder_dir path/to/ResFinder_mapping_out \
  --kraken_dir path/to/kraken_mapping_files \
  --output_dir path/to/output \
  --coverage 60 \
  --identity 80
```

| Argument | Description |
|---|---|
| `--resfinder_dir` | Directory containing the KMA mapping outputs. |
| `--kraken_dir` | Directory containing the Kraken2 output files used for taxonomic assignment. File names are expected to start with the sample name and end with the `.kraken` extension: `{sample_name}*.kraken`. |
| `--output_dir` | Output directory for the species–ARG association table. |
| `--coverage` | Minimum coverage threshold (%) applied to resistance gene detection. |
| `--identity` | Minimum identity threshold (%) applied to resistance gene detection. |
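Conceptually, the link amounts to joining the two per-read annotations on the read ID and counting (taxon, gene) pairs. A minimal sketch follows, assuming gene assignments per read are already available as a dictionary and that read IDs are identical in the KMA and Kraken2 outputs; `parse_kraken2`, `link_genes_to_taxa` and `kraken_files_for_sample` are hypothetical names, not functions of KMA_kraken2_link.py. It parses Kraken2's standard tab-separated output (classification flag, read ID, taxon, length, LCA mapping) and skips unclassified reads.

```python
from collections import Counter
from typing import Dict, Iterable, List


def kraken_files_for_sample(filenames: Iterable[str], sample_name: str) -> List[str]:
    """Select files matching {sample_name}*.kraken."""
    return sorted(f for f in filenames
                  if f.startswith(sample_name) and f.endswith(".kraken"))


def parse_kraken2(lines: Iterable[str]) -> Dict[str, str]:
    """Map read ID -> taxon from Kraken2 standard output.

    Columns: C/U flag, read ID, taxon (taxid, or "name (taxid N)" with
    --use-names), length, LCA k-mer mapping. Unclassified ("U") reads are skipped.
    """
    taxa: Dict[str, str] = {}
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3 or fields[0] != "C":
            continue
        taxa[fields[1]] = fields[2]
    return taxa


def link_genes_to_taxa(read_genes: Dict[str, str], read_taxa: Dict[str, str]) -> Counter:
    """Count (taxon, gene) pairs for reads annotated by both tools."""
    return Counter(
        (read_taxa[read_id], gene)
        for read_id, gene in read_genes.items()
        if read_id in read_taxa
    )


if __name__ == "__main__":
    # Hypothetical Kraken2 lines and gene assignments; taxids are illustrative.
    kraken_lines = [
        "C\tr1\t1578\t150|150\t1578:116",
        "U\tr2\t0\t150|150\t0:116",
        "C\tr3\t1578\t150|150\t1578:116",
        "C\tr4\t1301\t150|150\t1301:116",
    ]
    read_genes = {"r1": "tet(M)", "r2": "tet(M)", "r3": "tet(M)", "r4": "erm(B)"}
    print(link_genes_to_taxa(read_genes, parse_kraken2(kraken_lines)))
    # Counter({('1578', 'tet(M)'): 2, ('1301', 'erm(B)'): 1})
```

A prefix match such as `{sample_name}*.kraken` is ambiguous when one sample name is a prefix of another (`S1` also matches `S10_run.kraken`), so sample names that are not prefixes of each other avoid mis-assigned files.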

References

  1. Camacho C, Coulouris G, Avagyan V, Ma N, Papadopoulos J, Bealer K, Madden TL. BLAST+: architecture and applications. BMC Bioinformatics 2009; 10:421.
  2. Clausen PTLC, Aarestrup FM, Lund O. Rapid and precise alignment of raw reads against redundant databases with KMA. BMC Bioinformatics 2018; 19:307.
  3. Wood DE, Lu J, Langmead B. Improved metagenomic analysis with Kraken 2. Genome Biol 2019; 20:257. https://doi.org/10.1186/s13059-019-1891-0
