Commit b394559

Merge pull request #150 from ctmrbio/develop
Version 0.4.0
2 parents da5113d + dc83506 commit b394559

75 files changed, with 5707 additions and 1640 deletions.

.circleci/config.yml

Lines changed: 100 additions & 0 deletions
@@ -0,0 +1,100 @@
+version: 2.0
+
+jobs:
+  setup_snakemake:
+    docker:
+      - image: continuumio/miniconda3:4.7.10
+    working_directory: ~/stag-mwc
+    steps:
+      - run:
+          name: Install Snakemake
+          command: |
+            conda info -a
+            conda config --add channels defaults
+            conda config --add channels conda-forge
+            conda config --add channels bioconda
+            conda create -q -y -n snakemake python=3.7 snakemake=5.6.0
+      - save_cache:
+          key: v1-snakemake-5.6.0-{{ .Environment.CIRCLE_SHA1 }}
+          paths:
+            - /opt/conda
+  bundle_dependencies:
+    docker:
+      - image: continuumio/miniconda3:4.7.10
+    working_directory: ~/stag-mwc
+    steps:
+      - run:
+          name: Bundle test data
+          command: |
+            #git clone https://github.com/ctmrbio/stag-mwc_test_data input
+            mkdir -pv input
+            touch input/test1_1.fq.gz input/test1_2.fq.gz
+            touch input/test2_1.fq.gz input/test2_2.fq.gz
+      - save_cache:
+          key: v1-stag-mwc_test_data-{{ .Environment.CIRCLE_SHA1 }}
+          paths:
+            - ~/stag-mwc/input
+  checkout_code:
+    docker:
+      - image: continuumio/miniconda3:4.7.10
+    working_directory: ~/stag-mwc
+    steps:
+      - checkout
+      - run:
+          name: Placeholder database files
+          command: |
+            mkdir -pv ~/stag-mwc/db/hg19
+            mkdir -pv ~/stag-mwc/db/metaphlan2
+            touch ~/stag-mwc/db/hg19/taxo.k2d
+            touch ~/stag-mwc/db/metaphlan2/test.1.bt2
+      - save_cache:
+          key: v1-stag-mwc_with_dbs-{{ .Revision }}
+          paths:
+            - ~/stag-mwc
+  validate_syntax_and_dag:
+    docker:
+      - image: continuumio/miniconda3:4.7.10
+    working_directory: ~/stag-mwc
+    steps:
+      - restore_cache:
+          key: v1-snakemake-5.6.0-{{ .Environment.CIRCLE_SHA1 }}
+      - restore_cache:
+          key: v1-stag-mwc_test_data-{{ .Environment.CIRCLE_SHA1 }}
+      - restore_cache:
+          key: v1-stag-mwc_with_dbs-{{ .Revision }}
+      - run:
+          name: Validate syntax and DAG
+          command: |
+            sed -i 's/assess_depth: False/assess_depth: True/' config.yaml
+            sed -i 's/sketch_compare: False/sketch_compare: True/' config.yaml
+            sed -i 's/kaiju: False/kaiju: True/' config.yaml
+            sed -i 's/kraken2: False/kraken2: True/' config.yaml
+            sed -i 's/metaphlan2: False/metaphlan2: True/' config.yaml
+            sed -i 's/humann2: False/humann2: True/' config.yaml
+            sed -i 's/antibiotic_resistance: False/antibiotic_resistance: True/' config.yaml
+            sed -i 's/assembly: False/assembly: True/' config.yaml
+            sed -i 's/binning: False/binning: True/' config.yaml
+            sed -i 's|db_path: \"\"|db_path: \"db/hg19\"|' config.yaml
+            sed -i 's|db: \"\"|db: \"db\"|' config.yaml
+            sed -i 's|bt2_db_dir: \"\"|bt2_db_dir: \"db/metaphlan2\"|' config.yaml
+            sed -i 's|bt2_index: \"\"|bt2_index: \"test\"|' config.yaml
+            sed -i 's|_db: \"\"|_db: \"db\"|' config.yaml
+            sed -i 's| index: \"\"| index: \"db\"|' config.yaml
+            sed -i 's|kmer_distrib: \"\"|kmer_distrib: \"db\"|' config.yaml
+            cat config.yaml
+            source activate snakemake
+            snakemake --use-conda --dryrun
+
+workflows:
+  version: 2
+  validate_syntax_and_dag:
+    jobs:
+      - setup_snakemake
+      - bundle_dependencies
+      - checkout_code
+      - validate_syntax_and_dag:
+          requires:
+            - setup_snakemake
+            - bundle_dependencies
+            - checkout_code
+
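The `sed` commands in the `validate_syntax_and_dag` job switch every optional workflow step from `False` to `True` in `config.yaml` before the dry run. As a rough illustration only, the same toggling could be sketched in Python. The function name `enable_steps` and the `STEPS` list are hypothetical and not part of StaG-mwc; the step names are copied from the `sed` commands. Unlike `sed` without the `g` flag, `str.replace` rewrites every occurrence, which should not matter for a config file with one setting per line.

```python
# Hypothetical helper, not part of StaG-mwc: mirrors the True/False toggles
# performed by the sed commands in the CircleCI job.
STEPS = [
    "assess_depth",
    "sketch_compare",
    "kaiju",
    "kraken2",
    "metaphlan2",
    "humann2",
    "antibiotic_resistance",
    "assembly",
    "binning",
]


def enable_steps(config_text, steps=STEPS):
    """Return config_text with '<step>: False' replaced by '<step>: True'."""
    for step in steps:
        config_text = config_text.replace(f"{step}: False", f"{step}: True")
    return config_text


if __name__ == "__main__":
    # Small in-memory example; a real run would read and rewrite config.yaml.
    example = "kaiju: False\nkraken2: False\n"
    print(enable_steps(example))
```

After the configuration has been rewritten, the CI job activates the `snakemake` conda environment and runs `snakemake --use-conda --dryrun`, so Snakemake only has to build the dependency graph and no analysis is executed.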

.editorconfig

Lines changed: 0 additions & 15 deletions
This file was deleted.

.travis.yml

Lines changed: 0 additions & 27 deletions
This file was deleted.

CHANGELOG.md

Lines changed: 88 additions & 3 deletions
@@ -13,6 +13,91 @@ files), and the patch version is typically incremented for any set of changes
 committed to the master branch that does not trigger any of the aforementioned
 situations.

+## [0.4.0] Unreleased
+### Added
+- Added resource limiter for HUMAnN2 due to its intense use of huge temporary
+  files in the output folder. Activated with --resources humann2=X, where X is
+  the max number of parallel instances of humann2 to run.
+- Added groot report parameters `covcutoff` and `lowcov` to config file.
+- Added automatic plot of proportion human reads. Included in report.
+- Added assembly and binning using MetaWRAP.
+- Added the possibility to run in Singularity with conda using
+  `--use-singularity --use-conda`.
+- Added more MetaPhlAn2 data in output report.
+- Added HUMAnN2 summary tables to output report.
+- Added combined table and Krona plot for Kraken2 to output report.
+- Added metagenomic assembly, binning and "blobology", using MEGAHIT or SPAdes,
+  with binning using CONCOCT and MetaBat (MaxBin2 is not working), implemented
+  via MetaWRAP.
+- Added KrakenTools from Jennifer Lu under the MIT license.
+- Added basic syntax and DAG validation test in CircleCI.
+- Added possibility to skip read QC and/or host removal: will symlink relevant
+  files into the relevant output directories so Snakemake can continue
+  without performing read QC and/or host removal.
+- Added MultiQC, mainly to summarize fastp logs.
+- Added Bracken abundance estimation on Kraken2 report files; added Bracken to
+  the StaG conda environment. Also added Bracken abundance filtering rules so
+  users can include/exclude certain taxa.
+- Added "all_samples" summary output files in a more common format for all
+  taxonomic profilers, called `mpa_style`. They are not identical, but very
+  similar, with full lineage listings for all detected taxa.
+- Added pigz v2.4 to the main StaG conda environment.
+- Added summary with read counts passing preprocessing steps as a table and
+  basic line plot. Only runs if both read QC and host removal are performed.
+
+### Fixed
+- Fixed bug in Slurm profile handling of cancelled/failed jobs.
+- MetaPhlAn2 rule now correctly detects if no database path has been entered in
+  the config file.
+- HUMAnN2 rule now correctly detects if no database path has been entered in
+  the config file.
+- Kraken2 rule now correctly detects if no database is available at the path
+  given in the config file.
+
+### Changed
+- Updated Python to 3.7 in main stag-mwc conda environment.
+- Updated Kaiju to 1.7.2.
+- Updated BBMap to 38.68.
+- Updated sambamba to 0.7.0.
+- Updated Kraken2 to 2.0.8_beta.
+- Updated seaborn to 0.8.1, added fastcluster to main stag-mwc conda env, installed via pip.
+- Updated MetaPhlAn2 to 2.96.1.
+- Updated HUMAnN2 to 2.8.1.
+- Updated GROOT to v0.8.5.
+- Updated plot_metaphlan2_heatmap.py to 0.3.
+- Changed output filenames ending with `.tsv` to `.txt` to avoid pretty HTML
+  representations in report.
+- Replaced BBMap-based host removal with Kraken2, substantially reducing time
+  and resource requirements.
+- Added read length window filter before groot alignment step.
+- Changed logdir of remove_human rule to LOGDIR/remove_human instead of
+  OUTDIR/logs/remove_human.
+- Improved make_count_table.py so it can use TSV annotation files with multiple
+  columns. Added config setting for which columns to include.
+- Cleaned up sketch comparison cluster heatmap plotting script, making it more
+  robust to variations in output from different BBTools versions.
+- Changed the call to merge_metaphlan_tables.py due to undocumented
+  CLI change in latest conda version.
+- Changed Kaiju summary report output filenames.
+- Split biobakery environment into metaphlan2 and humann2 so users only
+  interested in MetaPhlAn2 do not have to download the huge HUMAnN2.
+- Replaced the outdated metaphlan_hclust_heatmap.py with a custom
+  plot_metaphlan2_heatmap.py script.
+- Defined some low-impact summary and plotting rules as localrules.
+- Reworded all rules relating to human removal to "host removal", and changed output folder
+  structure accordingly.
+- Renamed output folder and file names for quality and adapter trimming.
+- Set Kraken2 --confidence to 0.1 by default.
+- Adjusted HUMAnN2 cores to 20 (up from 8).
+- Adjusted MetaPhlAn2 cores to 5 (up from 4).
+
+
+### Removed
+- Removed outdated database download rules for Centrifuge, MetaPhlAn2, Kaiju,
+  Kraken2, HUMAnN2.
+- Replaced FastQC + BBDuk with fastp adapter trimming and quality filtering.
+- Removed support for Centrifuge.
+

 ## [0.3.2-dev]
 ### Added
@@ -48,9 +133,9 @@ situations.
 - Substantial improvements to Rackham Slurm profile, focusing on better Slurm
   log handling.
 - A few low-impact rules that can be run locally are now declared as localrules.
-- Replaced MEGARes antibiotic resistance gene mapping with GROOT resistance
-  gene profiling using gene variation graphs, using a default database based on
-  arg-annot.
+- Replaced MEGARes antibiotic resistance gene mapping with Groot resistance gene
+  profiling using gene variation graphs.
+- Increased resource requirements for remove_human step in Rackham cluster profile.
 - Added clustered sketch comparison output heatmap.
 - Updated MetaPhlAn2 to version 2.7.8, with corresponding changes to config file.
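The 0.4.0 changelog notes that skipping read QC and/or host removal works by symlinking the relevant files into the output directories, so that downstream rules still find their expected inputs. The following is a minimal sketch of that idea under stated assumptions, not the workflow's actual rule: the function name `link_unprocessed_reads`, the directory names, and the `*.fq.gz` pattern are all hypothetical choices made for illustration.

```python
import tempfile
from pathlib import Path


def link_unprocessed_reads(input_dir, output_dir, pattern="*.fq.gz"):
    """Symlink every file matching pattern from input_dir into output_dir.

    Hypothetical illustration: downstream steps can then read from output_dir
    as if the skipped processing step had produced the files.
    """
    output_dir.mkdir(parents=True, exist_ok=True)
    links = []
    for source in sorted(input_dir.glob(pattern)):
        link = output_dir / source.name
        # Leave existing links or files alone so reruns stay idempotent.
        if not (link.is_symlink() or link.exists()):
            link.symlink_to(source.resolve())
        links.append(link)
    return links


if __name__ == "__main__":
    # Demonstration in a throwaway directory with empty placeholder files.
    with tempfile.TemporaryDirectory() as tmp:
        reads = Path(tmp) / "input"
        reads.mkdir()
        (reads / "test1_1.fq.gz").touch()
        (reads / "test1_2.fq.gz").touch()
        for link in link_unprocessed_reads(reads, Path(tmp) / "host_removal"):
            print(link.name, "->", link.resolve())
```

Using symlinks rather than copies avoids duplicating large read files while keeping the output folder layout that later steps expect.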

README.md

Lines changed: 12 additions & 17 deletions
@@ -2,19 +2,16 @@

 [![DOI](https://zenodo.org/badge/125840716.svg)](https://zenodo.org/badge/latestdoi/125840716)
 [![Snakemake](https://img.shields.io/badge/snakemake-≥4.8.1-brightgreen.svg)](https://snakemake.bitbucket.io)
-<!--[![Build Status](https://travis-ci.org/snakemake-workflows/mwc.svg?branch=master)](https://travis-ci.org/snakemake-workflows/mwc) -->
+[![CircleCI](https://circleci.com/gh/ctmrbio/stag-mwc/tree/master.svg?style=svg)](https://circleci.com/gh/ctmrbio/stag-mwc/tree/master)
+
 ![StaG mwc logo](docs/source/img/stag_head_text.png "StaG mwc")

 This repo contains the code for a Snakemake workflow of the StaG Metagenomic
 Workflow Collaboration (mwc). Currently, the project focus is a barebones
 metagenomics analysis workflow to produce primary output files from several
 different metagenomic analysis tools.

-## Authors
-
-* Fredrik Boulund (@boulund)
-* Lisa Olsson (@lis4matilda)
-* (your name here)
+Go to https://stag-mwc.readthedocs.org for the full documentation.

 ## Usage

@@ -26,7 +23,7 @@ StaG-mwc. Most people would probably want to install
 base environment. Conda will automatically install the required versions of
 all tools required to run StaG-mwc.

-### Step 1: Install workflow
+### Step 1: Clone workflow
 To use StaG-mwc, you need a local copy of the workflow repository. Start by
 making a clone of the repository:

@@ -62,17 +59,10 @@ documentation](https://snakemake.readthedocs.io) for further details on how to
 run Snakemake workflows on other types of cluster resources.


-### Automatic database download
-The workflow offers steps that can automatically download the required
-reference databases. Note that this step is normally only required once, as
-previously downloaded databases are reused. See the
-[official documentation](https://stag-mwc.readthedocs.org) for more information.
-
-
 ## Testing
-Tests are currently not implemented. The ambition is that StaG-mwc will contain
-extensive tests to verify functionality. We plan to implement automated linting
-and testing on a small test data set via continuous integration.
+A very basic continuous integration test is currently in place. It merely
+validates the syntax by trying to let Snakemake build the dependency graph if
+all outputs are activated.


 ## Contributing
@@ -83,5 +73,10 @@ If you intend to modify or further develop this workflow, you are welcome to
 fork this repository. Please consider sharing potential improvements via a pull
 request.

+## Citing
+If you find StaG-mwc useful in your research, please cite the Zenodo DOI:
+https://zenodo.org/record/1483891
+
+
 # Logo attribution
 <a href="https://www.freepik.com/free-photos-vectors/animal">Animal vector created by Patrickss - Freepik.com</a>
