Commit 207f42a

Merge branch 'master' of github.com:ctmrbio/stag-mwc
2 parents c0d0d22 + eda858a commit 207f42a

50 files changed: +1147 -638 lines changed

.circleci/config.yml

Lines changed: 0 additions & 105 deletions
This file was deleted.

.github/workflows/build_containers.yml

Lines changed: 1 addition & 1 deletion
@@ -30,7 +30,7 @@ jobs:
     steps:

       - name: Check out code for the container builds
-        uses: actions/checkout@v2
+        uses: actions/checkout@v3

       - name: Continue if Singularity Recipe exists
         run: |
Lines changed: 73 additions & 0 deletions
@@ -0,0 +1,73 @@
+name: Validate syntax and DAG
+on:
+  push:
+    branches:
+      - master
+      - develop
+  pull_request: [] # Do it on all PRs
+
+jobs:
+  validate-syntax-and-dag:
+    runs-on: ubuntu-latest
+    strategy:
+      fail-fast: false
+      matrix:
+        docker_tag:
+          - 'stable'
+    container:
+      image: snakemake/snakemake:${{ matrix.docker_tag }}
+
+    name: Validate syntax and DAG
+    steps:
+      - name: Check out code
+        uses: actions/checkout@v3
+
+      - name: Create empty input files
+        run: |
+          ls
+          mkdir -pv input
+          touch input/test1_1.fq.gz input/test1_2.fq.gz
+          touch input/test2_1.fq.gz input/test2_2.fq.gz
+
+      - name: Create placeholder db files
+        run: |
+          ls
+          mkdir -pv tmpdir
+          mkdir -pv db/hg19
+          mkdir -pv db/metaphlan
+          touch db/hg19/taxo.k2d
+          touch db/metaphlan/test.1.bt2
+
+      - name: Modify config.yaml
+        run: |
+          ls
+          sed -i 's/assess_depth: False/assess_depth: True/' config.yaml
+          sed -i 's/sketch_compare: False/sketch_compare: True/' config.yaml
+          sed -i 's/kaiju: False/kaiju: True/' config.yaml
+          sed -i 's/kraken2: False/kraken2: True/' config.yaml
+          sed -i 's/metaphlan: False/metaphlan: True/' config.yaml
+          sed -i 's/humann: False/humann: True/' config.yaml
+          sed -i 's/strainphlan: False/strainphlan: True/' config.yaml
+          sed -i 's/groot: False/groot: True/' config.yaml
+          sed -i 's/amrplusplus: False/amrplusplus: True/' config.yaml
+          sed -i 's/assembly: False/assembly: True/' config.yaml
+          sed -i 's/binning: False/binning: True/' config.yaml
+          sed -i 's|db_path: \"\"|db_path: \"db/hg19\"|' config.yaml
+          sed -i 's|db: \"\"|db: \"db\"|' config.yaml
+          sed -i 's|bt2_db_dir: \"\"|bt2_db_dir: \"db/metaphlan\"|' config.yaml
+          sed -i 's|bt2_index: \"\"|bt2_index: \"test\"|' config.yaml
+          sed -i 's|_db: \"\"|_db: \"db\"|' config.yaml
+          sed -i 's| index: \"\"| index: \"db\"|' config.yaml
+          sed -i 's|kmer_distrib: \"\"|kmer_distrib: \"db\"|' config.yaml
+          sed -i 's|tmpdir: \"/scratch\"|tmpdir: \"tmpdir\"|' config.yaml
+          cat config.yaml
+
+      - name: Run Snakemake
+        run: |
+          ls
+          snakemake --dryrun
+
+
+
+
+
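The "Modify config.yaml" step above rewrites the config by plain text substitution. As a rough illustration of what those `sed -i 's/old/new/'` calls do, here is a minimal Python sketch. It assumes literal matching, which is equivalent here because the patterns in the workflow contain no regex metacharacters apart from escaped quotes. The helper name `sed_substitute` and the small config fragment are hypothetical; the real layout of `config.yaml` is not shown in this diff.

```python
def sed_substitute(text: str, old: str, new: str) -> str:
    """Replace the first occurrence of the literal `old` on every line,
    mirroring sed 's/old/new/' without the g flag."""
    return "\n".join(line.replace(old, new, 1) for line in text.split("\n"))


# Hypothetical config fragment, used only for illustration.
sample_config = "\n".join([
    "assess_depth: False",
    "kraken2: False",
    'db_path: ""',
    'tmpdir: "/scratch"',
])

# A few of the substitutions from the workflow step, applied in the same order.
edits = [
    ("assess_depth: False", "assess_depth: True"),
    ("kraken2: False", "kraken2: True"),
    ('db_path: ""', 'db_path: "db/hg19"'),
    ('tmpdir: "/scratch"', 'tmpdir: "tmpdir"'),
]

patched = sample_config
for old, new in edits:
    patched = sed_substitute(patched, old, new)

print(patched)
```

Because the substitutions match substrings rather than whole keys, their order matters: a short pattern such as `db: ""` also matches any longer key that ends in `db`, which is presumably why `db_path` is handled before it.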

CHANGELOG.md

Lines changed: 30 additions & 1 deletion
@@ -14,14 +14,43 @@ committed to the master branch that does not trigger any of the aforementioned
 situations.


-## [0.5.1] Unreleased
+## [0.5.1] 2022-12-06
 ### Added
+- Produce Snakemake report in zip format instead of HTML due to the HTML report being
+  broken in the later versions of Snakemake.
+- Add KrakenUniq as taxonomic profiler as an alternative with lower false
+  positive rate than Kraken2.
+- Added samplesheet as alternative input file selection method; this also
+  enables providing custom sample names that are not based on a pattern in the input
+  filenames.
+- Samplesheet can be used to specify remote input files from S3 or HTTP/HTTPS sources.
+- Added `run_krona` setting for taxonomic profilers to make it possible to disable Krona
+  table and plot creation.

 ### Fixed
+- Corrected typo in `host_removal` rule concerning `keep_kreport` config flag.
+- Corrected typo in bowtie2 annotation counts output files leading to workflow
+  complaining about missing output files.
+- Removed unintended stdout printouts from various helper scripts and some
+  MetaPhlAn related rules.
+- Removed outdated mentions of MetaPhlAn2 in report.

 ### Changed
+- Replaced CircleCI automatic testing workflow with one implemented with GitHub Actions.
+- Updated MetaPhlAn to version 4.0.3.
+- Updated HUMAnN to version 3.6.
+- Modified area and MetaPhlAn heatmap plotting scripts to better deal
+  with MetaPhlAn 4 output formats.
+- Updated the documentation to reflect recent changes in StaG.
+- Updated KrakenTools to v1.2.
+- Updated `scripts/join_tables.py` to v1.1, which includes support for skipping
+  lines before the header.
+- Improved automatic report generation code in main Snakefile to be more
+  robust. Now works well also when --use-singularity or --jobs are used
+  simultaneously with --report.

 ### Removed
+- Removed old unmaintained DB download rules for groot, kaiju, kraken2.


 ## [0.5.0] 2021-11-18

README.md

Lines changed: 10 additions & 8 deletions
@@ -2,17 +2,16 @@

 [![DOI](https://zenodo.org/badge/125840716.svg)](https://zenodo.org/badge/latestdoi/125840716)
 [![Snakemake](https://img.shields.io/badge/snakemake-≥4.8.1-brightgreen.svg)](https://snakemake.bitbucket.io)
-[![CircleCI](https://circleci.com/gh/ctmrbio/stag-mwc/tree/master.svg?style=svg)](https://circleci.com/gh/ctmrbio/stag-mwc/tree/master)

 ![StaG mwc logo](docs/source/img/stag_head_text.png "StaG mwc")

-This repo contains the code for a Snakemake workflow of the StaG Metagenomic
-Workflow Collaboration (mwc). Currently, the project focus is a barebones
-metagenomics analysis workflow to produce primary output files from several
-different metagenomic analysis tools.
+This repo contains the StaG Metagenomic Workflow Collaboration (mwc) Snakemake
+workflow. The project focuses on providing a metagenomics analysis workflow to
+produce primary output files from several different metagenomic analysis tools.

 Go to https://stag-mwc.readthedocs.org for the full documentation.

+
 ## Usage

 ### Step 0: Install conda and Snakemake
@@ -23,8 +22,9 @@ StaG-mwc. Most people would probably want to install
 base environment. When running StaG with the `--use-conda` or
 `--use-singularity` flags, all dependencies are managed automatically. If
 using conda it will automatically install the required versions of all tools
-required to run StaG-mwc. There is no need to combine the flags: the
-Singularity images already contain all required dependencies.
+required to run StaG-mwc. There is no need to combine the conda and singularity
+flags: the Singularity images used by the workflow already contain all required
+dependencies.

 ### Step 1: Clone workflow
 To use StaG-mwc, you need a local copy of the workflow repository. Start by
@@ -40,7 +40,7 @@ cite the publications of the other tools used in your workflow.
 Configure the workflow according to your needs by editing the file
 `config.yaml`. The most common changes include setting the paths to input and
 output folders, and configuring what steps of the workflow should be included
-when running the workflow.
+when running the workflow.

 ### Step 3: Execute workflow
 Test your configuration by performing a dry-run via
@@ -65,6 +65,7 @@ Note that in all examples above, `--use-conda` can be replaced with
 `--use-singularity` to run in Singularity containers instead of using a locally
 installed conda. Read more about it under the Running section in the docs.

+
 ## Testing
 A very basic continuous integration test is currently in place. It merely
 validates the syntax by trying to let Snakemake build the dependency graph if
@@ -80,6 +81,7 @@ If you intend to modify or further develop this workflow, you are welcome to
 fork this repository. Please consider sharing potential improvements via a pull
 request.

+
 ## Citing
 If you find StaG-mwc useful in your research, please cite the Zenodo DOI:
 https://zenodo.org/badge/latestdoi/125840716

Snakefile

Lines changed: 51 additions & 12 deletions
@@ -8,18 +8,20 @@
 # https://stag-mwc.readthedocs.org

 from pathlib import Path
+import copy
+import subprocess
 import textwrap

 from snakemake.exceptions import WorkflowError
 from snakemake.utils import min_version
 min_version("5.5.4")

 from rules.publications import publications
-from scripts.common import UserMessages
+from scripts.common import UserMessages, SampleSheet

 user_messages = UserMessages()

-stag_version = "0.5.0"
+stag_version = "0.5.1"
 singularity_branch_tag = "-master" # Replace with "-master" before publishing new version

 configfile: "config.yaml"
@@ -33,7 +35,15 @@ TMPDIR = Path(config["tmpdir"])
 DBDIR = Path(config["dbdir"])
 all_outputs = []

-SAMPLES = set(glob_wildcards(INPUTDIR/config["input_fn_pattern"]).sample)
+if config["samplesheet"]:
+    samplesheet = SampleSheet(config["samplesheet"], keep_local=config["keep_local"], endpoint_url=config["s3_endpoint_url"])
+    SAMPLES = samplesheet.samples
+    INPUT_read1 = lambda w: samplesheet.sample_info[w.sample]["read1"]
+    INPUT_read2 = lambda w: samplesheet.sample_info[w.sample]["read2"]
+else:
+    SAMPLES = set(glob_wildcards(INPUTDIR/config["input_fn_pattern"]).sample)
+    INPUT_read1 = INPUTDIR/config["input_fn_pattern"].format(sample="{sample}", readpair="1")
+    INPUT_read2 = INPUTDIR/config["input_fn_pattern"].format(sample="{sample}", readpair="2")

 onstart:
     print("\n".join([
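The hunk above makes `INPUT_read1` and `INPUT_read2` either callables (samplesheet mode) or path patterns that still contain the `{sample}` wildcard (filename pattern mode). The following sketch illustrates that dual shape outside of Snakemake. The function `make_read_inputs`, the plain dict standing in for `samplesheet.sample_info`, and the pattern `{sample}_{readpair}.fq.gz` are assumptions made for illustration; the real `SampleSheet` class is not shown in this diff.

```python
from pathlib import Path
from types import SimpleNamespace


def make_read_inputs(sample_info, inputdir, pattern):
    """Return (read1, read2) input specs in one of two shapes.

    With sample_info (a dict of sample -> {"read1": ..., "read2": ...}),
    each spec is a function of a wildcards-like object. Without it, each
    spec is a path that keeps the literal "{sample}" placeholder.
    """
    if sample_info:
        read1 = lambda w: sample_info[w.sample]["read1"]
        read2 = lambda w: sample_info[w.sample]["read2"]
    else:
        # Fill in only the read pair; re-insert "{sample}" as a placeholder.
        read1 = inputdir / pattern.format(sample="{sample}", readpair="1")
        read2 = inputdir / pattern.format(sample="{sample}", readpair="2")
    return read1, read2


# Samplesheet mode: inputs are looked up per sample.
info = {"s1": {"read1": "remote/s1_R1.fq.gz", "read2": "remote/s1_R2.fq.gz"}}
r1, r2 = make_read_inputs(info, Path("input"), "{sample}_{readpair}.fq.gz")
wildcards = SimpleNamespace(sample="s1")  # stand-in for Snakemake wildcards
print(r1(wildcards), r2(wildcards))

# Pattern mode: inputs are wildcard patterns.
p1, p2 = make_read_inputs(None, Path("input"), "{sample}_{readpair}.fq.gz")
print(p1, p2)
```

Snakemake accepts both a string-like pattern and a function of the wildcards as a rule input, which is what allows downstream rules to use the same variable in either mode.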
@@ -46,9 +56,12 @@ onstart:
     )

     if len(SAMPLES) < 1:
-        raise WorkflowError("Found no samples! Check input file pattern and path in config.yaml")
+        raise WorkflowError("Found no samples! Check input file options in config.yaml")
     else:
-        print(f"Found the following samples in inputdir using input filename pattern '{config['input_fn_pattern']}':\n{SAMPLES}")
+        if config["samplesheet"]:
+            print(f"Found these samples in '{config['samplesheet']}':\n{SAMPLES}")
+        else:
+            print(f"Found these samples in '{config['inputdir']}' using input filename pattern '{config['input_fn_pattern']}':\n{SAMPLES}")


 #############################
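When no samplesheet is configured, the sample set comes from matching filenames against `input_fn_pattern`, and an empty set raises the `WorkflowError` shown above. As a rough illustration of that matching, here is a small sketch that emulates the idea with a regular expression instead of Snakemake's `glob_wildcards`. The function name `discover_samples` and the default pattern `{sample}_{readpair}.fq.gz` are assumptions; the pattern was chosen to fit the `test1_1.fq.gz` style files created in the CI workflow.

```python
import re


def discover_samples(filenames, pattern="{sample}_{readpair}.fq.gz"):
    """Collect the distinct {sample} values from filenames matching pattern."""
    # re.split with a capture group alternates literal text and wildcard names.
    parts = re.split(r"\{(\w+)\}", pattern)
    regex = ""
    seen = set()
    for i, part in enumerate(parts):
        if i % 2 == 0:
            regex += re.escape(part)        # literal text between wildcards
        elif part in seen:
            regex += f"(?P={part})"         # repeated wildcard must match again
        else:
            seen.add(part)
            regex += f"(?P<{part}>.+)"      # first use of a wildcard
    compiled = re.compile(regex)

    samples = set()
    for filename in filenames:
        match = compiled.fullmatch(filename)
        if match:
            samples.add(match.group("sample"))
    return samples


files = ["test1_1.fq.gz", "test1_2.fq.gz", "test2_1.fq.gz", "test2_2.fq.gz", "notes.txt"]
samples = discover_samples(files)
print(sorted(samples))

if len(samples) < 1:
    raise RuntimeError("Found no samples! Check input file options")
```

Using a set means that the two read-pair files of each sample collapse into one sample name, which is the same reason the Snakefile wraps the `glob_wildcards` result in `set(...)`.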
@@ -69,6 +82,7 @@ include: "rules/naive/bbcountunique.smk"
 #############################
 include: "rules/taxonomic_profiling/kaiju.smk"
 include: "rules/taxonomic_profiling/kraken2.smk"
+include: "rules/taxonomic_profiling/krakenuniq.smk"
 include: "rules/taxonomic_profiling/metaphlan.smk"

 #############################
@@ -173,10 +187,35 @@ onsuccess:
             Path("citations.rst").unlink()
         Path("citations.rst").symlink_to(citation_filename)

-        shell("{snakemake_call} --unlock".format(snakemake_call=argv[0]))
-        shell("{snakemake_call} --report {report}-{datetime}.html".format(
-            snakemake_call=argv[0],
-            report=config["report"],
-            datetime=report_datetime,
-            )
-        )
+        unlock_call = copy.deepcopy(argv)
+        unlock_call.append("--unlock")
+
+        report_args = copy.deepcopy(argv)
+        report_args.extend(["--report", f"{config['report']}-{report_datetime}.zip"])
+
+        # Report generation doesn't work if --jobs
+        # or --use-singularity are specified,
+        # so we strip all args related to these from argv
+        # for report generation call
+        skip = False
+        report_call = []
+        for arg in report_args:
+            if arg == "--use-singularity":
+                continue
+            if arg == "--singularity-args":
+                skip = True
+                continue
+            if arg == "--singularity-prefix":
+                skip = True
+                continue
+            if arg == "--jobs":
+                skip = True
+                continue
+            if skip:
+                skip = False
+                continue
+            report_call.append(arg)
+
+        subprocess.run(unlock_call)
+        subprocess.run(report_call)
+
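The loop added in this hunk removes `--use-singularity`, and removes `--singularity-args`, `--singularity-prefix` and `--jobs` together with the value that follows each, before re-invoking Snakemake with `--report`. A compact standalone sketch of the same idea follows. The function name `strip_report_incompatible_args` and the example command line are hypothetical. Unlike the loop above, the sketch consumes a pending value before checking anything else, so it is a variation and not a copy of the committed code.

```python
def strip_report_incompatible_args(argv):
    """Drop flags that interfere with report generation.

    Flags in flags_with_value are removed together with the next argument.
    """
    flags_without_value = {"--use-singularity"}
    flags_with_value = {"--singularity-args", "--singularity-prefix", "--jobs"}

    cleaned = []
    skip_next = False
    for arg in argv:
        if skip_next:
            # This argument is the value of a flag dropped just before it.
            skip_next = False
            continue
        if arg in flags_without_value:
            continue
        if arg in flags_with_value:
            skip_next = True
            continue
        cleaned.append(arg)
    return cleaned


# Hypothetical command line, for illustration only.
original = [
    "snakemake", "--use-conda",
    "--jobs", "8",
    "--use-singularity",
    "--singularity-args", "-B /db",
    "--report", "report.zip",
]
print(strip_report_incompatible_args(original))
```

Like the loop in the diff, this only recognizes the long, space-separated spellings. Forms such as `-j 8` or `--jobs=8` would pass through unchanged.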
