Commit 56e700d

Use pathlib (#31)
* Use pathlib in Snakefile
* Add logdir config param. Get tired because Snakemake doesn't support Path objects as input or log files.
* Use pathlib for all paths. Add version printout to Snakefile
* Add info about pathlib use
* Add details on branching structure to CONTRIBUTING.md
* Bump docs version
1 parent 1e2828f commit 56e700d

14 files changed

+299
-239
lines changed

CONTRIBUTING.md

Lines changed: 38 additions & 10 deletions
@@ -5,18 +5,41 @@ We use the issue tracker in Github. Submit issues for things such as
 bug reports, feature requests, or general improvement discussion topics.

 ## Submitting changes
-The typical procedure to develop new features or fix bugs in StaG-mwc looks
-something like this:
+The main branch of StaG-mwc should always be stable and reliable. All
+development should be based on the develop branch. Please create new feature
+branches from the develop branch. The develop branch is then merged into the
+master branch when enough improvements have accrued. The typical procedure to
+develop new features or fix bugs in StaG-mwc looks something like this:

 1. Fork or clone the repository.
-2. Create a branch with a descriptive name based on your intended changes using
-   dashes to separate words, e.g. `branch-to-add-megahit-assembly-step`.
-3. Insert your code into the respective folders, i.e. scripts, rules and envs.
-   Define the entry point of the workflow in the Snakefile and the main
-   configuration in the config.yaml file.
+2. Checkout the develop branch and create a new feature branch from there.
+   Use a descriptive name and use dashes to separate words:
+   ```
+   git checkout develop
+   git checkout -b add-megahit-assembly-step
+   ```
+3. Write or modify code in the scripts, rules and envs folders. Define the
+   entry point of the workflow in the Snakefile and the main configuration in the
+   config.yaml file.
+4. If a new feature has been added, document it in the Sphinx documentation.
 4. Commit changes to your fork/clone.
-5. Create a pull request (PR) with some motivation behind the work you have
-   done and possibly some explanations for tricky bits.
+5. Create a pull request (PR) with some descriptions of the work you have
+   done and possibly some explanations for potentially tricky bits.
+6. When the feature is considered complete, we bump the version number and
+   merge the PR back to the develop branch.
+
+### Releases
+New releases are made whenever enough new features have accrued on the develop
+branch. Before creating a new release, ensure the following things have been
+taken care of:
+
+* All pending features that should be included in the upcoming release are
+  merged into the develop branch.
+* Double check that documentation is up-to-date for implemented features.
+* Check that the version number in the documentation matches the Snakefile.
+
+Then, merge the develop branch into master, squashing all commits, and tag
+the new release.


 ## Code organization
@@ -76,7 +99,12 @@ designed to allow some inclusion logic in the main Snakefile, so components can
 be turned on or off without too much trouble. Output should typically be in a
 subfolder inside the overall `outdir` folder. `outdir` is available as a string
 in all rule files, as it is defined in the main Snakefile based on the value
-set in `config.yaml`.
+set in `config.yaml`.
+
+Declare paths to input, output and log files using the pathlib Path objects
+INPUTDIR, OUTDIR, and LOGDIR. Note that Snakemake is not yet fully pathlib
+compatible so convert Path objects to strings inside `expand` statements and
+log file declarations.

 Tools that require databases or other reference material to work can be
 confusing or annyoing to users of the workflow. To minimize the amount of
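
The pathlib guidance added to CONTRIBUTING.md can be tried outside of Snakemake. The following is a minimal sketch, not code from the repository: `expand_pattern` is a hypothetical stand-in for Snakemake's `expand` (so the example runs without Snakemake installed), and the directory and sample names are made up. It illustrates that `str()` on a Path keeps the `{wildcard}` placeholders intact, so the resulting string can still be filled in afterwards.

```python
from itertools import product
from pathlib import Path

# Assumed values; in the workflow these are read from config.yaml.
OUTDIR = Path("output_dir")
LOGDIR = Path("output_dir/logs")


def expand_pattern(pattern, **wildcards):
    """Hypothetical stand-in for Snakemake's expand(): fill a str pattern
    with every combination of the given wildcard values."""
    names = list(wildcards)
    combos = product(*(wildcards[name] for name in names))
    return [pattern.format(**dict(zip(names, combo))) for combo in combos]


# The / operator joins path components. Braces are ordinary characters to
# pathlib, so the placeholders survive the conversion back to str.
pattern = str(OUTDIR / "megares/{sample}.{output_type}")
outputs = expand_pattern(
    pattern,
    sample=["s1", "s2"],
    output_type=["sam.gz", "rpkm.txt"],
)

# Log file declarations get the same explicit str() conversion.
log_pattern = str(LOGDIR / "megares/{sample}.bbmap.stdout.log")

print(outputs)
print(log_pattern)
```

Keeping the directories as Path objects and converting only at the point where a plain string is required means the `str()` calls are easy to find and remove later, which is what the TODO comments in the rule files anticipate.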

Snakefile

Lines changed: 19 additions & 8 deletions
@@ -1,21 +1,32 @@
 # vim: syntax=python expandtab
 #
-# StaG
-# mwc - Metagenomic Workflow Collaboration
+# StaG Metagenomic Workflow Collaboration
+# StaG-mwc
 # Copyright (c) 2018 Authors
 #
-# Running snakemake -n in a clone of this repository should successfully
-# execute a test dry-run of the workflow.
+# Running snakemake --use-conda -n in a clone of this repository should
+# successfully execute a test dry run of the workflow.
+from pathlib import Path
+
 from snakemake.exceptions import WorkflowError
+from snakemake.utils import min_version
+min_version("4.8.1") # TODO: Bump version when Snakemake is pathlib compatible

-from sys import exit
-import os.path
+stag_version = "0.1.1-dev"
+print("="*60)
+print("StaG Metagenomic Workflow Collaboration".center(60))
+print("StaG-mwc".center(60))
+print(stag_version.center(60))
+print("="*60)

 configfile: "config.yaml"
-outdir = config["outdir"]
+INPUTDIR = Path(config["inputdir"])
+OUTDIR = Path(config["outdir"])
+LOGDIR = Path(config["logdir"])
+DBDIR = Path(config["dbdir"])
 all_outputs = []

-SAMPLES = set(glob_wildcards(config["inputdir"]+"/"+config["input_fn_pattern"]).sample)
+SAMPLES = set(glob_wildcards(INPUTDIR/config["input_fn_pattern"]).sample)

 #############################
 # Pre-processing
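
The `SAMPLES` line in the Snakefile relies on Snakemake's `glob_wildcards` to pull sample names out of the input file names. As a rough illustration of the idea only, the sketch below emulates that matching in plain Python. `pattern_to_regex` and `find_samples` are hypothetical helpers, not Snakemake functions; matching each wildcard with `.+` is an assumption about the default behaviour, and the file names are made up.

```python
import re
import tempfile
from pathlib import Path


def pattern_to_regex(pattern):
    """Turn '{sample}_R{readpair}.fastq.gz' into a regex with named groups.
    Assumes each wildcard name appears only once in the pattern."""
    # re.split with a capturing group alternates literal text and names.
    pieces = re.split(r"\{(\w+)\}", pattern)
    regex = ""
    for index, piece in enumerate(pieces):
        if index % 2 == 0:
            regex += re.escape(piece)             # literal text
        else:
            regex += "(?P<{}>.+)".format(piece)   # wildcard name
    return re.compile(regex)


def find_samples(inputdir, fn_pattern):
    """Collect the distinct values of the 'sample' wildcard in inputdir."""
    regex = pattern_to_regex(fn_pattern)
    samples = set()
    for path in Path(inputdir).iterdir():
        match = regex.fullmatch(path.name)
        if match:
            samples.add(match.group("sample"))
    return samples


# Demonstration with made-up file names in a temporary input directory.
with tempfile.TemporaryDirectory() as tmp:
    inputdir = Path(tmp)
    for name in ("A_R1.fastq.gz", "A_R2.fastq.gz", "B_R1.fastq.gz", "notes.txt"):
        (inputdir / name).touch()
    samples = find_samples(inputdir, "{sample}_R{readpair}.fastq.gz")

print(sorted(samples))  # ['A', 'B']
```

Wrapping the result in a set, as the Snakefile does, collapses the two read-pair files of each sample into a single sample name.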

config.yaml

Lines changed: 1 addition & 0 deletions
@@ -15,6 +15,7 @@
 inputdir: "input"
 input_fn_pattern: "{sample}_R{readpair}.fastq.gz"
 outdir: "output_dir"
+logdir: "output_dir/logs"
 dbdir: "databases" # Databases will be downloaded to this dir, if requested


docs/source/conf.py

Lines changed: 2 additions & 2 deletions
@@ -56,9 +56,9 @@
 # built documents.
 #
 # The short X.Y version.
-version = '0.1.0'
+version = '0.1.1'
 # The full version, including alpha/beta/rc tags.
-release = '0.1.0-dev'
+release = '0.1.1-dev'

 # reStructuredText prolog contains a string of reStructuredText that will be
 # included at the beginning of every source file that is read.
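
The release checklist in CONTRIBUTING.md asks that the version number in the documentation match the Snakefile. One way this could be checked automatically is sketched below. `read_assignment` is a hypothetical helper and not part of the repository, and the two text snippets are inlined stand-ins for the contents of the Snakefile and `docs/source/conf.py`.

```python
import re


def read_assignment(text, name):
    """Return the quoted string assigned to `name` at the start of a line,
    or None if no such assignment is found."""
    pattern = r"^{}\s*=\s*[\"']([^\"']+)[\"']".format(re.escape(name))
    match = re.search(pattern, text, re.MULTILINE)
    return match.group(1) if match else None


# Inlined stand-ins for the two files touched by this commit.
snakefile_text = 'stag_version = "0.1.1-dev"\nprint("="*60)\n'
conf_text = "version = '0.1.1'\nrelease = '0.1.1-dev'\n"

workflow_version = read_assignment(snakefile_text, "stag_version")
docs_release = read_assignment(conf_text, "release")
versions_match = workflow_version == docs_release

print(workflow_version, docs_release, versions_match)
```

In practice the two strings would be read from the files themselves, for example with `Path("Snakefile").read_text()`.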

rules/antibiotic_resistance/megares.smk

Lines changed: 37 additions & 29 deletions
@@ -1,84 +1,92 @@
 # Generic rules for detection of antibiotic resistance genes using MEGARes
+# TODO: Remove superfluous str conversions when Snakemake is pathlib compatible.
+from pathlib import Path
 from snakemake.exceptions import WorkflowError
-import os.path

 localrules:
     download_megares

-if not os.path.isdir(os.path.join(config["megares"]["db_path"], "ref")):
-    err_message = "No MEGARes database found at: '{}'!\n".format(config["megares"]["db_path"])
+megares_db_path = Path(config["megares"]["db_path"])
+if not Path(megares_db_path/"ref").exists():
+    err_message = "No MEGARes database found at: '{}'!\n".format(megares_db_path)
     err_message += "Specify the DB path in the megares section of config.yaml.\n"
-    err_message += "Run 'snakemake create_megares_index' to download and build a BBMap index in '{dbdir}/megares'\n".format(dbdir=config["dbdir"])
+    err_message += "Run 'snakemake create_megares_index' to download and build a BBMap index in '{dbdir}'\n".format(dbdir=DBDIR/"megares")
     err_message += "If you do not want to map reads against MEGARes for antibiotic resistance gene detection, set antibiotic_resistance: False in config.yaml"
     raise WorkflowError(err_message)

-megares_outputs = expand("{outdir}/megares/{sample}.{output_type}",
-        outdir=outdir,
+megares_outputs = expand(str(OUTDIR/"megares/{sample}.{output_type}"),
         sample=SAMPLES,
         output_type=("sam.gz", "mapped_reads.fq.gz", "mhist.txt", "covstats.txt", "rpkm.txt"))
 all_outputs.extend(megares_outputs)

 rule download_megares:
     """Download MEGARes database."""
     output:
-        config["dbdir"]+"/megares/megares_annotations_v1.01.csv",
-        config["dbdir"]+"/megares/megares_database_v1.01.fasta",
-        config["dbdir"]+"/megares/megares_to_external_header_mappings_v1.01.tsv",
+        DBDIR/"megares/megares_annotations_v1.01.csv",
+        DBDIR/"megares/megares_database_v1.01.fasta",
+        DBDIR/"megares/megares_to_external_header_mappings_v1.01.tsv",
+    log:
+        str(LOGDIR/"megares/megares.download.log")
     shadow:
         "shallow"
     params:
-        dbdir=config["dbdir"]+"/megares"
+        dbdir=DBDIR/"megares"
     shell:
         """
         cd {params.dbdir}
         wget http://megares.meglab.org/download/megares_v1.01.zip \
+        > {log} \
         && \
         unzip megares_v1.01.zip \
+        >> {log} \
         && \
         mv megares_v1.01/* . \
         && \
-        rm -rfv megares_v1.01 megares_v1.01.zip
+        rm -rfv megares_v1.01 megares_v1.01.zip \
+        >> {log}
         """


 rule create_megares_index:
     """Create BBMap index for MEGARes."""
     input:
-        fasta=config["dbdir"]+"/megares/megares_database_v1.01.fasta"
+        fasta=DBDIR/"megares/megares_database_v1.01.fasta"
     output:
-        config["dbdir"]+"/megares/ref/genome/1/chr1.chrom.gz",
-        config["dbdir"]+"/megares/ref/genome/1/info.txt",
-        config["dbdir"]+"/megares/ref/genome/1/scaffolds.txt.gz",
-        config["dbdir"]+"/megares/ref/genome/1/summary.txt",
-        config["dbdir"]+"/megares/ref/index/1/chr1_index_k13_c8_b1.block",
-        config["dbdir"]+"/megares/ref/index/1/chr1_index_k13_c8_b1.block2.gz",
+        DBDIR/"megares/ref/genome/1/chr1.chrom.gz",
+        DBDIR/"megares/ref/genome/1/info.txt",
+        DBDIR/"megares/ref/genome/1/scaffolds.txt.gz",
+        DBDIR/"megares/ref/genome/1/summary.txt",
+        DBDIR/"megares/ref/index/1/chr1_index_k13_c8_b1.block",
+        DBDIR/"megares/ref/index/1/chr1_index_k13_c8_b1.block2.gz",
+    log:
+        str(LOGDIR/"megares/megares.bbmap_index.log")
     shadow:
         "shallow"
     conda:
         "../../envs/stag-mwc.yaml"
     params:
-        dbdir=config["dbdir"]+"/megares"
+        dbdir=DBDIR/"megares"
     shell:
         """
-        bbmap.sh ref={input} path={params.dbdir}
+        bbmap.sh ref={input} path={params.dbdir} > {log}
         """


 megares_config = config["megares"]
 rule bbmap_to_megares:
     """BBMap to MEGARes."""
     input:
-        read1=config["outdir"]+"/filtered_human/{sample}_R1.filtered_human.fq.gz",
-        read2=config["outdir"]+"/filtered_human/{sample}_R2.filtered_human.fq.gz",
+        read1=OUTDIR/"filtered_human/{sample}_R1.filtered_human.fq.gz",
+        read2=OUTDIR/"filtered_human/{sample}_R2.filtered_human.fq.gz",
     output:
-        sam=config["outdir"]+"/megares/{sample}.sam.gz",
-        mapped_reads=config["outdir"]+"/megares/{sample}.mapped_reads.fq.gz",
-        covstats=config["outdir"]+"/megares/{sample}.covstats.txt",
-        rpkm=config["outdir"]+"/megares/{sample}.rpkm.txt",
-        mhist=config["outdir"]+"/megares/{sample}.mhist.txt",
+        sam=OUTDIR/"megares/{sample}.sam.gz",
+        mapped_reads=OUTDIR/"megares/{sample}.mapped_reads.fq.gz",
+        covstats=OUTDIR/"megares/{sample}.covstats.txt",
+        rpkm=OUTDIR/"megares/{sample}.rpkm.txt",
+        mhist=OUTDIR/"megares/{sample}.mhist.txt",
     log:
-        stdout=config["outdir"]+"/logs/megares/{sample}.bbmap.stdout.log",
-        stderr=config["outdir"]+"/logs/megares/{sample}.bbmap.statsfile.txt"
+        stdout=str(LOGDIR/"megares/{sample}.bbmap.stdout.log"),
+        stderr=str(LOGDIR/"megares/{sample}.bbmap.statsfile.txt"),
     shadow:
         "shallow"
     conda:
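
The database check near the top of megares.smk swaps `os.path.isdir` for a pathlib test. The sketch below shows the same kind of check in isolation, under a few assumptions: `MissingDatabaseError` is a hypothetical stand-in for `snakemake.exceptions.WorkflowError` so the example runs without Snakemake, `check_db` is an illustrative helper, and the paths are temporary ones. Note that `Path.is_dir()` is the closer pathlib counterpart of `os.path.isdir`, since `exists()` is also true for regular files, and that `db_path / "ref"` is already a Path, so no further `Path(...)` wrapper is needed.

```python
import tempfile
from pathlib import Path


class MissingDatabaseError(RuntimeError):
    """Hypothetical stand-in for snakemake.exceptions.WorkflowError."""


def check_db(db_path, name):
    """Raise if db_path does not contain a 'ref' directory."""
    db_path = Path(db_path)
    if not (db_path / "ref").is_dir():
        raise MissingDatabaseError(
            "No {} database found at: '{}'!\n".format(name, db_path)
        )
    return db_path


with tempfile.TemporaryDirectory() as tmp:
    db = Path(tmp) / "megares"

    # Before the index exists the check fails with a readable message.
    try:
        check_db(db, "MEGARes")
        failed_before = False
    except MissingDatabaseError as error:
        failed_before = True
        message = str(error)

    # After creating the 'ref' directory the same check passes.
    (db / "ref").mkdir(parents=True)
    passed_after = check_db(db, "MEGARes") == db

print(failed_before, passed_after)
```

Path objects format cleanly with `str.format`, which is why the error messages in the rule file can take `megares_db_path` directly.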

rules/mappers/bbmap.smk

Lines changed: 27 additions & 27 deletions
@@ -1,63 +1,63 @@
 # Rules for generic read mapping using BBMap
+# TODO: Remove superfluous str conversions when Snakemake is pathlib compatible.
+from pathlib import Path
+
 from snakemake.exceptions import WorkflowError
-import os.path

 localrules:
     bbmap_counts_table
     bbmap_featureCounts

-if not os.path.isdir(os.path.join(config["bbmap"]["db_path"], "ref")):
-    err_message = "BBMap index not found at: '{}'\n".format(config["bbmap"]["db_path"])
+db_path = Path(config["bbmap"]["db_path"])
+if not Path(db_path/"ref").exists():
+    err_message = "BBMap index not found at: '{}'\n".format(db_path)
     err_message += "Check path in config setting 'bbmap:db_path'.\n"
     err_message += "If you want to skip mapping with BBMap, set mappers:bbmap:False in config.yaml."
     raise WorkflowError(err_message)

 # Add final output files from this module to 'all_outputs' from the main
 # Snakefile scope. SAMPLES is also from the main Snakefile scope.
-bbmap_alignments = expand("{outdir}/bbmap/{db_name}/{sample}.{output_type}",
-        outdir=config["outdir"],
+bbmap_alignments = expand(str(OUTDIR/"bbmap/{db_name}/{sample}.{output_type}"),
         db_name=config["bbmap"]["db_name"],
         sample=SAMPLES,
         output_type=("sam.gz", "covstats.txt", "rpkm.txt"))
-counts_table = expand("{outdir}/bbmap/{db_name}/all_samples.counts_table.tab",
-        outdir=config["outdir"],
+counts_table = expand(str(OUTDIR/"bbmap/{db_name}/all_samples.counts_table.tab"),
         db_name=config["bbmap"]["db_name"],
         sample=SAMPLES)
-featureCounts = expand("{outdir}/bbmap/{db_name}/all_samples.featureCounts{output_type}",
-        outdir=config["outdir"],
+featureCounts = expand(str(OUTDIR/"bbmap/{db_name}/all_samples.featureCounts{output_type}"),
         db_name=config["bbmap"]["db_name"],
         sample=SAMPLES,
         output_type=["", ".summary", ".table.tsv"])
 all_outputs.extend(bbmap_alignments)
 if config["bbmap"]["counts_table"]["annotations"]:
-    if not os.path.isfile(config["bbmap"]["counts_table"]["annotations"]):
+    if not Path(config["bbmap"]["counts_table"]["annotations"]).exists():
         err_message = "BBMap counts table annotations not found at: '{}'\n".format(config["bbmap"]["counts_table"]["annotations"])
         err_message += "Check path in config setting 'bbmap:counts_table:annotations'.\n"
         err_message += "If you want to skip read counts summary for BBMap, set bbmap:counts_table:annotations to '' in config.yaml."
         raise WorkflowError(err_message)
     all_outputs.extend(counts_table)
 if config["bbmap"]["featureCounts"]["annotations"]:
-    if not os.path.isfile(config["bbmap"]["featureCounts"]["annotations"]):
+    if not Path(config["bbmap"]["featureCounts"]["annotations"]).exists():
         err_message = "BBMap featureCounts annotations not found at: '{}'\n".format(config["bbmap"]["featureCounts"]["annotations"])
         err_message += "Check path in config setting 'bbmap:featureCounts:annotations'.\n"
         err_message += "If you want to skip mapping with BBMap, set mappers:bbmap:False in config.yaml."
         raise WorkflowError(err_message)
     all_outputs.extend(featureCounts)

 bbmap_config = config["bbmap"]
-bbmap_output_folder = config["outdir"]+"/bbmap/{db_name}/".format(db_name=bbmap_config["db_name"])
+bbmap_output_folder = OUTDIR/"bbmap/{db_name}".format(db_name=bbmap_config["db_name"])
 rule bbmap:
     """BBMap"""
     input:
-        read1=config["outdir"]+"/filtered_human/{sample}_R1.filtered_human.fq.gz",
-        read2=config["outdir"]+"/filtered_human/{sample}_R2.filtered_human.fq.gz",
+        read1=OUTDIR/"filtered_human/{sample}_R1.filtered_human.fq.gz",
+        read2=OUTDIR/"filtered_human/{sample}_R2.filtered_human.fq.gz",
     output:
-        sam=bbmap_output_folder+"{sample}.sam.gz",
-        covstats=bbmap_output_folder+"{sample}.covstats.txt",
-        rpkm=bbmap_output_folder+"{sample}.rpkm.txt",
+        sam=bbmap_output_folder/"{sample}.sam.gz",
+        covstats=bbmap_output_folder/"{sample}.covstats.txt",
+        rpkm=bbmap_output_folder/"{sample}.rpkm.txt",
     log:
-        stdout=config["outdir"]+"/logs/bbmap/{sample}.bbmap.stdout.log",
-        stderr=config["outdir"]+"/logs/bbmap/{sample}.bbmap.statsfile.txt"
+        stdout=str(LOGDIR/"bbmap/{sample}.bbmap.stdout.log"),
+        stderr=str(LOGDIR/"bbmap/{sample}.bbmap.statsfile.txt"),
     shadow:
         "shallow"
     conda:
@@ -87,13 +87,13 @@ rule bbmap:

 rule bbmap_counts_table:
     input:
-        rpkms=expand(config["outdir"]+"/bbmap/{dbname}/{sample}.rpkm.txt",
+        rpkms=expand(str(OUTDIR/"bbmap/{dbname}/{sample}.rpkm.txt"),
                 dbname=bbmap_config["db_name"],
                 sample=SAMPLES)
     output:
-        counts=config["outdir"]+"/bbmap/{dbname}/all_samples.counts_table.tab".format(dbname=bbmap_config["db_name"]),
+        counts=OUTDIR/"bbmap/{dbname}/all_samples.counts_table.tab".format(dbname=bbmap_config["db_name"]),
     log:
-        config["outdir"]+"/logs/bbmap/{dbname}/all_samples.counts_table.log".format(dbname=bbmap_config["db_name"])
+        str(LOGDIR/"bbmap/{dbname}/all_samples.counts_table.log".format(dbname=bbmap_config["db_name"]))
     shadow:
         "shallow"
     conda:
@@ -115,15 +115,15 @@ rule bbmap_counts_table:
 fc_config = bbmap_config["featureCounts"]
 rule bbmap_featureCounts:
     input:
-        bams=expand(config["outdir"]+"/bbmap/{dbname}/{sample}.sam.gz",
+        bams=expand(str(OUTDIR/"bbmap/{dbname}/{sample}.sam.gz"),
                 dbname=bbmap_config["db_name"],
                 sample=SAMPLES)
     output:
-        counts=config["outdir"]+"/bbmap/{dbname}/all_samples.featureCounts".format(dbname=bbmap_config["db_name"]),
-        counts_table=config["outdir"]+"/bbmap/{dbname}/all_samples.featureCounts.table.tsv".format(dbname=bbmap_config["db_name"]),
-        summary=config["outdir"]+"/bbmap/{dbname}/all_samples.featureCounts.summary".format(dbname=bbmap_config["db_name"]),
+        counts=OUTDIR/"bbmap/{dbname}/all_samples.featureCounts".format(dbname=bbmap_config["db_name"]),
+        counts_table=OUTDIR/"bbmap/{dbname}/all_samples.featureCounts.table.tsv".format(dbname=bbmap_config["db_name"]),
+        summary=OUTDIR/"bbmap/{dbname}/all_samples.featureCounts.summary".format(dbname=bbmap_config["db_name"]),
     log:
-        config["outdir"]+"/logs/bbmap/{dbname}/all_samples.featureCounts.log".format(dbname=bbmap_config["db_name"])
+        str(LOGDIR/"bbmap/{dbname}/all_samples.featureCounts.log".format(dbname=bbmap_config["db_name"]))
     shadow:
         "shallow"
     conda:
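
Several lines in bbmap.smk combine the `/` operator with `.format()` in one expression, for example in `bbmap_output_folder`. The sketch below spells out how Python evaluates such an expression; the directory and database names are made-up values, not ones from the repository.

```python
from pathlib import Path

OUTDIR = Path("output_dir")         # assumed value of config["outdir"]
LOGDIR = Path("output_dir/logs")    # assumed value of config["logdir"]
db_name = "my_db"                   # hypothetical database name

# Method calls bind more tightly than the / operator, so .format() runs on
# the string literal first and the formatted string is then joined to OUTDIR.
folder = OUTDIR / "bbmap/{db_name}".format(db_name=db_name)

# The {sample} placeholder is not touched here; it is left for Snakemake
# to fill in as a wildcard.
sam = folder / "{sample}.sam.gz"

# For log declarations the whole joined path is wrapped in str().
log = str(LOGDIR / "bbmap/{dbname}/all_samples.counts_table.log".format(dbname=db_name))

print(folder)
print(sam)
print(log)
```

Because `folder` is a Path, the output names are built with `/` rather than string concatenation, which is why the trailing slash from the old `bbmap_output_folder` string is no longer needed.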
