Description
Hello,
I'm encountering a MissingInputException when running the experiment pipeline. The error occurs during DAG building, before any jobs execute. This appears to be a workflow structure issue, but I've been unable to identify the root cause despite extensive debugging.
Error
Here is my command and error:
$ snakemake --sdm conda --configfile config_MS_MPRA_combined.yaml -c 30 -j 35 --workflow-profile profiles/MS_mpra/ --executor slurm --resources mem_mb=60000 --rerun-incomplete --keep-going -n --quiet rules
Using workflow specific profile /home/go274/project/MPRAsnakeflow/profiles/MS_mpra/ for setting default command line arguments.
Running MPRAsnakeflow version 0.5.4
host: r814u03n01.mccleary.ycrc.yale.edu
Building DAG of jobs...
MissingInputException in rule experiment_statistic_assigned_counts_combine_stats_dna_rna_merge_all in file "/home/go274/project/MPRAsnakeflow/workflow/rules/experiment/statistic/assigned_counts.smk", line 121:
Missing input files for rule experiment_statistic_assigned_counts_combine_stats_dna_rna_merge_all:
output: results/experiments/MS_mpra_exp_bbmap/statistic/statistic_assigned_counts_merged_MS_mpra_assign_bbmap_default.tsv
wildcards: project=MS_mpra_exp_bbmap, assignment=MS_mpra_assign_bbmap, config=default
affected files:
results/experiments/MS_mpra_exp_bbmap/statistic/assigned_counts/MS_mpra_assign_bbmap/default/combined/jurkat_0hr_merged_assigned_counts.statistic.tsv.gz
results/experiments/MS_mpra_exp_bbmap/statistic/assigned_counts/MS_mpra_assign_bbmap/default/combined/jurkat_12hr_merged_assigned_counts.statistic.tsv.gz
results/experiments/MS_mpra_exp_bbmap/statistic/assigned_counts/MS_mpra_assign_bbmap/default/combined/jurkat_48hr_merged_assigned_counts.statistic.tsv.gz
results/experiments/MS_mpra_exp_bbmap/statistic/assigned_counts/MS_mpra_assign_bbmap/default/combined/jurkat_24hr_merged_assigned_counts.statistic.tsv.gz
This error means that Snakemake has concluded that the file results/experiments/{{project}}/statistic/assigned_counts/{{assignment}}/{{config}}/combined/{condition}_merged_assigned_counts.statistic.tsv.gz will never be created. The code trace below follows the rules that should produce this file and ends at the rule that reports it as missing.
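To see exactly which concrete files the aggregate rule asks for, the template can be expanded by hand. The Python sketch below assumes that the rule at line 121 expands this template over the Condition values in experiment.csv. That is an assumption based on the {condition} placeholder, not something confirmed from the MPRAsnakeflow source. The helper names are hypothetical. The doubled braces survive the first .format() call as single braces, which mimics how Snakemake's expand() leaves {{project}}, {{assignment}} and {{config}} for the rule's own wildcards.

```python
import csv
import io

# Template as quoted in the error message. After the first .format() call the
# doubled braces become single braces, leaving the rule-level wildcards open.
COMBINED_TEMPLATE = (
    "results/experiments/{{project}}/statistic/assigned_counts/"
    "{{assignment}}/{{config}}/combined/"
    "{condition}_merged_assigned_counts.statistic.tsv.gz"
)


def conditions_from_experiment(csv_text):
    """Return the unique values of the Condition column, in file order."""
    conditions = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        if row["Condition"] not in conditions:
            conditions.append(row["Condition"])
    return conditions


def requested_combined_files(csv_text, project, assignment, config):
    """Hypothetical helper: list the per-condition files the aggregate rule would request."""
    paths = []
    for condition in conditions_from_experiment(csv_text):
        partial = COMBINED_TEMPLATE.format(condition=condition)  # fills {condition}
        paths.append(partial.format(project=project, assignment=assignment, config=config))
    return paths


EXAMPLE_CSV = """Condition,Replicate,DNA_BC_F,RNA_BC_F
jurkat_0hr,1,ms_plasmid_rep1.fastq.gz,ms_Jurkat_MS_rep1_0hr.fastq.gz
K562,1,ms_plasmid_rep1.fastq.gz,ms_k562_rep1.fastq.gz
"""

for path in requested_combined_files(
    EXAMPLE_CSV, "MS_mpra_exp_bbmap", "MS_mpra_assign_bbmap", "default"
):
    print(path)
```

Comparing this list against the "affected files" in the error shows which conditions Snakemake could and could not resolve.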
Code Trace
I traced the workflow backward from experiment_statistic_assigned_counts_combine_stats_dna_rna_merge_all. The farthest back I got was getFinalCounts() in MPRAsnakeflow/workflow/rules/common.smk. The trace below runs forward from that point, and at each step the output file is the one consumed as input by the next step.
- getFinalCounts() (workflow/rules/common.smk)
  - Output: results/experiments/{project}/%s/{condition}_%s_%s_final_counts[.sampling|].{config}.tsv.gz
- experiment_counts_dna_rna_merge_counts (workflow/rules/experiment/counts.smk)
  - Input: calls getFinalCounts() for DNA and RNA
  - Output: results/experiments/{project}/{raw_or_assigned}/{condition}_{replicate}.merged.config.{config}.tsv.gz
- experiment_assigned_counts_dna_rna_merge (workflow/rules/experiment/assigned_counts.smk)
  - Input: results/experiments/{project}/counts/{condition}_{replicate}.merged.config.{config}.tsv.gz
  - Output: results/experiments/{project}/statistic/assigned_counts/{assignment}/{config}/{condition}_{replicate}_merged_assigned_counts.statistic.tsv.gz
- experiment_statistic_assigned_counts_combine_stats_dna_rna_merge (workflow/rules/experiment/statistic/assigned_counts.smk)
  - Input: results/experiments/{project}/statistic/assigned_counts/{assignment}/{config}/{condition}_{replicate}_merged_assigned_counts.statistic.tsv.gz
  - Output: results/experiments/{project}/statistic/assigned_counts/{assignment}/{config}/combined/{condition}_merged_assigned_counts.statistic.tsv.gz
- experiment_statistic_assigned_counts_combine_stats_dna_rna_merge_all (line 121)
  - MISSING INPUT: results/experiments/{project}/statistic/assigned_counts/{assignment}/{config}/combined/{condition}_merged_assigned_counts.statistic.tsv.gz
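One way to sanity-check a trace like this outside Snakemake is to test which output patterns can match a concrete file name. The Python sketch below is a hypothetical helper, not MPRAsnakeflow code. It approximates Snakemake's default wildcard matching, where each {name} becomes the regex .+ (which can also span "/"), and it ignores any wildcard_constraints or ruleorder the real workflow may define. Under those assumptions, the jurkat_* combined files can be matched by both the combining rule's output pattern and the per-replicate output pattern (condition=jurkat, replicate=0hr, config=combined). The K562 file, which does not appear in the affected-files list, matches only the combining pattern. This is a lead worth checking, not a confirmed diagnosis. Snakemake's --debug-dag flag reports the candidate jobs it actually considered.

```python
import re


def pattern_to_regex(pattern):
    """Compile a Snakemake-style path pattern into an anchored regex.

    Assumptions: every wildcard uses Snakemake's default constraint ".+",
    each wildcard name appears once, and no wildcard_constraints apply.
    """
    pieces = re.split(r"\{(\w+)\}", pattern)
    regex = ""
    for index, piece in enumerate(pieces):
        if index % 2 == 0:
            regex += re.escape(piece)  # literal path text
        else:
            regex += f"(?P<{piece}>.+)"  # wildcard name captured by re.split
    return re.compile(regex + "$")


# Output patterns as quoted in the trace.
PER_REPLICATE_OUTPUT = (
    "results/experiments/{project}/statistic/assigned_counts/"
    "{assignment}/{config}/{condition}_{replicate}"
    "_merged_assigned_counts.statistic.tsv.gz"
)
COMBINED_OUTPUT = (
    "results/experiments/{project}/statistic/assigned_counts/"
    "{assignment}/{config}/combined/{condition}"
    "_merged_assigned_counts.statistic.tsv.gz"
)

PREFIX = (
    "results/experiments/MS_mpra_exp_bbmap/statistic/assigned_counts/"
    "MS_mpra_assign_bbmap/default/combined/"
)


def candidate_rules(path):
    """Return the wildcard values for every quoted pattern that matches path."""
    found = {}
    for name, pattern in [("per_replicate", PER_REPLICATE_OUTPUT),
                          ("combined", COMBINED_OUTPUT)]:
        match = pattern_to_regex(pattern).match(path)
        if match:
            found[name] = match.groupdict()
    return found


for condition in ["jurkat_0hr", "K562"]:
    path = PREFIX + condition + "_merged_assigned_counts.statistic.tsv.gz"
    print(condition, candidate_rules(path))
```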
Attempted Fixes
- Fixed wildcard inconsistency: The output from experiment_counts_dna_rna_merge_counts uses {raw_or_assigned}, while the input to experiment_assigned_counts_dna_rna_merge hardcodes counts. I replaced all instances of {raw_or_assigned} with counts in experiment_counts_dna_rna_merge_counts. Issue persists.
- Commented out statistics targets: I commented out all instances of statistic_assigned_counts_merged in the main Snakefile (lines 160, 309-313, 400-404) to prevent the rule from being triggered as a default target. Issue persists.
- Cleared Snakemake cache: Removed the .snakemake/ directory and partial output files. Issue persists.
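Whether the first fix was needed at all can be checked by matching the upstream output pattern against a concrete path that the downstream rule requests. The Python sketch below is a hypothetical helper, not MPRAsnakeflow code. It approximates Snakemake's default wildcard matching (each {name} becomes the regex .+) and ignores any wildcard_constraints the workflow may define. Under those assumptions, the original {raw_or_assigned} pattern already matches the hardcoded counts directory with raw_or_assigned=counts. That would be consistent with the replacement making no difference.

```python
import re


def pattern_to_regex(pattern):
    """Approximate Snakemake wildcard matching: each {name} becomes (?P<name>.+).

    Assumes default wildcard constraints and that each name appears once.
    """
    escaped = re.escape(pattern)  # escapes literal '.', '{', '}' and similar
    return re.compile(re.sub(r"\\\{(\w+)\\\}", r"(?P<\1>.+)", escaped) + "$")


# Output of experiment_counts_dna_rna_merge_counts, as quoted in the trace.
MERGE_COUNTS_OUTPUT = (
    "results/experiments/{project}/{raw_or_assigned}/"
    "{condition}_{replicate}.merged.config.{config}.tsv.gz"
)

# A concrete file requested by experiment_assigned_counts_dna_rna_merge
# (whose input hardcodes "counts"); condition and replicate come from experiment.csv.
REQUESTED = (
    "results/experiments/MS_mpra_exp_bbmap/counts/"
    "jurkat_0hr_1.merged.config.default.tsv.gz"
)

match = pattern_to_regex(MERGE_COUNTS_OUTPUT).match(REQUESTED)
print(match.groupdict() if match else "no match")
```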
Questions
I am at a loss as to how to debug this, and I am not familiar with Snakemake workflows. I ran the ENCODE and other example data through this pipeline and did not hit this error.
Any guidance on debugging this would be greatly appreciated. Thank you!
Grace
Relevant Files
Config:
version: "0.5"
experiments:
MS_mpra_exp_bbmap:
bc_length: 20
data_folder: /home/go274/palmer_scratch/practice/MS_mpra_exp_data
experiment_file: /home/go274/palmer_scratch/practice/MPRAsnakeflow_practice/MS_mpra/experiment.csv
demultiplex: false
assignments:
MS_mpra_assign_bbmap:
type: config
assignment_name: MS_mpra_assign_bbmap
assignment_config: default_config
design_file: /home/go274/palmer_scratch/practice/MPRAsnakeflow_practice/MS_mpra/rev_seq_w_adapter.fasta
label_file: /home/go274/palmer_scratch/practice/MPRAsnakeflow_practice/MS_mpra/labels.tsv.gz
configs:
default:
filter:
bc_threshold: 1
min_dna_counts: 1
min_rna_counts: 1
outlier_detection:
methods: none
mad_bins: 20
times_mad: 5
times_zscore: 3
DNA:
min_counts: 1
RNA:
min_counts: 1
experiment.csv:
Condition,Replicate,DNA_BC_F,RNA_BC_F
jurkat_0hr,1,ms_plasmid_rep1.fastq.gz,ms_Jurkat_MS_rep1_0hr.fastq.gz
jurkat_0hr,2,ms_plasmid_rep2.fastq.gz,ms_Jurkat_MS_rep2_0hr.fastq.gz
jurkat_0hr,3,ms_plasmid_rep3.fastq.gz,ms_Jurkat_MS_rep3_0hr.fastq.gz
jurkat_0hr,4,ms_plasmid_rep4.fastq.gz,ms_Jurkat_MS_rep4_0hr.fastq.gz
jurkat_0hr,5,ms_plasmid_rep5.fastq.gz,ms_Jurkat_MS_rep5_0hr.fastq.gz
jurkat_12hr,1,ms_plasmid_rep1.fastq.gz,ms_Jurkat_MS_rep1_12hr.fastq.gz
jurkat_12hr,2,ms_plasmid_rep2.fastq.gz,ms_Jurkat_MS_rep2_12hr.fastq.gz
jurkat_12hr,3,ms_plasmid_rep3.fastq.gz,ms_Jurkat_MS_rep3_12hr.fastq.gz
jurkat_12hr,4,ms_plasmid_rep4.fastq.gz,ms_Jurkat_MS_rep4_12hr.fastq.gz
jurkat_12hr,5,ms_plasmid_rep5.fastq.gz,ms_Jurkat_MS_rep5_12hr.fastq.gz
jurkat_24hr,1,ms_plasmid_rep1.fastq.gz,ms_Jurkat_MS_rep1_24hr.fastq.gz
jurkat_24hr,2,ms_plasmid_rep2.fastq.gz,ms_Jurkat_MS_rep2_24hr.fastq.gz
jurkat_24hr,3,ms_plasmid_rep3.fastq.gz,ms_Jurkat_MS_rep3_24hr.fastq.gz
jurkat_24hr,4,ms_plasmid_rep4.fastq.gz,ms_Jurkat_MS_rep4_24hr.fastq.gz
jurkat_24hr,5,ms_plasmid_rep5.fastq.gz,ms_Jurkat_MS_rep5_24hr.fastq.gz
jurkat_48hr,1,ms_plasmid_rep1.fastq.gz,ms_Jurkat_MS_rep1_48hr.fastq.gz
jurkat_48hr,2,ms_plasmid_rep2.fastq.gz,ms_Jurkat_MS_rep2_48hr.fastq.gz
jurkat_48hr,3,ms_plasmid_rep3.fastq.gz,ms_Jurkat_MS_rep3_48hr.fastq.gz
jurkat_48hr,4,ms_plasmid_rep4.fastq.gz,ms_Jurkat_MS_rep4_48hr.fastq.gz
jurkat_48hr,5,ms_plasmid_rep5.fastq.gz,ms_Jurkat_MS_rep5_48hr.fastq.gz
K562,1,ms_plasmid_rep1.fastq.gz,ms_k562_rep1.fastq.gz
K562,2,ms_plasmid_rep2.fastq.gz,ms_k562_rep2.fastq.gz
K562,3,ms_plasmid_rep3.fastq.gz,ms_k562_rep3.fastq.gz
K562,4,ms_plasmid_rep4.fastq.gz,ms_k562_rep4.fastq.gz
K562,5,ms_plasmid_rep5.fastq.gz,ms_k562_rep5.fastq.gz
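A short script that groups replicates by condition makes it easy to confirm that every condition has the expected replicates and to spot naming patterns in a table like this. The Python sketch below is hypothetical and uses only the standard library. It flags condition names that contain an underscore, because the per-replicate file names in the trace join {condition} and {replicate} with "_", and the four conditions listed under affected files in the error are exactly the jurkat_* ones. Whether the underscore actually matters here is an assumption to test, not something this check proves.

```python
import csv
import io


def summarize_experiment(csv_text):
    """Map each Condition to its list of Replicate values, in file order."""
    replicates = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        replicates.setdefault(row["Condition"], []).append(row["Replicate"])
    return replicates


def report(replicates):
    """Print one line per condition and flag names that contain '_'."""
    for condition, reps in replicates.items():
        flag = "  <- name contains '_'" if "_" in condition else ""
        print(f"{condition}: {len(reps)} replicates ({', '.join(reps)}){flag}")


EXAMPLE_CSV = """Condition,Replicate,DNA_BC_F,RNA_BC_F
jurkat_0hr,1,ms_plasmid_rep1.fastq.gz,ms_Jurkat_MS_rep1_0hr.fastq.gz
jurkat_0hr,2,ms_plasmid_rep2.fastq.gz,ms_Jurkat_MS_rep2_0hr.fastq.gz
K562,1,ms_plasmid_rep1.fastq.gz,ms_k562_rep1.fastq.gz
K562,2,ms_plasmid_rep2.fastq.gz,ms_k562_rep2.fastq.gz
"""

report(summarize_experiment(EXAMPLE_CSV))
```

Running the same functions on the full experiment.csv should list five replicates for each of the five conditions.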