MissingInputException in rule experiment_statistic_assigned_counts_combine_stats_dna_rna_merge_all

Hello,
I'm encountering a `MissingInputException` when running the experiment pipeline. The error occurs during DAG building, before any jobs execute. This appears to be a workflow structure issue, but I've been unable to identify the root cause despite extensive debugging.

### Error
Here are my command and the error output:
```
$ snakemake --sdm conda --configfile config_MS_MPRA_combined.yaml -c 30 -j 35  --workflow-profile profiles/MS_mpra/ --executor slurm --resources mem_mb=60000 --rerun-incomplete --keep-going -n --quiet rules
Using workflow specific profile /home/go274/project/MPRAsnakeflow/profiles/MS_mpra/ for setting default command line arguments.
Running MPRAsnakeflow version 0.5.4
host: r814u03n01.mccleary.ycrc.yale.edu
Building DAG of jobs...
MissingInputException in rule experiment_statistic_assigned_counts_combine_stats_dna_rna_merge_all in file "/home/go274/project/MPRAsnakeflow/workflow/rules/experiment/statistic/assigned_counts.smk", line 121:
Missing input files for rule experiment_statistic_assigned_counts_combine_stats_dna_rna_merge_all:
    output: results/experiments/MS_mpra_exp_bbmap/statistic/statistic_assigned_counts_merged_MS_mpra_assign_bbmap_default.tsv
    wildcards: project=MS_mpra_exp_bbmap, assignment=MS_mpra_assign_bbmap, config=default
    affected files:
        results/experiments/MS_mpra_exp_bbmap/statistic/assigned_counts/MS_mpra_assign_bbmap/default/combined/jurkat_0hr_merged_assigned_counts.statistic.tsv.gz
        results/experiments/MS_mpra_exp_bbmap/statistic/assigned_counts/MS_mpra_assign_bbmap/default/combined/jurkat_12hr_merged_assigned_counts.statistic.tsv.gz
        results/experiments/MS_mpra_exp_bbmap/statistic/assigned_counts/MS_mpra_assign_bbmap/default/combined/jurkat_48hr_merged_assigned_counts.statistic.tsv.gz
        results/experiments/MS_mpra_exp_bbmap/statistic/assigned_counts/MS_mpra_assign_bbmap/default/combined/jurkat_24hr_merged_assigned_counts.statistic.tsv.gz
```
This error occurs because Snakemake cannot find any way to create the file `results/experiments/{{project}}/statistic/assigned_counts/{{assignment}}/{{config}}/combined/{condition}_merged_assigned_counts.statistic.tsv.gz`. Below is a code trace that ends at the missing output of this file.
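For readers less familiar with Snakemake: the doubled braces most likely come from an `expand()` call in the rule's input. `expand()` follows Python's `str.format` escaping, so `{{project}}` comes out as a literal `{project}` (left for Snakemake to fill from the rule's own wildcards), while `{condition}` is filled in once per condition. Below is a minimal sketch of that two-step behaviour using plain `str.format`. It assumes the conditions are the four listed in the error above. `expand_conditions` and `fill_wildcards` are hypothetical helpers, not Snakemake's API.

```python
# Sketch of how doubled braces survive one round of str.format-style expansion.
# Hypothetical helpers; Snakemake's expand() escapes wildcards the same way,
# but this is not its implementation.
PATTERN = (
    "results/experiments/{{project}}/statistic/assigned_counts/"
    "{{assignment}}/{{config}}/combined/"
    "{condition}_merged_assigned_counts.statistic.tsv.gz"
)


def expand_conditions(pattern: str, conditions: list[str]) -> list[str]:
    # '{{x}}' becomes '{x}' (still a wildcard); '{condition}' gets a concrete value
    return [pattern.format(condition=c) for c in conditions]


def fill_wildcards(pattern: str, **wildcards: str) -> str:
    # Second step: the remaining {x} are filled from the rule's own wildcards
    return pattern.format(**wildcards)


conditions = ["jurkat_0hr", "jurkat_12hr", "jurkat_24hr", "jurkat_48hr"]
for partial in expand_conditions(PATTERN, conditions):
    print(fill_wildcards(partial, project="MS_mpra_exp_bbmap",
                         assignment="MS_mpra_assign_bbmap", config="default"))
```

The printed paths should be the same four files listed under "affected files" in the error.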

### Code Trace
I attempted to trace the workflow for this particular bug, working backward from `experiment_statistic_assigned_counts_combine_stats_dna_rna_merge_all`. The farthest back I got was `getFinalCounts` in `MPRAsnakeflow/workflow/rules/common.smk`. To make the trace easier to follow, I color-coded it [here](https://github.com/user-attachments/assets/dc2aecb7-6024-41a6-a277-48aab177f66e). Files with the same color are the same file, produced by one step and fed into the next.
1. **`getFinalCounts()`** (`workflow/rules/common.smk`)
   - Output: `results/experiments/{project}/%s/{condition}_%s_%s_final_counts[.sampling|].{config}.tsv.gz`

2. **`experiment_counts_dna_rna_merge_counts`** (`workflow/rules/experiment/counts.smk`)
   - Input: calls `getFinalCounts()` for DNA and RNA
   - Output: `results/experiments/{project}/{raw_or_assigned}/{condition}_{replicate}.merged.config.{config}.tsv.gz`

3. **`experiment_assigned_counts_dna_rna_merge`** (`workflow/rules/experiment/assigned_counts.smk`)
   - Input: `results/experiments/{project}/counts/{condition}_{replicate}.merged.config.{config}.tsv.gz`
   - Output: `results/experiments/{project}/statistic/assigned_counts/{assignment}/{config}/{condition}_{replicate}_merged_assigned_counts.statistic.tsv.gz`

4. **`experiment_statistic_assigned_counts_combine_stats_dna_rna_merge`** (`workflow/rules/experiment/statistic/assigned_counts.smk`)
   - Input: `results/experiments/{project}/statistic/assigned_counts/{assignment}/{config}/{condition}_{replicate}_merged_assigned_counts.statistic.tsv.gz`
   - Output: `results/experiments/{project}/statistic/assigned_counts/{assignment}/{config}/combined/{condition}_merged_assigned_counts.statistic.tsv.gz`

5. **`experiment_statistic_assigned_counts_combine_stats_dna_rna_merge_all`** (`workflow/rules/experiment/statistic/assigned_counts.smk`, line 121)
   - **MISSING INPUT**: `results/experiments/{project}/statistic/assigned_counts/{assignment}/{config}/combined/{condition}_merged_assigned_counts.statistic.tsv.gz`
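To narrow down which link in this chain breaks, it can help to check whether a concrete missing path matches a rule's output pattern at all. The sketch below is a simplified model of Snakemake's wildcard matching. Each `{name}` becomes a regex group matching `.+`, which is Snakemake's default when no `wildcard_constraints` are set. The model ignores constraints, input functions and rule ambiguity. `pattern_to_regex` is a hypothetical helper, and the pattern is copied from step 4 of the trace above.

```python
import re


def pattern_to_regex(pattern: str) -> re.Pattern:
    """Simplified model: each {name} becomes a named group matching '.+';
    a repeated name must match the same text; everything else is literal."""
    parts, seen, last = [], set(), 0
    for m in re.finditer(r"\{(\w+)\}", pattern):
        parts.append(re.escape(pattern[last:m.start()]))
        name = m.group(1)
        # Back-reference for repeated wildcards, named group otherwise
        parts.append(f"(?P={name})" if name in seen else f"(?P<{name}>.+)")
        seen.add(name)
        last = m.end()
    parts.append(re.escape(pattern[last:]))
    return re.compile("".join(parts))


STEP4_OUTPUT = (
    "results/experiments/{project}/statistic/assigned_counts/"
    "{assignment}/{config}/combined/{condition}_merged_assigned_counts.statistic.tsv.gz"
)
missing = (
    "results/experiments/MS_mpra_exp_bbmap/statistic/assigned_counts/"
    "MS_mpra_assign_bbmap/default/combined/jurkat_0hr_merged_assigned_counts.statistic.tsv.gz"
)

m = pattern_to_regex(STEP4_OUTPUT).fullmatch(missing)
print(m.groupdict() if m else "output pattern does not match")
```

Under this model the pattern matches, with `project`, `assignment`, `config` and `condition` recovered as in the error message. If the real rule behaves differently, its `wildcard_constraints` and the files its own input requests would be the next things to compare.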

### Attempted Fixes
1. **Fixed wildcard inconsistency**: The output from `experiment_counts_dna_rna_merge_counts` uses `{raw_or_assigned}` while the input to `experiment_assigned_counts_dna_rna_merge` hardcodes `counts`. I replaced all instances of `{raw_or_assigned}` with `counts` in `experiment_counts_dna_rna_merge_counts`. Issue persists.

2. **Commented out statistics targets**: I commented out all instances of `statistic_assigned_counts_merged` in the main Snakefile (lines 160, 309-313, 400-404) to prevent the rule from being triggered as a default target. Issue persists.

3. **Cleared Snakemake cache**: Removed `.snakemake/` directory and partial output files. Issue persists.
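One observation on fix 1: unless `raw_or_assigned` has a `wildcard_constraints` entry restricting it, Snakemake matches it with its default regex (`.+`). In that case the original output pattern should already have matched the hardcoded `counts` path, which may be why the replacement made no difference. Below is a minimal sketch with Python's `re`, assuming no constraint on `raw_or_assigned`. The other wildcards are pre-filled with example values taken from this issue.

```python
import re

# Step 2's output pattern, with {raw_or_assigned} written as Snakemake's default
# wildcard regex (.+) and the other wildcards filled with example values.
STEP2_OUTPUT_REGEX = re.compile(
    r"results/experiments/MS_mpra_exp_bbmap/(?P<raw_or_assigned>.+)/"
    r"jurkat_0hr_1\.merged\.config\.default\.tsv\.gz"
)

# Step 3's input, which hardcodes "counts" in place of the wildcard
step3_input = (
    "results/experiments/MS_mpra_exp_bbmap/counts/"
    "jurkat_0hr_1.merged.config.default.tsv.gz"
)

m = STEP2_OUTPUT_REGEX.fullmatch(step3_input)
print(m.group("raw_or_assigned") if m else None)
```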

### Questions

I am at a loss as to how to debug this, and I am not very familiar with Snakemake workflows. I ran the ENCODE and other example datasets through this pipeline without hitting this error.
Any guidance on debugging this would be greatly appreciated. Thank you!
Grace

### Relevant Files
Config:
```
version: "0.5"
experiments:
  MS_mpra_exp_bbmap:
    bc_length: 20
    data_folder: /home/go274/palmer_scratch/practice/MS_mpra_exp_data
    experiment_file: /home/go274/palmer_scratch/practice/MPRAsnakeflow_practice/MS_mpra/experiment.csv
    demultiplex: false
    assignments:
      MS_mpra_assign_bbmap:
        type: config
        assignment_name: MS_mpra_assign_bbmap
        assignment_config: default_config
    design_file: /home/go274/palmer_scratch/practice/MPRAsnakeflow_practice/MS_mpra/rev_seq_w_adapter.fasta
    label_file: /home/go274/palmer_scratch/practice/MPRAsnakeflow_practice/MS_mpra/labels.tsv.gz
    configs:
      default:
        filter:
            bc_threshold: 1
            min_dna_counts: 1
            min_rna_counts: 1
            outlier_detection:
              methods: none
              mad_bins: 20
              times_mad: 5
              times_zscore: 3
            DNA:
              min_counts: 1
            RNA:
              min_counts: 1
```
experiment.csv:
```
Condition,Replicate,DNA_BC_F,RNA_BC_F
jurkat_0hr,1,ms_plasmid_rep1.fastq.gz,ms_Jurkat_MS_rep1_0hr.fastq.gz
jurkat_0hr,2,ms_plasmid_rep2.fastq.gz,ms_Jurkat_MS_rep2_0hr.fastq.gz
jurkat_0hr,3,ms_plasmid_rep3.fastq.gz,ms_Jurkat_MS_rep3_0hr.fastq.gz
jurkat_0hr,4,ms_plasmid_rep4.fastq.gz,ms_Jurkat_MS_rep4_0hr.fastq.gz
jurkat_0hr,5,ms_plasmid_rep5.fastq.gz,ms_Jurkat_MS_rep5_0hr.fastq.gz
jurkat_12hr,1,ms_plasmid_rep1.fastq.gz,ms_Jurkat_MS_rep1_12hr.fastq.gz
jurkat_12hr,2,ms_plasmid_rep2.fastq.gz,ms_Jurkat_MS_rep2_12hr.fastq.gz
jurkat_12hr,3,ms_plasmid_rep3.fastq.gz,ms_Jurkat_MS_rep3_12hr.fastq.gz
jurkat_12hr,4,ms_plasmid_rep4.fastq.gz,ms_Jurkat_MS_rep4_12hr.fastq.gz
jurkat_12hr,5,ms_plasmid_rep5.fastq.gz,ms_Jurkat_MS_rep5_12hr.fastq.gz
jurkat_24hr,1,ms_plasmid_rep1.fastq.gz,ms_Jurkat_MS_rep1_24hr.fastq.gz
jurkat_24hr,2,ms_plasmid_rep2.fastq.gz,ms_Jurkat_MS_rep2_24hr.fastq.gz
jurkat_24hr,3,ms_plasmid_rep3.fastq.gz,ms_Jurkat_MS_rep3_24hr.fastq.gz
jurkat_24hr,4,ms_plasmid_rep4.fastq.gz,ms_Jurkat_MS_rep4_24hr.fastq.gz
jurkat_24hr,5,ms_plasmid_rep5.fastq.gz,ms_Jurkat_MS_rep5_24hr.fastq.gz
jurkat_48hr,1,ms_plasmid_rep1.fastq.gz,ms_Jurkat_MS_rep1_48hr.fastq.gz
jurkat_48hr,2,ms_plasmid_rep2.fastq.gz,ms_Jurkat_MS_rep2_48hr.fastq.gz
jurkat_48hr,3,ms_plasmid_rep3.fastq.gz,ms_Jurkat_MS_rep3_48hr.fastq.gz
jurkat_48hr,4,ms_plasmid_rep4.fastq.gz,ms_Jurkat_MS_rep4_48hr.fastq.gz
jurkat_48hr,5,ms_plasmid_rep5.fastq.gz,ms_Jurkat_MS_rep5_48hr.fastq.gz
K562,1,ms_plasmid_rep1.fastq.gz,ms_k562_rep1.fastq.gz
K562,2,ms_plasmid_rep2.fastq.gz,ms_k562_rep2.fastq.gz
K562,3,ms_plasmid_rep3.fastq.gz,ms_k562_rep3.fastq.gz
K562,4,ms_plasmid_rep4.fastq.gz,ms_k562_rep4.fastq.gz
K562,5,ms_plasmid_rep5.fastq.gz,ms_k562_rep5.fastq.gz
```
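Step 4 combines per-replicate files, so one way to narrow things down is to list the exact per-replicate paths it should need for each condition. You can then request one of them directly, for example with `snakemake -n <path>`. If that dry run fails, its error should point further upstream. Below is a minimal sketch that builds those paths from `experiment.csv`. It assumes the step-3 output pattern from the code trace, filled with the wildcard values reported in the error. `replicates_by_condition` is a hypothetical helper, and only a few rows are inlined here.

```python
import csv
import io
from collections import defaultdict

# A few rows copied from experiment.csv; in practice, read the real file instead.
EXPERIMENT_CSV = """Condition,Replicate,DNA_BC_F,RNA_BC_F
jurkat_0hr,1,ms_plasmid_rep1.fastq.gz,ms_Jurkat_MS_rep1_0hr.fastq.gz
jurkat_0hr,2,ms_plasmid_rep2.fastq.gz,ms_Jurkat_MS_rep2_0hr.fastq.gz
K562,1,ms_plasmid_rep1.fastq.gz,ms_k562_rep1.fastq.gz
"""

# Assumed per-replicate statistic file (step 3 output in the trace),
# with project/assignment/config taken from the error's wildcards.
PER_REPLICATE = (
    "results/experiments/MS_mpra_exp_bbmap/statistic/assigned_counts/"
    "MS_mpra_assign_bbmap/default/"
    "{condition}_{replicate}_merged_assigned_counts.statistic.tsv.gz"
)


def replicates_by_condition(text: str) -> dict[str, list[str]]:
    # Group replicate IDs under their condition, preserving file order
    reps: defaultdict[str, list[str]] = defaultdict(list)
    for row in csv.DictReader(io.StringIO(text)):
        reps[row["Condition"]].append(row["Replicate"])
    return dict(reps)


reps = replicates_by_condition(EXPERIMENT_CSV)
for condition, replicates in reps.items():
    for replicate in replicates:
        print(PER_REPLICATE.format(condition=condition, replicate=replicate))
```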


MissingInputException in rule experiment_statistic_assigned_counts_combine_stats_dna_rna_merge_all #223
