Skip to content

❗ Issue: Invalid contig 'chr' fetch error despite valid loci file #7

@agnes-csw

Description

@agnes-csw

Hi MSMuTect4 team,

I'm encountering a contig parsing issue while running MSMuTect4. I generated the loci_file using PHOBOS with the hg19.fa reference in one-per-line format. The file includes only standard chromosomes (1–22, X, Y, M), and I verified there are no malformed entries.

To trace the issue, I slightly modified the source code (src/GenomicUtils/ReadsFetcher.pyx) and recompiled it. Specifically, I updated the reset_iterator method to add debug logging around the fetch() call:

  def reset_iterator(self, chromosome: str, start: int):
     # changes chromosome and gets new iterator using .bai index file
     self.chromosome = chromosome
     # self.reads_iterator: IteratorRowRegion = self.BAM_handle.fetch(f"{self.chromosome_prefix}{chromosome}",
     #                                                                 start=start, multiple_iterators=False)
     print(f"[DEBUG] Fetching: contig='{self.chromosome_prefix}{chromosome}', start={start}")
     try:
         self.reads_iterator: IteratorRowRegion = self.BAM_handle.fetch(f"{self.chromosome_prefix}{chromosome}",
                                                                        start=start, multiple_iterators=False)
     except ValueError as e:
         print(f"[ERROR] Invalid contig fetch attempted: '{self.chromosome_prefix}{chromosome}' at start={start}")
         raise e

     self.last_unmapped_read = self.get_next_mapped_read()
     self.last_extracted_reads = []

Here is the command I used to run MSMuTect4:

/pmglocal/sc5006/HM/MSMuTect_4/msmutect.sh \
  -T '/manitou/pmg/users/sc5006/data/HM/bam_new_new/IRCR_BT16_930_T02_SS.recal.bam' \
  -N '/manitou/pmg/users/sc5006/data/HM/bam_new_new/IRCR_BT16_930_B_SS.recal.bam' \
  -l /pmglocal/sc5006/HM/MSMuTect_4/loci_fin.tsv \
  -O test_A -c 8 -m -A -H -f > log.txt 2>&1

In the generated log.txt, I observed this error:

[DEBUG] Fetching: contig='chr', start=198
[ERROR] Invalid contig fetch attempted: 'chr' at start=198
[DEBUG] Fetching: contig='chrX', start=136688007
....
"""
Traceback (most recent call last):
  File "/manitou/pmg/users/sc5006/conda/envs/msmutect4/lib/python3.8/multiprocessing/pool.py", line 125, in worker
    result = (True, func(*args, **kwds))
  File "src/Entry/PairFileBatches.pyx", line 57, in src.Entry.PairFileBatches.partial_full_pair
    normal_alleles = get_alleles(locus, normal_fetcher, flanking, noise_table, required_reads, integer_indels_only)
  File "src/Entry/PairFileBatches.pyx", line 43, in src.Entry.PairFileBatches.get_alleles
    reads = reads_fetcher.get_reads(locus.chromosome, locus.start - flanking, locus.end + flanking)
  File "src/GenomicUtils/ReadsFetcher.pyx", line 107, in src.GenomicUtils.ReadsFetcher.ReadsFetcher.get_reads
    self.reset_iterator(chromosome, start)
  File "src/GenomicUtils/ReadsFetcher.pyx", line 46, in src.GenomicUtils.ReadsFetcher.ReadsFetcher.reset_iterator
    raise e
  File "src/GenomicUtils/ReadsFetcher.pyx", line 42, in src.GenomicUtils.ReadsFetcher.ReadsFetcher.reset_iterator
    self.reads_iterator: IteratorRowRegion = self.BAM_handle.fetch(f"{self.chromosome_prefix}{chromosome}",
  File "pysam/libcalignmentfile.pyx", line 1091, in pysam.libcalignmentfile.AlignmentFile.fetch
  File "pysam/libchtslib.pyx", line 685, in pysam.libchtslib.HTSFile.parse_region
ValueError: invalid contig `chr`
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/pmglocal/sc5006/HM/MSMuTect_4/src/Entry/main.py", line 53, in <module>
    run_msmutect(arguments)
  File "/pmglocal/sc5006/HM/MSMuTect_4/src/Entry/main.py", line 40, in run_msmutect
    mut_file = run_full_pair(args.normal_file, args.tumor_file, args.loci_file, args.batch_start-1, batch_end,
  File "src/Entry/PairFileBatches.pyx", line 33, in src.Entry.PairFileBatches.run_full_pair
    results: List[FileBackedQueue] = BatchUtil.run_batch(partial_full_pair, [normal, tumor, flanking, noise_table, required_reads, integer_indels_only], loci_iterator,
  File "/pmglocal/sc5006/HM/MSMuTect_4/src/Entry/BatchUtil.py", line 89, in run_batch
    return extract_results(results)
  File "/pmglocal/sc5006/HM/MSMuTect_4/src/Entry/BatchUtil.py", line 28, in extract_results
    combined = [result.get() for result in results]
  File "/pmglocal/sc5006/HM/MSMuTect_4/src/Entry/BatchUtil.py", line 28, in <listcomp>
    combined = [result.get() for result in results]
  File "/manitou/pmg/users/sc5006/conda/envs/msmutect4/lib/python3.8/multiprocessing/pool.py", line 771, in get
    raise self._value
ValueError: invalid contig `chr`

However, the corresponding line for [ERROR] Invalid contig fetch attempted: 'chr' at start=198 in the loci_file appears correct:

1       2       100     3297757     3297764     8   4   6   0   0   0   0   AC   CACACACA

I also confirmed via shell and R that all chromosome entries in the loci file are valid (1, 2, ..., 22, X, Y, M) and no lines have "chr" or missing chromosome values

Could this be an issue with how the locus object is being parsed or passed within MSMuTect4? I'd appreciate any suggestions on how to further troubleshoot or resolve this.

Thank you for your help!

— Seung-won

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions