-
Notifications
You must be signed in to change notification settings - Fork 3
Description
Hi MSMuTect4 team,
I'm encountering a contig parsing issue while running MSMuTect4. I generated the loci_file using PHOBOS with the hg19.fa reference in one-per-line format. The file includes only standard chromosomes (1–22, X, Y, M), and I verified there are no malformed entries.
To trace the issue, I slightly modified the source code (src/GenomicUtils/ReadsFetcher.pyx) and recompiled it. Specifically, I updated the reset_iterator method to add debug logging around the fetch() call:
def reset_iterator(self, chromosome: str, start: int):
# changes chromosome and gets new iterator using .bai index file
self.chromosome = chromosome
# self.reads_iterator: IteratorRowRegion = self.BAM_handle.fetch(f"{self.chromosome_prefix}{chromosome}",
# start=start, multiple_iterators=False)
print(f"[DEBUG] Fetching: contig='{self.chromosome_prefix}{chromosome}', start={start}")
try:
self.reads_iterator: IteratorRowRegion = self.BAM_handle.fetch(f"{self.chromosome_prefix}{chromosome}",
start=start, multiple_iterators=False)
except ValueError as e:
print(f"[ERROR] Invalid contig fetch attempted: '{self.chromosome_prefix}{chromosome}' at start={start}")
raise e
self.last_unmapped_read = self.get_next_mapped_read()
self.last_extracted_reads = []Here is the command I used to run MSMuTect4:
/pmglocal/sc5006/HM/MSMuTect_4/msmutect.sh \
-T '/manitou/pmg/users/sc5006/data/HM/bam_new_new/IRCR_BT16_930_T02_SS.recal.bam' \
-N '/manitou/pmg/users/sc5006/data/HM/bam_new_new/IRCR_BT16_930_B_SS.recal.bam' \
-l /pmglocal/sc5006/HM/MSMuTect_4/loci_fin.tsv \
-O test_A -c 8 -m -A -H -f > log.txt 2>&1In the generated log.txt, I observed this error:
[DEBUG] Fetching: contig='chr', start=198
[ERROR] Invalid contig fetch attempted: 'chr' at start=198
[DEBUG] Fetching: contig='chrX', start=136688007
....
"""
Traceback (most recent call last):
File "/manitou/pmg/users/sc5006/conda/envs/msmutect4/lib/python3.8/multiprocessing/pool.py", line 125, in worker
result = (True, func(*args, **kwds))
File "src/Entry/PairFileBatches.pyx", line 57, in src.Entry.PairFileBatches.partial_full_pair
normal_alleles = get_alleles(locus, normal_fetcher, flanking, noise_table, required_reads, integer_indels_only)
File "src/Entry/PairFileBatches.pyx", line 43, in src.Entry.PairFileBatches.get_alleles
reads = reads_fetcher.get_reads(locus.chromosome, locus.start - flanking, locus.end + flanking)
File "src/GenomicUtils/ReadsFetcher.pyx", line 107, in src.GenomicUtils.ReadsFetcher.ReadsFetcher.get_reads
self.reset_iterator(chromosome, start)
File "src/GenomicUtils/ReadsFetcher.pyx", line 46, in src.GenomicUtils.ReadsFetcher.ReadsFetcher.reset_iterator
raise e
File "src/GenomicUtils/ReadsFetcher.pyx", line 42, in src.GenomicUtils.ReadsFetcher.ReadsFetcher.reset_iterator
self.reads_iterator: IteratorRowRegion = self.BAM_handle.fetch(f"{self.chromosome_prefix}{chromosome}",
File "pysam/libcalignmentfile.pyx", line 1091, in pysam.libcalignmentfile.AlignmentFile.fetch
File "pysam/libchtslib.pyx", line 685, in pysam.libchtslib.HTSFile.parse_region
ValueError: invalid contig `chr`
"""
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/pmglocal/sc5006/HM/MSMuTect_4/src/Entry/main.py", line 53, in <module>
run_msmutect(arguments)
File "/pmglocal/sc5006/HM/MSMuTect_4/src/Entry/main.py", line 40, in run_msmutect
mut_file = run_full_pair(args.normal_file, args.tumor_file, args.loci_file, args.batch_start-1, batch_end,
File "src/Entry/PairFileBatches.pyx", line 33, in src.Entry.PairFileBatches.run_full_pair
results: List[FileBackedQueue] = BatchUtil.run_batch(partial_full_pair, [normal, tumor, flanking, noise_table, required_reads, integer_indels_only], loci_iterator,
File "/pmglocal/sc5006/HM/MSMuTect_4/src/Entry/BatchUtil.py", line 89, in run_batch
return extract_results(results)
File "/pmglocal/sc5006/HM/MSMuTect_4/src/Entry/BatchUtil.py", line 28, in extract_results
combined = [result.get() for result in results]
File "/pmglocal/sc5006/HM/MSMuTect_4/src/Entry/BatchUtil.py", line 28, in <listcomp>
combined = [result.get() for result in results]
File "/manitou/pmg/users/sc5006/conda/envs/msmutect4/lib/python3.8/multiprocessing/pool.py", line 771, in get
raise self._value
ValueError: invalid contig `chr`However, the corresponding line for [ERROR] Invalid contig fetch attempted: 'chr' at start=198 in the loci_file appears correct:
1 2 100 3297757 3297764 8 4 6 0 0 0 0 AC CACACACAI also confirmed via shell and R that all chromosome entries in the loci file are valid (1, 2, ..., 22, X, Y, M) and no lines have "chr" or missing chromosome values
Could this be an issue with how the locus object is being parsed or passed within MSMuTect4? I'd appreciate any suggestions on how to further troubleshoot or resolve this.
Thank you for your help!
— Seung-won