Hi Bambu team,
Thank you for developing such a powerful and elegant tool for context-aware transcript quantification.
I have two questions regarding the output behavior when running Bambu in multi-sample mode:
1. CPM values do not sum to 1,000,000 per sample
I observed that in the CPM_transcript.txt file, the sum of CPM values for each sample is not 1,000,000, but rather around 800,000 to 900,000.
- Is this expected behavior?
- If so, what types of reads or transcripts are excluded from the CPM computation, causing the total to fall below 1 million?
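For reference, here is a minimal Python sketch of how the per-sample CPM sums can be checked. It assumes CPM_transcript.txt is tab-separated, with TXNAME and GENEID as the first two columns followed by one numeric column per sample. That column layout and the helper name `column_totals` are my assumptions, not something taken from Bambu's documentation.

```python
import csv
import io
from typing import Dict, TextIO


def column_totals(handle: TextIO, n_id_cols: int) -> Dict[str, float]:
    """Sum each sample column of a tab-separated output table.

    Assumption: the first `n_id_cols` columns are identifiers (e.g. TXNAME,
    GENEID) and every remaining column holds one sample's numeric values.
    """
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)
    samples = header[n_id_cols:]
    totals = {sample: 0.0 for sample in samples}
    for row in reader:
        for sample, value in zip(samples, row[n_id_cols:]):
            totals[sample] += float(value)
    return totals


if __name__ == "__main__":
    # Hypothetical toy table; for real data use open("CPM_transcript.txt").
    toy = io.StringIO(
        "TXNAME\tGENEID\tsampleA\tsampleB\n"
        "tx1\tg1\t500000\t300000\n"
        "tx2\tg1\t350000\t550000\n"
    )
    for sample, total in column_totals(toy, n_id_cols=2).items():
        print(f"{sample}: CPM sum = {total:,.0f} ({total / 1e6:.1%} of 1,000,000)")
```

The same helper works on counts_transcript.txt (`n_id_cols=2`) and, under the same layout assumption, on counts_gene.txt (`n_id_cols=1`).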
2. Gene-level total counts > Transcript-level total counts
When I compare the sum of raw counts:
- From counts_gene.txt, the total counts per sample are about 32 million.
- From counts_transcript.txt, the totals are around 28 million.
This seems counterintuitive: since a gene's count should be the sum of its transcripts' counts, I would expect the transcript-level total to be equal to (or, if anything, greater than) the gene-level total.
- Could you explain why the transcript-level sum is lower?
- Does this have to do with multi-mapping reads, transcript filtering, or EM assignment behavior?
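To narrow down where the roughly 4 million extra gene-level counts sit, a sketch like the one below compares each gene's count with the summed counts of its transcripts. A positive difference means the gene received counts not attributed to any of its listed transcripts. The sketch assumes tab-separated files where counts_gene.txt has GENEID followed by sample columns, and counts_transcript.txt has TXNAME and GENEID followed by the same sample columns. The helper names `load_table` and `gene_vs_transcript` are hypothetical.

```python
import csv
import io
from collections import defaultdict
from typing import Dict, List, TextIO, Tuple


def load_table(handle: TextIO, key_col: str, n_id_cols: int) -> Tuple[List[str], Dict[str, List[float]]]:
    """Read a tab-separated table and sum rows that share the same key.

    Assumption: the first `n_id_cols` columns are identifiers and the rest
    are per-sample counts. Several transcripts of one gene collapse into a
    single summed row.
    """
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)
    key_idx = header.index(key_col)
    samples = header[n_id_cols:]
    sums: Dict[str, List[float]] = defaultdict(lambda: [0.0] * len(samples))
    for row in reader:
        acc = sums[row[key_idx]]
        for i, value in enumerate(row[n_id_cols:]):
            acc[i] += float(value)
    return samples, dict(sums)


def gene_vs_transcript(gene_handle: TextIO, tx_handle: TextIO) -> Dict[str, Dict[str, float]]:
    """Per gene and sample: gene count minus the summed counts of its transcripts."""
    g_samples, genes = load_table(gene_handle, "GENEID", n_id_cols=1)
    t_samples, txs = load_table(tx_handle, "GENEID", n_id_cols=2)
    if g_samples != t_samples:
        raise ValueError("sample columns differ between the two files")
    diff = {}
    for gid, gvals in genes.items():
        # Genes with no transcript rows are compared against zero.
        tvals = txs.get(gid, [0.0] * len(gvals))
        diff[gid] = {s: g - t for s, g, t in zip(g_samples, gvals, tvals)}
    return diff


if __name__ == "__main__":
    # Hypothetical toy inputs; for real data open the two Bambu output files.
    genes = io.StringIO("GENEID\tsampleA\ng1\t120\ng2\t40\n")
    txs = io.StringIO("TXNAME\tGENEID\tsampleA\ntx1\tg1\t70\ntx2\tg1\t30\ntx3\tg2\t40\n")
    for gid, per_sample in gene_vs_transcript(genes, txs).items():
        print(gid, per_sample)
```

Sorting genes by their difference would show whether the gap is spread across many genes or concentrated in a few.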
I would appreciate any clarification on this!
Thanks again for your work on Bambu.