Although Hi-TOM does offer genotype analysis based on sequence information, it provides little of the information we need. This project is a tiny tool that extracts genotypes and specific mutations and organises these data.
It determines the genotype using two parameters, min_threshold and compare_index:
Signals <= min_threshold are discarded, and samples with no signal left are not shown;
samples with more than 2 signals after filtering are marked as "error";
samples with only one signal are marked as the corresponding (homozygous) type;
for samples with 2 signals, the ratio of the two values is computed: those within compare_index are marked as heterozygous, and the rest are marked as heterozygous with a "?".
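A minimal Python sketch of these rules follows (the tool itself is written in R, so this only illustrates the logic). It assumes each sample's signals arrive as a mapping from an allele label to a number such as a read percentage, and it interprets compare_index as a (low, high) range for the ratio of the smaller signal to the larger one. The data shape, that interpretation, and the function name `call_genotype` are all hypothetical.

```python
from typing import Optional


def call_genotype(signals: dict[str, float],
                  min_threshold: float,
                  compare_index: tuple[float, float]) -> Optional[str]:
    """Classify one sample from its per-allele signals (hypothetical sketch).

    Assumes `signals` maps an allele label to a non-negative signal value and
    that `compare_index` is a (low, high) range for the smaller/larger ratio.
    """
    # Discard every signal at or below the threshold.
    kept = {allele: s for allele, s in signals.items() if s > min_threshold}

    if not kept:
        return None  # no signal left: the sample is not shown
    if len(kept) > 2:
        return "error"  # more than two signals after filtering
    if len(kept) == 1:
        return "homozygous"  # a single signal: the corresponding type

    # Exactly two signals: compare the smaller/larger ratio with the range.
    small, large = sorted(kept.values())
    ratio = small / large
    low, high = compare_index
    if low <= ratio <= high:
        return "heterozygous"
    return "heterozygous?"  # outside the range: flagged with a "?"
```

For example, with min_threshold = 5 and compare_index = (0.5, 1.0), signals of 48 (WT), 45 (+1A) and 2 (-3bp) leave two signals with a ratio of about 0.94, so the sample is called heterozygous. Using the smaller/larger ratio keeps the call independent of the order in which the two alleles are listed.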
graph TD
A[Genotype Determination Start] --> B[Filter Signals by min_threshold]
B --> C{Any signals > min_threshold?}
C -->|No| D[No Output<br/>Sample Skipped]
C -->|Yes| E[Normalize Remaining Signals]
E --> F[Count Normalized Signals]
F -->|Signals > 2| G[Mark as: Error<br/>Too Many Signals]
F -->|Signals = 2| H[Calculate Ratio of Two Signals]
F -->|Signals = 1| I[Mark as: Homozygous]
H --> J{Ratio within<br/>compare_index range?}
J -->|Yes| K[Mark as: Heterozygous]
J -->|No| L[Mark as: Homozygous with ?<br/>Needs Review]
G --> M[Output Result]
I --> M
K --> M
L --> M
style A fill:#e1f5fe
style B fill:#fff3e0
style C fill:#ffccbc
style D fill:#f5f5f5
style E fill:#fff3e0
style G fill:#ffebee
style I fill:#e8f5e8
style K fill:#e8f5e8
style L fill:#fff3e0
style M fill:#f3e5f5
After you upload your data to the Hi-TOM website, you are likely to get results like this:

Each of them contains sequence and genotype information.

Put all the `*Sequence.xls` files in a single folder. That's all the preparation these data need.
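If you want to check which files will be picked up before running the tool, a quick sketch like the one below lists them. The function name `find_sequence_files` is hypothetical; in the tool itself, reading the folder is handled by summarize_data_in_folder_func.R.

```python
from pathlib import Path


def find_sequence_files(folder: str) -> list[Path]:
    """Return every *Sequence.xls file directly inside `folder`, sorted by name.

    Hypothetical helper for checking the prepared folder; it does not read
    the files, it only lists the ones matching the expected name pattern.
    """
    return sorted(p for p in Path(folder).glob("*Sequence.xls") if p.is_file())
```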

File structure:
C:.
│ .gitignore
│ filetree.txt
│ LICENSE
│ README.md # instruction
│
└─genotyping_result_analysis
        .Rhistory
        genotyping_result_analysis.R        # core function, single version
        genotyping_result_analysis.Rproj
        genptype_marking_func.R             # core function, function version
        main.R                              # you only need to run this
        summarize_data_in_folder_func.R     # read data
Clone the code from GitHub:
git clone https://github.com/CharlesV555/Hi-TOM-table-orgnization.git

> [!Note]
> R requirement: R version 4.5.1 (2025-06-13 ucrt) -- "Great Square Root", tidyverse 2.0.0, openxlsx 4.2.8
Please note that the CAF1X part of the original file names is SUPER IMPORTANT. In the example, data for different CAF genes are split into separate columns by recognising that part of each file name with regular expressions. If you test your own data with different names, all of it will be displayed in a single column named "CAF1_unknown". You can modify the related code in genotype_marking to achieve proper sorting.
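The file-name matching described above can be sketched as follows. The exact pattern used in the R code is not reproduced here, so `GENE_PATTERN` and `gene_column` are hypothetical stand-ins that assume the X in CAF1X is a single letter or digit (e.g. CAF1a); adapt the pattern to your own naming scheme.

```python
import re

# Hypothetical pattern: "CAF1" followed by exactly one letter or digit.
GENE_PATTERN = re.compile(r"CAF1[A-Za-z0-9]")


def gene_column(file_name: str) -> str:
    """Pick the table column for a file from the CAF1X part of its name."""
    match = GENE_PATTERN.search(file_name)
    if match is None:
        # Names without a recognisable CAF1X part all share one column.
        return "CAF1_unknown"
    return match.group(0)
```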

Yeah, you still need to do a little manipulation.
That's all for this tiny tool. Good luck!
The tool implements two pieces of logic: reorganising the data frame, and judging the genotype based on signal information.
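Reorganising the data frame is essentially a long-to-wide reshape: one row per sample and one column per gene. The sketch below shows that idea with plain Python dictionaries; the (sample, gene, genotype) record shape and the name `to_wide_table` are assumptions, not the tool's actual data structures.

```python
def to_wide_table(
    records: list[tuple[str, str, str]],
) -> tuple[list[str], dict[str, dict[str, str]]]:
    """Pivot (sample, gene, genotype) records into one row per sample."""
    # Every gene seen in the input becomes a column, in sorted order.
    genes = sorted({gene for _, gene, _ in records})
    rows: dict[str, dict[str, str]] = {}
    for sample, gene, genotype in records:
        # Create the sample's row on first sight with all cells empty,
        # then fill in this gene; a repeated (sample, gene) pair overwrites.
        rows.setdefault(sample, {g: "" for g in genes})[gene] = genotype
    return genes, rows
```

Empty cells mark genes for which a sample has no call, which keeps every row the same width when it is written out as a table.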
Adapted from Liu et al., Science China Life Sciences, 2019. Information extraction is based on the current Hi-TOM output layout, so if bugs arise in the future, they may be caused by a change in that layout.
Given the situation in the Zhai lab, I label samples with more than 2 signals as errors.
Liu, Q., Wang, C., Jiao, X., Zhang, H., Song, L., Li, Y., Gao, C., & Wang, K. (2019). Hi-TOM: A platform for high-throughput tracking of mutations induced by CRISPR/Cas systems. Science China Life Sciences, 62(1), 1-7. https://doi.org/10.1007/s11427-018-9402-9



