Tiny tool for Hi-TOM genotyping data organization

Introduction

Although Hi-TOM does offer genotype analysis based on sequence information, it provides little of the information we need. This project is a tiny tool that extracts genotypes and specific mutations and organises these data.

It determines the genotype with a min_threshold and a compare_index. Signals <= min_threshold are discarded, and a sample with no signal left is not shown. Samples with more than 2 signals after filtering are marked as "error". Samples with only one signal are marked as the corresponding type. For samples with 2 signals, the ratio of the two values is computed: those within compare_index are marked as heterozygote, and the rest are marked as heterozygote with a "?".

graph TD
    A[Genotype Determination Start] --> B[Filter Signals by min_threshold]
    
    B --> C{Any signals > min_threshold?}
    C -->|No| D[No Output<br/>Sample Skipped]
    C -->|Yes| E[Normalize Remaining Signals]
    
    E --> F[Count Normalized Signals]
    
    F -->|Signals > 2| G[Mark as: Error<br/>Too Many Signals]
    F -->|Signals = 2| H[Calculate Ratio of Two Signals]
    F -->|Signals = 1| I[Mark as: Homozygous]
    
    H --> J{Ratio within<br/>compare_index range?}
    J -->|Yes| K[Mark as: Heterozygous]
    J -->|No| L[Mark as: Homozygous with ?<br/>Needs Review]
    
    G --> M[Output Result]
    I --> M
    K --> M
    L --> M
    
    style A fill:#e1f5fe
    style B fill:#fff3e0
    style C fill:#ffccbc
    style D fill:#f5f5f5
    style E fill:#fff3e0
    style G fill:#ffebee
    style I fill:#e8f5e8
    style K fill:#e8f5e8
    style L fill:#fff3e0
    style M fill:#f3e5f5
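
The decision steps above can be sketched in a few lines of R. This is only an illustration, not the code in genptype_marking_func.R: the function name classify_genotype, the returned labels, the normalization (rescaling the kept signals to sum to 1), and the reading of "within compare_index" as larger / smaller <= compare_index are all assumptions. The label for the out-of-range case follows the prose description.

```r
# Hypothetical helper: classify one sample from its numeric signal values.
# Assumptions (not confirmed by the tool's source):
#   - the ratio is larger signal / smaller signal
#   - "within compare_index" means ratio <= compare_index
classify_genotype <- function(signals, min_threshold, compare_index) {
  # Signals <= min_threshold are discarded
  kept <- signals[signals > min_threshold]

  # No signal left: the sample is skipped (represented here as NA)
  if (length(kept) == 0) {
    return(NA_character_)
  }

  # Rescale the remaining signals so they sum to 1 (assumed normalization)
  kept <- kept / sum(kept)

  if (length(kept) > 2) {
    return("error")
  }
  if (length(kept) == 1) {
    return("homozygous")
  }

  # Exactly two signals: compare them
  ratio <- max(kept) / min(kept)
  if (ratio <= compare_index) "heterozygous" else "heterozygous?"
}

classify_genotype(c(60, 3), min_threshold = 5, compare_index = 2)       # "homozygous"
classify_genotype(c(50, 45), min_threshold = 5, compare_index = 2)      # "heterozygous"
classify_genotype(c(80, 15), min_threshold = 5, compare_index = 2)      # "heterozygous?"
classify_genotype(c(40, 30, 25), min_threshold = 5, compare_index = 2)  # "error"
```

Because the ratio is taken after filtering, a weak second signal that falls below min_threshold never reaches the comparison, so such a sample is reported as a single type.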

To start with

After you upload your data to the Hi-TOM website, you are likely to get results like this:

Each of them contains sequence and genotype information.

Put the *Sequence.xls files in a single folder. That is all the preparation these data need.
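
To check that the folder is set up as expected before running the tool, the matching files can be listed from R. This is a minimal sketch: find_sequence_files is a hypothetical helper, not a function from this repository, and it assumes the file names end in "Sequence.xls".

```r
# Hypothetical helper: list the Hi-TOM sequence files in a folder.
# Assumption: the files of interest end in "Sequence.xls".
find_sequence_files <- function(folder) {
  list.files(folder, pattern = "Sequence\\.xls$", full.names = TRUE)
}

# Example: list the files in the current working directory
find_sequence_files(".")
```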

Pipeline

File structure:

C:.
│  .gitignore
│  filetree.txt
│  LICENSE
│  README.md # instruction
│  
└─genotyping_result_analysis
        .Rhistory
        genotyping_result_analysis.R # core function, single version
        genotyping_result_analysis.Rproj
        genptype_marking_func.R # core function, function version
        main.R # you only need to run this
        summarize_data_in_folder_func.R # read data

Clone the code from GitHub.

git clone https://github.com/CharlesV555/Hi-TOM-table-orgnization.git

Open main.R in R.

[!Note] R requirement: R version 4.5.1 (2025-06-13 ucrt) -- "Great Square Root"; tidyverse 2.0.0; openxlsx 4.2.8

Then follow the instructions in main.R.

ATTENTION


The CAF1X part of each original file name is very important. In the example, data for different CAF genes are split into separate columns by matching this part of the file name with regular expressions. If you test your own data with different gene names, all of the data will be displayed in a single column named "CAF1_unknown". You can modify the genotype_marking-related code to get the classification you need.
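
The idea of labelling columns from the file name can be illustrated as follows. This is a sketch under assumptions, not the repository's code: gene_label is a hypothetical function, the default pattern assumes the tag is "CAF1" followed by one letter or digit, and the file names in the example are made up. Adapting the tool to another gene family would mean changing the equivalent pattern in the genotype_marking code.

```r
# Hypothetical helper: derive a column label from a file name.
# Assumption: the gene tag is "CAF1" followed by one letter or digit.
gene_label <- function(file_name, pattern = "CAF1[A-Za-z0-9]") {
  # regmatches() returns character(0) when the pattern is not found
  hit <- regmatches(file_name, regexpr(pattern, file_name))
  if (length(hit) == 0) "CAF1_unknown" else hit
}

gene_label("CAF1A-plate1-Sequence.xls")  # "CAF1A"
gene_label("MYB-plate1-Sequence.xls")    # "CAF1_unknown"
```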

Example

original data:

result:

Yes, you still need to do a little manual manipulation.

That's all for this tiny tool. Good luck!

Principle (in case of debugging)

The tool has two pieces of logic: reorganising the data frame, and judging the genotype based on signal information.

data frame

Adapted from Liu et al., Science China Life Sciences, 2019. The information extraction is based on this layout. If bugs arise in the future, they may be due to a change in the layout.

genotype judgement

Given the situation in Zhai-lab, I label samples with more than 2 signals as "error".

Reference

Liu, Q., Wang, C., Jiao, X., Zhang, H., Song, L., Li, Y., Gao, C., & Wang, K. (2019). Hi-TOM: A platform for high-throughput tracking of mutations induced by CRISPR/Cas systems. Science China Life Sciences, 62(1), 1-7. https://doi.org/10.1007/s11427-018-9402-9
