A script to help with genotype determination from the output of the Hi-TOM website. It's just a tiny tool.

Tiny tool for Hi-TOM genotyping data organization

Introduction

Although Hi-TOM does offer genotype analysis based on sequence information, it provides little of the information we need. This project is a tiny tool that extracts the genotype and the specific mutation and organises these data.

It determines the genotype using two parameters, min_threshold and compare_index. Signals at or below min_threshold are discarded, and a sample with no signal left is not shown. Samples with more than 2 signals after filtering are marked as "error". Samples with only one signal are marked as the corresponding type. For samples with 2 signals, the ratio of the two values is computed: samples whose ratio falls within compare_index are marked as heterozygote, and the rest are marked as heterozygote with a "?".

graph TD
    A[Genotype Determination Start] --> B[Filter Signals by min_threshold]
    
    B --> C{Any signals > min_threshold?}
    C -->|No| D[No Output<br/>Sample Skipped]
    C -->|Yes| E[Normalize Remaining Signals]
    
    E --> F[Count Normalized Signals]
    
    F -->|Signals > 2| G[Mark as: Error<br/>Too Many Signals]
    F -->|Signals = 2| H[Calculate Ratio of Two Signals]
    F -->|Signals = 1| I[Mark as: Homozygous]
    
    H --> J{Ratio within<br/>compare_index range?}
    J -->|Yes| K[Mark as: Heterozygous]
    J -->|No| L[Mark as: Homozygous with ?<br/>Needs Review]
    
    G --> M[Output Result]
    I --> M
    K --> M
    L --> M
    
    style A fill:#e1f5fe
    style B fill:#fff3e0
    style C fill:#ffccbc
    style D fill:#f5f5f5
    style E fill:#fff3e0
    style G fill:#ffebee
    style I fill:#e8f5e8
    style K fill:#e8f5e8
    style L fill:#fff3e0
    style M fill:#f3e5f5
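
The decision rules above can be sketched in a few lines of R. This is only an illustration, not the code shipped in this repository: the function name call_genotype, the returned labels, and the reading of "within compare_index" as "larger signal divided by smaller signal is at most compare_index" are all assumptions.

```r
# Hypothetical helper, not part of the repository.
# signals:       numeric vector of signal values for one sample
# min_threshold: signals at or below this value are discarded
# compare_index: assumed here to be the largest allowed ratio
#                (larger signal / smaller signal) for a clean heterozygote
call_genotype <- function(signals, min_threshold, compare_index) {
  kept <- signals[signals > min_threshold]
  n <- length(kept)
  if (n == 0) return(NA_character_)   # nothing left: sample is skipped
  if (n > 2) return("error")          # too many signals
  if (n == 1) return("homozygous")    # a single signal survives
  kept <- kept / sum(kept)            # normalise the two remaining signals
  ratio <- max(kept) / min(kept)      # unchanged by the normalisation
  if (ratio <= compare_index) "heterozygous" else "heterozygous?"
}

call_genotype(c(48, 45, 2), min_threshold = 5, compare_index = 2)   # "heterozygous"
call_genotype(c(90, 8), min_threshold = 5, compare_index = 2)       # "heterozygous?"
call_genotype(c(40, 30, 25), min_threshold = 5, compare_index = 2)  # "error"
```

The threshold and index values in the calls are made up for the example; pick values that suit your own sequencing depth.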

To start with

After you upload your data to the Hi-TOM website, you are likely to get results like this: [image]

Each of them contains sequence and genotype information.

Put the *Sequence.xls files in a single folder. That is all the preparation these data need.
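
To check that the folder holds what you expect before running anything, a small base R sketch like the one below may help. The function name find_sequence_files is hypothetical and not part of this repository; it only lists the files whose names end in Sequence.xls and does not read them.

```r
# Hypothetical helper, not part of the repository:
# list every file in `folder` whose name ends with "Sequence.xls".
find_sequence_files <- function(folder) {
  list.files(folder, pattern = "Sequence\\.xls$", full.names = TRUE)
}

# Example with a throwaway folder and made-up file names
demo_dir <- tempfile("hitom_demo_")
dir.create(demo_dir)
file.create(file.path(demo_dir, c("CAF1A-Sequence.xls", "notes.txt")))
basename(find_sequence_files(demo_dir))
```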

Pipeline

File structure:

C:.
│  .gitignore
│  filetree.txt
│  LICENSE
│  README.md # instruction
│  
└─genotyping_result_analysis
        .Rhistory
        genotyping_result_analysis.R # core function, single version
        genotyping_result_analysis.Rproj
        genptype_marking_func.R # core function, function version
        main.R # you only need to run this
        summarize_data_in_folder_func.R # read data

Clone the code from GitHub.

git clone https://github.com/CharlesV555/Hi-TOM-table-orgnization.git

Open main.R in R.

Note: R requirements
R version 4.5.1 (2025-06-13 ucrt) -- "Great Square Root"
tidyverse 2.0.0
openxlsx 4.2.8
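
A quick way to see whether the two packages are installed is sketched below. The helper name missing_pkgs is hypothetical; it only reports what is absent and installs nothing.

```r
# Hypothetical helper: return the names of packages that are not installed.
missing_pkgs <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

needed <- missing_pkgs(c("tidyverse", "openxlsx"))
if (length(needed) > 0) {
  message("Please install: ", paste(needed, collapse = ", "))
}
```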

Follow the instructions in main.R.

ATTENTION

Note that the CAF1X part of the original file names is SUPER IMPORTANT. In the example, data for the different CAF genes are split into separate columns by matching this part of the file name with regular expressions. If you test your own data with different names, all of it will be displayed in a single column named "CAF1_unknown". You can modify the genotype_marking code to get the sorting you need.
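
The idea of labelling files by a pattern in their names can be sketched as follows. This is not the repository's code: the function name extract_gene_label, the default pattern (CAF1 followed by one letter), and the sample file names are assumptions made for illustration. Swap in a pattern that matches your own gene names.

```r
# Hypothetical helper: pull a gene label out of each file name.
# Files that do not match the pattern fall back to "CAF1_unknown".
extract_gene_label <- function(file_names, pattern = "CAF1[A-Za-z]") {
  hits <- regexpr(pattern, file_names)          # -1 where there is no match
  labels <- rep("CAF1_unknown", length(file_names))
  labels[hits > 0] <- regmatches(file_names, hits)
  labels
}

extract_gene_label(c("CAF1A-Sequence.xls", "MYB-Sequence.xls"))
# "CAF1A" "CAF1_unknown"
```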

Example

Original data: [image]

Result: [image]

Yes, you still need to do a little manual manipulation.

That's all for this tiny tool. Good luck!


Principle (in case of debugging)

There are two pieces of logic: reorganising the data frame, and judging the genotype from the signal information.

data frame

[Figure: Hi-TOM output layout, adapted from Liu et al., Science China Life Sciences, 2019.]

The information extraction is based on this layout. If bugs arise in the future, they may be due to a change in this layout.

genotype judgement

Given the situation in the Zhai lab, I label samples with more than 2 signals as errors.

Reference

Liu, Q., Wang, C., Jiao, X., Zhang, H., Song, L., Li, Y., Gao, C., & Wang, K. (2019). Hi-TOM: A platform for high-throughput tracking of mutations induced by CRISPR/Cas systems. Science China Life Sciences, 62(1), 1-7. https://doi.org/10.1007/s11427-018-9402-9
