Skip to content

gptogcg/germline_SNV_WES_analysis

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

5 Commits
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Germline SNV - WES Analysis

Germline SNV - WES analysis pipelines by Genetic Predisposition to Gastrointestinal Cancer Group.

Folder structure changed February 26th 2019 but most workflows not modified already

Two different models of inheritance are considered: autosomal dominant (heterozygosity) and recessive (homozygosity or compound heterozygosity). You can see an overview of the filtering criteria used in the workflow diagrams.

Different workflows have also been developed according to the structure of the input files. One file per sample is allowed, as well as one big file containing the information of the germline variants of all the samples of the cohort. In all cases, tab separated text files are required to run these pipelines (not vcf files).

In all cases, an annotation folder will be required. This folder is not directly provided in GitHub because of size issue, but it should contain the following folders and files:

  • DBs_annotation
    • filtering_terms_CRC.xls
    • HGNC_ProteinAtlas_cancer.csv
    • interactions
    • gene2go
    • HGNC_ProteinAtlas_normal_tissue.csv
    • OMIM_morbidmap.txt
    • generifs_basic
    • HGNC_ProteinAtlas_subcellular_location.csv
    • refSeqSummaryfilt_noalmohad.txt
    • HGNC_entrez.txt
    • HGNC_RefSeq.txt
    • uniprot-all.tab
    • HGNC_OMIM.txt
    • HGNC_synonyms.txt
  • gene_families_&_pathways
    • Functional_Families_con_genes_hereditarios.txt
    • Gene_Families_con_genes_hereditarios.txt
    • PATHWAYS_db.txt
  • genes_candidatos
    • genes_candidatos_23.09.2016.txt
    • Cancer_predisposing_genes_Nature_2014.txt
  • LD_regions
    • LIST_Regions.txt
  • pLI
    • fordist_cleaned_exac_r03_march16_z_pli_rec_null_data.txt

One file per sample

Regarding this methodology, HERNANDEZ project is the most recent workflow used. In this case two files will be required to run the pipeline:

  1. Directories file (directories.txt): Tabulated text file containing the absolute paths of the directories used for running the pipeline (input directory, output/results directory, workflow directory -the home directory by default- and annotation directory). It should contain 4 columns with the following mandatory headers: Input_dir, Output_dir, Workflow_dir and Annotation_dir
  2. Files to analyze file (file_to_analyze.txt): Tabulated tect file containing in two columns the names of the files and samples to analysze. It should contain 2 columns with the following mandatory headers: FILE and NAME.

About

Germline SNV - WES analysis pipeline by Genetic Predisposition to Gastrointestinal Cancer Group

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages