In the 1000 Genomes dataset hosted here, all files follow the same naming pattern except (annoyingly) for the chrX VCF and its index file, whose names contain ".v2.vcf.gz" instead of just ".vcf.gz" like the other files.
At the moment, my workaround is to download those files into the data/vcf/phased folder (and rename them to be consistent with the other VCFs) before running the ProHap Snakemake pipeline.
wget https://ftp.1000genomes.ebi.ac.uk/vol1/ftp/data_collections/1000G_2504_high_coverage/working/20220422_3202_phased_SNV_INDEL_SV/1kGP_high_coverage_Illumina.chrX.filtered.SNV_INDEL_SV_phased_panel.v2.vcf.gz \
-O $HOME/projects/ProHap/data/vcf/phased/1kGP_high_coverage_Illumina.chrX.filtered.SNV_INDEL_SV_phased_panel.vcf.gz \
&& gunzip $HOME/projects/ProHap/data/vcf/phased/1kGP_high_coverage_Illumina.chrX.filtered.SNV_INDEL_SV_phased_panel.vcf.gz
But would it be possible to add a bit more flexibility to the pipeline by allowing for wildcards?
For example:
phased_FTP_URL: "https://ftp.1000genomes.ebi.ac.uk/vol1/ftp/data_collections/1000G_2504_high_coverage/working/20220422_3202_phased_SNV_INDEL_SV/"
phased_local_path: ""
phased_vcf_file_name: "1kGP_high_coverage_Illumina.chr{chr}.filtered.SNV_INDEL_SV_phased_panel.vcf"
With the wildcard added:
phased_vcf_file_name: "1kGP_high_coverage_Illumina.chr{chr}.filtered.SNV_INDEL_SV_phased_panel*.vcf"
Not sure how hard this would be, but for cases like this it could be quite helpful.
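To make the idea concrete, here is a minimal Python sketch of how such a wildcard could be resolved against a directory listing. This is not ProHap code: the helper name resolve_vcf_name, the example listing, and the assumption that the remote files carry a ".gz" suffix on top of the configured name are all just for illustration.

```python
from fnmatch import fnmatchcase

# Pattern as it might appear in the config (hypothetical wildcard support).
PATTERN = "1kGP_high_coverage_Illumina.chr{chr}.filtered.SNV_INDEL_SV_phased_panel*.vcf"


def resolve_vcf_name(pattern, chrom, available):
    """Return the one name in `available` that matches `pattern` for `chrom`.

    Assumption: remote files are gzipped, so ".gz" is appended to the pattern.
    """
    wanted = pattern.format(chr=chrom) + ".gz"
    # fnmatchcase matches the whole name, so ".vcf.gz.tbi" index files are skipped.
    matches = [name for name in available if fnmatchcase(name, wanted)]
    if len(matches) != 1:
        raise ValueError(f"expected exactly one match for {wanted!r}, got {matches}")
    return matches[0]


# Illustrative listing; a real one would come from the FTP/HTTP directory.
listing = [
    "1kGP_high_coverage_Illumina.chr1.filtered.SNV_INDEL_SV_phased_panel.vcf.gz",
    "1kGP_high_coverage_Illumina.chrX.filtered.SNV_INDEL_SV_phased_panel.v2.vcf.gz",
    "1kGP_high_coverage_Illumina.chrX.filtered.SNV_INDEL_SV_phased_panel.v2.vcf.gz.tbi",
]

chr1_name = resolve_vcf_name(PATTERN, "1", listing)
chrx_name = resolve_vcf_name(PATTERN, "X", listing)
```

Raising an error when zero or several files match is a deliberate choice in this sketch, so an overly loose wildcard would fail loudly rather than silently pick the wrong file.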
Many thanks again!
Brian