Enforce SampleInVCF only when genotypes evidence types #154
Merged
12 commits
a7aa124 New semantic check for the presence of sampleInVCF (tcezard)
37ae5fe Fix bug and add tests (tcezard)
6682b61 Ad integration tests (tcezard)
239356c Document the case of allele frequency with no SampleInVCF (tcezard)
c408c7a Fix error in integration test (tcezard)
3c618b2 First attempt at integration tests (tcezard)
fd7657f Fix biovalidator and nextflow install (tcezard)
2198770 Use newer vcf-validator (tcezard)
f845974 Apply suggestions from code review (tcezard)
9090518 Update tests/test_native_validator_sample_in_vcf.py (tcezard)
d7be8e8 Use preexisting evidence types samples_checker.py (tcezard)
c7edfd4 Add missing evidence type file (tcezard)
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,56 @@ | ||
| name: Integration tests | ||
| | ||
| on: | ||
| push: | ||
| branches: [ main ] | ||
| pull_request: | ||
| branches: [ main] | ||
| | ||
| env: | ||
| VCF_VALIDATOR_VERSION: "0.10.2" | ||
| NXF_VER: "23.10.0" | ||
| | ||
| jobs: | ||
| integration-tests: | ||
| runs-on: ubuntu-latest | ||
| | ||
| steps: | ||
| - uses: actions/checkout@v4 | ||
| | ||
| - name: Set up Python 3.11 | ||
| uses: actions/setup-python@v5 | ||
| with: | ||
| python-version: "3.11" | ||
| | ||
| - name: Install Java and Node | ||
| run: sudo apt update && sudo apt install -y default-jdk nodejs npm git curl | ||
| | ||
| - name: Install vcf-validator and vcf-assembly-checker | ||
| run: | | ||
| curl -LJo /usr/local/bin/vcf_validator \ | ||
| https://github.com/EBIvariation/vcf-validator/releases/download/v${VCF_VALIDATOR_VERSION}/vcf_validator_linux | ||
| curl -LJo /usr/local/bin/vcf_assembly_checker \ | ||
| https://github.com/EBIvariation/vcf-validator/releases/download/v${VCF_VALIDATOR_VERSION}/vcf_assembly_checker_linux | ||
| chmod 755 /usr/local/bin/vcf_validator /usr/local/bin/vcf_assembly_checker | ||
| | ||
| - name: Install biovalidator | ||
| run: | | ||
| git clone https://github.com/elixir-europe/biovalidator.git | ||
| cd biovalidator | ||
| npm install | ||
| sudo npm link | ||
| | ||
| - name: Install Nextflow | ||
| run: | | ||
| curl -L "https://github.com/nextflow-io/nextflow/releases/download/v${NXF_VER}/nextflow-${NXF_VER}-all" | bash | ||
| sudo mv nextflow /usr/local/bin/ | ||
| | ||
| - name: Install Python dependencies | ||
| run: | | ||
| python -m pip install --upgrade pip | ||
| pip install pytest | ||
| python -m pip install . | ||
| | ||
| - name: Run integration tests | ||
| run: | | ||
| PYTHONPATH=. pytest tests -m integration |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -1,4 +1,4 @@ | ||
| -FROM --platform=linux/amd64 python:3.10 | ||
| +FROM python:3.10 | ||
| | ||
| ENV vcf_validator_version=0.10.2 | ||
| ENV NXF_VER=23.10.0 | ||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -1,18 +1,23 @@ | ||
| import argparse | ||
| import json | ||
| | ||
| +import yaml | ||
| + | ||
| from eva_sub_cli.semantic_metadata import SemanticMetadataChecker | ||
| | ||
| | ||
| def main(): | ||
| arg_parser = argparse.ArgumentParser(description='Perform semantic checks on the metadata') | ||
| arg_parser.add_argument('--metadata_json', required=True, dest='metadata_json', help='EVA metadata json file') | ||
| +arg_parser.add_argument('--evidence_type_results', required=True, dest='evidence_type_results', help='Results of the evidence check') | ||
| arg_parser.add_argument('--output_yaml', required=True, dest='output_yaml', | ||
| help='Path to the location of the results') | ||
| args = arg_parser.parse_args() | ||
| | ||
| with open(args.metadata_json) as open_json: | ||
| metadata = json.load(open_json) | ||
| -checker = SemanticMetadataChecker(metadata) | ||
| -checker.check_all() | ||
| -checker.write_result_yaml(args.output_yaml) | ||
| +with open(args.evidence_type_results) as open_yaml: | ||
| +evidence_type_results = yaml.safe_load(open_yaml) | ||
| +checker = SemanticMetadataChecker(metadata, evidence_type_results) | ||
| +checker.check_all() | ||
| +checker.write_result_yaml(args.output_yaml) |
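
The script above now hands the evidence type results to `SemanticMetadataChecker`, but the checker itself is not visible in this view. The following is only a minimal sketch of the rule named in the PR title, not the project's implementation. It assumes the results map each analysis alias to a dict with an `evidence_type` key (as the YAML fixture in this PR appears to), and that each sample lists its analyses under `analysisAlias`. The function name `missing_sample_in_vcf`, the error wording, and the `allele_frequency` label are hypothetical.

```python
def missing_sample_in_vcf(metadata, evidence_type_results):
    """List samples that lack sampleInVCF but belong to a genotype analysis."""
    errors = []
    for index, sample in enumerate(metadata.get("sample", [])):
        if sample.get("sampleInVCF"):
            continue  # a name is declared, nothing to enforce
        for alias in sample.get("analysisAlias", []):
            # Assumed shape: {alias: {"evidence_type": "genotype" | ...}}
            evidence = (evidence_type_results.get(alias) or {}).get("evidence_type")
            if evidence == "genotype":
                errors.append(
                    f"/sample/{index}: sampleInVCF is required because "
                    f"analysis {alias} contains genotypes"
                )
    return errors


genotype_metadata = {"sample": [{"analysisAlias": ["GT1"], "bioSampleAccession": "SAME00001"}]}
af_metadata = {"sample": [{"analysisAlias": ["AF1"], "bioSampleAccession": "SAME00001"}]}

# One error is expected here: the sample has no sampleInVCF and GT1 is a genotype analysis.
print(missing_sample_in_vcf(genotype_metadata, {"GT1": {"evidence_type": "genotype"}}))
# "allele_frequency" is a hypothetical label for a non-genotype file; no error is expected.
print(missing_sample_in_vcf(af_metadata, {"AF1": {"evidence_type": "allele_frequency"}}))
```

Keying the decision on the per-analysis evidence type, rather than on the metadata alone, is what lets an aggregated allele frequency submission omit `sampleInVCF` without failing the check.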
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,6 @@ | ||
| VD1: | ||
| evidence_type: genotype | ||
| VD2: | ||
| evidence_type: genotype | ||
| VD3: | ||
| evidence_type: genotype |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,9 @@ | ||
| ##fileformat=VCFv4.1 | ||
| ##FILTER=<ID=PASS,Description="All filters passed"> | ||
| ##contig=<ID=1,length=249250621> | ||
| ##INFO=<ID=AF,Number=A,Type=Float,Description="Allele Frequency"> | ||
| ##INFO=<ID=AN,Number=1,Type=Integer,Description="Total Allele Count"> | ||
| ##INFO=<ID=AC,Number=A,Type=Integer,Description="Allele Count"> | ||
| #CHROM POS ID REF ALT QUAL FILTER INFO | ||
| 1 10177 rs367896724 A AC 100 PASS AF=0.11;AN=2000;AC=220 | ||
| 1 10505 rs548419688 A T 100 PASS AF=0.09;AN=2000;AC=180 |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,2 @@ | ||
| >fasta | ||
| AAA |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,7 @@ | ||
| ##fileformat=VCFv4.1 | ||
| ##FILTER=<ID=PASS,Description="All filters passed"> | ||
| ##contig=<ID=1,assembly=b37,length=249250621> | ||
| ##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype"> | ||
| #CHROM POS ID REF ALT QUAL FILTER INFO FORMAT sample1 sample2 sample3 | ||
| 1 10177 rs367896724 A AC 100 PASS . GT 1|0 0|1 0|0 | ||
| 1 10505 rs548419688 A T 100 PASS . GT 0|0 0|0 0|1 |
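
The two VCF fixtures in this PR differ in the way that matters for the new check: one carries only INFO-level `AF`/`AN`/`AC` values and no sample columns, while the other declares a `GT` FORMAT field and three sample columns. As a rough sketch, an evidence type could be inferred from the header alone as below. This is not the logic of `samples_checker.py`, which is not shown here; the function name `guess_evidence_type` and the `allele_frequency` label are assumptions, and a real VCF is tab-separated.

```python
def guess_evidence_type(vcf_lines):
    """Guess 'genotype' or 'allele_frequency' from VCF header lines only."""
    info_ids, format_ids, sample_names = set(), set(), []
    for line in vcf_lines:
        line = line.rstrip("\n")
        if line.startswith("##INFO=<ID="):
            info_ids.add(line[len("##INFO=<ID="):].split(",", 1)[0].rstrip(">"))
        elif line.startswith("##FORMAT=<ID="):
            format_ids.add(line[len("##FORMAT=<ID="):].split(",", 1)[0].rstrip(">"))
        elif line.startswith("#CHROM"):
            # Columns after the ninth (FORMAT) are sample names.
            sample_names = line.split("\t")[9:]
            break
    if "GT" in format_ids and sample_names:
        return "genotype"
    if "AF" in info_ids or {"AC", "AN"} <= info_ids:
        return "allele_frequency"
    return None


base_columns = ["#CHROM", "POS", "ID", "REF", "ALT", "QUAL", "FILTER", "INFO"]
genotype_header = [
    '##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">',
    "\t".join(base_columns + ["FORMAT", "sample1", "sample2", "sample3"]),
]
af_header = [
    '##INFO=<ID=AF,Number=A,Type=Float,Description="Allele Frequency">',
    "\t".join(base_columns),
]
print(guess_evidence_type(genotype_header))
print(guess_evidence_type(af_header))
```

This sketch never reads the data lines, so a file that declares `GT` in the header but leaves every genotype missing would still be labelled as a genotype file.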

tests/resources/sample_in_vcf_check/metadata_af_no_sample_in_vcf.json (1 addition, 0 deletions)

| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1 @@ | ||
| {"project": {"title": "Test AF Project", "description": "Project with allele frequency VCF", "taxId": 9606, "centre": "Test Centre"}, "sample": [{"analysisAlias": ["AF1"], "bioSampleAccession": "SAME00001"}], "analysis": [{"analysisTitle": "AF Analysis", "analysisAlias": "AF1", "description": "Allele frequency analysis", "experimentType": "Whole genome sequencing", "referenceGenome": "GCA_000001405.27"}], "files": [{"analysisAlias": "AF1", "fileName": "allele_freq.vcf", "fileType": "vcf", "md5": "b8ab2c9d58e5f430ce70783d8d0a0b88", "fileSize": 458}]} |

tests/resources/sample_in_vcf_check/metadata_af_with_sample_in_vcf.json (1 addition, 0 deletions)

| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1 @@ | ||
| {"project": {"title": "Test AF Project", "description": "Project with allele frequency VCF", "taxId": 9606, "centre": "Test Centre"}, "sample": [{"analysisAlias": ["AF1"], "sampleInVCF": "sample1", "bioSampleAccession": "SAME00001"}], "analysis": [{"analysisTitle": "AF Analysis", "analysisAlias": "AF1", "description": "Allele frequency analysis", "experimentType": "Whole genome sequencing", "referenceGenome": "GCA_000001405.27"}], "files": [{"analysisAlias": "AF1", "fileName": "allele_freq.vcf", "fileType": "vcf", "md5": "b8ab2c9d58e5f430ce70783d8d0a0b88", "fileSize": 458}]} |

tests/resources/sample_in_vcf_check/metadata_genotype_no_sample_in_vcf.json (1 addition, 0 deletions)

| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1 @@ | ||
| {"project": {"title": "Test Genotype Project", "description": "Project with genotype VCF", "taxId": 9606, "centre": "Test Centre"}, "sample": [{"analysisAlias": ["GT1"], "bioSampleAccession": "SAME00001"}, {"analysisAlias": ["GT1"], "bioSampleAccession": "SAME00002"}, {"analysisAlias": ["GT1"], "bioSampleAccession": "SAME00003"}], "analysis": [{"analysisTitle": "Genotype Analysis", "analysisAlias": "GT1", "description": "Genotype analysis", "experimentType": "Whole genome sequencing", "referenceGenome": "GCA_000001405.27"}], "files": [{"analysisAlias": "GT1", "fileName": "genotype.vcf", "fileType": "vcf", "md5": "81ca0b3a6e5b657bc2be50085c76546a", "fileSize": 350}]} |

tests/resources/sample_in_vcf_check/metadata_genotype_with_sample_in_vcf.json (1 addition, 0 deletions)

| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1 @@ | ||
| {"project": {"title": "Test Genotype Project", "description": "Project with genotype VCF", "taxId": 9606, "centre": "Test Centre"}, "sample": [{"analysisAlias": ["GT1"], "sampleInVCF": "sample1", "bioSampleAccession": "SAME00001"}, {"analysisAlias": ["GT1"], "sampleInVCF": "sample2", "bioSampleAccession": "SAME00002"}, {"analysisAlias": ["GT1"], "sampleInVCF": "sample3", "bioSampleAccession": "SAME00003"}], "analysis": [{"analysisTitle": "Genotype Analysis", "analysisAlias": "GT1", "description": "Genotype analysis", "experimentType": "Whole genome sequencing", "referenceGenome": "GCA_000001405.27"}], "files": [{"analysisAlias": "GT1", "fileName": "genotype.vcf", "fileType": "vcf", "md5": "81ca0b3a6e5b657bc2be50085c76546a", "fileSize": 350}]} |
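
In the genotype fixture that includes `sampleInVCF`, the declared names (`sample1`, `sample2`, `sample3`) match the sample columns of the genotype VCF fixture. A minimal sketch of comparing the two sets follows. The helper name `compare_sample_names` and the shape of its result are hypothetical, and this does not reflect how eva-sub-cli performs the comparison.

```python
def compare_sample_names(metadata, analysis_alias, vcf_sample_names):
    """Compare sampleInVCF values for one analysis with the VCF's sample columns."""
    declared = {
        sample["sampleInVCF"]
        for sample in metadata.get("sample", [])
        if analysis_alias in sample.get("analysisAlias", []) and sample.get("sampleInVCF")
    }
    in_vcf = set(vcf_sample_names)
    return {
        "missing_from_vcf": sorted(declared - in_vcf),
        "missing_from_metadata": sorted(in_vcf - declared),
    }


metadata = {
    "sample": [
        {"analysisAlias": ["GT1"], "sampleInVCF": "sample1", "bioSampleAccession": "SAME00001"},
        {"analysisAlias": ["GT1"], "sampleInVCF": "sample2", "bioSampleAccession": "SAME00002"},
        {"analysisAlias": ["GT1"], "sampleInVCF": "sample3", "bioSampleAccession": "SAME00003"},
    ]
}
# Both lists in the result are empty when metadata and VCF header agree.
print(compare_sample_names(metadata, "GT1", ["sample1", "sample2", "sample3"]))
```

A comparison like this only makes sense for genotype files, since an aggregated file has no sample columns to compare against.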