-
Notifications
You must be signed in to change notification settings - Fork 3
Description
Hi, I've been using the pipeline, concretely the option 1 (canu + nanopolish). However I got stuck in the nanopolish step because the expected file output was empty. The rule is the following one:
## Polish a genome assembly
rule nanopolish_consensus:
input:
reads = config['project_dir'] + '/03_subsampled_reads/{barcode}/reads.fastq.gz',
bam = config['project_dir'] + '/05_nanopolish/{barcode}/reads.sorted.bam',
bai = config['project_dir'] + '/05_nanopolish/{barcode}/reads.sorted.bam.bai',
contigs = config['project_dir'] + '/05_nanopolish/{barcode}/canu.contigs.fasta',
makerange_fp = config['nanopolish_fp'] + '/scripts/nanopolish_makerange.py',
merge_fp = config['nanopolish_fp'] + '/scripts/nanopolish_merge.py'
output:
config['project_dir'] + '/05_nanopolish/{barcode}/nanopolish.contigs.fasta'
params:
results_fp = config['project_dir'] + '/05_nanopolish/{barcode}/nanopolish.results',
polished_fp = config['project_dir'] + '/05_nanopolish/{barcode}/polished.results'
threads: 16
shell:
"""
mkdir -p {params.polished_fp}
python {input.makerange_fp} {input.contigs} | \
parallel --results {params.results_fp} -P 16 \
LD_LIBRARY_PATH="$CONDA_PREFIX/lib64" {config[nanopolish_fp]}/nanopolish variants \
--consensus {params.polished_fp}/polished.{{1}}.fa \
-w {{1}} -r {input.reads} -b {input.bam} -g {input.contigs} \
-t {threads} --min-candidate-frequency 0.1 --methylation-aware=dcm,dam
python {input.merge_fp} {params.polished_fp}/polished.*.fa > {output}
"""
In the shell statement, the command nanopolish variants --consensus is generating VCF files. The following command is python nanopolish_merge.py, which is expecting FASTA files, thus it do not work correctly and it generates an empty output file. Maybe this could be due to different nanopolish versions. However, in the instructions provided in the readme file no version is recommended.
I noticed that nanopolish provides a tool for converting a set of VCF into a fasta file, so I finally decided to use that instead of nanopolish_merge.py script. The resulting rule is the following (not test as I executed the command manually):
## Polish a genome assembly
rule nanopolish_consensus:
input:
reads = config['project_dir'] + '/03_subsampled_reads/{barcode}/reads.fastq.gz',
bam = config['project_dir'] + '/05_nanopolish/{barcode}/reads.sorted.bam',
bai = config['project_dir'] + '/05_nanopolish/{barcode}/reads.sorted.bam.bai',
contigs = config['project_dir'] + '/05_nanopolish/{barcode}/canu.contigs.fasta',
makerange_fp = config['nanopolish_fp'] + '/scripts/nanopolish_makerange.py',
merge_fp = config['nanopolish_fp'] + '/scripts/nanopolish_merge.py'
output:
config['project_dir'] + '/05_nanopolish/{barcode}/nanopolish.contigs.fasta'
params:
results_fp = config['project_dir'] + '/05_nanopolish/{barcode}/nanopolish.results',
polished_fp = config['project_dir'] + '/05_nanopolish/{barcode}/polished.results'
threads: 16
shell:
"""
mkdir -p {params.polished_fp}
python {input.makerange_fp} {input.contigs} | \
parallel --results {params.results_fp} -P 16 \
LD_LIBRARY_PATH="$CONDA_PREFIX/lib64" {config[nanopolish_fp]}/nanopolish variants \
--consensus {params.polished_fp}/polished.{{1}}.fa \
-w {{1}} -r {input.reads} -b {input.bam} -g {input.contigs} \
-t {threads} --min-candidate-frequency 0.1 --methylation-aware=dcm,dam
LD_LIBRARY_PATH="$CONDA_PREFIX/lib64" {config[nanopolish_fp]}/nanopolish vcf2fasta -g {input.contigs} {params.polished_fp}/polished.*.fa > {output}
"""
I hope this helps.