Skip to content

Nanopolish returns VCF instead of FASTA files #5

@fanavarro

Description

@fanavarro

Hi, I've been using the pipeline, concretely the option 1 (canu + nanopolish). However I got stuck in the nanopolish step because the expected file output was empty. The rule is the following one:

## Polish a genome assembly
rule nanopolish_consensus:
 input:
  reads = config['project_dir'] + '/03_subsampled_reads/{barcode}/reads.fastq.gz',
  bam = config['project_dir'] + '/05_nanopolish/{barcode}/reads.sorted.bam',
  bai = config['project_dir'] + '/05_nanopolish/{barcode}/reads.sorted.bam.bai',
  contigs = config['project_dir'] + '/05_nanopolish/{barcode}/canu.contigs.fasta',
  makerange_fp = config['nanopolish_fp'] + '/scripts/nanopolish_makerange.py',
  merge_fp = config['nanopolish_fp'] + '/scripts/nanopolish_merge.py'
 output:
  config['project_dir'] + '/05_nanopolish/{barcode}/nanopolish.contigs.fasta'
 params:
  results_fp = config['project_dir'] + '/05_nanopolish/{barcode}/nanopolish.results',
  polished_fp = config['project_dir'] + '/05_nanopolish/{barcode}/polished.results'
 threads: 16
 shell:
  """
  mkdir -p {params.polished_fp}
  python {input.makerange_fp} {input.contigs} | \
   parallel --results {params.results_fp} -P 16 \
    LD_LIBRARY_PATH="$CONDA_PREFIX/lib64" {config[nanopolish_fp]}/nanopolish variants \
    --consensus {params.polished_fp}/polished.{{1}}.fa \
    -w {{1}} -r {input.reads} -b {input.bam} -g {input.contigs} \
    -t {threads} --min-candidate-frequency 0.1 --methylation-aware=dcm,dam
  
  python {input.merge_fp} {params.polished_fp}/polished.*.fa > {output}
  """

In the shell statement, the command nanopolish variants --consensus is generating VCF files. The following command is python nanopolish_merge.py, which is expecting FASTA files, thus it do not work correctly and it generates an empty output file. Maybe this could be due to different nanopolish versions. However, in the instructions provided in the readme file no version is recommended.

I noticed that nanopolish provides a tool for converting a set of VCF into a fasta file, so I finally decided to use that instead of nanopolish_merge.py script. The resulting rule is the following (not test as I executed the command manually):

## Polish a genome assembly
rule nanopolish_consensus:
 input:
  reads = config['project_dir'] + '/03_subsampled_reads/{barcode}/reads.fastq.gz',
  bam = config['project_dir'] + '/05_nanopolish/{barcode}/reads.sorted.bam',
  bai = config['project_dir'] + '/05_nanopolish/{barcode}/reads.sorted.bam.bai',
  contigs = config['project_dir'] + '/05_nanopolish/{barcode}/canu.contigs.fasta',
  makerange_fp = config['nanopolish_fp'] + '/scripts/nanopolish_makerange.py',
  merge_fp = config['nanopolish_fp'] + '/scripts/nanopolish_merge.py'
 output:
  config['project_dir'] + '/05_nanopolish/{barcode}/nanopolish.contigs.fasta'
 params:
  results_fp = config['project_dir'] + '/05_nanopolish/{barcode}/nanopolish.results',
  polished_fp = config['project_dir'] + '/05_nanopolish/{barcode}/polished.results'
 threads: 16
 shell:
  """
  mkdir -p {params.polished_fp}
  python {input.makerange_fp} {input.contigs} | \
   parallel --results {params.results_fp} -P 16 \
    LD_LIBRARY_PATH="$CONDA_PREFIX/lib64" {config[nanopolish_fp]}/nanopolish variants \
    --consensus {params.polished_fp}/polished.{{1}}.fa \
    -w {{1}} -r {input.reads} -b {input.bam} -g {input.contigs} \
    -t {threads} --min-candidate-frequency 0.1 --methylation-aware=dcm,dam
  
  LD_LIBRARY_PATH="$CONDA_PREFIX/lib64" {config[nanopolish_fp]}/nanopolish vcf2fasta -g {input.contigs} {params.polished_fp}/polished.*.fa > {output}
  """

I hope this helps.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions