This repository analyzes metagenomic Illumina reads that VirMet could not classify (`undetermined_reads.fastq.gz`) by aligning them at the protein level with DIAMOND BLASTx.
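The protein-level alignment step can be illustrated with a short sketch that builds a DIAMOND `blastx` command line in Python. The helper name `build_diamond_blastx_cmd` and the database file name `swissprot.dmnd` are hypothetical, and diamet.py's actual invocation and options may differ. The `--outfmt 6` field list mirrors the columns described for the `.tsv` outputs. DIAMOND can only fill `sscinames` and `sskingdoms` if the database was built with taxonomy information.

```python
import shlex


def build_diamond_blastx_cmd(diamond_bin, db, query, out_tsv):
    """Return an argv list for a DIAMOND blastx run (hypothetical helper)."""
    return [
        diamond_bin, "blastx",
        "--db", db,          # DIAMOND protein database (.dmnd)
        "--query", query,    # nucleotide reads or contigs to translate and align
        "--out", out_tsv,    # tabular output file
        # Tabular format 6 with the same fields listed for the .tsv outputs
        "--outfmt", "6", "qseqid", "qlen", "length", "sscinames", "sskingdoms",
    ]


cmd = build_diamond_blastx_cmd(
    "./diamondunix", "swissprot.dmnd",
    "undetermined_reads.fastq.gz", "undetermined_reads_diamet.tsv",
)
print(shlex.join(cmd))
```

The list form can be passed directly to `subprocess.run(cmd, check=True)`, which avoids shell quoting issues with file names.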
diamet.py analyzes all reads from `undetermined_reads.fastq.gz` in two ways:
- as single reads, and
- as contigs created by de novo assembly with MEGAHIT.
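For the contig branch, a minimal sketch of how a single-end MEGAHIT assembly could be set up is shown here, assuming reads are passed with `-r` and the output directory with `-o`. The helper `build_megahit_cmd` and the directory name `megahit_out` are hypothetical; MEGAHIT itself writes the assembled contigs to `final.contigs.fa` inside the output directory, and that file could then be aligned with DIAMOND `blastx` just like the single reads.

```python
import os


def build_megahit_cmd(reads, out_dir):
    """Return (argv, contigs_path) for a single-end MEGAHIT assembly (hypothetical helper)."""
    # -r: single-end reads; -o: output directory (MEGAHIT expects it not to exist yet)
    cmd = ["megahit", "-r", reads, "-o", out_dir]
    # MEGAHIT places the assembled contigs in final.contigs.fa inside out_dir
    contigs = os.path.join(out_dir, "final.contigs.fa")
    return cmd, contigs


cmd, contigs = build_megahit_cmd("undetermined_reads.fastq.gz", "megahit_out")
print(" ".join(cmd))
print(contigs)
```

Keeping command construction separate from execution makes it easy to log or dry-run the exact call before handing it to `subprocess.run`.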
- Log in to timavo: `ssh timavo`
- Move into the directory of the sample whose undetermined reads you want to analyze: `cd /analysis/VirMetResults/<run>/<sample>/`
- Run the Python script: `python <path to script>/diamet.py`
To run diamet.py, you need:
- the `diamondunix` executable file, which can be found here;
- MEGAHIT installed on the server;
- a protein database (defined in the code; we are using Swiss-Prot);
- `undetermined_reads.fastq.gz` in the current working directory.
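Before starting a long run, it can help to confirm these requirements are met. The sketch below is a hypothetical pre-flight check, not part of diamet.py; the default paths `./diamondunix` and `swissprot.dmnd` are assumptions and should be replaced with whatever the code actually uses.

```python
import os
import shutil


def missing_prerequisites(diamond_bin="./diamondunix",
                          db_path="swissprot.dmnd",
                          reads="undetermined_reads.fastq.gz"):
    """Return a list of requirements that are not satisfied (hypothetical helper)."""
    missing = []
    # shutil.which checks PATH, or only the given path if it contains a directory
    if shutil.which(diamond_bin) is None:
        missing.append(diamond_bin)
    if shutil.which("megahit") is None:
        missing.append("megahit")
    # The protein database and the undetermined reads must be regular files
    for path in (db_path, reads):
        if not os.path.isfile(path):
            missing.append(path)
    return missing


problems = missing_prerequisites()
if problems:
    print("Missing:", ", ".join(problems))
```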
diamet.py writes the following files:
- `undetermined_reads_diamet.pdf`, which plots the taxonomic classification distribution of all hits;
- `undetermined_reads_diamet.tsv`, which lists all hits with their Query Seq-id (qseqid), Query sequence length (qlen), Alignment length (length), Unique Subject Scientific Name (sscinames), and Unique Subject Super Kingdom (sskingdoms);
- `undetermined_reads_diamet_viral.csv`, which lists only the viral hits and their counts;
- `undetermined_contigs_diamet.tsv`, which lists all hits of the contigs with their Query Seq-id (qseqid), Query sequence length (qlen), Alignment length (length), Unique Subject Scientific Name (sscinames), and Unique Subject Super Kingdom (sskingdoms).
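To work with the `.tsv` outputs downstream, a small reader can tally viral hits per scientific name, roughly what the `_viral.csv` summary reports. This is a sketch under stated assumptions: the file is tab-separated without a header, the columns appear in the order listed above, multiple super kingdoms are separated by `;`, and every row counts as one hit. diamet.py may count differently (for example, one hit per read). The helper name `count_viral_hits` is hypothetical.

```python
import csv
from collections import Counter

# Assumed column order, matching the fields listed for the .tsv outputs
FIELDS = ["qseqid", "qlen", "length", "sscinames", "sskingdoms"]


def count_viral_hits(lines):
    """Count hits per scientific name whose super kingdom includes Viruses."""
    counts = Counter()
    reader = csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        if len(row) < len(FIELDS):
            continue  # skip empty or malformed lines
        rec = dict(zip(FIELDS, row))
        if "Viruses" in rec["sskingdoms"].split(";"):
            counts[rec["sscinames"]] += 1
    return counts


# Usage: count hits from the single-read output file
# with open("undetermined_reads_diamet.tsv", newline="") as fh:
#     for name, n in count_viral_hits(fh).most_common():
#         print(name, n)
```

Accepting any iterable of lines keeps the function usable on open files as well as on in-memory test data.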