Volcano plot for showing significant changes in reads per gene

To highlight genes whose fitness changes between libraries, I am creating a volcano plot.
Each dot represents a single gene. The x-axis shows the fold change in the number of reads or insertions between two libraries, and the y-axis shows the p-value (determined by an independent t-test). Note that the x-axis is in log2 scale and the y-axis is in -log10 scale.
The interesting genes are therefore the ones that are high on the y-axis (low p-value) and far from 0 on the x-axis (large fold change in either direction).
![volcanoplot_reads](https://user-images.githubusercontent.com/29129193/108827378-d5bef200-75c5-11eb-868c-ba9968691393.png)
When I compare my results with those found by Agnes (Kornmann lab), the numbers don't seem to match. See the figure below, which uses the same dataset.
<img width="920" alt="volcanoplot_reads_Agnes" src="https://user-images.githubusercontent.com/29129193/108828438-1f5c0c80-75c7-11eb-9271-33bcd7c62932.png">
To determine the fold change, I sum all reads of a specific gene over all datasets of a library (e.g. we have 2 wt datasets and 4 dNrp1 datasets) and then normalize by the total number of insertions in that library. This gives two values per gene: the normalized summed number of reads for wt and for dNrp1.
The p-value of the Student's t-test is computed in Python using `scipy.stats.ttest_ind(wt_datasets, dNrp1_datasets)`.
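To make the calculation above concrete, here is a minimal sketch of it. It is not the code in the linked script. The function name `volcano_values`, the array layout (one row per dataset, one column per gene) and the direction of the ratio (dNrp1 over wt) are assumptions made for illustration.

```python
import numpy as np
from scipy import stats


def volcano_values(wt_reads, mut_reads, wt_insertions, mut_insertions):
    """Return (log2 fold change, -log10 p-value) per gene.

    wt_reads, mut_reads: arrays of shape (n_datasets, n_genes) with read counts.
    wt_insertions, mut_insertions: total insertion count of each dataset.
    Assumption: fold change is taken as mutant over wt.
    """
    wt_reads = np.asarray(wt_reads, dtype=float)
    mut_reads = np.asarray(mut_reads, dtype=float)

    # Sum the reads of each gene over all datasets of a library,
    # then normalize by the total number of insertions in that library.
    wt_norm = wt_reads.sum(axis=0) / np.sum(wt_insertions)
    mut_norm = mut_reads.sum(axis=0) / np.sum(mut_insertions)

    # Genes with zero reads in one library give +/-inf or nan here.
    with np.errstate(divide="ignore", invalid="ignore"):
        log2_fc = np.log2(mut_norm / wt_norm)

    # Independent t-test per gene, comparing the datasets of both libraries.
    _, p_values = stats.ttest_ind(wt_reads, mut_reads, axis=0)
    return log2_fc, -np.log10(p_values)
```

Note that in this sketch the t-test runs on the per-dataset read counts, while the fold change uses the normalized sums, so the two axes are not computed from exactly the same quantities.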

One thing that stands out in my graph is the cluster with a large negative fold change. I checked the genes in this cluster, and they all have 0 reads in every dataset except one. This probably distorts the fold change calculation, but I am not yet sure how to deal with it.
Another issue is that I don't know the total number of insertions for each dataset. For now I use only the insertion count within genes (the only data I have), but this ignores all insertions outside genes, which might explain some of the differences between my results and those from Agnes.
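One common workaround for ratios involving zero counts, which I have not adopted here, is to add a small pseudocount to the summed reads before taking the ratio. The sketch below shows the idea; the function name `log2_fold_change_pseudocount` and the default pseudocount of 1 are assumptions, and the choice of pseudocount affects the result for low-count genes.

```python
import numpy as np


def log2_fold_change_pseudocount(wt_sum, mut_sum, wt_total, mut_total,
                                 pseudocount=1.0):
    """log2 fold change (mutant over wt) with a pseudocount on the raw sums.

    wt_sum, mut_sum: summed reads per gene for each library.
    wt_total, mut_total: total insertions per library (normalization factor).
    pseudocount: hypothetical default; added before normalizing so that
    genes with zero reads still give a finite ratio.
    """
    wt_norm = (np.asarray(wt_sum, dtype=float) + pseudocount) / wt_total
    mut_norm = (np.asarray(mut_sum, dtype=float) + pseudocount) / mut_total
    return np.log2(mut_norm / wt_norm)
```

An alternative would be to leave out genes with too few reads altogether, at the cost of losing genes that are truly depleted in one library.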

I use [volcanoplot.py](https://github.com/Gregory94/LaanLab-SATAY-DataAnalysis/blob/dev_Gregory/Python_scripts/volcanoplot.py) to create the first plot above.
