This repository was archived by the owner on Nov 8, 2021. It is now read-only.
To indicate genes that have a fitness change between different libraries, I am creating a volcano plot.
Each dot represents a single gene. The x-axis shows the fold change in the number of reads or insertions between two libraries, and the y-axis shows the p-value (determined by an independent t-test). Note that the x-axis is on a log2 scale and the y-axis on a -log10 scale.
So the interesting genes are the ones that are high on the y-axis and far away from 0 on the x-axis.
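As a minimal sketch of how the two axes are derived, the snippet below converts a fold change and a p-value into volcano plot coordinates and flags genes that sit high and far from 0. The gene names, values, and both cut-offs are made up for illustration and are not taken from my data.

```python
import math

def volcano_coordinates(fold_change, p_value):
    """Return (x, y) for the volcano plot: log2 fold change and -log10 p-value."""
    return math.log2(fold_change), -math.log10(p_value)

# Hypothetical genes: name -> (fold change, p-value). All values are made up.
genes = {
    "geneA": (4.0, 0.001),
    "geneB": (1.0, 0.5),
    "geneC": (0.25, 0.005),
}

# Hypothetical cut-offs, chosen only for this illustration.
MIN_ABS_LOG2_FC = 1.0  # at least a two-fold change in either direction
MIN_NEG_LOG10_P = 2.0  # p-value of at most 0.01

interesting = []
for name, (fc, p) in genes.items():
    x, y = volcano_coordinates(fc, p)
    # Interesting = far from 0 on the x-axis AND high on the y-axis.
    if abs(x) >= MIN_ABS_LOG2_FC and y >= MIN_NEG_LOG10_P:
        interesting.append(name)

print(interesting)
```

Because the x-axis is logarithmic, a four-fold increase and a four-fold decrease end up at the same distance from 0, on opposite sides.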
When I compare my results with those found by Agnes (Kornmann lab), the numbers don't seem to match. See the figure below, which uses the same dataset.
To determine the fold change, I sum all reads of a specific gene over all datasets of a library (e.g. we have 2 wt datasets and 4 dNrp1 datasets) and then normalize by the total number of insertions in the library. So I end up with two values per gene: the normalized summed number of reads for wt and for dNrp1.
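A rough sketch of this calculation for a single gene could look as follows. The function name and all numbers are hypothetical. It also assumes that the library total is the sum of the per-dataset insertion counts and that the fold change is taken as dNrp1 over wt, and either assumption may differ from the actual script.

```python
import math

def normalized_summed_reads(reads_per_dataset, insertions_per_dataset):
    """Sum a gene's reads over all datasets of a library and divide by
    the library's total number of insertions (assumed here to be the sum
    of the per-dataset insertion counts)."""
    return sum(reads_per_dataset) / sum(insertions_per_dataset)

# Made-up numbers for one gene: 2 wt datasets and 4 dNrp1 datasets.
wt_reads = [120, 80]
wt_insertions = [10_000, 10_000]
dnrp1_reads = [30, 20, 25, 25]
dnrp1_insertions = [5_000, 5_000, 5_000, 5_000]

wt_value = normalized_summed_reads(wt_reads, wt_insertions)
dnrp1_value = normalized_summed_reads(dnrp1_reads, dnrp1_insertions)

# Assumption: fold change is expressed as mutant over wild type.
log2_fold_change = math.log2(dnrp1_value / wt_value)
print(wt_value, dnrp1_value, log2_fold_change)
```

With these numbers the gene has half as many normalized reads in dNrp1 as in wt, so it lands at -1 on the x-axis.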
The p-value of the Student's t-test is computed in Python using scipy.stats.ttest_ind(wt_datasets, dNrp1_datasets).
One thing that stands out in my graph is the cluster with a large negative fold change. I checked the genes in this cluster, and they all have 0 reads in every dataset except one. This probably distorts the fold change calculation, but I am not yet sure how to deal with it.
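To make the problem concrete, here is a small sketch of what happens to the log2 fold change when one side has 0 reads. It also shows one common workaround, adding a pseudocount to both values. This is only a possible option and not something I have settled on. The read counts are made up, and the pseudocount is applied to raw summed counts purely for simplicity.

```python
import math

def log2_fold_change(wt_value, mutant_value, pseudocount=0.0):
    """log2(mutant / wt), with an optional pseudocount added to both values."""
    return math.log2((mutant_value + pseudocount) / (wt_value + pseudocount))

# Made-up summed read counts for a gene with reads in only one dataset.
wt_reads = 3
mutant_reads = 0

try:
    log2_fold_change(wt_reads, mutant_reads)
except ValueError as err:
    # log2(0) is undefined, so the raw ratio cannot be placed on the x-axis.
    print("without pseudocount:", err)

# With a pseudocount of 1 the ratio becomes (0 + 1) / (3 + 1) = 0.25.
print("with pseudocount:", log2_fold_change(wt_reads, mutant_reads, pseudocount=1.0))
```

The pseudocount keeps the value finite, but its size is arbitrary and it dominates the result for genes with very few reads, so such genes may still need to be filtered out or treated separately.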
Another issue is that I don't know the total number of insertions for each dataset. For now I only use the insertion count within genes (which is the only thing I have data for), but this ignores all insertions outside the genes, which might explain some of the differences between my results and those from Agnes.
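The effect of the normalization total on the x-axis can be illustrated with a short sketch. All counts below are made up, and the genome-wide totals are purely hypothetical since I do not have them. It compares the log2 fold change obtained with in-gene totals against the one obtained with totals that include insertions outside genes.

```python
import math

def log2_fc(wt_reads, mut_reads, wt_total, mut_total):
    """log2 of (normalized mutant reads / normalized wt reads)."""
    return math.log2((mut_reads / mut_total) / (wt_reads / wt_total))

# Made-up summed reads per gene: name -> (wt, dNrp1).
genes = {"geneA": (200, 100), "geneB": (50, 400)}

# (wt total, dNrp1 total) insertion counts. Both pairs are hypothetical.
in_gene_totals = (20_000, 16_000)  # insertions inside genes only
genome_totals = (40_000, 64_000)   # including insertions outside genes

shifts = {}
for name, (wt, mut) in genes.items():
    fc_in_gene = log2_fc(wt, mut, *in_gene_totals)
    fc_genome = log2_fc(wt, mut, *genome_totals)
    shifts[name] = fc_genome - fc_in_gene
    print(name, fc_in_gene, fc_genome, shifts[name])
```

Changing the totals moves every gene by the same amount along the x-axis, because the totals enter the ratio as one factor shared by all genes in a library. A different normalization could therefore account for a uniform horizontal offset between the two plots, but not for differences that vary from gene to gene.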