You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
@@ -990,3 +990,119 @@ ggplot(data = asd_vs_c, aes(x = log2FoldChange, y = padj_trans, col = diffexpres
990
990
ggtitle("Differential Expression in Mouse Prefrontal Cortex")
991
991
```
992
992
993
+
994
+
## Running a gene set enrichment analysis
995
+
996
+
We can use a gene set enrichment analysis to explore biological processes might be affected by the genes that are differentially expressed in our comparison. We can also use this to figure out the types of processes genes on our gene list are involved in.
997
+
998
+
If you are using the SciServer environment, you can skip the **Creating the analysis functions** section below and go straight to the **Loading the data** section. The functions you need have already been created and are saved for your use on SciServer.
999
+
1000
+
### Creating the analysis functions
1001
+
1002
+
Before we get started with the actual analysis, we'll need to run a couple of commands to create a special function to do the gene set analysis. This function will allow us to do a complex set of analysis steps in a single line of code.
1003
+
1004
+
Before we can create our function, we need to install a couple of packages. Run the code below.
1005
+
1006
+
```{r}
1007
+
if (!require("BiocManager", quietly = TRUE))
1008
+
install.packages("BiocManager")
1009
+
1010
+
BiocManager::install("clusterProfiler")
1011
+
BiocManager::install("org.Mm.eg.db")
1012
+
```
1013
+
1014
+
You may have noticed that the installation commands for these two packages are different than the ones you used previously with the `tidyverse` and `ggrepel` packages. That's because the `clusterProfiler` and `org.Mm.eg.db` packages are saved in a different location (called a repository) than `tidyverse` and `ggrepel`. Both `clusterprofiler` and `org.Mm.eg.db` are part of the Bioconductor Project, which focuses on developing, maintaining, and distributing software specifically for biological and bioinformatics research.
1015
+
1016
+
Once these two packages have been installed, make sure you load them.
1017
+
1018
+
```{r}
1019
+
library("clusterProfiler")
1020
+
library("org.Mm.eg.db")
1021
+
```
1022
+
1023
+
Now we can make our function! The code below makes two functions. A _function_ in R is just a block of organized, reusable code that performs a specific task. The commands you have been using in your analysis (like `library`, `filter`, and `heatmap`, to name a few examples) are actually functions that someone else wrote and saved in packages for everyone to use.
1024
+
1025
+
The functions you're creating also perform special tasks. `runClusterProfiler` is going to pull information about the GO terms associated with each gene in your list. (Remember, GO terms are standardized names for molecular processes, biological processes, and cellular locations.) The function `getClusterProfilerGenes` will let you see which genes in your gene list belong to each GO term.
1026
+
1027
+
If this sounds confusing, be patient - it should become more clear as you use the functions.
Okay, that's all the prep work we need to do! Now let's move onto the actual analysis.
1051
+
1052
+
### Loading the data
1053
+
1054
+
For this analysis, we're using the differential expression dataset that you used to make your volcano plot. If it's loaded into your session, great! You don't need to do anything. If you need to load it again, here's the code and links from before. We're loading the data that compares gene expression between the prefrontal cortex and striatum in the control mice.
**Comparing gene expression between prefrontal cortex and striatum**
1071
+
1072
+
All mice: <https://genomicseducation.org/data/mouse_gutbrain_de_tissuetype.csv>
1073
+
1074
+
Only ASD mice: <https://genomicseducation.org/data/mouse_gutbrain_de_tissuetype_in_ASDmice.csv>
1075
+
1076
+
Only control mice: <https://genomicseducation.org/data/mouse_gutbrain_de_tissuetype_in_controlmice.csv>
1077
+
1078
+
### Filtering the data based on padj
1079
+
1080
+
In a gene set analysis, we're interested in identifying the biological pathways that are likely to be different based on the gene expression differences between our two compared groups. We run these kinds of analyses on the genes with the lowest padj values in the dataset (because these are the genes that we feel confident are differentially expressed).
1081
+
1082
+
We use `filter` to create this gene list. You can set "padj" to whatever threshold you like.
### Visualizing the biological pathway GO terms in the gene set
1089
+
1090
+
Now we're going to use the special functions you created earlier. First we want to use `runClusterProfiler`. This is the function that gathers all the GO terms associated with the genes in your gene list.
In the dot plot, the GO terms (mostly biological processes) are on the y-axis. The x-axis gives us an idea of how many genes in our gene list are found in each GO term category (GeneRatio is proportion of the differentially expressed genes that belong to a specific GO term or pathway). The dot plot also colors the dots based on the "padj" value. In this case, the red dots indicate lower (or more statistically certain) adjusted p-values.
1102
+
1103
+
The top GO term on the plot (which is the category with the greatest number of differentially expressed genes in it) is "Pathways of neurodegeneration - multiple diseases." How interesting! Clearly the expression of neurodegeneration-associated genes differ between the two parts of the brain, even in control mice. Let's take a closer look at what genes belong to this GO term category using the `getClusterProfilerGenes` function.
1104
+
1105
+
```{r}
1106
+
getClusterProfilerGenes(gene_clusters, "Pathways of neurodegeneration - multiple diseases")
1107
+
```
1108
+
There are a lot of genes in this category! At this point, researchers might pick a some genes from this list and go back to the MGI database to research it for future work.
Then we'll filter out the gene in which we're interested from each object. Let's take a look at gene ENSMUSG00000079516, which is the reg3a gene we previously looked up on MGI.
Then we'll filter out the gene in which we're interested from each object. Let's take a look at gene ENSMUSG00000079516, which is the reg3a gene we previously looked up on MGI.
0 commit comments