Commit 79f5a5d

Add harmonization
Added harmonization as an option to the single site stratigraphic plot.
1 parent 3016857 commit 79f5a5d

3 files changed

+575
-520
lines changed

simple_workflow.Rmd

Lines changed: 62 additions & 6 deletions
@@ -41,7 +41,7 @@ csl: 'https://bit.ly/3khj0ZL'
4141

4242
```{r setup, echo=FALSE}
4343
options(warn = -1)
44-
pacman::p_load(neotoma2, dplyr, ggplot2, sf, geojsonsf, leaflet, DT, readr, stringr, rioja)
44+
pacman::p_load(neotoma2, dplyr, ggplot2, sf, geojsonsf, leaflet, DT, readr, stringr, rioja, tidyr)
4545
```
4646

4747
## Introduction
@@ -325,7 +325,7 @@ The following call can take some time, but we've frozen the object as an RDS dat
325325
denmark_dl <- readRDS("data/dkDownload.RDS")
326326
```
327327

328-
Once we've downloaded, we now have information for each site about all the associated collection units, the datasets, and, for each dataset, all the samples associated with the datasets. To extract all the samples we can call:
328+
Once we've downloaded, we now have information for each site about all the associated collection units, the datasets, and, for each dataset, all the samples associated with the datasets. To extract the samples from all downloads we can call:
329329

330330
```{r allSamples}
331331
allSamp <- samples(denmark_dl)
@@ -505,11 +505,14 @@ ggplot(data = taxaplots, aes(x = sites, y = samples)) +
505505

506506
## Simple Analytics
507507

508-
### Stratigraphic Plotting
508+
### Stratigraphic Plotting {.tabset}
509509

510-
We can use packages like `rioja` to do stratigraphic plotting for a single record, but first we need to do some different data management. Although we could do harmonization again we're going to simply take the taxa at a single site and plot them in a stratigraphic diagram.
510+
To plot a stratigraphic diagram we are only interested in one site and one dataset. By looking at the summary of downloads we can see that Lake Solsø has two collection units that both have a pollen record. Let's look at the SOLSOE81 collection unit, which is the second download. We can get the samples from just that one collection unit by specifying that we want only the samples from the second download.
511511

512-
```{r stratiplot, message = FALSE}
512+
We can use packages like `rioja` to do stratigraphic plotting for a single record, but first we need to do some different data management. Although we could do harmonization again, we're going to simply take the taxa at a single site and plot them in a stratigraphic diagram. However, if you would like to plot multiple sites with harmonized taxa, we have provided examples of both approaches.
513+
514+
#### Raw Taxon
515+
```{r stratiplotraw, message = FALSE}
513516
# Get a particular site, in this case we are simply subsetting the
514517
# `denmark_dl` object to get Lake Solsø:
515518
plottingSite <- denmark_dl[[2]]
@@ -529,11 +532,42 @@ counts <- plottingSite %>%
529532
counts <- counts[, colSums(counts > 0.01, na.rm = TRUE) > 5]
530533
```
531534

535+
#### With Harmonization
536+
```{r stratiplotharm, message = FALSE}
537+
# Get a particular site, in this case we are simply subsetting the
538+
# `denmark_dl` object to get Lake Solsø:
539+
plottingSite <- denmark_dl[[2]]
540+
541+
# Select only pollen measured using NISP and convert to a "wide"
542+
# table, using proportions. The first column will be "age".
543+
# This turns our "long" table into a "wide" table:
544+
counts_harmonized <- plottingSite %>%
545+
samples() %>%
546+
toWide(ecologicalgroup = c("TRSH"),
547+
unit = c("NISP"),
548+
elementtypes = c("pollen"),
549+
groupby = "age",
550+
operation = "prop") %>%
551+
arrange(age) %>%
552+
pivot_longer(-age) %>%
553+
inner_join(translation, by = c("name" = "variablename")) %>%
554+
dplyr::select(!c("name", taxonid)) %>%
555+
group_by(harmonizedname, age) %>%
556+
summarise(value = sum(value), .groups='keep')%>%
557+
pivot_wider(names_from = harmonizedname, values_from = value)
558+
559+
counts_harmonized <- counts_harmonized[, colSums(counts_harmonized > 0.01, na.rm = TRUE) > 5]
560+
```
561+
562+
### {.tabset}
563+
532564
Hopefully the code is pretty straightforward. The `toWide()` function provides you with significant control over the taxa, units and other elements of your data before you get them into the wide matrix (`depth` by `taxon`) that most statistical tools such as the `vegan` package or `rioja` use.
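As a rough illustration of the harmonization step on a wide table, the sketch below uses toy data rather than the Neotoma download. The objects `wide_toy` and `translation_toy` and all of their values are hypothetical. The sketch assumes only that the translation table has `variablename` and `harmonizedname` columns, as the join in the harmonization chunk suggests.

```r
library(dplyr)
library(tidyr)

# Toy "wide" table of proportions (hypothetical values): the first
# column is age, the remaining columns are taxa.
wide_toy <- tibble::tibble(
  age = c(100, 200, 300),
  `Pinus sylvestris` = c(0.2, 0.3, 0.1),
  `Pinus undiff.` = c(0.1, 0.1, 0.2),
  `Betula` = c(0.7, 0.6, 0.7)
)

# Hypothetical translation table mapping raw taxon names to a
# harmonized name (an assumption about the table's layout).
translation_toy <- tibble::tibble(
  variablename = c("Pinus sylvestris", "Pinus undiff.", "Betula"),
  harmonizedname = c("Pinus", "Pinus", "Betula")
)

harmonized_toy <- wide_toy %>%
  pivot_longer(-age) %>%                       # wide -> long
  inner_join(translation_toy,
             by = c("name" = "variablename")) %>%
  group_by(harmonizedname, age) %>%
  summarise(value = sum(value), .groups = "drop") %>%  # sum synonyms
  pivot_wider(names_from = harmonizedname,
              values_from = value) %>%         # long -> wide
  arrange(age)

print(harmonized_toy)
```

Here the two *Pinus* columns collapse into a single `Pinus` column whose values are the per-age sums. The sketch uses `.groups = "drop"` so that the result is an ungrouped tibble; whether that is preferable to `.groups = 'keep'` for the real data is a judgment call.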
533565

534566
To plot the data we can use `rioja`'s `strat.plot()`, sorting the taxa using weighted averaging scores (`wa.order`). I've also added a CONISS plot to the edge of the plot, to show how the new *wide* data frame works with distance metric functions.
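To make the link between the wide table and the clustering step concrete, here is a minimal sketch on toy data. The data frame `wide_toy` and its values are hypothetical. The sketch assumes that distances should be computed on the taxon columns only, so it drops `age` first; that is an assumption of this example, not a description of the chunks in this commit.

```r
library(rioja)

# Toy wide table (hypothetical values): one row per sample,
# first column is age, remaining columns are taxon proportions.
wide_toy <- data.frame(
  age    = c(100, 200, 300, 400, 500),
  Betula = c(0.70, 0.65, 0.40, 0.35, 0.30),
  Pinus  = c(0.20, 0.25, 0.40, 0.45, 0.50),
  Alnus  = c(0.10, 0.10, 0.20, 0.20, 0.20)
)

# Assumption: cluster on taxa only, so the age column is removed
# before the square-root transform and distance calculation.
taxa_only <- wide_toy[, -1]
d <- dist(sqrt(taxa_only))

# Stratigraphically constrained clustering (CONISS), using the same
# call shape as the plotting chunks.
clust_toy <- rioja::chclust(d, method = "coniss")

# n samples give n * (n - 1) / 2 pairwise distances.
length(d)
```

Because the rows are ordered by age, the constrained clustering can only merge samples that are adjacent in the sequence, which is what makes the resulting zones meaningful on a stratigraphic plot.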
535567

536-
```{r plotStrigraph, message=FALSE, warning=FALSE}
568+
#### Raw Taxon
569+
570+
```{r plotStrigraphraw, message=FALSE, warning=FALSE, out.width='90%'}
537571
# Perform constrained clustering:
538572
clust <- rioja::chclust(dist(sqrt(counts)),
539573
method = "coniss")
@@ -552,6 +586,28 @@ plot <- rioja::strat.plot(counts[,-1] * 100, yvar = counts$age,
552586
rioja::addClustZone(plot, clust, 4, col = "red")
553587
```
554588

589+
#### With Harmonization
590+
```{r plotStrigraphharm, message=FALSE, warning=FALSE, out.width='90%'}
591+
# Perform constrained clustering:
592+
clust <- rioja::chclust(dist(sqrt(counts_harmonized)),
593+
method = "coniss")
594+
595+
# Plot the stratigraphic plot, converting proportions to percentages:
596+
plot <- rioja::strat.plot(counts_harmonized[,-1] * 100, yvar = counts_harmonized$age,
597+
title = denmark_dl[[1]]$sitename,
598+
ylabel = "Calibrated Years BP",
599+
xlabel = "Pollen (% of Trees and Shrubs)",
600+
srt.xlabel = 70,
601+
y.rev = TRUE,
602+
clust = clust,
603+
wa.order = "topleft",
604+
scale.percent = TRUE)
605+
606+
rioja::addClustZone(plot, clust, 4, col = "red")
607+
```
608+
609+
###
610+
555611
## Conclusion
556612

557613
So, we've done a lot in this example. We've (1) searched for sites using site names and geographic parameters, (2) filtered results using temporal and spatial parameters, (3) obtained sample information for the selected datasets and (4) performed basic analysis including the use of climate data from rasters. Hopefully you can use these examples as templates for your own future work, or as a building block for something new and cool!

simple_workflow.html

Lines changed: 429 additions & 490 deletions
Large diffs are not rendered by default.

0 commit comments
