|
| 1 | +--- |
| 2 | +title: "Logic-Checking BacDive Datasets" |
| 3 | +author: "Katrin Leinweber" |
| 4 | +date: "`r Sys.Date()`" |
| 5 | +output: rmarkdown::html_vignette |
| 6 | +vignette: > |
| 7 | + %\VignetteIndexEntry{Logic-Checking BacDive Datasets} |
| 8 | + %\VignetteEncoding{UTF-8} |
| 9 | + %\VignetteEngine{knitr::rmarkdown} |
| 10 | +editor_options: |
| 11 | + chunk_output_type: inline |
| 12 | +bibliography: BacDive.bib |
| 13 | +--- |
| 14 | + |
| 15 | +```{r setup, include = FALSE} |
| 16 | +knitr::opts_chunk$set( |
| 17 | + collapse = TRUE, |
| 18 | + comment = "#>" |
| 19 | +) |
| 20 | +``` |
| 21 | + |
| 22 | +### Example of a data inconsistency |
| 23 | + |
| 24 | +Just as the correctness of data analysis code should be tested automatically, the |
| 25 | +consistency of data should be evaluated and monitored as well. Using [BacDive's advanced search](https://bacdive.dsmz.de/AdvSearch) |
| 26 | +and [BacDiveR's `retrieve_search_results()`](https://tibhannover.github.io/BacDiveR/reference/retrieve_search_results.html), |
| 27 | +several examples of geographic inconsistencies have been found. Presumably due to |
| 28 | +an overly strict location-to-country-to-continent mapping, several samples collected |
| 29 | +from seas neighbouring Russia (like the [Sea of Japan](https://bacdive.dsmz.de/advsearch?site=advsearch&searchparams%5B20%5D%5Bcontenttype%5D=text&searchparams%5B20%5D%5Btypecontent%5D=contains&searchparams%5B20%5D%5Bsearchterm%5D=Sea+of+Japan&searchparams%5B100%5D%5Bcontenttype%5D=text&searchparams%5B100%5D%5Btypecontent%5D=contains&searchparams%5B100%5D%5Bsearchterm%5D=&searchparams%5B17%5D%5Bsearchterm%5D=Europe&advsearch=search)) |
| 30 | +were assigned to Europe. |
| 31 | + |
| 32 | + |
| 33 | + |
| 34 | +While one may debate where exactly the border between Asia and Europe runs through Russia, |
| 35 | +it is clear that its eastern shoreline is located well within Asia. These and |
| 36 | +other datasets with East Russian locations have been reported to the BacDive team, |
| 37 | +and some of them were corrected in [BacDive's 04.07.2018 release](https://bacdive.dsmz.de/news). |
| 38 | + |
| 39 | +```{r data} |
| 40 | +library(BacDiveR) |
| 41 | + |
| 42 | +inconsistent_data <- retrieve_search_results( |
| 43 | + "https://bacdive.dsmz.de/advsearch?advsearch=search&site=advsearch&searchparams[20][contenttype]=text&searchparams[20][typecontent]=contains&searchparams[20][searchterm]=Sea+of+Japan&searchparams[17][searchterm]=Europe" |
| 44 | + ) |
| 45 | +``` |
| 46 | + |
| 47 | +As long as this specific inconsistency is not fixed, the above should display: |
| 48 | +`Data download in progress for BacDive-IDs: 131115 139987`. |
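
The search URL above can also be assembled from its parts, which makes it easier to reuse the same check for other seas or continents. The following is only a sketch: `build_advsearch_url()` is a hypothetical helper, not part of BacDiveR, and the reading of field `20` as the location text and field `17` as the continent is an assumption inferred from the URL used in this vignette.

```r
# Hypothetical helper (not part of BacDiveR): rebuild the advanced-search URL
# used above from a location text and a continent name.
# Assumption: searchparams[20] is the location text field and
# searchparams[17] is the continent field, as suggested by the URL above.
build_advsearch_url <- function(location, continent) {
  # The URL above encodes spaces in the search term as "+"
  location_term <- gsub(" ", "+", location, fixed = TRUE)
  paste0(
    "https://bacdive.dsmz.de/advsearch?advsearch=search&site=advsearch",
    "&searchparams[20][contenttype]=text",
    "&searchparams[20][typecontent]=contains",
    "&searchparams[20][searchterm]=", location_term,
    "&searchparams[17][searchterm]=", continent
  )
}

sea_of_japan_url <- build_advsearch_url("Sea of Japan", "Europe")
```

The resulting string could then be passed to `retrieve_search_results()` in place of the literal URL.
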
| 49 | + |
| 50 | + |
| 51 | +### How to test datasets |
| 52 | + |
| 53 | +If a BacDive user finds an inconsistency within the datasets they use, BacDiveR's |
| 54 | +`retrieve_search_results()` can be used to construct a test case for such a problem. |
| 55 | +In the following example, the test fails as long as BacDive contains datasets with |
| 56 | +the above-described discrepancy between the `geo_loc_name` and `continent` fields. |
| 57 | + |
| 58 | +```{r test, error=TRUE} |
| 59 | +library(testthat) |
| 60 | + |
| 61 | +test_that("No inconsistent datasets exist", { |
| 62 | + expect_null(inconsistent_data) |
| 63 | +}) |
| 64 | +``` |
| 65 | + |
| 66 | +Once the inconsistency is corrected in BacDive, the advanced search no longer |
| 67 | +returns any results, and the above test passes. It can thus be used to monitor the |
| 68 | +resolution of such a problem after [reporting](https://bacdive.dsmz.de/?site=contact) |
| 69 | +it. Furthermore, the user is alerted (by the test failing again) in case new |
| 70 | +datasets appear in BacDive with the same inconsistency. |
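
For long-term monitoring, it may help if a failing test also names the affected datasets. The sketch below is one possible way to do that, under two assumptions: that the search result is `NULL` when nothing is found (as the test above implies), and that a non-empty result is a list named by BacDive-ID. `affected_ids()` and `expect_no_inconsistencies()` are hypothetical helpers, and the stand-in values replace a real download so the example runs offline.

```r
library(testthat)

# Hypothetical helper: IDs of the affected datasets.
# Assumption: a non-empty search result is a list named by BacDive-ID.
affected_ids <- function(search_result) {
  if (is.null(search_result)) character(0) else names(search_result)
}

# Hypothetical wrapper around expect_null() that adds the affected IDs
# to the failure message via the `info` argument.
expect_no_inconsistencies <- function(search_result) {
  expect_null(
    search_result,
    info = paste(
      "Affected BacDive-IDs:",
      paste(affected_ids(search_result), collapse = ", ")
    )
  )
}

# Stand-in values instead of a real retrieve_search_results() call
resolved <- NULL
unresolved <- list("131115" = list(), "139987" = list())

test_that("No inconsistent datasets exist", {
  expect_no_inconsistencies(resolved)
})

# IDs that a failing test would report for the unresolved case
affected_ids(unresolved)
```

In a real test file, `inconsistent_data` from the chunk above would take the place of the stand-in values.
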
| 71 | + |
| 72 | +### References |
| 73 | + |
| 74 | +See [testthat.r-lib.org](https://testthat.r-lib.org/) and the |
| 75 | +[related "R Packages" chapter](http://r-pkgs.had.co.nz/tests.html) to learn |
| 76 | +more about testing in R [@TT; @T]. |