
Commit d3db77d

Explain use-case of dataset testing
1 parent 9d9b2e5 commit d3db77d


7 files changed (+124, -4 lines)

DESCRIPTION

Lines changed: 1 addition & 1 deletion
@@ -1,6 +1,6 @@
 Package: BacDiveR
 Title: A Programmatic Interface For BacDive, The DSMZ's Bacterial Diversity Metadatabase
-Version: 0.5.1
+Version: 0.6.0
 Authors@R: person("Katrin", "Leinweber", email = "[email protected]",
     role = c("aut", "cre"),
     comment = c(ORCID = "0000-0001-5135-5758"))

NEWS.md

Lines changed: 11 additions & 0 deletions
@@ -11,6 +11,17 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 ### Fixed
 ### Security
 
+## BacDiveR 0.6.0
+
+### Added
+
+- The vignette [Logic-Checking BacDive Datasets](https://tibhannover.github.io/BacDiveR/articles/logic-checking-bacdive-datasets.html)
+
+### Changed
+
+- `retrieve_search_results()` now returns `NULL` when no results are found, in
+  order to ease integration of datasets into `testthat` tests.
+
 ## BacDiveR 0.5.1
 
 ### Fixed

R/retrieve_search_results.R

Lines changed: 5 additions & 3 deletions
@@ -19,8 +19,10 @@ retrieve_search_results <- function(queryURL)
   if (!grepl(pattern = paste0("$", download_param), x = queryURL))
     queryURL <- paste0(queryURL, download_param)
 
-  result_IDs <-
-    strsplit(x = RCurl::getURL(queryURL), split = "\\n")[[1]]
+  payload <- RCurl::getURL(queryURL)
 
-  aggregate_datasets(result_IDs, from_IDs = TRUE)
+  if (grepl("^[[:digit:]]", payload))
+    aggregate_datasets(strsplit(x = payload, split = "\\n")[[1]], from_IDs = TRUE)
+  else if (grepl("^<!DOCTYPE", payload))
+    NULL # needed for logic-checking datasets, see vignette
 }
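The response check introduced in `retrieve_search_results()` above can be exercised without network access. The following is a minimal sketch, not BacDiveR code. `parse_search_payload()` is a hypothetical name, and the sketch returns the raw ID vector instead of calling `aggregate_datasets()`. It assumes, as the committed branches imply, that BacDive answers with newline-separated numeric IDs when the search has hits and with an HTML page starting with `<!DOCTYPE` when it has none.

```r
# Hypothetical, network-free sketch of the branching added to
# retrieve_search_results(); it returns IDs instead of calling
# aggregate_datasets(), which would download every dataset.
parse_search_payload <- function(payload) {
  if (grepl("^[[:digit:]]", payload)) {
    # Assumed success format: one numeric BacDive ID per line.
    ids <- strsplit(x = payload, split = "\\n")[[1]]
    ids[nzchar(ids)]  # drop empty lines, if any
  } else if (grepl("^<!DOCTYPE", payload)) {
    NULL  # an HTML page instead of an ID list means: no search results
  } else {
    # Deviation from the committed code, which silently returns NULL here too.
    stop("Unexpected response from BacDive: ", substr(payload, 1, 40),
         call. = FALSE)
  }
}

# Usage with example payloads:
parse_search_payload("131115\n139987")                # c("131115", "139987")
parse_search_payload("<!DOCTYPE html><html></html>")  # NULL
```

Unlike the committed version, which falls through to an invisible `NULL` for any other response, this sketch raises an error, so that an unexpected server reply (for example an error page) is not mistaken for "no results".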

tests/testthat/test-retrieve_search_results.R

Lines changed: 8 additions & 0 deletions
@@ -12,3 +12,11 @@ test_that("downloading a dataset from an 'advanced search' URL works", {
   expect_equal(Millers_strains[[1]], "Borrelia mayonii")
   expect_equal(Millers_strains[[2]], "Bacillus wiedmannii")
 })
+
+
+test_that("Inconsistent datasets get corrected", {
+  inconsistent_data <- retrieve_search_results(
+    "https://bacdive.dsmz.de/advsearch?advsearch=search&site=advsearch&searchparams[20][contenttype]=text&searchparams[20][typecontent]=contains&searchparams[20][searchterm]=Sea+of+Japan&searchparams[17][searchterm]=Europe")
+
+  expect_false(is.null(inconsistent_data))
+})
59.8 KB

vignettes/BacDive.bib

Lines changed: 23 additions & 0 deletions
@@ -22,3 +22,26 @@ @article{BD16
   doi = {10.1093/nar/gkv983},
   URL = {https://academic.oup.com/nar/article/44/D1/D581/2503137}
 }
+
+@Article{TT,
+  author = {Hadley Wickham},
+  title = {testthat: Get Started with Testing},
+  journal = {The R Journal},
+  year = {2011},
+  volume = {3},
+  pages = {5--10},
+  url = {https://journal.r-project.org/archive/2011-1/RJournal_2011-1_Wickham.pdf},
+}
+
+@book{T,
+  author = {Hadley Wickham},
+  langid = {english},
+  location = {{Sebastopol, CA}},
+  title = {R {{Packages}}: {{Organize}}, {{Test}}, {{Document}}, and {{Share Your Code}}},
+  edition = {1st edition},
+  isbn = {978-1-4919-1059-7},
+  url = {http://r-pkgs.had.co.nz/},
+  publisher = {{O'Reilly Media}},
+  date = {2015-04-13}
+}
+
Lines changed: 76 additions & 0 deletions
@@ -0,0 +1,76 @@
+---
+title: "Logic-Checking BacDive Datasets"
+author: "Katrin Leinweber"
+date: "`r Sys.Date()`"
+output: rmarkdown::html_vignette
+vignette: >
+  %\VignetteIndexEntry{Logic-Checking BacDive Datasets}
+  %\VignetteEncoding{UTF-8}
+  %\VignetteEngine{knitr::rmarkdown}
+editor_options:
+  chunk_output_type: inline
+bibliography: BacDive.bib
+---
+
+```{r setup, include = FALSE}
+knitr::opts_chunk$set(
+  collapse = TRUE,
+  comment = "#>"
+)
+```
+
+### Example of a data inconsistency
+
+Just as the correctness of data analysis code should be tested automatically, the
+consistency of data should be evaluated and monitored as well. Using [BacDive's advanced search](https://bacdive.dsmz.de/AdvSearch)
+and [BacDiveR's `retrieve_search_results()`](https://tibhannover.github.io/BacDiveR/reference/retrieve_search_results.html),
+several examples of geographic inconsistencies have been found. Presumably due to
+an overly strict location-to-country-to-continent mapping, several samples collected
+from seas neighbouring Russia (like the [Sea of Japan](https://bacdive.dsmz.de/advsearch?site=advsearch&searchparams%5B20%5D%5Bcontenttype%5D=text&searchparams%5B20%5D%5Btypecontent%5D=contains&searchparams%5B20%5D%5Bsearchterm%5D=Sea+of+Japan&searchparams%5B100%5D%5Bcontenttype%5D=text&searchparams%5B100%5D%5Btypecontent%5D=contains&searchparams%5B100%5D%5Bsearchterm%5D=&searchparams%5B17%5D%5Bsearchterm%5D=Europe&advsearch=search))
+were assigned to Europe.
+
+![Two datasets with a geo-logic fault (pun intended)](BacDive-geo-logic-fault.png)
+
+While one may debate where exactly the border between Asia and Europe runs through Russia,
+it is clear that its eastern shoreline is located well within Asia. These and
+other datasets with East Russian locations have been reported to the BacDive team,
+and some of them were corrected in [BacDive's 04.07.2018 release](https://bacdive.dsmz.de/news).
+
+```{r data}
+library(BacDiveR)
+
+inconsistent_data <- retrieve_search_results(
+  "https://bacdive.dsmz.de/advsearch?advsearch=search&site=advsearch&searchparams[20][contenttype]=text&searchparams[20][typecontent]=contains&searchparams[20][searchterm]=Sea+of+Japan&searchparams[17][searchterm]=Europe"
+)
+```
+
+As long as this specific inconsistency is not fixed, the code above should display:
+`Data download in progress for BacDive-IDs: 131115 139987`.
+
+
+### How to test datasets
+
+If a BacDive user finds an inconsistency within the datasets they use, BacDiveR's
+`retrieve_search_results()` can be used to construct a test case for such a problem.
+In the following example, the test fails as long as BacDive contains datasets with
+the above-described discrepancy between the `geo_loc_name` and `continent` fields.
+
+```{r test, error=TRUE}
+library(testthat)
+
+test_that("No inconsistent datasets exist", {
+  expect_null(inconsistent_data)
+})
+```
+
+Once the inconsistency is corrected in BacDive, the advanced search returns no
+results anymore, and the above test passes. It can thus be used to monitor the
+resolution of such a problem after [reporting](https://bacdive.dsmz.de/?site=contact)
+it. Furthermore, the user is alerted (by the test failing again) in case new
+datasets with the same inconsistency appear in BacDive.
+
+### References
+
+See [testthat.R-lib.org](https://testthat.r-lib.org/) and the
+[related "R Packages" chapter](http://r-pkgs.had.co.nz/tests.html) to learn
+more about testing in R [@TT; @T].
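The vignette's test relies on a live query. To see the pass/fail behaviour it describes without contacting BacDive, the query results can be replaced by stand-ins. The sketch below is hypothetical: `assert_no_hits()` and `run_check()` are illustrative names, the list standing in for two inconsistent datasets assumes one element per dataset, and base R's `tryCatch()` plays the role that `testthat` plays in the vignette.

```r
# Hypothetical check mirroring expect_null(inconsistent_data): it signals an
# error while a consistency query still returns datasets.
assert_no_hits <- function(result, label = "consistency query") {
  if (!is.null(result)) {
    stop(sprintf("%s still returns %d dataset(s)", label, length(result)),
         call. = FALSE)
  }
  invisible(TRUE)
}

# Report "pass" or "fail: <reason>" instead of aborting, like a test runner.
run_check <- function(result) {
  tryCatch({
    assert_no_hits(result)
    "pass"
  }, error = function(e) paste("fail:", conditionMessage(e)))
}

# Stand-ins for retrieve_search_results() output (not real BacDive data):
still_inconsistent <- list("131115" = list(), "139987" = list())
resolved <- NULL  # what retrieve_search_results() returns when nothing matches

run_check(still_inconsistent)  # "fail: consistency query still returns 2 dataset(s)"
run_check(resolved)            # "pass"
```

Run on a schedule, the same check reports "fail" while the faulty datasets exist, "pass" once BacDive corrects them, and "fail" again if new datasets with the same fault appear.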
