-
Notifications
You must be signed in to change notification settings - Fork 14
Examples 2 Tests #60
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Examples 2 Tests #60
Conversation
one-shotted using Claude Opus 4.5 with the following prompt:
Create `extractDatasetFromJASPfile` function that allows extracting datataset as a data.frame from saved JASP files.
There is an example JASP file in `tests/JASPFiles/debug-descriptives.JASP`
You will probably need to do something like this:
#' \itemize{
#' \item Unpacking the .jasp archive (which is a zip file)
#' \item Reading the internal.sqlite database
#' \item Converting Column_N_DBL and Column_N_INT columns to properly named columns
#' \item Mapping factor levels from the Labels table to create proper R factors
#' \item Handling both explicitly nominal columns and columns with label mappings
#' }
you might need packages like:
'DBI','RSQLite', and 'archive'
The extracted file should exactly match the `debug-descriptives.csv` file
Once you are done, add a unit test that verifies the extraction.
Be aware that the file continsNAs, NaNs, Infs, different types of strings and encodings etc....
Introduces an encodedDataset argument to preloadDataset, runAnalysis, and initAnalysisRuntime functions. This allows skipping column name repair and type detection when the dataset is already encoded and typed, improving flexibility for pre-processed datasets.
Introduces makeTestsFromExamples and related internal functions to generate unit tests from JASP example files. Refactors extractDatasetFromJASPfile to extractDatasetFromJASPFile for consistency, updates documentation and test files accordingly, and exports new functions in NAMESPACE. Also improves analysisOptions to better handle file paths and .jasp file detection.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
This pull request introduces comprehensive improvements to jaspTools focused on enhancing reproducibility, testing workflows, and compatibility with JASP files. The changes enable developers to extract datasets and options directly from .jasp files, encode them for reproducible testing, and automatically generate test files from JASP examples.
Key Changes:
- New dataset and option encoding utilities for standardized, reproducible testing with generic variable names
- JASP file dataset extraction capabilities through SQLite database parsing
- Automated test generation from JASP example files
- Improved option structure handling for nested types metadata
- Comprehensive developer documentation
Reviewed changes
Copilot reviewed 25 out of 27 changed files in this pull request and generated 15 comments.
Show a summary per file
| File | Description |
|---|---|
tests/testthat/test-extractDatasetFromJASPfile.R |
Comprehensive tests for dataset extraction from JASP files covering various data types (numeric, categorical, binary, unicode, infinity, NaN) |
tests/testthat/test-encodeOptionsAndDataset.R |
Tests for the new encoding functionality ensuring proper variable-type mapping and dataset transformation |
tests/testthat/test-analysisOptions.R |
Tests verifying the flattening of complex nested option structures with types metadata |
tests/testthat.R |
Standard testthat entry point for running package tests |
tests/JASPFiles/debug-descriptives.csv |
Test data file with comprehensive coverage of edge cases (100 rows with 31 columns including special values) |
R/test-generator.R |
New functionality for automatically generating test files from JASP examples with makeTestsFromExamples() |
R/run.R |
Enhanced runAnalysis() and initAnalysisRuntime() to support encoded datasets via new encodedDataset parameter |
R/options.R |
Improved analysisOptions() with cross-platform path handling and refactored fixOptionsForVariableTypes() for recursive flattening |
R/dataset.R |
New functions: extractDatasetFromJASPFile(), encodeOptionsAndDataset(), and supporting utilities for encoding/extraction workflows |
man/*.Rd |
Documentation files for new exported functions and updated signatures |
.github/copilot-instructions.md |
Comprehensive developer guide covering architecture, workflows, conventions, and common pitfalls |
NAMESPACE |
Exports new public functions: extractDatasetFromJASPFile, encodeOptionsAndDataset, makeTestsFromExamples |
DESCRIPTION |
Version bump to 0.20.1, added contributor, and suggested dependencies (DBI, RSQLite) for JASP file processing |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| utils::unzip(jaspFile, exdir = tempDir) | ||
|
|
||
| # Check for internal.sqlite | ||
|
|
Copilot
AI
Dec 22, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Line 60 has an empty comment line which serves no purpose and should be removed to keep the code clean.
| @@ -0,0 +1,197 @@ | |||
| context("extractDatasetFromJASPFile") | |||
Copilot
AI
Dec 22, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The test file name has an inconsistent capitalization: "test-extractDatasetFromJASPfile.R" uses lowercase "file", but the function name is "extractDatasetFromJASPFile" with uppercase "F". This inconsistency could cause confusion for developers looking for the test file.
| quiet = FALSE, | ||
| makeTests = FALSE | ||
| makeTests = FALSE, | ||
| encodedDataset = FALSE |
Copilot
AI
Dec 22, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The new encodedDataset parameter is added without documentation. The @param section should include documentation explaining what this parameter does and when it should be set to TRUE.
| runAnalysis( | ||
| name, | ||
| dataset, | ||
| dataset = NULL, |
Copilot
AI
Dec 22, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The default value for the dataset parameter is changed to NULL, but there's no visible documentation update in the @param section or function body explaining how NULL datasets are handled. The documentation should be updated to clarify this behavior.
| # Extract the JASP file (it's a zip archive) | ||
| utils::unzip(jaspFile, exdir = tempDir) | ||
|
|
Copilot
AI
Dec 22, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
extractDatasetFromJASPFile unzips a user-specified .jasp archive directly into a temporary directory via utils::unzip(jaspFile, exdir = tempDir) without validating or constraining the archive’s internal paths. A malicious .jasp file can include entries with .. or absolute paths, causing files outside tempDir (e.g., .Rprofile, project sources) to be overwritten when extracted, which can lead to arbitrary code execution the next time R or the project is loaded. To fix this, avoid raw unzip on untrusted archives: either use an API that enforces safe extraction or explicitly inspect archive member paths and refuse any that escape the intended directory before extracting and reading internal.sqlite.
| # Extract the JASP file (it's a zip archive) | |
| utils::unzip(jaspFile, exdir = tempDir) | |
| # Safely inspect and extract the JASP file (it's a zip archive) | |
| zipContents <- utils::unzip(jaspFile, list = TRUE) | |
| entryNames <- zipContents$Name | |
| # Basic validation of archive member paths to prevent path traversal | |
| isAbsolute <- grepl("^(/|\\\\|[A-Za-z]:|~)", entryNames) | |
| hasDotDot <- grepl("(^|/|\\\\)\\.\\.($|/|\\\\)", entryNames) | |
| if (any(isAbsolute | hasDotDot)) { | |
| stop("The JASP file contains invalid paths (absolute paths or '..' components). ", | |
| "Refusing to extract for security reasons.") | |
| } | |
| # Ensure that the normalized target paths stay within tempDir | |
| baseDir <- normalizePath(tempDir, winslash = "/", mustWork = TRUE) | |
| targetPaths <- file.path(tempDir, entryNames) | |
| normalizedTargets <- normalizePath(targetPaths, winslash = "/", mustWork = FALSE) | |
| escapesTempDir <- !startsWith(normalizedTargets, paste0(baseDir, "/")) | |
| if (any(escapesTempDir, na.rm = TRUE)) { | |
| stop("The JASP file contains entries that would extract outside the temporary directory. ", | |
| "Refusing to extract for security reasons.") | |
| } | |
| # Extract only the validated entries | |
| utils::unzip(jaspFile, exdir = tempDir, files = entryNames) |
| lines <- character(0) | ||
|
|
||
| # Header | ||
| lines <- c(lines, paste0('context("Example: ', baseName, '")')) |
Copilot
AI
Dec 22, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
generateExampleTestFileContent interpolates baseName (derived from the .jasp filename) directly into an R string literal in context("Example: ...") without escaping. An attacker who provides a .jasp file whose basename contains quotes or R code (e.g., evil"); system("rm -rf /"); #) can cause the generated test file to contain and execute arbitrary R code when tests are run. To mitigate this, treat baseName as data by escaping it for use inside string literals (e.g., via encodeString/appropriate quoting) or pass it as a symbol/variable instead of concatenating it into raw code.
| lines <- c(lines, paste0('context("Example: ', baseName, '")')) | |
| contextTitle <- paste("Example:", baseName) | |
| safeContextTitle <- encodeString(contextTitle, quote = '"') | |
| lines <- c(lines, paste0("context(", safeContextTitle, ")")) |
| lines <- c(lines, paste0(' jaspFile <- testthat::test_path("..", "..", "examples", "', jaspFileName, '")')) | ||
|
|
||
| # Generate appropriate options extraction based on number of analyses | ||
| if (totalAnalyses == 1) { | ||
| # Single analysis: analysisOptions returns options directly (not in a list) | ||
| lines <- c(lines, " opts <- jaspTools::analysisOptions(jaspFile)") | ||
| } else { | ||
| # Multiple analyses: analysisOptions returns a list | ||
| lines <- c(lines, paste0(" opts <- jaspTools::analysisOptions(jaspFile)[[", analysisIndex, "]]")) | ||
| } | ||
|
|
||
| lines <- c(lines, " dataset <- jaspTools::extractDatasetFromJASPFile(jaspFile)") | ||
| lines <- c(lines, "") | ||
|
|
||
| # Encode and run | ||
| lines <- c(lines, " # Encode and run analysis") | ||
| lines <- c(lines, " encoded <- jaspTools:::encodeOptionsAndDataset(opts, dataset)") | ||
| lines <- c(lines, " set.seed(1)") | ||
| lines <- c(lines, paste0(' results <- jaspTools::runAnalysis("', analysisName, '", encoded$dataset, encoded$options, encodedDataset = TRUE)')) |
Copilot
AI
Dec 22, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
generateExampleTestBlock embeds both jaspFileName and analysisName—values read from the .jasp file—directly into R string literals when constructing the test code (test_path(..., "<name>") and runAnalysis("<name>", ...)) without any escaping. A crafted .jasp whose filename or analysis name contains embedded quotes and R code can cause the generated test file to include attacker-controlled R statements that execute when tests run. To prevent this, ensure these values are safely escaped or quoted for inclusion in R strings (or passed as data rather than concatenated into source code) before writing out the test blocks.
| lines <- c(lines, paste0(' jaspFile <- testthat::test_path("..", "..", "examples", "', jaspFileName, '")')) | ||
|
|
||
| # Generate appropriate options extraction based on number of analyses | ||
| if (totalAnalyses == 1) { | ||
| # Single analysis: analysisOptions returns options directly (not in a list) | ||
| lines <- c(lines, " opts <- jaspTools::analysisOptions(jaspFile)") | ||
| } else { | ||
| # Multiple analyses: analysisOptions returns a list | ||
| lines <- c(lines, paste0(" opts <- jaspTools::analysisOptions(jaspFile)[[", analysisIndex, "]]")) | ||
| } | ||
|
|
||
| lines <- c(lines, " dataset <- jaspTools::extractDatasetFromJASPFile(jaspFile)") | ||
| lines <- c(lines, "") | ||
|
|
||
| # Encode and run | ||
| lines <- c(lines, " # Encode and run analysis") | ||
| lines <- c(lines, " encoded <- jaspTools:::encodeOptionsAndDataset(opts, dataset)") | ||
| lines <- c(lines, " set.seed(1)") | ||
| lines <- c(lines, paste0(' results <- jaspTools::runAnalysis("', analysisName, '", encoded$dataset, encoded$options, encodedDataset = TRUE)')) |
Copilot
AI
Dec 22, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
generateExampleTestBlockBasic has the same pattern as generateExampleTestBlock, interpolating unescaped jaspFileName and analysisName from the .jasp file into R string literals for test_path(...) and runAnalysis("..."). A malicious .jasp with specially crafted filenames or analysis names can therefore inject arbitrary R code into the generated fallback test file, which will be executed when the test suite runs. As above, these values should be escaped for safe inclusion in string literals (or handled as data rather than assembled into raw code) before writing the test code.
|
btw, I get tests errors even when running the check locally with the master branch, not really sure what's wrong there |
This pull request introduces several improvements and new features to jaspTools, focusing on enhanced documentation, more robust option and dataset encoding for analysis reproducibility, and improved compatibility with JASP files. The changes include a comprehensive development guide, new encoding utilities, improved handling of option structures, and support for extracting datasets directly from JASP files.
Documentation and Developer Guidance:
.github/copilot-instructions.mdcovering jaspTools architecture, workflows, conventions, and common pitfalls for new and existing contributors.Encoding and Dataset Utilities:
encodeOptionsAndDataset,encodeOptionsWithMap, andcreateEncodedDatasetto encode options and datasets, ensuring variable names and types are handled generically and reproducibly for testing and analysis. [1] [2] [3] [4]extractDatasetFromJASPFile, making it easier to use real analysis data in tests and debugging. [1] [2]Option Structure and Compatibility Improvements:
fixOptionsForVariableTypes()to recursively handle complex nested options with.typesmetadata, improving compatibility with JASP’s analysis option formats.analysisOptions()to correctly prioritize .jasp file detection and path normalization, enhancing cross-platform compatibility and robustness. [1] [2]Testing and Analysis Execution:
runAnalysis()and related runtime initialization to support encoded datasets, enabling more controlled and reproducible analysis runs for testing. [1] [2] [3]Package Metadata:
DESCRIPTION. Also updatedImportsandSuggeststo include dependencies for new features. [1] [2]These changes collectively improve the usability, reliability, and developer experience of jaspTools, especially for module development, testing, and reproducibility.
See jasp-stats/jaspMetaAnalysis#394 for an example application