Skip to content

Conversation

@FBartos
Copy link
Contributor

@FBartos FBartos commented Dec 22, 2025

This pull request introduces several improvements and new features to jaspTools, focusing on enhanced documentation, more robust option and dataset encoding for analysis reproducibility, and improved compatibility with JASP files. The changes include a comprehensive development guide, new encoding utilities, improved handling of option structures, and support for extracting datasets directly from JASP files.

Documentation and Developer Guidance:

  • Added a detailed development guide in .github/copilot-instructions.md covering jaspTools architecture, workflows, conventions, and common pitfalls for new and existing contributors.

Encoding and Dataset Utilities:

  • Introduced new functions: encodeOptionsAndDataset, encodeOptionsWithMap, and createEncodedDataset to encode options and datasets, ensuring variable names and types are handled generically and reproducibly for testing and analysis. [1] [2] [3] [4]
  • Added support for extracting datasets from JASP files via extractDatasetFromJASPFile, making it easier to use real analysis data in tests and debugging. [1] [2]

Option Structure and Compatibility Improvements:

  • Refactored option flattening logic in fixOptionsForVariableTypes() to recursively handle complex nested options with .types metadata, improving compatibility with JASP’s analysis option formats.
  • Improved analysisOptions() to correctly prioritize .jasp file detection and path normalization, enhancing cross-platform compatibility and robustness. [1] [2]

Testing and Analysis Execution:

  • Updated runAnalysis() and related runtime initialization to support encoded datasets, enabling more controlled and reproducible analysis runs for testing. [1] [2] [3]

Package Metadata:

  • Updated package version to 0.20.1 and added new contributor to DESCRIPTION. Also updated Imports and Suggests to include dependencies for new features. [1] [2]

These changes collectively improve the usability, reliability, and developer experience of jaspTools, especially for module development, testing, and reproducibility.

See jasp-stats/jaspMetaAnalysis#394 for an example application

one-shotted using Claude Opus 4.5 with the following prompt:

Create `extractDatasetFromJASPfile` function that allows extracting datataset as a data.frame from saved JASP files.

There is an example JASP file in `tests/JASPFiles/debug-descriptives.JASP`

You will probably need to do something like this:

#' \itemize{
#'   \item Unpacking the .jasp archive (which is a zip file)
#'   \item Reading the internal.sqlite database
#'   \item Converting Column_N_DBL and Column_N_INT columns to properly named columns
#'   \item Mapping factor levels from the Labels table to create proper R factors
#'   \item Handling both explicitly nominal columns and columns with label mappings
#' }

you might need packages like:

'DBI','RSQLite', and 'archive'

The extracted file should exactly match the `debug-descriptives.csv` file

Once you are done, add a unit test that verifies the extraction.

Be aware that the file continsNAs, NaNs, Infs, different types of strings and encodings etc....
Introduces an encodedDataset argument to preloadDataset, runAnalysis, and initAnalysisRuntime functions. This allows skipping column name repair and type detection when the dataset is already encoded and typed, improving flexibility for pre-processed datasets.
Introduces makeTestsFromExamples and related internal functions to generate unit tests from JASP example files. Refactors extractDatasetFromJASPfile to extractDatasetFromJASPFile for consistency, updates documentation and test files accordingly, and exports new functions in NAMESPACE. Also improves analysisOptions to better handle file paths and .jasp file detection.
Copilot AI review requested due to automatic review settings December 22, 2025 10:14
Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This pull request introduces comprehensive improvements to jaspTools focused on enhancing reproducibility, testing workflows, and compatibility with JASP files. The changes enable developers to extract datasets and options directly from .jasp files, encode them for reproducible testing, and automatically generate test files from JASP examples.

Key Changes:

  • New dataset and option encoding utilities for standardized, reproducible testing with generic variable names
  • JASP file dataset extraction capabilities through SQLite database parsing
  • Automated test generation from JASP example files
  • Improved option structure handling for nested types metadata
  • Comprehensive developer documentation

Reviewed changes

Copilot reviewed 25 out of 27 changed files in this pull request and generated 15 comments.

Show a summary per file
File Description
tests/testthat/test-extractDatasetFromJASPfile.R Comprehensive tests for dataset extraction from JASP files covering various data types (numeric, categorical, binary, unicode, infinity, NaN)
tests/testthat/test-encodeOptionsAndDataset.R Tests for the new encoding functionality ensuring proper variable-type mapping and dataset transformation
tests/testthat/test-analysisOptions.R Tests verifying the flattening of complex nested option structures with types metadata
tests/testthat.R Standard testthat entry point for running package tests
tests/JASPFiles/debug-descriptives.csv Test data file with comprehensive coverage of edge cases (100 rows with 31 columns including special values)
R/test-generator.R New functionality for automatically generating test files from JASP examples with makeTestsFromExamples()
R/run.R Enhanced runAnalysis() and initAnalysisRuntime() to support encoded datasets via new encodedDataset parameter
R/options.R Improved analysisOptions() with cross-platform path handling and refactored fixOptionsForVariableTypes() for recursive flattening
R/dataset.R New functions: extractDatasetFromJASPFile(), encodeOptionsAndDataset(), and supporting utilities for encoding/extraction workflows
man/*.Rd Documentation files for new exported functions and updated signatures
.github/copilot-instructions.md Comprehensive developer guide covering architecture, workflows, conventions, and common pitfalls
NAMESPACE Exports new public functions: extractDatasetFromJASPFile, encodeOptionsAndDataset, makeTestsFromExamples
DESCRIPTION Version bump to 0.20.1, added contributor, and suggested dependencies (DBI, RSQLite) for JASP file processing

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

utils::unzip(jaspFile, exdir = tempDir)

# Check for internal.sqlite

Copy link

Copilot AI Dec 22, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Line 60 has an empty comment line which serves no purpose and should be removed to keep the code clean.

Suggested change

Copilot uses AI. Check for mistakes.
@@ -0,0 +1,197 @@
context("extractDatasetFromJASPFile")
Copy link

Copilot AI Dec 22, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The test file name has an inconsistent capitalization: "test-extractDatasetFromJASPfile.R" uses lowercase "file", but the function name is "extractDatasetFromJASPFile" with uppercase "F". This inconsistency could cause confusion for developers looking for the test file.

Copilot uses AI. Check for mistakes.
quiet = FALSE,
makeTests = FALSE
makeTests = FALSE,
encodedDataset = FALSE
Copy link

Copilot AI Dec 22, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The new encodedDataset parameter is added without documentation. The @param section should include documentation explaining what this parameter does and when it should be set to TRUE.

Copilot uses AI. Check for mistakes.
runAnalysis(
name,
dataset,
dataset = NULL,
Copy link

Copilot AI Dec 22, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The default value for the dataset parameter is changed to NULL, but there's no visible documentation update in the @param section or function body explaining how NULL datasets are handled. The documentation should be updated to clarify this behavior.

Copilot uses AI. Check for mistakes.
Comment on lines +56 to +58
# Extract the JASP file (it's a zip archive)
utils::unzip(jaspFile, exdir = tempDir)

Copy link

Copilot AI Dec 22, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

extractDatasetFromJASPFile unzips a user-specified .jasp archive directly into a temporary directory via utils::unzip(jaspFile, exdir = tempDir) without validating or constraining the archive’s internal paths. A malicious .jasp file can include entries with .. or absolute paths, causing files outside tempDir (e.g., .Rprofile, project sources) to be overwritten when extracted, which can lead to arbitrary code execution the next time R or the project is loaded. To fix this, avoid raw unzip on untrusted archives: either use an API that enforces safe extraction or explicitly inspect archive member paths and refuse any that escape the intended directory before extracting and reading internal.sqlite.

Suggested change
# Extract the JASP file (it's a zip archive)
utils::unzip(jaspFile, exdir = tempDir)
# Safely inspect and extract the JASP file (it's a zip archive)
zipContents <- utils::unzip(jaspFile, list = TRUE)
entryNames <- zipContents$Name
# Basic validation of archive member paths to prevent path traversal
isAbsolute <- grepl("^(/|\\\\|[A-Za-z]:|~)", entryNames)
hasDotDot <- grepl("(^|/|\\\\)\\.\\.($|/|\\\\)", entryNames)
if (any(isAbsolute | hasDotDot)) {
stop("The JASP file contains invalid paths (absolute paths or '..' components). ",
"Refusing to extract for security reasons.")
}
# Ensure that the normalized target paths stay within tempDir
baseDir <- normalizePath(tempDir, winslash = "/", mustWork = TRUE)
targetPaths <- file.path(tempDir, entryNames)
normalizedTargets <- normalizePath(targetPaths, winslash = "/", mustWork = FALSE)
escapesTempDir <- !startsWith(normalizedTargets, paste0(baseDir, "/"))
if (any(escapesTempDir, na.rm = TRUE)) {
stop("The JASP file contains entries that would extract outside the temporary directory. ",
"Refusing to extract for security reasons.")
}
# Extract only the validated entries
utils::unzip(jaspFile, exdir = tempDir, files = entryNames)

Copilot uses AI. Check for mistakes.
lines <- character(0)

# Header
lines <- c(lines, paste0('context("Example: ', baseName, '")'))
Copy link

Copilot AI Dec 22, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

generateExampleTestFileContent interpolates baseName (derived from the .jasp filename) directly into an R string literal in context("Example: ...") without escaping. An attacker who provides a .jasp file whose basename contains quotes or R code (e.g., evil"); system("rm -rf /"); #) can cause the generated test file to contain and execute arbitrary R code when tests are run. To mitigate this, treat baseName as data by escaping it for use inside string literals (e.g., via encodeString/appropriate quoting) or pass it as a symbol/variable instead of concatenating it into raw code.

Suggested change
lines <- c(lines, paste0('context("Example: ', baseName, '")'))
contextTitle <- paste("Example:", baseName)
safeContextTitle <- encodeString(contextTitle, quote = '"')
lines <- c(lines, paste0("context(", safeContextTitle, ")"))

Copilot uses AI. Check for mistakes.
Comment on lines +338 to +356
lines <- c(lines, paste0(' jaspFile <- testthat::test_path("..", "..", "examples", "', jaspFileName, '")'))

# Generate appropriate options extraction based on number of analyses
if (totalAnalyses == 1) {
# Single analysis: analysisOptions returns options directly (not in a list)
lines <- c(lines, " opts <- jaspTools::analysisOptions(jaspFile)")
} else {
# Multiple analyses: analysisOptions returns a list
lines <- c(lines, paste0(" opts <- jaspTools::analysisOptions(jaspFile)[[", analysisIndex, "]]"))
}

lines <- c(lines, " dataset <- jaspTools::extractDatasetFromJASPFile(jaspFile)")
lines <- c(lines, "")

# Encode and run
lines <- c(lines, " # Encode and run analysis")
lines <- c(lines, " encoded <- jaspTools:::encodeOptionsAndDataset(opts, dataset)")
lines <- c(lines, " set.seed(1)")
lines <- c(lines, paste0(' results <- jaspTools::runAnalysis("', analysisName, '", encoded$dataset, encoded$options, encodedDataset = TRUE)'))
Copy link

Copilot AI Dec 22, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

generateExampleTestBlock embeds both jaspFileName and analysisName—values read from the .jasp file—directly into R string literals when constructing the test code (test_path(..., "<name>") and runAnalysis("<name>", ...)) without any escaping. A crafted .jasp whose filename or analysis name contains embedded quotes and R code can cause the generated test file to include attacker-controlled R statements that execute when tests run. To prevent this, ensure these values are safely escaped or quoted for inclusion in R strings (or passed as data rather than concatenated into source code) before writing out the test blocks.

Copilot uses AI. Check for mistakes.
Comment on lines +420 to +438
lines <- c(lines, paste0(' jaspFile <- testthat::test_path("..", "..", "examples", "', jaspFileName, '")'))

# Generate appropriate options extraction based on number of analyses
if (totalAnalyses == 1) {
# Single analysis: analysisOptions returns options directly (not in a list)
lines <- c(lines, " opts <- jaspTools::analysisOptions(jaspFile)")
} else {
# Multiple analyses: analysisOptions returns a list
lines <- c(lines, paste0(" opts <- jaspTools::analysisOptions(jaspFile)[[", analysisIndex, "]]"))
}

lines <- c(lines, " dataset <- jaspTools::extractDatasetFromJASPFile(jaspFile)")
lines <- c(lines, "")

# Encode and run
lines <- c(lines, " # Encode and run analysis")
lines <- c(lines, " encoded <- jaspTools:::encodeOptionsAndDataset(opts, dataset)")
lines <- c(lines, " set.seed(1)")
lines <- c(lines, paste0(' results <- jaspTools::runAnalysis("', analysisName, '", encoded$dataset, encoded$options, encodedDataset = TRUE)'))
Copy link

Copilot AI Dec 22, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

generateExampleTestBlockBasic has the same pattern as generateExampleTestBlock, interpolating unescaped jaspFileName and analysisName from the .jasp file into R string literals for test_path(...) and runAnalysis("..."). A malicious .jasp with specially crafted filenames or analysis names can therefore inject arbitrary R code into the generated fallback test file, which will be executed when the test suite runs. As above, these values should be escaped for safe inclusion in string literals (or handled as data rather than assembled into raw code) before writing the test code.

Copilot uses AI. Check for mistakes.
@FBartos
Copy link
Contributor Author

FBartos commented Dec 23, 2025

btw, I get tests errors even when running the check locally with the master branch, not really sure what's wrong there
(despite having set-up jaspTools first)

❯ checking examples ... ERROR
  Running examples in 'jaspTools-Ex.R' failed
  The error most likely occurred in:
  
  > base::assign(".ptime", proc.time(), pos = "CheckExEnv")
  > ### Name: analysisOptions
  > ### Title: Obtain options to run JASP analyses with.
  > ### Aliases: analysisOptions
  > 
  > ### ** Examples
  > 
  > jaspOptions <- analysisOptions("~/Documents/someFile.jasp")
  Error in getPkgOption("module.dirs") : 
    jaspTools is not configured yet. Did you run `setupJaspTools()`?
  Calls: analysisOptions ... getModulePathFromRFunction -> getModulePaths -> getPkgOption

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant