NLSS Demo Output

This repository contains demo output files of NLSS.

Data Exploration

Date: 20251227

NLSS needed 4 minutes to run a data exploration report on the responses dataset in a WSL2 environment.

Guided Full Report

Date: 20251229

NLSS needed 24 minutes to run all analyses, reported in 20251229/report_canonical.md, plus some additional time to apply formatting. Two follow-up prompts were needed.

Smoke Tests

Date: 20260101

We ran 628 smoke tests covering all statistical modules and functionalities (subskills and utilities) of NLSS, using bash scripts (Linux/WSL) and PowerShell scripts (Windows) that call the respective R scripts via Rscript. The smoke tests included positive, edge, and negative tests. All tests passed with the expected behavior.

The smoke tests were run using golden_dataset.csv (see 20260102/prompt-robustness-tests/golden_dataset.csv).

The smoke test suite is part of the NLSS repository and lives in tests/smoke/.
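The pass criterion described above (each test passes when NLSS shows the expected behavior, including failing cleanly on negative inputs) can be sketched as a small harness that compares a command's exit status with an expectation. This is a hypothetical illustration in Python, not the actual bash/PowerShell suite in tests/smoke/; the SmokeCase type and run_suite helper are invented for this sketch, and real cases would invoke Rscript on the NLSS module scripts rather than the Python stand-in commands used here.

```python
import subprocess
import sys
from dataclasses import dataclass


@dataclass
class SmokeCase:
    """One smoke test: a command plus whether it is expected to succeed (hypothetical type)."""
    name: str
    command: list[str]
    # True for positive cases (and edge cases that should still succeed);
    # False for negative cases, which should exit with a non-zero status.
    expect_success: bool


def run_case(case: SmokeCase, timeout: float = 60.0) -> bool:
    """Run the command and report whether its exit status matches the expectation."""
    result = subprocess.run(case.command, capture_output=True, text=True, timeout=timeout)
    succeeded = result.returncode == 0
    return succeeded == case.expect_success


def run_suite(cases: list[SmokeCase]) -> dict[str, bool]:
    """Map each case name to True (expected behavior) or False (unexpected behavior)."""
    return {case.name: run_case(case) for case in cases}


if __name__ == "__main__":
    # Stand-in commands so the sketch runs anywhere; a real case would look like
    # ["Rscript", "<module script>.R", ...] with arguments pointing at golden_dataset.csv.
    cases = [
        SmokeCase("positive", [sys.executable, "-c", "print('ok')"], True),
        SmokeCase("negative", [sys.executable, "-c", "import sys; sys.exit(2)"], False),
    ]
    for name, passed in run_suite(cases).items():
        print(f"{name}: {'PASS' if passed else 'FAIL'}")
```

Treating a negative test as passing only when the command fails is what lets one suite cover positive, edge, and negative inputs with a single pass/fail count.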

Automatic Full Reports by Model

Date: 20260102

Full reports by model:

GPT-5.2-Codex-Low: 20260102/full-reports/GPT-5.2-Codex-Low/
GPT-5.2-Codex-Medium: 20260102/full-reports/GPT-5.2-Codex-Medium/
GPT-5.2-Codex-High: 20260102/full-reports/GPT-5.2-Codex-High/
GPT-5.2-Codex-Extra_High: 20260102/full-reports/GPT-5.2-Codex-XHigh/
Claude-4.5-Sonnet: 20260102/full-reports/Claude-4.5-Sonnet/
Claude-4.5-Opus: 20260102/full-reports/Claude-4.5-Opus/

For Codex models, the higher the thinking effort, the more polished and complete the report.

Claude Sonnet did not follow NLSS governance at all, while Opus followed it well. Opus's report was nonetheless more bullet-point in style and less narrative than Codex's.

Codex follows NLSS governance much more strictly than Claude, even at the Low thinking effort.

Prompt Robustness Tests

Date: 20260102

We tested 10 different prompts with identical statistical intent against each of the 15 statistical modules (subskills) of NLSS. You can find the prompts, together with the intended Rscript call for each, in 20260102/prompt-robustness-tests/prompts.csv.

Of the 150 total tests, 145 (96.7%) matched the expected statistical module and produced correct results. The remaining 5 were partially correct: they used an equivalent but different statistical method than expected (e.g., correlation instead of test-retest reliability analysis, crosstabs instead of grouped frequency tables).

Find the machine-readable results in 20260102/prompt-robustness-tests/protocol_log.jsonl and the human-readable results in 20260102/prompt-robustness-tests/report_canonical_reconstructed.md.

We used GPT-5.2-Codex-Low (lowest thinking effort) for all prompt robustness tests.

Prompt robustness tests were run using golden_dataset.csv (see 20260102/prompt-robustness-tests/golden_dataset.csv).

The prompt robustness test suite is part of the NLSS repository and lives in tests/prompt-robustness/.
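The headline figure is a simple tally over a JSON Lines log: 145 full matches out of 150 tests gives 96.7% after rounding. The sketch below shows that tally in Python under a loudly stated assumption: the field names expected_module, observed_module, and correct are hypothetical, and the real schema of protocol_log.jsonl may differ. The demo records are synthetic and only mirror the reported counts.

```python
import json
from collections import Counter
from typing import Iterable


def tally(lines: Iterable[str]) -> tuple[Counter, float]:
    """Count outcomes in JSON Lines input and return the share of full matches in percent."""
    counts: Counter = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        # Hypothetical fields: a full match means the expected module was used
        # AND the result was correct; anything else counts as "other".
        matched = record["expected_module"] == record["observed_module"] and record["correct"]
        counts["match" if matched else "other"] += 1
    total = sum(counts.values())
    rate = 100.0 * counts["match"] / total if total else 0.0
    return counts, rate


if __name__ == "__main__":
    # Synthetic records mirroring the reported counts (145 matches, 5 partial results);
    # these are not entries from the real log.
    demo = [json.dumps({"expected_module": "m", "observed_module": "m", "correct": True})] * 145
    demo += [json.dumps({"expected_module": "m", "observed_module": "other", "correct": True})] * 5
    counts, rate = tally(demo)
    print(counts["match"], counts["other"], f"{rate:.1f}%")  # 145 5 96.7%
```

Against the real file, the same function would be called on an open file handle, for example `with open("20260102/prompt-robustness-tests/protocol_log.jsonl") as f: counts, rate = tally(f)`, once the field names are adjusted to the actual schema.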

Golden Values Tests

Date: 20260105

We ran 291 tests on golden_dataset.csv (see 20260102/prompt-robustness-tests/golden_dataset.csv), comparing NLSS results against independently calculated golden values (R 4.5.2) for the 15 implemented statistical modules. All tests passed, confirming that NLSS's statistical computations match the golden values.

The golden values test suite is part of the NLSS repository, lives in tests/values/, and is wired into the smoke test suite.
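A golden values check boils down to comparing each computed statistic with its reference value within a numeric tolerance. The Python sketch below illustrates that idea with `math.isclose`; the function name, the dictionary layout, the tolerances (`rel_tol=1e-6`, `abs_tol=1e-8`), and the example numbers are all assumptions for illustration, and the suite in tests/values/ may compare values differently (for example, after rounding to a fixed number of decimals).

```python
import math


def compare_to_golden(observed: dict[str, float], golden: dict[str, float],
                      rel_tol: float = 1e-6, abs_tol: float = 1e-8) -> list[str]:
    """Return the names of statistics that are missing or deviate from their golden value.

    Hypothetical helper: tolerances are assumed, not taken from the NLSS test suite.
    """
    failures = []
    for name, expected in golden.items():
        value = observed.get(name)
        # A statistic fails if it was not produced at all or lies outside the tolerance.
        if value is None or not math.isclose(value, expected, rel_tol=rel_tol, abs_tol=abs_tol):
            failures.append(name)
    return failures


if __name__ == "__main__":
    # Made-up placeholder values, not actual golden values.
    golden = {"mean": 3.2, "sd": 1.1}
    print(compare_to_golden({"mean": 3.2000000001, "sd": 1.1}, golden))  # []
    print(compare_to_golden({"mean": 3.2, "sd": 1.2}, golden))  # ['sd']
```

Using a relative tolerance with a small absolute floor keeps the check meaningful both for large statistics and for values near zero, such as p-values or small effect sizes.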

NLSS is Live

Check out NLSS.

Cite

If you use NLSS-Demo in your work, please cite:

Hammes, M. (2026). docmh/nlss-demo: NLSS-Demo [Software]. Zenodo. https://doi.org/10.5281/zenodo.18173608

License

This repository's contents are licensed under Apache 2.0. See LICENSE.

NLSS is a trademark of Mike Hammes. See TRADEMARKS.md.
