Skip to content

Conversation

@bghira
Copy link
Owner

@bghira bghira commented Jan 28, 2026

Closes #2509

This pull request improves the handling of fields with the TEXT_JSON type (such as modelspec_comment) throughout the configuration pipeline, ensuring that JSON arrays are correctly serialized and deserialized when passing through CLI arguments. It also adds comprehensive tests to verify that these fields behave as expected, both for arrays and strings.

Enhancements to TEXT_JSON field handling:

  • Updated mapping_to_cli_args in cli_utils.py to detect fields registered as TEXT_JSON in the field registry and JSON-serialize their values when necessary, ensuring consistent CLI argument formatting for fields like modelspec_comment. [1] [2]
  • Improved _process_modelspec_comment in cmd_args.py to recognize and parse JSON string representations of arrays, converting them into newline-joined strings for downstream use.

Testing improvements:

  • Added the TestTextJsonConfigLoading class in test_text_json_field.py to verify that TEXT_JSON fields (e.g., modelspec_comment) are correctly serialized to JSON in CLI args, passed through as strings when appropriate, and properly processed through the full configuration pipeline.

Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR fixes end-to-end handling of TEXT_JSON config fields (notably modelspec_comment) so that JSON arrays in config files are preserved correctly through CLI argument generation and post-processing, preventing loss of all but the last line in safetensors metadata.

Changes:

  • Update mapping_to_cli_args to detect TEXT_JSON fields via the field registry and JSON-serialize lists/mappings for those fields when constructing CLI args.
  • Enhance _process_modelspec_comment to recognize JSON array strings, parse them into lists, and then normalize them to newline-joined strings with env placeholder expansion.
  • Add TestTextJsonConfigLoading tests to validate the full config → CLI → argparse → post-processing pipeline for modelspec_comment with both array and string inputs.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.

File Description
tests/test_text_json_field.py Adds tests that verify TEXT_JSON fields (especially modelspec_comment) are serialized to CLI JSON correctly and round-trip through _process_modelspec_comment.
simpletuner/helpers/configuration/cmd_args.py Extends _process_modelspec_comment to parse JSON array strings before performing list joining and env placeholder substitution.
simpletuner/helpers/configuration/cli_utils.py Adjusts mapping_to_cli_args to consult the field registry and JSON-serialize values for TEXT_JSON fields as well as existing structured config keys.
Comments suppressed due to low confidence (1)

simpletuner/helpers/configuration/cli_utils.py:204

  • canonical_key is computed twice (lines 196 and 203) with the same logic, which is unnecessary duplication and slightly increases the chance of the two usages drifting apart in future edits. Consider computing canonical_key once before the field_registry lookup and reusing it for the registry lookup, legacy handler resolution, and JSON-serialization checks.
        if field_registry is not None:
            canonical_key = key.lstrip("-") if isinstance(key, str) else key
            field = field_registry.get_field(canonical_key)
            if field is not None and getattr(field, "arg_name", "NOT_NONE") is None:
                continue

        value = transform(key, raw_value) if transform else raw_value

        canonical_key = key.lstrip("-") if isinstance(key, str) else key
        handler = _LEGACY_ARG_HANDLERS.get(canonical_key)

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

@bghira bghira merged commit c1ae17d into main Jan 28, 2026
2 checks passed
@bghira bghira deleted the bugfix/json-cmdargs-parser branch January 28, 2026 17:19
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

modelspec.comment gets only filled with last line

2 participants