Release Version 0.4: Enhanced STJ Format with New Validation and Features #5
yaniv-golan announced in Announcements
This update introduces significant enhancements to the specification, schema, tools, and documentation to improve the flexibility, interoperability, and robustness of the STJ format.
What's New
1. Addition of `word_timing_mode`

- Added the `word_timing_mode` field in the `segments` object to indicate the completeness of word-level timing data:
  - `"complete"`: All words in the `text` are included in the `words` array.
  - `"partial"`: Only some words are included in the `words` array.
  - `"none"`: No word-level timing data is provided.
- Clarifies how to interpret the `words` array within a segment, especially when dealing with incomplete or absent word-level timing information.

2. Updated JSON Schema (`stj-schema.json`)

- Added the `word_timing_mode` field to the schema.
- Validation of `start` and `end` times.
- Confidence scores within the range `[0.0, 1.0]`.
- Zero-duration items (`start` equals `end`) must include the appropriate duration flag set to `"zero"` in `additional_info`.

3. Enhanced Validators
Python Validator (`stj_validator.py`)

- Checks `word_timing_mode` consistency with the presence and completeness of the `words` array.
- Checks consistency between the `text` field and the concatenated `words` array when `word_timing_mode` is `"complete"`.
- Uses the `iso639-lang` library (version 2.4.2).

JavaScript Validator (`stj-validator.js`)

- Uses the `iso-639-1` package to validate language codes.

4. Updated Conversion Tools

- Conversion scripts (`stj_to_srt`, `stj_to_vtt`, `stj_to_ass`):
  - Handle the `word_timing_mode` field appropriately.
  - For segments with `word_timing_mode` set to `"complete"`, the text is reconstructed from the `words` array.

5. Comprehensive Test Coverage
- `overlapping_segments.stj.json`
- `invalid_language.stj.json`
- `invalid_word_timing_mode.stj.json`
- `zero_duration_word_without_flag.stj.json`
- `invalid_confidence_scores.stj.json`
- `invalid_speaker_id.stj.json`
- `word_outside_segment_timings.stj.json`
- `words_overlap_or_out_of_order.stj.json`

6. Documentation Updates
- Specification (`stj-specification.md`): covers the `word_timing_mode` field and new validation requirements.

Breaking Changes

- Ensure that `word_timing_mode` is correctly set in segments.

How to Upgrade
Update Your STJ Files:
- Add the `word_timing_mode` field to segments as appropriate.

Update Tools and Dependencies:
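- Update `stj_validator.py` and conversion scripts from the repository.
- Install `iso639-lang` version 2.4.2, for example with `pip install iso639-lang==2.4.2`.
- Update `stj-validator.js` and conversion scripts.
- Install the `iso-639-1` package, for example with `npm install iso-639-1`.

Run Validation: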
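The validators' command-line usage is not shown in this announcement. As a rough illustration of the kinds of checks listed under "Enhanced Validators", here is a minimal Python sketch. It is not the code in `stj_validator.py`: the function names, the whitespace-normalized text comparison, the assumption that each word carries a `text` key, and the choice to skip segments without `word_timing_mode` are all assumptions made for the example.

```python
from typing import Any

VALID_MODES = ("complete", "partial", "none")


def normalize(text: str) -> str:
    """Collapse runs of whitespace so spacing differences are ignored."""
    return " ".join(text.split())


def check_segment(segment: dict[str, Any]) -> list[str]:
    """Return a list of problems found in one segment (empty if none).

    Hypothetical helper; the real validator may apply different rules.
    """
    problems: list[str] = []
    mode = segment.get("word_timing_mode")
    words = segment.get("words") or []

    # Assumption: a segment without word_timing_mode is not flagged here.
    if mode is not None and mode not in VALID_MODES:
        return [f"invalid word_timing_mode: {mode!r}"]

    if mode == "none" and words:
        problems.append("mode is 'none' but a words array is present")
    if mode in ("complete", "partial") and not words:
        problems.append(f"mode is {mode!r} but the words array is empty")

    if mode == "complete" and words:
        # Assumption: words are joined with single spaces before comparing.
        joined = normalize(" ".join(w.get("text", "") for w in words))
        if joined != normalize(segment.get("text", "")):
            problems.append("text does not match the concatenated words")

    # Word timings should stay inside the enclosing segment.
    for w in words:
        if w["start"] < segment["start"] or w["end"] > segment["end"]:
            problems.append(f"word {w.get('text')!r} lies outside the segment")

    return problems
```

Calling `check_segment` on each entry of a file's segments and collecting the non-empty results gives a quick pre-flight check before running the project's own validators.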
Update Integrations:
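For integrations that consume STJ segments, one way to account for `word_timing_mode` is sketched below. It mirrors the behavior described for the conversion tools (text reconstructed from the `words` array when the mode is `"complete"`) and falls back to the segment's `text` otherwise. The helper names, the single-space join, and the `text` key on each word are assumptions for illustration, not the project's actual conversion code.

```python
from typing import Any


def format_srt_time(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def display_text(segment: dict[str, Any]) -> str:
    """Pick the text to show for a segment based on word_timing_mode."""
    words = segment.get("words") or []
    if segment.get("word_timing_mode") == "complete" and words:
        # Assumption: words are joined with single spaces.
        return " ".join(w["text"] for w in words)
    # "partial", "none", or a missing mode: rely on the segment text.
    return segment.get("text", "")


def to_srt_cue(index: int, segment: dict[str, Any]) -> str:
    """Render one segment as a numbered SRT cue."""
    start = format_srt_time(segment["start"])
    end = format_srt_time(segment["end"])
    return f"{index}\n{start} --> {end}\n{display_text(segment)}\n"
```

Keeping the mode check in one small function such as `display_text` means the rest of an integration does not need to know which timing mode a segment uses.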
Acknowledgments
This release was prompted by mluggy's comments on what word-by-word caption tools expect. Thanks for spotting this!
Feedback and Contributions
If you encounter any issues, have suggestions, or would like to contribute to the project, please:
Links
This discussion was created from the release Release Version 0.4: Enhanced STJ Format with New Validation and Features.