Bump chardet from 5.2.0 to 7.3.0 #1655

Open
dependabot[bot] wants to merge 1 commit into main from dependabot/pip/chardet-7.3.0

dependabot bot commented on behalf of github on Mar 24, 2026

Bumps chardet from 5.2.0 to 7.3.0.

Release notes

Sourced from chardet's releases.

7.3.0

License

  • 0BSD license — the project license has been changed from MIT to 0BSD, a maximally permissive license with no attribution requirement. All prior 7.x releases should also be considered 0BSD licensed as of this release.

Features

  • Added mime_type field to detection results — identifies file types for both binary (via magic number matching) and text content. Returned in all detect(), detect_all(), and UniversalDetector results. (#350)
  • New pipeline/magic.py module detects 40+ binary file formats including images, audio/video, archives, documents, executables, and fonts. ZIP-based formats (XLSX, DOCX, JAR, APK, EPUB, wheel, OpenDocument) are distinguished by entry filenames. (#350)
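
The magic-number approach described above can be sketched with the standard library alone. This is an illustration of the general technique, not chardet's pipeline/magic.py: the three-entry signature table, the function names, and the specific entry filenames checked are assumptions made for the example.

```python
import io
import zipfile
from typing import Optional

# Hypothetical, deliberately tiny signature table (chardet reports 40+ formats).
MAGIC_PREFIXES = [
    (b"\x89PNG\r\n\x1a\n", "image/png"),
    (b"%PDF-", "application/pdf"),
    (b"\x1f\x8b", "application/gzip"),
]

# Entry filenames assumed here to distinguish some ZIP-based containers.
ZIP_MARKERS = [
    ("word/document.xml",
     "application/vnd.openxmlformats-officedocument.wordprocessingml.document"),
    ("xl/workbook.xml",
     "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet"),
    ("META-INF/MANIFEST.MF", "application/java-archive"),
]


def sniff_zip(data: bytes) -> str:
    """Refine a ZIP container into a more specific type using entry names."""
    try:
        with zipfile.ZipFile(io.BytesIO(data)) as archive:
            names = set(archive.namelist())
    except zipfile.BadZipFile:
        return "application/zip"
    for marker, mime in ZIP_MARKERS:
        if marker in names:
            return mime
    return "application/zip"


def sniff_mime_type(data: bytes) -> Optional[str]:
    """Return a MIME type for known binary signatures, else None."""
    for prefix, mime in MAGIC_PREFIXES:
        if data.startswith(prefix):
            return mime
    if data.startswith(b"PK\x03\x04"):  # local file header of a ZIP archive
        return sniff_zip(data)
    return None
```

In application code the equivalent information would presumably come from the mime_type field of the detection result; reading it defensively (for example with a default when the key is absent) keeps the caller working against chardet 5.2.0 as well.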

Bug Fixes

  • Fixed incorrect equivalence between UTF-16-LE and UTF-16-BE in accuracy testing — these are distinct encodings with different byte order, not interchangeable
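
A short illustration of why the two encodings cannot be treated as equivalent: the same text produces different bytes, and decoding with the wrong byte order yields different characters. This uses only Python's built-in codecs and is independent of chardet.

```python
text = "hi"

little = text.encode("utf-16-le")  # low byte first
big = text.encode("utf-16-be")     # high byte first

# Same text, different byte sequences.
assert little == b"h\x00i\x00"
assert big == b"\x00h\x00i"
assert little != big

# Decoding little-endian bytes as big-endian does not fail here,
# it silently produces unrelated code points (U+6800, U+6900).
misread = little.decode("utf-16-be")
assert misread == "\u6800\u6900"
assert misread != text
```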

Performance

  • Added 4 new modules to mypyc compilation (orchestrator, confusion, magic, ascii), bringing the total to 11 compiled modules
  • Capped statistical scoring at 16 KB — bigram models converge quickly, so large files no longer score the full 200 KB. Worst-case detection time dropped from 62ms to 26ms with no accuracy loss.
  • Replaced dataclasses.replace() with direct DetectionResult construction on hot paths, eliminating ~354k function calls per full test suite run
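
The last item can be pictured with a small sketch. The Result class below is a hypothetical stand-in, not chardet's DetectionResult, and its fields are assumed for illustration. Both approaches build an equal object; the direct call simply avoids the extra bookkeeping that dataclasses.replace() performs on every invocation.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Result:
    """Hypothetical stand-in for a detection result."""
    encoding: str
    confidence: float
    language: str


base = Result(encoding="utf-8", confidence=0.5, language="en")

# Generic copy-with-changes: convenient, but does per-call field handling.
via_replace = replace(base, confidence=0.9)

# Direct construction: more verbose, one plain constructor call.
direct = Result(base.encoding, 0.9, base.language)

assert via_replace == direct
```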

Build

  • Added riscv64 to the mypyc wheel build matrix; prebuilt wheels are now published for RISC-V Linux alongside existing architectures (#348, thanks @gounthar)

chardet 7.2.0

Features

  • Added include_encodings and exclude_encodings parameters to detect(), detect_all(), and UniversalDetector — restrict or exclude specific encodings from the candidate set, with corresponding -i/--include-encodings and -x/--exclude-encodings CLI flags (#343)
  • Added no_match_encoding (default "cp1252") and empty_input_encoding (default "utf-8") parameters — control which encoding is returned when no candidate survives the pipeline or the input is empty, with corresponding CLI flags (#343)
  • Added -l/--language flag to chardetect CLI — shows the detected language (ISO 639-1 code and English name) alongside the encoding (#342)
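
Because this PR jumps from 5.2.0 to 7.3.0, code that wants to use the new filtering parameters may need to tolerate both versions for a while. The sketch below passes include_encodings only when the supplied detect callable declares it. The helper names are hypothetical, the list-of-names argument shape is an assumption not confirmed by the notes above, and old_detect/new_detect are stand-ins rather than chardet itself.

```python
import inspect


def accepts_keyword(func, name):
    """Return True if func declares a parameter `name` or takes **kwargs."""
    try:
        params = inspect.signature(func).parameters
    except (TypeError, ValueError):
        # Some compiled callables expose no signature; treat as unsupported.
        return False
    if name in params:
        return True
    return any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values())


def detect_filtered(detect, data, include_encodings):
    """Call detect() with include_encodings only when it is supported."""
    if accepts_keyword(detect, "include_encodings"):
        return detect(data, include_encodings=include_encodings)
    return detect(data)


# Stand-ins for the two API generations (assumed shapes, not chardet itself).
def old_detect(data):
    return {"encoding": "ascii", "filtered": False}


def new_detect(data, include_encodings=None):
    return {"encoding": "ascii", "filtered": include_encodings is not None}
```

In real use the first argument would be chardet.detect; injecting it keeps the helper testable without the library installed.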

Fixes

  • Fixed null-separated ASCII data being misdetected as UTF-16-BE (#346, #347)

Full changelog: https://chardet.readthedocs.io/en/latest/changelog.html

chardet 7.1.0

Features

  • Added PEP 263 encoding declaration detection — # -*- coding: ... -*- and # coding=... declarations on lines 1–2 of Python source files are now recognized with confidence 0.95 (#249)
  • Added chardet.universaldetector backward-compatibility stub so that from chardet.universaldetector import UniversalDetector works with a deprecation warning (#341)
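
For readers unfamiliar with PEP 263, the declaration check can be approximated as follows. This is a minimal sketch, not chardet's implementation: the regular expression is adapted from the one given in PEP 263, declared_encoding is a hypothetical helper, and the PEP's extra rule about what the first line must contain for a second-line declaration to count is not modeled.

```python
import re

# Adapted from PEP 263; matched against bytes because the encoding is unknown.
CODING_RE = re.compile(rb"^[ \t\f]*#.*?coding[:=][ \t]*([-_.a-zA-Z0-9]+)")


def declared_encoding(source: bytes):
    """Return the encoding named on line 1 or 2 of Python source, else None."""
    for line in source.splitlines()[:2]:
        match = CODING_RE.match(line)
        if match:
            return match.group(1).decode("ascii")
    return None


sample = b"#!/usr/bin/env python\n# -*- coding: latin-1 -*-\nprint('x')\n"
found = declared_encoding(sample)
```

A declaration on line 3 or later is ignored, which matches the "lines 1–2" wording in the release note.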

Fixes

  • Fixed false UTF-7 detection of ASCII text containing ++ or +word patterns (#332)
  • Fixed 0.5s startup cost on first detect() call — model norms are now computed during loading instead of lazily iterating 21M entries (#333)
  • Fixed undocumented encoding name changes between chardet 5.x and 7.0 — detect() now returns chardet 5.x-compatible names by default (#338)
  • Improved ISO-2022-JP family detection — recognizes ESC sequences for ISO-2022-JP-2004 (JIS X 0213) and ISO-2022-JP-EXT (JIS X 0201 Kana)
  • Fixed silent truncation of corrupt model data (iter_unpack yielded fewer tuples instead of raising)

... (truncated)

Changelog

Sourced from chardet's changelog.

7.3.0 (2026-03-24)

License:

  • 0BSD license — the project license has been changed from MIT to 0BSD <https://opensource.org/license/0bsd>, a maximally permissive license with no attribution requirement. All prior 7.x releases should also be considered 0BSD licensed as of this release. (Dan Blanchard <https://github.com/dan-blanchard>)

Features:

  • Added mime_type field to detection results: identifies file types for both binary (via magic number matching) and text content. Returned in all detect(), detect_all(), and UniversalDetector results. (Dan Blanchard <https://github.com/dan-blanchard>, #350 <https://github.com/chardet/chardet/pull/350>)
  • New pipeline/magic.py module detects 40+ binary file formats including images, audio/video, archives, documents, executables, and fonts. ZIP-based formats (XLSX, DOCX, JAR, APK, EPUB, wheel, OpenDocument) are distinguished by entry filenames. (Dan Blanchard <https://github.com/dan-blanchard>, #350 <https://github.com/chardet/chardet/pull/350>)

Bug Fixes:

  • Fixed incorrect equivalence between UTF-16-LE and UTF-16-BE in accuracy testing; these are distinct encodings with different byte order, not interchangeable (Dan Blanchard <https://github.com/dan-blanchard>)

Performance:

  • Added 4 new modules to mypyc compilation (orchestrator, confusion, magic, ascii), bringing the total to 11 compiled modules (Dan Blanchard <https://github.com/dan-blanchard>)
  • Capped statistical scoring at 16 KB: bigram models converge quickly, so large files no longer score the full 200 KB. Worst-case detection time dropped from 62ms to 26ms with no accuracy loss. (Dan Blanchard <https://github.com/dan-blanchard>)
  • Replaced dataclasses.replace() with direct DetectionResult construction on hot paths, eliminating ~354k function calls per full test suite run (Dan Blanchard <https://github.com/dan-blanchard>)

Build:

  • Added riscv64 to the mypyc wheel build matrix — prebuilt wheels are now published for RISC-V Linux alongside existing architectures

... (truncated)

Commits
  • 9402975 docs: clarify that git tags have no v prefix in CLAUDE.md
  • 80f5d95 docs: explicitly list all performance.rst tables in update skill
  • 4ca09ed docs: single alphabetical table for supported MIME types
  • 78c3551 docs: add supported MIME types page with auto-generation script
  • d59bbba docs: fix stale numbers, add mime_type, update pipeline and mypyc lists
  • 601cde1 docs: add riscv64 wheels and UTF-16 endianness fix to 7.3.0 changelog
  • f00bd2d docs: finalize 7.3.0 changelog with 2026-03-24 release date
  • 3ed9b3e docs: remove EncodingEra from What's New (was added in 6.x)
  • dab6952 docs: update What's New section with mime_type and encoding filters
  • 3466c96 docs: say chardet 7, not 7.0, in README headers
  • Additional commits viewable in compare view

Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.


Dependabot commands and options

You can trigger Dependabot actions by commenting on this PR:

  • @dependabot rebase will rebase this PR
  • @dependabot recreate will recreate this PR, overwriting any edits that have been made to it
  • @dependabot show <dependency name> ignore conditions will show all of the ignore conditions of the specified dependency
  • @dependabot ignore this major version will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
  • @dependabot ignore this minor version will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
  • @dependabot ignore this dependency will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)

Bumps [chardet](https://github.com/chardet/chardet) from 5.2.0 to 7.3.0.
- [Release notes](https://github.com/chardet/chardet/releases)
- [Changelog](https://github.com/chardet/chardet/blob/main/docs/changelog.rst)
- [Commits](chardet/chardet@5.2.0...7.3.0)

---
updated-dependencies:
- dependency-name: chardet
  dependency-version: 7.3.0
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>
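
The update-type value in the metadata above follows from comparing the two version strings component by component. The sketch below is a simplified illustration, not Dependabot's implementation: classify_update is a hypothetical helper and it assumes plain dotted numeric versions with no pre-release or build suffixes.

```python
def classify_update(old: str, new: str) -> str:
    """Name the first version component that differs between old and new."""
    old_parts = [int(part) for part in old.split(".")]
    new_parts = [int(part) for part in new.split(".")]
    labels = ("semver-major", "semver-minor", "semver-patch")
    for label, before, after in zip(labels, old_parts, new_parts):
        if before != after:
            return label
    return "none"


update_type = classify_update("5.2.0", "7.3.0")
```

A major-version change like this one is the case where release notes deserve a close read before merging, since semantic versioning permits breaking changes at that boundary.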
dependabot bot added the dependencies and python labels on Mar 24, 2026
codecov bot commented Mar 24, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 54.72%. Comparing base (b98e44b) to head (97539a9).
✅ All tests successful. No failed tests found.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1655      +/-   ##
==========================================
- Coverage   54.74%   54.72%   -0.02%     
==========================================
  Files         335      335              
  Lines       27400    27400              
==========================================
- Hits        15000    14995       -5     
- Misses      12400    12405       +5     
Flag             Coverage Δ
functionaltests  0.00% <ø> (ø)
unittests        54.72% <ø> (-0.02%) ⬇️
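
The percentages in the diff can be reproduced from the hit and line counts. The sketch below works in hundredths of a percent with integer arithmetic and truncates rather than rounds. Truncation is an assumption chosen because it matches the figures shown (ordinary rounding of 14995/27400 would give 54.73%); coverage_bp is a hypothetical helper name.

```python
def coverage_bp(hits: int, lines: int) -> int:
    """Coverage in hundredths of a percent, truncated (assumed display rule)."""
    return hits * 10000 // lines


LINES = 27400
base = coverage_bp(15000, LINES)  # main
head = coverage_bp(14995, LINES)  # this PR
delta = head - base

# Hits and misses should account for every line on both sides of the diff.
assert 15000 + 12400 == LINES
assert 14995 + 12405 == LINES

print(f"{base / 100:.2f}% -> {head / 100:.2f}% ({delta / 100:+.2f}%)")
```

The file and line totals are unchanged by this PR, so the five-hit difference comes from existing lines that were executed on the base commit but not on the head commit.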


Labels

dependencies (Pull requests that update a dependency file), python (Pull requests that update python code)
