Releases: addok/addok-fr

v1.1.0

01 Nov 14:33
a162e8c

This release brings a major infrastructure upgrade: it integrates long-standing contributions from @cquest, improves the caching strategy, and modernizes packaging.

✨ What's New

🎯 Integration of @cquest's Work (#9)

After several years, we've finally merged the valuable contributions from @cquest's cquest-11rc1 branch (originally from 2016-2018):

Enhanced Phonemicization Rules:

  • Better handling of complex vowel combinations (ae, ei, oeu)
  • Improved nasal sound processing (m → n before labial/dental consonants)
  • Special cases like "oeufs" → "eu"
  • Enhanced support for "y" at word beginning
  • More accurate silent consonant handling (including "gn": seigneur → senieur)
  • Better duplicate letter removal
  • Compiled regex patterns for better performance
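As a rough illustration of the "compiled regex patterns" approach, the sketch below precompiles a few substitution rules once and applies them in order. The rule list, the names `_RULES` and `sketch_phonemicize`, and the rule ordering are assumptions made for this example; the actual addok-fr rule set is longer and may differ in detail.

```python
import re

# Hypothetical, illustrative rules only. They are compiled once at import
# time so that repeated calls do not pay the pattern compilation cost.
_RULES = [
    (re.compile(r"ei"), "e"),         # vowel combination: "ei" -> "e"
    (re.compile(r"gn"), "ni"),        # "gn" -> "ni" (seigneur -> senieur)
    (re.compile(r"(\w)\1+"), r"\1"),  # collapse runs of a repeated letter
]


def sketch_phonemicize(word: str) -> str:
    """Apply each precompiled rule, in order, to a lowercase word."""
    for pattern, replacement in _RULES:
        word = pattern.sub(replacement, word)
    return word


print(sketch_phonemicize("seigneur"))
```

With these three rules, "seigneur" becomes "senieur", matching the example given in the list above.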

Extended Synonym Coverage (#6):

  • Added ~48 new synonym mappings (175 → 223 entries)
  • Additional address abbreviations and variations
  • Includes mappings like "clef/clefs → cle/cles", "gir → giratoire"
  • Fixed duplicate entries
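To make the effect of a synonym mapping concrete, here is a minimal sketch of token-level replacement. The dictionary holds only the mappings named in these notes, and `SYNONYMS` and `apply_synonyms` are hypothetical names for illustration; Addok itself loads synonyms from a file and applies them inside its own processing pipeline.

```python
# Hypothetical subset: only the mappings mentioned in these release notes.
SYNONYMS = {
    "clef": "cle",
    "clefs": "cles",
    "gir": "giratoire",
}


def apply_synonyms(tokens):
    """Replace each token by its canonical form, leaving unknown tokens as-is."""
    return [SYNONYMS.get(token, token) for token in tokens]


print(apply_synonyms(["gir", "de", "la", "clef"]))
```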

🔧 LRU Cache Implementation (#7)

Replaced the simple dictionary cache with a memory-efficient LRU (Least Recently Used) cache:

  • Configurable cache size via PHONEMICIZE_CACHE_SIZE setting
  • Default: 500,000 entries (~86 MB), suitable for most French address datasets
  • Recommendations:
    • 500K entries (~86 MB): The default
    • 1M entries (~172 MB): For larger datasets with more unique words
    • 250K entries (~43 MB): For memory-constrained environments
  • Fixed race condition in cache initialization
  • Prevents unbounded memory growth while maintaining performance
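For readers unfamiliar with the technique, the sketch below shows one common way to build a bounded LRU cache in Python using `collections.OrderedDict`. It is not taken from the addok-fr source; the class name `LRUCache` and its methods are assumptions for illustration, and the real implementation may instead rely on something like `functools.lru_cache`.

```python
from collections import OrderedDict


class LRUCache:
    """Minimal LRU cache: evicts the least recently used key when full."""

    def __init__(self, maxsize: int):
        if maxsize <= 0:
            raise ValueError("maxsize must be positive")
        self.maxsize = maxsize
        self._data = OrderedDict()

    def get(self, key, default=None):
        if key not in self._data:
            return default
        # Mark the key as most recently used.
        self._data.move_to_end(key)
        return self._data[key]

    def put(self, key, value):
        self._data[key] = value
        self._data.move_to_end(key)
        # Drop the oldest entry once the bound is exceeded.
        if len(self._data) > self.maxsize:
            self._data.popitem(last=False)

    def __len__(self):
        return len(self._data)


cache = LRUCache(maxsize=2)
cache.put("rue", "ru")
cache.put("avenue", "avenu")
cache.get("rue")               # "rue" is now the most recently used key
cache.put("chemin", "chemin")  # evicts "avenue", the least recently used
```

Unlike a plain dictionary, the cache never holds more than `maxsize` entries, which is what keeps memory usage bounded.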

🔧 Infrastructure & Development

  • Modernized Packaging (#5)
    • Migrated from setup.py to pyproject.toml (PEP 517/518 compliant)
    • Modern build system with setuptools>=65.0
    • Support for Python 3.9 through 3.14
  • CI/CD Setup (#8)
    • Added GitHub Actions workflow for automated testing
    • Redis integration in CI pipeline
    • Automated pytest and coverage reporting
  • Development Environment
    • Added dev dependencies: pytest, pytest-cov, build, twine

📝 Documentation

  • Added cache configuration guidance in README
  • Memory usage recommendations for different cache sizes
  • Improved wording and examples

🔄 Upgrading from 1.0.1

This release is fully backward compatible. Upgrade with:

pip install --upgrade addok-fr

Optional: Configure cache size in your Addok config if needed:

PHONEMICIZE_CACHE_SIZE = 500_000  # Adjust based on your needs

No other configuration changes required.
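When choosing a value for PHONEMICIZE_CACHE_SIZE, note that the figures quoted in these notes (500K entries for about 86 MB) imply roughly 172 bytes per entry. The helper below is a back-of-the-envelope sketch built on that ratio. The function names are hypothetical, 1 MB is assumed to mean 10^6 bytes, and real memory usage depends on the words actually cached.

```python
# Derived from the figures in these notes: ~86 MB for 500,000 entries.
BYTES_PER_ENTRY = 86_000_000 / 500_000  # 172 bytes per entry (approximate)


def estimate_cache_mb(entries: int) -> float:
    """Rough memory footprint, in MB, for a cache holding `entries` items."""
    return entries * BYTES_PER_ENTRY / 1_000_000


def entries_for_budget(budget_mb: float) -> int:
    """Largest entry count that fits in `budget_mb` under the same estimate."""
    return int(budget_mb * 1_000_000 / BYTES_PER_ENTRY)


print(estimate_cache_mb(1_000_000))
print(entries_for_budget(100))
```

Under this estimate, 1M entries come to about 172 MB and 250K entries to about 43 MB, in line with the recommendations above.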


Full Changelog: 1.0.1...v1.1.0