MDEV-38904: Fix latin7 collation corruption and my_convert infinite loop #4737
Open
itzanway wants to merge 9 commits into MariaDB:main from
Adjusted the latin7_general_ci sort_order array to prevent hyphen and space weight collisions, fixing false B-tree corruption. Added a safety advancement pointer in my_convert_using_func and my_convert_fix to prevent 100% CPU hangs on malformed byte sequences.
gkodinov
requested changes
Mar 6, 2026
Member
Thank you for your contribution! This is a preliminary review.
Couple of things before I start reviewing the substance:
- Please squash all of your commits into a single one
- Please have a commit message that complies with CODING_STANDARDS.md
- Please set your text editor to not convert spaces to tabs or vice versa.
- Please do not do space only changes.
- Please add test cases.
- Please make sure all the buildbot hosts compile and run tests successfully.
After a brief consultation with the future final reviewer, I should add that we can't "adjust weights" on an existing collation. Especially if it's used to store data on disk. This is not backwards compatible. But this is just a heads up. To be resolved during the final review.
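To illustrate the compatibility concern in general terms, here is a minimal sketch, not MariaDB code. It assumes single-byte keys and two hypothetical 256-entry weight tables; `is_sorted_by_weight` is an invented helper. Keys written in sorted order under one table can appear out of order once a weight is changed, which is what an on-disk index would experience.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: returns 1 if the single-byte keys are in
   non-decreasing order under the weight table w, 0 otherwise. */
static int is_sorted_by_weight(const unsigned char *w,
                               const unsigned char *keys, size_t n)
{
  assert(w != NULL && keys != NULL);
  for (size_t i = 1; i < n; i++)
  {
    /* A stored key whose weight is lower than its predecessor's means
       the stored order no longer matches the collation. */
    if (w[keys[i - 1]] > w[keys[i]])
      return 0;
  }
  return 1;
}
```

With an identity table the keys 0x20, 0x2D, 0x41 are in order; if a (hypothetical) new table moves 0x2D below 0x20, the same stored sequence fails the check even though no data changed.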
This PR addresses MDEV-38904, which caused false index corruption in latin7 tables and a subsequent server hang during string conversion. The fix is divided into two parts: correcting the collation logic and hardening the string conversion utility.
The root cause was a transitivity violation in the latin7_general_ci collation. In the original sort_order_latin7_general_ci array, the hyphen (-, 0x2D) and the space ( , 0x20) were assigned weights that caused collisions during MyISAM/Aria index compression.
Changes: Adjusted the sort_order_latin7_general_ci weights to ensure that the space character (Index 32) is uniquely weighted as the minimum printable value and that the hyphen (Index 45) has a distinct weight.
Impact: This prevents CHECK TABLE from falsely reporting "Key in wrong position" and prevents the creation of circular B-tree pointers that caused the server to loop.
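The effect of a weight collision can be sketched as follows. This is an illustration under stated assumptions, not the server's comparison routine: the table is a hypothetical identity table rather than the real sort_order_latin7_general_ci values, and `init_identity_weights`, `weight_compare` and `weights_collide` are invented names.

```c
#include <assert.h>
#include <stddef.h>

/* Fill a hypothetical weight table where every byte is its own weight.
   These are NOT the actual latin7_general_ci weights. */
static void init_identity_weights(unsigned char w[256])
{
  for (int i = 0; i < 256; i++)
    w[i] = (unsigned char) i;
}

/* Compare two byte strings by per-byte weight, shorter string first on a tie. */
static int weight_compare(const unsigned char *w,
                          const unsigned char *a, size_t alen,
                          const unsigned char *b, size_t blen)
{
  assert(w != NULL);
  size_t n = alen < blen ? alen : blen;
  for (size_t i = 0; i < n; i++)
  {
    if (w[a[i]] != w[b[i]])
      return w[a[i]] < w[b[i]] ? -1 : 1;
  }
  return alen < blen ? -1 : (alen > blen ? 1 : 0);
}

/* Returns 1 if two distinct bytes map to the same weight. */
static int weights_collide(const unsigned char *w,
                           unsigned char x, unsigned char y)
{
  return x != y && w[x] == w[y];
}
```

Under the identity table, "a-b" sorts after "a b". If 0x2D is given the same weight as 0x20, the two strings compare equal although their bytes differ, which is the kind of ambiguity the description attributes to the original table.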
Even in cases of table corruption, the server should not hang at 100% CPU. The previous implementation of my_convert_using_func and my_convert_fix did not explicitly force a pointer advancement when the character set's mb_wc function returned a length of 0, which happens when it reads a malformed or corrupt byte sequence.
Changes: Added an explicit from++ advancement when cnvres == 0.
Impact: This ensures the conversion loop always terminates, replacing malformed bytes with a '?' placeholder instead of looping infinitely.
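The termination guarantee can be sketched as a simplified loop. This is not the actual body of my_convert_using_func or my_convert_fix: `decode_fn`, `ascii_only_decode` and `convert_with_guard` are hypothetical stand-ins, and the sketch treats any non-positive decoder result as an undecodable byte, which is a simplification.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a charset's mb_wc callback: returns the number
   of bytes consumed, or 0 when the byte at s cannot be decoded. */
typedef int (*decode_fn)(unsigned long *wc,
                         const unsigned char *s, const unsigned char *end);

/* Toy decoder that accepts only 7-bit bytes. */
static int ascii_only_decode(unsigned long *wc,
                             const unsigned char *s, const unsigned char *end)
{
  if (s >= end || *s >= 0x80)
    return 0;
  *wc = *s;
  return 1;
}

/* Copy decodable bytes to the output; on a failed decode, skip one input
   byte and emit '?', so `from` advances on every iteration. */
static size_t convert_with_guard(char *to, size_t to_len,
                                 const unsigned char *from, size_t from_len,
                                 decode_fn decode, size_t *errors)
{
  assert(to != NULL && from != NULL && errors != NULL);
  const unsigned char *end = from + from_len;
  char *out = to;
  char *out_end = to + to_len;
  *errors = 0;
  while (from < end && out < out_end)
  {
    unsigned long wc;
    int cnvres = decode(&wc, from, end);
    if (cnvres > 0)
    {
      from += cnvres;
      *out++ = (char) wc;
    }
    else
    {
      from++;            /* forced advancement: the loop cannot stall here */
      *out++ = '?';
      (*errors)++;
    }
  }
  return (size_t) (out - to);
}
```

Because `from` moves forward by at least one byte in both branches, the loop runs at most from_len iterations regardless of what the decoder returns.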
Bug: https://jira.mariadb.org/browse/MDEV-38904?filter=-4