MDEV-17677: Fix keyword parsing that treated as identifier when immediately followed by dot #4713
Mahmoud-kh1 wants to merge 1 commit into MariaDB:10.11
362831f to b968f70
gkodinov left a comment
Thank you for your contribution! This is a preliminary review.
Please have a commit message to your commit that complies with CODING_STANDARDS.md.
gkodinov left a comment
LGTM. Thank you for working on this with me. Please stand by for the final review.
And please remove the leading colon from the test's comments. It doesn't look "right" in the .result file :)
mysql-test/main/parser.test (outdated)
--echo # MDEV-17677 : Keywords are parsed as identifiers when followed by a dot
--echo #

--echo : test for Nd (should work)
Problem: for example, SELECT.1 produced a parser error instead of returning 0.1.
Fix: check whether the character after the dot is an <identifier extend> (Mn, Mc, Nd, Pc, Cf),
which cannot start an identifier under the SQL Standard.
abarkov left a comment
Please make a simple change: fix the condition to use my_isdigit() as proposed in the comment.
(ctype & _MY_NMR) ||
((ctype & _MY_PNT) && !(ctype & _MY_L)) ||
(ctype & _MY_CTR);
The above assignment statement is equivalent to:
bool is_identifier_extend = ctype & (_MY_PNT | _MY_NMR | _MY_CTR);
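The two forms agree only if no ctype value carries both the punctuation bit and the lowercase-letter bit, which is presumably the case for the built-in ctype tables. The following is a minimal sketch that checks this by brute force over all 256 possible ctype bytes. The flag values are assumed to mirror the ones in m_ctype.h and are renamed without the leading underscore; the helper names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed flag values (illustrative; believed to mirror include/m_ctype.h). */
enum {
  MY_L   = 002, /* lowercase letter */
  MY_NMR = 004, /* digit */
  MY_PNT = 020, /* punctuation */
  MY_CTR = 040  /* control character */
};

/* The three-term condition as written in the patch. */
static bool extend_long(unsigned ctype)
{
  return (ctype & MY_NMR) ||
         ((ctype & MY_PNT) && !(ctype & MY_L)) ||
         (ctype & MY_CTR);
}

/* The collapsed single-mask condition. */
static bool extend_short(unsigned ctype)
{
  return (ctype & (MY_PNT | MY_NMR | MY_CTR)) != 0;
}

/*
  Count ctype bytes on which the two forms disagree.
  When allow_pnt_and_l is false, values that have both the
  punctuation and the lowercase bit set are skipped.
*/
static int count_mismatches(bool allow_pnt_and_l)
{
  int mismatches = 0;
  for (unsigned c = 0; c < 256; c++)
  {
    if (!allow_pnt_and_l && (c & MY_PNT) && (c & MY_L))
      continue;
    if (extend_long(c) != extend_short(c))
      mismatches++;
  }
  return mismatches;
}
```

With the combination excluded the forms never disagree; with it allowed they differ exactly on values that have punctuation and lowercase set but neither digit nor control.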
I'd also suggest to change the comment. So collectively it could look like this:
/*
The SQL Standard says:
An <identifier extend> is U+00B7 MIDDLE DOT, or any character in
the Unicode General Category classes Mn, Mc, Nd, Pc, or Cf.
We can use this approximation for the above:
*/
bool is_identifier_extend = ctype & (_MY_PNT | _MY_NMR | _MY_CTR);
But I'm afraid we cannot use ctype() here. ctype() does not distinguish between all Unicode character categories. See all use cases of _MY_PNT and _MY_CTR in strings/uctypedump.c.
This is the program which was used to dump the ctype data.
I suggest for now simply to change the condition to:
if (start == get_ptr() && c == '.' && ident_map[(uchar) yyPeek()] &&
    !my_isdigit(cs, yyPeek()))
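To illustrate what the extra my_isdigit() test changes, here is a minimal standalone sketch of just the "character after the dot" part of that condition. It does not use the server's lexer: is_ident_char() is a hypothetical ASCII-only stand-in for ident_map, and the standard isdigit() stands in for my_isdigit(). The start == get_ptr() and c == '.' checks are left out.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>

/* Hypothetical stand-in for ident_map: ASCII letters, digits, '_' and '$'. */
static bool is_ident_char(unsigned char c)
{
  return isalnum(c) || c == '_' || c == '$';
}

/*
  Returns true when the character following "<word>." allows the word to be
  taken as the first part of a qualified identifier (e.g. "t1.col").
  A digit after the dot (e.g. "SELECT.1") returns false, so the word still
  goes through the normal keyword lookup.
*/
static bool dot_starts_qualified_ident(unsigned char next)
{
  return is_ident_char(next) && !isdigit(next);
}
```

Under these assumptions, "t1.col" keeps the qualified-identifier path, while "SELECT.1" does not, because '1' is rejected by the digit test.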
Problem:

Keywords immediately followed by a dot ('.') were parsed as identifiers instead of keywords. This caused a syntax error for "SELECT.1", which should be parsed as the keyword SELECT followed by the decimal .1.
How we fix it:

We check whether a token is a keyword before skipping the keyword lookup for qualified identifiers. This
allows keywords to still be treated as keywords when followed by a dot.
With this change, SELECT.1 parses as expected.
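The approach described above can be sketched with a toy classifier. This is only an illustration of the idea, not the server's lexer: the keyword table, the tok_kind enum, and classify() with its check_keyword_first switch are all hypothetical.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <string.h>

typedef enum { TOK_KEYWORD, TOK_IDENT } tok_kind;

/* Hypothetical, tiny keyword table (upper case). */
static const char *const keywords[] = { "SELECT", "FROM", "WHERE" };

/* Case-insensitive lookup of word in the keyword table. */
static bool is_keyword(const char *word)
{
  size_t len = strlen(word);
  for (size_t i = 0; i < sizeof keywords / sizeof keywords[0]; i++)
  {
    const char *kw = keywords[i];
    if (strlen(kw) != len)
      continue;
    size_t j = 0;
    while (j < len && toupper((unsigned char) word[j]) == kw[j])
      j++;
    if (j == len)
      return true;
  }
  return false;
}

/*
  Classify a word. When check_keyword_first is false, a word followed by a
  dot skips the keyword lookup and is always an identifier (the behaviour
  reported in the bug). When it is true, the lookup happens regardless.
*/
static tok_kind classify(const char *word, bool followed_by_dot,
                         bool check_keyword_first)
{
  if (followed_by_dot && !check_keyword_first)
    return TOK_IDENT;
  return is_keyword(word) ? TOK_KEYWORD : TOK_IDENT;
}
```

In this sketch, "SELECT" followed by a dot is an identifier on the old path and a keyword on the new one, while an ordinary name such as "t1" stays an identifier either way.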
Test:
Added test cases to the parser test (mysql-test/main/parser.test).
Bug:
MDEV-17677