Skip to content
/ server Public

MDEV-17677: Fix keyword parsing that treated as identifier when immediately followed by dot#4713

Open
Mahmoud-kh1 wants to merge 1 commit intoMariaDB:10.11from
Mahmoud-kh1:kw-dot
Open

MDEV-17677: Fix keyword parsing that treated as identifier when immediately followed by dot#4713
Mahmoud-kh1 wants to merge 1 commit intoMariaDB:10.11from
Mahmoud-kh1:kw-dot

Conversation

@Mahmoud-kh1
Copy link

@Mahmoud-kh1 Mahmoud-kh1 commented Feb 28, 2026

problem :
Keywords immediately followed by a dot ('.') were incorrectly parsed as identifiers instead of keywords. This caused syntax errors for "SELECT.1" which should be parsed as keyword SELECT followed by decimal .1
image

How we fix it :
we check if a token is a keyword before skipping keyword lookup for qualified identifiers. This
allows keywords to still treated as keywords when followed by dot.
Now it works
image

Test:
Add some cases in Parser test

bug :
MDEV-17677

@Mahmoud-kh1 Mahmoud-kh1 marked this pull request as draft February 28, 2026 20:56
@Mahmoud-kh1 Mahmoud-kh1 force-pushed the kw-dot branch 4 times, most recently from 362831f to b968f70 Compare February 28, 2026 23:28
@Mahmoud-kh1 Mahmoud-kh1 marked this pull request as ready for review March 1, 2026 00:17
@gkodinov gkodinov added the External Contribution All PRs from entities outside of MariaDB Foundation, Corporation, Codership agreements. label Mar 2, 2026
Copy link
Member

@gkodinov gkodinov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for your contribution! This is a preliminary review.

Please have a commit message to your commit that complies with CODING_STANDARDS.md.

@Mahmoud-kh1 Mahmoud-kh1 changed the base branch from 12.3 to 10.11 March 5, 2026 02:18
@Mahmoud-kh1 Mahmoud-kh1 changed the base branch from 10.11 to 12.3 March 5, 2026 02:34
@CLAassistant
Copy link

CLAassistant commented Mar 5, 2026

CLA assistant check
All committers have signed the CLA.

@Mahmoud-kh1 Mahmoud-kh1 changed the base branch from 12.3 to 10.11 March 5, 2026 05:24
@Mahmoud-kh1 Mahmoud-kh1 requested a review from gkodinov March 5, 2026 06:11
Copy link
Member

@gkodinov gkodinov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thank you for working on this with me. Please stand by for the final review.

And please remove the leading colon from the test's comments. It doesn't look "right" in the .result file :)

--echo # MDEV-17677 : Keywords are parsed as identifiers when followed by a dot
--echo #

--echo : test for Nd (should work)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why the extra leading colon?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

Problem: for example SELECT.1 output parser error instead of output 0.1
Fix: Check if char after dot is an identifier-extend (Mn,Mc,Nd,Pc,Cf)
     which cannot be start of identifiers for  SQL standard
Copy link
Contributor

@abarkov abarkov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please make a simple change: fix the condition to use my_isdigit() as proposed in the comment.

(ctype & _MY_NMR) ||
((ctype & _MY_PNT) && !(ctype & _MY_L)) ||
(ctype & _MY_CTR);

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The above assignment statement is equivalent to:
bool is_identifier_extend = ctype & (_MY_PNT | _MY_NMR | _MY_CTR);

Copy link
Contributor

@abarkov abarkov Mar 6, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd also suggest to change the comment. So collectively it could look like this:

/*
  The SQL Standard says:
    An <identifier extend> is U+00B7 MIDDLE DOT, or any character in
    the Unicode General Category classes Mn, Mc, Nd, Pc, or Cf.
  We can use this approximation for the above:
*/
bool is_identifier_extend = ctype & (_MY_PNT | _MY_NMR | _MY_CTR);

But I'm afraid we cannot use ctype() here. ctype() does not distinguish between all Unicode character categories. See all use cases of _MY_PNT and _MY_CTR in strings/uctypedump.c.
This is the program which was used to dump the ctype data.

I suggest for now simply to change the condition to:

if (start == get_ptr() && c == '.' && ident_map[(uchar) yyPeek()] &&
    !my_isdigit(cs, yyPeek())

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

External Contribution All PRs from entities outside of MariaDB Foundation, Corporation, Codership agreements.

Development

Successfully merging this pull request may close these issues.

4 participants