Skip to content

Comments

Implement parser support#51

Open
titzer wants to merge 11 commits intomainfrom
parser_impl
Open

Implement parser support#51
titzer wants to merge 11 commits intomainfrom
parser_impl

Conversation

@titzer
Copy link
Contributor

@titzer titzer commented Dec 8, 2025

This implements parser support and (I think) the intended semantics that custom page sizes that are 0 or not powers of two are malformed, and powers of two that are not 1 or 65536 are invalid.

@rossberg PTAL. There is a shift/reduce conflict here due to the factoring of the pagetype/memorytype having to do with the syntactic sugar for data segments. I think it requires left-factoring the grammar and rearranging it a little; I wasn't sure if you a convention you were following for how that's done.

Copy link
Member

@rossberg rossberg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, this sort of parser conflict arises whenever you allow a parenthesised phrase (like pagetype here) to be empty that can be followed by another parenthesised clause. Because of the paren token, the parser would need a look-ahead of 2 to know what case it's in.

The only way I know to resolve this in an LALR(1) grammar is by not using an empty production but instead duplicating the use sites of the symbol, once with and once without the extra phrase. That's what I did in other such cases anyway.

A couple of high-level comments:

  • Given that the binary format for pagetype is in logarithmic representation, the AST should probably reflect that. That's what we do for alignment annotations as well. Then some weird cases (like PageT 0) are avoided by construction.

  • Please expand tabs.


pagetype :
| LPAR PAGESIZE NAT RPAR
{ let v' =
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can simplify this using the nat32 helper and Lib.Int.is_power_of_two, cf. the action for the align production.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using power_of_two now, will rework for nat32.

@titzer
Copy link
Contributor Author

titzer commented Dec 9, 2025

Yeah, this sort of parser conflict arises whenever you allow a parenthesised phrase (like pagetype here) to be empty that can be followed by another parenthesised clause. Because of the paren token, the parser would need a look-ahead of 2 to know what case it's in.

The only way I know to resolve this in an LALR(1) grammar is by not using an empty production but instead duplicating the use sites of the symbol, once with and once without the extra phrase. That's what I did in other such cases anyway.

Thanks, there was a quick fix.

A couple of high-level comments:

* Given that the binary format for pagetype is in logarithmic representation, the AST should probably reflect that. That's what we do for alignment annotations as well. Then some weird cases (like PageT 0) are avoided by construction.

I had considered this at first but started off using the size as an int. I'll rework it to use a nat32 of the logarithm.

* Please expand tabs.

Done.

@titzer
Copy link
Contributor Author

titzer commented Dec 25, 2025

I've rebased the parser implementation on the logarithmic representation in #53

@titzer
Copy link
Contributor Author

titzer commented Jan 6, 2026

@rossberg PTAL

Copy link
Member

@rossberg rossberg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for the delay.

{ fun c x loc -> let mems, data, ims, exs = $2 c x loc in
mems, data, ims, $1 (MemoryX x) c :: exs }
| addrtype LPAR DATA string_list RPAR /* Sugar */
mems, data, ims, $1 (MemoryX x) c :: exs }
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
mems, data, ims, $1 (MemoryX x) c :: exs }
mems, data, ims, $1 (MemoryX x) c :: exs }

mems, data, ims, $1 (MemoryX x) c :: exs }
| addrtype LPAR DATA string_list RPAR /* Sugar */
mems, data, ims, $1 (MemoryX x) c :: exs }
| addrtype pagetype LPAR DATA string_list RPAR /* Sugar */
Copy link
Member

@rossberg rossberg Jan 12, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The pagetype needs to be optional (independently of the addrtype) in the presence of inline data, otherwise this would be a breaking change. You'll probably need two productions to handle this without s/r conflict.

Copy link
Member

@rossberg rossberg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Btw, please never force-push to a PR branch, since that breaks central functionality of GH's review UI, such as New Changes view or comment history.

Comment on lines +469 to +478
pagetype :
| LPAR PAGESIZE NAT RPAR
{ let n = (nat32 $3 $loc($3)) in
if not (Lib.Int32.is_power_of_two_unsigned n) then
error (at $sloc) "invalid custom page size: must be power of two";
PageT (Int32.to_int (Lib.Int32.log2_unsigned n)) }

memorytype :
| addrtype limits { fun c -> MemoryT ($1, $2, PageT 16) }
| addrtype limits pagetype { fun c -> MemoryT ($1, $2, $3) }
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hint: if you declare pagetype inline, then you can factor the empty case into it without having to duplicate rules at use sites:

Suggested change
pagetype :
| LPAR PAGESIZE NAT RPAR
{ let n = (nat32 $3 $loc($3)) in
if not (Lib.Int32.is_power_of_two_unsigned n) then
error (at $sloc) "invalid custom page size: must be power of two";
PageT (Int32.to_int (Lib.Int32.log2_unsigned n)) }
memorytype :
| addrtype limits { fun c -> MemoryT ($1, $2, PageT 16) }
| addrtype limits pagetype { fun c -> MemoryT ($1, $2, $3) }
%inline pagetype :
| LPAR PAGESIZE NAT RPAR
{ let n = (nat32 $3 $loc($3)) in
if not (Lib.Int32.is_power_of_two_unsigned n) then
error (at $sloc) "invalid custom page size: must be power of two";
PageT (Int32.to_int (Lib.Int32.log2_unsigned n)) }
| /* empty */ { PageT 0x1_0000l } /* Sugar */
memorytype :
| addrtype limits pagetype { fun c -> MemoryT ($1, $2, $3) }

titzer and others added 4 commits February 17, 2026 12:36
Co-authored-by: Andreas Rossberg <rossberg@mpi-sws.org>
Co-authored-by: Andreas Rossberg <rossberg@mpi-sws.org>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants