Commit 11c67b4
Add support for semantic predicates
This PR adds support for semantic predicates to Lrama, allowing conditional enabling of grammar rules based on runtime conditions.

Syntax:

```yacc
rule : {expression}? TOKEN { action }
     | TOKEN { action }
     ;
```

The predicate `{expression}?` is evaluated at parse time. If it returns true (non-zero), the alternative is enabled.

Example:

```yacc
widget : {new_syntax}? WIDGET ID NEW_ARG { printf("New syntax\n"); }
       | {!new_syntax}? WIDGET ID OLD_ARG { printf("Old syntax\n"); }
       ;
```

Motivation / Background

Semantic predicates enable context-sensitive parsing, which is useful for:

- Version-dependent syntax (e.g., supporting both old and new language features)
- Context-sensitive keywords (e.g., `async` in JavaScript, which behaves differently depending on context)
- Conditional grammar rules based on parser state

This feature is similar to ANTLR4's semantic predicates and fills a gap in Lrama's capabilities for handling context-dependent grammars. Leading predicates (at the start of a rule) affect prediction, while trailing predicates act as validation.
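The selection behaviour described above can be modelled in plain Ruby. This is a hypothetical simplification for illustration only (`Alternative` and `enabled_alternatives` are not Lrama API; the real parser evaluates predicates in generated C code):

```ruby
# Hypothetical model of predicate-guarded alternatives: each alternative
# carries an optional guard; only alternatives whose guard passes (or
# that have no guard) remain enabled at parse time.
Alternative = Struct.new(:guard, :name)

def enabled_alternatives(alternatives)
  alternatives.select { |alt| alt.guard.nil? || alt.guard.call }
end

new_syntax = true
alts = [
  Alternative.new(-> { new_syntax },  "WIDGET ID NEW_ARG"),
  Alternative.new(-> { !new_syntax }, "WIDGET ID OLD_ARG"),
]

enabled_alternatives(alts).map(&:name) # => ["WIDGET ID NEW_ARG"]
```

With `new_syntax` set to true, only the first alternative survives, which is exactly the conditional enabling the feature provides.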
1 parent ec4238a commit 11c67b4

File tree

23 files changed: +1210 −391 lines

NEWS.md

Lines changed: 27 additions & 0 deletions

````diff
@@ -2,6 +2,33 @@
 
 ## Lrama 0.7.1 (2025-xx-xx)
 
+### Semantic Predicates
+
+Support semantic predicates to conditionally enable grammar rules based on runtime conditions.
+Predicates are evaluated at parse time, similar to ANTLR4's semantic predicates.
+
+```yacc
+rule : {expression}? TOKEN { action }
+     | TOKEN { action }
+     ;
+```
+
+The predicate `{expression}?` is evaluated at parse time. If it returns true (non-zero), the alternative is enabled.
+
+Example:
+
+```yacc
+widget
+  : {new_syntax}? WIDGET ID NEW_ARG
+      { printf("New syntax\n"); }
+  | {!new_syntax}? WIDGET ID OLD_ARG
+      { printf("Old syntax\n"); }
+  ;
+```
+
+Predicates are compiled into static functions in the generated parser.
+Leading predicates (at the start of a rule) affect prediction, while trailing predicates act as validation.
+
 ### Syntax Diagrams
 
 Lrama provides an API for generating HTML syntax diagrams. These visual diagrams are highly useful as grammar development tools and can also serve as a form of automatic self-documentation.
````

lib/lrama/grammar.rb

Lines changed: 5 additions & 3 deletions

```diff
@@ -16,6 +16,7 @@
 require_relative "grammar/reference"
 require_relative "grammar/rule"
 require_relative "grammar/rule_builder"
+require_relative "grammar/semantic_predicate"
 require_relative "grammar/symbol"
 require_relative "grammar/symbols"
 require_relative "grammar/type"
@@ -106,9 +107,10 @@ class Grammar
       :find_symbol_by_s_value!, :fill_symbol_number, :fill_nterm_type,
       :fill_printer, :fill_destructor, :fill_error_token, :sort_by_number!
 
-    # @rbs (Counter rule_counter, bool locations, Hash[String, String] define) -> void
-    def initialize(rule_counter, locations, define = {})
+    # @rbs (Counter rule_counter, Counter predicate_counter, bool locations, Hash[String, String] define) -> void
+    def initialize(rule_counter, predicate_counter, locations, define = {})
       @rule_counter = rule_counter
+      @predicate_counter = predicate_counter
 
       # Code defined by "%code"
       @percent_codes = []
@@ -139,7 +141,7 @@ def initialize(rule_counter, locations, define = {})
 
     # @rbs (Counter rule_counter, Counter midrule_action_counter) -> RuleBuilder
     def create_rule_builder(rule_counter, midrule_action_counter)
-      RuleBuilder.new(rule_counter, midrule_action_counter, @parameterized_resolver)
+      RuleBuilder.new(rule_counter, midrule_action_counter, @parameterized_resolver, @predicate_counter)
     end
 
     # @rbs (id: Lexer::Token::Base, code: Lexer::Token::UserCode) -> Array[PercentCode]
```

lib/lrama/grammar/inline/resolver.rb

Lines changed: 1 addition & 0 deletions

```diff
@@ -33,6 +33,7 @@ def build_rule(rhs, token, index, rule)
           @rule_builder.rule_counter,
           @rule_builder.midrule_action_counter,
           @rule_builder.parameterized_resolver,
+          @rule_builder.predicate_counter,
           lhs_tag: @rule_builder.lhs_tag
         )
         resolve_rhs(builder, rhs, index, token, rule)
```

lib/lrama/grammar/rule.rb

Lines changed: 7 additions & 6 deletions

```diff
@@ -23,14 +23,15 @@ class Rule < Struct.new(:id, :_lhs, :lhs, :lhs_tag, :_rhs, :rhs, :token_code, :p
       # attr_accessor nullable: bool
       # attr_accessor precedence_sym: Grammar::Symbol?
       # attr_accessor lineno: Integer?
-      #
-      # def initialize: (
-      #   ?id: Integer, ?_lhs: Lexer::Token::Base?, ?lhs: Lexer::Token::Base, ?lhs_tag: Lexer::Token::Tag?, ?_rhs: Array[Lexer::Token::Base], ?rhs: Array[Grammar::Symbol],
-      #   ?token_code: Lexer::Token::UserCode?, ?position_in_original_rule_rhs: Integer?, ?nullable: bool,
-      #   ?precedence_sym: Grammar::Symbol?, ?lineno: Integer?
-      # ) -> void
 
       attr_accessor :original_rule #: Rule
+      attr_accessor :predicates #: Array[Grammar::SemanticPredicate]
+
+      # @rbs (**untyped kwargs) -> void
+      def initialize(**kwargs)
+        super(**kwargs)
+        @predicates = []
+      end
 
       # @rbs (Rule other) -> bool
       def ==(other)
```

lib/lrama/grammar/rule_builder.rb

Lines changed: 25 additions & 4 deletions

```diff
@@ -17,22 +17,26 @@ class RuleBuilder
     # @parameterized_rules: Array[Rule]
     # @midrule_action_rules: Array[Rule]
     # @replaced_rhs: Array[Lexer::Token::Base]?
+    # @predicates: Array[[Lexer::Token::SemanticPredicate, bool]]
 
     attr_accessor :lhs #: Lexer::Token::Base?
     attr_accessor :line #: Integer?
     attr_reader :rule_counter #: Counter
     attr_reader :midrule_action_counter #: Counter
     attr_reader :parameterized_resolver #: Grammar::Parameterized::Resolver
+    attr_reader :predicate_counter #: Counter
     attr_reader :lhs_tag #: Lexer::Token::Tag?
     attr_reader :rhs #: Array[Lexer::Token::Base]
     attr_reader :user_code #: Lexer::Token::UserCode?
     attr_reader :precedence_sym #: Grammar::Symbol?
+    attr_reader :predicates
 
-    # @rbs (Counter rule_counter, Counter midrule_action_counter, Grammar::Parameterized::Resolver parameterized_resolver, ?Integer position_in_original_rule_rhs, ?lhs_tag: Lexer::Token::Tag?, ?skip_preprocess_references: bool) -> void
-    def initialize(rule_counter, midrule_action_counter, parameterized_resolver, position_in_original_rule_rhs = nil, lhs_tag: nil, skip_preprocess_references: false)
+    # @rbs (Counter rule_counter, Counter midrule_action_counter, Grammar::Parameterized::Resolver parameterized_resolver, Counter? predicate_counter, ?Integer position_in_original_rule_rhs, ?lhs_tag: Lexer::Token::Tag?, ?skip_preprocess_references: bool) -> void
+    def initialize(rule_counter, midrule_action_counter, parameterized_resolver, predicate_counter = nil, position_in_original_rule_rhs = nil, lhs_tag: nil, skip_preprocess_references: false)
       @rule_counter = rule_counter
       @midrule_action_counter = midrule_action_counter
       @parameterized_resolver = parameterized_resolver
+      @predicate_counter = predicate_counter || Counter.new(0)
       @position_in_original_rule_rhs = position_in_original_rule_rhs
       @skip_preprocess_references = skip_preprocess_references
 
@@ -41,6 +45,7 @@ def initialize(rule_counter, midrule_action_counter, parameterized_resolver, pos
       @rhs = []
       @user_code = nil
       @precedence_sym = nil
+      @predicates = []
       @line = nil
       @rules = []
       @rule_builders_for_parameterized = []
@@ -74,6 +79,14 @@ def precedence_sym=(precedence_sym)
       @precedence_sym = precedence_sym
     end
 
+    # @rbs (Lexer::Token::SemanticPredicate predicate) -> void
+    def add_predicate(predicate)
+      @line ||= predicate.line
+      flush_user_code
+      predicate_with_position = [predicate, @rhs.empty?]
+      @predicates << predicate_with_position
+    end
+
     # @rbs () -> void
     def complete_input
       freeze_rhs
@@ -118,6 +131,14 @@ def build_rules
         id: @rule_counter.increment, _lhs: lhs, _rhs: tokens, lhs_tag: lhs_tag, token_code: user_code,
         position_in_original_rule_rhs: @position_in_original_rule_rhs, precedence_sym: precedence_sym, lineno: line
       )
+
+      rule.predicates = @predicates.map do |(pred_token, is_leading)|
+        pred = Grammar::SemanticPredicate.new(pred_token)
+        pred.index = @predicate_counter.increment
+        pred.position = is_leading ? :leading : :trailing
+        pred
+      end
+
       @rules = [rule]
       @parameterized_rules = @rule_builders_for_parameterized.map do |rule_builder|
         rule_builder.rules
@@ -158,7 +179,7 @@ def process_rhs
           replaced_rhs << lhs_token
           @parameterized_resolver.created_lhs_list << lhs_token
           parameterized_rule.rhs.each do |r|
-            rule_builder = RuleBuilder.new(@rule_counter, @midrule_action_counter, @parameterized_resolver, lhs_tag: token.lhs_tag || parameterized_rule.tag)
+            rule_builder = RuleBuilder.new(@rule_counter, @midrule_action_counter, @parameterized_resolver, @predicate_counter, lhs_tag: token.lhs_tag || parameterized_rule.tag)
             rule_builder.lhs = lhs_token
             r.symbols.each { |sym| rule_builder.add_rhs(bindings.resolve_symbol(sym)) }
             rule_builder.line = line
@@ -175,7 +196,7 @@ def process_rhs
           new_token = Lrama::Lexer::Token::Ident.new(s_value: prefix + @midrule_action_counter.increment.to_s)
           replaced_rhs << new_token
 
-          rule_builder = RuleBuilder.new(@rule_counter, @midrule_action_counter, @parameterized_resolver, i, lhs_tag: tag, skip_preprocess_references: true)
+          rule_builder = RuleBuilder.new(@rule_counter, @midrule_action_counter, @parameterized_resolver, @predicate_counter, i, lhs_tag: tag, skip_preprocess_references: true)
           rule_builder.lhs = new_token
           rule_builder.user_code = token
           rule_builder.complete_input
```
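The leading/trailing classification in `add_predicate` hinges on whether any RHS symbols have been seen when the predicate arrives. A stand-alone sketch of that bookkeeping (`MiniRuleBuilder` is a simplified illustration, not the real class; the real builder also flushes pending user code and tracks tokens rather than strings):

```ruby
# Simplified model of RuleBuilder's predicate bookkeeping: a predicate
# added before any RHS token is :leading, anything later is :trailing.
class MiniRuleBuilder
  attr_reader :rhs, :predicates

  def initialize
    @rhs = []
    @predicates = []
  end

  def add_rhs(token)
    @rhs << token
  end

  def add_predicate(code)
    position = @rhs.empty? ? :leading : :trailing
    @predicates << [code, position]
  end
end

b = MiniRuleBuilder.new
b.add_predicate("new_syntax")   # before any RHS token -> :leading
b.add_rhs("WIDGET")
b.add_predicate("in_scope(p)")  # after a token -> :trailing
b.predicates # => [["new_syntax", :leading], ["in_scope(p)", :trailing]]
```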
lib/lrama/grammar/semantic_predicate.rb (new file)

Lines changed: 50 additions & 0 deletions

```diff
@@ -0,0 +1,50 @@
+# rbs_inline: enabled
+# frozen_string_literal: true
+
+module Lrama
+  class Grammar
+    class SemanticPredicate
+      # @rbs!
+      #   type position = :leading | :trailing | :middle | :unknown
+
+      attr_reader :token #: Lexer::Token::SemanticPredicate
+      attr_reader :code #: String
+      attr_accessor :position #: position
+      attr_accessor :index #: Integer?
+
+      # @rbs (Lexer::Token::SemanticPredicate token) -> void
+      def initialize(token)
+        @token = token
+        @code = token.code
+        @position = :unknown
+        @index = nil
+      end
+
+      # @rbs () -> bool
+      def visible?
+        @position == :leading
+      end
+
+      # @rbs () -> String
+      def function_name
+        raise "Predicate index not set" if @index.nil?
+        "yypredicate_#{@index}"
+      end
+
+      # @rbs () -> String
+      def error_message
+        "semantic predicate failed: {#{code}}?"
+      end
+
+      # @rbs () -> Lexer::Location
+      def location
+        @token.location
+      end
+
+      # @rbs () -> String
+      def to_s
+        "{#{code}}?"
+      end
+    end
+  end
+end
```
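To see how the pieces of this class compose, here is a condensed, dependency-free stand-in (`PredicateSketch` and `MockToken` are illustrative names; the mock replaces `Lexer::Token::SemanticPredicate`, which is not loadable here):

```ruby
# Condensed stand-in for Grammar::SemanticPredicate, for illustration.
MockToken = Struct.new(:code, keyword_init: true)

class PredicateSketch
  attr_reader :code
  attr_accessor :position, :index

  def initialize(token)
    @code = token.code
    @position = :unknown
    @index = nil
  end

  # Only leading predicates participate in prediction.
  def visible?
    @position == :leading
  end

  # Names the static C function the predicate compiles into.
  def function_name
    raise "Predicate index not set" if @index.nil?
    "yypredicate_#{@index}"
  end

  def to_s
    "{#{code}}?"
  end
end

pred = PredicateSketch.new(MockToken.new(code: "new_syntax"))
pred.index = 0
pred.position = :leading
pred.function_name # => "yypredicate_0"
pred.to_s          # => "{new_syntax}?"
```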

lib/lrama/lexer.rb

Lines changed: 73 additions & 1 deletion

```diff
@@ -18,7 +18,8 @@ class Lexer
     #   [::Symbol, Token::Char] |
     #   [::Symbol, Token::Str] |
     #   [::Symbol, Token::Int] |
-    #   [::Symbol, Token::Ident]
+    #   [::Symbol, Token::Ident] |
+    #   [::Symbol, Token::SemanticPredicate]
     #
     # type c_token = [:C_DECLARATION, Token::UserCode]
 
@@ -119,6 +120,13 @@ def lex_token
       case
       when @scanner.eos?
         return
+      when @scanner.check(/{/)
+        if predicate_token = try_scan_semantic_predicate
+          return [:SEMANTIC_PREDICATE, predicate_token]
+        else
+          @scanner.scan(/{/)
+          return [@scanner.matched, Lrama::Lexer::Token::Token.new(s_value: @scanner.matched, location: location)]
+        end
       when @scanner.scan(/#{SYMBOLS.join('|')}/)
         return [@scanner.matched, Lrama::Lexer::Token::Token.new(s_value: @scanner.matched, location: location)]
       when @scanner.scan(/#{PERCENT_TOKENS.join('|')}/)
@@ -191,6 +199,70 @@ def lex_c_code
 
     private
 
+    # @rbs () -> Lrama::Lexer::Token::SemanticPredicate?
+    def try_scan_semantic_predicate
+      start_pos = @scanner.pos
+      start_line = @line
+      start_head = @head
+      return nil unless @scanner.scan(/{/)
+
+      code = +''
+      nested = 1
+      until @scanner.eos? do
+        case
+        when @scanner.scan(/{/)
+          code << @scanner.matched
+          nested += 1
+        when @scanner.scan(/}/)
+          if nested == 1
+            if @scanner.scan(/\?/)
+              return Lrama::Lexer::Token::SemanticPredicate.new(
+                s_value: "{#{code}}?",
+                code: code.strip,
+                location: location
+              )
+            else
+              @scanner.pos = start_pos
+              @line = start_line
+              @head = start_head
+              return nil
+            end
+          else
+            code << @scanner.matched
+            nested -= 1
+          end
+        when @scanner.scan(/\n/)
+          code << @scanner.matched
+          newline
+        when @scanner.scan(/"[^"]*"/)
+          code << @scanner.matched
+          @line += @scanner.matched.count("\n")
+        when @scanner.scan(/'[^']*'/)
+          code << @scanner.matched
+        when @scanner.scan(/\/\*/)
+          code << @scanner.matched
+          until @scanner.eos?
+            if @scanner.scan_until(/\*\//)
+              code << @scanner.matched
+              @scanner.matched.count("\n").times { newline }
+              break
+            end
+          end
+        when @scanner.scan(/\/\/[^\n]*/)
+          code << @scanner.matched
+        when @scanner.scan(/[^{}"'\n\/]+/)
+          code << @scanner.matched
+        else
+          code << @scanner.getch
+        end
+      end
+
+      @scanner.pos = start_pos
+      @line = start_line
+      @head = start_head
+      nil
+    end
+
     # @rbs () -> void
     def lex_comment
       until @scanner.eos? do
```
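The lookahead in `try_scan_semantic_predicate` is speculative: it only commits when the balanced `{...}` is immediately followed by `?`, otherwise the scanner position is restored so the `{` lexes as an ordinary action opener. A minimal stand-alone sketch of that core logic (`scan_predicate` is a hypothetical helper; nesting and rollback only, string/comment handling omitted):

```ruby
require 'strscan'

# Scan "{ ... }?" with balanced braces; return the inner code, or nil
# (restoring the scanner position) if the trailing "?" is missing.
def scan_predicate(scanner)
  start_pos = scanner.pos
  return nil unless scanner.scan(/{/)

  code = +''
  nested = 1
  until scanner.eos?
    if scanner.scan(/{/)
      nested += 1
      code << '{'
    elsif scanner.scan(/}/)
      if nested == 1
        return code.strip if scanner.scan(/\?/)
        break # balanced "{...}" without "?": an action, not a predicate
      end
      nested -= 1
      code << '}'
    else
      code << scanner.getch
    end
  end

  scanner.pos = start_pos # roll back so the caller can lex "{" normally
  nil
end

scan_predicate(StringScanner.new("{new_syntax}? WIDGET")) # => "new_syntax"
```

On a plain action such as `{ action }` the helper returns nil and leaves the scanner where it started, mirroring the restore of `@scanner.pos`, `@line`, and `@head` in the diff above.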

lib/lrama/lexer/token.rb

Lines changed: 1 addition & 0 deletions

```diff
@@ -7,6 +7,7 @@
 require_relative 'token/ident'
 require_relative 'token/instantiate_rule'
 require_relative 'token/int'
+require_relative 'token/semantic_predicate'
 require_relative 'token/str'
 require_relative 'token/tag'
 require_relative 'token/token'
```
lib/lrama/lexer/token/semantic_predicate.rb (new file)

Lines changed: 23 additions & 0 deletions

```diff
@@ -0,0 +1,23 @@
+# rbs_inline: enabled
+# frozen_string_literal: true
+
+module Lrama
+  class Lexer
+    module Token
+      class SemanticPredicate < Base
+        attr_reader :code #: String
+
+        # @rbs (s_value: String, code: String, ?location: Location) -> void
+        def initialize(s_value:, code:, location: nil)
+          super(s_value: s_value, location: location)
+          @code = code.freeze
+        end
+
+        # @rbs () -> String
+        def to_s
+          "semantic_predicate: `{#{code}}?`, location: #{location}"
+        end
+      end
+    end
+  end
+end
```

lib/lrama/output.rb

Lines changed: 20 additions & 0 deletions

```diff
@@ -235,6 +235,26 @@ def symbol_actions_for_error_token
       end.join
     end
 
+    # Generate semantic predicate functions
+    def predicate_functions
+      all_predicates = @grammar.rules.flat_map(&:predicates).compact.uniq { |p| p.index }
+      return "" if all_predicates.empty?
+
+      functions = all_predicates.map do |predicate|
+        <<-STR
+/* Semantic predicate: {#{predicate.code}}? */
+static int
+#{predicate.function_name} (void)
+{
+  return (#{predicate.code});
+}
+
+        STR
+      end
+
+      functions.join
+    end
+
     # b4_user_actions
     def user_actions
       action = @context.states.rules.map do |rule|
```
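Each predicate thus becomes a tiny static C function in the generated parser. A sketch of the expansion using the same template shape (`Pred` and this free-standing `predicate_functions` are mocks for illustration, standing in for `Grammar::SemanticPredicate` and the `Output` method):

```ruby
# Mock predicates standing in for Grammar::SemanticPredicate objects.
Pred = Struct.new(:code, :index, keyword_init: true) do
  def function_name
    "yypredicate_#{index}"
  end
end

# Expand each predicate into a static C function, mirroring the
# heredoc template used in the diff above.
def predicate_functions(predicates)
  return "" if predicates.empty?

  predicates.map do |predicate|
    <<~STR
      /* Semantic predicate: {#{predicate.code}}? */
      static int
      #{predicate.function_name} (void)
      {
        return (#{predicate.code});
      }
    STR
  end.join("\n")
end

puts predicate_functions([Pred.new(code: "new_syntax", index: 0)])
```

Returning `int` from a function of no arguments matches the parse-time contract: non-zero enables the alternative, zero disables it.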

0 commit comments