This extension adds support for an EBNF-like syntax (Extended Backus-Naur Form) to Visual Studio Code.
- Syntax highlighting + semantic highlighting
- Basic error checking
  - Syntax errors
  - Undefined symbols
  - Duplicate symbols
- Go to definition
- Find all references
- Document symbols (go to symbol, outline)
- Basic code completion
  - Rule names
- Hover information
  - Rule name and definition
- Code folding
- Railroad diagram generation
This extension implements a simple and strict-ish version of EBNF. The syntax is defined in itself in ebnf.ebnf.
The dialect implemented mostly follows the ISO/IEC 14977 standard, with some extensions for clarity and convenience.
Comments are defined using the (* and *) delimiters.
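A minimal sketch of comment placement, based only on the description above and on the trailing comments used in the examples elsewhere in this document. The rule name digit is purely illustrative.

```ebnf
(* A comment on its own line. *)
digit = "0".."9" ; (* A comment trailing a rule. *)
```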
Rules are defined using the assignment operator =. The left-hand side is the rule name, and the right-hand side is an expression. Rules must end with a semicolon ;.
Rule names can start with any letter, number, or an underscore. They can also contain a hyphen, but not at the beginning. Rule names are case-sensitive.
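The naming rules above can be sketched as follows. These rule names are hypothetical and were derived from the description, not checked against the extension.

```ebnf
greeting = "hello" ;   (* starts with a letter *)
_hidden  = "x" ;       (* starts with an underscore *)
2nd-item = "y" ;       (* starts with a number, hyphen in the middle *)
Greeting = "Hello" ;   (* a different rule from greeting: names are case-sensitive *)
(* A name such as -item would not be valid, since a hyphen cannot come first. *)
```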
Expressions are made up of terms and operators. Terms are either literals, references to other rules (by name), special cases, groups, or ranges. Operators are used to combine terms into more complex expressions.
Literals are enclosed in single quotes or double quotes. They can contain any character except for the quote character used to enclose them. No escaping is considered, so you can't use a single quote inside a single-quoted literal, or a double quote inside a double-quoted literal. How to interpret sequences like \n is up to the reader. Both literals and special cases can be multiline.
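Since there is no escaping, the usual workaround is to pick the other quote style, or to concatenate two literals when both quote characters are needed. A short sketch with illustrative rule names:

```ebnf
quoted-word = 'say "hi"' ;            (* double quotes inside a single-quoted literal *)
contraction = "it's" ;                (* single quote inside a double-quoted literal *)
both        = "it's a " , '"word"' ;  (* two literals cover both quote characters *)
newline     = "\n" ;                  (* a backslash followed by n; its meaning is left to the reader *)
```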
Special cases are used to describe content that cannot be easily expressed using the other terms. They are enclosed in question marks ?, and can have multiple lines.
? any character ?
? valid UTF-8 ?
There are three different types of groups:
- Parentheses (group) are only used to group terms together.
- Brackets (optional) indicate that the content inside is optional, i.e. it can appear zero or one times.
- Braces (repetition) indicate that the content inside can appear zero or more times.
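The three group types can be combined in one small grammar. This is a sketch with hypothetical rule names; it also uses a range and the alternation operator, both of which are described further down.

```ebnf
sign    = "+" | "-" ;
digit   = "0".."9" ;
integer = [ sign ] , digit , { digit } ;   (* optional sign, then one digit, then zero or more digits *)
pair    = ( "a" | "b" ) , ( "x" | "y" ) ;  (* "ax", "ay", "bx", or "by" *)
```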
Ranges are used to define a set of contiguous characters. They are composed of two strings joined by two dots (..).
The dialect does not specify what a range "is"; it should be obvious from context what the range represents. For example, a range of "A".."Z" is probably a set of uppercase letters, while a range of "0".."9" is probably a set of digits.
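A short sketch of ranges used inside rules, with illustrative rule names. The comments give the probable reading, as the interpretation of a range is left to the reader.

```ebnf
upper     = "A".."Z" ;                      (* probably the uppercase letters *)
digit     = "0".."9" ;                      (* probably the decimal digits *)
hex-digit = digit | "a".."f" | "A".."F" ;   (* ranges combined with alternation *)
```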
Concatenation can be defined using the comma , operator between terms or by juxtaposition of terms.
The dialect does not define what whitespace is allowed between concatenated terms; the reader is assumed to know what is and isn't allowed.
"A", "B", "C" (* probably "ABC" *)
"fn" name "()" (* probably "fn foo()" *)
The alternation operator is the pipe |. It is used to define a set of possible choices for a term.
"A" | "B" | "C" (* "A", "B", or "C" *)
"A", ( "B" | "C" ) (* "AB" or "AC" *)
The exclusion operator is the minus sign -. It defines a set of possible choices for a term while excluding one or more of them.
letter = "A".."Z" ;
not_z = letter - "Z" ; (* "A".."Y" *)
The postfix operators + and - modify the preceding term to indicate that it occurs "one or more" times. The following forms are equivalent:
many-as = { "a" }+ ; (* "a", "aa", "aaa", ... but not "" *)
many-as = { "a" }- ;
many-as = { "a" } - '' ;
Note
The - operator is also valid as an infix operator (see Exclusion). Thus,
when another term follows a postfix -, it will be interpreted as an
exclusion instead of a concatenation. Adding a comma directly after a
postfix - can be used to disambiguate this case, but can be confusing and
error-prone:
ooof = { "o" }-, "f" ;
Usage of - as a postfix operator is therefore discouraged. Using +, although
not part of ISO/IEC 14977, is recommended instead.
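As a sketch of the recommended form, the same rule can be written with + instead. Because + is only described here as a postfix operator, no disambiguating comma should be needed, though one can still be added for readability. This is inferred from the description above rather than checked against the extension; the two lines are alternative spellings of one rule, so a real grammar would keep only one.

```ebnf
ooof = { "o" }+ "f" ;     (* concatenation by juxtaposition: "of", "oof", "ooof", ... *)
ooof = { "o" }+ , "f" ;   (* the same rule with an explicit comma *)
```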
