Commit b14a7e1

Describe restrictions around whitespace
1 parent 74ab3a1 commit b14a7e1

1 file changed: +16 -9 lines changed

spec/01-lexical-grammar.md

Lines changed: 16 additions & 9 deletions
@@ -16,20 +16,16 @@ whitespace, and comments. The rest of this chapter describes this subdivision.
 
 ## 1.2 Whitespace
 
-A maximal contiguous sequence of the following characters is one unit of
-_whitespace_:
+A maximal contiguous sequence of the following characters, not occurring inside
+a token or a comment, is one unit of _whitespace_:
 
 * `\u0009` horizontal tab
 * `\u000a` line feed
 * `\u000d` carriage return
 * `\u0020` space
 
-The rule for recognizing whitespace is valid only between tokens, not within
-them; when the above characters occur within a string literal or a comment,
-they are not considered to be whitespace.
-
 The tab character is allowed but its use is discouraged. It's recommended to
-use four spaces for indentation.
+use four spaces for indenting Alma code.
 
 The UTF-8 byte-order mark ("BOM") at the beginning of a compilation unit is
 recognized and ignored as whitespace.
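
The whitespace rule in this hunk could be sketched roughly as follows. This is an illustrative Python sketch, not Alma's lexer: the name `skip_whitespace` is hypothetical, and it assumes the source has already been decoded to a string (so the BOM appears as `\ufeff`) and that the caller only invokes it between tokens, never inside a token or a comment.

```python
# The four characters listed in section 1.2 of the spec.
WHITESPACE_CHARS = frozenset("\u0009\u000a\u000d\u0020")
# Assumption: input is decoded text, so the UTF-8 BOM shows up as U+FEFF.
BOM = "\ufeff"


def skip_whitespace(source: str, pos: int) -> int:
    """Return the index just past the maximal whitespace run starting at pos.

    Hypothetical helper: the caller is responsible for calling it only
    between tokens, since the rule does not apply inside tokens or comments.
    """
    # The BOM is treated as whitespace only at the start of the compilation unit.
    if pos == 0 and source.startswith(BOM):
        pos = 1
    # Consume as many whitespace characters as possible (maximal run).
    while pos < len(source) and source[pos] in WHITESPACE_CHARS:
        pos += 1
    return pos
```

Because the run is maximal, two adjacent spaces never form two separate whitespace units; the function always advances to the first non-whitespace character.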
@@ -108,6 +104,12 @@ Although the allowed literals in a language normally form a closed set, in Alma
 this set can be extended. For more, see [Chapter 18: Extending the
 lexer](18-extending-the-lexer.md).
 
+Integer literals and boolean literals are known as _bare literals_, whereas
+string literals are known as _enclosed literals_. The fact that string literals
+begin and end with a designated character (the `"` character, in this case)
+makes them enclosed. When extending the set of literals to new forms, bare
+literals are not allowed to contain whitespace, but enclosed literals are.
+
 ## 1.5 Keywords
 
 The following words are _keywords_ in Alma:
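
The bare/enclosed distinction added in this hunk amounts to a simple check on new literal forms. The sketch below is a guess at how such a check might look; `is_valid_literal_text` and its `enclosed` flag are hypothetical names, and the actual extension mechanism described in Chapter 18 is not shown in this diff.

```python
# The four whitespace characters from section 1.2 of the spec.
WHITESPACE_CHARS = frozenset("\u0009\u000a\u000d\u0020")


def is_valid_literal_text(text: str, enclosed: bool) -> bool:
    """Check the whitespace restriction for a new literal form.

    Hypothetical helper: `enclosed` says whether the literal form begins
    and ends with a designated character, as string literals do with '"'.
    """
    if enclosed:
        # Enclosed literals may contain whitespace.
        return True
    # Bare literals may not contain any whitespace character.
    return not any(ch in WHITESPACE_CHARS for ch in text)
```

For example, a bare literal spelled `4 2` would be rejected, while an enclosed literal such as `"a b"` passes.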
@@ -135,14 +137,19 @@ The following words are _keywords_ in Alma:
 * `while`
 
 Along with the alphabetic literals `none`, `true`, and `false`, the keywords
-are _reserved words_: they can not be used as new names in declarations.
-However, they can still be used for unscoped literals such as dictionary
+are _reserved words_: they may not be used as new names in declarations.
+However, they can still be used for unscoped names such as dictionary
 keys and object properties.
 
 Although the keywords in a language normally form a closed set, in Alma this
 set can be extended. For more, see [Chapter 18: Extending the
 lexer](18-extending-the-lexer.md).
 
+When adding new keywords to this set, the keywords are limited to the narrow
+set of identifiers described below. Specifically, added keywords may contain
+neither whitespace characters nor non-alphanumeric characters which are only
+allowed in operator names.
+
 ## 1.6 Identifiers
 
 An _identifier_ begins with an alphabetic ASCII character or underscore (`_`),
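
The restriction on added keywords might be checked along these lines. This is only a sketch: `is_valid_new_keyword` is a hypothetical name, and because the identifier rule is cut off in this diff, the set of characters allowed after the first one (ASCII letters, digits, underscore) is an assumption rather than something the spec text above confirms.

```python
import string

# From the visible spec text: an identifier begins with an ASCII letter or "_".
_START_CHARS = frozenset(string.ascii_letters + "_")
# ASSUMPTION: the continuation rule is truncated in the diff; ASCII letters,
# digits, and underscore are assumed here for illustration only.
_CONTINUE_CHARS = _START_CHARS | frozenset(string.digits)


def is_valid_new_keyword(word: str) -> bool:
    """Return True if `word` fits the (assumed) narrow identifier form.

    Rejects whitespace and operator-style characters, since neither is in
    the allowed character sets.
    """
    if not word or word[0] not in _START_CHARS:
        return False
    return all(ch in _CONTINUE_CHARS for ch in word[1:])
```

Under these assumptions, `unless` would be accepted as a new keyword, while `do while` (whitespace) and `a+b` (operator character) would be rejected.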
