docs: explain extras in a bit more detail
parent ac39aed7c5
commit 63f48afaeb
2 changed files with 83 additions and 1 deletion
@@ -107,7 +107,7 @@ grammar rules themselves. These fields are:

- **`extras`** — an array of tokens that may appear *anywhere* in the language. This is often used for whitespace and
comments. The default value of `extras` is to accept whitespace. To control whitespace explicitly, specify
`extras: $ => []` in your grammar. See the section on [using extras][extras] for more details.

- **`inline`** — an array of rule names that should be automatically *removed* from the grammar by replacing all of their
usages with a copy of their definition. This is useful for rules that are used in multiple places but for which you *don't*
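To make the two fields above concrete, here is a minimal sketch of a `grammar.js` using both; the toy language and all rule names are invented for illustration and are not part of this commit:

```js
module.exports = grammar({
  name: "my_language",

  // Whitespace may appear between any two tokens.
  extras: ($) => [/\s/],

  // Every use of `_expression` is replaced by its definition,
  // so no intermediate `_expression` node appears in the parse tables.
  inline: ($) => [$._expression],

  rules: {
    source_file: ($) => repeat($._expression),
    _expression: ($) => choice($.identifier, $.number),
    identifier: ($) => /[a-z]+/,
    number: ($) => /[0-9]+/,
  },
});
```

Inlining is typically paired with hidden (underscore-prefixed) helper rules like `_expression`, which exist only to keep the grammar readable.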
@@ -144,6 +144,7 @@ empty array, signifying *no* keywords are reserved.

[bison-dprec]: https://www.gnu.org/software/bison/manual/html_node/Generalized-LR-Parsing.html
[ebnf]: https://en.wikipedia.org/wiki/Extended_Backus%E2%80%93Naur_form
[external-scanners]: ./4-external-scanners.md
[extras]: ./3-writing-the-grammar.md#using-extras
[keyword-extraction]: ./3-writing-the-grammar.md#keyword-extraction
[lexical vs parse]: ./3-writing-the-grammar.md#lexical-precedence-vs-parse-precedence
[lr-conflict]: https://en.wikipedia.org/wiki/LR_parser#Conflicts_in_the_constructed_tables
@@ -395,6 +395,87 @@ function_definition: $ =>

Adding fields like this allows you to retrieve nodes using the [field APIs][field-names-section].

## Using Extras

Extras are tokens that can appear anywhere in the grammar, without being explicitly mentioned in a rule. This is useful
for things like whitespace and comments, which can appear between any two tokens in most programming languages. To define
an extra, use the `extras` field:

```js
module.exports = grammar({
  name: "my_language",

  extras: ($) => [
    /\s/, // whitespace
    $.comment,
  ],

  rules: {
    comment: ($) =>
      token(
        choice(seq("//", /.*/), seq("/*", /[^*]*\*+([^/*][^*]*\*+)*/, "/")),
      ),
  },
});
```

```admonish warning
When adding more complicated tokens to `extras`, it's preferable to associate the pattern
with a rule. This way, you avoid the lexer inlining the pattern in many different places,
which can dramatically reduce the parser size.
```

For example, instead of defining the `comment` token inline in `extras`:

```js
// ❌ Less preferable

const comment = token(
  choice(seq("//", /.*/), seq("/*", /[^*]*\*+([^/*][^*]*\*+)*/, "/")),
);

module.exports = grammar({
  name: "my_language",
  extras: ($) => [
    /\s/, // whitespace
    comment,
  ],
  rules: {
    // ...
  },
});
```

We can define it as a rule and then reference it in `extras`:

```js
// ✅ More preferable

module.exports = grammar({
  name: "my_language",

  extras: ($) => [
    /\s/, // whitespace
    $.comment,
  ],

  rules: {
    // ...

    comment: ($) =>
      token(
        choice(seq("//", /.*/), seq("/*", /[^*]*\*+([^/*][^*]*\*+)*/, "/")),
      ),
  },
});
```

```admonish note
Tree-sitter intentionally simplifies some common regex patterns, both as a performance optimization and for simplicity,
typically in ways that don't affect the meaning of the pattern. For example, `\w` is simplified to `[a-zA-Z0-9_]`, `\s`
to `[ \t\n\r]`, and `\d` to `[0-9]`. If you need more complex behavior, you can always use a more explicit regex.
```
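As an illustration, since `\d` is simplified to `[0-9]`, a rule that needs more than plain digits can spell its character class out explicitly. The `number` rule below is a hypothetical sketch, not part of this commit:

```js
module.exports = grammar({
  name: "my_language",
  rules: {
    // `\d` would be simplified to [0-9]; writing the class explicitly
    // also lets it be extended, e.g. to allow `_` separators as in 1_000.
    number: ($) => /[0-9][0-9_]*/,
  },
});
```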

# Lexical Analysis

Tree-sitter's parsing process is divided into two phases: parsing (which is described above) and [lexing][lexing] — the