From 0109c877d57cdec5ea8ff70f4a76215a01b8b672 Mon Sep 17 00:00:00 2001
From: Amaan Qureshi
Date: Sun, 6 Aug 2023 22:30:21 -0400
Subject: [PATCH] docs: document regex limitations

---
 docs/section-3-creating-parsers.md | 14 ++++++++++++++
 1 file changed, 14 insertions(+)

diff --git a/docs/section-3-creating-parsers.md b/docs/section-3-creating-parsers.md
index dd6ef102..61e31b23 100644
--- a/docs/section-3-creating-parsers.md
+++ b/docs/section-3-creating-parsers.md
@@ -229,6 +229,20 @@ The following is a complete list of built-in functions you can use in your `gram
 
 * **Symbols (the `$` object)** - Every grammar rule is written as a JavaScript function that takes a parameter conventionally called `$`. The syntax `$.identifier` is how you refer to another grammar symbol within a rule. Names starting with `$.MISSING` or `$.UNEXPECTED` should be avoided as they have special meaning for the `tree-sitter test` command.
 * **String and Regex literals** - The terminal symbols in a grammar are described using JavaScript strings and regular expressions. Of course during parsing, Tree-sitter does not actually use JavaScript's regex engine to evaluate these regexes; it generates its own regex-matching logic as part of each parser. Regex literals are just used as a convenient way of writing regular expressions in your grammar.
+* **Regex Limitations** - Currently, only a subset of JavaScript's regex features is
+supported. This is because certain features, like lookahead and lookaround assertions,
+are not feasible to use in an LR(1) grammar, and certain flags are unnecessary
+for Tree-sitter. However, plenty of features are supported by default:
+
+  * Character classes
+  * Character ranges
+  * Character sets
+  * Quantifiers
+  * Alternation
+  * Grouping
+  * Unicode character escapes
+  * Unicode property escapes
+
 * **Sequences : `seq(rule1, rule2, ...)`** - This function creates a rule that matches any number of other rules, one after another. It is analogous to simply writing multiple symbols next to each other in [EBNF notation][ebnf].
 * **Alternatives : `choice(rule1, rule2, ...)`** - This function creates a rule that matches *one* of a set of possible rules. The order of the arguments does not matter. This is analogous to the `|` (pipe) operator in EBNF notation.
 * **Repetitions : `repeat(rule)`** - This function creates a rule that matches *zero-or-more* occurrences of a given rule. It is analogous to the `{x}` (curly brace) syntax in EBNF notation.
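
As a rough illustration of the regex features listed in the patch above, the sketch below gives one regex literal per supported feature. The object name `supported`, its keys, and the patterns themselves are hypothetical examples, not taken from the Tree-sitter sources. The literals are exercised with JavaScript's own engine only to show what each pattern matches; as the docs note, Tree-sitter generates its own matching logic.

```javascript
// Hypothetical example patterns, one per feature named in the list.
// Evaluated with JavaScript's engine purely for illustration.
const supported = {
  characterClass: /\d/,        // character class escape
  characterRange: /[a-f]/,     // character range inside brackets
  characterSet: /[xyz_]/,      // explicit character set
  quantifier: /a{2,3}/,        // bounded quantifier
  alternation: /true|false/,   // alternation
  grouping: /(ab)+/,           // grouping with a quantifier
  unicodeEscape: /\u00e9/,     // Unicode character escape (U+00E9)
  unicodeProperty: /\p{L}+/u,  // Unicode property escape (plain JavaScript needs the `u` flag)
};

// A lookahead assertion: valid JavaScript, but the kind of feature the
// docs describe as not feasible in an LR(1) grammar.
const unsupportedLookahead = /foo(?=bar)/;

console.log(Object.keys(supported).length); // number of example patterns
```

In a grammar, a literal like these would appear as the body of a terminal rule; the lookahead pattern is included only as a contrast case.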