The `pos` and `size` functions for Nodes now return TSLength structs,
which contain lengths in both characters and bytes. This is important
for knowing the number of unicode characters in a Node.
This reverts commit 5cd07648fd.
The separators construct is useful as an optimization. It turns out that
constructing a node for every chunk of whitespace in a document causes a
significant performance regression.
Conflicts:
src/compiler/build_tables/build_lex_table.cc
src/compiler/grammar.cc
src/runtime/parser.c
When breaking down the stack in parser.c, the previous code
would not account for ubiquitous tokens. This was a problem
for a long time, but wasn't noticed until ubiquitous tokens
started being used to represent separator characters
Now, grammars can handle whitespace by making it another ubiquitous
token, like comments.
For now, this has the side effect of whitespace being included in the
tree that precedes it. This was already an issue for other ubiquitous
tokens though, so it needs to be fixed anyway.
- Node position is public. It represents the node's first character
index in the document.
- Tree offset is private. It represents the distance between the tree's
first character index and it's parent's first character index.
- Tree padding is private. It represents the amount of whitespace
(or other separator characters) immediately preceding the tree.
Previously, if an error happened right at the beginning of an error
production, the error node would be immediately shifted onto the stack
without calling the error handling function.
The lexer doesn't know the expected symbols, so it doesn't have enough
information to construct error nodes. Now, when it encounters an invalid
character, it returns NULL and the parser builds a correct error node.