The `pos` and `size` functions for Nodes now return TSLength structs,
which contain lengths in both characters and bytes. This is important
for knowing the number of unicode characters in a Node.
This reverts commit 5cd07648fd.
The separators construct is useful as an optimization. It turns out that
constructing a node for every chunk of whitespace in a document causes a
significant performance regression.
Conflicts:
src/compiler/build_tables/build_lex_table.cc
src/compiler/grammar.cc
src/runtime/parser.c
Now, grammars can handle whitespace by making it another ubiquitous
token, like comments.
For now, this has the side effect of whitespace being included in the
tree that precedes it. This was already an issue for other ubiquitous
tokens though, so it needs to be fixed anyway.
Now, the root node of a document is always a document node.
It will often have only one child node which corresponds to the grammar's
start symbol, but not always. Currently, it may have more than one child
if there are ubiquitous tokens such as comments at the beginning of the
document. In the future, it will also be possible be possible to have multiple
for the document to have multiple children if the document is partially parsed.
Generated parsers no longer export a parser constructor function.
They now export an opaque Language object which can be set on
Documents directly. This way, the logic for constructing parsers
lives entirely in the runtime. The Languages are just structs which
have no load-time dependency on the runtime
My original thought was to decouple the runtime from
the LR parser generator by making TSParser a generic
interface that LR parsers implement.
I think this was more trouble than it was worth.
As a final step before returning the finished parse tree, check if
there are still multiple nodes on the stack. If so, make the inner
nodes children of the top node.
Stop collapsing hidden symbols upon reducing them.
Sadly, this messes up the ability to re-use parse
trees. Instead, for now, hide these nodes when
stringifying parse trees
This is slightly slower than encoding the parse table in
flow control, but allows the parser to inspect the parse
table more flexibly. This is needed for incremental parsing.