This stores whether a symbol is only ever used as a ubiquitous token. This will
allow ubiquitous nodes to be reused more effectively: if they are always
ubiquitous, then they can be reused immediately, and otherwise, they must be
broken down in case they need to be used structurally.
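A minimal sketch of the reuse decision described above. The type and field names here (SymbolInfo, only_ubiquitous, ReuseAction) are hypothetical and only illustrate the idea; they are not taken from the actual sources.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-symbol metadata; the field name is illustrative only. */
typedef struct {
  bool only_ubiquitous; /* true if the symbol is never used structurally */
} SymbolInfo;

typedef enum { REUSE_IMMEDIATELY, BREAK_DOWN } ReuseAction;

/* Decide what to do with a previously parsed ubiquitous node. */
ReuseAction reuse_action_for(const SymbolInfo *table, size_t symbol) {
  assert(table != NULL);
  /* A symbol that may also appear structurally must be broken down,
     because its children might be needed by the surrounding rule. */
  return table[symbol].only_ubiquitous ? REUSE_IMMEDIATELY : BREAK_DOWN;
}
```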
Instead of child() vs. concrete_child(), next_sibling() vs. next_concrete_sibling(), etc.,
the default is switched: child() refers to the concrete syntax tree, and named_child()
refers to the AST. Because the AST is abstract through the exclusion of some nodes, the
names are clearer if the qualifier goes on the AST operations.
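To illustrate the naming, here is a toy sketch of the two accessors. The Node layout and the is_named flag are assumptions made for this example, not the runtime's real data structure.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy node type for illustration; not the runtime's actual layout. */
typedef struct Node {
  const char *name;
  bool is_named;                 /* false for nodes excluded from the AST */
  const struct Node **children;
  size_t child_count;
} Node;

/* Concrete syntax tree access: every child counts. */
const Node *child(const Node *node, size_t index) {
  assert(node != NULL);
  return index < node->child_count ? node->children[index] : NULL;
}

/* AST access: skip children that are not named. */
const Node *named_child(const Node *node, size_t index) {
  assert(node != NULL);
  for (size_t i = 0; i < node->child_count; i++) {
    const Node *candidate = node->children[i];
    if (!candidate->is_named) continue;
    if (index == 0) return candidate;
    index--;
  }
  return NULL;
}
```

For a parenthesized number, child(node, 0) would be the "(" token, while named_child(node, 0) would be the number itself.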
The `pos` and `size` functions for Nodes now return TSLength structs,
which contain lengths in both characters and bytes. This is important
for knowing the number of Unicode characters in a Node.
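A rough sketch of why both units matter. The field names (bytes, chars) and the measure_utf8 helper are assumptions for illustration; the message above only says that the struct holds both lengths. The helper counts Unicode code points in valid UTF-8 input.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed field names; only the presence of both units is stated above. */
typedef struct {
  size_t bytes;
  size_t chars;
} TSLength;

/* Hypothetical helper: measure a NUL-terminated UTF-8 string. */
TSLength measure_utf8(const char *text) {
  assert(text != NULL);
  TSLength result = { 0, 0 };
  for (const unsigned char *p = (const unsigned char *)text; *p; p++) {
    result.bytes++;
    /* Continuation bytes look like 10xxxxxx; count only lead bytes. */
    if ((*p & 0xC0) != 0x80) result.chars++;
  }
  return result;
}
```

For the three-character string "a", "é", "z" encoded as UTF-8, the byte length is 4 while the character length is 3.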
This reverts commit 5cd07648fd.
The separators construct is useful as an optimization. It turns out that
constructing a node for every chunk of whitespace in a document causes a
significant performance regression.
Conflicts:
src/compiler/build_tables/build_lex_table.cc
src/compiler/grammar.cc
src/runtime/parser.c
Now, grammars can handle whitespace by making it another ubiquitous
token, like comments.
For now, this has the side effect of whitespace being included in the
tree that precedes it. This was already an issue for other ubiquitous
tokens though, so it needs to be fixed anyway.
The lexer doesn't know the expected symbols, so it doesn't have enough
information to construct error nodes. Now, when it encounters an invalid
character, it returns NULL and the parser builds a correct error node.
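The division of labor can be sketched as follows. All names here (lex, next_token, the Node fields) are hypothetical and the lexer is a toy that only recognizes single digits; the point is that only the parser side has the expected symbols.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

typedef enum { SYM_NUMBER, SYM_ERROR } Symbol;

typedef struct {
  Symbol symbol;
  char lookahead;          /* offending character, set on error nodes */
  const Symbol *expected;  /* symbols the parser would have accepted */
  size_t expected_count;
} Node;

/* Toy lexer: returns NULL for any character it cannot tokenize. */
Node *lex(const char *input, size_t pos) {
  assert(input != NULL);
  if (input[pos] < '0' || input[pos] > '9') return NULL;
  Node *node = calloc(1, sizeof(Node));
  if (node) node->symbol = SYM_NUMBER;
  return node;
}

/* Parser side: on a NULL token, build the error node itself,
   since it knows which symbols were expected in the current state. */
Node *next_token(const char *input, size_t pos,
                 const Symbol *expected, size_t expected_count) {
  Node *token = lex(input, pos);
  if (token) return token;
  Node *error = calloc(1, sizeof(Node));
  if (!error) return NULL;
  error->symbol = SYM_ERROR;
  error->lookahead = input[pos];
  error->expected = expected;
  error->expected_count = expected_count;
  return error;
}
```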
Generated parsers no longer export a parser constructor function.
They now export an opaque Language object which can be set on
Documents directly. This way, the logic for constructing parsers
lives entirely in the runtime. The Languages are just structs which
have no load-time dependency on the runtime.
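A sketch of the shape this takes, under assumed names: Language, Document, document_set_language and arithmetic_language are illustrative and do not reflect the real API. The generated side is pure data, so it never calls into the runtime.

```c
#include <assert.h>
#include <stddef.h>

/* Plain data emitted by a generated parser (fields are illustrative). */
typedef struct {
  const char *const *symbol_names;
  size_t symbol_count;
} Language;

/* Runtime-side document; the layout is an assumption for this sketch. */
typedef struct {
  const Language *language;
} Document;

/* Runtime function: all parser construction would happen behind this. */
void document_set_language(Document *doc, const Language *language) {
  assert(doc != NULL);
  doc->language = language;
}

/* What a generated parser might export: a constant struct, no code. */
static const char *const arithmetic_symbol_names[] = { "number", "sum" };
const Language arithmetic_language = { arithmetic_symbol_names, 2 };
```

Because arithmetic_language is a statically initialized constant, a library containing it can be loaded without resolving any runtime symbols.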