Commit graph

119 commits

Author SHA1 Message Date
Max Brunsfeld
c495076adb Record in parse table which actions can hide splits
Suppose a parse state S has multiple actions for a terminal lookahead symbol A.
Then during incremental parsing, while in state S, the parser should not
reuse a non-terminal lookahead B where FIRST(B) contains A, because reusing B
might prematurely discard one of the possible actions that a batch parser
would have attempted in state S, upon seeing A as a lookahead.
2015-12-17 13:11:56 -08:00
Max Brunsfeld
66144dc28e Treat tokens that are sometimes extra as fragile 2015-12-16 20:04:45 -08:00
Max Brunsfeld
d713054d61 Record which tokens are fragile when lexing 2015-12-10 21:05:54 -08:00
Max Brunsfeld
d2bf88d5fe Include rows and columns in TSLength
This way, we don't have to have separate 1D and 2D versions for so many values
2015-12-04 20:20:29 -08:00
Max Brunsfeld
22c76fc71b Remove TSLength from runtime header
Refactor node functions now that character offset and byte offset are stored separately
2015-12-04 10:45:30 -08:00
Max Brunsfeld
ad619d95f6 Add 'extra' field to symbol metadata
This stores whether a symbol is only ever used as a ubiquitous token. This will
allow ubiquitous nodes to be reused more effectively: if they are always
ubiquitous, then they can be reused immediately, and otherwise, they must be
broken down in case they need to be used structurally.
2015-12-02 15:10:24 -08:00
Max Brunsfeld
f08554e958 Replace NodeType enum with SymbolMetadata bitfield
This will allow storing other metadata about symbols, like if they
only appear as ubiquitous tokens
2015-12-02 15:10:24 -08:00
joshvera
b0f6bac3ab replace start and end with padding and size 2015-11-18 16:34:50 -08:00
joshvera
e720922662 Add source info to TSLexer 2015-11-12 12:24:05 -05:00
Max Brunsfeld
7ee5eaa16a Rename node accessor methods
Instead of child() vs concrete_child(), next_sibling() vs next_concrete_sibling(), etc,
the default is switched: child() refers to the concrete syntax tree, and named_child()
refers to the AST. Because the AST is abstract through exclusion of some nodes, the
names are clearer if the qualifier goes on the AST operations
2015-09-08 23:16:24 -07:00
Max Brunsfeld
c3f3f19ea8 Add concrete_child and concrete_child_count Node methods 2015-09-08 09:53:26 -07:00
Max Brunsfeld
9591c88f39 In runtime, distinguish between anonymous and hidden nodes 2015-09-06 00:12:37 -07:00
Max Brunsfeld
54e40b8146 Rework AST access API: reduce heap allocation 2015-07-31 15:47:48 -07:00
Max Brunsfeld
f9b057f3a9 clang-format everything 2015-07-27 18:29:48 -07:00
Max Brunsfeld
aff8bc3266 Split parse stack when there are multiple parse actions 2015-07-09 23:09:33 -07:00
Max Brunsfeld
755894b44d Allow multiple parse actions in parse table 2015-06-18 17:03:16 -07:00
Max Brunsfeld
d5ce3a9b5a lexer: in error mode, continue until token is found 2015-06-15 15:26:05 -07:00
Max Brunsfeld
2d436cf141 Identify fragile reductions at compile time 2015-02-21 15:11:03 -08:00
Max Brunsfeld
8cf800ef5d Unify debugging API for parsing and lexing 2014-10-17 17:52:54 -07:00
Max Brunsfeld
7498725d7f Move lexer debugging logic out of public header 2014-10-17 16:20:01 -07:00
Max Brunsfeld
5c600942df Inline some helper functions for lexer 2014-10-17 15:22:01 -07:00
Max Brunsfeld
c594208ab8 Allow callbacks to be specified for debug output 2014-10-13 01:02:18 -07:00
Max Brunsfeld
6d37877e49 Tweak debugging output 2014-10-05 16:56:29 -07:00
Max Brunsfeld
e5ea4efb0b Use stdbool.h 2014-10-03 16:06:08 -07:00
Max Brunsfeld
78c5fe8e02 clang-format 2014-10-03 15:44:21 -07:00
Max Brunsfeld
444188cb5f Display characters > 255 as numbers in debug output 2014-09-27 16:00:27 -07:00
Max Brunsfeld
c1565c1aae Track AST nodes' sizes in characters as well as bytes
The `pos` and `size` functions for Nodes now return TSLength structs,
which contain lengths in both characters and bytes. This is important
for knowing the number of unicode characters in a Node.
2014-09-26 16:15:07 -07:00
Max Brunsfeld
f2e2102a25 Add missing import of stdint.h 2014-09-13 00:25:12 -07:00
Max Brunsfeld
141cbcfa02 Read unicode characters using utf8proc 2014-09-13 00:24:10 -07:00
Max Brunsfeld
68d6e242ee Fix parsing of wildcard patterns at the ends of documents
- Remove special EOF handling from lexer
- Explicitly exclude the EOF character from all-inclusive character sets.
2014-09-11 13:10:23 -07:00
Max Brunsfeld
2e7ffb4d14 Tweak auto-format settings
Prefer lines that exceed 80 characters by a small margin to
line breaks in argument lists
2014-09-09 13:15:40 -07:00
Max Brunsfeld
c0a3f8d39c Remove some macros from public parser header 2014-09-05 23:47:38 -07:00
Max Brunsfeld
9c0b5b5571 clang-format 2014-09-03 18:53:38 -07:00
Max Brunsfeld
77529ace3d Fix infinite loop in certain cases w/ unterminated tokens 2014-09-03 00:38:44 -07:00
Max Brunsfeld
545e575508 Revert "Remove the separator characters construct"
This reverts commit 5cd07648fd.

The separators construct is useful as an optimization. It turns out that
constructing a node for every chunk of whitespace in a document causes a
significant performance regression.

Conflicts:
	src/compiler/build_tables/build_lex_table.cc
	src/compiler/grammar.cc
	src/runtime/parser.c
2014-09-02 08:03:51 -07:00
Max Brunsfeld
5cd07648fd Remove the separator characters construct
Now, grammars can handle whitespace by making it another ubiquitous
token, like comments.

For now, this has the side effect of whitespace being included in the
tree that precedes it. This was already an issue for other ubiquitous
tokens though, so it needs to be fixed anyway.
2014-09-01 20:19:43 -07:00
Max Brunsfeld
2985a98150 Build error nodes in lexer again, not in parser 2014-08-31 16:59:01 -07:00
Max Brunsfeld
226ffd6b5b Fix initializer list deduction warnings in specs 2014-08-27 22:23:45 -07:00
Max Brunsfeld
e0a53b9f14 Make parse and lex debug output more readable 2014-08-27 18:27:53 -07:00
Max Brunsfeld
bd145d2c6a Preserve the initial error node in handle_error function 2014-08-26 23:22:18 -07:00
Max Brunsfeld
37d5db6fee Move newline in lexer debugging output 2014-08-26 22:21:21 -07:00
Max Brunsfeld
346cf4fe5d Remove LEX_PANIC macro 2014-08-26 13:12:12 -07:00
Max Brunsfeld
77941c85ff Avoid building incomplete error nodes during lexing
The lexer doesn't know the expected symbols, so it doesn't have enough
information to construct error nodes. Now, when it encounters an invalid
character, it returns NULL and the parser builds a correct error node.
2014-08-25 23:35:00 -07:00
Max Brunsfeld
412cc93812 clang format 2014-07-31 13:11:39 -07:00
Max Brunsfeld
0d6d09cbd9 In generated parsers, export language as a function 2014-07-31 13:04:46 -07:00
Max Brunsfeld
909261d742 Remove unnecessary type name in parser.h 2014-07-31 12:31:38 -07:00
Max Brunsfeld
eecbcccee0 Remove generated parsers' dependency on the runtime library
Generated parsers no longer export a parser constructor function.
They now export an opaque Language object which can be set on
Documents directly. This way, the logic for constructing parsers
lives entirely in the runtime. The Languages are just structs which
have no load-time dependency on the runtime
2014-07-30 23:40:02 -07:00
Max Brunsfeld
98cc2f2264 Auto-format all source code with clang-format 2014-07-21 13:20:00 -07:00
Max Brunsfeld
df359bc01f Use 2-space indent in c files 2014-07-20 20:27:33 -07:00
Max Brunsfeld
b3385f20c8 Hide TSTree, expose TSNode 2014-07-17 23:29:11 -07:00