Commit graph

154 commits

Author SHA1 Message Date
Max Brunsfeld
4131e1c16e Return an error when external token name matches non-terminal rule 2017-01-31 11:36:51 -08:00
Max Brunsfeld
e6c82ead2c Start work toward maintaining external scanner's state during incremental parses 2016-12-20 17:06:20 -08:00
Max Brunsfeld
49d25bd0f8 Remove EXTERNAL_TOKEN grammar rule type 2016-12-04 15:02:32 -08:00
Max Brunsfeld
0f8e130687 Call external scanner functions when lexing 2016-12-02 22:03:48 -08:00
Max Brunsfeld
c966af0412 Start work on external tokens 2016-12-02 16:24:19 -08:00
Max Brunsfeld
6cf4ccb840 Represent rule metadata as a struct, not a map 2016-11-19 13:59:34 -08:00
Max Brunsfeld
6935f1d26f Use hash_combine everywhere 2016-11-16 11:46:22 -08:00
Max Brunsfeld
1118a9142a Introduce Symbol::Index type alias 2016-11-14 10:25:26 -08:00
Max Brunsfeld
fad7294ba4 Store shift states for non-terminals directly in the main parse table 2016-11-14 08:36:06 -08:00
Max Brunsfeld
a3679fbb1f Distinguish separators from main tokens via a property on transitions
It was incorrect to store it as a property on the lexical states themselves
2016-05-19 16:27:25 -07:00
Max Brunsfeld
5b74813a5c Refine logic for which tokens to use in error recovery 2016-04-27 14:09:19 -07:00
Max Brunsfeld
e0c24e3be6 Remove old error recovery code 2016-03-02 20:58:39 -08:00
Max Brunsfeld
d4632ab9a9 Make the compile function plain C and take a JSON grammar 2016-01-11 12:33:48 -08:00
Max Brunsfeld
08d50c25ae clang-format 2015-12-04 20:56:33 -08:00
Max Brunsfeld
d5ce268074 Fix handling of changing precedence within lexical rules.
A precedence annotation wrapping a sequence of characters now only affects how
tightly those characters bind to *each other*, not how tightly they bind to the
preceding character.

This bug surfaced because a generated lexer was failing to recognize a '\n' character
as a token, instead treating it as ubiquitous whitespace. It made this error
because, even though anonymous ubiquitous tokens have the lowest precedence, the
character immediately *after* the '\n' was part of a normal token, which had
*normal* precedence (0). Advancing into that following token was incorrectly
prioritized above accepting the line-break token.
2015-11-08 13:36:15 -08:00
Max Brunsfeld
998ae533da Make completion_status() a method on LexItem 2015-10-30 16:48:37 -07:00
Max Brunsfeld
9959fe35b0 Allow associativity to be specified in rules w/o precedence 2015-10-13 11:25:28 -07:00
Max Brunsfeld
4b817dc07c Fix linter errors 2015-10-12 19:22:05 -07:00
Max Brunsfeld
82726ad53b Define repeat rule in terms of repeat1 rule 2015-10-12 19:22:05 -07:00
Max Brunsfeld
5c67f58a4b Add helper for dynamic-casting to rule subclasses 2015-10-12 19:21:56 -07:00
Max Brunsfeld
ebc52f109d Merge branch 'flatten-rules-into-productions'
This branch had diverged considerably, so merging it required changing a lot
of code.

Conflicts:
	project.gyp
	spec/compiler/build_tables/action_takes_precedence_spec.cc
	spec/compiler/build_tables/build_conflict_spec.cc
	spec/compiler/build_tables/build_parse_table_spec.cc
	spec/compiler/build_tables/first_symbols_spec.cc
	spec/compiler/build_tables/item_set_closure_spec.cc
	spec/compiler/build_tables/item_set_transitions_spec.cc
	spec/compiler/build_tables/rule_can_be_blank_spec.cc
	spec/compiler/helpers/containers.h
	spec/compiler/prepare_grammar/expand_repeats_spec.cc
	spec/compiler/prepare_grammar/extract_tokens_spec.cc
	src/compiler/build_tables/action_takes_precedence.h
	src/compiler/build_tables/build_parse_table.cc
	src/compiler/build_tables/first_symbols.cc
	src/compiler/build_tables/first_symbols.h
	src/compiler/build_tables/item_set_closure.cc
	src/compiler/build_tables/item_set_transitions.cc
	src/compiler/build_tables/parse_item.cc
	src/compiler/build_tables/parse_item.h
	src/compiler/build_tables/rule_can_be_blank.cc
	src/compiler/build_tables/rule_can_be_blank.h
	src/compiler/prepare_grammar/expand_repeats.cc
	src/compiler/prepare_grammar/extract_tokens.cc
	src/compiler/prepare_grammar/extract_tokens.h
	src/compiler/prepare_grammar/prepare_grammar.cc
	src/compiler/rules/built_in_symbols.cc
	src/compiler/rules/built_in_symbols.h
	src/compiler/syntax_grammar.cc
	src/compiler/syntax_grammar.h
2015-10-02 23:46:39 -07:00
Max Brunsfeld
673ca411b1 Fix lint errors 2015-09-19 13:19:49 -07:00
Max Brunsfeld
e6f3239bef Move stream operator definitions to spec helpers
This is one less thing for users to worry about when compiling and linking
the library itself
2015-09-10 10:12:11 -07:00
Max Brunsfeld
5982b77c97 In compiler, distinguish between anonymous tokens and hidden rules 2015-09-05 22:28:55 -07:00
Max Brunsfeld
bd77ab1ac9 Move public rule functions out of rule namespace
This way, there's only one public namespace: tree_sitter
2015-09-03 17:49:20 -07:00
Max Brunsfeld
76e2067ee0 Remove unused metadata key 2015-09-02 13:05:54 -07:00
Max Brunsfeld
21258e6a9e Remove 'document' wrapper node 2015-08-22 10:48:34 -07:00
Max Brunsfeld
c18351772a Auto-format: no single-line functions 2015-07-31 16:32:24 -07:00
Max Brunsfeld
f9b057f3a9 clang-format everything 2015-07-27 18:29:48 -07:00
Max Brunsfeld
0b1d70db34 Always resolve ambiguities immediately
No more ambiguity nodes.
Also, when merging parse stacks, merge their successors if needed.
2015-07-15 13:15:11 -07:00
Max Brunsfeld
381f89f8ba Create ambiguity nodes when joining stack heads 2015-06-18 17:03:16 -07:00
Max Brunsfeld
b1f8ba6202 Replace {left,right}_assoc w/ prec, with an associativity argument 2015-03-23 21:06:31 -07:00
Max Brunsfeld
a19b0e75ac 🔥 keyword and keypattern functions
Just make strings have higher precedence than regexps.
2015-03-22 16:00:26 -07:00
Max Brunsfeld
80ec303b10 Replace prec rule w/ left_assoc and right_assoc
Consider shift/reduce conflicts to be compilation errors unless
they are resolved by a specified associativity.
2015-03-16 23:12:34 -07:00
Max Brunsfeld
9a198562e0 Treat parse conflicts as errors in grammar compilation
For now, only reduce/reduce conflicts w/ no tie-breaking precedence
are treated as errors. The rest are dropped, because shift/reduce
conflicts are currently very common because we don't have a way
of specifying associativity along w/ precedence.
2015-03-15 20:31:41 -07:00
Max Brunsfeld
8ac4b9fc17 Store productions' end rule ids in the vector 2015-02-16 22:11:03 -08:00
Max Brunsfeld
68a0e16d1e Add void specialization of RuleFn template 2015-02-16 22:11:03 -08:00
Max Brunsfeld
160fca6579 Refactor avoidance of redundant repeat rules 2015-01-14 21:11:19 -08:00
Max Brunsfeld
a0d9da9d5c Rename static 'Build' methods to 'build' 2015-01-14 21:11:05 -08:00
Max Brunsfeld
aae6f6de14 Remove whitespace between template closing tags 2014-10-12 11:51:12 -07:00
Max Brunsfeld
070dc76050 Generate correct C literals for non-ascii characters 2014-09-28 18:40:15 -07:00
Max Brunsfeld
e0185f84fc Print non-ascii characters as numbers in CharacterRange::to_string 2014-09-28 18:19:42 -07:00
Max Brunsfeld
68d6e242ee Fix parsing of wildcard patterns at the ends of documents
- Remove special EOF handling from lexer
- Explicitly exclude the EOF character from all-inclusive character sets.
2014-09-11 13:10:23 -07:00
Max Brunsfeld
2e7ffb4d14 Tweak auto-format settings
Prefer lines that exceed 80 characters by a small margin to
line breaks in argument lists
2014-09-09 13:15:40 -07:00
Max Brunsfeld
1ff7cedf40 Unify ubiquitous tokens and lexical separators in API 2014-09-07 22:16:45 -07:00
Max Brunsfeld
545e575508 Revert "Remove the separator characters construct"
This reverts commit 5cd07648fd.

The separators construct is useful as an optimization. It turns out that
constructing a node for every chunk of whitespace in a document causes a
significant performance regression.

Conflicts:
	src/compiler/build_tables/build_lex_table.cc
	src/compiler/grammar.cc
	src/runtime/parser.c
2014-09-02 08:03:51 -07:00
Max Brunsfeld
5cd07648fd Remove the separator characters construct
Now, grammars can handle whitespace by making it another ubiquitous
token, like comments.

For now, this has the side effect of whitespace being included in the
tree that precedes it. This was already an issue for other ubiquitous
tokens though, so it needs to be fixed anyway.
2014-09-01 20:19:43 -07:00
Max Brunsfeld
8f4939a3d3 unsigned char -> uint32_t in CharacterRange 2014-08-24 01:05:59 -07:00
Max Brunsfeld
9338249075 Remove implicit CharacterRange constructors
Also fix misc smaller lint errors
2014-08-23 14:52:44 -07:00
Max Brunsfeld
0bb5663f0f Refactor - represent char sets in terms of inclusions and exclusions 2014-08-23 14:25:45 -07:00