37 commits

Author SHA1 Message Date
Max Brunsfeld
cd8a683229 Improve error messages for invalid ubiquitous tokens 2014-09-10 13:02:16 -07:00
Max Brunsfeld
e9dad529f5 Make descriptions more consistent in compiler specs 2014-09-09 13:01:18 -07:00
Max Brunsfeld
1ff7cedf40 Unify ubiquitous tokens and lexical separators in API 2014-09-07 22:16:45 -07:00
Max Brunsfeld
a46f9d950c Handle '\s' correctly in regexps 2014-09-07 16:05:43 -07:00
Max Brunsfeld
ed11ef557a Fix expansion of repeat rules into recursive rules
Previously, the way repeat rules were expanded, the auxiliary
rule always needed to be reduced, even if the repeating content
was empty. This caused problems in parse states where some items
contained the repeat rule and some did not. To make those cases
work, the repeat rule had to explicitly be marked as optional.
With this change, that is no longer necessary.
2014-09-07 09:39:14 -07:00
Max Brunsfeld
d3204d3526 Include '_' in '\w' regex character class 2014-09-05 18:41:12 -07:00
Max Brunsfeld
545e575508 Revert "Remove the separator characters construct"
This reverts commit 5cd07648fd.

The separators construct is useful as an optimization. It turns out that
constructing a node for every chunk of whitespace in a document causes a
significant performance regression.

Conflicts:
	src/compiler/build_tables/build_lex_table.cc
	src/compiler/grammar.cc
	src/runtime/parser.c
2014-09-02 08:03:51 -07:00
Max Brunsfeld
5cd07648fd Remove the separator characters construct
Now, grammars can handle whitespace by making it another ubiquitous
token, like comments.

For now, this has the side effect of whitespace being included in the
tree that precedes it. This was already an issue for other ubiquitous
tokens though, so it needs to be fixed anyway.
2014-09-01 20:19:43 -07:00
Max Brunsfeld
9338249075 Remove implicit CharacterRange constructors
Also fix misc smaller lint errors
2014-08-23 14:52:44 -07:00
Max Brunsfeld
0bb5663f0f Refactor - represent char sets in terms of inclusions and exclusions 2014-08-23 14:25:45 -07:00
Max Brunsfeld
b155994491 Fix indentation in specs 2014-08-07 08:11:21 -07:00
Max Brunsfeld
01571da30d Handle more escaped characters in regexps 2014-08-03 21:57:21 -07:00
Max Brunsfeld
83a1b9439e Fix handling of ubiquitous tokens used in grammar rules 2014-07-01 20:47:35 -07:00
Max Brunsfeld
a9dff20658 Make grammars' separator characters configurable 2014-06-26 07:31:08 -07:00
Max Brunsfeld
7df35f9b8d Make separate types for syntax and lexical grammars
This way, the separator characters can be added as a field to
lexical grammars only
2014-06-25 13:27:16 -07:00
Max Brunsfeld
81880e000e Tweak header include paths in tests 2014-06-23 18:50:03 -07:00
Max Brunsfeld
e105f5cebc Remove inheritance link btwn PreparedGrammar and Grammar 2014-06-10 10:34:37 -07:00
Max Brunsfeld
54a555168d Add accessor methods on Grammar 2014-06-09 21:05:25 -07:00
Max Brunsfeld
868a09b0b0 Remove infinite loop on certain lex errors 2014-06-01 23:23:24 -07:00
Max Brunsfeld
c7266f791e Don't use std::tuples in parse regex spec
gcc doesn't let me use initializer list syntax for them
2014-06-01 17:34:18 -07:00
Max Brunsfeld
e93e254518 In lexer, prefer tokens to skipped separator characters
This was causing newlines in go and javascript to be parsed as
meaningless separator characters instead of statement terminators
2014-05-30 13:29:54 -07:00
Max Brunsfeld
c30055ba18 Fix symbol names for extracted tokens 2014-05-20 08:30:58 -07:00
Max Brunsfeld
649f200831 Expand regex/string rules as part of grammar preparation
This makes it possible to report errors in regex parsing
2014-05-19 20:54:59 -07:00
Max Brunsfeld
5245bc01fe Backfill tests for token extraction in auxiliary rules 2014-05-19 19:05:54 -07:00
Max Brunsfeld
4700e33746 Introduce 'ubiquitous_tokens' concept, for parsing comments and such 2014-05-06 12:54:04 -07:00
Max Brunsfeld
3a50171249 Expose all grammar compilation errors 2014-05-01 23:28:40 -07:00
Max Brunsfeld
6d40dcf881 Add token helper for building token rules
Now you can specify the structure of tokens using
all of the rule functions, not just `str` and `pattern`
2014-05-01 12:43:29 -07:00
Max Brunsfeld
0d763d229d cpplint 2014-04-28 21:46:43 -07:00
Max Brunsfeld
25eda9d889 ISymbol -> Symbol
Interned symbols are now the main type of symbol in use
2014-04-28 20:43:27 -07:00
Max Brunsfeld
faf80aadac Symbol -> NamedSymbol 2014-04-28 20:15:49 -07:00
Max Brunsfeld
93df5579b4 Trim whitespace 2014-04-25 22:17:23 -07:00
Max Brunsfeld
68d44fd565 Intern symbols during grammar preparation 2014-04-22 23:38:26 -07:00
Max Brunsfeld
c3b65d22bf Improve prepare_grammar specs 2014-01-28 18:44:14 -08:00
Max Brunsfeld
fd0d77ef8b Separate auxiliary rules from user-specified rules 2014-01-28 13:27:30 -08:00
Max Brunsfeld
19e5b2a563 Make token extraction work for repeat rules 2014-01-28 12:52:29 -08:00
Max Brunsfeld
3ca2e126be Remove unnecessary public START and END constants 2014-01-25 21:34:46 -08:00
Max Brunsfeld
67fa81d079 Convert repeat rules into pairs of recursive rules 2014-01-24 18:27:29 -08:00