Commit graph

354 commits

Author SHA1 Message Date
Max Brunsfeld
1fa3bf0f07 In SpyReader::read, always return complete unicode characters 2014-10-03 14:30:19 -07:00
Max Brunsfeld
17f43e5e0c Clean up SpyReader 2014-10-03 14:21:39 -07:00
Max Brunsfeld
a69dfa08f3 Add spec for inserting text w/ unicode characters 2014-10-02 11:54:00 -07:00
Max Brunsfeld
8bee9d8fb9 Fix typo in parser spec descriptions 2014-10-02 11:52:58 -07:00
Max Brunsfeld
aab449635f Allow greek letters as variables in arithmetic fixture grammar 2014-10-02 11:50:16 -07:00
Max Brunsfeld
0f524121f1 Add SpyReader methods for inserting/removing by char index 2014-10-02 11:43:22 -07:00
Max Brunsfeld
5f313896c3 Make ::input a method on SpyReader, not a field 2014-09-30 14:57:57 -07:00
Max Brunsfeld
8d7d9af661 Remove unnecessary helper function in parser spec 2014-09-29 10:51:12 -07:00
Max Brunsfeld
700919951e Reorganize parser spec about handling edits 2014-09-29 10:48:35 -07:00
Max Brunsfeld
070dc76050 Generate correct C literals for non-ascii characters 2014-09-28 18:40:15 -07:00
Max Brunsfeld
cb5ecbd491 Handle string and regex rules w/ non-ascii chars 2014-09-28 18:21:22 -07:00
Max Brunsfeld
13ba35e26f Fix indentation in fixture grammars 2014-09-27 16:10:30 -07:00
Max Brunsfeld
7988829c08 Add spec for recognition of UTF8 characters 2014-09-27 16:00:48 -07:00
Max Brunsfeld
26ac5788b6 Don't use struct literal syntax for TSLength 2014-09-26 16:31:36 -07:00
Max Brunsfeld
04dc721241 Add missing import for string.h 2014-09-26 16:21:09 -07:00
Max Brunsfeld
c1565c1aae Track AST nodes' sizes in characters as well as bytes
The `pos` and `size` functions for Nodes now return TSLength structs,
which contain lengths in both characters and bytes. This is important
for knowing the number of unicode characters in a Node.
2014-09-26 16:15:07 -07:00
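The distinction between character and byte lengths in the commit above can be sketched as follows. This is a minimal illustration and not the project's code: the `Length` struct and the `length_of_utf8` and `length_add` helpers are hypothetical names standing in for `TSLength`, and the input is assumed to be valid, NUL-terminated UTF-8.

```c
#include <stddef.h>

/* Hypothetical stand-in for the TSLength struct described in the commit:
   a single length measured in both bytes and characters. */
typedef struct {
  size_t bytes;
  size_t chars;
} Length;

/* Measure a NUL-terminated UTF-8 string. Every byte counts toward `bytes`.
   Only bytes that are not continuation bytes (bit pattern 10xxxxxx) begin
   a new character, so only those count toward `chars`.
   Assumption: the input is valid UTF-8. */
static Length length_of_utf8(const char *text) {
  Length result = {0, 0};
  for (const unsigned char *p = (const unsigned char *)text; *p; p++) {
    result.bytes++;
    if ((*p & 0xC0) != 0x80) result.chars++;
  }
  return result;
}

/* Lengths of adjacent spans add component-wise. */
static Length length_add(Length a, Length b) {
  Length sum = {a.bytes + b.bytes, a.chars + b.chars};
  return sum;
}
```

For ASCII text the two counts agree; they diverge as soon as a multi-byte character such as a Greek letter appears, which is why a node needs both.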
Max Brunsfeld
c576d7d4a0 In SpyReader, don't return pointers into main content string
This improves test coverage of the lexer. Before, a SpyReader's read function
would return pointers into a single string that contained the entire text. This
could have masked bugs where out-of-bounds characters were being read.
Now the chunks returned by the reader are copied into a separate buffer.
2014-09-26 16:12:52 -07:00
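The idea in the commit above, handing out copies of each chunk instead of pointers into the full text, might look roughly like this. It is a sketch under assumptions: `CopyingReader`, `reader_make`, `reader_read`, and `reader_free` are hypothetical names, and the real SpyReader is a spec helper whose interface is not shown here.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical reader that never exposes its backing string. */
typedef struct {
  const char *content; /* full text; pointers into it are never returned */
  size_t length;       /* length of content in bytes */
  size_t position;     /* next byte to read */
  char *chunk;         /* most recent copy, owned by the reader */
} CopyingReader;

static CopyingReader reader_make(const char *content) {
  CopyingReader reader = {content, strlen(content), 0, NULL};
  return reader;
}

/* Copy up to `max` bytes into a freshly allocated buffer and return it.
   The previous chunk is freed, so a consumer that reads past the end of a
   chunk touches memory outside the allocation (which tools such as
   valgrind can flag) rather than silently seeing the rest of the document.
   Returns NULL and sets *bytes_read to 0 if allocation fails. */
static const char *reader_read(CopyingReader *reader, size_t max,
                               size_t *bytes_read) {
  size_t remaining = reader->length - reader->position;
  size_t count = remaining < max ? remaining : max;
  free(reader->chunk);
  reader->chunk = (char *)malloc(count + 1);
  if (!reader->chunk) {
    *bytes_read = 0;
    return NULL;
  }
  memcpy(reader->chunk, reader->content + reader->position, count);
  reader->chunk[count] = '\0';
  reader->position += count;
  *bytes_read = count;
  return reader->chunk;
}

static void reader_free(CopyingReader *reader) {
  free(reader->chunk);
  reader->chunk = NULL;
}
```

Each returned pointer is valid only until the next call to `reader_read`, which mirrors the contract a lexer should already be relying on.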
Max Brunsfeld
68d6e242ee Fix parsing of wildcard patterns at the ends of documents
- Remove special EOF handling from lexer
- Explicitly exclude the EOF character from all-inclusive character sets.
2014-09-11 13:10:23 -07:00
Max Brunsfeld
a2b80098b2 Fix destination directory in compile_examples spec 2014-09-11 12:46:46 -07:00
Max Brunsfeld
f05762b4a0 Move parser tests into their own file 2014-09-10 18:49:53 -07:00
Max Brunsfeld
9682ef6c79 Rename examples directory to spec/fixtures
- I want to move away from having complete grammars for real languages
  (e.g. javascript, golang) in this repo. These languages take a long
  time to compile, and they now exist in their own repos
  (node-tree-sitter-javascript etc).
- I want to start testing more compiler edge cases through integration
  tests, so I want to put more small, weird grammars in here. That makes
  me not want to call the directory `examples`.
2014-09-10 13:31:06 -07:00
Max Brunsfeld
cd8a683229 Improve error messages for invalid ubiquitous tokens 2014-09-10 13:02:16 -07:00
Max Brunsfeld
e9dad529f5 Make descriptions more consistent in compiler specs 2014-09-09 13:01:18 -07:00
Max Brunsfeld
1ff7cedf40 Unify ubiquitous tokens and lexical separators in API 2014-09-07 22:16:45 -07:00
Max Brunsfeld
a46f9d950c Handle '\s' correctly in regexps 2014-09-07 16:05:43 -07:00
Max Brunsfeld
ed11ef557a Fix expansion of repeat rules into recursive rules
Previously, the way repeat rules were expanded, the auxiliary
rule always needed to be reduced, even if the repeating content
was empty. This caused problems in parse states where some items
contained the repeat rule and some did not. To make those cases
work, the repeat rule had to explicitly be marked as optional.
With this change, that is no longer necessary.
2014-09-07 09:39:14 -07:00
Max Brunsfeld
43ecac2a1d Expose debug flag on document 2014-09-06 17:56:00 -07:00
Max Brunsfeld
d3204d3526 Include '_' in '\w' regex character class 2014-09-05 18:41:12 -07:00
Max Brunsfeld
3dea1261a6 Clean up document specs for incremental parsing 2014-09-03 18:48:10 -07:00
Max Brunsfeld
c72445d808 Fix inc parsing for nodes containing ubiq tokens 2014-09-03 13:17:06 -07:00
Max Brunsfeld
ad52bdc448 Fix inc parsing when appending to end of a token 2014-09-03 07:09:15 -07:00
Max Brunsfeld
77529ace3d Fix infinite loop in certain cases w/ unterminated tokens 2014-09-03 00:38:44 -07:00
Max Brunsfeld
7d81126df3 Remove unnecessary import of public header in specs 2014-09-02 22:17:04 -07:00
Max Brunsfeld
545e575508 Revert "Remove the separator characters construct"
This reverts commit 5cd07648fd.

The separators construct is useful as an optimization. It turns out that
constructing a node for every chunk of whitespace in a document causes a
significant performance regression.

Conflicts:
	src/compiler/build_tables/build_lex_table.cc
	src/compiler/grammar.cc
	src/runtime/parser.c
2014-09-02 08:03:51 -07:00
Max Brunsfeld
e941f8c175 Fix error in document editing
When breaking down the stack in parser.c, the previous code
would not account for ubiquitous tokens. This was a problem
for a long time, but wasn't noticed until ubiquitous tokens
started being used to represent separator characters.
2014-09-01 21:32:29 -07:00
Max Brunsfeld
5cd07648fd Remove the separator characters construct
Now, grammars can handle whitespace by making it another ubiquitous
token, like comments.

For now, this has the side effect of whitespace being included in the
tree that precedes it. This was already an issue for other ubiquitous
tokens though, so it needs to be fixed anyway.
2014-09-01 20:19:43 -07:00
Max Brunsfeld
2985a98150 Build error nodes in lexer again, not in parser 2014-08-31 16:59:01 -07:00
Max Brunsfeld
85d8c9df5c Handle multiple ubiquitous tokens in a row 2014-08-31 12:11:16 -07:00
Max Brunsfeld
a75686b017 Fix double release calls in document spec 2014-08-31 00:46:09 -07:00
Max Brunsfeld
c5ac02c571 Fix size calculation for error nodes 2014-08-29 13:22:03 -07:00
Max Brunsfeld
604b149c4b Assign sizes to error nodes in handle_error 2014-08-28 18:35:30 -07:00
Max Brunsfeld
3430a5edcc Clarify distinction btwn tree padding, tree offset, node position
- Node position is public. It represents the node's first character
  index in the document.
- Tree offset is private. It represents the distance between the tree's
  first character index and its parent's first character index.
- Tree padding is private. It represents the amount of whitespace
  (or other separator characters) immediately preceding the tree.
2014-08-28 13:22:06 -07:00
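The three quantities defined in the commit above relate to each other roughly as in the following sketch. The `Tree` struct and both helper functions are hypothetical, and the sketch assumes that a root tree's offset is simply its own first character index, which the commit message does not state.

```c
#include <stddef.h>

/* Hypothetical tree carrying only the two private quantities. */
typedef struct Tree {
  const struct Tree *parent; /* NULL for the root */
  size_t offset;  /* distance from the parent's first character index */
  size_t padding; /* separator characters immediately preceding the tree */
} Tree;

/* Public node position: the first character index in the document,
   obtained by summing offsets up the parent chain.
   Assumption: the root's offset is its own first character index. */
static size_t tree_position(const Tree *tree) {
  size_t position = 0;
  for (; tree; tree = tree->parent) position += tree->offset;
  return position;
}

/* Index at which the padding that precedes the tree begins. */
static size_t tree_padding_start(const Tree *tree) {
  return tree_position(tree) - tree->padding;
}
```

Storing relative offsets rather than absolute positions means that an edit early in the document does not invalidate the stored values of unrelated subtrees; only the public position, computed on demand, changes.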
Max Brunsfeld
226ffd6b5b Fix initializer list deduction warnings in specs 2014-08-27 22:23:45 -07:00
Max Brunsfeld
b91f48ced2 Call handle_error even when error occurs exactly where expected
Previously, if an error happened right at the beginning of an error
production, the error node would be immediately shifted onto the stack
without calling the error handling function.
2014-08-27 18:44:27 -07:00
Max Brunsfeld
7b0a52ec26 Pretty-print single hidden tree nodes correctly 2014-08-27 12:56:36 -07:00
Max Brunsfeld
77941c85ff Avoid building incomplete error nodes during lexing
The lexer doesn't know the expected symbols, so it doesn't have enough
information to construct error nodes. Now, when it encounters an invalid
character, it returns NULL and the parser builds a correct error node.
2014-08-25 23:35:00 -07:00
Max Brunsfeld
117869e49a Fix position calculation in node_find_for_range 2014-08-25 15:52:17 -07:00
Max Brunsfeld
1535ebd21c Handle null parent in {next,prev}_sibling 2014-08-25 11:28:09 -07:00
Max Brunsfeld
cef6827182 Add find_for_range function for Nodes 2014-08-25 09:31:27 -07:00
Max Brunsfeld
b1a7886225 Rename node_leaf_at_pos -> node_find_pos
It doesn't always return a leaf node, just the smallest node
that spans the given position.
2014-08-25 09:06:51 -07:00
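The behaviour described in the rename above, returning the smallest node that spans a position rather than always a leaf, can be illustrated like this. The `Node` layout and the `find_smallest_spanning` function are hypothetical; the actual `node_find_pos` signature is not shown in this log.

```c
#include <stddef.h>

/* Hypothetical node with an absolute position and a size in characters. */
typedef struct Node {
  size_t position;
  size_t size;
  const struct Node *const *children;
  size_t child_count;
} Node;

/* A node spans `pos` if pos lies in [position, position + size). */
static int node_spans(const Node *node, size_t pos) {
  return node->position <= pos && pos < node->position + node->size;
}

/* Descend from `node` while some child still spans `pos`. The result is
   the smallest spanning node; it is an interior node when `pos` falls in
   a gap between children (for example, on whitespace). Returns NULL if
   `node` itself does not span `pos`. */
static const Node *find_smallest_spanning(const Node *node, size_t pos) {
  if (!node || !node_spans(node, pos)) return NULL;
  for (;;) {
    const Node *next = NULL;
    for (size_t i = 0; i < node->child_count; i++) {
      if (node_spans(node->children[i], pos)) {
        next = node->children[i];
        break;
      }
    }
    if (!next) return node;
    node = next;
  }
}
```

With two one-character leaves at positions 0 and 4 under a root of size 5, a lookup at position 2 returns the root itself, which is the case the old name `node_leaf_at_pos` misdescribed.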