579 commits

Author SHA1 Message Date
Max Brunsfeld
d1603e298f bytes_{inserted,removed} -> chars_{inserted,removed} 2014-09-28 18:47:29 -07:00
Max Brunsfeld
070dc76050 Generate correct C literals for non-ascii characters 2014-09-28 18:40:15 -07:00
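The commit message does not show the generated format. The following is a minimal sketch that assumes the generator falls back to plain integer literals for anything outside printable ASCII. The helper name `c_char_literal` is hypothetical. It shows one way to emit a safe C literal for a code point:

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: write a C literal for code point `c` into `out`
   (`size` bytes). Printable ASCII becomes a quoted character literal,
   escaping the quote and backslash; every other value, including all
   non-ASCII code points, is written as a plain decimal integer. */
void c_char_literal(int32_t c, char *out, size_t size) {
  if (c == '\'' || c == '\\')
    snprintf(out, size, "'\\%c'", (char)c);
  else if (c >= 0x20 && c < 0x7F)
    snprintf(out, size, "'%c'", (char)c);
  else
    snprintf(out, size, "%ld", (long)c);
}
```

With this scheme `'a'` stays readable while U+00E9 becomes `233`. A literal like `'é'` written directly in UTF-8 source would instead be a multi-character constant with an implementation-defined value.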
Max Brunsfeld
cb5ecbd491 Handle string and regex rules w/ non-ascii chars 2014-09-28 18:21:22 -07:00
Max Brunsfeld
e0185f84fc Print non-ascii characters as numbers in CharacterRange::to_string 2014-09-28 18:19:42 -07:00
Max Brunsfeld
13ba35e26f Fix indentation in fixture grammars 2014-09-27 16:10:30 -07:00
Max Brunsfeld
7988829c08 Add spec for recognition of UTF8 characters 2014-09-27 16:00:48 -07:00
Max Brunsfeld
444188cb5f Display characters > 255 as numbers in debug output 2014-09-27 16:00:27 -07:00
Max Brunsfeld
26ac5788b6 Don't use struct literal syntax for TSLength 2014-09-26 16:31:36 -07:00
Max Brunsfeld
04dc721241 Add missing import for string.h 2014-09-26 16:21:09 -07:00
Max Brunsfeld
c1565c1aae Track AST nodes' sizes in characters as well as bytes
The `pos` and `size` functions for Nodes now return TSLength structs,
which contain lengths in both characters and bytes. This is important
for knowing the number of unicode characters in a Node.
2014-09-26 16:15:07 -07:00
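A node's extent in UTF-8 text has two measures, and they diverge as soon as a multi-byte character appears. Below is a minimal sketch of a length type that carries both, assuming valid UTF-8 input. `Length`, `length_of_utf8` and `length_add` are hypothetical names, not the runtime's API, and the real TSLength field names may differ:

```c
#include <stddef.h>

/* Hypothetical stand-in for TSLength; the field names are assumptions. */
typedef struct {
  size_t bytes; /* length in UTF-8 code units */
  size_t chars; /* length in Unicode code points */
} Length;

/* Measure `n` bytes of UTF-8 text, assumed to be valid. Every byte that is
   not a continuation byte (bit pattern 10xxxxxx) begins a new character. */
Length length_of_utf8(const char *text, size_t n) {
  Length result;
  result.bytes = n;
  result.chars = 0;
  for (size_t i = 0; i < n; i++) {
    if (((unsigned char)text[i] & 0xC0) != 0x80)
      result.chars++;
  }
  return result;
}

/* Lengths add component-wise: a node's position plus its size gives the
   position just past its end, in both units at once. */
Length length_add(Length a, Length b) {
  Length result;
  result.bytes = a.bytes + b.bytes;
  result.chars = a.chars + b.chars;
  return result;
}
```

For the string `héllo`, the byte count is 6 and the character count is 5, because `é` occupies two bytes.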
Max Brunsfeld
c576d7d4a0 In SpyReader, don't return pointers into main content string
This improves test coverage of the lexer. Before, a SpyReader's read function
would return pointers into a single string that contained the entire text. This
could have masked bugs where out-of-bounds characters were being read.
Now the chunks returned by the reader are copied into a separate buffer.
2014-09-26 16:12:52 -07:00
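The testing idea behind this change can be sketched in C. This is not the project's SpyReader. It assumes a reader that hands out fixed-size chunks, and the names `ChunkReader` and `chunk_reader_read` are hypothetical. Each chunk is copied into a heap buffer sized exactly to the chunk. As a result, a lexer that reads past the returned length touches memory outside the allocation, which a tool such as valgrind can report:

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical chunked reader in the spirit of SpyReader. */
typedef struct {
  const char *content;  /* full document text */
  size_t length;        /* total bytes in content */
  size_t position;      /* offset of the next chunk */
  size_t chunk_size;    /* maximum bytes per chunk */
  char *buffer;         /* private copy of the most recent chunk */
} ChunkReader;

void chunk_reader_init(ChunkReader *r, const char *content, size_t chunk_size) {
  r->content = content;
  r->length = strlen(content);
  r->position = 0;
  r->chunk_size = chunk_size;
  r->buffer = NULL;
}

/* Return the next chunk and store its size in *bytes_read (0 at end of
   input). The chunk is NOT null-terminated, and the previous chunk's
   buffer is freed, so stale pointers and over-reads both become errors. */
const char *chunk_reader_read(ChunkReader *r, size_t *bytes_read) {
  size_t remaining = r->length - r->position;
  size_t n = remaining < r->chunk_size ? remaining : r->chunk_size;
  free(r->buffer);
  r->buffer = malloc(n > 0 ? n : 1);
  if (r->buffer == NULL) {
    *bytes_read = 0;
    return NULL;
  }
  memcpy(r->buffer, r->content + r->position, n);
  r->position += n;
  *bytes_read = n;
  return r->buffer;
}

void chunk_reader_free(ChunkReader *r) {
  free(r->buffer);
  r->buffer = NULL;
}
```

Returning pointers into the original string would let an over-read silently land on the next chunk's bytes. A separate, exactly sized buffer makes the same bug visible.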
Max Brunsfeld
f2e2102a25 Add missing import of stdint.h 2014-09-13 00:25:12 -07:00
Max Brunsfeld
141cbcfa02 Read unicode characters using utf8proc 2014-09-13 00:24:10 -07:00
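Decoding one code point from UTF-8 is the core operation here. utf8proc exposes it as `utf8proc_iterate`, which returns the number of bytes consumed or a negative error code. To keep the example dependency-free, the sketch below is a hand-written stand-in with a similar contract, not the project's lexer code. The name `utf8_decode` is hypothetical:

```c
#include <stddef.h>
#include <stdint.h>

/* Decode one UTF-8 sequence from the first `len` bytes of `str`.
   On success, store the code point in *out and return the number of
   bytes consumed (1 to 4). Return -1 for truncated or malformed input,
   including overlong forms, surrogates, and values above U+10FFFF. */
int utf8_decode(const uint8_t *str, size_t len, int32_t *out) {
  /* Smallest code point that legitimately needs 2, 3 or 4 bytes. */
  static const int32_t min_value[5] = {0, 0, 0x80, 0x800, 0x10000};
  int n;
  int32_t cp;

  if (len == 0) return -1;
  if (str[0] < 0x80) {                  /* 0xxxxxxx: ASCII */
    *out = str[0];
    return 1;
  } else if ((str[0] & 0xE0) == 0xC0) { /* 110xxxxx: 2-byte lead */
    n = 2; cp = str[0] & 0x1F;
  } else if ((str[0] & 0xF0) == 0xE0) { /* 1110xxxx: 3-byte lead */
    n = 3; cp = str[0] & 0x0F;
  } else if ((str[0] & 0xF8) == 0xF0) { /* 11110xxx: 4-byte lead */
    n = 4; cp = str[0] & 0x07;
  } else {
    return -1; /* stray continuation byte or invalid lead byte */
  }

  if (len < (size_t)n) return -1;
  for (int i = 1; i < n; i++) {
    if ((str[i] & 0xC0) != 0x80) return -1; /* expected 10xxxxxx */
    cp = (cp << 6) | (str[i] & 0x3F);
  }
  if (cp < min_value[n] || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
    return -1;
  *out = cp;
  return n;
}
```

A lexer built on this advances by the returned byte count, so positions can be tracked in bytes and characters at the same time.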
Max Brunsfeld
e23f11b7c4 Allow lexical debug mode to be enabled on documents
- `ts_document_set_debug(doc, 1)` implies parse debug mode
- `ts_document_set_debug(doc, > 1)` implies parse and lex debug mode
2014-09-11 13:12:06 -07:00
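The level semantics above can be read as two independent flags. The sketch below uses hypothetical names (`DebugFlags`, `debug_flags_for_level`). It assumes that level 0 disables both modes, because the commit only specifies levels 1 and above:

```c
#include <stdbool.h>

/* Hypothetical decoding of the level passed to ts_document_set_debug. */
typedef struct {
  bool debug_parse;
  bool debug_lex;
} DebugFlags;

DebugFlags debug_flags_for_level(int level) {
  DebugFlags flags;
  flags.debug_parse = level >= 1; /* 1 or more: parse debugging */
  flags.debug_lex = level > 1;    /* above 1: lex debugging as well */
  return flags;
}
```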
Max Brunsfeld
68d6e242ee Fix parsing of wildcard patterns at the ends of documents
- Remove special EOF handling from lexer
- Explicitly exclude the EOF character from all-inclusive character sets.
2014-09-11 13:10:23 -07:00
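When the lexer no longer treats EOF specially, the end-of-input marker becomes just another character. An all-inclusive set would then match it, and a wildcard repetition at the end of a document would not stop. The following is a minimal sketch of the fix, assuming EOF is represented by the sentinel code point 0. The sentinel value and the names `AllExceptSet` and `match_wildcard_repeat` are hypothetical:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption for this sketch: end of input is the sentinel code point 0. */
#define EOF_CHAR 0

/* An all-inclusive character set: it matches every code point except
   the ones listed in `excluded`. */
typedef struct {
  int32_t excluded[8];
  size_t excluded_count;
} AllExceptSet;

bool all_except_contains(const AllExceptSet *set, int32_t c) {
  for (size_t i = 0; i < set->excluded_count; i++)
    if (set->excluded[i] == c) return false;
  return true;
}

/* The set behind a wildcard pattern: everything except EOF. */
AllExceptSet wildcard_set(void) {
  AllExceptSet set;
  set.excluded[0] = EOF_CHAR;
  set.excluded_count = 1;
  return set;
}

/* Length of the longest match of a wildcard repetition at the start of
   `input`. Because EOF is excluded from the set, the loop stops at the
   sentinel instead of consuming it and running past the document. */
size_t match_wildcard_repeat(const int32_t *input) {
  AllExceptSet any = wildcard_set();
  size_t n = 0;
  while (all_except_contains(&any, input[n])) n++;
  return n;
}
```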
Max Brunsfeld
a2b80098b2 Fix destination directory in compile_examples spec 2014-09-11 12:46:46 -07:00
Max Brunsfeld
f05762b4a0 Move parser tests into their own file 2014-09-10 18:49:53 -07:00
Max Brunsfeld
3649ddb77a Move valgrind suppressions file into scripts directory 2014-09-10 18:25:09 -07:00
Max Brunsfeld
9682ef6c79 Rename examples directory to spec/fixtures
- I want to move away from having complete grammars for real languages
  (e.g. javascript, golang) in this repo. These languages take a long
  time to compile, and they now exist in their own repos
  (node-tree-sitter-javascript etc).
- I want to start testing more compiler edge cases through integration
  tests, so I want to put more small, weird grammars in here. That makes
  me not want to call the directory `examples`.
2014-09-10 13:31:06 -07:00
Max Brunsfeld
209992c832 Remove trailing whitespace 2014-09-10 13:19:45 -07:00
Max Brunsfeld
9a93f6bdef Clean up prepare_grammar function 2014-09-10 13:02:31 -07:00
Max Brunsfeld
cd8a683229 Improve error messages for invalid ubiquitous tokens 2014-09-10 13:02:16 -07:00
Max Brunsfeld
2e7ffb4d14 Tweak auto-format settings
Prefer lines that exceed 80 characters by a small margin to
line breaks in argument lists
2014-09-09 13:15:40 -07:00
Max Brunsfeld
e9dad529f5 Make descriptions more consistent in compiler specs 2014-09-09 13:01:18 -07:00
Max Brunsfeld
8f109504a8 Clean up extract_tokens function 2014-09-09 12:57:29 -07:00
Max Brunsfeld
9ee0665fad Remove unused code in extract_tokens.cc 2014-09-09 12:34:15 -07:00
Max Brunsfeld
17c6b05432 Update todo.md 2014-09-09 12:31:47 -07:00
Max Brunsfeld
e181426f6f Use make_tuple rather than init list syntax for gcc 2014-09-07 22:58:45 -07:00
Max Brunsfeld
1ff7cedf40 Unify ubiquitous tokens and lexical separators in API 2014-09-07 22:16:45 -07:00
Max Brunsfeld
a46f9d950c Handle '\s' correctly in regexps 2014-09-07 16:05:43 -07:00
Max Brunsfeld
2a9f51790f Move is_token function to its own file 2014-09-07 13:49:44 -07:00
Max Brunsfeld
ed11ef557a Fix expansion of repeat rules into recursive rules
Previously, the way repeat rules were expanded, the auxiliary
rule always needed to be reduced, even if the repeating content
was empty. This caused problems in parse states where some items
contained the repeat rule and some did not. To make those cases
work, the repeat rule had to explicitly be marked as optional.
With this change, that is no longer necessary.
2014-09-07 09:39:14 -07:00
Max Brunsfeld
43ecac2a1d Expose debug flag on document 2014-09-06 17:56:00 -07:00
Max Brunsfeld
c0a3f8d39c Remove some macros from public parser header 2014-09-05 23:47:38 -07:00
Max Brunsfeld
d3204d3526 Include '_' in '\w' regex character class 2014-09-05 18:41:12 -07:00
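In most regex dialects `\w` means letters, digits and underscore. The following is an ASCII-only sketch of that class (`is_word_char` is a hypothetical name). The commit does not say whether non-ASCII letters also count as word characters, and this sketch does not include them:

```c
#include <stdbool.h>
#include <stdint.h>

/* ASCII-only `\w`: letters, digits, and the underscore. */
bool is_word_char(int32_t c) {
  return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') ||
         (c >= '0' && c <= '9') || c == '_';
}
```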
Max Brunsfeld
8512af712e Add debug log when re-lexing during error handling 2014-09-05 18:38:17 -07:00
Max Brunsfeld
6cf267efaf Clean up breakdown stack function 2014-09-03 22:35:52 -07:00
Max Brunsfeld
9c0b5b5571 clang-format 2014-09-03 18:53:38 -07:00
Max Brunsfeld
3dea1261a6 Clean up document specs for incremental parsing 2014-09-03 18:48:10 -07:00
Max Brunsfeld
c72445d808 Fix inc parsing for nodes containing ubiq tokens 2014-09-03 13:17:06 -07:00
Max Brunsfeld
ad52bdc448 Fix inc parsing when appending to end of a token 2014-09-03 07:09:15 -07:00
Max Brunsfeld
77529ace3d Fix infinite loop in certain cases w/ unterminated tokens 2014-09-03 00:38:44 -07:00
Max Brunsfeld
7d81126df3 Remove unnecessary import of public header in specs 2014-09-02 22:17:04 -07:00
Max Brunsfeld
cc5f1471a8 Add debug lines for breaking down stack when re-parsing 2014-09-02 22:16:17 -07:00
Max Brunsfeld
66a50d4e4a Add separate debug and release configurations in gyp files
Disable optimizations in debug mode. Use that for specs.
2014-09-02 22:13:53 -07:00
Max Brunsfeld
545e575508 Revert "Remove the separator characters construct"
This reverts commit 5cd07648fd.

The separators construct is useful as an optimization. It turns out that
constructing a node for every chunk of whitespace in a document causes a
significant performance regression.

Conflicts:
	src/compiler/build_tables/build_lex_table.cc
	src/compiler/grammar.cc
	src/runtime/parser.c
2014-09-02 08:03:51 -07:00
Max Brunsfeld
e941f8c175 Fix error in document editing
When breaking down the stack in parser.c, the previous code
would not account for ubiquitous tokens. This was a problem
for a long time, but wasn't noticed until ubiquitous tokens
started being used to represent separator characters
2014-09-01 21:32:29 -07:00
Max Brunsfeld
5cd07648fd Remove the separator characters construct
Now, grammars can handle whitespace by making it another ubiquitous
token, like comments.

For now, this has the side effect of whitespace being included in the
tree that precedes it. This was already an issue for other ubiquitous
tokens though, so it needs to be fixed anyway.
2014-09-01 20:19:43 -07:00
Max Brunsfeld
db295cebbc Suppress unused variable warning in stack iteration macro 2014-09-01 14:16:27 -07:00
Max Brunsfeld
d38f095f01 Clean up Tree code 2014-09-01 14:08:07 -07:00