Commit graph

201 commits

Author SHA1 Message Date
Max Brunsfeld
c1639cc456 Add production_id_count field to Language objects
I think this is the last additional field that's needed so
that every array member of TSLanguage has a length that
can be calculated at runtime.
2021-02-25 16:32:05 -08:00
Max Brunsfeld
d8a235faa1 Add further static validation of named precedences 2021-02-25 11:54:21 -08:00
Max Brunsfeld
344797c110 Implement named precedence comparison 2021-02-24 16:02:56 -08:00
Max Brunsfeld
d40f118370 Generalize precedence datatype to include strings
Right now, the strings are not used in comparisons, but they
are passed through the grammar processing pipeline, and are
available to the parse table construction algorithm.

This also cleans up a confusing aspect of the parse table
construction, in which precedences and associativities were
temporarily stored in the parse table data structure itself.
2021-02-23 20:48:39 -08:00
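A minimal sketch of what a precedence datatype that also admits strings could look like. The names `Precedence`, `numeric`, and `compare` are assumptions made for illustration, not code taken from this commit. As the message above states, names are carried along but not yet used in comparisons, so here any comparison involving a name reports no preference. Treating a missing precedence as 0 is also an assumption.

```rust
use std::cmp::Ordering;

/// Hypothetical precedence value: absent, numeric, or named.
#[derive(Clone, Debug, PartialEq, Eq)]
pub enum Precedence {
    None,
    Integer(i32),
    Name(String),
}

impl Precedence {
    /// Numeric value used for ordering. Assumption: a missing or
    /// named precedence counts as 0.
    pub fn numeric(&self) -> i32 {
        match self {
            Precedence::Integer(n) => *n,
            _ => 0,
        }
    }
}

/// Compare two precedences. Named precedences pass through the
/// pipeline untouched but are ignored here, so they compare as equal.
pub fn compare(left: &Precedence, right: &Precedence) -> Ordering {
    match (left, right) {
        (Precedence::Name(_), _) | (_, Precedence::Name(_)) => Ordering::Equal,
        _ => left.numeric().cmp(&right.numeric()),
    }
}
```

With this shape, `compare(&Precedence::Integer(2), &Precedence::Integer(1))` yields `Ordering::Greater`, while any pair that includes a `Name` yields `Ordering::Equal`.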
Max Brunsfeld
2f28a35e1b Handle unicode property escapes inside bracketed char classes
Refs #906
2021-02-18 22:27:44 -08:00
Max Brunsfeld
29bc26ecd5 Fix test failure after non-terminal extras change 2021-02-18 15:43:01 -08:00
Max Brunsfeld
86a891fa63 Fix bugs in parser generation for non-terminal extras
Previously, we attempted to completely separate the parse states
for item sets with non-terminal extras from the parse states
for other rules. But there was not a complete separation.

It actually isn't necessary to separate the parse states in this way.
The only special behavior for parse states with non-terminal extra rules
is what happens at the *end* of the rule: these parse states need to
perform an unconditional reduction.

Luckily, it's possible to distinguish these *non-terminal extra ending*
states from other states just based on their normal structure, with
no additional state.
2021-02-18 14:14:22 -08:00
Max Brunsfeld
b46d51f224 Add a unit test for all unicode character escape forms 2021-02-17 17:49:01 -08:00
Max Brunsfeld
5b630054c6 Handle negated unicode property escapes in regexes
Refs #380
2021-02-17 17:22:33 -08:00
Max Brunsfeld
6ae04051e7 Tweak whitespace in generated character set functions 2021-02-17 16:32:49 -08:00
Max Brunsfeld
9d9eb2234f Merge pull request #906 from tree-sitter/unicode-property-escapes
Handle simple unicode property escapes in regexes
2021-02-17 16:14:42 -08:00
Max Brunsfeld
dad8546776 Generate more compact code for character set binary search 2021-02-17 13:52:23 -08:00
Max Brunsfeld
6132a10b1c Use binary search in generated character set functions 2021-02-17 13:08:56 -08:00
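The idea behind binary search over a character set can be sketched as follows, assuming the set is stored as sorted, non-overlapping, half-open ranges of code points. This is an illustrative Rust sketch, not the generated C code; the `CharacterSet` type shown here and its methods `from_ranges`, `range_count`, and `contains` are hypothetical.

```rust
use std::cmp::Ordering;
use std::ops::Range;

/// Hypothetical character set stored as sorted, non-overlapping,
/// half-open ranges of Unicode code points.
pub struct CharacterSet {
    ranges: Vec<Range<u32>>,
}

impl CharacterSet {
    /// Build a set from arbitrary ranges: drop empty ones, sort by
    /// start, and merge ranges that overlap or touch.
    pub fn from_ranges(mut ranges: Vec<Range<u32>>) -> Self {
        ranges.retain(|r| r.start < r.end);
        ranges.sort_by_key(|r| r.start);
        let mut merged: Vec<Range<u32>> = Vec::new();
        for r in ranges {
            if let Some(last) = merged.last_mut() {
                if r.start <= last.end {
                    last.end = last.end.max(r.end);
                    continue;
                }
            }
            merged.push(r);
        }
        Self { ranges: merged }
    }

    /// Number of ranges after merging.
    pub fn range_count(&self) -> usize {
        self.ranges.len()
    }

    /// Membership test by binary search over the ranges, so the cost
    /// grows logarithmically with the number of ranges.
    pub fn contains(&self, c: char) -> bool {
        let c = c as u32;
        self.ranges
            .binary_search_by(|r| {
                if r.end <= c {
                    Ordering::Less // range lies entirely below c
                } else if r.start > c {
                    Ordering::Greater // range lies entirely above c
                } else {
                    Ordering::Equal // c falls inside this range
                }
            })
            .is_ok()
    }
}
```

For example, building a set from `vec![97..123, 48..58, 100..110]` (lowercase letters, digits, and a redundant sub-range) leaves two ranges, and `contains('q')` then needs at most two comparisons against range bounds.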
Max Brunsfeld
f5a4c14dbe Add some doc comments to CharacterSet 2021-02-16 21:37:52 -08:00
Max Brunsfeld
2b0de9dfec Fix small bugs in conflict reporting
* Negative precedence values were not displayed
* Rule names were repeated in resolution suggestions
2021-02-01 13:30:06 -08:00
Max Brunsfeld
e3ba701344 Start work on handling unicode property escapes in regexes 2021-01-29 16:37:45 -08:00
Max Brunsfeld
38444ea7f9 Merge pull request #904 from tree-sitter/character-set-ranges
Represent CharacterSet internally as a vector of ranges
2021-01-29 13:35:48 -08:00
Andrew Hlynskyi
2b9e5f6c4b Fix hiding problems in ./build/Debug/tree_sitter_*_binding
Errors can also occur in modules built in debug mode, and the current
implementation hides them completely, so errors like 'undefined symbol'
can't be easily identified because the traceback and error message are wrong.
2021-01-29 15:54:10 +02:00
Max Brunsfeld
ab78ab3f9b Represent CharacterSet internally as a vector of ranges 2021-01-28 16:10:39 -08:00
Max Brunsfeld
026231e93d Merge branch 'master' into HEAD 2020-12-03 09:44:33 -08:00
Max Brunsfeld
3497f34dd7 Fix parser-generation bugs introduced in #782 2020-11-02 13:43:28 -08:00
Arthur Baars
d62e7f7d75 Add test case with extra_symbols 2020-10-30 10:58:41 +01:00
Arthur Baars
f07dda692e Ensure "extras" symbols are included in the node-types.json file
The symbols marked as "extras" are the start symbols of secondary
languages. These should be included in the aliases map just as done
for start symbol of the main language to ensure their node type and
field information is included in the node-types.json file.
2020-10-29 18:05:24 +01:00
Max Brunsfeld
071f4e40f1 Fix generate error when there are aliases in unused rules 2020-10-28 12:34:16 -07:00
Max Brunsfeld
a2d760e426 Ensure nodes are aliased consistently within syntax error nodes
Co-Authored-By: Rick Winfrey <rewinfrey@github.com>
2020-10-27 15:46:09 -07:00
Max Brunsfeld
8bb8e9b8b3 Initialize TSLanguage fields in order of their declaration
This makes parser.c valid under the C++20 standard
2020-10-15 07:20:12 -07:00
Patrick Thomson
683a2da055 Fix crash when extras function doesn't return an array.
Fixes #745, which failed due to attempting to call `map` on a
non-array. This bails out at the same spot, but with a more
illuminating error message.
2020-09-30 16:21:20 -04:00
Max Brunsfeld
ffd3bdc4c1 Escape ? in C string literals
Fixes #714
2020-09-23 13:06:06 -07:00
Max Brunsfeld
5003064da7 Make supertypes automatically hidden, without underscore prefix 2020-09-23 09:35:14 -07:00
Max Brunsfeld
b5a9adb555 Allow queries to match on supertypes
Co-authored-by: Ayman Nadeem <aymannadeem@github.com>
2020-09-21 12:34:48 -07:00
Max Brunsfeld
ff488f89c9 Make the --prev-abi flag work w/ the newest abi change 2020-09-08 10:58:20 -07:00
Max Brunsfeld
2eb04094f8 Handle aliased parent nodes in query analysis 2020-08-21 14:12:04 -07:00
Max Brunsfeld
1ea29053e1 Merge branch 'master' into query-pattern-is-definite 2020-08-14 09:31:55 -07:00
Max Brunsfeld
81bbdf19f4 Fix handling of non-terminal extras that share non-extra rules
Fixes #701
2020-07-29 09:50:13 -07:00
Max Brunsfeld
32099050d6 node_types: Fix panic when field is associated with a hidden token
Fixes #695
2020-07-24 09:26:56 -07:00
Max Brunsfeld
82aa1462fd Clean up get_variable_info function 2020-07-17 15:12:13 -07:00
Max Brunsfeld
c4fca5f73e node types: Fix handling of repetitions inside of fields
Fixes #676
2020-07-17 14:19:59 -07:00
Max Brunsfeld
f4adf0269a Propagate dynamic precedence correctly for inlined rules
Fixes #683
2020-07-17 09:53:01 -07:00
Max Brunsfeld
4c2f36a07b Mark steps as definite on query construction
* Add a ts_query_pattern_is_definite API, just for debugging this
* Store state_count on TSLanguage structs, to allow for scanning parse tables
2020-06-25 15:06:27 -07:00
Max Brunsfeld
a6f71328fe Avoid whitelist/blacklist terminology in test comments 2020-06-16 09:22:34 -07:00
Max Brunsfeld
ec870e9e66 Avoid extracting helpers for char sets that are only used once 2020-05-26 16:37:45 -07:00
Max Brunsfeld
911fb7f1b2 Extract helper functions to reduce the code size of the lexer function (#626)
* Extract helper functions to reduce code size of ts_lex

* Name char set helper functions based on token name
2020-05-26 13:39:11 -07:00
Max Brunsfeld
9d182bb078 node-types: Fix bug w/ required property when multiple rules aliased as same 2020-05-14 10:51:18 -07:00
Max Brunsfeld
b66d149b74 Fix inconsistent whitespace after '{' in generated parser 2020-05-13 15:56:49 -07:00
Max Brunsfeld
cdc973866f Fix build-wasm command on latest emscripten 2020-05-12 15:42:11 -07:00
Riccardo Schirone
780e9cecc9 Do not use multiple unnamed structs inside of unions 2020-04-29 20:42:45 +02:00
Max Brunsfeld
a003e5f6bd generate: Avoid duplicate string tokens in unique symbol map 2020-03-20 11:35:11 -07:00
Max Brunsfeld
6cb8d24de2 Merge pull request #542 from SKalt/issue-524-document-supertypes-in-grammar-schema
feat(cli): documented optional supertypes string[] in grammar schema
2020-02-24 16:14:49 -08:00
Steven Kalt
d82ee739e9 Update cli/src/generate/grammar-schema.json
Co-Authored-By: Max Brunsfeld <maxbrunsfeld@github.com>
2020-02-24 18:13:38 -05:00
Alyssa Verkade
0e689657b7 Add a language linkage declaration to parsers
Previously, in order to compile a `tree-sitter` grammar that contained
C++ source in the parser (i.e. the `scanner.cc` file), you had to
compile the `parser.c` file separately from the C++ files. For example,
in Rust this would result in a `build.rs` close to the following:
```
extern crate cc;

use std::path::PathBuf;

fn main() {
  let dir: PathBuf = ["tree-sitter-ruby", "src"].iter().collect();

  // Compile the C++ scanner as its own static library.
  cc::Build::new()
    .include(&dir)
    .cpp(true)
    .file(dir.join("scanner.cc"))
    // NOTE: must have a name that differs from the C static lib
    .compile("tree-sitter-ruby-scanner");

  // Compile the C parser separately.
  cc::Build::new()
    .include(&dir)
    .file(dir.join("parser.c"))
    // NOTE: must have a name that differs from the C++ static lib
    .compile("tree-sitter-ruby-parser");
}
```

This was necessary at the time for the following grammars: `ruby`,
`php`, `python`, `embedded-template`, `html`, `cpp`, `ocaml`,
`bash`, `agda`, and `haskell`.

To solve this, we add an `extern "C"` language linkage declaration
to the functions that must be linked against to compile a parser with the
scanner, making parsers linkable against C++ source.
On all major compilers (GCC, Clang, and MSVC) this should be the only
change needed: Clang and GCC have both supported designated
initializers for years, and MSVC 2019 adopted designated
initializers as part of its C++20 conformance push.

With this change, the necessary `build.rs` for Rust projects becomes
the following, which also brings these parsers into sync with the current docs:
```
extern crate cc;

use std::path::PathBuf;

fn main() {
  let dir: PathBuf = ["tree-sitter-ruby", "src"].iter().collect();

  // Compile the scanner and the parser together as a single library.
  cc::Build::new()
    .include(&dir)
    .cpp(true)
    .file(dir.join("scanner.cc"))
    .file(dir.join("parser.c"))
    .compile("tree-sitter-ruby");
}
```
2020-02-18 19:46:59 -08:00