Commit graph

51 commits

Author SHA1 Message Date
Max Brunsfeld
a2d760e426 Ensure nodes are aliased consistently within syntax error nodes
Co-Authored-By: Rick Winfrey <rewinfrey@github.com>
2020-10-27 15:46:09 -07:00
Max Brunsfeld
8bb8e9b8b3 Initialize TSLanguage fields in order of their declaration
This makes parser.c valid under the C++20 standard
2020-10-15 07:20:12 -07:00
Max Brunsfeld
ffd3bdc4c1 Escape ? in C string literals
Fixes #714
2020-09-23 13:06:06 -07:00
Max Brunsfeld
b5a9adb555 Allow queries to match on supertypes
Co-authored-by: Ayman Nadeem <aymannadeem@github.com>
2020-09-21 12:34:48 -07:00
Max Brunsfeld
ff488f89c9 Make the --prev-abi flag work w/ the newest abi change 2020-09-08 10:58:20 -07:00
Max Brunsfeld
2eb04094f8 Handle aliased parent nodes in query analysis 2020-08-21 14:12:04 -07:00
Max Brunsfeld
4c2f36a07b Mark steps as definite on query construction
* Add a ts_query_pattern_is_definite API, just for debugging this
* Store state_count on TSLanguage structs, to allow for scanning parse tables
2020-06-25 15:06:27 -07:00
Max Brunsfeld
ec870e9e66 Avoid extracting helpers for char sets that are only used once 2020-05-26 16:37:45 -07:00
Max Brunsfeld
911fb7f1b2 Extract helper functions to reduce the code size of the lexer function (#626)
* Extract helper functions to reduce code size of ts_lex
* Name char set helper functions based on token name
2020-05-26 13:39:11 -07:00
Max Brunsfeld
b66d149b74 Fix inconsistent whitespace after '{' in generated parser 2020-05-13 15:56:49 -07:00
Max Brunsfeld
cdc973866f Fix build-wasm command on latest emscripten 2020-05-12 15:42:11 -07:00
Riccardo Schirone
780e9cecc9 Do not use multiple unnamed structs inside of unions 2020-04-29 20:42:45 +02:00
Max Brunsfeld
a003e5f6bd generate: Avoid duplicate string tokens in unique symbol map 2020-03-20 11:35:11 -07:00
Alyssa Verkade
0e689657b7 Add a language linkage declaration to parsers
Previously, in order to compile a `tree-sitter` grammar that contained
C++ source in the parser (i.e. the `scanner.cc` file), you had to
compile the `parser.c` file separately from the C++ files. For example,
in Rust this would result in a `build.rs` close to the following:
```
extern crate cc;

use std::path::PathBuf;

fn main() {
  let dir: PathBuf = ["tree-sitter-ruby", "src"].iter().collect();

  cc::Build::new()
    .include(&dir)
    .cpp(true)
    .file(dir.join("scanner.cc"))
    // NOTE: must have a name that differs from the c static lib
    .compile("tree-sitter-ruby-scanner");

  cc::Build::new()
    .include(&dir)
    .file(dir.join("parser.c"))
    // NOTE: must have a name that differs from the c++ static lib
    .compile("tree-sitter-ruby-parser");
}
```

This was necessary at the time for the following grammars: `ruby`,
`php`, `python`, `embedded-template`, `html`, `cpp`, `ocaml`,
`bash`, `agda`, and `haskell`.

To solve this, we add an `extern "C"` language linkage declaration
to the functions that must be linked against to compile a parser with the
scanner, making parsers linkable against C++ source.
On all major compilers (gcc, clang, and msvc) this should be the only
change needed: clang and gcc have both supported designated
initializers for years, and msvc 2019 adopted them as part of its
C++20 conformance push.
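
As a rough illustration of the linkage declaration described above, the sketch below shows how a code generator written in Rust might wrap emitted C declarations in a C++-aware guard. The helper name `wrap_in_extern_c` and the sample declaration are assumptions made for this example. They are not taken from the tree-sitter generator, whose actual rendering code may differ.

```rust
/// Wrap generated C declarations in a linkage guard so that the same
/// source compiles as C and links correctly when built as C++.
///
/// NOTE: `wrap_in_extern_c` is a hypothetical helper for illustration;
/// it is not part of the tree-sitter code generator.
fn wrap_in_extern_c(body: &str) -> String {
    let mut out = String::new();
    // Only C++ compilers define __cplusplus, so plain C builds skip
    // the `extern "C"` block entirely.
    out.push_str("#ifdef __cplusplus\nextern \"C\" {\n#endif\n\n");
    // Normalize trailing whitespace so the closing guard is spaced consistently.
    out.push_str(body.trim_end());
    out.push_str("\n\n#ifdef __cplusplus\n}\n#endif\n");
    out
}
```

Calling `wrap_in_extern_c("const TSLanguage *tree_sitter_ruby(void);")` yields the declaration surrounded by the opening and closing guards, so a C++ compiler sees the symbol with C linkage while a C compiler sees only the bare declaration.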

Subsequently, for Rust projects, the necessary `build.rs` becomes
the following (which also brings these parsers into sync with the current docs):
```
extern crate cc;

use std::path::PathBuf;

fn main() {
  let dir: PathBuf = ["tree-sitter-ruby", "src"].iter().collect();

  cc::Build::new()
    .include(&dir)
    .cpp(true)
    .file(dir.join("scanner.cc"))
    .file(dir.join("parser.c"))
    .compile("tree-sitter-ruby");
}
```
2020-02-18 19:46:59 -08:00
Max Brunsfeld
8dd68c360a Fix logic for generating unique symbol map
Previously, this didn't correctly handle the case where *multiple* 
symbols were all simply-aliased to the same *other* symbol.

Refs #500
2020-01-27 12:06:48 -08:00
Max Brunsfeld
fc19312913 Fix node-types bugs involving aliases and external tokens 2019-12-12 10:06:18 -08:00
Max Brunsfeld
a5a9000e29 generate: Ensure that field_map_slices array is long enough 2019-12-09 11:46:32 -08:00
Max Brunsfeld
7032dae4f6 Include alias symbols in unique symbol map 2019-12-06 12:11:09 -08:00
Max Brunsfeld
56c620c005 Store a mapping to ensure no two symbols map to the same metadata 2019-12-05 17:21:46 -08:00
Max Brunsfeld
5767bbc806 Avoid generating C char literals with control characters
Fixes #487
2019-11-13 10:54:34 -08:00
Max Brunsfeld
d765332c61 Don't rely on new eof ABI in parsers unless --next-abi is passed 2019-10-31 14:32:50 -07:00
Max Brunsfeld
d3b7caa565 Add a TSLexer.eof() API, use it in generated parsers 2019-10-31 14:11:52 -07:00
Max Brunsfeld
fcaabea0cf Allow non-terminal extras 2019-10-21 16:08:59 -07:00
Max Brunsfeld
69ab405325 In next ABI, group symbols by action in small parse state table
This is a more compact representation because in most states, many 
symbols share the same actions.
2019-08-30 20:29:55 -07:00
Max Brunsfeld
8037607583 Only generate the new parse table format if --next-abi flag is used 2019-08-29 17:37:33 -07:00
Max Brunsfeld
82ff542d3b Appease MSVC by avoiding empty arrays 2019-08-29 17:31:44 -07:00
Max Brunsfeld
09a2755399 Store parse states with few lookahead symbols in a more compact way 2019-08-29 15:52:23 -07:00
Max Brunsfeld
48a883c1d4 Move external token state id computation out of render module 2019-08-29 15:48:22 -07:00
Max Brunsfeld
2430733ee8 Avoid iterating hashmaps in places where order matters 2019-08-29 15:26:05 -07:00
Max Brunsfeld
56ce4e5d50 Upgrade rsass, remove hashbrown 2019-08-13 10:08:58 -07:00
Max Brunsfeld
5f369a5870 Fix another empty array literal for MSVC compatibility 2019-08-12 15:13:41 -07:00
Max Brunsfeld
13c0aa7dbb Avoid empty initializer list for ts_alias_sequences
Fixes a bug introduced in 68b089b41e
2019-08-12 14:11:40 -07:00
Max Brunsfeld
68b089b41e cli: Fix generation of parsers with fields but no aliases
Fixes #419
2019-08-11 09:22:30 -07:00
Max Brunsfeld
5b38ff5f78 Loosen lex state equality check to catch some spurious duplicates 2019-06-20 09:57:38 -07:00
Max Brunsfeld
e4873191d6 Refactor generated lex function to use fewer instructions per state 2019-06-20 09:57:38 -07:00
Max Brunsfeld
5035e194ff Merge branch 'master' into node-fields 2019-03-26 11:58:21 -07:00
Max Brunsfeld
5a59f19b69 Use explicit syntax for functions with no parameters 2019-03-21 16:06:06 -07:00
Max Brunsfeld
56309a1c28 Generate node-fields.json file 2019-02-12 11:06:18 -08:00
Max Brunsfeld
79d90f0d3e Restore naming of alias sequence lengths
Fields aren't stored in sequences now, so the max length
is back to being just for aliases.
2019-02-08 16:14:18 -08:00
Max Brunsfeld
d8a2c0dda2 Use a separate type for storing field map headers 2019-02-08 16:06:29 -08:00
Max Brunsfeld
1d1674811c Fully implement ts_node_child_by_field_id 2019-02-08 15:16:56 -08:00
Max Brunsfeld
18a13b457d Get basic field API working 2019-02-08 15:16:56 -08:00
Max Brunsfeld
108ca989ea Start work on including child refs in generated parsers 2019-02-08 15:16:56 -08:00
Max Brunsfeld
4badd7cc40 Disable compiler optimizations for lex functions in more cases
* Reduce the lexer state count threshold from 500 to 300
* Disable optimizations on clang and gcc in addition to MSVC

Optimizations in these source files don't seem to make any impact on
parsing performance, but they slow down compile time substantially.
2019-02-06 11:50:37 -08:00
Max Brunsfeld
ed195de8b6 rustfmt 2019-01-17 17:16:04 -08:00
Max Brunsfeld
19b2addcc4 Fix bug in symbol enum code generation 2019-01-14 14:08:07 -08:00
Max Brunsfeld
2e009f7177 Avoid writing empty initializer list for alias sequences 2019-01-12 21:57:34 -08:00
Max Brunsfeld
545e840a08 Remove stray single quotes in symbol name strings 2019-01-12 21:42:31 -08:00
Max Brunsfeld
c76a155174 Fix escaping of characters in C strings 2019-01-11 17:43:27 -08:00
Max Brunsfeld
6592fdd24c Fix parser generation error messages 2019-01-11 17:26:45 -08:00