tree-sitter

Author	SHA1	Message	Date
Andrew Hlynskyi	3c0152a331	chore(fmt): Apply 'cargo fmt' to the whole code base	2021-05-19 23:21:43 +03:00
Markus F.X.J. Oberhumer	cc519b3121	cli: Improve const-correctness of the generated parsers (part 2 of 2). This is a follow-up to my previous commit `1badd131f9`. I've made this an extra patch as it requires a minor API change in <tree_sitter/parser.h>. This commit moves the remaining generated tables into the read-only segment. Before: $ for f in bash c cpp go html java javascript jsdoc json php python ruby rust; do \ gcc -o $f.o -O2 -Ilib/include -c test/fixtures/grammars/$f/src/parser.c; \ done $ size --totals *.o text data bss dec hex filename 5353477 24472 0 5377949 520f9d (TOTALS) After: $ for f in bash c cpp go html java javascript jsdoc json php python ruby rust; do \ gcc -o $f.o -O2 -Ilib/include -c test/fixtures/grammars/$f/src/parser.c; \ done $ size --totals *.o 5378147 0 0 5378147 521063 (TOTALS)	2021-05-19 12:49:57 +02:00
Markus F.X.J. Oberhumer	1badd131f9	cli: Improve const-correctness of the generated parsers. This moves most of the generated tables from the data segment into the text segment (read-only memory) so that it can be shared between different processes. As a bonus side effect we can also remove all casts in the generated parsers. Before: size --totals target/scratch/*.so text data bss dec hex filename 853623 4684560 2160 5540343 5489f7 (TOTALS) After: size --totals target/scratch/*.so text data bss dec hex filename 5472086 68616 480 5541182 548d3e (TOTALS)	2021-04-27 09:22:18 +02:00
Max Brunsfeld	57036b4f8a	Extract lexer helper functions for all large char sets. No need to restrict it to char sets used in multiple places. This is important because the helper functions are now implemented more efficiently than the inline comparisons (using a binary search).	2021-03-11 11:48:48 -08:00
Max Brunsfeld	592fd8678d	Organize TSLanguage fields. Due to the breaking ABI change in #943, this is our chance to reorder the fields in a more logical way.	2021-03-01 10:27:22 -08:00
Max Brunsfeld	d56f9ebe4e	Re-enable --prev-abi flag to generate command	2021-02-26 14:51:01 -08:00
Max Brunsfeld	c1639cc456	Add production_id_count field to Language objects. I think this is the last additional field that's needed so that every array member of TSLanguage has a length that can be calculated at runtime.	2021-02-25 16:32:05 -08:00
Max Brunsfeld	29bc26ecd5	Fix test failure after non-terminal extras change	2021-02-18 15:43:01 -08:00
Max Brunsfeld	86a891fa63	Fix bugs in parser generation for non-terminal extras. Previously, we attempted to completely separate the parse states for item sets with non-terminal extras from the parse states for other rules. But there was not a complete separation. It actually isn't necessary to separate the parse states in this way. The only special behavior for parse states with non-terminal extra rules is what happens at the end of the rule: these parse states need to perform an unconditional reduction. Luckily, it's possible to distinguish these non-terminal extra ending states from other states just based on their normal structure, with no additional state.	2021-02-18 14:14:22 -08:00
Max Brunsfeld	6ae04051e7	Tweak whitespace in generated character set functions	2021-02-17 16:32:49 -08:00
Max Brunsfeld	dad8546776	Generate more compact code for character set binary search	2021-02-17 13:52:23 -08:00
Max Brunsfeld	6132a10b1c	Use binary search in generated character set functions	2021-02-17 13:08:56 -08:00
Max Brunsfeld	ab78ab3f9b	Represent CharacterSet internally as a vector of ranges	2021-01-28 16:10:39 -08:00
Max Brunsfeld	3497f34dd7	Fix parser-generation bugs introduced in #782	2020-11-02 13:43:28 -08:00
Max Brunsfeld	071f4e40f1	Fix generate error when there are aliases in unused rules	2020-10-28 12:34:16 -07:00
Max Brunsfeld	a2d760e426	Ensure nodes are aliased consistently within syntax error nodes. Co-Authored-By: Rick Winfrey <rewinfrey@github.com>	2020-10-27 15:46:09 -07:00
Max Brunsfeld	8bb8e9b8b3	Initialize TSLanguage fields in order of their declaration. This makes parser.c valid under the C++20 standard.	2020-10-15 07:20:12 -07:00
Max Brunsfeld	ffd3bdc4c1	Escape ? in C string literals. Fixes #714	2020-09-23 13:06:06 -07:00
Max Brunsfeld	b5a9adb555	Allow queries to match on supertypes. Co-authored-by: Ayman Nadeem <aymannadeem@github.com>	2020-09-21 12:34:48 -07:00
Max Brunsfeld	ff488f89c9	Make the --prev-abi flag work w/ the newest abi change	2020-09-08 10:58:20 -07:00
Max Brunsfeld	2eb04094f8	Handle aliased parent nodes in query analysis	2020-08-21 14:12:04 -07:00
Max Brunsfeld	4c2f36a07b	Mark steps as definite on query construction. (1) Add a ts_query_pattern_is_definite API, just for debugging this. (2) Store state_count on TSLanguage structs, to allow for scanning parse tables.	2020-06-25 15:06:27 -07:00
Max Brunsfeld	ec870e9e66	Avoid extracting helpers for char sets that are only used once	2020-05-26 16:37:45 -07:00
Max Brunsfeld	911fb7f1b2	Extract helper functions to reduce the code size of the lexer function (#626). (1) Extract helper functions to reduce code size of ts_lex. (2) Name char set helper functions based on token name.	2020-05-26 13:39:11 -07:00
Max Brunsfeld	b66d149b74	Fix inconsistent whitespace after '{' in generated parser	2020-05-13 15:56:49 -07:00
Max Brunsfeld	cdc973866f	Fix build-wasm command on latest emscripten	2020-05-12 15:42:11 -07:00
Riccardo Schirone	780e9cecc9	Do not use multiple unnamed structs inside of unions	2020-04-29 20:42:45 +02:00
Max Brunsfeld	a003e5f6bd	generate: Avoid duplicate string tokens in unique symbol map	2020-03-20 11:35:11 -07:00
Alyssa Verkade	0e689657b7	Add a language linkage declaration to parsers. Previously, in order to compile a `tree-sitter` grammar that contained C++ source in the parser (i.e. the `scanner.cc` file), you would have to compile the `parser.c` file separately from the C++ files. For example, in Rust this would result in a `build.rs` close to the following: ``` extern crate cc; fn main() { let dir: PathBuf = ["tree-sitter-ruby", "src"].iter().collect(); cc::Build::new() .include(&dir) .cpp(true) .file(dir.join("scanner.cc")) // NOTE: must have a name that differs from the c static lib .compile("tree-sitter-ruby-scanner"); cc::Build::new() .include(&dir) .file(dir.join("parser.c")) // NOTE: must have a name that differs from the c++ static lib .compile("tree-sitter-ruby-parser"); } ``` This was necessary at the time for the following grammars: `ruby`, `php`, `python`, `embedded-template`, `html`, `cpp`, `ocaml`, `bash`, `agda`, and `haskell`. To solve this, we specify an `extern "C"` language linkage declaration for the functions that must be linked against to compile a parser with the scanner, making parsers linkable against C++ source. On all major compilers (gcc, clang, and msvc) this should be the only change needed, because clang and gcc have both supported designated initialization for years and msvc 2019 adopted designated initializers as part of the C++20 conformance push. Subsequently, for Rust projects, the necessary `build.rs` would become (which also brings these parsers into sync with the current docs): ``` extern crate cc; fn main() { let dir: PathBuf = ["tree-sitter-ruby", "src"].iter().collect(); cc::Build::new() .include(&dir) .cpp(true) .file(dir.join("scanner.cc")) .file(dir.join("parser.c")) .compile("tree-sitter-ruby"); } ```	2020-02-18 19:46:59 -08:00
Max Brunsfeld	8dd68c360a	Fix logic for generating unique symbol map. Previously, this didn't correctly handle the case where multiple symbols were all simply-aliased to the same other symbol. Refs #500	2020-01-27 12:06:48 -08:00
Max Brunsfeld	fc19312913	Fix node-types bugs involving aliases and external tokens	2019-12-12 10:06:18 -08:00
Max Brunsfeld	a5a9000e29	generate: Ensure that field_map_slices array is long enough	2019-12-09 11:46:32 -08:00
Max Brunsfeld	7032dae4f6	Include alias symbols in unique symbol map	2019-12-06 12:11:09 -08:00
Max Brunsfeld	56c620c005	Store a mapping to ensure no two symbols map to the same metadata	2019-12-05 17:21:46 -08:00
Max Brunsfeld	5767bbc806	Avoid generating C char literals with control characters. Fixes #487	2019-11-13 10:54:34 -08:00
Max Brunsfeld	d765332c61	Don't rely on new eof ABI in parsers unless --next-abi is passed	2019-10-31 14:32:50 -07:00
Max Brunsfeld	d3b7caa565	Add a TSLexer.eof() API, use it in generated parsers	2019-10-31 14:11:52 -07:00
Max Brunsfeld	fcaabea0cf	Allow non-terminal extras	2019-10-21 16:08:59 -07:00
Max Brunsfeld	69ab405325	In next ABI, group symbols by action in small parse state table. This is a more compact representation because in most states, many symbols share the same actions.	2019-08-30 20:29:55 -07:00
Max Brunsfeld	8037607583	Only generate the new parse table format if --next-abi flag is used	2019-08-29 17:37:33 -07:00
Max Brunsfeld	82ff542d3b	Appease MSVC by avoiding empty arrays	2019-08-29 17:31:44 -07:00
Max Brunsfeld	09a2755399	Store parse states with few lookahead symbols in a more compact way	2019-08-29 15:52:23 -07:00
Max Brunsfeld	48a883c1d4	Move external token state id computation out of render module	2019-08-29 15:48:22 -07:00
Max Brunsfeld	2430733ee8	Avoid iterating hashmaps in places where order matters	2019-08-29 15:26:05 -07:00
Max Brunsfeld	56ce4e5d50	Upgrade rsass, remove hashbrown	2019-08-13 10:08:58 -07:00
Max Brunsfeld	5f369a5870	Fix another empty array literal for MSVC compatibility	2019-08-12 15:13:41 -07:00
Max Brunsfeld	13c0aa7dbb	Avoid empty initializer list for ts_alias_sequences. Fixes a bug introduced in `68b089b41e`	2019-08-12 14:11:40 -07:00
Max Brunsfeld	68b089b41e	cli: Fix generation of parsers with fields but no aliases. Fixes #419	2019-08-11 09:22:30 -07:00
Max Brunsfeld	5b38ff5f78	Loosen lex state equality check to catch some spurious duplicates	2019-06-20 09:57:38 -07:00
Max Brunsfeld	e4873191d6	Refactor generated lex function to use fewer instructions per state	2019-06-20 09:57:38 -07:00
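
Note on 6132a10b1c, 57036b4f8a and 1badd131f9: those commits describe character set helpers that use a binary search over ranges, and generated tables that live in read-only memory. The following is only a minimal sketch of that general idea, assuming a sorted table of non-overlapping inclusive ranges. The names `CharRange`, `ident_start_ranges` and `char_set_contains` are hypothetical, and this is not the code the tree-sitter generator actually emits.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical inclusive code point range; not a tree-sitter type. */
typedef struct {
  int32_t start;
  int32_t end;
} CharRange;

/* Illustrative set, sorted by start and non-overlapping.
 * Declared `static const` so the table can be placed in read-only memory. */
static const CharRange ident_start_ranges[] = {
  {'A', 'Z'}, {'_', '_'}, {'a', 'z'}, {0xC0, 0xD6}, {0xD8, 0xF6},
};

/* Binary search: returns true if `c` falls inside any of the `len` ranges. */
static bool char_set_contains(const CharRange *ranges, size_t len, int32_t c) {
  size_t lo = 0, hi = len;
  while (lo < hi) {
    size_t mid = lo + (hi - lo) / 2;
    if (c < ranges[mid].start) {
      hi = mid;            /* c lies before this range */
    } else if (c > ranges[mid].end) {
      lo = mid + 1;        /* c lies after this range */
    } else {
      return true;         /* start <= c <= end */
    }
  }
  return false;
}
```

With n ranges, a lookup costs O(log n) comparisons rather than one comparison pair per range, which is presumably why the helpers pay off for large character sets.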


66 commits