Max Brunsfeld 774ae5e3d5 In parse tables, store production ids as 16 bits Also remove the use of bitfields from the parse table format. In all cases, bitfields were not necessary to achieve the current binary sizes. Avoiding them makes the binaries more portable. There was no way to make this change backward-compatible, so we have finally dropped support for parsers generated with an earlier version of Tree-sitter. At some point, when Atom adopts this version of Tree-sitter, this change will affect Atom users who have installed packages using third-party Tree-sitter parsers. The packages will need to be updated to use a regenerated version of the parsers.		2021-02-25 16:12:31 -08:00
allocations.rs	Move allocation tracking into lib crate	2021-02-23 09:16:37 -05:00
bindings.rs	In parse tables, store production ids as 16 bits	2021-02-25 16:12:31 -08:00
build.rs	Simplify setup for enabling/disabling allocation recording in the C lib	2020-12-02 15:35:13 -08:00
ffi.rs	Reorganize language bindings	2019-05-07 10:41:49 -07:00
lib.rs	Move allocation tracking into lib crate	2021-02-23 09:16:37 -05:00
README.md	Tweak readmes	2020-05-12 16:16:48 -07:00
util.rs	Move allocation tracking into lib crate	2021-02-23 09:16:37 -05:00

README.md

Rust Tree-sitter

Rust bindings to the Tree-sitter parsing library.

Basic Usage

First, create a parser:

use tree_sitter::{InputEdit, Language, Parser, Point};

let mut parser = Parser::new();

Tree-sitter languages consist of generated C code. To make sure they're properly compiled and linked, you can create a build script like the following (assuming tree-sitter-javascript is in your root directory):

use std::path::PathBuf;

fn main() {
    let dir: PathBuf = ["tree-sitter-javascript", "src"].iter().collect();

    cc::Build::new()
        .include(&dir)
        .file(dir.join("parser.c"))
        .file(dir.join("scanner.c"))
        .compile("tree-sitter-javascript");
}

Add the cc crate to your Cargo.toml under [build-dependencies]:

[build-dependencies]
cc="*"
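
As an optional addition to the build script, Cargo can be told to re-run it only when the grammar sources change, by printing `cargo:rerun-if-changed` lines. The sketch below only builds those lines; `rerun_directives` is a hypothetical helper name, not part of the `cc` crate or of Tree-sitter, and the file names are the ones assumed in the build script above.

```rust
use std::path::Path;

/// Hypothetical helper: build one `cargo:rerun-if-changed` directive per
/// grammar source file. Printing each returned line from `build.rs` asks
/// Cargo to re-run the build script when that file changes.
pub fn rerun_directives(dir: &Path, files: &[&str]) -> Vec<String> {
    files
        .iter()
        .map(|name| format!("cargo:rerun-if-changed={}", dir.join(name).display()))
        .collect()
}
```

In `main`, this could be used as `for line in rerun_directives(&dir, &["parser.c", "scanner.c"]) { println!("{}", line); }` before the `cc::Build` call.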

To use these languages from Rust, declare them as extern "C" functions and call them inside an unsafe block. You can then assign the resulting language to the parser.

extern "C" { fn tree_sitter_c() -> Language; }
extern "C" { fn tree_sitter_rust() -> Language; }
extern "C" { fn tree_sitter_javascript() -> Language; }

let language = unsafe { tree_sitter_rust() };
parser.set_language(language).unwrap();

Now you can parse source code:

let source_code = "fn test() {}";
let tree = parser.parse(source_code, None).unwrap();
let root_node = tree.root_node();

assert_eq!(root_node.kind(), "source_file");
assert_eq!(root_node.start_position().column, 0);
assert_eq!(root_node.end_position().column, 12);

Editing

Once you have a syntax tree, you can update it when your source code changes. First edit the old tree to describe the change, then pass that edited tree to parse, which lets it run much more quickly than parsing from scratch:

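let new_source_code = "fn test(a: u32) {}";

// `edit` takes `&mut self`, so rebind the tree from the previous example as mutable.
let mut tree = tree;

// Describe the insertion of "a: u32" (6 bytes) at byte 8 of the old source.
tree.edit(&InputEdit {
  start_byte: 8,
  old_end_byte: 8,
  new_end_byte: 14,
  start_position: Point::new(0, 8),
  old_end_position: Point::new(0, 8),
  new_end_position: Point::new(0, 14),
});

let new_tree = parser.parse(new_source_code, Some(&tree)).unwrap();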
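
The byte offsets and positions in the edit above were worked out by hand. As a rough sketch of how they could be derived from the old text and the inserted string, the code below uses local stand-in `Point` and `InputEdit` structs so that it compiles without the tree_sitter crate; `point_at` and `describe_edit` are hypothetical helper names, and columns are assumed to be counted in bytes.

```rust
/// Stand-in for `tree_sitter::Point`, defined locally for this sketch.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Point {
    pub row: usize,
    pub column: usize,
}

/// Stand-in for `tree_sitter::InputEdit`, using the field names from the example above.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct InputEdit {
    pub start_byte: usize,
    pub old_end_byte: usize,
    pub new_end_byte: usize,
    pub start_position: Point,
    pub old_end_position: Point,
    pub new_end_position: Point,
}

/// Convert a byte offset into a zero-based row and a byte column.
pub fn point_at(text: &str, byte: usize) -> Point {
    let before = &text.as_bytes()[..byte];
    let row = before.iter().filter(|&&b| b == b'\n').count();
    // The current line starts just after the last newline, or at byte 0.
    let line_start = before.iter().rposition(|&b| b == b'\n').map_or(0, |i| i + 1);
    Point { row, column: byte - line_start }
}

/// Hypothetical helper: replace `old_text[start..old_end]` with `inserted`
/// and return the new text together with the matching edit description.
/// Panics if `start` or `old_end` is not on a UTF-8 character boundary.
pub fn describe_edit(
    old_text: &str,
    start: usize,
    old_end: usize,
    inserted: &str,
) -> (String, InputEdit) {
    let mut new_text = String::new();
    new_text.push_str(&old_text[..start]);
    new_text.push_str(inserted);
    new_text.push_str(&old_text[old_end..]);

    let new_end = start + inserted.len();
    let edit = InputEdit {
        start_byte: start,
        old_end_byte: old_end,
        new_end_byte: new_end,
        start_position: point_at(old_text, start),
        old_end_position: point_at(old_text, old_end),
        new_end_position: point_at(&new_text, new_end),
    };
    (new_text, edit)
}
```

Calling `describe_edit("fn test() {}", 8, 8, "a: u32")` yields the text `fn test(a: u32) {}` and the same six values as the hand-written edit. With the real crate, those values would be copied into `tree_sitter::InputEdit` before calling `tree.edit`.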

Text Input

The source code to parse can be provided as a string, a slice, a vector, or a function that returns a slice. The text can be encoded as either UTF-8 or UTF-16:

// Store some source code in an array of lines.
let lines = &[
    "pub fn foo() {",
    "  1",
    "}",
];

// Parse the source code using a custom callback. The callback is called
// with both a byte offset and a row/column offset.
let tree = parser.parse_with(&mut |_byte: usize, position: Point| -> &[u8] {
    let row = position.row as usize;
    let column = position.column as usize;
    if row < lines.len() {
        if column < lines[row].as_bytes().len() {
            &lines[row].as_bytes()[column..]
        } else {
            "\n".as_bytes()
        }
    } else {
        &[]
    }
}, None).unwrap();

assert_eq!(
  tree.root_node().to_sexp(),
  "(source_file (function_item (visibility_modifier) (identifier) (parameters) (block (number_literal))))"
);
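
To see what text the callback above actually presents, the chunk logic can be pulled into a standalone function and driven by a small loop. This is only a simulation for checking a callback by hand, not a description of how Tree-sitter reads input internally; `chunk_at` and `reassemble` are hypothetical names, and the loop assumes that an empty chunk marks the end of input.

```rust
/// The same chunk logic as the callback above, written as a plain function:
/// the rest of the current line, a newline once the line is exhausted,
/// or an empty slice past the last line.
pub fn chunk_at<'a>(lines: &[&'a str], row: usize, column: usize) -> &'a [u8] {
    if row < lines.len() {
        let line = lines[row].as_bytes();
        if column < line.len() {
            &line[column..]
        } else {
            "\n".as_bytes()
        }
    } else {
        &[]
    }
}

/// Hypothetical driver: request chunks from (0, 0) onward, advancing the
/// row/column position past each chunk, until an empty chunk comes back.
pub fn reassemble(lines: &[&str]) -> Vec<u8> {
    let mut out = Vec::new();
    let (mut row, mut column) = (0usize, 0usize);
    loop {
        let chunk = chunk_at(lines, row, column);
        if chunk.is_empty() {
            return out;
        }
        for &byte in chunk {
            if byte == b'\n' {
                // A newline moves the position to the start of the next row.
                row += 1;
                column = 0;
            } else {
                column += 1;
            }
        }
        out.extend_from_slice(chunk);
    }
}
```

For the three lines in the example, `reassemble` returns the bytes of `"pub fn foo() {\n  1\n}\n"`, so the callback supplies a newline after every line, including the last one.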