tree-sitter/lib/binding_rust
Max Brunsfeld a1fec71b19 Tweak QueryCursor to allow iterating either matches or captures
For syntax highlighting, we want to iterate over all of the captures in 
order, and don't care about grouping the captures by pattern.
2019-09-13 15:19:04 -07:00
..
bindings.rs Tweak QueryCursor to allow iterating either matches or captures 2019-09-13 15:19:04 -07:00
build.rs Reorganize language bindings 2019-05-07 10:41:49 -07:00
ffi.rs Reorganize language bindings 2019-05-07 10:41:49 -07:00
helper.c Reorganize language bindings 2019-05-07 10:41:49 -07:00
lib.rs Tweak QueryCursor to allow iterating either matches or captures 2019-09-13 15:19:04 -07:00
README.md Add build instruction to rust binding README (#432) 2019-08-21 11:59:37 -07:00
util.rs Make Rust functions return ExactSizeIterator instead of just Iterator (#438) 2019-08-28 09:28:47 -07:00

Rust Tree-sitter

Build Status Build status Crates.io

Rust bindings to the Tree-sitter parsing library.

Basic Usage

First, create a parser:

use tree_sitter::{Parser, Language};

// ...

let mut parser = Parser::new();

Tree-sitter languages consist of generated C code. To make sure they're properly compiled and linked, you can create a build script like the following (assuming tree-sitter-javascript is in your root directory):

extern crate cc;

use std::path::PathBuf;

fn main() {
    let dir: PathBuf = ["tree-sitter-javascript", "src"].iter().collect();

    cc::Build::new()
        .include(&dir)
        .file(dir.join("parser.c"))
        .file(dir.join("scanner.c"))
        .compile("tree-sitter-javascript");
}

To then use languages from rust, you must declare them as extern "C" functions and invoke them with unsafe. Then you can assign them to the parser.

extern "C" { fn tree_sitter_c() -> Language; }
extern "C" { fn tree_sitter_rust() -> Language; }
extern "C" { fn tree_sitter_javascript() -> Language; }

let language = unsafe { tree_sitter_rust() };
parser.set_language(language).unwrap();

Now you can parse source code:

let source_code = "fn test() {}";
let tree = parser.parse(source_code, None).unwrap();
let root_node = tree.root_node();

assert_eq!(root_node.kind(), "source_file");
assert_eq!(root_node.start_position().column, 0);
assert_eq!(root_node.end_position().column, 12);

Editing

Once you have a syntax tree, you can update it when your source code changes. Passing in the previous edited tree makes parse run much more quickly:

let new_source_code = "fn test(a: u32) {}"

tree.edit(InputEdit {
  start_byte: 8,
  old_end_byte: 8,
  new_end_byte: 14,
  start_position: Point::new(0, 8),
  old_end_position: Point::new(0, 8),
  new_end_position: Point::new(0, 14),
});

let new_tree = parser.parse(new_source_code, Some(&tree));

Text Input

The source code to parse can be provided either either as a string, a slice, a vector, or as a function that returns a slice. The text can be encoded as either UTF8 or UTF16:

// Store some source code in an array of lines.
let lines = &[
    "pub fn foo() {",
    "  1",
    "}",
];

// Parse the source code using a custom callback. The callback is called
// with both a byte offset and a row/column offset.
let tree = parser.parse_with(&mut |_byte: u32, position: Point| -> &[u8] {
    let row = position.row as usize;
    let column = position.column as usize;
    if row < lines.len() {
        if column < lines[row].as_bytes().len() {
            &lines[row].as_bytes()[column..]
        } else {
            "\n".as_bytes()
        }
    } else {
        &[]
    }
}, None).unwrap();

assert_eq!(
  tree.root_node().to_sexp(),
  "(source_file (function_item (visibility_modifier) (identifier) (parameters) (block (number_literal))))"
);