Add unit testing docs

Max Brunsfeld 2020-02-21 12:58:41 -08:00
parent 709ddfebe9
commit a76a232485
2 changed files with 48 additions and 8 deletions

@ -110,9 +110,9 @@ If there is an ambiguity or *local ambiguity* in your grammar, Tree-sitter will
The `tree-sitter test` command allows you to easily test that your parser is working correctly.
For each rule that you add to the grammar, you should first create a *test* that describes how the syntax trees should look when parsing that rule. These tests are written using specially-formatted text files in a `corpus` directory in your parser's root folder.
For each rule that you add to the grammar, you should first create a *test* that describes how the syntax trees should look when parsing that rule. These tests are written using specially-formatted text files in the `corpus/` or `test/corpus/` directories within your parser's root folder.
For example, you might have a file called `corpus/statements.txt` that contains a series of entries like this:
For example, you might have a file called `test/corpus/statements.txt` that contains a series of entries like this:
```
==================
@ -152,7 +152,7 @@ func x() int {
These tests are important. They serve as the parser's API documentation, and they can be run every time you change the grammar to verify that everything still parses correctly.
By default, the `tree-sitter test` command runs all of the tests in your `corpus` folder. To run a particular test, you can use the `-f` flag:
By default, the `tree-sitter test` command runs all of the tests in your `corpus/` or `test/corpus/` folder. To run a particular test, you can use the `-f` flag:
```sh
tree-sitter test -f 'Return statements'
@ -164,6 +164,10 @@ The recommendation is to be comprehensive in adding tests. If it's a visible nod
You might notice that the first time you run `tree-sitter test` after regenerating your parser, it takes some extra time. This is because Tree-sitter automatically compiles your C code into a dynamically-loadable library. It recompiles your parser as needed whenever you update it by re-running `tree-sitter generate`.
#### Syntax Highlighting Tests
The `tree-sitter test` command will *also* run any syntax highlighting tests in the `test/highlight` folder, if it exists. For more information about syntax highlighting tests, see [the syntax highlighting page][syntax-highlighting-tests].
### Command: `parse`
You can run your parser on an arbitrary file using `tree-sitter parse`. This will print the resulting syntax tree, including nodes' ranges and field names, like this:
@ -704,6 +708,7 @@ if (valid_symbols[INDENT] || valid_symbols[DEDENT]) {
[multi-language-section]: ./using-parsers#multi-language-documents
[named-vs-anonymous-nodes-section]: ./using-parsers#named-vs-anonymous-nodes
[field-names-section]: ./using-parsers#node-field-names
[syntax-highlighting-tests]: ./syntax-highlighting#unit-testing
[nan]: https://github.com/nodejs/nan
[node-module]: https://www.npmjs.com/package/tree-sitter-cli
[node.js]: https://nodejs.org

@ -5,7 +5,7 @@ permalink: syntax-highlighting
# Syntax Highlighting
Syntax highlighting is a very common feature in applications that deal with code. Tree-sitter has built-in support for syntax highlighting, via the [`tree-sitter-highlight`](https://github.com/tree-sitter/tree-sitter/tree/master/highlight) library, which is currently used on GitHub.com for highlighting code written in several languages. You can also run also perform syntax highlighting at the command line using the `tree-sitter highlight` command.
Syntax highlighting is a very common feature in applications that deal with code. Tree-sitter has built-in support for syntax highlighting, via the [`tree-sitter-highlight`](https://github.com/tree-sitter/tree-sitter/tree/master/highlight) library, which is currently used on GitHub.com for highlighting code written in several languages. You can also perform syntax highlighting at the command line using the `tree-sitter highlight` command.
This document explains how the Tree-sitter syntax highlighting system works, using the command line interface. If you are using the `tree-sitter-highlight` library (either from C or from Rust), all of these concepts are still applicable, but the configuration data is provided using in-memory objects, rather than files.
@ -79,7 +79,7 @@ The `package.json` file is used by package managers like `npm`. Within this file
These keys specify basic information about the parser:
* `scope` (required) - A string like `"source.js"` that identifies the language. Currently, we strive to match the scope names used by popular [TextMate grammars](textmate.com) and by the [Linguist](https://github.com/github/linguist) library.
* `scope` (required) - A string like `"source.js"` that identifies the language. Currently, we strive to match the scope names used by popular [TextMate grammars](https://macromates.com/manual/en/language_grammars) and by the [Linguist](https://github.com/github/linguist) library.
* `path` (optional) - A relative path from the directory containing `package.json` to another directory containing the `src/` folder, which contains the actual generated parser. The default value is `"."` (so that `src/` is in the same folder as `package.json`), and this very rarely needs to be overridden.
@ -134,7 +134,7 @@ Syntax highlighting is controlled by *three* different types of query files that
Alternatively, you can think of `.scm` as an acronym for "Source Code Matching".
### Highlights Query
### Highlights
The most important query is called the highlights query. The highlights query uses *captures* to assign arbitrary *highlight names* to different nodes in the tree. Each highlight name can then be mapped to a color (as described [above](#theme)). Commonly used highlight names include `keyword`, `function`, `type`, `property`, and `string`. Names can also be dot-separated like `function.builtin`.
@ -210,7 +210,7 @@ Running `tree-sitter highlight` on this Go file would produce output like this:
}
</pre>
### Local Variable Query
### Local Variables
Good syntax highlighting helps the reader to quickly distinguish between the different types of *entities* in their code. Ideally, if a given entity appears in *multiple* places, it should be colored the same in each place. The Tree-sitter syntax highlighting system can help you to achieve this by keeping track of local scopes and variables.
@ -334,7 +334,7 @@ Running `tree-sitter highlight` on this Ruby file would produce output like this
<span>list</span> <span style='font-weight: bold;color: #4e4e4e;'>=</span> [<span>item</span><span style='color: #4e4e4e;'>]</span>
</pre>
### Language Injection Query
### Language Injection
Some source files contain code written in multiple different languages. Examples include:
* HTML files, which can contain JavaScript inside of `<script>` tags and CSS inside of `<style>` tags
@ -384,3 +384,38 @@ The following query would specify that the contents of the heredoc should be par
(heredoc_body
  (heredoc_end) @injection.language) @injection.content
```
## Unit Testing
Tree-sitter has a built-in way to verify the results of syntax highlighting. The interface is based on [Sublime Text's system](https://www.sublimetext.com/docs/3/syntax.html#testing) for testing highlighting.
Tests are written as normal source code files that contain specially-formatted *comments* that make assertions about the surrounding syntax highlighting. These files are stored in the `test/highlight` directory in a grammar repository.
Here is an example of a syntax highlighting test for JavaScript:
```js
var abc = function(d) {
  // <- keyword
  //          ^ keyword
  //               ^ variable.parameter
  // ^ function
  if (a) {
  // <- keyword
  // ^ punctuation.bracket
    foo(`foo ${bar}`);
    // <- function
    //    ^ string
    //          ^ variable
  }
};
```
From the Sublime Text docs:
> The two types of tests are:
>
> **Caret**: ^ this will test the following selector against the scope on the most recent non-test line. It will test it at the same column the ^ is in. Consecutive ^s will test each column against the selector.
>
> **Arrow**: <- this will test the following selector against the scope on the most recent non-test line. It will test it at the same column as the comment character is in.
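
To make the column rules above concrete, here is a minimal sketch of how caret and arrow assertions could be resolved to positions in the source. It is not Tree-sitter's actual implementation: the `parseAssertions` function, its return shape, and the hard-coded `//` comment prefix are all assumptions made for illustration.

```js
// Hypothetical helper (not part of the tree-sitter CLI): collects the
// assertions in a highlighting test as { row, column, expected } records.
// Assumes assertion comments are line comments starting with `commentPrefix`.
function parseAssertions(source, commentPrefix = "//") {
  const assertions = [];
  let targetRow = -1; // row of the most recent non-test line

  source.split("\n").forEach((line, row) => {
    const start = line.indexOf(commentPrefix);
    const isCommentOnly = start !== -1 && line.slice(0, start).trim() === "";

    if (isCommentOnly) {
      const body = line.slice(start + commentPrefix.length);

      // Arrow: tested at the column where the comment itself starts.
      const arrow = body.match(/^\s*<-\s*(\S+)/);
      if (arrow) {
        assertions.push({ row: targetRow, column: start, expected: arrow[1] });
        return;
      }

      // Caret: tested at the caret's own column; consecutive carets
      // produce one assertion per column.
      const caret = body.match(/^(\s*)(\^+)\s*(\S+)/);
      if (caret) {
        const first = start + commentPrefix.length + caret[1].length;
        for (let i = 0; i < caret[2].length; i++) {
          assertions.push({ row: targetRow, column: first + i, expected: caret[3] });
        }
        return;
      }
    }

    // Any other line becomes the target of the assertions that follow it.
    targetRow = row;
  });

  return assertions;
}

const example = [
  "var abc = function(d) {",
  "  // <- keyword",
  "  //          ^ keyword",
  "};",
].join("\n");

console.log(parseAssertions(example));
```

Each record names a position on the preceding source line and the highlight name expected there, so a test runner would only need to compare these against the highlighter's output for the same file. Because the columns are derived from the comment's indentation, the whitespace inside assertion comments is significant.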