docs(web): update README and add CONTRIBUTING docs

Amaan Qureshi 2025-01-20 04:06:32 -05:00
parent 692332ed1c
commit f6a943a1ad
2 changed files with 182 additions and 16 deletions


@ -0,0 +1,142 @@
# Contributing
## Code of Conduct
Contributors to Tree-sitter should abide by the [Contributor Covenant][covenant].
## Developing Web-tree-sitter
### Prerequisites
To make changes to Web-tree-sitter, you should have:
1. A [Rust toolchain][rust], for running the xtasks necessary to build the library.
2. Node.js and NPM (or an equivalent package manager).
3. Either [Emscripten][emscripten], [Docker][docker], or [Podman][podman] for
compiling the library to WASM.
### Building
Clone the repository:
```sh
git clone https://github.com/tree-sitter/tree-sitter
cd tree-sitter/lib/binding_web
```
Install the necessary dependencies:
```sh
npm install
```
Build the library:
```sh
npm run build
```
Note that the build process requires a Rust toolchain to be installed. If you don't have one installed, you can install it
by visiting the [Rust website][rust] and following the instructions there.
> [!NOTE]
> By default, the build process will emit an ES6 module. If you need a CommonJS module, set the `CJS` environment
> variable to `true`, or just run `CJS=true npm run build`.
> [!TIP]
> To build the library with debug information, you can run `npm run build:debug`. The `CJS` environment variable is still
> taken into account.
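
As a rough illustration of how an environment variable like `CJS` can select the module format, here is a minimal sketch. It is not the contents of `script/build.js`; the `resolveFormat` helper is hypothetical, and treating only the literal string `true` as opting in is an assumption. The `cjs` and `esm` strings are the format names `esbuild` accepts.

```javascript
// Hypothetical helper: NOT taken from script/build.js.
// Maps the CJS environment variable to a bundler output format.
function resolveFormat(env = process.env) {
  // Assumption: only the literal string "true" selects CommonJS.
  return env.CJS === 'true' ? 'cjs' : 'esm';
}

console.log(resolveFormat({ CJS: 'true' })); // cjs
console.log(resolveFormat({}));              // esm
```

With a helper like this, `CJS=true npm run build` and a plain `npm run build` would differ only in the format string handed to the bundler.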
### Putting it together
#### The C side
There are several components that come together to build the final JS and WASM files. First, we use `emscripten` in our
xtask located at `xtask/src/build_wasm.rs` from the root directory to compile the WASM files. This WASM module is output into the
local `lib` folder, and is used only in [`src/bindings.ts`][bindings.ts] to handle loading the WASM module. The C code that
is compiled into the WASM module is located at [`lib/tree-sitter.c`][tree-sitter.c], and contains all the necessary
glue code to interact with the JS environment. If you need to update the imported functions from the tree-sitter library,
or anywhere else, you must update [`lib/exports.txt`][exports.txt]. Lastly, the type information for the WASM module is
located at [`lib/tree-sitter.d.ts`][tree-sitter.d.ts], and can be updated by running `cargo xtask build-wasm --emit-tsd`
from the root directory.
#### The TypeScript side
The TypeScript library is a higher level abstraction over the WASM module, and is located in `src`. This is where the
public API is defined, and where the WASM module is loaded and initialized. The TypeScript library is built into a single
ES6 (or CommonJS) module, and is output into the same directory as `package.json`. If you need to update the public API,
you can do so by editing the files in `src`.
If you make changes to the library that require updating the type definitions, such as adding a new public API method,
you should run:
```sh
npm run build:dts
```
This uses [`dts-buddy`][dts-buddy] to generate `web-tree-sitter.d.ts` from the public types in `src`. Additionally, a sourcemap
is generated for the `.d.ts` file, which enables `go-to definition` and other editor integrations to take you straight
to the TypeScript source code.
This TypeScript code is then compiled into a single JavaScript file with `esbuild`. The build configuration for this can
be found in [`script/build.js`][build.js], but this shouldn't need to be updated. This step is responsible for emitting
the final JS and WASM files that are shipped with the library, as well as their sourcemaps.
### Testing
Before you can run the tests, you need to fetch and build some upstream grammars that are used for testing.
Run this in the root of the repository:
```sh
cargo xtask fetch-fixtures
```
Optionally, to update the generated parser.c files:
```sh
cargo xtask generate-fixtures
```
Then you can build the WASM modules:
```sh
cargo xtask generate-fixtures --wasm
```
Now, you can run the tests. In the `lib/binding_web` directory, run:
```sh
npm test
```
> [!NOTE]
> We use `vitest` to run the tests. If you want to run a specific test, you can use the `-t` flag to pass in a pattern.
> If you want to run a specific file, you can just pass the name of the file as is. For example, to run the `parser` tests
> in `test/parser.test.ts`, you can run `npm test parser`. To run tests that have the name `descendant` somewhere, run
> `npm test -- -t descendant`.
>
> For coverage information, you can run `npm test -- --coverage`.
### Debugging
You might have noticed that when you ran `npm run build`, the build process generated a couple of [sourcemaps][sourcemap]:
`tree-sitter.js.map` and `tree-sitter.wasm.map`. These sourcemaps can be used to debug the library in the browser, and are
shipped with the library on both NPM and the GitHub releases.
#### Tweaking the Emscripten build
If you're trying to tweak the Emscripten build, or are trying to debug an issue, the code for this lives in the
`xtask/src/build_wasm.rs` file mentioned earlier, namely in the `run_wasm` function.
[bindings.ts]: src/bindings.ts
[build.js]: script/build.js
[covenant]: https://www.contributor-covenant.org/version/1/4/code-of-conduct
[docker]: https://www.docker.com
[dts-buddy]: https://github.com/Rich-Harris/dts-buddy
[emscripten]: https://emscripten.org
[exports.txt]: lib/exports.txt
[podman]: https://podman.io
[rust]: https://www.rust-lang.org/tools/install
[sourcemap]: https://developer.mozilla.org/en-US/docs/Glossary/Source_map
[tree-sitter.c]: lib/tree-sitter.c
[tree-sitter.d.ts]: lib/tree-sitter.d.ts


@ -7,9 +7,10 @@
WebAssembly bindings to the [Tree-sitter](https://github.com/tree-sitter/tree-sitter) parsing library.
### Setup
## Setup
You can download the `tree-sitter.js` and `tree-sitter.wasm` files from [the latest GitHub release](https://github.com/tree-sitter/tree-sitter/releases/latest) and load them using a standalone script:
You can download the `tree-sitter.js` and `tree-sitter.wasm` files from [the latest GitHub release][gh release] and load
them using a standalone script:
```html
<script src="/the/path/to/tree-sitter.js"></script>
@ -20,7 +21,7 @@ You can download the `tree-sitter.js` and `tree-sitter.wasm` files from [the lat
</script>
```
You can also install [the `web-tree-sitter` module](https://www.npmjs.com/package/web-tree-sitter) from NPM and load it using a system like Webpack:
You can also install [the `web-tree-sitter` module][npm module] from NPM and load it using a system like Webpack:
```js
const { Parser } = require('web-tree-sitter');
@ -50,13 +51,22 @@ await Parser.init();
// the library is ready
```
To install a debug version of the library, pass in `--debug` when running `npm install`:
To use the debug version of the library, replace your import of `web-tree-sitter` with `web-tree-sitter/debug`:
```sh
npm install web-tree-sitter --debug
```js
import { Parser } from 'web-tree-sitter/debug'; // or require('web-tree-sitter/debug')
Parser.init().then(() => { /* the library is ready */ });
```
This will load the debug version of the `.wasm` file, which includes sourcemaps for both the JS and WASM files, debug symbols, and assertions.
This will load the debug versions of the `.js` and `.wasm` files, which include debug symbols and assertions.
> [!NOTE]
> The `tree-sitter.js` file on GH releases is an ES6 module. If you are interested in using a pure CommonJS library, such
> as for Electron, you should note that on our NPM package, we use [conditional exports][cond export] to provide both the
> ES6 and CommonJS modules. If you've set up your project correctly, and need to use CommonJS, your package manager will
> automatically handle this for you. As of writing, we do not host a CommonJS version of the library on GH releases, and
> if you do not use the NPM registry, you'll have to build the library yourself.
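
To make the conditional exports mechanism more concrete, here is a simplified sketch of how a resolver picks a file based on whether the package is loaded with `import` or `require`. The `exportsMap` object and its file paths are hypothetical and are not copied from the `web-tree-sitter` `package.json`. The `resolveExport` function is only an approximation of Node's real resolution algorithm.

```javascript
// Hypothetical exports map: the paths are illustrative assumptions,
// not the actual contents of web-tree-sitter's package.json.
const exportsMap = {
  '.': { import: './tree-sitter.js', require: './tree-sitter.cjs' },
  './debug': { import: './debug/tree-sitter.js', require: './debug/tree-sitter.cjs' },
};

// Simplified approximation of conditional exports resolution.
function resolveExport(map, subpath, conditions) {
  const entry = map[subpath];
  if (entry === undefined) {
    throw new Error(`Subpath "${subpath}" is not exported`);
  }
  if (typeof entry === 'string') return entry;
  // Keys are checked in the order they appear; "default" always matches.
  for (const [condition, target] of Object.entries(entry)) {
    if (condition === 'default' || conditions.includes(condition)) {
      return target;
    }
  }
  throw new Error(`No matching condition for "${subpath}"`);
}

console.log(resolveExport(exportsMap, '.', ['require']));      // ./tree-sitter.cjs
console.log(resolveExport(exportsMap, './debug', ['import'])); // ./debug/tree-sitter.js
```

The point of the sketch is that the same package specifier can map to different files, so consumers do not need separate package names for the ES6 and CommonJS builds.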
### Basic Usage
@ -126,7 +136,8 @@ const newTree = parser.parse(newSourceCode, tree);
### Parsing Text From a Custom Data Structure
If your text is stored in a data structure other than a single string, you can parse it by supplying a callback to `parse` instead of a string:
If your text is stored in a data structure other than a single string, you can parse it by supplying a callback to `parse`
instead of a string:
```javascript
const sourceLines = [
@ -146,7 +157,8 @@ There are several options on how to get the `.wasm` files for the languages you
#### From npmjs.com
The recommended way is to just install the package from npm. For example, to parse JavaScript, you can install the `tree-sitter-javascript` package:
The recommended way is to just install the package from npm. For example, to parse JavaScript, you can install the `tree-sitter-javascript`
package:
```sh
npm install tree-sitter-javascript
@ -156,16 +168,19 @@ Then you can find the `.wasm` file in the `node_modules/tree-sitter-javascript`
#### From GitHub
You can also download the `.wasm` files from GitHub releases, so long as the repository uses our reusable workflow to publish them.
For example, you can download the JavaScript `.wasm` file from the tree-sitter-javascript [releases page](https://github.com/tree-sitter/tree-sitter-javascript/releases/latest)
You can also download the `.wasm` files from GitHub releases, so long as the repository uses our reusable workflow to publish
them.
For example, you can download the JavaScript `.wasm` file from the tree-sitter-javascript [releases page][gh release js].
#### Generating `.wasm` files
You can also generate the `.wasm` file for your desired grammar. Shown below is an example of how to generate the `.wasm` file for the JavaScript grammar.
You can also generate the `.wasm` file for your desired grammar. Shown below is an example of how to generate the `.wasm`
file for the JavaScript grammar.
**IMPORTANT**: [emscripten](https://emscripten.org/docs/getting_started/downloads.html), [docker](https://www.docker.com/), or [podman](https://podman.io) need to be installed.
**IMPORTANT**: [Emscripten][emscripten], [Docker][docker], or [Podman][podman] needs to be installed.
First install `tree-sitter-cli` and the tree-sitter language for which to generate `.wasm` (`tree-sitter-javascript` in this example):
First install `tree-sitter-cli`, and the tree-sitter language for which to generate `.wasm`
(`tree-sitter-javascript` in this example):
```sh
npm install --save-dev tree-sitter-cli tree-sitter-javascript
@ -181,7 +196,8 @@ If everything is fine, file `tree-sitter-javascript.wasm` should be generated in
### Running .wasm in Node.js
Notice that executing `.wasm` files in node.js is considerably slower than running [node.js bindings](https://github.com/tree-sitter/node-tree-sitter). However could be useful for testing purposes:
Notice that executing `.wasm` files in Node.js is considerably slower than running [Node.js bindings][node bindings].
However, this could be useful for testing purposes:
```javascript
const Parser = require('web-tree-sitter');
@ -229,7 +245,7 @@ For more information on the module options you can pass in, see the [emscripten
#### "Can't resolve 'fs' in 'node_modules/web-tree-sitter"
Most bundlers will notice that the `tree-sitter.js` file is attempting to import `fs`, i.e. node's file system library.
Since this doesn't exist in the browser, the bundlers will get confused. For webpack you can fix this by adding the
Since this doesn't exist in the browser, the bundlers will get confused. For Webpack, you can fix this by adding the
following to your webpack config:
```javascript
@ -242,4 +258,12 @@ following to your webpack config:
}
```
[cond export]: https://nodejs.org/api/packages.html#conditional-exports
[docker]: https://www.docker.com
[emscripten]: https://emscripten.org
[emscripten-module-options]: https://emscripten.org/docs/api_reference/module.html#affecting-execution
[gh release]: https://github.com/tree-sitter/tree-sitter/releases/latest
[gh release js]: https://github.com/tree-sitter/tree-sitter-javascript/releases/latest
[node bindings]: https://github.com/tree-sitter/node-tree-sitter
[npm module]: https://www.npmjs.com/package/web-tree-sitter
[podman]: https://podman.io