Compare commits

..

35 commits

Author SHA1 Message Date
Amaan Qureshi
d97db6d635
0.23.2 2024-10-01 12:34:25 -04:00
Amaan Qureshi
78c41e3ced
0.23.1 2024-09-30 18:25:19 -04:00
Yuta Saito
bf094bd98a fix: exclude APIs that dup given file descriptors from WASI builds
WASI doesn't support the `dup(2)` system call, so we cannot implement the
`print_dot_graph` and `print_dot_graphs` functions with exactly the same
semantics as in other platforms.
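
Below is a minimal, hypothetical Rust sketch of the kind of `cfg` gate this implies; the `Parser` type and method names are stand-ins, not the actual tree-sitter sources:

```rust
use std::fs::File;

/// Stand-in for the parser type; illustrative only.
pub struct Parser;

impl Parser {
    /// Needs to duplicate a file descriptor, so it is only compiled
    /// for targets where dup(2) exists.
    #[cfg(not(target_os = "wasi"))]
    pub fn print_dot_graphs(&mut self, file: &File) {
        let _ = file; // real code would dup the fd and hand it to the C library
    }

    /// Needs no dup(2), so it stays available on every target, WASI included.
    pub fn stop_printing_dot_graphs(&mut self) {}
}

fn main() {
    let mut parser = Parser;
    #[cfg(not(target_os = "wasi"))]
    {
        let file = File::create("graphs.html").expect("create output file");
        parser.print_dot_graphs(&file);
    }
    parser.stop_printing_dot_graphs();
}
```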

(cherry picked from commit 94a8262110)
2024-09-30 09:21:54 -04:00
Ron Panduwana
d10308528d fix: handle more cases of editing subtrees that depend on column values
(cherry picked from commit a83b893016)
2024-09-29 20:37:24 -04:00
Amaan Qureshi
77794c558f fix: correct test name parsing when the prior test has equal signs
(cherry picked from commit 2fffe036e0)
2024-09-29 19:58:38 -04:00
Amaan Qureshi
865f6595e7 fix(lib): correct descendant-for-range behavior with zero-width tokens
(cherry picked from commit 0c43988a5e)
2024-09-27 09:21:17 -04:00
Amaan Qureshi
7561e813b8 fix: disallow empty string literals in rules
(cherry picked from commit 86d3a5313d)
2024-09-24 15:41:03 -04:00
Amaan Qureshi
edd7f257df fix: do not generate spurious files if the grammar path is not the default path
(cherry picked from commit 1708a295a8)
2024-09-24 14:37:30 -04:00
Will Lillis
83509ad4d7 fix(fuzz): skip tests marked with :skip & don't report errors on tests marked with :error
(cherry picked from commit 99dbbbcbe9)
2024-09-22 03:57:37 -04:00
Amaan Qureshi
8e1dbb4617 fix: properly handle utf8 code points for highlight and tag assertions
(cherry picked from commit 6f050f0da5)
2024-09-22 01:47:47 -04:00
Joel Spadin
3ad82e6772 fix(wasm): use / paths for workdir
Reimplemented the fix from #2183 so that building WASM files with Docker
on Windows works again. The --workdir argument gives a path inside the Docker
container, so it must use forward slashes regardless of the default path
separator on the host OS.
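
A small illustrative sketch of the normalization idea (the helper name is invented, not the CLI's actual code):

```rust
/// Illustrative helper: normalize a host path into a value that is safe to pass
/// as Docker's --workdir, which expects forward slashes inside the container.
fn to_container_workdir(host_path: &str) -> String {
    host_path.replace('\\', "/")
}

fn main() {
    // A Windows-style host path becomes a container-friendly one.
    assert_eq!(
        to_container_workdir(r"C:\grammars\tree-sitter-ruby"),
        "C:/grammars/tree-sitter-ruby"
    );
    // Paths that already use forward slashes pass through unchanged.
    assert_eq!(to_container_workdir("/home/user/grammar"), "/home/user/grammar");
}
```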

(cherry picked from commit 755e49e212)
2024-09-22 01:46:05 -04:00
dependabot[bot]
3492bee2f7 build(deps): bump the cargo group across 1 directory with 11 updates
Bumps the cargo group with 10 updates in the / directory:

| Package | From | To |
| --- | --- | --- |
| [anyhow](https://github.com/dtolnay/anyhow) | `1.0.86` | `1.0.89` |
| [cc](https://github.com/rust-lang/cc-rs) | `1.1.14` | `1.1.19` |
| [clap](https://github.com/clap-rs/clap) | `4.5.16` | `4.5.17` |
| [filetime](https://github.com/alexcrichton/filetime) | `0.2.24` | `0.2.25` |
| [indexmap](https://github.com/indexmap-rs/indexmap) | `2.4.0` | `2.5.0` |
| [pretty_assertions](https://github.com/rust-pretty-assertions/rust-pretty-assertions) | `1.4.0` | `1.4.1` |
| [serde](https://github.com/serde-rs/serde) | `1.0.209` | `1.0.210` |
| [serde_json](https://github.com/serde-rs/json) | `1.0.127` | `1.0.128` |
| [webbrowser](https://github.com/amodm/webbrowser-rs) | `1.0.1` | `1.0.2` |
| [bindgen](https://github.com/rust-lang/rust-bindgen) | `0.69.4` | `0.70.1` |

Updates `anyhow` from 1.0.86 to 1.0.89
- [Release notes](https://github.com/dtolnay/anyhow/releases)
- [Commits](https://github.com/dtolnay/anyhow/compare/1.0.86...1.0.89)

Updates `cc` from 1.1.14 to 1.1.19
- [Release notes](https://github.com/rust-lang/cc-rs/releases)
- [Changelog](https://github.com/rust-lang/cc-rs/blob/main/CHANGELOG.md)
- [Commits](https://github.com/rust-lang/cc-rs/compare/cc-v1.1.14...cc-v1.1.19)

Updates `clap` from 4.5.16 to 4.5.17
- [Release notes](https://github.com/clap-rs/clap/releases)
- [Changelog](https://github.com/clap-rs/clap/blob/master/CHANGELOG.md)
- [Commits](https://github.com/clap-rs/clap/compare/clap_complete-v4.5.16...clap_complete-v4.5.17)

Updates `filetime` from 0.2.24 to 0.2.25
- [Commits](https://github.com/alexcrichton/filetime/compare/0.2.24...0.2.25)

Updates `indexmap` from 2.4.0 to 2.5.0
- [Changelog](https://github.com/indexmap-rs/indexmap/blob/master/RELEASES.md)
- [Commits](https://github.com/indexmap-rs/indexmap/compare/2.4.0...2.5.0)

Updates `pretty_assertions` from 1.4.0 to 1.4.1
- [Release notes](https://github.com/rust-pretty-assertions/rust-pretty-assertions/releases)
- [Changelog](https://github.com/rust-pretty-assertions/rust-pretty-assertions/blob/main/CHANGELOG.md)
- [Commits](https://github.com/rust-pretty-assertions/rust-pretty-assertions/compare/v1.4.0...v1.4.1)

Updates `serde` from 1.0.209 to 1.0.210
- [Release notes](https://github.com/serde-rs/serde/releases)
- [Commits](https://github.com/serde-rs/serde/compare/v1.0.209...v1.0.210)

Updates `serde_derive` from 1.0.209 to 1.0.210
- [Release notes](https://github.com/serde-rs/serde/releases)
- [Commits](https://github.com/serde-rs/serde/compare/v1.0.209...v1.0.210)

Updates `serde_json` from 1.0.127 to 1.0.128
- [Release notes](https://github.com/serde-rs/json/releases)
- [Commits](https://github.com/serde-rs/json/compare/1.0.127...1.0.128)

Updates `webbrowser` from 1.0.1 to 1.0.2
- [Release notes](https://github.com/amodm/webbrowser-rs/releases)
- [Changelog](https://github.com/amodm/webbrowser-rs/blob/main/CHANGELOG.md)
- [Commits](https://github.com/amodm/webbrowser-rs/compare/v1.0.1...v1.0.2)

Updates `bindgen` from 0.69.4 to 0.70.1
- [Release notes](https://github.com/rust-lang/rust-bindgen/releases)
- [Changelog](https://github.com/rust-lang/rust-bindgen/blob/main/CHANGELOG.md)
- [Commits](https://github.com/rust-lang/rust-bindgen/compare/v0.69.4...v0.70.1)

---
updated-dependencies:
- dependency-name: anyhow
  dependency-type: direct:production
  update-type: version-update:semver-patch
  dependency-group: cargo
- dependency-name: cc
  dependency-type: direct:production
  update-type: version-update:semver-patch
  dependency-group: cargo
- dependency-name: clap
  dependency-type: direct:production
  update-type: version-update:semver-patch
  dependency-group: cargo
- dependency-name: filetime
  dependency-type: direct:production
  update-type: version-update:semver-patch
  dependency-group: cargo
- dependency-name: indexmap
  dependency-type: direct:production
  update-type: version-update:semver-minor
  dependency-group: cargo
- dependency-name: pretty_assertions
  dependency-type: direct:production
  update-type: version-update:semver-patch
  dependency-group: cargo
- dependency-name: serde
  dependency-type: direct:production
  update-type: version-update:semver-patch
  dependency-group: cargo
- dependency-name: serde_derive
  dependency-type: direct:production
  update-type: version-update:semver-patch
  dependency-group: cargo
- dependency-name: serde_json
  dependency-type: direct:production
  update-type: version-update:semver-patch
  dependency-group: cargo
- dependency-name: webbrowser
  dependency-type: direct:production
  update-type: version-update:semver-patch
  dependency-group: cargo
- dependency-name: bindgen
  dependency-type: direct:production
  update-type: version-update:semver-minor
  dependency-group: cargo
...

Signed-off-by: dependabot[bot] <support@github.com>
(cherry picked from commit c6faeb948e)
2024-09-22 01:43:38 -04:00
PanGan21
1e9df802fb fix(docs): fix highlight readme example using compatible versions
(cherry picked from commit 1a6af3fafe)
2024-09-17 17:22:21 +02:00
Amaan Qureshi
a6b248c1ad fix(lib): peek at the next sibling when iterating to find the child that contains a given descendant
This issue shows up when a zero-width token is the target descendant node:
previously, the preceding sibling would be returned as the child that
contains the descendant, which is incorrect.
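
A hedged sketch of the corrected lookup, with invented types rather than the library's real cursor internals: a zero-width descendant sitting exactly on a child's end boundary belongs to the sibling that starts at that byte, so the search peeks ahead before settling on the current child.

```rust
#[derive(Clone, Copy)]
struct Range {
    start: usize,
    end: usize,
}

/// Returns the index of the child whose byte range contains `descendant`.
fn child_containing(children: &[Range], descendant: Range) -> Option<usize> {
    for (i, child) in children.iter().enumerate() {
        let zero_width_on_boundary =
            descendant.start == descendant.end && descendant.start == child.end;
        if zero_width_on_boundary {
            // Peek at the next sibling: if it starts at this byte, it (not the
            // child that merely ends here) contains the zero-width descendant.
            if children.get(i + 1).map_or(false, |next| next.start == descendant.start) {
                return Some(i + 1);
            }
            return Some(i); // no such sibling; fall back to the current child
        }
        if descendant.start >= child.start && descendant.end <= child.end {
            return Some(i);
        }
    }
    None
}

fn main() {
    let children = [Range { start: 0, end: 5 }, Range { start: 5, end: 9 }];
    // A zero-width token at byte 5 is inside the second child, not the first.
    assert_eq!(child_containing(&children, Range { start: 5, end: 5 }), Some(1));
    // A normal descendant still resolves to the child that spans it.
    assert_eq!(child_containing(&children, Range { start: 2, end: 3 }), Some(0));
}
```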

(cherry picked from commit 0a85744eba)
2024-09-17 17:22:13 +02:00
Firas al-Khalil
6813e8d5c1 build(make): support darwin cross-compile
(cherry picked from commit 4f0d463d49)
2024-09-17 17:22:02 +02:00
Hanlu
b4d4251427 fix: correct comment quote
(cherry picked from commit ff813a311b)
2024-09-17 17:21:54 +02:00
ObserverOfTime
7688c6fa2f chore(bindings): update rust lib docs
(cherry picked from commit 6e19fccf39)
2024-09-17 17:21:34 +02:00
Dave Abrahams
28b71cea27 fix(generate): remove excludes in Package.swift
(cherry picked from commit 112acd5b93)
2024-09-17 17:21:26 +02:00
Will Lillis
d5a1266b25 fix(cli): remove duplicate short options from fuzz command (#3635)
- Remove short option from fuzz command edits option
- Remove short option from fuzz command iterations option

(cherry picked from commit b0e8e50a19)
2024-09-15 11:32:11 +03:00
Amaan Qureshi
0eb5b0a029 fix(binding_web): remove nonexistent function definition
(cherry picked from commit 8667e3ea0c)
2024-09-08 17:06:15 -04:00
Amaan Qureshi
831c144d9b fix(generate): do not generate large character sets for unused variables
(cherry picked from commit d8ab779df4)
2024-09-08 15:55:01 -04:00
Jinser Kafka
5f51550a8c fix(cli): keep skipped tests unchanged in the test/corpus
(cherry picked from commit fd190f1d9d)
2024-09-07 18:56:21 -04:00
Amaan Qureshi
513f19d099 fix(binding_web): correct edit signature
(cherry picked from commit fcbd67b3fa)
2024-09-07 17:53:40 -04:00
Amaan Qureshi
dce9bedc48 fix(generate): add tree-sitter to the dev-dependencies of the Cargo.toml
(cherry picked from commit 4d3d1f0df2)
2024-09-07 17:53:12 -04:00
Amaan Qureshi
43e16dd75c fix(lib): backtrack to the last relevant iterator if no child was found
(cherry picked from commit 9b398c2b84)
2024-09-05 18:03:50 -04:00
Liam Rosenfeld
ea6b62cbc6 feat(language): derive Clone and Copy on LanguageFn
Allows a LanguageFn to be passed around and used to create multiple languages, since Language::new consumes a LanguageFn.

LanguageFn just wraps a function pointer, which already conforms to Copy, so this is a simple addition.
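
A simplified sketch of why the derive is enough and what it enables; the types below are stand-ins (the real crate wraps an unsafe extern "C" function pointer returning a raw language pointer):

```rust
/// Simplified stand-in for LanguageFn: a newtype over a function pointer.
/// Function pointers are already Copy, so the derive is trivial.
#[derive(Clone, Copy)]
pub struct LanguageFn(fn() -> u32);

/// Simplified stand-in for tree_sitter::Language.
pub struct Language(u32);

impl Language {
    /// Mirrors the real API shape: constructing a Language consumes the LanguageFn.
    pub fn new(f: LanguageFn) -> Self {
        Language((f.0)())
    }
}

fn fake_language_fn() -> u32 {
    42
}

fn main() {
    let lang_fn = LanguageFn(fake_language_fn);
    // Because LanguageFn is Copy, the same value can be handed to Language::new
    // as many times as needed without cloning or re-wrapping the function pointer.
    let first = Language::new(lang_fn);
    let second = Language::new(lang_fn);
    assert_eq!(first.0, second.0);
}
```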

(cherry picked from commit d60789afdc)
2024-09-03 12:15:56 -04:00
Amaan Qureshi
22b38a083c fix(generate): disallow inline variables referencing themselves
This fixes an infinite loop bug.
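
A hedged sketch of the kind of check this implies, with invented data structures rather than the generator's real code: reject any inline rule that can reach itself, since expanding such a rule never terminates.

```rust
use std::collections::{HashMap, HashSet};

/// Returns the names of inline rules that can reach themselves through
/// other inline rules, i.e. the ones whose expansion would never terminate.
fn self_referencing_rules(refs: &HashMap<&str, Vec<&str>>) -> Vec<String> {
    fn reaches(
        target: &str,
        current: &str,
        refs: &HashMap<&str, Vec<&str>>,
        seen: &mut HashSet<String>,
    ) -> bool {
        for &next in refs.get(current).map(Vec::as_slice).unwrap_or(&[]) {
            if next == target {
                return true;
            }
            if seen.insert(next.to_string()) && reaches(target, next, refs, seen) {
                return true;
            }
        }
        false
    }

    refs.keys()
        .filter(|&&rule| reaches(rule, rule, refs, &mut HashSet::new()))
        .map(|rule| rule.to_string())
        .collect()
}

fn main() {
    // `_list` and `_item` reference each other, so both would expand forever.
    let mut refs: HashMap<&str, Vec<&str>> = HashMap::new();
    refs.insert("_list", vec!["_item"]);
    refs.insert("_item", vec!["_list", "identifier"]);
    refs.insert("identifier", vec![]);

    let mut bad = self_referencing_rules(&refs);
    bad.sort();
    assert_eq!(bad, vec!["_item".to_string(), "_list".to_string()]);
}
```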

(cherry picked from commit 53cc93c267)
2024-09-01 16:22:23 -04:00
Amaan Qureshi
f312e2c5f5 fix(test): retain attributes when running test -u
(cherry picked from commit 272ebf77b9)
2024-09-01 16:22:01 -04:00
Amaan Qureshi
9a7048bf14 fix(test): exit with an error if a test marked with :error has no error
(cherry picked from commit 0a486d508f)
2024-09-01 16:22:01 -04:00
Amaan Qureshi
55fb817dc8 fix(rust): add missing TSNode functions
(cherry picked from commit 4387e44b98)
2024-09-01 16:21:51 -04:00
Amaan Qureshi
1a3f486059 fix(lib): correct extra node creation from non-zero root-alias cursors
(cherry picked from commit ee06325f67)
2024-09-01 16:21:33 -04:00
Amaan Qureshi
a9455a2cc7 feat(bindings): bump go-tree-sitter version
(cherry picked from commit d0125ef387)
2024-09-01 16:21:25 -04:00
Amaan Qureshi
366ffc9b3e feat(generate): bump tree-sitter dev dependency to 0.23
(cherry picked from commit b5a91a4a85)
2024-09-01 16:21:16 -04:00
Amaan Qureshi
8c8271875a fix(cli): remove conflicting short flags in the fuzz subcommand
(cherry picked from commit 278526ef75)
2024-09-01 16:21:08 -04:00
Amaan Qureshi
5ff5ab3a42 fix(generate): remove necessary files from gitignore template
(cherry picked from commit 253a112dd4)
2024-09-01 16:19:09 -04:00
532 changed files with 24272 additions and 52405 deletions

View file

@ -10,9 +10,6 @@ insert_final_newline = true
[*.rs]
indent_size = 4
[*.{zig,zon}]
indent_size = 4
[Makefile]
indent_style = tab
indent_size = 8

1
.envrc
View file

@ -1 +0,0 @@
use flake

1
.gitattributes vendored
View file

@ -3,4 +3,5 @@
/lib/src/unicode/*.h linguist-vendored
/lib/src/unicode/LICENSE linguist-vendored
/cli/src/generate/prepare_grammar/*.json -diff
Cargo.lock -diff

15
.github/FUNDING.yml vendored
View file

@ -1,15 +0,0 @@
# These are supported funding model platforms
github: tree-sitter
patreon: # Replace with a single Patreon username
open_collective: tree-sitter # Replace with a single Open Collective username
ko_fi: amaanq
tidelift: # Replace with a single Tidelift platform-name/package-name e.g., npm/babel
community_bridge: # Replace with a single Community Bridge project-name e.g., cloud-foundry
liberapay: # Replace with a single Liberapay username
issuehunt: # Replace with a single IssueHunt username
lfx_crowdfunding: # Replace with a single LFX Crowdfunding project-name e.g., cloud-foundry
polar: # Replace with a single Polar username
buy_me_a_coffee: # Replace with a single Buy Me a Coffee username
thanks_dev: # Replace with a single thanks.dev username
custom: # Replace with up to 4 custom sponsorship URLs e.g., ['link1', 'link2']

View file

@ -1,6 +1,6 @@
name: Bug Report
description: Report a problem
type: Bug
labels: [bug]
body:
- type: textarea
attributes:
@ -13,11 +13,9 @@ body:
attributes:
label: "Steps to reproduce"
placeholder: |
```sh
git clone --depth=1 https://github.com/tree-sitter/tree-sitter-ruby
cd tree-sitter-ruby
tree-sitter generate
```
validations:
required: true

View file

@ -1,6 +1,6 @@
name: Feature request
description: Request an enhancement
type: Feature
labels: [enhancement]
body:
- type: markdown
attributes:

View file

@ -1,25 +1,24 @@
name: Cache
description: This action caches fixtures
name: 'Cache'
description: "This action caches fixtures"
outputs:
cache-hit:
description: Cache hit
value: ${{ steps.cache.outputs.cache-hit }}
description: 'Cache hit'
value: ${{ steps.cache_output.outputs.cache-hit }}
runs:
using: composite
using: "composite"
steps:
- uses: actions/cache@v4
id: cache
id: cache_fixtures
with:
path: |
test/fixtures/grammars
target/release/tree-sitter-*.wasm
key: fixtures-${{ join(matrix.*, '_') }}-${{ hashFiles(
'crates/generate/src/**',
'lib/src/parser.h',
'lib/src/array.h',
'lib/src/alloc.h',
'cli/src/generate/**',
'script/generate-fixtures*',
'test/fixtures/grammars/*/**/src/*.c',
'.github/actions/cache/action.yml') }}
- run: echo "cache-hit=${{ steps.cache_fixtures.outputs.cache-hit }}" >> $GITHUB_OUTPUT
shell: bash
id: cache_output

View file

@ -4,8 +4,6 @@ updates:
directory: "/"
schedule:
interval: "weekly"
cooldown:
default-days: 3
commit-message:
prefix: "build(deps)"
labels:
@ -14,16 +12,10 @@ updates:
groups:
cargo:
patterns: ["*"]
ignore:
- dependency-name: "*"
update-types: ["version-update:semver-major", "version-update:semver-minor"]
- package-ecosystem: "github-actions"
directory: "/"
schedule:
interval: "weekly"
cooldown:
default-days: 3
commit-message:
prefix: "ci"
labels:
@ -32,22 +24,3 @@ updates:
groups:
actions:
patterns: ["*"]
- package-ecosystem: "npm"
versioning-strategy: increase
directories:
- "/crates/npm"
- "/crates/eslint"
- "/lib/binding_web"
schedule:
interval: "weekly"
cooldown:
default-days: 3
commit-message:
prefix: "build(deps)"
labels:
- "dependencies"
- "npm"
groups:
npm:
patterns: ["*"]

View file

@ -1,29 +0,0 @@
module.exports = async ({ github, context }) => {
let target = context.payload.issue;
if (target) {
await github.rest.issues.update({
...context.repo,
issue_number: target.number,
state: "closed",
state_reason: "not_planned",
title: "[spam]",
body: "",
type: null,
});
} else {
target = context.payload.pull_request;
await github.rest.pulls.update({
...context.repo,
pull_number: target.number,
state: "closed",
title: "[spam]",
body: "",
});
}
await github.rest.issues.lock({
...context.repo,
issue_number: target.number,
lock_reason: "spam",
});
};

17
.github/scripts/cross.sh vendored Executable file
View file

@ -0,0 +1,17 @@
#!/bin/bash
# set -x
set -e
if [ "$BUILD_CMD" != "cross" ]; then
echo "cross.sh - is a helper to assist only in cross compiling environments" >&2
echo "To use this tool set the BUILD_CMD env var to the \"cross\" value" >&2
exit 111
fi
if [ -z "$CROSS_IMAGE" ]; then
echo "The CROSS_IMAGE env var should be provided" >&2
exit 111
fi
docker run --rm -v /home/runner:/home/runner -w "$PWD" "$CROSS_IMAGE" "$@"

19
.github/scripts/make.sh vendored Executable file
View file

@ -0,0 +1,19 @@
#!/bin/bash
# set -x
set -e
if [ "$BUILD_CMD" == "cross" ]; then
if [ -z "$CC" ]; then
echo "make.sh: CC is not set" >&2
exit 111
fi
if [ -z "$AR" ]; then
echo "make.sh: AR is not set" >&2
exit 111
fi
cross.sh make CC=$CC AR=$AR "$@"
else
make "$@"
fi

28
.github/scripts/tree-sitter.sh vendored Executable file
View file

@ -0,0 +1,28 @@
#!/bin/bash
# set -x
set -e
if [ -z "$ROOT" ]; then
echo "The ROOT env var should be set to absolute path of a repo root folder" >&2
exit 111
fi
if [ -z "$TARGET" ]; then
echo "The TARGET env var should be equal to a \`cargo build --target <TARGET>\` command value" >&2
exit 111
fi
tree_sitter="$ROOT"/target/"$TARGET"/release/tree-sitter
if [ "$BUILD_CMD" == "cross" ]; then
if [ -z "$CROSS_RUNNER" ]; then
echo "The CROSS_RUNNER env var should be set to a CARGO_TARGET_*_RUNNER env var value" >&2
echo "that is available in a docker image used by the cross tool under the hood" >&2
exit 111
fi
cross.sh $CROSS_RUNNER "$tree_sitter" "$@"
else
"$tree_sitter" "$@"
fi

View file

@ -1,25 +0,0 @@
module.exports = async ({ github, context, core }) => {
if (context.eventName !== 'pull_request') return;
const prNumber = context.payload.pull_request.number;
const owner = context.repo.owner;
const repo = context.repo.repo;
const { data: files } = await github.rest.pulls.listFiles({
owner,
repo,
pull_number: prNumber
});
const changedFiles = files.map(file => file.filename);
const wasmStdLibSrc = 'crates/language/wasm/';
const dirChanged = changedFiles.some(file => file.startsWith(wasmStdLibSrc));
if (!dirChanged) return;
const wasmStdLibHeader = 'lib/src/wasm/wasm-stdlib.h';
const requiredChanged = changedFiles.includes(wasmStdLibHeader);
if (!requiredChanged) core.setFailed(`Changes detected in ${wasmStdLibSrc} but ${wasmStdLibHeader} was not modified.`);
};

View file

@ -1,31 +0,0 @@
name: Backport Pull Request
on:
pull_request_target:
types: [closed, labeled]
permissions:
contents: write
pull-requests: write
jobs:
backport:
if: github.event.pull_request.merged
runs-on: ubuntu-latest
steps:
- name: Checkout repository
uses: actions/checkout@v6
- name: Create app token
uses: actions/create-github-app-token@v2
id: app-token
with:
app-id: ${{ vars.BACKPORT_APP }}
private-key: ${{ secrets.BACKPORT_KEY }}
- name: Create backport PR
uses: korthout/backport-action@v4
with:
pull_title: "${pull_title}"
label_pattern: "^ci:backport ([^ ]+)$"
github_token: ${{ steps.app-token.outputs.token }}

View file

@ -1,30 +0,0 @@
name: Check Bindgen Output
on:
pull_request:
paths:
- lib/include/tree_sitter/api.h
- lib/binding_rust/bindings.rs
push:
branches: [master]
paths:
- lib/include/tree_sitter/api.h
- lib/binding_rust/bindings.rs
jobs:
check-bindgen:
runs-on: ubuntu-latest
steps:
- name: Checkout repository
uses: actions/checkout@v6
- name: Set up stable Rust toolchain
uses: actions-rust-lang/setup-rust-toolchain@v1
with:
toolchain: stable
- name: Generate bindings
run: cargo xtask generate-bindings
- name: Check if the bindgen output changed
run: git diff --exit-code lib/binding_rust/bindings.rs

View file

@ -1,9 +1,14 @@
name: Build & Test
env:
CARGO_TERM_COLOR: always
RUSTFLAGS: "-D warnings"
CROSS_DEBUG: 1
on:
workflow_call:
inputs:
run-test:
run_test:
default: true
type: boolean
@ -16,296 +21,168 @@ jobs:
fail-fast: false
matrix:
platform:
- linux-arm64
- linux-arm
- linux-x64
- linux-x86
- linux-powerpc64
- windows-arm64
- windows-x64
- windows-x86
- macos-arm64
- macos-x64
- wasm32
- linux-arm64 #
- linux-arm #
- linux-x64 #
- linux-x86 #
- linux-powerpc64 #
- windows-arm64 #
- windows-x64 # <-- No C library build - requires an additional adapted Makefile for `cl.exe` compiler
- windows-x86 # -- // --
- macos-arm64 #
- macos-x64 #
include:
# When adding a new `target`:
# 1. Define a new platform alias above
# 2. Add a new record to the matrix map in `crates/cli/npm/install.js`
- { platform: linux-arm64 , target: aarch64-unknown-linux-gnu , os: ubuntu-24.04-arm }
- { platform: linux-arm , target: armv7-unknown-linux-gnueabihf , os: ubuntu-24.04-arm }
- { platform: linux-x64 , target: x86_64-unknown-linux-gnu , os: ubuntu-24.04 }
- { platform: linux-x86 , target: i686-unknown-linux-gnu , os: ubuntu-24.04 }
- { platform: linux-powerpc64 , target: powerpc64-unknown-linux-gnu , os: ubuntu-24.04 }
- { platform: windows-arm64 , target: aarch64-pc-windows-msvc , os: windows-11-arm }
- { platform: windows-x64 , target: x86_64-pc-windows-msvc , os: windows-2025 }
- { platform: windows-x86 , target: i686-pc-windows-msvc , os: windows-2025 }
- { platform: macos-arm64 , target: aarch64-apple-darwin , os: macos-15 }
- { platform: macos-x64 , target: x86_64-apple-darwin , os: macos-15-intel }
- { platform: wasm32 , target: wasm32-unknown-unknown , os: ubuntu-24.04 }
# When adding a new `target`:
# 1. Define a new platform alias above
# 2. Add a new record to a matrix map in `cli/npm/install.js`
- { platform: linux-arm64 , target: aarch64-unknown-linux-gnu , os: ubuntu-latest , use-cross: true }
- { platform: linux-arm , target: arm-unknown-linux-gnueabi , os: ubuntu-latest , use-cross: true }
- { platform: linux-x64 , target: x86_64-unknown-linux-gnu , os: ubuntu-20.04 , cli_features: wasm } #2272
- { platform: linux-x86 , target: i686-unknown-linux-gnu , os: ubuntu-latest , use-cross: true }
- { platform: linux-powerpc64 , target: powerpc64-unknown-linux-gnu , os: ubuntu-latest , use-cross: true }
- { platform: windows-arm64 , target: aarch64-pc-windows-msvc , os: windows-latest }
- { platform: windows-x64 , target: x86_64-pc-windows-msvc , os: windows-latest , cli_features: wasm }
- { platform: windows-x86 , target: i686-pc-windows-msvc , os: windows-latest }
- { platform: macos-arm64 , target: aarch64-apple-darwin , os: macos-14 , cli_features: wasm }
- { platform: macos-x64 , target: x86_64-apple-darwin , os: macos-12 , cli_features: wasm }
# Extra features
- { platform: linux-arm64 , features: wasm }
- { platform: linux-x64 , features: wasm }
- { platform: macos-arm64 , features: wasm }
- { platform: macos-x64 , features: wasm }
# Cross compilers for C library
- { platform: linux-arm64 , cc: aarch64-linux-gnu-gcc , ar: aarch64-linux-gnu-ar }
- { platform: linux-arm , cc: arm-linux-gnueabi-gcc , ar: arm-linux-gnueabi-ar }
- { platform: linux-x86 , cc: i686-linux-gnu-gcc , ar: i686-linux-gnu-ar }
- { platform: linux-powerpc64 , cc: powerpc64-linux-gnu-gcc , ar: powerpc64-linux-gnu-ar }
# Cross-compilation
- { platform: linux-arm , cross: true }
- { platform: linux-x86 , cross: true }
- { platform: linux-powerpc64 , cross: true }
# See #2041 tree-sitter issue
- { platform: windows-x64 , rust-test-threads: 1 }
- { platform: windows-x86 , rust-test-threads: 1 }
# Compile-only
- { platform: wasm32 , no-run: true }
# CLI only build
- { platform: windows-arm64 , cli-only: true }
env:
CARGO_TERM_COLOR: always
RUSTFLAGS: -D warnings
BUILD_CMD: cargo
EXE: ${{ contains(matrix.target, 'windows') && '.exe' || '' }}
defaults:
run:
shell: bash
steps:
- name: Checkout repository
uses: actions/checkout@v6
- uses: actions/checkout@v4
- name: Set up cross-compilation
if: matrix.cross
run: |
for target in armv7-unknown-linux-gnueabihf i686-unknown-linux-gnu powerpc64-unknown-linux-gnu; do
camel_target=${target//-/_}; target_cc=${target/-unknown/}
printf 'CC_%s=%s\n' "$camel_target" "${target_cc/v7/}-gcc"
printf 'AR_%s=%s\n' "$camel_target" "${target_cc/v7/}-ar"
printf 'CARGO_TARGET_%s_LINKER=%s\n' "${camel_target^^}" "${target_cc/v7/}-gcc"
done >> $GITHUB_ENV
{
printf 'CARGO_TARGET_ARMV7_UNKNOWN_LINUX_GNUEABIHF_RUNNER=qemu-arm -L /usr/arm-linux-gnueabihf\n'
printf 'CARGO_TARGET_POWERPC64_UNKNOWN_LINUX_GNU_RUNNER=qemu-ppc64 -L /usr/powerpc64-linux-gnu\n'
} >> $GITHUB_ENV
- name: Get emscripten version
if: contains(matrix.features, 'wasm')
run: printf 'EMSCRIPTEN_VERSION=%s\n' "$(<crates/loader/emscripten-version)" >> $GITHUB_ENV
- name: Read Emscripten version
run: echo "EMSCRIPTEN_VERSION=$(cat cli/loader/emscripten-version)" >> $GITHUB_ENV
- name: Install Emscripten
if: contains(matrix.features, 'wasm')
if: ${{ !matrix.cli-only && !matrix.use-cross }}
uses: mymindstorm/setup-emsdk@v14
with:
version: ${{ env.EMSCRIPTEN_VERSION }}
- name: Set up Rust
uses: actions-rust-lang/setup-rust-toolchain@v1
- run: rustup toolchain install stable --profile minimal
- run: rustup target add ${{ matrix.target }}
- uses: Swatinem/rust-cache@v2
- name: Install cross
if: ${{ matrix.use-cross }}
uses: taiki-e/install-action@v2
with:
target: ${{ matrix.target }}
tool: cross
- name: Install cross-compilation toolchain
if: matrix.cross
- name: Build custom cross image
if: ${{ matrix.use-cross && matrix.os == 'ubuntu-latest' }}
run: |
sudo apt-get update -qy
if [[ $PLATFORM == linux-arm ]]; then
sudo apt-get install -qy {binutils,gcc}-arm-linux-gnueabihf qemu-user
elif [[ $PLATFORM == linux-x86 ]]; then
sudo apt-get install -qy {binutils,gcc}-i686-linux-gnu
elif [[ $PLATFORM == linux-powerpc64 ]]; then
sudo apt-get install -qy {binutils,gcc}-powerpc64-linux-gnu qemu-user
target="${{ matrix.target }}"
image=ghcr.io/cross-rs/$target:custom
echo "CROSS_IMAGE=$image" >> $GITHUB_ENV
echo "[target.$target]" >> Cross.toml
echo "image = \"$image\"" >> Cross.toml
echo "CROSS_CONFIG=$PWD/Cross.toml" >> $GITHUB_ENV
echo "FROM ghcr.io/cross-rs/$target:edge" >> Dockerfile
echo "RUN curl -fsSL https://deb.nodesource.com/setup_16.x | bash -" >> Dockerfile
echo "RUN apt-get update && apt-get -y install nodejs" >> Dockerfile
docker build -t $image .
- name: Setup env extras
env:
RUST_TEST_THREADS: ${{ matrix.rust-test-threads }}
USE_CROSS: ${{ matrix.use-cross }}
TARGET: ${{ matrix.target }}
CC: ${{ matrix.cc }}
AR: ${{ matrix.ar }}
run: |
PATH="$PWD/.github/scripts:$PATH"
echo "$PWD/.github/scripts" >> $GITHUB_PATH
echo "TREE_SITTER=tree-sitter.sh" >> $GITHUB_ENV
echo "TARGET=$TARGET" >> $GITHUB_ENV
echo "ROOT=$PWD" >> $GITHUB_ENV
[ -n "$RUST_TEST_THREADS" ] && \
echo "RUST_TEST_THREADS=$RUST_TEST_THREADS" >> $GITHUB_ENV
[ -n "$CC" ] && echo "CC=$CC" >> $GITHUB_ENV
[ -n "$AR" ] && echo "AR=$AR" >> $GITHUB_ENV
if [ "$USE_CROSS" == "true" ]; then
echo "BUILD_CMD=cross" >> $GITHUB_ENV
runner=$(BUILD_CMD=cross cross.sh bash -c "env | sed -nr '/^CARGO_TARGET_.*_RUNNER=/s///p'")
[ -n "$runner" ] && echo "CROSS_RUNNER=$runner" >> $GITHUB_ENV
fi
env:
PLATFORM: ${{ matrix.platform }}
- name: Install MinGW and Clang (Windows x64 MSYS2)
if: matrix.platform == 'windows-x64'
uses: msys2/setup-msys2@v2
with:
update: true
install: |
mingw-w64-x86_64-toolchain
mingw-w64-x86_64-clang
mingw-w64-x86_64-make
mingw-w64-x86_64-cmake
- name: Build C library
if: ${{ !contains(matrix.os, 'windows') }} # Requires an additional adapted Makefile for `cl.exe` compiler
run: make.sh -j CFLAGS="-Werror"
# TODO: Remove RUSTFLAGS="--cap-lints allow" once we use a wasmtime release that addresses
# the `mismatched-lifetime-syntaxes` lint
- name: Build wasmtime library (Windows x64 MSYS2)
if: contains(matrix.features, 'wasm') && matrix.platform == 'windows-x64'
run: |
mkdir -p target
WASMTIME_VERSION=$(cargo metadata --format-version=1 --locked --features wasm | \
jq -r '.packages[] | select(.name == "wasmtime-c-api-impl") | .version')
curl -LSs "$WASMTIME_REPO/archive/refs/tags/v${WASMTIME_VERSION}.tar.gz" | tar xzf - -C target
cd target/wasmtime-${WASMTIME_VERSION}
cmake -S crates/c-api -B target/c-api \
-DCMAKE_INSTALL_PREFIX="$PWD/artifacts" \
-DWASMTIME_DISABLE_ALL_FEATURES=ON \
-DWASMTIME_FEATURE_CRANELIFT=ON \
-DWASMTIME_TARGET='x86_64-pc-windows-gnu'
cmake --build target/c-api && cmake --install target/c-api
printf 'CMAKE_PREFIX_PATH=%s\n' "$PWD/artifacts" >> $GITHUB_ENV
env:
WASMTIME_REPO: https://github.com/bytecodealliance/wasmtime
RUSTFLAGS: ${{ env.RUSTFLAGS }} --cap-lints allow
- name: Build wasm library
if: ${{ !matrix.cli-only && !matrix.use-cross }} # No sense to build on the same Github runner hosts many times
run: script/build-wasm
- name: Build C library (Windows x64 MSYS2 CMake)
if: matrix.platform == 'windows-x64'
shell: msys2 {0}
run: |
cmake -G Ninja -S . -B build/static \
-DBUILD_SHARED_LIBS=OFF \
-DCMAKE_BUILD_TYPE=Debug \
-DCMAKE_COMPILE_WARNING_AS_ERROR=ON \
-DTREE_SITTER_FEATURE_WASM=$WASM \
-DCMAKE_C_COMPILER=clang
cmake --build build/static
- run: $BUILD_CMD build --release --target=${{ matrix.target }} --features=${{ matrix.cli_features }}
cmake -G Ninja -S . -B build/shared \
-DBUILD_SHARED_LIBS=ON \
-DCMAKE_BUILD_TYPE=Debug \
-DCMAKE_COMPILE_WARNING_AS_ERROR=ON \
-DTREE_SITTER_FEATURE_WASM=$WASM \
-DCMAKE_C_COMPILER=clang
cmake --build build/shared
rm -rf \
build/{static,shared} \
"${CMAKE_PREFIX_PATH}/artifacts" \
target/wasmtime-${WASMTIME_VERSION}
env:
WASM: ${{ contains(matrix.features, 'wasm') && 'ON' || 'OFF' }}
- run: script/fetch-fixtures
# TODO: Remove RUSTFLAGS="--cap-lints allow" once we use a wasmtime release that addresses
# the `mismatched-lifetime-syntaxes` lint
- name: Build wasmtime library
if: contains(matrix.features, 'wasm')
run: |
mkdir -p target
WASMTIME_VERSION=$(cargo metadata --format-version=1 --locked --features wasm | \
jq -r '.packages[] | select(.name == "wasmtime-c-api-impl") | .version')
curl -LSs "$WASMTIME_REPO/archive/refs/tags/v${WASMTIME_VERSION}.tar.gz" | tar xzf - -C target
cd target/wasmtime-${WASMTIME_VERSION}
cmake -S crates/c-api -B target/c-api \
-DCMAKE_INSTALL_PREFIX="$PWD/artifacts" \
-DWASMTIME_DISABLE_ALL_FEATURES=ON \
-DWASMTIME_FEATURE_CRANELIFT=ON \
-DWASMTIME_TARGET='${{ matrix.target }}'
cmake --build target/c-api && cmake --install target/c-api
printf 'CMAKE_PREFIX_PATH=%s\n' "$PWD/artifacts" >> $GITHUB_ENV
env:
WASMTIME_REPO: https://github.com/bytecodealliance/wasmtime
RUSTFLAGS: ${{ env.RUSTFLAGS }} --cap-lints allow
- name: Build C library (make)
if: runner.os != 'Windows'
run: |
if [[ $PLATFORM == linux-arm ]]; then
CC=arm-linux-gnueabihf-gcc; AR=arm-linux-gnueabihf-ar
elif [[ $PLATFORM == linux-x86 ]]; then
CC=i686-linux-gnu-gcc; AR=i686-linux-gnu-ar
elif [[ $PLATFORM == linux-powerpc64 ]]; then
CC=powerpc64-linux-gnu-gcc; AR=powerpc64-linux-gnu-ar
else
CC=gcc; AR=ar
fi
make -j CFLAGS="$CFLAGS" CC=$CC AR=$AR
env:
PLATFORM: ${{ matrix.platform }}
CFLAGS: -g -Werror -Wall -Wextra -Wshadow -Wpedantic -Werror=incompatible-pointer-types
- name: Build C library (CMake)
if: "!matrix.cross"
run: |
cmake -S . -B build/static \
-DBUILD_SHARED_LIBS=OFF \
-DCMAKE_BUILD_TYPE=Debug \
-DCMAKE_COMPILE_WARNING_AS_ERROR=ON \
-DTREE_SITTER_FEATURE_WASM=$WASM
cmake --build build/static --verbose
cmake -S . -B build/shared \
-DBUILD_SHARED_LIBS=ON \
-DCMAKE_BUILD_TYPE=Debug \
-DCMAKE_COMPILE_WARNING_AS_ERROR=ON \
-DTREE_SITTER_FEATURE_WASM=$WASM
cmake --build build/shared --verbose
env:
CC: ${{ contains(matrix.platform, 'linux') && 'clang' || '' }}
WASM: ${{ contains(matrix.features, 'wasm') && 'ON' || 'OFF' }}
- name: Build Wasm library
if: contains(matrix.features, 'wasm')
shell: bash
run: |
cd lib/binding_web
npm ci
CJS=true npm run build
CJS=true npm run build:debug
npm run build
npm run build:debug
- name: Check no_std builds
if: inputs.run-test && !matrix.no-run
working-directory: lib
shell: bash
run: cargo check --no-default-features --target='${{ matrix.target }}'
- name: Build target
run: cargo build --release --target='${{ matrix.target }}' --features='${{ matrix.features }}' $PACKAGE
env:
PACKAGE: ${{ matrix.platform == 'wasm32' && '-p tree-sitter' || '' }}
- name: Cache fixtures
- uses: ./.github/actions/cache
id: cache
if: inputs.run-test && !matrix.no-run
uses: ./.github/actions/cache
- name: Fetch fixtures
if: inputs.run-test && !matrix.no-run
run: cargo run -p xtask --target='${{ matrix.target }}' -- fetch-fixtures
- name: Generate fixtures
if: inputs.run-test && !matrix.no-run && steps.cache.outputs.cache-hit != 'true'
run: cargo run -p xtask --target='${{ matrix.target }}' -- generate-fixtures
if: ${{ !matrix.cli-only && inputs.run_test && steps.cache.outputs.cache-hit != 'true' }} # Can't natively run CLI on Github runner's host
run: script/generate-fixtures
- name: Generate Wasm fixtures
if: inputs.run-test && !matrix.no-run && contains(matrix.features, 'wasm') && steps.cache.outputs.cache-hit != 'true'
run: cargo run -p xtask --target='${{ matrix.target }}' -- generate-fixtures --wasm
- name: Generate WASM fixtures
if: ${{ !matrix.cli-only && !matrix.use-cross && inputs.run_test && steps.cache.outputs.cache-hit != 'true' }} # See comment for the "Build wasm library" step
run: script/generate-fixtures-wasm
- name: Run main tests
if: inputs.run-test && !matrix.no-run
run: cargo test --target='${{ matrix.target }}' --features='${{ matrix.features }}'
if: ${{ !matrix.cli-only && inputs.run_test }} # Can't natively run CLI on Github runner's host
run: $BUILD_CMD test --target=${{ matrix.target }} --features=${{ matrix.cli_features }}
- name: Run Wasm tests
if: inputs.run-test && !matrix.no-run && contains(matrix.features, 'wasm')
run: cargo run -p xtask --target='${{ matrix.target }}' -- test-wasm
- name: Run wasm tests
if: ${{ !matrix.cli-only && !matrix.use-cross && inputs.run_test }} # See comment for the "Build wasm library" step
run: script/test-wasm
- name: Run benchmarks
if: ${{ !matrix.cli-only && !matrix.use-cross && inputs.run_test }} # Cross-compiled benchmarks make no sense
run: $BUILD_CMD bench benchmark -p tree-sitter-cli --target=${{ matrix.target }}
- name: Upload CLI artifact
if: "!matrix.no-run"
uses: actions/upload-artifact@v6
uses: actions/upload-artifact@v4
with:
name: tree-sitter.${{ matrix.platform }}
path: target/${{ matrix.target }}/release/tree-sitter${{ contains(matrix.target, 'windows') && '.exe' || '' }}
path: target/${{ matrix.target }}/release/tree-sitter${{ env.EXE }}
if-no-files-found: error
retention-days: 7
- name: Upload Wasm artifacts
if: matrix.platform == 'linux-x64'
uses: actions/upload-artifact@v6
- name: Upload WASM artifacts
if: ${{ matrix.platform == 'linux-x64' }}
uses: actions/upload-artifact@v4
with:
name: tree-sitter.wasm
path: |
lib/binding_web/web-tree-sitter.js
lib/binding_web/web-tree-sitter.js.map
lib/binding_web/web-tree-sitter.cjs
lib/binding_web/web-tree-sitter.cjs.map
lib/binding_web/web-tree-sitter.wasm
lib/binding_web/web-tree-sitter.wasm.map
lib/binding_web/debug/web-tree-sitter.cjs
lib/binding_web/debug/web-tree-sitter.cjs.map
lib/binding_web/debug/web-tree-sitter.js
lib/binding_web/debug/web-tree-sitter.js.map
lib/binding_web/debug/web-tree-sitter.wasm
lib/binding_web/debug/web-tree-sitter.wasm.map
lib/binding_web/lib/*.c
lib/binding_web/lib/*.h
lib/binding_web/lib/*.ts
lib/binding_web/src/*.ts
lib/binding_web/tree-sitter.js
lib/binding_web/tree-sitter.wasm
if-no-files-found: error
retention-days: 7

View file

@ -1,21 +1,9 @@
name: CI
on:
pull_request:
paths-ignore:
- docs/**
- "**/README.md"
- CONTRIBUTING.md
- LICENSE
- cli/src/templates
push:
branches: [master]
paths-ignore:
- docs/**
- "**/README.md"
- CONTRIBUTING.md
- LICENSE
- cli/src/templates
branches:
- 'master'
concurrency:
group: ${{ github.workflow }}-${{ github.ref }}
@ -25,25 +13,15 @@ jobs:
checks:
runs-on: ubuntu-latest
steps:
- name: Checkout repository
uses: actions/checkout@v6
- name: Set up stable Rust toolchain
uses: actions-rust-lang/setup-rust-toolchain@v1
with:
toolchain: stable
components: clippy, rustfmt
- name: Lint files
run: |
make lint
make lint-web
- uses: actions/checkout@v4
- run: rustup toolchain install stable --profile minimal
- run: rustup toolchain install nightly --profile minimal
- run: rustup component add --toolchain nightly rustfmt
- uses: Swatinem/rust-cache@v2
- run: make lint
sanitize:
uses: ./.github/workflows/sanitize.yml
build:
uses: ./.github/workflows/build.yml
check-wasm-stdlib:
uses: ./.github/workflows/wasm_stdlib.yml

View file

@ -1,50 +0,0 @@
name: Deploy Docs
on:
push:
branches: [master]
paths: [docs/**]
workflow_dispatch:
jobs:
deploy-docs:
runs-on: ubuntu-latest
permissions:
contents: write
pages: write
id-token: write
steps:
- name: Checkout repository
uses: actions/checkout@v6
- name: Set up Rust
uses: actions-rust-lang/setup-rust-toolchain@v1
- name: Install mdbook
env:
GH_TOKEN: ${{ github.token }}
run: |
jq_expr='.assets[] | select(.name | contains("x86_64-unknown-linux-gnu")) | .browser_download_url'
url=$(gh api repos/rust-lang/mdbook/releases/tags/v0.4.52 --jq "$jq_expr")
mkdir mdbook
curl -sSL "$url" | tar -xz -C mdbook
printf '%s/mdbook\n' "$PWD" >> "$GITHUB_PATH"
- name: Install mdbook-admonish
run: cargo install mdbook-admonish
- name: Build Book
run: mdbook build docs
- name: Setup Pages
uses: actions/configure-pages@v5
- name: Upload artifact
uses: actions/upload-pages-artifact@v4
with:
path: docs/book
- name: Deploy to GitHub Pages
id: deployment
uses: actions/deploy-pages@v4

View file

@ -1,69 +0,0 @@
name: nvim-treesitter parser tests
on:
pull_request:
paths:
- 'crates/cli/**'
- 'crates/config/**'
- 'crates/generate/**'
- 'crates/loader/**'
- '.github/workflows/nvim_ts.yml'
workflow_dispatch:
concurrency:
group: ${{ github.workflow }}-${{ github.ref }}
cancel-in-progress: true
jobs:
check_compilation:
timeout-minutes: 30
strategy:
fail-fast: false
matrix:
os: [ubuntu-latest, windows-latest, macos-latest]
type: [generate, build]
name: ${{ matrix.os }} - ${{ matrix.type }}
runs-on: ${{ matrix.os }}
env:
NVIM: ${{ matrix.os == 'windows-latest' && 'nvim-win64\\bin\\nvim.exe' || 'nvim' }}
NVIM_TS_DIR: nvim-treesitter
steps:
- uses: actions/checkout@v6
- uses: actions/checkout@v6
with:
repository: nvim-treesitter/nvim-treesitter
path: ${{ env.NVIM_TS_DIR }}
ref: main
- if: runner.os != 'Windows'
run: echo ${{ github.workspace }}/target/release >> $GITHUB_PATH
- if: runner.os == 'Windows'
run: echo ${{ github.workspace }}/target/release >> $env:GITHUB_PATH
- uses: actions-rust-lang/setup-rust-toolchain@v1
- run: cargo build --release
- uses: ilammy/msvc-dev-cmd@v1
- name: Install and prepare Neovim
run: bash ./scripts/ci-install.sh
working-directory: ${{ env.NVIM_TS_DIR }}
- if: matrix.type == 'generate'
name: Generate and compile parsers
run: $NVIM -l ./scripts/install-parsers.lua --generate --max-jobs=2
working-directory: ${{ env.NVIM_TS_DIR }}
shell: bash
- if: matrix.type == 'build'
name: Compile parsers
run: $NVIM -l ./scripts/install-parsers.lua --max-jobs=10
working-directory: ${{ env.NVIM_TS_DIR }}
shell: bash
- if: "!cancelled()"
name: Check query files
run: $NVIM -l ./scripts/check-queries.lua
working-directory: ${{ env.NVIM_TS_DIR }}
shell: bash

View file

@ -1,5 +1,4 @@
name: Release
on:
workflow_dispatch:
push:
@ -10,22 +9,19 @@ jobs:
build:
uses: ./.github/workflows/build.yml
with:
run-test: false
run_test: false
release:
name: Release on GitHub
name: Release
runs-on: ubuntu-latest
needs: build
permissions:
id-token: write
attestations: write
contents: write
steps:
- name: Checkout repository
uses: actions/checkout@v6
- uses: actions/checkout@v4
- name: Download build artifacts
uses: actions/download-artifact@v7
uses: actions/download-artifact@v4
with:
path: artifacts
@ -35,13 +31,9 @@ jobs:
- name: Prepare release artifacts
run: |
mkdir -p target web
mv artifacts/tree-sitter.wasm/* web/
tar -czf target/web-tree-sitter.tar.gz -C web .
mkdir -p target
mv artifacts/tree-sitter.wasm/* target/
rm -r artifacts/tree-sitter.wasm
for platform in $(cd artifacts; ls | sed 's/^tree-sitter\.//'); do
exe=$(ls artifacts/tree-sitter.$platform/tree-sitter*)
gzip --stdout --name $exe > target/tree-sitter-$platform.gz
@ -49,81 +41,60 @@ jobs:
rm -rf artifacts
ls -l target/
- name: Generate attestations
uses: actions/attest-build-provenance@v3
with:
subject-path: |
target/tree-sitter-*.gz
target/web-tree-sitter.tar.gz
- name: Create release
run: |-
gh release create $GITHUB_REF_NAME \
target/tree-sitter-*.gz \
target/web-tree-sitter.tar.gz
env:
GH_TOKEN: ${{ github.token }}
uses: softprops/action-gh-release@v2
with:
name: ${{ github.ref_name }}
tag_name: ${{ github.ref_name }}
fail_on_unmatched_files: true
files: |
target/tree-sitter-*.gz
target/tree-sitter.wasm
target/tree-sitter.js
crates_io:
name: Publish packages to Crates.io
name: Publish CLI to Crates.io
runs-on: ubuntu-latest
environment: crates
permissions:
id-token: write
contents: read
needs: release
steps:
- name: Checkout repository
uses: actions/checkout@v6
- uses: actions/checkout@v4
- name: Set up Rust
uses: actions-rust-lang/setup-rust-toolchain@v1
- name: Set up registry token
id: auth
uses: rust-lang/crates-io-auth-action@v1
- name: Setup Rust
uses: actions-rs/toolchain@v1
with:
profile: minimal
toolchain: stable
override: true
- name: Publish crates to Crates.io
uses: katyo/publish-crates@v2
with:
registry-token: ${{ steps.auth.outputs.token }}
registry-token: ${{ secrets.CARGO_REGISTRY_TOKEN }}
npm:
name: Publish packages to npmjs.com
name: Publish lib to npmjs.com
runs-on: ubuntu-latest
environment: npm
permissions:
id-token: write
contents: read
needs: release
strategy:
fail-fast: false
matrix:
directory: [crates/cli/npm, lib/binding_web]
directory: ["cli/npm", "lib/binding_web"]
steps:
- name: Checkout repository
uses: actions/checkout@v6
- name: Set up Node
uses: actions/setup-node@v6
with:
node-version: 24
registry-url: https://registry.npmjs.org
- name: Set up Rust
uses: actions-rust-lang/setup-rust-toolchain@v1
- uses: actions/checkout@v4
- name: Build wasm
if: matrix.directory == 'lib/binding_web'
run: ./script/build-wasm
- name: Setup Node
uses: actions/setup-node@v4
with:
node-version: 18
registry-url: "https://registry.npmjs.org"
- name: Publish lib to npmjs.com
env:
NODE_AUTH_TOKEN: ${{secrets.NPM_TOKEN}}
run: |
cd ${{ matrix.directory }}
npm ci
npm run build
npm run build:debug
CJS=true npm run build
CJS=true npm run build:debug
npm run build:dts
- name: Publish to npmjs.com
working-directory: ${{ matrix.directory }}
run: npm publish
npm publish

View file

@ -1,47 +1,34 @@
name: No response
name: no_response
on:
schedule:
- cron: "30 1 * * *" # Run every day at 01:30
- cron: '30 1 * * *' # Run every day at 01:30
workflow_dispatch:
issue_comment:
permissions:
issues: write
pull-requests: write
jobs:
close:
name: Close issues with no response
if: github.event_name == 'schedule' || github.event_name == 'workflow_dispatch'
runs-on: ubuntu-latest
permissions:
issues: write
pull-requests: write
steps:
- name: Checkout script
uses: actions/checkout@v6
with:
sparse-checkout: .github/scripts/close_unresponsive.js
sparse-checkout-cone-mode: false
- name: Run script
uses: actions/github-script@v8
- uses: actions/checkout@v4
- uses: actions/github-script@v7
with:
script: |
const script = require('./.github/scripts/close_unresponsive.js')
await script({github, context})
remove_label:
name: Remove response label
if: github.event_name == 'issue_comment'
runs-on: ubuntu-latest
permissions:
issues: write
pull-requests: write
steps:
- name: Checkout script
uses: actions/checkout@v6
with:
sparse-checkout: .github/scripts/remove_response_label.js
sparse-checkout-cone-mode: false
- name: Run script
uses: actions/github-script@v8
- uses: actions/checkout@v4
- uses: actions/github-script@v7
with:
script: |
const script = require('./.github/scripts/remove_response_label.js')

View file

@ -1,24 +1,16 @@
name: Remove Reviewers
name: "reviewers: remove"
on:
pull_request_target:
types: [converted_to_draft, closed]
permissions:
pull-requests: write
jobs:
remove-reviewers:
runs-on: ubuntu-latest
permissions:
pull-requests: write
steps:
- name: Checkout script
uses: actions/checkout@v6
with:
sparse-checkout: .github/scripts/reviewers_remove.js
sparse-checkout-cone-mode: false
- name: Run script
uses: actions/github-script@v8
- uses: actions/checkout@v4
- name: 'Remove reviewers'
uses: actions/github-script@v7
with:
script: |
const script = require('./.github/scripts/reviewers_remove.js')

View file

@ -8,44 +8,39 @@ on:
workflow_call:
jobs:
check-undefined-behaviour:
check_undefined_behaviour:
name: Sanitizer checks
runs-on: ubuntu-latest
timeout-minutes: 20
env:
TREE_SITTER: ${{ github.workspace }}/target/release/tree-sitter
steps:
- name: Checkout repository
uses: actions/checkout@v6
- name: Checkout source code
uses: actions/checkout@v4
- name: Install UBSAN library
run: sudo apt-get update -y && sudo apt-get install -y libubsan1
- name: Install UBSAN library
run: sudo apt-get update -y && sudo apt-get install -y libubsan1
- name: Set up Rust
uses: actions-rust-lang/setup-rust-toolchain@v1
- run: rustup toolchain install stable --profile minimal
- uses: Swatinem/rust-cache@v2
- run: cargo build --release
- run: script/fetch-fixtures
- name: Build project
run: cargo build --release
- uses: ./.github/actions/cache
id: cache
- name: Cache fixtures
uses: ./.github/actions/cache
id: cache
- if: ${{ steps.cache.outputs.cache-hit != 'true' }}
run: script/generate-fixtures
- name: Fetch fixtures
run: cargo xtask fetch-fixtures
- name: Run main tests with undefined behaviour sanitizer (UBSAN)
env:
CFLAGS: -fsanitize=undefined
RUSTFLAGS: ${{ env.RUSTFLAGS }} -lubsan
run: cargo test -- --test-threads 1
- name: Generate fixtures
if: ${{ steps.cache.outputs.cache-hit != 'true' }}
run: cargo xtask generate-fixtures
- name: Run main tests with undefined behaviour sanitizer (UBSAN)
run: cargo test -- --test-threads 1
env:
CFLAGS: -fsanitize=undefined
RUSTFLAGS: ${{ env.RUSTFLAGS }} -lubsan
- name: Run main tests with address sanitizer (ASAN)
run: cargo test -- --test-threads 1
env:
ASAN_OPTIONS: verify_asan_link_order=0
CFLAGS: -fsanitize=address
RUSTFLAGS: ${{ env.RUSTFLAGS }} -lasan --cfg sanitizing
- name: Run main tests with address sanitizer (ASAN)
env:
ASAN_OPTIONS: verify_asan_link_order=0
CFLAGS: -fsanitize=address
RUSTFLAGS: ${{ env.RUSTFLAGS }} -lasan --cfg sanitizing
run: cargo test -- --test-threads 1

View file

@ -1,29 +0,0 @@
name: Close as spam
on:
issues:
types: [labeled]
pull_request_target:
types: [labeled]
permissions:
issues: write
pull-requests: write
jobs:
spam:
runs-on: ubuntu-latest
if: github.event.label.name == 'spam'
steps:
- name: Checkout script
uses: actions/checkout@v6
with:
sparse-checkout: .github/scripts/close_spam.js
sparse-checkout-cone-mode: false
- name: Run script
uses: actions/github-script@v8
with:
script: |
const script = require('./.github/scripts/close_spam.js')
await script({github, context})

View file

@ -1,41 +0,0 @@
name: Check Wasm Exports
on:
pull_request:
paths:
- lib/include/tree_sitter/api.h
- lib/binding_web/**
- xtask/src/**
push:
branches: [master]
paths:
- lib/include/tree_sitter/api.h
- lib/binding_rust/bindings.rs
- CMakeLists.txt
jobs:
check-wasm-exports:
runs-on: ubuntu-latest
steps:
- name: Checkout repository
uses: actions/checkout@v6
- name: Set up stable Rust toolchain
uses: actions-rust-lang/setup-rust-toolchain@v1
with:
toolchain: stable
- name: Install wasm-objdump
run: sudo apt-get update -y && sudo apt-get install -y wabt
- name: Build C library (make)
run: make -j CFLAGS="$CFLAGS"
env:
CFLAGS: -g -Werror -Wall -Wextra -Wshadow -Wpedantic -Werror=incompatible-pointer-types
- name: Build Wasm Library
working-directory: lib/binding_web
run: npm ci && npm run build:debug
- name: Check Wasm exports
run: cargo xtask check-wasm-exports

View file

@ -1,19 +0,0 @@
name: Check Wasm Stdlib build
on:
workflow_call:
jobs:
check:
runs-on: ubuntu-latest
steps:
- name: Checkout repository
uses: actions/checkout@v6
- name: Check directory changes
uses: actions/github-script@v8
with:
script: |
const scriptPath = `${process.env.GITHUB_WORKSPACE}/.github/scripts/wasm_stdlib.js`;
const script = require(scriptPath);
return script({ github, context, core });

7
.gitignore vendored
View file

@ -1,12 +1,10 @@
log*.html
.direnv
.idea
*.xcodeproj
.vscode
.cache
.zig-cache
.direnv
profile*
fuzz-results
@ -14,6 +12,7 @@ test/fuzz/out
test/fixtures/grammars/*
!test/fixtures/grammars/.gitkeep
package-lock.json
node_modules
docs/assets/js/tree-sitter.js
@ -26,7 +25,6 @@ docs/assets/js/tree-sitter.js
*.dylib
*.so
*.so.[0-9]*
*.dll
*.o
*.obj
*.exp
@ -34,7 +32,4 @@ docs/assets/js/tree-sitter.js
*.wasm
.swiftpm
.build
build
zig-*
/result

View file

@ -1,11 +0,0 @@
{
"lsp": {
"rust-analyzer": {
"initialization_options": {
"cargo": {
"features": "all"
}
}
}
}
}

379
CHANGELOG.md Normal file
View file

@ -0,0 +1,379 @@
# Changelog
## [0.22.6] — 2024-05-05
### Features
- Improve handling of serialization buffer overflows (<https://github.com/tree-sitter/tree-sitter/pull/3318>)
- Reverse iteration through node parents (<https://github.com/tree-sitter/tree-sitter/pull/3214>)
- **cli**: Support `NO_COLOR` (<https://github.com/tree-sitter/tree-sitter/pull/3299>)
- **cli**: Add test listing and allow users to parse a specific test number (<https://github.com/tree-sitter/tree-sitter/pull/3067>)
- **grammar**: Add "inherits" field if available (<https://github.com/tree-sitter/tree-sitter/pull/3295>)
### Bug Fixes
- Correctly load field data from wasm languages
- Improve error message when the `tree-sitter` field is malformed
- Don't error out on package.json lookup errors if `--no-bindings` is passed (<https://github.com/tree-sitter/tree-sitter/pull/3323>)
- **cli**: Keep default cc flags in build
- **cli**: Properly account for multi-grammar repos when using docker to build a wasm parser (<https://github.com/tree-sitter/tree-sitter/pull/3337>)
- **generate**: Don't check arbitrarily named dirs
- **generate**: Take `AsRef<Path>` for the path parameter to avoid clones (<https://github.com/tree-sitter/tree-sitter/pull/3322>)
- **highlight**: Correct signature of `ts_highlighter_add_language`
- **lib**: Do not return field names for extras (<https://github.com/tree-sitter/tree-sitter/pull/3330>)
- **lib**: Advance the lookahead end byte by 4 when there's an invalid code point (<https://github.com/tree-sitter/tree-sitter/pull/3305>)
- **rust**: Update README example (<https://github.com/tree-sitter/tree-sitter/pull/3307>)
- **rust**: Use unix + wasi cfg instead of not windows for fd (<https://github.com/tree-sitter/tree-sitter/pull/3304>)
- **test**: Allow newlines in between test name and attribute
- **wasm**: Correct `childrenFromFieldXXX` method signatures (<https://github.com/tree-sitter/tree-sitter/pull/3301>)
- **xtask**: Always bump every crate in tandem
- **zig**: Make usable as a zig dependency (<https://github.com/tree-sitter/tree-sitter/pull/3315>)
### Documentation
- Mention build command variables
- Swap `\s` for `\\s` in query example
- **highlight**: Typo (<https://github.com/tree-sitter/tree-sitter/pull/3290>)
### Refactor
- **tests**: Migrate remaining `grammar.json` tests to `grammar.js` (<https://github.com/tree-sitter/tree-sitter/pull/3325>)
### Build System and CI
- Add nightly rustfmt to workflow for linting (<https://github.com/tree-sitter/tree-sitter/pull/3333>)
- Fix address sanitizer step (<https://github.com/tree-sitter/tree-sitter/pull/3188>)
- **deps**: Bump cc from 1.0.92 to 1.0.94 in the cargo group (<https://github.com/tree-sitter/tree-sitter/pull/3298>)
- **deps**: Bump the cargo group with 6 updates (<https://github.com/tree-sitter/tree-sitter/pull/3313>)
- **xtask**: Bump `build.zig.zon` version when bumping versions
## [0.22.5] — 2024-04-14
### Bug Fixes
- Avoid generating unused character set constants
- **cli**: Test parsing on windows (<https://github.com/tree-sitter/tree-sitter/pull/3289>)
- **rust**: Compilation on wasm32-wasi (<https://github.com/tree-sitter/tree-sitter/pull/3293>)
## [0.22.4] — 2024-04-12
### Bug Fixes
- Fix sorting of transitions within a lex state
- Include 2-character ranges in array-based state transitions
### Build System and CI
- Always bump at least the patch version in bump xtask
## [0.22.3] — 2024-04-12
### Features
- Add strncat to wasm stdlib
- Generate simpler code for matching large character sets (<https://github.com/tree-sitter/tree-sitter/pull/3234>)
- When loading languages via WASM, gracefully handle memory errors and leaks in external scanners (<https://github.com/tree-sitter/tree-sitter/pull/3181>)
### Bug Fixes
- **bindings**: Add utf-8 flag to python & node (<https://github.com/tree-sitter/tree-sitter/pull/3278>)
- **bindings**: Generate parser.c if missing (<https://github.com/tree-sitter/tree-sitter/pull/3277>)
- **bindings**: Remove required platforms for swift (<https://github.com/tree-sitter/tree-sitter/pull/3264>)
- **cli**: Fix mismatched parenthesis when accounting for `&&` (<https://github.com/tree-sitter/tree-sitter/pull/3274>)
- **lib**: Do not consider childless nodes for ts_node_parent (<https://github.com/tree-sitter/tree-sitter/pull/3191>)
- **lib**: Properly account for aliased root nodes and root nodes with
children in `ts_subtree_string` (<https://github.com/tree-sitter/tree-sitter/pull/3191>)
- **lib**: Account for the root node of a tree cursor being an alias (<https://github.com/tree-sitter/tree-sitter/pull/3191>)
- **lib**: Use correct format specifier in log message (<https://github.com/tree-sitter/tree-sitter/pull/3255>)
- **parser**: Fix variadic macro (<https://github.com/tree-sitter/tree-sitter/pull/3229>)
- render: Proper function prototypes (<https://github.com/tree-sitter/tree-sitter/pull/3277>)
- **windows**: Add `/utf-8` flag for parsers using unicode symbols (<https://github.com/tree-sitter/tree-sitter/pull/3223>)
- Add a semicolon after SKIP macros (<https://github.com/tree-sitter/tree-sitter/pull/3264>)
- Add back `build-wasm` temporarily (<https://github.com/tree-sitter/tree-sitter/pull/3203>)
- Add lifetime to matches function (<https://github.com/tree-sitter/tree-sitter/pull/3254>)
- Default output directory for `build --wasm` should use current_dir (<https://github.com/tree-sitter/tree-sitter/pull/3203>)
- Fix sorting of wasm stdlib symbols
- Insert "tree-sitter" section in current directory's package.json if it exists (<https://github.com/tree-sitter/tree-sitter/pull/3224>)
- Tie the lifetime of the cursor to the query in `QueryCursor::captures()` (<https://github.com/tree-sitter/tree-sitter/pull/3266>)
- Wrong flag check in `build.rs`
### Performance
- **cli**: Reduced the compile time of generated parsers by generating C code with fewer conditionals (<https://github.com/tree-sitter/tree-sitter/pull/3234>)
### Documentation
- Add NGINX grammar
### Refactor
- **parser**: Make REDUCE macro non-variadic (<https://github.com/tree-sitter/tree-sitter/pull/3280>)
- **js**: Misc fixes & tidying
- **rust**: Misc fixes & tidying
### Testing
- Add regression test for node parent + string bug (<https://github.com/tree-sitter/tree-sitter/pull/3191>)
- **test**: Allow colons in test names (<https://github.com/tree-sitter/tree-sitter/pull/3264>)
### Build System and CI
- Upgrade wasmtime
- Update emscripten version (<https://github.com/tree-sitter/tree-sitter/pull/3272>)
- **dependabot**: Improve PR labels (<https://github.com/tree-sitter/tree-sitter/pull/3282>)
## [0.22.2] — 2024-03-17
### Breaking
- **cli**: Add a separate build command to compile parsers
### Features
- **bindings/rust**: Expose `Parser::included_ranges`
- Lower the lib's MSRV (<https://github.com/tree-sitter/tree-sitter/pull/3169>)
- **lib**: Implement Display for Node (<https://github.com/tree-sitter/tree-sitter/pull/3177>)
### Bug Fixes
- **bindings/wasm**: Fix `Parser.getIncludedRanges()` (<https://github.com/tree-sitter/tree-sitter/pull/3164>)
- **lib**: Makefile installation on macOS (<https://github.com/tree-sitter/tree-sitter/pull/3167>)
- **lib**: Makefile installation (<https://github.com/tree-sitter/tree-sitter/pull/3173>)
- **lib**: Avoid possible UB of calling memset on a null ptr when 0 is passed into `array_grow_by` (<https://github.com/tree-sitter/tree-sitter/pull/3176>)
- **lib**: Allow hiding symbols (<https://github.com/tree-sitter/tree-sitter/pull/3180>)
### Documentation
- Fix typo (<https://github.com/tree-sitter/tree-sitter/pull/3158>)
- **licensfe**: Update year (<https://github.com/tree-sitter/tree-sitter/pull/3183>)
### Refactor
- Remove dependency on which crate (<https://github.com/tree-sitter/tree-sitter/pull/3172>)
- Turbofish styling
### Testing
- Fix header writes (<https://github.com/tree-sitter/tree-sitter/pull/3174>)
### Build System and CI
- Simplify workflows (<https://github.com/tree-sitter/tree-sitter/pull/3002>)
- **lib**: Allow overriding CFLAGS on the commandline (<https://github.com/tree-sitter/tree-sitter/pull/3159>)
## [0.22.1] — 2024-03-10
### Bug Fixes
- Cli build script behavior on release
## [0.22.0] — 2024-03-10
### Breaking
- Remove top-level `corpus` dir for tests
The cli will now only look in `test/corpus` for tests
- Remove redundant escape regex & curly brace regex preprocessing (<https://github.com/tree-sitter/tree-sitter/pull/2838>)
- **bindings**: Convert node bindings to NAPI (<https://github.com/tree-sitter/tree-sitter/pull/3077>)
- **wasm**: Make `current*`, `is*`, and `has*` methods properties (<https://github.com/tree-sitter/tree-sitter/pull/3103>)
- **wasm**: Keep API in-line with upstream and start aligning with node (<https://github.com/tree-sitter/tree-sitter/pull/3149>)
### Features
- Add xtasks to assist with bumping crates (<https://github.com/tree-sitter/tree-sitter/pull/3065>)
- Improve language bindings (<https://github.com/tree-sitter/tree-sitter/pull/2438>)
- Expose the allocator and array header files for external scanners (<https://github.com/tree-sitter/tree-sitter/pull/3063>)
- Add typings for the node bindings
- Replace `nan` with `node-addon-api` and conditionally print logs
- **bindings**: Add more make targets
- **bindings**: Add peerDependencies for npm
- **bindings**: Add prebuildify to node
- **bindings**: Remove dsl types file (<https://github.com/tree-sitter/tree-sitter/pull/3126>)
- **node**: Type tag the language (<https://github.com/tree-sitter/tree-sitter/pull/3109>)
- **test**: Add attributes for corpus tests
### Bug Fixes
- Apply some `scan-build` suggestions (unused assignment/garbage access) (<https://github.com/tree-sitter/tree-sitter/pull/3056>)
- Wrap `||` comparison in parentheses when `&&` is used (<https://github.com/tree-sitter/tree-sitter/pull/3070>)
- Ignore unused variables in the array macros (<https://github.com/tree-sitter/tree-sitter/pull/3083>)
- `binding.cc` overwrite should replace `PARSER_NAME` (<https://github.com/tree-sitter/tree-sitter/pull/3116>)
- Don't use `__declspec(dllexport)` on Windows (<https://github.com/tree-sitter/tree-sitter/pull/3128>)
- Parsers should export the language function on Windows
- Allow the regex `v` flag (<https://github.com/tree-sitter/tree-sitter/pull/3154>); see the sketch after this list
- **assertions**: Case shouldn't matter for comment node detection
- **bindings**: Editorconfig and setup.py fixes (<https://github.com/tree-sitter/tree-sitter/pull/3082>)
- **bindings**: Insert `types` after `main` if it exists (<https://github.com/tree-sitter/tree-sitter/pull/3122>)
- **bindings**: Fix template oversights (<https://github.com/tree-sitter/tree-sitter/pull/3155>)
- **cli**: Only output the sources with `--no-bindings` (<https://github.com/tree-sitter/tree-sitter/pull/3123>)
- **generate**: Add `.npmignore`, populate Swift's exclude list (<https://github.com/tree-sitter/tree-sitter/pull/3085>)
- **generate**: Extern allocator functions for the template don't need to be "exported" (<https://github.com/tree-sitter/tree-sitter/pull/3132>)
- **generate**: Camel case name in `Cargo.toml` description (<https://github.com/tree-sitter/tree-sitter/pull/3140>)
- **lib**: Include `api.h` so `ts_set_allocator` is visible (<https://github.com/tree-sitter/tree-sitter/pull/3092>)
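To illustrate the `v` regex flag fix above, here is a minimal, hypothetical `grammar.js` sketch (the grammar and rule names are invented); before this fix the flag was not accepted by the generator.
```js
// Hypothetical grammar.js sketch: the `v` (unicodeSets) flag on a regex
// literal is now accepted by `tree-sitter generate`.
module.exports = grammar({
  name: 'example',
  rules: {
    source_file: $ => repeat($.word),
    word: _ => token(/[a-z]+/v),
  },
});
```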
### Documentation
- Add GitHub user and PR info to the changelog
- Add css for inline code (<https://github.com/tree-sitter/tree-sitter/pull/2844>)
- Document test attributes
- Add `Ohm` language parser
- Remove duplicate `the`'s (<https://github.com/tree-sitter/tree-sitter/pull/3120>)
- Add discord and matrix badges (<https://github.com/tree-sitter/tree-sitter/pull/3148>)
### Refactor
- Rename TS_REUSE_ALLOCATOR flag (<https://github.com/tree-sitter/tree-sitter/pull/3088>)
- Remove extern/const where possible
- **array**: Use pragma GCC in clang too
- **bindings**: Remove npmignore (<https://github.com/tree-sitter/tree-sitter/pull/3089>)
### Testing
- Don't use TS_REUSE_ALLOCATOR on Darwin systems (<https://github.com/tree-sitter/tree-sitter/pull/3087>)
- Add test case for parse stack merging with incorrect error cost bug (<https://github.com/tree-sitter/tree-sitter/pull/3098>)
### Build System and CI
- Improve changelog settings (<https://github.com/tree-sitter/tree-sitter/pull/3064>)
- Unify crate versions via workspace (<https://github.com/tree-sitter/tree-sitter/pull/3074>)
- Update `cc` to remove annoying debug output (<https://github.com/tree-sitter/tree-sitter/pull/3075>)
- Adjust dependabot settings (<https://github.com/tree-sitter/tree-sitter/pull/3079>)
- Use c11 everywhere
- Add uninstall command
- Don't skip tests on failing lint (<https://github.com/tree-sitter/tree-sitter/pull/3102>)
- Remove unused deps, bump deps, and bump MSRV to 1.74.1 (<https://github.com/tree-sitter/tree-sitter/pull/3153>)
- **bindings**: Metadata improvements
- **bindings**: Make everything c11 (<https://github.com/tree-sitter/tree-sitter/pull/3099>)
- **dependabot**: Update weekly instead of daily (<https://github.com/tree-sitter/tree-sitter/pull/3112>)
- **deps**: Bump the cargo group with 1 update (<https://github.com/tree-sitter/tree-sitter/pull/3081>)
- **deps**: Bump the cargo group with 1 update (<https://github.com/tree-sitter/tree-sitter/pull/3097>)
- **deps**: Bump deps & lockfile (<https://github.com/tree-sitter/tree-sitter/pull/3060>)
- **deps**: Bump the cargo group with 4 updates (<https://github.com/tree-sitter/tree-sitter/pull/3134>)
- **lint**: Detect if `Cargo.lock` needs to be updated (<https://github.com/tree-sitter/tree-sitter/pull/3066>)
- **lint**: Make lockfile check quiet (<https://github.com/tree-sitter/tree-sitter/pull/3078>)
- **swift**: Move 'cLanguageStandard' behind 'targets' (<https://github.com/tree-sitter/tree-sitter/pull/3101>)
### Other
- Make Node.js language bindings context aware (<https://github.com/tree-sitter/tree-sitter/pull/2841>)
They don't have any dynamic global data, so all that's needed is to declare them as such
- Fix crash when attempting to load ancient languages via wasm (<https://github.com/tree-sitter/tree-sitter/pull/3068>)
- Use workspace dependencies for internal crates like Tree-sitter (<https://github.com/tree-sitter/tree-sitter/pull/3076>)
- Remove vendored wasmtime headers (<https://github.com/tree-sitter/tree-sitter/pull/3084>)
When building the Rust binding, use the wasmtime headers provided via cargo
by the wasmtime-c-api crate.
- Fix invalid parse stack recursive merging with mismatched error cost (<https://github.com/tree-sitter/tree-sitter/pull/3086>)
Allowing this invalid merge caused an invariant to be violated
later during parsing, when handling a subsequent error.
- Fix regression in `subtree_compare` (<https://github.com/tree-sitter/tree-sitter/pull/3111>)
- docs: Add `Ohm` language parser (<https://github.com/tree-sitter/tree-sitter/pull/3114>)
- Delete `binding_files.rs` (<https://github.com/tree-sitter/tree-sitter/pull/3106>)
- **bindings**: Consistent wording (<https://github.com/tree-sitter/tree-sitter/pull/3096>)
- **bindings**: Ignore more artifacts (<https://github.com/tree-sitter/tree-sitter/pull/3119>)
## [0.21.0] — 2024-02-21
### Breaking
- Remove the apply-all-captures flag, make last-wins precedence the default
**NOTE**: This change might cause breakage in your grammar's highlight tests.
Just flip the order of the relevant queries, and keep in mind that the
last query that matches will win.
### Features
- Use lockfiles to dedup recompilation
- Improve error message for files with an unknown grammar path (<https://github.com/tree-sitter/tree-sitter/pull/2475>)
- Implement first-line-regex (<https://github.com/tree-sitter/tree-sitter/pull/2479>)
- Error out if an empty string is in the `extras` array; see the sketch after this list
- Allow specifying an external scanner's files (<https://github.com/tree-sitter/tree-sitter/pull/3031>)
- Better error info when a scanner is missing required symbols
- **cli**: Add an optional `grammar-path` argument for the playground (<https://github.com/tree-sitter/tree-sitter/pull/3014>)
- **cli**: Add optional `config-path` argument (<https://github.com/tree-sitter/tree-sitter/pull/3050>)
- **loader**: Add more commonly used default parser directories
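As a hedged illustration of the `extras` validation above, here is a made-up `grammar.js` sketch; with this change, generation fails on the empty string instead of accepting it.
```js
// Hypothetical grammar.js sketch: the empty string below now causes
// `tree-sitter generate` to error out instead of being accepted.
module.exports = grammar({
  name: 'example',
  extras: $ => [
    /\s/,
    '', // now rejected: empty strings are not valid extras
  ],
  rules: {
    source_file: $ => 'hello',
  },
});
```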
### Bug Fixes
- Prettify xml output and add node position info (<https://github.com/tree-sitter/tree-sitter/pull/2970>)
- Inherited grammar generation
- Properly error out when the word property is an invalid rule
- Update schema for regex flags (<https://github.com/tree-sitter/tree-sitter/pull/3006>)
- Properly handle `Query.matches` when filtering out results (<https://github.com/tree-sitter/tree-sitter/pull/3013>)
- Sexp format edge case with a quoted closing parenthesis (<https://github.com/tree-sitter/tree-sitter/pull/3016>)
- Always push the default files if there's no `externals`
- Don't log NUL characters (<https://github.com/tree-sitter/tree-sitter/pull/3037>)
- Don't throw an error if the user uses `map` in the grammar (<https://github.com/tree-sitter/tree-sitter/pull/3041>)
- Remove redundant imports (<https://github.com/tree-sitter/tree-sitter/pull/3047>)
- **cli**: Installation via an HTTP tunnel proxy (<https://github.com/tree-sitter/tree-sitter/pull/2824>)
- **cli**: Don't update tests automatically if parse errors are detected (<https://github.com/tree-sitter/tree-sitter/pull/3033>)
- **cli**: Don't use `long` for `grammar_path`
- **test**: Allow writing updates to tests without erroneous nodes instead of denying all of them if a single error is found
- **test**: Edge case when parsing `UNEXPECTED`/`MISSING` nodes with an indentation level greater than 0
- **wasm**: Remove C++ mangled symbols (<https://github.com/tree-sitter/tree-sitter/pull/2971>)
### Documentation
- Create issue template (<https://github.com/tree-sitter/tree-sitter/pull/2978>)
- Document regex limitations
- Mention that `token($.foo)` is illegal; see the sketch after this list
- Explicitly mention behavior of walking outside the given "root" node for a `TSTreeCursor` (<https://github.com/tree-sitter/tree-sitter/pull/3021>)
- Small fixes (<https://github.com/tree-sitter/tree-sitter/pull/2987>)
- Add `Tact` language parser (<https://github.com/tree-sitter/tree-sitter/pull/3030>)
- **web**: Provide deno usage information (<https://github.com/tree-sitter/tree-sitter/pull/2498>)
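A small, hypothetical `grammar.js` sketch of the `token($.foo)` restriction documented above: `token` may only wrap terminal content such as strings and regexes, not a reference to another rule.
```js
// Hypothetical grammar.js sketch (rule names invented):
module.exports = grammar({
  name: 'example',
  rules: {
    source_file: $ => repeat($.item),
    // Illegal: `token` cannot wrap a reference to another rule, e.g.
    //   item: $ => token($.some_other_rule),
    // Fine: `token` wrapping terminal content only.
    item: _ => token(seq('#', /[a-z]+/)),
  },
});
```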
### Refactor
- Extract regex check into a function and lower its precedence
- `&PathBuf` -> `&Path` (<https://github.com/tree-sitter/tree-sitter/pull/3035>)
- Name anonymous types in api.h (<https://github.com/tree-sitter/tree-sitter/pull/1659>)
### Testing
- Add quotes around bash variables (<https://github.com/tree-sitter/tree-sitter/pull/3023>)
- Update html tests
### Build System and CI
- Only create release for normal semver tags (<https://github.com/tree-sitter/tree-sitter/pull/2973>)
- Add useful development targets to makefile (<https://github.com/tree-sitter/tree-sitter/pull/2979>)
- Remove minimum glibc information in summary page (<https://github.com/tree-sitter/tree-sitter/pull/2988>)
- Use the native m1 mac runner (<https://github.com/tree-sitter/tree-sitter/pull/2995>)
- Add editorconfig (<https://github.com/tree-sitter/tree-sitter/pull/2998>)
- Remove symbolic links from repository (<https://github.com/tree-sitter/tree-sitter/pull/2997>)
- Move common Cargo.toml keys into the workspace and inherit them (<https://github.com/tree-sitter/tree-sitter/pull/3019>)
- Remove reviewers when drafting or closing a PR (<https://github.com/tree-sitter/tree-sitter/pull/2963>)
- Enable creating changelogs with git-cliff (<https://github.com/tree-sitter/tree-sitter/pull/3040>)
- Cache fixtures (<https://github.com/tree-sitter/tree-sitter/pull/3038>)
- Don't cancel jobs on master (<https://github.com/tree-sitter/tree-sitter/pull/3052>)
- Relax caching requirements (<https://github.com/tree-sitter/tree-sitter/pull/3051>)
- **deps**: Bump clap from 4.4.18 to 4.5.0 (<https://github.com/tree-sitter/tree-sitter/pull/3007>)
- **deps**: Bump wasmtime from v16.0.0 to v17.0.1 (<https://github.com/tree-sitter/tree-sitter/pull/3008>)
- **deps**: Bump wasmtime to v18.0.1 (<https://github.com/tree-sitter/tree-sitter/pull/3057>)
- **sanitize**: Add a timeout of 60 minutes (<https://github.com/tree-sitter/tree-sitter/pull/3017>)
- **sanitize**: Reduce timeout to 20 minutes (<https://github.com/tree-sitter/tree-sitter/pull/3054>)
### Other
- Document preferred language for scanner (<https://github.com/tree-sitter/tree-sitter/pull/2972>)
- Add java and tsx to corpus tests (<https://github.com/tree-sitter/tree-sitter/pull/2992>)
- Provide a CLI flag to open `log.html` (<https://github.com/tree-sitter/tree-sitter/pull/2996>)
- Some more clippy lints (<https://github.com/tree-sitter/tree-sitter/pull/3010>)
- Remove deprecated query parsing mechanism (<https://github.com/tree-sitter/tree-sitter/pull/3011>)
- Print out the full compiler arguments that were run when compilation fails (<https://github.com/tree-sitter/tree-sitter/pull/3018>)
- Deprecate C++ scanners (<https://github.com/tree-sitter/tree-sitter/pull/3020>)
- Add some documentation to the playground page (<https://github.com/tree-sitter/tree-sitter/pull/1495>)
- Update relevant rust tests (<https://github.com/tree-sitter/tree-sitter/pull/2947>)
- Clippy lints (<https://github.com/tree-sitter/tree-sitter/pull/3032>)
- Error out when multiple arguments are passed to `token`/`token.immediate` (<https://github.com/tree-sitter/tree-sitter/pull/3036>); see the sketch at the end of this list
- Tidying
- Prefer turbofish syntax where possible (<https://github.com/tree-sitter/tree-sitter/pull/3048>)
- Use published wasmtime crates
- Cleaner cast
- Update `Cargo.lock`
- Get rid of `github_issue_test` file (<https://github.com/tree-sitter/tree-sitter/pull/3055>)
- **cli**: Use spawn to display `emcc`'s stdout and stderr (<https://github.com/tree-sitter/tree-sitter/pull/2494>)
- **cli**: Warn users when a query path needed for a subcommand isn't specified in a grammar's package.json
- **generate**: Dedup and warn about duplicate or invalid rules (<https://github.com/tree-sitter/tree-sitter/pull/2994>)
- **test**: Use different languages for async tests (<https://github.com/tree-sitter/tree-sitter/pull/2953>)
- **wasm**: Use `SIDE_MODULE=2` to silence warning (<https://github.com/tree-sitter/tree-sitter/pull/3003>)
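Finally, a hedged sketch of the `token`/`token.immediate` argument check mentioned above (rule names invented): pass a single rule, wrapping multiple parts in `seq`, rather than several arguments.
```js
// Hypothetical grammar.js sketch: `token` takes exactly one argument.
module.exports = grammar({
  name: 'example',
  rules: {
    source_file: $ => repeat($.return_null),
    // Now an error: token('return', /\s+/, 'null')
    // Wrap the parts in a single seq(...) instead:
    return_null: _ => token(seq('return', /\s+/, 'null')),
  },
});
```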

View file

@ -1,95 +0,0 @@
cmake_minimum_required(VERSION 3.13)
project(tree-sitter
VERSION "0.27.0"
DESCRIPTION "An incremental parsing system for programming tools"
HOMEPAGE_URL "https://tree-sitter.github.io/tree-sitter/"
LANGUAGES C)
option(BUILD_SHARED_LIBS "Build using shared libraries" ON)
option(TREE_SITTER_FEATURE_WASM "Enable the Wasm feature" OFF)
option(AMALGAMATED "Build using an amalgamated source" OFF)
if(AMALGAMATED)
set(TS_SOURCE_FILES "${PROJECT_SOURCE_DIR}/lib/src/lib.c")
else()
file(GLOB TS_SOURCE_FILES lib/src/*.c)
list(REMOVE_ITEM TS_SOURCE_FILES "${PROJECT_SOURCE_DIR}/lib/src/lib.c")
endif()
add_library(tree-sitter ${TS_SOURCE_FILES})
target_include_directories(tree-sitter PRIVATE lib/src lib/src/wasm PUBLIC lib/include)
if(MSVC)
target_compile_options(tree-sitter PRIVATE
/wd4018 # disable 'signed/unsigned mismatch'
/wd4232 # disable 'nonstandard extension used'
/wd4244 # disable 'possible loss of data'
/wd4267 # disable 'possible loss of data (size_t)'
/wd4701 # disable 'potentially uninitialized local variable'
/we4022 # treat 'incompatible types' as an error
/W4)
else()
target_compile_options(tree-sitter PRIVATE
-Wall -Wextra -Wshadow -Wpedantic
-Werror=incompatible-pointer-types)
endif()
if(TREE_SITTER_FEATURE_WASM)
if(NOT DEFINED CACHE{WASMTIME_INCLUDE_DIR})
message(CHECK_START "Looking for wasmtime headers")
find_path(WASMTIME_INCLUDE_DIR wasmtime.h
PATHS ENV DEP_WASMTIME_C_API_INCLUDE)
if(NOT WASMTIME_INCLUDE_DIR)
unset(WASMTIME_INCLUDE_DIR CACHE)
message(FATAL_ERROR "Could not find wasmtime headers.\nDid you forget to set CMAKE_INCLUDE_PATH?")
endif()
message(CHECK_PASS "found")
endif()
if(NOT DEFINED CACHE{WASMTIME_LIBRARY})
message(CHECK_START "Looking for wasmtime library")
find_library(WASMTIME_LIBRARY wasmtime)
if(NOT WASMTIME_LIBRARY)
unset(WASMTIME_LIBRARY CACHE)
message(FATAL_ERROR "Could not find wasmtime library.\nDid you forget to set CMAKE_LIBRARY_PATH?")
endif()
message(CHECK_PASS "found")
endif()
target_compile_definitions(tree-sitter PUBLIC TREE_SITTER_FEATURE_WASM)
target_include_directories(tree-sitter SYSTEM PRIVATE "${WASMTIME_INCLUDE_DIR}")
target_link_libraries(tree-sitter PUBLIC "${WASMTIME_LIBRARY}")
set_property(TARGET tree-sitter PROPERTY C_STANDARD_REQUIRED ON)
if(NOT BUILD_SHARED_LIBS)
if(WIN32)
target_compile_definitions(tree-sitter PRIVATE WASM_API_EXTERN= WASI_API_EXTERN=)
target_link_libraries(tree-sitter INTERFACE ws2_32 advapi32 userenv ntdll shell32 ole32 bcrypt)
elseif(NOT APPLE)
target_link_libraries(tree-sitter INTERFACE pthread dl m)
endif()
endif()
endif()
set_target_properties(tree-sitter
PROPERTIES
C_STANDARD 11
C_VISIBILITY_PRESET hidden
POSITION_INDEPENDENT_CODE ON
SOVERSION "${PROJECT_VERSION_MAJOR}.${PROJECT_VERSION_MINOR}"
DEFINE_SYMBOL "")
target_compile_definitions(tree-sitter PRIVATE _POSIX_C_SOURCE=200112L _DEFAULT_SOURCE _BSD_SOURCE _DARWIN_C_SOURCE)
include(GNUInstallDirs)
configure_file(lib/tree-sitter.pc.in "${CMAKE_CURRENT_BINARY_DIR}/tree-sitter.pc" @ONLY)
install(FILES lib/include/tree_sitter/api.h
DESTINATION "${CMAKE_INSTALL_INCLUDEDIR}/tree_sitter")
install(FILES "${CMAKE_CURRENT_BINARY_DIR}/tree-sitter.pc"
DESTINATION "${CMAKE_INSTALL_LIBDIR}/pkgconfig")
install(TARGETS tree-sitter
LIBRARY DESTINATION "${CMAKE_INSTALL_LIBDIR}")

View file

@ -1 +1 @@
See [docs/src/6-contributing.md](./docs/src/6-contributing.md)
See [section-6-contributing.md](./docs/section-6-contributing.md)

Cargo.lock (generated, 2179 changed lines)

File diff suppressed because it is too large.

View file

@ -1,86 +1,28 @@
[workspace]
default-members = ["crates/cli"]
default-members = ["cli"]
members = [
"crates/cli",
"crates/config",
"crates/generate",
"crates/highlight",
"crates/loader",
"crates/tags",
"crates/xtask",
"crates/language",
"cli",
"cli/config",
"cli/loader",
"lib",
"lib/language",
"tags",
"highlight",
"xtask",
]
resolver = "2"
[workspace.package]
version = "0.27.0"
authors = [
"Max Brunsfeld <maxbrunsfeld@gmail.com>",
"Amaan Qureshi <amaanq12@gmail.com>",
]
version = "0.23.2"
authors = ["Max Brunsfeld <maxbrunsfeld@gmail.com>"]
edition = "2021"
rust-version = "1.85"
rust-version = "1.74.1"
homepage = "https://tree-sitter.github.io/tree-sitter"
repository = "https://github.com/tree-sitter/tree-sitter"
license = "MIT"
keywords = ["incremental", "parsing"]
categories = ["command-line-utilities", "parsing"]
[workspace.lints.clippy]
dbg_macro = "deny"
todo = "deny"
pedantic = { level = "warn", priority = -1 }
nursery = { level = "warn", priority = -1 }
cargo = { level = "warn", priority = -1 }
# The lints below are a specific subset of the pedantic+nursery lints
# that we explicitly allow in the tree-sitter codebase because they either:
#
# 1. Contain false positives,
# 2. Are unnecessary, or
# 3. Worsen the code
branches_sharing_code = "allow"
cast_lossless = "allow"
cast_possible_truncation = "allow"
cast_possible_wrap = "allow"
cast_precision_loss = "allow"
cast_sign_loss = "allow"
checked_conversions = "allow"
cognitive_complexity = "allow"
collection_is_never_read = "allow"
fallible_impl_from = "allow"
fn_params_excessive_bools = "allow"
inline_always = "allow"
if_not_else = "allow"
items_after_statements = "allow"
match_wildcard_for_single_variants = "allow"
missing_errors_doc = "allow"
missing_panics_doc = "allow"
module_name_repetitions = "allow"
multiple_crate_versions = "allow"
needless_for_each = "allow"
obfuscated_if_else = "allow"
option_if_let_else = "allow"
or_fun_call = "allow"
range_plus_one = "allow"
redundant_clone = "allow"
redundant_closure_for_method_calls = "allow"
ref_option = "allow"
similar_names = "allow"
string_lit_as_bytes = "allow"
struct_excessive_bools = "allow"
struct_field_names = "allow"
transmute_undefined_repr = "allow"
too_many_lines = "allow"
unnecessary_wraps = "allow"
unused_self = "allow"
used_underscore_items = "allow"
[workspace.lints.rust]
mismatched_lifetime_syntaxes = "allow"
[profile.optimize]
inherits = "release"
strip = true # Automatically strip symbols from the binary.
@ -92,72 +34,61 @@ codegen-units = 1 # Maximum size reduction optimizations.
inherits = "optimize"
opt-level = "s" # Optimize for size.
[profile.release-dev]
inherits = "release"
lto = false
debug = true
debug-assertions = true
overflow-checks = true
incremental = true
codegen-units = 256
[profile.profile]
inherits = "optimize"
strip = false
[workspace.dependencies]
ansi_colours = "1.2.3"
anstyle = "1.0.13"
anyhow = "1.0.100"
bstr = "1.12.0"
cc = "1.2.53"
clap = { version = "4.5.54", features = [
anstyle = "1.0.8"
anyhow = "1.0.89"
bstr = "1.10.0"
cc = "1.1.19"
clap = { version = "4.5.17", features = [
"cargo",
"derive",
"env",
"help",
"string",
"unstable-styles",
] }
clap_complete = "4.5.65"
clap_complete_nushell = "4.5.10"
crc32fast = "1.5.0"
ctor = "0.2.9"
ctrlc = { version = "3.5.0", features = ["termination"] }
dialoguer = { version = "0.11.0", features = ["fuzzy-select"] }
etcetera = "0.11.0"
fs4 = "0.12.0"
glob = "0.3.3"
ctor = "0.2.8"
ctrlc = { version = "3.4.5", features = ["termination"] }
dirs = "5.0.1"
filetime = "0.2.25"
fs4 = "0.8.4"
git2 = "0.18.3"
glob = "0.3.1"
heck = "0.5.0"
html-escape = "0.2.13"
indexmap = "2.12.1"
indoc = "2.0.6"
libloading = "0.9.0"
log = { version = "0.4.28", features = ["std"] }
memchr = "2.7.6"
once_cell = "1.21.3"
indexmap = "2.5.0"
indoc = "2.0.5"
lazy_static = "1.5.0"
libloading = "0.8.5"
log = { version = "0.4.22", features = ["std"] }
memchr = "2.7.4"
once_cell = "1.19.0"
path-slash = "0.2.1"
pretty_assertions = "1.4.1"
rand = "0.8.5"
regex = "1.11.3"
regex-syntax = "0.8.6"
rustc-hash = "2.1.1"
schemars = "1.0.5"
semver = { version = "1.0.27", features = ["serde"] }
serde = { version = "1.0.219", features = ["derive"] }
serde_json = { version = "1.0.149", features = ["preserve_order"] }
similar = "2.7.0"
smallbitvec = "2.6.0"
streaming-iterator = "0.1.9"
tempfile = "3.23.0"
thiserror = "2.0.17"
regex = "1.10.6"
regex-syntax = "0.8.4"
rustc-hash = "1.1.0"
semver = "1.0.23"
serde = { version = "1.0.210", features = ["derive"] }
serde_derive = "1.0.197"
serde_json = { version = "1.0.128", features = ["preserve_order"] }
similar = "2.6.0"
smallbitvec = "2.5.3"
tempfile = "3.12.0"
thiserror = "1.0.63"
tiny_http = "0.12.0"
topological-sort = "0.2.2"
unindent = "0.2.4"
toml = "0.8.19"
unindent = "0.2.3"
walkdir = "2.5.0"
wasmparser = "0.243.0"
webbrowser = "1.0.5"
wasmparser = "0.215.0"
webbrowser = "1.0.2"
tree-sitter = { version = "0.27.0", path = "./lib" }
tree-sitter-generate = { version = "0.27.0", path = "./crates/generate" }
tree-sitter-loader = { version = "0.27.0", path = "./crates/loader" }
tree-sitter-config = { version = "0.27.0", path = "./crates/config" }
tree-sitter-highlight = { version = "0.27.0", path = "./crates/highlight" }
tree-sitter-tags = { version = "0.27.0", path = "./crates/tags" }
tree-sitter-language = { version = "0.1", path = "./crates/language" }
tree-sitter = { version = "0.23.2", path = "./lib" }
tree-sitter-loader = { version = "0.23.2", path = "./cli/loader" }
tree-sitter-config = { version = "0.23.2", path = "./cli/config" }
tree-sitter-highlight = { version = "0.23.2", path = "./highlight" }
tree-sitter-tags = { version = "0.23.2", path = "./tags" }

View file

@ -1,6 +1,6 @@
The MIT License (MIT)
Copyright (c) 2018 Max Brunsfeld
Copyright (c) 2018-2024 Max Brunsfeld
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal

View file

@ -1,12 +1,13 @@
VERSION := 0.27.0
DESCRIPTION := An incremental parsing system for programming tools
HOMEPAGE_URL := https://tree-sitter.github.io/tree-sitter/
ifeq ($(OS),Windows_NT)
$(error Windows is not supported)
endif
VERSION := 0.23.2
# install directory layout
PREFIX ?= /usr/local
INCLUDEDIR ?= $(PREFIX)/include
LIBDIR ?= $(PREFIX)/lib
BINDIR ?= $(PREFIX)/bin
PCLIBDIR ?= $(LIBDIR)/pkgconfig
# collect sources
@ -22,9 +23,8 @@ OBJ := $(SRC:.c=.o)
# define default flags, and override to append mandatory flags
ARFLAGS := rcs
CFLAGS ?= -O3 -Wall -Wextra -Wshadow -Wpedantic -Werror=incompatible-pointer-types
CFLAGS ?= -O3 -Wall -Wextra -Wshadow -pedantic
override CFLAGS += -std=c11 -fPIC -fvisibility=hidden
override CFLAGS += -D_POSIX_C_SOURCE=200112L -D_DEFAULT_SOURCE -D_BSD_SOURCE -D_DARWIN_C_SOURCE
override CFLAGS += -Ilib/src -Ilib/src/wasm -Ilib/include
# ABI versioning
@ -32,25 +32,20 @@ SONAME_MAJOR := $(word 1,$(subst ., ,$(VERSION)))
SONAME_MINOR := $(word 2,$(subst ., ,$(VERSION)))
# OS-specific bits
MACHINE := $(shell $(CC) -dumpmachine)
ifneq ($(findstring darwin,$(MACHINE)),)
ifneq ($(findstring darwin,$(shell $(CC) -dumpmachine)),)
SOEXT = dylib
SOEXTVER_MAJOR = $(SONAME_MAJOR).$(SOEXT)
SOEXTVER = $(SONAME_MAJOR).$(SONAME_MINOR).$(SOEXT)
LINKSHARED += -dynamiclib -Wl,-install_name,$(LIBDIR)/libtree-sitter.$(SOEXTVER)
else ifneq ($(findstring mingw32,$(MACHINE)),)
SOEXT = dll
LINKSHARED += -s -shared -Wl,--out-implib,libtree-sitter.dll.a
else
SOEXT = so
SOEXTVER_MAJOR = $(SOEXT).$(SONAME_MAJOR)
SOEXTVER = $(SOEXT).$(SONAME_MAJOR).$(SONAME_MINOR)
LINKSHARED += -shared -Wl,-soname,libtree-sitter.$(SOEXTVER)
endif
ifneq ($(filter $(shell uname),FreeBSD NetBSD DragonFly),)
PCLIBDIR := $(PREFIX)/libdata/pkgconfig
endif
endif
all: libtree-sitter.a libtree-sitter.$(SOEXT) tree-sitter.pc
@ -63,39 +58,24 @@ ifneq ($(STRIP),)
$(STRIP) $@
endif
ifneq ($(findstring mingw32,$(MACHINE)),)
libtree-sitter.dll.a: libtree-sitter.$(SOEXT)
endif
tree-sitter.pc: lib/tree-sitter.pc.in
sed -e 's|@PROJECT_VERSION@|$(VERSION)|' \
-e 's|@CMAKE_INSTALL_LIBDIR@|$(LIBDIR:$(PREFIX)/%=%)|' \
-e 's|@CMAKE_INSTALL_INCLUDEDIR@|$(INCLUDEDIR:$(PREFIX)/%=%)|' \
-e 's|@PROJECT_DESCRIPTION@|$(DESCRIPTION)|' \
-e 's|@PROJECT_HOMEPAGE_URL@|$(HOMEPAGE_URL)|' \
-e 's|@CMAKE_INSTALL_PREFIX@|$(PREFIX)|' $< > $@
shared: libtree-sitter.$(SOEXT)
static: libtree-sitter.a
tree-sitter.pc: tree-sitter.pc.in
sed -e 's|@VERSION@|$(VERSION)|' \
-e 's|@LIBDIR@|$(LIBDIR)|' \
-e 's|@INCLUDEDIR@|$(INCLUDEDIR)|' \
-e 's|=$(PREFIX)|=$${prefix}|' \
-e 's|@PREFIX@|$(PREFIX)|' $< > $@
clean:
$(RM) $(OBJ) tree-sitter.pc libtree-sitter.a libtree-sitter.$(SOEXT) libtree-sitter.dll.a
$(RM) $(OBJ) tree-sitter.pc libtree-sitter.a libtree-sitter.$(SOEXT)
install: all
install -d '$(DESTDIR)$(INCLUDEDIR)'/tree_sitter '$(DESTDIR)$(PCLIBDIR)' '$(DESTDIR)$(LIBDIR)'
install -m644 lib/include/tree_sitter/api.h '$(DESTDIR)$(INCLUDEDIR)'/tree_sitter/api.h
install -m644 tree-sitter.pc '$(DESTDIR)$(PCLIBDIR)'/tree-sitter.pc
install -m644 libtree-sitter.a '$(DESTDIR)$(LIBDIR)'/libtree-sitter.a
ifneq ($(findstring mingw32,$(MACHINE)),)
install -d '$(DESTDIR)$(BINDIR)'
install -m755 libtree-sitter.dll '$(DESTDIR)$(BINDIR)'/libtree-sitter.dll
install -m755 libtree-sitter.dll.a '$(DESTDIR)$(LIBDIR)'/libtree-sitter.dll.a
else
install -m755 libtree-sitter.$(SOEXT) '$(DESTDIR)$(LIBDIR)'/libtree-sitter.$(SOEXTVER)
cd '$(DESTDIR)$(LIBDIR)' && ln -sf libtree-sitter.$(SOEXTVER) libtree-sitter.$(SOEXTVER_MAJOR)
cd '$(DESTDIR)$(LIBDIR)' && ln -sf libtree-sitter.$(SOEXTVER_MAJOR) libtree-sitter.$(SOEXT)
endif
ln -sf libtree-sitter.$(SOEXTVER) '$(DESTDIR)$(LIBDIR)'/libtree-sitter.$(SOEXTVER_MAJOR)
ln -sf libtree-sitter.$(SOEXTVER_MAJOR) '$(DESTDIR)$(LIBDIR)'/libtree-sitter.$(SOEXT)
uninstall:
$(RM) '$(DESTDIR)$(LIBDIR)'/libtree-sitter.a \
@ -104,36 +84,31 @@ uninstall:
'$(DESTDIR)$(LIBDIR)'/libtree-sitter.$(SOEXT) \
'$(DESTDIR)$(INCLUDEDIR)'/tree_sitter/api.h \
'$(DESTDIR)$(PCLIBDIR)'/tree-sitter.pc
rmdir '$(DESTDIR)$(INCLUDEDIR)'/tree_sitter
.PHONY: all shared static install uninstall clean
.PHONY: all install uninstall clean
##### Dev targets #####
test:
cargo xtask fetch-fixtures
cargo xtask generate-fixtures
cargo xtask test
script/fetch-fixtures
script/generate-fixtures
script/test
test-wasm:
cargo xtask generate-fixtures --wasm
cargo xtask test-wasm
test_wasm:
script/generate-fixtures-wasm
script/test-wasm
lint:
cargo update --workspace --locked --quiet
cargo check --workspace --all-targets
cargo fmt --all --check
cargo +nightly fmt --all --check
cargo clippy --workspace --all-targets -- -D warnings
lint-web:
npm --prefix lib/binding_web ci
npm --prefix lib/binding_web run lint
format:
cargo fmt --all
cargo +nightly fmt --all
changelog:
@git-cliff --config .github/cliff.toml --prepend CHANGELOG.md --latest --github-token $(shell gh auth token)
@git-cliff --config script/cliff.toml --output CHANGELOG.md --latest --github-token $(shell gh auth token)
.PHONY: test test-wasm lint format changelog
.PHONY: test test_wasm lint format changelog

View file

@ -14,22 +14,8 @@ let package = Package(
targets: [
.target(name: "TreeSitter",
path: "lib",
exclude: [
"src/unicode/ICU_SHA",
"src/unicode/README.md",
"src/unicode/LICENSE",
"src/wasm/stdlib-symbols.txt",
"src/lib.c",
],
sources: ["src"],
publicHeadersPath: "include",
cSettings: [
.headerSearchPath("src"),
.define("_POSIX_C_SOURCE", to: "200112L"),
.define("_DEFAULT_SOURCE"),
.define("_BSD_SOURCE"),
.define("_DARWIN_C_SOURCE"),
]),
sources: ["src/lib.c"],
cSettings: [.headerSearchPath("src")]),
],
cLanguageStandard: .c11
)

View file

@ -14,8 +14,8 @@ Tree-sitter is a parser generator tool and an incremental parsing library. It ca
## Links
- [Documentation](https://tree-sitter.github.io)
- [Rust binding](lib/binding_rust/README.md)
- [Wasm binding](lib/binding_web/README.md)
- [Command-line interface](crates/cli/README.md)
- [WASM binding](lib/binding_web/README.md)
- [Command-line interface](cli/README.md)
[discord]: https://img.shields.io/discord/1063097320771698699?logo=discord&label=discord
[matrix]: https://img.shields.io/matrix/tree-sitter-chat%3Amatrix.org?logo=matrix&label=matrix

build.zig (136 changed lines)
View file

@ -1,142 +1,18 @@
const std = @import("std");
pub fn build(b: *std.Build) !void {
const target = b.standardTargetOptions(.{});
const optimize = b.standardOptimizeOption(.{});
const wasm = b.option(bool, "enable-wasm", "Enable Wasm support") orelse false;
const shared = b.option(bool, "build-shared", "Build a shared library") orelse false;
const amalgamated = b.option(bool, "amalgamated", "Build using an amalgamated source") orelse false;
const lib: *std.Build.Step.Compile = b.addLibrary(.{
pub fn build(b: *std.Build) void {
var lib = b.addStaticLibrary(.{
.name = "tree-sitter",
.linkage = if (shared) .dynamic else .static,
.root_module = b.createModule(.{
.target = target,
.optimize = optimize,
.link_libc = true,
.pic = if (shared) true else null,
}),
.target = b.standardTargetOptions(.{}),
.optimize = b.standardOptimizeOption(.{}),
});
if (amalgamated) {
lib.addCSourceFile(.{
.file = b.path("lib/src/lib.c"),
.flags = &.{"-std=c11"},
});
} else {
const files = try findSourceFiles(b);
defer b.allocator.free(files);
lib.addCSourceFiles(.{
.root = b.path("lib/src"),
.files = files,
.flags = &.{"-std=c11"},
});
}
lib.linkLibC();
lib.addCSourceFile(.{ .file = b.path("lib/src/lib.c"), .flags = &.{"-std=c11"} });
lib.addIncludePath(b.path("lib/include"));
lib.addIncludePath(b.path("lib/src"));
lib.addIncludePath(b.path("lib/src/wasm"));
lib.root_module.addCMacro("_POSIX_C_SOURCE", "200112L");
lib.root_module.addCMacro("_DEFAULT_SOURCE", "");
lib.root_module.addCMacro("_BSD_SOURCE", "");
lib.root_module.addCMacro("_DARWIN_C_SOURCE", "");
if (wasm) {
if (b.lazyDependency(wasmtimeDep(target.result), .{})) |wasmtime| {
lib.root_module.addCMacro("TREE_SITTER_FEATURE_WASM", "");
lib.addSystemIncludePath(wasmtime.path("include"));
lib.addLibraryPath(wasmtime.path("lib"));
if (shared) lib.linkSystemLibrary("wasmtime");
}
}
lib.installHeadersDirectory(b.path("lib/include"), ".", .{});
b.installArtifact(lib);
}
/// Get the name of the wasmtime dependency for this target.
pub fn wasmtimeDep(target: std.Target) []const u8 {
const arch = target.cpu.arch;
const os = target.os.tag;
const abi = target.abi;
return @as(?[]const u8, switch (os) {
.linux => switch (arch) {
.x86_64 => switch (abi) {
.gnu => "wasmtime_c_api_x86_64_linux",
.musl => "wasmtime_c_api_x86_64_musl",
.android => "wasmtime_c_api_x86_64_android",
else => null,
},
.aarch64 => switch (abi) {
.gnu => "wasmtime_c_api_aarch64_linux",
.musl => "wasmtime_c_api_aarch64_musl",
.android => "wasmtime_c_api_aarch64_android",
else => null,
},
.x86 => switch (abi) {
.gnu => "wasmtime_c_api_i686_linux",
else => null,
},
.arm => switch (abi) {
.gnueabi => "wasmtime_c_api_armv7_linux",
else => null,
},
.s390x => switch (abi) {
.gnu => "wasmtime_c_api_s390x_linux",
else => null,
},
.riscv64 => switch (abi) {
.gnu => "wasmtime_c_api_riscv64gc_linux",
else => null,
},
else => null,
},
.windows => switch (arch) {
.x86_64 => switch (abi) {
.gnu => "wasmtime_c_api_x86_64_mingw",
.msvc => "wasmtime_c_api_x86_64_windows",
else => null,
},
.aarch64 => switch (abi) {
.msvc => "wasmtime_c_api_aarch64_windows",
else => null,
},
.x86 => switch (abi) {
.msvc => "wasmtime_c_api_i686_windows",
else => null,
},
else => null,
},
.macos => switch (arch) {
.x86_64 => "wasmtime_c_api_x86_64_macos",
.aarch64 => "wasmtime_c_api_aarch64_macos",
else => null,
},
else => null,
}) orelse std.debug.panic(
"Unsupported target for wasmtime: {s}-{s}-{s}",
.{ @tagName(arch), @tagName(os), @tagName(abi) },
);
}
fn findSourceFiles(b: *std.Build) ![]const []const u8 {
var sources: std.ArrayListUnmanaged([]const u8) = .empty;
var dir = try b.build_root.handle.openDir("lib/src", .{ .iterate = true });
var iter = dir.iterate();
defer dir.close();
while (try iter.next()) |entry| {
if (entry.kind != .file) continue;
const file = entry.name;
const ext = std.fs.path.extension(file);
if (std.mem.eql(u8, ext, ".c") and !std.mem.eql(u8, file, "lib.c")) {
try sources.append(b.allocator, b.dupe(file));
}
}
return sources.toOwnedSlice(b.allocator);
}

View file

@ -1,96 +1,10 @@
.{
.name = .tree_sitter,
.fingerprint = 0x841224b447ac0d4f,
.version = "0.27.0",
.minimum_zig_version = "0.14.1",
.name = "tree-sitter",
.version = "0.23.2",
.paths = .{
"build.zig",
"build.zig.zon",
"lib/src",
"lib/include",
"README.md",
"LICENSE",
},
.dependencies = .{
.wasmtime_c_api_aarch64_android = .{
.url = "https://github.com/bytecodealliance/wasmtime/releases/download/v33.0.2/wasmtime-v33.0.2-aarch64-android-c-api.tar.xz",
.hash = "N-V-__8AAIfPIgdw2YnV3QyiFQ2NHdrxrXzzCdjYJyxJDOta",
.lazy = true,
},
.wasmtime_c_api_aarch64_linux = .{
.url = "https://github.com/bytecodealliance/wasmtime/releases/download/v33.0.2/wasmtime-v33.0.2-aarch64-linux-c-api.tar.xz",
.hash = "N-V-__8AAIt97QZi7Pf7nNJ2mVY6uxA80Klyuvvtop3pLMRK",
.lazy = true,
},
.wasmtime_c_api_aarch64_macos = .{
.url = "https://github.com/bytecodealliance/wasmtime/releases/download/v33.0.2/wasmtime-v33.0.2-aarch64-macos-c-api.tar.xz",
.hash = "N-V-__8AAAO48QQf91w9RmmUDHTja8DrXZA1n6Bmc8waW3qe",
.lazy = true,
},
.wasmtime_c_api_aarch64_musl = .{
.url = "https://github.com/bytecodealliance/wasmtime/releases/download/v33.0.2/wasmtime-v33.0.2-aarch64-musl-c-api.tar.xz",
.hash = "N-V-__8AAI196wa9pwADoA2RbCDp5F7bKQg1iOPq6gIh8-FH",
.lazy = true,
},
.wasmtime_c_api_aarch64_windows = .{
.url = "https://github.com/bytecodealliance/wasmtime/releases/download/v33.0.2/wasmtime-v33.0.2-aarch64-windows-c-api.zip",
.hash = "N-V-__8AAC9u4wXfqd1Q6XyQaC8_DbQZClXux60Vu5743N05",
.lazy = true,
},
.wasmtime_c_api_armv7_linux = .{
.url = "https://github.com/bytecodealliance/wasmtime/releases/download/v33.0.2/wasmtime-v33.0.2-armv7-linux-c-api.tar.xz",
.hash = "N-V-__8AAHXe8gWs3s83Cc5G6SIq0_jWxj8fGTT5xG4vb6-x",
.lazy = true,
},
.wasmtime_c_api_i686_linux = .{
.url = "https://github.com/bytecodealliance/wasmtime/releases/download/v33.0.2/wasmtime-v33.0.2-i686-linux-c-api.tar.xz",
.hash = "N-V-__8AAN2pzgUUfulRCYnipSfis9IIYHoTHVlieLRmKuct",
.lazy = true,
},
.wasmtime_c_api_i686_windows = .{
.url = "https://github.com/bytecodealliance/wasmtime/releases/download/v33.0.2/wasmtime-v33.0.2-i686-windows-c-api.zip",
.hash = "N-V-__8AAJu0YAUUTFBLxFIOi-MSQVezA6MMkpoFtuaf2Quf",
.lazy = true,
},
.wasmtime_c_api_riscv64gc_linux = .{
.url = "https://github.com/bytecodealliance/wasmtime/releases/download/v33.0.2/wasmtime-v33.0.2-riscv64gc-linux-c-api.tar.xz",
.hash = "N-V-__8AAG8m-gc3E3AIImtTZ3l1c7HC6HUWazQ9OH5KACX4",
.lazy = true,
},
.wasmtime_c_api_s390x_linux = .{
.url = "https://github.com/bytecodealliance/wasmtime/releases/download/v33.0.2/wasmtime-v33.0.2-s390x-linux-c-api.tar.xz",
.hash = "N-V-__8AAH314gd-gE4IBp2uvAL3gHeuW1uUZjMiLLeUdXL_",
.lazy = true,
},
.wasmtime_c_api_x86_64_android = .{
.url = "https://github.com/bytecodealliance/wasmtime/releases/download/v33.0.2/wasmtime-v33.0.2-x86_64-android-c-api.tar.xz",
.hash = "N-V-__8AAIPNRwfNkznebrcGb0IKUe7f35bkuZEYOjcx6q3f",
.lazy = true,
},
.wasmtime_c_api_x86_64_linux = .{
.url = "https://github.com/bytecodealliance/wasmtime/releases/download/v33.0.2/wasmtime-v33.0.2-x86_64-linux-c-api.tar.xz",
.hash = "N-V-__8AAI8EDwcyTtk_Afhk47SEaqfpoRqGkJeZpGs69ChF",
.lazy = true,
},
.wasmtime_c_api_x86_64_macos = .{
.url = "https://github.com/bytecodealliance/wasmtime/releases/download/v33.0.2/wasmtime-v33.0.2-x86_64-macos-c-api.tar.xz",
.hash = "N-V-__8AAGtGNgVaOpHSxC22IjrampbRIy6lLwscdcAE8nG1",
.lazy = true,
},
.wasmtime_c_api_x86_64_mingw = .{
.url = "https://github.com/bytecodealliance/wasmtime/releases/download/v33.0.2/wasmtime-v33.0.2-x86_64-mingw-c-api.zip",
.hash = "N-V-__8AAPS2PAbVix50L6lnddlgazCPTz3whLUFk1qnRtnZ",
.lazy = true,
},
.wasmtime_c_api_x86_64_musl = .{
.url = "https://github.com/bytecodealliance/wasmtime/releases/download/v33.0.2/wasmtime-v33.0.2-x86_64-musl-c-api.tar.xz",
.hash = "N-V-__8AAF-WEQe0nzvi09PgusM5i46FIuCKJmIDWUleWgQ3",
.lazy = true,
},
.wasmtime_c_api_x86_64_windows = .{
.url = "https://github.com/bytecodealliance/wasmtime/releases/download/v33.0.2/wasmtime-v33.0.2-x86_64-windows-c-api.zip",
.hash = "N-V-__8AAKGNXwbpJQsn0_6kwSIVDDWifSg8cBzf7T2RzsC9",
.lazy = true,
},
},
}

View file

@ -8,17 +8,9 @@ rust-version.workspace = true
readme = "README.md"
homepage.workspace = true
repository.workspace = true
documentation = "https://docs.rs/tree-sitter-cli"
license.workspace = true
keywords.workspace = true
categories.workspace = true
include = ["build.rs", "README.md", "LICENSE", "benches/*", "src/**"]
[lints]
workspace = true
[lib]
path = "src/tree_sitter_cli.rs"
[[bin]]
name = "tree-sitter"
@ -30,52 +22,50 @@ name = "benchmark"
harness = false
[features]
default = ["qjs-rt"]
wasm = ["tree-sitter/wasm", "tree-sitter-loader/wasm"]
qjs-rt = ["tree-sitter-generate/qjs-rt"]
[dependencies]
ansi_colours.workspace = true
anstyle.workspace = true
anyhow.workspace = true
bstr.workspace = true
clap.workspace = true
clap_complete.workspace = true
clap_complete_nushell.workspace = true
crc32fast.workspace = true
ctor.workspace = true
ctrlc.workspace = true
dialoguer.workspace = true
dirs.workspace = true
filetime.workspace = true
glob.workspace = true
heck.workspace = true
html-escape.workspace = true
indexmap.workspace = true
indoc.workspace = true
lazy_static.workspace = true
log.workspace = true
memchr.workspace = true
rand.workspace = true
regex.workspace = true
schemars.workspace = true
regex-syntax.workspace = true
rustc-hash.workspace = true
semver.workspace = true
serde.workspace = true
serde_derive.workspace = true
serde_json.workspace = true
similar.workspace = true
streaming-iterator.workspace = true
thiserror.workspace = true
smallbitvec.workspace = true
tiny_http.workspace = true
walkdir.workspace = true
wasmparser.workspace = true
webbrowser.workspace = true
tree-sitter.workspace = true
tree-sitter-generate.workspace = true
tree-sitter-config.workspace = true
tree-sitter-highlight.workspace = true
tree-sitter-loader.workspace = true
tree-sitter-tags.workspace = true
[target."cfg(windows)".dependencies]
url = "2.5.2"
[dev-dependencies]
encoding_rs = "0.8.35"
widestring = "1.2.1"
tree_sitter_proc_macro = { path = "src/tests/proc_macro", package = "tree-sitter-tests-proc-macro" }
tempfile.workspace = true

View file

@ -7,15 +7,14 @@
[npmjs.com]: https://www.npmjs.org/package/tree-sitter-cli
[npmjs.com badge]: https://img.shields.io/npm/v/tree-sitter-cli.svg?color=%23BF4A4A
The Tree-sitter CLI allows you to develop, test, and use Tree-sitter grammars from the command line. It works on `MacOS`,
`Linux`, and `Windows`.
The Tree-sitter CLI allows you to develop, test, and use Tree-sitter grammars from the command line. It works on MacOS, Linux, and Windows.
### Installation
You can install the `tree-sitter-cli` with `cargo`:
```sh
cargo install --locked tree-sitter-cli
cargo install tree-sitter-cli
```
or with `npm`:
@ -35,11 +34,9 @@ The `tree-sitter` binary itself has no dependencies, but specific commands have
### Commands
* `generate` - The `tree-sitter generate` command will generate a Tree-sitter parser based on the grammar in the current
working directory. See [the documentation] for more information.
* `generate` - The `tree-sitter generate` command will generate a Tree-sitter parser based on the grammar in the current working directory. See [the documentation] for more information.
* `test` - The `tree-sitter test` command will run the unit tests for the Tree-sitter parser in the current working directory.
See [the documentation] for more information.
* `test` - The `tree-sitter test` command will run the unit tests for the Tree-sitter parser in the current working directory. See [the documentation] for more information.
* `parse` - The `tree-sitter parse` command will parse a file (or list of files) using Tree-sitter parsers.

View file

@ -3,77 +3,70 @@ use std::{
env, fs,
path::{Path, PathBuf},
str,
sync::LazyLock,
time::Instant,
};
use anyhow::Context;
use log::info;
use lazy_static::lazy_static;
use tree_sitter::{Language, Parser, Query};
use tree_sitter_loader::{CompileConfig, Loader};
include!("../src/tests/helpers/dirs.rs");
static LANGUAGE_FILTER: LazyLock<Option<String>> =
LazyLock::new(|| env::var("TREE_SITTER_BENCHMARK_LANGUAGE_FILTER").ok());
static EXAMPLE_FILTER: LazyLock<Option<String>> =
LazyLock::new(|| env::var("TREE_SITTER_BENCHMARK_EXAMPLE_FILTER").ok());
static REPETITION_COUNT: LazyLock<usize> = LazyLock::new(|| {
env::var("TREE_SITTER_BENCHMARK_REPETITION_COUNT")
lazy_static! {
static ref LANGUAGE_FILTER: Option<String> =
env::var("TREE_SITTER_BENCHMARK_LANGUAGE_FILTER").ok();
static ref EXAMPLE_FILTER: Option<String> =
env::var("TREE_SITTER_BENCHMARK_EXAMPLE_FILTER").ok();
static ref REPETITION_COUNT: usize = env::var("TREE_SITTER_BENCHMARK_REPETITION_COUNT")
.map(|s| s.parse::<usize>().unwrap())
.unwrap_or(5)
});
static TEST_LOADER: LazyLock<Loader> =
LazyLock::new(|| Loader::with_parser_lib_path(SCRATCH_DIR.clone()));
.unwrap_or(5);
static ref TEST_LOADER: Loader = Loader::with_parser_lib_path(SCRATCH_DIR.clone());
static ref EXAMPLE_AND_QUERY_PATHS_BY_LANGUAGE_DIR: BTreeMap<PathBuf, (Vec<PathBuf>, Vec<PathBuf>)> = {
fn process_dir(result: &mut BTreeMap<PathBuf, (Vec<PathBuf>, Vec<PathBuf>)>, dir: &Path) {
if dir.join("grammar.js").exists() {
let relative_path = dir.strip_prefix(GRAMMARS_DIR.as_path()).unwrap();
let (example_paths, query_paths) =
result.entry(relative_path.to_owned()).or_default();
#[allow(clippy::type_complexity)]
static EXAMPLE_AND_QUERY_PATHS_BY_LANGUAGE_DIR: LazyLock<
BTreeMap<PathBuf, (Vec<PathBuf>, Vec<PathBuf>)>,
> = LazyLock::new(|| {
fn process_dir(result: &mut BTreeMap<PathBuf, (Vec<PathBuf>, Vec<PathBuf>)>, dir: &Path) {
if dir.join("grammar.js").exists() {
let relative_path = dir.strip_prefix(GRAMMARS_DIR.as_path()).unwrap();
let (example_paths, query_paths) = result.entry(relative_path.to_owned()).or_default();
if let Ok(example_files) = fs::read_dir(dir.join("examples")) {
example_paths.extend(example_files.filter_map(|p| {
let p = p.unwrap().path();
if p.is_file() {
Some(p)
} else {
None
}
}));
}
if let Ok(example_files) = fs::read_dir(dir.join("examples")) {
example_paths.extend(example_files.filter_map(|p| {
let p = p.unwrap().path();
if p.is_file() {
Some(p)
} else {
None
if let Ok(query_files) = fs::read_dir(dir.join("queries")) {
query_paths.extend(query_files.filter_map(|p| {
let p = p.unwrap().path();
if p.is_file() {
Some(p)
} else {
None
}
}));
}
} else {
for entry in fs::read_dir(dir).unwrap() {
let entry = entry.unwrap().path();
if entry.is_dir() {
process_dir(result, &entry);
}
}));
}
if let Ok(query_files) = fs::read_dir(dir.join("queries")) {
query_paths.extend(query_files.filter_map(|p| {
let p = p.unwrap().path();
if p.is_file() {
Some(p)
} else {
None
}
}));
}
} else {
for entry in fs::read_dir(dir).unwrap() {
let entry = entry.unwrap().path();
if entry.is_dir() {
process_dir(result, &entry);
}
}
}
}
let mut result = BTreeMap::new();
process_dir(&mut result, &GRAMMARS_DIR);
result
});
let mut result = BTreeMap::new();
process_dir(&mut result, &GRAMMARS_DIR);
result
};
}
fn main() {
tree_sitter_cli::logger::init();
let max_path_length = EXAMPLE_AND_QUERY_PATHS_BY_LANGUAGE_DIR
.values()
.flat_map(|(e, q)| {
@ -84,7 +77,7 @@ fn main() {
.max()
.unwrap_or(0);
info!("Benchmarking with {} repetitions", *REPETITION_COUNT);
eprintln!("Benchmarking with {} repetitions", *REPETITION_COUNT);
let mut parser = Parser::new();
let mut all_normal_speeds = Vec::new();
@ -101,11 +94,11 @@ fn main() {
}
}
info!("\nLanguage: {language_name}");
eprintln!("\nLanguage: {language_name}");
let language = get_language(language_path);
parser.set_language(&language).unwrap();
info!(" Constructing Queries");
eprintln!(" Constructing Queries");
for path in query_paths {
if let Some(filter) = EXAMPLE_FILTER.as_ref() {
if !path.to_str().unwrap().contains(filter.as_str()) {
@ -115,12 +108,12 @@ fn main() {
parse(path, max_path_length, |source| {
Query::new(&language, str::from_utf8(source).unwrap())
.with_context(|| format!("Query file path: {}", path.display()))
.with_context(|| format!("Query file path: {path:?}"))
.expect("Failed to parse query");
});
}
info!(" Parsing Valid Code:");
eprintln!(" Parsing Valid Code:");
let mut normal_speeds = Vec::new();
for example_path in example_paths {
if let Some(filter) = EXAMPLE_FILTER.as_ref() {
@ -134,7 +127,7 @@ fn main() {
}));
}
info!(" Parsing Invalid Code (mismatched languages):");
eprintln!(" Parsing Invalid Code (mismatched languages):");
let mut error_speeds = Vec::new();
for (other_language_path, (example_paths, _)) in
EXAMPLE_AND_QUERY_PATHS_BY_LANGUAGE_DIR.iter()
@ -155,30 +148,30 @@ fn main() {
}
if let Some((average_normal, worst_normal)) = aggregate(&normal_speeds) {
info!(" Average Speed (normal): {average_normal} bytes/ms");
info!(" Worst Speed (normal): {worst_normal} bytes/ms");
eprintln!(" Average Speed (normal): {average_normal} bytes/ms");
eprintln!(" Worst Speed (normal): {worst_normal} bytes/ms");
}
if let Some((average_error, worst_error)) = aggregate(&error_speeds) {
info!(" Average Speed (errors): {average_error} bytes/ms");
info!(" Worst Speed (errors): {worst_error} bytes/ms");
eprintln!(" Average Speed (errors): {average_error} bytes/ms");
eprintln!(" Worst Speed (errors): {worst_error} bytes/ms");
}
all_normal_speeds.extend(normal_speeds);
all_error_speeds.extend(error_speeds);
}
info!("\n Overall");
eprintln!("\n Overall");
if let Some((average_normal, worst_normal)) = aggregate(&all_normal_speeds) {
info!(" Average Speed (normal): {average_normal} bytes/ms");
info!(" Worst Speed (normal): {worst_normal} bytes/ms");
eprintln!(" Average Speed (normal): {average_normal} bytes/ms");
eprintln!(" Worst Speed (normal): {worst_normal} bytes/ms");
}
if let Some((average_error, worst_error)) = aggregate(&all_error_speeds) {
info!(" Average Speed (errors): {average_error} bytes/ms");
info!(" Worst Speed (errors): {worst_error} bytes/ms");
eprintln!(" Average Speed (errors): {average_error} bytes/ms");
eprintln!(" Worst Speed (errors): {worst_error} bytes/ms");
}
info!("");
eprintln!();
}
fn aggregate(speeds: &[usize]) -> Option<(usize, usize)> {
@ -197,8 +190,14 @@ fn aggregate(speeds: &[usize]) -> Option<(usize, usize)> {
}
fn parse(path: &Path, max_path_length: usize, mut action: impl FnMut(&[u8])) -> usize {
eprint!(
" {:width$}\t",
path.file_name().unwrap().to_str().unwrap(),
width = max_path_length
);
let source_code = fs::read(path)
.with_context(|| format!("Failed to read {}", path.display()))
.with_context(|| format!("Failed to read {path:?}"))
.unwrap();
let time = Instant::now();
for _ in 0..*REPETITION_COUNT {
@ -207,9 +206,8 @@ fn parse(path: &Path, max_path_length: usize, mut action: impl FnMut(&[u8])) ->
let duration = time.elapsed() / (*REPETITION_COUNT as u32);
let duration_ns = duration.as_nanos();
let speed = ((source_code.len() as u128) * 1_000_000) / duration_ns;
info!(
" {:max_path_length$}\ttime {:>7.2} ms\t\tspeed {speed:>6} bytes/ms",
path.file_name().unwrap().to_str().unwrap(),
eprintln!(
"time {:>7.2} ms\t\tspeed {speed:>6} bytes/ms",
(duration_ns as f64) / 1e6,
);
speed as usize
@ -219,6 +217,6 @@ fn get_language(path: &Path) -> Language {
let src_path = GRAMMARS_DIR.join(path).join("src");
TEST_LOADER
.load_language_at_path(CompileConfig::new(&src_path, None, None))
.with_context(|| format!("Failed to load language at path {}", src_path.display()))
.with_context(|| format!("Failed to load language at path {src_path:?}"))
.unwrap()
}

cli/build.rs (new file, 142 lines)
View file

@ -0,0 +1,142 @@
use std::{
env,
ffi::OsStr,
fs,
path::{Path, PathBuf},
time::SystemTime,
};
fn main() {
if let Some(git_sha) = read_git_sha() {
println!("cargo:rustc-env=BUILD_SHA={git_sha}");
}
println!("cargo:rustc-check-cfg=cfg(sanitizing)");
println!("cargo:rustc-check-cfg=cfg(TREE_SITTER_EMBED_WASM_BINDING)");
if web_playground_files_present() {
println!("cargo:rustc-cfg=TREE_SITTER_EMBED_WASM_BINDING");
}
let build_time = SystemTime::now()
.duration_since(SystemTime::UNIX_EPOCH)
.unwrap()
.as_secs_f64();
println!("cargo:rustc-env=BUILD_TIME={build_time}");
#[cfg(any(
target_os = "linux",
target_os = "android",
target_os = "freebsd",
target_os = "openbsd",
target_os = "netbsd",
target_os = "dragonfly",
))]
{
let out_dir = PathBuf::from(env::var("OUT_DIR").unwrap()).join("dynamic-symbols.txt");
std::fs::write(
&out_dir,
"{
ts_current_malloc;
ts_current_calloc;
ts_current_realloc;
ts_current_free;
};",
)
.unwrap();
println!(
"cargo:rustc-link-arg=-Wl,--dynamic-list={}",
out_dir.display()
);
}
}
fn web_playground_files_present() -> bool {
let paths = [
"../docs/assets/js/playground.js",
"../lib/binding_web/tree-sitter.js",
"../lib/binding_web/tree-sitter.wasm",
];
paths.iter().all(|p| Path::new(p).exists())
}
fn read_git_sha() -> Option<String> {
let mut repo_path = PathBuf::from(env::var("CARGO_MANIFEST_DIR").unwrap());
let mut git_path;
loop {
git_path = repo_path.join(".git");
if git_path.exists() {
break;
}
if !repo_path.pop() {
return None;
}
}
let git_dir_path;
if git_path.is_dir() {
git_dir_path = git_path;
} else if let Ok(git_path_content) = fs::read_to_string(&git_path) {
git_dir_path = repo_path.join(git_path_content.get("gitdir: ".len()..).unwrap().trim_end());
} else {
return None;
}
let git_head_path = git_dir_path.join("HEAD");
if let Some(path) = git_head_path.to_str() {
println!("cargo:rerun-if-changed={path}");
}
if let Ok(mut head_content) = fs::read_to_string(&git_head_path) {
if head_content.ends_with('\n') {
head_content.pop();
}
// If we're on a branch, read the SHA from the ref file.
if head_content.starts_with("ref: ") {
head_content.replace_range(0.."ref: ".len(), "");
let ref_filename = {
// Go to real non-worktree gitdir
let git_dir_path = git_dir_path
.parent()
.and_then(|p| {
p.file_name()
.map(|n| n == OsStr::new("worktrees"))
.and_then(|x| x.then(|| p.parent()))
})
.flatten()
.unwrap_or(&git_dir_path);
let file = git_dir_path.join(&head_content);
if file.is_file() {
file
} else {
let packed_refs = git_dir_path.join("packed-refs");
if let Ok(packed_refs_content) = fs::read_to_string(&packed_refs) {
for line in packed_refs_content.lines() {
if let Some((hash, r#ref)) = line.split_once(' ') {
if r#ref == head_content {
if let Some(path) = packed_refs.to_str() {
println!("cargo:rerun-if-changed={path}");
}
return Some(hash.to_string());
}
}
}
}
return None;
}
};
if let Some(path) = ref_filename.to_str() {
println!("cargo:rerun-if-changed={path}");
}
return fs::read_to_string(&ref_filename).ok();
}
// If we're on a detached commit, then the `HEAD` file itself contains the sha.
if head_content.len() == 40 {
return Some(head_content);
}
}
None
}

View file

@ -8,20 +8,12 @@ rust-version.workspace = true
readme = "README.md"
homepage.workspace = true
repository.workspace = true
documentation = "https://docs.rs/tree-sitter-config"
license.workspace = true
keywords.workspace = true
categories.workspace = true
[lib]
path = "src/tree_sitter_config.rs"
[lints]
workspace = true
[dependencies]
etcetera.workspace = true
log.workspace = true
anyhow.workspace = true
dirs.workspace = true
serde.workspace = true
serde_json.workspace = true
thiserror.workspace = true

View file

@ -1,54 +1,10 @@
#![cfg_attr(not(any(test, doctest)), doc = include_str!("../README.md"))]
#![doc = include_str!("../README.md")]
use std::{
env, fs,
path::{Path, PathBuf},
};
use std::{env, fs, path::PathBuf};
use etcetera::BaseStrategy as _;
use log::warn;
use anyhow::{anyhow, Context, Result};
use serde::{Deserialize, Serialize};
use serde_json::Value;
use thiserror::Error;
pub type ConfigResult<T> = Result<T, ConfigError>;
#[derive(Debug, Error)]
pub enum ConfigError {
#[error("Bad JSON config {0} -- {1}")]
ConfigRead(String, serde_json::Error),
#[error(transparent)]
HomeDir(#[from] etcetera::HomeDirError),
#[error(transparent)]
IO(IoError),
#[error(transparent)]
Serialization(#[from] serde_json::Error),
}
#[derive(Debug, Error)]
pub struct IoError {
pub error: std::io::Error,
pub path: Option<String>,
}
impl IoError {
fn new(error: std::io::Error, path: Option<&Path>) -> Self {
Self {
error,
path: path.map(|p| p.to_string_lossy().to_string()),
}
}
}
impl std::fmt::Display for IoError {
fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
write!(f, "{}", self.error)?;
if let Some(ref path) = self.path {
write!(f, " ({path})")?;
}
Ok(())
}
}
/// Holds the contents of tree-sitter's configuration file.
///
@ -65,7 +21,7 @@ pub struct Config {
}
impl Config {
pub fn find_config_file() -> ConfigResult<Option<PathBuf>> {
pub fn find_config_file() -> Result<Option<PathBuf>> {
if let Ok(path) = env::var("TREE_SITTER_DIR") {
let mut path = PathBuf::from(path);
path.push("config.json");
@ -82,28 +38,8 @@ impl Config {
return Ok(Some(xdg_path));
}
if cfg!(target_os = "macos") {
let legacy_apple_path = etcetera::base_strategy::Apple::new()?
.data_dir() // `$HOME/Library/Application Support/`
.join("tree-sitter")
.join("config.json");
if legacy_apple_path.is_file() {
let xdg_dir = xdg_path.parent().unwrap();
fs::create_dir_all(xdg_dir)
.map_err(|e| ConfigError::IO(IoError::new(e, Some(xdg_dir))))?;
fs::rename(&legacy_apple_path, &xdg_path).map_err(|e| {
ConfigError::IO(IoError::new(e, Some(legacy_apple_path.as_path())))
})?;
warn!(
"Your config.json file has been automatically migrated from \"{}\" to \"{}\"",
legacy_apple_path.display(),
xdg_path.display()
);
return Ok(Some(xdg_path));
}
}
let legacy_path = etcetera::home_dir()?
let legacy_path = dirs::home_dir()
.ok_or_else(|| anyhow!("Cannot determine home directory"))?
.join(".tree-sitter")
.join("config.json");
if legacy_path.is_file() {
@ -113,9 +49,9 @@ impl Config {
Ok(None)
}
fn xdg_config_file() -> ConfigResult<PathBuf> {
let xdg_path = etcetera::choose_base_strategy()?
.config_dir()
fn xdg_config_file() -> Result<PathBuf> {
let xdg_path = dirs::config_dir()
.ok_or_else(|| anyhow!("Cannot determine config directory"))?
.join("tree-sitter")
.join("config.json");
Ok(xdg_path)
@ -127,10 +63,10 @@ impl Config {
/// - Location specified by the path parameter if provided
/// - `$TREE_SITTER_DIR/config.json`, if the `TREE_SITTER_DIR` environment variable is set
/// - `tree-sitter/config.json` in your default user configuration directory, as determined by
/// [`etcetera::choose_base_strategy`](https://docs.rs/etcetera/*/etcetera/#basestrategy)
/// [`dirs::config_dir`](https://docs.rs/dirs/*/dirs/fn.config_dir.html)
/// - `$HOME/.tree-sitter/config.json` as a fallback from where tree-sitter _used_ to store
/// its configuration
pub fn load(path: Option<PathBuf>) -> ConfigResult<Self> {
pub fn load(path: Option<PathBuf>) -> Result<Self> {
let location = if let Some(path) = path {
path
} else if let Some(path) = Self::find_config_file()? {
@ -140,9 +76,9 @@ impl Config {
};
let content = fs::read_to_string(&location)
.map_err(|e| ConfigError::IO(IoError::new(e, Some(location.as_path()))))?;
.with_context(|| format!("Failed to read {}", &location.to_string_lossy()))?;
let config = serde_json::from_str(&content)
.map_err(|e| ConfigError::ConfigRead(location.to_string_lossy().to_string(), e))?;
.with_context(|| format!("Bad JSON config {}", &location.to_string_lossy()))?;
Ok(Self { location, config })
}
@ -152,7 +88,7 @@ impl Config {
/// disk.
///
/// (Note that this is typically only done by the `tree-sitter init-config` command.)
pub fn initial() -> ConfigResult<Self> {
pub fn initial() -> Result<Self> {
let location = if let Ok(path) = env::var("TREE_SITTER_DIR") {
let mut path = PathBuf::from(path);
path.push("config.json");
@ -165,20 +101,17 @@ impl Config {
}
/// Saves this configuration to the file that it was originally loaded from.
pub fn save(&self) -> ConfigResult<()> {
pub fn save(&self) -> Result<()> {
let json = serde_json::to_string_pretty(&self.config)?;
let config_dir = self.location.parent().unwrap();
fs::create_dir_all(config_dir)
.map_err(|e| ConfigError::IO(IoError::new(e, Some(config_dir))))?;
fs::write(&self.location, json)
.map_err(|e| ConfigError::IO(IoError::new(e, Some(self.location.as_path()))))?;
fs::create_dir_all(self.location.parent().unwrap())?;
fs::write(&self.location, json)?;
Ok(())
}
/// Parses a component-specific configuration from the configuration file. The type `C` must
/// be [deserializable](https://docs.rs/serde/*/serde/trait.Deserialize.html) from a JSON
/// object, and must only include the fields relevant to that component.
pub fn get<C>(&self) -> ConfigResult<C>
pub fn get<C>(&self) -> Result<C>
where
C: for<'de> Deserialize<'de>,
{
@ -189,7 +122,7 @@ impl Config {
/// Adds a component-specific configuration to the configuration file. The type `C` must be
/// [serializable](https://docs.rs/serde/*/serde/trait.Serialize.html) into a JSON object, and
/// must only include the fields relevant to that component.
pub fn add<C>(&mut self, config: C) -> ConfigResult<()>
pub fn add<C>(&mut self, config: C) -> Result<()>
where
C: Serialize,
{
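
Taken together, the `load`, `get`, and `add` methods above form a small API for reading and writing `config.json`. Below is a hedged usage sketch, not taken from this repository: the crate path `tree_sitter_config` and the `ThemeConfig` struct are assumptions for illustration, and the concrete error type differs between the two sides of this diff (`ConfigError` vs. `anyhow::Error`).

use serde::Deserialize;
use tree_sitter_config::Config;

// Hypothetical component-specific section of config.json.
#[derive(Deserialize, Default)]
#[serde(default)]
struct ThemeConfig {
    theme: Option<String>,
}

fn main() -> anyhow::Result<()> {
    // `None` walks the documented lookup order: TREE_SITTER_DIR, the user
    // configuration directory, then the legacy ~/.tree-sitter fallback.
    let config = Config::load(None)?;
    let theme: ThemeConfig = config.get()?;
    println!("configured theme: {:?}", theme.theme);
    Ok(())
}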

View file

@ -8,40 +8,30 @@ rust-version.workspace = true
readme = "README.md"
homepage.workspace = true
repository.workspace = true
documentation = "https://docs.rs/tree-sitter-loader"
license.workspace = true
keywords.workspace = true
categories.workspace = true
[package.metadata.docs.rs]
all-features = true
rustdoc-args = ["--cfg", "docsrs"]
[lib]
path = "src/loader.rs"
[lints]
workspace = true
[features]
wasm = ["tree-sitter/wasm"]
# TODO: For backward compatibility these must be enabled by default,
# consider removing for the next semver incompatible release
default = ["tree-sitter-highlight", "tree-sitter-tags"]
[dependencies]
anyhow.workspace = true
cc.workspace = true
etcetera.workspace = true
dirs.workspace = true
fs4.workspace = true
indoc.workspace = true
libloading.workspace = true
log.workspace = true
once_cell.workspace = true
path-slash.workspace = true
regex.workspace = true
semver.workspace = true
serde.workspace = true
serde_json.workspace = true
tempfile.workspace = true
thiserror.workspace = true
tree-sitter = { workspace = true }
tree-sitter-highlight = { workspace = true, optional = true }
tree-sitter-tags = { workspace = true, optional = true }
tree-sitter = {workspace = true}
tree-sitter-highlight = {workspace = true, optional = true}
tree-sitter-tags = {workspace = true, optional = true}

View file

@ -7,4 +7,7 @@ fn main() {
"cargo:rustc-env=BUILD_HOST={}",
std::env::var("HOST").unwrap()
);
let emscripten_version = std::fs::read_to_string("emscripten-version").unwrap();
println!("cargo:rustc-env=EMSCRIPTEN_VERSION={emscripten_version}");
}

View file

@ -0,0 +1 @@
3.1.64


cli/loader/src/lib.rs Normal file

File diff suppressed because it is too large

View file

@ -10,7 +10,6 @@ type PrecRightRule = { type: 'PREC_RIGHT'; content: Rule; value: number };
type PrecRule = { type: 'PREC'; content: Rule; value: number };
type Repeat1Rule = { type: 'REPEAT1'; content: Rule };
type RepeatRule = { type: 'REPEAT'; content: Rule };
type ReservedRule = { type: 'RESERVED'; content: Rule; context_name: string };
type SeqRule = { type: 'SEQ'; members: Rule[] };
type StringRule = { type: 'STRING'; value: string };
type SymbolRule<Name extends string> = { type: 'SYMBOL'; name: Name };
@ -29,19 +28,12 @@ type Rule =
| PrecRule
| Repeat1Rule
| RepeatRule
| ReservedRule
| SeqRule
| StringRule
| SymbolRule<string>
| TokenRule;
declare class RustRegex {
value: string;
constructor(pattern: string);
}
type RuleOrLiteral = Rule | RegExp | RustRegex | string;
type RuleOrLiteral = Rule | RegExp | string;
type GrammarSymbols<RuleName extends string> = {
[name in RuleName]: SymbolRule<name>;
@ -50,7 +42,7 @@ type GrammarSymbols<RuleName extends string> = {
type RuleBuilder<RuleName extends string> = (
$: GrammarSymbols<RuleName>,
previous?: Rule,
previous: Rule,
) => RuleOrLiteral;
type RuleBuilders<
@ -113,7 +105,7 @@ interface Grammar<
* @param $ grammar rules
* @param previous array of externals from the base schema, if any
*
* @see https://tree-sitter.github.io/tree-sitter/creating-parsers/4-external-scanners
* @see https://tree-sitter.github.io/tree-sitter/creating-parsers#external-scanners
*/
externals?: (
$: Record<string, SymbolRule<string>>,
@ -151,7 +143,7 @@ interface Grammar<
*
* @param $ grammar rules
*
* @see https://tree-sitter.github.io/tree-sitter/using-parsers/6-static-node-types
* @see https://tree-sitter.github.io/tree-sitter/using-parsers#static-node-types
*/
supertypes?: (
$: GrammarSymbols<RuleName | BaseGrammarRuleName>,
@ -164,20 +156,9 @@ interface Grammar<
*
* @param $ grammar rules
*
* @see https://tree-sitter.github.io/tree-sitter/creating-parsers/3-writing-the-grammar#keyword-extraction
* @see https://tree-sitter.github.io/tree-sitter/creating-parsers#keyword-extraction
*/
word?: ($: GrammarSymbols<RuleName | BaseGrammarRuleName>) => RuleOrLiteral;
/**
* Mapping of names to reserved word sets. The first reserved word set is the
* global word set, meaning it applies to every rule in every parse state.
* The other word sets can be used with the `reserved` function.
*/
reserved?: Record<
string,
($: GrammarSymbols<RuleName | BaseGrammarRuleName>) => RuleOrLiteral[]
>;
}
type GrammarSchema<RuleName extends string> = {
@ -262,7 +243,7 @@ declare function optional(rule: RuleOrLiteral): ChoiceRule;
* @see https://docs.oracle.com/cd/E19504-01/802-5880/6i9k05dh3/index.html
*/
declare const prec: {
(value: string | number, rule: RuleOrLiteral): PrecRule;
(value: String | number, rule: RuleOrLiteral): PrecRule;
/**
* Marks the given rule as left-associative (and optionally applies a
@ -278,7 +259,7 @@ declare const prec: {
* @see https://docs.oracle.com/cd/E19504-01/802-5880/6i9k05dh3/index.html
*/
left(rule: RuleOrLiteral): PrecLeftRule;
left(value: string | number, rule: RuleOrLiteral): PrecLeftRule;
left(value: String | number, rule: RuleOrLiteral): PrecLeftRule;
/**
* Marks the given rule as right-associative (and optionally applies a
@ -294,7 +275,7 @@ declare const prec: {
* @see https://docs.oracle.com/cd/E19504-01/802-5880/6i9k05dh3/index.html
*/
right(rule: RuleOrLiteral): PrecRightRule;
right(value: string | number, rule: RuleOrLiteral): PrecRightRule;
right(value: String | number, rule: RuleOrLiteral): PrecRightRule;
/**
* Marks the given rule with a numerical precedence which will be used to
@ -311,7 +292,7 @@ declare const prec: {
*
* @see https://www.gnu.org/software/bison/manual/html_node/Generalized-LR-Parsing.html
*/
dynamic(value: string | number, rule: RuleOrLiteral): PrecDynamicRule;
dynamic(value: String | number, rule: RuleOrLiteral): PrecDynamicRule;
};
/**
@ -331,15 +312,6 @@ declare function repeat(rule: RuleOrLiteral): RepeatRule;
*/
declare function repeat1(rule: RuleOrLiteral): Repeat1Rule;
/**
* Overrides the global reserved word set for a given rule. The word set name
* should be defined in the `reserved` field in the grammar.
*
* @param wordset name of the reserved word set
* @param rule rule that will use the reserved word set
*/
declare function reserved(wordset: string, rule: RuleOrLiteral): ReservedRule;
/**
* Creates a rule that matches any number of other rules, one after another.
* It is analogous to simply writing multiple symbols next to each other
@ -358,7 +330,7 @@ declare function sym<Name extends string>(name: Name): SymbolRule<Name>;
/**
* Marks the given rule as producing only a single token. Tree-sitter's
* default is to treat each string or RegExp literal in the grammar as a
* default is to treat each String or RegExp literal in the grammar as a
* separate token. Each token is matched separately by the lexer and
* returned as its own leaf node in the tree. The token function allows
* you to express a complex rule using the DSL functions (rather

crates/cli/npm/install.js → cli/npm/install.js Normal file → Executable file
View file

@ -6,8 +6,7 @@ const http = require('http');
const https = require('https');
const packageJSON = require('./package.json');
https.globalAgent.keepAlive = false;
// Look to a results table in https://github.com/tree-sitter/tree-sitter/issues/2196
const matrix = {
platform: {
'darwin': {

View file

@ -1,33 +1,24 @@
{
"name": "tree-sitter-cli",
"version": "0.27.0",
"author": {
"name": "Max Brunsfeld",
"email": "maxbrunsfeld@gmail.com"
},
"maintainers": [
{
"name": "Amaan Qureshi",
"email": "amaanq12@gmail.com"
}
],
"version": "0.23.2",
"author": "Max Brunsfeld",
"license": "MIT",
"repository": {
"type": "git",
"url": "git+https://github.com/tree-sitter/tree-sitter.git",
"directory": "crates/cli/npm"
"url": "https://github.com/tree-sitter/tree-sitter.git"
},
"description": "CLI for generating fast incremental parsers",
"keywords": [
"parser",
"lexer"
],
"main": "lib/api/index.js",
"engines": {
"node": ">=12.0.0"
},
"scripts": {
"install": "node install.js",
"prepack": "cp ../../../LICENSE ../README.md .",
"prepack": "cp ../../LICENSE ../README.md .",
"postpack": "rm LICENSE README.md"
},
"bin": {

View file

@ -40,11 +40,7 @@ extern "C" {
fn free(ptr: *mut c_void);
}
pub fn record<T>(f: impl FnOnce() -> T) -> T {
record_checked(f).unwrap()
}
pub fn record_checked<T>(f: impl FnOnce() -> T) -> Result<T, String> {
pub fn record<T>(f: impl FnOnce() -> T) -> Result<T, String> {
RECORDER.with(|recorder| {
recorder.enabled.store(true, SeqCst);
recorder.allocation_count.store(0, SeqCst);
@ -97,49 +93,30 @@ fn record_dealloc(ptr: *mut c_void) {
});
}
/// # Safety
///
/// The caller must ensure that the returned pointer is eventually
/// freed by calling `ts_record_free`.
#[must_use]
pub unsafe extern "C" fn ts_record_malloc(size: usize) -> *mut c_void {
unsafe extern "C" fn ts_record_malloc(size: usize) -> *mut c_void {
let result = malloc(size);
record_alloc(result);
result
}
/// # Safety
///
/// The caller must ensure that the returned pointer is eventually
/// freed by calling `ts_record_free`.
#[must_use]
pub unsafe extern "C" fn ts_record_calloc(count: usize, size: usize) -> *mut c_void {
unsafe extern "C" fn ts_record_calloc(count: usize, size: usize) -> *mut c_void {
let result = calloc(count, size);
record_alloc(result);
result
}
/// # Safety
///
/// The caller must ensure that the returned pointer is eventually
/// freed by calling `ts_record_free`.
#[must_use]
pub unsafe extern "C" fn ts_record_realloc(ptr: *mut c_void, size: usize) -> *mut c_void {
unsafe extern "C" fn ts_record_realloc(ptr: *mut c_void, size: usize) -> *mut c_void {
let result = realloc(ptr, size);
if ptr.is_null() {
record_alloc(result);
} else if !core::ptr::eq(ptr, result) {
} else if ptr != result {
record_dealloc(ptr);
record_alloc(result);
}
result
}
/// # Safety
///
/// The caller must ensure that `ptr` was allocated by a previous call
/// to `ts_record_malloc`, `ts_record_calloc`, or `ts_record_realloc`.
pub unsafe extern "C" fn ts_record_free(ptr: *mut c_void) {
unsafe extern "C" fn ts_record_free(ptr: *mut c_void) {
record_dealloc(ptr);
free(ptr);
}
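
The `record` helper above runs a closure while counting every allocation made through the `ts_record_*` wrappers, and the `# Safety` notes require each returned pointer to eventually reach `ts_record_free`. A hedged sketch of that contract, written as if inside the same module (the wrappers are not `pub` on both sides of this diff, and only the `Result`-returning variant of `record` is assumed here):

#[test]
fn record_detects_unbalanced_allocations() {
    // A balanced malloc/free pair inside the closure reports success.
    let balanced = record(|| unsafe {
        let ptr = ts_record_malloc(64);
        ts_record_free(ptr);
    });
    assert!(balanced.is_ok());

    // An allocation that is never freed is reported as a leak.
    let leaked = record(|| unsafe {
        let _ = ts_record_malloc(64);
    });
    assert!(leaked.is_err());
}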

View file

@ -23,7 +23,7 @@ pub fn check_consistent_sizes(tree: &Tree, input: &[u8]) {
let mut some_child_has_changes = false;
let mut actual_named_child_count = 0;
for i in 0..node.child_count() {
let child = node.child(i as u32).unwrap();
let child = node.child(i).unwrap();
assert!(child.start_byte() >= last_child_end_byte);
assert!(child.start_position() >= last_child_end_point);
check(child, line_offsets);

View file

@ -7,7 +7,6 @@ pub struct Edit {
pub inserted_text: Vec<u8>,
}
#[must_use]
pub fn invert_edit(input: &[u8], edit: &Edit) -> Edit {
let position = edit.position;
let removed_content = &input[position..(position + edit.deleted_length)];
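
`invert_edit` above builds the edit that undoes a given edit by capturing the bytes about to be overwritten. A self-contained sketch of that round trip follows; the `apply` function is a simplified byte-level stand-in for the real `perform_edit`, which also updates the syntax tree, and the field layout of `Edit` is assumed from the lines shown above.

struct Edit {
    position: usize,
    deleted_length: usize,
    inserted_text: Vec<u8>,
}

fn invert_edit(input: &[u8], edit: &Edit) -> Edit {
    Edit {
        position: edit.position,
        deleted_length: edit.inserted_text.len(),
        inserted_text: input[edit.position..edit.position + edit.deleted_length].to_vec(),
    }
}

// Simplified stand-in for `perform_edit`: splices bytes without touching a tree.
fn apply(input: &mut Vec<u8>, edit: &Edit) {
    let removed = edit.position..edit.position + edit.deleted_length;
    input.splice(removed, edit.inserted_text.iter().copied());
}

fn main() {
    let original = b"let x = 1;".to_vec();
    let mut text = original.clone();
    let edit = Edit { position: 8, deleted_length: 1, inserted_text: b"42".to_vec() };
    let undo = invert_edit(&text, &edit); // captures the "1" being replaced
    apply(&mut text, &edit);
    assert_eq!(text, b"let x = 42;".to_vec());
    apply(&mut text, &undo); // applying the inverse restores the original input
    assert_eq!(text, original);
}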

View file

@ -1,11 +1,6 @@
use std::{
collections::HashMap,
env, fs,
path::{Path, PathBuf},
sync::LazyLock,
};
use std::{collections::HashMap, env, fs, path::Path};
use log::{error, info};
use lazy_static::lazy_static;
use rand::Rng;
use regex::Regex;
use tree_sitter::{Language, Parser};
@ -25,30 +20,18 @@ use crate::{
random::Rand,
},
parse::perform_edit,
test::{parse_tests, strip_sexp_fields, DiffKey, TestDiff, TestEntry},
test::{parse_tests, print_diff, print_diff_key, strip_sexp_fields, TestEntry},
};
pub static LOG_ENABLED: LazyLock<bool> = LazyLock::new(|| env::var("TREE_SITTER_LOG").is_ok());
pub static LOG_GRAPH_ENABLED: LazyLock<bool> =
LazyLock::new(|| env::var("TREE_SITTER_LOG_GRAPHS").is_ok());
pub static LANGUAGE_FILTER: LazyLock<Option<String>> =
LazyLock::new(|| env::var("TREE_SITTER_LANGUAGE").ok());
pub static EXAMPLE_INCLUDE: LazyLock<Option<Regex>> =
LazyLock::new(|| regex_env_var("TREE_SITTER_EXAMPLE_INCLUDE"));
pub static EXAMPLE_EXCLUDE: LazyLock<Option<Regex>> =
LazyLock::new(|| regex_env_var("TREE_SITTER_EXAMPLE_EXCLUDE"));
pub static START_SEED: LazyLock<usize> = LazyLock::new(new_seed);
pub static EDIT_COUNT: LazyLock<usize> =
LazyLock::new(|| int_env_var("TREE_SITTER_EDITS").unwrap_or(3));
pub static ITERATION_COUNT: LazyLock<usize> =
LazyLock::new(|| int_env_var("TREE_SITTER_ITERATIONS").unwrap_or(10));
lazy_static! {
pub static ref LOG_ENABLED: bool = env::var("TREE_SITTER_LOG").is_ok();
pub static ref LOG_GRAPH_ENABLED: bool = env::var("TREE_SITTER_LOG_GRAPHS").is_ok();
pub static ref LANGUAGE_FILTER: Option<String> = env::var("TREE_SITTER_LANGUAGE").ok();
pub static ref EXAMPLE_FILTER: Option<Regex> = regex_env_var("TREE_SITTER_EXAMPLE");
pub static ref START_SEED: usize = new_seed();
pub static ref EDIT_COUNT: usize = int_env_var("TREE_SITTER_EDITS").unwrap_or(3);
pub static ref ITERATION_COUNT: usize = int_env_var("TREE_SITTER_ITERATIONS").unwrap_or(10);
}
fn int_env_var(name: &'static str) -> Option<usize> {
env::var(name).ok().and_then(|e| e.parse().ok())
@ -58,23 +41,19 @@ fn regex_env_var(name: &'static str) -> Option<Regex> {
env::var(name).ok().and_then(|e| Regex::new(&e).ok())
}
#[must_use]
pub fn new_seed() -> usize {
int_env_var("TREE_SITTER_SEED").unwrap_or_else(|| {
let mut rng = rand::thread_rng();
let seed = rng.gen::<usize>();
info!("Seed: {seed}");
seed
rng.gen::<usize>()
})
}
pub struct FuzzOptions {
pub skipped: Option<Vec<String>>,
pub subdir: Option<PathBuf>,
pub subdir: Option<String>,
pub edits: usize,
pub iterations: usize,
pub include: Option<Regex>,
pub exclude: Option<Regex>,
pub filter: Option<Regex>,
pub log_graphs: bool,
pub log: bool,
}
@ -86,6 +65,20 @@ pub fn fuzz_language_corpus(
grammar_dir: &Path,
options: &mut FuzzOptions,
) {
let subdir = options.subdir.take().unwrap_or_default();
let corpus_dir = grammar_dir.join(subdir).join("test").join("corpus");
if !corpus_dir.exists() || !corpus_dir.is_dir() {
eprintln!("No corpus directory found, ensure that you have a `test/corpus` directory in your grammar directory with at least one test file.");
return;
}
if std::fs::read_dir(&corpus_dir).unwrap().count() == 0 {
eprintln!("No corpus files found in `test/corpus`, ensure that you have at least one test file in your corpus directory.");
return;
}
fn retain(entry: &mut TestEntry, language_name: &str) -> bool {
match entry {
TestEntry::Example { attributes, .. } => {
@ -104,20 +97,6 @@ pub fn fuzz_language_corpus(
}
}
let subdir = options.subdir.take().unwrap_or_default();
let corpus_dir = grammar_dir.join(subdir).join("test").join("corpus");
if !corpus_dir.exists() || !corpus_dir.is_dir() {
error!("No corpus directory found, ensure that you have a `test/corpus` directory in your grammar directory with at least one test file.");
return;
}
if std::fs::read_dir(&corpus_dir).unwrap().count() == 0 {
error!("No corpus files found in `test/corpus`, ensure that you have at least one test file in your corpus directory.");
return;
}
let mut main_tests = parse_tests(&corpus_dir).unwrap();
match main_tests {
TestEntry::Group {
@ -125,13 +104,9 @@ pub fn fuzz_language_corpus(
} => {
children.retain_mut(|child| retain(child, language_name));
}
TestEntry::Example { .. } => unreachable!(),
_ => unreachable!(),
}
let tests = flatten_tests(
main_tests,
options.include.as_ref(),
options.exclude.as_ref(),
);
let tests = flatten_tests(main_tests, options.filter.as_ref());
let get_test_name = |test: &FlattenedTest| format!("{language_name} - {}", test.name);
@ -150,7 +125,7 @@ pub fn fuzz_language_corpus(
let dump_edits = env::var("TREE_SITTER_DUMP_EDITS").is_ok();
if log_seed {
info!(" start seed: {start_seed}");
println!(" start seed: {start_seed}");
}
println!();
@ -164,7 +139,7 @@ pub fn fuzz_language_corpus(
println!(" {test_index}. {test_name}");
let passed = allocations::record_checked(|| {
let passed = allocations::record(|| {
let mut log_session = None;
let mut parser = get_parser(&mut log_session, "log.html");
parser.set_language(language).unwrap();
@ -183,8 +158,8 @@ pub fn fuzz_language_corpus(
if actual_output != test.output {
println!("Incorrect initial parse for {test_name}");
DiffKey::print();
println!("{}", TestDiff::new(&actual_output, &test.output));
print_diff_key();
print_diff(&actual_output, &test.output, true);
println!();
return false;
}
@ -192,7 +167,7 @@ pub fn fuzz_language_corpus(
true
})
.unwrap_or_else(|e| {
error!("{e}");
eprintln!("Error: {e}");
false
});
@ -208,7 +183,7 @@ pub fn fuzz_language_corpus(
for trial in 0..options.iterations {
let seed = start_seed + trial;
let passed = allocations::record_checked(|| {
let passed = allocations::record(|| {
let mut rand = Rand::new(seed);
let mut log_session = None;
let mut parser = get_parser(&mut log_session, "log.html");
@ -217,20 +192,19 @@ pub fn fuzz_language_corpus(
let mut input = test.input.clone();
if options.log_graphs {
info!("{}\n", String::from_utf8_lossy(&input));
eprintln!("{}\n", String::from_utf8_lossy(&input));
}
// Perform a random series of edits and reparse.
let edit_count = rand.unsigned(*EDIT_COUNT);
let mut undo_stack = Vec::with_capacity(edit_count);
for _ in 0..=edit_count {
let mut undo_stack = Vec::new();
for _ in 0..=rand.unsigned(*EDIT_COUNT) {
let edit = get_random_edit(&mut rand, &input);
undo_stack.push(invert_edit(&input, &edit));
perform_edit(&mut tree, &mut input, &edit).unwrap();
}
if log_seed {
info!(" {test_index}.{trial:<2} seed: {seed}");
println!(" {test_index}.{trial:<2} seed: {seed}");
}
if dump_edits {
@ -244,7 +218,7 @@ pub fn fuzz_language_corpus(
}
if options.log_graphs {
info!("{}\n", String::from_utf8_lossy(&input));
eprintln!("{}\n", String::from_utf8_lossy(&input));
}
set_included_ranges(&mut parser, &input, test.template_delimiters);
@ -253,7 +227,7 @@ pub fn fuzz_language_corpus(
// Check that the new tree is consistent.
check_consistent_sizes(&tree2, &input);
if let Err(message) = check_changed_ranges(&tree, &tree2, &input) {
error!("\nUnexpected scope change in seed {seed} with start seed {start_seed}\n{message}\n\n",);
println!("\nUnexpected scope change in seed {seed} with start seed {start_seed}\n{message}\n\n",);
return false;
}
@ -262,7 +236,7 @@ pub fn fuzz_language_corpus(
perform_edit(&mut tree2, &mut input, &edit).unwrap();
}
if options.log_graphs {
info!("{}\n", String::from_utf8_lossy(&input));
eprintln!("{}\n", String::from_utf8_lossy(&input));
}
set_included_ranges(&mut parser, &test.input, test.template_delimiters);
@ -276,8 +250,8 @@ pub fn fuzz_language_corpus(
if actual_output != test.output && !test.error {
println!("Incorrect parse for {test_name} - seed {seed}");
DiffKey::print();
println!("{}", TestDiff::new(&actual_output, &test.output));
print_diff_key();
print_diff(&actual_output, &test.output, true);
println!();
return false;
}
@ -285,13 +259,13 @@ pub fn fuzz_language_corpus(
// Check that the edited tree is consistent.
check_consistent_sizes(&tree3, &input);
if let Err(message) = check_changed_ranges(&tree2, &tree3, &input) {
error!("Unexpected scope change in seed {seed} with start seed {start_seed}\n{message}\n\n");
println!("Unexpected scope change in seed {seed} with start seed {start_seed}\n{message}\n\n");
return false;
}
true
}).unwrap_or_else(|e| {
error!("{e}");
eprintln!("Error: {e}");
false
});
@ -303,17 +277,17 @@ pub fn fuzz_language_corpus(
}
if failure_count != 0 {
info!("{failure_count} {language_name} corpus tests failed fuzzing");
eprintln!("{failure_count} {language_name} corpus tests failed fuzzing");
}
skipped.retain(|_, v| *v == 0);
if !skipped.is_empty() {
info!("Non matchable skip definitions:");
println!("Non matchable skip definitions:");
for k in skipped.keys() {
info!(" {k}");
println!(" {k}");
}
panic!("Non matchable skip definitions need to be removed");
panic!("Non matchable skip definitions needs to be removed");
}
}
@ -328,16 +302,10 @@ pub struct FlattenedTest {
pub template_delimiters: Option<(&'static str, &'static str)>,
}
#[must_use]
pub fn flatten_tests(
test: TestEntry,
include: Option<&Regex>,
exclude: Option<&Regex>,
) -> Vec<FlattenedTest> {
pub fn flatten_tests(test: TestEntry, filter: Option<&Regex>) -> Vec<FlattenedTest> {
fn helper(
test: TestEntry,
include: Option<&Regex>,
exclude: Option<&Regex>,
filter: Option<&Regex>,
is_root: bool,
prefix: &str,
result: &mut Vec<FlattenedTest>,
@ -355,13 +323,8 @@ pub fn flatten_tests(
name.insert_str(0, " - ");
name.insert_str(0, prefix);
}
if let Some(include) = include {
if !include.is_match(&name) {
return;
}
} else if let Some(exclude) = exclude {
if exclude.is_match(&name) {
if let Some(filter) = filter {
if filter.find(&name).is_none() {
return;
}
}
@ -385,12 +348,12 @@ pub fn flatten_tests(
name.insert_str(0, prefix);
}
for child in children {
helper(child, include, exclude, false, &name, result);
helper(child, filter, false, &name, result);
}
}
}
}
let mut result = Vec::new();
helper(test, include, exclude, true, "", &mut result);
helper(test, filter, true, "", &mut result);
result
}

View file

@ -10,7 +10,6 @@ const OPERATORS: &[char] = &[
pub struct Rand(StdRng);
impl Rand {
#[must_use]
pub fn new(seed: usize) -> Self {
Self(StdRng::seed_from_u64(seed as u64))
}
@ -20,8 +19,8 @@ impl Rand {
}
pub fn words(&mut self, max_count: usize) -> Vec<u8> {
let mut result = Vec::new();
let word_count = self.unsigned(max_count);
let mut result = Vec::with_capacity(2 * word_count);
for i in 0..word_count {
if i > 0 {
if self.unsigned(5) == 0 {

View file

@ -6,7 +6,6 @@ pub struct ScopeSequence(Vec<ScopeStack>);
type ScopeStack = Vec<&'static str>;
impl ScopeSequence {
#[must_use]
pub fn new(tree: &Tree) -> Self {
let mut result = Self(Vec::new());
let mut scope_stack = Vec::new();

View file

@ -3,13 +3,14 @@ use std::{
mem,
};
use log::debug;
use log::info;
use super::{coincident_tokens::CoincidentTokenIndex, token_conflicts::TokenConflictMap};
use crate::{
use crate::generate::{
dedup::split_state_id_groups,
grammars::{LexicalGrammar, SyntaxGrammar},
nfa::{CharacterSet, NfaCursor},
prepare_grammar::symbol_is_used,
rules::{Symbol, TokenSet},
tables::{AdvanceAction, LexState, LexTable, ParseStateId, ParseTable},
};
@ -43,17 +44,15 @@ pub fn build_lex_table(
let tokens = state
.terminal_entries
.keys()
.copied()
.chain(state.reserved_words.iter())
.filter_map(|token| {
if token.is_terminal() {
if keywords.contains(&token) {
if keywords.contains(token) {
syntax_grammar.word_token
} else {
Some(token)
Some(*token)
}
} else if token.is_eof() {
Some(token)
Some(*token)
} else {
None
}
@ -95,6 +94,9 @@ pub fn build_lex_table(
let mut large_character_sets = Vec::new();
for (variable_ix, _variable) in lexical_grammar.variables.iter().enumerate() {
let symbol = Symbol::terminal(variable_ix);
if !symbol_is_used(&syntax_grammar.variables, symbol) {
continue;
}
builder.reset();
builder.add_state_for_tokens(&TokenSet::from_iter([symbol]));
for state in &builder.table.states {
@ -176,8 +178,9 @@ impl<'a> LexTableBuilder<'a> {
let (state_id, is_new) = self.add_state(nfa_states, eof_valid);
if is_new {
debug!(
"entry point state: {state_id}, tokens: {:?}",
info!(
"entry point state: {}, tokens: {:?}",
state_id,
tokens
.iter()
.map(|t| &self.lexical_grammar.variables[t.index].name)
@ -358,7 +361,9 @@ fn minimize_lex_table(table: &mut LexTable, parse_table: &mut ParseTable) {
&mut group_ids_by_state_id,
1,
lex_states_differ,
) {}
) {
continue;
}
let mut new_states = Vec::with_capacity(state_ids_by_group_id.len());
for state_ids in &state_ids_by_group_id {

View file

@ -1,21 +1,22 @@
use std::{
cmp::Ordering,
collections::{BTreeMap, BTreeSet, HashMap, HashSet, VecDeque},
collections::{BTreeMap, HashMap, HashSet, VecDeque},
fmt::Write,
hash::BuildHasherDefault,
};
use anyhow::{anyhow, Result};
use indexmap::{map::Entry, IndexMap};
use log::warn;
use rustc_hash::FxHasher;
use serde::Serialize;
use thiserror::Error;
use super::{
item::{ParseItem, ParseItemSet, ParseItemSetCore, ParseItemSetEntry},
item::{ParseItem, ParseItemSet, ParseItemSetCore},
item_set_builder::ParseItemSetBuilder,
};
use crate::{
grammars::{LexicalGrammar, PrecedenceEntry, ReservedWordSetId, SyntaxGrammar, VariableType},
use crate::generate::{
grammars::{
InlinedProductionMap, LexicalGrammar, PrecedenceEntry, SyntaxGrammar, VariableType,
},
node_types::VariableInfo,
rules::{Associativity, Precedence, Symbol, SymbolType, TokenSet},
tables::{
@ -65,208 +66,8 @@ struct ParseTableBuilder<'a> {
parse_table: ParseTable,
}
pub type BuildTableResult<T> = Result<T, ParseTableBuilderError>;
#[derive(Debug, Error, Serialize)]
pub enum ParseTableBuilderError {
#[error("Unresolved conflict for symbol sequence:\n\n{0}")]
Conflict(#[from] ConflictError),
#[error("Extra rules must have unambiguous endings. Conflicting rules: {0}")]
AmbiguousExtra(#[from] AmbiguousExtraError),
#[error(
"The non-terminal rule `{0}` is used in a non-terminal `extra` rule, which is not allowed."
)]
ImproperNonTerminalExtra(String),
#[error("State count `{0}` exceeds the max value {max}.", max=u16::MAX)]
StateCount(usize),
}
#[derive(Default, Debug, Serialize, Error)]
pub struct ConflictError {
pub symbol_sequence: Vec<String>,
pub conflicting_lookahead: String,
pub possible_interpretations: Vec<Interpretation>,
pub possible_resolutions: Vec<Resolution>,
}
#[derive(Default, Debug, Serialize, Error)]
pub struct Interpretation {
pub preceding_symbols: Vec<String>,
pub variable_name: String,
pub production_step_symbols: Vec<String>,
pub step_index: u32,
pub done: bool,
pub conflicting_lookahead: String,
pub precedence: Option<String>,
pub associativity: Option<String>,
}
#[derive(Debug, Serialize)]
pub enum Resolution {
Precedence { symbols: Vec<String> },
Associativity { symbols: Vec<String> },
AddConflict { symbols: Vec<String> },
}
#[derive(Debug, Serialize, Error)]
pub struct AmbiguousExtraError {
pub parent_symbols: Vec<String>,
}
impl std::fmt::Display for ConflictError {
fn fmt(&self, f: &mut std::fmt::Formatter) -> std::fmt::Result {
for symbol in &self.symbol_sequence {
write!(f, " {symbol}")?;
}
writeln!(f, " • {} …\n", self.conflicting_lookahead)?;
writeln!(f, "Possible interpretations:\n")?;
let mut interpretations = self
.possible_interpretations
.iter()
.map(|i| {
let line = i.to_string();
let prec_line = if let (Some(precedence), Some(associativity)) =
(&i.precedence, &i.associativity)
{
Some(format!(
"(precedence: {precedence}, associativity: {associativity})",
))
} else {
i.precedence
.as_ref()
.map(|precedence| format!("(precedence: {precedence})"))
};
(line, prec_line)
})
.collect::<Vec<_>>();
let max_interpretation_length = interpretations
.iter()
.map(|i| i.0.chars().count())
.max()
.unwrap();
interpretations.sort_unstable();
for (i, (line, prec_suffix)) in interpretations.into_iter().enumerate() {
write!(f, " {}:", i + 1).unwrap();
write!(f, "{line}")?;
if let Some(prec_suffix) = prec_suffix {
write!(
f,
"{:1$}",
"",
max_interpretation_length.saturating_sub(line.chars().count()) + 2
)?;
write!(f, "{prec_suffix}")?;
}
writeln!(f)?;
}
writeln!(f, "\nPossible resolutions:\n")?;
for (i, resolution) in self.possible_resolutions.iter().enumerate() {
writeln!(f, " {}: {resolution}", i + 1)?;
}
Ok(())
}
}
impl std::fmt::Display for Interpretation {
fn fmt(&self, f: &mut std::fmt::Formatter) -> std::fmt::Result {
for symbol in &self.preceding_symbols {
write!(f, " {symbol}")?;
}
write!(f, " ({}", self.variable_name)?;
for (i, symbol) in self.production_step_symbols.iter().enumerate() {
if i == self.step_index as usize {
write!(f, " •")?;
}
write!(f, " {symbol}")?;
}
write!(f, ")")?;
if self.done {
write!(f, " • {} …", self.conflicting_lookahead)?;
}
Ok(())
}
}
impl std::fmt::Display for Resolution {
fn fmt(&self, f: &mut std::fmt::Formatter) -> std::fmt::Result {
match self {
Self::Precedence { symbols } => {
write!(f, "Specify a higher precedence in ")?;
for (i, symbol) in symbols.iter().enumerate() {
if i > 0 {
write!(f, " and ")?;
}
write!(f, "`{symbol}`")?;
}
write!(f, " than in the other rules.")?;
}
Self::Associativity { symbols } => {
write!(f, "Specify a left or right associativity in ")?;
for (i, symbol) in symbols.iter().enumerate() {
if i > 0 {
write!(f, ", ")?;
}
write!(f, "`{symbol}`")?;
}
}
Self::AddConflict { symbols } => {
write!(f, "Add a conflict for these rules: ")?;
for (i, symbol) in symbols.iter().enumerate() {
if i > 0 {
write!(f, ", ")?;
}
write!(f, "`{symbol}`")?;
}
}
}
Ok(())
}
}
impl std::fmt::Display for AmbiguousExtraError {
fn fmt(&self, f: &mut std::fmt::Formatter) -> std::fmt::Result {
for (i, symbol) in self.parent_symbols.iter().enumerate() {
if i > 0 {
write!(f, ", ")?;
}
write!(f, "{symbol}")?;
}
Ok(())
}
}
impl<'a> ParseTableBuilder<'a> {
fn new(
syntax_grammar: &'a SyntaxGrammar,
lexical_grammar: &'a LexicalGrammar,
item_set_builder: ParseItemSetBuilder<'a>,
variable_info: &'a [VariableInfo],
) -> Self {
Self {
syntax_grammar,
lexical_grammar,
item_set_builder,
variable_info,
non_terminal_extra_states: Vec::new(),
state_ids_by_item_set: IndexMap::default(),
core_ids_by_core: HashMap::new(),
parse_state_info_by_id: Vec::new(),
parse_state_queue: VecDeque::new(),
actual_conflicts: syntax_grammar.expected_conflicts.iter().cloned().collect(),
parse_table: ParseTable {
states: Vec::new(),
symbols: Vec::new(),
external_lex_states: Vec::new(),
production_infos: Vec::new(),
max_aliased_production_length: 1,
},
}
}
fn build(mut self) -> BuildTableResult<(ParseTable, Vec<ParseStateInfo<'a>>)> {
fn build(mut self) -> Result<(ParseTable, Vec<ParseStateInfo<'a>>)> {
// Ensure that the empty alias sequence has index 0.
self.parse_table
.production_infos
@ -279,13 +80,10 @@ impl<'a> ParseTableBuilder<'a> {
self.add_parse_state(
&Vec::new(),
&Vec::new(),
ParseItemSet {
entries: vec![ParseItemSetEntry {
item: ParseItem::start(),
lookaheads: std::iter::once(Symbol::end()).collect(),
following_reserved_word_set: ReservedWordSetId::default(),
}],
},
ParseItemSet::with(std::iter::once((
ParseItem::start(),
std::iter::once(&Symbol::end()).copied().collect(),
))),
);
// Compute the possible item sets for non-terminal extras.
@ -301,35 +99,25 @@ impl<'a> ParseTableBuilder<'a> {
non_terminal_extra_item_sets_by_first_terminal
.entry(production.first_symbol().unwrap())
.or_insert_with(ParseItemSet::default)
.insert(ParseItem {
variable_index: extra_non_terminal.index as u32,
production,
step_index: 1,
has_preceding_inherited_fields: false,
})
.lookaheads
.insert(Symbol::end_of_nonterminal_extra());
.insert(
ParseItem {
variable_index: extra_non_terminal.index as u32,
production,
step_index: 1,
has_preceding_inherited_fields: false,
},
&std::iter::once(&Symbol::end_of_nonterminal_extra())
.copied()
.collect(),
);
}
}
let non_terminal_sets_len = non_terminal_extra_item_sets_by_first_terminal.len();
self.non_terminal_extra_states
.reserve(non_terminal_sets_len);
self.parse_state_info_by_id.reserve(non_terminal_sets_len);
self.parse_table.states.reserve(non_terminal_sets_len);
self.parse_state_queue.reserve(non_terminal_sets_len);
// Add a state for each starting terminal of a non-terminal extra rule.
for (terminal, item_set) in non_terminal_extra_item_sets_by_first_terminal {
if terminal.is_non_terminal() {
Err(ParseTableBuilderError::ImproperNonTerminalExtra(
self.symbol_name(&terminal),
))?;
}
// Add the parse state, and *then* push the terminal and the state id into the
// list of nonterminal extra states
let state_id = self.add_parse_state(&Vec::new(), &Vec::new(), item_set);
self.non_terminal_extra_states.push((terminal, state_id));
self.non_terminal_extra_states
.push((terminal, self.parse_table.states.len()));
self.add_parse_state(&Vec::new(), &Vec::new(), item_set);
}
while let Some(entry) = self.parse_state_queue.pop_front() {
@ -346,21 +134,17 @@ impl<'a> ParseTableBuilder<'a> {
}
if !self.actual_conflicts.is_empty() {
warn!(
"unnecessary conflicts:\n {}",
&self
.actual_conflicts
.iter()
.map(|conflict| {
conflict
.iter()
.map(|symbol| format!("`{}`", self.symbol_name(symbol)))
.collect::<Vec<_>>()
.join(", ")
})
.collect::<Vec<_>>()
.join("\n ")
);
println!("Warning: unnecessary conflicts");
for conflict in &self.actual_conflicts {
println!(
" {}",
conflict
.iter()
.map(|symbol| format!("`{}`", self.symbol_name(symbol)))
.collect::<Vec<_>>()
.join(", ")
);
}
}
Ok((self.parse_table, self.parse_state_info_by_id))
@ -394,7 +178,6 @@ impl<'a> ParseTableBuilder<'a> {
external_lex_state_id: 0,
terminal_entries: IndexMap::default(),
nonterminal_entries: IndexMap::default(),
reserved_words: TokenSet::default(),
core_id,
});
self.parse_state_queue.push_back(ParseStateQueueEntry {
@ -413,7 +196,7 @@ impl<'a> ParseTableBuilder<'a> {
mut preceding_auxiliary_symbols: AuxiliarySymbolSequence,
state_id: ParseStateId,
item_set: &ParseItemSet<'a>,
) -> BuildTableResult<()> {
) -> Result<()> {
let mut terminal_successors = BTreeMap::new();
let mut non_terminal_successors = BTreeMap::new();
let mut lookaheads_with_conflicts = TokenSet::new();
@ -421,18 +204,13 @@ impl<'a> ParseTableBuilder<'a> {
// Each item in the item set contributes to either a Shift action or a Reduce
// action in this state.
for ParseItemSetEntry {
item,
lookaheads,
following_reserved_word_set: reserved_lookaheads,
} in &item_set.entries
{
for (item, lookaheads) in &item_set.entries {
// If the item is unfinished, then this state has a transition for the item's
// next symbol. Advance the item to its next step and insert the resulting
// item into the successor item set.
if let Some(next_symbol) = item.symbol() {
let mut successor = item.successor();
let successor_set = if next_symbol.is_non_terminal() {
if next_symbol.is_non_terminal() {
let variable = &self.syntax_grammar.variables[next_symbol.index];
// Keep track of where auxiliary non-terminals (repeat symbols) are
@ -461,16 +239,13 @@ impl<'a> ParseTableBuilder<'a> {
non_terminal_successors
.entry(next_symbol)
.or_insert_with(ParseItemSet::default)
.insert(successor, lookaheads);
} else {
terminal_successors
.entry(next_symbol)
.or_insert_with(ParseItemSet::default)
};
let successor_entry = successor_set.insert(successor);
successor_entry.lookaheads.insert_all(lookaheads);
successor_entry.following_reserved_word_set = successor_entry
.following_reserved_word_set
.max(*reserved_lookaheads);
.insert(successor, lookaheads);
}
}
// If the item is finished, then add a Reduce action to this state based
// on this item.
@ -597,7 +372,7 @@ impl<'a> ParseTableBuilder<'a> {
)?;
}
// Add actions for the grammar's `extra` symbols.
// Finally, add actions for the grammar's `extra` symbols.
let state = &mut self.parse_table.states[state_id];
let is_end_of_non_terminal_extra = state.is_end_of_non_terminal_extra();
@ -609,7 +384,7 @@ impl<'a> ParseTableBuilder<'a> {
let parent_symbols = item_set
.entries
.iter()
.filter_map(|ParseItemSetEntry { item, .. }| {
.filter_map(|(item, _)| {
if !item.is_augmented() && item.step_index > 0 {
Some(item.variable_index)
} else {
@ -617,18 +392,15 @@ impl<'a> ParseTableBuilder<'a> {
}
})
.collect::<HashSet<_>>();
let parent_symbol_names = parent_symbols
.iter()
.map(|&variable_index| {
self.syntax_grammar.variables[variable_index as usize]
.name
.clone()
})
.collect::<Vec<_>>();
Err(AmbiguousExtraError {
parent_symbols: parent_symbol_names,
})?;
let mut message =
"Extra rules must have unambiguous endings. Conflicting rules: ".to_string();
for (i, variable_index) in parent_symbols.iter().enumerate() {
if i > 0 {
message += ", ";
}
message += &self.syntax_grammar.variables[*variable_index as usize].name;
}
return Err(anyhow!(message));
}
}
// Add actions for the start tokens of each non-terminal extra rule.
@ -666,30 +438,6 @@ impl<'a> ParseTableBuilder<'a> {
}
}
if let Some(keyword_capture_token) = self.syntax_grammar.word_token {
let reserved_word_set_id = item_set
.entries
.iter()
.filter_map(|entry| {
if let Some(next_step) = entry.item.step() {
if next_step.symbol == keyword_capture_token {
Some(next_step.reserved_word_set_id)
} else {
None
}
} else if entry.lookaheads.contains(&keyword_capture_token) {
Some(entry.following_reserved_word_set)
} else {
None
}
})
.max();
if let Some(reserved_word_set_id) = reserved_word_set_id {
state.reserved_words =
self.syntax_grammar.reserved_word_sets[reserved_word_set_id.0].clone();
}
}
Ok(())
}
@ -701,7 +449,7 @@ impl<'a> ParseTableBuilder<'a> {
preceding_auxiliary_symbols: &[AuxiliarySymbolInfo],
conflicting_lookahead: Symbol,
reduction_info: &ReductionInfo,
) -> BuildTableResult<()> {
) -> Result<()> {
let entry = self.parse_table.states[state_id]
.terminal_entries
.get_mut(&conflicting_lookahead)
@ -715,11 +463,8 @@ impl<'a> ParseTableBuilder<'a> {
// precedence, and there can still be SHIFT/REDUCE conflicts.
let mut considered_associativity = false;
let mut shift_precedence = Vec::<(&Precedence, Symbol)>::new();
let mut conflicting_items = BTreeSet::new();
for ParseItemSetEntry {
item, lookaheads, ..
} in &item_set.entries
{
let mut conflicting_items = HashSet::new();
for (item, lookaheads) in &item_set.entries {
if let Some(step) = item.step() {
if item.step_index > 0
&& self
@ -856,55 +601,93 @@ impl<'a> ParseTableBuilder<'a> {
return Ok(());
}
let mut conflict_error = ConflictError::default();
let mut msg = "Unresolved conflict for symbol sequence:\n\n".to_string();
for symbol in preceding_symbols {
conflict_error
.symbol_sequence
.push(self.symbol_name(symbol));
write!(&mut msg, " {}", self.symbol_name(symbol)).unwrap();
}
conflict_error.conflicting_lookahead = self.symbol_name(&conflicting_lookahead);
let interpretations = conflicting_items
writeln!(
&mut msg,
" • {} …\n",
self.symbol_name(&conflicting_lookahead)
)
.unwrap();
writeln!(&mut msg, "Possible interpretations:\n").unwrap();
let mut interpretations = conflicting_items
.iter()
.map(|item| {
let preceding_symbols = preceding_symbols
let mut line = String::new();
for preceding_symbol in preceding_symbols
.iter()
.take(preceding_symbols.len() - item.step_index as usize)
.map(|symbol| self.symbol_name(symbol))
.collect::<Vec<_>>();
{
write!(&mut line, " {}", self.symbol_name(preceding_symbol)).unwrap();
}
let variable_name = self.syntax_grammar.variables[item.variable_index as usize]
.name
.clone();
write!(
&mut line,
" ({}",
&self.syntax_grammar.variables[item.variable_index as usize].name
)
.unwrap();
let production_step_symbols = item
.production
.steps
.iter()
.map(|step| self.symbol_name(&step.symbol))
.collect::<Vec<_>>();
for (j, step) in item.production.steps.iter().enumerate() {
if j as u32 == item.step_index {
write!(&mut line, " •").unwrap();
}
write!(&mut line, " {}", self.symbol_name(&step.symbol)).unwrap();
}
let precedence = match item.precedence() {
Precedence::None => None,
_ => Some(item.precedence().to_string()),
write!(&mut line, ")").unwrap();
if item.is_done() {
write!(
&mut line,
" • {} …",
self.symbol_name(&conflicting_lookahead)
)
.unwrap();
}
let precedence = item.precedence();
let associativity = item.associativity();
let prec_line = if let Some(associativity) = associativity {
Some(format!(
"(precedence: {precedence}, associativity: {associativity:?})",
))
} else if !precedence.is_none() {
Some(format!("(precedence: {precedence})"))
} else {
None
};
let associativity = item.associativity().map(|assoc| format!("{assoc:?}"));
Interpretation {
preceding_symbols,
variable_name,
production_step_symbols,
step_index: item.step_index,
done: item.is_done(),
conflicting_lookahead: self.symbol_name(&conflicting_lookahead),
precedence,
associativity,
}
(line, prec_line)
})
.collect::<Vec<_>>();
conflict_error.possible_interpretations = interpretations;
let max_interpretation_length = interpretations
.iter()
.map(|i| i.0.chars().count())
.max()
.unwrap();
interpretations.sort_unstable();
for (i, (line, prec_suffix)) in interpretations.into_iter().enumerate() {
write!(&mut msg, " {}:", i + 1).unwrap();
msg += &line;
if let Some(prec_suffix) = prec_suffix {
for _ in line.chars().count()..max_interpretation_length {
msg.push(' ');
}
msg += " ";
msg += &prec_suffix;
}
msg.push('\n');
}
let mut resolution_count = 0;
writeln!(&mut msg, "\nPossible resolutions:\n").unwrap();
let mut shift_items = Vec::new();
let mut reduce_items = Vec::new();
for item in conflicting_items {
@ -917,57 +700,76 @@ impl<'a> ParseTableBuilder<'a> {
shift_items.sort_unstable();
reduce_items.sort_unstable();
let get_rule_names = |items: &[&ParseItem]| -> Vec<String> {
let list_rule_names = |mut msg: &mut String, items: &[&ParseItem]| {
let mut last_rule_id = None;
let mut result = Vec::with_capacity(items.len());
for item in items {
if last_rule_id == Some(item.variable_index) {
continue;
}
last_rule_id = Some(item.variable_index);
result.push(self.symbol_name(&Symbol::non_terminal(item.variable_index as usize)));
}
result
if last_rule_id.is_some() {
write!(&mut msg, " and").unwrap();
}
last_rule_id = Some(item.variable_index);
write!(
msg,
" `{}`",
self.symbol_name(&Symbol::non_terminal(item.variable_index as usize))
)
.unwrap();
}
};
if actual_conflict.len() > 1 {
if !shift_items.is_empty() {
let names = get_rule_names(&shift_items);
conflict_error
.possible_resolutions
.push(Resolution::Precedence { symbols: names });
resolution_count += 1;
write!(
&mut msg,
" {resolution_count}: Specify a higher precedence in",
)
.unwrap();
list_rule_names(&mut msg, &shift_items);
writeln!(&mut msg, " than in the other rules.").unwrap();
}
for item in &reduce_items {
let name = self.symbol_name(&Symbol::non_terminal(item.variable_index as usize));
conflict_error
.possible_resolutions
.push(Resolution::Precedence {
symbols: vec![name],
});
resolution_count += 1;
writeln!(
&mut msg,
" {resolution_count}: Specify a higher precedence in `{}` than in the other rules.",
self.symbol_name(&Symbol::non_terminal(item.variable_index as usize))
)
.unwrap();
}
}
if considered_associativity {
let names = get_rule_names(&reduce_items);
conflict_error
.possible_resolutions
.push(Resolution::Associativity { symbols: names });
resolution_count += 1;
write!(
&mut msg,
" {resolution_count}: Specify a left or right associativity in",
)
.unwrap();
list_rule_names(&mut msg, &reduce_items);
writeln!(&mut msg).unwrap();
}
conflict_error
.possible_resolutions
.push(Resolution::AddConflict {
symbols: actual_conflict
.iter()
.map(|s| self.symbol_name(s))
.collect(),
});
resolution_count += 1;
write!(
&mut msg,
" {resolution_count}: Add a conflict for these rules: ",
)
.unwrap();
for (i, symbol) in actual_conflict.iter().enumerate() {
if i > 0 {
write!(&mut msg, ", ").unwrap();
}
write!(&mut msg, "`{}`", self.symbol_name(symbol)).unwrap();
}
writeln!(&mut msg).unwrap();
self.actual_conflicts.insert(actual_conflict);
Err(conflict_error)?
Err(anyhow!(msg))
}
fn compare_precedence(
@ -1036,7 +838,7 @@ impl<'a> ParseTableBuilder<'a> {
let parent_symbols = item_set
.entries
.iter()
.filter_map(|ParseItemSetEntry { item, .. }| {
.filter_map(|(item, _)| {
let variable_index = item.variable_index as usize;
if item.symbol() == Some(symbol)
&& !self.syntax_grammar.variables[variable_index].is_auxiliary()
@ -1124,24 +926,84 @@ impl<'a> ParseTableBuilder<'a> {
if variable.kind == VariableType::Named {
variable.name.clone()
} else {
format!("'{}'", variable.name)
format!("'{}'", &variable.name)
}
}
}
}
}
fn populate_following_tokens(
result: &mut [TokenSet],
grammar: &SyntaxGrammar,
inlines: &InlinedProductionMap,
builder: &ParseItemSetBuilder,
) {
let productions = grammar
.variables
.iter()
.flat_map(|v| &v.productions)
.chain(&inlines.productions);
let all_tokens = (0..result.len())
.map(Symbol::terminal)
.collect::<TokenSet>();
for production in productions {
for i in 1..production.steps.len() {
let left_tokens = builder.last_set(&production.steps[i - 1].symbol);
let right_tokens = builder.first_set(&production.steps[i].symbol);
for left_token in left_tokens.iter() {
if left_token.is_terminal() {
result[left_token.index].insert_all_terminals(right_tokens);
}
}
}
}
for extra in &grammar.extra_symbols {
if extra.is_terminal() {
for entry in result.iter_mut() {
entry.insert(*extra);
}
result[extra.index].clone_from(&all_tokens);
}
}
}
pub fn build_parse_table<'a>(
syntax_grammar: &'a SyntaxGrammar,
lexical_grammar: &'a LexicalGrammar,
item_set_builder: ParseItemSetBuilder<'a>,
inlines: &'a InlinedProductionMap,
variable_info: &'a [VariableInfo],
) -> BuildTableResult<(ParseTable, Vec<ParseStateInfo<'a>>)> {
ParseTableBuilder::new(
) -> Result<(ParseTable, Vec<TokenSet>, Vec<ParseStateInfo<'a>>)> {
let actual_conflicts = syntax_grammar.expected_conflicts.iter().cloned().collect();
let item_set_builder = ParseItemSetBuilder::new(syntax_grammar, lexical_grammar, inlines);
let mut following_tokens = vec![TokenSet::new(); lexical_grammar.variables.len()];
populate_following_tokens(
&mut following_tokens,
syntax_grammar,
inlines,
&item_set_builder,
);
let (table, item_sets) = ParseTableBuilder {
syntax_grammar,
lexical_grammar,
item_set_builder,
variable_info,
)
.build()
non_terminal_extra_states: Vec::new(),
actual_conflicts,
state_ids_by_item_set: IndexMap::default(),
core_ids_by_core: HashMap::new(),
parse_state_info_by_id: Vec::new(),
parse_state_queue: VecDeque::new(),
parse_table: ParseTable {
states: Vec::new(),
symbols: Vec::new(),
external_lex_states: Vec::new(),
production_infos: Vec::new(),
max_aliased_production_length: 1,
},
}
.build()?;
Ok((table, following_tokens, item_sets))
}

View file

@ -1,6 +1,6 @@
use std::fmt;
use crate::{
use crate::generate::{
grammars::LexicalGrammar,
rules::Symbol,
tables::{ParseStateId, ParseTable},
@ -55,7 +55,7 @@ impl<'a> CoincidentTokenIndex<'a> {
}
}
impl fmt::Debug for CoincidentTokenIndex<'_> {
impl<'a> fmt::Debug for CoincidentTokenIndex<'a> {
fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
writeln!(f, "CoincidentTokenIndex {{")?;

View file

@ -2,31 +2,30 @@ use std::{
cmp::Ordering,
fmt,
hash::{Hash, Hasher},
sync::LazyLock,
};
use crate::{
grammars::{
LexicalGrammar, Production, ProductionStep, ReservedWordSetId, SyntaxGrammar,
NO_RESERVED_WORDS,
},
use lazy_static::lazy_static;
use crate::generate::{
grammars::{LexicalGrammar, Production, ProductionStep, SyntaxGrammar},
rules::{Associativity, Precedence, Symbol, SymbolType, TokenSet},
};
static START_PRODUCTION: LazyLock<Production> = LazyLock::new(|| Production {
dynamic_precedence: 0,
steps: vec![ProductionStep {
symbol: Symbol {
index: 0,
kind: SymbolType::NonTerminal,
},
precedence: Precedence::None,
associativity: None,
alias: None,
field_name: None,
reserved_word_set_id: NO_RESERVED_WORDS,
}],
});
lazy_static! {
static ref START_PRODUCTION: Production = Production {
dynamic_precedence: 0,
steps: vec![ProductionStep {
symbol: Symbol {
index: 0,
kind: SymbolType::NonTerminal,
},
precedence: Precedence::None,
associativity: None,
alias: None,
field_name: None,
}],
};
}
/// A [`ParseItem`] represents an in-progress match of a single production in a grammar.
#[derive(Clone, Copy, Debug)]
@ -59,14 +58,7 @@ pub struct ParseItem<'a> {
/// to a state in the final parse table.
#[derive(Clone, Debug, PartialEq, Eq, Default)]
pub struct ParseItemSet<'a> {
pub entries: Vec<ParseItemSetEntry<'a>>,
}
#[derive(Clone, Debug, PartialEq, Eq)]
pub struct ParseItemSetEntry<'a> {
pub item: ParseItem<'a>,
pub lookaheads: TokenSet,
pub following_reserved_word_set: ReservedWordSetId,
pub entries: Vec<(ParseItem<'a>, TokenSet)>,
}
/// A [`ParseItemSetCore`] is like a [`ParseItemSet`], but without the lookahead
@ -152,7 +144,7 @@ impl<'a> ParseItem<'a> {
/// Create an item identical to this one, but with a different production.
/// This is used when dynamically "inlining" certain symbols in a production.
pub const fn substitute_production(&self, production: &'a Production) -> Self {
pub const fn substitute_production(&self, production: &'a Production) -> ParseItem<'a> {
let mut result = *self;
result.production = production;
result
@ -160,31 +152,35 @@ impl<'a> ParseItem<'a> {
}
impl<'a> ParseItemSet<'a> {
pub fn insert(&mut self, item: ParseItem<'a>) -> &mut ParseItemSetEntry<'a> {
match self.entries.binary_search_by(|e| e.item.cmp(&item)) {
pub fn with(elements: impl IntoIterator<Item = (ParseItem<'a>, TokenSet)>) -> Self {
let mut result = Self::default();
for (item, lookaheads) in elements {
result.insert(item, &lookaheads);
}
result
}
pub fn insert(&mut self, item: ParseItem<'a>, lookaheads: &TokenSet) -> &mut TokenSet {
match self.entries.binary_search_by(|(i, _)| i.cmp(&item)) {
Err(i) => {
self.entries.insert(
i,
ParseItemSetEntry {
item,
lookaheads: TokenSet::new(),
following_reserved_word_set: ReservedWordSetId::default(),
},
);
&mut self.entries[i]
self.entries.insert(i, (item, lookaheads.clone()));
&mut self.entries[i].1
}
Ok(i) => {
self.entries[i].1.insert_all(lookaheads);
&mut self.entries[i].1
}
Ok(i) => &mut self.entries[i],
}
}
pub fn core(&self) -> ParseItemSetCore<'a> {
ParseItemSetCore {
entries: self.entries.iter().map(|e| e.item).collect(),
entries: self.entries.iter().map(|e| e.0).collect(),
}
}
}
impl fmt::Display for ParseItemDisplay<'_> {
impl<'a> fmt::Display for ParseItemDisplay<'a> {
fn fmt(&self, f: &mut fmt::Formatter) -> Result<(), fmt::Error> {
if self.0.is_augmented() {
write!(f, "START →")?;
@ -192,42 +188,35 @@ impl fmt::Display for ParseItemDisplay<'_> {
write!(
f,
"{} →",
self.1.variables[self.0.variable_index as usize].name
&self.1.variables[self.0.variable_index as usize].name
)?;
}
for (i, step) in self.0.production.steps.iter().enumerate() {
if i == self.0.step_index as usize {
write!(f, " •")?;
if !step.precedence.is_none()
|| step.associativity.is_some()
|| step.reserved_word_set_id != ReservedWordSetId::default()
{
write!(f, " (")?;
if !step.precedence.is_none() {
write!(f, " {}", step.precedence)?;
if let Some(associativity) = step.associativity {
if step.precedence.is_none() {
write!(f, " ({associativity:?})")?;
} else {
write!(f, " ({} {associativity:?})", step.precedence)?;
}
if let Some(associativity) = step.associativity {
write!(f, " {associativity:?}")?;
}
if step.reserved_word_set_id != ReservedWordSetId::default() {
write!(f, "reserved: {}", step.reserved_word_set_id)?;
}
write!(f, " )")?;
} else if !step.precedence.is_none() {
write!(f, " ({})", step.precedence)?;
}
}
write!(f, " ")?;
if step.symbol.is_terminal() {
if let Some(variable) = self.2.variables.get(step.symbol.index) {
write!(f, "{}", variable.name)?;
write!(f, "{}", &variable.name)?;
} else {
write!(f, "terminal-{}", step.symbol.index)?;
}
} else if step.symbol.is_external() {
write!(f, "{}", self.1.external_tokens[step.symbol.index].name)?;
write!(f, "{}", &self.1.external_tokens[step.symbol.index].name)?;
} else {
write!(f, "{}", self.1.variables[step.symbol.index].name)?;
write!(f, "{}", &self.1.variables[step.symbol.index].name)?;
}
if let Some(alias) = &step.alias {
@ -254,33 +243,7 @@ impl fmt::Display for ParseItemDisplay<'_> {
}
}
const fn escape_invisible(c: char) -> Option<&'static str> {
Some(match c {
'\n' => "\\n",
'\r' => "\\r",
'\t' => "\\t",
'\0' => "\\0",
'\\' => "\\\\",
'\x0b' => "\\v",
'\x0c' => "\\f",
_ => return None,
})
}
fn display_variable_name(source: &str) -> String {
source
.chars()
.fold(String::with_capacity(source.len()), |mut acc, c| {
if let Some(esc) = escape_invisible(c) {
acc.push_str(esc);
} else {
acc.push(c);
}
acc
})
}
impl fmt::Display for TokenSetDisplay<'_> {
impl<'a> fmt::Display for TokenSetDisplay<'a> {
fn fmt(&self, f: &mut fmt::Formatter) -> Result<(), fmt::Error> {
write!(f, "[")?;
for (i, symbol) in self.0.iter().enumerate() {
@ -290,14 +253,14 @@ impl fmt::Display for TokenSetDisplay<'_> {
if symbol.is_terminal() {
if let Some(variable) = self.2.variables.get(symbol.index) {
write!(f, "{}", display_variable_name(&variable.name))?;
write!(f, "{}", &variable.name)?;
} else {
write!(f, "terminal-{}", symbol.index)?;
}
} else if symbol.is_external() {
write!(f, "{}", self.1.external_tokens[symbol.index].name)?;
write!(f, "{}", &self.1.external_tokens[symbol.index].name)?;
} else {
write!(f, "{}", self.1.variables[symbol.index].name)?;
write!(f, "{}", &self.1.variables[symbol.index].name)?;
}
}
write!(f, "]")?;
@ -305,29 +268,21 @@ impl fmt::Display for TokenSetDisplay<'_> {
}
}
impl fmt::Display for ParseItemSetDisplay<'_> {
impl<'a> fmt::Display for ParseItemSetDisplay<'a> {
fn fmt(&self, f: &mut fmt::Formatter) -> Result<(), fmt::Error> {
for entry in &self.0.entries {
write!(
for (item, lookaheads) in &self.0.entries {
writeln!(
f,
"{}\t{}",
ParseItemDisplay(&entry.item, self.1, self.2),
TokenSetDisplay(&entry.lookaheads, self.1, self.2),
ParseItemDisplay(item, self.1, self.2),
TokenSetDisplay(lookaheads, self.1, self.2)
)?;
if entry.following_reserved_word_set != ReservedWordSetId::default() {
write!(
f,
"\treserved word set: {}",
entry.following_reserved_word_set
)?;
}
writeln!(f)?;
}
Ok(())
}
}
impl Hash for ParseItem<'_> {
impl<'a> Hash for ParseItem<'a> {
fn hash<H: Hasher>(&self, hasher: &mut H) {
hasher.write_u32(self.variable_index);
hasher.write_u32(self.step_index);
@ -341,7 +296,7 @@ impl Hash for ParseItem<'_> {
// this item, unless any of the following are true:
// * the children have fields
// * the children have aliases
// * the children are hidden and represent rules that have fields.
// * the children are hidden and
// See the docs for `has_preceding_inherited_fields`.
for step in &self.production.steps[0..self.step_index as usize] {
step.alias.hash(hasher);
@ -356,7 +311,7 @@ impl Hash for ParseItem<'_> {
}
}
impl PartialEq for ParseItem<'_> {
impl<'a> PartialEq for ParseItem<'a> {
fn eq(&self, other: &Self) -> bool {
if self.variable_index != other.variable_index
|| self.step_index != other.step_index
@ -393,7 +348,7 @@ impl PartialEq for ParseItem<'_> {
}
}
impl Ord for ParseItem<'_> {
impl<'a> Ord for ParseItem<'a> {
fn cmp(&self, other: &Self) -> Ordering {
self.step_index
.cmp(&other.step_index)
@ -433,26 +388,25 @@ impl Ord for ParseItem<'_> {
}
}
impl PartialOrd for ParseItem<'_> {
impl<'a> PartialOrd for ParseItem<'a> {
fn partial_cmp(&self, other: &Self) -> Option<Ordering> {
Some(self.cmp(other))
}
}
impl Eq for ParseItem<'_> {}
impl<'a> Eq for ParseItem<'a> {}
impl Hash for ParseItemSet<'_> {
impl<'a> Hash for ParseItemSet<'a> {
fn hash<H: Hasher>(&self, hasher: &mut H) {
hasher.write_usize(self.entries.len());
for entry in &self.entries {
entry.item.hash(hasher);
entry.lookaheads.hash(hasher);
entry.following_reserved_word_set.hash(hasher);
for (item, lookaheads) in &self.entries {
item.hash(hasher);
lookaheads.hash(hasher);
}
}
}
impl Hash for ParseItemSetCore<'_> {
impl<'a> Hash for ParseItemSetCore<'a> {
fn hash<H: Hasher>(&self, hasher: &mut H) {
hasher.write_usize(self.entries.len());
for item in &self.entries {

View file

@ -3,9 +3,9 @@ use std::{
fmt,
};
use super::item::{ParseItem, ParseItemDisplay, ParseItemSet, ParseItemSetEntry, TokenSetDisplay};
use crate::{
grammars::{InlinedProductionMap, LexicalGrammar, ReservedWordSetId, SyntaxGrammar},
use super::item::{ParseItem, ParseItemDisplay, ParseItemSet, TokenSetDisplay};
use crate::generate::{
grammars::{InlinedProductionMap, LexicalGrammar, SyntaxGrammar},
rules::{Symbol, SymbolType, TokenSet},
};
@ -15,10 +15,9 @@ struct TransitiveClosureAddition<'a> {
info: FollowSetInfo,
}
#[derive(Clone, Debug, Default, PartialEq, Eq)]
#[derive(Clone, Debug, PartialEq, Eq)]
struct FollowSetInfo {
lookaheads: TokenSet,
reserved_lookaheads: ReservedWordSetId,
propagates_lookaheads: bool,
}
@ -26,7 +25,6 @@ pub struct ParseItemSetBuilder<'a> {
syntax_grammar: &'a SyntaxGrammar,
lexical_grammar: &'a LexicalGrammar,
first_sets: HashMap<Symbol, TokenSet>,
reserved_first_sets: HashMap<Symbol, ReservedWordSetId>,
last_sets: HashMap<Symbol, TokenSet>,
inlines: &'a InlinedProductionMap,
transitive_closure_additions: Vec<Vec<TransitiveClosureAddition<'a>>>,
@ -48,7 +46,6 @@ impl<'a> ParseItemSetBuilder<'a> {
syntax_grammar,
lexical_grammar,
first_sets: HashMap::new(),
reserved_first_sets: HashMap::new(),
last_sets: HashMap::new(),
inlines,
transitive_closure_additions: vec![Vec::new(); syntax_grammar.variables.len()],
@ -57,7 +54,8 @@ impl<'a> ParseItemSetBuilder<'a> {
// For each grammar symbol, populate the FIRST and LAST sets: the set of
// terminals that appear at the beginning and end that symbol's productions,
// respectively.
// For a terminal symbol, the FIRST and LAST sets just consist of the
//
// For a terminal symbol, the FIRST and LAST set just consists of the
// terminal itself.
for i in 0..lexical_grammar.variables.len() {
let symbol = Symbol::terminal(i);
@ -65,9 +63,6 @@ impl<'a> ParseItemSetBuilder<'a> {
set.insert(symbol);
result.first_sets.insert(symbol, set.clone());
result.last_sets.insert(symbol, set);
result
.reserved_first_sets
.insert(symbol, ReservedWordSetId::default());
}
for i in 0..syntax_grammar.external_tokens.len() {
@ -76,15 +71,12 @@ impl<'a> ParseItemSetBuilder<'a> {
set.insert(symbol);
result.first_sets.insert(symbol, set.clone());
result.last_sets.insert(symbol, set);
result
.reserved_first_sets
.insert(symbol, ReservedWordSetId::default());
}
// The FIRST set of a non-terminal `i` is the union of the FIRST sets
// of all the symbols that appear at the beginnings of i's productions. Some
// of these symbols may themselves be non-terminals, so this is a recursive
// definition.
// The FIRST set of a non-terminal `i` is the union of the following sets:
// * the set of all terminals that appear at the beginnings of i's productions
// * the FIRST sets of all the non-terminals that appear at the beginnings of i's
// productions
//
// Rather than computing these sets using recursion, we use an explicit stack
// called `symbols_to_process`.
@ -92,36 +84,37 @@ impl<'a> ParseItemSetBuilder<'a> {
let mut processed_non_terminals = HashSet::new();
for i in 0..syntax_grammar.variables.len() {
let symbol = Symbol::non_terminal(i);
let first_set = result.first_sets.entry(symbol).or_default();
let reserved_first_set = result.reserved_first_sets.entry(symbol).or_default();
let first_set = result
.first_sets
.entry(symbol)
.or_insert_with(TokenSet::new);
processed_non_terminals.clear();
symbols_to_process.clear();
symbols_to_process.push(symbol);
while let Some(sym) = symbols_to_process.pop() {
for production in &syntax_grammar.variables[sym.index].productions {
if let Some(step) = production.steps.first() {
if step.symbol.is_terminal() || step.symbol.is_external() {
first_set.insert(step.symbol);
} else if processed_non_terminals.insert(step.symbol) {
while let Some(current_symbol) = symbols_to_process.pop() {
if current_symbol.is_terminal() || current_symbol.is_external() {
first_set.insert(current_symbol);
} else if processed_non_terminals.insert(current_symbol) {
for production in &syntax_grammar.variables[current_symbol.index].productions {
if let Some(step) = production.steps.first() {
symbols_to_process.push(step.symbol);
}
*reserved_first_set = (*reserved_first_set).max(step.reserved_word_set_id);
}
}
}
// The LAST set is defined in a similar way to the FIRST set.
let last_set = result.last_sets.entry(symbol).or_default();
let last_set = result.last_sets.entry(symbol).or_insert_with(TokenSet::new);
processed_non_terminals.clear();
symbols_to_process.clear();
symbols_to_process.push(symbol);
while let Some(sym) = symbols_to_process.pop() {
for production in &syntax_grammar.variables[sym.index].productions {
if let Some(step) = production.steps.last() {
if step.symbol.is_terminal() || step.symbol.is_external() {
last_set.insert(step.symbol);
} else if processed_non_terminals.insert(step.symbol) {
while let Some(current_symbol) = symbols_to_process.pop() {
if current_symbol.is_terminal() || current_symbol.is_external() {
last_set.insert(current_symbol);
} else if processed_non_terminals.insert(current_symbol) {
for production in &syntax_grammar.variables[current_symbol.index].productions {
if let Some(step) = production.steps.last() {
symbols_to_process.push(step.symbol);
}
}
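
To make the worklist-based FIRST/LAST computation in the comments above concrete, here is a minimal standalone sketch. The types are simplified stand-ins (chars for terminals, indices for non-terminals), not the real tree-sitter structures:

```rust
use std::collections::HashSet;

// Simplified stand-ins for the grammar types; illustrative only.
#[derive(Clone, Copy)]
enum Sym {
    Terminal(char),
    NonTerminal(usize),
}

struct Variable {
    productions: Vec<Vec<Sym>>, // each production is a sequence of symbols
}

/// Compute the FIRST set of every non-terminal with an explicit stack and a
/// `processed` set, mirroring the worklist loop sketched above.
fn first_sets(variables: &[Variable]) -> Vec<HashSet<char>> {
    let mut result = vec![HashSet::new(); variables.len()];
    for i in 0..variables.len() {
        let mut processed = HashSet::new();
        let mut stack = vec![Sym::NonTerminal(i)];
        while let Some(sym) = stack.pop() {
            match sym {
                Sym::Terminal(t) => {
                    result[i].insert(t);
                }
                Sym::NonTerminal(v) => {
                    // Only expand each non-terminal once, so cycles terminate.
                    if processed.insert(v) {
                        for production in &variables[v].productions {
                            if let Some(&first) = production.first() {
                                stack.push(first);
                            }
                        }
                    }
                }
            }
        }
    }
    result
}

fn main() {
    // expr -> term '+' expr | term ;  term -> '(' expr ')' | 'x'
    let grammar = vec![
        Variable {
            productions: vec![
                vec![Sym::NonTerminal(1), Sym::Terminal('+'), Sym::NonTerminal(0)],
                vec![Sym::NonTerminal(1)],
            ],
        },
        Variable {
            productions: vec![
                vec![Sym::Terminal('('), Sym::NonTerminal(0), Sym::Terminal(')')],
                vec![Sym::Terminal('x')],
            ],
        },
    ];
    // FIRST(expr) = FIRST(term) = { '(', 'x' }
    println!("{:?}", first_sets(&grammar));
}
```

The LAST sets follow the same shape with `production.steps.last()` in place of `first()`.
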
@ -131,75 +124,67 @@ impl<'a> ParseItemSetBuilder<'a> {
// To compute an item set's transitive closure, we find each item in the set
// whose next symbol is a non-terminal, and we add new items to the set for
// each of that symbol's productions. These productions might themselves begin
// each of that symbols' productions. These productions might themselves begin
// with non-terminals, so the process continues recursively. In this process,
// the total set of entries that get added depends only on two things:
//
// * the non-terminal symbol that occurs next in each item
//
// * the set of terminals that can follow that non-terminal symbol in the item
// * the set of non-terminal symbols that occur at each item's current position
// * the set of terminals that occur after each of these non-terminal symbols
//
// So we can avoid a lot of duplicated recursive work by precomputing, for each
// non-terminal symbol `i`, a final list of *additions* that must be made to an
// item set when symbol `i` occurs as the next symbol in one of its core items.
// The structure of a precomputed *addition* is as follows:
//
// * `item` - the new item that must be added as part of the expansion of the symbol `i`.
//
// * `lookaheads` - the set of possible lookahead tokens that can always come after `item`
// in an expansion of symbol `i`.
//
// * `reserved_lookaheads` - the set of reserved lookahead tokens that can
// always come after `item` in the expansion of symbol `i`.
//
// item set when `i` occurs as the next symbol in one of its core items. The
// structure of an *addition* is as follows:
// * `item` - the new item that must be added as part of the expansion of `i`
// * `lookaheads` - lookahead tokens that can always come after that item in the expansion
// of `i`
// * `propagates_lookaheads` - a boolean indicating whether or not `item` can occur at the
// *end* of the expansion of symbol `i`, so that i's own current lookahead tokens can
// occur after `item`.
// *end* of the expansion of `i`, so that i's own current lookahead tokens can occur
// after `item`.
//
// Rather than computing these additions recursively, we use an explicit stack.
let empty_lookaheads = TokenSet::new();
let mut stack = Vec::new();
let mut follow_set_info_by_non_terminal = HashMap::<usize, FollowSetInfo>::new();
// Again, rather than computing these additions recursively, we use an explicit
// stack called `entries_to_process`.
for i in 0..syntax_grammar.variables.len() {
let empty_lookaheads = TokenSet::new();
let mut entries_to_process = vec![(i, &empty_lookaheads, true)];
// First, build up a map whose keys are all of the non-terminals that can
// appear at the beginning of non-terminal `i`, and whose values store
// information about the tokens that can follow those non-terminals.
stack.clear();
stack.push((i, &empty_lookaheads, ReservedWordSetId::default(), true));
follow_set_info_by_non_terminal.clear();
while let Some((sym_ix, lookaheads, reserved_word_set_id, propagates_lookaheads)) =
stack.pop()
{
let mut did_add = false;
let info = follow_set_info_by_non_terminal.entry(sym_ix).or_default();
did_add |= info.lookaheads.insert_all(lookaheads);
if reserved_word_set_id > info.reserved_lookaheads {
info.reserved_lookaheads = reserved_word_set_id;
did_add = true;
}
did_add |= propagates_lookaheads && !info.propagates_lookaheads;
info.propagates_lookaheads |= propagates_lookaheads;
if !did_add {
continue;
// information about the tokens that can follow each non-terminal.
let mut follow_set_info_by_non_terminal = HashMap::new();
while let Some(entry) = entries_to_process.pop() {
let (variable_index, lookaheads, propagates_lookaheads) = entry;
let existing_info = follow_set_info_by_non_terminal
.entry(variable_index)
.or_insert_with(|| FollowSetInfo {
lookaheads: TokenSet::new(),
propagates_lookaheads: false,
});
let did_add_follow_set_info;
if propagates_lookaheads {
did_add_follow_set_info = !existing_info.propagates_lookaheads;
existing_info.propagates_lookaheads = true;
} else {
did_add_follow_set_info = existing_info.lookaheads.insert_all(lookaheads);
}
for production in &syntax_grammar.variables[sym_ix].productions {
if let Some(symbol) = production.first_symbol() {
if symbol.is_non_terminal() {
if let Some(next_step) = production.steps.get(1) {
stack.push((
symbol.index,
&result.first_sets[&next_step.symbol],
result.reserved_first_sets[&next_step.symbol],
false,
));
} else {
stack.push((
symbol.index,
lookaheads,
reserved_word_set_id,
propagates_lookaheads,
));
if did_add_follow_set_info {
for production in &syntax_grammar.variables[variable_index].productions {
if let Some(symbol) = production.first_symbol() {
if symbol.is_non_terminal() {
if production.steps.len() == 1 {
entries_to_process.push((
symbol.index,
lookaheads,
propagates_lookaheads,
));
} else {
entries_to_process.push((
symbol.index,
&result.first_sets[&production.steps[1].symbol],
false,
));
}
}
}
}
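
The key to this loop terminating on cyclic grammars is that each pop merges new facts into the per-non-terminal follow info and only keeps traversing when something actually changed. A minimal standalone sketch of that merge step (a simplified analogue of `FollowSetInfo`, not the real types):

```rust
use std::collections::{HashMap, HashSet};

// Simplified analogue of `FollowSetInfo`; illustrative only.
#[derive(Default)]
struct FollowInfo {
    lookaheads: HashSet<char>,
    propagates_lookaheads: bool,
}

/// Merge new follow-set facts for one non-terminal, reporting whether anything
/// changed. The `did_add` / `did_add_follow_set_info` checks above play the
/// same role: they stop the explicit stack from revisiting settled entries.
fn merge(
    map: &mut HashMap<usize, FollowInfo>,
    non_terminal: usize,
    lookaheads: &HashSet<char>,
    propagates: bool,
) -> bool {
    let info = map.entry(non_terminal).or_default();
    let before = info.lookaheads.len();
    info.lookaheads.extend(lookaheads.iter().copied());
    let mut changed = info.lookaheads.len() != before;
    if propagates && !info.propagates_lookaheads {
        info.propagates_lookaheads = true;
        changed = true;
    }
    changed
}

fn main() {
    let mut map = HashMap::new();
    let lookaheads = HashSet::from(['+', ')']);
    assert!(merge(&mut map, 0, &lookaheads, false)); // new facts: keep traversing
    assert!(!merge(&mut map, 0, &lookaheads, false)); // nothing new: stop here
    assert!(merge(&mut map, 0, &lookaheads, true)); // the propagation flag is new
}
```
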
@ -209,7 +194,7 @@ impl<'a> ParseItemSetBuilder<'a> {
// Store all of those non-terminals' productions, along with their associated
// lookahead info, as *additions* associated with non-terminal `i`.
let additions_for_non_terminal = &mut result.transitive_closure_additions[i];
for (&variable_index, follow_set_info) in &follow_set_info_by_non_terminal {
for (variable_index, follow_set_info) in follow_set_info_by_non_terminal {
let variable = &syntax_grammar.variables[variable_index];
let non_terminal = Symbol::non_terminal(variable_index);
let variable_index = variable_index as u32;
@ -252,25 +237,22 @@ impl<'a> ParseItemSetBuilder<'a> {
result
}
pub fn transitive_closure(&self, item_set: &ParseItemSet<'a>) -> ParseItemSet<'a> {
pub fn transitive_closure(&mut self, item_set: &ParseItemSet<'a>) -> ParseItemSet<'a> {
let mut result = ParseItemSet::default();
for entry in &item_set.entries {
for (item, lookaheads) in &item_set.entries {
if let Some(productions) = self
.inlines
.inlined_productions(entry.item.production, entry.item.step_index)
.inlined_productions(item.production, item.step_index)
{
for production in productions {
self.add_item(
&mut result,
&ParseItemSetEntry {
item: entry.item.substitute_production(production),
lookaheads: entry.lookaheads.clone(),
following_reserved_word_set: entry.following_reserved_word_set,
},
item.substitute_production(production),
lookaheads,
);
}
} else {
self.add_item(&mut result, entry);
self.add_item(&mut result, *item, lookaheads);
}
}
result
@ -280,68 +262,34 @@ impl<'a> ParseItemSetBuilder<'a> {
&self.first_sets[symbol]
}
pub fn reserved_first_set(&self, symbol: &Symbol) -> Option<&TokenSet> {
let id = *self.reserved_first_sets.get(symbol)?;
Some(&self.syntax_grammar.reserved_word_sets[id.0])
}
pub fn last_set(&self, symbol: &Symbol) -> &TokenSet {
&self.last_sets[symbol]
}
fn add_item(&self, set: &mut ParseItemSet<'a>, entry: &ParseItemSetEntry<'a>) {
if let Some(step) = entry.item.step() {
fn add_item(&self, set: &mut ParseItemSet<'a>, item: ParseItem<'a>, lookaheads: &TokenSet) {
if let Some(step) = item.step() {
if step.symbol.is_non_terminal() {
let next_step = entry.item.successor().step();
let next_step = item.successor().step();
// Determine which tokens can follow this non-terminal.
let (following_tokens, following_reserved_tokens) =
if let Some(next_step) = next_step {
(
self.first_sets.get(&next_step.symbol).unwrap(),
*self.reserved_first_sets.get(&next_step.symbol).unwrap(),
)
} else {
(&entry.lookaheads, entry.following_reserved_word_set)
};
let following_tokens = next_step.map_or(lookaheads, |next_step| {
self.first_sets.get(&next_step.symbol).unwrap()
});
// Use the pre-computed *additions* to expand the non-terminal.
for addition in &self.transitive_closure_additions[step.symbol.index] {
let entry = set.insert(addition.item);
entry.lookaheads.insert_all(&addition.info.lookaheads);
if let Some(word_token) = self.syntax_grammar.word_token {
if addition.info.lookaheads.contains(&word_token) {
entry.following_reserved_word_set = entry
.following_reserved_word_set
.max(addition.info.reserved_lookaheads);
}
}
let lookaheads = set.insert(addition.item, &addition.info.lookaheads);
if addition.info.propagates_lookaheads {
entry.lookaheads.insert_all(following_tokens);
if let Some(word_token) = self.syntax_grammar.word_token {
if following_tokens.contains(&word_token) {
entry.following_reserved_word_set = entry
.following_reserved_word_set
.max(following_reserved_tokens);
}
}
lookaheads.insert_all(following_tokens);
}
}
}
}
let e = set.insert(entry.item);
e.lookaheads.insert_all(&entry.lookaheads);
e.following_reserved_word_set = e
.following_reserved_word_set
.max(entry.following_reserved_word_set);
set.insert(item, lookaheads);
}
}
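
Read together with the precomputed additions, `add_item` above boils down to: insert each addition with its fixed lookaheads, and let additions that can end the expansion also absorb whatever follows the core item. A small standalone sketch of that merge (item ids and char lookaheads stand in for the real types):

```rust
use std::collections::{BTreeMap, BTreeSet};

// Illustrative stand-ins: an "item" is just an id, lookaheads are chars.
type Item = u32;
type Lookaheads = BTreeSet<char>;

struct Addition {
    item: Item,
    lookaheads: Lookaheads,
    propagates_lookaheads: bool,
}

/// Expand one core item using a precomputed additions table: fixed lookaheads
/// come from the addition itself; additions that can end the expansion also
/// inherit the tokens that follow the core item.
fn expand(
    set: &mut BTreeMap<Item, Lookaheads>,
    additions: &[Addition],
    following_tokens: &Lookaheads,
) {
    for addition in additions {
        let lookaheads = set.entry(addition.item).or_default();
        lookaheads.extend(addition.lookaheads.iter().copied());
        if addition.propagates_lookaheads {
            lookaheads.extend(following_tokens.iter().copied());
        }
    }
}

fn main() {
    let additions = [
        Addition { item: 1, lookaheads: BTreeSet::from(['+']), propagates_lookaheads: false },
        Addition { item: 2, lookaheads: BTreeSet::new(), propagates_lookaheads: true },
    ];
    let following = BTreeSet::from([')']);
    let mut set = BTreeMap::new();
    expand(&mut set, &additions, &following);
    assert_eq!(set[&1], BTreeSet::from(['+'])); // fixed lookaheads only
    assert_eq!(set[&2], BTreeSet::from([')'])); // inherited following tokens
}
```
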
impl fmt::Debug for ParseItemSetBuilder<'_> {
impl<'a> fmt::Debug for ParseItemSetBuilder<'a> {
fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
writeln!(f, "ParseItemSetBuilder {{")?;

View file

@ -3,15 +3,14 @@ use std::{
mem,
};
use log::debug;
use log::info;
use super::token_conflicts::TokenConflictMap;
use crate::{
use crate::generate::{
dedup::split_state_id_groups,
grammars::{LexicalGrammar, SyntaxGrammar, VariableType},
rules::{AliasMap, Symbol, TokenSet},
tables::{GotoAction, ParseAction, ParseState, ParseStateId, ParseTable, ParseTableEntry},
OptLevel,
};
pub fn minimize_parse_table(
@ -21,7 +20,6 @@ pub fn minimize_parse_table(
simple_aliases: &AliasMap,
token_conflict_map: &TokenConflictMap,
keywords: &TokenSet,
optimizations: OptLevel,
) {
let mut minimizer = Minimizer {
parse_table,
@ -31,9 +29,7 @@ pub fn minimize_parse_table(
keywords,
simple_aliases,
};
if optimizations.contains(OptLevel::MergeStates) {
minimizer.merge_compatible_states();
}
minimizer.merge_compatible_states();
minimizer.remove_unit_reductions();
minimizer.remove_unused_states();
minimizer.reorder_states_by_descending_size();
@ -48,7 +44,7 @@ struct Minimizer<'a> {
simple_aliases: &'a AliasMap,
}
impl Minimizer<'_> {
impl<'a> Minimizer<'a> {
fn remove_unit_reductions(&mut self) {
let mut aliased_symbols = HashSet::new();
for variable in &self.syntax_grammar.variables {
@ -74,17 +70,18 @@ impl Minimizer<'_> {
production_id: 0,
symbol,
..
} if !self.simple_aliases.contains_key(symbol)
&& !self.syntax_grammar.supertype_symbols.contains(symbol)
&& !self.syntax_grammar.extra_symbols.contains(symbol)
&& !aliased_symbols.contains(symbol)
&& self.syntax_grammar.variables[symbol.index].kind
!= VariableType::Named
&& (unit_reduction_symbol.is_none()
|| unit_reduction_symbol == Some(symbol)) =>
{
unit_reduction_symbol = Some(symbol);
continue;
} => {
if !self.simple_aliases.contains_key(symbol)
&& !self.syntax_grammar.supertype_symbols.contains(symbol)
&& !aliased_symbols.contains(symbol)
&& self.syntax_grammar.variables[symbol.index].kind
!= VariableType::Named
&& (unit_reduction_symbol.is_none()
|| unit_reduction_symbol == Some(symbol))
{
unit_reduction_symbol = Some(symbol);
continue;
}
}
_ => {}
}
@ -155,7 +152,9 @@ impl Minimizer<'_> {
&mut group_ids_by_state_id,
0,
|left, right, groups| self.state_successors_differ(left, right, groups),
) {}
) {
continue;
}
let error_group_index = state_ids_by_group_id
.iter()
@ -172,12 +171,17 @@ impl Minimizer<'_> {
let mut new_states = Vec::with_capacity(state_ids_by_group_id.len());
for state_ids in &state_ids_by_group_id {
// Initialize the new state based on the first old state in the group.
let mut parse_state = mem::take(&mut self.parse_table.states[state_ids[0]]);
let mut parse_state = ParseState::default();
mem::swap(&mut parse_state, &mut self.parse_table.states[state_ids[0]]);
// Extend the new state with all of the actions from the other old states
// in the group.
for state_id in &state_ids[1..] {
let other_parse_state = mem::take(&mut self.parse_table.states[*state_id]);
let mut other_parse_state = ParseState::default();
mem::swap(
&mut other_parse_state,
&mut self.parse_table.states[*state_id],
);
parse_state
.terminal_entries
@ -185,12 +189,6 @@ impl Minimizer<'_> {
parse_state
.nonterminal_entries
.extend(other_parse_state.nonterminal_entries);
parse_state
.reserved_words
.insert_all(&other_parse_state.reserved_words);
for symbol in parse_state.terminal_entries.keys() {
parse_state.reserved_words.remove(symbol);
}
}
// Update the new state's outgoing references using the new grouping.
@ -219,14 +217,24 @@ impl Minimizer<'_> {
) {
return true;
}
} else if self.token_conflicts(left_state.id, right_state.id, right_state, *token) {
} else if self.token_conflicts(
left_state.id,
right_state.id,
right_state.terminal_entries.keys(),
*token,
) {
return true;
}
}
for token in right_state.terminal_entries.keys() {
if !left_state.terminal_entries.contains_key(token)
&& self.token_conflicts(left_state.id, right_state.id, left_state, *token)
&& self.token_conflicts(
left_state.id,
right_state.id,
left_state.terminal_entries.keys(),
*token,
)
{
return true;
}
@ -248,7 +256,7 @@ impl Minimizer<'_> {
let group1 = group_ids_by_state_id[*s1];
let group2 = group_ids_by_state_id[*s2];
if group1 != group2 {
debug!(
info!(
"split states {} {} - successors for {} are split: {s1} {s2}",
state1.id,
state2.id,
@ -264,12 +272,12 @@ impl Minimizer<'_> {
for (symbol, s1) in &state1.nonterminal_entries {
if let Some(s2) = state2.nonterminal_entries.get(symbol) {
match (s1, s2) {
(GotoAction::ShiftExtra, GotoAction::ShiftExtra) => {}
(GotoAction::ShiftExtra, GotoAction::ShiftExtra) => continue,
(GotoAction::Goto(s1), GotoAction::Goto(s2)) => {
let group1 = group_ids_by_state_id[*s1];
let group2 = group_ids_by_state_id[*s2];
if group1 != group2 {
debug!(
info!(
"split states {} {} - successors for {} are split: {s1} {s2}",
state1.id,
state2.id,
@ -299,14 +307,16 @@ impl Minimizer<'_> {
let actions1 = &entry1.actions;
let actions2 = &entry2.actions;
if actions1.len() != actions2.len() {
debug!(
info!(
"split states {state_id1} {state_id2} - differing action counts for token {}",
self.symbol_name(token)
);
return true;
}
for (action1, action2) in actions1.iter().zip(actions2.iter()) {
for (i, action1) in actions1.iter().enumerate() {
let action2 = &actions2[i];
// Two shift actions are equivalent if their destinations are in the same group.
if let (
ParseAction::Shift {
@ -324,13 +334,13 @@ impl Minimizer<'_> {
if group1 == group2 && is_repetition1 == is_repetition2 {
continue;
}
debug!(
info!(
"split states {state_id1} {state_id2} - successors for {} are split: {s1} {s2}",
self.symbol_name(token),
);
return true;
} else if action1 != action2 {
debug!(
info!(
"split states {state_id1} {state_id2} - unequal actions for {}",
self.symbol_name(token),
);
@ -341,32 +351,28 @@ impl Minimizer<'_> {
false
}
fn token_conflicts(
fn token_conflicts<'b>(
&self,
left_id: ParseStateId,
right_id: ParseStateId,
right_state: &ParseState,
existing_tokens: impl Iterator<Item = &'b Symbol>,
new_token: Symbol,
) -> bool {
if new_token == Symbol::end_of_nonterminal_extra() {
debug!("split states {left_id} {right_id} - end of non-terminal extra",);
info!("split states {left_id} {right_id} - end of non-terminal extra",);
return true;
}
// Do not add external tokens; they could conflict lexically with any of the state's
// existing lookahead tokens.
if new_token.is_external() {
debug!(
info!(
"split states {left_id} {right_id} - external token {}",
self.symbol_name(&new_token),
);
return true;
}
if right_state.reserved_words.contains(&new_token) {
return false;
}
// Do not add tokens which are both internal and external. Their validity could
// influence the behavior of the external scanner.
if self
@ -375,7 +381,7 @@ impl Minimizer<'_> {
.iter()
.any(|external| external.corresponding_internal_token == Some(new_token))
{
debug!(
info!(
"split states {left_id} {right_id} - internal/external token {}",
self.symbol_name(&new_token),
);
@ -383,30 +389,23 @@ impl Minimizer<'_> {
}
// Do not add a token if it conflicts with an existing token.
for token in right_state.terminal_entries.keys().copied() {
if !token.is_terminal() {
continue;
}
if self.syntax_grammar.word_token == Some(token) && self.keywords.contains(&new_token) {
continue;
}
if self.syntax_grammar.word_token == Some(new_token) && self.keywords.contains(&token) {
continue;
}
if self
.token_conflict_map
.does_conflict(new_token.index, token.index)
|| self
for token in existing_tokens {
if token.is_terminal()
&& !(self.syntax_grammar.word_token == Some(*token)
&& self.keywords.contains(&new_token))
&& !(self.syntax_grammar.word_token == Some(new_token)
&& self.keywords.contains(token))
&& (self
.token_conflict_map
.does_match_same_string(new_token.index, token.index)
.does_conflict(new_token.index, token.index)
|| self
.token_conflict_map
.does_match_same_string(new_token.index, token.index))
{
debug!(
"split states {} {} - token {} conflicts with {}",
left_id,
right_id,
info!(
"split states {left_id} {right_id} - token {} conflicts with {}",
self.symbol_name(&new_token),
self.symbol_name(&token),
self.symbol_name(token),
);
return true;
}
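
The keyword/word-token exception in this loop is the subtle part: a candidate token that lexically overlaps only with the word token is still acceptable when it is a known keyword, because keywords are lexed through the keyword table anyway. A standalone sketch of that predicate (integer token ids and the `can_add_token`/`conflicts` names are illustrative, not the real API):

```rust
use std::collections::HashSet;

/// Decide whether `new_token` may be added next to `existing_tokens` when two
/// parse states are merged. `conflicts` stands in for the lexical conflict
/// check; `word_token` and `keywords` model the keyword-extraction exception.
fn can_add_token(
    new_token: u32,
    existing_tokens: &[u32],
    word_token: Option<u32>,
    keywords: &HashSet<u32>,
    conflicts: impl Fn(u32, u32) -> bool,
) -> bool {
    existing_tokens.iter().all(|&token| {
        let keyword_exception = (word_token == Some(token) && keywords.contains(&new_token))
            || (word_token == Some(new_token) && keywords.contains(&token));
        keyword_exception || !conflicts(new_token, token)
    })
}

fn main() {
    let keywords = HashSet::from([7]);
    // Token 7 (a keyword) overlaps with token 1 (the identifier word token),
    // but the keyword exception still allows the merge.
    assert!(can_add_token(7, &[1], Some(1), &keywords, |a, b| a == 7 && b == 1));
    // A genuine conflict with an ordinary token blocks it.
    assert!(!can_add_token(7, &[2], Some(1), &keywords, |_, _| true));
}
```
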

View file

@ -8,32 +8,30 @@ mod token_conflicts;
use std::collections::{BTreeSet, HashMap};
use anyhow::Result;
pub use build_lex_table::LARGE_CHARACTER_RANGE_COUNT;
use build_parse_table::BuildTableResult;
pub use build_parse_table::ParseTableBuilderError;
use log::{debug, info};
use log::info;
use self::{
build_lex_table::build_lex_table,
build_parse_table::{build_parse_table, ParseStateInfo},
coincident_tokens::CoincidentTokenIndex,
item_set_builder::ParseItemSetBuilder,
minimize_parse_table::minimize_parse_table,
token_conflicts::TokenConflictMap,
};
use crate::{
use crate::generate::{
grammars::{InlinedProductionMap, LexicalGrammar, SyntaxGrammar},
nfa::{CharacterSet, NfaCursor},
node_types::VariableInfo,
rules::{AliasMap, Symbol, SymbolType, TokenSet},
tables::{LexTable, ParseAction, ParseTable, ParseTableEntry},
OptLevel,
};
pub struct Tables {
pub parse_table: ParseTable,
pub main_lex_table: LexTable,
pub keyword_lex_table: LexTable,
pub word_token: Option<Symbol>,
pub large_character_sets: Vec<(Option<Symbol>, CharacterSet)>,
}
@ -44,17 +42,9 @@ pub fn build_tables(
variable_info: &[VariableInfo],
inlines: &InlinedProductionMap,
report_symbol_name: Option<&str>,
optimizations: OptLevel,
) -> BuildTableResult<Tables> {
let item_set_builder = ParseItemSetBuilder::new(syntax_grammar, lexical_grammar, inlines);
let following_tokens =
get_following_tokens(syntax_grammar, lexical_grammar, inlines, &item_set_builder);
let (mut parse_table, parse_state_info) = build_parse_table(
syntax_grammar,
lexical_grammar,
item_set_builder,
variable_info,
)?;
) -> Result<Tables> {
let (mut parse_table, following_tokens, parse_state_info) =
build_parse_table(syntax_grammar, lexical_grammar, inlines, variable_info)?;
let token_conflict_map = TokenConflictMap::new(lexical_grammar, following_tokens);
let coincident_token_index = CoincidentTokenIndex::new(&parse_table, lexical_grammar);
let keywords = identify_keywords(
@ -80,7 +70,6 @@ pub fn build_tables(
simple_aliases,
&token_conflict_map,
&keywords,
optimizations,
);
let lex_tables = build_lex_table(
&mut parse_table,
@ -103,59 +92,15 @@ pub fn build_tables(
);
}
if parse_table.states.len() > u16::MAX as usize {
Err(ParseTableBuilderError::StateCount(parse_table.states.len()))?;
}
Ok(Tables {
parse_table,
main_lex_table: lex_tables.main_lex_table,
keyword_lex_table: lex_tables.keyword_lex_table,
large_character_sets: lex_tables.large_character_sets,
word_token: syntax_grammar.word_token,
})
}
fn get_following_tokens(
syntax_grammar: &SyntaxGrammar,
lexical_grammar: &LexicalGrammar,
inlines: &InlinedProductionMap,
builder: &ParseItemSetBuilder,
) -> Vec<TokenSet> {
let mut result = vec![TokenSet::new(); lexical_grammar.variables.len()];
let productions = syntax_grammar
.variables
.iter()
.flat_map(|v| &v.productions)
.chain(&inlines.productions);
let all_tokens = (0..result.len())
.map(Symbol::terminal)
.collect::<TokenSet>();
for production in productions {
for i in 1..production.steps.len() {
let left_tokens = builder.last_set(&production.steps[i - 1].symbol);
let right_tokens = builder.first_set(&production.steps[i].symbol);
let right_reserved_tokens = builder.reserved_first_set(&production.steps[i].symbol);
for left_token in left_tokens.iter() {
if left_token.is_terminal() {
result[left_token.index].insert_all_terminals(right_tokens);
if let Some(reserved_tokens) = right_reserved_tokens {
result[left_token.index].insert_all_terminals(reserved_tokens);
}
}
}
}
}
for extra in &syntax_grammar.extra_symbols {
if extra.is_terminal() {
for entry in &mut result {
entry.insert(*extra);
}
result[extra.index] = all_tokens.clone();
}
}
result
}
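
The pairing of LAST and FIRST sets in `get_following_tokens` above generalizes nicely; here is a standalone sketch with char tokens and index non-terminals (illustrative types, not the real ones):

```rust
use std::collections::{HashMap, HashSet};

// Illustrative stand-ins: tokens are chars, non-terminals are indices with
// precomputed FIRST/LAST sets.
#[derive(Clone, Copy)]
enum Sym {
    Token(char),
    NonTerminal(usize),
}

/// For every token, collect the tokens that can immediately follow it by
/// pairing the LAST set of each step with the FIRST set of the next step.
fn following_tokens(
    productions: &[Vec<Sym>],
    first: &HashMap<usize, HashSet<char>>,
    last: &HashMap<usize, HashSet<char>>,
) -> HashMap<char, HashSet<char>> {
    fn tokens_of(sym: Sym, sets: &HashMap<usize, HashSet<char>>) -> HashSet<char> {
        match sym {
            Sym::Token(t) => HashSet::from([t]),
            Sym::NonTerminal(v) => sets[&v].clone(),
        }
    }
    let mut result: HashMap<char, HashSet<char>> = HashMap::new();
    for production in productions {
        for pair in production.windows(2) {
            let left_tokens = tokens_of(pair[0], last);
            let right_tokens = tokens_of(pair[1], first);
            for left in left_tokens {
                result.entry(left).or_default().extend(right_tokens.iter().copied());
            }
        }
    }
    result
}

fn main() {
    // Production:  'x'  <expr>  ';'   with FIRST(expr) = {'('}, LAST(expr) = {')'}
    let first = HashMap::from([(0, HashSet::from(['(']))]);
    let last = HashMap::from([(0, HashSet::from([')']))]);
    let productions = vec![vec![Sym::Token('x'), Sym::NonTerminal(0), Sym::Token(';')]];
    let result = following_tokens(&productions, &first, &last);
    assert_eq!(result[&'x'], HashSet::from(['('])); // '(' can follow 'x'
    assert_eq!(result[&')'], HashSet::from([';'])); // ';' can follow ')'
}
```
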
fn populate_error_state(
parse_table: &mut ParseTable,
syntax_grammar: &SyntaxGrammar,
@ -179,7 +124,7 @@ fn populate_error_state(
if conflicts_with_other_tokens {
None
} else {
debug!(
info!(
"error recovery - token {} has no conflicts",
lexical_grammar.variables[i].name
);
@ -205,14 +150,14 @@ fn populate_error_state(
!coincident_token_index.contains(symbol, *t)
&& token_conflict_map.does_conflict(symbol.index, t.index)
}) {
debug!(
info!(
"error recovery - exclude token {} because of conflict with {}",
lexical_grammar.variables[i].name, lexical_grammar.variables[t.index].name
);
continue;
}
}
debug!(
info!(
"error recovery - include token {}",
lexical_grammar.variables[i].name
);
@ -263,7 +208,7 @@ fn populate_used_symbols(
// ensure that a subtree's symbol can be successfully reassigned to the word token
// without having to move the subtree to the heap.
// See https://github.com/tree-sitter/tree-sitter/issues/258
if syntax_grammar.word_token.is_some_and(|t| t.index == i) {
if syntax_grammar.word_token.map_or(false, |t| t.index == i) {
parse_table.symbols.insert(1, Symbol::terminal(i));
} else {
parse_table.symbols.push(Symbol::terminal(i));
@ -345,7 +290,7 @@ fn identify_keywords(
&& token_conflict_map.does_match_same_string(i, word_token.index)
&& !token_conflict_map.does_match_different_string(i, word_token.index)
{
debug!(
info!(
"Keywords - add candidate {}",
lexical_grammar.variables[i].name
);
@ -364,7 +309,7 @@ fn identify_keywords(
if other_token != *token
&& token_conflict_map.does_match_same_string(other_token.index, token.index)
{
debug!(
info!(
"Keywords - exclude {} because it matches the same string as {}",
lexical_grammar.variables[token.index].name,
lexical_grammar.variables[other_token.index].name
@ -406,7 +351,7 @@ fn identify_keywords(
word_token.index,
other_index,
) {
debug!(
info!(
"Keywords - exclude {} because of conflict with {}",
lexical_grammar.variables[token.index].name,
lexical_grammar.variables[other_index].name
@ -415,7 +360,7 @@ fn identify_keywords(
}
}
debug!(
info!(
"Keywords - include {}",
lexical_grammar.variables[token.index].name,
);
@ -469,9 +414,9 @@ fn report_state_info<'a>(
for (i, state) in parse_table.states.iter().enumerate() {
all_state_indices.insert(i);
let item_set = &parse_state_info[state.id];
for entry in &item_set.1.entries {
if !entry.item.is_augmented() {
symbols_with_state_indices[entry.item.variable_index as usize]
for (item, _) in &item_set.1.entries {
if !item.is_augmented() {
symbols_with_state_indices[item.variable_index as usize]
.1
.insert(i);
}
@ -487,14 +432,14 @@ fn report_state_info<'a>(
.max()
.unwrap();
for (symbol, states) in &symbols_with_state_indices {
info!(
eprintln!(
"{:width$}\t{}",
syntax_grammar.variables[symbol.index].name,
states.len(),
width = max_symbol_name_length
);
}
info!("");
eprintln!();
let state_indices = if report_symbol_name == "*" {
Some(&all_state_indices)
@ -517,27 +462,22 @@ fn report_state_info<'a>(
for state_index in state_indices {
let id = parse_table.states[state_index].id;
let (preceding_symbols, item_set) = &parse_state_info[id];
info!("state index: {state_index}");
info!("state id: {id}");
info!(
"symbol sequence: {}",
preceding_symbols
.iter()
.map(|symbol| {
if symbol.is_terminal() {
lexical_grammar.variables[symbol.index].name.clone()
} else if symbol.is_external() {
syntax_grammar.external_tokens[symbol.index].name.clone()
} else {
syntax_grammar.variables[symbol.index].name.clone()
}
})
.collect::<Vec<_>>()
.join(" ")
);
info!(
eprintln!("state index: {state_index}");
eprintln!("state id: {id}");
eprint!("symbol sequence:");
for symbol in preceding_symbols {
let name = if symbol.is_terminal() {
&lexical_grammar.variables[symbol.index].name
} else if symbol.is_external() {
&syntax_grammar.external_tokens[symbol.index].name
} else {
&syntax_grammar.variables[symbol.index].name
};
eprint!(" {name}");
}
eprintln!(
"\nitems:\n{}",
item::ParseItemSetDisplay(item_set, syntax_grammar, lexical_grammar),
self::item::ParseItemSetDisplay(item_set, syntax_grammar, lexical_grammar,),
);
}
}

View file

@ -1,6 +1,6 @@
use std::{cmp::Ordering, collections::HashSet, fmt};
use crate::{
use crate::generate::{
build_tables::item::TokenSetDisplay,
grammars::{LexicalGrammar, SyntaxGrammar},
nfa::{CharacterSet, NfaCursor, NfaTransition},
@ -28,7 +28,7 @@ pub struct TokenConflictMap<'a> {
impl<'a> TokenConflictMap<'a> {
/// Create a token conflict map based on a lexical grammar, which describes the structure
/// of each token, and a `following_token` map, which indicates which tokens may appear
/// each token, and a `following_token` map, which indicates which tokens may appear
/// immediately after each other token.
///
/// This analyzes the possible kinds of overlap between each pair of tokens and stores
@ -145,7 +145,7 @@ impl<'a> TokenConflictMap<'a> {
}
}
impl fmt::Debug for TokenConflictMap<'_> {
impl<'a> fmt::Debug for TokenConflictMap<'a> {
fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
writeln!(f, "TokenConflictMap {{")?;
@ -373,7 +373,7 @@ fn compute_conflict_status(
#[cfg(test)]
mod tests {
use super::*;
use crate::{
use crate::generate::{
grammars::{Variable, VariableType},
prepare_grammar::{expand_tokens, ExtractedLexicalGrammar},
rules::{Precedence, Rule, Symbol},

View file

@ -3,7 +3,7 @@ pub fn split_state_id_groups<S>(
state_ids_by_group_id: &mut Vec<Vec<usize>>,
group_ids_by_state_id: &mut [usize],
start_group_id: usize,
mut should_split: impl FnMut(&S, &S, &[usize]) -> bool,
mut f: impl FnMut(&S, &S, &[usize]) -> bool,
) -> bool {
let mut result = false;
@ -33,7 +33,7 @@ pub fn split_state_id_groups<S>(
}
let right_state = &states[right_state_id];
if should_split(left_state, right_state, group_ids_by_state_id) {
if f(left_state, right_state, group_ids_by_state_id) {
split_state_ids.push(right_state_id);
}
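
For orientation, this helper is one step of a partition-refinement loop: groups are split whenever two members disagree, and the caller re-runs it until no split happens. A compact standalone sketch of the same idea (simplified signature, not the real helper):

```rust
/// Repeatedly split groups whenever two members disagree according to
/// `should_split`, until a fixpoint is reached. Returns the final group count.
fn refine<S>(
    states: &[S],
    group_ids: &mut [usize],
    mut should_split: impl FnMut(&S, &S, &[usize]) -> bool,
) -> usize {
    let mut group_count = group_ids.iter().max().map_or(0, |&g| g + 1);
    loop {
        let mut changed = false;
        for group_id in 0..group_count {
            let members: Vec<usize> = (0..states.len())
                .filter(|&i| group_ids[i] == group_id)
                .collect();
            let Some((&leader, rest)) = members.split_first() else { continue };
            let mut new_group = None;
            for &member in rest {
                if should_split(&states[leader], &states[member], group_ids) {
                    // Members that disagree with the group's first state move
                    // together into one freshly allocated group.
                    let id = *new_group.get_or_insert_with(|| {
                        group_count += 1;
                        group_count - 1
                    });
                    group_ids[member] = id;
                    changed = true;
                }
            }
        }
        if !changed {
            return group_count;
        }
    }
}

fn main() {
    // Four "states" described only by a number; equivalent iff the numbers match.
    let states = [1, 1, 2, 2];
    let mut group_ids = vec![0; states.len()];
    let group_count = refine(&states, &mut group_ids, |a, b, _| a != b);
    assert_eq!(group_count, 2);
    assert_eq!(group_ids, vec![0, 0, 1, 1]);
}
```

In the parse-table minimizer, `should_split` is where the successor-group comparison and the token-conflict checks shown earlier plug in.
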

View file

@ -16,7 +16,6 @@ function alias(rule, value) {
result.value = value.symbol.name;
return result;
case Object:
case GrammarSymbol:
if (typeof value.type === 'string' && value.type === 'SYMBOL') {
result.named = true;
result.value = value.name;
@ -70,7 +69,7 @@ function prec(number, rule) {
};
}
prec.left = function (number, rule) {
prec.left = function(number, rule) {
if (rule == null) {
rule = number;
number = 0;
@ -92,7 +91,7 @@ prec.left = function (number, rule) {
};
}
prec.right = function (number, rule) {
prec.right = function(number, rule) {
if (rule == null) {
rule = number;
number = 0;
@ -114,7 +113,7 @@ prec.right = function (number, rule) {
};
}
prec.dynamic = function (number, rule) {
prec.dynamic = function(number, rule) {
checkPrecedence(number);
checkArguments(
arguments,
@ -154,26 +153,11 @@ function seq(...elements) {
};
}
class GrammarSymbol {
constructor(name) {
this.type = "SYMBOL";
this.name = name;
}
}
function reserved(wordset, rule) {
if (typeof wordset !== 'string') {
throw new Error('Invalid reserved word set name: ' + wordset)
}
return {
type: "RESERVED",
content: normalize(rule),
context_name: wordset,
}
}
function sym(name) {
return new GrammarSymbol(name);
return {
type: "SYMBOL",
name
};
}
function token(value) {
@ -184,7 +168,7 @@ function token(value) {
};
}
token.immediate = function (value) {
token.immediate = function(value) {
checkArguments(arguments, arguments.length, token.immediate, 'token.immediate', '', 'literal');
return {
type: "IMMEDIATE_TOKEN",
@ -211,11 +195,6 @@ function normalize(value) {
type: 'PATTERN',
value: value.source
};
case RustRegex:
return {
type: 'PATTERN',
value: value.value
};
case ReferenceError:
throw value
default:
@ -257,7 +236,6 @@ function grammar(baseGrammar, options) {
inline: [],
supertypes: [],
precedences: [],
reserved: {},
};
} else {
baseGrammar = baseGrammar.grammar;
@ -331,28 +309,6 @@ function grammar(baseGrammar, options) {
}
}
let reserved = baseGrammar.reserved;
if (options.reserved) {
if (typeof options.reserved !== "object") {
throw new Error("Grammar's 'reserved' property must be an object.");
}
for (const reservedWordSetName of Object.keys(options.reserved)) {
const reservedWordSetFn = options.reserved[reservedWordSetName]
if (typeof reservedWordSetFn !== "function") {
throw new Error(`Grammar reserved word sets must all be functions. '${reservedWordSetName}' is not.`);
}
const reservedTokens = reservedWordSetFn.call(ruleBuilder, ruleBuilder, baseGrammar.reserved[reservedWordSetName]);
if (!Array.isArray(reservedTokens)) {
throw new Error(`Grammar's reserved word set functions must all return arrays of rules. '${reservedWordSetName}' does not.`);
}
reserved[reservedWordSetName] = reservedTokens.map(normalize);
}
}
let extras = baseGrammar.extras.slice();
if (options.extras) {
if (typeof options.extras !== "function") {
@ -483,17 +439,10 @@ function grammar(baseGrammar, options) {
externals,
inline,
supertypes,
reserved,
},
};
}
class RustRegex {
constructor(value) {
this.value = value;
}
}
function checkArguments(args, ruleCount, caller, callerName, suffix = '', argType = 'rule') {
// Allow for .map() usage where additional arguments are index and the entire array.
const isMapCall = ruleCount === 3 && typeof args[1] === 'number' && Array.isArray(args[2]);
@ -517,7 +466,6 @@ function checkPrecedence(value) {
}
function getEnv(name) {
if (globalThis.native) return globalThis.__ts_grammar_path;
if (globalThis.process) return process.env[name]; // Node/Bun
if (globalThis.Deno) return Deno.env.get(name); // Deno
throw Error("Unsupported JS runtime");
@ -530,31 +478,16 @@ globalThis.optional = optional;
globalThis.prec = prec;
globalThis.repeat = repeat;
globalThis.repeat1 = repeat1;
globalThis.reserved = reserved;
globalThis.seq = seq;
globalThis.sym = sym;
globalThis.token = token;
globalThis.grammar = grammar;
globalThis.field = field;
globalThis.RustRegex = RustRegex;
const grammarPath = getEnv("TREE_SITTER_GRAMMAR_PATH");
let result = await import(grammarPath);
let grammarObj = result.default?.grammar ?? result.grammar;
const result = await import(getEnv("TREE_SITTER_GRAMMAR_PATH"));
const output = JSON.stringify(result.default?.grammar ?? result.grammar);
if (globalThis.native && !grammarObj) {
grammarObj = module.exports.grammar;
}
const object = {
"$schema": "https://tree-sitter.github.io/tree-sitter/assets/schemas/grammar.schema.json",
...grammarObj,
};
const output = JSON.stringify(object);
if (globalThis.native) {
globalThis.output = output;
} else if (globalThis.process) { // Node/Bun
if (globalThis.process) { // Node/Bun
process.stdout.write(output);
} else if (globalThis.Deno) { // Deno
Deno.stdout.writeSync(new TextEncoder().encode(output));

View file

@ -1,6 +1,6 @@
{
"$schema": "http://json-schema.org/draft-07/schema#",
"title": "Tree-sitter grammar specification",
"title": "tree-sitter grammar specification",
"type": "object",
"required": ["name", "rules"],
@ -8,18 +8,14 @@
"additionalProperties": false,
"properties": {
"$schema": {
"type": "string"
},
"name": {
"description": "The name of the grammar",
"description": "the name of the grammar",
"type": "string",
"pattern": "^[a-zA-Z_]\\w*"
},
"inherits": {
"description": "The name of the parent grammar",
"description": "the name of the parent grammar",
"type": "string",
"pattern": "^[a-zA-Z_]\\w*"
},
@ -36,7 +32,6 @@
"extras": {
"type": "array",
"uniqueItems": true,
"items": {
"$ref": "#/definitions/rule"
}
@ -44,36 +39,16 @@
"precedences": {
"type": "array",
"uniqueItems": true,
"items": {
"type": "array",
"uniqueItems": true,
"items": {
"oneOf": [
{ "type": "string" },
{ "$ref": "#/definitions/symbol-rule" }
]
"$ref": "#/definitions/rule"
}
}
},
"reserved": {
"type": "object",
"patternProperties": {
"^[a-zA-Z_]\\w*$": {
"type": "array",
"uniqueItems": true,
"items": {
"$ref": "#/definitions/rule"
}
}
},
"additionalProperties": false
},
"externals": {
"type": "array",
"uniqueItems": true,
"items": {
"$ref": "#/definitions/rule"
}
@ -81,7 +56,6 @@
"inline": {
"type": "array",
"uniqueItems": true,
"items": {
"type": "string",
"pattern": "^[a-zA-Z_]\\w*$"
@ -90,10 +64,8 @@
"conflicts": {
"type": "array",
"uniqueItems": true,
"items": {
"type": "array",
"uniqueItems": true,
"items": {
"type": "string",
"pattern": "^[a-zA-Z_]\\w*$"
@ -107,11 +79,10 @@
},
"supertypes": {
"description": "A list of hidden rule names that should be considered supertypes in the generated node types file. See https://tree-sitter.github.io/tree-sitter/using-parsers/6-static-node-types.",
"description": "A list of hidden rule names that should be considered supertypes in the generated node types file. See https://tree-sitter.github.io/tree-sitter/using-parsers#static-node-types.",
"type": "array",
"uniqueItems": true,
"items": {
"description": "The name of a rule in `rules` or `extras`",
"description": "the name of a rule in `rules` or `extras`",
"type": "string"
}
}
@ -123,7 +94,7 @@
"properties": {
"type": {
"type": "string",
"const": "BLANK"
"pattern": "^BLANK$"
}
},
"required": ["type"]
@ -134,7 +105,7 @@
"properties": {
"type": {
"type": "string",
"const": "STRING"
"pattern": "^STRING$"
},
"value": {
"type": "string"
@ -148,7 +119,7 @@
"properties": {
"type": {
"type": "string",
"const": "PATTERN"
"pattern": "^PATTERN$"
},
"value": { "type": "string" },
"flags": { "type": "string" }
@ -161,7 +132,7 @@
"properties": {
"type": {
"type": "string",
"const": "SYMBOL"
"pattern": "^SYMBOL$"
},
"name": { "type": "string" }
},
@ -173,7 +144,7 @@
"properties": {
"type": {
"type": "string",
"const": "SEQ"
"pattern": "^SEQ$"
},
"members": {
"type": "array",
@ -190,7 +161,7 @@
"properties": {
"type": {
"type": "string",
"const": "CHOICE"
"pattern": "^CHOICE$"
},
"members": {
"type": "array",
@ -207,10 +178,14 @@
"properties": {
"type": {
"type": "string",
"const": "ALIAS"
"pattern": "^ALIAS$"
},
"value": {
"type": "string"
},
"named": {
"type": "boolean"
},
"value": { "type": "string" },
"named": { "type": "boolean" },
"content": {
"$ref": "#/definitions/rule"
}
@ -223,7 +198,7 @@
"properties": {
"type": {
"type": "string",
"const": "REPEAT"
"pattern": "^REPEAT$"
},
"content": {
"$ref": "#/definitions/rule"
@ -237,7 +212,7 @@
"properties": {
"type": {
"type": "string",
"const": "REPEAT1"
"pattern": "^REPEAT1$"
},
"content": {
"$ref": "#/definitions/rule"
@ -246,30 +221,12 @@
"required": ["type", "content"]
},
"reserved-rule": {
"type": "object",
"properties": {
"type": {
"type": "string",
"const": "RESERVED"
},
"context_name": { "type": "string" },
"content": {
"$ref": "#/definitions/rule"
}
},
"required": ["type", "context_name", "content"]
},
"token-rule": {
"type": "object",
"properties": {
"type": {
"type": "string",
"enum": [
"TOKEN",
"IMMEDIATE_TOKEN"
]
"pattern": "^(TOKEN|IMMEDIATE_TOKEN)$"
},
"content": {
"$ref": "#/definitions/rule"
@ -283,7 +240,7 @@
"name": { "type": "string" },
"type": {
"type": "string",
"const": "FIELD"
"pattern": "^FIELD$"
},
"content": {
"$ref": "#/definitions/rule"
@ -297,12 +254,7 @@
"properties": {
"type": {
"type": "string",
"enum": [
"PREC",
"PREC_LEFT",
"PREC_RIGHT",
"PREC_DYNAMIC"
]
"pattern": "^(PREC|PREC_LEFT|PREC_RIGHT|PREC_DYNAMIC)$"
},
"value": {
"oneof": [
@ -328,7 +280,6 @@
{ "$ref": "#/definitions/choice-rule" },
{ "$ref": "#/definitions/repeat1-rule" },
{ "$ref": "#/definitions/repeat-rule" },
{ "$ref": "#/definitions/reserved-rule" },
{ "$ref": "#/definitions/token-rule" },
{ "$ref": "#/definitions/field-rule" },
{ "$ref": "#/definitions/prec-rule" }

View file

@ -0,0 +1,685 @@
use std::{
fs,
fs::File,
io::BufReader,
path::{Path, PathBuf},
str,
};
use anyhow::{anyhow, Context, Result};
use heck::{ToKebabCase, ToShoutySnakeCase, ToSnakeCase, ToUpperCamelCase};
use indoc::indoc;
use serde::Deserialize;
use serde_json::{json, Map, Value};
use super::write_file;
const CLI_VERSION: &str = env!("CARGO_PKG_VERSION");
const CLI_VERSION_PLACEHOLDER: &str = "CLI_VERSION";
const PARSER_NAME_PLACEHOLDER: &str = "PARSER_NAME";
const CAMEL_PARSER_NAME_PLACEHOLDER: &str = "CAMEL_PARSER_NAME";
const UPPER_PARSER_NAME_PLACEHOLDER: &str = "UPPER_PARSER_NAME";
const LOWER_PARSER_NAME_PLACEHOLDER: &str = "LOWER_PARSER_NAME";
const GRAMMAR_JS_TEMPLATE: &str = include_str!("./templates/grammar.js");
const PACKAGE_JSON_TEMPLATE: &str = include_str!("./templates/package.json");
const GITIGNORE_TEMPLATE: &str = include_str!("./templates/gitignore");
const GITATTRIBUTES_TEMPLATE: &str = include_str!("./templates/gitattributes");
const EDITORCONFIG_TEMPLATE: &str = include_str!("./templates/.editorconfig");
const RUST_BINDING_VERSION: &str = env!("CARGO_PKG_VERSION");
const RUST_BINDING_VERSION_PLACEHOLDER: &str = "RUST_BINDING_VERSION";
const LIB_RS_TEMPLATE: &str = include_str!("./templates/lib.rs");
const BUILD_RS_TEMPLATE: &str = include_str!("./templates/build.rs");
const CARGO_TOML_TEMPLATE: &str = include_str!("./templates/_cargo.toml");
const INDEX_JS_TEMPLATE: &str = include_str!("./templates/index.js");
const INDEX_D_TS_TEMPLATE: &str = include_str!("./templates/index.d.ts");
const JS_BINDING_CC_TEMPLATE: &str = include_str!("./templates/js-binding.cc");
const BINDING_GYP_TEMPLATE: &str = include_str!("./templates/binding.gyp");
const BINDING_TEST_JS_TEMPLATE: &str = include_str!("./templates/binding_test.js");
const MAKEFILE_TEMPLATE: &str = include_str!("./templates/makefile");
const PARSER_NAME_H_TEMPLATE: &str = include_str!("./templates/PARSER_NAME.h");
const PARSER_NAME_PC_IN_TEMPLATE: &str = include_str!("./templates/PARSER_NAME.pc.in");
const GO_MOD_TEMPLATE: &str = include_str!("./templates/go.mod");
const BINDING_GO_TEMPLATE: &str = include_str!("./templates/binding.go");
const BINDING_TEST_GO_TEMPLATE: &str = include_str!("./templates/binding_test.go");
const SETUP_PY_TEMPLATE: &str = include_str!("./templates/setup.py");
const INIT_PY_TEMPLATE: &str = include_str!("./templates/__init__.py");
const INIT_PYI_TEMPLATE: &str = include_str!("./templates/__init__.pyi");
const PYPROJECT_TOML_TEMPLATE: &str = include_str!("./templates/pyproject.toml");
const PY_BINDING_C_TEMPLATE: &str = include_str!("./templates/py-binding.c");
const TEST_BINDING_PY_TEMPLATE: &str = include_str!("./templates/test_binding.py");
const PACKAGE_SWIFT_TEMPLATE: &str = include_str!("./templates/package.swift");
const TESTS_SWIFT_TEMPLATE: &str = include_str!("./templates/tests.swift");
#[derive(Deserialize, Debug)]
struct LanguageConfiguration {}
#[derive(Deserialize, Debug)]
pub struct PackageJSON {
#[serde(rename = "tree-sitter")]
tree_sitter: Option<Vec<LanguageConfiguration>>,
}
pub fn path_in_ignore(repo_path: &Path) -> bool {
[
"bindings",
"build",
"examples",
"node_modules",
"queries",
"script",
"src",
"target",
"test",
"types",
]
.iter()
.any(|dir| repo_path.ends_with(dir))
}
fn insert_after(
map: Map<String, Value>,
after: &str,
key: &str,
value: Value,
) -> Map<String, Value> {
let mut entries = map.into_iter().collect::<Vec<_>>();
let after_index = entries
.iter()
.position(|(k, _)| k == after)
.unwrap_or(entries.len() - 1)
+ 1;
entries.insert(after_index, (key.to_string(), value));
entries.into_iter().collect()
}
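
A standalone demonstration of the ordering behavior this helper relies on. Keeping key order assumes serde_json's `preserve_order` feature is enabled (an assumption in this sketch; without it, `Map` is sorted and the insertion position is ignored):

```rust
use serde_json::{json, Map, Value};

// Local copy of the helper above, so the demo is self-contained.
fn insert_after(map: Map<String, Value>, after: &str, key: &str, value: Value) -> Map<String, Value> {
    let mut entries: Vec<_> = map.into_iter().collect();
    let index = entries
        .iter()
        .position(|(k, _)| k == after)
        .unwrap_or(entries.len() - 1)
        + 1;
    entries.insert(index, (key.to_string(), value));
    entries.into_iter().collect()
}

fn main() {
    let package_json = json!({
        "name": "tree-sitter-example",
        "main": "bindings/node",
        "keywords": ["parser"]
    });
    let map = package_json.as_object().cloned().unwrap();
    let updated = insert_after(map, "main", "types", "bindings/node".into());
    let keys: Vec<&str> = updated.keys().map(String::as_str).collect();
    // With `preserve_order`, the new key lands right after "main".
    assert_eq!(keys, ["name", "main", "types", "keywords"]);
}
```
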
pub fn generate_grammar_files(
repo_path: &Path,
language_name: &str,
generate_bindings: bool,
) -> Result<()> {
let dashed_language_name = language_name.to_kebab_case();
// TODO: remove legacy code updates in v0.24.0
// Create or update package.json
let package_json_path_state = missing_path_else(
repo_path.join("package.json"),
|path| generate_file(path, PACKAGE_JSON_TEMPLATE, dashed_language_name.as_str()),
|path| {
let package_json_str =
fs::read_to_string(path).with_context(|| "Failed to read package.json")?;
let mut package_json = serde_json::from_str::<Map<String, Value>>(&package_json_str)
.with_context(|| "Failed to parse package.json")?;
if generate_bindings {
let mut updated = false;
let dependencies = package_json
.entry("dependencies".to_string())
.or_insert_with(|| Value::Object(Map::new()))
.as_object_mut()
.unwrap();
if dependencies.remove("nan").is_some() {
eprintln!("Replacing nan dependency with node-addon-api in package.json");
dependencies.insert("node-addon-api".to_string(), "^8.0.0".into());
updated = true;
}
if !dependencies.contains_key("node-gyp-build") {
eprintln!("Adding node-gyp-build dependency to package.json");
dependencies.insert("node-gyp-build".to_string(), "^4.8.1".into());
updated = true;
}
let dev_dependencies = package_json
.entry("devDependencies".to_string())
.or_insert_with(|| Value::Object(Map::new()))
.as_object_mut()
.unwrap();
if !dev_dependencies.contains_key("prebuildify") {
eprintln!("Adding prebuildify devDependency to package.json");
dev_dependencies.insert("prebuildify".to_string(), "^6.0.1".into());
updated = true;
}
let node_test = "node --test bindings/node/*_test.js";
let scripts = package_json
.entry("scripts".to_string())
.or_insert_with(|| Value::Object(Map::new()))
.as_object_mut()
.unwrap();
if !scripts.get("test").is_some_and(|v| v == node_test) {
eprintln!("Updating package.json scripts");
*scripts = Map::from_iter([
("install".to_string(), "node-gyp-build".into()),
("prestart".to_string(), "tree-sitter build --wasm".into()),
("start".to_string(), "tree-sitter playground".into()),
("test".to_string(), node_test.into()),
]);
updated = true;
}
// insert `peerDependencies` after `dependencies`
if !package_json.contains_key("peerDependencies") {
eprintln!("Adding peerDependencies to package.json");
package_json = insert_after(
package_json,
"dependencies",
"peerDependencies",
json!({"tree-sitter": "^0.21.1"}),
);
package_json = insert_after(
package_json,
"peerDependencies",
"peerDependenciesMeta",
json!({"tree_sitter": {"optional": true}}),
);
updated = true;
}
// insert `types` right after `main`
if !package_json.contains_key("types") {
eprintln!("Adding types to package.json");
package_json =
insert_after(package_json, "main", "types", "bindings/node".into());
updated = true;
}
// insert `files` right after `keywords`
if !package_json.contains_key("files") {
eprintln!("Adding files to package.json");
package_json = insert_after(
package_json,
"keywords",
"files",
json!([
"grammar.js",
"binding.gyp",
"prebuilds/**",
"bindings/node/*",
"queries/*",
"src/**",
"*.wasm"
]),
);
updated = true;
}
// insert `tree-sitter` at the end
if !package_json.contains_key("tree-sitter") {
eprintln!("Adding a `tree-sitter` section to package.json");
package_json.insert(
"tree-sitter".to_string(),
json!([{
"scope": format!("source.{language_name}"),
"injection-regex": format!("^{language_name}$"),
}]),
);
updated = true;
}
if updated {
let mut package_json_str = serde_json::to_string_pretty(&package_json)?;
package_json_str.push('\n');
write_file(path, package_json_str)?;
}
}
Ok(())
},
)?;
let package_json = match lookup_package_json_for_path(package_json_path_state.as_path()) {
Ok((_, p)) => p,
Err(e) if generate_bindings => return Err(e),
_ => return Ok(()),
};
// Do not create a grammar.js file in a repo with multiple language configs
if !package_json.has_multiple_language_configs() {
missing_path(repo_path.join("grammar.js"), |path| {
generate_file(path, GRAMMAR_JS_TEMPLATE, language_name)
})?;
}
if !generate_bindings {
// our job is done
return Ok(());
}
// Write .gitignore file
missing_path(repo_path.join(".gitignore"), |path| {
generate_file(path, GITIGNORE_TEMPLATE, language_name)
})?;
// Write .gitattributes file
missing_path(repo_path.join(".gitattributes"), |path| {
generate_file(path, GITATTRIBUTES_TEMPLATE, language_name)
})?;
// Write .editorconfig file
missing_path(repo_path.join(".editorconfig"), |path| {
generate_file(path, EDITORCONFIG_TEMPLATE, language_name)
})?;
let bindings_dir = repo_path.join("bindings");
// Generate Rust bindings
missing_path(bindings_dir.join("rust"), create_dir)?.apply(|path| {
missing_path_else(
path.join("lib.rs"),
|path| generate_file(path, LIB_RS_TEMPLATE, language_name),
|path| {
let lib_rs =
fs::read_to_string(path).with_context(|| "Failed to read lib.rs")?;
if !lib_rs.contains("tree_sitter_language") {
generate_file(path, LIB_RS_TEMPLATE, language_name)?;
eprintln!("Updated lib.rs with `tree_sitter_language` dependency");
}
Ok(())
},
)?;
missing_path_else(
path.join("build.rs"),
|path| generate_file(path, BUILD_RS_TEMPLATE, language_name),
|path| {
let build_rs =
fs::read_to_string(path).with_context(|| "Failed to read build.rs")?;
if !build_rs.contains("-utf-8") {
let index = build_rs
.find(" let parser_path = src_dir.join(\"parser.c\")")
.ok_or_else(|| anyhow!(indoc!{
"Failed to auto-update build.rs with the `/utf-8` flag for windows.
To fix this, remove `bindings/rust/build.rs` and re-run `tree-sitter generate`"}))?;
let build_rs = format!(
"{}{}{}\n{}",
&build_rs[..index],
" #[cfg(target_env = \"msvc\")]\n",
" c_config.flag(\"-utf-8\");\n",
&build_rs[index..]
);
write_file(path, build_rs)?;
eprintln!("Updated build.rs with the /utf-8 flag for Windows compilation");
}
Ok(())
},
)?;
missing_path_else(
repo_path.join("Cargo.toml"),
|path| generate_file(path, CARGO_TOML_TEMPLATE, dashed_language_name.as_str()),
|path| {
let cargo_toml =
fs::read_to_string(path).with_context(|| "Failed to read Cargo.toml")?;
if !cargo_toml.contains("tree-sitter-language") {
let start_index = cargo_toml
.find("tree-sitter = \"")
.ok_or_else(|| anyhow!("Failed to find the `tree-sitter` dependency in Cargo.toml"))?;
let version_start_index = start_index + "tree-sitter = \"".len();
let version_end_index = cargo_toml[version_start_index..]
.find('\"')
.map(|i| i + version_start_index)
.ok_or_else(|| anyhow!("Failed to find the end of the `tree-sitter` version in Cargo.toml"))?;
let cargo_toml = format!(
"{}{}{}\n{}\n{}",
&cargo_toml[..start_index],
"tree-sitter-language = \"0.1.0\"",
&cargo_toml[version_end_index + 1..],
"[dev-dependencies]",
"tree-sitter = \"0.23\"",
);
write_file(path, cargo_toml)?;
eprintln!("Updated Cargo.toml with the `tree-sitter-language` dependency");
}
Ok(())
},
)?;
Ok(())
})?;
// Generate Node bindings
missing_path(bindings_dir.join("node"), create_dir)?.apply(|path| {
missing_path_else(
path.join("index.js"),
|path| generate_file(path, INDEX_JS_TEMPLATE, language_name),
|path| {
let index_js =
fs::read_to_string(path).with_context(|| "Failed to read index.js")?;
if index_js.contains("../../build/Release") {
eprintln!("Replacing index.js with new binding API");
generate_file(path, INDEX_JS_TEMPLATE, language_name)?;
}
Ok(())
},
)?;
missing_path(path.join("index.d.ts"), |path| {
generate_file(path, INDEX_D_TS_TEMPLATE, language_name)
})?;
missing_path(path.join("binding_test.js"), |path| {
generate_file(path, BINDING_TEST_JS_TEMPLATE, language_name)
})?;
missing_path_else(
path.join("binding.cc"),
|path| generate_file(path, JS_BINDING_CC_TEMPLATE, language_name),
|path| {
let binding_cc =
fs::read_to_string(path).with_context(|| "Failed to read binding.cc")?;
if binding_cc.contains("NAN_METHOD(New) {}") {
eprintln!("Replacing binding.cc with new binding API");
generate_file(path, JS_BINDING_CC_TEMPLATE, language_name)?;
}
Ok(())
},
)?;
// Create binding.gyp, or update it with new binding API.
missing_path_else(
repo_path.join("binding.gyp"),
|path| generate_file(path, BINDING_GYP_TEMPLATE, language_name),
|path| {
let binding_gyp =
fs::read_to_string(path).with_context(|| "Failed to read binding.gyp")?;
if binding_gyp.contains("require('nan')") {
eprintln!("Replacing binding.gyp with new binding API");
generate_file(path, BINDING_GYP_TEMPLATE, language_name)?;
}
Ok(())
},
)?;
Ok(())
})?;
// Generate C bindings
missing_path(bindings_dir.join("c"), create_dir)?.apply(|path| {
missing_path(
path.join(format!("tree-sitter-{language_name}.h")),
|path| generate_file(path, PARSER_NAME_H_TEMPLATE, language_name),
)?;
missing_path(
path.join(format!("tree-sitter-{language_name}.pc.in")),
|path| generate_file(path, PARSER_NAME_PC_IN_TEMPLATE, language_name),
)?;
missing_path(repo_path.join("Makefile"), |path| {
generate_file(path, MAKEFILE_TEMPLATE, language_name)
})?;
Ok(())
})?;
// Generate Go bindings
missing_path(bindings_dir.join("go"), create_dir)?.apply(|path| {
missing_path(path.join("binding.go"), |path| {
generate_file(path, BINDING_GO_TEMPLATE, language_name)
})?;
missing_path_else(
path.join("binding_test.go"),
|path| generate_file(path, BINDING_TEST_GO_TEMPLATE, language_name),
|path| {
let binding_test_go =
fs::read_to_string(path).with_context(|| "Failed to read binding_test.go")?;
if binding_test_go.contains("smacker") {
eprintln!("Replacing binding_test.go with new binding API");
generate_file(path, BINDING_TEST_GO_TEMPLATE, language_name)?;
}
Ok(())
},
)?;
// Delete the old go.mod file that lives inside bindings/go, it now lives in the root dir
let go_mod_path = path.join("go.mod");
if go_mod_path.exists() {
fs::remove_file(go_mod_path).with_context(|| "Failed to remove old go.mod file")?;
}
missing_path(repo_path.join("go.mod"), |path| {
generate_file(path, GO_MOD_TEMPLATE, language_name)
})?;
Ok(())
})?;
// Generate Python bindings
missing_path(bindings_dir.join("python"), create_dir)?.apply(|path| {
let lang_path = path.join(format!("tree_sitter_{}", language_name.to_snake_case()));
missing_path(&lang_path, create_dir)?;
missing_path_else(
lang_path.join("binding.c"),
|path| generate_file(path, PY_BINDING_C_TEMPLATE, language_name),
|path| {
let binding_c = fs::read_to_string(path)
.with_context(|| "Failed to read bindings/python/binding.c")?;
if !binding_c.contains("PyCapsule_New") {
eprintln!("Replacing bindings/python/binding.c with new binding API");
generate_file(path, PY_BINDING_C_TEMPLATE, language_name)?;
}
Ok(())
},
)?;
missing_path(lang_path.join("__init__.py"), |path| {
generate_file(path, INIT_PY_TEMPLATE, language_name)
})?;
missing_path(lang_path.join("__init__.pyi"), |path| {
generate_file(path, INIT_PYI_TEMPLATE, language_name)
})?;
missing_path(lang_path.join("py.typed"), |path| {
generate_file(path, "", language_name) // py.typed is empty
})?;
missing_path(path.join("tests"), create_dir)?.apply(|path| {
missing_path(path.join("test_binding.py"), |path| {
generate_file(path, TEST_BINDING_PY_TEMPLATE, language_name)
})?;
Ok(())
})?;
missing_path(repo_path.join("setup.py"), |path| {
generate_file(path, SETUP_PY_TEMPLATE, language_name)
})?;
missing_path(repo_path.join("pyproject.toml"), |path| {
generate_file(path, PYPROJECT_TOML_TEMPLATE, dashed_language_name.as_str())
})?;
Ok(())
})?;
// Generate Swift bindings
missing_path(bindings_dir.join("swift"), create_dir)?.apply(|path| {
let lang_path = path.join(format!("TreeSitter{}", language_name.to_upper_camel_case()));
missing_path(&lang_path, create_dir)?;
missing_path(lang_path.join(format!("{language_name}.h")), |path| {
generate_file(path, PARSER_NAME_H_TEMPLATE, language_name)
})?;
missing_path(
path.join(format!(
"TreeSitter{}Tests",
language_name.to_upper_camel_case()
)),
create_dir,
)?
.apply(|path| {
missing_path(
path.join(format!(
"TreeSitter{}Tests.swift",
language_name.to_upper_camel_case()
)),
|path| generate_file(path, TESTS_SWIFT_TEMPLATE, language_name),
)?;
Ok(())
})?;
missing_path(repo_path.join("Package.swift"), |path| {
generate_file(path, PACKAGE_SWIFT_TEMPLATE, language_name)
})?;
Ok(())
})?;
Ok(())
}
pub fn lookup_package_json_for_path(path: &Path) -> Result<(PathBuf, PackageJSON)> {
let mut pathbuf = path.to_owned();
loop {
let package_json = pathbuf
.exists()
.then(|| -> Result<PackageJSON> {
let file =
File::open(pathbuf.as_path()).with_context(|| "Failed to open package.json")?;
serde_json::from_reader(BufReader::new(file)).context(
"Failed to parse package.json, is the `tree-sitter` section malformed?",
)
})
.transpose()?;
if let Some(package_json) = package_json {
if package_json.tree_sitter.is_some() {
return Ok((pathbuf, package_json));
}
}
pathbuf.pop(); // package.json
if !pathbuf.pop() {
return Err(anyhow!(concat!(
"Failed to locate a package.json file that has a \"tree-sitter\" section,",
" please ensure you have one, and if you don't then consult the docs",
)));
}
pathbuf.push("package.json");
}
}
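
The walk-up loop above is a common pattern: check the current directory, then pop one component and retry until the root is reached. A minimal generic sketch (the `find_upwards` helper is hypothetical, not part of the codebase):

```rust
use std::path::{Path, PathBuf};

/// Walk from `start` toward the filesystem root, returning the first
/// `file_name` for which `matches` holds.
fn find_upwards(
    start: &Path,
    file_name: &str,
    matches: impl Fn(&Path) -> bool,
) -> Option<PathBuf> {
    let mut dir = start.to_path_buf();
    loop {
        let candidate = dir.join(file_name);
        if candidate.exists() && matches(&candidate) {
            return Some(candidate);
        }
        if !dir.pop() {
            return None;
        }
    }
}

fn main() {
    let cwd = std::env::current_dir().expect("no current directory");
    // Find the nearest Cargo.toml that declares a package.
    let found = find_upwards(&cwd, "Cargo.toml", |path| {
        std::fs::read_to_string(path).is_ok_and(|s| s.contains("[package]"))
    });
    println!("{found:?}");
}
```
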
fn generate_file(path: &Path, template: &str, language_name: &str) -> Result<()> {
write_file(
path,
template
.replace(
CAMEL_PARSER_NAME_PLACEHOLDER,
&language_name.to_upper_camel_case(),
)
.replace(
UPPER_PARSER_NAME_PLACEHOLDER,
&language_name.to_shouty_snake_case(),
)
.replace(
LOWER_PARSER_NAME_PLACEHOLDER,
&language_name.to_snake_case(),
)
.replace(PARSER_NAME_PLACEHOLDER, language_name)
.replace(CLI_VERSION_PLACEHOLDER, CLI_VERSION)
.replace(RUST_BINDING_VERSION_PLACEHOLDER, RUST_BINDING_VERSION),
)
}
fn create_dir(path: &Path) -> Result<()> {
fs::create_dir_all(path)
.with_context(|| format!("Failed to create {:?}", path.to_string_lossy()))
}
#[derive(PartialEq, Eq, Debug)]
enum PathState<P>
where
P: AsRef<Path>,
{
Exists(P),
Missing(P),
}
#[allow(dead_code)]
impl<P> PathState<P>
where
P: AsRef<Path>,
{
fn exists(&self, mut action: impl FnMut(&Path) -> Result<()>) -> Result<&Self> {
if let Self::Exists(path) = self {
action(path.as_ref())?;
}
Ok(self)
}
fn missing(&self, mut action: impl FnMut(&Path) -> Result<()>) -> Result<&Self> {
if let Self::Missing(path) = self {
action(path.as_ref())?;
}
Ok(self)
}
fn apply(&self, mut action: impl FnMut(&Path) -> Result<()>) -> Result<&Self> {
action(self.as_path())?;
Ok(self)
}
fn apply_state(&self, mut action: impl FnMut(&Self) -> Result<()>) -> Result<&Self> {
action(self)?;
Ok(self)
}
fn as_path(&self) -> &Path {
match self {
Self::Exists(path) | Self::Missing(path) => path.as_ref(),
}
}
}
fn missing_path<P, F>(path: P, mut action: F) -> Result<PathState<P>>
where
P: AsRef<Path>,
F: FnMut(&Path) -> Result<()>,
{
let path_ref = path.as_ref();
if !path_ref.exists() {
action(path_ref)?;
Ok(PathState::Missing(path))
} else {
Ok(PathState::Exists(path))
}
}
fn missing_path_else<P, T, F>(path: P, mut action: T, mut else_action: F) -> Result<PathState<P>>
where
P: AsRef<Path>,
T: FnMut(&Path) -> Result<()>,
F: FnMut(&Path) -> Result<()>,
{
let path_ref = path.as_ref();
if !path_ref.exists() {
action(path_ref)?;
Ok(PathState::Missing(path))
} else {
else_action(path_ref)?;
Ok(PathState::Exists(path))
}
}
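
The `missing_path`/`PathState` helpers make the scaffolding idempotent: an action only runs when its target does not exist yet, so re-running generation never clobbers files the grammar author has edited. A minimal standalone sketch of the same idea (the helper name and file paths here are illustrative, not the ones generated above):

```rust
use std::{fs, io, path::Path};

// Run `action` only if `path` is missing; report whether anything was created.
fn if_missing<F>(path: &Path, action: F) -> io::Result<bool>
where
    F: FnOnce(&Path) -> io::Result<()>,
{
    if path.exists() {
        Ok(false) // keep whatever is already there
    } else {
        action(path)?;
        Ok(true)
    }
}

fn main() -> io::Result<()> {
    let bindings = Path::new("bindings/python");
    if_missing(bindings, |p| fs::create_dir_all(p))?;
    if_missing(&bindings.join("__init__.py"), |p| fs::write(p, "# generated\n"))?;
    // Running main() a second time is a no-op: nothing is overwritten.
    Ok(())
}
```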
impl PackageJSON {
fn has_multiple_language_configs(&self) -> bool {
self.tree_sitter.as_ref().is_some_and(|c| c.len() > 1)
}
}


@ -2,7 +2,7 @@ use std::{collections::HashMap, fmt};
use super::{
nfa::Nfa,
rules::{Alias, Associativity, Precedence, Rule, Symbol, TokenSet},
rules::{Alias, Associativity, Precedence, Rule, Symbol},
};
#[derive(Clone, Copy, Debug, PartialEq, Eq, PartialOrd, Ord)]
@ -39,13 +39,6 @@ pub struct InputGrammar {
pub variables_to_inline: Vec<String>,
pub supertype_symbols: Vec<String>,
pub word_token: Option<String>,
pub reserved_words: Vec<ReservedWordContext<Rule>>,
}
#[derive(Debug, Default, PartialEq, Eq)]
pub struct ReservedWordContext<T> {
pub name: String,
pub reserved_words: Vec<T>,
}
// Extracted lexical grammar
@ -73,20 +66,8 @@ pub struct ProductionStep {
pub associativity: Option<Associativity>,
pub alias: Option<Alias>,
pub field_name: Option<String>,
pub reserved_word_set_id: ReservedWordSetId,
}
#[derive(Clone, Copy, Debug, Default, PartialEq, Eq, Hash, PartialOrd, Ord)]
pub struct ReservedWordSetId(pub usize);
impl fmt::Display for ReservedWordSetId {
fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
self.0.fmt(f)
}
}
pub const NO_RESERVED_WORDS: ReservedWordSetId = ReservedWordSetId(usize::MAX);
#[derive(Clone, Debug, Default, PartialEq, Eq)]
pub struct Production {
pub steps: Vec<ProductionStep>,
@ -123,44 +104,50 @@ pub struct SyntaxGrammar {
pub variables_to_inline: Vec<Symbol>,
pub word_token: Option<Symbol>,
pub precedence_orderings: Vec<Vec<PrecedenceEntry>>,
pub reserved_word_sets: Vec<TokenSet>,
}
#[cfg(test)]
impl ProductionStep {
#[must_use]
pub fn new(symbol: Symbol) -> Self {
pub const fn new(symbol: Symbol) -> Self {
Self {
symbol,
precedence: Precedence::None,
associativity: None,
alias: None,
field_name: None,
reserved_word_set_id: ReservedWordSetId::default(),
}
}
pub fn with_prec(
mut self,
precedence: Precedence,
associativity: Option<Associativity>,
) -> Self {
self.precedence = precedence;
self.associativity = associativity;
self
pub fn with_prec(self, precedence: Precedence, associativity: Option<Associativity>) -> Self {
Self {
symbol: self.symbol,
precedence,
associativity,
alias: self.alias,
field_name: self.field_name,
}
}
pub fn with_alias(mut self, value: &str, is_named: bool) -> Self {
self.alias = Some(Alias {
value: value.to_string(),
is_named,
});
self
pub fn with_alias(self, value: &str, is_named: bool) -> Self {
Self {
symbol: self.symbol,
precedence: self.precedence,
associativity: self.associativity,
alias: Some(Alias {
value: value.to_string(),
is_named,
}),
field_name: self.field_name,
}
}
pub fn with_field_name(mut self, name: &str) -> Self {
self.field_name = Some(name.to_string());
self
pub fn with_field_name(self, name: &str) -> Self {
Self {
symbol: self.symbol,
precedence: self.precedence,
associativity: self.associativity,
alias: self.alias,
field_name: Some(name.to_string()),
}
}
}
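
This hunk swaps the test-only `ProductionStep` builders between two equivalent styles: mutating `mut self` and returning it, versus rebuilding the struct field by field. A small sketch of the chained, consuming-builder shape (illustrative type, not the real `ProductionStep`):

```rust
#[derive(Debug, Clone)]
struct Step {
    symbol: u32,
    precedence: i32,
    field_name: Option<String>,
}

impl Step {
    const fn new(symbol: u32) -> Self {
        Self { symbol, precedence: 0, field_name: None }
    }

    // `mut self` variant: take ownership, tweak one field, hand it back.
    fn with_prec(mut self, precedence: i32) -> Self {
        self.precedence = precedence;
        self
    }

    fn with_field_name(mut self, name: &str) -> Self {
        self.field_name = Some(name.to_string());
        self
    }
}

fn main() {
    // Chained construction, as in the grammar test helpers above.
    let step = Step::new(3).with_prec(2).with_field_name("body");
    println!("{step:?}");
}
```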
@ -253,7 +240,7 @@ impl InlinedProductionMap {
step_index: u32,
) -> Option<impl Iterator<Item = &'a Production> + 'a> {
self.production_map
.get(&(std::ptr::from_ref::<Production>(production), step_index))
.get(&(production as *const Production, step_index))
.map(|production_indices| {
production_indices
.iter()

cli/src/generate/mod.rs (new file, 273 lines)

@ -0,0 +1,273 @@
use std::{
env, fs,
io::Write,
path::{Path, PathBuf},
process::{Command, Stdio},
};
use anyhow::{anyhow, Context, Result};
use build_tables::build_tables;
use grammar_files::path_in_ignore;
use grammars::InputGrammar;
use lazy_static::lazy_static;
use parse_grammar::parse_grammar;
use prepare_grammar::prepare_grammar;
use regex::{Regex, RegexBuilder};
use render::render_c_code;
use semver::Version;
mod build_tables;
mod dedup;
mod grammar_files;
mod grammars;
mod nfa;
mod node_types;
pub mod parse_grammar;
mod prepare_grammar;
mod render;
mod rules;
mod tables;
pub use grammar_files::lookup_package_json_for_path;
lazy_static! {
static ref JSON_COMMENT_REGEX: Regex = RegexBuilder::new("^\\s*//.*")
.multi_line(true)
.build()
.unwrap();
}
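
`JSON_COMMENT_REGEX` exists because a pre-generated grammar.json may carry `//` comment lines; `generate_parser_for_grammar` blanks them out before handing the text to serde. A small sketch of that replacement with the `regex` crate (the input string is illustrative):

```rust
use regex::RegexBuilder;

fn main() {
    // Same pattern as JSON_COMMENT_REGEX above: match whole `//` comment lines.
    let comment_line = RegexBuilder::new("^\\s*//.*")
        .multi_line(true)
        .build()
        .unwrap();

    let input = "{\n  // produced by tree-sitter generate\n  \"name\": \"my_lang\"\n}";
    let cleaned = comment_line.replace_all(input, "\n");

    assert!(!cleaned.contains("//"));
    assert!(serde_json::from_str::<serde_json::Value>(&cleaned).is_ok());
}
```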
struct GeneratedParser {
c_code: String,
node_types_json: String,
}
pub const ALLOC_HEADER: &str = include_str!("./templates/alloc.h");
pub fn generate_parser_in_directory(
repo_path: &Path,
grammar_path: Option<&str>,
abi_version: usize,
generate_bindings: bool,
report_symbol_name: Option<&str>,
js_runtime: Option<&str>,
) -> Result<()> {
let mut repo_path = repo_path.to_owned();
let mut grammar_path = grammar_path;
// Populate a new empty grammar directory.
if let Some(path) = grammar_path {
let path = PathBuf::from(path);
if !path
.try_exists()
.with_context(|| "Some error with specified path")?
{
fs::create_dir_all(&path)?;
grammar_path = None;
repo_path = path;
}
}
let grammar_path = grammar_path
.map(PathBuf::from)
.unwrap_or(repo_path.join("grammar.js"));
if repo_path.is_dir() && !grammar_path.exists() && !path_in_ignore(&repo_path) {
if let Some(dir_name) = repo_path
.file_name()
.map(|x| x.to_string_lossy().to_ascii_lowercase())
{
if let Some(language_name) = dir_name
.strip_prefix("tree-sitter-")
.or_else(|| Some(dir_name.as_ref()))
{
grammar_files::generate_grammar_files(&repo_path, language_name, false)?;
}
}
}
// Read the grammar file.
let grammar_json = load_grammar_file(&grammar_path, js_runtime)?;
let src_path = repo_path.join("src");
let header_path = src_path.join("tree_sitter");
// Ensure that the output directories exist.
fs::create_dir_all(&src_path)?;
fs::create_dir_all(&header_path)?;
if grammar_path.file_name().unwrap() != "grammar.json" {
fs::write(src_path.join("grammar.json"), &grammar_json)
.with_context(|| format!("Failed to write grammar.json to {src_path:?}"))?;
}
// Parse and preprocess the grammar.
let input_grammar = parse_grammar(&grammar_json)?;
// Generate the parser and related files.
let GeneratedParser {
c_code,
node_types_json,
} = generate_parser_for_grammar_with_opts(&input_grammar, abi_version, report_symbol_name)?;
write_file(&src_path.join("parser.c"), c_code)?;
write_file(&src_path.join("node-types.json"), node_types_json)?;
write_file(&header_path.join("alloc.h"), ALLOC_HEADER)?;
write_file(&header_path.join("array.h"), tree_sitter::ARRAY_HEADER)?;
write_file(&header_path.join("parser.h"), tree_sitter::PARSER_HEADER)?;
if !path_in_ignore(&repo_path) && grammar_path == repo_path.join("grammar.js") {
grammar_files::generate_grammar_files(&repo_path, &input_grammar.name, generate_bindings)?;
}
Ok(())
}
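
For context, a hypothetical invocation of `generate_parser_in_directory` from elsewhere in the CLI; the parameter values are illustrative, and the `tree_sitter_cli::generate` module path is an assumption about how the crate re-exports this module:

```rust
use std::path::Path;

fn main() -> anyhow::Result<()> {
    // Regenerate the parser for the grammar in the current directory,
    // using ./grammar.js, the current ABI version, and Node.js as the runtime.
    tree_sitter_cli::generate::generate_parser_in_directory(
        Path::new("."),
        None,                          // grammar_path: fall back to ./grammar.js
        tree_sitter::LANGUAGE_VERSION, // abi_version
        true,                          // generate_bindings
        None,                          // report_symbol_name
        Some("node"),                  // js_runtime
    )?;
    Ok(())
}
```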
pub fn generate_parser_for_grammar(grammar_json: &str) -> Result<(String, String)> {
let grammar_json = JSON_COMMENT_REGEX.replace_all(grammar_json, "\n");
let input_grammar = parse_grammar(&grammar_json)?;
let parser =
generate_parser_for_grammar_with_opts(&input_grammar, tree_sitter::LANGUAGE_VERSION, None)?;
Ok((input_grammar.name.clone(), parser.c_code))
}
fn generate_parser_for_grammar_with_opts(
input_grammar: &InputGrammar,
abi_version: usize,
report_symbol_name: Option<&str>,
) -> Result<GeneratedParser> {
let (syntax_grammar, lexical_grammar, inlines, simple_aliases) =
prepare_grammar(input_grammar)?;
let variable_info =
node_types::get_variable_info(&syntax_grammar, &lexical_grammar, &simple_aliases)?;
let node_types_json = node_types::generate_node_types_json(
&syntax_grammar,
&lexical_grammar,
&simple_aliases,
&variable_info,
);
let tables = build_tables(
&syntax_grammar,
&lexical_grammar,
&simple_aliases,
&variable_info,
&inlines,
report_symbol_name,
)?;
let c_code = render_c_code(
&input_grammar.name,
tables,
syntax_grammar,
lexical_grammar,
simple_aliases,
abi_version,
);
Ok(GeneratedParser {
c_code,
node_types_json: serde_json::to_string_pretty(&node_types_json).unwrap(),
})
}
pub fn load_grammar_file(grammar_path: &Path, js_runtime: Option<&str>) -> Result<String> {
if grammar_path.is_dir() {
return Err(anyhow!(
"Path to a grammar file with `.js` or `.json` extension is required"
));
}
match grammar_path.extension().and_then(|e| e.to_str()) {
Some("js") => Ok(load_js_grammar_file(grammar_path, js_runtime)
.with_context(|| "Failed to load grammar.js")?),
Some("json") => {
Ok(fs::read_to_string(grammar_path).with_context(|| "Failed to load grammar.json")?)
}
_ => Err(anyhow!("Unknown grammar file extension: {grammar_path:?}",)),
}
}
fn load_js_grammar_file(grammar_path: &Path, js_runtime: Option<&str>) -> Result<String> {
let grammar_path = fs::canonicalize(grammar_path)?;
#[cfg(windows)]
let grammar_path = url::Url::from_file_path(grammar_path)
.expect("Failed to convert path to URL")
.to_string();
let js_runtime = js_runtime.unwrap_or("node");
let mut js_command = Command::new(js_runtime);
match js_runtime {
"node" => {
js_command.args(["--input-type=module", "-"]);
}
"bun" => {
js_command.arg("-");
}
"deno" => {
js_command.args(["run", "--allow-all", "-"]);
}
_ => {}
}
let mut js_process = js_command
.env("TREE_SITTER_GRAMMAR_PATH", grammar_path)
.stdin(Stdio::piped())
.stdout(Stdio::piped())
.spawn()
.with_context(|| format!("Failed to run `{js_runtime}`"))?;
let mut js_stdin = js_process
.stdin
.take()
.with_context(|| format!("Failed to open stdin for {js_runtime}"))?;
let cli_version = Version::parse(env!("CARGO_PKG_VERSION"))
.with_context(|| "Could not parse this package's version as semver.")?;
write!(
js_stdin,
"globalThis.TREE_SITTER_CLI_VERSION_MAJOR = {};
globalThis.TREE_SITTER_CLI_VERSION_MINOR = {};
globalThis.TREE_SITTER_CLI_VERSION_PATCH = {};",
cli_version.major, cli_version.minor, cli_version.patch,
)
.with_context(|| format!("Failed to write tree-sitter version to {js_runtime}'s stdin"))?;
js_stdin
.write(include_bytes!("./dsl.js"))
.with_context(|| format!("Failed to write grammar dsl to {js_runtime}'s stdin"))?;
drop(js_stdin);
let output = js_process
.wait_with_output()
.with_context(|| format!("Failed to read output from {js_runtime}"))?;
match output.status.code() {
None => panic!("{js_runtime} process was killed"),
Some(0) => {
let stdout = String::from_utf8(output.stdout)
.with_context(|| format!("Got invalid UTF8 from {js_runtime}"))?;
let mut grammar_json = &stdout[..];
if let Some(pos) = stdout.rfind('\n') {
// If there's a newline, split the last line from the rest of the output
let node_output = &stdout[..pos];
grammar_json = &stdout[pos + 1..];
let mut stdout = std::io::stdout().lock();
stdout.write_all(node_output.as_bytes())?;
stdout.write_all(b"\n")?;
stdout.flush()?;
}
Ok(serde_json::to_string_pretty(
&serde_json::from_str::<serde_json::Value>(grammar_json)
.with_context(|| "Failed to parse grammar JSON")?,
)
.with_context(|| "Failed to serialize grammar JSON")?
+ "\n")
}
Some(code) => Err(anyhow!("{js_runtime} process exited with status {code}")),
}
}
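
`load_js_grammar_file` relies on a small protocol: whatever the grammar author prints while grammar.js evaluates is forwarded to the terminal, and only the final stdout line (emitted by dsl.js) is treated as the grammar JSON. A sketch of that last-line split:

```rust
// Everything before the final newline is user output; the last line is the JSON.
fn split_grammar_output(stdout: &str) -> (&str, &str) {
    match stdout.rfind('\n') {
        Some(pos) => (&stdout[..pos], &stdout[pos + 1..]),
        None => ("", stdout),
    }
}

fn main() {
    let captured = "debug print from grammar.js\n{\"name\":\"my_lang\",\"rules\":{}}";
    let (user_output, grammar_json) = split_grammar_output(captured);
    assert_eq!(user_output, "debug print from grammar.js");
    assert!(grammar_json.starts_with('{'));
}
```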
fn write_file(path: &Path, body: impl AsRef<[u8]>) -> Result<()> {
fs::write(path, body)
.with_context(|| format!("Failed to write {:?}", path.file_name().unwrap()))
}


@ -58,8 +58,7 @@ impl CharacterSet {
/// Create a character set with a given *inclusive* range of characters.
#[allow(clippy::single_range_in_vec_init)]
#[cfg(test)]
fn from_range(mut first: char, mut last: char) -> Self {
pub fn from_range(mut first: char, mut last: char) -> Self {
if first > last {
swap(&mut first, &mut last);
}
@ -287,8 +286,7 @@ impl CharacterSet {
/// Produces a `CharacterSet` containing every character that is in _exactly one_ of `self` or
/// `other`, but is not present in both sets.
#[cfg(test)]
fn symmetric_difference(mut self, mut other: Self) -> Self {
pub fn symmetric_difference(mut self, mut other: Self) -> Self {
self.remove_intersection(&mut other);
self.add(&other)
}
@ -363,9 +361,9 @@ impl CharacterSet {
}) {
Ok(ix) | Err(ix) => ix,
};
self.ranges
.get(ix)
.is_some_and(|range| range.start <= seek_range.start && range.end >= seek_range.end)
self.ranges.get(ix).map_or(false, |range| {
range.start <= seek_range.start && range.end >= seek_range.end
})
}
pub fn contains(&self, c: char) -> bool {
@ -428,13 +426,11 @@ impl fmt::Debug for CharacterSet {
}
impl Nfa {
#[must_use]
pub const fn new() -> Self {
Self { states: Vec::new() }
}
pub fn last_state_id(&self) -> u32 {
assert!(!self.states.is_empty());
self.states.len() as u32 - 1
}
}
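
The containment check in the `CharacterSet` hunk above works because the ranges are kept sorted and non-overlapping: a binary search on the end points finds the only candidate range that could fully contain the sought range. A standalone sketch of the same lookup over plain integer ranges:

```rust
use std::{cmp::Ordering, ops::Range};

// Find the first range whose end is >= seek.end, then check full containment.
fn contains_range(ranges: &[Range<u32>], seek: &Range<u32>) -> bool {
    let ix = match ranges.binary_search_by(|r| {
        if r.end < seek.end {
            Ordering::Less
        } else {
            Ordering::Greater // never Equal, so we always land on Err(ix)
        }
    }) {
        Ok(ix) | Err(ix) => ix,
    };
    ranges
        .get(ix)
        .is_some_and(|r| r.start <= seek.start && r.end >= seek.end)
}

fn main() {
    let ranges = [10..20, 30..40];
    assert!(contains_range(&ranges, &(12..18)));
    assert!(!contains_range(&ranges, &(18..32)));
}
```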
@ -950,19 +946,20 @@ mod tests {
assert_eq!(
left.remove_intersection(&mut right),
row.intersection,
"row {i}a: {:?} && {:?}",
"row {}a: {:?} && {:?}",
i,
row.left,
row.right
);
assert_eq!(
left, row.left_only,
"row {i}a: {:?} - {:?}",
row.left, row.right
"row {}a: {:?} - {:?}",
i, row.left, row.right
);
assert_eq!(
right, row.right_only,
"row {i}a: {:?} - {:?}",
row.right, row.left
"row {}a: {:?} - {:?}",
i, row.right, row.left
);
let mut left = row.left.clone();
@ -970,25 +967,27 @@ mod tests {
assert_eq!(
right.remove_intersection(&mut left),
row.intersection,
"row {i}b: {:?} && {:?}",
"row {}b: {:?} && {:?}",
i,
row.left,
row.right
);
assert_eq!(
left, row.left_only,
"row {i}b: {:?} - {:?}",
row.left, row.right
"row {}b: {:?} - {:?}",
i, row.left, row.right
);
assert_eq!(
right, row.right_only,
"row {i}b: {:?} - {:?}",
row.right, row.left
"row {}b: {:?} - {:?}",
i, row.right, row.left
);
assert_eq!(
row.left.clone().difference(row.right.clone()),
row.left_only,
"row {i}b: {:?} -- {:?}",
"row {}b: {:?} -- {:?}",
i,
row.left,
row.right
);


@ -1,7 +1,10 @@
use std::collections::{BTreeMap, BTreeSet, HashMap, HashSet};
use std::{
cmp::Ordering,
collections::{BTreeMap, HashMap, HashSet},
};
use anyhow::{anyhow, Result};
use serde::Serialize;
use thiserror::Error;
use super::{
grammars::{LexicalGrammar, SyntaxGrammar, VariableType},
@ -29,15 +32,10 @@ pub struct VariableInfo {
}
#[derive(Debug, Serialize, PartialEq, Eq, Default, PartialOrd, Ord)]
#[cfg(feature = "load")]
pub struct NodeInfoJSON {
#[serde(rename = "type")]
kind: String,
named: bool,
#[serde(skip_serializing_if = "std::ops::Not::not")]
root: bool,
#[serde(skip_serializing_if = "std::ops::Not::not")]
extra: bool,
#[serde(skip_serializing_if = "Option::is_none")]
fields: Option<BTreeMap<String, FieldInfoJSON>>,
#[serde(skip_serializing_if = "Option::is_none")]
@ -47,7 +45,6 @@ pub struct NodeInfoJSON {
}
#[derive(Clone, Debug, Serialize, PartialEq, Eq, PartialOrd, Ord, Hash)]
#[cfg(feature = "load")]
pub struct NodeTypeJSON {
#[serde(rename = "type")]
kind: String,
@ -55,7 +52,6 @@ pub struct NodeTypeJSON {
}
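
One side of this hunk includes `root` and `extra` booleans on `NodeInfoJSON`, kept out of node-types.json when false by using `std::ops::Not::not` as the skip predicate. A minimal sketch of that serde idiom with an illustrative struct (requires serde and serde_json):

```rust
use serde::Serialize;

#[derive(Serialize)]
struct NodeInfo {
    #[serde(rename = "type")]
    kind: String,
    named: bool,
    // `Not::not` works here because serde passes `&bool` and `impl Not for &bool`
    // exists, so the field is skipped exactly when it is `false`.
    #[serde(skip_serializing_if = "std::ops::Not::not")]
    root: bool,
}

fn main() {
    let node = NodeInfo { kind: "identifier".into(), named: true, root: false };
    assert_eq!(
        serde_json::to_string(&node).unwrap(),
        r#"{"type":"identifier","named":true}"#
    );
}
```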
#[derive(Debug, Serialize, PartialEq, Eq, PartialOrd, Ord)]
#[cfg(feature = "load")]
pub struct FieldInfoJSON {
multiple: bool,
required: bool,
@ -69,7 +65,6 @@ pub struct ChildQuantity {
multiple: bool,
}
#[cfg(feature = "load")]
impl Default for FieldInfoJSON {
fn default() -> Self {
Self {
@ -105,7 +100,7 @@ impl ChildQuantity {
}
}
const fn append(&mut self, other: Self) {
fn append(&mut self, other: Self) {
if other.exists {
if self.exists || other.multiple {
self.multiple = true;
@ -117,7 +112,7 @@ impl ChildQuantity {
}
}
const fn union(&mut self, other: Self) -> bool {
fn union(&mut self, other: Self) -> bool {
let mut result = false;
if !self.exists && other.exists {
result = true;
@ -135,14 +130,6 @@ impl ChildQuantity {
}
}
pub type VariableInfoResult<T> = Result<T, VariableInfoError>;
#[derive(Debug, Error, Serialize)]
pub enum VariableInfoError {
#[error("Grammar error: Supertype symbols must always have a single visible child, but `{0}` can have multiple")]
InvalidSupertype(String),
}
/// Compute a summary of the public-facing structure of each variable in the
/// grammar. Each variable in the grammar corresponds to a distinct public-facing
/// node type.
@ -168,7 +155,7 @@ pub fn get_variable_info(
syntax_grammar: &SyntaxGrammar,
lexical_grammar: &LexicalGrammar,
default_aliases: &AliasMap,
) -> VariableInfoResult<Vec<VariableInfo>> {
) -> Result<Vec<VariableInfo>> {
let child_type_is_visible = |t: &ChildType| {
variable_type_for_child_type(t, syntax_grammar, lexical_grammar) >= VariableType::Anonymous
};
@ -349,7 +336,13 @@ pub fn get_variable_info(
for supertype_symbol in &syntax_grammar.supertype_symbols {
if result[supertype_symbol.index].has_multi_step_production {
let variable = &syntax_grammar.variables[supertype_symbol.index];
Err(VariableInfoError::InvalidSupertype(variable.name.clone()))?;
return Err(anyhow!(
concat!(
"Grammar error: Supertype symbols must always ",
"have a single visible child, but `{}` can have multiple"
),
variable.name
));
}
}
@ -374,105 +367,12 @@ pub fn get_variable_info(
Ok(result)
}
fn get_aliases_by_symbol(
syntax_grammar: &SyntaxGrammar,
default_aliases: &AliasMap,
) -> HashMap<Symbol, BTreeSet<Option<Alias>>> {
let mut aliases_by_symbol = HashMap::new();
for (symbol, alias) in default_aliases {
aliases_by_symbol.insert(*symbol, {
let mut aliases = BTreeSet::new();
aliases.insert(Some(alias.clone()));
aliases
});
}
for extra_symbol in &syntax_grammar.extra_symbols {
if !default_aliases.contains_key(extra_symbol) {
aliases_by_symbol
.entry(*extra_symbol)
.or_insert_with(BTreeSet::new)
.insert(None);
}
}
for variable in &syntax_grammar.variables {
for production in &variable.productions {
for step in &production.steps {
aliases_by_symbol
.entry(step.symbol)
.or_insert_with(BTreeSet::new)
.insert(
step.alias
.as_ref()
.or_else(|| default_aliases.get(&step.symbol))
.cloned(),
);
}
}
}
aliases_by_symbol.insert(
Symbol::non_terminal(0),
std::iter::once(&None).cloned().collect(),
);
aliases_by_symbol
}
pub fn get_supertype_symbol_map(
syntax_grammar: &SyntaxGrammar,
default_aliases: &AliasMap,
variable_info: &[VariableInfo],
) -> BTreeMap<Symbol, Vec<ChildType>> {
let aliases_by_symbol = get_aliases_by_symbol(syntax_grammar, default_aliases);
let mut supertype_symbol_map = BTreeMap::new();
let mut symbols_by_alias = HashMap::new();
for (symbol, aliases) in &aliases_by_symbol {
for alias in aliases.iter().flatten() {
symbols_by_alias
.entry(alias)
.or_insert_with(Vec::new)
.push(*symbol);
}
}
for (i, info) in variable_info.iter().enumerate() {
let symbol = Symbol::non_terminal(i);
if syntax_grammar.supertype_symbols.contains(&symbol) {
let subtypes = info.children.types.clone();
supertype_symbol_map.insert(symbol, subtypes);
}
}
supertype_symbol_map
}
#[cfg(feature = "load")]
pub type SuperTypeCycleResult<T> = Result<T, SuperTypeCycleError>;
#[derive(Debug, Error, Serialize)]
pub struct SuperTypeCycleError {
items: Vec<String>,
}
impl std::fmt::Display for SuperTypeCycleError {
fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
write!(f, "Dependency cycle detected in node types:")?;
for (i, item) in self.items.iter().enumerate() {
write!(f, " {item}")?;
if i < self.items.len() - 1 {
write!(f, ",")?;
}
}
Ok(())
}
}
#[cfg(feature = "load")]
pub fn generate_node_types_json(
syntax_grammar: &SyntaxGrammar,
lexical_grammar: &LexicalGrammar,
default_aliases: &AliasMap,
variable_info: &[VariableInfo],
) -> SuperTypeCycleResult<Vec<NodeInfoJSON>> {
) -> Vec<NodeInfoJSON> {
let mut node_types_json = BTreeMap::new();
let child_type_to_node_type = |child_type: &ChildType| match child_type {
@ -528,32 +428,41 @@ pub fn generate_node_types_json(
}
};
let aliases_by_symbol = get_aliases_by_symbol(syntax_grammar, default_aliases);
let empty = BTreeSet::new();
let extra_names = syntax_grammar
.extra_symbols
.iter()
.flat_map(|symbol| {
let mut aliases_by_symbol = HashMap::new();
for (symbol, alias) in default_aliases {
aliases_by_symbol.insert(*symbol, {
let mut aliases = HashSet::new();
aliases.insert(Some(alias.clone()));
aliases
});
}
for extra_symbol in &syntax_grammar.extra_symbols {
if !default_aliases.contains_key(extra_symbol) {
aliases_by_symbol
.get(symbol)
.unwrap_or(&empty)
.iter()
.map(|alias| {
alias.as_ref().map_or(
match symbol.kind {
SymbolType::NonTerminal => &syntax_grammar.variables[symbol.index].name,
SymbolType::Terminal => &lexical_grammar.variables[symbol.index].name,
SymbolType::External => {
&syntax_grammar.external_tokens[symbol.index].name
}
_ => unreachable!(),
},
|alias| &alias.value,
)
})
})
.collect::<HashSet<_>>();
.entry(*extra_symbol)
.or_insert_with(HashSet::new)
.insert(None);
}
}
for variable in &syntax_grammar.variables {
for production in &variable.productions {
for step in &production.steps {
aliases_by_symbol
.entry(step.symbol)
.or_insert_with(HashSet::new)
.insert(
step.alias
.as_ref()
.or_else(|| default_aliases.get(&step.symbol))
.cloned(),
);
}
}
}
aliases_by_symbol.insert(
Symbol::non_terminal(0),
std::iter::once(&None).cloned().collect(),
);
let mut subtype_map = Vec::new();
for (i, info) in variable_info.iter().enumerate() {
@ -566,8 +475,6 @@ pub fn generate_node_types_json(
.or_insert_with(|| NodeInfoJSON {
kind: variable.name.clone(),
named: true,
root: false,
extra: extra_names.contains(&variable.name),
fields: None,
children: None,
subtypes: None,
@ -589,7 +496,10 @@ pub fn generate_node_types_json(
} else if !syntax_grammar.variables_to_inline.contains(&symbol) {
// If a rule is aliased under multiple names, then its information
// contributes to multiple entries in the final JSON.
for alias in aliases_by_symbol.get(&symbol).unwrap_or(&BTreeSet::new()) {
for alias in aliases_by_symbol
.get(&Symbol::non_terminal(i))
.unwrap_or(&HashSet::new())
{
let kind;
let is_named;
if let Some(alias) = alias {
@ -610,8 +520,6 @@ pub fn generate_node_types_json(
NodeInfoJSON {
kind: kind.clone(),
named: is_named,
root: i == 0,
extra: extra_names.contains(&kind),
fields: Some(BTreeMap::new()),
children: None,
subtypes: None,
@ -650,40 +558,22 @@ pub fn generate_node_types_json(
}
}
// Sort the subtype map topologically so that subtypes are listed before their supertypes.
let mut sorted_kinds = Vec::with_capacity(subtype_map.len());
let mut top_sort = topological_sort::TopologicalSort::<String>::new();
for (supertype, subtypes) in &subtype_map {
for subtype in subtypes {
top_sort.add_dependency(subtype.kind.clone(), supertype.kind.clone());
}
}
loop {
let mut next_kinds = top_sort.pop_all();
match (next_kinds.is_empty(), top_sort.is_empty()) {
(true, true) => break,
(true, false) => {
let mut items = top_sort.collect::<Vec<String>>();
items.sort();
return Err(SuperTypeCycleError { items });
}
(false, _) => {
next_kinds.sort();
sorted_kinds.extend(next_kinds);
}
}
}
// Sort the subtype map so that subtypes are listed before their supertypes.
subtype_map.sort_by(|a, b| {
let a_idx = sorted_kinds.iter().position(|n| n.eq(&a.0.kind)).unwrap();
let b_idx = sorted_kinds.iter().position(|n| n.eq(&b.0.kind)).unwrap();
a_idx.cmp(&b_idx)
if b.1.contains(&a.0) {
Ordering::Less
} else if a.1.contains(&b.0) {
Ordering::Greater
} else {
Ordering::Equal
}
});
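
The longer side of this hunk orders the subtype map with the `topological_sort` crate so that every subtype appears before its supertypes, and treats an empty `pop_all()` with items still queued as a dependency cycle. A minimal sketch of that pattern with hypothetical node kinds:

```rust
use topological_sort::TopologicalSort;

fn main() {
    let mut ts = TopologicalSort::<&str>::new();
    // "subtype must be listed before supertype" becomes a dependency edge.
    ts.add_dependency("identifier", "_expression");
    ts.add_dependency("_expression", "_statement");

    let mut sorted = Vec::new();
    loop {
        let mut batch = ts.pop_all(); // all items with no remaining predecessors
        if batch.is_empty() {
            // Nothing poppable but items left over means a cycle.
            assert!(ts.is_empty(), "dependency cycle detected in node types");
            break;
        }
        batch.sort(); // deterministic order within a batch
        sorted.extend(batch);
    }
    assert_eq!(sorted, ["identifier", "_expression", "_statement"]);
}
```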
for node_type_json in node_types_json.values_mut() {
if node_type_json
.children
.as_ref()
.is_some_and(|c| c.types.is_empty())
.map_or(false, |c| c.types.is_empty())
{
node_type_json.children = None;
}
@ -700,6 +590,7 @@ pub fn generate_node_types_json(
let mut anonymous_node_types = Vec::new();
let empty = HashSet::new();
let regular_tokens = lexical_grammar
.variables
.iter()
@ -737,18 +628,13 @@ pub fn generate_node_types_json(
for (name, kind) in regular_tokens.chain(external_tokens) {
match kind {
VariableType::Named => {
let node_type_json =
node_types_json
.entry(name.clone())
.or_insert_with(|| NodeInfoJSON {
kind: name.clone(),
named: true,
root: false,
extra: extra_names.contains(&name),
fields: None,
children: None,
subtypes: None,
});
let node_type_json = node_types_json.entry(name.clone()).or_insert(NodeInfoJSON {
kind: name.clone(),
named: true,
fields: None,
children: None,
subtypes: None,
});
if let Some(children) = &mut node_type_json.children {
children.required = false;
}
@ -761,8 +647,6 @@ pub fn generate_node_types_json(
VariableType::Anonymous => anonymous_node_types.push(NodeInfoJSON {
kind: name.clone(),
named: false,
root: false,
extra: extra_names.contains(&name),
fields: None,
children: None,
subtypes: None,
@ -783,15 +667,11 @@ pub fn generate_node_types_json(
a_is_leaf.cmp(&b_is_leaf)
})
.then_with(|| a.kind.cmp(&b.kind))
.then_with(|| a.named.cmp(&b.named))
.then_with(|| a.root.cmp(&b.root))
.then_with(|| a.extra.cmp(&b.extra))
});
result.dedup();
Ok(result)
result
}
#[cfg(feature = "load")]
fn process_supertypes(info: &mut FieldInfoJSON, subtype_map: &[(NodeTypeJSON, Vec<NodeTypeJSON>)]) {
for (supertype, subtypes) in subtype_map {
if info.types.contains(supertype) {
@ -828,20 +708,20 @@ fn extend_sorted<'a, T>(vec: &mut Vec<T>, values: impl IntoIterator<Item = &'a T
where
T: 'a + Clone + Eq + Ord,
{
values.into_iter().fold(false, |acc, value| {
values.into_iter().any(|value| {
if let Err(i) = vec.binary_search(value) {
vec.insert(i, value.clone());
true
} else {
acc
false
}
})
}
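
`extend_sorted` keeps a `Vec` sorted and deduplicated by using `binary_search` both as a membership test (`Ok`) and as an insertion-point lookup (`Err`). A loop-based sketch of the same helper, written so every value is considered regardless of earlier results:

```rust
fn extend_sorted<T: Clone + Ord>(vec: &mut Vec<T>, values: &[T]) -> bool {
    let mut changed = false;
    for value in values {
        if let Err(i) = vec.binary_search(value) {
            vec.insert(i, value.clone()); // not present: insert at its sorted position
            changed = true;
        }
    }
    changed
}

fn main() {
    let mut kinds = vec!["identifier".to_string(), "string".to_string()];
    let changed = extend_sorted(&mut kinds, &["comment".to_string(), "string".to_string()]);
    assert!(changed);
    assert_eq!(kinds, ["comment", "identifier", "string"]);
}
```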
#[cfg(all(test, feature = "load"))]
#[cfg(test)]
mod tests {
use super::*;
use crate::{
use crate::generate::{
grammars::{
InputGrammar, LexicalVariable, Production, ProductionStep, SyntaxVariable, Variable,
},
@ -875,8 +755,7 @@ mod tests {
},
],
..Default::default()
})
.unwrap();
});
assert_eq!(node_types.len(), 3);
@ -885,8 +764,6 @@ mod tests {
NodeInfoJSON {
kind: "v1".to_string(),
named: true,
root: true,
extra: false,
subtypes: None,
children: None,
fields: Some(
@ -924,8 +801,6 @@ mod tests {
NodeInfoJSON {
kind: ";".to_string(),
named: false,
root: false,
extra: false,
subtypes: None,
children: None,
fields: None
@ -936,8 +811,6 @@ mod tests {
NodeInfoJSON {
kind: "v2".to_string(),
named: true,
root: false,
extra: false,
subtypes: None,
children: None,
fields: None
@ -965,9 +838,7 @@ mod tests {
},
// This rule is not reachable from the start symbol, but
// it is reachable from the 'extra_symbols' so it
// should be present in the node_types.
// But because it's only a literal, it will get replaced by
// a lexical variable.
// should be present in the node_types
Variable {
name: "v3".to_string(),
kind: VariableType::Named,
@ -975,8 +846,7 @@ mod tests {
},
],
..Default::default()
})
.unwrap();
});
assert_eq!(node_types.len(), 4);
@ -985,8 +855,6 @@ mod tests {
NodeInfoJSON {
kind: "v1".to_string(),
named: true,
root: true,
extra: false,
subtypes: None,
children: None,
fields: Some(
@ -1024,8 +892,6 @@ mod tests {
NodeInfoJSON {
kind: ";".to_string(),
named: false,
root: false,
extra: false,
subtypes: None,
children: None,
fields: None
@ -1036,8 +902,6 @@ mod tests {
NodeInfoJSON {
kind: "v2".to_string(),
named: true,
root: false,
extra: false,
subtypes: None,
children: None,
fields: None
@ -1048,120 +912,6 @@ mod tests {
NodeInfoJSON {
kind: "v3".to_string(),
named: true,
root: false,
extra: true,
subtypes: None,
children: None,
fields: None
}
);
}
#[test]
fn test_node_types_deeper_extras() {
let node_types = get_node_types(&InputGrammar {
extra_symbols: vec![Rule::named("v3")],
variables: vec![
Variable {
name: "v1".to_string(),
kind: VariableType::Named,
rule: Rule::seq(vec![
Rule::field("f1".to_string(), Rule::named("v2")),
Rule::field("f2".to_string(), Rule::string(";")),
]),
},
Variable {
name: "v2".to_string(),
kind: VariableType::Named,
rule: Rule::string("x"),
},
// This rule is not reachable from the start symbol, but
// it is reachable from the 'extra_symbols' so it
// should be present in the node_types.
// Because it is not just a literal, it won't get replaced
// by a lexical variable.
Variable {
name: "v3".to_string(),
kind: VariableType::Named,
rule: Rule::seq(vec![Rule::string("y"), Rule::repeat(Rule::string("z"))]),
},
],
..Default::default()
})
.unwrap();
assert_eq!(node_types.len(), 6);
assert_eq!(
node_types[0],
NodeInfoJSON {
kind: "v1".to_string(),
named: true,
root: true,
extra: false,
subtypes: None,
children: None,
fields: Some(
vec![
(
"f1".to_string(),
FieldInfoJSON {
multiple: false,
required: true,
types: vec![NodeTypeJSON {
kind: "v2".to_string(),
named: true,
}]
}
),
(
"f2".to_string(),
FieldInfoJSON {
multiple: false,
required: true,
types: vec![NodeTypeJSON {
kind: ";".to_string(),
named: false,
}]
}
),
]
.into_iter()
.collect()
)
}
);
assert_eq!(
node_types[1],
NodeInfoJSON {
kind: "v3".to_string(),
named: true,
root: false,
extra: true,
subtypes: None,
children: None,
fields: Some(BTreeMap::default())
}
);
assert_eq!(
node_types[2],
NodeInfoJSON {
kind: ";".to_string(),
named: false,
root: false,
extra: false,
subtypes: None,
children: None,
fields: None
}
);
assert_eq!(
node_types[3],
NodeInfoJSON {
kind: "v2".to_string(),
named: true,
root: false,
extra: false,
subtypes: None,
children: None,
fields: None
@ -1200,16 +950,13 @@ mod tests {
},
],
..Default::default()
})
.unwrap();
});
assert_eq!(
node_types[0],
NodeInfoJSON {
kind: "_v2".to_string(),
named: true,
root: false,
extra: false,
fields: None,
children: None,
subtypes: Some(vec![
@ -1233,8 +980,6 @@ mod tests {
NodeInfoJSON {
kind: "v1".to_string(),
named: true,
root: true,
extra: false,
subtypes: None,
children: None,
fields: Some(
@ -1290,16 +1035,13 @@ mod tests {
},
],
..Default::default()
})
.unwrap();
});
assert_eq!(
node_types[0],
NodeInfoJSON {
kind: "v1".to_string(),
named: true,
root: true,
extra: false,
subtypes: None,
children: Some(FieldInfoJSON {
multiple: true,
@ -1337,8 +1079,6 @@ mod tests {
NodeInfoJSON {
kind: "v2".to_string(),
named: true,
root: false,
extra: false,
subtypes: None,
children: Some(FieldInfoJSON {
multiple: false,
@ -1376,16 +1116,13 @@ mod tests {
},
],
..Default::default()
})
.unwrap();
});
assert_eq!(
node_types[0],
NodeInfoJSON {
kind: "v1".to_string(),
named: true,
root: true,
extra: false,
subtypes: None,
children: Some(FieldInfoJSON {
multiple: true,
@ -1451,8 +1188,7 @@ mod tests {
},
],
..Default::default()
})
.unwrap();
});
assert_eq!(node_types.iter().find(|t| t.kind == "foo_identifier"), None);
assert_eq!(
@ -1460,8 +1196,6 @@ mod tests {
Some(&NodeInfoJSON {
kind: "identifier".to_string(),
named: true,
root: false,
extra: false,
subtypes: None,
children: None,
fields: None,
@ -1472,8 +1206,6 @@ mod tests {
Some(&NodeInfoJSON {
kind: "type_identifier".to_string(),
named: true,
root: false,
extra: false,
subtypes: None,
children: None,
fields: None,
@ -1508,16 +1240,13 @@ mod tests {
},
],
..Default::default()
})
.unwrap();
});
assert_eq!(
node_types[0],
NodeInfoJSON {
kind: "a".to_string(),
named: true,
root: true,
extra: false,
subtypes: None,
children: Some(FieldInfoJSON {
multiple: true,
@ -1558,16 +1287,13 @@ mod tests {
]),
}],
..Default::default()
})
.unwrap();
});
assert_eq!(
node_types,
[NodeInfoJSON {
kind: "script".to_string(),
named: true,
root: true,
extra: false,
fields: Some(BTreeMap::new()),
children: None,
subtypes: None
@ -1607,8 +1333,7 @@ mod tests {
},
],
..Default::default()
})
.unwrap();
});
assert_eq!(
&node_types
@ -1625,8 +1350,6 @@ mod tests {
NodeInfoJSON {
kind: "a".to_string(),
named: true,
root: false,
extra: false,
subtypes: None,
children: None,
fields: Some(
@ -1682,8 +1405,6 @@ mod tests {
NodeInfoJSON {
kind: "script".to_string(),
named: true,
root: true,
extra: false,
subtypes: None,
// Only one node
children: Some(FieldInfoJSON {
@ -1727,8 +1448,7 @@ mod tests {
},
],
..Default::default()
})
.unwrap();
});
assert_eq!(
node_types.iter().map(|n| &n.kind).collect::<Vec<_>>(),
@ -1739,8 +1459,6 @@ mod tests {
NodeInfoJSON {
kind: "b".to_string(),
named: true,
root: false,
extra: false,
subtypes: None,
children: Some(FieldInfoJSON {
multiple: true,
@ -2055,7 +1773,7 @@ mod tests {
);
}
fn get_node_types(grammar: &InputGrammar) -> SuperTypeCycleResult<Vec<NodeInfoJSON>> {
fn get_node_types(grammar: &InputGrammar) -> Vec<NodeInfoJSON> {
let (syntax_grammar, lexical_grammar, _, default_aliases) =
prepare_grammar(grammar).unwrap();
let variable_info =


@ -0,0 +1,258 @@
use anyhow::{anyhow, Result};
use serde::Deserialize;
use serde_json::{Map, Value};
use super::{
grammars::{InputGrammar, PrecedenceEntry, Variable, VariableType},
rules::{Precedence, Rule},
};
#[derive(Deserialize)]
#[serde(tag = "type")]
#[allow(non_camel_case_types)]
#[allow(clippy::upper_case_acronyms)]
enum RuleJSON {
ALIAS {
content: Box<RuleJSON>,
named: bool,
value: String,
},
BLANK,
STRING {
value: String,
},
PATTERN {
value: String,
flags: Option<String>,
},
SYMBOL {
name: String,
},
CHOICE {
members: Vec<RuleJSON>,
},
FIELD {
name: String,
content: Box<RuleJSON>,
},
SEQ {
members: Vec<RuleJSON>,
},
REPEAT {
content: Box<RuleJSON>,
},
REPEAT1 {
content: Box<RuleJSON>,
},
PREC_DYNAMIC {
value: i32,
content: Box<RuleJSON>,
},
PREC_LEFT {
value: PrecedenceValueJSON,
content: Box<RuleJSON>,
},
PREC_RIGHT {
value: PrecedenceValueJSON,
content: Box<RuleJSON>,
},
PREC {
value: PrecedenceValueJSON,
content: Box<RuleJSON>,
},
TOKEN {
content: Box<RuleJSON>,
},
IMMEDIATE_TOKEN {
content: Box<RuleJSON>,
},
}
#[derive(Deserialize)]
#[serde(untagged)]
enum PrecedenceValueJSON {
Integer(i32),
Name(String),
}
#[derive(Deserialize)]
pub(crate) struct GrammarJSON {
pub(crate) name: String,
rules: Map<String, Value>,
#[serde(default)]
precedences: Vec<Vec<RuleJSON>>,
#[serde(default)]
conflicts: Vec<Vec<String>>,
#[serde(default)]
externals: Vec<RuleJSON>,
#[serde(default)]
extras: Vec<RuleJSON>,
#[serde(default)]
inline: Vec<String>,
#[serde(default)]
supertypes: Vec<String>,
word: Option<String>,
}
pub(crate) fn parse_grammar(input: &str) -> Result<InputGrammar> {
let grammar_json = serde_json::from_str::<GrammarJSON>(input)?;
let mut variables = Vec::with_capacity(grammar_json.rules.len());
for (name, value) in grammar_json.rules {
variables.push(Variable {
name: name.clone(),
kind: VariableType::Named,
rule: parse_rule(serde_json::from_value(value)?),
});
}
let mut precedence_orderings = Vec::with_capacity(grammar_json.precedences.len());
for list in grammar_json.precedences {
let mut ordering = Vec::with_capacity(list.len());
for entry in list {
ordering.push(match entry {
RuleJSON::STRING { value } => PrecedenceEntry::Name(value),
RuleJSON::SYMBOL { name } => PrecedenceEntry::Symbol(name),
_ => {
return Err(anyhow!(
"Invalid rule in precedences array. Only strings and symbols are allowed"
))
}
});
}
precedence_orderings.push(ordering);
}
let extra_symbols = grammar_json
.extras
.into_iter()
.try_fold(Vec::new(), |mut acc, item| {
let rule = parse_rule(item);
if let Rule::String(ref value) = rule {
if value.is_empty() {
return Err(anyhow!(
"Rules in the `extras` array must not contain empty strings"
));
}
}
acc.push(rule);
Ok(acc)
})?;
let external_tokens = grammar_json.externals.into_iter().map(parse_rule).collect();
Ok(InputGrammar {
name: grammar_json.name,
word_token: grammar_json.word,
expected_conflicts: grammar_json.conflicts,
supertype_symbols: grammar_json.supertypes,
variables_to_inline: grammar_json.inline,
precedence_orderings,
variables,
extra_symbols,
external_tokens,
})
}
fn parse_rule(json: RuleJSON) -> Rule {
match json {
RuleJSON::ALIAS {
content,
value,
named,
} => Rule::alias(parse_rule(*content), value, named),
RuleJSON::BLANK => Rule::Blank,
RuleJSON::STRING { value } => Rule::String(value),
RuleJSON::PATTERN { value, flags } => Rule::Pattern(
value,
flags.map_or(String::new(), |f| {
f.matches(|c| {
if c == 'i' {
true
} else {
// silently ignore unicode flags
if c != 'u' && c != 'v' {
eprintln!("Warning: unsupported flag {c}");
}
false
}
})
.collect()
}),
),
RuleJSON::SYMBOL { name } => Rule::NamedSymbol(name),
RuleJSON::CHOICE { members } => Rule::choice(members.into_iter().map(parse_rule).collect()),
RuleJSON::FIELD { content, name } => Rule::field(name, parse_rule(*content)),
RuleJSON::SEQ { members } => Rule::seq(members.into_iter().map(parse_rule).collect()),
RuleJSON::REPEAT1 { content } => Rule::repeat(parse_rule(*content)),
RuleJSON::REPEAT { content } => {
Rule::choice(vec![Rule::repeat(parse_rule(*content)), Rule::Blank])
}
RuleJSON::PREC { value, content } => Rule::prec(value.into(), parse_rule(*content)),
RuleJSON::PREC_LEFT { value, content } => {
Rule::prec_left(value.into(), parse_rule(*content))
}
RuleJSON::PREC_RIGHT { value, content } => {
Rule::prec_right(value.into(), parse_rule(*content))
}
RuleJSON::PREC_DYNAMIC { value, content } => {
Rule::prec_dynamic(value, parse_rule(*content))
}
RuleJSON::TOKEN { content } => Rule::token(parse_rule(*content)),
RuleJSON::IMMEDIATE_TOKEN { content } => Rule::immediate_token(parse_rule(*content)),
}
}
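
In the `PATTERN` arm above, `parse_rule` keeps only the `i` flag from a pattern's flags string, silently drops the unicode flags `u`/`v`, and warns about anything else. The same filtering written as a standalone helper:

```rust
fn filter_regex_flags(flags: &str) -> String {
    flags
        .chars()
        .filter(|&c| {
            if c == 'i' {
                true // case-insensitivity is the only flag the generator honors
            } else {
                if c != 'u' && c != 'v' {
                    eprintln!("Warning: unsupported flag {c}");
                }
                false // unicode flags are dropped silently, everything else with a warning
            }
        })
        .collect()
}

fn main() {
    assert_eq!(filter_regex_flags("iu"), "i");
    assert_eq!(filter_regex_flags("g"), ""); // prints a warning to stderr
}
```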
impl From<PrecedenceValueJSON> for Precedence {
fn from(val: PrecedenceValueJSON) -> Self {
match val {
PrecedenceValueJSON::Integer(i) => Self::Integer(i),
PrecedenceValueJSON::Name(i) => Self::Name(i),
}
}
}
#[cfg(test)]
mod tests {
use super::*;
#[test]
fn test_parse_grammar() {
let grammar = parse_grammar(
r#"{
"name": "my_lang",
"rules": {
"file": {
"type": "REPEAT1",
"content": {
"type": "SYMBOL",
"name": "statement"
}
},
"statement": {
"type": "STRING",
"value": "foo"
}
}
}"#,
)
.unwrap();
assert_eq!(grammar.name, "my_lang");
assert_eq!(
grammar.variables,
vec![
Variable {
name: "file".to_string(),
kind: VariableType::Named,
rule: Rule::repeat(Rule::NamedSymbol("statement".to_string()))
},
Variable {
name: "statement".to_string(),
kind: VariableType::Named,
rule: Rule::String("foo".to_string())
},
]
);
}
}


@ -1,7 +1,7 @@
use std::{collections::HashMap, mem};
use super::ExtractedSyntaxGrammar;
use crate::{
use crate::generate::{
grammars::{Variable, VariableType},
rules::{Rule, Symbol},
};


@ -1,57 +1,41 @@
use regex_syntax::{
hir::{Class, Hir, HirKind},
ParserBuilder,
use std::collections::HashMap;
use anyhow::{anyhow, Context, Result};
use lazy_static::lazy_static;
use regex_syntax::ast::{
parse, Ast, ClassPerlKind, ClassSet, ClassSetBinaryOpKind, ClassSetItem, ClassUnicodeKind,
RepetitionKind, RepetitionRange,
};
use serde::Serialize;
use thiserror::Error;
use super::ExtractedLexicalGrammar;
use crate::{
use crate::generate::{
grammars::{LexicalGrammar, LexicalVariable},
nfa::{CharacterSet, Nfa, NfaState},
rules::{Precedence, Rule},
};
lazy_static! {
static ref UNICODE_CATEGORIES: HashMap<&'static str, Vec<u32>> =
serde_json::from_str(UNICODE_CATEGORIES_JSON).unwrap();
static ref UNICODE_PROPERTIES: HashMap<&'static str, Vec<u32>> =
serde_json::from_str(UNICODE_PROPERTIES_JSON).unwrap();
static ref UNICODE_CATEGORY_ALIASES: HashMap<&'static str, String> =
serde_json::from_str(UNICODE_CATEGORY_ALIASES_JSON).unwrap();
static ref UNICODE_PROPERTY_ALIASES: HashMap<&'static str, String> =
serde_json::from_str(UNICODE_PROPERTY_ALIASES_JSON).unwrap();
}
const UNICODE_CATEGORIES_JSON: &str = include_str!("./unicode-categories.json");
const UNICODE_PROPERTIES_JSON: &str = include_str!("./unicode-properties.json");
const UNICODE_CATEGORY_ALIASES_JSON: &str = include_str!("./unicode-category-aliases.json");
const UNICODE_PROPERTY_ALIASES_JSON: &str = include_str!("./unicode-property-aliases.json");
struct NfaBuilder {
nfa: Nfa,
is_sep: bool,
precedence_stack: Vec<i32>,
}
pub type ExpandTokensResult<T> = Result<T, ExpandTokensError>;
#[derive(Debug, Error, Serialize)]
pub enum ExpandTokensError {
#[error(
"The rule `{0}` matches the empty string.
Tree-sitter does not support syntactic rules that match the empty string
unless they are used only as the grammar's start rule.
"
)]
EmptyString(String),
#[error(transparent)]
Processing(ExpandTokensProcessingError),
#[error(transparent)]
ExpandRule(ExpandRuleError),
}
#[derive(Debug, Error, Serialize)]
pub struct ExpandTokensProcessingError {
rule: String,
error: ExpandRuleError,
}
impl std::fmt::Display for ExpandTokensProcessingError {
fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
writeln!(
f,
"Error processing rule {}: Grammar error: Unexpected rule {:?}",
self.rule, self.error
)?;
Ok(())
}
}
fn get_implicit_precedence(rule: &Rule) -> i32 {
match rule {
Rule::String(_) => 2,
@ -75,7 +59,7 @@ const fn get_completion_precedence(rule: &Rule) -> i32 {
0
}
pub fn expand_tokens(mut grammar: ExtractedLexicalGrammar) -> ExpandTokensResult<LexicalGrammar> {
pub fn expand_tokens(mut grammar: ExtractedLexicalGrammar) -> Result<LexicalGrammar> {
let mut builder = NfaBuilder {
nfa: Nfa::new(),
is_sep: true,
@ -89,12 +73,8 @@ pub fn expand_tokens(mut grammar: ExtractedLexicalGrammar) -> ExpandTokensResult
Rule::repeat(Rule::choice(grammar.separators))
};
let mut variables = Vec::with_capacity(grammar.variables.len());
let mut variables = Vec::new();
for (i, variable) in grammar.variables.into_iter().enumerate() {
if variable.rule.is_empty() {
Err(ExpandTokensError::EmptyString(variable.name.clone()))?;
}
let is_immediate_token = match &variable.rule {
Rule::Metadata { params, .. } => params.is_main_token,
_ => false,
@ -108,19 +88,12 @@ pub fn expand_tokens(mut grammar: ExtractedLexicalGrammar) -> ExpandTokensResult
let last_state_id = builder.nfa.last_state_id();
builder
.expand_rule(&variable.rule, last_state_id)
.map_err(|e| {
ExpandTokensError::Processing(ExpandTokensProcessingError {
rule: variable.name.clone(),
error: e,
})
})?;
.with_context(|| format!("Error processing rule {}", variable.name))?;
if !is_immediate_token {
builder.is_sep = true;
let last_state_id = builder.nfa.last_state_id();
builder
.expand_rule(&separator_rule, last_state_id)
.map_err(ExpandTokensError::ExpandRule)?;
builder.expand_rule(&separator_rule, last_state_id)?;
}
variables.push(LexicalVariable {
@ -137,64 +110,22 @@ pub fn expand_tokens(mut grammar: ExtractedLexicalGrammar) -> ExpandTokensResult
})
}
pub type ExpandRuleResult<T> = Result<T, ExpandRuleError>;
#[derive(Debug, Error, Serialize)]
pub enum ExpandRuleError {
#[error("Grammar error: Unexpected rule {0:?}")]
UnexpectedRule(Rule),
#[error("{0}")]
Parse(String),
#[error(transparent)]
ExpandRegex(ExpandRegexError),
}
pub type ExpandRegexResult<T> = Result<T, ExpandRegexError>;
#[derive(Debug, Error, Serialize)]
pub enum ExpandRegexError {
#[error("{0}")]
Utf8(String),
#[error("Regex error: Assertions are not supported")]
Assertion,
}
impl NfaBuilder {
fn expand_rule(&mut self, rule: &Rule, mut next_state_id: u32) -> ExpandRuleResult<bool> {
fn expand_rule(&mut self, rule: &Rule, mut next_state_id: u32) -> Result<bool> {
match rule {
Rule::Pattern(s, f) => {
// With unicode enabled, `\w`, `\s` and `\d` expand to character sets that are much
// larger than intended, so we replace them with the actual
// character sets they should represent. If the full unicode range
// of `\w`, `\s` or `\d` are needed then `\p{L}`, `\p{Z}` and `\p{N}` should be
// used.
let s = s
.replace(r"\w", r"[0-9A-Za-z_]")
.replace(r"\s", r"[\t-\r ]")
.replace(r"\d", r"[0-9]")
.replace(r"\W", r"[^0-9A-Za-z_]")
.replace(r"\S", r"[^\t-\r ]")
.replace(r"\D", r"[^0-9]");
let mut parser = ParserBuilder::new()
.case_insensitive(f.contains('i'))
.unicode(true)
.utf8(false)
.build();
let hir = parser
.parse(&s)
.map_err(|e| ExpandRuleError::Parse(e.to_string()))?;
self.expand_regex(&hir, next_state_id)
.map_err(ExpandRuleError::ExpandRegex)
let ast = parse::Parser::new().parse(s)?;
self.expand_regex(&ast, next_state_id, f.contains('i'))
}
Rule::String(s) => {
for c in s.chars().rev() {
self.push_advance(CharacterSet::from_char(c), next_state_id);
self.push_advance(CharacterSet::empty().add_char(c), next_state_id);
next_state_id = self.nfa.last_state_id();
}
Ok(!s.is_empty())
}
Rule::Choice(elements) => {
let mut alternative_state_ids = Vec::with_capacity(elements.len());
let mut alternative_state_ids = Vec::new();
for element in elements {
if self.expand_rule(element, next_state_id)? {
alternative_state_ids.push(self.nfa.last_state_id());
@ -248,98 +179,129 @@ impl NfaBuilder {
result
}
Rule::Blank => Ok(false),
_ => Err(ExpandRuleError::UnexpectedRule(rule.clone()))?,
_ => Err(anyhow!("Grammar error: Unexpected rule {rule:?}")),
}
}
fn expand_regex(&mut self, hir: &Hir, mut next_state_id: u32) -> ExpandRegexResult<bool> {
match hir.kind() {
HirKind::Empty => Ok(false),
HirKind::Literal(literal) => {
for character in std::str::from_utf8(&literal.0)
.map_err(|e| ExpandRegexError::Utf8(e.to_string()))?
.chars()
.rev()
{
let char_set = CharacterSet::from_char(character);
self.push_advance(char_set, next_state_id);
next_state_id = self.nfa.last_state_id();
}
fn expand_regex(
&mut self,
ast: &Ast,
mut next_state_id: u32,
case_insensitive: bool,
) -> Result<bool> {
const fn inverse_char(c: char) -> char {
match c {
'a'..='z' => (c as u8 - b'a' + b'A') as char,
'A'..='Z' => (c as u8 - b'A' + b'a') as char,
c => c,
}
}
fn with_inverse_char(mut chars: CharacterSet) -> CharacterSet {
for char in chars.clone().chars() {
let inverted = inverse_char(char);
if char != inverted {
chars = chars.add_char(inverted);
}
}
chars
}
match ast {
Ast::Empty(_) => Ok(false),
Ast::Flags(_) => Err(anyhow!("Regex error: Flags are not supported")),
Ast::Literal(literal) => {
let mut char_set = CharacterSet::from_char(literal.c);
if case_insensitive {
let inverted = inverse_char(literal.c);
if literal.c != inverted {
char_set = char_set.add_char(inverted);
}
}
self.push_advance(char_set, next_state_id);
Ok(true)
}
HirKind::Class(class) => match class {
Class::Unicode(class) => {
let mut chars = CharacterSet::default();
for c in class.ranges() {
chars = chars.add_range(c.start(), c.end());
}
// For some reason, the long s `ſ` is included if the letter `s` is in a
// pattern, so we remove it.
if chars.range_count() == 3
&& chars
.ranges()
// exact check to ensure that `ſ` wasn't intentionally added.
.all(|r| ['s'..='s', 'S'..='S', 'ſ'..='ſ'].contains(&r))
{
chars = chars.difference(CharacterSet::from_char('ſ'));
}
self.push_advance(chars, next_state_id);
Ok(true)
Ast::Dot(_) => {
self.push_advance(CharacterSet::from_char('\n').negate(), next_state_id);
Ok(true)
}
Ast::Assertion(_) => Err(anyhow!("Regex error: Assertions are not supported")),
Ast::ClassUnicode(class) => {
let mut chars = self.expand_unicode_character_class(&class.kind)?;
if class.negated {
chars = chars.negate();
}
Class::Bytes(bytes_class) => {
let mut chars = CharacterSet::default();
for c in bytes_class.ranges() {
chars = chars.add_range(c.start().into(), c.end().into());
}
self.push_advance(chars, next_state_id);
Ok(true)
if case_insensitive {
chars = with_inverse_char(chars);
}
},
HirKind::Look(_) => Err(ExpandRegexError::Assertion)?,
HirKind::Repetition(repetition) => match (repetition.min, repetition.max) {
(0, Some(1)) => self.expand_zero_or_one(&repetition.sub, next_state_id),
(1, None) => self.expand_one_or_more(&repetition.sub, next_state_id),
(0, None) => self.expand_zero_or_more(&repetition.sub, next_state_id),
(min, Some(max)) if min == max => {
self.expand_count(&repetition.sub, min, next_state_id)
self.push_advance(chars, next_state_id);
Ok(true)
}
Ast::ClassPerl(class) => {
let mut chars = self.expand_perl_character_class(&class.kind);
if class.negated {
chars = chars.negate();
}
(min, None) => {
if self.expand_zero_or_more(&repetition.sub, next_state_id)? {
self.expand_count(&repetition.sub, min, next_state_id)
if case_insensitive {
chars = with_inverse_char(chars);
}
self.push_advance(chars, next_state_id);
Ok(true)
}
Ast::ClassBracketed(class) => {
let mut chars = self.translate_class_set(&class.kind)?;
if class.negated {
chars = chars.negate();
}
if case_insensitive {
chars = with_inverse_char(chars);
}
self.push_advance(chars, next_state_id);
Ok(true)
}
Ast::Repetition(repetition) => match repetition.op.kind {
RepetitionKind::ZeroOrOne => {
self.expand_zero_or_one(&repetition.ast, next_state_id, case_insensitive)
}
RepetitionKind::OneOrMore => {
self.expand_one_or_more(&repetition.ast, next_state_id, case_insensitive)
}
RepetitionKind::ZeroOrMore => {
self.expand_zero_or_more(&repetition.ast, next_state_id, case_insensitive)
}
RepetitionKind::Range(RepetitionRange::Exactly(count)) => {
self.expand_count(&repetition.ast, count, next_state_id, case_insensitive)
}
RepetitionKind::Range(RepetitionRange::AtLeast(min)) => {
if self.expand_zero_or_more(&repetition.ast, next_state_id, case_insensitive)? {
self.expand_count(&repetition.ast, min, next_state_id, case_insensitive)
} else {
Ok(false)
}
}
(min, Some(max)) => {
let mut result = self.expand_count(&repetition.sub, min, next_state_id)?;
RepetitionKind::Range(RepetitionRange::Bounded(min, max)) => {
let mut result =
self.expand_count(&repetition.ast, min, next_state_id, case_insensitive)?;
for _ in min..max {
if result {
next_state_id = self.nfa.last_state_id();
}
if self.expand_zero_or_one(&repetition.sub, next_state_id)? {
if self.expand_zero_or_one(
&repetition.ast,
next_state_id,
case_insensitive,
)? {
result = true;
}
}
Ok(result)
}
},
HirKind::Capture(capture) => self.expand_regex(&capture.sub, next_state_id),
HirKind::Concat(concat) => {
let mut result = false;
for hir in concat.iter().rev() {
if self.expand_regex(hir, next_state_id)? {
result = true;
next_state_id = self.nfa.last_state_id();
}
}
Ok(result)
}
HirKind::Alternation(alternations) => {
let mut alternative_state_ids = Vec::with_capacity(alternations.len());
for hir in alternations {
if self.expand_regex(hir, next_state_id)? {
Ast::Group(group) => self.expand_regex(&group.ast, next_state_id, case_insensitive),
Ast::Alternation(alternation) => {
let mut alternative_state_ids = Vec::new();
for ast in &alternation.asts {
if self.expand_regex(ast, next_state_id, case_insensitive)? {
alternative_state_ids.push(self.nfa.last_state_id());
} else {
alternative_state_ids.push(next_state_id);
@ -348,21 +310,58 @@ impl NfaBuilder {
alternative_state_ids.sort_unstable();
alternative_state_ids.dedup();
alternative_state_ids.retain(|i| *i != self.nfa.last_state_id());
for alternative_state_id in alternative_state_ids {
self.push_split(alternative_state_id);
}
Ok(true)
}
Ast::Concat(concat) => {
let mut result = false;
for ast in concat.asts.iter().rev() {
if self.expand_regex(ast, next_state_id, case_insensitive)? {
result = true;
next_state_id = self.nfa.last_state_id();
}
}
Ok(result)
}
}
}
fn expand_one_or_more(&mut self, hir: &Hir, next_state_id: u32) -> ExpandRegexResult<bool> {
fn translate_class_set(&self, class_set: &ClassSet) -> Result<CharacterSet> {
match &class_set {
ClassSet::Item(item) => self.expand_character_class(item),
ClassSet::BinaryOp(binary_op) => {
let mut lhs_char_class = self.translate_class_set(&binary_op.lhs)?;
let mut rhs_char_class = self.translate_class_set(&binary_op.rhs)?;
match binary_op.kind {
ClassSetBinaryOpKind::Intersection => {
Ok(lhs_char_class.remove_intersection(&mut rhs_char_class))
}
ClassSetBinaryOpKind::Difference => {
Ok(lhs_char_class.difference(rhs_char_class))
}
ClassSetBinaryOpKind::SymmetricDifference => {
Ok(lhs_char_class.symmetric_difference(rhs_char_class))
}
}
}
}
}
fn expand_one_or_more(
&mut self,
ast: &Ast,
next_state_id: u32,
case_insensitive: bool,
) -> Result<bool> {
self.nfa.states.push(NfaState::Accept {
variable_index: 0,
precedence: 0,
}); // Placeholder for split
let split_state_id = self.nfa.last_state_id();
if self.expand_regex(hir, split_state_id)? {
if self.expand_regex(ast, split_state_id, case_insensitive)? {
self.nfa.states[split_state_id as usize] =
NfaState::Split(self.nfa.last_state_id(), next_state_id);
Ok(true)
@ -372,8 +371,13 @@ impl NfaBuilder {
}
}
fn expand_zero_or_one(&mut self, hir: &Hir, next_state_id: u32) -> ExpandRegexResult<bool> {
if self.expand_regex(hir, next_state_id)? {
fn expand_zero_or_one(
&mut self,
ast: &Ast,
next_state_id: u32,
case_insensitive: bool,
) -> Result<bool> {
if self.expand_regex(ast, next_state_id, case_insensitive)? {
self.push_split(next_state_id);
Ok(true)
} else {
@ -381,8 +385,13 @@ impl NfaBuilder {
}
}
fn expand_zero_or_more(&mut self, hir: &Hir, next_state_id: u32) -> ExpandRegexResult<bool> {
if self.expand_one_or_more(hir, next_state_id)? {
fn expand_zero_or_more(
&mut self,
ast: &Ast,
next_state_id: u32,
case_insensitive: bool,
) -> Result<bool> {
if self.expand_one_or_more(ast, next_state_id, case_insensitive)? {
self.push_split(next_state_id);
Ok(true)
} else {
@ -392,13 +401,14 @@ impl NfaBuilder {
fn expand_count(
&mut self,
hir: &Hir,
ast: &Ast,
count: u32,
mut next_state_id: u32,
) -> ExpandRegexResult<bool> {
case_insensitive: bool,
) -> Result<bool> {
let mut result = false;
for _ in 0..count {
if self.expand_regex(hir, next_state_id)? {
if self.expand_regex(ast, next_state_id, case_insensitive)? {
result = true;
next_state_id = self.nfa.last_state_id();
}
@ -406,6 +416,111 @@ impl NfaBuilder {
Ok(result)
}
fn expand_character_class(&self, item: &ClassSetItem) -> Result<CharacterSet> {
match item {
ClassSetItem::Empty(_) => Ok(CharacterSet::empty()),
ClassSetItem::Literal(literal) => Ok(CharacterSet::from_char(literal.c)),
ClassSetItem::Range(range) => Ok(CharacterSet::from_range(range.start.c, range.end.c)),
ClassSetItem::Union(union) => {
let mut result = CharacterSet::empty();
for item in &union.items {
result = result.add(&self.expand_character_class(item)?);
}
Ok(result)
}
ClassSetItem::Perl(class) => Ok(self.expand_perl_character_class(&class.kind)),
ClassSetItem::Unicode(class) => {
let mut set = self.expand_unicode_character_class(&class.kind)?;
if class.negated {
set = set.negate();
}
Ok(set)
}
ClassSetItem::Bracketed(class) => {
let mut set = self.translate_class_set(&class.kind)?;
if class.negated {
set = set.negate();
}
Ok(set)
}
ClassSetItem::Ascii(_) => Err(anyhow!(
"Regex error: Unsupported character class syntax {item:?}",
)),
}
}
fn expand_unicode_character_class(&self, class: &ClassUnicodeKind) -> Result<CharacterSet> {
let mut chars = CharacterSet::empty();
let category_letter;
match class {
ClassUnicodeKind::OneLetter(le) => {
category_letter = le.to_string();
}
ClassUnicodeKind::Named(class_name) => {
let actual_class_name = UNICODE_CATEGORY_ALIASES
.get(class_name.as_str())
.or_else(|| UNICODE_PROPERTY_ALIASES.get(class_name.as_str()))
.unwrap_or(class_name);
if actual_class_name.len() == 1 {
category_letter = actual_class_name.clone();
} else {
let code_points =
UNICODE_CATEGORIES
.get(actual_class_name.as_str())
.or_else(|| UNICODE_PROPERTIES.get(actual_class_name.as_str()))
.ok_or_else(|| {
anyhow!(
"Regex error: Unsupported unicode character class {class_name}",
)
})?;
for c in code_points {
if let Some(c) = char::from_u32(*c) {
chars = chars.add_char(c);
}
}
return Ok(chars);
}
}
ClassUnicodeKind::NamedValue { .. } => {
return Err(anyhow!(
"Regex error: Key-value unicode properties are not supported"
))
}
}
for (category, code_points) in UNICODE_CATEGORIES.iter() {
if category.starts_with(&category_letter) {
for c in code_points {
if let Some(c) = char::from_u32(*c) {
chars = chars.add_char(c);
}
}
}
}
Ok(chars)
}
fn expand_perl_character_class(&self, item: &ClassPerlKind) -> CharacterSet {
match item {
ClassPerlKind::Digit => CharacterSet::from_range('0', '9'),
ClassPerlKind::Space => CharacterSet::empty()
.add_char(' ')
.add_char('\t')
.add_char('\r')
.add_char('\n')
.add_char('\x0B')
.add_char('\x0C'),
ClassPerlKind::Word => CharacterSet::empty()
.add_char('_')
.add_range('A', 'Z')
.add_range('a', 'z')
.add_range('0', '9'),
}
}
fn push_advance(&mut self, chars: CharacterSet, state_id: u32) {
let precedence = *self.precedence_stack.last().unwrap();
self.nfa.states.push(NfaState::Advance {
@ -427,7 +542,7 @@ impl NfaBuilder {
#[cfg(test)]
mod tests {
use super::*;
use crate::{
use crate::generate::{
grammars::Variable,
nfa::{NfaCursor, NfaTransition},
};


@ -1,4 +1,4 @@
use crate::{
use crate::generate::{
grammars::{LexicalGrammar, SyntaxGrammar},
rules::{Alias, AliasMap, Symbol, SymbolType},
};
@ -69,7 +69,9 @@ pub(super) fn extract_default_aliases(
SymbolType::External => &mut external_status_list[symbol.index],
SymbolType::NonTerminal => &mut non_terminal_status_list[symbol.index],
SymbolType::Terminal => &mut terminal_status_list[symbol.index],
SymbolType::End | SymbolType::EndOfNonTerminalExtra => panic!("Unexpected end token"),
SymbolType::End | SymbolType::EndOfNonTerminalExtra => {
panic!("Unexpected end token")
}
};
status.appears_unaliased = true;
}
@ -162,7 +164,7 @@ pub(super) fn extract_default_aliases(
#[cfg(test)]
mod tests {
use super::*;
use crate::{
use crate::generate::{
grammars::{LexicalVariable, Production, ProductionStep, SyntaxVariable, VariableType},
nfa::Nfa,
};


@ -1,63 +1,16 @@
use std::collections::HashMap;
use std::{collections::HashMap, mem};
use serde::Serialize;
use thiserror::Error;
use anyhow::{anyhow, Result};
use super::{ExtractedLexicalGrammar, ExtractedSyntaxGrammar, InternedGrammar};
use crate::{
grammars::{ExternalToken, ReservedWordContext, Variable, VariableType},
use crate::generate::{
grammars::{ExternalToken, Variable, VariableType},
rules::{MetadataParams, Rule, Symbol, SymbolType},
};
pub type ExtractTokensResult<T> = Result<T, ExtractTokensError>;
#[derive(Debug, Error, Serialize)]
pub enum ExtractTokensError {
#[error(
"The rule `{0}` contains an empty string.
Tree-sitter does not support syntactic rules that contain an empty string
unless they are used only as the grammar's start rule.
"
)]
EmptyString(String),
#[error("Rule '{0}' cannot be used as both an external token and a non-terminal rule")]
ExternalTokenNonTerminal(String),
#[error("Non-symbol rules cannot be used as external tokens")]
NonSymbolExternalToken,
#[error(transparent)]
WordToken(NonTerminalWordTokenError),
#[error("Reserved word '{0}' must be a token")]
NonTokenReservedWord(String),
}
#[derive(Debug, Error, Serialize)]
pub struct NonTerminalWordTokenError {
pub symbol_name: String,
pub conflicting_symbol_name: Option<String>,
}
impl std::fmt::Display for NonTerminalWordTokenError {
fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
write!(
f,
"Non-terminal symbol '{}' cannot be used as the word token",
self.symbol_name
)?;
if let Some(conflicting_name) = &self.conflicting_symbol_name {
writeln!(
f,
", because its rule is duplicated in '{conflicting_name}'",
)
} else {
writeln!(f)
}
}
}
pub(super) fn extract_tokens(
mut grammar: InternedGrammar,
) -> ExtractTokensResult<(ExtractedSyntaxGrammar, ExtractedLexicalGrammar)> {
) -> Result<(ExtractedSyntaxGrammar, ExtractedLexicalGrammar)> {
let mut extractor = TokenExtractor {
current_variable_name: String::new(),
current_variable_token_count: 0,
@ -76,7 +29,11 @@ pub(super) fn extract_tokens(
let mut lexical_variables = Vec::with_capacity(extractor.extracted_variables.len());
for variable in extractor.extracted_variables {
lexical_variables.push(variable);
lexical_variables.push(Variable {
name: variable.name,
kind: variable.kind,
rule: variable.rule,
});
}
// If a variable's entire rule was extracted as a token and that token didn't
@ -85,7 +42,7 @@ pub(super) fn extract_tokens(
// that pointed to that variable will need to be updated to point to the
// variable in the lexical grammar. Symbols that pointed to later variables
// will need to have their indices decremented.
let mut variables = Vec::with_capacity(grammar.variables.len());
let mut variables = Vec::new();
let mut symbol_replacer = SymbolReplacer {
replacements: HashMap::new(),
};
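For intuition, a hedged example of the renumbering this SymbolReplacer performs, assuming the conditions in the comment above are met (the grammar and rule names are hypothetical):
// Before extraction:  syntax variables = [program, keyword, expr]   (non-terminals 0, 1, 2),
//                     where the entire rule of `keyword` is the string "let".
// After extraction:   `keyword` moves to the lexical grammar as a new terminal, so the
//                     syntax variables become [program, expr]       (non-terminals 0, 1).
// Every symbol that pointed at `keyword` is replaced by that terminal, and symbols that
// pointed at `expr` have their non-terminal index decremented from 2 to 1.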
@ -152,14 +109,15 @@ pub(super) fn extract_tokens(
}
}
let mut external_tokens = Vec::with_capacity(grammar.external_tokens.len());
let mut external_tokens = Vec::new();
for external_token in grammar.external_tokens {
let rule = symbol_replacer.replace_symbols_in_rule(&external_token.rule);
if let Rule::Symbol(symbol) = rule {
if symbol.is_non_terminal() {
Err(ExtractTokensError::ExternalTokenNonTerminal(
variables[symbol.index].name.clone(),
))?;
return Err(anyhow!(
"Rule '{}' cannot be used as both an external token and a non-terminal rule",
&variables[symbol.index].name,
));
}
if symbol.is_external() {
@ -176,59 +134,22 @@ pub(super) fn extract_tokens(
});
}
} else {
Err(ExtractTokensError::NonSymbolExternalToken)?;
return Err(anyhow!(
"Non-symbol rules cannot be used as external tokens"
));
}
}
let word_token = if let Some(token) = grammar.word_token {
let mut word_token = None;
if let Some(token) = grammar.word_token {
let token = symbol_replacer.replace_symbol(token);
if token.is_non_terminal() {
let word_token_variable = &variables[token.index];
let conflicting_symbol_name = variables
.iter()
.enumerate()
.find(|(i, v)| *i != token.index && v.rule == word_token_variable.rule)
.map(|(_, v)| v.name.clone());
Err(ExtractTokensError::WordToken(NonTerminalWordTokenError {
symbol_name: word_token_variable.name.clone(),
conflicting_symbol_name,
}))?;
return Err(anyhow!(
"Non-terminal symbol '{}' cannot be used as the word token",
&variables[token.index].name
));
}
Some(token)
} else {
None
};
let mut reserved_word_contexts = Vec::with_capacity(grammar.reserved_word_sets.len());
for reserved_word_context in grammar.reserved_word_sets {
let mut reserved_words = Vec::with_capacity(reserved_word_contexts.len());
for reserved_rule in reserved_word_context.reserved_words {
if let Rule::Symbol(symbol) = reserved_rule {
reserved_words.push(symbol_replacer.replace_symbol(symbol));
} else if let Some(index) = lexical_variables
.iter()
.position(|v| v.rule == reserved_rule)
{
reserved_words.push(Symbol::terminal(index));
} else {
let rule = if let Rule::Metadata { rule, .. } = &reserved_rule {
rule.as_ref()
} else {
&reserved_rule
};
let token_name = match rule {
Rule::String(s) => s.clone(),
Rule::Pattern(p, _) => p.clone(),
_ => "unknown".to_string(),
};
Err(ExtractTokensError::NonTokenReservedWord(token_name))?;
}
}
reserved_word_contexts.push(ReservedWordContext {
name: reserved_word_context.name,
reserved_words,
});
word_token = Some(token);
}
Ok((
@ -241,7 +162,6 @@ pub(super) fn extract_tokens(
external_tokens,
word_token,
precedence_orderings: grammar.precedence_orderings,
reserved_word_sets: reserved_word_contexts,
},
ExtractedLexicalGrammar {
variables: lexical_variables,
@ -267,16 +187,18 @@ impl TokenExtractor {
&mut self,
is_first: bool,
variable: &mut Variable,
) -> ExtractTokensResult<()> {
) -> Result<()> {
self.current_variable_name.clear();
self.current_variable_name.push_str(&variable.name);
self.current_variable_token_count = 0;
self.is_first_rule = is_first;
variable.rule = self.extract_tokens_in_rule(&variable.rule)?;
let mut rule = Rule::Blank;
mem::swap(&mut rule, &mut variable.rule);
variable.rule = self.extract_tokens_in_rule(&rule)?;
Ok(())
}
fn extract_tokens_in_rule(&mut self, input: &Rule) -> ExtractTokensResult<Rule> {
fn extract_tokens_in_rule(&mut self, input: &Rule) -> Result<Rule> {
match input {
Rule::String(name) => Ok(self.extract_token(input, Some(name))?.into()),
Rule::Pattern(..) => Ok(self.extract_token(input, None)?.into()),
@ -285,11 +207,10 @@ impl TokenExtractor {
let mut params = params.clone();
params.is_token = false;
let string_value = if let Rule::String(value) = rule.as_ref() {
Some(value)
} else {
None
};
let mut string_value = None;
if let Rule::String(value) = rule.as_ref() {
string_value = Some(value);
}
let rule_to_extract = if params == MetadataParams::default() {
rule.as_ref()
@ -312,27 +233,19 @@ impl TokenExtractor {
elements
.iter()
.map(|e| self.extract_tokens_in_rule(e))
.collect::<ExtractTokensResult<Vec<_>>>()?,
.collect::<Result<Vec<_>>>()?,
)),
Rule::Choice(elements) => Ok(Rule::Choice(
elements
.iter()
.map(|e| self.extract_tokens_in_rule(e))
.collect::<ExtractTokensResult<Vec<_>>>()?,
.collect::<Result<Vec<_>>>()?,
)),
Rule::Reserved { rule, context_name } => Ok(Rule::Reserved {
rule: Box::new(self.extract_tokens_in_rule(rule)?),
context_name: context_name.clone(),
}),
_ => Ok(input.clone()),
}
}
fn extract_token(
&mut self,
rule: &Rule,
string_value: Option<&String>,
) -> ExtractTokensResult<Symbol> {
fn extract_token(&mut self, rule: &Rule, string_value: Option<&String>) -> Result<Symbol> {
for (i, variable) in self.extracted_variables.iter_mut().enumerate() {
if variable.rule == *rule {
self.extracted_usage_counts[i] += 1;
@ -343,9 +256,14 @@ impl TokenExtractor {
let index = self.extracted_variables.len();
let variable = if let Some(string_value) = string_value {
if string_value.is_empty() && !self.is_first_rule {
Err(ExtractTokensError::EmptyString(
self.current_variable_name.clone(),
))?;
return Err(anyhow!(
"The rule `{}` contains an empty string.
Tree-sitter does not support syntactic rules that contain an empty string
unless they are used only as the grammar's start rule.
",
self.current_variable_name
));
}
Variable {
name: string_value.clone(),
@ -357,7 +275,7 @@ impl TokenExtractor {
Variable {
name: format!(
"{}_token{}",
self.current_variable_name, self.current_variable_token_count
&self.current_variable_name, self.current_variable_token_count
),
kind: VariableType::Auxiliary,
rule: rule.clone(),
@ -391,10 +309,6 @@ impl SymbolReplacer {
params: params.clone(),
rule: Box::new(self.replace_symbols_in_rule(rule)),
},
Rule::Reserved { rule, context_name } => Rule::Reserved {
rule: Box::new(self.replace_symbols_in_rule(rule)),
context_name: context_name.clone(),
},
_ => rule.clone(),
}
}
@ -590,13 +504,14 @@ mod test {
]);
grammar.external_tokens = vec![Variable::named("rule_1", Rule::non_terminal(1))];
let result = extract_tokens(grammar);
assert!(result.is_err(), "Expected an error but got no error");
let err = result.err().unwrap();
assert_eq!(
err.to_string(),
"Rule 'rule_1' cannot be used as both an external token and a non-terminal rule"
);
match extract_tokens(grammar) {
Err(e) => {
assert_eq!(e.to_string(), "Rule 'rule_1' cannot be used as both an external token and a non-terminal rule");
}
_ => {
panic!("Expected an error but got no error");
}
}
}
#[test]


@ -1,96 +1,47 @@
use std::collections::HashMap;
use serde::Serialize;
use thiserror::Error;
use anyhow::{anyhow, Result};
use super::ExtractedSyntaxGrammar;
use crate::{
grammars::{
Production, ProductionStep, ReservedWordSetId, SyntaxGrammar, SyntaxVariable, Variable,
},
rules::{Alias, Associativity, Precedence, Rule, Symbol, TokenSet},
use crate::generate::{
grammars::{Production, ProductionStep, SyntaxGrammar, SyntaxVariable, Variable},
rules::{Alias, Associativity, Precedence, Rule, Symbol},
};
pub type FlattenGrammarResult<T> = Result<T, FlattenGrammarError>;
#[derive(Debug, Error, Serialize)]
pub enum FlattenGrammarError {
#[error("No such reserved word set: {0}")]
NoReservedWordSet(String),
#[error(
"The rule `{0}` matches the empty string.
Tree-sitter does not support syntactic rules that match the empty string
unless they are used only as the grammar's start rule.
"
)]
EmptyString(String),
#[error("Rule `{0}` cannot be inlined because it contains a reference to itself")]
RecursiveInline(String),
}
struct RuleFlattener {
production: Production,
reserved_word_set_ids: HashMap<String, ReservedWordSetId>,
precedence_stack: Vec<Precedence>,
associativity_stack: Vec<Associativity>,
reserved_word_stack: Vec<ReservedWordSetId>,
alias_stack: Vec<Alias>,
field_name_stack: Vec<String>,
}
impl RuleFlattener {
const fn new(reserved_word_set_ids: HashMap<String, ReservedWordSetId>) -> Self {
fn new() -> Self {
Self {
production: Production {
steps: Vec::new(),
dynamic_precedence: 0,
},
reserved_word_set_ids,
precedence_stack: Vec::new(),
associativity_stack: Vec::new(),
reserved_word_stack: Vec::new(),
alias_stack: Vec::new(),
field_name_stack: Vec::new(),
}
}
fn flatten_variable(&mut self, variable: Variable) -> FlattenGrammarResult<SyntaxVariable> {
let choices = extract_choices(variable.rule);
let mut productions = Vec::with_capacity(choices.len());
for rule in choices {
let production = self.flatten_rule(rule)?;
if !productions.contains(&production) {
productions.push(production);
}
}
Ok(SyntaxVariable {
name: variable.name,
kind: variable.kind,
productions,
})
fn flatten(mut self, rule: Rule) -> Production {
self.apply(rule, true);
self.production
}
fn flatten_rule(&mut self, rule: Rule) -> FlattenGrammarResult<Production> {
self.production = Production::default();
self.alias_stack.clear();
self.reserved_word_stack.clear();
self.precedence_stack.clear();
self.associativity_stack.clear();
self.field_name_stack.clear();
self.apply(rule, true)?;
Ok(self.production.clone())
}
fn apply(&mut self, rule: Rule, at_end: bool) -> FlattenGrammarResult<bool> {
fn apply(&mut self, rule: Rule, at_end: bool) -> bool {
match rule {
Rule::Seq(members) => {
let mut result = false;
let last_index = members.len() - 1;
for (i, member) in members.into_iter().enumerate() {
result |= self.apply(member, i == last_index && at_end)?;
result |= self.apply(member, i == last_index && at_end);
}
Ok(result)
result
}
Rule::Metadata { rule, params } => {
let mut has_precedence = false;
@ -121,7 +72,7 @@ impl RuleFlattener {
self.production.dynamic_precedence = params.dynamic_precedence;
}
let did_push = self.apply(*rule, at_end)?;
let did_push = self.apply(*rule, at_end);
if has_precedence {
self.precedence_stack.pop();
@ -150,20 +101,7 @@ impl RuleFlattener {
self.field_name_stack.pop();
}
Ok(did_push)
}
Rule::Reserved { rule, context_name } => {
self.reserved_word_stack.push(
self.reserved_word_set_ids
.get(&context_name)
.copied()
.ok_or_else(|| {
FlattenGrammarError::NoReservedWordSet(context_name.clone())
})?,
);
let did_push = self.apply(*rule, at_end)?;
self.reserved_word_stack.pop();
Ok(did_push)
did_push
}
Rule::Symbol(symbol) => {
self.production.steps.push(ProductionStep {
@ -174,17 +112,12 @@ impl RuleFlattener {
.cloned()
.unwrap_or(Precedence::None),
associativity: self.associativity_stack.last().copied(),
reserved_word_set_id: self
.reserved_word_stack
.last()
.copied()
.unwrap_or(ReservedWordSetId::default()),
alias: self.alias_stack.last().cloned(),
field_name: self.field_name_stack.last().cloned(),
});
Ok(true)
true
}
_ => Ok(false),
_ => false,
}
}
}
@ -195,7 +128,7 @@ fn extract_choices(rule: Rule) -> Vec<Rule> {
let mut result = vec![Rule::Blank];
for element in elements {
let extraction = extract_choices(element);
let mut next_result = Vec::with_capacity(result.len());
let mut next_result = Vec::new();
for entry in result {
for extraction_entry in &extraction {
next_result.push(Rule::Seq(vec![entry.clone(), extraction_entry.clone()]));
@ -206,7 +139,7 @@ fn extract_choices(rule: Rule) -> Vec<Rule> {
result
}
Rule::Choice(elements) => {
let mut result = Vec::with_capacity(elements.len());
let mut result = Vec::new();
for element in elements {
for rule in extract_choices(element) {
result.push(rule);
@ -221,18 +154,26 @@ fn extract_choices(rule: Rule) -> Vec<Rule> {
params: params.clone(),
})
.collect(),
Rule::Reserved { rule, context_name } => extract_choices(*rule)
.into_iter()
.map(|rule| Rule::Reserved {
rule: Box::new(rule),
context_name: context_name.clone(),
})
.collect(),
_ => vec![rule],
}
}
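Conceptually, extract_choices distributes every embedded choice over the surrounding sequence, so each flattened alternative can become its own production. A hedged example (rule names hypothetical):
// extract_choices( seq(a, choice(b, c), d) )
//   ~> [ seq(a, b, d), seq(a, c, d) ]        // modulo nesting of the intermediate Seq nodes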
fn symbol_is_used(variables: &[SyntaxVariable], symbol: Symbol) -> bool {
fn flatten_variable(variable: Variable) -> SyntaxVariable {
let mut productions = Vec::new();
for rule in extract_choices(variable.rule) {
let production = RuleFlattener::new().flatten(rule);
if !productions.contains(&production) {
productions.push(production);
}
}
SyntaxVariable {
name: variable.name,
kind: variable.kind,
productions,
}
}
pub fn symbol_is_used(variables: &[SyntaxVariable], symbol: Symbol) -> bool {
for variable in variables {
for production in &variable.productions {
for step in &production.steps {
@ -245,48 +186,36 @@ fn symbol_is_used(variables: &[SyntaxVariable], symbol: Symbol) -> bool {
false
}
pub(super) fn flatten_grammar(
grammar: ExtractedSyntaxGrammar,
) -> FlattenGrammarResult<SyntaxGrammar> {
let mut reserved_word_set_ids_by_name = HashMap::new();
for (ix, set) in grammar.reserved_word_sets.iter().enumerate() {
reserved_word_set_ids_by_name.insert(set.name.clone(), ReservedWordSetId(ix));
pub(super) fn flatten_grammar(grammar: ExtractedSyntaxGrammar) -> Result<SyntaxGrammar> {
let mut variables = Vec::new();
for variable in grammar.variables {
variables.push(flatten_variable(variable));
}
let mut flattener = RuleFlattener::new(reserved_word_set_ids_by_name);
let variables = grammar
.variables
.into_iter()
.map(|variable| flattener.flatten_variable(variable))
.collect::<FlattenGrammarResult<Vec<_>>>()?;
for (i, variable) in variables.iter().enumerate() {
let symbol = Symbol::non_terminal(i);
let used = symbol_is_used(&variables, symbol);
for production in &variable.productions {
if used && production.steps.is_empty() {
Err(FlattenGrammarError::EmptyString(variable.name.clone()))?;
if production.steps.is_empty() && symbol_is_used(&variables, symbol) {
return Err(anyhow!(
"The rule `{}` matches the empty string.
Tree-sitter does not support syntactic rules that match the empty string
unless they are used only as the grammar's start rule.
",
variable.name
));
}
if grammar.variables_to_inline.contains(&symbol)
&& production.steps.iter().any(|step| step.symbol == symbol)
{
Err(FlattenGrammarError::RecursiveInline(variable.name.clone()))?;
return Err(anyhow!(
"Rule `{}` cannot be inlined because it contains a reference to itself.",
variable.name,
));
}
}
}
let mut reserved_word_sets = grammar
.reserved_word_sets
.into_iter()
.map(|set| set.reserved_words.into_iter().collect())
.collect::<Vec<_>>();
// If no default reserved word set is specified, there are no reserved words.
if reserved_word_sets.is_empty() {
reserved_word_sets.push(TokenSet::default());
}
Ok(SyntaxGrammar {
extra_symbols: grammar.extra_symbols,
expected_conflicts: grammar.expected_conflicts,
@ -295,7 +224,6 @@ pub(super) fn flatten_grammar(
external_tokens: grammar.external_tokens,
supertype_symbols: grammar.supertype_symbols,
word_token: grammar.word_token,
reserved_word_sets,
variables,
})
}
@ -303,35 +231,32 @@ pub(super) fn flatten_grammar(
#[cfg(test)]
mod tests {
use super::*;
use crate::grammars::VariableType;
use crate::generate::grammars::VariableType;
#[test]
fn test_flatten_grammar() {
let mut flattener = RuleFlattener::new(HashMap::default());
let result = flattener
.flatten_variable(Variable {
name: "test".to_string(),
kind: VariableType::Named,
rule: Rule::seq(vec![
Rule::non_terminal(1),
Rule::prec_left(
Precedence::Integer(101),
Rule::seq(vec![
Rule::non_terminal(2),
Rule::choice(vec![
Rule::prec_right(
Precedence::Integer(102),
Rule::seq(vec![Rule::non_terminal(3), Rule::non_terminal(4)]),
),
Rule::non_terminal(5),
]),
Rule::non_terminal(6),
let result = flatten_variable(Variable {
name: "test".to_string(),
kind: VariableType::Named,
rule: Rule::seq(vec![
Rule::non_terminal(1),
Rule::prec_left(
Precedence::Integer(101),
Rule::seq(vec![
Rule::non_terminal(2),
Rule::choice(vec![
Rule::prec_right(
Precedence::Integer(102),
Rule::seq(vec![Rule::non_terminal(3), Rule::non_terminal(4)]),
),
Rule::non_terminal(5),
]),
),
Rule::non_terminal(7),
]),
})
.unwrap();
Rule::non_terminal(6),
]),
),
Rule::non_terminal(7),
]),
});
assert_eq!(
result.productions,
@ -368,31 +293,28 @@ mod tests {
#[test]
fn test_flatten_grammar_with_maximum_dynamic_precedence() {
let mut flattener = RuleFlattener::new(HashMap::default());
let result = flattener
.flatten_variable(Variable {
name: "test".to_string(),
kind: VariableType::Named,
rule: Rule::seq(vec![
Rule::non_terminal(1),
Rule::prec_dynamic(
101,
Rule::seq(vec![
Rule::non_terminal(2),
Rule::choice(vec![
Rule::prec_dynamic(
102,
Rule::seq(vec![Rule::non_terminal(3), Rule::non_terminal(4)]),
),
Rule::non_terminal(5),
]),
Rule::non_terminal(6),
let result = flatten_variable(Variable {
name: "test".to_string(),
kind: VariableType::Named,
rule: Rule::seq(vec![
Rule::non_terminal(1),
Rule::prec_dynamic(
101,
Rule::seq(vec![
Rule::non_terminal(2),
Rule::choice(vec![
Rule::prec_dynamic(
102,
Rule::seq(vec![Rule::non_terminal(3), Rule::non_terminal(4)]),
),
Rule::non_terminal(5),
]),
),
Rule::non_terminal(7),
]),
})
.unwrap();
Rule::non_terminal(6),
]),
),
Rule::non_terminal(7),
]),
});
assert_eq!(
result.productions,
@ -424,17 +346,14 @@ mod tests {
#[test]
fn test_flatten_grammar_with_final_precedence() {
let mut flattener = RuleFlattener::new(HashMap::default());
let result = flattener
.flatten_variable(Variable {
name: "test".to_string(),
kind: VariableType::Named,
rule: Rule::prec_left(
Precedence::Integer(101),
Rule::seq(vec![Rule::non_terminal(1), Rule::non_terminal(2)]),
),
})
.unwrap();
let result = flatten_variable(Variable {
name: "test".to_string(),
kind: VariableType::Named,
rule: Rule::prec_left(
Precedence::Integer(101),
Rule::seq(vec![Rule::non_terminal(1), Rule::non_terminal(2)]),
),
});
assert_eq!(
result.productions,
@ -449,16 +368,14 @@ mod tests {
}]
);
let result = flattener
.flatten_variable(Variable {
name: "test".to_string(),
kind: VariableType::Named,
rule: Rule::prec_left(
Precedence::Integer(101),
Rule::seq(vec![Rule::non_terminal(1)]),
),
})
.unwrap();
let result = flatten_variable(Variable {
name: "test".to_string(),
kind: VariableType::Named,
rule: Rule::prec_left(
Precedence::Integer(101),
Rule::seq(vec![Rule::non_terminal(1)]),
),
});
assert_eq!(
result.productions,
@ -472,21 +389,18 @@ mod tests {
#[test]
fn test_flatten_grammar_with_field_names() {
let mut flattener = RuleFlattener::new(HashMap::default());
let result = flattener
.flatten_variable(Variable {
name: "test".to_string(),
kind: VariableType::Named,
rule: Rule::seq(vec![
Rule::field("first-thing".to_string(), Rule::terminal(1)),
Rule::terminal(2),
Rule::choice(vec![
Rule::Blank,
Rule::field("second-thing".to_string(), Rule::terminal(3)),
]),
let result = flatten_variable(Variable {
name: "test".to_string(),
kind: VariableType::Named,
rule: Rule::seq(vec![
Rule::field("first-thing".to_string(), Rule::terminal(1)),
Rule::terminal(2),
Rule::choice(vec![
Rule::Blank,
Rule::field("second-thing".to_string(), Rule::terminal(3)),
]),
})
.unwrap();
]),
});
assert_eq!(
result.productions,
@ -520,7 +434,6 @@ mod tests {
external_tokens: Vec::new(),
supertype_symbols: Vec::new(),
word_token: None,
reserved_word_sets: Vec::new(),
variables: vec![Variable {
name: "test".to_string(),
kind: VariableType::Named,
@ -534,7 +447,7 @@ mod tests {
assert_eq!(
result.unwrap_err().to_string(),
"Rule `test` cannot be inlined because it contains a reference to itself",
"Rule `test` cannot be inlined because it contains a reference to itself.",
);
}
}


@ -1,34 +1,16 @@
use log::warn;
use serde::Serialize;
use thiserror::Error;
use anyhow::{anyhow, Result};
use super::InternedGrammar;
use crate::{
grammars::{InputGrammar, ReservedWordContext, Variable, VariableType},
use crate::generate::{
grammars::{InputGrammar, Variable, VariableType},
rules::{Rule, Symbol},
};
pub type InternSymbolsResult<T> = Result<T, InternSymbolsError>;
#[derive(Debug, Error, Serialize)]
pub enum InternSymbolsError {
#[error("A grammar's start rule must be visible.")]
HiddenStartRule,
#[error("Undefined symbol `{0}`")]
Undefined(String),
#[error("Undefined symbol `{0}` in grammar's supertypes array")]
UndefinedSupertype(String),
#[error("Undefined symbol `{0}` in grammar's conflicts array")]
UndefinedConflict(String),
#[error("Undefined symbol `{0}` as grammar's word token")]
UndefinedWordToken(String),
}
pub(super) fn intern_symbols(grammar: &InputGrammar) -> InternSymbolsResult<InternedGrammar> {
pub(super) fn intern_symbols(grammar: &InputGrammar) -> Result<InternedGrammar> {
let interner = Interner { grammar };
if variable_type_for_name(&grammar.variables[0].name) == VariableType::Hidden {
Err(InternSymbolsError::HiddenStartRule)?;
return Err(anyhow!("A grammar's start rule must be visible."));
}
let mut variables = Vec::with_capacity(grammar.variables.len());
@ -58,31 +40,21 @@ pub(super) fn intern_symbols(grammar: &InputGrammar) -> InternSymbolsResult<Inte
let mut supertype_symbols = Vec::with_capacity(grammar.supertype_symbols.len());
for supertype_symbol_name in &grammar.supertype_symbols {
supertype_symbols.push(interner.intern_name(supertype_symbol_name).ok_or_else(|| {
InternSymbolsError::UndefinedSupertype(supertype_symbol_name.clone())
})?);
supertype_symbols.push(
interner
.intern_name(supertype_symbol_name)
.ok_or_else(|| anyhow!("Undefined symbol `{supertype_symbol_name}`"))?,
);
}
let mut reserved_words = Vec::with_capacity(grammar.reserved_words.len());
for reserved_word_set in &grammar.reserved_words {
let mut interned_set = Vec::with_capacity(reserved_word_set.reserved_words.len());
for rule in &reserved_word_set.reserved_words {
interned_set.push(interner.intern_rule(rule, None)?);
}
reserved_words.push(ReservedWordContext {
name: reserved_word_set.name.clone(),
reserved_words: interned_set,
});
}
let mut expected_conflicts = Vec::with_capacity(grammar.expected_conflicts.len());
let mut expected_conflicts = Vec::new();
for conflict in &grammar.expected_conflicts {
let mut interned_conflict = Vec::with_capacity(conflict.len());
for name in conflict {
interned_conflict.push(
interner
.intern_name(name)
.ok_or_else(|| InternSymbolsError::UndefinedConflict(name.clone()))?,
.ok_or_else(|| anyhow!("Undefined symbol `{name}`"))?,
);
}
expected_conflicts.push(interned_conflict);
@ -95,15 +67,14 @@ pub(super) fn intern_symbols(grammar: &InputGrammar) -> InternSymbolsResult<Inte
}
}
let word_token = if let Some(name) = grammar.word_token.as_ref() {
Some(
let mut word_token = None;
if let Some(name) = grammar.word_token.as_ref() {
word_token = Some(
interner
.intern_name(name)
.ok_or_else(|| InternSymbolsError::UndefinedWordToken(name.clone()))?,
)
} else {
None
};
.ok_or_else(|| anyhow!("Undefined symbol `{name}`"))?,
);
}
for (i, variable) in variables.iter_mut().enumerate() {
if supertype_symbols.contains(&Symbol::non_terminal(i)) {
@ -120,7 +91,6 @@ pub(super) fn intern_symbols(grammar: &InputGrammar) -> InternSymbolsResult<Inte
supertype_symbols,
word_token,
precedence_orderings: grammar.precedence_orderings.clone(),
reserved_word_sets: reserved_words,
})
}
@ -128,11 +98,11 @@ struct Interner<'a> {
grammar: &'a InputGrammar,
}
impl Interner<'_> {
fn intern_rule(&self, rule: &Rule, name: Option<&str>) -> InternSymbolsResult<Rule> {
impl<'a> Interner<'a> {
fn intern_rule(&self, rule: &Rule, name: Option<&str>) -> Result<Rule> {
match rule {
Rule::Choice(elements) => {
self.check_single(elements, name, "choice");
self.check_single(elements, name);
let mut result = Vec::with_capacity(elements.len());
for element in elements {
result.push(self.intern_rule(element, name)?);
@ -140,7 +110,7 @@ impl Interner<'_> {
Ok(Rule::Choice(result))
}
Rule::Seq(elements) => {
self.check_single(elements, name, "seq");
self.check_single(elements, name);
let mut result = Vec::with_capacity(elements.len());
for element in elements {
result.push(self.intern_rule(element, name)?);
@ -152,12 +122,8 @@ impl Interner<'_> {
rule: Box::new(self.intern_rule(rule, name)?),
params: params.clone(),
}),
Rule::Reserved { rule, context_name } => Ok(Rule::Reserved {
rule: Box::new(self.intern_rule(rule, name)?),
context_name: context_name.clone(),
}),
Rule::NamedSymbol(name) => self.intern_name(name).map_or_else(
|| Err(InternSymbolsError::Undefined(name.clone())),
|| Err(anyhow!("Undefined symbol `{name}`")),
|symbol| Ok(Rule::Symbol(symbol)),
),
_ => Ok(rule.clone()),
@ -184,10 +150,10 @@ impl Interner<'_> {
// In the case of a seq or choice rule of 1 element in a hidden rule, weird
// inconsistent behavior with queries can occur. So we should warn the user about it.
fn check_single(&self, elements: &[Rule], name: Option<&str>, kind: &str) {
fn check_single(&self, elements: &[Rule], name: Option<&str>) {
if elements.len() == 1 && matches!(elements[0], Rule::String(_) | Rule::Pattern(_, _)) {
warn!(
"rule {} contains a `{kind}` rule with a single element. This is unnecessary.",
eprintln!(
"Warning: rule {} is just a `seq` or `choice` rule with a single element. This is unnecessary.",
name.unwrap_or_default()
);
}
@ -278,9 +244,10 @@ mod tests {
fn test_grammar_with_undefined_symbols() {
let result = intern_symbols(&build_grammar(vec![Variable::named("x", Rule::named("y"))]));
assert!(result.is_err(), "Expected an error but got none");
let e = result.err().unwrap();
assert_eq!(e.to_string(), "Undefined symbol `y`");
match result {
Err(e) => assert_eq!(e.to_string(), "Undefined symbol `y`"),
_ => panic!("Expected an error but got none"),
}
}
fn build_grammar(variables: Vec<Variable>) -> InputGrammar {


@ -8,18 +8,12 @@ mod process_inlines;
use std::{
cmp::Ordering,
collections::{hash_map, BTreeSet, HashMap, HashSet},
collections::{hash_map, HashMap, HashSet},
mem,
};
pub use expand_tokens::ExpandTokensError;
pub use extract_tokens::ExtractTokensError;
pub use flatten_grammar::FlattenGrammarError;
use indexmap::IndexMap;
pub use intern_symbols::InternSymbolsError;
pub use process_inlines::ProcessInlinesError;
use serde::Serialize;
use thiserror::Error;
use anyhow::{anyhow, Result};
pub(super) use flatten_grammar::symbol_is_used;
pub use self::expand_tokens::expand_tokens;
use self::{
@ -34,7 +28,6 @@ use super::{
},
rules::{AliasMap, Precedence, Rule, Symbol},
};
use crate::grammars::ReservedWordContext;
pub struct IntermediateGrammar<T, U> {
variables: Vec<Variable>,
@ -45,7 +38,6 @@ pub struct IntermediateGrammar<T, U> {
variables_to_inline: Vec<Symbol>,
supertype_symbols: Vec<Symbol>,
word_token: Option<Symbol>,
reserved_word_sets: Vec<ReservedWordContext<T>>,
}
pub type InternedGrammar = IntermediateGrammar<Rule, Variable>;
@ -69,96 +61,21 @@ impl<T, U> Default for IntermediateGrammar<T, U> {
variables_to_inline: Vec::default(),
supertype_symbols: Vec::default(),
word_token: Option::default(),
reserved_word_sets: Vec::default(),
}
}
}
pub type PrepareGrammarResult<T> = Result<T, PrepareGrammarError>;
#[derive(Debug, Error, Serialize)]
#[error(transparent)]
pub enum PrepareGrammarError {
ValidatePrecedences(#[from] ValidatePrecedenceError),
ValidateIndirectRecursion(#[from] IndirectRecursionError),
InternSymbols(#[from] InternSymbolsError),
ExtractTokens(#[from] ExtractTokensError),
FlattenGrammar(#[from] FlattenGrammarError),
ExpandTokens(#[from] ExpandTokensError),
ProcessInlines(#[from] ProcessInlinesError),
}
pub type ValidatePrecedenceResult<T> = Result<T, ValidatePrecedenceError>;
#[derive(Debug, Error, Serialize)]
#[error(transparent)]
pub enum ValidatePrecedenceError {
Undeclared(#[from] UndeclaredPrecedenceError),
Ordering(#[from] ConflictingPrecedenceOrderingError),
}
#[derive(Debug, Error, Serialize)]
pub struct IndirectRecursionError(pub Vec<String>);
impl std::fmt::Display for IndirectRecursionError {
fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
write!(f, "Grammar contains an indirectly recursive rule: ")?;
for (i, symbol) in self.0.iter().enumerate() {
if i > 0 {
write!(f, " -> ")?;
}
write!(f, "{symbol}")?;
}
Ok(())
}
}
#[derive(Debug, Error, Serialize)]
pub struct UndeclaredPrecedenceError {
pub precedence: String,
pub rule: String,
}
impl std::fmt::Display for UndeclaredPrecedenceError {
fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
write!(
f,
"Undeclared precedence '{}' in rule '{}'",
self.precedence, self.rule
)?;
Ok(())
}
}
#[derive(Debug, Error, Serialize)]
pub struct ConflictingPrecedenceOrderingError {
pub precedence_1: String,
pub precedence_2: String,
}
impl std::fmt::Display for ConflictingPrecedenceOrderingError {
fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
write!(
f,
"Conflicting orderings for precedences {} and {}",
self.precedence_1, self.precedence_2
)?;
Ok(())
}
}
/// Transform an input grammar into separate components that are ready
/// for parse table construction.
pub fn prepare_grammar(
input_grammar: &InputGrammar,
) -> PrepareGrammarResult<(
) -> Result<(
SyntaxGrammar,
LexicalGrammar,
InlinedProductionMap,
AliasMap,
)> {
validate_precedences(input_grammar)?;
validate_indirect_recursion(input_grammar)?;
let interned_grammar = intern_symbols(input_grammar)?;
let (syntax_grammar, lexical_grammar) = extract_tokens(interned_grammar)?;
@ -170,115 +87,10 @@ pub fn prepare_grammar(
Ok((syntax_grammar, lexical_grammar, inlines, default_aliases))
}
/// Check for indirect recursion cycles in the grammar that can cause infinite loops while
/// parsing. An indirect recursion cycle occurs when a non-terminal can derive itself through
/// a chain of single-symbol productions (e.g., A -> B, B -> A).
fn validate_indirect_recursion(grammar: &InputGrammar) -> Result<(), IndirectRecursionError> {
let mut epsilon_transitions: IndexMap<&str, BTreeSet<String>> = IndexMap::new();
for variable in &grammar.variables {
let productions = get_single_symbol_productions(&variable.rule);
// Filter out rules that *directly* reference themselves, as this doesn't
// cause a parsing loop.
let filtered: BTreeSet<String> = productions
.into_iter()
.filter(|s| s != &variable.name)
.collect();
epsilon_transitions.insert(variable.name.as_str(), filtered);
}
for start_symbol in epsilon_transitions.keys() {
let mut visited = BTreeSet::new();
let mut path = Vec::new();
if let Some((start_idx, end_idx)) =
get_cycle(start_symbol, &epsilon_transitions, &mut visited, &mut path)
{
let cycle_symbols = path[start_idx..=end_idx]
.iter()
.map(|s| (*s).to_string())
.collect();
return Err(IndirectRecursionError(cycle_symbols));
}
}
Ok(())
}
fn get_single_symbol_productions(rule: &Rule) -> BTreeSet<String> {
match rule {
Rule::NamedSymbol(name) => BTreeSet::from([name.clone()]),
Rule::Choice(choices) => choices
.iter()
.flat_map(get_single_symbol_productions)
.collect(),
Rule::Metadata { rule, .. } => get_single_symbol_productions(rule),
_ => BTreeSet::new(),
}
}
/// Perform a depth-first search to detect cycles in single state transitions.
fn get_cycle<'a>(
current: &'a str,
transitions: &'a IndexMap<&'a str, BTreeSet<String>>,
visited: &mut BTreeSet<&'a str>,
path: &mut Vec<&'a str>,
) -> Option<(usize, usize)> {
if let Some(first_idx) = path.iter().position(|s| *s == current) {
path.push(current);
return Some((first_idx, path.len() - 1));
}
if visited.contains(current) {
return None;
}
path.push(current);
visited.insert(current);
if let Some(next_symbols) = transitions.get(current) {
for next in next_symbols {
if let Some(cycle) = get_cycle(next, transitions, visited, path) {
return Some(cycle);
}
}
}
path.pop();
None
}
/// Check that all of the named precedences used in the grammar are declared
/// within the `precedences` lists, and also that there are no conflicting
/// precedence orderings declared in those lists.
fn validate_precedences(grammar: &InputGrammar) -> ValidatePrecedenceResult<()> {
// Check that no rule contains a named precedence that is not present in
// any of the `precedences` lists.
fn validate(
rule_name: &str,
rule: &Rule,
names: &HashSet<&String>,
) -> ValidatePrecedenceResult<()> {
match rule {
Rule::Repeat(rule) => validate(rule_name, rule, names),
Rule::Seq(elements) | Rule::Choice(elements) => elements
.iter()
.try_for_each(|e| validate(rule_name, e, names)),
Rule::Metadata { rule, params } => {
if let Precedence::Name(n) = &params.precedence {
if !names.contains(n) {
Err(UndeclaredPrecedenceError {
precedence: n.clone(),
rule: rule_name.to_string(),
})?;
}
}
validate(rule_name, rule, names)?;
Ok(())
}
_ => Ok(()),
}
}
fn validate_precedences(grammar: &InputGrammar) -> Result<()> {
// For any two precedence names `a` and `b`, if `a` comes before `b`
// in some list, then it cannot come *after* `b` in any list.
let mut pairs = HashMap::new();
@ -299,10 +111,9 @@ fn validate_precedences(grammar: &InputGrammar) -> ValidatePrecedenceResult<()>
}
hash_map::Entry::Occupied(e) => {
if e.get() != &ordering {
Err(ConflictingPrecedenceOrderingError {
precedence_1: entry1.to_string(),
precedence_2: entry2.to_string(),
})?;
return Err(anyhow!(
"Conflicting orderings for precedences {entry1} and {entry2}",
));
}
}
}
@ -310,6 +121,27 @@ fn validate_precedences(grammar: &InputGrammar) -> ValidatePrecedenceResult<()>
}
}
// Check that no rule contains a named precedence that is not present in
// any of the `precedences` lists.
fn validate(rule_name: &str, rule: &Rule, names: &HashSet<&String>) -> Result<()> {
match rule {
Rule::Repeat(rule) => validate(rule_name, rule, names),
Rule::Seq(elements) | Rule::Choice(elements) => elements
.iter()
.try_for_each(|e| validate(rule_name, e, names)),
Rule::Metadata { rule, params } => {
if let Precedence::Name(n) = &params.precedence {
if !names.contains(n) {
return Err(anyhow!("Undeclared precedence '{n}' in rule '{rule_name}'"));
}
}
validate(rule_name, rule, names)?;
Ok(())
}
_ => Ok(()),
}
}
let precedence_names = grammar
.precedence_orderings
.iter()
@ -332,7 +164,7 @@ fn validate_precedences(grammar: &InputGrammar) -> ValidatePrecedenceResult<()>
#[cfg(test)]
mod tests {
use super::*;
use crate::grammars::VariableType;
use crate::generate::grammars::VariableType;
#[test]
fn test_validate_precedences_with_undeclared_precedence() {

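To make the ordering rule above concrete ("if `a` comes before `b` in some list, then it cannot come after `b` in any list"), here is a small self-contained model of the pairwise check; it is a sketch with hypothetical precedence names, not the real tree-sitter types:

use std::{cmp::Ordering, collections::HashMap};

// Records `Less` for every (earlier, later) pair in each list and reports the first
// pair whose ordering contradicts one recorded previously.
fn conflicting_pair(lists: &[Vec<&str>]) -> Option<(String, String)> {
    let mut pairs: HashMap<(String, String), Ordering> = HashMap::new();
    for list in lists {
        for (i, a) in list.iter().enumerate() {
            for b in &list[i + 1..] {
                let key = (a.to_string(), b.to_string());
                if let Some(&prev) = pairs.get(&key) {
                    if prev != Ordering::Less {
                        return Some(key);
                    }
                } else {
                    pairs.insert(key, Ordering::Less);
                    pairs.insert((b.to_string(), a.to_string()), Ordering::Greater);
                }
            }
        }
    }
    None
}

fn main() {
    // ["member", "call"] and ["call", "member"] declare opposite orderings for the
    // same pair, which is exactly the "Conflicting orderings" error case above.
    let lists = [vec!["member", "call"], vec!["call", "member"]];
    assert_eq!(
        conflicting_pair(&lists),
        Some(("call".to_string(), "member".to_string()))
    );
}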

@ -1,9 +1,8 @@
use std::collections::HashMap;
use serde::Serialize;
use thiserror::Error;
use anyhow::{anyhow, Result};
use crate::{
use crate::generate::{
grammars::{InlinedProductionMap, LexicalGrammar, Production, ProductionStep, SyntaxGrammar},
rules::SymbolType,
};
@ -70,13 +69,12 @@ impl InlinedProductionMapBuilder {
let production_map = production_indices_by_step_id
.into_iter()
.map(|(step_id, production_indices)| {
let production =
core::ptr::from_ref::<Production>(step_id.variable_index.map_or_else(
|| &productions[step_id.production_index],
|variable_index| {
&grammar.variables[variable_index].productions[step_id.production_index]
},
));
let production = step_id.variable_index.map_or_else(
|| &productions[step_id.production_index],
|variable_index| {
&grammar.variables[variable_index].productions[step_id.production_index]
},
) as *const Production;
((production, step_id.step_index as u32), production_indices)
})
.collect();
@ -189,38 +187,29 @@ impl InlinedProductionMapBuilder {
}
}
pub type ProcessInlinesResult<T> = Result<T, ProcessInlinesError>;
#[derive(Debug, Error, Serialize)]
pub enum ProcessInlinesError {
#[error("External token `{0}` cannot be inlined")]
ExternalToken(String),
#[error("Token `{0}` cannot be inlined")]
Token(String),
#[error("Rule `{0}` cannot be inlined because it is the first rule")]
FirstRule(String),
}
pub(super) fn process_inlines(
grammar: &SyntaxGrammar,
lexical_grammar: &LexicalGrammar,
) -> ProcessInlinesResult<InlinedProductionMap> {
) -> Result<InlinedProductionMap> {
for symbol in &grammar.variables_to_inline {
match symbol.kind {
SymbolType::External => {
Err(ProcessInlinesError::ExternalToken(
grammar.external_tokens[symbol.index].name.clone(),
))?;
return Err(anyhow!(
"External token `{}` cannot be inlined",
grammar.external_tokens[symbol.index].name
))
}
SymbolType::Terminal => {
Err(ProcessInlinesError::Token(
lexical_grammar.variables[symbol.index].name.clone(),
))?;
return Err(anyhow!(
"Token `{}` cannot be inlined",
lexical_grammar.variables[symbol.index].name,
))
}
SymbolType::NonTerminal if symbol.index == 0 => {
Err(ProcessInlinesError::FirstRule(
grammar.variables[symbol.index].name.clone(),
))?;
return Err(anyhow!(
"Rule `{}` cannot be inlined because it is the first rule",
grammar.variables[symbol.index].name,
))
}
_ => {}
}
@ -236,7 +225,7 @@ pub(super) fn process_inlines(
#[cfg(test)]
mod tests {
use super::*;
use crate::{
use crate::generate::{
grammars::{LexicalVariable, SyntaxVariable, VariableType},
rules::{Associativity, Precedence, Symbol},
};
@ -549,9 +538,10 @@ mod tests {
..Default::default()
};
let result = process_inlines(&grammar, &lexical_grammar);
assert!(result.is_err(), "expected an error, but got none");
let err = result.err().unwrap();
assert_eq!(err.to_string(), "Token `something` cannot be inlined",);
if let Err(error) = process_inlines(&grammar, &lexical_grammar) {
assert_eq!(error.to_string(), "Token `something` cannot be inlined");
} else {
panic!("expected an error, but got none");
}
}
}

File diff suppressed because one or more lines are too long


@ -0,0 +1 @@
{"Other":"C","Control":"Cc","cntrl":"Cc","Format":"Cf","Unassigned":"Cn","Private_Use":"Co","Surrogate":"Cs","Letter":"L","Cased_Letter":"LC","Lowercase_Letter":"Ll","Modifier_Letter":"Lm","Other_Letter":"Lo","Titlecase_Letter":"Lt","Uppercase_Letter":"Lu","Mark":"M","Combining_Mark":"M","Spacing_Mark":"Mc","Enclosing_Mark":"Me","Nonspacing_Mark":"Mn","Number":"N","Decimal_Number":"Nd","digit":"Nd","Letter_Number":"Nl","Other_Number":"No","Punctuation":"P","punct":"P","Connector_Punctuation":"Pc","Dash_Punctuation":"Pd","Close_Punctuation":"Pe","Final_Punctuation":"Pf","Initial_Punctuation":"Pi","Other_Punctuation":"Po","Open_Punctuation":"Ps","Symbol":"S","Currency_Symbol":"Sc","Modifier_Symbol":"Sk","Math_Symbol":"Sm","Other_Symbol":"So","Separator":"Z","Line_Separator":"Zl","Paragraph_Separator":"Zp","Space_Separator":"Zs"}

File diff suppressed because one or more lines are too long


@ -0,0 +1 @@
{"cjkAccountingNumeric":"kAccountingNumeric","cjkOtherNumeric":"kOtherNumeric","cjkPrimaryNumeric":"kPrimaryNumeric","nv":"Numeric_Value","bmg":"Bidi_Mirroring_Glyph","bpb":"Bidi_Paired_Bracket","cf":"Case_Folding","cjkCompatibilityVariant":"kCompatibilityVariant","dm":"Decomposition_Mapping","EqUIdeo":"Equivalent_Unified_Ideograph","FC_NFKC":"FC_NFKC_Closure","lc":"Lowercase_Mapping","NFKC_CF":"NFKC_Casefold","NFKC_SCF":"NFKC_Simple_Casefold","scf":"Simple_Case_Folding","sfc":"Simple_Case_Folding","slc":"Simple_Lowercase_Mapping","stc":"Simple_Titlecase_Mapping","suc":"Simple_Uppercase_Mapping","tc":"Titlecase_Mapping","uc":"Uppercase_Mapping","cjkIICore":"kIICore","cjkIRG_GSource":"kIRG_GSource","cjkIRG_HSource":"kIRG_HSource","cjkIRG_JSource":"kIRG_JSource","cjkIRG_KPSource":"kIRG_KPSource","cjkIRG_KSource":"kIRG_KSource","cjkIRG_MSource":"kIRG_MSource","cjkIRG_SSource":"kIRG_SSource","cjkIRG_TSource":"kIRG_TSource","cjkIRG_UKSource":"kIRG_UKSource","cjkIRG_USource":"kIRG_USource","cjkIRG_VSource":"kIRG_VSource","cjkRSUnicode":"kRSUnicode","Unicode_Radical_Stroke":"kRSUnicode","URS":"kRSUnicode","isc":"ISO_Comment","JSN":"Jamo_Short_Name","na":"Name","na1":"Unicode_1_Name","Name_Alias":"Name_Alias","scx":"Script_Extensions","age":"Age","blk":"Block","sc":"Script","bc":"Bidi_Class","bpt":"Bidi_Paired_Bracket_Type","ccc":"Canonical_Combining_Class","dt":"Decomposition_Type","ea":"East_Asian_Width","gc":"General_Category","GCB":"Grapheme_Cluster_Break","hst":"Hangul_Syllable_Type","InCB":"Indic_Conjunct_Break","InPC":"Indic_Positional_Category","InSC":"Indic_Syllabic_Category","jg":"Joining_Group","jt":"Joining_Type","lb":"Line_Break","NFC_QC":"NFC_Quick_Check","NFD_QC":"NFD_Quick_Check","NFKC_QC":"NFKC_Quick_Check","NFKD_QC":"NFKD_Quick_Check","nt":"Numeric_Type","SB":"Sentence_Break","vo":"Vertical_Orientation","WB":"Word_Break","AHex":"ASCII_Hex_Digit","Alpha":"Alphabetic","Bidi_C":"Bidi_Control","Bidi_M":"Bidi_Mirrored","Cased":"Cased","CE":"Composition_Exclusion","CI":"Case_Ignorable","Comp_Ex":"Full_Composition_Exclusion","CWCF":"Changes_When_Casefolded","CWCM":"Changes_When_Casemapped","CWKCF":"Changes_When_NFKC_Casefolded","CWL":"Changes_When_Lowercased","CWT":"Changes_When_Titlecased","CWU":"Changes_When_Uppercased","Dash":"Dash","Dep":"Deprecated","DI":"Default_Ignorable_Code_Point","Dia":"Diacritic","EBase":"Emoji_Modifier_Base","EComp":"Emoji_Component","EMod":"Emoji_Modifier","Emoji":"Emoji","EPres":"Emoji_Presentation","Ext":"Extender","ExtPict":"Extended_Pictographic","Gr_Base":"Grapheme_Base","Gr_Ext":"Grapheme_Extend","Gr_Link":"Grapheme_Link","Hex":"Hex_Digit","Hyphen":"Hyphen","ID_Compat_Math_Continue":"ID_Compat_Math_Continue","ID_Compat_Math_Start":"ID_Compat_Math_Start","IDC":"ID_Continue","Ideo":"Ideographic","IDS":"ID_Start","IDSB":"IDS_Binary_Operator","IDST":"IDS_Trinary_Operator","IDSU":"IDS_Unary_Operator","Join_C":"Join_Control","LOE":"Logical_Order_Exception","Lower":"Lowercase","Math":"Math","NChar":"Noncharacter_Code_Point","OAlpha":"Other_Alphabetic","ODI":"Other_Default_Ignorable_Code_Point","OGr_Ext":"Other_Grapheme_Extend","OIDC":"Other_ID_Continue","OIDS":"Other_ID_Start","OLower":"Other_Lowercase","OMath":"Other_Math","OUpper":"Other_Uppercase","Pat_Syn":"Pattern_Syntax","Pat_WS":"Pattern_White_Space","PCM":"Prepended_Concatenation_Mark","QMark":"Quotation_Mark","Radical":"Radical","RI":"Regional_Indicator","SD":"Soft_Dotted","STerm":"Sentence_Terminal","Term":"Terminal_Punctuation","UIdeo":"Unified_Ideograph","Upper":"Uppercase","VS":"Variation_Selecto
r","WSpace":"White_Space","space":"White_Space","XIDC":"XID_Continue","XIDS":"XID_Start","XO_NFC":"Expands_On_NFC","XO_NFD":"Expands_On_NFD","XO_NFKC":"Expands_On_NFKC","XO_NFKD":"Expands_On_NFKD"}


@ -1,19 +1,15 @@
use std::{
cmp,
collections::{BTreeMap, BTreeSet, HashMap, HashSet},
collections::{HashMap, HashSet},
fmt::Write,
mem::swap,
};
use crate::LANGUAGE_VERSION;
use indoc::indoc;
use super::{
build_tables::Tables,
grammars::{ExternalToken, LexicalGrammar, SyntaxGrammar, VariableType},
nfa::CharacterSet,
node_types::ChildType,
rules::{Alias, AliasMap, Symbol, SymbolType, TokenSet},
rules::{Alias, AliasMap, Symbol, SymbolType},
tables::{
AdvanceAction, FieldLocation, GotoAction, LexState, LexTable, ParseAction, ParseTable,
ParseTableEntry,
@ -21,11 +17,10 @@ use super::{
};
const SMALL_STATE_THRESHOLD: usize = 64;
pub const ABI_VERSION_MIN: usize = 14;
pub const ABI_VERSION_MAX: usize = LANGUAGE_VERSION;
const ABI_VERSION_WITH_RESERVED_WORDS: usize = 15;
const ABI_VERSION_MIN: usize = 13;
const ABI_VERSION_MAX: usize = tree_sitter::LANGUAGE_VERSION;
const ABI_VERSION_WITH_PRIMARY_STATES: usize = 14;
#[clippy::format_args]
macro_rules! add {
($this: tt, $($arg: tt)*) => {{
$this.buffer.write_fmt(format_args!($($arg)*)).unwrap();
@ -34,15 +29,12 @@ macro_rules! add {
macro_rules! add_whitespace {
($this:tt) => {{
// 4 bytes per char, 2 spaces per indent level
$this.buffer.reserve(4 * 2 * $this.indent_level);
for _ in 0..$this.indent_level {
write!(&mut $this.buffer, " ").unwrap();
}
}};
}
#[clippy::format_args]
macro_rules! add_line {
($this: tt, $($arg: tt)*) => {
add_whitespace!($this);
@ -64,7 +56,6 @@ macro_rules! dedent {
};
}
#[derive(Default)]
struct Generator {
buffer: String,
indent_level: usize,
@ -75,6 +66,7 @@ struct Generator {
large_character_sets: Vec<(Option<Symbol>, CharacterSet)>,
large_character_set_info: Vec<LargeCharacterSetInfo>,
large_state_count: usize,
keyword_capture_token: Option<Symbol>,
syntax_grammar: SyntaxGrammar,
lexical_grammar: LexicalGrammar,
default_aliases: AliasMap,
@ -83,13 +75,10 @@ struct Generator {
alias_ids: HashMap<Alias, String>,
unique_aliases: Vec<Alias>,
symbol_map: HashMap<Symbol, Symbol>,
reserved_word_sets: Vec<TokenSet>,
reserved_word_set_ids_by_parse_state: Vec<usize>,
field_names: Vec<String>,
supertype_symbol_map: BTreeMap<Symbol, Vec<ChildType>>,
supertype_map: BTreeMap<String, Vec<ChildType>>,
#[allow(unused)]
abi_version: usize,
metadata: Option<Metadata>,
}
struct LargeCharacterSetInfo {
@ -97,16 +86,9 @@ struct LargeCharacterSetInfo {
is_used: bool,
}
struct Metadata {
major_version: u8,
minor_version: u8,
patch_version: u8,
}
impl Generator {
fn generate(mut self) -> String {
self.init();
self.add_header();
self.add_includes();
self.add_pragmas();
self.add_stats();
@ -126,10 +108,9 @@ impl Generator {
}
self.add_non_terminal_alias_map();
self.add_primary_state_id_list();
if self.abi_version >= ABI_VERSION_WITH_RESERVED_WORDS && !self.supertype_map.is_empty() {
self.add_supertype_map();
if self.abi_version >= ABI_VERSION_WITH_PRIMARY_STATES {
self.add_primary_state_id_list();
}
let buffer_offset_before_lex_functions = self.buffer.len();
@ -138,7 +119,7 @@ impl Generator {
swap(&mut main_lex_table, &mut self.main_lex_table);
self.add_lex_function("ts_lex", main_lex_table);
if self.syntax_grammar.word_token.is_some() {
if self.keyword_capture_token.is_some() {
let mut keyword_lex_table = LexTable::default();
swap(&mut keyword_lex_table, &mut self.keyword_lex_table);
self.add_lex_function("ts_lex_keywords", keyword_lex_table);
@ -154,13 +135,7 @@ impl Generator {
}
self.buffer.push_str(&lex_functions);
self.add_lex_modes();
if self.abi_version >= ABI_VERSION_WITH_RESERVED_WORDS && self.reserved_word_sets.len() > 1
{
self.add_reserved_word_sets();
}
self.add_lex_modes_list();
self.add_parse_table();
if !self.syntax_grammar.external_tokens.is_empty() {
@ -241,24 +216,33 @@ impl Generator {
for alias in &production_info.alias_sequence {
// Generate a mapping from aliases to C identifiers.
if let Some(alias) = &alias {
// Some aliases match an existing symbol in the grammar.
let alias_id =
if let Some(existing_symbol) = self.symbols_for_alias(alias).first() {
self.symbol_ids[&self.symbol_map[existing_symbol]].clone()
}
// Other aliases don't match any existing symbol, and need their own
// identifiers.
else {
if let Err(i) = self.unique_aliases.binary_search(alias) {
self.unique_aliases.insert(i, alias.clone());
}
let existing_symbol = self.parse_table.symbols.iter().copied().find(|symbol| {
self.default_aliases.get(symbol).map_or_else(
|| {
let (name, kind) = self.metadata_for_symbol(*symbol);
name == alias.value && kind == alias.kind()
},
|default_alias| default_alias == alias,
)
});
if alias.is_named {
format!("alias_sym_{}", self.sanitize_identifier(&alias.value))
} else {
format!("anon_alias_sym_{}", self.sanitize_identifier(&alias.value))
}
};
// Some aliases match an existing symbol in the grammar.
let alias_id = if let Some(existing_symbol) = existing_symbol {
self.symbol_ids[&self.symbol_map[&existing_symbol]].clone()
}
// Other aliases don't match any existing symbol, and need their own
// identifiers.
else {
if let Err(i) = self.unique_aliases.binary_search(alias) {
self.unique_aliases.insert(i, alias.clone());
}
if alias.is_named {
format!("alias_sym_{}", self.sanitize_identifier(&alias.value))
} else {
format!("anon_alias_sym_{}", self.sanitize_identifier(&alias.value))
}
};
self.alias_ids.entry(alias.clone()).or_insert(alias_id);
}
@ -282,34 +266,6 @@ impl Generator {
});
}
// Assign an id to each unique reserved word set
self.reserved_word_sets.push(TokenSet::new());
for state in &self.parse_table.states {
let id = if let Some(ix) = self
.reserved_word_sets
.iter()
.position(|set| *set == state.reserved_words)
{
ix
} else {
self.reserved_word_sets.push(state.reserved_words.clone());
self.reserved_word_sets.len() - 1
};
self.reserved_word_set_ids_by_parse_state.push(id);
}
if self.abi_version >= ABI_VERSION_WITH_RESERVED_WORDS {
for (supertype, subtypes) in &self.supertype_symbol_map {
if let Some(supertype) = self.symbol_ids.get(supertype) {
self.supertype_map
.entry(supertype.clone())
.or_insert_with(|| subtypes.clone());
}
}
self.supertype_symbol_map.clear();
}
// Determine which states should use the "small state" representation, and which should
// use the normal array representation.
let threshold = cmp::min(SMALL_STATE_THRESHOLD, self.parse_table.symbols.len() / 2);
@ -324,11 +280,6 @@ impl Generator {
.count();
}
fn add_header(&mut self) {
add_line!(self, "/* Automatically @generated by tree-sitter */",);
add_line!(self, "");
}
fn add_includes(&mut self) {
add_line!(self, "#include \"tree_sitter/parser.h\"");
add_line!(self, "");
@ -390,7 +341,7 @@ impl Generator {
self.parse_table.symbols.len()
);
add_line!(self, "#define ALIAS_COUNT {}", self.unique_aliases.len());
add_line!(self, "#define TOKEN_COUNT {token_count}");
add_line!(self, "#define TOKEN_COUNT {}", token_count);
add_line!(
self,
"#define EXTERNAL_TOKEN_COUNT {}",
@ -402,22 +353,11 @@ impl Generator {
"#define MAX_ALIAS_SEQUENCE_LENGTH {}",
self.parse_table.max_aliased_production_length
);
add_line!(
self,
"#define MAX_RESERVED_WORD_SET_SIZE {}",
self.reserved_word_sets
.iter()
.map(TokenSet::len)
.max()
.unwrap()
);
add_line!(
self,
"#define PRODUCTION_ID_COUNT {}",
self.parse_table.production_infos.len()
);
add_line!(self, "#define SUPERTYPE_COUNT {}", self.supertype_map.len());
add_line!(self, "");
}
@ -679,32 +619,31 @@ impl Generator {
&mut next_flat_field_map_index,
);
let mut field_map_ids = Vec::with_capacity(self.parse_table.production_infos.len());
let mut field_map_ids = Vec::new();
for production_info in &self.parse_table.production_infos {
if production_info.field_map.is_empty() {
field_map_ids.push((0, 0));
} else {
let mut flat_field_map = Vec::with_capacity(production_info.field_map.len());
let mut flat_field_map = Vec::new();
for (field_name, locations) in &production_info.field_map {
for location in locations {
flat_field_map.push((field_name.clone(), *location));
}
}
let field_map_len = flat_field_map.len();
field_map_ids.push((
self.get_field_map_id(
flat_field_map,
flat_field_map.clone(),
&mut flat_field_maps,
&mut next_flat_field_map_index,
),
field_map_len,
flat_field_map.len(),
));
}
}
add_line!(
self,
"static const TSMapSlice ts_field_map_slices[PRODUCTION_ID_COUNT] = {{",
"static const TSFieldMapSlice ts_field_map_slices[PRODUCTION_ID_COUNT] = {{",
);
indent!(self);
for (production_id, (row_id, length)) in field_map_ids.into_iter().enumerate() {
@ -743,83 +682,6 @@ impl Generator {
add_line!(self, "");
}
fn add_supertype_map(&mut self) {
add_line!(
self,
"static const TSSymbol ts_supertype_symbols[SUPERTYPE_COUNT] = {{"
);
indent!(self);
for supertype in self.supertype_map.keys() {
add_line!(self, "{supertype},");
}
dedent!(self);
add_line!(self, "}};\n");
add_line!(
self,
"static const TSMapSlice ts_supertype_map_slices[] = {{",
);
indent!(self);
let mut row_id = 0;
let mut supertype_ids = vec![0];
let mut supertype_string_map = BTreeMap::new();
for (supertype, subtypes) in &self.supertype_map {
supertype_string_map.insert(
supertype,
subtypes
.iter()
.flat_map(|s| match s {
ChildType::Normal(symbol) => vec![self.symbol_ids.get(symbol).cloned()],
ChildType::Aliased(alias) => {
self.alias_ids.get(alias).cloned().map_or_else(
|| {
self.symbols_for_alias(alias)
.into_iter()
.map(|s| self.symbol_ids.get(&s).cloned())
.collect()
},
|a| vec![Some(a)],
)
}
})
.flatten()
.collect::<BTreeSet<String>>(),
);
}
for (supertype, subtypes) in &supertype_string_map {
let length = subtypes.len();
add_line!(
self,
"[{supertype}] = {{.index = {row_id}, .length = {length}}},",
);
row_id += length;
supertype_ids.push(row_id);
}
dedent!(self);
add_line!(self, "}};");
add_line!(self, "");
add_line!(
self,
"static const TSSymbol ts_supertype_map_entries[] = {{",
);
indent!(self);
for (i, (_, subtypes)) in supertype_string_map.iter().enumerate() {
let row_index = supertype_ids[i];
add_line!(self, "[{row_index}] =");
indent!(self);
for subtype in subtypes {
add_whitespace!(self);
add!(self, "{subtype},\n");
}
dedent!(self);
}
dedent!(self);
add_line!(self, "}};");
add_line!(self, "");
}
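For readability, a hedged sketch of the tables this function emits (symbol names and sizes are hypothetical):
// ts_supertype_symbols     = { sym_expression, sym_statement };
// ts_supertype_map_slices  = {
//   [sym_expression] = {.index = 0, .length = 3},
//   [sym_statement]  = {.index = 3, .length = 2},
// };
// ts_supertype_map_entries = {
//   [0] = sym_call, sym_binary, sym_identifier,
//   [3] = sym_if_statement, sym_return_statement,
// };
// Each slice is an (index, length) window into the flat entries array, mirroring how
// ts_field_map_slices indexes into ts_field_map_entries elsewhere in this file.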
fn add_lex_function(&mut self, name: &str, lex_table: LexTable) {
add_line!(
self,
@ -877,7 +739,7 @@ impl Generator {
&& chars.ranges().all(|r| {
let start = *r.start() as u32;
let end = *r.end() as u32;
end <= start + 1 && u16::try_from(end).is_ok()
end <= start + 1 && end <= u16::MAX as u32
})
{
leading_simple_transition_count += 1;
@ -965,7 +827,10 @@ impl Generator {
large_char_set_ix = Some(char_set_ix);
}
let line_break = format!("\n{}", " ".repeat(self.indent_level + 2));
let mut line_break = "\n".to_string();
for _ in 0..self.indent_level + 2 {
line_break.push_str(" ");
}
let has_positive_condition = large_char_set_ix.is_some() || !asserted_chars.is_empty();
let has_negative_condition = !negated_chars.is_empty();
@ -984,7 +849,7 @@ impl Generator {
// are not at the end of the file.
let check_eof = large_set.contains('\0');
if check_eof {
add!(self, "(!eof && ");
add!(self, "(!eof && ")
}
let char_set_info = &mut self.large_character_set_info[large_char_set_ix];
@ -992,7 +857,7 @@ impl Generator {
add!(
self,
"set_contains({}, {}, lookahead)",
char_set_info.constant_name,
&char_set_info.constant_name,
large_set.range_count(),
);
if check_eof {
@ -1057,6 +922,7 @@ impl Generator {
}
self.add_character(end);
add!(self, ")");
continue;
} else if end == start {
add!(self, "lookahead == ");
self.add_character(start);
@ -1107,7 +973,7 @@ impl Generator {
add_line!(
self,
"static const TSCharacterRange {}[] = {{",
"static TSCharacterRange {}[] = {{",
info.constant_name
);
@ -1142,66 +1008,25 @@ impl Generator {
}
}
fn add_lex_modes(&mut self) {
fn add_lex_modes_list(&mut self) {
add_line!(
self,
"static const {} ts_lex_modes[STATE_COUNT] = {{",
if self.abi_version >= ABI_VERSION_WITH_RESERVED_WORDS {
"TSLexerMode"
} else {
"TSLexMode"
}
"static const TSLexMode ts_lex_modes[STATE_COUNT] = {{"
);
indent!(self);
for (i, state) in self.parse_table.states.iter().enumerate() {
add_whitespace!(self);
add!(self, "[{i}] = {{");
if state.is_end_of_non_terminal_extra() {
add!(self, "(TSStateId)(-1),");
add_line!(self, "[{i}] = {{(TSStateId)(-1)}},");
} else if state.external_lex_state_id > 0 {
add_line!(
self,
"[{i}] = {{.lex_state = {}, .external_lex_state = {}}},",
state.lex_state_id,
state.external_lex_state_id
);
} else {
add!(self, ".lex_state = {}", state.lex_state_id);
if state.external_lex_state_id > 0 {
add!(
self,
", .external_lex_state = {}",
state.external_lex_state_id
);
}
if self.abi_version >= ABI_VERSION_WITH_RESERVED_WORDS {
let reserved_word_set_id = self.reserved_word_set_ids_by_parse_state[i];
if reserved_word_set_id != 0 {
add!(self, ", .reserved_word_set_id = {reserved_word_set_id}");
}
}
add_line!(self, "[{i}] = {{.lex_state = {}}},", state.lex_state_id);
}
add!(self, "}},\n");
}
dedent!(self);
add_line!(self, "}};");
add_line!(self, "");
}
fn add_reserved_word_sets(&mut self) {
add_line!(
self,
"static const TSSymbol ts_reserved_words[{}][MAX_RESERVED_WORD_SET_SIZE] = {{",
self.reserved_word_sets.len(),
);
indent!(self);
for (id, set) in self.reserved_word_sets.iter().enumerate() {
if id == 0 {
continue;
}
add_line!(self, "[{id}] = {{");
indent!(self);
for token in set.iter() {
add_line!(self, "{},", self.symbol_ids[&token]);
}
dedent!(self);
add_line!(self, "}},");
}
dedent!(self);
add_line!(self, "}};");
@ -1255,7 +1080,7 @@ impl Generator {
indent!(self);
for i in 0..self.parse_table.external_lex_states.len() {
if !self.parse_table.external_lex_states[i].is_empty() {
add_line!(self, "[{i}] = {{");
add_line!(self, "[{}] = {{", i);
indent!(self);
for token in self.parse_table.external_lex_states[i].iter() {
add_line!(
@@ -1277,7 +1102,6 @@ impl Generator {
let mut parse_table_entries = HashMap::new();
let mut next_parse_action_list_index = 0;
// Parse action lists zero is for the default value, when a symbol is not valid.
self.get_parse_action_list_id(
&ParseTableEntry {
actions: Vec::new(),
@@ -1303,7 +1127,7 @@ impl Generator {
.enumerate()
.take(self.large_state_count)
{
add_line!(self, "[STATE({i})] = {{");
add_line!(self, "[{i}] = {{");
indent!(self);
// Ensure the entries are in a deterministic order, since they are
@@ -1335,11 +1159,9 @@ impl Generator {
);
add_line!(self, "[{}] = ACTIONS({entry_id}),", self.symbol_ids[symbol]);
}
dedent!(self);
add_line!(self, "}},");
}
dedent!(self);
add_line!(self, "}};");
add_line!(self, "");
@@ -1348,16 +1170,11 @@ impl Generator {
add_line!(self, "static const uint16_t ts_small_parse_table[] = {{");
indent!(self);
let mut next_table_index = 0;
let mut small_state_indices = Vec::with_capacity(
self.parse_table
.states
.len()
.saturating_sub(self.large_state_count),
);
let mut index = 0;
let mut small_state_indices = Vec::new();
let mut symbols_by_value = HashMap::<(usize, SymbolType), Vec<Symbol>>::new();
for state in self.parse_table.states.iter().skip(self.large_state_count) {
small_state_indices.push(next_table_index);
small_state_indices.push(index);
symbols_by_value.clear();
terminal_entries.clear();
@@ -1396,16 +1213,10 @@ impl Generator {
(symbols.len(), *kind, *value, symbols[0])
});
add_line!(
self,
"[{next_table_index}] = {},",
values_with_symbols.len()
);
add_line!(self, "[{index}] = {},", values_with_symbols.len());
indent!(self);
next_table_index += 1;
for ((value, kind), symbols) in &mut values_with_symbols {
next_table_index += 2 + symbols.len();
if *kind == SymbolType::NonTerminal {
add_line!(self, "STATE({value}), {},", symbols.len());
} else {
@@ -1421,6 +1232,11 @@ impl Generator {
}
dedent!(self);
index += 1 + values_with_symbols
.iter()
.map(|(_, symbols)| 2 + symbols.len())
.sum::<usize>();
}
dedent!(self);
@@ -1549,7 +1365,7 @@ impl Generator {
indent!(self);
add_line!(self, "static const TSLanguage language = {{");
indent!(self);
add_line!(self, ".abi_version = LANGUAGE_VERSION,");
add_line!(self, ".version = LANGUAGE_VERSION,");
// Quantities
add_line!(self, ".symbol_count = SYMBOL_COUNT,");
@@ -1559,9 +1375,6 @@ impl Generator {
add_line!(self, ".state_count = STATE_COUNT,");
add_line!(self, ".large_state_count = LARGE_STATE_COUNT,");
add_line!(self, ".production_id_count = PRODUCTION_ID_COUNT,");
if self.abi_version >= ABI_VERSION_WITH_RESERVED_WORDS {
add_line!(self, ".supertype_count = SUPERTYPE_COUNT,");
}
add_line!(self, ".field_count = FIELD_COUNT,");
add_line!(
self,
@@ -1583,11 +1396,6 @@ impl Generator {
add_line!(self, ".field_map_slices = ts_field_map_slices,");
add_line!(self, ".field_map_entries = ts_field_map_entries,");
}
if !self.supertype_map.is_empty() && self.abi_version >= ABI_VERSION_WITH_RESERVED_WORDS {
add_line!(self, ".supertype_map_slices = ts_supertype_map_slices,");
add_line!(self, ".supertype_map_entries = ts_supertype_map_entries,");
add_line!(self, ".supertype_symbols = ts_supertype_symbols,");
}
add_line!(self, ".symbol_metadata = ts_symbol_metadata,");
add_line!(self, ".public_symbol_map = ts_symbol_map,");
add_line!(self, ".alias_map = ts_non_terminal_alias_map,");
@@ -1596,9 +1404,9 @@ impl Generator {
}
// Lexing
add_line!(self, ".lex_modes = (const void*)ts_lex_modes,");
add_line!(self, ".lex_modes = ts_lex_modes,");
add_line!(self, ".lex_fn = ts_lex,");
if let Some(keyword_capture_token) = self.syntax_grammar.word_token {
if let Some(keyword_capture_token) = self.keyword_capture_token {
add_line!(self, ".keyword_lex_fn = ts_lex_keywords,");
add_line!(
self,
@@ -1621,42 +1429,8 @@ impl Generator {
add_line!(self, "}},");
}
add_line!(self, ".primary_state_ids = ts_primary_state_ids,");
if self.abi_version >= ABI_VERSION_WITH_RESERVED_WORDS {
add_line!(self, ".name = \"{}\",", self.language_name);
if self.reserved_word_sets.len() > 1 {
add_line!(self, ".reserved_words = &ts_reserved_words[0][0],");
}
add_line!(
self,
".max_reserved_word_set_size = {},",
self.reserved_word_sets
.iter()
.map(TokenSet::len)
.max()
.unwrap()
);
let Some(metadata) = &self.metadata else {
panic!(
indoc! {"
Metadata is required to generate ABI version {}.
This means that your grammar doesn't have a tree-sitter.json config file with an appropriate version field in the metadata table.
"},
self.abi_version
);
};
add_line!(self, ".metadata = {{");
indent!(self);
add_line!(self, ".major_version = {},", metadata.major_version);
add_line!(self, ".minor_version = {},", metadata.minor_version);
add_line!(self, ".patch_version = {},", metadata.patch_version);
dedent!(self);
add_line!(self, "}},");
if self.abi_version >= ABI_VERSION_WITH_PRIMARY_STATES {
add_line!(self, ".primary_state_ids = ts_primary_state_ids,");
}
dedent!(self);
@@ -1758,23 +1532,6 @@ impl Generator {
}
}
fn symbols_for_alias(&self, alias: &Alias) -> Vec<Symbol> {
self.parse_table
.symbols
.iter()
.copied()
.filter(move |symbol| {
self.default_aliases.get(symbol).map_or_else(
|| {
let (name, kind) = self.metadata_for_symbol(*symbol);
name == alias.value && kind == alias.kind()
},
|default_alias| default_alias == alias,
)
})
.collect()
}
fn sanitize_identifier(&self, name: &str) -> String {
let mut result = String::with_capacity(name.len());
for c in name.chars() {
@@ -1850,11 +1607,11 @@ impl Generator {
'\u{007F}' => "DEL",
'\u{FEFF}' => "BOM",
'\u{0080}'..='\u{FFFF}' => {
write!(result, "u{:04x}", c as u32).unwrap();
result.push_str(&format!("u{:04x}", c as u32));
break 'special_chars;
}
'\u{10000}'..='\u{10FFFF}' => {
write!(result, "U{:08x}", c as u32).unwrap();
result.push_str(&format!("U{:08x}", c as u32));
break 'special_chars;
}
'0'..='9' | 'a'..='z' | 'A'..='Z' | '_' => unreachable!(),
@@ -1885,9 +1642,11 @@ impl Generator {
'\r' => result += "\\r",
'\t' => result += "\\t",
'\0' => result += "\\0",
'\u{0001}'..='\u{001f}' => write!(result, "\\x{:02x}", c as u32).unwrap(),
'\u{007F}'..='\u{FFFF}' => write!(result, "\\u{:04x}", c as u32).unwrap(),
'\u{10000}'..='\u{10FFFF}' => write!(result, "\\U{:08x}", c as u32).unwrap(),
'\u{0001}'..='\u{001f}' => result += &format!("\\x{:02x}", c as u32),
'\u{007F}'..='\u{FFFF}' => result += &format!("\\u{:04x}", c as u32),
'\u{10000}'..='\u{10FFFF}' => {
result.push_str(&format!("\\U{:08x}", c as u32));
}
_ => result.push(c),
}
}
@@ -1904,7 +1663,7 @@ impl Generator {
'\r' => add!(self, "'\\r'"),
_ => {
if c == '\0' {
add!(self, "0");
add!(self, "0")
} else if c == ' ' || c.is_ascii_graphic() {
add!(self, "'{c}'");
} else {
@@ -1940,8 +1699,6 @@ pub fn render_c_code(
lexical_grammar: LexicalGrammar,
default_aliases: AliasMap,
abi_version: usize,
semantic_version: Option<(u8, u8, u8)>,
supertype_symbol_map: BTreeMap<Symbol, Vec<ChildType>>,
) -> String {
assert!(
(ABI_VERSION_MIN..=ABI_VERSION_MAX).contains(&abi_version),
@@ -1949,23 +1706,26 @@ impl Generator {
);
Generator {
buffer: String::new(),
indent_level: 0,
language_name: name.to_string(),
large_state_count: 0,
parse_table: tables.parse_table,
main_lex_table: tables.main_lex_table,
keyword_lex_table: tables.keyword_lex_table,
keyword_capture_token: tables.word_token,
large_character_sets: tables.large_character_sets,
large_character_set_info: Vec::new(),
syntax_grammar,
lexical_grammar,
default_aliases,
symbol_ids: HashMap::new(),
symbol_order: HashMap::new(),
alias_ids: HashMap::new(),
symbol_map: HashMap::new(),
unique_aliases: Vec::new(),
field_names: Vec::new(),
abi_version,
metadata: semantic_version.map(|(major_version, minor_version, patch_version)| Metadata {
major_version,
minor_version,
patch_version,
}),
supertype_symbol_map,
..Default::default()
}
.generate()
}


@@ -1,11 +1,10 @@
use std::{collections::BTreeMap, fmt};
use std::{collections::HashMap, fmt};
use serde::Serialize;
use smallbitvec::SmallBitVec;
use super::grammars::VariableType;
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash, PartialOrd, Ord, Serialize)]
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash, PartialOrd, Ord)]
pub enum SymbolType {
External,
End,
@@ -14,19 +13,19 @@ pub enum SymbolType {
NonTerminal,
}
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash, PartialOrd, Ord, Serialize)]
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash, PartialOrd, Ord)]
pub enum Associativity {
Left,
Right,
}
#[derive(Clone, Debug, PartialEq, Eq, Hash, PartialOrd, Ord, Serialize)]
#[derive(Clone, Debug, PartialEq, Eq, Hash, PartialOrd, Ord)]
pub struct Alias {
pub value: String,
pub is_named: bool,
}
#[derive(Clone, Debug, PartialEq, Eq, Hash, PartialOrd, Ord, Default, Serialize)]
#[derive(Clone, Debug, PartialEq, Eq, Hash, PartialOrd, Ord, Default)]
pub enum Precedence {
#[default]
None,
@@ -34,50 +33,48 @@ pub enum Precedence {
Name(String),
}
pub type AliasMap = BTreeMap<Symbol, Alias>;
pub type AliasMap = HashMap<Symbol, Alias>;
#[derive(Clone, Debug, Default, PartialEq, Eq, Hash, Serialize)]
#[derive(Clone, Debug, Default, PartialEq, Eq, Hash)]
pub struct MetadataParams {
pub precedence: Precedence,
pub dynamic_precedence: i32,
pub associativity: Option<Associativity>,
pub is_token: bool,
pub is_string: bool,
pub is_active: bool,
pub is_main_token: bool,
pub alias: Option<Alias>,
pub field_name: Option<String>,
}
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash, PartialOrd, Ord, Serialize)]
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash, PartialOrd, Ord)]
pub struct Symbol {
pub kind: SymbolType,
pub index: usize,
}
#[derive(Clone, Debug, PartialEq, Eq, Hash, Serialize)]
#[derive(Clone, Debug, PartialEq, Eq, Hash)]
pub enum Rule {
Blank,
String(String),
Pattern(String, String),
NamedSymbol(String),
Symbol(Symbol),
Choice(Vec<Self>),
Choice(Vec<Rule>),
Metadata {
params: MetadataParams,
rule: Box<Self>,
},
Repeat(Box<Self>),
Seq(Vec<Self>),
Reserved {
rule: Box<Self>,
context_name: String,
rule: Box<Rule>,
},
Repeat(Box<Rule>),
Seq(Vec<Rule>),
}
// Because tokens are represented as small (~400 max) unsigned integers,
// sets of tokens can be efficiently represented as bit vectors with each
// index corresponding to a token, and each value representing whether or not
// the token is present in the set.
#[derive(Default, Clone, PartialEq, Eq, Hash)]
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
pub struct TokenSet {
terminal_bits: SmallBitVec,
external_bits: SmallBitVec,
@@ -85,32 +82,6 @@ pub struct TokenSet {
end_of_nonterminal_extra: bool,
}
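As an aside to the comment above: because token ids are small, dense integers, a set can devote one bit per possible id, making membership a single indexed read. The following is a minimal, hypothetical Rust sketch of that idea for illustration only; it is not part of this diff, and the real TokenSet uses SmallBitVec with separate bits for terminals, externals, eof, and end-of-nonterminal-extra.
// Hypothetical sketch of a bit-vector set keyed by small integer ids
// (illustration only, not the TokenSet implementation shown in the diff).
#[derive(Default, Clone, PartialEq, Eq)]
struct SmallIndexSet {
    bits: Vec<bool>, // index = token id, value = membership
}
impl SmallIndexSet {
    // Insert an id, growing the vector on demand; returns true if newly added.
    fn insert(&mut self, id: usize) -> bool {
        if id >= self.bits.len() {
            self.bits.resize(id + 1, false);
        }
        let newly_added = !self.bits[id];
        self.bits[id] = true;
        newly_added
    }
    // Membership is a single indexed read.
    fn contains(&self, id: usize) -> bool {
        self.bits.get(id).copied().unwrap_or(false)
    }
    // Iterate over the ids whose bit is set.
    fn iter(&self) -> impl Iterator<Item = usize> + '_ {
        self.bits
            .iter()
            .enumerate()
            .filter_map(|(id, &present)| present.then_some(id))
    }
}
fn main() {
    let mut set = SmallIndexSet::default();
    set.insert(3);
    set.insert(250);
    assert!(set.contains(3) && !set.contains(4));
    assert_eq!(set.iter().collect::<Vec<_>>(), vec![3, 250]);
}
With a real bit vector instead of Vec<bool>, unions and intersections become word-wise bitwise operations, which is one reason this layout is attractive for the frequent token-set operations in the generator.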
impl fmt::Debug for TokenSet {
fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
f.debug_list().entries(self.iter()).finish()
}
}
impl PartialOrd for TokenSet {
fn partial_cmp(&self, other: &Self) -> Option<std::cmp::Ordering> {
Some(self.cmp(other))
}
}
impl Ord for TokenSet {
fn cmp(&self, other: &Self) -> std::cmp::Ordering {
self.terminal_bits
.iter()
.cmp(other.terminal_bits.iter())
.then_with(|| self.external_bits.iter().cmp(other.external_bits.iter()))
.then_with(|| self.eof.cmp(&other.eof))
.then_with(|| {
self.end_of_nonterminal_extra
.cmp(&other.end_of_nonterminal_extra)
})
}
}
impl Rule {
pub fn field(name: String, content: Self) -> Self {
add_metadata(content, move |params| {
@@ -175,21 +146,9 @@ impl Rule {
Self::Choice(elements)
}
pub const fn seq(rules: Vec<Self>) -> Self {
pub fn seq(rules: Vec<Self>) -> Self {
Self::Seq(rules)
}
pub fn is_empty(&self) -> bool {
match self {
Self::Blank | Self::Pattern(..) | Self::NamedSymbol(_) | Self::Symbol(_) => false,
Self::String(string) => string.is_empty(),
Self::Metadata { rule, .. } | Self::Repeat(rule) | Self::Reserved { rule, .. } => {
rule.is_empty()
}
Self::Choice(rules) => rules.iter().any(Self::is_empty),
Self::Seq(rules) => rules.iter().all(Self::is_empty),
}
}
}
impl Alias {
@@ -306,14 +265,14 @@ impl Symbol {
}
impl From<Symbol> for Rule {
#[must_use]
fn from(symbol: Symbol) -> Self {
Self::Symbol(symbol)
}
}
impl TokenSet {
#[must_use]
pub const fn new() -> Self {
pub fn new() -> Self {
Self {
terminal_bits: SmallBitVec::new(),
external_bits: SmallBitVec::new(),
@@ -424,9 +383,6 @@ impl TokenSet {
};
if other.index < vec.len() && vec[other.index] {
vec.set(other.index, false);
while vec.last() == Some(false) {
vec.pop();
}
return true;
}
false
@@ -439,13 +395,6 @@ impl TokenSet {
&& !self.external_bits.iter().any(|a| a)
}
pub fn len(&self) -> usize {
self.eof as usize
+ self.end_of_nonterminal_extra as usize
+ self.terminal_bits.iter().filter(|b| *b).count()
+ self.external_bits.iter().filter(|b| *b).count()
}
pub fn insert_all_terminals(&mut self, other: &Self) -> bool {
let mut result = false;
if other.terminal_bits.len() > self.terminal_bits.len() {


@@ -47,7 +47,6 @@ pub struct ParseState {
pub id: ParseStateId,
pub terminal_entries: IndexMap<Symbol, ParseTableEntry, BuildHasherDefault<FxHasher>>,
pub nonterminal_entries: IndexMap<Symbol, GotoAction, BuildHasherDefault<FxHasher>>,
pub reserved_words: TokenSet,
pub lex_state_id: usize,
pub external_lex_state_id: usize,
pub core_id: usize,
@@ -65,7 +64,7 @@ pub struct ProductionInfo {
pub field_map: BTreeMap<String, Vec<FieldLocation>>,
}
#[derive(Debug, Default, PartialEq, Eq)]
#[derive(Debug, PartialEq, Eq)]
pub struct ParseTable {
pub states: Vec<ParseState>,
pub symbols: Vec<Symbol>,
@@ -93,7 +92,6 @@ pub struct LexTable {
}
impl ParseTableEntry {
#[must_use]
pub const fn new() -> Self {
Self {
reusable: true,


@@ -3,15 +3,11 @@ root = true
[*]
charset = utf-8
[*.{json,toml,yml,gyp,xml}]
[*.{json,toml,yml,gyp}]
indent_style = space
indent_size = 2
[*.{js,ts}]
indent_style = space
indent_size = 2
[*.scm]
[*.js]
indent_style = space
indent_size = 2
@@ -31,10 +27,6 @@ indent_size = 4
indent_style = space
indent_size = 4
[*.java]
indent_style = space
indent_size = 4
[*.go]
indent_style = tab
indent_size = 8
@@ -45,6 +37,3 @@ indent_size = 8
[parser.c]
indent_size = 2
[{alloc,array,parser}.h]
indent_size = 2


@@ -0,0 +1,11 @@
prefix=@PREFIX@
libdir=@LIBDIR@
includedir=@INCLUDEDIR@
Name: tree-sitter-PARSER_NAME
Description: CAMEL_PARSER_NAME grammar for tree-sitter
URL: @URL@
Version: @VERSION@
Requires: @REQUIRES@
Libs: -L${libdir} @ADDITIONAL_LIBS@ -ltree-sitter-PARSER_NAME
Cflags: -I${includedir}


@@ -0,0 +1,42 @@
"""CAMEL_PARSER_NAME grammar for tree-sitter"""
from importlib.resources import files as _files
from ._binding import language
def _get_query(name, file):
query = _files(f"{__package__}.queries") / file
globals()[name] = query.read_text()
return globals()[name]
def __getattr__(name):
# NOTE: uncomment these to include any queries that this grammar contains:
# if name == "HIGHLIGHTS_QUERY":
# return _get_query("HIGHLIGHTS_QUERY", "highlights.scm")
# if name == "INJECTIONS_QUERY":
# return _get_query("INJECTIONS_QUERY", "injections.scm")
# if name == "LOCALS_QUERY":
# return _get_query("LOCALS_QUERY", "locals.scm")
# if name == "TAGS_QUERY":
# return _get_query("TAGS_QUERY", "tags.scm")
raise AttributeError(f"module {__name__!r} has no attribute {name!r}")
__all__ = [
"language",
# "HIGHLIGHTS_QUERY",
# "INJECTIONS_QUERY",
# "LOCALS_QUERY",
# "TAGS_QUERY",
]
def __dir__():
return sorted(__all__ + [
"__all__", "__builtins__", "__cached__", "__doc__", "__file__",
"__loader__", "__name__", "__package__", "__path__", "__spec__",
])


@@ -0,0 +1,10 @@
from typing import Final
# NOTE: uncomment these to include any queries that this grammar contains:
# HIGHLIGHTS_QUERY: Final[str]
# INJECTIONS_QUERY: Final[str]
# LOCALS_QUERY: Final[str]
# TAGS_QUERY: Final[str]
def language() -> object: ...


@@ -0,0 +1,26 @@
[package]
name = "tree-sitter-PARSER_NAME"
description = "CAMEL_PARSER_NAME grammar for tree-sitter"
version = "0.0.1"
license = "MIT"
readme = "README.md"
keywords = ["incremental", "parsing", "tree-sitter", "PARSER_NAME"]
categories = ["parsing", "text-editors"]
repository = "https://github.com/tree-sitter/tree-sitter-PARSER_NAME"
edition = "2021"
autoexamples = false
build = "bindings/rust/build.rs"
include = ["bindings/rust/*", "grammar.js", "queries/*", "src/*"]
[lib]
path = "bindings/rust/lib.rs"
[dependencies]
tree-sitter-language = "0.1"
[build-dependencies]
cc = "1.0.87"
[dev-dependencies]
tree-sitter = "0.23"

Some files were not shown because too many files have changed in this diff.