From 8dfed40466a7c49dfcd172d611af8cad5f876fe5 Mon Sep 17 00:00:00 2001
From: Patrick Thomson
Date: Fri, 11 Feb 2022 11:17:18 -0500
Subject: [PATCH 01/13] Describe naming conventions for syntax captures.

---
 docs/section-2-using-parsers.md | 17 +++++++++++++++++
 1 file changed, 17 insertions(+)

diff --git a/docs/section-2-using-parsers.md b/docs/section-2-using-parsers.md
index 0d22f251..6a891458 100644
--- a/docs/section-2-using-parsers.md
+++ b/docs/section-2-using-parsers.md
@@ -852,3 +852,20 @@ Example:
   }
 }
 ```
+
+## Capture Naming Conventions
+
+Applications using Tree-sitter often need to use queries and captures to categorize and label different syntactic nodes, such as functions, built-ins, operators, and variables. We recommend using a reverse-DNS-style notation for these captures, and provide guidelines below for naming captures of a given syntax node. User applications may extend (or only recognize a subset of) these capture names, but we recommend standardizing on the names below.
+
+| Category                 | Tag                         |
+|--------------------------|-----------------------------|
+| Class definitions        | `@definition.class`         |
+| Function definitions     | `@definition.function`      |
+| Interface definitions    | `@definition.interface`     |
+| Method definitions       | `@definition.method`        |
+| Module definitions       | `@definition.module`        |
+| Function/method calls    | `@reference.call`           |
+| Class reference          | `@reference.class`          |
+| Interface implementation | `@reference.implementation` |
+
+To communicate the associated identifier inside one of these syntactic classes, capture the identifier within as `@name`.

From 302c8b5305279636c0fcf0579cbc066558f4e5be Mon Sep 17 00:00:00 2001
From: Patrick Thomson
Date: Fri, 11 Feb 2022 11:25:11 -0500
Subject: [PATCH 02/13] Move this inside the query section.

---
 docs/section-2-using-parsers.md | 34 ++++++++++++++++-----------------
 1 file changed, 17 insertions(+), 17 deletions(-)

diff --git a/docs/section-2-using-parsers.md b/docs/section-2-using-parsers.md
index 6a891458..8bc5f868 100644
--- a/docs/section-2-using-parsers.md
+++ b/docs/section-2-using-parsers.md
@@ -711,6 +711,23 @@ bool ts_query_cursor_next_match(TSQueryCursor *, TSQueryMatch *match);
 
 This function will return `false` when there are no more matches. Otherwise, it will populate the `match` with data about which pattern matched and which nodes were captured.
 
+### Capture Naming Conventions
+
+Applications using Tree-sitter often need to use queries and captures to categorize and label different syntactic nodes, such as functions, built-ins, operators, and variables. We recommend using a reverse-DNS-style notation for these captures, and provide guidelines below for naming captures of a given syntax node. User applications may extend (or only recognize a subset of) these capture names, but we recommend standardizing on the names below.
+
+| Category                 | Tag                         |
+|--------------------------|-----------------------------|
+| Class definitions        | `@definition.class`         |
+| Function definitions     | `@definition.function`      |
+| Interface definitions    | `@definition.interface`     |
+| Method definitions       | `@definition.method`        |
+| Module definitions       | `@definition.module`        |
+| Function/method calls    | `@reference.call`           |
+| Class reference          | `@reference.class`          |
+| Interface implementation | `@reference.implementation` |
+
+To communicate the associated identifier inside one of these syntactic classes, capture the identifier within as `@name`.
+
 ## Static Node Types
 
 In languages with static typing, it can be helpful for syntax trees to provide specific type information about individual syntax nodes. Tree-sitter makes this information available via a generated file called `node-types.json`. This _node types_ file provides structured data about every possible syntax node in a grammar.
@@ -852,20 +869,3 @@ Example:
   }
 }
 ```
-
-## Capture Naming Conventions
-
-Applications using Tree-sitter often need to use queries and captures to categorize and label different syntactic nodes, such as functions, built-ins, operators, and variables. We recommend using a reverse-DNS-style notation for these captures, and provide guidelines below for naming captures of a given syntax node. User applications may extend (or only recognize a subset of) these capture names, but we recommend standardizing on the names below.
-
-| Category                 | Tag                         |
-|--------------------------|-----------------------------|
-| Class definitions        | `@definition.class`         |
-| Function definitions     | `@definition.function`      |
-| Interface definitions    | `@definition.interface`     |
-| Method definitions       | `@definition.method`        |
-| Module definitions       | `@definition.module`        |
-| Function/method calls    | `@reference.call`           |
-| Class reference          | `@reference.class`          |
-| Interface implementation | `@reference.implementation` |
-
-To communicate the associated identifier inside one of these syntactic classes, capture the identifier within as `@name`.

From 88822bd3fc8c9a40d5a9aef0d7ecde385124a3c8 Mon Sep 17 00:00:00 2001
From: Patrick Thomson
Date: Fri, 11 Feb 2022 15:25:50 -0500
Subject: [PATCH 03/13] Move this to its own page.

---
 docs/section-2-using-parsers.md           | 17 -------------
 docs/section-8-code-navigation-systems.md | 29 +++++++++++++++++++++++
 2 files changed, 29 insertions(+), 17 deletions(-)
 create mode 100644 docs/section-8-code-navigation-systems.md

diff --git a/docs/section-2-using-parsers.md b/docs/section-2-using-parsers.md
index 8bc5f868..0d22f251 100644
--- a/docs/section-2-using-parsers.md
+++ b/docs/section-2-using-parsers.md
@@ -711,23 +711,6 @@ bool ts_query_cursor_next_match(TSQueryCursor *, TSQueryMatch *match);
 
 This function will return `false` when there are no more matches. Otherwise, it will populate the `match` with data about which pattern matched and which nodes were captured.
 
-### Capture Naming Conventions
-
-Applications using Tree-sitter often need to use queries and captures to categorize and label different syntactic nodes, such as functions, built-ins, operators, and variables. We recommend using a reverse-DNS-style notation for these captures, and provide guidelines below for naming captures of a given syntax node. User applications may extend (or only recognize a subset of) these capture names, but we recommend standardizing on the names below.
-
-| Category                 | Tag                         |
-|--------------------------|-----------------------------|
-| Class definitions        | `@definition.class`         |
-| Function definitions     | `@definition.function`      |
-| Interface definitions    | `@definition.interface`     |
-| Method definitions       | `@definition.method`        |
-| Module definitions       | `@definition.module`        |
-| Function/method calls    | `@reference.call`           |
-| Class reference          | `@reference.class`          |
-| Interface implementation | `@reference.implementation` |
-
-To communicate the associated identifier inside one of these syntactic classes, capture the identifier within as `@name`.
-
 ## Static Node Types
 
 In languages with static typing, it can be helpful for syntax trees to provide specific type information about individual syntax nodes. Tree-sitter makes this information available via a generated file called `node-types.json`. This _node types_ file provides structured data about every possible syntax node in a grammar.
diff --git a/docs/section-8-code-navigation-systems.md b/docs/section-8-code-navigation-systems.md
new file mode 100644
index 00000000..3aec22c6
--- /dev/null
+++ b/docs/section-8-code-navigation-systems.md
@@ -0,0 +1,29 @@
+---
+title: Code Navigation Systems
+permalink: code-navigation-systems
+---
+
+# Code Navigation Systems
+
+Tree-sitter can be used in conjunction with its [tree query language](https://tree-sitter.github.io/tree-sitter/using-parsers#pattern-matching-with-queries) as a part of code navigation systems. An example of such a system can be seen in the `tree-sitter tag` command, which emits a textual dump of the interesting syntactic nodes in its file argument. This document exists to provide guidelines on the design and use of tree-sitter concepts to implement such systems.
+
+## Tagging and captures
+
+Code navigation systems using Tree-sitter need to use queries and captures to categorize and label different syntactic nodes, such as functions, built-ins, operators, and variables. A reverse-DNS-style notation is recommendedfor these captures, and provide guidelines below for naming captures of a given syntax node. User applications may extend (or only recognize a subset of) these capture names, but it is desirable to standardize on the names below when supported by a given system or language.
+
+| Category                 | Tag                         |
+|--------------------------|-----------------------------|
+| Class definitions        | `@definition.class`         |
+| Function definitions     | `@definition.function`      |
+| Interface definitions    | `@definition.interface`     |
+| Method definitions       | `@definition.method`        |
+| Module definitions       | `@definition.module`        |
+| Function/method calls    | `@reference.call`           |
+| Class reference          | `@reference.class`          |
+| Interface implementation | `@reference.implementation` |
+
+To communicate the associated identifier inside one of these syntactic classes, capture the identifier within as `@name`.
+
+## `tree-sitter graph`
+
+Coming soon!

From f41e13f5da1c6d30a6ede65b09bf3cb2f353bf50 Mon Sep 17 00:00:00 2001
From: Patrick Thomson
Date: Fri, 11 Feb 2022 15:41:53 -0500
Subject: [PATCH 04/13] Spacing and word choice.

---
 docs/section-8-code-navigation-systems.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/docs/section-8-code-navigation-systems.md b/docs/section-8-code-navigation-systems.md
index 3aec22c6..6067dc3a 100644
--- a/docs/section-8-code-navigation-systems.md
+++ b/docs/section-8-code-navigation-systems.md
@@ -9,7 +9,7 @@ Tree-sitter can be used in conjunction with its [tree query language](https://tr
 
 ## Tagging and captures
 
-Code navigation systems using Tree-sitter need to use queries and captures to categorize and label different syntactic nodes, such as functions, built-ins, operators, and variables. A reverse-DNS-style notation is recommendedfor these captures, and provide guidelines below for naming captures of a given syntax node. User applications may extend (or only recognize a subset of) these capture names, but it is desirable to standardize on the names below when supported by a given system or language.
+Code navigation systems using Tree-sitter need to use queries and captures to categorize and label different syntactic nodes, such as functions, built-ins, operators, and variables. A reverse-DNS-style notation is recommended for these captures, and provide guidelines below for naming captures of a given syntax node. User applications may extend (or only recognize a subset of) these capture names, but it is desirable to standardize on the names below when supported by a given system or language.
 
 | Category                 | Tag                         |
 |--------------------------|-----------------------------|
@@ -26,4 +26,4 @@ To communicate the associated identifier inside one of these syntactic classes,
 
 ## `tree-sitter graph`
 
-Coming soon!
+Documentation forthcoming.

From 70077b8205bcbe7c9da6f30131df2883b5efef4e Mon Sep 17 00:00:00 2001
From: Patrick Thomson
Date: Thu, 17 Feb 2022 14:00:34 -0500
Subject: [PATCH 05/13] Incorporate @dcreager's excellent suggestions.

---
 docs/section-8-code-navigation-systems.md | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/docs/section-8-code-navigation-systems.md b/docs/section-8-code-navigation-systems.md
index 6067dc3a..0b17136c 100644
--- a/docs/section-8-code-navigation-systems.md
+++ b/docs/section-8-code-navigation-systems.md
@@ -9,7 +9,11 @@ Tree-sitter can be used in conjunction with its [tree query language](https://tr
 
 ## Tagging and captures
 
-Code navigation systems using Tree-sitter need to use queries and captures to categorize and label different syntactic nodes, such as functions, built-ins, operators, and variables. A reverse-DNS-style notation is recommended for these captures, and provide guidelines below for naming captures of a given syntax node. User applications may extend (or only recognize a subset of) these capture names, but it is desirable to standardize on the names below when supported by a given system or language.
+*Tagging* is the act of identifying the entities that can be named in a program. We use Tree-sitter queries to find those entities. Having found them, you use a syntax capture to label the entity and its name.
+
+The essence of a given tag lies in two pieces of data: the _kind_ of entity that is matched (usually a definition or a reference) and the _role_ of that entity, which describes how the entity is used (i.e. whether it's a class definition, function call, variable reference, and so on). Our convention is to use a syntax capture following the `@role.kind` capture name format, and another inner capture, always called `@name`, that pulls out the name of a given identifier.
+
+The below table describes a standard vocabulary for kinds and roles during the tagging process. User applications may extend (or only recognize a subset of) these capture names, but it is desirable to standardize on the names below when supported by a given system or language. Language communities that write tagging rules using these names can work out-of-the-box with a steadily increasing set of analysis tools
 
 | Category                 | Tag                         |
 |--------------------------|-----------------------------|

From 1fbace136d6d44336958fcabf809f8b901dba73f Mon Sep 17 00:00:00 2001
From: Patrick Thomson
Date: Thu, 17 Feb 2022 17:14:53 -0500
Subject: [PATCH 06/13] Add examples.

---
 docs/section-8-code-navigation-systems.md | 63 ++++++++++++++++++++---
 1 file changed, 55 insertions(+), 8 deletions(-)

diff --git a/docs/section-8-code-navigation-systems.md b/docs/section-8-code-navigation-systems.md
index 0b17136c..3b9efea6 100644
--- a/docs/section-8-code-navigation-systems.md
+++ b/docs/section-8-code-navigation-systems.md
@@ -5,15 +5,66 @@ permalink: code-navigation-systems
 
 # Code Navigation Systems
 
-Tree-sitter can be used in conjunction with its [tree query language](https://tree-sitter.github.io/tree-sitter/using-parsers#pattern-matching-with-queries) as a part of code navigation systems. An example of such a system can be seen in the `tree-sitter tag` command, which emits a textual dump of the interesting syntactic nodes in its file argument. This document exists to provide guidelines on the design and use of tree-sitter concepts to implement such systems.
+Tree-sitter can be used in conjunction with its [tree query language](https://tree-sitter.github.io/tree-sitter/using-parsers#pattern-matching-with-queries) as a part of code navigation systems. An example of such a system can be seen in the `tree-sitter tag` command, which emits a textual dump of the interesting syntactic nodes in its file argument. A notable application of this is GitHub's support for [search-based code navigation](https://docs.github.com/en/repositories/working-with-files/using-files/navigating-code-on-github#precise-and-search-based-navigation). This document exists to describe how to extend the
 
 ## Tagging and captures
 
 *Tagging* is the act of identifying the entities that can be named in a program. We use Tree-sitter queries to find those entities. Having found them, you use a syntax capture to label the entity and its name.
 
-The essence of a given tag lies in two pieces of data: the _kind_ of entity that is matched (usually a definition or a reference) and the _role_ of that entity, which describes how the entity is used (i.e. whether it's a class definition, function call, variable reference, and so on). Our convention is to use a syntax capture following the `@role.kind` capture name format, and another inner capture, always called `@name`, that pulls out the name of a given identifier.
+You can use the `tree-sitter tag` command to test out a given set of tags
 
-The below table describes a standard vocabulary for kinds and roles during the tagging process. User applications may extend (or only recognize a subset of) these capture names, but it is desirable to standardize on the names below when supported by a given system or language. Language communities that write tagging rules using these names can work out-of-the-box with a steadily increasing set of analysis tools
+The essence of a given tag lies in two pieces of data: the _kind_ of entity that is matched (usually a definition or a reference) and the _role_ of that entity, which describes how the entity is used (i.e. whether it's a class definition, function call, variable reference, and so on). Our convention is to use a syntax capture following the `@kind.role` capture name format, and another inner capture, always called `@name`, that pulls out the name of a given identifier.
+'
+You may optionally include a capture named `@doc `to bind a docstring. For convenience purposes, the tagging system provides two built-in functions, `#select-adjacent` and `#strip` that are convenient for removing comment syntax from a docstring. `#strip` takes a capture as its first argument and a regular expression, expressed as a quoted string. Any text patterns matched by the regular expression will be removed from the text associated with the passed capture. `#select-adjacent`, when passed two capture names, filters the text associated with the first capture so that only text adjacent to the second capture is preserved. This can be useful when writing queries that would otherwise include too much information in matched comments.
+
+## Examples
+
+An [example query](https://github.com/tree-sitter/tree-sitter-python/blob/78c4e9b6b2f08e1be23b541ffced47b15e2972ad/queries/tags.scm#L4-L5) follows, one that recognizes Python function definitions and captures their declared name. The `function_definition` syntax node is defined in the [Python Tree-sitter grammar](https://github.com/tree-sitter/tree-sitter-python/blob/78c4e9b6b2f08e1be23b541ffced47b15e2972ad/grammar.js#L354).
+
+``` scheme
+(function_definition
+  name: (identifier) @name) @definition.function
+```
+
+A more sophisticated query can be found in the [JavaScript Tree-sitter repository](https://github.com/tree-sitter/tree-sitter-javascript/blob/fdeb68ac8d2bd5a78b943528bb68ceda3aade2eb/queries/tags.scm#L63-L70):
+
+``` scheme
+(assignment_expression
+  left: [
+    (identifier) @name
+    (member_expression
+      property: (property_identifier) @name)
+  ]
+  right: [(arrow_function) (function)]
+) @definition.function
+```
+
+An even more sophisticated query is in the [Ruby Tree-sitter repository](https://github.com/tree-sitter/tree-sitter-ruby/blob/1ebfdb288842dae5a9233e2509a135949023dd82/queries/tags.scm#L24-L43), which uses built-in functions to strip the Ruby comment character (`#`) from the docstrings associated with a class or singleton-class declaration, then selects only the docstrings adjacent to the node matched as `@definition.class`.
+
+``` scheme
+(
+  (comment)* @doc
+  .
+  [
+    (class
+      name: [
+        (constant) @name
+        (scope_resolution
+          name: (_) @name)
+      ]) @definition.class
+    (singleton_class
+      value: [
+        (constant) @name
+        (scope_resolution
+          name: (_) @name)
+      ]) @definition.class
+  ]
+  (#strip! @doc "^#\\s*")
+  (#select-adjacent! @doc @definition.class)
+)
+```
+
+The below table describes a standard vocabulary for kinds and roles during the tagging process. User applications may extend (or only recognize a subset of) these capture names, but it is desirable to standardize on the names below when supported by a given system or language. Language communities that write tagging rules using these names can work out-of-the-box with a steadily increasing set of analysis tools.
 
 | Category                 | Tag                         |
 |--------------------------|-----------------------------|
@@ -26,8 +77,4 @@ The below table describes a standard vocabulary for kinds and roles during the t
 | Class reference          | `@reference.class`          |
 | Interface implementation | `@reference.implementation` |
 
-To communicate the associated identifier inside one of these syntactic classes, capture the identifier within as `@name`.
-
-## `tree-sitter graph`
-
-Documentation forthcoming.
+By convention, tags for a given language are made available in a `queries/tags.scm `file in that language's repository.

From 69a5f77eab6b46ac9c6a11ceae0ebdc9d482b6f1 Mon Sep 17 00:00:00 2001
From: Patrick Thomson
Date: Thu, 17 Feb 2022 17:34:15 -0500
Subject: [PATCH 07/13] Describe how to use tree-sitter tags as well.

---
 docs/section-8-code-navigation-systems.md | 34 ++++++++++++++++++-----
 1 file changed, 27 insertions(+), 7 deletions(-)

diff --git a/docs/section-8-code-navigation-systems.md b/docs/section-8-code-navigation-systems.md
index 3b9efea6..7a259455 100644
--- a/docs/section-8-code-navigation-systems.md
+++ b/docs/section-8-code-navigation-systems.md
@@ -5,21 +5,19 @@ permalink: code-navigation-systems
 
 # Code Navigation Systems
 
-Tree-sitter can be used in conjunction with its [tree query language](https://tree-sitter.github.io/tree-sitter/using-parsers#pattern-matching-with-queries) as a part of code navigation systems. An example of such a system can be seen in the `tree-sitter tag` command, which emits a textual dump of the interesting syntactic nodes in its file argument. A notable application of this is GitHub's support for [search-based code navigation](https://docs.github.com/en/repositories/working-with-files/using-files/navigating-code-on-github#precise-and-search-based-navigation). This document exists to describe how to extend the
+Tree-sitter can be used in conjunction with its [tree query language](https://tree-sitter.github.io/tree-sitter/using-parsers#pattern-matching-with-queries) as a part of code navigation systems. An example of such a system can be seen in the `tree-sitter tag` command, which emits a textual dump of the interesting syntactic nodes in its file argument. A notable application of this is GitHub's support for [search-based code navigation](https://docs.github.com/en/repositories/working-with-files/using-files/navigating-code-on-github#precise-and-search-based-navigation). This document exists to describe how to integrate with such systems, and how to extend this functionality to any language with a Tree-sitter grammar.
 
 ## Tagging and captures
 
 *Tagging* is the act of identifying the entities that can be named in a program. We use Tree-sitter queries to find those entities. Having found them, you use a syntax capture to label the entity and its name.
 
-You can use the `tree-sitter tag` command to test out a given set of tags
-
 The essence of a given tag lies in two pieces of data: the _kind_ of entity that is matched (usually a definition or a reference) and the _role_ of that entity, which describes how the entity is used (i.e. whether it's a class definition, function call, variable reference, and so on). Our convention is to use a syntax capture following the `@kind.role` capture name format, and another inner capture, always called `@name`, that pulls out the name of a given identifier.
 '
-You may optionally include a capture named `@doc `to bind a docstring. For convenience purposes, the tagging system provides two built-in functions, `#select-adjacent` and `#strip` that are convenient for removing comment syntax from a docstring. `#strip` takes a capture as its first argument and a regular expression, expressed as a quoted string. Any text patterns matched by the regular expression will be removed from the text associated with the passed capture. `#select-adjacent`, when passed two capture names, filters the text associated with the first capture so that only text adjacent to the second capture is preserved. This can be useful when writing queries that would otherwise include too much information in matched comments.
+You may optionally include a capture named `@doc` to bind a docstring. For convenience purposes, the tagging system provides two built-in functions, `#select-adjacent!` and `#strip!` that are convenient for removing comment syntax from a docstring. `#strip!` takes a capture as its first argument and a regular expression, expressed as a quoted string. Any text patterns matched by the regular expression will be removed from the text associated with the passed capture. `#select-adjacent!`, when passed two capture names, filters the text associated with the first capture so that only text adjacent to the second capture is preserved. This can be useful when writing queries that would otherwise include too much information in matched comments.
 
 ## Examples
 
-An [example query](https://github.com/tree-sitter/tree-sitter-python/blob/78c4e9b6b2f08e1be23b541ffced47b15e2972ad/queries/tags.scm#L4-L5) follows, one that recognizes Python function definitions and captures their declared name. The `function_definition` syntax node is defined in the [Python Tree-sitter grammar](https://github.com/tree-sitter/tree-sitter-python/blob/78c4e9b6b2f08e1be23b541ffced47b15e2972ad/grammar.js#L354).
+This [query](https://github.com/tree-sitter/tree-sitter-python/blob/78c4e9b6b2f08e1be23b541ffced47b15e2972ad/queries/tags.scm#L4-L5) recognizes Python function definitions and captures their declared name. The `function_definition` syntax node is defined in the [Python Tree-sitter grammar](https://github.com/tree-sitter/tree-sitter-python/blob/78c4e9b6b2f08e1be23b541ffced47b15e2972ad/grammar.js#L354).
 
 ``` scheme
 (function_definition
@@ -64,7 +62,7 @@ An even more sophisticated query is in the [Ruby Tree-sitter repository](https:/
 )
 ```
 
-The below table describes a standard vocabulary for kinds and roles during the tagging process. User applications may extend (or only recognize a subset of) these capture names, but it is desirable to standardize on the names below when supported by a given system or language. Language communities that write tagging rules using these names can work out-of-the-box with a steadily increasing set of analysis tools.
+The below table describes a standard vocabulary for kinds and roles during the tagging process. New applications may extend (or only recognize a subset of) these capture names, but it is desirable to standardize on the names below.
 
 | Category                 | Tag                         |
 |--------------------------|-----------------------------|
@@ -77,4 +75,26 @@ The below table describes a standard vocabulary for kinds and roles during the t
 | Class reference          | `@reference.class`          |
 | Interface implementation | `@reference.implementation` |
 
-By convention, tags for a given language are made available in a `queries/tags.scm `file in that language's repository.
+## Command-line invocation
+
+You can use the `tree-sitter tags` command to test out a tags query file. We can run this tool from within the Tree-sitter Ruby repository, over code in a file called `test.rb`
+
+``` ruby
+module Foo
+  class Bar
+    def baz
+    end
+  end
+end
+```
+
+Invoking `tree-sitter tags test.rb` produces the following console output:
+
+```
+    test.rb
+        Foo | module def (0, 7) - (0, 10) `module Foo`
+        Bar | class def (1, 8) - (1, 11) `class Bar`
+        baz | method def (2, 8) - (2, 11) `def baz`
+```
+
+By convention, tags for a given language are made available in a `queries/tags.scm`file in that language's repository.

From 4c602173456536a7b2223dab815bac90180f525a Mon Sep 17 00:00:00 2001
From: Patrick Thomson
Date: Thu, 17 Feb 2022 17:43:14 -0500
Subject: [PATCH 08/13] Flesh out output.

---
 docs/section-8-code-navigation-systems.md | 11 ++++++-----
 1 file changed, 6 insertions(+), 5 deletions(-)

diff --git a/docs/section-8-code-navigation-systems.md b/docs/section-8-code-navigation-systems.md
index 7a259455..e60d345f 100644
--- a/docs/section-8-code-navigation-systems.md
+++ b/docs/section-8-code-navigation-systems.md
@@ -77,24 +77,25 @@ The below table describes a standard vocabulary for kinds and roles during the t
 
 ## Command-line invocation
 
-You can use the `tree-sitter tags` command to test out a tags query file. We can run this tool from within the Tree-sitter Ruby repository, over code in a file called `test.rb`
+You can use the `tree-sitter tags` command to test out a tags query file, passing as arguments one or more files to tag. We can run this tool from within the Tree-sitter Ruby repository, over code in a file called `test.rb`:
 
 ``` ruby
 module Foo
   class Bar
+    # wow!
     def baz
     end
   end
 end
 ```
 
-Invoking `tree-sitter tags test.rb` produces the following console output:
+Invoking `tree-sitter tags test.rb` produces the following console output, representing matched entities' name, role, location, first line, and docstring:
 
 ```
     test.rb
-        Foo | module def (0, 7) - (0, 10) `module Foo`
+        Foo | module def (0, 7) - (0, 10) `module Foo`
         Bar | class def (1, 8) - (1, 11) `class Bar`
-        baz | method def (2, 8) - (2, 11) `def baz`
+        baz | method def (2, 8) - (2, 11) `def baz` "wow!"
 ```
 
-By convention, tags for a given language are made available in a `queries/tags.scm`file in that language's repository.
+It is expected that tag queries for a given language are located at `queries/tags.scm` in that language's repository.

From e1ac2e2648c7cb405c95881cef3d414aeded160e Mon Sep 17 00:00:00 2001
From: Patrick Thomson
Date: Thu, 17 Feb 2022 18:05:19 -0500
Subject: [PATCH 09/13] Better nomenclature.

---
 docs/section-8-code-navigation-systems.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/section-8-code-navigation-systems.md b/docs/section-8-code-navigation-systems.md
index e60d345f..5d9aa933 100644
--- a/docs/section-8-code-navigation-systems.md
+++ b/docs/section-8-code-navigation-systems.md
@@ -11,7 +11,7 @@ Tree-sitter can be used in conjunction with its [tree query language](https://tr
 
 *Tagging* is the act of identifying the entities that can be named in a program. We use Tree-sitter queries to find those entities. Having found them, you use a syntax capture to label the entity and its name.
 
-The essence of a given tag lies in two pieces of data: the _kind_ of entity that is matched (usually a definition or a reference) and the _role_ of that entity, which describes how the entity is used (i.e. whether it's a class definition, function call, variable reference, and so on). Our convention is to use a syntax capture following the `@kind.role` capture name format, and another inner capture, always called `@name`, that pulls out the name of a given identifier.
+The essence of a given tag lies in two pieces of data: the _role_ of the entity that is matched (i.e. whether it is a definition or a reference) and the _kind_ of that entity, which describes how the entity is used (i.e. whether it's a class definition, function call, variable reference, and so on). Our convention is to use a syntax capture following the `@role.kind` capture name format, and another inner capture, always called `@name`, that pulls out the name of a given identifier.
 '
 You may optionally include a capture named `@doc` to bind a docstring. For convenience purposes, the tagging system provides two built-in functions, `#select-adjacent!` and `#strip!` that are convenient for removing comment syntax from a docstring. `#strip!` takes a capture as its first argument and a regular expression, expressed as a quoted string. Any text patterns matched by the regular expression will be removed from the text associated with the passed capture. `#select-adjacent!`, when passed two capture names, filters the text associated with the first capture so that only text adjacent to the second capture is preserved. This can be useful when writing queries that would otherwise include too much information in matched comments.
 

From 48748ee33204ab08d616ae5e56552c7f04b55d99 Mon Sep 17 00:00:00 2001
From: Patrick Thomson
Date: Thu, 17 Feb 2022 18:05:50 -0500
Subject: [PATCH 10/13] Typo.

---
 docs/section-8-code-navigation-systems.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/section-8-code-navigation-systems.md b/docs/section-8-code-navigation-systems.md
index 5d9aa933..ba2d2517 100644
--- a/docs/section-8-code-navigation-systems.md
+++ b/docs/section-8-code-navigation-systems.md
@@ -12,7 +12,7 @@ Tree-sitter can be used in conjunction with its [tree query language](https://tr
 *Tagging* is the act of identifying the entities that can be named in a program. We use Tree-sitter queries to find those entities. Having found them, you use a syntax capture to label the entity and its name.
 
 The essence of a given tag lies in two pieces of data: the _role_ of the entity that is matched (i.e. whether it is a definition or a reference) and the _kind_ of that entity, which describes how the entity is used (i.e. whether it's a class definition, function call, variable reference, and so on). Our convention is to use a syntax capture following the `@role.kind` capture name format, and another inner capture, always called `@name`, that pulls out the name of a given identifier.
-'
+
 You may optionally include a capture named `@doc` to bind a docstring. For convenience purposes, the tagging system provides two built-in functions, `#select-adjacent!` and `#strip!` that are convenient for removing comment syntax from a docstring. `#strip!` takes a capture as its first argument and a regular expression, expressed as a quoted string. Any text patterns matched by the regular expression will be removed from the text associated with the passed capture. `#select-adjacent!`, when passed two capture names, filters the text associated with the first capture so that only text adjacent to the second capture is preserved. This can be useful when writing queries that would otherwise include too much information in matched comments.
 
 ## Examples

From 65da86f16fbbf3d94c691d970c86029a370fb114 Mon Sep 17 00:00:00 2001
From: Patrick Thomson
Date: Thu, 17 Feb 2022 18:11:01 -0500
Subject: [PATCH 11/13] Missing plural here.

---
 docs/section-8-code-navigation-systems.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/section-8-code-navigation-systems.md b/docs/section-8-code-navigation-systems.md
index ba2d2517..6bfcd0a4 100644
--- a/docs/section-8-code-navigation-systems.md
+++ b/docs/section-8-code-navigation-systems.md
@@ -5,7 +5,7 @@ permalink: code-navigation-systems
 
 # Code Navigation Systems
 
-Tree-sitter can be used in conjunction with its [tree query language](https://tree-sitter.github.io/tree-sitter/using-parsers#pattern-matching-with-queries) as a part of code navigation systems. An example of such a system can be seen in the `tree-sitter tag` command, which emits a textual dump of the interesting syntactic nodes in its file argument. A notable application of this is GitHub's support for [search-based code navigation](https://docs.github.com/en/repositories/working-with-files/using-files/navigating-code-on-github#precise-and-search-based-navigation). This document exists to describe how to integrate with such systems, and how to extend this functionality to any language with a Tree-sitter grammar.
+Tree-sitter can be used in conjunction with its [tree query language](https://tree-sitter.github.io/tree-sitter/using-parsers#pattern-matching-with-queries) as a part of code navigation systems. An example of such a system can be seen in the `tree-sitter tags` command, which emits a textual dump of the interesting syntactic nodes in its file argument. A notable application of this is GitHub's support for [search-based code navigation](https://docs.github.com/en/repositories/working-with-files/using-files/navigating-code-on-github#precise-and-search-based-navigation). This document exists to describe how to integrate with such systems, and how to extend this functionality to any language with a Tree-sitter grammar. ## Tagging and captures From 27019d117217f426ce05abbcc339d0492803cecc Mon Sep 17 00:00:00 2001 From: Patrick Thomson Date: Thu, 17 Feb 2022 18:28:09 -0500 Subject: [PATCH 12/13] demonstrate that select-adjacent works --- docs/section-8-code-navigation-systems.md | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/docs/section-8-code-navigation-systems.md b/docs/section-8-code-navigation-systems.md index 6bfcd0a4..31bd7d11 100644 --- a/docs/section-8-code-navigation-systems.md +++ b/docs/section-8-code-navigation-systems.md @@ -82,7 +82,9 @@ You can use the `tree-sitter tags` command to test out a tags query file, passin ``` ruby module Foo class Bar - # wow! + # won't be included + + # is adjacent, will be def baz end end @@ -95,7 +97,7 @@ Invoking `tree-sitter tags test.rb` produces the following console output, repre test.rb Foo | module def (0, 7) - (0, 10) `module Foo` Bar | class def (1, 8) - (1, 11) `class Bar` - baz | method def (2, 8) - (2, 11) `def baz` "wow!" + baz | method def (2, 8) - (2, 11) `def baz` "is adjacent, will be" ``` It is expected that tag queries for a given language are located at `queries/tags.scm` in that language's repository. 
From 764c8c88ca2fb100f40fdbfbae61ffd32585fc67 Mon Sep 17 00:00:00 2001 From: Patrick Thomson Date: Fri, 18 Feb 2022 09:24:04 -0500 Subject: [PATCH 13/13] last tweaks --- docs/section-8-code-navigation-systems.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/section-8-code-navigation-systems.md b/docs/section-8-code-navigation-systems.md index 31bd7d11..eb1c7dde 100644 --- a/docs/section-8-code-navigation-systems.md +++ b/docs/section-8-code-navigation-systems.md @@ -13,7 +13,7 @@ Tree-sitter can be used in conjunction with its [tree query language](https://tr The essence of a given tag lies in two pieces of data: the _role_ of the entity that is matched (i.e. whether it is a definition or a reference) and the _kind_ of that entity, which describes how the entity is used (i.e. whether it's a class definition, function call, variable reference, and so on). Our convention is to use a syntax capture following the `@role.kind` capture name format, and another inner capture, always called `@name`, that pulls out the name of a given identifier. -You may optionally include a capture named `@doc` to bind a docstring. For convenience purposes, the tagging system provides two built-in functions, `#select-adjacent!` and `#strip!` that are convenient for removing comment syntax from a docstring. `#strip!` takes a capture as its first argument and a regular expression, expressed as a quoted string. Any text patterns matched by the regular expression will be removed from the text associated with the passed capture. `#select-adjacent!`, when passed two capture names, filters the text associated with the first capture so that only text adjacent to the second capture is preserved. This can be useful when writing queries that would otherwise include too much information in matched comments. +You may optionally include a capture named `@doc` to bind a docstring. 
For convenience purposes, the tagging system provides two built-in functions, `#select-adjacent!` and `#strip!` that are convenient for removing comment syntax from a docstring. `#strip!` takes a capture as its first argument and a regular expression as its second, expressed as a quoted string. Any text patterns matched by the regular expression will be removed from the text associated with the passed capture. `#select-adjacent!`, when passed two capture names, filters the text associated with the first capture so that only nodes adjacent to the second capture are preserved. This can be useful when writing queries that would otherwise include too much information in matched comments. ## Examples