# Basic Parsing ## Providing the Code In the example on the previous page, we parsed source code stored in a simple string using the `ts_parser_parse_string` function: ```c TSTree *ts_parser_parse_string( TSParser *self, const TSTree *old_tree, const char *string, uint32_t length ); ``` You may want to parse source code that's stored in a custom data structure, like a [piece table][piece table] or a [rope][rope]. In this case, you can use the more general `ts_parser_parse` function: ```c TSTree *ts_parser_parse( TSParser *self, const TSTree *old_tree, TSInput input ); ``` The `TSInput` structure lets you provide your own function for reading a chunk of text at a given byte offset and row/column position. The function can return text encoded in either UTF-8 or UTF-16. This interface allows you to efficiently parse text that is stored in your own data structure. ```c typedef struct { void *payload; const char *(*read)( void *payload, uint32_t byte_offset, TSPoint position, uint32_t *bytes_read ); TSInputEncoding encoding; TSDecodeFunction decode; } TSInput; ``` If you want to decode text that is not encoded in UTF-8 or UTF-16, you can set the `decode` field of the input to your function that will decode text. The signature of the `TSDecodeFunction` is as follows: ```c typedef uint32_t (*TSDecodeFunction)( const uint8_t *string, uint32_t length, int32_t *code_point ); ``` ```admonish attention The `TSInputEncoding` must be set to `TSInputEncodingCustom` for the `decode` function to be called. ``` The `string` argument is a pointer to the text to decode, which comes from the `read` function, and the `length` argument is the length of the `string`. The `code_point` argument is a pointer to an integer that represents the decoded code point, and should be written to in your `decode` callback. The function should return the number of bytes decoded. ## Syntax Nodes Tree-sitter provides a [DOM][dom]-style interface for inspecting syntax trees. A syntax node's _type_ is a string that indicates which grammar rule the node represents. ```c const char *ts_node_type(TSNode); ``` Syntax nodes store their position in the source code both in raw bytes and row/column coordinates. In a point, rows and columns are zero-based. The `row` field represents the number of newlines before a given position, while `column` represents the number of bytes between the position and beginning of the line. ```c uint32_t ts_node_start_byte(TSNode); uint32_t ts_node_end_byte(TSNode); typedef struct { uint32_t row; uint32_t column; } TSPoint; TSPoint ts_node_start_point(TSNode); TSPoint ts_node_end_point(TSNode); ``` ```admonish note A *newline* is considered to be a single line feed (`\n`) character. ``` ## Retrieving Nodes Every tree has a _root node_: ```c TSNode ts_tree_root_node(const TSTree *); ``` Once you have a node, you can access the node's children: ```c uint32_t ts_node_child_count(TSNode); TSNode ts_node_child(TSNode, uint32_t); ``` You can also access its siblings and parent: ```c TSNode ts_node_next_sibling(TSNode); TSNode ts_node_prev_sibling(TSNode); TSNode ts_node_parent(TSNode); ``` These methods may all return a _null node_ to indicate, for example, that a node does not _have_ a next sibling. You can check if a node is null: ```c bool ts_node_is_null(TSNode); ``` ## Named vs Anonymous Nodes Tree-sitter produces [_concrete_ syntax trees][cst] — trees that contain nodes for every individual token in the source code, including things like commas and parentheses. This is important for use-cases that deal with individual tokens, like [syntax highlighting][syntax highlighting]. But some types of code analysis are easier to perform using an [_abstract_ syntax tree][ast] — a tree in which the less important details have been removed. Tree-sitter's trees support these use cases by making a distinction between _named_ and _anonymous_ nodes. Consider a grammar rule like this: ```js if_statement: $ => seq("if", "(", $._expression, ")", $._statement); ``` A syntax node representing an `if_statement` in this language would have 5 children: the condition expression, the body statement, as well as the `if`, `(`, and `)` tokens. The expression and the statement would be marked as _named_ nodes, because they have been given explicit names in the grammar. But the `if`, `(`, and `)` nodes would _not_ be named nodes, because they are represented in the grammar as simple strings. You can check whether any given node is named: ```c bool ts_node_is_named(TSNode); ``` When traversing the tree, you can also choose to skip over anonymous nodes by using the `_named_` variants of all of the methods described above: ```c TSNode ts_node_named_child(TSNode, uint32_t); uint32_t ts_node_named_child_count(TSNode); TSNode ts_node_next_named_sibling(TSNode); TSNode ts_node_prev_named_sibling(TSNode); ``` If you use this group of methods, the syntax tree functions much like an abstract syntax tree. ## Node Field Names To make syntax nodes easier to analyze, many grammars assign unique _field names_ to particular child nodes. In the [creating parsers][using fields] section, it's explained how to do this in your own grammars. If a syntax node has fields, you can access its children using their field name: ```c TSNode ts_node_child_by_field_name( TSNode self, const char *field_name, uint32_t field_name_length ); ``` Fields also have numeric ids that you can use, if you want to avoid repeated string comparisons. You can convert between strings and ids using the `TSLanguage`: ```c uint32_t ts_language_field_count(const TSLanguage *); const char *ts_language_field_name_for_id(const TSLanguage *, TSFieldId); TSFieldId ts_language_field_id_for_name(const TSLanguage *, const char *, uint32_t); ``` The field ids can be used in place of the name: ```c TSNode ts_node_child_by_field_id(TSNode, TSFieldId); ``` [ast]: https://en.wikipedia.org/wiki/Abstract_syntax_tree [cst]: https://en.wikipedia.org/wiki/Parse_tree [dom]: https://en.wikipedia.org/wiki/Document_Object_Model [piece table]: [rope]: [syntax highlighting]: https://en.wikipedia.org/wiki/Syntax_highlighting [using fields]: ../creating-parsers/3-writing-the-grammar.md#using-fields