diff --git a/docs/_layouts/default.html b/docs/_layouts/default.html
index a764485b..a11327b3 100644
--- a/docs/_layouts/default.html
+++ b/docs/_layouts/default.html
@@ -32,6 +32,7 @@
{% capture whitespace %}
{% assign min_header = 2 %}
+ {% assign max_header = 3 %}
{% assign nodes = content | split: "
maxHeader %}
+ {% if header_level < min_header or header_level > max_header %}
{% continue %}
{% endif %}
@@ -127,7 +128,7 @@
}
});
- $('h1, h2, h3, h4, h5, h6').filter('[id]').each(function() {
+ $('h1, h2, h3').filter('[id]').each(function() {
$(this).html('' + $(this).text() + '');
});
diff --git a/docs/section-3-creating-parsers.md b/docs/section-3-creating-parsers.md
index 268b034a..90411a55 100644
--- a/docs/section-3-creating-parsers.md
+++ b/docs/section-3-creating-parsers.md
@@ -211,12 +211,13 @@ The following is a complete list of built-in functions you can use to define Tre
* **Tokens : `token(rule)`** - This function marks the given rule as producing only a single token. Tree-sitter's default is to treat each String or RegExp literal in the grammar as a separate token. Each token is matched separately by the lexer and returned as its own leaf node in the tree. The `token` function allows you to express a complex rule using the functions described above (rather than as a single regular expression) but still have Tree-sitter treat it as a single token.
* **Aliases : `alias(rule, name)`** - This function causes the given rule to *appear* with an alternative name in the syntax tree. It is useful in cases where a language construct needs to be parsed differently in different contexts (and thus needs to be defined using multiple symbols), but should always *appear* as the same type of node.
-In addition to the `name` and `rules` fields, grammars have a few other public fields that influence the behavior of the parser.
+In addition to the `name` and `rules` fields, grammars have a few other optional public fields that influence the behavior of the parser.
* `extras` - an array of tokens that may appear *anywhere* in the language. This is often used for whitespace and comments. The default for `extras` in `tree-sitter-cli` is to accept whitespace. To control whitespace explicitly, specify `extras: $ => []` in the grammar.
* `inline` - an array of rule names that should be automatically *removed* from the grammar by replacing all of their usages with a copy of their definition. This is useful for rules that are used in multiple places but for which you *don't* want to create syntax tree nodes at runtime.
* `conflicts` - an array of arrays of rule names. Each inner array represents a set of rules that's involved in an *LR(1) conflict* that is *intended to exist* in the grammar. When these conflicts occur at runtime, Tree-sitter will use the GLR algorithm to explore all of the possible interpretations. If *multiple* parses end up succeeding, Tree-sitter will pick the subtree rule with the highest *dynamic precedence*.
* `externals` - an array of token names which can be returned by an *external scanner*. External scanners allow you to write custom C code which runs during the lexing process in order to handle lexical rules (e.g. Python's indentation tokens) that cannot be described by regular expressions.
+* `word` - the name of a token that will match keywords for the purpose of the [keyword extraction](#keyword-extraction) optimization.
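
For example, a hypothetical grammar might combine several of these optional fields. This is only a sketch; the rule names (`comment`, `_expression`, `type_name`, `variable_name`) are illustrative, not part of any real grammar:

```js
grammar({
  name: 'my_language',

  // Allow whitespace and comments to appear anywhere.
  extras: $ => [/\s/, $.comment],

  // Inline this helper rule instead of creating nodes for it at runtime.
  inline: $ => [$._expression],

  // Tolerate an intended LR(1) conflict between these two rules.
  conflicts: $ => [[$.type_name, $.variable_name]],

  rules: {
    // ...
  }
})
```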
## Adjusting existing grammars
@@ -355,11 +356,81 @@ For an expression like `a * b * c`, it's not clear whether we mean `a * (b * c)`
You may have noticed in the above examples that some of the grammar rule names, like `_expression` and `_type`, began with an underscore. Starting a rule's name with an underscore causes the rule to be *hidden* in the syntax tree. This is useful for rules like `_expression` in the grammars above, which always just wrap a single child node. If these nodes were not hidden, they would add substantial depth and noise to the syntax tree without making it any easier to understand.
-## Dealing with LR conflicts
+### Dealing with LR conflicts
-TODO
+...
+## Lexical Analysis
+
+Tree-sitter's parsing process is divided into two phases: parsing (which is described above) and [lexing][lexing], the process of grouping individual characters into the language's fundamental *tokens*. There are a few important things to know about how Tree-sitter's lexing works.
+
+### Conflict Resolution
+
+Grammars often contain multiple tokens that can match the same characters. For example, a grammar might contain the tokens `"if"` and `/[a-z]+/`. Tree-sitter differentiates between these conflicting tokens in a few ways:
+
+1. **Context-aware lexing** - Tree-sitter performs lexing on-demand, during the parsing process. At any given position in a source document, the lexer only tries to recognize tokens that are *valid* at that position.
+
+2. **Longest-match** - If multiple valid tokens match the characters at a given position in a document, Tree-sitter will select the token that matches the [longest sequence of characters][longest-match].
+
+3. **Lexical Precedence** - When the precedence functions described [above](#using-the-grammar-dsl) are used within the `token` function, the given precedence values serve as instructions to the lexer. If there are two valid tokens that match the same sequence of characters, Tree-sitter will select the one with the higher precedence.
+
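To sketch the third rule: if two tokens can match exactly the same string, wrapping one in `token(prec(...))` tells the lexer which to prefer. The rule names below are hypothetical:

```js
rules: {
  // Both tokens can match the string 'null'; the explicit lexical
  // precedence makes the lexer choose `null_literal` over `identifier`.
  null_literal: $ => token(prec(1, 'null')),
  identifier: $ => /[a-z]+/
}
```
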
+### Keywords
+
+If your language has keywords that are also matched by another rule (typically `identifier`), you can tell Tree-sitter about them with your grammar's `word` property.
+
+```js
+grammar({
+ word: $ => $.identifier,
+
+ rules: {
+ class_declaration: $ => seq(
+ 'class',
+ $.identifier,
+ $.class_body
+ ),
+
+ break_statement: $ => seq('break', ';'),
+
+ continue_statement: $ => seq('continue', ';'),
+
+ identifier: $ => /[a-z]+/
+ }
+})
+```
+
+In this case, we're specifying `identifier` as our `word`. Tree-sitter will automatically find the set of terminals which are matched by `$.identifier`, and consider them keywords. Instead of generating a parser which scans for each keyword individually, Tree-sitter will generate a parser that tries to match the word rule (in this case, `identifier`), and checks to see if the matched word is the necessary keyword.
+
+This makes the set of parse states smaller, so the parser compiles faster.
+
+It *also changes behavior*. Consider this grammar:
+
+```js
+grammar({
+ rules: {
+ import: $ => seq(
+ 'import',
+ $.identifier,
+ 'as',
+ $.identifier
+ ),
+
+ identifier: $ => /[a-z]+/
+ }
+})
+```
+
+Without the `word` property, the grammar matches this input:
+
+```
+import foo asbar
+```
+
+This is probably not what you want. If we add `word: $ => $.identifier`, this input will no longer parse. When trying to parse `'as'`, the parser will instead match an entire word (the identifier `'asbar'`), compare it to `'as'`, and correctly generate an error.
+
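With the `word` property added, the grammar above becomes:

```js
grammar({
  word: $ => $.identifier,

  rules: {
    import: $ => seq(
      'import',
      $.identifier,
      'as',
      $.identifier
    ),

    identifier: $ => /[a-z]+/
  }
})
```
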
+[lexing]: https://en.wikipedia.org/wiki/Lexical_analysis
+[longest-match]: https://en.wikipedia.org/wiki/Maximal_munch
[cst]: https://en.wikipedia.org/wiki/Parse_tree
+[dfa]: https://en.wikipedia.org/wiki/Deterministic_finite_automaton
[non-terminal]: https://en.wikipedia.org/wiki/Terminal_and_nonterminal_symbols
[language-spec]: https://en.wikipedia.org/wiki/Programming_language_specification
[glr-parsing]: https://en.wikipedia.org/wiki/GLR_parser
diff --git a/include/tree_sitter/compiler.h b/include/tree_sitter/compiler.h
index ca2a28f7..3db2f7ca 100644
--- a/include/tree_sitter/compiler.h
+++ b/include/tree_sitter/compiler.h
@@ -19,6 +19,7 @@ typedef enum {
TSCompileErrorTypeEpsilonRule,
TSCompileErrorTypeInvalidTokenContents,
TSCompileErrorTypeInvalidRuleName,
+ TSCompileErrorTypeInvalidWordRule,
} TSCompileErrorType;
typedef struct {
diff --git a/src/compiler/build_tables/lex_table_builder.cc b/src/compiler/build_tables/lex_table_builder.cc
index 178cfb75..d0f363d1 100644
--- a/src/compiler/build_tables/lex_table_builder.cc
+++ b/src/compiler/build_tables/lex_table_builder.cc
@@ -49,6 +49,19 @@ using rules::Symbol;
using rules::Metadata;
using rules::Seq;
+enum ConflictStatus {
+ DoesNotMatch = 0,
+ MatchesShorterStringWithinSeparators = 1 << 0,
+ MatchesSameString = 1 << 1,
+ MatchesLongerString = 1 << 2,
+ MatchesLongerStringWithValidNextChar = 1 << 3,
+ CannotDistinguish = (
+ MatchesShorterStringWithinSeparators |
+ MatchesSameString |
+ MatchesLongerStringWithValidNextChar
+ ),
+};
+
static const std::unordered_set<ParseStateId> EMPTY;
bool CoincidentTokenIndex::contains(Symbol a, Symbol b) const {
@@ -65,14 +78,12 @@ const std::unordered_set &CoincidentTokenIndex::states_with(Symbol
}
}
-template <bool include_all>
-class CharacterAggregator {
+class StartingCharacterAggregator {
public:
void apply(const Rule &rule) {
rule.match(
[this](const Seq &sequence) {
apply(*sequence.left);
- if (include_all) apply(*sequence.right);
},
[this](const rules::Choice &rule) {
@@ -91,9 +102,6 @@ class CharacterAggregator {
CharacterSet result;
};
-using StartingCharacterAggregator = CharacterAggregator<false>;
-using AllCharacterAggregator = CharacterAggregator<true>;
-
class LexTableBuilderImpl : public LexTableBuilder {
LexTable main_lex_table;
LexTable keyword_lex_table;
@@ -109,7 +117,7 @@ class LexTableBuilderImpl : public LexTableBuilder {
vector<ConflictStatus> conflict_matrix;
bool conflict_detection_mode;
LookaheadSet keyword_symbols;
- Symbol keyword_capture_token;
+ Symbol word_rule;
char encoding_buffer[8];
public:
@@ -125,7 +133,7 @@ class LexTableBuilderImpl : public LexTableBuilder {
parse_table(parse_table),
conflict_matrix(lexical_grammar.variables.size() * lexical_grammar.variables.size(), DoesNotMatch),
conflict_detection_mode(false),
- keyword_capture_token(rules::NONE()) {
+ word_rule(syntax_grammar.word_rule) {
// Compute the possible separator rules and the set of separator characters that can occur
// immediately after any token.
@@ -141,7 +149,6 @@ class LexTableBuilderImpl : public LexTableBuilder {
// characters that can follow each token. Also identify all of the tokens that can be
// considered 'keywords'.
LOG_START("characterizing tokens");
- LookaheadSet potential_keyword_symbols;
for (unsigned i = 0, n = grammar.variables.size(); i < n; i++) {
Symbol token = Symbol::terminal(i);
@@ -158,31 +165,6 @@ class LexTableBuilderImpl : public LexTableBuilder {
});
}
following_characters_by_token[i] = following_character_aggregator.result;
-
- AllCharacterAggregator all_character_aggregator;
- all_character_aggregator.apply(grammar.variables[i].rule);
-
- if (
- !starting_character_aggregator.result.includes_all &&
- !all_character_aggregator.result.includes_all
- ) {
- bool starts_alpha = true, all_alnum = true;
- for (auto character : starting_character_aggregator.result.included_chars) {
- if (!iswalpha(character) && character != '_') {
- starts_alpha = false;
- }
- }
- for (auto character : all_character_aggregator.result.included_chars) {
- if (!iswalnum(character) && character != '_') {
- all_alnum = false;
- }
- }
- if (starts_alpha && all_alnum) {
- LOG("potential keyword: %s", token_name(token).c_str());
- potential_keyword_symbols.insert(token);
- }
- }
-
}
LOG_END();
@@ -205,98 +187,83 @@ class LexTableBuilderImpl : public LexTableBuilder {
}
LOG_END();
- LOG_START("finding keyword capture token");
- for (Symbol::Index i = 0, n = grammar.variables.size(); i < n; i++) {
- Symbol candidate = Symbol::terminal(i);
+ if (word_rule != rules::NONE()) {
+ identify_keywords();
+ }
+ }
- LookaheadSet homonyms;
- potential_keyword_symbols.for_each([&](Symbol other_token) {
- if (get_conflict_status(other_token, candidate) & MatchesShorterStringWithinSeparators) {
- homonyms.clear();
- return false;
- }
- if (get_conflict_status(candidate, other_token) == MatchesSameString) {
- homonyms.insert(other_token);
- }
- return true;
- });
- if (homonyms.empty()) continue;
-
- LOG_START(
- "keyword capture token candidate: %s, homonym count: %lu",
- token_name(candidate).c_str(),
- homonyms.size()
- );
-
- homonyms.for_each([&](Symbol homonym1) {
- homonyms.for_each([&](Symbol homonym2) {
- if (get_conflict_status(homonym1, homonym2) & MatchesSameString) {
- LOG(
- "conflict between homonyms %s %s",
- token_name(homonym1).c_str(),
- token_name(homonym2).c_str()
- );
- homonyms.remove(homonym1);
- }
- return false;
- });
- return true;
- });
-
- for (Symbol::Index j = 0; j < n; j++) {
- Symbol other_token = Symbol::terminal(j);
- if (other_token == candidate || homonyms.contains(other_token)) continue;
- bool candidate_shadows_other = get_conflict_status(other_token, candidate);
- bool other_shadows_candidate = get_conflict_status(candidate, other_token);
-
- if (candidate_shadows_other || other_shadows_candidate) {
- homonyms.for_each([&](Symbol homonym) {
- bool other_shadows_homonym = get_conflict_status(homonym, other_token);
-
- bool candidate_was_already_present = true;
- for (ParseStateId state_id : coincident_token_index.states_with(homonym, other_token)) {
- if (!parse_table->states[state_id].has_terminal_entry(candidate)) {
- candidate_was_already_present = false;
- break;
- }
- }
- if (candidate_was_already_present) return true;
-
- if (candidate_shadows_other) {
- homonyms.remove(homonym);
- LOG(
- "remove %s because candidate would shadow %s",
- token_name(homonym).c_str(),
- token_name(other_token).c_str()
- );
- } else if (other_shadows_candidate && !other_shadows_homonym) {
- homonyms.remove(homonym);
- LOG(
- "remove %s because %s would shadow candidate",
- token_name(homonym).c_str(),
- token_name(other_token).c_str()
- );
- }
- return true;
- });
- }
+ void identify_keywords() {
+ LookaheadSet homonyms;
+ for (Symbol::Index j = 0, n = grammar.variables.size(); j < n; j++) {
+ Symbol other_token = Symbol::terminal(j);
+ if (get_conflict_status(word_rule, other_token) == MatchesSameString) {
+ homonyms.insert(other_token);
}
-
- if (homonyms.size() > keyword_symbols.size()) {
- LOG_START("found capture token. homonyms:");
- homonyms.for_each([&](Symbol homonym) {
- LOG("%s", token_name(homonym).c_str());
- return true;
- });
- LOG_END();
- keyword_symbols = homonyms;
- keyword_capture_token = candidate;
- }
-
- LOG_END();
}
- LOG_END();
+ homonyms.for_each([&](Symbol homonym1) {
+ homonyms.for_each([&](Symbol homonym2) {
+ if (get_conflict_status(homonym1, homonym2) & MatchesSameString) {
+ LOG(
+ "conflict between homonyms %s %s",
+ token_name(homonym1).c_str(),
+ token_name(homonym2).c_str()
+ );
+ homonyms.remove(homonym1);
+ }
+ return false;
+ });
+ return true;
+ });
+
+ for (Symbol::Index j = 0, n = grammar.variables.size(); j < n; j++) {
+ Symbol other_token = Symbol::terminal(j);
+ if (other_token == word_rule || homonyms.contains(other_token)) continue;
+ bool word_rule_shadows_other = get_conflict_status(other_token, word_rule);
+ bool other_shadows_word_rule = get_conflict_status(word_rule, other_token);
+
+ if (word_rule_shadows_other || other_shadows_word_rule) {
+ homonyms.for_each([&](Symbol homonym) {
+ bool other_shadows_homonym = get_conflict_status(homonym, other_token);
+
+ bool word_rule_was_already_present = true;
+ for (ParseStateId state_id : coincident_token_index.states_with(homonym, other_token)) {
+ if (!parse_table->states[state_id].has_terminal_entry(word_rule)) {
+ word_rule_was_already_present = false;
+ break;
+ }
+ }
+ if (word_rule_was_already_present) return true;
+
+ if (word_rule_shadows_other) {
+ homonyms.remove(homonym);
+ LOG(
+ "remove %s because word_rule would shadow %s",
+ token_name(homonym).c_str(),
+ token_name(other_token).c_str()
+ );
+ } else if (other_shadows_word_rule && !other_shadows_homonym) {
+ homonyms.remove(homonym);
+ LOG(
+ "remove %s because %s would shadow word_rule",
+ token_name(homonym).c_str(),
+ token_name(other_token).c_str()
+ );
+ }
+ return true;
+ });
+ }
+ }
+
+ if (!homonyms.empty()) {
+ LOG_START("found keywords:");
+ homonyms.for_each([&](Symbol homonym) {
+ LOG("%s", token_name(homonym).c_str());
+ return true;
+ });
+ LOG_END();
+ keyword_symbols = homonyms;
+ }
}
BuildResult build() {
@@ -307,8 +274,8 @@ class LexTableBuilderImpl : public LexTableBuilder {
for (ParseState &parse_state : parse_table->states) {
LookaheadSet token_set;
for (auto &entry : parse_state.terminal_entries) {
- if (keyword_capture_token.is_terminal() && keyword_symbols.contains(entry.first)) {
- token_set.insert(keyword_capture_token);
+ if (word_rule.is_terminal() && keyword_symbols.contains(entry.first)) {
+ token_set.insert(word_rule);
} else {
token_set.insert(entry.first);
}
@@ -337,7 +304,19 @@ class LexTableBuilderImpl : public LexTableBuilder {
mark_fragile_tokens();
remove_duplicate_lex_states(main_lex_table);
- return {main_lex_table, keyword_lex_table, keyword_capture_token};
+ return {main_lex_table, keyword_lex_table, word_rule};
+ }
+
+ bool does_token_shadow_other(Symbol token, Symbol shadowed_token) const {
+ if (token == word_rule && keyword_symbols.contains(shadowed_token)) return false;
+ return get_conflict_status(shadowed_token, token) & (
+ MatchesShorterStringWithinSeparators |
+ MatchesLongerStringWithValidNextChar
+ );
+ }
+
+ bool does_token_match_same_string_as_other(Symbol token, Symbol shadowed_token) const {
+ return get_conflict_status(shadowed_token, token) & MatchesSameString;
}
ConflictStatus get_conflict_status(Symbol shadowed_token, Symbol other_token) const {
@@ -410,12 +389,14 @@ class LexTableBuilderImpl : public LexTableBuilder {
advance_symbol,
MatchesLongerStringWithValidNextChar
)) {
- LOG(
- "%s shadows %s followed by '%s'",
- token_name(advance_symbol).c_str(),
- token_name(accept_action.symbol).c_str(),
- log_char(*conflicting_following_chars.included_chars.begin())
- );
+ if (!conflicting_following_chars.included_chars.empty()) {
+ LOG(
+ "%s shadows %s followed by '%s'",
+ token_name(advance_symbol).c_str(),
+ token_name(accept_action.symbol).c_str(),
+ log_char(*conflicting_following_chars.included_chars.begin())
+ );
+ }
}
}
}
@@ -665,8 +646,12 @@ LexTableBuilder::BuildResult LexTableBuilder::build() {
return static_cast<LexTableBuilderImpl *>(this)->build();
}
-ConflictStatus LexTableBuilder::get_conflict_status(Symbol a, Symbol b) const {
- return static_cast<const LexTableBuilderImpl *>(this)->get_conflict_status(a, b);
+bool LexTableBuilder::does_token_shadow_other(Symbol a, Symbol b) const {
+ return static_cast<const LexTableBuilderImpl *>(this)->does_token_shadow_other(a, b);
+}
+
+bool LexTableBuilder::does_token_match_same_string_as_other(Symbol a, Symbol b) const {
+ return static_cast<const LexTableBuilderImpl *>(this)->does_token_match_same_string_as_other(a, b);
}
} // namespace build_tables
diff --git a/src/compiler/build_tables/lex_table_builder.h b/src/compiler/build_tables/lex_table_builder.h
index 4ec4f22b..d69b996b 100644
--- a/src/compiler/build_tables/lex_table_builder.h
+++ b/src/compiler/build_tables/lex_table_builder.h
@@ -30,19 +30,6 @@ namespace build_tables {
class LookaheadSet;
-enum ConflictStatus {
- DoesNotMatch = 0,
- MatchesShorterStringWithinSeparators = 1 << 0,
- MatchesSameString = 1 << 1,
- MatchesLongerString = 1 << 2,
- MatchesLongerStringWithValidNextChar = 1 << 3,
- CannotDistinguish = (
- MatchesShorterStringWithinSeparators |
- MatchesSameString |
- MatchesLongerStringWithValidNextChar
- ),
-};
-
struct CoincidentTokenIndex {
std::unordered_map<
std::pair,
@@ -69,7 +56,8 @@ class LexTableBuilder {
BuildResult build();
- ConflictStatus get_conflict_status(rules::Symbol, rules::Symbol) const;
+ bool does_token_shadow_other(rules::Symbol, rules::Symbol) const;
+ bool does_token_match_same_string_as_other(rules::Symbol, rules::Symbol) const;
protected:
LexTableBuilder() = default;
diff --git a/src/compiler/build_tables/parse_table_builder.cc b/src/compiler/build_tables/parse_table_builder.cc
index 0e6b4247..26dae5b7 100644
--- a/src/compiler/build_tables/parse_table_builder.cc
+++ b/src/compiler/build_tables/parse_table_builder.cc
@@ -134,11 +134,6 @@ class ParseTableBuilderImpl : public ParseTableBuilder {
}
void build_error_parse_state(ParseStateId state_id) {
- unsigned CannotMerge = (
- MatchesShorterStringWithinSeparators |
- MatchesLongerStringWithValidNextChar
- );
-
parse_table.states[state_id].terminal_entries.clear();
// First, identify the conflict-free tokens.
@@ -149,7 +144,7 @@ class ParseTableBuilderImpl : public ParseTableBuilder {
for (unsigned j = 0; j < lexical_grammar.variables.size(); j++) {
Symbol other_token = Symbol::terminal(j);
if (!coincident_token_index.contains(token, other_token) &&
- (lex_table_builder->get_conflict_status(other_token, token) & CannotMerge)) {
+ lex_table_builder->does_token_shadow_other(token, other_token)) {
conflicts_with_other_tokens = true;
break;
}
@@ -171,7 +166,7 @@ class ParseTableBuilderImpl : public ParseTableBuilder {
bool conflicts_with_other_tokens = false;
conflict_free_tokens.for_each([&](Symbol other_token) {
if (!coincident_token_index.contains(token, other_token) &&
- (lex_table_builder->get_conflict_status(other_token, token) & CannotMerge)) {
+ lex_table_builder->does_token_shadow_other(token, other_token)) {
LOG(
"exclude %s: conflicts with %s",
symbol_name(token).c_str(),
@@ -517,7 +512,8 @@ class ParseTableBuilderImpl : public ParseTableBuilder {
// Do not add a token if it conflicts with an existing token.
if (!new_token.is_built_in()) {
for (const auto &entry : state.terminal_entries) {
- if (lex_table_builder->get_conflict_status(entry.first, new_token) & CannotDistinguish) {
+ if (lex_table_builder->does_token_shadow_other(new_token, entry.first) ||
+ lex_table_builder->does_token_match_same_string_as_other(new_token, entry.first)) {
LOG_IF(
logged_conflict_tokens.insert({entry.first, new_token}).second,
"cannot merge parse states due to token conflict: %s and %s",
diff --git a/src/compiler/grammar.h b/src/compiler/grammar.h
index 6c63340c..5e2212fb 100644
--- a/src/compiler/grammar.h
+++ b/src/compiler/grammar.h
@@ -32,6 +32,7 @@ struct InputGrammar {
std::vector> expected_conflicts;
std::vector external_tokens;
std::unordered_set variables_to_inline;
+ rules::NamedSymbol word_rule;
};
} // namespace tree_sitter
diff --git a/src/compiler/log.cc b/src/compiler/log.cc
index c0f3a03c..4b1e3dbf 100644
--- a/src/compiler/log.cc
+++ b/src/compiler/log.cc
@@ -1,4 +1,5 @@
#include "compiler/log.h"
+#include <cassert>
static const char *SPACES = " ";
@@ -21,6 +22,7 @@ void _indent_logs() {
}
void _outdent_logs() {
+ assert(_indent_level > 0);
_indent_level--;
}
diff --git a/src/compiler/parse_grammar.cc b/src/compiler/parse_grammar.cc
index e233cfe0..f589d15a 100644
--- a/src/compiler/parse_grammar.cc
+++ b/src/compiler/parse_grammar.cc
@@ -229,7 +229,9 @@ ParseGrammarResult parse_grammar(const string &input) {
string error_message;
string name;
InputGrammar grammar;
- json_value name_json, rules_json, extras_json, conflicts_json, external_tokens_json, inline_rules_json;
+ json_value
+ name_json, rules_json, extras_json, conflicts_json, external_tokens_json,
+ inline_rules_json, word_rule_json;
json_settings settings = { 0, json_enable_comments, 0, 0, 0, 0 };
char parse_error[json_error_max];
@@ -359,6 +361,16 @@ ParseGrammarResult parse_grammar(const string &input) {
}
}
+ word_rule_json = grammar_json->operator[]("word");
+ if (word_rule_json.type != json_none) {
+ if (word_rule_json.type != json_string) {
+ error_message = "Invalid word property";
+ goto error;
+ }
+
+ grammar.word_rule = NamedSymbol { word_rule_json.u.string.ptr };
+ }
+
json_value_free(grammar_json);
return { name, grammar, "" };
diff --git a/src/compiler/prepare_grammar/expand_repeats.cc b/src/compiler/prepare_grammar/expand_repeats.cc
index 46230867..42878376 100644
--- a/src/compiler/prepare_grammar/expand_repeats.cc
+++ b/src/compiler/prepare_grammar/expand_repeats.cc
@@ -106,6 +106,7 @@ InitialSyntaxGrammar expand_repeats(const InitialSyntaxGrammar &grammar) {
expander.aux_rules.end()
);
+ result.word_rule = grammar.word_rule;
return result;
}
diff --git a/src/compiler/prepare_grammar/extract_tokens.cc b/src/compiler/prepare_grammar/extract_tokens.cc
index 93b06be2..c82b3505 100644
--- a/src/compiler/prepare_grammar/extract_tokens.cc
+++ b/src/compiler/prepare_grammar/extract_tokens.cc
@@ -329,6 +329,18 @@ tuple extract_tokens(
}
}
+ syntax_grammar.word_rule = symbol_replacer.replace_symbol(grammar.word_rule);
+ if (syntax_grammar.word_rule.is_non_terminal()) {
+ return make_tuple(
+ syntax_grammar,
+ lexical_grammar,
+ CompileError(
+ TSCompileErrorTypeInvalidWordRule,
+ "Word rules must be tokens"
+ )
+ );
+ }
+
return make_tuple(syntax_grammar, lexical_grammar, CompileError::none());
}
diff --git a/src/compiler/prepare_grammar/flatten_grammar.cc b/src/compiler/prepare_grammar/flatten_grammar.cc
index e135ee67..ebfc3ae4 100644
--- a/src/compiler/prepare_grammar/flatten_grammar.cc
+++ b/src/compiler/prepare_grammar/flatten_grammar.cc
@@ -161,6 +161,8 @@ pair flatten_grammar(const InitialSyntaxGrammar &gr
i++;
}
+ result.word_rule = grammar.word_rule;
+
return {result, CompileError::none()};
}
diff --git a/src/compiler/prepare_grammar/initial_syntax_grammar.h b/src/compiler/prepare_grammar/initial_syntax_grammar.h
index 881c6396..4f21e3cd 100644
--- a/src/compiler/prepare_grammar/initial_syntax_grammar.h
+++ b/src/compiler/prepare_grammar/initial_syntax_grammar.h
@@ -17,6 +17,7 @@ struct InitialSyntaxGrammar {
std::set> expected_conflicts;
std::vector external_tokens;
std::set variables_to_inline;
+ rules::Symbol word_rule;
};
} // namespace prepare_grammar
diff --git a/src/compiler/prepare_grammar/intern_symbols.cc b/src/compiler/prepare_grammar/intern_symbols.cc
index 4e610960..dc128779 100644
--- a/src/compiler/prepare_grammar/intern_symbols.cc
+++ b/src/compiler/prepare_grammar/intern_symbols.cc
@@ -166,6 +166,8 @@ pair intern_symbols(const InputGrammar &grammar)
}
}
+ result.word_rule = interner.intern_symbol(grammar.word_rule);
+
return {result, CompileError::none()};
}
diff --git a/src/compiler/prepare_grammar/interned_grammar.h b/src/compiler/prepare_grammar/interned_grammar.h
index 83117ced..fc322522 100644
--- a/src/compiler/prepare_grammar/interned_grammar.h
+++ b/src/compiler/prepare_grammar/interned_grammar.h
@@ -15,8 +15,8 @@ struct InternedGrammar {
std::vector extra_tokens;
std::set> expected_conflicts;
std::vector external_tokens;
- std::set blank_external_tokens;
std::set variables_to_inline;
+ rules::Symbol word_rule;
};
} // namespace prepare_grammar
diff --git a/src/compiler/syntax_grammar.h b/src/compiler/syntax_grammar.h
index 2d55686b..7d2b1be1 100644
--- a/src/compiler/syntax_grammar.h
+++ b/src/compiler/syntax_grammar.h
@@ -60,6 +60,7 @@ struct SyntaxGrammar {
std::set> expected_conflicts;
std::vector external_tokens;
std::set variables_to_inline;
+ rules::Symbol word_rule;
};
} // namespace tree_sitter
diff --git a/src/runtime/array.h b/src/runtime/array.h
index 45b3adaa..b32487a2 100644
--- a/src/runtime/array.h
+++ b/src/runtime/array.h
@@ -110,7 +110,7 @@ static inline void array__grow(VoidArray *self, size_t element_size) {
static inline void array__splice(VoidArray *self, size_t element_size,
uint32_t index, uint32_t old_count,
- uint32_t new_count, void *elements) {
+ uint32_t new_count, const void *elements) {
uint32_t new_size = self->size + new_count - old_count;
uint32_t old_end = index + old_count;
uint32_t new_end = index + new_count;
diff --git a/src/runtime/node.c b/src/runtime/node.c
index 607cf9de..0855ec66 100644
--- a/src/runtime/node.c
+++ b/src/runtime/node.c
@@ -28,11 +28,11 @@ static inline TSNode ts_node__null() {
// TSNode - accessors
-uint32_t ts_node_start_byte(const TSNode self) {
+uint32_t ts_node_start_byte(TSNode self) {
return self.context[0];
}
-TSPoint ts_node_start_point(const TSNode self) {
+TSPoint ts_node_start_point(TSNode self) {
return (TSPoint) {self.context[1], self.context[2]};
}
diff --git a/src/runtime/subtree.c b/src/runtime/subtree.c
index 9b2d954f..83a1ad4e 100644
--- a/src/runtime/subtree.c
+++ b/src/runtime/subtree.c
@@ -59,7 +59,7 @@ bool ts_external_scanner_state_eq(const ExternalScannerState *a, const ExternalS
// SubtreeArray
bool ts_subtree_array_copy(SubtreeArray self, SubtreeArray *dest) {
- const Subtree **contents = NULL;
+ Subtree **contents = NULL;
if (self.capacity > 0) {
contents = ts_calloc(self.capacity, sizeof(Subtree *));
memcpy(contents, self.contents, self.size * sizeof(Subtree *));
diff --git a/test/compiler/build_tables/parse_item_set_builder_test.cc b/test/compiler/build_tables/parse_item_set_builder_test.cc
index 6cf5bb0e..6c41c3ca 100644
--- a/test/compiler/build_tables/parse_item_set_builder_test.cc
+++ b/test/compiler/build_tables/parse_item_set_builder_test.cc
@@ -25,7 +25,8 @@ describe("ParseItemSetBuilder", []() {
LexicalGrammar lexical_grammar{lexical_variables, {}};
it("adds items at the beginnings of referenced rules", [&]() {
- SyntaxGrammar grammar{{
+ SyntaxGrammar grammar;
+ grammar.variables = {
SyntaxVariable{"rule0", VariableTypeNamed, {
Production({
{Symbol::non_terminal(1), 0, AssociativityNone, Alias{}},
@@ -47,7 +48,7 @@ describe("ParseItemSetBuilder", []() {
{Symbol::terminal(15), 0, AssociativityNone, Alias{}},
}, 0)
}},
- }, {}, {}, {}, {}};
+ };
auto production = [&](int variable_index, int production_index) -> const Production & {
return grammar.variables[variable_index].productions[production_index];
@@ -84,7 +85,8 @@ describe("ParseItemSetBuilder", []() {
});
it("handles rules with empty productions", [&]() {
- SyntaxGrammar grammar{{
+ SyntaxGrammar grammar;
+ grammar.variables = {
SyntaxVariable{"rule0", VariableTypeNamed, {
Production({
{Symbol::non_terminal(1), 0, AssociativityNone, Alias{}},
@@ -98,7 +100,7 @@ describe("ParseItemSetBuilder", []() {
}, 0),
Production{{}, 0}
}},
- }, {}, {}, {}, {}};
+ };
auto production = [&](int variable_index, int production_index) -> const Production & {
return grammar.variables[variable_index].productions[production_index];
diff --git a/test/compiler/prepare_grammar/expand_repeats_test.cc b/test/compiler/prepare_grammar/expand_repeats_test.cc
index 250bd59b..f7aaa8fe 100644
--- a/test/compiler/prepare_grammar/expand_repeats_test.cc
+++ b/test/compiler/prepare_grammar/expand_repeats_test.cc
@@ -11,11 +11,9 @@ START_TEST
describe("expand_repeats", []() {
it("replaces repeat rules with pairs of recursive rules", [&]() {
- InitialSyntaxGrammar grammar{
- {
- Variable{"rule0", VariableTypeNamed, Repeat{Symbol::terminal(0)}},
- },
- {}, {}, {}, {}
+ InitialSyntaxGrammar grammar;
+ grammar.variables = {
+ Variable{"rule0", VariableTypeNamed, Repeat{Symbol::terminal(0)}},
};
auto result = expand_repeats(grammar);
@@ -30,14 +28,12 @@ describe("expand_repeats", []() {
});
it("replaces repeats inside of sequences", [&]() {
- InitialSyntaxGrammar grammar{
- {
- Variable{"rule0", VariableTypeNamed, Rule::seq({
- Symbol::terminal(10),
- Repeat{Symbol::terminal(11)},
- })},
- },
- {}, {}, {}, {}
+ InitialSyntaxGrammar grammar;
+ grammar.variables = {
+ Variable{"rule0", VariableTypeNamed, Rule::seq({
+ Symbol::terminal(10),
+ Repeat{Symbol::terminal(11)},
+ })},
};
auto result = expand_repeats(grammar);
@@ -55,14 +51,12 @@ describe("expand_repeats", []() {
});
it("replaces repeats inside of choices", [&]() {
- InitialSyntaxGrammar grammar{
- {
- Variable{"rule0", VariableTypeNamed, Rule::choice({
- Symbol::terminal(10),
- Repeat{Symbol::terminal(11)}
- })},
- },
- {}, {}, {}, {}
+ InitialSyntaxGrammar grammar;
+ grammar.variables = {
+ Variable{"rule0", VariableTypeNamed, Rule::choice({
+ Symbol::terminal(10),
+ Repeat{Symbol::terminal(11)}
+ })},
};
auto result = expand_repeats(grammar);
@@ -80,18 +74,16 @@ describe("expand_repeats", []() {
});
it("does not create redundant auxiliary rules", [&]() {
- InitialSyntaxGrammar grammar{
- {
- Variable{"rule0", VariableTypeNamed, Rule::choice({
- Rule::seq({ Symbol::terminal(1), Repeat{Symbol::terminal(4)} }),
- Rule::seq({ Symbol::terminal(2), Repeat{Symbol::terminal(4)} }),
- })},
- Variable{"rule1", VariableTypeNamed, Rule::seq({
- Symbol::terminal(3),
- Repeat{Symbol::terminal(4)}
- })},
- },
- {}, {}, {}, {}
+ InitialSyntaxGrammar grammar;
+ grammar.variables = {
+ Variable{"rule0", VariableTypeNamed, Rule::choice({
+ Rule::seq({ Symbol::terminal(1), Repeat{Symbol::terminal(4)} }),
+ Rule::seq({ Symbol::terminal(2), Repeat{Symbol::terminal(4)} }),
+ })},
+ Variable{"rule1", VariableTypeNamed, Rule::seq({
+ Symbol::terminal(3),
+ Repeat{Symbol::terminal(4)}
+ })},
};
auto result = expand_repeats(grammar);
@@ -113,14 +105,12 @@ describe("expand_repeats", []() {
});
it("can replace multiple repeats in the same rule", [&]() {
- InitialSyntaxGrammar grammar{
- {
- Variable{"rule0", VariableTypeNamed, Rule::seq({
- Repeat{Symbol::terminal(10)},
- Repeat{Symbol::terminal(11)},
- })},
- },
- {}, {}, {}, {}
+ InitialSyntaxGrammar grammar;
+ grammar.variables = {
+ Variable{"rule0", VariableTypeNamed, Rule::seq({
+ Repeat{Symbol::terminal(10)},
+ Repeat{Symbol::terminal(11)},
+ })},
};
auto result = expand_repeats(grammar);
@@ -142,12 +132,10 @@ describe("expand_repeats", []() {
});
it("can replace repeats in multiple rules", [&]() {
- InitialSyntaxGrammar grammar{
- {
- Variable{"rule0", VariableTypeNamed, Repeat{Symbol::terminal(10)}},
- Variable{"rule1", VariableTypeNamed, Repeat{Symbol::terminal(11)}},
- },
- {}, {}, {}, {}
+ InitialSyntaxGrammar grammar;
+ grammar.variables = {
+ Variable{"rule0", VariableTypeNamed, Repeat{Symbol::terminal(10)}},
+ Variable{"rule1", VariableTypeNamed, Repeat{Symbol::terminal(11)}},
};
auto result = expand_repeats(grammar);
diff --git a/test/compiler/prepare_grammar/intern_symbols_test.cc b/test/compiler/prepare_grammar/intern_symbols_test.cc
index 65bad45e..7b7f3624 100644
--- a/test/compiler/prepare_grammar/intern_symbols_test.cc
+++ b/test/compiler/prepare_grammar/intern_symbols_test.cc
@@ -11,13 +11,11 @@ using prepare_grammar::intern_symbols;
describe("intern_symbols", []() {
it("replaces named symbols with numerically-indexed symbols", [&]() {
- InputGrammar grammar{
- {
- {"x", VariableTypeNamed, Rule::choice({ NamedSymbol{"y"}, NamedSymbol{"_z"} })},
- {"y", VariableTypeNamed, NamedSymbol{"_z"}},
- {"_z", VariableTypeNamed, String{"stuff"}}
- },
- {}, {}, {}, {}
+ InputGrammar grammar;
+ grammar.variables = {
+ {"x", VariableTypeNamed, Rule::choice({ NamedSymbol{"y"}, NamedSymbol{"_z"} })},
+ {"y", VariableTypeNamed, NamedSymbol{"_z"}},
+ {"_z", VariableTypeNamed, String{"stuff"}}
};
auto result = intern_symbols(grammar);
@@ -32,11 +30,9 @@ describe("intern_symbols", []() {
describe("when there are symbols that reference undefined rules", [&]() {
it("returns an error", []() {
- InputGrammar grammar{
- {
- {"x", VariableTypeNamed, NamedSymbol{"y"}},
- },
- {}, {}, {}, {}
+ InputGrammar grammar;
+ grammar.variables = {
+ {"x", VariableTypeNamed, NamedSymbol{"y"}},
};
auto result = intern_symbols(grammar);
@@ -46,16 +42,14 @@ describe("intern_symbols", []() {
});
it("translates the grammar's optional 'extra_tokens' to numerical symbols", [&]() {
- InputGrammar grammar{
- {
- {"x", VariableTypeNamed, Rule::choice({ NamedSymbol{"y"}, NamedSymbol{"z"} })},
- {"y", VariableTypeNamed, NamedSymbol{"z"}},
- {"z", VariableTypeNamed, String{"stuff"}}
- },
- {
- NamedSymbol{"z"}
- },
- {}, {}, {}
+ InputGrammar grammar;
+ grammar.variables = {
+ {"x", VariableTypeNamed, Rule::choice({ NamedSymbol{"y"}, NamedSymbol{"z"} })},
+ {"y", VariableTypeNamed, NamedSymbol{"z"}},
+ {"z", VariableTypeNamed, String{"stuff"}}
+ };
+ grammar.extra_tokens = {
+ NamedSymbol{"z"}
};
auto result = intern_symbols(grammar);
@@ -66,19 +60,15 @@ describe("intern_symbols", []() {
});
it("records any rule names that match external token names", [&]() {
- InputGrammar grammar{
- {
- {"x", VariableTypeNamed, Rule::choice({ NamedSymbol{"y"}, NamedSymbol{"z"} })},
- {"y", VariableTypeNamed, NamedSymbol{"z"}},
- {"z", VariableTypeNamed, String{"stuff"}},
- },
- {},
- {},
- {
- NamedSymbol{"w"},
- NamedSymbol{"z"},
- },
- {}
+ InputGrammar grammar;
+ grammar.variables = {
+ {"x", VariableTypeNamed, Rule::choice({ NamedSymbol{"y"}, NamedSymbol{"z"} })},
+ {"y", VariableTypeNamed, NamedSymbol{"z"}},
+ {"z", VariableTypeNamed, String{"stuff"}},
+ };
+ grammar.external_tokens = {
+ NamedSymbol{"w"},
+ NamedSymbol{"z"},
};
auto result = intern_symbols(grammar);