Add libFuzzer support

This adds support for fuzzing tree-sitter grammars with libFuzzer. This
currently only works on Linux because of linking issues on macOS. Breifly, the
AddressSanitizer library is dynamically linked into the fuzzer binary and
cannot be found at runtime if built with a compiler that wasn't provided by
Xcode(?). The runtime library is statically linked on Linux so this isn't a
problem.
This commit is contained in:
Phil Turnbull 2017-07-14 10:42:01 -07:00
parent 69500c9dd7
commit 798ef5e4dc
8 changed files with 205 additions and 0 deletions

43
test/fuzz/README.md Normal file
View file

@ -0,0 +1,43 @@
# Fuzzing tree-sitter
The tree-sitter fuzzing support requires 1) the `libFuzzer` runtime library and 2) a recent version of clang
## libFuzzer
The main fuzzing logic is implemented by `libFuzzer` which is part of the LLVM project but is not shipped by distros. It will need to be built from source but does not require building the _whole_ LLVM project. LLVM can be downloaded from llvm.org using SVN or [llvm-mirror](https://github.com/llvm-mirror/llvm) using git. `libFuzzer` can be built as, e.g.:
```
cd ~/src
git clone https://github.com/llvm-mirror/llvm
cd llvm/lib/Fuzzer
./build.sh
```
## clang
Using libFuzzer requires a reasonably new version of `clang` and will probably _not_ work with your system-installed version. The easiest way to get started is to use the version provided by the Chromium team. Instructions are available at [libFuzzer.info](http://libfuzzer.info).
The fuzzers can then be built with:
```
export CLANG_DIR=$HOME/src/third_party/llvm-build/Release+Asserts/bin
CC="$CLANG_DIR/clang" CXX="$CLANG_DIR/clang++" LINK="$CLANG_DIR/clang++" \
LIB_FUZZER_PATH=$HOME/src/llvm/lib/Fuzzer/libFuzzer.a \
./script/build_fuzzers
```
This will generate a separate fuzzer for each grammar defined in `test/fixtures/grammars` and will be instrumented with [AddressSanitizer](https://clang.llvm.org/docs/AddressSanitizer.html) and [UndefinedBehaviorSanitizer](https://clang.llvm.org/docs/UndefinedBehaviorSanitizer.html). Individual fuzzers can be built with, for example, `./script/build_fuzzers python ruby`.
The `run-fuzzer` script handles running an individual fuzzer with a sensible default set of arguments:
```
./script/run-fuzzer <grammar-name> <extra libFuzzer arguments...>
```
which will log information to stdout. Failing testcases and a fuzz corpus will be saved to `fuzz-results/<grammar-name>`. The most important extra `libFuzzer` options are `-jobs` and `-workers` which allow parallel fuzzing. This is can done with, e.g.:
```
./script/run-fuzzer <grammer-name> -jobs=32 -workers=32
```
The testcase can be used to reproduce the crash by running:
```
./script/reproduce <grammar-name> <path-to-testcase>
```

27
test/fuzz/fuzzer.cc Normal file
View file

@ -0,0 +1,27 @@
#include <string.h>
#include "tree_sitter/runtime.h"
void test_log(void *payload, TSLogType type, const char *string) { }
TSLogger logger = {
.log = test_log,
};
extern "C" const TSLanguage *TSLANG();
extern "C" int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size) {
const char *str = reinterpret_cast<const char *>(data);
TSDocument *document = ts_document_new();
ts_document_set_language(document, TSLANG());
ts_document_set_input_string_with_length(document, str, size);
TSParseOptions options = {};
options.halt_on_error = false;
ts_document_parse_with_options(document, options);
TSNode root_node = ts_document_root_node(document);
ts_document_free(document);
return 0;
}

31
test/fuzz/gen-dict.py Normal file
View file

@ -0,0 +1,31 @@
import json
import sys
def find_literals(literals, node):
'''Recursively find STRING literals in the grammar definition'''
if type(node) is dict:
if 'type' in node and node['type'] == 'STRING' and 'value' in node:
literals.add(node['value'])
for key, value in node.iteritems():
find_literals(literals, value)
elif type(node) is list:
for item in node:
find_literals(literals, item)
def main():
'''Generate a libFuzzer / AFL dictionary from a tree-sitter grammar.json'''
with open(sys.argv[1]) as f:
grammar = json.load(f)
literals = set()
find_literals(literals, grammar)
for lit in sorted(literals):
if lit:
print '"%s"' % ''.join([(c if c.isalnum() else '\\x%02x' % ord(c)) for c in lit])
if __name__ == '__main__':
main()