Merge pull request #91 from tree-sitter/libFuzzer

Add support for fuzzing with libFuzzer
Max Brunsfeld 2017-07-17 11:43:01 -07:00 committed by GitHub
commit 34279257f9
9 changed files with 208 additions and 0 deletions

43
test/fuzz/README.md Normal file

@ -0,0 +1,43 @@
# Fuzzing tree-sitter
Fuzzing tree-sitter requires two things: 1) the `libFuzzer` runtime library and 2) a recent version of `clang`.
## libFuzzer
The main fuzzing logic is implemented by `libFuzzer`, which is part of the LLVM project but is not shipped by distributions. It needs to be built from source, but this does not require building the _whole_ LLVM project. The LLVM sources can be downloaded from llvm.org using SVN, or from [llvm-mirror](https://github.com/llvm-mirror/llvm) using git. For example, `libFuzzer` can be built with:
```
cd ~/src
git clone https://github.com/llvm-mirror/llvm
cd llvm/lib/Fuzzer
./build.sh
```
## clang
`libFuzzer` requires a reasonably new version of `clang`, so your system-installed version will probably _not_ work. The easiest way to get started is to use the version provided by the Chromium team. Instructions are available at [libFuzzer.info](http://libfuzzer.info).
The fuzzers can then be built with:
```
export CLANG_DIR=$HOME/src/third_party/llvm-build/Release+Asserts/bin
CC="$CLANG_DIR/clang" CXX="$CLANG_DIR/clang++" LINK="$CLANG_DIR/clang++" \
LIB_FUZZER_PATH=$HOME/src/llvm/lib/Fuzzer/libFuzzer.a \
./script/build_fuzzers
```
This will generate a separate fuzzer for each grammar defined in `test/fixtures/grammars`. Each fuzzer is instrumented with [AddressSanitizer](https://clang.llvm.org/docs/AddressSanitizer.html) and [UndefinedBehaviorSanitizer](https://clang.llvm.org/docs/UndefinedBehaviorSanitizer.html). Individual fuzzers can be built by naming their grammars, for example `./script/build_fuzzers python ruby`.
The `run-fuzzer` script handles running an individual fuzzer with a sensible default set of arguments:
```
./script/run-fuzzer <grammar-name> <extra libFuzzer arguments...>
```
which will log information to stdout. Failing testcases and a fuzz corpus will be saved to `fuzz-results/<grammar-name>`. The most important extra `libFuzzer` options are `-jobs` and `-workers`, which allow parallel fuzzing. This can be done with, e.g.:
```
./script/run-fuzzer <grammar-name> -jobs=32 -workers=32
```
A saved failing testcase can be used to reproduce the crash by running:
```
./script/reproduce <grammar-name> <path-to-testcase>
```
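When several failing testcases have accumulated, it can be convenient to list one `reproduce` command per testcase. The following is a minimal sketch, not part of this change. It assumes that failing testcases keep libFuzzer's default artifact name prefixes (`crash-`, `leak-`, `timeout-`, `oom-`) and that they sit somewhere under `fuzz-results/<grammar-name>`; the exact layout written by `run-fuzzer` is an assumption, and `reproduce_commands` is a hypothetical helper name.

```python
import os

# Assumption: failing testcases keep libFuzzer's default artifact prefixes.
ARTIFACT_PREFIXES = ('crash-', 'leak-', 'timeout-', 'oom-')


def reproduce_commands(grammar, results_root='fuzz-results'):
    """Return one ./script/reproduce command per saved failing testcase."""
    results_dir = os.path.join(results_root, grammar)
    commands = []
    # os.walk yields nothing if the directory does not exist.
    for dirpath, _dirnames, filenames in os.walk(results_dir):
        for name in filenames:
            if name.startswith(ARTIFACT_PREFIXES):
                path = os.path.join(dirpath, name)
                commands.append('./script/reproduce %s %s' % (grammar, path))
    # Sort so the listing is stable between runs.
    return sorted(commands)


# Example: list the commands for the python grammar (prints nothing if
# no failing testcases have been saved yet).
for command in reproduce_commands('python'):
    print(command)
```

Corpus entries that do not carry one of the prefixes are skipped, so only inputs that libFuzzer reported as failures are listed.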

27
test/fuzz/fuzzer.cc Normal file

@ -0,0 +1,27 @@
#include <stdint.h>
#include <string.h>
#include "tree_sitter/runtime.h"

// No-op logger callback: parser log messages are discarded.
void test_log(void *payload, TSLogType type, const char *string) { }

TSLogger logger = {
  .log = test_log,
};

// TSLANG is not defined in this file; it is presumably supplied by the build
// as the name of the grammar's language function.
extern "C" const TSLanguage *TSLANG();

// libFuzzer entry point: parse the fuzzer-provided bytes as a document.
extern "C" int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size) {
  const char *str = reinterpret_cast<const char *>(data);

  TSDocument *document = ts_document_new();
  ts_document_set_language(document, TSLANG());
  ts_document_set_input_string_with_length(document, str, size);

  // Do not halt at the first syntax error.
  TSParseOptions options = {};
  options.halt_on_error = false;
  ts_document_parse_with_options(document, options);

  TSNode root_node = ts_document_root_node(document);
  ts_document_free(document);

  return 0;
}

31
test/fuzz/gen-dict.py Normal file

@ -0,0 +1,31 @@
import json
import sys


def find_literals(literals, node):
    '''Recursively find STRING literals in the grammar definition'''
    if type(node) is dict:
        if 'type' in node and node['type'] == 'STRING' and 'value' in node:
            literals.add(node['value'])

        # items() works on both Python 2 and Python 3
        for key, value in node.items():
            find_literals(literals, value)

    elif type(node) is list:
        for item in node:
            find_literals(literals, item)


def main():
    '''Generate a libFuzzer / AFL dictionary from a tree-sitter grammar.json'''
    with open(sys.argv[1]) as f:
        grammar = json.load(f)

    literals = set()
    find_literals(literals, grammar)

    for lit in sorted(literals):
        if lit:
            # Keep alphanumerics as-is and hex-escape every other character
            print('"%s"' % ''.join([(c if c.isalnum() else '\\x%02x' % ord(c)) for c in lit]))


if __name__ == '__main__':
    main()
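
To illustrate what the script above produces, here is a condensed, self-contained restatement of the same traversal applied to a small in-memory grammar. The grammar fragment is hypothetical and not taken from any real `grammar.json`; `escape` and `dictionary_lines` are helper names introduced only for this sketch.

```python
import json


def find_literals(literals, node):
    """Recursively collect the values of STRING nodes."""
    if isinstance(node, dict):
        if node.get('type') == 'STRING' and 'value' in node:
            literals.add(node['value'])
        for value in node.values():
            find_literals(literals, value)
    elif isinstance(node, list):
        for item in node:
            find_literals(literals, item)


def escape(lit):
    """Keep alphanumerics as-is and hex-escape every other character."""
    return ''.join(c if c.isalnum() else '\\x%02x' % ord(c) for c in lit)


def dictionary_lines(grammar):
    """Return the quoted dictionary entries, sorted, skipping empty literals."""
    literals = set()
    find_literals(literals, grammar)
    return ['"%s"' % escape(lit) for lit in sorted(literals) if lit]


# Hypothetical grammar fragment, for illustration only.
GRAMMAR = json.loads('''
{
  "name": "example",
  "rules": {
    "statement": {
      "type": "SEQ",
      "members": [
        {"type": "STRING", "value": "return"},
        {"type": "SYMBOL", "name": "expression"},
        {"type": "STRING", "value": ";"}
      ]
    },
    "operator": {
      "type": "CHOICE",
      "members": [
        {"type": "STRING", "value": "=="},
        {"type": "STRING", "value": ""}
      ]
    }
  }
}
''')

for line in dictionary_lines(GRAMMAR):
    print(line)
```

For this input the sketch prints `"\x3b"`, `"\x3d\x3d"` and `"return"`, one per line: punctuation is hex-escaped, the empty literal is dropped, and the `SYMBOL` node is ignored. A file of such entries can be passed to libFuzzer through its `-dict=` option.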