Fuzzer Building
Fuzzers in Grammarinator are constructed in two main stages:
1. A generator class is created from an ANTLRv4 grammar.
2. In the case of the C++ backend, the generated source code is compiled into standalone executables or libraries.
The following sections describe each step in detail.
Generator Creation from ANTLR Grammar
In both Python and C++ backends, the first step is to convert the ANTLR grammar into a generator class. This generator encapsulates the logic for producing derivation trees from the grammar rules.
In the Python backend, this generator is an instance of
Generator, which will be utilized in the
subsequent step where test cases are produced using the
grammarinator-generate command.
In the C++ backend, a corresponding Generator class is generated as a
header-only C++ file (e.g., HTMLGenerator.hpp). While this class does not
yet have dedicated API documentation, it mirrors the structure and behavior of
the Python generator and is used by the compiled fuzzing tools
(e.g., the grammarinator-generate-html binary and
libFuzzer integration or
AFL++ integration).
The generator class – whether Python or C++ – is automatically produced by
the grammarinator-process command line utility. This tool loads and
interprets the input grammar, generating a corresponding generator written in
the target language. The output generator consists of a class definition
named <grammarName>Generator after the grammar, with one method for each rule
defined in the grammar.
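The naming convention can be illustrated with a small Python sketch. The helper generator_class_name is hypothetical and not part of Grammarinator's API. It also assumes that a trailing Lexer or Parser in the grammar name is dropped before Generator is appended, which is consistent with the HTMLGenerator.hpp example but is not stated explicitly here.

```python
import re


def generator_class_name(grammar_name: str) -> str:
    """Hypothetical helper: map a grammar name to <grammarName>Generator.

    Assumption: a trailing 'Lexer' or 'Parser' is stripped first, so that
    split grammars (e.g. HTMLLexer / HTMLParser) share one generator name.
    """
    base = re.sub(r'(Lexer|Parser)$', '', grammar_name)
    return f'{base}Generator'


for name in ('HTMLParser', 'HTMLLexer', 'JSON'):
    print(name, '->', generator_class_name(name))
```

The authoritative name is whatever class grammarinator-process writes to the output directory, so checking the generated file is the safest way to confirm it.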
- The CLI of grammarinator-process
usage: python -m grammarinator.process [-h] [-g [FILE ...]] [-D OPT=VAL]
[--language LANG] [--no-actions]
[--rule NAME] [--lib DIR] [--pep8]
[-o DIR] [--encoding NAME]
[--encoding-errors NAME]
[--log-level LEVEL] [-v] [-q]
[--version]
[FILE ...]
Grammarinator: Processor
positional arguments:
FILE ANTLR grammar files describing the expected format to
generate (alias for --grammar).
options:
-h, --help show this help message and exit
-g, --grammar [FILE ...]
ANTLR grammar files describing the expected format to
generate.
-D OPT=VAL set/override grammar-level option
--language LANG language of the generated code (choices: py, hpp;
default: py)
--no-actions do not process inline actions.
--rule, -r NAME default rule to start generation from (default: the
first parser rule)
--lib DIR alternative location of import grammars.
--pep8 enable autopep8 to format the generated fuzzer (only
if --language=py).
-o, --out DIR temporary working directory (default: the current working
directory).
--encoding NAME grammar file encoding (default: utf-8).
--encoding-errors NAME
encoding error handling scheme (default: strict).
--log-level LEVEL verbosity level of diagnostic messages (TRACE, DEBUG,
INFO, WARNING, ERROR, CRITICAL, DISABLE; default:
INFO)
-v, --verbose verbose mode (alias for --log-level DEBUG)
-q, --quiet quiet mode (alias for --log-level DISABLE)
--version show program's version number and exit
The tool processes a grammar in ANTLR v4 format (*.g4, either split into
separate lexer and parser grammar files, or a single combined grammar) and creates a
fuzzer that can generate randomized content conforming to the format described
by the grammar.
The usage of grammarinator-process is straightforward: it reads the
specified grammars using the encoding given by --encoding (default: utf-8)
and generates the output in the directory specified by --out (default is the
current working directory). The generated code is written in the programming
language specified by --language (currently, the available options are
py for Python and hpp for C++).
If the grammar contains parser-specific or unnecessary inline actions, they
can be ignored by using the --no-actions option. The grammars can also
include an options section, where values can be extended or
overridden from the command-line interface (CLI) using the -D OPT=VAL
argument.
If a grammar imports other grammars from a different directory, the directory
path needs to be defined using the --lib argument.
Additionally, the generated Python code can be automatically formatted to follow the
PEP8 style recommendations by using the --pep8 option (only effective with --language=py).
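The options described above can be combined as in the following Python sketch, which only assembles and prints a command line rather than running it. The function process_command, the grammar file names HTMLLexer.g4 and HTMLParser.g4, and the output directory out are illustrative assumptions. The flags themselves are taken from the CLI help shown earlier.

```python
import shlex


def process_command(grammars, language='py', out_dir=None, no_actions=False,
                    lib_dir=None, pep8=False, options=None):
    """Assemble an argument list for grammarinator-process (illustrative)."""
    if language not in ('py', 'hpp'):
        raise ValueError('language must be "py" or "hpp"')
    argv = ['grammarinator-process', *grammars, '--language', language]
    if out_dir:
        argv += ['-o', out_dir]
    if no_actions:
        argv.append('--no-actions')
    if lib_dir:
        argv += ['--lib', lib_dir]
    # --pep8 only applies to the Python backend, so drop it otherwise.
    if pep8 and language == 'py':
        argv.append('--pep8')
    # Grammar-level options are passed as repeated -D OPT=VAL pairs.
    for opt, val in (options or {}).items():
        argv += ['-D', f'{opt}={val}']
    return argv


# Hypothetical inputs: a split HTML grammar processed for the C++ backend.
cmd = process_command(['HTMLLexer.g4', 'HTMLParser.g4'],
                      language='hpp', out_dir='out', no_actions=True)
print(shlex.join(cmd))
```

Passing the resulting list to subprocess.run would execute the tool, provided Grammarinator is installed and the grammar files exist.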
Compilation in C++
Once the generator header has been created from the grammar, the next step in
the C++ workflow is to compile it into a usable binary or library. This is
done using the build.py utility script, which is located in
grammarinator-cxx/dev/.
- The CLI of grammarinator-cxx/dev/build.py
usage: build.py [-h] [--builddir DIR] [--clean] [--build-type TYPE] [--debug]
[--verbose] [--install [DIR]] [--log-level LEVEL] [--generate]
[--decode] [--fuzznull] [--grlf] [--grafl]
[--afl-includedir DIR] [--generator NAME] [--model NAME]
[--listener NAME] [--transformer NAME] [--serializer NAME]
[--tree-format NAME] [--include FILE] [--includedir DIR]
[--suffix NAME]
options:
-h, --help show this help message and exit
general build options:
--builddir DIR directory for the build files (default: the build
directory under grammarinator-cxx)
--clean create a clean build (default: False)
--build-type TYPE set build type (default: Release)
--debug debug build (alias for --build-type Debug)
--verbose build target in verbose mode (default: False)
--install [DIR] install after build (default: don't install; default
directory if install: OS-specific)
--log-level LEVEL set logging verbosity (default: error)
specialization options:
--generate build a standalone blackbox generator tool for the
given grammar (default: False)
--decode build a standalone decoder tool for the given grammar
(default: False)
--fuzznull build a dummy fuzznull binary to test libFuzzer
integration without a real fuzz target (default:
False)
--grlf build a static libgrlf library for libFuzzer
integration (default: False)
--grafl build a shared libgrafl library for AFL++ integration
(default: False)
--afl-includedir DIR AFL include directory (mandatory if --grafl is
specified)
--generator NAME name of the generator class
--model NAME name of the model class (default:
grammarinator::runtime::DefaultModel)
--listener NAME name of the listener class (default:
grammarinator::runtime::Listener)
--transformer NAME name of the transformer function (default: nullptr,
signaling no transformer)
--serializer NAME name of the serializer function (default:
grammarinator::runtime::NoSpaceSerializer)
--tree-format NAME format of the saved trees (choices: json, flatbuffers;
default: flatbuffers)
--include FILE file to include when compiling the specialized
artefacts (default: derived from the generator class
name by appending .hpp)
--includedir DIR directory to append to the include path, usually which
contains the file produced by grammarinator-process
(may be specified multiple times)
--suffix NAME suffix of the specialized artefacts, possibly
referring to the input format (default: derived from
the generator class name by removing Generator and
lowercasing)
The build.py script takes the configuration of the desired generator and
optional components and compiles standalone binaries or libraries from them.
Component Configuration
The following command-line arguments define the components that should be built into the resulting binary:
- --generator (required): Fully qualified name of the generator class, e.g., HTMLGenerator.
- --model, --listener, --transformer, --serializer (optional): Fully qualified names of additional components to be compiled into the binary (e.g., grammarinator::runtime::NoSpaceSerializer).
- --includedir (required): Directory that contains the source headers of the specified components. The option may be specified multiple times, but placing all related source files (generator, serializer, etc.) in the same folder is recommended.
If any component beyond the generator is customized, an additional argument is needed:
- --include: A header file (e.g., HTMLConfig.hpp) that explicitly includes all component headers. It can be omitted when only the generator is specified, in which case it defaults to the generator class name with .hpp appended. The file must reside in a directory specified by --includedir.
For example, if using a custom serializer, your config file
(HTMLConfig.hpp) might look like:
#include "HTMLGenerator.hpp"
#include "HTMLSpaceSerializer.hpp"
Binary Output
Depending on the build flags, the following outputs may be generated:
- With --generate: grammarinator-generate-<name>, a standalone blackbox generator
- With --grlf: libgrlf-<name>.a, a static library to define LLVMFuzzerCustomMutator or LLVMFuzzerCustomCrossover (useful for libFuzzer integration)
- With --grafl: libgrafl-<name>.so, a shared library to define various hooks for AFL++ integration
- With --fuzznull: fuzznull-<name>, a dummy libFuzzer binary for integration testing
- With --decode: grammarinator-decode-<name>, a standalone tool to convert tests from tree to source format with the chosen serializer
All outputs are written to the build/<Release|Debug>/bin and
build/<Release|Debug>/lib directories.
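The derived defaults and output names can be summarized in a short Python sketch. The function derived_defaults is hypothetical and not part of build.py. It assumes a simple, non-namespaced generator class name, that <name> in the artifact names equals the --suffix value, and that executables land in bin while libraries land in lib.

```python
def derived_defaults(generator: str, build_type: str = 'Release') -> dict:
    """Sketch of the defaults and artifact names described for build.py."""
    tail = 'Generator'
    if not generator.endswith(tail) or generator == tail:
        raise ValueError('expected a class name of the form <Name>Generator')
    # --suffix default: remove "Generator" and lowercase the rest.
    suffix = generator[:-len(tail)].lower()
    # Assumed layout: executables under bin, libraries under lib.
    bin_dir = f'build/{build_type}/bin'
    lib_dir = f'build/{build_type}/lib'
    return {
        'include': f'{generator}.hpp',  # --include default
        'suffix': suffix,
        'artifacts': {
            '--generate': f'{bin_dir}/grammarinator-generate-{suffix}',
            '--decode': f'{bin_dir}/grammarinator-decode-{suffix}',
            '--fuzznull': f'{bin_dir}/fuzznull-{suffix}',
            '--grlf': f'{lib_dir}/libgrlf-{suffix}.a',
            '--grafl': f'{lib_dir}/libgrafl-{suffix}.so',
        },
    }


info = derived_defaults('HTMLGenerator')
print(info['include'], info['suffix'])
for flag, path in info['artifacts'].items():
    print(flag, '->', path)
```

Passing --suffix or --include explicitly overrides the derived values, so the sketch only applies when those options are left at their defaults.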
Compiler Requirements
Clang is required for building libFuzzer-linked binaries due to the use of
-fsanitize=fuzzer. You can specify it by setting the environment variable:
CXX=clang++ python3 grammarinator-cxx/dev/build.py ...
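A complete invocation might be assembled as in the following Python sketch, which builds the argument list and environment without running anything. The function build_command and the include directory out are illustrative assumptions. The flags come from the build.py help shown earlier, including the rule that --grafl needs --afl-includedir.

```python
import os
import shlex

# Specialization flags listed in the build.py help.
TARGET_FLAGS = {'--generate', '--decode', '--fuzznull', '--grlf', '--grafl'}


def build_command(generator, includedirs, targets, serializer=None,
                  include=None, afl_includedir=None):
    """Assemble an argument list for build.py (illustrative helper)."""
    unknown = set(targets) - TARGET_FLAGS
    if unknown:
        raise ValueError(f'unknown target flags: {sorted(unknown)}')
    if '--grafl' in targets and not afl_includedir:
        raise ValueError('--grafl requires --afl-includedir')
    argv = ['python3', 'grammarinator-cxx/dev/build.py',
            '--generator', generator]
    for directory in includedirs:
        argv += ['--includedir', directory]
    if serializer:
        argv += ['--serializer', serializer]
    if include:
        argv += ['--include', include]
    if afl_includedir:
        argv += ['--afl-includedir', afl_includedir]
    argv += list(targets)
    return argv


# Clang is selected through the CXX environment variable.
env = dict(os.environ, CXX='clang++')
cmd = build_command('HTMLGenerator', ['out'], ['--generate', '--grlf'])
print('CXX=clang++', shlex.join(cmd))
```

To run the build for real, the list and environment could be handed to subprocess.run(cmd, env=env, check=True) from the repository root.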