Fuzzer Building

Fuzzers in Grammarinator are constructed in two main stages:

  1. A generator class is created from an ANTLRv4 grammar.

  2. In the case of the C++ backend, the generated source code is compiled into standalone executables or libraries.

The following sections describe each step in detail.

Generator Creation from ANTLR Grammar

In both Python and C++ backends, the first step is to convert the ANTLR grammar into a generator class. This generator encapsulates the logic for producing derivation trees from the grammar rules.

In the Python backend, this generator is a subclass of Generator, which is used in the subsequent step where test cases are produced with the grammarinator-generate command.

In the C++ backend, a corresponding generator class is emitted as a header-only C++ file (e.g., HTMLGenerator.hpp). While this class does not yet have dedicated API documentation, it mirrors the structure and behavior of the Python generator and is used by the compiled fuzzing tools (e.g., the grammarinator-generate-html binary and the libFuzzer and AFL++ integrations).

The generator class, whether Python or C++, is automatically produced by the grammarinator-process command line utility. This tool loads and interprets the input grammar and generates a corresponding generator in the target language. The output consists of a class definition, named after the grammar in the format <grammarName>Generator, with one method for each rule defined in the grammar.

The CLI of grammarinator-process
usage: python -m grammarinator.process [-h] [-g [FILE ...]] [-D OPT=VAL]
                                       [--language LANG] [--no-actions]
                                       [--rule NAME] [--lib DIR] [--pep8]
                                       [-o DIR] [--encoding NAME]
                                       [--encoding-errors NAME]
                                       [--log-level LEVEL] [-v] [-q]
                                       [--version]
                                       [FILE ...]

Grammarinator: Processor

positional arguments:
  FILE                  ANTLR grammar files describing the expected format to
                        generate (alias for --grammar).

options:
  -h, --help            show this help message and exit
  -g, --grammar [FILE ...]
                        ANTLR grammar files describing the expected format to
                        generate.
  -D OPT=VAL            set/override grammar-level option
  --language LANG       language of the generated code (choices: py, hpp;
                        default: py)
  --no-actions          do not process inline actions.
  --rule, -r NAME       default rule to start generation from (default: the
                        first parser rule)
  --lib DIR             alternative location of import grammars.
  --pep8                enable autopep8 to format the generated fuzzer (only
                        if --language=py).
  -o, --out DIR         temporary working directory (default: /home/docs/check
                        outs/readthedocs.org/user_builds/grammarinator/checkou
                        ts/stable/docs).
  --encoding NAME       grammar file encoding (default: utf-8).
  --encoding-errors NAME
                        encoding error handling scheme (default: strict).
  --log-level LEVEL     verbosity level of diagnostic messages (TRACE, DEBUG,
                        INFO, WARNING, ERROR, CRITICAL, DISABLE; default:
                        INFO)
  -v, --verbose         verbose mode (alias for --log-level DEBUG)
  -q, --quiet           quiet mode (alias for --log-level DISABLE)
  --version             show program's version number and exit

The tool processes a grammar in ANTLR v4 format (*.g4, either separated to
lexer and parser grammar files, or a single combined grammar) and creates a
fuzzer that can generate randomized content conforming to the format described
by the grammar.

The usage of grammarinator-process is straightforward: it reads the specified grammars using the encoding given by --encoding (default: utf-8) and writes the output to the directory specified by --out (default: the current working directory). The generated code is written in the programming language selected by --language (currently, the available options are py for Python and hpp for C++).

If the grammar contains parser-specific or unnecessary inline actions, they can be ignored by using the --no-actions option. The grammars can also include an options section, where values can be extended or overridden from the command-line interface (CLI) using the -D OPT=VAL argument.

If a grammar imports other grammars from a different directory, the directory path needs to be defined using the --lib argument.

Additionally, when generating Python code (--language py), the generated fuzzer can be automatically formatted to follow the PEP8 style recommendations by using the --pep8 option.
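As a minimal sketch of how these options combine, the following Python helper assembles a grammarinator-process command line using only the flags listed in the help text above. The helper names (process_command, expected_generator_file) and the example grammar file names are hypothetical. The assumption that the generated file is called <grammarName>Generator with the language as its extension is inferred from the HTMLGenerator.hpp example and is not a documented guarantee.

```python
import shlex


def process_command(grammars, out_dir, language="py", encoding="utf-8",
                    lib_dir=None, no_actions=False, pep8=False, options=None):
    """Assemble (but do not run) a grammarinator-process command line."""
    if language not in ("py", "hpp"):
        raise ValueError("language must be 'py' or 'hpp'")
    if pep8 and language != "py":
        raise ValueError("--pep8 only applies to --language=py")

    cmd = ["grammarinator-process", *grammars,
           "-o", out_dir, "--language", language, "--encoding", encoding]
    if lib_dir is not None:
        cmd += ["--lib", lib_dir]          # location of imported grammars
    if no_actions:
        cmd.append("--no-actions")         # skip inline actions
    if pep8:
        cmd.append("--pep8")               # format the generated Python code
    for key, value in (options or {}).items():
        cmd += ["-D", f"{key}={value}"]    # set/override grammar-level options
    return cmd


def expected_generator_file(grammar_name, language="py"):
    """Assumed output file name: <grammarName>Generator.<language>."""
    return f"{grammar_name}Generator.{language}"


if __name__ == "__main__":
    # Hypothetical grammar files for a split lexer/parser HTML grammar.
    cmd = process_command(["HTMLLexer.g4", "HTMLParser.g4"], "out",
                          language="hpp", no_actions=True)
    print(shlex.join(cmd))
    print(expected_generator_file("HTML", "hpp"))
```

The list form can be passed directly to subprocess.run once grammarinator is installed; building the list separately keeps the option handling easy to inspect.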

Compilation in C++

Once the generator header has been created from the grammar, the next step in the C++ workflow is to compile it into a usable binary or library. This is done using the build.py utility script, which is located in grammarinator-cxx/dev/.

The CLI of grammarinator-cxx/dev/build.py
usage: build.py [-h] [--builddir DIR] [--clean] [--build-type TYPE] [--debug]
                [--verbose] [--install [DIR]] [--log-level LEVEL] [--generate]
                [--decode] [--fuzznull] [--grlf] [--grafl]
                [--afl-includedir DIR] [--generator NAME] [--model NAME]
                [--listener NAME] [--transformer NAME] [--serializer NAME]
                [--tree-format NAME] [--include FILE] [--includedir DIR]
                [--suffix NAME]

options:
  -h, --help            show this help message and exit

general build options:
  --builddir DIR        directory for the build files (default: /home/docs/che
                        ckouts/readthedocs.org/user_builds/grammarinator/check
                        outs/stable/grammarinator-cxx/build)
  --clean               create a clean build (default: False)
  --build-type TYPE     set build type (default: Release)
  --debug               debug build (alias for --build-type Debug)
  --verbose             build target in verbose mode (default: False)
  --install [DIR]       install after build (default: don't install; default
                        directory if install: OS-specific)
  --log-level LEVEL     set logging verbosity (default: error)

specialization options:
  --generate            build a standalone blackbox generator tool for the
                        given grammar (default: False)
  --decode              build a standalone decoder tool for the given grammar
                        (default: False)
  --fuzznull            build a dummy fuzznull binary to test libFuzzer
                        integration without a real fuzz target (default:
                        False)
  --grlf                build a static libgrlf library for libFuzzer
                        integration (default: False)
  --grafl               build a shared libgrafl library for AFL++ integration
                        (default: False)
  --afl-includedir DIR  AFL include directory (mandatory if --grafl is
                        specified)
  --generator NAME      name of the generator class
  --model NAME          name of the model class (default:
                        grammarinator::runtime::DefaultModel)
  --listener NAME       name of the listener class (default:
                        grammarinator::runtime::Listener)
  --transformer NAME    name of the transformer function (default: nullptr,
                        signaling no transformer)
  --serializer NAME     name of the serializer function (default:
                        grammarinator::runtime::NoSpaceSerializer)
  --tree-format NAME    format of the saved trees (choices: json, flatbuffers;
                        default: flatbuffers)
  --include FILE        file to include when compiling the specialized
                        artefacts (default: derived from the generator class
                        name by appending .hpp)
  --includedir DIR      directory to append to the include path, usually which
                        contains the file produced by grammarinator-process
                        (may be specified multiple times)
  --suffix NAME         suffix of the specialized artefacts, possibly
                        referring to the input format (default: derived from
                        the generator class name by removing Generator and
                        lowercasing)

The build.py script takes the configuration of the desired generator and optional components and compiles standalone binaries or libraries from them.

Component Configuration

The following command-line arguments define the components that should be built into the resulting binary:

  • --generator (required): Fully qualified name of the generator class, e.g., HTMLGenerator.

  • --model, --listener, --transformer, --serializer (optional): Fully qualified names of additional components to be compiled into the binary (e.g., grammarinator::runtime::NoSpaceSerializer).

  • --includedir (required): Directory that contains the source headers of the specified components, usually the one holding the file produced by grammarinator-process. According to the help text above, the option may be given multiple times; even so, placing all related source files (generator, serializer, etc.) in the same folder keeps the configuration simple.

If any component beyond the generator is customized, one more argument is needed:

  • --include: A header file (e.g., HTMLConfig.hpp) that explicitly includes all component headers. It can be omitted when only the generator is used, in which case the header name is derived from the generator class name by appending .hpp. The file must reside in a directory specified by --includedir.

For example, if using a custom serializer, your config file (HTMLConfig.hpp) might look like:

#include "HTMLGenerator.hpp"
#include "HTMLSpaceSerializer.hpp"
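To make the component options concrete, here is a hedged Python sketch that assembles a build.py invocation from the arguments described above. The function name build_command is hypothetical, and so is the serializer name HTMLSpaceSerializer, which is only guessed from the header file name in the example. The flags themselves are taken from the help text.

```python
import shlex

BUILD_SCRIPT = "grammarinator-cxx/dev/build.py"
TARGET_FLAGS = ("generate", "decode", "fuzznull", "grlf", "grafl")


def build_command(generator, includedir, targets=("generate",), include=None,
                  serializer=None, transformer=None, afl_includedir=None):
    """Assemble (but do not run) a build.py command line."""
    unknown = [t for t in targets if t not in TARGET_FLAGS]
    if unknown:
        raise ValueError(f"unknown targets: {unknown}")
    # The help text marks --afl-includedir as mandatory with --grafl.
    if "grafl" in targets and afl_includedir is None:
        raise ValueError("--afl-includedir is mandatory with --grafl")

    cmd = ["python3", BUILD_SCRIPT,
           "--generator", generator, "--includedir", includedir]
    if include is not None:
        cmd += ["--include", include]
    if serializer is not None:
        cmd += ["--serializer", serializer]
    if transformer is not None:
        cmd += ["--transformer", transformer]
    if afl_includedir is not None:
        cmd += ["--afl-includedir", afl_includedir]
    cmd += [f"--{target}" for target in targets]
    return cmd


if __name__ == "__main__":
    # "out" is a hypothetical directory holding HTMLGenerator.hpp,
    # HTMLSpaceSerializer.hpp and HTMLConfig.hpp.
    cmd = build_command("HTMLGenerator", "out", include="HTMLConfig.hpp",
                        serializer="HTMLSpaceSerializer")
    print(shlex.join(cmd))
```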

Binary Output

Depending on the build flags, the following outputs may be generated:

  • With --generate:

    • grammarinator-generate-<name>: standalone blackbox generator

  • With --grlf:

    • libgrlf-<name>.a: static library to define LLVMFuzzerCustomMutator or LLVMFuzzerCustomCrossover (useful for libFuzzer integration)

  • With --grafl:

    • libgrafl-<name>: shared library for AFL++ integration

  • With --fuzznull:

    • fuzznull-<name>: dummy libFuzzer binary for integration testing

  • With --decode:

    • grammarinator-decode-<name>: standalone tool to convert tests from tree to source format with the chosen serializer

All outputs are written to the build/<Release|Debug>/bin and build/<Release|Debug>/lib directories.
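The naming scheme above can be sketched in a few lines of Python. This is an illustration under two assumptions: that <name> in the artefact names is the value of --suffix, and that the default suffix is obtained exactly as the help text describes (removing Generator from the class name and lowercasing). The helper names are hypothetical, and the libgrafl artefact is left out because its exact file name is not stated here.

```python
ARTIFACT_PATTERNS = {
    "generate": "grammarinator-generate-{name}",
    "grlf": "libgrlf-{name}.a",
    "fuzznull": "fuzznull-{name}",
    "decode": "grammarinator-decode-{name}",
}


def default_suffix(generator):
    """Default --suffix: drop a trailing 'Generator' and lowercase the rest."""
    name = generator
    if name.endswith("Generator"):
        name = name[:-len("Generator")]
    return name.lower()


def default_include(generator):
    """Default --include: the generator class name with .hpp appended."""
    return generator + ".hpp"


def artifact_names(generator, targets, suffix=None):
    """List the artefact names expected for the requested build flags."""
    name = suffix if suffix is not None else default_suffix(generator)
    return [ARTIFACT_PATTERNS[target].format(name=name) for target in targets]


if __name__ == "__main__":
    print(default_include("HTMLGenerator"))
    print(artifact_names("HTMLGenerator", ["generate", "grlf"]))
```

For HTMLGenerator this yields the suffix html, which matches the grammarinator-generate-html binary mentioned earlier.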

Compiler Requirements

Clang is required for building libFuzzer-linked binaries because they are compiled with -fsanitize=fuzzer. You can select it by setting the CXX environment variable:

CXX=clang++ python3 grammarinator-cxx/dev/build.py ...
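When the build is driven from a Python script rather than a shell, the same effect can be achieved by passing a modified environment to the child process. The sketch below only prepares that environment; the helper name clang_env is hypothetical.

```python
import os


def clang_env(base_env=None, cxx="clang++"):
    """Return a copy of the environment with CXX set to the given compiler."""
    env = dict(os.environ if base_env is None else base_env)
    env["CXX"] = cxx
    return env


if __name__ == "__main__":
    env = clang_env()
    print(env["CXX"])
    # The environment can then be handed to the build, for example:
    # subprocess.run(["python3", "grammarinator-cxx/dev/build.py", ...],
    #                env=env, check=True)
```

Copying the environment instead of mutating os.environ keeps the compiler choice local to the build invocation.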