Population
In addition to generating trees from a specific start rule, Grammarinator
also provides support for two evolutionary operators:
mutate() and
recombine(). These operators require
Grammarinator to maintain a set of trees, known as the population. The
population can be created by either processing existing sources or by
generating trees from scratch.
Grammarinator supports multiple tree serialization formats to represent population members. These formats determine how the population is stored on disk and consumed by various generation or fuzzing tools.
Supported Tree Formats
FlatBuffer-encoded trees (.grtf):
- Recommended format for both Python and C++ workflows.
- Compact and fast to read/write.
- Cross-language compatible (e.g., usable from Python, C++, etc. with FlatBuffer bindings).
- Supported by both Python and C++ components.
- Default format when tree codec is not explicitly selected.
JSON-encoded trees (.grtj):
- Portable and human-readable format.
- Slower to process than FlatBuffer.
- Useful for debugging or language-agnostic inspection.
- Supported by both Python and C++ components.
Pickle-encoded trees (.grtp):
- Python-specific format based on the pickle module.
- Not portable across languages or even Python versions.
- Only usable with grammarinator-generate.
- Retained primarily for backward compatibility and prototyping.
When creating or using a population, the appropriate format must be specified
consistently across tools using the --tree-format flag.
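As an illustration of keeping the format consistent, the following sketch maps the file extensions listed above to the matching --tree-format values and checks that a set of population files uses a single format. The names TREE_FORMATS, tree_format_for and population_format are hypothetical helpers written for this example; they are not part of Grammarinator.

```python
from pathlib import Path

# Extension-to-format mapping taken from the list above; the values are the
# choices accepted by --tree-format. The mapping itself is a hypothetical
# helper, not a Grammarinator API.
TREE_FORMATS = {
    ".grtf": "flatbuffers",
    ".grtj": "json",
    ".grtp": "pickle",
}


def tree_format_for(path):
    """Return the --tree-format value matching a tree file's extension."""
    suffix = Path(path).suffix.lower()
    try:
        return TREE_FORMATS[suffix]
    except KeyError:
        raise ValueError(f"not a recognized tree file: {path}") from None


def population_format(paths):
    """Return the single format shared by all files, or raise if they are mixed."""
    formats = {tree_format_for(p) for p in paths}
    if len(formats) != 1:
        raise ValueError(f"expected exactly one tree format, found: {sorted(formats)}")
    return formats.pop()
```

A check like this could run before handing a population directory to another tool, so that a mismatch between the stored files and the selected --tree-format is caught early.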
Population Creation From Existing Sources
The grammarinator-parse utility provides support for creating an initial
set of trees from real tests or any input that is not necessarily generated by
a fuzzer. This allows real-world scenarios or specific test cases to be
incorporated into the population, so that evolutionary algorithms can be
applied to them to generate variations and explore different test cases.
- The CLI of grammarinator-parse
usage: python -m grammarinator.parse [-h] [--glob PATTERN [PATTERN ...]]
-g FILE [FILE ...] [-r NAME] [-t NAME]
[--hidden NAME] [--max-depth MAX_DEPTH]
[--strict] [-o DIR] [--parser-dir DIR]
[--lib DIR] [--tree-format NAME]
[--encoding NAME]
[--encoding-errors NAME]
[--disable-cleanup] [-j NUM]
[--antlr FILE] [--sys-path DIR]
[--sys-recursion-limit NUM]
[--log-level LEVEL] [-v] [-q] [--version]
FILE [FILE ...]
Grammarinator: Parser
positional arguments:
FILE input files or directories to process.
options:
-h, --help show this help message and exit
--glob PATTERN [PATTERN ...]
wildcard patterns for input files to process
(supported wildcards: ?, *, **, [])
-g, --grammar FILE [FILE ...]
ANTLR grammar files describing the expected format of
input to parse.
-r, --rule NAME name of the rule to start parsing with (default: first
parser rule).
-t, --transformer NAME
reference to a transformer (in package.module.function
format) to postprocess the parsed tree.
--hidden NAME list of hidden tokens to be built into the parsed
tree.
--max-depth MAX_DEPTH
maximum expected tree depth (deeper tests will be
discarded (default: inf)).
--strict discard tests that contain syntax errors.
-o, --out DIR directory to save the trees (default: /home/docs/check
outs/readthedocs.org/user_builds/grammarinator/checkou
ts/stable/docs).
--parser-dir DIR directory to save the parser grammars (default:
<OUTDIR>/grammars).
--lib DIR alternative location of import grammars.
--tree-format NAME format of the saved trees (choices: flatbuffers, json,
pickle, default: pickle)
--encoding NAME input file encoding (default: utf-8).
--encoding-errors NAME
encoding error handling scheme (default: strict).
--disable-cleanup disable the removal of intermediate files.
-j, --jobs NUM parallelization level (default: number of cpu cores
(2)).
--antlr FILE path of the ANTLR v4 tool jar file (default:
/home/docs/.antlerinator/antlr-4.13.2-complete.jar)
--sys-path DIR add directory to the search path for Python modules
(may be specified multiple times)
--sys-recursion-limit NUM
override maximum depth of the Python interpreter stack
(default: 1000)
--log-level LEVEL verbosity level of diagnostic messages (TRACE, DEBUG,
INFO, WARNING, ERROR, CRITICAL, DISABLE; default:
INFO)
-v, --verbose verbose mode (alias for --log-level DEBUG)
-q, --quiet quiet mode (alias for --log-level DISABLE)
--version show program's version number and exit
The tool parses files with ANTLR v4 grammars, builds Grammarinator-compatible
tree representations from them and saves them for further reuse.
The usage of the grammarinator-parse utility is generally straightforward.
It takes a set of inputs and processes them with the specified grammars
(-g). Inputs can be listed as files or directories (FILE), or specified
with file patterns (using --glob). The listed directories are traversed
recursively. The start rule, which determines the root of every tree in the
population, can be defined using the --rule argument. The --tree-format
option controls the serialization format of the output trees. If omitted, the
default is flatbuffer (producing .grtf files). After the parsing is
completed and the tree is created, various transformers
(--transformer) can be applied to modify the tree before saving it to the
file system using the --out option.
There are two settings that may require further explanation:
--hidden: When ANTLR tokenizes an input, it sorts tokens into channels. The hidden channel typically holds tokens that are irrelevant to the parser and, for better readability, are not listed at every allowed position in the grammar. Whitespace and comments are typical examples. However, when working with parse trees, including when generating tests, these “hidden” tokens may become important. To ensure that hidden tokens are added to the tree, list the names of the corresponding rules with the --hidden argument.
--max-depth: Controlling the depth of the tree, and therefore the size of the serialized test, is important for both generation and execution performance. This argument sets the maximum depth of the tree. Any input that exceeds this depth limit is discarded. The grammarinator-generate utility has a corresponding setting that guides the generator away from producing excessively deep trees.
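To make the options above concrete, here is a minimal sketch that assembles a grammarinator-parse command line in Python without running it. Only the flags come from the usage text above; the helper build_parse_command, the grammar file names, the rule name htmlDocument, the token name HTML_COMMENT and the directory names are all placeholders chosen for illustration.

```python
import shlex


def build_parse_command(grammars, inputs, out_dir, rule=None,
                        hidden=None, max_depth=None, tree_format=None):
    """Assemble an argument list for `python -m grammarinator.parse`.

    Hypothetical helper: it only arranges flags documented in the usage
    text and does not invoke the tool.
    """
    cmd = ["python", "-m", "grammarinator.parse", "-g", *grammars]
    if rule is not None:
        cmd += ["-r", rule]
    if hidden is not None:
        cmd += ["--hidden", hidden]
    if max_depth is not None:
        cmd += ["--max-depth", str(max_depth)]
    if tree_format is not None:
        cmd += ["--tree-format", tree_format]
    # -o takes exactly one value, so placing it right before the inputs keeps
    # them from being read as additional grammar files of the -g option.
    cmd += ["-o", out_dir]
    cmd += list(inputs)
    return cmd


# All names below are placeholders, not files shipped with Grammarinator.
cmd = build_parse_command(
    grammars=["HTMLLexer.g4", "HTMLParser.g4"],
    inputs=["seeds/"],
    out_dir="population",
    rule="htmlDocument",
    hidden="HTML_COMMENT",
    max_depth=30,
    tree_format="json",
)
print(shlex.join(cmd))
```

The resulting list can be passed to subprocess.run once the placeholder names are replaced with real grammars and inputs.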
Convert Population Trees to Test Sources
The grammarinator-decode utility supports decoding the tree elements of a
population - whether encoded using pickle, JSON, or FlatBuffers - into test
sources serialized according to the chosen method.
- The CLI of grammarinator-decode
usage: python -m grammarinator.decode [-h] [--glob PATTERN [PATTERN ...]]
[--ext EXT] [-s NAME] [-o DIR]
[--stdout] [--tree-format NAME]
[--encoding NAME]
[--encoding-errors NAME] [-j NUM]
[--sys-path DIR]
[--sys-recursion-limit NUM]
[--log-level LEVEL] [-v] [-q]
[--version]
FILE [FILE ...]
Grammarinator: Decode
positional arguments:
FILE input files or directories to process
options:
-h, --help show this help message and exit
--glob PATTERN [PATTERN ...]
wildcard patterns for input files to process
(supported wildcards: ?, *, **, [])
--ext EXT extension to use when saving decoded trees (default:
.txt).
-s, --serializer NAME
reference to a seralizer (in package.module.function
format) that takes a tree and produces a string from
it.
-o, --out DIR directory to save the test cases (default: /home/docs/
checkouts/readthedocs.org/user_builds/grammarinator/ch
eckouts/stable/docs).
--stdout print test cases to stdout (alias for --out='').
--tree-format NAME format of the saved trees (choices: flatbuffers, json,
pickle, default: pickle)
--encoding NAME output file encoding (default: utf-8).
--encoding-errors NAME
encoding error handling scheme (default: strict).
-j, --jobs NUM parallelization level (default: number of cpu cores
(2)).
--sys-path DIR add directory to the search path for Python modules
(may be specified multiple times)
--sys-recursion-limit NUM
override maximum depth of the Python interpreter stack
(default: 1000)
--log-level LEVEL verbosity level of diagnostic messages (TRACE, DEBUG,
INFO, WARNING, ERROR, CRITICAL, DISABLE; default:
INFO)
-v, --verbose verbose mode (alias for --log-level DEBUG)
-q, --quiet quiet mode (alias for --log-level DISABLE)
--version show program's version number and exit
The tool decodes tree files and serializes them to test cases.
grammarinator-decode processes a set of tree inputs and creates a
test representation from them. Inputs can be listed as files or directories
(FILE), or specified with file patterns (using --glob). The listed
directories are traversed recursively.
First, the files are converted to trees using the appropriate tree codec
specified by --tree-format. The resulting trees are then serialized using
the function defined by --serializer (or str by default). The
serialized tests are saved into the --out directory with the --ext
extension and encoded with --encoding.
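A custom serializer for --serializer only needs to take a tree and return a string. The sketch below is one possible shape, assuming that calling str on the tree yields the test text (as it does for the default serializer). The function name normalized_serializer and the FakeTree stand-in class are hypothetical and exist only so that the example runs on its own.

```python
def normalized_serializer(root):
    """Serialize a tree to text, then normalize line endings and the trailing newline."""
    text = str(root)  # same conversion as the default serializer described above
    return text.replace("\r\n", "\n").rstrip() + "\n"


class FakeTree:
    """Stand-in for a decoded tree; used only to exercise the sketch."""

    def __init__(self, text):
        self._text = text

    def __str__(self):
        return self._text


result = normalized_serializer(FakeTree("<p>hi</p>\r\n\r\n"))
print(repr(result))
```

If the function were saved in a hypothetical module my_serializers.py in a directory passed via --sys-path, it could be selected with -s my_serializers.normalized_serializer.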
The decoder is available not only in Python; it can also be built in C++,
using serializers written in C++. To do so, pass the --decode argument
to the build script. When
converting an output corpus generated by either the
libFuzzer integration or the
AFL++ integration, it is recommended to use
these C++ decoders. When built with the same configuration, they will reproduce
exactly the same test cases that were observed during fuzzing.