Test Generation
After constructing a fuzzer and, if desired,
creating a population, the grammarinator-generate
utility or the grammarinator-generate-<name> binaries can be used to
generate test cases based on the format specified in the input grammar.
Test generation is performed by creators, which implement various strategies such as generating new trees, mutating existing ones, or recombining inputs.
The available creators are listed below.
| Creator | Description |
|---|---|
| | Generate a new tree from scratch. |
| | Regenerate a randomly selected subtree of an existing test case. |
| | Randomly remove an optional quantified subtree. |
| | Replicate a repeatable subtree multiple times at random sibling positions. |
| | Shuffle the subtrees of a randomly chosen quantifier node. |
| | Select two nodes with the same name in an ancestor-descendant relationship and replace the ancestor with the descendant. |
| | Randomly swap two compatible, non-overlapping subtrees within a single test case. |
| | Insert a randomly chosen quantified subtree into another compatible quantifier node, while preserving quantifier limits. |
| | Randomly delete a subtree rooted at any rule node, without additional constraints. |
| | Replace an ancestor rule node with one of its descendants, without compatibility checks. |
| | Recombine two trees at random compatible node positions (sharing the same name) by replacing a recipient subtree with a donor subtree. Not supported by the AFL++ integration. |
| | Select two compatible quantifier nodes from two trees. If the recipient quantifier is not full, insert a random child from the donor quantifier at a random position. Not supported by the AFL++ integration. |
| | LibFuzzer integration-only mutator that applies libFuzzer's built-in random mutations on mutable token nodes. |
| | AFL++ integration-only mutator that replaces a random subtree with a compatible one chosen from a subtree pool constructed from previously generated trees. |
| | AFL++ integration-only mutator that inserts a compatible subtree into a quantifier node, selecting it from a subtree pool built from previously generated trees. |
Python-based CLI
- The CLI of grammarinator-generate
usage: python -m grammarinator.generate [-h] [-r NAME] [-m NAME] [-l NAME]
[-t NAME] [-s NAME] [-d NUM]
[--max-tokens NUM] [-w FILE]
[--population DIR] [--no-generate]
[--no-mutate] [--no-recombine]
[--no-grammar-violations]
[--allowlist LIST] [--blocklist LIST]
[--keep-trees] [--tree-format NAME]
[-o FILE] [--stdout] [-n NUM]
[--memo-size NUM]
[--unique-attempts NUM]
[--random-seed NUM] [--dry-run]
[--encoding NAME]
[--encoding-errors NAME] [-j NUM]
[--sys-path DIR]
[--sys-recursion-limit NUM]
[--log-level LEVEL] [-v] [-q]
[--version]
NAME
Grammarinator: Generate
positional arguments:
NAME reference to the generator created by grammarinator-
process (in package.module.class format).
options:
-h, --help show this help message and exit
-r, --rule NAME name of the rule to start generation from (default:
the parser rule set by grammarinator-process).
-m, --model NAME reference to the decision model (in
package.module.class format) (default:
grammarinator.runtime.DefaultModel).
-l, --listener NAME reference to a listener (in package.module.class
format).
-t, --transformer NAME
reference to a transformer (in package.module.function
format) to postprocess the generated tree (the result
of these transformers will be saved into the
serialized tree, e.g., variable matching).
-s, --serializer NAME
reference to a serializer (in package.module.function
format) that takes a tree and produces a string from
it.
-d, --max-depth NUM maximum recursion depth during generation (default:
inf).
--max-tokens NUM maximum token number during generation (default: inf).
-w, --weights FILE JSON file defining custom weights for alternatives and
quantifiers.
--population DIR directory of grammarinator tree pool.
--no-generate disable test generation from grammar.
--no-mutate disable test generation by mutation (disabled by
default if no population is given).
--no-recombine disable test generation by recombination (disabled by
default if no population is given).
--no-grammar-violations
disable applying grammar-violating mutators (enabled
by default)
--allowlist LIST comma-separated list of mutators to allow (by default,
all mutators are allowed).
--blocklist LIST comma-separated list of mutators to block (by default,
no mutators are blocked).
--keep-trees keep generated tests to participate in further
mutations or recombinations (only if population is
given).
--tree-format NAME format of the saved trees (choices: flatbuffers, json,
pickle, default: pickle)
-o, --out FILE output file name pattern (default: tests/test_%d in
the current working directory).
--stdout print test cases to stdout (alias for --out='')
-n NUM number of tests to generate, 'inf' for continuous
generation (default: 1).
--memo-size NUM memoize the last NUM unique tests; if a memoized test
case is generated again, it is discarded and
generation of a unique test case is retried (default:
0).
--unique-attempts NUM
limit on how many times to try to generate a unique
(i.e., non-memoized) test case; no effect if --memo-
size=0 (default: 2).
--random-seed NUM initialize random number generator with fixed seed
(not set by default).
--dry-run generate tests without writing them to file or
printing to stdout (do not keep generated tests in
population either)
--encoding NAME output file encoding (default: utf-8).
--encoding-errors NAME
encoding error handling scheme (default: strict).
-j, --jobs NUM parallelization level (default: number of cpu cores).
--sys-path DIR add directory to the search path for Python modules
(may be specified multiple times)
--sys-recursion-limit NUM
override maximum depth of the Python interpreter stack
(default: 1000)
--log-level LEVEL verbosity level of diagnostic messages (TRACE, DEBUG,
INFO, WARNING, ERROR, CRITICAL, DISABLE; default:
INFO)
-v, --verbose verbose mode (alias for --log-level DEBUG)
-q, --quiet quiet mode (alias for --log-level DISABLE)
--version show program's version number and exit
The tool acts as a default execution harness for generators created by
Grammarinator:Processor.
The grammarinator-generate utility requires a mandatory parameter which is
the reference to the generator class in the package.module.class format.
This means that the module (or package) of the class must be on the search
path for module files. If that is not the case, the appropriate directory
can be added to PYTHONPATH or specified using the --sys-path command
line argument.
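The lookup described above can be illustrated with a minimal sketch. It is not Grammarinator's own code: the helper name resolve_reference and its extra_paths parameter are hypothetical, and the example resolves a standard-library class because no generated module is assumed to exist.

```python
import importlib
import sys


def resolve_reference(reference, extra_paths=()):
    """Resolve a dotted 'package.module.class' string to the named object.

    Hypothetical helper for illustration only.
    """
    # Similar in effect to --sys-path or PYTHONPATH: make directories importable.
    for path in extra_paths:
        if path not in sys.path:
            sys.path.append(path)
    # Everything before the last dot is the module, the rest is the attribute.
    module_name, _, attr_name = reference.rpartition('.')
    if not module_name:
        raise ValueError(f'expected package.module.class, got {reference!r}')
    module = importlib.import_module(module_name)
    return getattr(module, attr_name)


# Demonstrated with a standard-library class instead of a generated one.
cls = resolve_reference('collections.OrderedDict')
print(cls.__name__)
```

If the module is not importable, import_module raises ModuleNotFoundError, which is the situation the search path settings are meant to prevent.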
The generation process starts from the rule specified by the --rule
argument, or from the first parser rule if the --rule argument is not
provided.
The number of output test cases to be generated can be specified using the
-n argument. The default value is 1, and it accepts both integers and
inf for continuous generation.
The depth of the generated tree can be controlled using the --max-depth
argument. If the generation cannot be performed within the provided depth,
an error will be raised.
The number of output tokens (more precisely, the number of unlexer
rule calls during generation) can be controlled using the --max-tokens
argument. If the generation cannot be performed within the provided token
count, an error will be raised.
--max-depth and --max-tokens can be defined at the same time. If
either of them is too strict on its own to make the generation possible, then
an error will be raised. However, if each of them is permissive enough
separately but they are too strict together, then the limits will be
automatically updated to the minimal values that make the generation possible.
The output test cases can be written to the file system using the --out
argument, which allows the definition of the output file path. The %d
wildcard can be used as a placeholder for the test case index, which will be
substituted by the generator. Alternatively, if the --stdout argument is
provided, the test cases will be printed to the standard output.
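As a rough illustration of the %d placeholder, the sketch below assumes it behaves like a printf-style integer substitution. The helper name output_path is hypothetical, and the starting index of 0 is an assumption rather than something the text above states.

```python
def output_path(pattern, index):
    """Return the file name for test case number `index`.

    Assumption: %d is filled in like a printf-style placeholder.
    """
    # Patterns without the wildcard are returned unchanged.
    return pattern % index if '%d' in pattern else pattern


# The first index used here (0) is an assumption for the sake of the example.
paths = [output_path('examples/tests/test_%d.html', i) for i in range(3)]
print(paths)
```

A pattern without %d maps every test case to the same file name, so the wildcard matters whenever more than one test case is requested.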
The behavior of the generator can be customized using models
(--model and --weights), listeners (--listener),
transformers (--transformer), and
serializers (--serializer).
If a directory containing Grammarinator trees is specified using the
--population argument and it is not empty, the utility enables the
mutate() and
recombine() operators for evolutionary
generation. If both --keep-trees and --population are set, the
generated trees will be saved to the population directory, allowing for
multiple modifications to be applied to the population items. Additionally,
if the population directory is empty but the --keep-trees argument is set,
the generate() method will initialize
the population, enabling the use of mutation and recombination operators later
on.
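The population workflow described above can be sketched as a small script that assembles a command line from options listed in the help text. The function build_command, the generator reference HTMLGenerator.HTMLGenerator, and the directory names are hypothetical placeholders. The script only prints the command and does not run it.

```python
import shlex


def build_command(generator, population=None, keep_trees=False, count=1, out=None):
    """Assemble an argument list for grammarinator-generate (illustrative only)."""
    argv = ['grammarinator-generate']
    if population:
        argv += ['--population', population]
    if keep_trees:
        argv.append('--keep-trees')
    argv += ['-n', str(count)]
    if out:
        argv += ['-o', out]
    # The generator reference is the mandatory positional argument.
    argv.append(generator)
    return argv


# Hypothetical generator reference and paths; replace them with real ones.
seed = build_command('HTMLGenerator.HTMLGenerator', population='population/',
                     keep_trees=True, count=100, out='tests/test_%d.html')
print(shlex.join(seed))
```

Running the printed command against an empty population directory would correspond to the initialization case, and repeating it afterwards to the evolutionary case, since mutation and recombination become available once the directory contains trees.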
To disable specific creator types, use the --no-generate, --no-mutate,
--no-recombine, or --no-grammar-violations arguments. To enable or disable
individual creators, pass their names from the creators table above to the
--allowlist or --blocklist arguments. A creator is enabled if it is in the
allowlist and not in the blocklist. By default, the allowlist contains all
creators and the blocklist is empty.
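The allowlist and blocklist rule amounts to a set difference, as the sketch below shows. The function enabled_creators and the creator names creator_a, creator_b, and creator_c are hypothetical placeholders, not real Grammarinator creator names.

```python
def enabled_creators(all_creators, allowlist=None, blocklist=None):
    """Return the creators that are in the allowlist but not in the blocklist.

    Both lists are comma-separated strings; None means the option was not given.
    """
    allowed = set(all_creators) if allowlist is None else set(allowlist.split(','))
    blocked = set() if blocklist is None else set(blocklist.split(','))
    # Preserve the original order of the creators.
    return [name for name in all_creators if name in allowed and name not in blocked]


# Placeholder names for illustration only.
creators = ['creator_a', 'creator_b', 'creator_c']
print(enabled_creators(creators))
print(enabled_creators(creators, allowlist='creator_a,creator_b', blocklist='creator_b'))
```

In this sketch a name that appears in both lists ends up disabled, which follows from the rule that a creator must be in the allowlist and not in the blocklist.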
To decrease the number of potentially duplicated test cases, use the
memoization functionality, which maintains a cache with a configurable item
count (--memo-size) and gives each generation at most --unique-attempts
tries to produce an outcome that is not in the memo cache.
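The interplay of the two options can be sketched with a bounded cache. This is an illustration under assumptions, not the tool's implementation: the function generate_unique is hypothetical, and returning None once all attempts are used up is an assumed behavior that the text above does not specify.

```python
from collections import OrderedDict


def generate_unique(produce, memo, memo_size, unique_attempts):
    """Try up to unique_attempts times to get a test that is not memoized."""
    for _ in range(unique_attempts):
        test = produce()
        if memo_size == 0:
            # Memoization disabled: accept whatever was produced.
            return test
        if test not in memo:
            memo[test] = None
            # Keep only the last memo_size unique tests.
            if len(memo) > memo_size:
                memo.popitem(last=False)
            return test
    # Assumption: give up when every attempt was a duplicate.
    return None


# A scripted producer stands in for a real generator.
scripted = iter(['a', 'a', 'b', 'b', 'b', 'c'])
memo = OrderedDict()
results = [generate_unique(lambda: next(scripted), memo, memo_size=2, unique_attempts=2)
           for _ in range(3)]
print(results)
```

With a memo size of 0 the first produced test is always accepted, which matches the note in the help text that --unique-attempts has no effect in that case.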
C++-based CLI
In addition to the Python command line tool, Grammarinator also supports test
generation via compiled C++ executables. These are generated using the
grammarinator-process tool with the --language hpp option, followed by
building the generator.
Once built, a standalone executable is created with the name
grammarinator-generate-<name>, where <name> corresponds to the
generator class (e.g., grammarinator-generate-html).
The C++ binary exposes a command-line interface that is similar to
grammarinator-generate in terms of supported command line arguments
(max depth, max tokens, output pattern, population directory, etc.), but with
one key difference:
In the C++ backend, the generator class, model class,
listeners, serializer and
transformer are statically compiled into the binary.
These cannot be specified at runtime – the binary is fully self-contained in
this regard. The rest of the generation parameters seen in
grammarinator-generate, such as output count, maximum depth, population
settings, etc., can be configured dynamically.
Example usage of a C++ generator:
grammarinator-cxx/build/Release/bin/grammarinator-generate-html \
-r htmlDocument -d 20 \
-o examples/tests/test_%d.html -n 100
This command generates 100 HTML test cases using the htmlDocument start
rule, up to a maximum derivation depth of 20, and writes the results to the
examples/tests/ directory using a formatted file name.
Notes
If you need to change the generator logic (e.g., use a different serializer or transformer), the C++ binary must be rebuilt with those components linked in.
C++ binaries are useful in performance-critical or standalone fuzzing setups, including native integration with libFuzzer and integration with AFL++.