Tool API: package `grammarinator.tool`

class grammarinator.tool.AnnotatedTreeCodec

Bases: TreeCodec

Abstract base class of tree codecs that can encode and decode extra data (i.e., annotations) when converting between trees and bytes.

decode(data)

Decode only the tree from an array of bytes without the associated annotations. Equivalent to calling decode_annotated() and keeping only the first element of the returned tuple.

Return type: Rule | None

decode_annotated(data)

Decode a tree and associated annotations from an array of bytes.

Raises NotImplementedError by default.

Parameters: data (bytes) – The encoded form of a tree and its annotations.
Return type: tuple[Rule | None, Any]
Returns: Root of the decoded tree, and the decoded annotations.

encode(root)

Encode a tree without any annotations. Equivalent to calling encode_annotated() with annotations=None.

Return type: bytes

encode_annotated(root, annotations)

Encode a tree and associated annotations into an array of bytes.

Raises NotImplementedError by default.

Parameters:

root (Rule) – Root of the tree to be encoded.
annotations (Any) – Data to be encoded along with the tree. No assumption should be made about the structure or the contents of the data; it should be treated as opaque.

Return type:

bytes

Returns:

The encoded form of the tree and its annotations.
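
The relationship between the plain and the annotated methods can be sketched as follows. This is a minimal, self-contained illustration that does not import grammarinator: SketchAnnotatedCodec and SketchPickleCodec are hypothetical stand-ins, and a plain dict stands in for a Rule tree.

```python
import pickle
from typing import Any, Tuple


class SketchAnnotatedCodec:
    """Hypothetical stand-in mirroring the documented method relationships."""

    def encode(self, root: Any) -> bytes:
        # Encoding without annotations delegates with annotations=None.
        return self.encode_annotated(root, None)

    def decode(self, data: bytes) -> Any:
        # Decoding keeps only the first element of the (root, annotations) tuple.
        root, _ = self.decode_annotated(data)
        return root

    def encode_annotated(self, root: Any, annotations: Any) -> bytes:
        raise NotImplementedError()

    def decode_annotated(self, data: bytes) -> Tuple[Any, Any]:
        raise NotImplementedError()


class SketchPickleCodec(SketchAnnotatedCodec):
    """Hypothetical subclass; the annotations are handled as opaque data."""

    def encode_annotated(self, root, annotations):
        return pickle.dumps((root, annotations))

    def decode_annotated(self, data):
        return pickle.loads(data)


codec = SketchPickleCodec()
tree = {'name': 'start', 'children': []}  # a dict stands in for a Rule tree
blob = codec.encode_annotated(tree, {'seen': 3})
```

A subclass only has to provide the two annotated methods; the plain encode() and decode() come for free from the base class.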

class grammarinator.tool.DefaultGeneratorFactory(generator_class, *, model_class=None, weights=None, probs=None, listener_classes=None)

Bases: GeneratorFactory

The default generator factory implementation. When called, it creates a new generator instance backed by a new decision model instance and attaches a set of newly created listener objects to it.

Parameters:

generator_class (type[Generator]) – The class of the generator to instantiate.
model_class (type[Model] | None) – The class of the model to instantiate. The model instance is used to instantiate the generator.
weights (dict[tuple[str, int, int], float] | None) – Initial multipliers of alternatives. Used to instantiate a WeightedModel wrapper around the model.
probs (dict[tuple[str, int], float] | None) – Initial custom probabilities for quantifiers. Used to instantiate a WeightedModel wrapper around the model.
listener_classes (list[type[Listener]] | None) – List of listener classes to instantiate and attach to the generator.

__call__(limit=None)

Create a new generator instance according to the settings specified for the factory instance and for this method.

Parameters: limit (RuleSize | None) – The limit on the depth of the trees and on the number of tokens (number of unlexer rule calls), i.e., it must be possible to finish generation from the selected node so that the overall depth and token count of the tree do not exceed these limits (default: RuleSize.max). Used to instantiate the generator.
Return type: Generator
Returns: The created generator instance.
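
The factory behavior described above (a fresh model and fresh listeners for every created generator) can be sketched without grammarinator. All Sketch* classes below are hypothetical stand-ins, and the example weight key is only shaped like the documented tuple[str, int, int] type; what the indices mean is not stated here.

```python
class SketchModel:
    """Hypothetical decision model."""


class SketchWeightedModel:
    """Hypothetical wrapper holding weights and probabilities around a model."""

    def __init__(self, model, weights=None, probs=None):
        self.model = model
        self.weights = weights or {}
        self.probs = probs or {}


class SketchListener:
    """Hypothetical listener."""


class SketchGenerator:
    """Hypothetical generator that only stores what it was given."""

    def __init__(self, model, listeners, limit):
        self.model = model
        self.listeners = listeners
        self.limit = limit


class SketchFactory:
    def __init__(self, generator_class, *, model_class=None, weights=None,
                 probs=None, listener_classes=None):
        self._generator_class = generator_class
        self._model_class = model_class or SketchModel
        self._weights = weights
        self._probs = probs
        self._listener_classes = listener_classes or []

    def __call__(self, limit=None):
        # A new model instance is created for every generator.
        model = self._model_class()
        if self._weights or self._probs:
            model = SketchWeightedModel(model, self._weights, self._probs)
        # Listeners are instantiated anew as well.
        listeners = [cls() for cls in self._listener_classes]
        return self._generator_class(model=model, listeners=listeners, limit=limit)


factory = SketchFactory(SketchGenerator,
                        weights={('expr', 0, 1): 2.0},
                        listener_classes=[SketchListener])
first, second = factory(), factory(limit=(10, 100))
```

Because nothing is shared between calls, state accumulated by one generator's model or listeners cannot leak into the next one.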

class grammarinator.tool.FileIndividual(population, name)

Bases: Individual

Individual subclass representing a file-based population individual, which maintains both the tree and the associated annotations. It is responsible for lazily loading and storing the tree and its annotations with the appropriate tree codec.

Parameters:

population (FilePopulation) – The population this individual belongs to.
name (str) – Path to the encoded tree file.

property root: Rule

Get the root of the tree. Return the root if it is already loaded, otherwise load it immediately.

Returns: The root of the tree.
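
Lazy loading through a property can be sketched as below. SketchFileIndividual is a hypothetical stand-in that reads a pickled dict; the real class delegates to the tree codec of its population, which is not reproduced here.

```python
import os
import pickle
import tempfile


class SketchFileIndividual:
    """Hypothetical individual that reads its tree only on first access."""

    def __init__(self, path):
        self._path = path
        self._root = None
        self.loads = 0  # counts file reads, for illustration only

    @property
    def root(self):
        if self._root is None:
            with open(self._path, 'rb') as f:
                self._root = pickle.load(f)
            self.loads += 1
        return self._root


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'tree.pkl')
    with open(path, 'wb') as f:
        pickle.dump({'name': 'start'}, f)

    individual = SketchFileIndividual(path)
    loads_before = individual.loads  # nothing has been read yet
    first = individual.root          # triggers the load
    second = individual.root         # served from memory
```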

class grammarinator.tool.FilePopulation(directory, extension, codec=None)

Bases: Population

File system-based population that saves trees into files in a directory. The selection strategy used for mutation and recombination is purely random.

Parameters:

directory (str) – Path to the directory containing the trees.
extension (str) – Extension of the files containing the trees.
codec (TreeCodec | None) – Codec used to save trees into files (default: PickleTreeCodec).

add_individual(root, path=None)

Save the tree to a new file. The name of the tree file is determined from the basename of the given path, or from the population class name if none is provided. The output file is saved with the appropriate extension defined by the current tree codec.

Return type: None

empty()

Check whether the population contains no individuals.

Return type: bool

select_individual(recipient=None)

Randomly select an individual of the population and create a FileIndividual instance from it.

Parameters: recipient (Individual | None) – Unused.
Return type: Individual
Returns: FileIndividual instance created from a randomly selected population item.
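
A directory-backed population with purely random selection might look like the sketch below. SketchFilePopulation is a hypothetical stand-in: the counter used to keep file names unique, the extension given without a leading dot, and the eager unpickling in select_individual() are all assumptions made for brevity.

```python
import os
import pickle
import random
import tempfile


class SketchFilePopulation:
    def __init__(self, directory, extension):
        self._directory = directory
        self._extension = extension  # assumed to be given without a leading dot
        os.makedirs(directory, exist_ok=True)

    def _files(self):
        suffix = '.' + self._extension
        return sorted(fn for fn in os.listdir(self._directory) if fn.endswith(suffix))

    def empty(self):
        return not self._files()

    def add_individual(self, root, path=None):
        # Name the file after the basename of the path, else after the class.
        base = os.path.splitext(os.path.basename(path))[0] if path else type(self).__name__
        fn = f'{base}.{len(self._files())}.{self._extension}'
        with open(os.path.join(self._directory, fn), 'wb') as f:
            pickle.dump(root, f)

    def select_individual(self):
        # Purely random selection among the stored files.
        fn = random.choice(self._files())
        with open(os.path.join(self._directory, fn), 'rb') as f:
            return pickle.load(f)


trees = [{'name': 'start', 'id': i} for i in range(3)]
with tempfile.TemporaryDirectory() as tmp:
    population = SketchFilePopulation(tmp, 'pkl')
    was_empty = population.empty()
    for tree in trees:
        population.add_individual(tree, path='/some/input/seed.txt')
    stored = len(population._files())
    selected = population.select_individual()
```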

class grammarinator.tool.FlatBuffersTreeCodec(encoding='utf-8', encoding_errors='ignore')

Bases: TreeCodec

FlatBuffers-based tree codec.

Parameters: encoding (str) – The encoding to use when converting between FlatBuffers-encoded text and bytes (default: utf-8).

decode(data)

Reconstruct a tree from a FlatBuffers representation.

Return type: Rule | None

encode(root)

Create the FlatBuffers representation of a tree.

Return type: bytes

class grammarinator.tool.GeneratorFactory(generator_class)

Bases: object

Base class of generator factories. A generator factory is a generalization of a generator class. It has to be a callable that, when called, must return a generator instance. It must also expose some properties of the generator class it generalizes that are required to guide generation or mutation by GeneratorTool.

This factory generalizes a generator class by simply wrapping it and forwarding call operations to instantiations of the wrapped class. Furthermore, generator factories deriving from this base class are guaranteed to expose all the required generator class properties.

Parameters: generator_class (type[Generator]) – The class of the wrapped generator.

__call__(limit=None)

Create a new generator instance.

Parameters: limit (RuleSize | None) – The limit on the depth of the trees and on the number of tokens (number of unlexer rule calls), i.e., it must be possible to finish generation from the selected node so that the overall depth and token count of the tree do not exceed these limits (default: RuleSize.max). Used to instantiate the generator.
Return type: Generator
Returns: The created generator instance.

class grammarinator.tool.GeneratorTool(generator_factory, out_format, lock=None, rule=None, limit=None, population=None, keep_trees=False, generate=True, mutate=True, recombine=True, unrestricted=True, allowlist=None, blocklist=None, transformers=None, serializer=None, memo_size=0, unique_attempts=2, cleanup=True, encoding='utf-8', errors='strict', dry_run=False)

Bases: object

Tool to create new test cases using the generator produced by grammarinator-process.

Parameters:

generator_factory (type[Generator] | GeneratorFactory) – A callable that can produce instances of a generator. It is a generalization of a generator class: it has to instantiate a generator object, and it may also set the decision model and the listeners of the generator. It also has to expose some properties of the generator class necessary to guide generation or mutation. In the simplest case, it can be a grammarinator-process-created subclass of Generator, but in more complex scenarios a factory can be used, e.g., an instance of a subclass of GeneratorFactory, like DefaultGeneratorFactory.
rule (str | None) – Name of the rule to start generation from (default: the default rule of the generator).
out_format (str) – Test output description. It can be a file path pattern, possibly including the %d placeholder, which will be replaced by the index of the test case. Alternatively, it can be an empty string, which results in printing the test case to stdout (i.e., not saving it to the file system).
lock (multiprocessing.Lock | None) – Lock object necessary when printing test cases in parallel (optional).
limit (RuleSize | None) – The limit on the depth of the trees and on the number of tokens (number of unlexer rule calls), i.e., it must be possible to finish generation from the selected node so that the overall depth and token count of the tree do not exceed these limits (default: RuleSize.max).
population (Population | None) – Tree pool for mutation and recombination, e.g., an instance of FilePopulation.
keep_trees (bool) – Keep generated trees to participate in further mutations or recombinations (otherwise, only the initial population will be mutated or recombined). It has an effect only if population is defined.
generate (bool) – Enable generating new test cases from scratch, i.e., purely based on grammar.
mutate (bool) – Enable mutating existing test cases, i.e., re-generate part of an existing test case based on grammar.
recombine (bool) – Enable recombining existing test cases, i.e., replace part of a test case with a compatible part from another test case.
unrestricted (bool) – Enable applying possibly grammar-violating creators.
allowlist (list[str] | None) – List of mutators to allow (by default, all mutators are allowed).
blocklist (list[str] | None) – List of mutators to block (by default, no mutators are blocked).
transformers (list[Callable[[Rule], Rule]] | None) – List of transformers to be applied to postprocess the generated tree before serializing it.
serializer (Callable[[Rule], str] | None) – A serializer that takes a tree and produces a string from it (default: str). See grammarinator.runtime.simple_space_serializer() for a simple solution that concatenates tokens with spaces.
memo_size (int) – The number of most recently created unique tests memoized (default: 0).
unique_attempts (int) – The limit on how many times to try to generate a unique (i.e., non-memoized) test case. It has no effect if memo_size is 0 (default: 2).
cleanup (bool) – Enable deleting the generated tests at __exit__().
encoding (str) – Output file encoding.
errors (str) – Encoding error handling scheme.
dry_run (bool) – Enable or disable the saving or printing of the result of generation.
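
The interplay of memo_size and unique_attempts can be pictured with the sketch below. RecentMemo and create_unique are hypothetical helpers, not part of the tool; in particular, returning the last candidate when every attempt produced a memoized test is an assumption.

```python
from collections import OrderedDict


class RecentMemo:
    """Remembers at most `size` most recently added unique tests."""

    def __init__(self, size):
        self._size = size
        self._seen = OrderedDict()

    def add(self, test):
        """Return True if the test is new (and remember it), False otherwise."""
        if self._size <= 0:
            return True  # memoization disabled: everything counts as unique
        if test in self._seen:
            return False
        self._seen[test] = None
        if len(self._seen) > self._size:
            self._seen.popitem(last=False)  # forget the oldest entry
        return True


def create_unique(produce, memo, unique_attempts):
    """Call produce() up to unique_attempts times until a new test appears."""
    test = None
    for _ in range(max(unique_attempts, 1)):
        test = produce()
        if memo.add(test):
            break
    return test


candidates = iter(['a', 'a', 'b'])
memo = RecentMemo(size=2)
first = create_unique(lambda: next(candidates), memo, unique_attempts=2)
second = create_unique(lambda: next(candidates), memo, unique_attempts=2)
```

Here the second call first produces the already memoized 'a' and only succeeds on its second attempt.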

__exit__(exc_type, exc_val, exc_tb)

Delete the output directory if the tests were saved to files and if cleanup was enabled.

create()

Create a new tree with a randomly selected generator method from the available options (see generate(), mutate(), and recombine()). The generated tree is also transformed according to the parameters used to initialize the current tool object.

Return type: Rule
Returns: The root of the created tree.

create_test(index)

Create a new test case with a randomly selected generator method from the available options (see generate(), mutate(), and recombine()). The generated tree is transformed, serialized and saved according to the parameters used to initialize the current tool object.

Parameters: index (int) – Index of the test case to be generated.
Return type: str | None
Returns: Path to the generated serialized test file. It is an empty string if the tool object was initialized with an empty out_format, or None if dry_run was enabled; in both cases, no test file is saved.
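
How the return value depends on out_format and dry_run can be illustrated with a small sketch. resolve_test_path is a hypothetical helper written for this illustration only; the precedence of dry_run over an empty out_format is an assumption.

```python
from typing import Optional


def resolve_test_path(out_format: str, index: int, dry_run: bool = False) -> Optional[str]:
    """Mimic the documented return values of create_test() (sketch only)."""
    if dry_run:
        return None  # nothing is saved or printed
    if not out_format:
        return ''    # the test case goes to stdout, so there is no path
    # Substitute the test index if the pattern contains the %d placeholder.
    return out_format % index if '%d' in out_format else out_format


indexed = resolve_test_path('tests/test_%d.html', 7)
fixed = resolve_test_path('tests/latest.html', 7)
printed = resolve_test_path('', 7)
skipped = resolve_test_path('tests/test_%d.html', 7, dry_run=True)
```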

delete_quantified(individual=None, _=None)

Remove a randomly selected optional subtree from a quantifier node.

Parameters: individual (Individual | None) – The population item to be mutated.
Return type: Rule | None
Returns: The root of the modified tree.

generate(_individual1=None, _individual2=None, *, rule=None, reserve=None)

Instantiate a new generator and generate a new tree from scratch.

Parameters:

rule (str | None) – Name of the rule to start generation from.
reserve (RuleSize | None) – Size budget that needs to be put in reserve before generating the tree. In practice, it is deducted from the initially specified limit (default values: 0, 0).

Return type:

UnlexerRule | UnparserRule

Returns:

The root of the generated tree.

hoist_rule(individual=None, _=None)

Select an individual of the population to be mutated and select two rule nodes from it that share the same rule name and are in an ancestor-descendant relationship, making it possible for the descendant to replace its ancestor.

Parameters: individual (Individual | None) – The population item to be mutated.
Return type: Rule | None
Returns: The root of the hoisted tree.
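
The hoisting idea can be sketched on a toy tree. Node, render and hoist below are hypothetical helpers and not the grammarinator tree classes; the sketch only shows how a descendant with the same name takes the place of one of its ancestors.

```python
import random


class Node:
    """Hypothetical minimal tree node with parent links."""

    def __init__(self, name, children=()):
        self.name = name
        self.parent = None
        self.children = []
        for child in children:
            child.parent = self
            self.children.append(child)

    def walk(self):
        yield self
        for child in self.children:
            yield from child.walk()


def render(node):
    if not node.children:
        return node.name
    return f"{node.name}({' '.join(render(c) for c in node.children)})"


def hoist(root, rng=random):
    # Collect (ancestor, descendant) pairs that share the same name.
    pairs = []
    for node in root.walk():
        ancestor = node.parent
        while ancestor is not None:
            if ancestor.name == node.name:
                pairs.append((ancestor, node))
            ancestor = ancestor.parent
    if not pairs:
        return None  # no suitable pair, the mutation is not applicable
    ancestor, descendant = rng.choice(pairs)
    if ancestor.parent is None:
        descendant.parent = None
        return descendant  # the descendant becomes the new root
    siblings = ancestor.parent.children
    siblings[siblings.index(ancestor)] = descendant
    descendant.parent = ancestor.parent
    return root


tree = Node('start', [Node('expr', [Node('expr', [Node('num')]), Node('op')])])
before = render(tree)
hoisted = hoist(tree)
```

In this tree the only eligible pair is the outer and the inner expr node, so the result is deterministic.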

insert_local_node(individual=None, _=None)

Select two compatible quantifier nodes from a single test and insert a random quantified subtree of the second one into the first one at a random position, while respecting the quantifier restrictions.

Parameters: individual (Individual | None) – The population item to be mutated.
Return type: Rule | None
Returns: The root of the mutated tree.

insert_quantified(recipient_individual=None, donor_individual=None)

Select two compatible quantifier nodes randomly from two trees and, if the quantifier node of the recipient tree is not full (the number of its children is less than the maximum count), add one new child to it at a random position from the children of the donor's quantifier node.

Parameters:

recipient_individual (Individual | None) – The population item to be used as a recipient during crossover.
donor_individual (Individual | None) – The population item to be used as a donor during crossover.

Return type:

Rule | None

Returns:

The root of the extended tree.

mutate(individual=None)

Dispatcher method for mutation operators: it picks one operator randomly and creates a new tree by applying the operator to an individual. The generated tree is also transformed according to the parameters used to initialize the current tool object.

Supported mutation operators: regenerate_rule(), delete_quantified(), replicate_quantified(), shuffle_quantifieds(), hoist_rule(), unrestricted_delete(), unrestricted_hoist_rule(), swap_local_nodes(), insert_local_node()

Parameters: individual (Individual | None) – The population item to be mutated.
Return type: Rule
Returns: The root of the mutated tree.
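
A dispatcher of this kind can be sketched as follows. The individual operators are documented to return Rule | None while mutate() returns a Rule, so the sketch assumes that an operator returning None is skipped and that a fallback (e.g., generation from scratch) is used when none succeeds; this is an assumption, not confirmed behavior. The function names and the name-based allowlist/blocklist filtering are hypothetical as well.

```python
import random


def dispatch(operators, individual, fallback, allowlist=None, blocklist=None, rng=random):
    """Apply a randomly chosen operator; skip failing ones (sketch only)."""
    candidates = [op for op in operators
                  if (allowlist is None or op.__name__ in allowlist)
                  and (blocklist is None or op.__name__ not in blocklist)]
    rng.shuffle(candidates)  # random order equals a random pick with retries
    for op in candidates:
        tree = op(individual)
        if tree is not None:
            return tree
    return fallback()


# Toy operators working on a list instead of a tree.
def always_fails(tree):
    return None


def reverse_children(tree):
    return tree[::-1]


operators = [always_fails, reverse_children]
mutated = dispatch(operators, ['a', 'b', 'c'], fallback=lambda: ['fresh'])
blocked = dispatch(operators, ['a', 'b', 'c'], fallback=lambda: ['fresh'],
                   blocklist=['reverse_children'])
```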

recombine(individual1=None, individual2=None)

Dispatcher method for recombination operators: it picks one operator randomly and creates a new tree by applying the operator to an individual. The generated tree is also transformed according to the parameters used to initialize the current tool object.

Supported recombination operators: replace_node(), insert_quantified()

Parameters:

individual1 (Individual | None) – The population item to be used as a recipient during crossover.
individual2 (Individual | None) – The population item to be used as a donor during crossover.

Return type:

Rule

Returns:

The root of the recombined tree.

regenerate_rule(individual=None, _=None)

Mutate a tree at a random position, i.e., discard and re-generate its sub-tree at a randomly selected node.

Parameters: individual (Individual | None) – The population item to be mutated.
Return type: Rule
Returns: The root of the mutated tree.

replace_node(recipient_individual=None, donor_individual=None)

Recombine two trees at random positions where the nodes are compatible with each other (i.e., they share the same node name). One of the trees is called the recipient while the other is the donor. The sub-tree rooted at a random node of the recipient is discarded and replaced by the sub-tree rooted at a random node of the donor.

Parameters:

recipient_individual (Individual | None) – The population item to be used as a recipient during crossover.
donor_individual (Individual | None) – The population item to be used as a donor during crossover.

Return type:

Rule | None

Returns:

The root of the recombined tree.
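
The recombination described for replace_node() can be sketched on toy trees. Node, clone, render and the replace_node function below are hypothetical helpers, not the grammarinator implementation; excluding the recipient root from replacement and copying the donor sub-tree are simplifying assumptions.

```python
import random


class Node:
    """Hypothetical minimal tree node with parent links."""

    def __init__(self, name, children=()):
        self.name = name
        self.parent = None
        self.children = []
        for child in children:
            child.parent = self
            self.children.append(child)

    def walk(self):
        yield self
        for child in self.children:
            yield from child.walk()


def clone(node):
    return Node(node.name, [clone(child) for child in node.children])


def render(node):
    if not node.children:
        return node.name
    return f"{node.name}({' '.join(render(c) for c in node.children)})"


def replace_node(recipient_root, donor_root, rng=random):
    # Index the donor nodes by name to find compatible candidates.
    donors = {}
    for node in donor_root.walk():
        donors.setdefault(node.name, []).append(node)
    options = [node for node in recipient_root.walk()
               if node.parent is not None and node.name in donors]
    if not options:
        return None  # no compatible node pair
    target = rng.choice(options)
    donor = clone(rng.choice(donors[target.name]))  # leave the donor tree intact
    siblings = target.parent.children
    siblings[siblings.index(target)] = donor
    donor.parent = target.parent
    return recipient_root


recipient = Node('start', [Node('expr', [Node('num')])])
donor_tree = Node('start', [Node('expr', [Node('id'), Node('op')])])
recombined = replace_node(recipient, donor_tree)
```

With these inputs, expr is the only name shared by a non-root recipient node and the donor, so the outcome does not depend on the random choices.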

replicate_quantified(individual=None, _=None)

Select a quantified sub-tree randomly, replicate it and insert it again if the maximum quantification count is not reached yet.

Parameters: individual (Individual | None) – The population item to be mutated.
Return type: Rule | None
Returns: The root of the modified tree.

shuffle_quantifieds(individual=None, _=None)

Select a quantifier node and shuffle its quantified sub-trees.

Parameters: individual (Individual | None) – The population item to be mutated.
Return type: Rule | None
Returns: The root of the modified tree.

swap_local_nodes(individual=None, _=None)

Swap two non-overlapping subtrees at random positions in a single test where the nodes are compatible with each other (i.e., they share the same node name).

Parameters: individual (Individual | None) – The population item to be mutated.
Return type: Rule | None
Returns: The root of the mutated tree.

unrestricted_delete(individual=None, _=None)

Randomly remove a subtree rooted in any kind of rule node, without any further restriction.

Parameters: individual (Individual | None) – The population item to be mutated.
Return type: Rule | None
Returns: The root of the modified tree.

unrestricted_hoist_rule(individual=None, _=None)

Select two rule nodes from the input individual that are in an ancestor-descendant relationship (without type compatibility check) and replace the ancestor with the selected descendant.

Parameters: individual (Individual | None) – The population item to be mutated.
Return type: Rule | None
Returns: The root of the modified tree.

class grammarinator.tool.JsonTreeCodec(encoding='utf-8', encoding_errors='surrogatepass')

Bases: TreeCodec

JSON-based tree codec.

Parameters: encoding (str) – The encoding to use when converting between JSON-formatted text and bytes (default: utf-8).

decode(data)

Reconstruct a tree from a JSON representation stored in an array of bytes using the specified encoding.

Return type: Rule | None

encode(root)

Create the JSON representation of a tree and convert it to an array of bytes using the specified encoding.

Return type: bytes
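
The two-step conversion (tree to JSON text, then text to bytes) and the role of the surrogatepass error handler can be sketched with the standard library. The JSON layout used by the real codec is not described here, so a plain dict stands in for the tree; encode_tree, decode_tree and the use of ensure_ascii=False are assumptions of this sketch.

```python
import json


def encode_tree(tree, encoding='utf-8', errors='surrogatepass'):
    # Step 1: tree -> JSON text; step 2: text -> bytes.
    return json.dumps(tree, ensure_ascii=False).encode(encoding, errors)


def decode_tree(data, encoding='utf-8', errors='surrogatepass'):
    # Reverse order: bytes -> text -> tree.
    return json.loads(data.decode(encoding, errors))


# Generated token text may contain characters such as lone surrogates,
# which the strict UTF-8 codec refuses to encode.
tree = {'name': 'STRING', 'src': 'caf\u00e9 \ud800'}
data = encode_tree(tree)
restored = decode_tree(data)

try:
    encode_tree(tree, errors='strict')
    strict_failed = False
except UnicodeEncodeError:
    strict_failed = True
```

A lenient error handler lets such text survive a round trip, whereas the strict handler raises an error on it.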

class grammarinator.tool.ParserTool(grammars, parser_dir, antlr, population, rule=None, hidden=None, transformers=None, max_depth=inf, strict=False, lib_dir=None, cleanup=True, encoding='utf-8', errors='strict')

Bases: object

Tool to parse existing sources and create a tree pool from them. These trees can be reused later by generation.

Parameters:

grammars (list[str]) – List of resources (grammars and additional sources) needed to parse the input.
parser_dir (str) – Directory where grammars and the generated parser will be placed.
antlr (str) – Path to the ANTLR4 tool (Java jar binary).
population (Population | None) – Tree pool where the trees will be saved, e.g., an instance of FilePopulation.
rule (str | None) – Name of the rule to start parsing with (default: first parser rule in the grammar).
hidden (list[str] | None) – List of hidden rule names that are expected to be added to the grammar tree (hidden rules are skipped by default).
transformers (list[Callable[[Rule], Rule]] | None) – List of transformers to be applied to postprocess the parsed tree before serializing it.
max_depth (int | float) – Maximum depth of trees. Deeper trees are not saved.
strict (bool) – If enabled, tests that contain syntax errors are discarded.
lib_dir (str | None) – Alternative directory to look for grammar imports beside the current working directory.
cleanup (bool) – Boolean to enable the removal of the helper parser resources after processing the inputs.
encoding (str) – Encoding of the input file.
errors (str) – Encoding error handling scheme.

parse(fn)

Load content from a file, parse it to an ANTLR tree, convert it to a Grammarinator tree, and save it to the population.

Parameters: fn (str) – Path to the input file.
Return type: None
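
The effect of max_depth on whether a parsed tree is saved can be pictured with the sketch below. The (name, children) tuples, the depth function and should_save are hypothetical; how the tool itself counts depth is not specified here, so counting the root as depth 1 is an assumption.

```python
from math import inf


def depth(tree):
    """Depth of a (name, children) tuple tree; a lone node has depth 1."""
    _, children = tree
    return 1 + max((depth(child) for child in children), default=0)


def should_save(tree, max_depth=inf):
    # With the default of infinity, no tree is ever rejected.
    return depth(tree) <= max_depth


tree = ('start', [('expr', [('NUM', [])])])
tree_depth = depth(tree)
kept_by_default = should_save(tree)
kept_with_limit = should_save(tree, max_depth=2)
```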

class grammarinator.tool.PickleTreeCodec

Bases: AnnotatedTreeCodec

Tree codec based on Python’s pickle module.

decode_annotated(data)

Unpickle a tree and associated annotations from an array of bytes.

Return type: tuple[Rule | None, Any]

encode_annotated(root, annotations)

Pickle a tree and associated annotations into an array of bytes.

Return type: bytes

class grammarinator.tool.ProcessorTool(lang, work_dir=None)

Bases: object

Tool to process ANTLRv4 grammar files, build an internal representation from them and create a generator class that is able to produce textual data according to the grammar files.

Parameters:

lang (str) – Language of the generated code (currently, 'py' and 'hpp' are accepted).
work_dir (str | None) – Directory to generate fuzzers into (default: the current working directory).

process(grammars, *, options=None, default_rule=None, encoding='utf-8', errors='strict', lib_dir=None, actions=True, pep8=False)

Perform the four main steps:

1. Parse the grammar files.
2. Build an internal representation of the grammar.
3. Translate the internal representation into generator source code in the target language.
4. Save the source code into a file.

Parameters:

grammars (list[str]) – List of grammar files to produce generator from.
options (dict | None) –
Options dictionary to override/extend the options set in the grammar. Currently, the following options are supported:
- superClass: Define the ancestor for the current grammar. The generator of this grammar will be inherited from superClass. (default: grammarinator.runtime.Generator)
- dot: Define how to handle the . wildcard in the grammar. Three keywords are accepted:
  1. any_ascii_letter: generate any ASCII letters
  2. any_ascii_char: generate any ASCII characters
  3. any_unicode_char: generate any Unicode characters
  (default: any_ascii_char)
- virtual: Define whether methods of the created generator class should use dynamic dispatch and thus be overridable in derived classes. Possible values are true or false. (default: false in C++ code generation; ignored in Python)
default_rule (str | None) – Name of the default rule to start generation from (default: first parser rule in the grammar).
encoding (str) – Grammar file encoding.
errors (str) – Encoding error handling scheme.
lib_dir (str | None) – Alternative directory to look for grammar imports beside the current working directory.
actions (bool) – Boolean to enable grammar actions. If they are disabled then the inline actions and semantic predicates of the input grammar (snippets in {...} and {...}? form) are disregarded (i.e., no code is generated from them).
pep8 (bool) – Boolean to enable pep8 to beautify the generated fuzzer source (only if the language of the generated code is Python).

Return type: