Tool API: package grammarinator.tool

class grammarinator.tool.AnnotatedTreeCodec

Bases: TreeCodec

Abstract base class of tree codecs that can encode and decode extra data (i.e., annotations) when converting between trees and bytes.

decode(data)

Decode only the tree from an array of bytes without the associated annotations. Equivalent to calling decode_annotated() and keeping only the first element of the returned tuple.

decode_annotated(data)

Decode a tree and associated annotations from an array of bytes.

Raises NotImplementedError by default.

Parameters:

data (bytes) – The encoded form of a tree and its annotations.

Returns:

Root of the decoded tree, and the decoded annotations.

Return type:

tuple[Rule,object]

encode(root)

Encode a tree without any annotations. Equivalent to calling encode_annotated() with annotations=None.

encode_annotated(root, annotations)

Encode a tree and associated annotations into an array of bytes.

Raises NotImplementedError by default.

Parameters:
  • root (Rule) – Root of the tree to be encoded.

  • annotations (object) – Data to be encoded along with the tree. No assumption should be made about the structure or the contents of the data; it should be treated as opaque.

Returns:

The encoded form of the tree and its annotations.

Return type:

bytes
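
The relationship between the plain and the annotated methods can be illustrated with a minimal sketch. The classes below are stand-ins written for this example only (they are not the library classes), a plain dict takes the place of a Rule tree, and the pickle-based subclass is merely an assumption about how a concrete codec might look.

```python
import pickle


class SketchTreeCodec:
    """Stand-in for the abstract tree codec interface (not the library class)."""

    def encode(self, root):
        raise NotImplementedError()

    def decode(self, data):
        raise NotImplementedError()


class SketchAnnotatedTreeCodec(SketchTreeCodec):
    """Stand-in showing how the plain methods can delegate to the annotated ones."""

    def encode(self, root):
        # Encoding without annotations is encoding with annotations=None.
        return self.encode_annotated(root, None)

    def decode(self, data):
        # Keep only the first element (the tree root) of the returned tuple.
        root, _ = self.decode_annotated(data)
        return root

    def encode_annotated(self, root, annotations):
        raise NotImplementedError()

    def decode_annotated(self, data):
        raise NotImplementedError()


class SketchPickleCodec(SketchAnnotatedTreeCodec):
    """Hypothetical concrete codec; the real PickleTreeCodec may differ in detail."""

    def encode_annotated(self, root, annotations):
        return pickle.dumps((root, annotations))

    def decode_annotated(self, data):
        root, annotations = pickle.loads(data)
        return root, annotations


codec = SketchPickleCodec()
# A plain dict is used here instead of a Rule tree.
tree = {'name': 'start', 'children': [{'name': 'ID', 'src': 'x'}]}
data = codec.encode_annotated(tree, {'origin': 'example'})
decoded_root, decoded_annotations = codec.decode_annotated(data)
```

With this layout, a concrete codec only has to implement the two annotated methods; the plain encode() and decode() come for free.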

class grammarinator.tool.DefaultGeneratorFactory(generator_class, *, model_class=None, cooldown=1.0, weights=None, lock=None, listener_classes=None)

Bases: GeneratorFactory

The default generator factory implementation. When called, it creates a new generator instance backed by a new decision model instance, and attaches a set of newly created listener objects to it.

Parameters:
  • generator_class (type[Generator]) – The class of the generator to instantiate.

  • model_class (type[Model]) – The class of the model to instantiate. The model instance is used to instantiate the generator.

  • cooldown (float) – Cooldown factor. Used to instantiate a CooldownModel wrapper around the model.

  • weights (dict[tuple,float]) – Initial multipliers of alternatives. Used to instantiate a CooldownModel wrapper around the model.

  • lock (multiprocessing.Lock) – Lock object when generating in parallel. Used to instantiate a CooldownModel wrapper around the model.

  • listener_classes (list[type[Listener]]) – List of listener classes to instantiate and attach to the generator.

__call__(limit=None)

Create a new generator instance according to the settings specified for the factory instance and for this method.

Parameters:

limit (RuleSize) – The limit on the depth of the trees and on the number of tokens (number of unlexer rule calls), i.e., it must be possible to finish generation from the selected node so that the overall depth and token count of the tree do not exceed these limits (default: RuleSize.max). Used to instantiate the generator.

Returns:

The created generator instance.

Return type:

Generator
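
The factory pattern described above can be sketched as follows. All classes are stand-ins defined for this example (none of them is a library class), the keyword arguments of the stand-in generator are assumptions, and the CooldownModel wrapping driven by cooldown, weights, and lock is deliberately left out.

```python
class SketchModel:
    """Stand-in for a decision model."""


class SketchListener:
    """Stand-in for a listener."""


class SketchGenerator:
    """Stand-in for a generator; the constructor arguments are assumptions."""

    def __init__(self, *, model=None, listeners=None, limit=None):
        self.model = model
        self.listeners = listeners or []
        self.limit = limit


class SketchFactory:
    """Callable that builds a fresh generator, model, and listeners on every call."""

    def __init__(self, generator_class, *, model_class=None, listener_classes=None):
        self._generator_class = generator_class
        self._model_class = model_class or SketchModel
        self._listener_classes = listener_classes or []

    def __call__(self, limit=None):
        # Every call gets its own model and listener instances.
        model = self._model_class()
        listeners = [listener_class() for listener_class in self._listener_classes]
        return self._generator_class(model=model, listeners=listeners, limit=limit)


factory = SketchFactory(SketchGenerator, listener_classes=[SketchListener])
first_generator = factory(limit=10)
second_generator = factory()
```

Creating the model and the listeners inside __call__ rather than in __init__ keeps generator instances independent of each other.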

class grammarinator.tool.DefaultPopulation(directory, extension, codec=None)

Bases: Population

File system-based population that saves trees into files in a directory. The selection strategy used for mutation and recombination is purely random.

Parameters:
  • directory (str) – Path to the directory containing the trees.

  • extension (str) – Extension of the files containing the trees.

  • codec (TreeCodec) – Codec used to save trees into files (default: PickleTreeCodec).

add_individual(root, annotations=None, path=None)

Save the tree to a new file. The name of the tree file is determined based on the pathname of the corresponding test case. From the pathname of the test case, the base name is kept up to the first period only. If no file name can be determined, the population class name is used as a fallback. To avoid naming conflicts, a unique identifier is concatenated to the file name.
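
The naming scheme can be approximated with a short sketch. The helper name tree_file_name, the period used as separator, the form of the unique identifier, and the 'grt' extension in the usage lines are all assumptions made for illustration; the library may format file names differently.

```python
import os
import uuid


def tree_file_name(path, fallback, extension):
    """Hypothetical helper; separators and the identifier format are assumptions."""
    # Keep the base name of the test case up to the first period only.
    base = os.path.basename(path).split('.', 1)[0] if path else ''
    # Fall back to the given name (e.g., the population class name) if nothing is left.
    if not base:
        base = fallback
    # Append a unique identifier to avoid naming conflicts.
    return f'{base}.{uuid.uuid4().hex}.{extension}'


name = tree_file_name('/tmp/tests/input.min.js', 'DefaultPopulation', 'grt')
fallback_name = tree_file_name(None, 'DefaultPopulation', 'grt')
```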

select_individual()

Randomly select an individual of the population.

class grammarinator.tool.GeneratorFactory(generator_class)

Bases: object

Base class of generator factories. A generator factory is a generalization of a generator class. It has to be a callable that, when called, must return a generator instance. It must also expose some properties of the generator class it generalizes that are required to guide generation or mutation by GeneratorTool.

This factory generalizes a generator class by simply wrapping it and forwarding call operations to instantiations of the wrapped class. Furthermore, generator factories deriving from this base class are guaranteed to expose all the required generator class properties.

Parameters:

generator_class (type[Generator]) – The class of the wrapped generator.

Variables:

_generator_class (type[Generator]) – The class of the wrapped generator.

__call__(limit=None)

Create a new generator instance.

Parameters:

limit (RuleSize) – The limit on the depth of the trees and on the number of tokens (number of unlexer rule calls), i.e., it must be possible to finish generation from the selected node so that the overall depth and token count of the tree do not exceed these limits (default: RuleSize.max). Used to instantiate the generator.

Returns:

The created generator instance.

Return type:

Generator

class grammarinator.tool.GeneratorTool(generator_factory, out_format, lock=None, rule=None, limit=None, population=None, generate=True, mutate=True, recombine=True, keep_trees=False, transformers=None, serializer=None, cleanup=True, encoding='utf-8', errors='strict', dry_run=False)

Bases: object

Tool to create new test cases using the generator produced by grammarinator-process.

Parameters:
  • generator_factory (type[Generator] or GeneratorFactory) – A callable that can produce instances of a generator. It is a generalization of a generator class: it has to instantiate a generator object, and it may also set the decision model and the listeners of the generator. It also has to expose some properties of the generator class that are necessary to guide generation or mutation. In the simplest case, it can be a grammarinator-process-created subclass of Generator, but in more complex scenarios a factory can be used, e.g., an instance of a subclass of GeneratorFactory, like DefaultGeneratorFactory.

  • rule (str) – Name of the rule to start generation from (default: the default rule of the generator).

  • out_format (str) – Test output description. It can be a file path pattern, possibly including the %d placeholder, which will be replaced by the index of the test case. Alternatively, it can be an empty string, which results in printing the test case to stdout (i.e., not saving it to the file system).

  • lock (multiprocessing.Lock) – Lock object necessary when printing test cases in parallel (optional).

  • limit (RuleSize) – The limit on the depth of the trees and on the number of tokens (number of unlexer rule calls), i.e., it must be possible to finish generation from the selected node so that the overall depth and token count of the tree do not exceed these limits (default: RuleSize.max).

  • population (Population) – Tree pool for mutation and recombination, e.g., an instance of DefaultPopulation.

  • generate (bool) – Enable generating new test cases from scratch, i.e., purely based on grammar.

  • mutate (bool) – Enable mutating existing test cases, i.e., re-generate part of an existing test case based on grammar.

  • recombine (bool) – Enable recombining existing test cases, i.e., replace part of a test case with a compatible part from another test case.

  • keep_trees (bool) – Keep generated trees to participate in further mutations or recombinations (otherwise, only the initial population will be mutated or recombined). It has an effect only if population is defined.

  • transformers (list) – List of transformers to be applied to postprocess the generated tree before serializing it.

  • serializer – A serializer that takes a tree and produces a string from it (default: str). See grammarinator.runtime.simple_space_serializer() for a simple solution that concatenates tokens with spaces.

  • cleanup (bool) – Enable deleting the generated tests at __exit__().

  • encoding (str) – Output file encoding.

  • errors (str) – Encoding error handling scheme.

  • dry_run (bool) – Enable or disable the saving or printing of the result of generation.

__exit__(exc_type, exc_val, exc_tb)

Delete the output directory if the tests were saved to files and if cleanup was enabled.

create(index)

Create a new test case with a randomly selected generator method from the available options (i.e., via generate(), mutate(), or recombine()). The generated tree is transformed, serialized, and saved according to the parameters used to initialize the current tool object.

Parameters:

index (int) – Index of the test case to be generated.

Returns:

Path to the generated serialized test file. It may be empty if the tool object was initialized with an empty out_format, or None if dry_run was enabled and hence the test file was not saved.

Return type:

str
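
How the return value relates to out_format and dry_run can be sketched as below. The helper resolve_output is hypothetical and only mirrors the behavior documented above; it does not write or print anything, and the actual implementation may handle the placeholder differently.

```python
def resolve_output(out_format, index, dry_run=False):
    """Hypothetical helper mirroring the documented return values of create()."""
    if dry_run:
        # Nothing is saved or printed, so there is no path to report.
        return None
    if not out_format:
        # The test case would go to stdout, so the path is empty.
        return ''
    # Replace the %d placeholder, if present, with the index of the test case.
    return out_format % index if '%d' in out_format else out_format


numbered_path = resolve_output('out/test_%d.txt', 3)
stdout_path = resolve_output('', 3)
dry_run_path = resolve_output('out/test_%d.txt', 3, dry_run=True)
```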

delete_quantified()

Remove a randomly selected optional subtree from a quantifier node.

Returns:

The root of the modified tree.

Return type:

Rule

generate(*, rule=None, reserve=None)

Instantiate a new generator and generate a new tree from scratch.

Parameters:
  • rule (str) – Name of the rule to start generation from.

  • reserve (RuleSize) – Size budget that needs to be put in reserve before generating the tree. In practice, it is deducted from the initially specified limit (default values: 0, 0).

Returns:

The root of the generated tree.

Return type:

Rule
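
The effect of reserve on the size budget can be illustrated with a small sketch. The Size dataclass is a stand-in for a depth/token pair written for this example (it is not the library's RuleSize), and effective_limit is a hypothetical helper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Size:
    """Stand-in for a depth/token budget; not the library's RuleSize class."""
    depth: int = 0
    tokens: int = 0

    def __sub__(self, other):
        return Size(self.depth - other.depth, self.tokens - other.tokens)


def effective_limit(limit, reserve=Size(0, 0)):
    # The reserve is taken off the limit before generation starts.
    return limit - reserve


remaining = effective_limit(Size(depth=20, tokens=100), Size(depth=3, tokens=10))
```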

hoist_rule()

Select an individual of the population to be mutated, then select two rule nodes from it that share the same rule name and are in an ancestor-descendant relationship, making it possible for the descendant to replace its ancestor.

Returns:

The root of the hoisted tree.

Return type:

Rule
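
A minimal sketch of hoisting on a toy tree is shown below. The Node class is a stand-in defined for this example, and skipping the root as a candidate ancestor is a simplification of the sketch, not a documented rule of the library.

```python
import random


class Node:
    """Minimal stand-in tree node; the library's node classes are richer."""

    def __init__(self, name, children=None):
        self.name = name
        self.parent = None
        self.children = []
        for child in children or []:
            child.parent = self
            self.children.append(child)

    def walk(self):
        yield self
        for child in self.children:
            yield from child.walk()


def hoist_rule(root, rng=random):
    # Collect (ancestor, descendant) pairs that share the same name.
    # The root is skipped as an ancestor to keep the sketch simple.
    pairs = []
    for node in root.walk():
        ancestor = node.parent
        while ancestor is not None and ancestor.parent is not None:
            if ancestor.name == node.name:
                pairs.append((ancestor, node))
            ancestor = ancestor.parent
    if not pairs:
        return root  # nothing to hoist

    # Let the descendant take the place of its ancestor.
    ancestor, descendant = rng.choice(pairs)
    parent = ancestor.parent
    parent.children[parent.children.index(ancestor)] = descendant
    descendant.parent = parent
    return root


# program -> expr -> '(' expr(NUM) ')': the inner expr can replace the outer one.
tree = Node('program', [Node('expr', [Node('('), Node('expr', [Node('NUM')]), Node(')')])])
hoisted = hoist_rule(tree, random.Random(0))
hoisted_names = [node.name for node in hoisted.walk()]
```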

insert_quantified()

Randomly select two compatible quantifier nodes from two trees. If the quantifier node of the recipient tree is not full (i.e., the number of its children is less than the maximum count), add one new child to it at a random position, taken from the children of the donor's quantifier node.

Returns:

The root of the extended tree.

Return type:

Rule

mutate()

Dispatcher method for mutation operators: it picks one operator randomly and creates a new tree with it.

Supported mutation operators: regenerate_rule(), delete_quantified(), replicate_quantified(), shuffle_quantifieds(), hoist_rule()

Returns:

The root of the mutated tree.

Return type:

Rule
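
The dispatcher idea can be sketched with a stand-in class whose operators only return their own names. Picking uniformly with random.choice is an assumption of this sketch; the text above only states that the choice is random.

```python
import random


class SketchTool:
    """Stand-in with placeholder operators that only return their own names."""

    def regenerate_rule(self):
        return 'regenerate_rule'

    def delete_quantified(self):
        return 'delete_quantified'

    def replicate_quantified(self):
        return 'replicate_quantified'

    def shuffle_quantifieds(self):
        return 'shuffle_quantifieds'

    def hoist_rule(self):
        return 'hoist_rule'

    def mutate(self):
        operators = [self.regenerate_rule, self.delete_quantified,
                     self.replicate_quantified, self.shuffle_quantifieds,
                     self.hoist_rule]
        # Pick one operator at random and return whatever it produces.
        return random.choice(operators)()


random.seed(0)
tool = SketchTool()
picked = {tool.mutate() for _ in range(200)}
```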

recombine()

Dispatcher method for recombination operators: it picks one operator randomly and creates a new tree with it.

Supported recombination operators: replace_node(), insert_quantified()

Returns:

The root of the recombined tree.

Return type:

Rule

regenerate_rule()

Mutate a tree at a random position, i.e., discard and re-generate its sub-tree at a randomly selected node.

Returns:

The root of the mutated tree.

Return type:

Rule

replace_node()

Recombine two trees at random positions where the nodes are compatible with each other (i.e., they share the same node name). One of the trees is called the recipient while the other is the donor. The sub-tree rooted at a random node of the recipient is discarded and replaced by the sub-tree rooted at a random node of the donor.

Returns:

The root of the recombined tree.

Return type:

Rule
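
The recombination step can be illustrated on toy trees as follows. The Node class is a stand-in defined for this example, and details such as excluding the recipient's root and cloning the donor sub-tree are choices of the sketch rather than documented library behavior.

```python
import random


class Node:
    """Minimal stand-in tree node; the library's node classes are richer."""

    def __init__(self, name, children=None):
        self.name = name
        self.parent = None
        self.children = []
        for child in children or []:
            child.parent = self
            self.children.append(child)

    def walk(self):
        yield self
        for child in self.children:
            yield from child.walk()

    def clone(self):
        # Copy the sub-tree rooted at this node (parent links are rebuilt).
        return Node(self.name, [child.clone() for child in self.children])


def replace_node(recipient_root, donor_root, rng=random):
    # Index the non-root nodes of the recipient and all nodes of the donor by name.
    recipient_nodes, donor_nodes = {}, {}
    for node in recipient_root.walk():
        if node.parent is not None:
            recipient_nodes.setdefault(node.name, []).append(node)
    for node in donor_root.walk():
        donor_nodes.setdefault(node.name, []).append(node)

    # Nodes are compatible if they share the same name.
    common = sorted(set(recipient_nodes) & set(donor_nodes))
    if not common:
        return recipient_root  # no compatible pair, leave the recipient as is

    name = rng.choice(common)
    old = rng.choice(recipient_nodes[name])
    new = rng.choice(donor_nodes[name]).clone()

    # Discard the recipient's sub-tree and put the donor's sub-tree in its place.
    parent = old.parent
    parent.children[parent.children.index(old)] = new
    new.parent = parent
    old.parent = None
    return recipient_root


recipient = Node('expr', [Node('term', [Node('NUM')]), Node('op'), Node('term', [Node('ID')])])
donor = Node('stmt', [Node('term', [Node('call', [Node('ID')])])])
result = replace_node(recipient, donor, random.Random(42))
```

Cloning the donor's sub-tree leaves the donor tree intact, so it can serve as a donor again later.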

replicate_quantified()

Select a quantified sub-tree randomly, replicate it and insert it again if the maximum quantification count is not reached yet.

Returns:

The root of the modified tree.

Return type:

Rule
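
The upper bound check can be sketched on a dict-based quantifier. The {'stop': ..., 'children': [...]} layout and the random insertion position are assumptions of this example; real quantifier nodes are tree node objects, and the library may place the copy elsewhere.

```python
import copy
import random


def replicate_quantified(quantifier, rng=random):
    """Hypothetical sketch on a dict of the form {'stop': max_count, 'children': [...]}."""
    children = quantifier['children']
    # Nothing to replicate, or the maximum quantification count is already reached.
    if not children or len(children) >= quantifier['stop']:
        return quantifier
    # Copy a randomly selected quantified sub-tree and insert the copy.
    source = rng.choice(children)
    position = rng.randint(0, len(children))
    children.insert(position, copy.deepcopy(source))
    return quantifier


rng = random.Random(1)
quantifier = {'stop': 3, 'children': [['a'], ['b']]}
replicate_quantified(quantifier, rng)
size_after_first_call = len(quantifier['children'])
replicate_quantified(quantifier, rng)  # already full, so nothing changes
size_after_second_call = len(quantifier['children'])
```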

shuffle_quantifieds()

Select a quantifier node and shuffle its quantified sub-trees.

Returns:

The root of the modified tree.

Return type:

Rule

class grammarinator.tool.JsonTreeCodec(encoding='utf-8')

Bases: TreeCodec

JSON-based tree codec.

Parameters:

encoding (str) – The encoding to use when converting between JSON-formatted text and bytes (default: utf-8).
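
The role of the encoding parameter can be shown with a stand-in codec that works on plain dicts. How the library maps Rule trees to JSON is not covered here; the class below is written for this example only.

```python
import json


class SketchJsonCodec:
    """Stand-in for a JSON-based codec working on plain dicts instead of Rule trees."""

    def __init__(self, encoding='utf-8'):
        self._encoding = encoding

    def encode(self, root):
        # Tree -> JSON-formatted text -> bytes in the configured encoding.
        return json.dumps(root).encode(self._encoding)

    def decode(self, data):
        # Bytes -> JSON-formatted text -> tree.
        return json.loads(data.decode(self._encoding))


json_codec = SketchJsonCodec(encoding='utf-16')
json_tree = {'name': 'start', 'children': [{'name': 'ID', 'src': 'é'}]}
json_data = json_codec.encode(json_tree)
json_roundtrip = json_codec.decode(json_data)
```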

class grammarinator.tool.ParserTool(grammars, parser_dir, antlr, population, rule=None, hidden=None, transformers=None, max_depth=inf, cleanup=True, encoding='utf-8', errors='strict')

Bases: object

Tool to parse existing sources and create a tree pool from them. These trees can be reused later by generation.

Parameters:
  • grammars (list[str]) – List of resources (grammars and additional sources) needed to parse the input.

  • parser_dir (str) – Directory where grammars and the generated parser will be placed.

  • antlr (str) – Path to the ANTLR4 tool (Java jar binary).

  • population (Population) – Tree pool where the trees will be saved, e.g., an instance of DefaultPopulation.

  • rule (str) – Name of the rule to start parsing with (default: first parser rule in the grammar).

  • hidden (list[str]) – List of hidden rule names that are expected to be added to the grammar tree (hidden rules are skipped by default).

  • transformers (list) – List of transformers to be applied to postprocess the parsed tree before serializing it.

  • max_depth (int or float) – Maximum depth of trees. Deeper trees are not saved.

  • cleanup (bool) – Boolean to enable the removal of the helper parser resources after processing the inputs.

  • encoding (str) – Encoding of the input file.

  • errors (str) – Encoding error handling scheme.

parse(fn)

Load content from a file, parse it to an ANTLR tree, convert it to a Grammarinator tree, and save it to the population.

Parameters:

fn (str) – Path to the input file.

class grammarinator.tool.PickleTreeCodec

Bases: AnnotatedTreeCodec

Tree codec based on Python’s pickle module.

class grammarinator.tool.ProcessorTool(lang, work_dir=None)

Bases: object

Tool to process ANTLRv4 grammar files, build an internal representation from them and create a generator class that is able to produce textual data according to the grammar files.

Parameters:
  • lang (str) – Language of the generated code (currently, only 'py' is accepted as Python is the only supported language).

  • work_dir (str) – Directory to generate fuzzers into (default: the current working directory).

process(grammars, *, options=None, default_rule=None, encoding='utf-8', errors='strict', lib_dir=None, actions=True, pep8=False)

Perform the four main steps:

  1. Parse the grammar files.

  2. Build an internal representation of the grammar.

  3. Translate the internal representation into generator source code in the target language.

  4. Save the source code to a file.

Parameters:
  • grammars (list[str]) – List of grammar files to produce the generator from.

  • options (dict) –

    Options dictionary to override/extend the options set in the grammar. Currently, the following options are supported:

    1. superClass: Define the ancestor for the current grammar. The generator of this grammar will inherit from superClass. (default: grammarinator.runtime.Generator)

    2. dot: Define how to handle the . wildcard in the grammar. Three keywords are accepted:

      1. any_ascii_letter: generate any ASCII letters

      2. any_ascii_char: generate any ASCII characters

      3. any_unicode_char: generate any Unicode characters

      (default: any_ascii_char)

  • default_rule (str) – Name of the default rule to start generation from (default: first parser rule in the grammar).

  • encoding (str) – Grammar file encoding.

  • errors (str) – Encoding error handling scheme.

  • lib_dir (str) – Alternative directory to look for grammar imports besides the current working directory.

  • actions (bool) – Boolean to enable grammar actions. If they are disabled then the inline actions and semantic predicates of the input grammar (snippets in {...} and {...}? form) are disregarded (i.e., no code is generated from them).

  • pep8 (bool) – Boolean to enable pep8 to beautify the generated fuzzer source.
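
The override/extend semantics of the options parameter can be sketched as a dictionary merge. The helper merge_options is hypothetical, and rejecting unknown dot keywords with ValueError is an assumption of this sketch; the library may react differently to unsupported values.

```python
SUPPORTED_DOT_VALUES = ('any_ascii_letter', 'any_ascii_char', 'any_unicode_char')


def merge_options(grammar_options, overrides=None):
    """Hypothetical helper: explicit overrides win over options set in the grammar."""
    options = {'dot': 'any_ascii_char'}  # documented default of the dot option
    options.update(grammar_options)
    options.update(overrides or {})
    # Validation is an assumption of this sketch.
    if options['dot'] not in SUPPORTED_DOT_VALUES:
        raise ValueError(f"unsupported dot option: {options['dot']}")
    return options


merged = merge_options({'superClass': 'MyBaseGenerator'}, {'dot': 'any_unicode_char'})
defaults_only = merge_options({})
```

Here 'MyBaseGenerator' is a placeholder name, standing for whatever ancestor class a grammar might declare.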

class grammarinator.tool.TreeCodec

Bases: object

Abstract base class of tree codecs that convert between trees and bytes.

decode(data)

Decode a tree from an array of bytes.

Raises NotImplementedError by default.

Parameters:

data (bytes) – The encoded form of a tree.

Returns:

Root of the decoded tree.

Return type:

Rule

encode(root)

Encode a tree into an array of bytes.

Raises NotImplementedError by default.

Parameters:

root (Rule) – Root of the tree to be encoded.

Returns:

The encoded form of the tree.

Return type:

bytes