Generated file missing tokens #44

kaby76 · 2022-02-20T23:20:28Z

I am trying to test grammars-v4/verilog/verilog/ using Grammarinator, but some of the generated output fails to parse. When I look at the output from Trees.print(), the printed tree and the generated text don't match: sometimes the tree contains tokens that are missing from the generated file, and sometimes the file contains tokens that aren't in the printed tree.

Here is the code that I am executing:

git clone https://github.com/antlr/grammars-v4.git
cd grammars-v4
git checkout ffecfeee601ffc75edbc52845c1509753d6dd4a1
cd verilog/verilog
# Already cloned and built grammarinator from sources.
grammarinator-process VerilogLexer.g4 VerilogParser.g4 -o .
grammarinator-generate VerilogGenerator.VerilogGenerator  --sys-path . -d 15 -n 100 -r source_text --serializer grammarinator.runtime.simple_space_serializer --no-mutate --no-recombine
# Already built a standardized Antlr4 parser driver for the grammar.
for i in tests/test_*; do echo "$i"; ./Generated/bin/Debug/net5.0/Test.exe -file "$i"; status=$?; if [[ $status != 0 ]]; then break; fi; done

This loops through the various generated tests, parsing each, and stops the loop on a test file that does not parse.

I assumed that Grammarinator would construct a valid CST (an "Unparser" tree) and output it. While most tests parse, some do not, and the failures only appear when -d 15 is specified. I included --no-mutate and --no-recombine so that the tree is output as is, unmodified.

To understand WHY the parse fails, I need to look at the CST constructed prior to serializing the token stream into a generated test. To do that, I modified generate.py after this line with this code:

    print("Index = ")
    print(index)
    tree.print()

I now rerun the grammarinator-generate command and save the human-readable parse trees, and rerun the parser.

Selecting a test that fails, I noticed that the tree.print() output matches neither the generated text nor the tokens reported by the standardized Antlr parser.

For example,

Output from tree.print():

...
VERTICAL_BAR
DOLLAR_RANDOM
COMMA
COMMA
SIMPLE_IDENTIFIER
...

Tokens recognized by parser:

...
VERTICAL_BAR
DOLLAR_RANDOM
COMMA
SIMPLE_IDENTIFIER
...

(Note, only one COMMA.)

Relevant sequence in generated file:

| $random , J

(Note, only one COMMA.)

I have noticed similar token differences at other times. It seems that Grammarinator places some tokens in the CST that are never output.
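
To pin down where the two token sequences diverge, the names printed from the tree and the names reported by the parser can be compared mechanically. The following is a minimal sketch using Python's standard difflib module. The helper name diff_tokens is hypothetical, and the two lists are just the excerpt shown above, typed in by hand rather than produced by Grammarinator or the parser driver.

```python
from difflib import SequenceMatcher


def diff_tokens(tree_tokens, parser_tokens):
    """Return (tag, tree_slice, parser_slice) for every region that differs."""
    matcher = SequenceMatcher(a=tree_tokens, b=parser_tokens, autojunk=False)
    return [
        (tag, tree_tokens[i1:i2], parser_tokens[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]


# Token names copied by hand from the excerpt above (not generated data).
tree_tokens = ["VERTICAL_BAR", "DOLLAR_RANDOM", "COMMA", "COMMA", "SIMPLE_IDENTIFIER"]
parser_tokens = ["VERTICAL_BAR", "DOLLAR_RANDOM", "COMMA", "SIMPLE_IDENTIFIER"]

for tag, in_tree, in_parser in diff_tokens(tree_tokens, parser_tokens):
    # "delete" means the tokens are in the tree but not in the parsed file.
    print(tag, in_tree, in_parser)
```

For this excerpt the only reported region is a "delete" of one COMMA, i.e. a token that is present in the printed tree but absent from what the parser saw.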

Incidentally, I tried to just save the tree using --keep-trees but there is no tool to print out the trees after reading. I tried something like this, but it did not work.

from pydoc import importfile
module = importfile('/full/path/to/trees.py')
module.Trees.print(module.Trees.load("/full/path/to/test_xxx.grt"))

akosthekiss · 2022-02-21T14:44:17Z

@kaby76 Some quick comments:

I haven't seen anything like missing tokens from the test case before. With simple_space_serializer, the only way not to output a token is when it is empty (i.e., node.src is falsy, see https://github.com/renatahodovan/grammarinator/blob/master/grammarinator/runtime/serializer.py#L21). But a COMMA in your example is ',', which is truthy. So, this should not happen (even if it obviously does in your case).
If you want to tweak things here and there, you might want to tweak the serializer instead of generate.py. Either modify simple_space_serializer (linked above), or write your own serializer and specify it from the command line using the -s or --serializer switch (https://github.com/renatahodovan/grammarinator/blob/master/grammarinator/generate.py#L244). It should be simply a function that gets the root node of the tree and should return a string form of it. You could easily add any debug code there.
Alternatively/additionally, you might also tweak Tree.print to give more details. E.g., at https://github.com/renatahodovan/grammarinator/blob/master/grammarinator/runtime/tree.py#L69, perhaps something like print('%s%s (%s)' % (' ' * indent, node.name, getattr(node, 'src', ''))). (I haven't tested this. It may or may not be useful for you.)
If you want to load saved trees, you can write from grammarinator.runtime import Tree in your script (or directly in the interactive interpreter), assuming that you have grammarinator already installed in your (virtual) environment. Then, Tree.load("path/to/test_xxx.grt") will work.
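
The custom-serializer suggestion above could look roughly like the following. This is only a sketch under stated assumptions: Node is a hypothetical stand-in so that the example runs without Grammarinator installed, and the attribute names name and src are taken from this thread while children is assumed. It is not a copy of simple_space_serializer; it only follows the rule described above (skip a token when its src is falsy) and logs what it sees while doing so.

```python
import sys
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    """Hypothetical stand-in for a Grammarinator tree node (not the real class)."""
    name: str
    src: Optional[str] = None  # token text; only meaningful on leaves
    children: List["Node"] = field(default_factory=list)


def debug_space_serializer(root, log=sys.stderr):
    """Join leaf texts with spaces and log every node, flagging skipped leaves."""
    parts = []

    def walk(node, depth):
        indent = "  " * depth
        if node.children:
            print(f"{indent}{node.name}", file=log)
            for child in node.children:
                walk(child, depth + 1)
        else:
            print(f"{indent}{node.name} ({node.src!r})", file=log)
            if node.src:
                parts.append(node.src)
            else:
                # A leaf with empty text never reaches the generated file.
                print(f"{indent}  ^ skipped: empty src", file=log)

    walk(root, 0)
    return " ".join(parts)


# Hand-built example tree; EMPTY_TOKEN is a made-up leaf with no text.
tree = Node("expr", children=[
    Node("VERTICAL_BAR", "|"),
    Node("DOLLAR_RANDOM", "$random"),
    Node("COMMA", ","),
    Node("EMPTY_TOKEN", ""),
    Node("SIMPLE_IDENTIFIER", "J"),
])
print(debug_space_serializer(tree))
```

Because the log goes to stderr and the serialized text is the return value, a function of this shape could in principle be passed to --serializer by its dotted name in the same way as grammarinator.runtime.simple_space_serializer, provided its module is importable (for example via --sys-path). The real node classes may differ from the stand-in, so the attribute access would need to be checked against tree.py first.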
