Grammarinator crashes when generating sqlite test cases #214

Open
ahmayun opened this issue Apr 27, 2024 · 1 comment

ahmayun commented Apr 27, 2024

I am trying to use Grammarinator to generate test cases for SQLite.
I am using the ANTLR grammar for SQLite that is available in the official ANTLR repo.

First I run:
grammarinator-process examples/grammars/SQLiteLexer.g4 examples/grammars/SQLiteParser.g4 -o examples/fuzzer

Which works fine.

But when I run:
grammarinator-generate SQLiteGenerator.SQLiteGenerator -r sql_stmt -d 20 -o examples/tests/test_%d.sql -n 100 -s SQLiteGenerator.html_space_serializer --sys-path examples/fuzzer/

I often get the following error (it does not crash every time, but it does roughly 9 times out of 10):

multiprocessing.pool.RemoteTraceback: 
"""
Traceback (most recent call last):
  File "/home/ahmad/anaconda3/envs/grammarinator/lib/python3.12/multiprocessing/pool.py", line 125, in worker
    result = (True, func(*args, **kwds))
                    ^^^^^^^^^^^^^^^^^^^
  File "/home/ahmad/anaconda3/envs/grammarinator/lib/python3.12/site-packages/grammarinator/generate.py", line 78, in create_test
    return generator_tool.create(index)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ahmad/anaconda3/envs/grammarinator/lib/python3.12/site-packages/grammarinator/tool/generator.py", line 255, in create
    f.write(test)
  File "<frozen codecs>", line 727, in write
  File "<frozen codecs>", line 377, in write
UnicodeEncodeError: 'utf-8' codec can't encode character '\ud85f' in position 878: surrogates not allowed
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/ahmad/anaconda3/envs/grammarinator/bin/grammarinator-generate", line 8, in <module>
    sys.exit(execute())
             ^^^^^^^^^
  File "/home/ahmad/anaconda3/envs/grammarinator/lib/python3.12/site-packages/grammarinator/generate.py", line 158, in execute
    for _ in pool.imap_unordered(parallel_create_test, count(0) if args.n == inf else range(args.n)):
  File "/home/ahmad/anaconda3/envs/grammarinator/lib/python3.12/multiprocessing/pool.py", line 873, in next
    raise value
UnicodeEncodeError: 'utf-8' codec can't encode character '\ud85f' in position 878: surrogates not allowed

From my understanding, this means that grammarinator is generating values that can't be encoded as a utf-8 string. Is this an issue with grammarinator or is there a way to handle this that I am not aware of?
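For reference, the failure can be reproduced outside Grammarinator. The snippet below is a minimal sketch: the SQL string is a made-up stand-in for a generated test case, and only the '\ud85f' character is taken from the traceback above. It shows that a Python string may hold a lone surrogate, while the strict UTF-8 codec refuses to encode it.

```python
# A lone surrogate such as '\ud85f' is allowed inside a Python str,
# but it is not a valid Unicode scalar value, so strict UTF-8 rejects it.
text = "SELECT '\ud85f';"  # hypothetical stand-in for a generated test case

try:
    text.encode("utf-8")
except UnicodeEncodeError as exc:
    # Keep the details that also appear in the traceback message.
    error_reason = exc.reason
    error_position = exc.start
    print(f"cannot encode character at position {error_position}: {error_reason}")
```

Since the character only shows up when the generator happens to pick it for a token, this would also be consistent with the crash not occurring on every run.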

Here are some environment details if needed:

$ pip show grammarinator
      Name: grammarinator
      Version: 23.7.post76+gf3ffa71.d20240427
      Summary: Grammarinator: Grammar-based Random Test Generator
      Home-page: https://github.com/renatahodovan/grammarinator
      Author: Renata Hodovan, Akos Kiss
      Author-email: [email protected], [email protected]
      License: BSD
      Location: /home/ahmad/anaconda3/envs/grammarinator/lib/python3.12/site-packages
      Requires: antlerinator, antlr4-python3-runtime, autopep8, inators, jinja2, regex
      Required-by: 
$ python -V
      Python 3.12.3

renatahodovan (Owner) commented

The problem is that the grammar allows surrogates to be generated as part of some tokens, but the test generator is not prepared to encode them when saving the output to a file. To configure the encoding and its error handler, you can use the --encoding and --encoding-errors CLI options of grammarinator-generate. Their values are passed to the encoding and errors parameters of codecs.open, so you can set them accordingly. In this case, I think the simplest solution is to pass the --encoding-errors=surrogatepass argument to grammarinator-generate.
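As a rough illustration of what the surrogatepass error handler does at the codecs.open level, here is a small standalone sketch. The file name and the SQL string are hypothetical, and the code is not taken from Grammarinator itself; it only mirrors the encoding and errors parameters mentioned above.

```python
import codecs
import os
import tempfile

# Hypothetical generated test case containing a lone surrogate.
test = "SELECT '\ud85f';"

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "test_0.sql")  # illustrative file name

    # With errors="surrogatepass", the UTF-8 codec writes the surrogate
    # code point as a three-byte sequence instead of raising an error.
    with codecs.open(path, "w", encoding="utf-8", errors="surrogatepass") as f:
        f.write(test)

    # Read the raw bytes back to see what ended up on disk.
    with open(path, "rb") as f:
        raw = f.read()

# Decoding needs the same error handler, because the bytes are not strict UTF-8.
round_trip = raw.decode("utf-8", errors="surrogatepass")
```

On the command line, this corresponds to appending the option to the original invocation:

grammarinator-generate SQLiteGenerator.SQLiteGenerator -r sql_stmt -d 20 -o examples/tests/test_%d.sql -n 100 -s SQLiteGenerator.html_space_serializer --sys-path examples/fuzzer/ --encoding-errors=surrogatepass

Note that files written this way are not strictly valid UTF-8, so some consumers may reject them. If that matters, a different standard Python error handler such as replace could be passed instead, at the cost of altering the generated content.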
