
Testcase Uniqueness #14

Open
Thiefyface opened this issue Nov 8, 2019 · 2 comments
@Thiefyface

Heya, so I was testing out the JSON grammar fuzzing and noticed a lot of duplication in the output testcases. It may have been something I was doing wrong, but I was following the instructions in the README.md.

Testcase generation command I was using (standard Unlexer/Unparser):
grammarinator-generate -l JSONUnlexer.py -p JSONUnparser.py -d 10 -n 1000000 -o json_fuzzer_test2

The above finished incredibly fast, at ~37 seconds for the 1 million testcases, which was awesome. But when I checked uniqueness on a smaller sample via "for i in `ls` ; do md5sum $i >> hashes.txt; done", I got the following:

~grammarinator/json_fuzz/# wc -l hashes.txt
133614 hashes.txt
~/grammarinator/json_fuzz/# cat hashes.txt | cut -d " " -f 1 | sort -u | wc -l
29594
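The shell pipeline above (hash every file, then compare the total line count to the unique-hash count) can also be sketched in Python; `uniqueness` below is a hypothetical helper, not part of grammarinator:

```python
import hashlib
from pathlib import Path

def uniqueness(directory):
    """Return (total files, distinct MD5 digests) for a testcase directory."""
    hashes = [hashlib.md5(p.read_bytes()).hexdigest()
              for p in Path(directory).iterdir() if p.is_file()]
    return len(hashes), len(set(hashes))
```

A gap between the two numbers (here 133614 vs. 29594) means duplicated testcases.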

Anyways, I wrote a patch to make grammarinator-generate produce unique testcases, which can be found below. It's just a hack and could probably be done better to reduce the runtime cost. As it stands, the runtime is significantly increased, but the testcases appear to be unique:

time grammarinator-generate -l JSONUnlexer.py -p JSONUnparser.py -d 10 -n 1000000 -o json_fuzzer_test2
real    41m58.709s
user    5m4.388s
sys     69m52.101s

/json_fuzzer_test2# cat hashes.txt | wc -l
76184
/json_fuzzer_test2# cat hashes.txt | cut -d " " -f 1 | sort -u | wc -l
76184
18,19d17
< import hashlib
<
21c19
< from multiprocessing import Pool, Manager, Lock
---
> from multiprocessing import Pool
56,59c54
<                  cleanup=True, encoding='utf-8', shared_dict={}, shared_lock = None):
<
<         self.shared_dict = shared_dict
<         self.shared_lock = shared_lock
---
>                  cleanup=True, encoding='utf-8'):
147a143,144
>         with codecs.open(test_fn, 'w', self.encoding) as f:
>             f.write(str(Generator.transform(tree.root, self.test_transformers)))
149,163c146
<         output = str(Generator.transform(tree.root, self.test_transformers))
<         output_hash = hashlib.md5(output.encode('utf-8')).digest()
<          
<         try:
<             with self.shared_lock:
<                 _ = self.shared_dict[output_hash]
<             return self.create_new_test()
<         except KeyError:
<             with self.shared_lock:
<                 self.shared_dict[output_hash] = 1
<
<             with codecs.open(test_fn, 'w', self.encoding) as f:
<                 f.write(output)
<
<             return test_fn, tree_fn
---
>         return test_fn, tree_fn
302,305d284
<     sync_manager = Manager()
<     shared_dict_ = sync_manager.dict()
<     lock = sync_manager.Lock()
<
310c289
<                    cleanup=False, encoding=args.encoding, shared_dict=shared_dict_, shared_lock = lock) as generator:
---
>                    cleanup=False, encoding=args.encoding) as generator:
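The idea behind the patch can be shown as a standalone sketch: hash each generated testcase, keep a set of seen digests, and retry generation on a collision. Here `generate_random_json` is a hypothetical stand-in for grammarinator's tree generation, and the `multiprocessing.Manager` dict and lock the patch uses to share state across worker processes are omitted for brevity:

```python
import hashlib
import random

def generate_random_json(rng):
    # Stand-in generator: a real run would build a tree from the grammar.
    return '{"key": %d}' % rng.randrange(5)

def generate_unique(n, max_retries=1000, seed=0):
    rng = random.Random(seed)
    seen = set()   # the patch shares this via a Manager dict + Lock
    cases = []
    for _ in range(n):
        for _ in range(max_retries):
            case = generate_random_json(rng)
            digest = hashlib.md5(case.encode('utf-8')).digest()
            if digest not in seen:
                seen.add(digest)
                cases.append(case)
                break
        else:
            break  # give up: the grammar may be exhausted at this depth
    return cases
```

The retry loop is why the runtime grows so much: as the set of seen hashes fills the space of reachable outputs, each new unique testcase needs more regeneration attempts.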
@renatahodovan (Owner)

@Thiefyface Thanks for the report, I'll look into this.

@Tejas2805

@Thiefyface I wish to understand something. I need to generate test cases for a BNF grammar. So this is what I am using:

grammarinator-process bnf.g4

I get the Unlexer and Unparser Python files.

Then I use:

grammarinator-generate -l unlexerfile -p unparserfile

I am not sure how to feed in the grammar that needs to be followed to generate the test cases. Can you guide me?
