
Testcase Uniqueness #14

Open
Thiefyface opened this issue Nov 8, 2019 · 2 comments
@Thiefyface

Heya, so I was testing out the JSON grammar fuzzing and noticed a lot of duplication in the output testcases. It may have been something I was doing wrong, but I was following the instructions in the README.md.

Testcase generation command I was using (standard Unlexer/Unparser):
grammarinator-generate -l JSONUnlexer.py -p JSONUnparser.py -d 10 -n 1000000 -o json_fuzzer_test2

The above finished incredibly fast, at ~37 seconds for the 1 million testcases, which was awesome. But when I checked uniqueness on a smaller sample via "for i in `ls` ; do md5sum $i >> hashes.txt; done", I got the following:

~grammarinator/json_fuzz/# wc -l hashes.txt
133614 hashes.txt
~/grammarinator/json_fuzz/# cat hashes.txt | cut -d " " -f 1 | sort -u | wc -l
29594
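The shell pipeline above (hash every file, then compare the total line count to the unique-hash count) can also be sketched in Python; `uniqueness` below is a hypothetical helper, not part of grammarinator:

```python
import hashlib
from pathlib import Path

def uniqueness(directory):
    """Return (total files, distinct MD5 digests) for a testcase directory."""
    hashes = [hashlib.md5(p.read_bytes()).hexdigest()
              for p in Path(directory).iterdir() if p.is_file()]
    return len(hashes), len(set(hashes))
```

A gap between the two numbers (here 133614 vs. 29594) means duplicated testcases.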

Anyways, I wrote a patch to make grammarinator-generate produce unique testcases, which can be found below. It's just a hack and could probably be done better to reduce the runtime cost. As it stands, the runtime is significantly increased, but the testcases appear to be unique:

time grammarinator-generate -l JSONUnlexer.py -p JSONUnparser.py -d 10 -n 1000000 -o json_fuzzer_test2
real    41m58.709s
user    5m4.388s
sys     69m52.101s

/json_fuzzer_test2# cat hashes.txt | wc -l
76184
/json_fuzzer_test2# cat hashes.txt | cut -d " " -f 1 | sort -u | wc -l
76184
18,19d17
< import hashlib
<
21c19
< from multiprocessing import Pool, Manager, Lock
---
> from multiprocessing import Pool
56,59c54
<                  cleanup=True, encoding='utf-8', shared_dict={}, shared_lock = None):
<
<         self.shared_dict = shared_dict
<         self.shared_lock = shared_lock
---
>                  cleanup=True, encoding='utf-8'):
147a143,144
>         with codecs.open(test_fn, 'w', self.encoding) as f:
>             f.write(str(Generator.transform(tree.root, self.test_transformers)))
149,163c146
<         output = str(Generator.transform(tree.root, self.test_transformers))
<         output_hash = hashlib.md5(output.encode('utf-8')).digest()
<          
<         try:
<             with self.shared_lock:
<                 _ = self.shared_dict[output_hash]
<             return self.create_new_test()
<         except KeyError:
<             with self.shared_lock:
<                 self.shared_dict[output_hash] = 1
<
<             with codecs.open(test_fn, 'w', self.encoding) as f:
<                 f.write(output)
<
<             return test_fn, tree_fn
---
>         return test_fn, tree_fn
302,305d284
<     sync_manager = Manager()
<     shared_dict_ = sync_manager.dict()
<     lock = sync_manager.Lock()
<
310c289
<                    cleanup=False, encoding=args.encoding, shared_dict=shared_dict_, shared_lock = lock) as generator:
---
>                    cleanup=False, encoding=args.encoding) as generator:
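The idea behind the patch can be shown as a standalone sketch: hash each generated testcase, keep a set of seen digests, and retry generation on a collision. Here `generate_random_json` is a hypothetical stand-in for grammarinator's tree generation, and the `multiprocessing.Manager` dict and lock the patch uses to share state across worker processes are omitted for brevity:

```python
import hashlib
import random

def generate_random_json(rng):
    # Stand-in generator: a real run would build a tree from the grammar.
    return '{"key": %d}' % rng.randrange(5)

def generate_unique(n, max_retries=1000, seed=0):
    rng = random.Random(seed)
    seen = set()   # the patch shares this via a Manager dict + Lock
    cases = []
    for _ in range(n):
        for _ in range(max_retries):
            case = generate_random_json(rng)
            digest = hashlib.md5(case.encode('utf-8')).digest()
            if digest not in seen:
                seen.add(digest)
                cases.append(case)
                break
        else:
            break  # give up: the grammar may be exhausted at this depth
    return cases
```

The retry loop is why the runtime grows so much: as the set of seen hashes fills the space of reachable outputs, each new unique testcase needs more regeneration attempts.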
@renatahodovan (Owner)

@Thiefyface Thanks for the report, I'll look into this.

@Tejas2805

@Thiefyface I wish to understand something. I need to generate test cases for a BNF grammar. So this is what I am using:

grammarinator-process bnf.g4

I get the Unlexer and Unparser Python files.

Then I use:

grammarinator-generate -l unlexerfile -p unparserfile

I am not sure how to feed in the grammar that needs to be followed to generate the test cases. Can you guide me?
