Merge pull request #1 from m-zakeri/r0.3.0
R0.3.0
Showing 90 changed files with 2,474 additions and 15 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -11,6 +11,38 @@ Fuzz testing (Fuzzing) is a dynamic software testing technique. In this techniqu | |
In this thesis, we propose an automated method for hybrid test data generation. To this end, we apply neural language models (NLMs) built from recurrent neural networks (RNNs). Using deep learning techniques, the proposed models learn the statistical structure of complex files and then generate new textual test data, based on the grammar, and binary data, based on mutations. The generated data is fuzzed by two newly introduced algorithms, called neural fuzz algorithms, that use these models. We use the proposed method to generate test data and then fuzz test MuPDF, a complex program that takes portable document format (PDF) files as input. To train our generative models, we gathered a large corpus of PDF files. Our experiments demonstrate that the data generated by this method increases code coverage by more than 7% compared to state-of-the-art file format fuzzers such as American fuzzy lop (AFL). The experiments also indicate that the simpler NLMs achieve better learning accuracy than the more complicated encoder-decoder model, and confirm that our proposed models can outperform the encoder-decoder model in code coverage when fuzzing the software under test (SUT). | ||
|
||
|
||
## Getting Started | ||
In the current release (0.3.0), you can use IUST-DeepFuzz to generate test data and then fuzz any application. | ||
|
||
### Install | ||
You need Python 3.6.x and up-to-date versions of the TensorFlow and Keras frameworks on your computer. | ||
* Install [Python 3.6.x](https://www.python.org/) | ||
* Install [TensorFlow](https://www.tensorflow.org/) | ||
* Install [Keras](https://keras.io/) | ||
* Clone the IUST-DeepFuzz repository: `git clone https://github.com/m-zakeri/iust_deep_fuzz.git`, or download the latest version from https://github.com/m-zakeri/iust_deep_fuzz.git | ||
* IUST-DeepFuzz is almost ready for test data generation! | ||
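
Before training, it can help to confirm that the prerequisites listed above are in place. The following is a minimal sketch, not part of the repository: the helper name `check_environment` is hypothetical, and it only checks that the interpreter is at least Python 3.6 and that the `tensorflow` and `keras` packages can be found, without importing them.

```python
import importlib.util
import sys


def check_environment(required=("tensorflow", "keras"), min_python=(3, 6)):
    """Hypothetical helper: report whether the install prerequisites are met."""
    # Compare only (major, minor) against the minimum supported version.
    report = {"python_ok": sys.version_info[:2] >= min_python}
    for name in required:
        # find_spec returns None when a top-level package is not installed.
        report[name] = importlib.util.find_spec(name) is not None
    return report


if __name__ == "__main__":
    for item, ok in check_environment().items():
        print(f"{item}: {'ok' if ok else 'missing'}")
```

Any entry reported as `missing` points to an install step that still needs to be completed.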
|
||
### Running | ||
* Configure `config.py` to work with your dataset and to set the other path settings. | ||
* Find the script for the specific algorithm that you need. | ||
* Run the script from the command line: `python script_name.py` | ||
* Wait until the file format is learned and your test data is generated! | ||
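
If you prefer to launch a script from Python rather than from a shell, the run step can be sketched as below. This is an illustration under assumptions: `run_script` is a hypothetical helper, not something the repository provides, and it does nothing more than start the chosen script with the current interpreter and return its exit code.

```python
import subprocess
import sys
from pathlib import Path


def run_script(script_path, cwd=None):
    """Hypothetical helper: run one script with the current Python interpreter."""
    script = Path(script_path).resolve()
    if not script.is_file():
        raise FileNotFoundError(f"script not found: {script}")
    # sys.executable ensures the script sees the same interpreter and packages
    # (TensorFlow, Keras) as the process that launches it.
    result = subprocess.run([sys.executable, str(script)], cwd=cwd)
    return result.returncode
```

For example, `run_script("script_name.py")` is equivalent to `python script_name.py`; a non-zero return value means the script exited with an error.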
|
||
#### Available Pre-trained Models | ||
A pre-trained model is a model that was trained on a large benchmark dataset to solve a problem similar to the one that we want to solve. For the time being, we provide some pre-trained models for the PDF file format. Our best trained model is available at [model_checkpoint/best_models](model_checkpoint/best_models) | ||
|
||
#### Available Fuzzing Scripts | ||
IUST-DeepFuzz implements four new deep models and two new fuzz algorithms, DataNeuralFuzz and MetadataNeuralFuzz, which are our contribution in the thesis mentioned above. The following scripts for generating and fuzzing test data are available in the current release (r0.3.0): | ||
* `data_neural_fuzz.py`: Implements the DataNeuralFuzz algorithm for fuzzing data in the files. | ||
* `metadata_neural_fuzz.py`: Implements the MetadataNeuralFuzz algorithm for fuzzing metadata in the files. | ||
* `learn_and_fuzz_3_sample_fuzz.py`: Implements the SampleFuzz algorithm introduced in https://arxiv.org/abs/1701.07232. | ||
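
To give a feel for the last of these, here is a minimal sketch of SampleFuzz as it is described in the Learn&Fuzz paper (arXiv:1701.07232). It is not the code in `learn_and_fuzz_3_sample_fuzz.py`. The function `toy_model` is a hypothetical stand-in for a trained neural language model, and `MAX_LEN` and the parameter values are assumed for illustration. At each step a character is sampled from the model; if a random draw exceeds `t_fuzz` and the sampled character is highly probable (above `p_t`), it is replaced by the least likely character.

```python
import random

MAX_LEN = 200  # assumed reset threshold for overly long objects


def toy_model(seq):
    """Hypothetical stand-in for a trained model: returns P(next char | seq)."""
    template = "obj << /Type /Page >> endobj"
    expected = template[len(seq) % len(template)]
    alphabet = sorted(set(template) | {"#"})
    # Put most of the mass on the expected character, spread the rest evenly.
    rest = 0.1 / (len(alphabet) - 1)
    return {ch: (0.9 if ch == expected else rest) for ch in alphabet}


def sample_fuzz(model, t_fuzz, p_t, rng):
    """Generate one object, fuzzing where the model is most confident."""
    seq = "obj "
    while not seq.endswith("endobj"):
        dist = model(seq)
        chars = list(dist)
        # Sample the next character from the learned distribution.
        c = rng.choices(chars, weights=[dist[ch] for ch in chars])[0]
        p_fuzz = rng.random()
        if p_fuzz > t_fuzz and dist[c] > p_t:
            # Replace a confident prediction with the least likely character.
            c = min(dist, key=dist.get)
        seq += c
        if len(seq) > MAX_LEN:
            seq = "obj "  # reset the sequence and start over
    return seq


if __name__ == "__main__":
    print(sample_fuzz(toy_model, t_fuzz=0.9, p_t=0.5, rng=random.Random(0)))
```

Fuzzing only where the model is confident is the design idea of the paper: the anomalies land in places the model has learned to be well formed, while most of the object stays valid enough to be parsed.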
|
||
#### Available Dataset | ||
Various file formats for learning with IUST-DeepFuzz and then fuzz testing are available in the [dataset directory](dataset). | ||
|
||
|
||
## How It Works | ||
|
||
### The PDF File Generation Process | ||
![amazing_test_data_generation_process](docs/figs/amazing_test_data_generation_process.gif) | ||
|
||
|
@@ -20,13 +52,6 @@ In this thesis, we proposed an automated method for hybrid test data generation. | |
|
||
|
||
|
||
## About | ||
### Version 0.1 | ||
The main purpose of this version is to provide a free implementation of the Learn&Fuzz paper and to improve the **learn\&fuzz algorithm**. | ||
|
||
### Version 0.2 | ||
This version implements four new deep models and two new fuzz algorithms, DataNeuralFuzz and MetadataNeuralFuzz, which are our contribution in the thesis mentioned above. | ||
|
||
### FAQs | ||
This repository is under *active development* and is not yet well documented. If you have downloaded or forked the source code and have any questions, feel free to email me (*[email protected]*) for more information. You may also consult the main [references](REFERENCES.md) or look at our large [test corpus](dataset). | ||
|
||
|