Update: From Wieland Brendel - this repository has been modified to fit the API of Foolbox (https://foolbox.readthedocs.io/en/latest/modules/zoo.html).
Update: From Logan Engstrom - this repository has been modified to fit the API for the https://www.robust-ml.org/ project.
----
Update: this repository is out of date. It contains strictly less
useful code than the repository at the following URL:
https://github.com/carlini/nn_robust_attacks
In particular, do not use the l0 attack in this repository; it is
only good at breaking defensive distillation (not other defenses).
----
Defensive Distillation was recently proposed as a defense against
adversarial examples.
Unfortunately, distillation is not secure. We show this in our paper, at
http://nicholas.carlini.com/papers/2016_defensivedistillation.pdf
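
As a rough illustration (a hedged sketch, not code from this repository),
defensive distillation trains the teacher and student networks on a
softmax softened by a high temperature T, and then deploys the student
at T = 1:

    import numpy as np

    def softmax_at_temperature(logits, T=100.0):
        # Dividing the logits by T before the softmax produces the "soft"
        # labels used to train the distilled (student) network; at test
        # time the defended model is run with T = 1.
        z = logits / T
        z = z - z.max()        # subtract the max for numerical stability
        e = np.exp(z)
        return e / e.sum()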
We strongly believe that research should be reproducible, and so we are
releasing the code required to train a baseline model on MNIST, train
a defensively distilled model on MNIST, and attack the defensively
distilled model.
To run the code, you will need Python 3.x with TensorFlow. It will be slow
unless you have a GPU to train on.
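
For example, a quick sanity check (assuming a TensorFlow version that
provides tf.test.is_gpu_available, as 1.x-era releases do) that
TensorFlow is importable and can see a GPU:

    import tensorflow as tf

    print(tf.__version__)
    # False means training will run on the CPU, which will be slow.
    print(tf.test.is_gpu_available())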
Begin by running train_baseline.py and train_distillation.py; this will
create three model files, two of which are useful. They should report a
final accuracy of around 99.3% +/- 0.2%.
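
For example (the exact invocation may differ on your system):

    python3 train_baseline.py
    python3 train_distillation.py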
To construct adversarial examples, run l0_attack.py, passing as argument
either models/baseline or models/distilled. This will run the modified l0
adversary on the given model. The success rate should be ~95%,
modifying ~35 pixels.
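
For example (again a sketch; adjust the paths to wherever the training
scripts saved the models):

    python3 l0_attack.py models/baseline
    python3 l0_attack.py models/distilled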