Neural Networks
A neural network trained with the backpropagation algorithm learns to predict the output from the input.
Synapses connect two layers of the network together. The values carried on these synapses are called weights.
A sigmoid function squashes input values into the range 0 to 1, i.e. it is a normalising function. We use it to convert numbers into probabilities. The slope of the sigmoid is derived as follows: if y = 1 / (1 + e^(-x)), then dy/dx = e^(-x) / (1 + e^(-x))^2 = y * (1 - y), so the gradient can be computed directly from the neuron's output.
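A minimal sketch of both functions in numpy (the names sigmoid and sigmoid_gradient are chosen here for illustration, not taken from this repository's code):

```python
import numpy as np

def sigmoid(x):
    # Squashes any real number into the range (0, 1).
    return 1 / (1 + np.exp(-x))

def sigmoid_gradient(output):
    # Slope of the sigmoid expressed through its own output:
    # if y = sigmoid(x), then dy/dx = y * (1 - y).
    return output * (1 - output)

print(sigmoid(0.0))           # 0.5, the midpoint of the curve
print(sigmoid_gradient(0.5))  # 0.25, the maximum slope (at the midpoint)
```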
Random numbers are generated by an algorithm (a function with a very large period), so although a generated sequence appears random, over the whole period it always follows a pattern. A seed is the starting point for the generation of random numbers: using the same seed produces the same sequence of random numbers, which makes the sequence deterministic. This property is useful while debugging. If we are running an experiment with random values and need to repeat it with the same sequence of random values, we should use the same seed.
numpy.random.random is an alias for numpy.random.random_sample
(https://stackoverflow.com/questions/47231852/np-random-rand-vs-np-random-random)
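A short sketch of both points above: re-using a seed replays the same sequence, and numpy.random.random draws the same values as numpy.random.random_sample:

```python
import numpy as np

np.random.seed(1)                 # fix the generator's starting point
first = np.random.random((3, 1))  # same call as np.random.random_sample((3, 1))

np.random.seed(1)                 # re-seeding replays the exact same sequence
second = np.random.random((3, 1))

print(np.array_equal(first, second))  # True: identical "random" numbers
```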
xrange() is more memory efficient than range() while looping, because it generates values lazily instead of building the whole list first, which also makes it faster; in Python 3, range() behaves the way xrange() did.
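A small Python 3 sketch of the same idea (Python 3 removed xrange(); its range() plays the same lazy role):

```python
# Looping over a lazy range never materialises a ten-million-element list,
# which is exactly the memory saving xrange() offered in Python 2.
total = 0
for i in range(10000000):
    total += i
print(total)
```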
Since we are training on the entire input dataset at the same time in every iteration, this is "full batch training".
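A minimal sketch of what a full batch looks like in practice; the input matrix and weights below are illustrative assumptions, not this repository's exact values:

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

inputs = np.array([[0, 0, 1],          # 4 training examples, 3 inputs each
                   [1, 1, 1],
                   [1, 0, 1],
                   [0, 1, 1]])
weights = np.array([[0.5], [-0.2], [0.1]])   # one illustrative weight per input

# One "full batch" forward pass: every example goes through the network
# in a single matrix multiplication.
outputs = sigmoid(np.dot(inputs, weights))
print(outputs)   # one prediction per training example, computed at once
```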
The weights encode the network's predictions. A large weight (positive or negative) signifies a very confident prediction; the idea of this algorithm is to preserve these highly confident predictions and erase (or reduce) the less confident ones. This is done by multiplying the error by the gradient of the sigmoid at the neuron's output: where the output is confident, the curve is shallow and the weight is barely adjusted.
A nonlinear pattern in the dataset means there isn't a one-to-one relationship between any single input and the output, but there is a one-to-one relationship between a combination of inputs and the output.
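A tiny hypothetical example of such a pattern (not necessarily this repository's dataset): neither input predicts the output on its own, but their XOR does:

```python
import numpy as np

inputs  = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
outputs = np.array([[0], [1], [1], [0]])          # output = input1 XOR input2

print((inputs[:, 0:1] == outputs).all())          # False: column 1 alone fails
print((inputs[:, 1:2] == outputs).all())          # False: column 2 alone fails
print(((inputs[:, 0:1] ^ inputs[:, 1:2]) == outputs).all())  # True: the combination works
```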
We give each input a weight, which can be positive or negative. The larger the magnitude of the weight, the stronger the effect of that particular input on the output.
The "Error Weighted Derivative Formula" helps to decide how much we need to adjust the weights in each training cycle.
Weight += error * input * SigmoidCurveGradient(output)
Why this formula? First, we want to make the adjustment proportional to the size of the error. Secondly, we multiply by the input, which is either 0 or 1: if the input is 0, the weight isn't adjusted. Finally, we multiply by the gradient of the Sigmoid curve (Diagram 4). To understand this last factor, consider that:
- We used the Sigmoid curve to calculate the output of the neuron.
- If the weighted sum of the inputs is a large positive or negative number, the output is close to 1 or 0, which signifies the neuron was quite confident one way or another.
- From the graph of the Sigmoid function, we can see that at such large inputs the Sigmoid curve has a shallow gradient.
- If the neuron is confident that the existing weight is correct, it doesn’t want to adjust it very much. Multiplying by the Sigmoid curve gradient achieves this (since the weight increment will be reduced when the slope is shallow).
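Putting the pieces together, here is a minimal sketch of the full-batch training loop using the error weighted derivative update described above. The dataset, seed, and number of iterations are illustrative assumptions rather than this repository's exact values:

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def sigmoid_gradient(output):
    # Gradient of the sigmoid, written in terms of its output.
    return output * (1 - output)

# Illustrative training set: 4 examples, 3 inputs each; the output happens
# to equal the first input column.
training_inputs = np.array([[0, 0, 1],
                            [1, 1, 1],
                            [1, 0, 1],
                            [0, 1, 1]])
training_outputs = np.array([[0, 1, 1, 0]]).T

np.random.seed(1)                            # reproducible "random" starting weights
weights = 2 * np.random.random((3, 1)) - 1   # one weight per input, in (-1, 1)

for _ in range(10000):                       # full batch: every example, every cycle
    output = sigmoid(np.dot(training_inputs, weights))   # forward pass
    error = training_outputs - output                    # how far off we are
    # Error weighted derivative: error * input * gradient of the sigmoid.
    adjustment = np.dot(training_inputs.T, error * sigmoid_gradient(output))
    weights += adjustment

# A new situation [1, 0, 0] should produce an output close to 1.
print(sigmoid(np.dot(np.array([1, 0, 0]), weights)))
```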