Learning
Connectionist models are powerful, but how do they learn?
Neuron
v = (sum of each input times its corresponding weight) + bias
i.e. the dot product of x and w, plus b (see the sketch after this list)
Feed v into an activation function to get the output
hardlim (0, 1 output)
sigmoid (between 0 and 1 curve)
tanh (-1 and 1 curve)
ReLU (between 0 and infinity)
linear (just output v)
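A minimal sketch of this computation in NumPy, with the activation functions listed above; the input, weight, and bias values are invented for illustration:

```python
import numpy as np

# A single neuron: v = w . x + b, then an activation function.
def hardlim(v): return np.where(v >= 0, 1.0, 0.0)  # step: outputs 0 or 1
def sigmoid(v): return 1.0 / (1.0 + np.exp(-v))    # squashes to (0, 1)
def tanh(v):    return np.tanh(v)                  # squashes to (-1, 1)
def relu(v):    return np.maximum(0.0, v)          # clips negatives to 0
def linear(v):  return v                           # identity: just output v

x = np.array([0.5, -1.0, 2.0])   # inputs (made up)
w = np.array([0.4,  0.3, 0.1])   # weights (made up)
b = 0.2                          # bias

v = np.dot(x, w) + b             # weighted sum of inputs plus bias
for f in (hardlim, sigmoid, tanh, relu, linear):
    print(f.__name__, f(v))
```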
The activation function provides the non-linear behaviour
On its own, each neuron can only represent a shifted/scaled copy of its activation function
But combined, neurons can approximate non-linear functions!
Thus we need multi-layer networks
Perceptron is a single layer neural network
A multi-layer perceptron (fully connected network) can learn many non-linear functions, e.g. XOR (sketch below)!
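As an illustration of why layers matter: a hand-wired two-layer network with step (hardlim) units can compute XOR, which no single neuron can. The weights and thresholds below are one known solution, chosen by hand rather than learned:

```python
import numpy as np

def step(v):
    return np.where(v >= 0, 1, 0)

def xor_mlp(x1, x2):
    h_or  = step(x1 + x2 - 0.5)      # hidden unit 1 computes OR
    h_and = step(x1 + x2 - 1.5)      # hidden unit 2 computes AND
    return step(h_or - h_and - 0.5)  # output: OR and not AND = XOR

for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2, "->", xor_mlp(x1, x2))
```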
Learning tasks
Regression
e.g. fitting a curve to sampled (x, y) values when the true curve is unknown
Evaluate with mean squared error (MSE)
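A minimal sketch of MSE, the mean of the squared residuals; the target and prediction values are made up:

```python
import numpy as np

y_true = np.array([1.0, 2.0, 3.0])      # sampled targets (invented)
y_pred = np.array([1.1, 1.9, 3.3])      # model outputs (invented)

mse = np.mean((y_true - y_pred) ** 2)   # average of squared residuals
print(mse)
```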
Classification
e.g. map vectors / images to labels
Binary classification
This or that
A sigmoid output can be read as a probability between 0 and 1
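A sketch of reading a sigmoid output as P(class = 1) and thresholding at 0.5; the pre-activation value v is invented:

```python
import math

v = 0.8                           # output neuron's pre-activation (made up)
p = 1.0 / (1.0 + math.exp(-v))    # probability of the positive class
label = int(p >= 0.5)             # binary decision: this or that
print(p, label)                   # ~0.69 -> class 1
```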
Multi-class classification
One hot encoding
One output neuron per class (class 1 neuron, class 2 neuron, ...); feed the outputs into softmax to convert them into a probability distribution
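A sketch of softmax over three class neurons, compared against a one-hot target; the logits and target here are invented:

```python
import numpy as np

logits = np.array([2.0, 1.0, 0.1])      # raw outputs, one neuron per class
probs = np.exp(logits - logits.max())   # subtract max for numerical stability
probs /= probs.sum()                    # now a probability distribution

target = np.array([1, 0, 0])            # one-hot encoding of class 1
print(probs, probs.sum())               # probabilities sum to 1
```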
Evaluation
Each predicted label is either correct or incorrect
Accuracy: Fraction of correct labels over N examples
Errors: Fraction of incorrect labels over N examples
Cross entropy: negative log-likelihood of the true class labels under the model's predicted distribution (sketch below)
Works well with sigmoid or softmax outputs
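A sketch computing all three measures on a tiny invented batch of softmax outputs:

```python
import numpy as np

# probs[i] is the model's softmax output for example i (invented values);
# labels holds the true class indices.
probs = np.array([[0.7, 0.2, 0.1],
                  [0.1, 0.8, 0.1],
                  [0.3, 0.3, 0.4]])
labels = np.array([0, 1, 2])

preds = probs.argmax(axis=1)
accuracy = np.mean(preds == labels)     # fraction of correct labels
error = 1.0 - accuracy                  # fraction of incorrect labels

# Cross entropy: negative log-probability of the true class, batch average.
ce = -np.mean(np.log(probs[np.arange(len(labels)), labels]))
print(accuracy, error, ce)
```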
Changing weights
Gradients can only be calculated on smooth (differentiable) functions
Accuracy and classification error aren't smooth!
MSE and CE are
Adjust each weight using the gradient of the loss with respect to that weight, stepping so as to minimise the loss
The output should then move closer to the desired output
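A sketch of one such update on a single linear neuron with MSE loss; the example, initial weights, and learning rate are all invented:

```python
import numpy as np

x = np.array([1.0, 2.0]); y = 1.5   # one training example (made up)
w = np.array([0.1, 0.1]); b = 0.0   # initial weights and bias
lr = 0.01                           # learning rate (step size)

y_hat = w @ x + b                   # forward pass
grad_w = 2 * (y_hat - y) * x        # dLoss/dw by the chain rule
grad_b = 2 * (y_hat - y)            # dLoss/db

w -= lr * grad_w                    # step against the gradient...
b -= lr * grad_b
print(y_hat, w @ x + b)             # ...output moves towards y = 1.5
```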
Optimisation
The network is essentially a function of its inputs, weights and biases
Optimisation is the process of gradually changing the weights and biases to minimise the loss function
Stochastic Gradient Descent (SGD)
Optimisers: Adam, RMSProp
Gradient tells us which direction to go, but not how far.
Some optimisers will try different approaches to reach the goal quickly
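A sketch of swapping optimisers in PyTorch (the framework is an assumption here; the notes name SGD, Adam and RMSProp but no library):

```python
import torch

model = torch.nn.Linear(3, 1)                           # toy model

opt = torch.optim.SGD(model.parameters(), lr=0.01)      # plain SGD
# opt = torch.optim.Adam(model.parameters(), lr=1e-3)   # adaptive per-weight steps
# opt = torch.optim.RMSprop(model.parameters(), lr=1e-3)

x = torch.randn(8, 3); y = torch.randn(8, 1)            # made-up batch
loss = torch.nn.functional.mse_loss(model(x), y)
opt.zero_grad()   # clear old gradients
loss.backward()   # backpropagate dLoss/dWeights
opt.step()        # gradient gives the direction; the optimiser decides the step
```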
Backpropagation
Forward computation of output
Backpropagation of gradient/derivative information through the network
Distribution of 'blame' for incorrect output
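A hand-worked sketch of both passes on a one-hidden-layer sigmoid network with squared-error loss; the data and weights are random placeholders, and the point is the order of computation: forward to the output, then gradients ('blame') flowing backwards layer by layer:

```python
import numpy as np

def sigmoid(v): return 1.0 / (1.0 + np.exp(-v))

rng = np.random.default_rng(0)
x = rng.normal(size=3); y = 1.0                # one example (made up)
W1 = rng.normal(size=(4, 3)); b1 = np.zeros(4) # hidden layer: 4 units
W2 = rng.normal(size=4);      b2 = 0.0         # output layer: 1 unit

# Forward computation of the output
h = sigmoid(W1 @ x + b1)
y_hat = sigmoid(W2 @ h + b2)
loss = (y_hat - y) ** 2

# Backpropagation: chain rule, layer by layer
d_yhat = 2 * (y_hat - y)             # dLoss/dy_hat
d_v2 = d_yhat * y_hat * (1 - y_hat)  # through the output sigmoid
dW2 = d_v2 * h; db2 = d_v2           # output-layer gradients
d_h = d_v2 * W2                      # 'blame' pushed back to the hidden layer
d_v1 = d_h * h * (1 - h)             # through the hidden sigmoids
dW1 = np.outer(d_v1, x); db1 = d_v1  # hidden-layer gradients
```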