Learning

Connectionist models are powerful, but how do they learn?

Neuron

  • v = (sum of each input multiplied by its weight) + bias

    • or dot product of x and w

  • Feed v into an activation function to get the output (see the sketch after this list)

    • hardlim (outputs exactly 0 or 1)

    • sigmoid (between 0 and 1 curve)

    • tanh (-1 and 1 curve)

    • ReLU (between 0 and inf)

    • linear (just output v)

  • The activation function provides the non-linear behaviour

    • A single neuron alone can only produce a shifted, scaled copy of its activation function

    • But combined they can approximate non-linear functions!

      • thus we need multi-layer networks
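
A minimal sketch of a single neuron in NumPy (names and values are illustrative placeholders, not from any particular library):

```python
import numpy as np

def neuron(x, w, b, activation):
    """Single neuron: weighted sum of inputs plus bias, then activation."""
    v = np.dot(x, w) + b          # v = x . w + b
    return activation(v)

# A few common activation functions
sigmoid = lambda v: 1.0 / (1.0 + np.exp(-v))   # squashes to (0, 1)
tanh    = np.tanh                              # squashes to (-1, 1)
relu    = lambda v: np.maximum(0.0, v)         # clips negatives to 0

x = np.array([0.5, -1.2, 3.0])   # example inputs
w = np.array([0.4,  0.1, -0.6])  # example weights
b = 0.2

print(neuron(x, w, b, sigmoid))
```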

A perceptron is a single-layer neural network

A multi-layer perceptron (a fully connected network) can learn many non-linear functions!
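
To see why stacking layers matters, here is a rough two-layer forward pass whose hand-picked example weights give XOR-like behaviour, something a single neuron cannot compute:

```python
import numpy as np

def relu(v):
    return np.maximum(0.0, v)

def mlp_forward(x, W1, b1, W2, b2):
    """Two fully connected layers: hidden ReLU layer, then linear output."""
    h = relu(W1 @ x + b1)   # hidden layer introduces the non-linearity
    return W2 @ h + b2      # linear output layer

W1 = np.array([[1.0, 1.0], [1.0, 1.0]])
b1 = np.array([0.0, -1.0])
W2 = np.array([[1.0, -2.0]])
b2 = np.array([0.0])

# This particular network computes XOR on {0,1}^2
for x in ([0, 0], [0, 1], [1, 0], [1, 1]):
    print(x, mlp_forward(np.array(x, dtype=float), W1, b1, W2, b2))
```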

Learning tasks

Regression

e.g. fitting sampled (x, y) values to some unknown curve

  • Evaluate with mean squared error (MSE; sketched below)
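
A quick sketch of MSE with made-up numbers:

```python
import numpy as np

def mse(y_true, y_pred):
    """Mean squared error: average of the squared differences."""
    return np.mean((y_true - y_pred) ** 2)

y_true = np.array([1.0, 2.0, 3.0])
y_pred = np.array([1.1, 1.9, 3.4])
print(mse(y_true, y_pred))  # (0.01 + 0.01 + 0.16) / 3 = 0.06
```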

Classification

e.g. map vectors / images to labels

Binary classification

This or that

  • Sigmoid is good for emulating a probability between 0 and 1 (see the sketch below)
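
A small illustration of reading the sigmoid output as a probability (the input scores are chosen arbitrarily):

```python
import numpy as np

def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))

# Large positive v maps near 1, large negative v near 0,
# so the output can be read as P(class = 1)
for v in (-4.0, 0.0, 4.0):
    print(v, sigmoid(v))

# Typical decision rule: predict class 1 when sigmoid(v) >= 0.5
```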

Multi-class classification

  • One-hot encoding

  • One output neuron per class (class 1 neuron, class 2 neuron, ...); feed the outputs into softmax to convert them into a probability distribution (sketched below)
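
A sketch of softmax and one-hot encoding, assuming example per-class scores:

```python
import numpy as np

def softmax(v):
    """Convert raw per-class scores into a probability distribution."""
    e = np.exp(v - np.max(v))   # subtract max for numerical stability
    return e / e.sum()

scores = np.array([2.0, 1.0, 0.1])   # one raw score per class
probs = softmax(scores)
print(probs, probs.sum())            # probabilities sum to 1

# One-hot encoding of class index 0 in a 3-class problem
target = np.eye(3)[0]                # [1., 0., 0.]
```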

Evaluation

  • Each predicted label is either correct or incorrect

  • Accuracy: Fraction of correct labels over N examples

  • Errors: Fraction of incorrect labels over N examples

  • Cross entropy: the negative log-likelihood of the model assigning the correct class labels

    • Works well with sigmoid or softmax outputs (see the sketch after this list)
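
A sketch of cross entropy on an assumed softmax output:

```python
import numpy as np

def cross_entropy(one_hot_target, probs):
    """Negative log-probability the model assigns to the true class."""
    return -np.sum(one_hot_target * np.log(probs))

probs = np.array([0.7, 0.2, 0.1])    # example softmax output
target = np.array([1.0, 0.0, 0.0])   # true class is class 0
print(cross_entropy(target, probs))  # -log(0.7) = 0.357

# Confident correct predictions give low loss; confident wrong ones, high loss
```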

Changing weights

  • Derivatives can only be calculated on smooth (differentiable) functions

    • Accuracy and classification error aren't smooth!

    • MSE and CE are

  • Adjust weights using the gradient of the loss with respect to each weight, stepping so as to minimise the loss (see the sketch after this list)

    • Each step should move the output closer to the desired output
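
A toy single-weight example of one gradient step (all numbers are made up for illustration):

```python
# One gradient step on a single weight for a linear model y = w * x,
# with loss L = (y - t)^2
w, x, t = 0.5, 2.0, 3.0
y = w * x
grad = 2 * (y - t) * x        # dL/dw by the chain rule
learning_rate = 0.1
w = w - learning_rate * grad  # step against the gradient to reduce loss
print(w)                      # moves toward the value that fits t
```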

Optimisation

The network is essentially a function of its inputs, weights, and biases

Optimisation is the process of gradually changing the weights and biases to minimise the loss function.

  • Stochastic Gradient Descent (SGD)

  • Optimisers: Adam, RMSProp

Gradient tells us which direction to go, but not how far.

  • Some optimisers (e.g. Adam, RMSProp) adapt the step size to reach the minimum more quickly (see the sketch after this list)
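
A sketch of plain SGD on the same toy one-weight model; the fixed learning rate is the "how far" (adaptive optimisers like Adam and RMSProp adjust the step size per weight instead):

```python
# Plain gradient descent on the toy model y = w * x, loss (y - t)^2
w, x, t = 0.0, 2.0, 3.0
learning_rate = 0.05          # fixed step size: the "how far"
for step in range(20):
    grad = 2 * (w * x - t) * x
    w -= learning_rate * grad
print(w)                      # converges toward t / x = 1.5
```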

Backpropagation

  • Forward computation of output

  • Backpropagation of gradient/derivative information through the network

    • Distribution of 'blame' for incorrect output (sketched below)
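
A sketch of manual backpropagation through a tiny two-weight network, applying the chain rule to distribute blame (the weights and target are arbitrary example values):

```python
import numpy as np

def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))

# Tiny network: y = w2 * sigmoid(w1 * x), loss L = (y - t)^2
w1, w2, x, t = 0.5, -1.0, 1.5, 0.0

# Forward pass: compute and store intermediate values
h = sigmoid(w1 * x)
y = w2 * h
loss = (y - t) ** 2

# Backward pass: chain rule pushes gradients back through the network
dL_dy  = 2 * (y - t)
dL_dw2 = dL_dy * h                  # blame on the output weight
dL_dh  = dL_dy * w2
dL_dw1 = dL_dh * h * (1 - h) * x    # blame on the hidden weight
print(dL_dw1, dL_dw2)
```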
