Language modelling with recurrent networks

Before neural networks, there were n-grams

  • Modelled the frequency of word sequences in a corpus

    • Struggled with low-likelihood word sequences (the sparsity problem)

      • Mitigated somewhat by smoothing and by backoff (see the sketch after this list)

        • e.g. smoothing: add a small value to every count to avoid a zero numerator

        • e.g. backoff: if a 5-gram has never been seen, fall back to the 4-gram

  • These used one-hot representations, so there were no semantic relationships between words (no word embeddings)
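
A minimal sketch of the counting-plus-smoothing idea, assuming a toy corpus and add-k (Laplace) smoothing; the corpus, the counts, and the bigram_prob helper are illustrative, not from the original notes.

```python
from collections import Counter

# Toy corpus; a real n-gram model counts over a large tokenized corpus.
corpus = "the cat sat on the mat the cat ate the fish".split()
vocab = set(corpus)

unigram_counts = Counter(corpus)
bigram_counts = Counter(zip(corpus, corpus[1:]))

def bigram_prob(prev, word, k=1.0):
    """P(word | prev) with add-k (Laplace) smoothing, so unseen bigrams get a small non-zero probability."""
    return (bigram_counts[(prev, word)] + k) / (unigram_counts[prev] + k * len(vocab))

print(bigram_prob("the", "cat"))   # seen bigram: relatively high probability
print(bigram_prob("the", "sat"))   # unseen bigram: small but non-zero thanks to smoothing
```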

Recurrent network

current word -> hidden layer -> predicted next word, with the hidden layer also receiving the previous step's hidden state

A recurrent network learns word embeddings in its hidden layer (similar words appear in similar contexts, so they predict similar next words!)

But an Elman network only looks back one time step
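
A minimal sketch of such a network, assuming PyTorch and illustrative vocabulary, embedding, and hidden sizes; nn.RNN with its default tanh nonlinearity gives the Elman-style recurrence, and the embedding layer holds the learned word representations.

```python
import torch
import torch.nn as nn

class ElmanLM(nn.Module):
    """Current word -> embedding -> hidden (plus previous hidden state) -> next-word scores."""
    def __init__(self, vocab_size, embed_dim=64, hidden_dim=128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)             # learned word embeddings
        self.rnn = nn.RNN(embed_dim, hidden_dim, batch_first=True)   # Elman (vanilla) recurrence
        self.out = nn.Linear(hidden_dim, vocab_size)                 # scores over the next word

    def forward(self, tokens, hidden=None):
        emb = self.embed(tokens)                  # (batch, seq, embed_dim)
        states, hidden = self.rnn(emb, hidden)    # hidden state carries one step of context
        return self.out(states), hidden           # logits for the next word at every position

# Usage: feed token ids and train with cross-entropy against the next word.
model = ElmanLM(vocab_size=10_000)
tokens = torch.randint(0, 10_000, (2, 5))         # dummy batch of token ids
logits, h = model(tokens)
print(logits.shape)                                # torch.Size([2, 5, 10000])
```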

Unfolding (Backpropagation through time)

  • Choose a depth D

  • Break the corpus into sub-sequences of length D+1

  • The same weights are used at every unrolled time step (input-to-hidden, hidden-to-hidden, and hidden-to-output)

But as you train, the gradients shrink as they are propagated back to earlier words (vanishing gradients), making it hard to capture long-distance dependencies in text.
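
A sketch of the unfolding recipe above, assuming the ElmanLM class from the previous sketch and a placeholder corpus of token ids; it breaks the corpus into windows of D+1 tokens and reuses the same weights for every window.

```python
import torch
import torch.nn.functional as F

D = 32                                                # chosen unfolding depth
corpus_ids = torch.randint(0, 10_000, (100_000,))     # placeholder token ids

model = ElmanLM(vocab_size=10_000)                    # the sketch from the previous section
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

# Break the corpus into sub-sequences of D+1 tokens: D inputs plus their next-word targets.
for start in range(0, len(corpus_ids) - (D + 1), D):
    window = corpus_ids[start:start + D + 1]
    inputs = window[:-1].unsqueeze(0)                 # (1, D)
    targets = window[1:].unsqueeze(0)                 # (1, D)

    logits, _ = model(inputs)                         # the same weights are reused at every step
    loss = F.cross_entropy(logits.reshape(-1, logits.size(-1)), targets.reshape(-1))

    optimizer.zero_grad()
    loss.backward()                                   # backpropagation through the D unrolled steps
    optimizer.step()
```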

Evaluating a language model

Perplexity allows you to evaluate a language model: lower perplexity means the model assigns higher probability to held-out text (the maths is below).
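
Filling in the placeholder with the standard definition: perplexity is the inverse probability of the held-out sequence, normalised by the number of words N, which is equivalent to the exponential of the average per-word negative log-likelihood.

```latex
% Perplexity of a language model on a held-out sequence w_1, ..., w_N (lower is better)
\mathrm{PP}(w_1,\dots,w_N)
  = P(w_1,\dots,w_N)^{-\frac{1}{N}}
  = \exp\!\left(-\frac{1}{N}\sum_{i=1}^{N}\log P(w_i \mid w_1,\dots,w_{i-1})\right)
```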
