Word Representation

How to encode words (the simple way)

  • One-hot

    • Doesn't encode any similarity relationships between words: we want apple to be a bit like banana

    • There are a lot of words! So each word has to be a vector as long as the entire vocabulary (sketched below)
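A minimal sketch of one-hot encoding; the toy vocabulary here is made up for illustration:

```python
import numpy as np

# Toy vocabulary -- in practice this would be tens of thousands of words.
vocab = ["apple", "banana", "king", "queen"]
word_to_index = {word: i for i, word in enumerate(vocab)}

def one_hot(word: str) -> np.ndarray:
    """Return a vector of length |vocab| with a single 1 at the word's index."""
    vec = np.zeros(len(vocab))
    vec[word_to_index[word]] = 1.0
    return vec

# Every pair of distinct words has dot product 0: one-hot encodes
# no similarity at all between 'apple' and 'banana'.
print(one_hot("apple") @ one_hot("banana"))  # 0.0
```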

But we want a denser method, one that encodes similarities.

  • Cluster words based on usage

  • Meanings are best represented in a continuous space (instead of as traditional discrete meanings)

Use a language corpus (a sample of how language is used) to determine 'meaning'.

From this we can build 'co-occurrence'-based representations:

  • Define the context of a word as the n words around it (left and right)

    • Store the co-occurrence counts as a matrix

    • A vocabulary of N words gives an N-dimensional 'usage' space

    • Use PCA or SVD to reduce the dimensionality

    • Each word is now a point in a lower-dimensional space (see the sketch below)
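A rough sketch of the whole pipeline; the toy corpus and window size are placeholders:

```python
import numpy as np

corpus = "the king rules the land the queen rules the land".split()
vocab = sorted(set(corpus))
idx = {w: i for i, w in enumerate(vocab)}
window = 2  # the context is the n = 2 words either side

# Count how often each pair of words co-occurs within the window.
counts = np.zeros((len(vocab), len(vocab)))
for i, word in enumerate(corpus):
    for j in range(max(0, i - window), min(len(corpus), i + window + 1)):
        if j != i:
            counts[idx[word], idx[corpus[j]]] += 1

# SVD factorises the matrix; keeping the top-k singular vectors gives
# each word a point in a k-dimensional 'usage' space.
U, S, Vt = np.linalg.svd(counts)
k = 2
embeddings = U[:, :k] * S[:k]
print({w: embeddings[idx[w]].round(2) for w in vocab})
```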

Word2Vec

  • Iterate through each word in a corpus, progressively learning a probability model that predicts the words around it

  • Use a maximum likelihood approach

  • No weights, only the vectors themselves

  • The dot product measures the similarity between two vectors (but with a bias towards large magnitudes)

    • If w1 and w2 have similar context words, then maximising each one's similarity with those shared context words also maximises the similarity between w1 and w2 (a sketch follows)
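A minimal skip-gram sketch in PyTorch, assuming a toy vocabulary; the sizes and the plain softmax objective are illustrative (real Word2Vec uses negative sampling or hierarchical softmax for speed):

```python
import torch
import torch.nn.functional as F

vocab_size, dim = 10, 8
# "No weights, only the vectors themselves": each word gets one vector
# as a centre word and one as a context word.
centre = torch.randn(vocab_size, dim, requires_grad=True)
context = torch.randn(vocab_size, dim, requires_grad=True)
opt = torch.optim.SGD([centre, context], lr=0.1)

def step(c: int, o: int) -> None:
    """One maximum-likelihood update: raise p(context word o | centre word c)."""
    scores = context @ centre[c]  # dot product of centre word c with every context vector
    loss = F.cross_entropy(scores.unsqueeze(0), torch.tensor([o]))
    opt.zero_grad()
    loss.backward()
    opt.step()

# Training slides a window over the corpus and calls step() for every
# (centre, context) pair the window produces.
step(c=3, o=7)
```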

This was very powerful because it clusters similar meanings together and makes semantic relationships linear: v_woman - v_man gives a semantic change from man to woman (so add it to v_king to get v_queen!).
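The analogy arithmetic itself is just vector addition plus a nearest-neighbour search by cosine similarity; a sketch with placeholder embeddings (random vectors stand in for trained ones):

```python
import numpy as np

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Trained embeddings would be loaded from a model; random vectors stand in here.
rng = np.random.default_rng(0)
embeddings = {w: rng.normal(size=8) for w in ["king", "queen", "man", "woman"]}

# v_king - v_man + v_woman: apply the 'man -> woman' direction to 'king'.
target = embeddings["king"] - embeddings["man"] + embeddings["woman"]
candidates = (w for w in embeddings if w not in {"king", "man", "woman"})
best = max(candidates, key=lambda w: cosine(embeddings[w], target))
print(best)  # with real embeddings this comes out as 'queen'
```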

GloVe then made these relationships a key design feature, combining global co-occurrence counts with Word2Vec-style vector learning to get the best of both.

  • Focused on ratios of co-occurrence probabilities
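The motivating observation is that ratios of co-occurrence probabilities, P(k | i) / P(k | j), discriminate meaning better than the raw probabilities themselves. For reference, GloVe's published least-squares objective over the co-occurrence matrix X, where f is a weighting function that downweights rare and very frequent pairs:

```latex
J = \sum_{i,j=1}^{V} f(X_{ij}) \left( w_i^\top \tilde{w}_j + b_i + \tilde{b}_j - \log X_{ij} \right)^2
```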

Words that are close together in embedding space can be antonyms as well as synonyms.

Words with multiple meanings may be close to different groups of words in different 'directions' in embedding space.

Embeddings incorporate human biases present in the training corpus.
