Word Vector Representations
Word embeddings map words to dense, real-valued vectors, so that words used in similar contexts end up close together in the vector space.
Word2Vec
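Skip-gram: predict the surrounding context words from the center word. CBOW (continuous bag-of-words): predict the center word from its surrounding context. Negative sampling speeds up training by scoring each true pair against a few sampled noise words instead of computing a softmax over the whole vocabulary.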
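To make the two training setups concrete, here is a minimal sketch of how training examples could be built from a token list. The function names, the window size, and the uniform negative sampler are assumptions for illustration; Word2Vec itself draws negatives from a smoothed unigram distribution rather than uniformly.

```python
import random


def skipgram_pairs(tokens, window=2):
    """Skip-gram view: one (center, context) pair per neighbour in the window."""
    pairs = []
    for i, center in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs


def cbow_examples(tokens, window=2):
    """CBOW view: one (context_words, target) example per position."""
    examples = []
    for i, target in enumerate(tokens):
        context = tokens[max(0, i - window):i] + tokens[i + 1:i + window + 1]
        if context:
            examples.append((context, target))
    return examples


def sample_negatives(vocab, positive, k, rng):
    """Simplified negative sampling: k noise words drawn uniformly (an assumption)."""
    candidates = [w for w in vocab if w != positive]
    return [rng.choice(candidates) for _ in range(k)]


tokens = "the cat sat on the mat".split()
print(skipgram_pairs(tokens, window=1)[:3])
print(cbow_examples(tokens, window=1)[:2])
print(sample_negatives(sorted(set(tokens)), "cat", 3, random.Random(0)))
```

Each skip-gram pair is one positive example; the sampled noise words act as negatives for the same center word.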
Pre-trained: GoogleNews vectors (Word2Vec, 300 dimensions), GloVe vectors (Stanford).
Contextual Embeddings
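ELMo: bidirectional LSTM. BERT: transformer-based. Both give a different embedding for the same word depending on its context, so "bank" in "river bank" and "bank account" gets different vectors.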
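The difference between static and contextual embeddings can be shown with a toy sketch. Everything here is hypothetical: the 2-d vectors are made up, and the "contextual" function just averages a word with its neighbours. It is a stand-in for a real encoder and is not how ELMo or BERT compute their representations.

```python
# Made-up 2-d static table, for illustration only.
STATIC = {
    "river": [1.0, 0.0],
    "bank": [0.5, 0.5],
    "money": [0.0, 1.0],
}


def static_embed(tokens):
    """Static lookup: one fixed vector per word, whatever the sentence."""
    return [STATIC[t] for t in tokens]


def toy_contextual_embed(tokens):
    """Toy contextual encoder: average each word with its immediate neighbours."""
    static = static_embed(tokens)
    out = []
    for i in range(len(tokens)):
        window = static[max(0, i - 1):i + 2]
        out.append([sum(v[d] for v in window) / len(window) for d in range(2)])
    return out


a = ["river", "bank"]
b = ["money", "bank"]

# Static: "bank" gets the same vector in both phrases.
print(static_embed(a)[1] == static_embed(b)[1])

# Contextual: the vector for "bank" shifts toward its neighbour.
print(toy_contextual_embed(a)[1], toy_contextual_embed(b)[1])
```

The static table cannot separate the two senses of "bank"; any function of the whole sentence can, which is the property the contextual models above provide.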
Using Embeddings
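Train your own model with gensim.models.Word2Vec. Load pre-trained vectors with gensim.models.KeyedVectors.load_word2vec_format (pass binary=True for binary files such as the GoogleNews .bin release).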
Key Takeaways
- Word2Vec and GloVe provide static embeddings
- ELMo and BERT provide contextual embeddings
- Pre-trained vectors are useful for downstream tasks