Describe Images with Text
Image captioning: generate a natural-language description of an image.
Architecture
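Encoder-decoder. A CNN encodes the image into features. An RNN language model decodes those features into text, one word at a time. Attention optionally lets the decoder focus on different image regions at each step.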
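A minimal sketch of the encoder-decoder flow, assuming NumPy and random (untrained) weights. The names `encode`, `decoder_step`, `greedy_caption`, the toy vocabulary, and all dimensions are hypothetical, and average pooling stands in for a real CNN. It only illustrates how image features seed a decoder that emits one word per step.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy vocabulary and sizes, chosen only for illustration.
VOCAB = ["<start>", "<end>", "a", "dog", "on", "grass"]
FEAT_DIM, HID_DIM = 8, 16

# Random weights; a real model would learn these from image-caption pairs.
W_img = rng.normal(size=(FEAT_DIM, HID_DIM)) * 0.1    # image features -> initial state
W_emb = rng.normal(size=(len(VOCAB), HID_DIM)) * 0.1  # word embeddings
W_hh = rng.normal(size=(HID_DIM, HID_DIM)) * 0.1      # recurrent weights
W_out = rng.normal(size=(HID_DIM, len(VOCAB))) * 0.1  # hidden state -> vocabulary logits


def encode(image):
    """Stand-in for a CNN: average-pool the image into a FEAT_DIM vector."""
    return image.reshape(-1, FEAT_DIM).mean(axis=0)


def decoder_step(hidden, word_id):
    """One plain-RNN step: returns the new hidden state and vocabulary logits."""
    hidden = np.tanh(hidden @ W_hh + W_emb[word_id])
    return hidden, hidden @ W_out


def greedy_caption(image, max_len=10):
    """Encode the image, then pick the highest-scoring word at each step."""
    hidden = np.tanh(encode(image) @ W_img)
    word_id = VOCAB.index("<start>")
    caption = []
    for _ in range(max_len):
        hidden, logits = decoder_step(hidden, word_id)
        word_id = int(np.argmax(logits))
        if VOCAB[word_id] == "<end>":
            break
        caption.append(VOCAB[word_id])
    return caption


image = rng.random((4, 4, FEAT_DIM))  # fake 4x4 "image" with FEAT_DIM channels
print(greedy_caption(image))          # words are arbitrary because weights are untrained
```

In a trained system the encoder would be a pretrained CNN and the decoder an LSTM, but the control flow stays the same.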
Show and Tell
NIC (Neural Image Caption) model. CNN encoder + LSTM decoder, trained end to end.
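A CNN + LSTM captioner of this kind is typically trained by maximum likelihood. A sketch of that objective, where $I$ is the image, $S = (S_0, \dots, S_N)$ is the caption, and $\theta$ are the model parameters (this notation is an assumption, not taken from these notes):

```latex
\theta^{*} = \arg\max_{\theta} \sum_{(I, S)} \log p(S \mid I; \theta),
\qquad
\log p(S \mid I; \theta) = \sum_{t=0}^{N} \log p(S_t \mid I, S_0, \dots, S_{t-1}; \theta)
```

The chain-rule factorization on the right is what lets a recurrent decoder model the caption one word at a time.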
Show, Attend and Tell
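Adds attention over image regions: at each decoding step the decoder weights the CNN's spatial features and uses the result as context for predicting the next word.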
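Soft attention over image regions can be sketched as follows, assuming NumPy. The additive scoring function, the names `soft_attention`, `W_f`, `W_h`, `v`, and the toy dimensions are assumptions made for illustration, not the exact formulation of any particular paper.

```python
import numpy as np


def softmax(x):
    """Numerically stable softmax over a 1-D array."""
    e = np.exp(x - x.max())
    return e / e.sum()


def soft_attention(features, hidden, W_f, W_h, v):
    """Weight image regions by their relevance to the decoder state.

    features: (num_regions, feat_dim) CNN features, one row per region
    hidden:   (hid_dim,) current decoder hidden state
    Returns (context, weights): the weighted sum of region features
    and the attention weights, which sum to 1.
    """
    # Additive scoring (an assumed choice): one scalar score per region.
    scores = np.tanh(features @ W_f + hidden @ W_h) @ v  # (num_regions,)
    weights = softmax(scores)
    context = weights @ features                         # (feat_dim,)
    return context, weights


rng = np.random.default_rng(0)
num_regions, feat_dim, hid_dim, att_dim = 9, 4, 6, 5  # toy sizes, e.g. a 3x3 grid

features = rng.normal(size=(num_regions, feat_dim))
hidden = rng.normal(size=hid_dim)
W_f = rng.normal(size=(feat_dim, att_dim))
W_h = rng.normal(size=(hid_dim, att_dim))
v = rng.normal(size=att_dim)

context, weights = soft_attention(features, hidden, W_f, W_h, v)
print(weights.round(3), context.shape)
```

The context vector is recomputed at every decoding step, so the weights can shift from region to region as the caption progresses.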
Key Takeaways
- CNN encoder + LSTM decoder
- Attention over image regions helps the decoder focus on what matters for each word
- Beam search at inference typically gives better captions than greedy decoding
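To illustrate the beam search point, here is a small sketch in plain Python. The `NEXT` table is a hypothetical toy language model that stands in for a trained decoder, and `next_log_probs` and `beam_search` are illustrative names. The probabilities are chosen so that greedy decoding (beam size 1) commits to the locally best first word and misses the more probable caption.

```python
import math

# Hypothetical toy model: next-word probabilities given only the previous word.
NEXT = {
    "<start>": {"a": 0.6, "the": 0.4},
    "a": {"dog": 0.5, "cat": 0.5},
    "the": {"dog": 0.9, "cat": 0.1},
    "dog": {"<end>": 1.0},
    "cat": {"<end>": 1.0},
}


def next_log_probs(prefix):
    """Log-probabilities of each possible next word for a partial caption."""
    return {w: math.log(p) for w, p in NEXT[prefix[-1]].items()}


def beam_search(beam_size, max_len=5):
    """Keep the beam_size best partial captions at each step."""
    beams = [(0.0, ["<start>"])]  # (cumulative log-probability, words so far)
    finished = []
    for _ in range(max_len):
        candidates = []
        for score, seq in beams:
            for word, lp in next_log_probs(seq).items():
                candidates.append((score + lp, seq + [word]))
        candidates.sort(key=lambda c: c[0], reverse=True)
        beams = []
        for score, seq in candidates[:beam_size]:
            # Completed captions leave the beam; the rest keep expanding.
            (finished if seq[-1] == "<end>" else beams).append((score, seq))
        if not beams:
            break
    best = max(finished or beams, key=lambda c: c[0])[1]
    # Strip the <start> marker, and the <end> marker if present.
    return best[1:-1] if best[-1] == "<end>" else best[1:]


print(beam_search(beam_size=1))  # ['a', 'dog']   (probability 0.6 * 0.5 = 0.30)
print(beam_search(beam_size=2))  # ['the', 'dog'] (probability 0.4 * 0.9 = 0.36)
```

With a real captioning model, `next_log_probs` would run one decoder step conditioned on the image. Larger beams cost more computation per step.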