Generative Pre-Training
GPT: decoder-only transformer. Causal (masked) self-attention: each position attends only to itself and earlier positions.
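Sketch of the causal masking that makes a decoder-only model auto-regressive. Pure-Python illustration, not any GPT implementation; `causal_mask` and `masked_softmax` are hypothetical helper names, and the attention scores are assumed to be precomputed.

```python
import math

def causal_mask(seq_len):
    """Entry [i][j] is True when position i may attend to position j (j <= i)."""
    return [[j <= i for j in range(seq_len)] for i in range(seq_len)]

def masked_softmax(scores, mask):
    """Row-wise softmax over scores; masked-out positions get zero weight."""
    weights = []
    for row_scores, row_mask in zip(scores, mask):
        # Subtract the largest visible score for numerical stability.
        peak = max(s for s, m in zip(row_scores, row_mask) if m)
        exps = [math.exp(s - peak) if m else 0.0
                for s, m in zip(row_scores, row_mask)]
        total = sum(exps)
        weights.append([e / total for e in exps])
    return weights

# Equal scores everywhere: each row spreads weight evenly over visible positions.
mask = causal_mask(3)
attn = masked_softmax([[0.0, 0.0, 0.0] for _ in range(3)], mask)
```

Row 0 can only see itself, so all of its weight lands on position 0; later rows share weight across the positions up to and including their own.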
GPT-1/2/3/4
GPT-1 (2018): 117M params. GPT-2 (2019): 1.5B. GPT-3 (2020): 175B. GPT-4 (2023): multimodal (text and image input), parameter count not disclosed.
Training
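Objective: next-token prediction (maximize the likelihood of each token given the tokens before it). Data: large-scale unlabeled text. Few-shot ability emerges from pre-training at scale; it is not a separate training stage.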
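Sketch of the next-token objective as an average cross-entropy loss. Assumptions: `logits[t]` holds the model's vocabulary scores after reading tokens 0..t, and `next_token_loss` is a hypothetical name. Real training code would use a tensor library and batching.

```python
import math

def next_token_loss(logits, token_ids):
    """Average cross-entropy of predicting token t+1 from position t."""
    if len(token_ids) < 2:
        raise ValueError("need at least two tokens to form a prediction target")
    total = 0.0
    for t in range(len(token_ids) - 1):
        row = logits[t]
        target = token_ids[t + 1]
        # Stable log-sum-exp gives the log of the softmax normalizer.
        peak = max(row)
        log_norm = peak + math.log(sum(math.exp(x - peak) for x in row))
        # Negative log-probability assigned to the true next token.
        total += log_norm - row[target]
    return total / (len(token_ids) - 1)

# Uniform scores over a 4-token vocabulary: loss equals log(4).
uniform_loss = next_token_loss([[0.0] * 4 for _ in range(3)], [0, 1, 2])
```

The last row of `logits` is unused here because the final token has no successor to predict.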
In-Context Learning
Task is specified in the prompt: an instruction plus a few input/output examples in context. No gradient updates; model weights stay fixed.
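Sketch of assembling a few-shot prompt. The `Input:`/`Output:` template and the name `build_few_shot_prompt` are assumptions for illustration; no particular model or API prescribes this layout.

```python
def build_few_shot_prompt(examples, query, instruction=""):
    """Join an optional instruction, worked examples, and the open query."""
    parts = []
    if instruction:
        parts.append(instruction)
    for source, target in examples:
        parts.append(f"Input: {source}\nOutput: {target}")
    # Leave the final output blank for the model to complete.
    parts.append(f"Input: {query}\nOutput:")
    return "\n\n".join(parts)

prompt = build_few_shot_prompt(
    [("cheese", "fromage"), ("house", "maison")],
    "dog",
    instruction="Translate English to French.",
)
```

The model only reads this string and continues it; the examples steer the output without changing any weights.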
Key Takeaways
- GPT: auto-regressive decoder
- Few-shot via prompting
- Scale drives capabilities