Learning Across Modalities
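Multimodal learning connects different data types, such as images, text, audio, and video, by mapping them into representations that one model can compare or combine.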
Alignment
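Alignment maps embeddings from different modalities into a shared space, so that matching pairs (for example, an image and its caption) land close together. CLIP aligns images and text this way. It is trained with contrastive learning: matched pairs are pulled together and mismatched pairs from the same batch are pushed apart.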
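As a rough illustration of the contrastive objective, here is a minimal sketch of a symmetric, CLIP-style loss in plain Python. The function names (`l2_normalize`, `cross_entropy_diagonal`, `contrastive_loss`) and the temperature value are assumptions made for this example, not part of any library, and a real system would compute this on batched tensors rather than lists.

```python
import math

def l2_normalize(v):
    """Scale a vector to unit length (assumes the vector is nonzero)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def cross_entropy_diagonal(logits):
    """Mean cross-entropy where the correct class for row i is column i."""
    total = 0.0
    for i, row in enumerate(logits):
        m = max(row)  # subtract the max for numerical stability
        log_sum_exp = m + math.log(sum(math.exp(x - m) for x in row))
        total += log_sum_exp - row[i]
    return total / len(logits)

def contrastive_loss(image_embs, text_embs, temperature=0.07):
    """Symmetric contrastive loss; pair i is (image_embs[i], text_embs[i]).

    The temperature default is an illustrative assumption.
    """
    imgs = [l2_normalize(v) for v in image_embs]
    txts = [l2_normalize(v) for v in text_embs]
    # logits[i][j] = cosine similarity of image i and text j, scaled
    logits = [
        [sum(a * b for a, b in zip(img, txt)) / temperature for txt in txts]
        for img in imgs
    ]
    logits_t = [list(col) for col in zip(*logits)]  # text-to-image direction
    return 0.5 * (cross_entropy_diagonal(logits) + cross_entropy_diagonal(logits_t))

# Matched pairs give a much lower loss than swapped pairs.
aligned = contrastive_loss([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])
swapped = contrastive_loss([[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]])
```

The loss is averaged over both directions (image-to-text and text-to-image) so that neither modality is treated as the only query side.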
Fusion
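Early fusion combines raw inputs or low-level features before a joint model processes them. Late fusion runs a separate model per modality and merges their outputs. Attention fusion lets the model learn how much weight each modality receives, often through cross-attention.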
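The three strategies can be sketched in a few lines of plain Python. This is a simplified illustration under stated assumptions: the function names, the `image_weight` parameter, and the use of a single query vector for attention are all hypothetical choices for this example, and real models learn these components rather than fixing them by hand.

```python
import math

def early_fusion(image_feat, text_feat):
    """Concatenate per-modality features into one vector for a joint model."""
    return list(image_feat) + list(text_feat)

def late_fusion(image_probs, text_probs, image_weight=0.5):
    """Blend the class probabilities produced by two separate models."""
    return [
        image_weight * p + (1.0 - image_weight) * q
        for p, q in zip(image_probs, text_probs)
    ]

def attention_fusion(query, modality_feats):
    """Weight each modality by softmax(scaled dot product with the query)."""
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, f)) / scale for f in modality_feats]
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(modality_feats[0])
    fused = [sum(w * f[d] for w, f in zip(weights, modality_feats)) for d in range(dim)]
    return fused, weights

joint_input = early_fusion([1.0, 2.0], [3.0])
blended = late_fusion([0.8, 0.2], [0.4, 0.6])
fused, weights = attention_fusion([1.0, 0.0], [[1.0, 0.0], [0.0, 1.0]])
```

In the attention case the modality whose features agree more with the query receives the larger weight, whereas the late-fusion weight here is fixed in advance.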
Applications
Common modality pairings: image-text (for example, captioning and retrieval), video-audio, and 3D-text.
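For the image-text case, retrieval reduces to ranking candidates by similarity once both modalities live in a shared embedding space. The sketch below assumes the embeddings already come from some aligned encoder. The `retrieve` helper and the toy vectors are hypothetical and only show the ranking step.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two nonzero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def retrieve(query_emb, candidate_embs, top_k=1):
    """Return indices of the top_k candidates most similar to the query."""
    scored = [
        (cosine_similarity(query_emb, cand), idx)
        for idx, cand in enumerate(candidate_embs)
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [idx for _, idx in scored[:top_k]]

# Toy example: a text query embedding ranked against three image embeddings.
text_query = [1.0, 0.1]
image_embs = [[0.0, 1.0], [1.0, 0.0], [-1.0, 0.0]]
ranking = retrieve(text_query, image_embs, top_k=2)
```

The same ranking step works in the other direction (image query, text candidates), since both sides share one space.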
Key Takeaways
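- Align modalities by mapping them into a shared embedding space
- Contrastive learning trains that alignment from matched and mismatched pairs
- Fusion (early, late, or attention-based) combines modalities for multi-modal understanding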