Attention Mechanism

Attention in Neural Networks

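An attention mechanism lets a neural network weight the parts of its input that are most relevant to the current output, rather than treating every part equally.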

Attention Weights

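Compute a similarity score between the query and each key. Apply softmax to the scores to get attention weights that sum to 1. The output is the sum of the values, each weighted by its attention weight.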

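Scaled dot-product attention: Attention(Q, K, V) = softmax(QK^T/√d_k)V, where d_k is the dimension of the keys. Dividing by √d_k keeps the dot products from growing with the dimension and saturating the softmax.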
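The three steps and the formula can be sketched in plain Python. This is a minimal, dependency-free illustration, not a production implementation: the function names `softmax` and `scaled_dot_product_attention` and the toy Q, K, V values are assumptions made for this example, and real code would normally use a tensor library.

```python
import math

def softmax(xs):
    # Subtract the max before exponentiating for numerical stability.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def scaled_dot_product_attention(Q, K, V):
    """Q is n_q x d_k, K is n_k x d_k, V is n_k x d_v (lists of lists)."""
    d_k = len(K[0])
    outputs, all_weights = [], []
    for q in Q:
        # Step 1: similarity of this query to every key, scaled by sqrt(d_k).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in K]
        # Step 2: softmax turns the scores into weights that sum to 1.
        weights = softmax(scores)
        # Step 3: the output is the weighted sum of the value vectors.
        out = [sum(w * v[j] for w, v in zip(weights, V)) for j in range(len(V[0]))]
        outputs.append(out)
        all_weights.append(weights)
    return outputs, all_weights

# Toy example (made-up numbers): one query that matches the first key best.
Q = [[1.0, 0.0]]
K = [[1.0, 0.0], [0.0, 1.0]]
V = [[10.0, 0.0], [0.0, 10.0]]
out, w = scaled_dot_product_attention(Q, K, V)
print("weights:", w[0])
print("output:", out[0])
```

Because the query is more similar to the first key, the first value receives the larger weight, and the output is pulled toward it.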

Types

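Self-attention: a sequence attends to itself, so the queries, keys, and values all come from the same sequence. Multi-head attention: several attention mechanisms (heads) run in parallel, and their outputs are combined.

The Transformer architecture uses multi-head self-attention.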
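A rough sketch of multi-head self-attention in plain Python follows. It makes one loud simplification: each head just takes a slice of the input features, whereas a real Transformer applies learned projection matrices for queries, keys, and values in each head, plus an output projection after concatenation. The names `attention` and `multi_head_self_attention` and the input `X` are hypothetical, chosen for illustration.

```python
import math

def attention(Q, K, V):
    # Scaled dot-product attention over lists of lists.
    d_k = len(K[0])
    out = []
    for q in Q:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d_k) for k in K]
        m = max(scores)
        exps = [math.exp(s - m) for s in scores]
        total = sum(exps)
        out.append([sum((e / total) * v[j] for e, v in zip(exps, V))
                    for j in range(len(V[0]))])
    return out

def multi_head_self_attention(X, num_heads):
    d_model = len(X[0])
    assert d_model % num_heads == 0, "d_model must be divisible by num_heads"
    d_head = d_model // num_heads
    head_outputs = []
    for h in range(num_heads):
        lo, hi = h * d_head, (h + 1) * d_head
        # ASSUMPTION: slice the features instead of using learned projections.
        X_h = [row[lo:hi] for row in X]
        # Self-attention: queries, keys, and values all come from X_h.
        head_outputs.append(attention(X_h, X_h, X_h))
    # Concatenate the per-head outputs for each position in the sequence.
    return [sum((head[i] for head in head_outputs), []) for i in range(len(X))]

# Toy sequence of 3 positions with 4 features each (made-up numbers).
X = [[1.0, 0.0, 0.0, 2.0],
     [0.0, 1.0, 2.0, 0.0],
     [1.0, 1.0, 1.0, 1.0]]
Y = multi_head_self_attention(X, num_heads=2)
print(Y)
```

Each head computes its own attention weights over the sequence, which is what lets different heads pick up different relationships between positions.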

Applications

Attention is used in machine translation, text summarization, image captioning, and question answering.

Key Takeaways

  1. Attention weights focus the model on the relevant parts of the input
  2. Multi-head attention captures multiple relationships in parallel
  3. Attention is the foundation of modern NLP models such as the Transformer
