Attention Mechanism

Topic: Attention

Attention in Neural Networks

Attention focuses on relevant parts of input.

Compute similarity between query and keys. Softmax to get weights. Weight values by attention weights.

Attention(Q, K, V) = softmax(QK^T/√d_k)V. Scaled dot-product attention.

Self-attion: sequence attends to itself. Multi-head attention: multiple attention mechanisms.

Transformer uses multi-head self-attention.

Machine translation. Text summarization. Image captioning. Question answering.

Get personalized data science help from ChatWhole's AI-powered platform.