BERT and Transformers

Topic: NLP

BERT Architecture

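BERT (Bidirectional Encoder Representations from Transformers) is a stack of Transformer encoder layers. Unlike a left-to-right language model, its self-attention lets every token attend to the tokens on both its left and its right. The original release came in two sizes: BERT-Base (12 layers, hidden size 768, 12 attention heads, about 110M parameters) and BERT-Large (24 layers, hidden size 1024, 16 attention heads, about 340M parameters).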
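
The difference between bidirectional and left-to-right attention comes down to the attention mask. The sketch below is a single-head, pure-Python illustration with no learned query/key projections and arbitrary toy vectors; the function name `attention_weights` is hypothetical. With `causal=False`, as in BERT's encoder, every token gets nonzero weight on every position. With `causal=True`, later positions are masked out, as in a left-to-right model.

```python
import math


def softmax(xs):
    # Subtract the max for numerical stability; exp(-inf) evaluates to 0.0.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attention_weights(queries, keys, causal=False):
    """Scaled dot-product attention weights for a single head.

    queries, keys: lists of equal-length vectors, one per token.
    causal=False: bidirectional (BERT-style), every position is visible.
    causal=True: position i may only attend to positions j <= i.
    """
    d = len(keys[0])
    weights = []
    for i, q in enumerate(queries):
        scores = []
        for j, k in enumerate(keys):
            if causal and j > i:
                scores.append(float("-inf"))  # masked: future token
            else:
                scores.append(sum(a * b for a, b in zip(q, k)) / math.sqrt(d))
        weights.append(softmax(scores))
    return weights


if __name__ == "__main__":
    x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # toy token vectors
    print(attention_weights(x, x)[0])               # weights over all 3 tokens
    print(attention_weights(x, x, causal=True)[0])  # [1.0, 0.0, 0.0]
```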

Pre-training

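BERT is pre-trained on unlabeled text (BooksCorpus and English Wikipedia) with two objectives. Masked Language Modeling (MLM) selects 15% of the input tokens and trains the model to predict the original token at each selected position; a selected token is replaced with [MASK] 80% of the time, with a random token 10% of the time, and left unchanged 10% of the time. Next Sentence Prediction (NSP) feeds two segments as [CLS] A [SEP] B [SEP] and trains a binary classifier on the [CLS] representation to decide whether B actually follows A. Because MLM predicts each token from its full context, the representations are deeply bidirectional: every layer conditions on both left and right context, rather than concatenating separately trained left-to-right and right-to-left models.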
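
The MLM corruption rule can be sketched as follows. This is a simplified illustration, not BERT's reference code: it selects each position independently with probability 0.15, whereas the original data pipeline picks a fixed number of positions per sequence and never masks special tokens such as [CLS] and [SEP]. The helper name `mask_tokens` and the convention of using `None` for positions excluded from the loss are assumptions.

```python
import random

MASK = "[MASK]"


def mask_tokens(tokens, vocab, mask_prob=0.15, rng=None):
    """Apply a BERT-style 80/10/10 MLM corruption to a token list.

    Returns (corrupted, labels). labels holds the original token at each
    selected position and None elsewhere (None = ignored by the loss).
    """
    if rng is None:
        rng = random.Random()
    corrupted, labels = [], []
    for tok in tokens:
        if rng.random() < mask_prob:
            labels.append(tok)            # model must predict the original
            r = rng.random()
            if r < 0.8:
                corrupted.append(MASK)    # 80%: replace with [MASK]
            elif r < 0.9:
                corrupted.append(rng.choice(vocab))  # 10%: random token
            else:
                corrupted.append(tok)     # 10%: keep unchanged
        else:
            corrupted.append(tok)
            labels.append(None)
    return corrupted, labels


if __name__ == "__main__":
    sentence = "the cat sat on the mat".split()
    print(mask_tokens(sentence, vocab=sentence, rng=random.Random(42)))
```

Keeping 10% of selected tokens unchanged and randomizing another 10% means the model cannot rely on [MASK] appearing, which matters because [MASK] never occurs during fine-tuning.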

Fine-Tuning

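To fine-tune, add a small task-specific head on top of the pre-trained encoder and train all parameters end-to-end on labeled data. For classification, the head is typically a linear layer over the final [CLS] representation. For extractive question answering (e.g., SQuAD), it scores each token as a possible start or end of the answer span within the passage.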

Variants

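RoBERTa: keeps BERT's architecture but trains longer on more data with larger batches, uses dynamic masking, and drops NSP.
ALBERT: cuts parameters by factorizing the embedding matrix and sharing parameters across layers, and replaces NSP with sentence-order prediction.
DistilBERT: uses knowledge distillation to train a 6-layer student from BERT-Base; its authors report it is about 40% smaller and 60% faster while retaining about 97% of BERT's language-understanding performance.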
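
Two of the ideas above can be made concrete with small calculations. The first counts embedding parameters with and without ALBERT-style factorization, using an illustrative vocabulary of 30,000 tokens, hidden size 768, and embedding size 128. The second is a soft-target distillation loss in the style of Hinton et al., scaled by T²; DistilBERT's actual training objective combines a term like this with a masked-LM loss and a cosine loss on hidden states, which are omitted here. All function names are hypothetical.

```python
import math


def embedding_params(vocab_size, hidden_size, embedding_size=None):
    """Token-embedding parameter count.

    Without factorization: V * H.
    ALBERT-style factorization: V * E + E * H (a small embedding table
    projected up to the hidden size).
    """
    if embedding_size is None:
        return vocab_size * hidden_size
    return vocab_size * embedding_size + embedding_size * hidden_size


def softmax_t(logits, temperature=1.0):
    # Temperature > 1 softens the distribution, exposing "dark knowledge".
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    s = sum(exps)
    return [e / s for e in exps]


def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """Cross-entropy of the student's softened distribution against the
    teacher's, multiplied by T**2 to keep gradient scale comparable."""
    p_teacher = softmax_t(teacher_logits, temperature)
    p_student = softmax_t(student_logits, temperature)
    ce = -sum(pt * math.log(ps) for pt, ps in zip(p_teacher, p_student))
    return ce * temperature ** 2


if __name__ == "__main__":
    print(embedding_params(30000, 768))       # 23040000
    print(embedding_params(30000, 768, 128))  # 3938304
    teacher = [2.0, 1.0, 0.1]
    print(distillation_loss(teacher, teacher), distillation_loss([0.1, 1.0, 2.0], teacher))
```

The loss is smallest when the student's logits match the teacher's, since cross-entropy against a fixed target distribution is minimized by that distribution itself.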

Key Takeaways

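  1. BERT is a stack of bidirectional Transformer encoder layers.
  2. It is pre-trained with masked language modeling (plus next sentence prediction), then fine-tuned end-to-end with a small task head.
  3. Variants such as RoBERTa, ALBERT, and DistilBERT improve on BERT's training recipe, parameter efficiency, or inference speed.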
