NER Fundamentals
Named entity recognition (NER) locates mentions of named entities in text and classifies them into predefined types.
Task
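Input: a sequence of tokens. Output: one entity label per token. In BIO tagging, B-TYPE marks the first token of an entity, I-TYPE marks a token inside (continuing) an entity, and O marks a token outside any entity. For example, "Barack Obama visited Paris" is tagged B-PERSON I-PERSON O B-LOC.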
Common types: PERSON, ORG, LOC, DATE, MONEY.
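Downstream code usually wants entity spans rather than per-token tags. The sketch below converts a BIO tag sequence into (type, start, end) spans with an exclusive end index. The function name `bio_to_spans` is hypothetical, and treating a stray I- tag (one that does not continue the open entity) as the start of a new entity is an assumed lenient convention; stricter decoders reject such sequences instead.

```python
from typing import List, Tuple

def bio_to_spans(tags: List[str]) -> List[Tuple[str, int, int]]:
    """Convert BIO tags into (entity_type, start, end) spans; end is exclusive."""
    spans = []
    current_type = None  # type of the entity currently open, if any
    start = 0
    for i, tag in enumerate(tags):
        if tag.startswith("B-") or (tag.startswith("I-") and tag[2:] != current_type):
            # A B- tag, or an I- tag that does not continue the open entity
            # (assumed lenient handling), closes any open span and starts a new one.
            if current_type is not None:
                spans.append((current_type, start, i))
            current_type = tag[2:]
            start = i
        elif tag == "O":
            # Outside any entity: close the open span, if there is one.
            if current_type is not None:
                spans.append((current_type, start, i))
            current_type = None
        # Otherwise it is an I- tag continuing the open entity: nothing to do.
    if current_type is not None:
        spans.append((current_type, start, len(tags)))
    return spans

tokens = ["Barack", "Obama", "visited", "Paris", "on", "Monday", "."]
tags = ["B-PERSON", "I-PERSON", "O", "B-LOC", "O", "B-DATE", "O"]
for etype, s, e in bio_to_spans(tags):
    print(etype, " ".join(tokens[s:e]))
```

Two adjacent entities of the same type stay separate because the second one begins with its own B- tag; this is why BIO needs the B/I distinction rather than a single tag per type.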
Approaches
Rule-based: dictionary matching against lists of known names. Statistical: hidden Markov models (HMM), conditional random fields (CRF). Neural: Bi-LSTM-CRF, fine-tuned BERT-based models.
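A rule-based tagger can be as simple as looking phrases up in a dictionary (gazetteer). The sketch below assumes a hypothetical gazetteer keyed by lowercased token tuples and uses greedy longest-match scanning, one common choice among several; it emits BIO tags so its output has the same shape as a statistical or neural tagger's.

```python
from typing import Dict, List, Tuple

def gazetteer_tag(tokens: List[str], gazetteer: Dict[Tuple[str, ...], str]) -> List[str]:
    """Assign BIO tags by greedy longest-match lookup in a phrase dictionary."""
    # The longest dictionary phrase bounds how far ahead we need to look.
    max_len = max((len(k) for k in gazetteer), default=0)
    lowered = [t.lower() for t in tokens]  # case-insensitive matching (assumption)
    tags = ["O"] * len(tokens)
    i = 0
    while i < len(tokens):
        match = None
        # Try the longest candidate first so "new york city" wins over "new york".
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            phrase = tuple(lowered[i:i + n])
            if phrase in gazetteer:
                match = (n, gazetteer[phrase])
                break
        if match is None:
            i += 1
            continue
        n, label = match
        tags[i] = "B-" + label
        for j in range(i + 1, i + n):
            tags[j] = "I-" + label
        i += n  # skip past the matched phrase
    return tags

# Hypothetical example dictionary.
gazetteer = {
    ("new", "york"): "LOC",
    ("new", "york", "city"): "LOC",
    ("acme", "corp"): "ORG",
}
print(gazetteer_tag(["Acme", "Corp", "opened", "in", "New", "York", "City"], gazetteer))
# → ['B-ORG', 'I-ORG', 'O', 'O', 'B-LOC', 'I-LOC', 'I-LOC']
```

Dictionary matching is precise for names it knows but cannot tag unseen names or resolve ambiguity (e.g. "Washington" as PERSON vs LOC), which is the gap statistical and neural taggers address.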
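A linear-chain CRF scores the whole label sequence jointly, combining per-token scores with scores for transitions between adjacent labels; this lets it learn, for example, that I-PERSON should not follow O. A Bi-LSTM reads the sentence left-to-right and right-to-left, so each token's representation reflects context on both sides. In a Bi-LSTM-CRF, the Bi-LSTM produces the per-token scores and the CRF layer picks the best label sequence.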
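At prediction time, a linear-chain CRF finds the highest-scoring label sequence with the Viterbi algorithm. The sketch below assumes emission scores `emissions[t][y]` (e.g. produced by a Bi-LSTM) and a transition matrix `transitions[a][b]`, and omits the separate start/end transition scores some implementations add. The same dynamic program decodes an HMM if the scores are log-probabilities and initial-state log-probabilities are added to the first step. The scores in the example are made up to show Viterbi repairing an invalid greedy choice.

```python
from typing import List, Tuple

def viterbi(emissions: List[List[float]], transitions: List[List[float]]) -> Tuple[List[int], float]:
    """Return the best label-index sequence and its score under a linear-chain CRF.

    emissions[t][y]: score of label y at position t (assumed non-empty).
    transitions[a][b]: score of label a being followed by label b.
    """
    n_labels = len(emissions[0])
    # best[y]: score of the best path ending in label y at the current position.
    best = list(emissions[0])
    backpointers = []
    for t in range(1, len(emissions)):
        new_best, pointers = [], []
        for y in range(n_labels):
            # Previous label maximising (path score + transition into y).
            prev = max(range(n_labels), key=lambda a: best[a] + transitions[a][y])
            new_best.append(best[prev] + transitions[prev][y] + emissions[t][y])
            pointers.append(prev)
        best = new_best
        backpointers.append(pointers)
    # Follow backpointers from the best final label.
    last = max(range(n_labels), key=lambda y: best[y])
    path = [last]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return path, best[last]

labels = ["O", "B-PER", "I-PER"]
# Made-up scores: greedy per-token argmax would give O, I-PER (invalid).
emissions = [[1.0, 0.8, 0.0],
             [0.0, 0.0, 1.0]]
# Heavily penalise O -> I-PER; all other transitions score 0.
transitions = [[0.0, 0.0, -10.0],
               [0.0, 0.0, 0.0],
               [0.0, 0.0, 0.0]]
path, score = viterbi(emissions, transitions)
print([labels[i] for i in path])  # → ['B-PER', 'I-PER']
```

Decoding costs O(T · K²) for T tokens and K labels, instead of the Kᵀ needed to score every sequence explicitly.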
Libraries
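spaCy: pre-trained pipelines that include an NER component. Stanford NER: a CRF-based tagger written in Java. Hugging Face Transformers: BERT models fine-tuned for token classification (BERT-NER).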
Key Takeaways
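- NER assigns an entity label (e.g. a BIO tag) to each token in a sequence
- CRFs model dependencies between adjacent labels and score the whole sequence jointly
- Fine-tuned BERT-based models are the current state of the art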