Text Classification Methods
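Text classification assigns one or more predefined categories to a piece of text, for example spam vs. not spam, or a topic label for a news article.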
Traditional Methods
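Naive Bayes: a fast baseline that often works well despite its assumption that words are independent given the class. SVM: well suited to high-dimensional, sparse text features. Logistic regression: interpretable, since each feature weight shows how strongly a term pushes a document toward a class.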
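As a rough illustration of the Naive Bayes baseline, here is a minimal sketch of a multinomial Naive Bayes classifier with add-one (Laplace) smoothing. The function names, the whitespace tokenizer, and the four-document toy corpus are assumptions made for the example, not part of any particular library.

```python
import math
from collections import Counter, defaultdict

def train_naive_bayes(docs, labels):
    """Count documents per class and words per class (whitespace tokens)."""
    class_counts = Counter(labels)
    word_counts = defaultdict(Counter)  # label -> word -> count
    vocab = set()
    for text, label in zip(docs, labels):
        tokens = text.lower().split()
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return class_counts, word_counts, vocab

def predict_naive_bayes(model, text):
    """Return the class with the highest log prior + log likelihood."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_docs in class_counts.items():
        score = math.log(n_docs / total_docs)  # log prior
        total_words = sum(word_counts[label].values())
        for token in text.lower().split():
            if token not in vocab:
                continue  # skip words never seen in training
            # Add-one smoothing keeps unseen (word, class) pairs above zero.
            score += math.log(
                (word_counts[label][token] + 1) / (total_words + len(vocab))
            )
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Toy corpus, invented for illustration only.
docs = [
    "cheap pills buy now",
    "win money now",
    "meeting agenda attached",
    "lunch meeting tomorrow",
]
labels = ["spam", "spam", "ham", "ham"]
model = train_naive_bayes(docs, labels)
```

Calling `predict_naive_bayes(model, "buy cheap money")` picks the class whose training vocabulary overlaps most with the query. A real system would use a proper tokenizer and far more data.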
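Traditional classifiers typically take TF-IDF features as input. Each document becomes a row of a document-term matrix, where a term's weight grows with its frequency in the document and shrinks if the term appears in many documents.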
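The TF-IDF weighting can be sketched in a few lines. This version assumes the plain variant, with term frequency as count divided by document length and IDF as log(N / df). Libraries often use smoothed or normalized variants, so their numbers will differ. The function name `tfidf_matrix` and the sample documents are hypothetical.

```python
import math
from collections import Counter

def tfidf_matrix(docs):
    """Return (vocab, matrix); matrix[i][j] is the weight of vocab[j] in docs[i].

    Assumes every document contains at least one token.
    """
    tokenized = [doc.lower().split() for doc in docs]
    vocab = sorted({tok for toks in tokenized for tok in toks})
    n_docs = len(docs)
    # Document frequency: number of documents containing each term.
    df = Counter(tok for toks in tokenized for tok in set(toks))
    idf = {term: math.log(n_docs / df[term]) for term in vocab}
    matrix = []
    for toks in tokenized:
        counts = Counter(toks)
        matrix.append([(counts[term] / len(toks)) * idf[term] for term in vocab])
    return vocab, matrix

# Sample documents, invented for illustration only.
docs = ["the cat sat", "the dog sat", "the dog barked"]
vocab, matrix = tfidf_matrix(docs)
```

Here "the" occurs in every document, so its IDF is log(3/3) = 0 and its column is all zeros, while "cat" occurs in only one document and gets the largest weight in the first row.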
Deep Learning
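CNN for text: slide convolutional filters over the sequence of word embeddings to detect local n-gram patterns, then pool the results. RNN/LSTM: process tokens in order, carrying a hidden state that models sequential context.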
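To make "convolve over word embeddings" concrete, the sketch below applies a single filter to a short sentence and keeps the maximum activation (max-over-time pooling). It is a forward pass only, with hand-picked weights. The 2-dimensional embeddings, the filter values, and the name `conv1d_max_pool` are assumptions for illustration. A real model would learn many filters with a deep learning framework.

```python
def conv1d_max_pool(embeddings, kernel, bias=0.0):
    """Slide one filter over a sentence and max-pool the ReLU activations.

    embeddings: list of word vectors (sequence_length x embedding_dim)
    kernel:     list of weight vectors (window_size x embedding_dim)
    Assumes the sentence is at least as long as the filter window.
    """
    window = len(kernel)
    activations = []
    for start in range(len(embeddings) - window + 1):
        total = bias
        for offset in range(window):
            word_vec = embeddings[start + offset]
            weights = kernel[offset]
            total += sum(w * x for w, x in zip(weights, word_vec))
        activations.append(max(0.0, total))  # ReLU
    return max(activations)  # max-over-time pooling

# Hypothetical 2-dimensional embeddings for a 4-word sentence.
sentence = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]]
# One filter spanning 2 consecutive words (a "bigram detector").
bigram_filter = [[1.0, 0.0], [0.0, 1.0]]
feature = conv1d_max_pool(sentence, bigram_filter)
```

The filter responds most strongly at the first window, where both word vectors line up with its weights, so pooling reports that the pattern occurs somewhere in the sentence regardless of position.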
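BERT: a pre-trained Transformer encoder that is fine-tuned on the target task. Fine-tuned pre-trained models of this kind reach state-of-the-art or near state-of-the-art accuracy on many classification tasks, at a higher compute cost than traditional models.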
Multi-label
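One-vs-rest for multi-label: train one binary classifier per label and predict every label whose classifier fires, so a document can receive several labels at once. Hierarchical classification for structured labels: exploit the label taxonomy, for example by predicting a coarse category first and then a finer one beneath it.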
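The one-vs-rest decomposition can be sketched as follows. The per-label binary model here is a deliberately simple toy scorer (positive-document rate minus negative-document rate for each word), standing in for a real classifier such as logistic regression. All function names and the sample documents are assumptions for illustration.

```python
from collections import Counter

def train_binary(docs, targets):
    """Toy linear scorer: word weight = positive-doc rate minus negative-doc rate."""
    pos = [d for d, t in zip(docs, targets) if t]
    neg = [d for d, t in zip(docs, targets) if not t]
    pos_counts = Counter(tok for d in pos for tok in set(d.lower().split()))
    neg_counts = Counter(tok for d in neg for tok in set(d.lower().split()))
    vocab = set(pos_counts) | set(neg_counts)
    return {
        w: pos_counts[w] / max(len(pos), 1) - neg_counts[w] / max(len(neg), 1)
        for w in vocab
    }

def train_one_vs_rest(docs, label_sets):
    """Train one independent binary model per label."""
    all_labels = sorted({label for labels in label_sets for label in labels})
    return {
        label: train_binary(docs, [label in labels for labels in label_sets])
        for label in all_labels
    }

def predict_labels(models, text):
    """Return every label whose binary model scores the text above zero."""
    tokens = set(text.lower().split())
    return {
        label
        for label, weights in models.items()
        if sum(weights.get(tok, 0.0) for tok in tokens) > 0
    }

# Sample multi-label corpus, invented for illustration only.
docs = [
    "stock market rally",
    "football match result",
    "football club stock offering",
    "election vote result",
]
label_sets = [{"finance"}, {"sports"}, {"finance", "sports"}, {"politics"}]
models = train_one_vs_rest(docs, label_sets)
```

Because each label has its own yes/no decision, `predict_labels(models, "football club stock")` can return both "finance" and "sports", which a single multi-class classifier could not do.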
Key Takeaways
- TF-IDF + traditional ML is a solid baseline
- Fine-tuned BERT provides state-of-the-art results on many tasks
- Multi-label classification requires special handling, such as one-vs-rest