Decision Tree Fundamentals
Decision trees recursively split data based on feature values.
Splitting Criteria
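Gini impurity measures how mixed a node is (0 means pure): 1 - Σp², where p is the proportion of each class in the node. Entropy: -Σp log₂(p). Information gain = parent entropy - weighted child entropy, with each child weighted by its share of the parent's samples.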
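CART uses Gini for classification. ID3 uses entropy-based information gain; C4.5 uses gain ratio, which normalizes information gain by the split information.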
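The formulas above can be sketched in a few lines of plain Python. This is an illustrative sketch, not any library's implementation: the function names (`gini`, `entropy`, `information_gain`) and the toy labels are assumptions made for the example.

```python
from collections import Counter
from math import log2

def gini(labels):
    """Gini impurity: 1 - sum(p^2) over the class proportions p."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def entropy(labels):
    """Entropy in bits: -sum(p * log2(p)) over the class proportions p."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def information_gain(parent, children):
    """Parent entropy minus the size-weighted entropy of the child nodes."""
    n = len(parent)
    weighted = sum(len(child) / n * entropy(child) for child in children)
    return entropy(parent) - weighted

# Toy node (hypothetical data): 5 "yes" and 5 "no", split into two children.
parent = ["yes"] * 5 + ["no"] * 5
left = ["yes"] * 4 + ["no"] * 1
right = ["yes"] * 1 + ["no"] * 4

print("Gini(parent):    ", gini(parent))
print("Entropy(parent): ", entropy(parent))
print("Information gain:", information_gain(parent, [left, right]))
```

A perfectly balanced two-class node has the maximum Gini impurity (0.5) and entropy (1 bit); the split here lowers the weighted child entropy, so the gain is positive.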
Tree Building
At each node, find the feature and threshold that give the best split, then recurse on the child nodes. Stopping criteria: max_depth, min_samples_split, min_samples_leaf.
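A minimal sketch of the recursive procedure, assuming numeric features, Gini as the criterion, and thresholds placed midway between consecutive distinct values. The helper names (`best_split`, `build_tree`, `predict`) and the dictionary node layout are hypothetical; only the three stopping parameters come from the notes above, and real libraries differ in the details.

```python
from collections import Counter

def gini(labels):
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split(X, y, min_samples_leaf):
    """Return (weighted_gini, feature_index, threshold), or None if no valid split."""
    best, n = None, len(y)
    for j in range(len(X[0])):
        values = sorted(set(row[j] for row in X))
        # Candidate thresholds: midpoints between consecutive distinct values.
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [y[i] for i in range(n) if X[i][j] <= t]
            right = [y[i] for i in range(n) if X[i][j] > t]
            if len(left) < min_samples_leaf or len(right) < min_samples_leaf:
                continue  # split would create a leaf that is too small
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if best is None or score < best[0]:
                best = (score, j, t)
    return best

def build_tree(X, y, depth=0, max_depth=3, min_samples_split=2, min_samples_leaf=1):
    majority = Counter(y).most_common(1)[0][0]
    # Stopping criteria: pure node, depth limit reached, or too few samples.
    if len(set(y)) == 1 or depth >= max_depth or len(y) < min_samples_split:
        return {"leaf": majority}
    split = best_split(X, y, min_samples_leaf)
    if split is None:
        return {"leaf": majority}
    _, j, t = split
    left = [i for i in range(len(y)) if X[i][j] <= t]
    right = [i for i in range(len(y)) if X[i][j] > t]
    params = dict(max_depth=max_depth, min_samples_split=min_samples_split,
                  min_samples_leaf=min_samples_leaf)
    return {
        "feature": j,
        "threshold": t,
        "left": build_tree([X[i] for i in left], [y[i] for i in left], depth + 1, **params),
        "right": build_tree([X[i] for i in right], [y[i] for i in right], depth + 1, **params),
    }

def predict(tree, row):
    """Walk from the root to a leaf and return that leaf's majority class."""
    while "leaf" not in tree:
        branch = "left" if row[tree["feature"]] <= tree["threshold"] else "right"
        tree = tree[branch]
    return tree["leaf"]

# Toy data (hypothetical): one feature, two well-separated classes.
X = [[1.0], [2.0], [3.0], [10.0], [11.0], [12.0]]
y = ["a", "a", "a", "b", "b", "b"]
tree = build_tree(X, y)
print(tree)
print(predict(tree, [2.5]), predict(tree, [20.0]))
```

The exhaustive threshold search is written for readability rather than speed; it rescans the data for every candidate threshold.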
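Pruning reduces overfitting by removing subtrees after the tree is grown: cost-complexity pruning (ccp_alpha) penalizes tree size, so larger values of ccp_alpha yield smaller trees.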
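As a hedged illustration of cost-complexity pruning, the sketch below assumes scikit-learn, whose `DecisionTreeClassifier` exposes `ccp_alpha` and `cost_complexity_pruning_path`. The dataset, the train/held-out split, and the `results` list are choices made for this example only.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_held, y_train, y_held = train_test_split(X, y, random_state=0)

# Effective alphas at which successive subtrees are pruned away.
path = DecisionTreeClassifier(random_state=0).cost_complexity_pruning_path(X_train, y_train)

results = []
for alpha in path.ccp_alphas:
    alpha = max(float(alpha), 0.0)  # guard against tiny negative values from rounding
    clf = DecisionTreeClassifier(random_state=0, ccp_alpha=alpha).fit(X_train, y_train)
    results.append((alpha, clf.tree_.node_count, clf.score(X_held, y_held)))

for alpha, nodes, acc in results:
    print(f"ccp_alpha={alpha:.5f}  nodes={nodes:3d}  held-out accuracy={acc:.3f}")
```

Node count shrinks as ccp_alpha grows. In practice the value would usually be chosen by cross-validation rather than from a single held-out split.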
Advantages and Disadvantages
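Advantages: interpretable, handle non-linear relationships, require little preprocessing. Disadvantages: prone to overfitting, unstable (small changes in the training data can produce a very different tree), biased toward features with many levels.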
Key Takeaways
- Gini/entropy measure split quality
- Stopping criteria and pruning limit overfitting
- Trees are interpretable but unstable