Decision Trees

Topic: Tree Models

Decision Tree Fundamentals

A decision tree recursively partitions the data: each internal node tests a feature value, and each leaf assigns a prediction.

Splitting Criteria

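Gini impurity measures how mixed the classes in a node are: 1 - Σp², where p is the proportion of each class in the node. It is 0 for a pure node. Entropy is -Σp log₂(p). Information gain = parent entropy - weighted child entropy, where each child is weighted by its share of the parent's samples.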
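
The formulas above can be sketched in a few lines of plain Python. This is a minimal illustration, not any library's implementation; the function names and the toy labels are assumptions made up for the example.

```python
from collections import Counter
from math import log2


def class_proportions(labels):
    """Proportion p of each class among the labels in a node."""
    n = len(labels)
    return [count / n for count in Counter(labels).values()]


def gini(labels):
    """Gini impurity: 1 - sum(p^2)."""
    return 1.0 - sum(p * p for p in class_proportions(labels))


def entropy(labels):
    """Entropy in bits: -sum(p * log2(p))."""
    return -sum(p * log2(p) for p in class_proportions(labels))


def information_gain(parent, children):
    """Parent entropy minus the sample-weighted entropy of the children."""
    n = len(parent)
    weighted = sum(len(child) / n * entropy(child) for child in children)
    return entropy(parent) - weighted


# Toy node (made-up labels): a 50/50 parent split into two mostly pure children.
parent = ["yes"] * 5 + ["no"] * 5
left = ["yes"] * 4 + ["no"]
right = ["yes"] + ["no"] * 4

print(gini(parent))                             # impurity of the mixed parent
print(entropy(parent))                          # entropy of the mixed parent
print(information_gain(parent, [left, right]))  # gain from this split
```

A balanced two-class node has the maximum impurity for two classes (Gini 0.5, entropy 1 bit), and any split that makes the children purer yields a positive information gain.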

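CART uses Gini impurity for classification. ID3 uses information gain (entropy); C4.5 uses gain ratio, an entropy-based criterion that normalizes information gain.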

Tree Building

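At each node, the algorithm searches for the split that most improves the splitting criterion, then repeats on each child. Growth stops when a node is pure or a stopping criterion is met: max_depth (maximum tree depth), min_samples_split (minimum samples a node needs before it can be split), or min_samples_leaf (minimum samples each leaf must keep).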
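
The recursion and the three stopping criteria can be sketched as follows. This is a simplified, assumed implementation for numeric features using weighted Gini impurity; it is not how any particular library builds trees, and the dictionary layout, function names, and toy data are hypothetical. As a simplification, a split is accepted only if it strictly lowers impurity.

```python
from collections import Counter


def gini(labels):
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split(X, y, min_samples_leaf):
    """Return (feature_index, threshold) with the lowest weighted Gini, or None."""
    n = len(y)
    best, best_score = None, gini(y)  # a split must beat the unsplit node
    for j in range(len(X[0])):
        values = sorted({row[j] for row in X})
        # Candidate thresholds: midpoints between consecutive distinct values.
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [y[i] for i in range(n) if X[i][j] <= t]
            right = [y[i] for i in range(n) if X[i][j] > t]
            if len(left) < min_samples_leaf or len(right) < min_samples_leaf:
                continue  # this split would create a leaf that is too small
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if score < best_score:
                best, best_score = (j, t), score
    return best


def build_tree(X, y, depth=0, max_depth=3, min_samples_split=2, min_samples_leaf=1):
    majority = Counter(y).most_common(1)[0][0]
    # Stop when the depth limit is hit, the node is too small, or it is pure.
    if depth >= max_depth or len(y) < min_samples_split or len(set(y)) == 1:
        return {"leaf": majority}
    split = best_split(X, y, min_samples_leaf)
    if split is None:
        return {"leaf": majority}
    j, t = split
    left = [i for i in range(len(y)) if X[i][j] <= t]
    right = [i for i in range(len(y)) if X[i][j] > t]
    limits = (max_depth, min_samples_split, min_samples_leaf)
    return {
        "feature": j,
        "threshold": t,
        "left": build_tree([X[i] for i in left], [y[i] for i in left], depth + 1, *limits),
        "right": build_tree([X[i] for i in right], [y[i] for i in right], depth + 1, *limits),
    }


def predict(tree, row):
    """Walk from the root to a leaf and return its majority class."""
    while "leaf" not in tree:
        branch = "left" if row[tree["feature"]] <= tree["threshold"] else "right"
        tree = tree[branch]
    return tree["leaf"]


# Toy data (made up): one feature, two classes separable at 2.5.
X = [[1.0], [2.0], [3.0], [4.0]]
y = ["a", "a", "b", "b"]
tree = build_tree(X, y)
print(tree)
print(predict(tree, [1.5]), predict(tree, [3.7]))
```

Tightening any limit makes the tree smaller: with max_depth=0 or min_samples_leaf=3 this toy data set yields a single leaf.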

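Pruning reduces overfitting by removing branches that add little predictive value. Cost-complexity pruning (ccp_alpha in scikit-learn) penalizes each leaf by alpha, so larger values yield smaller trees.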
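
The idea behind cost-complexity pruning can be sketched in plain Python. A subtree is collapsed into a single leaf when the error reduction it buys per extra leaf, often called its effective alpha, does not exceed alpha. This is a sketch of the principle, not scikit-learn's implementation; the nested-dictionary tree, the "risk" field (the error a node would contribute if it were a leaf), and the numbers are made up for illustration.

```python
def leaves_and_risk(node):
    """Return (number of leaves, total leaf risk) for the subtree rooted at node."""
    if "children" not in node:
        return 1, node["risk"]
    stats = [leaves_and_risk(child) for child in node["children"]]
    return sum(n for n, _ in stats), sum(r for _, r in stats)


def effective_alpha(node):
    """Error reduction per additional leaf gained by keeping this subtree."""
    n_leaves, subtree_risk = leaves_and_risk(node)
    return (node["risk"] - subtree_risk) / (n_leaves - 1)


def prune(node, alpha):
    """Bottom-up: collapse any subtree whose effective alpha is <= alpha."""
    if "children" not in node:
        return node
    pruned = {
        "risk": node["risk"],
        "children": [prune(child, alpha) for child in node["children"]],
    }
    if effective_alpha(pruned) <= alpha:
        return {"risk": node["risk"]}  # the subtree is not worth its extra leaves
    return pruned


# Hypothetical tree: "risk" values are illustrative, not measured.
toy_tree = {
    "risk": 0.40,
    "children": [
        {"risk": 0.15, "children": [{"risk": 0.05}, {"risk": 0.08}]},
        {"risk": 0.20},
    ],
}

for alpha in (0.0, 0.03, 0.1):
    n_leaves, _ = leaves_and_risk(prune(toy_tree, alpha))
    print(alpha, n_leaves)
```

As alpha grows, the weakest branches are collapsed first, so the number of leaves can only shrink.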

Advantages and Disadvantages

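Advantages: trees are interpretable, capture non-linear relationships, and require little preprocessing (for example, no feature scaling). Disadvantages: they are prone to overfitting, unstable (small changes in the data can produce a very different tree), and biased toward features with many levels.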

Key Takeaways

  1. Gini/entropy measure split quality
  2. Stopping criteria and pruning limit overfitting
  3. Trees are interpretable but unstable
