What is Machine Learning?
Machine Learning (ML) is a subset of artificial intelligence that enables systems to learn and improve from experience without being explicitly programmed. It focuses on algorithms that learn patterns from data and use them to make predictions or decisions.
Types of Machine Learning
1. Supervised Learning
Learning from labeled data where the correct output is known:
- Classification: Predicting categorical labels
- Regression: Predicting continuous values
A supervised model assumes the relationship y = f(X) + ε, where:
- y = target variable
- X = features
- f = learned function
- ε = error term
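As a rough illustration of the symbols above, the sketch below simulates data from a known function plus noise and lets a linear model recover it. The true function (3x + 2), the noise level, and the variable names are assumptions chosen for the example.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)

# X = features: 200 samples of a single feature
X = rng.uniform(0, 10, size=(200, 1))
# epsilon = error term: zero-mean Gaussian noise (assumed for illustration)
epsilon = rng.normal(0, 1.0, size=200)
# y = target variable, generated by an assumed true function 3x + 2 plus noise
y = 3.0 * X[:, 0] + 2.0 + epsilon

# f = learned function: a linear model estimated from (X, y)
f = LinearRegression().fit(X, y)
slope, intercept = f.coef_[0], f.intercept_
print(f"learned f(x) = {slope:.2f} * x + {intercept:.2f}")
```

The learned slope and intercept should land close to 3 and 2. The remaining gap is the effect of the error term.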
Common Algorithms:
- Linear Regression
- Logistic Regression
- Decision Trees
- Random Forest
- Support Vector Machines
- Neural Networks
2. Unsupervised Learning
Finding patterns in unlabeled data:
- Clustering: Grouping similar data points
- Dimensionality Reduction: Reducing feature space
Common Algorithms:
- K-Means Clustering
- Hierarchical Clustering
- DBSCAN
- PCA (Principal Component Analysis)
- t-SNE
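A minimal sketch of both unsupervised tasks, using K-Means for clustering and PCA for dimensionality reduction. The synthetic blob data and the choice of 3 clusters and 2 components are assumptions made for the example.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA

# Synthetic data: 300 points in 5 dimensions around 3 centers (labels discarded)
X, _ = make_blobs(n_samples=300, n_features=5, centers=3, random_state=0)

# Clustering: assign each point to one of 3 groups
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
labels = kmeans.fit_predict(X)

# Dimensionality reduction: project the 5 features onto 2 components
pca = PCA(n_components=2)
X_2d = pca.fit_transform(X)

print("cluster sizes:", np.bincount(labels))
print("reduced shape:", X_2d.shape)
print("variance explained:", pca.explained_variance_ratio_.sum())
```

Neither step uses the true labels. The structure is inferred from the features alone.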
3. Reinforcement Learning
Learning through interaction with an environment: the agent takes actions and receives rewards as feedback.
Key Components:
- Agent: The learner
- Environment: What the agent interacts with
- Action: Possible moves the agent can make
- Reward: Feedback from the environment
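The four components above can be sketched with an epsilon-greedy multi-armed bandit, one of the simplest reinforcement learning setups. It is not a general RL algorithm. The reward probabilities, the exploration rate, and the function name `step` are hypothetical choices for this example.

```python
import random

random.seed(0)

# Environment: a 3-armed bandit; each arm pays 1 with a fixed probability
# (the probabilities are hypothetical values chosen for this example)
REWARD_PROBS = [0.2, 0.5, 0.8]

def step(action):
    """Environment response: return a reward of 1 or 0 for the chosen arm."""
    return 1 if random.random() < REWARD_PROBS[action] else 0

# Agent: keeps a running average reward estimate per action
estimates = [0.0] * len(REWARD_PROBS)
counts = [0] * len(REWARD_PROBS)
EPSILON = 0.1  # fraction of steps spent exploring at random

for _ in range(5000):
    # Action: explore with probability EPSILON, otherwise exploit the best estimate
    if random.random() < EPSILON:
        action = random.randrange(len(REWARD_PROBS))
    else:
        action = max(range(len(REWARD_PROBS)), key=lambda a: estimates[a])
    # Reward: feedback from the environment
    reward = step(action)
    # Learning: update the running average for the chosen action
    counts[action] += 1
    estimates[action] += (reward - estimates[action]) / counts[action]

best_action = max(range(len(REWARD_PROBS)), key=lambda a: estimates[a])
print("estimated values:", [round(e, 2) for e in estimates])
print("best action:", best_action)
```

After enough interaction the agent's estimates approach the true payout rates, and it settles on the arm with the highest reward.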
The ML Workflow
Data Collection → Data Preprocessing → Feature Engineering →
Model Selection → Training → Evaluation → Hyperparameter Tuning →
Deployment → Monitoring
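Several of these stages can be sketched in a few lines with scikit-learn. The example covers data collection through hyperparameter tuning and leaves out feature engineering, deployment, and monitoring. The dataset, the logistic regression model, and the grid of C values are assumptions made for illustration.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

# Data collection: a dataset bundled with scikit-learn stands in for real data
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y
)

# Preprocessing + model selection, chained so scaling is fit on training folds only
pipeline = Pipeline([
    ("scale", StandardScaler()),
    ("clf", LogisticRegression(max_iter=1000)),
])

# Hyperparameter tuning + training: 5-fold grid search over regularization strength C
search = GridSearchCV(pipeline, {"clf__C": [0.01, 0.1, 1.0, 10.0]}, cv=5)
search.fit(X_train, y_train)

# Evaluation on the held-out test set
test_accuracy = accuracy_score(y_test, search.predict(X_test))
print("best C:", search.best_params_["clf__C"])
print("test accuracy:", round(test_accuracy, 3))
```

Wrapping the scaler and the model in one pipeline keeps test data out of the preprocessing step during tuning.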
Model Evaluation Metrics
Classification Metrics
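Commonly used classification metrics include accuracy, precision, recall, and F1 score, all of which derive from the confusion matrix. The sketch below computes them on made-up labels. The label values are hypothetical, and the choice of these four metrics is an assumption.

```python
from sklearn.metrics import (
    accuracy_score, precision_score, recall_score, f1_score, confusion_matrix
)

# Hypothetical true labels and predictions for a binary problem
y_true = [1, 0, 1, 1, 0, 0, 1, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0, 1, 0]

# Rows are true classes, columns are predicted classes: [[TN, FP], [FN, TP]]
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()

accuracy = accuracy_score(y_true, y_pred)    # (TP + TN) / all samples
precision = precision_score(y_true, y_pred)  # TP / (TP + FP)
recall = recall_score(y_true, y_pred)        # TP / (TP + FN)
f1 = f1_score(y_true, y_pred)                # harmonic mean of precision and recall

print(f"TP={tp} TN={tn} FP={fp} FN={fn}")
print(f"accuracy={accuracy:.2f} precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

With 4 true positives, 4 true negatives, 1 false positive, and 1 false negative, all four metrics come out to 0.80.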
Regression Metrics
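Commonly used regression metrics include mean squared error (MSE), root mean squared error (RMSE), mean absolute error (MAE), and R². The sketch below computes them on made-up values. The numbers are hypothetical, and the choice of these four metrics is an assumption.

```python
import numpy as np
from sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score

# Hypothetical true values and predictions
y_true = np.array([3.0, 5.0, 2.0, 7.0])
y_pred = np.array([2.0, 5.0, 4.0, 8.0])

mse = mean_squared_error(y_true, y_pred)   # mean of squared errors
rmse = np.sqrt(mse)                        # same units as the target
mae = mean_absolute_error(y_true, y_pred)  # mean of absolute errors
r2 = r2_score(y_true, y_pred)              # 1 - SS_residual / SS_total

print(f"MSE={mse:.3f} RMSE={rmse:.3f} MAE={mae:.3f} R2={r2:.3f}")
```

The errors here are 1, 0, -2, and -1, so MSE is 6 / 4 = 1.5 and MAE is 4 / 4 = 1.0.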
Bias-Variance Tradeoff
- High Bias (Underfitting): Model is too simple, misses patterns
- High Variance (Overfitting): Model learns noise, doesn't generalize
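The two failure modes can be estimated numerically by refitting a model on many noisy copies of the same data. The sketch below does this for polynomial fits of degree 1 and degree 9. The true function sin(2πx), the noise level, the sample sizes, and the helper name `bias_variance` are assumptions made for the example.

```python
import numpy as np
from numpy.polynomial import Polynomial

rng = np.random.default_rng(0)

# Assumed setup: true function sin(2*pi*x), Gaussian noise, 20 fixed inputs
x = np.linspace(0, 1, 20)
true_y = np.sin(2 * np.pi * x)
NOISE_STD = 0.3
N_DATASETS = 200

def bias_variance(degree):
    """Estimate squared bias and variance of a polynomial fit of given degree."""
    preds = np.empty((N_DATASETS, x.size))
    for i in range(N_DATASETS):
        # Draw a fresh noisy training set and refit the model
        y_noisy = true_y + rng.normal(0, NOISE_STD, size=x.size)
        preds[i] = Polynomial.fit(x, y_noisy, degree)(x)
    mean_pred = preds.mean(axis=0)
    bias_sq = np.mean((mean_pred - true_y) ** 2)  # error of the average model
    variance = np.mean(preds.var(axis=0))         # spread across training sets
    return bias_sq, variance

simple_bias, simple_var = bias_variance(1)    # too simple: underfits
complex_bias, complex_var = bias_variance(9)  # flexible: tracks the noise
print(f"degree 1: bias^2={simple_bias:.3f}, variance={simple_var:.3f}")
print(f"degree 9: bias^2={complex_bias:.3f}, variance={complex_var:.3f}")
```

The straight line shows high bias and low variance. The degree-9 polynomial shows the reverse.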
Python Implementation
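from sklearn.datasets import make_classification, make_regression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LinearRegression
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, mean_squared_error

# Classification example (synthetic data stands in for a real dataset)
X, y = make_classification(n_samples=500, n_features=10, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Fit the scaler on the training split only, then reuse it on the test split
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)
model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X_train_scaled, y_train)
# Predict on the scaled test features, matching what the model was trained on
predictions = model.predict(X_test_scaled)
accuracy = accuracy_score(y_test, predictions)

# Regression example (needs a continuous target, so it uses its own synthetic data)
X_reg, y_reg = make_regression(n_samples=500, n_features=10, noise=10.0, random_state=42)
Xr_train, Xr_test, yr_train, yr_test = train_test_split(X_reg, y_reg, test_size=0.2, random_state=42)
reg_model = LinearRegression()
reg_model.fit(Xr_train, yr_train)
y_pred = reg_model.predict(Xr_test)
mse = mean_squared_error(yr_test, y_pred)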
Overfitting and Underfitting
Overfitting:
- Model is too complex for the amount of data
- High training accuracy, low test accuracy
- Remedies: regularization, more data, feature selection
Underfitting:
- Model is too simple to capture the pattern
- Low accuracy on both training and test data
- Remedies: more features, a more complex model
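Comparing training and test accuracy is the usual way to spot either problem. The sketch below varies the depth of a decision tree on synthetic data with label noise. The dataset parameters and the depth values are assumptions chosen for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic data; flip_y=0.2 assigns a random class to about 20% of samples,
# so there is label noise for a flexible model to memorize
X, y = make_classification(
    n_samples=600, n_features=20, n_informative=5, flip_y=0.2, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

results = {}
for depth in (1, 4, None):  # None lets the tree grow until it fits every sample
    tree = DecisionTreeClassifier(max_depth=depth, random_state=0)
    tree.fit(X_train, y_train)
    results[depth] = (tree.score(X_train, y_train), tree.score(X_test, y_test))
    print(f"max_depth={depth}: train={results[depth][0]:.2f}, test={results[depth][1]:.2f}")
```

The depth-1 tree scores modestly on both splits, which is the underfitting pattern. The unrestricted tree fits the training data perfectly but scores lower on the test split, which is the overfitting pattern.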
Cross-Validation
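k-fold cross-validation splits the data into k folds, trains on k - 1 of them, validates on the remaining fold, and repeats until each fold has served as the validation set once. The k scores are then averaged.
Common k values: 5, 10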
from sklearn.model_selection import cross_val_score
scores = cross_val_score(model, X, y, cv=5, scoring='accuracy')
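To show what the call above does internally, the sketch below writes the fold loop out by hand and compares it with `cross_val_score`. The synthetic data and the logistic regression model are assumptions made for the example. For a classifier, an integer `cv` uses stratified folds without shuffling, so the manual loop uses `StratifiedKFold`.

```python
import numpy as np
from sklearn.base import clone
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import StratifiedKFold, cross_val_score

X, y = make_classification(n_samples=300, n_features=8, random_state=0)
model = LogisticRegression(max_iter=1000)

manual_scores = []
# For a classifier, cv=5 corresponds to 5 stratified folds without shuffling
for train_idx, val_idx in StratifiedKFold(n_splits=5).split(X, y):
    fold_model = clone(model)  # fresh, unfitted copy for each fold
    fold_model.fit(X[train_idx], y[train_idx])
    fold_pred = fold_model.predict(X[val_idx])
    manual_scores.append(accuracy_score(y[val_idx], fold_pred))

library_scores = cross_val_score(model, X, y, cv=5, scoring="accuracy")
print("manual :", np.round(manual_scores, 3))
print("library:", np.round(library_scores, 3))
print("mean accuracy:", np.mean(manual_scores))
```

Reporting the mean of the fold scores, and ideally their spread, gives a steadier estimate than any single split.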
Key Takeaways
- ML has three main types: supervised, unsupervised, and reinforcement
- Choose the right algorithm based on problem type
- Evaluation metrics vary by problem type
- Bias-variance tradeoff is fundamental to model performance
- Cross-validation gives more reliable performance estimates than a single train/test split