Introduction
Creating effective visualizations requires understanding design principles, choosing the right chart, and avoiding common pitfalls.
Key Principles
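- Know your audience - Adjust detail and terminology for technical vs. general readers
- Choose appropriate charts - Match the chart to the data type and the message
- Keep it simple - Remove chart junk that does not convey data
- Use color effectively - Make it meaningful and accessible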
Choosing the Right Chart
| Goal | Chart Type |
|---|---|
| Comparison | Bar chart, Box plot |
| Distribution | Histogram, KDE, Violin |
| Relationship | Scatter plot, Line plot |
| Composition | Pie chart, Stacked bar |
| Trend | Line chart, Area chart |
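The table above can also be encoded as a small lookup, which is handy when a plotting script has to pick a default chart. The sketch below is only an illustration: the names `CHART_OPTIONS` and `suggest_charts` are hypothetical and not part of any library.

```python
# Lookup mirroring the table above; keys are lowercase for case-insensitive matching.
CHART_OPTIONS = {
    "comparison": ["Bar chart", "Box plot"],
    "distribution": ["Histogram", "KDE", "Violin"],
    "relationship": ["Scatter plot", "Line plot"],
    "composition": ["Pie chart", "Stacked bar"],
    "trend": ["Line chart", "Area chart"],
}


def suggest_charts(goal):
    """Return the candidate chart types for a goal (hypothetical helper)."""
    key = goal.strip().lower()
    if key not in CHART_OPTIONS:
        raise ValueError(
            f"Unknown goal {goal!r}; expected one of {sorted(CHART_OPTIONS)}"
        )
    # Return a copy so callers cannot mutate the shared table.
    return list(CHART_OPTIONS[key])


print(suggest_charts("Trend"))  # ['Line chart', 'Area chart']
```

The lookup only narrows the options; the final choice still depends on the message the chart has to carry.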
Data-Ink Ratio
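import matplotlib.pyplot as plt
import numpy as np
# Illustrative sample data so both examples run as written
x = np.linspace(0, 10, 100)
y = np.sin(x)
# Bad - too much clutter: oversized figure, heavy line, dashed grid, large bold text
plt.figure(figsize=(12, 8))
plt.plot(x, y, 'b-', linewidth=2)
plt.grid(True, linestyle='--')
plt.box(False)
plt.xlabel('Time', fontsize=14)
plt.ylabel('Value', fontsize=14)
plt.title('Time Series', fontsize=16, fontweight='bold')
# Good - minimal, essential: only the data line and the axis labels
plt.figure(figsize=(8, 5))
plt.plot(x, y, 'b-', linewidth=1.5)
plt.xlabel('Time')
plt.ylabel('Value')
plt.show()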
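One way to apply the same idea to any existing plot is a small cleanup function that removes ink carrying no data. The sketch below assumes the object-oriented matplotlib interface; `declutter` is a hypothetical helper name, and which elements count as clutter is a judgment call rather than a fixed rule.

```python
import matplotlib.pyplot as plt
import numpy as np


def declutter(ax):
    """Strip common non-data ink from an Axes (hypothetical helper)."""
    # The top and right spines frame the plot but encode no values.
    ax.spines["top"].set_visible(False)
    ax.spines["right"].set_visible(False)
    # Turn the grid off; re-enable it only if readers must look up values.
    ax.grid(False)
    # Shorter tick marks draw less attention than the defaults.
    ax.tick_params(length=3)
    return ax


# Illustrative sample data (an assumption for this sketch).
x = np.linspace(0, 10, 100)
y = np.sin(x)

fig, ax = plt.subplots(figsize=(8, 5))
ax.plot(x, y, linewidth=1.5)
ax.set_xlabel("Time")
ax.set_ylabel("Value")
declutter(ax)
```

Call `plt.show()` or `fig.savefig(...)` afterwards to view the result.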
Color Usage
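# Assumes `categories` (labels) and `values` (numbers) are already defined
# Qualitative palette - categorical data with no inherent order
colors = plt.cm.Set2.colors
plt.bar(categories, values, color=colors)
# Sequential palette - ordered data (shade follows bar position, light to dark)
colors = plt.cm.Blues(np.linspace(0.3, 1, len(values)))
plt.bar(categories, values, color=colors)
# Diverging palette - differences around a midpoint (shade follows bar position, red to blue)
colors = plt.cm.RdBu(np.linspace(0, 1, len(values)))
plt.bar(categories, values, color=colors)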
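The diverging example above assigns colors by bar position. When the bars are not sorted, it can be clearer to derive each color from the value itself, with the neutral midpoint of the palette pinned to zero. The following is a minimal sketch under that assumption; `diverging_colors` is a hypothetical helper and the data is made up for illustration.

```python
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.colors import Normalize


def diverging_colors(values, cmap_name="RdBu"):
    """Map values to a diverging colormap centered on zero (hypothetical helper)."""
    values = np.asarray(values, dtype=float)
    # Use a symmetric range so that zero lands exactly on the colormap midpoint.
    limit = np.abs(values).max()
    if limit == 0:
        limit = 1.0  # all zeros: any symmetric range gives the neutral color
    norm = Normalize(vmin=-limit, vmax=limit)
    return plt.get_cmap(cmap_name)(norm(values))


# Illustrative data (an assumption for this sketch).
categories = ["A", "B", "C", "D"]
changes = [-4.0, -1.5, 0.0, 3.0]

fig, ax = plt.subplots()
ax.bar(categories, changes, color=diverging_colors(changes))
ax.axhline(0, color="black", linewidth=0.8)  # make the midpoint explicit
```

With `RdBu`, negative values come out red and positive values blue; a value of zero gets the near-white center of the palette.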
Accessibility
# Use colorblind-friendly palettes (style name for Matplotlib 3.6+; older releases call it 'seaborn-colorblind')
plt.style.use('seaborn-v0_8-colorblind')
# Add patterns for distinction
plt.bar(categories, values, hatch='//', color='gray', edgecolor='black')
plt.bar(categories2, values2, hatch='xx', color='white', edgecolor='black')  # edge keeps white bars visible
# Label directly instead of legend
for i, v in enumerate(values):
    plt.text(i, v + 1, str(v), ha='center')
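The hatching and direct-labeling ideas above can be combined in one chart. This sketch assumes Matplotlib 3.4 or newer, where `Axes.bar_label` is available, and uses made-up category names and values.

```python
import matplotlib.pyplot as plt

# Illustrative data (an assumption for this sketch).
categories = ["North", "South", "East"]
values = [12, 30, 21]
hatches = ["//", "xx", ".."]  # one pattern per bar, readable without color

fig, ax = plt.subplots()
bars = ax.bar(categories, values, color="white", edgecolor="black")

# Give every bar its own hatch pattern so bars differ even in grayscale.
for bar, hatch in zip(bars, hatches):
    bar.set_hatch(hatch)

# Write each value above its bar instead of relying on a legend or gridlines.
labels = ax.bar_label(bars, padding=3)
```

Compared with the manual `plt.text` loop, `bar_label` places the text relative to each bar, so no data-dependent offset such as `v + 1` is needed.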
Storytelling with Data
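# Progressive reveal: each panel adds one layer to the same data
# Assumes a DataFrame `df` with columns x and y, a precomputed `trend` array, and an `event_date` on the x-axis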
fig, axes = plt.subplots(1, 3, figsize=(15, 5))
# Plot 1: Raw data
axes[0].scatter(df.x, df.y)
axes[0].set_title('Raw Data')
# Plot 2: With trend
axes[1].scatter(df.x, df.y)
axes[1].plot(df.x, trend, 'r-')
axes[1].set_title('With Trend')
# Plot 3: Annotated
axes[2].scatter(df.x, df.y)
axes[2].plot(df.x, trend, 'r-')
axes[2].axvline(x=event_date, color='green', linestyle='--')
axes[2].set_title('Key Event Highlighted')
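The panels above plot a `trend` array without showing where it comes from. One plausible choice, assumed here purely for illustration, is a least-squares straight line fitted with `np.polyfit`; `linear_trend` is a hypothetical helper, and other smoothers (for example a rolling mean) would fit the same slot.

```python
import numpy as np


def linear_trend(x, y):
    """Return the fitted straight-line values for each x (hypothetical helper)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Degree-1 polyfit returns the coefficients highest power first.
    slope, intercept = np.polyfit(x, y, deg=1)
    return slope * x + intercept


# Illustrative data (an assumption for this sketch).
x = np.array([0, 1, 2, 3, 4])
y = np.array([1.0, 3.1, 4.9, 7.2, 8.8])
trend = linear_trend(x, y)  # same length as x, ready for axes[1].plot(x, trend, 'r-')
```

If the x values are dates, convert them to numbers first (for example days since the first observation), since `np.polyfit` expects numeric input.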
Common Mistakes
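- 3D charts for simple data - Perspective distorts perceived sizes and positions
- Truncated Y-axis - Exaggerates small differences, especially in bar charts
- Dual Y-axes - Arbitrary scaling can suggest relationships that are not there
- Too many pie slices - Angles and areas become hard to compare
- Missing axis labels - Readers cannot tell what is plotted or in which units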
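The truncated Y-axis item can be made concrete with a little arithmetic: a bar's drawn height is its value minus the axis baseline, so raising the baseline inflates the apparent ratio between two bars. The helper `visual_ratio` below is hypothetical and the numbers are made up for illustration.

```python
def visual_ratio(a, b, baseline=0.0):
    """Ratio of drawn bar heights for values a and b when the axis starts at baseline."""
    if a <= baseline or b <= baseline:
        raise ValueError("both values must lie above the axis baseline")
    # Drawn height is the distance from the baseline, not from zero.
    return (b - baseline) / (a - baseline)


# Two values that differ by 5 percent.
print(visual_ratio(100, 105))               # 1.05 with a zero baseline
print(visual_ratio(100, 105, baseline=95))  # 2.0 when the axis starts at 95
```

With the axis starting at 95, the second bar is drawn twice as tall as the first even though the underlying values differ by only 5 percent.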
Key Takeaways
- Match chart type to data and message
- Minimize clutter, maximize data-ink
- Use color meaningfully and accessibly
- Tell a clear story with your visualization