Saturday, July 12, 2025

A visual introduction to ML | Tuning | Bias-Variance

I came across R2D3's interactive guide to machine learning basics (Parts 1 & 2) and thought it'd be worth sharing. It's a visual explanation that uses a dataset of homes in San Francisco and New York as a running classification example.

Part 1: Basics of ML and Decision Trees

  • ML uses statistical techniques to learn patterns in data and make predictions, e.g., classifying homes by features like elevation and price per square foot.
  • Decision trees draw classification boundaries via if-then splits (forks) on individual variables, recursively adding branches until each region is (mostly) a single class (sketched in code after this list).
  • Training grows the tree to maximize accuracy on known data, but an over-grown tree overfits: it memorizes quirks of the training set and performs poorly on unseen test data.
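To make the fork idea concrete, here's a minimal sketch using scikit-learn on a small synthetic dataset. The elevations, prices, and noise levels are invented for illustration; this is not R2D3's actual data, just the same two-feature setup.

```python
# Sketch of Part 1: fit a shallow decision tree on synthetic SF-vs-NY homes.
# All numbers below are made up for illustration.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, export_text

rng = np.random.default_rng(0)
n = 200
# Hypothetical homes: columns are [elevation (m), price per sq ft ($)].
ny = np.column_stack([rng.normal(15, 10, n), rng.normal(1400, 300, n)])
sf = np.column_stack([rng.normal(60, 25, n), rng.normal(900, 250, n)])
X = np.vstack([ny, sf])
y = np.array([0] * n + [1] * n)  # 0 = New York, 1 = San Francisco

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Each internal node of the tree is one if-then split on a single feature.
tree = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X_train, y_train)
print(export_text(tree, feature_names=["elevation", "price_per_sqft"]))
print("test accuracy:", tree.score(X_test, y_test))
```

export_text prints the learned forks as nested if-then rules, which is essentially what the R2D3 visualization animates.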

Part 2: Bias-Variance Tradeoff

  • Models have tunable parameters (e.g., minimum node size) to control complexity.
  • High bias: Overly simple models (e.g., a single-split "stump") ignore nuances, causing systematic errors.
  • High variance: Overly complex models fit quirks of the particular training sample, so their predictions swing unpredictably on new data.
  • Optimal models balance bias and variance to minimize total error; deeper trees reduce bias but increase variance (the depth sweep below makes this concrete).
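A quick way to see the tradeoff is to sweep a complexity knob and compare training vs. test accuracy. The sketch below (again scikit-learn, with invented noisy data) sweeps max_depth; min_samples_leaf, closer to the guide's "minimum node size", behaves analogously.

```python
# Sketch of Part 2: vary tree depth and watch train vs. test accuracy diverge.
# Shallow trees underfit (high bias); deep trees memorize noise (high variance).
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(1)
n = 600
X = rng.normal(0, 1, (n, 2))
# Noisy labels so a fully grown tree has quirks to memorize.
y = (X[:, 0] + X[:, 1] + rng.normal(0, 1.0, n) > 0).astype(int)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=1)

for depth in [1, 2, 4, 8, None]:  # None = grow until every leaf is pure
    tree = DecisionTreeClassifier(max_depth=depth, random_state=1)
    tree.fit(X_train, y_train)
    print(f"depth={str(depth):>4}  "
          f"train={tree.score(X_train, y_train):.2f}  "
          f"test={tree.score(X_test, y_test):.2f}")
```

Training accuracy climbs toward 1.0 as depth grows, while test accuracy peaks at a moderate depth and then degrades: the point where variance starts to dominate bias.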

The guide was created by Stephanie Yee (statistician) and Tony Chu (designer) at R2D3.us. It's great for building intuition; check it out if you're interested.
