Visualizing Why Machines Learn

Interactive visuals to accompany the book.

I recently read Anil Ananthaswamy’s great book Why Machines Learn. I enjoyed learning about the story behind the development of deep learning, most of which I didn’t know! While reading it, I repeatedly found myself wishing that the visuals were better, so I made this blog post to accompany the book. I hope it helps!

We start with Frank Rosenblatt’s development of the perceptron, one of the first neural networks.

The Perceptron

The perceptron is the simplest neural network — a single neuron that learns to separate two classes with a linear boundary.

Given an input point $(x_1, x_2)$, the perceptron computes:

\[y = \text{sign}(w_0 + w_1 x_1 + w_2 x_2)\]

where $w_0$ is the bias and $w_1, w_2$ are the weights. The decision boundary is the line where the weighted sum equals zero:

\[w_0 + w_1 x_1 + w_2 x_2 = 0\]
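
In code, the forward pass is just a weighted sum followed by a threshold. Here is a minimal sketch in plain Python (storing the weights as a single list $(w_0, w_1, w_2)$ is my own convention for illustration):

```python
def predict(w, x):
    """Perceptron output for a 2-D point x = (x1, x2).

    w is (w0, w1, w2): the bias followed by the two weights.
    Returns +1 or -1 depending on which side of the boundary x lies.
    """
    s = w[0] + w[1] * x[0] + w[2] * x[1]  # w0 + w1*x1 + w2*x2
    return 1 if s >= 0 else -1            # sign, breaking ties toward +1
```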

Learning Rule

When the perceptron misclassifies a point, it updates its weights:

\[w_i \leftarrow w_i + \eta \cdot y \cdot x_i\]

where $\eta$ is the learning rate and $y \in \{-1, +1\}$ is the true label. The bias is handled by taking $x_0 = 1$, so $w_0$ receives the update $\eta \cdot y$.

This nudges the decision boundary toward correctly classifying the point. The perceptron convergence theorem guarantees that if the data is linearly separable, this process finds a separating boundary after finitely many updates.
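
The whole algorithm fits in a few lines. Here is a sketch of the training loop, reusing the `predict` helper above (the zero initialization and the `max_epochs` safeguard are my additions, not part of the classic statement):

```python
def train_perceptron(points, labels, eta=0.01, max_epochs=1000):
    """Sweep the data, nudging the weights on every mistake.

    points: list of (x1, x2) pairs; labels: matching +1/-1 values.
    """
    w = [0.0, 0.0, 0.0]  # (w0, w1, w2)
    for _ in range(max_epochs):
        mistakes = 0
        for (x1, x2), y in zip(points, labels):
            if predict(w, (x1, x2)) != y:  # misclassified: apply the rule
                w[0] += eta * y            # bias update, since x0 = 1
                w[1] += eta * y * x1
                w[2] += eta * y * x2
                mistakes += 1
        if mistakes == 0:                  # a full clean pass: done
            break
    return w
```

Because the update only fires on mistakes, a linearly separable dataset eventually produces a clean pass and the loop stops.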

Interactive Demo

Watch the perceptron learn! The visualization starts with a deliberately bad decision boundary (splitting each cluster in half) so you can see the learning process.

(The demo's readout tracks the current step, the number of misclassified points, and the learning rate, here $\eta = 0.01$.)

What to Notice

  1. Single point updates (batch=1): The boundary wobbles as it responds to individual points
  2. Larger batches: Smoother convergence as the per-point updates are averaged (see the sketch after this list)
  3. The weight deltas: Blue points push the boundary one way, red points push it the other
  4. Convergence: The algorithm always finds a solution for linearly separable data
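
To make item 2 concrete, here is one way a batched update could work; this is an assumption about the demo's behavior, not something stated in the book. The per-point deltas are accumulated over the batch and applied as a mean:

```python
def batch_update(w, batch, eta=0.01):
    """One averaged perceptron update over a list of ((x1, x2), y) pairs.

    Only misclassified points contribute a delta; averaging means no
    single point can yank the boundary on its own.
    """
    deltas = [0.0, 0.0, 0.0]
    for (x1, x2), y in batch:
        if predict(w, (x1, x2)) != y:
            deltas[0] += y        # x0 = 1 for the bias
            deltas[1] += y * x1
            deltas[2] += y * x2
    n = max(len(batch), 1)        # guard against an empty batch
    return [wi + eta * d / n for wi, d in zip(w, deltas)]
```

With batch size 1 this reduces to the single-point rule above, which is why the boundary wobbles point by point in that mode.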

More visualizations coming soon: gradient descent, neural network forward/backward passes, Hopfield networks…