Gradient Descent
Definition
Gradient descent is an optimization algorithm used to minimize a loss function by iteratively adjusting model parameters in the direction that reduces error. It’s the core method used to train most machine learning models, including neural networks.
Nuance
The training algorithm computes the gradient (slope) of the loss function with respect to the model's parameters, then moves the parameters a small step in the opposite direction, like walking downhill, so the loss decreases. Repeating this over many iterations gradually drives the loss down.
Follow the slope down the error hill
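A minimal sketch of this update rule on a one-dimensional quadratic loss; the loss function, starting value, learning rate, and step count are illustrative choices, not part of the algorithm itself.

```python
# Minimal gradient descent sketch: minimize loss(w) = (w - 3)^2.
# Loss, starting point, learning rate, and step count are illustrative.

def loss(w):
    return (w - 3.0) ** 2

def grad(w):
    # Derivative of (w - 3)^2 with respect to w.
    return 2.0 * (w - 3.0)

w = 0.0              # initial parameter value
learning_rate = 0.1  # step size

for step in range(50):
    w -= learning_rate * grad(w)  # step against the gradient

print(round(w, 4), round(loss(w), 6))  # w approaches 3, loss approaches 0
```

Real models have millions of parameters, but each one is updated with this same rule; the gradients are computed by backpropagation.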
Examples
- Used to train a neural network to recognize handwriting by reducing classification error
- Fine-tuning a language model by nudging its weights based on how far off its answers are
- Adjusting logistic regression parameters to better separate spam vs. non-spam emails (see the sketch after this list)
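As a rough sketch of the last example, here is batch gradient descent on a tiny logistic-regression spam classifier; the feature values, labels, and hyperparameters are invented purely for illustration.

```python
import numpy as np

# Toy spam data: each row is [num_links, num_exclamation_marks].
# Labels: 1 = spam, 0 = not spam. Data and hyperparameters are made up.
X = np.array([[5.0, 8.0], [4.0, 6.0], [0.0, 1.0], [1.0, 0.0]])
y = np.array([1.0, 1.0, 0.0, 0.0])

w = np.zeros(2)      # weights, one per feature
b = 0.0              # bias term
learning_rate = 0.1

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

for _ in range(1000):
    p = sigmoid(X @ w + b)                         # predicted spam probabilities
    error = p - y                                  # gradient of cross-entropy w.r.t. logits
    w -= learning_rate * (X.T @ error) / len(y)    # gradient step for weights
    b -= learning_rate * error.mean()              # gradient step for bias

print(sigmoid(X @ w + b).round(2))  # probabilities move toward the labels
```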
Pros
- Simple and widely applicable
- Scales to high-dimensional problems
- Can be refined with techniques like momentum, Adam, or RMSprop (momentum is sketched after this list)
- Forms the basis of most deep learning training
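One of the refinements mentioned above, momentum, shown on the same toy quadratic loss; Adam and RMSprop build on the same idea with adaptive per-parameter step sizes. The decay factor and other values here are illustrative.

```python
# Gradient descent with momentum: accumulate a velocity so steps build up
# along consistent directions and oscillations are damped. Values are illustrative.

def grad(w):
    return 2.0 * (w - 3.0)  # same toy quadratic loss as above

w = 0.0
velocity = 0.0
learning_rate = 0.1
beta = 0.9  # how much past velocity is retained each step

for _ in range(200):
    velocity = beta * velocity + grad(w)  # blend old velocity with new gradient
    w -= learning_rate * velocity

print(round(w, 4))  # converges toward the minimum at w = 3
```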
Cons
- Can get stuck in local minima or saddle points
- Sensitive to the learning rate (too big = overshoot, too small = slow; see the comparison after this list)
- Requires differentiable functions
- May take many iterations to converge
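To make the learning-rate point concrete, a small comparison on the same toy quadratic; the specific step sizes are arbitrary examples.

```python
# Learning-rate sensitivity on loss(w) = (w - 3)^2: too large a step overshoots
# and diverges, too small a step barely moves. Step sizes are arbitrary.

def grad(w):
    return 2.0 * (w - 3.0)

for learning_rate in (1.1, 0.001, 0.1):
    w = 0.0
    for _ in range(50):
        w -= learning_rate * grad(w)
    print(f"lr={learning_rate}: w after 50 steps = {w:.3f}")

# lr=1.1   -> w blows up (each step overshoots the minimum and grows)
# lr=0.001 -> w creeps toward 3 very slowly
# lr=0.1   -> w lands close to the minimum at 3
```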