Gradient Descent
Definition
Gradient descent is an optimization algorithm used to minimize a loss function by iteratively adjusting model parameters in the direction that reduces error. It’s the core method used to train most machine learning models, including neural networks.
Nuance
The training algorithm computes the gradient (slope) of the loss function with respect to the model's parameters, then moves the parameters a small step in the opposite direction, like walking downhill, so the loss decreases. Repeating this over many iterations gradually drives the loss down.
Follow the slope down the error hill
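A minimal sketch of this update rule on a one-dimensional quadratic loss; the loss function, starting value, learning rate, and step count are illustrative choices, not part of the algorithm itself.

```python
# Minimal gradient descent sketch: minimize loss(w) = (w - 3)^2.
# Loss, starting point, learning rate, and step count are illustrative.

def loss(w):
    return (w - 3.0) ** 2

def grad(w):
    # Derivative of (w - 3)^2 with respect to w.
    return 2.0 * (w - 3.0)

w = 0.0              # initial parameter value
learning_rate = 0.1  # step size

for step in range(50):
    w -= learning_rate * grad(w)  # step against the gradient

print(round(w, 4), round(loss(w), 6))  # w approaches 3, loss approaches 0
```

Real models have millions of parameters, but each one is updated with this same rule; the gradients are computed by backpropagation.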
Examples
- Used to train a neural network to recognize handwriting by reducing classification error
- Fine-tuning a language model by nudging its weights based on how far off its answers are
- Adjusting logistic regression parameters to better separate spam vs. non-spam emails (see the sketch after this list)
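As a rough sketch of the last example, here is batch gradient descent on a tiny logistic-regression spam classifier; the feature values, labels, and hyperparameters are invented purely for illustration.

```python
import numpy as np

# Toy spam data: each row is [num_links, num_exclamation_marks].
# Labels: 1 = spam, 0 = not spam. Data and hyperparameters are made up.
X = np.array([[5.0, 8.0], [4.0, 6.0], [0.0, 1.0], [1.0, 0.0]])
y = np.array([1.0, 1.0, 0.0, 0.0])

w = np.zeros(2)      # weights, one per feature
b = 0.0              # bias term
learning_rate = 0.1

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

for _ in range(1000):
    p = sigmoid(X @ w + b)                         # predicted spam probabilities
    error = p - y                                  # gradient of cross-entropy w.r.t. logits
    w -= learning_rate * (X.T @ error) / len(y)    # gradient step for weights
    b -= learning_rate * error.mean()              # gradient step for bias

print(sigmoid(X @ w + b).round(2))  # probabilities move toward the labels
```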
Pros
- Simple and widely applicable
- Scales to high-dimensional problems
- Can be refined with techniques like momentum, Adam, or RMSprop (momentum is sketched after this list)
- Forms the basis of most deep learning training
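One of the refinements mentioned above, momentum, shown on the same toy quadratic loss; Adam and RMSprop build on the same idea with adaptive per-parameter step sizes. The decay factor and other values here are illustrative.

```python
# Gradient descent with momentum: accumulate a velocity so steps build up
# along consistent directions and oscillations are damped. Values are illustrative.

def grad(w):
    return 2.0 * (w - 3.0)  # same toy quadratic loss as above

w = 0.0
velocity = 0.0
learning_rate = 0.1
beta = 0.9  # how much past velocity is retained each step

for _ in range(200):
    velocity = beta * velocity + grad(w)  # blend old velocity with new gradient
    w -= learning_rate * velocity

print(round(w, 4))  # converges toward the minimum at w = 3
```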
Cons
- Can get stuck in local minima or saddle points
- Sensitive to the learning rate (too big = overshoot, too small = slow; see the comparison after this list)
- Requires differentiable functions
- May take many iterations to converge
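To make the learning-rate point concrete, a small comparison on the same toy quadratic; the specific step sizes are arbitrary examples.

```python
# Learning-rate sensitivity on loss(w) = (w - 3)^2: too large a step overshoots
# and diverges, too small a step barely moves. Step sizes are arbitrary.

def grad(w):
    return 2.0 * (w - 3.0)

for learning_rate in (1.1, 0.001, 0.1):
    w = 0.0
    for _ in range(50):
        w -= learning_rate * grad(w)
    print(f"lr={learning_rate}: w after 50 steps = {w:.3f}")

# lr=1.1   -> w blows up (each step overshoots the minimum and grows)
# lr=0.001 -> w creeps toward 3 very slowly
# lr=0.1   -> w lands close to the minimum at 3
```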