How a neural network learns · Step 4 of 4
Gradient descent: taking small steps downhill, over and over
Last step you found the direction each weight should move. Now you actually take the step — and then do it again, and again.
Question 1 of 3
The gradient points in the direction that makes the loss go UP. To shrink the loss, which way do you nudge each weight?
You said: Along the gradient (the same direction it points)
Not quite. That would push the loss higher. The gradient points uphill, toward more error, so to lower the loss you step the OPPOSITE way: against the gradient. That subtraction is the whole move: new weight = old weight minus a little bit of the gradient.
You said: Against the gradient (the opposite direction)
Exactly. The gradient points uphill toward more loss, so you go the other way to head downhill. Concretely: new weight = old weight minus a small slice of the gradient. That single subtraction, done to every weight, is one step of gradient descent.
You said: Whichever way makes the weight bigger
Close. It depends on the gradient's sign, not on making weights bigger. You always step AGAINST the gradient: new weight = old weight minus a small slice of the gradient. If that gradient is negative, the weight goes up; if positive, it goes down.
You said: I'm not sure
No worries. You step against the gradient. Since the gradient points uphill toward more loss, going the opposite way heads downhill toward less. The rule: new weight = old weight minus a small slice of the gradient.
Another way to see it
Picture standing on a foggy hillside wanting to reach the valley. You can only feel the slope under your feet — that's the gradient, pointing uphill. So you take a small step in the exact opposite direction, downhill. Each weight is one such step against its own slope.
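The subtraction rule can be sketched in a few lines of Python. This is a minimal illustration, not a real training routine: the function name `step_against_gradient`, the slice size of 0.1, and the example numbers are all assumptions picked for the demonstration.

```python
def step_against_gradient(weight, gradient, fraction=0.1):
    """Return the weight after one step against the gradient.

    `fraction` is a hypothetical "small slice" size chosen for illustration.
    """
    return weight - fraction * gradient


# Positive gradient: the loss rises as the weight rises, so the weight goes down.
down = step_against_gradient(weight=2.0, gradient=4.0)

# Negative gradient: the loss rises as the weight falls, so the weight goes up.
up = step_against_gradient(weight=2.0, gradient=-4.0)

print(down < 2.0, up > 2.0)  # prints: True True
```

The same subtraction handles both cases; the sign of the gradient alone decides whether the weight moves up or down.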
So the step is a subtraction. But how BIG a slice of the gradient should you subtract?
Question 2 of 3
Why subtract only a SMALL slice of the gradient each time, instead of one big jump?
You said: The gradient only tells you the direction nearby; take too big a step and you can overshoot the valley
Exactly right. The slope you measured is only trustworthy close to where you're standing. A giant leap can fly past the bottom or even bounce to a worse spot, so you take small steps and re-measure the slope after each. That step size is called the learning rate.
You said: Small steps make the math simpler to compute
Not quite. The size of the step doesn't change how hard the math is: computing the gradient costs the same either way. The real reason is that the slope is only reliable nearby: a too-big step overshoots the valley. That tuned step size is the learning rate.
You said: Small steps guarantee you reach the exact lowest point
Close. Small steps help, but they don't guarantee the exact bottom: you can settle near it or in a local dip. The actual reason for going small is that the slope is only trustworthy nearby, so a big jump can overshoot. That step size is the learning rate.
You said: I'm not sure
No worries. The gradient only describes the slope right where you're standing. Step too far and you can sail past the valley floor or land somewhere worse, so you nudge a little and re-measure. That tuned step size is called the learning rate.
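To see overshooting in numbers, here is a small sketch on a made-up one-weight loss. Everything in it is an assumption for illustration: the loss `w ** 2` (with its valley floor at `w = 0`), the helper names, and the two learning rates being compared.

```python
def loss(w):
    # A hypothetical one-weight loss whose valley floor sits at w = 0.
    return w ** 2


def gradient(w):
    # Slope of w ** 2 at the current position.
    return 2 * w


def descend(w, learning_rate, steps=5):
    """Take `steps` gradient descent steps, re-measuring the slope each time."""
    for _ in range(steps):
        w = w - learning_rate * gradient(w)
    return w


start = 1.0
small = descend(start, learning_rate=0.1)  # creeps toward the valley floor
large = descend(start, learning_rate=1.1)  # leaps past it and lands higher each time

print(loss(small) < loss(start))  # prints: True
print(loss(large) > loss(start))  # prints: True
```

With the small learning rate the weight shrinks toward 0 on every step. With the large one each step jumps across the valley to the other side and ends up farther from the bottom than before, so the loss grows instead of shrinking.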
Now put the whole loop together — one full pass of how a network learns from a single example.
Question 3 of 3
A network predicts on one example, and the loss comes out high. Put the next moves in order to complete one learning step.
You said: Score the prediction (loss), find the gradient, then nudge each weight a little against the gradient
Exactly. That's the full loop: predict, score with the loss, find the gradient (the direction), then step each weight a little downhill against it, and repeat on the next example. Run that thousands of times and the loss slowly drops. THAT repeated loop is what 'a neural network learns' actually means.
You said: Nudge the weights first, then find the gradient, then score the result
Not quite. The order is backwards: you can't know which way to nudge until you've found the gradient, and you can't find the gradient until you've scored the prediction. The loop is: score (loss), find the gradient, THEN step each weight against it. Repeat across many examples and the loss falls.
You said: Find the gradient, then jump each weight straight to its best value in one move
Close. You've got the gradient step in the right place, but there's no single jump to the 'best' value: the gradient only gives a direction, so you take a SMALL step against it and repeat. The loop is: score, find the gradient, nudge a little, then do it all again on the next example.
You said: I'm not sure
No worries. The order is: score the prediction with the loss, find the gradient (the direction of steepest increase), then nudge each weight a little the opposite way. Repeat that across many examples and the loss slowly drops; that repeating loop is what learning is.
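The whole loop can be sketched on a toy "network" with a single weight. This is only an illustration under stated assumptions: the model `prediction = weight * x`, the squared-error loss, the made-up data following `y = 3 * x`, and the learning rate of 0.01 are not from the lesson. Real networks have many weights and compute the gradient with backpropagation.

```python
# Hypothetical data: the hidden rule is y = 3 * x, which the weight must discover.
examples = [(1.0, 3.0), (2.0, 6.0), (3.0, 9.0)]

weight = 0.0          # the single weight of this toy "network"
learning_rate = 0.01  # assumed step size


def total_loss(w):
    # Squared error summed over all examples, used only to track progress.
    return sum((w * x - y) ** 2 for x, y in examples)


loss_before = total_loss(weight)

for _ in range(200):                                # repeat the loop many times
    for x, y in examples:
        prediction = weight * x                     # 1. predict
        loss = (prediction - y) ** 2                # 2. score with the loss
        gradient = 2 * (prediction - y) * x         # 3. find the gradient of that loss
        weight = weight - learning_rate * gradient  # 4. step a little against it

loss_after = total_loss(weight)
print(loss_after < loss_before)  # prints: True
```

After enough passes the weight settles very close to 3, the value that makes every prediction match its target. No single step jumps there; many small steps against the gradient get it there.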
The takeaway
Gradient descent is just: step every weight a little AGAINST the gradient (downhill), then re-measure and repeat. Predict, score, find the direction, take a small step — looped over many examples until the loss slowly drops. That loop IS learning.
The pattern
You can now trace the full learning loop end to end: a forward pass produces a prediction, a loss function scores how wrong it is, the gradient says which way to nudge each weight, and gradient descent takes that small step downhill — repeated over many examples until the loss shrinks. That four-part cycle is the engine behind training any neural network, and you can now read the words 'backprop', 'learning rate', and 'optimizer' knowing exactly where they plug in. From here, hand off to the tutor to go deeper: how gradients are actually computed through layers (backpropagation), how the step size (learning rate) is chosen, and why batches, activation functions, and more layers change what the network can learn.
That's one thread. The real tutor doesn't stop here — it remembers what connected for you and keeps building the map, at your pace or against your deadline.
Or make it about your topic:
The real tutor would keep building this with you, step by step, and remember where you are.
No shame in this
Still fuzzy after two angles? That's the exact moment the real tutor is built for — it works out which step is tripping you, re-explains from a direction that fits how you think, and checks you've actually got it before moving on. This preview can't adapt to you. The tutor does.