How a neural network learns · Step 3 of 4
The gradient: which way to nudge each weight
Last step gave you one number — the loss — that says how wrong the network is. Now we turn that number into action: which weights to change, and which way.
Question 1 of 3
You have the loss — one number measuring how wrong the prediction is. To improve, you need to change the weights. What's the missing piece the gradient supplies?
You said: For each weight, which direction of change would raise the loss
Exactly. The gradient is one number per weight, and each one tells you the direction that would INCREASE the loss. That's the lever you need: if you know the way up, you know to go the opposite way.
You said: The single best new value to set every weight to at once
Not quite. The gradient doesn't hand you final values. For each weight it gives a direction: which way of nudging it would increase the loss. You then step the opposite way. It's a compass, not a destination.
You said: How wrong the prediction was, broken down per output
Close, but that's still measuring wrongness, and that's the loss's job. The gradient goes further: for each weight it says which direction of change would increase the loss, so you know which way to move it.
You said: I'm not sure
No worries. Here's the key: the gradient gives one number per weight, and each says which direction of nudging that weight would INCREASE the loss. Knowing the way up tells you to step the opposite way, down.
Another way to see it
Picture standing on a foggy hill where height = loss. You can't see the valley, but you can feel the slope under your feet — which way is uphill. The gradient is exactly that feeling, computed separately for each weight: the uphill direction. Walk against it and you head downhill, toward lower loss.
So the gradient points uphill, toward MORE loss. That sets up the one move that matters.
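To make "feeling the slope" concrete, here is a minimal sketch that estimates a gradient by nudging each weight slightly and watching the loss. The toy loss function, the starting weights, and the helper name numerical_gradient are all assumptions made up for illustration; real networks compute gradients more efficiently, but the meaning of each number is the same.

```python
def loss(weights):
    # Hypothetical toy loss, chosen only for illustration.
    # It is lowest when a = 1 and b = -2.
    a, b = weights
    return (a - 1.0) ** 2 + (b + 2.0) ** 2


def numerical_gradient(f, weights, eps=1e-6):
    # Estimate one slope per weight: nudge that weight a tiny bit up and
    # a tiny bit down, and measure how much the loss changes.
    grad = []
    for i in range(len(weights)):
        up = list(weights)
        down = list(weights)
        up[i] += eps
        down[i] -= eps
        grad.append((f(up) - f(down)) / (2 * eps))
    return grad


weights = [3.0, -3.0]  # hypothetical current weights
grad = numerical_gradient(loss, weights)
print(grad)  # roughly [4.0, -2.0]: one number per weight
```

The first number is positive, so raising the first weight would raise the loss. The second is negative, so raising the second weight would lower it.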
Question 2 of 3
The gradient for a weight points in the direction that INCREASES the loss. You want loss to go down. So which way do you actually move that weight?
You said: The opposite direction of the gradient
Exactly right. The gradient points uphill toward more loss, so you step the other way, downhill. That single rule, applied to every weight, is what 'learning' physically is: move against the gradient.
You said: The same direction as the gradient, to follow it
Not quite. That would climb the hill: following the gradient INCREASES the loss, making the network worse. You want lower loss, so you move OPPOSITE the gradient, downhill. It's a sign flip, and it's the whole idea.
You said: Whichever direction makes the weight larger
Not quite. Direction isn't about bigger or smaller in general; it depends on this weight's gradient. Since the gradient points toward MORE loss, you move opposite it. For some weights that means smaller, for others larger.
You said: I'm not sure
No worries. Move OPPOSITE the gradient. The gradient points uphill toward more loss; stepping the other way takes you downhill toward less. Do that for every weight and the network improves.
Opposite the gradient, every weight. Now let's see you actually read a gradient and act on it.
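The sign flip can be checked on a single weight. This is a sketch under stated assumptions: the one-weight loss, its gradient function, the starting value, and step_size are all hypothetical choices, not something this lesson has defined.

```python
def loss(w):
    # Hypothetical one-weight loss, lowest at w = 2.
    return (w - 2.0) ** 2


def gradient(w):
    # Slope of the loss above with respect to w.
    return 2.0 * (w - 2.0)


w = 5.0          # hypothetical starting weight
step_size = 0.1  # assumed small step, chosen for illustration

against = w - step_size * gradient(w)  # step opposite the gradient
along = w + step_size * gradient(w)    # step the same way as the gradient

print("start:  ", loss(w))
print("against:", loss(against))  # lower than the starting loss
print("along:  ", loss(along))    # higher than the starting loss
```

With a small enough step, moving against the gradient lowers the loss and moving along it raises the loss.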
Question 3 of 3
A network has two weights. The gradient comes back as: weight A → +3, weight B → -0.5. To reduce the loss, what do you do to each weight?
You said: Decrease A, increase B
Exactly, spot on. A's gradient is positive (+3), meaning raising A raises loss, so you lower A. B's is negative (-0.5), meaning raising B lowers loss, so you raise B. And A's larger magnitude means it gets the bigger nudge.
You said: Increase A, decrease B
Not quite. That's following the gradient, which climbs toward MORE loss. You move opposite the sign: A's +3 means lower A; B's -0.5 means raise B. Flip each move and you'll be heading downhill.
You said: Decrease both, since you want everything smaller
Not quite. Direction is per-weight, set by each sign, not a blanket shrink. A is +3, so lower A. But B is -0.5, so you RAISE B. Each weight moves opposite its own gradient.
You said: I'm not sure
No worries. Move each weight opposite its gradient's sign. A is +3 (positive), so decrease A. B is -0.5 (negative), so increase B. The magnitude (3 vs 0.5) also tells you A deserves the bigger nudge.
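The two-weight example above can be worked through in a few lines. The gradient values +3 and -0.5 come from the question; the starting weights and step_size are hypothetical, picked only so there are numbers to compute with.

```python
gradient = {"A": 3.0, "B": -0.5}  # values from the question
weights = {"A": 1.0, "B": 1.0}    # hypothetical starting values
step_size = 0.1                   # assumed, for illustration only

# Move each weight opposite its own gradient.
updated = {name: weights[name] - step_size * gradient[name] for name in weights}

# How far each weight moved, and in which direction.
changes = {name: updated[name] - weights[name] for name in weights}

for name in weights:
    direction = "decrease" if changes[name] < 0 else "increase"
    print(name, direction, "by", abs(changes[name]))
```

A goes down and B goes up, and A's change is the larger of the two because its gradient has the larger magnitude.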
The takeaway
The gradient gives one number per weight, and the sign of each number shows which direction of change leads toward MORE loss. To learn, move each weight opposite its gradient: positive means decrease, negative means increase, and bigger magnitude means a bigger nudge.
Next step
You now know the gradient points the way that worsens the loss, so the opposite way improves it. Next: actually taking that step, repeatedly — gradient descent.