Step 2: Loss — scoring how wrong the prediction was

In the last step, the network produced a prediction. Now we need one number that says how far off it was: that number is the loss.


Question 1 of 3

The network just predicted 0.2 for an image that's actually a cat (true answer: 1). What does a loss function do with these two numbers?

You said: It collapses the gap between prediction and truth into a single error score

Exactly

A loss function takes the prediction (0.2) and the true answer (1) and returns one number measuring how wrong the guess was: high when the gap is big, low when it's small. That single score is the thing training will try to shrink.

You said: It corrects the prediction, nudging the 0.2 up toward 1

Not quite

Not yet — that's a later step. The loss function only measures the error, collapsing the prediction (0.2) and truth (1) into one number that says how wrong the guess was. Fixing the prediction comes after we know the score.

You said: It checks whether the prediction is exactly right, returning true or false

Close

Loss isn't pass/fail; it's a graded distance. It collapses the prediction (0.2) and truth (1) into one number: how wrong the guess was. A guess of 0.2 scores worse than a guess of 0.9, even though both miss 1.

You said: I'm not sure

No worries

No problem. A loss function takes the prediction (0.2) and the true answer (1) and returns one number measuring how wrong the guess was — high for a bad guess, low for a good one. That single score is what training tries to shrink.

Another way to see it

Think of it as grading a test answer by closeness, not just right/wrong. If the truth is 1, a guess of 0.9 earns a small penalty, 0.2 earns a big one, and 1.0 earns a penalty of zero under the usual loss functions. The loss function is the grader that turns 'how far off' into a single penalty score.
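To make the grader concrete, here is a minimal sketch that assumes squared error as the loss. The lesson doesn't name a specific loss function, so squared_error is an illustrative choice, not the one this network necessarily uses.

```python
def squared_error(prediction: float, truth: float) -> float:
    """Return the squared distance between a prediction and the true answer."""
    return (prediction - truth) ** 2


truth = 1.0

# Grade the three guesses from the analogy above.
penalties = {guess: squared_error(guess, truth) for guess in (0.9, 0.2, 1.0)}

for guess, penalty in penalties.items():
    print(f"guess={guess:.1f}  loss={penalty:.2f}")
# guess=0.9  loss=0.01
# guess=0.2  loss=0.64
# guess=1.0  loss=0.00
```

Squaring keeps the penalty non-negative and makes larger misses cost disproportionately more, which is one common way to build a graded penalty.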

So loss is a graded penalty — bigger when the guess is further from truth. Let's see how that grading actually behaves.

Question 2 of 3

Truth is 1. The network outputs three guesses on three images: 0.9, 0.5, and 0.1. Order them from LOWEST loss to HIGHEST.

You said: 0.9 (lowest), then 0.5, then 0.1 (highest)

Exactly

Right. Loss tracks distance from the truth (1). 0.9 is closest, so lowest loss; 0.1 is furthest, so highest loss. As the guess slides away from the right answer, the penalty climbs.

You said: 0.1 (lowest), then 0.5, then 0.9 (highest)

Not quite

That's reversed. Loss measures distance from the truth (1), so the guess CLOSEST to 1 gets the lowest loss. 0.9 is nearest, 0.1 is furthest — so 0.9 has the lowest loss and 0.1 the highest.

You said: They all have the same loss — none of them equals 1 exactly

Close

Loss isn't all-or-nothing. It grades by distance from the truth (1), so 0.9 scores a small penalty and 0.1 a large one. Lowest to highest: 0.9, 0.5, 0.1.

You said: I'm not sure

No worries

Here's the rule: loss grows as the guess moves away from the truth (1). So 0.9 is closest and has the lowest loss, 0.1 is furthest and has the highest. Order: 0.9, 0.5, 0.1.
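The ordering can be checked with a short sketch. It ranks the three guesses under two assumed loss functions, squared error and binary cross-entropy; neither is named in this lesson, and both are included only to suggest that the ordering doesn't depend on which of these common choices you pick.

```python
import math


def squared_error(prediction, truth):
    return (prediction - truth) ** 2


def binary_cross_entropy(prediction, truth):
    # Assumes 0 < prediction < 1 so that both logarithms are defined.
    return -(truth * math.log(prediction) + (1 - truth) * math.log(1 - prediction))


truth = 1.0
guesses = [0.5, 0.1, 0.9]  # deliberately unsorted

# Sort the guesses from lowest loss to highest under each loss function.
rankings = {
    loss_fn.__name__: sorted(guesses, key=lambda g: loss_fn(g, truth))
    for loss_fn in (squared_error, binary_cross_entropy)
}

for name, ranked in rankings.items():
    print(name, ranked)
# squared_error [0.9, 0.5, 0.1]
# binary_cross_entropy [0.9, 0.5, 0.1]
```

The two functions assign different penalty sizes, but both rise as the guess moves away from 1, so the ranking is the same.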

You can now read loss as a distance-to-truth dial. Last check: what does that dial give us that 'the network was wrong' didn't?

Question 3 of 3

Why bother turning the error into a single number at all — why not just note 'the network was wrong'?

You said: A number is something we can try to make smaller — it gives training a target to shrink

Exactly

That's the point. 'Wrong' is a dead end, but a number can go up or down. Turning the error into a loss gives training a concrete goal: make this score as small as possible. That's exactly what the next step does.

You said: A single number is easier for humans to read on a dashboard

Not quite

Readability is a nice bonus, but not the reason. The real point: a number is something we can try to MAKE SMALLER. That gives training a concrete target — shrink the loss — which 'the network was wrong' could never provide.

You said: It lets us throw away the bad predictions and keep only the good ones

Close

We don't discard predictions — we use the loss to improve them. The key is that a number can be shrunk: making the loss as small as possible becomes training's concrete goal, which a flat 'wrong' can't give us.

You said: I'm not sure

No worries

Here's the why: a number can be made smaller, while 'wrong' just sits there. Turning error into a loss gives training a concrete target — shrink this score — and that's the goal the next step chases.
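A small sketch of the difference between a flat 'wrong' and a number. The four predictions are hypothetical values standing in for successive versions of the same network, and squared error is again an assumed loss rather than one the lesson specifies.

```python
def squared_error(prediction, truth):
    return (prediction - truth) ** 2


truth = 1.0

# Hypothetical predictions from four successive versions of the same network.
predictions = [0.2, 0.4, 0.7, 0.9]

wrong_flags = [p != truth for p in predictions]
losses = [squared_error(p, truth) for p in predictions]

for p, wrong, loss in zip(predictions, wrong_flags, losses):
    print(f"prediction={p:.1f}  wrong={wrong}  loss={loss:.2f}")
# prediction=0.2  wrong=True  loss=0.64
# prediction=0.4  wrong=True  loss=0.36
# prediction=0.7  wrong=True  loss=0.09
# prediction=0.9  wrong=True  loss=0.01

# The flag never changes, but the loss shows steady improvement.
improving = all(later < earlier for earlier, later in zip(losses, losses[1:]))
print(improving)  # True
```

The wrong flag reads True for every version and so can't tell them apart, while the loss falls at each step and shows which version is better.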

The takeaway

A loss function collapses prediction-vs-truth into one number: high when the guess is bad, low when it's good. That single score is what makes training possible — it gives us something concrete to shrink.

Next step

You can now express how wrong a prediction is as one number. But knowing how wrong you are isn't enough. Next: using the gradient to work out which weights to change, and in which direction.

