The reasoning that keeps showing up: the test is 99% accurate, you tested positive, so there's a 99% chance you have the condition. It feels like the only sane reading of the words. But the answer can easily be 50% — or 9%, or lower — even with that exact 99% test. The number on the box and the number you want are two different conditional probabilities pointing in opposite directions, and reusing one for the other is the silent move that produces the wrong 99%.

The "99%" you were handed is the chance the test fires correctly given that you have the condition, written P(positive | have it). The number you actually care about runs the other way: the chance you have it given that the test fired, written P(have it | positive). The two are close only when the condition is common among the people being tested. When it's rare, a test that is right 99% of the time can still leave P(have it | positive) at 50% or well below.

The mistake isn't a math error. It's that everyday language collapses the direction of the conditional. "The test is 99% accurate" sounds like it's describing the test result in your hand. It isn't. It's describing the test's behavior on a population that already has the condition — a group you don't yet know you belong to.

The swap, named

You're given P(positive | disease) = 99%. You want P(disease | positive). The brain treats them as the same fact stated two ways, because both are "99%," both are "about the test," and both are "about the disease." Natural language has no grammar for "which way the conditioning runs," so the swap happens silently and feels airtight.

P(A | B) is not P(B | A). "Probability it's cloudy given that it's raining" is near 100%. "Probability it's raining given that it's cloudy" is not. Same two events, opposite direction, wildly different numbers. The test problem is exactly this, dressed in clinical language. The same direction-of-conditioning slip shows up in inferential statistics too — it's one of the common ways students misread what a hypothesis test actually claims, where a probability computed assuming the null gets read as the probability of the null.

Here's the part the swap quietly deletes. The given number, P(positive | disease), runs disease → test: start with sick people, ask how the test behaves. What you want, P(disease | positive), runs test → disease: start with a positive result, ask who's actually sick. To turn one into the other you have to use Bayes' theorem, and Bayes forces a third quantity into the calculation that the original sentence never mentioned: how common the disease is in the first place. That quantity is the base rate. Swapping the conditionals is the error; the base rate is the information the swap throws away.
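Written out, the bridge between the two directions looks like this. The shorthand is an assumption made here for compactness, not notation from the text above: D stands for "has the disease" and + for "tests positive".

```latex
P(D \mid +) = \frac{P(+ \mid D)\, P(D)}{P(+ \mid D)\, P(D) + P(+ \mid \neg D)\, P(\neg D)}
```

P(D) is the base rate. It appears in the numerator and the denominator, so there is no way to get from P(+ | D) to P(D | +) without it.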

Watch it happen with real numbers

Take a disease that 1% of people have. The test: sensitivity P(positive | disease) = 99%, and specificity 99%, meaning the false-positive rate P(positive | healthy) = 1%. Now imagine 10,000 people walk through the door.

  • 100 of them actually have the disease (1% of 10,000).
  • 9,900 of them are healthy.

Run the test on everyone:

  • Of the 100 sick people, the test correctly flags 0.99 × 100 = 99 true positives.
  • Of the 9,900 healthy people, the test wrongly flags 0.01 × 9,900 = 99 false positives.

The full picture as a 2×2 table:

              Test positive   Test negative    Total
Has disease              99               1      100
Healthy                  99           9,801    9,900
Total                   198           9,802   10,000

A positive result lands you in the left column: 198 people. Of those, only 99 are actually sick.

P(disease | positive) = 99 / 198 = exactly 50%.

Not 99%. A coin flip. The test is genuinely 99% accurate, and a positive result still leaves you with even odds — because the 9,900 healthy people are such a large pool that their 1% error rate produces just as many positives as the entire sick population does. The false positives, each individually unlikely, arrive in a flood because there are so many healthy people for the test to be wrong about.
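The head count above can be replayed in a few lines of Python. This is a minimal sketch rather than anything from the original walkthrough: the function name `confusion_counts` and its parameter names are made up for illustration, and `Fraction` is used only so the counts stay exact instead of picking up floating-point noise.

```python
from fractions import Fraction

def confusion_counts(population, prevalence, sensitivity, specificity):
    """Expected 2x2 counts when everyone in `population` is tested."""
    sick = population * prevalence
    healthy = population - sick
    true_pos = sick * sensitivity          # sick and flagged
    false_neg = sick - true_pos            # sick but missed
    true_neg = healthy * specificity       # healthy and cleared
    false_pos = healthy - true_neg         # healthy but flagged
    return {"TP": true_pos, "FN": false_neg, "FP": false_pos, "TN": true_neg}

# Numbers from the example: 1% base rate, 99% sensitivity, 99% specificity.
counts = confusion_counts(10_000, Fraction(1, 100), Fraction(99, 100), Fraction(99, 100))

# Condition on the positive column only: true positives over all positives.
ppv = counts["TP"] / (counts["TP"] + counts["FP"])

for name, value in counts.items():
    print(name, int(value))
print(f"P(disease | positive) = {float(ppv):.0%}")
```

Changing only the `prevalence` argument is enough to move the final figure, even though the test's own 99% never changes.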

Why the wrong intuition usually works

If the swap is so broken, why does it feel right? Because in many everyday situations the base rate isn't tiny, and as it rises toward 50% the two numbers close in on each other.

Redo the calculation in a high-suspicion setting — say a clinic where, based on symptoms, 50% of the people tested actually have the condition. Same test, 10,000 people:

  • 5,000 have the disease, 5,000 are healthy.
  • True positives: 0.99 × 5,000 = 4,950.
  • False positives: 0.01 × 5,000 = 50.
  • Total positives: 5,000.

P(disease | positive) = 4,950 / 5,000 = 99%.

Now the swapped answer is correct. When half the tested population is sick, a positive result really does mean a 99% chance. This is the trap's alibi: in the high-base-rate cases we meet most often, swapping the conditionals gives the right answer, so the habit never gets corrected. It only bites when the condition is rare — precisely the screening situation where the stakes are highest.
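To see how the answer slides between those two cases, here is a small sketch that holds the test fixed and varies only the base rate. The helper name `ppv` and the list of base rates are illustrative choices, not part of the original examples; the 0.1% and 10% rows are extra points added for comparison.

```python
def ppv(prior, sensitivity, false_positive_rate):
    """P(disease | positive) from Bayes' theorem."""
    true_pos = prior * sensitivity                  # share of people who are sick and flagged
    false_pos = (1 - prior) * false_positive_rate   # share who are healthy and flagged
    return true_pos / (true_pos + false_pos)

# Same 99% / 99% test throughout; only the base rate changes.
for prior in (0.001, 0.01, 0.10, 0.50):
    result = ppv(prior, sensitivity=0.99, false_positive_rate=0.01)
    print(f"base rate {prior:>6.1%} -> P(disease | positive) = {result:.1%}")
```

Under these assumptions the result climbs from about 9% at a 0.1% base rate, through 50% at 1%, to 99% at 50%. The swapped reading is only right at the top of that range.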

The lesson isn't "the test is bad." A 99% test is excellent. A result is evidence that updates a prior probability, and when the prior is 1%, even strong evidence only pulls you up to 50%, not to certainty. If treating a stated probability as a fact about the wrong quantity feels familiar, it's the same family of error as reading a 95% confidence interval as a 95% chance the true value sits inside it — a number defined one way, intuitively grabbed for another.

The reasoning trap, in one line

You were given how the test behaves among the sick. You assumed that told you how the sick are distributed among positives. Those are reverse questions, and the bridge between them is the base rate you weren't given and didn't ask for.

Check yourself

A condition affects 1 in 1,000 people (0.1%). A test has 100% sensitivity (it never misses a real case) and a 5% false-positive rate. You test positive. Roughly what's the chance you actually have the condition?

  A) About 95%, since the test is mostly accurate.
  B) About 100%, since the test never misses a case.
  C) About 2%.
  D) Not enough information.


Correct answer: C.

Take 10,000 people. About 10 have the condition; the test catches all 10 (100% sensitivity). Of the 9,990 healthy people, 5% test positive falsely: 0.05 × 9,990 ≈ 500. Total positives ≈ 510, of which only 10 are real. P(condition | positive) ≈ 10 / 510 ≈ 2%.

A makes the direction swap — it reads "5% false positives" as "95% chance I'm sick," reusing the test's accuracy for the wrong question. B fixes on sensitivity, which only tells you the test won't miss real cases, not how many positives are real. D is the tempting hedge, but you were given everything you need: a base rate, a sensitivity, and a false-positive rate are exactly the three ingredients Bayes' theorem requires.
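If you'd rather check the arithmetic than trust the rounding, the same three ingredients can be plugged into Bayes' theorem exactly. This is a sketch with illustrative variable names; `Fraction` keeps the result as an exact ratio.

```python
from fractions import Fraction

prior = Fraction(1, 1000)               # base rate: 1 in 1,000
sensitivity = Fraction(1)               # the test never misses a real case
false_positive_rate = Fraction(5, 100)  # 5% of healthy people test positive

# Total chance of a positive result: true positives plus false positives.
p_positive = prior * sensitivity + (1 - prior) * false_positive_rate

# Bayes' theorem: the share of positives that are real cases.
posterior = prior * sensitivity / p_positive

print(posterior)                 # exact ratio
print(f"{float(posterior):.2%}")  # same value as a percentage
```

The exact value is 20/1019, just under 2%, which matches the rounded 10 / 510 head count.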

Close the gap

The fix here isn't memorizing Bayes' formula — it's training the reflex to ask "which direction is this conditional running, and what base rate am I quietly assuming?" before trusting a number. That reflex is hard to build from a single example, because the swap re-tempts you every time the wording changes. Gradual Learning works through these reversed-conditional problems with you across cases, watching where the swap sneaks back in and adapting until the direction-check becomes automatic.
