These three mistakes show up at the very beginning of a statistics course, and they share an unusual quality: the wrong answer sounds like careful thinking. The student isn't being sloppy. They're applying real logic — just to the wrong thing.
The common mistakes
1. Treating "either it happens or it doesn't" as certainty
In a session on basic probability, the tutor asked: if you flip a fair coin, what's the probability of getting heads?
Alma answered: probability 1, because "it either happens or it doesn't."
The logic behind that answer is worth taking seriously. There are two outcomes. One is heads. The other is tails. In some loose sense, heads "either happens or it doesn't." That's true. But the same reasoning would apply to virtually any event. Will it rain tomorrow? It either happens or it doesn't. Does that make the probability of rain 1? Clearly not.
The confusion is between the type of outcome and the likelihood of the outcome. "Either/or" describes the structure — this event either occurs or it fails to occur. That's true of everything. It says nothing about how often each branch happens.
What students miss is that probability isn't tracking which outcomes exist, but how many equally likely ways each outcome can occur.
Probability = favorable outcomes ÷ total equally likely outcomes.
For a coin flip: 1 head ÷ 2 total outcomes = 0.5. Not 1, not 2. The "either/or" framing tells you there are two branches — it doesn't tell you anything about their relative weight.
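The formula is small enough to check directly. A minimal sketch in Python, using exact fractions so nothing gets rounded (the function name and event labels are illustrative, not from the session):

```python
from fractions import Fraction

def probability(favorable: int, total: int) -> Fraction:
    """Classical probability: favorable outcomes over equally likely total outcomes."""
    return Fraction(favorable, total)

# Coin flip: 1 favorable outcome (heads) out of 2 equally likely outcomes.
print(probability(1, 2))  # 1/2
# Die roll: 1 favorable outcome (say, rolling a 3) out of 6.
print(probability(1, 6))  # 1/6
```

Note that the "either/or" structure never enters the computation; only the counts of equally likely outcomes do.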
Once Alma applied the formula directly — 1 favorable outcome ÷ 6 total outcomes for a die roll = 1/6 — the logic clicked. The problem was never arithmetic. It was a category error: using a binary description of outcome types to infer a probability.
2. Choosing the mean because it "uses all the data"
A real estate agent wants to report the "typical" home price in a neighborhood. Most houses sell for $200k–$300k, but one mansion sold for $2 million. Which measure should the agent use?
In this session, Alma chose the mean — "it uses every data point."
That reasoning treats comprehensiveness as a statistical virtue. And in some contexts, it is. But in this case, "using every data point" is precisely the problem. The mean doesn't just include the $2 million observation — it gets pulled by it. If that single mansion drags the mean to $420k, the resulting number doesn't describe any actual buyer's experience in that neighborhood.
This is the key distinction beginners miss: the mean is sensitive to extreme values by design. That sensitivity can be useful (it fully captures the scale of a distribution), but it becomes a liability when a small number of values are dramatically different from the rest.
The session laid it out through a homebuyer's lens: if you're budgeting for a home in that neighborhood, the median ($250k) tells you what most homes actually cost. The mean ($420k) tells you the mathematical average including a house you can't afford and probably aren't considering. The median isn't less rigorous — it's more resistant. It only asks "what's the middle value?" and ignores how extreme the outlier actually is.
After the buyer-perspective reframe, Alma applied the rule correctly. The prior framing ("uses all the data") hadn't been wrong in other contexts — it just couldn't survive contact with the specific failure mode it was generating.
3. Missing when spread matters more than center
After working through mean and median, the session introduced two classes with identical means (both exactly 50) but completely different score distributions:
- Class A: 48, 49, 50, 51, 52
- Class B: 10, 10, 50, 90, 90
Alma immediately identified that Class B's scores were more spread out. That was right, and the concept of standard deviation followed cleanly: roughly "how far, on average, are scores from the mean?"
The moment worth capturing here isn't an error (there wasn't one) but the lesson underneath the question. When a professor announces "the average was 70," you can't tell whether that means everyone clustered around 70 or half the class scored 30 and half scored 100. The mean is honest, but incomplete. A standard deviation of 2 and a standard deviation of 30 describe completely different situations, even with identical means.
The practical move: the mean tells you where the center is. The standard deviation tells you whether that center is meaningful. A high SD means the center describes almost no one in the dataset particularly well. A low SD means most people are close to it.
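The two classes make that concrete. A quick check with Python's statistics module, using the population standard deviation (pstdev), which matches the "how far, on average, from the mean" reading above:

```python
from statistics import mean, pstdev

class_a = [48, 49, 50, 51, 52]
class_b = [10, 10, 50, 90, 90]

# Identical centers...
print(mean(class_a), mean(class_b))  # 50 50
# ...completely different spreads.
print(round(pstdev(class_a), 2))  # 1.41  -- tightly clustered around 50
print(round(pstdev(class_b), 2))  # 35.78 -- widely scattered
```

A course that defines SD with the n − 1 (sample) denominator would use stdev instead; the numbers change slightly, but the contrast between the two classes does not.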
For exam purposes, standard deviation questions typically ask what a high vs. low SD implies, not just how to compute it. The conceptual answer: high SD means widely scattered; low SD means tightly clustered.
The actual mechanism
All three mistakes share a structure: the student applies a general-sounding principle (either/or → probability; uses all data → accuracy; average → full picture) without checking whether that principle fits the specific statistical context.
Statistics has a lot of concepts that work almost like the intuition suggests — until they don't. The binary-outcome framing fails because probability is about frequency across trials, not just structure of a single trial. The "uses all data" framing fails because inclusion isn't the same as representativeness. The average-as-summary fails because center without spread hides the shape of the distribution.
These aren't exotic edge cases. They're the three most common questions in an introductory descriptive statistics course. If you want to see how they connect to the inferential mistakes that come next — the p-value and alpha inversions that trip up students later — that cluster is covered in 4 hypothesis-testing mistakes that feel right until they don't.
How to remember it
For probability: "either/or" describes the world, not the odds. Every event either happens or it doesn't. The formula does the work: favorable ÷ total.
For mean vs. median: ask whether there are outliers. If yes, the median resists them; the mean chases them. Median = what most people actually experience. Mean = math that weighs everyone equally regardless of how extreme they are.
For spread: the average tells you the center. The standard deviation tells you whether the center is representative or just a number that happens to be in the middle of a wide scatter.
Check yourself
A city reports the "average" household income as $85,000. A journalist argues this overstates how well most families are doing. A statistician says the median income is $52,000. Which interpretation is more likely correct, and why?
A) The mean — it uses all income data and is therefore more accurate.
B) The median — it resists the upward pull of a small number of very high earners.
C) Both are equally valid; use whichever is easier to compute.
D) Neither — without the standard deviation, neither measure is interpretable.
Correct answer: B.
A small number of very high earners can dramatically pull the mean upward without changing what most households actually experience. The median is the midpoint of the distribution — half earn above it, half below — and resists that pull entirely. In income distributions, which are almost always right-skewed by high earners, the median typically describes the typical household more accurately than the mean. D is wrong because the mean and median are useful on their own; standard deviation is a complement, not a prerequisite.
Close the gap
The tutor who worked with Alma caught the "uses all data" reasoning the moment it appeared — not after the exam. That real-time correction, before a wrong model has time to solidify across ten more practice problems, is exactly what Gradual Learning is designed to do.