Lesson 3.1 · 6 of 17

Averages, anomalies & baselines

Before you can say “it changed,” you need three numbers: a typical value, how much it normally wobbles, and what counts as unusual. That's almost all the statistics this whole field runs on — built here intuition-first, no formulas memorised.

You're a strong coder, so the code here is trivial — .mean(), .std(). The part worth slowing down for is what these numbers mean, because every later lesson (trends, signal-vs-noise, cross-validation) is built on them. Three ideas: a typical value, the spread, and the anomaly.

1 · A typical value — mean vs. median

The mean (average) is the obvious "typical value." But it's easily dragged by one weird number. The median (the middle value when sorted) shrugs those off. Drag the one outlier below and watch:

[Interactive: slider for one outlier value, with live readouts of the mean and the median]
In plain English: the mean is a democracy where a billionaire can swing the average income; the median is "the person standing in the middle." For skewed, outlier-prone Earth data (rainfall, fire counts), the median and median-based methods are usually the honest choice, which is exactly why the trend test you'll meet next (Theil–Sen) is built on medians.
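The mean-versus-median behaviour described above can be checked in a few lines. This is a minimal sketch using Python's standard `statistics` module; the rainfall readings are hypothetical numbers chosen for illustration, not data from the lesson.

```python
from statistics import mean, median

# Hypothetical daily rainfall totals (mm); in the second list the last
# reading has been replaced by a single outlier.
typical = [10, 12, 11, 9, 13]
with_outlier = [10, 12, 11, 9, 90]

for label, values in [("typical", typical), ("with outlier", with_outlier)]:
    print(f"{label:>12}: mean = {mean(values):.1f}, median = {median(values):.1f}")
```

One bad reading moves the mean from 11.0 to 26.4 while the median stays at 11.0. With pandas, the equivalent calls would be `.mean()` and `.median()` on a Series.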

2 · The spread — standard deviation

One number for "how much does this normally wobble?" is the standard deviation (SD). Rough rule for roughly bell-shaped data: about 2 in 3 values sit within ±1 SD of the mean, and only about 1 in 20 land beyond ±2 SD, so a value out there is genuinely unusual.
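The rule of thumb is easy to test on simulated data. The sketch below assumes bell-shaped (Gaussian) values with a made-up mean of 1300 and SD of 150; real, skewed data will not match these shares as closely.

```python
import random
from statistics import mean, stdev

random.seed(0)  # fixed seed so the sketch is repeatable
# Hypothetical bell-shaped data: 10,000 draws with mean 1300 and SD 150.
values = [random.gauss(1300, 150) for _ in range(10_000)]

centre = mean(values)
sd = stdev(values)  # sample SD (divides by n - 1)

def share_within(k):
    """Fraction of values lying within k SDs of the mean."""
    return sum(abs(v - centre) <= k * sd for v in values) / len(values)

print(f"within 1 SD: {share_within(1):.1%}")
print(f"within 2 SD: {share_within(2):.1%}")
```

One detail worth knowing when you switch libraries: pandas' `.std()` divides by n − 1 by default (like `stdev` here), while NumPy's `.std()` divides by n unless you pass `ddof=1`. For long series the difference is negligible; for a handful of years it is not.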

anomaly = value − baseline_mean   |   z = anomaly ÷ SD   (“how many wobbles from normal”)

3 · The anomaly — compare to normal, not to nothing

"India got 1100 mm of rain" is meaningless on its own. 1100 mm vs. a normal of 1300 mm is a story. The anomaly is the value minus the long-term average for that place and season. That long-term average is the baseline (the global convention is the WMO 1991–2020 30-year normal). Dividing the anomaly by the SD gives the z-score (a.k.a. standardized anomaly; its drought-monitoring cousin is the Standardized Precipitation Index, SPI).
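Because the anomaly is defined relative to a baseline, the same observation can land on either side of zero depending on which 30-year period you pick. The sketch below uses an invented annual series with a steady upward drift; the helper name `baseline_mean` and all the numbers are assumptions for illustration only.

```python
from statistics import mean

# Hypothetical annual rainfall totals (mm), 1981-2020, with a steady upward drift.
rain = {year: 1200 + 5 * (year - 1981) for year in range(1981, 2021)}

def baseline_mean(series, start, end):
    """Mean of the series over the inclusive baseline period start..end."""
    return mean(series[y] for y in range(start, end + 1))

value = 1300  # the observation being judged (also hypothetical)

for start, end in [(1981, 2010), (1991, 2020)]:
    normal = baseline_mean(rain, start, end)
    print(f"{start}-{end}: normal = {normal:.1f} mm, anomaly = {value - normal:+.1f} mm")
```

In this toy series, 1300 mm is 27.5 mm above the 1981–2010 normal but 22.5 mm below the 1991–2020 normal, which is why the baseline period should always be stated alongside the anomaly.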

Predict first (then reveal): a region's normal monsoon is 1300 mm with an SD of 150 mm. This year it got 1080 mm. Is that "a bit dry" or "alarmingly dry"? Make a guess, then move the slider to 1080 mm and read the verdict.

[Interactive: slider for this year's rainfall (mm), with live readouts of anomaly, % of normal, z-score and verdict]
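The four readouts (anomaly, % of normal, z-score, verdict) follow directly from the definitions, so they can be reproduced by hand. This is a minimal sketch: the function name `describe`, the verdict labels and their cut-offs are assumptions based on the ±1 SD / ±2 SD rule of thumb, not necessarily the wording the page itself uses.

```python
def describe(value, baseline_mean, sd):
    """Return anomaly, percent of normal, z-score and a rough verdict."""
    anomaly = value - baseline_mean
    pct_of_normal = 100 * value / baseline_mean
    z = anomaly / sd
    # Illustrative labels and cut-offs (an assumption, not an official scale).
    if abs(z) < 1:
        verdict = "within normal wobble"
    elif abs(z) < 2:
        verdict = "noticeably off normal"
    else:
        verdict = "genuinely unusual"
    return anomaly, pct_of_normal, z, verdict

anomaly, pct, z, verdict = describe(1080, baseline_mean=1300, sd=150)
print(f"anomaly = {anomaly:+d} mm, {pct:.0f}% of normal, z = {z:+.2f}: {verdict}")
```

For the monsoon example this gives an anomaly of −220 mm, about 83% of normal, and z ≈ −1.47: drier than a typical year's wobble, but short of the ±2 SD mark.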
Why this is the whole game: "Is it unusual?" is just "how big is the z-score?" "Is there a trend?" is "are the anomalies drifting one way over years?" "Is it a real signal?" is "is the change bigger than the SD-sized wobble?" Master anomaly + SD and the rest of the course is variations on them.

Doubt it — the traps

  • State your baseline. "120% of normal" is meaningless without saying which normal; switching between a 1981–2010 and a 1991–2020 baseline can flip the sign of a small anomaly. Always name it.
  • Deseasonalise before trending. A "rising" series is often just the annual cycle; subtract the monthly climatology first (you'll do this in the sandbox).
  • SD assumes roughly symmetric wobble. Rainfall is skewed (can't go below 0, can spike high), so a z-score is a guide, not gospel — which is why we cross-check.
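The deseasonalising step from the list above can be sketched without any libraries. The monthly series here is synthetic (a sine-shaped annual cycle plus a small drift, both invented for illustration), and the climatology is computed over the whole series for brevity; in practice you would compute it over your stated baseline period.

```python
import math
from statistics import mean

# Hypothetical monthly series: 4 years of an annual cycle plus a small drift.
n_years = 4
series = [
    20 + 10 * math.sin(2 * math.pi * m / 12) + 0.05 * (12 * y + m)
    for y in range(n_years)
    for m in range(12)
]

# Monthly climatology: the average of each calendar month across all years.
climatology = [mean(series[m::12]) for m in range(12)]

# Anomaly: each value minus the climatology for its own calendar month.
anomalies = [value - climatology[i % 12] for i, value in enumerate(series)]

raw_range = max(series) - min(series)
anom_range = max(anomalies) - min(anomalies)
print(f"raw range:     {raw_range:.1f}")
print(f"anomaly range: {anom_range:.1f}")
```

The raw series swings by roughly 20 units within every year, which would swamp any trend estimate over a short window. After subtracting the climatology, the seasonal swing is gone and only the slow year-to-year drift remains for a trend test to find.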

Check yourself

Cover the answers. (1) Your series has values [10, 11, 9, 10, 95] — would you report the mean or the median, and why? (2) A z-score of −0.4 — unusual or normal? (3) Why must you subtract a climatology before looking for a trend?

answers

(1) Median (10) — the 95 is almost certainly an outlier/error dragging the mean to 27. (2) Normal — well within ±1 SD; the data can't call it unusual. (3) Otherwise the regular seasonal cycle masquerades as a trend; you'd "find" a change that's just summer following winter.