Lesson 4.1 · 10 of 17

Cross-validation — does an independent dataset agree?

One dataset can be confidently wrong. The first real check on any answer here: recompute it on a second, independently-built dataset and see whether they tell the same story.

In one line: Before trusting "rainfall is falling here," ask a different rainfall dataset, one built from different inputs, the same question. If both agree, your confidence goes up. If they don't, you've caught a problem.

Why one dataset isn't enough

Every dataset has a personality. ERA5 is a reanalysis — a weather model run over history, nudged by observations. It's superb for temperature but has known biases for precipitation (models struggle with rain). So an ERA5-only rainfall trend could be partly an artefact of the model, not the climate.

The fix is triangulation. CHIRPS is built completely differently — it blends rain-gauge records with satellite cloud-temperature estimates. If ERA5 and CHIRPS both say "drying, about this fast," the answer survived an independent check. That's what the "Cross-check" line on a /verify precipitation trend reports.
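To make the triangulation concrete, here is a minimal sketch of computing the same trend on two products. It is not how /verify does it: the helper `linear_trend`, the ordinary-least-squares fit, the normal-approximation 95% interval (z = 1.96), and the two made-up series `product_a` and `product_b` are all assumptions for illustration.

```python
import math

def linear_trend(years, values, z=1.96):
    """Ordinary-least-squares slope with an approximate 95% confidence interval.

    z=1.96 is a normal approximation (an assumption of this sketch); for short
    records a Student-t critical value would give a wider interval.
    """
    n = len(years)
    if n < 3 or n != len(values):
        raise ValueError("need at least 3 paired points")
    mean_x = sum(years) / n
    mean_y = sum(values) / n
    sxx = sum((x - mean_x) ** 2 for x in years)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, values))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Residual variance gives the standard error of the slope.
    resid_ss = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(years, values))
    se = math.sqrt(resid_ss / (n - 2) / sxx)
    return slope, slope - z * se, slope + z * se

# Made-up annual rainfall totals (mm), for illustration only.
# These are NOT real ERA5 or CHIRPS values.
years = list(range(2000, 2020))
product_a = [800 - 5.0 * (y - 2000) + 20 * math.sin(y) for y in years]
product_b = [780 - 4.0 * (y - 2000) + 15 * math.cos(1.7 * y) for y in years]

for name, series in (("product_a", product_a), ("product_b", product_b)):
    slope, lo, hi = linear_trend(years, series)
    print(f"{name}: {slope:+.2f} mm/yr (approx. 95% CI {lo:+.2f} to {hi:+.2f})")
```

The point is that both products go through the identical calculation, so any difference in the result comes from the data, not the method.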

What "agree" actually means

Two estimates agree when they point the same direction and their confidence intervals overlap — i.e. the difference between them is within the noise. It's a weak-but-real test: cheap to run, and it catches gross errors and product-specific artefacts.
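The rule above fits in a few lines. This is a sketch under stated assumptions: the names `TrendEstimate`, `same_direction`, `intervals_overlap`, and `cross_check` are hypothetical, and the numbers are illustrative, not taken from any real product.

```python
from typing import NamedTuple

class TrendEstimate(NamedTuple):
    slope: float    # best estimate, e.g. mm per year
    ci_low: float   # lower bound of the confidence interval
    ci_high: float  # upper bound of the confidence interval

def same_direction(a: TrendEstimate, b: TrendEstimate) -> bool:
    # A slope of exactly zero has no direction, so it never counts as a match.
    return (a.slope > 0 and b.slope > 0) or (a.slope < 0 and b.slope < 0)

def intervals_overlap(a: TrendEstimate, b: TrendEstimate) -> bool:
    # Two closed intervals overlap when each one starts before the other ends.
    return a.ci_low <= b.ci_high and b.ci_low <= a.ci_high

def cross_check(a: TrendEstimate, b: TrendEstimate) -> str:
    ok = same_direction(a, b) and intervals_overlap(a, b)
    return "agree" if ok else "disagree"

# Illustrative numbers only (hypothetical, not real dataset output).
product_a = TrendEstimate(slope=-4.0, ci_low=-6.5, ci_high=-1.5)
product_b = TrendEstimate(slope=-3.0, ci_low=-5.0, ci_high=-1.0)
shifted_b = TrendEstimate(slope=1.0, ci_low=0.2, ci_high=1.8)

print(cross_check(product_a, product_b))  # same sign, intervals overlap
print(cross_check(product_a, shifted_b))  # opposite sign, no overlap
```

Both conditions are required: two wide intervals can overlap while the slopes point in opposite directions, and that should not count as agreement.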

The honest crack: "Agree" means "survived one independent check," not "true." Two gridded products can lean on some of the same satellite inputs and share a bias, so they can be wrong together. Cross-validation raises confidence; it doesn't prove correctness. For that you need to leave the models entirely and compare to ground instruments.

Play with it

Two trend estimates with their uncertainty. Slide them apart, or widen the uncertainty, and watch the verdict flip between agree and disagree.


Do it yourself


This is internal consistency — the system checking itself. It's necessary but not sufficient. The next guide, ground-truth anchoring, is the step that actually leaves the models and checks against the physical world.