Earth-Observation Data Analysis & Verification — A CS Person’s Handbook

You already know how to code. This is the Earth-science vocabulary, the concepts, and the methodology you need to go from “I can write Python” to “I can pull NASA/ISRO satellite data, compute an answer, and defend that it’s real — without an Earth scientist holding your hand.” It’s the map; the site’s /learn pages are the interactive territory, and verified-earth-agent is a working reference implementation.

Read it in order once. After that it’s a reference. Anything marked ⚠️ is a place people (including papers) get it wrong.

0. The one idea that matters most

A satellite measurement is not the truth — it’s an estimate of the truth, made by a sensor, through an atmosphere, turned into a number by a retrieval algorithm, on a grid. Every step adds error. So the job is never just “compute the number.” It’s:

Compute the number (the easy part).
Quantify how uncertain it is (noise, significance).
Cross-check it against an independent source.
Refuse to claim more than the data supports.

That loop — compute → quantify → cross-check → refuse — is verification. Everything below serves it.

1. How Earth observation actually works

Platform / sensor / product. A platform is the satellite (Terra, Sentinel-2). A sensor/instrument is the camera/radar on it (MODIS, OLI, SAR). A product is the processed data you download (e.g. MOD13A2 NDVI). You analyze products, not raw sensors.
Passive vs active. Passive sensors measure reflected sunlight or emitted heat (optical, thermal) — they need daylight and are blocked by cloud. Active sensors send their own signal (radar/SAR, lidar) — they work day/night and SAR sees through cloud. For monsoon India this distinction is everything: optical is blind under monsoon cloud; SAR (Sentinel-1, NISAR, RISAT) is not.
Processing levels. L0 (raw) → L1 (calibrated radiances, geolocated) → L2 (geophysical variable per pixel/swath, e.g. NDVI, NO₂) → L3 (gridded, time-composited) → L4 (model-assimilated/gap-filled). You almost always want L2 or L3.
The four resolutions (always ask all four):
- Spatial — pixel size (10 m Sentinel-2, 250 m–1 km MODIS, ~10 km IMERG, ~25–55 km SMAP/GRACE). ⚠️ India’s farm fields are often <1 ha — MODIS 250 m mixes several fields (“mixed pixel” problem). Match resolution to the thing you’re measuring.
- Temporal — revisit / cadence (Sentinel-2 ~5 days, MODIS daily, Landsat 16 days, geostationary INSAT/GOES every few minutes).
- Spectral — which wavelengths/bands (visible, near-infrared (NIR), shortwave-IR (SWIR), thermal-IR, microwave). Indices are built from band combinations.
- Radiometric — how finely it measures intensity (bit depth); rarely your bottleneck.
Swath & revisit trade-off. Wide swath → frequent revisit but coarse pixels; narrow swath → fine pixels but infrequent. You can’t have both from one sensor (hence data fusion).
Orbits. Sun-synchronous (polar, ~same local time each pass — most land sensors). Geostationary (fixed over one spot — weather: INSAT-3D, GOES). Non-sun-sync (Megha- Tropiques — many tropical passes/day).

2. Data: formats & access (the plumbing)

Formats

NetCDF / HDF — the classic self-describing array formats (dimensions + variables + metadata). Most NASA L2/L3 data. Read with xarray.
GeoTIFF / COG — raster images. COG = Cloud-Optimized GeoTIFF: internally tiled so you can read a small window over HTTP without downloading the whole file. VEDA/Landsat use it.
Zarr — chunked array format built for the cloud and parallel reads; the native format for ARCO data. Read with xarray + dask.
GeoParquet / GeoArrow — columnar/tabular formats for vector features (points, polygons) and increasingly for analysis-ready geo tables; fast, typed, cloud-friendly.
ARCO = Analysis-Ready, Cloud-Optimized. Data pre-chunked so you compute directly in the cloud without reformatting. ERA5-ARCO on Google Cloud is the canonical example.

Access & catalogs

STAC (SpatioTemporal Asset Catalog) — a JSON standard for describing and searching geospatial data. pystac-client queries a STAC API. VEDA, Copernicus, Element84 expose STAC.
CMR (Common Metadata Repository) — NASA’s giant searchable index of all its datasets.
DAAC — Distributed Active Archive Center: NASA’s themed data archives (LP DAAC = land, GES DISC = atmosphere/precip, PO.DAAC = ocean, NSIDC = cryosphere, ASF = SAR). A dataset “lives” at one DAAC.
earthaccess — the Python library to authenticate (NASA Earthdata login) and pull NASA data by short-name + bounding box + time. Your main door to NASA data.
Google Earth Engine (GEE) — a planetary-scale compute platform: petabytes of data + server-side processing. You write a query, Google’s servers reduce it, you get back a tiny answer. Brilliant for trends/aggregations over big regions. ⚠️ It hosts a curated subset of NASA data, not everything. (This site’s verified engine uses GEE.)
Provider portals — ESA Copernicus (open), and India’s own (ISRO MOSDAC, NRSC Bhoonidhi, IMD Pune) which mostly need free registration and aren’t openly scriptable.

3. The variables you’ll actually use (by domain)

Know what each measures, its units, and its gotcha.

Vegetation / land

NDVI = (NIR − Red)/(NIR + Red) ∈ [−1, 1]. Greenness/vigour proxy. ⚠️ saturates over dense canopy; sensitive to soil background; cloud-contaminated values must be masked.
EVI — like NDVI but corrects soil/atmosphere, doesn’t saturate as fast.
LST — Land-Surface Temperature (thermal-IR, e.g. MODIS MOD11). ⚠️ not air temperature — it’s the skin temperature of the ground; can be 10–20 °C hotter than the air on a sunny day.
Land cover / land use — categorical maps (cropland, forest, urban…).

Atmosphere / greenhouse gases

NO₂ / SO₂ / CO / O₃ columns — column density (molecules/cm²) = total amount in the air column, not surface concentration. OMI (2004–), TROPOMI (2018–, sharper). ⚠️ OMI “row anomaly”; column ≠ what you breathe at the surface.
CH₄ (methane), CO₂ (XCO₂) — column mixing ratios (TROPOMI, EMIT, OCO-2/3, GOSAT).
AOD — Aerosol Optical Depth: haziness/particulates proxy (MODIS).

Water / cryosphere

Precipitation — GPM IMERG (~10 km, satellite), CHIRPS (satellite+gauge, land), IMD gauge grid (India). ⚠️ satellite precip is biased, especially for extremes and orographic (mountain) rain — always cross-check.
Soil moisture — SMAP (microwave, ~top 5 cm).
TWS — Terrestrial Water Storage anomaly from GRACE/GRACE-FO (measures mass change via gravity; ~300 km, monthly). Groundwater = TWS − soil moisture − surface water − snow.
Sea-surface height / sea level — radar altimetry (Jason, SARAL/AltiKa).
SST — Sea-Surface Temperature (GHRSST/MUR, MODIS).
Snow / SWE — snow-covered area (MODIS) and Snow Water Equivalent.
Ice elevation — ICESat-2 lidar (ATL06 along-track, ATL15 gridded dh/dt).

Human / hazards

Nightlights — VIIRS Black Marble (VNP46): radiance of city lights; proxy for power/economic activity → power-outage and electrification mapping.
SAR backscatter (σ⁰, dB) — radar brightness; smooth water is dark → flood mapping (Sentinel-1, NISAR). OPERA DSWx is an analysis-ready surface-water product.
FRP / active fire — Fire Radiative Power & thermal-anomaly detections (VIIRS/MODIS, the FIRMS feed) — e.g. stubble-fire counts.
dNBR — differenced Normalized Burn Ratio = NBR_pre − NBR_post, where NBR = (NIR − SWIR)/(NIR + SWIR); burn-severity classes.

4. Spatial concepts

CRS / projection — the Earth is round; maps are flat. A Coordinate Reference System defines the lat/lon (e.g. EPSG:4326) or a projected grid (meters). ⚠️ mixing CRSs silently gives wrong areas/distances. rioxarray/geopandas handle reprojection.
Resampling / regridding — moving data from one grid to another (nearest, bilinear, conservative). ⚠️ comparing two products on different grids without regridding is a bug.
Area-weighting (cos-lat). A 0.25° pixel near the equator covers more ground than one near the pole. A correct regional mean weights each pixel by its area ∝ cos(latitude). Naive .mean() over lat/lon is biased. (See /learn/area-weighting.)
Zonal statistics — reducing pixels within a region/polygon to one number (mean, sum).
Masking — excluding pixels you shouldn’t average: clouds, water/ocean, fill values, low-quality retrievals. ⚠️ Fill values (e.g. −999, 65535) are “no data” — if you don’t mask them they poison your mean. Always check the product’s fill value + apply the scale factor / offset (e.g. MODIS LST is stored ×0.02 in Kelvin).
Footprint matching. ⚠️ When comparing products, average them over the same spatial footprint. A coastal “box” mean includes ocean for a satellite that sees ocean but is land-only for a gauge product → they’ll disagree for a dumb reason. Clip both to the same land/territory. (This handbook’s own project hit exactly this — see §8.)

5. Temporal concepts

Time series — a value per time step for your region.
Climatology / “normal” — the long-term average for each calendar period, over a fixed baseline (WMO standard: 1991–2020). “Normal rainfall” is meaningless without stating the baseline.
Anomaly — observation − climatology. Often expressed as percent of normal or as a standardized anomaly / z-score (anomaly ÷ standard deviation), e.g. SPI for drought.
Deseasonalizing — removing the regular annual cycle so a trend isn’t swamped by it (subtract the monthly climatology, or fit harmonics). ⚠️ a “rising NO₂” can be pure winter seasonality if you don’t deseasonalize.
Compositing — combining many noisy scenes into one cleaner one (e.g. max-NDVI or median over a month) to beat cloud/gaps. ⚠️ taking a naive mean of a window that includes recovery (e.g. a post-storm window that runs into when the lights came back) dilutes the signal.

6. Statistics for trends & change (the heart of it)

Linear trend (OLS slope) — fit a line, the slope is change-per-year. ⚠️ sensitive to outliers and assumes normal noise — risky for skewed satellite data.
Theil–Sen slope — the median of all pairwise slopes. Robust to outliers; the right default for noisy EO time series. (See /learn/trends.)
Mann–Kendall test — a non-parametric test for whether a monotonic trend is real (vs noise). Gives a p-value. Pair it with Theil–Sen.
p-value & significance — p = probability of seeing this trend if there were truly none. p < 0.05 (“significant”) is the common bar. ⚠️ “not significant” is a real result, not a failure — it means the data can’t distinguish the trend from noise. Report it as such.
Confidence interval (CI) — the plausible range for the slope. If the CI straddles zero, the sign is unresolved.
Autocorrelation — consecutive years aren’t independent (this year’s drought relates to last year’s). ⚠️ ignoring it makes you over-confident (too-small p-values); modified Mann–Kendall corrects for it.
Signal-to-Noise Ratio (SNR) — effect size ÷ natural variability. A big-looking change with SNR < 1 is probably noise. (See /learn/signal-vs-noise.)
Noise floor — the level a change must clear to count as real, measured from the data (e.g. the control’s own variability), not hand-picked. Below the floor → “no clear signal.”
Extreme indices (rainfall): Rx1day/Rx5day (annual max 1-/5-day total), R95p/R99p (rain from very-wet days). Total rain can be flat while intensity rises — different question.
Minimum record length. ⚠️ A “trend” over <~10 years isn’t scientifically honest; ≥30 years is a climate normal. Below the floor, refuse rather than mislead.

7. The core methods (your recipe book)

Trend — area-weight → annual series → deseasonalize (if needed) → Theil–Sen + Mann–Kendall. Answers: “is X rising/falling over years?”
Anomaly vs baseline — accumulate/average → build fixed-baseline climatology → percent-of-normal + z-score/SPI. Answers: “is this year unusual?”
BACI (Before-After-Control-Impact) — compare the change at an impacted site to the change at a comparable control site, which cancels the regional/background trend. The workhorse for “did this event/intervention do something?” (deforestation, power outage). Answers: “did X cause a local change beyond the background?”
Classification / thresholding — turn a continuous band into classes via a cut (NDWI > t → water; σ⁰ < t → flood; dNBR bins → burn severity). Answers: “where is X?”
Differencing — pre vs post images to isolate change (flood extent, dNBR, deformation).
Correlation / regression — relate two variables (e.g. upwind fire counts vs downwind NO₂). ⚠️ correlation ≠ causation; BACI is closer to causal.

8. Verification & trust (what makes you independent)

This is the part Earth scientists are paid for — and the part you can actually learn.

Cross-validation (multi-product agreement). Compute the same answer from independent datasets and see if they agree. Reanalysis (ERA5) vs satellite+gauge (CHIRPS) vs pure gauge (IMD). If they disagree, the trend is unresolved — say so. Agreement across independent methods is the strongest evidence you can offer without fieldwork.
Ground-truth / in-situ. Compare the satellite answer to ground measurements: rain gauges (IMD), flux towers, EPA/CPCB/OpenAQ air monitors, stream gauges, GHCN/meteostat weather stations. Gauges are the anchor; where satellite/reanalysis disagree with gauges, trust the gauges (within their own coverage limits). ⚠️ ground networks are sparse — “no nearby station” is itself a finding.
Refusal discipline. A trustworthy tool refuses questions the data can’t answer: record too short, region ambiguous, two variables conflated, too few clear scenes. Refusing is the system working, not failing. (See /learn/refusal-discipline.)
Reproducibility. Same query → same number. Pin dataset versions, record the exact region/dates/method, ship the code. A result you can’t reproduce isn’t verified.
The “is it real?” checklist before you believe any EO result:
- Did I mask clouds, water, and fill values, and apply scale/offset?
- Is the record long enough for the claim (≥10y for a trend)?
- Did I area-weight, and compare like-for-like footprints?
- Is the change bigger than the noise (SNR, p-value, CI not straddling zero)?
- Does an independent product/gauge agree?
- Am I claiming causation when I only have correlation?
- If any answer is “no” → downgrade the claim or refuse.

9. Pitfalls that bite everyone (⚠️ keep this list)

Fill values / scale factors not applied → garbage means.
Cloud contamination in optical/thermal not masked → false low NDVI, wrong LST.
Sensor intercalibration when splicing records (Terra vs Aqua MODIS; OMI vs TROPOMI; GRACE vs GRACE-FO gap) → fake jumps. Don’t blindly concatenate missions.
Quality flags ignored — most L2 products ship a per-pixel QA band; use it.
Mixed pixels — coarse pixel covers several land types; bad for smallholder agriculture.
Ocean/land/footprint mismatch when comparing products (see §4).
Baseline dependence — percent-of-normal changes entirely with the baseline you pick.
Column vs surface — NO₂/CH₄ columns aren’t surface concentrations.
LST vs air temperature — different things.
p-hacking / garden of forking paths — trying windows/thresholds until you get significance. Pre-specify the method; if you change it, change it for a principled reason and report both.
Over-claiming — the cardinal sin. “Rainfall is declining” when it’s a non-significant, single-product, ocean-contaminated box mean. Calibrate the claim to the evidence.

10. The Python toolchain

Need	Library
N-dim labelled arrays (the core)	xarray
Numerics	numpy, scipy
Bigger-than-memory / parallel	dask
Raster + CRS	rioxarray, rasterio
Vector (polygons/points)	geopandas, shapely
NASA data pull + auth	earthaccess
STAC search	pystac-client, odc-stac
Planetary compute	earthengine-api (GEE)
Robust trend + significance	pymannkendall (Theil–Sen + Mann–Kendall in one)
India gauge rainfall/temp	imdlib (downloads IMD Pune grids)
Plotting	matplotlib, hvplot/holoviews

Minimal trend, end to end:

import earthaccess, xarray as xr, numpy as np, pymannkendall as mk
earthaccess.login(strategy="netrc")
files = earthaccess.search_data(short_name="GPM_3IMERGM", bounding_box=(74,18,78,21),
                                temporal=("2001-01-01","2022-12-31"))
ds = xr.open_mfdataset(earthaccess.open(files))
w = np.cos(np.deg2rad(ds.lat))                       # area weight
annual = ds["precipitation"].resample(time="1YE").sum()
series = annual.weighted(w).mean(dim=["lat","lon"])  # area-weighted regional annual mm
r = mk.original_test(series.values)                  # Theil–Sen slope + Mann–Kendall p
print(r.slope, "mm/yr  p=", r.p, "significant" if r.h else "no significant trend")

11. Worked example: “Is rainfall declining in my district?” (the honest version)

Pull IMERG (satellite) for the district box, 2001–2022.
Area-weight → annual series → Theil–Sen + Mann–Kendall. Say you get −3 mm/yr, p=0.4.
Quantify: p=0.4 → not significant. Honest answer already: “no detectable trend.”
Cross-check: recompute with CHIRPS and IMD gauges, on the same India-land footprint. IMD says −, CHIRPS says +. → sources disagree → unresolved.
Verdict: “Over 2001–2022 we cannot establish a rainfall trend for this district: the satellite trend is not significant and the gauge and satellite products disagree on its direction.” That sentence is more valuable — and more correct — than any single number.

That is exactly what this project does, and you just did it yourself.

12. This project as a reference implementation

/learn pages (interactive): trends, signal-vs-noise, area-weighting, NDVI, data-formats, cross-validation, ground-truth, refusal-discipline, burn-severity, nightlights, SAR-flood.
verified-earth-agent/ (the engine): gee.py (area-weighted annual series for ERA5/CHIRPS/ MODIS), eval/compute_golden.py (trend + Theil–Sen + Mann–Kendall + status), scripts/ deforestation_analog.py / build_reports.py (BACI), build_disaster_nightlights.py (BACI on nightlights with a noise-floor guard), build_imd_crosscheck.py (the IMD ground-truth cross-check). Read these as worked, honest code.

13. Where to go deeper

Books/refs: Remote Sensing and Image Interpretation (Lillesand); the Carpentries geospatial Python lessons; NASA’s ARSET free training; Google Earth Engine’s docs & Cloud-Based Remote Sensing with GEE (open book); An Introduction to Statistical Learning for the stats.
Practice datasets: MODIS NDVI/LST, IMERG/CHIRPS, ERA5-ARCO, Sentinel-2 on the cloud, IMD via imdlib, Black Marble.
Habits: always state region + dates + baseline + method + version; always area-weight; always cross-check; always be willing to write “unresolved.”

You don’t need to be an Earth scientist. You need to (1) know what each variable really is, (2) compute robustly, (3) quantify uncertainty, and (4) refuse to overclaim. That’s learnable, and you now have the map.

Try it — the minimal trend, runnable

The §10 snippet, live in your browser. Run it, then change the noise until Mann–Kendall stops calling it significant:

editable · runs in your browser

# the handbook's minimal trend, end to end — runnable, on synthetic data
import numpy as np, pymannkendall as mk
rng = np.random.default_rng(0)
years = np.arange(2001, 2023)
# a regional annual series with a real -3/yr decline + noise (this is what real code returns
# after area-weighting an IMERG cube; here we synthesise it so it runs offline):
series = 1300 - 3.0 * (years - 2001) + rng.normal(0, 80, years.size)
r = mk.original_test(series)
print(f"slope {r.slope:+.2f} /yr   p={r.p:.3f}   {'significant' if r.h else 'not significant'}")
print("planted truth: -3.00 /yr   (swap 'series' for real earthaccess data and the code is identical)")