q25·intermediate

Is the lake or river my town drinks from getting choked with toxic algae?

water-resources · public-health · hydrology · biosphere · Datasets: 3 · 20–60 min
Find the data for your area

The default area of interest (AOI) is -83.5, 41.4 → -82.5, 42.0 (lon/lat), covering western Lake Erie around the Toledo drinking-water intake. A ready-to-run notebook with your AOI pre-filled is available for download; it runs in any Python environment and needs a free Earthdata Login to fetch the data.


Cyanobacteria (blue-green algae) blooms in inland reservoirs and lakes can poison a city’s drinking water and shut down fishing, often with little to no on-the-ground monitoring. The CyAN program turns Sentinel-3 OLCI into a cyanobacteria index for the specific waterbody your town drinks from — letting you watch a bloom build before it reaches the intake pipe.

This is the freshwater / drinking-supply companion to the coastal-HAB question (q14). It is about cyanobacteria in lakes and reservoirs, not marine algal blooms in coastal seawater.

What you can answer

  • Is a bloom forming on my source water right now? — The CyAN cyanobacteria index (CI) maps cyanobacteria abundance pixel-by-pixel over the lake or reservoir.
  • How severe and how big is it? — Translate the index into low/moderate/high categories and measure the bloom area (km²) over the waterbody.
  • Is it near the drinking-water intake? — Sample the index at the intake location and its surrounding pixels, not just the lake-wide average.
  • When does it usually peak? — Build a seasonal climatology (2016+) so you know your normal bloom window and can flag an early or unusually intense season.
  • Is it getting worse year over year? — Track peak bloom area and intensity across the Sentinel-3 record to see a trend.
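The "how severe and how big" question reduces to classifying each pixel and counting. A minimal sketch, assuming a window of CyAN digital numbers (DN) is already in memory as a NumPy array. The low/moderate/high cutoffs in `CUTOFFS` are hypothetical placeholders, not published CyAN thresholds (set your own, as described under "Make it yours"), and each 300 m pixel is counted as 0.09 km².

```python
import numpy as np

# HYPOTHETICAL DN cutoffs: the lower bound of each severity class.
# These are placeholders, not published CyAN thresholds.
CUTOFFS = {"low": 1, "moderate": 100, "high": 200}
PIXEL_KM2 = 0.3 * 0.3  # one 300 m x 300 m pixel is about 0.09 km²


def severity_summary(dn):
    """Pixel count and area (km²) per severity class in a window of CyAN DNs.

    Only DN 1-253 carry a cyanobacteria reading; 0 (below detection),
    254 (land) and 255 (no data) fall outside every class.
    """
    dn = np.asarray(dn)
    valid = (dn >= 1) & (dn <= 253)
    edges = [CUTOFFS["low"], CUTOFFS["moderate"], CUTOFFS["high"], 254]
    summary = {}
    for name, lo, hi in zip(["low", "moderate", "high"], edges[:-1], edges[1:]):
        n = int((valid & (dn >= lo) & (dn < hi)).sum())
        summary[name] = {"pixels": n, "area_km2": n * PIXEL_KM2}
    return summary


window = np.array([[0, 50, 120],
                   [210, 254, 255],
                   [180, 230, 0]])
print(severity_summary(window))
```

Land (254) and no-data (255) pixels are excluded explicitly, so they never inflate the high-severity area.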

What you can NOT answer with these datasets alone

  • Lakes outside the United States — the merged CyAN product is gridded to a CONUS-only 300 m raster (US lakes and reservoirs). For lakes elsewhere, use the OLCI inland-water Level-2 scenes (OLCIS3A_L2_ILW / OLCIS3B_L2_ILW) and derive the index yourself.
  • Toxin concentration (microcystin, cylindrospermopsin) — the index tracks cyanobacteria abundance, not toxin level. Toxins require water sampling and lab assay (ELISA / LC-MS).
  • Is the treated tap water safe? — Satellites see the raw source water, not what comes out after treatment. Combine with utility finished-water sampling.
  • Small ponds, narrow rivers, or near-shore strips — OLCI is ~300 m resolution; pixels contaminated by land/shoreline are masked. Many small reservoirs are too small to resolve.
  • Sub-surface or bottom-hugging blooms — satellites see only the surface optical layer; a bloom mixed deep or sitting on the bottom can be invisible.
  • Species identity — the index flags cyanobacteria-like signals, not which species or whether it is a toxin-producing strain. Confirm in the lab.
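For lakes outside the United States, the bullet above points to deriving the index yourself from OLCI L2 inland-water scenes. A minimal sketch of the core arithmetic, assuming the spectral-shape formulation usually attributed to Wynne et al. (CI = −SS(681), where SS(681) is the 681 nm reflectance minus a straight baseline drawn between 665 nm and 709 nm). It also assumes you have already read reflectances for those three OLCI bands into arrays, and it uses nominal band centres. It does not reproduce CyAN's screening, calibration, or DN scaling, so its values will not match the merged product.

```python
import numpy as np


def spectral_shape(r_minus, r_center, r_plus, wl_minus, wl_center, wl_plus):
    """Reflectance at the centre band minus a straight baseline between its neighbours."""
    baseline = r_minus + (r_plus - r_minus) * (wl_center - wl_minus) / (wl_plus - wl_minus)
    return r_center - baseline


def cyano_index(r665, r681, r709):
    """Spectral-shape index CI = -SS(681) on a 665-709 nm baseline (nominal band centres).

    Positive CI means reflectance at 681 nm sits below the baseline.
    This is only the core arithmetic, not the full CyAN processing chain.
    """
    r665, r681, r709 = (np.asarray(r, dtype="float64") for r in (r665, r681, r709))
    return -spectral_shape(r665, r681, r709, 665.0, 681.0, 709.0)


# Two made-up pixels: one with a dip at 681 nm, one with a flat spectrum
ci = cyano_index([0.020, 0.020], [0.018, 0.020], [0.030, 0.020])
print(ci)
```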

Code template (Python, cloud-direct)

Verified locally. The merged CyAN product ships as Cloud-Optimized GeoTIFFs (one 7-day composite per file), gridded to the CONUS Albers projection (EPSG:5070) at 300 m — so it is US lakes only, and you read it with rioxarray, not xarray. Each pixel is a digital number 0–255: 0 = below detection, 1–253 = increasing cyanobacteria index, 254 = land, 255 = no data.
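The DN encoding just described can be turned into labelled masks before any statistics are computed. A minimal sketch on a small hand-made array; the mask names are illustrative, not product field names.

```python
import numpy as np


def decode_dn(dn):
    """Boolean masks for the CyAN DN classes: 0, 1-253, 254 and 255."""
    dn = np.asarray(dn)
    return {
        "below_detection": dn == 0,            # water, cyanobacteria below detection
        "cyano": (dn >= 1) & (dn <= 253),      # water with an index reading
        "land": dn == 254,
        "no_data": dn == 255,                  # no valid retrieval for this pixel
    }


dn = np.array([0, 0, 12, 200, 254, 255, 255], dtype=np.uint8)
counts = {name: int(mask.sum()) for name, mask in decode_dn(dn).items()}
print(counts)  # {'below_detection': 2, 'cyano': 2, 'land': 1, 'no_data': 2}
```

Every uint8 pixel falls into exactly one of the four masks, which makes it easy to check that a window was decoded completely.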

```python
import earthaccess
import numpy as np
import pandas as pd
import rioxarray  # registers the .rio accessor
from rasterio.warp import transform, transform_bounds

earthaccess.login(strategy="netrc")

# Western Lake Erie: Toledo drinking-water intake region (lon/lat)
aoi = (-83.5, 41.4, -82.5, 42.0)
intake_lon, intake_lat = -83.26, 41.69   # approx. Toledo Collins Park WTP crib intake

# 1. Merged Sentinel-3 OLCI cyanobacteria index (CyAN), 7-day composites, 2016+
results = earthaccess.search_data(
    short_name="MERGED_S3_OLCI_L3m_CYAN",
    bounding_box=aoi,
    temporal=("2016-04-01", "2025-10-31"),   # full record; bloom season is summer
)
files = earthaccess.open(results)


def granule_start(granule):
    """Start time from the granule's UMM-G metadata (NaT if absent).

    Granules carry their time under umm -> TemporalExtent, not a top-level
    "time_start" key, so reading that key would leave every date empty.
    """
    temporal = granule["umm"].get("TemporalExtent", {})
    start = (temporal.get("RangeDateTime", {}).get("BeginningDateTime")
             or temporal.get("SingleDateTime"))
    return pd.to_datetime(start, errors="coerce")


def is_cyano(dn):
    """True for DN 1-253 (water pixel with a cyanobacteria reading)."""
    return (dn >= 1) & (dn <= 253)


# 2. Build a bloom-severity time series over the lake AOI
records = []
for f in files:
    da = rioxarray.open_rasterio(f, masked=False).squeeze()      # DN raster, EPSG:5070

    # Reproject the lon/lat AOI into the raster's CRS, then window
    xmin, ymin, xmax, ymax = transform_bounds("EPSG:4326", da.rio.crs, *aoi)
    win = da.rio.clip_box(xmin, ymin, xmax, ymax).values.astype("float32")

    cyano = is_cyano(win)                          # real water pixels with a reading
    dn = np.where(cyano, win, np.nan)              # DN where cyano detectable, else NaN

    # Intake: project the point into the raster CRS (much cheaper than reprojecting
    # the whole CONUS raster), then take the nearest pixel and its 3 x 3 neighbours
    xs, ys = transform("EPSG:4326", da.rio.crs, [intake_lon], [intake_lat])
    ix = int(np.abs(da.x.values - xs[0]).argmin())
    iy = int(np.abs(da.y.values - ys[0]).argmin())
    centre = float(da.values[iy, ix])
    hood = da.values[max(iy - 1, 0):iy + 2, max(ix - 1, 0):ix + 2].astype("float32")
    hood_ok = is_cyano(hood)

    records.append({
        "date":          granule_start(f.granule),
        "mean_dn":       float(np.nanmean(dn)) if cyano.any() else 0.0,
        "max_dn":        float(np.nanmax(dn)) if cyano.any() else 0.0,
        "bloom_area_px": int(cyano.sum()),                      # any detectable cyano
        "high_area_px":  int((cyano & (win >= 200)).sum()),     # high end; excludes land (254) / no data (255)
        "intake_dn":     centre if is_cyano(centre) else np.nan,
        "intake_max3x3": float(hood[hood_ok].max()) if hood_ok.any() else np.nan,
    })

ts = pd.DataFrame(records).dropna(subset=["date"]).sort_values("date")
ts["bloom_area_km2"] = ts["bloom_area_px"] * (0.3 * 0.3)   # ~0.09 km² per 300 m pixel

# 3. Seasonal climatology + intake alerting (high index at or next to the intake)
ts["intake_alert"] = ts[["intake_dn", "intake_max3x3"]].max(axis=1) >= 200
print(ts.groupby(ts["date"].dt.month)["bloom_area_km2"].mean())
print(ts.loc[ts["intake_alert"], ["date", "intake_dn", "intake_max3x3", "bloom_area_km2"]])

# To turn DN into the published cyanobacteria index / cell density, apply the
# conversion in the CyAN product spec (linked in Sources); DN is a relative index here.
```
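A minimal sketch of the DN conversion that the template leaves to the product spec. The constants are assumptions to verify against the current CyAN documentation, not values taken from this page: the conversion is commonly quoted as CI = 10^(3.0/250.0 × DN − 4.2) for DN 1–253, with roughly 10^8 cells/mL per unit of CI. Under that assumption, DN 100 maps to CI = 0.001, or about 100,000 cells/mL.

```python
import numpy as np


def dn_to_ci(dn):
    """Convert CyAN DN to cyanobacteria index (CI); NaN for land (254) and no data (255).

    ASSUMED conversion: CI = 10 ** (3.0 / 250.0 * DN - 4.2) for DN 1-253.
    DN 0 (below detection) is returned as 0.0. Verify against the product spec.
    """
    dn = np.asarray(dn, dtype="float64")
    ci = np.full(dn.shape, np.nan)
    valid = (dn >= 1) & (dn <= 253)
    ci[valid] = 10.0 ** (3.0 / 250.0 * dn[valid] - 4.2)
    ci[dn == 0] = 0.0
    return ci


def ci_to_cells_per_ml(ci, cells_per_ci=1e8):
    """ASSUMED linear scaling from CI to cyanobacteria cells/mL."""
    return np.asarray(ci, dtype="float64") * cells_per_ci


dn = np.array([0, 1, 100, 253, 254, 255])
ci = dn_to_ci(dn)
print(ci)
print(ci_to_cells_per_ml(ci))
```

Keeping `cells_per_ci` as a parameter makes it easy to swap in a different scaling if the product spec gives one.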

Expected output

  • A bloom-severity time series for your source water: lake-wide bloom area (km²) and the cyanobacteria index sampled at the drinking-water intake, across multiple bloom seasons.
  • A seasonal climatology showing the normal bloom window (for western Lake Erie, late July–September) so an early or oversized season stands out.
  • An intake alert flag — dates when high-severity cyanobacteria sat near the intake, the events a utility most needs to know about.
  • A map of the cyanobacteria index over the lake for any flagged date, showing where the bloom is concentrated relative to the intake.
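One way to make an early or oversized season stand out is to compare each year with the other years in the record. A minimal sketch, assuming a table shaped like the template's `ts` (a `date` column and `bloom_area_km2`). The "mean + 2 standard deviations of the other years" rule, the onset threshold, and the synthetic record are illustrative choices, not part of the CyAN method.

```python
import numpy as np
import pandas as pd


def flag_unusual_months(ts, n_sigma=2.0):
    """Compare each (year, month) peak bloom area with the same month in all OTHER years.

    A month is flagged when its peak exceeds the other years' mean + n_sigma * std
    (leave-one-out, so an extreme year does not inflate its own baseline).
    """
    monthly = (ts.assign(year=ts["date"].dt.year, month=ts["date"].dt.month)
                 .groupby(["year", "month"], as_index=False)["bloom_area_km2"].max())
    unusual = []
    for _, row in monthly.iterrows():
        same_month = monthly["month"] == row["month"]
        others = monthly.loc[same_month & (monthly["year"] != row["year"]), "bloom_area_km2"]
        unusual.append(len(others) >= 2 and
                       row["bloom_area_km2"] > others.mean() + n_sigma * others.std())
    monthly["unusual"] = np.array(unusual, dtype=bool)
    return monthly


def onset_day(ts, area_threshold_km2):
    """Day of year on which bloom area first exceeds the threshold, per year."""
    above = ts[ts["bloom_area_km2"] > area_threshold_km2]
    return above.groupby(above["date"].dt.year)["date"].min().dt.dayofyear


# Synthetic weekly record, June-October 2016-2023, with an oversized August 2023
dates = pd.date_range("2016-06-01", "2023-10-31", freq="7D")
dates = dates[np.isin(dates.month, [6, 7, 8, 9, 10])]
years, months = dates.year.to_numpy(), dates.month.to_numpy()
area = np.where(months == 8, 100.0, 10.0) + (years - 2016) * 1.0
area[(years == 2023) & (months == 8)] = 400.0
ts_demo = pd.DataFrame({"date": dates, "bloom_area_km2": area})

result = flag_unusual_months(ts_demo)
print(result[result["unusual"]])
print(onset_day(ts_demo, area_threshold_km2=50.0))
```

The leave-one-out baseline matters with a short record: with only eight or nine seasons, an extreme year included in its own mean and standard deviation can mask itself.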

Caveats

  • Index ≠ toxin. A high cyanobacteria index means “sample the water now,” not “the toxin level is X.” Pair every satellite alert with utility sampling.
  • Clouds and wind. Cloud cover blanks out scenes; wind can pile a bloom against one shore in hours. Treat each clear scene as a snapshot, not a continuous record.
  • Resolution limits. ~300 m pixels and shoreline masking mean small reservoirs and narrow river reaches are poorly resolved or excluded entirely.
  • Mixed pixels and Case-II water. Sediment and dissolved organics in turbid inland water can confuse retrievals; the CyAN index is tuned for cyanobacteria but is not infallible.
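Because cloud gaps change from one composite to the next, a lake-wide bloom area is only comparable between composites when a similar share of the lake was actually observed. A minimal sketch of one way to handle that. It assumes you know how many water pixels the lake window has on a fully clear day (`n_water`, for example the largest DN 0–253 count seen across all composites) and keeps only composites where at least a chosen fraction was observed; the 50% cutoff is an arbitrary example.

```python
import numpy as np


def observed_fraction(win, n_water):
    """Share of the lake's water pixels that have a retrieval (DN 0-253) in one composite.

    n_water is the number of water pixels in the window on a fully clear day,
    e.g. the largest DN 0-253 count seen across all composites (an assumption).
    """
    observed = np.asarray(win) <= 253            # 254 = land, 255 = no data
    return float(observed.sum()) / n_water


def keep_composite(win, n_water, min_fraction=0.5):
    """Keep a composite only if at least min_fraction of the lake was observed."""
    return observed_fraction(win, n_water) >= min_fraction


clear = np.array([0, 0, 5, 120, 254, 254], dtype=np.uint8)       # 4 of 4 water pixels observed
cloudy = np.array([0, 255, 255, 255, 254, 254], dtype=np.uint8)  # 1 of 4 water pixels observed
print(observed_fraction(clear, 4), keep_composite(clear, 4))     # 1.0 True
print(observed_fraction(cloudy, 4), keep_composite(cloudy, 4))   # 0.25 False
```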

Cross-DAAC composition

OB.DAAC only — the merged CyAN product and OLCI inland-water L2 are both distributed by the NASA Ocean Biology DAAC, served through the CyAN project for inland waters.

Sources

How a scientist answers this
Parameters
Cyanobacteria abundance from the CyAN cyanobacteria index (CI) derived from merged Sentinel-3 OLCI (MERGED_S3_OLCI_L3m_CYAN, ~300 m inland-water resolution, 2016+), with OLCIS3A/3B L2 inland-water products underlying it; report CI converted to low/moderate/high categories, bloom area (km²) over the waterbody, and CI sampled at the drinking-water intake pixel plus its neighbors rather than the lake-wide mean.
Method
Mask to the waterbody, classify CI into severity categories, sum the area of moderate/high pixels for bloom extent, and sample the intake location; build a 2016+ seasonal climatology to define the normal bloom window and flag early or unusually intense seasons, and trend peak bloom area/intensity across the Sentinel-3 record (Theil–Sen/Mann–Kendall).
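The trend step can be sketched without extra dependencies: the Theil–Sen slope is the median of all pairwise slopes, and the Mann–Kendall S statistic counts rising minus falling pairs, tested with a normal approximation. This minimal version assumes one value per year (for example, each year's peak bloom area), skips the tie correction to the variance, and uses made-up numbers. If SciPy is available, `scipy.stats.theilslopes` and `scipy.stats.kendalltau` (tau between year and value) cover the same ground.

```python
import math
from itertools import combinations

import numpy as np


def theil_sen_slope(t, y):
    """Median of the pairwise slopes (y_j - y_i) / (t_j - t_i) over pairs with t_j != t_i."""
    slopes = [(y[j] - y[i]) / (t[j] - t[i])
              for i, j in combinations(range(len(t)), 2) if t[j] != t[i]]
    return float(np.median(slopes))


def mann_kendall(y):
    """Mann-Kendall S, Z (normal approximation, no tie correction) and two-sided p-value."""
    n = len(y)
    s = sum(np.sign(y[j] - y[i]) for i, j in combinations(range(n), 2))
    var_s = n * (n - 1) * (2 * n + 5) / 18.0
    if s > 0:
        z = (s - 1) / math.sqrt(var_s)        # continuity correction
    elif s < 0:
        z = (s + 1) / math.sqrt(var_s)
    else:
        z = 0.0
    p = math.erfc(abs(z) / math.sqrt(2.0))    # two-sided p = 2 * (1 - Phi(|z|))
    return int(s), float(z), float(p)


years = list(range(2016, 2025))
peak_area = [120, 135, 128, 150, 160, 155, 170, 185, 180]   # made-up km² values
slope = theil_sen_slope(years, peak_area)
s, z, p = mann_kendall(peak_area)
print(f"Theil-Sen slope: {slope:.1f} km²/yr; Mann-Kendall S={s}, Z={z:.2f}, p={p:.4f}")
```

Both statistics rely on ranks and medians rather than means, so a single extreme season has limited influence on the reported trend.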
Validation
Cross-check the CI against any in-situ cyanobacteria/toxin (microcystin) or chlorophyll samples and against the seasonal climatology; note that CI is a proxy for cyanobacteria abundance (not toxin concentration directly), that ~300 m pixels and mixed land/water edges limit small or narrow waters, and that the merged product is US-focused.
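A cross-check against in-situ data usually starts by pairing each lab sample with the satellite composite whose date is closest. A minimal sketch using pandas `merge_asof` and a rank (Spearman) correlation, since the index and lab values need not be linearly related. The tables, the column name `microcystin_ugL`, and the 3-day tolerance are hypothetical, and four pairs are far too few for a real validation.

```python
import pandas as pd

composites = pd.DataFrame({
    "date": pd.to_datetime(["2019-07-01", "2019-07-08", "2019-07-15", "2019-07-22", "2019-07-29"]),
    "intake_dn": [20.0, 60.0, 110.0, 180.0, 230.0],
})
samples = pd.DataFrame({
    "sample_date": pd.to_datetime(["2019-07-02", "2019-07-09", "2019-07-16", "2019-07-23", "2019-08-20"]),
    "microcystin_ugL": [0.1, 0.4, 0.3, 1.5, 2.0],   # hypothetical lab results
})

# Pair each sample with the nearest composite date; drop samples with no composite within 3 days
paired = pd.merge_asof(
    samples.sort_values("sample_date"),
    composites.sort_values("date"),
    left_on="sample_date", right_on="date",
    direction="nearest", tolerance=pd.Timedelta("3D"),
).dropna(subset=["intake_dn"])

# Spearman rank correlation computed as the Pearson correlation of ranks (no SciPy needed)
rho = paired["intake_dn"].rank().corr(paired["microcystin_ugL"].rank())
print(paired[["sample_date", "date", "intake_dn", "microcystin_ugL"]])
print(f"Spearman rho = {rho:.2f} over {len(paired)} pairs")
```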
In plain English
Use the satellite cyanobacteria index to see whether a blue-green algae bloom is building on your source water, how big it is, and whether it's reaching the intake, compared against the lake's normal bloom season.

Make it yours → Set the waterbody outline and the intake coordinates, choose the dates, and adjust the low/moderate/high CI category cutoffs in the notebook.
