Cloud-access index files for NOAA model data (Kerchunk) — NODD KERCHUNK

In plain English

What it measures. Not a dataset of measurements itself, but a set of index files that point into existing NOAA ocean and weather model outputs, letting software load just the pieces it needs from cloud storage.

How it's made. Generated automatically in the cloud using the 'kerchunk' technique, which maps existing files to the Zarr format; rebuilt each time new source data arrives.
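To make the "maps existing files to the Zarr format" idea concrete, here is a minimal sketch of what a kerchunk reference set looks like and how a reader turns one chunk key into a byte range. The bucket, file name, variable name, offsets and lengths are all hypothetical, written by hand for illustration; only the overall layout (inline Zarr metadata plus `[url, offset, length]` entries) follows the kerchunk reference format.

```python
import json

# A hand-written, hypothetical reference set in the kerchunk "version 1" layout.
# Zarr metadata keys hold inline JSON strings; chunk keys hold
# [url, byte_offset, byte_length] triples pointing into the original file.
REFS = {
    "version": 1,
    "refs": {
        ".zgroup": json.dumps({"zarr_format": 2}),
        "temp/.zarray": json.dumps({
            "shape": [2, 4], "chunks": [1, 4], "dtype": "<f4",
            "compressor": None, "fill_value": None,
            "filters": None, "order": "C", "zarr_format": 2,
        }),
        # Two chunks of 4 float32 values (16 bytes each) inside one source file.
        "temp/0.0": ["s3://example-bucket/model.nc", 4096, 16],
        "temp/1.0": ["s3://example-bucket/model.nc", 4112, 16],
    },
}


def byte_range_for(refs, key):
    """Return (url, first_byte, last_byte) a reader must fetch for one chunk key."""
    url, offset, length = refs["refs"][key]
    return url, offset, offset + length - 1


print(byte_range_for(REFS, "temp/1.0"))
# ('s3://example-bucket/model.nc', 4112, 4127)
```

Because each chunk resolves to an independent byte range, a client can request only the chunks it needs and leave the rest of the source file untouched.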

How & where you'd use it. A technical helper that makes large NOAA model archives faster and cheaper to work with in the cloud; it is aimed at data engineers and scientists rather than offered as a stand-alone product.

What's measured

aws-pds, climate, coastal, disaster response, environmental, meteorological, oceans, water, weather

Coverage & cadence

Time span: start not listed → ongoing

What you can do with it

Lazily load only the chunks you need from NOAA ocean and water model output
Open a full forecast model run, or the “best” available forecast, as a single virtual dataset
Scale reads horizontally across cloud object storage

Official description

This repository contains references to datasets published to the NOAA Open Data Dissemination Program. These reference datasets serve as index files to the original data by mapping to the Zarr V2 specification. When multidimensional model output is read through zarr, data can be lazily loaded (i.e. retrieving only the data chunks needed for processing) and data reads can be scaled horizontally to optimize object storage read performance. The process used to optimize the data is called kerchunk. RPS runs the workflow in their AWS cloud environment every time a new data notification is received from a relevant source data bucket.

These are the current datasets being cloud-optimized. Refer to those pages for file naming conventions and other information regarding the specific model implementations:

NOAA Operational Forecast System (OFS)
NOAA Global Real-Time Ocean Forecast System (Global RTOFS)
NOAA National Water Model Short-Range Forecast

Filenames follow the source dataset’s conventions. For example, if the source file is

nos.dbofs.fields.f024.20240527.t00z.nc

then the cloud-optimized filename is the same, with “.zarr” appended:

nos.dbofs.fields.f024.20240527.t00z.nc.zarr

Data Aggregations

We also produce virtual aggregations to group an entire forecast model run, and the “best” available forecast.

Best Forecast (continuously updated)
- nos.dbofs.fields.best.nc.zarr

Full Model Run
- nos.dbofs.fields.forecast.[YYYYMMDD].t[CC]z.nc.zarr
- CC is the model run cycles, 00, 06, 12, 18, or 03, 09, 15, 21 for nowcast and forecast runs
- YYYY = year, MM = month, DD = day

Cloud op
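The naming rules in the description can be sketched as a few small helpers. This is an illustration only: the function names are hypothetical, and applying the `nos.<model>.fields...` pattern to models other than the `dbofs` example is an assumption that the description does not confirm.

```python
from datetime import date

# Run cycles listed in the description (00/06/12/18 or 03/09/15/21).
VALID_CYCLES = {"00", "06", "12", "18", "03", "09", "15", "21"}


def reference_name(source_filename):
    """Cloud-optimized name: the source filename with '.zarr' appended."""
    return source_filename + ".zarr"


def best_forecast_name(model):
    """Name of the continuously updated 'best' aggregation (pattern assumed per model)."""
    return f"nos.{model}.fields.best.nc.zarr"


def model_run_name(model, run_date, cycle):
    """Name of the full-model-run aggregation for one run date and cycle."""
    if cycle not in VALID_CYCLES:
        raise ValueError(f"unexpected model run cycle: {cycle!r}")
    return f"nos.{model}.fields.forecast.{run_date:%Y%m%d}.t{cycle}z.nc.zarr"


print(reference_name("nos.dbofs.fields.f024.20240527.t00z.nc"))
# nos.dbofs.fields.f024.20240527.t00z.nc.zarr
print(model_run_name("dbofs", date(2024, 5, 27), "00"))
# nos.dbofs.fields.forecast.20240527.t00z.nc.zarr
```

Check the source dataset pages before relying on these patterns for a model other than the one shown in the description.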

Get the data

noaa_access.py

# NOAA Open Data on AWS — public S3, no login
import s3fs

fs = s3fs.S3FileSystem(anon=True)
# find this dataset's bucket in the docs link in the sidebar, then:
# files = fs.ls("noaa-<bucket>/...")
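# the ".nc.zarr" objects here are kerchunk references: open them with xarray
# through fsspec's "reference" filesystem rather than as plain NetCDF/GRIB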

NOAA Open Data is hosted on public AWS S3 and needs no login (anonymous access).
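Since the objects in this dataset are kerchunk references rather than ordinary NetCDF files, opening one usually goes through fsspec's reference filesystem. The following is a minimal sketch, assuming the reference is a kerchunk file readable anonymously from S3 and that `xarray`, `fsspec`, `s3fs` and `zarr` are installed in mutually compatible versions; the helper names are hypothetical and the bucket path is left as the same placeholder used above.

```python
def reference_storage_options(ref_url):
    """Build fsspec options for a kerchunk reference stored on public S3."""
    return {
        "fo": ref_url,                      # location of the reference file itself
        "target_options": {"anon": True},   # anonymous read of the reference file
        "remote_protocol": "s3",            # the original model files live on S3
        "remote_options": {"anon": True},   # anonymous read of the data chunks
    }


def open_reference(ref_url):
    """Lazily open a kerchunk reference as an xarray Dataset (needs network access)."""
    import xarray as xr  # imported here so the helper above works without xarray

    return xr.open_dataset(
        "reference://",
        engine="zarr",
        backend_kwargs={
            "consolidated": False,
            "storage_options": reference_storage_options(ref_url),
        },
    )


# Hypothetical usage; fill in the bucket from the dataset documentation:
# ds = open_reference("s3://noaa-<bucket>/.../nos.dbofs.fields.best.nc.zarr")
# print(ds)
```

Opening the dataset reads only the reference file and metadata; chunk bytes are fetched from the original model files when a variable is actually indexed or computed.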

Official links

Open data source: NOAA Open Data