dynamical.org Zarr Weather Dataset Catalog

dynamical.org transforms weather data archives into the zarr file format, making them easier to access. This page contains the contents of dynamical.org's dataset catalog documentation, concatenated into a single page designed to be read by an LLM or AI assistant.

NOAA GFS analysis, hourly

Dataset url https://data.dynamical.org/noaa/gfs/analysis-hourly/latest.zarr
Spatial domain Global
Spatial resolution 0.25 degrees (~20km)
Time domain 2015-01-15 00:00:00 UTC to 2024-07-01 00:00:00 UTC
Time resolution 1 hour

Description

The Global Forecast System (GFS) is a National Oceanic and Atmospheric Administration (NOAA) National Centers for Environmental Prediction (NCEP) weather forecast model that generates data for dozens of atmospheric and land-soil variables, including temperatures, winds, precipitation, soil moisture, and atmospheric ozone concentration. The system couples four separate models (atmosphere, ocean, land/soil, and sea ice) that work together to depict weather conditions.

This dataset is an "analysis" containing the model's best estimate of each value at each timestep. In other words, it does not contain a forecast dimension. GFS starts a new model run every 6 hours, and dynamical.org has created this analysis by concatenating the first 6 hours of each forecast. Before 2021-02-27, GFS output had a 3-hourly step at early forecast hours, so for that period we used linear interpolation in the time dimension to fill in the two hourly timesteps between each pair of 3-hourly values.
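A minimal sketch of that time interpolation, using NumPy's `np.interp` on a toy series rather than the real GFS arrays. The helper name `fill_hourly` is hypothetical and is not part of dynamical.org's reformatting code; it assumes evenly spaced 3-hourly input starting at hour 0.

```python
import numpy as np


def fill_hourly(three_hourly):
    """Linearly interpolate a 3-hourly series onto an hourly axis (hypothetical helper).

    Input values are assumed to sit at hours 0, 3, 6, ...; the output has one
    value per hour over the same span.
    """
    values = np.asarray(three_hourly, dtype=np.float64)
    coarse_hours = np.arange(values.size) * 3  # hours of the known values
    hourly = np.arange(coarse_hours[-1] + 1)  # 0, 1, 2, ..., last known hour
    # Each filled hour is a distance-weighted average of its two 3-hourly neighbours
    return np.interp(hourly, coarse_hours, values)


print(fill_hourly([0.0, 3.0, 9.0]))  # → [0. 1. 2. 3. 5. 7. 9.]
```

On an xarray object with a `time` coordinate, `da.resample(time="1h").interpolate("linear")` is one way to get a similar result, though we have not checked that it reproduces dynamical.org's pipeline exactly.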

Storage for this dataset is generously provided by Source Cooperative, a Radiant Earth initiative.

For LLMs & AI Assistants

Dataset summary: NOAA GFS analysis, hourly is a dataset containing 4 variables across 3 dimensions, covering the globe at 0.25 degrees (~20km) resolution from 2015-01-15 00:00:00 UTC to 2024-07-01 00:00:00 UTC at 1 hour resolution.

Key use cases: This dataset is suitable for data analysis, visualization, and scientific research related to weather and climate.

Access pattern: Use this URL, with an optional email parameter, to access this dataset programmatically: https://data.dynamical.org/noaa/gfs/analysis-hourly/latest.zarr

Details

The data values in this dataset have been rounded in their binary representation to improve compression. We round to retain 9 bits of the floating point number's mantissa (a 10-bit significand, counting the implicit leading bit), which yields at most a 0.2% difference between the original and rounded values. See Klöwer et al. 2021 for more information.
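The rounding can be sketched in NumPy by operating directly on the bits of float32 values. This follows the round-to-nearest, ties-to-even bit-rounding idea described by Klöwer et al. 2021; it is not dynamical.org's reformatting code, the helper name `bitround` is hypothetical, and it assumes float32 data (23 stored mantissa bits).

```python
import numpy as np


def bitround(values, keepbits):
    """Keep only `keepbits` of the 23 float32 mantissa bits (hypothetical helper).

    Rounds to nearest with ties to even, then zeroes the discarded bits so the
    trailing zeros compress well.
    """
    if not 0 <= keepbits < 23:
        raise ValueError("keepbits must be in [0, 23)")
    a = np.asarray(values, dtype=np.float32)
    bits = a.view(np.uint32).copy()  # copy so the caller's array is not modified
    drop = 23 - keepbits  # number of mantissa bits to discard
    mask = np.uint32((0xFFFFFFFF >> drop) << drop)  # sign, exponent and kept mantissa bits
    half_minus_one = np.uint32((1 << (drop - 1)) - 1)
    # Adding (half - 1) plus the lowest kept bit gives round-to-nearest, ties-to-even
    bits += ((bits >> np.uint32(drop)) & np.uint32(1)) + half_minus_one
    bits &= mask
    return bits.view(np.float32)


temps = np.array([273.15, 288.7, 301.2], dtype=np.float32)
print(bitround(temps, keepbits=9))
```

With `keepbits=9`, round-to-nearest bounds the relative error at 2^-10 (about 0.1%), inside the 0.2% maximum quoted above. Libraries such as numcodecs ship a `BitRound` codec built on the same idea.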

Examples

Brief example usage:

# Example: Mean temperature for a single day
# Dataset: NOAA GFS analysis, hourly
# Opens the dataset lazily and averages 2 m temperature over every grid cell and every hour of 2024-06-01

import xarray as xr

ds = xr.open_zarr("https://data.dynamical.org/noaa/gfs/analysis-hourly/latest.zarr")
# A date string without an hour selects all 24 hourly timesteps of that day.
# This is a simple unweighted mean; grid cells are not weighted by their area.
ds["temperature_2m"].sel(time="2024-06-01").mean().compute()

What this example does: Opens the GFS analysis store lazily, selects 2 m temperature on 2024-06-01, and computes its mean.

Key components: xr.open_zarr to open the store, .sel to select by time label, and .mean().compute() to download the needed chunks and reduce them.

Python notebook example usage:

{
  "cells": [
    {
      "cell_type": "code",
      "execution_count": 1,
      "metadata": {},
      "outputs": [],
      "source": [
        "%pip install xarray[complete]"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "import xarray as xr"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 2,
      "metadata": {},
      "outputs": [],
      "source": [
        "ds = xr.open_zarr(\"https://data.dynamical.org/noaa/gfs/analysis-hourly/latest.zarr\")\n",
        "ds"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 3,
      "metadata": {},
      "outputs": [],
      "source": [
        "ds[\"temperature_2m\"].sel(\n",
        "    time=\"2024-02-01T00:00\",\n",
        "    latitude=slice(70, 0),\n",
        "    longitude=slice(0, 70),\n",
        ").plot()"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 4,
      "metadata": {},
      "outputs": [],
      "source": [
        "(\n",
        "  ds[\"precipitation_surface\"]\n",
        "    .sel(latitude=19.1, longitude=72.9, method=\"nearest\") # Mumbai, India\n",
        "    .sel(time=slice(\"2023-01-01\", \"2024-01-01\"))\n",
        "    .plot()\n",
        ")\n",
        "# Can you spot monsoon season?"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 5,
      "metadata": {},
      "outputs": [],
      "source": [
        "# Run this cell yourself to create an interactive animation\n",
        "from IPython.display import HTML\n",
        "from matplotlib.animation import FuncAnimation\n",
        "import matplotlib.pyplot as plt\n",
        "import numpy as np\n",
        "\n",
        "wind_speed = (\n",
        "    np.sqrt(ds[\"wind_u_10m\"]**2 + ds[\"wind_v_10m\"]**2)\n",
        "    .sel(\n",
        "        time=slice(\"2023-08-28T00\", \"2023-09-01\"),\n",
        "        latitude=slice(45, 18),\n",
        "        longitude=slice(-92, -55)\n",
        "    )\n",
        "    .load()\n",
        ")\n",
        "\n",
        "fig, ax = plt.subplots()\n",
        "ax.set_title(\"Hurricane Idalia, August 2023\")\n",
        "ax.axis(\"off\")\n",
        "\n",
        "img = ax.imshow(wind_speed.isel(time=0), cmap='YlGnBu_r')\n",
        "anim = FuncAnimation(fig=fig, frames=wind_speed, func=lambda frame: img.set_data(frame), interval=60)\n",
        "\n",
        "HTML(anim.to_jshtml())"
      ]
    }
  ],
  "metadata": {
    "kernelspec": {
      "display_name": ".venv",
      "language": "python",
      "name": "python3"
    },
    "language_info": {
      "codemirror_mode": {
        "name": "ipython",
        "version": 3
      },
      "file_extension": ".py",
      "mimetype": "text/x-python",
      "name": "python",
      "nbconvert_exporter": "python",
      "pygments_lexer": "ipython3",
      "version": "3.10.12"
    }
  },
  "nbformat": 4,
  "nbformat_minor": 2
}

NOAA GEFS forecast, 35 day

Dataset url https://data.dynamical.org/noaa/gefs/forecast-35-day/latest.zarr
Spatial domain Global
Spatial resolution 0-240 hours: 0.25 degrees (~20km), 243-840 hours: 0.5 degrees (~40km)
Time domain Forecasts initialized 2020-10-01 00:00:00 UTC to Present
Time resolution Forecasts initialized every 24 hours.
Forecast domain Forecast lead time 0-840 hours (0-35 days) ahead
Forecast resolution Forecast step 0-240 hours: 3 hourly, 243-840 hours: 6 hourly

Description

The Global Ensemble Forecast System (GEFS) is a National Oceanic and Atmospheric Administration (NOAA) National Centers for Environmental Prediction (NCEP) weather forecast model. GEFS creates 31 separate forecasts (ensemble members) to describe the range of forecast uncertainty.

This dataset is an archive of past and present GEFS forecasts. Forecasts are identified by an initialization time (init_time), denoting the start time of the model run, and by the ensemble_member. Forecast steps along the lead_time dimension are 3-hourly through hour 240 and 6-hourly thereafter. This dataset contains only the 00 UTC initialization times, which produce the full-length, 35-day forecast.
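Because each forecast is indexed by init_time and lead_time, the time a value is valid for is their sum. Below is a minimal NumPy sketch of that relationship. It assumes (our assumption, not stated above) that the 6-hourly steps after hour 240 fall at hours 246, 252, ..., 840; the helper names `gefs_lead_times` and `valid_times` are hypothetical.

```python
import numpy as np


def gefs_lead_times():
    """Lead times: 3-hourly to hour 240, then 6-hourly to hour 840 (assumed spacing)."""
    three_hourly = np.arange(0, 241, 3)  # 0, 3, ..., 240
    six_hourly = np.arange(246, 841, 6)  # 246, 252, ..., 840
    return np.concatenate([three_hourly, six_hourly]).astype("timedelta64[h]")


def valid_times(init_time, lead_times):
    """valid_time = init_time + lead_time for a single forecast."""
    return np.datetime64(init_time, "h") + lead_times


leads = gefs_lead_times()
print(len(leads), valid_times("2025-01-01T00", leads)[-1])
```

The notebook examples plot against a `valid_time` coordinate, which we expect holds this same sum.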

Storage for this dataset is generously provided by Source Cooperative, a Radiant Earth initiative.

For LLMs & AI Assistants

Dataset summary: NOAA GEFS forecast, 35 day is a dataset containing 24 variables across 5 dimensions. It covers the globe at 0.25 degrees (~20km) resolution for lead times 0-240 hours and 0.5 degrees (~40km) for lead times 243-840 hours, with forecasts initialized every 24 hours from 2020-10-01 00:00:00 UTC to present.

Key use cases: This dataset is suitable for forecasting and prediction models, data analysis, visualization, and scientific research related to weather and climate.

Access pattern: Use this URL, with an optional email parameter, to access this dataset programmatically: https://data.dynamical.org/noaa/gefs/forecast-35-day/latest.zarr

Details

Interpolation

Source data is available at both 0.25-degree and 0.5-degree resolutions. All variables except the 100m wind components are derived from a 0.25-degree grid for the first 240 hours of each forecast and from a 0.5-degree grid for the remainder. 100m wind components are derived from a 0.5-degree grid for all lead times. Bilinear interpolation is used to convert 0.5-degree data to a 0.25-degree grid. The original 0.5-degree values can be retrieved by selecting every other pixel starting from offset 0 in both the latitude and longitude dimensions (e.g. array[::2, ::2]).
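To see why every other pixel from offset 0 recovers the 0.5-degree values, here is a small NumPy sketch of 2x bilinear upsampling on a regular grid in which every coarse point coincides with a fine point. The helper name `upsample_bilinear_2x` is hypothetical, and it ignores longitude wraparound at the edge of a global grid.

```python
import numpy as np


def upsample_bilinear_2x(coarse):
    """Bilinearly interpolate a 2D grid to twice the resolution (hypothetical helper).

    Output shape is (2*ny - 1, 2*nx - 1): coarse points land on even indices and
    new points are placed halfway between their neighbours.
    """
    coarse = np.asarray(coarse, dtype=np.float64)
    ny, nx = coarse.shape
    fine = np.empty((2 * ny - 1, 2 * nx - 1))
    fine[::2, ::2] = coarse  # original values are kept unchanged
    fine[1::2, ::2] = 0.5 * (coarse[:-1, :] + coarse[1:, :])  # midpoints along latitude
    fine[::2, 1::2] = 0.5 * (coarse[:, :-1] + coarse[:, 1:])  # midpoints along longitude
    # Cell centres are the average of the four surrounding coarse points
    fine[1::2, 1::2] = 0.25 * (coarse[:-1, :-1] + coarse[:-1, 1:] + coarse[1:, :-1] + coarse[1:, 1:])
    return fine


coarse = np.arange(12, dtype=np.float64).reshape(3, 4)  # toy 0.5-degree field
fine = upsample_bilinear_2x(coarse)  # toy 0.25-degree field
print(np.array_equal(fine[::2, ::2], coarse))  # True
```

On the real dataset the equivalent selection would be `da.isel(latitude=slice(None, None, 2), longitude=slice(None, None, 2))`, applied to lead times after hour 240 (or to any lead time for the 100m wind components).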

Compression

The data values in this dataset have been rounded in their binary floating point representation to improve compression. See Klöwer et al. 2021 for more information on this approach. The exact number of rounded bits can be found in our reformatting code.

Examples

Brief example usage:

# Example: Maximum temperature in ensemble forecast
# Dataset: NOAA GEFS forecast, 35 day
# Finds the highest 2 m temperature across all ensemble members and lead times of the 2025-01-01 forecast at 0°N, 0°E

import xarray as xr  # xarray>=2025.1.2 and zarr>=3.0.4 for zarr v3 support

ds = xr.open_zarr("https://data.dynamical.org/noaa/gefs/forecast-35-day/latest.zarr", decode_timedelta=True)
ds["temperature_2m"].sel(init_time="2025-01-01T00", latitude=0, longitude=0).max().compute()

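What this example does: Opens the GEFS 35 day forecast store lazily, selects the forecast initialized at 2025-01-01 00 UTC at latitude 0, longitude 0, and computes the maximum 2 m temperature across all ensemble members and lead times.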

Key components: xr.open_zarr to open the store, .sel to pick an init_time and grid point, and .max().compute() to download the needed chunks and reduce them.

Python notebook example usage:

{
  "cells": [
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "# Quickstart: NOAA GEFS forecast, 35 day - dynamical.org Zarr\n",
        "A brief introduction to the NOAA GEFS forecast dataset transformed into an analysis-ready, cloud-optimized format by dynamical.org.\n",
        "\n",
        "Dataset documentation: https://dynamical.org/catalog/noaa-gefs-forecast-35-day/\n"
      ],
      "outputs": []
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "# If running locally, follow README.md for simple dependency installation.\n",
        "# If using Google Colab, run this cell and then restart the notebook.\n",
        "%pip install \"xarray[complete]>=2025.1.2\" \"zarr>=3.0.4\" requests aiohttp"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 2,
      "metadata": {},
      "outputs": [],
      "source": [
        "import xarray as xr\n",
        "\n",
        "ds = xr.open_zarr(\"https://data.dynamical.org/noaa/gefs/forecast-35-day/latest.zarr\", decode_timedelta=True, chunks=None)\n",
        "ds"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 3,
      "metadata": {},
      "outputs": [],
      "source": [
        "# Plot the ensemble traces of the 2025-01-01 forecast at a point on the earth\n",
        "plot_ds = ds.sel(init_time=\"2025-01-01T00\", latitude=-23.5, longitude=-46.6, method=\"nearest\")  # São Paulo, Brazil\n",
        "_ = plot_ds[\"temperature_2m\"].plot(x=\"valid_time\", hue=\"ensemble_member\", figsize=(12, 8))"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 4,
      "metadata": {},
      "outputs": [],
      "source": [
        "# Plot a summary of the ensemble distribution using quantiles\n",
        "plot_ds = ds.sel(init_time=\"2024-07-01T00\", latitude=0, longitude=0)\n",
        "(\n",
        "    plot_ds[\"temperature_2m\"]\n",
        "    .quantile([0.05, 0.25, 0.5, 0.75, 0.95], dim=\"ensemble_member\")\n",
        "    .plot(x=\"valid_time\", hue=\"quantile\")\n",
        ")"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 5,
      "metadata": {},
      "outputs": [],
      "source": [
        "# The following larger-area examples run faster with dask, which xarray uses by default when chunks=None is omitted\n",
        "ds = xr.open_zarr(\"https://data.dynamical.org/noaa/gefs/forecast-35-day/latest.zarr\", decode_timedelta=True)"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 6,
      "metadata": {},
      "outputs": [],
      "source": [
        "# Calculate a quantile across ensemble members and display the result as a map\n",
        "(\n",
        "    ds[\"temperature_2m\"]\n",
        "    .sel(init_time=\"2025-01-01T00\")\n",
        "    .sel(lead_time=\"7d\")\n",
        "    .sel(latitude=slice(70, 20), longitude=slice(0, 50))\n",
        "    .quantile(0.25, dim=\"ensemble_member\") # 25% chance it gets colder than this\n",
        "    .plot()\n",
        ")"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 7,
      "metadata": {},
      "outputs": [],
      "source": [
        "# Highlight areas of forecast uncertainty in 2 m temperature over the first week of the 2024-03-01 forecast\n",
        "\n",
        "import matplotlib.pyplot as plt\n",
        "\n",
        "plot_ds = (\n",
        "    ds.sel(init_time=\"2024-03-01T00\")\n",
        "    .sel(latitude=slice(-8, -46), longitude=slice(112, 154))  # Australia\n",
        "    .sel(lead_time=slice(\"0h\", \"6d\")).mean(dim=\"lead_time\")  # Average the first week of the forecast\n",
        ")\n",
        "\n",
        "# Standard deviation across ensemble members to highlight regions of forecast uncertainty\n",
        "plot_ds[\"temperature_2m\"].std(dim=\"ensemble_member\").plot()\n",
        "plt.title(f\"Ensemble standard deviation 2 meter temperature [{ds['temperature_2m'].attrs['units']}]\")\n",
        "\n",
        "plt.tight_layout()"
      ]
    }
  ],
  "metadata": {
    "kernelspec": {
      "display_name": ".venv",
      "language": "python",
      "name": "python3"
    },
    "language_info": {
      "codemirror_mode": {
        "name": "ipython",
        "version": 3
      },
      "file_extension": ".py",
      "mimetype": "text/x-python",
      "name": "python",
      "nbconvert_exporter": "python",
      "pygments_lexer": "ipython3",
      "version": "3.12.3"
    }
  },
  "nbformat": 4,
  "nbformat_minor": 2
}

NOAA GEFS analysis

Dataset url https://data.dynamical.org/noaa/gefs/analysis/latest.zarr
Spatial domain Global
Spatial resolution 0.25 degrees (~20km)
Time domain 2000-01-01 00:00:00 UTC to Present
Time resolution 3.0 hours

Description

The Global Ensemble Forecast System (GEFS) is a National Oceanic and Atmospheric Administration (NOAA) National Centers for Environmental Prediction (NCEP) weather forecast model.

This analysis dataset is an archive of the model's best estimate of past weather. It is created by concatenating the first few hours of each historical forecast to provide a dataset with dimensions time, latitude, and longitude.

This dataset is designed to be used in conjunction with the NOAA GEFS forecast, 35 day dataset.

Storage for this dataset is generously provided by Source Cooperative, a Radiant Earth initiative.

For LLMs & AI Assistants

Dataset summary: NOAA GEFS analysis is a dataset containing 21 variables across 3 dimensions, covering the globe at 0.25 degrees (~20km) resolution from 2000-01-01 00:00:00 UTC to present at 3 hour resolution.

Key use cases: This dataset is suitable for data analysis, visualization, and scientific research related to weather and climate.

Access pattern: Use this URL, with an optional email parameter, to access this dataset programmatically: https://data.dynamical.org/noaa/gefs/analysis/latest.zarr

Details

Sources

To provide the longest possible historical record, this dataset is constructed from three distinct GEFS forecast archives.

Variable availability

Data is available for all variables at all times with the following exceptions.

Construction

To create a single time dimension, we concatenate the first few hours of each forecast. From 2000-01-01 to 2019-12-31, reforecasts are available once per day, and this dataset uses the first 21 or 24 hours of each forecast. From 2020-01-01 to present, forecasts are available every 6 hours, and this dataset uses the first 3 or 6 hours of each forecast. Variables with an instantaneous step_type use the shortest possible lead times (e.g. 0 and 3 hours), while accumulated variables must use one additional forecast step (e.g. 3 and 6 hours) because they do not have an hour 0 forecast value.
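The lead-time rule above can be sketched as a small function that maps an analysis timestamp to the forecast step it comes from. This is a hypothetical helper, not dynamical.org's code. It assumes initializations aligned to 00 UTC (the once-daily reforecast init hour is our assumption), a 3-hour source step throughout, and it ignores both the 2020-01-01 to 2020-09-23 period, whose 6-hourly source data is interpolated in time, and the handoff at 2020-01-01.

```python
from datetime import datetime, timedelta

REFERENCE = datetime(2000, 1, 1)  # any 00 UTC instant works as a reference (assumed alignment)
STEP = timedelta(hours=3)  # assumed source forecast step


def source_step(valid_time, accumulated, init_interval_hours):
    """Return (init_time, lead_time) of the forecast step used for valid_time.

    Hypothetical helper. init_interval_hours is 24 for the 2000-2019
    reforecasts and 6 for 2020 onward.
    """
    interval = timedelta(hours=init_interval_hours)
    offset = (valid_time - REFERENCE) % interval  # time since the latest initialization
    if accumulated:
        # No hour-0 value exists, so use leads STEP..interval instead of 0..interval-STEP
        lead = (offset - STEP) % interval + STEP
    else:
        lead = offset
    return valid_time - lead, lead


print(source_step(datetime(2021, 5, 1, 6), accumulated=True, init_interval_hours=6))
```

For an accumulated variable at 06 UTC this picks the 00 UTC run at lead 6 hours rather than the 06 UTC run at lead 0, which is the "one additional forecast step" described above.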

Interpolation

For most of the archive's time range, the source data is available at 0.25-degree resolution with a 3-hourly time step, and we perform no interpolation. There are two exceptions: 1) from 2020-01-01 to 2020-09-23, the source data has a 1.0-degree spatial resolution and a 6-hourly time step; 2) from 2020-09-23 to present, the 100m wind components have a 0.5-degree spatial resolution in the source data. To provide a consistent archive in these two cases, we first perform bilinear interpolation in space to 0.25-degree resolution, followed, if necessary, by linear interpolation in time to a 3-hourly time step. The original, uninterpolated data can be obtained by selecting latitudes and longitudes evenly divisible by the source resolution (1 degree in case 1, 0.5 degrees in case 2) and, in case 1, time steps whose hour is divisible by 6.
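The selection rule above can be expressed as boolean masks over the coordinate values. A minimal NumPy sketch follows; the helper names `on_source_grid` and `on_six_hourly_step` are hypothetical, and the toy coordinates stand in for the dataset's latitude and time coordinates.

```python
import numpy as np


def on_source_grid(coords, source_resolution):
    """True where a coordinate is an integer multiple of source_resolution (degrees)."""
    ratio = np.asarray(coords, dtype=np.float64) / source_resolution
    # isclose guards against floating point noise in stored coordinates
    return np.isclose(ratio, np.round(ratio))


def on_six_hourly_step(times):
    """True where a timestamp's hour of day is divisible by 6."""
    times = np.asarray(times, dtype="datetime64[h]")
    hours = (times - times.astype("datetime64[D]")).astype(np.int64)
    return hours % 6 == 0


latitudes = np.arange(90, -90.25, -0.25)  # 0.25-degree latitudes, 90 to -90
times = np.arange(np.datetime64("2020-03-01T00"), np.datetime64("2020-03-02T00"), np.timedelta64(3, "h"))
print(latitudes[on_source_grid(latitudes, 1.0)].size)  # 181
print(times[on_six_hourly_step(times)])
```

Against the real dataset these masks could be passed to xarray, for example `ds.sel(latitude=ds.latitude[on_source_grid(ds.latitude, 1.0)])`; we have not run that selection against the live store.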

Compression

The data values in this dataset have been rounded in their binary floating point representation to improve compression. See Klöwer et al. 2021 for more information on this approach. The exact number of rounded bits can be found in our reformatting code.

Examples

Brief example usage:

# Example: Temperature at a specific place and time
# Dataset: NOAA GEFS analysis
# Reads the 2 m temperature at 0°N, 0°E for 2025-01-01 00 UTC

import xarray as xr  # xarray>=2025.1.2 and zarr>=3.0.4 for zarr v3 support

ds = xr.open_zarr("https://data.dynamical.org/noaa/gefs/analysis/latest.zarr")
ds["temperature_2m"].sel(time="2025-01-01T00", latitude=0, longitude=0).compute()

What this example does: Opens the GEFS analysis store lazily and reads the 2 m temperature at latitude 0, longitude 0 for 2025-01-01 00 UTC.

Key components: xr.open_zarr to open the store, .sel to pick a time and grid point, and .compute() to download the needed chunk.

Python notebook example usage:

{
  "cells": [
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "# Quickstart: NOAA GEFS analysis - dynamical.org Zarr\n",
        "A brief introduction to the NOAA GEFS analysis dataset transformed into an analysis-ready, cloud-optimized format by dynamical.org.\n",
        "\n",
        "Dataset documentation: https://dynamical.org/catalog/noaa-gefs-analysis/"
      ],
      "outputs": []
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "# If running locally, follow README.md for simple dependency installation.\n",
        "# If using Google Colab, run this cell and then restart the notebook.\n",
        "%pip install \"xarray[complete]>=2025.1.2\" \"zarr>=3.0.4\" requests aiohttp"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 2,
      "metadata": {},
      "outputs": [],
      "source": [
        "import xarray as xr\n",
        "\n",
        "ds = xr.open_zarr(\"https://data.dynamical.org/noaa/gefs/analysis/latest.zarr\", chunks=None)\n",
        "ds"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 3,
      "metadata": {},
      "outputs": [],
      "source": [
        "# Precipitation over time in Ulaanbaatar, Mongolia\n",
        "ds[\"precipitation_surface\"].sel(latitude=47.9, longitude=106.9, method=\"nearest\").plot()"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 4,
      "metadata": {},
      "outputs": [],
      "source": [
        "# Map of temperature across Africa at the earliest available time in the dataset\n",
        "ds[\"temperature_2m\"].sel(\n",
        "    time=\"2000-01-01T03:00\",\n",
        "    latitude=slice(50, -30),\n",
        "    longitude=slice(-30, 60),\n",
        ").plot()"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 5,
      "metadata": {},
      "outputs": [],
      "source": [
        "import pandas as pd\n",
        "\n",
        "# Select multiple locations and aggregate precipitation to 10 day averages\n",
        "\n",
        "cities = pd.DataFrame([\n",
        "    {\"city\": \"Mumbai\",    \"latitude\":  19.07, \"longitude\":  72.87},\n",
        "    {\"city\": \"Sao Paulo\", \"latitude\": -23.55, \"longitude\": -46.63},\n",
        "    {\"city\": \"Sydney\",    \"latitude\": -33.87, \"longitude\": 151.21},\n",
        "]).set_index(\"city\").to_xarray()\n",
        "\n",
        "(\n",
        "  ds[\"precipitation_surface\"]\n",
        "    .sel(time=slice(\"2015\", \"2025\"))\n",
        "    .sel(latitude=cities.latitude, longitude=cities.longitude, method=\"nearest\")\n",
        "    .resample(time=\"10d\").mean()\n",
        "    .plot(hue=\"city\", size=6, aspect=3)\n",
        ")"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 6,
      "metadata": {},
      "outputs": [],
      "source": [
        "from IPython.display import HTML\n",
        "from matplotlib.animation import FuncAnimation\n",
        "import matplotlib.pyplot as plt\n",
        "\n",
        "# An interactive animation of precipitable water in the atmosphere during Typhoon Mawar\n",
        "\n",
        "data = (\n",
        "    ds[\"precipitable_water_atmosphere\"]\n",
        "    .sel( # Typhoon Mawar\n",
        "        time=slice(\"2023-05-24T00\", \"2023-06-05T00\"),\n",
        "        latitude=slice(50, -10),  \n",
        "        longitude=slice(105, 180), \n",
        "    ).load()\n",
        ")\n",
        "\n",
        "dpi=200\n",
        "fig, ax = plt.subplots(figsize=(data.longitude.size/dpi, data.latitude.size/dpi), dpi=dpi)\n",
        "fig.subplots_adjust(left=0, right=1, top=1, bottom=0)\n",
        "ax.axis(\"off\")\n",
        "\n",
        "img = ax.imshow(data.isel(time=0), cmap='magma', vmin=0, vmax=75)\n",
        "anim = FuncAnimation(fig=fig, frames=data, func=lambda frame: img.set_data(frame), interval=80)\n",
        "\n",
        "HTML(anim.to_jshtml())"
      ]
    }
  ],
  "metadata": {
    "kernelspec": {
      "display_name": ".venv",
      "language": "python",
      "name": "python3"
    },
    "language_info": {
      "codemirror_mode": {
        "name": "ipython",
        "version": 3
      },
      "file_extension": ".py",
      "mimetype": "text/x-python",
      "name": "python",
      "nbconvert_exporter": "python",
      "pygments_lexer": "ipython3",
      "version": "3.12.6"
    }
  },
  "nbformat": 4,
  "nbformat_minor": 2
}