Visualize zarr

Demonstrates reading a zarr store from the VEDA STAC catalog using xarray and visualizing the data using hvplot
Author

Slesa Adhikari, Julia Signell

Published

March 9, 2023

Run this notebook

You can launch this notebook in VEDA JupyterHub by clicking the link below.

Launch in VEDA JupyterHub (requires access)

Learn more

Inside the Hub

This notebook was written on the VEDA JupyterHub and as such is designed to be run on a jupyterhub which is associated with an AWS IAM role which has been granted permissions to the VEDA data store via its bucket policy. The instance used provided 16GB of RAM.

See (VEDA Analytics JupyterHub Access)[https://nasa-impact.github.io/veda-docs/veda-jh-access.html] for information about how to gain access.

Outside the Hub

The data is in a protected bucket. Please request access by emailing aimee@developmentseed.org or alexandra@developmentseed.org and providing your affiliation, interest in or expected use of the dataset and an AWS IAM role or user Amazon Resource Name (ARN). The team will help you configure the cognito client.

You should then run:

%run -i 'cognito_login.py'

Approach

  1. Use pystac to open a STAC collection
  2. Use xarray and dask to lazily read in the data
  3. Plot the data using hvplot

About the data

This is the Gridded Daily OCO-2 Carbon Dioxide assimilated dataset. More information can be found at: OCO-2 GEOS Level 3 daily, 0.5x0.625 assimilated CO2 V10r (OCO2_GEOS_L3CO2_DAY)

The data has been converted to zarr format and published to the development version of the VEDA STAC Catalog.

import pystac
import xarray as xr

import hvplot.xarray  # noqa

Declare your collection of interest

You can discover available collections the following ways:

  • Programmatically: see example in the list-collections.ipynb notebook
  • JSON API: https://openveda.cloud/api/stac/collections
  • STAC Browser: https://openveda.cloud
STAC_API_URL = "https://openveda.cloud/api/stac"
collection_id = "oco2-geos-l3-daily"

Get STAC collection

Use pystac to access the STAC collection.

collection = pystac.Collection.from_file(f"{STAC_API_URL}/collections/{collection_id}")
collection

We can see that there is one zarr asset:

collection.get_assets(media_type="application/vnd+zarr")
{'zarr': <Asset href=s3://veda-data-store/oco2-geos-l3-daily/OCO2_GEOS_L3CO2_day.zarr>}

Read from zarr to xarray

With the url pointing to the Zarr store, you can create an xarray dataset backed by a dask array.

url = collection.assets["zarr"].href

ds = xr.open_dataset(url, engine="zarr", chunks="auto")
ds
<xarray.Dataset> Size: 8GB
Dimensions:   (time: 2500, lat: 361, lon: 576)
Coordinates:
  * lat       (lat) float64 3kB -90.0 -89.5 -89.0 -88.5 ... 88.5 89.0 89.5 90.0
  * lon       (lon) float64 5kB -180.0 -179.4 -178.8 ... 178.1 178.8 179.4
  * time      (time) datetime64[ns] 20kB 2015-01-01T12:00:00 ... 2021-11-04T1...
Data variables:
    XCO2      (time, lat, lon) float64 4GB dask.array<chunksize=(200, 200, 200), meta=np.ndarray>
    XCO2PREC  (time, lat, lon) float64 4GB dask.array<chunksize=(200, 200, 200), meta=np.ndarray>
Attributes: (12/25)
    BuildId:                        B10.2.06
    Contact:                        Brad Weir (brad.weir@nasa.gov)
    Conventions:                    CF-1
    DataResolution:                 0.5x0.625
    EastBoundingCoordinate:         179.375
    Format:                         NetCDF-4/HDF-5
    ...                             ...
    ShortName:                      OCO2_GEOS_L3CO2_DAY_10r
    SouthBoundingCoordinate:        -90.0
    SpatialCoverage:                global
    Title:                          OCO-2 GEOS Level 3 daily, 0.5x0.625 assim...
    VersionID:                      V10r
    WestBoundingCoordinate:         -180.0

In xarray you can inspect just one data variable using dot notation:

ds.XCO2
<xarray.DataArray 'XCO2' (time: 2500, lat: 361, lon: 576)> Size: 4GB
dask.array<open_dataset-XCO2, shape=(2500, 361, 576), dtype=float64, chunksize=(200, 200, 200), chunktype=numpy.ndarray>
Coordinates:
  * lat      (lat) float64 3kB -90.0 -89.5 -89.0 -88.5 ... 88.5 89.0 89.5 90.0
  * lon      (lon) float64 5kB -180.0 -179.4 -178.8 -178.1 ... 178.1 178.8 179.4
  * time     (time) datetime64[ns] 20kB 2015-01-01T12:00:00 ... 2021-11-04T12...
Attributes:
    long_name:  Assimilated dry-air column average CO2 daily mean
    units:      mol CO2/mol dry

Plot data

We can plot the XCO2 variable as an interactive map (with date slider) using hvplot.

ds.XCO2.hvplot(
    x="lon",
    y="lat",
    groupby="time",
    coastline=True,
    rasterize=True,
    aggregator="mean",
    widget_location="bottom",
    frame_width=600,
)

The time slider will only work when running in the notebook. When rendered on a static website the slider has no impact.