Visualize zarr

Demonstrates reading a zarr store from the VEDA STAC catalog using intake and visualizing the data using hvplot.
Author

Slesa Adhikari, Julia Signell

Published

March 9, 2023

Run this notebook

You can launch this notebook in VEDA JupyterHub by clicking the link below.

Launch in VEDA JupyterHub (requires access)

Learn more

Inside the Hub

This notebook was written on the VEDA JupyterHub and is designed to run on a JupyterHub associated with an AWS IAM role that has been granted access to the VEDA data store through its bucket policy. The instance used provided 16 GB of RAM.

See VEDA Analytics JupyterHub Access (https://nasa-impact.github.io/veda-docs/veda-jh-access.html) for information about how to gain access.

Outside the Hub

The data is in a protected bucket. Please request access by emailing aimee@developmentseed.org or alexandra@developmentseed.org and providing your affiliation, your interest in or expected use of the dataset, and an AWS IAM role or user Amazon Resource Name (ARN). The team will help you configure the Cognito client.

You should then run:

%run -i 'cognito_login.py'

Approach

  1. Use intake to open a STAC collection with xarray and dask
  2. Plot the data using hvplot

About the data

This is the Gridded Daily OCO-2 Carbon Dioxide assimilated dataset. More information can be found at: OCO-2 GEOS Level 3 daily, 0.5x0.625 assimilated CO2 V10r (OCO2_GEOS_L3CO2_DAY)

The data has been converted to zarr format and published to the development version of the VEDA STAC Catalog.

import intake
import hvplot.xarray  # noqa

Declare your collection of interest

You can discover available collections in the following ways:

  • Programmatically: see example in the list-collections.ipynb notebook
  • JSON API: https://staging-stac.delta-backend.com/collections
  • STAC Browser: http://veda-staging-stac-browser.s3-website-us-west-2.amazonaws.com

STAC_API_URL = "https://openveda.cloud/api/stac"
collection_id = "oco2-geos-l3-daily"
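
As a rough sketch of the programmatic route, the JSON API can be queried with the standard library alone. This assumes the endpoint returns the usual STAC API `/collections` response shape (a `collections` array whose entries each carry an `id`). The helper names `extract_collection_ids` and `list_collection_ids` are illustrative and are not part of this notebook or of intake.

```python
import json
from urllib.request import urlopen


def extract_collection_ids(payload: dict) -> list[str]:
    """Return the sorted collection ids found in a STAC `/collections` response."""
    return sorted(c["id"] for c in payload.get("collections", []))


def list_collection_ids(stac_api_url: str, timeout: float = 30.0) -> list[str]:
    """Fetch `<stac_api_url>/collections` and return the ids on the first page.

    Assumption: the server may paginate; following `next` links is omitted here.
    """
    with urlopen(f"{stac_api_url}/collections", timeout=timeout) as response:
        return extract_collection_ids(json.load(response))


# Offline example using a hand-written payload in the assumed response shape.
sample = {"collections": [{"id": "oco2-geos-l3-daily"}, {"id": "example-collection"}]}
print(extract_collection_ids(sample))
```

Calling `list_collection_ids(STAC_API_URL)` would issue the actual request. Whether a given id is present depends on which catalog the URL points at.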

Get STAC collection

Use intake to get the entire STAC collection.

collection = intake.open_stac_collection(f"{STAC_API_URL}/collections/{collection_id}")
collection
oco2-geos-l3-daily:
  args:
    stac_obj: https://openveda.cloud/api/stac/collections/oco2-geos-l3-daily
  description: ''
  driver: intake_stac.catalog.StacCollection
  metadata:
    assets:
      zarr:
        href: s3://veda-data-store/oco2-geos-l3-daily/OCO2_GEOS_L3CO2_day.zarr
        roles:
        - data
        title: zarr
        type: application/vnd+zarr
    cube:dimensions:
      lat:
        axis: y
        description: latitude
        extent:
        - -90.0
        - 90.0
        reference_system: 4326
        type: spatial
      lon:
        axis: x
        description: longitude
        extent:
        - -180.0
        - 179.375
        reference_system: 4326
        type: spatial
      time:
        description: time
        extent:
        - '2015-01-01T12:00:00Z'
        - '2021-11-04T12:00:00Z'
        step: P1DT0H0M0S
        type: temporal
    cube:variables:
      XCO2:
        attrs:
          long_name: Assimilated dry-air column average CO2 daily mean
          units: mol CO2/mol dry
        chunks:
        - 100
        - 100
        - 100
        description: Assimilated dry-air column average CO2 daily mean
        dimensions:
        - time
        - lat
        - lon
        shape:
        - 2500
        - 361
        - 576
        type: data
        unit: mol CO2/mol dry
      XCO2PREC:
        attrs:
          long_name: Precision of dry-air column average CO2 daily mean from Desroziers
            et al. (2005) diagnostic
          units: mol CO2/mol dry
        chunks:
        - 100
        - 100
        - 100
        description: Precision of dry-air column average CO2 daily mean from Desroziers
          et al. (2005) diagnostic
        dimensions:
        - time
        - lat
        - lon
        shape:
        - 2500
        - 361
        - 576
        type: data
        unit: mol CO2/mol dry
    dashboard:is_periodic: true
    dashboard:time_density: day
    description: "The OCO-2 mission provides the highest quality space-based XCO2\
      \ retrievals to date. However, the instrument data are characterized by large\
      \ gaps in coverage due to OCO-2\u2019s narrow 10-km ground track and an inability\
      \ to see through clouds and thick aerosols. This global gridded dataset is produced\
      \ using a data assimilation technique commonly referred to as state estimation\
      \ within the geophysical literature. Data assimilation synthesizes simulations\
      \ and observations, adjusting the state of atmospheric constituents like CO2\
      \ to reflect observed values, thus gap-filling observations when and where they\
      \ are unavailable based on previous observations and short transport simulations\
      \ by GEOS. Compared to other methods, data assimilation has the advantage that\
      \ it makes estimates based on our collective scientific understanding, notably\
      \ of the Earth's carbon cycle and atmospheric transport. OCO-2 GEOS (Goddard\
      \ Earth Observing System) Level 3 data are produced by ingesting OCO-2 L2 retrievals\
      \ every 6 hours with GEOS CoDAS, a modeling and data assimilation system maintained\
      \ by NASA's Global Modeling and Assimilation Office (GMAO). GEOS CoDAS uses\
      \ a high-performance computing implementation of the Gridpoint Statistical Interpolation\
      \ approach for solving the state estimation problem. GSI finds the analyzed\
      \ state that minimizes the three-dimensional variational (3D-Var) cost function\
      \ formulation of the state estimation problem."
    extent:
      spatial:
        bbox:
        - - -180.0
          - -90.0
          - 180.0
          - 90.0
      temporal:
        interval:
        - - null
          - null
    id: oco2-geos-l3-daily
    license: CC0-1.0
    providers:
    - name: NASA VEDA
      roles:
      - host
      url: https://www.earthdata.nasa.gov/dashboard/
    stac_extensions:
    - https://stac-extensions.github.io/datacube/v2.2.0/schema.json
    stac_version: 1.0.0
    title: Gridded Daily OCO-2 Carbon Dioxide assimilated dataset
    type: Collection
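
The `cube:variables` metadata above is enough to estimate how much data each variable holds before opening it. The following is a small sketch of that arithmetic, assuming 8-byte float64 values (the dtype is not stated in the STAC metadata itself, so treat it as an assumption at this point).

```python
import math

# Values copied from the cube:variables entry for XCO2 above.
shape = (2500, 361, 576)   # (time, lat, lon)
chunks = (100, 100, 100)   # chunk length along each dimension
itemsize = 8               # bytes per value, assuming float64

n_values = math.prod(shape)
total_bytes = n_values * itemsize

# Chunks per dimension round up, because edge chunks may be partial.
chunks_per_dim = tuple(math.ceil(s / c) for s, c in zip(shape, chunks))
n_chunks = math.prod(chunks_per_dim)
full_chunk_bytes = math.prod(chunks) * itemsize

print(f"values per variable: {n_values:,}")
print(f"uncompressed size:   {total_bytes / 1e9:.2f} GB")
print(f"chunk grid:          {chunks_per_dim} -> {n_chunks} chunks")
print(f"full chunk size:     {full_chunk_bytes / 1e6:.0f} MB")
```

Under these assumptions each variable is roughly 4.16 GB uncompressed, split into a 25 x 4 x 6 grid of 600 chunks of at most 8 MB each. Two variables of that size would not fit comfortably in 16 GB of RAM alongside intermediate results, which is why loading lazily with dask is useful here.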

Read from zarr to xarray

Intake lets you go straight from the asset to an xarray dataset backed by a dask array.

source = collection.get_asset("zarr")

ds = source.to_dask()
ds
/srv/conda/envs/notebook/lib/python3.11/site-packages/intake_xarray/base.py:21: FutureWarning: The return type of `Dataset.dims` will be changed to return a set of dimension names in future, in order to be more consistent with `DataArray.dims`. To access a mapping from dimension names to lengths, please use `Dataset.sizes`.
  'dims': dict(self._ds.dims),
<xarray.Dataset> Size: 8GB
Dimensions:   (time: 2500, lat: 361, lon: 576)
Coordinates:
  * lat       (lat) float64 3kB -90.0 -89.5 -89.0 -88.5 ... 88.5 89.0 89.5 90.0
  * lon       (lon) float64 5kB -180.0 -179.4 -178.8 ... 178.1 178.8 179.4
  * time      (time) datetime64[ns] 20kB 2015-01-01T12:00:00 ... 2021-11-04T1...
Data variables:
    XCO2      (time, lat, lon) float64 4GB dask.array<chunksize=(100, 100, 100), meta=np.ndarray>
    XCO2PREC  (time, lat, lon) float64 4GB dask.array<chunksize=(100, 100, 100), meta=np.ndarray>
Attributes: (12/25)
    BuildId:                        B10.2.06
    Contact:                        Brad Weir (brad.weir@nasa.gov)
    Conventions:                    CF-1
    DataResolution:                 0.5x0.625
    EastBoundingCoordinate:         179.375
    Format:                         NetCDF-4/HDF-5
    ...                             ...
    ShortName:                      OCO2_GEOS_L3CO2_DAY_10r
    SouthBoundingCoordinate:        -90.0
    SpatialCoverage:                global
    Title:                          OCO-2 GEOS Level 3 daily, 0.5x0.625 assim...
    VersionID:                      V10r
    WestBoundingCoordinate:         -180.0
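
The dimension lengths reported above can be cross-checked against the coordinate extents and the `DataResolution` attribute. The following is a short sketch of that arithmetic, assuming the grid is regular (0.5 degrees in latitude, 0.625 degrees in longitude, one step per day) with both endpoints included. The helper `n_points` is illustrative only.

```python
from datetime import datetime, timedelta


def n_points(start: float, stop: float, step: float) -> int:
    """Number of points on a regular grid that includes both endpoints."""
    return round((stop - start) / step) + 1


n_lat = n_points(-90.0, 90.0, 0.5)
n_lon = n_points(-180.0, 179.375, 0.625)

# Daily time axis between the first and last timestamps shown above.
t_start = datetime(2015, 1, 1, 12)
t_stop = datetime(2021, 11, 4, 12)
n_time = (t_stop - t_start) // timedelta(days=1) + 1

print(n_time, n_lat, n_lon)
```

The three counts come out to 2500, 361 and 576, matching the `Dimensions` line in the dataset summary.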

In xarray you can inspect just one data variable using dot notation:

ds.XCO2
<xarray.DataArray 'XCO2' (time: 2500, lat: 361, lon: 576)> Size: 4GB
dask.array<open_dataset-XCO2, shape=(2500, 361, 576), dtype=float64, chunksize=(100, 100, 100), chunktype=numpy.ndarray>
Coordinates:
  * lat      (lat) float64 3kB -90.0 -89.5 -89.0 -88.5 ... 88.5 89.0 89.5 90.0
  * lon      (lon) float64 5kB -180.0 -179.4 -178.8 -178.1 ... 178.1 178.8 179.4
  * time     (time) datetime64[ns] 20kB 2015-01-01T12:00:00 ... 2021-11-04T12...
Attributes:
    long_name:  Assimilated dry-air column average CO2 daily mean
    units:      mol CO2/mol dry

Plot data

We can plot the XCO2 variable as an interactive map with a date slider using hvplot.

ds.XCO2.hvplot(
    x="lon",
    y="lat",
    groupby="time",
    coastline=True,
    rasterize=True,
    aggregator="mean",
    widget_location="bottom",
    frame_width=600,
)