Calculate timeseries from COGs

Demonstrates how to aggregate COGs using pystac_client, rioxarray, and stackstac

Author

Aimee Barciauskas, Julia Signell

Published

February 1, 2023

Run this notebook

You can launch this notebook in VEDA JupyterHub by clicking the link below.

Launch in VEDA JupyterHub (requires access)

Learn more

Inside the Hub

This notebook was written on the VEDA JupyterHub and as such is designed to be run on a jupyterhub which is associated with an AWS IAM role which has been granted permissions to the VEDA data store via its bucket policy. The instance used provided 16GB of RAM.

See (VEDA Analytics JupyterHub Access)[https://nasa-impact.github.io/veda-docs/veda-jh-access.html] for information about how to gain access.

Outside the Hub

The data is in a protected bucket. Please request access by emailng aimee@developmentseed.org or alexandra@developmentseed.org and providing your affiliation, interest in or expected use of the dataset and an AWS IAM role or user Amazon Resource Name (ARN). The team will help you configure the cognito client.

You should then run:

%run -i 'cognito_login.py'

Approach

Use pystac_client to open the STAC catalog and retrieve the items in the collection
Use stackstac to create an xarray dataset containing all the items cropped to AOI
Calculate the mean for each timestep over the AOI

from pystac_client import Client
import pandas as pd
import stackstac

import rioxarray  # noqa
import hvplot.xarray  # noqa

Declare your collection of interest

You can discover available collections the following ways:

Programmatically: see example in the list-collections.ipynb notebook
JSON API: https://openveda.cloud/api/stac/collections
STAC Browser: https://openveda.cloud

STAC_API_URL = "https://openveda.cloud/api/stac/"
collection = "no2-monthly"

Discover items in collection for region and time of interest

Use pystac_client to search the STAC collection for a particular area of interest within specified datetime bounds.

china_bbox = [
    73.675,
    18.198,
    135.026,
    53.459,
]
datetime = "2000-01-01/2025-07-25"

catalog = Client.open(STAC_API_URL)
search = catalog.search(
    bbox=china_bbox, datetime=datetime, collections=[collection], limit=1000
)
item_collection = search.item_collection()
print(f"Found {len(item_collection)} items")

Found 93 items

Read data

Read in data using xarray using a combination of xpystac, stackstac, and rasterio.

da = stackstac.stack(item_collection, epsg=4326)
da = da.assign_coords({"time": pd.to_datetime(da.start_datetime)}).squeeze()
da

<xarray.DataArray 'stackstac-707c2fec3a4bde6be0d838dc97f60ec1' (time: 93,
                                                                y: 1800, x: 3600)> Size: 5GB
dask.array<getitem, shape=(93, 1800, 3600), dtype=float64, chunksize=(1, 1024, 1024), chunktype=numpy.ndarray>
Coordinates: (12/17)
    id              (time) <U37 14kB 'OMI_trno2_0.10x0.10_202309_Col3_V4.nc' ...
    band            <U11 44B 'cog_default'
  * x               (x) float64 29kB -180.0 -179.9 -179.8 ... 179.7 179.8 179.9
  * y               (y) float64 14kB 90.0 89.9 89.8 89.7 ... -89.7 -89.8 -89.9
    start_datetime  (time) <U25 9kB '2023-09-01T00:00:00+00:00' ... '2016-01-...
    end_datetime    (time) <U25 9kB '2023-09-30T00:00:00+00:00' ... '2016-01-...
    ...              ...
    proj:code       <U9 36B 'EPSG:4326'
    title           <U17 68B 'Default COG Layer'
    proj:shape      object 8B {1800, 3600}
    description     <U47 188B 'Cloud optimized default layer to display on map'
    epsg            int64 8B 4326
  * time            (time) datetime64[ns, UTC] 744B 2023-09-01T00:00:00+00:00...
Attributes:
    spec:        RasterSpec(epsg=4326, bounds=(-180.0, -90.0, 180.0, 90.0), r...
    crs:         epsg:4326
    transform:   | 0.10, 0.00,-180.00|\n| 0.00,-0.10, 90.00|\n| 0.00, 0.00, 1...
    resolution:  0.1

Clip the data to the bounding box for China

# Subset to Bounding Box for China
subset = da.rio.clip_box(*china_bbox)
subset

<xarray.DataArray 'stackstac-707c2fec3a4bde6be0d838dc97f60ec1' (time: 93,
                                                                y: 354, x: 614)> Size: 162MB
dask.array<getitem, shape=(93, 354, 614), dtype=float64, chunksize=(1, 354, 535), chunktype=numpy.ndarray>
Coordinates: (12/18)
    id              (time) <U37 14kB 'OMI_trno2_0.10x0.10_202309_Col3_V4.nc' ...
    band            <U11 44B 'cog_default'
  * x               (x) float64 5kB 73.7 73.8 73.9 74.0 ... 134.8 134.9 135.0
  * y               (y) float64 3kB 53.5 53.4 53.3 53.2 ... 18.5 18.4 18.3 18.2
    start_datetime  (time) <U25 9kB '2023-09-01T00:00:00+00:00' ... '2016-01-...
    end_datetime    (time) <U25 9kB '2023-09-30T00:00:00+00:00' ... '2016-01-...
    ...              ...
    title           <U17 68B 'Default COG Layer'
    proj:shape      object 8B {1800, 3600}
    description     <U47 188B 'Cloud optimized default layer to display on map'
    epsg            int64 8B 4326
  * time            (time) datetime64[ns, UTC] 744B 2023-09-01T00:00:00+00:00...
    spatial_ref     int64 8B 0
Attributes:
    spec:        RasterSpec(epsg=4326, bounds=(-180.0, -90.0, 180.0, 90.0), r...
    resolution:  0.1

Aggregate the data

Calculate the mean at each time across regional data. Note this is the first time that the data is actually loaded.

means = subset.mean(dim=("x", "y")).compute()

Plot the mean monthly NO2 using hvplot

means.hvplot.line(x="time", ylabel="NO2", title="Mean Monthly NO2 in China")