# Calculate timeseries from COGs

Using `pystac_client`, `rioxarray`, and `stackstac`.

```python
from pystac_client import Client
import pandas as pd
import stackstac
import rioxarray  # noqa
import hvplot.xarray  # noqa
```
## Run this notebook

You can launch this notebook in the VEDA JupyterHub by clicking the link below.

Launch in VEDA JupyterHub (requires access)

### Learn more

#### Inside the Hub
This notebook was written on the VEDA JupyterHub and is designed to be run on a JupyterHub associated with an AWS IAM role that has been granted permission to read the VEDA data store via its bucket policy. The instance used provided 16 GB of RAM.
See [VEDA Analytics JupyterHub Access](https://nasa-impact.github.io/veda-docs/veda-jh-access.html) for information about how to gain access.
#### Outside the Hub
The data is in a protected bucket. Please request access by emailing aimee@developmentseed.org or alexandra@developmentseed.org with your affiliation, your interest in or expected use of the dataset, and an AWS IAM role or user Amazon Resource Name (ARN). The team will help you configure the Cognito client.

You should then run:

```python
%run -i 'cognito_login.py'
```
## Approach

- Use `pystac_client` to open the STAC catalog and retrieve the items in the collection
- Use `stackstac` to create an `xarray` dataset containing all the items cropped to the AOI
- Calculate the mean for each timestep over the AOI
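The reduction in the last step can be illustrated on a toy array; this is a NumPy stand-in for the dask-backed `xarray` object built below, with made-up shapes and values:

```python
import numpy as np

# Toy (time, y, x) cube: 2 timesteps over a 3 x 4 pixel grid (made-up numbers).
cube = np.arange(2 * 3 * 4, dtype=float).reshape(2, 3, 4)

# Averaging over the two spatial axes leaves one value per timestep,
# which is the shape of the timeseries computed later in this notebook.
timeseries = cube.mean(axis=(1, 2))
print(timeseries)  # [ 5.5 17.5]
```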
## Declare your collection of interest

You can discover available collections the following ways:

- Programmatically: see the example in the `list-collections.ipynb` notebook
- JSON API: https://staging-stac.delta-backend.com/collections
- STAC Browser: http://veda-staging-stac-browser.s3-website-us-west-2.amazonaws.com
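Programmatic discovery can be wrapped in a small helper. This is a sketch that assumes network access to a STAC API when called; the import is deferred so the function itself stays self-contained:

```python
def list_collection_ids(stac_api_url):
    """Return the sorted ids of all collections exposed by a STAC API."""
    from pystac_client import Client  # deferred: only needed when called

    catalog = Client.open(stac_api_url)
    return sorted(c.id for c in catalog.get_collections())

# Requires network access, e.g.:
# list_collection_ids("https://openveda.cloud/api/stac/")
```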
= "https://openveda.cloud/api/stac/"
STAC_API_URL = "no2-monthly" collection
## Discover items in the collection for a region and time of interest

Use `pystac_client` to search the STAC collection for a particular area of interest within specified datetime bounds.
```python
china_bbox = [
    73.675,
    18.198,
    135.026,
    53.459,
]
datetime = "2000-01-01/2022-01-02"
```
```python
catalog = Client.open(STAC_API_URL)
search = catalog.search(
    bbox=china_bbox, datetime=datetime, collections=[collection], limit=1000
)
item_collection = search.item_collection()
print(f"Found {len(item_collection)} items")
```

```
Found 73 items
```
## Read data

Read in the data with `xarray`, using a combination of `xpystac`, `stackstac`, and `rasterio`.
```python
da = stackstac.stack(item_collection, epsg=4326)
da = da.assign_coords({"time": pd.to_datetime(da.start_datetime)}).squeeze()
da
```

```
/srv/conda/envs/notebook/lib/python3.11/site-packages/stackstac/prepare.py:408: UserWarning: The argument 'infer_datetime_format' is deprecated and will be removed in a future version. A strict version of it is now the default, see https://pandas.pydata.org/pdeps/0004-consistent-to-datetime-parsing.html. You can safely remove this argument.
  times = pd.to_datetime(

<xarray.DataArray 'stackstac-92e4c5b4a3eab66b40bd5f869280d6b5' (time: 73, y: 1800, x: 3600)> Size: 4GB
dask.array<getitem, shape=(73, 1800, 3600), dtype=float64, chunksize=(1, 1024, 1024), chunktype=numpy.ndarray>
Coordinates: (12/17)
    id              (time) <U37 11kB 'OMI_trno2_0.10x0.10_202201_Col3_V4.nc' ...
    band            <U11 44B 'cog_default'
  * x               (x) float64 29kB -180.0 -179.9 -179.8 ... 179.7 179.8 179.9
  * y               (y) float64 14kB 90.0 89.9 89.8 89.7 ... -89.7 -89.8 -89.9
    start_datetime  (time) <U25 7kB '2022-01-01T00:00:00+00:00' ... '2016-01-...
    end_datetime    (time) <U25 7kB '2022-01-31T00:00:00+00:00' ... '2016-01-...
    ...              ...
    proj:bbox       object 8B {90.0, 180.0, -90.0, -180.0}
    proj:geometry   object 8B {'type': 'Polygon', 'coordinates': [[[-180.0, -...
    title           <U17 68B 'Default COG Layer'
    proj:transform  object 8B {0.1, 0.0, 1.0, -0.1, -180.0, 90.0}
    epsg            int64 8B 4326
  * time            (time) object 584B 1640995200000000000 ... 14516064000000...
Attributes:
    spec:        RasterSpec(epsg=4326, bounds=(-180.0, -90.0, 180.0, 90.0), r...
    crs:         epsg:4326
    transform:   | 0.10, 0.00,-180.00|\n| 0.00,-0.10, 90.00|\n| 0.00, 0.00, 1...
    resolution:  0.1
```
## Clip the data to the bounding box for China

```python
# Subset to Bounding Box for China
subset = da.rio.clip_box(*china_bbox)
subset
```

```
<xarray.DataArray 'stackstac-92e4c5b4a3eab66b40bd5f869280d6b5' (time: 73, y: 354, x: 614)> Size: 127MB
dask.array<getitem, shape=(73, 354, 614), dtype=float64, chunksize=(1, 354, 535), chunktype=numpy.ndarray>
Coordinates: (12/18)
    id              (time) <U37 11kB 'OMI_trno2_0.10x0.10_202201_Col3_V4.nc' ...
    band            <U11 44B 'cog_default'
  * x               (x) float64 5kB 73.7 73.8 73.9 74.0 ... 134.8 134.9 135.0
  * y               (y) float64 3kB 53.5 53.4 53.3 53.2 ... 18.5 18.4 18.3 18.2
    start_datetime  (time) <U25 7kB '2022-01-01T00:00:00+00:00' ... '2016-01-...
    end_datetime    (time) <U25 7kB '2022-01-31T00:00:00+00:00' ... '2016-01-...
    ...              ...
    proj:geometry   object 8B {'type': 'Polygon', 'coordinates': [[[-180.0, -...
    title           <U17 68B 'Default COG Layer'
    proj:transform  object 8B {0.1, 0.0, 1.0, -0.1, -180.0, 90.0}
    epsg            int64 8B 4326
  * time            (time) object 584B 1640995200000000000 ... 14516064000000...
    spatial_ref     int64 8B 0
Attributes:
    spec:        RasterSpec(epsg=4326, bounds=(-180.0, -90.0, 180.0, 90.0), r...
    resolution:  0.1
```
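The clipped shape can be sanity-checked with a back-of-envelope pixel count, assuming the 0.1-degree global grid with upper-left origin at (-180, 90) reported in `proj:transform`:

```python
res = 0.1  # degrees per pixel (from proj:transform)
west, south, east, north = 73.675, 18.198, 135.026, 53.459

# Column/row index spans on a global grid whose upper-left corner is (-180, 90);
# this is roughly what clip_box computes internally.
cols = int((east + 180.0) / res) - int((west + 180.0) / res)
rows = int((90.0 - south) / res) - int((90.0 - north) / res)
print(cols, rows)  # ~614 columns and ~353-354 rows (edge-pixel rounding),
                   # consistent with the (y: 354, x: 614) subset shown above
```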
## Aggregate the data

Calculate the mean at each timestep across the regional data. Note this is the first point at which the data is actually loaded.

```python
means = subset.mean(dim=("x", "y")).compute()
```
## Plot the mean monthly NO2 using hvplot

```python
means.hvplot.line(x="time", ylabel="NO2", title="Mean Monthly NO2 in China")
```