# Open and visualize COGs

This notebook uses `pystac_client`, `rioxarray`, `stackstac`, and `hvplot`.

```python
import requests
from pystac_client import Client
import pandas as pd
import stackstac
import rioxarray  # noqa
import hvplot.xarray  # noqa
```
## Run this notebook

You can launch this notebook in VEDA JupyterHub by clicking the link below.

Launch in VEDA JupyterHub (requires access)

## Learn more

### Inside the Hub
This notebook was written on the VEDA JupyterHub and is designed to run on a JupyterHub associated with an AWS IAM role that has been granted permission to read the VEDA data store via its bucket policy. The instance used provided 16GB of RAM.

See [VEDA Analytics JupyterHub Access](https://nasa-impact.github.io/veda-docs/veda-jh-access.html) for information about how to gain access.
### Outside the Hub

The data is in a protected bucket. Please request access by emailing aimee@developmentseed.org or alexandra@developmentseed.org, providing your affiliation, your interest in or expected use of the dataset, and an AWS IAM role or user Amazon Resource Name (ARN). The team will help you configure the Cognito client.

You should then run:

```python
%run -i 'cognito_login.py'
```
## Approach

- Use `pystac_client` to open the STAC catalog and retrieve the items in the collection
- Use `stackstac` to create an `xarray` dataset containing all the items
- Use `rioxarray` to crop the data to the AOI
- Use `hvplot` to render the COG at every timestep
## Declare your collection of interest

You can discover available collections in the following ways:

- Programmatically: see the example in the `list-collections.ipynb` notebook
- JSON API: https://staging-stac.delta-backend.com/collections
- STAC Browser: http://veda-staging-stac-browser.s3-website-us-west-2.amazonaws.com

```python
STAC_API_URL = "https://openveda.cloud/api/stac"
collection_id = "no2-monthly"
```
## Discover items in the collection for the region and time of interest

Use `pystac_client` to search the STAC collection for a particular area of interest within specified datetime bounds.

```python
catalog = Client.open(STAC_API_URL)
search = catalog.search(collections=[collection_id], sortby="start_datetime")
item_collection = search.item_collection()
print(f"Found {len(item_collection)} items")
```

```
Found 93 items
```
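The search above returns every item in the collection; `pystac_client`'s `Client.search` also accepts `bbox` and `datetime` arguments when you want to narrow results spatially or temporally. As a minimal sketch, here is one way to derive a `bbox` from a GeoJSON geometry (the polygon coordinates below are illustrative, not taken from this notebook):

```python
def bounds_of(geometry):
    """Walk arbitrarily nested GeoJSON coordinate arrays and return
    (min_lon, min_lat, max_lon, max_lat)."""
    lons, lats = [], []

    def walk(coords):
        # A coordinate pair is a sequence of two numbers; anything else nests deeper
        if isinstance(coords[0], (int, float)):
            lons.append(coords[0])
            lats.append(coords[1])
        else:
            for c in coords:
                walk(c)

    walk(geometry["coordinates"])
    return (min(lons), min(lats), max(lons), max(lats))


# Illustrative polygon roughly covering metropolitan France
poly = {
    "type": "Polygon",
    "coordinates": [
        [[-5.0, 41.3], [9.6, 41.3], [9.6, 51.1], [-5.0, 51.1], [-5.0, 41.3]]
    ],
}
bbox = bounds_of(poly)
```

The resulting tuple could then be passed as, e.g., `catalog.search(collections=[collection_id], bbox=bbox, datetime="2016-01-01/2023-09-30")` to restrict the items returned.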
## Define an AOI

We can fetch GeoJSON for metropolitan France and Corsica (excluding overseas territories) from an authoritative online source (https://gadm.org/download_country.html).

```python
response = requests.get(
    "https://geodata.ucdavis.edu/gadm/gadm4.1/json/gadm41_FRA_0.json"
)
# If anything goes wrong with this request, output the error contents
assert response.ok, response.text
result = response.json()
print(f"There are {len(result['features'])} features in this collection")
```

```
There are 1 features in this collection
```

That is the GeoJSON for a feature collection, but since there is only one feature in it we can grab just that feature.

```python
france_aoi = result["features"][0]
```
## Read data

Create an `xarray.DataArray` using `stackstac`.

```python
da = stackstac.stack(item_collection, epsg=4326)
da = da.assign_coords({"time": da.start_datetime}).squeeze()
da
```

```
/srv/conda/envs/notebook/lib/python3.11/site-packages/stackstac/prepare.py:408: UserWarning: The argument 'infer_datetime_format' is deprecated and will be removed in a future version. A strict version of it is now the default, see https://pandas.pydata.org/pdeps/0004-consistent-to-datetime-parsing.html. You can safely remove this argument.
  times = pd.to_datetime(
```

```
<xarray.DataArray 'stackstac-63f208ae3e2c01863c43588cc3899b3c' (time: 93, y: 1800, x: 3600)> Size: 5GB
dask.array<getitem, shape=(93, 1800, 3600), dtype=float64, chunksize=(1, 1024, 1024), chunktype=numpy.ndarray>
Coordinates: (12/17)
    id              (time) <U37 14kB 'OMI_trno2_0.10x0.10_201601_Col3_V4.nc' ...
    band            <U11 44B 'cog_default'
  * x               (x) float64 29kB -180.0 -179.9 -179.8 ... 179.7 179.8 179.9
  * y               (y) float64 14kB 90.0 89.9 89.8 89.7 ... -89.7 -89.8 -89.9
    start_datetime  (time) <U25 9kB '2016-01-01T00:00:00+00:00' ... '2023-09-...
    end_datetime    (time) <U25 9kB '2016-01-31T00:00:00+00:00' ... '2023-09-...
    ...              ...
    proj:geometry   object 8B {'type': 'Polygon', 'coordinates': [[[-180.0, -...
    description     <U47 188B 'Cloud optimized default layer to display on map'
    title           <U17 68B 'Default COG Layer'
    proj:epsg       int64 8B 4326
    epsg            int64 8B 4326
  * time            (time) <U25 9kB '2016-01-01T00:00:00+00:00' ... '2023-09-...
Attributes:
    spec:        RasterSpec(epsg=4326, bounds=(-180.0, -90.0, 180.0, 90.0), r...
    crs:         epsg:4326
    transform:   | 0.10, 0.00,-180.00|\n| 0.00,-0.10, 90.00|\n| 0.00, 0.00, 1...
    resolution:  0.1
```
## Clip the data to the AOI

```python
subset = da.rio.clip([france_aoi["geometry"]])
subset
```

```
<xarray.DataArray 'stackstac-63f208ae3e2c01863c43588cc3899b3c' (time: 93, y: 97, x: 143)> Size: 10MB
dask.array<getitem, shape=(93, 97, 143), dtype=float64, chunksize=(1, 97, 143), chunktype=numpy.ndarray>
Coordinates: (12/18)
    id              (time) <U37 14kB 'OMI_trno2_0.10x0.10_201601_Col3_V4.nc' ...
    band            <U11 44B 'cog_default'
  * x               (x) float64 1kB -4.7 -4.6 -4.5 -4.4 -4.3 ... 9.2 9.3 9.4 9.5
  * y               (y) float64 776B 51.0 50.9 50.8 50.7 ... 41.7 41.6 41.5 41.4
    start_datetime  (time) <U25 9kB '2016-01-01T00:00:00+00:00' ... '2023-09-...
    end_datetime    (time) <U25 9kB '2016-01-31T00:00:00+00:00' ... '2023-09-...
    ...              ...
    description     <U47 188B 'Cloud optimized default layer to display on map'
    title           <U17 68B 'Default COG Layer'
    proj:epsg       int64 8B 4326
    epsg            int64 8B 4326
  * time            (time) <U25 9kB '2016-01-01T00:00:00+00:00' ... '2023-09-...
    spatial_ref     int64 8B 0
Attributes:
    spec:        RasterSpec(epsg=4326, bounds=(-180.0, -90.0, 180.0, 90.0), r...
    resolution:  0.1
```
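`rio.clip` masks the data to the exact polygon boundary. Conceptually it is a refinement of bounding-box cropping, which can be sketched with plain numpy on a toy 1-degree global grid laid out like this dataset (latitude descending from 90 to -90); the box values here are illustrative, not the notebook's computed bounds:

```python
import numpy as np

# Toy global grid: y (latitude) descending, x (longitude) ascending
lats = np.arange(90, -90, -1.0)          # 90, 89, ..., -89
lons = np.arange(-180, 180, 1.0)         # -180, -179, ..., 179
data = np.random.rand(lats.size, lons.size)

# Illustrative bounding box roughly covering metropolitan France
min_lon, min_lat, max_lon, max_lat = -5.0, 41.3, 9.6, 51.1

# Boolean masks selecting the rows/columns whose coordinates fall inside the box
row_mask = (lats >= min_lat) & (lats <= max_lat)
col_mask = (lons >= min_lon) & (lons <= max_lon)

# np.ix_ turns the two 1-D masks into an open mesh that crops both axes at once
cropped = data[np.ix_(row_mask, col_mask)]
```

A true polygon clip additionally sets cells outside the geometry (but inside its bounding box) to the nodata value, which is why `rio.clip` is used in the notebook rather than plain index slicing.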
## Compute and plot

So far we have just been lazily setting up a calculation in Dask. Now we can trigger the computation using `.compute()`.

```python
%%time
image_stack = subset.compute()
```

```
CPU times: user 2.96 s, sys: 658 ms, total: 3.62 s
Wall time: 8.35 s
```

```python
# get the 2% and 98% quantiles as the min and max bounds for the colormap
vmin, vmax = image_stack.quantile(0.02).item(), image_stack.quantile(0.98).item()
```
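The 2%/98% bounds clip outliers so a few extreme pixels do not wash out the colormap. A numpy-only sketch of the same calculation on a toy array (xarray's `.quantile` follows numpy's quantile semantics):

```python
import numpy as np

# Toy "image stack": values 0..100, so the quantiles are easy to verify by hand
values = np.arange(101, dtype=float)

# Robust color limits: 2nd and 98th percentiles instead of the raw min/max
vmin = np.quantile(values, 0.02)   # 2.0
vmax = np.quantile(values, 0.98)   # 98.0
```

With the default linear interpolation, the 2% quantile of 0..100 sits exactly at 2.0, so the top and bottom 2% of values simply saturate at the ends of the colormap rather than stretching it.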
```python
image_stack.hvplot(
    groupby="time",
    tiles=True,
    colorbar=False,
    clim=(vmin, vmax),
    cmap="viridis",
    alpha=0.8,
    frame_height=512,
    widget_location="bottom",
)
```