# include all your imports in this cell
import folium
import requests
import stackstac
from pystac_client import Client
Template (Accessing data directly)
This notebook is intended to act as a template for the example notebooks that access the data directly. These green cells should all be deleted and in several sections only one of the provided cells should be included in the notebook.
Update the link in the following section.
Run this notebook
You can launch this notebook in VEDA JupyterHub by clicking the link below.
Launch in VEDA JupyterHub (requires access)
Learn more
Inside the Hub
This notebook was written on the VEDA JupyterHub and is designed to run on a JupyterHub associated with an AWS IAM role that has been granted access to the VEDA data store through its bucket policy. The instance used provided 16GB of RAM.
See [VEDA Analytics JupyterHub Access](https://nasa-impact.github.io/veda-docs/veda-jh-access.html) for information about how to gain access.
Outside the Hub
The data is in a protected bucket. Please request access by emailing aimee@developmentseed.org or alexandra@developmentseed.org and providing your affiliation, your interest in or expected use of the dataset, and an AWS IAM role or user Amazon Resource Name (ARN). The team will help you configure the Cognito client.
You should then run:
%run -i 'cognito_login.py'
Fill in the text in italics in the following cells
Approach
- list a few steps that outline the approach
- you will be taking in this notebook
About the data
Optional description of the dataset.
Declare your collection of interest
You can discover available collections in the following ways:
- Programmatically: see the example in the list-collections.ipynb notebook
- JSON API: https://openveda.cloud/api/stac/collections
- STAC Browser: http://openveda.cloud
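As a rough illustration of the JSON API route, the sketch below pulls collection IDs out of a `/collections` response using only the standard library. The helper names `collection_ids` and `fetch_collection_ids` are hypothetical, and the sketch assumes the response has the usual STAC API shape (a top-level `collections` array whose entries each carry an `id`). It reads only the first page and does not follow pagination links.

```python
import json
from urllib.request import urlopen

STAC_API_URL = "https://openveda.cloud/api/stac"


def collection_ids(payload):
    """Return the sorted IDs found in a STAC `/collections` response body."""
    return sorted(c["id"] for c in payload.get("collections", []))


def fetch_collection_ids(stac_api_url=STAC_API_URL, timeout=30):
    """Fetch `/collections` and return the IDs from the first page only."""
    with urlopen(f"{stac_api_url}/collections", timeout=timeout) as response:
        return collection_ids(json.load(response))


# Offline example with a hand-written payload (these IDs are made up).
sample = {"collections": [{"id": "example-b"}, {"id": "example-a"}], "links": []}
print(collection_ids(sample))
```

Calling `fetch_collection_ids()` performs the network request; the offline example only exercises the parsing step.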
STAC_API_URL = "https://openveda.cloud/api/stac"
collection_id = "<your-collection-id>"  # placeholder: replace with the ID of your collection of interest
The next step is to get STAC objects from the STAC API. We use pystac-client to do a search. Here is an example of what that might look like.
Discover items in collection for region and time of interest
Use pystac_client to search the STAC collection for a particular area of interest within specified datetime bounds.
bbox = [-180.0, -90.0, 180.0, 90.0]
datetime = "2000-01-01/2022-01-02"
catalog = Client.open(STAC_API_URL)
search = catalog.search(
    bbox=bbox, datetime=datetime, collections=[collection_id], limit=1000
)
items = list(search.items())
print(f"Found {len(items)} items")
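The `bbox` and `datetime` values in the search follow STAC API conventions: `bbox` is ordered `[west, south, east, north]`, and `datetime` may be a closed interval or an open-ended one written with `..`. The following is a minimal sketch of helpers for building and checking these values. The names `make_interval` and `check_bbox` are hypothetical and not part of pystac-client.

```python
from datetime import date


def make_interval(start=None, end=None):
    """Build a STAC datetime interval; `None` becomes the open-ended marker `..`."""
    if start is None and end is None:
        raise ValueError("at least one of start or end must be given")
    if start is not None and end is not None and start > end:
        raise ValueError("start must not be after end")

    def fmt(value):
        return ".." if value is None else value.isoformat()

    return f"{fmt(start)}/{fmt(end)}"


def check_bbox(bbox):
    """Validate a 2D bbox given as [west, south, east, north] in degrees."""
    west, south, east, north = bbox
    # west > east is allowed: it denotes a box crossing the antimeridian.
    if not (-180 <= west <= 180 and -180 <= east <= 180):
        raise ValueError("longitudes must be within [-180, 180]")
    if not (-90 <= south <= north <= 90):
        raise ValueError("latitudes must satisfy -90 <= south <= north <= 90")
    return [west, south, east, north]


datetime_range = make_interval(date(2000, 1, 1), date(2022, 1, 2))
print(datetime_range)
print(check_bbox([-180.0, -90.0, 180.0, 90.0]))
```

Validating these inputs up front is optional, but it can surface a swapped latitude and longitude before a request is sent.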
The next step is often to define an Area of Interest. Note that it is preferred to get large geojson objects directly from their source rather than storing them in this repository or inlining them in the notebook. Here is an example of what that might look like.
Define an AOI
We can fetch GeoJSON from an authoritative online source, for instance https://gadm.org/download_country.html
response = requests.get(
    "https://geodata.ucdavis.edu/gadm/gadm4.1/json/gadm41_FRA_0.json"
)
# If anything goes wrong with this request output error contents
assert response.ok, response.text
result = response.json()
print(f"There are {len(result['features'])} features in this collection")
That is the geojson for a feature collection, but since there is only one feature in it we can grab just that.
aoi = result["features"][0]
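To restrict the earlier search to this AOI rather than the whole globe, one option is to derive a bounding box from the feature's geometry. The sketch below walks the nested GeoJSON coordinate arrays with a hypothetical helper, `feature_bbox`. It assumes a geometry with a `coordinates` member (so not a GeometryCollection), coordinates in longitude/latitude order, and no antimeridian crossing.

```python
def iter_positions(coords):
    """Yield (x, y) pairs from arbitrarily nested GeoJSON coordinate arrays."""
    if coords and isinstance(coords[0], (int, float)):
        yield coords[0], coords[1]
    else:
        for part in coords:
            yield from iter_positions(part)


def feature_bbox(feature):
    """Return [west, south, east, north] for a GeoJSON feature."""
    xs, ys = zip(*iter_positions(feature["geometry"]["coordinates"]))
    return [min(xs), min(ys), max(xs), max(ys)]


# Hand-written square used only for illustration.
example = {
    "type": "Feature",
    "properties": {},
    "geometry": {
        "type": "Polygon",
        "coordinates": [
            [[0.0, 40.0], [5.0, 40.0], [5.0, 45.0], [0.0, 45.0], [0.0, 40.0]]
        ],
    },
}
print(feature_bbox(example))
```

With the notebook's `aoi`, the result of `feature_bbox(aoi)` could be passed as the `bbox` argument of `catalog.search`.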
Next, some notebooks read in the data. If you are using the raster API to trigger computation server-side, skip this section. Here is an example of reading the data in using stackstac and clipping using rasterio.
m = folium.Map(
    location=[40, 0],
    zoom_start=2,
)
folium.GeoJson(aoi, name="AOI").add_to(m)
m
Read data
Create an xarray.DataArray using stackstac
# This is a workaround that is planning to move up into stackstac itself
import rasterio as rio
import boto3
import pandas as pd
da = stackstac.stack(search.item_collection())
da
Clip the data to AOI
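# Geometry clipping goes through the rioxarray `.rio` accessor (requires `import rioxarray`);
# xarray's own `DataArray.clip` limits values to a min/max range instead.
# The geometry is assumed to be in the same CRS as `da`.
subset = da.rio.clip([aoi["geometry"]])
subset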
With the STAC object, and optionally the AOI and/or the data in hand, the next step is to do some analysis. The sections in the rest of the notebooks are totally up to you! Here is an idea though :)
Select a band of data
There is just one band in this case, cog_default.
data_band = da.sel(band="cog_default")
Compute and plot
Calculate the mean at each time across the whole dataset. Note this is the first time that the data is actually loaded.
# Average over the full spatial extent (x and y) at each time step
means = data_band.mean(dim=("x", "y")).compute()
means.plot()