Template (Accessing data directly)

Published

Month day, year

This notebook is intended to act as a template for the example notebooks that access the data directly. These green cells should all be deleted, and in several sections only one of the provided cells should be included in the notebook.

Update the link in the following section.

Run this notebook

You can launch this notebook in VEDA JupyterHub by clicking the link below.

Launch in VEDA JupyterHub (requires access)

Learn more

Inside the Hub

This notebook was written on the VEDA JupyterHub and, as such, is designed to run on a JupyterHub that is associated with an AWS IAM role granted permissions to the VEDA data store via its bucket policy. The instance used provided 16GB of RAM.

See [VEDA Analytics JupyterHub Access](https://nasa-impact.github.io/veda-docs/veda-jh-access.html) for information about how to gain access.

Outside the Hub

The data is in a protected bucket. Please request access by emailing aimee@developmentseed.org or alexandra@developmentseed.org and providing your affiliation, your interest in or expected use of the dataset, and an AWS IAM role or user Amazon Resource Name (ARN). The team will help you configure the Cognito client.

You should then run:

%run -i 'cognito_login.py'
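
Once logged in, you can sanity-check that the credentials reach the protected bucket. This is a minimal sketch: the bucket name below is a placeholder, so substitute the one the team gives you.

import boto3

# head_bucket raises a ClientError if your credentials cannot reach the bucket
boto3.client("s3").head_bucket(Bucket="veda-data-store")  # placeholder bucket name
print("Access confirmed")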

Fill in the text in italics in the following cells

Approach

  1. list a few steps that outline the approach
  2. you will be taking in this notebook
# include all your imports in this cell
import folium
import requests
import stackstac

from pystac_client import Client

About the data

Optional description of the dataset.

Declare your collection of interest

You can discover available collections the following ways:

  • Programmatically: see the example in the list-collections.ipynb notebook (a minimal sketch follows this list)
  • JSON API: https://openveda.cloud/api/stac/collections
  • STAC Browser: http://openveda.cloud
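
As a quick illustration of the programmatic route (a sketch; the list-collections.ipynb notebook covers it in more depth):

from pystac_client import Client

# Open the catalog and print the id of every collection it advertises
catalog = Client.open("https://openveda.cloud/api/stac")
for collection in catalog.get_collections():
    print(collection.id)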
STAC_API_URL = "https://openveda.cloud/api/stac"

collection_id = ""  # TODO: fill in the id of your collection of interest

The next step is to get STAC objects from the STAC API. We use pystac-client to do a search. Here is an example of what that might look like.

Discover items in collection for region and time of interest

Use pystac_client to search the STAC collection for a particular area of interest within specified datetime bounds.

bbox = [-180.0, -90.0, 180.0, 90.0]
datetime = "2000-01-01/2022-01-02"
catalog = Client.open(STAC_API_URL)

search = catalog.search(
    bbox=bbox, datetime=datetime, collections=[collection_id], limit=1000
)
items = list(search.items())
print(f"Found {len(items)} items")

The next step is often to define an Area of Interest (AOI). Note that it is preferred to fetch large GeoJSON objects directly from their source rather than storing them in this repository or inlining them in the notebook. Here is an example of what that might look like.

Define an AOI

We can fetch GeoJSON from an authoritative online source, for instance: https://gadm.org/download_country.html

response = requests.get(
    "https://geodata.ucdavis.edu/gadm/gadm4.1/json/gadm41_FRA_0.json"
)

# If anything goes wrong with this request output error contents
assert response.ok, response.text

result = response.json()
print(f"There are {len(result['features'])} features in this collection")

That is the GeoJSON for a feature collection, but since there is only one feature in it, we can grab just that.

aoi = result["features"][0]
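
If you would rather search with a bbox that matches the AOI instead of the whole globe, you can derive one from the geometry. A small sketch, assuming shapely is available:

from shapely.geometry import shape

# (minx, miny, maxx, maxy) of the AOI, suitable for catalog.search(bbox=...)
aoi_bbox = list(shape(aoi["geometry"]).bounds)
print(aoi_bbox)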

Before reading the data, it helps to preview the AOI on an interactive map.

m = folium.Map(
    location=[40, 0],
    zoom_start=2,
)

folium.GeoJson(aoi, name="AOI").add_to(m)
m

Read data

Next, some notebooks read in the data. If you are using the raster API to trigger computation server-side, skip this section. Here is an example of reading the data in with stackstac and clipping to the AOI with rioxarray.

Create an xarray.DataArray using stackstac

# Stack all matching STAC items into a single xarray DataArray
da = stackstac.stack(search.item_collection())
da
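
stackstac returns a lazy, dask-backed DataArray, so the cell above only builds the task graph. You can check the chunked layout before triggering any reads (a quick sketch):

# Inspect dask chunk sizes without downloading any pixels
print(da.chunks)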

Clip the data to AOI

xarray's built-in clip only bounds the array's values, so we use rioxarray's geometry-aware clip instead (a sketch assuming rioxarray is installed; importing it registers the .rio accessor):

import rioxarray  # registers the .rio accessor on xarray objects

# Write the CRS that stackstac recorded, then clip to the AOI (EPSG:4326 GeoJSON)
subset = da.rio.write_crs(int(da.epsg)).rio.clip([aoi["geometry"]], crs="EPSG:4326")
subset

With the STAC object, and optionally the AOI and/or the data in hand, the next step is to do some analysis. The sections in the rest of the notebook are totally up to you! Here is an idea though :)

Select a band of data

There is just one band in this case, cog_default.

data_band = da.sel(band="cog_default")
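
If you adapt this notebook to a multi-band collection, you can list the available band names before selecting one (a quick sketch):

# Print the band names present on the stacked array
print(da.band.values)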

Compute and plot

Calculate the mean at each time step across the whole dataset. Note that this is the first time the data is actually loaded.

# Average over the full spatial extent at each time step
means = data_band.mean(dim=("x", "y")).compute()
means.plot()