Open and plot COGs – VEDA Documentation

Run this notebook

You can launch this notebook in VEDA JupyterHub by clicking the link below.

Launch in VEDA JupyterHub (requires access)

Learn more

Inside the Hub

This notebook was written on the VEDA JupyterHub and as such is designed to be run on a jupyterhub which is associated with an AWS IAM role which has been granted permissions to the VEDA data store via its bucket policy. The instance used provided 16GB of RAM.

See (VEDA Analytics JupyterHub Access)[https://nasa-impact.github.io/veda-docs/veda-jh-access.html] for information about how to gain access.

Outside the Hub

The data is in a protected bucket. Please request access by emailng aimee@developmentseed.org or alexandra@developmentseed.org and providing your affiliation, interest in or expected use of the dataset and an AWS IAM role or user Amazon Resource Name (ARN). The team will help you configure the cognito client.

You should then run:

%run -i 'cognito_login.py'

Approach

Use pystac_client to open the STAC catalog and retrieve the items in the collection
Open the collection with xarray and stackstac
Plot the data using hvplot

About the data

CDC’s Social Vulnerability Index (SVI) uses 15 variables at the census tract level. The data comes from the U.S. decennial census for the years 2000 & 2010, and the American Community Survey (ACS) for the years 2014, 2016, and 2018. It is a hierarchical additive index (Tate, 2013), with the component elements of CDC’s SVI including the following for 4 themes: Socioeconomic Status, Household Composition & Disability, Minority Status & Language, and Housing Type & Transportation.

SVI indicates the relative vulnerability of every U.S. Census tract–subdivisions of counties for which the Census collects statistical data. SVI ranks the tracts on 15 social factors, including unemployment, minority status, and disability, and further groups them into four related themes. Thus, each tract receives a ranking for each Census variable and for each of the four themes, as well as an overall ranking.

Scientific research

The SVI Overall Score provides the overall, summed social vulnerability score for a given tract. The Overall Score SVI Grid is part of the U.S. Census Grids collection, and displays the Center for Disease Control & Prevention (CDC) SVI score. Funding for the final development, processing and dissemination of this data set by the Socioeconomic Data and Applications Center (SEDAC) was provided under the U.S. National Aeronautics and Space Administration (NASA)¹.

The Overall SVI Score describes the vulnerability in a given county tract based on the combined percentile ranking of the four SVI scores (Socioeconomic Status, Household Composition & Disability, Minority Status & Language, and Housing Type & Transportation). The summed percentile ranking from the four themes is ordered, and then used to calculate an overall percentile ranking, ranging from 0 (less vulnerable) to 1 (more vulnerable)². Tracts with higher Overall SVI Scores typically rank high in other SVI domains, and reveal communities that may require extra support, resources, and preventative care in order to better prepare for and manage emergency situations.

Interpreting the data

The Overall SVI Score describes the vulnerability in a given county tract based on the combined percentile ranking of the four SVI scores (Socioeconomic Status, Household Composition & Disability, Minority Status & Language, and Housing Type & Transportation). The summed percentile ranking from the four themes is ordered, and then used to calculate an overall percentile ranking, ranging from 0 (less vulnerable) to 1 (more vulnerable)². Tracts with higher Overall SVI Scores typically rank high in other SVI domains, and reveal communities that may require extra support, resources, and preventative care in order to better prepare for and manage emergency situations.

Credits

Center for International Earth Science Information Network, (CIESIN), Columbia University. 2021. Documentation for the U.S. Social Vulnerability Index Grids. Palisades, NY: NASA Socioeconomic Data and Applications Center (SEDAC). https://doi.org/10.7927/fjr9-a973. Accessed 13 May 2022.
Centers for Disease Control and Prevention/ Agency for Toxic Substances and Disease Registry/ Geospatial Research, Analysis, and Services Program. CDC/ATSDR Social Vulnerability Index Database. https://www.atsdr.cdc.gov/placeandhealth/svi/documentation/pdf/SVI2018Documentation_01192022_1.pdf

from pystac_client import Client
import pandas as pd
import stackstac

import hvplot.xarray  # noqa

Declare your collection of interest

You can discover available collections the following ways:

Programmatically: see example in the list-collections.ipynb notebook
JSON API: https://staging-stac.delta-backend.com/collections
STAC Browser: http://veda-staging-stac-browser.s3-website-us-west-2.amazonaws.com

STAC_API_URL = "https://openveda.cloud/api/stac/"
collection = "social-vulnerability-index-overall-nopop"

Find items in collection

Use pystac_client to search the STAC collection.

catalog = Client.open(STAC_API_URL)
search = catalog.search(collections=[collection])

item_collection = search.item_collection()
print(f"Found {len(item_collection)} items")

Found 5 items

Read data

Read in data using xarray using a combination of xpystac, stackstac, and rasterio.

da = stackstac.stack(item_collection, epsg=4326)
da = da.assign_coords({"time": da.start_datetime}).squeeze()
# da
da

/srv/conda/envs/notebook/lib/python3.11/site-packages/stackstac/prepare.py:408: UserWarning: The argument 'infer_datetime_format' is deprecated and will be removed in a future version. A strict version of it is now the default, see https://pandas.pydata.org/pdeps/0004-consistent-to-datetime-parsing.html. You can safely remove this argument.
  times = pd.to_datetime(

<xarray.DataArray 'stackstac-6c45bf56d29299d91eb93a5bc48c5746' (time: 5,
                                                                y: 6298,
                                                                x: 13354)> Size: 3GB
dask.array<getitem, shape=(5, 6298, 13354), dtype=float64, chunksize=(1, 1024, 1024), chunktype=numpy.ndarray>
Coordinates: (12/17)
    id              (time) <U38 760B 'svi_2018_tract_overall_wgs84_nopop_cog'...
    band            <U11 44B 'cog_default'
  * x               (x) float64 107kB -178.2 -178.2 -178.2 ... -66.97 -66.97
  * y               (y) float64 50kB 71.38 71.37 71.37 ... 18.92 18.92 18.91
    end_datetime    (time) <U25 500B '2018-12-31T00:00:00+00:00' ... '2000-12...
    start_datetime  (time) <U25 500B '2018-01-01T00:00:00+00:00' ... '2000-01...
    ...              ...
    proj:geometry   object 8B {'type': 'Polygon', 'coordinates': [[[-178.2333...
    proj:shape      object 8B {6297, 13353}
    title           <U17 68B 'Default COG Layer'
    proj:bbox       object 8B {-178.23333334, 18.908332897999998, -66.9583337...
    epsg            int64 8B 4326
  * time            (time) <U25 500B '2018-01-01T00:00:00+00:00' ... '2000-01...
Attributes:
    spec:           RasterSpec(epsg=4326, bounds=(-178.24166595386018, 18.899...
    crs:            epsg:4326
    transform:      | 0.01, 0.00,-178.24|\n| 0.00,-0.01, 71.38|\n| 0.00, 0.00...
    resolution_xy:  (0.00833333330000749, 0.00833333329998412)

There are 5 items representing the 5 years of data in the collection (2000, 2010, 2014, 2016, and 2018).

Plot data

Plot data using hvplot. By using rasterize=True we tell hvplot to use datashader behind the scenes to make the plot render more quickly and re-render on zoom.

%%time
da.hvplot(x="x", y="y", rasterize=True, clim=(0, 1), coastline=True, cmap="viridis", widget_location="bottom")

CPU times: user 1.33 s, sys: 87.9 ms, total: 1.42 s
Wall time: 1.61 s