from pystac_client import Client
import pandas as pd
import stackstac
import hvplot.xarray # noqa
Open and plot COGs
pystac_client
, xarray
, and hvplot
Run this notebook
You can launch this notebook in VEDA JupyterHub by clicking the link below.
Launch in VEDA JupyterHub (requires access)
Learn more
Inside the Hub
This notebook was written on the VEDA JupyterHub and as such is designed to be run on a jupyterhub which is associated with an AWS IAM role which has been granted permissions to the VEDA data store via its bucket policy. The instance used provided 16GB of RAM.
See (VEDA Analytics JupyterHub Access)[https://nasa-impact.github.io/veda-docs/veda-jh-access.html] for information about how to gain access.
Outside the Hub
The data is in a protected bucket. Please request access by emailng aimee@developmentseed.org or alexandra@developmentseed.org and providing your affiliation, interest in or expected use of the dataset and an AWS IAM role or user Amazon Resource Name (ARN). The team will help you configure the cognito client.
You should then run:
%run -i 'cognito_login.py'
Approach
- Use
pystac_client
to open the STAC catalog and retrieve the items in the collection - Open the collection with
xarray
andstackstac
- Plot the data using
hvplot
About the data
CDC’s Social Vulnerability Index (SVI) uses 15 variables at the census tract level. The data comes from the U.S. decennial census for the years 2000 & 2010, and the American Community Survey (ACS) for the years 2014, 2016, and 2018. It is a hierarchical additive index (Tate, 2013), with the component elements of CDC’s SVI including the following for 4 themes: Socioeconomic Status, Household Composition & Disability, Minority Status & Language, and Housing Type & Transportation.
SVI indicates the relative vulnerability of every U.S. Census tract–subdivisions of counties for which the Census collects statistical data. SVI ranks the tracts on 15 social factors, including unemployment, minority status, and disability, and further groups them into four related themes. Thus, each tract receives a ranking for each Census variable and for each of the four themes, as well as an overall ranking.
Scientific research
The SVI Overall Score provides the overall, summed social vulnerability score for a given tract. The Overall Score SVI Grid is part of the U.S. Census Grids collection, and displays the Center for Disease Control & Prevention (CDC) SVI score. Funding for the final development, processing and dissemination of this data set by the Socioeconomic Data and Applications Center (SEDAC) was provided under the U.S. National Aeronautics and Space Administration (NASA)¹.
The Overall SVI Score describes the vulnerability in a given county tract based on the combined percentile ranking of the four SVI scores (Socioeconomic Status, Household Composition & Disability, Minority Status & Language, and Housing Type & Transportation). The summed percentile ranking from the four themes is ordered, and then used to calculate an overall percentile ranking, ranging from 0 (less vulnerable) to 1 (more vulnerable)². Tracts with higher Overall SVI Scores typically rank high in other SVI domains, and reveal communities that may require extra support, resources, and preventative care in order to better prepare for and manage emergency situations.
Interpreting the data
The Overall SVI Score describes the vulnerability in a given county tract based on the combined percentile ranking of the four SVI scores (Socioeconomic Status, Household Composition & Disability, Minority Status & Language, and Housing Type & Transportation). The summed percentile ranking from the four themes is ordered, and then used to calculate an overall percentile ranking, ranging from 0 (less vulnerable) to 1 (more vulnerable)². Tracts with higher Overall SVI Scores typically rank high in other SVI domains, and reveal communities that may require extra support, resources, and preventative care in order to better prepare for and manage emergency situations.
Credits
- Center for International Earth Science Information Network, (CIESIN), Columbia University. 2021. Documentation for the U.S. Social Vulnerability Index Grids. Palisades, NY: NASA Socioeconomic Data and Applications Center (SEDAC). https://doi.org/10.7927/fjr9-a973. Accessed 13 May 2022.
- Centers for Disease Control and Prevention/ Agency for Toxic Substances and Disease Registry/ Geospatial Research, Analysis, and Services Program. CDC/ATSDR Social Vulnerability Index Database. https://www.atsdr.cdc.gov/placeandhealth/svi/documentation/pdf/SVI2018Documentation_01192022_1.pdf
Declare your collection of interest
You can discover available collections the following ways:
- Programmatically: see example in the
list-collections.ipynb
notebook - JSON API: https://staging-stac.delta-backend.com/collections
- STAC Browser: http://veda-staging-stac-browser.s3-website-us-west-2.amazonaws.com
= "https://openveda.cloud/api/stac/"
STAC_API_URL = "social-vulnerability-index-overall-nopop" collection
Find items in collection
Use pystac_client
to search the STAC collection.
= Client.open(STAC_API_URL)
catalog = catalog.search(collections=[collection])
search
= search.item_collection()
item_collection print(f"Found {len(item_collection)} items")
Found 5 items
Read data
Read in data using xarray
using a combination of xpystac
, stackstac
, and rasterio
.
= stackstac.stack(item_collection, epsg=4326)
da = da.assign_coords({"time": da.start_datetime}).squeeze()
da # da
da
/srv/conda/envs/notebook/lib/python3.11/site-packages/stackstac/prepare.py:408: UserWarning: The argument 'infer_datetime_format' is deprecated and will be removed in a future version. A strict version of it is now the default, see https://pandas.pydata.org/pdeps/0004-consistent-to-datetime-parsing.html. You can safely remove this argument.
times = pd.to_datetime(
<xarray.DataArray 'stackstac-6c45bf56d29299d91eb93a5bc48c5746' (time: 5, y: 6298, x: 13354)> Size: 3GB dask.array<getitem, shape=(5, 6298, 13354), dtype=float64, chunksize=(1, 1024, 1024), chunktype=numpy.ndarray> Coordinates: (12/17) id (time) <U38 760B 'svi_2018_tract_overall_wgs84_nopop_cog'... band <U11 44B 'cog_default' * x (x) float64 107kB -178.2 -178.2 -178.2 ... -66.97 -66.97 * y (y) float64 50kB 71.38 71.37 71.37 ... 18.92 18.92 18.91 end_datetime (time) <U25 500B '2018-12-31T00:00:00+00:00' ... '2000-12... start_datetime (time) <U25 500B '2018-01-01T00:00:00+00:00' ... '2000-01... ... ... proj:geometry object 8B {'type': 'Polygon', 'coordinates': [[[-178.2333... proj:shape object 8B {6297, 13353} title <U17 68B 'Default COG Layer' proj:bbox object 8B {-178.23333334, 18.908332897999998, -66.9583337... epsg int64 8B 4326 * time (time) <U25 500B '2018-01-01T00:00:00+00:00' ... '2000-01... Attributes: spec: RasterSpec(epsg=4326, bounds=(-178.24166595386018, 18.899... crs: epsg:4326 transform: | 0.01, 0.00,-178.24|\n| 0.00,-0.01, 71.38|\n| 0.00, 0.00... resolution_xy: (0.00833333330000749, 0.00833333329998412)
There are 5 items representing the 5 years of data in the collection (2000, 2010, 2014, 2016, and 2018).
Plot data
Plot data using hvplot
. By using rasterize=True
we tell hvplot
to use datashader
behind the scenes to make the plot render more quickly and re-render on zoom.
%%time
="x", y="y", rasterize=True, clim=(0, 1), coastline=True, cmap="viridis", widget_location="bottom") da.hvplot(x
CPU times: user 1.33 s, sys: 87.9 ms, total: 1.42 s
Wall time: 1.61 s