Visualizing NCEO Aboveground Woody Biomass 2017 prioritization areas

Run this notebook

You can launch this notebook in VEDA JupyterHub by clicking the link below.

Launch in VEDA JupyterHub (requires access)

Learn more

Inside the Hub

This notebook was written on the VEDA JupyterHub and as such is designed to be run on a jupyterhub which is associated with an AWS IAM role which has been granted permissions to the VEDA data store via its bucket policy. The instance used provided 16GB of RAM.

See (VEDA Analytics JupyterHub Access)[https://nasa-impact.github.io/veda-docs/veda-jh-access.html] for information about how to gain access.

Outside the Hub

The data is in a protected bucket. Please request access by emailng aimee@developmentseed.org or alexandra@developmentseed.org and providing your affiliation, interest in or expected use of the dataset and an AWS IAM role or user Amazon Resource Name (ARN). The team will help you configure the cognito client.

You should then run:

%run -i 'cognito_login.py'

Approach

Query STAC API and explore item contents for a given collection
Read and access the data
Visualize the collection with hvplot
Run zonal statistics on collection using rasterstats
Visualize resultant zonal statistics on a choropleth map

About the Data

The NCEO Aboveground Woody Biomass 2017 dataset is a map for Africa at 100 m spatial resolution which was developed using a combination of LiDAR, Synthetic Aperture Radar (SAR) and optical based data. Aboveground woody biomass (AGB) plays an key role in the study of the Earth’s carbon cycle and response to climate change. Estimation based on Earth Observation measurements is an effective method for regional scale studies and the results are expressed as dry matter in Mg ha-1.

Important Note: Users of this dataset should keep in mind that the map is a continental-scale dataset, generated using a combination of different remote sensing data types, with a single method for the whole study area produced in 2017. Users, therefore, should understand that accuracy may vary for different regions and vegetation types.

The Case Study - Guinea

Mapping and understanding the spatial distribution of AGB is key to understanding carbon dioxide emissions from tropical deforestation through the loss of woody carbon stocks. The resulting carbon fluxes from these land-use changes and vegetation degradation can have negative impacts on the global carbon cycle. Change analysis between AGB maps overtime can display losses in high biomass forests, due to suspected deforestation and forest degredation.

The forests of southern Guinea are reported to have some of the highest density AGB of any forest in the world and are one of the most threatened ecoregions in Africa. Importantly, this area was also the epicenter of the 2014 Ebola outbreak, which had a lasting impact on the region. There is more and more evidence that human deforestation activities in this area may have accelerated the spread of the deadly virus as a result of increasing human-bat interactions in the region.

In this example we explore the NCEO AGB dataset for 2017, running zonal statistics at the district (administrative 2) level to understand those areas in Guinea that need greatest prioritization for protection and conservation.

Setting up the Environment

To run zonal statistics we’ll need to import a python package called rasterstats into our environment. You can uncomment the following line for installation. This cell needs only needs to be run once.

!pip install rasterstats --quiet

Querying the STAC API

from pystac_client import Client

# Provide STAC API endpoint
STAC_API_URL = "https://openveda.cloud/api/stac/"

# Declare collection of interest - NCEO Biomass
collection = "nceo_africa_2017"

Now let’s check how many total items are available.

search = Client.open(STAC_API_URL).search(collections=[collection])
items = list(search.items())
print(f"Found {len(items)} items")

Found 1 items

This makes sense as there is only one item available: a map for 2017.

# Explore the "cog_default" asset of one item to see what it contains
items[0].assets["cog_default"].to_dict()

{'href': 's3://nasa-maap-data-store/file-staging/nasa-map/nceo-africa-2017/AGB_map_2017v0m_COG.tif',
 'type': 'image/tiff; application=geotiff; profile=cloud-optimized',
 'title': 'Default COG Layer',
 'description': 'Cloud optimized default layer to display on map',
 'raster:bands': [{'scale': 1,
   'nodata': 'inf',
   'offset': 0,
   'sampling': 'area',
   'data_type': 'uint16',
   'histogram': {'max': 429,
    'min': 0,
    'count': 11,
    'buckets': [405348,
     44948,
     18365,
     6377,
     3675,
     3388,
     3785,
     9453,
     13108,
     1186]},
   'statistics': {'mean': 37.58407913145342,
    'stddev': 81.36678677343947,
    'maximum': 429,
    'minimum': 0,
    'valid_percent': 50.42436439336373}}],
 'roles': ['data', 'layer']}

Explore through the item’s assets. We can see from the data’s statistics values that the min and max values for the observed values range from 0 to 429 Mg ha-1.

Reading and accessing the data

Now that we’ve explored the dataset through the STAC API, let’s read and access the dataset itself.

Note: We need to manually pull the epsg code from the item since stackstac doesn’t understand newer versions of the Projection STAC Extension (GitHub issue)

import stackstac
import rioxarray


da = stackstac.stack(items[0], epsg=items[0].ext.proj.epsg).squeeze()
da

<xarray.DataArray 'stackstac-00a12995ab4ad3703f5c19282f0cc6bf' (y: 81025,
                                                                x: 78078)> Size: 51GB
dask.array<getitem, shape=(81025, 78078), dtype=float64, chunksize=(1024, 1024), chunktype=numpy.ndarray>
Coordinates: (12/16)
    time            datetime64[ns] 8B NaT
    id              <U19 76B 'AGB_map_2017v0m_COG'
    band            <U11 44B 'cog_default'
  * x               (x) float64 625kB -18.27 -18.27 -18.27 ... 51.86 51.86 51.86
  * y               (y) float64 648kB 37.73 37.73 37.73 ... -35.05 -35.05 -35.05
    start_datetime  <U25 100B '2017-01-01T00:00:00+00:00'
    ...              ...
    end_datetime    <U25 100B '2017-12-31T23:59:59+00:00'
    proj:bbox       object 8B {-35.054059016911935, 51.86423292864056, 37.731...
    description     <U47 188B 'Cloud optimized default layer to display on map'
    title           <U17 68B 'Default COG Layer'
    raster:bands    object 8B {'scale': 1, 'nodata': 'inf', 'offset': 0, 'sam...
    epsg            int64 8B 4326
Attributes:
    spec:        RasterSpec(epsg=4326, bounds=(-18.274427824843425, -35.05405...
    crs:         epsg:4326
    transform:   | 0.00, 0.00,-18.27|\n| 0.00,-0.00, 37.73|\n| 0.00, 0.00, 1.00|
    resolution:  0.0008983152841195214

In this example, we’ll explore the data contained in the NCEO AGB collection and analyze it for each of the districts in Guinea. To do this we will need to import district (administrative level 2) boundary layers from below. We will use the Humanitarian Data Exchange (HDX) site to retrieve subnational administrative boundaries for Guinea. Specifically, we will use the geoBoundaries-GIN-ADM2_simplified.geojson which can be accessed here and read them in directly using geopandas.

import geopandas as gpd

admin2_gdf = gpd.read_file(
    "https://raw.githubusercontent.com/wmgeolab/geoBoundaries/0f0b6f5fb638e7faf115f876da4e77d8f7fa319f/releaseData/gbOpen/GIN/ADM2/geoBoundaries-GIN-ADM2_simplified.geojson"
)

# check the CRS
print(admin2_gdf.crs)

EPSG:4326

Now we can use the bounds of the admin boundaries to clip the data to a box containing Guinea.

biomass = da.rio.clip_box(*admin2_gdf.total_bounds).compute()

import hvplot.xarray


biomass.hvplot(
    x="x",
    y="y",
    coastline=True,
    rasterize=True,
    cmap="viridis",
    widget_location="bottom",
    frame_width=600,
)

Zonal Statistics

This map we created above is great, but let’s focus on which districts (administrative level 2 boundaries) should be prioritized for forest conservation.

Zonal statistics is an operation that calculates statistics on the cell values of a raster layer (e.g., the NCEO AGB dataset) within the zones (i.e., polygons) of another dataset. It is an analytical tool that can calculate the mean, median, sum, minimum, maximum, or range in each zone. The zonal extent, often polygons, can be in the form of objects like administrative boundaries, water catchment areas, or field boundaries.

import pandas as pd
from rasterstats import zonal_stats


admin2_biomass = pd.DataFrame(
    zonal_stats(
        admin2_gdf,
        biomass.values,
        affine=biomass.rio.transform(),
        nodata=biomass["raster:bands"].values.tolist()["nodata"],
        band=1,
    ),
    index=admin2_gdf.index,
)
admin2_biomass

	max	mean	count
0	565.0	51.445738	1276378
1	546.0	46.846892	527269
2	513.0	48.862863	1107831
3	345.0	31.315987	41660
4	494.0	47.809879	116515
5	422.0	57.583787	530195
6	548.0	48.555337	317191
7	438.0	43.717062	1176897
8	448.0	44.744092	403399
9	533.0	78.724252	1315404
10	422.0	46.973538	426420
11	452.0	50.016374	161593
12	483.0	52.088640	1145951
13	563.0	89.595375	429232
14	503.0	51.326848	1769560
15	558.0	65.559085	955743
16	486.0	48.230132	897380
17	590.0	90.110284	630818
18	413.0	47.668705	364092
19	339.0	37.304653	544541
20	501.0	57.187525	1614332
21	470.0	46.776737	216368
22	469.0	57.435206	278955
23	592.0	71.918267	461576
24	623.0	123.937151	814877
25	451.0	45.058698	859223
26	564.0	74.196642	1042860
27	406.0	33.353046	1191380
28	592.0	86.547049	413435
29	560.0	57.591189	460766
30	392.0	29.201285	1813376
31	554.0	57.991285	771431
32	389.0	49.663329	615108
33	588.0	131.323274	326021

Now we’ll join the administrative level 2 boundaries to the zonal statistics results, so that we can map the districts on a choropleth map.

concat_df = admin2_gdf.join(admin2_biomass)
concat_df.head()

	OBJECTID	ISO Code	shapeName	Level	shapeID	shapeGroup	shapeType	geometry	max	mean	count
0	1	GN-BE	Beyla	ADM2	GIN-ADM2-49546643B63767081	GIN	ADM2	POLYGON ((-8.24559 8.44255, -8.24158 8.45044, ...	565.0	51.445738	1276378
1	2	GN-BF	Boffa	ADM2	GIN-ADM2-49546643B69790359	GIN	ADM2	MULTIPOLYGON (((-13.77147 9.84445, -13.76994 9...	546.0	46.846892	527269
2	3	GN-BK	Boke	ADM2	GIN-ADM2-49546643B67680147	GIN	ADM2	MULTIPOLYGON (((-14.57512 10.76872, -14.57633 ...	513.0	48.862863	1107831
3	4	GN-C	Conakry	ADM2	GIN-ADM2-49546643B26553537	GIN	ADM2	MULTIPOLYGON (((-13.78686 9.46592, -13.79013 9...	345.0	31.315987	41660
4	5	GN-CO	Coyah	ADM2	GIN-ADM2-49546643B29309121	GIN	ADM2	POLYGON ((-13.49399 9.53945, -13.4805 9.55304,...	494.0	47.809879	116515

By sorting the results, we can identify those top districts with the highest mean AGB.

concat_df_sorted = concat_df.sort_values(by="mean", ascending=False)
concat_df_sorted.head()

	OBJECTID	ISO Code	shapeName	Level	shapeID	shapeGroup	shapeType	geometry	max	mean	count
33	34	GN-YO	Yomou	ADM2	GIN-ADM2-49546643B32761429	GIN	ADM2	POLYGON ((-9.34981 7.75681, -9.34896 7.7535, -...	588.0	131.323274	326021
24	25	GN-MC	Macenta	ADM2	GIN-ADM2-49546643B91718973	GIN	ADM2	POLYGON ((-8.95774 8.77472, -9.01024 8.79308, ...	623.0	123.937151	814877
17	18	GN-KS	Kissidougou	ADM2	GIN-ADM2-49546643B39508892	GIN	ADM2	POLYGON ((-10.45426 9.10945, -10.45334 9.08925...	590.0	90.110284	630818
13	14	GN-GU	Gueckedou	ADM2	GIN-ADM2-49546643B59147082	GIN	ADM2	POLYGON ((-10.59971 9.05848, -10.59402 9.05494...	563.0	89.595375	429232
28	29	GN-NZ	Nzerekore	ADM2	GIN-ADM2-49546643B97455025	GIN	ADM2	POLYGON ((-8.93454 8.25441, -8.93687 8.25503, ...	592.0	86.547049	413435

Visualizing the results with a choropleth map

Now, let’s visualize the results!

import hvplot.pandas

# renaming the shapeName to District for improved legend
concat_df.rename(columns={"shapeName": "District"}, inplace=True)

agb = concat_df.hvplot(
    c="mean",
    width=900,
    height=500,
    geo=True,
    hover_cols=["mean", "District"],
    cmap="viridis",
    hover_fill_color="white",
    line_width=1,
    title="Mean Aboveground Woody Biomass per Guinean District (Mg ha-1)",
    tiles="CartoLight",
)

agb

By hovering over the map, we can identify the names and mean AGB per district.

Summary

In this case study we have successfully performed zonal statistics on the NCEO AGB dataset in Guinea and displayed the results on a choropleth map. The results of this analysis can dispaly those districts which contain the greatest average amount of AGB and should be prioritized for forest protection efforts.