Visualizing NCEO Aboveground Woody Biomass 2017 prioritization areas

Explores NCEO Aboveground Woody Biomass priority areas in Guinea using zonal statistics.
Author

Kathryn Berger

Published

March 20, 2023

Run this notebook

You can launch this notebook in VEDA JupyterHub by clicking the link below.

Launch in VEDA JupyterHub (requires access)

Learn more

Inside the Hub

This notebook was written on the VEDA JupyterHub and as such is designed to be run on a jupyterhub which is associated with an AWS IAM role which has been granted permissions to the VEDA data store via its bucket policy. The instance used provided 16GB of RAM.

See (VEDA Analytics JupyterHub Access)[https://nasa-impact.github.io/veda-docs/veda-jh-access.html] for information about how to gain access.

Outside the Hub

The data is in a protected bucket. Please request access by emailng aimee@developmentseed.org or alexandra@developmentseed.org and providing your affiliation, interest in or expected use of the dataset and an AWS IAM role or user Amazon Resource Name (ARN). The team will help you configure the cognito client.

You should then run:

%run -i 'cognito_login.py'

Approach

  1. Query STAC API and explore item contents for a given collection
  2. Read and access the data
  3. Visualize the collection with hvplot
  4. Run zonal statistics on collection using rasterstats
  5. Visualize resultant zonal statistics on a choropleth map

About the Data

The NCEO Aboveground Woody Biomass 2017 dataset is a map for Africa at 100 m spatial resolution which was developed using a combination of LiDAR, Synthetic Aperture Radar (SAR) and optical based data. Aboveground woody biomass (AGB) plays an key role in the study of the Earth’s carbon cycle and response to climate change. Estimation based on Earth Observation measurements is an effective method for regional scale studies and the results are expressed as dry matter in Mg ha-1.

Important Note: Users of this dataset should keep in mind that the map is a continental-scale dataset, generated using a combination of different remote sensing data types, with a single method for the whole study area produced in 2017. Users, therefore, should understand that accuracy may vary for different regions and vegetation types.

The Case Study - Guinea

Mapping and understanding the spatial distribution of AGB is key to understanding carbon dioxide emissions from tropical deforestation through the loss of woody carbon stocks. The resulting carbon fluxes from these land-use changes and vegetation degradation can have negative impacts on the global carbon cycle. Change analysis between AGB maps overtime can display losses in high biomass forests, due to suspected deforestation and forest degredation.

The forests of southern Guinea are reported to have some of the highest density AGB of any forest in the world and are one of the most threatened ecoregions in Africa. Importantly, this area was also the epicenter of the 2014 Ebola outbreak, which had a lasting impact on the region. There is more and more evidence that human deforestation activities in this area may have accelerated the spread of the deadly virus as a result of increasing human-bat interactions in the region.

In this example we explore the NCEO AGB dataset for 2017, running zonal statistics at the district (administrative 2) level to understand those areas in Guinea that need greatest prioritization for protection and conservation.

Setting up the Environment

To run zonal statistics we’ll need to import a python package called rasterstats into our environment. You can uncomment the following line for installation. This cell needs only needs to be run once.

!pip install rasterstats --quiet

Querying the STAC API

from pystac_client import Client
# Provide STAC API endpoint
STAC_API_URL = "https://openveda.cloud/api/stac/"

# Declare collection of interest - NCEO Biomass
collection = "nceo_africa_2017"

Now let’s check how many total items are available.

search = Client.open(STAC_API_URL).search(collections=[collection])
items = list(search.items())
print(f"Found {len(items)} items")
Found 1 items

This makes sense as there is only one item available: a map for 2017.

# Explore the "cog_default" asset of one item to see what it contains
items[0].assets["cog_default"].to_dict()
{'href': 's3://nasa-maap-data-store/file-staging/nasa-map/nceo-africa-2017/AGB_map_2017v0m_COG.tif',
 'type': 'image/tiff; application=geotiff; profile=cloud-optimized',
 'title': 'Default COG Layer',
 'description': 'Cloud optimized default layer to display on map',
 'raster:bands': [{'scale': 1,
   'nodata': 'inf',
   'offset': 0,
   'sampling': 'area',
   'data_type': 'uint16',
   'histogram': {'max': 429,
    'min': 0,
    'count': 11,
    'buckets': [405348,
     44948,
     18365,
     6377,
     3675,
     3388,
     3785,
     9453,
     13108,
     1186]},
   'statistics': {'mean': 37.58407913145342,
    'stddev': 81.36678677343947,
    'maximum': 429,
    'minimum': 0,
    'valid_percent': 50.42436439336373}}],
 'roles': ['data', 'layer']}

Explore through the item’s assets. We can see from the data’s statistics values that the min and max values for the observed values range from 0 to 429 Mg ha-1.

Reading and accessing the data

Now that we’ve explored the dataset through the STAC API, let’s read and access the dataset itself.

import stackstac
import rioxarray


da = stackstac.stack(items[0])
da
/srv/conda/envs/notebook/lib/python3.11/site-packages/stackstac/prepare.py:408: UserWarning: The argument 'infer_datetime_format' is deprecated and will be removed in a future version. A strict version of it is now the default, see https://pandas.pydata.org/pdeps/0004-consistent-to-datetime-parsing.html. You can safely remove this argument.
  times = pd.to_datetime(
<xarray.DataArray 'stackstac-52df0ce77b6e6dbb5f28f680fe94f581' (time: 1,
                                                                band: 1,
                                                                y: 81025,
                                                                x: 78078)> Size: 51GB
dask.array<fetch_raster_window, shape=(1, 1, 81025, 78078), dtype=float64, chunksize=(1, 1, 1024, 1024), chunktype=numpy.ndarray>
Coordinates: (12/16)
  * time            (time) datetime64[ns] 8B NaT
    id              (time) <U19 76B 'AGB_map_2017v0m_COG'
  * band            (band) <U11 44B 'cog_default'
  * x               (x) float64 625kB -18.27 -18.27 -18.27 ... 51.86 51.86 51.86
  * y               (y) float64 648kB 37.73 37.73 37.73 ... -35.05 -35.05 -35.05
    proj:bbox       object 8B {-35.054059016911935, 51.86423292864056, 37.731...
    ...              ...
    proj:shape      object 8B {81024, 78077}
    start_datetime  <U25 100B '2017-01-01T00:00:00+00:00'
    raster:bands    object 8B {'scale': 1, 'nodata': 'inf', 'offset': 0, 'sam...
    title           <U17 68B 'Default COG Layer'
    description     <U47 188B 'Cloud optimized default layer to display on map'
    epsg            int64 8B 4326
Attributes:
    spec:        RasterSpec(epsg=4326, bounds=(-18.274427824843425, -35.05405...
    crs:         epsg:4326
    transform:   | 0.00, 0.00,-18.27|\n| 0.00,-0.00, 37.73|\n| 0.00, 0.00, 1.00|
    resolution:  0.0008983152841195214

In this example, we’ll explore the data contained in the NCEO AGB collection and analyze it for each of the districts in Guinea. To do this we will need to import district (administrative level 2) boundary layers from below. We will use the Humanitarian Data Exchange (HDX) site to retrieve subnational administrative boundaries for Guinea. Specifically, we will use the geoBoundaries-GIN-ADM2_simplified.geojson which can be accessed here and read them in directly using geopandas.

import geopandas as gpd

admin2_gdf = gpd.read_file(
    "https://raw.githubusercontent.com/wmgeolab/geoBoundaries/0f0b6f5fb638e7faf115f876da4e77d8f7fa319f/releaseData/gbOpen/GIN/ADM2/geoBoundaries-GIN-ADM2_simplified.geojson"
)
# check the CRS
print(admin2_gdf.crs)
EPSG:4326

Now we can use the bounds of the admin boundaries to clip the data to a box containing Guinea.

subset = da.rio.clip_box(*admin2_gdf.total_bounds)
# select the band of interest, as there is only one in this dataset we'll select the default
data_band = subset.sel(band="cog_default")
data_band
<xarray.DataArray 'stackstac-52df0ce77b6e6dbb5f28f680fe94f581' (time: 1,
                                                                y: 6104, x: 8289)> Size: 405MB
dask.array<getitem, shape=(1, 6104, 8289), dtype=float64, chunksize=(1, 1024, 1024), chunktype=numpy.ndarray>
Coordinates: (12/17)
  * time            (time) datetime64[ns] 8B NaT
    id              (time) <U19 76B 'AGB_map_2017v0m_COG'
    band            <U11 44B 'cog_default'
  * x               (x) float64 66kB -15.09 -15.09 -15.08 ... -7.642 -7.641
  * y               (y) float64 49kB 12.68 12.68 12.67 ... 7.196 7.195 7.194
    proj:bbox       object 8B {37.73103856358817, 51.86423292864056, -35.0540...
    ...              ...
    start_datetime  <U25 100B '2017-01-01T00:00:00+00:00'
    raster:bands    object 8B {'scale': 1, 'nodata': 'inf', 'offset': 0, 'sam...
    title           <U17 68B 'Default COG Layer'
    description     <U47 188B 'Cloud optimized default layer to display on map'
    epsg            int64 8B 4326
    spatial_ref     int64 8B 0
Attributes:
    spec:        RasterSpec(epsg=4326, bounds=(-18.274427824843425, -35.05405...
    resolution:  0.0008983152841195214
import hvplot.xarray

biomass = data_band.squeeze()
biomass

biomass.hvplot(
    x="x",
    y="y",
    coastline=True,
    rasterize=True,
    cmap="viridis",
    widget_location="bottom",
    frame_width=600,
)

Zonal Statistics

This map we created above is great, but let’s focus on which districts (administrative level 2 boundaries) should be prioritized for forest conservation.

Zonal statistics is an operation that calculates statistics on the cell values of a raster layer (e.g., the NCEO AGB dataset) within the zones (i.e., polygons) of another dataset. It is an analytical tool that can calculate the mean, median, sum, minimum, maximum, or range in each zone. The zonal extent, often polygons, can be in the form of objects like administrative boundaries, water catchment areas, or field boundaries.

import pandas as pd
from rasterstats import zonal_stats


admin2_biomass = pd.DataFrame(
    zonal_stats(
        admin2_gdf,
        biomass.values,
        affine=biomass.rio.transform(),
        nodata=biomass.rio.nodata,
        band=1,
    ),
    index=admin2_gdf.index,
)
admin2_biomass
/srv/conda/envs/notebook/lib/python3.11/site-packages/rasterstats/io.py:328: NodataWarning: Setting nodata to -999; specify nodata explicitly
  warnings.warn(
min max mean count
0 0.0 565.0 51.445738 1276378
1 0.0 546.0 46.846892 527269
2 0.0 513.0 48.862863 1107831
3 0.0 345.0 31.315987 41660
4 0.0 494.0 47.809879 116515
5 0.0 422.0 57.583787 530195
6 0.0 548.0 48.555337 317191
7 0.0 438.0 43.717062 1176897
8 0.0 448.0 44.744092 403399
9 0.0 533.0 78.724252 1315404
10 0.0 422.0 46.973538 426420
11 0.0 452.0 50.016374 161593
12 0.0 483.0 52.088640 1145951
13 0.0 563.0 89.595375 429232
14 0.0 503.0 51.326848 1769560
15 0.0 558.0 65.559085 955743
16 0.0 486.0 48.230132 897380
17 0.0 590.0 90.110284 630818
18 0.0 413.0 47.668705 364092
19 0.0 339.0 37.304653 544541
20 0.0 501.0 57.187525 1614332
21 0.0 470.0 46.776737 216368
22 0.0 469.0 57.435206 278955
23 0.0 592.0 71.918267 461576
24 0.0 623.0 123.937151 814877
25 0.0 451.0 45.058698 859223
26 0.0 564.0 74.196642 1042860
27 0.0 406.0 33.353046 1191380
28 0.0 592.0 86.547049 413435
29 0.0 560.0 57.591189 460766
30 0.0 392.0 29.201285 1813376
31 0.0 554.0 57.991285 771431
32 0.0 389.0 49.663329 615108
33 0.0 588.0 131.323274 326021

Now we’ll join the administrative level 2 boundaries to the zonal statistics results, so that we can map the districts on a choropleth map.

concat_df = admin2_gdf.join(admin2_biomass)
concat_df
OBJECTID ISO Code shapeName Level shapeID shapeGroup shapeType geometry min max mean count
0 1 GN-BE Beyla ADM2 GIN-ADM2-49546643B63767081 GIN ADM2 POLYGON ((-8.24559 8.44255, -8.24158 8.45044, ... 0.0 565.0 51.445738 1276378
1 2 GN-BF Boffa ADM2 GIN-ADM2-49546643B69790359 GIN ADM2 MULTIPOLYGON (((-13.77147 9.84445, -13.76994 9... 0.0 546.0 46.846892 527269
2 3 GN-BK Boke ADM2 GIN-ADM2-49546643B67680147 GIN ADM2 MULTIPOLYGON (((-14.57512 10.76872, -14.57633 ... 0.0 513.0 48.862863 1107831
3 4 GN-C Conakry ADM2 GIN-ADM2-49546643B26553537 GIN ADM2 MULTIPOLYGON (((-13.78686 9.46592, -13.79013 9... 0.0 345.0 31.315987 41660
4 5 GN-CO Coyah ADM2 GIN-ADM2-49546643B29309121 GIN ADM2 POLYGON ((-13.49399 9.53945, -13.48050 9.55304... 0.0 494.0 47.809879 116515
5 6 GN-DB Dabola ADM2 GIN-ADM2-49546643B70320134 GIN ADM2 POLYGON ((-10.46739 10.53598, -10.46752 10.545... 0.0 422.0 57.583787 530195
6 7 GN-DL Dalaba ADM2 GIN-ADM2-49546643B47404564 GIN ADM2 POLYGON ((-12.01167 11.29091, -12.03171 11.288... 0.0 548.0 48.555337 317191
7 8 GN-DI Dinguiraye ADM2 GIN-ADM2-49546643B47728803 GIN ADM2 POLYGON ((-10.72063 11.13326, -10.72092 11.144... 0.0 438.0 43.717062 1176897
8 9 GN-DU Dubreka ADM2 GIN-ADM2-49546643B78750611 GIN ADM2 MULTIPOLYGON (((-13.76504 9.82404, -13.75194 9... 0.0 448.0 44.744092 403399
9 10 GN-FA Faranah ADM2 GIN-ADM2-49546643B99428691 GIN ADM2 POLYGON ((-11.38731 10.39356, -11.38273 10.350... 0.0 533.0 78.724252 1315404
10 11 GN-FO Forecariah ADM2 GIN-ADM2-49546643B32851960 GIN ADM2 MULTIPOLYGON (((-13.32015 9.14776, -13.32062 9... 0.0 422.0 46.973538 426420
11 12 GN-FR Fria ADM2 GIN-ADM2-49546643B75641357 GIN ADM2 POLYGON ((-13.76799 10.27884, -13.73119 10.276... 0.0 452.0 50.016374 161593
12 13 GN-GA Gaoual ADM2 GIN-ADM2-49546643B44796554 GIN ADM2 POLYGON ((-13.84293 11.29667, -13.83242 11.291... 0.0 483.0 52.088640 1145951
13 14 GN-GU Gueckedou ADM2 GIN-ADM2-49546643B59147082 GIN ADM2 POLYGON ((-10.59971 9.05848, -10.59402 9.05494... 0.0 563.0 89.595375 429232
14 15 GN-KA Kankan ADM2 GIN-ADM2-49546643B19447005 GIN ADM2 POLYGON ((-8.14727 9.58395, -8.15293 9.58911, ... 0.0 503.0 51.326848 1769560
15 16 GN-KE Kerouane ADM2 GIN-ADM2-49546643B28981869 GIN ADM2 POLYGON ((-8.61661 9.50260, -8.60868 9.51354, ... 0.0 558.0 65.559085 955743
16 17 GN-KD Kindia ADM2 GIN-ADM2-49546643B38105311 GIN ADM2 POLYGON ((-13.11475 9.58669, -13.10890 9.58190... 0.0 486.0 48.230132 897380
17 18 GN-KS Kissidougou ADM2 GIN-ADM2-49546643B39508892 GIN ADM2 POLYGON ((-10.45426 9.10945, -10.45334 9.08925... 0.0 590.0 90.110284 630818
18 19 GN-KB Koubia ADM2 GIN-ADM2-49546643B329053 GIN ADM2 POLYGON ((-11.30453 12.01713, -11.31240 12.021... 0.0 413.0 47.668705 364092
19 20 GN-KN Koundara ADM2 GIN-ADM2-49546643B74925550 GIN ADM2 POLYGON ((-12.82676 12.14425, -12.76880 12.221... 0.0 339.0 37.304653 544541
20 21 GN-KO Kouroussa ADM2 GIN-ADM2-49546643B81289084 GIN ADM2 POLYGON ((-10.46739 10.53598, -10.46733 10.531... 0.0 501.0 57.187525 1614332
21 22 GN-LA Labe ADM2 GIN-ADM2-49546643B47788034 GIN ADM2 POLYGON ((-12.01167 11.29091, -11.98685 11.320... 0.0 470.0 46.776737 216368
22 23 GN-LE Lelouma ADM2 GIN-ADM2-49546643B80531036 GIN ADM2 POLYGON ((-12.99636 11.18952, -12.98648 11.187... 0.0 469.0 57.435206 278955
23 24 GN-LO Lola ADM2 GIN-ADM2-49546643B51651521 GIN ADM2 POLYGON ((-8.46455 8.27185, -8.44429 8.25379, ... 0.0 592.0 71.918267 461576
24 25 GN-MC Macenta ADM2 GIN-ADM2-49546643B91718973 GIN ADM2 POLYGON ((-8.95774 8.77472, -9.01024 8.79308, ... 0.0 623.0 123.937151 814877
25 26 GN-ML Mali ADM2 GIN-ADM2-49546643B68291102 GIN ADM2 POLYGON ((-12.76304 11.85482, -12.74823 11.857... 0.0 451.0 45.058698 859223
26 27 GN-MM Mamou ADM2 GIN-ADM2-49546643B49157402 GIN ADM2 POLYGON ((-11.15547 11.05524, -11.13717 11.074... 0.0 564.0 74.196642 1042860
27 28 GN-MD Mandiana ADM2 GIN-ADM2-49546643B49348937 GIN ADM2 POLYGON ((-8.13614 10.00000, -8.13498 10.00774... 0.0 406.0 33.353046 1191380
28 29 GN-NZ Nzerekore ADM2 GIN-ADM2-49546643B97455025 GIN ADM2 POLYGON ((-8.93454 8.25441, -8.93687 8.25503, ... 0.0 592.0 86.547049 413435
29 30 GN-PI Pita ADM2 GIN-ADM2-49546643B22597757 GIN ADM2 POLYGON ((-12.20899 11.16225, -12.21822 11.152... 0.0 560.0 57.591189 460766
30 31 GN-SI Siguiri ADM2 GIN-ADM2-49546643B98837050 GIN ADM2 POLYGON ((-10.00475 11.40696, -10.00285 11.401... 0.0 392.0 29.201285 1813376
31 32 GN-TE Telimele ADM2 GIN-ADM2-49546643B10795278 GIN ADM2 POLYGON ((-13.65247 10.66825, -13.59967 10.711... 0.0 554.0 57.991285 771431
32 33 GN-TO Tougue ADM2 GIN-ADM2-49546643B67909893 GIN ADM2 POLYGON ((-11.74293 10.98745, -11.70851 11.019... 0.0 389.0 49.663329 615108
33 34 GN-YO Yomou ADM2 GIN-ADM2-49546643B32761429 GIN ADM2 POLYGON ((-9.34981 7.75681, -9.34896 7.75350, ... 0.0 588.0 131.323274 326021

By sorting the results, we can identify those top districts with the highest mean AGB.

concat_df_sorted = concat_df.sort_values(by="mean", ascending=False)
concat_df_sorted.head()
OBJECTID ISO Code shapeName Level shapeID shapeGroup shapeType geometry min max mean count
33 34 GN-YO Yomou ADM2 GIN-ADM2-49546643B32761429 GIN ADM2 POLYGON ((-9.34981 7.75681, -9.34896 7.75350, ... 0.0 588.0 131.323274 326021
24 25 GN-MC Macenta ADM2 GIN-ADM2-49546643B91718973 GIN ADM2 POLYGON ((-8.95774 8.77472, -9.01024 8.79308, ... 0.0 623.0 123.937151 814877
17 18 GN-KS Kissidougou ADM2 GIN-ADM2-49546643B39508892 GIN ADM2 POLYGON ((-10.45426 9.10945, -10.45334 9.08925... 0.0 590.0 90.110284 630818
13 14 GN-GU Gueckedou ADM2 GIN-ADM2-49546643B59147082 GIN ADM2 POLYGON ((-10.59971 9.05848, -10.59402 9.05494... 0.0 563.0 89.595375 429232
28 29 GN-NZ Nzerekore ADM2 GIN-ADM2-49546643B97455025 GIN ADM2 POLYGON ((-8.93454 8.25441, -8.93687 8.25503, ... 0.0 592.0 86.547049 413435

Visualizing the results with a choropleth map

Now, let’s visualize the results!

import hvplot.pandas

# renaming the shapeName to District for improved legend
concat_df.rename(columns={"shapeName": "District"}, inplace=True)

agb = concat_df.hvplot(
    c="mean",
    width=900,
    height=500,
    geo=True,
    hover_cols=["mean", "District"],
    cmap="viridis",
    hover_fill_color="white",
    line_width=1,
    title="Mean Aboveground Woody Biomass per Guinean District (Mg ha-1)",
    tiles="CartoLight",
)

agb

By hovering over the map, we can identify the names and mean AGB per district.

Summary

In this case study we have successfully performed zonal statistics on the NCEO AGB dataset in Guinea and displayed the results on a choropleth map. The results of this analysis can dispaly those districts which contain the greatest average amount of AGB and should be prioritized for forest protection efforts.