import os
import rio_cogeo
import rasterio
import boto3
import requests
Ingestion Workflow for Uploading Data to the VEDA Catalog for the VEDA Dashboard
Approach
This notebook is intended to be used as a reference for data providers who want to add new datasets to the VEDA Dashboard. As always, it is important that the data provider has read the documentation for Data Ingestion before moving forward with this notebook example.
For example purposes, we will walk the end user through adding the GEOGLAM June 2023 dataset directly to the VEDA Dashboard.
- Validate the GeoTIFF
- Upload the file to the staging S3 bucket (veda-data-store-staging)
- Use the workflows-api (staging.openveda.cloud/api/workflows/docs) to generate STAC metadata for the file and add it to the staging STAC catalog (staging.openveda.cloud)
When the data has been published to the STAC metadata catalog for this geoglam collection, which is already configured for the dashboard, it will be available in the VEDA Dashboard.
1. Validate data format
This step uses some geospatial tools for validation and defines some of the variables to be used, including the TARGET_FILENAME for the data file you want to upload. Note that in this example we will demonstrate the ingestion of GEOGLAM’s June 2023 data. It is important that the file you want to upload (e.g., CropMonitor_2023_06_28.tif) is located in the same repository folder as this notebook.
In the cell below, TARGET_FILENAME gives the file the name it will have once uploaded, following the file format advised in the File preparation documentation (see the example formats there). If LOCAL_FILE_PATH is already properly formatted, then LOCAL_FILE_PATH and TARGET_FILENAME will be identical.
LOCAL_FILE_PATH = "CropMonitor_2023_06_28.tif"
YEAR, MONTH = 2023, 6
TARGET_FILENAME = f"CropMonitor_{YEAR}{MONTH:02}.tif"
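If you have several files to rename, the year and month can be read from the source file name instead of being typed by hand. The sketch below assumes every source file follows the CropMonitor_YYYY_MM_DD.tif pattern seen in this example; the helper name target_filename is hypothetical and not part of any VEDA tooling.

```python
import re


def target_filename(local_name: str) -> str:
    """Build a CropMonitor_YYYYMM.tif name from a CropMonitor_YYYY_MM_DD.tif name."""
    # Assumed source pattern, taken from the single example file in this notebook
    match = re.fullmatch(r"CropMonitor_(\d{4})_(\d{2})_\d{2}\.tif", local_name)
    if match is None:
        raise ValueError(f"Unexpected file name: {local_name}")
    year, month = int(match.group(1)), int(match.group(2))
    if not 1 <= month <= 12:
        raise ValueError(f"Month out of range in file name: {local_name}")
    return f"CropMonitor_{year}{month:02}.tif"


print(target_filename("CropMonitor_2023_06_28.tif"))  # CropMonitor_202306.tif
```

Raising an error on an unexpected name is a deliberate choice here: a silently wrong target name would end up in the S3 key and in the catalog.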
The following code tests whether the file you are planning to upload is a Cloud Optimized GeoTIFF (COG), a format that enables more efficient workflows in the cloud environment. If validation finds that the file is not a COG, the code converts it into one.
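from rio_cogeo.profiles import cog_profiles

# cog_validate returns a tuple: (is_valid, errors, warnings)
file_is_a_cog, errors, warnings = rio_cogeo.cog_validate(LOCAL_FILE_PATH)
if not file_is_a_cog:
    print("File is not a COG - converting")
    # cog_translate requires an output profile; "deflate" is one of the built-in rio-cogeo profiles
    rio_cogeo.cog_translate(LOCAL_FILE_PATH, LOCAL_FILE_PATH, cog_profiles.get("deflate"), in_memory=True)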
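Before running the COG validation, it can help to confirm that the file is a TIFF at all, for instance when a download was truncated or saved with the wrong extension. The sketch below is only a cheap pre-check using the standard library: it reads the 4-byte TIFF header and says nothing about whether the file is cloud optimized. The function name looks_like_tiff is hypothetical.

```python
import struct


def looks_like_tiff(path: str) -> bool:
    """Return True if the file starts with a classic TIFF or BigTIFF header."""
    with open(path, "rb") as f:
        header = f.read(4)
    if len(header) < 4:
        return False
    byte_order = header[:2]
    if byte_order == b"II":    # little-endian
        fmt = "<H"
    elif byte_order == b"MM":  # big-endian
        fmt = ">H"
    else:
        return False
    (version,) = struct.unpack(fmt, header[2:4])
    return version in (42, 43)  # 42 = classic TIFF, 43 = BigTIFF
```

A call such as looks_like_tiff("CropMonitor_2023_06_28.tif") returning False means the COG validation step has nothing valid to work with.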
2. Upload file to S3
The following code will upload your COG data into the veda-data-store-staging bucket. It uses TARGET_FILENAME, which carries the month and year values we provided earlier in this notebook, as the object name under the geoglam prefix of the bucket on S3.
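s3 = boto3.client("s3")
BUCKET = "veda-data-store-staging"
# The object key is the path inside the bucket; it does not include the bucket name
KEY = f"geoglam/{TARGET_FILENAME}"
S3_FILE_LOCATION = f"s3://{BUCKET}/{KEY}"

# The upload is disabled here; change False to True to upload the file
if False:
    # upload_file takes the local path, the bucket name, and the object key
    s3.upload_file(LOCAL_FILE_PATH, BUCKET, KEY)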
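The bucket name, the object key, and the s3:// location are three different strings, and mixing them up is an easy mistake. As a small illustration, the hypothetical helper below builds the key and the full location from their parts; boto3's upload_file takes the local file name, the bucket, and the key as separate arguments, so the key itself never contains the bucket name.

```python
def s3_location(bucket: str, prefix: str, filename: str) -> tuple[str, str]:
    """Return (object key, s3:// URI) for a file stored under a prefix in a bucket."""
    key = f"{prefix.strip('/')}/{filename}"
    return key, f"s3://{bucket}/{key}"


key, uri = s3_location("veda-data-store-staging", "geoglam", "CropMonitor_202306.tif")
print(key)  # geoglam/CropMonitor_202306.tif
print(uri)  # s3://veda-data-store-staging/geoglam/CropMonitor_202306.tif
```

The printed URI has the same form as the sample_files entry used in the dataset definition later in this notebook.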
3. Use the workflows-api to add this geoglam item to the staging catalog
For this step, open the workflows API at staging.openveda.cloud/api/workflows/docs in a second browser tab and click the green Authorize button at the upper right to authenticate your session with your username and password (you will be temporarily redirected to a login widget and then back to the workflows-api docs). The cells below will guide you through configuring the request JSON for each endpoint demonstrated, and you will copy the cell outputs into the workflows API in your second tab.
3a. Construct dataset definition
Here the data provider will construct the dataset definition (and supporting metadata) that will be used for dataset ingestion. It is imperative that these values are correct and align with the data the provider is planning to upload to the VEDA Platform. For example, make sure that the startdate and enddate are realistic (e.g., an "enddate":"2023-06-31T23:59:59Z" would be an incorrect value for June, as it contains only 30 days).
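One way to avoid impossible dates is to compute the last day of the month rather than typing it, and to parse any hand-written value before using it. The sketch below uses only the standard library; the helper names month_extent and is_valid_datetime are hypothetical, and the timestamp format is assumed from the examples in this notebook.

```python
import calendar
from datetime import datetime


def month_extent(year: int, month: int) -> dict:
    """Return startdate/enddate strings covering one calendar month."""
    last_day = calendar.monthrange(year, month)[1]
    return {
        "startdate": f"{year}-{month:02}-01T00:00:00Z",
        "enddate": f"{year}-{month:02}-{last_day:02}T23:59:59Z",
    }


def is_valid_datetime(value: str) -> bool:
    """Return True if value is a real date in the YYYY-MM-DDTHH:MM:SSZ format."""
    try:
        datetime.strptime(value, "%Y-%m-%dT%H:%M:%SZ")
        return True
    except ValueError:
        return False


print(month_extent(2023, 6)["enddate"])           # 2023-06-30T23:59:59Z
print(is_valid_datetime("2023-06-31T23:59:59Z"))  # False
```

Note that month_extent covers a single month only; a collection that spans several years needs its own startdate.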
For further detail on metadata required for entries in the VEDA STAC to work with the VEDA Dashboard, see the documentation. In particular, note recommendations for the fields is_periodic and time_density. For example, in the code block below we define the is_periodic field as False because we are ingesting only one month of data. Even though we know that the monthly observations are provided routinely by GEOGLAM, we will only have a single file to ingest and so do not have a temporal range of items in the collection with a monthly time density to generate a time picker from the available data.
Note: Several OPTIONAL properties are added to this dataset config for completeness. Your dataset JSON does NOT need to include these optional properties:
- assets
- item_assets
- renders
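If you start from a complete config and want a minimal one, the three optional properties can be dropped programmatically. This is only a sketch: the helper name without_optional is hypothetical, and it makes no claim about which of the remaining properties are required.

```python
# The three properties this notebook describes as optional
OPTIONAL_KEYS = ("assets", "item_assets", "renders")


def without_optional(dataset: dict) -> dict:
    """Return a copy of the dataset config without the optional properties."""
    return {key: value for key, value in dataset.items() if key not in OPTIONAL_KEYS}


example = {"collection": "geoglam", "data_type": "cog", "renders": {"dashboard": {}}, "assets": {}}
print(sorted(without_optional(example)))  # ['collection', 'data_type']
```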
import json

dataset = {
    "collection": "geoglam",
    "title": "GEOGLAM Crop Monitor",
    "data_type": "cog",
    "spatial_extent": {
        "xmin": -180,
        "ymin": -90,
        "xmax": 180,
        "ymax": 90
    },
    "temporal_extent": {
        "startdate": "2020-01-01T00:00:00Z",
        "enddate": "2023-06-30T23:59:59Z"
    },
    "license": "MIT",
    "description": "The Crop Monitors were designed to provide a public good of open, timely, science-driven information on crop conditions in support of market transparency for the G20 Agricultural Market Information System (AMIS). Reflecting an international, multi-source, consensus assessment of crop growing conditions, status, and agro-climatic factors likely to impact global production, focusing on the major producing and trading countries for the four primary crops monitored by AMIS (wheat, maize, rice, and soybeans). The Crop Monitor for AMIS brings together over 40 partners from national, regional (i.e. sub-continental), and global monitoring systems, space agencies, agriculture organizations and universities. Read more: https://cropmonitor.org/index.php/about/aboutus/",
    "is_periodic": False,
    "time_density": "month",
    ## NOTE: email the veda team at veda@uah.edu to upload a new thumbnail for your dataset
    "assets": {
        "thumbnail": {
            "href": "https://thumbnails.openveda.cloud/geoglam--dataset-cover.jpg",
            "type": "image/jpeg",
            "roles": ["thumbnail"],
            "title": "Thumbnail",
            "description": "Photo by [Jean Wimmerlin](https://unsplash.com/photos/RUj5b4YXaHE) (Bird's eye view of fields)"
        }
    },
    ## RENDERS metadata are OPTIONAL but provided below
    "renders": {
        "dashboard": {
            "bidx": [1],
            "title": "VEDA Dashboard Render Parameters",
            "assets": [
                "cog_default"
            ],
            "unscale": False,
            "colormap": {
                "1": [120, 120, 120],
                "2": [130, 65, 0],
                "3": [66, 207, 56],
                "4": [245, 239, 0],
                "5": [241, 89, 32],
                "6": [168, 0, 0],
                "7": [0, 143, 201]
            },
            "max_size": 1024,
            "resampling": "nearest",
            "return_mask": True
        }
    },
    ## IMPORTANT: update providers for your data, some are specific to each collection
    "providers": [
        {
            "url": "https://data.nal.usda.gov/dataset/geoglam-geo-global-agricultural-monitoring-crop-assessment-tool#:~:text=The%20GEOGLAM%20crop%20calendars%20are,USDA%20FAS%2C%20and%20USDA%20NASS.",
            "name": "USDA & Global Crop Monitor Group partners",
            "roles": [
                "producer",
                "processor",
                "licensor"
            ]
        },
        {
            "url": "https://www.earthdata.nasa.gov/dashboard/",
            "name": "NASA VEDA",
            "roles": ["host"]
        }
    ],
    ## item_assets are OPTIONAL but pre-filled here
    "item_assets": {
        "cog_default": {
            "type": "image/tiff; application=geotiff; profile=cloud-optimized",
            "roles": ["data", "layer"],
            "title": "Default COG Layer",
            "description": "Cloud optimized default layer to display on map"
        }
    },
    "sample_files": [
        "s3://veda-data-store-staging/geoglam/CropMonitor_202306.tif"
    ],
    "discovery_items": [
        {
            "discovery": "s3",
            "prefix": "geoglam/",
            "bucket": "veda-data-store-staging",
            "filename_regex": "(.*)CropMonitor_202306.tif$"
        }
    ]
}

print(json.dumps(dataset, indent=2))
{
"collection": "geoglam",
"title": "GEOGLAM Crop Monitor",
"data_type": "cog",
"spatial_extent": {
"xmin": -180,
"ymin": -90,
"xmax": 180,
"ymax": 90
},
"temporal_extent": {
"startdate": "2020-01-01T00:00:00Z",
"enddate": "2023-06-30T23:59:59Z"
},
"license": "MIT",
"description": "The Crop Monitors were designed to provide a public good of open, timely, science-driven information on crop conditions in support of market transparency for the G20 Agricultural Market Information System (AMIS). Reflecting an international, multi-source, consensus assessment of crop growing conditions, status, and agro-climatic factors likely to impact global production, focusing on the major producing and trading countries for the four primary crops monitored by AMIS (wheat, maize, rice, and soybeans). The Crop Monitor for AMIS brings together over 40 partners from national, regional (i.e. sub-continental), and global monitoring systems, space agencies, agriculture organizations and universities. Read more: https://cropmonitor.org/index.php/about/aboutus/",
"is_periodic": false,
"time_density": "month",
"assets": {
"thumbnail": {
"href": "https://thumbnails.openveda.cloud/geoglam--dataset-cover.jpg",
"type": "image/jpeg",
"roles": [
"thumbnail"
],
"title": "Thumbnail",
"description": "Photo by [Jean Wimmerlin](https://unsplash.com/photos/RUj5b4YXaHE) (Bird's eye view of fields)"
}
},
"renders": {
"dashboard": {
"bidx": [
1
],
"title": "VEDA Dashboard Render Parameters",
"assets": [
"cog_default"
],
"unscale": false,
"colormap": {
"1": [
120,
120,
120
],
"2": [
130,
65,
0
],
"3": [
66,
207,
56
],
"4": [
245,
239,
0
],
"5": [
241,
89,
32
],
"6": [
168,
0,
0
],
"7": [
0,
143,
201
]
},
"max_size": 1024,
"resampling": "nearest",
"return_mask": true
}
},
"providers": [
{
"url": "https://data.nal.usda.gov/dataset/geoglam-geo-global-agricultural-monitoring-crop-assessment-tool#:~:text=The%20GEOGLAM%20crop%20calendars%20are,USDA%20FAS%2C%20and%20USDA%20NASS.",
"name": "USDA & Global Crop Monitor Group partners",
"roles": [
"producer",
"processor",
"licensor"
]
},
{
"url": "https://www.earthdata.nasa.gov/dashboard/",
"name": "NASA VEDA",
"roles": [
"host"
]
}
],
"item_assets": {
"cog_default": {
"type": "image/tiff; application=geotiff; profile=cloud-optimized",
"roles": [
"data",
"layer"
],
"title": "Default COG Layer",
"description": "Cloud optimized default layer to display on map"
}
},
"sample_files": [
"s3://veda-data-store-staging/geoglam/CropMonitor_202306.tif"
],
"discovery_items": [
{
"discovery": "s3",
"prefix": "geoglam/",
"bucket": "veda-data-store-staging",
"filename_regex": "(.*)CropMonitor_202306.tif$"
}
]
}
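A quick local check before submitting is to confirm that each entry in sample_files is matched by at least one filename_regex in discovery_items. How the ingest pipeline applies the pattern is not stated in this notebook, so the sketch below assumes a regular-expression search against the object key (the path after the bucket name); the helper name unmatched_samples is hypothetical.

```python
import re
from urllib.parse import urlparse


def unmatched_samples(dataset: dict) -> list:
    """Return the sample file URIs that no discovery item's filename_regex matches."""
    patterns = [
        re.compile(item["filename_regex"])
        for item in dataset.get("discovery_items", [])
    ]
    missing = []
    for uri in dataset.get("sample_files", []):
        # For s3://bucket/prefix/file.tif the key is "prefix/file.tif"
        key = urlparse(uri).path.lstrip("/")
        if not any(pattern.search(key) for pattern in patterns):
            missing.append(uri)
    return missing


config = {
    "sample_files": ["s3://veda-data-store-staging/geoglam/CropMonitor_202306.tif"],
    "discovery_items": [{"filename_regex": "(.*)CropMonitor_202306.tif$"}],
}
print(unmatched_samples(config))  # []
```

An empty list means every sample file would be picked up under this assumption; any URI in the result points to a pattern or file name that needs another look.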
3b. Validate dataset definition
After composing your dataset definition, copy the printed JSON and paste it into the /dataset/validate input in the workflows-api docs page in the second tab. Note that if you navigate away from this page you will need to click Authorize again.
Choose POST dataset/validate in the Dataset section of the API docs at staging.openveda.cloud/api/workflows/docs. Click "Try it out", paste your JSON into the Request body, and then click Execute.
If the json is valid, the response will confirm that it is ready to be published on the VEDA Platform.
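The same request can in principle be prepared from Python instead of the docs page. The sketch below only builds the request and does not send it. It rests on two assumptions that this notebook does not confirm: that the endpoints live under https://staging.openveda.cloud/api/workflows (inferred from the docs URL) and that the API accepts a bearer token in the Authorization header. The function name build_request and the token placeholder are hypothetical.

```python
import json
import urllib.request

# ASSUMPTION: base URL inferred from the docs page address
WORKFLOWS_API = "https://staging.openveda.cloud/api/workflows"


def build_request(endpoint: str, dataset: dict, token: str) -> urllib.request.Request:
    """Prepare (but do not send) a POST request carrying the dataset definition."""
    return urllib.request.Request(
        url=f"{WORKFLOWS_API}/{endpoint.strip('/')}",
        data=json.dumps(dataset).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            # ASSUMPTION: bearer-token authentication
            "Authorization": f"Bearer {token}",
        },
        method="POST",
    )


request = build_request("dataset/validate", {"collection": "geoglam"}, token="<your-token>")
print(request.get_method(), request.full_url)
```

Passing the prepared object to urllib.request.urlopen would send it; check the API docs page for the authentication scheme actually in use before doing so.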
3c. Publish to STAC
Now that you have validated your dataset, you can initiate a workflow and publish the dataset to the VEDA Platform.
Choose POST dataset/publish in the Dataset section of the API docs at staging.openveda.cloud/api/workflows/docs. Click "Try it out", paste your JSON into the Request body, and then click Execute.
On success, you will receive a success message containing the id of your workflow, for example:
{"message":"Successfully published collection: geoglam. 1 workflows initiated.","workflows_ids":["db6a2097-3e4c-45a3-a772-0c11e6da8b44"]}
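If you want to keep the workflow ids for later reference, the response body can be parsed with the standard library. The sketch below uses the example response shown above; the helper name workflow_ids is hypothetical.

```python
import json


def workflow_ids(response_text: str) -> list:
    """Extract the list of workflow ids from a publish response body."""
    return json.loads(response_text).get("workflows_ids", [])


response_text = (
    '{"message":"Successfully published collection: geoglam. 1 workflows initiated.",'
    '"workflows_ids":["db6a2097-3e4c-45a3-a772-0c11e6da8b44"]}'
)
print(workflow_ids(response_text))  # ['db6a2097-3e4c-45a3-a772-0c11e6da8b44']
```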
Congratulations! You have now successfully uploaded a COG dataset to the VEDA Dashboard. You can now explore the data catalog to verify that the ingestion worked; the uploaded data should be ready for viewing and exploration.