Dataset Ingestion
Guide to ingesting and publishing data to the VEDA data store & STAC API
VEDA uses a centralized SpatioTemporal Asset Catalog (STAC) for data dissemination and prefers to host datasets in cloud object storage (AWS S3 in the us-west-2 region) in the cloud-optimized file formats Cloud-Optimized GeoTIFF (COG) and Zarr. These formats enable viewing and efficient access in the cloud directly from the original data files, without copies or multiple versions.
Steps for ingesting a dataset
For dataset ingestion, generally four steps are required. Depending on the capacity of the dataset provider, some of the steps can be completed by the VEDA team on request.
The data ingestion process requires Cognito credentials (username and password). To obtain these credentials, contact a member of the VEDA Data Services Team at veda@uah.edu, who can set up an account and credentials for you. The first time you log in using the Cognito Client, you will be prompted to set a new password.
Complete as many steps of the process as you have capacity or authorization to. You will initially publish to a staging catalog where you can review the data before publishing to the public production catalog. Please follow the steps and guides outlined below:
- Transform datasets to conform with cloud-optimized file formats - see file preparation
- Upload files to storage (may be skipped if the data is already cloud-optimized and in us-west-2)
- Load those records into the STAGING VEDA STAC - see catalog ingestion
- Finally, when you are satisfied with how your data look in the staging catalog, open a veda-data pull request with the configuration you used to publish to the staging catalog. When this PR is approved, the data will be published to the production catalog at openveda.cloud!
Open a dedicated pull request in the veda-data repository. Please read through these docs fully first, as they will help supply the information required to complete the PR. Use this “new dataset” template to open a new issue and get started.
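The upload step and its skip condition can be sketched in Python as follows. This is a minimal sketch rather than VEDA's own tooling: the helper names and the `<collection_id>/<file name>` key layout are hypothetical, the bucket name is something you would need to get from the VEDA team, and the upload itself assumes the `boto3` package and valid AWS credentials.

```python
from pathlib import PurePosixPath

TARGET_REGION = "us-west-2"  # region named in this guide


def needs_upload(is_cloud_optimized: bool, bucket_region: str) -> bool:
    """Return False only when the data is already cloud-optimized and in us-west-2."""
    return not (is_cloud_optimized and bucket_region == TARGET_REGION)


def destination_key(local_path: str, collection_id: str) -> str:
    """Build an object key as '<collection_id>/<file name>' (hypothetical layout)."""
    return f"{collection_id}/{PurePosixPath(local_path).name}"


def upload(local_path: str, bucket: str, key: str) -> None:
    """Upload one file to S3; requires boto3 and AWS credentials."""
    import boto3  # imported lazily so the helpers above work without it

    s3 = boto3.client("s3", region_name=TARGET_REGION)
    s3.upload_file(local_path, bucket, key)
```

With these helpers, a file that is already a COG stored in us-west-2 is left where it is, while anything else is uploaded under a key derived from its collection and file name.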
End to end ingest example
For a walkthrough of the full process outlined above, please refer to this example notebook. This notebook uses the GEOGLAM June 2023 dataset to ingest the file CropMonitor_2023_06_28.tif into VEDA’s staging STAC catalog.
Please use this as a guide for the ingestion process (and the required dataset definitions), replacing the GEOGLAM dataset metadata and file with your own data.
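When adapting the example to your own files, one common need is deriving each file's timestamp from its name. The sketch below assumes the date is embedded as `YYYY_MM_DD`, as in CropMonitor_2023_06_28.tif; the function name and the pattern are illustrative and are not taken from the notebook.

```python
import re
from datetime import datetime, timezone

# Assumed naming convention: a YYYY_MM_DD date somewhere in the file name.
DATE_PATTERN = re.compile(r"(\d{4})_(\d{2})_(\d{2})")


def datetime_from_filename(filename: str) -> datetime:
    """Extract a UTC datetime (midnight) from a name like 'CropMonitor_2023_06_28.tif'."""
    match = DATE_PATTERN.search(filename)
    if match is None:
        raise ValueError(f"no YYYY_MM_DD date found in {filename!r}")
    year, month, day = (int(part) for part in match.groups())
    return datetime(year, month, day, tzinfo=timezone.utc)


print(datetime_from_filename("CropMonitor_2023_06_28.tif").isoformat())
# prints 2023-06-28T00:00:00+00:00
```

If your files follow a different naming scheme, only `DATE_PATTERN` should need to change.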
Stuck on how to develop compliant metadata records for your dataset?
Check out the following notebooks and resources, which can help you produce the STAC metadata required to create the dataset definitions needed for catalog ingestion.
- How to create STAC Collections: see this example notebook and related STAC conventions
- How to create STAC Items: see this example notebook and conventions.
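As a rough orientation before opening those notebooks, the sketch below builds a bare-bones STAC Collection and Item as plain Python dictionaries, using only the fields the STAC 1.0.0 specification requires. It does not reproduce VEDA's dataset definition format, and every value passed in (IDs, license, bounding box, S3 path) is a hypothetical placeholder.

```python
import json


def bbox_to_polygon(bbox):
    """Turn [west, south, east, north] into a closed GeoJSON Polygon."""
    west, south, east, north = bbox
    ring = [[west, south], [east, south], [east, north], [west, north], [west, south]]
    return {"type": "Polygon", "coordinates": [ring]}


def make_collection(collection_id, description, license_id, bbox, start, end=None):
    """Minimal STAC Collection; end=None marks an open-ended time range."""
    return {
        "type": "Collection",
        "stac_version": "1.0.0",
        "id": collection_id,
        "description": description,
        "license": license_id,
        "extent": {
            "spatial": {"bbox": [list(bbox)]},
            "temporal": {"interval": [[start, end]]},
        },
        "links": [],
    }


def make_item(item_id, bbox, datetime_utc, href):
    """Minimal STAC Item with one COG asset (collection link and field omitted)."""
    return {
        "type": "Feature",
        "stac_version": "1.0.0",
        "id": item_id,
        "geometry": bbox_to_polygon(bbox),
        "bbox": list(bbox),
        "properties": {"datetime": datetime_utc},
        "links": [],
        "assets": {
            "data": {
                "href": href,
                "type": "image/tiff; application=geotiff; profile=cloud-optimized",
                "roles": ["data"],
            }
        },
    }


# Placeholder values for illustration only.
item = make_item(
    "example-item",
    [-180, -90, 180, 90],
    "2023-06-28T00:00:00Z",
    "s3://example-bucket/example-collection/example.tif",
)
print(json.dumps(item, indent=2))
```

Real records usually carry more than this (titles, providers, links between Item and Collection), so treat the output as a starting skeleton to fill in from the conventions linked above.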