File preparation
How to create valid cloud-optimized data files and upload them to the VEDA data store
STEP I: Prepare the data
VEDA supports inclusion of cloud optimized GeoTIFFs (COGs) to its data store.
Create proper Cloud-Optimized GeoTIFFs (COGs)
We often encounter issues like missing or wrong nodata
value, missing coordinate-reference system, missing or wrong overviews - polluted by fill values or not conserving class values in categorical data, empty files, or artifacts in the data.
Discovering these issues early on (ideally before upload to our buckets) can save us all a lot of time.
A command-line tool for creating and validating COGs is rio-cogeo
. The rio-cogeo
documentation is a great starting reference point and have a very helpful guide on preparing COGs, too.
To inspect and validate your raster dataset use the following
rio cogeo info /path/to/file.tif
This will allow you to explore the size in rows and lines, understand the data type (e.g., byte, float, etc.), and verify how
NoData
is defined. Once you have explored the COG’s definitions you can validate it by using the following commands:rio cogeo validate /path/to/file.tif
COG validation can be used to identify if any errors are found. The following steps can be used to resolve any errors.
If your raster contains empty pixels, make sure the
nodata
value is set correctly (check withrio cogeo info
). Thenodata
value needs to be set before cloud-optimizing the raster, so overviews are computed from real data pixels only. Pro-tip: For floating-point rasters, usingNaN
for flagging nodata helps avoid roundoff errors later on.You can set the nodata flag on a GeoTIFF in-place with:
rio edit_info --nodata 255 /path/to/file.tif
or in Python with
import rasterio with rasterio.open("/path/to/file.tif", "r+") as ds: = 255 ds.nodata
Note that this only changes the flag. If you want to change the actual value you have in the data, you need to create a new copy of the file where you change the pixel values.
Make sure the coordinate reference system is embedded in the COG (check with
rio cogeo info
)When creating the COG, use the most appropriate
resampling
method for overviews. For example, useaverage
for continuous / floating point data andmode
for categorical / integer.rio cogeo create --overview-resampling "mode" /path/to/input.tif /path/to/output.tif
Name your files correctly
Make sure that the COG filename is meaningful and contains the datetime associated with the COG in the following format. All the datetime values in the file should be preceded by the _
underscore character. Some examples are shown below:
Single datetime
- Year data:
nightlights_2012.tif
,nightlights_2012-yearly.tif
- Month data:
nightlights_201201.tif
,nightlights_2012-01_monthly.tif
- Day data:
nightlights_20120101day.tif
,nightlights_2012-01-01_day.tif
Datetime range
- Year data:
nightlights_2012_2014.tif
,nightlights_2012_year_2015.tif
- Month data:
nightlights_201201_201205.tif
,nightlights_2012-01_month_2012-06_data.tif
- Day data:
nightlights_20120101day_20121221.tif
,nightlights_2012-01-01_to_2012-12-31_day.tif
Note that the date/datetime value is always preceded by an _
(underscore).
STEP II: Upload to the VEDA data store
Once you have the COGs, obtain permissions (Cognito
credentials) from the VEDA team to upload them to the veda-data-store-staging
bucket.
Upload the data to a sensible location inside the bucket. Example: s3://veda-data-store-staging/<collection-id>/