[LANDGRIF-1262] Refactor (removes) tiff_to_h3_table.py to something more modular
#899
General description
Rewrite of the raster to h3 ingestion. Highlights:
psycopg 3 API
Testing instructions
Known issues & a bit of Discussion
The process has 3 bottlenecks currently [seconds running SPAM ingestion]:
DataFrame.join(list[DataFrame]) [57s]
So for the SPAM case, which has 51 rasters to ingest (the worst case), the old script takes ~90s and the new approach takes 116s.
For datasets with a single raster, the new approach is faster.
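To make the join bottleneck concrete, here is a minimal sketch of the `DataFrame.join(list[DataFrame])` pattern on toy data. It is not the code from this PR: the `make_band` helper, the `h3index` index name, the cell ids and the column names are all made up for illustration. It also shows `pd.concat(..., axis=1)` as one alternative that might be worth timing against the join, with no claim here about which is faster.

```python
import pandas as pd


def make_band(name, h3_ids, values):
    """Build a one-column DataFrame indexed by (hypothetical) h3 cell ids."""
    return pd.DataFrame({name: values}, index=pd.Index(h3_ids, name="h3index"))


# Three toy "rasters", each covering a partly different set of cells.
bands = [
    make_band("crop_a", ["8a1", "8a2", "8a3"], [1.0, 2.0, 3.0]),
    make_band("crop_b", ["8a2", "8a3", "8a4"], [10.0, 20.0, 30.0]),
    make_band("crop_c", ["8a1", "8a4"], [100.0, 400.0]),
]

# The call named in the text: join the first frame with the list of the others.
# how="outer" keeps every cell that appears in any raster.
joined = bands[0].join(bands[1:], how="outer")

# Alternative to benchmark: a single column-wise concat over all frames.
concatenated = pd.concat(bands, axis=1, join="outer")

print(joined.sort_index())
```

On this toy input both calls produce the same table, one row per cell and one column per raster, with NaN where a raster has no value for a cell.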
All of this should be put into perspective: the longest part of the ingestion is the deforestation (hansen + ghg) step, which is 99.9% downloading and preprocessing with gdal, while the resulting raster to ingest is as small as a single SPAM one.
Thus, the next big step should be to move the postprocessed deforestation results to the cloud.
And why, you may ask? I find the new code a little less cluttered and easier to follow, which makes it safer to modify.
PS: I almost forgot, this PR also removes unused files in the data folder, which are quite annoying to have around.
Checklist before merging