8000 Converters: Source geometries or "improved" versions? · Issue #119 · fiboa/cli · GitHub
[go: up one dir, main page]
More Web Proxy on the site http://driver.im/
Skip to content

Converters: Source geometries or "improved" versions? #119

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
m-mohr opened this issue Nov 14, 2024 · 6 comments · Fixed by #114
Open

Converters: Source geometries or "improved" versions? #119

m-mohr opened this issue Nov 14, 2024 · 6 comments · Fixed by #114
Assignees

Comments

@m-mohr
Copy link
Contributor
m-mohr commented Nov 14, 2024

Originally posted by @m-mohr in #117 (comment)

A general question (also applies to FI and EE currently). Do we already fix geometries in converters or do we use fiboa improve -g when needed? I slightly tend towards the latter, it would allow to get the original geometries out of the converters if needed.

On the other hand, we also have this explode_multipolygon option which we seem to set very randomly... I'm wondering whether that should also move to the improve toolset?

@ivorbosloper wrote:

The question for explode_multipolygon is, do we always want "normal" polygons, or do we also allow multipolygons in our datasets? I always disliked multipolygons for crop fields/parcels, but that might be my bias.

And invalid polygons give you problems everywhere and all the time. I want to get rid of invalidity as early as possible...

@m-mohr wrote:

My thinking was that in convert we keep the source geometries as they are and then allow people to fix things when needed. > If we want consistency, the alternative would be to always run make_valid and explode_multipolygon to ensure consistent datasets. Right now it seems pretty author specific, which options are used in the converters, which is not ideal.

@cholmes I think you said at some point that converters should do less and we should push more into additional CLI commands? Is that also true for this case?

@ivorbosloper wrote:

One of the ideas behind collecting all these datasets is to make it really easy to use them. If all data is consistent valid and to a lesser degree also polygon, you don't have to fix data yourself. Given the documentation (data-survey and converter), someone can go back to the original file if required.

But it depends what we want to create.

@m-mohr
Copy link
Contributor Author
m-mohr commented Nov 14, 2024

I'd say for source coop deployment we should indeed run the improve command, because otherwise files may be invalid actually. For me converters and the actual deployment are two different topics.

I'm thinking about FTW, if they may struggle to correctly judge/select examples if we modified some of the geometries. It would give them the option to themselves create original geometries fiboa compliant, but we can still run improve for deployment.


Edit, I have second thoughts: Maybe we should actually guide users more and provide valid polygons by default. Returning invalid fiboa after conversion is also less ideal for tests.

Alternative is an option in the convert command to run both options with a default to run them, but then we can probably remove the option in the improve command.

I tend towards running make_valid and explode_polygon by default in the converters with an option to disable it, I think.

@m-mohr m-mohr changed the title Converters: Source geometries of "improved" versions? Converters: Source geometries or "improved" versions? Nov 14, 2024
@cholmes
Copy link
Contributor
cholmes commented Nov 14, 2024

I do generally now lean towards keeping the converters as minimal as possible - don't do anything except get into fiboa, so then all the potential 'improve' and 'filter' are up to the user.

But I can definitely get behind make_valid and explode_polygon by default. I think ideally we'd include in the core spec that polygons should be valid. And potentially also that it's just single polygons - but we should probably make an issue and discuss the reasoning.

I do also think it could be interesting to have a way for a converter author to include the 'improve' / 'filter' commands they recommend, and even make that the 'default'. Like fiboa convert ec_lt --no-improvements would just do the basics, fiboa convert ec_lt would include the filtering of the crop types (not include pastures, forests, etc), would add in hcat extension. And then someone could do --no-improvements and then apply the improve and filter they want.

@m-mohr m-mohr self-assigned this Nov 14, 2024
@m-mohr m-mohr added this to fiboa Nov 14, 2024
@github-project-automation github-project-automation bot moved this to Backlog in fiboa Nov 14, 2024
@m-mohr m-mohr moved this from Backlog to In Progress in fiboa Nov 14, 2024
@m-mohr
Copy link
Contributor Author
m-mohr commented Nov 14, 2024

Okay, cool, I'll add an option for the converters to run make_valid and explode_multipolygons by default.

m-mohr added a commit that referenced this issue Nov 15, 2024
* Add fiboa improve command #79 #21

* Write custom schemas to fiboa metadata for use in improve/merge/etc. #113 and minor fixes

* Make geometries valid and explode to Polygons by default #119

* Explode polygons option for improve command

* Add minimal test

* Fix pick_schemas
@m-mohr
Copy link
Contributor Author
m-mohr commented Nov 15, 2024

I made the changes in #114

@ivorbosloper Do you know whether gdf.explode has the side-effect that the IDs will not be unique any longer? That seems not ideal and would actually be invalid according to the spec...

@m-mohr m-mohr linked a pull request Nov 15, 2024 that will close this issue
@ivorbosloper
Copy link
Collaborator

@m-mohr Yes, you're right. The geodataframe.explode function has flags ignore_index and index_parts to deal with this challenge (either ignore the index and start counting at 1 or create a multi-level index for additional parts). Using these options will change the ids, so we should only use them in rare circumstances. So it should be a Converter option or we should 'detect' possible duplicate ids. I think, given the index_parts option, it would be nicest if a 2-part multipolygon with id=123 would get become 123_1 and 123_2 , and all the single-part polygons could keep their original id.

@m-mohr
Copy link
Contributor Author
m-mohr commented Mar 7, 2025

Yes, I agree.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Status: In Progress
Development

Successfully merging a pull request may close this issue.

3 participants
0