Error for multiple different variadic concepts with the same name by lukaspie · Pull Request #646 · FAIRmat-NFDI/pynxtools · GitHub

Error for multiple different variadic concepts with the same name #646


Conversation

lukaspie
Collaborator
@lukaspie lukaspie commented May 26, 2025

As discussed in #636 and #638, we should not allow the same instance name for two unnamed groups, e.g. NXuser and NXsample. We implement this here by running a check on the whole mapping table before any of the other validation starts. Note that we have to remove such keys as they will necessarily lead to HDF5 conflicts (HDF5 depends on unique names for groups and datasets).

The error message may need some improvement; I wasn't too sure how to report this issue.
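
A minimal sketch of such a pre-validation check, for illustration only. It is not the code in this PR: `find_conflicts` and its return shape are hypothetical, and the only thing taken from the discussion is the `CONCEPT[instance]` key notation. It groups keys by the HDF5 location their instance name resolves to and reports every location claimed by more than one concept.

```python
import re
from collections import defaultdict

# Matches one path segment of the form CONCEPT[instance], e.g. "USER[user]".
_SEGMENT = re.compile(r"^(?P<concept>[^\[\]/]+)\[(?P<instance>[^\[\]/]+)\]$")


def find_conflicts(keys):
    """Hypothetical helper: map (parent path, instance name) to the concepts
    that use this name, keeping only names used by more than one concept."""
    seen = defaultdict(lambda: defaultdict(set))
    for key in keys:
        parent = ""
        for segment in key.strip("/").split("/"):
            match = _SEGMENT.match(segment)
            if match:
                instance = match.group("instance")
                seen[(parent, instance)][match.group("concept")].add(key)
            else:
                # Plain segment: its name is used as-is in the HDF5 path.
                instance = segment
            parent = f"{parent}/{instance}"
    return {
        where: {concept: sorted(ks) for concept, ks in concepts.items()}
        for where, concepts in seen.items()
        if len(concepts) > 1
    }


mapping_keys = [
    "/ENTRY[entry]/USER[user]/name",
    "/ENTRY[entry]/SAMPLE[user]/name",
    "/ENTRY[entry]/SAMPLE[sample]/name",
]
conflicts = find_conflicts(mapping_keys)
```

In this example `conflicts` has a single entry, for the name `user` below `/entry`, listing the `USER` and `SAMPLE` keys that would collide in the HDF5 file.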

@lukaspie lukaspie marked this pull request as ready for review May 26, 2025 19:45
@lukaspie lukaspie requested review from rettigl and RubelMozumder May 26, 2025 19:45
Collaborator
@RubelMozumder RubelMozumder left a comment


LGTM!

@lukaspie lukaspie force-pushed the 643-bug-raise-error-if-there-are-two-variadic-concepts-with-the-same-name branch from 4f264aa to 179bba4 Compare May 27, 2025 20:00
@lukaspie lukaspie force-pushed the 643-bug-raise-error-if-there-are-two-variadic-concepts-with-the-same-name branch from 2553ff4 to 0225333 Compare May 28, 2025 12:24
Collaborator
@rettigl rettigl left a comment


This looks good to me; however, there is at least one case it does not cover.
If I define the same concept twice, once with a variadic name and once without (for a named concept), the writer will fail because the entry has already been written.
Example: /ENTRY/INSTRUMENT/energy_resolution/type and /ENTRY/INSTRUMENT/RESOLUTION[energy_resolution]/type are both defined.
I think, however, that we don't have to catch such cases, as this is clearly broken input. We can probably never make this completely failsafe, and we don't have to.

Comment on lines +927 to +928
except TypeError:
    pass
Collaborator


When do we arrive here?

Collaborator Author


It is from here, which is used here to remove keys where we have an invalid type for a named concept. It was introduced in #638.

Not the most elegant solution, but works for now. We should probably not even reach the TypeError here, but at least we are covered.


tree = generate_tree_from(appdef)
collector.clear()
find_instance_name_conflicts(mapping, keys_to_remove)
Collaborator


For the functions below, you don't pass the keys_to_remove, but use them from the global context. Why is this handled differently here?

Collaborator Author


I had the function somewhere else first, where the global context was not available. Removed it now.

@rettigl
Collaborator
rettigl commented May 28, 2025

The identification of valid or invalid concepts doesn't seem to fully work yet. If I have two similar groups with several sub-fields:

  "/ENTRY/USER[user]": {
    "name": ,
    "role": ",
    "affiliation": ,
    "address": ,
    "email": 
  },
  "/ENTRY/ILLEGAL[user]": {
    "name": ,
    "role": ",
    "affiliation": ,
    "address": ,
    "email": 
  },

all keys are removed:

WARNING: Instance name 'user' used for multiple different concepts: ILLEGAL, USER. The following keys are affected: /ENTRY[entry]/ILLEGAL[user]/address, /ENTRY[entry]/ILLEGAL[user]/affiliation, /ENTRY[entry]/ILLEGAL[user]/email, /ENTRY[entry]/ILLEGAL[user]/name, /ENTRY[entry]/ILLEGAL[user]/role, /ENTRY[entry]/USER[user]/address, /ENTRY[entry]/USER[user]/affiliation, /ENTRY[entry]/USER[user]/email, /ENTRY[entry]/USER[user]/name, /ENTRY[entry]/USER[user]/role.
WARNING: The key /ENTRY[entry]/ILLEGAL[user]/address will not be written.
WARNING: The key /ENTRY[entry]/ILLEGAL[user]/affiliation will not be written.
WARNING: The key /ENTRY[entry]/ILLEGAL[user]/email will not be written.
WARNING: The key /ENTRY[entry]/ILLEGAL[user]/name will not be written.
WARNING: The key /ENTRY[entry]/ILLEGAL[user]/role will not be written.
WARNING: The key /ENTRY[entry]/USER[user]/address will not be written.
WARNING: The key /ENTRY[entry]/USER[user]/affiliation will not be written.
WARNING: The key /ENTRY[entry]/USER[user]/email will not be written.
WARNING: The key /ENTRY[entry]/USER[user]/name will not be written.
WARNING: The key /ENTRY[entry]/USER[user]/role will not be written.

even though the second one is invalid.

If there is just one sub-key in the valid entry, it works as expected, which is also what the test contains. I suggest fixing this and extending the test with a second sub-key. Maybe also split the test into several, each covering a different aspect of this functionality.
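
To make the expected behavior concrete, here is a rough sketch of the case above with several sub-keys. It is an illustration, not the pynxtools implementation: `keys_to_drop` and the `valid_concepts` argument are hypothetical, and dropping every key when the conflict cannot be resolved is an assumption.

```python
def keys_to_drop(conflict, valid_concepts):
    """Hypothetical helper: choose which keys to remove for one conflict.

    `conflict` maps a concept name to the keys that use the shared
    instance name. If exactly one concept is valid, only the keys of the
    other concepts are dropped; otherwise all affected keys are dropped.
    """
    valid = [concept for concept in conflict if concept in valid_concepts]
    if len(valid) == 1:
        return sorted(
            key
            for concept, keys in conflict.items()
            if concept != valid[0]
            for key in keys
        )
    return sorted(key for keys in conflict.values() for key in keys)


fields = ["name", "role", "affiliation", "address", "email"]
conflict = {
    "USER": [f"/ENTRY[entry]/USER[user]/{field}" for field in fields],
    "ILLEGAL": [f"/ENTRY[entry]/ILLEGAL[user]/{field}" for field in fields],
}
dropped = keys_to_drop(conflict, valid_concepts={"USER"})
```

With `USER` as the only valid concept, `dropped` contains the five `ILLEGAL[user]` keys and none of the `USER[user]` keys, regardless of how many sub-keys each group has.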

@lukaspie
Collaborator Author
lukaspie commented Jun 5, 2025

> This looks good to me; however, there is at least one case it does not cover. If I define the same concept twice, once with a variadic name and once without (for a named concept), the writer will fail because the entry has already been written. Example: /ENTRY/INSTRUMENT/energy_resolution/type and /ENTRY/INSTRUMENT/RESOLUTION[energy_resolution]/type are both defined. I think, however, that we don't have to catch such cases, as this is clearly broken input. We can probably never make this completely failsafe, and we don't have to.

I have now implemented a check (plus a test) for this: if the non-concept key is valid, we remove the one with the concept. That is, in your example, we remove /ENTRY/INSTRUMENT/RESOLUTION[energy_resolution]/type and keep /ENTRY/INSTRUMENT/energy_resolution/type.
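
The rule can be sketched roughly as follows. This is a simplified illustration rather than the code merged here: `hdf5_path` and `shadowed_concept_keys` are hypothetical names, and the sketch assumes the plain-named key has already been found valid.

```python
import re
from collections import defaultdict

# One CONCEPT[instance] segment; the instance name is captured.
_CONCEPT_SEGMENT = re.compile(r"[^\[\]/]+\[([^\[\]/]+)\]")


def hdf5_path(key):
    """Replace every CONCEPT[instance] segment with its instance name."""
    return _CONCEPT_SEGMENT.sub(r"\1", key)


def shadowed_concept_keys(keys):
    """Hypothetical helper: return the keys to remove because a key with
    fewer CONCEPT[...] segments resolves to the same HDF5 path."""
    groups = defaultdict(list)
    for key in keys:
        groups[hdf5_path(key)].append(key)
    removed = []
    for group in groups.values():
        fewest = min(key.count("[") for key in group)
        # Keys tied on the minimum are left alone; that is a different
        # kind of conflict (two concepts sharing one instance name).
        removed.extend(key for key in group if key.count("[") > fewest)
    return sorted(removed)


keys = [
    "/ENTRY/INSTRUMENT/energy_resolution/type",
    "/ENTRY/INSTRUMENT/RESOLUTION[energy_resolution]/type",
    "/ENTRY/INSTRUMENT/name",
]
removed = shadowed_concept_keys(keys)
```

Both `type` keys resolve to the same HDF5 path, so `removed` contains only the `RESOLUTION[energy_resolution]` variant and the plain-named key is kept.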

> The identification of valid or invalid concepts doesn't seem to fully work yet. If I have two similar groups with several sub-fields [...] If there is just one sub-key in the valid entry, it works as expected, which is also what the test contains. I suggest fixing this and extending the test with a second sub-key. Maybe also split the test into several, each covering a different aspect of this functionality.

This should be fixed now. I also split up the tests to cover the different cases individually.

@lukaspie lukaspie requested a review from rettigl June 5, 2025 12:18
@rettigl
Collaborator
rettigl commented Jun 5, 2025

I only rechecked the two cases I had mentioned here; both work fine now. LGTM.

Collaborator
@rettigl rettigl left a comment


LGTM

@lukaspie lukaspie merged commit 37343ba into master Jun 5, 2025
17 checks passed
@lukaspie lukaspie deleted the 643-bug-raise-error-if-there-are-two-variadic-concepts-with-the-same-name branch June 5, 2025 13:58
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Bug]: Raise error if there are two variadic concepts with the same name
3 participants