8000 Follow links and resolve by lukaspie · Pull Request #642 · FAIRmat-NFDI/pynxtools · GitHub
[go: up one dir, main page]
More Web Proxy on the site http://driver.im/
Skip to content

Follow links and resolve #642

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 7 commits into from
May 22, 2025

Conversation

lukaspie
Copy link
Collaborator
@lukaspie lukaspie commented May 21, 2025

@rettigl with this PR, we properly follow links as discussed in #638 (comment) and raise an error if the type of the key does not match the type of the linked entity. This works both for concepts from appdefs as well as from base classes (see the two new tests).

I had to make a lot of changes on the way unfortunately, hence this new PR (pointing to my previous one):

  • Link following was completely changed. Before, we could only follow if a single field within a group was linked, but already a structure like
    "/ENTRY[my_entry]/OPTIONAL_group[some_group]/required_field": {"link": ..."},
    "/ENTRY[my_entry]/OPTIONAL_group[some_group]/optional_field": 1.0,
    
    could not be resolved.
  • Now also following links when we check in the is_documented part for concepts from the base classes.

Further improvements:

  • ValidationProblems are auto-enumerated (meaning we can add new ones anywhere)
  • Keys to removed are stored along the way and then removed directly in the validate_data_dict function (so we don't have to pass them to the converter) -> validate_dict_against just returns a boolean, as before.

Questions:

  • Can you double check that your initial use case works now?
  • Do we want to remove keys where the type of the linked entity doesn't match?
  • More broadly, do we want to remove fields/groups if they lead to ExpectedGroup/ExpectedField errors?

@lukaspie lukaspie requested a review from rettigl May 21, 2025 15:00
@lukaspie lukaspie changed the title add a separate error for names with wrong type and remove such keys Follow links and resolve May 21, 2025
ValidationProblem.ExpectedGroup,
None,
)
# collector.collect_and_log(
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove if not matching?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we remove, we should log.

@rettigl
Copy link
Collaborator
rettigl commented May 21, 2025

I don't have time for a full review, but these are my observations:

  • Valid "field"-level link:
    "RESOLUTION[angular_resolution]": {
      "resolution": "@link:/entry/instrument/electronanalyzer/angular_resolution/resolution"
    },

gives:

File /mnt/pcshare/users/Laurenz/AreaB/transformations_testing/pynxtools/src/pynxtools/dataconverter/validation.py:763, in validate_dict_against.<locals>.is_documented(key, tree)
    760     keys_to_remove.append(key)
    761     return False
--> 763 elif node.type == "field" and not all(
    764     k.startswith("@") for k in resolved_link[key]
    765 ):
    766     # Field should only have values.
    767     if is_mapping:
    768         collector.collect_and_log(
    769             key,
    770             ValidationProblem.ExpectedField,
    771             None,
    772         )

TypeError: 'float' object is not iterable
  • Other valid "field"-level link:
    "calibrated_axis": "@link:/entry/data/angular0",
    gives:
File /mnt/pcshare/users/Laurenz/AreaB/transformations_testing/pynxtools/src/pynxtools/dataconverter/validation.py:764, in <genexpr>(.0)
    760     keys_to_remove.append(key)
    761     return False
    763 elif node.type == "field" and not all(
--> 764     k.startswith("@") for k in resolved_link[key]
    765 ):
    766     # Field should only have values.
    767     if is_mapping:
    768         collector.collect_and_log(
    769             key,
    770             ValidationProblem.ExpectedField,
    771             None,
    772         )

AttributeError: 'numpy.float64' object has no attribute 'startswith'
  • Valid "group" level link works:
    "RESOLUTION[angular_resolution]": "@link:/entry/instrument/electronanalyzer/angular_resolution",
  • Invalid "field" to "group" level
    "RESOLUTION[angular_resolution]": {
      "resolution": "@link:/entry/instrument/electronanalyzer/angular_resolution"
    },

gives correct error:
ERROR: Expected a field at /ENTRY[entry]/INSTRUMENT[instrument]/RESOLUTION[angular_resolution]/resolution but found a group.

  • Invalid "Field" to "Group" link
    "RESOLUTION[angular_resolution]": "@link:/entry/instrument/electronanalyzer/angular_resolution/resolution",
    gives
    ERROR: Expected a group at /ENTRY[entry]/INSTRUMENT[instrument]/RESOLUTION[angular_resolution] but found a field or attribute.

@lukaspie
Copy link
Collaborator Author
lukaspie commented May 22, 2025

I don't have time for a full review, but these are my observations:

  • Valid "field"-level link:
    "RESOLUTION[angular_resolution]": {
      "resolution": "@link:/entry/instrument/electronanalyzer/angular_resolution/resolution"
    },

gives:

File /mnt/pcshare/users/Laurenz/AreaB/transformations_testing/pynxtools/src/pynxtools/dataconverter/validation.py:763, in validate_dict_against.<locals>.is_documented(key, tree)
    760     keys_to_remove.append(key)
    761     return False
--> 763 elif node.type == "field" and not all(
    764     k.startswith("@") for k in resolved_link[key]
    765 ):
    766     # Field should only have values.
    767     if is_mapping:
    768         collector.collect_and_log(
    769             key,
    770             ValidationProblem.ExpectedField,
    771             None,
    772         )

TypeError: 'float' object is not iterable
  • Other valid "field"-level link:
    "calibrated_axis": "@link:/entry/data/angular0",
    gives:
File /mnt/pcshare/users/Laurenz/AreaB/transformations_testing/pynxtools/src/pynxtools/dataconverter/validation.py:764, in <genexpr>(.0)
    760     keys_to_remove.append(key)
    761     return False
    763 elif node.type == "field" and not all(
--> 764     k.startswith("@") for k in resolved_link[key]
    765 ):
    766     # Field should only have values.
    767     if is_mapping:
    768         collector.collect_and_log(
    769             key,
    770             ValidationProblem.ExpectedField,
    771             None,
    772         )

AttributeError: 'numpy.float64' object has no attribute 'startswith'

These were coming from the same issue which should be fixed with 1976f30. Namely, the logic for checking linked fields (in the validation against base classes) was ordered incorrectly. Should be fixed now, can you please try again? I also added positive tests for links (both for appdef and base class concepts) in that and the next commit.

@rettigl
Copy link
Collaborator
rettigl commented May 22, 2025

These were coming from the same issue which should be fixed with 1976f30. Namely, the logic for checking linked fields (in the validation against base classes) was ordered incorrectly. Should be fixed now, can you please try again? I also added positive tests for links (both for appdef and base class concepts) in that and the next commit.

It seems to work now.
However, it silently removes keys that are conflicting, and does not inform the user that it does this:
ERROR: Expected a group at /ENTRY[entry]/INSTRUMENT[instrument]/RESOLUTION[angular_resolution] but found a field or attribute.
From this I don't find it obvious that the key will be removed...

@lukaspie
Copy link
Collaborator Author

These were coming from the same issue which should be fixed with 1976f30. Namely, the logic for checking linked fields (in the validation against base classes) was ordered incorrectly. Should be fixed now, can you please try again? I also added positive tests for links (both for appdef and base class concepts) in that and the next commit.

It seems to work now. However, it silently removes keys that are conflicting, and does not inform the user that it does this: ERROR: Expected a group at /ENTRY[entry]/INSTRUMENT[instrument]/RESOLUTION[angular_resolution] but found a field or attribute. From this I don't find it obvious that the key will be removed...

Great that it works. Currently, I have the warning messages disabled, but indeed we remove these keys. This was my question, shall we remove them in the first place? More broadly, do we want to remove fields/groups if they lead to ExpectedGroup/ExpectedField errors (not just from links)? If so, obviously we can enable the messages again.

@rettigl
Copy link
Collaborator
rettigl commented May 22, 2025

This was my question, shall we remove them in the first place? More broadly, do we want to remove fields/groups if they lead to ExpectedGroup/ExpectedField errors (not just from links)?

Maybe we only remove them if 8000 they lead to write errors because of conflicts in the dict? Or we just raise an error in that case.

@lukaspie
Copy link
Collaborator Author

This was my question, shall we remove them in the first place? More broadly, do we want to remove fields/groups if they lead to ExpectedGroup/ExpectedField errors (not just from links)?

Maybe we only remove them if they lead to write errors because of conflicts in the dict? Or we just raise an error in that case.

I think it's not possible to predict if we'll get a write error while we are validating. The original error that we got here only became a problem because of the deprecated attribute.

I think for the original error in the writer we can just do

if nx_class := attrs.get("type"):
    grp.attrs["NX_class"] = nx_class
 else:
    logger.error(f"No attribute 'NX_class' could be written for {parent_path}.")

Then we don't even need to raise anymore and we can keep the keys in the template (at least for now, maybe there will be further problems later).

@lukaspie
Copy link
Collaborator Author
lukaspie commented May 22, 2025

EDIT: if we don't remove such keys, we get warnings like Expected a group at /ENTRY[my_entry]/USER[my_user] but found a field or attribute.

I think this is more confusing than removing the keys.

Copy link
Collaborator
@rettigl rettigl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can't say I understand all the changes to the logic here. So at least some more comments in the code would be good for later code maintenance.
Otherwise, it appears to do the correct thing (acccording to my few tests).

is_mapping = isinstance(resolved_link[key], Mapping)

if node.type == "group" and not is_mapping:
# Groups must have subelements
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Technically, I think this is not necessary, neither by h5 nor by Nexus, no?

Copy link
Collaborator Author
@lukaspie lukaspie May 22, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In our case, we don't allow this because we don't have a value to write into the template then. So if we want to ahve groups, we always need to write subkeys.

We could think about allowing to write empty group (by passing an empty dict for example), but currently we don't allow this.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

okay, makes sense, was just a thought I had here. Maybe add a comment

ValidationProblem.ExpectedGroup,
None,
)
# collector.collect_and_log(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we remove, we should log.

@lukaspie lukaspie merged commit a4779ce into 599-undescriptive-error-message-cleanup May 22, 2025
1 check passed
@lukaspie lukaspie deleted the follow-link branch May 22, 2025 21:56
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants
0