8000 Add proper warning. by RubelMozumder · Pull Request #636 · FAIRmat-NFDI/pynxtools · GitHub
[go: up one dir, main page]
More Web Proxy on the site http://driver.im/
Skip to content

Add proper warning. #636

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 2 commits into from
Closed

Conversation

RubelMozumder
Copy link
Collaborator
@RubelMozumder RubelMozumder commented May 7, 2025

PR #623 is readdressed here!
This PR writes log messages considering the following possible cases:

  • A variadic group, or field, or attribute instantiated with a name as a named field, or group, or named attribute.
  • A field or group is redefined in the template as a group or field, respectively.
  • At a level one variadic group and one variadic field instantiated with the same name, e.g., identifierNAME[identifier] and COLLECTION[identifier].
  • Update tests

Dissussion :
In case of a confined group and a field name

  1. Consider the key that comes first to be written in the nexus file
  2. Remove the key that comes later and write the nexus file, though the file is partly correct.

PR #638 is handling the issues noted here.

@RubelMozumder RubelMozumder force-pushed the 599-undescriptive-error-message_3 branch from 743b84e to 7324290 Compare May 14, 2025 08:14
@RubelMozumder RubelMozumder requested review from rettigl and lukaspie and removed request for rettigl May 14, 2025 08:16
@RubelMozumder
Copy link
Collaborator Author
RubelMozumder commented May 14, 2025

I ran the code with entry_identifier field and COLLECTION[entry_identifier] to create the warning massage and error:

WARNING: Defined non-variadic name 'entry_identifier' of field conflicts with the given variadic concept 'COLLECTION' group. Such conflict may break pynxtools writer.
WARNING - WARNING: Defined non-variadic name 'entry_identifier' of field conflicts with the given variadic concept 'COLLECTION' group. Such conflict may break pynxtools writer.
WARNING: Field /ENTRY[entry]/COLLECTION[entry_identifier]/identifier written without documentation.
WARNING - WARNING: Field /ENTRY[entry]/COLLECTION[entry_identifier]/identifier written without documentation.

Error:

  File "/home/rubel/NOMAD-FAIRmat/nomad-distro-dev-RM/packages/pynxtools/src/pynxtools/dataconverter/convert.py", line 415, in convert_cli
    convert(
  File "/home/rubel/NOMAD-FAIRmat/nomad-distro-dev-RM/packages/pynxtools/src/pynxtools/dataconverter/convert.py", line 246, in convert
    Writer(data=data, nxdl_f_path=nxdl_f_path, output_path=output).write()
  File "/home/rubel/NOMAD-FAIRmat/nomad-distro-dev-RM/packages/pynxtools/src/pynxtools/dataconverter/writer.py", line 346, in write
    self._put_data_into_hdf5()
  File "/home/rubel/NOMAD-FAIRmat/nomad-distro-dev-RM/packages/pynxtools/src/pynxtools/dataconverter/writer.py", line 302, in _put_data_into_hdf5
    raise IOError(
OSError: Unknown error occured writing the path: /ENTRY[entry]/entry_identifier with the following message: Unable to synchronously create dataset (name already exists)

@RubelMozumder RubelMozumder force-pushed the 599-undescriptive-error-message_3 branch from 7324290 to 724bc11 Compare May 14, 2025 08:26
@lukaspie
Copy link
Collaborator
lukaspie commented May 14, 2025

I am still not happy with the contents of this PR. IMO, a correct approach would be:

  1. to throw an error on cases where e.g. a field is given that has a name that another group at the same level has
  2. this error should be separate to the existing checks for concept names -> these are two different checks!
  3. if this error is thrown, it should result in that field not to be written at all (i.e., it shall be removed from the template)

I have summarized my ideas in #638. Please have a look and see what you think @RubelMozumder @rettigl.

@RubelMozumder
Copy link
Collaborator Author
RubelMozumder commented May 15, 2025

I am still not happy with the contents of this PR. IMO, a correct approach would be:

  1. to throw an error on cases where e.g. a field is given that has a name that another group at the same level has
  2. this error should be separate to the existing checks for concept names -> these are two different checks!
  3. if this error is thrown, it should result in that field not to be written at all (i.e., it shall be removed from the template)

I have summarized my ideas in #638. Please have a look and see what you think @RubelMozumder @rettigl.

  1. It already raises an error (if the field and group have the same name at the same level) - the error comes from the Nexus writer from the h5py package.
  2. There are two checks:
    a. h5py checks if any dataset or group is overwritten and raise an error (Unknown error occured writing the path: /ENTRY[entry]/entry_identifier with the following message: Unable to synchronously create dataset (name already exists)) which very clearly says duplication of name and name of the concept,
    b. pynxtools validation checks and clearly reports back if there are fields and groups having the same names (such as non variadic field name and veriadic group name) giving proper clue of fields and gorups.
  3. I am not convinced to remove any data or keys from template. It is the task of template provider i.e. pynxtools plugins. This kind of curation may mislead the users.
    a. First thing, this is a severe error from plugin and the sole task of plugin is to provide the clear and correct template.
    b. Until now the task of the dataconverter is to check the template and report back if there is any inconsistency between the supplied data and concepts from technical data model and write the data in a hdf5 file. Here, dataconverter does not know which concept (field or group) users want to keep. Conflicting may come from updating in app def or using old config file or corruption in the raw data file or from eln.

@lukaspie
Copy link
Collaborator

I am still not happy with the contents of this PR. IMO, a correct approach would be:

  1. to throw an error on cases where e.g. a field is given that has a name that another group at the same level has
  2. this error should be separate to the existing checks for concept names -> these are two different checks!
  3. if this error is thrown, it should result in that field not to be written at all (i.e., it shall be removed from the template)

I have summarized my ideas in #638. Please have a look and see what you think @RubelMozumder @rettigl.

  1. It already raises an error (if the field and group have the same name at the same level) - the error comes from the Nexus writer from the h5py package.

To be clear, I think this should be a validation error (and should be logged as such), this should never be passed to the writer in the first place. Note that with the current approach in this PR, we are indeed not raising an error at all, but we still write the group entry_identifier (albeit with a warning) even though it is defined as a field.

  1. There are two checks:
    a. h5py checks if any dataset or group is overwritten and raise an error (Unknown error occured writing the path: /ENTRY[entry]/entry_identifier with the following message: Unable to synchronously create dataset (name already exists)) which very clearly says duplication of name and name of the concept,

This basically just tells you that there already is a group called entry_identifier because you are both definining /ENTRY[entry]/entry_identifier and /ENTRY[entry]/COLLECTION[entry_identifier]. Again, this is a different problem! In one case you define the same object twice (which should be an HDF5 problem), in the other you define a NeXus group which is in the base class defined as a field (this is a validation issue).

b. pynxtools validation checks and clearly reports back if there are fields and groups having the same names (such as non variadic field name and veriadic group name) giving proper clue of fields and gorups.

Where exactly is it doing this? There is only this warning: Defined non-variadic name '{value.name}' of {value.type} conflicts with the given variadic concept '{path}'." Where exactly is this stating that this is a group/field which has a name that conflicts with the name of a field/group elsewhere?

  1. I am not convinced to remove any data or keys from template. It is the task of template provider i.e. pynxtools plugins. This kind of curation may mislead the users.
    a. First thing, this is a severe error from plugin and the sole task of plugin is to provide the clear and correct template.

Yes, this is why we raise a validation error to indicate that this needs fixing. The question is: do we still want to write a correct file or not? IMO, it is fine to remove the key and write a file afterwards which is correct.

b. Until now the task of the dataconverter is to check the template and report back if there is any inconsistency between the supplied data and concepts from technical data model and write the data in a hdf5 file. Here, dataconverter does not know which concept (field or group) users want to keep. Conflicting may come from updating in app def or using old config file or corruption in the raw data file or from eln.

We can only keep concepts/data that are correctly matching with NeXus. Note that we have had logic to remove keys if they are not correct for a while now (see ada259b). I think it's totally fine to do that as long as you tell the user why you are doing it and giving a validation error that can be fixed.

@lukaspie
Copy link
Collaborator

To be clear, my comment above is not necessarily negative with respect to what this PR is doing. We should just clearly discuss what we want to have and what the role of the individual parts (validator, writer, plugins) shall be. Only then we can implement a working solution. Would be more than happy to have further discussions on this.

@RubelMozumder
Copy link
Collaborator Author

To be clear, my comment above is not necessarily negative with respect to what this PR is doing. We should just clearly discuss what we want to have and what the role of the individual parts (validator, writer, plugins) shall be. Only then we can implement a working solution. Would be more than happy to have further discussions on this.

  1. Log level in validation: We can think of two types of validation levels, WARNING and CRITICAL. We should not go for any ERROR, because it does not go with the code principle of validation, which mainly says collect and print all warning messages without interruption. Upon the discussion, we can implement the logic so that pynxtools can stop the conversion process before starting the writing hdf5 file for having critical messages. Though, anyway for such critical warning pynxtools will break the conversion and raise an error from the corresponding code location. In that case, users may loss the opportunity to know the technical details.
  2. Currently, the warning message also includes the type of conflicting concepts field or group. You may have overlooked the warning and error massages, I put in a comment in this PR. Here is the typical warning message WARNING: Defined non-variadic name 'entry_identifier' of field conflicts with the given variadic concept 'COLLECTION' group. Such conflict may break pynxtools writer.
  3. Though, I am not convinced to remove the keys or data from the template, I really want to give that freedom to the pynxtools plugin developer, if, still, we want something like that I would prefer that in respect of user permission by a flag --crurate or something similar.

@rettigl
Copy link
Collaborator
rettigl commented May 19, 2025

If I add

  "/ENTRY/entry_identifier": {
    "identifier":"@attrs:metadata/loader/scan_path"
  },

I get:

WARNING: Field /ENTRY[entry]/entry_identifier/identifier written without documentation.
WARNING: Nexus group /entry/entry_identifier is written without NX_class attribute.

The validation does not pick up that I try to write a group with a name that is defined as field.
If I add a concept, I get:

WARNING: Defined non-variadic name 'entry_identifier' of field conflicts with the given variadic concept 'COLLECTION' group. Such conflict may break pynxtools writer.
WARNING: Field /ENTRY[entry]/COLLECTION[entry_identifier]/identifier written without documentation.

If I have the conflicting field also in the dict, I get:

WARNING: Defined non-variadic name 'entry_identifier' of field conflicts with the given variadic concept 'COLLECTION' group. Such conflict may break pynxtools writer.
WARNING: Field /ENTRY[entry]/COLLECTION[entry_identifier]/identifier written without documentation.

And the conversion crashes with a value error.

@lukaspie
Copy link
Collaborator
  1. Currently, the warning message also includes the type of conflicting concepts field or group. You may have overlooked the warning and error massages, I put in a comment in this PR. Here is the typical warning message WARNING: Defined non-variadic name 'entry_identifier' of field conflicts with the given variadic concept 'COLLECTION' group. Such conflict may break pynxtools writer.

I did not overlook how the errror message is implemented here. Again, as I said multiple times, these are different problems that should result in different validation results! There are two issues:

  • The case where e.g. a field is given that has a name that another group at the same level has -> this should be a validation logging error, returning a separate error message
  • Giving a variadic name (i.e., COLLECTION) for a non-variadic concept, where I . We already have a solution for this.

Again, the first check shall come before the second. Then the second only comes up if you define a concept name of a group that does not match with that of a named group, but not across types. See the logic in #639.

  1. Log level in validation: We can think of two types of validation levels, WARNING and CRITICAL. We should not go for any ERROR, because it does not go with the code principle of validation, which mainly says collect and print all warning messages without interruption. Upon the discussion, we can implement the logic so that pynxtools can stop the conversion process before starting the writing hdf5 file for having critical messages. Though, anyway for such critical warning pynxtools will break the conversion and raise an error from the corresponding code location. In that case, users may loss the opportunity to know the technical details.
  1. Though, I am not convinced to remove the keys or data from the template, I really want to give that freedom to the pynxtools plugin developer, if, still, we want something like that I would prefer that in respect of user permission by a flag --crurate or something similar.

We can still log messages of ERROR level without interrupting the file creation. I am already doing that in #639. Note that we are not raising an error there so the file is anyway created. But we should really log an errorr to strongly indicate to the user that what they are doing is incorrect. The question is: why would we want to write wrong NeXus files afterall? If we don't remove and also don't break the writing on error (by raising), we basically write corrupt NeXus files. That would be super confusing as well.

My preference would be in this order:

  1. removal of wrong key, write valid NeXus file
  2. raise an error if we see a critical logging error
  3. log an error, but write the write file (I would rather not do this)

@RubelMozumder
Copy link
Collaborator Author

Handled in PR #638

@RubelMozumder RubelMozumder deleted the 599-undescriptive-error-message_3 branch May 20, 2025 13:59
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants
0