Undescriptive error message for undefined group #623

RubelMozumder · 2025-04-10T10:53:00Z

This PR address issue #599

And also reorganise the code in function _put_data_into_hdf5 of write.py

RubelMozumder · 2025-04-14T07:27:53Z

@rettigl, you can have a quick look at this PR. The current implementation will raise an error

OSError: Unknown error occurred writing the path: /ENTRY[entry]/entry_identifier/identifier with the following message: NXDL attribute `type` not found for group /ENTRY[entry]/entry_identifier.
Hint: Follow the convention `fixednameVARIACKPART[fixedname_given_name]` with exact group name `fixednameVARIACKPART` in NXDL file.

rettigl

This unfortunately still don't fix the problem, that the writer fails.
In addition, the provided error message does not help to solve the problem.

rettigl · 2025-04-14T08:47:32Z

src/pynxtools/dataconverter/writer.py

-    if "shape" in data.keys():
+    if file is not None and "shape" in data.keys():


I don't understand, how is this related to the issue?

rettigl · 2025-04-14T08:57:22Z

src/pynxtools/dataconverter/writer.py

+                try:
+                    grp.attrs["NX_class"] = attrs["type"]
+                except KeyError as e:
+                    raise NxdlAttributeNotFoundError(
+                        f"NXDL attribute `type` not found for group {parent_path}.\n"
+                        f"Hint: Follow the convention `fixednameVARIACKPART[fixedname_given_name]` "
+                        f"with exact group name `fixednameVARIACKPART`in NXDL file."
+                    ) from e
+


I still don't find this error message helpful. Why is an error raised in the first place?
entry_identifier is now defined as a field in NXentry, while the writer wants to create a group.
This should ideally already be marked as definition violation by the validation routine (I think #621 fixes this), but the writer should just write what it gets. If the user wants to write a group with a name of a defined field, the writer should still do it, I would say, and not fail.
Your error message still does not help me solve the underlying problem as user.

That also came in my mind, writer will write the nexus file even the fields and groups are not defined at all.

If this is the case, we may write GROUP_NOT_FOUND something like that.

entry_identifier is now defined as a field in NXentry, while the writer wants to create a group. This should ideally already be marked as definition violation by the validation routine (I think #621 fixes this), but the writer should just write what it gets.

Fully agree on this, the writer should not need to have this sort of information. In the validation, we really shouldn't allow groups to be called the same as another non-variadic field, this will also be very tricky to handle in NOMAD (see #437). This should be caught and at least a warning should be raised. The question is then if the conversion should fail with an error or if we just write an undocumented group instead. For such a group, we can't define an NX_CLASS attribute (since this new group doesn't fit to any defined class). You can write valid NeXus files without the NX_CLASS attribute though (see nexusformat/definitions#1493), so we should maybe just warn in the writer that this attribute doesn't exist, but continue writing the file.

Sure. I converted that error in warning massage. So, It will neither write the type attribute in nexus file nor stop writing the nexus file.

The message you are throwing still doesn't make any sense for what is the actual problem though.

What I understand, the actual problem is the entry_identifier (or any other group) does not have any group type, whether it is a group of variadic name type or an undefined group.

E.g. if someone writes
.../OBJECT[entry_identifier]/... or .../COLLECTION[entry_identifier]/... which follows the rule of annotating a group of NXobject OR NXcollection type and correct convention.

The issue #599 is an edge case that uses an old format of template key without any group type. Although, all proper warnings come from the validation step.

I see this warning is an extra warning, which is pretty much not needed, in any case, if the validator is unable to detect it.
Or are there some extra points currently missing?

The actual issue is that entry_identifier is defined here https://github.com/FAIRmat-NFDI/nexus_definitions/blob/297e81c8188434403750f7e1cb98fbbfec6c5d8f/base_classes/NXentry.nxdl.xml#L130 with a deprication attribute. Thus, attributes is defined, yet does not contain type, thus the error.
Properly, we should check for the existence of type in attribute.keys before accessing it. Then we also don't need an error or a try/catch block.

Oh, I see what you mean now. This will indeed be solved by #621

Addressed! The try block is removed.

rettigl · 2025-04-14T08:58:03Z

src/pynxtools/dataconverter/writer.py

+            if path.endswith("/@units"):
+                dataset_or_grp.attrs["units"] = d_value
+            else:
+                dataset_or_grp.attrs[entry_name[1:]] = d_value


What issue do these changes solve? I don't understand it.

It does not solve anything. This separates the concept handling of group and field from that of attributes and units.
I also mentioned that in the PR description.

Why do they need to be separated? What do we gain from this? If we have so many changes to the code, it would be good to understand the reasoning for why these changes are made.

RubelMozumder · 2025-04-14T18:25:32Z

Encountering an issue related with @lur_cache at get_inherited_node.
This function returns some old stored values rather than recalculating it. Beause of this behavior, get_inherited_node does not return full list of the base class (skips NXresolution).

'WARNING: NXDL attribute `type` not found for group /ENTRY[entry]/INSTRUMENT[instrument]/RESOLUTION[wavelength_resolution].\nHint: Follow the convention `fixednameVARIACKPART[fixedname_given_name]` with exact group name `fixednameVARIACKPART`in NXDL file.'

The error is from raman reader. Though the group is available in NXoptical_spcetroscopy and the group was not been written in raman nexus file. I have seen such misbehavior from lru_cache earlier.

@lukaspie, currently I have a few solutions in my mind.

To liberal convert_nexus caplog level and set it to Error
Remove the warning from the writer
Deactivate the @lru_cache in dev_tools in nexus_definitions (I will never go for that)

Both solutions 1 and 2 look more or less similar. But I would prefer solution 1, because tests are only for us to prevent unwanted changes from developers. But, warning is important for us and users equally.

Let's discuss it tomorrow.

rettigl · 2025-04-14T21:01:51Z

Encountering an issue related with @lur_cache at get_inherited_node. This function returns some old stored values rather than recalculating it. Beause of this behavior, get_inherited_node does not return full list of the base class (skips NXresolution).
'WARNING: NXDL attribute `type` not found for group /ENTRY[entry]/INSTRUMENT[instrument]/RESOLUTION[wavelength_resolution].\nHint: Follow the convention `fixednameVARIACKPART[fixedname_given_name]` with exact group name `fixednameVARIACKPART`in NXDL file.'
The error is from raman reader. Though the group is available in NXoptical_spcetroscopy and the group was not been written in raman nexus file. I have seen such misbehavior from lru_cache earlier.

@lukaspie, currently I have a few solutions in my mind.
1. To liberal `convert_nexus` caplog level and set it to `Error`

2. Remove the warning from the writer

3. Deactivate the `@lru_cache` in `dev_tools` in `nexus_definitions` (I will never go for that)
Both solutions 1 and 2 look more or less similar. But I would prefer solution 1, because tests are only for us to prevent unwanted changes from developers. But, warning is important for us and users equally.

Let's discuss it tomorrow.

I believe this is not related to the lru_cache. Even if I disable it, I get the same behavior.
The issue arises, because NXresolution is not defined as a variadic name concept anywhere, but only the named concept wavelengh_resolution is defined in NXoptical_spectroscopy.
So (https://github.com/FAIRmat-NFDI/pynxtools/blob/599-undescriptive-error-message/src/pynxtools/dataconverter/writer.py#L222) returns /ENTRY/INSTRUMENT/RESOLUTION, which indeed is not defined in the tree.
This is quite related to #621 and will probably turn up in similar cases, where named concepts are written with concept definition in the template.
Changing RESOLUTION[wavelength_resolution] to wavelength_resolution in the raman rod reader fixes the conversion issue (validation fails then, though).
Note that the reference file does not contain a class definition for this group:

So the problem is indeed one we need to fix @lukaspie

RubelMozumder · 2025-04-15T09:43:54Z

Encountering an issue related with @lur_cache at get_inherited_node. This function returns some old stored values rather than recalculating it. Beause of this behavior, get_inherited_node does not return full list of the base class (skips NXresolution).
'WARNING: NXDL attribute `type` not found for group /ENTRY[entry]/INSTRUMENT[instrument]/RESOLUTION[wavelength_resolution].\nHint: Follow the convention `fixednameVARIACKPART[fixedname_given_name]` with exact group name `fixednameVARIACKPART`in NXDL file.'
The error is from raman reader. Though the group is available in NXoptical_spcetroscopy and the group was not been written in raman nexus file. I have seen such misbehavior from lru_cache earlier.
@lukaspie, currently I have a few solutions in my mind.
1. To liberal `convert_nexus` caplog level and set it to `Error`

2. Remove the warning from the writer

3. Deactivate the `@lru_cache` in `dev_tools` in `nexus_definitions` (I will never go for that)
Both solutions 1 and 2 look more or less similar. But I would prefer solution 1, because tests are only for us to prevent unwanted changes from developers. But, warning is important for us and users equally.
Let's discuss it tomorrow.
I believe this is not related to the lru_cache. Even if I disable it, I get the same behavior. The issue arises, because NXresolution is not defined as a variadic name concept anywhere, but only the named concept wavelengh_resolution is defined in NXoptical_spectroscopy. So (https://github.com/FAIRmat-NFDI/pynxtools/blob/599-undescriptive-error-message/src/pynxtools/dataconverter/writer.py#L222) returns /ENTRY/INSTRUMENT/RESOLUTION, which indeed is not defined in the tree. This is quite related to #621 and will probably turn up in similar cases, where named concepts are written with concept definition in the template. Changing RESOLUTION[wavelength_resolution] to wavelength_resolution in the raman rod reader fixes the conversion issue (validation fails then, though). Note that the reference file does not contain a class definition for this group: So the problem is indeed one we need to fix @lukaspie

Yeah, this error is coming from group RESOLUTION[wavelength_resolution]. Unfortunately, while debugging, I logically found the issue was coming from lru_cache, though I did not check it. I fixed the test data for raman reader.

RubelMozumder · 2025-04-25T10:21:15Z

@rettigl and @lukaspie would you go through it again?

lukaspie · 2025-04-25T14:03:57Z

src/pynxtools/dataconverter/writer.py

+            if path.endswith("/@units"):
+                dataset_or_grp.attrs["units"] = d_value
+            else:
+                dataset_or_grp.attrs[entry_name[1:]] = d_value


Why do they need to be separated? What do we gain from this? If we have so many changes to the code, it would be good to understand the reasoning for why these changes are made.

lukaspie · 2025-04-25T14:13:19Z

src/pynxtools/dataconverter/writer.py

+                    "NXDL attribute `type` not found for group %s.\n"
+                    "Hint: Follow the convention `fixednameVARIADICPART[fixedname_given_name]` "
+                    "or `non_variadic_group` where `fixednameVARIADICPART` is a variadic name "
+                    "and `non_variadic_group` is a fixed name of the group defined in NXDL file "
+                    "Or  no such group exists in NXDL file.",
+                    parent_path,
+                )


I still don't think that error message makes that much sense. The first case (i.e., not using fixednameVARIADICPART[fixedname_given_name] for a concept that has a renameable part) is now fixed by #621, so I don't think we will get to this point if you don't use it properly. Thus, that part of the error message can be removed. I think if you rebase on top of master, such a problem will not occur here anymore.

The actual issue is if we try to attach a field for a group that doesn't exist, but for which the same name is used as a field. For example, using "/ENTRY/entry_identifier/identifier" as a key. entry_identifier is a field with XML attribute deprecated in NXentry, but with this notation we are using it as if it were a named group. Obviously, a field cannot have a sub-field itself, so that will lead to problems. We should instead raise an error that we are trying to write a group with a name that is reserved for a field instead.

This is something that should be caught by the validation instead of in the writer IMO.

We should not allow giving a group some name that is defined as the name of an field/attribute on the same level in general. This may be technically allowed in NeXus, but will break the whole NOMAD integration (since groups are (sub)sections and fields are quantities).

Yeah, we can remove the error massage 👍, as this message mainly says the invalid group or misdefined group. This sort of wanrinig is part of the validation. Here, writer will write the nexus file, if anything is wrong with the writing process that should be handeled here. To check a group, a field or attribute has proper annotation is not part of the writer.

However, in a point I am a bit confused that is raising error wrting a group with the same name of attribute or field (which may come from mistyping, or just a regular error). In that case, group would not get proper nx_class name in such case NOMAD just will overlook that group in registering. Because, nomad can not match that group with any of the subsection in nomad metainfo.

We should not allow giving a group some name that is defined as the name of an field/attribute on the same level in general. This may be technically allowed in NeXus, but will break the whole NOMAD integration (since groups are (sub)sections and fields are quantities).

I agree, this creates the whole problem in the first place. We could consider to allow it to pass, but raise a strong and descriptive warning message.
With the current master, this still gets through validation with
WARNING: Field /ENTRY[entry]/entry_identifier/identifier written without documentation.
And no hint to the conflicting concept, and then breaks in the writer.

I would suggest to do three things:

Add a check for conflicting concepts to the validation with clear and descriptive warning messages

Catch any issue this might create in the writer (consider e.g. also a field of a name being defined, and the same name also as a group), also with clear and descriptive error messages

Any other changes to the code that don't relate to this specific problem I suggest to move to a different PR. It's very hard to review this way, because most changes you made don't relate to the problem as far as I can tell.

In the current master, you can actually provide something like:

"/ENTRY/identifier_entry": "@attrs:metadata/loader/scan_path", "/ENTRY/COLLECTION[entry_identifier]": { "identifier":"@attrs:metadata/loader/scan_path" },

And it breaks writing, but in a different fashion:

WARNING: Given field name 'COLLECTION' conflicts with the non-variadic name 'entry_identifier (opt)' WARNING: Field /ENTRY[entry]/COLLECTION[entry_identifier]/identifier written without documentation.

--------------------------------------------------------------------------- ValueError Traceback (most recent call last) File /mnt/pcshare/users/Laurenz/AreaB/transformations_testing/pynxtools/src/pynxtools/dataconverter/writer.py:294, in Writer._put_data_into_hdf5(self) 293 else: --> 294 dataset = grp.create_dataset(entry_name, data=data) 295 except InvalidDictProvided as exc: File /mnt/pcshare/users/Laurenz/AreaB/transformations_testing/.venv/lib/python3.10/site-packages/h5py/_hl/group.py:183, in Group.create_dataset(self, name, shape, dtype, data, **kwds) 181 group = self.require_group(parent_path) --> 183 dsid = dataset.make_new_dset(group, shape, dtype, data, name, **kwds) 184 dset = dataset.Dataset(dsid) File /mnt/pcshare/users/Laurenz/AreaB/transformations_testing/.venv/lib/python3.10/site-packages/h5py/_hl/dataset.py:165, in make_new_dset(parent, shape, dtype, data, name, chunks, compression, shuffle, fletcher32, maxshape, compression_opts, fillvalue, scaleoffset, track_times, external, track_order, dcpl, dapl, efile_prefix, virtual_prefix, allow_unknown_filter, rdcc_nslots, rdcc_nbytes, rdcc_w0, fill_time) 163 sid = h5s.create_simple(shape, maxshape) --> 165 dset_id = h5d.create(parent.id, name, tid, sid, dcpl=dcpl, dapl=dapl) 167 if (data is not None) and (not isinstance(data, Empty)): File h5py/_objects.pyx:54, in h5py._objects.with_phil.wrapper() File h5py/_objects.pyx:55, in h5py._objects.with_phil.wrapper() File h5py/h5d.pyx:136, in h5py.h5d.create() ValueError: Unable to synchronously create dataset (name already exists) The above exception was the direct cause of the following exception: OSError Traceback (most recent call last) Cell In[17], line 9 2 res = sps.load_scan( 3 scan=scan, 4 metadata=metadata, 5 collect_metadata=True, 6 # iterations=[0, 3, 6], 7 ) 8 # Save scan ----> 9 sps.save(f"Scan{scan}.nxs") 10 # Some plots 11 if len(res.dims) == 2: File /mnt/pcshare/users/Laurenz/AreaB/transformations_testing/specsanalyzer/src/specsscan/core.py:668, in SpecsScan.save(self, faddr, **kwds) 665 if "eln_data" in kwds: 666 input_files.append(kwds.pop("eln_data")) --> 668 to_nexus( 669 data=self._result, 670 faddr=faddr, 671 reader=reader, 672 definition=definition, 673 input_files=input_files, 674 **kwds, 675 ) 677 else: 678 raise NotImplementedError( 679 f"Unrecognized file format: {extension}.", 680 ) File /mnt/pcshare/users/Laurenz/AreaB/transformations_testing/specsanalyzer/src/specsanalyzer/io.py:426, in to_nexus(data, faddr, reader, definition, input_files, **kwds) 423 else: 424 input_files = tuple(input_files) --> 426 convert( 427 input_file=input_files, 428 objects=(data), 429 reader=reader, 430 nxdl=definition, 431 output=faddr, 432 **kwds, 433 ) File /mnt/pcshare/users/Laurenz/AreaB/transformations_testing/pynxtools/src/pynxtools/dataconverter/convert.py:246, in convert(input_file, reader, nxdl, output, skip_verify, **kwargs) 236 data = transfer_data_into_template( 237 input_file=input_file, 238 reader=reader, (...) 242 **kwargs, 243 ) 245 helpers.add_default_root_attributes(data=data, filename=os.path.basename(output)) --> 246 Writer(data=data, nxdl_f_path=nxdl_f_path, output_path=output).write() 248 logger.info(f"The output file generated: {output}.") File /mnt/pcshare/users/Laurenz/AreaB/transformations_testing/pynxtools/src/pynxtools/dataconverter/writer.py:342, in Writer.write(self) 340 """Writes the NeXus file with previously validated data from the reader with NXDL attrs.""" 341 try: --> 342 self._put_data_into_hdf5() 343 finally: 344 self.output_nexus.close() File /mnt/pcshare/users/Laurenz/AreaB/transformations_testing/pynxtools/src/pynxtools/dataconverter/writer.py:298, in Writer._put_data_into_hdf5(self) 296 print(str(exc)) 297 except Exception as exc: --> 298 raise IOError( 299 f"Unknown error occured writing the path: {path} " 300 f"with the following message: {str(exc)}" 301 ) from exc 303 for links in hdf5_links_for_later: 304 dataset = handle_dicts_entries(*links) OSError: Unknown error occured writing the path: /ENTRY[entry]/entry_identifier with the following message: Unable to synchronously create dataset (name already exists)

WARNING: Given field name 'COLLECTION' conflicts with the non-variadic name 'entry_identifier (opt)'

This is already correctly throwing a warning (should be an error though), but it should probably tell you that one is a field (defined and deprcated in `NXentry´) and the other is a new group.

We should not allow giving a group some name that is defined as the name of an field/attribute on the same level in general. This may be technically allowed in NeXus, but will break the whole NOMAD integration (since groups are (sub)sections and fields are quantities).

I agree, this creates the whole problem in the first place. We could consider to allow it to pass, but raise a strong and descriptive warning message. With the current master, this still gets through validation with WARNING: Field /ENTRY[entry]/entry_identifier/identifier written without documentation. And no hint to the conflicting concept, and then breaks in the writer.

I would suggest to do three things:

Add a check for conflicting concepts to the validation with clear and descriptive warning messages

Catch any issue this might create in the writer (consider e.g. also a field of a name being defined, and the same name also as a group), also with clear and descriptive error messages

Any other changes to the code that don't relate to this specific problem I suggest to move to a different PR. It's very hard to review this way, because most changes you made don't relate to the problem as far as I can tell.

Yeah, I can create two PRs referring to this PR.

I think both of the warnings are properly described the issue here.
E.g., the warning WARNING: Field /ENTRY[entry]/entry_identifier/identifier written without documentation says the correct warning. Indeed, there is not such group entry_identifier and even no such field.

We should not allow this in the first place if there is also a field called entry_identifier defined at the same level.

RubelMozumder · 2025-05-20T13:57:45Z

Handled in PR #638.

RubelMozumder linked an issue Apr 10, 2025 that may be closed by this pull request

Undescriptive error message #599

Closed

RubelMozumder mentioned this pull request Apr 10, 2025

Stage where getting error FAIRmat-NFDI/pynxtools-spm#28

Draft

RubelMozumder changed the title ~~remove unused import.~~ Undescriptive error message for undefined grp Apr 13, 2025

RubelMozumder changed the title ~~Undescriptive error message for undefined grp~~ Undescriptive error message for undefined group Apr 13, 2025

RubelMozumder requested a review from sherjeelshabih April 14, 2025 07:28

rettigl requested changes Apr 14, 2025

View reviewed changes

RubelMozumder force-pushed the 599-undescriptive-error-message branch 2 times, most recently from a745cf6 to 3dd0792 Compare April 14, 2025 11:18

RubelMozumder marked this pull request as ready for review April 14, 2025 15:27

RubelMozumder force-pushed the 599-undescriptive-error-message branch 2 times, most recently from d38a706 to e829dd8 Compare April 22, 2025 15:50

RubelMozumder added 11 commits April 25, 2025 11:59

remove unused import.

ff1506e

Add proper warning massage for incorrect group symantic.

d9218ff

update the warning massage.

3ed3e57

Tests are passing.

15d7217

Polish warning massage.

b4166c5

Remove the check.

315664e

Remove uncessary variable.

26bbd0c

convert to warning.

f53a43f

Add file variable and use get instead of try block.

b94e99d

Tiny update.

4ede930

typo

907cf01

RubelMozumder force-pushed the 599-undescriptive-error-message branch from e829dd8 to 907cf01 Compare April 25, 2025 09:59

Warning massage.

40dbe2f

lukaspie reviewed Apr 25, 2025

View reviewed changes

RubelMozumder mentioned this pull request May 7, 2025

Add proper warning. #636

Closed

4 tasks

RubelMozumder closed this May 20, 2025

RubelMozumder deleted the 599-undescriptive-error-message branch May 20, 2025 13:57

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Undescriptive error message for undefined group #623

Undescriptive error message for undefined group #623

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

		if "shape" in data.keys():
		if file is not None and "shape" in data.keys():

Undescriptive error message for undefined group #623

Undescriptive error message for undefined group #623

Uh oh!

Conversation

Uh oh!

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!