-
Notifications
You must be signed in to change notification settings - Fork 10
Undescriptive error message for undefined group #623
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
@rettigl, you can have a quick look at this PR. The current implementation will raise an error
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This unfortunately still don't fix the problem, that the writer fails.
In addition, the provided error message does not help to solve the problem.
if "shape" in data.keys(): | ||
if file is not None and "shape" in data.keys(): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't understand, how is this related to the issue?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removed!
try: | ||
grp.attrs["NX_class"] = attrs["type"] | ||
except KeyError as e: | ||
raise NxdlAttributeNotFoundError( | ||
f"NXDL attribute `type` not found for group {parent_path}.\n" | ||
f"Hint: Follow the convention `fixednameVARIACKPART[fixedname_given_name]` " | ||
f"with exact group name `fixednameVARIACKPART`in NXDL file." | ||
) from e | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I still don't find this error message helpful. Why is an error raised in the first place?
entry_identifier is now defined as a field in NXentry, while the writer wants to create a group.
This should ideally already be marked as definition violation by the validation routine (I think #621 fixes this), but the writer should just write what it gets. If the user wants to write a group with a name of a defined field, the writer should still do it, I would say, and not fail.
Your error message still does not help me solve the underlying problem as user.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That also came in my mind, writer will write the nexus file even the fields and groups are not defined at all.
If this is the case, we may write GROUP_NOT_FOUND
something like that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
entry_identifier is now defined as a field in NXentry, while the writer wants to create a group. This should ideally already be marked as definition violation by the validation routine (I think #621 fixes this), but the writer should just write what it gets.
Fully agree on this, the writer should not need to have this sort of information. In the validation, we really shouldn't allow groups to be called the same as another non-variadic field, this will also be very tricky to handle in NOMAD (see #437). This should be caught and at least a warning should be raised. The question is then if the conversion should fail with an error or if we just write an undocumented group instead. For such a group, we can't define an NX_CLASS
attribute (since this new group doesn't fit to any defined class). You can write valid NeXus files without the NX_CLASS
attribute though (see nexusformat/definitions#1493), so we should maybe just warn in the writer that this attribute doesn't exist, but continue writing the file.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure. I converted that error in warning massage. So, It will neither write the type
attribute in nexus file nor stop writing the nexus file.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The message you are throwing still doesn't make any sense for what is the actual problem though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What I understand, the actual problem is the entry_identifier
(or any other group) does not have any group type, whether it is a group of variadic name type or an undefined group.
E.g. if someone writes
.../OBJECT[entry_identifier]/...
or .../COLLECTION[entry_identifier]/...
which follows the rule of annotating a group of NXobject
OR NXcollection
type and correct convention.
The issue #599 is an edge case that uses an old format of template
key without any group type. Although, all proper warnings come from the validation step.
I see this warning is an extra warning, which is pretty much not needed, in any case, if the validator
is unable to detect it.
Or are there some extra points currently missing?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The actual issue is that entry_identifier is defined here https://github.com/FAIRmat-NFDI/nexus_definitions/blob/297e81c8188434403750f7e1cb98fbbfec6c5d8f/base_classes/NXentry.nxdl.xml#L130 with a deprication attribute. Thus, attributes is defined, yet does not contain type, thus the error.
Properly, we should check for the existence of type in attribute.keys before accessing it. Then we also don't need an error or a try/catch block.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, I see what you mean now. This will indeed be solved by #621
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Addressed! The try block is removed.
if path.endswith("/@units"): | ||
dataset_or_grp.attrs["units"] = d_value | ||
else: | ||
dataset_or_grp.attrs[entry_name[1:]] = d_value |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What issue do these changes solve? I don't understand it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It does not solve anything. This separates the concept handling of group and field
from that of attributes and units
.
I also mentioned that in the PR description.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do they need to be separated? What do we gain from this? If we have so many changes to the code, it would be good to understand the reasoning for why these changes are made.
a745cf6
to
3dd0792
Compare
Encountering an issue related with @lur_cache at get_inherited_node.
The error is from raman reader. Though the group is available in NXoptical_spcetroscopy and the group was not been written in raman nexus file. I have seen such misbehavior from @lukaspie, currently I have a few solutions in my mind.
Both solutions 1 and 2 look more or less similar. But I would prefer solution 1, because tests are only for us to prevent unwanted changes from developers. But, warning is important for us and users equally. Let's discuss it tomorrow. |
I believe this is not related to the lru_cache. Even if I disable it, I get the same behavior. |
Yeah, this error is coming from group |
d38a706
to
e829dd8
Compare
e829dd8
to
907cf01
Compare
if path.endswith("/@units"): | ||
dataset_or_grp.attrs["units"] = d_value | ||
else: | ||
dataset_or_grp.attrs[entry_name[1:]] = d_value |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do they need to be separated? What do we gain from this? If we have so many changes to the code, it would be good to understand the reasoning for why these changes are made.
"NXDL attribute `type` not found for group %s.\n" | ||
"Hint: Follow the convention `fixednameVARIADICPART[fixedname_given_name]` " | ||
"or `non_variadic_group` where `fixednameVARIADICPART` is a variadic name " | ||
"and `non_variadic_group` is a fixed name of the group defined in NXDL file " | ||
"Or no such group exists in NXDL file.", | ||
parent_path, | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I still don't think that error message makes that much sense. The first case (i.e., not using fixednameVARIADICPART[fixedname_given_name]
for a concept that has a renameable part) is now fixed by #621, so I don't think we will get to this point if you don't use it properly. Thus, that part of the error message can be removed. I think if you rebase on top of master, such a problem will not occur here anymore.
The actual issue is if we try to attach a field for a group that doesn't exist, but for which the same name is used as a field. For example, using "/ENTRY/entry_identifier/identifier"
as a key. entry_identifier
is a field with XML attribute deprecated
in NXentry
, but with this notation we are using it as if it were a named group. Obviously, a field cannot have a sub-field itself, so that will lead to problems. We should instead raise an error that we are trying to write a group with a name that is reserved for a field instead.
This is something that should be caught by the validation instead of in the writer IMO.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should not allow giving a group some name that is defined as the name of an field/attribute on the same level in general. This may be technically allowed in NeXus, but will break the whole NOMAD integration (since groups are (sub)sections and fields are quantities).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, we can remove the error massage 👍, as this message mainly says the invalid
group or misdefined
group. This sort of wanrinig is part of the validation. Here, writer will write the nexus file, if anything is wrong with the writing process that should be handeled here. To check a group, a field or attribute has proper annotation is not part of the writer.
However, in a point I am a bit confused that is raising error wrting a group with the same name of attribute
or field
(which may come from mistyping, or just a regular error). In that case, group
would not get proper nx_class
name in such case NOMAD just will overlook that group in registering. Because, nomad can not match that group with any of the subsection in nomad metainfo.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should not allow giving a group some name that is defined as the name of an field/attribute on the same level in general. This may be technically allowed in NeXus, but will break the whole NOMAD integration (since groups are (sub)sections and fields are quantities).
I agree, this creates the whole problem in the first place. We could consider to allow it to pass, but raise a strong and descriptive warning message.
With the current master, this still gets through validation with
WARNING: Field /ENTRY[entry]/entry_identifier/identifier written without documentation.
And no hint to the conflicting concept, and then breaks in the writer.
I would suggest to do three things:
- Add a check for conflicting concepts to the validation with clear and descriptive warning messages
- Catch any issue this might create in the writer (consider e.g. also a field of a name being defined, and the same name also as a group), also with clear and descriptive error messages
- Any other changes to the code that don't relate to this specific problem I suggest to move to a different PR. It's very hard to review this way, because most changes you made don't relate to the problem as far as I can tell.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the current master, you can actually provide something like:
"/ENTRY/identifier_entry": "@attrs:metadata/loader/scan_path",
"/ENTRY/COLLECTION[entry_identifier]": {
"identifier":"@attrs:metadata/loader/scan_path"
},
And it breaks writing, but in a different fashion:
WARNING: Given field name 'COLLECTION' conflicts with the non-variadic name 'entry_identifier (opt)'
WARNING: Field /ENTRY[entry]/COLLECTION[entry_identifier]/identifier written without documentation.
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
File /mnt/pcshare/users/Laurenz/AreaB/transformations_testing/pynxtools/src/pynxtools/dataconverter/writer.py:294, in Writer._put_data_into_hdf5(self)
293 else:
--> 294 dataset = grp.create_dataset(entry_name, data=data)
295 except InvalidDictProvided as exc:
File /mnt/pcshare/users/Laurenz/AreaB/transformations_testing/.venv/lib/python3.10/site-packages/h5py/_hl/group.py:183, in Group.create_dataset(self, name, shape, dtype, data, **kwds)
181 group = self.require_group(parent_path)
--> 183 dsid = dataset.make_new_dset(group, shape, dtype, data, name, **kwds)
184 dset = dataset.Dataset(dsid)
File /mnt/pcshare/users/Laurenz/AreaB/transformations_testing/.venv/lib/python3.10/site-packages/h5py/_hl/dataset.py:165, in make_new_dset(parent, shape, dtype, data, name, chunks, compression, shuffle, fletcher32, maxshape, compression_opts, fillvalue, scaleoffset, track_times, external, track_order, dcpl, dapl, efile_prefix, virtual_prefix, allow_unknown_filter, rdcc_nslots, rdcc_nbytes, rdcc_w0, fill_time)
163 sid = h5s.create_simple(shape, maxshape)
--> 165 dset_id = h5d.create(parent.id, name, tid, sid, dcpl=dcpl, dapl=dapl)
167 if (data is not None) and (not isinstance(data, Empty)):
File h5py/_objects.pyx:54, in h5py._objects.with_phil.wrapper()
File h5py/_objects.pyx:55, in h5py._objects.with_phil.wrapper()
File h5py/h5d.pyx:136, in h5py.h5d.create()
ValueError: Unable to synchronously create dataset (name already exists)
The above exception was the direct cause of the following exception:
OSError Traceback (most recent call last)
Cell In[17], line 9
2 res = sps.load_scan(
3 scan=scan,
4 metadata=metadata,
5 collect_metadata=True,
6 # iterations=[0, 3, 6],
7 )
8 # Save scan
----> 9 sps.save(f"Scan{scan}.nxs")
10 # Some plots
11 if len(res.dims) == 2:
File /mnt/pcshare/users/Laurenz/AreaB/transformations_testing/specsanalyzer/src/specsscan/core.py:668, in SpecsScan.save(self, faddr, **kwds)
665 if "eln_data" in kwds:
666 input_files.append(kwds.pop("eln_data"))
--> 668 to_nexus(
669 data=self._result,
670 faddr=faddr,
671 reader=reader,
672 definition=definition,
673 input_files=input_files,
674 **kwds,
675 )
677 else:
678 raise NotImplementedError(
679 f"Unrecognized file format: {extension}.",
680 )
File /mnt/pcshare/users/Laurenz/AreaB/transformations_testing/specsanalyzer/src/specsanalyzer/io.py:426, in to_nexus(data, faddr, reader, definition, input_files, **kwds)
423 else:
424 input_files = tuple(input_files)
--> 426 convert(
427 input_file=input_files,
428 objects=(data),
429 reader=reader,
430 nxdl=definition,
431 output=faddr,
432 **kwds,
433 )
File /mnt/pcshare/users/Laurenz/AreaB/transformations_testing/pynxtools/src/pynxtools/dataconverter/convert.py:246, in convert(input_file, reader, nxdl, output, skip_verify, **kwargs)
236 data = transfer_data_into_template(
237 input_file=input_file,
238 reader=reader,
(...)
242 **kwargs,
243 )
245 helpers.add_default_root_attributes(data=data, filename=os.path.basename(output))
--> 246 Writer(data=data, nxdl_f_path=nxdl_f_path, output_path=output).write()
248 logger.info(f"The output file generated: {output}.")
File /mnt/pcshare/users/Laurenz/AreaB/transformations_testing/pynxtools/src/pynxtools/dataconverter/writer.py:342, in Writer.write(self)
340 """Writes the NeXus file with previously validated data from the reader with NXDL attrs."""
341 try:
--> 342 self._put_data_into_hdf5()
343 finally:
344 self.output_nexus.close()
File /mnt/pcshare/users/Laurenz/AreaB/transformations_testing/pynxtools/src/pynxtools/dataconverter/writer.py:298, in Writer._put_data_into_hdf5(self)
296 print(str(exc))
297 except Exception as exc:
--> 298 raise IOError(
299 f"Unknown error occured writing the path: {path} "
300 f"with the following message: {str(exc)}"
301 ) from exc
303 for links in hdf5_links_for_later:
304 dataset = handle_dicts_entries(*links)
OSError: Unknown error occured writing the path: /ENTRY[entry]/entry_identifier with the following message: Unable to synchronously create dataset (name already exists)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
WARNING: Given field name 'COLLECTION' conflicts with the non-variadic name 'entry_identifier (opt)'
This is already correctly throwing a warning (should be an error though), but it should probably tell you that one is a field (defined and deprcated in `NXentry´) and the other is a new group.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should not allow giving a group some name that is defined as the name of an field/attribute on the same level in general. This may be technically allowed in NeXus, but will break the whole NOMAD integration (since groups are (sub)sections and fields are quantities).
I agree, this creates the whole problem in the first place. We could consider to allow it to pass, but raise a strong and descriptive warning message. With the current master, this still gets through validation with
WARNING: Field /ENTRY[entry]/entry_identifier/identifier written without documentation.
And no hint to the conflicting concept, and then breaks in the writer.I would suggest to do three things:
- Add a check for conflicting concepts to the validation with clear and descriptive warning messages
- Catch any issue this might create in the writer (consider e.g. also a field of a name being defined, and the same name also as a group), also with clear and descriptive error messages
- Any other changes to the code that don't relate to this specific problem I suggest to move to a different PR. It's very hard to review this way, because most changes you made don't relate to the problem as far as I can tell.
Yeah, I can create two PRs referring to this PR.
I think both of the warnings are properly described the issue here.
E.g., the warning WARNING: Field /ENTRY[entry]/entry_identifier/identifier written without documentation
says the correct warning. Indeed, there is not such group entry_identifier
and even no such field.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should not allow this in the first place if there is also a field called entry_identifier
defined at the same level.
Handled in PR #638. |
This PR address issue #599
_put_data_into_hdf5
of write.py