8000 MPES reader changes from mpes refactoring by domna · Pull Request #203 · FAIRmat-NFDI/pynxtools · GitHub
[go: up one dir, main page]
More Web Proxy on the site http://driver.im/
Skip to content

MPES reader cha 10000 nges from mpes refactoring #203

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 42 commits into from
Jan 22, 2024
Merged

MPES reader changes from mpes refactoring #203

merged 42 commits into from
Jan 22, 2024

Conversation

domna
Copy link
Collaborator
@domna domna commented Jan 8, 2024

This

Fixes #210, Fixes #212

@domna domna changed the title MPES reader changes from the mpes refactoring MPES reader changes from mpes refactoring Jan 8, 2024
@domna domna marked this pull request as draft January 16, 2024 10:18
@domna
Copy link
Collaborator Author
domna commented Jan 17, 2024

Hey @rettigl, if you like you can already review the config_file.json. This is finished. I still need to update the regression tests and there are still some bugs in pynxtools (#210, #212) , which we need to fix before all tests pass.

I introduced a new feature that the config file can be written in a nested fashion, because I thought this is much easier to read and maintain (and we could easier show an "instance" of data at the workshop). I also added support for *{a,b,c} syntax, which basically replaces all * by the list in brackets, here a, b and c. You can see an example for the lenses, where it just expands for each of the lens labels. I checked the function such that it reproduces the old config, so we should be good. Links are now written as @link:<target> instead of a dict, as this is also more consistent with the file referencing. Except for the new link syntax everything should be compatible to the old flat json notation.

@domna domna requested a review from rettigl January 17, 2024 17:14
@coveralls
Copy link
coveralls commented Jan 18, 2024

Pull Request Test Coverage Report for Build 7585289570

  • 58 of 60 (96.67%) changed or added relevant lines in 6 files are covered.
  • 428 unchanged lines in 5 files lost coverage.
  • Overall coverage decreased (-3.7%) to 40.812%

Changes Missing Coverage Covered Lines Changed/Added Lines %
pynxtools/dataconverter/readers/utils.py 33 35 94.29%
Files with Coverage Reduction New Missed Lines %
pynxtools/dataconverter/readers/mpes/reader.py 3 83.33%
pynxtools/dataconverter/readers/xps/reader_utils.py 10 25.32%
pynxtools/dataconverter/readers/xps/file_parser.py 18 36.21%
pynxtools/dataconverter/readers/xps/reader.py 155 15.71%
pynxtools/dataconverter/readers/xps/xml/xml_specs.py 242 7.83%
Totals Coverage Status
Change from base Build 7567202073: -3.7%
Covered Lines: 4471
Relevant Lines: 10955

💛 - Coveralls

Copy link
Collaborator
@rettigl rettigl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Generally, this looks quite good to me. However, there are a lot of "NOT IN SCHEMA" entries in the log files, some of them new, some of them also old. I think this should be carefully checked if this is supposed to be like this...

If I run the conversion in the tutorials E1 notebook, with definitions on "fairmat" branch, I get the following errors:

The data entry corresponding to /ENTRY[entry]/INSTRUMENT[instrument]/ELECTRONANALYSER[electronanalyser]/ENERGYDISPERSION[energydispersion]/energy_scan_mode is required and hasn't been supplied by the reader.
[info]: Path /ENTRY[entry]/PROCESS[process]/energy_calibration/coefficients not found. Skipping the entry.
The required group, /ENTRY[entry]/SAMPLE[sample]/bias/potentiostat, hasn't been supplied while its optional parent, /ENTRY[entry]/SAMPLE[sample]/bias, is supplied.

Still, the file is being created.
If I run with "mpes-fixes" branch, I get:

The data entry corresponding to /ENTRY[entry]/INSTRUMENT[instrument]/ELECTRONANALYSER[electronanalyser]/ENERGYDISPERSION[energydispersion]/energy_scan_mode is required and hasn't been supplied by the reader.
[info]: Path /ENTRY[entry]/PROCESS[process]/energy_calibration/coefficients not found. Skipping the entry.

@rettigl
Copy link
Collaborator
rettigl commented Jan 18, 2024

Conversion in E2 fails with:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In[36], [line 1](vscode-notebook-cell:?execution_count=36&line=1)
----> [1](vscode-notebook-cell:?execution_count=36&line=1) convert(input_file=["config_file.json", "WSe2_eln.yaml"],
      [2](vscode-notebook-cell:?execution_count=36&line=2)         objects=res_xarray,
      [3](vscode-notebook-cell:?execution_count=36&line=3)         reader='mpes',
      [4](vscode-notebook-cell:?execution_count=36&line=4)         nxdl='NXmpes',
      [5](vscode-notebook-cell:?execution_count=36&line=5)         output='WSe2.mpes.nxs')

File /mnt/pcshare/users/Laurenz/AreaB/pynxtools/pynxtools/pynxtools/dataconverter/convert.py:269, in convert(input_file, reader, nxdl, output, generate_template, fair, undocumented, **kwargs)
    [264](https://vscode-remote+ssh-002dremote-002bpcr840-002egnz-002empg-002ede.vscode-resource.vscode-cdn.net/mnt/pcshare/users/Laurenz/AreaB/pynxtools/pynxtools/pynxtools/dataconverter/convert.py:264)         logger.info(
    [265](https://vscode-remote+ssh-002dremote-002bpcr840-002egnz-002empg-002ede.vscode-resource.vscode-cdn.net/mnt/pcshare/users/Laurenz/AreaB/pynxtools/pynxtools/pynxtools/dataconverter/convert.py:265)             f"NO DOCUMENTATION: The path, {path}, is being written but has no documentation."
    [266](https://vscode-remote+ssh-002dremote-002bpcr840-002egnz-002empg-002ede.vscode-resource.vscode-cdn.net/mnt/pcshare/users/Laurenz/AreaB/pynxtools/pynxtools/pynxtools/dataconverter/convert.py:266)         )
    [268](https://vscode-remote+ssh-002dremote-002bpcr840-002egnz-002empg-002ede.vscode-resource.vscode-cdn.net/mnt/pcshare/users/Laurenz/AreaB/pynxtools/pynxtools/pynxtools/dataconverter/convert.py:268) helpers.add_default_root_attributes(data=data, filename=os.path.basename(output))
--> [269](https://vscode-remote+ssh-002dremote-002bpcr840-002egnz-002empg-002ede.vscode-resource.vscode-cdn.net/mnt/pcshare/users/Laurenz/AreaB/pynxtools/pynxtools/pynxtools/dataconverter/convert.py:269) Writer(data=data, nxdl_f_path=nxdl_f_path, output_path=output).write()
    [271](https://vscode-remote+ssh-002dremote-002bpcr840-002egnz-002empg-002ede.vscode-resource.vscode-cdn.net/mnt/pcshare/users/Laurenz/AreaB/pynxtools/pynxtools/pynxtools/dataconverter/convert.py:271) logger.info(f"The output file generated: {output}.")

File /mnt/pcshare/users/Laurenz/AreaB/pynxtools/pynxtools/pynxtools/dataconverter/writer.py:339, in Writer.write(self)
    [337](https://vscode-remote+ssh-002dremote-002bpcr840-002egnz-002empg-002ede.vscode-resource.vscode-cdn.net/mnt/pcshare/users/Laurenz/AreaB/pynxtools/pynxtools/pynxtools/dataconverter/writer.py:337) """Writes the NeXus file with previously validated data from the reader with NXDL attrs."""
    [338](https://vscode-remote+ssh-002dremote-002bpcr840-002egnz-002empg-002ede.vscode-resource.vscode-cdn.net/mnt/pcshare/users/Laurenz/AreaB/pynxtools/pynxtools/pynxtools/dataconverter/writer.py:338) try:
--> [339](https://vscode-remote+ssh-002dremote-002bpcr840-002egnz-002empg-002ede.vscode-resource.vscode-cdn.net/mnt/pcshare/users/Laurenz/AreaB/pynxtools/pynxtools/pynxtools/dataconverter/writer.py:339)     self._put_data_into_hdf5()
    [340](https://vscode-remote+ssh-002dremote-002bpcr840-002egnz-002empg-002ede.vscode-resource.vscode-cdn.net/mnt/pcshare/users/Laurenz/AreaB/pynxtools/pynxtools/pynxtools/dataconverter/writer.py:340) finally:
    [341](https://vscode-remote+ssh-002dremote-002bpcr840-002egnz-002empg-002ede.vscode-resource.vscode-cdn.net/mnt/pcshare/users/Laurenz/AreaB/pynxtools/pynxtools/pynxtools/dataconverter/writer.py:341)     self.output_nexus.close()

File /mnt/pcshare/users/Laurenz/AreaB/pynxtools/pynxtools/pynxtools/dataconverter/writer.py:302, in Writer._put_data_into_hdf5(self)
    [296](https://vscode-remote+ssh-002dremote-002bpcr840-002egnz-002empg-002ede.vscode-resource.vscode-cdn.net/mnt/pcshare/users/Laurenz/AreaB/pynxtools/pynxtools/pynxtools/dataconverter/writer.py:296)         raise IOError(
    [297](https://vscode-remote+ssh-002dremote-002bpcr840-002egnz-002empg-002ede.vscode-resource.vscode-cdn.net/mnt/pcshare/users/Laurenz/AreaB/pynxtools/pynxtools/pynxtools/dataconverter/writer.py:297)             f"Unknown error occured writing the path: {path} "
    [298](https://vscode-remote+ssh-002dremote-002bpcr840-002egnz-002empg-002ede.vscode-resource.vscode-cdn.net/mnt/pcshare/users/Laurenz/AreaB/pynxtools/pynxtools/pynxtools/dataconverter/writer.py:298)             f"with the following message: {str(exc)}"
    [299](https://vscode-remote+ssh-002dremote-002bpcr840-002egnz-002empg-002ede.vscode-resource.vscode-cdn.net/mnt/pcshare/users/Laurenz/AreaB/pynxtools/pynxtools/pynxtools/dataconverter/writer.py:299)         ) from exc
    [301](https://vscode-remote+ssh-002dremote-002bpcr840-002egnz-002empg-002ede.vscode-resource.vscode-cdn.net/mnt/pcshare/users/Laurenz/AreaB/pynxtools/pynxtools/pynxtools/dataconverter/writer.py:301) for links in hdf5_links_for_later:
--> [302](https://vscode-remote+ssh-002dremote-002bpcr840-002egnz-002empg-002ede.vscode-resource.vscode-cdn.net/mnt/pcshare/users/Laurenz/AreaB/pynxtools/pynxtools/pynxtools/dataconverter/writer.py:302)     dataset = handle_dicts_entries(*links)
    [303](https://vscode-remote+ssh-002dremote-002bpcr840-002egnz-002empg-002ede.vscode-resource.vscode-cdn.net/mnt/pcshare/users/Laurenz/AreaB/pynxtools/pynxtools/pynxtools/dataconverter/writer.py:303)     if dataset is None:
    [304](https://vscode-remote+ssh-002dremote-002bpcr840-002egnz-002empg-002ede.vscode-resource.vscode-cdn.net/mnt/pcshare/users/Laurenz/AreaB/pynxtools/pynxtools/pynxtools/dataconverter/writer.py:304)         # If target of a link is invalid to be linked
    [305](https://vscode-remote+ssh-002dremote-002bpcr840-002egnz-002empg-002ede.vscode-resource.vscode-cdn.net/mnt/pcshare/users/Laurenz/AreaB/pynxtools/pynxtools/pynxtools/dataconverter/writer.py:305)         del self.data[links[-1]]

File /mnt/pcshare/users/Laurenz/AreaB/pynxtools/pynxtools/pynxtools/dataconverter/writer.py:146, in handle_dicts_entries(data, grp, entry_name, output_path, path)
    [144](https://vscode-remote+ssh-002dremote-002bpcr840-002egnz-002empg-002ede.vscode-resource.vscode-cdn.net/mnt/pcshare/users/Laurenz/AreaB/pynxtools/pynxtools/pynxtools/dataconverter/writer.py:144) elif "link" in data.keys():
    [145](https://vscode-remote+ssh-002dremote-002bpcr840-002egnz-002empg-002ede.vscode-resource.vscode-cdn.net/mnt/pcshare/users/Laurenz/AreaB/pynxtools/pynxtools/pynxtools/dataconverter/writer.py:145)     if ":/" not in data["link"]:
--> [146](https://vscode-remote+ssh-002dremote-002bpcr840-002egnz-002empg-002ede.vscode-resource.vscode-cdn.net/mnt/pcshare/users/Laurenz/AreaB/pynxtools/pynxtools/pynxtools/dataconverter/writer.py:146)         grp[entry_name] = h5py.SoftLink(path)  # internal link
    [147](https://vscode-remote+ssh-002dremote-002bpcr840-002egnz-002empg-002ede.vscode-resource.vscode-cdn.net/mnt/pcshare/users/Laurenz/AreaB/pynxtools/pynxtools/pynxtools/dataconverter/writer.py:147)     else:
    [148](https://vscode-remote+ssh-002dremote-002bpcr840-002egnz-002empg-002ede.vscode-resource.vscode-cdn.net/mnt/pcshare/users/Laurenz/AreaB/pynxtools/pynxtools/pynxtools/dataconverter/writer.py:148)         grp[entry_name] = h5py.ExternalLink(file, path)  # external link

File h5py/_objects.pyx:54, in h5py._objects.with_phil.wrapper()

File h5py/_objects.pyx:55, in h5py._objects.with_phil.wrapper()

File /mnt/pcshare/users/Laurenz/AreaB/pynxtools/.pyenv/lib/python3.8/site-packages/h5py/_hl/dataset.py:929, in Dataset.__setitem__(self, args, val)
    [925](https://vscode-remote+ssh-002dremote-002bpcr840-002egnz-002empg-002ede.vscode-resource.vscode-cdn.net/mnt/pcshare/users/Laurenz/AreaB/pynxtools/.pyenv/lib/python3.8/site-packages/h5py/_hl/dataset.py:925) else:
    [926](https://vscode-remote+ssh-002dremote-002bpcr840-002egnz-002empg-002ede.vscode-resource.vscode-cdn.net/mnt/pcshare/users/Laurenz/AreaB/pynxtools/.pyenv/lib/python3.8/site-packages/h5py/_hl/dataset.py:926)     # If the input data is already an array, let HDF5 do the conversion.
    [927](https://vscode-remote+ssh-002dremote-002bpcr840-002egnz-002empg-002ede.vscode-resource.vscode-cdn.net/mnt/pcshare/users/Laurenz/AreaB/pynxtools/.pyenv/lib/python3.8/site-packages/h5py/_hl/dataset.py:927)     # If it's a list or similar, don't make numpy guess a dtype for it.
    [928](https://vscode-remote+ssh-002dremote-002bpcr840-002egnz-002empg-002ede.vscode-resource.vscode-cdn.net/mnt/pcshare/users/Laurenz/AreaB/pynxtools/.pyenv/lib/python3.8/site-packages/h5py/_hl/dataset.py:928)     dt = None if isinstance(val, numpy.ndarray) else self.dtype.base
--> [929](https://vscode-remote+ssh-002dremote-002bpcr840-002egnz-002empg-002ede.vscode-resource.vscode-cdn.net/mnt/pcshare/users/Laurenz/AreaB/pynxtools/.pyenv/lib/python3.8/site-packages/h5py/_hl/dataset.py:929)     val = numpy.asarray(val, order='C', dtype=dt)
    [931](https://vscode-remote+ssh-002dremote-002bpcr840-002egnz-002empg-002ede.vscode-resource.vscode-cdn.net/mnt/pcshare/users/Laurenz/AreaB/pynxtools/.pyenv/lib/python3.8/site-packages/h5py/_hl/dataset.py:931) # Check for array dtype compatibility and convert
    [932](https://vscode-remote+ssh-002dremote-002bpcr840-002egnz-002empg-002ede.vscode-resource.vscode-cdn.net/mnt/pcshare/users/Laurenz/AreaB/pynxtools/.pyenv/lib/python3.8/site-packages/h5py/_hl/dataset.py:932) if self.dtype.subdtype is not None:

TypeError: float() argument must be a string or a number, not 'SoftLink'

@rettigl
Copy link
Collaborator
rettigl commented Jan 18, 2024

This happens at
grafik
because of
grafik

I think we should add a test that the writer does not try to create a link if the link target does not exist...

@domna
Copy link
Collaborator Author
domna commented Jan 19, 2024

Generally, this looks quite good to me. However, there are a lot of "NOT IN SCHEMA" entries in the log files, some of them new, some of them also old. I think this should be carefully checked if this is supposed to be like this...

If I run the conversion in the tutorials E1 notebook, with definitions on "fairmat" branch, I get the following errors:

The data entry corresponding to /ENTRY[entry]/INSTRUMENT[instrument]/ELECTRONANALYSER[electronanalyser]/ENERGYDISPERSION[energydispersion]/energy_scan_mode is required and hasn't been supplied by the reader.
[info]: Path /ENTRY[entry]/PROCESS[process]/energy_calibration/coefficients not found. Skipping the entry.
The required group, /ENTRY[entry]/SAMPLE[sample]/bias/potentiostat, hasn't been supplied while its optional parent, /ENTRY[entry]/SAMPLE[sample]/bias, is supplied.

This is expected as for fairmat the config file does not match.

Still, the file is being created. If I run with "mpes-fixes" branch, I get:

The data entry corresponding to /ENTRY[entry]/INSTRUMENT[instrument]/ELECTRONANALYSER[electronanalyser]/ENERGYDISPERSION[energydispersion]/energy_scan_mode is required and hasn't been supplied by the reader.
[info]: Path /ENTRY[entry]/PROCESS[process]/energy_calibration/coefficients not found. Skipping the entry.

This is because it is named differently in metadata in the example file than the file we have in pynxtools. Changing this value "@attrs:metadata/energy_correction/calibration/coeffs" to this one "@attrs:metadata/energy_correction/calibration/coefficients" fixes it (L376 in config_file.json). The error about the energy_scan_mode is odd, because it should be set to optional in mpes-fixes and I also don't get this error. Can you try this one again?
Here is the associated PR with updated example files.

Sorry, I also should've mentioned that I did not yet test the examples with this. But I think E1 should be good now and I will check the error for the E2 as well.

@domna
Copy link
Collaborator Author
domna commented Jan 19, 2024

I think we should add a test that the writer does not try to create a link if the link target does not exist...

We check for that and abort if the target does not exist:

logger.warning("No path '%s' available to be linked.", path)

The problem in E2 is that it is trying to write an empty field, I think.
Does it work if you set the metadata manually w/o gathering it from the archive?

@rettigl
Copy link
Collaborator
rettigl commented Jan 19, 2024

The problem in E2 is that it is trying to write an empty field, I think.
Does it work if you set the metadata manually w/o gathering it from the archive?

I can try that. It's probably in the template at that point, but None. So this check should be included (that's what I meant).

@domna
Copy link
Collaborator Author
domna commented Jan 19, 2024

The problem in E2 is that it is trying to write an empty field, I think.
Does it work if you set the metadata manually w/o gathering it from the archive?

I can try that. It's probably in the template at that point, but None. So this check should be included (that's what I meant).

I already checked it. It also has problems when the metadata is set manually and I think it's because there are some additional keys in the associated eln file. I'm working on it rn.

Yes, with the latest changes everything is working for E2 :)

@rettigl please feel free to give it another check but it should be good to go then.

Copy link
Collaborator
@rettigl rettigl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Everything seems to be working now, also with the examples.

@domna domna merged commit 2831461 into master Jan 22, 2024
@domna domna deleted the mpes-reader-update branch January 22, 2024 08:39
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
4 participants
0