Improved Full Detector Response storage and conversion by jdbuhler · Pull Request #364 · cositools/cosipy · GitHub

Improved Full Detector Response storage and conversion #364


Open: jdbuhler wants to merge 31 commits into base: develop

jdbuhler
Contributor

This PR overhauls the on-disk and in-memory format of the FullDetectorResponse. The principal goals are to

  • reduce on-disk storage size via HDF5 internal compression, without incurring an unacceptable running time penalty
  • reduce rsp->h5 conversion time and memory usage
  • allow h5->rsp conversion without data loss

The biggest change is that the raw CDS ring counts are stored as small integers on disk, separately from the effective area multipliers. This change greatly improves compressibility and reduces peak memory during rsp->h5 conversion. As a bonus, it allows more efficient weighted averaging of slices at runtime and dynamic selection of 32- vs 64-bit PSRs.
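
As a rough illustration of why the split helps with weighted averaging, here is a minimal NumPy sketch. It assumes, purely for illustration, that the effective area is one multiplier per energy bin and that a slice is the integer counts scaled by that vector; the array shapes and the `pixel()` helper are hypothetical and are not the PR's actual API.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical shapes: 4 pixels on the NuLambda axis, 3 energy bins, 5 CDS bins.
counts = rng.poisson(2.0, size=(4, 3, 5)).astype(np.uint16)  # raw integer counts
eff_area = rng.uniform(0.5, 2.0, size=3)  # assumed: one multiplier per energy bin


def pixel(i, weight=1.0, dtype=np.float64):
    """Return one slice, applying the weight to the small eff_area vector."""
    scale = (weight * eff_area).astype(dtype)
    # Broadcasting expands the integer counts to floating point only once.
    return counts[i] * scale[:, None]


weights = np.array([0.1, 0.2, 0.3, 0.4])

# Weight folded into eff_area before the counts are expanded ...
fast = sum(pixel(i, w) for i, w in enumerate(weights))
# ... versus weighting each fully expanded floating-point slice afterwards.
slow = sum(w * pixel(i) for i, w in enumerate(weights))
```

Both sums agree to floating-point tolerance, and passing `dtype=np.float32` to the hypothetical helper yields a float32 slice, which is the kind of 32- vs 64-bit selection described above.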

Conversion has been separated from the FullDetectorResponse class into its own RspConverter class, which handles both rsp->h5 and h5->rsp.

After consultation with Israel et al., these changes remove support for the following deprecated formats and features:

  • sparse responses
  • miniDC2 format
  • reading the spectrum for effective area computation from a file (this was already broken in the DC3 codebase)

The new on-disk response format is not backwards or forwards compatible with the previous format. For this reason, I've established a new directory COSI-SMEX/develop in the public cosipy bucket on wasabi, put the new response files there, and updated the tutorials to use them instead of the ones from DC2/DC3. It is anticipated that DC4 will use the new format, while DC3 will remain with the current one.

The patch introduces a dependency on the hdf5plugin package, a well-supported companion to h5py that implements the BitShuffle compression algorithm used by the new response format.
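
To give a feel for why small integer counts compress better than the pre-multiplied floating-point response, here is a small sketch. It deliberately uses `zlib` from the Python standard library as a stand-in compressor, not hdf5plugin's BitShuffle filter, and the data are synthetic Poisson counts, so the sizes it prints say nothing about real response files.

```python
import zlib

import numpy as np

rng = np.random.default_rng(1)

# Synthetic stand-ins: small integer counts and a per-row multiplier.
counts = rng.poisson(1.0, size=(64, 1024)).astype(np.uint8)
eff_area = rng.uniform(0.5, 2.0, size=64)

# What an unsplit format would store: counts already scaled to float64.
product = counts * eff_area[:, None]

# Compress both representations with the same general-purpose compressor.
packed_counts = zlib.compress(counts.tobytes())
packed_product = zlib.compress(product.tobytes())

print(len(packed_counts), len(packed_product))
```

The integer array starts out eight times smaller and also contains far more repetition per byte, so it should come out smaller after compression as well.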

Supporting performance numbers

Conversion time and memory

DC2 O3 continuum response: < 4 minutes, 4 GB
DC3 O3 polarization response: < 6 minutes, 6.33 GB
O4 version of continuum response: < 90 minutes, 52.7 GB

Testing was on a 2.3 GHz Intel Xeon Gold 5118 server with 192 GB of RAM and SSD storage; only one core was used.

(Previously, the first two conversions took hours, while the last ran overnight and needed at least half a terabyte of RAM.)
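
One of the memory savings mentioned in the commit notes is doing the F-to-C order transposition one chunk at a time instead of on the whole response. A minimal sketch of that idea follows; the function name, the choice to chunk along the leading axis, and the chunk size are all assumptions made for illustration rather than the converter's actual code.

```python
import numpy as np


def f_to_c_chunked(flat_f, shape, out, chunk=2):
    """Copy Fortran-ordered flat data into C-ordered `out`, slab by slab."""
    # Reinterpret the flat buffer in F order; this is a view, not a copy.
    f_view = flat_f.reshape(shape, order="F")
    for start in range(0, shape[0], chunk):
        stop = min(start + chunk, shape[0])
        # Only this slab of the leading axis is rearranged at a time.
        out[start:stop] = f_view[start:stop]
    return out


shape = (5, 3, 4)
flat = np.arange(np.prod(shape), dtype=np.uint16)  # pretend this came from a .rsp
dest = np.empty(shape, dtype=flat.dtype)           # C-ordered destination
f_to_c_chunked(flat, shape, dest)
```

Here `dest` is an in-memory array, but the same slab-wise assignment pattern would apply to any destination that supports slice assignment.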

HDF5 response file sizes on disk

DC2 O3 continuum response: 592.76 MB (.rsp.gz is 841.58 MB)
DC3 O3 polarization response: 795.76 MB (.rsp.gz is 954.58 MB)
O4 version of continuum response: 4187.05 MB (.rsp.gz is 5186.87 MB)

These file sizes are less than 1/10 the size of the previous uncompressed HDF5 responses for O3, and closer to 1/50 for the O4 response, assuming (as was the case for the file I received) that the latter is stored in float32 precision.

Read times

The following are average times to read one slice from the NuLambda axis (i.e., rsp[i]). Testing was on the same machine used for the conversion tests.

DC2 O3 continuum response: 4.5 ms (old), 6.4 ms (new)
DC3 O3 polarization response: 8.8 ms (old), 11 ms (new)
O4 version of continuum response: 350 ms (old), 46 ms (new)

"Old" is the previous response format, while "new" is the format in this PR. Note that for large enough responses, such as the O4 continuum response, it's faster to read less data from disk and decompress than to read the data uncompressed.

By way of comparison, compressing the DC2 O3 continuum response with HDF5's gzip internal compression yielded a substantially larger file (889.77 MB) and read times of 21 ms, > 3x longer than with the BitShuffle compression used in this patch.

Jeremy Buhler and others added 29 commits May 16, 2025 15:57
  into a new RspConverter class
* change on-disk response format to split out counts from
  effective area, and use low-overhead compression on counts
* allow counts to be stored in integer types smaller than 32 bits;
  provide auto-detection of the best size as an option
* Rework .rsp conversion code to use less memory
* Remove outdated support for sparse .rsp and broken code for
  file-based normalization
* Enable axes of response to be read and written via the Axes object
  instead of replicating its functionality.  Keep textual descriptions
  of axes in a separate group for pretty-printing.
* Allow each Healpix axis to have its own nside; all-sky axes use
  nside=1
* capture and store header fields other than the axes and the
  size of counts in the .h5 file for future reference
tables stored at class scope.
* Filter axes to be used for output HDF5 file as they
are being read from .rsp, rather than arbitrarily
removing the last couple of axes afterwards.
…deconvolution

* make default for response construction to infer the element size from the
  data, rather than guess a too-large size
  the content of the response
* fix __array__ method of FullDetectorResponse
* use new test full detector and polarization responses, and update
  test suite to expect the outputs that occur when they are used
  (work in progress)
* remove backup file that snuck into docs
* add conversion to Histogram to FullDetectorResponse API, rather
  than open-coding it in image_deconvolution/dataIF
not relevant to the output being tested and occasionally causes crashes
…sponse

* add unit tests for response conversion between .rsp and .h5
* fix typo in FullDetectorResponse.to_histogram()
* to further avoid confusion between response versions, rename
  CONTENTS dataset to COUNTS, since it is now raw counts
* FullDetectorResponse.open() should support only HDF5: we cannot
  use an .rsp file live, it is not reasonable to convert a usefully
  large (O3 or above) .rsp.gz file "on the fly", and we have a
  separate converter class now.
…nverting the same

.rsp to .h5 twice yields identical byte streams.
* Do order tracking of headers ourselves instead of relying on HDF5 to do it.
…er than "DC2/DC3",

  with new checksums.  Shorten the names while we're at it.
* Remove vestiges of miniDC2 from image deconvolution notebooks
* make sure setup for cosipy installs hdf5plugin package
existence of COSI-SMEX/develop tree on wasabi
* update .h5s for test full detector responses to newest on-disk format
eff_area, which controls the type returned by __getitem__, etc.
This lets us have a float32 response instead of float64 if desired.
Empirically, we get slightly better compression and notably faster
decompression without the extra shuffle
entire response in memory in order to do F-to-C order transposition.
Instead, we do the transposition one chunk at a time, which roughly
halves memory usage for large responses.
not a general Histogram.  Rename the relevant function from to_histogram()
to to_dr() and update its return type accordingly
* Add get_pixel() method to implement __getitem__,
  but with the possibility of specifying a weight that
  we can apply to eff_area rather than to the much larger
  counts matrix for greater efficiency.  Use this method
  in get_psr() and friends.
  with new "quiet" option to RspConverter class
- response test cases now use get_pixel() rather than __getitem__
  for most tests
and use it in RspConverter when converting back to .rsp
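
The commit notes above mention storing counts in integer types smaller than 32 bits, with optional auto-detection of the best size. A minimal sketch of such a selection is shown below; the function name is hypothetical, and the use of unsigned types is an assumption, since the notes do not say which integer types the converter chooses among.

```python
import numpy as np


def smallest_uint_dtype(max_count):
    """Return the narrowest unsigned integer dtype that can hold max_count."""
    if max_count < 0:
        raise ValueError("counts must be non-negative")
    for candidate in (np.uint8, np.uint16, np.uint32, np.uint64):
        if max_count <= np.iinfo(candidate).max:
            return np.dtype(candidate)
    raise ValueError("count does not fit in 64 bits")


# Example: pick a storage type from the largest count actually present.
sample = np.array([0, 3, 17, 300])
storage = sample.astype(smallest_uint_dtype(int(sample.max())))
```

Choosing the type from the data rather than guessing a too-large size keeps the stored array, and whatever is compressed from it, as small as the counts allow.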
codecov bot commented May 28, 2025

Codecov Report

Attention: Patch coverage is 87.16049% with 52 lines in your changes missing coverage. Please review.

Project coverage is 83.50%. Comparing base (0570719) to head (b9a72f8).

| Files with missing lines | Patch % | Lines |
|---|---|---|
| cosipy/response/RspConverter.py | 83.93% | 49 Missing ⚠️ |
| cosipy/response/FullDetectorResponse.py | 97.75% | 2 Missing ⚠️ |
| cosipy/response/DetectorResponse.py | 50.00% | 1 Missing ⚠️ |

| Files with missing lines | Coverage Δ |
|---|---|
| cosipy/image_deconvolution/dataIF_COSI_DC2.py | 92.10% <100.00%> (+0.04%) ⬆️ |
| cosipy/response/__init__.py | 100.00% <100.00%> (ø) |
| cosipy/response/DetectorResponse.py | 90.74% <50.00%> (ø) |
| cosipy/response/FullDetectorResponse.py | 90.68% <97.75%> (+34.27%) ⬆️ |
| cosipy/response/RspConverter.py | 83.93% <83.93%> (ø) |

... and 2 files with indirect coverage changes

