Mixed Precision fixes and QOL #118

FlorianDeconinck · 2025-03-11T20:25:40Z

Mixed precision fixes:

Translate: read parameters with the correct type, get rid of workaround
Constants: CNST_OP20 is a true 64-bit

Quality of Life

Update gitignore to skip gt4py.next caches
MutiModalMetric now doesn't report the expected NaNs

Checklist:

My code follows the style guidelines of this project
I have performed a self-review of my own code
I have commented my code, particularly in hard-to-understand areas
I have made corresponding changes to the documentation
My changes generate no new warnings
Any dependent changes have been merged and published in downstream modules
New check tests, if applicable, are included

oelbert

I'm glad it works but is there no other way lol

oelbert · 2025-03-12T17:52:39Z

ndsl/stencils/testing/savepoint.py

@@ -16,7 +16,7 @@ def dataset_to_dict(ds: xr.Dataset) -> Dict[str, Union[np.ndarray, float, int]]:

 def _process_if_scalar(value: np.ndarray) -> Union[np.ndarray, float, int]:
    if len(value.shape) == 0:
-        return value.item()
+        return value.max()  # trick to make sure we get the right type back


Wow I hate this

I plucked this out of a 300+ comment on the numpy git of people trying to kill each other over the fact that item gives back a python type instead of a numpy type.

I stepped away from the fight with my max trick, never to open that door again.

The hero we need but not the one we deserve

* updating 4d handling * debug 4d test data * more iter * moving ser_to_nc here * updating datatype in translate test * typing works * fix dict, lint * remove empty line * change from 4d to Nd * Expose `k_start` and `k_end` automatically for any FrozenStencil * Fix k_start + utest * lint * Fix for 2d stencils * Add threshold overrides to the multimodal metric * Always report results, add summary with one liners * Remove "mmr" from the keys * README in testing * Better Latex (?) * Better Latex (?) * fixing a typo that breaks bools in translate tests (#80) * Fix summary filename * Fix report, filename * Fix choosing right absolute difference for F32 * Make robust for NaN value * Detect when array have different dimensions, if only one dimension, collapse Clean up type infer and log work * Lint * Add rank 0 to the data * Check data exists for rank, skip & print if not * Fix bad logic on skip test for parallel * Verbose exported names * Make boilerplate calls more nimble * New option: `which_savepoint` Better error on bad output data Fix missing integer type check * QOL for mypy/flak8 type hints * Add SECONDS_PER_DAY as a constants following mixed precision standards * Lint * Cleanups in dace orchestration Readability improvements in dace orchestration including - early returns - spelling out variable names - fixing typos * Rename program -> dace_program * Make sure all constants adhere to the floating point precision set by the system * Move `is_float` to `dsl.typing` * Move Quantity to sub-directory + breakout the subcomponent * Fix tests * Lint * Remove `cp.ndarray` since cupy is optional * Restore workaround for optional cupy * "GFS" -> "UFS" * Cupy trick for metadata * Add comments for constant explanation * Describe 64/32-bit FloatFields * Make sure the `make_storage_data` respects the array dtype. * Fix logic for MultiModal metric and verbose it * Added an MPI all_reduce for quantities based on SUM operation to communicator.py * linted * Add initial skeleton of pytest test for all reduce * Added assertion tests for 1, 2 and 3D quantities passed through mpi_allreduce_sum * Linted * Added pytest.mark to skip test if mpi4py isn't available * lint changes * Addressed PR comments and added additional CPU backends to unit test * Added setters for various Quantity properties to enable setting of Quantity metadata and data properties. * Added function in QuantityMetadata class that allows copying of Metadata properties from one class to another. Subsequent Quantity setters that performed the copying of QuantityMetadata properties were removed * Expose all SG metric terms in grid_data * Add `Allreduce` and all MPI OP * Update utest * Fix `local_comm` * Fix utest * Enforce `comm_abc.Comm` into Communicator * Fix `comm` object in serial utest * Lint + `MPIComm` on testing architecture * Make sure the correct allocator backend is used for Quantities * Add in_place option for Allreduce * Cleanup ndsl/dsl/dace/utils.py (#96) * Fix typos * DaCeProgress: avoid double assignment of prefix * Add type hints/simplify kernel_theoretical_timing Adding type hints allowed to simplify `kernel_theoretical_timing`. * Fix merge * Hotfix for grid generation use of mpi operators * Merge examples/mpi/.gitignore into top-level .gitignore * Remove hard-coded __version__ numbers Removes hard-coded version numbers from `__init__` files. * Fixing a bunch of typos * hotfix netcdf version for dockerfiles * Updated version number in setup.py to reflect new release, 2025.01.00 * Adding in exception for compute domains with sizes less than or equal to halo size (#103) * Adding in exception for compute domains with less than 4 points to vector_halo_update method * Updated exception in communicator to compare halo size to compute domain size * linting * Moved domain size checker to SubtileGridSizer class method from_tile_params * Fix passing down ak/bk for pressure coefficients when they are available from an outside source (online model case) (#107) * [QOL] Logging, Type Hints and Quantity helpers (#108) * Log on rank 0 Docstrings & typi hints on logger Stencil Config has a `verbose` option On verbose: FrozenStencil log when run (in GT backends) * Update `config` in orchestrate call to solve type hint inconcistencies * Quantity helper `to_netcdf` with multi rank support * Automatic Int precision and stencil regeneration change (#104) * Added feature to enable automatic detection of integer precision. Should remove the need for i32/i64 declaration (although their functionality is still retained) and replace both with the regular Int type * change default rebuild state to false for get_factories * Merged Float and Int precision detection functions into one common path * Re-added old function to fulfil a PACE dependency * updated docstring * Added ability to declare 32 or 64 bit IntFields, overrulling the system precision * Added one dimensional bool fields * Fix error message in typing.py Co-authored-by: Florian Deconinck <deconinck.florian@gmail.com> * output type for global_set_precision --------- Co-authored-by: Florian Deconinck <deconinck.florian@gmail.com> * Bump DaCe to v1.0.1 (#109) Our current DaCe version is some commit from September 2024. Meanwhile DaCe matured to v1 and recently release v1.0.1. This brings the DaCe submodule to the latest stable release version. Co-authored-by: Roman Cattaneo <1116746+romanc@users.noreply.github.com> * Streamline linting workflow (#110) Linting should give fast feedback. The current workflow takes ~3mins where most of the time is spent installing (unnecessary) python packages. To run `pre-commit`, we only need the source files and `pre-commit` itself, which can be installed standalone. This brings runtime of the linting stage down to ~30 seconds. Other changes - update checkout action to v4 - update python setup action to v5 - change python version from 3.11.7 to 3.11 (any patch number will do) This is a follow-up of PR NOAA-GFDL/PyFV3#40 in PyFV3. Co-authored-by: Roman Cattaneo <1116746+romanc@users.noreply.github.com> * [FIX] Type hint for precision dependant Float, Int (#111) * Fix the type hint of Float, Int * Attempt using TypeAlias * Feature: Adding documentation (#97) * Added doc files * Adding image files to docs * Linting * Updated docs to reflect changes requested in PR 97 * Linting --------- Co-authored-by: Florian Deconinck <deconinck.florian@gmail.com> * [Translate test] Save better reports & netCDF for multiple ranks on failure (#106) * Save reports & netCDF for multiple ranks on failure Fix multi modal threshold for parallel tests * Order field by name in NetCDF * Print all indices in logs. Sort by descernding ULP * Allow sorting by metrics and index with `--sort_report` option * Remove the `rank` froom SavepointCase. Access is done via `grid` * Some docstrings * Adds some quick capacities used in the post-radiation phase of the physics, including the Stefan-Boltzmann constant (#116) * add namelist option * add stephan boltzmann constant * lint * Apply suggestions from code review Change comments to docstring style Co-authored-by: Florian Deconinck <deconinck.florian@gmail.com> --------- Co-authored-by: Florian Deconinck <deconinck.florian@gmail.com> * Adding temperature of h2o triple point (#115) * add ttp * Update ndsl/constants.py Co-authored-by: Florian Deconinck <deconinck.florian@gmail.com> * switch comments to docstrings for autodocs * lint --------- Co-authored-by: Florian Deconinck <deconinck.florian@gmail.com> * [Feature] Porting workflow: enhancing errors readability (#114) * Save all fields (pass and fail) and organize them by field * Option `--no_report` to bypass logging & netcdf save Move logs per variable into a `details` subfolder * Order variable name in serialbox-to-netcdf * `extra_data_load` function to load savepoint data saved outside the canonical savepoint * Docs / Type Hint * Fixed typo in error statment --------- Co-authored-by: Charles Kropiewnicki <charles.j.krop@gmail.com> * Feature: NetCDF output precision configurable (#117) * Removed hard-code of np.float32 from NetCDFMonitor transfer_type, replaced with Float type * Added multiple options for NetCDF precision * Added checking for use of 32 precision and float64 output * Using NumPy type instead of string in NetCDFMonitor precision variable * Added warning to netcdf_monitor.py for mismatch in precision settings * Forgot f-string in warn message of netcdf_monitor * Mixed Precision fixes and QOL (#118) * Ignore `.next` caches * CNST_OP20 is a true 64-bit * Translate: Fix reading parameters with the right precision * Multimodal metric: Skip reporting on expected values * Bad commit * Add license (Apache 2.0) (#105) Co-authored-by: Roman Cattaneo <1116746+romanc@users.noreply.github.com> * Change deprecated `np.product()` to `np.prod()` (#120) Starting with numpy v1.25.0, `np.product()` is deprecated and `np.prod()` should be used instead. Co-authored-by: Roman Cattaneo <1116746+romanc@users.noreply.github.com> * Update GT4Py and DaCe to bring in refactored GT4Py/DaCe bridge that exposes control flow (#119) * Update DaCe to v1.0.2 DaCe v1.0.2 brings two fixes for DaCe transformations: one for DeadDataflowElimination and one for StateFusion. * Bump gt4py to include refactored gt4py/dace bridge * Test with modified pace pipeline - added this to re-trigger the new pace pipeline after limiting zarr to not install v3 (for now) because of breaking API changes. - added this note to re-trigger after fixing the pace pipeline to not pull requirements from `develop`. - added this note to ret-trigger after fixing the repo name * Revert "Test with modified pace pipeline" This reverts commit cd6560e. --------- Co-authored-by: Roman Cattaneo <1116746+romanc@users.noreply.github.com> * Grid Mixed Precision and Coriolis force load (+ QOL) (#121) * Pass `dtype` down in allocator utils (gt4py_utils) * Allow coriolis forces to be read in * Edge factors are always 64-bit * Quantity QOL * Make sure to pass `dtype` to load the grid cleanly * Translate grid: load coriolis forces, area 64 is 64-bit * Bad merge * Typo * GEOS version of dz_min (#122) * Doc enhancment (#123) **Description** Port and adaptation of the initial commit of the documentation. Fixes issue #113 **Checklist:** - [X] I have performed a self-review of my own code - [X] I have made corresponding changes to the documentation - [X] My changes generate no new warnings * Fix saving NetCDF for parallel translate test (#125) * Release candidate 2025.03.00 (#124) Co-authored-by: Florian Deconinck <deconinck.florian@gmail.com> * Fix for bad merge of 7fdfa5 (#129) --------- Co-authored-by: Oliver Elbert <oliver.elbert36@gmail.com> Co-authored-by: Florian Deconinck <deconinck.florian@gmail.com> Co-authored-by: Florian Deconinck <florian.deconinck@gmail.com> Co-authored-by: Oliver Elbert <Oliver.Elbert@noaa.gov> Co-authored-by: Roman Cattaneo <> Co-authored-by: Christopher Kung <christopher.w.kung@nasa.gov> Co-authored-by: Roman Cattaneo <romanc@users.noreply.github.com> Co-authored-by: Roman Cattaneo <1116746+romanc@users.noreply.github.com> Co-authored-by: Charles Kropiewnicki <79879064+CharlesKrop@users.noreply.github.com> Co-authored-by: Charles Kropiewnicki <charles.j.krop@gmail.com> Co-authored-by: Tobias Wicky-Pfund <tobias.wicky@meteoswiss.ch>

FlorianDeconinck added 4 commits March 11, 2025 16:21

Ignore .next caches

3635fbe

CNST_OP20 is a true 64-bit

54f0b78

Translate: Fix reading parameters with the right precision

157ab1b

Multimodal metric: Skip reporting on expected values

9117bab

FlorianDeconinck requested review from oelbert and fmalatino and removed request for oelbert March 11, 2025 20:25

Bad commit

a887510

FlorianDeconinck requested a review from twicki March 12, 2025 14:02

oelbert approved these changes Mar 12, 2025

View reviewed changes

FlorianDeconinck merged commit 08ad037 into NOAA-GFDL:develop Mar 12, 2025
5 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Mixed Precision fixes and QOL #118

Mixed Precision fixes and QOL #118

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Mixed Precision fixes and QOL #118

Mixed Precision fixes and QOL #118

Uh oh!

Conversation

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!