Releases: ROCm/rocThrust
rocThrust 3.3.0 for ROCm 6.4.1
rocThrust code for ROCm 6.4.1 did not change. The library was rebuilt for the updated ROCm 6.4.1 stack.
rocThrust 3.3.0 for ROCm 6.4.0
Added
- Added a section to install Threading Building Blocks (TBB) inside cmake/Dependencies.cmake if TBB is not already available.
- Made Threading Building Blocks (TBB) an optional dependency with the new BUILD_HIPSTDPAR_TEST_WITH_TBB flag, which defaults to OFF. When the flag is OFF and TBB is not already on the machine, the build compiles without TBB. Otherwise it compiles with TBB.
- Added extended tests to rtest.py. These are extra tests that did not fit the criteria of smoke and regression tests, and they take much longer to run than smoke and regression tests. Use python rtest.py [--emulation|-e|--test|-t]=extended to run these tests.
- Added regression tests to rtest.py. These tests recreate scenarios that have caused hardware problems in past emulation environments. Use python rtest.py [--emulation|-e|--test|-t]=regression to run these tests.
- Added smoke test options, which run a subset of the unit tests and ensure that less than 2 GB of VRAM is used. Use python rtest.py [--emulation|-e|--test|-t]=smoke to run these tests.
- Added the --emulation option for rtest.py
- Merged changes from upstream CCCL/thrust 2.4.0
- Merged changes from upstream CCCL/thrust 2.5.0
- Added find_first_of to HIPSTDPAR
- Added search and find_end to HIPSTDPAR
- Added search_n to HIPSTDPAR
- Updated HIPSTDPAR's adjacent_find to use rocPRIM's implementation
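The HIPSTDPAR entries above refer to the standard C++ algorithms of the same names. As a reminder of what each one returns, here is a minimal sketch that uses the sequential standard-library overloads so it builds with any C++17 compiler. It does not exercise HIPSTDPAR itself, and the `Positions` struct and `locate` helper are hypothetical names introduced only for this illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <iterator>
#include <vector>

// Hypothetical result type for this sketch: the index each algorithm reports.
// An index equal to data.size() means "not found".
struct Positions {
    std::ptrdiff_t find_first_of;
    std::ptrdiff_t search;
    std::ptrdiff_t find_end;
    std::ptrdiff_t search_n;
    std::ptrdiff_t adjacent_find;
};

// Hypothetical helper: runs the five algorithms on the same input.
inline Positions locate(const std::vector<int>& data,
                        const std::vector<int>& pattern)
{
    assert(!pattern.empty()); // pattern.back() is used below

    auto index = [&](auto it) { return std::distance(data.begin(), it); };

    return Positions{
        // First element of data that equals any element of pattern.
        index(std::find_first_of(data.begin(), data.end(),
                                 pattern.begin(), pattern.end())),
        // First occurrence of pattern as a contiguous subsequence.
        index(std::search(data.begin(), data.end(),
                          pattern.begin(), pattern.end())),
        // Last occurrence of pattern as a contiguous subsequence.
        index(std::find_end(data.begin(), data.end(),
                            pattern.begin(), pattern.end())),
        // First run of two consecutive elements equal to pattern.back().
        index(std::search_n(data.begin(), data.end(), 2, pattern.back())),
        // First position where two neighbouring elements are equal.
        index(std::adjacent_find(data.begin(), data.end()))};
}
```

For `data = {1, 2, 3, 3, 2, 3, 3, 4}` and `pattern = {2, 3}`, the helper reports index 1 for `find_first_of` and `search`, index 4 for `find_end`, and index 2 for `search_n` and `adjacent_find`.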
Changed
- Changed the C++ version from 14 to 17. C++14 will be deprecated in the next major release.
- --test|-t is no longer a required flag for rtest.py. Instead, the user can use either --emulation|-e or --test|-t, but not both.
- Split the contents of HIPSTDPAR's forwarding header into several implementation headers.
- Fixed copy_if to work with large data types (512 bytes)
Known Issues
- thrust::inclusive_scan_by_key might produce incorrect results when it's used with -O2 or -O3 optimization.
  - The error is caused by a recent compiler change. There is a fix available that will be released at a later date.
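One way to check whether a build is affected is to compare the device results against a simple host-side reference. The sketch below is an assumption about how such a check could look, not part of rocThrust: `inclusive_scan_by_key_reference` is a hypothetical helper that implements the segmented inclusive sum sequentially, restarting the running total whenever the key differs from the previous key.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical host-side reference (not a rocThrust function).
// Computes an inclusive sum over values, restarting whenever the key
// changes between consecutive elements.
template <typename Key, typename Value>
std::vector<Value> inclusive_scan_by_key_reference(const std::vector<Key>& keys,
                                                   const std::vector<Value>& values)
{
    assert(keys.size() == values.size());

    std::vector<Value> out(values.size());
    for (std::size_t i = 0; i < values.size(); ++i) {
        // An element continues the current segment only if its key
        // matches the key of the element directly before it.
        const bool same_segment = i > 0 && keys[i] == keys[i - 1];
        out[i] = same_segment ? out[i - 1] + values[i] : values[i];
    }
    return out;
}
```

With keys `{0, 0, 0, 1, 1, 2}` and values `{1, 2, 3, 4, 5, 6}`, the reference yields `{1, 3, 6, 4, 9, 6}`, which can be compared element by element with the output copied back from the device.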
rocThrust 3.2.0 for ROCm 6.3.3
rocThrust code for ROCm 6.3.3 did not change. The library was rebuilt for the updated ROCm 6.3.3 stack.
rocThrust 3.2.0 for ROCm 6.3.2
rocThrust code for ROCm 6.3.2 did not change. The library was rebuilt for the updated ROCm 6.3.2 stack.
rocThrust 3.2.0 for ROCm 6.3.1
rocThrust code for ROCm 6.3.1 did not change. The library was rebuilt for the updated ROCm 6.3.1 stack.
rocThrust 3.2.0 for ROCm 6.3.0
Added
- Merged changes from upstream CCCL/thrust 2.3.2
  - Only the NVIDIA backend uses tuple and pair types from libcu++; other backends continue to use the original Thrust implementations and hence do not require libcu++ (CCCL) as a dependency.
- Added the thrust::hip::par_det execution policy to enable bitwise reproducibility on algorithms that are not bitwise reproducible by default.
Changed
- Updated the default value for the -a argument of rmake.py to gfx906:xnack-,gfx1030,gfx1100,gfx1101,gfx1102,gfx1151,gfx1200,gfx1201.
- Enabled the upstream (thrust) test suite for execution by default. It can still be disabled with the CMake option -DENABLE_UPSTREAM_TESTS=OFF.
Resolved issues
- Fixed an issue in rmake.py where the list storing cmake options would contain individual characters instead of a full string of options.
- Fixed the HIP backend not passing TestCopyIfNonTrivial from the upstream (thrust) test suite.
- Fixed tests failing when compiled with -D_GLIBCXX_ASSERTIONS=ON.
rocThrust 3.1.1 for ROCm 6.2.4
Added
- gfx1151 Support
rocThrust 3.1.0 for ROCm 6.2.2
rocThrust code for ROCm 6.2.2 did not change. The library was rebuilt for the updated ROCm 6.2.2 stack.
rocThrust 3.1.0 for ROCm 6.2.1
rocThrust code for ROCm 6.2.1 did not change. The library was rebuilt for the updated ROCm 6.2.1 stack.
rocThrust 3.1.0 for ROCm 6.2.0
Additions
- Merged changes from upstream CCCL/thrust 2.2.0
  - Updated the contents of system/hip and test with the upstream changes to system/cuda and testing
Changes
- Updated internal calls to rocprim::detail::invoke_result to use the public API rocprim::invoke_result.
- Use rocprim::device_adjacent_difference for the adjacent_difference API call.
- Updated internal use of custom iterator in thrust::detail::unique_by_key to use rocPRIM's rocprim::unique_by_key.
- Updated adjacent_difference to make use of rocprim::adjacent_difference when iterators are comparable and not equal; otherwise it uses rocprim::adjacent_difference_inplace.
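The last entry distinguishes the case where input and output ranges differ from the case where the result overwrites the input. The sketch below illustrates the two situations with the standard library's `std::adjacent_difference` as a stand-in; it does not call rocPRIM or rocThrust, and both helper names are hypothetical.

```cpp
#include <numeric>
#include <vector>

// Out-of-place: the output range is distinct from the input range.
inline std::vector<int> differences_out_of_place(const std::vector<int>& in)
{
    std::vector<int> out(in.size());
    std::adjacent_difference(in.begin(), in.end(), out.begin());
    return out;
}

// In-place: the output iterator equals the input iterator, so each
// element is overwritten with its difference from the previous one.
inline void differences_in_place(std::vector<int>& data)
{
    std::adjacent_difference(data.begin(), data.end(), data.begin());
}
```

Both variants turn `{1, 4, 9, 16}` into `{1, 3, 5, 7}`: the first element is copied unchanged and every later element becomes its difference from its predecessor.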
Known issues
- thrust::reduce_by_key outputs are not bit-wise reproducible, as run-to-run results for pseudo-associative reduction operators (e.g. floating-point arithmetic operators) are not deterministic on the same device.
- Note that currently, rocThrust memory allocation is performed in such a way that most algorithmic API functions cannot be called from within hipGraphs.
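The reproducibility issue above stems from floating-point addition not being truly associative: if partial results are combined in a different grouping from one run to the next, the rounded result can differ. The following host-only sketch shows the effect with three terms. The function names are hypothetical and the code does not involve rocThrust.

```cpp
// Two groupings of the same three-term sum. With exact arithmetic they
// would agree; with IEEE 754 doubles each intermediate result is rounded,
// so the grouping can change the answer.
inline double sum_grouped_left(double a, double b, double c)
{
    return (a + b) + c;
}

inline double sum_grouped_right(double a, double b, double c)
{
    return a + (b + c);
}
```

For the inputs `1e20`, `-1e20`, and `1.0`, the left grouping gives `1.0` because the large terms cancel first, while the right grouping gives `0.0` because `1.0` is lost when it is added to `-1e20`. This assumes standard IEEE 754 semantics, that is, no options such as `-ffast-math`.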