-
Notifications
You must be signed in to change notification settings - Fork 74.7k
g3doc format #14
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
The g3doc used in Tensorflow is, I believe, identical to Markdown. If you find anything that doesn't meet the markdown specification, file an issue. |
I mean the comments in the C++ that are used to generate the markdown files. Example:
is used to generate
and
My point is that the docs were pre-generated. I can't find another project where Google has open sourced the g3doc utility so that others may rerun it and update the documentation. If the answer is "the syntax is just like doxygen", then that's an acceptable answer, although it would still be nice to be given the same tool to generate the markdown. |
The syntax is, in fact, doxygen: part of our docs generation pipeline calls doxygen. We are working on making more of the doc generator publicly available, but it requires a decent amount of untangling. |
Fixed Linear Regression example in README.md.
Fix typos and broken link
Is there a blog post or guide out for this yet? |
Check out the documentation authoring guide: https://www.tensorflow.org/community/documentation |
…antBuffers Fix/bconv2d update to allocat persistant buffers
Sync with upstream master
On some CI nodes (typically those with higher CPU core counts 128/256), the `//tensorflow/c/eager:c_api_distributed_test_gpu` test fails on an intermitent basis. When it does fail, the failures manifests as segfault at the end of the test, with the stack dump shown at the end of this commit message. The stack dump points the finger to a routine within the MKLDNN implementation. This is further confirmed by the observation that disabling the MKLDNN based Eigen contraction kernels (for ROCm) seems to make the crash go away. related JIRA ticket - https://ontrack-internal.amd.com/browse/SWDEV-313684 A previous commit disabled the `//tensorflow/c/eager:c_api_distributed_test` unit-test only in the CPU unit-tests CI job (for the same reason). That comit cannot be reverted, because this commit disables MKLDNN based Eigen contraction kernels *only* for the ROCm build. ``` Thread 191 "c_api_distribut" received signal SIGSEGV, Segmentation fault. [Switching to thread 191 (Thread 0x7ffc777fe700 (LWP 159004))] 0x00007fff54530000 in ?? () (gdb) where #0 0x00007fff54530000 in ?? () #1 0x00007fffd5d15ae4 in dnnl::impl::cpu::x64::avx_gemm_f32::sgemm_nocopy_driver(char const*, char const*, long, long, long, float const*, float const*, long, float const*, long, float const*, float*, long, float const*, float*) () from /root/.cache/bazel/_bazel_root/efb88f6336d9c4a18216fb94287b8d97/execroot/org_tensorflow/bazel-out/k8-opt/bin/tensorflow/c/eager/../../../_solib_local/libexternal_Smkl_Udnn_Uv1_Slibmkl_Udnn.so #2 0x00007fffd5d166e1 in dnnl::impl::cpu::x64::jit_avx_gemm_f32(int, char const*, char const*, long const*, long const*, long const*, float const*, float const*, long const*, float const*, long const*, float const*, float*, long const*, float const*) () from /root/.cache/bazel/_bazel_root/efb88f6336d9c4a18216fb94287b8d97/execroot/org_tensorflow/bazel-out/k8-opt/bin/tensorflow/c/eager/../../../_solib_local/libexternal_Smkl_Udnn_Uv1_Slibmkl_Udnn.so #3 0x00007fffd5e277ed in dnnl_status_t dnnl::impl::cpu::x64::gemm_driver<float, float, float>(char const*, char const*, char const*, long const*, long const*, long const*, float const*, float const*, long const*, float const*, float const*, long const*, float const*, float const*, float*, long const*, float const*, bool, dnnl::impl::cpu::x64::pack_type, dnnl::impl::cpu::x64::gemm_pack_storage_t*, bool) () from /root/.cache/bazel/_bazel_root/efb88f6336d9c4a18216fb94287b8d97/execroot/org_tensorflow/bazel-out/k8-opt/bin/tensorflow/c/eager/../../../_solib_local/libexternal_Smkl_Udnn_Uv1_Slibmkl_Udnn.so #4 0x00007fffd5665056 in dnnl::impl::cpu::extended_sgemm(char const*, char const*, long const*, long const*, long const*, float const*, float const*, long const*, float const*, long const*, float const*, float*, long const*, float const*, bool) () from /root/.cache/bazel/_bazel_root/efb88f6336d9c4a18216fb94287b8d97/execroot/org_tensorflow/bazel-out/k8-opt/bin/tensorflow/c/eager/../../../_solib_local/libexternal_Smkl_Udnn_Uv1_Slibmkl_Udnn.so #5 0x00007fffd52fe983 in dnnl_sgemm () from /root/.cache/bazel/_bazel_root/efb88f6336d9c4a18216fb94287b8d97/execroot/org_tensorflow/bazel-out/k8-opt/bin/tensorflow/c/eager/../../../_solib_local/libexternal_Smkl_Udnn_Uv1_Slibmkl_Udnn.so #6 0x0000555557187b0b in Eigen::internal::TensorContractionKernel<float, float, float, long, Eigen::internal::blas_data_mapper<float, long, 0, 0, 1>, Eigen::internal::TensorContractionInputMapper<float, long, 1, Eigen::TensorEvaluator<Eigen::TensorMap<Eigen::Tensor<float const, 2, 1, long>, 16, Eigen::MakePointer> const, Eigen::ThreadPoolDevice>, Eigen::array<long, 1ul>, Eigen::array<long, 1ul>, 4, true, false, 0, Eigen::MakePointer>, Eigen::internal::TensorContractionInputMapper<float, long, 0, Eigen::TensorEvaluator<Eigen::TensorMap<Eigen::Tensor<float const, 2, 1, long>, 16, Eigen::MakePointer> const, Eigen::ThreadPoolDevice>, Eigen::array<long, 1ul>, Eigen::array<long, 1ul>, 4, true, false, 0, Eigen::MakePointer> >::invoke(Eigen::internal::blas_data_mapper<float, long, 0, 0, 1> const&, Eigen::internal::ColMajorBlock<float, long> const&, Eigen::internal::ColMajorBlock<float, long> const&, long, long, long, float, float) () #7 0x000055555718dc76 in Eigen::TensorEvaluator<Eigen::TensorContractionOp<Eigen::array<Eigen::IndexPair<long>, 1ul> const, Eigen::TensorMap<Eigen::Tensor<float const, 2, 1, long>, 16, Eigen::MakePointer> const, Eigen::TensorMap<Eigen::Tensor<float const, 2, 1, long>, 16, Eigen::MakePointer> const, Eigen::NoOpOutputKernel const> const, Eigen::ThreadPoolDevice>::EvalParallelContext<Eigen::TensorEvaluator<Eigen::TensorContractionOp<Eigen::array<Eigen::IndexPair<long>, 1ul> const, Eigen::TensorMap<Eigen::Tensor<float const, 2, 1, long>, 16, Eigen::MakePointer> const, Eigen::TensorMap<Eigen::Tensor<float const, 2, 1, long>, 16, Eigen::MakePointer> const, Eigen::NoOpOutputKernel const> const, Eigen::ThreadPoolDevice>::NoCallback, true, true, false, 0>::kernel(long, long, long, bool) () #8 0x000055555718f327 in Eigen::TensorEvaluator<Eigen::TensorContractionOp<Eigen::array<Eigen::IndexPair<long>, 1ul> const, Eigen::TensorMap<Eigen::Tensor<float const, 2, 1, long>, 16, Eigen::MakePointer> const, Eigen::TensorMap<Eigen::Tensor<float const, 2, 1, long>, 16, Eigen::MakePointer> const, Eigen::NoOpOutputKernel const> const, Eigen::ThreadPoolDevice>::EvalParallelContext<Eigen::TensorEvaluator<Eigen::TensorContractionOp<Eigen::array<Eigen::IndexPair<long>, 1ul> const, Eigen::TensorMap<Eigen::Tensor<float const, 2, 1, long>, 16, Eigen::MakePointer> const, Eigen::TensorMap<Eigen::Tensor<float const, 2, 1, long>, 16, Eigen::MakePointer> const, Eigen::NoOpOutputKernel const> const, Eigen::ThreadPoolDevice>::NoCallback, true, true, false, 0>::signal_kernel(long, long, long, bool, bool) () #9 0x00005555571904cb in Eigen::TensorEvaluator<Eigen::TensorContractionOp<Eigen::array<Eigen::IndexPair<long>, 1ul> const, Eigen::TensorMap<Eigen::Tensor<float const, 2, 1, long>, 16, Eigen::MakePointer> const, Eigen::TensorMap<Eigen::Tensor<float const, 2, 1, long>, 16, Eigen::MakePointer> const, Eigen::NoOpOutputKernel const> const, Eigen::ThreadPoolDevice>::EvalParallelContext<Eigen::TensorEvaluator<Eigen::TensorContractionOp<Eigen::array<Eigen::IndexPair<long>, 1ul> const, Eigen::TensorMap<Eigen::Tensor<float const, 2, 1, long>, 16, Eigen::MakePointer> const, Eigen::TensorMap<Eigen::Tensor<float const, 2, 1, long>, 16, Eigen::MakePointer> const, Eigen::NoOpOutputKernel const> const, Eigen::ThreadPoolDevice>::NoCallback, true, true, false, 0>::pack_rhs(long, long) () #10 0x000055555718fd69 in Eigen::TensorEvaluator<Eigen::TensorContractionOp<Eigen::array<Eigen::IndexPair<long>, 1ul> const, Eigen::TensorMap<Eigen::Tensor<float const, 2, 1, long>, 16, Eigen::MakePointer> const, Eigen::TensorMap<Eigen::Tensor<float const, 2, 1, long>, 16, Eigen::MakePointer> const, Eigen::NoOpOutputKernel const> const, Eigen::ThreadPoolDevice>::EvalParallelContext<Eigen::TensorEvaluator<Eigen::TensorContractionOp<Eigen::array<Eigen::IndexPair<long>, 1ul> const, Eigen::TensorMap<Eigen::Tensor<float const, 2, 1, long>, 16, Eigen::MakePointer> const, Eigen::TensorMap<Eigen::Tensor<float const, 2, 1, long>, 16, Eigen::MakePointer> const, Eigen::NoOpOutputKernel const> const, Eigen::ThreadPoolDevice>::NoCallback, true, true, false, 0>::enqueue_packing_helper(long, long, long, bool) () #11 0x00007ffff6b607a1 in Eigen::ThreadPoolTempl<tensorflow::thread::EigenEnvironment>::WorkerLoop(int) () from /root/.cache/bazel/_bazel_root/efb88f6336d9c4a18216fb94287b8d97/execroot/org_tensorflow/bazel-out/k8-opt/bin/tensorflow/c/eager/../../../_solib_local/_U_S_Stensorflow_Sc_Seager_Cc_Uapi_Udistributed_Utest_Ugpu___Utensorflow/libtensorflow_framework.so.2 #12 0x00007ffff6b5de93 in std::_Function_handler<void (), tensorflow::thread::EigenEnvironment::CreateThread(std::function<void ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) () from /root/.cache/bazel/_bazel_root/efb88f6336d9c4a18216fb94287b8d97/execroot/org_tensorflow/bazel-out/k8-opt/bin/tensorflow/c/eager/../../../_solib_local/_U_S_Stensorflow_Sc_Seager_Cc_Uapi_Udistributed_Utest_Ugpu___Utensorflow/libtensorflow_framework.so.2 #13 0x00007ffff6b40107 in tensorflow::(anonymous namespace)::PThread::ThreadFn(void*) () from /root/.cache/bazel/_bazel_root/efb88f6336d9c4a18216fb94287b8d97/execroot/org_tensorflow/bazel-out/k8-opt/bin/tensorflow/c/eager/../../../_solib_local/_U_S_Stensorflow_Sc_Seager_Cc_Uapi_Udistributed_Utest_Ugpu___Utensorflow/libtensorflow_framework.so.2 #14 0x00007fffd1ca86db in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0 #15 0x00007fffd00b471f in clone () from /lib/x86_64-linux-gnu/libc.so.6 ```
On some CI nodes (typically those with higher CPU core counts 128/256), the `//tensorflow/c/eager:c_api_distributed_test_gpu` test fails on an intermitent basis. When it does fail, the failures manifests as segfault at the end of the test, with the stack dump shown at the end of this commit message. The stack dump points the finger to a routine within the MKLDNN implementation. This is further confirmed by the observation that disabling the MKLDNN based Eigen contraction kernels (for ROCm) seems to make the crash go away. related JIRA ticket - https://ontrack-internal.amd.com/browse/SWDEV-313684 A previous commit disabled the `//tensorflow/c/eager:c_api_distributed_test` unit-test only in the CPU unit-tests CI job (for the same reason). That comit cannot be reverted, because this commit disables MKLDNN based Eigen contraction kernels *only* for the ROCm build. ``` Thread 191 "c_api_distribut" received signal SIGSEGV, Segmentation fault. [Switching to thread 191 (Thread 0x7ffc777fe700 (LWP 159004))] 0x00007fff54530000 in ?? () (gdb) where #0 0x00007fff54530000 in ?? () #1 0x00007fffd5d15ae4 in dnnl::impl::cpu::x64::avx_gemm_f32::sgemm_nocopy_driver(char const*, char const*, long, long, long, float const*, float const*, long, float const*, long, float const*, float*, long, float const*, float*) () from /root/.cache/bazel/_bazel_root/efb88f6336d9c4a18216fb94287b8d97/execroot/org_tensorflow/bazel-out/k8-opt/bin/tensorflow/c/eager/../../../_solib_local/libexternal_Smkl_Udnn_Uv1_Slibmkl_Udnn.so tensorflow#2 0x00007fffd5d166e1 in dnnl::impl::cpu::x64::jit_avx_gemm_f32(int, char const*, char const*, long const*, long const*, long const*, float const*, float const*, long const*, float const*, long const*, float const*, float*, long const*, float const*) () from /root/.cache/bazel/_bazel_root/efb88f6336d9c4a18216fb94287b8d97/execroot/org_tensorflow/bazel-out/k8-opt/bin/tensorflow/c/eager/../../../_solib_local/libexternal_Smkl_Udnn_Uv1_Slibmkl_Udnn.so tensorflow#3 0x00007fffd5e277ed in dnnl_status_t dnnl::impl::cpu::x64::gemm_driver<float, float, float>(char const*, char const*, char const*, long const*, long const*, long const*, float const*, float const*, long const*, float const*, float const*, long const*, float const*, float const*, float*, long const*, float const*, bool, dnnl::impl::cpu::x64::pack_type, dnnl::impl::cpu::x64::gemm_pack_storage_t*, bool) () from /root/.cache/bazel/_bazel_root/efb88f6336d9c4a18216fb94287b8d97/execroot/org_tensorflow/bazel-out/k8-opt/bin/tensorflow/c/eager/../../../_solib_local/libexternal_Smkl_Udnn_Uv1_Slibmkl_Udnn.so tensorflow#4 0x00007fffd5665056 in dnnl::impl::cpu::extended_sgemm(char const*, char const*, long const*, long const*, long const*, float const*, float const*, long const*, float const*, long const*, float const*, float*, long const*, float const*, bool) () from /root/.cache/bazel/_bazel_root/efb88f6336d9c4a18216fb94287b8d97/execroot/org_tensorflow/bazel-out/k8-opt/bin/tensorflow/c/eager/../../../_solib_local/libexternal_Smkl_Udnn_Uv1_Slibmkl_Udnn.so tensorflow#5 0x00007fffd52fe983 in dnnl_sgemm () from /root/.cache/bazel/_bazel_root/efb88f6336d9c4a18216fb94287b8d97/execroot/org_tensorflow/bazel-out/k8-opt/bin/tensorflow/c/eager/../../../_solib_local/libexternal_Smkl_Udnn_Uv1_Slibmkl_Udnn.so tensorflow#6 0x0000555557187b0b in Eigen::internal::TensorContractionKernel<float, float, float, long, Eigen::internal::blas_data_mapper<float, long, 0, 0, 1>, Eigen::internal::TensorContractionInputMapper<float, long, 1, Eigen::TensorEvaluator<Eigen::TensorMap<Eigen::Tensor<float const, 2, 1, long>, 16, Eigen::MakePointer> const, Eigen::ThreadPoolDevice>, Eigen::array<long, 1ul>, Eigen::array<long, 1ul>, 4, true, false, 0, Eigen::MakePointer>, Eigen::internal::TensorContractionInputMapper<float, long, 0, Eigen::TensorEvaluator<Eigen::TensorMap<Eigen::Tensor<float const, 2, 1, long>, 16, Eigen::MakePointer> const, Eigen::ThreadPoolDevice>, Eigen::array<long, 1ul>, Eigen::array<long, 1ul>, 4, true, false, 0, Eigen::MakePointer> >::invoke(Eigen::internal::blas_data_mapper<float, long, 0, 0, 1> const&, Eigen::internal::ColMajorBlock<float, long> const&, Eigen::internal::ColMajorBlock<float, long> const&, long, long, long, float, float) () tensorflow#7 0x000055555718dc76 in Eigen::TensorEvaluator<Eigen::TensorContractionOp<Eigen::array<Eigen::IndexPair<long>, 1ul> const, Eigen::TensorMap<Eigen::Tensor<float const, 2, 1, long>, 16, Eigen::MakePointer> const, Eigen::TensorMap<Eigen::Tensor<float const, 2, 1, long>, 16, Eigen::MakePointer> const, Eigen::NoOpOutputKernel const> const, Eigen::ThreadPoolDevice>::EvalParallelContext<Eigen::TensorEvaluator<Eigen::TensorContractionOp<Eigen::array<Eigen::IndexPair<long>, 1ul> const, Eigen::TensorMap<Eigen::Tensor<float const, 2, 1, long>, 16, Eigen::MakePointer> const, Eigen::TensorMap<Eigen::Tensor<float const, 2, 1, long>, 16, Eigen::MakePointer> const, Eigen::NoOpOutputKernel const> const, Eigen::ThreadPoolDevice>::NoCallback, true, true, false, 0>::kernel(long, long, long, bool) () tensorflow#8 0x000055555718f327 in Eigen::TensorEvaluator<Eigen::TensorContractionOp<Eigen::array<Eigen::IndexPair<long>, 1ul> const, Eigen::TensorMap<Eigen::Tensor<float const, 2, 1, long>, 16, Eigen::MakePointer> const, Eigen::TensorMap<Eigen::Tensor<float const, 2, 1, long>, 16, Eigen::MakePointer> const, Eigen::NoOpOutputKernel const> const, Eigen::ThreadPoolDevice>::EvalParallelContext<Eigen::TensorEvaluator<Eigen::TensorContractionOp<Eigen::array<Eigen::IndexPair<long>, 1ul> const, Eigen::TensorMap<Eigen::Tensor<float const, 2, 1, long>, 16, Eigen::MakePointer> const, Eigen::TensorMap<Eigen::Tensor<float const, 2, 1, long>, 16, Eigen::MakePointer> const, Eigen::NoOpOutputKernel const> const, Eigen::ThreadPoolDevice>::NoCallback, true, true, false, 0>::signal_kernel(long, long, long, bool, bool) () tensorflow#9 0x00005555571904cb in Eigen::TensorEvaluator<Eigen::TensorContractionOp<Eigen::array<Eigen::IndexPair<long>, 1ul> const, Eigen::TensorMap<Eigen::Tensor<float const, 2, 1, long>, 16, Eigen::MakePointer> const, Eigen::TensorMap<Eigen::Tensor<float const, 2, 1, long>, 16, Eigen::MakePointer> const, Eigen::NoOpOutputKernel const> const, Eigen::ThreadPoolDevice>::EvalParallelContext<Eigen::TensorEvaluator<Eigen::TensorContractionOp<Eigen::array<Eigen::IndexPair<long>, 1ul> const, Eigen::TensorMap<Eigen::Tensor<float const, 2, 1, long>, 16, Eigen::MakePointer> const, Eigen::TensorMap<Eigen::Tensor<float const, 2, 1, long>, 16, Eigen::MakePointer> const, Eigen::NoOpOutputKernel const> const, Eigen::ThreadPoolDevice>::NoCallback, true, true, false, 0>::pack_rhs(long, long) () tensorflow#10 0x000055555718fd69 in Eigen::TensorEvaluator<Eigen::TensorContractionOp<Eigen::array<Eigen::IndexPair<long>, 1ul> const, Eigen::TensorMap<Eigen::Tensor<float const, 2, 1, long>, 16, Eigen::MakePointer> const, Eigen::TensorMap<Eigen::Tensor<float const, 2, 1, long>, 16, Eigen::MakePointer> const, Eigen::NoOpOutputKernel const> const, Eigen::ThreadPoolDevice>::EvalParallelContext<Eigen::TensorEvaluator<Eigen::TensorContractionOp<Eigen::array<Eigen::IndexPair<long>, 1ul> const, Eigen::TensorMap<Eigen::Tensor<float const, 2, 1, long>, 16, Eigen::MakePointer> const, Eigen::TensorMap<Eigen::Tensor<float const, 2, 1, long>, 16, Eigen::MakePointer> const, Eigen::NoOpOutputKernel const> const, Eigen::ThreadPoolDevice>::NoCallback, true, true, false, 0>::enqueue_packing_helper(long, long, long, bool) () tensorflow#11 0x00007ffff6b607a1 in Eigen::ThreadPoolTempl<tensorflow::thread::EigenEnvironment>::WorkerLoop(int) () from /root/.cache/bazel/_bazel_root/efb88f6336d9c4a18216fb94287b8d97/execroot/org_tensorflow/bazel-out/k8-opt/bin/tensorflow/c/eager/../../../_solib_local/_U_S_Stensorflow_Sc_Seager_Cc_Uapi_Udistributed_Utest_Ugpu___Utensorflow/libtensorflow_framework.so.2 tensorflow#12 0x00007ffff6b5de93 in std::_Function_handler<void (), tensorflow::thread::EigenEnvironment::CreateThread(std::function<void ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) () from /root/.cache/bazel/_bazel_root/efb88f6336d9c4a18216fb94287b8d97/execroot/org_tensorflow/bazel-out/k8-opt/bin/tensorflow/c/eager/../../../_solib_local/_U_S_Stensorflow_Sc_Seager_Cc_Uapi_Udistributed_Utest_Ugpu___Utensorflow/libtensorflow_framework.so.2 tensorflow#13 0x00007ffff6b40107 in tensorflow::(anonymous namespace)::PThread::ThreadFn(void*) () from /root/.cache/bazel/_bazel_root/efb88f6336d9c4a18216fb94287b8d97/execroot/org_tensorflow/bazel-out/k8-opt/bin/tensorflow/c/eager/../../../_solib_local/_U_S_Stensorflow_Sc_Seager_Cc_Uapi_Udistributed_Utest_Ugpu___Utensorflow/libtensorflow_framework.so.2 tensorflow#14 0x00007fffd1ca86db in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0 tensorflow#15 0x00007fffd00b471f in clone () from /lib/x86_64-linux-gnu/libc.so.6 ```
… to Initialize() Imported from GitHub PR openxla/xla#12228 The first time that a NormThunk is executed, it will build a cudnn execution plan. This build step can hang if a NCCL collective is running at the same time. To fix this, I've moved the build step to take place during thunk initialization. We only observe this hang when using cudnn 9. Here's a backtrace from the hang that will be fixed: ``` Thread 585 (Thread 0x7fb9391ff640 (LWP 41364) "main.py"): #0 0x00007fd3d17cffd9 in ?? () from /lib/x86_64-linux-gnu/libc.so.6 #1 0x00007fd3d17da24f in pthread_rwlock_wrlock () from /lib/x86_64-linux-gnu/libc.so.6 #2 0x00007fd070967dfe in ?? () from /lib/x86_64-linux-gnu/libcuda.so.1 #3 0x00007fd0709c928a in ?? () from /lib/x86_64-linux-gnu/libcuda.so.1 #4 0x00007f1970d76102 in ?? () from /lib/x86_64-linux-gnu/libcudnn_engines_precompiled.so.9.1.0 #5 0x00007f1970f2c999 in ?? () from /lib/x86_64-linux-gnu/libcudnn_engines_precompiled.so.9.1.0 #6 0x00007f1970a7d4ab in ?? () from /lib/x86_64-linux-gnu/libcudnn_engines_precompiled.so.9.1.0 #7 0x00007f1970d0a9cb in ?? () from /lib/x86_64-linux-gnu/libcudnn_engines_precompiled.so.9.1.0 #8 0x00007fce60b2a98c in cudnn::backend::ExecutionPlan::finalize_internal() () from /lib/x86_64-linux-gnu/libcudnn_graph.so.9.1.0 #9 0x00007fce60aefbb1 in cudnn::backend::Descriptor::finalize() () from /lib/x86_64-linux-gnu/libcudnn_graph.so.9.1.0 #10 0x00007fce60b15bec in cudnnBackendFinalize () from /lib/x86_64-linux-gnu/libcudnn_graph.so.9.1.0 #11 0x00007fd2521b8f39 in cudnn_frontend::ExecutionPlanBuilder_v8::build() () from /usr/local/lib/python3.10/dist-packages/jaxlib/xla_extension.so #12 0x00007fd2521734ba in stream_executor::gpu::(anonymous namespace)::GetExecPlanFromHeuristics(cudnn_frontend::OperationGraph_v8&&, stream_executor::gpu::(anonymous namespace)::CudnnHandle const&, bool) () from /usr/local/lib/python3.10/dist-packages/jaxlib/xla_extension.so #13 0x00007fd25216ff9b in stream_executor::gpu::CudnnSupport::NormRunnerFromDesc(stream_executor::Stream*, stream_executor::dnn::AlgorithmDesc const&, stream_executor::dnn::NormKind, double, stream_executor::dnn::TensorDescriptor const&, stream_executor::dnn::TensorDescriptor const&, stream_executor::dnn::TensorDescriptor const&, std::optional<stream_executor::dnn::TensorDescriptor>, std::optional<stream_executor::dnn::TensorDescriptor>, std::optional<stream_executor::dnn::TensorDescriptor>, std::optional<stream_executor::dnn::TensorDescriptor>, std::optional<stream_executor::dnn::TensorDescriptor>, std::optional<stream_executor::dnn::TensorDescriptor>) () from /usr/local/lib/python3.10/dist-packages/jaxlib/xla_extension.so #14 0x00007fd24e36b88b in stream_executor::dnn::NormOp::RunnerFromAlgorithmDesc(stream_executor::dnn::AlgorithmDesc const&, stream_executor::dnn::NormOp::Config, stream_executor::Stream*) () from /usr/local/lib/python3.10/dist-packages/jaxlib/xla_extension.so #15 0x00007fd24e36ae37 in stream_executor::dnn::LazyOpRunner<stream_executor::dnn::NormOp>::GetOrCreateRunner(stream_executor::dnn::NormOp::Config, stream_executor::Stream*)::{lambda()#1}::operator()() const () from /usr/local/lib/python3.10/dist-packages/jaxlib/xla_extension.so #16 0x00007fd24e36adbc in void absl::lts_20230802::base_internal::CallOnceImpl<stream_executor::dnn::LazyOpRunner<stream_executor::dnn::NormOp>::GetOrCreateRunner(stream_executor::dnn::NormOp::Config, stream_executor::Stream*)::{lambda()#1}>(std::atomic<unsigned int>*, absl::lts_20230802::base_internal::SchedulingMode, stream_executor::dnn::LazyOpRunner<stream_executor::dnn::NormOp>::GetOrCreateRunner(stream_executor::dnn::NormOp::Config, stream_executor::Stream*)::{lambda()#1}&&) () from /usr/local/lib/python3.10/dist-packages/jaxlib/xla_extension.so #17 0x00007fd24e36a9bd in stream_executor::dnn::LazyOpRunner<stream_executor::dnn::NormOp>::GetOrCreateRunner(stream_executor::dnn::NormOp::Config, stream_executor::Stream*) () from /usr/local/lib/python3.10/dist-packages/jaxlib/xla_extension.so #18 0x00007fd24e369d29 in xla::gpu::RunGpuNorm(xla::gpu::GpuNormConfig const&, stream_executor::DeviceMemoryBase const&, stream_executor::DeviceMemoryBase const&, stream_executor::DeviceMemoryBase const&, std::optional<stream_executor::DeviceMemoryBase>, std::optional<stream_executor::DeviceMemoryBase>, std::optional<stream_executor::DeviceMemoryBase>, std::optional<stream_executor::DeviceMemoryBase>, std::optional<stream_executor::DeviceMemoryBase>, std::optional<stream_executor::DeviceMemoryBase>, stream_executor::DeviceMemoryBase const&, stream_executor::Stream*, xla::gpu::RunNormOptions) () from /usr/local/lib/python3.10/dist-packages/jaxlib/xla_extension.so #19 0x00007fd24e368be6 in xla::gpu::NormThunk::ExecuteOnStream(xla::gpu::Thunk::ExecuteParams const&) () from /usr/local/lib/python3.10/dist-packages/jaxlib/xla_extension.so ``` Copybara import of the project: -- f53533087ba1ddcf65ad7cc6268ee89de4690d15 by Trevor Morris <tmorris@nvidia.com>: Fix hang with cudnn layer norm by moving cudnn init to Initialize() Merging this change closes #12228 FUTURE_COPYBARA_INTEGRATE_REVIEW=openxla/xla#12228 from trevor-m:tmorris-norm-init f53533087ba1ddcf65ad7cc6268ee89de4690d15 PiperOrigin-RevId: 633207155
… to Initialize() Imported from GitHub PR openxla/xla#12228 The first time that a NormThunk is executed, it will build a cudnn execution plan. This build step can hang if a NCCL collective is running at the same time. To fix this, I've moved the build step to take place during thunk initialization. We only observe this hang when using cudnn 9. Here's a backtrace from the hang that will be fixed: ``` Thread 585 (Thread 0x7fb9391ff640 (LWP 41364) "main.py"): #0 0x00007fd3d17cffd9 in ?? () from /lib/x86_64-linux-gnu/libc.so.6 #1 0x00007fd3d17da24f in pthread_rwlock_wrlock () from /lib/x86_64-linux-gnu/libc.so.6 #2 0x00007fd070967dfe in ?? () from /lib/x86_64-linux-gnu/libcuda.so.1 #3 0x00007fd0709c928a in ?? () from /lib/x86_64-linux-gnu/libcuda.so.1 #4 0x00007f1970d76102 in ?? () from /lib/x86_64-linux-gnu/libcudnn_engines_precompiled.so.9.1.0 #5 0x00007f1970f2c999 in ?? () from /lib/x86_64-linux-gnu/libcudnn_engines_precompiled.so.9.1.0 #6 0x00007f1970a7d4ab in ?? () from /lib/x86_64-linux-gnu/libcudnn_engines_precompiled.so.9.1.0 #7 0x00007f1970d0a9cb in ?? () from /lib/x86_64-linux-gnu/libcudnn_engines_precompiled.so.9.1.0 #8 0x00007fce60b2a98c in cudnn::backend::ExecutionPlan::finalize_internal() () from /lib/x86_64-linux-gnu/libcudnn_graph.so.9.1.0 #9 0x00007fce60aefbb1 in cudnn::backend::Descriptor::finalize() () from /lib/x86_64-linux-gnu/libcudnn_graph.so.9.1.0 #10 0x00007fce60b15bec in cudnnBackendFinalize () from /lib/x86_64-linux-gnu/libcudnn_graph.so.9.1.0 #11 0x00007fd2521b8f39 in cudnn_frontend::ExecutionPlanBuilder_v8::build() () from /usr/local/lib/python3.10/dist-packages/jaxlib/xla_extension.so #12 0x00007fd2521734ba in stream_executor::gpu::(anonymous namespace)::GetExecPlanFromHeuristics(cudnn_frontend::OperationGraph_v8&&, stream_executor::gpu::(anonymous namespace)::CudnnHandle const&, bool) () from /usr/local/lib/python3.10/dist-packages/jaxlib/xla_extension.so #13 0x00007fd25216ff9b in stream_executor::gpu::CudnnSupport::NormRunnerFromDesc(stream_executor::Stream*, stream_executor::dnn::AlgorithmDesc const&, stream_executor::dnn::NormKind, double, stream_executor::dnn::TensorDescriptor const&, stream_executor::dnn::TensorDescriptor const&, stream_executor::dnn::TensorDescriptor const&, std::optional<stream_executor::dnn::TensorDescriptor>, std::optional<stream_executor::dnn::TensorDescriptor>, std::optional<stream_executor::dnn::TensorDescriptor>, std::optional<stream_executor::dnn::TensorDescriptor>, std::optional<stream_executor::dnn::TensorDescriptor>, std::optional<stream_executor::dnn::TensorDescriptor>) () from /usr/local/lib/python3.10/dist-packages/jaxlib/xla_extension.so #14 0x00007fd24e36b88b in stream_executor::dnn::NormOp::RunnerFromAlgorithmDesc(stream_executor::dnn::AlgorithmDesc const&, stream_executor::dnn::NormOp::Config, stream_executor::Stream*) () from /usr/local/lib/python3.10/dist-packages/jaxlib/xla_extension.so #15 0x00007fd24e36ae37 in stream_executor::dnn::LazyOpRunner<stream_executor::dnn::NormOp>::GetOrCreateRunner(stream_executor::dnn::NormOp::Config, stream_executor::Stream*)::{lambda()#1}::operator()() const () from /usr/local/lib/python3.10/dist-packages/jaxlib/xla_extension.so #16 0x00007fd24e36adbc in void absl::lts_20230802::base_internal::CallOnceImpl<stream_executor::dnn::LazyOpRunner<stream_executor::dnn::NormOp>::GetOrCreateRunner(stream_executor::dnn::NormOp::Config, stream_executor::Stream*)::{lambda()#1}>(std::atomic<unsigned int>*, absl::lts_20230802::base_internal::SchedulingMode, stream_executor::dnn::LazyOpRunner<stream_executor::dnn::NormOp>::GetOrCreateRunner(stream_executor::dnn::NormOp::Config, stream_executor::Stream*)::{lambda()#1}&&) () from /usr/local/lib/python3.10/dist-packages/jaxlib/xla_extension.so #17 0x00007fd24e36a9bd in stream_executor::dnn::LazyOpRunner<stream_executor::dnn::NormOp>::GetOrCreateRunner(stream_executor::dnn::NormOp::Config, stream_executor::Stream*) () from /usr/local/lib/python3.10/dist-packages/jaxlib/xla_extension.so #18 0x00007fd24e369d29 in xla::gpu::RunGpuNorm(xla::gpu::GpuNormConfig const&, stream_executor::DeviceMemoryBase const&, stream_executor::DeviceMemoryBase const&, stream_executor::DeviceMemoryBase const&, std::optional<stream_executor::DeviceMemoryBase>, std::optional<stream_executor::DeviceMemoryBase>, std::optional<stream_executor::DeviceMemoryBase>, std::optional<stream_executor::DeviceMemoryBase>, std::optional<stream_executor::DeviceMemoryBase>, std::optional<stream_executor::DeviceMemoryBase>, stream_executor::DeviceMemoryBase const&, stream_executor::Stream*, xla::gpu::RunNormOptions) () from /usr/local/lib/python3.10/dist-packages/jaxlib/xla_extension.so #19 0x00007fd24e368be6 in xla::gpu::NormThunk::ExecuteOnStream(xla::gpu::Thunk::ExecuteParams const&) () from /usr/local/lib/python3.10/dist-packages/jaxlib/xla_extension.so ``` Copybara import of the project: -- f53533087ba1ddcf65ad7cc6268ee89de4690d15 by Trevor Morris <tmorris@nvidia.com>: Fix hang with cudnn layer norm by moving cudnn init to Initialize() Merging this change closes #12228 PiperOrigin-RevId: 633220207
…r RunBackend. Both of these call into LLVM code that reads the compiler options. Fixes the following race: ``` WARNING: ThreadSanitizer: data race (pid=869815) Read of size 1 at 0x7f8b24effc08 by thread T65: #0 llvm::cl::opt_storage<bool, false, false>::getValue() const /proc/self/cwd/external/llvm-project/llvm/include/llvm/Support/CommandLine.h:1406:38 (xla_extension.so+0xa281417) (BuildId: 7f5d2098f168c4db) #1 llvm::cl::opt_storage<bool, false, false>::operator bool() const /proc/self/cwd/external/llvm-project/llvm/include/llvm/Support/CommandLine.h:1410:38 (xla_extension.so+0xa281417) #2 llvm::CodeGenTargetMachineImpl::CodeGenTargetMachineImpl(llvm::Target const&, llvm::StringRef, llvm::Triple const&, llvm::StringRef, llvm::StringRef, llvm::TargetOptions const&, llvm::Reloc::Model, llvm::CodeModel::Model, llvm::CodeGenOptLevel) /proc/self/cwd/external/llvm-project/llvm/lib/CodeGen/CodeGenTargetMachineImpl.cpp:97:7 (xla_extension.so+0xa281417) #3 llvm::X86TargetMachine::X86TargetMachine(llvm::Target const&, llvm::Triple const&, llvm::StringRef, llvm::StringRef, llvm::TargetOptions const&, std::optional<llvm::Reloc::Model>, std::optional<llvm::CodeModel::Model>, llvm::CodeGenOptLevel, bool) /proc/self/cwd/external/llvm-project/llvm/lib/Target/X86/X86TargetMachine.cpp:236:7 (xla_extension.so+0x9803b80) (BuildId: 7f5d2098f168c4db) #4 llvm::RegisterTargetMachine<llvm::X86TargetMachine>::Allocator(llvm::Target const&, llvm::Triple const&, llvm::StringRef, llvm::StringRef, llvm::TargetOptions const&, std::optional<llvm::Reloc::Model>, std::optional<llvm::CodeModel::Model>, llvm::CodeGenOptLevel, bool) /proc/self/cwd/external/llvm-project/llvm/include/llvm/MC/TargetRegistry.h:1258:16 (xla_extension.so+0x980757a) (BuildId: 7f5d2098f168c4db) #5 llvm::Target::createTargetMachine(llvm::StringRef, llvm::StringRef, llvm::StringRef, llvm::TargetOptions const&, std::optional<llvm::Reloc::Model>, std::optional<llvm::CodeModel::Model>, llvm::CodeGenOptLevel, bool) const /proc/self/cwd/external/llvm-project/llvm/include/llvm/MC/TargetRegistry.h:462:12 (xla_extension.so+0x94ba529) (BuildId: 7f5d2098f168c4db) #6 llvm::EngineBuilder::selectTarget(llvm::Triple const&, llvm::StringRef, llvm::StringRef, llvm::SmallVectorImpl<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>>> const&) /proc/self/cwd/external/llvm-project/llvm/lib/ExecutionEngine/TargetSelect.cpp:88:18 (xla_extension.so+0x94ba529) #7 xla::cpu::JitCompiler::InferTargetMachine(llvm::TargetOptions const&, llvm::CodeGenOptLevel, std::optional<tsl::port::CPUFeature>) /proc/self/cwd/external/xla/xla/backends/cpu/codegen/jit_compiler.cc:88:12 (xla_extension.so+0x48d070f) (BuildId: 7f5d2098f168c4db) #8 xla::cpu::CpuCompiler::RunHloPasses(std::unique_ptr<xla::HloModule, std::default_delete<xla::HloModule>>, stream_executor::StreamExecutor*, xla::Compiler::CompileOptions const&) /proc/self/cwd/external/xla/xla/service/cpu/cpu_compiler.cc:1017:3 (xla_extension.so+0x2f6dc47) (BuildId: 7f5d2098f168c4db) #9 xla::JitCompile(xla::XlaComputation const&, absl::lts_20230802::Span<xla::Shape const* const>, xla::ExecutableBuildOptions const&, xla::ExecutionOptions const&, xla::Compiler::CompileOptions const&, int, std::function<void (xla::HloModuleConfig&)>) /proc/self/cwd/external/xla/xla/pjrt/cpu/cpu_client.cc:749:3 (xla_extension.so+0x2f127e2) (BuildId: 7f5d2098f168c4db) #10 xla::TfrtCpuClient::Compile(xla::XlaComputation const&, xla::CompileOptions) /proc/self/cwd/external/xla/xla/pjrt/cpu/cpu_client.cc:842:3 (xla_extension.so+0x2f127e2) #11 xla::TfrtCpuClient::Compile(mlir::ModuleOp, xla::CompileOptions) /proc/self/cwd/external/xla/xla/pjrt/cpu/cpu_client.cc:888:10 (xla_extension.so+0x2f13da2) (BuildId: 7f5d2098f168c4db) #12 xla::ifrt::PjRtLoadedExecutable::Create(xla::ifrt::PjRtCompatibleClient*, mlir::ModuleOp, xla::CompileOptions, std::vector<tsl::RCReference<xla::ifrt::LoadedHostCallback>, std::allocator<tsl::RCReference<xla::ifrt::LoadedHostCallback>>>) /proc/self/cwd/external/xla/xla/python/pjrt_ifrt/pjrt_executable.cc:258:3 (xla_extension.so+0xdd04d77) (BuildId: 7f5d2098f168c4db) #13 xla::ifrt::PjRtCompiler::Compile(std::unique_ptr<xla::ifrt::Program, std::default_delete<xla::ifrt::Program>>, std::unique_ptr<xla::ifrt::CompileOptions, std::default_delete<xla::ifrt::CompileOptions>>) /proc/self/cwd/external/xla/xla/python/pjrt_ifrt/pjrt_compiler.cc:97:10 (xla_extension.so+0xdcfd29b) (BuildId: 7f5d2098f168c4db) #14 xla::PyClient::CompileIfrtProgram(xla::nb_class_ptr<xla::PyClient>, std::unique_ptr<xla::ifrt::Program, std::default_delete<xla::ifrt::Program>>, std::unique_ptr<xla::ifrt::CompileOptions, std::default_delete<xla::ifrt::CompileOptions>>) /proc/self/cwd/external/xla/xla/python/py_client.cc:443:5 (xla_extension.so+0xc62a228) (BuildId: 7f5d2098f168c4db) #15 xla::PyClient::Compile(xla::nb_class_ptr<xla::PyClient>, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>>, xla::CompileOptions, std::vector<nanobind::capsule, std::allocator<nanobind::capsule>>) /proc/self/cwd/external/xla/xla/python/py_client.cc:466:10 (xla_extension.so+0xc62b514) (BuildId: 7f5d2098f168c4db) Previous write of size 1 at 0x7f8b24effc08 by thread T66 (mutexes: write M0): #0 void llvm::cl::opt_storage<bool, false, false>::setValue<bool>(bool const&, bool) /proc/self/cwd/external/llvm-project/llvm/include/llvm/Support/CommandLine.h:1401:11 (xla_extension.so+0x100bace9) (BuildId: 7f5d2098f168c4db) #1 void llvm::cl::opt<bool, false, llvm::cl::parser<bool>>::setDefaultImpl<bool, void>() /proc/self/cwd/external/llvm-project/llvm/include/llvm/Support/CommandLine.h (xla_extension.so+0x100bace9) #2 llvm::cl::opt<bool, false, llvm::cl::parser<bool>>::setDefault() /proc/self/cwd/external/llvm-project/llvm/include/llvm/Support/CommandLine.h:1474:32 (xla_extension.so+0x100bace9) #3 llvm::cl::Option::reset() /proc/self/cwd/external/llvm-project/llvm/lib/Support/CommandLine.cpp:460:3 (xla_extension.so+0x100cac0e) (BuildId: 7f5d2098f168c4db) #4 (anonymous namespace)::CommandLineParser::ResetAllOptionOccurrences() /proc/self/cwd/external/llvm-project/llvm/lib/Support/CommandLine.cpp:1478:17 (xla_extension.so+0x100cac0e) #5 llvm::cl::ResetAllOptionOccurrences() /proc/self/cwd/external/llvm-project/llvm/lib/Support/CommandLine.cpp:2831:17 (xla_extension.so+0x100caa72) (BuildId: 7f5d2098f168c4db) #6 xla::llvm_ir::LLVMCommandLineOptionsLock::LLVMCommandLineOptionsLock(std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>>, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>>>> const&) /proc/self/cwd/external/xla/xla/service/llvm_ir/llvm_command_line_options.cc:70:5 (xla_extension.so+0x91d69f4) (BuildId: 7f5d2098f168c4db) #7 xla::cpu::CpuCompiler::RunBackend(std::unique_ptr<xla::HloModule, std::default_delete<xla::HloModule>>, stream_executor::StreamExecutor*, xla::Compiler::CompileOptions const&) /proc/self/cwd/external/xla/xla/service/cpu/cpu_compiler.cc:1727:39 (xla_extension.so+0x2f781c8) (BuildId: 7f5d2098f168c4db) #8 xla::JitCompile(xla::XlaComputation const&, absl::lts_20230802::Span<xla::Shape const* const>, xla::ExecutableBuildOptions const&, xla::ExecutionOptions const&, xla::Compiler::CompileOptions const&, int, std::function<void (xla::HloModuleConfig&)>) /proc/self/cwd/external/xla/xla/pjrt/cpu/cpu_client.cc:754:19 (xla_extension.so+0x2f12883) (BuildId: 7f5d2098f168c4db) #9 xla::TfrtCpuClient::Compile(xla::XlaComputation const&, xla::CompileOptions) /proc/self/cwd/external/xla/xla/pjrt/cpu/cpu_client.cc:842:3 (xla_extension.so+0x2f12883) #10 xla::TfrtCpuClient::Compile(mlir::ModuleOp, xla::CompileOptions) /proc/self/cwd/external/xla/xla/pjrt/cpu/cpu_client.cc:888:10 (xla_extension.so+0x2f13da2) (BuildId: 7f5d2098f168c4db) #11 xla::ifrt::PjRtLoadedExecutable::Create(xla::ifrt::PjRtCompatibleClient*, mlir::ModuleOp, xla::CompileOptions, std::vector<tsl::RCReference<xla::ifrt::LoadedHostCallback>, std::allocator<tsl::RCReference<xla::ifrt::LoadedHostCallback>>>) /proc/self/cwd/external/xla/xla/python/pjrt_ifrt/pjrt_executable.cc:258:3 (xla_extension.so+0xdd04d77) (BuildId: 7f5d2098f168c4db) #12 xla::ifrt::PjRtCompiler::Compile(std::unique_ptr<xla::ifrt::Program, std::default_delete<xla::ifrt::Program>>, std::unique_ptr<xla::ifrt::CompileOptions, std::default_delete<xla::ifrt::CompileOptions>>) /proc/self/cwd/external/xla/xla/python/pjrt_ifrt/pjrt_compiler.cc:97:10 (xla_extension.so+0xdcfd29b) (BuildId: 7f5d2098f168c4db) #13 xla::PyClient::CompileIfrtProgram(xla::nb_class_ptr<xla::PyClient>, std::unique_ptr<xla::ifrt::Program, std::default_delete<xla::ifrt::Program>>, std::unique_ptr<xla::ifrt::CompileOptions, std::default_delete<xla::ifrt::CompileOptions>>) /proc/self/cwd/external/xla/xla/python/py_client.cc:443:5 (xla_extension.so+0xc62a228) (BuildId: 7f5d2098f168c4db) #14 xla::PyClient::Compile(xla::nb_class_ptr<xla::PyClient>, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>>, xla::CompileOptions, std::vector<nanobind::capsule, std::allocator<nanobind::capsule>>) /proc/self/cwd/external/xla/xla/python/py_client.cc:466:10 (xla_extension.so+0xc62b514) (BuildId: 7f5d2098f168c4db) ``` PiperOrigin-RevId: 707655482
…r RunBackend. Both of these call into LLVM code that reads the compiler options. Fixes the following race: ``` WARNING: ThreadSanitizer: data race (pid=869815) Read of size 1 at 0x7f8b24effc08 by thread T65: #0 llvm::cl::opt_storage<bool, false, false>::getValue() const /proc/self/cwd/external/llvm-project/llvm/include/llvm/Support/CommandLine.h:1406:38 (xla_extension.so+0xa281417) (BuildId: 7f5d2098f168c4db) #1 llvm::cl::opt_storage<bool, false, false>::operator bool() const /proc/self/cwd/external/llvm-project/llvm/include/llvm/Support/CommandLine.h:1410:38 (xla_extension.so+0xa281417) #2 llvm::CodeGenTargetMachineImpl::CodeGenTargetMachineImpl(llvm::Target const&, llvm::StringRef, llvm::Triple const&, llvm::StringRef, llvm::StringRef, llvm::TargetOptions const&, llvm::Reloc::Model, llvm::CodeModel::Model, llvm::CodeGenOptLevel) /proc/self/cwd/external/llvm-project/llvm/lib/CodeGen/CodeGenTargetMachineImpl.cpp:97:7 (xla_extension.so+0xa281417) #3 llvm::X86TargetMachine::X86TargetMachine(llvm::Target const&, llvm::Triple const&, llvm::StringRef, llvm::StringRef, llvm::TargetOptions const&, std::optional<llvm::Reloc::Model>, std::optional<llvm::CodeModel::Model>, llvm::CodeGenOptLevel, bool) /proc/self/cwd/external/llvm-project/llvm/lib/Target/X86/X86TargetMachine.cpp:236:7 (xla_extension.so+0x9803b80) (BuildId: 7f5d2098f168c4db) #4 llvm::RegisterTargetMachine<llvm::X86TargetMachine>::Allocator(llvm::Target const&, llvm::Triple const&, llvm::StringRef, llvm::StringRef, llvm::TargetOptions const&, std::optional<llvm::Reloc::Model>, std::optional<llvm::CodeModel::Model>, llvm::CodeGenOptLevel, bool) /proc/self/cwd/external/llvm-project/llvm/include/llvm/MC/TargetRegistry.h:1258:16 (xla_extension.so+0x980757a) (BuildId: 7f5d2098f168c4db) #5 llvm::Target::createTargetMachine(llvm::StringRef, llvm::StringRef, llvm::StringRef, llvm::TargetOptions const&, std::optional<llvm::Reloc::Model>, std::optional<llvm::CodeModel::Model>, llvm::CodeGenOptLevel, bool) const /proc/self/cwd/external/llvm-project/llvm/include/llvm/MC/TargetRegistry.h:462:12 (xla_extension.so+0x94ba529) (BuildId: 7f5d2098f168c4db) #6 llvm::EngineBuilder::selectTarget(llvm::Triple const&, llvm::StringRef, llvm::StringRef, llvm::SmallVectorImpl<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>>> const&) /proc/self/cwd/external/llvm-project/llvm/lib/ExecutionEngine/TargetSelect.cpp:88:18 (xla_extension.so+0x94ba529) #7 xla::cpu::JitCompiler::InferTargetMachine(llvm::TargetOptions const&, llvm::CodeGenOptLevel, std::optional<tsl::port::CPUFeature>) /proc/self/cwd/external/xla/xla/backends/cpu/codegen/jit_compiler.cc:88:12 (xla_extension.so+0x48d070f) (BuildId: 7f5d2098f168c4db) #8 xla::cpu::CpuCompiler::RunHloPasses(std::unique_ptr<xla::HloModule, std::default_delete<xla::HloModule>>, stream_executor::StreamExecutor*, xla::Compiler::CompileOptions const&) /proc/self/cwd/external/xla/xla/service/cpu/cpu_compiler.cc:1017:3 (xla_extension.so+0x2f6dc47) (BuildId: 7f5d2098f168c4db) #9 xla::JitCompile(xla::XlaComputation const&, absl::lts_20230802::Span<xla::Shape const* const>, xla::ExecutableBuildOptions const&, xla::ExecutionOptions const&, xla::Compiler::CompileOptions const&, int, std::function<void (xla::HloModuleConfig&)>) /proc/self/cwd/external/xla/xla/pjrt/cpu/cpu_client.cc:749:3 (xla_extension.so+0x2f127e2) (BuildId: 7f5d2098f168c4db) #10 xla::TfrtCpuClient::Compile(xla::XlaComputation const&, xla::CompileOptions) /proc/self/cwd/external/xla/xla/pjrt/cpu/cpu_client.cc:842:3 (xla_extension.so+0x2f127e2) #11 xla::TfrtCpuClient::Compile(mlir::ModuleOp, xla::CompileOptions) /proc/self/cwd/external/xla/xla/pjrt/cpu/cpu_client.cc:888:10 (xla_extension.so+0x2f13da2) (BuildId: 7f5d2098f168c4db) #12 xla::ifrt::PjRtLoadedExecutable::Create(xla::ifrt::PjRtCompatibleClient*, mlir::ModuleOp, xla::CompileOptions, std::vector<tsl::RCReference<xla::ifrt::LoadedHostCallback>, std::allocator<tsl::RCReference<xla::ifrt::LoadedHostCallback>>>) /proc/self/cwd/external/xla/xla/python/pjrt_ifrt/pjrt_executable.cc:258:3 (xla_extension.so+0xdd04d77) (BuildId: 7f5d2098f168c4db) #13 xla::ifrt::PjRtCompiler::Compile(std::unique_ptr<xla::ifrt::Program, std::default_delete<xla::ifrt::Program>>, std::unique_ptr<xla::ifrt::CompileOptions, std::default_delete<xla::ifrt::CompileOptions>>) /proc/self/cwd/external/xla/xla/python/pjrt_ifrt/pjrt_compiler.cc:97:10 (xla_extension.so+0xdcfd29b) (BuildId: 7f5d2098f168c4db) #14 xla::PyClient::CompileIfrtProgram(xla::nb_class_ptr<xla::PyClient>, std::unique_ptr<xla::ifrt::Program, std::default_delete<xla::ifrt::Program>>, std::unique_ptr<xla::ifrt::CompileOptions, std::default_delete<xla::ifrt::CompileOptions>>) /proc/self/cwd/external/xla/xla/python/py_client.cc:443:5 (xla_extension.so+0xc62a228) (BuildId: 7f5d2098f168c4db) #15 xla::PyClient::Compile(xla::nb_class_ptr<xla::PyClient>, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>>, xla::CompileOptions, std::vector<nanobind::capsule, std::allocator<nanobind::capsule>>) /proc/self/cwd/external/xla/xla/python/py_client.cc:466:10 (xla_extension.so+0xc62b514) (BuildId: 7f5d2098f168c4db) Previous write of size 1 at 0x7f8b24effc08 by thread T66 (mutexes: write M0): #0 void llvm::cl::opt_storage<bool, false, false>::setValue<bool>(bool const&, bool) /proc/self/cwd/external/llvm-project/llvm/include/llvm/Support/CommandLine.h:1401:11 (xla_extension.so+0x100bace9) (BuildId: 7f5d2098f168c4db) #1 void llvm::cl::opt<bool, false, llvm::cl::parser<bool>>::setDefaultImpl<bool, void>() /proc/self/cwd/external/llvm-project/llvm/include/llvm/Support/CommandLine.h (xla_extension.so+0x100bace9) #2 llvm::cl::opt<bool, false, llvm::cl::parser<bool>>::setDefault() /proc/self/cwd/external/llvm-project/llvm/include/llvm/Support/CommandLine.h:1474:32 (xla_extension.so+0x100bace9) #3 llvm::cl::Option::reset() /proc/self/cwd/external/llvm-project/llvm/lib/Support/CommandLine.cpp:460:3 (xla_extension.so+0x100cac0e) (BuildId: 7f5d2098f168c4db) #4 (anonymous namespace)::CommandLineParser::ResetAllOptionOccurrences() /proc/self/cwd/external/llvm-project/llvm/lib/Support/CommandLine.cpp:1478:17 (xla_extension.so+0x100cac0e) #5 llvm::cl::ResetAllOptionOccurrences() /proc/self/cwd/external/llvm-project/llvm/lib/Support/CommandLine.cpp:2831:17 (xla_extension.so+0x100caa72) (BuildId: 7f5d2098f168c4db) #6 xla::llvm_ir::LLVMCommandLineOptionsLock::LLVMCommandLineOptionsLock(std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>>, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>>>> const&) /proc/self/cwd/external/xla/xla/service/llvm_ir/llvm_command_line_options.cc:70:5 (xla_extension.so+0x91d69f4) (BuildId: 7f5d2098f168c4db) #7 xla::cpu::CpuCompiler::RunBackend(std::unique_ptr<xla::HloModule, std::default_delete<xla::HloModule>>, stream_executor::StreamExecutor*, xla::Compiler::CompileOptions const&) /proc/self/cwd/external/xla/xla/service/cpu/cpu_compiler.cc:1727:39 (xla_extension.so+0x2f781c8) (BuildId: 7f5d2098f168c4db) #8 xla::JitCompile(xla::XlaComputation const&, absl::lts_20230802::Span<xla::Shape const* const>, xla::ExecutableBuildOptions const&, xla::ExecutionOptions const&, xla::Compiler::CompileOptions const&, int, std::function<void (xla::HloModuleConfig&)>) /proc/self/cwd/external/xla/xla/pjrt/cpu/cpu_client.cc:754:19 (xla_extension.so+0x2f12883) (BuildId: 7f5d2098f168c4db) #9 xla::TfrtCpuClient::Compile(xla::XlaComputation const&, xla::CompileOptions) /proc/self/cwd/external/xla/xla/pjrt/cpu/cpu_client.cc:842:3 (xla_extension.so+0x2f12883) #10 xla::TfrtCpuClient::Compile(mlir::ModuleOp, xla::CompileOptions) /proc/self/cwd/external/xla/xla/pjrt/cpu/cpu_client.cc:888:10 (xla_extension.so+0x2f13da2) (BuildId: 7f5d2098f168c4db) #11 xla::ifrt::PjRtLoadedExecutable::Create(xla::ifrt::PjRtCompatibleClient*, mlir::ModuleOp, xla::CompileOptions, std::vector<tsl::RCReference<xla::ifrt::LoadedHostCallback>, std::allocator<tsl::RCReference<xla::ifrt::LoadedHostCallback>>>) /proc/self/cwd/external/xla/xla/python/pjrt_ifrt/pjrt_executable.cc:258:3 (xla_extension.so+0xdd04d77) (BuildId: 7f5d2098f168c4db) #12 xla::ifrt::PjRtCompiler::Compile(std::unique_ptr<xla::ifrt::Program, std::default_delete<xla::ifrt::Program>>, std::unique_ptr<xla::ifrt::CompileOptions, std::default_delete<xla::ifrt::CompileOptions>>) /proc/self/cwd/external/xla/xla/python/pjrt_ifrt/pjrt_compiler.cc:97:10 (xla_extension.so+0xdcfd29b) (BuildId: 7f5d2098f168c4db) #13 xla::PyClient::CompileIfrtProgram(xla::nb_class_ptr<xla::PyClient>, std::unique_ptr<xla::ifrt::Program, std::default_delete<xla::ifrt::Program>>, std::unique_ptr<xla::ifrt::CompileOptions, std::default_delete<xla::ifrt::CompileOptions>>) /proc/self/cwd/external/xla/xla/python/py_client.cc:443:5 (xla_extension.so+0xc62a228) (BuildId: 7f5d2098f168c4db) #14 xla::PyClient::Compile(xla::nb_class_ptr<xla::PyClient>, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>>, xla::CompileOptions, std::vector<nanobind::capsule, std::allocator<nanobind::capsule>>) /proc/self/cwd/external/xla/xla/python/py_client.cc:466:10 (xla_extension.so+0xc62b514) (BuildId: 7f5d2098f168c4db) ``` PiperOrigin-RevId: 707655482
…r RunBackend. Both of these call into LLVM code that reads the compiler options. Fixes the following race: ``` WARNING: ThreadSanitizer: data race (pid=869815) Read of size 1 at 0x7f8b24effc08 by thread T65: #0 llvm::cl::opt_storage<bool, false, false>::getValue() const /proc/self/cwd/external/llvm-project/llvm/include/llvm/Support/CommandLine.h:1406:38 (xla_extension.so+0xa281417) (BuildId: 7f5d2098f168c4db) #1 llvm::cl::opt_storage<bool, false, false>::operator bool() const /proc/self/cwd/external/llvm-project/llvm/include/llvm/Support/CommandLine.h:1410:38 (xla_extension.so+0xa281417) #2 llvm::CodeGenTargetMachineImpl::CodeGenTargetMachineImpl(llvm::Target const&, llvm::StringRef, llvm::Triple const&, llvm::StringRef, llvm::StringRef, llvm::TargetOptions const&, llvm::Reloc::Model, llvm::CodeModel::Model, llvm::CodeGenOptLevel) /proc/self/cwd/external/llvm-project/llvm/lib/CodeGen/CodeGenTargetMachineImpl.cpp:97:7 (xla_extension.so+0xa281417) #3 llvm::X86TargetMachine::X86TargetMachine(llvm::Target const&, llvm::Triple const&, llvm::StringRef, llvm::StringRef, llvm::TargetOptions const&, std::optional<llvm::Reloc::Model>, std::optional<llvm::CodeModel::Model>, llvm::CodeGenOptLevel, bool) /proc/self/cwd/external/llvm-project/llvm/lib/Target/X86/X86TargetMachine.cpp:236:7 (xla_extension.so+0x9803b80) (BuildId: 7f5d2098f168c4db) #4 llvm::RegisterTargetMachine<llvm::X86TargetMachine>::Allocator(llvm::Target const&, llvm::Triple const&, llvm::StringRef, llvm::StringRef, llvm::TargetOptions const&, std::optional<llvm::Reloc::Model>, std::optional<llvm::CodeModel::Model>, llvm::CodeGenOptLevel, bool) /proc/self/cwd/external/llvm-project/llvm/include/llvm/MC/TargetRegistry.h:1258:16 (xla_extension.so+0x980757a) (BuildId: 7f5d2098f168c4db) #5 llvm::Target::createTargetMachine(llvm::StringRef, llvm::StringRef, llvm::StringRef, llvm::TargetOptions const&, std::optional<llvm::Reloc::Model>, std::optional<llvm::CodeModel::Model>, llvm::CodeGenOptLevel, bool) const /proc/self/cwd/external/llvm-project/llvm/include/llvm/MC/TargetRegistry.h:462:12 (xla_extension.so+0x94ba529) (BuildId: 7f5d2098f168c4db) #6 llvm::EngineBuilder::selectTarget(llvm::Triple const&, llvm::StringRef, llvm::StringRef, llvm::SmallVectorImpl<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>>> const&) /proc/self/cwd/external/llvm-project/llvm/lib/ExecutionEngine/TargetSelect.cpp:88:18 (xla_extension.so+0x94ba529) #7 xla::cpu::JitCompiler::InferTargetMachine(llvm::TargetOptions const&, llvm::CodeGenOptLevel, std::optional<tsl::port::CPUFeature>) /proc/self/cwd/external/xla/xla/backends/cpu/codegen/jit_compiler.cc:88:12 (xla_extension.so+0x48d070f) (BuildId: 7f5d2098f168c4db) #8 xla::cpu::CpuCompiler::RunHloPasses(std::unique_ptr<xla::HloModule, std::default_delete<xla::HloModule>>, stream_executor::StreamExecutor*, xla::Compiler::CompileOptions const&) /proc/self/cwd/external/xla/xla/service/cpu/cpu_compiler.cc:1017:3 (xla_extension.so+0x2f6dc47) (BuildId: 7f5d2098f168c4db) #9 xla::JitCompile(xla::XlaComputation const&, absl::lts_20230802::Span<xla::Shape const* const>, xla::ExecutableBuildOptions const&, xla::ExecutionOptions const&, xla::Compiler::CompileOptions const&, int, std::function<void (xla::HloModuleConfig&)>) /proc/self/cwd/external/xla/xla/pjrt/cpu/cpu_client.cc:749:3 (xla_extension.so+0x2f127e2) (BuildId: 7f5d2098f168c4db) #10 xla::TfrtCpuClient::Compile(xla::XlaComputation const&, xla::CompileOptions) /proc/self/cwd/external/xla/xla/pjrt/cpu/cpu_client.cc:842:3 (xla_extension.so+0x2f127e2) #11 xla::TfrtCpuClient::Compile(mlir::ModuleOp, xla::CompileOptions) /proc/self/cwd/external/xla/xla/pjrt/cpu/cpu_client.cc:888:10 (xla_extension.so+0x2f13da2) (BuildId: 7f5d2098f168c4db) #12 xla::ifrt::PjRtLoadedExecutable::Create(xla::ifrt::PjRtCompatibleClient*, mlir::ModuleOp, xla::CompileOptions, std::vector<tsl::RCReference<xla::ifrt::LoadedHostCallback>, std::allocator<tsl::RCReference<xla::ifrt::LoadedHostCallback>>>) /proc/self/cwd/external/xla/xla/python/pjrt_ifrt/pjrt_executable.cc:258:3 (xla_extension.so+0xdd04d77) (BuildId: 7f5d2098f168c4db) #13 xla::ifrt::PjRtCompiler::Compile(std::unique_ptr<xla::ifrt::Program, std::default_delete<xla::ifrt::Program>>, std::unique_ptr<xla::ifrt::CompileOptions, std::default_delete<xla::ifrt::CompileOptions>>) /proc/self/cwd/external/xla/xla/python/pjrt_ifrt/pjrt_compiler.cc:97:10 (xla_extension.so+0xdcfd29b) (BuildId: 7f5d2098f168c4db) #14 xla::PyClient::CompileIfrtProgram(xla::nb_class_ptr<xla::PyClient>, std::unique_ptr<xla::ifrt::Program, std::default_delete<xla::ifrt::Program>>, std::unique_ptr<xla::ifrt::CompileOptions, std::default_delete<xla::ifrt::CompileOptions>>) /proc/self/cwd/external/xla/xla/python/py_client.cc:443:5 (xla_extension.so+0xc62a228) (BuildId: 7f5d2098f168c4db) #15 xla::PyClient::Compile(xla::nb_class_ptr<xla::PyClient>, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>>, xla::CompileOptions, std::vector<nanobind::capsule, std::allocator<nanobind::capsule>>) /proc/self/cwd/external/xla/xla/python/py_client.cc:466:10 (xla_extension.so+0xc62b514) (BuildId: 7f5d2098f168c4db) Previous write of size 1 at 0x7f8b24effc08 by thread T66 (mutexes: write M0): #0 void llvm::cl::opt_storage<bool, false, false>::setValue<bool>(bool const&, bool) /proc/self/cwd/external/llvm-project/llvm/include/llvm/Support/CommandLine.h:1401:11 (xla_extension.so+0x100bace9) (BuildId: 7f5d2098f168c4db) #1 void llvm::cl::opt<bool, false, llvm::cl::parser<bool>>::setDefaultImpl<bool, void>() /proc/self/cwd/external/llvm-project/llvm/include/llvm/Support/CommandLine.h (xla_extension.so+0x100bace9) #2 llvm::cl::opt<bool, false, llvm::cl::parser<bool>>::setDefault() /proc/self/cwd/external/llvm-project/llvm/include/llvm/Support/CommandLine.h:1474:32 (xla_extension.so+0x100bace9) #3 llvm::cl::Option::reset() /proc/self/cwd/external/llvm-project/llvm/lib/Support/CommandLine.cpp:460:3 (xla_extension.so+0x100cac0e) (BuildId: 7f5d2098f168c4db) #4 (anonymous namespace)::CommandLineParser::ResetAllOptionOccurrences() /proc/self/cwd/external/llvm-project/llvm/lib/Support/CommandLine.cpp:1478:17 (xla_extension.so+0x100cac0e) #5 llvm::cl::ResetAllOptionOccurrences() /proc/self/cwd/external/llvm-project/llvm/lib/Support/CommandLine.cpp:2831:17 (xla_extension.so+0x100caa72) (BuildId: 7f5d2098f168c4db) #6 xla::llvm_ir::LLVMCommandLineOptionsLock::LLVMCommandLineOptionsLock(std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>>, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>>>> const&) /proc/self/cwd/external/xla/xla/service/llvm_ir/llvm_command_line_options.cc:70:5 (xla_extension.so+0x91d69f4) (BuildId: 7f5d2098f168c4db) #7 xla::cpu::CpuCompiler::RunBackend(std::unique_ptr<xla::HloModule, std::default_delete<xla::HloModule>>, stream_executor::StreamExecutor*, xla::Compiler::CompileOptions const&) /proc/self/cwd/external/xla/xla/service/cpu/cpu_compiler.cc:1727:39 (xla_extension.so+0x2f781c8) (BuildId: 7f5d2098f168c4db) #8 xla::JitCompile(xla::XlaComputation const&, absl::lts_20230802::Span<xla::Shape const* const>, xla::ExecutableBuildOptions const&, xla::ExecutionOptions const&, xla::Compiler::CompileOptions const&, int, std::function<void (xla::HloModuleConfig&)>) /proc/self/cwd/external/xla/xla/pjrt/cpu/cpu_client.cc:754:19 (xla_extension.so+0x2f12883) (BuildId: 7f5d2098f168c4db) #9 xla::TfrtCpuClient::Compile(xla::XlaComputation const&, xla::CompileOptions) /proc/self/cwd/external/xla/xla/pjrt/cpu/cpu_client.cc:842:3 (xla_extension.so+0x2f12883) #10 xla::TfrtCpuClient::Compile(mlir::ModuleOp, xla::CompileOptions) /proc/self/cwd/external/xla/xla/pjrt/cpu/cpu_client.cc:888:10 (xla_extension.so+0x2f13da2) (BuildId: 7f5d2098f168c4db) #11 xla::ifrt::PjRtLoadedExecutable::Create(xla::ifrt::PjRtCompatibleClient*, mlir::ModuleOp, xla::CompileOptions, std::vector<tsl::RCReference<xla::ifrt::LoadedHostCallback>, std::allocator<tsl::RCReference<xla::ifrt::LoadedHostCallback>>>) /proc/self/cwd/external/xla/xla/python/pjrt_ifrt/pjrt_executable.cc:258:3 (xla_extension.so+0xdd04d77) (BuildId: 7f5d2098f168c4db) #12 xla::ifrt::PjRtCompiler::Compile(std::unique_ptr<xla::ifrt::Program, std::default_delete<xla::ifrt::Program>>, std::unique_ptr<xla::ifrt::CompileOptions, std::default_delete<xla::ifrt::CompileOptions>>) /proc/self/cwd/external/xla/xla/python/pjrt_ifrt/pjrt_compiler.cc:97:10 (xla_extension.so+0xdcfd29b) (BuildId: 7f5d2098f168c4db) #13 xla::PyClient::CompileIfrtProgram(xla::nb_class_ptr<xla::PyClient>, std::unique_ptr<xla::ifrt::Program, std::default_delete<xla::ifrt::Program>>, std::unique_ptr<xla::ifrt::CompileOptions, std::default_delete<xla::ifrt::CompileOptions>>) /proc/self/cwd/external/xla/xla/python/py_client.cc:443:5 (xla_extension.so+0xc62a228) (BuildId: 7f5d2098f168c4db) #14 xla::PyClient::Compile(xla::nb_class_ptr<xla::PyClient>, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>>, xla::CompileOptions, std::vector<nanobind::capsule, std::allocator<nanobind::capsule>>) /proc/self/cwd/external/xla/xla/python/py_client.cc:466:10 (xla_extension.so+0xc62b514) (BuildId: 7f5d2098f168c4db) ``` PiperOrigin-RevId: 707721170
… kernel rocm cu Imported from GitHub PR openxla/xla#24898 Fix issue reported by asan while running the tests on rocm ci: ``` ==1718600==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x5030001d97f8 at pc 0x5647cfdda211 bp 0x7ffc9eb7eac0 sp 0x7ffc9eb7eab8 READ of size 8 at 0x5030001d97f8 thread T0 #0 0x5647cfdda210 in absl::lts_20230802::container_internal::CommonFields::capacity() const /root/.cache/bazel/_bazel_root/f367074f9120c6f1a67d35844ac058a3/execroot/xla/external/com_google_absl/absl/container/internal/raw_hash_set.h:990:36 #1 0x5647cfdda210 in void absl::lts_20230802::container_internal::InitializeSlots<std::allocator<char>, 8ul, 8ul>(absl::lts_20230802::container_internal::CommonFields&, std::allocator<char>) /root/.cache/bazel/_bazel_root/f367074f9120c6f1a67d35844ac058a3/execroot/xla/external/com_google_absl/absl/container/internal/raw_hash_set.h:1403:24 #2 0x7f066c2cfdde in absl::lts_20230802::container_internal::raw_hash_set<absl::lts_20230802::container_internal::NodeHashMapPolicy<std::tuple<stream_executor::StreamExecutor*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>>, void*>, stream_executor::TypedKernel<stream_executor::DeviceMemory<unsigned char>, unsigned char, unsigned long, stream_executor::DeviceMemory<unsigned long>>>, absl::lts_20230802::hash_internal::Hash<std::tuple<stream_executor::StreamExecutor*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>>, void*>>, std::equal_to<std::tuple<stream_executor::StreamExecutor*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>>, void*>>, std::allocator<std::pair<std::tuple<stream_executor::StreamExecutor*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>>, void*> const, stream_executor::TypedKernel<stream_executor::DeviceMemory<unsigned char>, unsigned char, unsigned long, stream_executor::DeviceMemory<unsigned long>>>>>::resize(unsigned long) (/root/.cache/bazel/_bazel_root/f367074f9120c6f1a67d35844ac058a3/execroot/xla/bazel-out/k8-opt/bin/xla/service/gpu/../../../_solib_local/libxla_Sstream_Uexecutor_Sgpu_Slibredzone_Uallocator_Ukernel_Urocm_Urocm.so+0x9dde) (BuildId: 3bd12bfb947fb25a2a780cc09bea1d9c) #3 0x7f066c2cfd97 in absl::lts_20230802::container_internal::raw_hash_set<absl::lts_20230802::container_internal::NodeHashMapPolicy<std::tuple<stream_executor::StreamExecutor*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>>, void*>, stream_executor::TypedKernel<stream_executor::DeviceMemory<unsigned char>, unsigned char, unsigned long, stream_executor::DeviceMemory<unsigned long>>>, absl::lts_20230802::hash_internal::Hash<std::tuple<stream_executor::StreamExecutor*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>>, void*>>, std::equal_to<std::tuple<stream_executor::StreamExecutor*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>>, void*>>, std::allocator<std::pair<std::tuple<stream_executor::StreamExecutor*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>>, void*> const, stream_executor::TypedKernel<stream_executor::DeviceMemory<unsigned char>, unsigned char, unsigned long, stream_executor::DeviceMemory<unsigned long>>>>>::prepare_insert(unsigned long) (/root/.cache/bazel/_bazel_root/f367074f9120c6f1a67d35844ac058a3/execroot/xla/bazel-out/k8-opt/bin/xla/service/gpu/../../../_solib_local/libxla_Sstream_Uexecutor_Sgpu_Slibredzone_Uallocator_Ukernel_Urocm_Urocm.so+0x9d97) (BuildId: 3bd12bfb947fb25a2a780cc09bea1d9c) #4 0x7f066c2cfcca in std::pair<unsigned long, bool> absl::lts_20230802::container_internal::raw_hash_set<absl::lts_20230802::container_internal::NodeHashMapPolicy<std::tuple<stream_executor::StreamExecutor*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>>, void*>, stream_executor::TypedKernel<stream_executor::DeviceMemory<unsigned char>, unsigned char, unsigned long, stream_executor::DeviceMemory<unsigned long>>>, absl::lts_20230802::hash_internal::Hash<std::tuple<stream_executor::StreamExecutor*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>>, void*>>, std::equal_to<std::tuple<stream_executor::StreamExecutor*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>>, void*>>, std::allocator<std::pair<std::tuple<stream_executor::StreamExecutor*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>>, void*> const, stream_executor::TypedKernel<stream_executor::DeviceMemory<unsigned char>, unsigned char, unsigned long, stream_executor::DeviceMemory<unsigned long>>>>>::find_or_prepare_insert<std::tuple<stream_executor::StreamExecutor*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>>, void*>>(std::tuple<stream_executor::StreamExecutor*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>>, void*> const&) (/root/.cache/bazel/_bazel_root/f367074f9120c6f1a67d35844ac058a3/execroot/xla/bazel-out/k8-opt/bin/xla/service/gpu/../../../_solib_local/libxla_Sstream_Uexecutor_Sgpu_Slibredzone_Uallocator_Ukernel_Urocm_Urocm.so+0x9cca) (BuildId: 3bd12bfb947fb25a2a780cc09bea1d9c) #5 0x7f066c2cf9c4 in std::pair<absl::lts_20230802::container_internal::raw_hash_set<absl::lts_20230802::container_internal::NodeHashMapPolicy<std::tuple<stream_executor::StreamExecutor*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>>, void*>, stream_executor::TypedKernel<stream_executor::DeviceMemory<unsigned char>, unsigned char, unsigned long, stream_executor::DeviceMemory<unsigned long>>>, absl::lts_20230802::hash_internal::Hash<std::tuple<stream_executor::StreamExecutor*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>>, void*>>, std::equal_to<std::tuple<stream_executor::StreamExecutor*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>>, void*>>, std::allocator<std::pair<std::tuple<stream_executor::StreamExecutor*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>>, void*> const, stream_executor::TypedKernel<stream_executor::DeviceMemory<unsigned char>, unsigned char, unsigned long, stream_executor::DeviceMemory<unsigned long>>>>>::iterator, bool> absl::lts_20230802::container_internal::raw_hash_set<absl::lts_20230802::container_internal::NodeHashMapPolicy<std::tuple<stream_executor::StreamExecutor*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>>, void*>, stream_executor::TypedKernel<stream_executor::DeviceMemory<unsigned char>, unsigned char, unsigned long, stream_executor::DeviceMemory<unsigned long>>>, absl::lts_20230802::hash_internal::Hash<std::tuple<stream_executor::StreamExecutor*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>>, void*>>, std::equal_to<std::tuple<stream_executor::StreamExecutor*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>>, void*>>, std::allocator<std::pair<std::tuple<stream_executor::StreamExecutor*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>>, void*> const, stream_executor::TypedKernel<stream_executor::DeviceMemory<unsigned char>, unsigned char, unsigned long, stream_executor::DeviceMemory<unsigned long>>>>>::EmplaceDecomposable::operator()<std::tuple<stream_executor::StreamExecutor*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>>, void*>, std::piecewise_construct_t const&, std::tuple<std::tuple<stream_executor::StreamExecutor*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>>, void*>&>, std::tuple<stream_executor::TypedKernel<stream_executor::DeviceMemory<unsigned char>, unsigned char, unsigned long, stream_executor::DeviceMemory<unsigned long>>&&>>(std::tuple<stream_executor::StreamExecutor*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>>, void*> const&, std::piecewise_construct_t const&, std::tuple<std::tuple<stream_executor::StreamExecutor*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>>, void*>&>&&, std::tuple<stream_executor::TypedKernel<stream_executor::DeviceMemory<unsigned char>, unsigned char, unsigned long, stream_executor::DeviceMemory<unsigned long>>&&>&&) const (/root/.cache/bazel/_bazel_root/f367074f9120c6f1a67d35844ac058a3/execroot/xla/bazel-out/k8-opt/bin/xla/service/gpu/../../../_solib_local/libxla_Sstream_Uexecutor_Sgpu_Slibredzone_Uallocator_Ukernel_Urocm_Urocm.so+0x99c4) (BuildId: 3bd12bfb947fb25a2a780cc09bea1d9c) #6 0x7f066c2cf0ad in stream_executor::GetComparisonKernel(stream_executor::StreamExecutor*, stream_executor::GpuAsmOpts) (/root/.cache/bazel/_bazel_root/f367074f9120c6f1a67d35844ac058a3/execroot/xla/bazel-out/k8-opt/bin/xla/service/gpu/../../../_solib_local/libxla_Sstream_Uexecutor_Sgpu_Slibredzone_Uallocator_Ukernel_Urocm_Urocm.so+0x90ad) (BuildId: 3bd12bfb947fb25a2a780cc09bea1d9c) #7 0x7f066c37ba93 in stream_executor::RedzoneAllocator::CheckRedzones() const /root/.cache/bazel/_bazel_root/f367074f9120c6f1a67d35844ac058a3/execroot/xla/xla/stream_executor/gpu/redzone_allocator.cc:272:3 #8 0x7f06b31bb7e9 in absl::lts_20230802::StatusOr<xla::AutotuneResult> xla::gpu::(anonymous namespace)::GemmAutotuner::GetBestAlgorithm<long, xla::gpu::(anonymous namespace)::GemmAutotuner::TuneGpuBlas(xla::HloInstruction const*, xla::gpu::GemmConfig const&)::'lambda'(long const&)&>(xla::HloInstruction const*, absl::lts_20230802::Span<long const>, double, bool, xla::gpu::(anonymous namespace)::GemmAutotuner::TuneGpuBlas(xla::HloInstruction const*, xla::gpu::GemmConfig const&)::'lambda'(long const&)&) /root/.cache/bazel/_bazel_root/f367074f9120c6f1a67d35844ac058a3/execroot/xla/xla/service/gpu/autotuning/gemm_algorithm_picker.cc:328:7 #9 0x7f06b31bb7e9 in xla::gpu::(anonymous namespace)::GemmAutotuner::TuneGpuBlas(xla::HloInstruction const*, xla::gpu::GemmConfig const&) /root/.cache/bazel/_bazel_root/f367074f9120c6f1a67d35844ac058a3/execroot/xla/xla/service/gpu/autotuning/gemm_algorithm_picker.cc:256:12 #10 0x7f06b31bb7e9 in xla::gpu::(anonymous namespace)::GemmAutotuner::operator()(xla::HloInstruction const*, xla::gpu::AutotuneCacheKey const&) /root/.cache/bazel/_bazel_root/f367074f9120c6f1a67d35844ac058a3/execroot/xla/xla/service/gpu/autotuning/gemm_algorithm_picker.cc:137:18 #11 0x7f06b31b6760 in xla::gpu::(anonymous namespace)::RunOnInstruction(xla::HloInstruction*, xla::gpu::(anonymous namespace)::GemmAutotuner&)::$_0::operator()() const /root/.cache/bazel/_bazel_root/f367074f9120c6f1a67d35844ac058a3/execroot/xla/xla/service/gpu/autotuning/gemm_algorithm_picker.cc:418:3 #12 0x7f06b31b6760 in absl::lts_20230802::StatusOr<xla::AutotuneResult> std::__invoke_impl<absl::lts_20230802::StatusOr<xla::AutotuneResult>, xla::gpu::(anonymous namespace)::RunOnInstruction(xla::HloInstruction*, xla::gpu::(anonymous namespace)::GemmAutotuner&)::$_0&>(std::__invoke_other, xla::gpu::(anonymous namespace)::RunOnInstruction(xla::HloInstruction*, xla::gpu::(anonymous namespace)::GemmAutotuner&)::$_0&) /usr/lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/invoke.h:61:14 #13 0x7f06b31b6760 in std::enable_if<is_invocable_r_v<absl::lts_20230802::StatusOr<xla::AutotuneResult>, xla::gpu::(anonymous namespace)::RunOnInstruction(xla::HloInstruction*, xla::gpu::(anonymous namespace)::GemmAutotuner&)::$_0&>, absl::lts_20230802::StatusOr<xla::AutotuneResult>>::type std::__invoke_r<absl::lts_20230802::StatusOr<xla::AutotuneResult>, xla::gpu::(anonymous namespace)::RunOnInstruction(xla::HloInstruction*, xla::gpu::(anonymous namespace)::GemmAutotuner&)::$_0&>(xla::gpu::(anonymous namespace)::RunOnInstruction(xla::HloInstruction*, xla::gpu::(anonymous namespace)::GemmAutotuner&)::$_0&) /usr/lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/invoke.h:114:9 #14 0x7f06b31b6760 in std::_Function_handler<absl::lts_20230802::StatusOr<xla::AutotuneResult> (), xla::gpu::(anonymous namespace)::RunOnInstruction(xla::HloInstruction*, xla::gpu::(anonymous namespace)::GemmAutotuner&)::$_0>::_M_invoke(std::_Any_data const&) /usr/lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/std_function.h:290:9 #15 0x7f06b308670d in std::function<absl::lts_20230802::StatusOr<xla::AutotuneResult> ()>::operator()() const /usr/lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/std_function.h:590:9 #16 0x7f06b308670d in xla::gpu::AutotunerUtil::Autotune(xla::HloInstruction const*, xla::gpu::AutotuneConfig const&, std::function<absl::lts_20230802::StatusOr<xla::AutotuneResult> ()> const&) /root/.cache/bazel/_bazel_root/f367074f9120c6f1a67d35844ac058a3/execroot/xla/xla/service/gpu/autotuning/autotuner_util.cc:460:3 #17 0x7f06b31b336e in xla::gpu::(anonymous namespace)::RunOnInstruction(xla::HloInstruction*, xla::gpu::(anonymous namespace)::GemmAutotuner&) /root/.cache/bazel/_bazel_root/f367074f9120c6f1a67d35844ac058a3/execroot/xla/xla/service/gpu/autotuning/gemm_algorithm_picker.cc:418:3 #18 0x7f06b31b336e in xla::gpu::(anonymous namespace)::RunOnComputation(xla::HloComputation*, xla::gpu::(anonymous namespace)::GemmAutotuner&, unsigned long*) /root/.cache/bazel/_bazel_root/f367074f9120c6f1a67d35844ac058a3/execroot/xla/xla/service/gpu/autotuning/gemm_algorithm_picker.cc:468:7 #19 0x7f06b31b336e in xla::gpu::GemmAlgorithmPicker::Run(xla::HloModule*, absl::lts_20230802::flat_hash_set<std::basic_string_view<char, std::char_traits<char>>, absl::lts_20230802::container_internal::StringHash, absl::lts_20230802::container_internal::StringEq, std::allocator<std::basic_string_view<char, std::char_traits<char>>>> const&) /root/.cache/bazel/_bazel_root/f367074f9120c6f1a67d35844ac058a3/execroot/xla/xla/service/gpu/autotuning/gemm_algorithm_picker.cc:495:5 #20 0x7f06b30242f3 in xla::HloPassPipeline::RunHelper(xla::HloPassInterface*, xla::HloModule*, absl::lts_20230802::flat_hash_set<std::basic_string_view<char, std::char_traits<char>>, absl::lts_20230802::container_internal::StringHash, absl::lts_20230802::container_internal::StringEq, std::allocator<std::basic_string_view<char, std::char_traits<char>>>> const&) /root/.cache/bazel/_bazel_root/f367074f9120c6f1a67d35844ac058a3/execroot/xla/./xla/hlo/pass/hlo_pass_pipeline.h:150:5 #21 0x7f06b3010bb9 in absl::lts_20230802::StatusOr<bool> xla::HloPassPipeline::RunPassesInternal<xla::HloModule>(xla::HloModule*, xla::DebugOptions const&, absl::lts_20230802::flat_hash_set<std::basic_string_view<char, std::char_traits<char>>, absl::lts_20230802::container_internal::StringHash, absl::lts_20230802::container_internal::StringEq, std::allocator<std::basic_string_view<char, std::char_traits<char>>>> const&) /root/.cache/bazel/_bazel_root/f367074f9120c6f1a67d35844ac058a3/execroot/xla/xla/hlo/pass/hlo_pass_pipeline.cc:198:30 #22 0x7f06b300f786 in xla::HloPassPipeline::Run(xla::HloModule*, absl::lts_20230802::flat_hash_set<std::basic_string_view<char, std::char_traits<char>>, absl::lts_20230802::container_internal::StringHash, absl::lts_20230802::container_internal::StringEq, std::allocator<std::basic_string_view<char, std::char_traits<char>>>> const&) /root/.cache/bazel/_bazel_root/f367074f9120c6f1a67d35844ac058a3/execroot/xla/xla/hlo/pass/hlo_pass_pipeline.cc:338:10 #23 0x5647cfd66945 in xla::HloPassInterface::Run(xla::HloModule*) /root/.cache/bazel/_bazel_root/f367074f9120c6f1a67d35844ac058a3/execroot/xla/./xla/hlo/pass/hlo_pass_interface.h:85:12 #24 0x7f06c2908be0 in xla::gpu::GpuCompiler::OptimizeHloPostLayoutAssignment(xla::HloModule*, stream_executor::StreamExecutor*, xla::Compiler::CompileOptions const&, xla::Compiler::TargetConfig const&, tsl::thread::ThreadPool*) /root/.cache/bazel/_bazel_root/f367074f9120c6f1a67d35844ac058a3/execroot/xla/xla/service/gpu/gpu_compiler.cc:1754:3 #25 0x7f06c2a000f3 in xla::gpu::AMDGPUCompiler::OptimizeHloPostLayoutAssignment(xla::HloModule*, stream_executor::StreamExecutor*, xla::Compiler::CompileOptions const&, xla::Compiler::TargetConfig const&, tsl::thread::ThreadPool*) /root/.cache/bazel/_bazel_root/f367074f9120c6f1a67d35844ac058a3/execroot/xla/xla/service/gpu/amdgpu_compiler.cc:197:3 #26 0x7f06c28f85e9 in xla::gpu::GpuCompiler::OptimizeHloModule(xla::HloModule*, stream_executor::StreamExecutor*, xla::Compiler::CompileOptions const&, xla::Compiler::TargetConfig const&) /root/.cache/bazel/_bazel_root/f367074f9120c6f1a67d35844ac058a3/execroot/xla/xla/service/gpu/gpu_compiler.cc:1392:3 #27 0x7f06c291250d in xla::gpu::GpuCompiler::RunHloPasses(std::unique_ptr<xla::HloModule, std::default_delete<xla::HloModule>>, stream_executor::StreamExecutor*, xla::Compiler::CompileOptions const&) /root/.cache/bazel/_bazel_root/f367074f9120c6f1a67d35844ac058a3/execroot/xla/xla/service/gpu/gpu_compiler.cc:1824:3 #28 0x5647cfd63784 in xla::Compiler::RunHloPasses(std::unique_ptr<xla::HloModule, std::default_delete<xla::HloModule>>, stream_executor::StreamExecutor*, stream_executor::DeviceMemoryAllocator*) /root/.cache/bazel/_bazel_root/f367074f9120c6f1a67d35844ac058a3/execroot/xla/./xla/service/compiler.h:177:12 #29 0x7f06c339acba in xla::HloTestBase::GetOptimizedModule(std::unique_ptr<xla::HloModule, std::default_delete<xla::HloModule>>) /root/.cache/bazel/_bazel_root/f367074f9120c6f1a67d35844ac058a3/execroot/xla/xla/tests/hlo_test_base.cc:188:32 #30 0x5647cfd89516 in xla::gpu::(anonymous namespace)::GpuCompilerTest_CollectivePermuteDecompositionAndPipelining_Test::TestBody() /root/.cache/bazel/_bazel_root/f367074f9120c6f1a67d35844ac058a3/execroot/xla/xla/service/gpu/gpu_compiler_test.cc:879:3 #31 0x7f06c2c649dd in void testing::internal::HandleSehExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) /root/.cache/bazel/_bazel_root/f367074f9120c6f1a67d35844ac058a3/execroot/xla/external/com_google_googletest/googletest/src/gtest.cc:2664:10 #32 0x7f06c2c649dd in void testing::internal::HandleExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) /root/.cache/bazel/_bazel_root/f367074f9120c6f1a67d35844ac058a3/execroot/xla/external/com_google_googletest/googletest/src/gtest.cc:2700:14 #33 0x7f06c2c64708 in testing::Test::Run() /root/.cache/bazel/_bazel_root/f367074f9120c6f1a67d35844ac058a3/execroot/xla/external/com_google_googletest/googletest/src/gtest.cc:2739:5 #34 0x7f06c2c6771b in testing::TestInfo::Run() /root/.cache/bazel/_bazel_root/f367074f9120c6f1a67d35844ac058a3/execroot/xla/external/com_google_googletest/googletest/src/gtest.cc:2885:11 #35 0x7f06c2c6a5ab in testing::TestSuite::Run() /root/.cache/bazel/_bazel_root/f367074f9120c6f1a67d35844ac058a3/execroot/xla/external/com_google_googletest/googletest/src/gtest.cc:3063:30 #36 0x7f06c2c96eba in testing::internal::UnitTestImpl::RunAllTests() /root/.cache/bazel/_bazel_root/f367074f9120c6f1a67d35844ac058a3/execroot/xla/external/com_google_googletest/googletest/src/gtest.cc:6054:44 #37 0x7f06c2c9579d in bool testing::internal::HandleSehExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool>(testing::internal::UnitTestImpl*, bool (testing::internal::UnitTestImpl::*)(), char const*) /root/.cache/bazel/_bazel_root/f367074f9120c6f1a67d35844ac058a3/execroot/xla/external/com_google_googletest/googletest/src/gtest.cc:2664:10 #38 0x7f06c2c9579d in bool testing::internal::HandleExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool>(testing::internal::UnitTestImpl*, bool (testing::internal::UnitTestImpl::*)(), char const*) /root/.cache/bazel/_bazel_root/f367074f9120c6f1a67d35844ac058a3/execroot/xla/external/com_google_googletest/googletest/src/gtest.cc:2700:14 #39 0x7f06c2c95203 in testing::UnitTest::Run() /root/.cache/bazel/_bazel_root/f367074f9120c6f1a67d35844ac058a3/execroot/xla/external/com_google_googletest/googletest/src/gtest.cc:5594:10 #40 0x7f06c2d679b8 in RUN_ALL_TESTS() /root/.cache/bazel/_bazel_root/f367074f9120c6f1a67d35844ac058a3/execroot/xla/external/com_google_googletest/googletest/include/gtest/gtest.h:2334:73 #41 0x7f06c2d679b8 in main /root/.cache/bazel/_bazel_root/f367074f9120c6f1a67d35844ac058a3/execroot/xla/xla/tests/xla_internal_test_main.cc:65:10 #42 0x7f064c0b3d8f in __libc_start_call_main csu/../sysdeps/nptl/libc_start_call_main.h:58:16 #43 0x7f064c0b3e3f in __libc_start_main csu/../csu/libc-start.c:392:3 #44 0x5647cfc7b044 in _start (/root/.cache/bazel/_bazel_root/f367074f9120c6f1a67d35844ac058a3/execroot/xla/bazel-out/k8-opt/bin/xla/service/gpu/gpu_compiler_test_gpu_amd_any+0xff044) (BuildId: ef1ac485eb61840d0e2233a2cca69eec) 0x5030001d97f8 is located 8 bytes before 32-byte region [0x5030001d9800,0x5030001d9820) allocated by thread T0 here: #0 0x5647cfd1527f in malloc (/root/.cache/bazel/_bazel_root/f367074f9120c6f1a67d35844ac058a3/execroot/xla/bazel-out/k8-opt/bin/xla/service/gpu/gpu_compiler_test_gpu_amd_any+0x19927f) (BuildId: ef1ac485eb61840d0e2233a2cca69eec) #1 0x7f064c39798b in operator new(unsigned long) (/lib/x86_64-linux-gnu/libstdc++.so.6+0xae98b) (BuildId: e37fe1a879783838de78cbc8c80621fa685d58a2) #2 0x7f06b31bb5b7 in google::protobuf::Duration* google::protobuf::MessageLite::CreateMaybeMessage<google::protobuf::Duration>(google::protobuf::Arena*) /root/.cache/bazel/_bazel_root/f367074f9120c6f1a67d35844ac058a3/execroot/xla/external/com_google_protobuf/src/google/protobuf/message_lite.h:425:12 #3 0x7f06b31bb5b7 in xla::AutotuneResult::_internal_mutable_run_time() /root/.cache/bazel/_bazel_root/f367074f9120c6f1a67d35844ac058a3/execroot/xla/bazel-out/k8-opt/bin/xla/autotuning.pb.h:3079:15 #4 0x7f06b31bb5b7 in xla::AutotuneResult::mutable_run_time() /root/.cache/bazel/_bazel_root/f367074f9120c6f1a67d35844ac058a3/execroot/xla/bazel-out/k8-opt/bin/xla/autotuning.pb.h:3085:45 #5 0x7f06b31bb5b7 in absl::lts_20230802::StatusOr<xla::AutotuneResult> xla::gpu::(anonymous namespace)::GemmAutotuner::GetBestAlgorithm<long, xla::gpu::(anonymous namespace)::GemmAutotuner::TuneGpuBlas(xla::HloInstruction const*, xla::gpu::GemmConfig const&)::'lambda'(long const&)&>(xla::HloInstruction const*, absl::lts_20230802::Span<long const>, double, bool, xla::gpu::(anonymous namespace)::GemmAutotuner::TuneGpuBlas(xla::HloInstruction const*, xla::gpu::GemmConfig const&)::'lambda'(long const&)&) /root/.cache/bazel/_bazel_root/f367074f9120c6f1a67d35844ac058a3/execroot/xla/xla/service/gpu/autotuning/gemm_algorithm_picker.cc:321:15 #6 0x7f06b31bb5b7 in xla::gpu::(anonymous namespace)::GemmAutotuner::TuneGpuBlas(xla::HloInstruction const*, xla::gpu::GemmConfig const&) /root/.cache/bazel/_bazel_root/f367074f9120c6f1a67d35844ac058a3/execroot/xla/xla/service/gpu/autotuning/gemm_algorithm_picker.cc:256:12 #7 0x7f06b31bb5b7 in xla::gpu::(anonymous namespace)::GemmAutotuner::operator()(xla::HloInstruction const*, xla::gpu::AutotuneCacheKey const&) /root/.cache/bazel/_bazel_root/f367074f9120c6f1a67d35844ac058a3/execroot/xla/xla/service/gpu/autotuning/gemm_algorithm_picker.cc:137:18 #8 0x7f06b31b6760 in xla::gpu::(anonymous namespace)::RunOnInstruction(xla::HloInstruction*, xla::gpu::(anonymous namespace)::GemmAutotuner&)::$_0::operator()() const /root/.cache/bazel/_bazel_root/f367074f9120c6f1a67d35844ac058a3/execroot/xla/xla/service/gpu/autotuning/gemm_algorithm_picker.cc:418:3 #9 0x7f06b31b6760 in absl::lts_20230802::StatusOr<xla::AutotuneResult> std::__invoke_impl<absl::lts_20230802::StatusOr<xla::AutotuneResult>, xla::gpu::(anonymous namespace)::RunOnInstruction(xla::HloInstruction*, xla::gpu::(anonymous namespace)::GemmAutotuner&)::$_0&>(std::__invoke_other, xla::gpu::(anonymous namespace)::RunOnInstruction(xla::HloInstruction*, xla::gpu::(anonymous namespace)::GemmAutotuner&)::$_0&) /usr/lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/invoke.h:61:14 #10 0x7f06b31b6760 in std::enable_if<is_invocable_r_v<absl::lts_20230802::StatusOr<xla::AutotuneResult>, xla::gpu::(anonymous namespace)::RunOnInstruction(xla::HloInstruction*, xla::gpu::(anonymous namespace)::GemmAutotuner&)::$_0&>, absl::lts_20230802::StatusOr<xla::AutotuneResult>>::type std::__invoke_r<absl::lts_20230802::StatusOr<xla::AutotuneResult>, xla::gpu::(anonymous namespace)::RunOnInstruction(xla::HloInstruction*, xla::gpu::(anonymous namespace)::GemmAutotuner&)::$_0&>(xla::gpu::(anonymous namespace)::RunOnInstruction(xla::HloInstruction*, xla::gpu::(anonymous namespace)::GemmAutotuner&)::$_0&) /usr/lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/invoke.h:114:9 #11 0x7f06b31b6760 in std::_Function_handler<absl::lts_20230802::StatusOr<xla::AutotuneResult> (), xla::gpu::(anonymous namespace)::RunOnInstruction(xla::HloInstruction*, xla::gpu::(anonymous namespace)::GemmAutotuner&)::$_0>::_M_invoke(std::_Any_data const&) /usr/lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/std_function.h:290:9 #12 0x7f06b308670d in std::function<absl::lts_20230802::StatusOr<xla::AutotuneResult> ()>::operator()() const /usr/lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/std_function.h:590:9 #13 0x7f06b308670d in xla::gpu::AutotunerUtil::Autotune(xla::HloInstruction const*, xla::gpu::AutotuneConfig const&, std::function<absl::lts_20230802::StatusOr<xla::AutotuneResult> ()> const&) /root/.cache/bazel/_bazel_root/f367074f9120c6f1a67d35844ac058a3/execroot/xla/xla/service/gpu/autotuning/autotuner_util.cc:460:3 #14 0x7f06b31b336e in xla::gpu::(anonymous namespace)::RunOnInstruction(xla::HloInstruction*, xla::gpu::(anonymous namespace)::GemmAutotuner&) /root/.cache/bazel/_bazel_root/f367074f9120c6f1a67d35844ac058a3/execroot/xla/xla/service/gpu/autotuning/gemm_algorithm_picker.cc:418:3 #15 0x7f06b31b336e in xla::gpu::(anonymous namespace)::RunOnComputation(xla::HloComputation*, xla::gpu::(anonymous namespace)::GemmAutotuner&, unsigned long*) /root/.cache/bazel/_bazel_root/f367074f9120c6f1a67d35844ac058a3/execroot/xla/xla/service/gpu/autotuning/gemm_algorithm_picker.cc:468:7 #16 0x7f06b31b336e in xla::gpu::GemmAlgorithmPicker::Run(xla::HloModule*, absl::lts_20230802::flat_hash_set<std::basic_string_view<char, std::char_traits<char>>, absl::lts_20230802::container_internal::StringHash, absl::lts_20230802::container_internal::StringEq, std::allocator<std::basic_string_view<char, std::char_traits<char>>>> const&) /root/.cache/bazel/_bazel_root/f367074f9120c6f1a67d35844ac058a3/execroot/xla/xla/service/gpu/autotuning/gemm_algorithm_picker.cc:495:5 #17 0x7f06b30242f3 in xla::HloPassPipeline::RunHelper(xla::HloPassInterface*, xla::HloModule*, absl::lts_20230802::flat_hash_set<std::basic_string_view<char, std::char_traits<char>>, absl::lts_20230802::container_internal::StringHash, absl::lts_20230802::container_internal::StringEq, std::allocator<std::basic_string_view<char, std::char_traits<char>>>> const&) /root/.cache/bazel/_bazel_root/f367074f9120c6f1a67d35844ac058a3/execroot/xla/./xla/hlo/pass/hlo_pass_pipeline.h:150:5 #18 0x7f06b3010bb9 in absl::lts_20230802::StatusOr<bool> xla::HloPassPipeline::RunPassesInternal<xla::HloModule>(xla::HloModule*, xla::DebugOptions const&, absl::lts_20230802::flat_hash_set<std::basic_string_view<char, std::char_traits<char>>, absl::lts_20230802::container_internal::StringHash, absl::lts_20230802::container_internal::StringEq, std::allocator<std::basic_string_view<char, std::char_traits<char>>>> const&) /root/.cache/bazel/_bazel_root/f367074f9120c6f1a67d35844ac058a3/execroot/xla/xla/hlo/pass/hlo_pass_pipeline.cc:198:30 #19 0x7f06b300f786 in xla::HloPassPipeline::Run(xla::HloModule*, absl::lts_20230802::flat_hash_set<std::basic_string_view<char, std::char_traits<char>>, absl::lts_20230802::container_internal::StringHash, absl::lts_20230802::container_internal::StringEq, std::allocator<std::basic_string_view<char, std::char_traits<char>>>> const&) /root/.cache/bazel/_bazel_root/f367074f9120c6f1a67d35844ac058a3/execroot/xla/xla/hlo/pass/hlo_pass_pipeline.cc:338:10 #20 0x5647cfd66945 in xla::HloPassInterface::Run(xla::HloModule*) /root/.cache/bazel/_bazel_root/f367074f9120c6f1a67d35844ac058a3/execroot/xla/./xla/hlo/pass/hlo_pass_interface.h:85:12 #21 0x7f06c2908be0 in xla::gpu::GpuCompiler::OptimizeHloPostLayoutAssignment(xla::HloModule*, stream_executor::StreamExecutor*, xla::Compiler::CompileOptions const&, xla::Compiler::TargetConfig const&, tsl::thread::ThreadPool*) /root/.cache/bazel/_bazel_root/f367074f9120c6f1a67d35844ac058a3/execroot/xla/xla/service/gpu/gpu_compiler.cc:1754:3 #22 0x 10000 7f06c2a000f3 in xla::gpu::AMDGPUCompiler::OptimizeHloPostLayoutAssignment(xla::HloModule*, stream_executor::StreamExecutor*, xla::Compiler::CompileOptions const&, xla::Compiler::TargetConfig const&, tsl::thread::ThreadPool*) /root/.cache/bazel/_bazel_root/f367074f9120c6f1a67d35844ac058a3/execroot/xla/xla/service/gpu/amdgpu_compiler.cc:197:3 #23 0x7f06c28f85e9 in xla::gpu::GpuCompiler::OptimizeHloModule(xla::HloModule*, stream_executor::StreamExecutor*, xla::Compiler::CompileOptions const&, xla::Compiler::TargetConfig const&) /root/.cache/bazel/_bazel_root/f367074f9120c6f1a67d35844ac058a3/execroot/xla/xla/service/gpu/gpu_compiler.cc:1392:3 #24 0x7f06c291250d in xla::gpu::GpuCompiler::RunHloPasses(std::unique_ptr<xla::HloModule, std::default_delete<xla::HloModule>>, stream_executor::StreamExecutor*, xla::Compiler::CompileOptions const&) /root/.cache/bazel/_bazel_root/f367074f9120c6f1a67d35844ac058a3/execroot/xla/xla/service/gpu/gpu_compiler.cc:1824:3 #25 0x5647cfd63784 in xla::Compiler::RunHloPasses(std::unique_ptr<xla::HloModule, std::default_delete<xla::HloModule>>, stream_executor::StreamExecutor*, stream_executor::DeviceMemoryAllocator*) /root/.cache/bazel/_bazel_root/f367074f9120c6f1a67d35844ac058a3/execroot/xla/./xla/service/compiler.h:177:12 #26 0x7f06c339acba in xla::HloTestBase::GetOptimizedModule(std::unique_ptr<xla::HloModule, std::default_delete<xla::HloModule>>) /root/.cache/bazel/_bazel_root/f367074f9120c6f1a67d35844ac058a3/execroot/xla/xla/tests/hlo_test_base.cc:188:32 #27 0x5647cfd89516 in xla::gpu::(anonymous namespace)::GpuCompilerTest_CollectivePermuteDecompositionAndPipelining_Test::TestBody() /root/.cache/bazel/_bazel_root/f367074f9120c6f1a67d35844ac058a3/execroot/xla/xla/service/gpu/gpu_compiler_test.cc:879:3 #28 0x7f06c2c649dd in void testing::internal::HandleSehExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) /root/.cache/bazel/_bazel_root/f367074f9120c6f1a67d35844ac058a3/execroot/xla/external/com_google_googletest/googletest/src/gtest.cc:2664:10 #29 0x7f06c2c649dd in void testing::internal::HandleExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) /root/.cache/bazel/_bazel_root/f367074f9120c6f1a67d35844ac058a3/execroot/xla/external/com_google_googletest/googletest/src/gtest.cc:2700:14 #30 0x7f06c2c64708 in testing::Test::Run() /root/.cache/bazel/_bazel_root/f367074f9120c6f1a67d35844ac058a3/execroot/xla/external/com_google_googletest/googletest/src/gtest.cc:2739:5 #31 0x7f06c2c6771b in testing::TestInfo::Run() /root/.cache/bazel/_bazel_root/f367074f9120c6f1a67d35844ac058a3/execroot/xla/external/com_google_googletest/googletest/src/gtest.cc:2885:11 #32 0x7f06c2c6a5ab in testing::TestSuite::Run() /root/.cache/bazel/_bazel_root/f367074f9120c6f1a67d35844ac058a3/execroot/xla/external/com_google_googletest/googletest/src/gtest.cc:3063:30 #33 0x7f06c2c96eba in testing::internal::UnitTestImpl::RunAllTests() /root/.cache/bazel/_bazel_root/f367074f9120c6f1a67d35844ac058a3/execroot/xla/external/com_google_googletest/googletest/src/gtest.cc:6054:44 #34 0x7f06c2c9579d in bool testing::internal::HandleSehExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool>(testing::internal::UnitTestImpl*, bool (testing::internal::UnitTestImpl::*)(), char const*) /root/.cache/bazel/_bazel_root/f367074f9120c6f1a67d35844ac058a3/execroot/xla/external/com_google_googletest/googletest/src/gtest.cc:2664:10 #35 0x7f06c2c9579d in bool testing::internal::HandleExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool>(testing::internal::UnitTestImpl*, bool (testing::internal::UnitTestImpl::*)(), char const*) /root/.cache/bazel/_bazel_root/f367074f9120c6f1a67d35844ac058a3/execroot/xla/external/com_google_googletest/googletest/src/gtest.cc:2700:14 #36 0x7f06c2c95203 in testing::UnitTest::Run() /root/.cache/bazel/_bazel_root/f367074f9120c6f1a67d35844ac058a3/execroot/xla/external/com_google_googletest/googletest/src/gtest.cc:5594:10 #37 0x7f06c2d679b8 in RUN_ALL_TESTS() /root/.cache/bazel/_bazel_root/f367074f9120c6f1a67d35844ac058a3/execroot/xla/external/com_google_googletest/googletest/include/gtest/gtest.h:2334:73 #38 0x7f06c2d679b8 in main /root/.cache/bazel/_bazel_root/f367074f9120c6f1a67d35844ac058a3/execroot/xla/xla/tests/xla_internal_test_main.cc:65:10 #39 0x7f064c0b3d8f in __libc_start_call_main csu/../sysdeps/nptl/libc_start_call_main.h:58:16 SUMMARY: AddressSanitizer: heap-buffer-overflow /root/.cache/bazel/_bazel_root/f367074f9120c6f1a67d35844ac058a3/execroot/xla/external/com_google_absl/absl/container/internal/raw_hash_set.h:990:36 in absl::lts_20230802::container_internal::CommonFields::capacity() const Shadow bytes around the buggy address: 0x5030001d9500: fd fd fd fa fa fa fd fd fd fa fa fa fd fd fd fa 0x5030001d9580: fa fa fd fd fd fd fa fa fd fd fd fd fa fa fd fd 0x5030001d9600: fd fa fa fa fd fd fd fa fa fa fd fd fd fa fa fa 0x5030001d9680: fd fd fd fd fa fa fd fd fd fa fa fa fd fd fd fa 0x5030001d9700: fa fa fd fd fd fd fa fa fd fd fd fd fa fa fd fd =>0x5030001d9780: fd fa fa fa 00 00 00 fa fa fa 00 00 00 00 fa[fa] 0x5030001d9800: 00 00 00 00 fa fa 00 00 00 00 fa fa fd fd fd fd 0x5030001d9880: fa fa fd fd fd fd fa fa fd fd fd fa fa fa fd fd 0x5030001d9900: fd fd fa fa fd fd fd fd fa fa fd fd fd fd fa fa 0x5030001d9980: fd fd fd fa fa fa fd fd fd fa fa fa fd fd fd fa 0x5030001d9a00: fa fa fd fd fd fa fa fa fd fd fd fd fa fa fd fd Shadow byte legend (one shadow byte represents 8 application bytes): Addressable: 00 Partially addressable: 01 02 03 04 05 06 07 Heap left redzone: fa Freed heap region: fd Stack left redzone: f1 Stack mid redzone: f2 Stack right redzone: f3 Stack after return: f5 Stack use after scope: f8 Global redzone: f9 Global init order: f6 Poisoned by user: f7 Container overflow: fc Array cookie: ac Intra object redzone: bb ASan internal: fe Left alloca redzone: ca Right alloca redzone: cb ==1718600==ABORTING ``` Copybara import of the project: -- 9a75d26eb9aab4226a690658d254a057fc59f22c by alekstheod <atheodor@amd.com>: Fix access memory asan issue in redzone_allocator_kernel_rocm.cu Merging this change closes #24898 FUTURE_COPYBARA_INTEGRATE_REVIEW=openxla/xla#24898 from ROCm:ci_fix_asan_invalid_memory_access_in_redzone_allocator_kernel_rocm_cu 9a75d26eb9aab4226a690658d254a057fc59f22c PiperOrigin-RevId: 745536108
… kernel rocm cu Imported from GitHub PR openxla/xla#24898 Fix issue reported by asan while running the tests on rocm ci: ``` ==1718600==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x5030001d97f8 at pc 0x5647cfdda211 bp 0x7ffc9eb7eac0 sp 0x7ffc9eb7eab8 READ of size 8 at 0x5030001d97f8 thread T0 #0 0x5647cfdda210 in absl::lts_20230802::container_internal::CommonFields::capacity() const /root/.cache/bazel/_bazel_root/f367074f9120c6f1a67d35844ac058a3/execroot/xla/external/com_google_absl/absl/container/internal/raw_hash_set.h:990:36 #1 0x5647cfdda210 in void absl::lts_20230802::container_internal::InitializeSlots<std::allocator<char>, 8ul, 8ul>(absl::lts_20230802::container_internal::CommonFields&, std::allocator<char>) /root/.cache/bazel/_bazel_root/f367074f9120c6f1a67d35844ac058a3/execroot/xla/external/com_google_absl/absl/container/internal/raw_hash_set.h:1403:24 #2 0x7f066c2cfdde in absl::lts_20230802::container_internal::raw_hash_set<absl::lts_20230802::container_internal::NodeHashMapPolicy<std::tuple<stream_executor::StreamExecutor*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>>, void*>, stream_executor::TypedKernel<stream_executor::DeviceMemory<unsigned char>, unsigned char, unsigned long, stream_executor::DeviceMemory<unsigned long>>>, absl::lts_20230802::hash_internal::Hash<std::tuple<stream_executor::StreamExecutor*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>>, void*>>, std::equal_to<std::tuple<stream_executor::StreamExecutor*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>>, void*>>, std::allocator<std::pair<std::tuple<stream_executor::StreamExecutor*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>>, void*> const, stream_executor::TypedKernel<stream_executor::DeviceMemory<unsigned char>, unsigned char, unsigned long, stream_executor::DeviceMemory<unsigned long>>>>>::resize(unsigned long) (/root/.cache/bazel/_bazel_root/f367074f9120c6f1a67d35844ac058a3/execroot/xla/bazel-out/k8-opt/bin/xla/service/gpu/../../../_solib_local/libxla_Sstream_Uexecutor_Sgpu_Slibredzone_Uallocator_Ukernel_Urocm_Urocm.so+0x9dde) (BuildId: 3bd12bfb947fb25a2a780cc09bea1d9c) #3 0x7f066c2cfd97 in absl::lts_20230802::container_internal::raw_hash_set<absl::lts_20230802::container_internal::NodeHashMapPolicy<std::tuple<stream_executor::StreamExecutor*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>>, void*>, stream_executor::TypedKernel<stream_executor::DeviceMemory<unsigned char>, unsigned char, unsigned long, stream_executor::DeviceMemory<unsigned long>>>, absl::lts_20230802::hash_internal::Hash<std::tuple<stream_executor::StreamExecutor*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>>, void*>>, std::equal_to<std::tuple<stream_executor::StreamExecutor*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>>, void*>>, std::allocator<std::pair<std::tuple<stream_executor::StreamExecutor*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>>, void*> const, stream_executor::TypedKernel<stream_executor::DeviceMemory<unsigned char>, unsigned char, unsigned long, stream_executor::DeviceMemory<unsigned long>>>>>::prepare_insert(unsigned long) (/root/.cache/bazel/_bazel_root/f367074f9120c6f1a67d35844ac058a3/execroot/xla/bazel-out/k8-opt/bin/xla/service/gpu/../../../_solib_local/libxla_Sstream_Uexecutor_Sgpu_Slibredzone_Uallocator_Ukernel_Urocm_Urocm.so+0x9d97) (BuildId: 3bd12bfb947fb25a2a780cc09bea1d9c) #4 0x7f066c2cfcca in std::pair<unsigned long, bool> absl::lts_20230802::container_internal::raw_hash_set<absl::lts_20230802::container_internal::NodeHashMapPolicy<std::tuple<stream_executor::StreamExecutor*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>>, void*>, stream_executor::TypedKernel<stream_executor::DeviceMemory<unsigned char>, unsigned char, unsigned long, stream_executor::DeviceMemory<unsigned long>>>, absl::lts_20230802::hash_internal::Hash<std::tuple<stream_executor::StreamExecutor*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>>, void*>>, std::equal_to<std::tuple<stream_executor::StreamExecutor*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>>, void*>>, std::allocator<std::pair<std::tuple<stream_executor::StreamExecutor*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>>, void*> const, stream_executor::TypedKernel<stream_executor::DeviceMemory<unsigned char>, unsigned char, unsigned long, stream_executor::DeviceMemory<unsigned long>>>>>::find_or_prepare_insert<std::tuple<stream_executor::StreamExecutor*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>>, void*>>(std::tuple<stream_executor::StreamExecutor*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>>, void*> const&) (/root/.cache/bazel/_bazel_root/f367074f9120c6f1a67d35844ac058a3/execroot/xla/bazel-out/k8-opt/bin/xla/service/gpu/../../../_solib_local/libxla_Sstream_Uexecutor_Sgpu_Slibredzone_Uallocator_Ukernel_Urocm_Urocm.so+0x9cca) (BuildId: 3bd12bfb947fb25a2a780cc09bea1d9c) #5 0x7f066c2cf9c4 in std::pair<absl::lts_20230802::container_internal::raw_hash_set<absl::lts_20230802::container_internal::NodeHashMapPolicy<std::tuple<stream_executor::StreamExecutor*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>>, void*>, stream_executor::TypedKernel<stream_executor::DeviceMemory<unsigned char>, unsigned char, unsigned long, stream_executor::DeviceMemory<unsigned long>>>, absl::lts_20230802::hash_internal::Hash<std::tuple<stream_executor::StreamExecutor*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>>, void*>>, std::equal_to<std::tuple<stream_executor::StreamExecutor*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>>, void*>>, std::allocator<std::pair<std::tuple<stream_executor::StreamExecutor*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>>, void*> const, stream_executor::TypedKernel<stream_executor::DeviceMemory<unsigned char>, unsigned char, unsigned long, stream_executor::DeviceMemory<unsigned long>>>>>::iterator, bool> absl::lts_20230802::container_internal::raw_hash_set<absl::lts_20230802::container_internal::NodeHashMapPolicy<std::tuple<stream_executor::StreamExecutor*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>>, void*>, stream_executor::TypedKernel<stream_executor::DeviceMemory<unsigned char>, unsigned char, unsigned long, stream_executor::DeviceMemory<unsigned long>>>, absl::lts_20230802::hash_internal::Hash<std::tuple<stream_executor::StreamExecutor*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>>, void*>>, std::equal_to<std::tuple<stream_executor::StreamExecutor*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>>, void*>>, std::allocator<std::pair<std::tuple<stream_executor::StreamExecutor*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>>, void*> const, stream_executor::TypedKernel<stream_executor::DeviceMemory<unsigned char>, unsigned char, unsigned long, stream_executor::DeviceMemory<unsigned long>>>>>::EmplaceDecomposable::operator()<std::tuple<stream_executor::StreamExecutor*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>>, void*>, std::piecewise_construct_t const&, std::tuple<std::tuple<stream_executor::StreamExecutor*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>>, void*>&>, std::tuple<stream_executor::TypedKernel<stream_executor::DeviceMemory<unsigned char>, unsigned char, unsigned long, stream_executor::DeviceMemory<unsigned long>>&&>>(std::tuple<stream_executor::StreamExecutor*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>>, void*> const&, std::piecewise_construct_t const&, std::tuple<std::tuple<stream_executor::StreamExecutor*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>>, void*>&>&&, std::tuple<stream_executor::TypedKernel<stream_executor::DeviceMemory<unsigned char>, unsigned char, unsigned long, stream_executor::DeviceMemory<unsigned long>>&&>&&) const (/root/.cache/bazel/_bazel_root/f367074f9120c6f1a67d35844ac058a3/execroot/xla/bazel-out/k8-opt/bin/xla/service/gpu/../../../_solib_local/libxla_Sstream_Uexecutor_Sgpu_Slibredzone_Uallocator_Ukernel_Urocm_Urocm.so+0x99c4) (BuildId: 3bd12bfb947fb25a2a780cc09bea1d9c) #6 0x7f066c2cf0ad in stream_executor::GetComparisonKernel(stream_executor::StreamExecutor*, stream_executor::GpuAsmOpts) (/root/.cache/bazel/_bazel_root/f367074f9120c6f1a67d35844ac058a3/execroot/xla/bazel-out/k8-opt/bin/xla/service/gpu/../../../_solib_local/libxla_Sstream_Uexecutor_Sgpu_Slibredzone_Uallocator_Ukernel_Urocm_Urocm.so+0x90ad) (BuildId: 3bd12bfb947fb25a2a780cc09bea1d9c) #7 0x7f066c37ba93 in stream_executor::RedzoneAllocator::CheckRedzones() const /root/.cache/bazel/_bazel_root/f367074f9120c6f1a67d35844ac058a3/execroot/xla/xla/stream_executor/gpu/redzone_allocator.cc:272:3 #8 0x7f06b31bb7e9 in absl::lts_20230802::StatusOr<xla::AutotuneResult> xla::gpu::(anonymous namespace)::GemmAutotuner::GetBestAlgorithm<long, xla::gpu::(anonymous namespace)::GemmAutotuner::TuneGpuBlas(xla::HloInstruction const*, xla::gpu::GemmConfig const&)::'lambda'(long const&)&>(xla::HloInstruction const*, absl::lts_20230802::Span<long const>, double, bool, xla::gpu::(anonymous namespace)::GemmAutotuner::TuneGpuBlas(xla::HloInstruction const*, xla::gpu::GemmConfig const&)::'lambda'(long const&)&) /root/.cache/bazel/_bazel_root/f367074f9120c6f1a67d35844ac058a3/execroot/xla/xla/service/gpu/autotuning/gemm_algorithm_picker.cc:328:7 #9 0x7f06b31bb7e9 in xla::gpu::(anonymous namespace)::GemmAutotuner::TuneGpuBlas(xla::HloInstruction const*, xla::gpu::GemmConfig const&) /root/.cache/bazel/_bazel_root/f367074f9120c6f1a67d35844ac058a3/execroot/xla/xla/service/gpu/autotuning/gemm_algorithm_picker.cc:256:12 #10 0x7f06b31bb7e9 in xla::gpu::(anonymous namespace)::GemmAutotuner::operator()(xla::HloInstruction const*, xla::gpu::AutotuneCacheKey const&) /root/.cache/bazel/_bazel_root/f367074f9120c6f1a67d35844ac058a3/execroot/xla/xla/service/gpu/autotuning/gemm_algorithm_picker.cc:137:18 #11 0x7f06b31b6760 in xla::gpu::(anonymous namespace)::RunOnInstruction(xla::HloInstruction*, xla::gpu::(anonymous namespace)::GemmAutotuner&)::$_0::operator()() const /root/.cache/bazel/_bazel_root/f367074f9120c6f1a67d35844ac058a3/execroot/xla/xla/service/gpu/autotuning/gemm_algorithm_picker.cc:418:3 #12 0x7f06b31b6760 in absl::lts_20230802::StatusOr<xla::AutotuneResult> std::__invoke_impl<absl::lts_20230802::StatusOr<xla::AutotuneResult>, xla::gpu::(anonymous namespace)::RunOnInstruction(xla::HloInstruction*, xla::gpu::(anonymous namespace)::GemmAutotuner&)::$_0&>(std::__invoke_other, xla::gpu::(anonymous namespace)::RunOnInstruction(xla::HloInstruction*, xla::gpu::(anonymous namespace)::GemmAutotuner&)::$_0&) /usr/lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/invoke.h:61:14 #13 0x7f06b31b6760 in std::enable_if<is_invocable_r_v<absl::lts_20230802::StatusOr<xla::AutotuneResult>, xla::gpu::(anonymous namespace)::RunOnInstruction(xla::HloInstruction*, xla::gpu::(anonymous namespace)::GemmAutotuner&)::$_0&>, absl::lts_20230802::StatusOr<xla::AutotuneResult>>::type std::__invoke_r<absl::lts_20230802::StatusOr<xla::AutotuneResult>, xla::gpu::(anonymous namespace)::RunOnInstruction(xla::HloInstruction*, xla::gpu::(anonymous namespace)::GemmAutotuner&)::$_0&>(xla::gpu::(anonymous namespace)::RunOnInstruction(xla::HloInstruction*, xla::gpu::(anonymous namespace)::GemmAutotuner&)::$_0&) /usr/lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/invoke.h:114:9 #14 0x7f06b31b6760 in std::_Function_handler<absl::lts_20230802::StatusOr<xla::AutotuneResult> (), xla::gpu::(anonymous namespace)::RunOnInstruction(xla::HloInstruction*, xla::gpu::(anonymous namespace)::GemmAutotuner&)::$_0>::_M_invoke(std::_Any_data const&) /usr/lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/std_function.h:290:9 #15 0x7f06b308670d in std::function<absl::lts_20230802::StatusOr<xla::AutotuneResult> ()>::operator()() const /usr/lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/std_function.h:590:9 #16 0x7f06b308670d in xla::gpu::AutotunerUtil::Autotune(xla::HloInstruction const*, xla::gpu::AutotuneConfig const&, std::function<absl::lts_20230802::StatusOr<xla::AutotuneResult> ()> const&) /root/.cache/bazel/_bazel_root/f367074f9120c6f1a67d35844ac058a3/execroot/xla/xla/service/gpu/autotuning/autotuner_util.cc:460:3 #17 0x7f06b31b336e in xla::gpu::(anonymous namespace)::RunOnInstruction(xla::HloInstruction*, xla::gpu::(anonymous namespace)::GemmAutotuner&) /root/.cache/bazel/_bazel_root/f367074f9120c6f1a67d35844ac058a3/execroot/xla/xla/service/gpu/autotuning/gemm_algorithm_picker.cc:418:3 #18 0x7f06b31b336e in xla::gpu::(anonymous namespace)::RunOnComputation(xla::HloComputation*, xla::gpu::(anonymous namespace)::GemmAutotuner&, unsigned long*) /root/.cache/bazel/_bazel_root/f367074f9120c6f1a67d35844ac058a3/execroot/xla/xla/service/gpu/autotuning/gemm_algorithm_picker.cc:468:7 #19 0x7f06b31b336e in xla::gpu::GemmAlgorithmPicker::Run(xla::HloModule*, absl::lts_20230802::flat_hash_set<std::basic_string_view<char, std::char_traits<char>>, absl::lts_20230802::container_internal::StringHash, absl::lts_20230802::container_internal::StringEq, std::allocator<std::basic_string_view<char, std::char_traits<char>>>> const&) /root/.cache/bazel/_bazel_root/f367074f9120c6f1a67d35844ac058a3/execroot/xla/xla/service/gpu/autotuning/gemm_algorithm_picker.cc:495:5 #20 0x7f06b30242f3 in xla::HloPassPipeline::RunHelper(xla::HloPassInterface*, xla::HloModule*, absl::lts_20230802::flat_hash_set<std::basic_string_view<char, std::char_traits<char>>, absl::lts_20230802::container_internal::StringHash, absl::lts_20230802::container_internal::StringEq, std::allocator<std::basic_string_view<char, std::char_traits<char>>>> const&) /root/.cache/bazel/_bazel_root/f367074f9120c6f1a67d35844ac058a3/execroot/xla/./xla/hlo/pass/hlo_pass_pipeline.h:150:5 #21 0x7f06b3010bb9 in absl::lts_20230802::StatusOr<bool> xla::HloPassPipeline::RunPassesInternal<xla::HloModule>(xla::HloModule*, xla::DebugOptions const&, absl::lts_20230802::flat_hash_set<std::basic_string_view<char, std::char_traits<char>>, absl::lts_20230802::container_internal::StringHash, absl::lts_20230802::container_internal::StringEq, std::allocator<std::basic_string_view<char, std::char_traits<char>>>> const&) /root/.cache/bazel/_bazel_root/f367074f9120c6f1a67d35844ac058a3/execroot/xla/xla/hlo/pass/hlo_pass_pipeline.cc:198:30 #22 0x7f06b300f786 in xla::HloPassPipeline::Run(xla::HloModule*, absl::lts_20230802::flat_hash_set<std::basic_string_view<char, std::char_traits<char>>, absl::lts_20230802::container_internal::StringHash, absl::lts_20230802::container_internal::StringEq, std::allocator<std::basic_string_view<char, std::char_traits<char>>>> const&) /root/.cache/bazel/_bazel_root/f367074f9120c6f1a67d35844ac058a3/execroot/xla/xla/hlo/pass/hlo_pass_pipeline.cc:338:10 #23 0x5647cfd66945 in xla::HloPassInterface::Run(xla::HloModule*) /root/.cache/bazel/_bazel_root/f367074f9120c6f1a67d35844ac058a3/execroot/xla/./xla/hlo/pass/hlo_pass_interface.h:85:12 #24 0x7f06c2908be0 in xla::gpu::GpuCompiler::OptimizeHloPostLayoutAssignment(xla::HloModule*, stream_executor::StreamExecutor*, xla::Compiler::CompileOptions const&, xla::Compiler::TargetConfig const&, tsl::thread::ThreadPool*) /root/.cache/bazel/_bazel_root/f367074f9120c6f1a67d35844ac058a3/execroot/xla/xla/service/gpu/gpu_compiler.cc:1754:3 #25 0x7f06c2a000f3 in xla::gpu::AMDGPUCompiler::OptimizeHloPostLayoutAssignment(xla::HloModule*, stream_executor::StreamExecutor*, xla::Compiler::CompileOptions const&, xla::Compiler::TargetConfig const&, tsl::thread::ThreadPool*) /root/.cache/bazel/_bazel_root/f367074f9120c6f1a67d35844ac058a3/execroot/xla/xla/service/gpu/amdgpu_compiler.cc:197:3 #26 0x7f06c28f85e9 in xla::gpu::GpuCompiler::OptimizeHloModule(xla::HloModule*, stream_executor::StreamExecutor*, xla::Compiler::CompileOptions const&, xla::Compiler::TargetConfig const&) /root/.cache/bazel/_bazel_root/f367074f9120c6f1a67d35844ac058a3/execroot/xla/xla/service/gpu/gpu_compiler.cc:1392:3 #27 0x7f06c291250d in xla::gpu::GpuCompiler::RunHloPasses(std::unique_ptr<xla::HloModule, std::default_delete<xla::HloModule>>, stream_executor::StreamExecutor*, xla::Compiler::CompileOptions const&) /root/.cache/bazel/_bazel_root/f367074f9120c6f1a67d35844ac058a3/execroot/xla/xla/service/gpu/gpu_compiler.cc:1824:3 #28 0x5647cfd63784 in xla::Compiler::RunHloPasses(std::unique_ptr<xla::HloModule, std::default_delete<xla::HloModule>>, stream_executor::StreamExecutor*, stream_executor::DeviceMemoryAllocator*) /root/.cache/bazel/_bazel_root/f367074f9120c6f1a67d35844ac058a3/execroot/xla/./xla/service/compiler.h:177:12 #29 0x7f06c339acba in xla::HloTestBase::GetOptimizedModule(std::unique_ptr<xla::HloModule, std::default_delete<xla::HloModule>>) /root/.cache/bazel/_bazel_root/f367074f9120c6f1a67d35844ac058a3/execroot/xla/xla/tests/hlo_test_base.cc:188:32 #30 0x5647cfd89516 in xla::gpu::(anonymous namespace)::GpuCompilerTest_CollectivePermuteDecompositionAndPipelining_Test::TestBody() /root/.cache/bazel/_bazel_root/f367074f9120c6f1a67d35844ac058a3/execroot/xla/xla/service/gpu/gpu_compiler_test.cc:879:3 #31 0x7f06c2c649dd in void testing::internal::HandleSehExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) /root/.cache/bazel/_bazel_root/f367074f9120c6f1a67d35844ac058a3/execroot/xla/external/com_google_googletest/googletest/src/gtest.cc:2664:10 #32 0x7f06c2c649dd in void testing::internal::HandleExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) /root/.cache/bazel/_bazel_root/f367074f9120c6f1a67d35844ac058a3/execroot/xla/external/com_google_googletest/googletest/src/gtest.cc:2700:14 #33 0x7f06c2c64708 in testing::Test::Run() /root/.cache/bazel/_bazel_root/f367074f9120c6f1a67d35844ac058a3/execroot/xla/external/com_google_googletest/googletest/src/gtest.cc:2739:5 #34 0x7f06c2c6771b in testing::TestInfo::Run() /root/.cache/bazel/_bazel_root/f367074f9120c6f1a67d35844ac058a3/execroot/xla/external/com_google_googletest/googletest/src/gtest.cc:2885:11 #35 0x7f06c2c6a5ab in testing::TestSuite::Run() /root/.cache/bazel/_bazel_root/f367074f9120c6f1a67d35844ac058a3/execroot/xla/external/com_google_googletest/googletest/src/gtest.cc:3063:30 #36 0x7f06c2c96eba in testing::internal::UnitTestImpl::RunAllTests() /root/.cache/bazel/_bazel_root/f367074f9120c6f1a67d35844ac058a3/execroot/xla/external/com_google_googletest/googletest/src/gtest.cc:6054:44 #37 0x7f06c2c9579d in bool testing::internal::HandleSehExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool>(testing::internal::UnitTestImpl*, bool (testing::internal::UnitTestImpl::*)(), char const*) /root/.cache/bazel/_bazel_root/f367074f9120c6f1a67d35844ac058a3/execroot/xla/external/com_google_googletest/googletest/src/gtest.cc:2664:10 #38 0x7f06c2c9579d in bool testing::internal::HandleExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool>(testing::internal::UnitTestImpl*, bool (testing::internal::UnitTestImpl::*)(), char const*) /root/.cache/bazel/_bazel_root/f367074f9120c6f1a67d35844ac058a3/execroot/xla/external/com_google_googletest/googletest/src/gtest.cc:2700:14 #39 0x7f06c2c95203 in testing::UnitTest::Run() /root/.cache/bazel/_bazel_root/f367074f9120c6f1a67d35844ac058a3/execroot/xla/external/com_google_googletest/googletest/src/gtest.cc:5594:10 #40 0x7f06c2d679b8 in RUN_ALL_TESTS() /root/.cache/bazel/_bazel_root/f367074f9120c6f1a67d35844ac058a3/execroot/xla/external/com_google_googletest/googletest/include/gtest/gtest.h:2334:73 #41 0x7f06c2d679b8 in main /root/.cache/bazel/_bazel_root/f367074f9120c6f1a67d35844ac058a3/execroot/xla/xla/tests/xla_internal_test_main.cc:65:10 #42 0x7f064c0b3d8f in __libc_start_call_main csu/../sysdeps/nptl/libc_start_call_main.h:58:16 #43 0x7f064c0b3e3f in __libc_start_main csu/../csu/libc-start.c:392:3 #44 0x5647cfc7b044 in _start (/root/.cache/bazel/_bazel_root/f367074f9120c6f1a67d35844ac058a3/execroot/xla/bazel-out/k8-opt/bin/xla/service/gpu/gpu_compiler_test_gpu_amd_any+0xff044) (BuildId: ef1ac485eb61840d0e2233a2cca69eec) 0x5030001d97f8 is located 8 bytes before 32-byte region [0x5030001d9800,0x5030001d9820) allocated by thread T0 here: #0 0x5647cfd1527f in malloc (/root/.cache/bazel/_bazel_root/f367074f9120c6f1a67d35844ac058a3/execroot/xla/bazel-out/k8-opt/bin/xla/service/gpu/gpu_compiler_test_gpu_amd_any+0x19927f) (BuildId: ef1ac485eb61840d0e2233a2cca69eec) #1 0x7f064c39798b in operator new(unsigned long) (/lib/x86_64-linux-gnu/libstdc++.so.6+0xae98b) (BuildId: e37fe1a879783838de78cbc8c80621fa685d58a2) #2 0x7f06b31bb5b7 in google::protobuf::Duration* google::protobuf::MessageLite::CreateMaybeMessage<google::protobuf::Duration>(google::protobuf::Arena*) /root/.cache/bazel/_bazel_root/f367074f9120c6f1a67d35844ac058a3/execroot/xla/external/com_google_protobuf/src/google/protobuf/message_lite.h:425:12 #3 0x7f06b31bb5b7 in xla::AutotuneResult::_internal_mutable_run_time() /root/.cache/bazel/_bazel_root/f367074f9120c6f1a67d35844ac058a3/execroot/xla/bazel-out/k8-opt/bin/xla/autotuning.pb.h:3079:15 #4 0x7f06b31bb5b7 in xla::AutotuneResult::mutable_run_time() /root/.cache/bazel/_bazel_root/f367074f9120c6f1a67d35844ac058a3/execroot/xla/bazel-out/k8-opt/bin/xla/autotuning.pb.h:3085:45 #5 0x7f06b31bb5b7 in absl::lts_20230802::StatusOr<xla::AutotuneResult> xla::gpu::(anonymous namespace)::GemmAutotuner::GetBestAlgorithm<long, xla::gpu::(anonymous namespace)::GemmAutotuner::TuneGpuBlas(xla::HloInstruction const*, xla::gpu::GemmConfig const&)::'lambda'(long const&)&>(xla::HloInstruction const*, absl::lts_20230802::Span<long const>, double, bool, xla::gpu::(anonymous namespace)::GemmAutotuner::TuneGpuBlas(xla::HloInstruction const*, xla::gpu::GemmConfig const&)::'lambda'(long const&)&) /root/.cache/bazel/_bazel_root/f367074f9120c6f1a67d35844ac058a3/execroot/xla/xla/service/gpu/autotuning/gemm_algorithm_picker.cc:321:15 #6 0x7f06b31bb5b7 in xla::gpu::(anonymous namespace)::GemmAutotuner::TuneGpuBlas(xla::HloInstruction const*, xla::gpu::GemmConfig const&) /root/.cache/bazel/_bazel_root/f367074f9120c6f1a67d35844ac058a3/execroot/xla/xla/service/gpu/autotuning/gemm_algorithm_picker.cc:256:12 #7 0x7f06b31bb5b7 in xla::gpu::(anonymous namespace)::GemmAutotuner::operator()(xla::HloInstruction const*, xla::gpu::AutotuneCacheKey const&) /root/.cache/bazel/_bazel_root/f367074f9120c6f1a67d35844ac058a3/execroot/xla/xla/service/gpu/autotuning/gemm_algorithm_picker.cc:137:18 #8 0x7f06b31b6760 in xla::gpu::(anonymous namespace)::RunOnInstruction(xla::HloInstruction*, xla::gpu::(anonymous namespace)::GemmAutotuner&)::$_0::operator()() const /root/.cache/bazel/_bazel_root/f367074f9120c6f1a67d35844ac058a3/execroot/xla/xla/service/gpu/autotuning/gemm_algorithm_picker.cc:418:3 #9 0x7f06b31b6760 in absl::lts_20230802::StatusOr<xla::AutotuneResult> std::__invoke_impl<absl::lts_20230802::StatusOr<xla::AutotuneResult>, xla::gpu::(anonymous namespace)::RunOnInstruction(xla::HloInstruction*, xla::gpu::(anonymous namespace)::GemmAutotuner&)::$_0&>(std::__invoke_other, xla::gpu::(anonymous namespace)::RunOnInstruction(xla::HloInstruction*, xla::gpu::(anonymous namespace)::GemmAutotuner&)::$_0&) /usr/lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/invoke.h:61:14 #10 0x7f06b31b6760 in std::enable_if<is_invocable_r_v<absl::lts_20230802::StatusOr<xla::AutotuneResult>, xla::gpu::(anonymous namespace)::RunOnInstruction(xla::HloInstruction*, xla::gpu::(anonymous namespace)::GemmAutotuner&)::$_0&>, absl::lts_20230802::StatusOr<xla::AutotuneResult>>::type std::__invoke_r<absl::lts_20230802::StatusOr<xla::AutotuneResult>, xla::gpu::(anonymous namespace)::RunOnInstruction(xla::HloInstruction*, xla::gpu::(anonymous namespace)::GemmAutotuner&)::$_0&>(xla::gpu::(anonymous namespace)::RunOnInstruction(xla::HloInstruction*, xla::gpu::(anonymous namespace)::GemmAutotuner&)::$_0&) /usr/lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/invoke.h:114:9 #11 0x7f06b31b6760 in std::_Function_handler<absl::lts_20230802::StatusOr<xla::AutotuneResult> (), xla::gpu::(anonymous namespace)::RunOnInstruction(xla::HloInstruction*, xla::gpu::(anonymous namespace)::GemmAutotuner&)::$_0>::_M_invoke(std::_Any_data const&) /usr/lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/std_function.h:290:9 #12 0x7f06b308670d in std::function<absl::lts_20230802::StatusOr<xla::AutotuneResult> ()>::operator()() const /usr/lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/std_function.h:590:9 #13 0x7f06b308670d in xla::gpu::AutotunerUtil::Autotune(xla::HloInstruction const*, xla::gpu::AutotuneConfig const&, std::function<absl::lts_20230802::StatusOr<xla::AutotuneResult> ()> const&) /root/.cache/bazel/_bazel_root/f367074f9120c6f1a67d35844ac058a3/execroot/xla/xla/service/gpu/autotuning/autotuner_util.cc:460:3 #14 0x7f06b31b336e in xla::gpu::(anonymous namespace)::RunOnInstruction(xla::HloInstruction*, xla::gpu::(anonymous namespace)::GemmAutotuner&) /root/.cache/bazel/_bazel_root/f367074f9120c6f1a67d35844ac058a3/execroot/xla/xla/service/gpu/autotuning/gemm_algorithm_picker.cc:418:3 #15 0x7f06b31b336e in xla::gpu::(anonymous namespace)::RunOnComputation(xla::HloComputation*, xla::gpu::(anonymous namespace)::GemmAutotuner&, unsigned long*) /root/.cache/bazel/_bazel_root/f367074f9120c6f1a67d35844ac058a3/execroot/xla/xla/service/gpu/autotuning/gemm_algorithm_picker.cc:468:7 #16 0x7f06b31b336e in xla::gpu::GemmAlgorithmPicker::Run(xla::HloModule*, absl::lts_20230802::flat_hash_set<std::basic_string_view<char, std::char_traits<char>>, absl::lts_20230802::container_internal::StringHash, absl::lts_20230802::container_internal::StringEq, std::allocator<std::basic_string_view<char, std::char_traits<char>>>> const&) /root/.cache/bazel/_bazel_root/f367074f9120c6f1a67d35844ac058a3/execroot/xla/xla/service/gpu/autotuning/gemm_algorithm_picker.cc:495:5 #17 0x7f06b30242f3 in xla::HloPassPipeline::RunHelper(xla::HloPassInterface*, xla::HloModule*, absl::lts_20230802::flat_hash_set<std::basic_string_view<char, std::char_traits<char>>, absl::lts_20230802::container_internal::StringHash, absl::lts_20230802::container_internal::StringEq, std::allocator<std::basic_string_view<char, std::char_traits<char>>>> const&) /root/.cache/bazel/_bazel_root/f367074f9120c6f1a67d35844ac058a3/execroot/xla/./xla/hlo/pass/hlo_pass_pipeline.h:150:5 #18 0x7f06b3010bb9 in absl::lts_20230802::StatusOr<bool> xla::HloPassPipeline::RunPassesInternal<xla::HloModule>(xla::HloModule*, xla::DebugOptions const&, absl::lts_20230802::flat_hash_set<std::basic_string_view<char, std::char_traits<char>>, absl::lts_20230802::container_internal::StringHash, absl::lts_20230802::container_internal::StringEq, std::allocator<std::basic_string_view<char, std::char_traits<char>>>> const&) /root/.cache/bazel/_bazel_root/f367074f9120c6f1a67d35844ac058a3/execroot/xla/xla/hlo/pass/hlo_pass_pipeline.cc:198:30 #19 0x7f06b300f786 in xla::HloPassPipeline::Run(xla::HloModule*, absl::lts_20230802::flat_hash_set<std::basic_string_view<char, std::char_traits<char>>, absl::lts_20230802::container_internal::StringHash, absl::lts_20230802::container_internal::StringEq, std::allocator<std::basic_string_view<char, std::char_traits<char>>>> const&) /root/.cache/bazel/_bazel_root/f367074f9120c6f1a67d35844ac058a3/execroot/xla/xla/hlo/pass/hlo_pass_pipeline.cc:338:10 #20 0x5647cfd66945 in xla::HloPassInterface::Run(xla::HloModule*) /root/.cache/bazel/_bazel_root/f367074f9120c6f1a67d35844ac058a3/execroot/xla/./xla/hlo/pass/hlo_pass_interface.h:85:12 #21 0x7f06c2908be0 in xla::gpu::GpuCompiler::OptimizeHloPostLayoutAssignment(xla::HloModule*, stream_executor::StreamExecutor*, xla::Compiler::CompileOptions const&, xla::Compiler::TargetConfig const&, tsl::thread::ThreadPool*) /root/.cache/bazel/_bazel_root/f367074f9120c6f1a67d35844ac058a3/execroot/xla/xla/service/gpu/gpu_compiler.cc:1754:3 #22 0x7f06c2a000f3 in xla::gpu::AMDGPUCompiler::OptimizeHloPostLayoutAssignment(xla::HloModule*, stream_executor::StreamExecutor*, xla::Compiler::CompileOptions const&, xla::Compiler::TargetConfig const&, tsl::thread::ThreadPool*) /root/.cache/bazel/_bazel_root/f367074f9120c6f1a67d35844ac058a3/execroot/xla/xla/service/gpu/amdgpu_compiler.cc:197:3 #23 0x7f06c28f85e9 in xla::gpu::GpuCompiler::OptimizeHloModule(xla::HloModule*, stream_executor::StreamExecutor*, xla::Compiler::CompileOptions const&, xla::Compiler::TargetConfig const&) /root/.cache/bazel/_bazel_root/f367074f9120c6f1a67d35844ac058a3/execroot/xla/xla/service/gpu/gpu_compiler.cc:1392:3 #24 0x7f06c291250d in xla::gpu::GpuCompiler::RunHloPasses(std::unique_ptr<xla::HloModule, std::default_delete<xla::HloModule>>, stream_executor::StreamExecutor*, xla::Compiler::CompileOptions const&) /root/.cache/bazel/_bazel_root/f367074f9120c6f1a67d35844ac058a3/execroot/xla/xla/service/gpu/gpu_compiler.cc:1824:3 #25 0x5647cfd63784 in xla::Compiler::RunHloPasses(std::unique_ptr<xla::HloModule, std::default_delete<xla::HloModule>>, stream_executor::StreamExecutor*, stream_executor::DeviceMemoryAllocator*) /root/.cache/bazel/_bazel_root/f367074f9120c6f1a67d35844ac058a3/execroot/xla/./xla/service/compiler.h:177:12 #26 0x7f06c339acba in xla::HloTestBase::GetOptimizedModule(std::unique_ptr<xla::HloModule, std::default_delete<xla::HloModule>>) /root/.cache/bazel/_bazel_root/f367074f9120c6f1a67d35844ac058a3/execroot/xla/xla/tests/hlo_test_base.cc:188:32 #27 0x5647cfd89516 in xla::gpu::(anonymous namespace)::GpuCompilerTest_CollectivePermuteDecompositionAndPipelining_Test::TestBody() /root/.cache/bazel/_bazel_root/f367074f9120c6f1a67d35844ac058a3/execroot/xla/xla/service/gpu/gpu_compiler_test.cc:879:3 #28 0x7f06c2c649dd in void testing::internal::HandleSehExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) /root/.cache/bazel/_bazel_root/f367074f9120c6f1a67d35844ac058a3/execroot/xla/external/com_google_googletest/googletest/src/gtest.cc:2664:10 #29 0x7f06c2c649dd in void testing::internal::HandleExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) /root/.cache/bazel/_bazel_root/f367074f9120c6f1a67d35844ac058a3/execroot/xla/external/com_google_googletest/googletest/src/gtest.cc:2700:14 #30 0x7f06c2c64708 in testing::Test::Run() /root/.cache/bazel/_bazel_root/f367074f9120c6f1a67d35844ac058a3/execroot/xla/external/com_google_googletest/googletest/src/gtest.cc:2739:5 #31 0x7f06c2c6771b in testing::TestInfo::Run() /root/.cache/bazel/_bazel_root/f367074f9120c6f1a67d35844ac058a3/execroot/xla/external/com_google_googletest/googletest/src/gtest.cc:2885:11 #32 0x7f06c2c6a5ab in testing::TestSuite::Run() /root/.cache/bazel/_bazel_root/f367074f9120c6f1a67d35844ac058a3/execroot/xla/external/com_google_googletest/googletest/src/gtest.cc:3063:30 #33 0x7f06c2c96eba in testing::internal::UnitTestImpl::RunAllTests() /root/.cache/bazel/_bazel_root/f367074f9120c6f1a67d35844ac058a3/execroot/xla/external/com_google_googletest/googletest/src/gtest.cc:6054:44 #34 0x7f06c2c9579d in bool testing::internal::HandleSehExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool>(testing::internal::UnitTestImpl*, bool (testing::internal::UnitTestImpl::*)(), char const*) /root/.cache/bazel/_bazel_root/f367074f9120c6f1a67d35844ac058a3/execroot/xla/external/com_google_googletest/googletest/src/gtest.cc:2664:10 #35 0x7f06c2c9579d in bool testing::internal::HandleExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool>(testing::internal::UnitTestImpl*, bool (testing::internal::UnitTestImpl::*)(), char const*) /root/.cache/bazel/_bazel_root/f367074f9120c6f1a67d35844ac058a3/execroot/xla/external/com_google_googletest/googletest/src/gtest.cc:2700:14 #36 0x7f06c2c95203 in testing::UnitTest::Run() /root/.cache/bazel/_bazel_root/f367074f9120c6f1a67d35844ac058a3/execroot/xla/external/com_google_googletest/googletest/src/gtest.cc:5594:10 #37 0x7f06c2d679b8 in RUN_ALL_TESTS() /root/.cache/bazel/_bazel_root/f367074f9120c6f1a67d35844ac058a3/execroot/xla/external/com_google_googletest/googletest/include/gtest/gtest.h:2334:73 #38 0x7f06c2d679b8 in main /root/.cache/bazel/_bazel_root/f367074f9120c6f1a67d35844ac058a3/execroot/xla/xla/tests/xla_internal_test_main.cc:65:10 #39 0x7f064c0b3d8f in __libc_start_call_main csu/../sysdeps/nptl/libc_start_call_main.h:58:16 SUMMARY: AddressSanitizer: heap-buffer-overflow /root/.cache/bazel/_bazel_root/f367074f9120c6f1a67d35844ac058a3/execroot/xla/external/com_google_absl/absl/container/internal/raw_hash_set.h:990:36 in absl::lts_20230802::container_internal::CommonFields::capacity() const Shadow bytes around the buggy address: 0x5030001d9500: fd fd fd fa fa fa fd fd fd fa fa fa fd fd fd fa 0x5030001d9580: fa fa fd fd fd fd fa fa fd fd fd fd fa fa fd fd 0x5030001d9600: fd fa fa fa fd fd fd fa fa fa fd fd fd fa fa fa 0x5030001d9680: fd fd fd fd fa fa fd fd fd fa fa fa fd fd fd fa 0x5030001d9700: fa fa fd fd fd fd fa fa fd fd fd fd fa fa fd fd =>0x5030001d9780: fd fa fa fa 00 00 00 fa fa fa 00 00 00 00 fa[fa] 0x5030001d9800: 00 00 00 00 fa fa 00 00 00 00 fa fa fd fd fd fd 0x5030001d9880: fa fa fd fd fd fd fa fa fd fd fd fa fa fa fd fd 0x5030001d9900: fd fd fa fa fd fd fd fd fa fa fd fd fd fd fa fa 0x5030001d9980: fd fd fd fa fa fa fd fd fd fa fa fa fd fd fd fa 0x5030001d9a00: fa fa fd fd fd fa fa fa fd fd fd fd fa fa fd fd Shadow byte legend (one shadow byte represents 8 application bytes): Addressable: 00 Partially addressable: 01 02 03 04 05 06 07 Heap left redzone: fa Freed heap region: fd Stack left redzone: f1 Stack mid redzone: f2 Stack right redzone: f3 Stack after return: f5 Stack use after scope: f8 Global redzone: f9 Global init order: f6 Poisoned by user: f7 Container overflow: fc Array cookie: ac Intra object redzone: bb ASan internal: fe Left alloca redzone: ca Right alloca redzone: cb ==1718600==ABORTING ``` Copybara import of the project: -- 9a75d26eb9aab4226a690658d254a057fc59f22c by alekstheod <atheodor@amd.com>: Fix access memory asan issue in redzone_allocator_kernel_rocm.cu Merging this change closes #24898 FUTURE_COPYBARA_INTEGRATE_REVIEW=openxla/xla#24898 from ROCm:ci_fix_asan_invalid_memory_access_in_redzone_allocator_kernel_rocm_cu 9a75d26eb9aab4226a690658d254a057fc59f22c PiperOrigin-RevId: 745536108
… kernel rocm cu Imported from GitHub PR openxla/xla#24898 Fix issue reported by asan while running the tests on rocm ci: ``` ==1718600==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x5030001d97f8 at pc 0x5647cfdda211 bp 0x7ffc9eb7eac0 sp 0x7ffc9eb7eab8 READ of size 8 at 0x5030001d97f8 thread T0 #0 0x5647cfdda210 in absl::lts_20230802::container_internal::CommonFields::capacity() const /root/.cache/bazel/_bazel_root/f367074f9120c6f1a67d35844ac058a3/execroot/xla/external/com_google_absl/absl/container/internal/raw_hash_set.h:990:36 #1 0x5647cfdda210 in void absl::lts_20230802::container_internal::InitializeSlots<std::allocator<char>, 8ul, 8ul>(absl::lts_20230802::container_internal::CommonFields&, std::allocator<char>) /root/.cache/bazel/_bazel_root/f367074f9120c6f1a67d35844ac058a3/execroot/xla/external/com_google_absl/absl/container/internal/raw_hash_set.h:1403:24 #2 0x7f066c2cfdde in absl::lts_20230802::container_internal::raw_hash_set<absl::lts_20230802::container_internal::NodeHashMapPolicy<std::tuple<stream_executor::StreamExecutor*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>>, void*>, stream_executor::TypedKernel<stream_executor::DeviceMemory<unsigned char>, unsigned char, unsigned long, stream_executor::DeviceMemory<unsigned long>>>, absl::lts_20230802::hash_internal::Hash<std::tuple<stream_executor::StreamExecutor*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>>, void*>>, std::equal_to<std::tuple<stream_executor::StreamExecutor*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>>, void*>>, std::allocator<std::pair<std::tuple<stream_executor::StreamExecutor*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>>, void*> const, stream_executor::TypedKernel<stream_executor::DeviceMemory<unsigned char>, unsigned char, unsigned long, stream_executor::DeviceMemory<unsigned long>>>>>::resize(unsigned long) (/root/.cache/bazel/_bazel_root/f367074f9120c6f1a67d35844ac058a3/execroot/xla/bazel-out/k8-opt/bin/xla/service/gpu/../../../_solib_local/libxla_Sstream_Uexecutor_Sgpu_Slibredzone_Uallocator_Ukernel_Urocm_Urocm.so+0x9dde) (BuildId: 3bd12bfb947fb25a2a780cc09bea1d9c) #3 0x7f066c2cfd97 in absl::lts_20230802::container_internal::raw_hash_set<absl::lts_20230802::container_internal::NodeHashMapPolicy<std::tuple<stream_executor::StreamExecutor*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>>, void*>, stream_executor::TypedKernel<stream_executor::DeviceMemory<unsigned char>, unsigned char, unsigned long, stream_executor::DeviceMemory<unsigned long>>>, absl::lts_20230802::hash_internal::Hash<std::tuple<stream_executor::StreamExecutor*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>>, void*>>, std::equal_to<std::tuple<stream_executor::StreamExecutor*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>>, void*>>, std::allocator<std::pair<std::tuple<stream_executor::StreamExecutor*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>>, void*> const, stream_executor::TypedKernel<stream_executor::DeviceMemory<unsigned char>, unsigned char, unsigned long, stream_executor::DeviceMemory<unsigned long>>>>>::prepare_insert(unsigned long) (/root/.cache/bazel/_bazel_root/f367074f9120c6f1a67d35844ac058a3/execroot/xla/bazel-out/k8-opt/bin/xla/service/gpu/../../../_solib_local/libxla_Sstream_Uexecutor_Sgpu_Slibredzone_Uallocator_Ukernel_Urocm_Urocm.so+0x9d97) (BuildId: 3bd12bfb947fb25a2a780cc09bea1d9c) #4 0x7f066c2cfcca in std::pair<unsigned long, bool> absl::lts_20230802::container_internal::raw_hash_set<absl::lts_20230802::container_internal::NodeHashMapPolicy<std::tuple<stream_executor::StreamExecutor*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>>, void*>, stream_executor::TypedKernel<stream_executor::DeviceMemory<unsigned char>, unsigned char, unsigned long, stream_executor::DeviceMemory<unsigned long>>>, absl::lts_20230802::hash_internal::Hash<std::tuple<stream_executor::StreamExecutor*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>>, void*>>, std::equal_to<std::tuple<stream_executor::StreamExecutor*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>>, void*>>, std::allocator<std::pair<std::tuple<stream_executor::StreamExecutor*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>>, void*> const, stream_executor::TypedKernel<stream_executor::DeviceMemory<unsigned char>, unsigned char, unsigned long, stream_executor::DeviceMemory<unsigned long>>>>>::find_or_prepare_insert<std::tuple<stream_executor::StreamExecutor*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>>, void*>>(std::tuple<stream_executor::StreamExecutor*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>>, void*> const&) (/root/.cache/bazel/_bazel_root/f367074f9120c6f1a67d35844ac058a3/execroot/xla/bazel-out/k8-opt/bin/xla/service/gpu/../../../_solib_local/libxla_Sstream_Uexecutor_Sgpu_Slibredzone_Uallocator_Ukernel_Urocm_Urocm.so+0x9cca) (BuildId: 3bd12bfb947fb25a2a780cc09bea1d9c) #5 0x7f066c2cf9c4 in std::pair<absl::lts_20230802::container_internal::raw_hash_set<absl::lts_20230802::container_internal::NodeHashMapPolicy<std::tuple<stream_executor::StreamExecutor*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>>, void*>, stream_executor::TypedKernel<stream_executor::DeviceMemory<unsigned char>, unsigned char, unsigned long, stream_executor::DeviceMemory<unsigned long>>>, absl::lts_20230802::hash_internal::Hash<std::tuple<stream_executor::StreamExecutor*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>>, void*>>, std::equal_to<std::tuple<stream_executor::StreamExecutor*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>>, void*>>, std::allocator<std::pair<std::tuple<stream_executor::StreamExecutor*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>>, void*> const, stream_executor::TypedKernel<stream_executor::DeviceMemory<unsigned char>, unsigned char, unsigned long, stream_executor::DeviceMemory<unsigned long>>>>>::iterator, bool> absl::lts_20230802::container_internal::raw_hash_set<absl::lts_20230802::container_internal::NodeHashMapPolicy<std::tuple<stream_executor::StreamExecutor*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>>, void*>, stream_executor::TypedKernel<stream_executor::DeviceMemory<unsigned char>, unsigned char, unsigned long, stream_executor::DeviceMemory<unsigned long>>>, absl::lts_20230802::hash_internal::Hash<std::tuple<stream_executor::StreamExecutor*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>>, void*>>, std::equal_to<std::tuple<stream_executor::StreamExecutor*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>>, void*>>, std::allocator<std::pair<std::tuple<stream_executor::StreamExecutor*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>>, void*> const, stream_executor::TypedKernel<stream_executor::DeviceMemory<unsigned char>, unsigned char, unsigned long, stream_executor::DeviceMemory<unsigned long>>>>>::EmplaceDecomposable::operator()<std::tuple<stream_executor::StreamExecutor*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>>, void*>, std::piecewise_construct_t const&, std::tuple<std::tuple<stream_executor::StreamExecutor*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>>, void*>&>, std::tuple<stream_executor::TypedKernel<stream_executor::DeviceMemory<unsigned char>, unsigned char, unsigned long, stream_executor::DeviceMemory<unsigned long>>&&>>(std::tuple<stream_executor::StreamExecutor*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>>, void*> const&, std::piecewise_construct_t const&, std::tuple<std::tuple<stream_executor::StreamExecutor*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>>, void*>&>&&, std::tuple<stream_executor::TypedKernel<stream_executor::DeviceMemory<unsigned char>, unsigned char, unsigned long, stream_executor::DeviceMemory<unsigned long>>&&>&&) const (/root/.cache/bazel/_bazel_root/f367074f9120c6f1a67d35844ac058a3/execroot/xla/bazel-out/k8-opt/bin/xla/service/gpu/../../../_solib_local/libxla_Sstream_Uexecutor_Sgpu_Slibredzone_Uallocator_Ukernel_Urocm_Urocm.so+0x99c4) (BuildId: 3bd12bfb947fb25a2a780cc09bea1d9c) #6 0x7f066c2cf0ad in stream_executor::GetComparisonKernel(stream_executor::StreamExecutor*, stream_executor::GpuAsmOpts) (/root/.cache/bazel/_bazel_root/f367074f9120c6f1a67d35844ac058a3/execroot/xla/bazel-out/k8-opt/bin/xla/service/gpu/../../../_solib_local/libxla_Sstream_Uexecutor_Sgpu_Slibredzone_Uallocator_Ukernel_Urocm_Urocm.so+0x90ad) (BuildId: 3bd12bfb947fb25a2a780cc09bea1d9c) #7 0x7f066c37ba93 in stream_executor::RedzoneAllocator::CheckRedzones() const /root/.cache/bazel/_bazel_root/f367074f9120c6f1a67d35844ac058a3/execroot/xla/xla/stream_executor/gpu/redzone_allocator.cc:272:3 #8 0x7f06b31bb7e9 in absl::lts_20230802::StatusOr<xla::AutotuneResult> xla::gpu::(anonymous namespace)::GemmAutotuner::GetBestAlgorithm<long, xla::gpu::(anonymous namespace)::GemmAutotuner::TuneGpuBlas(xla::HloInstruction const*, xla::gpu::GemmConfig const&)::'lambda'(long const&)&>(xla::HloInstruction const*, absl::lts_20230802::Span<long const>, double, bool, xla::gpu::(anonymous namespace)::GemmAutotuner::TuneGpuBlas(xla::HloInstruction const*, xla::gpu::GemmConfig const&)::'lambda'(long const&)&) /root/.cache/bazel/_bazel_root/f367074f9120c6f1a67d35844ac058a3/execroot/xla/xla/service/gpu/autotuning/gemm_algorithm_picker.cc:328:7 #9 0x7f06b31bb7e9 in xla::gpu::(anonymous namespace)::GemmAutotuner::TuneGpuBlas(xla::HloInstruction const*, xla::gpu::GemmConfig const&) /root/.cache/bazel/_bazel_root/f367074f9120c6f1a67d35844ac058a3/execroot/xla/xla/service/gpu/autotuning/gemm_algorithm_picker.cc:256:12 #10 0x7f06b31bb7e9 in xla::gpu::(anonymous namespace)::GemmAutotuner::operator()(xla::HloInstruction const*, xla::gpu::AutotuneCacheKey const&) /root/.cache/bazel/_bazel_root/f367074f9120c6f1a67d35844ac058a3/execroot/xla/xla/service/gpu/autotuning/gemm_algorithm_picker.cc:137:18 #11 0x7f06b31b6760 in xla::gpu::(anonymous namespace)::RunOnInstruction(xla::HloInstruction*, xla::gpu::(anonymous namespace)::GemmAutotuner&)::$_0::operator()() const /root/.cache/bazel/_bazel_root/f367074f9120c6f1a67d35844ac058a3/execroot/xla/xla/service/gpu/autotuning/gemm_algorithm_picker.cc:418:3 #12 0x7f06b31b6760 in absl::lts_20230802::StatusOr<xla::AutotuneResult> std::__invoke_impl<absl::lts_20230802::StatusOr<xla::AutotuneResult>, xla::gpu::(anonymous namespace)::RunOnInstruction(xla::HloInstruction*, xla::gpu::(anonymous namespace)::GemmAutotuner&)::$_0&>(std::__invoke_other, xla::gpu::(anonymous namespace)::RunOnInstruction(xla::HloInstruction*, xla::gpu::(anonymous namespace)::GemmAutotuner&)::$_0&) /usr/lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/invoke.h:61:14 #13 0x7f06b31b6760 in std::enable_if<is_invocable_r_v<absl::lts_20230802::StatusOr<xla::AutotuneResult>, xla::gpu::(anonymous namespace)::RunOnInstruction(xla::HloInstruction*, xla::gpu::(anonymous namespace)::GemmAutotuner&)::$_0&>, absl::lts_20230802::StatusOr<xla::AutotuneResult>>::type std::__invoke_r<absl::lts_20230802::StatusOr<xla::AutotuneResult>, xla::gpu::(anonymous namespace)::RunOnInstruction(xla::HloInstruction*, xla::gpu::(anonymous namespace)::GemmAutotuner&)::$_0&>(xla::gpu::(anonymous namespace)::RunOnInstruction(xla::HloInstruction*, xla::gpu::(anonymous namespace)::GemmAutotuner&)::$_0&) /usr/lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/invoke.h:114:9 #14 0x7f06b31b6760 in std::_Function_handler<absl::lts_20230802::StatusOr<xla::AutotuneResult> (), xla::gpu::(anonymous namespace)::RunOnInstruction(xla::HloInstruction*, xla::gpu::(anonymous namespace)::GemmAutotuner&)::$_0>::_M_invoke(std::_Any_data const&) /usr/lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/std_function.h:290:9 #15 0x7f06b308670d in std::function<absl::lts_20230802::StatusOr<xla::AutotuneResult> ()>::operator()() const /usr/lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/std_function.h:590:9 #16 0x7f06b308670d in xla::gpu::AutotunerUtil::Autotune(xla::HloInstruction const*, xla::gpu::AutotuneConfig const&, std::function<absl::lts_20230802::StatusOr<xla::AutotuneResult> ()> const&) /root/.cache/bazel/_bazel_root/f367074f9120c6f1a67d35844ac058a3/execroot/xla/xla/service/gpu/autotuning/autotuner_util.cc:460:3 #17 0x7f06b31b336e in xla::gpu::(anonymous namespace)::RunOnInstruction(xla::HloInstruction*, xla::gpu::(anonymous namespace)::GemmAutotuner&) /root/.cache/bazel/_bazel_root/f367074f9120c6f1a67d35844ac058a3/execroot/xla/xla/service/gpu/autotuning/gemm_algorithm_picker.cc:418:3 #18 0x7f06b31b336e in xla::gpu::(anonymous namespace)::RunOnComputation(xla::HloComputation*, xla::gpu::(anonymous namespace)::GemmAutotuner&, unsigned long*) /root/.cache/bazel/_bazel_root/f367074f9120c6f1a67d35844ac058a3/execroot/xla/xla/service/gpu/autotuning/gemm_algorithm_picker.cc:468:7 #19 0x7f06b31b336e in xla::gpu::GemmAlgorithmPicker::Run(xla::HloModule*, absl::lts_20230802::flat_hash_set<std::basic_string_view<char, std::char_traits<char>>, absl::lts_20230802::container_internal::StringHash, absl::lts_20230802::container_internal::StringEq, std::allocator<std::basic_string_view<char, std::char_traits<char>>>> const&) /root/.cache/bazel/_bazel_root/f367074f9120c6f1a67d35844ac058a3/execroot/xla/xla/service/gpu/autotuning/gemm_algorithm_picker.cc:495:5 #20 0x7f06b30242f3 in xla::HloPassPipeline::RunHelper(xla::HloPassInterface*, xla::HloModule*, absl::lts_20230802::flat_hash_set<std::basic_string_view<char, std::char_traits<char>>, absl::lts_20230802::container_internal::StringHash, absl::lts_20230802::container_internal::StringEq, std::allocator<std::basic_string_view<char, std::char_traits< F438 char>>>> const&) /root/.cache/bazel/_bazel_root/f367074f9120c6f1a67d35844ac058a3/execroot/xla/./xla/hlo/pass/hlo_pass_pipeline.h:150:5 #21 0x7f06b3010bb9 in absl::lts_20230802::StatusOr<bool> xla::HloPassPipeline::RunPassesInternal<xla::HloModule>(xla::HloModule*, xla::DebugOptions const&, absl::lts_20230802::flat_hash_set<std::basic_string_view<char, std::char_traits<char>>, absl::lts_20230802::container_internal::StringHash, absl::lts_20230802::container_internal::StringEq, std::allocator<std::basic_string_view<char, std::char_traits<char>>>> const&) /root/.cache/bazel/_bazel_root/f367074f9120c6f1a67d35844ac058a3/execroot/xla/xla/hlo/pass/hlo_pass_pipeline.cc:198:30 #22 0x7f06b300f786 in xla::HloPassPipeline::Run(xla::HloModule*, absl::lts_20230802::flat_hash_set<std::basic_string_view<char, std::char_traits<char>>, absl::lts_20230802::container_internal::StringHash, absl::lts_20230802::container_internal::StringEq, std::allocator<std::basic_string_view<char, std::char_traits<char>>>> const&) /root/.cache/bazel/_bazel_root/f367074f9120c6f1a67d35844ac058a3/execroot/xla/xla/hlo/pass/hlo_pass_pipeline.cc:338:10 #23 0x5647cfd66945 in xla::HloPassInterface::Run(xla::HloModule*) /root/.cache/bazel/_bazel_root/f367074f9120c6f1a67d35844ac058a3/execroot/xla/./xla/hlo/pass/hlo_pass_interface.h:85:12 #24 0x7f06c2908be0 in xla::gpu::GpuCompiler::OptimizeHloPostLayoutAssignment(xla::HloModule*, stream_executor::StreamExecutor*, xla::Compiler::CompileOptions const&, xla::Compiler::TargetConfig const&, tsl::thread::ThreadPool*) /root/.cache/bazel/_bazel_root/f367074f9120c6f1a67d35844ac058a3/execroot/xla/xla/service/gpu/gpu_compiler.cc:1754:3 #25 0x7f06c2a000f3 in xla::gpu::AMDGPUCompiler::OptimizeHloPostLayoutAssignment(xla::HloModule*, stream_executor::StreamExecutor*, xla::Compiler::CompileOptions const&, xla::Compiler::TargetConfig const&, tsl::thread::ThreadPool*) /root/.cache/bazel/_bazel_root/f367074f9120c6f1a67d35844ac058a3/execroot/xla/xla/service/gpu/amdgpu_compiler.cc:197:3 #26 0x7f06c28f85e9 in xla::gpu::GpuCompiler::OptimizeHloModule(xla::HloModule*, stream_executor::StreamExecutor*, xla::Compiler::CompileOptions const&, xla::Compiler::TargetConfig const&) /root/.cache/bazel/_bazel_root/f367074f9120c6f1a67d35844ac058a3/execroot/xla/xla/service/gpu/gpu_compiler.cc:1392:3 #27 0x7f06c291250d in xla::gpu::GpuCompiler::RunHloPasses(std::unique_ptr<xla::HloModule, std::default_delete<xla::HloModule>>, stream_executor::StreamExecutor*, xla::Compiler::CompileOptions const&) /root/.cache/bazel/_bazel_root/f367074f9120c6f1a67d35844ac058a3/execroot/xla/xla/service/gpu/gpu_compiler.cc:1824:3 #28 0x5647cfd63784 in xla::Compiler::RunHloPasses(std::unique_ptr<xla::HloModule, std::default_delete<xla::HloModule>>, stream_executor::StreamExecutor*, stream_executor::DeviceMemoryAllocator*) /root/.cache/bazel/_bazel_root/f367074f9120c6f1a67d35844ac058a3/execroot/xla/./xla/service/compiler.h:177:12 #29 0x7f06c339acba in xla::HloTestBase::GetOptimizedModule(std::unique_ptr<xla::HloModule, std::default_delete<xla::HloModule>>) /root/.cache/bazel/_bazel_root/f367074f9120c6f1a67d35844ac058a3/execroot/xla/xla/tests/hlo_test_base.cc:188:32 #30 0x5647cfd89516 in xla::gpu::(anonymous namespace)::GpuCompilerTest_CollectivePermuteDecompositionAndPipelining_Test::TestBody() /root/.cache/bazel/_bazel_root/f367074f9120c6f1a67d35844ac058a3/execroot/xla/xla/service/gpu/gpu_compiler_test.cc:879:3 #31 0x7f06c2c649dd in void testing::internal::HandleSehExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) /root/.cache/bazel/_bazel_root/f367074f9120c6f1a67d35844ac058a3/execroot/xla/external/com_google_googletest/googletest/src/gtest.cc:2664:10 #32 0x7f06c2c649dd in void testing::internal::HandleExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) /root/.cache/bazel/_bazel_root/f367074f9120c6f1a67d35844ac058a3/execroot/xla/external/com_google_googletest/googletest/src/gtest.cc:2700:14 #33 0x7f06c2c64708 in testing::Test::Run() /root/.cache/bazel/_bazel_root/f367074f9120c6f1a67d35844ac058a3/execroot/xla/external/com_google_googletest/googletest/src/gtest.cc:2739:5 #34 0x7f06c2c6771b in testing::TestInfo::Run() /root/.cache/bazel/_bazel_root/f367074f9120c6f1a67d35844ac058a3/execroot/xla/external/com_google_googletest/googletest/src/gtest.cc:2885:11 #35 0x7f06c2c6a5ab in testing::TestSuite::Run() /root/.cache/bazel/_bazel_root/f367074f9120c6f1a67d35844ac058a3/execroot/xla/external/com_google_googletest/googletest/src/gtest.cc:3063:30 #36 0x7f06c2c96eba in testing::internal::UnitTestImpl::RunAllTests() /root/.cache/bazel/_bazel_root/f367074f9120c6f1a67d35844ac058a3/execroot/xla/external/com_google_googletest/googletest/src/gtest.cc:6054:44 #37 0x7f06c2c9579d in bool testing::internal::HandleSehExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool>(testing::internal::UnitTestImpl*, bool (testing::internal::UnitTestImpl::*)(), char const*) /root/.cache/bazel/_bazel_root/f367074f9120c6f1a67d35844ac058a3/execroot/xla/external/com_google_googletest/googletest/src/gtest.cc:2664:10 #38 0x7f06c2c9579d in bool testing::internal::HandleExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool>(testing::internal::UnitTestImpl*, bool (testing::internal::UnitTestImpl::*)(), char const*) /root/.cache/bazel/_bazel_root/f367074f9120c6f1a67d35844ac058a3/execroot/xla/external/com_google_googletest/googletest/src/gtest.cc:2700:14 #39 0x7f06c2c95203 in testing::UnitTest::Run() /root/.cache/bazel/_bazel_root/f367074f9120c6f1a67d35844ac058a3/execroot/xla/external/com_google_googletest/googletest/src/gtest.cc:5594:10 #40 0x7f06c2d679b8 in RUN_ALL_TESTS() /root/.cache/bazel/_bazel_root/f367074f9120c6f1a67d35844ac058a3/execroot/xla/external/com_google_googletest/googletest/include/gtest/gtest.h:2334:73 #41 0x7f06c2d679b8 in main /root/.cache/bazel/_bazel_root/f367074f9120c6f1a67d35844ac058a3/execroot/xla/xla/tests/xla_internal_test_main.cc:65:10 #42 0x7f064c0b3d8f in __libc_start_call_main csu/../sysdeps/nptl/libc_start_call_main.h:58:16 #43 0x7f064c0b3e3f in __libc_start_main csu/../csu/libc-start.c:392:3 #44 0x5647cfc7b044 in _start (/root/.cache/bazel/_bazel_root/f367074f9120c6f1a67d35844ac058a3/execroot/xla/bazel-out/k8-opt/bin/xla/service/gpu/gpu_compiler_test_gpu_amd_any+0xff044) (BuildId: ef1ac485eb61840d0e2233a2cca69eec) 0x5030001d97f8 is located 8 bytes before 32-byte region [0x5030001d9800,0x5030001d9820) allocated by thread T0 here: #0 0x5647cfd1527f in malloc (/root/.cache/bazel/_bazel_root/f367074f9120c6f1a67d35844ac058a3/execroot/xla/bazel-out/k8-opt/bin/xla/service/gpu/gpu_compiler_test_gpu_amd_any+0x19927f) (BuildId: ef1ac485eb61840d0e2233a2cca69eec) #1 0x7f064c39798b in operator new(unsigned long) (/lib/x86_64-linux-gnu/libstdc++.so.6+0xae98b) (BuildId: e37fe1a879783838de78cbc8c80621fa685d58a2) #2 0x7f06b31bb5b7 in google::protobuf::Duration* google::protobuf::MessageLite::CreateMaybeMessage<google::protobuf::Duration>(google::protobuf::Arena*) /root/.cache/bazel/_bazel_root/f367074f9120c6f1a67d35844ac058a3/execroot/xla/external/com_google_protobuf/src/google/protobuf/message_lite.h:425:12 #3 0x7f06b31bb5b7 in xla::AutotuneResult::_internal_mutable_run_time() /root/.cache/bazel/_bazel_root/f367074f9120c6f1a67d35844ac058a3/execroot/xla/bazel-out/k8-opt/bin/xla/autotuning.pb.h:3079:15 #4 0x7f06b31bb5b7 in xla::AutotuneResult::mutable_run_time() /root/.cache/bazel/_bazel_root/f367074f9120c6f1a67d35844ac058a3/execroot/xla/bazel-out/k8-opt/bin/xla/autotuning.pb.h:3085:45 #5 0x7f06b31bb5b7 in absl::lts_20230802::StatusOr<xla::AutotuneResult> xla::gpu::(anonymous namespace)::GemmAutotuner::GetBestAlgorithm<long, xla::gpu::(anonymous namespace)::GemmAutotuner::TuneGpuBlas(xla::HloInstruction const*, xla::gpu::GemmConfig const&)::'lambda'(long const&)&>(xla::HloInstruction const*, absl::lts_20230802::Span<long const>, double, bool, xla::gpu::(anonymous namespace)::GemmAutotuner::TuneGpuBlas(xla::HloInstruction const*, xla::gpu::GemmConfig const&)::'lambda'(long const&)&) /root/.cache/bazel/_bazel_root/f367074f9120c6f1a67d35844ac058a3/execroot/xla/xla/service/gpu/autotuning/gemm_algorithm_picker.cc:321:15 #6 0x7f06b31bb5b7 in xla::gpu::(anonymous namespace)::GemmAutotuner::TuneGpuBlas(xla::HloInstruction const*, xla::gpu::GemmConfig const&) /root/.cache/bazel/_bazel_root/f367074f9120c6f1a67d35844ac058a3/execroot/xla/xla/service/gpu/autotuning/gemm_algorithm_picker.cc:256:12 #7 0x7f06b31bb5b7 in xla::gpu::(anonymous namespace)::GemmAutotuner::operator()(xla::HloInstruction const*, xla::gpu::AutotuneCacheKey const&) /root/.cache/bazel/_bazel_root/f367074f9120c6f1a67d35844ac058a3/execroot/xla/xla/service/gpu/autotuning/gemm_algorithm_picker.cc:137:18 #8 0x7f06b31b6760 in xla::gpu::(anonymous namespace)::RunOnInstruction(xla::HloInstruction*, xla::gpu::(anonymous namespace)::GemmAutotuner&)::$_0::operator()() const /root/.cache/bazel/_bazel_root/f367074f9120c6f1a67d35844ac058a3/execroot/xla/xla/service/gpu/autotuning/gemm_algorithm_picker.cc:418:3 #9 0x7f06b31b6760 in absl::lts_20230802::StatusOr<xla::AutotuneResult> std::__invoke_impl<absl::lts_20230802::StatusOr<xla::AutotuneResult>, xla::gpu::(anonymous namespace)::RunOnInstruction(xla::HloInstruction*, xla::gpu::(anonymous namespace)::GemmAutotuner&)::$_0&>(std::__invoke_other, xla::gpu::(anonymous namespace)::RunOnInstruction(xla::HloInstruction*, xla::gpu::(anonymous namespace)::GemmAutotuner&)::$_0&) /usr/lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/invoke.h:61:14 #10 0x7f06b31b6760 in std::enable_if<is_invocable_r_v<absl::lts_20230802::StatusOr<xla::AutotuneResult>, xla::gpu::(anonymous namespace)::RunOnInstruction(xla::HloInstruction*, xla::gpu::(anonymous namespace)::GemmAutotuner&)::$_0&>, absl::lts_20230802::StatusOr<xla::AutotuneResult>>::type std::__invoke_r<absl::lts_20230802::StatusOr<xla::AutotuneResult>, xla::gpu::(anonymous namespace)::RunOnInstruction(xla::HloInstruction*, xla::gpu::(anonymous namespace)::GemmAutotuner&)::$_0&>(xla::gpu::(anonymous namespace)::RunOnInstruction(xla::HloInstruction*, xla::gpu::(anonymous namespace)::GemmAutotuner&)::$_0&) /usr/lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/invoke.h:114:9 #11 0x7f06b31b6760 in std::_Function_handler<absl::lts_20230802::StatusOr<xla::AutotuneResult> (), xla::gpu::(anonymous namespace)::RunOnInstruction(xla::HloInstruction*, xla::gpu::(anonymous namespace)::GemmAutotuner&)::$_0>::_M_invoke(std::_Any_data const&) /usr/lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/std_function.h:290:9 #12 0x7f06b308670d in std::function<absl::lts_20230802::StatusOr<xla::AutotuneResult> ()>::operator()() const /usr/lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/std_function.h:590:9 #13 0x7f06b308670d in xla::gpu::AutotunerUtil::Autotune(xla::HloInstruction const*, xla::gpu::AutotuneConfig const&, std::function<absl::lts_20230802::StatusOr<xla::AutotuneResult> ()> const&) /root/.cache/bazel/_bazel_root/f367074f9120c6f1a67d35844ac058a3/execroot/xla/xla/service/gpu/autotuning/autotuner_util.cc:460:3 #14 0x7f06b31b336e in xla::gpu::(anonymous namespace)::RunOnInstruction(xla::HloInstruction*, xla::gpu::(anonymous namespace)::GemmAutotuner&) /root/.cache/bazel/_bazel_root/f367074f9120c6f1a67d35844ac058a3/execroot/xla/xla/service/gpu/autotuning/gemm_algorithm_picker.cc:418:3 #15 0x7f06b31b336e in xla::gpu::(anonymous namespace)::RunOnComputation(xla::HloComputation*, xla::gpu::(anonymous namespace)::GemmAutotuner&, unsigned long*) /root/.cache/bazel/_bazel_root/f367074f9120c6f1a67d35844ac058a3/execroot/xla/xla/service/gpu/autotuning/gemm_algorithm_picker.cc:468:7 #16 0x7f06b31b336e in xla::gpu::GemmAlgorithmPicker::Run(xla::HloModule*, absl::lts_20230802::flat_hash_set<std::basic_string_view<char, std::char_traits<char>>, absl::lts_20230802::container_internal::StringHash, absl::lts_20230802::container_internal::StringEq, std::allocator<std::basic_string_view<char, std::char_traits<char>>>> const&) /root/.cache/bazel/_bazel_root/f367074f9120c6f1a67d35844ac058a3/execroot/xla/xla/service/gpu/autotuning/gemm_algorithm_picker.cc:495:5 #17 0x7f06b30242f3 in xla::HloPassPipeline::RunHelper(xla::HloPassInterface*, xla::HloModule*, absl::lts_20230802::flat_hash_set<std::basic_string_view<char, std::char_traits<char>>, absl::lts_20230802::container_internal::StringHash, absl::lts_20230802::container_internal::StringEq, std::allocator<std::basic_string_view<char, std::char_traits<char>>>> const&) /root/.cache/bazel/_bazel_root/f367074f9120c6f1a67d35844ac058a3/execroot/xla/./xla/hlo/pass/hlo_pass_pipeline.h:150:5 #18 0x7f06b3010bb9 in absl::lts_20230802::StatusOr<bool> xla::HloPassPipeline::RunPassesInternal<xla::HloModule>(xla::HloModule*, xla::DebugOptions const&, absl::lts_20230802::flat_hash_set<std::basic_string_view<char, std::char_traits<char>>, absl::lts_20230802::container_internal::StringHash, absl::lts_20230802::container_internal::StringEq, std::allocator<std::basic_string_view<char, std::char_traits<char>>>> const&) /root/.cache/bazel/_bazel_root/f367074f9120c6f1a67d35844ac058a3/execroot/xla/xla/hlo/pass/hlo_pass_pipeline.cc:198:30 #19 0x7f06b300f786 in xla::HloPassPipeline::Run(xla::HloModule*, absl::lts_20230802::flat_hash_set<std::basic_string_view<char, std::char_traits<char>>, absl::lts_20230802::container_internal::StringHash, absl::lts_20230802::container_internal::StringEq, std::allocator<std::basic_string_view<char, std::char_traits<char>>>> const&) /root/.cache/bazel/_bazel_root/f367074f9120c6f1a67d35844ac058a3/execroot/xla/xla/hlo/pass/hlo_pass_pipeline.cc:338:10 #20 0x5647cfd66945 in xla::HloPassInterface::Run(xla::HloModule*) /root/.cache/bazel/_bazel_root/f367074f9120c6f1a67d35844ac058a3/execroot/xla/./xla/hlo/pass/hlo_pass_interface.h:85:12 #21 0x7f06c2908be0 in xla::gpu::GpuCompiler::OptimizeHloPostLayoutAssignment(xla::HloModule*, stream_executor::StreamExecutor*, xla::Compiler::CompileOptions const&, xla::Compiler::TargetConfig const&, tsl::thread::ThreadPool*) /root/.cache/bazel/_bazel_root/f367074f9120c6f1a67d35844ac058a3/execroot/xla/xla/service/gpu/gpu_compiler.cc:1754:3 #22 0x7f06c2a000f3 in xla::gpu::AMDGPUCompiler::OptimizeHloPostLayoutAssignment(xla::HloModule*, stream_executor::StreamExecutor*, xla::Compiler::CompileOptions const&, xla::Compiler::TargetConfig const&, tsl::thread::ThreadPool*) /root/.cache/bazel/_bazel_root/f367074f9120c6f1a67d35844ac058a3/execroot/xla/xla/service/gpu/amdgpu_compiler.cc:197:3 #23 0x7f06c28f85e9 in xla::gpu::GpuCompiler::OptimizeHloModule(xla::HloModule*, stream_executor::StreamExecutor*, xla::Compiler::CompileOptions const&, xla::Compiler::TargetConfig const&) /root/.cache/bazel/_bazel_root/f367074f9120c6f1a67d35844ac058a3/execroot/xla/xla/service/gpu/gpu_compiler.cc:1392:3 #24 0x7f06c291250d in xla::gpu::GpuCompiler::RunHloPasses(std::unique_ptr<xla::HloModule, std::default_delete<xla::HloModule>>, stream_executor::StreamExecutor*, xla::Compiler::CompileOptions const&) /root/.cache/bazel/_bazel_root/f367074f9120c6f1a67d35844ac058a3/execroot/xla/xla/service/gpu/gpu_compiler.cc:1824:3 #25 0x5647cfd63784 in xla::Compiler::RunHloPasses(std::unique_ptr<xla::HloModule, std::default_delete<xla::HloModule>>, stream_executor::StreamExecutor*, stream_executor::DeviceMemoryAllocator*) /root/.cache/bazel/_bazel_root/f367074f9120c6f1a67d35844ac058a3/execroot/xla/./xla/service/compiler.h:177:12 #26 0x7f06c339acba in xla::HloTestBase::GetOptimizedModule(std::unique_ptr<xla::HloModule, std::default_delete<xla::HloModule>>) /root/.cache/bazel/_bazel_root/f367074f9120c6f1a67d35844ac058a3/execroot/xla/xla/tests/hlo_test_base.cc:188:32 #27 0x5647cfd89516 in xla::gpu::(anonymous namespace)::GpuCompilerTest_CollectivePermuteDecompositionAndPipelining_Test::TestBody() /root/.cache/bazel/_bazel_root/f367074f9120c6f1a67d35844ac058a3/execroot/xla/xla/service/gpu/gpu_compiler_test.cc:879:3 #28 0x7f06c2c649dd in void testing::internal::HandleSehExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) /root/.cache/bazel/_bazel_root/f367074f9120c6f1a67d35844ac058a3/execroot/xla/external/com_google_googletest/googletest/src/gtest.cc:2664:10 #29 0x7f06c2c649dd in void testing::internal::HandleExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) /root/.cache/bazel/_bazel_root/f367074f9120c6f1a67d35844ac058a3/execroot/xla/external/com_google_googletest/googletest/src/gtest.cc:2700:14 #30 0x7f06c2c64708 in testing::Test::Run() /root/.cache/bazel/_bazel_root/f367074f9120c6f1a67d35844ac058a3/execroot/xla/external/com_google_googletest/googletest/src/gtest.cc:2739:5 #31 0x7f06c2c6771b in testing::TestInfo::Run() /root/.cache/bazel/_bazel_root/f367074f9120c6f1a67d35844ac058a3/execroot/xla/external/com_google_googletest/googletest/src/gtest.cc:2885:11 #32 0x7f06c2c6a5ab in testing::TestSuite::Run() /root/.cache/bazel/_bazel_root/f367074f9120c6f1a67d35844ac058a3/execroot/xla/external/com_google_googletest/googletest/src/gtest.cc:3063:30 #33 0x7f06c2c96eba in testing::internal::UnitTestImpl::RunAllTests() /root/.cache/bazel/_bazel_root/f367074f9120c6f1a67d35844ac058a3/execroot/xla/external/com_google_googletest/googletest/src/gtest.cc:6054:44 #34 0x7f06c2c9579d in bool testing::internal::HandleSehExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool>(testing::internal::UnitTestImpl*, bool (testing::internal::UnitTestImpl::*)(), char const*) /root/.cache/bazel/_bazel_root/f367074f9120c6f1a67d35844ac058a3/execroot/xla/external/com_google_googletest/googletest/src/gtest.cc:2664:10 #35 0x7f06c2c9579d in bool testing::internal::HandleExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool>(testing::internal::UnitTestImpl*, bool (testing::internal::UnitTestImpl::*)(), char const*) /root/.cache/bazel/_bazel_root/f367074f9120c6f1a67d35844ac058a3/execroot/xla/external/com_google_googletest/googletest/src/gtest.cc:2700:14 #36 0x7f06c2c95203 in testing::UnitTest::Run() /root/.cache/bazel/_bazel_root/f367074f9120c6f1a67d35844ac058a3/execroot/xla/external/com_google_googletest/googletest/src/gtest.cc:5594:10 #37 0x7f06c2d679b8 in RUN_ALL_TESTS() /root/.cache/bazel/_bazel_root/f367074f9120c6f1a67d35844ac058a3/execroot/xla/external/com_google_googletest/googletest/include/gtest/gtest.h:2334:73 #38 0x7f06c2d679b8 in main /root/.cache/bazel/_bazel_root/f367074f9120c6f1a67d35844ac058a3/execroot/xla/xla/tests/xla_internal_test_main.cc:65:10 #39 0x7f064c0b3d8f in __libc_start_call_main csu/../sysdeps/nptl/libc_start_call_main.h:58:16 SUMMARY: AddressSanitizer: heap-buffer-overflow /root/.cache/bazel/_bazel_root/f367074f9120c6f1a67d35844ac058a3/execroot/xla/external/com_google_absl/absl/container/internal/raw_hash_set.h:990:36 in absl::lts_20230802::container_internal::CommonFields::capacity() const Shadow bytes around the buggy address: 0x5030001d9500: fd fd fd fa fa fa fd fd fd fa fa fa fd fd fd fa 0x5030001d9580: fa fa fd fd fd fd fa fa fd fd fd fd fa fa fd fd 0x5030001d9600: fd fa fa fa fd fd fd fa fa fa fd fd fd fa fa fa 0x5030001d9680: fd fd fd fd fa fa fd fd fd fa fa fa fd fd fd fa 0x5030001d9700: fa fa fd fd fd fd fa fa fd fd fd fd fa fa fd fd =>0x5030001d9780: fd fa fa fa 00 00 00 fa fa fa 00 00 00 00 fa[fa] 0x5030001d9800: 00 00 00 00 fa fa 00 00 00 00 fa fa fd fd fd fd 0x5030001d9880: fa fa fd fd fd fd fa fa fd fd fd fa fa fa fd fd 0x5030001d9900: fd fd fa fa fd fd fd fd fa fa fd fd fd fd fa fa 0x5030001d9980: fd fd fd fa fa fa fd fd fd fa fa fa fd fd fd fa 0x5030001d9a00: fa fa fd fd fd fa fa fa fd fd fd fd fa fa fd fd Shadow byte legend (one shadow byte represents 8 application bytes): Addressable: 00 Partially addressable: 01 02 03 04 05 06 07 Heap left redzone: fa Freed heap region: fd Stack left redzone: f1 Stack mid redzone: f2 Stack right redzone: f3 Stack after return: f5 Stack use after scope: f8 Global redzone: f9 Global init order: f6 Poisoned by user: f7 Container overflow: fc Array cookie: ac Intra object redzone: bb ASan internal: fe Left alloca redzone: ca Right alloca redzone: cb ==1718600==ABORTING ``` Copybara import of the project: -- 9a75d26eb9aab4226a690658d254a057fc59f22c by alekstheod <atheodor@amd.com>: Fix access memory asan issue in redzone_allocator_kernel_rocm.cu Merging this change closes #24898 PiperOrigin-RevId: 745563669
Imported from GitHub PR openxla/xla#24900 Fix asan memory access violation: ``` exec ${PAGER:-/usr/bin/less} "$0" || exit 1 Executing tests from //xla/service:elemental_ir_emitter_test_gpu_amd_any ----------------------------------------------------------------------------- Running test /home/atheodor/projects/tmp/xla_asan/execroot/xla/bazel-out/k8-opt/bin/xla/service/elemental_ir_emitter_test_gpu_amd_any.runfiles/xla/xla/service/elemental_ir_emitter_test_gpu_amd_any --gtest_shuffle --gtest_fail_if_no_test_linked on GPU 3 Note: Randomizing tests' orders with a seed of 19906 . [==========] Running 118 tests from 13 test suites. [----------] Global test environment set-up. [----------] 10 tests from ElementalIrEmitterExecutionTypedTest/7, where TypeParam = ml_dtypes::float8_internal::float8_e5m2 [ RUN ] ElementalIrEmitterExecutionTypedTest/7.ConvertFloatsToFloat ================================================================= ==2457579==ERROR: AddressSanitizer: use-after-poison on address 0x506000843a08 at pc 0x7f401151be6a bp 0x7ffd1e3c3410 sp 0x7ffd1e3c3408 READ of size 8 at 0x506000843a08 thread T0 #0 0x7f401151be69 in stream_executor::gpu::RocmExecutor::UnloadGpuBinary(stream_executor::ModuleHandle) /home/atheodor/projects/tmp/xla_asan/execroot/xla/xla/stream_executor/rocm/rocm_executor.cc:596:23 #1 0x7f401151b036 in stream_executor::gpu::RocmExecutor::UnloadModule(stream_executor::ModuleHandle) /home/atheodor/projects/tmp/xla_asan/execroot/xla/xla/stream_executor/rocm/rocm_executor.cc:496:10 #2 0x7f405dee713b in stream_executor::ScopedModuleHandle::~ScopedModuleHandle() /home/atheodor/projects/tmp/xla_asan/execroot/xla/./xla/stream_executor/scoped_module_handle.h:48:7 #3 0x7f405dee713b in std::pair<stream_executor::StreamExecutor*, stream_executor::ScopedModuleHandle>::~pair() /usr/lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/stl_iterator.h:2488:12 #4 0x7f405dee713b in void __gnu_cxx::new_allocator<std::pair<stream_executor::StreamExecutor* const, stream_executor::ScopedModuleHandle>>::destroy<std::pair<stream_executor::StreamExecutor*, stream_executor::ScopedModuleHandle>>(std::pair<stream_executor::StreamExecutor*, stream_executor::ScopedModuleHandle>*) /usr/lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/ext/new_allocator.h:168:10 #5 0x7f405dee713b in void std::allocator_traits<std::allocator<std::pair<stream_executor::StreamExecutor* const, stream_executor::ScopedModuleHandle>>>::destroy<std::pair<stream_executor::StreamExecutor*, stream_executor::ScopedModuleHandle>>(std::allocator<std::pair<stream_executor::StreamExecutor* const, stream_executor::ScopedModuleHandle>>&, std::pair<stream_executor::StreamExecutor*, stream_executor::ScopedModuleHandle>*) /usr/lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/alloc_traits.h:535:8 #6 0x7f405dee713b in void absl::lts_20230802::container_internal::map_slot_policy<stream_executor::StreamExecutor*, stream_executor::ScopedModuleHandle>::destroy<std::allocator<std::pair<stream_executor::StreamExecutor* const, stream_executor::ScopedModuleHandle>>>(std::allocator<std::pair<stream_executor::StreamExecutor* const, stream_executor::ScopedModuleHandle>>*, absl::lts_20230802::container_internal::map_slot_type<stream_executor::StreamExecutor*, stream_executor::ScopedModuleHandle>*) /home/atheodor/projects/tmp/xla_asan/execroot/xla/external/com_google_absl/absl/container/internal/container_memory.h:419:7 #7 0x7f405dee713b in void absl::lts_20230802::container_internal::FlatHashMapPolicy<stream_executor::StreamExecutor*, stream_executor::ScopedModuleHandle>::destroy<std::allocator<std::pair<stream_executor::StreamExecutor* const, stream_executor::ScopedModuleHandle>>>(std::allocator<std::pair<stream_executor::StreamExecutor* const, stream_executor::ScopedModuleHandle>>*, absl::lts_20230802::container_internal::map_slot_type<stream_executor::StreamExecutor*, stream_executor::ScopedModuleHandle>*) /home/atheodor/projects/tmp/xla_asan/execroot/xla/external/com_google_absl/absl/container/flat_hash_map.h:578:5 #8 0x7f405dee713b in void absl::lts_20230802::container_internal::common_policy_traits<absl::lts_20230802::container_internal::FlatHashMapPolicy<stream_executor::StreamExecutor*, stream_executor::ScopedModuleHandle>, void>::destroy<std::allocator<std::pair<stream_executor::StreamExecutor* const, stream_executor::ScopedModuleHandle>>>(std::allocator<std::pair<stream_executor::StreamExecutor* const, stream_executor::ScopedModuleHandle>>*, absl::lts_20230802::container_internal::map_slot_type<stream_executor::StreamExecutor*, stream_executor::ScopedModuleHandle>*) /home/atheodor/projects/tmp/xla_asan/execroot/xla/external/com_google_absl/absl/container/internal/common_policy_traits.h:50:5 #9 0x7f405dee713b in absl::lts_20230802::container_internal::raw_hash_set<absl::lts_20230802::container_internal::FlatHashMapPolicy<stream_executor::StreamExecutor*, stream_executor::ScopedModuleHandle>, absl::lts_20230802::container_internal::HashEq<stream_executor::StreamExecutor*, void>::Hash, absl::lts_20230802::container_internal::HashEq<stream_executor::StreamExecutor*, void>::Eq, std::allocator<std::pair<stream_executor::StreamExecutor* const, stream_executor::ScopedModuleHandle>>>::destroy_slots() /home/atheodor/projects/tmp/xla_asan/execroot/xla/external/com_google_absl/absl/container/internal/raw_hash_set.h:1946:9 #10 0x7f405dee713b in absl::lts_20230802::container_internal::raw_hash_set<absl::lts_20230802::container_internal::FlatHashMapPolicy<stream_executor::StreamExecutor*, stream_executor::ScopedModuleHandle>, absl::lts_20230802::container_internal::HashEq<stream_executor::StreamExecutor*, void>::Hash, absl::lts_20230802::container_internal::HashEq<stream_executor::StreamExecutor*, void>::Eq, std::allocator<std::pair<stream_executor::StreamExecutor* const, stream_executor::ScopedModuleHandle>>>::~raw_hash_set() /home/atheodor/projects/tmp/xla_asan/execroot/xla/external/com_google_absl/absl/container/internal/raw_hash_set.h:1885:5 #11 0x7f405dee8580 in xla::gpu::GpuExecutable::~GpuExecutable() /home/atheodor/projects/tmp/xla_asan/execroot/xla/xla/service/gpu/gpu_executable.cc:155:1 #12 0x7f405dee8d4d in xla::gpu::GpuExecutable::~GpuExecutable() /home/atheodor/projects/tmp/xla_asan/execroot/xla/xla/service/gpu/gpu_executable.cc:151:33 #13 0x7f407b818b3f in std::default_delete<xla::Executable>::operator()(xla::Executable*) const /usr/lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/unique_ptr.h:85:2 #14 0x7f407b818b3f in std::unique_ptr<xla::Executable, std::default_delete<xla::Executable>>::~unique_ptr() /usr/lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/unique_ptr.h:361:4 #15 0x7f407b818b3f in xla::(anonymous namespace)::HloRunnerExecutable::~HloRunnerExecutable() /home/atheodor/projects/tmp/xla_asan/execroot/xla/xla/service/hlo_runner.cc:69:7 #16 0x7f407b818b3f in xla::(anonymous namespace)::HloRunnerExecutable::~HloRunnerExecutable() /home/atheodor/projects/tmp/xla_asan/execroot/xla/xla/service/hlo_runner.cc:69:7 #17 0x7f407b7e6503 in std::default_delete<xla::OpaqueExecutable>::operator()(xla::OpaqueExecutable*) const /usr/lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/unique_ptr.h:85:2 #18 0x7f407b7e6503 in std::unique_ptr<xla::OpaqueExecutable, std::default_delete<xla::OpaqueExecutable>>::~unique_ptr() /usr/lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/unique_ptr.h:361:4 #19 0x7f407b7e6503 in xla::HloRunner::ExecuteWithMovedDeviceBuffersAndBufferAssignment(std::unique_ptr<xla::HloModule, std::default_delete<xla::HloModule>>, xla::BufferAssignmentProto const*, std::vector<xla::ScopedShapedBuffer, std::allocator<xla::ScopedShapedBuffer>>, bool, xla::ExecutionProfile*) /home/atheodor/projects/tmp/xla_asan/execroot/xla/xla/service/hlo_runner.cc:400:1 #20 0x7f407b7e57c3 in xla::HloRunner::Execute(std::unique_ptr<xla::HloModule, std::default_delete<xla::HloModule>>, absl::lts_20230802::Span<xla::Literal const* const>, bool, xla::ExecutionProfile*) /home/atheodor/projects/tmp/xla_asan/execroot/xla/xla/service/hlo_runner.cc:221:3 #21 0x55b8a3cb4622 in xla::HloRunnerInterface::Execute(std::unique_ptr<xla::HloModule, std::default_delete<xla::HloModule>>, absl::lts_20230802::Span<xla::Literal const* const>, bool) /home/atheodor/projects/tmp/xla_asan/execroot/xla/./xla/service/hlo_runner_interface.h:244:12 #22 0x55b8a3cb4622 in xla::HloRunnerAgnosticReferenceMixin<xla::HloRunnerAgnosticTestBase>::RunAndCompareInternal(std::unique_ptr<xla::HloModule, std::default_delete<xla::HloModule>>, absl::lts_20230802::Span<xla::Literal* const>, std::optional<xla::ErrorSpec> const&, bool, std::function<void (xla::HloModule*)> const&, std::function<void (xla::HloModule*)> const&) /home/atheodor/projects/tmp/xla_asan/execroot/xla/./xla/tests/hlo_runner_agnostic_reference_mixin.h:238:5 #23 0x55b8a3cbf766 in xla::HloRunnerAgnosticReferenceMixin<xla::HloRunnerAgnosticTestBase>::RunAndCompare(std::unique_ptr<xla::HloModule, std::default_delete<xla::HloModule>>, absl::lts_20230802::Span<xla::Literal* const>, std::optional<xla::ErrorSpec> const&, std::function<void (xla::HloModule*)> const&, std::function<void (xla::HloModule*)> const&) /home/atheodor/projects/tmp/xla_asan/execroot/xla/./xla/tests/hlo_runner_agnostic_reference_mixin.h:94:9 #24 0x55b8a3cbf235 in xla::HloRunnerAgnosticReferenceMixin<xla::HloRunnerAgnosticTestBase>::RunAndCompare(std::unique_ptr<xla::HloModule, std::default_delete<xla::HloModule>>, std::optional<xla::ErrorSpec> const&, std::function<void (xla::HloModule*)> const&, std::function<void (xla::HloModule*)> const&, std::optional<long>) /home/atheodor/projects/tmp/xla_asan/execroot/xla/./xla/tests/hlo_runner_agnostic_reference_mixin.h:140:12 #25 0x55b8a3cceda8 in xla::(anonymous namespace)::ElementalIrEmitterExecutionTest::RunTypeConversionTest(std::basic_string_view<char, std::char_traits<char>>) /home/atheodor/projects/tmp/xla_asan/execroot/xla/xla/service/elemental_ir_emitter_test.cc:76:5 #26 0x55b8a3cd8cf3 in xla::(anonymous namespace)::ElementalIrEmitterExecutionTypedTest_ConvertFloatsToFloat_Test<ml_dtypes::float8_internal::float8_e5m2>::TestBody() /home/atheodor/projects/tmp/xla_asan/execroot/xla/xla/service/elemental_ir_emitter_test.cc:472:36 #27 0x7f407b2f09dd in void testing::internal::HandleSehExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) /home/atheodor/projects/tmp/xla_asan/execroot/xla/external/com_google_googletest/googletest/src/gtest.cc:2664:10 #28 0x7f407b2f09dd in void testing::internal::HandleExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) /home/atheodor/projects/tmp/xla_asan/execroot/xla/external/com_google_googletest/googletest/src/gtest.cc:2700:14 #29 0x7f407b2f0708 in testing::Test::Run() /home/atheodor/projects/tmp/xla_asan/execroot/xla/external/com_google_googletest/googletest/src/gtest.cc:2739:5 #30 0x7f407b2f371b in testing::TestInfo::Run() /home/atheodor/projects/tmp/xla_asan/execroot/xla/external/com_google_googletest/googletest/src/gtest.cc:2885:11 #31 0x7f407b2f65ab in testing::TestSuite::Run() /home/atheodor/projects/tmp/xla_asan/execroot/xla/external/com_google_googletest/googletest/src/gtest.cc:3063:30 #32 0x7f407b322eba in testing::internal::UnitTestImpl::RunAllTests() /home/atheodor/projects/tmp/xla_asan/execroot/xla/external/com_google_googletest/googletest/src/gtest.cc:6054:44 #33 0x7f407b32179d in bool testing::internal::HandleSehExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool>(testing::internal::UnitTestImpl*, bool (testing::internal::UnitTestImpl::*)(), char const*) /home/atheodor/projects/tmp/xla_asan/execroot/xla/external/com_google_googletest/googletest/src/gtest.cc:2664:10 #34 0x7f407b32179d in bool testing::internal::HandleExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool>(testing::internal::UnitTestImpl*, bool (testing::internal::UnitTestImpl::*)(), char const*) /home/atheodor/projects/tmp/xla_asan/execroot/xla/external/com_google_googletest/googletest/src/gtest.cc:2700:14 #35 0x7f407b321203 in testing::UnitTest::Run() /home/atheodor/projects/tmp/xla_asan/execroot/xla/external/com_google_googletest/googletest/src/gtest.cc:5594:10 #36 0x7f407b3f59b8 in RUN_ALL_TESTS() /home/atheodor/projects/tmp/xla_asan/execroot/xla/external/com_google_googletest/googletest/include/gtest/gtest.h:2334:73 #37 0x7f407b3f59b8 in main /home/atheodor/projects/tmp/xla_asan/execroot/xla/xla/tests/xla_internal_test_main.cc:65:10 #38 0x7f4004766d8f in __libc_start_call_main csu/../sysdeps/nptl/libc_start_call_main.h:58:16 #39 0x7f4004766e3f in __libc_start_main csu/../csu/libc-start.c:392:3 #40 0x55b8a3b9be44 in _start (/home/atheodor/projects/tmp/xla_asan/execroot/xla/bazel-out/k8-opt/bin/xla/service/elemental_ir_emitter_test_gpu_amd_any+0x10ce44) (BuildId: 1c37d17e488373aad7bf33204cb4234e) 0x506000843a08 is located 40 bytes inside of 56-byte region [0x5060008439e0,0x506000843a18) allocated by thread T0 here: #0 0x55b8a3c3607f in malloc (/home/atheodor/projects/tmp/xla_asan/execroot/xla/bazel-out/k8-opt/bin/xla/service/elemental_ir_emitter_test_gpu_amd_any+0x1a707f) (BuildId: 1c37d17e488373aad7bf33204cb4234e) #1 0x7f4004a4a98b in operator new(unsigned long) (/lib/x86_64-linux-gnu/libstdc++.so.6+0xae98b) (BuildId: e37fe1a879783838de78cbc8c80621fa685d58a2) #2 0x7f40115449aa in absl::lts_20230802::container_internal::raw_hash_set<absl::lts_20230802::container_internal::FlatHashMapPolicy<stream_executor::ModuleHandle, std::pair<ihipModule_t*, unsigned long>>, absl::lts_20230802::hash_internal::Hash<stream_executor::ModuleHandle>, std::equal_to<stream_executor::ModuleHandle>, std::allocator<std::pair<stream_executor::ModuleHandle const, std::pair<ihipModule_t*, unsigned long>>>>::initialize_slots() /home/atheodor/projects/tmp/xla_asan/execroot/xla/external/com_google_absl/absl/container/internal/raw_hash_set.h:2505:5 #3 0x7f40115449aa in absl::lts_20230802::container_internal::raw_hash_set<absl::lts_20230802::container_internal::FlatHashMapPolicy<stream_executor::ModuleHandle, std::pair<ihipModule_t*, unsigned long>>, absl::lts_20230802::hash_internal::Hash<stream_executor::ModuleHandle>, std::equal_to<stream_executor::ModuleHandle>, std::allocator<std::pair<stream_executor::ModuleHandle const, std::pair<ihipModule_t*, unsigned long>>>>::resize(unsigned long) /home/atheodor/projects/tmp/xla_asan/execroot/xla/external/com_google_absl/absl/container/internal/raw_hash_set.h:2515:5 #4 0x7f40115443fa in absl::lts_20230802::container_internal::raw_hash_set<absl::lts_20230802::container_internal::FlatHashMapPolicy<stream_executor::ModuleHandle, std::pair<ihipModule_t*, unsigned long>>, absl::lts_20230802::hash_internal::Hash<stream_executor::ModuleHandle>, std::equal_to<stream_executor::ModuleHandle>, std::allocator<std::pair<stream_executor::ModuleHandle const, std::pair<ihipModule_t*, unsigned long>>>>::prepare_insert(unsigned long) /home/atheodor/projects/tmp/xla_asan/execroot/xla/external/com_google_absl/absl/container/internal/raw_hash_set.h:2672:7 #5 0x7f40115442df in std::pair<unsigned long, bool> absl::lts_20230802::container_internal::raw_hash_set<absl::lts_20230802::container_internal::FlatHashMapPolicy<stream_executor::ModuleHandle, std::pair<ihipModule_t*, unsigned long>>, absl::lts_20230802::hash_internal::Hash<stream_executor::ModuleHandle>, std::equal_to<stream_executor::ModuleHandle>, std::allocator<std::pair<stream_executor::ModuleHandle const, std::pair<ihipModule_t*, unsigned long>>>>::find_or_prepare_insert<stream_executor::ModuleHandle>(stream_executor::ModuleHandle const&) /home/atheodor/projects/tmp/xla_asan/execroot/xla/external/com_google_absl/absl/container/internal/raw_hash_set.h:2659:13 #6 0x7f4011524701 in std::pair<absl::lts_20230802::container_internal::raw_hash_set<absl::lts_20230802::container_internal::FlatHashMapPolicy<stream_executor::ModuleHandle, std::pair<ihipModule_t*, unsigned long>>, absl::lts_20230802::hash_internal::Hash<stream_executor::ModuleHandle>, std::equal_to<stream_executor::ModuleHandle>, std::allocator<std::pair<stream_executor::ModuleHandle const, std::pair<ihipModule_t*, unsigned long>>>>::iterator, bool> absl::lts_20230802::container_internal::raw_hash_map<absl::lts_20230802::container_internal::FlatHashMapPolicy<stream_executor::ModuleHandle, std::pair<ihipModule_t*, unsigned long>>, absl::lts_20230802::hash_internal::Hash<stream_executor::ModuleHandle>, std::equal_to<stream_executor::ModuleHandle>, std::allocator<std::pair<stream_executor::ModuleHandle const, std::pair<ihipModule_t*, unsigned long>>>>::try_emplace_impl<stream_executor::ModuleHandle const&>(stream_executor::ModuleHandle const&) /home/atheodor/projects/tmp/xla_asan/execroot/xla/external/com_google_absl/absl/container/internal/raw_hash_map.h:202:22 #7 0x7f4011524701 in std::pair<absl::lts_20230802::container_internal::raw_hash_set<absl::lts_20230802::container_internal::FlatHashMapPolicy<stream_executor::ModuleHandle, std::pair<ihipModule_t*, unsigned long>>, absl::lts_20230802::hash_internal::Hash<stream_executor::ModuleHandle>, std::equal_to<stream_executor::ModuleHandle>, std::allocator<std::pair<stream_executor::ModuleHandle const, std::pair<ihipModule_t*, unsigned long>>>>::iterator, bool> absl::lts_20230802::container_internal::raw_hash_map<absl::lts_20230802::container_internal::FlatHashMapPolicy<stream_executor::ModuleHandle, std::pair<ihipModule_t*, unsigned long>>, absl::lts_20230802::hash_internal::Hash<stream_executor::ModuleHandle>, std::equal_to<stream_executor::ModuleHandle>, std::allocator<std::pair<stream_executor::ModuleHandle const, std::pair<ihipModule_t*, unsigned long>>>>::try_emplace<stream_executor::ModuleHandle, 0>(stream_executor::ModuleHandle const&) /home/atheodor/projects/tmp/xla_asan/execroot/xla/external/com_google_absl/absl/container/internal/raw_hash_map.h:139:12 #8 0x7f4011524701 in decltype(absl::lts_20230802::container_internal::FlatHashMapPolicy<stream_executor::ModuleHandle, std::pair<ihipModule_t*, unsigned long>>::value(std::pair<stream_executor::ModuleHandle const, std::pair<ihipModule_t*, unsigned long>>* std::addressof<std::pair<stream_executor::ModuleHandle const, std::pair<ihipModule_t*, unsigned long>>>(std::pair<stream_executor::ModuleHandle const, std::pair<ihipModule_t*, unsigned long>>&)(decltype(__declval<std::pair<stream_executor::ModuleHandle const, std::pair<ihipModule_t*, unsigned long>>>(0)) std::declval<std::pair<stream_executor::ModuleHandle const, std::pair<ihipModule_t*, unsigned long>>&>()()))) absl::lts_20230802::container_internal::raw_hash_map<absl::lts_20230802::container_internal::FlatHashMapPolicy<stream_executor::ModuleHandle, std::pair<ihipModule_t*, unsigned long>>, absl::lts_20230802::hash_internal::Hash<stream_executor::ModuleHandle>, std::equal_to<stream_executor::ModuleHandle>, std::allocator<std::pair<stream_executor::ModuleHandle const, std::pair<ihipModule_t*, unsigned long>>>>::operator[]<stream_executor::ModuleHandle, absl::lts_20230802::container_internal::FlatHashMapPolicy<stream_executor::ModuleHandle, std::pair<ihipModule_t*, unsigned long>>>(stream_executor::ModuleHandle const&) /home/atheodor/projects/tmp/xla_asan/execroot/xla/external/com_google_absl/absl/container/internal/raw_hash_map.h:184:28 #9 0x7f4011524701 in stream_executor::gpu::RocmExecutor::LoadModuleFromHsaco(char const*) /home/atheodor/projects/tmp/xla_asan/execroot/xla/xla/stream_executor/rocm/rocm_executor.cc:717:39 #10 0x7f4011524387 in stream_executor::gpu::RocmExecutor::LoadModule(stream_executor::MultiModuleLoaderSpec const&) /home/atheodor/projects/tmp/xla_asan/execroot/xla/xla/stream_executor/rocm/rocm_executor.cc:705:12 #11 0x7f405deeae34 in xla::gpu::GpuExecutable::ResolveConstantGlobals(stream_executor::Stream*) /home/atheodor/projects/tmp/xla_asan/execroot/xla/xla/service/gpu/gpu_executable.cc:499:5 #12 0x7f405def050a in xla::gpu::GpuExecutable::ExecuteAsyncOnStreamImpl(xla::ServiceExecutableRunOptions const*, std::variant<absl::lts_20230802::Span<xla::ShapedBuffer const* const>, absl::lts_20230802::Span<xla::ExecutionInput>>) /home/atheodor/projects/tmp/xla_asan/execroot/xla/xla/service/gpu/gpu_executable.cc:703:5 #13 0x7f405deefc6f in xla::gpu::GpuExecutable::ExecuteAsyncOnStream(xla::ServiceExecutableRunOptions const*, std::vector<xla::ExecutionInput, std::allocator<xla::ExecutionInput>>) /home/atheodor/projects/tmp/xla_asan/execroot/xla/xla/service/gpu/gpu_executable.cc:661:10 #14 0x7f401607a78e in xla::Executable::ExecuteAsyncOnStreamWrapper(xla::ServiceExecutableRunOptions const*, std::vector<xla::ExecutionInput, std::allocator<xla::ExecutionInput>>) /home/atheodor/projects/tmp/xla_asan/execroot/xla/xla/service/executable.cc:229:7 #15 0x7f4016079fd3 in xla::Executable::ExecuteOnStreamWrapper(xla::ServiceExecutableRunOptions const*, std::vector<xla::ExecutionInput, std::allocator<xla::ExecutionInput>>) /home/atheodor/projects/tmp/xla_asan/execroot/xla/xla/service/executable.cc:153:7 #16 0x7f407b7ea78b in xla::HloRunner::ExecuteWithExecutionInputs(xla::Executable*, std::vector<xla::ExecutionInput, std::allocator<xla::ExecutionInput>>, xla::ExecutionProfile*) /home/atheodor/projects/tmp/xla_asan/execroot/xla/xla/service/hlo_runner.cc:448:3 #17 0x7f407b7ecde2 in xla::HloRunner::ExecuteWithMovedDeviceBuffers(xla::Executable*, std::vector<xla::ScopedShapedBuffer, std::allocator<xla::ScopedShapedBuffer>>, xla::ExecutionProfile*) /home/atheodor/projects/tmp/xla_asan/execroot/xla/xla/service/hlo_runner.cc:415:3 #18 0x7f407b7e642a in xla::HloRunner::ExecuteWithMovedDeviceBuffersAndBufferAssignment(std::unique_ptr<xla::HloModule, std::default_delete<xla::HloModule>>, xla::BufferAssignmentProto const*, std::vector<xla::ScopedShapedBuffer, std::allocator<xla::ScopedShapedBuffer>>, bool, xla::ExecutionProfile*) /home/atheodor/projects/tmp/xla_asan/execroot/xla/xla/service/hlo_runner.cc:398:10 #19 0x7f407b7e57c3 in xla::HloRunner::Execute(std::unique_ptr<xla::HloModule, std::default_delete<xla::HloModule>>, absl::lts_20230802::Span<xla::Literal const* const>, bool, xla::ExecutionProfile*) /home/atheodor/projects/tmp/xla_asan/execroot/xla/xla/service/hlo_runner.cc:221:3 #20 0x55b8a3cb4622 in xla::HloRunnerInterface::Execute(std::unique_ptr<xla::HloModule, std::default_delete<xla::HloModule>>, absl::lts_20230802::Span<xla::Literal const* const>, bool) /home/atheodor/projects/tmp/xla_asan/execroot/xla/./xla/service/hlo_runner_interface.h:244:12 #21 0x55b8a3cb4622 in xla::HloRunnerAgnosticReferenceMixin<xla::HloRunnerAgnosticTestBase>::RunAndCompareInternal(std::unique_ptr<xla::HloModule, std::default_delete<xla::HloModule>>, absl::lts_20230802::Span<xla::Literal* const>, std::optional<xla::ErrorSpec> const&, bool, std::function<void (xla::HloModule*)> const&, std::function<void (xla::HloModule*)> const&) /home/atheodor/projects/tmp/xla_asan/execroot/xla/./xla/tests/hlo_runner_agnostic_reference_mixin.h:238:5 #22 0x55b8a3cbf766 in xla::HloRunnerAgnosticReferenceMixin<xla::HloRunnerAgnosticTestBase>::RunAndCompare(std::unique_ptr<xla::HloModule, std::default_delete<xla::HloModule>>, absl::lts_20230802::Span<xla::Literal* const>, std::optional<xla::ErrorSpec> const&, std::function<void (xla::HloModule*)> const&, std::function<void (xla::HloModule*)> const&) /home/atheodor/projects/tmp/xla_asan/execroot/xla/./xla/tests/hlo_runner_agnostic_reference_mixin.h:94:9 #23 0x55b8a3cbf235 in xla::HloRunnerAgnosticReferenceMixin<xla::HloRunnerAgnosticTestBase>::RunAndCompare(std::unique_ptr<xla::HloModule, std::default_delete<xla::HloModule>>, std::optional<xla::ErrorSpec> const&, std::function<void (xla::HloModule*)> const&, std::function<void (xla::HloModule*)> const&, std::optional<long>) /home/atheodor/projects/tmp/xla_asan/execroot/xla/./xla/tests/hlo_runner_agnostic_reference_mixin.h:140:12 #24 0x55b8a3cceda8 in xla::(anonymous namespace)::ElementalIrEmitterExecutionTest::RunTypeConversionTest(std::basic_string_view<char, std::char_traits<char>>) /home/atheodor/projects/tmp/xla_asan/execroot/xla/xla/service/elemental_ir_emitter_test.cc:76:5 #25 0x55b8a3cd8cf3 in xla::(anonymous namespace)::ElementalIrEmitterExecutionTypedTest_ConvertFloatsToFloat_Test<ml_dtypes::float8_internal::float8_e5m2>::TestBody() /home/atheodor/projects/tmp/xla_asan/execroot/xla/xla/service/elemental_ir_emitter_test.cc:472:36 #26 0x7f407b2f09dd in void testing::internal::HandleSehExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) /home/atheodor/projects/tmp/xla_asan/execroot/xla/external/com_google_googletest/googletest/src/gtest.cc:2664:10 #27 0x7f407b2f09dd in void testing::internal::HandleExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) /home/atheodor/projects/tmp/xla_asan/execroot/xla/external/com_google_googletest/googletest/src/gtest.cc:2700:14 #28 0x7f407b2f0708 in testing::Test::Run() /home/atheodor/projects/tmp/xla_asan/execroot/xla/external/com_google_googletest/googletest/src/gtest.cc:2739:5 #29 0x7f407b2f371b in testing::TestInfo::Run() /home/atheodor/projects/tmp/xla_asan/execroot/xla/external/com_google_googletest/googletest/src/gtest.cc:2885:11 #30 0x7f407b2f65ab in testing::TestSuite::Run() /home/atheodor/projects/tmp/xla_asan/execroot/xla/external/com_google_googletest/googletest/src/gtest.cc:3063:30 #31 0x7f407b322eba in testing::internal::UnitTestImpl::RunAllTests() /home/atheodor/projects/tmp/xla_asan/execroot/xla/external/com_google_googletest/googletest/src/gtest.cc:6054:44 #32 0x7f407b32179d in bool testing::internal::HandleSehExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool>(testing::internal::UnitTestImpl*, bool (testing::internal::UnitTestImpl::*)(), char const*) /home/atheodor/projects/tmp/xla_asan/execroot/xla/external/com_google_googletest/googletest/src/gtest.cc:2664:10 #33 0x7f407b32179d in bool testing::internal::HandleExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool>(testing::internal::UnitTestImpl*, bool (testing::internal::UnitTestImpl::*)(), char const*) /home/atheodor/projects/tmp/xla_asan/execroot/xla/external/com_google_googletest/googletest/src/gtest.cc:2700:14 #34 0x7f407b321203 in testing::UnitTest::Run() /home/atheodor/projects/tmp/xla_asan/execroot/xla/external/com_google_googletest/googletest/src/gtest.cc:5594:10 #35 0x7f407b3f59b8 in RUN_ALL_TESTS() /home/atheodor/projects/tmp/xla_asan/execroot/xla/external/com_google_googletest/googletest/include/gtest/gtest.h:2334:73 #36 0x7f407b3f59b8 in main /home/atheodor/projects/tmp/xla_asan/execroot/xla/xla/tests/xla_internal_test_main.cc:65:10 #37 0x7f4004766d8f in __libc_start_call_main csu/../sysdeps/nptl/libc_start_call_main.h:58:16 SUMMARY: AddressSanitizer: use-after-poison /home/atheodor/projects/tmp/xla_asan/execroot/xla/xla/stream_executor/rocm/rocm_executor.cc:596:23 in stream_executor::gpu::RocmExecutor::UnloadGpuBinary(stream_executor::ModuleHandle) Shadow bytes around the buggy address: 0x506000843780: fa fa fa fa fd fd fd fd fd fd fd fa fa fa fa fa 0x506000843800: fd fd fd fd fd fd fd fa fa fa fa fa fd fd fd fd 0x506000843880: fd fd fd fa fa fa fa fa fd fd fd fd fd fd fd fd 0x506000843900: fa fa fa fa 00 00 00 00 00 00 00 fa fa fa fa fa 0x506000843980: fd fd fd fd fd fd fd fa fa fa fa fa 00 00 00 00 =>0x506000843a00: f7[f7]f7 fa fa fa fa fa 00 00 00 00 00 00 00 00 0x506000843a80: fa fa fa fa 00 00 00 00 00 00 00 fa fa fa fa fa 0x506000843b00: 00 00 00 00 00 00 00 fa fa fa fa fa 00 00 00 00 0x506000843b80: 00 00 00 fa fa fa fa fa 00 00 00 00 00 00 00 fa 0x506000843c00: fa fa fa fa 00 00 00 00 00 00 00 fa fa fa fa fa 0x506000843c80: 00 00 00 00 00 00 00 fa fa fa fa fa fd fd fd fd Shadow byte legend (one shadow byte represents 8 application bytes): Addressable: 00 Partially addressable: 01 02 03 04 05 06 07 Heap left redzone: fa Freed heap region: fd Stack left redzone: f1 Stack mid redzone: f2 Stack right redzone: f3 Stack after return: f5 Stack use after scope: f8 Global redzone: f9 Global init order: f6 Poisoned by user: f7 Container overflow: fc Array cookie: ac Intra object redzone: bb ASan internal: fe Left alloca redzone: ca Right alloca redzone: cb ==2457579==ABORTING ``` Copybara import of the project: -- 8f74d4c822d951b5a213500ea9396ed7b160871d by alekstheod <atheodor@amd.com>: Fix asan report memory access vialation in rocm_executor Merging this change closes #24900 FUTURE_COPYBARA_INTEGRATE_REVIEW=openxla/xla#24900 from ROCm:ci_fix_invalid_memory_access_in_rocm_executor 8f74d4c822d951b5a213500ea9396ed7b160871d PiperOrigin-RevId: 745548395
Imported from GitHub PR openxla/xla#24900 Fix asan memory access violation: ``` exec ${PAGER:-/usr/bin/less} "$0" || exit 1 Executing tests from //xla/service:elemental_ir_emitter_test_gpu_amd_any ----------------------------------------------------------------------------- Running test /home/atheodor/projects/tmp/xla_asan/execroot/xla/bazel-out/k8-opt/bin/xla/service/elemental_ir_emitter_test_gpu_amd_any.runfiles/xla/xla/service/elemental_ir_emitter_test_gpu_amd_any --gtest_shuffle --gtest_fail_if_no_test_linked on GPU 3 Note: Randomizing tests' orders with a seed of 19906 . [==========] Running 118 tests from 13 test suites. [----------] Global test environment set-up. [----------] 10 tests from ElementalIrEmitterExecutionTypedTest/7, where TypeParam = ml_dtypes::float8_internal::float8_e5m2 [ RUN ] ElementalIrEmitterExecutionTypedTest/7.ConvertFloatsToFloat ================================================================= ==2457579==ERROR: AddressSanitizer: use-after-poison on address 0x506000843a08 at pc 0x7f401151be6a bp 0x7ffd1e3c3410 sp 0x7ffd1e3c3408 READ of size 8 at 0x506000843a08 thread T0 #0 0x7f401151be69 in stream_executor::gpu::RocmExecutor::UnloadGpuBinary(stream_executor::ModuleHandle) /home/atheodor/projects/tmp/xla_asan/execroot/xla/xla/stream_executor/rocm/rocm_executor.cc:596:23 #1 0x7f401151b036 in stream_executor::gpu::RocmExecutor::UnloadModule(stream_executor::ModuleHandle) /home/atheodor/projects/tmp/xla_asan/execroot/xla/xla/stream_executor/rocm/rocm_executor.cc:496:10 #2 0x7f405dee713b in stream_executor::ScopedModuleHandle::~ScopedModuleHandle() /home/atheodor/projects/tmp/xla_asan/execroot/xla/./xla/stream_executor/scoped_module_handle.h:48:7 #3 0x7f405dee713b in std::pair<stream_executor::StreamExecutor*, stream_executor::ScopedModuleHandle>::~pair() /usr/lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/stl_iterator.h:2488:12 #4 0x7f405dee713b in void __gnu_cxx::new_allocator<std::pair<stream_executor::StreamExecutor* const, stream_executor::ScopedModuleHandle>>::destroy<std::pair<stream_executor::StreamExecutor*, stream_executor::ScopedModuleHandle>>(std::pair<stream_executor::StreamExecutor*, stream_executor::ScopedModuleHandle>*) /usr/lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/ext/new_allocator.h:168:10 #5 0x7f405dee713b in void std::allocator_traits<std::allocator<std::pair<stream_executor::StreamExecutor* const, stream_executor::ScopedModuleHandle>>>::destroy<std::pair<stream_executor::StreamExecutor*, stream_executor::ScopedModuleHandle>>(std::allocator<std::pair<stream_executor::StreamExecutor* const, stream_executor::ScopedModuleHandle>>&, std::pair<stream_executor::StreamExecutor*, stream_executor::ScopedModuleHandle>*) /usr/lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/alloc_traits.h:535:8 #6 0x7f405dee713b in void absl::lts_20230802::container_internal::map_slot_policy<stream_executor::StreamExecutor*, stream_executor::ScopedModuleHandle>::destroy<std::allocator<std::pair<stream_executor::StreamExecutor* const, stream_executor::ScopedModuleHandle>>>(std::allocator<std::pair<stream_executor::StreamExecutor* const, stream_executor::ScopedModuleHandle>>*, absl::lts_20230802::container_internal::map_slot_type<stream_executor::StreamExecutor*, stream_executor::ScopedModuleHandle>*) /home/atheodor/projects/tmp/xla_asan/execroot/xla/external/com_google_absl/absl/container/internal/container_memory.h:419:7 #7 0x7f405dee713b in void absl::lts_20230802::container_internal::FlatHashMapPolicy<stream_executor::StreamExecutor*, stream_executor::ScopedModuleHandle>::destroy<std::allocator<std::pair<stream_executor::StreamExecutor* const, stream_executor::ScopedModuleHandle>>>(std::allocator<std::pair<stream_executor::StreamExecutor* const, stream_executor::ScopedModuleHandle>>*, absl::lts_20230802::container_internal::map_slot_type<stream_executor::StreamExecutor*, stream_executor::ScopedModuleHandle>*) /home/atheodor/projects/tmp/xla_asan/execroot/xla/external/com_google_absl/absl/container/flat_hash_map.h:578:5 #8 0x7f405dee713b in void absl::lts_20230802::container_internal::common_policy_traits<absl::lts_20230802::container_internal::FlatHashMapPolicy<stream_executor::StreamExecutor*, stream_executor::ScopedModuleHandle>, void>::destroy<std::allocator<std::pair<stream_executor::StreamExecutor* const, stream_executor::ScopedModuleHandle>>>(std::allocator<std::pair<stream_executor::StreamExecutor* const, stream_executor::ScopedModuleHandle>>*, absl::lts_20230802::container_internal::map_slot_type<stream_executor::StreamExecutor*, stream_executor::ScopedModuleHandle>*) /home/atheodor/projects/tmp/xla_asan/execroot/xla/external/com_google_absl/absl/container/internal/common_policy_traits.h:50:5 #9 0x7f405dee713b in absl::lts_20230802::container_internal::raw_hash_set<absl::lts_20230802::container_internal::FlatHashMapPolicy<stream_executor::StreamExecutor*, stream_executor::ScopedModuleHandle>, absl::lts_20230802::container_internal::HashEq<stream_executor::StreamExecutor*, void>::Hash, absl::lts_20230802::container_internal::HashEq<stream_executor::StreamExecutor*, void>::Eq, std::allocator<std::pair<stream_executor::StreamExecutor* const, stream_executor::ScopedModuleHandle>>>::destroy_slots() /home/atheodor/projects/tmp/xla_asan/execroot/xla/external/com_google_absl/absl/container/internal/raw_hash_set.h:1946:9 #10 0x7f405dee713b in absl::lts_20230802::container_internal::raw_hash_set<absl::lts_20230802::container_internal::FlatHashMapPolicy<stream_executor::StreamExecutor*, stream_executor::ScopedModuleHandle>, absl::lts_20230802::container_internal::HashEq<stream_executor::StreamExecutor*, void>::Hash, absl::lts_20230802::container_internal::HashEq<stream_executor::StreamExecutor*, void>::Eq, std::allocator<std::pair<stream_executor::StreamExecutor* const, stream_executor::ScopedModuleHandle>>>::~raw_hash_set() /home/atheodor/projects/tmp/xla_asan/execroot/xla/external/com_google_absl/absl/container/internal/raw_hash_set.h:1885:5 #11 0x7f405dee8580 in xla::gpu::GpuExecutable::~GpuExecutable() /home/atheodor/projects/tmp/xla_asan/execroot/xla/xla/service/gpu/gpu_executable.cc:155:1 #12 0x7f405dee8d4d in xla::gpu::GpuExecutable::~GpuExecutable() /home/atheodor/projects/tmp/xla_asan/execroot/xla/xla/service/gpu/gpu_executable.cc:151:33 #13 0x7f407b818b3f in std::default_delete<xla::Executable>::operator()(xla::Executable*) const /usr/lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/unique_ptr.h:85:2 #14 0x7f407b818b3f in std::unique_ptr<xla::Executable, std::default_delete<xla::Executable>>::~unique_ptr() /usr/lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/unique_ptr.h:361:4 #15 0x7f407b818b3f in xla::(anonymous namespace)::HloRunnerExecutable::~HloRunnerExecutable() /home/atheodor/projects/tmp/xla_asan/execroot/xla/xla/service/hlo_runner.cc:69:7 #16 0x7f407b818b3f in xla::(anonymous namespace)::HloRunnerExecutable::~HloRunnerExecutable() /home/atheodor/projects/tmp/xla_asan/execroot/xla/xla/service/hlo_runner.cc:69:7 #17 0x7f407b7e6503 in std::default_delete<xla::OpaqueExecutable>::operator()(xla::OpaqueExecutable*) const /usr/lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/unique_ptr.h:85:2 #18 0x7f407b7e6503 in std::unique_ptr<xla::OpaqueExecutable, std::default_delete<xla::OpaqueExecutable>>::~unique_ptr() /usr/lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/unique_ptr.h:361:4 #19 0x7f407b7e6503 in xla::HloRunner::ExecuteWithMovedDeviceBuffersAndBufferAssignment(std::unique_ptr<xla::HloModule, std::default_delete<xla::HloModule>>, xla::BufferAssignmentProto const*, std::vector<xla::ScopedShapedBuffer, std::allocator<xla::ScopedShapedBuffer>>, bool, xla::ExecutionProfile*) /home/atheodor/projects/tmp/xla_asan/execroot/xla/xla/service/hlo_runner.cc:400:1 #20 0x7f407b7e57c3 in xla::HloRunner::Execute(std::unique_ptr<xla::HloModule, std::default_delete<xla::HloModule>>, absl::lts_20230802::Span<xla::Literal const* const>, bool, xla::ExecutionProfile*) /home/atheodor/projects/tmp/xla_asan/execroot/xla/xla/service/hlo_runner.cc:221:3 #21 0x55b8a3cb4622 in xla::HloRunnerInterface::Execute(std::unique_ptr<xla::HloModule, std::default_delete<xla::HloModule>>, absl::lts_20230802::Span<xla::Literal const* const>, bool) /home/atheodor/projects/tmp/xla_asan/execroot/xla/./xla/service/hlo_runner_interface.h:244:12 #22 0x55b8a3cb4622 in xla::HloRunnerAgnosticReferenceMixin<xla::HloRunnerAgnosticTestBase>::RunAndCompareInternal(std::unique_ptr<xla::HloModule, std::default_delete<xla::HloModule>>, absl::lts_20230802::Span<xla::Literal* const>, std::optional<xla::ErrorSpec> const&, bool, std::function<void (xla::HloModule*)> const&, std::function<void (xla::HloModule*)> const&) /home/atheodor/projects/tmp/xla_asan/execroot/xla/./xla/tests/hlo_runner_agnostic_reference_mixin.h:238:5 #23 0x55b8a3cbf766 in xla::HloRunnerAgnosticReferenceMixin<xla::HloRunnerAgnosticTestBase>::RunAndCompare(std::unique_ptr<xla::HloModule, std::default_delete<xla::HloModule>>, absl::lts_20230802::Span<xla::Literal* const>, std::optional<xla::ErrorSpec> const&, std::function<void (xla::HloModule*)> const&, std::function<void (xla::HloModule*)> const&) /home/atheodor/projects/tmp/xla_asan/execroot/xla/./xla/tests/hlo_runner_agnostic_reference_mixin.h:94:9 #24 0x55b8a3cbf235 in xla::HloRunnerAgnosticReferenceMixin<xla::HloRunnerAgnosticTestBase>::RunAndCompare(std::unique_ptr<xla::HloModule, std::default_delete<xla::HloModule>>, std::optional<xla::ErrorSpec> const&, std::function<void (xla::HloModule*)> const&, std::function<void (xla::HloModule*)> const&, std::optional<long>) /home/atheodor/projects/tmp/xla_asan/execroot/xla/./xla/tests/hlo_runner_agnostic_reference_mixin.h:140:12 #25 0x55b8a3cceda8 in xla::(anonymous namespace)::ElementalIrEmitterExecutionTest::RunTypeConversionTest(std::basic_string_view<char, std::char_traits<char>>) /home/atheodor/projects/tmp/xla_asan/execroot/xla/xla/service/elemental_ir_emitter_test.cc:76:5 #26 0x55b8a3cd8cf3 in xla::(anonymous namespace)::ElementalIrEmitterExecutionTypedTest_ConvertFloatsToFloat_Test<ml_dtypes::float8_internal::float8_e5m2>::TestBody() /home/atheodor/projects/tmp/xla_asan/execroot/xla/xla/service/elemental_ir_emitter_test.cc:472:36 #27 0x7f407b2f09dd in void testing::internal::HandleSehExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) /home/atheodor/projects/tmp/xla_asan/execroot/xla/external/com_google_googletest/googletest/src/gtest.cc:2664:10 #28 0x7f407b2f09dd in void testing::internal::HandleExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) /home/atheodor/projects/tmp/xla_asan/execroot/xla/external/com_google_googletest/googletest/src/gtest.cc:2700:14 #29 0x7f407b2f0708 in testing::Test::Run() /home/atheodor/projects/tmp/xla_asan/execroot/xla/external/com_google_googletest/googletest/src/gtest.cc:2739:5 #30 0x7f407b2f371b in testing::TestInfo::Run() /home/atheodor/projects/tmp/xla_asan/execroot/xla/external/com_google_googletest/googletest/src/gtest.cc:2885:11 #31 0x7f407b2f65ab in testing::TestSuite::Run() /home/atheodor/projects/tmp/xla_asan/execroot/xla/external/com_google_googletest/googletest/src/gtest.cc:3063:30 #32 0x7f407b322eba in testing::internal::UnitTestImpl::RunAllTests() /home/atheodor/projects/tmp/xla_asan/execroot/xla/external/com_google_googletest/googletest/src/gtest.cc:6054:44 #33 0x7f407b32179d in bool testing::internal::HandleSehExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool>(testing::internal::UnitTestImpl*, bool (testing::internal::UnitTestImpl::*)(), char const*) /home/atheodor/projects/tmp/xla_asan/execroot/xla/external/com_google_googletest/googletest/src/gtest.cc:2664:10 #34 0x7f407b32179d in bool testing::internal::HandleExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool>(testing::internal::UnitTestImpl*, bool (testing::internal::UnitTestImpl::*)(), char const*) /home/atheodor/projects/tmp/xla_asan/execroot/xla/external/com_google_googletest/googletest/src/gtest.cc:2700:14 #35 0x7f407b321203 in testing::UnitTest::Run() /home/atheodor/projects/tmp/xla_asan/execroot/xla/external/com_google_googletest/googletest/src/gtest.cc:5594:10 #36 0x7f407b3f59b8 in RUN_ALL_TESTS() /home/atheodor/projects/tmp/xla_asan/execroot/xla/external/com_google_googletest/googletest/include/gtest/gtest.h:2334:73 #37 0x7f407b3f59b8 in main /home/atheodor/projects/tmp/xla_asan/execroot/xla/xla/tests/xla_internal_test_main.cc:65:10 #38 0x7f4004766d8f in __libc_start_call_main csu/../sysdeps/nptl/libc_start_call_main.h:58:16 #39 0x7f4004766e3f in __libc_start_main csu/../csu/libc-start.c:392:3 #40 0x55b8a3b9be44 in _start (/home/atheodor/projects/tmp/xla_asan/execroot/xla/bazel-out/k8-opt/bin/xla/service/elemental_ir_emitter_test_gpu_amd_any+0x10ce44) (BuildId: 1c37d17e488373aad7bf33204cb4234e) 0x506000843a08 is located 40 bytes inside of 56-byte region [0x5060008439e0,0x506000843a18) allocated by thread T0 here: #0 0x55b8a3c3607f in malloc (/home/atheodor/projects/tmp/xla_asan/execroot/xla/bazel-out/k8-opt/bin/xla/service/elemental_ir_emitter_test_gpu_amd_any+0x1a707f) (BuildId: 1c37d17e488373aad7bf33204cb4234e) #1 0x7f4004a4a98b in operator new(unsigned long) (/lib/x86_64-linux-gnu/libstdc++.so.6+0xae98b) (BuildId: e37fe1a879783838de78cbc8c80621fa685d58a2) #2 0x7f40115449aa in absl::lts_20230802::container_internal::raw_hash_set<absl::lts_20230802::container_internal::FlatHashMapPolicy<stream_executor::ModuleHandle, std::pair<ihipModule_t*, unsigned long>>, absl::lts_20230802::hash_internal::Hash<stream_executor::ModuleHandle>, std::equal_to<stream_executor::ModuleHandle>, std::allocator<std::pair<stream_executor::ModuleHandle const, std::pair<ihipModule_t*, unsigned long>>>>::initialize_slots() /home/atheodor/projects/tmp/xla_asan/execroot/xla/external/com_google_absl/absl/container/internal/raw_hash_set.h:2505:5 #3 0x7f40115449aa in absl::lts_20230802::container_internal::raw_hash_set<absl::lts_20230802::container_internal::FlatHashMapPolicy<stream_executor::ModuleHandle, std::pair<ihipModule_t*, unsigned long>>, absl::lts_20230802::hash_internal::Hash<stream_executor::ModuleHandle>, std::equal_to<stream_executor::ModuleHandle>, std::allocator<std::pair<stream_executor::ModuleHandle const, std::pair<ihipModule_t*, unsigned long>>>>::resize(unsigned long) /home/atheodor/projects/tmp/xla_asan/execroot/xla/external/com_google_absl/absl/container/internal/raw_hash_set.h:2515:5 #4 0x7f40115443fa in absl::lts_20230802::container_internal::raw_hash_set<absl::lts_20230802::container_internal::FlatHashMapPolicy<stream_executor::ModuleHandle, std::pair<ihipModule_t*, unsigned long>>, absl::lts_20230802::hash_internal::Hash<stream_executor::ModuleHandle>, std::equal_to<stream_executor::ModuleHandle>, std::allocator<std::pair<stream_executor::ModuleHandle const, std::pair<ihipModule_t*, unsigned long>>>>::prepare_insert(unsigned long) /home/atheodor/projects/tmp/xla_asan/execroot/xla/external/com_google_absl/absl/container/internal/raw_hash_set.h:2672:7 #5 0x7f40115442df in std::pair<unsigned long, bool> absl::lts_20230802::container_internal::raw_hash_set<absl::lts_20230802::container_internal::FlatHashMapPolicy<stream_executor::ModuleHandle, std::pair<ihipModule_t*, unsigned long>>, absl::lts_20230802::hash_internal::Hash<stream_executor::ModuleHandle>, std::equal_to<stream_executor::ModuleHandle>, std::allocator<std::pair<stream_executor::ModuleHandle const, std::pair<ihipModule_t*, unsigned long>>>>::find_or_prepare_insert<stream_executor::ModuleHandle>(stream_executor::ModuleHandle const&) /home/atheodor/projects/tmp/xla_asan/execroot/xla/external/com_google_absl/absl/container/internal/raw_hash_set.h:2659:13 #6 0x7f4011524701 in std::pair<absl::lts_20230802::container_internal::raw_hash_set<absl::lts_20230802::container_internal::FlatHashMapPolicy<stream_executor::ModuleHandle, std::pair<ihipModule_t*, unsigned long>>, absl::lts_20230802::hash_internal::Hash<stream_executor::ModuleHandle>, std::equal_to<stream_executor::ModuleHandle>, std::allocator<std::pair<stream_executor::ModuleHandle const, std::pair<ihipModule_t*, unsigned long>>>>::iterator, bool> absl::lts_20230802::container_internal::raw_hash_map<absl::lts_20230802::container_internal::FlatHashMapPolicy<stream_executor::ModuleHandle, std::pair<ihipModule_t*, unsigned long>>, absl::lts_20230802::hash_internal::Hash<stream_executor::ModuleHandle>, std::equal_to<stream_executor::ModuleHandle>, std::allocator<std::pair<stream_executor::ModuleHandle const, std::pair<ihipModule_t*, unsigned long>>>>::try_emplace_impl<stream_executor::ModuleHandle const&>(stream_executor::ModuleHandle const&) /home/atheodor/projects/tmp/xla_asan/execroot/xla/external/com_google_absl/absl/container/internal/raw_hash_map.h:202:22 #7 0x7f4011524701 in std::pair<absl::lts_20230802::container_internal::raw_hash_set<absl::lts_20230802::container_internal::FlatHashMapPolicy<stream_executor::ModuleHandle, std::pair<ihipModule_t*, unsigned long>>, absl::lts_20230802::hash_internal::Hash<stream_executor::ModuleHandle>, std::equal_to<stream_executor::ModuleHandle>, std::allocator<std::pair<stream_executor::ModuleHandle const, std::pair<ihipModule_t*, unsigned long>>>>::iterator, bool> absl::lts_20230802::container_internal::raw_hash_map<absl::lts_20230802::container_internal::FlatHashMapPolicy<stream_executor::ModuleHandle, std::pair<ihipModule_t*, unsigned long>>, absl::lts_20230802::hash_internal::Hash<stream_executor::ModuleHandle>, std::equal_to<stream_executor::ModuleHandle>, std::allocator<std::pair<stream_executor::ModuleHandle const, std::pair<ihipModule_t*, unsigned long>>>>::try_emplace<stream_executor::ModuleHandle, 0>(stream_executor::ModuleHandle const&) /home/atheodor/projects/tmp/xla_asan/execroot/xla/external/com_google_absl/absl/container/internal/raw_hash_map.h:139:12 #8 0x7f4011524701 in decltype(absl::lts_20230802::container_internal::FlatHashMapPolicy<stream_executor::ModuleHandle, std::pair<ihipModule_t*, unsigned long>>::value(std::pair<stream_executor::ModuleHandle const, std::pair<ihipModule_t*, unsigned long>>* std::addressof<std::pair<stream_executor::ModuleHandle const, std::pair<ihipModule_t*, unsigned long>>>(std::pair<stream_executor::ModuleHandle const, std::pair<ihipModule_t*, unsigned long>>&)(decltype(__declval<std::pair<stream_executor::ModuleHandle const, std::pair<ihipModule_t*, unsigned long>>>(0)) std::declval<std::pair<stream_executor::ModuleHandle const, std::pair<ihipModule_t*, unsigned long>>&>()()))) absl::lts_20230802::container_internal::raw_hash_map<absl::lts_20230802::container_internal::FlatHashMapPolicy<stream_executor::ModuleHandle, std::pair<ihipModule_t*, unsigned long>>, absl::lts_20230802::hash_internal::Hash<stream_executor::ModuleHandle>, std::equal_to<stream_executor::ModuleHandle>, std::allocator<std::pair<stream_executor::ModuleHandle const, std::pair<ihipModule_t*, unsigned long>>>>::operator[]<stream_executor::ModuleHandle, absl::lts_20230802::container_internal::FlatHashMapPolicy<stream_executor::ModuleHandle, std::pair<ihipModule_t*, unsigned long>>>(stream_executor::ModuleHandle const&) /home/atheodor/projects/tmp/xla_asan/execroot/xla/external/com_google_absl/absl/container/internal/raw_hash_map.h:184:28 #9 0x7f4011524701 in stream_executor::gpu::RocmExecutor::LoadModuleFromHsaco(char const*) /home/atheodor/projects/tmp/xla_asan/execroot/xla/xla/stream_executor/rocm/rocm_executor.cc:717:39 #10 0x7f4011524387 in stream_executor::gpu::RocmExecutor::LoadModule(stream_executor::MultiModuleLoaderSpec const&) /home/atheodor/projects/tmp/xla_asan/execroot/xla/xla/stream_executor/rocm/rocm_executor.cc:705:12 #11 0x7f405deeae34 in xla::gpu::GpuExecutable::ResolveConstantGlobals(stream_executor::Stream*) /home/atheodor/projects/tmp/xla_asan/execroot/xla/xla/service/gpu/gpu_executable.cc:499:5 #12 0x7f405def050a in xla::gpu::GpuExecutable::ExecuteAsyncOnStreamImpl(xla::ServiceExecutableRunOptions const*, std::variant<absl::lts_20230802::Span<xla::ShapedBuffer const* const>, absl::lts_20230802::Span<xla::ExecutionInput>>) /home/atheodor/projects/tmp/xla_asan/execroot/xla/xla/service/gpu/gpu_executable.cc:703:5 #13 0x7f405deefc6f in xla::gpu::GpuExecutable::ExecuteAsyncOnStream(xla::ServiceExecutableRunOptions const*, std::vector<xla::ExecutionInput, std::allocator<xla::ExecutionInput>>) /home/atheodor/projects/tmp/xla_asan/execroot/xla/xla/service/gpu/gpu_executable.cc:661:10 #14 0x7f401607a78e in xla::Executable::ExecuteAsyncOnStreamWrapper(xla::ServiceExecutableRunOptions const*, std::vector<xla::ExecutionInput, std::allocator<xla::ExecutionInput>>) /home/atheodor/projects/tmp/xla_asan/execroot/xla/xla/service/executable.cc:229:7 #15 0x7f4016079fd3 in xla::Executable::ExecuteOnStreamWrapper(xla::ServiceExecutableRunOptions const*, std::vector<xla::ExecutionInput, std::allocator<xla::ExecutionInput>>) /home/atheodor/projects/tmp/xla_asan/execroot/xla/xla/service/executable.cc:153:7 #16 0x7f407b7ea78b in xla::HloRunner::ExecuteWithExecutionInputs(xla::Executable*, std::vector<xla::ExecutionInput, std::allocator<xla::ExecutionInput>>, xla::ExecutionProfile*) /home/atheodor/projects/tmp/xla_asan/execroot/xla/xla/service/hlo_runner.cc:448:3 #17 0x7f407b7ecde2 in xla::HloRunner::ExecuteWithMovedDeviceBuffers(xla::Executable*, std::vector<xla::ScopedShapedBuffer, std::allocator<xla::ScopedShapedBuffer>>, xla::ExecutionProfile*) /home/atheodor/projects/tmp/xla_asan/execroot/xla/xla/service/hlo_runner.cc:415:3 #18 0x7f407b7e642a in xla::HloRunner::ExecuteWithMovedDeviceBuffersAndBufferAssignment(std::unique_ptr<xla::HloModule, std::default_delete<xla::HloModule>>, xla::BufferAssignmentProto const*, std::vector<xla::ScopedShapedBuffer, std::allocator<xla::ScopedShapedBuffer>>, bool, xla::ExecutionProfile*) /home/atheodor/projects/tmp/xla_asan/execroot/xla/xla/service/hlo_runner.cc:398:10 #19 0x7f407b7e57c3 in xla::HloRunner::Execute(std::unique_ptr<xla::HloModule, std::default_delete<xla::HloModule>>, absl::lts_20230802::Span<xla::Literal const* const>, bool, xla::ExecutionProfile*) /home/atheodor/projects/tmp/xla_asan/execroot/xla/xla/service/hlo_runner.cc:221:3 #20 0x55b8a3cb4622 in xla::HloRunnerInterface::Execute(std::unique_ptr<xla::HloModule, std::default_delete<xla::HloModule>>, absl::lts_20230802::Span<xla::Literal const* const>, bool) /home/atheodor/projects/tmp/xla_asan/execroot/xla/./xla/service/hlo_runner_interface.h:244:12 #21 0x55b8a3cb4622 in xla::HloRunnerAgnosticReferenceMixin<xla::HloRunnerAgnosticTestBase>::RunAndCompareInternal(std::unique_ptr<xla::HloModule, std::default_delete<xla::HloModule>>, absl::lts_20230802::Span<xla::Literal* const>, std::optional<xla::ErrorSpec> const&, bool, std::function<void (xla::HloModule*)> const&, std::function<void (xla::HloModule*)> const&) /home/atheodor/projects/tmp/xla_asan/execroot/xla/./xla/tests/hlo_runner_agnostic_reference_mixin.h:238:5 #22 0x55b8a3cbf766 in xla::HloRunnerAgnosticReferenceMixin<xla::HloRunnerAgnosticTestBase>::RunAndCompare(std::unique_ptr<xla::HloModule, std::default_delete<xla::HloModule>>, absl::lts_20230802::Span<xla::Literal* const>, std::optional<xla::ErrorSpec> const&, std::function<void (xla::HloModule*)> const&, std::function<void (xla::HloModule*)> const&) /home/atheodor/projects/tmp/xla_asan/execroot/xla/./xla/tests/hlo_runner_agnostic_reference_mixin.h:94:9 #23 0x55b8a3cbf235 in xla::HloRunnerAgnosticReferenceMixin<xla::HloRunnerAgnosticTestBase>::RunAndCompare(std::unique_ptr<xla::HloModule, std::default_delete<xla::HloModule>>, std::optional<xla::ErrorSpec> const&, std::function<void (xla::HloModule*)> const&, std::function<void (xla::HloModule*)> const&, std::optional<long>) /home/atheodor/projects/tmp/xla_asan/execroot/xla/./xla/tests/hlo_runner_agnostic_reference_mixin.h:140:12 #24 0x55b8a3cceda8 in xla::(anonymous namespace)::ElementalIrEmitterExecutionTest::RunTypeConversionTest(std::basic_string_view<char, std::char_traits<char>>) /home/atheodor/projects/tmp/xla_asan/execroot/xla/xla/service/elemental_ir_emitter_test.cc:76:5 #25 0x55b8a3cd8cf3 in xla::(anonymous namespace)::ElementalIrEmitterExecutionTypedTest_ConvertFloatsToFloat_Test<ml_dtypes::float8_internal::float8_e5m2>::TestBody() /home/atheodor/projects/tmp/xla_asan/execroot/xla/xla/service/elemental_ir_emitter_test.cc:472:36 #26 0x7f407b2f09dd in void testing::internal::HandleSehExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) /home/atheodor/projects/tmp/xla_asan/execroot/xla/external/com_google_googletest/googletest/src/gtest.cc:2664:10 #27 0x7f407b2f09dd in void testing::internal::HandleExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) /home/atheodor/projects/tmp/xla_asan/execroot/xla/external/com_google_googletest/googletest/src/gtest.cc:2700:14 #28 0x7f407b2f0708 in testing::Test::Run() /home/atheodor/projects/tmp/xla_asan/execroot/xla/external/com_google_googletest/googletest/src/gtest.cc:2739:5 #29 0x7f407b2f371b in testing::TestInfo::Run() /home/atheodor/projects/tmp/xla_asan/execroot/xla/external/com_google_googletest/googletest/src/gtest.cc:2885:11 #30 0x7f407b2f65ab in testing::TestSuite::Run() /home/atheodor/projects/tmp/xla_asan/execroot/xla/external/com_google_googletest/googletest/src/gtest.cc:3063:30 #31 0x7f407b322eba in testing::internal::UnitTestImpl::RunAllTests() /home/atheodor/projects/tmp/xla_asan/execroot/xla/external/com_google_googletest/googletest/src/gtest.cc:6054:44 #32 0x7f407b32179d in bool testing::internal::HandleSehExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool>(testing::internal::UnitTestImpl*, bool (testing::internal::UnitTestImpl::*)(), char const*) /home/atheodor/projects/tmp/xla_asan/execroot/xla/external/com_google_googletest/googletest/src/gtest.cc:2664:10 #33 0x7f407b32179d in bool testing::internal::HandleExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool>(testing::internal::UnitTestImpl*, bool (testing::internal::UnitTestImpl::*)(), char const*) /home/atheodor/projects/tmp/xla_asan/execroot/xla/external/com_google_googletest/googletest/src/gtest.cc:2700:14 #34 0x7f407b321203 in testing::UnitTest::Run() /home/atheodor/projects/tmp/xla_asan/execroot/xla/external/com_google_googletest/googletest/src/gtest.cc:5594:10 #35 0x7f407b3f59b8 in RUN_ALL_TESTS() /home/atheodor/projects/tmp/xla_asan/execroot/xla/external/com_google_googletest/googletest/include/gtest/gtest.h:2334:73 #36 0x7f407b3f59b8 in main /home/atheodor/projects/tmp/xla_asan/execroot/xla/xla/tests/xla_internal_test_main.cc:65:10 #37 0x7f4004766d8f in __libc_start_call_main csu/../sysdeps/nptl/libc_start_call_main.h:58:16 SUMMARY: AddressSanitizer: use-after-poison /home/atheodor/projects/tmp/xla_asan/execroot/xla/xla/stream_executor/rocm/rocm_executor.cc:596:23 in stream_executor::gpu::RocmExecutor::UnloadGpuBinary(stream_executor::ModuleHandle) Shadow bytes around the buggy address: 0x506000843780: fa fa fa fa fd fd fd fd fd fd fd fa fa fa fa fa 0x506000843800: fd fd fd fd fd fd fd fa fa fa fa fa fd fd fd fd 0x506000843880: fd fd fd fa fa fa fa fa fd fd fd fd fd fd fd fd 0x506000843900: fa fa fa fa 00 00 00 00 00 00 00 fa fa fa fa fa 0x506000843980: fd fd fd fd fd fd fd fa fa fa fa fa 00 00 00 00 =>0x506000843a00: f7[f7]f7 fa fa fa fa fa 00 00 00 00 00 00 00 00 0x506000843a80: fa fa fa fa 00 00 00 00 00 00 00 fa fa fa fa fa 0x506000843b00: 00 00 00 00 00 00 00 fa fa fa fa fa 00 00 00 00 0x506000843b80: 00 00 00 fa fa fa fa fa 00 00 00 00 00 00 00 fa 0x506000843c00: fa fa fa fa 00 00 00 00 00 00 00 fa fa fa fa fa 0x506000843c80: 00 00 00 00 00 00 00 fa fa fa fa fa fd fd fd fd Shadow byte legend (one shadow byte represents 8 application bytes): Addressable: 00 Partially addressable: 01 02 03 04 05 06 07 Heap left redzone: fa Freed heap region: fd Stack left redzone: f1 Stack mid redzone: f2 Stack right redzone: f3 Stack after return: f5 Stack use after scope: f8 Global redzone: f9 Global init order: f6 Poisoned by user: f7 Container overflow: fc Array cookie: ac Intra object redzone: bb ASan internal: fe Left alloca redzone: ca Right alloca redzone: cb ==2457579==ABORTING ``` Copybara import of the project: -- 8f74d4c822d951b5a213500ea9396ed7b160871d by alekstheod <atheodor@amd.com>: Fix asan report memory access vialation in rocm_executor Merging this change closes #24900 PiperOrigin-RevId: 745592235
Imported from GitHub PR openxla/xla#25269 Reported issue: ``` exec ${PAGER:-/usr/bin/less} "$0" || exit 1 Executing tests from //xla/service:compiler_test_gpu_amd_any ----------------------------------------------------------------------------- Running test /root/.cache/bazel/_bazel_root/f367074f9120c6f1a67d35844ac058a3/execroot/xla/bazel-out/k8-dbg/bin/xla/service/compiler_test_gpu_amd_any.runfiles/xla/xla/service/compiler_test_gpu_amd_any --gtest_shuffle --gtest_fail_if_no_test_linked on GPU 0 ================================================================= ==168009==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x50400002c1c0 at pc 0x7f59e50b52e7 bp 0x7ffc8c2358d0 sp 0x7ffc8c2358c8 READ of size 8 at 0x50400002c1c0 thread T0 #0 0x7f59e50b52e6 in absl::lts_20230802::container_internal::CommonFields::capacity() const /root/.cache/bazel/_bazel_root/f367074f9120c6f1a67d35844ac058a3/execroot/xla/external/com_google_absl/absl/container/internal/raw_hash_set.h:990:36 #1 0x7f59e50b52e6 in absl::lts_20230802::container_internal::probe(absl::lts_20230802::container_internal::CommonFields const&, unsigned long) /root/.cache/bazel/_bazel_root/f367074f9120c6f1a67d35844ac058a3/execroot/xla/external/com_google_absl/absl/container/internal/raw_hash_set.h:1298:41 #2 0x7f59e50b52e6 in std::pair<unsigned long, bool> absl::lts_20230802::container_internal::raw_hash_set<absl::lts_20230802::container_internal::FlatHashMapPolicy<std::tuple<std::type_index, void*>, stream_executor::MultiKernelLoaderSpec>, absl::lts_20230802::hash_internal::Hash<std::tuple<std::type_index, void*>>, std::equal_to<std::tuple<std::type_index, void*>>, std::allocator<std::pair<std::tuple<std::type_index, void*> const, stream_executor::MultiKernelLoaderSpec>>>::find_or_prepare_insert<std::tuple<std::type_index, void*>>(std::tuple<std::type_index, void*> const&) /root/.cache/bazel/_bazel_root/f367074f9120c6f1a67d35844ac058a3/execroot/xla/external/com_google_absl/absl/container/internal/raw_hash_set.h:2645:16 #3 0x7f59e50af8a8 in std::pair<absl::lts_20230802::container_internal::raw_hash_set<absl::lts_20230802::container_internal::FlatHashMapPolicy<std::tuple<std::type_index, void*>, stream_executor::MultiKernelLoaderSpec>, absl::lts_20230802::hash_internal::Hash<std::tuple<std::type_index, void*>>, std::equal_to<std::tuple<std::type_index, void*>>, std::allocator<std::pair<std::tuple<std::type_index, void*> const, stream_executor::MultiKernelLoaderSpec>>>::iterator, bool> absl::lts_20230802::container_internal::raw_hash_set<absl::lts_20230802::container_internal::FlatHashMapPolicy<std::tuple<std::type_index, void*>, stream_executor::MultiKernelLoaderSpec>, absl::lts_20230802::hash_internal::Hash<std::tuple<std::type_index, void*>>, std::equal_to<std::tuple<std::type_index, void*>>, std::allocator<std::pair<std::tuple<std::type_index, void*> const, stream_executor::MultiKernelLoaderSpec>>>::EmplaceDecomposable::operator()<std::tuple<std::type_index, void*>, std::piecewise_construct_t const&, std::tuple<std::tuple<std::type_index, void*>&&>, std::tuple<stream_executor::MultiKernelLoaderSpec&&>>(std::tuple<std::type_index, void*> const&, std::piecewise_construct_t const&, std::tuple<std::tuple<std::type_index, void*>&&>&&, std::tuple<stream_executor::MultiKernelLoaderSpec&&>&&) const /root/.cache/bazel/_bazel_root/f367074f9120c6f1a67d35844ac058a3/execroot/xla/external/com_google_absl/absl/container/internal/raw_hash_set.h:2459:20 #4 0x7f59e50af8a8 in decltype(std::declval<absl::lts_20230802::container_internal::raw_hash_set<absl::lts_20230802::container_internal::FlatHashMapPolicy<std::tuple<std::type_index, void*>, stream_executor::MultiKernelLoaderSpec>, absl::lts_20230802::hash_internal::Hash<std::tuple<std::type_index, void*>>, std::equal_to<std::tuple<std::type_index, void*>>, std::allocator<std::pair<std::tuple<std::type_index, void*> const, stream_executor::MultiKernelLoaderSpec>>>::EmplaceDecomposable>()(std::declval<std::tuple<std::type_index, void*>&& const&>(), std::piecewise_construct, std::declval<std::tuple<std::tuple<std::type_index, void*>&&>>(), std::declval<std::tuple<stream_executor::MultiKernelLoaderSpec&&>>())) absl::lts_20230802::container_internal::memory_internal::DecomposePairImpl<absl::lts_20230802::container_internal::raw_hash_set<absl::lts_20230802::container_internal::FlatHashMapPolicy<std::tuple<std::type_index, void*>, stream_executor::MultiKernelLoaderSpec>, absl::lts_20230802::hash_internal::Hash<std::tuple<std::type_index, void*>>, std::equal_to<std::tuple<std::type_index, void*>>, std::allocator<std::pair<std::tuple<std::type_index, void*> const, stream_executor::MultiKernelLoaderSpec>>>::EmplaceDecomposable, std::tuple<std::type_index, void*>&&, std::tuple<stream_executor::MultiKernelLoaderSpec&&>>(absl::lts_20230802::container_internal::raw_hash_set<absl::lts_20230802::container_internal::FlatHashMapPolicy<std::tuple<std::type_index, void*>, stream_executor::MultiKernelLoaderSpec>, absl::lts_20230802::hash_internal::Hash<std::tuple<std::type_index, void*>>, std::equal_to<std::tuple<std::type_index, void*>>, std::allocator<std::pair<std::tuple<std::type_index, void*> const, stream_executor::MultiKernelLoaderSpec>>>::EmplaceDecomposable&&, std::pair<std::tuple<std::tuple<std::type_index, void*>&&>, std::tuple<stream_executor::MultiKernelLoaderSpec&&>>) /root/.cache/bazel/_bazel_root/f367074f9120c6f1a67d35844ac058a3/execroot/xla/external/com_google_absl/absl/container/internal/container_memory.h:140:10 #5 0x7f59e50af8a8 in decltype(memory_internal::DecomposePairImpl(std::forward<absl::lts_20230802::container_internal::raw_hash_set<absl::lts_20230802::container_internal::FlatHashMapPolicy<std::tuple<std::type_index, void*>, stream_executor::MultiKernelLoaderSpec>, absl::lts_20230802::hash_internal::Hash<std::tuple<std::type_index, void*>>, std::equal_to<std::tuple<std::type_index, void*>>, std::allocator<std::pair<std::tuple<std::type_index, void*> const, stream_executor::MultiKernelLoaderSpec>>>::EmplaceDecomposable>(fp), PairArgs(std::forward<std::pair<std::tuple<std::type_index, void*>, stream_executor::MultiKernelLoaderSpec>>(fp0)))) absl::lts_20230802::container_internal::DecomposePair<absl::lts_20230802::container_internal::raw_hash_set<absl::lts_20230802::container_internal::FlatHashMapPolicy<std::tuple<std::type_index, void*>, stream_executor::MultiKernelLoaderSpec>, absl::lts_20230802::hash_internal::Hash<std::tuple<std::type_index, void*>>, std::equal_to<std::tuple<std::type_index, void*>>, std::allocator<std::pair<std::tuple<std::type_index, void*> const, stream_executor::MultiKernelLoaderSpec>>>::EmplaceDecomposable, std::pair<std::tuple<std::type_index, void*>, stream_executor::MultiKernelLoaderSpec>>(absl::lts_20230802::container_internal::raw_hash_set<absl::lts_20230802::container_internal::FlatHashMapPolicy<std::tuple<std::type_index, void*>, stream_executor::MultiKernelLoaderSpec>, absl::lts_20230802::hash_internal::Hash<std::tuple<std::type_index, void*>>, std::equal_to<std::tuple<std::type_index, void*>>, std::allocator<std::pair<std::tuple<std::type_index, void*> const, stream_executor::MultiKernelLoaderSpec>>>::EmplaceDecomposable&&, std::pair<std::tuple<std::type_index, void*>, stream_executor::MultiKernelLoaderSpec>&&) /root/.cache/bazel/_bazel_root/f367074f9120c6f1a67d35844ac058a3/execroot/xla/external/com_google_absl/absl/container/internal/container_memory.h:207:10 #6 0x7f59e50af8a8 in decltype(absl::container_internal::DecomposePair(std::declval<absl::lts_20230802::container_internal::raw_hash_set<absl::lts_20230802::container_internal::FlatHashMapPolicy<std::tuple<std::type_index, void*>, stream_executor::MultiKernelLoaderSpec>, absl::lts_20230802::hash_internal::Hash<std::tuple<std::type_index, void*>>, std::equal_to<std::tuple<std::type_index, void*>>, std::allocator<std::pair<std::tuple<std::type_index, void*> const, stream_executor::MultiKernelLoaderSpec>>>::EmplaceDecomposable>(), std::declval<std::pair<std::tuple<std::type_index, void*>, stream_executor::MultiKernelLoaderSpec>>())) absl::lts_20230802::container_internal::FlatHashMapPolicy<std::tuple<std::type_index, void*>, stream_executor::MultiKernelLoaderSpec>::apply<absl::lts_20230802::container_internal::raw_hash_set<absl::lts_20230802::container_internal::FlatHashMapPolicy<std::tuple<std::type_index, void*>, stream_executor::MultiKernelLoaderSpec>, absl::lts_20230802::hash_internal::Hash<std::tuple<std::type_index, void*>>, std::equal_to<std::tuple<std::type_index, void*>>, std::allocator<std::pair<std::tuple<std::type_index, void*> const, stream_executor::MultiKernelLoaderSpec>>>::EmplaceDecomposable, std::pair<std::tuple<std::type_index, void*>, stream_executor::MultiKernelLoaderSpec>>(absl::lts_20230802::container_internal::raw_hash_set<absl::lts_20230802::container_internal::FlatHashMapPolicy<std::tuple<std::type_index, void*>, stream_executor::MultiKernelLoaderSpec>, absl::lts_20230802::hash_internal::Hash<std::tuple<std::type_index, void*>>, std::equal_to<std::tuple<std::type_index, void*>>, std::allocator<std::pair<std::tuple<std::type_index, void*> const, stream_executor::MultiKernelLoaderSpec>>>::EmplaceDecomposable&&, std::pair<std::tuple<std::type_index, void*>, stream_executor::MultiKernelLoaderSpec>&&) /root/.cache/bazel/_bazel_root/f367074f9120c6f1a67d35844ac058a3/execroot/xla/external/com_google_absl/absl/container/flat_hash_map.h:591:12 #7 0x7f59e50af8a8 in decltype(absl::lts_20230802::container_internal::FlatHashMapPolicy<std::tuple<std::type_index, void*>, stream_executor::MultiKernelLoaderSpec>::apply(std::forward<absl::lts_20230802::container_internal::raw_hash_set<absl::lts_20230802::container_internal::FlatHashMapPolicy<std::tuple<std::type_index, void*>, stream_executor::MultiKernelLoaderSpec>, absl::lts_20230802::hash_internal::Hash<std::tuple<std::type_index, void*>>, std::equal_to<std::tuple<std::type_index, void*>>, std::allocator<std::pair<std::tuple<std::type_index, void*> const, stream_executor::MultiKernelLoaderSpec>>>::EmplaceDecomposable>(fp), std::forward<std::pair<std::tuple<std::type_index, void*>, stream_executor::MultiKernelLoaderSpec>>(fp0))) absl::lts_20230802::container_internal::hash_policy_traits<absl::lts_20230802::container_internal::FlatHashMapPolicy<std::tuple<std::type_index, void*>, stream_executor::MultiKernelLoaderSpec>, void>::apply<absl::lts_20230802::container_internal::raw_hash_set<absl::lts_20230802::container_internal::FlatHashMapPolicy<std::tuple<std::type_index, void*>, stream_executor::MultiKernelLoaderSpec>, absl::lts_20230802::hash_internal::Hash<std::tuple<std::type_index, void*>>, std::equal_to<std::tuple<std::type_index, void*>>, std::allocator<std::pair<std::tuple<std::type_index, void*> const, stream_executor::MultiKernelLoaderSpec>>>::EmplaceDecomposable, std::pair<std::tuple<std::type_index, void*>, stream_executor::MultiKernelLoaderSpec>, absl::lts_20230802::container_internal::FlatHashMapPolicy<std::tuple<std::type_index, void*>, stream_executor::MultiKernelLoaderSpec>>(absl::lts_20230802::container_internal::raw_hash_set<absl::lts_20230802::container_internal::FlatHashMapPolicy<std::tuple<std::type_index, void*>, stream_executor::MultiKernelLoaderSpec>, absl::lts_20230802::hash_internal::Hash<std::tuple<std::type_index, void*>>, std::equal_to<std::tuple<std::type_index, void*>>, std::allocator<std::pair<std::tuple<std::type_index, void*> const, stream_executor::MultiKernelLoaderSpec>>>::EmplaceDecomposable&&, std::pair<std::tuple<std::type_index, void*>, stream_executor::MultiKernelLoaderSpec>&&) /root/.cache/bazel/_bazel_root/f367074f9120c6f1a67d35844ac058a3/execroot/xla/external/com_google_absl/absl/container/internal/hash_policy_traits.h:134:12 #8 0x7f59e50af8a8 in std::pair<absl::lts_20230802::container_internal::raw_hash_set<absl::lts_20230802::container_internal::FlatHashMapPolicy<std::tuple<std::type_index, void*>, stream_executor::MultiKernelLoaderSpec>, absl::lts_20230802::hash_internal::Hash<std::tuple<std::type_index, void*>>, std::equal_to<std::tuple<std::type_index, void*>>, std::allocator<std::pair<std::tuple<std::type_index, void*> const, stream_executor::MultiKernelLoaderSpec>>>::iterator, bool> absl::lts_20230802::container_internal::raw_hash_set<absl::lts_20230802::container_internal::FlatHashMapPolicy<std::tuple<std::type_index, void*>, stream_executor::MultiKernelLoaderSpec>, absl::lts_20230802::hash_internal::Hash<std::tuple<std::type_index, void*>>, std::equal_to<std::tuple<std::type_index, void*>>, std::allocator<std::pair<std::tuple<std::type_index, void*> const, stream_executor::MultiKernelLoaderSpec>>>::emplace<std::pair<std::tuple<std::type_index, void*>, stream_executor::MultiKernelLoaderSpec>, 0>(std::pair<std::tuple<std::type_index, void*>, stream_executor::MultiKernelLoaderSpec>&&) /root/.cache/bazel/_bazel_root/f367074f9120c6f1a67d35844ac058a3/execroot/xla/external/com_google_absl/absl/container/internal/raw_hash_set.h:2064:12 #9 0x7f59e50af8a8 in absl::lts_20230802::container_internal::raw_hash_set<absl::lts_20230802::container_internal::FlatHashMapPolicy<std::tuple<std::type_index, void*>, stream_executor::MultiKernelLoaderSpec>, absl::lts_20230802::hash_internal::Hash<std::tuple<std::type_index, void*>>, std::equal_to<std::tuple<std::type_index, void*>>, std::allocator<std::pair<std::tuple<std::type_index, void*> const, stream_executor::MultiKernelLoaderSpec>>>::insert(std::pair<std::tuple<std::type_index, void*>, stream_executor::MultiKernelLoaderSpec>&&) /root/.cache/bazel/_bazel_root/f367074f9120c6f1a67d35844ac058a3/execroot/xla/external/com_google_absl/absl/container/internal/raw_hash_set.h:1991:12 #10 0x7f59e50af8a8 in stream_executor::gpu::GpuKernelRegistry::RegisterKernel(std::type_info const&, void*, stream_executor::MultiKernelLoaderSpec const&) /root/.cache/bazel/_bazel_root/f367074f9120c6f1a67d35844ac058a3/execroot/xla/xla/stream_executor/gpu/gpu_kernel_registry.cc:67:45 #11 0x7f59e50d1982 in absl::lts_20230802::Status stream_executor::gpu::GpuKernelRegistry::RegisterKernel<stream_executor::gpu::MakeBatchPointersKernel>(void*, stream_executor::MultiKernelLoaderSpec const&) /root/.cache/bazel/_bazel_root/f367074f9120c6f1a67d35844ac058a3/execroot/xla/./xla/stream_executor/gpu/gpu_kernel_registry.h:86:12 #12 0x7f59e50d1982 in RegisterKernelMakeBatchPointersKernelRocmImpl() /root/.cache/bazel/_bazel_root/f367074f9120c6f1a67d35844ac058a3/execroot/xla/xla/stream_executor/rocm/make_batch_pointers_kernel_rocm.cu.cc:35:1 #13 0x7f59e50d1982 in 'lambda'()::operator()() const /root/.cache/bazel/_bazel_root/f367074f9120c6f1a67d35844ac058a3/execroot/xla/xla/stream_executor/rocm/make_batch_pointers_kernel_rocm.cu.cc:35:1 #14 0x7f59e50d1982 in 'lambda'()::__invoke() /root/.cache/bazel/_bazel_root/f367074f9120c6f1a67d35844ac058a3/execroot/xla/xla/stream_executor/rocm/make_batch_pointers_kernel_rocm.cu.cc:35:1 #15 0x7f59e50d1982 in stream_executor::port::Initializer::Initializer(void (*)()) /root/.cache/bazel/_bazel_root/f367074f9120c6f1a67d35844ac058a3/execroot/xla/./xla/stream_executor/platform/default/initialize.h:26:42 #16 0x7f59e50d1982 in __cxx_global_var_init.1 /root/.cache/bazel/_bazel_root/f367074f9120c6f1a67d35844ac058a3/execroot/xla/xla/stream_executor/rocm/make_batch_pointers_kernel_rocm.cu.cc:35:1 #17 0x7f59e50d1982 in _GLOBAL__sub_I_make_batch_pointers_kernel_rocm.cu.cc /root/.cache/bazel/_bazel_root/f367074f9120c6f1a67d35844ac058a3/execroot/xla/xla/stream_executor/rocm/make_batch_pointers_kernel_rocm.cu.cc #18 0x7f5a5b27a47d in call_init elf/dl-init.c:70:3 #19 0x7f5a5b27a567 in call_init elf/dl-init.c:33:6 #20 0x7f5a5b27a567 in _dl_init elf/dl-init.c:117:5 #21 0x7f5a5b2942c9 (/lib64/ld-linux-x86-64.so.2+0x202c9) (BuildId: e4de036b19e4768e7591b596c4be9f9015f2d28a) 0x50400002c1c0 is located 8 bytes after 40-byte region [0x50400002c190,0x50400002c1b8) allocated by thread T0 here: #0 0x557d0f77fcdf in malloc (/root/.cache/bazel/_bazel_root/f367074f9120c6f1a67d35844ac058a3/execroot/xla/bazel-out/k8-dbg/bin/xla/service/compiler_test_gpu_amd_any+0x1e8cdf) (BuildId: e96972f8c7f880083ff6ad5985d3c06d) #1 0x7f59d733098b in operator new(unsigned long) (/lib/x86_64-linux-gnu/libstdc++.so.6+0xae98b) (BuildId: e37fe1a879783838de78cbc8c80621fa685d58a2) SUMMARY: AddressSanitizer: heap-buffer-overflow /root/.cache/bazel/_bazel_root/f367074f9120c6f1a67d35844ac058a3/execroot/xla/external/com_google_absl/absl/container/internal/raw_hash_set.h:990:36 in absl::lts_20230802::container_internal::CommonFields::capacity() const Shadow bytes around the buggy address: 0x50400002bf00: fa fa fd fd fd fd fd fd fa fa fd fd fd fd fd fa 0x50400002bf80: fa fa fd fd fd fd fd fa fa fa fd fd fd fd fd fa 0x50400002c000: fa fa fd fd fd fd fd fa fa fa 00 00 00 00 00 fa 0x50400002c080: fa fa fd fd fd fd fd fd fa fa 00 00 00 00 00 00 0x50400002c100: fa fa 00 00 00 00 00 00 fa fa 00 00 00 00 00 00 =>0x50400002c180: fa fa 00 00 00 00 00 fa[fa]fa fa fa fa fa fa fa 0x50400002c200: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa 0x50400002c280: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa 0x50400002c300: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa 0x50400002c380: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa 0x50400002c400: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa Shadow byte legend (one shadow byte represents 8 application bytes): Addressable: 00 Partially addressable: 01 02 03 04 05 06 07 Heap left redzone: fa Freed heap region: fd Stack left redzone: f1 Stack mid redzone: f2 Stack right redzone: f3 Stack after return: f5 Stack use after scope: f8 Global redzone: f9 Global init order: f6 Poisoned by user: f7 Container overflow: fc Array cookie: ac Intra object redzone: bb ASan internal: fe Left alloca redzone: ca Right alloca redzone: cb ==168009==ABORTING ``` Why this fixes the issue: * Consider compiling this class into a different .so files where this function will get inlined and we will get different instances while we still want to have singleton. * In rocm compiler wrapper script we do not yet support sanitizer flags so our cu.cc files are not getting instrumented while our normal cc files do! This might cause a memory disalignment while running with asan (theory). Copybara import of the project: -- ffcd58918137191cdba6db571e0e5af0e57de2e1 by alekstheod <atheodor@amd.com>: Fix asan issue do to a singleton in header file Merging this change closes #25269 FUTURE_COPYBARA_INTEGRATE_REVIEW=openxla/xla#25269 from ROCm:ci_fix_singleton_in_header_file ffcd58918137191cdba6db571e0e5af0e57de2e1 PiperOrigin-RevId: 747815699
Imported from GitHub PR openxla/xla#25269 Reported issue: ``` exec ${PAGER:-/usr/bin/less} "$0" || exit 1 Executing tests from //xla/service:compiler_test_gpu_amd_any ----------------------------------------------------------------------------- Running test /root/.cache/bazel/_bazel_root/f367074f9120c6f1a67d35844ac058a3/execroot/xla/bazel-out/k8-dbg/bin/xla/service/compiler_test_gpu_amd_any.runfiles/xla/xla/service/compiler_test_gpu_amd_any --gtest_shuffle --gtest_fail_if_no_test_linked on GPU 0 ================================================================= ==168009==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x50400002c1c0 at pc 0x7f59e50b52e7 bp 0x7ffc8c2358d0 sp 0x7ffc8c2358c8 READ of size 8 at 0x50400002c1c0 thread T0 #0 0x7f59e50b52e6 in absl::lts_20230802::container_internal::CommonFields::capacity() const /root/.cache/bazel/_bazel_root/f367074f9120c6f1a67d35844ac058a3/execroot/xla/external/com_google_absl/absl/container/internal/raw_hash_set.h:990:36 #1 0x7f59e50b52e6 in absl::lts_20230802::container_internal::probe(absl::lts_20230802::container_internal::CommonFields const&, unsigned long) /root/.cache/bazel/_bazel_root/f367074f9120c6f1a67d35844ac058a3/execroot/xla/external/com_google_absl/absl/container/internal/raw_hash_set.h:1298:41 #2 0x7f59e50b52e6 in std::pair<unsigned long, bool> absl::lts_20230802::container_internal::raw_hash_set<absl::lts_20230802::container_internal::FlatHashMapPolicy<std::tuple<std::type_index, void*>, stream_executor::MultiKernelLoaderSpec>, absl::lts_20230802::hash_internal::Hash<std::tuple<std::type_index, void*>>, std::equal_to<std::tuple<std::type_index, void*>>, std::allocator<std::pair<std::tuple<std::type_index, void*> const, stream_executor::MultiKernelLoaderSpec>>>::find_or_prepare_insert<std::tuple<std::type_index, void*>>(std::tuple<std::type_index, void*> const&) /root/.cache/bazel/_bazel_root/f367074f9120c6f1a67d35844ac058a3/execroot/xla/external/com_google_absl/absl/container/internal/raw_hash_set.h:2645:16 #3 0x7f59e50af8a8 in std::pair<absl::lts_20230802::container_internal::raw_hash_set<absl::lts_20230802::container_internal::FlatHashMapPolicy<std::tuple<std::type_index, void*>, stream_executor::MultiKernelLoaderSpec>, absl::lts_20230802::hash_internal::Hash<std::tuple<std::type_index, void*>>, std::equal_to<std::tuple<std::type_index, void*>>, std::allocator<std::pair<std::tuple<std::type_index, void*> const, stream_executor::MultiKernelLoaderSpec>>>::iterator, bool> absl::lts_20230802::container_internal::raw_hash_set<absl::lts_20230802::container_internal::FlatHashMapPolicy<std::tuple<std::type_index, void*>, stream_executor::MultiKernelLoaderSpec>, absl::lts_20230802::hash_internal::Hash<std::tuple<std::type_index, void*>>, std::equal_to<std::tuple<std::type_index, void*>>, std::allocator<std::pair<std::tuple<std::type_index, void*> const, stream_executor::MultiKernelLoaderSpec>>>::EmplaceDecomposable::operator()<std::tuple<std::type_index, void*>, std::piecewise_construct_t const&, std::tuple<std::tuple<std::type_index, void*>&&>, std::tuple<stream_executor::MultiKernelLoaderSpec&&>>(std::tuple<std::type_index, void*> const&, std::piecewise_construct_t const&, std::tuple<std::tuple<std::type_index, void*>&&>&&, std::tuple<stream_executor::MultiKernelLoaderSpec&&>&&) const /root/.cache/bazel/_bazel_root/f367074f9120c6f1a67d35844ac058a3/execroot/xla/external/com_google_absl/absl/container/internal/raw_hash_set.h:2459:20 #4 0x7f59e50af8a8 in decltype(std::declval<absl::lts_20230802::container_internal::raw_hash_set<absl::lts_20230802::container_internal::FlatHashMapPolicy<std::tuple<std::type_index, void*>, stream_executor::MultiKernelLoaderSpec>, absl::lts_20230802::hash_internal::Hash<std::tuple<std::type_index, void*>>, std::equal_to<std::tuple<std::type_index, void*>>, std::allocator<std::pair<std::tuple<std::type_index, void*> const, stream_executor::MultiKernelLoaderSpec>>>::EmplaceDecomposable>()(std::declval<std::tuple<std::type_index, void*>&& const&>(), std::piecewise_construct, std::declval<std::tuple<std::tuple<std::type_index, void*>&&>>(), std::declval<std::tuple<stream_executor::MultiKernelLoaderSpec&&>>())) absl::lts_20230802::container_internal::memory_internal::DecomposePairImpl<absl::lts_20230802::container_internal::raw_hash_set<absl::lts_20230802::container_internal::FlatHashMapPolicy<std::tuple<std::type_index, void*>, stream_executor::MultiKernelLoaderSpec>, absl::lts_20230802::hash_internal::Hash<std::tuple<std::type_index, void*>>, std::equal_to<std::tuple<std::type_index, void*>>, std::allocator<std::pair<std::tuple<std::type_index, void*> const, stream_executor::MultiKernelLoaderSpec>>>::EmplaceDecomposable, std::tuple<std::type_index, void*>&&, std::tuple<stream_executor::MultiKernelLoaderSpec&&>>(absl::lts_20230802::container_internal::raw_hash_set<absl::lts_20230802::container_internal::FlatHashMapPolicy<std::tuple<std::type_index, void*>, stream_executor::MultiKernelLoaderSpec>, absl::lts_20230802::hash_internal::Hash<std::tuple<std::type_index, void*>>, std::equal_to<std::tuple<std::type_index, void*>>, std::allocator<std::pair<std::tuple<std::type_index, void*> const, stream_executor::MultiKernelLoaderSpec>>>::EmplaceDecomposable&&, std::pair<std::tuple<std::tuple<std::type_index, void*>&&>, std::tuple<stream_executor::MultiKernelLoaderSpec&&>>) /root/.cache/bazel/_bazel_root/f367074f9120c6f1a67d35844ac058a3/execroot/xla/external/com_google_absl/absl/container/internal/container_memory.h:140:10 #5 0x7f59e50af8a8 in decltype(memory_internal::DecomposePairImpl(std::forward<absl::lts_20230802::container_internal::raw_hash_set<absl::lts_20230802::container_internal::FlatHashMapPolicy<std::tuple<std::type_index, void*>, stream_executor::MultiKernelLoaderSpec>, absl::lts_20230802::hash_internal::Hash<std::tuple<std::type_index, void*>>, std::equal_to<std::tuple<std::type_index, void*>>, std::allocator<std::pair<std::tuple<std::type_index, void*> const, stream_executor::MultiKernelLoaderSpec>>>::EmplaceDecomposable>(fp), PairArgs(std::forward<std::pair<std::tuple<std::type_index, void*>, stream_executor::MultiKernelLoaderSpec>>(fp0)))) absl::lts_20230802::container_internal::DecomposePair<absl::lts_20230802::container_internal::raw_hash_set<absl::lts_20230802::container_internal::FlatHashMapPolicy<std::tuple<std::type_index, void*>, stream_executor::MultiKernelLoaderSpec>, absl::lts_20230802::hash_internal::Hash<std::tuple<std::type_index, void*>>, std::equal_to<std::tuple<std::type_index, void*>>, std::allocator<std::pair<std::tuple<std::type_index, void*> const, stream_executor::MultiKernelLoaderSpec>>>::EmplaceDecomposable, std::pair<std::tuple<std::type_index, void*>, stream_executor::MultiKernelLoaderSpec>>(absl::lts_20230802::container_internal::raw_hash_set<absl::lts_20230802::container_internal::FlatHashMapPolicy<std::tuple<std::type_index, void*>, stream_executor::MultiKernelLoaderSpec>, absl::lts_20230802::hash_internal::Hash<std::tuple<std::type_index, void*>>, std::equal_to<std::tuple<std::type_index, void*>>, std::allocator<std::pair<std::tuple<std::type_index, void*> const, stream_executor::MultiKernelLoaderSpec>>>::EmplaceDecomposable&&, std::pair<std::tuple<std::type_index, void*>, stream_executor::MultiKernelLoaderSpec>&&) /root/.cache/bazel/_bazel_root/f367074f9120c6f1a67d35844ac058a3/execroot/xla/external/com_google_absl/absl/container/internal/container_memory.h:207:10 #6 0x7f59e50af8a8 in decltype(absl::container_internal::DecomposePair(std::declval<absl::lts_20230802::container_internal::raw_hash_set<absl::lts_20230802::container_internal::FlatHashMapPolicy<std::tuple<std::type_index, void*>, stream_executor::MultiKernelLoaderSpec>, absl::lts_20230802::hash_internal::Hash<std::tuple<std::type_index, void*>>, std::equal_to<std::tuple<std::type_index, void*>>, std::allocator<std::pair<std::tuple<std::type_index, void*> const, stream_executor::MultiKernelLoaderSpec>>>::EmplaceDecomposable>(), std::declval<std::pair<std::tuple<std::type_index, void*>, stream_executor::MultiKernelLoaderSpec>>())) absl::lts_20230802::container_internal::FlatHashMapPolicy<std::tuple<std::type_index, void*>, stream_executor::MultiKernelLoaderSpec>::apply<absl::lts_20230802::container_internal::raw_hash_set<absl::lts_20230802::container_internal::FlatHashMapPolicy<std::tuple<std::type_index, void*>, stream_executor::MultiKernelLoaderSpec>, absl::lts_20230802::hash_internal::Hash<std::tuple<std::type_index, void*>>, std::equal_to<std::tuple<std::type_index, void*>>, std::allocator<std::pair<std::tuple<std::type_index, void*> const, stream_executor::MultiKernelLoaderSpec>>>::EmplaceDecomposable, std::pair<std::tuple<std::type_index, void*>, stream_executor::MultiKernelLoaderSpec>>(absl::lts_20230802::container_internal::raw_hash_set<absl::lts_20230802::container_internal::FlatHashMapPolicy<std::tuple<std::type_index, void*>, stream_executor::MultiKernelLoaderSpec>, absl::lts_20230802::hash_internal::Hash<std::tuple<std::type_index, void*>>, std::equal_to<std::tuple<std::type_index, void*>>, std::allocator<std::pair<std::tuple<std::type_index, void*> const, stream_executor::MultiKernelLoaderSpec>>>::EmplaceDecomposable&&, std::pair<std::tuple<std::type_index, void*>, stream_executor::MultiKernelLoaderSpec>&&) /root/.cache/bazel/_bazel_root/f367074f9120c6f1a67d35844ac058a3/execroot/xla/external/com_google_absl/absl/container/flat_hash_map.h:591:12 #7 0x7f59e50af8a8 in decltype(absl::lts_20230802::container_internal::FlatHashMapPolicy<std::tuple<std::type_index, void*>, stream_executor::MultiKernelLoaderSpec>::apply(std::forward<absl::lts_20230802::container_internal::raw_hash_set<absl::lts_20230802::container_internal::FlatHashMapPolicy<std::tuple<std::type_index, void*>, stream_executor::MultiKernelLoaderSpec>, absl::lts_20230802::hash_internal::Hash<std::tuple<std::type_index, void*>>, std::equal_to<std::tuple<std::type_index, void*>>, std::allocator<std::pair<std::tuple<std::type_index, void*> const, stream_executor::MultiKernelLoaderSpec>>>::EmplaceDecomposable>(fp), std::forward<std::pair<std::tuple<std::type_index, void*>, stream_executor::MultiKernelLoaderSpec>>(fp0))) absl::lts_20230802::container_internal::hash_policy_traits<absl::lts_20230802::container_internal::FlatHashMapPolicy<std::tuple<std::type_index, void*>, stream_executor::MultiKernelLoaderSpec>, void>::apply<absl::lts_20230802::container_internal::raw_hash_set<absl::lts_20230802::container_internal::FlatHashMapPolicy<std::tuple<std::type_index, void*>, stream_executor::MultiKernelLoaderSpec>, absl::lts_20230802::hash_internal::Hash<std::tuple<std::type_index, void*>>, std::equal_to<std::tuple<std::type_index, void*>>, std::allocator<std::pair<std::tuple<std::type_index, void*> const, stream_executor::MultiKernelLoaderSpec>>>::EmplaceDecomposable, std::pair<std::tuple<std::type_index, void*>, stream_executor::MultiKernelLoaderSpec>, absl::lts_20230802::container_internal::FlatHashMapPolicy<std::tuple<std::type_index, void*>, stream_executor::MultiKernelLoaderSpec>>(absl::lts_20230802::container_internal::raw_hash_set<absl::lts_20230802::container_internal::FlatHashMapPolicy<std::tuple<std::type_index, void*>, stream_executor::MultiKernelLoaderSpec>, absl::lts_20230802::hash_internal::Hash<std::tuple<std::type_index, void*>>, std::equal_to<std::tuple<std::type_index, void*>>, std::allocator<std::pair<std::tuple<std::type_index, void*> const, stream_executor::MultiKernelLoaderSpec>>>::EmplaceDecomposable&&, std::pair<std::tuple<std::type_index, void*>, stream_executor::MultiKernelLoaderSpec>&&) /root/.cache/bazel/_bazel_root/f367074f9120c6f1a67d35844ac058a3/execroot/xla/external/com_google_absl/absl/container/internal/hash_policy_traits.h:134:12 #8 0x7f59e50af8a8 in std::pair<absl::lts_20230802::container_internal::raw_hash_set<absl::lts_20230802::container_internal::FlatHashMapPolicy<std::tuple<std::type_index, void*>, stream_executor::MultiKernelLoaderSpec>, absl::lts_20230802::hash_internal::Hash<std::tuple<std::type_index, void*>>, std::equal_to<std::tuple<std::type_index, void*>>, std::allocator<std::pair<std::tuple<std::type_index, void*> const, stream_executor::MultiKernelLoaderSpec>>>::iterator, bool> absl::lts_20230802::container_internal::raw_hash_set<absl::lts_20230802::container_internal::FlatHashMapPolicy<std::tuple<std::type_index, void*>, stream_executor::MultiKernelLoaderSpec>, absl::lts_20230802::hash_internal::Hash<std::tuple<std::type_index, void*>>, std::equal_to<std::tuple<std::type_index, void*>>, std::allocator<std::pair<std::tuple<std::type_index, void*> const, stream_executor::MultiKernelLoaderSpec>>>::emplace<std::pair<std::tuple<std::type_index, void*>, stream_executor::MultiKernelLoaderSpec>, 0>(std::pair<std::tuple<std::type_index, void*>, stream_executor::MultiKernelLoaderSpec>&&) /root/.cache/bazel/_bazel_root/f367074f9120c6f1a67d35844ac058a3/execroot/xla/external/com_google_absl/absl/container/internal/raw_hash_set.h:2064:12 #9 0x7f59e50af8a8 in absl::lts_20230802::container_internal::raw_hash_set<absl::lts_20230802::container_internal::FlatHashMapPolicy<std::tuple<std::type_index, void*>, stream_executor::MultiKernelLoaderSpec>, absl::lts_20230802::hash_internal::Hash<std::tuple<std::type_index, void*>>, std::equal_to<std::tuple<std::type_index, void*>>, std::allocator<std::pair<std::tuple<std::type_index, void*> const, stream_executor::MultiKernelLoaderSpec>>>::insert(std::pair<std::tuple<std::type_index, void*>, stream_executor::MultiKernelLoaderSpec>&&) /root/.cache/bazel/_bazel_root/f367074f9120c6f1a67d35844ac058a3/execroot/xla/external/com_google_absl/absl/container/internal/raw_hash_set.h:1991:12 #10 0x7f59e50af8a8 in stream_executor::gpu::GpuKernelRegistry::RegisterKernel(std::type_info const&, void*, stream_executor::MultiKernelLoaderSpec const&) /root/.cache/bazel/_bazel_root/f367074f9120c6f1a67d35844ac058a3/execroot/xla/xla/stream_executor/gpu/gpu_kernel_registry.cc:67:45 #11 0x7f59e50d1982 in absl::lts_20230802::Status stream_executor::gpu::GpuKernelRegistry::RegisterKernel<stream_executor::gpu::MakeBatchPointersKernel>(void*, stream_executor::MultiKernelLoaderSpec const&) /root/.cache/bazel/_bazel_root/f367074f9120c6f1a67d35844ac058a3/execroot/xla/./xla/stream_executor/gpu/gpu_kernel_registry.h:86:12 #12 0x7f59e50d1982 in RegisterKernelMakeBatchPointersKernelRocmImpl() /root/.cache/bazel/_bazel_root/f367074f9120c6f1a67d35844ac058a3/execroot/xla/xla/stream_executor/rocm/make_batch_pointers_kernel_rocm.cu.cc:35:1 #13 0x7f59e50d1982 in 'lambda'()::operator()() const /root/.cache/bazel/_bazel_root/f367074f9120c6f1a67d35844ac058a3/execroot/xla/xla/stream_executor/rocm/make_batch_pointers_kernel_rocm.cu.cc:35:1 #14 0x7f59e50d1982 in 'lambda'()::__invoke() /root/.cache/bazel/_bazel_root/f367074f9120c6f1a67d35844ac058a3/execroot/xla/xla/stream_executor/rocm/make_batch_pointers_kernel_rocm.cu.cc:35:1 #15 0x7f59e50d1982 in stream_executor::port::Initializer::Initializer(void (*)()) /root/.cache/bazel/_bazel_root/f367074f9120c6f1a67d35844ac058a3/execroot/xla/./xla/stream_executor/platform/default/initialize.h:26:42 #16 0x7f59e50d1982 in __cxx_global_var_init.1 /root/.cache/bazel/_bazel_root/f367074f9120c6f1a67d35844ac058a3/execroot/xla/xla/stream_executor/rocm/make_batch_pointers_kernel_rocm.cu.cc:35:1 #17 0x7f59e50d1982 in _GLOBAL__sub_I_make_batch_pointers_kernel_rocm.cu.cc /root/.cache/bazel/_bazel_root/f367074f9120c6f1a67d35844ac058a3/execroot/xla/xla/stream_executor/rocm/make_batch_pointers_kernel_rocm.cu.cc #18 0x7f5a5b27a47d in call_init elf/dl-init.c:70:3 #19 0x7f5a5b27a567 in call_init elf/dl-init.c:33:6 #20 0x7f5a5b27a567 in _dl_init elf/dl-init.c:117:5 #21 0x7f5a5b2942c9 (/lib64/ld-linux-x86-64.so.2+0x202c9) (BuildId: e4de036b19e4768e7591b596c4be9f9015f2d28a) 0x50400002c1c0 is located 8 bytes after 40-byte region [0x50400002c190,0x50400002c1b8) allocated by thread T0 here: #0 0x557d0f77fcdf in malloc (/root/.cache/bazel/_bazel_root/f367074f9120c6f1a67d35844ac058a3/execroot/xla/bazel-out/k8-dbg/bin/xla/service/compiler_test_gpu_amd_any+0x1e8cdf) (BuildId: e96972f8c7f880083ff6ad5985d3c06d) #1 0x7f59d733098b in operator new(unsigned long) (/lib/x86_64-linux-gnu/libstdc++.so.6+0xae98b) (BuildId: e37fe1a879783838de78cbc8c80621fa685d58a2) SUMMARY: AddressSanitizer: heap-buffer-overflow /root/.cache/bazel/_bazel_root/f367074f9120c6f1a67d35844ac058a3/execroot/xla/external/com_google_absl/absl/container/internal/raw_hash_set.h:990:36 in absl::lts_20230802::container_internal::CommonFields::capacity() const Shadow bytes around the buggy address: 0x50400002bf00: fa fa fd fd fd fd fd fd fa fa fd fd fd fd fd fa 0x50400002bf80: fa fa fd fd fd fd fd fa fa fa fd fd fd fd fd fa 0x50400002c000: fa fa fd fd fd fd fd fa fa fa 00 00 00 00 00 fa 0x50400002c080: fa fa fd fd fd fd fd fd fa fa 00 00 00 00 00 00 0x50400002c100: fa fa 00 00 00 00 00 00 fa fa 00 00 00 00 00 00 =>0x50400002c180: fa fa 00 00 00 00 00 fa[fa]fa fa fa fa fa fa fa 0x50400002c200: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa 0x50400002c280: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa 0x50400002c300: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa 0x50400002c380: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa 0x50400002c400: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa Shadow byte legend (one shadow byte represents 8 application bytes): Addressable: 00 Partially addressable: 01 02 03 04 05 06 07 Heap left redzone: fa Freed heap region: fd Stack left redzone: f1 Stack mid redzone: f2 Stack right redzone: f3 Stack after return: f5 Stack use after scope: f8 Global redzone: f9 Global init order: f6 Poisoned by user: f7 Container overflow: fc Array cookie: ac Intra object redzone: bb ASan internal: fe Left alloca redzone: ca Right alloca redzone: cb ==168009==ABORTING ``` Why this fixes the issue: * Consider compiling this class into a different .so files where this function will get inlined and we will get different instances while we still want to have singleton. * In rocm compiler wrapper script we do not yet support sanitizer flags so our cu.cc files are not getting instrumented while our normal cc files do! This might cause a memory disalignment while running with asan (theory). Copybara import of the project: -- ffcd58918137191cdba6db571e0e5af0e57de2e1 by alekstheod <atheodor@amd.com>: Fix asan issue do to a singleton in header file Merging this change closes #25269 PiperOrigin-RevId: 747823659
Imported from GitHub PR openxla/xla#25269 Reported issue: ``` exec ${PAGER:-/usr/bin/less} "$0" || exit 1 Executing tests from //xla/service:compiler_test_gpu_amd_any ----------------------------------------------------------------------------- Running test /root/.cache/bazel/_bazel_root/f367074f9120c6f1a67d35844ac058a3/execroot/xla/bazel-out/k8-dbg/bin/xla/service/compiler_test_gpu_amd_any.runfiles/xla/xla/service/compiler_test_gpu_amd_any --gtest_shuffle --gtest_fail_if_no_test_linked on GPU 0 ================================================================= ==168009==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x50400002c1c0 at pc 0x7f59e50b52e7 bp 0x7ffc8c2358d0 sp 0x7ffc8c2358c8 READ of size 8 at 0x50400002c1c0 thread T0 #0 0x7f59e50b52e6 in absl::lts_20230802::container_internal::CommonFields::capacity() const /root/.cache/bazel/_bazel_root/f367074f9120c6f1a67d35844ac058a3/execroot/xla/external/com_google_absl/absl/container/internal/raw_hash_set.h:990:36 #1 0x7f59e50b52e6 in absl::lts_20230802::container_internal::probe(absl::lts_20230802::container_internal::CommonFields const&, unsigned long) /root/.cache/bazel/_bazel_root/f367074f9120c6f1a67d35844ac058a3/execroot/xla/external/com_google_absl/absl/container/internal/raw_hash_set.h:1298:41 #2 0x7f59e50b52e6 in std::pair<unsigned long, bool> absl::lts_20230802::container_internal::raw_hash_set<absl::lts_20230802::container_internal::FlatHashMapPolicy<std::tuple<std::type_index, void*>, stream_executor::MultiKernelLoaderSpec>, absl::lts_20230802::hash_internal::Hash<std::tuple<std::type_index, void*>>, std::equal_to<std::tuple<std::type_index, void*>>, std::allocator<std::pair<std::tuple<std::type_index, void*> const, stream_executor::MultiKernelLoaderSpec>>>::find_or_prepare_insert<std::tuple<std::type_index, void*>>(std::tuple<std::type_index, void*> const&) /root/.cache/bazel/_bazel_root/f367074f9120c6f1a67d35844ac058a3/execroot/xla/external/com_google_absl/absl/container/internal/raw_hash_set.h:2645:16 #3 0x7f59e50af8a8 in std::pair<absl::lts_20230802::container_internal::raw_hash_set<absl::lts_20230802::container_internal::FlatHashMapPolicy<std::tuple<std::type_index, void*>, stream_executor::MultiKernelLoaderSpec>, absl::lts_20230802::hash_internal::Hash<std::tuple<std::type_index, void*>>, std::equal_to<std::tuple<std::type_index, void*>>, std::allocator<std::pair<std::tuple<std::type_index, void*> const, stream_executor::MultiKernelLoaderSpec>>>::iterator, bool> absl::lts_20230802::container_internal::raw_hash_set<absl::lts_20230802::container_internal::FlatHashMapPolicy<std::tuple<std::type_index, void*>, stream_executor::MultiKernelLoaderSpec>, absl::lts_20230802::hash_internal::Hash<std::tuple<std::type_index, void*>>, std::equal_to<std::tuple<std::type_index, void*>>, std::allocator<std::pair<std::tuple<std::type_index, void*> const, stream_executor::MultiKernelLoaderSpec>>>::EmplaceDecomposable::operator()<std::tuple<std::type_index, void*>, std::piecewise_construct_t const&, std::tuple<std::tuple<std::type_index, void*>&&>, std::tuple<stream_executor::MultiKernelLoaderSpec&&>>(std::tuple<std::type_index, void*> const&, std::piecewise_construct_t const&, std::tuple<std::tuple<std::type_index, void*>&&>&&, std::tuple<stream_executor::MultiKernelLoaderSpec&&>&&) const /root/.cache/bazel/_bazel_root/f367074f9120c6f1a67d35844ac058a3/execroot/xla/external/com_google_absl/absl/container/internal/raw_hash_set.h:2459:20 #4 0x7f59e50af8a8 in decltype(std::declval<absl::lts_20230802::container_internal::raw_hash_set<absl::lts_20230802::container_internal::FlatHashMapPolicy<std::tuple<std::type_index, void*>, stream_executor::MultiKernelLoaderSpec>, absl::lts_20230802::hash_internal::Hash<std::tuple<std::type_index, void*>>, std::equal_to<std::tuple<std::type_index, void*>>, std::allocator<std::pair<std::tuple<std::type_index, void*> const, stream_executor::MultiKernelLoaderSpec>>>::EmplaceDecomposable>()(std::declval<std::tuple<std::type_index, void*>&& const&>(), std::piecewise_construct, std::declval<std::tuple<std::tuple<std::type_index, void*>&&>>(), std::declval<std::tuple<stream_executor::MultiKernelLoaderSpec&&>>())) absl::lts_20230802::container_internal::memory_internal::DecomposePairImpl<absl::lts_20230802::container_internal::raw_hash_set<absl::lts_20230802::container_internal::FlatHashMapPolicy<std::tuple<std::type_index, void*>, stream_executor::MultiKernelLoaderSpec>, absl::lts_20230802::hash_internal::Hash<std::tuple<std::type_index, void*>>, std::equal_to<std::tuple<std::type_index, void*>>, std::allocator<std::pair<std::tuple<std::type_index, void*> const, stream_executor::MultiKernelLoaderSpec>>>::EmplaceDecomposable, std::tuple<std::type_index, void*>&&, std::tuple<stream_executor::MultiKernelLoaderSpec&&>>(absl::lts_20230802::container_internal::raw_hash_set<absl::lts_20230802::container_internal::FlatHashMapPolicy<std::tuple<std::type_index, void*>, stream_executor::MultiKernelLoaderSpec>, absl::lts_20230802::hash_internal::Hash<std::tuple<std::type_index, void*>>, std::equal_to<std::tuple<std::type_index, void*>>, std::allocator<std::pair<std::tuple<std::type_index, void*> const, stream_executor::MultiKernelLoaderSpec>>>::EmplaceDecomposable&&, std::pair<std::tuple<std::tuple<std::type_index, void*>&&>, std::tuple<stream_executor::MultiKernelLoaderSpec&&>>) /root/.cache/bazel/_bazel_root/f367074f9120c6f1a67d35844ac058a3/execroot/xla/external/com_google_absl/absl/container/internal/container_memory.h:140:10 #5 0x7f59e50af8a8 in decltype(memory_internal::DecomposePairImpl(std::forward<absl::lts_20230802::container_internal::raw_hash_set<absl::lts_20230802::container_internal::FlatHashMapPolicy<std::tuple<std::type_index, void*>, stream_executor::MultiKernelLoaderSpec>, absl::lts_20230802::hash_internal::Hash<std::tuple<std::type_index, void*>>, std::equal_to<std::tuple<std::type_index, void*>>, std::allocator<std::pair<std::tuple<std::type_index, void*> const, stream_executor::MultiKernelLoaderSpec>>>::EmplaceDecomposable>(fp), PairArgs(std::forward<std::pair<std::tuple<std::type_index, void*>, stream_executor::MultiKernelLoaderSpec>>(fp0)))) absl::lts_20230802::container_internal::DecomposePair<absl::lts_20230802::container_internal::raw_hash_set<absl::lts_20230802::container_internal::FlatHashMapPolicy<std::tuple<std::type_index, void*>, stream_executor::MultiKernelLoaderSpec>, absl::lts_20230802::hash_internal::Hash<std::tuple<std::type_index, void*>>, std::equal_to<std::tuple<std::type_index, void*>>, std::allocator<std::pair<std::tuple<std::type_index, void*> const, stream_executor::MultiKernelLoaderSpec>>>::EmplaceDecomposable, std::pair<std::tuple<std::type_index, void*>, stream_executor::MultiKernelLoaderSpec>>(absl::lts_20230802::container_internal::raw_hash_set<absl::lts_20230802::container_internal::FlatHashMapPolicy<std::tuple<std::type_index, void*>, stream_executor::MultiKernelLoaderSpec>, absl::lts_20230802::hash_internal::Hash<std::tuple<std::type_index, void*>>, std::equal_to<std::tuple<std::type_index, void*>>, std::allocator<std::pair<std::tuple<std::type_index, void*> const, stream_executor::MultiKernelLoaderSpec>>>::EmplaceDecomposable&&, std::pair<std::tuple<std::type_index, void*>, stream_executor::MultiKernelLoaderSpec>&&) /root/.cache/bazel/_bazel_root/f367074f9120c6f1a67d35844ac058a3/execroot/xla/external/com_google_absl/absl/container/internal/container_memory.h:207:10 #6 0x7f59e50af8a8 in decltype(absl::container_internal::DecomposePair(std::declval<absl::lts_20230802::container_internal::raw_hash_set<absl::lts_20230802::container_internal::FlatHashMapPolicy<std::tuple<std::type_index, void*>, stream_executor::MultiKernelLoaderSpec>, absl::lts_20230802::hash_internal::Hash<std::tuple<std::type_index, void*>>, std::equal_to<std::tuple<std::type_index, void*>>, std::allocator<std::pair<std::tuple<std::type_index, void*> const, stream_executor::MultiKernelLoaderSpec>>>::EmplaceDecomposable>(), std::declval<std::pair<std::tuple<std::type_index, void*>, stream_executor::MultiKernelLoaderSpec>>())) absl::lts_20230802::container_internal::FlatHashMapPolicy<std::tuple<std::type_index, void*>, stream_executor::MultiKernelLoaderSpec>::apply<absl::lts_20230802::container_internal::raw_hash_set<absl::lts_20230802::container_internal::FlatHashMapPolicy<std::tuple<std::type_index, void*>, stream_executor::MultiKernelLoaderSpec>, absl::lts_20230802::hash_internal::Hash<std::tuple<std::type_index, void*>>, std::equal_to<std::tuple<std::type_index, void*>>, std::allocator<std::pair<std::tuple<std::type_index, void*> const, stream_executor::MultiKernelLoaderSpec>>>::EmplaceDecomposable, std::pair<std::tuple<std::type_index, void*>, stream_executor::MultiKernelLoaderSpec>>(absl::lts_20230802::container_internal::raw_hash_set<absl::lts_20230802::container_internal::FlatHashMapPolicy<std::tuple<std::type_index, void*>, stream_executor::MultiKernelLoaderSpec>, absl::lts_20230802::hash_internal::Hash<std::tuple<std::type_index, void*>>, std::equal_to<std::tuple<std::type_index, void*>>, std::allocator<std::pair<std::tuple<std::type_index, void*> const, stream_executor::MultiKernelLoaderSpec>>>::EmplaceDecomposable&&, std::pair<std::tuple<std::type_index, void*>, stream_executor::MultiKernelLoaderSpec>&&) /root/.cache/bazel/_bazel_root/f367074f9120c6f1a67d35844ac058a3/execroot/xla/external/com_google_absl/absl/container/flat_hash_map.h:591:12 #7 0x7f59e50af8a8 in decltype(absl::lts_20230802::container_internal::FlatHashMapPolicy<std::tuple<std::type_index, void*>, stream_executor::MultiKernelLoaderSpec>::apply(std::forward<absl::lts_20230802::container_internal::raw_hash_set<absl::lts_20230802::container_internal::FlatHashMapPolicy<std::tuple<std::type_index, void*>, stream_executor::MultiKernelLoaderSpec>, absl::lts_20230802::hash_internal::Hash<std::tuple<std::type_index, void*>>, std::equal_to<std::tuple<std::type_index, void*>>, std::allocator<std::pair<std::tuple<std::type_index, void*> const, stream_executor::MultiKernelLoaderSpec>>>::EmplaceDecomposable>(fp), std::forward<std::pair<std::tuple<std::type_index, void*>, stream_executor::MultiKernelLoaderSpec>>(fp0))) absl::lts_20230802::container_internal::hash_policy_traits<absl::lts_20230802::container_internal::FlatHashMapPolicy<std::tuple<std::type_index, void*>, stream_executor::MultiKernelLoaderSpec>, void>::apply<absl::lts_20230802::container_internal::raw_hash_set<absl::lts_20230802::container_internal::FlatHashMapPolicy<std::tuple<std::type_index, void*>, stream_executor::MultiKernelLoaderSpec>, absl::lts_20230802::hash_internal::Hash<std::tuple<std::type_index, void*>>, std::equal_to<std::tuple<std::type_index, void*>>, std::allocator<std::pair<std::tuple<std::type_index, void*> const, stream_executor::MultiKernelLoaderSpec>>>::EmplaceDecomposable, std::pair<std::tuple<std::type_index, void*>, stream_executor::MultiKernelLoaderSpec>, absl::lts_20230802::container_internal::FlatHashMapPolicy<std::tuple<std::type_index, void*>, stream_executor::MultiKernelLoaderSpec>>(absl::lts_20230802::container_internal::raw_hash_set<absl::lts_20230802::container_internal::FlatHashMapPolicy<std::tuple<std::type_index, void*>, stream_executor::MultiKernelLoaderSpec>, absl::lts_20230802::hash_internal::Hash<std::tuple<std::type_index, void*>>, std::equal_to<std::tuple<std::type_index, void*>>, std::allocator<std::pair<std::tuple<std::type_index, void*> const, stream_executor::MultiKernelLoaderSpec>>>::EmplaceDecomposable&&, std::pair<std::tuple<std::type_index, void*>, stream_executor::MultiKernelLoaderSpec>&&) /root/.cache/bazel/_bazel_root/f367074f9120c6f1a67d35844ac058a3/execroot/xla/external/com_google_absl/absl/container/internal/hash_policy_traits.h:134:12 #8 0x7f59e50af8a8 in std::pair<absl::lts_20230802::container_internal::raw_hash_set<absl::lts_20230802::container_internal::FlatHashMapPolicy<std::tuple<std::type_index, void*>, stream_executor::MultiKernelLoaderSpec>, absl::lts_20230802::hash_internal::Hash<std::tuple<std::type_index, void*>>, std::equal_to<std::tuple<std::type_index, void*>>, std::allocator<std::pair<std::tuple<std::type_index, void*> const, stream_executor::MultiKernelLoaderSpec>>>::iterator, bool> absl::lts_20230802::container_internal::raw_hash_set<absl::lts_20230802::container_internal::FlatHashMapPolicy<std::tuple<std::type_index, void*>, stream_executor::MultiKernelLoaderSpec>, absl::lts_20230802::hash_internal::Hash<std::tuple<std::type_index, void*>>, std::equal_to<std::tuple<std::type_index, void*>>, std::allocator<std::pair<std::tuple<std::type_index, void*> const, stream_executor::MultiKernelLoaderSpec>>>::emplace<std::pair<std::tuple<std::type_index, void*>, stream_executor::MultiKernelLoaderSpec>, 0>(std::pair<std::tuple<std::type_index, void*>, stream_executor::MultiKernelLoaderSpec>&&) /root/.cache/bazel/_bazel_root/f367074f9120c6f1a67d35844ac058a3/execroot/xla/external/com_google_absl/absl/container/internal/raw_hash_set.h:2064:12 #9 0x7f59e50af8a8 in absl::lts_20230802::container_internal::raw_hash_set<absl::lts_20230802::container_internal::FlatHashMapPolicy<std::tuple<std::type_index, void*>, stream_executor::MultiKernelLoaderSpec>, absl::lts_20230802::hash_internal::Hash<std::tuple<std::type_index, void*>>, std::equal_to<std::tuple<std::type_index, void*>>, std::allocator<std::pair<std::tuple<std::type_index, void*> const, stream_executor::MultiKernelLoaderSpec>>>::insert(std::pair<std::tuple<std::type_index, void*>, stream_executor::MultiKernelLoaderSpec>&&) /root/.cache/bazel/_bazel_root/f367074f9120c6f1a67d35844ac058a3/execroot/xla/external/com_google_absl/absl/container/internal/raw_hash_set.h:1991:12 #10 0x7f59e50af8a8 in stream_executor::gpu::GpuKernelRegistry::RegisterKernel(std::type_info const&, void*, stream_executor::MultiKernelLoaderSpec const&) /root/.cache/bazel/_bazel_root/f367074f9120c6f1a67d35844ac058a3/execroot/xla/xla/stream_executor/gpu/gpu_kernel_registry.cc:67:45 #11 0x7f59e50d1982 in absl::lts_20230802::Status stream_executor::gpu::GpuKernelRegistry::RegisterKernel<stream_executor::gpu::MakeBatchPointersKernel>(void*, stream_executor::MultiKernelLoaderSpec const&) /root/.cache/bazel/_bazel_root/f367074f9120c6f1a67d35844ac058a3/execroot/xla/./xla/stream_executor/gpu/gpu_kernel_registry.h:86:12 #12 0x7f59e50d1982 in RegisterKernelMakeBatchPointersKernelRocmImpl() /root/.cache/bazel/_bazel_root/f367074f9120c6f1a67d35844ac058a3/execroot/xla/xla/stream_executor/rocm/make_batch_pointers_kernel_rocm.cu.cc:35:1 #13 0x7f59e50d1982 in 'lambda'()::operator()() const /root/.cache/bazel/_bazel_root/f367074f9120c6f1a67d35844ac058a3/execroot/xla/xla/stream_executor/rocm/make_batch_pointers_kernel_rocm.cu.cc:35:1 #14 0x7f59e50d1982 in 'lambda'()::__invoke() /root/.cache/bazel/_bazel_root/f367074f9120c6f1a67d35844ac058a3/execroot/xla/xla/stream_executor/rocm/make_batch_pointers_kernel_rocm.cu.cc:35:1 #15 0x7f59e50d1982 in stream_executor::port::Initializer::Initializer(void (*)()) /root/.cache/bazel/_bazel_root/f367074f9120c6f1a67d35844ac058a3/execroot/xla/./xla/stream_executor/platform/default/initialize.h:26:42 #16 0x7f59e50d1982 in __cxx_global_var_init.1 /root/.cache/bazel/_bazel_root/f367074f9120c6f1a67d35844ac058a3/execroot/xla/xla/stream_executor/rocm/make_batch_pointers_kernel_rocm.cu.cc:35:1 #17 0x7f59e50d1982 in _GLOBAL__sub_I_make_batch_pointers_kernel_rocm.cu.cc /root/.cache/bazel/_bazel_root/f367074f9120c6f1a67d35844ac058a3/execroot/xla/xla/stream_executor/rocm/make_batch_pointers_kernel_rocm.cu.cc #18 0x7f5a5b27a47d in call_init elf/dl-init.c:70:3 #19 0x7f5a5b27a567 in call_init elf/dl-init.c:33:6 #20 0x7f5a5b27a567 in _dl_init elf/dl-init.c:117:5 #21 0x7f5a5b2942c9 (/lib64/ld-linux-x86-64.so.2+0x202c9) (BuildId: e4de036b19e4768e7591b596c4be9f9015f2d28a) 0x50400002c1c0 is located 8 bytes after 40-byte region [0x50400002c190,0x50400002c1b8) allocated by thread T0 here: #0 0x557d0f77fcdf in malloc (/root/.cache/bazel/_bazel_root/f367074f9120c6f1a67d35844ac058a3/execroot/xla/bazel-out/k8-dbg/bin/xla/service/compiler_test_gpu_amd_any+0x1e8cdf) (BuildId: e96972f8c7f880083ff6ad5985d3c06d) #1 0x7f59d733098b in operator new(unsigned long) (/lib/x86_64-linux-gnu/libstdc++.so.6+0xae98b) (BuildId: e37fe1a879783838de78cbc8c80621fa685d58a2) SUMMARY: AddressSanitizer: heap-buffer-overflow /root/.cache/bazel/_bazel_root/f36 732D 7074f9120c6f1a67d35844ac058a3/execroot/xla/external/com_google_absl/absl/container/internal/raw_hash_set.h:990:36 in absl::lts_20230802::container_internal::CommonFields::capacity() const Shadow bytes around the buggy address: 0x50400002bf00: fa fa fd fd fd fd fd fd fa fa fd fd fd fd fd fa 0x50400002bf80: fa fa fd fd fd fd fd fa fa fa fd fd fd fd fd fa 0x50400002c000: fa fa fd fd fd fd fd fa fa fa 00 00 00 00 00 fa 0x50400002c080: fa fa fd fd fd fd fd fd fa fa 00 00 00 00 00 00 0x50400002c100: fa fa 00 00 00 00 00 00 fa fa 00 00 00 00 00 00 =>0x50400002c180: fa fa 00 00 00 00 00 fa[fa]fa fa fa fa fa fa fa 0x50400002c200: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa 0x50400002c280: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa 0x50400002c300: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa 0x50400002c380: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa 0x50400002c400: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa Shadow byte legend (one shadow byte represents 8 application bytes): Addressable: 00 Partially addressable: 01 02 03 04 05 06 07 Heap left redzone: fa Freed heap region: fd Stack left redzone: f1 Stack mid redzone: f2 Stack right redzone: f3 Stack after return: f5 Stack use after scope: f8 Global redzone: f9 Global init order: f6 Poisoned by user: f7 Container overflow: fc Array cookie: ac Intra object redzone: bb ASan internal: fe Left alloca redzone: ca Right alloca redzone: cb ==168009==ABORTING ``` Why this fixes the issue: * Consider compiling this class into a different .so files where this function will get inlined and we will get different instances while we still want to have singleton. * In rocm compiler wrapper script we do not yet support sanitizer flags so our cu.cc files are not getting instrumented while our normal cc files do! This might cause a memory disalignment while running with asan (theory). Copybara import of the project: -- ffcd58918137191cdba6db571e0e5af0e57de2e1 by alekstheod <atheodor@amd.com>: Fix asan issue do to a singleton in header file Merging this change closes #25269 FUTURE_COPYBARA_INTEGRATE_REVIEW=openxla/xla#25269 from ROCm:ci_fix_singleton_in_header_file ffcd58918137191cdba6db571e0e5af0e57de2e1 PiperOrigin-RevId: 747833213
Has the g3doc specification been publicly released? Is an effort going to be made so that outside contributors know exactly how comments should be written in order to be correctly parsed?
The text was updated successfully, but these errors were encountered: