Tags: jbaileyhandle/llvm-project
[SWDEV-324968] - Fixes libffi CMake issue on CentOS/SLES. Recently, a change was introduced where CMake searches for libffi.a to fix versioning issues with 18.04/20.04. There is no libffi static archive on CentOS/SLES, so we need to add ffi as a fallback in order to find libffi.so. Change-Id: Ia684e48fc19de4d9769e83d5fbfc26ece9e6db88
SWDEV-321398: replace hostcall module flag with function attribute
This internal version is currently a squash of four upstream reviews:
1. D119087: [AMDGPU] [NFC] refactor the AMDGPU attributor
2. D119308: [AMDGPU] [NFC] Fix incorrect use of bitwise operator.
3. D119249: [Attributor][NFC] Expose new API in AAPointerInfo
4. D119216: [AMDGPU] replace hostcall module flag with function attribute
Of these, #1, #2 and #3 are submitted in upstream/main, while #4 is under review.
The module flag to indicate use of hostcall is insufficient to catch all cases where hostcall might be in use by a kernel. It is now replaced by a function attribute that is propagated to top-level kernel functions through their respective call graphs. If the attribute "amdgpu-no-hostcall-ptr" is absent on a kernel, the default behaviour is to emit kernel metadata indicating that the kernel uses the hostcall buffer pointer passed as an implicit argument.
The attribute may be placed explicitly by the user, or inferred by the AMDGPU attributor by examining the call graph. The attribute is inferred only if the function is not being sanitized and the implicitarg_ptr does not result in a load of any byte in the hostcall pointer argument.
Change-Id: I6cc12050602c3f477575c3ca09a883797169e9e3
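The inference rule described above can be pictured with a small toy model. This is only a sketch under stated assumptions, not the AMDGPU attributor itself: the function name `infer_no_hostcall`, the dictionary-based call graph, and the example function names are all hypothetical, and the real pass works on LLVM IR rather than on strings.

```python
def infer_no_hostcall(call_graph, loads_hostcall_ptr, sanitized):
    """Return the set of functions that could carry "amdgpu-no-hostcall-ptr".

    call_graph         -- dict mapping a function name to the names it calls
    loads_hostcall_ptr -- functions that load bytes of the hostcall pointer
    sanitized          -- functions being sanitized (never inferred)

    Hypothetical model: a function keeps the attribute only if it is not
    sanitized, does not touch the hostcall pointer itself, and every callee
    also keeps the attribute.
    """
    functions = set(call_graph)
    for callees in call_graph.values():
        functions.update(callees)

    # Start optimistically, then remove functions until nothing changes.
    no_hostcall = {
        f for f in functions
        if f not in loads_hostcall_ptr and f not in sanitized
    }
    changed = True
    while changed:
        changed = False
        for f in sorted(no_hostcall):
            if any(c not in no_hostcall for c in call_graph.get(f, ())):
                no_hostcall.discard(f)
                changed = True
    return no_hostcall


# Hypothetical module: kernel_a reaches a printf-style helper, kernel_b does not.
example_graph = {
    "kernel_a": ["helper"],
    "helper": ["printf_impl"],
    "kernel_b": ["math"],
    "printf_impl": [],
    "math": [],
}
inferred = infer_no_hostcall(example_graph, {"printf_impl"}, set())
```

In this model `kernel_b` keeps the attribute, while `kernel_a` does not, so by the default described above it would be the kernel whose metadata reports use of the hostcall buffer pointer.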
[CUDA][HIP] Do not treat host var address as constant in device compilation Currently clang treats a host variable's address as constant in device compilation, which causes const variables initialized with a host variable address to be incorrectly promoted to device variables and results in undefined symbols. This patch fixes that. Reviewed by: Artem Belevich Differential Revision: https://reviews.llvm.org/D118153 Fixes: SWDEV-309881 Change-Id: I0a69357063c6f8539ef259c96c250d04615f4473
Allow to use a whole register file on gfx90a for VGPRs In a kernel which does not have calls or AGPR usage we can allocate the whole vector register budget for VGPRs and have no AGPRs as long as VGPRs stay addressable (i.e. below 256). Patch by: Stanislav Mekhanoshin Change-Id: I2ea6eea58a449cf12368a37af18a892220c6e23b
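As a rough illustration of the budgeting rule in this entry, the sketch below splits a vector register budget between VGPRs and AGPRs. It is a toy model, not the LLVM implementation: the function name `split_vector_budget`, the even split used when calls or AGPRs are present, and the handling of any budget beyond 256 are assumptions made for the example. Only the 256 addressability limit and the "no calls, no AGPR usage" condition come from the text above.

```python
ADDRESSABLE_VGPRS = 256  # from the entry: VGPRs must stay addressable (below 256)


def split_vector_budget(budget, has_calls, uses_agprs):
    """Return (vgprs, agprs) for a kernel's vector register budget.

    Hypothetical model of the rule described in the entry above.
    """
    if has_calls or uses_agprs:
        # ASSUMPTION: without the optimization, the budget is split evenly.
        half = budget // 2
        return half, half
    # No calls and no AGPR usage: give VGPRs as much as stays addressable.
    vgprs = min(budget, ADDRESSABLE_VGPRS)
    # ASSUMPTION: anything beyond the addressable range remains as AGPRs.
    return vgprs, budget - vgprs
```

For example, under these assumptions a budget of 256 with no calls and no AGPR usage is assigned entirely to VGPRs, whereas the same budget in a kernel with calls is halved.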
[AMDGPU] Use "hostcall" module flag instead of searching for ockl_hostcall_internal() declaration. The current way of detecting hostcalls, by looking for the "ockl_hostcall_internal()" function in the module, is not reliable enough. LTO may rename the "ockl_hostcall_internal()" function when an application is compiled with "-fgpu-rdc", causing the MetadataStreamer pass to fail to detect hostcalls, so it does not set the "hidden_hostcall_buffer" kernel argument. This change adds a new module flag, hostcall, that can be used to detect whether GPU functions use host calls for printf. Differential revision: https://reviews.llvm.org/D110337 [AMDGPU] Correction to 095c48f. Differential Revision: https://reviews.llvm.org/D110337 Change-Id: I5eb847884f4cb98687dcfdef85f78d2d2c380bcd
Revert "Turn on the new pass manager by default" This reverts commit 669ddd1. Un-XFAIL one test Change-Id: Ieebd1fa4a1457970fb174b897c8223557f675b51
[HIP] Defer operator overloading errors Although clang is able to defer overload resolution diagnostics for common functions, it does not defer diagnostics caused by overload resolution for overloaded operators. This patch extends the existing deferred diagnostic mechanism to defer diagnostics caused by overloaded operators. Reviewed by: Artem Belevich Differential Revision: https://reviews.llvm.org/D104505 Fixes: SWDEV-236370 Change-Id: I0ff9ef18f30820112182c5ff94ccba26c3b2b914
[AMDGPU] Mark scavenged SGPR as used Otherwise it reuses the same register for storing the stack slot offset if the stack slot offset is big. Differential Revision: https://reviews.llvm.org/D100461 Change-Id: I57e764c66e0e8c72e5d8e241de194333b6e2d3ff
[AMDGPU] ds_read_*/ds_write_* operations require strict alignment. For performance reasons, ds_read_*/ds_write_* operations require strict alignment. Avoid selecting them in under-aligned situations, irrespective of whether "unaligned access mode" is enabled. Change-Id: Ibe648cf663eb80365cff0e456e69a813c7e55aa2