Tags · jbaileyhandle/llvm-project · GitHub

Tags: jbaileyhandle/llvm-project

rocm-5.1.0

[SWDEV-324968] - Fixes libffi cmake issue on centos/sles.

Recently, a change was introduced where the cmake
searches for libffi.a to fix versioning issues
with 18.04/20.04. There is no libffi static archive
on centos/sles so we need to add ffi as a fallback
in order to find libffi.so.

Change-Id: Ia684e48fc19de4d9769e83d5fbfc26ece9e6db88

rocm-5.0.2

SWDEV-321398: replace hostcall module flag with function attribute

This internal version is currently a squash of four upstream reviews:

1. D119087: [AMDGPU] [NFC] refactor the AMDGPU attributor
2. D119308: [AMDGPU] [NFC] Fix incorrect use of bitwise operator.
3. D119249: [Attributor][NFC] Expose new API in AAPointerInfo
4. D119216: [AMDGPU] replace hostcall module flag with function attribute

Of these, #1, #2 and #3 are submitted in upstream/main, while #4 is
under review.

The module flag to indicate use of hostcall is insufficient to catch
all cases where hostcall might be in use by a kernel. This is now
replaced by a function attribute that gets propagated to top-level
kernel functions via their respective call-graph.

If the attribute "amdgpu-no-hostcall-ptr" is absent on a kernel, the
default behaviour is to emit kernel metadata indicating that the
kernel uses the hostcall buffer pointer passed as an implicit
argument.

The attribute may be placed explicitly by the user, or inferred by the
AMDGPU attributor by examining the call-graph. The attribute is
inferred only if the function is not being sanitized, and the
implicitarg_ptr does not result in a load of any byte in the hostcall
pointer argument.

Change-Id: I6cc12050602c3f477575c3ca09a883797169e9e3
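
The inference rule described in the rocm-5.0.2 message can be illustrated with a toy model. The following is a minimal sketch, not the AMDGPU attributor itself: the `Function` record, its `reads_hostcall_ptr` flag, and both helper functions are hypothetical names introduced here, and the optimistic fixed-point loop is an assumed way to handle recursive call graphs. Only the attribute name "amdgpu-no-hostcall-ptr" and the two conditions (not sanitized, no load of the hostcall pointer) come from the commit message.

```python
from dataclasses import dataclass, field

NO_HOSTCALL = "amdgpu-no-hostcall-ptr"  # attribute name from the commit message


@dataclass
class Function:
    # Hypothetical stand-in for an IR function; not an LLVM class.
    name: str
    callees: list = field(default_factory=list)
    reads_hostcall_ptr: bool = False  # loads a byte of the hostcall pointer
    sanitized: bool = False
    is_kernel: bool = False


def infer_no_hostcall(functions):
    """Return the names of functions that can carry NO_HOSTCALL."""
    # Optimistic start: every unsanitized function that does not itself
    # read the hostcall pointer is assumed to qualify.
    has_attr = {
        f.name for f in functions
        if not f.sanitized and not f.reads_hostcall_ptr
    }
    # Retract the attribute until nothing changes. A callee that is unknown
    # to the module is treated conservatively as lacking the attribute.
    changed = True
    while changed:
        changed = False
        for f in functions:
            if f.name in has_attr and any(c not in has_attr for c in f.callees):
                has_attr.discard(f.name)
                changed = True
    return has_attr


def kernels_needing_hostcall_buffer(functions):
    """Kernels without the attribute get hostcall metadata by default."""
    has_attr = infer_no_hostcall(functions)
    return sorted(
        f.name for f in functions if f.is_kernel and f.name not in has_attr
    )


module = [
    Function("printf_helper", reads_hostcall_ptr=True),
    Function("log", callees=["printf_helper"]),
    Function("math"),
    Function("kernel_a", callees=["log"], is_kernel=True),
    Function("kernel_b", callees=["math"], is_kernel=True),
    Function("kernel_c", callees=["math"], is_kernel=True, sanitized=True),
]
print(kernels_needing_hostcall_buffer(module))
```

In this toy module, `kernel_a` reaches a function that reads the hostcall pointer and `kernel_c` is sanitized, so neither receives the attribute; `kernel_b` does, and so would not need the hostcall buffer argument.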

rocm-5.0.1

[CUDA][HIP] Do not treat host var address as constant in device compilation

Currently clang treats host var address as constant in device compilation,
which causes const vars initialized with host var address promoted to
device variables incorrectly and results in undefined symbols.

This patch fixes that.

Reviewed by: Artem Belevich

Differential Revision: https://reviews.llvm.org/D118153

Fixes: SWDEV-309881

Change-Id: I0a69357063c6f8539ef259c96c250d04615f4473

rocm-5.0.0

[CUDA][HIP] Do not treat host var address as constant in device compilation

Currently clang treats host var address as constant in device compilation,
which causes const vars initialized with host var address promoted to
device variables incorrectly and results in undefined symbols.

This patch fixes that.

Reviewed by: Artem Belevich

Differential Revision: https://reviews.llvm.org/D118153

Fixes: SWDEV-309881

Change-Id: I0a69357063c6f8539ef259c96c250d04615f4473

rocm-4.5.2

Allow to use a whole register file on gfx90a for VGPRs

In a kernel which does not have calls or AGPR usage we can allocate
the whole vector register budget for VGPRs and have no AGPRs as
long as VGPRs stay addressable (i.e. below 256).

Patch by: Stanislav Mekhanoshin

Change-Id: I2ea6eea58a449cf12368a37af18a892220c6e23b
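
The allocation rule in the rocm-4.5.2 message can be sketched as a small decision function. This is an illustrative model only: the function name, the `budget` parameter, and the even VGPR/AGPR split used in the fallback case are assumptions made for the example and are not taken from the patch. The 256 limit reflects the message's "below 256" addressability condition.

```python
MAX_ADDRESSABLE_VGPRS = 256  # VGPRs must stay addressable (index below 256)


def split_vector_registers(budget, has_calls, uses_agprs):
    """Return (vgprs, agprs) for a kernel with the given register budget.

    Hypothetical helper: `budget` is the kernel's total vector register
    budget, however the compiler derived it.
    """
    if not has_calls and not uses_agprs:
        # No calls and no AGPR usage: give the whole budget to VGPRs,
        # capped so that every VGPR remains addressable.
        return min(budget, MAX_ADDRESSABLE_VGPRS), 0
    # Assumption for illustration: otherwise split the budget evenly.
    half = budget // 2
    return half, budget - half


print(split_vector_registers(256, has_calls=False, uses_agprs=False))
print(split_vector_registers(256, has_calls=True, uses_agprs=False))
```

Under these assumptions, a call-free kernel with no AGPR usage and a budget of 256 registers gets all 256 as VGPRs, while the same budget with calls is divided between the two register classes.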

rocm-4.5.0

[AMDGPU] Use "hostcall" module flag instead of searching for ockl_hostcall_internal() declaration.

The current way to detect hostcalls, looking for the "ockl_hostcall_internal()" function in the module, is not reliable enough. LTO may rename the "ockl_hostcall_internal()" function when an application is compiled with "-fgpu-rdc", causing the MetadataStreamer pass to fail to detect hostcalls, so it does not set the "hidden_hostcall_buffer" kernel argument.
This change adds a new module flag, hostcall, that can be used to detect whether GPU functions use host calls for printf.

Differential revision: https://reviews.llvm.org/D110337

[AMDGPU] Correction to 095c48f.

Differential Revision: https://reviews.llvm.org/D110337

Change-Id: I5eb847884f4cb98687dcfdef85f78d2d2c380bcd

rocm-4.3.1

Revert "Turn on the new pass manager by default"

This reverts commit 669ddd1.

Un-XFAIL one test

Change-Id: Ieebd1fa4a1457970fb174b897c8223557f675b51

rocm-4.3.0

[HIP] Defer operator overloading errors

Although clang is able to defer overload resolution
diagnostics for common functions, it does not defer
diagnostics caused by overload resolution for overloaded
operators.

This patch extends the existing deferred
diagnostic mechanism and defers diagnostics caused
by overloaded operators.

Reviewed by: Artem Belevich

Differential Revision: https://reviews.llvm.org/D104505

Fixes: SWDEV-236370

Change-Id: I0ff9ef18f30820112182c5ff94ccba26c3b2b914

rocm-4.2.0

[AMDGPU] Mark scavenged SGPR as used

Otherwise it reuses the same register for storing the stack slot
offset if the stack slot offset is big.

Differential Revision: https://reviews.llvm.org/D100461

Change-Id: I57e764c66e0e8c72e5d8e241de194333b6e2d3ff

rocm-4.1.1

[AMDGPU] ds_read_*/ds_write_* operations require strict alignment.

Due to performance reasons, ds_read_*/ds_write_* operations require
strict alignment. Avoid selecting them in under-aligned situations
irrespective of whether "unaligned access mode" is enabled or not.

Change-Id: Ibe648cf663eb80365cff0e456e69a813c7e55aa2