Description
Dear clspv team,
We ran an outputSortedEven algorithm through clspv, and it seems there is an issue with clspv.
Algorithm explanation:
input: array of integers
output: sorted array (contains only even numbers)
Explanation: We sort the array in ascending order, and every element of the output array must be even. After sorting, each odd element is made even by subtracting 1 from it. For example, the input array A=[8, 2, 7, 1, 4] is sorted to [1, 2, 4, 7, 8] and then becomes Output=[0, 2, 4, 6, 8].
We also store, for each output element, its index in the original input array.
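To make the expected behavior concrete, here is a minimal host-side reference model in Python. It is only a sketch of the algorithm as described above, not the kernel source (the kernel is at the links below); the function name reference_sorted_even is made up for illustration.

```python
def reference_sorted_even(values):
    """Host-side reference model (hypothetical helper, not the kernel code)."""
    # Indices of the input, ordered by the value they point at.
    # sorted() is stable, so equal values keep their input order.
    order = sorted(range(len(values)), key=lambda i: values[i])
    # Subtract 1 from every odd value so that all outputs are even.
    out_values = [values[i] - (values[i] % 2) for i in order]
    return out_values, order


print(reference_sorted_even([8, 2, 7, 1, 4]))
# -> ([0, 2, 4, 6, 8], [3, 1, 4, 2, 0])
print(reference_sorted_even([8, 1, 3, 7, 11, 13]))
# -> ([0, 2, 6, 8, 10, 12], [1, 2, 3, 0, 4, 5])
```

The second call uses the input from this report and reproduces the value and index arrays we consider correct.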
Original outputSortedEven kernel: https://godbolt.org/z/xs1q3se5v --> Wrong output
Modified outputSortedEven kernel, with volatile added to the is_repeating boolean variable: https://godbolt.org/z/a96M3beYs --> Correct output
Input: 8, 1, 3, 7, 11, 13
Correct Output:
Output value array: 0, 2, 6, 8, 10, 12
Output index array: 1, 2, 3, 0, 4, 5
Output we get with clspv (#1475) + LLVM (#134094):
Output value array: 0, 2, 2147483647, 2147483647, 2147483647, 2147483647
Output index array: 1, 2, 2, 2, 2, 2
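A small Python comparison of the two results, using only the numbers listed in this report, shows where they diverge. The helper name mismatches is made up for illustration. Reading 2147483647 as a leftover initial or sentinel value is only our assumption; what is certain is that it equals 2^31 - 1, the largest signed 32-bit integer.

```python
expected_values = [0, 2, 6, 8, 10, 12]
expected_indices = [1, 2, 3, 0, 4, 5]
observed_values = [0, 2, 2147483647, 2147483647, 2147483647, 2147483647]
observed_indices = [1, 2, 2, 2, 2, 2]


def mismatches(expected, observed):
    """Return the positions at which the two lists differ."""
    return [i for i, (e, o) in enumerate(zip(expected, observed)) if e != o]


value_mismatches = mismatches(expected_values, observed_values)
index_mismatches = mismatches(expected_indices, observed_indices)

print(value_mismatches)  # [2, 3, 4, 5]
print(index_mismatches)  # [2, 3, 4, 5]

# Every wrong value slot holds the maximum signed 32-bit integer.
all_int_max = all(observed_values[i] == 2**31 - 1 for i in value_mismatches)
print(all_int_max)  # True
```

The first two output elements are correct, and both arrays go wrong from position 2 onward.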
clspv top commit used by us: 0e20b28
0e20b28 (HEAD) Fix SimplifyPointerBitcastPass hang on const GEP GVs (#1475)
0f171d7 Support multiple spirvop in the same kernel (#1477)
LLVM source top commit used by us:
7baa7edc00c5 (HEAD) [libclc]: clspv: add a dummy implementation for mul_hi (#134094)
edc22c64e527 [X86] getFauxShuffleMask - only handle VTRUNC nodes with matching src/dst sizes (#134161)
The problem is that the LLVM InstructionCombine and JumpThreading passes are aggressively optimizing the is_repeating variable used in the algorithm.
Current observation:
clspv O2 optimization level -> Issue
clspv O1 optimization level -> Issue
clspv O0 optimization level -> No issue
clspv O2 optimization level + LLVM InstructionCombine pass disabled -> Issue
clspv O2 optimization level + LLVM InstructionCombine and JumpThreading passes both disabled -> No issue
So our question is: on the clspv side, do we need to run the LLVM passes in a different order, or
do we need to create a custom pass in clspv that determines that this CL kernel should not go through the InstructionCombine and JumpThreading passes?
If you want LLVM dumps or any more data, please let me know.