-
Notifications
You must be signed in to change notification settings - Fork 45
Median Filter on HOST and HIP #559
New issue 8000
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: develop
Are you sure you want to change the base?
Conversation
- Adds RPP Median Filter for HOST and HIP with support for U8/F16/F32/I8 and NCHW-NHWC toggle support
- Adds relevant QA/Unit/Performance tests
add doc images
templated median filter compute in HIP backend
RPP Median Filter on HOST and HIP
@r-abishek - can you resolve the merge conflicts? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
Adds a median filter augmentation across HOST and HIP backends, with full support for U8/F16/F32/I8 data types and NCHW/NHWC layouts, and updates the test suite to exercise the new functionality.
- Registers
MEDIAN_FILTER
in the augmentation map, enums, and test-case sets. - Implements
rppt_median_filter_host
andrppt_median_filter_gpu
, plus the CPU/GPU kernel code and header declarations. - Updates test scripts and mappings (
common.py
, runImageTests.py) and adds QA/unit/performance tests for the new filter.
Reviewed Changes
Copilot reviewed 20 out of 20 changed files in this pull request and generated 1 comment.
Show a summary per file
File | Description |
---|---|
utilities/test_suite/rpp_test_suite_image.h | Registers MEDIAN_FILTER and updates parameter/case sets |
utilities/test_suite/common.py | Adds median_filter mapping and updates filter category lists |
utilities/test_suite/HOST/runImageTests.py | Includes median_filter in kernel-size test loops |
utilities/test_suite/HOST/Tensor_image_host.cpp | Adds MEDIAN_FILTER switch case with host API call |
utilities/test_suite/HIP/runImageTests.py | Includes median_filter in HIP test loops |
utilities/test_suite/HIP/Tensor_image_hip.cpp | Adds MEDIAN_FILTER switch case with GPU API call |
src/modules/tensor/rppt_tensor_filter_augmentations.cpp | Implements host and GPU median filter functions |
src/modules/tensor/cpu/kernel/median_filter.cpp | Adds generic median filter kernel implementation |
src/include/tensor/host_tensor_executors.hpp | Declares median_filter_generic_host_tensor template |
src/include/tensor/hip_tensor_executors.hpp | Declares hip_exec_median_filter_tensor template |
api/rppt_tensor_filter_augmentations.h | Documents and prototypes median filter APIs |
Comments suppressed due to low confidence (1)
src/modules/tensor/cpu/kernel/median_filter.cpp:63
- C++ does not support variable-length arrays; consider using std::vector blockData(kernelSizeSquared * channels) or allocating a fixed-size buffer to avoid undefined behavior.
T blockData[kernelSizeSquared * channels];
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## develop #559 +/- ##
===========================================
+ Coverage 87.60% 87.66% +0.05%
===========================================
Files 185 187 +2
Lines 78747 79249 +502
===========================================
+ Hits 68984 69467 +483
- Misses 9763 9782 +19
🚀 New features to boost your workflow:
|
@r-abishek - can you resolve the conflicts |
Rpp32u srcIdx = row * srcDescPtr->strides.hStride + col * srcDescPtr->strides.wStride; | ||
|
||
// Copy pixel values for all channels | ||
for (Rpp32s ch = 0; ch < channels; ch++) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
consider optimizing this loop code try
"
memcpy(&blockData[index], &srcPtrTemp[srcIdx], channels);
index += channels;
"
OR with #pragma unroll before line #79. Try both and see which results in more perf
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
for (Rpp32s j = -padLength; j <= padLength; j++) | ||
{ | ||
// Clamp the row and column to image boundaries (nearest-neighbor padding) | ||
Rpp32s row = std::max(0, std::min(rowIdx + i, heightLimit)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we compute the min and max outside the for loops since padlength is predefined
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
channelBlock[i] = blockData[i * channels + ch]; | ||
|
||
// Sort the data to compute median | ||
std::sort(channelBlock, channelBlock + kernelSizeSquared); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
have you considered using nth_element instead of std::sort? It is too slow
int mid = kernelSizeSquared / 2;
std::nth_element(channelBlock, channelBlock + mid, channelBlock + kernelSizeSquared);
uint8_t median = channelBlock[mid];
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
T *dstPtrTemp = dstPtrRow; | ||
for(Rpp32s j = 0; j < roi.xywhROI.roiWidth; j++) | ||
{ | ||
median_filter_generic_tensor(srcPtrChannel, dstPtrTemp, i, j, kernelSizeSquared, padLength, roi.xywhROI.roiHeight - 1, roi.xywhROI.roiWidth - 1, 1, srcDescPtr, dstDescPtr); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
median_filter_generic_tensor has to be optimized as per suggestions
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@AryanSalmanpour : Can you please add your review for the HIP code. I just skimmed through it
// Sorting network for 3x3 (9 elements) median | ||
#define SWAP(i, j) if (window[i] > window[j]) { float tmp = window[i]; window[i] = window[j]; window[j] = tmp; } | ||
|
||
SWAP(1, 2); SWAP(4, 5); SWAP(7, 8); SWAP(0, 1); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why this is skipping alternate SWAPs. like SWAP(2, 3), SWAP(5,6) etc
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a fixed sorting network for 9 elements. Some swaps like SWAP(2, 3) or SWAP(5, 6) are skipped because they're not needed — the sorting is still correct with fewer steps.
} | ||
else | ||
{ | ||
// Partial selection sort for median - sufficient to find median without full sorting |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this an approximate algorithm?
@r-abishek need the conflicts resolved to run CI |
Median Filter : Review comments and Conflicts resolution
@kiritigowda conflicts resolved |
// -------------------- median_filter device helpers -------------------- | ||
|
||
template<int kernelSize> | ||
__device__ float compute_median(float *window) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@r-abishek We have a built-in function for computing the median: __builtin_amdgcn_fmed3f
. Please see if you can use it here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We have tried using the __builtin_amdgcn_fmed3f
function for median computation. However, this function can only compute the median of three values, whereas a 3×3 filter requires finding the median of nine values.
We attempted sequential calls with combinations of the 9 elements, but faced accuracy issues and were unable to match the golden outputs in QA tests.
This function provides a significant performance boost and may still be useful for approximate or relaxed-accuracy cases.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@r-abishek @HazarathKumarM Please take a look at the MIVisionX HIP kernel, which utilizes the __builtin_amdgcn_fmed3f
function for computing a 3x3 median filter. https://github.com/ROCm/MIVisionX/blob/develop/amd_openvx/openvx/hipvx/filter_kernels.cpp#L508
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@AryanSalmanpour Thanks! This is working very well for 3x3 specifically. Testing and sending this and few more updates soon.
Unfortunately not for 5,7,9 sizes though
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think some comments are still not addressed