8000 Median Filter on HOST and HIP by r-abishek · Pull Request #559 · ROCm/rpp · GitHub
[go: up one dir, main page]
More Web Proxy on the site http://driver.im/
Skip to content

Median Filter on HOST and HIP #559

New issue 8000

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 30 commits into
base: develop
Choose a base branch
from

Conversation

r-abishek
Copy link
Member
8000
  • Adds RPP Median Filter for HOST and HIP with support for U8/F16/F32/I8 and NCHW-NHWC toggle support
  • Adds relevant QA/Unit/Performance tests

@r-abishek r-abishek requested a review from Copilot June 4, 2025 04:34
@r-abishek r-abishek added enhancement New feature or request ci:precheckin labels Jun 4, 2025
Copilot

This comment was marked as outdated.

@kiritigowda kiritigowda self-assigned this Jun 4, 2025
@kiritigowda kiritigowda requested a review from rrawther June 4, 2025 04:40
@kiritigowda
Copy link
Collaborator

@r-abishek - can you resolve the merge conflicts?

@kiritigowda kiritigowda requested a review from Copilot June 6, 2025 22:27
Copy link
Contributor
@Copilot Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

Adds a median filter augmentation across HOST and HIP backends, with full support for U8/F16/F32/I8 data types and NCHW/NHWC layouts, and updates the test suite to exercise the new functionality.

  • Registers MEDIAN_FILTER in the augmentation map, enums, and test-case sets.
  • Implements rppt_median_filter_host and rppt_median_filter_gpu, plus the CPU/GPU kernel code and header declarations.
  • Updates test scripts and mappings (common.py, runImageTests.py) and adds QA/unit/performance tests for the new filter.

Reviewed Changes

Copilot reviewed 20 out of 20 changed files in this pull request and generated 1 comment.

Show a summary per file
File Description
utilities/test_suite/rpp_test_suite_image.h Registers MEDIAN_FILTER and updates parameter/case sets
utilities/test_suite/common.py Adds median_filter mapping and updates filter category lists
utilities/test_suite/HOST/runImageTests.py Includes median_filter in kernel-size test loops
utilities/test_suite/HOST/Tensor_image_host.cpp Adds MEDIAN_FILTER switch case with host API call
utilities/test_suite/HIP/runImageTests.py Includes median_filter in HIP test loops
utilities/test_suite/HIP/Tensor_image_hip.cpp Adds MEDIAN_FILTER switch case with GPU API call
src/modules/tensor/rppt_tensor_filter_augmentations.cpp Implements host and GPU median filter functions
src/modules/tensor/cpu/kernel/median_filter.cpp Adds generic median filter kernel implementation
src/include/tensor/host_tensor_executors.hpp Declares median_filter_generic_host_tensor template
src/include/tensor/hip_tensor_executors.hpp Declares hip_exec_median_filter_tensor template
api/rppt_tensor_filter_augmentations.h Documents and prototypes median filter APIs
Comments suppressed due to low confidence (1)

src/modules/tensor/cpu/kernel/median_filter.cpp:63

  • C++ does not support variable-length arrays; consider using std::vector blockData(kernelSizeSquared * channels) or allocating a fixed-size buffer to avoid undefined behavior.
T blockData[kernelSizeSquared * channels];

r-abishek and others added 3 commits June 9, 2025 14:03
Copy link
codecov bot commented Jun 11, 2025

Codecov Report

Attention: Patch coverage is 99.20319% with 4 lines in your changes missing coverage. Please review.

Files with missing lines Patch % Lines
...odules/tensor/rppt_tensor_filter_augmentations.cpp 97.17% 3 Missing ⚠️
src/modules/tensor/hip/kernel/median_filter.cpp 99.64% 1 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff             @@
##           develop     #559      +/-   ##
===========================================
+ Coverage    87.60%   87.66%   +0.05%     
===========================================
  Files          185      187       +2     
  Lines        78747    79249     +502     
===========================================
+ Hits         68984    69467     +483     
- Misses        9763     9782      +19     
Files with missing lines Coverage Δ
src/modules/tensor/cpu/kernel/median_filter.cpp 100.00% <100.00%> (ø)
src/modules/tensor/hip/kernel/median_filter.cpp 99.64% <99.64%> (ø)
...odules/tensor/rppt_tensor_filter_augmentations.cpp 97.56% <97.17%> (-0.19%) ⬇️

... and 2 files with indirect coverage changes

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@kiritigowda
Copy link
Collaborator

@r-abishek - can you resolve the conflicts

Rpp32u srcIdx = row * srcDescPtr->strides.hStride + col * srcDescPtr->strides.wStride;

// Copy pixel values for all channels
for (Rpp32s ch = 0; ch < channels; ch++)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

consider optimizing this loop code try
"
memcpy(&blockData[index], &srcPtrTemp[srcIdx], channels);
index += channels;
"
OR with #pragma unroll before line #79. Try both and see which results in more perf

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

for (Rpp32s j = -padLength; j <= padLength; j++)
{
// Clamp the row and column to image boundaries (nearest-neighbor padding)
Rpp32s row = std::max(0, std::min(rowIdx + i, heightLimit));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we compute the min and max outside the for loops since padlength is predefined

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

channelBlock[i] = blockData[i * channels + ch];

// Sort the data to compute median
std::sort(channelBlock, channelBlock + kernelSizeSquared);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

have you considered using nth_element instead of std::sort? It is too slow

int mid = kernelSizeSquared / 2;
std::nth_element(channelBlock, channelBlock + mid, channelBlock + kernelSizeSquared);
uint8_t median = channelBlock[mid];

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

T *dstPtrTemp = dstPtrRow;
for(Rpp32s j = 0; j < roi.xywhROI.roiWidth; j++)
{
median_filter_generic_tensor(srcPtrChannel, dstPtrTemp, i, j, kernelSizeSquared, padLength, roi.xywhROI.roiHeight - 1, roi.xywhROI.roiWidth - 1, 1, srcDescPtr, dstDescPtr);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

median_filter_generic_tensor has to be optimized as per suggestions

Copy link
Contributor
@rrawther rrawther left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@AryanSalmanpour : Can you please add your review for the HIP code. I just skimmed through it

// Sorting network for 3x3 (9 elements) median
#define SWAP(i, j) if (window[i] > window[j]) { float tmp = window[i]; window[i] = window[j]; window[j] = tmp; }

SWAP(1, 2); SWAP(4, 5); SWAP(7, 8); SWAP(0, 1);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why this is skipping alternate SWAPs. like SWAP(2, 3), SWAP(5,6) etc

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a fixed sorting network for 9 elements. Some swaps like SWAP(2, 3) or SWAP(5, 6) are skipped because they're not needed — the sorting is still correct with fewer steps.

}
else
{
// Partial selection sort for median - sufficient to find median without full sorting
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this an approximate algorithm?

@kiritigowda
Copy link
Collaborator

@r-abishek need the conflicts resolved to run CI

@rrawther rrawther requested a review from AryanSalmanpour June 17, 2025 00:42
@r-abishek
Copy link
Member Author

@kiritigowda conflicts resolved

@kiritigowda kiritigowda requested a review from rrawther June 20, 2025 18:50
// -------------------- median_filter device helpers --------------------

template<int kernelSize>
__device__ float compute_median(float *window)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@r-abishek We have a built-in function for computing the median: __builtin_amdgcn_fmed3f. Please see if you can use it here.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have tried using the __builtin_amdgcn_fmed3f function for median computation. However, this function can only compute the median of three values, whereas a 3×3 filter requires finding the median of nine values.
We attempted sequential calls with combinations of the 9 elements, but faced accuracy issues and were unable to match the golden outputs in QA tests.
This function provides a significant performance boost and may still be useful for approximate or relaxed-accuracy cases.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@r-abishek @HazarathKumarM Please take a look at the MIVisionX HIP kernel, which utilizes the __builtin_amdgcn_fmed3f function for computing a 3x3 median filter. https://github.com/ROCm/MIVisionX/blob/develop/amd_openvx/openvx/hipvx/filter_kernels.cpp#L508

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@AryanSalmanpour Thanks! This is working very well for 3x3 specifically. Testing and sending this and few more updates soon.
Unfortunately not for 5,7,9 sizes though

Copy link
Contributor
@rrawther rrawther left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think some comments are still not addressed

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
ci:precheckin enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants
0