RPP Tensor Support - Snow on HOST and HIP by Dineshbabu-Ravichandran · Pull Request #347 · r-abishek/rpp · GitHub

RPP Tensor Support - Snow on HOST and HIP #347


Open
wants to merge 33 commits into
base: develop

Dineshbabu-Ravichandran
  • Adds tensor support for Snow Augmentation optimized using AVX2 on HOST backend
  • Adds tensor support for Snow Augmentation on HIP backend
  • Adds unit and performance tests support for the Snow Augmentation in test suite

@Srihari-mcw

I think this output should be on the unit tests default 150x150 image. Pls check once @Dineshbabu-Ravichandran
image

* \details The Snow augmentation does a modification of brightness on a batch of RGB(3 channel) / greyscale(1 channel) images with an NHWC/NCHW tensor layout.<br>
* - srcPtr depth ranges - Rpp8u (0 to 255), Rpp16f (0 to 1), Rpp32f (0 to 1), Rpp8s (-128 to 127).
* - dstPtr depth ranges - Will be same depth as srcPtr.
* \image html img640x480.png Sample Input

Same doubt here

* \param [in] brightnessCoefficient brightness modification parameter for snow calculation (1D tensor in HOST memory, of size batchSize with 1 < brightnessCoefficient[i] <= 4 for each image in batch)
* \param [in] snowThreshold threshold parameter for snow calculation (1D tensor in HOST memory, of size batchSize with 0 < snowThresholdTensor[i] <= 1 for each image in batch)
* \param [in] darkMode darkMode values to set dark mode on/off (1D tensor in HOST memory, of size batchSize, with darkModeTensor[i] = 0/1)
* \param [in] roiTensorSrc ROI data in HOST memory, for each image in source tensor (2D tensor of size batchSize * 4, in either format - XYWH(xy.x, xy.y, roiWidth, roiHeight) or LTRB(lt.x, lt.y, rb.x, rb.y))

Change to roiTensorPtrSrc

* \param [in] brightnessCoefficient brightness modification parameter for snow calculation (1D tensor in pinned/HIP memory, of size batchSize with 1 < brightnessCoefficient[i] <= 4 for each image in batch)
* \param [in] snowThreshold threshold parameter for snow calculation (1D tensor in pinned/HIP memory, of size batchSize with 0 < snowThreshold[i] <= 1 for each image in batch)
* \param [in] darkMode darkMode values to set dark mode on/off (1D tensor in pinned/HIP memory, of size batchSize, with darkModeTensor[i] = 0/1)
* \param [in] roiTensorSrc ROI data in HIP memory, for each image in source tensor (2D tensor of size batchSize * 4, in either format - XYWH(xy.x, xy.y, roiWidth, roiHeight) or LTRB(lt.x, lt.y, rb.x, rb.y))

Change to roiTensorPtrSrc
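
The two ROI formats named in the parameter docs above, XYWH(xy.x, xy.y, roiWidth, roiHeight) and LTRB(lt.x, lt.y, rb.x, rb.y), can be converted into each other. The following is only a minimal sketch: the struct and function names are hypothetical stand-ins rather than the RPP types, and it assumes the rb corner is the last pixel inside the ROI (inclusive), which this thread does not state.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-ins for the ROI structs; these are NOT the RPP definitions.
struct RoiLTRB { std::int32_t ltX, ltY, rbX, rbY; };
struct RoiXYWH { std::int32_t x, y, width, height; };

// LTRB -> XYWH. Assumption: (rbX, rbY) is the last pixel inside the ROI,
// so the extent is the corner difference plus one.
inline RoiXYWH ltrbToXywh(const RoiLTRB &roi)
{
    assert(roi.rbX >= roi.ltX && roi.rbY >= roi.ltY);
    return { roi.ltX, roi.ltY, roi.rbX - roi.ltX + 1, roi.rbY - roi.ltY + 1 };
}

// XYWH -> LTRB, the inverse under the same inclusive-corner assumption.
inline RoiLTRB xywhToLtrb(const RoiXYWH &roi)
{
    assert(roi.width > 0 && roi.height > 0);
    return { roi.x, roi.y, roi.x + roi.width - 1, roi.y + roi.height - 1 };
}
```

If the corner were exclusive instead, the `+ 1` and `- 1` terms would drop out; the round trip would still be the identity either way.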

@@ -3135,6 +3138,217 @@ inline void compute_color_temperature_24_host(__m256 *p, __m256 pAdj)
p[2] = _mm256_sub_ps(p[2], pAdj); // color_temperature adjustment Bs
}

inline void compute_snow_host(RpptFloatRGB *pixel, Rpp32f brightnessCoefficient, Rpp32f snowCoefficient, Rpp32s darkMode)

Can these functions not be part of the host code for snow itself, in snow.hpp? @Dineshbabu-Ravichandran @sampath1117 Pls share your thoughts

Because in warp perspective case I remember the helper functions to be part of same host file

pH = avx_p0; // hue = 0.0f;
pS = avx_p0; // sat = 0.0f;
pAdd = avx_p0; // add = 0.0f;
pL = _mm256_mul_ps(_mm256_add_ps(pCmax, pCmin), _mm256_set1_ps(0.5f)); // l = delta * 0.5

I think the comment is wrong in this line
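
For context on the comment above: the quoted line computes (Cmax + Cmin) * 0.5, which is the usual HSL lightness, while "delta" normally denotes Cmax - Cmin. A scalar sketch of the two quantities follows; the struct and function names are hypothetical, and channel values are assumed to be normalized to the range 0 to 1.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical helper type for illustration only.
struct HslParts
{
    float lightness; // (cmax + cmin) * 0.5
    float delta;     // cmax - cmin
};

// Computes HSL lightness and the max/min spread for one normalized RGB pixel.
inline HslParts lightnessAndDelta(float r, float g, float b)
{
    assert(r >= 0.0f && r <= 1.0f);
    assert(g >= 0.0f && g <= 1.0f);
    assert(b >= 0.0f && b <= 1.0f);
    const float cmax = std::max({r, g, b});
    const float cmin = std::min({r, g, b});
    return { (cmax + cmin) * 0.5f, cmax - cmin };
}
```

For a pixel such as (1.0, 0.5, 0.5) the two values differ (lightness 0.75, delta 0.5), so a comment describing the lightness line in terms of delta would be misleading.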

@@ -1555,6 +1574,13 @@ inline void rpp_load24_f32pln3_to_f32pln3_avx(Rpp32f *srcPtrR, Rpp32f *srcPtrG,
p[2] = _mm256_loadu_ps(srcPtrB);
}

inline void rpp_load24_f16pln3_to_f32pln3_avx(Rpp16f *srcPtrR, Rpp16f *srcPtrG, Rpp16f *srcPtrB, __m256 *p)

Pls move this function before rpp_load24_f16pln3_to_f32pln3_avx as in threshold implementation

@@ -1647,6 +1673,11 @@ inline void rpp_load8_f32_to_f32_avx(Rpp32f *srcPtr, __m256 *p)
p[0] = _mm256_loadu_ps(srcPtr);
}

inline void rpp_load8_f16_to_f32_avx(Rpp16f *srcPtr, __m256 *p)

Pls move this function before rpp_load8_f32_to_f64_avx


int globalThreads_x = (dstDescPtr->strides.hStride + 7) >> 3;
int globalThreads_y = dstDescPtr->h;
int globalThreads_z = handle.GetBatchSize();

int globalThreads_z = dstDescPtr->n; maybe could be used here?
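
The `(hStride + 7) >> 3` expression in the quoted hunk is a ceiling division by 8, which fits kernels where each thread loads and stores 8 elements at a time. A small sketch of that arithmetic, using a hypothetical helper name:

```cpp
#include <cassert>

// Hypothetical helper: number of threads needed along x when every thread
// processes 8 consecutive elements of a row. Equivalent to ceil(n / 8)
// for non-negative n.
inline int threadsForRow(int rowStrideInElements)
{
    assert(rowStrideInElements >= 0);
    return (rowStrideInElements + 7) >> 3;
}
```

A row stride of 150 elements gives 19 threads; the last thread covers only 6 valid elements, so the kernel still needs its own bounds check.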

}
else if ((srcDescPtr->dataType == RpptDataType::F16) && (dstDescPtr->dataType == RpptDataType::F16))
{
snow_f16_f16_host_tensor((Rpp16f*) (static_cast<Rpp8u*>(srcPtr) + srcDescPtr->offsetInBytes),

Pls do reinterpret_cast<Rpp16f*> here

}
else if ((srcDescPtr->dataType == RpptDataType::F32) && (dstDescPtr->dataType == RpptDataType::F32))
{
snow_f32_f32_host_tensor((Rpp32f*) (static_cast<Rpp8u*>(srcPtr) + srcDescPtr->offsetInBytes),

Pls do reinterpret_cast<Rpp32f*> here
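
The cast pattern requested in these comments can be sketched as follows. The helper below is hypothetical and not part of RPP; it only illustrates stepping a `void*` by a byte offset with `static_cast` and then naming the final conversion explicitly with `reinterpret_cast` instead of a C-style cast.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper: advance a type-erased buffer by a byte offset and
// view the result as T*. The static_cast to a byte pointer makes the pointer
// arithmetic well defined; the reinterpret_cast documents the type change.
template <typename T>
inline T *offsetAs(void *base, std::size_t offsetInBytes)
{
    std::uint8_t *bytes = static_cast<std::uint8_t *>(base) + offsetInBytes;
    // The offset must keep the pointer suitably aligned for T.
    assert(reinterpret_cast<std::uintptr_t>(bytes) % alignof(T) == 0);
    return reinterpret_cast<T *>(bytes);
}
```

The generated code is the same as with the C-style cast; the benefit is that the conversion is easy to search for and cannot silently drop `const`.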

}
else if ((srcDescPtr->dataType == RpptDataType::F16) && (dstDescPtr->dataType == RpptDataType::F16))
{
hip_exec_snow_tensor((half*) (static_cast<Rpp8u*>(srcPtr) + srcDescPtr->offsetInBytes),

reinterpret_cast<half*> here

}
else if ((srcDescPtr->dataType == RpptDataType::F32) && (dstDescPtr->dataType == RpptDataType::F32))
{
hip_exec_snow_tensor((Rpp32f*) (static_cast<Rpp8u*>(srcPtr) + srcDescPtr->offsetInBytes),

Use reinterpret_cast

Dineshbabu-Ravichandran changed the base branch from develop to master September 26, 2024 11:13
Dineshbabu-Ravichandran changed the base branch from master to develop September 26, 2024 11:13
HazarathKumarM left a comment

@Dineshbabu-Ravichandran please resolve the comments

}
else if ((srcDescPtr->layout == RpptLayout::NCHW) && (dstDescPtr->layout == RpptLayout::NHWC))
{
globalThreads_x = (srcDescPtr->strides.hStride + 7) >> 3;

I think this line is repeated; the same code is already there at L347

I removed the line L406.

if (roiType == RpptRoiType::LTRB)
hip_exec_roi_converison_ltrb_to_xywh(roiTensorPtrSrc, handle);


remove the empty line here

rpp_hip_load8_and_unpack_to_float8(srcPtr + srcIdx, &pix_f8);
snow_hip_compute(srcPtr, &pix_f8, brightnessCoefficient, snowThreshold, darkMode);
rpp_hip_pack_float8_and_store8(dstPtr + dstIdx, &pix_f8);

remove the empty line


__device__ __forceinline__ void snow_8RGB_hip_compute(d_float24 *pix_f24, float *brightnessCoefficient, float *snowThreshold, int *darkMode)
{
snow_1RGB_hip_compute(&(pix_f24->f1[ 0]), &(pix_f24->f1[ 8]), &(pix_f24->f1[16]), brightnessCoefficient, snowThreshold, darkMode);

remove the extra space inside [] brackets


__device__ __forceinline__ void snow_1GRAY_hip_compute(float *pixel, float *brightnessCoefficient, float *snowThreshold, int *darkMode)
{
float l = *pixel;

please use some meaningful variable name instead of 'l'

I changed l to lightness.

l = l * fmaf((brightnessFactor - 1.0f), (1.0f - (l - lower_threshold) / (upper_threshold - lower_threshold)), 1.0f);
}
// Modify L
if(l <= *snowThreshold)

remove the {} brackets for this IF statement
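
As a reading aid for the `fmaf` expression in the hunk above: the multiplier it builds interpolates linearly from `brightnessFactor` at the lower threshold down to 1 at the upper threshold. The sketch below isolates only that multiplier. The function name is hypothetical, it uses `std::fma` in place of `fmaf`, and the conditions under which the PR applies the scaling are not reproduced because they are not fully visible in this thread.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper isolating the multiplier from the quoted expression:
//   fma(brightnessFactor - 1, 1 - (l - lower) / (upper - lower), 1)
// At l == lower the result is brightnessFactor; at l == upper it is 1.
inline float snowScale(float lightness, float brightnessFactor,
                       float lowerThreshold, float upperThreshold)
{
    assert(upperThreshold > lowerThreshold);
    const float t = (lightness - lowerThreshold) / (upperThreshold - lowerThreshold);
    return std::fma(brightnessFactor - 1.0f, 1.0f - t, 1.0f);
}
```

With thresholds 0 and 1 and a brightness factor of 2.5, a lightness of 0.5 gets a multiplier of 1.75, halfway between the two endpoints.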

pixel->R = hueCoefficient[0];
pixel->G = hueCoefficient[1];
pixel->B = hueCoefficient[2];

please remove the empty line here

__device__ __forceinline__ void snow_1GRAY_hip_compute(float *pixel, float *brightnessCoefficient, float *snowThreshold, int *darkMode)
{
float l = *pixel;
float lower_threshold = 0.0f;

Don't use snake case for variable names, use only camel case


// Modify L
if(l <= *snowThreshold && !((hue >= 0.514f && hue <= 0.63f) && (sat >= 0.196f) && (l >= 0.196f)))
{

please remove {} brackets if there is only one line inside conditional statements or loops.

Please check all such instances in this PR and remove the brackets

*pixelG = rgb_f4.y;
*pixelB = rgb_f4.z;
}

please remove the empty line here

* Make initial changes for raw CPP version of warp perspective

* Fix calls to compute_warp_perspective_src_loc function

* Update changes to go through nearest neighbours case

* AVX HOST codes for warp perspective initial

* Fixes for accuracy in warp perspective

* More fixes for accuracy in warp perspective

* Update the code for AVX version of Planar to Planar

* Add bilinear u8 host code for warp perspective

* Make updates to include functions for F32 data type

* Make updates to use cast instead of set and fix issues with raw C implementation

* Add i8 host codes

* Add updates for F16 Bilinear Code

* Update the initial HIP code for warp perspective

* Update fixes for HIP code

* Add Warp Perspective Nearest Neighbors F16 code for PKD3_to_PLN3 and PLN3_to_PLN3

* Add updates for PLN to PLN configuration

* Add updates for PKD3 to PKD3 case

* Rename variables

* Update changes to log images separately for Bilinear and Nearest Neighbors

* fixed bug in raw c code of PKD-PLN variant

* minor bug fix for F16 PLN variants

* minor fixes in HOST test suite

* Update the HIP code for review comments and refactoring of device functions

* Update the comments alignment

* Rename functions and add cases in HOST and HIP runTests.py

* Update indentations for computations and rename vectors

* Update documentations and add more reference variables

* Make more formatting changes

* Make further updates by including test cases

* Make updates to use reinterpret cast

* Update reinterpret casts for PLN to PLN configuration u8 and i8 codes

* Make updates to enclose code inside AVX2 flag

* Make further changes to update type casting

* Update the version

* Make updates to add warp perspective image

* Modify comments, update CHANGELOG and update flags

* Update further comments in warp perspective

* Add more comments for warp perspective

* Update based on further review comments

* Update the case number for warp_perspective in common.py

* Address review comments

* Fix conflicts with warp perspective

* Update version details

* Merge branch 'ar/opt_warp_perspective' of https://github.com/r-abishek/rpp into opt_warp_perspective_rebased

* Update version to 1.9.10 including warp perspective

* Updates to convert to XYWH from LTRB instead of opposite

* Update CHANGELOG.md

Co-authored-by: spolifroni-amd <Sandra.Polifroni@amd.com>

* Revert changes and convert to ltrb instead of xywh

---------

Co-authored-by: Srihari-mcw <srihari@multicorewareinc.com>
Co-authored-by: sampath1117 <sampath.rachumallu@multicorewareinc.com>
Co-authored-by: Kiriti Gowda <kiritigowda@gmail.com>
Co-authored-by: Rajy Rawther <Rajy.MeeyakhanRawther@amd.com>
Co-authored-by: spolifroni-amd <Sandra.Polifroni@amd.com>
5 participants