|
CUB
|
DeviceRadixSort provides device-wide, parallel operations for computing a radix sort across a sequence of data items residing within global memory.
unsigned char, int, double, etc. Although the direct radix sorting method can only be applied to unsigned integral types, DeviceRadixSort is able to sort signed and floating-point types via simple bit-wise transformations that ensure lexicographic key ordering.CUB_CDP macro in your compiler's macro definitions.uint32 keys. Performance plots for other scenarios can be found in the detailed method descriptions below.
Definition at line 81 of file device_radix_sort.cuh.
Static Public Methods | |
Key-value pairs | |
| template<typename Key , typename Value > | |
| static CUB_RUNTIME_FUNCTION cudaError_t | SortPairs (void *d_temp_storage, size_t &temp_storage_bytes, Key *d_keys_in, Key *d_keys_out, Value *d_values_in, Value *d_values_out, int num_items, int begin_bit=0, int end_bit=sizeof(Key)*8, cudaStream_t stream=0, bool debug_synchronous=false) |
| Sorts key-value pairs into ascending order. (~2N auxiliary storage required) More... | |
| template<typename Key , typename Value > | |
| static CUB_RUNTIME_FUNCTION cudaError_t | SortPairs (void *d_temp_storage, size_t &temp_storage_bytes, DoubleBuffer< Key > &d_keys, DoubleBuffer< Value > &d_values, int num_items, int begin_bit=0, int end_bit=sizeof(Key)*8, cudaStream_t stream=0, bool debug_synchronous=false) |
| Sorts key-value pairs into ascending order. (~N auxiliary storage required) More... | |
| template<typename Key , typename Value > | |
| static CUB_RUNTIME_FUNCTION cudaError_t | SortPairsDescending (void *d_temp_storage, size_t &temp_storage_bytes, Key *d_keys_in, Key *d_keys_out, Value *d_values_in, Value *d_values_out, int num_items, int begin_bit=0, int end_bit=sizeof(Key)*8, cudaStream_t stream=0, bool debug_synchronous=false) |
| Sorts key-value pairs into descending order. (~2N auxiliary storage required). More... | |
| template<typename Key , typename Value > | |
| static CUB_RUNTIME_FUNCTION cudaError_t | SortPairsDescending (void *d_temp_storage, size_t &temp_storage_bytes, DoubleBuffer< Key > &d_keys, DoubleBuffer< Value > &d_values, int num_items, int begin_bit=0, int end_bit=sizeof(Key)*8, cudaStream_t stream=0, bool debug_synchronous=false) |
| Sorts key-value pairs into descending order. (~N auxiliary storage required). More... | |
Keys-only | |
| template<typename Key > | |
| static CUB_RUNTIME_FUNCTION cudaError_t | SortKeys (void *d_temp_storage, size_t &temp_storage_bytes, Key *d_keys_in, Key *d_keys_out, int num_items, int begin_bit=0, int end_bit=sizeof(Key)*8, cudaStream_t stream=0, bool debug_synchronous=false) |
| Sorts keys into ascending order. (~2N auxiliary storage required) More... | |
| template<typename Key > | |
| static CUB_RUNTIME_FUNCTION cudaError_t | SortKeys (void *d_temp_storage, size_t &temp_storage_bytes, DoubleBuffer< Key > &d_keys, int num_items, int begin_bit=0, int end_bit=sizeof(Key)*8, cudaStream_t stream=0, bool debug_synchronous=false) |
| Sorts keys into ascending order. (~N auxiliary storage required). More... | |
| template<typename Key > | |
| static CUB_RUNTIME_FUNCTION cudaError_t | SortKeysDescending (void *d_temp_storage, size_t &temp_storage_bytes, Key *d_keys_in, Key *d_keys_out, int num_items, int begin_bit=0, int end_bit=sizeof(Key)*8, cudaStream_t stream=0, bool debug_synchronous=false) |
| Sorts keys into descending order. (~2N auxiliary storage required). More... | |
| template<typename Key > | |
| static CUB_RUNTIME_FUNCTION cudaError_t | SortKeysDescending (void *d_temp_storage, size_t &temp_storage_bytes, DoubleBuffer< Key > &d_keys, int num_items, int begin_bit=0, int end_bit=sizeof(Key)*8, cudaStream_t stream=0, bool debug_synchronous=false) |
| Sorts keys into descending order. (~N auxiliary storage required). More... | |
|
inlinestatic |
Sorts key-value pairs into ascending order. (~2N auxiliary storage required)
[begin_bit, end_bit) of differentiating key bits can be specified. This can reduce overall sorting overhead and yield a corresponding performance improvement.N+P), where N is the length of the input and P is the number of streaming multiprocessors on the device. For sorting using only O(P) temporary storage, see the sorting interface using DoubleBuffer wrappers below.d_temp_storage is NULL, no work is done and the required allocation size is returned in temp_storage_bytes.uint32,uint32 and uint64,uint64 pairs, respectively.
int keys with associated vector of int values. | Key | [inferred] Key type |
| Value | [inferred] Value type |
| [in] | d_temp_storage | Device allocation of temporary storage. When NULL, the required allocation size is written to temp_storage_bytes and no work is done. |
| [in,out] | temp_storage_bytes | Reference to size in bytes of d_temp_storage allocation |
| [in] | d_keys_in | Pointer to the input data of key data to sort |
| [out] | d_keys_out | Pointer to the sorted output sequence of key data |
| [in] | d_values_in | Pointer to the corresponding input sequence of associated value items |
| [out] | d_values_out | Pointer to the correspondingly-reordered output sequence of associated value items |
| [in] | num_items | Number of items to reduce |
| [in] | begin_bit | [optional] The least-significant bit index (inclusive) needed for key comparison |
| [in] | end_bit | [optional] The most-significant bit index (exclusive) needed for key comparison (e.g., sizeof(unsigned int) * 8) |
| [in] | stream | [optional] CUDA stream to launch kernels within. Default is stream0. |
| [in] | debug_synchronous | [optional] Whether or not to synchronize the stream after every kernel launch to check for errors. Also causes launch configurations to be printed to the console. Default is false. |
Definition at line 147 of file device_radix_sort.cuh.
|
inlinestatic |
Sorts key-value pairs into ascending order. (~N auxiliary storage required)
[begin_bit, end_bit) of differentiating key bits can be specified. This can reduce overall sorting overhead and yield a corresponding performance improvement.P), where P is the number of streaming multiprocessors on the device (and is typically a small constant relative to the input size N).d_temp_storage is NULL, no work is done and the required allocation size is returned in temp_storage_bytes.uint32,uint32 and uint64,uint64 pairs, respectively.
int keys with associated vector of int values. | Key | [inferred] Key type |
| Value | [inferred] Value type |
| [in] | d_temp_storage | Device allocation of temporary storage. When NULL, the required allocation size is written to temp_storage_bytes and no work is done. |
| [in,out] | temp_storage_bytes | Reference to size in bytes of d_temp_storage allocation |
| [in,out] | d_keys | Reference to the double-buffer of keys whose "current" buffer contains the unsorted input keys and, upon return, is updated to point to the sorted output keys |
| [in,out] | d_values | Double-buffer of values whose "current" buffer contains the unsorted input values and, upon return, is updated to point to the sorted output values |
| [in] | num_items | Number of items to reduce |
| [in] | begin_bit | [optional] The least-significant bit index (inclusive) needed for key comparison |
| [in] | end_bit | [optional] The most-significant bit index (exclusive) needed for key comparison (e.g., sizeof(unsigned int) * 8) |
| [in] | stream | [optional] CUDA stream to launch kernels within. Default is stream0. |
| [in] | debug_synchronous | [optional] Whether or not to synchronize the stream after every kernel launch to check for errors. Also causes launch configurations to be printed to the console. Default is false. |
Definition at line 248 of file device_radix_sort.cuh.
|
inlinestatic |
Sorts key-value pairs into descending order. (~2N auxiliary storage required).
[begin_bit, end_bit) of differentiating key bits can be specified. This can reduce overall sorting overhead and yield a corresponding performance improvement.N+P), where N is the length of the input and P is the number of streaming multiprocessors on the device. For sorting using only O(P) temporary storage, see the sorting interface using DoubleBuffer wrappers below.d_temp_storage is NULL, no work is done and the required allocation size is returned in temp_storage_bytes.int keys with associated vector of int values. | Key | [inferred] Key type |
| Value | [inferred] Value type |
| [in] | d_temp_storage | Device allocation of temporary storage. When NULL, the required allocation size is written to temp_storage_bytes and no work is done. |
| [in,out] | temp_storage_bytes | Reference to size in bytes of d_temp_storage allocation |
| [in] | d_keys_in | Pointer to the input data of key data to sort |
| [out] | d_keys_out | Pointer to the sorted output sequence of key data |
| [in] | d_values_in | Pointer to the corresponding input sequence of associated value items |
| [out] | d_values_out | Pointer to the correspondingly-reordered output sequence of associated value items |
| [in] | num_items | Number of items to reduce |
| [in] | begin_bit | [optional] The least-significant bit index (inclusive) needed for key comparison |
| [in] | end_bit | [optional] The most-significant bit index (exclusive) needed for key comparison (e.g., sizeof(unsigned int) * 8) |
| [in] | stream | [optional] CUDA stream to launch kernels within. Default is stream0. |
| [in] | debug_synchronous | [optional] Whether or not to synchronize the stream after every kernel launch to check for errors. Also causes launch configurations to be printed to the console. Default is false. |
Definition at line 328 of file device_radix_sort.cuh.
|
inlinestatic |
Sorts key-value pairs into descending order. (~N auxiliary storage required).
[begin_bit, end_bit) of differentiating key bits can be specified. This can reduce overall sorting overhead and yield a corresponding performance improvement.P), where P is the number of streaming multiprocessors on the device (and is typically a small constant relative to the input size N).d_temp_storage is NULL, no work is done and the required allocation size is returned in temp_storage_bytes.int keys with associated vector of int values. | Key | [inferred] Key type |
| Value | [inferred] Value type |
| [in] | d_temp_storage | Device allocation of temporary storage. When NULL, the required allocation size is written to temp_storage_bytes and no work is done. |
| [in,out] | temp_storage_bytes | Reference to size in bytes of d_temp_storage allocation |
| [in,out] | d_keys | Reference to the double-buffer of keys whose "current" buffer contains the unsorted input keys and, upon return, is updated to point to the sorted output keys |
| [in,out] | d_values | Double-buffer of values whose "current" buffer contains the unsorted input values and, upon return, is updated to point to the sorted output values |
| [in] | num_items | Number of items to reduce |
| [in] | begin_bit | [optional] The least-significant bit index (inclusive) needed for key comparison |
| [in] | end_bit | [optional] The most-significant bit index (exclusive) needed for key comparison (e.g., sizeof(unsigned int) * 8) |
| [in] | stream | [optional] CUDA stream to launch kernels within. Default is stream0. |
| [in] | debug_synchronous | [optional] Whether or not to synchronize the stream after every kernel launch to check for errors. Also causes launch configurations to be printed to the console. Default is false. |
Definition at line 424 of file device_radix_sort.cuh.
|
inlinestatic |
Sorts keys into ascending order. (~2N auxiliary storage required)
[begin_bit, end_bit) of differentiating key bits can be specified. This can reduce overall sorting overhead and yield a corresponding performance improvement.N+P), where N is the length of the input and P is the number of streaming multiprocessors on the device. For sorting using only O(P) temporary storage, see the sorting interface using DoubleBuffer wrappers below.d_temp_storage is NULL, no work is done and the required allocation size is returned in temp_storage_bytes.uint32 and uint64 keys, respectively.
int keys. | Key | [inferred] Key type |
| [in] | d_temp_storage | Device allocation of temporary storage. When NULL, the required allocation size is written to temp_storage_bytes and no work is done. |
| [in,out] | temp_storage_bytes | Reference to size in bytes of d_temp_storage allocation |
| [in] | d_keys_in | Pointer to the input data of key data to sort |
| [out] | d_keys_out | Pointer to the sorted output sequence of key data |
| [in] | num_items | Number of items to reduce |
| [in] | begin_bit | [optional] The least-significant bit index (inclusive) needed for key comparison |
| [in] | end_bit | [optional] The most-significant bit index (exclusive) needed for key comparison (e.g., sizeof(unsigned int) * 8) |
| [in] | stream | [optional] CUDA stream to launch kernels within. Default is stream0. |
| [in] | debug_synchronous | [optional] Whether or not to synchronize the stream after every kernel launch to check for errors. Also causes launch configurations to be printed to the console. Default is false. |
Definition at line 506 of file device_radix_sort.cuh.
|
inlinestatic |
Sorts keys into ascending order. (~N auxiliary storage required).
[begin_bit, end_bit) of differentiating key bits can be specified. This can reduce overall sorting overhead and yield a corresponding performance improvement.P), where P is the number of streaming multiprocessors on the device (and is typically a small constant relative to the input size N).d_temp_storage is NULL, no work is done and the required allocation size is returned in temp_storage_bytes.uint32 and uint64 keys, respectively.
int keys. | Key | [inferred] Key type |
| [in] | d_temp_storage | Device allocation of temporary storage. When NULL, the required allocation size is written to temp_storage_bytes and no work is done. |
| [in,out] | temp_storage_bytes | Reference to size in bytes of d_temp_storage allocation |
| [in,out] | d_keys | Reference to the double-buffer of keys whose "current" buffer contains the unsorted input keys and, upon return, is updated to point to the sorted output keys |
| [in] | num_items | Number of items to reduce |
| [in] | begin_bit | [optional] The least-significant bit index (inclusive) needed for key comparison |
| [in] | end_bit | [optional] The most-significant bit index (exclusive) needed for key comparison (e.g., sizeof(unsigned int) * 8) |
| [in] | stream | [optional] CUDA stream to launch kernels within. Default is stream0. |
| [in] | debug_synchronous | [optional] Whether or not to synchronize the stream after every kernel launch to check for errors. Also causes launch configurations to be printed to the console. Default is false. |
Definition at line 595 of file device_radix_sort.cuh.
|
inlinestatic |
Sorts keys into descending order. (~2N auxiliary storage required).
[begin_bit, end_bit) of differentiating key bits can be specified. This can reduce overall sorting overhead and yield a corresponding performance improvement.N+P), where N is the length of the input and P is the number of streaming multiprocessors on the device. For sorting using only O(P) temporary storage, see the sorting interface using DoubleBuffer wrappers below.d_temp_storage is NULL, no work is done and the required allocation size is returned in temp_storage_bytes.int keys. | Key | [inferred] Key type |
| [in] | d_temp_storage | Device allocation of temporary storage. When NULL, the required allocation size is written to temp_storage_bytes and no work is done. |
| [in,out] | temp_storage_bytes | Reference to size in bytes of d_temp_storage allocation |
| [in] | d_keys_in | Pointer to the input data of key data to sort |
| [out] | d_keys_out | Pointer to the sorted output sequence of key data |
| [in] | num_items | Number of items to reduce |
| [in] | begin_bit | [optional] The least-significant bit index (inclusive) needed for key comparison |
| [in] | end_bit | [optional] The most-significant bit index (exclusive) needed for key comparison (e.g., sizeof(unsigned int) * 8) |
| [in] | stream | [optional] CUDA stream to launch kernels within. Default is stream0. |
| [in] | debug_synchronous | [optional] Whether or not to synchronize the stream after every kernel launch to check for errors. Also causes launch configurations to be printed to the console. Default is false. |
Definition at line 670 of file device_radix_sort.cuh.
|
inlinestatic |
Sorts keys into descending order. (~N auxiliary storage required).
[begin_bit, end_bit) of differentiating key bits can be specified. This can reduce overall sorting overhead and yield a corresponding performance improvement.P), where P is the number of streaming multiprocessors on the device (and is typically a small constant relative to the input size N).d_temp_storage is NULL, no work is done and the required allocation size is returned in temp_storage_bytes.int keys. | Key | [inferred] Key type |
| [in] | d_temp_storage | Device allocation of temporary storage. When NULL, the required allocation size is written to temp_storage_bytes and no work is done. |
| [in,out] | temp_storage_bytes | Reference to size in bytes of d_temp_storage allocation |
| [in,out] | d_keys | Reference to the double-buffer of keys whose "current" buffer contains the unsorted input keys and, upon return, is updated to point to the sorted output keys |
| [in] | num_items | Number of items to reduce |
| [in] | begin_bit | [optional] The least-significant bit index (inclusive) needed for key comparison |
| [in] | end_bit | [optional] The most-significant bit index (exclusive) needed for key comparison (e.g., sizeof(unsigned int) * 8) |
| [in] | stream | [optional] CUDA stream to launch kernels within. Default is stream0. |
| [in] | debug_synchronous | [optional] Whether or not to synchronize the stream after every kernel launch to check for errors. Also causes launch configurations to be printed to the console. Default is false. |
Definition at line 754 of file device_radix_sort.cuh.
1.8.4