CUB
DeviceSelect provides device-wide, parallel operations for compacting selected items from sequences of data items residing within global memory.
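Usage considerations: when calling DeviceSelect methods from kernel code (dynamic parallelism), define the CUB_CDP macro in your compiler's macro definitions.
[Performance chart: int32 items, where 50% of the items are randomly selected]
[Performance chart: int32 items where segments have lengths uniformly sampled from [1,1000]]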
Definition at line 82 of file device_select.cuh.
Static Public Methods | |
| template<typename InputIterator , typename FlagIterator , typename OutputIterator , typename NumSelectedIterator > | |
| CUB_RUNTIME_FUNCTION static __forceinline__ cudaError_t | Flagged (void *d_temp_storage, size_t &temp_storage_bytes, InputIterator d_in, FlagIterator d_flags, OutputIterator d_out, NumSelectedIterator d_num_selected, int num_items, cudaStream_t stream=0, bool debug_synchronous=false) |
Uses the d_flags sequence to selectively copy the corresponding items from d_in into d_out. The total number of items selected is written to d_num_selected.
| template<typename InputIterator , typename OutputIterator , typename NumSelectedIterator , typename SelectOp > | |
| CUB_RUNTIME_FUNCTION static __forceinline__ cudaError_t | If (void *d_temp_storage, size_t &temp_storage_bytes, InputIterator d_in, OutputIterator d_out, NumSelectedIterator d_num_selected, int num_items, SelectOp select_op, cudaStream_t stream=0, bool debug_synchronous=false) |
Uses the select_op functor to selectively copy items from d_in into d_out. The total number of items selected is written to d_num_selected.
| template<typename InputIterator , typename OutputIterator , typename NumSelectedIterator > | |
| CUB_RUNTIME_FUNCTION static __forceinline__ cudaError_t | Unique (void *d_temp_storage, size_t &temp_storage_bytes, InputIterator d_in, OutputIterator d_out, NumSelectedIterator d_num_selected, int num_items, cudaStream_t stream=0, bool debug_synchronous=false) |
Given an input sequence d_in having runs of consecutive equal-valued keys, only the first key from each run is selectively copied to d_out. The total number of items selected is written to d_num_selected.
DeviceSelect::Flagged [inline, static]
Uses the d_flags sequence to selectively copy the corresponding items from d_in into d_out. The total number of items selected is written to d_num_selected.
The value type of d_flags must be castable to bool (e.g., bool, char, int, etc.).
Selected items are compacted into d_out and maintain their original relative ordering.
When d_temp_storage is NULL, no work is done and the required allocation size is returned in temp_storage_bytes.
When calling this method from kernel code, define the CUB_CDP macro in your compiler's macro definitions.
[Snippet: compaction of items selected from an int device vector]
Template Parameters
| InputIterator | [inferred] Random-access input iterator type for reading input items (may be a simple pointer type) |
| FlagIterator | [inferred] Random-access input iterator type for reading selection flags (may be a simple pointer type) |
| OutputIterator | [inferred] Random-access output iterator type for writing selected items (may be a simple pointer type) |
| NumSelectedIterator | [inferred] Output iterator type for recording the number of items selected (may be a simple pointer type) |
Parameters
| [in] | d_temp_storage | Device allocation of temporary storage. When NULL, the required allocation size is written to temp_storage_bytes and no work is done. |
| [in,out] | temp_storage_bytes | Reference to size in bytes of d_temp_storage allocation |
| [in] | d_in | Pointer to the input sequence of data items |
| [in] | d_flags | Pointer to the input sequence of selection flags |
| [out] | d_out | Pointer to the output sequence of selected data items |
| [out] | d_num_selected | Pointer to the output total number of items selected (i.e., length of d_out) |
| [in] | num_items | Total number of input items (i.e., length of d_in) |
| [in] | stream | [optional] CUDA stream to launch kernels within. Default is stream 0. |
| [in] | debug_synchronous | [optional] Whether or not to synchronize the stream after every kernel launch to check for errors. May cause significant slowdown. Default is false. |
Definition at line 134 of file device_select.cuh.
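The two-phase calling convention described above (a first call with a NULL d_temp_storage to query the size, then a second call that does the work) can be sketched as follows. This is a minimal illustration, not the library's own snippet: the helper name select_flagged and the host-vector interface are assumptions made for the example, and error checking of the CUDA and CUB return codes is omitted for brevity.

```cuda
#include <cub/cub.cuh>
#include <cuda_runtime.h>
#include <cassert>
#include <vector>

// Hypothetical helper (not part of CUB): copies host data to the device,
// runs cub::DeviceSelect::Flagged, and returns the selected items.
std::vector<int> select_flagged(const std::vector<int>& h_in,
                                const std::vector<char>& h_flags)
{
    assert(h_in.size() == h_flags.size());
    int num_items = static_cast<int>(h_in.size());

    int  *d_in = nullptr, *d_out = nullptr, *d_num_selected = nullptr;
    char *d_flags = nullptr;
    cudaMalloc(&d_in, num_items * sizeof(int));
    cudaMalloc(&d_flags, num_items * sizeof(char));
    cudaMalloc(&d_out, num_items * sizeof(int));   // worst case: all selected
    cudaMalloc(&d_num_selected, sizeof(int));
    cudaMemcpy(d_in, h_in.data(), num_items * sizeof(int), cudaMemcpyHostToDevice);
    cudaMemcpy(d_flags, h_flags.data(), num_items * sizeof(char), cudaMemcpyHostToDevice);

    // First call: d_temp_storage is NULL, so no work is done and only
    // temp_storage_bytes is written.
    void  *d_temp_storage = nullptr;
    size_t temp_storage_bytes = 0;
    cub::DeviceSelect::Flagged(d_temp_storage, temp_storage_bytes,
                               d_in, d_flags, d_out, d_num_selected, num_items);

    // Second call: same arguments, now with real temporary storage.
    cudaMalloc(&d_temp_storage, temp_storage_bytes);
    cub::DeviceSelect::Flagged(d_temp_storage, temp_storage_bytes,
                               d_in, d_flags, d_out, d_num_selected, num_items);

    // cudaMemcpy on the default stream waits for the selection to finish.
    int num_selected = 0;
    cudaMemcpy(&num_selected, d_num_selected, sizeof(int), cudaMemcpyDeviceToHost);
    assert(num_selected >= 0 && num_selected <= num_items);
    std::vector<int> h_out(num_selected);
    cudaMemcpy(h_out.data(), d_out, num_selected * sizeof(int), cudaMemcpyDeviceToHost);

    cudaFree(d_temp_storage);
    cudaFree(d_num_selected);
    cudaFree(d_out);
    cudaFree(d_flags);
    cudaFree(d_in);
    return h_out;
}
```

For example, calling select_flagged with items {1, 2, 3, 4, 5, 6, 7, 8} and flags {1, 0, 0, 1, 0, 1, 1, 0} should return {1, 4, 6, 7}, with the selected items in their original relative order.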
DeviceSelect::If [inline, static]
Uses the select_op functor to selectively copy items from d_in into d_out. The total number of items selected is written to d_num_selected.
Selected items are compacted into d_out and maintain their original relative ordering.
When d_temp_storage is NULL, no work is done and the required allocation size is returned in temp_storage_bytes.
When calling this method from kernel code, define the CUB_CDP macro in your compiler's macro definitions.
[Performance charts: int32 and int64 items, respectively. Items are selected with 50% probability.]
[Snippet: compaction of items selected from an int device vector]
Template Parameters
| InputIterator | [inferred] Random-access input iterator type for reading input items (may be a simple pointer type) |
| OutputIterator | [inferred] Random-access output iterator type for writing selected items (may be a simple pointer type) |
| NumSelectedIterator | [inferred] Output iterator type for recording the number of items selected (may be a simple pointer type) |
| SelectOp | [inferred] Selection operator type having member bool operator()(const T &a) |
Parameters
| [in] | d_temp_storage | Device allocation of temporary storage. When NULL, the required allocation size is written to temp_storage_bytes and no work is done. |
| [in,out] | temp_storage_bytes | Reference to size in bytes of d_temp_storage allocation |
| [in] | d_in | Pointer to the input sequence of data items |
| [out] | d_out | Pointer to the output sequence of selected data items |
| [out] | d_num_selected | Pointer to the output total number of items selected (i.e., length of d_out) |
| [in] | num_items | Total number of input items (i.e., length of d_in) |
| [in] | select_op | Unary selection operator |
| [in] | stream | [optional] CUDA stream to launch kernels within. Default is stream 0. |
| [in] | debug_synchronous | [optional] Whether or not to synchronize the stream after every kernel launch to check for errors. May cause significant slowdown. Default is false. |
Definition at line 241 of file device_select.cuh.
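A selection functor for If only needs a bool operator()(const T &a) that is callable on the device. The sketch below selects the items smaller than a threshold. The functor LessThan, the helper select_less_than, and the host-vector interface are assumptions made for this example rather than names from the library, and return codes are not checked.

```cuda
#include <cub/cub.cuh>
#include <cuda_runtime.h>
#include <cassert>
#include <vector>

// Illustrative selection operator: keeps items strictly below `compare`.
struct LessThan
{
    int compare;
    __host__ __device__ explicit LessThan(int compare) : compare(compare) {}
    __host__ __device__ bool operator()(const int &a) const { return a < compare; }
};

// Hypothetical helper (not part of CUB): runs cub::DeviceSelect::If on a
// host vector and returns the selected items.
std::vector<int> select_less_than(const std::vector<int>& h_in, int threshold)
{
    int num_items = static_cast<int>(h_in.size());

    int *d_in = nullptr, *d_out = nullptr, *d_num_selected = nullptr;
    cudaMalloc(&d_in, num_items * sizeof(int));
    cudaMalloc(&d_out, num_items * sizeof(int));   // worst case: all selected
    cudaMalloc(&d_num_selected, sizeof(int));
    cudaMemcpy(d_in, h_in.data(), num_items * sizeof(int), cudaMemcpyHostToDevice);

    LessThan select_op(threshold);

    // First call only reports the temporary storage requirement.
    void  *d_temp_storage = nullptr;
    size_t temp_storage_bytes = 0;
    cub::DeviceSelect::If(d_temp_storage, temp_storage_bytes,
                          d_in, d_out, d_num_selected, num_items, select_op);

    // Second call performs the selection.
    cudaMalloc(&d_temp_storage, temp_storage_bytes);
    cub::DeviceSelect::If(d_temp_storage, temp_storage_bytes,
                          d_in, d_out, d_num_selected, num_items, select_op);

    int num_selected = 0;
    cudaMemcpy(&num_selected, d_num_selected, sizeof(int), cudaMemcpyDeviceToHost);
    assert(num_selected >= 0 && num_selected <= num_items);
    std::vector<int> h_out(num_selected);
    cudaMemcpy(h_out.data(), d_out, num_selected * sizeof(int), cudaMemcpyDeviceToHost);

    cudaFree(d_temp_storage);
    cudaFree(d_num_selected);
    cudaFree(d_out);
    cudaFree(d_in);
    return h_out;
}
```

With the input {0, 2, 3, 9, 5, 2, 81, 8} and a threshold of 7, the helper should return {0, 2, 3, 5, 2}. Passing the functor by value keeps its state (here, the threshold) available on the device without a separate allocation.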
DeviceSelect::Unique [inline, static]
Given an input sequence d_in having runs of consecutive equal-valued keys, only the first key from each run is selectively copied to d_out. The total number of items selected is written to d_num_selected.
The == equality operator is used to determine whether keys are equivalent.
Selected items are compacted into d_out and maintain their original relative ordering.
When d_temp_storage is NULL, no work is done and the required allocation size is returned in temp_storage_bytes.
When calling this method from kernel code, define the CUB_CDP macro in your compiler's macro definitions.
[Performance charts: int32 and int64 items, respectively. Segments have lengths uniformly sampled from [1,1000].]
[Snippet: compaction of items selected from an int device vector]
Template Parameters
| InputIterator | [inferred] Random-access input iterator type for reading input items (may be a simple pointer type) |
| OutputIterator | [inferred] Random-access output iterator type for writing selected items (may be a simple pointer type) |
| NumSelectedIterator | [inferred] Output iterator type for recording the number of items selected (may be a simple pointer type) |
Parameters
| [in] | d_temp_storage | Device allocation of temporary storage. When NULL, the required allocation size is written to temp_storage_bytes and no work is done. |
| [in,out] | temp_storage_bytes | Reference to size in bytes of d_temp_storage allocation |
| [in] | d_in | Pointer to the input sequence of data items |
| [out] | d_out | Pointer to the output sequence of selected data items |
| [out] | d_num_selected | Pointer to the output total number of items selected (i.e., length of d_out) |
| [in] | num_items | Total number of input items (i.e., length of d_in) |
| [in] | stream | [optional] CUDA stream to launch kernels within. Default is stream 0. |
| [in] | debug_synchronous | [optional] Whether or not to synchronize the stream after every kernel launch to check for errors. May cause significant slowdown. Default is false. |
Definition at line 332 of file device_select.cuh.
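Because Unique keeps only the first key of each run of consecutive equal keys, equal values that are not adjacent are both retained. The sketch below shows this on a small input. The helper name select_unique and its host-vector interface are assumptions made for the example, and return codes are not checked.

```cuda
#include <cub/cub.cuh>
#include <cuda_runtime.h>
#include <cassert>
#include <vector>

// Hypothetical helper (not part of CUB): runs cub::DeviceSelect::Unique on
// a host vector and returns the first key of every run.
std::vector<int> select_unique(const std::vector<int>& h_in)
{
    int num_items = static_cast<int>(h_in.size());

    int *d_in = nullptr, *d_out = nullptr, *d_num_selected = nullptr;
    cudaMalloc(&d_in, num_items * sizeof(int));
    cudaMalloc(&d_out, num_items * sizeof(int));   // worst case: no repeats
    cudaMalloc(&d_num_selected, sizeof(int));
    cudaMemcpy(d_in, h_in.data(), num_items * sizeof(int), cudaMemcpyHostToDevice);

    // First call only reports the temporary storage requirement.
    void  *d_temp_storage = nullptr;
    size_t temp_storage_bytes = 0;
    cub::DeviceSelect::Unique(d_temp_storage, temp_storage_bytes,
                              d_in, d_out, d_num_selected, num_items);

    // Second call performs the run-wise de-duplication.
    cudaMalloc(&d_temp_storage, temp_storage_bytes);
    cub::DeviceSelect::Unique(d_temp_storage, temp_storage_bytes,
                              d_in, d_out, d_num_selected, num_items);

    int num_selected = 0;
    cudaMemcpy(&num_selected, d_num_selected, sizeof(int), cudaMemcpyDeviceToHost);
    assert(num_selected >= 0 && num_selected <= num_items);
    std::vector<int> h_out(num_selected);
    cudaMemcpy(h_out.data(), d_out, num_selected * sizeof(int), cudaMemcpyDeviceToHost);

    cudaFree(d_temp_storage);
    cudaFree(d_num_selected);
    cudaFree(d_out);
    cudaFree(d_in);
    return h_out;
}
```

For the input {0, 2, 2, 9, 5, 5, 5, 8} the helper should return {0, 2, 9, 5, 8}. An input such as {1, 2, 1} is returned unchanged, since the two 1s are not consecutive; sort the keys first if globally distinct values are needed.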