 cub::ArgIndexInputIterator< InputIteratorT, OffsetT > | A random-access input wrapper for pairing dereferenced values with their corresponding indices (forming KeyValuePair tuples) |
 cub::ArgMax | Arg max functor (keeps the value and offset of the first occurrence of the largest item) |
 cub::ArgMin | Arg min functor (keeps the value and offset of the first occurrence of the smallest item) |
 cub::BlockDiscontinuity< T, BLOCK_DIM_X, BLOCK_DIM_Y, BLOCK_DIM_Z, PTX_ARCH > | The BlockDiscontinuity class provides collective methods for flagging discontinuities within an ordered set of items partitioned across a CUDA thread block |
 cub::BlockExchange< T, BLOCK_DIM_X, ITEMS_PER_THREAD, WARP_TIME_SLICING, BLOCK_DIM_Y, BLOCK_DIM_Z, PTX_ARCH > | The BlockExchange class provides collective methods for rearranging data partitioned across a CUDA thread block |
 cub::BlockHistogram< T, BLOCK_DIM_X, ITEMS_PER_THREAD, BINS, ALGORITHM, BLOCK_DIM_Y, BLOCK_DIM_Z, PTX_ARCH > | The BlockHistogram class provides collective methods for constructing block-wide histograms from data samples partitioned across a CUDA thread block |
 cub::BlockLoad< InputIteratorT, BLOCK_DIM_X, ITEMS_PER_THREAD, ALGORITHM, BLOCK_DIM_Y, BLOCK_DIM_Z, PTX_ARCH > | The BlockLoad class provides collective data movement methods for loading a linear segment of items from memory into a blocked arrangement across a CUDA thread block |
 cub::BlockRadixSort< KeyT, BLOCK_DIM_X, ITEMS_PER_THREAD, ValueT, RADIX_BITS, MEMOIZE_OUTER_SCAN, INNER_SCAN_ALGORITHM, SMEM_CONFIG, BLOCK_DIM_Y, BLOCK_DIM_Z, PTX_ARCH > | The BlockRadixSort class provides collective methods for sorting items partitioned across a CUDA thread block using a radix sorting method |
 cub::BlockReduce< T, BLOCK_DIM_X, ALGORITHM, BLOCK_DIM_Y, BLOCK_DIM_Z, PTX_ARCH > | The BlockReduce class provides collective methods for computing a parallel reduction of items partitioned across a CUDA thread block |
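As a hedged illustration of this entry, a block-wide sum with BlockReduce might look like the sketch below. The 128-thread block size and the names `BlockSumKernel`, `d_in`, and `d_out` are assumptions for the example, not part of this index:

```cuda
#include <cub/cub.cuh>

// Sketch: each of 128 threads contributes one int; thread 0 receives the block sum.
__global__ void BlockSumKernel(const int *d_in, int *d_out)
{
    // Specialize BlockReduce for a 1D block of 128 threads on int items
    typedef cub::BlockReduce<int, 128> BlockReduce;

    // Opaque shared-memory storage required by the collective
    __shared__ typename BlockReduce::TempStorage temp_storage;

    int thread_data = d_in[threadIdx.x];

    // Collectively compute the block-wide sum (the return value is
    // only valid in thread 0)
    int block_sum = BlockReduce(temp_storage).Sum(thread_data);

    if (threadIdx.x == 0) *d_out = block_sum;
}
```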
 cub::BlockScan< T, BLOCK_DIM_X, ALGORITHM, BLOCK_DIM_Y, BLOCK_DIM_Z, PTX_ARCH > | The BlockScan class provides collective methods for computing a parallel prefix sum/scan of items partitioned across a CUDA thread block |
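A minimal sketch of a block-wide exclusive prefix sum with BlockScan follows; the 128-thread block size and the names `BlockPrefixSumKernel` and `d_data` are hypothetical:

```cuda
#include <cub/cub.cuh>

// Sketch: exclusive prefix sum over one item per thread in a 128-thread block.
__global__ void BlockPrefixSumKernel(int *d_data)
{
    typedef cub::BlockScan<int, 128> BlockScan;
    __shared__ typename BlockScan::TempStorage temp_storage;

    int thread_data = d_data[threadIdx.x];

    // Each thread receives the sum of all preceding threads' items
    // (thread 0 receives the identity, 0)
    BlockScan(temp_storage).ExclusiveSum(thread_data, thread_data);

    d_data[threadIdx.x] = thread_data;
}
```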
 cub::BlockStore< OutputIteratorT, BLOCK_DIM_X, ITEMS_PER_THREAD, ALGORITHM, BLOCK_DIM_Y, BLOCK_DIM_Z, PTX_ARCH > | The BlockStore class provides collective data movement methods for writing a blocked arrangement of items partitioned across a CUDA thread block to a linear segment of memory |
 cub::CacheModifiedInputIterator< MODIFIER, ValueType, OffsetT > | A random-access input wrapper for dereferencing array values using a PTX cache load modifier |
 cub::CacheModifiedOutputIterator< MODIFIER, ValueType, OffsetT > | A random-access output wrapper for storing array values using a PTX cache-modifier |
 cub::CachingDeviceAllocator | A simple caching allocator for device memory allocations |
 cub::Cast< B > | Default cast functor |
 cub::ConstantInputIterator< ValueType, OffsetT > | A random-access input generator for dereferencing a sequence of homogeneous values |
 cub::CountingInputIterator< ValueType, OffsetT > | A random-access input generator for dereferencing a sequence of incrementing integer values |
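To illustrate the fancy-iterator entries above, a CountingInputIterator generates an incrementing integer sequence on the fly, so no input array needs to be allocated. A brief sketch:

```cuda
#include <cub/cub.cuh>

// Sketch: a counting iterator materializes 0, 1, 2, ... lazily on dereference.
cub::CountingInputIterator<int> itr(0);
// itr[0] == 0, itr[1] == 1, itr[100] == 100
// The iterator can be passed anywhere a random-access input iterator is
// accepted, e.g. as the input of a Device-wide reduction or scan.
```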
 cub::CubVector< T, vec_elements > | Exposes a member typedef Type that names the corresponding CUDA vector type if one exists. Otherwise Type refers to the CubVector structure itself, which will wrap the corresponding x, y, etc. vector fields |
 cub::DeviceHistogram | DeviceHistogram provides device-wide parallel operations for constructing histogram(s) from a sequence of sample data residing within global memory |
 cub::DevicePartition | DevicePartition provides device-wide, parallel operations for partitioning sequences of data items residing within global memory |
 cub::DeviceRadixSort | DeviceRadixSort provides device-wide, parallel operations for computing a radix sort across a sequence of data items residing within global memory |
 cub::DeviceReduce | DeviceReduce provides device-wide, parallel operations for computing a reduction across a sequence of data items residing within global memory |
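The Device* entry points in this index share a two-phase temp-storage idiom: a first call with a NULL workspace pointer only queries the required scratch size, and a second call does the work. A hedged sketch using DeviceReduce (the function name `SumExample` and the `d_in`/`d_out`/`num_items` parameters are assumptions):

```cuda
#include <cub/cub.cuh>

// Sketch of the two-phase temp-storage idiom shared by the Device* entry points.
void SumExample(const int *d_in, int *d_out, int num_items)
{
    void   *d_temp_storage    = NULL;
    size_t  temp_storage_bytes = 0;

    // First call with a NULL workspace only computes the required size
    cub::DeviceReduce::Sum(d_temp_storage, temp_storage_bytes, d_in, d_out, num_items);

    cudaMalloc(&d_temp_storage, temp_storage_bytes);

    // Second call performs the actual reduction
    cub::DeviceReduce::Sum(d_temp_storage, temp_storage_bytes, d_in, d_out, num_items);

    cudaFree(d_temp_storage);
}
```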
 cub::DeviceRunLengthEncode | DeviceRunLengthEncode provides device-wide, parallel operations for demarcating "runs" of same-valued items within a sequence residing within global memory |
 cub::DeviceScan | DeviceScan provides device-wide, parallel operations for computing a prefix scan across a sequence of data items residing within global memory |
 cub::DeviceSelect | DeviceSelect provides device-wide, parallel operations for compacting selected items from sequences of data items residing within global memory |
 cub::DeviceSpmv | DeviceSpmv provides device-wide parallel operations for performing sparse-matrix * dense-vector multiplication (SpMV) |
 cub::DoubleBuffer< T > | Double-buffer storage wrapper for multi-pass stream transformations that require more than one storage array for streaming intermediate results back and forth |
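A typical use of DoubleBuffer is with DeviceRadixSort, which ping-pongs keys between the pair of buffers across passes. A hedged sketch (the function name `SortExample` and the `d_key_buf`/`d_key_alt_buf` parameters are hypothetical):

```cuda
#include <cub/cub.cuh>

// Sketch: DoubleBuffer pairs a "current" and an "alternate" array so a
// multi-pass algorithm such as DeviceRadixSort can ping-pong between them.
void SortExample(int *d_key_buf, int *d_key_alt_buf, int num_items)
{
    cub::DoubleBuffer<int> d_keys(d_key_buf, d_key_alt_buf);

    void   *d_temp_storage    = NULL;
    size_t  temp_storage_bytes = 0;

    // Size query, then the actual sort (the usual two-phase idiom)
    cub::DeviceRadixSort::SortKeys(d_temp_storage, temp_storage_bytes, d_keys, num_items);
    cudaMalloc(&d_temp_storage, temp_storage_bytes);
    cub::DeviceRadixSort::SortKeys(d_temp_storage, temp_storage_bytes, d_keys, num_items);

    // After sorting, d_keys.Current() points at the buffer holding the sorted keys
    int *d_sorted_keys = d_keys.Current();
    (void)d_sorted_keys;

    cudaFree(d_temp_storage);
}
```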
 cub::Equality | Default equality functor |
 cub::Equals< A, B > | Type equality test |
 cub::If< IF, ThenType, ElseType > | Type selection (IF ? ThenType : ElseType) |
 cub::Inequality | Default inequality functor |
 cub::InequalityWrapper< EqualityOp > | Inequality functor (wraps equality functor) |
 cub::Int2Type< A > | Allows for the treatment of an integral constant as a type at compile-time (e.g., to achieve static call dispatch based on constant integral values) |
 cub::IsPointer< Tp > | Pointer vs. iterator test |
 cub::IsVolatile< Tp > | Volatile modifier test |
 cub::KeyValuePair< _Key, _Value > | A key identifier paired with a corresponding value |
 cub::Log2< N, CURRENT_VAL, COUNT > | Statically determine log2(N), rounded up |
 cub::Max | Default max functor |
 cub::Min | Default min functor |
 cub::NullType | A simple "NULL" marker type |
 cub::PowerOfTwo< N > | Statically determine if N is a power-of-two |
 cub::ReduceByKeyOp< ReductionOpT > | Reduce-by-key functor (wraps a binary reduction operator to apply to values) |
 cub::ReduceBySegmentOp< ReductionOpT > | Reduce-by-segment functor |
 cub::RemoveQualifiers< Tp, Up > | Removes const and volatile qualifiers from type Tp |
 cub::Sum | Default sum functor |
 cub::SwizzleScanOp< ScanOp > | Binary operator wrapper for switching non-commutative scan arguments |
 cub::TexObjInputIterator< T, OffsetT > | A random-access input wrapper for dereferencing array values through texture cache. Uses newer Kepler-style texture objects |
 cub::TexRefInputIterator< T, UNIQUE_ID, OffsetT > | A random-access input wrapper for dereferencing array values through texture cache. Uses older Tesla/Fermi-style texture references |
 cub::TransformInputIterator< ValueType, ConversionOp, InputIteratorT, OffsetT > | A random-access input wrapper for transforming dereferenced values |
 cub::Uninitialized< T > | A storage-backing wrapper that allows types with non-trivial constructors to be aliased in unions |
 cub::Uninitialized< _TempStorage > | |
  cub::BlockDiscontinuity< T, BLOCK_DIM_X, BLOCK_DIM_Y, BLOCK_DIM_Z, PTX_ARCH >::TempStorage | The operations exposed by BlockDiscontinuity require a temporary memory allocation of this nested type for thread communication. This opaque storage can be allocated directly using the __shared__ keyword. Alternatively, it can be aliased to externally allocated memory (shared or global) or union'd with other storage allocation types to facilitate memory reuse |
  cub::BlockExchange< T, BLOCK_DIM_X, ITEMS_PER_THREAD, WARP_TIME_SLICING, BLOCK_DIM_Y, BLOCK_DIM_Z, PTX_ARCH >::TempStorage | The operations exposed by BlockExchange require a temporary memory allocation of this nested type for thread communication. This opaque storage can be allocated directly using the __shared__ keyword. Alternatively, it can be aliased to externally allocated memory (shared or global) or union'd with other storage allocation types to facilitate memory reuse |
  cub::BlockHistogram< T, BLOCK_DIM_X, ITEMS_PER_THREAD, BINS, ALGORITHM, BLOCK_DIM_Y, BLOCK_DIM_Z, PTX_ARCH >::TempStorage | The operations exposed by BlockHistogram require a temporary memory allocation of this nested type for thread communication. This opaque storage can be allocated directly using the __shared__ keyword. Alternatively, it can be aliased to externally allocated memory (shared or global) or union'd with other storage allocation types to facilitate memory reuse |
  cub::BlockLoad< InputIteratorT, BLOCK_DIM_X, ITEMS_PER_THREAD, ALGORITHM, BLOCK_DIM_Y, BLOCK_DIM_Z, PTX_ARCH >::LoadInternal< BLOCK_LOAD_TRANSPOSE, DUMMY >::TempStorage | Alias wrapper allowing storage to be unioned |
  cub::BlockLoad< InputIteratorT, BLOCK_DIM_X, ITEMS_PER_THREAD, ALGORITHM, BLOCK_DIM_Y, BLOCK_DIM_Z, PTX_ARCH >::LoadInternal< BLOCK_LOAD_WARP_TRANSPOSE, DUMMY >::TempStorage | Alias wrapper allowing storage to be unioned |
  cub::BlockLoad< InputIteratorT, BLOCK_DIM_X, ITEMS_PER_THREAD, ALGORITHM, BLOCK_DIM_Y, BLOCK_DIM_Z, PTX_ARCH >::LoadInternal< BLOCK_LOAD_WARP_TRANSPOSE_TIMESLICED, DUMMY >::TempStorage | Alias wrapper allowing storage to be unioned |
  cub::BlockLoad< InputIteratorT, BLOCK_DIM_X, ITEMS_PER_THREAD, ALGORITHM, BLOCK_DIM_Y, BLOCK_DIM_Z, PTX_ARCH >::TempStorage | The operations exposed by BlockLoad require a temporary memory allocation of this nested type for thread communication. This opaque storage can be allocated directly using the __shared__ keyword. Alternatively, it can be aliased to externally allocated memory (shared or global) or union'd with other storage allocation types to facilitate memory reuse |
  cub::BlockRadixSort< KeyT, BLOCK_DIM_X, ITEMS_PER_THREAD, ValueT, RADIX_BITS, MEMOIZE_OUTER_SCAN, INNER_SCAN_ALGORITHM, SMEM_CONFIG, BLOCK_DIM_Y, BLOCK_DIM_Z, PTX_ARCH >::TempStorage | The operations exposed by BlockRadixSort require a temporary memory allocation of this nested type for thread communication. This opaque storage can be allocated directly using the __shared__ keyword. Alternatively, it can be aliased to externally allocated memory (shared or global) or union'd with other storage allocation types to facilitate memory reuse |
  cub::BlockReduce< T, BLOCK_DIM_X, ALGORITHM, BLOCK_DIM_Y, BLOCK_DIM_Z, PTX_ARCH >::TempStorage | The operations exposed by BlockReduce require a temporary memory allocation of this nested type for thread communication. This opaque storage can be allocated directly using the __shared__ keyword. Alternatively, it can be aliased to externally allocated memory (shared or global) or union'd with other storage allocation types to facilitate memory reuse |
  cub::BlockScan< T, BLOCK_DIM_X, ALGORITHM, BLOCK_DIM_Y, BLOCK_DIM_Z, PTX_ARCH >::TempStorage | The operations exposed by BlockScan require a temporary memory allocation of this nested type for thread communication. This opaque storage can be allocated directly using the __shared__ keyword. Alternatively, it can be aliased to externally allocated memory (shared or global) or union'd with other storage allocation types to facilitate memory reuse |
  cub::BlockStore< OutputIteratorT, BLOCK_DIM_X, ITEMS_PER_THREAD, ALGORITHM, BLOCK_DIM_Y, BLOCK_DIM_Z, PTX_ARCH >::StoreInternal< BLOCK_STORE_TRANSPOSE, DUMMY >::TempStorage | Alias wrapper allowing storage to be unioned |
  cub::BlockStore< OutputIteratorT, BLOCK_DIM_X, ITEMS_PER_THREAD, ALGORITHM, BLOCK_DIM_Y, BLOCK_DIM_Z, PTX_ARCH >::StoreInternal< BLOCK_STORE_WARP_TRANSPOSE, DUMMY >::TempStorage | Alias wrapper allowing storage to be unioned |
  cub::BlockStore< OutputIteratorT, BLOCK_DIM_X, ITEMS_PER_THREAD, ALGORITHM, BLOCK_DIM_Y, BLOCK_DIM_Z, PTX_ARCH >::StoreInternal< BLOCK_STORE_WARP_TRANSPOSE_TIMESLICED, DUMMY >::TempStorage | Alias wrapper allowing storage to be unioned |
  cub::BlockStore< OutputIteratorT, BLOCK_DIM_X, ITEMS_PER_THREAD, ALGORITHM, BLOCK_DIM_Y, BLOCK_DIM_Z, PTX_ARCH >::TempStorage | The operations exposed by BlockStore require a temporary memory allocation of this nested type for thread communication. This opaque storage can be allocated directly using the __shared__ keyword. Alternatively, it can be aliased to externally allocated memory (shared or global) or union'd with other storage allocation types to facilitate memory reuse |
  cub::WarpReduce< T, LOGICAL_WARP_THREADS, PTX_ARCH >::TempStorage | The operations exposed by WarpReduce require a temporary memory allocation of this nested type for thread communication. This opaque storage can be allocated directly using the __shared__ keyword. Alternatively, it can be aliased to externally allocated memory (shared or global) or union'd with other storage allocation types to facilitate memory reuse |
  cub::WarpScan< T, LOGICAL_WARP_THREADS, PTX_ARCH >::TempStorage | The operations exposed by WarpScan require a temporary memory allocation of this nested type for thread communication. This opaque storage can be allocated directly using the __shared__ keyword. Alternatively, it can be aliased to externally allocated memory (shared or global) or union'd with other storage allocation types to facilitate memory reuse |
 cub::WarpReduce< T, LOGICAL_WARP_THREADS, PTX_ARCH > | The WarpReduce class provides collective methods for computing a parallel reduction of items partitioned across a CUDA thread warp |
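Unlike the block-wide collectives, WarpReduce is specialized per logical warp, so a block holding several warps needs one TempStorage per warp. A hedged sketch for a 128-thread block (the kernel name and `d_in`/`d_out` pointers are assumptions):

```cuda
#include <cub/cub.cuh>

// Sketch: four logical warps in a 128-thread block each compute an independent sum.
__global__ void WarpSumsKernel(const int *d_in, int *d_out)
{
    typedef cub::WarpReduce<int> WarpReduce;

    // One TempStorage instance per warp in the block (128 / 32 = 4)
    __shared__ typename WarpReduce::TempStorage temp_storage[4];

    int warp_id     = threadIdx.x / 32;
    int thread_data = d_in[threadIdx.x];

    // The return value is only valid in each warp's lane 0
    int warp_sum = WarpReduce(temp_storage[warp_id]).Sum(thread_data);

    if (threadIdx.x % 32 == 0) d_out[warp_id] = warp_sum;
}
```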
 cub::WarpScan< T, LOGICAL_WARP_THREADS, PTX_ARCH > | The WarpScan class provides collective methods for computing a parallel prefix scan of items partitioned across a CUDA thread warp |