CUB
DeviceSpmv provides device-wide parallel operations for performing sparse-matrix * dense-vector multiplication (SpMV).
When calling these methods from kernel code, be sure to define the CUB_CDP macro in your compiler's macro definitions.
Definition at line 70 of file device_spmv.cuh.
Static Public Methods
CSR matrix operations
template<typename ValueT>
static CUB_RUNTIME_FUNCTION cudaError_t CsrMV (void *d_temp_storage, size_t &temp_storage_bytes, ValueT *d_values, int *d_row_offsets, int *d_column_indices, ValueT *d_vector_x, ValueT *d_vector_y, int num_rows, int num_cols, int num_nonzeros, ValueT alpha, ValueT beta, cudaStream_t stream=0, bool debug_synchronous=false)
inline static
This function performs the matrix-vector operation y = alpha*A*x + beta*y.
| ValueT | [inferred] Matrix and vector value type (e.g., float or double) |
| [in] | d_temp_storage | Device allocation of temporary storage. When NULL, the required allocation size is written to temp_storage_bytes and no work is done. |
| [in,out] | temp_storage_bytes | Reference to size in bytes of d_temp_storage allocation |
| [in] | d_values | Pointer to the array of num_nonzeros values of the corresponding nonzero elements of matrix A. |
| [in] | d_row_offsets | Pointer to the array of num_rows + 1 offsets demarcating the start of every row in d_column_indices and d_values (with the final entry being equal to num_nonzeros). |
| [in] | d_column_indices | Pointer to the array of num_nonzeros column indices of the corresponding nonzero elements of matrix A. (Indices are zero-based.) |
| [in] | d_vector_x | Pointer to the array of num_cols values corresponding to the dense input vector x |
| [out] | d_vector_y | Pointer to the array of num_rows values corresponding to the dense output vector y |
| [in] | num_rows | Number of rows of matrix A. |
| [in] | num_cols | Number of columns of matrix A. |
| [in] | num_nonzeros | Number of nonzero elements of matrix A. |
| [in] | alpha | Scalar used for multiplication of the matrix A nonzeros. |
| [in] | beta | Scalar used for multiplication of the vector_y addend. (If beta is zero, vector_y need not comprise valid data elements.) |
| [in] | stream | [optional] CUDA stream to launch kernels within. Default is stream 0. |
| [in] | debug_synchronous | [optional] Whether or not to synchronize the stream after every kernel launch to check for errors. May cause significant slowdown. Default is false. |
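The CSR layout and the meaning of alpha and beta described in the parameter table can be illustrated with a small host-side sketch. This is not CUB's implementation and does not call CUB. `CsrMVReference` and `RunCsrExample` are hypothetical names introduced here, and the code runs sequentially on the CPU with `std::vector` in place of device pointers. It assumes zero-based column indices and a row-offsets array of num_rows + 1 entries whose last entry equals num_nonzeros.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical host-side reference (not part of CUB): computes
// y = alpha*A*x + beta*y for a CSR matrix A, one row at a time.
template <typename ValueT>
void CsrMVReference(const std::vector<ValueT>& values,
                    const std::vector<int>& row_offsets,
                    const std::vector<int>& column_indices,
                    const std::vector<ValueT>& vector_x,
                    std::vector<ValueT>& vector_y,
                    int num_rows,
                    ValueT alpha,
                    ValueT beta)
{
    for (int row = 0; row < num_rows; ++row) {
        // Nonzeros of this row occupy [row_offsets[row], row_offsets[row + 1]).
        ValueT dot = ValueT(0);
        for (int k = row_offsets[row]; k < row_offsets[row + 1]; ++k)
            dot += values[k] * vector_x[column_indices[k]];

        // When beta is zero, the previous contents of y are never read.
        ValueT result = alpha * dot;
        if (beta != ValueT(0))
            result += beta * vector_y[row];
        vector_y[row] = result;
    }
}

// Builds a small 3 x 4 example and returns the updated y.
std::vector<double> RunCsrExample()
{
    // A = [ 1 0 2 0 ]
    //     [ 0 0 0 0 ]   (empty row: its two offsets are equal)
    //     [ 0 3 0 4 ]
    const int num_rows = 3;
    const std::vector<double> values = {1.0, 2.0, 3.0, 4.0};
    const std::vector<int> row_offsets = {0, 2, 2, 4};
    const std::vector<int> column_indices = {0, 2, 1, 3};
    const std::vector<double> x = {1.0, 2.0, 3.0, 4.0};
    std::vector<double> y = {1.0, 1.0, 1.0};

    // Layout checks matching the parameter descriptions.
    assert(row_offsets.size() == static_cast<std::size_t>(num_rows) + 1);
    assert(row_offsets.back() == static_cast<int>(values.size()));

    CsrMVReference(values, row_offsets, column_indices, x, y,
                   num_rows, 2.0, 1.0);  // alpha = 2, beta = 1
    return y;
}
```

Calling `RunCsrExample()` from a `main` function yields y = (15, 1, 45): A*x is (7, 0, 22), scaled by alpha = 2 and added to beta = 1 times the initial y of (1, 1, 1). A sequential reference like this can serve as a correctness check for results copied back from the device.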
Definition at line 134 of file device_spmv.cuh.