
pauleonix/thrustshift

 
 


Thrustshift library

A CUDA library of functions that I consider useful and generic.

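Dependencies

The CMake configuration step in the build instructions references the following packages:

  • CMakeshift
  • gsl-lite
  • cuda-api-wrappers
  • sysmakeshift
  • makeshift
  • cuda-kat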

Design Concept

Ranges instead of iterators

A common design in the STL and Thrust is to use iterators to access data, e.g. in the function copy:

copy(src_begin, src_end, dst_begin);

The function cannot check whether the range starting at dst_begin has the same length as the source range, nor whether src_begin and src_end point into the same range. Thrustshift instead uses the concept of a range, which carries its own length and thereby addresses both drawbacks:

thrustshift::async::copy(stream, src, dst);
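The benefit of passing ranges can be sketched in plain host C++. The helper checked_copy below is hypothetical and not part of Thrustshift. It only assumes that both arguments work with std::size, std::begin and std::end, and it illustrates the length check that an iterator-based interface cannot perform:

```cpp
#include <algorithm>
#include <cstddef>
#include <iterator>
#include <stdexcept>
#include <vector>

// Hypothetical helper, not part of Thrustshift: copies `src` into `dst`
// and rejects ranges of different length before touching any element.
template <class SrcRange, class DstRange>
void checked_copy(const SrcRange& src, DstRange& dst) {
    // Each range carries its own length, so a mismatch is detectable here.
    if (std::size(src) != std::size(dst)) {
        throw std::length_error("checked_copy: source and destination lengths differ");
    }
    std::copy(std::begin(src), std::end(src), std::begin(dst));
}
```

Calling checked_copy with a three-element source and a two-element destination throws std::length_error, whereas the equivalent iterator-based call would silently write past the end of the destination.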

Asynchronous namespace

Where possible, Thrustshift provides asynchronous functions in the async namespace. The first argument to these functions is always a cuda::stream_t from cuda-api-wrappers. For example, Thrust only provides synchronous gather and scatter functions, which synchronize with the device.

Polymorphic memory resources

Thrustshift adapts the concept of polymorphic memory resources from the STL to provide a fully configurable interface for the memory usage of its functions. Some functions need temporary memory, which is allocated from the given memory resource. E.g.

thrustshift::async::reduce(stream, values, result, reduction_functor, initial_value, delayed_memory_resource);

The delayed_memory_resource may only release the memory allocated by thrustshift::async::reduce after all work that thrustshift::async::reduce queued on stream has finished, even though deallocate has already been called for all of those buffers. If you just want running code, use:

#include <thrustshift/memory_resource.h>
#include <thrustshift/reduction.h>

...
thrustshift::pmr::delayed_pool_type<thrustshift::pmr::managed_resource_type> delayed_memory_resource;
...
thrustshift::async::reduce(stream, values, result, reduction_functor, initial_value, delayed_memory_resource);
...
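The idea of a delayed resource can be illustrated on the host with the standard std::pmr interface. The class deferred_resource below is a hypothetical sketch and not Thrustshift's delayed_pool_type. It assumes that the caller knows when the asynchronous work has finished and then calls release_pending explicitly:

```cpp
#include <cstddef>
#include <memory_resource>
#include <vector>

// Hypothetical sketch, not Thrustshift's delayed_pool_type: a memory
// resource that records deallocation requests instead of executing them.
class deferred_resource : public std::pmr::memory_resource {
public:
    explicit deferred_resource(
        std::pmr::memory_resource* upstream = std::pmr::get_default_resource())
        : upstream_(upstream) {}

    deferred_resource(const deferred_resource&) = delete;
    deferred_resource& operator=(const deferred_resource&) = delete;

    ~deferred_resource() override { release_pending(); }

    // Call once the asynchronous work that used the buffers has finished,
    // e.g. after synchronizing the stream.
    void release_pending() {
        for (const block& b : pending_) {
            upstream_->deallocate(b.ptr, b.bytes, b.alignment);
        }
        pending_.clear();
    }

    std::size_t pending_count() const { return pending_.size(); }

private:
    struct block {
        void* ptr;
        std::size_t bytes;
        std::size_t alignment;
    };

    void* do_allocate(std::size_t bytes, std::size_t alignment) override {
        return upstream_->allocate(bytes, alignment);
    }

    void do_deallocate(void* ptr, std::size_t bytes, std::size_t alignment) override {
        // Defer the actual release; the memory may still be in use.
        pending_.push_back({ptr, bytes, alignment});
    }

    bool do_is_equal(const std::pmr::memory_resource& other) const noexcept override {
        return this == &other;
    }

    std::pmr::memory_resource* upstream_;
    std::vector<block> pending_;
};
```

A function can then call deallocate on its temporary buffers as soon as it returns, while the memory stays valid until release_pending runs.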

Other design concepts regarding temporarily required buffers

cuSPARSE and CUB either use a separate function call to determine the size of the temporary buffer or return the required size on the first call to the function itself. Both designs make it inherently difficult to nest functions that require temporary buffers. E.g. if you write a function that uses two other functions, each of which requires a temporary buffer, but the new function should expose only one void* tmp_buffer, you must take care of alignment whenever the two inner functions require temporary buffers of different types. The resulting code is difficult to read and inflates the code size.
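The alignment bookkeeping described above can be sketched on the host with std::align. The helper carve is hypothetical and belongs to neither CUB, cuSPARSE nor Thrustshift. It assumes trivially constructible element types and a caller that tracks the remaining space by hand:

```cpp
#include <cstddef>
#include <cstdint>
#include <memory>

// Hypothetical helper: carves storage for `count` objects of type T out of
// the front of a raw buffer. Advances `cursor`, shrinks `space`, and returns
// nullptr if the remaining space is too small. Intended for trivial types;
// no constructors are run.
template <class T>
T* carve(void*& cursor, std::size_t& space, std::size_t count) {
    const std::size_t bytes = count * sizeof(T);
    // std::align moves `cursor` up to the next address aligned for T and
    // reduces `space` by the padding it skipped.
    if (std::align(alignof(T), bytes, cursor, space) == nullptr) {
        return nullptr;
    }
    T* result = static_cast<T*>(cursor);
    cursor = static_cast<char*>(cursor) + bytes;
    space -= bytes;
    return result;
}
```

Every nested function that shares the single buffer needs this kind of bookkeeping, plus a matching size computation up front. A memory resource takes that burden away from the caller.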

How to build the project

  1. Clone the Dependencies to your machine and build the projects

  2. Clone the repository to your preferred location

  3. Create a build folder

take build/release # oh-my-zsh helper; in other shells: mkdir -p build/release && cd build/release

  4. Configure with CMake

The following command configures the library for CUDA architecture sm_75. Adjust CMAKE_CUDA_ARCHITECTURES for your GPU.

cmake ... -DCMAKE_BUILD_TYPE=Release \
  -DCMakeshift_DIR=$PATH_TO_CMAKESHIFT_BUILD_DIR \
  -Dgsl-lite_DIR=$PATH_TO_GSL_LITE_BUILD_DIR \
  -Dcuda-api-wrappers_DIR=$PATH_TO_CUDA_API_WRAPPERS_BUILD_DIR \
  -Dsysmakeshift_DIR=$PATH_TO_SYSMAKESHIFT_BUILD_DIR \
  -Dmakeshift_DIR=$PATH_TO_MAKESHIFT_BUILD_DIR \
  -Dcuda-kat_DIR=$PATH_TO_CUDA_KAT_DIR \
  -DCMAKE_CUDA_ARCHITECTURES="75"

Instead of declaring the paths to the dependencies explicitly, you can install the dependencies and set CMAKE_PREFIX_PATH accordingly.

  5. Make

make -j

Thrustshift is a header-only library. Therefore, 'building' only creates the CMake config files.
