Improving Csr::strategy_type #320

Open

thoasm opened this issue Jun 27, 2019 · 2 comments
Labels
is:interface-breaking This change may break interface compatibility. Can only be done through a major release.
is:proposal Maybe we should do something this way.
mod:core This is related to the core module.
mod:cuda This is related to the CUDA module.
type:matrix-format This is related to the Matrix formats

Comments

thoasm (Member) commented Jun 27, 2019

Currently, all strategies for CSR are tailored for CUDA, which is fine for the most part.
However, the names are also CUDA-specific (see Csr::cusparse), which in my opinion should be changed since we want to support multiple platforms. I would prefer a more neutral name like Csr::sparse_library in that case (maybe there is even an OpenMP sparse library we could use for it).
Also, automatical only works on CUDA devices and even requires a CudaExecutor. I would prefer a solution that can also adapt to certain OpenMP properties (which we should have as soon as we have a more sophisticated SpMV there).

Additionally, I am not sure why we use an std::shared_ptr for these strategies. Currently, we always have to call std::make_shared to generate a strategy, which is both unintuitive and unnecessary, since a strategy object stores very little (at most an std::string and an int64_t). Copying the object would likely be faster than allocating it on the heap, although performance should not matter much here; the more important point for me is intuitiveness.
We could also encapsulate the strategies in a class named strategy, so it is clear that Csr::spmv_strategy::automatical is an SpMV strategy.
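For reference, selecting a strategy today looks roughly like the following (the exact create overload and strategy constructor may differ slightly between Ginkgo versions):

```cpp
#include <ginkgo/ginkgo.hpp>

#include <memory>

int main()
{
    auto exec = gko::CudaExecutor::create(0, gko::OmpExecutor::create());
    using Csr = gko::matrix::Csr<double, gko::int32>;
    // Every strategy has to be heap-allocated via std::make_shared, even
    // though it stores little more than a name and a small integer, and
    // load_balance additionally requires a CudaExecutor to be constructed.
    auto mtx = Csr::create(exec, std::make_shared<Csr::load_balance>(exec));
    return 0;
}
```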

In summary, I think the following changes should be introduced:

  • Change the names of the strategies to more neutral ones, e.g. cusparse -> sparse_library
  • Make automatical actually automatic and dependent on the executor (CUDA vs. OpenMP vs. Reference) without requiring a CudaExecutor
  • Change the strategy type from std::shared_ptr to a plain object, since the most one of these objects contains is an int64_t and an std::string
  • Put all strategy classes into a separate class spmv_strategy (or similar), so you call it as Csr::spmv_strategy::automatical, which is more descriptive (see the sketch after this list)
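A minimal standalone sketch of what the proposed value-type wrapper could look like; all names here (spmv_strategy, sparse_library, ...) are illustrative, not an actual Ginkgo interface:

```cpp
#include <cstdint>
#include <string>
#include <utility>

// Illustrative only: a cheap-to-copy strategy value type replacing the
// current std::shared_ptr<strategy_type>. The entire state is a name and
// one 64-bit parameter, so passing it by value is trivial.
class spmv_strategy {
public:
    static spmv_strategy automatical() { return {"automatical", 0}; }
    static spmv_strategy classical() { return {"classical", 0}; }
    static spmv_strategy sparse_library() { return {"sparse_library", 0}; }
    static spmv_strategy load_balance(std::int64_t nwarps)
    {
        return {"load_balance", nwarps};
    }

    const std::string& get_name() const { return name_; }
    std::int64_t get_parameter() const { return parameter_; }

private:
    spmv_strategy(std::string name, std::int64_t parameter)
        : name_{std::move(name)}, parameter_{parameter}
    {}

    std::string name_;
    std::int64_t parameter_;
};

// Usage would then read e.g.:
//   auto mtx = Csr::create(exec, Csr::spmv_strategy::automatical());
```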

Additionally, some functionality/performance changes can also be incorporated into the strategies:

  • Split off the generate/prepare step required by some strategies (cusparse, hipsparse) and, where possible, move it into make_srow so the data persists across many apply calls, optimizing repeated applies (a sketch follows below). See discussion here.
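A rough sketch of the persistence idea, with hypothetical names that are not tied to the actual cusparse/hipsparse APIs: the expensive setup runs once, and later apply calls reuse it.

```cpp
#include <iostream>

// Hypothetical sketch: move the one-time vendor-library setup (descriptor
// creation, analysis pass) out of apply() so it persists across calls.
class sparse_library_strategy {
public:
    // would conceptually be invoked from make_srow / matrix generation
    void prepare()
    {
        if (!prepared_) {
            // create the cusparse/hipsparse descriptor and run any
            // analysis pass here, exactly once
            std::cout << "analysis pass executed\n";
            prepared_ = true;
        }
    }

    void apply()
    {
        prepare();  // no-op after the first call
        std::cout << "SpMV reusing cached setup\n";
    }

private:
    bool prepared_ = false;
};

int main()
{
    sparse_library_strategy strategy;
    strategy.apply();  // pays the setup cost once
    strategy.apply();  // reuses it
}
```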
thoasm added the is:proposal, mod:core, mod:cuda, and type:matrix-format labels on Jun 27, 2019
yhmtsai (Member) commented Aug 13, 2019

The automatical and load_balance strategies compute srow for the GPU kernel, but the matrix may reside in host memory.

The strategy creates a CUDA handle to obtain the GPU's parameters (code), and benchmark/spmv does not pass the executor to the strategy when reading the matrix data (code).

As a result, reading the matrix in benchmark/spmv still uses device 0 even when we set device_id=1.
I think using make_srow only when the matrix is created on the GPU should be okay.

thoasm (Member, Author) commented Aug 13, 2019

I think it would be better if we had a kernel call for make_srow and invoked the kernel that matches the executor where the matrix is stored. Using make_srow only for CUDA is the wrong approach in my opinion, since we want to cover all platforms that Ginkgo supports.
Currently, the strategy is limited to NVIDIA GPUs, while we might want support for specialized OpenMP kernels, or for AMD GPUs, which we will also support in the near future.
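A simplified sketch of such executor-based dispatch (this is not Ginkgo's actual kernel machinery; all names are made up): each backend provides its own make_srow implementation, and the strategy runs whichever matches the executor that holds the matrix data.

```cpp
#include <iostream>
#include <memory>

// Illustrative executor hierarchy with one make_srow kernel per backend.
struct executor {
    virtual ~executor() = default;
    virtual void make_srow() const = 0;
};
struct reference_executor : executor {
    void make_srow() const override { std::cout << "reference srow kernel\n"; }
};
struct omp_executor : executor {
    void make_srow() const override { std::cout << "OpenMP srow kernel\n"; }
};
struct cuda_executor : executor {
    void make_srow() const override { std::cout << "CUDA srow kernel\n"; }
};

// The strategy no longer assumes CUDA; it dispatches on the matrix's executor.
void generate_strategy_data(const executor& exec) { exec.make_srow(); }

int main()
{
    auto exec = std::make_unique<omp_executor>();
    generate_strategy_data(*exec);  // runs the OpenMP variant
}
```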

thoasm added the is:interface-breaking label on Nov 11, 2019
tcojean added this to the Ginkgo 2.0.0 milestone on Apr 30, 2020
upsj removed this from the Ginkgo 2.0.0 milestone on Nov 2, 2022