Improve how copies are handled by MrBurmark · Pull Request #514 · LLNL/RAJAPerf · GitHub

Improve how copies are handled #514

Open
MrBurmark wants to merge 9 commits into base: develop

Conversation

MrBurmark
Member

Improve copies

This PR makes several changes to how copies are handled:
  • Add synchronization to copyData to ensure that its copies complete. This fixes a potential issue with host-to-device and device-to-device copies under CUDA and HIP.
  • Use the programming model's memcpy implementation instead of copyData in some sections of the code. This fixes a potential issue where a sequential memcpy implementation was used for host-to-host copies with CUDA and HIP, which could lead to a host-device race condition.
  • Add allocDataForInit, an API that combines scopedMoveData and allocData, along with variants that initialize memory. Allocating directly into the memory space used during initialization generally reduces the number of allocations and copies. This simplifies usage in setup routines, but makes allocate-and-copy patterns elsewhere more explicit.
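To illustrate why a copy routine may need to synchronize before returning, here is a minimal host-only sketch. It is not RAJAPerf code and does not use CUDA or HIP: `ToyStream`, `copy_and_sync`, and `first_after_overwrite` are hypothetical names, and the "stream" is just a queue of deferred work that stands in for a copy still pending when the enqueue call returns.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <functional>
#include <queue>
#include <vector>

// Toy model of a device stream (hypothetical, not a RAJAPerf or CUDA type):
// work is queued and only runs when synchronize() is called, which mimics a
// copy that is still pending when the enqueue call returns.
class ToyStream {
public:
  void enqueue_copy(double* dst, const double* src, std::size_t n) {
    assert(dst != nullptr && src != nullptr);
    pending_.push([=] { std::memcpy(dst, src, n * sizeof(double)); });
  }

  // Run every queued operation in order.
  void synchronize() {
    while (!pending_.empty()) {
      pending_.front()();
      pending_.pop();
    }
  }

private:
  std::queue<std::function<void()>> pending_;
};

// Hypothetical copy wrapper that synchronizes before returning, so the caller
// may reuse the source buffer immediately afterwards.
void copy_and_sync(ToyStream& s, double* dst, const double* src, std::size_t n) {
  s.enqueue_copy(dst, src, n);
  s.synchronize();
}

// Copies src to dst, overwrites src on the "host", and returns dst[0].
double first_after_overwrite(bool sync_in_copy) {
  ToyStream stream;
  std::vector<double> src(4, 1.0), dst(4, 0.0);
  if (sync_in_copy) {
    copy_and_sync(stream, dst.data(), src.data(), src.size());
  } else {
    stream.enqueue_copy(dst.data(), src.data(), src.size());
  }
  for (double& x : src) { x = -1.0; }  // host code reuses the source buffer
  stream.synchronize();                // any still-pending copy runs now
  return dst[0];
}
```

In this toy model, `first_after_overwrite(true)` returns `1.0` because the copy finished before the source was reused, while `first_after_overwrite(false)` returns `-1.0` because the pending copy picked up the overwritten values. On real hardware the unsynchronized outcome would be timing dependent rather than deterministic.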

  • This PR is a refactoring, bugfix
  • It does the following:
    • Modifies/refactors scopedMoveData to combine it with allocation and init and give it a better name.
    • Fixes a potential host/device race condition issue

MrBurmark added 4 commits May 22, 2025 12:28
This simplifies things into an allocation with optional init,
user init sequentially on host, and copy to final memory space.
Previously with separate allocData and scopedMoveData there was
an extra copy to device and copy back to host with allocations and
deallocations.
This avoids a host-device race condition that could occur
if the memory space used was host.
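The flow described in the first commit message above (allocate, initialize sequentially on the host, then copy to the final memory space) might look roughly like the following host-only sketch. This is an assumption-laden illustration, not the PR's implementation: `StagedInit` and `make_ramp` are hypothetical names, and plain `new[]` and `std::memcpy` stand in for allocation and copy in the final memory space.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical RAII helper: holds a host staging buffer that the caller fills
// with ordinary sequential host code. On scope exit it allocates the final
// buffer and performs a single copy into it.
class StagedInit {
public:
  StagedInit(double*& final_ptr, std::size_t n)
    : final_ptr_(final_ptr), staging_(n, 0.0) {
    assert(n > 0);
  }

  StagedInit(const StagedInit&) = delete;
  StagedInit& operator=(const StagedInit&) = delete;

  ~StagedInit() {
    // Stand-in for allocating in the final memory space.
    final_ptr_ = new double[staging_.size()];
    // Stand-in for the one copy from host staging to the final space.
    std::memcpy(final_ptr_, staging_.data(), staging_.size() * sizeof(double));
  }

  double* data() { return staging_.data(); }

private:
  double*& final_ptr_;           // pointer the caller will use afterwards
  std::vector<double> staging_;  // host memory used during initialization
};

// Example setup routine: fills the buffer with 0, 1, 2, ... on the host.
// The caller owns the returned pointer and must release it with delete[].
double* make_ramp(std::size_t n) {
  double* ptr = nullptr;
  {
    StagedInit init(ptr, n);
    double* h = init.data();
    for (std::size_t i = 0; i < n; ++i) {
      h[i] = static_cast<double>(i);
    }
  }  // destructor allocates ptr and copies the staged values into it
  return ptr;
}
```

The point of the scoped object is that the setup routine only ever touches host memory, and the copy to the final space happens exactly once when the scope closes.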
MrBurmark requested review from rhornung67, rchen20 and a team May 22, 2025 20:01