[pull] branch-23.04 from rapidsai:branch-23.04 by pull[bot] · Pull Request #24 · vishalbelsare/raft · GitHub

[pull] branch-23.04 from rapidsai:branch-23.04 #24


Open: pull[bot] wants to merge 15 commits into base: branch-23.04 from rapidsai:branch-23.04

Conversation

pull[bot]
@pull pull bot commented Mar 24, 2023

See Commits and Changes for more details.


Created by pull[bot]

Can you help keep this open source service alive? 💖 Please sponsor : )

yong-wang and others added 2 commits March 24, 2023 16:21
The main changes are:

- Add a one-block version. It uses a single thread block for one row of a batch and is used when `len` is relatively small (<= 16384).
- Avoid writing candidates to buffers when the number of candidates is larger than the buffer length.
- Add a parameter to control whether to use a fused filter in the last pass or a standalone filter kernel. The latter is preferable when the leading bits of the inputs are almost the same.
- Early stopping: when the target bucket contains `k` values, the computation can stop early.
- Many implementation details are polished, such as the initialization of `counter`, the calculation of kernel launch parameters, and the scan step.
- Tests and benchmarks are updated to include the new implementations. New benchmarks are added to demonstrate the advantage of the adaptive version.
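
To make the histogram, scan, filter, and early-stopping steps listed above concrete, here is a minimal CPU sketch of a radix-based top-k selection. It is not the CUDA implementation from this change: the function name `radix_select_topk`, its parameters, and the restriction to non-negative integers below 2^32 are all assumptions made for illustration.

```python
def radix_select_topk(values, k, bits_per_pass=8, total_bits=32):
    """Return the k smallest values (in no particular order).

    Illustrative sketch only: assumes non-negative integers < 2**total_bits
    and that total_bits is a multiple of bits_per_pass.
    """
    assert 0 < k <= len(values)
    num_buckets = 1 << bits_per_pass
    mask = num_buckets - 1

    selected = []              # values already known to belong to the result
    candidates = list(values)  # values that still share the target prefix
    remaining = k              # how many more values must be picked
    shift = total_bits - bits_per_pass

    while shift >= 0:
        # Histogram of the current digit, most significant digit first.
        hist = [0] * num_buckets
        for v in candidates:
            hist[(v >> shift) & mask] += 1

        # Scan the histogram to find the bucket holding the k-th value.
        below = 0
        for bucket, count in enumerate(hist):
            if below + count >= remaining:
                break
            below += count

        # Filter: lower buckets are fully selected, the target bucket
        # becomes the candidate set for the next pass.
        selected.extend(v for v in candidates if ((v >> shift) & mask) < bucket)
        candidates = [v for v in candidates if ((v >> shift) & mask) == bucket]
        remaining -= below

        # Early stopping: the target bucket holds exactly what is still needed.
        if len(candidates) == remaining:
            return selected + candidates
        shift -= bits_per_pass

    # All digits examined, so the remaining candidates are equal values.
    return selected + candidates[:remaining]


values = [5, 1, 9, 3, 7, 1, 200000, 42]
top3 = sorted(radix_select_topk(values, 3))
```

In this sketch the early-stopping check skips the remaining passes as soon as the candidate set is exactly as large as the number of values still needed; a GPU version would run each pass as a parallel histogram and scan instead of Python loops.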

Authors:
  - Yong Wang (https://github.com/yong-wang)
  - Corey J. Nolet (https://github.com/cjnolet)
  - Tamas Bela Feher (https://github.com/tfeher)

Approvers:
  - Tamas Bela Feher (https://github.com/tfeher)

URL: #1175
This is a copy and modification of a user's project but I think this is going to be generally useful to users as the same types of challenges are going to come up again. In this case, the user wasn't able to build/link because they weren't using `rapids-cmake` to propagate important configuration settings. I think having a skeleton project available that we build in CI and keep up to date will help new users build more applications on RAFT. 


TODO:
- [x] Make building the template optional 
- [x] Verify this can build in CMake and reuse already built/installed bits
- [x] Add to docs / readme and reference in README.md
- [x] Add a little example of invoking an API (maybe `pairwise_distances`?) to `main()`

Authors:
  - Corey J. Nolet (https://github.com/cjnolet)
  - Ben Frederickson (https://github.com/benfred)

Approvers:
  - Micka (https://github.com/lowener)
  - Dante Gama Dessavre (https://github.com/dantegd)
  - Divye Gala (https://github.com/divyegala)
  - AJ Schmidt (https://github.com/ajschmidt8)

URL: #1312
@pull pull bot added the ⤵️ pull label Mar 24, 2023
cjnolet and others added 13 commits March 24, 2023 21:40
This should cut compilation time for refine_d_int64_t_float.cu.o et al from ~900 seconds to 29 seconds.

The refine specializations contain >100 instances of the ivfflat_interleaved_scan kernel, even though these should be separately compiled by the ivfflat_search specializations.

The call to ivf_flat_interleaved_scan is [here](https://github.com/rapidsai/raft/blob/56ac43ad93a319a61073dce1b3b937f6f13ade63/cpp/include/raft/neighbors/detail/refine.cuh#L121). 

Depends on (so please merge after) PR #1307.

Authors:
  - Allard Hendriksen (https://github.com/ahendriksen)
  - Corey J. Nolet (https://github.com/cjnolet)

Approvers:
  - Corey J. Nolet (https://github.com/cjnolet)

URL: #1360
Add the ability for a user to specify an epilogue function to run after the distance in the brute_force::knn call.

This lets us remove faiss from cuml, by updating the hdbscan reachability code (rapidsai/cuml#5293)
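
As a rough illustration of what a distance epilogue can do, the sketch below runs a brute-force k-NN in plain Python and applies a user-supplied function to every distance before the k smallest are selected. The function `brute_force_knn`, the epilogue signature `(query_id, index_id, dist)`, and the sample points are hypothetical and do not reflect the actual `brute_force::knn` C++ API. The epilogue shown computes the standard HDBSCAN mutual reachability distance, `max(core[a], core[b], d(a, b))`.

```python
import heapq
import math


def brute_force_knn(index, queries, k, epilogue=None):
    """Return, per query, the k smallest (distance, index_id) pairs.

    `epilogue` is a hypothetical hook: it receives (query_id, index_id, dist)
    and returns the value that is used for selection instead of `dist`.
    """
    results = []
    for qi, q in enumerate(queries):
        dists = []
        for ii, x in enumerate(index):
            d = math.dist(q, x)  # Euclidean distance
            if epilogue is not None:
                d = epilogue(qi, ii, d)
            dists.append((d, ii))
        results.append(heapq.nsmallest(k, dists))
    return results


# Made-up points on a line, for illustration only.
points = [(0.0, 0.0), (1.0, 0.0), (5.0, 0.0)]

# Core distance: distance to the 2nd nearest point (the point itself counts).
plain = brute_force_knn(points, points, k=2)
core = [neighbors[-1][0] for neighbors in plain]

# Mutual reachability distance applied as an epilogue.
reach = brute_force_knn(
    points, points, k=2,
    epilogue=lambda qi, ii, d: max(core[qi], core[ii], d),
)
```

Applying the transformation inside the search means the k-selection sees the transformed distances directly, so no separate pass over a full distance matrix is needed.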

Authors:
  - Ben Frederickson (https://github.com/benfred)
  - Corey J. Nolet (https://github.com/cjnolet)

Approvers:
  - Corey J. Nolet (https://github.com/cjnolet)

URL: #1371
New `bench/ann` artifact for comparing (C++ APIs for) GPU-accelerated algorithms end-to-end. Working on this w/ @tfeher but had to squash the original commits into a single commit.

Things left to do:
- [x] Separate `benchmarks` executables for each different algorithm
- [x] Separate build targets for `ggnn` and `hnswlib`
- [x] Revise `bench/ann` docs
- [x] Break `factory.cuh` and `benchmark.cu` / `benchmark.cpp` into individual files for each different algorithm to make it easier to plug in new algorithms.
- [x] Separate into its own conda package

closes #1211

Authors:
  - Corey J. Nolet (https://github.com/cjnolet)

Approvers:
  - Ray Douglass (https://github.com/raydouglass)
  - Ben Frederickson (https://github.com/benfred)
  - Tamas Bela Feher (https://github.com/tfeher)

URL: #1304
Add `--time` option to `build.sh` that enables compile time logging of `nvcc`. 

Also, add a script `cpp/scripts/analyze_nvcc_log.py` to find the translation units that take the longest time. 

Output looks like: 

```
$ cpp/scripts/analyze_nvcc_log.py cpp/build/nvcc_compile_log.csv
-- loading data
-- analyzing data
-- Ten longest translation units:
phase  index                                               file        cicc   cudafe++  fatbinary  gcc (compiling)  gcc (preprocessing 1)  gcc (preprocessing 4)        ptxas   total time
0         10  ions/detail/canberra_double_double_double_int.cu    42.431063  10.601856   0.020979         6.747153               3.721194               2.093567  1618.390375  1684.006186
1         11  zations/detail/canberra_float_float_float_int.cu    36.928960   9.804138   0.011537         6.796088               3.481156               1.790703  1584.262875  1643.075457
2         85  ors/specializations/refine_d_uint64_t_uint8_t.cu   602.935531  14.980877   0.529673        36.300566               6.270717               2.889723   933.622969  1597.530056
3         84  bors/specializations/refine_d_uint64_t_int8_t.cu   606.513281  16.243960   0.729282        39.981113               5.608029               3.028493   897.241469  1569.345628
4         53  stance/neighbors/ivfpq_search_int8_t_uint64_t.cu   841.049750   8.233967   1.025554        24.248578               4.069022               1.747108   631.193734  1511.567713
5         52  istance/neighbors/ivfpq_search_float_uint64_t.cu   837.241437   8.145278   1.042313        24.400606               3.433528               1.882623   627.786672  1503.932457
6         54  tance/neighbors/ivfpq_search_uint8_t_uint64_t.cu   846.706656   8.371286   1.025517        24.094691               3.432749               1.645345   618.319234  1503.595479
7         76  izations/detail/ivfpq_search_uint8_t_uint64_t.cu   698.726266   7.086368   1.050021        39.727723               3.259101               1.333935   406.509937  1157.693351
8         74  alizations/detail/ivfpq_search_float_uint64_t.cu   706.702516   6.905794   1.049731        39.923895               2.814361               2.057154   395.604000  1155.057450
9         75  lizations/detail/ivfpq_search_int8_t_uint64_t.cu   689.390281   6.483386   1.025864        39.865668               3.121696               1.297788   409.099562  1150.284245
10        83  hbors/specializations/refine_d_uint64_t_float.cu   334.705594  15.466444   0.680270        36.551977               5.405133               2.947568   715.708781  1111.465767
-- Plotting absolute compile times
-- Wrote absolute compile time plot to cpp/build/nvcc_compile_log.csv.absolute.compile_times.png
-- Plotting relative compile times
-- Wrote relative compile time plot to cpp/build/nvcc_compile_log.csv.relative.compile_times.png
```
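
The ranking step in the output above boils down to summing the per-phase times of each translation unit and sorting by the total. The sketch below shows that idea on a simplified, hypothetical CSV layout with `file`, `phase`, and `seconds` columns. It is not the code of `analyze_nvcc_log.py`, the real `nvcc` log has a different layout, and the sample numbers are invented.

```python
import csv
import io
from collections import defaultdict


def slowest_translation_units(csv_text, top=10):
    """Sum phase times per file and return the `top` slowest files.

    Assumes a simplified CSV with `file`, `phase`, and `seconds` columns;
    this layout is an illustration, not the format nvcc writes.
    """
    totals = defaultdict(float)
    for row in csv.DictReader(io.StringIO(csv_text)):
        totals[row["file"]] += float(row["seconds"])
    # Largest total compile time first.
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)[:top]


# Invented sample data in the assumed layout.
sample = """file,phase,seconds
a.cu,cicc,40.0
a.cu,ptxas,1600.0
b.cu,cicc,600.0
b.cu,ptxas,900.0
c.cu,cicc,5.0
"""

ranking = slowest_translation_units(sample, top=2)
```

Keeping the per-phase rows separate until the final sum also makes it easy to see whether a slow file is dominated by `cicc` or by `ptxas`, which is the distinction the table above draws.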

Authors:
  - Allard Hendriksen (https://github.com/ahendriksen)

Approvers:
  - Corey J. Nolet (https://github.com/cjnolet)

URL: #1262
This PR will also automatically generate an HTML report in `conda-cpp-build` CI runs under the task `Upload Additional Artifacts`

Authors:
  - Divye Gala (https://github.com/divyegala)

Approvers:
  - AJ Schmidt (https://github.com/ajschmidt8)
  - Corey J. Nolet (https://github.com/cjnolet)

URL: #1369
cupy just released version 12, which we do not yet support, so wheels CI must be pinned to require a lower version

Authors:
  - Vyas Ramasubramani (https://github.com/vyasr)

Approvers:
  - Sevag H (https://github.com/sevagh)

URL: #1383
Due to some typos in #1304, the benchmarks are not built any more. 

I have fixed this in this PR. I think it is important that this PR goes into 23.04, otherwise we will not have benchmarks for two months in the stable release.

Authors:
  - Allard Hendriksen (https://github.com/ahendriksen)

Approvers:
  - Corey J. Nolet (https://github.com/cjnolet)

URL: #1389
Switches raft over to the upcoming raft_export(`COMPONENT`) support.

This means raft no longer needs to maintain custom export logic.

Authors:
  - Robert Maynard (https://github.com/robertmaynard)
  - Corey J. Nolet (https://github.com/cjnolet)

Approvers:
  - Corey J. Nolet (https://github.com/cjnolet)

URL: #1154
This PR pins `dask` and `distributed` to `2023.3.2` and `2023.3.2.1` respectively for `23.04` release.

xref: rapidsai/cudf#13070

Authors:
  - GALI PREM SAGAR (https://github.com/galipremsagar)

Approvers:
  - Peter Andreas Entschev (https://github.com/pentschev)
  - Ray Douglass (https://github.com/raydouglass)
  - Ben Frederickson (https://github.com/benfred)
  - Joseph (https://github.com/jolorunyomi)

URL: #1399
This PR adds CAGRA, a graph based method for nearest neighbor search.

Authors:
  - Tamas Bela Feher (https://github.com/tfeher)
  - Corey J. Nolet (https://github.com/cjnolet)
  - Ben Frederickson (https://github.com/benfred)

Approvers:
  - Ben Frederickson (https://github.com/benfred)
  - Divye Gala (https://github.com/divyegala)

URL: #1375