Improve pymepix pipeline performance (centroiding) #14
As it turns out, the centroiding in pymepix becomes a bottleneck, as expected, when run with too much data. We measured the time the clustering and the centroiding take in turn. Clustering takes much longer (a factor of ~100: 0.2-0.75 s for clustering versus 0.002-0.006 s for centroiding, for approximately 15,000 voxels per trigger and 5898 triggers in the complete dataset), and it is not fast enough to empty the queue of data coming from the `packetprocessor`, so data piles up. With a single centroiding process and the dataset described above at 100 Hz, a queue of about 72 packets builds up (roughly one packet per second; processing would have to be twice as fast in this case to handle all packets in time). For now, the number of data chunks being sent to the `packetprocessor` is reduced to a third. The number of ions per shot is around 1, and we run at 200 Hz.
I can think of a couple of options to improve the performance of the centroiding:

1. Run the `centroiding` stage with several processes in parallel.
2. Pass `n_jobs` to `dbscan`. This doesn't work at the moment; we submitted an issue with joblib ("Loky-backed parallel loops cannot be called in a multiprocessing, setting n_jobs=1 warning with scikit-learn", joblib/joblib#1204).
3. Parallelise the cluster finding ourselves within the centroiding stage.
4. Replace `scikit-learn.dbscan` with a compiled version.

Option 1 should be very straightforward and is worth testing to see whether it solves the immediate bottleneck for now:
`pymepix/pymepix/processing/acquisition.py`, line 141: `self.addStage(4, Centroiding)` → `self.addStage(4, Centroiding, num_processes=10)`
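In context the change would look roughly like this (a sketch only; the surrounding stage registration is assumed, only the `addStage(4, Centroiding, ...)` call itself comes from the file referenced above):

```python
# pymepix/pymepix/processing/acquisition.py (sketch, not the verbatim file)
# ... earlier stages of the acquisition pipeline are registered here ...

# before: a single centroiding process
# self.addStage(4, Centroiding)

# after: let the centroiding stage spawn a pool of worker processes
self.addStage(4, Centroiding, num_processes=10)
```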
As for option 2, let's see what they come up with.
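For reference, this is the parameter in question (a sketch; `eps` and `min_samples` are placeholder values, not the ones used in pymepix):

```python
from sklearn.cluster import DBSCAN

# n_jobs=-1 would let scikit-learn parallelise the neighbour search, but when
# called from inside a multiprocessing child it currently emits the Loky
# warning and falls back to n_jobs=1 (see the joblib issue linked above)
db = DBSCAN(eps=2.0, min_samples=5, n_jobs=-1)
```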
For option 3 I see a lot of potential, if done right. In principle, the structure of our data allows a trivial parallelisation of the cluster finding. The danger here is that if `dbscan` is performed over each trigger number separately (roughly as in the sketch below), the time `dbscan` actually takes is much too short compared with the time spent slicing the data, calling the function, serialising the data and so on, and we would produce a tremendous overhead. So the computation of the clusters has to take much longer than all the overhead the parallelisation brings.
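An illustration of the naive per-trigger approach (a sketch only, not the pymepix code; the array layout and the DBSCAN parameters are assumptions):

```python
import numpy as np
from sklearn.cluster import DBSCAN

def cluster_per_trigger(trigger_nr, x, y, tof, eps=2.0, min_samples=5):
    """Cluster each trigger separately; returns {trigger number: labels}.

    Each DBSCAN call only sees the voxels of a single trigger, so the per-call
    overhead (slicing, object construction, serialisation if farmed out to
    workers) can easily dominate the actual clustering time.
    """
    results = {}
    for trig in np.unique(trigger_nr):
        sel = trigger_nr == trig
        data = np.column_stack((x[sel], y[sel], tof[sel]))
        results[trig] = DBSCAN(eps=eps, min_samples=min_samples).fit(data).labels_
    return results
```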
The way we could achieve this is to leave the clustering as it is, but add a worker pool to `centroiding.py` which can perform the clustering on a "chunk" basis and maybe even do the centroiding in an asynchronous fashion. Naively this should scale linearly, so with a pool of 10 workers we can expect an improvement of a factor of 10x. Need to look into `joblib` and `loky` some more to understand the possibilities of creating such a worker pool... → After a bit of reading, `multiprocessing.pool` is probably the best bet here.
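A minimal sketch of what such a chunk-based pool could look like (the function and variable names are assumptions, not the pymepix API; the chunk layout stands in for whatever the clustering currently receives):

```python
import multiprocessing as mp
import numpy as np
from sklearn.cluster import DBSCAN

def cluster_chunk(chunk):
    """Cluster one chunk (many triggers' worth of voxels) in a worker process."""
    # one expensive DBSCAN call per task keeps the computation large compared
    # with the serialisation and dispatch overhead of the pool
    return DBSCAN(eps=2.0, min_samples=5).fit(chunk).labels_

if __name__ == "__main__":
    # stand-in for chunks arriving from the packet processor
    chunks = [np.random.rand(15_000, 3) for _ in range(20)]
    with mp.Pool(processes=10) as pool:
        # imap hands chunks to the workers asynchronously and yields results in order
        for labels in pool.imap(cluster_chunk, chunks):
            ...  # pass the labels on to the centroiding step here
```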
`scikit-learn.dbscan` seems to be a pure Python implementation (https://github.com/scikit-learn/scikit-learn/blob/main/sklearn/cluster/_dbscan.py). However, apparently it gets compiled with `Cython`, though I don't quite understand by looking at `dbscan` how. If it doesn't get compiled, we can expect a significant acceleration (again, if the overhead of context switching and potential memory copies isn't too big...); if it does get compiled, we won't. E.g. https://github.com/s1998/parallel-DBSCAN claims a speed-up of 13x on a 36-core machine compared to `scikit-learn.dbscan` (frankly, if a C++ implementation running on 36 cores only gets 13 times faster than `scikit-learn.dbscan`, `dbscan` is already pretty efficient). Btw, `numba` on that end probably doesn't help either, although testing it isn't much effort and we would know for sure...
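A quick way to check whether the clustering core is already compiled (assuming the module layout of recent scikit-learn versions, where the inner loop lives in `sklearn.cluster._dbscan_inner`):

```python
# If this prints a path ending in .so (or .pyd on Windows), the DBSCAN inner
# loop is a Cython-compiled extension and a drop-in compiled replacement is
# unlikely to buy much.
import sklearn.cluster._dbscan_inner as dbscan_inner
print(dbscan_inner.__file__)
```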
The lowest-hanging fruit is option 1, and the biggest difference should come from option 3. If still necessary, switching to a compiled version of `dbscan` should be a transparent change, but as we would already be running the clustering in a pool of workers at that point, not much gain in performance is expected (maybe a factor of 5 or 10 at best).