Description
some time ago, pdsh became very slow for me when connecting to even just 50-100 machines and running something simple like /bin/true. years ago this took much less than a second (with persistent ssh connections); now it takes more than ten seconds.
even using -R exec running echo, the following takes 12 seconds on just 57 hosts. it's not even logging in anywhere, so not sure what it could be doing:
cat << % > testgenders
web[100-150] phy,prod,web,site1
db[100-105] phy,db,site1
%
time -p \
src/pdsh/pdsh -F ./testgenders -R exec -g site1 echo %h |
wc -l
on a dual-core 8th-gen Intel machine with 16GB RAM:
57
real 12.86
user 16.66
sys 8.02
built using --without-rsh --without-ssh --with-exec --with-genders. would normally use ssh with this and only used exec for testing, but would expect exec to be instant. this used to be much faster. running it under strace takes about 10 minutes:
strace -qcf src/pdsh/pdsh -F ./testgenders -R exec -g site1 \
echo %h 2>&1 >/dev/null | head
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 87.16 6423.893472    37348217       172           poll
  3.15  232.366846      355845       653        73 futex
  3.15  232.241463   232241463         1         1 rt_sigtimedwait
  3.15  232.206853     4073804        57           wait4
  2.07  152.765258      585307       261           nanosleep
  1.31   96.749412           1  59769314  59763728 close
  0.00    0.037793         331       114           socketpair
  0.00    0.015195          44       343       285 execve
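the close row can be broken down per child. assuming the close() calls are spread evenly over the forked children (the 57 wait4 calls suggest one child per host), this quick arithmetic divides the counts from the table above:

```sh
# Numbers copied from the "close" and "wait4" rows of the strace -c summary.
close_calls=59769314
close_errors=59763728
children=57   # assumption: wait4 was called 57 times, i.e. one forked child per host

per_child=$(( close_calls / children ))          # integer division
failed_per_child=$(( close_errors / children ))
echo "close() calls per child:   $per_child"
echo "failed close() per child:  $failed_per_child"
echo "2^20:                      $(( 1 << 20 ))"
```

that comes out to 1048584 close() calls per child, nearly all of them failing, only a handful more than 2^20 = 1048576. it looks like each child calls close() on every descriptor number up to roughly 2^20. 1048576 also happens to be the Linux kernel's default fs.nr_open, but whether that is the bound actually in play here is a guess, not something this summary shows.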
dang, seems to want to close something an awful lot. the fd numbers it's trying to close get out of hand rapidly:
strace -qqf -e close \
src/pdsh/pdsh -F ./testgenders -R exec -g site1 \
echo %h 2>&1 >/dev/null |
grep --line-buffered unfinished |
awk '{
    fd = substr($3, index($3, "(") + 1);
    if (fd > top) {
        top = fd
        if (fd % 10 == 0) {
            print fd; fflush();
        }
    }
}'
10
90
990
9990
99990
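one caveat about the filter above: in awk, substr() returns a string, so fd > top compares strings lexically (for example "99" sorts after "100"). that is probably why the printed values follow 10, 90, 990, 9990, 99990 instead of every tenth descriptor; the values themselves are still real fds passed to close(). a minimal sketch of a variant that forces a numeric comparison with + 0 and keeps the highest fd passed to close() per process. max_close_fd is a hypothetical helper name, and it assumes strace -f's "[pid N]" line prefix:

```sh
# Hypothetical helper: read strace output on stdin, print "pid max_fd" for each
# process that called close(). Lines without a "[pid N]" prefix are reported
# under pid 0.
max_close_fd() {
  awk '
    match($0, /close[(][0-9]+/) {
      pid = ($1 == "[pid") ? $2 + 0 : 0                # "[pid 101]" -> $2 is "101]" -> 101
      fd = substr($0, RSTART + 6, RLENGTH - 6) + 0     # skip the 6 chars "close("; + 0 forces a number
      if (fd > top[pid]) top[pid] = fd                 # numeric comparison
    }
    END { for (p in top) print p, top[p] }
  '
}
```

usage, with the same pdsh command line as above:
strace -qqf -e close src/pdsh/pdsh -F ./testgenders -R exec -g site1 echo %h 2>&1 >/dev/null | max_close_fd | sort -n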
thinking this would need a backtrace, and wanting just one thread running, tried FANOUT=1 ... but then the PROBLEM WAS GONE!!! found the transition point:
$ time -p FANOUT=495 src/pdsh/pdsh -F ./testgenders -R exec -g site1 echo %h >/dev/null
real 0.04
user 0.06
sys 0.02
$ time -p FANOUT=496 src/pdsh/pdsh -F ./testgenders -R exec -g site1 echo %h >/dev/null
real 12.72
user 16.61
sys 8.22
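a transition point like this can be found by bisection. a minimal sketch that automates it, assuming the slowness flips exactly once as FANOUT grows; bisect_threshold and pdsh_is_slow are hypothetical helper names, and the 2-second cutoff is an arbitrary choice:

```sh
# Hypothetical helper: binary-search the smallest integer in (lo, hi] for which
# the predicate command succeeds. Assumes the predicate fails at lo, succeeds
# at hi, and flips exactly once in between.
bisect_threshold() {
  lo=$1 hi=$2 pred=$3
  while [ $((hi - lo)) -gt 1 ]; do
    mid=$(( (lo + hi) / 2 ))
    if "$pred" "$mid"; then
      hi=$mid    # predicate true at mid: threshold is at or below mid
    else
      lo=$mid    # predicate false at mid: threshold is above mid
    fi
  done
  echo "$hi"
}

# Hypothetical predicate: a run counts as "slow" if it has not finished after
# 2 seconds. Relies on GNU coreutils timeout(1), which exits with status 124
# when the time limit is hit. The pdsh command line is the one from this report.
pdsh_is_slow() {
  FANOUT=$1 timeout 2 src/pdsh/pdsh -F ./testgenders -R exec -g site1 \
    echo %h >/dev/null 2>&1
  [ $? -eq 124 ]
}
```

running bisect_threshold 1 9999 pdsh_is_slow takes about 14 probes; given the FANOUT=495 and FANOUT=496 timings above, it should land on 496 if the slowness really flips only once.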
ok, so is it wrong to have a high fanout? what transition occurs at 496? isn't this number just an upper limit? shouldn't the fanout scale arbitrarily as long as the machine has the resources to handle it? maybe we have 10k machines in the cluster and a big controller to handle them all in one batch (our cluster's not that large, but maybe next year ;-)
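one thing worth checking for the "what changes at 496?" question, offered only as a guess to test and not an explanation: if the roughly 2^20 close() calls per child come from a loop bounded by the process's open-file limit (RLIMIT_NOFILE), pdsh's own limit might differ between FANOUT=495 and FANOUT=496. a minimal sketch for reading a process's limits on Linux; show_nofile is a hypothetical helper name:

```sh
# Hypothetical helper: print the soft and hard open-file limits (RLIMIT_NOFILE)
# of a process, read from /proc/<pid>/limits (Linux-only). Defaults to the
# current shell.
show_nofile() {
  pid=${1:-$$}
  # The row looks like: "Max open files   <soft>   <hard>   files"
  awk '/^Max open files/ { print "soft=" $4, "hard=" $5 }' "/proc/$pid/limits"
}
```

since -R exec runs the command locally once per host, a slower command keeps pdsh alive long enough to inspect, e.g.:
FANOUT=495 src/pdsh/pdsh -F ./testgenders -R exec -g site1 sleep 5 & sleep 1; show_nofile "$(pgrep -n pdsh)"
and then the same with FANOUT=496.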
originally had looked for a FANOUT=0 option meaning "as many threads as needed to do all nodes in one batch," but since that doesn't seem possible to specify, had just used FANOUT=9999. that was set innocuously some time ago, and apparently it's what makes pdsh slow; the slowness has been bugging me for a while, but hadn't seen the link between the two.