poll/close error storm after adding just one more to fanout (for me 496)

some time ago, pdsh became very slow for me when connecting to even just 50-100 machines and running something simple like /bin/true. years ago this would take much less than a second (with persistent ssh connections). now it takes more than ten seconds.

even with -R exec running echo, the following takes 12 seconds on just 57 hosts. it's not even logging in anywhere, so not sure what it could be doing:

cat << % > testgenders
web[100-150] phy,prod,web,site1
db[100-105] phy,db,site1
%
time -p \
src/pdsh/pdsh -F ./testgenders -R exec -g site1 echo %h |
wc -l

on a dual-core 8th Gen Intel machine with 16GB RAM:

57
real 12.86
user 16.66
sys 8.02

built using --without-rsh --without-ssh --with-exec --with-genders. would normally want to use ssh with this and only used exec for testing, but would think exec should be instant. this used to be very much faster. running it under strace takes about 10 minutes:

strace -qcf src/pdsh/pdsh -F ./testgenders -R exec -g site1 \
  echo %h 2>&1 >/dev/null | head

% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 87.16 6423.893472    37348217       172           poll
  3.15  232.366846      355845       653        73 futex
  3.15  232.241463   232241463         1         1 rt_sigtimedwait
  3.15  232.206853     4073804        57           wait4
  2.07  152.765258      585307       261           nanosleep
  1.31   96.749412           1  59769314  59763728 close
  0.00    0.037793         331       114           socketpair
  0.00    0.015195          44       343       285 execve

dang, seems to want to close something an awful lot: about 59.8 million close calls, nearly all of them failing. the fd numbers it's trying to close get out of hand rapidly:

strace -qqf -e close \
src/pdsh/pdsh -F ./testgenders -R exec -g site1 \
echo %h 2>&1 >/dev/null |
grep --line-buffered unfinished |
awk '{
# fd is a string here, so the comparison with top is lexical
fd = substr($3, index($3, "(") + 1);
if (fd > top) {
  top = fd
  if (fd % 10 == 0) {
    print fd; fflush();
  }
}
}'

10
90
990
9990
99990

thinking this would need a backtrace, wanted to get it down to one thread running, so tried FANOUT=1... but then PROBLEM GONE!!! found the transition point:

 $ time -p FANOUT=495 src/pdsh/pdsh -F ./testgenders -R exec -g site1 echo %h >/dev/null
real 0.04
user 0.06
sys 0.02

 $ time -p FANOUT=496 src/pdsh/pdsh -F ./testgenders -R exec -g site1 echo %h >/dev/null
real 12.72
user 16.61
sys 8.22
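
the transition point can also be found mechanically by bisecting on FANOUT. a minimal sketch, not what was actually run: `bisect` and `is_slow` are made-up helper names, the 2-second cutoff is an arbitrary assumption that sits between the 0.04s and 12s timings above, and it assumes the run is fast at the low end, slow at the high end, and flips exactly once in between:

```shell
# print the smallest N in (lo, hi] for which "is_slow N" succeeds.
# assumes is_slow fails at lo, succeeds at hi, and flips only once.
bisect() {
  lo=$1
  hi=$2
  while [ $((hi - lo)) -gt 1 ]; do
    mid=$(( (lo + hi) / 2 ))
    if is_slow "$mid"; then
      hi=$mid
    else
      lo=$mid
    fi
  done
  echo "$hi"
}

# hypothetical predicate: succeeds if the run takes 2 seconds or more
# (date +%s only has one-second resolution, hence the coarse cutoff)
is_slow() {
  start=$(date +%s)
  FANOUT=$1 src/pdsh/pdsh -F ./testgenders -R exec -g site1 echo %h >/dev/null 2>&1
  [ $(( $(date +%s) - start )) -ge 2 ]
}

# usage: bisect 1 9999
```

if the single-flip assumption holds, `bisect 1 9999` should land on the first slow value in about 14 runs.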

ok so is it wrong to have a high fanout? what transition occurs at 496? isn't this number just a limit? shouldn't the fanout be able to work to an arbitrary extent if the machine has enough resources to handle it? maybe we have 10k machines in the cluster and we got a big controller to handle them in one batch (our cluster's not that large, but maybe next year ;-)

originally had tried to find a FANOUT=0 option to mean "as many threads as needed for all nodes to be done in one batch," but that doesn't seem possible to specify, so had just used FANOUT=9999. this was set some time ago and seemed innocuous. apparently it's what makes pdsh slow, which has been bugging me for a while, but hadn't seen the link between the two.
