[DRAFT] Cluster reboot tool #20967
So... You generate the host files, and then run each round manually, correct? Then run a health check before starting each round?
Each round reboots max one server per set.
Gives a good amount of visibility, and close enough to optimal to be fine. Certainly an improvement over clushing.
func makeHostfile() {
    pools, totalServers, err := getInfra()
    var rebootRounds [200][200]map[string]*Server
Suggested change:
-    var rebootRounds [200][200]map[string]*Server
+    const maxRounds = 200
+    const maxPools = 200
+    var rebootRounds [maxRounds][maxPools]map[string]*Server
That is correct, right?
indeed
if !ok {

    for _, rv := range rebootRounds[i][pid] {
        if haveMatchingSets(rv, s) {
            continue nextServer
        }
    }

    rebootRounds[i][pid][s.Endpoint] = pools[pid].Servers[sid]
    pools[pid].Servers[sid].Processed = true
    processed++
} else {
    continue
}
Suggested change:
-if !ok {
-    for _, rv := range rebootRounds[i][pid] {
-        if haveMatchingSets(rv, s) {
-            continue nextServer
-        }
-    }
-    rebootRounds[i][pid][s.Endpoint] = pools[pid].Servers[sid]
-    pools[pid].Servers[sid].Processed = true
-    processed++
-} else {
-    continue
-}
+if ok {
+    continue
+}
+for _, rv := range rebootRounds[i][pid] {
+    if haveMatchingSets(rv, s) {
+        continue nextServer
+    }
+}
+rebootRounds[i][pid][s.Endpoint] = pools[pid].Servers[sid]
+pools[pid].Servers[sid].Processed = true
+processed++
}
for _, rv3 := range rv2 {
    // _, err = roundFile.WriteString(rv3.Endpoint + "\n")
    _, err = roundFile.WriteString(strings.Replace(rv3.Endpoint, ":443", "", -1) + "\n")
Same for :80, or any port??? Seems like you don't expect the port to be on the host name in the file, correct?
needs a split instead
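A minimal sketch of what that split could look like, assuming endpoints are plain host:port strings; net.SplitHostPort handles any port and falls back to the raw value when no port is present. The hostOnly name is illustrative, not from the PR:

// Sketch only: strip any port from an endpoint before writing it to the
// hostfile, rather than replacing a hard-coded ":443".
package main

import (
    "fmt"
    "net"
)

// hostOnly returns the endpoint with any trailing ":port" removed.
func hostOnly(endpoint string) string {
    host, _, err := net.SplitHostPort(endpoint)
    if err != nil {
        // No port present (or not parseable as host:port); keep the endpoint as-is.
        return endpoint
    }
    return host
}

func main() {
    fmt.Println(hostOnly("server-01.example.com:443")) // server-01.example.com
    fmt.Println(hostOnly("server-02.example.com"))     // server-02.example.com
}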
if err != nil {
    panic(err)
}
hostsList := bytes.Split(hosts, []byte{10})
Just for ease of reading...
Suggested change:
-hostsList := bytes.Split(hosts, []byte{10})
+hostsList := bytes.Split(hosts, []byte{'\n'})
if len(v) < 1 {
    continue
}
rebootServer(string(v))
I kinda expected these to all be concurrent. They are split into "safe" sets to allow that... But maybe that should just be an opt-in.
Yeah, before you use this command you'd use the "hostfile" command to generate a hostfile which contains a list of hosts that do not share erasure sets.
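A rough sketch of how an opt-in concurrent mode might look, assuming a rebootServer helper that is safe to call from multiple goroutines and a hypothetical parallel option; none of these names are taken from the PR:

// Sketch only: hosts in one generated hostfile do not share erasure sets,
// so they could be rebooted concurrently behind an opt-in flag.
package reboot

import (
    "bytes"
    "os"
    "sync"
)

func rebootServer(host string) { /* ssh dial + reboot command, elided */ }

func rebootHostfile(path string, parallel bool) error {
    hosts, err := os.ReadFile(path)
    if err != nil {
        return err
    }
    var wg sync.WaitGroup
    for _, v := range bytes.Split(hosts, []byte{'\n'}) {
        if len(v) == 0 {
            continue
        }
        host := string(v)
        if !parallel {
            rebootServer(host) // current behavior: one host at a time
            continue
        }
        wg.Add(1)
        go func() {
            defer wg.Done()
            rebootServer(host)
        }()
    }
    wg.Wait()
    return nil
}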
config := &ssh.ClientConfig{
    User:            "root",
    HostKeyCallback: ssh.InsecureIgnoreHostKey(),
    Timeout:         10 * time.Second,
So this is a dial timeout to establish the connection.
You probably want some "operation" timeout as well, so you don't risk that hanging forever. Not sure how to do that on the session, though.
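One way to bolt an overall deadline onto the remote command, sketched here by closing the session when a timer fires so that CombinedOutput unblocks; the helper name and timeout handling are assumptions, not the PR's code:

// Sketch only: run the remote command with an overall deadline so a hung
// command cannot block a reboot round forever.
package reboot

import (
    "fmt"
    "time"

    "golang.org/x/crypto/ssh"
)

func runWithTimeout(client *ssh.Client, cmd string, timeout time.Duration) ([]byte, error) {
    session, err := client.NewSession()
    if err != nil {
        return nil, err
    }
    defer session.Close()

    type result struct {
        out []byte
        err error
    }
    done := make(chan result, 1)
    go func() {
        out, err := session.CombinedOutput(cmd)
        done <- result{out, err}
    }()

    select {
    case r := <-done:
        return r.out, r.err
    case <-time.After(timeout):
        // Force the blocked CombinedOutput call to return.
        session.Close()
        return nil, fmt.Errorf("command %q timed out after %v", cmd, timeout)
    }
}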
}
defer session.Close()

output, err := session.CombinedOutput("echo 'this is where we execute the command'")
Yeah, probably want the command as an argument.
We have a client testing this out and we don't want to run the actual commands yet :)
Completely agree. But I was thinking of making it a flag, with this as the default. They can then replace it with their actual restart command once they are happy.
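A small sketch of that idea with the standard flag package, keeping the harmless echo as the default; the flag name is made up for illustration:

// Sketch only: keep the echo placeholder as the default remote command,
// but let the client override it once they are ready to do real reboots.
package main

import (
    "flag"
    "fmt"
)

var rebootCmd = flag.String("reboot-cmd",
    "echo 'this is where we execute the command'",
    "command executed over SSH on each host")

func main() {
    flag.Parse()
    // Each host would then receive *rebootCmd via session.CombinedOutput.
    fmt.Println("remote command:", *rebootCmd)
}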
    }
}

func haveMatchingSets(s1 *Server, s2 *Server) (yes bool) {
Suggested change:
-func haveMatchingSets(s1 *Server, s2 *Server) (yes bool) {
+// haveMatchingSets returns if s1 and s2 share any sets.
+// Servers are assumed to be in same pool.
+func haveMatchingSets(s1 *Server, s2 *Server) (yes bool) {
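For context, a minimal sketch of what such a check could look like, assuming each Server records the indexes of the erasure sets it belongs to in a Sets field (that field name is a guess, not taken from the PR):

// Sketch only: two servers in the same pool "match" if they participate in
// at least one common erasure set. The Sets field is hypothetical.
package reboot

type Server struct {
    Endpoint  string
    Processed bool
    Sets      []int // erasure set indexes this server is part of
}

// haveMatchingSets reports whether s1 and s2 share any erasure set.
func haveMatchingSets(s1, s2 *Server) (yes bool) {
    seen := make(map[int]struct{}, len(s1.Sets))
    for _, set := range s1.Sets {
        seen[set] = struct{}{}
    }
    for _, set := range s2.Sets {
        if _, ok := seen[set]; ok {
            return true
        }
    }
    return false
}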
This issue has been automatically marked as stale because it has not had recent activity. It will be closed after 15 days if no further activity occurs. Thank you for your contributions.
This is a helper tool which lets clients reboot MinIO clusters, one server per erasure set at a time.