ballot ensures that it's the only one running a command, by running a leader election on a ZooKeeper cluster. Others that try to run the same command (based on ZooKeeper path) will wait until the active ballot releases leadership.
Some example use cases include:
- A distributed cron service, where jobs need to run at specific times or time intervals and cannot overlap, but still need to fail over to standby instances on failure. In this case ballot acts as lockrun would on a single-server system.
- Any singleton service that would be a Kubernetes Deployment of 1 replica, if Kubernetes is not an option, or if running on Kubernetes and trying to achieve failover faster than the control plane can re-schedule pods.
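To make the single-server analogy concrete, here is a minimal sketch of lockrun-style exclusion using an atomic mkdir as the lock. It is only an illustration of the idea on one machine, not how ballot works: ballot coordinates through ZooKeeper and waits for leadership instead of giving up. The function name run_exclusive and the lock path are hypothetical.

```shell
#!/bin/sh
# Hypothetical single-server analogue of what ballot provides across hosts:
# run a command only if no other invocation currently holds the lock.
run_exclusive() {
    lock_dir=$1
    shift
    # mkdir is atomic: exactly one caller can create the directory.
    if ! mkdir "$lock_dir" 2>/dev/null; then
        echo "lock held, skipping: $*" >&2
        return 1
    fi
    status=0
    "$@" || status=$?
    # Release the lock so the next invocation can run.
    rmdir "$lock_dir"
    return "$status"
}

demo_dir=$(mktemp -d)
run_exclusive "$demo_dir/lock" echo "running with the lock held"
rmdir "$demo_dir"
```

Unlike this sketch, a lock held through ZooKeeper is released when the holder's session expires, which is what allows failover to a standby instance.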
ballot run once --candidate-id `hostname` -- /usr/bin/env LD_PRELOAD=trickle.so command-with-exclusive-resource
ballot run cron --candidate-id `hostname` --schedule='@daily' -- /usr/bin/env LD_PRELOAD=trickle.so command-with-exclusive-resource
A running and healthy ZooKeeper ensemble is required. Only plain unauthenticated connections are supported at this time.
ballot relies on viper/cobra for command line and configuration, and thus can be configured using command line arguments, environment variables, and configuration files. Remote configuration sources are not available at this time.
Several configuration sources can be mixed, as long as parameters are not duplicated between sources. The following are all equivalent:
- Running ballot --config-file ./ballot.yaml run [...] with ballot.yaml containing:

      ---
      zookeeper-servers:
        - server1
        - server2
        - 'server3:2181'
      zookeeper-base-path: /com/scality/backbeat/singleton
      zookeeper-session-timeout: 5s
      log-level: debug
      log-format: json

- Running:

      env \
        ZOOKEEPER_SERVERS=server1,server2,server3:2181 \
        ZOOKEEPER_BASE_PATH=/com/scality/backbeat/singleton \
        ZOOKEEPER_SESSION_TIMEOUT=5s \
        LOG_LEVEL=debug \
        LOG_FORMAT=json \
        ballot run [...]

- Running:

      ballot \
        --zookeeper-servers server1 \
        --zookeeper-servers server2 \
        --zookeeper-servers server3:2181 \
        --zookeeper-base-path /com/scality/backbeat/singleton \
        --zookeeper-session-timeout 5s \
        --log-level debug \
        --log-format json \
        run [...]
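Judging from the three equivalent forms above, an environment variable name appears to be the parameter name in upper case with dashes replaced by underscores, and list values are comma-separated in the environment. The sketch below encodes that naming pattern; flag_to_env is a hypothetical helper, and applying the pattern to parameters not shown above is an assumption.

```shell
#!/bin/sh
# Hypothetical helper: derive the environment variable name that seems to
# correspond to a ballot parameter, based on the examples in this section.
flag_to_env() {
    printf '%s\n' "$1" | tr '[:lower:]' '[:upper:]' | tr '-' '_'
}

flag_to_env zookeeper-servers           # prints ZOOKEEPER_SERVERS
flag_to_env zookeeper-session-timeout   # prints ZOOKEEPER_SESSION_TIMEOUT
flag_to_env log-format                  # prints LOG_FORMAT
```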
Global configuration:
Parameter | Description | Default | Allowed values |
---|---|---|---|
config-file | Configuration file | ~/.config/ballot/ballot.json, ~/.config/ballot/ballot.yaml, /etc/ballot/ballot.json, /etc/ballot/ballot.yaml, ./ballot.json, ./ballot.yaml | Any path to a valid YAML or JSON ballot configuration |
zookeeper-servers | List of ZooKeeper servers | localhost | List of server or server:port |
zookeeper-base-path | Base path to ZooKeeper election proposal nodes | /ballot/election | Any valid, otherwise unused ZooKeeper path |
zookeeper-session-timeout | ZooKeeper session timeout, used to detect and purge stale election members | 5s | Any duration between 2 and 20 times ZooKeeper's tickTime |
log-level | Log level | info | trace, debug, info, warn, error, fatal, panic |
log-format | Log format | human | human, json, raw |
debug | Debug mode with extra checks and logging | false | false, true |
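The allowed range for zookeeper-session-timeout depends on the ensemble's tickTime. As a rough sketch, the check below assumes a tickTime of 2000 ms (an assumption about the ensemble, not something ballot sets), which gives a usable range of 4s to 40s. The helper session_timeout_ok is hypothetical and not part of ballot.

```shell
#!/bin/sh
# Hypothetical helper: succeed if a session timeout (in ms) lies within
# [2 * tickTime, 20 * tickTime], the range described in the table above.
session_timeout_ok() {
    timeout_ms=$1
    tick_ms=$2
    min_ms=$((tick_ms * 2))
    max_ms=$((tick_ms * 20))
    [ "$timeout_ms" -ge "$min_ms" ] && [ "$timeout_ms" -le "$max_ms" ]
}

# Assumed tickTime of 2000 ms: 5s is within range, 1s is not.
if session_timeout_ok 5000 2000; then echo "5s: ok"; else echo "5s: out of range"; fi
if session_timeout_ok 1000 2000; then echo "1s: ok"; else echo "1s: out of range"; fi
```

With these assumed values, the default of 5s passes the check.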
run once runs the provided command. Use -- to separate ballot's own arguments from the command's executable and arguments if they contain -'s (which is very likely).
Setting parameters on-child-success and on-child-error to reelect or rerun will cause the command to be run several times. In this case, once means "not scheduled to run regularly", and is not a guarantee of only one run.
Note: If ZooKeeper goes down while a ballot process is a leader and others are
waiting as followers, they will all stay in their respective roles and operate
until ZooKeeper comes back. They will, however, not be able to perform any
failover or run new commands until ZooKeeper is back up.
Parameter | Description | Default | Allowed values |
---|---|---|---|
candidate-id | This ballot's ID, to be used for display and debugging purposes. Uniqueness of IDs between ballots is not required, but makes inspecting easier. | N/A, mandatory | Any string |
on-child-success | What to do when the command we run terminates with exit code 0. | reelect | rerun, reelect, exit, ignore |
on-child-error | What to do when the command we run terminates with an exit code other than 0, or cannot be executed for any reason. | reelect | rerun, reelect, exit, ignore |
on-election-failure | What to do when the election process fails for technical reasons (unreachable ZooKeeper, for example). Use run-anyway only if running too many processes at once is preferable to none at all. | retry | retry, run-anyway, exit |
wrap-child-logs | Whether to display the command's logs formatted as our own logs, or have its stdout and stderr directly connected to ballot's | true | false, true |
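The table distinguishes two outcomes for the wrapped command: exit code 0, and everything else (including a command that cannot be executed at all). The sketch below shows how a few commands fall into those two categories when run from a shell. It only illustrates the classification as described in the table; which_policy is a hypothetical helper and does not reproduce ballot's own logic.

```shell
#!/bin/sh
# Hypothetical helper: run a command and print which ballot parameter
# would govern the follow-up action, going by the table's descriptions.
which_policy() {
    if "$@" >/dev/null 2>&1; then
        echo on-child-success
    else
        echo on-child-error
    fi
}

which_policy true                   # prints on-child-success
which_policy false                  # prints on-child-error
which_policy /nonexistent/command   # prints on-child-error (cannot be executed)
```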
run cron runs the provided command on a schedule. Use -- to separate ballot's own arguments from the command's executable and arguments if they contain -'s (which is very likely).
Leader election is performed once and all timed invocations happen on the node that was elected leader. Elections do not happen for each invocation, unless on-child-success/on-child-error are set to reelect.
The only supported concurrency policy at the moment is to skip an invocation if the previous invocation is still running.
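The skip-if-still-running policy can be pictured with a small shell sketch. This is an illustration of the policy only, under the assumption that each invocation runs as a background process; it is not ballot's implementation, and the names invoke, job_pid and last_action are hypothetical.

```shell
#!/bin/sh
# PID of the most recent invocation, empty if none has been started yet.
job_pid=""
last_action=""

# Hypothetical scheduler tick: start the command unless the previous
# invocation is still running, in which case this tick is skipped.
invoke() {
    if [ -n "$job_pid" ] && kill -0 "$job_pid" 2>/dev/null; then
        last_action=skipped
    else
        "$@" &
        job_pid=$!
        last_action=started
    fi
    echo "$last_action: $*"
}

invoke sleep 1    # prints "started: sleep 1"
invoke sleep 1    # prints "skipped: sleep 1" (previous run still active)
wait "$job_pid"   # let the first invocation finish
invoke true       # prints "started: true"
wait
```

A skipped tick is simply dropped here; nothing is queued for later, which matches the policy stated above.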
Note: If ZooKeeper goes down while a ballot process is a leader and others are
waiting as followers, they will all stay in their respective roles and operate
until ZooKeeper comes back. They will, however, not be able to perform any
failover in the event that the leader goes down.
Future features may include adding a deadline and killing overdue jobs, allowing invocation queueing, and allowing concurrent invocations.
Parameter | Description | Default | Allowed values |
---|---|---|---|
schedule | Schedule to run on. | N/A, mandatory | Extended Cron format |
candidate-id | This ballot's ID, to be used for display and debugging purposes. Uniqueness of IDs between ballots is not required, but makes inspecting easier. | N/A, mandatory | Any string |
on-child-success | What to do when the command we run terminates with exit code 0. | reelect | rerun, reelect, exit, ignore |
on-child-error | What to do when the command we run terminates with an exit code other than 0, or cannot be executed for any reason. | reelect | rerun, reelect, exit, ignore |
on-election-failure | What to do when the election process fails for technical reasons (unreachable ZooKeeper, for example). Use run-anyway only if running too many processes at once is preferable to none at all. | retry | retry, run-anyway, exit |
wrap-child-logs | Whether to display the command's logs formatted as our own logs, or have its stdout and stderr directly connected to ballot's | true | false, true |
Not implemented yet.
Parameter | Description | Default | Allowed values |
---|---|---|---|
Not implemented yet.
Parameter | Description | Default | Allowed values |
---|---|---|---|
In order to contribute, please follow the Contributing Guidelines.
TODO: Developer instructions to be written here.
Licensed under the Apache 2.0 license.