Runner is a script that wraps any Linux command and outputs a summary of execution including analysis of system statistics.
Run it with a command and one or more of the following options:
● -c COUNT - Number of times to run the given command.
● -fc N, --failed-count N - Number of allowed failed command invocation attempts before giving up.
● -st, --sys-trace - For each failed execution, create a log for each of the following values, measured during command execution:
○ Disk IO
○ Memory
○ Processes/threads and CPU usage of the command
○ Network card packet counters
● -ct, --call-trace - For each failed execution, create a log with all the system calls made by the command.
● -lt, --log-trace - For each failed execution, create logs for the command outputs (stdout, stderr).
● -nt, --net-trace - For each failed execution, create a ‘pcap’ file with the network traffic during the execution.
● -d, --debug - Debug mode, show each instruction executed by the script.
● -h, --help - Print a usage message to STDERR explaining how the script should be used.
Once completed, Runner prints a summary of the command's return codes (the frequency of each and the run iterations in which it occurred). The summary is also printed if the script is interrupted via Ctrl+C or ‘kill’. Finally, Runner exits with the most frequent return code.
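The summary step described above could be sketched roughly as follows. This is an illustration only, not Runner's actual code: the function name `summarize_return_codes` and its input (a list holding one return code per run, in run order) are assumptions.

```python
from collections import Counter, defaultdict


def summarize_return_codes(return_codes):
    """Group run iterations by return code and pick the most frequent code.

    `return_codes` is assumed to hold one return code per run, in run order.
    Returns (iterations_per_code, most_frequent_code).
    """
    iterations = defaultdict(list)
    for iteration, code in enumerate(return_codes, start=1):
        iterations[code].append(iteration)

    counts = Counter(return_codes)
    # With no runs at all, fall back to 0 (an assumption of this sketch).
    most_frequent = counts.most_common(1)[0][0] if counts else 0
    return dict(iterations), most_frequent


if __name__ == "__main__":
    per_code, exit_code = summarize_return_codes([0, 1, 0, 2, 0])
    for code, runs in sorted(per_code.items()):
        print(f"return code {code}: {len(runs)} time(s), iterations {runs}")
    print(f"most frequent return code: {exit_code}")
```

A script built this way could end with `sys.exit(exit_code)` to return the most frequent code to the caller.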
- A Linux machine, actual or virtual (using kvm, virt-manager or virtualbox).
- Optional: make
- Python 3.8 + pip
- Install service requirements with pip3 install -r requirements.txt.
Run make test.
Here are some of the challenges I've faced while working on this script.
- The script must be run as root, since some of its actions require special access permissions (e.g. tcpdump).
- A deadlock can occur between the script (the parent process) and the command (the child process). The script forks a child process to run the command and opens a pipe (subprocess.PIPE) for each output stream (stdout/stderr). If the command writes more bytes than the pipe's buffer can hold, the child blocks until the parent reads from the pipe and frees some space. Since the parent only picks up the stream outputs once the child is done, using psutil.Popen.communicate(), a deadlock could occur. One suggested solution is to redirect the command's stdout and stderr to files as soon as the child process is created, so that the output is written to disk immediately. However, since the script is meant to create log files only if the command fails, this is a chicken-and-egg problem (unless the files for the stream outputs are always created and then deleted on success, which would be inefficient).
- Something similar to a race condition can arise with call-trace and net-trace. The script forks a child process to run the user's command with psutil.Popen(). It then forks another child process to run 'strace -f -p ' and pipes strace's output to itself. It does the same with 'tcpdump -i any -w '. The drawback of this approach is that by the time strace attaches to the child, some system calls made by the command may already have been missed. A suggested fix, specific to strace, is to run 'strace -f -D -o ' instead: strace then starts the given command itself, and is daemonized so that the command's pid is still retrieved correctly, together with its stdout and stderr outputs.
- The script is currently supported mainly on Linux/Unix systems.
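For the deadlock challenge above, the "create the files, then delete them on success" variant might look like the sketch below. It uses the standard library's `subprocess` rather than `psutil.Popen` to stay self-contained, and the function name `run_keep_logs_on_failure` and the `log_dir` parameter are hypothetical. As background, the Python `subprocess` documentation warns that `Popen.wait()` can deadlock when pipes fill up, and notes that `communicate()` reads the pipes while waiting (buffering all output in memory).

```python
import os
import subprocess
import sys
import tempfile


def run_keep_logs_on_failure(argv, log_dir):
    """Run argv with stdout/stderr sent to files; keep the files only on failure.

    Returns (return_code, stdout_path, stderr_path). Both paths are None when
    the command succeeded and the files were removed.
    """
    out_fd, out_path = tempfile.mkstemp(suffix=".stdout", dir=log_dir)
    err_fd, err_path = tempfile.mkstemp(suffix=".stderr", dir=log_dir)
    try:
        with os.fdopen(out_fd, "wb") as out, os.fdopen(err_fd, "wb") as err:
            # The child writes straight to the files, so it never blocks on a
            # full pipe buffer, however much output it produces.
            return_code = subprocess.call(argv, stdout=out, stderr=err)
    except Exception:
        # The command could not be started: do not leave empty logs behind.
        os.remove(out_path)
        os.remove(err_path)
        raise

    if return_code == 0:
        os.remove(out_path)
        os.remove(err_path)
        return return_code, None, None
    return return_code, out_path, err_path


if __name__ == "__main__":
    # Hypothetical command: write 1 MiB to stdout, then fail with code 3.
    noisy = [sys.executable, "-c",
             "import sys; sys.stdout.write('x' * 2**20); sys.exit(3)"]
    with tempfile.TemporaryDirectory() as log_dir:
        code, out_path, err_path = run_keep_logs_on_failure(noisy, log_dir)
        print(code, os.path.getsize(out_path))
```

The cost is two small file creations per run, which may or may not be acceptable depending on how many iterations are requested.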
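For the call-trace race described above, the suggested fix amounts to letting strace start the command rather than attaching to it afterwards. A minimal sketch of building such a command line is shown below. The strace options `-f`, `-D` and `-o` are real, but the helper name `build_strace_argv`, the example command and the log path are assumptions made for illustration.

```python
import shlex


def build_strace_argv(command, log_path, daemonize=True):
    """Return an argv that starts `command` under strace from its first syscall.

    -f follows child processes, -o writes the trace to `log_path`, and -D runs
    the tracer as a detached grandchild so the command stays the direct child.
    """
    argv = ["strace", "-f"]
    if daemonize:
        argv.append("-D")
    argv += ["-o", log_path]
    # Everything after the strace options is the traced command itself.
    argv += list(command)
    return argv


if __name__ == "__main__":
    # Hypothetical command and log path, for illustration only.
    argv = build_strace_argv(["ls", "-l", "/tmp"], "/tmp/runner-strace.log")
    print(shlex.join(argv))
```

The resulting list can be passed to `psutil.Popen()` or `subprocess.Popen()` in place of the bare command. Because the trace goes to a file via `-o`, the command's own stdout and stderr remain separate from strace's output.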
- Implement a timeout mechanism for the given command, so that commands such as 'ping ' return even without Ctrl+C or 'kill'.
- Implement the fix suggested in challenge #3 above.
- The strace log file is empty in some cases.
- tcpdump does not always create a pcap file.
- Look into using Docker to simplify environment dependencies.
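The timeout item in the list above could be approached along the lines of the sketch below. It is only one possible design, written against the standard library's `subprocess`. The function name `run_with_timeout` and the `timeout_seconds` parameter are hypothetical, and the example uses a sleeping Python process as a stand-in for a command that never returns on its own.

```python
import subprocess
import sys


def run_with_timeout(argv, timeout_seconds):
    """Run argv; kill it if it is still running after timeout_seconds.

    Returns (return_code, timed_out).
    """
    process = subprocess.Popen(argv)
    try:
        return process.wait(timeout=timeout_seconds), False
    except subprocess.TimeoutExpired:
        process.kill()
        # Reap the killed child so it does not linger as a zombie.
        return process.wait(), True


if __name__ == "__main__":
    # Hypothetical long-running command standing in for an endless one.
    sleeper = [sys.executable, "-c", "import time; time.sleep(60)"]
    print(run_with_timeout(sleeper, timeout_seconds=1))
```

A timed-out run would presumably count as a failed execution, so the `timed_out` flag is returned separately from the return code to let the caller decide how to report it.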