[RFC] Log forwarding code in nginx container is brittle and may lead to process stalls #3630
Issues not for bugs, enhancement requests or discussion go stale after 21 days of inactivity. This issue will be automatically closed after 14 days. Mark the issue as fresh by simply adding a comment to the issue.
This issue persists.
Sorry for the radio silence, been busy with other things... #3271 is another example of the same. I agree that we should improve upon the status quo. I am not very familiar with supervisor (I usually use daemontools) but would review a PR switching the various containers and processes to supervisor with a dedicated filter process.
Good to hear @nextgens; no worries, I know how busy a maintainer's life is. I might be able to implement that. Is there anyone we should loop in to make sure everyone on the Mailu project side is on board? Would you mind if I cleaned up the startup script a bit while I am adding supervisord?
You can reach out on Matrix.
To be discussed; I am not opposed to it in principle.
I believe CPython handles the EINTR case pretty well, but I wouldn't trust it to never raise exceptions either; one shouldn't assume that of any stretch of Python code running on CPython.
The former solution would be a small plug for a giant barrel potentially full of similar holes. I do like the second approach though: not only does it manage processes better, it could also enable better logging.
Hi all,
as requested in #3627, I am moving the discussion here.
Below are the key points from the other thread. Since the post copied below was written, @nextgens has kindly pointed out that the stream forwarding functionality is required for LogFilter to work. For this reason, removing the entire stream forwarding functionality is not a suitable fix.
The remaining possibilities are the ones outlined in the copied post below.
Would a PR adding exception handling to the forwarder thread be desirable?
I gave the underlying Popen code a thorough review. What I found is that it is surprisingly hard to use this API correctly.
I do think `forward_text_lines` still lacks error handling:
https://github.com/Mailu/Mailu/blob/aa52bbb731c374a707df41588a108cedaca71fc4/core/base/libs/socrate/socrate/system.py#L163C1-L166C32
Using Popen without `text=True`, as was done in the patch, does help. Popen's `__init__` always uses `io.open(<file>, 'wb', <bufsize>)` and wraps the streams in `io.TextIOWrapper` only when `text_mode=True`. So without the flag, it returns file-descriptor-based streams:
https://github.com/python/cpython/blob/3275cb19530fb5c7115cf8313f1ada9621ed3a92/Lib/subprocess.py#L1018C13-L1033C62
So the functions being used are really:
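Roughly speaking (this is my reading of the CPython source, not a quote from it), without `text=True` the parent-side streams are `io.BufferedReader`/`io.BufferedWriter` objects, so each forwarded line goes through `BufferedReader.readline()` down to `os.read()` on the pipe's file descriptor. A tiny illustration:

```python
import subprocess, sys

# Illustrative only: without text=True, proc.stdout is an io.BufferedReader
# wrapping the pipe's file descriptor.
proc = subprocess.Popen(["echo", "hello"], stdout=subprocess.PIPE)
for line in proc.stdout:            # io.BufferedReader.readline()
    sys.stdout.buffer.write(line)   # raw os.read()/os.write() underneath
proc.wait()
```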
One of the biggest potential sources of an exception would be the EINTR system error, i.e. the error raised when a Unix signal is delivered to a process while a system call is in progress. There is some code in the above functions handling this case, but there is also a lot of other very complicated error handling and a lot of NULL propagation in that code, and I would not trust it to never raise exceptions.
A relatively easy fix would be to add exception handling to `forward_text_lines` and deliberately crash the entire process on an error:
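A minimal sketch of that idea (assuming `forward_text_lines` iterates over one of the child's streams and copies each line to a destination; the real function lives in `socrate/system.py` and may look different):

```python
import os
import sys
import traceback

def forward_text_lines(src, dst):
    """Forward lines from src to dst; take the whole process down on any error."""
    try:
        for line in src:
            dst.write(line)
            dst.flush()
    except Exception:
        # A silently dead forwarder thread leaves writers blocked on a full
        # pipe, so crashing hard and letting the container restart is safer.
        traceback.print_exc(file=sys.stderr)
        os._exit(1)  # unlike sys.exit(), this also works from a non-main thread
```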
There is an even easier fix: let the operating system handle forwarding of stdout and stderr. The `subprocess.Popen` documentation states that with the default setting of `None`, no redirection occurs and the child's file handles are inherited from the parent. We could rewrite `run_process_and_forward_output` with this knowledge to let the operating system handle stream forwarding like this:
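A sketch of such a rewrite (the real function's signature and extra responsibilities may differ; the key point is that `stdout` and `stderr` stay at their default of `None`):

```python
import subprocess

def run_process_and_forward_output(cmd, **kwargs):
    # With stdout/stderr left at None, the child inherits the parent's file
    # descriptors, so the kernel delivers its output directly and no Python
    # forwarding thread sits in the data path.
    process = subprocess.Popen(cmd, **kwargs)
    return process.wait()
```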
This is, in essence, the functionality of `subprocess.run()`, so the code simplifies to:
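Again only a sketch under the same assumptions:

```python
import subprocess

def run_process_and_forward_output(cmd, **kwargs):
    # subprocess.run() starts the child, waits for it, and with the default
    # stdout/stderr of None lets the operating system handle the output.
    return subprocess.run(cmd, **kwargs).returncode
```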
This should solve the issue of exceptions caused by stream forwarding comprehensively.

There are a couple more issues in the `start.sh` code caused by the script running multiple processes:

- `./letsencrypt.py` OR `./certwatcher.py` OR nothing
- `dovecot -c /etc/dovecot/proxy.conf`
- `/usr/sbin/nginx -g 'daemon off;'`
On Linux, processes should be `wait(2)`-ed upon, otherwise zombie processes may be created. I suppose dovecot will take care of this itself since it daemonizes after startup, but the Python scripts won't.

This script also allows for dovecot to crash while the container and nginx stay alive; this could lead to a degraded system.
Personally, I would solve these problems by rewriting `start` to just set `LD_PRELOAD` to load the hardened malloc implementation, provision the configuration, create a supervisord config to start all the subprocesses, and finally `execl` into supervisord.

This would preserve the current functionality and outsource all the complex process management to [supervisord](http://supervisord.org/introduction.html).
https://github.com/Mailu/Mailu/blob/aa52bbb731c374a707df41588a108cedaca71fc4/core/nginx/start.py#L15C1-L18C42
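A rough sketch of the shape this could take (the supervisord config path, the hardened-malloc library path and the program sections are assumptions for illustration; the existing configuration provisioning is elided):

```python
#!/usr/bin/env python3
# Sketch only: illustrative supervisord-based start script for the nginx container.
import os

SUPERVISORD_CONF = "/etc/supervisord.conf"  # hypothetical path

def write_supervisord_config():
    # Hypothetical config: one [program:...] section per long-running process.
    with open(SUPERVISORD_CONF, "w") as f:
        f.write(
            "[supervisord]\n"
            "nodaemon=true\n\n"
            "[program:nginx]\n"
            "command=/usr/sbin/nginx -g 'daemon off;'\n"
            "autorestart=true\n\n"
            "[program:dovecot]\n"
            "command=dovecot -F -c /etc/dovecot/proxy.conf\n"
            "autorestart=true\n"
        )

if __name__ == "__main__":
    # Keep the hardened malloc for supervisord and every child it spawns
    # (the library path here is an assumption, not necessarily what Mailu ships).
    os.environ["LD_PRELOAD"] = "/usr/lib/libhardened_malloc.so"
    # ... provision the nginx/dovecot configuration as the current script does ...
    write_supervisord_config()
    # Replace this process with supervisord so it becomes PID 1 and
    # supervises and reaps all child processes.
    os.execlp("supervisord", "supervisord", "-c", SUPERVISORD_CONF)
```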