Minor fixes to ChildProcess by srett · Pull Request #685 · epoupon/lms · GitHub

Minor fixes to ChildProcess #685


Merged: 1 commit merged into epoupon:develop on Jun 6, 2025

Conversation

@srett (Contributor) commented Jun 2, 2025

Commit from the disk caching draft isolated into single PR.

  • Don't set FD_CLOEXEC, but close manually before fork: Since we don't use pipe2() to set the flag atomically, we're prone to races with concurrent fork+execs either way. As ChildProcess uses a mutex here anyways, as long as there is no other part in lms that would fork+exec, we're not any more or less safe now, but the code is shorter and we cannot fail the fcntl().
  • Don't make the write-end of the pipe non-blocking. Usually programs do not expect this, i.e. they either never check the return code of a write to stdout (potential data loss), or if they do, they just bail out on any error, and don't handle EAGAIN. ffmpeg seemed to handle this fine though (or we were just lucky and always read the data faster than ffmpeg could produce it).
  • Do not close stdin and stderr. Again, ffmpeg seems to handle this, at least for stdin thanks to -nostdin, but in general a process that tries to write to stderr while fd 2 is not open might just bail out. Even worse, the process might have opened some file it wants to work with, and that file got assigned fd 2 (as that was the next free fd); the program would then corrupt that file whenever it tries to write to stderr. Instead, try to open /dev/null for stdin and stderr, and if that fails, keep whatever lms inherited open, which should be safer than relying on the child process to handle this case properly (see the sketch after this list).
  • exec() does not return on success, no need to check return code, any return from it is a failure.
  • Wrong error message in error path when assigning fd to boost stream.
  • F_SETPIPE_SZ requires an int, not a size_t. This probably worked on little endian systems since the value passed was < 2^32, but on big endian systems you'd effectively pass "0" if using a 64-bit type.
  • Address a few clang-tidy complaints (constness)
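
A rough, hypothetical sketch of the child-side setup these points describe (names like childAfterFork, pipeReadFd and pipeWriteFd are made up, and error handling is reduced to _exit()); this is not the actual patch:

#include <cstdlib>
#include <fcntl.h>
#include <unistd.h>

void childAfterFork(int pipeReadFd, int pipeWriteFd, char* const argv[])
{
    close(pipeReadFd); // the child only writes; close the read end manually instead of relying on FD_CLOEXEC

    // Route the child's stdout into the pipe towards lms.
    if (dup2(pipeWriteFd, STDOUT_FILENO) == -1)
        _exit(EXIT_FAILURE);
    close(pipeWriteFd);

    // Point stdin and stderr at /dev/null; if that fails, keep whatever was
    // inherited rather than leaving fds 0/2 closed, where a later open() could reuse them.
    const int devNull{ open("/dev/null", O_RDWR) };
    if (devNull != -1)
    {
        dup2(devNull, STDIN_FILENO);
        dup2(devNull, STDERR_FILENO);
        close(devNull);
    }

    execv(argv[0], argv); // does not return on success
    _exit(EXIT_FAILURE);  // any return from execv is a failure
}

// Parent side, for reference: F_SETPIPE_SZ takes an int, so a size_t value must be narrowed first, e.g.
// fcntl(pipeWriteFd, F_SETPIPE_SZ, static_cast<int>(pipeSize));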

--

I just noticed that in main(), stdin also gets closed. For similar reasons as above, this might be risky: if any part of lms, or any library it depends on, ever tries to read from stdin, it will cause trouble. Checking my local lms instance, we get:

# ll /proc/63052/fd
total 0
lrwx------ 1 lms lms 64 Jun  2 12:56 0 -> 'anon_inode:[eventfd]'

That is, fd 0 (stdin) is now an event fd, used for signalling within the process. If anything reads from this fd, it will snatch away an event notification from something within lms, which could result in very subtle and confusing bugs, like something seemingly deadlocking or hanging for unspecified amounts of time.
Instead, I'd recommend also either opening /dev/null and dup2()ing it onto stdin, or leaving stdin as is; when started via a service manager like systemd, stdin is /dev/null anyways, so everything would be fine either way.
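
A minimal sketch of that first option for main() (the function name is made up): open /dev/null and dup2() it onto stdin instead of closing fd 0, so the eventfd or any other later fd can never end up as stdin:

#include <fcntl.h>
#include <unistd.h>

void redirectStdinToDevNull()
{
    const int devNull{ open("/dev/null", O_RDONLY) };
    if (devNull != -1)
    {
        dup2(devNull, STDIN_FILENO); // fd 0 now just reads EOF
        if (devNull != STDIN_FILENO)
            close(devNull);
    }
    // If open() fails, leaving stdin untouched is still safer than close(0).
}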

@epoupon (Owner) commented Jun 2, 2025

IIRC I closed stderr because ffmpeg got stuck if we didn't pull bytes from its stderr when many errors were being written.

@srett (Contributor, Author) commented Jun 2, 2025

That should be addressed by opening /dev/null as stderr. Currently the output goes god-knows-where. :D

@epoupon (Owner) commented Jun 2, 2025

Don't make the write-end of the pipe non-blocking. Usually programs do not expect this, i.e. they either never check the return code of a write to stdout (potential data loss), or if they do, they just bail out on any error, and don't handle EAGAIN. ffmpeg seemed to handle this fine though (or we were just lucky and always read the data faster than ffmpeg could produce it).

For this, transcoding large files should work fine. FFmpeg just hangs while writing its output as long as the UI does not dequeue the next chunk of data. Have you checked whether it's still working correctly?

Edit: actually there are already some ongoing bugs that broke this (like connections being closed despite keep-alive messages).
Writing everything to disk at once and reading from there afterwards will hopefully solve this.
But this raises some more questions:

  • what about huge files like live sessions (multi-hour gigs)? They can fill up the cache quite easily
  • currently, the player of the UI requests transcoding from an offset to be as efficient as possible (we don't want to transcode the whole file if only a small part is needed: no wait, no waste). This is now also a new parameter for OS
    • should we request transcode only from the requested offset? But the result is likely not reusable
    • should we transcode everything and just use ffmpeg in skip/copy mode?

@@ -63,32 +63,23 @@ namespace lms::core
 {
     // make sure only one thread is executing this part of code
     static std::mutex mutex;
-    std::unique_lock<std::mutex> lock{ mutex };
+    const std::lock_guard<std::mutex> lock{ mutex };
@epoupon (Owner) commented on this change:

(const std::scoped_lock lock{mutex} should be enough)

@srett (Contributor, Author) replied:

Ah, I wasn't sure about that. I researched all the types and their differences, and found that unique_lock generally has more overhead, while lock_guard and scoped_lock are almost identical. The intriguing argument for lock_guard was that it prevents accidentally not locking anything.
This compiles:
std::scoped_lock lock;
while this doesn't: std::lock_guard<std::mutex> lock;, i.e. it avoids accidentally not passing the actual mutex.

But I just checked and saw that the rest of LMS uses scoped_lock, so I'll switch to that for consistency.
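
A tiny standalone illustration of that pitfall (not from the lms code base):

#include <mutex>

std::mutex someMutex;

void example()
{
    // Compiles, but locks nothing: CTAD deduces std::scoped_lock<> with an empty mutex pack.
    std::scoped_lock lockedNothing;

    // Does not compile: std::lock_guard has no default constructor, so forgetting
    // the mutex is caught at compile time.
    // std::lock_guard<std::mutex> oops;

    // Intended usage, consistent with the rest of lms:
    const std::scoped_lock lock{ someMutex };
}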

@srett (Contributor, Author) commented Jun 3, 2025

Don't make the write-end of the pipe non-blocking. Usually programs do not expect this, i.e. they either never check the return code of a write to stdout (potential data loss), or if they do, they just bail out on any error, and don't handle EAGAIN. ffmpeg seemed to handle this fine though (or we were just lucky and always read the data faster than ffmpeg could produce it).

If ffmpeg already stops/slows down when LMS does not fetch the next chunk of data from the pipe fast enough, then this is a very strong indication that ffmpeg sets the fd back to blocking mode on its own to be safe, or gracefully handles EAGAIN - because blocking is usually what you want in a pipe setup: if the consuming process reads more slowly than the producing process writes, you want the producer to slow down. This is exactly what happens when you do something like this on the shell:

some_command | grep foo

Let's assume some_command is really fast and grep really slow. If the pipe between them fills up, you want blocking mode in some_command: the write() call to the pipe/stdout should not return until some data can be put into the pipe. If stdout were in non-blocking mode, the write() call would return immediately with errno set to EAGAIN - but how would that help the process? It's a simple tool and cannot do any meaningful work in the meantime. If it kept producing more output, where should it put it? So you'd call write() again, and again, until it finally succeeds, which just wastes CPU cycles and is effectively the same as using blocking I/O.

And since LMS still sets the read end of the pipe to non-blocking, any read from the pipe when there's no data available will still return immediately with EAGAIN, so it's free to do something else in the meantime. No hangs in LMS. Obviously in C++ you see nothing of this directly, since it's all hidden away under layers of abstraction that give you the async I/O interface, and it "just works". :)
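
A small sketch of that setup (hypothetical helper names, not the actual ChildProcess code): only the read end that lms keeps is made non-blocking, the write end handed to ffmpeg stays blocking, and EAGAIN on the read side simply means "no data yet":

#include <cerrno>
#include <cstddef>
#include <fcntl.h>
#include <unistd.h>

bool createPipeForChild(int fds[2])
{
    if (pipe(fds) == -1)
        return false;

    // Read end (kept by lms): non-blocking, so reads never stall the caller.
    const int flags{ fcntl(fds[0], F_GETFL) };
    if (flags == -1 || fcntl(fds[0], F_SETFL, flags | O_NONBLOCK) == -1)
        return false;

    // Write end (becomes the child's stdout): deliberately left blocking, so a
    // full pipe makes the producer's write() wait instead of failing with EAGAIN.
    return true;
}

ssize_t readSome(int readFd, char* buf, size_t len)
{
    const ssize_t n{ read(readFd, buf, len) };
    if (n == -1 && (errno == EAGAIN || errno == EWOULDBLOCK))
        return 0; // nothing available right now; the async layer can try again later
    return n;
}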

For this, transcoding large files should work fine. FFmpeg just hangs while writing its output as long as the UI does not dequeue the next chunk of data. Have you checked whether it's still working correctly?

I'm using Tempo most of the time, but so far the webUI seemed to behave exactly as it should. I can use it some more at work to make sure. FWIW I've been running the changes for almost a week now, and nothing blew up yet. ;)

Edit: actually there are already some bugs ongoing that broke this (like connection closed despite keep alive messages). Writing into the disk at once and reading afterwards from there will hopefully solve this. But this raises some more questions:

I've run into this in the past with other web apps, where nginx has a default timeout for a connection to a backend that one needs to raise. I think with long transcode sessions you can run into the same problem. But it seems wt already gives you a proper notification about an aborted request, as during testing, I've seen a lot of aborted transcode sessions in the log, where the ffmpeg child gets killed properly.
But yes, when using disk caching, the whole thing is decoupled from the client's request and writes to disk as fast as possible, even when the client goes away.

  • what about huge files like live sessions (multi-hour gigs)? They can fill up the cache quite easily

Indeed - I thought about setting an upper limit on the length of transcoded tracks and not using the cache if it's exceeded, but ironically, the longer the track, the more helpful the cache would be, as the chances increase that the transfer gets interrupted if you have bad wifi/mobile coverage. And without the cache, that would mean streaming the song from the start again if your client does not use the timeOffset request parameter (Tempo doesn't, for example). I just hope disk space is cheap enough that most people will not care about a few GBs of cache: at 128 kbit/s, 3 GB of cache already holds about 53 hours of music. I think most people won't mind as long as it enhances the playback experience.

  • currently, the player of the UI requests transcoding from an offset to be as efficient as possible (we don't want to transcode the whole file if only a small part is needed: no wait, no waste). This is now also a new parameter for OS

  • should we request transcode only from the requested offset? But the result is likely not reusable

I've not addressed this yet, and in the first implementation I just skip the cache if timeOffset is != 0 for that reason. As you said, this would just spam the cache with dozens of chunks from the same song that will likely never be used again, because chances are close to zero that you'll seek to exactly the same point again.

  • should we transcode everything and just use ffmpeg in skip/copy mode?

I think this is the way to go, if you decide to make use of the cache for the WebUI too. Transcode once, then use -c:a copy with -ss. It should be very fast and almost entirely I/O bound.
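
For illustration (file name, offset and muxer are placeholders; the real invocation would depend on how the cached file is stored), the skip/copy mode would look something like:

ffmpeg -ss 90 -i cached-transcode.ogg -c:a copy -f ogg pipe:1

-ss before -i seeks within the already transcoded input, and -c:a copy just remuxes from there without re-encoding.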

@epoupon epoupon added this to the v3.67.0 milestone Jun 6, 2025
@epoupon epoupon added bug and removed enhancement labels Jun 6, 2025
@epoupon epoupon merged commit 54771e6 into epoupon:develop Jun 6, 2025
13 of 19 checks passed
@epoupon (Owner) commented Jun 6, 2025

I think this is the way to go, if you decide to make use of the cache for the WebUI too. Transcode once, then use -c:a copy with -ss. It should be very fast and almost entirely I/O bound.

Yes, I think both the UI and the Subsonic backend should use the same transcoding interface.

Anyway, this PR is merged. Thanks again for your work!
