Minor fixes to ChildProcess by srett · Pull Request #685 · epoupon/lms · GitHub

Minor fixes to ChildProcess #685


Merged: 1 commit merged into epoupon:develop on Jun 6, 2025

Conversation

@srett (Contributor) commented Jun 2, 2025

Commit from the disk caching draft isolated into single PR.

  • Don't set FD_CLOEXEC, but close manually before fork: Since we don't use pipe2() to set the flag atomically, we're prone to races with concurrent fork+execs either way. As ChildProcess uses a mutex here anyways, as long as there is no other part in lms that would fork+exec, we're not any more or less safe now, but the code is shorter and we cannot fail the fcntl().
  • Don't make the write-end of the pipe non-blocking. Usually programs do not expect this, i.e. they either never check the return code of a write to stdout (potential data loss), or if they do, they just bail out on any error, and don't handle EAGAIN. ffmpeg seemed to handle this fine though (or we were just lucky and always read the data faster than ffmpeg could produce it).
  • Do not close stdin and stderr. Again, ffmpeg seems to handle this, at least for stdin thanks to -nostdin, but in general a process that tries to write to stderr while fd 2 is not open might just bail out. Even worse, the process might have opened some file it wants to work with, and that file got assigned fd 2 (as that was the next free fd); the program would then corrupt that file whenever it tries to write to stderr. Instead, try to open /dev/null for stdin and stderr, and if that fails, keep whatever lms inherited open, which should be safer than relying on the child process to handle this case properly (see the sketch after this list).
  • exec() does not return on success, no need to check return code, any return from it is a failure.
  • Wrong error message in error path when assigning fd to boost stream.
  • F_SETPIPE_SZ requires an int, not a size_t. This probably worked on little endian systems since the value passed was < 2^32, but on big endian systems you'd effectively pass "0" if using a 64-bit type.
  • Address a few clang-tidy complaints (constness)
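
A rough, hypothetical sketch of the child-side setup these points describe (names like childAfterFork, pipeReadFd and pipeWriteFd are made up, and error handling is reduced to _exit()); this is not the actual patch:

#include <cstdlib>
#include <fcntl.h>
#include <unistd.h>

void childAfterFork(int pipeReadFd, int pipeWriteFd, char* const argv[])
{
    close(pipeReadFd); // the child only writes; close the read end manually instead of relying on FD_CLOEXEC

    // Route the child's stdout into the pipe towards lms.
    if (dup2(pipeWriteFd, STDOUT_FILENO) == -1)
        _exit(EXIT_FAILURE);
    close(pipeWriteFd);

    // Point stdin and stderr at /dev/null; if that fails, keep whatever was
    // inherited rather than leaving fds 0/2 closed, where a later open() could reuse them.
    const int devNull{ open("/dev/null", O_RDWR) };
    if (devNull != -1)
    {
        dup2(devNull, STDIN_FILENO);
        dup2(devNull, STDERR_FILENO);
        close(devNull);
    }

    execv(argv[0], argv); // does not return on success
    _exit(EXIT_FAILURE);  // any return from execv is a failure
}

// Parent side, for reference: F_SETPIPE_SZ takes an int, so a size_t value must be narrowed first, e.g.
// fcntl(pipeWriteFd, F_SETPIPE_SZ, static_cast<int>(pipeSize));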

--

I just noticed that in main(), stdin also gets closed. For similar reasons as above, this might be risky: if any part of lms, or any library it depends on, ever tries to read from stdin, it will cause trouble. Checking my local lms instance, we get:

# ll /proc/63052/fd
total 0
lrwx------ 1 lms lms 64 Jun  2 12:56 0 -> 'anon_inode:[eventfd]'

That is, fd 0 (stdin) is now an event fd, used for signalling within the process. If anything reads from this fd, it will snatch away an event notification from something within lms, which could result in very subtle and confusing bugs, like something seemingly deadlocking or hanging for unspecified amounts of time.
Instead, I'd recommend also either opening /dev/null and dup2()ing it onto stdin, or leaving stdin as is; when started via a service manager like systemd, stdin is /dev/null anyways, so everything would be fine either way.
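
A minimal sketch of that first option for main() (the function name is made up): open /dev/null and dup2() it onto stdin instead of closing fd 0, so the eventfd or any other later fd can never end up as stdin:

#include <fcntl.h>
#include <unistd.h>

void redirectStdinToDevNull()
{
    const int devNull{ open("/dev/null", O_RDONLY) };
    if (devNull != -1)
    {
        dup2(devNull, STDIN_FILENO); // fd 0 now just reads EOF
        if (devNull != STDIN_FILENO)
            close(devNull);
    }
    // If open() fails, leaving stdin untouched is still safer than close(0).
}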

@epoupon (Owner) commented Jun 2, 2025

IIRC I closed stderr because ffmpeg got stuck if we didn't pull bytes from its stderr when many errors were being written.

@srett (Contributor, Author) commented Jun 2, 2025

That should be addressed by opening /dev/null as stderr. Currently the output goes god-knows-where. :D

@epoupon (Owner) commented Jun 2, 2025

Don't make the write-end of the pipe non-blocking. Usually programs do not expect this, i.e. they either never check the return code of a write to stdout (potential data loss), or if they do, they just bail out on any error, and don't handle EAGAIN. ffmpeg seemed to handle this fine though (or we were just lucky and always read the data faster than ffmpeg could produce it).

For this, transcoding large files should work fine. FFmpeg just hangs while writing its output as long as the UI does not dequeue the next chunk of data. Have you checked whether it's still working correctly?

Edit: actually there are already some ongoing bugs that broke this (like connections being closed despite keep-alive messages).
Writing everything to disk at once and reading from there afterwards will hopefully solve this.
But this raises some more questions:

  • what about huge files like live sessions (multi-hour gigs)? They can fill up the cache quite easily
  • currently, the player of the UI requests transcoding from an offset to be as efficient as possible (we don't want to transcode the whole file if only a small part is needed: no wait, no waste). This is now also a new parameter for OS
    • should we request transcode only from the requested offset? But the result is likely not reusable
    • should we transcode everything and just use ffmpeg in skip/copy mode?

@@ -63,32 +63,23 @@ namespace lms::core
 {
     // make sure only one thread is executing this part of code
     static std::mutex mutex;
-    std::unique_lock<std::mutex> lock{ mutex };
+    const std::lock_guard<std::mutex> lock{ mutex };
@epoupon (Owner) commented on this change:

(const std::scoped_lock lock{mutex} should be enough)

@srett (Contributor, Author) replied:

Ah, I wasn't sure about that. I researched all the types and their differences, and found that unique_lock generally has more overhead, while lock_guard and scoped_lock are almost identical. The intriguing argument for lock_guard was that it prevents accidentally not locking anything.
This compiles:
std::scoped_lock lock;
while this doesn't: std::lock_guard<std::mutex> lock;, i.e. it avoids accidentally not passing the actual mutex.

But I just checked and saw that the rest of LMS uses scoped_lock, so I'll switch to that for consistency.
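
A tiny standalone illustration of that pitfall (not from the lms code base):

#include <mutex>

std::mutex someMutex;

void example()
{
    // Compiles, but locks nothing: CTAD deduces std::scoped_lock<> with an empty mutex pack.
    std::scoped_lock lockedNothing;

    // Does not compile: std::lock_guard has no default constructor, so forgetting
    // the mutex is caught at compile time.
    // std::lock_guard<std::mutex> oops;

    // Intended usage, consistent with the rest of lms:
    const std::scoped_lock lock{ someMutex };
}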

@srett (Contributor, Author) commented Jun 3, 2025

Don't make the write-end of the pipe non-blocking. Usually programs do not expect this, i.e. they either never check the return code of a write to stdout (potential data loss), or if they do, they just bail out on any error, and don't handle EAGAIN. ffmpeg seemed to handle this fine though (or we were just lucky and always read the data faster than ffmpeg could produce it).

If ffmpeg already stops/slows down when LMS does not fetch the next chunk of data from the pipe fast enough, then this is a very strong indication that ffmpeg sets the fd back to blocking mode on its own to be safe, or gracefully handles EAGAIN - because blocking is usually what you want in a pipe setup: if the consuming process reads more slowly than the producing process writes, you want the producer to slow down. This is exactly what happens when you do something like this on the shell:

some_command | grep foo

Let's assume some_command is really fast and grep really slow. If the pipe between them fills up, you want blocking mode in some_command: the write() call to the pipe/stdout should not return until some data can be put into the pipe. If stdout were in non-blocking mode, the write() call would return immediately with errno set to EAGAIN - but how would that help the process? It's a simple tool and cannot do any meaningful work in the meantime. If it kept producing more output, where should it put it? So you'd call write() again, and again, until it finally succeeds, which just wastes CPU cycles and is effectively the same as using blocking I/O.

And since LMS still sets the read end of the pipe to non-blocking, any read from the pipe when there's no data available will still return immediately with EAGAIN, so it's free to do something else in the meantime. No hangs in LMS. Obviously in C++ you see nothing of this directly, since it's all hidden away under layers of abstraction that give you the async I/O interface, and it "just works". :)
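
A small sketch of that setup (hypothetical helper names, not the actual ChildProcess code): only the read end that lms keeps is made non-blocking, the write end handed to ffmpeg stays blocking, and EAGAIN on the read side simply means "no data yet":

#include <cerrno>
#include <cstddef>
#include <fcntl.h>
#include <unistd.h>

bool createPipeForChild(int fds[2])
{
    if (pipe(fds) == -1)
        return false;

    // Read end (kept by lms): non-blocking, so reads never stall the caller.
    const int flags{ fcntl(fds[0], F_GETFL) };
    if (flags == -1 || fcntl(fds[0], F_SETFL, flags | O_NONBLOCK) == -1)
        return false;

    // Write end (becomes the child's stdout): deliberately left blocking, so a
    // full pipe makes the producer's write() wait instead of failing with EAGAIN.
    return true;
}

ssize_t readSome(int readFd, char* buf, size_t len)
{
    const ssize_t n{ read(readFd, buf, len) };
    if (n == -1 && (errno == EAGAIN || errno == EWOULDBLOCK))
        return 0; // nothing available right now; the async layer can try again later
    return n;
}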

For this, transcoding large files should work fine. FFmpeg just hangs while writing its output as long as the UI does not dequeue the next chunk of data. Have you checked whether it's still working correctly?

I'm using Tempo most of the time, but so far the webUI seemed to behave exactly as it should. I can use it some more at work to make sure. FWIW I've been running the changes for almost a week now, and nothing blew up yet. ;)

Edit: actually there are already some bugs ongoing that broke this (like connection closed despite keep alive messages). Writing into the disk at once and reading afterwards from there will hopefully solve this. But this raises some more questions:

I've run into this in the past with other web apps, where nginx has a default timeout for a connection to a backend that one needs to raise. I think with long transcode sessions you can run into the same problem. But it seems wt already gives you a proper notification about an aborted request, as during testing, I've seen a lot of aborted transcode sessions in the log, where the ffmpeg child gets killed properly.
But yes, when using disk caching, the whole thing is decoupled from the client's request and writes to disk as fast as possible, even when the client goes away.

  • what about huge files like live sessions (multi-hour gigs)? They can fill up the cache quite easily

Indeed - I thought about setting an upper limit on the length of transcoded tracks and not using the cache if it's exceeded, but ironically, the longer the track, the more helpful the cache would be, as the chances increase that the transfer gets interrupted if you have bad wifi/mobile coverage. And without the cache, that would mean streaming the song from the start again if your client does not use the timeOffset request parameter (Tempo doesn't, for example). I just hope disk space is cheap enough that most people will not care about a few GBs of cache: at 128 kbit/s, 3 GB of cache already holds about 53 hours of music. I think most people won't mind as long as it enhances the playback experience.

  • currently, the player of the UI requests transcoding from an offset to be as efficient as possible (we don't want to transcode the whole file if only a small part is needed: no wait, no waste). This is now also a new parameter for OS

  • should we request transcode only from the requested offset? But the result is likely not reusable

I've not addressed this yet, and in the first implementation I just skip the cache if timeOffset is != 0 for that reason. As you said, this would just spam the cache with dozens of chunks from the same song that will likely never be used again, because chances are close to zero that you'll seek to exactly the same point again.

  • should we transcode everything and just use ffmpeg in skip/copy mode?

I think this is the way to go, if you decide to make use of the cache for the WebUI too. Transcode once, then use -c:a copy with -ss. It should be very fast and almost entirely I/O bound.
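
For illustration (file name, offset and muxer are placeholders; the real invocation would depend on how the cached file is stored), the skip/copy mode would look something like:

ffmpeg -ss 90 -i cached-transcode.ogg -c:a copy -f ogg pipe:1

-ss before -i seeks within the already transcoded input, and -c:a copy just remuxes from there without re-encoding.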

@epoupon epoupon added this to the v3.67.0 milestone Jun 6, 2025
@epoupon epoupon added bug and removed enhancement labels Jun 6, 2025
@epoupon epoupon merged commit 54771e6 into epoupon:develop Jun 6, 2025
13 of 19 checks passed
@epoupon (Owner) commented Jun 6, 2025

I think this is the way to go, if you decide to make use of the cache for the WebUI too. Transcode once, then use -c:a copy with -ss. It should be very fast and almost entirely I/O bound.

Yes, I think both the UI and the Subsonic backend should use the same transcoding interface.

Anyway, this PR is merged. Thanks again for your work!
