Disk caching by srett · Pull Request #683 · epoupon/lms · GitHub

Disk caching #683


Closed
wants to merge 7 commits into from

Conversation

@srett (Contributor) commented May 29, 2025

My attempt at implementing disk caching.
Introduces a new config variable, transcode-cache-size. Set it to a value other than 0 to enable caching.
It is meant to become the maximum size of the cache later on, but there is no cleanup worker yet. In fact there are still quite a few TODOs in the code, and a bit too much debug spam.
But I thought I'd rather share this sooner than later for feedback, so I tried adding comments in places I thought might benefit from an explanation of the reasoning behind them. The code is probably also quite ugly in places, due to the focus on getting things to work and my lack of knowledge regarding modern C++ and boost. :)
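For reference, enabling the cache in the lms configuration file might look like this. This is a hypothetical lms.conf fragment: the key name comes from this PR, but the units and the default shown here are assumptions, not confirmed by the code.

```
# Maximum size of the transcode disk cache.
# 0 (assumed to be the default) disables caching entirely.
# Units are assumed to be bytes here; check the final implementation.
transcode-cache-size = 1073741824;  # 1 GiB
```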

(Please ignore the first 2 commits entirely, a proper rebase will be done once this is in acceptable shape.)

srett added 7 commits May 23, 2025 16:44
Keeping the file open allows better prefetching/caching on both
the application layer (ifstream) and OS level (vfs layer) as
the access pattern is sequential and predictable.
While common C++ documentation repeatedly states that
seekg/tellg set/get the read position in the stream, and
seekp/tellp do the same for the write position, I had to ask
ChatGPT why on earth the output files kept getting corrupted,
only for it to tell me that a std::fstream is backed by a single
std::basic_filebuf, which maintains one joint file position for
both reading and writing (on POSIX platforms this simply wraps
the one shared offset of the underlying file descriptor).
Good job not mentioning this anywhere prominent, like, idk,
maybe cppreference.com or such. >:(
- Don't set FD_CLOEXEC, but close manually before fork:
  Since we don't use pipe2 to set the flag atomically, we're prone to
  races with parallel fork+execs either way. As ChildProcess uses a
  mutex here anyway, as long as no other part of lms forks+execs,
  we're no more or less safe than before, but the code is shorter
  and the fcntl cannot fail.
- Do not close stdin and stderr. Again, ffmpeg seems to handle this fine,
  but if a process tries to write to stderr and fd 2 is not open, it
  might just bail out. Even worse, the process might have opened
  some file it wants to work with, and that file got assigned fd 2
  because that was the next free fd - the program would then corrupt
  whatever file it opened there whenever it tried to write to stderr.
  Try to open /dev/null for stdin and stderr instead, and if that
  fails, keep whatever lms inherited.
- exec does not return on success, so there is no need to check the
  return code; any return is a failure
- Fix a wrong error message in the error path when assigning the fd
  to the boost stream
@epoupon (Owner) commented May 30, 2025

Hello!
This is not a trivial subject so thanks for working on this :)

You are right, sharing now is better, so that we can discuss general design.
As a general rule, the bigger the PR, the less chance it has of being integrated :(
So first I would say that changes like in the last commit (on the ChildProcessManager) should be set in a separate PR.

Can you also rebase on develop?

@epoupon (Owner) commented May 30, 2025

The modifications on FileResource seem to be unrelated?
Code has been copied from wt's implementation. I think ifs is not stored because a server may serve thousands of concurrent requests in parallel, and you can't have an unlimited number of file descriptors open. If it is no big deal, I would just not change that.

I think we need a transcoding service that would be used from the subsonic code or from the ui code.
If caching is enabled, it would be this service's job to handle it; it should not be in the scope of the "av" library, which should stay as low level as possible. (BTW, this "av" library is expected to be heavily (if not entirely) rewritten, since the ultimate plan is to drop ffmpeg at some point and use its underlying libraries directly.)

So a first step would be to move existing code to a new service, that does not even do caching, and then add the caching feature.
I would shift the resource handlers from av to this new service and just expose a "ITranscoder" interface from av.

@srett (Contributor, author) commented May 30, 2025

You are right, sharing now is better, so that we can discuss general design. As a general rule, the bigger the PR the less chance it has to be integrated :(

Thanks for getting back to me this quickly! Yeah, I could see that coming. Worst case, at least the concept survives. ;)

So first I would say that changes like in the last commit (on the ChildProcessManager) should be set in a separate PR.

Agreed, I'll make several smaller PRs.

Regarding FileResource:
The reasoning behind it was partly the overhead of repeatedly opening and closing the same file over and over, resulting in 3 syscalls (open, lseek, close) per 256 KiB chunk. Also, as stated in the commit message, it might confuse the OS's vfs layer, and if ifstream prefetches data, we'd essentially throw that away every time and re-read it on the next iteration. I guess no lms instance is so big that this would ever really be the bottleneck, but the same probably goes for file descriptor exhaustion. :)
But the particular reason for the change with regard to disk caching was avoiding a race condition once a cleanup worker is implemented. While I agree it is mostly theoretical and unlikely in practice, as things stand a file could be deleted while it is being served, and the transfer would get interrupted on the next call to processRequest(), where we try to open the file again. If you open the file just once, it can safely be deleted while it is still held open by the FileResourceHandler. That's what I added the sourceGood() method for: it is checked after creating the instance, and if it returns false I assume the file vanished in the meantime and just transcode it again.

The changes in ChildProcess in particular were preparations for the TODO I left:
https://github.com/srett/lms/blob/0a4ceb2c21c1fde3d2b271d8beae8e170958d4df/src/libs/av/impl/CachingTranscoderSession.cpp#L41-L53

Which should probably also be another PR after the basic version looks good. :)

Can you also rebase on develop?

Sure, I only realized this after opening the PR.

So a first step would be to move existing code to a new service, that does not even do caching, and then add the caching feature.
I would shift the resource handlers from av to this new service and just expose a "ITranscoder" interface from av.

So I guess this should be the first step now, before doing any more work. Should I just try to figure out what that should look like, or would you prefer to do the refactoring of the existing code yourself, and then I add the rest?

(BTW, this "av" library is expected to be heavily (if not entirely) rewritten since the ultimate plan is to drop ffmpeg at some point and directly use its underlying libraries)

Unrelated, but just my 2ct: I actually prefer the current design. It allows users to quickly swap out the binary for an older/newer version for testing, and avoids dealing with a large API surface and resource handling. Also, if for example a corrupted file triggers a crash bug, only the ffmpeg process will die, not the entire lms instance.

@epoupon (Owner) commented May 30, 2025

Regarding FileResource: The reasoning behind it was partly the overhead of repeatedly opening and closing the same file over and over, resulting in 3 syscalls (open, lseek, close) per 256 KiB chunk. Also, as stated in the commit message, it might confuse the OS's vfs layer, and if ifstream prefetches data, we'd essentially throw that away every time and re-read it on the next iteration. I guess no lms instance is so big that this would ever really be the bottleneck, but the same probably goes for file descriptor exhaustion. :) But the particular reason for the change with regard to disk caching was avoiding a race condition once a cleanup worker is implemented. While I agree it is mostly theoretical and unlikely in practice, as things stand a file could be deleted while it is being served, and the transfer would get interrupted on the next call to processRequest(), where we try to open the file again. If you open the file just once, it can safely be deleted while it is still held open by the FileResourceHandler. That's what I added the sourceGood() method for: it is checked after creating the instance, and if it returns false I assume the file vanished in the meantime and just transcode it again.

Fair enough! Not sure about sourceGood() though; I would have just thrown/caught an exception in case of a file access error.

So this should be the first step now I guess before doing any more work. Should I just try to figure out how that should look like, or would you prefer doing the refactoring of the existing code and then I add the rest?

Please don't waste time on this, I'd prefer you to focus on more useful parts of the code :)
So I will do the first step myself asap.

Unrelated but just my 2ct, I actually prefer the current design. It allows users to quickly swap out the binary for an older/newer version for testing, and avoids dealing with a large API surface, resource handling, and in case there is for example a corrupted file triggering a crash bug, only the ffmpeg process will die, and not the entire lms instance.

Yes, I admit there are indeed some drawbacks. Last time I tried (many years ago), the ffmpeg code path for transcoding had some annoying hacks, but maybe that's better now. That's the advantage of isolating this in a low-level library: it should be a pure implementation detail with no impact on the rest of the code.
That being said, lms's av already doesn't rely on ffprobe to inspect files, so using another ffmpeg instance could introduce some discrepancies. (In order to implement the newer OpenSubsonic transcoding API, av will be used to probe and report codecs etc.)

@epoupon (Owner) commented May 31, 2025

Ok, I put what I had in mind on develop. It is largely untested, but I think it should be good enough to start from.

@srett (Contributor, author) commented Jun 2, 2025

New version soon...

@srett closed this Jun 2, 2025