Crash when profiling application using `folly::coro` with `async-profiler` - possible signal-safety or reentrancy issue? · Issue #2434 · facebook/folly · GitHub
I’m encountering a consistent crash when profiling a native application that uses `folly::coro` in combination with RocksDB JNI. The crash occurs only while profiling with `async-profiler`, even when I try to minimize profiler intrusion by disabling C stack unwinding or using an alternative signal:
```
./bin/asprof --cstack no --signal 38 -d 300 -f out.html <pid>
```
Here is the full backtrace:
```
#0 __pthread_kill_implementation (no_tid=0, signo=6, threadid=<optimized out>) at ./nptl/pthread_kill.c:44
#1 __pthread_kill_internal (signo=6, threadid=<optimized out>) at ./nptl/pthread_kill.c:78
#2 __GI___pthread_kill (threadid=<optimized out>, signo=signo@entry=6) at ./nptl/pthread_kill.c:89
#3 0x00007dc2da04527e in __GI_raise (sig=sig@entry=6) at ../sysdeps/posix/raise.c:26
#4 0x00007dc2da0288ff in __GI_abort () at ./stdlib/abort.c:79
#5 0x00007dc1c118571e in ?? () from /tmp/librocksdbjni12792433653424443920.so
#6 0x00007dc1c17b0dab in rocksdb::AsyncFileReader::Wait() () from /tmp/librocksdbjni12792433653424443920.so
#7 0x00007dc1c13edefd in rocksdb::SingleThreadExecutor::add(folly::Function<void ()>) () from /tmp/librocksdbjni12792433653424443920.so
#8 0x00007dc1c1586769 in void folly::coro::TaskWithExecutor<std::vector<rocksdb::Status, std::allocator<rocksdb::Status> > >::Awaiter::await_suspend<folly::coro::detail::BlockingWaitPromise<std::vector<rocksdb::Status, std::allocator<rocksdb::Status> >&> >(std::__n4861::coroutine_handle<folly::coro::detail::BlockingWaitPromise<std::vector<rocksdb::Status, std::allocator<rocksdb::Status> >&> >) () from /tmp/librocksdbjni12792433653424443920.so
#9 0x00007dc1c1a04e37 in folly::resumeCoroutineWithNewAsyncStackRoot(std::__n4861::coroutine_handle<void>, folly::AsyncStackFrame&) () from /tmp/librocksdbjni12792433653424443920.so
#10 0x00007dc1c158e572 in folly::coro::detail::BlockingWaitTask<std::vector<rocksdb::Status, std::allocator<rocksdb::Status> >&>::get(folly::AsyncStackFrame&) && () from /tmp/librocksdbjni12792433653424443920.so
#11 0x00007dc1c156697c in ?? () from /tmp/librocksdbjni12792433653424443920.so
#12 0x00007dc1c156c631 in rocksdb::Version::MultiGetAsync(rocksdb::ReadOptions const&, rocksdb::MultiGetContext::Range*, std::unordered_map<unsigned long, std::vector<rocksdb::Version::BlobReadContext, std::allocator<rocksdb::Version::BlobReadContext> >, std::hash<unsigned long>, std::equal_to<unsigned long>, std::allocator<std::pair<unsigned long const, std::vector<rocksdb::Version::BlobReadContext, std::allocator<rocksdb::Version::BlobReadContext> > > > >*) ()
from /tmp/librocksdbjni12792433653424443920.so
#13 0x00007dc1c157009d in rocksdb::Version::MultiGet(rocksdb::ReadOptions const&, rocksdb::MultiGetContext::Range*, rocksdb::ReadCallback*) () from /tmp/librocksdbjni12792433653424443920.so
#14 0x00007dc1c13b78cc in rocksdb::DBImpl::MultiGetImpl(rocksdb::ReadOptions const&, unsigned long, unsigned long, rocksdb::autovector<rocksdb::KeyContext*, 32ul>*, rocksdb::SuperVersion*, unsigned long, rocksdb::ReadCallback*) ()
from /tmp/librocksdbjni12792433653424443920.so
#15 0x00007dc1c13bd9d9 in rocksdb::DBImpl::MultiGetWithCallbackImpl(rocksdb::ReadOptions const&, rocksdb::ColumnFamilyHandle*, rocksdb::ReadCallback*, rocksdb::autovector<rocksdb::KeyContext*, 32ul>*) ()
from /tmp/librocksdbjni12792433653424443920.so
#16 0x00007dc1c13bdcc3 in rocksdb::DBImpl::MultiGetWithCallback(rocksdb::ReadOptions const&, rocksdb::ColumnFamilyHandle*, rocksdb::ReadCallback*, rocksdb::autovector<rocksdb::KeyContext*, 32ul>*) ()
from /tmp/librocksdbjni12792433653424443920.so
#17 0x00007dc1c18e36a2 in rocksdb::WriteBatchWithIndex::MultiGetFromBatchAndDB(rocksdb::DB*, rocksdb::ReadOptions const&, rocksdb::ColumnFamilyHandle*, unsigned long, rocksdb::Slice const*, rocksdb::PinnableSlice*, rocksdb::Status*, bool, rocksdb::ReadCallback*) () from /tmp/librocksdbjni12792433653424443920.so
#18 0x00007dc1c18e479a in rocksdb::WriteBatchWithIndex::MultiGetFromBatchAndDB(rocksdb::DB*, rocksdb::ReadOptions const&, rocksdb::ColumnFamilyHandle*, unsigned long, rocksdb::Slice const*, rocksdb::PinnableSlice*, rocksdb::Status*, bool) () from /tmp/librocksdbjni12792433653424443920.so
#19 0x00007dc1c18b4236 in rocksdb::TransactionBaseImpl::MultiGet(rocksdb::ReadOptions const&, rocksdb::ColumnFamilyHandle*, unsigned long, rocksdb::Slice const*, rocksdb::PinnableSlice*, rocksdb::Status*, bool) ()
from /tmp/librocksdbjni12792433653424443920.so
#20 0x00007dc1c12a1153 in Java_org_rocksdb_Transaction_multiGet__JJJ_3_3B () from /tmp/librocksdbjni12792433653424443920.so
```
The crash occurs consistently in code paths that involve coroutines. It never occurs when coroutines are disabled in RocksDB.
Given that this happens even with stack unwinding disabled, and only when profiling is active, my hypothesis is that signal delivery from the profiler is interrupting Folly’s coroutine machinery in a region that is not signal-safe or reentrant, for example during coroutine suspension, resumption, or executor dispatch.
I am not deeply familiar with Folly internals, so this is purely a conjecture. It is possible that this is an issue with RocksDB or async-profiler, but this is my best guess at the moment.
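As a minimal, Folly-independent sketch of how signal delivery alone can change a program's behavior: on Linux, a caught signal can make a blocking system call fail with `EINTR` unless the handler was installed with `SA_RESTART` (and some calls are never restarted even then). Whether this has anything to do with the abort in frame #5 is not established by the backtrace. The function `read_interrupted_by_signal` and its parameters are invented for this illustration, and the code assumes Linux with POSIX threads.

```cpp
#include <atomic>
#include <cerrno>
#include <chrono>
#include <csignal>
#include <thread>

#include <pthread.h>
#include <unistd.h>

// Empty handler: it only makes the signal "caught" instead of fatal.
extern "C" void noop_handler(int) {}

// Hypothetical helper for this sketch. Returns true if a blocking read()
// on an empty pipe failed with EINTR because `signo` was delivered.
bool read_interrupted_by_signal(int signo, bool use_sa_restart) {
  struct sigaction sa {};
  struct sigaction old {};
  sa.sa_handler = noop_handler;
  sigemptyset(&sa.sa_mask);
  sa.sa_flags = use_sa_restart ? SA_RESTART : 0;
  if (sigaction(signo, &sa, &old) != 0) return false;

  int fds[2];
  if (pipe(fds) != 0) return false;

  std::atomic<bool> done{false};
  pthread_t target = pthread_self();

  // Sender thread: signal the reader periodically (a stand-in for a
  // sampling profiler), then write one byte so a restarted read() can finish.
  std::thread sender([&] {
    for (int i = 0; i < 20 && !done.load(); ++i) {
      pthread_kill(target, signo);
      std::this_thread::sleep_for(std::chrono::milliseconds(10));
    }
    ssize_t written = write(fds[1], "x", 1);
    (void)written;
  });

  char byte = 0;
  ssize_t n = read(fds[0], &byte, 1);  // blocks: the pipe is empty
  int saved_errno = errno;
  done.store(true);
  sender.join();

  close(fds[0]);
  close(fds[1]);
  sigaction(signo, &old, nullptr);  // restore the previous disposition
  return n == -1 && saved_errno == EINTR;
}
```

Calling `read_interrupted_by_signal(38, false)` would mirror the `--signal 38` option above, assuming 38 is an available real-time signal on the system. With `use_sa_restart` set to `true`, the same `read()` is restarted transparently and returns the byte written at the end, so the function returns `false`.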
Questions:
- Is `folly::coro` intended to be signal-safe and reentrant in the face of signal delivery?
- Is Folly known to work (or not) with profilers that use signal injection?
- Are there recommended techniques or constraints for profiling applications built on top of Folly coroutines?
- Can stack unwinding be expected to work in profilers given that C++20 coroutines are "stackless" (i.e. modulo executor internals)?
Version of Folly: 8e8186f67de7a23d3a07366946b1617343927d84
Version of async-profiler: 4.0
Build type: debug
Compiler: gcc 13.3.0
Operating system: Ubuntu 24.04.2 LTS
I am happy to provide more detailed information if needed. Thanks in advance for any guidance.
I have opened a corresponding issue on async-profiler.