Crash when profiling application using `folly::coro` with `async-profiler` - possible signal-safety or reentrancy issue? · Issue #2434 · facebook/folly · GitHub

Crash when profiling application using folly::coro with async-profiler - possible signal-safety or reentrancy issue? #2434


Closed
mirgee opened this issue May 8, 2025 · 1 comment

mirgee commented May 8, 2025

I’m encountering a consistent crash when profiling a native application that uses folly::coro in combination with RocksDB JNI. The crash occurs only while profiling with async-profiler, even when I try to minimize the profiler's intrusion by disabling C stack unwinding or using an alternative signal:

./bin/asprof --cstack no --signal 38 -d 300 -f out.html <pid>

Here is the full backtrace:

#0  __pthread_kill_implementation (no_tid=0, signo=6, threadid=<optimized out>) at ./nptl/pthread_kill.c:44
#1  __pthread_kill_internal (signo=6, threadid=<optimized out>) at ./nptl/pthread_kill.c:78
#2  __GI___pthread_kill (threadid=<optimized out>, signo=signo@entry=6) at ./nptl/pthread_kill.c:89
#3  0x00007dc2da04527e in __GI_raise (sig=sig@entry=6) at ../sysdeps/posix/raise.c:26
#4  0x00007dc2da0288ff in __GI_abort () at ./stdlib/abort.c:79
#5  0x00007dc1c118571e in ?? () from /tmp/librocksdbjni12792433653424443920.so
#6  0x00007dc1c17b0dab in rocksdb::AsyncFileReader::Wait() () from /tmp/librocksdbjni12792433653424443920.so
#7  0x00007dc1c13edefd in rocksdb::SingleThreadExecutor::add(folly::Function<void ()>) () from /tmp/librocksdbjni12792433653424443920.so
#8  0x00007dc1c1586769 in void folly::coro::TaskWithExecutor<std::vector<rocksdb::Status, std::allocator<rocksdb::Status> > >::Awaiter::await_suspend<folly::coro::detail::BlockingWaitPromise<std::vector<rocksdb::Status, std::allocator<rocksdb::Status> >&> >(std::__n4861::coroutine_handle<folly::coro::detail::BlockingWaitPromise<std::vector<rocksdb::Status, std::allocator<rocksdb::Status> >&> >) () from /tmp/librocksdbjni12792433653424443920.so
#9  0x00007dc1c1a04e37 in folly::resumeCoroutineWithNewAsyncStackRoot(std::__n4861::coroutine_handle<void>, folly::AsyncStackFrame&) () from /tmp/librocksdbjni12792433653424443920.so
#10 0x00007dc1c158e572 in folly::coro::detail::BlockingWaitTask<std::vector<rocksdb::Status, std::allocator<rocksdb::Status> >&>::get(folly::AsyncStackFrame&) && () from /tmp/librocksdbjni12792433653424443920.so
#11 0x00007dc1c156697c in ?? () from /tmp/librocksdbjni12792433653424443920.so
#12 0x00007dc1c156c631 in rocksdb::Version::MultiGetAsync(rocksdb::ReadOptions const&, rocksdb::MultiGetContext::Range*, std::unordered_map<unsigned long, std::vector<rocksdb::Version::BlobReadContext, std::allocator<rocksdb::Version::BlobReadContext> >, std::hash<unsigned long>, std::equal_to<unsigned long>, std::allocator<std::pair<unsigned long const, std::vector<rocksdb::Version::BlobReadContext, std::allocator<rocksdb::Version::BlobReadContext> > > > >*) ()
   from /tmp/librocksdbjni12792433653424443920.so
#13 0x00007dc1c157009d in rocksdb::Version::MultiGet(rocksdb::ReadOptions const&, rocksdb::MultiGetContext::Range*, rocksdb::ReadCallback*) () from /tmp/librocksdbjni12792433653424443920.so
#14 0x00007dc1c13b78cc in rocksdb::DBImpl::MultiGetImpl(rocksdb::ReadOptions const&, unsigned long, unsigned long, rocksdb::autovector<rocksdb::KeyContext*, 32ul>*, rocksdb::SuperVersion*, unsigned long, rocksdb::ReadCallback*) ()
   from /tmp/librocksdbjni12792433653424443920.so
#15 0x00007dc1c13bd9d9 in rocksdb::DBImpl::MultiGetWithCallbackImpl(rocksdb::ReadOptions const&, rocksdb::ColumnFamilyHandle*, rocksdb::ReadCallback*, rocksdb::autovector<rocksdb::KeyContext*, 32ul>*) ()
   from /tmp/librocksdbjni12792433653424443920.so
#16 0x00007dc1c13bdcc3 in rocksdb::DBImpl::MultiGetWithCallback(rocksdb::ReadOptions const&, rocksdb::ColumnFamilyHandle*, rocksdb::ReadCallback*, rocksdb::autovector<rocksdb::KeyContext*, 32ul>*) ()
   from /tmp/librocksdbjni12792433653424443920.so
#17 0x00007dc1c18e36a2 in rocksdb::WriteBatchWithIndex::MultiGetFromBatchAndDB(rocksdb::DB*, rocksdb::ReadOptions const&, rocksdb::ColumnFamilyHandle*, unsigned long, rocksdb::Slice const*, rocksdb::PinnableSlice*, rocksdb::Status*, bool, rocksdb::ReadCallback*) () from /tmp/librocksdbjni12792433653424443920.so
#18 0x00007dc1c18e479a in rocksdb::WriteBatchWithIndex::MultiGetFromBatchAndDB(rocksdb::DB*, rocksdb::ReadOptions const&, rocksdb::ColumnFamilyHandle*, unsigned long, rocksdb::Slice const*, rocksdb::PinnableSlice*, rocksdb::Status*, bool) () from /tmp/librocksdbjni12792433653424443920.so
#19 0x00007dc1c18b4236 in rocksdb::TransactionBaseImpl::MultiGet(rocksdb::ReadOptions const&, rocksdb::ColumnFamilyHandle*, unsigned long, rocksdb::Slice const*, rocksdb::PinnableSlice*, rocksdb::Status*, bool) ()
   from /tmp/librocksdbjni12792433653424443920.so
#20 0x00007dc1c12a1153 in Java_org_rocksdb_Transaction_multiGet__JJJ_3_3B () from /tmp/librocksdbjni12792433653424443920.so

The crash occurs consistently in code paths that involve coroutines. It never occurs when coroutines are disabled in RocksDB.

Given that this happens even with stack unwinding disabled, and only when profiling is active, my hypothesis is that signal delivery from the profiler is interrupting Folly’s coroutine machinery in a region that is not signal-safe or reentrant, for example during coroutine suspension, resumption, or executor dispatch.
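
To make "signal-safe" concrete, here is a minimal sketch of a handler that restricts itself to async-signal-safe work. It is not taken from Folly, RocksDB, or async-profiler; the names `g_samples`, `on_sample`, `install_sampler`, and `deliver_samples` are hypothetical and exist only for illustration. The handler touches nothing but a lock-free atomic, so it cannot corrupt state in whatever code it interrupts.

```cpp
#include <atomic>
#include <csignal>
#include <signal.h>

// Counter updated from the signal handler. Lock-free atomics are among the
// few things that are safe to modify from inside a handler.
static std::atomic<int> g_samples{0};
static_assert(std::atomic<int>::is_always_lock_free,
              "handler relies on a lock-free atomic");

// Hypothetical stand-in for a profiler's sampling handler. It does no
// allocation, locking, or I/O, so interrupting arbitrary code is harmless.
extern "C" void on_sample(int) {
    g_samples.fetch_add(1, std::memory_order_relaxed);
}

// Installs on_sample for `signo`; returns true on success.
bool install_sampler(int signo) {
    struct sigaction sa{};
    sa.sa_handler = on_sample;
    sigemptyset(&sa.sa_mask);
    // SA_RESTART asks the kernel to restart many blocking system calls that
    // the signal interrupts; without it they can fail with EINTR.
    sa.sa_flags = SA_RESTART;
    return sigaction(signo, &sa, nullptr) == 0;
}

// Sends `signo` to the calling thread `n` times and returns how many
// handler invocations were observed.
int deliver_samples(int signo, int n) {
    const int before = g_samples.load();
    for (int i = 0; i < n; ++i) {
        raise(signo);  // the handler has run by the time raise() returns
    }
    return g_samples.load() - before;
}
```

Calling `install_sampler(SIGUSR1)` and then `deliver_samples(SIGUSR1, 5)` should report five handler runs. A handler that instead allocated memory, took a mutex, or re-entered the interrupted library would be the kind of signal-unsafe behavior the hypothesis above is concerned with. The sketch also shows that a signal can affect the interrupted code itself, since a blocking system call may return early with `EINTR` when `SA_RESTART` does not apply.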

I am not deeply familiar with Folly internals, so this is purely a conjecture. It is possible that this is an issue with RocksDB or async-profiler, but this is my best guess at the moment.

Questions:

  • Is folly::coro intended to be signal-safe and reentrant in the face of signal delivery?
  • Is Folly known to work (or not) with profilers that use signal injection?
  • Are there recommended techniques or constraints for profiling applications built on top of Folly coroutines?
  • Can stack unwinding be expected to work in profilers given that C++20 coroutines are "stackless" (i.e. modulo executor internals)?

Version of Folly: 8e8186f67de7a23d3a07366946b1617343927d84
Version of async-profiler: 4.0
Build type: debug
Compiler: gcc 13.3.0
Operating system: Ubuntu 24.04.2 LTS

I am happy to provide more detailed information if needed. Thanks in advance for any guidance.

I have opened a corresponding issue on async-profiler.

mirgee commented May 12, 2025

The crash in RocksDB turns out to be caused by this issue.

mirgee closed this as completed May 12, 2025