-
Notifications
You must be signed in to change notification settings - Fork 63
Fix deadlock on activity onDestroy #94
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
For me its happening when Android |
Fix a deadlock that occurs when an activity is destroyed without process termination, such as when an activity is destroyed and recreated due to a configuration change. The deadlock occurs because `notify_destroyed` blocks until `destroyed` is set to `true`. This only occurs when `WaitableNativeActivityState` is dropped, but the `WaitableNativeActivityState` instance is the very thing being used to await for destruction, resulting in a deadlock. Instead of waiting for the `WaitableNativeActivityState` to be dropped we now wait until the native `android_main` thread has stopped. So we can tell the difference between the thread not running because it hasn't started or because it has finished (in case `android_main` returns immediately) this replaces the `running` boolean with a tri-state enum. Co-authored-by: Robert Bragg <robert@sixbynine.org>
37625ec
to
a9e91f4
Compare
Ah, right, oops! Your fix generally looked good to me but one notable thing was that it looked like it could theoretically enable a different deadlock if the the spawned thread for I replaced Although it's redundant I've currently kept the It would be great if you're able to double check that this updated PR works for you @sagebind and @ardocrat |
@rib Good catch, thanks. Your patch does look better, and I just tested it and it works just as well in my app. I wasn't totally confident in my implementation since I'm not familiar with the codebase (and new to Android in general), which is why I started with a draft PR. |
Thanks for testing @sagebind It'd be good to make a patch release with this fix, and to do that before landing the GameActivity 2.0.2 update |
Sadly not fixed at my case. At logs I can see deadlock happened at So as temporary dirty fix I am just killing my process after some time to relaunch application later without problems :) |
@ardocrat ah okey, if you're seeing a reference to I guess there must be a similar bug in the |
@ardocrat I can't currently see an equivalent bug in the game-activity backend. In the game-activity backend it blocks in pthread_mutex_lock(&android_app->mutex);
android_app_write_cmd(android_app, APP_CMD_DESTROY);
while (!android_app->destroyed) {
pthread_cond_wait(&android_app->cond, &android_app->mutex);
}
pthread_mutex_unlock(&android_app->mutex); and the thing that sets static void* android_app_entry(void* param) {
// <snip>
_rust_glue_entry(android_app); // (calls android_main)
android_app_destroy(android_app); // <- should set app->destroyed = 1 to unblock android_app_free
return NULL;
} static void android_app_destroy(struct android_app* android_app) {
LOGV("android_app_destroy!");
free_saved_state(android_app);
pthread_mutex_lock(&android_app->mutex);
AConfiguration_delete(android_app->config);
android_app->destroyed = 1;
pthread_cond_broadcast(&android_app->cond);
pthread_mutex_unlock(&android_app->mutex);
// Can't touch android_app object after this.
} Are you maybe able to add a bit of logging to confirm that your applications |
Thank you @rib, I am using
I guess this problem is related to eframe's |
ah, curious. The android lifecycle model (https://developer.android.com/guide/components/activities/activity-lifecycle) will go through onPause and onStop before reaching onDestroy. From your log I see that it looks like The The comment at the end of In the NativeActivity backend we have It looks like the GameActivity glue is missing these kinds of checks to consider that the app may have already destroyed when an Activity native method (like onPause_native) gets called. We can hopefully address this with a check for app->destroyed in |
@rib At this log I tried to manually finish it before calling |
looking at the eframe implementation for close() it seems to just set a flag that results in Winit setting Changing exactly where you My guess at the moment is that calling The async finish message can then trigger because I think actually looking at this a little closer we can't address this directly in I think the right fix here is to probably make It does give me some renewed motivation to think about starting to port GameActivity to Rust though. This same issue almost certainly used to exist in the old NativeActivity backend too when it was written in C. |
@ardocrat I wonder if you could try out this local patch to static void android_app_set_activity_state(struct android_app* android_app,
int8_t cmd) {
pthread_mutex_lock(&android_app->mutex);
- android_app_write_cmd(android_app, cmd);
- while (android_app->activityState != cmd) {
- pthread_cond_wait(&android_app->cond, &android_app->mutex);
+ if (!android_app->destroyed) {
+ android_app_write_cmd(android_app, cmd);
+ while (android_app->activityState != cmd) {
+ pthread_cond_wait(&android_app->cond, &android_app->mutex);
+ }
}
pthread_mutex_unlock(&android_app->mutex);
} that doesn't account for all callbacks but should make You might also need this change for static void android_app_free(struct android_app* android_app) {
pthread_mutex_lock(&android_app->mutex);
+
+ if (!android_app->destroyed) {
+ pthread_mutex_unlock(&android_app->mutex);
+ return;
+ }
+
android_app_write_cmd(android_app, APP_CMD_DESTROY);
while (!android_app->destroyed) {
pthread_cond_wait(&android_app->cond, &android_app->mutex); |
I can also aim to push a branch that you can try out with those fixes and more for the other callbacks, but just in the middle of some other android-activity changes atm and don't want to fully switch context just yet. It's probably also worth noting that there's this PR that's looking to update the GameActivity backend and I'll also look at porting these changes to that branch if they help: #88 |
ah, okey, yeah I though you might get away with testing by just handling android_app_set_activity_state but yeah, android_app_set_window is another api that's called by one of the java callbacks that needs a guard. I made the changes locally but will try and share a branch soon. For @@ -308,6 +308,10 @@ static void android_app_set_window(struct android_app* android_app,
ANativeWindow* window) {
LOGV("android_app_set_window called");
pthread_mutex_lock(&android_app->mutex);
+ if (!android_app->destroyed) {
+ pthread_mutex_unlock(&android_app->mutex);
+ return;
+ }
if (android_app->pendingWindow != NULL) {
android_app_write_cmd(android_app, APP_CMD_TERM_WINDOW);
} |
You should have filed an issue at google. Many of us are using Anyway, I filed an issue. Please upvote 👍 |
…unning We see that some Android callbacks like `onStart()` deadlock, specifically when returning out of the main thread before running any event loop (but likely also whenver terminating the event loop), because they don't check if the thread is still even running and are otherwise guaranteed receive an `activity_state` update or other state change to unblock themselves. This is a followup to [#94] which only concerned itself with a deadlock caused by a destructor not running because that very object was kept alive to poll on the `destroyed` field that destructor was supposed to set, but its new `thread_state` can be reused to disable these condvar waits when the "sending" thread has disappeard. Separately, that PR mentions `Activity` recreates because of configuration changes which isn't supported anyway because `Activity` is still wrongly assumed to be a global singleton. [#94]: https://togithub.com/rust-mobile/android-activity/pull/94
…unning We see that some Android callbacks like `onStart()` deadlock, specifically when returning out of the main thread before running any event loop (but likely also whenver terminating the event loop), because they don't check if the thread is still even running and are otherwise guaranteed receive an `activity_state` update or other state change to unblock themselves. This is a followup to [rust-mobile#94] which only concerned itself with a deadlock caused by a destructor not running because that very object was kept alive to poll on the `destroyed` field that destructor was supposed to set, but its new `thread_state` can be reused to disable these condvar waits when the "sending" thread has disappeard. Separately, that PR mentions `Activity` recreates because of configuration changes which isn't supported anyway because `Activity` is still wrongly assumed to be a global singleton. [rust-mobile#94]: https://togithub.com/rust-mobile/android-activity/pull/94
…unning We see that some Android callbacks like `onStart()` deadlock, specifically when returning out of the main thread before running any event loop (but likely also whenever terminating the event loop), because they don't check if the thread is still even running and are otherwise guaranteed receive an `activity_state` update or other state change to unblock themselves. This is a followup to [rust-mobile#94] which only concerned itself with a deadlock caused by a destructor not running because that very object was kept alive to poll on the `destroyed` field that destructor was supposed to set, but its new `thread_state` can be reused to disable these condvar waits when the "sending" thread has disappeared. Separately, that PR mentions `Activity` recreates because of configuration changes which isn't supported anyway because `Activity` is still wrongly assumed to be a global singleton. [rust-mobile#94]: https://togithub.com/rust-mobile/android-activity/pull/94
…unning We see that some Android callbacks like `onStart()` deadlock, specifically when returning out of the main thread before running any event loop (but likely also whenever terminating the event loop), because they don't check if the thread is still even running and are otherwise guaranteed receive an `activity_state` update or other state change to unblock themselves. This is a followup to [#94] which only concerned itself with a deadlock caused by a destructor not running because that very object was kept alive to poll on the `destroyed` field that destructor was supposed to set, but its new `thread_state` can be reused to disable these condvar waits when the "sending" thread has disappeared. Separately, that PR mentions `Activity` recreates because of configuration changes which isn't supported anyway because `Activity` is still wrongly assumed to be a global singleton. [#94]: https://togithub.com/rust-mobile/android-activity/pull/94
@znakeeye nobody here is obliged to do anything, especially given how cumbersome it is to report upstream issues. We (especially Rib) has spent enough effort as it is to get and keep everything working especially for Rust, a thank-you would have been nicer instead 😉 |
just fwiw, I have filed various upstream issues for I'm not currently sure how it can work to use a I guess it's lucky the application would link with duplicate In general, it's currently only really expected that the |
More specifically; if the prefab integration for
|
Fix a deadlock that occurs when an activity is destroyed without process termination, such as when an activity is destroyed and recreated due to a configuration change.
The deadlock occurs because
notify_destroyed
blocks untildestroyed
is set totrue
. This only occurs whenWaitableNativeActivityState
is dropped, but theWaitableNativeActivityState
instance is the very thing being used to await for destruction, resulting in a deadlock.This doesn't really make much sense, so instead, wait for the Rust-side main thread to terminate to ensure cleanup can proceed.