8000 Fix: Resolve deadlock hang on macOS during Jinx startup by jeff-phil · Pull Request #91 · minad/jinx · GitHub
[go: up one dir, main page]
More Web Proxy on the site http://driver.im/
Skip to content

Fix: Resolve deadlock hang on macOS during Jinx startup #91

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 1 commit into from
Closed

Fix: Resolve deadlock hang on macOS during Jinx startup #91

wants to merge 1 commit into from

Conversation

jeff-phil
Copy link

Changes to resolve a fatal deadlock hang that occurred on macOS during Jinx startup, due to GUI deadlock from AppleSpell dictionary provider.

Changes to resolve a fatal deadlock hang that occurred on macOS during
Jinx startup, due to GUI deadlock from AppleSpell dictionary provider.
* Add thread-safety when loading dictionary providers using a mutex.
* Fixes issues #29 and #48.
@jeff-phil
Copy link
Author

@minad Did you want all the gory details of why this hang happens here in PR, in a new informational issue, etc.

@minad
Copy link
Owner
minad commented Jul 13, 2023

Hi! Thank you for investigating this. Yes, I would be interested in the background information. Please either add it here or in #29 or #48 such that the information is not lost in the future.

Regarding this PR - I cannot merge it in this form, since it intervenes on the wrong level. The fix should happen on a lower level:

  1. In the Emacs module loading infrastructure. Could this already be the case? See Freeze on osx #48, where it was claimed that the issue is fixed in Emacs 30.
  2. In the AppleSpell dictionary provider of libenchant
  3. In libenchant itself

I am not sure how libenchant works with respect to threads but one can reasonably expect that the library is not thread safe. In that case the dictionary broker allocated from thread A and all its dictionary objects must be usable from thread A, but probably not from thread B. Your fix is problematic in that respect since it creates a separate thread A to allocate the dictionary and then uses it from the gui thread B.

As a workaround - I wonder if the bug also happens for other backends, e.g., if you disable AppleSpell and use the Hunspell provider instead. See https://github.com/minad/jinx#enchant-backends-and-personal-dictionaries.

@minad minad closed this Jul 13, 2023
@jeff-phil
Copy link
Author

Thanks for the review and response. I'll reply to your questions, and then come back here with details.

Like most things, it could be addressed lots of places, but probably won't and definitely not with 4 lines of code. [smile]

The thread is actually just a short-lived thread (nanoseconds), and only to initialize the dictionaries. As soon as it exits the lisp jinx--load-dicts-thread function it's gone.

In the same app/process heap, any objects created across threads of the app will have same access as if from a single thread. So thread A to allocate and thread B use isn't a problem. The thread-safety is handled by the mutex, which the only reason for using mutex is because jinx--load-dicts is called from 2 places currently, and just in case first one isn't finished the mutex protects it from potential deadlocks in enchant. Hopefully that makes sense on why the fix isn't problematic.

There is not currently a way to disable AppleSpell provider with enchant.ordering. I looked into that originally, but it does not stop the shared object from loading through enchant_broker_init. I suspect it's due to enchant-lsmod-2 needing to show all providers and languages in order to configure enchant.ordering.

I tried to research other backends, but didn't find anything concrete. E.g. Emacs itself integrates enchant-2, but uses command-line interface, not the API's.

With regards to Emacs 30, I think part of the issue comes from the fact that many macos emacs folks use the Mitsuharu Yamamoto Apple Emacs "port" that more integrates into OSX with Cocoa framework than Carbon and is far ah 8000 ead of Carbon in usability. Since there is no work yet from Yamamoto for Emacs 30, I'm guessing they are using Carbon port which doesn't hang.

More details in next thread....

@minad
Copy link
Owner
minad commented Jul 15, 2023

Thanks. Sure, these threads are short lived. But yet, complex objects created in some thread can still be connected to that thread via thread local memory. This may not be the case here, but this is not the correct level to handle the issue. If the issue is due to the Cocoa port, it indicates that it should be fixed there. I wonder why the deadlock happens in the first place.

@minad
Copy link
Owner
minad commented Jul 15, 2023

There is not currently a way to disable AppleSpell provider with enchant.ordering. I looked into that originally, but it does not stop the shared object from loading through enchant_broker_init. I suspect it's due to enchant-lsmod-2 needing to show all providers and languages in order to configure enchant.ordering.

Deletion of the broken object will prevent loading it. :)

@jeff-phil
Copy link
Author
jeff-phil commented Jul 15, 2023

Reported in reported issues #29 and #48, was issue of jinx-mode hanging Emacs as soon as the minor mode is enabled. The hang is a complete deadlock requiring SIGKILL to force quit Emacs.

The steps that were happening:

  1. Load jinx package: (require 'jinx).
  2. Enable the mode: (jinx-mode).
  3. This loads jinx-mod.dylib using module-load (Note: should use just a basic (require 'jinx-mod) vs. module-load. See: https://www.gnu.org/software/emacs/manual/html_node/elisp/Dynamic-Modules.html#index-module_002dload). It is compiled if not done already.
  4. Next big step: (jinx--mod-dict) is called for each language variant which is a call into jinx-mod and resolves to jinx_dict(...) function.
  5. About the first thing jinx_dict does is call: jinx_broker(void)
  6. Which initializes Enchant providers via call: enchant_broker_init() which loads all found dictionary providers dynamic libraries.
  7. One of the default providers on macos is AppleSpell. It is loaded and initialized and then starts to do several Cocoa activities, such as loading Systems Services for Emacs in the menubar. (Note: this is where the hang happens.)

Among most important activity to troubleshoot the issue was to start or attach Emacs to lldb debugger and recreate. Stepping through the code I was able to get to the part of the jinx-mod.c code that preceded the hang, the jinx_broker(void) function. That function initializes the Enchant-2 library through enchant_broker_init(), which in turn loads all Spelling providers such as Aspell, Hunspell, and even AppleSpell.

After triggering the hang from enchant_broker_init(), interrupting the process in lldb I was to get backtraces for all threads. Looking through the backtraces, I was able to determine this was the deadlock:

image

In the backtrace, I found it was interesting that from AppleSpell checker loading it triggered NSMenu. Then I remembered that macos will load "System Services" menu items based on the app. I was not able to find documentation or details around this, but I guess the AppleSpell decides that because it was called it must need other text services.

Here is the System Services menu for Emacs before loading jinx:
image

And then (sometimes) after loading jinx, more options are added:

image

In the backtrace, I suspect like most modern GUI apps the problem was related to UI thread starvation. In that, AppleSpell trying to load menus and notify system of new menus - and was waiting for it's thread to fire and finish, but the main Emacs thread was waiting on Enchant to complete. Result was a deadlock.

Ultimately, this is a macos Cocoa framework issue, because older Carbon don't have the same level of eventing. The Carbon Emacs macos port won't be affected by this, but IMHO most macos Emacs users use Cocoa port.

The fix in this PR #91 was to ensure that the Cocoa framework had an active thread to finish sending its notifications. This was a basic fix to ensure that (jinx--mod-dict) (step 4) is called from a transient thread that allows it to fire and forget. Since (jinx--mod-dict) is called from 2 places around the same time, to protect on it stepping on itself a mutex around the function calling (jinx--mod-dict) was used which will ensure the first call is completed before second starts.

The patch successfully solves the deadlock. And is likely a good protection even if not on Apple to prevent dynamic libraries being loaded from hanging Emacs. In effect, sandboxing library loads.

Since the PR will not be merged, a fix for people having this would be something such as the following in emacs init script.

Generic:

(require 'jinx)
(jinx--load-module)
(make-thread #'(lambda()(jinx--mod-dict "any-lang") "my/jinx-init"))
;; ... can load (jinx-mode) now

Or Use Package example:

(use-package jinx
  ;; DO NOT DEFER when (eq window-system 'mac)
  ;;:defer t
  :config
  (jinx--load-module)
  (make-thread #'(lambda()(jinx--mod-dict "any-lang") "my/jinx-init"))
  :hook (emacs-startup . global-jinx-mode)
  :bind (("M-$" . jinx-correct)
         ("C-M-$" . jinx-languages)))

@jeff-phil
Copy link
Author

I wonder why the deadlock happens in the first place.

Lots of details above. Enjoy!

If the issue is due to the Cocoa port, it indicates that it should be fixed there.

"Works on my laptop." :)

We're engineers and constantly up against bugs or "working as designed" stuff all the time as we implement components, write code, etc. regardless of origin of problems.

I love all your projects, super talent and tireless commitment! Thus, I am trying to give back even a little bit. And I assumed most others wouldn't be able to solve this themselves. Hopefully this helped!

@minad
Copy link
Owner
minad commented Jul 15, 2023

Yes, your workaround can help others facing this problem.

"Works on my laptop." :)

It is okay if you disagree with my decision. Your workaround is still breaking an abstraction on the high level, which makes it unacceptable in my book. It is an unfortunate interaction of the broker initialization and the event loop. It seems the broker initialization on Cocoa should always happen outside the event loop thread. I am not entirely sure where the fix should go. Maybe in the Emacs port, maybe in Enchant, maybe here.

I understand that you want to help and see this fixed. But as a maintainer I retain the freedom to reject workarounds which I don't want to maintain.

@jeff-phil
Copy link
Author

Sorry, I must not be communicating well! I did not mean any of what I put as disparaging or disagreeing with your decision, but seems maybe came across that way?

Using less words to say the same thing as my last post:

  "I have it working on my laptop, and hopefully the info posted helps others. Thanks for your hard work!"

@minad
Copy link
Owner
minad commented Jul 16, 2023

Thanks, I misunderstood your statement about "works in my laptop".

@minad minad mentioned this pull request Sep 9, 2023
pkryger pushed a commit to pkryger/jinx that referenced this pull request Sep 23, 2023
This applies changes from minad#91
@jdtsmith
Copy link

I sent a note to Mitsuharu pointing here. There is some confusion about the multiple Emacs ports on MacOS. There are only two. This README gives a great overview of the history. I prefer to call them NS (mostly Cocoa, the officially distributed port) and Carbon (also by now mostly Cocoa, but some Carbon remnants, distributed by the project until Emacs >22). I use this terminology since M-x emacs-version mentions either NS or Carbon.

I can confirm this thread lockup still exists in an Emacs 30 master build of Carbon emacs-mac. The user reporting that Emacs 30 fixes it must have switched to an NS build. The simple use-package workaround above unfortunately is sporadic in fixing it, but the following heavy hammer does the trick:

cd /opt/homebrew/lib/enchant-2/
mkdir bad
mv enchant_applespell.so bad/

BTW, a simple approach to getting a traceback is just right-click on a frozen app's Dock icon/Force Quit/Report....:

Traceback
  11  funcall_module + 860 (emacs-module.c:1183,21 in Emacs + 2035288) [0x101198e58] 1-11
  11  jinx_dict + 120 (jinx-mod.dylib + 14904) [0x10c81fa38] 1-11
   11  enchant_broker_init + 308 (libenchant-2.2.dylib + 20012) [0x10c8d4e2c] 1-11
    11  ??? (enchant_applespell.so + 29168) [0x10ecab1f0] 1-11
     11  ??? (enchant_applespell.so + 27448) [0x10ecaab38] 1-11
      11  +[NSSpellChecker sharedSpellChecker] + 140 (AppKit + 2025568) [0x184617860] 1-11
       11  _dispatch_once_callout + 32 (libdispatch.dylib + 20812) [0x1809fa14c] 1-11
        11  _dispatch_client_callout + 20 (libdispatch.dylib + 14608) [0x1809f8910] 1-11
         11  __36+[NSSpellChecker sharedSpellChecker]_block_invoke + 20 (AppKit + 2025592) [0x184617878] 1-11
          11  -[NSSpellChecker init] + 252 (AppKit + 2025860) [0x184617984] 1-11
           11  -[NSSpellChecker _fillSpellCheckerPopupButton:] + 76 (AppKit + 2029636) [0x184618844] 1-11
            11  -[NSApplication(NSServicesMenuPrivate) _fillSpellCheckerPopupButton:] + 1380 (AppKit + 2031860) [0x1846190f4] 1-11
             11  -[NSMenu insertItem:atIndex:] + 520 (AppKit + 194208) [0x1844586a0] 1-
8000
11
              11  -[NSNotificationCenter postNotificationName:object:userInfo:] + 88 (Foundation + 36848) [0x181d3dff0] 1-11
               11  _CFXNotificationPost + 828 (CoreFoundation + 266780) [0x180c4b21c] 1-11
                11  -[NSOperation waitUntilFinished] + 512 (Foundation + 312364) [0x181d8142c] 1-11
                 11  __psynch_cvwait + 8 (libsystem_kernel.dylib + 20588) [0x180b6d06c] 1-11
                  *11  psynch_cvcontinue + 0 (com.apple.kec.pthread + 18204) [0xfffffe000b7bd1bc] 1-11

@jeff-phil
Copy link
Author

The simple use-package workaround above unfortunately is sporadic in fixing it

Seems like the :defer t causes my use-package work-around not to work. Same for you? I'll edit above to note that.

but the following heavy hammer does the trick:

cd /opt/homebrew/lib/enchant-2/
mkdir bad
mv enchant_applespell.so bad/

with the down-side with removing, now can't use Apple Spell - which would include existing "learned" words from outside of emacs, etc.

@jdtsmith
Copy link
jdtsmith commented Jan 16, 2024

now can't use Apple Spell - which would include existing "learned" words from outside of emacs, etc.

AppleSpell doesn't seem possible to use anyway based on enchant.ordering bugs.
For the latter, I just link ~/.config/enchant/en_US.dic -> ~/Library/Spelling/LocalDictionary and this works fine. You can do a quick:

launchctl stop com.apple.applespell
launchctl start com.apple.applespell

to update local words.

@jeff-phil
Copy link
Author

AppleSpell doesn't seem possible to use anyway based on enchant.ordering bugs.

Works just fine for me with: jinx & emacs, but probably not the best place for discussion. :)

If interested, here's a quick gist so we don't fill up minad's inbox: https://gist.github.com/jeff-phil/c95b72a14d3b4310de7c96e05b3cdc48

@minad
Copy link
Owner
minad commented Jan 16, 2024

If interested, here's a quick gist so we don't fill up minad's inbox: https://gist.github.com/jeff-phil/c95b72a14d3b4310de7c96e05b3cdc48

I don't mind. I can click on unsubscribe if I'd like.

I honestly don't know where this should be fixed. Emacs has its single display thread, modules must obviously be loaded there, but this interferes with the Carbon event loop. Fixing this in Emacs doesn't seem correct. Enchant doesn't know anything about thread interactions. All it cares about is that brokers are not thread safe and are not used across threads (which we don't do). Jinx at the top knows about the problem with the interfering threads but adding the fix here breaks the abstraction. The best place might be enchant_applespell.so since the backend also knows that AppleSpell is a little fragile and needs its special initialization such that it doesn't lock up. However it may not be guaranteed that this would fix all Enchant+AppleSpell applications. This would then indicate that Enchant+AppleSpell is just impossible without breaking the abstraction, since AppleSpell has to know about the application event loop but Enchant obviously doesn't expose such things. In that case, moving enchant_applespell.so to bad/ is indeed the right thing to do ;)

@jeff-phil
Copy link
Author

I honestly don't know where this should be fixed

Yes, not completely sure either. I tried a variety of ways back when first looking at this to get it to not hang. Ultimately, Apple putting a param not to include in service menu for spellcheck when Apple Spell library loaded in a Carbon app would be great. Next I think Emacs Carbon would be next since it "should" probably be the one to handle the nuances from Apple, but I didn't have time or know-how to dig in more than I could. I'll try to dig in at some point again and see if can find an upstream from jinx fix.

The best place might be enchant_applespell.so since the backend also knows...

The problem there is enchant is not a gui application, and this only happens in gui. If run in terminal (emacs -nw) then does not hang.

moving enchant_applespell.so to bad/ is indeed the right thing to do

It gives choices for folks! The use-package work-around I put above is not pretty, but works for me. I'd prefer to keep enchant_applespell.so, since enchant is used by more than just jinx but other common unix packages, and other dev packages like for pyenchant, etc. Others may just want to remove, but will have to remember to do it each time brew updates enchant-2 package.

I don't mind.

Thanks again and as always for putting up with this silliness and still finding time to crank out phenomenal packages!

@jeff-phil
Copy link
Author

Well, I dug in and came up with a proper (non-hacky) patch for "mac" Carbon build of emacs. I'll get submitted. Thanks!

@pkryger
Copy link
pkryger commented Jan 17, 2024

Well, I dug in and came up with a proper (non-hacky) patch for "mac" Carbon build of emacs. I'll get submitted. Thanks!

@jeff-phil please share details of the patch here - I'm interested as well.

@jeff-phil
Copy link
Author
jeff-phil commented Jan 17, 2024

I've been way overthinking this previously, as I now take a fresh look.

It may take some time to get included upstream, so this new workaround which initializes the NSSpellChecker prior to jinx by simply running an AppleScript to do the same thing for users having issue. And if/when the bottom patch makes it in, this won't cause any issues to run again.

(use-package jinx
  :init
  (when (eq window-system 'mac)
    (mac-do-applescript
     "use framework \"AppKit\"
      set spellChecker to current application's NSSpellChecker's sharedSpellChecker()"))
  :hook (emacs-startup . global-jinx-mode)
  :bind (("M-$" . jinx-correct)
         ("C-M-$" . jinx-languages)))




Then separately, the upstream that does the same inside the obj-c code during startup of emacs-mac:

diff --git a/src/macappkit.m b/src/macappkit.m
index 28c6b38f00..e8720fa0df 100644
--- a/src/macappkit.m
+++ b/src/macappkit.m
@@ -1214,6 +1214,9 @@ - (void)applicationDidFinishLaunching:(NSNotification *)notification
   [NSApp registerUserInterfaceItemSearchHandler:self];
   Vmac_help_topics = Qnil;

+  // Initialize spell checker for ispell and jinx enchant-2 using AppleSpell
+  (void)[NSSpellChecker sharedSpellChecker];
+
 #if MAC_OS_X_VERSION_MAX_ALLOWED >= 101500
   /* Work around animation effect glitches for executables linked on
      macOS 10.15.  */

Let me know if any issues here.

@minad
Copy link
Owner
minad commented Jan 17, 2024

If running these two lines of script is all we have to do we can as well include this in Jinx directly.

6D40

@jeff-phil
Copy link
Author

If running these two lines of script is all we have to do we can as well include this in Jinx directly.

Want me to submit updated PR here, or open new? :)

I did send the emacs-mac patch off to Dr. Yamamoto Mitsuharu, as well. FYI, I found ispell when using enchant-2 also hangs due to the same reason. So ultimately will upstream will help there, as well.

@jdtsmith
Copy link

I have restored my bad/enchant_applespell.so and your new fix works so far. Thanks for digging!

@minad
Copy link
Owner
minad commented Jan 17, 2024

I added the fix in 6f9ca86. Can you please test this on both the Carbon and the NS Mac ports?

@jeff-phil
Copy link
Author

Confirmed for Carbon+AppKit (Mitsuharu Yamamoto's Mac port):


image




And confirmed for NS (http://emacsformacosx.com/):


image

Thanks!

@jdtsmith
Copy link

FYI the fix will be applied to the next version of emacs-mac.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
3C71
Development

Successfully merging this pull request may close these issues.

4 participants
0