Threading support · Issue #237 · pyodide/pyodide · GitHub

Threading support #237

rth opened this issue Oct 30, 2018 · 37 comments
Labels
enhancement New feature or request performance

Comments

@rth
Member
rth commented Oct 30, 2018

Just an issue to track the status on the threading support.

Emscripten has Pthreads support via the SharedArrayBuffer capability in browsers. The latter was disabled due to Spectre and related vulnerabilities, however,

There has also been some activity in emscripten on this subject lately.

Because of,

If code is compiled with -s USE_PTHREADS=1 and the current browser does not support multithreading, then an exception will be thrown at page load time. It is not possible to build one binary that would be able to leverage multithreading when available and fall back to single threaded when not. For such backwards compatibility, two separate builds must be done, one with -s USE_PTHREADS=1 and the other with -s USE_PTHREADS=0.
(taken from emscripten docs)

I imagine we won't be able to do much until both Firefox and Chrome have it enabled by default in a stable release. Still, it might be worth trying to make it work in a WIP PR to see what difficulties arise. Opening this issue for future reference.

This would also very partially address #144

@pbazant
pbazant commented Oct 18, 2019

Dear pyodide devs, I'd like to start experimenting with the threading support -- our target browser is Chrome anyway (we need web-bluetooth support). Before I delve into the pyodide codebase: How hard should I expect enabling the threading to be?

@rth
Member Author
rth commented Oct 27, 2019

@pbazant I don't have a good understanding of how much work it would be, but it would certainly be worth trying to build with threading support following the emscripten docs and reporting here if you run into any issues :)

Though we likely won't be able to merge it in pyodide until support has been enabled in firefox by default, and it doesn't look like it happened yet.

@gatesn
gatesn commented Jan 21, 2020

Looks like this fails in that Emscripten won't allow us to use dynamic loading and PTHREADS in the same build: emscripten-core/emscripten#3494

@pbazant
pbazant commented Jan 21, 2020

I actually managed to build Pyodide with threads semi-working. I had to change several compilation options. This caused emscripten to remove many symbols, so I had to force it to keep them by listing them all. Maybe there is a cleaner way.
As you say, this is at the cost of not being able to use dynamic loading.
The problem with the threads was that the non-main threads seem to run only when the main thread is running. Data sharing between the threads seemed to work OK, however! It looked like there was some strange business with the non-main thread being blocked on some mutex. Possibly the GIL behaves in a weird way? I will be investigating this in the future. If I manage to make it fully working, I'll share my results. Thank you all for your answers.

@gatesn
gatesn commented Jan 22, 2020

That’s neat - would you be able to push up a branch for me to poke at?

@gcatto
gcatto commented Apr 17, 2020

@pbazant I see you mentioned web-bluetooth above. I have a use case where I am just starting to consider that in combination with Pyodide. Have you gotten anywhere?

@pbazant
pbazant commented Apr 18, 2020

@pbazant I see you mentioned web-bluetooth above. I have a use case where I am just starting to consider that in combination with Pyodide. Have you gotten anywhere?

It's a company project. I think web-bluetooth is usable only from the main thread, so if you run Pyodide in a worker thread (probably a good idea), you must use some communication mechanism between the threads.
Option 1: use postMessage in both directions (which is tricky because postMessage is tied to the JS event loop and a single runPython invocation doesn't yield to the JS loop).
Option 2: use postMessage only to send messages from Pyodide to the main thread, but use the Atomics API + SharedArrayBuffer to send data from the main thread to the worker thread (where Pyodide runs).

@pbazant
pbazant commented Apr 18, 2020

We postponed making the threads work and use async/await + asyncio in Pyodide as a temporary solution to concurrency. To use asyncio, we inherited from asyncio.BaseEventLoop and did some tweaking (one has to provide a custom selector implementation).
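
A minimal sketch of the asyncio approach described above, using cooperative coroutines on a single thread instead of OS threads. It assumes an event loop is already available; in stock CPython `asyncio.run` provides one, while the custom `BaseEventLoop` subclass mentioned in the comment is not shown here. The function names are illustrative only.

```python
import asyncio


async def print_numbers(log):
    # Each await hands control back to the event loop, so the two
    # coroutines interleave without needing OS threads.
    for i in range(1, 4):
        await asyncio.sleep(0.01)
        log.append(f"Number: {i}")


async def print_letters(log):
    for letter in "abc":
        await asyncio.sleep(0.01)
        log.append(f"Letter: {letter}")


async def main():
    log = []
    # gather() runs both coroutines concurrently on the same thread.
    await asyncio.gather(print_numbers(log), print_letters(log))
    return log
```

Calling `asyncio.run(main())` in a regular interpreter returns the collected lines. This only gives concurrency for code that awaits; a blocking call inside either coroutine still stalls everything.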

@stefnotch
Contributor

Good news is that Firefox 79 will apparently support WebAssembly threads.

@stefnotch
Contributor

Quick update: Firefox 79 has been released and it supports WebAssembly threads and SharedArrayBuffer.


There are some minor restrictions regarding SharedArrayBuffer outside of WebAssembly, which are:

  • SharedArrayBuffer objects are in principle always available, but unfortunately the constructor on the global object is hidden, unless the two headers mentioned above are set, for compatibility with web content. There is hope that this restriction can be removed in the future. WebAssembly.Memory can still be used to get an instance.
  • Unless the two headers mentioned above are set, the various postMessage() APIs will throw for SharedArrayBuffer objects. If they are set, postMessage() on Window objects and dedicated workers will function and allow for memory sharing.

The headers in question are the newly introduced

  • Cross-Origin-Opener-Policy with same-origin as value (protects your origin from attackers)
  • Cross-Origin-Embedder-Policy with require-corp as value (protects victims from your origin)
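
For local experiments, the two headers listed above can be added by a small development server. The following is a sketch only, built on the standard library's `http.server`; the class and function names are made up for illustration, and it is not meant for production hosting.

```python
from http.server import SimpleHTTPRequestHandler, ThreadingHTTPServer


class IsolatedHandler(SimpleHTTPRequestHandler):
    """Static file handler that adds the two cross-origin isolation headers."""

    def end_headers(self):
        # Header names and values are the ones quoted in the comment above.
        self.send_header("Cross-Origin-Opener-Policy", "same-origin")
        self.send_header("Cross-Origin-Embedder-Policy", "require-corp")
        super().end_headers()


def make_server(port=8000):
    # Bind to localhost only; passing port=0 lets the OS pick a free port.
    return ThreadingHTTPServer(("127.0.0.1", port), IsolatedHandler)
```

Running `make_server().serve_forever()` from the directory that holds the page serves it with both headers set, so `SharedArrayBuffer` and `postMessage()` sharing should behave as described in the two bullet points.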

@rth
Member Author
rth commented Dec 31, 2020

By @oeway in #958 (comment)

Maybe it's already the time to investigate? It's going to be very useful, imagine with pthread enabled, we can potentially have a web loop closer to native asyncio, allowing more powerful data analysis with multi-threading, the subsequent packages won't need to patch the threading related parts etc.

Yes, we can certainly start investigating. Maybe under a feature flag for a start.

Besides that, the wasm threads feature is already supported in Firefox 79 (released 5 months ago) and Chrome 70 (2+ years ago); given the current status of pyodide, most users are early adopters who can afford to upgrade to the latest browser version.

There are certainly early adopters, but there are actually more users coming from teaching-related platforms (e.g. https://notebook.basthon.fr or https://buildingai.elementsofai.com/), and for them 5 months of Firefox support is very short. Also, Safari/WebKit-based browsers currently have some issues (#721), but there is hope of fixing them, and as far as I can tell threads won't work there yet (Fyrd/caniuse#5200, https://bugs.webkit.org/show_bug.cgi?id=218944).

Ideally we could have deployed it as a separate path e.g. <version>/full (default without threads), <version>/with_threads (with threads) on the CDN but that would require more CI work (unless we do it semi-manually).

@dalcde
Contributor
dalcde commented Dec 31, 2020 via email

@rth
Member Author
rth commented Nov 6, 2021

It looks like all major browsers support threading by now. However this would mean dropping support for Safari <14.1, so maybe we could start using it in a long lived feature branch #1932

@joemarshall
Contributor

I had a thought on this - when threading does work in pyodide, it would be useful to have an init time flag to disable or enable threads. That way the default build could have threading built in, but in non thread environments it could fall back to a stub implementation of pthreads

@westurner
westurner commented Jul 9, 2022 via email

@josephrocca
Contributor
josephrocca commented Jan 9, 2023

If documentation stats are of any indication, we have around 16% of Safari users who use Safari <15.1 which won't have Atomics. Also, I'm not sure how good is mobile support.

Looks like browser support is a lot better now (as expected, ~9 months later). Overall support (including mobile devices) for Atomics is 90% which is quite good (consider that await is at ~93%) according to Can I Use.

I'm guessing the main limiting factor now is just the willingness/time of a contributor to implement this? Or are there still potentially some technical roadblocks? It would be great if someone could make a comment with a few dot points indicating technical blockers, if there are any.

Threading support seems like it would be a really big deal!

@westurner
westurner commented Jan 10, 2023

From https://web.dev/webassembly-threads/#c :

In C, particularly on Unix-like systems, the common way to use threads is via POSIX Threads provided by the pthread library. Emscripten provides an API-compatible implementation of the pthread library built atop Web Workers, shared memory and atomics, so that the same code can work on the web without changes.

  • From https://emscripten.org/docs/porting/pthreads.html :

    The bottom line is that on the Web it is bad for the main browser thread to wait on anything else. Therefore by default Emscripten warns if pthread_join and pthread_cond_wait happen on the main browser thread, and will throw an error if ALLOW_BLOCKING_ON_MAIN_THREAD is zero (whose message will point to here).

    To avoid these problems, you can use PROXY_TO_PTHREAD, which as mentioned earlier moves your main() function to a pthread, which leaves the main browser thread to focus only on receiving proxied events. This is recommended in general, but may take some porting work, if the application assumed main() was on the main browser thread.

    Another option is to replace blocking calls with nonblocking ones. For example you can replace pthread_join with pthread_tryjoin_np. This may require your application to be refactored to use asynchronous events, perhaps through emscripten_set_main_loop() or Asyncify.
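
As a Python-level analogy to replacing a blocking `pthread_join` with a nonblocking check, the sketch below polls a thread from an asyncio coroutine instead of calling `join()` directly. It assumes an interpreter where threads can be started at all, and it illustrates the idea only; it is not how Emscripten implements `pthread_tryjoin_np`.

```python
import asyncio
import threading
import time


async def join_without_blocking(thread, poll_interval=0.01):
    """Wait for a thread to finish while letting the event loop keep running."""
    while thread.is_alive():
        # Yield to the event loop instead of blocking inside thread.join().
        await asyncio.sleep(poll_interval)
    # The thread has already finished, so this join returns immediately.
    thread.join()


async def main():
    results = []

    def work():
        time.sleep(0.05)
        results.append("done")

    worker = threading.Thread(target=work)
    worker.start()
    await join_without_blocking(worker)
    return results
```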

WebAssembly threads & atomics spec: https://github.com/WebAssembly/threads/blob/master/proposals/threads/Overview.md

"Wasm threads are now available in all browsers" (2021)
https://www.reddit.com/r/WebAssembly/comments/rk83mr/wasm_threads_are_now_available_in_all_browsers/ :

@westurner
westurner commented Jan 10, 2023

From @josephrocca:

I'm guessing the main limiting factor now is just the willingness/time of a contributor to implement this?
Or are there still potentially some technical roadblocks?

@josephrocca
Contributor

As I see it, the big technical roadblock is that threads don't work in site environments where cross-origin isolation isn't enabled.

This seems more of an annoyance than a technical roadblock though, right? But yeah, I agree this is far less than ideal. That said, I think the Chrome team is working on the spec to make the COEP/COOP/etc. situation easier for developers (e.g. this is not directly relevant to our problem here, but is part of their project to simplify things), so hopefully this is less of a problem over time.

Which annoyingly includes GitHub pages.

BTW, there's a cheeky workaround using Service Workers to add the COEP/COOP headers which makes Github Pages cross-origin isolated. Just include that script in your <head>. Not ideal at all, but better than nothing until Github allows setting those headers.

The pain with threads is that it's a build time choice in emscripten to enable threading or not, and I'm not sure what happens if you build with threading support and then run without isolation?

It seems like your suspicions are correct:

It is not possible to build one binary that would be able to leverage multithreading when available and fall back to single threaded when not. The best you can do is two separate builds, one with and one without threads, and pick between them at runtime.

But this doesn't seem like a big deal - the pyodide.asm.js script will just choose the runtime to load based on whether window.crossOriginIsolated is true/false?

There's a risk that it would mean all pyodide packages would need two builds, threads or no threads.

I don't think it'd need to be all packages though, right? You'd just have threaded builds for some packages where it makes sense, and then load them if the runtime supports it. If the user tries to load a package that only has a threaded build on a window.crossOriginIsolated=false page, then we can give a fairly straight-forward error message explaining that package X requires that you serve the page with COOP/COEP headers.

Unless you're saying that all the non-threaded packages that we currently have won't work properly in a threaded runtime? That would be surprising and very annoying if true.
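
The selection logic discussed above can be sketched as follows. Everything here is hypothetical: `PackageBuilds` and `select_build` are illustrative names, not Pyodide APIs, and the sketch assumes that a single-threaded package build can be loaded into a threaded runtime, which is exactly the open question raised in this comment.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PackageBuilds:
    """Hypothetical record of which build flavours exist for a package."""
    name: str
    has_single_threaded: bool
    has_threaded: bool


def select_build(pkg: PackageBuilds, cross_origin_isolated: bool) -> str:
    """Return "threaded" or "single" for the given page, or raise an error."""
    if cross_origin_isolated and pkg.has_threaded:
        return "threaded"
    if pkg.has_single_threaded:
        # Assumption: single-threaded builds also load in a threaded runtime.
        return "single"
    if pkg.has_threaded:
        raise RuntimeError(
            f"Package {pkg.name!r} requires threads; serve the page with "
            "COOP/COEP headers so that crossOriginIsolated is true."
        )
    raise RuntimeError(f"Package {pkg.name!r} has no builds available.")
```

With this shape, only packages that benefit from threads would need a second build, and a threads-only package on a non-isolated page fails with an explicit message rather than an obscure load error.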

@joemarshall
Contributor

As I see it, the big technical roadblock is that threads don't work in site environments where cross-origin isolation isn't enabled. Which annoyingly includes GitHub pages.

The pain with threads is that it's a build time choice in emscripten to enable threading or not, and I'm not sure what happens if you build with threading support and then run without isolation?

There's a risk that it would mean all pyodide packages would need two builds, threads or no threads.

Everything else is just a matter of time to hack things in.

Having said that, I just took a look at emscripten, and pthreads appears to mostly be implemented in JavaScript, which I think might just mean that it is safe to compile packages in pthreads mode even if you have the main pyodide module in two flavours (threads and no threads)?

@josephrocca
Contributor

(lol in case anyone is confused, my earlier comment was in response to a now-deleted comment from joe that was mostly the same as the one above)

I think might just mean that it is safe to compile packages in pthreads mode even if you have the main pyodide module in two flavours (threads and no threads)?

Nice if true!

@LijieZhang1998

Hi, I'm porting the pymongo package to Pyodide. I got an error when I executed my sample code in Node.js. How's the status of threading support in pyodide now? Is there any solution that I can use to work around this error? Thank you.

error PythonError: Traceback (most recent call last):
  File "/lib/python311.zip/_pyodide/_base.py", line 558, in eval_code_async
    await CodeRunner(
  File "/lib/python311.zip/_pyodide/_base.py", line 381, in run_async
    coroutine = eval(self.code, globals, locals)
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "", line 10, in
  File "/lib/python3.11/site-packages/pymongo/mongo_client.py", line 837, in __init__
    self._get_topology()
  File "/lib/python3.11/site-packages/pymongo/mongo_client.py", line 1214, in _get_topology
    self._topology.open()
  File "/lib/python3.11/site-packages/pymongo/topology.py", line 192, in open
    self._ensure_opened()
  File "/lib/python3.11/site-packages/pymongo/topology.py", line 596, in _ensure_opened
    self._update_servers()
  File "/lib/python3.11/site-packages/pymongo/topology.py", line 747, in _update_servers
    server.open()
  File "/lib/python3.11/site-packages/pymongo/server.py", line 49, in open
    self._monitor.open()
  File "/lib/python3.11/site-packages/pymongo/monitor.py", line 79, in open
    self._executor.open()
  File "/lib/python3.11/site-packages/pymongo/periodic_executor.py", line 87, in open
    thread.start()
  File "/lib/python311.zip/threading.py", line 957, in start
    _start_new_thread(self._bootstrap, ())
RuntimeError: can't start new thread
at new_error (/Users/lijie.zhang/workspace/pyodide-repo/dist/pyodide.asm.js:9:12562)
at wasm://wasm/0224d1de:wasm-function[295]:0x158a38
at wasm://wasm/0224d1de:wasm-function[451]:0x15fe7d
at _PyCFunctionWithKeywords_TrampolineCall (/Users/lijie.zhang/workspace/pyodide-repo/dist/pyodide.asm.js:9:119469)
at wasm://wasm/0224d1de:wasm-function[1056]:0x1a321d
at wasm://wasm/0224d1de:wasm-function[3383]:0x28a37c
at wasm://wasm/0224d1de:wasm-function[2036]:0x1e4108
at wasm://wasm/0224d1de:wasm-function[1063]:0x1a3705
at wasm://wasm/0224d1de:wasm-function[1066]:0x1a39c6
at wasm://wasm/0224d1de:wasm-function[1067]:0x1a3a68
at wasm://wasm/0224d1de:wasm-function[3196]:0x268abf
at wasm://wasm/0224d1de:wasm-function[3197]:0x26e8d0
at wasm://wasm/0224d1de:wasm-function[1069]:0x1a3b90
at wasm://wasm/0224d1de:wasm-function[1064]:0x1a3820
at wasm://wasm/0224d1de:wasm-function[439]:0x15f603
at Module.callPyObjectKwargs (/Users/lijie.zhang/workspace/pyodide-repo/dist/pyodide.asm.js:9:79884)
at Module.callPyObject (/Users/lijie.zhang/workspace/pyodide-repo/dist/pyodide.asm.js:9:80266)
at Timeout.wrapper [as _onTimeout] (/Users/lijie.zhang/workspace/pyodide-repo/dist/pyodide.asm.js:9:56949)
at listOnTimeout (node:internal/timers:564:17)
at process.processTimers (node:internal/timers:507:7) {

type: 'RuntimeError',
__error_address: 25425896
}

@ryanking13
Member
ryanking13 commented Aug 21, 2023

How's the status of threading support in pyodide now?

It is not supported in Pyodide yet, and AFAIK there's not much progress. I think threading support would require a lot of work both on CPython side and Emscripten side. So, unfortunately, it's unlikely to be supported quickly.

Is there any solution that I can use to work around this error?

You'll need to find a way to run the package with threading disabled.
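
For code you control, one way to tolerate a runtime without threads is to catch the `RuntimeError` seen in the traceback above and run the work inline. This is a sketch only: `run_maybe_threaded` is a made-up helper, the `thread_factory` parameter exists purely so the fallback path can be exercised, and it does not help with libraries such as pymongo that start threads internally.

```python
import threading


def run_maybe_threaded(func, *args, thread_factory=threading.Thread):
    """Run func in a background thread if possible, otherwise run it inline.

    Returns the started Thread, or None when the call ran synchronously.
    """
    thread = thread_factory(target=func, args=args)
    try:
        thread.start()
    except RuntimeError:
        # Raised as "can't start new thread" when threads are unavailable.
        func(*args)
        return None
    return thread
```

Callers should `join()` the returned thread when it is not `None`. The inline fallback changes timing, since the work then blocks the caller until it completes.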

@allsey87
allsey87 commented Aug 21, 2023

I think Emscripten's dynamic linking will need to become a lot more stable before Pyodide could add support for threading. Dynamic linking does work, however, it gets very flaky if you start using multiple shared modules and mixing in threading/asyncify.

I think threading support would require a lot of work both on CPython side and Emscripten side.

@ryanking13 how much is there to do on the CPython side exactly? The only challenge I can see is packaging the compiled modules for both multi-threaded and single-threaded builds... unless perhaps the threading module uses some low-level pthread features that Emscripten does not support?

@ryanking13
Member

@ryanking13 how much is there to do on the CPython side exactly? The only challenge I can see is packaging the compiled modules for both multi-threaded and single-threaded builds... unless perhaps the threading module uses some low-level pthread features that Emscripten does not support?

It's actually hard to tell until someone tries it for themselves. I am just saying based on my experience. If you're lucky, CPython's threading implementation will be so compatible with Emscripten that it will work without you doing anything, but based on my experience, this is usually not the case.

@allsey87
allsey87 commented Aug 23, 2023

Perhaps I can give it a go. I already have a multi-threaded build of Pyodide but I haven't tried to enable the threading module.

@allsey87
allsey87 commented Aug 23, 2023

For me, a very simple test with Python threads is working, however, I am using a custom runtime so it would be good if someone could test this with the default JS runtime that Emscripten generates. These are roughly the steps that need to be followed:

  1. Add -pthread to CFLAGS_BASE, CXXFLAGS_BASE and LDFLAGS_BASE in Makefile.env.
  2. Pass --enable-wasm-pthreads to the configure script invocation in cpython/Makefile. I am not actually sure if this does anything other than check that we are building for wasm32-unknown-emscripten. Internally, CPython seems to turn on threading when it detects the flags __EMSCRIPTEN_PTHREADS__ and __EMSCRIPTEN_SHARED_MEMORY__.
  3. Update the sqlite3 package to build and link the multi-threaded version of sqlite3 as follows:
diff --git a/packages/sqlite3/meta.yaml b/packages/sqlite3/meta.yaml
index 5976fbc..d5d26fc 100644
--- a/packages/sqlite3/meta.yaml
+++ b/packages/sqlite3/meta.yaml
@@ -24,15 +24,15 @@ build:
       "Modules/_sqlite/util.c"
     )
 
-    embuilder build sqlite3 --pic
+    embuilder build sqlite3-mt --pic
 
     for file in "${FILES[@]}"; do
       emcc $STDLIB_MODULE_CFLAGS -c "${file}" -o "${file/.c/.o}"  \
         -sUSE_SQLITE3 -DMODULENAME=sqlite
     done
 
     OBJECT_FILES=$(find Modules/_sqlite/ -name "*.o")
     emcc $OBJECT_FILES -o $DISTDIR/_sqlite3.so $SIDE_MODULE_LDFLAGS \
-       -sUSE_SQLITE3 -lsqlite3
+       -sUSE_SQLITE3 -lsqlite3-mt
 
     cd Lib && tar --exclude=test -cf - sqlite3 | tar -C $DISTDIR -xf -

With this, I was able to run the following Python code and saw interleaved execution of the two functions:

import threading
import sys
import time


def print_numbers():
    # Print 1..5, sleeping between prints so the other thread can run.
    for i in range(1, 6):
        time.sleep(0.1)
        print('Number:', i)
        sys.stdout.flush()


def print_letters():
    # Print a..e on the same schedule as print_numbers.
    for letter in 'abcde':
        time.sleep(0.1)
        print('Letter:', letter)
        sys.stdout.flush()


# Start both threads, then wait for each of them to finish.
thread1 = threading.Thread(target=print_numbers)
thread2 = threading.Thread(target=print_letters)
thread1.start()
thread2.start()
thread1.join()
thread2.join()
print('Threads are done!')
sys.stdout.flush()
time.sleep(1)

Note that I have quite a different configuration at this point, with Pyodide statically linked to several libraries and using Asyncify. I have no idea whether the changes I listed above will work with the latest Pyodide. My configuration is roughly based on the following:

Emscripten version: 3.1.39
Python version: 3.11.3
Pyodide version: f996914 + python/cpython#103322
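
To check from Python whether a given build was compiled with pthread support, one option is to inspect `sys._emscripten_info`, which CPython 3.11+ exposes on the Emscripten platform. The sketch below assumes that attribute and its `pthreads` and `shared_memory` fields are present in the build being tested; the helper name is illustrative, and on other platforms it simply reports that it is not running under Emscripten.

```python
import sys


def threading_build_info():
    """Summarise what the running interpreter reports about thread support."""
    info = getattr(sys, "_emscripten_info", None)
    if info is None:
        # Not an Emscripten build (for example, a native CPython).
        return {"emscripten": False, "pthreads": None, "shared_memory": None}
    return {
        "emscripten": True,
        # getattr guards against builds where a field is missing.
        "pthreads": getattr(info, "pthreads", None),
        "shared_memory": getattr(info, "shared_memory", None),
    }


if __name__ == "__main__":
    print(threading_build_info())
```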

@lewymati
lewymati commented Nov 3, 2023

@allsey87 I couldn't reproduce this. I followed your instructions, but got a "Didn't expect to load any more file_packager files!" error in the browser when the new thread starts
(caused by this line: https://github.com/pyodide/pyodide/blob/main/src/js/pyodide.ts#L418).
I tried removing the Module.locateFile = ... line, but then the browser freezes when running the code.

Any chance that there is some missing step?

@G3zz
G3zz commented Jun 5, 2024

On the 0.26.0 branch, I get the error below when following @allsey87's instructions above:

error: undefined symbol: _emscripten_run_callback_on_thread (referenced by $JSEvents, referenced by root reference (e.g. compiled C/C++ code))
warning: To disable errors for undefined symbols use `-sERROR_ON_UNDEFINED_SYMBOLS=0`
warning: __emscripten_run_callback_on_thread may need to be added to EXPORTED_FUNCTIONS if it arrives from a system library
Error: Aborting compilation due to previous errors

@jlucaso1
jlucaso1 commented Sep 28, 2024

This could be a good reference. Porffor compiles JS to Wasm and has Promise support with limited async integration:
https://github.com/CanadaHonk/porffor/blob/main/compiler/builtins/promise.ts
