Goroutines cause deadlocks after fork() when run in shared library #15538

Closed
cavaliercoder opened this issue May 4, 2016 · 9 comments

Comments

@cavaliercoder
cavaliercoder commented May 4, 2016

If a shared library written in Go is loaded with dlopen and the process then forks, Go behavior becomes unstable, often resulting in a deadlock or a bad file descriptor error.

The bad file descriptor error is reproduced in this gist: https://gist.github.com/cavaliercoder/688a3cd7dac20c8edb0c0f6f2851b54d

Comment out the parent process's call to (*f)() on cbin.c:18 to reproduce the deadlock issue.

This issue has been reproduced with Go 1.5 and 1.6 on CentOS 7, Ubuntu 14 and OS X, on x86_64 only.

Stack trace from a hung process:

goroutine 0 [idle]:
runtime.futex(0xc820022110, 0x0, 0x0, 0x0, 0x7f0c00000000, 0x7f0cad577879, 0x0, 0x0, 0x7f0cad577ae8, 0xc820022110, ...)
        /usr/local/go/src/runtime/sys_linux_amd64.s:288 +0x21
runtime.futexsleep(0xc820022110, 0xc800000000, 0xffffffffffffffff)
        /usr/local/go/src/runtime/os1_linux.go:39 +0x53
runtime.notesleep(0xc820022110)
        /usr/local/go/src/runtime/lock_futex.go:142 +0xa8
runtime.stoplockedm()
        /usr/local/go/src/runtime/proc1.go:1268 +0xb2
runtime.schedule()
        /usr/local/go/src/runtime/proc1.go:1590 +0x72
runtime.park_m(0xc820000600)
        /usr/local/go/src/runtime/proc1.go:1698 +0x191
runtime.mcall(0x7f0cad5c722a)
        /usr/local/go/src/runtime/asm_amd64.s:204 +0x53

[pid 12184] munmap(0x7f92d61c5000, 4096) = 0
[pid 12184] semop(655360, {{0, 1, SEM_UNDO}}, 1) = 0
[pid 12184] socket(PF_INET, SOCK_STREAM|SOCK_CLOEXEC|SOCK_NONBLOCK, IPPROTO_IP) = 7
[pid 12184] setsockopt(7, SOL_SOCKET, SO_BROADCAST, [1], 4) = 0
[pid 12184] connect(7, {sa_family=AF_INET, sin_port=htons(10050), sin_addr=inet_addr("0.0.0.0")}, 16) = -1 EINPROGRESS (Operation now in progress)
[pid 12184] epoll_create1(EPOLL_CLOEXEC) = 8
[pid 12184] epoll_ctl(8, EPOLL_CTL_ADD, 7, {EPOLLIN|EPOLLOUT|EPOLLRDHUP|EPOLLET, {u32=3491763008, u64=140268533715776}}) = 0
[pid 12184] futex(0x7f92d1064390, FUTEX_WAKE, 1) = 0
[pid 12184] futex(0xc82001e110, FUTEX_WAIT, 0, NULL

@minux
Member
minux commented May 4, 2016 via email

@cavaliercoder
Author

Could you recommend a better practice I could suggest to the maintainers of the parent project which is loading my Go lib? Is it sane to dlopen all plugins after each fork in a producer/consumer model?

@ianlancetaylor
Contributor

You're right: this is a problem. A multi-threaded program cannot safely call fork; this is a general rule, not specific to Go. A single-threaded program that dlopens a Go shared library becomes a multi-threaded program. At that point, it cannot call fork.

I don't think there is anything we can do about this. I'm going to close this as unfortunate.
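
The general rule can be illustrated without Go at all. The following is a minimal sketch, not code from this thread: the names `demo_lock`, `holder` and `child_sees_lock_held` are made up for illustration. A helper thread takes a mutex, the process forks, and the child (which contains only the thread that called fork) finds the mutex still locked with no thread left to release it. A runtime that forks while one of its own threads holds an internal lock ends up in the same position.

```c
#include <errno.h>
#include <pthread.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical names for illustration; not thread-safe or reentrant. */
static pthread_mutex_t demo_lock = PTHREAD_MUTEX_INITIALIZER;
static int ready_pipe[2];   /* holder -> caller: "I hold the lock" */
static int release_pipe[2]; /* caller -> holder: "you may unlock"  */

/* Helper thread: takes the lock and keeps it until told to let go. */
static void *holder(void *arg) {
    char c = 'x';
    (void)arg;
    pthread_mutex_lock(&demo_lock);
    if (write(ready_pipe[1], &c, 1) == 1) {
        /* Block here, still holding the lock, until the caller writes. */
        if (read(release_pipe[0], &c, 1) < 0) { /* nothing to recover */ }
    }
    pthread_mutex_unlock(&demo_lock);
    return NULL;
}

/* Returns 1 if the forked child saw the lock as held, 0 if the child
 * could take it, -1 on setup errors. */
int child_sees_lock_held(void) {
    pthread_t tid;
    char c = 'x';
    int status;
    pid_t pid;

    if (pipe(ready_pipe) != 0 || pipe(release_pipe) != 0) return -1;
    if (pthread_create(&tid, NULL, holder, NULL) != 0) return -1;
    if (read(ready_pipe[0], &c, 1) != 1) return -1; /* lock is now held */

    pid = fork();
    if (pid == 0) {
        /* Child: the holder thread does not exist here, but the copied
         * mutex is still locked. trylock is used so the demo cannot hang;
         * a plain pthread_mutex_lock would block forever. Strictly, POSIX
         * only promises async-signal-safe calls in this child, so even
         * this probe relies on common libc behavior. */
        _exit(pthread_mutex_trylock(&demo_lock) == EBUSY ? 1 : 0);
    }

    /* Parent: let the helper thread finish, then collect the child. */
    if (write(release_pipe[1], &c, 1) != 1) return -1;
    pthread_join(tid, NULL);
    close(ready_pipe[0]);
    close(ready_pipe[1]);
    close(release_pipe[0]);
    close(release_pipe[1]);
    if (pid < 0) return -1;
    if (waitpid(pid, &status, 0) != pid || !WIFEXITED(status)) return -1;
    return WEXITSTATUS(status);
}
```

On a typical Linux or OS X system, calling `child_sees_lock_held()` should return 1. Build with `-pthread` where the toolchain requires it.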

@cavaliercoder
Author

I tested my hypothesis that forking a bunch of processes and then calling dlopen from each would work. In my limited understanding, I expected each PID to have a discrete Go runtime instantiated when they loaded a Go lib. It didn't work... file descriptor errors, semaphore errors, all sorts.

I can confirm that each process had a different PID with a common parent. Each call to dlopen returned a unique handle value and each Go lib independently called init() and reported the expected PID.

Is this a red herring, or is there something I can do to successfully run multiple isolated Go runtimes from a shared parent PID?

@ianlancetaylor
Contributor

I would expect it to work to call dlopen after calling fork. If you have a case where calling dlopen and then not calling fork works, but calling fork and then calling dlopen does not work, please open a new issue with a small reproduction. Thanks.
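
The fork-then-dlopen order described above could look roughly like the sketch below. It is an assumption-laden illustration, not the code from the gist: the function name `run_plugin_in_worker`, the library path and the exported symbol name are all hypothetical, and the sketch assumes the parent is still single-threaded when it forks. Note that the reporter saw failures with this ordering in their own setup (tracked separately in #15556), so treat it as the shape of the approach rather than a guaranteed fix.

```c
#include <dlfcn.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Forks one worker, loads the plugin inside the worker, and calls an
 * exported no-argument function there. Returns the worker's exit code:
 *   0 = function called, 1 = dlopen failed, 2 = symbol not found,
 *  -1 = fork or wait failed.
 * All names and paths passed in are the caller's; nothing here is
 * specific to a real library. */
int run_plugin_in_worker(const char *lib_path, const char *symbol) {
    int status;
    pid_t pid = fork(); /* fork first, while no library threads exist */

    if (pid < 0) return -1;
    if (pid == 0) {
        void (*entry)(void);
        /* Any runtime the library brings along starts here, in the child,
         * so its threads are never subject to a later fork. */
        void *handle = dlopen(lib_path, RTLD_NOW);
        if (handle == NULL) _exit(1);
        /* POSIX-recommended way to store dlsym's result in a function pointer. */
        *(void **)(&entry) = dlsym(handle, symbol);
        if (entry == NULL) _exit(2);
        entry();
        _exit(0);
    }

    if (waitpid(pid, &status, 0) != pid || !WIFEXITED(status)) return -1;
    return WEXITSTATUS(status);
}
```

A host might call `run_plugin_in_worker("./libgoplugin.so", "Hello")`, where both arguments are placeholders for a library built with `-buildmode=c-shared` and one of its exported functions. Older glibc versions need `-ldl` at link time.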

@ianlancetaylor
Contributor

Note that if your original program includes Go code apart from the shared library you are using with dlopen, then fork will not work and the kinds of errors you mention can occur. Unfortunately you can never use fork with any program that includes Go code, as Go code is always multi-threaded.

@cavaliercoder
Author

Thanks for confirming. I've raised #15556 to address the issue when fork is called before dlopen.

@cavaliercoder
Author

Is there any possible way to re-initialize the Go runtime after a fork? Assuming that _init() bootstraps the runtime and starts the required threads at dlopen(), could this call be made idempotent and called again after a fork?

@ianlancetaylor
Contributor

In general, we ask that people not ask questions on closed bugs. If you want to discuss the issues, please use a mailing list or forum; see https://golang.org/wiki/Questions . Thanks.

The problem with fork in a multi-threaded program is that you have no idea what the other threads are doing. They could be manipulating internal data structures when the fork happens, in which case there is no clean way to recover. The closest you could come would be to reinitialize everything from scratch, but that would introduce a complex rarely exercised code path that would certainly be a source of bugs.
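
One general POSIX technique that gets a from-scratch state without a special re-initialization path is to start a new process image instead of continuing in a forked copy. This is not something proposed in the thread; the sketch below only shows the shape of it, using `posix_spawnp`, and the function name `spawn_and_wait` is made up for illustration. The new process loads its libraries and starts its threads fresh, so nothing is inherited mid-operation.

```c
#include <spawn.h>
#include <sys/types.h>
#include <sys/wait.h>

extern char **environ;

/* Starts argv[0] (looked up on PATH) as a brand-new process and waits
 * for it. Returns the program's exit code, or -1 if it could not be
 * started or did not exit normally. */
int spawn_and_wait(char *const argv[]) {
    pid_t pid;
    int status;

    /* No file actions or spawn attributes: inherit descriptors as-is. */
    if (posix_spawnp(&pid, argv[0], NULL, NULL, argv, environ) != 0) return -1;
    if (waitpid(pid, &status, 0) != pid || !WIFEXITED(status)) return -1;
    return WEXITSTATUS(status);
}
```

For example, a host could pass an argument vector such as `{"./worker", NULL}`, where `./worker` is a hypothetical helper program that loads the Go library itself.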

golang locked and limited conversation to collaborators May 10, 2017