using BigTableClient when multiprocessing causes segfaults · Issue #23 · Unoperate/pytorch-cbt · GitHub
using BigTableClient when multiprocessing causes segfaults #23

Open
kboroszko opened this issue Sep 23, 2021 · 1 comment

Comments

@kboroszko
Collaborator
kboroszko commented Sep 23, 2021

Here I'm referring to the google-cloud-cpp Bigtable API.
Because the DataClient holds a reference to [some queue object] within a Table, each Table must be destroyed either before the fork, or before the client is destroyed after the fork. Otherwise, when the destructor is called, the client still holds a reference to [some queue object] and tries to dereference it, causing a segfault.

Because we have no control over when a user might fork the process, we would have to build very complicated logic to detect forks and handle the objects we create. To avoid all that ordeal, we opted for a different approach: creating the connection every time we need to interact with BigTable and closing it afterwards. This is slightly inefficient, but simple, reliable and robust.
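For illustration, here is a minimal sketch of this connect-per-call pattern against the google-cloud-cpp Bigtable API. It is not the plugin's actual code: the helper name, column family, and error handling are assumptions; the point is only that every Bigtable object is scoped to the call, so nothing can outlive a later fork.

```cpp
#include "google/cloud/bigtable/table.h"

#include <stdexcept>
#include <string>

namespace cbt = ::google::cloud::bigtable;

// Hypothetical helper: the DataClient and Table live only for the duration
// of the call, so no Bigtable object can outlive a fork() that happens later.
void WriteOneRow(std::string const& project_id, std::string const& instance_id,
                 std::string const& table_id, std::string const& row_key,
                 std::string const& value) {
  cbt::Table table(
      cbt::CreateDefaultDataClient(project_id, instance_id,
                                   cbt::ClientOptions()),
      table_id);
  google::cloud::Status status = table.Apply(
      cbt::SingleRowMutation(row_key, cbt::SetCell("cf1", "value", value)));
  if (!status.ok()) throw std::runtime_error(status.message());
}  // `table` and its DataClient are destroyed here, closing the connection.
```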

@kboroszko
Collaborator Author

Another implication of this is the shape of our API, namely the write_tensor method. It would perhaps be more elegant to let users write only one row to Bigtable at a time, with the default row_key being a random string. That would make it more consistent with the rest of the API and emphasize that this is just a simple utility function which should not be used for uploading large quantities of data. However, we can't do that, because we create a connection each time write_tensor is executed, and it would be an extremely bad idea to create a connection for each row.
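To make the cost argument concrete, here is a hedged sketch (the function name, row-key scheme, and column family are assumptions, not the plugin's implementation) of how one connection per write_tensor call can be amortized over many rows with a single BulkApply, which is exactly what a one-row-per-call API would give up:

```cpp
#include "google/cloud/bigtable/table.h"

#include <cstddef>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

namespace cbt = ::google::cloud::bigtable;

// Hypothetical helper mirroring the write_tensor trade-off: one connection
// is opened per call and reused for every row via a single BulkApply,
// instead of opening a connection per row.
void WriteManyRows(std::string const& project_id,
                   std::string const& instance_id,
                   std::string const& table_id,
                   std::vector<std::string> const& serialized_rows) {
  cbt::Table table(
      cbt::CreateDefaultDataClient(project_id, instance_id,
                                   cbt::ClientOptions()),
      table_id);
  cbt::BulkMutation bulk;
  for (std::size_t i = 0; i != serialized_rows.size(); ++i) {
    // Assumed row-key scheme, for illustration only.
    bulk.emplace_back(cbt::SingleRowMutation(
        "row-" + std::to_string(i),
        cbt::SetCell("cf1", "value", serialized_rows[i])));
  }
  std::vector<cbt::FailedMutation> failed = table.BulkApply(std::move(bulk));
  if (!failed.empty()) throw std::runtime_error("some mutations failed");
}  // The connection is closed here, before control returns to the caller.
```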
