How can I use mmap in Python so that multiple processes share the memory of a single dict? #1
Implementing this isn't hard; it's roughly what this answer describes: http://stackoverflow.com/questions/524342/how-to-store-a-hash-table-in-a-file: "This is a bit similar to constructing an on-disk DAWG, which I did a while back. What made that so very sweet was that it could be loaded directly with mmap instead of reading the file. If the hash-space is manageable, say 2^16 or 2^24 entries, then I think I would do something like this: ... This should allow you to mmap and use the table directly, without modification (scary fast if in the OS cache!), but you have to work with indices instead of pointers. It's pretty spooky to have megabytes available in syscall-round-trip time, and still have it take up less than that in physical memory, because of paging."
However, the accepted answer to that question uses Boost serialization, which rules out sharing memory via mmap, so the right approach is still the one quoted above.
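To make the quoted idea concrete, here is a minimal Python sketch of that layout: a flat file of fixed-size slots probed by hash index, built once and then mmap-ed read-only. The slot format, the file name `table.bin`, and the linear-probing scheme are illustrative assumptions, not the format jieba or cppjieba actually uses.

```python
import mmap
import struct
import zlib

# Illustrative layout: 2**16 fixed-size slots, each holding a 32-byte
# NUL-padded UTF-8 key plus a 4-byte unsigned int value. An all-NUL key
# marks an empty slot.
NUM_SLOTS = 2 ** 16
KEY_SIZE = 32
SLOT = struct.Struct("=%dsI" % KEY_SIZE)      # 36 bytes per slot
PATH = "table.bin"                            # hypothetical file name


def slot_index(raw_key):
    # zlib.crc32 is deterministic across processes, unlike Python's str hash.
    return zlib.crc32(raw_key) % NUM_SLOTS


def build(entries, path=PATH):
    """Write a linear-probing hash table as one flat, mmap-able file."""
    buf = bytearray(NUM_SLOTS * SLOT.size)
    for key, value in entries.items():
        raw = key.encode("utf-8")[:KEY_SIZE]
        i = slot_index(raw)
        while True:                           # linear probing on collision
            stored, _ = SLOT.unpack_from(buf, i * SLOT.size)
            if stored.rstrip(b"\x00") in (b"", raw):
                SLOT.pack_into(buf, i * SLOT.size, raw, value)
                break
            i = (i + 1) % NUM_SLOTS
    with open(path, "wb") as f:
        f.write(buf)


def lookup(mm, key):
    """Probe the mmap-ed table directly; indices take the place of pointers."""
    raw = key.encode("utf-8")[:KEY_SIZE]
    i = slot_index(raw)
    while True:
        stored, value = SLOT.unpack_from(mm, i * SLOT.size)
        stored = stored.rstrip(b"\x00")
        if stored == raw:
            return value
        if stored == b"":                     # empty slot: key not present
            return None
        i = (i + 1) % NUM_SLOTS


if __name__ == "__main__":
    build({u"北京": 100, u"天安门": 42})
    with open(PATH, "rb") as f:
        mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
    print(lookup(mm, u"北京"))                # -> 100
```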
What about a shared third-party store, memcached for example? There are other third-party stores that can stay resident in memory long-term as well.
Whatever you fetch from memcache or Redis still has to be deserialized from JSON into a dict, and that step is very slow, especially when the dictionary is large. I really want a way to share memory between multiple independent processes.
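As a rough illustration of that deserialization cost (the dictionary here is synthetic and timings will vary by machine):

```python
import json
import time

# Stand-in for a large word-frequency dictionary stored as a single JSON
# value in memcache or Redis.
big_dict = {str(i): i for i in range(10 ** 6)}
blob = json.dumps(big_dict)

start = time.time()
json.loads(blob)    # every worker pays this on each fetch
print("json.loads of %.1f MB took %.2f s" % (len(blob) / 1e6, time.time() - start))
```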
Every process takes several seconds to load the dictionary, and every process needs a large amount of memory to hold it. Letting multiple processes share one large read-only dictionary through mmap could be a good solution, as sketched below.
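A minimal sketch of that sharing pattern, assuming a prebuilt read-only dictionary file (here the hypothetical `table.bin` from the earlier sketch): each worker maps the same file read-only, so the kernel page cache backs all the mappings with one set of physical pages and no per-process load step is needed.

```python
import mmap
import os
from multiprocessing import Process

# Hypothetical path to a prebuilt read-only dictionary file, e.g. the flat
# hash table written by the earlier sketch.
DICT_PATH = "table.bin"


def worker(worker_id):
    # Each process maps the same file read-only. Mapping is nearly instant,
    # and because all mappings share the same page-cache pages, the
    # dictionary occupies physical memory only once across all workers.
    with open(DICT_PATH, "rb") as f:
        mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
    # ... perform lookups against mm here instead of building a per-process dict ...
    print("worker %d mapped %d bytes in pid %d" % (worker_id, len(mm), os.getpid()))
    mm.close()


if __name__ == "__main__":
    procs = [Process(target=worker, args=(i,)) for i in range(4)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
```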
I plan to start by modifying cppjieba, written by a very capable developer, to implement this idea. Anyone interested, please follow: https://github.com/jannson/cppjieba
The reason for proposing this is that two of my own projects genuinely need multiple processes doing word segmentation, and loading the dictionary every single time wastes time; the waiting is frustrating!
Of course, cppjieba already supports centralized segmentation over HTTP, which is another direction worth considering.