-
Notifications
You must be signed in to change notification settings - Fork 280
Save module dependencies not in conda/pip #237
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
containers/python/Dockerfile
Outdated
@@ -3,7 +3,7 @@ FROM clipper/py-rpc:latest | |||
MAINTAINER Dan Crankshaw <dscrankshaw@gmail.com> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Building this no longer works, but I'm also not sure why this file is around in the first place.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This won't work because we can't reference files outside of the docker file's build context. The context directory (the one containing the docker file) must reference clipper_admin
as a subdirectory.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right. This file should be deleted.
Test FAILed. |
Test FAILed. |
3d234fd
to
0b4320a
Compare
Test FAILed. |
0b4320a
to
d047257
Compare
Test FAILed. |
d047257
to
859d479
Compare
Test FAILed. |
Looks like something's still going wrong with this PR; not read for review yet. |
859d479
to
46f8f63
Compare
Test FAILed. |
46f8f63
to
f6249ff
Compare
Test FAILed. |
jenkins test this please |
f6249ff
to
3fb0625
Compare
Test FAILed. |
Test FAILed. |
Test FAILed. |
Test PASSed. |
1eb3fc3
to
1adfa6d
Compare
Test PASSed. |
Test PASSed. |
2c26f76
to
cd5e7ee
Compare
Test PASSed. |
Test PASSed. |
2ac6167
to
1357e17
Compare
1357e17
to
836fafe
Compare
There's now a different mechanism for importing modules from packages ( In By contrast, the old pyspark-container tests remain intact and a new one has been added to test the local module importing. The |
Test FAILed. |
Test FAILed. |
mda.ignore(module_name) | ||
module_paths = mda.get_and_clear_paths() | ||
|
||
def path_already_captured(path): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's say your predict function makes use of some package-based imported module: import my_package.my_module
. Right now, we add the path to .../my_package/
to paths_to_copy
and plan on recursively copying the whole folder to the container later. The location of my_package on the container would be: /model/modules/my_package
. These will be accessible for import because /model/modules
will be added to container's sys.path in the container code.
We also add that module name, my_package.my_module
, to our Module Dependency Analyzer to find all the files that the module is dependent on; it's possible that some don't exist in the folder .../my_package/
and we still want to copy them to the container. For each file with path .../<file_name>
, its copy location on the container will be /model/modules/<file_name>
, and these will be accessible to directly import because /model/modules
will be added to container's sys.path in the container code.
In order to get the paths to all these files, we run module_paths = mda.get_and_clear_paths()
, module_paths
will certainly have files contained in my_package/...
. To avoid copying multiple versions of these files, we check here if each is already captured (using path_already_captured
), and don't add it to the set of lists we'll copy if it is.
This seems good in most cases because it prevents us from doing a bunch of unnecessary copies. However, there's a case (seems kind of unlikely) where it will cause an ImportError. Let's say that your prediction function makes use of two modules: one imported via package (import my_package.my_module
), and one imported directly by adding it's enclosing folder to your path and running (import my_module_2
). Let's say that my_module_2
resides under my_package/
.
Now, we'll run into an ImportError in the container because it won't be able to import the module with name my_module_2
; it expects the module file to be in it's path (probably at /model/modules/my_module_2.py
, but we prevented it from getting copied there because we were already going to copy it to /model/modules/my_package/my_module_2.py
.
I'm not sure how much we care about this case. If we don't think it's costly to have a bunch of unnecessary file copies, then I think we should remove the code responsible for avoiding them (lines 736-744 here). Do you have any thoughts?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We discussed this in person, but to re-iterate, let's keep the de-duplication logic (that seems useful).
Let's support the case when users import packages but not try to support when users add the enclosing folder to sys.path
. That's sort of an anti-pattern in Python (despite the fact we make occasional use of it for good reasons within this repo) and it's okay to not support it.
Test PASSed. |
Forgot to ping you earlier @dcrankshaw -- this is now ready for review again. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Test FAILed. |
jenkins test this please |
Test PASSed. |
Resolves #207
Seems to work when using prediction functions of both the following forms (where
test_util.py
is some module not in conda or pip that has aCOEFFICIENT=<int>
variable):