8000 Save module dependencies not in conda/pip by nishadsingh1 · Pull Request #237 · ucbrise/clipper · GitHub
[go: up one dir, main page]
More Web Proxy on the site http://driver.im/
Skip to content

Save module dependencies not in conda/pip #237

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 7 commits into from
Jul 13, 2017

Conversation

nishadsingh1
Copy link
@nishadsingh1 nishadsingh1 commented Jun 29, 2017

Resolves #207

Seems to work when using prediction functions of both the following forms (where test_util.py is some module not in conda or pip that has a COEFFICIENT=<int> variable):

import test_util
func = lambda x : len(x) * test_util.COEFFICIENT
import test_util as tu
func = lambda x : len(x) * tu.COEFFICIENT

@@ -3,7 +3,7 @@ FROM clipper/py-rpc:latest
MAINTAINER Dan Crankshaw <dscrankshaw@gmail.com>
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Building this no longer works, but I'm also not sure why this file is around in the first place.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This won't work because we can't reference files outside of the docker file's build context. The context directory (the one containing the docker file) must reference clipper_admin as a subdirectory.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right. This file should be deleted.

@nishadsingh1 nishadsingh1 requested a review from dcrankshaw June 29, 2017 02:10
@AmplabJenkins
Copy link

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Clipper-PRB/499/
Test FAILed.

@AmplabJenkins
Copy link

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Clipper-PRB/501/
Test FAILed.

@AmplabJenkins
Copy link

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Clipper-PRB/503/
Test FAILed.

@AmplabJenkins
Copy link

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Clipper-PRB/504/
Test FAILed.

@AmplabJenkins
Copy link

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Clipper-PRB/505/
Test FAILed.

@nishadsingh1
Copy link
Author

Looks like something's still going wrong with this PR; not read for review yet.

@AmplabJenkins
Copy link

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Clipper-PRB/506/
Test FAILed.

@AmplabJenkins
Copy link

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Clipper-PRB/520/
Test FAILed.

@nishadsingh1
Copy link
Author

jenkins test this please

@AmplabJenkins
Copy link

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Clipper-PRB/521/
Test FAILed.

@AmplabJenkins
Copy link

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Clipper-PRB/522/
Test FAILed.

@AmplabJenkins
Copy link

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Clipper-PRB/525/
Test FAILed.

@AmplabJenkins
Copy link

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Clipper-PRB/526/
Test PASSed.

@AmplabJenkins
Copy link

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Clipper-PRB/554/
Test PASSed.

@AmplabJenkins
Copy link

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Clipper-PRB/555/
Test PASSed.

@AmplabJenkins
Copy link

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Clipper-PRB/556/
Test PASSed.

@AmplabJenkins
Copy link

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Clipper-PRB/557/
Test PASSed.

@nishadsingh1 nishadsingh1 force-pushed the save_mod_deps branch 2 times, most recently from 2ac6167 to 1357e17 Compare July 11, 2017 17:50
@nishadsingh1
Copy link
Author

There's now a different mechanism for importing modules from packages (import util.mock_module) and importing modules directly by having them in your path. That's why the integration test suites import the two modules in different ways.

In clipper_manager_tests.py, there's only one test that queries the python-container, test_deployed_and_linked_predict_function_queried_successfully. This test now makes use of both local module importing mechanisms. There isn't a separate tests for querying a python-container that uses these because the bulk of the interesting work being done by the deploy_predict_func feature is package management.

By contrast, the old pyspark-container tests remain intact and a new one has been added to test the local module importing. The deploy_pyspark_model feature needs to coordinate loading a pyspark model into the container, and testing that behavior independent of package management might clarify where bugs exist if/when they pop up.

@AmplabJenkins
Copy link

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Clipper-PRB/558/
Test FAILed.

@AmplabJenkins
Copy link

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Clipper-PRB/559/
Test FAILed.

mda.ignore(module_name)
module_paths = mda.get_and_clear_paths()

def path_already_captured(path):
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's say your predict function makes use of some package-based imported module: import my_package.my_module. Right now, we add the path to .../my_package/ to paths_to_copy and plan on recursively copying the whole folder to the container later. The location of my_package on the container would be: /model/modules/my_package. These will be accessible for import because /model/modules will be added to container's sys.path in the container code.

We also add that module name, my_package.my_module, to our Module Dependency Analyzer to find all the files that the module is dependent on; it's possible that some don't exist in the folder .../my_package/ and we still want to copy them to the container. For each file with path .../<file_name>, its copy location on the container will be /model/modules/<file_name>, and these will be accessible to directly import because /model/modules will be added to container's sys.path in the container code.

In order to get the paths to all these files, we run module_paths = mda.get_and_clear_paths(), module_paths will certainly have files contained in my_package/.... To avoid copying multiple versions of these files, we check here if each is already captured (using path_already_captured), and don't add it to the set of lists we'll copy if it is.

This seems good in most cases because it prevents us from doing a bunch of unnecessary copies. However, there's a case (seems kind of unlikely) where it will cause an ImportError. Let's say that your prediction function makes use of two modules: one imported via package (import my_package.my_module), and one imported directly by adding it's enclosing folder to your path and running (import my_module_2). Let's say that my_module_2 resides under my_package/.

Now, we'll run into an ImportError in the container because it won't be able to import the module with name my_module_2; it expects the module file to be in it's path (probably at /model/modules/my_module_2.py, but we prevented it from getting copied there because we were already going to copy it to /model/modules/my_package/my_module_2.py.

I'm not sure how much we care about this case. If we don't think it's costly to have a bunch of unnecessary file copies, then I think we should remove the code responsible for avoiding them (lines 736-744 here). Do you have any thoughts?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We discussed this in person, but to re-iterate, let's keep the de-duplication logic (that seems useful).

Let's support the case when users import packages but not try to support when users add the enclosing folder to sys.path. That's sort of an anti-pattern in Python (despite the fact we make occasional use of it for good reasons within this repo) and it's okay to not support it.

@AmplabJenkins
Copy link

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Clipper-PRB/560/
Test PASSed.

@nishadsingh1
Copy link
Author

Forgot to ping you earlier @dcrankshaw -- this is now ready for review again.

Copy link
Contributor
@dcrankshaw dcrankshaw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@AmplabJenkins
Copy link

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Clipper-PRB/566/
Test FAILed.

@dcrankshaw
Copy link
Contributor

jenkins test this please

@AmplabJenkins
Copy link

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Clipper-PRB/569/
Test PASSed.

@dcrankshaw dcrankshaw merged commit 5115239 into ucbrise:develop Jul 13, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants
0