-
Notifications
You must be signed in to change notification settings - Fork 280
Refactor and clean up Clipper admin python package #251
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
@feynmanliang This is finally ready for another round of work on the k8s support. I'd like to treat this branch as a "feature branch" and submit PRs against it until the package is fully ready to be merged. Some contextThe package has been re-organized pretty substantially and the new package is in TODOs for youk8s_container_manager.py should be the only file you have to modify, I stuck some TODOs in there for you. I tried to document what the semantics of each function should be, but you can also look at how If you have any feedback or ideas on how to improve the rest of the package, feel free to implement them or make suggestions, but don't feel obligated to. TestingYou'll know that your implementation is correct and complete when the integration tests pass with the Kubernetes container manager. Note that you'll need to rebuild the Clipper docker containers locally before running the tests. You can run them with
I've modified the integration tests to use the k8s container manager, but if you'd like to see how they run with just Docker, you can change line 22 in SERVICE = "docker" |
Test FAILed. |
Test FAILed. |
I'll look at this today./ |
The commit history on the k8s work is lost (presumably because you did a merge against |
return | ||
|
||
@abc.abstractmethod | ||
def add_replica(self, name, version, input_type, repo): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not set_number_replicas
to improve referential transparency and also support decrementing?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, why do we need anything more than just the name
if we're operating on an already deployed model?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Those are both good points. The reasons for both choices are historical.
# when calling `add_replica()`. In this case, the two methods | ||
# would require different behavior. | ||
@abc.abstractmethod | ||
def deploy_model(self, name, version, input_type, repo): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure I understand the comment, but yes you can start a Deployment with 0 replicas and modify that later. However, you'll need this in the superclass because running on K8s requires first creating the Deployment to manage rolling updates/restarting pods/etc whereas when running locally there is no Deployment to create.
Also, in the old CM, this method registers the model with clipper (self._publish_new_model
). I suspect you'll want to do that here as well (it's currently missing in the Docker CM).
pass | ||
|
||
|
||
def start_clipper(cm): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any reason why this isn't encapsulated in a class? Seems weird to have the end user maintain the reference to the ContainerManager
, manage its lifecycle, and pass it into all these functions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're right, that does seem weird. At first I was thinking that plain functions would be cleaner and you could potentially wrap this in a CLI, but given that there's still this container management state that needs to be tracked, we should create a class and track it for the user.
self.public_hostname = clipper_public_hostname | ||
|
||
@abc.abstractmethod | ||
def start_clipper(self): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems strange that the cm
is responsible for starting clipper, but cm
's constructor takes Clipper hostname as an argument (which isn't known until after Clipper is started).
I would've madestart_clipper
concrete and have it delegate to an abstract _start_clipper
that is overridden by implementations of the abc which returns the hostname, then have start_clipper
(this method) set self.public_hostname
with the return.
req_json = json.dumps({ | ||
"name": name, | ||
"input_type": input_type, | ||
"default_output": default_output, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This missing a field
ClipperException: Received error status code: 400 and message: Error parsing JSON: JSON object does not have required key: candidate_model_names. Expected JSON schema:
{
"name" := string,
"candidate_model_names" := [string],
"input_type" := "integers" | "bytes" | "floats" | "doubles" | "strings",
"default_output" := string,
"latency_slo_micros" := int
}
clipper_admin_v2/README.rst
Outdated
+ Joey Gonzalez (@jegonzal) | ||
+ Corey Zumar (@Corey-Zumar) | ||
+ Nishad Singh (@nishadsingh1) | ||
+ Alexey Tumanov (@atumanov) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add me?
|
||
# Constants | ||
|
||
CLIPPER_QUERY_PORT = 1337 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These must not be hardcoded for k8s compatibility, where there is no guarantee of the actual port bound by a service. Since we don't ever use these without having a reference to a CM, consider making this a property of a running CM instead
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Got it. That makes sense.
|
||
logger.info("Pushing model Docker image to {}".format(repo)) | ||
docker_client.images.push(repository=repo) | ||
cm.deploy_model(name, version, input_type, repo, num_replicas=num_replicas) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This fails because num_replicas
is not part of CM's public interface (it is provided in the Docker CM). If we want to expose it, we should add it to the CM's public API.
config.load_kube_config() | ||
self._k8s_v1 = client.CoreV1Api() | ||
self._k8s_beta = client.ExtensionsV1beta1Api() | ||
self.redis_port = redis_port |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These are duplicated across both CM implementations. Since redis and a registry are both mandatory for a CM to do things (e.g. deploy docker images), this could be initialized in the superclass (via the super
constructor and **kwargs
).
I sent you a PR with the TODOs (and some of the comments ^^) at dcrankshaw#3 |
Awesome. Thanks for the comments. I know that the commit history was lost. I wanted to rebase this PR on the latest version of |
Test FAILed. |
Oops I think my PR had a problem: https://github.com/dcrankshaw/clipper/pull/3/files#diff-6272c59650820d22a2a09dfd5eba47f3L123 shouldn't have been removed |
Got it. I'll add it back in. I fixed a couple other problems before merging as well, so no worries. |
Test FAILed. |
Test FAILed. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Comments from joint review with @Corey-Zumar
pass | ||
|
||
|
||
class ClipperConnection(object): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we come up with a better name than ClipperConnection?
except RequestException as e: | ||
logger.info("Clipper still initializing.") | ||
time.sleep(1) | ||
logger.info("Clipper is running") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does start_clipper
need to call self.connect()
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Addressed
logger.info(e.msg) | ||
return False | ||
|
||
def connect(self): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's set a boolean here that prevents users from calling other methods if they haven't connected yet. This bool can be set in start_clipper
and in connect
, and then we can provide nice error messages.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
logger.error(msg) | ||
raise ClipperException(msg) | ||
|
||
def _resolve_docker_repo(self, image, registry): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is out of date, I will be changing this.
repo = "{registry}/{image}".format(registry=registry, image=image) | ||
return repo | ||
|
||
def build_and_deploy_model(self, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Extract the build
part from this function and make a public API method that let's users build a model container and push to a registry without deploying to Clipper.
":") | ||
if model_name is not None and model_name == c_name: | ||
if keep_version is None or keep_version != c_version: | ||
c.stop() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well I'm going to rewrite this function anyway, but this misses the case where model_name is None and we want to remove all model containers
func, | ||
default_output="None", | ||
version=1, | ||
slo_micros=100000, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Bump default slo_micros to 3 seconds
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
version, | ||
input_type, | ||
func, | ||
base_image="clipper/python-closure-container", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should have a version tag associated with it
serialization_dir = save_python_function(name, func) | ||
logger.info("Python closure saved") | ||
# Deploy function | ||
deploy_result = clipper_conn.deploy_model(name, version, input_type, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think deploy_model
returns anything anymore?
sc, | ||
default_output="None", | ||
version=1, | ||
slo_micros=100000, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
bump default slo
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
Test FAILed. |
Test FAILed. |
Sweet. Actually hold off on merging until I get a chance to update the
quickstart and tutorial. Will do that in a few minutes.
…On Wed, Aug 30, 2017 at 2:56 PM, Corey Zumar ***@***.***> wrote:
***@***.**** approved this pull request.
LGTM. Confirmed that the python closure deployment issue is only present
for recent versions of anaconda. Will merge once this is rebased on develop
and tests pass.
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#251 (review)>,
or mute the thread
<https://github.com/notifications/unsubscribe-auth/AAaV5J2P0MxeeheBy7_xtBVUduEiggDLks5sddqRgaJpZM4OWfWX>
.
|
Test FAILed. |
Test FAILed. |
Test FAILed. |
jenkins test this please |
Test FAILed. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Confirmed that quick start, the example client, and the tutorial work. Just fix a couple typos and I'll merge once tests pass.
and used purely for user annotations. | ||
registry : str, optional | ||
The Docker container registry to push the freshly built model to. Note | ||
that if you are running Clipper on Kubernetes, this registry must be accesible |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: s\accesible\accessible\
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
In addition, the function must only use pure Python code. More specifically, all of the state | ||
captured by the function will be pickled using `Cloudpickle <https://github.com/cloudpipe/cloudpickle>`_, | ||
so any state captured by the function must be able to be pickled. Most Python libraries that | ||
use C extensions create objects that cannot be pickled. This includes many common machine-learning |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this should be ...use C extensions **to** create objects that cannot be pickled...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's correct as is. It's not the most elegant sentence in the world, but i'm leaving it.
clipper_admin/docs/index.rst
Outdated
captured by the function will be pickled using `Cloudpickle <https://github.com/cloudpipe/cloudpickle>`_, | ||
so any state captured by the function must be able to be pickled. Most Python libraries that | ||
use C extensions create objects that cannot be pickled. This includes many common machine-learning | ||
frameworks including PySpark, TensorFlow, PyTorch, and Caffe. You will have to create your own |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
s\including\such as\ (we just used "including" earlier in the sentence)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
@@ -136,18 +158,21 @@ | |||
"cell_type": "markdown", | |||
"metadata": {}, | |||
"source": [ | |||
"To use your newly deployed model to generate predictions, it needs to be linked to your Clipper application. Let Clipper know that your `\"cifar_demo\"` app should use the `\"birds_vs_planes_classifier\"` model to serve predictions.\n", | |||
"To use your newly deployed model to generate predictions, it needs to be linked to your Clipper application. This tells Clipper to route requests for `\"cifar-demo\"` app to the `\"birds-vs-planes-classifier\"` model to make predictions.\n", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
s\...to route requests for clipper demo app...\...to route requests for the clipper demo app...\
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
Test FAILed. |
Test FAILed. |
Test FAILed. |
Test PASSed. |
This pull request proposes a new implementation of the Clipper admin Python package for managing and deploying a Clipper cluster. The new implementation provides a new API and an internal re-organization of the code. Aside from general cleanup, this implementation makes several improvements. Details and requests for feedback will be coming soon.