8000 Refactor and clean up Clipper admin python package by dcrankshaw · Pull Request #251 · ucbrise/clipper · GitHub
[go: up one dir, main page]
More Web Proxy on the site http://driver.im/
Skip to content

Refactor and clean up Clipper admin python package #251

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 55 commits into from
Aug 31, 2017

Conversation

dcrankshaw
Copy link
Contributor

This pull request proposes a new implementation of the Clipper admin Python package for managing and deploying a Clipper cluster. The new implementation provides a new API and an internal re-organization of the code. Aside from general cleanup, this implementation makes several improvements. Details and requests for feedback will be coming soon.

@dcrankshaw
Copy link
Contributor Author
dcrankshaw commented Jul 13, 2017

@feynmanliang This is finally ready for another round of work on the k8s support. I'd like to treat this branch as a "feature branch" and submit PRs against it until the package is fully ready to be merged.

Some context

The package has been re-organized pretty substantially and the new package is in clipper_admin_v2/. The relevant piece for you is that the container management functionality has been factored out into a ContainerManager interface and there are currently two implementations: a pure Docker implementation in clipper_admin_v2/clipper_admin/docker/docker_container_manager.py and the K8s implementation you wrote in clipper_admin_v2/clipper_admin/k8s/k8s_container_manager.py.

TODOs for you

k8s_container_manager.py should be the only file you have to modify, I stuck some TODOs in there for you. I tried to document what the semantics of each function should be, but you can also look at how clipper_admin.py uses the ContainerManager interface to see what the functions should do.

If you have any feedback or ideas on how to improve the rest of the package, feel free to implement them or make suggestions, but don't feel obligated to.

Testing

You'll know that your implementation is correct and complete when the integration tests pass with the Kubernetes container manager. Note that you'll need to rebuild the Clipper docker containers locally before running the tests.

You can run them with

./bin/build_docker_images.sh
./bin/run_unittests -i

I've modified the integration tests to use the k8s container manager, but if you'd like to see how they run with just Docker, you can change line 22 in test_utils.py to the following to run them with the Docker container manager.

SERVICE = "docker"

@AmplabJenkins
Copy link

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Clipper-PRB/567/
Test FAILed.

@AmplabJenkins
Copy link

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Clipper-PRB/568/
Test FAILed.

@feynmanliang
Copy link
Contributor

I'll look at this today./

@feynmanliang
Copy link
Contributor

The commit history on the k8s work is lost (presumably because you did a merge against ucbrise:develop for 9416786). We could have kept that if this PR used feynmanliang:k8s as the base, but since some merge conflicts were resolved the additional commits don't cleanly cherrypick. I don't think it's worth the effort trying to recover them, so lets leave this as is.

return

@abc.abstractmethod
def add_replica(self, name, version, input_type, repo):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not set_number_replicas to improve referential transparency and also support decrementing?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, why do we need anything more than just the name if we're operating on an already deployed model?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Those are both good points. The reasons for both choices are historical.

# when calling `add_replica()`. In this case, the two methods
# would require different behavior.
@abc.abstractmethod
def deploy_model(self, name, version, input_type, repo):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure I understand the comment, but yes you can start a Deployment with 0 replicas and modify that later. However, you'll need this in the superclass because running on K8s requires first creating the Deployment to manage rolling updates/restarting pods/etc whereas when running locally there is no Deployment to create.

Also, in the old CM, this method registers the model with clipper (self._publish_new_model). I suspect you'll want to do that here as well (it's currently missing in the Docker CM).

pass


def start_clipper(cm):
Copy link
Contributor
@feynmanliang feynmanliang Jul 20, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any reason why this isn't encapsulated in a class? Seems weird to have the end user maintain the reference to the ContainerManager, manage its lifecycle, and pass it into all these functions.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're right, that does seem weird. At first I was thinking that plain functions would be cleaner and you could potentially wrap this in a CLI, but given that there's still this container management state that needs to be tracked, we should create a class and track it for the user.

self.public_hostname = clipper_public_hostname

@abc.abstractmethod
def start_clipper(self):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems strange that the cm is responsible for starting clipper, but cm's constructor takes Clipper hostname as an argument (which isn't known until after Clipper is started).

I would've madestart_clipper concrete and have it delegate to an abstract _start_clipper that is overridden by implementations of the abc which returns the hostname, then have start_clipper (this method) set self.public_hostname with the return.

req_json = json.dumps({
"name": name,
"input_type": input_type,
"default_output": default_output,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This missing a field

ClipperException: Received error status code: 400 and message: Error parsing JSON: JSON object does not have required key: candidate_model_names. Expected JSON schema:
  {
   "name" := string,
   "candidate_model_names" := [string],
   "input_type" := "integers" | "bytes" | "floats" | "doubles" | "strings",
   "default_output" := string,
   "latency_slo_micros" := int
  }

+ Joey Gonzalez (@jegonzal)
+ Corey Zumar (@Corey-Zumar)
+ Nishad Singh (@nishadsingh1)
+ Alexey Tumanov (@atumanov)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add me?


# Constants

CLIPPER_QUERY_PORT = 1337
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These must not be hardcoded for k8s compatibility, where there is no guarantee of the actual port bound by a service. Since we don't ever use these without having a reference to a CM, consider making this a property of a running CM instead

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got it. That makes sense.


logger.info("Pushing model Docker image to {}".format(repo))
docker_client.images.push(repository=repo)
cm.deploy_model(name, version, input_type, repo, num_replicas=num_replicas)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This fails because num_replicas is not part of CM's public interface (it is provided in the Docker CM). If we want to expose it, we should add it to the CM's public API.

config.load_kube_config()
self._k8s_v1 = client.CoreV1Api()
self._k8s_beta = client.ExtensionsV1beta1Api()
self.redis_port = redis_port
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These are duplicated across both CM implementations. Since redis and a registry are both mandatory for a CM to do things (e.g. deploy docker images), this could be initialized in the superclass (via the super constructor and **kwargs).

@feynmanliang
Copy link
Contributor

I sent you a PR with the TODOs (and some of the comments ^^) at dcrankshaw#3

@dcrankshaw
Copy link
Contributor Author

Awesome. Thanks for the comments. I know that the commit history was lost. I wanted to rebase this PR on the latest version of develop and there were a ton of merge conflicts that I had to keep addressing over and over. I made the choice to blow away commit history to make the rebase a lot cleaner. Sorry about any inconvenience.

@AmplabJenkins
Copy link

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Clipper-PRB/591/
Test FAILed.

@feynmanliang
Copy link
Contributor

Oops I think my PR had a problem: https://github.com/dcrankshaw/clipper/pull/3/files#diff-6272c59650820d22a2a09dfd5eba47f3L123 shouldn't have been removed

@dcrankshaw
Copy link
Contributor Author

Got it. I'll add it back in. I fixed a couple other problems before merging as well, so no worries.

@AmplabJenkins
Copy link

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Clipper-PRB/592/
Test FAILed.

@Corey-Zumar Corey-Zumar mentioned this pull request Jul 26, 2017
@AmplabJenkins
Copy link

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Clipper-PRB/595/
Test FAILed.

Copy link
Contributor Author
@dcrankshaw dcrankshaw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comments from joint review with @Corey-Zumar

pass


class ClipperConnection(object):
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we come up with a better name than ClipperConnection?

except RequestException as e:
logger.info("Clipper still initializing.")
time.sleep(1)
logger.info("Clipper is running")
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does start_clipper need to call self.connect()?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed

logger.info(e.msg)
return False

def connect(self):
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's set a boolean here that prevents users from calling other methods if they haven't connected yet. This bool can be set in start_clipper and in connect, and then we can provide nice error messages.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

logger.error(msg)
raise ClipperException(msg)

def _resolve_docker_repo(self, image, registry):
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is out of date, I will be changing this.

repo = "{registry}/{image}".format(registry=registry, image=image)
return repo

def build_and_deploy_model(self,
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Extract the build part from this function and make a public API method that let's users build a model container and push to a registry without deploying to Clipper.

":")
if model_name is not None and model_name == c_name:
if keep_version is None or keep_version != c_version:
c.stop()
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well I'm going to rewrite this function anyway, but this misses the case where model_name is None and we want to remove all model containers

func,
default_output="None",
version=1,
slo_micros=100000,
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bump default slo_micros to 3 seconds

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

version,
input_type,
func,
base_image="clipper/python-closure-container",
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should have a version tag associated with it

serialization_dir = save_python_function(name, func)
logger.info("Python closure saved")
# Deploy function
deploy_result = clipper_conn.deploy_model(name, version, input_type,
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think deploy_model returns anything anymore?

sc,
default_output="None",
version=1,
slo_micros=100000,
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

bump default slo

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@AmplabJenkins
Copy link

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Clipper-PRB/602/
Test FAILed.

@AmplabJenkins
Copy link

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Clipper-PRB/603/
Test FAILed.

@dcrankshaw
Copy link
Contributor Author
dcrankshaw commented Aug 30, 2017 via email

@AmplabJenkins
Copy link

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Clipper-PRB/665/
Test FAILed.

@AmplabJenkins
Copy link

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Clipper-PRB/666/
Test FAILed.

@AmplabJenkins
Copy link

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Clipper-PRB/667/
Test FAILed.

@dcrankshaw
Copy link
Contributor Author

jenkins test this please

@AmplabJenkins
Copy link

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Clipper-PRB/668/
Test FAILed.

Copy link
Contributor
@Corey-Zumar Corey-Zumar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Confirmed that quick start, the example client, and the tutorial work. Just fix a couple typos and I'll merge once tests pass.

and used purely for user annotations.
registry : str, optional
The Docker container registry to push the freshly built model to. Note
that if you are running Clipper on Kubernetes, this registry must be accesible
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: s\accesible\accessible\

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

In addition, the function must only use pure Python code. More specifically, all of the state
captured by the function will be pickled using `Cloudpickle <https://github.com/cloudpipe/cloudpickle>`_,
so any state captured by the function must be able to be pickled. Most Python libraries that
use C extensions create objects that cannot be pickled. This includes many common machine-learning
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this should be ...use C extensions **to** create objects that cannot be pickled...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's correct as is. It's not the most elegant sentence in the world, but i'm leaving it.

captured by the function will be pickled using `Cloudpickle <https://github.com/cloudpipe/cloudpickle>`_,
so any state captured by the function must be able to be pickled. Most Python libraries that
use C extensions create objects that cannot be pickled. This includes many common machine-learning
frameworks including PySpark, TensorFlow, PyTorch, and Caffe. You will have to create your own
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s\including\such as\ (we just used "including" earlier in the sentence)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@@ -136,18 +158,21 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"To use your newly deployed model to generate predictions, it needs to be linked to your Clipper application. Let Clipper know that your `\"cifar_demo\"` app should use the `\"birds_vs_planes_classifier\"` model to serve predictions.\n",
"To use your newly deployed model to generate predictions, it needs to be linked to your Clipper application. This tells Clipper to route requests for `\"cifar-demo\"` app to the `\"birds-vs-planes-classifier\"` model to make predictions.\n",
Copy link
Contributor
@Corey-Zumar Corey-Zumar Aug 31, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s\...to route requests for clipper demo app...\...to route requests for the clipper demo app...\

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@AmplabJenkins
Copy link

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Clipper-PRB/669/
Test FAILed.

@AmplabJenkins
Copy link

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Clipper-PRB/670/
Test FAILed.

@AmplabJenkins
Copy link

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Clipper-PRB/671/
Test FAILed.

@AmplabJenkins
Copy link

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Clipper-PRB/672/
Test PASSed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants
0