-
Notifications
You must be signed in to change notification settings - Fork 30
deployment status #262
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
deployment status #262
Conversation
e5899a8
to
903cf62
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
left comments. please consider to address them.
return openapi.Response(http.StatusNotImplemented, nil), errors.New("PutDeploymentStatus method not implemented") | ||
func (s *ComputesApiService) PutDeploymentStatus(ctx context.Context, computeId string, jobId string, xAPIKEY string, | ||
deploymentStatus openapi.DeploymentStatus) (openapi.ImplResponse, error) { | ||
err := s.dbService.UpdateDeploymentStatus(deploymentStatus) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Two comments are here:
- This code doesn't use
computeId
passed as argument. It blindly uses computeId indeploymentStatus
. This can potentially pollute the data in the DB at a minimum and be exploited for security vulnerability.
This is a restapi call; so, one can create any arbitrary message without going through flamectl.
- Also, please reconsider whether the computeId field in the message is needed since it is fed as an argument.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changed api endpoint to send only agent statuses obj , instead of deployment statuses obj.
Constructing deployment status obj locally from both params(job id and compute id) and body.
deploymentStatus openapi.DeploymentStatus) (openapi.ImplResponse, error) { | ||
err := s.dbService.UpdateDeploymentStatus(deploymentStatus) | ||
if err != nil { | ||
errMsg := fmt.Errorf("failed to update deployent status for computeid %s, jobid %s : %v", err, computeId, jobId) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
errMsg := fmt.Errorf("failed to update deployent status for computeid %s, jobid %s : %v", err, computeId, jobId) | |
errMsg := fmt.Errorf("failed to update deployment status for computeId %s, jobId %s : %v", computeId, jobId, err) |
api/computes_components.partials.yml
Outdated
@@ -101,10 +101,10 @@ DeploymentStatus: | |||
properties: | |||
jobId: | |||
type: string | |||
computeId: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you explain why computeId is needed? As the url path that uses DeploymentStatus has computeId in it, do we really need to duplicate computeId here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we need this encapsulation of job id and compute id to make inserting data to MongoDB simple, also will be useful while implementing get multiple deployment status codes.
Anyways for put call I have changed message to contain only agent-statuses obj instead of deployment status obj.
if result.Err() == nil { | ||
taskStatus := map[string]openapi.AgentState{} | ||
for key, val := range deploymentStatus.AgentStatuses { | ||
taskStatus["agentstatuses."+key] = val |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you make agentstatuses
as a constant?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
cmd/deployer/app/resource_handler.go
Outdated
// 2.delete all the job resource specification files | ||
deploymentChartPath := filepath.Join(deploymentDirPath, jobId) | ||
_ = os.RemoveAll(deploymentChartPath) | ||
if err != nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
err
is a variable defined from getDeploymentConfig call. So, something doesn't seem right.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
corrected
cmd/deployer/app/resource_handler.go
Outdated
} | ||
} | ||
|
||
// 2.delete all the job resource specification files | ||
deploymentChartPath := filepath.Join(deploymentDirPath, jobId) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is no such directory that only ends with jobId because now the helm-chart directory consists of three elements: deploymentDirPath, jobId and taskId (see https://github.com/cisco-open/flame/pull/262/files#diff-9d9bde33b386c8e731e98f3cdc370505e45708cf8274a6b701ea7e58b6858353R312).
Each directory should be deleted separately only if revoke is successful. Therefore lines 248-249 should be moved into for taskId := range cfg.AgentKVs { ... }
loop.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
corrected
cmd/deployer/app/resource_handler.go
Outdated
deploymentStatus := openapi.DeploymentStatus{ | ||
JobId: jobId, | ||
ComputeId: r.spec.ComputeId, | ||
AgentStatuses: map[string]openapi.AgentState{taskId: status}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please modify this function so that it can create a request for many agents instead of one agent.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Modified this function to create request for many agents.
cmd/deployer/app/resource_handler.go
Outdated
err = r.dplyr.Install("job-"+deploymentConfig.JobId+"-"+taskId[:12], deploymentChartPath) | ||
if err != nil { | ||
errMsg := fmt.Sprintf("failed to deploy tasks: %v", err) | ||
r.postDeploymentStatus(deploymentConfig.JobId, taskId, openapi.AGENT_FAILED) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you also do this in a batch fashion so that the number requests can reduced?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
changed it to update statuses in batch
cmd/deployer/app/resource_handler.go
Outdated
r.postDeploymentStatus(deploymentConfig.JobId, taskId, openapi.AGENT_FAILED) | ||
return fmt.Errorf(errMsg) | ||
} | ||
r.postDeploymentStatus(deploymentConfig.JobId, taskId, openapi.AGENT_DEPLOYED) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same here!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
left a few minor comments; otherwise, it looks good to me. please consider to address them. this pr will be ready in the next revision.
api/computes_api.partials.yml
Outdated
@@ -322,7 +322,7 @@ | |||
content: | |||
application/json: | |||
schema: | |||
$ref: '#/components/schemas/DeploymentStatus' | |||
$ref: '#/components/schemas/AgentStatus' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the api spec, there are two components: AgentStatus
and DeploymentStatus
.
I think we should use only one. Currently this change creates inconsistency (see https://github.com/cisco-open/flame/pull/262/files#diff-7371b2cf2240f3412cb798b7b95ab71759c3c637e0dceaf7492ff745b092b07cR376).
My recommendation is to keep DeploymentStatus as is but rename AgentStatus as DeploymentStatus. But remove the original DeploymentStatus in the components.
cmd/deployer/app/resource_handler.go
Outdated
uninstallErr := r.dplyr.Uninstall("job-" + jobId + "-" + taskId[:k8sShortLabelLength]) | ||
if uninstallErr != nil { | ||
zap.S().Warnf("failed to release resources for job %s: %v", jobId, uninstallErr) | ||
err = fmt.Errorf("%v: %v", err, uninstallErr) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This code snippet concatenates errors. The colon(:) separator makes it hard to separate out each errors when it is read by human engineers because colons can be used in an error. See an example from your PR: https://github.com/cisco-open/flame/pull/262/files#diff-9d9bde33b386c8e731e98f3cdc370505e45708cf8274a6b701ea7e58b6858353R332-R333.
So, it may be better to use some other symbol such as semi-colon.
Please take care of the similar issue in other places of this PR too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
@@ -173,3 +173,32 @@ func (db *MongoService) GetComputeById(computeId string) (openapi.ComputeSpec, e | |||
} | |||
return currentDocument, nil | |||
} | |||
|
|||
func (db *MongoService) UpdateDeploymentStatus(computeId string, jobId string, agentStatuses map[string]openapi.AgentState) error { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this handle stale record issue correctly? If not, do you want to make it as a separate PR? If so, can you create a TODO comment in this function?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This overwrites the existing records , we are making sure to send status of all the tasks every time we put the deployment status.
instead of deleting existing records and creating new records we are sending status of all the new records in one batch. here we are simply rewriting existing record with the new data.
c50ab74
to
8e65907
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please update commit comments.I am only seeing Co-authored-by: rakhanda <rakhanda@rakhandas-MBP.attlocal.net>
.
Deployer forwards deployment status of each agents to controller. It deployes/revokes each agent one after one and pushes the status of operation.
8e65907
to
444926f
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm
Deployer forwards deployment status of each agents to controller. It deployes/revokes each agent one after one and pushes the status of operation.
1.Code added to forward deployment status to controller .
2.Changed deployer logic to deploy each task separately so as to update task status of individual task.
3.Changed release name of the agents to contain job id and task id (only first 12 characters of task id ) in the place of job id and compute id.
4.Made some refactoring changes to use global constants instead of local strings.
Note :- Deployment status only represents whether the job added is deployed/failed/revoked. it does not contain task status.task status is directly communicated back to controller by flamelet agents.
DB changes :-
1.created a separate collection deployment to mongodb to specifically maintain deployment status of computes.