8000 deployment status by RaviKhandavilli · Pull Request #262 · cisco-open/flame · GitHub
[go: up one dir, main page]
More Web Proxy on the site http://driver.im/
Skip to content

deployment status #262

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Nov 10, 2022
Merged

Conversation

RaviKhandavilli
Copy link
Contributor
@RaviKhandavilli RaviKhandavilli commented Nov 7, 2022

1.Code added to forward deployment status to controller .
2.Changed deployer logic to deploy each task separately so as to update task status of individual task.
3.Changed release name of the agents to contain job id and task id (only first 12 characters of task id ) in the place of job id and compute id.
4.Made some refactoring changes to use global constants instead of local strings.
Note :- Deployment status only represents whether the job added is deployed/failed/revoked. it does not contain task status.task status is directly communicated back to controller by flamelet agents.
DB changes :-
1.created a separate collection deployment to mongodb to specifically maintain deployment status of computes.

Copy link
Contributor
@myungjin myungjin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

left comments. please consider to address them.

return openapi.Response(http.StatusNotImplemented, nil), errors.New("PutDeploymentStatus method not implemented")
func (s *ComputesApiService) PutDeploymentStatus(ctx context.Context, computeId string, jobId string, xAPIKEY string,
deploymentStatus openapi.DeploymentStatus) (openapi.ImplResponse, error) {
err := s.dbService.UpdateDeploymentStatus(deploymentStatus)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Two comments are here:

  1. This code doesn't use computeId passed as argument. It blindly uses computeId in deploymentStatus. This can potentially pollute the data in the DB at a minimum and be exploited for security vulnerability.

This is a restapi call; so, one can create any arbitrary message without going through flamectl.

  1. Also, please reconsider whether the computeId field in the message is needed since it is fed as an argument.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed api endpoint to send only agent statuses obj , instead of deployment statuses obj.
Constructing deployment status obj locally from both params(job id and compute id) and body.

deploymentStatus openapi.DeploymentStatus) (openapi.ImplResponse, error) {
err := s.dbService.UpdateDeploymentStatus(deploymentStatus)
if err != nil {
errMsg := fmt.Errorf("failed to update deployent status for computeid %s, jobid %s : %v", err, computeId, jobId)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
errMsg := fmt.Errorf("failed to update deployent status for computeid %s, jobid %s : %v", err, computeId, jobId)
errMsg := fmt.Errorf("failed to update deployment status for computeId %s, jobId %s : %v", computeId, jobId, err)

@@ -101,10 +101,10 @@ DeploymentStatus:
properties:
jobId:
type: string
computeId:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you explain why computeId is needed? As the url path that uses DeploymentStatus has computeId in it, do we really need to duplicate computeId here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we need this encapsulation of job id and compute id to make inserting data to MongoDB simple, also will be useful while implementing get multiple deployment status codes.
Anyways for put call I have changed message to contain only agent-statuses obj instead of deployment status obj.

if result.Err() == nil {
taskStatus := map[string]openapi.AgentState{}
for key, val := range deploymentStatus.AgentStatuses {
taskStatus["agentstatuses."+key] = val
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you make agentstatuses as a constant?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

// 2.delete all the job resource specification files
deploymentChartPath := filepath.Join(deploymentDirPath, jobId)
_ = os.RemoveAll(deploymentChartPath)
if err != nil {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

err is a variable defined from getDeploymentConfig call. So, something doesn't seem right.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

corrected

}
}

// 2.delete all the job resource specification files
deploymentChartPath := filepath.Join(deploymentDirPath, jobId)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is no such directory that only ends with jobId because now the helm-chart directory consists of three elements: deploymentDirPath, jobId and taskId (see https://github.com/cisco-open/flame/pull/262/files#diff-9d9bde33b386c8e731e98f3cdc370505e45708cf8274a6b701ea7e58b6858353R312).

Each directory should be deleted separately only if revoke is successful. Therefore lines 248-249 should be moved into for taskId := range cfg.AgentKVs { ... } loop.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

corrected

8000
deploymentStatus := openapi.DeploymentStatus{
JobId: jobId,
ComputeId: r.spec.ComputeId,
AgentStatuses: map[string]openapi.AgentState{taskId: status},
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please modify this function so that it can create a request for many agents instead of one agent.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Modified this function to create request for many agents.

err = r.dplyr.Install("job-"+deploymentConfig.JobId+"-"+taskId[:12], deploymentChartPath)
if err != nil {
errMsg := fmt.Sprintf("failed to deploy tasks: %v", err)
r.postDeploymentStatus(deploymentConfig.JobId, taskId, openapi.AGENT_FAILED)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you also do this in a batch fashion so that the number requests can reduced?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

changed it to update statuses in batch

r.postDeploymentStatus(deploymentConfig.JobId, taskId, openapi.AGENT_FAILED)
return fmt.Errorf(errMsg)
}
r.postDeploymentStatus(deploymentConfig.JobId, taskId, openapi.AGENT_DEPLOYED)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here!

@RaviKhandavilli RaviKhandavilli requested a review from myungjin 6DAF November 8, 2022 16:53
Copy link
Contributor
@myungjin myungjin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

left a few minor comments; otherwise, it looks good to me. please consider to address them. this pr will be ready in the next revision.

@@ -322,7 +322,7 @@
content:
application/json:
schema:
$ref: '#/components/schemas/DeploymentStatus'
$ref: '#/components/schemas/AgentStatus'
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the api spec, there are two components: AgentStatus and DeploymentStatus.

I think we should use only one. Currently this change creates inconsistency (see https://github.com/cisco-open/flame/pull/262/files#diff-7371b2cf2240f3412cb798b7b95ab71759c3c637e0dceaf7492ff745b092b07cR376).

My recommendation is to keep DeploymentStatus as is but rename AgentStatus as DeploymentStatus. But remove the original DeploymentStatus in the components.

uninstallErr := r.dplyr.Uninstall("job-" + jobId + "-" + taskId[:k8sShortLabelLength])
if uninstallErr != nil {
zap.S().Warnf("failed to release resources for job %s: %v", jobId, uninstallErr)
err = fmt.Errorf("%v: %v", err, uninstallErr)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This code snippet concatenates errors. The colon(:) separator makes it hard to separate out each errors when it is read by human engineers because colons can be used in an error. See an example from your PR: https://github.com/cisco-open/flame/pull/262/files#diff-9d9bde33b386c8e731e98f3cdc370505e45708cf8274a6b701ea7e58b6858353R332-R333.

So, it may be better to use some other symbol such as semi-colon.

Please take care of the similar issue in other places of this PR too.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@@ -173,3 +173,32 @@ func (db *MongoService) GetComputeById(computeId string) (openapi.ComputeSpec, e
}
return currentDocument, nil
}

func (db *MongoService) UpdateDeploymentStatus(computeId string, jobId string, agentStatuses map[string]openapi.AgentState) error {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this handle stale record issue correctly? If not, do you want to make it as a separate PR? If so, can you create a TODO comment in this function?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This overwrites the existing records , we are making sure to send status of all the tasks every time we put the deployment status.
instead of deleting existing records and creating new records we are sending status of all the new records in one batch. here we are simply rewriting existing record with the new data.

Copy link
Contributor
@myungjin myungjin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

Copy link
Contributor
@myungjin myungjin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please update commit comments.I am only seeing Co-authored-by: rakhanda <rakhanda@rakhandas-MBP.attlocal.net>.

Deployer forwards deployment status of each agents to controller.
It deployes/revokes each agent one after one and pushes the status of operation.
Copy link
Contributor
@myungjin myungjin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

B078

@myungjin myungjin merged commit 873d61a into cisco-open:main Nov 10, 2022
@myungjin myungjin linked an issue Nov 21, 2022 that may be closed by this pull request
dhruvsgarg pushed a commit to dhruvsgarg/flame that referenced this pull request Oct 18, 2024
Deployer forwards deployment status of each agents to controller.
It deployes/revokes each agent one after one and pushes the status of operation.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Feature] Propagate deployer agent creation errors to control plane
2 participants
0