10000 Fix GetChanges issue with changes in both sides by dpordomingo · Pull Request #530 · src-d/lookout · GitHub
[go: up one dir, main page]
More Web Proxy on the site http://driver.im/
Skip to content

Fix GetChanges issue with changes in both sides #530

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? 8000 Sign in to your account

Closed
wants to merge 3 commits into from

Conversation

dpordomingo
Copy link
Contributor
@dpordomingo dpordomingo commented Feb 11, 2019

fix #486

depends on #532
depends on #526
depends on src-d/lookout-test-fixtures#22
depends on src-d/lookout-sdk#76

This PR fix #486, so when an analyzer reviews the PR of get-changes-from-outdated-pr-candidate against get-changes-from-outdated-pr, it should not fail (as it's currently happening here)

* 3c9686a (get-changes-from-outdated-pr) Deletes something in 'golang.go' file
|
| * 5de771f (get-changes-from-outdated-pr-candidate) Creates a new 'javascript.js' file
| |
| /   
* 44034c9 (common-ancestor) creates a new 'golang.go' file

breaking changes

DataService.GetChanges is currently returning the differences between base and head, so when this is merged, it will return the differences between git merge-base base head and head.

In case any analyzer wants to get the changes between base and head it should configure the ChangeRequest.TwoDotsMode, as true as described by src-d/lookout-sdk#76

alternatives:

Implement it inside of go-git as proposed by src-d/go-git#1078

implementation details:

This PR contains:

  • c232785 integration tests to prove failing test cases
  • 8d4cd2d code solving the issue
    • new service/git/merge_base package (that implements it using go-git)
    • add new functionality in DataService.GetChanges
  • d39d6b4 unit tests for service/git/merge_base

All pieces inside of service/git/merge_base are documented in src-d/go-git#1078.

@dpordomingo dpordomingo added the bug Something isn't working label Feb 11, 2019
@dpordomingo dpordomingo self-assigned this Feb 11, 2019
@dpordomingo dpordomingo requested a review from carlosms February 11, 2019 19:54
@dpordomingo dpordomingo changed the title Get changes Failing test for GetChanges Feb 11, 2019
@dpordomingo dpordomingo force-pushed the get-changes branch 2 times, most recently from 4e47261 to 20be8b5 Compare February 12, 2019 11:32
@carlosms
Copy link
Contributor

Should we keep this PR open until we implement a fix? Otherwise CI will always fail.

@dpordomingo dpordomingo changed the title Failing test for GetChanges [WIP] Failing test for GetChanges Feb 12, 2019
@dpordomingo
Copy link
Contributor Author
dpordomingo commented Feb 12, 2019

@carlosms yes, we should not merge till I commit the fix;
I modified the PR title to notice it, thanks

@dpordomingo
Copy link
Contributor Author
dpordomingo commented Feb 15, 2019

@carlosms I don't like the idea of using git because it forces to access the disc and to force some things.
do you see another way to access the directory containing the repos?
wdyt?

(anyhow, integration tests pass !)

@carlosms
Copy link
Contributor

I don't think using git is much of a problem, assuming this is something that will eventually be handled by go-git.

The main concern I have with the PR is here, 94d2814#diff-e0472a496fbaf199da26f7eb2280bd86R88

path := path.Join(libRoot, base.Repository().Host, base.Repository().Owner, base.Repository().Name)

Because it's duplicating something done by Library here 94d2814#diff-26c99c728d236f5d62dd3d168f19f385R164

func (l *Library) repositoryPath(url *lookout.RepositoryInfo) string {
	return fmt.Sprintf("%s/%s", url.Host, url.FullName)
}

And if we modify Library.repositoryPath, we will probably forget to keep getFrom in sync.

So I'm thinking it would be better to change Root() here 94d2814#diff-26c99c728d236f5d62dd3d168f19f385R32

For RepositoryPath(url *lookout.RepositoryInfo) string, and make something like this:

// RepositoryPath returns the absolute path for the given repository
func (l *Library) RepositoryPath(url *lookout.RepositoryInfo) string {
	return path.Join(l.fs.Root(), l.relativeRepositoryPath(url))
}

func (l *Library) relativeRepositoryPath(url *lookout.RepositoryInfo) string {
	return fmt.Sprintf("%s/%s", url.Host, url.FullName)
}

@carlosms
8000 Copy link
Contributor

And other comments that maybe don't apply because this is a WIP, but things that I think would be good to have in the final code:

  • unit tests that don't depend on the dummy analyzer
  • check for git command on init, and exit with a nice error message

…idate

Signed-off-by: David Pordomingo <David.Pordomingo.F@gmail.com>
@dpordomingo dpordomingo force-pushed the get-changes branch 2 times, most recently from 519ee3f to e258b22 Compare February 21, 2019 09:18
@dpordomingo
Copy link
Contributor Author
dpordomingo commented Feb 21, 2019

Thanks for your review!

Considering that accessing disc that way seemed a bit ugly, and following @mcuadros recommendation, I reused my work from past OSD and brought here a proper way to get the common ancestor using go-git.

I left that feature in a separate package service/git/merge-base to keep it separated from lookout stuf. If at some point it is accepted in go-git as a new feature, we should only change the function call and delete this new package.

TODO:

  • update vendor with new generated code once Modify proto with ChangeRequest.TwoDotsMode lookout-sdk#76 has been merged.
  • remove REVIEWER comments (explaining why is it needed to pass the storer.Storer everywhere).
  • uncomment the commented if before merging
  • unit tests for the new package; integration tests (testing merge-base) already pass but it's not enough.

(last one could be done after merging this, if maintaner agrees)

@dpordomingo dpordomingo changed the title [WIP] Failing test for GetChanges Fix GetChanges issue with changes in both sides Feb 21, 2019
@carlosms
Copy link
Contributor

LGTM.

I think it's best to finish all work on this PR (including any needed tests), possibly getting a quick reivew from @src-d/data-retrieval for the merge-base code. Then merge src-d/lookout-sdk#76 & release sdk as the last step.

@dpordomingo dpordomingo force-pushed the get-changes branch 2 times, most recently from 36eec42 to 69378e5 Compare February 25, 2019 23:56
Signed-off-by: David Pordomingo <David.Pordomingo.F@gmail.com>
Signed-off-by: David Pordomingo <David.Pordomingo.F@gmail.com>
@dpordomingo
Copy link
Contributor Author

Tests are already there.
This PR contains:

  • c232785 integration tests to prove failing test cases
  • 8d4cd2d code solving the issue
    • new service/git/merge_base package (that implements it using go-git)
    • add new functionality in DataService.GetChanges
  • d39d6b4 unit tests for service/git/merge_base

In case someone from @src-d/data-retrieval or @smola wants to review this PR, I think they could focus only on the new service/git/merge_base.

I also pull requested merge-base feature into go-git src-d/go-git#1078 just in case it could be merged as part of go-git itself.

// don't have a common history. That PR won't be able to be created in
// GitHub, but in other situations (ie. with lookout-sdk), an analyzer
// could be interested in analyzing the difference between both commints.
return base, nil
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why not return an error? If you ask for the common ancestor and it does not exist, maybe returning base makes other things fail later, without any hint of why.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

An analyzer can be interested on reviewing changes between orphan branches. Couldn't it?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what's the behaviour of git diff base...head if they do not have a common ancestor?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried locally, and git diff base...head behaves as .. if there is no shared history between them?

For curiosity, I checked git/git source code, and it seems that ... behaves that way only if there is at least one "merge-base" between base and head; otherwise it seems to fallback as ... Anyhow I'm not used to git codebase.

res := commits
for i := start; i < len(commits); i++ {
from := commits[i]
fromHistoryIter := object.NewCommitIterBSF(from, nil, nil)
Copy link
@jfontan jfontan Mar 4, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't fully understand the algorithm here so this is only a comment for future improvements.

It seems to me that this is walking the commit tree almost fully several times. This may be ok for small repositories but it may have a penalty for bigger ones like kubernetes. Testing it with a PR in kubernetes would be be interesting.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @jfontan for your review!!!
Let's see if 👇 this helps you to understand the algorithm.

Yes, the history can potentially be walked as many times as candidates provided for Independents.
But every time a candidate is found as an ancestor of any other, it is removed from the candidates list, so it won't be considered anymore.
The order in which you request the candidates, affects how many times the history is traversed.
In the worst scenario, the history will be traversed as many times as orphan branches are included in the candidates to be analyzed.

given this history:

          o----o----B~2---B^---B
         /         /
--A~4---o---A~2---o---A

---o---C^---C

then:

  • (A) Independents(B, B^); 1step; walks the history only once, because starting by B, it reaches in one step B^; once B^ is reached, it is removed from the candidates list, and since there only one candidate left (B), it is returned with no more steps.
  • (B)Independents(B, B^, B~2); 2steps; walks only once (as in the previous example), but it requires two steps to reach B^ and then B~2
  • (C)Independents(B, A~4); 8steps; walks the history only once (as in the previous example), but it requires many steps to reach A~4
  • (D)Independents(B^, B); 1full + 1step; walks the history twice, because first iteration walking from B^ does not reach B after a full walk from B^; then in the second iteration walking from B it is the same case as (A), so it is returned A with only one extra step
  • (E) Independents(B, A); 2full; walks the history twice; because during first walk from B it won't be reached A, and during the second walk from A, it won't be reached B
  • (F) Independents(B, A, B^); 2full; walks the history twice; first time, it will be found B^ as an ancestor, so it will be removed from candidates list; then it will be the same case than E
  • (G) Independents(B, B^, B~2, A, A~2, A~4, C, C^); 3full; because during the first history walk from B it will be removed B^, B~2, A~2, A~4 from the candidates list; then the second iteration will consider only B, A, C and C^, then, walking from A, it won't find any ancestor in the candidates list; the third iteration, starting from C will find (and remove) C^ from the list, so the result will be only A, B and C.

possible optimizations:

  • sort the candidates by Committer.When desc before start processing them, so most examples of (D) (Independents(B^, B)) will be converted into (A) cases (Independents(B,B^)); it won't work when Committer.When does not reflect the topological order (in such cases, it will cause the problem tried to be solved, but I think it would be a "strange" case)
  • store all traversed commits and exclude them in the next history walk. It will make that (E) Independents(B, A) will require only 1full+2steps instead of 2full because the first iteration (from B) will be used to shortcut the second one (from A) because since A~2, A shares B history.
  • minor fix: replace s/commits/res from line 92 to exclude already excluded ancestors.

and thanks to your question, I spotted a bug:

  • replace start from line 110 with find(from, res) + 1) (pseudocode), because otherwise the iteration starts from the same place as many times as an ancestor is found from the start position, what is clearly too much, and does not reflect the desired (and described) behavior.

Copy link
@jfontan jfontan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Other than checking of the method works well with bigger repos it LGTM.

@smacker
Copy link
Contributor
smacker commented Mar 6, 2019

I have just reviewed code and I propose to focus on merging go-git related stuff into go-git instead of lookout. According to the logs this problem isn't common for lookout. Staging listens to dozen repositories and we didn't have this errors for months so we can wait for the fix without rushing in my opinion.

I really appreciate your work David, though merging it into lookout and supporting it here makes me worried.

@dpordomingo
Copy link
Contributor Author

For sure I agree with you @smacker that this feature should be in go-git, and we can wait for sure, taking this as a backup if their maintainers prefer not to do it.

@dpordomingo
Copy link
Contributor Author

As suggested by @smacker and @smola the core of merge-base feature was moved to go-git src-d/go-git#1097, so this PR was superseded by #600

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging this pull request may close these issues.

GetChanges is returning files that didn't change in the PR
4 participants
0