-
Notifications
You must be signed in to change notification settings - Fork 36
Fix GetChanges issue with changes in both sides #530
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? 8000 Sign in to your account
Conversation
4e47261
to
20be8b5
Compare
Should we keep this PR open until we implement a fix? Otherwise CI will always fail. |
@carlosms yes, we should not merge till I commit the fix; |
20be8b5
to
94d2814
Compare
@carlosms I don't like the idea of using (anyhow, integration tests pass !) |
I don't think using The main concern I have with the PR is here, 94d2814#diff-e0472a496fbaf199da26f7eb2280bd86R88 path := path.Join(libRoot, base.Repository().Host, base.Repository().Owner, base.Repository().Name) Because it's duplicating something done by func (l *Library) repositoryPath(url *lookout.RepositoryInfo) string {
return fmt.Sprintf("%s/%s", url.Host, url.FullName)
} And if we modify So I'm thinking it would be better to change For // RepositoryPath returns the absolute path for the given repository
func (l *Library) RepositoryPath(url *lookout.RepositoryInfo) string {
return path.Join(l.fs.Root(), l.relativeRepositoryPath(url))
}
func (l *Library) relativeRepositoryPath(url *lookout.RepositoryInfo) string {
return fmt.Sprintf("%s/%s", url.Host, url.FullName)
} |
And other comments that maybe don't apply because this is a WIP, but things that I think would be good to have in the final code:
|
…idate Signed-off-by: David Pordomingo <David.Pordomingo.F@gmail.com>
519ee3f
to
e258b22
Compare
Thanks for your review! Considering that accessing disc that way seemed a bit ugly, and following @mcuadros recommendation, I reused my work from past OSD and brought here a proper way to get the common ancestor using I left that feature in a separate package TODO:
(last one could be done after merging this, if maintaner agrees) |
65ffeca
to
6bf2209
Compare
LGTM. I think it's best to finish all work on this PR (including any needed tests), possibly getting a quick reivew from @src-d/data-retrieval for the |
6bf2209
to
b891e16
Compare
36eec42
to
69378e5
Compare
69378e5
to
b2dfaf3
Compare
b2dfaf3
to
1372dd5
Compare
Signed-off-by: David Pordomingo <David.Pordomingo.F@gmail.com>
Signed-off-by: David Pordomingo <David.Pordomingo.F@gmail.com>
1372dd5
to
d39d6b4
Compare
Tests are already there.
In case someone from @src-d/data-retrieval or @smola wants to review this PR, I think they could focus only on the new I also pull requested |
// don't have a common history. That PR won't be able to be created in | ||
// GitHub, but in other situations (ie. with lookout-sdk), an analyzer | ||
// could be interested in analyzing the difference between both commints. | ||
return base, nil |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why not return an error? If you ask for the common ancestor and it does not exist, maybe returning base
makes other things fail later, without any hint of why.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
An analyzer can be interested on reviewing changes between orphan branches. Couldn't it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what's the behaviour of git diff base...head
if they do not have a common ancestor?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tried locally, and git diff base...head
behaves as ..
if there is no shared history between them?
For curiosity, I checked git/git
source code, and it seems that ...
behaves that way only if there is at least one "merge-base" between base
and head
; otherwise it seems to fallback as ..
. Anyhow I'm not used to git
codebase.
res := commits | ||
for i := start; i < len(commits); i++ { | ||
from := commits[i] | ||
fromHistoryIter := object.NewCommitIterBSF(from, nil, nil) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't fully understand the algorithm here so this is only a comment for future improvements.
It seems to me that this is walking the commit tree almost fully several times. This may be ok for small repositories but it may have a penalty for bigger ones like kubernetes. Testing it with a PR in kubernetes would be be interesting.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @jfontan for your review!!!
Let's see if 👇 this helps you to understand the algorithm.
Yes, the history can potentially be walked as many times as candidates provided for Independents
.
But every time a candidate is found as an ancestor of any other, it is removed from the candidates list, so it won't be considered anymore.
The order in which you request the candidates, affects how many times the history is traversed.
In the worst scenario, the history will be traversed as many times as orphan branches are included in the candidates to be analyzed.
given this history:
o----o----B~2---B^---B
/ /
--A~4---o---A~2---o---A
---o---C^---C
then:
- (A)
Independents(B, B^)
;1step
; walks the history only once, because starting byB
, it reaches in one stepB^
; onceB^
is reached, it is removed from the candidates list, and since there only one candidate left (B
), it is returned with no more steps. - (B)
Independents(B, B^, B~2)
;2steps
; walks only once (as in the previous example), but it requires two steps to reachB^
and thenB~2
- (C)
Independents(B, A~4)
;8steps
; walks the history only once (as in the previous example), but it requires many steps to reachA~4
- (D)
Independents(B^, B)
;1full + 1step
; walks the history twice, because first iteration walking fromB^
does not reachB
after a full walk fromB^
; then in the second iteration walking fromB
it is the same case as (A), so it is returnedA
with only one extra step - (E)
Independents(B, A)
;2full
; walks the history twice; because during first walk fromB
it won't be reachedA
, and during the second walk fromA
, it won't be reachedB
- (F)
Independents(B, A, B^)
;2full
; walks the history twice; first time, it will be foundB^
as an ancestor, so it will be removed from candidates list; then it will be the same case than E - (G)
Independents(B, B^, B~2, A, A~2, A~4, C, C^)
;3full
; because during the first history walk fromB
it will be removedB^
,B~2
,A~2
,A~4
from the candidates list; then the second iteration will consider onlyB
,A
,C
andC^
, then, walking fromA
, it won't find any ancestor in the candidates list; the third iteration, starting fromC
will find (and remove)C^
from the list, so the result will be onlyA
,B
andC
.
possible optimizations:
- sort the candidates by
Committer.When desc
before start processing them, so most examples of (D) (Independents(B^, B)
) will be converted into (A) cases (Independents(B,B^)
); it won't work whenCommitter.When
does not reflect the topological order (in such cases, it will cause the problem tried to be solved, but I think it would be a "strange" case) - store all traversed commits and exclude them in the next history walk. It will make that (E)
Independents(B, A)
will require only1full+2steps
instead of2full
because the first iteration (fromB
) will be used to shortcut the second one (fromA
) because sinceA~2
,A
sharesB
history. - minor fix: replace
s/commits/res
fromline 92
to exclude already excluded ancestors.
and thanks to your question, I spotted a bug:
- replace
start
fromline 110
withfind(from, res) + 1)
(pseudocode), because otherwise the iteration starts from the same place as many times as an ancestor is found from thestart
position, what is clearly too much, and does not reflect the desired (and described) behavior.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Other than checking of the method works well with bigger repos it LGTM.
I have just reviewed code and I propose to focus on merging go-git related stuff into go-git instead of lookout. According to the logs this problem isn't common for lookout. Staging listens to dozen repositories and we didn't have this errors for months so we can wait for the fix without rushing in my opinion. I really appreciate your work David, though merging it into lookout and supporting it here makes me worried. |
For sure I agree with you @smacker that this feature should be in |
As suggested by @smacker and @smola the core of |
fix #486
depends on #532depends on #526depends on src-d/lookout-test-fixtures#22depends on src-d/lookout-sdk#76
This PR fix #486, so when an analyzer reviews the PR of
get-changes-from-outdated-pr-candidate
againstget-changes-from-outdated-pr
, it should not fail (as it's currently happening here)breaking changes
DataService.GetChanges
is currently returning the differences betweenbase
andhead
, so when this is merged, it will return the differences betweengit merge-base base head
andhead
.In case any analyzer wants to get the changes between
base
andhead
it should configure theChangeRequest.TwoDotsMode
, astrue
as described by src-d/lookout-sdk#76alternatives:
Implement it inside of
go-git
as proposed by src-d/go-git#1078implementation details:
This PR contains:
service/git/merge_base
package (that implements it usinggo-git
)DataService.GetChanges
service/git/merge_base
All pieces inside of
service/git/merge_base
are documented in src-d/go-git#1078.