Performance and memory issues cloning large repositories · Issue #447 · src-d/go-git · GitHub
This repository was archived by the owner on Sep 11, 2020. It is now read-only.
Performance and memory issues cloning large repositories #447
Open
@osklyar

Description


When cloning repositories that are large in terms of disk space (and, to a lesser extent, in number of commits), go-git appears to use a different strategy from git, resulting in a massive memory footprint and very long clone times. Here, when cloning a repository that unpacks into 1.5 GB and contains ca. 110k commits, go-git uses up to 5 GB of RAM and takes over 4 minutes, while git uses 290 MB and finishes in about 1 minute (tested with geat, a tool based on go-git):

➜  date && geat clone git@gitserver:myrepo && date            
Mit Jun 21 14:17:28 CEST 2017
=> clone: myrepo cloned from git@gitserver:myrepo into origin
Mit Jun 21 14:21:42 CEST 2017
➜  du -s myrepo 
1463492	myrepo
➜  date && git clone git@gitserver:myrepo && date
Mit Jun 21 14:23:58 CEST 2017
Cloning into 'myrepo'...
remote: Counting objects: 974758, done.
remote: Compressing objects: 100% (167444/167444), done.
remote: Total 974758 (delta 798392), reused 973743 (delta 797550)
Receiving objects: 100% (974758/974758), 791.39 MiB | 67.73 MiB/s, done.
Resolving deltas: 100% (798392/798392), done.
Mit Jun 21 14:25:09 CEST 2017

Memory requirements scale more or less linearly with the number of commits and the repository size. For example, for the smaller repository below, which still has quite a lot of commits, go-git uses about 8x more memory than git. On the performance side, clone time degrades much faster as the repository grows: for the 1.5 GB repository the difference is about 4x, whereas for the roughly 10x smaller repository below, git and go-git take about the same time, and git's time is approximately the same as for the 1.5 GB repository.

Cloning github.com:moby/moby, with 32k commits and 170 MB of unpacked data, takes about the same time with both git and go-git (35 s with go-git and 30 s with git in the timed runs below). Memory-wise, go-git uses a maximum of 320 MB (about 2x the repository size) and git 45 MB (about 0.25x the repository size):

➜  date && geat clone git@github.com:moby/moby && date
Mit Jun 21 13:32:03 CEST 2017
=> clone: moby cloned from git@github.com:moby/moby into origin
Mit Jun 21 13:32:38 CEST 2017
➜  date && git clone git@github.com:moby/moby && date
Mit Jun 21 13:33:05 CEST 2017
Cloning into 'moby'...
remote: Counting objects: 229544, done.
remote: Compressing objects: 100% (35/35), done.
remote: Total 229544 (delta 23), reused 17 (delta 14), pack-reused 229495
Receiving objects: 100% (229544/229544), 127.47 MiB | 5.22 MiB/s, done.
Resolving deltas: 100% (152573/152573), done.
Mit Jun 21 13:33:35 CEST 2017
