Inspired by some changes made in game of trees, I've
implemented a number of speedups in git9.
First, hashing the chunks during deltification with
murmurhash instead of sha1 speeds up the delta search
significantly.
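For illustration, here is a minimal sketch of 32-bit MurmurHash3 (the x86_32 variant) as it might be applied to a delta chunk. Which Murmur variant and seed git9 actually uses is an assumption here. A delta search only needs a fast, well-mixed key for a hash table, not a cryptographic digest. Murmur does a few multiplies and rotates per 4 bytes, while SHA-1 runs 80 rounds per 64-byte block.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

static uint32_t
rotl32(uint32_t x, int r)
{
	return (x << r) | (x >> (32 - r));
}

/* MurmurHash3_x86_32: non-cryptographic, fast, good avalanche. */
uint32_t
murmurhash3(const void *key, size_t len, uint32_t seed)
{
	const uint8_t *p = key;
	size_t nblocks = len / 4, i;
	uint32_t h = seed, k;
	const uint32_t c1 = 0xcc9e2d51, c2 = 0x1b873593;

	/* body: mix in 4 bytes at a time, read little-endian */
	for(i = 0; i < nblocks; i++){
		k = (uint32_t)p[4*i] | (uint32_t)p[4*i+1] << 8
		  | (uint32_t)p[4*i+2] << 16 | (uint32_t)p[4*i+3] << 24;
		k *= c1;
		k = rotl32(k, 15);
		k *= c2;
		h ^= k;
		h = rotl32(h, 13);
		h = h*5 + 0xe6546b64;
	}

	/* tail: the 0 to 3 leftover bytes */
	k = 0;
	p += nblocks*4;
	switch(len & 3){
	case 3: k ^= (uint32_t)p[2] << 16;	/* fall through */
	case 2: k ^= (uint32_t)p[1] << 8;	/* fall through */
	case 1: k ^= p[0];
		k *= c1;
		k = rotl32(k, 15);
		k *= c2;
		h ^= k;
	}

	/* finalization: force every input bit to affect every output bit */
	h ^= (uint32_t)len;
	h ^= h >> 16;
	h *= 0x85ebca6b;
	h ^= h >> 13;
	h *= 0xc2b2ae35;
	h ^= h >> 16;
	return h;
}
```

Usage: `murmurhash3(chunk, chunklen, 0)` gives a 32-bit bucket key for a chunk. Any match found through the table would still be verified byte-for-byte before being used in a delta.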
The stretch function was micro-optimized a bit as well,
since that was taking a large portion of the time when
chunking.
Finally, the full path is no longer stored. We only care about
grouping files with the same name and path, not about their
ordering, so only the hash of the path xored with the hash of
the directory is kept. This saves a bunch of mallocs and string
munging.
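A minimal sketch of the idea follows, assuming the directory's key is handed down during the tree walk so that no path string is ever built. `strhash` (FNV-1a here, a stand-in rather than git9's actual hash) and `pathkey` are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* FNV-1a, 32-bit: a stand-in string hash for this sketch. */
static uint32_t
strhash(const char *s)
{
	uint32_t h = 2166136261u;

	for(; *s != '\0'; s++){
		h ^= (uint8_t)*s;
		h *= 16777619u;
	}
	return h;
}

/*
 * Hypothetical helper: key for an entry `name` inside a directory
 * whose own key is `dirkey` (0 for the root). A subdirectory's key is
 * computed the same way and passed down the walk, so only one
 * 32-bit value per object is kept and nothing is allocated.
 */
uint32_t
pathkey(uint32_t dirkey, const char *name)
{
	return strhash(name) ^ dirkey;
}
```

Because xor is commutative, `a/b` and `b/a` get the same key. Since the key is only used to group likely delta candidates, such a collision should cost some compression rather than correctness.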
This reduces the wall-clock time spent repacking the test
repos below by roughly 20% to 30%.
9front:
% time git/repack
deltifying 97473 objects: 100%
writing 97473 objects: 100%
indexing 97473 objects: 100%
58.85u 1.39s 61.82r git/repack
% time /sys/src/cmd/git/6.repack
deltifying 97473 objects: 100%
writing 97473 objects: 100%
indexing 97473 objects: 100%
43.86u 1.29s 47.51r /sys/src/cmd/git/6.repack
openbsd:
% time git/repack
deltifying 2092325 objects: 100%
writing 2092325 objects: 100%
indexing 2092325 objects: 100%
1589.48u 45.03s 1729.18r git/repack
% time /sys/src/cmd/git/6.repack
deltifying 2092325 objects: 100%
writing 2092325 objects: 100%
indexing 2092325 objects: 100%
1238.68u 41.49s 1373.15r /sys/src/cmd/git/6.repack
go:
% time git/repack
deltifying 529507 objects: 100%
writing 529507 objects: 100%
indexing 529507 objects: 100%
345.32u 7.71s 369.25r git/repack
% time /sys/src/cmd/git/6.repack
deltifying 529507 objects: 100%
writing 529507 objects: 100%
indexing 529507 objects: 100%
248.07u 4.47s 257.59r /sys/src/cmd/git/6.repack
git used to track the cache size as an object
count rather than in bytes. This had the
unfortunate effect of making memory use
depend on the size of the objects -- repos
with lots of large objects could cause
out-of-memory deaths.
Now we track sizes in bytes, which should
keep our memory usage flatter.
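The difference can be sketched as an LRU list that charges each entry by its size and evicts from the cold end until the byte total fits the budget. `Obj`, `Cache`, and `cacheput` are hypothetical names, and object lookup is left out. This is not git9's actual cache code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

typedef struct Obj Obj;
struct Obj {
	size_t	size;	/* bytes of object data held in memory */
	Obj	*prev;	/* LRU list: head is most recently used */
	Obj	*next;
};

typedef struct Cache Cache;
struct Cache {
	Obj	*head;
	Obj	*tail;
	size_t	nbytes;	/* sum of size over cached objects */
	size_t	maxbytes;	/* budget in bytes, not an object count */
	size_t	nevicted;
};

static void
cacheunlink(Cache *c, Obj *o)
{
	if(o->prev) o->prev->next = o->next; else c->head = o->next;
	if(o->next) o->next->prev = o->prev; else c->tail = o->prev;
	o->prev = o->next = NULL;
	c->nbytes -= o->size;
}

/*
 * Insert o (malloc'd by the caller) as most recently used, then free
 * least recently used objects until the total fits the budget. The
 * object just inserted is kept even if it alone exceeds the budget.
 */
void
cacheput(Cache *c, Obj *o)
{
	o->prev = NULL;
	o->next = c->head;
	if(c->head) c->head->prev = o; else c->tail = o;
	c->head = o;
	c->nbytes += o->size;

	while(c->nbytes > c->maxbytes && c->tail != o){
		Obj *victim = c->tail;
		cacheunlink(c, victim);
		free(victim);
		c->nevicted++;
	}
}
```

With a count-based limit, ten 100 MB blobs and ten 100-byte trees cost the same. With a byte budget, the first case simply evicts sooner.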
The pack cache was very stupid: it would close packs
as early as possible, which prevented packs from
being reused effectively. It would also select a
bad pack to close.
This change picks the oldest pack to close, refcounts
packs correctly, and keeps up to Npackcache open at once
(though it will go over if more are in use).
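A sketch of that policy follows. Releasing a pack only drops its refcount and leaves it open. When a new pack must be opened and Npackcache are already open, the least recently acquired pack with no references is closed. If every open pack is in use, the cache goes over the limit. `Pack`, `Packs`, `packacquire`, `packrelease`, and the value 4 for Npackcache are assumptions for the sketch, and the actual open/close calls are elided.

```c
#include <assert.h>
#include <stddef.h>

enum { Npackcache = 4 };	/* value assumed for this sketch */

typedef struct Pack Pack;
struct Pack {
	int	isopen;
	int	refs;		/* active users; never closed while > 0 */
	unsigned long	lastuse;	/* logical clock of last acquire */
};

typedef struct Packs Packs;
struct Packs {
	Pack	*pack;
	int	npack;
	int	nopen;
	unsigned long	clock;
};

/* Close the least recently used unreferenced pack, if there is one. */
static void
evictoldest(Packs *ps)
{
	Pack *p, *old = NULL;
	int i;

	for(i = 0; i < ps->npack; i++){
		p = &ps->pack[i];
		if(!p->isopen || p->refs > 0)
			continue;
		if(old == NULL || p->lastuse < old->lastuse)
			old = p;
	}
	if(old != NULL){
		old->isopen = 0;	/* real code would close the pack here */
		ps->nopen--;
	}
}

Pack *
packacquire(Packs *ps, int i)
{
	Pack *p = &ps->pack[i];

	if(!p->isopen){
		if(ps->nopen >= Npackcache)
			evictoldest(ps);	/* may find nothing: go over */
		p->isopen = 1;	/* real code would open the pack here */
		ps->nopen++;
	}
	p->refs++;
	p->lastuse = ++ps->clock;
	return p;
}

void
packrelease(Pack *p)
{
	assert(p->refs > 0);
	p->refs--;	/* stays open for reuse; only eviction closes it */
}
```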
git/fetch: ensure we clean packfiles on failure
When pulling into a git repository that is group
writable as a non-owner, the pack file is left
in place because we do not have permission to
remove it.
We also leave it behind if we bail out early due
to an error, or due to only listing the changes.
This defers creating the pack file until it is actually
needed, and removes it on error.
Also, while we're here, clean up index caching,
and ensure we close the fd in all cases.
Thanks to Anthony Martin for spotting the bug.
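The shape of the fix is sketched below in ISO C. Early exits that need no pack, such as list-only, happen before the file exists. Every failure after creation closes the file and removes it. `fetchpack` is a hypothetical name, and the `src` stream stands in for the network connection. This is not git9's actual fetch code.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical: receive a pack from `src` into `path`. */
int
fetchpack(const char *path, FILE *src, int listonly)
{
	char buf[8192];
	size_t n;
	FILE *out;

	/* Bail-outs that need no pack happen before the file exists. */
	if(listonly)
		return 0;

	out = fopen(path, "wb");
	if(out == NULL)
		return -1;
	while((n = fread(buf, 1, sizeof buf, src)) > 0){
		if(fwrite(buf, 1, n, out) != n)
			goto error;
	}
	if(ferror(src))
		goto error;
	if(fclose(out) != 0){
		out = NULL;	/* already closed, even on failure */
		goto error;
	}
	return 0;

error:
	/* Close the file in every case, then drop the partial pack. */
	if(out != NULL)
		fclose(out);
	remove(path);
	return -1;
}
```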
Since we now store /dist/plan9front in git, the
initial assumption that the owner of the repo
is the person touching it is not always true.
This change gives us a better heuristic for the
permissions to give the files we copy around,
basing it on the permissions of the .git
directory.
git/push died within a subshell, which prevented the
whole program from exiting, and led to an incorrect
ref update line that confused people.
git/send would eventually error out, but would push
all the data before that happened; this was annoying.
We weren't giving all objects to the twixt() function, and
it was making bad life choices -- gambling, smoking, drinking,
and packing in too much data.
With more information, it doesn't do the last.
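This sketch assumes twixt() computes the objects reachable from the local heads but from none of the objects the remote already has. The commit does not say this, so treat it as an assumption. Under that assumption, the effect of missing information is easy to see: every object the function is not told about on the remote side gets packed again. `between()` is a hypothetical stand-in working on a toy graph of integer ids, not git9's twixt().

```c
#include <assert.h>
#include <string.h>

enum { Nobj = 16, Maxpar = 2 };

/* Toy object graph: par[i] lists the parents of object i (-1 = none). */
typedef struct Graph Graph;
struct Graph {
	int	par[Nobj][Maxpar];
};

/* Depth-first: flag o and everything reachable from it. */
static void
mark(const Graph *g, int o, unsigned char *seen)
{
	int i;

	if(o < 0 || seen[o])
		return;
	seen[o] = 1;
	for(i = 0; i < Maxpar; i++)
		mark(g, g->par[o][i], seen);
}

/*
 * Objects reachable from some head but from no tail. Flags them in
 * out[0..Nobj-1] and returns how many there are.
 */
int
between(const Graph *g, const int *head, int nhead,
	const int *tail, int ntail, unsigned char *out)
{
	unsigned char fromtail[Nobj];
	int i, n;

	memset(fromtail, 0, sizeof fromtail);
	memset(out, 0, Nobj);
	for(i = 0; i < ntail; i++)
		mark(g, tail[i], fromtail);
	for(i = 0; i < nhead; i++)
		mark(g, head[i], out);
	n = 0;
	for(i = 0; i < Nobj; i++){
		if(fromtail[i])
			out[i] = 0;
		n += out[i];
	}
	return n;
}
```

Example: take a history 0 <- 1 <- 2 <- 3, a side branch 4 off 2, and a local merge 5 of 3 and 4. If the remote has both 3 and 4, only 5 needs sending. If the function is told about 3 alone, it also packs 4.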