Commit graph

4 commits

Author SHA1 Message Date
Michael Forney 2e47badb88 git/query: refactor graph painting algorithm (findtwixt, lca)
We now keep track of 3 sets during traversal:
- keep: commits we've reached from head commits
- drop: commits we've reached from tail commits
- skip: ancestors of commits in both 'keep' and 'drop'

Commits in 'keep' and/or 'drop' may be added later to the 'skip' set
if we discover later that they are part of a common subgraph of the
head and tail commits.

From these sets we can calculate the commits we are interested in:
lca commits are those in 'keep' and 'drop', but not in 'skip'.
findtwixt commits are those in 'keep', but not in 'drop' or 'skip'.
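
The two rules above amount to plain set algebra. A minimal sketch, assuming the three sets are available as Python sets of commit names; the function names and the example sets are hypothetical and are not taken from the git9 source:

```python
def lca_commits(keep, drop, skip):
    # Reached from both heads and tails, but not an ancestor of such a commit.
    return (keep & drop) - skip


def findtwixt_commits(keep, drop, skip):
    # Reached from heads only, and not part of the common subgraph.
    return keep - drop - skip


# Hypothetical result of a traversal with head 'g' and tail 'b',
# where 'a' is the parent of 'b'.
keep = {"a", "b", "c", "d", "e", "f", "g", "h"}
drop = {"b"}
skip = {"a"}

lca_result = lca_commits(keep, drop, skip)
twixt_result = findtwixt_commits(keep, drop, skip)
```

Here `lca_result` contains only `b`, and `twixt_result` contains every commit in 'keep' except `a` and `b`.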

The "LCA" commit returned is a common ancestor such that there are no
other common ancestors that can reach that commit.  Although there can
be multiple commits that meet this criterion, where one is technically
lower on the commit-graph than the other, these cases only happen in
complex merge arrangements and any choice is likely a decent merge
base.

Repainting is now done in paint() directly.  When we find a boundary
commit, we switch our paint color to 'skip'.  'skip' painting does
not stop when it hits another color; we continue until we are left
with only 'skip' commits on the queue.
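
The painting described above might look roughly like the following. This is a simplified sketch and not the git9 implementation: it assumes every parent has a strictly older date than its children, it uses a date-ordered heap, and `paint`, `parents`, `dates`, and the flag names are all hypothetical.

```python
import heapq

KEEP, DROP, SKIP = 1, 2, 4


def paint(parents, dates, heads, tails):
    """Return a dict mapping each visited commit to its color flags."""
    color = {}
    heap = []  # newest commit first, via negated date

    def mark(commit, flags):
        if commit not in color:
            color[commit] = 0
            heapq.heappush(heap, (-dates[commit], commit))
        color[commit] |= flags

    for h in heads:
        mark(h, KEEP)
    for t in tails:
        mark(t, DROP)

    # Stop once only 'skip' commits are left on the queue.
    while any(not (color[c] & SKIP) for _, c in heap):
        _, commit = heapq.heappop(heap)
        flags = color[commit]
        if flags & SKIP or flags == KEEP | DROP:
            # Boundary commit (or already common): paint ancestors 'skip'.
            down = SKIP
        else:
            down = flags
        for p in parents.get(commit, ()):
            mark(p, down)
    return color


def lca_of(color):
    return {c for c, f in color.items() if f == KEEP | DROP}


def twixt_of(color):
    return {c for c, f in color.items() if f == KEEP}


# Hypothetical graph: a linear history a..g plus a side branch a-h-g.
parents = {"g": ["f", "h"], "f": ["e"], "e": ["d"], "d": ["c"],
           "c": ["b"], "b": ["a"], "h": ["a"], "a": []}
dates = {"a": 1, "b": 2, "h": 3, "c": 4, "d": 5, "e": 6, "f": 7, "g": 8}

forward = paint(parents, dates, heads=["g"], tails=["b"])
reverse = paint(parents, dates, heads=["b"], tails=["g"])
```

In both directions the sketch reports `b` as the LCA. With the tail newer than the head (`reverse`), the traversal still ends as soon as only `a`, painted 'skip', remains queued.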

This fixes several mishandled cases in the current algorithm:
1. If we hit the common subgraph from tail commits first (if the tail
   commit was newer than the head commit), we ended up traversing the
   entire commit graph.  This is because we couldn't distinguish
   between 'drop' commits that were part of the common subgraph, and
   those that were still looking for it.
2. If we traversed through an initial part of the common subgraph from
   head commits before reaching it from tail commits, these commits
   were returned from findtwixt even though they were also reachable
   from tail commits.
3. In the same case as 2, we might end up choosing an incorrect
   commit as the LCA, which is an ancestor of the real LCA.
2022-03-16 21:41:59 +00:00
Ori Bernstein f63d1d3ced git: size cache in bytes, not objects
git used to track cache size in object
count, rather than bytes. This had the
unfortunate effect of making memory use
depend on the size of objects -- repos
with lots of large objects could cause
out of memory deaths.

now, we track sizes in bytes, which should
keep our memory usage flatter.
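
As an illustration of the idea, a cache bounded by bytes rather than by entry count could be sketched as follows. This is not the git9 code (which is written in C); the class name, the LRU eviction order, and the choice to always keep the newest entry are assumptions made for the example.

```python
from collections import OrderedDict


class ByteCache:
    """LRU cache whose budget is the total size of the cached data."""

    def __init__(self, max_bytes):
        self.max_bytes = max_bytes
        self.used = 0
        self.objects = OrderedDict()  # key -> bytes, oldest first

    def get(self, key):
        data = self.objects.get(key)
        if data is not None:
            self.objects.move_to_end(key)  # mark as recently used
        return data

    def put(self, key, data):
        if key in self.objects:
            self.used -= len(self.objects.pop(key))
        self.objects[key] = data
        self.used += len(data)
        # Evict least recently used entries until we fit the byte budget.
        # The newest entry is kept even if it alone exceeds the budget.
        while self.used > self.max_bytes and len(self.objects) > 1:
            _, old = self.objects.popitem(last=False)
            self.used -= len(old)


cache = ByteCache(max_bytes=10)
cache.put("small", b"123456")
cache.put("other", b"abcdef")  # 12 bytes total, so "small" is evicted
```

With a count-based limit, two 6-byte objects and two 6-megabyte objects would be treated the same; with a byte budget, large objects push out older entries sooner.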
2022-01-02 03:37:23 +00:00
Ori Bernstein c7dcc82b0b git/query: fix spurious merge requests
Due to the way LCA is defined, using a strict LCA
on a graph like this:

 <--a--b--c--d--e--f--g
     \               /
       +-----h-------

can lead to spurious requests to merge. This happens
because 'lca(b, g)' would return 'a', since it can be
reached in one step from 'b', and 2 steps from 'g', while
reaching 'b' from 'g' would be a longer path.

As a result, we need to implement an lca variant that
returns the starting node if one is reachable from the
other, even if it's already found the technically correct
least common ancestor.
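
The difference between the two behaviours can be sketched on the graph above. This is an illustration only: `stepwise_lca` is one plausible reading of the "strict" step-counting behaviour, not the old git9 code, and all function names are hypothetical. The history before 'a' is omitted.

```python
from collections import deque


def ancestors_with_depth(parents, start):
    """Map each commit reachable from start to its shortest step count."""
    depth = {start: 0}
    queue = deque([start])
    while queue:
        commit = queue.popleft()
        for p in parents.get(commit, ()):
            if p not in depth:
                depth[p] = depth[commit] + 1
                queue.append(p)
    return depth


def stepwise_lca(parents, x, y):
    """Common ancestor that is closest, in steps, to both commits."""
    dx = ancestors_with_depth(parents, x)
    dy = ancestors_with_depth(parents, y)
    common = set(dx) & set(dy)
    if not common:
        return None
    return min(common, key=lambda c: (max(dx[c], dy[c]), c))


def reachable_lca(parents, x, y):
    """Variant: if one commit is reachable from the other, return it."""
    if x in ancestors_with_depth(parents, y):
        return x
    if y in ancestors_with_depth(parents, x):
        return y
    return stepwise_lca(parents, x, y)


parents = {"g": ["f", "h"], "f": ["e"], "e": ["d"], "d": ["c"],
           "c": ["b"], "b": ["a"], "h": ["a"], "a": []}

strict = stepwise_lca(parents, "b", "g")
variant = reachable_lca(parents, "b", "g")
```

Here `strict` is `a` (one step from 'b', two steps from 'g' via 'h'), while `variant` is `b`, because 'b' is itself an ancestor of 'g' and no merge is needed.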

This replaces our LCA algorithm with one based on the
painting we do while finding a twixt, making it give
the results we want.
2021-09-11 17:46:26 +00:00
Ori Bernstein 1ee1bfaa8c git: got git?
Add a snapshot of git9 to 9front.
2021-05-16 18:49:45 -07:00