Understanding the Conflict Merging Algorithm

I am looking at a merge marker that looks all screwed up. To give you the situation, let's say we have this:

    public void methodA() {
        prepare();
        try {
            doSomething();
        } catch(Exception e) {
            doSomethingElse();
        }
    }

Now a merge comes in (I use SourceTree to pull), and the markers look like this:

    <<<<<<<<< HEAD
        try {
            doSomething();
        } catch(Exception e) {
            doSomethingElse();
        }
    ============================
    private void methodB() {
        doOtherStuff();
    >>>>>>>> 9832432984384398949873ab
    }

So what the pulled commit does is remove methodA completely and add methodB instead.

But notice that some lines are missing entirely: the methodA signature and the prepare(); call do not appear on either side of the conflict.

From what I understand of the process, Git tries a so-called auto-merge, and if this fails and conflicts were detected, the complete merge is expressed by parts marked with '<<<* HEAD' + before + '====' + after + '>>>* CommitID', prepared for manual conflict resolution.

So why does it leave out some lines? It looks more like a bug to me.

I use Windows 7 and the installed Git version is 2.6.2.windows.1. The newest version is 2.9; is anything known about a Git version having a merge problem of this magnitude? This is not the first time I have experienced something like this.

Best answer

You are correct to be concerned: Git knows nothing of languages, and its built-in merge algorithm is based strictly on line-at-a-time comparisons. You do not have to use this built-in merge algorithm, but most people do because (a) it mostly just works, and (b) there are not that many alternatives.

Note that this depends on your merge strategy (-s argument); the text below is for the default recursive strategy. The resolve strategy is pretty similar to recursive; the octopus strategy applies to more than just two commits; and the ours strategy is entirely different (and is nothing like -X ours). You can also select alternative strategies or algorithms for specific files, using .gitattributes and "merge drivers". And, none of this applies to files that Git has decided to believe are "binary": for these, it does not even attempt merging. (I am not going to cover any of that here, just how the default recursive strategy treats files.)

How git merge works (when using the default -s recursive)

1. Merge starts with two commits: the current one (also called "ours", "local", and HEAD), and some "other" one (also called "theirs" and "remote").
2. Merge finds the merge base between these commits.
   - Normally that's just one other commit: the one at the first point where the implied branches[1] join up.
   - In some special cases (multiple merge base candidates), Git must invent a "virtual merge base" (but we'll ignore these cases here).
3. Merge runs two diffs: git diff base local and git diff base other.
   - These have rename detection turned on.
   - You can run these same diffs yourself to see what merge will see.
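Step 2 can be made concrete with a toy model. The Python sketch below is hypothetical (the commit IDs, the graph, and the helper names are made up): it stores history as a map from commit ID to parent IDs and reports the "best" common ancestors, meaning common ancestors that are not themselves ancestors of another common ancestor. That is roughly what git merge-base --all prints, although Git's real implementation walks the graph far more efficiently. The second graph is a criss-cross history, the "multiple merge base candidates" case mentioned above.

```python
from typing import Dict, Set, Tuple

# Hypothetical toy history: commit ID -> tuple of parent IDs.
Graph = Dict[str, Tuple[str, ...]]


def ancestors(graph: Graph, commit: str) -> Set[str]:
    """Return the commit itself plus everything reachable through parent links."""
    seen: Set[str] = set()
    stack = [commit]
    while stack:
        c = stack.pop()
        if c not in seen:
            seen.add(c)
            stack.extend(graph[c])
    return seen


def merge_bases(graph: Graph, a: str, b: str) -> Set[str]:
    """Common ancestors of a and b that are not ancestors of another common ancestor."""
    common = ancestors(graph, a) & ancestors(graph, b)
    return {c for c in common
            if not any(c != d and c in ancestors(graph, d) for d in common)}


# Ordinary fork:  A - B - C       (ours)
#                      \
#                       D - E     (theirs)
simple: Graph = {"A": (), "B": ("A",), "C": ("B",), "D": ("B",), "E": ("D",)}
print(merge_bases(simple, "C", "E"))          # {'B'}

# Criss-cross history: two equally good bases, the case where
# -s recursive has to build a "virtual merge base".
criss: Graph = {"A": (), "B": ("A",), "C": ("A",), "D": ("B", "C"), "E": ("C", "B")}
print(sorted(merge_bases(criss, "D", "E")))   # ['B', 'C']
```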

You can think of these two diffs as "what we did" and "what they did". The goal of a merge is to combine "what we did" and "what they did". The diffs are line-based, come from a minimal edit distance algorithm,[2] and are really just Git's guess about what we did, and what they did.
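To see "what we did" and "what they did" as data, here is a small Python sketch that diffs each side against the base. It uses the standard library's difflib.SequenceMatcher, which is a different matching algorithm from Git's default Myers diff, so on real files its hunks can differ from what Git would report; treat it only as an illustration. The three line lists and the changes() helper are invented for the example.

```python
import difflib

base  = ["a", "b", "c", "d"]
local = ["a", "B", "c", "d"]        # "what we did": changed the second line
other = ["a", "b", "c", "d", "e"]   # "what they did": appended a line


def changes(old, new):
    """List (tag, base_start, base_end, new_lines) for every non-equal region."""
    sm = difflib.SequenceMatcher(a=old, b=new, autojunk=False)
    return [(tag, i1, i2, new[j1:j2])
            for tag, i1, i2, j1, j2 in sm.get_opcodes() if tag != "equal"]


print(changes(base, local))   # [('replace', 1, 2, ['B'])]
print(changes(base, other))   # [('insert', 4, 4, ['e'])]
```

Both results are expressed in base-file line numbers, which is what lets a merge compare the two sets of changes against each other.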

The output of the first diff (base-vs-local) tells Git which base files correspond to which local files, i.e., how to follow names from the current commit back to the base. Git can then use the base names to spot renames or deletes in the other commit as well. For the most part we can just ignore rename and delete issues, and also new-file-creation issues. Note that Git version 2.9 turns on rename detection by default for all diffs, not just merge diffs. (You can turn this on yourself in earlier Git versions by configuring diff.renames to true; see also the git config setting for diff.renameLimit.)
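Rename detection pairs a deleted path with an added path whose contents are similar enough. The Python sketch below is a rough approximation, not Git's algorithm: it scores similarity with difflib's ratio() rather than Git's own similarity index, and its 0.5 cutoff only mirrors Git's default 50% rename threshold (the -M option). The function name, file names, and contents are hypothetical.

```python
import difflib
from typing import Dict, List, Tuple


def detect_renames(deleted: Dict[str, str], added: Dict[str, str],
                   threshold: float = 0.5) -> List[Tuple[str, str]]:
    """Pair each deleted path with its most similar added path (rough approximation)."""
    scored = sorted(
        ((difflib.SequenceMatcher(a=old_text, b=new_text, autojunk=False).ratio(), old, new)
         for old, old_text in deleted.items()
         for new, new_text in added.items()),
        reverse=True)
    pairs: List[Tuple[str, str]] = []
    used_old, used_new = set(), set()
    for score, old, new in scored:     # best-scoring candidates first
        if score >= threshold and old not in used_old and new not in used_new:
            pairs.append((old, new))
            used_old.add(old)
            used_new.add(new)
    return pairs


print(detect_renames({"Foo.java": "class Foo { int x; }"},
                     {"Bar.java": "class Foo { int x; int y; }", "README": "hello"}))
# [('Foo.java', 'Bar.java')]
```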

If a file is changed on only one side (base-to-local, or base-to-other), Git simply takes those changes. Git only has to do a three-way merge when a file is changed on both sides.
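The "changed on only one side" shortcut amounts to a tiny decision function. This Python sketch compares whole file contents as strings, and its name is made up; real Git can make the equivalent decision cheaply by comparing the files' object IDs.

```python
from typing import Optional


def take_one_side(base: str, ours: str, theirs: str) -> Optional[str]:
    """Return the merged file if at most one side changed it, else None."""
    if ours == theirs:     # both sides identical (unchanged, or the same change)
        return ours
    if ours == base:       # only "they" changed the file
        return theirs
    if theirs == base:     # only "we" changed the file
        return ours
    return None            # changed on both sides: needs a real three-way merge


print(take_one_side("v1", "v1", "v2"))   # v2
print(take_one_side("v1", "v3", "v2"))   # None
```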

To perform a three-way merge, Git essentially walks through the two diffs (base-to-local and base-to-other), one "diff hunk" at a time, comparing the changed regions. If each hunk affects a different part of the original base file, Git just takes that hunk. If some hunk(s) affect the same part of the base file, Git tries to take one copy of whatever that change is.

For instance, if the local change says "add a close brace line" and the remote change says "add (the same place, same indentation) close brace line", Git will take just one copy of the close brace. If both say "delete a close brace line" Git will just delete the line once.

Only if the two diffs conflict—e.g., one says "add a close brace line indented 12 spaces" and the other says "add a close brace line indented 11 spaces" will Git declare a conflict. By default, Git writes the conflict into the file, showing the two sets of changes—and, if you set merge.conflictstyle to diff3, also showing the code from the merge-base version of the file.
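Putting the last few paragraphs together, here is a compact three-way merge sketch in Python. It is not Git's merge code: it computes both diffs with difflib.SequenceMatcher, groups hunks that overlap or touch in the base file, takes one side's hunk when the other side left that region alone, takes a single copy when both sides made the identical change, and otherwise writes conflict markers. Treating merely touching hunks as a conflict is a simplifying assumption of the sketch. The diff3 flag adds the base section in the spirit of merge.conflictstyle=diff3; the marker labels "ours", "base", and "theirs" are placeholders (Git writes names such as HEAD or a commit ID there).

```python
import difflib
from typing import List, Tuple

Hunk = Tuple[int, int, List[str]]  # (base_start, base_end, new_lines), half-open range


def hunks(base: List[str], new: List[str]) -> List[Hunk]:
    """Changed regions of base, as found by difflib (not Git's Myers diff)."""
    sm = difflib.SequenceMatcher(a=base, b=new, autojunk=False)
    return [(i1, i2, new[j1:j2])
            for tag, i1, i2, j1, j2 in sm.get_opcodes() if tag != "equal"]


def side_version(base: List[str], start: int, end: int, side: List[Hunk]) -> List[str]:
    """One side's text for base[start:end], applying that side's hunks in order."""
    out, pos = [], start
    for s, e, lines in side:
        out += base[pos:s] + lines
        pos = e
    return out + base[pos:end]


def merge3(base: List[str], ours: List[str], theirs: List[str],
           diff3: bool = False) -> Tuple[List[str], int]:
    tagged = sorted([(s, e, l, "ours") for s, e, l in hunks(base, ours)] +
                    [(s, e, l, "theirs") for s, e, l in hunks(base, theirs)],
                    key=lambda h: (h[0], h[1]))
    result: List[str] = []
    pos = conflicts = i = 0
    while i < len(tagged):
        start, end = tagged[i][0], tagged[i][1]
        group = [tagged[i]]
        i += 1
        # Grow the region while the next hunk overlaps or touches it.
        while i < len(tagged) and tagged[i][0] <= end:
            end = max(end, tagged[i][1])
            group.append(tagged[i])
            i += 1
        ours_h = [(s, e, l) for s, e, l, who in group if who == "ours"]
        theirs_h = [(s, e, l) for s, e, l, who in group if who == "theirs"]
        result += base[pos:start]             # untouched base lines
        mine = side_version(base, start, end, ours_h)
        yours = side_version(base, start, end, theirs_h)
        if not theirs_h or mine == yours:     # only we changed it, or identical change
            result += mine
        elif not ours_h:                      # only they changed it
            result += yours
        else:                                 # both changed it differently: conflict
            conflicts += 1
            result += ["<<<<<<< ours"] + mine
            if diff3:
                result += ["||||||| base"] + base[start:end]
            result += ["======="] + yours + [">>>>>>> theirs"]
        pos = end
    return result + base[pos:], conflicts


base = ["a", "b", "c", "d", "e"]
print(merge3(base, ["a", "B", "c", "d", "e"], ["a", "b", "c", "d", "E"]))
# (['a', 'B', 'c', 'd', 'E'], 0)   separate regions: both changes are taken

merged, n = merge3(["a", "b", "c"], ["a", "X", "c"], ["a", "Y", "c"], diff3=True)
print(n, merged)
# 1 ['a', '<<<<<<< ours', 'X', '||||||| base', 'b', '=======', 'Y', '>>>>>>> theirs', 'c']
```

Note that the lines outside the conflict ("a" and "c") are emitted once, outside the markers, exactly because neither side's diff claimed them.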

Git applies any non-conflicting diff hunks. If there were conflicts, Git normally leaves the file in "conflicted merge" state. However, the two -X arguments (-X ours and -X theirs) modify this: with -X ours, Git chooses "our" diff hunk in the conflict and puts that change in, ignoring "their" change. With -X theirs, Git chooses "their" diff hunk and puts that change in, ignoring "our" change. These two -X arguments guarantee that Git does not declare a conflict inside the file's contents after all. (They do not help with conflicts at the level of whole files, such as one side modifying a file that the other side deleted.)
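The effect of -X ours and -X theirs on conflicting hunks can be imitated by post-processing a conflicted file, as in the Python sketch below. Git does not work this way internally (it picks the side while merging, and non-conflicting hunks are merged normally either way); the favor() name and the marker parsing are assumptions for illustration, and a file that legitimately contains marker-like lines would confuse it.

```python
from typing import List, Optional


def favor(lines: List[str], side: str) -> List[str]:
    """Keep only `side` ("ours" or "theirs") inside each conflict block."""
    out: List[str] = []
    state: Optional[str] = None     # None outside a conflict; else ours/base/theirs
    for line in lines:
        if line.startswith("<<<<<<<") and state is None:
            state = "ours"
        elif line.startswith("|||||||") and state == "ours":
            state = "base"          # diff3-style base section: always dropped
        elif line.startswith("=======") and state in ("ours", "base"):
            state = "theirs"
        elif line.startswith(">>>>>>>") and state == "theirs":
            state = None
        elif state is None or state == side:
            out.append(line)
    return out


conflicted = ["a", "<<<<<<< ours", "X", "=======", "Y", ">>>>>>> theirs", "c"]
print(favor(conflicted, "ours"))     # ['a', 'X', 'c']
print(favor(conflicted, "theirs"))   # ['a', 'Y', 'c']
```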

If Git is able to resolve everything on its own for this file, it does so: you get the base file, plus your local changes, plus their other changes, in the work-tree and in the index/staging-area.

If Git is not able to resolve everything on its own, it puts the base, other, and local versions of the file into the index/staging-area, using the three special nonzero index slots. The work-tree version is always "what Git was able to resolve, plus the conflict markers as directed by various configurable items."

Every index entry has four slots

A file such as foo.java is normally staged in slot zero. This means it is ready to go into a new commit now. The other three slots are empty, by definition, because there is a slot-zero entry.

During a conflicted merge, slot zero is left empty, and slots 1-3 are used to hold the merge base version, the "local" or --ours version, and the other or --theirs version. The work-tree holds the in-progress merge.

You can use git checkout to extract any of these versions, or git checkout -m to re-create the merge conflict. All successful git checkout commands update the work-tree version of the file.

Some git checkout commands leave the various slots undisturbed. Some git checkout commands write into slot 0, wiping out the entries in slots 1-3, so that the file is ready for commit. (To know which ones do what, you just have to memorize them. I had them wrong, in my head, for quite a while.)

You cannot run git commit until all unmerged slots have been cleared out. You can use git ls-files --unmerged to view unmerged slots, or git status for a more human-friendly version. (Hint: use git status. Use it often!)
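The slot rules can be modeled with a dictionary keyed by (path, stage). This Python sketch is a simplification with hypothetical function names: real conflicts can leave some of slots 1-3 empty (for example, when one side deleted the file), and the real index stores blob IDs and file modes rather than text. As described above, checking out one side (as git checkout --ours or --theirs does) only rewrites the work-tree file, while git add writes slot 0 and wipes slots 1-3.

```python
from typing import Dict, List, Tuple

Index = Dict[Tuple[str, int], str]   # (path, stage) -> file content (toy model)
worktree: Dict[str, str] = {}


def record_conflict(index: Index, path: str, base: str, ours: str, theirs: str) -> None:
    """A conflicted merge: slot 0 is emptied; slots 1-3 hold base, ours, theirs."""
    index.pop((path, 0), None)
    index.update({(path, 1): base, (path, 2): ours, (path, 3): theirs})


def checkout_side(index: Index, path: str, stage: int) -> None:
    """Like git checkout --ours (stage 2) or --theirs (stage 3): work-tree only."""
    worktree[path] = index[(path, stage)]


def git_add(index: Index, path: str) -> None:
    """Like git add: copy the work-tree file into slot 0 and wipe slots 1-3."""
    for stage in (1, 2, 3):
        index.pop((path, stage), None)
    index[(path, 0)] = worktree[path]


def unmerged(index: Index) -> List[str]:
    """Paths with anything in slots 1-3, roughly git ls-files --unmerged."""
    return sorted({path for path, stage in index if stage != 0})


def can_commit(index: Index) -> bool:
    return not unmerged(index)


index: Index = {("foo.java", 0): "v0"}
record_conflict(index, "foo.java", "base", "ours", "theirs")
print(unmerged(index), can_commit(index))   # ['foo.java'] False
checkout_side(index, "foo.java", 2)          # work-tree now holds "ours"
print(unmerged(index))                       # ['foo.java']  (slots untouched)
git_add(index, "foo.java")
print(index, can_commit(index))              # {('foo.java', 0): 'ours'} True
```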

Successful merge does not mean good code

Even if git merge successfully auto-merges everything, that does not mean the result is correct! Likewise, when it stops with a conflict, that means only that Git was not able to auto-merge everything, not that what it did auto-merge on its own is correct. I like to set merge.conflictstyle to diff3 so that I can see what Git thought the base was, before it replaced that "base" code with the two sides of the merge. Often a conflict happens because the diff lined up the wrong base lines (such as some matching braces and/or blank lines), rather than because there had to be an actual conflict.

Using the "patience" diff can help with poor base choice, at least in theory. I have not experimented with this myself. The new "compaction heuristic" in Git 2.9 is promising, but I have not experimented with this either.

You must always inspect and/or test the results of a merge. If the merge is already committed, you can edit files, build and test, git add the corrected versions, and use git commit --amend to shove the previous (incorrect) merge commit out of the way and put in a different commit with the same parents. (The --amend part of git commit --amend is false advertising. It does not change the current commit itself, because it cannot; instead, it makes a new commit with the same parent IDs as the current commit, instead of the normal method of using the current commit's ID as the new commit's parent.)
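A toy object store shows why --amend replaces the merge rather than editing it. In this Python sketch the Commit type, the ID format, and the function names are all invented (the ID is a hash of the fields, not Git's real object format). An ordinary commit takes the current tip as its only parent; amend reuses the old commit's parents, so an amended merge keeps both of its parents, and the old commit still exists, merely no longer pointed to by the branch.

```python
import hashlib
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class Commit:
    tree: str                   # stands in for a snapshot of the files
    parents: Tuple[str, ...]
    message: str

    def oid(self) -> str:
        # Toy ID: a hash of the commit's fields (not Git's real object format).
        raw = f"{self.tree}\n{' '.join(self.parents)}\n{self.message}"
        return hashlib.sha1(raw.encode()).hexdigest()[:12]


objects: Dict[str, Commit] = {}   # commits are only ever added, never modified
refs: Dict[str, str] = {}         # branch name -> tip commit ID


def store(c: Commit) -> str:
    objects[c.oid()] = c
    return c.oid()


def commit(branch: str, tree: str, message: str) -> None:
    """Normal commit: the parent is the current tip; then move the branch name."""
    refs[branch] = store(Commit(tree, (refs[branch],), message))


def amend(branch: str, tree: str, message: str) -> None:
    """--amend: a *new* commit with the same parents as the current tip."""
    old = objects[refs[branch]]
    refs[branch] = store(Commit(tree, old.parents, message))


root = store(Commit("t0", (), "root"))
ours = store(Commit("t1", (root,), "our work"))
theirs = store(Commit("t2", (root,), "their work"))
bad_merge = store(Commit("t3-broken", (ours, theirs), "merge"))
refs["main"] = bad_merge

amend("main", "t3-fixed", "merge (fixed)")
fixed = objects[refs["main"]]
print(fixed.parents == (ours, theirs), bad_merge in objects)   # True True
```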

You can also suppress the auto-commit of a merge with --no-commit. In practice, I have found little need for this: most merges mostly just work, and a quick eyeballing of git show -m and/or "it compiles and passes unit tests" catches problems. However, during a conflicted or --no-commit merge, a simple git diff will give you a combined diff (the same sort you get with git show without -m, after you commit the merge), which can be helpful, or may be more confusing. You can run more-specific git diff commands and/or inspect the three (base, local, other) slot entries, as Gregg noted in a comment.

Seeing what Git will see

Besides using diff3 as your merge.conflictstyle, you can see the diffs that git merge will see. All you need to do is run two git diff commands—the same two that git merge will run.

To do these, you must find—or at least, tell git diff to find—the merge base. You can use git merge-base, which literally finds the (or all) merge base(s) and prints them out:

    $ git merge-base --all HEAD foo
    4fb3b9e0570d2fb875a24a037e39bdb2df6c1114

This says that between the current branch and branch foo, the merge base is commit 4fb3b9e... (and there is only one such merge base). I can then run git diff 4fb3b9e HEAD and git diff 4fb3b9e foo. But there is an easier way, as long as I can assume that there is only the one merge base:

    $ git diff foo...HEAD    # note: three dots

This tells git diff (and only git diff) to find the merge base between foo and HEAD, and then compare that commit—that merge base—to commit HEAD. And:

    $ git diff HEAD...foo    # again, three dots

does the same thing: it finds the merge base between HEAD and foo ("merge base" is commutative, so this should be the same as the other way around, just as 7+2 and 2+7 are both 9), but this time it diffs the merge base against commit foo.[1]

(For other commands—things that are not git diff—the three-dot syntax produces a symmetric difference: the set of all commits that are on either branch, but not on both branches. For branches with a single merge base commit, this is "every commit after the merge base, on each branch": in other words, the union of the two branches, excluding the merge base itself and any earlier commits. For branches with multiple merge bases, this subtracts away all the merge bases. For git diff we just assume there's only the one merge base, and instead of subtracting it and its ancestors away, we use it as the left or "before" side of the diff.)
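The symmetric difference is easy to express with sets. This Python sketch uses a made-up graph and a hypothetical helper named reachable(); it computes what the three-dot form selects for commands such as git log (not git diff): everything reachable from either tip, minus what is reachable from both. The criss-cross example shows both merge bases being subtracted away.

```python
from typing import Dict, Set, Tuple

Graph = Dict[str, Tuple[str, ...]]   # hypothetical toy history: commit -> parents


def reachable(graph: Graph, tip: str) -> Set[str]:
    """The tip plus all of its ancestors, following every parent of a merge."""
    seen: Set[str] = set()
    stack = [tip]
    while stack:
        c = stack.pop()
        if c not in seen:
            seen.add(c)
            stack.extend(graph[c])
    return seen


def three_dot(graph: Graph, a: str, b: str) -> Set[str]:
    """Commits reachable from a or from b, but not from both (symmetric difference)."""
    return reachable(graph, a) ^ reachable(graph, b)


# A - B - C       (HEAD)
#      \
#       D - E     (foo)
graph: Graph = {"A": (), "B": ("A",), "C": ("B",), "D": ("B",), "E": ("D",)}
print(sorted(three_dot(graph, "C", "E")))   # ['C', 'D', 'E']

# Criss-cross history with two merge bases (B and C): both drop out.
criss: Graph = {"A": (), "B": ("A",), "C": ("A",), "D": ("B", "C"), "E": ("C", "B")}
print(sorted(three_dot(criss, "D", "E")))   # ['D', 'E']
```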


[1] In Git, a branch name identifies one particular commit, namely the tip of the branch. In fact, this is how branches actually work: a branch name names a specific commit, and in order to add another commit to the branch (branch here meaning the chain of commits), Git makes a new commit whose parent is the current branch tip, then points the branch name at the new commit. The word "branch" can refer to either the branch name or the entire chain of commits; we are supposed to figure out which one from context.

At any time, we can name one specific commit and treat it as a branch, by taking that commit and all its ancestors: its parent, its parent's parent, and so on. When we hit a merge commit (a commit with two or more parents) in this process, we take all of its parent commits, and their parents, and so on.

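[2] This algorithm is actually selectable. The default, myers, is based on an algorithm by Eugene Myers, but Git has a few other options (minimal, patience, and histogram), chosen with --diff-algorithm or the diff.algorithm setting.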
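For the curious, the core of Myers' algorithm fits in a few lines. This Python sketch computes only the length D of the shortest edit script (insertions plus deletions), not the script itself, and it is a textbook version rather than Git's xdiff code, which adds heuristics on top. The first example pair is the one used in Myers' 1986 paper; the function name is made up.

```python
from typing import Dict, Sequence


def myers_distance(a: Sequence[str], b: Sequence[str]) -> int:
    """Length D of the shortest edit script (deletions + insertions) from a to b."""
    n, m = len(a), len(b)
    v: Dict[int, int] = {1: 0}          # v[k]: furthest x reached on diagonal k = x - y
    for d in range(n + m + 1):          # try 0 edits, then 1, then 2, ...
        for k in range(-d, d + 1, 2):
            if k == -d or (k != d and v[k - 1] < v[k + 1]):
                x = v[k + 1]            # move down: insert an item from b
            else:
                x = v[k - 1] + 1        # move right: delete an item from a
            y = x - k
            while x < n and y < m and a[x] == b[y]:
                x, y = x + 1, y + 1     # free diagonal "snake" over equal items
            v[k] = x
            if x >= n and y >= m:
                return d
    return n + m                        # not reached: d = n + m always succeeds


print(myers_distance("abcabba", "cbabac"))               # 5
print(myers_distance(["a", "b", "c"], ["a", "x", "c"]))  # 2
```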