Fascinating. Can someone detail the pros/cons of this? Are they copying/cherry-p...

anarazel · on Feb 9, 2017

I think there are two different discussions here. One about cherry-picking for back-branch releases, one about development for the new version.

Re back-branches: It's not particularly practical to use merges from/to stable branches. It may sound nice at first, but there are enough fixes that don't apply to all versions (because a feature didn't exist yet, because refactorings made bugs "accidentally" disappear, ...) that merging either to or from back-branches leads to very messy merges, where the merge commit has to back out changes and such. I don't know of any larger project that successfully uses merges to or from stable branches. So yes, it's just cherry-picking.

Re development: The case here is a lot less clear. The project used CVS for a long while, and during the migration it was decided not to make the migration harder by changing workflows even more significantly than just CVS->git. So the (enforced) policy became that no merge commits are to appear. It hasn't become a significant problem since, so that policy hasn't changed so far. Given the relatively small number of active committers in the project, and the fact that commits in the project are expected to "stand on their own" (i.e. are complete and working), rebasing changes before committing them isn't a problem. Several contributors, including committers, do their development in separate repositories / branches, however. But usually the history there is messy enough that those branches wouldn't be merged anyway. I think merging is more of a benefit when you have a more hierarchically organized project like the kernel, with subsystem maintainers who aggregate a large volume of changes, which then get pushed to Linus, who then integrates all of those.

Hope that roughly answers your question? If not, feel free to ask more about specifics.

(For context: I'm one of the committers in the project)

anarazel · on Feb 9, 2017

Oh, and because it's relevant for the question: There's a tool in the tree (https://git.postgresql.org/gitweb/?p=postgresql.git;a=blob;f...) that shows (by heuristics) which branches a commit has been cherry-picked to. Looks like this:

    Author: Tom Lane <tgl@sss.pgh.pa.us>
    Branch: master [9d4ca0131] 2017-01-26 12:18:07 -0500
    Branch: REL9_6_STABLE [2dfc12647] 2017-01-26 12:17:47 -0500
    Branch: REL9_5_STABLE [423ad86f4] 2017-01-26 12:17:47 -0500
    Branch: REL9_4_STABLE [2c1976a6c] 2017-01-26 12:17:47 -0500
    Branch: REL9_3_STABLE [2e024f83b] 2017-01-26 12:17:47 -0500
    Branch: REL9_2_STABLE [fe6120f9b] 2017-01-26 12:17:47 -0500
    
        Ensure that a tsquery like '!foo' matches empty tsvectors.
        
        !foo means "the tsvector does not contain foo", and therefore it should
        match an empty tsvector.  ts_match_vq() overenthusiastically supposed
        that an empty tsvector could never match any query, so it forcibly
        returned FALSE, the wrong answer.  Remove the premature optimization.
        
        Our behavior on this point was inconsistent, because while seqscans and
        GIST index searches both failed to match empty tsvectors, GIN index
        searches would find them, since GIN scans don't rely on ts_match_vq().
        That makes this certainly a bug, not a debatable definition disagreement,
        so back-patch to all supported branches.
        
        Report and diagnosis by Tom Dunstan (bug #14515); added test cases by me.
        
        Discussion: https://postgr.es/m/20170126025524.1434.97828@wrigleys.postgresql.org

gsylvie · on Feb 9, 2017

git patch-id is not bad for this:

Here are the four commits with "Fix roundoff problems in float8_timestamptz() and make_interval()" as their commit message:

  $ git log --pretty="%h - %d" --date-order --all --grep=float8_time
  1888fad440 -  (origin/REL9_4_STABLE)
  7786b98482 -  (origin/REL9_5_STABLE)
  404756fe89 -  (origin/REL9_6_STABLE)
  8f93bd8512 -

And here's what "git patch-id" thinks of them:

  $ git log -p --date-order --all --grep=float8_time | git patch-id
  cd559dd389e09655a64c3c6f41c2f76e6fd72b77 1888fad440036195c7e7a933fc17410fad8dcc3d
  cd559dd389e09655a64c3c6f41c2f76e6fd72b77 7786b984825ea720aed3a11ee465dc3d6cfc8d96
  63a51849977948d89992cbfbbd86c2598c92072f 404756fe89f62735f6075abb594b54be9c262b27
  63a51849977948d89992cbfbbd86c2598c92072f 8f93bd8512466c9b6c4dbc1e5efd0f72b8e2be9a

It can tell that the 9.4 and 9.5 patches are identical to each other, and that the 9.6 and master patches are also identical to each other. But there were differences between the two groups.
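
To make that grouping explicit, here is a minimal sketch that buckets commits by patch ID. It assumes the two-column output format shown above (patch ID, then commit hash) and feeds the quoted output in as literal text; `group_by_patch_id` is a hypothetical helper name, not part of git. In a real checkout you would pipe `git log -p ... | git patch-id` into it instead.

```shell
#!/bin/sh
# Group commits by patch ID.
# Input: lines of "<patch-id> <commit-hash>", as printed by "git patch-id".
# Output: one line per distinct patch ID, listing abbreviated commit hashes.
group_by_patch_id() {
  awk '
    {
      # Remember the order in which patch IDs are first seen.
      if (!($1 in seen)) { order[++n] = $1; seen[$1] = 1 }
      # Append the abbreviated commit hash to its patch-ID group.
      groups[$1] = groups[$1] " " substr($2, 1, 10)
    }
    END {
      for (i = 1; i <= n; i++)
        print substr(order[i], 1, 12) ":" groups[order[i]]
    }
  '
}

# Sample data copied from the git patch-id output quoted above.
patch_ids='cd559dd389e09655a64c3c6f41c2f76e6fd72b77 1888fad440036195c7e7a933fc17410fad8dcc3d
cd559dd389e09655a64c3c6f41c2f76e6fd72b77 7786b984825ea720aed3a11ee465dc3d6cfc8d96
63a51849977948d89992cbfbbd86c2598c92072f 404756fe89f62735f6075abb594b54be9c262b27
63a51849977948d89992cbfbbd86c2598c92072f 8f93bd8512466c9b6c4dbc1e5efd0f72b8e2be9a'

printf '%s\n' "$patch_ids" | group_by_patch_id
```

For the sample data this prints two lines, `cd559dd389e0: 1888fad440 7786b98482` and `63a518499779: 404756fe89 8f93bd8512`, i.e. the 9.4/9.5 pair and the 9.6/master pair.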

richardwhiuk · on Feb 9, 2017

This is fairly easy to do, if you enforce a policy of requiring cherry-picks to reference the original commit (i.e. requiring -x in the git cherry-pick command).

You can then grep the log for all the commits which contain the commit ID you are interested in (and any that mention those commits).
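
A rough sketch of that workflow, run in a throwaway repository so it does not touch real history. The branch name `REL_STABLE`, the file name, and the commit messages are made up for illustration; the only assumption about git itself is that `cherry-pick -x` appends a "(cherry picked from commit ...)" line to the message.

```shell
#!/bin/sh
# Sketch: record cherry-pick provenance with -x, then find backports by grep.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email "dev@example.com"   # placeholder identity for the scratch repo
git config user.name "Dev"
git config commit.gpgsign false

echo base > file.txt
git add file.txt
git commit -q -m "Initial commit"
git branch REL_STABLE                      # hypothetical back branch

# The fix lands on the development branch first.
echo fix >> file.txt
git commit -q -am "Fix a bug"
fix=$(git rev-parse HEAD)

# Backport it, recording the original commit ID in the new commit message.
git checkout -q REL_STABLE
git cherry-pick -x "$fix" >/dev/null

# List every commit on any branch whose message mentions the original ID.
backports=$(git log --all --format='%h %s' --grep="$fix")
echo "$backports"
```

The final `git log` lists only the backported commit here, because the original commit's message does not contain its own ID.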

anarazel · on Feb 9, 2017

That's not always that helpful, because you often will want to cherry-pick from a branch other than master. E.g. when cherry-picking a change from master into 9.6, 9.5, 9.4, 9.3, 9.2, it'll often be easier to cherry-pick from 9.3 into 9.2, rather than from master into 9.2 (due to conflicts increasing the further back you go). Obviously you could do that manually or script it regardless. But the current heuristics have worked without problems for years, so there seems little reason to change things ;)

mjw1007 · on Feb 9, 2017

Even if fixes always applied cleanly to back branches, git doesn't provide a way to represent a 'backpatch'.

That is, once you've committed a patch to the current mainline, there's no official way to indicate that a commit on an older branch is "the same".

That's a shame, as there would be a natural way to indicate this in git's 'graph', if only you were allowed to tell it about additional parents after a commit is made. But that's not an option as long as a commit's parents are part of its hashed data.

(If you could add parents, you'd make a new commit by applying the patch to the branchpoint, merge that into the release branch, and also add it as a new parent to the original commit, turning the original commit into a merge.)

geekone · on Feb 9, 2017

Might be dated, but check out the No Merge Commits section of the article at https://lwn.net/Articles/409635/ for possible reasoning they had/have.

vog · on Feb 9, 2017

I think that's the only explanation.

justinlaster · on Feb 9, 2017

Probably rebasing more than anything. Cherry-picking is a messier, yet easier, way to get the same history; you just need to clean up after yourself.

Merge commits are useful, in my opinion, if there was significant work done to get two branches to "align" with each other. Otherwise, just rebase your work for a cleaner history.

So in short:

Merging: More "accurate" history about the work done to bring the work into the main/dev branch, yet messier.

Rebasing/cherry-picking: cleaner history, potentially at the expense of context and "accurate" history about the work done to bring in code to the main/dev branch.
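
As a small illustration of the rebase side of that trade-off, the sketch below rebases a feature branch onto the trunk in a scratch repository and then fast-forwards, which leaves a linear history with no merge commit. Branch, file, and commit names are invented for the example.

```shell
#!/bin/sh
# Sketch: rebase + fast-forward yields linear history without merge commits.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email "dev@example.com"   # placeholder identity for the scratch repo
git config user.name "Dev"
git config commit.gpgsign false

echo base > a.txt
git add a.txt
git commit -q -m "Initial commit"
trunk=$(git symbolic-ref --short HEAD)     # "master" or "main", depending on git config

# Work on a feature branch while the trunk moves on.
git checkout -q -b feature
echo feature > b.txt
git add b.txt
git commit -q -m "Add feature"

git checkout -q "$trunk"
echo more >> a.txt
git commit -q -am "Unrelated trunk change"

# Replay the feature commit on top of the current trunk, then fast-forward.
git checkout -q feature
git rebase -q "$trunk"
git checkout -q "$trunk"
git merge -q --ff-only feature

merges=$(git rev-list --merges --count HEAD)
commits=$(git rev-list --count HEAD)
echo "commits=$commits merges=$merges"
```

This prints `commits=3 merges=0`. A plain `git merge feature` without the rebase would instead add a fourth commit with two parents, which is the extra "accurate" context the merging option keeps.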