Sunday, March 13, 2011

Merging Branches with SVN

When I read things like "the world's most popular open source version control system" and (paraphrased) "designed to fix CVS's problems", I don't think "took important steps backwards", but I find that aspects of SVN have done just that. In particular, merging tags/branches becomes a challenge when the merge source has multiple revision anchors.

As expected, you can accomplish simple merges quickly using the explanations of commands in the SVN Red-bean Book. However, if you're working with multiple committers, or even one or two folks who tag/branch liberally, you'll run into unexpected results quickly. The problem stems from the fact that SVN's merge command does not automatically resolve the previous branch points for a merge on a file-by-file basis. As such, when you give a
svn merge -r<source>:<target> ...
command, you may not be giving SVN enough information to do the correct thing. This is particularly the case if you or your fellow committers merge in bulk from the project's root directory.

Consider an example:

(*)         - r1 User A
 |
(*)         - r2 User A
 |  \
 |    \ 
 |     (*)  - r3 User B
(*)     |   - r4 User C
 |   \  |
 |     (*)  - r5 User A 
 |     /
 |   /
 (?)

In the case visualized above, we have three users committing code to two branches. Remember, SVN uses a single global (to the repository, not the project) incrementally increasing value to represent a revision. In the case represented, one user (A) commits twice (r1 and r2), and then a second user (B) branches (r3).

Subsequent to that branch, A continues developing while C commits to what we'll refer to as the 'trunk'. Then, maybe because of a bug fix or a lunch discussion, A commits to the branch. What we have in r5 is a commit whose files/changes (r4) were drawn from the trunk before the modifications were made.

The question: how does A merge trunk and branch while 1) causing the least amount of pain and 2) saving the highest resolution of change meta-data (and thus making subsequent situations like the r4-->r5 merge less painful)?

While SVN supports multiple approaches to branching/merging, I've found that the solution that best optimizes #1 and #2 above involves a crucial extra step. Consider the following method:
  1. Make sure your own branch is up to date
  2. Determine the revision at which the source material forked from the target
  3. Construct a merge command based on the computed source revision
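
As a rough sketch, the three steps might look like the shell function below. The name merge_branch, its two arguments, and the SVN override variable are assumptions made for illustration. It also assumes a single anchor applies to the whole target, which, as the rest of this post shows, is not always true.

```shell
#!/bin/sh
# Hypothetical sketch of the three-step method; not a drop-in tool.
# SVN can be overridden (e.g. with a wrapper that only echoes) for a dry run.
SVN="${SVN:-svn}"

# $1 = URL of the branch to merge from
# $2 = working-copy path to merge into (defaults to the current directory)
merge_branch() {
    branch_url="$1"
    target="${2:-.}"

    # Step 1: make sure the working copy is up to date.
    $SVN update "$target" || return 1

    # Step 2: find the oldest revision on the branch since it was copied.
    anchor=$($SVN log --stop-on-copy "$branch_url" \
        | grep '^r[0-9]' | tail -n 1 | cut -f 1 -d ' ' | cut -f 2- -d 'r')
    if [ -z "$anchor" ]; then
        echo "merge_branch: could not determine an anchor revision" >&2
        return 1
    fi

    # Step 3: merge everything from the anchor up to HEAD.
    $SVN merge -r"${anchor}:HEAD" "$branch_url" "$target"
}
```

Nothing runs until you call merge_branch yourself, so you can read it through first.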

The step people skip is #2. In our example above, merging from the branch represented by commits r3-4 requires you to specify r3 as the source of the merge material, while merging files changed by commit r5 from the branch to the trunk requires specifying r5 to preserve the later merge meta-data (*1). To make this more concrete, imagine that A has a checkout of the trunk, currently at r1, that (s)he intends to update to r5 to reflect the branch's changes. Let's follow the process.
  1. Conducting an update will pull down files from r4, changed in the trunk
  2. Iterating through remaining files in the source tree, two sets of source anchors will be reported: Set X (r2) & Set Y (r5)
  3. Conducting the merge requires issuing two merge commands, each on its respective set from the previous bullet:
    svn merge -r2:HEAD <'branch' URL> <'trunk' path to Set X>
    svn merge -r3:HEAD <'branch' URL> <'trunk' path to Set Y>
    
If committers confine Set X to a single sub-directory, then the commands in the last list item can be issued as they're parametrized: as single <'branch' src> <'trunk' target> tuples. However, if changed files are spread across sub-directories, developers conducting merges will have to issue multiple commands, each specifying its own 'branch' source URL and target path. Yes, this frustrates everybody involved. The upshot? As the person conducting the merge, you move slowly and methodically through it, understanding the changes to each file / directory explicitly (especially where changes have fractured themselves across directories). This has also caused me and my development lead to merge between branches more often than on previous projects, which keeps us more in sync with each other.
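
When the sets are spread across sub-directories, it can help to write the anchors down first and generate the commands from that list. The following is only a sketch: print_merge_commands is a made-up name, the branch URL and the directory names lib and src/ui are placeholders, and it prints the commands for review rather than running them.

```shell
#!/bin/sh
# Hypothetical helper: read "<anchor-revision> <path>" pairs on stdin and
# print one merge command per pair, using the given branch URL as the source.
print_merge_commands() {
    branch_url="$1"
    while read -r anchor path; do
        # Skip blank lines in the input.
        [ -n "$anchor" ] || continue
        echo "svn merge -r${anchor}:HEAD ${branch_url}/${path} ${path}"
    done
}

# Placeholder branch URL and directory names, for illustration only.
BRANCH_URL="https://svn.myorg.org/svn/repos/dev/myapp/bugfix/current"
printf '2 lib\n5 src/ui\n' | print_merge_commands "$BRANCH_URL"
```

The output is plain text, so paths containing spaces would need quoting by hand before you paste a command into a shell.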

I provide two tools to make the process easier. First, a simple script to accomplish merge step #2: determine the revision at which the source material forked:

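#!/bin/sh
# Print the oldest revision of the given file / directory since it was last copied (branched).
svn log --stop-on-copy "$1" | grep '^r[0-9]' | tail -n 1 | cut -f 1 -d ' ' | cut -f 2- -d 'r'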

Name this file something like SVN_determine_revision_anchor and pass it the file / directory whose last branch point you want to know.
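
To see what that pipeline does without touching a repository, you can feed it log text by hand. In the sketch below, anchor_from_log is a hypothetical name for the filter portion of the script (grep here matches only lines that start with r and a digit), and the sample log is made up in the shape of svn log --stop-on-copy output.

```shell
#!/bin/sh
# The filter portion of the script, reading log text from stdin
# instead of calling svn. anchor_from_log is a hypothetical name.
anchor_from_log() {
    grep '^r[0-9]' | tail -n 1 | cut -f 1 -d ' ' | cut -f 2- -d 'r'
}

# Made-up sample in the shape of `svn log --stop-on-copy` output.
sample_log='------------------------------------------------------------------------
r5 | usera | 2011-03-10 12:00:00 -0500 (Thu, 10 Mar 2011) | 1 line

Fix on the branch
------------------------------------------------------------------------
r3 | userb | 2011-03-08 09:30:00 -0500 (Tue, 08 Mar 2011) | 1 line

Create the branch
------------------------------------------------------------------------'

# The log lists newest first, so the last revision line is the branch point.
printf '%s\n' "$sample_log" | anchor_from_log   # prints 3
```
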
If you don't like calling the merge command manually (long URL paths can make this a pain), use something like the following, which I named merge.sh:

 
#!/bin/sh
# Usage: merge.sh <source> <target>
# Run it from the working copy you want to merge into.

URL_PREFACE="https://svn.myorg.org/svn/repos/dev/myapp"
URL_SUFFIX="current"

TO_MERGE="${URL_PREFACE}/$1/${URL_SUFFIX}"
TARGET="${URL_PREFACE}/$2/${URL_SUFFIX}"

# Oldest revision on the source since it was last copied (its branch point).
SUPPLYING_BRANCH_REV=$(svn log --stop-on-copy "${TO_MERGE}" | grep '^r[0-9]' | tail -n 1 | cut -f 1 -d '|' | cut -f 2 -d 'r' | sed 's/ //')

# Current revision of the target.
TARGET_BRANCH_REV=$(svn info "${TARGET}" | grep '^Revision' | cut -f 2- -d ':' | sed 's/^[ ]*//')

echo "Merging ${TO_MERGE}@${SUPPLYING_BRANCH_REV} with ${TARGET}@${TARGET_BRANCH_REV}"
svn merge -r"${SUPPLYING_BRANCH_REV}:${TARGET_BRANCH_REV}" "${TO_MERGE}"

Call this script with two parameters: first the source of the merge information, then the target tag. You'll note that this second script, in essence, encapsulates the functionality of the first.
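
To make the parameters concrete, here is roughly how merge.sh would expand them for a call like "sh merge.sh bugfix trunk". The helper name expand_url is my own, and bugfix and trunk are placeholder names, not paths from a real repository.

```shell
#!/bin/sh
# Same variables as merge.sh; expand_url is a hypothetical helper name.
URL_PREFACE="https://svn.myorg.org/svn/repos/dev/myapp"
URL_SUFFIX="current"

# Build the full URL for one script parameter, as merge.sh does.
expand_url() {
    echo "${URL_PREFACE}/$1/${URL_SUFFIX}"
}

# "bugfix" and "trunk" are placeholders for the two parameters.
echo "source: $(expand_url bugfix)"
echo "target: $(expand_url trunk)"
```

Each parameter lands between the URL preface and suffix, so the values you pass depend on how your repository lays out its branches and tags.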

Summary
When conducting merges with SVN, please consider the source revision and forks carefully (on a file-by-file basis if necessary). While other mechanisms may work to merge things without this step, combining multiple forks later will likely create unexpected conflicts and difficulty.

(*1) To me, this represents defeat on the part of the version control system. What purpose should a version control system serve if not to keep track of this very branch information for use in resolving merge scenarios?