Directory renaming in SCM

SCM stands for Source Code Management. Pretty much the same thing is also called VCS, Version Control Software. There are probably even more TLAs out in the wild. It all boils down to a program that lets programmers manage their source code.

Pretty much everybody who started using SCM started with CVS and then moved on to something else, probably Subversion, which is meant to be a CVS replacement. For more adventurous or demanding developers, there are many other SCMs: Git, Bazaar, Monotone, Mercurial, Darcs… and more.

Mark Shuttleworth has made an interesting claim: that file and directory renaming is one of the most important operations for an SCM to handle. I got curious and wrote a test case for three SCMs I know: Bazaar, Git and Subversion. The scenario is:

1. A project is created, with one directory and one file in it.

`-- project
    `-- dir-a
        `-- foo.txt

2. Developer A creates a branch and renames the directory

`-- project
    `-- dir-b
        `-- foo.txt

3. At the same time, developer B creates a branch, modifies the file
and adds another file to the directory.

`-- project
    `-- dir-a
        |-- bar.txt
        `-- foo.txt

4. Developer C creates a branch, merges A’s changes first and then B’s
changes. What I would expect after the two merges is one directory
with two files in it.

`-- project
    `-- dir-b
        |-- bar.txt
        `-- foo.txt

Of the SCMs tested, only Bazaar and Darcs got it right. Git left two directories (dir-a and dir-b), and Subversion discarded the bar.txt file altogether.
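The four steps above can be sketched for Git roughly as follows. This is not the script from the tarball: the branch names, file contents, commit messages and the `rename_scenario` function name are all made up for illustration, and it uses three branches in a single repository rather than separate clones. The outcome may depend on the Git version and its configuration, so the sketch only prints where the files ended up instead of assuming a result.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of the rename scenario for Git; not the script
# shipped in the original tarball.
set -euo pipefail

# Run the four steps in a repository created under directory $1 and
# print the files left in the working tree after both merges.
rename_scenario() (
    set -e
    cd "$1"
    git init -q project
    cd project
    # Local identity so commits work on a machine without global config.
    git config user.name "Developer C"
    git config user.email "dev@example.com"

    # Step 1: one directory with one file in it.
    mkdir dir-a
    printf 'one\ntwo\nthree\n' > dir-a/foo.txt
    git add dir-a/foo.txt
    git commit -q -m "Create project"
    base=$(git rev-parse HEAD)

    # Step 2: developer A renames the directory.
    git checkout -q -b branch-a "$base"
    git mv dir-a dir-b
    git commit -q -m "Rename dir-a to dir-b"

    # Step 3: developer B modifies foo.txt and adds bar.txt.
    git checkout -q -b branch-b "$base"
    echo 'four' >> dir-a/foo.txt
    echo 'bar' > dir-a/bar.txt
    git add dir-a
    git commit -q -m "Modify foo.txt, add bar.txt"

    # Step 4: developer C merges A first, then B.
    git checkout -q -b branch-c "$base"
    git merge -q --no-edit branch-a >&2
    git merge -q --no-edit branch-b >&2 || echo "second merge reported a conflict" >&2

    # Show where the files ended up.
    find . -path ./.git -prune -o -type f -print | sort
)

rename_scenario "$(mktemp -d)"
```

If bar.txt shows up under dir-a, the rename was not applied to the new file. If it shows up under dir-b, it was.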

I wrote test cases for the three SCMs I’m familiar with. If you feel like writing a test case for your SCM, I will be glad to see it! Meanwhile, you can download the test cases and run them:

They are written as Bash scripts and were tested under Linux. A README file is included.

UPDATE: Dennis Lambe added a Darcs test case. I am pleased to tell you that Darcs handles the directory rename correctly: after the merge, there is one directory with two files. Dennis’ file is now included in the tarball.

UPDATE: I’ve noticed a short conversation about this issue on the #git IRC channel. No conclusions were drawn.

UPDATE, 2007-09-10: Paul wrote a Mercurial test case, now included in the tar.gz archive.

Author: automatthias

You won't believe what a skeptic I am.

17 thoughts on “Directory renaming in SCM”

  1. It seems that Mercurial can also pass this test. I modified the script to try it out. The only thing of note is that after both branch-a and branch-b are pulled, the two heads need to be merged, which went through without any issues.

  2. m–s:
    sooo… the reality is: darcs is not yet ready for production use (but I really REALLY do hope it will be)

    Darcs 2 came out and largely alleviated those bugs. As it stands, only Windows compatibility is really a problem, if you hate Cygwin.

  3. Clarification — Darcs2 is _going_ to come out. It is currently out as a release candidate, but still undergoing testing and some development.

    I use the current darcs on smaller projects all the time and enjoy it greatly. I’m looking forward to being able to recommend darcs2 for projects of all sizes.

  4. I realize this is old, but I went ahead and wrote a test for Monotone:

    Monotone 0.38 seems to do the right thing here. I also went out of my way to do extra work to more closely simulate multiple developers working off of multiple DBs, which is probably unnecessary (I think you could simulate this and see the same results using a single DB).

  5. Darcs2 still has a ton of problems. Including ridiculous RAM usage and O(something-large) algorithms.

    I have a darcs repo with 1900-ish patches in it and I want to move to git. However, tailor/darcs2git/etc need to be able to perform a ‘darcs pull’ one patch at a time. Darcs2 (or darcs 1) cannot even complete pulling the first patch in the repo. I have a machine with 4GB RAM and darcs runs out of memory.

    I cannot run annotate on files in my repo either, which is part of the reason I want to dump it – my history has become useless. And unless I can get my history into something else, it’s all gone forever. (or until someone can lend me a machine with much more RAM than I have!)

    Darcs is not suitable for production use if you care about your history. Learn git/hg/bzr/monotone – they are all infinitely more reliable, albeit not quite as simple to use.

    Apologies for the rant.

  6. Trying this with git now (v1.6.0), it produced a merge conflict.

    From ../a
    * branch master -> FETCH_HEAD
    error: Untracked working tree file ‘dirc/bar.txt’ would be overwritten by merge.
    fatal: merging of trees ef348a6d2cc53769cd511d2328da08b94b27d928 and 19bf81ec83d14c2b5151fc89ee5f470b3a2ed88b failed
    Merge with strategy recursive failed.

  7. I can see how Git might be viewed as doing the wrong thing here. I can also see how your expected outcome would be confusing to others, even downright harmful.

    Since Git tracks /content/, it only tracks files, not directories. Directories only exist as part of a file’s path, as far as Git is concerned. So when you do “git mv dir-a dir-b” to move/rename a directory, you’re just changing the paths of all the files contained in dir-a. The fact that there’s nothing left in dir-a after this operation makes no difference to Git. That is, if you had done “mkdir dir-b && git mv dir-a/* dir-b/ && rmdir dir-a” the result would have been the same.

    Now, in your example, you merge a branch that moved all files in dir-a to dir-b, you then merge a branch that created a new file in dir-a before the move had happened. It should seem obvious now that the way Git handles this is quite logical. It does not move the new file in dir-a over to dir-b automatically. It would be potentially dangerous to have Git do that by default. How can Git know that it won’t break anything? It can’t know whether the author of the file planned for it to be moved freely. The author may have made changes in other files that expect this file to be where he put it, etc. For this reason I think Git should definitely not move files around like that. In DVCSs where directories are tracked specially (not the case with Git, don’t know about Bazaar or Darcs), this situation should be considered a conflict to be resolved, for the same reasons.

  8. Your Git version of the test does indeed produce the unexpected result of having two directories instead of just one, dir-b. While this is true as far as the script is concerned, it doesn’t mean this is the way Git is meant to be used with multiple developers. I don’t know about other SCMs, but what I instantly recognized is that the test case doesn’t mimic a multi-developer workflow at all. Instead it mimics three developers working on the very same *local repository*, and this may very well be the culprit for the “wrong” result.

    In fact, having three devs each working on their own local repo does not produce the problem. Instead, if “Developer A” has not pushed the rename change yet (or “Developer B” doesn’t update his repo), “Developer B” will be denied when pushing his changes upstream and will need to resolve them. As you may know, these are “non-fast-forward pushes”, and the check is there to prevent any attempt at overwriting others’ changes. Of course Git offers you a way to force it too, but that should be avoided.

  9. * reformulating the “non-fast-forward push” case *

    dev-b gets a non-fast-forward push condition if dev-a pushed the rename change but dev-b pulled from the origin before dev-a completed its push. In that case, a pull/fetch by dev-b is needed to bring in the remote commit and notice the rename.

  10. It’s been two years. If you’re so inclined, I would love to see an updated set of results for this with the modern versions of these systems.

  11. Veracity2 seems to get it right, though its GUI is rather limited and I’m not sure I repeated it exactly. I had to commit between merges, and had C create an extra file to allow a commit, which in turn allowed creating the new branch.

  12. So it’s now five years later, and it looks like git+github has almost won the DVCS war, with mercurial in second place, and bazaar a distant third. The others are so far behind they are not worth considering. Git still can’t do directory renames right (understanding git well enough to expect it to get it wrong doesn’t make it right), but mercurial can (which surprised me). I was on the verge of deciding to go with the flow and switch to git, but directory renames are important enough for me to stick with mercurial (which can push/pull directly from git repos anyway).
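The claim in comment 7, that to Git a directory rename is nothing more than a path change for every file in it, can be checked with a small sketch like the one below. It compares the tree hash Git records after `git mv dir-a dir-b` with the one recorded after moving the files one by one. The helper name `tree_after` and the file contents are invented for the illustration. The sketch only shows that the two recorded trees are the same; it says nothing about how a later merge treats them.

```shell
#!/usr/bin/env bash
# Hypothetical check: does Git record a whole-directory rename any
# differently from moving each file individually?
set -euo pipefail

# Build a throwaway repository containing dir-a/foo.txt, run the rename
# command passed as $1, and print the hash of the resulting tree.
tree_after() (
    set -e
    cd "$(mktemp -d)"
    git init -q .
    mkdir dir-a
    echo 'hello' > dir-a/foo.txt
    git add dir-a/foo.txt
    # $1 is a fixed string supplied by this script, never user input.
    eval "$1"
    git write-tree
)

whole_dir=$(tree_after 'git mv dir-a dir-b')
file_by_file=$(tree_after 'mkdir dir-b && git mv dir-a/* dir-b/ && { rmdir dir-a 2>/dev/null || true; }')

if [ "$whole_dir" = "$file_by_file" ]; then
    echo "identical trees: $whole_dir"
else
    echo "trees differ: $whole_dir vs $file_by_file"
fi
```

Both variants leave only dir-b/foo.txt in the index, so the two hashes should match, which is the sense in which Git has no separate record of a directory having been renamed.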
