Subversion to Perforce Migration Issues and Approaches

I would like to reflect on the lessons learnt from recently completing the migration of 60 SVN repositories to Perforce for a client.

Migrations – Tarpits of Effort

The first thing to say, perhaps unsurprisingly, is that migrations can take a lot of time and effort if you aren’t doing them regularly, as we do at VIZIM :) There are invariably differences in tools and techniques, and it takes time to plan for the issues they raise.

If there is an off-the-shelf tool that does the migration for you, it is well worth a try, but in most cases various trade-offs will have been made within the tool and you may not be pleased with the results. Testing and evaluating the tool in itself takes time and effort.

If you choose to “roll your own”, be very careful: it may seem easy to get started, but there are usually a lot of edge cases and issues along the way that will suck up your time and effort. How much is it worth becoming an expert in this through hard-won experience, for a task that you are typically only going to do once?

Obviously as a consultant, I would say this, but consider bringing in help :)

Summary

In many respects SVN and P4 are very similar to each other, but there are a couple of key differences.

  • The easy bits were the basic adds/edits/deletes.
  • SVN revisions correspond (mostly!) straightforwardly with P4 changelists.
  • Branching needs some intelligence applied because of the different underlying implementations. The naive approach has considerable problems (unfortunately it’s the approach used by the official Perforce migrator), and our approach/tool was dramatically faster than the official tool.
  • Some SVN history (“kitchen sink revisions”) requires extra care and attention due to its complexity, e.g. in the same revision the user deletes a file and then replaces it with a branched copy (or vice versa), or deletes the parent directory, or any of lots of other edge cases…
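As a small illustration of that extra care, the sketch below folds one common kitchen-sink pattern, a delete followed by an add of the same path within one revision, into a single replace operation before it is replayed. The `NodeOp` record and `collapse_replacements` function are hypothetical names for illustration only; the harder variants mentioned above (the reverse order, or deleting the parent directory) are deliberately not handled.

```python
from typing import Dict, List, NamedTuple, Optional


class NodeOp(NamedTuple):
    """One path operation inside a single SVN revision (hypothetical record)."""
    action: str                      # "add", "change", "delete" or "replace"
    path: str
    copy_from: Optional[str] = None  # source path when the add is a copy


def collapse_replacements(ops: List[NodeOp]) -> List[NodeOp]:
    """Merge 'delete P' followed later by 'add P' into one 'replace P'."""
    result: List[NodeOp] = []
    pending_deletes: Dict[str, int] = {}  # path -> index of its delete in result
    for op in ops:
        if op.action == "delete":
            pending_deletes[op.path] = len(result)
            result.append(op)
        elif op.action == "add" and op.path in pending_deletes:
            # Overwrite the earlier delete so the pair becomes one replace
            index = pending_deletes.pop(op.path)
            result[index] = NodeOp("replace", op.path, op.copy_from)
        else:
            result.append(op)
    return result


if __name__ == "__main__":
    revision = [
        NodeOp("delete", "/trunk/build.xml"),
        NodeOp("add", "/trunk/build.xml", copy_from="/branches/dev/build.xml"),
        NodeOp("change", "/trunk/README"),
    ]
    for op in collapse_replacements(revision):
        print(op)
```

Keeping the replace at the position of the original delete preserves the order of the remaining operations in the revision.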

SVN vs P4 Branching

SVN tags and branches are the same thing; it is only by convention that they live in /tags and /branches respectively.

svndumptool -v log <path to repo>

will produce a nice history showing information such as:

------------------------------------------------------------------------
r157 | autobuild | 2007-06-05T13:11:44.145570Z | 1 line
Changed paths:
   A /tags/1.1-RC1 (from /trunk:156)

Milestone tag created: 1.1-RC1

An SVN tag or branch is just a reference to a from-path/from-revision pair. In Perforce, the closest analogy is a “dynamic” label (what the Perforce documentation calls an automatic label, i.e. one with its Revision: field set), e.g.

Label:  1.1-RC1

Revision:   @156

View:
    //depot/trunk/...

where the Revision: field identifies the changelist.
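Generating such a spec from the from-path/from-revision pair is mechanical. The sketch below assumes a hypothetical `//depot` root and, by default, that SVN revision numbers carry over unchanged as changelist numbers; since that correspondence only holds “mostly”, an optional revision-to-changelist map can override it. A spec printed by `p4 label -o` also carries fields such as Owner, Description and Options, which are omitted here.

```python
from typing import Dict, Optional


def label_spec(name: str, from_path: str, from_rev: int,
               depot_root: str = "//depot",
               rev_to_change: Optional[Dict[int, int]] = None) -> str:
    """Render the Label/Revision/View fields of a Perforce label spec.

    depot_root and the optional SVN-revision -> changelist map are
    assumptions; without a map the revision number is used unchanged.
    """
    change = (rev_to_change or {}).get(from_rev, from_rev)
    view = f"{depot_root}/{from_path.strip('/')}/..."
    return (f"Label:\t{name}\n\n"
            f"Revision:\t@{change}\n\n"
            f"View:\n\t{view}\n")


if __name__ == "__main__":
    # The tag from the log entry: /tags/1.1-RC1 copied from /trunk:156
    print(label_spec("1.1-RC1", "/trunk", 156))
```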

This leads on to a major problem…

SVN O(1) branching vs P4 O(N) branching

SVN branches (or tags) are just references and can be created in constant time. In Perforce, if you branch 1,000 files, you end up with 1,000 records in the db.integed table (and indeed a similar 1,000 records in db.rev, as they are new revisions). There are two issues with this:

  • This metadata starts to mount up over time (large db.* files can degrade performance and increase backup time)
  • The time taken to create a branch is proportional to the number of files being branched – O(N) – and this can become noticeable for larger repositories and lots of users (two tables need to be locked while the branch is being created).

Imagine doing this for 10k files per branch, or 30k files, or … (servers can be locked for minutes).
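A back-of-the-envelope calculation shows how quickly this mounts up. It assumes, as described above, one db.rev and one db.integed record per branched file; the 2,000 tags and 10,000 files per tag are illustrative numbers, not measurements from the migration.

```python
from typing import Dict


def naive_branch_records(tags: int, files_per_tag: int) -> Dict[str, int]:
    """Metadata rows created if every SVN tag becomes a full Perforce branch.

    Assumes one db.rev and one db.integed record per branched file.
    """
    rows = tags * files_per_tag
    return {"db.rev": rows, "db.integed": rows}


if __name__ == "__main__":
    print(naive_branch_records(2000, 10_000))
    # prints: {'db.rev': 20000000, 'db.integed': 20000000}
```

Twenty million rows in each of two tables, all written while those tables are locked, is the kind of growth that makes the naive approach thrash.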

Poor Performance of the Naive Approach

The simple approach is to create a full Perforce branch for every SVN tag or branch. Depending on the number of branches, this can take quite a long time.

For example, migrating a small test repository (288 revisions) with the official Perforce Subversion migration tool took 5 minutes 6 seconds.

Our tool took 15 seconds (about 5% of the time)!

In addition, the db.* files for the naive approach were significantly larger: 4 MB vs 2.5 MB.

The more tags you have, the worse the problem. We had several repositories with thousands of tags; for a test migration using the naive approach, we killed it after 24 hours. The db.* files were nearly 20 GB at that point and the server was effectively thrashing and getting slower and slower. Our actual migration using the approach below took just over 3 hours for that repository.

Our Approach

There are a couple of reasons for our tool’s dramatic speed advantage:

  • We read and parse SVN dump files, which contain both the revision information and the contents of the files.
  • We handle tags intelligently and automatically.
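A dump file delivers every revision’s node records (path, kind, action and copy source) in one sequential stream. The following is a minimal sketch of that reading step, not our tool’s code: it assumes dump format version 2 or 3, where each record is a block of `Key: value` headers ended by a blank line, followed by `Content-length` bytes of property and text content. It keeps only the node metadata and skips over the content bytes; the `Node`, `Revision` and `parse_dump` names are hypothetical.

```python
import io
from dataclasses import dataclass, field
from typing import BinaryIO, Dict, Iterator, List, Optional, Tuple


@dataclass
class Node:
    path: str
    action: str                      # add, change, delete or replace
    kind: Optional[str] = None       # file or dir (absent on some deletes)
    copyfrom_path: Optional[str] = None
    copyfrom_rev: Optional[int] = None


@dataclass
class Revision:
    number: int
    nodes: List[Node] = field(default_factory=list)


def _read_headers(stream: BinaryIO) -> Optional[Dict[str, str]]:
    """Read one 'Key: value' header block; return None at end of file."""
    line = stream.readline()
    while line == b"\n":             # blank lines separate records
        line = stream.readline()
    if not line:
        return None
    headers: Dict[str, str] = {}
    while line and line != b"\n":
        key, _, value = line.decode("utf-8").rstrip("\n").partition(": ")
        headers[key] = value
        line = stream.readline()
    return headers


def _iter_records(stream: BinaryIO) -> Iterator[Tuple[Dict[str, str], bytes]]:
    """Yield (headers, content) pairs; content holds properties and file text."""
    while (headers := _read_headers(stream)) is not None:
        length = int(headers.get("Content-length", "0"))
        yield headers, stream.read(length)


def parse_dump(stream: BinaryIO) -> List[Revision]:
    revisions: List[Revision] = []
    for headers, _content in _iter_records(stream):
        if "Revision-number" in headers:
            revisions.append(Revision(int(headers["Revision-number"])))
        elif "Node-path" in headers and revisions:
            rev = headers.get("Node-copyfrom-rev")
            revisions[-1].nodes.append(Node(
                path=headers["Node-path"],
                action=headers["Node-action"],
                kind=headers.get("Node-kind"),
                copyfrom_path=headers.get("Node-copyfrom-path"),
                copyfrom_rev=int(rev) if rev else None,
            ))
        # Format-version and UUID records are ignored.
    return revisions


if __name__ == "__main__":
    sample = (b"SVN-fs-dump-format-version: 2\n\n"
              b"Revision-number: 157\nProp-content-length: 10\n"
              b"Content-length: 10\n\nPROPS-END\n\n"
              b"Node-path: tags/1.1-RC1\nNode-kind: dir\nNode-action: add\n"
              b"Node-copyfrom-rev: 156\nNode-copyfrom-path: trunk\n\n\n")
    for revision in parse_dump(io.BytesIO(sample)):
        for node in revision.nodes:
            print(revision.number, node.action, node.path,
                  node.copyfrom_path, node.copyfrom_rev)
    # prints: 157 add tags/1.1-RC1 trunk 156
```

Reading content with an exact byte count, rather than line by line, means binary file contents never get misread as headers.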

The intelligent handling of tags means that we default to using Perforce “dynamic” labels as the equivalent of SVN tags (or branches) in the first instance. The tool only creates Perforce branches if a file is actually modified on the tagged branch (which can happen quite often in real life, even if it is “not supposed to”!).
So the SVN history shown earlier might also contain:

------------------------------------------------------------------------
r158 | autobuild | 2007-06-05T13:12:02.223695Z | 1 line
Changed paths:
   M /tags/1.1-RC1/ivy.xml

Recorded ivy.xml for 1.1-RC1

The problem, as you can see from the SVN log, is that a file is then modified on the “tag” branch. You can’t do this to a Perforce label: if you wish to replicate this type of history, you must create a full Perforce branch for that tag and check the modified file in on that branch.
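The label-first rule can be expressed as a single pass over the changed paths in history order. This is a simplified sketch, not our tool’s implementation: it assumes the single-level `/tags/<tagname>` layout and a hypothetical `(revision, action, path, copy_from, copy_rev)` tuple per changed path, using the action letters from the log output above.

```python
from typing import Dict, Iterable, Optional, Tuple

# Hypothetical record: (revision, action, path, copy_from, copy_rev), where
# action is the svn log letter A, M, D or R.
Change = Tuple[int, str, str, Optional[str], Optional[int]]


def tag_of(path: str) -> Optional[str]:
    """Return <tagname> for paths under /tags/<tagname>, else None."""
    parts = path.strip("/").split("/")
    return parts[1] if len(parts) >= 2 and parts[0] == "tags" else None


def classify_tags(changes: Iterable[Change]) -> Dict[str, str]:
    """Decide, per tag, whether a Perforce label suffices or a branch is needed."""
    kinds: Dict[str, str] = {}
    for _rev, action, path, copy_from, _copy_rev in changes:
        name = tag_of(path)
        if name is None:
            continue
        at_tag_root = len(path.strip("/").split("/")) == 2
        if at_tag_root and action in ("A", "R") and copy_from is not None:
            # Tag created by a copy: a label (from-path @ from-rev) is enough so far
            kinds.setdefault(name, "label")
        elif not at_tag_root:
            # Anything added, modified or deleted below the tag root means the
            # tag's content diverged from its source: promote it to a branch
            kinds[name] = "branch"
    return kinds


if __name__ == "__main__":
    history = [
        (157, "A", "/tags/1.1-RC1", "/trunk", 156),
        (158, "M", "/tags/1.1-RC1/ivy.xml", None, None),
    ]
    print(classify_tags(history))
    # prints: {'1.1-RC1': 'branch'}
```

Tags that never appear below their root stay as labels, which is where the savings in both time and db.* size come from.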

SVN tags are also frequently created and then later deleted. That is no problem if this corresponds to the creation and deletion of labels in Perforce, but it is potentially very expensive if each tag has become thousands of branched files. If you have a “spec” depot in Perforce, you also keep a full record of the label being created and deleted.
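The cost difference shows up directly in what a migration would have to replay when an SVN tag is deleted. The sketch below assumes tags promoted to branches live under a hypothetical `//depot/tags/<tagname>/...` path, and it only builds argument lists for the `p4` command-line client rather than running them: deleting a label is a single `p4 label -d`, whereas deleting a branched tag opens every branched file for delete and submits a changelist.

```python
import shlex
from typing import List


def tag_deletion_commands(tag: str, kind: str,
                          depot_root: str = "//depot") -> List[List[str]]:
    """Return p4 argument lists that would replay deleting an SVN tag."""
    if kind == "label":
        # One spec deletion; with a spec depot the label's history is still kept
        return [["p4", "label", "-d", tag]]
    if kind == "branch":
        # Every branched file under the tag gets a new deleted revision
        return [
            ["p4", "delete", f"{depot_root}/tags/{tag}/..."],
            ["p4", "submit", "-d", f"Delete tag {tag}"],
        ]
    raise ValueError(f"unknown tag kind: {kind!r}")


if __name__ == "__main__":
    for kind, tag in (("label", "1.2"), ("branch", "1.1-RC1")):
        for command in tag_deletion_commands(tag, kind):
            print(shlex.join(command))
```

Returning argument lists instead of shell strings keeps tag names with unusual characters safe to pass to something like `subprocess.run` later.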

This approach has quite a few edge cases that need to be considered, including:

  • branching from tags – and checking files in
  • having multiple levels (not just /tags/<tagname>, but also /tags/sublevel/<tagname> etc) – how do you decide what to do?
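For the multi-level question, one possible heuristic, offered as an assumption rather than a description of our tool, is to let the copies themselves define the tags: any path under /tags that is the target of a copy becomes a tag root (unless it already lies inside an existing tag), and a later change is attributed to the longest tag root containing it. The `(action, path, copy_from)` tuples and function names below are hypothetical.

```python
from typing import Iterable, Optional, Set, Tuple


def find_tag_root(path: str, roots: Iterable[str]) -> Optional[str]:
    """Return the longest known tag root containing path, if any."""
    best: Optional[str] = None
    for root in roots:
        if path == root or path.startswith(root + "/"):
            if best is None or len(root) > len(best):
                best = root
    return best


def collect_tag_roots(changes: Iterable[Tuple[str, str, Optional[str]]],
                      tags_dir: str = "/tags") -> Set[str]:
    """Treat each copy target under tags_dir as a tag root, in history order.

    A copy landing inside an existing tag is a modification of that tag,
    not a new tag, so it is not added as a root.
    """
    roots: Set[str] = set()
    for action, path, copy_from in changes:
        path = path.rstrip("/")
        if (action in ("A", "R") and copy_from is not None
                and path.startswith(tags_dir + "/")
                and find_tag_root(path, roots) is None):
            roots.add(path)
    return roots


if __name__ == "__main__":
    history = [
        ("A", "/tags/1.1-RC1", "/trunk"),
        ("A", "/tags/sublevel/2.0", "/trunk"),
        ("A", "/tags/1.1-RC1/extra.txt", "/trunk/extra.txt"),
    ]
    roots = collect_tag_roots(history)
    print(sorted(roots))
    print(find_tag_root("/tags/sublevel/2.0/ivy.xml", roots))
```

Matching on `root + "/"` rather than a bare prefix stops a tag such as `/tags/1.1-RC10` being attributed to `/tags/1.1-RC1`.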

Wrapping Up

Migration from SVN to Perforce is beneficial for many companies as their repositories and the number of people using them grow. The tools are sufficiently similar to make user acceptance and training very straightforward.

However, you do need to perform full-history migrations with some care – feel free to contact me for more details. VIZIM has full history migration tools for Subversion, ClearCase and CM Synergy to Perforce.

  • Great post !!
    Very soon there will be a war room where we will be pitting Perforce against Subversion (also being used by a good chunk of ppl). Sure Perforce would win hands down. I will be discussing svn->p4 migration. This post would be of immense help to me.
    I did not understand what you meant by "kitchen sink revisions". Google apparently failed me :-(
  • robertcowham
    Glad you enjoyed it. The term comes from "everything but the kitchen sink". In this case I am referring to revisions which mix deletions, file merges, adds, everything. The big problem is that you can find in the same revision a delete of a file, then a copy of the file from another branch, and then perhaps even a delete of the directory which contains the file!!
  • bgabrhelik
    Robert, nice article. I didn't get what your real approach is. Did you code your own utility or script? ...Or did you configure the official migration tool so it recognizes labels? It seems like a huge project just for a migration.

    We will have to migrate to p4 next year, but we use svn:externals extensively to bring dependencies into main projects (including sources). The svn:externals refer to tags of the other projects in the same repository. We are a little bit svn-centric. It seems that svn:externals will have to be replaced before migration by some dependency resolver like Ivy or Gradle, but we are not just Java, but also C/C++ with different platform dependencies. Do you have any ideas? Copying a label into the dependent project is a bad idea, I think, as it has its own life after that and you can hardly recognize the origin.

    Thanks,
    Bronislav
  • Vladimir-Mihai Pacuraru
    I was trying to mimic svn:externals by means of client view mapping, but I'm under the impression that I cannot map a depot path twice in the same client, under two different places. Am I wrong or right about that?
  • robertcowham
    Hi Bronislav

    Yes, we coded our own tool. The basics already existed before the Perforce tool, and we realised the Perforce one wouldn't fly anyway, so we carried on.

    Robert