I would like to reflect on the lessons learnt from having recently completed the migration of 60 SVN repositories to Perforce for a client.
Migrations – Tarpits of Effort
The first thing to say is that migrations can take a lot of time and effort if you aren’t doing them regularly, as we do at VIZIM (perhaps not surprising!). There are invariably differences in tools and techniques, and it takes time to plan for the issues.
If there is an off-the-shelf tool which does the migration for you, then it is well worth a try, but in most cases various trade-offs will have been made within the tool – and you may not be pleased with the results. This in itself will take time and effort to test and evaluate.
If you choose to “roll your own” then be very careful: it may seem easy to get started, but there are usually a lot of edge cases and issues along the way that will suck up your time and effort. How much is it worth becoming an expert in this through hard-won experience, for a task that you are typically only going to do once?
Obviously, as a consultant, I would say this, but consider bringing in help.
Summary
For many situations SVN and P4 are very similar to each other, but there are a couple of key differences.
- The easy bits were the basic adds/edits/deletes
- SVN Revisions correspond (mostly!) straightforwardly with P4 Changelists
- branching needs some intelligence applied due to the different underlying implementations – the naive approach has considerable problems (unfortunately it’s the approach used by the official Perforce migrator) – our approach/tool was dramatically faster than the official tool.
- some SVN history (“kitchen sink revisions”) requires extra care and attention due to its complexity (e.g. in the same revision the user deletes a file and then replaces it with a branched copy – or vice versa, or deletes the parent directory, or – lots of other edge cases to consider…)
SVN vs P4 Branching
SVN tags and branches are the same thing – it is just by convention they live in /tags and /branches respectively.
svndumptool -v log <path to repo>
will produce a nice history showing the type of information such as:
    ------------------------------------------------------------------------
    r157 | autobuild | 2007-06-05T13:11:44.145570Z | 1 line
    Changed paths:
       A /tags/1.1-RC1 (from /trunk:156)

    Milestone tag created: 1.1-RC1
An SVN tag or branch is just a reference to a from-path/from-revision pair. In Perforce, the exact analogy is a “dynamic” label – e.g.
    Label: 1.1-RC1
    Revision: @156
    View:
        //depot/trunk/...
where the Revision: field identifies the changelist.
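To make the mapping concrete, here is a minimal Python sketch that turns a tag-creating entry from the SVN log into label spec text of the same shape as the one above. It is an illustration only, not the code from our migration tool. The function name, the regular expression and the single //depot root are assumptions, as is the idea that SVN revision N was imported as Perforce changelist N (which, as noted earlier, is only mostly true). Paths containing spaces are not handled.

```python
import re

# Matches an SVN "Changed paths" entry for a copy, e.g.
#   A /tags/1.1-RC1 (from /trunk:156)
# Simplification: paths containing whitespace will not match.
COPY_LINE = re.compile(
    r"^\s*A\s+(?P<dest>/\S+)\s+\(from\s+(?P<src>/\S+):(?P<rev>\d+)\)\s*$"
)


def label_spec_for_copy(changed_path_line, depot="//depot"):
    """Return label spec text for an SVN tag-creating copy, or None.

    Assumptions (not from any real tool): SVN revision N equals Perforce
    changelist N, and all SVN paths live under one depot root.
    """
    match = COPY_LINE.match(changed_path_line)
    if match is None:
        return None  # not a copy, so nothing to turn into a label
    label_name = match.group("dest").rstrip("/").split("/")[-1]
    source = match.group("src").rstrip("/")
    return (
        f"Label: {label_name}\n"
        f"Revision: @{match.group('rev')}\n"
        f"View:\n"
        f"\t{depot}{source}/...\n"
    )


print(label_spec_for_copy("A /tags/1.1-RC1 (from /trunk:156)"))
```

A line that is not a copy, such as a plain modification, returns None, so the caller can fall through to normal add/edit/delete handling.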
This leads on to a major problem…
SVN O(1) branching vs P4 O(N) branching
SVN branches (or tags) are just references and can be created in constant time. In Perforce, if you branch 1,000 files, you end up with 1,000 records in the db.integed table (and indeed a similar 1,000 records in db.rev as they are new revisions). There are two issues with this:
- This metadata starts to mount up over time (large db.* files can degrade performance and increase backup time)
- The time taken to create a branch is proportional to the number of files being branched – O(N) – and this can become noticeable for larger repositories and lots of users (two tables need to be locked while the branch is being created).
Imagine doing this for 10k files per branch, or 30k files, or … (servers can be locked for minutes).
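As a rough back-of-the-envelope aid, the arithmetic can be written down in a few lines of Python. This is only a sketch using the figures quoted above (one db.integed record and roughly one db.rev record per branched file); real servers will differ in detail, and the function name and the figure of 1,000 tags are made up for illustration.

```python
def branch_metadata_records(files_per_branch, branch_count):
    """Estimate records added by creating full Perforce branches.

    Uses the rule of thumb from the text: each branched file adds one
    record to db.integed and (approximately) one to db.rev.
    """
    per_table = files_per_branch * branch_count
    return {"db.integed": per_table, "db.rev": per_table}


# Hypothetical repository with 1,000 tags, at three different sizes.
for files in (1_000, 10_000, 30_000):
    records = branch_metadata_records(files, branch_count=1_000)
    print(f"{files:>6} files x 1000 tags -> {records['db.integed']:,} records per table")
```

The point is simply that the cost grows with files multiplied by tags, whereas a label used as a tag costs about the same however many files it covers.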
Poor Performance of the Naive Approach
The simple approach is to create a full Perforce branch for every SVN tag or branch. Depending on the number of branches, this can take quite a long time.
For example, migrating a small test repository (288 revisions) with the official Perforce Subversion migration tool took 5 minutes 6 seconds.
Our tool took 15 seconds…! (5% of the time)
In addition, the db.* files for the naive approach were significantly larger: 4MB vs 2.5MB.
The more tags you have, the worse the problem. We had several repositories with thousands of tags. For one of them, we killed a test migration using the naive approach after 24 hours. The db.* files were nearly 20GB at that point and the server was effectively thrashing and getting slower and slower. Our actual migration using the approach below took just over 3 hours for that repository.
Our Approach
There are a couple of reasons for the dramatic speed difference for our tool:
- we read and parse SVN dump files – they contain the revision information and the contents of the files
- we perform intelligent handling of tags automatically.
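To give a feel for the first point, here is a minimal Python sketch that walks an SVN dump stream and reports the copy operations (the raw material for tags and branches). It is not our tool, just an illustration: the function names are made up, and it only looks at the standard dump headers (Revision-number, Node-path, Node-copyfrom-path, Node-copyfrom-rev, Content-length). Bodies are skipped by length rather than read line by line, so file contents cannot be mistaken for headers.

```python
import io


def iter_dump_records(stream):
    """Yield the header block (a dict) of each record in an SVN dump stream.

    `stream` must be a binary file object. Property and text bodies are
    skipped using the Content-length header.
    """
    while True:
        line = stream.readline()
        if not line:
            return  # end of stream
        if not line.strip():
            continue  # blank separator between records
        headers = {}
        while line.strip():
            key, _, value = line.decode("utf-8").rstrip("\r\n").partition(": ")
            headers[key] = value
            line = stream.readline()
        stream.read(int(headers.get("Content-length", "0")))  # skip the body
        yield headers


def find_copies(stream):
    """Yield (revision, dest_path, source_path, source_revision) per copy."""
    revision = None
    for headers in iter_dump_records(stream):
        if "Revision-number" in headers:
            revision = int(headers["Revision-number"])
        elif "Node-copyfrom-path" in headers:
            yield (
                revision,
                headers["Node-path"],
                headers["Node-copyfrom-path"],
                int(headers["Node-copyfrom-rev"]),
            )


# A tiny hand-written dump fragment for demonstration purposes.
SAMPLE = (
    b"SVN-fs-dump-format-version: 2\n"
    b"\n"
    b"Revision-number: 157\n"
    b"Prop-content-length: 10\n"
    b"Content-length: 10\n"
    b"\n"
    b"PROPS-END\n"
    b"\n"
    b"Node-path: tags/1.1-RC1\n"
    b"Node-kind: dir\n"
    b"Node-action: add\n"
    b"Node-copyfrom-rev: 156\n"
    b"Node-copyfrom-path: trunk\n"
    b"\n"
)
print(list(find_copies(io.BytesIO(SAMPLE))))
```

Note that paths in a dump file have no leading slash, unlike the log output. A real migration also needs the file contents and properties that this sketch deliberately skips.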
The intelligent handling of tags means that we defaulted to use Perforce “dynamic” labels as the equivalent of SVN tags (or branches) in the first instance. The tool only creates Perforce branches if a file is actually modified on the tagged branch (which can happen quite often in real life even if it is “supposed not to”!).
So the previous SVN history might also contain:
    ------------------------------------------------------------------------
    r158 | autobuild | 2007-06-05T13:12:02.223695Z | 1 line
    Changed paths:
       M /tags/1.1-RC1/ivy.xml

    Recorded ivy.xml for 1.1-RC1
The problem, as you can see from the SVN log, is that a file is then modified on the “tag” branch. You can’t do this to a Perforce label. If you wish to replicate this type of history you must create a full Perforce branch for that tag and check the modified file in on that branch.
It is also frequently the case that tags in SVN are created and then later deleted. No problem if they correspond to the creation and deletion of labels in Perforce, but potentially very expensive with thousands of branched files. If you have a “spec” depot in Perforce then you have a full record of the label being created and deleted.
This approach has quite a few edge cases that need to be considered, including:
- branching from tags – and checking files in
- having multiple levels (not just /tags/<tagname>, but also /tags/sublevel/<tagname> etc) – how do you decide what to do?
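The label-first idea described above can be sketched in Python as a small planning pass over the changed paths. This is a simplified illustration, not our actual tool: the Change class, the action names in the plan and the rule for spotting a tag (any copy landing under /tags/ at any depth that is not inside an existing tag) are all assumptions made for the example, and the edge cases just listed, such as branching from tags, are not handled.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Change:
    revision: int
    action: str                        # "A", "M" or "D", as in the SVN log
    path: str
    copied_from: Optional[str] = None  # e.g. "/trunk:156" for a copy


def owning_tag(path, live):
    """Return the live tag that contains `path`, or None."""
    for tag in live:
        if path == tag or path.startswith(tag + "/"):
            return tag
    return None


def plan_tag_actions(changes, tag_root="/tags/"):
    """Plan label and branch operations for changes under the tag root."""
    live = {}  # tag path -> "label" or "branch"
    plan = []
    for change in changes:
        if not change.path.startswith(tag_root):
            continue  # ordinary history is handled elsewhere
        tag = owning_tag(change.path, live)
        if tag is None:
            # A copy that is not inside a known tag starts a new tag: cheap label.
            if change.action == "A" and change.copied_from:
                live[change.path] = "label"
                plan.append((change.revision, "create-label", change.path))
            continue
        if change.action == "D" and change.path == tag:
            # Deleting the whole tag removes whatever currently represents it.
            plan.append((change.revision, "delete-" + live.pop(tag), tag))
            continue
        if live[tag] == "label":
            # First change inside the tag: a label cannot hold it, so promote.
            live[tag] = "branch"
            plan.append((change.revision, "promote-to-branch", tag))
        plan.append((change.revision, "submit-change", change.path))
    return plan


history = [
    Change(157, "A", "/tags/1.1-RC1", copied_from="/trunk:156"),
    Change(158, "M", "/tags/1.1-RC1/ivy.xml"),
]
for step in plan_tag_actions(history):
    print(step)
```

With the two revisions from the log excerpts, the plan creates a label at r157 and only pays for a full branch at r158, when ivy.xml is modified. A tag that is created and later deleted without being touched never becomes a branch at all.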
Wrapping Up
Migration from SVN into Perforce is beneficial for many companies as their repositories, and the number of people using them, grow. The tools are sufficiently similar to make user acceptance and training very straightforward.
However, you do need to perform full-history migrations with some care – feel free to contact me for more details. VIZIM has full history migration tools for Subversion, ClearCase and CM Synergy to Perforce.