
Review of Practical Perforce (Part 1)

This is a partial review of Practical Perforce by Laura Wingerd, published by O’Reilly (ISBN 0-596-10185-6). The reason it is partial is that I intend to comment in more detail in future blog articles on some parts of the book, and wanted to post this without waiting for the whole thing!

As Laura mentions in the preface, the book is not intended for complete beginners, but more for readers with experience in other SCM (software configuration management) tools who are looking to understand how Perforce works.

To quote the introduction, there are two parts to this book:

  • Part I (Chapters 1-6) is a whirlwind technical tour of Perforce commands and concepts. It’s not a tutorial, nor a reference, but helpful nonetheless.
  • Part II (Chapters 7-11) describes the big picture, using Perforce in a collaborative software development environment. It is particularly strong on branching patterns, how to structure codelines and tips and tricks in this area.

The real meat of the book for most Perforce sites is thus Part II, but there are definitely some goodies in Part I.

Chapter 1 presents some fundamentals about Perforce syntax and concepts. The diagrams on pages 6 & 7 explain the relationship between revisions and changelists very well.

In Chapter 2, Laura discusses client workspaces and things like view syntax. She also describes basic check outs (open for edit in Perforce command line parlance), and introduces branching when she refers to cloning of files. She includes concepts of renaming and replacing content in files, reconciling changes made offline, and even introduces a couple of bits of undocumented syntax such as “p4 files @=1452”. Quite a chunk of information in this chapter.

Resolving and merging are the subject of Chapter 3, which includes some very useful diagrams showing various scenarios. If you have ever had any questions about 3-way merging in Perforce – read this! On pp68-69 she gives examples of reconciling changes you have made to a file someone else renamed using the undocumented merge3 command – interesting if a touch esoteric (also referred to in “How to undo a merge” on p80). The recommendation on p74 to sync and resolve one changelist at a time is certainly worth considering, although I think how necessary that is will depend on your environment.

The basics of branching are covered in Chapter 4, including initial scenarios and how to track merge requirements across branches. She makes quite a lot of use of the interchanges command (not yet exposed in the GUIs) and explains the gory details of “yours”, “theirs” and “base” nicely. Her approach of using filespec integrations for the initial examples is nice and simple, but I suspect more people are likely to use branch specs in real life. On p111 she gives a useful couple of commands to show how to find which changes have been merged in (more likely to be automated in scripts for most sites, I would expect). Other subjects covered include all the gory details of what integrate actually does, as well as some very useful details as to what the interchanges command can tell us, particularly with respect to cherry picked integrations.

Chapter 5, on labels and jobs, is quite short and shows all the basics. A quick note on the final section where a job is used as a reference for a changelist – as of release 2005.2 there is an undocumented “dynamic label” option where a label can have an attached revision, which probably makes the job trick unnecessary.

Chapter 6 gets into the subject of remote depots and proxies and also mentions the very useful spec depot option (automatic versioning of all spec objects). There is also some good advice on using p4web in browse mode to access your repository. The section on triggers and automation is a little light, but understandable.

Part II starts with Chapter 7 “How software evolves”. This chapter is perhaps the highlight of the book, and introduces concepts that are totally independent of Perforce and apply to many SCM tools. Fortunately the chapter is currently available as a free PDF document from the O’Reilly website for the book. A firm understanding of the concepts introduced here will make it much easier for you to come up with suitable branching patterns for use in your organisation, and also, perhaps more importantly, give you some incredibly useful concepts for explaining your structure to other people within the organisation. Most SCM problems are due to poor communication rather than poor tools, or poor ideas. Laura relates how problems in the real world keep us from an overly simplistic ideal world, and yet how some simple concepts allow us to manage this real-world complexity. The “flow of change” and the “tofu scale” are classic concepts which should be in everyone’s SCM vocabulary.

Summary

I am going to stop this post here, and will get to further chapters and some detailed comments on them as I have time.

But I will finish with the recommendation - buy this book!!

BCS CMSG News and Events for 2006

Tools Fair on 15th June – Cradle to Grave Support

As Chair of the BCS CMSG now (Shirley Lacy passed on the baton in November but fortunately for me remains in the background as vice chair), I have been feeling a little concerned about responsibility for the BCS CMSG Tools Fair – Cradle to Grave Support on 15th June. I am relieved that things are looking good now, with a good selection of sponsors supporting us:

Gold sponsors

  • Marval
  • Frontrange
  • Touchpaper
  • Square Mile Systems
  • MKS
  • Serena
  • Aldon

Standard Sponsors

  • Perforce
  • Telelogic
  • Axios
  • Accurev
  • Unicom
  • SpectrumSCM

Some of these names are newish to me, not being hugely up on the service management/ITIL side, but change, configuration and release management are a major focus in that field, so I look forward to meeting them and hearing about the issues and solutions available.

A couple of big names missing on the software side which is a shame. A couple of organisations undergoing a re-org and thus no marketing budget to spend (well not now anyway). IBM/Rational aren’t there because I can’t find anyone to talk to that would appear to be able to make a decision. We used to have good relations with Rational but since they were acquired by IBM we can’t get anything out of them (I wonder what their responsiveness would be if I were looking for support or trying to buy a product?!). If anyone knows who to contact in the UK on marketing/events I’d be grateful for a heads up. The other one where I have failed to get to anyone appropriate is Microsoft – it would be good to find out about Team System etc, but all enquiries get passed from pillar to post and a deafening silence is the result.

Show me the money – identifying the value of Configuration Management

The other event we have on the 27th April is an evening event which is a relatively new departure for us. This will be at London South Bank University in their conference centre (which we have used before), and is in the form of a workshop led by David Cuthbertson. This is going to focus on selling the benefits of CM and we hope will be a useful way of bringing together both new and more experienced people in the field to share experiences. Details to go up on the web site very shortly! David has spoken several times in the past and always gotten excellent reviews. He is also skilled at running workshops and getting contributions from others present. Should be a great evening. Apologies to those not close enough to London who want to attend, but we are looking at running events (evening events in particular) elsewhere in the country – get in touch if you have any ideas.

Perforce Automatic Merging

Updated: 2005-11-29 – see link to scripts.
Updated: 2006-03-16 – some clarifications and link to Miki Tebeka’s scripts

A common branching pattern is to have mainline and then task branches where work is done and then “published” by merging to the mainline as shown in Figure 3 of the article Building for Success.

In Perforce the “publish” and the “catchup” are both performed by using the integrate command, typically with a branch spec. For example, you might have a branch spec

Branch:	task/fred
View:
	//depot/main/... //depot/task/fred/...

The key thing is that Fred should “catchup” before doing the “publish”. This is so that the more risky merging is done in his task branch and not in the mainline. When doing the merge Fred should be bringing all changes by other people in the project into his branch and with the publish he should just be able to copy his code into the mainline.

There are several ways to achieve this:

  • Education – tell Fred what to do and rely on him doing it – so what happens if he doesn’t – can you “persuade” him not to?!
  • Don’t give Fred write access to the mainline (e.g. via protections or a trigger), and instead have the integration team do it. The problem then being that the integration team may not know the code as well as Fred and are perhaps more likely to make a mistake.

The basic way of detecting in Perforce whether a catchup has been done is to do a preview integrate from main to task/fred and check that nothing needs to be done, so check for no results from:

p4 integrate -n -b task/fred

If the above produces no results then proceed to do the publish:

p4 integrate -r -b task/fred
p4 resolve -as

The key step is that the “resolve -as” (safe automatic merge) resolves as many files as it can. It looks to see if there are only “theirs” (source) diff chunks or only “yours” (target) diff chunks and will select the theirs or yours file appropriately. (Note that “theirs” and “both” or “yours” and “both” are also processed in the same way.) The key point is that if there are both “theirs” and “yours” (or any “conflict”) diff chunks, then safe automatic resolve will not process that file.
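
The rule described above can be written down as a small decision function. This is only a sketch of the rule as stated in this post, not Perforce's actual implementation; the function name and arguments are made up for illustration, and the case where neither side has any unique chunks is an assumption, since the text does not cover it.

```python
from typing import Optional


def safe_resolve_choice(yours: int, theirs: int, both: int,
                        conflicting: int) -> Optional[str]:
    """Return "theirs", "yours", or None (leave the file for a manual resolve).

    The arguments are the diff chunk counts reported for one file.
    Hypothetical helper: it mirrors the rule in the text, nothing more.
    """
    # Conflicting chunks, or unique changes on both sides, are not "safe".
    if conflicting > 0 or (yours > 0 and theirs > 0):
        return None
    # Only the source changed ("both" chunks do not alter the decision).
    if theirs > 0:
        return "theirs"
    # Only the target changed. Also used when neither side has unique
    # chunks, which is an assumption rather than something stated above.
    return "yours"
```

So a file with 0 yours + 3 theirs chunks would be taken from the source, while 1 yours + 1 theirs is left alone for a human to look at.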

Of course having done the automatic resolve and with the changed files sitting in our client workspace it is usually a good idea to do things like a build and smoke test – there’s not a whole heap of trust otherwise…

Thus, in our script we can check for anything not safely resolved automatically (resolve -n shows what still needs to be resolved).

p4 integrate -r -b task/fred
p4 resolve -as
p4 resolve -n
if any results from above command then exit with error

build
if any problems then exit with error

run smoke tests
if any problems then exit with error

If no errors at this point we are ready to submit.
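
As a rough illustration, the steps above might be wired together in Python along the following lines. This is a minimal sketch under several assumptions: the function names are invented here, the build and smoke test commands are whatever your project uses, and the default runner treats empty standard output as “no results” and a non-zero exit code as failure. How your p4 client actually reports “nothing to do” should be checked before relying on either assumption.

```python
import subprocess
from typing import Callable, List

# A runner takes a command line and returns its standard output.
Runner = Callable[[List[str]], str]


def run_command(args: List[str]) -> str:
    """Run a command, raise on a non-zero exit code, and return its stdout."""
    result = subprocess.run(args, capture_output=True, text=True, check=True)
    return result.stdout


def prepare_publish(branch: str, build_cmd: List[str], test_cmd: List[str],
                    run: Runner = run_command) -> None:
    """Integrate and safely resolve a publish; raise if anything looks wrong.

    Deliberately stops short of "p4 submit" so a human can review first.
    """
    # Catchup check: the preview integrate from main must report nothing.
    if run(["p4", "integrate", "-n", "-b", branch]).strip():
        raise RuntimeError("catchup required before publish")
    run(["p4", "integrate", "-r", "-b", branch])
    run(["p4", "resolve", "-as"])
    # Anything the preview resolve still lists was not safely resolved.
    if run(["p4", "resolve", "-n"]).strip():
        raise RuntimeError("files remain unresolved after resolve -as")
    run(build_cmd)  # the runner is expected to raise if the build fails
    run(test_cmd)   # likewise for the smoke tests
```

Passing the runner in as a parameter keeps the sequence easy to try out with a fake runner before pointing it at a real server.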

There are some extra wrinkles to this:

  • the “resolve -as” may validly not work if you have done a merge with edit during the catchup (“edit from” in the terminology from p4 integrated). You need to detect such situations and do a “resolve -at” to copy them over.
  • as soon as one person has done a publish then all other branches will need to do a catchup to pull it in – this means publishes are going to become serial
  • there is a window during which the publish is being performed when someone else might sneak in and do a publish thus meaning you need to do a catchup – consider a simple “locking” strategy to prevent this.

Note that the build and smoke test steps need to take an appropriate amount of time. If it takes hours to do them then the process is unlikely to work. Thus a few minutes or tens of minutes is likely to be the limit – this may mean cutting down on the number of tests that are executed – but that is usually OK. Apply judgement!

At that last point, though, you might think it would be a brave person who lets a script do the submit automatically! It’s probably a good idea to do things like run diffs and give a visual once over before finally checking in, but it should be a pretty easy decision at this point.

In terms of automating the above, I can heartily recommend the various scripting languages with built-in calls to the Perforce API: Ruby, Python and Perl. Getting the results of a command is easy, and exceptions allow you to make error handling pretty easy as well.

Very brief examples of scripts designed to perform the above check in Python and Ruby are now available online. Note they are designed to be run from the Custom Tools menu in p4v/p4win. Please note that Miki Tebeka has kindly published more production quality script implementations in Python including a GUI front end. He also includes an example of an installer to automate installation of tools for p4win and p4v – check them out. Thanks Miki!

p.s. The above works for any codeline for which you have responsibility for accepting changes – it doesn’t have to be a mainline – it can be a subsidiary integration line with third parties contributing, or team members “proposing” changes. You could automate the script and put it on an intranet page – let people try it out, and if successful then have changelists auto-checked in.

Fast Perforce Checkpointing

There was an interesting discussion not too long back on how to do fast check-pointing for your server.

The basic procedure with check-pointing for backups is shown in the System Administrator’s Guide. For larger sites it can take tens of minutes, getting up to hours sometimes, which becomes inconvenient if you have limited windows for backup due to people working in different time-zones etc.

As an aside on the -z option to zip a checkpoint while backing up – it is worth measuring, on your server hardware, the CPU overhead of zipping against the time taken to write the checkpoint to disk. In some circumstances it might be worth check-pointing and zipping as you go, and in others zipping offline.

The procedure below is quoted with thanks to Chris Bartz, who posted it to the Perforce User mailing list in such a well documented fashion:

Okay, here are the gory details. I can’t take credit for inventing it; Perforce tech support gave me most of the details and I’m pretty sure others are doing very similar things. To bootstrap the process you need to create an offline database. This is done by:

1) use “p4 counter journal” to get journal counter value. The checkpoint name will be checkpoint.<journal counter+1>.
2) “p4 admin checkpoint” (or “p4d -jc” if you prefer)
3) Optional. Zip and backup the truncated journal file
4) Delete old offline database db.* files
5) Build offline database with “p4d -r <offlineDir> -jr <checkpoint>”
6) Zip and backup checkpoint

We do the above steps once a week so that we start each week with a fresh offline database. We currently keep all the journals between rebuilding the offline database so we could recover from a real checkpoint plus journal files if there was some problem with the offline database.
Rebuilding and keeping all the journals in between isn’t really required but when I set it up I wasn’t 100% confident in the whole process. If I were making other changes to the process I would probably go with once a month rebuilds and maybe not keep all the journals.

The offline checkpoint is done daily with:

1) use “p4 counter journal” to get value of journal counter
2) Truncate journal file with “p4d -r <root> -jj <journal filename>”. This creates a file <journal filename>.jnl.<journal counter> and starts a new journal file
3) Read truncated journal into offline database with “p4d -r <offline root> -jr <journal filename>.jnl.<journal counter>”
4) Optional. Zip and backup journal file
5) Checkpoint offline database with “p4d -r <offline root> -jd <checkpoint>.<journal counter + 1>”. The journal counter + 1 is so it has the same name as Perforce would give it if we checkpointed the live database.
6) Optional. Zip and backup checkpoint
7) Optional. Delete old checkpoints and journals (we keep all journals between rebuilding the offline database and 3 checkpoints).

When this is done we have a checkpoint and journal file that should be exactly the same as if we did the “p4 admin checkpoint” on the live database. There is essentially zero downtime (except the weekly rebuild).

The offline database could be on another machine and the checkpoint could be done there if disk space or processing power were an issue.

The depot files are backed up after this process is done. We do not shut perforce down for that backup. You really don’t need to; what perforce does to handle this is simple and does work.

Another note on the final remark – imagine the metadata (as restored from the checkpoint + journal) has information about 9 revisions of a file, yet the RCS format archive file actually contains 10 revisions, because the depot backup happened a little after the checkpoint (so the journal is a little out of date). The server will carry on fine. Obviously the opposite case is not fine (metadata has 10 revisions and the archive file only 9). In both cases, if there is some inconsistency you will potentially lose some work, but most recent activity will be stored in people’s workspaces (and they may remember any changelists they have recently submitted).

Thus your disaster recovery scenario needs to include what happens when you get your server back online and what people need to do (e.g. Tech Note 2 – Working Disconnected). Sally Page of Symbian gave an excellent presentation to the UK User Group on Symbian’s DR experience and lessons learnt.
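
For anyone scripting the daily offline checkpoint quoted above, the command lines can be built mechanically from the journal counter. The following is only a sketch: the function and parameter names are invented for illustration, the flags are exactly those given in Chris’s post, and the counter is assumed to have been read beforehand with “p4 counter journal” (step 1). The optional zip, backup and clean-up steps are left out.

```python
from typing import List


def daily_offline_checkpoint_commands(root: str, offline_root: str,
                                      journal: str, checkpoint: str,
                                      counter: int) -> List[List[str]]:
    """Build the p4d command lines for steps 2, 3 and 5 of the daily run.

    Hypothetical helper: it only assembles the commands, it runs nothing.
    """
    # Name the post gives for the truncated journal produced by -jj.
    truncated = f"{journal}.jnl.{counter}"
    return [
        # Step 2: truncate the live journal and start a new one.
        ["p4d", "-r", root, "-jj", journal],
        # Step 3: replay the truncated journal into the offline database.
        ["p4d", "-r", offline_root, "-jr", truncated],
        # Step 5: checkpoint the offline database, numbered counter + 1 to
        # match what a checkpoint of the live database would be called.
        ["p4d", "-r", offline_root, "-jd", f"{checkpoint}.{counter + 1}"],
    ]
```

Each returned list can then be handed to something like subprocess.run, stopping at the first failure.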

Perforce Recovery Story (and tips for processing seriously large journal files)

The Problem

I was recently at a client site looking at sorting out a Perforce repository which had some database errors (due to disk problems on the NAS server). A quick look at the server log showed entries like:

Perforce server error:
	Date 2005/12/01 10:03:12:
	Operation: user-fstat
	Operation 'dbscan' failed.
	Database scan error on db.have!
	dbscan: db.have: Cannot create a file when that file already exists.
	Corrupt tree

The Solution

The easiest solution initially seemed to be to restore from a previous checkpoint and the current journal. (I started by copying all the db.* and journal files before starting to make any changes.)

Then things started becoming more complicated. The journal file turned out to be 9Gb in size (and yes that meant they hadn’t done a checkpoint for a looong time!).

I tried the journal recovery with high hopes (after removing db.* files in that directory), but…

E:\Perforce>p4d -r . -jr g:\backup.ckp.51 journal
Recovering from g:\backup.ckp.51...
Recovering from journal...
Perforce server error:
        Journal file 'journal' replay failed at line 42327877!
        Bad opcode '' journal record!

Note the size of the journal file (43M lines!):

E:\Perforce>time /t && wc journal && time /t
11:58
43,611,160 398236488 9536756331 journal
12:01

Fortunately I had installed some Unix utilities for Windows (from unxutils.sf.net and yes that is unx not unix) including wc and as it later turned out, sed.

Trying to edit a 9Gb file with Notepad or Write (all that were available on a Windows 2003 Server) was not possible. I couldn’t find any other easily downloadable editor capable of such feats and I was wondering how I could do anything sensible with this journal file. But then I realised I had sed available to me, and it was then fairly easy to start identifying the problem and set about resolving it.

Identifying and Fixing The Corrupted Lines

To print out the block of lines around the error:

E:\perforce>sed -n -e "42327850,42327900{p;}" journal > extract.txt

This showed some spaces or other strange characters at the start of line 42,327,877 (lines chopped for brevity):

@pv@ 1 @db.have@ @//FSmith/x-platform/packages/doc/html-tool/spur.gif@ ...
                     @rv@ 3 @db.user@ @bjones@ @bjones@@somecompany.com@ @@ 1121675402 ...
@rv@ 3 @db.domain@ @UK-A7993@ 99 @UK-A7993@ @d:\dev@ @@ @@ @fredb@ 1125593143 ...

(the problem line is the second with @rv@). I was able to “fix” this line with the following (all on one line):

time /t &&
sed -n -e "1,42327876{p;};42327877,42327877{s/^[^@]*//;p;};42327878,${p;}" journal >
new.journal && time /t

The sed command just prints all lines except for the offending line, on which it runs a regular expression removing the unwanted chars at the start of the line (it turns out they weren’t just spaces either). This created new.journal with the offending line fixed (“time /t” just shows current time – took 5-6 minutes to process 9Gb file).
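
If sed is not to hand, the same single-line repair can be done by streaming the file through a few lines of Python. This is a sketch of an equivalent approach rather than what was actually run on site; the function name is made up, and the files should be opened in binary mode so that the stray bytes do not cause decoding errors.

```python
import re
from typing import BinaryIO


def fix_journal_line(src: BinaryIO, dst: BinaryIO, bad_line: int) -> None:
    """Copy src to dst, stripping junk before the first '@' on one line.

    Lines are numbered from 1, as in the p4d error message. The file is
    read a line at a time, so its overall size does not matter.
    """
    for number, line in enumerate(src, start=1):
        if number == bad_line:
            # Same idea as sed's s/^[^@]*// (the newline is left in place).
            line = re.sub(rb"^[^@\n]*", b"", line)
        dst.write(line)
```

It would be called with something like open("journal", "rb") and open("new.journal", "wb") and the line number reported by the failed replay.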

As it later turned out, this new journal file still had some problems since it appeared to be out of order with respect to the latest checkpoint file (shown by any record for the db.counter value for journal not being correct). As a result, I started to lose confidence in the reliability of the journal file at all.

So in the end I took a different tack using the “undocumented” (see “p4 help undoc”) commands p4d -xv/-xr to both validate the various database tables (db.*) and then to recover them. There only appeared to be an error in db.have table which is not that worrying (it is a list of all files synced to client workspaces and thus can be reset by the users in the last resort).

Validating db.review
Validating db.have
Problems Summary:
pages which are not connect to tree or freelist
Validating db.label
Validating db.integ

(The -xr option just fixed things).

And The Moral of the Story is…

Well there are potentially lots of morals here, but a selection is:

  1. Keep your journal file on a different disk (volume) to your database (db.*) files if you can to avoid a single disk problem corrupting both.
  2. Do regular checkpoints! (Once a week is probably the bare minimum, though once every 24 hours is usually ideal.) There are various mechanisms for dealing with large databases if checkpoint times become a burden (e.g. many tens of minutes).
  3. When dealing with large files, don’t forget those unix tools such as sed which are always there and very powerful – also easy to install on Windows.
  4. Remember to talk to support since they will know about relevant undocumented or otherwise commands (in some circumstances they have “fixed” a checkpoint or journal file using internal tools and resent it to the client). At the very least they will act as a sounding board and confirm that what you are planning to do makes sense – always worth doing given that you are often dealing with the “crown jewels” of a company’s intellectual property and also that commands often take a reasonable amount of time to run – hours can flit by unnoticed (well unless you are holding up a project team when every minute is begrudged).
  5. Consider disaster recovery up front (all part of business continuity – look at ITIL/BS15000 for some ideas on this). Spend an appropriate amount of money on your server and disks (RAID etc) to try and avoid these errors in the first place. However, Murphy’s law is always lurking and it is often the little things that catch you out (e.g. air conditioning dies and server then dies). Thus you need the backup strategies (checkpointing etc) in place as appropriate.

Does your tool “Own the World”?

There was a column a few months back on CMCrossroads on tool selection, and these thoughts escaped our Agile SCM column for various reasons. Meanwhile, Brad has commented on his blog, so I thought I would weigh in too!

Agile SCM requires minimal disruption to the flow of developers’ lives, and thus tools that help, not hinder, this process. The right processes are obviously key to effective and efficient development and it would appear that if you have a good process, then the more you can enforce it the safer life will be. However, as we wrote in The Illusion of Control, too much safety leads to much reduced productivity.

As Joel Spolsky writes regarding defect trackers and enforcing process:

Historically, I am opposed to custom fields in principle, because they get abused. People add so many fields to their bug databases to capture everything they think might be important that entering a bug is like applying to Harvard. End result: people don’t enter bugs, which is much, much worse than not capturing all that information.

Best of Breed vs Integrated Suite?

This is a classic conundrum and there will always be good arguments for both sides. Indeed it is not possible to come down on one side or the other without knowing the details of a particular organization and all the nitty-gritty requirements.

However, let’s consider how some ideas might help us in our decision making process.

Who Owns the World?

Many tools make the mistake of thinking that they own the world – well, at least the environment that they are going to run in. The user (developer) is going to be safely cocooned inside the wonderfully productive environment of the tool and never going to have to leave for the big bad world outside. Thus the vendors make little attempt to provide any interface to the outside world.

Now obviously considering that some things are “beyond the pale” has some advantages. Unfortunately the disadvantages are often considerable.

Over the years many vendors have come up with wonderful development paradigms, development environments, 4th generation languages and similar which were indeed very productive. Often these environments included (very) rudimentary version control. The problem is that they implemented it badly (and often unreliably) and provided no hooks to external SCM tools which were used elsewhere in the organization.

This is a bit like the XP principle of YAGNI and refactoring as opposed to big upfront design. As Martin Fowler writes in Is Design Dead?:

People aren’t good at anticipating, so it’s best to strive for simplicity. However, people won’t get the simplest thing first time, so you need to refactor in order get closer to the goal.

Thus an attempt to anticipate everything developers are likely to need would seem to be rather difficult.

The rise of Eclipse, which is making the basic environment a commodity with a good design for plugins and extensibility, is perhaps a result of this worldview. Companies such as Borland are now repositioning previously standalone components such as JBuilder as add-ons to Eclipse rather than as competition.

This is somewhat similar to the arguments for performing builds using a general purpose language. The granddaddy of them all, Make, has its own arcane syntax and issues, which have evolved and been tweaked and extended to try and solve the ever expanding set of problems posed by building systems.

In describing his experiences with Rake (a Ruby based build tool), Martin Fowler suggests:

The fact that rake is an internal DSL [domain specific language] for a general purpose language is a very important difference between it and [make and ant]. It essentially allows me to use the full power of ruby any time I need it, at the cost of having to do a few odd looking things to ensure the rake scripts are valid ruby. […] Furthermore since ruby is a full blown language, I don’t need to drop out of the DSL to do interesting things – which has been a regular frustration using make and ant. Indeed I’ve come to view that a build language is really ideally suited to an internal DSL because you do need that full language power just often enough to make it worthwhile – and you don’t get many non-programmers writing build scripts.

My personal current favourite for such tools is SCons (written in Python) which has a similar approach that I have found to be very powerful.

Advantages of Integrated Suites

Of course integrated suites can be a big win if your problem domain is sufficiently close to what the designers had in mind. In addition, if the suite and its existing processes can be tailored easily to your requirements, then of course you should rate it highly in your selection criteria.

But you need to be careful. There’s an awful lot of “shelfware” out there consisting of tools sold as a “silver bullet” and never really used. The psychology seems to be that if you pay enough money then the problem will be taken away from you.

Wolf Suites In Sheep’s Clothing

Then we have the category of suites that purport to be integrated but under the covers all is not what it first seemed. The classic example is of different tools which a company acquires by buying the original vendor, and with a lick of paint and a wave of the magic wand over the marketing materials suddenly has an “integrated suite”. The challenges of integrating tools not designed to do so are considerable and it can take many years for good integration to happen (if it ever does).

Customizability

Customizability is also a two-edged sword in that you can spend all your time customizing the tool and not enough actually doing the work.

As has been noted by various people on CMCrossroads, workflows implemented by scripting can have a cycle of “script a little, test a little, repeat”. This would suggest that tools which offer the ability to design workflows graphically are immune from these problems. However, that is often not the case and, as has been pointed out before, some of these workflow designers are in fact very difficult both to change control and to debug. A complicated workflow is an inherently hard problem to both comprehend and manage (which does point towards keeping it as simple as possible).

The key area of customizability is that of being able to link the tool (or suite) to the external world in the form of other tools. Thus providing sensible hooks to allow third party tools to link in would seem ideal. And yet this can be rather difficult to do in practice, and thus gets pushed to the back of the queue by the vendor. In addition, I suspect there is often a business rationale that they think if they don’t make it easy to link to external tools then the user will be forced to stick with the vendor’s tool.

It’s not quite on topic, but I couldn’t resist commenting on a classic example of third party hooks that don’t work very well – Microsoft’s SCC integration to Visual Studio .Net. This is based on a lowest common denominator API which used to be released under NDA but is now relatively freely available. It worked reasonably well with Visual Studio 6, but was totally re-implemented for Visual Studio .Net, and rather badly it seems.

Conclusion

Thus I would personally tend to come out on the side of linking best of breed tools together rather than going for an integrated suite, since my experience is that integrated suites don’t try hard enough to provide clean interfaces to the outside world. Of course, some best of breed tools make integrations with other tools a bit like teaching a pig to sing – “frustrates you and annoys the hell out of the pig”!

So the real answer is that every potential customer for SCM tools needs to draw up their own requirements and evaluate against them. Delaying commitment by avoiding lock-in is a very valuable feature, but it doesn’t always pay off as it should perhaps in theory!

Do not turn brain off when choosing tool! An ounce of requirements analysis and evaluation is worth a ton of tweaking down the track.

Sparse Branching in Perforce

Perforce uses inter-file branching (branching in pathspace) and the standard branching model is to branch ahead of time and branch all the files you need.

For example, you branch all the files //depot/main/jam/... to //depot/dev/robert/my-task/jam/... and then update the files independently. Thus if you have 100 files in the original project you get all 100 in the branch as well. Behind the scenes this is an efficient operation in that Perforce only updates its metadata to make the 100 “copied” files appear to be there – the server uses “links” between the copied files and the original ones (via entries in the metadata database) so it doesn’t duplicate what it doesn’t need to.

The ClearCase model would be to set up a config spec which states that if you change any files then they are branched behind the scenes. Thus if you only change 5 files on the branch, then only those files are actually branched.

I categorise the CC model as sparse branching and the Perforce model as inclusive branching. See conclusion for pros and cons of each approach.

Normal Perforce Branching Model

Branch:	robert/my-task-inclusive
View:
	//depot/main/jam/... //depot/dev/robert/my-task-inclusive/jam/...

We branch the files like so:

C:\work\robert-my-task>p4 integ -b robert/my-task-inclusive
//depot/dev/robert/my-task-inclusive/jam/Build-new.com#1 - branch/sync from //depot/main/jam/Build-new.com#1
//depot/dev/robert/my-task-inclusive/jam/Build.com#1 - branch/sync from //depot/main/jam/Build.com#1,#9
etc...

C:\work\robert-my-task>p4 submit

which “copies” the 100 or so files in Jam to the new branch.

Our client workspace to work on the branch is very simple:

Client:	robert-ws-incl
View:
    //depot/dev/robert/my-task-inclusive/jam/... //robert-ws-incl/jam/...

So now we can work on the files on the branch and integrate changes to and from main in the normal manner (see below for integrating back to main).

Sparse Branching in Perforce

A new Perforce server feature makes this easier than it used to be.

Let’s create a branch spec whose view branches only the files we know we are going to change, in this case listing one file per line:

Branch:	robert/my-task-sparse
View:
	//depot/main/jam/jam.c //depot/dev/robert/my-task-sparse/jam/jam.c
	//depot/main/jam/jam.h //depot/dev/robert/my-task-sparse/jam/jam.h

And use it:

C:\work\robert-my-task>p4 integ -b robert/my-task-sparse
//depot/dev/robert/my-task-sparse/jam/jam.c#1 - branch/sync from //depot/main/jam/jam.c#1,#35
//depot/dev/robert/my-task-sparse/jam/jam.h#1 - branch/sync from //depot/main/jam/jam.h#1,#49

C:\work\robert-my-task>p4 submit

How do I most easily use the newly branched files?

Well I can create a client workspace like this:

Client:	robert-ws-sparse
View:
    //depot/main/jam/... //robert-ws-sparse/jam/...
    +//depot/dev/robert/my-task-sparse/jam/... //robert-ws-sparse/jam/...

Note that when we sync this, the following happens:

C:\work\robert-my-task-sparse>p4 sync
//depot/main/jam/Build-new.com#1 - added as c:\work\robert-my-task-sparse\jam\Build-new.com
//depot/main/jam/Build.com#9 - added as c:\work\robert-my-task-sparse\jam\Build.com
//depot/main/jam/headers.h#2 - added as c:\work\robert-my-task-sparse\jam\headers.h
//depot/dev/robert/my-task-sparse/jam/jam.c#1 - added as c:\work\robert-my-task-sparse\jam\jam.c
//depot/dev/robert/my-task-sparse/jam/jam.h#1 - added as c:\work\robert-my-task-sparse\jam\jam.h
//depot/main/jam/Jam.html#2 - added as c:\work\robert-my-task-sparse\jam\Jam.html
etc...

Note that we get the branched copies of jam.c and jam.h and the normal copies of everything else. Thus we can work happily (though it might look a little strange in the p4win/p4v depot view). This uses the feature of “+” mappings which became official with 2005.1 of the server (they were undoc’ed for a while before that). From “p4 help views”:

A mapping line that begins with a + overlays the later mapping on the earlier one: if files match both the earlier and later mappings, then the file matching the later mapping is used. Overlay mappings are only allowed on client views, and make it possible to map multiple server directories to the same client directory.
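To make the overlay rule concrete, here is a toy Python model of how the workspace view above picks a depot file for each client path. It is only a sketch of the rule quoted from “p4 help views”, not how the server implements it: the function name resolve_overlay is hypothetical, and it assumes every mapping ends in “...” so that each view line can be treated as a plain path prefix.

```python
def resolve_overlay(view, depot_files):
    """Return {client_path: depot_path} for a simplified client view.

    `view` is a list of (depot_prefix, client_prefix, is_overlay) tuples,
    in the order the lines appear in the workspace spec.
    """
    result = {}
    for depot_prefix, client_prefix, is_overlay in view:
        matched = {
            client_prefix + path[len(depot_prefix):]: path
            for path in depot_files
            if path.startswith(depot_prefix)
        }
        if not is_overlay:
            # A plain later mapping hides whatever earlier lines mapped
            # under the same client prefix.
            result = {c: d for c, d in result.items()
                      if not c.startswith(client_prefix)}
        # With "+" the later files win only where both mappings match;
        # other files from earlier lines show through.
        result.update(matched)
    return result


view = [
    ("//depot/main/jam/", "//robert-ws-sparse/jam/", False),
    ("//depot/dev/robert/my-task-sparse/jam/", "//robert-ws-sparse/jam/", True),
]
depot_files = [
    "//depot/main/jam/Build.com",
    "//depot/main/jam/jam.c",
    "//depot/main/jam/jam.h",
    "//depot/dev/robert/my-task-sparse/jam/jam.c",
    "//depot/dev/robert/my-task-sparse/jam/jam.h",
]
for client_path, depot_path in sorted(resolve_overlay(view, depot_files).items()):
    print(client_path, "<-", depot_path)
```

In this model jam.c and jam.h come from the sparse branch while Build.com still comes from main, which matches the sync output shown above.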

Branching More Files

If we want to add a new file to the sparse branch then it is a little fiddly. The steps are:

  • Add an appropriate mapping to the branch spec view
  • Use “p4 integrate” and the modified branch spec to branch the file (and submit it)
  • Re-sync our client workspace which will automatically bring in the newly branched file rather than the mainline version.
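The three steps above could be scripted roughly as follows. This is a hedged sketch that only builds the new view line and lists the commands to run, without talking to a server; the helper name plan_sparse_addition is hypothetical, and scan.c is just an illustrative choice of file.

```python
def plan_sparse_addition(branch_name, view, main_root, branch_root, rel_path):
    """Return (new_view, commands) for branching one more file sparsely.

    `view` is the list of existing branch spec view lines.
    """
    mapping = f"{main_root}/{rel_path} {branch_root}/{rel_path}"
    # Step 1: add the mapping to the branch spec view (once only).
    new_view = view if mapping in view else view + [mapping]
    commands = [
        f"p4 integ -b {branch_name}",  # step 2: branch the newly mapped file...
        "p4 submit",                   # ...and submit it
        "p4 sync",                     # step 3: re-sync to pick up the branched copy
    ]
    return new_view, commands


view = [
    "//depot/main/jam/jam.c //depot/dev/robert/my-task-sparse/jam/jam.c",
    "//depot/main/jam/jam.h //depot/dev/robert/my-task-sparse/jam/jam.h",
]
new_view, commands = plan_sparse_addition(
    "robert/my-task-sparse",
    view,
    "//depot/main/jam",
    "//depot/dev/robert/my-task-sparse/jam",
    "scan.c",
)
print("\n".join(new_view))
print("\n".join(commands))
```

The updated view still has to be saved into the branch spec before the integrate is run; that step is deliberately left manual here.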

Integrating Changes Back to the Mainline

Note that for both of these models we can integrate our changes back to the mainline with a very simple client workspace:

Client:	robert-ws-main
View:
    //depot/main/jam/... //robert-ws-main/jam/...

This is because to integrate we only need the target of the integrate to be mapped in our client workspace so we can integrate from either the sparse or the inclusive version.

Finding Changed Files on an Inclusive Branch

It is actually rather easy in most cases to find the files that have changed on the branch (one of the sticking points for many people), even if you branched everything in the first place. The recipe is:

  • find the changelist which created the branch (e.g. N)
  • list all files on the branch from changelist N+1 to the head.

C:\bruno_ws>p4 changes //depot/dev/robert/jam/...
Change 724 on 2006/10/19 by bruno@bruno_ws 'Some more stuff '
Change 723 on 2006/10/19 by bruno@bruno_ws 'Another change '
Change 720 on 2006/10/17 by bruno@bruno_ws 'Dev change '
Change 719 on 2006/10/17 by bruno@bruno_ws 'new brach '

C:\bruno_ws>p4 files //depot/dev/robert/jam/...@720,#head
//depot/dev/robert/jam/Build.com#2 - edit change 720 (text)
//depot/dev/robert/jam/Build.mpw#2 - edit change 720 (text)
//depot/dev/robert/jam/command.c#2 - edit change 720 (text)
//depot/dev/robert/jam/jam.c#2 - edit change 723 (text)
//depot/dev/robert/jam/jam.h#2 - edit change 723 (text)
//depot/dev/robert/jam/scan.c#2 - edit change 724 (text)

You can get slightly cleverer than the above if you are dealing with large numbers of changelists (e.g. pipe to tail, or look for changes to #1 of files). While this is command line stuff (and not easy in P4V for example), it can be very simply scripted and made available as a custom tool.
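As an illustration of how little scripting is involved, here is a minimal Python sketch of the recipe. It parses captured “p4 changes” output and builds the matching “p4 files” command; it assumes the oldest changelist on the path is the one that created the branch, and the function names are my own rather than anything Perforce provides.

```python
import re

CHANGE_RE = re.compile(r"^Change (\d+) on ")


def branch_creation_change(changes_output):
    """Return the oldest changelist number found in 'p4 changes' output."""
    numbers = []
    for line in changes_output.splitlines():
        match = CHANGE_RE.match(line)
        if match:
            numbers.append(int(match.group(1)))
    if not numbers:
        raise ValueError("no changelists found in output")
    return min(numbers)


def changed_files_command(branch_path, changes_output):
    """Build the 'p4 files' command listing files changed since branching."""
    n = branch_creation_change(changes_output)
    return f"p4 files {branch_path}@{n + 1},#head"


sample = """\
Change 724 on 2006/10/19 by bruno@bruno_ws 'Some more stuff '
Change 723 on 2006/10/19 by bruno@bruno_ws 'Another change '
Change 720 on 2006/10/17 by bruno@bruno_ws 'Dev change '
Change 719 on 2006/10/17 by bruno@bruno_ws 'new brach '
"""
print(changed_files_command("//depot/dev/robert/jam/...", sample))
# → p4 files //depot/dev/robert/jam/...@720,#head
```

A real custom tool would capture the output of the first command and run the second one itself, but the parsing is the only part that needs any thought.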

Doing Branching Within a Single Workspace/Environment

One time when I would certainly think more seriously about using the sparse model is when you wish to do some branching and yet you only have a single workspace/environment to test in. Now this is not a good situation to be in (and you should try very hard to get out of it), but I have seen it where for example people were doing database 4GL development and only had one database to develop and test in, and the costs to set up another environment appeared prohibitive.

In such a situation, you can use sparse branching and by changing your client workspace view (I usually cut and paste lines between the view and the description to save retyping), re-sync your workspace to either include the sparse branch or not. This is fiddly and error prone, but possibly useful in such situations.

Choosing Which Model

The disadvantages of the sparse model are the extra effort required to get new files onto the branch, and some extra server performance overhead in usage. The other problem is that you can’t use the sparse branch in its own right – a workspace must know the “base” branch to use together with the sparsely branched files. This might seem easy to do in the short term, but over time can become harder to communicate and there is more room for confusion. Inclusively branched files are just there and the whole branch can be used at any time in a standalone manner. Together with the recent advances in common ancestor detection, it is not difficult to propagate changes in a flexible manner.

The disadvantages of the inclusive (Perforce) model are:

  • You have to branch ahead of time – it is fiddly to suddenly decide in the midst of making a change that you want to move this change to a separate branch (I will discuss options for this in a later article).
  • It can be a little more messy to see what has really changed on the branch since all the files are branched and you have to check which ones have actually been changed (this is actually quite easy - see the recipe in “Finding Changed Files on an Inclusive Branch” above).

As we have seen above, a disadvantage for the sparse model is that it is fiddly to add new files to the sparsely branched set.

Thus I think that the inclusive branching model is easier for most people. It is also the way Perforce was designed to work, and thus the way in which people have the most experience, so you will be in good company. It is often a recipe for making life hard when you start applying the thinking of another tool. One of the worst examples I saw of this is in using labels – a customer had 85,000 labels and counting – a huge mess (it wasn’t just labels, but labels of labels – impossible to tell easily which labels a file was in due to the composition). They got there by applying PVCS thinking to Perforce, instead of understanding and applying branching as it was designed to be used. You have been warned!!

However, I am sure that there are situations where a branch of only a few files out of a largeish number is required and it is very useful to be able to see that only half a dozen say have been branched.

You pays yer money and makes yer choice (and bears the consequences)!

2nd BCS CMSG Conference (21 & 22 June 2005)

The 2nd BCS CMSG Conference was a great success and very enjoyable. Being second time around and having it in the same location (Homerton College, Cambridge) was a definite plus for making the organisation easier (a very important consideration for something being organised by volunteers!).

We were very pleased with the quality of presentations and speakers – a definite advance on the previous conference, partly in that we had more choice from which to select papers. We also had a good representation from overseas: US, Europe and even Australia.

The general feedback was very positive. People particularly enjoyed the networking possibilities in terms of the conference being residential and there being quite a bit of time in the evenings for chats over a Pimms/beer/wine. We asked specifically about running the event in London next time but the general feeling was against this. The advantages of allowing more people to attend and perhaps for parts of days or specific presentations were outweighed by the lack of networking opportunities that would inevitably occur as people split up to different hotels or went back to their offices etc.

There was a very nice feeling throughout the event, and I think we have generated more buzz that will feed into increased attendance next time (we were just under 100 people not including vendor staff).

The 3 streams (large scale systems, service management/ITIL and commercial software development) seemed to work fairly well. It certainly seemed like people mixed and matched.

We did discuss running the event on an annual basis rather than every 2 years, but the problem is that the lead time and amount of work that goes into it would put a real strain on the committee. Thus our current intentions are to run a rather bigger 1 day event next year with space for an exhibition, and then the 2 day residential conference again in 2007.

The papers (or most of them) have now been uploaded to the web site and are available for people to look at. There are rather fewer papers and more slides this time than last, which is a shame as it makes for less information if you weren’t there.

Meanwhile we are having discussions about the future events for the BCS CMSG and our October event has a theme and title “Resistance is Futile – Satisfy your Compliance Auditor!” around corporate governance, Sarbanes Oxley, software asset management etc. Suggestions welcome for future events – probably via the BCS CMSG forum on CMCrossroads.

CM as a means of communicating information

Brad has been writing on the theme of Trust and how to build it which I have found interesting.

Something I have been concerned with for quite some time now is the concept of “Selling CM” (Configuration Management) – how to communicate the value of it to other people. As an aside, I am looking forward to a couple of the presentations at the upcoming BCS CMSG Conference on this theme (Richard Morreale and David Cuthbertson). There are of course some external issues which are raising the profile of CM – in particular Corporate Governance (Sarbanes Oxley in the US and the new Companies Bill in the UK). These help to give it a higher profile, but if we’re not careful will lead to us being accused of “crying wolf”.

As someone working in the field and convinced by the value and benefits of CM, it was fairly easy when working on development projects just to get on and do it. I tended to find that developers with some experience had been bitten by problems that resulted from bad CM practices, and were reasonably happy with a sensible system that helped avoid them (because they didn’t want to implement it and I did, they were happy to let me at it!).

As I am now working as an external consultant, I tend to be at one remove in that I am providing consultancy and training and much more frequently come across people who don’t understand the need for it. Since I can’t do it for them, I have to persuade and educate them in such a way that they become interested enough to do it themselves (of course I come across a wide range of people on training courses, from the keen enthusiastic ones to the “I’m here because my boss sent me and I’m not interested” types).

I’ve been developing as a trainer in terms of how best to communicate things and put people in situations in which they discover things for themselves, and this is very much an ongoing field of study for me.

However, back to the original topic of communication…

I realised recently that we can look at CM as a means of communicating information to other people. Writing the code itself is a means of communication – one of the experiences highlighted by agile is how much documentation gets generated in software development that turns out to be useless. Agile values working systems over documentation because a working system communicates more accurately than documentation, which tends to be out of date.

CM repositories are where people store their code. But how people use those repositories makes a huge difference to their usefulness. If you think about what you are trying to communicate when you use the repository you can increase the amount of information communicated with little or no extra effort – just some thought.

For example, let’s look at the pattern of Task Level Commit which talks about grouping files together as change sets. Rephrasing it a bit, the grouping of changes to individual files into one change set is a very useful piece of information which has value to other people. If I am using a tool which doesn’t support change sets, then I may try and communicate this extra information by checking in the individual files at around the same time with the same comment (or reference to some external defect). The problem is how easy that information is to extract – often difficult if the tool doesn’t support it.

Another way of losing information is by checking in a large change set with say the fixes to 5 defects all mixed together. This makes it difficult to extract the information about individual bug fixes. In some instances this is not a major problem (if say the bugs all needed to be fixed anyway and are all going in to the same release). However, if you are using branches and wish to propagate changes between branches, and not necessarily all of the changes, then you have to split the all singing all dancing change set up and extract the relevant bits. This is usually more work, since the changes are recorded as one combined change set and you lack the information about which changes fixed which bug.

I have also heard people ask about making a fix to 2 branches – instead of making the fix once and propagating it, they think it is easier just to make the same code change in the 2 places with only a comment to link them. Again, this is losing information which can be useful in the future. (Note that sometimes it is worth fixing a bug in 2 branches in 2 different ways – for example a work around quick fix in a release branch, and a considered proper fix in the mainline – in such circumstances you need to think about how to convey the information).

The precise details of how you convey the information will depend on your tool. The key step when making any change is to think “what information about this is important for the future and how can I best use the tool to record that information for posterity?”.

One way I have heard of this being described is to “use the tool to record your intention” rather than just make a change.

P4Python – Python interface to Perforce via API

I recently released P4Python – an interface to Perforce via its C++ API for Python. Fortunately I didn’t have to start from scratch. First I used the model for P4Ruby and P4Perl by Tony Smith. Then I was able to build on the original Python work by Mike Meyer (see license for link to his original).

Writing Python extensions in C++ (as this had to be because of the Perforce API) is a bit of a palaver due to reference counting etc. I had a look round at various toolkits such as SWIG, and Boost.Python. However, I wanted to make it as easy as possible for other people to download and build, so in the end decided to evolve what Mike had originally done. I quickly wrapped it in distutils, so that installation (when compile/build is required), should be just a case of:

python setup.py install

It might be necessary to hack setup.py to set a couple of Perforce-specific compile/link flags and point at the appropriate API. It is easy to provide some prebuilt Windows binaries.

The implementation turned out to be pretty straightforward. One of the neat tricks Mike had done which I continued gratefully was to push quite a bit of processing out from C++ into a Python wrapper module which made the code a whole lot easier to write and test. I discovered a few tricks to achieve the equivalent of Ruby’s method_missing magic, such as using __getattr__ and the like along the way. Together with a nice test harness (unfortunately I can’t release the Perforce repository which it assumes is there but principles are the same) it should hopefully be nice and easy to maintain.

All in all quite a fun project. As a final validation, I include in the download a simple performance testing script which compares 50 runs of the “p4 info” command. The average time taken to run it via the new interface is 0.0013 seconds, whereas spawning a p4.exe and parsing the output takes 0.2297 seconds (roughly 177 times slower!). This may not be terribly significant when executing a few commands (where the server time taken is likely to be much longer than the overhead), but is nice to have in a variety of situations. One particular area is the writing of Perforce triggers themselves – nice to be able to do these in Python at speed!
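For anyone wanting to repeat that kind of comparison, a minimal timing harness might look like the following. This is not the script shipped in the download, just a sketch using the standard library; average_seconds and slowdown are hypothetical helper names, and in a real run the two callables would be the API call and the spawned p4 process respectively.

```python
import time


def average_seconds(func, runs=50):
    """Call func() `runs` times and return the mean wall-clock time per call."""
    start = time.perf_counter()
    for _ in range(runs):
        func()
    return (time.perf_counter() - start) / runs


def slowdown(fast, slow):
    """Return how many times longer `slow` takes than `fast`."""
    return slow / fast


# Ratio of the two averages quoted above.
print(round(slowdown(0.0013, 0.2297)))  # → 177
```

Averaging over a number of runs matters here because a single spawn of an external process can vary quite a bit from one call to the next.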

Note that I copied the P4Ruby idiom and raise exceptions etc leading to nice clean code. From the web page:

  from p4 import P4

  template = "my-client-template"
  client_root = r"c:\work\my-root"

  p4 = P4()
  p4.parse_forms()
  p4.connect()

  try:
      # Run a "p4 client -t template -o" and convert it into a Python dictionary
      spec = p4.fetch_client("-t", template)

      # Now edit the fields in the form
      spec["Root"] = client_root

      # Now save the updated spec and sync it
      p4.save_client(spec)
      p4.run_sync()

  except:
      # If any errors occur, we'll jump in here. Just log them
      # and raise the exception up to the higher level
      for e in p4.errors:
          print(e)
      raise

As a final exercise, I have just published it to PyPI, the Python Package Index (www.python.org/pypi).