Environmental Time Wasting

The situation I found at a recent client engagement was not unusual – large amounts of time being wasted due to poor processes around managing their environments – particularly test environments.

This post is about how we have set about improving this type of situation.

Background

As at many companies, processes and procedures had grown by accretion – a gradual build-up of tribal lore about what to do to get a working environment. As is very common, this lore was not well understood and was poorly and inaccurately documented. New starters “cloned” an existing environment and then diverged!

Their processes were quite formal in some respects – multi-page forms requiring paper sign-off, and deployment instructions consisting of Word documents with literally hundreds of manual steps listed in some cases!

Production Problems are Rare, but…

The systems being developed by this client are “only” used internally, but with tens or hundreds of thousands of client transactions being processed daily, it was not surprising that they needed to avoid problems.

The processes in place did mean that relatively few problems were being introduced into production. The key problem, as they realised, was that they were not very efficient in how they achieved this: it required a lot of effort every time and there were far too many manual steps. A few key people were overworked and stressed, and productivity was low.

There were a number of issues causing problems, but the biggest one was the lack of control of the environment. They realised that their testers were regularly wasting hours, and in some cases days, making sure that test failures were due to the actual code under test and not just to the environment being incorrectly set up.

Thus an hour or two of testing could take multiple days to perform – no wonder testing was perceived as a bottleneck.

Virtual Machines

There were some good practices – the use of VMs for testing meant people could to some extent separate their working tools from the environment used for tests.

Theoretically this meant that a VM could be replaced by a fresh image, but in practice that was seldom done. Thus the various VMs had gradually drifted apart in terms of operating systems and patches and other application installations.

Configuration Identification

A key configuration management process is that of being able to accurately identify the particular versions of the programs and files in use. This was being done reasonably accurately (even if somewhat inefficiently, as the build process was largely manual and they were using multiple Visual SourceSafe repositories!).

The problem was that with hundreds of .exes and .dlls as well as other files, it was effectively impossible to manually control what was going on, and also to remain up-to-date with a constant stream of updates and enhancements.

Audit

The first action to improve things was to write a small audit tool. This took as input a single master spreadsheet of executable names and specified version numbers. It then scanned the local machine and detected which versions of which .exes and .dlls were actually present, and also a list of other files.

We then started by running this on the production machines. With a little bit of batch file scripting we were able to quickly include things like registry contents and services installed.

It produced a report containing four basic sections:

  • files that were expected to be on the machine where the version numbers matched
  • files with mismatching version numbers
  • files that were expected to be on the machine but were not in their expected location
  • unexpected files found locally
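
As a rough illustration of how those four sections fall out of a simple comparison, here is a minimal Python sketch. It is not the tool we actually wrote: the function name `audit` and the two dictionaries (file path to version string, one from the master spreadsheet and one from scanning the machine) are assumptions made for the example.

```python
def audit(expected, found):
    """Sort files into the four report sections.

    expected: dict of file path -> version specified in the master list
    found:    dict of file path -> version detected on the local machine
    (Both shapes are assumptions for this sketch.)
    """
    report = {"matched": [], "mismatched": [], "missing": [], "unexpected": []}
    for path, wanted in sorted(expected.items()):
        if path not in found:
            # Expected on the machine but not at the specified location.
            report["missing"].append(path)
        elif found[path] == wanted:
            report["matched"].append(path)
        else:
            # Keep both versions so the report shows what differs.
            report["mismatched"].append((path, wanted, found[path]))
    # Anything on the machine that the master list does not mention.
    report["unexpected"] = sorted(set(found) - set(expected))
    return report
```

Note that a file installed in the wrong directory shows up twice in such a report, once as missing and once as unexpected, which is a useful clue in itself.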

The results were fairly typical:

  • the spreadsheet was not up-to-date itself
  • version number mismatches, e.g. 2.00 instead of 2.0.0.0
  • file location differences – the wrong directory path specified
  • there were unexpected differences between the production and the disaster recovery sites
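
Mismatches such as 2.00 versus 2.0.0.0 are often just formatting differences in the spreadsheet. One way to separate them from real differences is to normalise both sides before comparing, as in the sketch below. Whether the two should count as equal is a policy decision for your own site; the function name and the four-part padding are assumptions for illustration.

```python
def normalise_version(text, parts=4):
    """Turn '2.00' or '2.0.0.0' into a comparable tuple, e.g. (2, 0, 0, 0).

    Assumes purely numeric, dot-separated versions; anything else
    raises ValueError and should be looked at by hand.
    """
    numbers = [int(piece) for piece in text.strip().split(".")]
    # Pad short versions with zeros so '2.0' and '2.0.0.0' compare equal.
    numbers += [0] * (parts - len(numbers))
    return tuple(numbers)
```

With this, `normalise_version("2.00") == normalise_version("2.0.0.0")` holds, while 2.1 and 2.0.0.0 still differ.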

Master Control List

It takes time and painstaking effort to go through a spreadsheet with a thousand or more entries and ensure that everything is accurate – but this is a vital step and needs to be done. Once the data is accurate it is usually fairly easy to maintain (including, of course, saving versions as you go to track changes).

Later on, you can look at how this is done and try and reduce the manual steps required to keep it up-to-date – but to start with it just needs to be made accurate!

The existence of the audit tool made it very easy to check an environment, find out which versions were where, and make a judgement as to what needed to be fixed. This in itself is a (possibly surprisingly) big win!

Automatic Deployment

Once you have such an audit tool, it is usually pretty easy to create an automated deployment tool – in this case it shared much of the same code, and the extra requirements were:

  • check the version found locally
  • if the file doesn’t exist, or the version is not correct, then extract the specified file from the repository (and check its version again!)
  • register DLLs if required
  • install services etc.
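
The core of that loop can be sketched in a few lines of Python. This is an illustration under assumptions, not our actual tool: `get_version`, `fetch` and `register` are hypothetical callables you would supply for your own platform and repository, and service installation is left out.

```python
def deploy(entries, get_version, fetch, register):
    """Bring each file up to its specified version; return the paths changed.

    entries:            iterable of (path, wanted_version, needs_registration)
    get_version(path):  installed version string, or None if the file is absent
    fetch(path, ver):   extracts that version from the repository to path
    register(path):     registers a DLL, however your platform does that
    (All three callables are hypothetical and supplied by the caller.)
    """
    changed = []
    for path, wanted, needs_registration in entries:
        if get_version(path) == wanted:
            continue  # already correct, so leave it alone
        fetch(path, wanted)
        # Check again: the repository copy may itself be wrong.
        actual = get_version(path)
        if actual != wanted:
            raise RuntimeError(f"{path}: expected {wanted}, got {actual} after fetch")
        if needs_registration:
            register(path)
        changed.append(path)
    return changed
```

Because files that are already correct are skipped, running the deployment a second time should change nothing, which makes it safe to re-run after a failure.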

The details of this will vary depending on the technology in use (in this case Windows with programs written in VB6 and various versions of VB.NET as well as C++).

There can be some extra complications for things like configuration data, e.g. there needs to be a registry entry with a particular key name, but the value of that entry will be specific to the name of the machine currently in use. But these are not difficult to solve.
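
One simple approach, sketched here in Python, is to keep the key names in the master list and put a placeholder in any value that depends on the machine. The `${machine}` placeholder, the function name and the example keys are all assumptions for illustration; writing the result to the registry is left out.

```python
import platform
from string import Template


def resolve_settings(templates, machine_name=None):
    """Fill ${machine} placeholders in setting values for one machine.

    templates: dict of key name -> value, where a value may contain
               the (hypothetical) ${machine} placeholder.
    machine_name defaults to the name of the machine running the script.
    """
    machine = machine_name or platform.node()
    return {
        key: Template(value).substitute(machine=machine)
        for key, value in templates.items()
    }
```

For example, `resolve_settings({"ServerName": "${machine}", "Mode": "test"}, "TEST01")` gives `{"ServerName": "TEST01", "Mode": "test"}`. Values containing a literal `$` would need escaping as `$$`.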

Just getting the basic automatic deployment working is a huge win. Even if there are several steps required, focus on making this as easy as possible.

Testing a new change still required some manual steps to install the right versions of the files to be tested. But if you are doing this on a known clean baseline, this is not difficult.

Summary

I haven’t gone into all the details here, but hopefully the principles are clear:

  • you need to keep on top of your environments
  • manual processes typically don’t work
  • configuration identification is absolutely vital
  • don’t forget the “extras”: registry settings, services, database configurations and contents
  • look to automate as much as you can – it saves vast amounts of time
  • you don’t need to automate everything in one go – pick the “quick wins” and go from there – but keep looking to improve!