Smart firms expect the unexpected

It doesn’t have to be an act of God; it could just as easily be a system crash or a power cut that corrupts irreplaceable data or destroys bespoke applications and hardware. How much would it cost you?

‘If our data server goes down for more than 10 minutes, our traders will lose more money than you’ll ever make in your life,’ said Mich Talebzadeh, consultant at international finance service provider Deutsche Bank.

If the worst case unfolds, you’ll need some form of disaster recovery plan to get systems and the business back up and running, in some cases at a moment’s notice.

Well-constructed policies will allow for the restoration of computing and telecoms. Associated business continuity plans will determine how your firm keeps functioning until normal facilities are restored, while off-site backup allows for the effective restoration of vital customer data.

Four years ago, Deutsche Bank introduced distributed processing to allow its traders to exchange data on multiple machines globally, and implemented a range of disaster recovery policies.

Talebzadeh realised that, without local servers, traders would log on to one centralised application server. If that server crashed, it would create serious problems.

‘We needed a system that provided a fast response, and one that didn’t rely on a central database in London,’ he explained.

Deutsche Bank looked at introducing localised servers and disaster recovery in April 1998. Five months later it launched a small-scale project linking trading desks in Hong Kong and London. By October, Tokyo was also live.

Such initiatives are rare. Some 60 per cent of UK and 90 per cent of European companies with global revenues in excess of £66m have no formal business continuity plans in place, according to researcher IDC.

Passing the buck
The blame rests at board level, according to Rakesh Kumar, vice president at analyst Meta Group. ‘The executives of large companies are abdicating responsibility,’ he said. ‘They pass it on to IT teams, which do a data backup and assume that disaster recovery is a done deal. But it isn’t.’

Responsible disaster recovery requires hardware investment. Deutsche Bank now runs two global disaster recovery systems. The first links convertible bonds trades between Hong Kong, London, Tokyo and New York. The second links equities cash desks in London and Frankfurt.

Rather than one centralised server, held in one specific location, the bank has a three-tier architecture. Built-in spare capacity allows for easy disaster recovery. ‘We don’t have downtime because redundancy is built in,’ said Talebzadeh.

The primary tier is a web client, powered by Netscape or Internet Explorer. A second application server layer is powered by individual hardware at each location. A final layer runs on a Sybase data server using replication server technology. Additional bespoke early warning systems confirm that all three tiers are running effectively.

Loss of data has meant that Deutsche Bank’s Hong Kong office has had to rely on load balancing twice in the past four years. In such cases, the company asks its Hong Kong traders to fail over to spare capacity on the Tokyo data server. Shrewd planning has allowed the bank to implement an effective disaster recovery policy.
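
The failover described above can be sketched in a few lines of Python. This is only an illustration, not Deutsche Bank’s actual mechanism: the site names come from the article, but the preference order beyond Hong Kong to Tokyo, the health map and the function name choose_server are all assumptions.

```python
from typing import Dict, List, Optional

# Hypothetical preference order: each desk tries its local data server first,
# then spare capacity elsewhere. Only the Hong Kong -> Tokyo fallback is
# described in the article; the other entries are assumed for illustration.
FAILOVER_ORDER: Dict[str, List[str]] = {
    "Hong Kong": ["Hong Kong", "Tokyo", "London"],
    "Tokyo": ["Tokyo", "Hong Kong", "London"],
    "London": ["London", "New York"],
    "New York": ["New York", "London"],
}


def choose_server(desk: str, healthy: Dict[str, bool]) -> Optional[str]:
    """Return the first healthy server in the desk's preference list, or None."""
    for site in FAILOVER_ORDER.get(desk, []):
        # A site missing from the health map is treated as unavailable.
        if healthy.get(site, False):
            return site
    return None


# Example: the Hong Kong data server is down, so its traders fail over to Tokyo.
status = {"Hong Kong": False, "Tokyo": True, "London": True, "New York": True}
print(choose_server("Hong Kong", status))  # Tokyo
```

A real deployment would feed the health map from monitoring such as the early warning systems mentioned above, rather than from a hard-coded dictionary.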

David Jennings, EMC’s European practice leader in business continuity, maintained that effective load balancing is a critical component of any disaster recovery plan.

‘If you can’t effectively rebalance the remaining computing resources, your company will simply waste time and people,’ he said.

Lessons not learned
However, not all IT managers are implementing an effective disaster recovery policy. ‘Everyone panicked after 11 September,’ says Kumar. ‘But after two months, no one had got any further.’

One problem is the lack of integration between the IT team and internal departments. Just over three-quarters of non-IT departments only become aware of the importance of disaster recovery after downtime has occurred, according to IDC.

Rather than relying on IT specialists, Kumar insisted that good disaster recovery is the responsibility of every worker from a range of departments, including human resources, business operations and security.

‘Certain workers will find themselves professionally accountable for what occurs,’ he explained. ‘Most companies have someone responsible for risk management. But most of these managers don’t have the remit to set up policies and plans, and executives are passing the buck.’

Alistair Maughan, a partner at IT lawyers Shaw Pittman, suggested that the starting point for setting a disaster recovery policy is working out who is in charge. ‘Too many people in too many organisations believe that it is someone else’s responsibility,’ he said. ‘It’s a cross-departmental function.’

A concise disaster recovery policy should also encompass the network, and outsourced solutions provide one way for businesses to continue trading after a crisis.

Hosting company Data Lifeline, for example, has set up a 255Mbps fibre circuit around the UK, with 29 points of presence to offer leased-line connections back to its data centre.

Andy Batty, chairman of the Fibre Channel Industry Association in Europe, insisted that companies need to pay more attention to fibre connections.

‘When you look at how much revenue and business a lot of companies regularly lose because of missing data, you realise that a suitable fibre channel-based disaster recovery strategy is a good form of insurance,’ he said.

While outsourcing can provide an answer for complicated network and data backup problems, Kumar warned that there can be problems.

‘If you outsource part of your disaster recovery process, you create a technical barrier between your IT department and this is where errors can occur,’ he said. ‘I’m opposed to partial solutions: outsource the whole lot or try to cover disaster recovery internally.’

Organising priorities
Kumar added that it is important to isolate the most vital applications. He believes that most companies will actually need only 20 per cent of their total applications in the 12 hours following a disaster.

‘Look at your associated data because, if this is lost, this is really lost,’ he said. ‘Ensure that you have a second copy of your data at two different sites that are far away from each other.’
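
Kumar’s rule of thumb about restoring roughly 20 per cent of applications first can be turned into a simple triage exercise. The Python sketch below is a hypothetical illustration: the application names, the criticality scores and the function name first_wave are invented, and a real ranking would come from a business impact assessment.

```python
from typing import List, Tuple


def first_wave(apps: List[Tuple[str, int]], percent: int = 20) -> List[str]:
    """Return the names of the most critical applications.

    apps holds (name, criticality) pairs, where a higher score means more
    critical. percent is the share of applications to restore first.
    """
    if not apps:
        return []
    # Integer ceiling of len(apps) * percent / 100, avoiding float rounding.
    count = (len(apps) * percent + 99) // 100
    ranked = sorted(apps, key=lambda item: item[1], reverse=True)
    return [name for name, _ in ranked[:count]]


# Hypothetical inventory of five applications with made-up scores.
inventory = [
    ("trading", 10),
    ("email", 6),
    ("payroll", 4),
    ("intranet", 2),
    ("archive", 1),
]
print(first_wave(inventory))  # ['trading']
```

The applications that fall outside the first wave still need a recovery plan; the list only decides what comes back in the first 12 hours.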

Paul Hocking, head of infrastructure development at Churchill Insurance, explained that this is advice which his company has always followed. In the early 1990s, it installed an off-site tape storage facility.

Over the past two years, it has introduced two data centres and replication storage technologies. All data backup is managed by the company’s Bromley data centre. Storage specialist Network Appliance’s SnapMirror software mirrors data to a second centre at Biggin Hill Airport in Kent.

Hocking emphasised that replication is a standard element of his company’s IT policy. Rather than buy one standalone server, it bought two servers for each data centre.

‘We constantly ship data between the two sites,’ he said. ‘So if something goes wrong, recovery should be a lot less painful. The user shouldn’t be concerned and shouldn’t see a difference in response.’

Data backup via replication can be expensive. However, Hocking pointed out that the initial outlay on hardware is a price worth paying. ‘We work for other blue chips, and we can’t afford the bad press of taking out other companies,’ he said.

Working out potential risk and required spend is tricky. A medium-sized business outsourcing its continuity plan to a company such as IBM Global Services can expect to pay between £40,000 and £90,000 a year.

While this cover ‘isn’t cheap’, Kumar warned that companies must allocate the right amount of money to prepare for a crisis. Applications, data files and individuals will need to be protected from technology failures and physical disasters.

It never rains …
Bad weather in November 2000 flooded Morphy Richards’ headquarters in South Yorkshire when the river Don burst its banks and left the office under two feet of water.

Thankfully Morphy Richards had introduced a Synstar business continuity plan seven years ago when the company was still running Unix-based systems. It continued to work with Synstar as it moved over to AS/400 servers in 1998.

The annual business continuity contract provides enough equipment to keep Morphy Richards’ head office running in an emergency. Synstar supplies a mirror-image AS/400, terminals, printers, storage and Ethernet hubs.

Morphy Richards’ computer services manager Russell Dalgety stressed that the key to his company’s disaster recovery plan is ensuring that data on the mirror-image AS/400 is regularly updated.

‘We do this every night to ensure that back-office files are saved. We also do a monthly system save,’ he said.
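
The routine Dalgety describes, a nightly save plus a monthly system save, amounts to a small schedule. A minimal Python sketch follows, assuming the monthly save runs on the first day of the month; the article does not say which day is used, and the function name saves_due is hypothetical.

```python
from datetime import date
from typing import List


def saves_due(day: date, monthly_day: int = 1) -> List[str]:
    """List the saves that should run on the night of the given date."""
    # The back-office files are saved every night.
    jobs = ["nightly back-office file save"]
    # Assumption: the full system save runs once a month on monthly_day.
    if day.day == monthly_day:
        jobs.append("monthly system save")
    return jobs


print(saves_due(date(2000, 11, 1)))
print(saves_due(date(2000, 11, 7)))
```

A schedule like this only helps if the saved data is actually loaded onto the standby machine, which is the point Dalgety stresses.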

Regular data updating proved invaluable as the heavy rain fell. Flooding meant that alternative premises had to be found in nearby Rotherham, where Synstar quickly helped set up the mirror-image AS/400 for 150 users.

‘Synstar was with us in five hours and we had the AS/400 system running in 24 hours,’ explained Dalgety. It found a suitable location and helped install the relevant software and network equipment.

‘We were running at about 70 per cent capacity, but it didn’t have a massive impact,’ explained Dalgety. ‘Get a policy and outline specific functions and roles for your workers in an emergency.’

Kumar agreed that concentration on staff roles in a disaster recovery situation is vital. ‘Get your processes documented so that data administration and security workers know what they’re doing,’ he said.

Testing, testing
Human resources is just one element of disaster recovery; if you’re at risk you also have to put the money behind testing to ensure that your technology runs smoothly. Analyst Meta suggests doing at least one worst-case assessment a year.

It’s also important to cover all areas of your technology infrastructure. A thorough disaster recovery policy should include telecoms.

Skipton Building Society’s head office, for example, started subscribing to BT CommSure’s business continuity service in 1998. CommSure provides replacement phones and PBX switches if Skipton is involved in a disaster.

The company also subscribes to BT’s Major Incident Procedure plan, an enhanced version of BT’s normal maintenance programme operated in association with the local BT service centre in Leeds.

On 3 May 2001 Skipton called in BT maintenance workers. Skipton’s network services manager Tony Halsall explained that it had problems ‘a couple of days before and the BT maintenance guys settled them. But on the day it went belly up, it was clear that they weren’t going to get anywhere fast.’

The network services team contacted BT CommSure, which connected cables to a mobile PBX unit in the Skipton car park. The installation was completed at 8.30am the next day, but was never used. By the following morning, the maintenance team had corrected the fault.

Halsall said that it was an invaluable learning experience. ‘The existing BT CommSure contract didn’t provide a like-for-like service and we had to ask the internal departments to prioritise which connections they would need,’ he explained.

Connection prioritisation would have reduced Skipton’s phone capacity to about 35 per cent. Having spent time trying to understand the minimum requirements of his staff, Halsall spent a further two weeks formalising network provision.

After the survey, he and his network services team extended phone provision to give Skipton workers ‘a degree of comfort’.
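
The prioritisation exercise Skipton went through can be illustrated with a short Python sketch. It shares out a reduced number of phone lines in priority order. The department names, line counts and the function name allocate_lines are invented for the example; only the figure of about 35 per cent capacity comes from the article.

```python
from typing import Dict, List, Tuple


def allocate_lines(requests: List[Tuple[str, int, int]], capacity: int) -> Dict[str, int]:
    """Share out a limited number of lines in priority order.

    requests holds (department, priority, lines_wanted) tuples, where a
    lower priority number is served first.
    """
    allocation: Dict[str, int] = {}
    remaining = capacity
    for dept, _, wanted in sorted(requests, key=lambda r: r[1]):
        granted = min(wanted, remaining)
        allocation[dept] = granted
        remaining -= granted
    return allocation


# Hypothetical figures: 100 lines normally, about 35 per cent in an emergency.
requests = [
    ("customer service", 1, 20),
    ("branch support", 2, 10),
    ("marketing", 3, 30),
]
emergency_capacity = 100 * 35 // 100
print(allocate_lines(requests, emergency_capacity))
```

Running the numbers before an incident shows which departments would be left short, which is the kind of gap Halsall’s survey was meant to expose.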

Business continuity allowed Skipton’s network services team to create internal safety and security. Like Skipton, you should ensure that your business has a comprehensive disaster recovery policy, which is supported and tested. It could be the least disastrous decision you ever make.
