Buying VoIP? Here's how to make sure that your vendor provides proper disaster recovery

When you decide to adopt VoIP, ask IPBX vendors many difficult questions about disaster recovery. The unacceptable alternative to asking hard questions is flaky phones!

When you buy a new PBX, you want it to be as resilient as possible, especially given that VoIP technology has a reputation for not being quite as bulletproof as older products.

Disaster prevention, overall reliability, disaster mitigation, and recovery capabilities and performance are therefore extremely important in any RFP you issue. The responding vendor should be required to address these issues as part of the bid proposal.

I have seen too many RFP templates that do an inadequate job of defining the enterprise's requirements. This tip should go a long way toward making vendors respond adequately by addressing their system reliability, disaster mitigation and recovery capabilities.

Some vendors have participated in laboratory testing of their products. If there has been independent testing of the vendor's product reliability, disaster mitigation and recovery capabilities, this information should be part of the bid proposal. If the vendor does not have the data or will not provide it, then the following points should be evaluated during a pilot operation of the system.

Reliability is a key factor in selecting the final vendor. The fewer the hardware and software failures, the less frequently the enterprise has to exercise the recovery capabilities of the product. The following should be part of the RFP:

  • What is the hardware MTBF and MTTR for the proposed system (not a boilerplate example)? Is the answer based on field experience or predicted by calculation and, if calculated, by what method?
  • What is the hardware MTBF and MTTR for the proposed IP phones and gateways? Is the answer based on field experience or predicted by calculation and, if calculated, by what method?
  • What has been the field experience reliability over the past two years of the software for the:

    • Server?
    • Gateways?
    • IP phones?
    • Softphones?


  • What is the anticipated availability of the proposed system combining both hardware and software?
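
When the answers come back, it helps to turn MTBF and MTTR into an availability figure you can compare across bids. The sketch below uses the standard steady-state relationship, availability = MTBF / (MTBF + MTTR), and multiplies hardware and software availability together on the assumption that both must work and that they fail independently. The function names and the sample figures are illustrative assumptions, not vendor data.

```python
HOURS_PER_YEAR = 8760  # 365 days; leap years ignored


def availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Steady-state availability: fraction of time a component is in service."""
    if mtbf_hours <= 0 or mttr_hours < 0:
        raise ValueError("MTBF must be positive and MTTR must not be negative")
    return mtbf_hours / (mtbf_hours + mttr_hours)


def combined_availability(*parts: float) -> float:
    """Availability when every part must work (series model, independent failures)."""
    result = 1.0
    for part in parts:
        result *= part
    return result


def annual_downtime_hours(avail: float) -> float:
    """Expected hours out of service per year at a given availability."""
    return (1.0 - avail) * HOURS_PER_YEAR


# Hypothetical figures for illustration only; substitute the vendor's answers.
hardware = availability(mtbf_hours=50_000, mttr_hours=4)
software = availability(mtbf_hours=10_000, mttr_hours=0.5)
system = combined_availability(hardware, software)

print(f"Hardware availability: {hardware:.5%}")
print(f"Software availability: {software:.5%}")
print(f"System availability:   {system:.5%}")
print(f"Expected downtime:     {annual_downtime_hours(system):.2f} hours/year")
```

Expressing the result as hours of downtime per year tends to be easier to discuss with management than a string of nines. Ask the vendor whether its own availability figure was derived the same way.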

Disaster prevention

The primary goal of the enterprise is to avoid any disasters. The vendor may offer optional capabilities that reduce the impact of a disaster on system operation, or possibly prevent it. Consider the following as potential questions for the RFP:

  • What features and capabilities does the proposed system have for preventing disasters (such as redundant components and fault-tolerant operation)?
  • What happens to the calls in progress when the server is not reachable? Do they drop? Stay up? Can users hang up and initiate new calls? If so, with what features?
  • Are the aforementioned features and capabilities integral parts of the system or are they optional?
  • What are the vendor's recommendations for disaster prevention with the proposed systems?
  • What additional solutions can the enterprise implement to reduce disasters?
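
To judge what a redundant component is worth, you can estimate the availability of several identical units in parallel. The sketch below assumes that the units fail independently and that failover itself never fails, which is optimistic for any real product, so treat the result as an upper bound. The function name and sample value are illustrative assumptions.

```python
def redundant_availability(single: float, copies: int) -> float:
    """Availability of `copies` identical units when any one of them can carry the load.

    Assumes independent failures and perfect failover (an optimistic upper bound).
    """
    if not 0.0 <= single <= 1.0:
        raise ValueError("availability must be between 0 and 1")
    if copies < 1:
        raise ValueError("at least one unit is required")
    # The group is down only when every unit is down at the same time.
    return 1.0 - (1.0 - single) ** copies


# Hypothetical single-server availability, for illustration only.
single_server = 0.99
for count in (1, 2, 3):
    print(f"{count} server(s): {redundant_availability(single_server, count):.6%}")
```

If the vendor's quoted availability for a redundant configuration is well below this bound, ask what failure modes (shared power, shared network, failover software) account for the difference.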

Disaster mitigation and recovery features
Once a disaster occurs, the system has to return to normal operation, and getting it up and running acceptably is the minimum expectation. The vendor may have mechanisms that mitigate the disaster so that some operation can continue. The following questions should be part of the RFP:


  • Can the gateways, IP phones and softphones be dual-registered to the primary and backup server, and is this integral or optional to the products proposed?
  • Is the backup server software an integral part of the system or optional?
  • What are the proposed gateway backup, redundancy and fault-tolerant features and capabilities?
  • Can the gateways proposed connect the legacy phones to the PSTN when the IP network is no longer available?
  • Can the gateways proposed connect the IP phones and softphones to the PSTN when the IP network is no longer available?
  • How do gateways respond to T1, PRI and analog trunk port failures?
  • How do gateways respond to digital and analog phone port failures?
  • How far apart can the primary and backup servers be located? Is the limit set by distance, network delay…?
  • What does the vendor recommend for the number of backup servers -- one to one, one backup for every two primary servers, or other possible configurations?

Recovery performance
How fast the recovery will occur may vary greatly from vendor to vendor. Many of the VoIP/IPT products have been independently tested for recovery response times. These tests have been published and should be referenced by the vendor in the proposal. The following are common measurements for recovery times:

  • How long does the server take to reboot after a power or hardware failure?
  • How long does the backup server take to determine that the primary server is unreachable?
  • How long does the backup server take to assume control of all the gateways, IP phones and softphones on the network after the primary server becomes unreachable?
  • What is the switchover time for a gateway to connect the legacy and IP phones to the PSTN when the IP network is not available?
  • How long does it take for an endpoint to discover that the server is no longer reachable?
  • How long do the phones and gateways take to switch over to a backup server?
  • Does the previous answer depend on network size and, if so, what would the switchover times be for 100 and 1,000 phones?
  • How long does a gateway take to recover from an internal hardware, software or power failure?
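
The answers to these questions can be combined into a single outage figure per vendor. The sketch below assumes that a phone's loss of service is roughly the time it takes to discover that the server is unreachable plus the time it takes to switch over to the backup, and that the two phases do not overlap. The class, field names, vendor names and timings are all hypothetical, chosen only to show the comparison.

```python
from dataclasses import dataclass


@dataclass
class RecoveryTimes:
    """One vendor's answers to the recovery performance questions, in seconds."""
    vendor: str
    detect_s: float      # endpoint discovers that the server is unreachable
    switchover_s: float  # endpoint switches over to the backup server

    def outage_s(self) -> float:
        """Approximate loss of service, assuming the phases run back to back."""
        return self.detect_s + self.switchover_s


def meets_target(times: RecoveryTimes, max_outage_s: float) -> bool:
    """True when the vendor's total outage fits within the enterprise's limit."""
    return times.outage_s() <= max_outage_s


# Hypothetical bids and a hypothetical 60-second requirement, for illustration only.
bids = [
    RecoveryTimes("Vendor A", detect_s=15, switchover_s=30),
    RecoveryTimes("Vendor B", detect_s=45, switchover_s=40),
]
for bid in sorted(bids, key=RecoveryTimes.outage_s):
    verdict = "meets" if meets_target(bid, max_outage_s=60) else "misses"
    print(f"{bid.vendor}: {bid.outage_s():.0f} s outage, {verdict} the 60 s target")
```

Record separate entries for the 100-phone and 1,000-phone cases if the vendor says switchover time depends on network size.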

About the author:
Gary Audin has more than 40 years of computer, communications and security experience. He has planned, designed, specified, implemented and operated data, LAN and telephone networks. These have included local area, national and international networks as well as VoIP and IP convergent networks in the U.S., Canada, Europe, Australia and Asia.
