As VoIP moves from the labs to production environments, the network becomes a more important corporate asset and thus the effect of downtime will be more acute. While companies spend millions on upgrading infrastructure for VoIP, little attention is given to solving the largest source of downtime – configuration-related outages due to human error. A well-defined change management process built around a configuration management system can virtually eliminate the "self-inflicted" errors, which currently account for about 60% of all network outages.
The problem: The blind leading the blind
Most organizations lack network-wide control of configuration baselines and changes, which directly leads to an unnecessarily high number of network outages. This lack of information also impairs a network operations team's ability to quickly find and fix the cause of an outage.
Many organizations are in constant fire-fighting mode, so much of the troubleshooting is done ad hoc. In fact, while troubleshooting, the engineer may make several configuration changes without documenting what has been done. With networks growing rapidly, there is a continual need to establish and maintain baseline configurations and to be able to audit them across all network devices at any time. A collection of ad hoc tools and poor processes cannot do this, which leads to many of the following common problems:
- Configuration drift:
When different individuals make a number of changes to many network elements, device configurations tend to become inconsistent. Elements that are similar in profile end up with widely different configurations, and the baseline for each device is lost.
- Loss of configuration information:
When changes are done ad hoc and the network engineer attempts to document the change after the fact, information is invariably lost.
- Unnecessarily long downtime:
When troubleshooting network problems, it is important that engineers be able to restore a network device to a stable state. This, at least, puts the device in a functional condition while the engineer continues to identify the root cause of the problem. Without a tool to automatically restore the device to the baseline, the device remains down for an extended period, further impacting the business.
- Increased mean time to repair:
Problem isolation and incident repair take much longer because the engineer needs to manually put the device into a stable condition through ad hoc trial and error.
The sum of all of these problems is a higher cost of downtime, longer repair times and overall lower service reliability. The configuration of a network object and the impact the device has on dependent devices are among the first things an engineer investigates during an outage. Without a consistent process for recording device changes, it is almost impossible to correlate these changes manually.
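Configuration drift of the kind described above can be detected by comparing a device's running configuration against its stored baseline. The following is a minimal sketch in Python, assuming both configurations are available as plain text; the function name `config_drift` and the sample configuration lines are hypothetical and are not taken from any particular product.

```python
import difflib


def config_drift(baseline: str, running: str) -> list[str]:
    """Return the lines removed from ('-') or added to ('+') the baseline."""
    diff = difflib.unified_diff(
        baseline.splitlines(),
        running.splitlines(),
        fromfile="baseline",
        tofile="running",
        lineterm="",
    )
    # The first two lines are the '--- baseline' / '+++ running' header pair.
    body = list(diff)[2:]
    # Keep only changed lines; skip '@@' hunk markers and unchanged context.
    return [line for line in body if line.startswith(("+", "-"))]


# Hypothetical sample configurations for one device.
baseline = "hostname edge-1\nntp server 10.0.0.1\nsnmp-server community ops RO"
running = "hostname edge-1\nntp server 10.0.0.9\nsnmp-server community ops RO"

for line in config_drift(baseline, running):
    print(line)
```

An empty result means the device still matches its baseline. Running the same check across every device of a given profile is one way to audit the network at any moment.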
The solution: Network configuration management
In the FCAPS (fault, configuration, accounting, performance and security) model for network management, the "C" is often overlooked. Most of the network management vendors focus on everything but the configuration aspect, so the bulk of configuration management tools have been delivered by the equipment vendors. The vendor-supplied tools provide some strong features, but the scope is limited to that particular vendor. As a result, most network managers require many configuration management tools to support the entire network, making correlation of information very difficult.
However, there are a number of independent vendors, such as Intelliden, Voyence, Opsware and Tripwire, that offer multi-vendor products that can be used as the focal point for the change management process.
I can't express strongly enough how important this aspect of running a network is for supporting real-time applications such as VoIP. I've talked to many organizations that have gone through the laborious task of deploying VoIP only to have the implementation suffer due to poor change management processes and tools. As companies build more automation into the network, manageability will be a key to success, and configuration management is what enables it.
Companies implementing a configuration management strategy will realize the following benefits:
- Faster, more accurate configurations and changes:
A multi-vendor tool can be used as a centralized resource for faster changes and provisioning, leading to more manageable devices.
- Configuration change tracking and more accountability:
Network engineers can see when changes are made and who made them. Companies also gain the ability to detect unauthorized changes and tie each configuration change to a particular individual.
- The ability to return the device to a known state:
Most products today can roll the device back to the previous state or to a predefined one. Without this, engineers usually rely on TFTP servers or simple cut and paste from configurations stored on laptops.
Overall, companies will see the benefit of a more consistent, uniform set of configurations that are easier to troubleshoot and maintain. Also, by removing the ad hoc configuration changes, the majority of self-inflicted errors will go away.
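To make the change-tracking and rollback benefits above concrete, here is a minimal sketch of a per-device change log, assuming configurations are stored as text in memory. The class names `ConfigVersion` and `DeviceHistory` are hypothetical; a real product would also persist the log and push the restored configuration to the device.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass(frozen=True)
class ConfigVersion:
    config: str          # full configuration text
    author: str          # who made the change
    timestamp: datetime  # when the change was recorded (UTC)


@dataclass
class DeviceHistory:
    """Hypothetical in-memory change log for a single device."""
    versions: list[ConfigVersion] = field(default_factory=list)

    def record(self, config: str, author: str) -> ConfigVersion:
        """Append a new version, attributed to its author."""
        version = ConfigVersion(config, author, datetime.now(timezone.utc))
        self.versions.append(version)
        return version

    def current(self) -> ConfigVersion:
        return self.versions[-1]

    def rollback(self, author: str) -> ConfigVersion:
        """Re-apply the previous configuration as a new, attributed version."""
        if len(self.versions) < 2:
            raise ValueError("no earlier version to roll back to")
        previous = self.versions[-2]
        return self.record(previous.config, author)


history = DeviceHistory()
history.record("ntp server 10.0.0.1", "alice")  # known-good state
history.record("ntp server 10.0.0.9", "bob")    # ad hoc change
history.rollback("carol")                       # restore the known-good state

for v in history.versions:
    print(v.timestamp.isoformat(), v.author, v.config)
```

Recording the rollback as a new version, rather than deleting the bad one, keeps the audit trail intact: the log still shows who introduced the change and who reverted it.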
What to look for in a product
These products offer many features and functions, and your criteria will differ from those of other companies, but here are the main things I would recommend looking for:
- The ability to support as broad a variety of vendors as you need
- Rollback and restoration tools
- The ability to create an event based on a configuration change
- An intuitive GUI for ease of use
- Roles-based access and permissions
- The ease of exporting the information to other systems
I've said this before, but don't underestimate the role this can play in the long-term success of running a network capable of supporting real-time applications such as VoIP. It's realistic to expect a 20% improvement in the efficiency of the network operations team and a 25% reduction in overall mean time to repair. Think of it this way: it's the same impact as adding one headcount for every five in network operations. More importantly, it will allow your network operations staff to scale as more network-dependent applications are deployed.
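The arithmetic behind those figures is simple enough to check. The sketch below uses the 20% and 25% estimates from the text; the team size of ten and the four-hour starting mean time to repair are hypothetical inputs chosen only for illustration.

```python
def equivalent_headcount(team_size: int, efficiency_gain: float) -> float:
    """Extra capacity, in people, produced by an efficiency gain."""
    return team_size * efficiency_gain


def reduced_mttr(mttr_hours: float, reduction: float) -> float:
    """Mean time to repair after a fractional reduction."""
    return mttr_hours * (1 - reduction)


# 20% more efficiency on a team of five is worth one additional person.
print(equivalent_headcount(5, 0.20))
# Hypothetical team of ten: the same gain is worth two people.
print(equivalent_headcount(10, 0.20))
# Hypothetical four-hour MTTR, reduced by 25%.
print(reduced_mttr(4.0, 0.25))
```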
So, add implementing a configuration management tool to your New Year's resolutions! Happy New Year!
About the author:
Zeus Kerravala is senior vice president of Yankee Group's infrastructure research and consulting. His areas of expertise involve working with customers to solve their business issues through the deployment of infrastructure technology solutions, including switching, routing, network management, voice solutions and VPNs.
Before joining Yankee Group, Kerravala was a senior engineer and technical project manager for Greenwich Technology Partners, a leading network infrastructure and engineering consulting firm. Prior to that, he was a vice president of IT for Ferris, Baker Watts, a mid-Atlantic based brokerage firm, acting as both a lead engineer and project manager deploying corporate-wide technical solutions to support the firm's business units. Kerravala's first task at FBW was to roll out a new frame relay infrastructure with connections to branch offices, service providers, vendors and the stock exchange. Kerravala was also an engineer and technical project manager for Alex. Brown & Sons, responsible for the technology related to the equity trading desks.
Kerravala obtained a B.S. degree in physics and mathematics from the University of Victoria (Canada). He is also certified by Citrix and NetScout.