
VoIP Networks: The Art of VoIP Troubleshooting


February Issue, Business Communications Review

VoIP Troubleshooting … The statement conjures up the thought of troubleshooting a traditional TDM-based network, troubleshooting in the traditional voice sense of the word, or fears of the unknown and the changes your organization will require to adapt to such an environment. 

VoIP troubleshooting requires a holistic, visionary view of Telecommunications. I am sure you have heard many times from vendors at trade shows: “voice is just another application on the data network”. In fact, it is. Voice packets in an IP world need to be prioritized, and the rules that apply to managing a data network now apply to the voice network as well.

Some individuals think of VoIP Troubleshooting as the end game, as an operational area only. How untrue! The operational phase is only the third phase, in my opinion, of a three-pronged approach to developing a complete VoIP troubleshooting model. 

As the above diagram indicates, VoIP now requires a fully converged organization, best practices associated with design and deployment, and converged IT Operations and troubleshooting.

 

IT Organizational Structure - The Converged Organization

 

Prior to deploying a VoIP network (WAN and IP Telephony), the IT organization as a whole requires a fresh look at the current organization and skills necessary for a fully supported VoIP model. VoIP is a truly converged environment based on extensive knowledge of LAN, WAN, network management, and security.

A traditional TDM infrastructure runs parallel to the data infrastructure; the converged environment is exactly that, converged. Separate networks for switched voice and packet data are now combined using packet technology. Companies must now migrate from parallel, independent teams to fully converged teams that jointly support both the voice and data infrastructure.

New processes and procedures must be developed around such a converged environment. Organizational models will vary, but the converged network should include disciplines from the following areas:

  • The data network group – brings the data expertise and current data infrastructure baseline environment together

  • The voice network group – brings the telephony environment, applications, carrier-related functions, and carrier billing to the network

  • The project management group – facilitates deployment of the overall converged network and manages the details associated with the successful migration

  • The operations group – brings the ongoing support elements and processes to the network and ongoing management of servers and VoIP applications

  • The help desk group – brings the help desk function, now cross-trained for voice and data troubleshooting and supplied with the necessary tools to support such an environment.

It is critical, within all of the group functions listed above, that cross training and cross-involvement of all voice and data disciplines be presented to the groups at every opportunity. 

Finally, change control/change management is the largest change in the organization. It requires the ability to track and monitor all system changes and to determine whether a given change affects other devices on the network. Regular monthly meetings to discuss any changes pending within the quarter will help facilitate this function.

 

Best Practices Design/Deployment

 

The single most practical way of troubleshooting today’s VoIP network is to avoid the trouble to begin with. Developing a network design that is resilient, redundant, and cost effective provides this ability. One should consider a five 9s (99.999%) uptime approach wherever possible when designing the new network.

Traditional TDM PBXs are built around a five 9s model (99.999%), or an outage of roughly 5 minutes per year, while WANs are commonly built around a three 9s model (99.9%), or just under 9 hours annually. The difference is significant. Dial tone is a God-given right and is “always on”. To manage a network to less than this is an attempt to change business culture, a significant feat at best.

The newly converged network must live up to the challenge of a converged environment that approaches the five 9s model wherever possible. 
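The gap between the five 9s and three 9s models is easy to check with a little arithmetic. The following is a minimal sketch, assuming an average year of 365.25 days; the function name is illustrative only.

```python
# Assumption: an average year of 365.25 days (leap days included).
MINUTES_PER_YEAR = 365.25 * 24 * 60


def annual_downtime_minutes(availability_percent: float) -> float:
    """Return the downtime per year, in minutes, implied by an availability figure."""
    if not 0.0 <= availability_percent <= 100.0:
        raise ValueError("availability must be between 0 and 100 percent")
    return MINUTES_PER_YEAR * (1.0 - availability_percent / 100.0)


if __name__ == "__main__":
    for label, pct in [("three 9s", 99.9), ("four 9s", 99.99), ("five 9s", 99.999)]:
        minutes = annual_downtime_minutes(pct)
        print(f"{label} ({pct}%): {minutes:.1f} minutes/year ({minutes / 60:.2f} hours)")
```

Under that assumption, five 9s works out to a little over 5 minutes per year and three 9s to just under 9 hours, a difference of roughly a hundredfold.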

In order to prepare for a VoIP-enabled network deployment, a VoIP Assessment first needs to be performed. This will identify the current environment and network goals, including at least the following:

  • Review of the current switch/router environment

  • Review of current and planned data network for topology and bandwidth

  • Evaluation of policies related to possible security vulnerabilities

  • Determination of acceptable level of risk in relation to cost

  • Consideration of the physical environment in terms of vulnerability to break-ins

  • Use of VoIP assessment tools to identify bandwidth and segmentation needs for the end-state network

When designing the converged network infrastructure consider:

  • Upgrade or replacement of routers, LAN switches – as necessary to accommodate QoS (switches, routers), PoE (switches), Layer 3 protocols, and multiple VLANs.

  • Centralized vs. decentralized models – consider centralized or decentralized voice mail, voice servers/CPUs, PSTN at each site, and Unified Communications.

  • Security measures – VoIP-enabled networks are subject to the same issues a data-only network has. Be sure to include in the security measures:

    • Access Control Lists (ACLs) and voice-enabled firewalls and intrusion detection/prevention systems, ensuring that system access is restricted to eligible users.

    • For Virtual Private Network (VPN) remote worker applications, encryption technologies need to be identified across the enterprise.

    • Network Address Translation (NAT) hides actual IP addresses, and strong authentication should be used to secure VoIP gateways. NAT requirements will vary depending on the manufacturer used.

  • Traffic engineering – Properly size the voice and video port allocation and bandwidth per site using traffic engineering models that account for bandwidth-related characteristics, busy hour, expected growth, and seasonal variation, together with anticipated data bandwidth for nightly back-ups, peak file transfer periods, and other demands, so that the network's total voice and data bandwidth requirements are met.

  • The CODEC to be used – You will need to determine the CODEC(s) to be used at the LAN and WAN levels, choosing primarily between best quality and best speed to match your network goals and bandwidth requirements. Consider half-duplex vs. full duplex (full duplex is recommended for best use of bandwidth) and aim for the highest possible Mean Opinion Score (MOS). Common CODECs and associated MOS scores include:

  • MOS – Mean Opinion Score (ITU P.800) – 4.0+ is considered toll-quality speech. The two most widely used CODECs are G.729 (best speed) and G.711 (best quality). You should allocate at least 30 kbps, including header overhead, per voice channel using a G.729 CODEC, and at least 80 kbps using a G.711 CODEC at full duplex.

  • Data network topology and structure – Options include MPLS, ATM, frame relay, or an Internet-based IPVPN network. It is particularly important to use a network topology that provides QoS. Note that IPVPNs provide best effort only and typically cannot be attached to a QoS-based SLA.

  • Features/technologies to be deployed – These may include, but are not limited to:

    • Standard telephony features

    • Contact center features & Web Agents

    • Integration of all voice, FAX, and e-mail messages in a centralized environment

    • Network Management

    • LDAP directories, nodal and network element management, SNMP, and traffic reporting

    • Web-based administration

    • IP audio conferencing

    • IP videoconferencing

    • Unified Communications for converged desktop, presence, IM/chat, MS Live Communications Server, other

    • Network number portability

    • Simplified network routing

    • Softphones

    • Remote hop-offs

    • Remote offices, remote workers

    • Virtual office applications

    • IP trunking

    • Centralized control of soft move and changes, reducing ongoing costs considerably.

  • UPS requirements – PoE is now a local closet concern rather than a centralized power concern (TDM). Many larger TDM environments ensure uptime of at least 4 hours (in the event of a power outage) with battery backup. The new, localized PoE environment should match (or exceed) the current environment using the appropriate UPS. Electrical and HVAC requirements may need modification in line with this as well.

  • Network Management Tools – one of the most critical components in the design of the new/upgraded network is a robust set of network management tools. Some of the tools are native to the manufacturer (Cisco, for example), while other products are third party and provide specific tools and data to help facilitate the converged voice/data network. Third-party offerings include NetIQ Chariot/Attachmate, HP OpenView w/VoIP Probe, Empirix Hammer, Fluke Enterprise LANMeter, and Finisar Explorer, among others. Tools should provide fault management, configuration management, performance management, and security management. Don’t underestimate the importance of tools like these – data network-only tools will only provide data-centric information and will not provide the ability to isolate VoIP-related problem areas quickly. Tools can either be purchased or provided as a managed service by the chosen carrier. Network management tools should thoroughly measure all end-points on the network for availability, latency, packet loss, and jitter, and perform traditional network management functions such as auto-detect of network devices, continuous ping, trace route, DNS lookup, network scans, and SNMP. Below you will find typical SLAs required to maximize voice quality on a VoIP network over the WAN infrastructure that should be measured as part of the network management model:

 

  • Network Health Check – ALL VoIP network implementations require a network health check – different tools have different requirements – all will test the network and populate simulated voice traffic to check the health and readiness of the network.  The results of a network health check will indicate any areas where there is a network deficiency and will grade each site tested for appropriate MOS scores. Any MOS score at 3.9 or above will pass the acid test and will provide acceptable voice quality.

  • Deployment of sniffers, TFTP servers, and syslog servers at all sites – Deploy these at all sites to measure UDP and other traffic populating the network. Sniffers can be deployed to mirror specific voice-affiliated LAN ports or entire PBXs and can measure IP traffic within a specific period. The type of traffic populated during a network event can be better isolated using sniffers at all critical corporate sites. Ethereal (since renamed Wireshark) is one example of an open-source protocol analyzer; it is available as a free download (www.ethereal.com) and can monitor these packets using a stand-alone PC. Ethereal runs on all popular computing platforms, including Unix, Linux, and Windows. TFTP servers are used for storing configuration files and software images for network devices. Routers and switches are capable of sending system log messages to a syslog server. Both facilitate the troubleshooting function when problems are encountered and can be used to perform a root cause analysis (RCA) when required.

  • The chosen manufacturer and channel partner(s) – Using an RFP with key specification and question criteria, you will be able to identify the SLAs, knowledge, certifications, knowledge base, geographic presence, partnership status, and escalation procedures for troubleshooting and getting to root cause quickly. Questions to consider:

    • How many IPT systems has the channel partner installed?

    • What level of distribution is the channel partner (Silver, Gold, Platinum etc.)?

    • What is the profile of the engineering and field staff and what credentials/certifications do they carry?

    • What is the channel partner’s strategy for deploying VoIP across multiple sites consistently to the manufacturer’s standards?

  • Staff Training – staff training and cross training voice-to-data and data-to-voice is critical in a converged network model. Certification in specific technical disciplines is highly encouraged over time.

  • Other considerations – Further items to include when designing the VoIP network:

    • Redundant or back-up WAN links

    • Disaster Recovery models – LAN, WAN, PSTN, virtual office, IP phone reroute to second PBX, hot site

    • Redundant data switches, stacked where possible

    • Separate VLANs to reduce the amount of broadcast traffic the IP phone will receive, minimizing network collisions

    • A robust IP scheme that includes IP addressing for all voice clients, data clients, servers (voice and data), routers, switches, printers, scanners, and other devices.
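The traffic engineering and CODEC items in the design list above can be tied together with a rough sizing calculation. The sketch below is one common approach, not necessarily the model your vendor or carrier will use: it applies the Erlang B formula to turn a busy-hour load into a channel count, then multiplies by a per-call bandwidth figure. The 20 ms packetization interval, the 40-byte IP/UDP/RTP header, the 1% blocking target, and all function names are assumptions made for illustration.

```python
# Assumption: 40 bytes of IPv4 (20) + UDP (8) + RTP (12) headers per packet.
IP_UDP_RTP_HEADER_BYTES = 40

# Codec bit rates in kbps; a 20 ms packetization interval is assumed for both.
CODEC_KBPS = {"G.711": 64, "G.729": 8}
PACKET_INTERVAL_MS = 20


def per_call_kbps(codec: str, layer2_overhead_bytes: int = 0) -> float:
    """Bandwidth of one call in one direction, in kbps, including headers."""
    payload_bytes = CODEC_KBPS[codec] * PACKET_INTERVAL_MS / 8
    packets_per_second = 1000 / PACKET_INTERVAL_MS
    packet_bytes = payload_bytes + IP_UDP_RTP_HEADER_BYTES + layer2_overhead_bytes
    return packet_bytes * 8 * packets_per_second / 1000


def erlang_b(offered_erlangs: float, channels: int) -> float:
    """Blocking probability for an offered load on a given number of channels."""
    blocking = 1.0
    for n in range(1, channels + 1):
        # Standard iterative form of the Erlang B recurrence.
        blocking = offered_erlangs * blocking / (n + offered_erlangs * blocking)
    return blocking


def channels_needed(offered_erlangs: float, target_blocking: float = 0.01) -> int:
    """Smallest channel count whose blocking probability meets the target."""
    if not 0.0 < target_blocking < 1.0:
        raise ValueError("target_blocking must be between 0 and 1")
    channels = 0
    while erlang_b(offered_erlangs, channels) > target_blocking:
        channels += 1
    return channels


def site_voice_kbps(busy_hour_erlangs: float, codec: str,
                    target_blocking: float = 0.01,
                    layer2_overhead_bytes: int = 0) -> float:
    """Per-direction voice bandwidth to reserve for a site, in kbps."""
    calls = channels_needed(busy_hour_erlangs, target_blocking)
    return calls * per_call_kbps(codec, layer2_overhead_bytes)


if __name__ == "__main__":
    # Hypothetical site offering 10 erlangs of voice traffic in the busy hour.
    for codec in ("G.729", "G.711"):
        print(codec, per_call_kbps(codec), "kbps per call,",
              site_voice_kbps(10.0, codec), "kbps for the site")
```

With these assumptions, a G.711 call comes to 80 kbps at the IP layer, and a G.729 call to 24 kbps, or about 31 kbps once 18 bytes of Ethernet framing are added, which is in line with the allocation guidance given above. Expected growth, seasonal peaks, video, and data demands still need to be added on top of the voice figure.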

 

Converged IT Operations and Troubleshooting

 

Finally, the Converged IT Operations group will perform VoIP troubleshooting on an ongoing basis. Network tools will be used in a converged environment and provide automatic notification via e-mail, pager, cell, or home phone when network parameters are measured outside the SLAs established by the Network Management team. The following areas should be monitored on a 24x7 basis by the selected tools:

  • The IP PBX – The chosen vendor should include Site Event Buffers (SEBs) and should automatically notify the VAR's NOC if any system-related alarm conditions arise (T1 down, CPU down, etc.). NOCs will typically notify the individual on call when an alarm event “hits”.

  • The WAN – The chosen carrier will typically be able to provide managed network services, some of which cover VoIP over the WAN. For SLA and network integrity purposes, we recommend such services for managing a multi-site network.

  • The Network Infrastructure – The Network Monitoring Tools identified earlier help monitor network characteristics, including network availability, available bandwidth, packet loss, delay, and jitter.
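The idea of notifying someone when a measurement falls outside the SLA can be sketched in a few lines. This is an illustration only: the threshold values (150 ms one-way latency, 30 ms jitter, 1% packet loss) are commonly quoted planning figures assumed here, not values taken from this article or from any particular carrier's SLA, and the class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VoiceSla:
    """Assumed example thresholds; substitute the values from your own SLA."""
    max_one_way_latency_ms: float = 150.0
    max_jitter_ms: float = 30.0
    max_packet_loss_percent: float = 1.0


def sla_violations(latency_ms: float, jitter_ms: float, loss_percent: float,
                   sla: VoiceSla = VoiceSla()) -> list:
    """Return a human-readable message for each measurement outside the SLA."""
    violations = []
    if latency_ms > sla.max_one_way_latency_ms:
        violations.append(
            f"latency {latency_ms:.1f} ms exceeds {sla.max_one_way_latency_ms:.1f} ms")
    if jitter_ms > sla.max_jitter_ms:
        violations.append(
            f"jitter {jitter_ms:.1f} ms exceeds {sla.max_jitter_ms:.1f} ms")
    if loss_percent > sla.max_packet_loss_percent:
        violations.append(
            f"packet loss {loss_percent:.2f}% exceeds {sla.max_packet_loss_percent:.2f}%")
    return violations


if __name__ == "__main__":
    # Hypothetical measurements for two sites, as a monitoring tool might report them.
    samples = {
        "headquarters": (45.0, 8.0, 0.1),
        "branch-office": (210.0, 42.0, 0.4),
    }
    for site, (latency, jitter, loss) in samples.items():
        for message in sla_violations(latency, jitter, loss):
            # A real deployment would hand this to e-mail or pager notification.
            print(f"ALERT {site}: {message}")
```

Commercial network management tools perform this comparison internally; the point of the sketch is that thresholds must be agreed on and recorded before automatic notification can mean anything.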

PEP and patch updates need to be scheduled on a regular basis to address known issues and prevent new issues from arising. For consistency, all VoIP PBXs and switches across the network should run the same current release of PEPs and patches.

The following table lists common problems and whether each occurs intermittently, periodically, or continuously (www.voiptroubleshooter.com):

 

A Couple of Real-World Troubleshooting Examples

 

  • Dropped Calls

    • Description – Calls drop or are disconnected mid-call

    • Root Cause – Bad T1 card in PBX

    • Solution – Temporarily shut down T1 circuit and card until replacement arrives; route all calls through second T1 card and analog overflow

  • Poor Call Quality for Remote Worker

    • Description – Broken or choppy speech using remote worker hard phone or softphone

    • Root Cause – Not enough bandwidth during peak period; remote VPN software and VPN router required fine tuning

    • Solution – Added bandwidth for remote worker, modified VPN software and VPN router configurations, changed from the best-speech CODEC to the best-bandwidth CODEC.

  • No connectivity from IP Phones to IP-PBX

    • Description - All IP Phones down in the branch office

    • Root Cause – Bad Layer 3 data switch; the key cable connecting the PBX to the data side was attached to that switch, and all data clients and printers were also down.

    • Solution – Moved the connector cable to a similar VLAN port on the second switch in the stack

  • IP Phone reset

    • Description – IP Phone(s) reset/reboot

    • Root Cause – Broadcast storm exceeded the set tolerance levels.

    • Solution – Used a sniffer to isolate the level of broadcast traffic, reviewed VLANs for traffic separation, and obtained the latest patches for the IP PBX and IP Phones, which further insulate against this issue.

 

Conclusion

 

After reading this, you may say to yourself, is this all really worth it?  From my personal experience, the answer is a resounding Yes. The outcomes include:

  • A converged, functional IT organization with cross-trained staff, rather than one that is disparate,

  • A more robust, more resilient data network capable of handling voice, data, and video in a single IP environment, capable of approaching five 9s reliability,

  • More robust network management tools capable of centrally managing and reporting all data and voice components,

  • Features and technologies tailored to an IP environment, including remote workers, softphones, Unified Communications, LDAP directories, and IP trunking among others,

  • ROI that takes advantage of a converged infrastructure, reduced conferencing costs, reduced long distance costs, efficient soft moves and changes, and reduced costs for remote workers.

The old cliché “failing to plan is planning to fail” could never be more appropriate for a transition such as this. Make sure you plan to troubleshoot VoIP through organizational change and best-practices design and deployment, and the troubleshooting process will be a whole lot easier.
