QoS - Really?
One of the most overlooked areas of a Unified Communications (UC) deployment, in my opinion and in our experience with several clients, comes down to one acronym: QoS, or Quality of Service. By design, QoS prioritizes all real-time, latency-sensitive voice and video traffic ahead of traffic that is not time-sensitive (most data traffic). Simply put, without QoS, voice and video are the first types of traffic to suffer when the network experiences a spike, an anomaly, or any degradation in quality.
Although VoIP and UC have been available for more than 15 years, QoS remains the most frequent source of Unified Communications issues, despite the technology’s maturity.
Our Firm’s Experience
Here is some of our firm’s experience with and without QoS:
Voice and Video Real Time – Voice and video are real-time, while most other technologies, including email, SMS, file downloads, and image transfers, are not. In my experience, by the time an employee notices some lag in email, the IT Help Desk will already have received a dozen or more “Priority 1” tickets for voice outages or voice quality problems from users unable to carry on a conversation with a customer or colleague.
Bandwidth Required – Voice, and video especially, consume significant bandwidth, and that consumption must be accounted for in circuit planning.
With the G.711 codec, each voice call consumes roughly 85-95 kbps in each direction, depending on the headers and trailers the vendor uses (roughly 30-35 kbps with G.729), and H.264 HD video consumes anywhere from 1 to 1.3 Mbps, including headers and trailers.
In our experience, voice typically consumes between 10% and 15% of the total bandwidth of a Layer 2 Ethernet or Layer 3 MPLS circuit, and adding video, depending on usage (internal and external), can consume as much as 50-60% of the total bandwidth of an existing circuit. (We perform traffic engineering for our clients to determine the total bandwidth required.)
For a combined voice, video, and data infrastructure, additional bandwidth will likely be a must at some point, particularly given the expanding "viral" demand for video throughout the enterprise. Public-side SIP trunking will also increase voice traffic over the WAN considerably, in some cases by 50% or more.
Many enterprises assume that throwing enough bandwidth at a circuit solves the QoS problem. It does not. Additional bandwidth reduces how often QoS-related issues occur, but it does not eliminate them. Without QoS, a network running at 80% or more utilization during a peak period will begin to show voice and video degradation and lag.
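The per-call figures above follow from simple packet arithmetic. The sketch below is a rough estimate only, assuming IPv4, a 20 ms packetization interval, 40 bytes of RTP/UDP/IP headers, and 18 bytes of untagged Ethernet framing; real deployments vary with vendor settings, 802.1Q tags, and tunnel or MPLS overhead. The function names are illustrative, not from any particular tool.

```python
# Rough per-call voice bandwidth estimate (one direction).
# Assumptions: IPv4 without options, untagged Ethernet, no tunnel/MPLS overhead.
RTP_UDP_IP_BYTES = 12 + 8 + 20   # RTP + UDP + IPv4 headers
ETHERNET_BYTES = 14 + 4          # Ethernet II header + frame check sequence


def per_call_kbps(codec_kbps, ptime_ms=20, l2_bytes=ETHERNET_BYTES):
    """Bandwidth of one call in one direction, including headers, in kbps."""
    packets_per_second = 1000 / ptime_ms
    payload_bytes = codec_kbps * 1000 / 8 / packets_per_second
    frame_bytes = payload_bytes + RTP_UDP_IP_BYTES + l2_bytes
    return frame_bytes * 8 * packets_per_second / 1000


def share_of_circuit(concurrent_calls, kbps_per_call, circuit_mbps):
    """Fraction of a circuit consumed by a given number of concurrent calls."""
    return concurrent_calls * kbps_per_call / (circuit_mbps * 1000)


for codec, rate_kbps in [("G.711", 64), ("G.729", 8)]:
    print(f"{codec}: {per_call_kbps(rate_kbps):.1f} kbps per call")

# Hypothetical example: 20 concurrent G.711 calls on a 10 Mbps circuit.
print(f"{share_of_circuit(20, per_call_kbps(64), 10):.1%} of the circuit")
```

Under these assumptions the sketch yields about 87.2 kbps for G.711 and 31.2 kbps for G.729, consistent with the ranges above; adding a 4-byte 802.1Q tag or changing the packetization interval shifts the result.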
Minimum Characteristics for Voice and Video – Layer 2 or Layer 3 circuits need to meet the following minimum criteria. These are minimum circuit characteristics and DO NOT replace voice and video prioritization. (Some enterprises believe that having more than enough bandwidth, plus network characteristics that are "within spec," means the network is not exposed. In our experience, this helps but does not solve the problem for a network carrying voice and video traffic.)
Availability/Uptime - 99.9% to 99.99% (three or four nines)
Latency - < 120ms
Jitter - < 3ms
Packet Loss - < 1%
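One way to apply these minimums is to check measured circuit statistics against them before go-live. The sketch below hard-codes the thresholds listed above; the `CircuitMetrics` class and `failed_criteria` function are hypothetical names, and the measured values would come from whatever monitoring or assessment tool you use.

```python
from dataclasses import dataclass

# Minimum circuit characteristics from the list above.
MIN_AVAILABILITY_PCT = 99.9
MAX_LATENCY_MS = 120.0
MAX_JITTER_MS = 3.0
MAX_PACKET_LOSS_PCT = 1.0


@dataclass
class CircuitMetrics:
    availability_pct: float
    latency_ms: float
    jitter_ms: float
    packet_loss_pct: float


def failed_criteria(m: CircuitMetrics) -> list[str]:
    """Return the criteria this circuit misses; an empty list means it meets all of them."""
    failures = []
    if m.availability_pct < MIN_AVAILABILITY_PCT:
        failures.append("availability")
    if m.latency_ms >= MAX_LATENCY_MS:
        failures.append("latency")
    if m.jitter_ms >= MAX_JITTER_MS:
        failures.append("jitter")
    if m.packet_loss_pct >= MAX_PACKET_LOSS_PCT:
        failures.append("packet loss")
    return failures


# Hypothetical measurements for a single site circuit.
site = CircuitMetrics(availability_pct=99.95, latency_ms=85.0, jitter_ms=4.2, packet_loss_pct=0.3)
print(failed_criteria(site))
```

Passing these checks is necessary but not sufficient: a circuit that is within spec still needs voice and video prioritization.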
Most Frequent Root Cause – The most frequent root cause of a voice or video issue is the network, not the UC software or hardware. Therefore, configuring QoS policies on the Layer 2/3 switches and enabling QoS on the MPLS network are key to delivering effective voice and video. In nearly every case we have experienced, the Root Cause Analysis (RCA) traced the problem to a network issue combined with a lack of QoS, which affects voice and video first.
Class of Service and VLANs – We recommend the following:
Prioritizing voice as COS 1 (Class of Service 1) and video as COS 2, based on voice frequency and bandwidth utilization
Using dedicated voice and video Virtual LANs (VLANs) to segregate real-time traffic from data traffic
QoS at All Touch Points – QoS is necessary at every touch point of the network: the LAN, the WAN, vendor-side routers, customer-side routers and Layer 3 switches, SIP trunks, SBCs, on-net Layer 2 and Layer 3 switches, and all endpoints.
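Switches and carrier networks typically classify real-time traffic by the DiffServ Code Point (DSCP) marked in each packet's IP header, then map those markings into their own classes, such as the COS 1 and COS 2 recommended above. The sketch below shows an endpoint application marking its own UDP media socket. The DSCP values (EF, 46, for voice and AF41, 34, for interactive video) follow common RFC 4594 conventions and are an assumption here; use whatever markings your carrier's COS mapping expects. Setting `IP_TOS` this way works on Linux and macOS but may be ignored on Windows, where policy-based QoS is the usual mechanism.

```python
import socket

# Assumed DSCP markings (RFC 4594-style); confirm against your carrier's COS mapping.
DSCP_BY_CLASS = {
    "voice": 46,  # EF (Expedited Forwarding)
    "video": 34,  # AF41 (interactive video)
    "data": 0,    # best effort
}


def tos_byte(traffic_class: str) -> int:
    """DSCP sits in the upper six bits of the IPv4 DiffServ (former TOS) byte."""
    return DSCP_BY_CLASS[traffic_class] << 2


def marked_udp_socket(traffic_class: str) -> socket.socket:
    """Open a UDP socket whose outgoing packets carry the class's DSCP marking."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_TOS, tos_byte(traffic_class))
    return sock


with marked_udp_socket("voice") as media_sock:
    print(media_sock.getsockopt(socket.IPPROTO_IP, socket.IP_TOS))
```

Marking at the endpoint is only half the job: every switch and router along the path must be configured to trust (or re-mark) these values and to queue on them.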
Network Assessments – Network Assessments are an absolute must to ensure that a network is QoS-ready and free of anomalies. They use software tools designed to test for QoS readiness before a UC environment goes into full production. Network Assessments run for a 24-hour period, simulate peak voice and video traffic, and provide rich feedback on network readiness. Reports are generated automatically, provide 50+ pages of detail about the network, and give the network a pass/fail grade for QoS. In many cases, the root cause of a failure is quick to identify and almost painless to correct; the anomaly is cited and corrected before going live in a full production environment. After correction, we recommend running the Network Assessment again to get a green-light passing grade the second time. We also recommend a temporary freeze on change control until after Go Live, to minimize the chance of introducing issues beyond those covered by the Network Assessment. Interestingly, many VARs delivering UC implementations no longer perform Network Assessments. In my opinion, that is shortsighted, so we purchased Network Assessment software some time ago and now provide assessments as a service to our clients when a vendor does not. Network Assessments can be performed for premises, cloud, and hybrid models.
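Assessment tools differ in how they measure, but jitter and packet loss for a test stream are commonly derived from packet sequence numbers and timestamps. The sketch below is not how any particular commercial product works; it applies the interarrival jitter estimator defined in RFC 3550 (the RTP specification) and a simple sequence-gap loss count to hypothetical timestamps in milliseconds.

```python
def rfc3550_jitter(send_ms, recv_ms):
    """Interarrival jitter estimate per RFC 3550: J += (|D| - J) / 16."""
    jitter = 0.0
    for i in range(1, len(send_ms)):
        # D = change in one-way transit time between consecutive received packets.
        d = (recv_ms[i] - recv_ms[i - 1]) - (send_ms[i] - send_ms[i - 1])
        jitter += (abs(d) - jitter) / 16
    return jitter


def loss_pct(seq_numbers):
    """Percentage of packets missing, assuming no sequence-number wraparound."""
    expected = max(seq_numbers) - min(seq_numbers) + 1
    received = len(set(seq_numbers))
    return 100.0 * (expected - received) / expected


# Hypothetical 20 ms test stream: packet 4 is lost, packet 5 arrives 4 ms late.
seqs = [1, 2, 3, 5, 6]
sent = [0, 20, 40, 80, 100]
received = [50, 70, 90, 134, 150]
print(f"jitter {rfc3550_jitter(sent, received):.3f} ms, loss {loss_pct(seqs):.1f}%")
```

Results like these would then be compared with the minimum jitter and packet loss criteria to produce a pass/fail grade.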
Networks Now Tied To a 99.999% (Five Nines) Model – UC is extremely network-dependent, so a robust, redundant, QoS-enabled network is in order. UC is a centralized model, and it depends heavily on the network to provide dial tone to all endpoints and connectivity to all UC clients. There are, of course, local lines for failover and survivable remote sites in case an office loses its connection, but these are used only when the network is hard down. We recommend a minimum of two circuits to every site, for example one MPLS/Ethernet circuit and a cable modem, or better yet two MPLS/Ethernet circuits with diverse routes to the site. With SIP trunking now front-ending and centralizing all public voice traffic across the enterprise WAN, the need for uptime and QoS makes a redundant network even more critical. You should also consider SD-WAN as a going-forward strategy for ‘load-balancing’ two circuits, increasing usable bandwidth by aggregating them rather than using one primary and one backup circuit as in a more traditional topology.
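The case for a second, diversely routed circuit can be made with basic availability arithmetic. The sketch below assumes the two circuits fail independently, which is what diverse routing aims for and what shared conduit or a shared carrier edge can undermine; it also ignores the time needed to fail over. The function names are illustrative.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes (non-leap year)


def downtime_minutes_per_year(availability: float) -> float:
    """Expected annual downtime for a given availability (e.g. 0.999)."""
    return (1 - availability) * MINUTES_PER_YEAR


def parallel_availability(*availabilities: float) -> float:
    """Availability of a site that stays up while at least one circuit is up.

    Assumes circuit failures are independent.
    """
    all_down = 1.0
    for a in availabilities:
        all_down *= 1 - a
    return 1 - all_down


for a in (0.999, 0.9999, 0.99999):
    print(f"{a:.3%} available: about {downtime_minutes_per_year(a):.1f} minutes down per year")

dual = parallel_availability(0.999, 0.999)
print(f"Two independent 99.9% circuits: {dual:.4%}")
```

Under these assumptions, two 99.9% circuits together approach 99.9999% availability, compared with roughly 8.8 hours of expected downtime per year for a single 99.9% circuit.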
A QoS Story
Our team was waiting to go live with a new UC system at a customer site. With some time to spare after the staff left for the weekend, we received the customer’s approval to intentionally create an internal broadcast storm and drive the site’s gigabit LAN to 100% utilization. The result: no ability to send or receive email, no Internet access, and no remote access to the corporate SAN. However, with QoS enabled, we were still able to make and receive calls locally, long distance, and over the WAN. The test proved one clear point: with QoS enabled, even a 100% saturated network will still prioritize voice ahead of all other traffic and maintain the integrity of real-time communications.
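The behavior in this test is what strict-priority queuing is designed to produce. The toy simulation below is a simplified model, not any vendor's implementation: a link that can send 10 packets per tick is offered 2 voice packets and 50 data packets per tick, far beyond its capacity, and the scheduler always empties the voice queue before serving data. The class name, queue limit, and traffic numbers are all hypothetical.

```python
from collections import deque


class StrictPriorityScheduler:
    """Toy egress scheduler: always serves higher-priority queues first."""

    def __init__(self, classes_in_priority_order, queue_limit):
        # Dicts keep insertion order, so iteration order is priority order.
        self.queues = {c: deque() for c in classes_in_priority_order}
        self.dropped = {c: 0 for c in classes_in_priority_order}
        self.queue_limit = queue_limit

    def enqueue(self, traffic_class):
        # Each packet is represented only by its class label.
        queue = self.queues[traffic_class]
        if len(queue) >= self.queue_limit:
            self.dropped[traffic_class] += 1  # tail drop: this class's queue is full
        else:
            queue.append(traffic_class)

    def dequeue(self):
        for queue in self.queues.values():
            if queue:
                return queue.popleft()
        return None  # nothing waiting


def simulate(ticks=100, link_capacity=10, voice_per_tick=2, data_per_tick=50):
    sched = StrictPriorityScheduler(["voice", "data"], queue_limit=20)
    sent = {"voice": 0, "data": 0}
    for _ in range(ticks):
        for _ in range(voice_per_tick):
            sched.enqueue("voice")
        for _ in range(data_per_tick):
            sched.enqueue("data")
        # The link transmits at most link_capacity packets per tick.
        for _ in range(link_capacity):
            packet = sched.dequeue()
            if packet is None:
                break
            sent[packet] += 1
    return sent, sched.dropped


sent, dropped = simulate()
print("sent:", sent)
print("dropped:", dropped)
```

In this model voice is delivered in full with no drops while data absorbs all of the loss. Production low-latency queuing implementations typically also police the priority class so that an oversized or misbehaving voice load cannot starve all other traffic.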
It’s Worth Your Livelihood
In the end, QoS is essential to successful real-time voice and video communications over the network. Without it, plan on a QoS failure at some point; it is just a matter of time. How vital is QoS? We have witnessed IT managers lose their jobs over a broadcast storm, and CIOs have their livelihoods "tested" in front of senior management over poor call quality, dropped calls, and customers lost in the process (several of those customers told the client so). QoS is not just a nicety; it is an absolute requirement for any network carrying real-time voice and video traffic, period.
If you found this post interesting and want to learn more, PM me here or email firstname.lastname@example.org