Mean Opinion Score (MOS) is a ridiculously incomplete metric for network-dependent application performance -- VoIP or otherwise. And yet we need it desperately.
Sit 10 people down in a room, have them take a voice-over IP (VoIP) call, rank it according to their subjective impression, and then record the average score. That's MOS (approximately). And it makes some sense given the psycho-acoustic nature of the application -- the user is the best judge of conversation quality.
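The arithmetic behind that description can be sketched in a few lines. This is only an illustration: it assumes the five-point scale commonly used for MOS (1 = bad, 5 = excellent, as in ITU-T P.800), and the function name and the panel's scores are hypothetical.

```python
from statistics import mean


def mean_opinion_score(ratings):
    """Average a panel's subjective ratings on an assumed 1-5 scale."""
    if not ratings:
        raise ValueError("need at least one rating")
    for r in ratings:
        if not 1 <= r <= 5:
            raise ValueError(f"rating {r} is outside the 1-5 scale")
    return mean(ratings)


# Hypothetical scores from ten listeners who rated the same call.
panel = [4, 4, 3, 5, 4, 3, 4, 4, 5, 3]
print(f"MOS = {mean_opinion_score(panel):.1f}")
```

Everything about the call that the panel was not asked to judge (setup time, dropped calls, feature behavior) is invisible in that single number.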
However, this doesn't fully describe the performance of VoIP as an application. Besides transmitting audio in real time, VoIP involves a variety of processes, such as call setup and call recovery, that aren't specifically about voice clarity. In addition, a VoIP phone offers a range of features such as:
- Call waiting
- Caller ID
- Call forwarding
- Voice mail
- Filter/block/redirect by ID
- Conference calling
- Call transfer
- Fax-over IP
- Soft (PC-based) and hard (hardware) phone sets
Back when the only use case of importance was dialing on an unconfigurable appliance and then talking, MOS might have captured the essence of the telephone. But not any longer.
So why are we using MOS so assiduously?
The reason is straightforward -- MOS is a substantial index of network-dependent application performance that has been standardized and can be reliably reproduced (in some sense at least). Most other applications don't have anything even remotely similar. That makes MOS a highly appreciated advantage for VoIP, incomplete as it is.
Why incomplete? Consider another network-dependent application like a Web browser. If its performance were reduced to a measure of how quickly it downloaded a Web page, there wouldn't be much to compare between Microsoft's Internet Explorer and Mozilla's Firefox.
In general, defining an appropriate performance metric for a particular class of application, specifically with regard to the influence of the network on it, is not a simple thing. And yet it is exceedingly important. The interaction of a particular application with a misbehaving or misconfigured network can result in severe performance degradation, while another class of network-dependent application is left completely unaffected.
Let's define some unique categories of networked applications:
- Real-time -- applications like video- and voice-over-IP that are composed primarily of asynchronous, constant, low-rate streams that are somewhat robust to loss and jitter;
- Transactional -- applications like interactive collaborative systems or distributed file system processes that maintain some form of state at two or more locations; they rely on intensive, bursty, synchronous traffic that varies from very small amounts of data to huge exchanges, and they are highly sensitive to latency;
- Data transfer -- transfers of massive amounts of data, such as for data backup and recovery, that rely on sustained one-way flows at maximum rates of transfer and are dependent on the characteristics of the end-host transmission protocols (e.g., TCP);
- Best-effort -- the majority of simple applications like e-mail, Web browsing, and remote login that are largely stateless and do not have critical requirements for network responsiveness.
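One way to make these categories usable in a monitoring or assessment tool is to record them as data. The sketch below is purely illustrative: the class name, field names, and `describe` helper are hypothetical, and the field values simply restate the descriptions in the list above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AppCategory:
    """One class of network-dependent application (hypothetical structure)."""
    name: str
    examples: tuple
    traffic_pattern: str
    network_dependency: str


CATEGORIES = {
    c.name: c
    for c in (
        AppCategory(
            "real-time",
            ("voice over IP", "video over IP"),
            "asynchronous, constant, low-rate streams",
            "somewhat robust to loss and jitter",
        ),
        AppCategory(
            "transactional",
            ("interactive collaborative systems", "distributed file systems"),
            "intensive, bursty, synchronous exchanges of widely varying size",
            "highly sensitive to latency",
        ),
        AppCategory(
            "data transfer",
            ("data backup and recovery",),
            "sustained one-way flows at maximum rate",
            "dependent on end-host transport protocol behavior (TCP)",
        ),
        AppCategory(
            "best-effort",
            ("e-mail", "Web browsing", "remote login"),
            "largely stateless requests",
            "no critical requirement for network responsiveness",
        ),
    )
}


def describe(name):
    """Return a one-line summary of a category."""
    c = CATEGORIES[name]
    return f"{c.name}: {c.traffic_pattern}; {c.network_dependency}"


for name in CATEGORIES:
    print(describe(name))
```

A structure like this only captures the qualitative dependencies; it does not yet say how any of them should be measured.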
As is immediately apparent, there are well-defined dependencies between these categories of applications and the networks that support them. What is not apparent are the specific use cases wherein a user will experience the effects of degraded network performance. In particular, a user's interaction with an application may strongly affect the subjective assessment of the quality of the experience.
A simple example is already well known within VoIP circles: Extreme latencies can impair conversational quality (two people engaged in a dialogue) because the network delay interferes with the natural rhythm of speaking and listening. In a monologue or purely listening context, the very same latency will have little or no effect on the experience. As a result, MOS has been broken out into two variants: a measure of listening quality, MOS-LQ, and a measure of conversational quality, MOS-CQ.
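To see how the same call can earn two different scores, consider a rough sketch based on the ITU-T G.107 E-model, which is one common way to estimate MOS from network measurements. It is not taken from this article: the default rating of 93.2 and the R-to-MOS mapping come from G.107, the delay term is a widely used simplified approximation rather than the full G.107 calculation, and the function names are hypothetical.

```python
def r_to_mos(r):
    """Map an E-model rating factor R to an estimated MOS (G.107 mapping)."""
    if r <= 0:
        return 1.0
    if r >= 100:
        return 4.5
    return 1 + 0.035 * r + r * (r - 60) * (100 - r) * 7e-6


def delay_impairment(one_way_delay_ms):
    """Simplified delay impairment; the penalty steepens past about 177 ms."""
    d = one_way_delay_ms
    extra = 0.11 * (d - 177.3) if d > 177.3 else 0.0
    return 0.024 * d + extra


def mos_lq(codec_impairment=0.0):
    """Listening quality: delay is ignored (assumed simplification)."""
    return r_to_mos(93.2 - codec_impairment)


def mos_cq(one_way_delay_ms, codec_impairment=0.0):
    """Conversational quality: the same rating minus the delay penalty."""
    return r_to_mos(93.2 - codec_impairment - delay_impairment(one_way_delay_ms))


for delay in (50, 150, 400):
    print(f"{delay} ms one-way: MOS-LQ {mos_lq():.2f}, MOS-CQ {mos_cq(delay):.2f}")
```

Under these assumptions the listening score stays fixed while the conversational score falls as one-way delay grows, which is the distinction the two variants are meant to capture.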
So how best do we define an application's performance in terms of its interaction with the network?
We select a series of high-level categories for types of network-dependent applications, construct a network-specific metric appropriate to each, and then develop an assessment model that describes the primary qualitative feature(s). That's how we got to MOS for VoIP, albeit somewhat indirectly. Now we need to extend that approach to several more critical areas that desperately need something similar.
Did I mention that MOS was a ridiculous metric for application performance?
This article originally appeared on SearchNetworking.com.
Chief Scientist for Apparent Networks, Loki Jorgenson, PhD, has been active in computation, physics and mathematics, scientific visualization, and simulation for over 18 years. Trained in computational physics at Queen's and McGill universities, he has published in areas as diverse as philosophy, graphics, educational technologies, statistical mechanics, logic and number theory. Also, he acts as Adjunct Professor of Mathematics at Simon Fraser University where he co-founded the Center for Experimental and Constructive Mathematics (CECM). He has headed research in numerous academic projects from high-performance computing to digital publishing, working closely with private sector partners and government. At Apparent Networks Inc., Jorgenson leads network research in high performance, wireless, VoIP and other application performance, typically through practical collaboration with academic organizations and other thought leaders such as BCnet, Texas A&M, CANARIE, and Internet2. www.apparentnetworks.com