Learning Guide: How does VoIP work?
Figure 2 shows a simplified block diagram of VoIP operation. An analog signal from a standard telephone is digitized, passed through a conversion device, and transmitted over the Internet. At the distant end, a similar device converts it back to analog form suitable for input to a standard telephone. A "gateway" is placed between the voice codec and the digital data transport circuit, and an identical device is found at the far end of the link. Among other functions, this equipment carries out the signaling role on a telephone call. Moving from left to right in Figure 2, we begin with the spurty (bursty) analog voice signal produced by the standard telephone. This signal is converted to a digital counterpart using one of the seven or so codecs from which the VoIP system designer must choose; some of the more popular codecs for this application are listed in Table 1. The binary output of the codec is then applied to a conversion device (i.e., a packetizer) that loads these binary 1s and 0s into an IP payload between 20 and 40 octets in length.
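The packetizer step can be sketched in a few lines. This is a minimal illustration, not the design of any particular gateway: `VoicePacket` and `packetize` are hypothetical names, the sequence number is an assumed field (real VoIP stacks carry one in a header so the far end can detect loss), and the default 40-octet payload is simply the upper end of the range quoted above.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class VoicePacket:
    seq: int        # hypothetical sequence number so the far end can reorder and detect loss
    payload: bytes  # slice of the codec's binary output carried in this packet


def packetize(codec_bits: bytes, payload_size: int = 40) -> List[VoicePacket]:
    """Cut a codec byte stream into consecutive payloads of payload_size octets.

    The last packet may be shorter if the stream does not divide evenly.
    """
    if payload_size <= 0:
        raise ValueError("payload_size must be positive")
    packets = []
    for seq, start in enumerate(range(0, len(codec_bits), payload_size)):
        packets.append(VoicePacket(seq, codec_bits[start:start + payload_size]))
    return packets
```

For example, `packetize(bytes(100), payload_size=40)` yields three packets numbered 0, 1 and 2, the last carrying the remaining 20 octets.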
Figure 2. Elements of basic operation of VoIP, where the input signal derives from a conventional analog telephone.

The output of this converter consists of IP packets* that are transmitted over the Internet or another data circuit for delivery to the distant end. At the far end, the IP packets are input to a converter that strips off the IP header, stores the payload, and then releases it as a constant bit stream to a codec. This codec must, of course, be compatible with its near-end counterpart. The codec converts the digital bit stream back to an analog signal that is input to a standard telephone. The insightful reader will note that many steps of translation and interface have been left out. Most of these considerations are covered in Section 3.1, in our discussion of the "gateway."

*The output may be ATM cells (see Chapter 16, Telecommunication System Engineering, 4th ed., Wiley, N.Y., 2004) if the intervening network is an ATM network.

VoIP gateway

Figure 3. A media gateway, from one perspective. API = applications program interface. From IEC online (Jan. 2003).

Media gateways are part of the physical transport layer. They are controlled by a call control function housed in a media gateway controller. A media gateway, with its associated gateway controller, is necessary for the network transformation to packetized voice. Several of the media gateway functions are listed below:
Notes: The most powerful gateway supports the PSTN and therefore must be a high-reliability device that meets PSTN availability requirements. It will be required to process many thousands of digital circuits. As shown in Figure 3, it has a network management capability, most often based on the Simple Network Management Protocol (SNMP; see Chapter 21 of Telecommunication System Engineering, 4th ed., Wiley, N.Y.). A somewhat less formidable gateway is used to provide VoIP for small and medium-sized businesses. Some texts call this type of gateway an "integrated access device" (IAD) if it can also handle data and video products. An IAD will probably be remotely configurable. The least powerful and most economical gateways are residential. They can be deployed in at least five settings:
Figure 4 shows gateway interface functions in a block diagram. On the left are the time slots of a PCM bit stream (T1, in this case). The intermediate signal-processing functions convert these time slots into a stream of data packets carrying voice or data. The output on the right consists of IP packets.
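The first step on the left of such a diagram, pulling individual voice channels out of the T1 time slots, can be sketched as follows. A T1 frame carries 24 eight-bit time slots plus one framing bit (193 bits) and repeats 8,000 times per second. This sketch assumes the framing bit has already been removed so that each frame arrives as 24 bytes; `t1_to_payloads` is a hypothetical helper, and the 80-byte payload (10 msec of one channel) is borrowed from the G.711 example in the next section.

```python
from typing import Iterable, Iterator, Tuple

T1_SLOTS = 24                       # DS0 time slots per T1 frame
T1_FRAME_BITS = 1 + 8 * T1_SLOTS    # 193 bits: one framing bit plus 24 eight-bit slots
FRAME_RATE = 8000                   # frames per second, so each slot carries 64 kbps


def t1_to_payloads(frames: Iterable[bytes],
                   payload_bytes: int = 80) -> Iterator[Tuple[int, bytes]]:
    """Demultiplex T1 frames into per-channel payloads (sketch).

    Assumes each frame is 24 bytes with the framing bit already stripped.
    Yields (channel, payload) whenever a channel has collected payload_bytes
    samples, i.e. once per payload_bytes / 8000 seconds.
    """
    buffers = [bytearray() for _ in range(T1_SLOTS)]
    for frame in frames:
        if len(frame) != T1_SLOTS:
            raise ValueError("expected one byte per time slot")
        for channel, sample in enumerate(frame):
            buffers[channel].append(sample)
            if len(buffers[channel]) == payload_bytes:
                yield channel, bytes(buffers[channel])
                buffers[channel].clear()
```

Feeding in 80 frames (10 msec of line time) produces one 80-byte payload for each of the 24 channels; the remaining steps in Figure 4 would then wrap each payload in its headers.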
An IP packet, as used for VoIP
In the case of G.711, standard PSTN PCM, the transmission rate may be 100 packets per second with 80 bytes in the payload of each packet. Because each 8-bit G.711 sample occupies one byte, the arithmetic works out exactly: 100 × 80 = 8,000 samples per second, the Nyquist sampling rate for a 4-kHz analog voice channel. Another transmission rate for G.711 is 50 packets per second with 160 bytes per packet, again yielding 8,000 samples per second per voice channel. The raw bytes per packet come out as follows: Layer 3 and 4 overhead (IP) is 40 bytes, plus 8 bytes of Layer 2 (link layer) overhead. Adding these 48 bytes to the 80- or 160-byte payload gives a raw packet of 128 or 208 bytes. The resulting efficiency, 62.5% or about 77% payload, is nothing to write home about. Keep in mind, however, that the primary concern of the VoIP designer is delay.
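The overhead arithmetic above can be checked with a short calculation. The header sizes are the ones given in the text (40 bytes for Layers 3 and 4, commonly made up of a 20-byte IPv4, 8-byte UDP and 12-byte RTP header, plus 8 bytes for Layer 2); `g711_stream` is a hypothetical helper. It also reports packetization time, which shows the delay side of the tradeoff: the larger, more efficient packet takes twice as long to fill.

```python
HEADER_L3_L4 = 40   # bytes of Layer 3/4 overhead, as given in the text
HEADER_L2 = 8       # bytes of Layer 2 (link layer) overhead, as given in the text
SAMPLE_RATE = 8000  # G.711: 8,000 one-byte samples per second


def g711_stream(packets_per_second: int) -> dict:
    """Per-channel packet figures for G.711 at a given packet rate."""
    if SAMPLE_RATE % packets_per_second:
        raise ValueError("packet rate must divide 8,000 samples per second evenly")
    payload = SAMPLE_RATE // packets_per_second       # bytes of voice per packet
    raw = payload + HEADER_L3_L4 + HEADER_L2          # bytes actually sent per packet
    return {
        "payload_bytes": payload,
        "raw_bytes": raw,
        "efficiency": payload / raw,                  # fraction of each packet that is voice
        "line_rate_bps": raw * 8 * packets_per_second,
        "packetization_ms": 1000 * payload / SAMPLE_RATE,  # time to collect one payload
    }
```

`g711_stream(100)` gives an 80-byte payload, a 128-byte raw packet, 62.5% efficiency, 102,400 bps on the line and 10 msec of packetization time; `g711_stream(50)` gives 160, 208, about 77%, 83,200 bps and 20 msec.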
The delay tradeoff
The one-way delay objective for a VoIP voice connection is less than 100 msec. With bridging for conference calls, that value doubles, owing to the very nature of bridging.
One way to speed things up is to increase the bit rate per voice data stream. To do this, the aggregate bit rate may have to be increased, or the number of voice streams sharing the aggregate bit rate reduced, so that each stream can be transmitted at a faster rate.
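Why a higher per-stream bit rate helps can be seen from the serialization delay, the time needed to clock one packet onto the line: packet bits divided by bit rate. The sketch below computes it and checks a list of delay components against the one-way objective stated above (doubled when bridged). The helper names and the example delay components are hypothetical, chosen only to illustrate the check.

```python
from typing import Sequence

ONE_WAY_BUDGET_MS = 100.0  # one-way objective from the text; doubled for bridged calls


def serialization_ms(packet_bytes: int, bit_rate_bps: float) -> float:
    """Time, in msec, to transmit one packet at the given per-stream bit rate."""
    return 1000 * packet_bytes * 8 / bit_rate_bps


def within_budget(delays_ms: Sequence[float], bridged: bool = False) -> bool:
    """True if the summed one-way delay components stay under the objective."""
    budget = ONE_WAY_BUDGET_MS * (2 if bridged else 1)
    return sum(delays_ms) < budget
```

A 208-byte G.711 packet takes 26 msec to send at 64 kbps but 13 msec at 128 kbps. With hypothetical components of 20 msec packetization, 26 msec serialization and 40 msec elsewhere, the 86-msec total passes; adding another 30 msec fails the 100-msec objective but would pass the doubled objective for a bridged call.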
Lost packet rate
Packets can be lost for several reasons. For example, Section 3.3 described a de-jitterizing buffer. It has a finite size: once a late packet exceeds the time the buffer can absorb, that packet is lost. In the case of G.711, this would be the time equivalent to 16 or 26 msec, the duration of a packet including its header (128 or 208 bytes at 64 kbps). Another cause of packet loss is an excessive error rate on a packet, whereby it is deleted. Router buffer overflow is yet another source of packet loss. When the lost or discarded packet rate begins to exceed 10%, voice quality starts to deteriorate. If high-compression algorithms such as G.723 or G.729 are employed, it is desirable to keep the packet loss rate below 1%. TCP over IP has excellent retransmission capabilities for errored frames or packets, but they are not practical for VoIP because of the additional delay involved. When a packet is in error, the receiving end of the link transmits a request (RQ) to the transmitting end for a retransmission, incurring one propagation delay. The transmitting end then needs some processing time, plus the transmission and propagation delay, to send the offending packet back to the receiver.
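The way a finite de-jitterizing buffer turns late packets into lost packets can be sketched as a small playout buffer. This is an illustrative design under stated assumptions, not the buffer described in Section 3.3: `PlayoutBuffer` is a hypothetical class, headers are assumed to be stripped already, and the `depth` parameter (how many packets to collect before playout starts) is an assumed tuning value. A constant-rate clock calls `tick()` once per packet interval.

```python
from typing import Dict, Optional


class PlayoutBuffer:
    """Minimal de-jitter buffer sketch: store payloads, release them in order."""

    def __init__(self, depth: int = 2) -> None:
        self.depth = depth                   # packets to collect before playout starts (assumed)
        self.pending: Dict[int, bytes] = {}  # seq -> payload awaiting its playout slot
        self.next_seq = 0                    # sequence number due at the next tick
        self.started = False
        self.late = 0                        # packets discarded because their slot had passed

    def receive(self, seq: int, payload: bytes) -> None:
        """Accept a payload; one arriving after its playout slot is counted as lost."""
        if seq < self.next_seq:
            self.late += 1
        else:
            self.pending[seq] = payload

    def tick(self) -> Optional[bytes]:
        """Return the payload due now, or None for a gap (missing or late packet)."""
        if not self.started:
            if len(self.pending) < self.depth:
                return None                  # still filling the buffer
            self.started = True
        payload = self.pending.pop(self.next_seq, None)
        self.next_seq += 1
        return payload
```

Packets arriving out of order (1 before 0) are released in sequence; a packet that shows up after its tick has already produced a gap is discarded and counted in `late`. A deeper buffer tolerates more jitter but adds delay, which is the tradeoff discussed above.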
Concealment of lost packets
There are packet loss concealment (PLC) procedures that can camouflage gaps in the output voice signal. The simplest techniques require a little extra processing power, and the most sophisticated techniques can restore speech to a level approximating the quality of the original signal. Concealment techniques are most effective for about 40 to 60 msec of missing speech. Gaps longer than 80 msec usually have to be muted. One of the most elementary PLCs simply smooths the edges of gaps to eliminate audible clicks. A more advanced algorithm replays the previous packet in place of the lost one, but this can cause harmonic artifacts such as tones or beeps. Good concealment methods use variation in the synthesized replacement speech to make the output more like natural speech. There are better PLCs that preserve the spectral characteristics of the talker's voice and maintain a smooth transition between estimated signal and surrounding original. The most sophisticated PLCs use CELP (codebook-excited linear predictive) or a similar technique to determine the content of the missing packet by examining the previous one (Ref. 11). Lost packets can be detected by packet sequence numbering.
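Sequence-number gap detection combined with one of the elementary techniques above (replaying the previous packet) can be sketched as follows. Assumptions, all hypothetical: payloads are decoded linear PCM samples (so silence is 0), packets are 20 msec long (G.711 at 50 packets per second), and gaps of up to 80 msec are concealed by replay while longer gaps are muted. As noted above, plain replay can produce harmonic artifacts; better PLCs vary or re-synthesize the replacement speech.

```python
from typing import List, Sequence, Tuple

PACKET_MS = 20       # assumed packet duration: G.711 at 50 packets per second
MAX_CONCEAL_MS = 80  # gaps longer than this are muted, per the text


def conceal(received: Sequence[Tuple[int, List[int]]]) -> List[List[int]]:
    """Fill sequence-number gaps by replaying the previous packet, or mute long gaps.

    received: (seq, samples) pairs; samples are decoded linear PCM values.
    """
    out: List[List[int]] = []
    prev_seq, prev_samples = None, None
    for seq, samples in sorted(received, key=lambda p: p[0]):
        if prev_seq is not None:
            if seq <= prev_seq:
                continue                                 # duplicate packet, drop it
            missing = seq - prev_seq - 1                 # gap revealed by sequence numbering
            if missing * PACKET_MS <= MAX_CONCEAL_MS:
                out.extend(list(prev_samples) for _ in range(missing))  # replay previous packet
            else:
                out.extend([0] * len(prev_samples) for _ in range(missing))  # mute
        out.append(list(samples))
        prev_seq, prev_samples = seq, samples
    return out
```

A single missing packet between 1 and 3 is filled with a copy of packet 1; a gap of five packets (100 msec) exceeds the 80-msec limit and is filled with silence.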
Echo and echo control