Introduction

This document explains voice codec bandwidth calculations and the features that modify or conserve bandwidth when Voice over IP (VoIP) is used. For example, the G.

Codec Sample Interval (ms)

This is the sample interval at which the codec operates.

With the Mean Opinion Score (MOS), a wide range of listeners judge the quality of a voice sample on a scale of one (bad) to five (excellent). The scores are averaged in order to provide the MOS for the codec.
Voice Payload Size (Bytes)

The voice payload size represents the number of bytes (or bits) that are filled into a packet. The voice payload size must be a multiple of the codec sample size. For example, G.

Voice Payload Size (ms)

The voice payload size can also be represented in terms of the codec samples.
For example, a G. For example, for a G. Available settings: 30 and 60 ms. For example: G. The new command syntax follows:

Cisco-Router(config-dial-peer)#codec gr8 bytes ?
Each codec sample produces 10 bytes of voice payload.
Valid sizes are: 10, 20, 30, 40, 50, 60, 70, 80, 90, ...
Any other value within the range will be rounded down to nearest valid size.
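The rounding rule in the command help can be sketched in a few lines. This is a minimal illustration, not the router's implementation: the function name `valid_payload_size` is hypothetical, and the 10-byte default is taken from the help text above.

```python
def valid_payload_size(requested_bytes: int, sample_bytes: int = 10) -> int:
    """Round a requested payload size down to a multiple of the codec sample size.

    sample_bytes defaults to 10 because the help text states that each codec
    sample produces 10 bytes of voice payload.
    """
    if requested_bytes < sample_bytes:
        raise ValueError("payload must hold at least one codec sample")
    return (requested_bytes // sample_bytes) * sample_bytes


print(valid_payload_size(47))  # a request of 47 bytes is rounded down to 40
```

The upper bound of the valid range is not checked here because it is not legible in the list above.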
This example illustrates this: G.

Voice Activity Detection

With circuit-switched voice networks, all voice calls use 64 Kbps fixed-bandwidth links regardless of how much of the conversation is speech and how much is silence.
The exact heuristics used at present in order to detect RTP packets for compression are: The destination port number is even. The destination port number is in the range or The RTP version field is set to two. The RTP extension field is set to zero.
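The four checks above could be expressed roughly as follows. This is a sketch under stated assumptions: the function name `looks_like_rtp` is hypothetical, the port ranges are passed in by the caller because the actual ranges are not legible in the text above, and the 12-byte minimum length (the size of a fixed RTP header) is an added sanity check rather than one of the listed heuristics.

```python
def looks_like_rtp(dest_port: int, udp_payload: bytes, port_ranges) -> bool:
    """Apply the listed heuristics to one UDP datagram.

    port_ranges is an iterable of (low, high) inclusive pairs supplied by the
    caller; the real ranges are not given in the text.
    """
    if dest_port % 2 != 0:                       # destination port must be even
        return False
    if not any(low <= dest_port <= high for low, high in port_ranges):
        return False
    if len(udp_payload) < 12:                    # assumption: need a full fixed RTP header
        return False
    first_octet = udp_payload[0]
    version = first_octet >> 6                   # top two bits: RTP version
    extension = (first_octet >> 4) & 1           # X bit: header extension flag
    return version == 2 and extension == 0


# Hypothetical range used only for this demonstration.
demo_ranges = [(16000, 17000)]
print(looks_like_rtp(16384, bytes([0x80]) + bytes(11), demo_ranges))
```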
Contributed by Cisco Engineers.

For the encoder (speech sender) to use frame interleaving in its outbound RTP packets for a given session, the decoder (speech receiver) needs to indicate its support via out-of-band means (see Section 8).
Bandwidth-Efficient or Octet-Aligned Mode

For a given session, the payload format can be either bandwidth efficient or octet aligned, depending on the mode of operation that is established for the session via out-of-band means.
In the octet-aligned format, all the fields in a payload, including payload header, table of contents entries, and speech frames themselves, are individually aligned to octet boundaries to make implementations efficient. In the bandwidth-efficient format, only the full payload is octet aligned, so fewer padding bits are added.
Note, octet alignment of a field or payload means that the last octet is padded with zeroes in the least significant bits to fill the octet. Also note that this padding is separate from padding indicated by the P bit in the RTP header. Between the two operation modes, only the octet-aligned mode has the capability to use the robust sorting, interleaving, and frame CRC to make the speech transport more robust to packet loss and bit errors.
This payload format is expected to be useful for both conversational and streaming services. Low delay is one very important factor, i. Low overhead is also required when the payload format traverses low bandwidth links, especially as the frequency of packets will be high.
For low bandwidth links, it is also an advantage to support unequal error detection (UED), which allows a link provider to reduce delay and packet loss, or to reduce the utilization of link resources. A streaming service has less strict real-time requirements and therefore can use a larger number of frame-blocks per packet than a conversational service. However, including several frame-blocks per packet makes the transmission more vulnerable to packet loss, so interleaving may be used to reduce the effect that packet loss will have on speech quality.
A streaming server handling a large number of clients also needs a payload format that requires as few resources as possible when doing packetization. The octet-aligned and interleaving modes require the least amount of resources, while CRC, robust sorting, and bandwidth-efficient modes have higher demands. This is specified in Section 4. AMR's capability to do fast mode switching is exploited in some non-IP networks to optimize speech quality.
To preserve this functionality in scenarios including a gateway to an IP network, a codec mode request (CMR) field is needed. The gateway can alternatively set a lower CMR value, if desired, as one means to control congestion on the IP network. The details of the control algorithm are left to the implementation. The only differences are in the types of codec frames contained in the payload.
The payload format consists of the RTP header, payload header, and payload data. This payload format uses the fields of the header in a manner consistent with that specification. The RTP timestamp corresponds to the sampling instant of the first sample encoded for the first frame-block in the packet.
The timestamp clock frequency is the same as the sampling frequency, so the timestamp unit is in samples. For AMR, the sampling frequency is 8 kHz, corresponding to 160 encoded speech samples per 20 ms frame from each channel. A packet may contain multiple frame-blocks of encoded speech or comfort noise parameters. If interleaving is employed, the frame-blocks encapsulated into a payload are picked according to the interleaving rules as defined in Section 4.
Otherwise, each packet covers a period of one or more contiguous 20 ms frame-block intervals. In case the data from all the channels for a particular frame-block in the period is missing (for example, at a gateway from some other transport format), it is possible to indicate that no data is present for that frame-block rather than breaking a multi-frame-block packet into two, as explained in Section 4.
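The timestamp arithmetic follows directly from the 8 kHz sampling frequency and the 20 ms frame-block interval. A minimal sketch for the non-interleaved case, with hypothetical names; the 32-bit wraparound reflects the width of the RTP timestamp field:

```python
SAMPLE_RATE_HZ = 8000          # AMR sampling frequency stated above
FRAME_BLOCK_MS = 20            # duration of one frame-block
SAMPLES_PER_FRAME_BLOCK = SAMPLE_RATE_HZ * FRAME_BLOCK_MS // 1000


def next_timestamp(current: int, frame_blocks_in_packet: int) -> int:
    """Timestamp of the next packet when packets cover contiguous frame-blocks."""
    advance = frame_blocks_in_packet * SAMPLES_PER_FRAME_BLOCK
    return (current + advance) & 0xFFFFFFFF   # RTP timestamps are 32 bits wide


print(SAMPLES_PER_FRAME_BLOCK)     # samples covered by one frame-block
print(next_timestamp(0, 3))        # a packet with three frame-blocks
```

Packets sent redundantly or with interleaving do not advance this way, since their periods may overlap or be non-contiguous.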
To allow for error resiliency through redundant transmission, the periods covered by multiple packets MAY overlap in time. A receiver MUST be prepared to receive any speech frame multiple times, in exact duplicates, in different AMR rate modes, or with data present in one packet and not present in another. The payload length is always made an integral number of octets by padding with zero bits if necessary.
If additional padding is required to bring the payload length to a larger multiple of octets or for some other purpose, then the P bit in the RTP header may be set and padding appended as specified in [8].
The assignment of an RTP payload type for this new packet format is outside the scope of this document, and will not be specified here. It is expected that the RTP profile under which this payload format is being used will assign a payload type for this encoding or specify that the payload type is to be bound dynamically.

Payload Structure

The complete payload consists of a payload header, a payload table of contents, and speech data representing one or more speech frame-blocks.
The following sections describe the variations taken by the payload format depending on whether the AMR session is set up to use the bandwidth-efficient mode or octet-aligned mode and any of the OPTIONAL functions for robust sorting, interleaving, and frame CRCs.
Bandwidth-Efficient Mode 4. The value of the CMR field is set to the frame type index of the corresponding speech mode being requested. CMR value 15 indicates that no mode request is present, and other values are for future use. The codec mode request received in the CMR field is valid until the next codec mode request is received, i. Therefore, if a terminal continuously wishes to receive frames in the Sjoberg, et al.
In a multi-channel session, the codec mode request SHOULD be interpreted by the receiver of the payload as the desired encoding mode for all the channels in the session. That may include adjusting the codec mode, but also includes adjusting the level of redundancy or number of frames per packet.
The codec mode selection MAY be restricted by a session parameter to a subset of the available modes. This is to avoid the loss of data synchronization in the depacketization process, which can result in a huge degradation in speech quality. The extra comfort noise frame types specified in table 1a in [2] i.

Q (1 bit): Frame quality indicator. The frame quality indicator enables damaged frames to be forwarded to the speech decoder for error concealment.
This can improve the speech quality more than dropping the damaged frames. See Section 4. For multi-channel sessions, the ToC entries of all frames from a frame-block are placed in the ToC in consecutive order as defined in Section 4.
When multiple frame-blocks are present in a packet in bandwidth-efficient mode, they will be placed in the packet in order of their creation time. The following figure shows an example of a ToC of three entries in a single-channel session using bandwidth-efficient mode.
Speech Data Speech data of a payload contains zero or more speech frames or comfort noise frames, as described in the ToC of the payload. The length of the speech frame is implicitly defined by the mode indicated in the FT field. As specified there, the bits of speech frames have been rearranged in order of decreasing sensitivity, while the bits of comfort noise frames are in the order produced by the encoder.
The resulting bit sequence for a frame of length K bits is denoted d(0), d(1), ..., d(K-1).

Algorithm for Forming the Payload

The complete RTP payload in bandwidth-efficient mode is formed by packing bits from the payload header, the table of contents, and the speech frames (in the order defined by their corresponding ToC entries in the ToC list), followed by 0 to 7 padding bits to bring the payload to octet alignment. They are packed contiguously into octets beginning with the most significant bits of the fields and the octets.
To be precise, the four-bit payload header is packed into the first octet of the payload with bit 0 of the payload header in the most significant bit of the octet. The four most significant bits (numbered 0-3) of the first ToC entry are packed into the least significant bits of the octet, ending with bit 3 in the least significant bit. Packing continues in the second octet with bit 4 of the first ToC entry in the most significant bit of the octet.
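The packing rule can be sketched as a bit-level packer. This is an illustration only. It assumes the six-bit bandwidth-efficient ToC entry layout of F (1 bit), FT (4 bits), and Q (1 bit); the F bit is not described in the text above and is taken from the AMR payload format as I understand it. The function name and the tuple layout are hypothetical.

```python
def pack_bandwidth_efficient(cmr, toc_entries, frames):
    """Pack header, ToC, and frame bits MSB-first, then pad to an octet boundary.

    toc_entries: list of (f, ft, q) tuples, one per frame (assumed 6-bit layout).
    frames: list of bit lists, each already in sensitivity order d(0)..d(K-1).
    Returns (payload_bytes, number_of_padding_bits).
    """
    bits = [(cmr >> shift) & 1 for shift in (3, 2, 1, 0)]      # 4-bit payload header
    for f, ft, q in toc_entries:
        bits.append(f & 1)
        bits.extend((ft >> shift) & 1 for shift in (3, 2, 1, 0))
        bits.append(q & 1)
    for frame in frames:
        bits.extend(frame)
    padding = (-len(bits)) % 8                                  # 0 to 7 zero bits
    bits.extend([0] * padding)

    payload = bytearray()
    for start in range(0, len(bits), 8):
        octet = 0
        for bit in bits[start:start + 8]:
            octet = (octet << 1) | bit
        payload.append(octet)
    return bytes(payload), padding


# One 244-bit frame (the size of an AMR 12.2 kbps frame, an assumed example input).
payload, padding = pack_bandwidth_efficient(15, [(0, 7, 1)], [[0] * 244])
print(len(payload), padding)
```

With these inputs the header, ToC entry, and frame total 254 bits, so two padding bits complete the final octet.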
If more than one frame is contained in the payload, then packing continues with the second and successive ToC entries. Bit 0 of the first data frame follows immediately after the last ToC bit, proceeding through all the bits of the frame in numerical order. Bits from any successive frames follow contiguously in numerical order for each frame and in consecutive order of the frames.

Payload Examples 4.

Single-Channel Payload Carrying a Single Frame

The following diagram shows a bandwidth-efficient AMR payload from a single-channel session carrying a single speech frame-block.
The encoded speech bits, d 0 to d , are arranged in descending sensitivity order according to [ 2 ]. Finally, two padding bits P are added to the end as padding to make the payload octet aligned. The first frame is a speech frame at 6. The fourth frame in the payload is a speech frame at 8. As shown below, the payload carries a mode request for the encoder on the receiver's side to change its future coding mode to AMR-WB 8. The encoded speech and SID bits, d 0 to d , g 0 to g 39 , and h 0 to h , are arranged in the payload in descending sensitivity order according to [ 4 ].
Note, no speech bits are present for the third frame. Finally, seven zero bits are padded to the end to make the payload octet aligned.

Multi-Channel Payload Carrying Multiple Frames

The following diagram shows a two-channel payload carrying 3 frame-blocks, i.
In the payload, all speech frames contain the same mode 7. The CMR is set to 15, i. The two channels are defined as left (L) and right (R), in that order. The encoded speech bits are designated dXY. Exemplifying this, for frame-block 1 of the left channel, the encoded bits are designated as d1L 0 to d1L

Octet-Aligned Mode 4.

R: is a reserved bit that MUST be set to zero. Interleaving MUST be performed on a frame-block basis i. The following example illustrates the arrangement of speech frame-blocks in an interleaving group during an interleaving session.
We also assume that the first payload packet of the interleaving group is s, and the number of speech frame-blocks carried in each payload is N. Then we will have: There will be no interleaving effect unless the number of frame-blocks per packet N is at least 2. The sender of the payload MUST only apply interleaving if the receiver has signalled its use through out-of-band means. Instead, the presence and order of the frame-blocks in a packet will follow the pattern described in 4.
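The arrangement itself is not legible in this text, so the following is only a sketch of the pattern as it is commonly described for this payload format: with interleaving length L and N frame-blocks per packet, the packet at position i in the group carries frame-blocks n+i, n+i+L, ..., n+i+(N-1)*L. The symbols n and L and the function name are assumptions introduced for illustration.

```python
def interleave_group(n: int, L: int, N: int):
    """Frame-block indices carried by each packet of one interleaving group.

    n: index of the first frame-block in the group (assumed symbol)
    L: interleaving length, i.e. number of packets in the group (assumed symbol)
    N: number of frame-blocks carried in each payload
    """
    return [[n + position + j * L for j in range(N)] for position in range(L)]


for packet in interleave_group(0, 3, 3):
    print(packet)
```

With L = 3 and N = 3 the group spans nine frame-blocks, and losing any single packet removes three non-adjacent frame-blocks instead of three consecutive ones.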
The following example shows the ToC of three consecutive packets, each carrying three frame-blocks, in an interleaved two-channel session. Here, the two channels are left (L) and right (R), with L coming before R, and the interleaving length is 3 i.
This results in the interleaving group size of 9 frame-blocks.

FT (4 bits, unsigned integer): see definition in Section 4.

Q (1 bit): see definition in Section 4.

It only exists if the use of CRC is signalled out-of-band for the session. When present, each CRC in the list is 8 bits long and corresponds to a speech frame (NOT a frame-block) carried in the payload. Calculation and use of the CRC is specified in the next section. This section provides more details on how to use the frame CRC in the octet-aligned payload header together with a partial transport layer checksum to achieve UED.
Note, the number of class A bits for various coding modes in AMR codec is specified as informative in [ 2 ] and is therefore copied into Table 1 in Section 3. The receiver of the payload SHOULD examine the data integrity of the received class A bits by re-calculating the CRC over the received class A bits and comparing the result to the value found in the received payload header.
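The receiver-side check can be sketched as a bit-serial CRC over the class A bits followed by a comparison. The generator polynomial is not reproduced in this text, so the sketch takes it as a parameter (`poly`, the low eight bits of the polynomial). The zero initial register and all names are assumptions for illustration, not the specified procedure.

```python
def crc8_bits(bits, poly: int) -> int:
    """Bit-serial 8-bit CRC, most significant bit first, register starting at zero."""
    register = 0
    for bit in bits:
        feedback = ((register >> 7) & 1) ^ (bit & 1)
        register = (register << 1) & 0xFF
        if feedback:
            register ^= poly
    return register


def class_a_bits_intact(frame_bits, class_a_count: int, received_crc: int, poly: int) -> bool:
    """Recompute the CRC over the first class_a_count bits and compare."""
    return crc8_bits(frame_bits[:class_a_count], poly) == received_crc


def bytes_to_bits(data: bytes):
    """Helper for the demonstration: expand bytes into bits, MSB first."""
    return [(byte >> shift) & 1 for byte in data for shift in range(7, -1, -1)]


# Sanity check with a widely used CRC-8 polynomial (0x07), not claimed to be AMR's.
print(hex(crc8_bits(bytes_to_bits(b"123456789"), 0x07)))
```

A frame that fails this check would typically be handed to the decoder as damaged rather than silently dropped, in line with the frame quality indicator described earlier.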
See [6] and [7] for more details. In binary form, the polynomial appears as follows: MSB.. This operation is repeated for each bit that the CRC should cover. In this case, the first bit would be d(0) of the speech frame that the CRC should cover. When the last bit e.

Speech Data

In octet-aligned mode, speech data is carried in a similar way to that in the bandwidth-efficient mode as discussed in Section 4. The padding bits MUST be ignored on reception.
In other words, each speech frame MUST be octet-aligned. Since the bits within each frame are ordered with the most error-sensitive bits first, interleaving the octets collects those sensitive bits from all frames to be nearer the beginning of the packet.
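Octet-level interleaving of this kind can be sketched as a round-robin over the frames: the first octet of every frame, then the second octet of every frame, and so on, skipping frames that have run out of octets. This is an illustration of the idea with a hypothetical function name, not the normative assembly procedure.

```python
def robust_sort(frames):
    """Interleave octet-aligned frames octet by octet.

    frames: list of bytes objects, one per speech frame, each already padded
    to a whole number of octets.
    """
    longest = max((len(frame) for frame in frames), default=0)
    payload = bytearray()
    for octet_index in range(longest):
        for frame in frames:
            if octet_index < len(frame):      # shorter frames are skipped once exhausted
                payload.append(frame[octet_index])
    return bytes(payload)


frames = [b"\x01\x02\x03", b"\x11\x12", b"\x21\x22\x23"]
print(robust_sort(frames).hex(" "))
```

Because each frame starts with its most error-sensitive bits, the leading octets of the result hold the most sensitive data from all frames.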
The details of assembling the payload are given in the next section. The use of robust sorting order for a payload type MUST be agreed via out-of-band means. Section 8 specifies a media type parameter for this purpose. Note, robust sorting order MUST only be performed on the frame level and thus is independent of interleaving, which is at the frame-block level, as described in Section 4.
In other words, robust sorting can be applied to either non-interleaved or interleaved payload types. Methods for Forming the Payload Two different packetization methods, namely, normal order and robust sorting order, exist for forming a payload in octet-aligned mode.
In both cases, the payload header and table of contents are packed into the payload the same way; the difference is in the packing of the speech frames. The payload begins with the payload header of one octet, or two octets if frame interleaving is selected.
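Reading the one- or two-octet header might look like the sketch below. It assumes the first octet holds the four-bit CMR followed by four reserved bits, and that the optional second octet holds two four-bit interleaving fields, here called `ill` and `ilp`. Those two field names and the function name are not given in the text above and are assumptions.

```python
def parse_octet_aligned_header(payload: bytes, interleaved: bool = False):
    """Split an octet-aligned payload into its header fields and the remainder."""
    header_len = 2 if interleaved else 1
    if len(payload) < header_len:
        raise ValueError("payload shorter than its header")
    header = {
        "cmr": payload[0] >> 4,          # codec mode request
        "reserved": payload[0] & 0x0F,   # reserved bits, expected to be zero
    }
    if interleaved:
        header["ill"] = payload[1] >> 4      # assumed name: interleaving length field
        header["ilp"] = payload[1] & 0x0F    # assumed name: position within the group
    return header, payload[header_len:]


header, rest = parse_octet_aligned_header(bytes([0xF0, 0x21, 0xAA]), interleaved=True)
print(header, rest.hex())
```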
The payload header is followed by the table of contents consisting of a list of one-octet ToC entries. The speech data follows the table of contents, or the CRCs if present. For packetization in the normal order, all of the octets comprising a speech frame are appended to the payload as a unit.

The E Model uses a computational method that includes factors such as noise, signal level, loudness ratings, impairments, delay, codec type, and even network type to derive a quality score.
Scoring includes consideration for the type of subjective test used for scoring. For IP networks, the score assumes ideal conditions outside the IP cloud and bases the scores on the relevant IP impairments, such as packet loss, latency, and jitter, and even on when these impairments occur over the duration of the call.

The bit rate is the number of bits per second that need to be transmitted to deliver a voice call. We can calculate the bit rate as follows: For G.
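Since the worked figures are not legible here, the following is a hedged sketch of the usual per-call calculation: packets per second is the codec bit rate divided by the payload size in bits, and the bandwidth is that rate times the total packet size. The function name, the example inputs, and the 6-byte Layer 2 overhead are illustrative assumptions. The 40-byte default is the combined size of uncompressed IP (20), UDP (8), and RTP (12) headers.

```python
def voip_bandwidth_bps(codec_bps: float, payload_bytes: int,
                       l2_overhead_bytes: int, ip_udp_rtp_bytes: int = 40) -> float:
    """Per-call bandwidth in bits per second for one direction of a voice stream."""
    packets_per_second = codec_bps / (payload_bytes * 8)
    packet_bits = (l2_overhead_bytes + ip_udp_rtp_bytes + payload_bytes) * 8
    return packets_per_second * packet_bits


# Illustrative inputs: an 8 kbps codec, 20-byte payload, 6 bytes of Layer 2 overhead.
print(voip_bandwidth_bps(8000, 20, 6))
```

Header compression or voice activity detection would lower the result; neither is modeled in this sketch.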