Dr. Thomas Wiegand, arguably the world's foremost authority on video compression and one of the chairmen of the Joint Video Team responsible for the H.264 video compression standard, spoke with us about what the recent adoption of the Scalable Video Coding (SVC) extension to H.264 means for video conferencing.
Traditional video communication uses H.264/AVC. So why was traditional video conferencing based on H.264/AVC not considered efficient enough?
Traditional video coding using H.264/AVC is very sensitive to transmission errors, since errors typically remain visible for some period due to error propagation within the video. Mitigating this is very costly and typically requires a sudden increase in bit rate to stop the error propagation. However, since most errors in Internet transmissions are caused by congestion, increasing the bit rate is not the right remedy: it adds load to a network that is already overloaded. Also, protecting an H.264/AVC bitstream through forward error correction (FEC) or automatic repeat request (ARQ) has so far not been shown to work well, because of the significant bit rate overhead and the associated delay. Consequently, dedicated lines offering high Quality of Service are often used. But because dedicated networks are typically constant bit rate (CBR) and very costly, the bit rates are generally kept as low as possible.
All these considerations apply to point-to-point as well as multipoint transmissions. In the latter case, however, the problems are aggravated, since the CBR constraints of multiple transmission lines need to be considered, and that often results in people running their systems at the lowest common denominator CBR. That means the transmission rate of the transmission channel with the lowest rate effectively becomes the maximum rate everybody else can use.
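To make the lowest-common-denominator effect concrete, here is a minimal sketch. The site names and link rates are invented for illustration and do not come from the interview; the only point is that a shared CBR rate equals the minimum over all links, which leaves capacity on the faster links unused.

```python
def common_cbr_rate(link_rates_kbps):
    """Rate every participant is held to when all links share one CBR setting."""
    rates = list(link_rates_kbps)
    if not rates:
        raise ValueError("at least one link rate is required")
    return min(rates)


# Hypothetical link capacities in kbit/s for a three-site conference.
rates = {"site_a": 2000, "site_b": 768, "site_c": 384}

# The slowest link dictates the rate for everyone.
shared = common_cbr_rate(rates.values())

# Capacity each site has paid for but cannot use.
unused = {site: rate - shared for site, rate in rates.items()}
```

With these assumed numbers, the site with the fastest link is limited to the same rate as the site with the slowest one.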
So what constitutes a more efficient system for video conferencing over general-purpose IP networks?
An efficient system architecture for video conferencing over general-purpose IP networks has to look very similar to the rest of the Internet: little processing is required inside the network, most of it is handled at the edge of the network, and the network itself is a best-effort network. That means the video encoder and decoder at the endpoints should do almost all the processing, with the media routers in the network left to do only lightweight packet operations with practically no delay. And, of course, all transmissions should actually run over the general-purpose Internet. Such a system should also share other Internet properties, including low cost, high efficiency and good scaling characteristics. Therefore, SVC-based video conferencing and the Internet are a perfect match.
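The interview does not spell out what the "lightweight packet operations" are. One plausible reading, sketched below under stated assumptions, is that a media router forwards or drops whole packets according to the scalability layer they belong to, without decoding or re-encoding any video. The `Packet` type, the layer numbering (0 for the base layer) and the per-layer rates are hypothetical and are not taken from the H.264 SVC specification or any real product.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Packet:
    layer: int       # assumed numbering: 0 = base layer, higher = enhancement
    size_bytes: int


def highest_layer_for(budget_kbps, layer_rates_kbps):
    """Highest layer index whose cumulative rate fits the receiver's budget.

    layer_rates_kbps[i] is the incremental rate of layer i. The base layer
    is always selected, even if it alone exceeds the budget.
    """
    total = 0
    top = 0
    for index, rate in enumerate(layer_rates_kbps):
        total += rate
        if index > 0 and total > budget_kbps:
            break
        top = index
    return top


def forward(packets, max_layer):
    """Keep packets at or below max_layer; no transcoding is involved."""
    return [p for p in packets if p.layer <= max_layer]


# Hypothetical stream: base layer plus two enhancement layers (kbit/s).
layer_rates = [300, 400, 800]
stream = [Packet(0, 100), Packet(1, 200), Packet(2, 400)]

# Each receiver gets the layers its own link can carry.
for_fast_site = forward(stream, highest_layer_for(2000, layer_rates))
for_slow_site = forward(stream, highest_layer_for(384, layer_rates))
```

In this sketch the router's per-packet work is a single integer comparison, which is the kind of operation that can run with practically no delay, and each receiver is served at its own rate rather than at the rate of the slowest link.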
It's also worth noting that traditional video conferencing architectures are the complete opposite of the Internet architecture. They place computationally heavyweight transcoding MCUs inside the network and also require dedicated lines.