Multiplexing H.264 Video With AAC Audio Bit Streams, Demultiplexing And Achieving Lip Synchronization During Playback

Date

2007-08-23T01:56:51Z

Publisher

Electrical Engineering

Abstract

H.264, also known as MPEG-4 Part 10 or AVC [5], is the latest digital video codec standard and has proven superior to earlier standards in compression ratio, quality, bit rate, and error resilience. However, the standard defines only a video codec and makes no mention of audio compression. For meaningful delivery of video to the end user, an audio stream must be associated with it. AAC (Advanced Audio Coding) [1] is the latest digital audio codec standard, defined in MPEG-2 and later in MPEG-4 with a few changes. At lower bit rates, the audio quality of an AAC stream is observed to be better than that of both MP3 and AC3, which have been widely used as audio coding standards in various applications. Adopting H.264 as the video codec and AAC as the audio codec for transmission of digital multimedia over the air (ATSC, DVB) or over the internet (video streaming, IPTV) lets us take advantage of the leading technologies in both audio and video. However, because these applications treat video and audio as separate streams, the two must be multiplexed to create a single bit stream for transmission. The objective of the thesis is to propose a method for effectively multiplexing the streams for transmission and, at the receiving end, demultiplexing the streams and achieving lip sync between the audio and video when the two streams are played back. The proposed method takes advantage of the frame-wise arrangement of data in both the audio and video codecs. The audio and video frames are used as the first layer of packetization. The frame numbers of the audio and video data blocks are used as the reference for aligning the streams in order to achieve lip sync. The synchronizing information is embedded in the headers of the first layer of packetization. A second layer of packetization is then carried out on the first layer in order to meet the various requirements of transmission channels.
The adopted method uses playback time as the criterion for allocating data packets during multiplexing, in order to prevent buffer overflow or underflow at the demultiplexer end. Additional information is embedded in the headers to ensure an effective and fast demultiplexing process and to detect and correct errors. Advantages and limitations of the proposed method are discussed in detail.
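The two ideas in the abstract, allocating packets by playback time and aligning streams by frame number, can be illustrated with a minimal sketch. It is not the thesis's implementation: the function names, the choice of 25 frames per second and 48 kHz audio, and the tie-breaking rule (video first) are assumptions made for illustration, and the packet header layout is not modelled. The only codec fact relied on is that an AAC-LC frame carries 1024 samples per channel.

```python
from fractions import Fraction

# AAC-LC frame length in samples per channel.
AAC_SAMPLES_PER_FRAME = 1024


def video_time(frame_no, fps):
    """Playback start time (seconds) of a video frame, from its frame number."""
    return Fraction(frame_no) / Fraction(fps)


def audio_time(frame_no, sample_rate):
    """Playback start time (seconds) of an audio frame, from its frame number."""
    return Fraction(frame_no * AAC_SAMPLES_PER_FRAME, sample_rate)


def multiplex_order(n_video, n_audio, fps, sample_rate):
    """Interleave frames so that the stream with the earlier playback time
    is sent next (hypothetical rule: video wins ties)."""
    order = []
    v = a = 0
    while v < n_video or a < n_audio:
        take_video = a >= n_audio or (
            v < n_video and video_time(v, fps) <= audio_time(a, sample_rate)
        )
        if take_video:
            order.append(("V", v))
            v += 1
        else:
            order.append(("A", a))
            a += 1
    return order


def audio_frame_for_video(frame_no, fps, sample_rate):
    """Number of the audio frame being played when the given video frame
    starts; a demultiplexer could use this to align the two streams."""
    t = video_time(frame_no, fps)
    return int(t * sample_rate // AAC_SAMPLES_PER_FRAME)


if __name__ == "__main__":
    print(multiplex_order(2, 4, fps=25, sample_rate=48000))
    print(audio_frame_for_video(25, fps=25, sample_rate=48000))
```

Because each stream advances only when its playback time is the earlier of the two, neither stream runs far ahead of the other in the multiplexed output, which is the property that keeps the demultiplexer's buffers from overflowing or underflowing.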