MPEG-1 Data Structures

The ISO/IEC 11172 specification defines the audio, video and multiplexing standards collectively and colloquially referred to as the MPEG-1 (Moving Picture Experts Group) compression standard. The spec gives the data structures for the various components of an encoded bitstream in a pseudo-C syntax and discusses them extensively. However, it is difficult to get the big picture from reading the spec. More practically, parsing an MPEG-1 bitstream requires knowing the byte offsets within each structure. To make this information more readily accessible, we have condensed it into graphic form. Of course, this is no substitute for the original spec. Where more information is required than can be squeezed into the diagram, references to the spec are provided.

The Big Picture

A multiplexed MPEG-1 stream is composed of distinct Packs. Each Pack consists of a Pack header and any number of Packets. Each Packet carries either video or audio data. These structures above the video or audio level are called the system layer. Video or audio data is divided into Packets without regard to lower-level structures: Groups, Pictures, etc. may break across Packet boundaries.

Video information is composed of individual Pictures. We will not discuss the substructures of Pictures. Pictures themselves are of three types: I (intra), P (predictive), and B (bidirectional). I Pictures are self-contained, compressing the image using Discrete Cosine Transform (DCT) processing. P Pictures use less data and are predicted from the preceding I or P Picture. B Pictures use the least data and are interpolated using information from the surrounding I and P Pictures. Pictures are organized into Groups of (typically) 15 or so Pictures. If a Group is preceded by a Sequence header, its first Picture is called an entrypoint.

Audio information is composed of Frames. We will not discuss the substructure of Frames. There are no higher-level audio structures.
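The layering described above can be made concrete with a small scanner. Every structure named here begins with a byte-aligned start code: the three-byte prefix 0x000001 followed by one identifying byte. The following is only a minimal sketch in Python, not a full parser. The function names (`classify`, `scan_start_codes`, `picture_type`) are our own inventions for illustration, and the sketch assumes the whole buffer is in memory. Because video data may break across Packet boundaries, a real parser would first extract and concatenate the Packet payloads of one stream before looking for video-level start codes; scanning the raw multiplexed bytes, as done here, is reliable only for the system layer.

```python
PREFIX = b"\x00\x00\x01"  # three-byte start-code prefix


def classify(code: int) -> str:
    """Map the byte following the prefix to a structure name."""
    if code == 0xBA:
        return "pack"
    if code == 0xBB:
        return "system_header"
    if code == 0xB9:
        return "end"                 # end of the multiplexed stream
    if 0xC0 <= code <= 0xDF:
        return "audio_packet"
    if 0xE0 <= code <= 0xEF:
        return "video_packet"
    if code == 0xB3:
        return "sequence_header"
    if code == 0xB8:
        return "group"
    if code == 0x00:
        return "picture"
    return "other"                   # slices, padding, private streams, ...


def scan_start_codes(data: bytes):
    """Return (offset, name) for every start code found in data."""
    found = []
    pos = data.find(PREFIX)
    while pos != -1 and pos + 3 < len(data):
        found.append((pos, classify(data[pos + 3])))
        pos = data.find(PREFIX, pos + 4)  # skip past the code byte
    return found


def picture_type(data: bytes, offset: int) -> str:
    """Decode the Picture type from the header at a picture start code.

    After the 4-byte start code come 10 bits of temporal reference,
    then the 3-bit picture coding type.
    """
    coding_type = (data[offset + 5] >> 3) & 0x07
    return {1: "I", 2: "P", 3: "B"}.get(coding_type, "other")


# Synthetic bytes for illustration only; this is not a decodable stream.
sample = (
    b"\x00\x00\x01\xba" + b"\x21" * 8     # Pack start code + dummy header bytes
    + b"\x00\x00\x01\xe0" + b"\xff\xff"   # video Packet start code
    + b"\x00\x00\x01\xb3" + b"\x11" * 4   # Sequence header start code
    + b"\x00\x00\x01\xb8" + b"\x11" * 4   # Group start code
    + b"\x00\x00\x01\x00" + b"\x00\x0f"   # Picture start code, coding type 1
)

for offset, name in scan_start_codes(sample):
    print(offset, name)
```

In this synthetic sample a Sequence header immediately precedes the Group, so the Picture that follows is an entrypoint in the sense described above. Calling `picture_type(sample, 34)` on its start code reports an I Picture.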