User:Ryan Cooley/MPEG1: Difference between revisions

From Citizendium
imported>Rcooley
m (6)
imported>Rcooley
(7 getting difficult)
Line 41: Line 41:
Part 2 of the MPEG-1 standard covers video and is defined in [[ISO/IEC 11172-2]]   
Part 2 of the MPEG-1 standard covers video and is defined in [[ISO/IEC 11172-2]]   


== D-frames ==
=== D-frames ===


MPEG-1 has a unique frame type not found in later video standards.  D-frames or DC-pictures are independent images (intra-frames) that have been encoded DC-only (AC coefficients are removed) and hence are very low quality.  D-frames are never used/referenced by I, P or B frames.  D-frames are only useful for fast previews of video, for instance when seeking through a video at high speed.
MPEG-1 has a unique frame type not found in later video standards.  D-frames or DC-pictures are independent images (intra-frames) that have been encoded DC-only (AC coefficients are removed) and hence are very low quality.  D-frames are never used/referenced by I, P or B frames.  D-frames are only useful for fast previews of video, for instance when seeking through a video at high speed.
Line 49: Line 49:
=== DCT ===
=== DCT ===


Each 8x8 macroblock is encoded using the ''Forward'' Discrete Cosign Transform ([[FDCT]]).  This process by itself is lossless, and is reversed by the ''Inverse'' DCT ([[IDCT]]) upon playback to produce the original values.   
Each 8x8 macroblock is encoded using the ''Forward'' Discrete Cosign Transform ([[DCT|FDCT]]).  This process by itself is lossless, and is reversed by the ''Inverse'' DCT ([[IDCT]]) upon playback to produce the original values.   


The FDCT process converts the 64 uncompressed pixel values (brightness) into 64 different ''frequency'' values.  One large value that is average of the entire 8x8 block (the '''DC coefficient''') and 63 smaller, positive or negative values (the '''AC coefficients'''), that are relative to the value of the DC coefficient.   
The FDCT process converts the 64 uncompressed pixel values (brightness) into 64 different ''frequency'' values.  One large value that is average of the entire 8x8 block (the '''DC coefficient''') and 63 smaller, positive or negative values (the '''AC coefficients'''), that are relative to the value of the DC coefficient.   


The (large) DC coefficient remains mostly consistent from one block to the next, and can so can be compressed quite effectively.  A significant number of the AC coefficients will be near 0, which can then be more efficiently compressed in a later step.  Additionally, the frequency conversion is necessary for the quantization step.  
The (large) DC coefficient remains mostly consistent from one block to the next, and can be compressed quite effectively with [[DPCM]], so only the amount of difference between each DC value needs to be stored.  A significant number of the AC coefficients will be near 0, which can then be more efficiently compressed in a later step.  Additionally, the frequency conversion is necessary for the quantization step.  


=== Quantization ===
=== Quantization ===
Line 65: Line 65:
=== Lossless Data Compression ===
=== Lossless Data Compression ===
Several steps in the encoding of MPEG-1 video are lossless, meaning they will be reversed on decoding to produce exactly the same values.  Since these lossless data compression steps don't add noise into or otherwise change the video (unlike quantization), it is often called [[noiseless coding]].  
Several steps in the encoding of MPEG-1 video are lossless, meaning they will be reversed on decoding to produce exactly the same values.  Since these lossless data compression steps don't add noise into or otherwise change the video (unlike quantization), it is often called [[noiseless coding]].  
=== Huffman Coding ===
After perceptual coding, the data is analyzed to find strings that repeat often.  Those strings are then put into a special table, with the most frequently repeating data assigned the shortest code to keep the data as small as possible.
Once the table is constructed, those strings are in the data are replaced with their (much smaller) codes, which references the appropriate entry in the table.


=== RLE ===
=== RLE ===
[[Run-length encoding]] (RLE) is a very simple method of compressing repetition.  Given a string of "333333333" RLE would replace it with the values "3,9" simply telling the decoder to replace it with "333333333".  RLE is very effective after quantization, as a significant number of the AC coefficients are zero, and can be represented in the file with just a couple bytes.
[[Run-length encoding]] (RLE) is a very simple method of compressing repetition.  A sequential string of characters, no matter how long, can be replaced with a few bytes, noting the value that repeats, and how many times.


=== Huffman coding ===
RLE is particularly effective after quantization, as a significant number of the AC coefficients are now zero, and can be represented with just a couple bytes (in a special 2-dimensional Huffman table that codes the run-length and the ending character).
The data is then analyzed to look for strings that repeat often.  Those strings are then put into a table.  Wherever those strings are found in the data, they are replaced by a (much smaller) reference to the location in the table.




Line 100: Line 104:
     Two MV per macroblock (forward/backward pred)
     Two MV per macroblock (forward/backward pred)
     Prediction error
     Prediction error
    DPCM encoded, just like DC coeffs
     Blockiness
     Blockiness
   CBR/VBR
   CBR/VBR
Line 111: Line 116:


MPEG-1 audio utilizes perceptual masking with sub-band coding with a polyphased filter bank to reduce the bitrate of the audio stream.  It has been shown to be particularly efficient on high quality percussive sounds (impulses) thanks to the very effective time-domain concealment characteristics of its 32 sub-band [[polyphased filter bank]].
MPEG-1 audio utilizes perceptual masking with sub-band coding with a polyphased filter bank to reduce the bitrate of the audio stream.  It has been shown to be particularly efficient on high quality percussive sounds (impulses) thanks to the very effective time-domain concealment characteristics of its 32 sub-band [[polyphased filter bank]].
[[Channel Encoding]]:
*Mono
*Joint Stereo (impulse encoded)
*Stereo
*Dual (two uncorrelated mono channels)


*[[Sampling rate]]s: 32, 44.1 and 48 kHz
*[[Sampling rate]]s: 32, 44.1 and 48 kHz
*[[Bitrate]]s: 32, 48, 56, 64, 80, 96, 112, 128, 160, 192, 224, 256, 320 and 384 kbit/s
*[[Bitrate]]s: 32, 48, 56, 64, 80, 96, 112, 128, 160, 192, 224, 256, 320 and 384 kbit/s


"digital frames of 1152 sampling intervals"
   mono, stereo, joint stereo (impulse, m/s), dual.*
 
   mono, stereo, joint stereo (impulse, m/s), dual.
   efficient time-domain concealment characteristics  
   efficient time-domain concealment characteristics  


Line 124: Line 133:
MPEG-1 Layer I is nothing more than a simplified version of Layer II, designed for low-delay and low complexity to facilitate [[real-time]] encoding on the hardware available in 1990 for applications like teleconferencing and studio editing.  With the substantial performance improvements in digital processing since, it has now been long obsolete.
MPEG-1 Layer I is nothing more than a simplified version of Layer II, designed for low-delay and low complexity to facilitate [[real-time]] encoding on the hardware available in 1990 for applications like teleconferencing and studio editing.  With the substantial performance improvements in digital processing since, it has now been long obsolete.


It saw limited adoption in it's time, and most notably was used on the defunct [[Digital Compact Cassette]].  Layer I audio files will most often use the extension '''.mp1'''
It saw limited adoption in it's time, and most notably was used on the defunct [[Digital Compact Cassette]] at 384 kbps.  Layer I audio files use the extension '''.mp1'''


=== Layer II ===
=== Layer II ===


Despite some 20 years of progress in the field of digital audio coding, MP2 remains the preeminent lossy audio coding standard due to its especially high audio coding performances on highly critical audio material such as castanet, symphonic orchestra, male and female voices and particularly high quality percussive sounds (impulses) like triangle and glockenspiel.  Testing has shown MP2 to be equivalent or superior to than much more recent audio codecs, such as [[Dolby Digital]] AC-3. <ref>Wustenhagen et al, ''Subjective Listening Test of Multi-channel Audio Codecs'', AES 105th Convention Paper 4813, San Francisco 1998</ref><!--MP2 scored the same as AC-3, despite using an inferior matrixed mode for 5.1 surround-->  
Despite some 20 years of progress in the field of digital audio coding, MP2 remains the preeminent lossy audio coding standard due to its especially high audio coding performances on highly critical audio material such as castanet, symphonic orchestra, male and female voices and particularly high quality percussive sounds (impulses) like triangle and glockenspiel.  Testing has shown MP2 to be equivalent or superior to than much more recent audio codecs, such as [[Dolby Digital]] AC-3. <ref>Wustenhagen et al, ''Subjective Listening Test of Multi-channel Audio Codecs'', AES 105th Convention Paper 4813, San Francisco 1998</ref><!--MP2 scored the same as AC-3, despite using an inferior matrixed mode for 5.1 surround-->  


Subjective audio testing by experts, in the most critical conditions ever implemented, have shown MP2 to offer transparent audio compression at 256kbps for 16-bit 44.1khz [[CD]] audio. <ref>http://www.faqs.org/faqs/mpeg-faq/part1/ "You can compress the same stereo program down to 256 Kbits/s with no loss in discernable quality." (the original papers would be much, much better refs, but I can't seem to find them! This just proves they exist!)</ref>  That (approx) 1:6 compression ratio for CD audio is particularly impressive since it's quite close to upper theoretical limit of [[Perceptual Entropy]], at just over 1:8. <ref>J. Johnston, ''Estimation of Perceptual Entropy Using Noise Masking Criteria,'' in Proc. ICASSP-88, pp. 2524-2527, May 1988.</ref>
Subjective audio testing by experts, in the most critical conditions ever implemented, has shown MP2 to offer transparent audio compression at 256kbps for 16-bit 44.1khz [[CD]] audio. <ref>http://www.faqs.org/faqs/mpeg-faq/part1/ "You can compress the same stereo program down to 256 Kbits/s with no loss in discernable quality." (the original papers would be much, much better refs, but I can't seem to find them! This just proves they exist!)</ref>  That (approx) 1:6 compression ratio for CD audio is particularly impressive since it's quite close to upper theoretical limit of [[Perceptual Entropy]], at just over 1:8. <ref>J. Johnston, ''Estimation of Perceptual Entropy Using Noise Masking Criteria,'' in Proc. ICASSP-88, pp. 2524-2527, May 1988.</ref>
<ref>6. J. Johnston, ''Transform Coding of Audio Signals Using Perceptual Noise Criteria,'' IEEE J. Sel. Areas in Comm., pp. 314-323, Feb. 1988.</ref>
<ref>6. J. Johnston, ''Transform Coding of Audio Signals Using Perceptual Noise Criteria,'' IEEE J. Sel. Areas in Comm., pp. 314-323, Feb. 1988.</ref>
Achieving much higher compression is simply not possible without discarding some perceptible information.  
Achieving much higher compression is simply not possible without discarding some perceptible information.  


   audio broadcasting
   audio broadcasting
   error resilient
   error resilient
   Musicam
   Musicam
  32 sub-bands
  Exceeds MP3 somewhere between 192-256 kbps




=== Layer III/MP3 ===
=== Layer III/MP3 ===
The hybrid ([[filter bank]] + MDCT) design of MP3 imposes some limitations as well.  It causes a factor of 12 - 36 times worse temporal resolution than MP2, which can cause artifacts due to transients sounds like percussive events.  This results in audible smearing and pre-echo. <ref>http://www.cs.columbia.edu/~coms6181/slides/6R/mpegaud.pdf pp.8</ref>  Because of these issues, MP2 sound quality is actually superior to MP3 above 112 kbps/channel <!--uncited facts are bad, but it's still true. I just can't yet find a source that EXPLICITLY says this part is so.  It can be clearly inferred from a combination of a few other citations.-->
  "The combination of the two filter banks creates aliasing issues that are only partially handled by the "aliasing compensation" stage, but that create excess energy to be coded in the frequency domain, thereby decreasing coding efficiency.  Frequency resolution is limited by the small long block window size, decreasing coding efficiency
No scale factor band for frequencies above 15.5/15.8 kHz"


   9 months?
   9 months?
   ASPEC (Fraunhoffer)  
   ASPEC (Fraunhoffer)  
   freq transform encoder  
   freq transform encoder  
   entropy coding
   entropy coding (Huffman)
   Hybrid MDCT
   Hybrid filtering
    MDCT (overlapping DCT)
     pre-echo worse
     pre-echo worse
     aliasing issues
     aliasing issues
  "aliasing compensation"
    "aliasing compensation"?
   mid/side (or impulse) joint stereo  
   mid/side (or impulse) joint stereo  
   576 frequency components
   576 frequency components

Revision as of 18:27, 19 March 2008

MPEG-1 articles (MPEG-1, MP1, MP2, MP3) on wikipedia are complete crap. Disorganized, slanted, incomplete, misconstrued, etc. It's far easier to start from scratch than try to fix all the individual existing ones, and will give far better end results; I will use some small bits of content from the existing articles.

Do not make any changes to this page for now. This is my mind-dump and accommodating others before I'm done will just make much, much more work for me. Put any suggestions on the Talk page, and I will eventually address them. -RC


MPEG-1 was an early standard for lossy compression of video and audio. It was designed to compress raw video and CD audio to 1.5 Mbit/s without discernible quality loss, making Video CDs and Digital Video Broadcasting possible.

Perhaps the most well-known part of the MPEG-1 standard today is the MP3 audio format it introduced.

The MPEG-1 standard is published as ISO/IEC 11172.

History

The MPEG working group was established in January 1988. It was modeled on the successful collaborative approach and the compression technologies of the Joint Photographic Experts Group (creators of the JPEG image compression standard) and CCITT's Experts Group on Telephony (creators of the H.261 standard for video conferencing over ISDN lines). MPEG was formed to address the need for standard video and audio encoding formats, and to build on H.261 to get better quality through the use of more complex (non-realtime) encoding methods. [1]

Development of the MPEG-1 standard began in May 1988. Fourteen video and fourteen audio codec proposals were submitted by individual companies and institutions for evaluation. The codecs were extensively tested for computational complexity and subjective (human perceived) quality, at combined video and audio data rates of 1.5 Mbit/s. The codecs that excelled in this testing were used as the basis for the standard and refined further, with additional features and other improvements being incorporated. [2]

After 20 meetings of the full group in various cities around the world, and 4 1/2 years of development and testing, the final standard was approved in early November 1992. A draft standard had been produced in September 1990, and only minor changes were introduced after that. [3] Before the MPEG-1 standard had even been finalized, work began on a second standard, MPEG-2, intended to extend MPEG-1 technology to provide full broadcast quality at high bitrates (3 - 15 Mbps), and support for interlaced video. [4] Due in part to the similarity between the two codecs, the MPEG-2 standard included full backwards compatibility with MPEG-1 video.

Today, MPEG-1 is by far the most widely compatible lossy audio/video format in the world. Due to its age, most patents on MPEG-1 Video and Layer II audio technology have expired (MP3 being a notable exception), so these parts can be implemented without payment of license fees in almost all countries. Most computer software for video playback includes MPEG-1 decoding, in addition to any other supported formats. The immense popularity of MP3 audio has established a massive installed base of hardware that can play back all three layers of MPEG-1 audio. The widespread popularity of MPEG-2 (mostly with broadcasters) means MPEG-1 is playable by most digital cable/satellite set-top boxes, and digital disc and tape players.

Notably, the MPEG-1 standard very strictly defines the bitstream and the decoder function, but does not define how MPEG-1 encoding is to be performed (although a reference implementation was provided). This means that MPEG-1 coding efficiency can vary drastically depending on the encoder used, and that newer encoders generally perform significantly better than their predecessors.


Application

 VCD players
 DVB
 DAB
 MP3
 MPEG-2?
 audio:
 SVCD
 DVD players (not surround)
 ATSC/HDTV (failed)

Video

Part 2 of the MPEG-1 standard covers video and is defined in ISO/IEC 11172-2

D-frames

MPEG-1 has a unique frame type not found in later video standards. D-frames or DC-pictures are independent images (intra-frames) that have been encoded DC-only (AC coefficients are removed) and hence are very low quality. D-frames are never used/referenced by I, P or B frames. D-frames are only useful for fast previews of video, for instance when seeking through a video at high speed.

Given moderately higher performance decoding equipment, this feature can be approximated by processing I-frames, and discarding the AC coefficients before display.

DCT

Each 8x8 block is encoded using the Forward Discrete Cosine Transform (FDCT). This process by itself is (apart from rounding) lossless, and is reversed by the Inverse DCT (IDCT) upon playback to produce the original values.

The FDCT process converts the 64 uncompressed pixel values (brightness) into 64 different frequency values. The result is one large value that is proportional to the average of the entire 8x8 block (the DC coefficient) and 63 generally smaller, positive or negative values (the AC coefficients), which describe how the block varies around that average.
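
The transform described above can be sketched in a few lines of Python. This is a minimal, unoptimized illustration and not the method of any particular encoder; the function names fdct_8x8 and idct_8x8 and the scaling convention (under which the DC coefficient equals 8 times the block average) are assumptions made for this example.

```python
import math

N = 8  # block size described in the text


def _scale(k):
    # Normalization factor; with this choice the DC term equals sum / 8.
    return math.sqrt(1.0 / N) if k == 0 else math.sqrt(2.0 / N)


def _basis(i, k):
    # Cosine basis value for sample position i and frequency index k.
    return math.cos((2 * i + 1) * k * math.pi / (2 * N))


def fdct_8x8(block):
    """Forward 2-D DCT of an 8x8 block given as a list of 8 lists."""
    out = [[0.0] * N for _ in range(N)]
    for u in range(N):
        for v in range(N):
            total = 0.0
            for x in range(N):
                for y in range(N):
                    total += block[x][y] * _basis(x, u) * _basis(y, v)
            out[u][v] = _scale(u) * _scale(v) * total
    return out


def idct_8x8(coeffs):
    """Inverse 2-D DCT; recovers the samples up to floating-point error."""
    out = [[0.0] * N for _ in range(N)]
    for x in range(N):
        for y in range(N):
            total = 0.0
            for u in range(N):
                for v in range(N):
                    total += (_scale(u) * _scale(v) * coeffs[u][v]
                              * _basis(x, u) * _basis(y, v))
            out[x][y] = total
    return out


# A perfectly flat block has only a DC coefficient; every AC value is zero.
flat_block = [[100] * N for _ in range(N)]
flat_coeffs = fdct_8x8(flat_block)
```

For the flat block, all of the energy ends up in flat_coeffs[0][0], which illustrates why smooth image areas compress so well after the transform.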

The (large) DC coefficient remains mostly consistent from one block to the next, and can be compressed quite effectively with DPCM, so only the amount of difference between each DC value needs to be stored. A significant number of the AC coefficients will be near 0, which can then be more efficiently compressed in a later step. Additionally, the frequency conversion is necessary for the quantization step.
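
The DPCM idea for DC coefficients can be illustrated with a short Python sketch. The function names and the starting predictor value of 0 are assumptions chosen for this example, and the DC values are made up.

```python
def dpcm_encode(dc_values, predictor=0):
    """Store each DC value as its difference from the previous one."""
    diffs = []
    previous = predictor
    for dc in dc_values:
        diffs.append(dc - previous)
        previous = dc
    return diffs


def dpcm_decode(diffs, predictor=0):
    """Rebuild the DC values by accumulating the stored differences."""
    values = []
    previous = predictor
    for diff in diffs:
        previous += diff
        values.append(previous)
    return values


# Illustrative DC values of five neighbouring blocks.
dc_example = [800, 804, 801, 801, 790]
dc_diffs = dpcm_encode(dc_example)  # [800, 4, -3, 0, -11]
```

After the first value, only small differences remain, and small numbers need fewer bits in the entropy coding stage.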

Quantization

A quantization table is a set of 64 numbers (0-255) that tells the encoder which visual information is most important, and which is not. Each number corresponds to a certain frequency component of the video image.

Each of the 64 frequency values of the DCT block is divided by its corresponding value in the quantization table, and the result is rounded to a whole number. This reduces the information in some frequencies, deemed less visually important, while other frequency components may be eliminated completely. This quantization process usually reduces a significant number of the AC coefficients to zero.

This quantization process eliminates a large amount of data, and is the main lossy processing step in MPEG-1 video encoding. It is also the source of most MPEG-1 video artifacts, such as blockiness, color banding, noise, ringing and discoloration, which appear when video is encoded with an insufficient bitrate.

Lossless Data Compression

Several steps in the encoding of MPEG-1 video are lossless, meaning they will be reversed on decoding to produce exactly the same values. Since these lossless data compression steps don't add noise into or otherwise change the video (unlike quantization), they are often called noiseless coding.

Huffman Coding

After perceptual coding, the data is analyzed to find strings that repeat often. Those strings are then put into a special table, with the most frequently repeating data assigned the shortest codes to keep the data as small as possible.

Once the table is constructed, those strings in the data are replaced with their (much smaller) codes, each of which references the appropriate entry in the table.
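
The table-building step can be sketched in Python as follows. This is a generic Huffman construction that derives its table from the input data, which is an assumption made for illustration and not a description of how MPEG-1 defines its tables. All function names are hypothetical.

```python
import heapq
from collections import Counter


def build_huffman_table(symbols):
    """Return a dict mapping each symbol to a bit string.

    More frequent symbols receive shorter codes.
    """
    freq = Counter(symbols)
    if len(freq) == 1:
        # Degenerate case: a single distinct symbol still needs one bit.
        return {next(iter(freq)): "0"}
    # Heap entries: (count, tie_breaker, partial code table).
    heap = [(count, i, {sym: ""})
            for i, (sym, count) in enumerate(freq.items())]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        count1, _, table1 = heapq.heappop(heap)
        count2, _, table2 = heapq.heappop(heap)
        merged = {s: "0" + code for s, code in table1.items()}
        merged.update({s: "1" + code for s, code in table2.items()})
        heapq.heappush(heap, (count1 + count2, tie, merged))
        tie += 1
    return heap[0][2]


def huffman_encode(symbols, table):
    """Replace every symbol with its code from the table."""
    return "".join(table[s] for s in symbols)


def huffman_decode(bits, table):
    """Walk the bit string, emitting a symbol whenever a code matches."""
    reverse = {code: sym for sym, code in table.items()}
    decoded, current = [], ""
    for bit in bits:
        current += bit
        if current in reverse:
            decoded.append(reverse[current])
            current = ""
    return decoded


sample = list("aaaabbc")
sample_table = build_huffman_table(sample)
sample_bits = huffman_encode(sample, sample_table)
```

In this sample the most frequent symbol, "a", gets a one-bit code, while the rarer "b" and "c" get two bits each.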

RLE

Run-length encoding (RLE) is a very simple method of compressing repetition. A run of identical characters, no matter how long, can be replaced with a few bytes noting the value that repeats and how many times it repeats.

RLE is particularly effective after quantization, as a significant number of the AC coefficients are now zero, and each run of zeros can be represented very compactly (using a special 2-dimensional Huffman table that codes the run length together with the non-zero value that ends the run).
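
The run-length pairing described above can be sketched in Python. This is a simplified illustration: it treats all coefficients uniformly, uses a hypothetical "EOB" (end of block) marker for the trailing zeros, and leaves out the Huffman coding of the resulting pairs.

```python
EOB = "EOB"  # hypothetical end-of-block marker used by this sketch


def run_level_encode(coeffs):
    """Turn a coefficient list into (zero_run, value) pairs plus EOB."""
    pairs = []
    run = 0
    for c in coeffs:
        if c == 0:
            run += 1          # extend the current run of zeros
        else:
            pairs.append((run, c))
            run = 0
    pairs.append(EOB)         # any trailing zeros are implied by EOB
    return pairs


def run_level_decode(pairs, length):
    """Expand the pairs and pad with zeros up to the block length."""
    out = []
    for item in pairs:
        if item == EOB:
            break
        run, value = item
        out.extend([0] * run)
        out.append(value)
    out.extend([0] * (length - len(out)))
    return out


quantized = [100, -2, 1, 0, 0, 0, 3, 0, 0, 0, 0, 0]
rle_pairs = run_level_encode(quantized)
# [(0, 100), (0, -2), (0, 1), (3, 3), 'EOB']
```

Twelve values shrink to four pairs and a marker here; in a real 64-value block with mostly zero coefficients the saving is larger.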


 Dimensions 4094x4094
 Datarate
 Constrained Parameters Bitstream
 Luma
 Chroma
 I-frames (Intraframe) 
   Seeking
 P-frames (Predicted)
 B-frames (Bidirectional)
   Complexity (memory)
   Delay
 GOP
   Keyframe placement
 Quantization*
   Ringing (large coefficients in high frequency sub-bands)
   zigzag
 Macroblocks
   16 dimensions
 Motion Vectors/Estimation
   Black borders/Noise
   pel precision (half pixel IIRC)
   Two MV per macroblock (forward/backward pred)
   Prediction error
   DPCM encoded, just like DC coeffs
   Blockiness
 CBR/VBR
 Spatial Complexity
 Temporal Complexity


Audio

Part 3 of the MPEG-1 standard covers audio and is defined in ISO/IEC 11172-3

MPEG-1 audio uses perceptual masking and sub-band coding to reduce the bitrate of the audio stream. It has been shown to be particularly efficient on high quality percussive sounds (impulses) thanks to the very effective time-domain concealment characteristics of its 32 sub-band polyphase filter bank.

Channel Encoding:

  • Mono
  • Joint Stereo (intensity encoded)
  • Stereo
  • Dual (two uncorrelated mono channels)

  • Sampling rates: 32, 44.1 and 48 kHz
  • Bitrates: 32, 48, 56, 64, 80, 96, 112, 128, 160, 192, 224, 256, 320 and 384 kbit/s
 mono, stereo, joint stereo (intensity, m/s), dual.*
 efficient time-domain concealment characteristics 

Layer I

MPEG-1 Layer I is nothing more than a simplified version of Layer II, designed for low delay and low complexity to facilitate real-time encoding on the hardware available in 1990 for applications like teleconferencing and studio editing. With the substantial performance improvements in digital processing since then, it has long been obsolete.

It saw limited adoption in its time, and most notably was used on the defunct Digital Compact Cassette at 384 kbps. Layer I audio files use the extension .mp1

Layer II

Despite some 20 years of progress in the field of digital audio coding, MP2 remains the preeminent lossy audio coding standard due to its especially high audio coding performance on highly critical audio material such as castanet, symphonic orchestra, male and female voices and particularly high quality percussive sounds (impulses) like triangle and glockenspiel. Testing has shown MP2 to be equivalent or superior to much more recent audio codecs, such as Dolby Digital AC-3. [5]

Subjective audio testing by experts, in the most critical conditions ever implemented, has shown MP2 to offer transparent audio compression at 256 kbps for 16-bit 44.1 kHz CD audio. [6] That (approx) 1:6 compression ratio for CD audio is particularly impressive since it's quite close to the upper theoretical limit of Perceptual Entropy, at just over 1:8. [7] [8] Achieving much higher compression is simply not possible without discarding some perceptible information.
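
The approximate compression ratio quoted above can be checked with simple arithmetic. The sketch below assumes two-channel (stereo) CD audio and takes 256 kbps to mean 256,000 bits per second; the function name is hypothetical.

```python
def pcm_bitrate(sample_rate_hz, bits_per_sample, channels):
    """Bitrate of uncompressed PCM audio in bits per second."""
    return sample_rate_hz * bits_per_sample * channels


cd_bitrate = pcm_bitrate(44100, 16, 2)   # 1,411,200 bit/s for stereo CD audio
mp2_bitrate = 256 * 1000                 # 256 kbps, as quoted in the text
compression_ratio = cd_bitrate / mp2_bitrate
```

Under these assumptions the ratio comes out at about 5.5, which the text rounds to roughly 1:6.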

 audio broadcasting
 error resilient
 Musicam


Layer III/MP3

The hybrid (filter bank + MDCT) design of MP3 imposes some limitations as well. It results in temporal resolution 12 to 36 times worse than that of MP2, which can cause artifacts on transient sounds such as percussive events, in the form of audible smearing and pre-echo. [9] Because of these issues, MP2 sound quality is actually superior to MP3 above 112 kbps/channel.

 "The combination of the two filter banks creates aliasing issues that are only partially handled by the "aliasing compensation" stage, but that create excess energy to be coded in the frequency domain, thereby decreasing coding efficiency.  Frequency resolution is limited by the small long block window size, decreasing coding efficiency

No scale factor band for frequencies above 15.5/15.8 kHz"


 9 months?
 ASPEC (Fraunhofer) 
 freq transform encoder 
 entropy coding (Huffman)
 Hybrid filtering
   MDCT (overlapping DCT)
   pre-echo worse
   aliasing issues
   "aliasing compensation"?
 mid/side (or intensity) joint stereo 
 576 frequency components
 selectivity
 "If there is a transient, 192 samples are taken instead of 576 to limit the temporal spread of quantization noise"?
 psychoacoustic model and frame format from MP1/2
 ringing
 CBR/VBR
 Frames are not independent

Systems

Part 1 of the MPEG-1 standard covers systems, that is, the logical layout of the encoded audio, video, and other bitstream data.

"The MPEG-1 Systems design is essentially identical to the MPEG-2 Program Stream structure." [10]

 Program Stream
 Interleaving
 PES
   Wrap-around
 DTS
 Timebase correction
 Pixel/Display Aspect Ratio


See Also

  • MPEG The Moving Picture Experts Group
  • MP3 The Cultural Phenomenon in Music

References

  1. http://www.cis.temple.edu/~vasilis/Courses/CIS750/Papers/mpeg_6.pdf pp.2
  2. http://www.chiariglione.org/mpeg/meetings/santa_clara90/santa_clara_press.htm
  3. http://www.chiariglione.org/mpeg/meetings.htm
  4. http://www.chiariglione.org/mpeg/meetings/london/london_press.htm
  5. Wustenhagen et al, Subjective Listening Test of Multi-channel Audio Codecs, AES 105th Convention Paper 4813, San Francisco 1998
  6. http://www.faqs.org/faqs/mpeg-faq/part1/ "You can compress the same stereo program down to 256 Kbits/s with no loss in discernable quality." (the original papers would be much, much better refs, but I can't seem to find them! This just proves they exist!)
  7. J. Johnston, Estimation of Perceptual Entropy Using Noise Masking Criteria, in Proc. ICASSP-88, pp. 2524-2527, May 1988.
  8. J. Johnston, Transform Coding of Audio Signals Using Perceptual Noise Criteria, IEEE J. Sel. Areas in Comm., pp. 314-323, Feb. 1988.
  9. http://www.cs.columbia.edu/~coms6181/slides/6R/mpegaud.pdf pp.8
  10. http://www.chiariglione.org/mpeg/faq/mp1-sys/mp1-sys.htm

External Links