3 Compression
Multimedia
Prof. Dr.-Ing. Lars Wolf
TU Braunschweig
Institut für Betriebssysteme und Rechnerverbund
Mühlenpfordtstraße 23, 38106 Braunschweig, Germany
Email: [email protected]
3-compression.fm

[Figure: course overview map
 Usage: Multimedia Applications, Learning & Teaching, Design, User Interfaces
 Services: Content Processing, Documents, Security, ..., Synchronization, Group Communications
 Systems: Databases, Programming, Media-Server, Operating Systems, Communications, Opt. Memories, Quality of Service, Networks
 Basics: Compression, Computer Architectures, Image & Graphics, Animation, Video, Audio]
Contents
1. Motivation
2. Requirements
3. Fundamentals Categories
4. Source Coding
5. Entropy Coding
6. Hybrid Coding
7. JPEG
8. H.261 and related ITU Standards
9. MPEG-1
10. MPEG-2
11. MPEG-4
14. Conclusion

1. Motivation
Uncompressed digital media in computing require, for example:
Text:
  - 1 page with 80 char/line, 64 lines/page and 2 Byte/char
  - 80 x 64 x 2 x 8 = 80 kBit/page
Image:
  - 24 Bit/pixel, 512 x 512 pixel/image
  - 512 x 512 x 24 = 6 MBit/image
Audio:
  - CD quality, sample rate 44.1 kHz, 16 Bit/sample
  - Mono: 44.1 x 16 = 706 kBit/s
  - Stereo: 1.412 MBit/s
Video:
  - full frames with 1024 x 1024 pixel/frame, 24 Bit/pixel, 30 frames/s
  - 1024 x 1024 x 24 x 30 = 720 MBit/s
  - more realistic: 360 x 240 pixel/frame = 60 MBit/s
Hence compression is necessary
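The motivation figures can be re-derived with a few multiplications. The sketch below is only an illustration: the helper name `raw_bits` is made up here, the text, image and video figures are read with binary prefixes (1 k = 1024), and the "more realistic" video case is assumed to keep 24 Bit/pixel and 30 frames/s, which the slide does not state.

```python
def raw_bits(*factors):
    """Multiply the given factors into a raw (uncompressed) bit count."""
    total = 1
    for factor in factors:
        total *= factor
    return total

K, M = 1024, 1024 * 1024                     # binary prefixes (assumption)

text_page = raw_bits(80, 64, 2, 8)           # chars/line, lines/page, Byte/char, Bit/Byte
image = raw_bits(512, 512, 24)               # pixels x pixels x Bit/pixel
audio_mono = raw_bits(44_100, 16)            # Bit per second
audio_stereo = 2 * audio_mono
video_full = raw_bits(1024, 1024, 24, 30)    # Bit per second
video_small = raw_bits(360, 240, 24, 30)     # assumed: same depth and frame rate

print(text_page / K, "kBit/page")
print(image / M, "MBit/image")
print(audio_mono / 1000, "kBit/s mono,", audio_stereo / 1e6, "MBit/s stereo")
print(video_full / M, "MBit/s,", round(video_small / 1e6, 1), "MBit/s")
```

Under these assumptions the small video format comes out at roughly 62 MBit/s, which the slide rounds to 60 MBit/s.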
2. Requirements
General Requirements:
  - high quality
  - low delay
  - intrinsic scalability (e.g. 25 frames/s)
  - Synchronization of audio, video, and other media
Dialogue mode requirements:
  - Compression and decompression in real-time
  - End-to-end delay < 150 ms
  - Symmetric compression: compression and decompression take the same time
Categories & Techniques
entropy coding
  - ignoring semantics of the data
  - lossless
source coding
  - based on semantic of the data
  - often lossy
channel coding
  - adaptation to communication channel
  - introduction of redundancy
hybrid coding = source coding + entropy coding

Entropy Coding   Run-length Coding
                 Huffman Coding
                 Arithmetic Coding
Source Coding    Prediction:      DPCM, DM
                 Transformation:  FFT, DCT
                 Layered Coding:  Bit Position, Subsampling, Sub-Band Coding
                 Vector Quantization
Hybrid Coding    JPEG
                 MPEG
                 H.261, H.263
                 proprietary: Quicktime, ...
Categories & Techniques, Cont.
2. Reduction Coding: Eliminate Irrelevance / Low-Relevance (lossy)
Preparatory Step: Decorrelation - Eliminate Interdependencies
  - this is the essence of source coding
  - changes "representation" of media
  - goal usually: reduce dependencies between data
  - as such, is a preparatory step and usually does not compress
note: reduction coding is "smart deletion", not really "compression"

Categories & Techniques: Symmetric / Asymmetric
Asymmetric:
  - o.k. if compression is not real-time, "only once" (movie!)
  - may involve number-crunchers (... owned by content provider)
Symmetric:
  - "required" for real-time, e.g., videoconferencing
  - in reality, often not 100% symmetric
"smooth" bit stream ("isochronous")?
  - terms: CBR (constant bit rate) vs. VBR (variable bit rate)
  - may be "over time": e.g., packet size Big Small Small Big Small Small ...
  - may be simulated w/ loop-back filter plus buffer
"progressive" (mainly: non-continuous media): display-while-download
"streaming": ~ same for video (here, rather an issue of software)
more subtle issues:
  - "open" standard?
  - good "performance" (ratio, speed) for all kinds of media?
  - bullet-proof, well-understood?
  - ...

DPCM = Differential Pulse-Code Modulation
Assumptions:
  - Consecutive samples or frames have similar values
  - Prediction is possible due to existing correlation
Fundamental steps:
  - Predict next data based on previously processed data
  - Determine difference between actual next data and prediction
  - Code difference only
Example: previous data 1000 --> prediction 1000; actual next data 1005 --> code 5
Challenge: optimal predictor
Delta modulation (DM): 1 bit as difference signal
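The three DPCM steps (predict, subtract, code the difference) can be sketched in a few lines. This is a minimal illustration, not a codec: it assumes the simplest possible predictor ("the next value equals the previous one") and a start value of 0 known to both sides; the function names are chosen here for the example.

```python
def dpcm_encode(samples):
    """Code each sample as its difference to the prediction."""
    prediction = 0                 # assumed start value shared by encoder and decoder
    differences = []
    for sample in samples:
        differences.append(sample - prediction)
        prediction = sample        # predictor: next value equals the last one
    return differences

def dpcm_decode(differences):
    """Rebuild the samples by adding each difference to the running prediction."""
    prediction = 0
    samples = []
    for diff in differences:
        prediction += diff
        samples.append(prediction)
    return samples

samples = [1000, 1005, 1003, 1003, 1010]
differences = dpcm_encode(samples)
print(differences)                 # [1000, 5, -2, 0, 7]
print(dpcm_decode(differences) == samples)
```

After the first value the differences are small numbers, which is what makes the subsequent entropy coding effective.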
Source Coding: Transformation
Assumptions:
  - Data in the transformed domain is easier to compress
  - Related processing is feasible
Example:
  - Fourier Transformation

Source Coding: Sub-Band
Assumption:
  - Some frequency ranges are more important than others
Example:
  - frequency spectrum of the signal
Entropy (in information theory): information content / "density"
  - example: given 4 possible symbols (words) in source code
  - symbols/words equally likely: high entropy (full of information)
  - otherwise: lower entropy (suboptimal representation of info, less dense)
  - i) if all equal, p = 1/4: H(P) = 2 bit/symbol
  - ii) if p = 1/2, 1/4, 1/8, 1/8: H(P) = 1.75 bit/symbol
"Entropy coding" means:
  - mean length of the coded file (almost) equals the entropy
[Figure: histogram of an image, probability over grey levels. note: entropy here is low ("little info") because "most of picture is in same gray"]
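The two entropy values quoted for the four-symbol example follow from the usual definition H(P) = -sum(p_i * log2(p_i)). A short check, with the function name `entropy` chosen for this sketch:

```python
from math import log2

def entropy(probabilities):
    """Shannon entropy in bits per symbol."""
    return -sum(p * log2(p) for p in probabilities if p > 0)

uniform = entropy([1/4, 1/4, 1/4, 1/4])
skewed = entropy([1/2, 1/4, 1/8, 1/8])
print(uniform, skewed)             # 2.0 1.75
```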
Entropy Coding: Run-Length
Assumption:
  - Long sequences of identical symbols
Example:
  ... A B C E E E E E E D A C B ...
  --> compression -->
  ... A B C E ! 6 D A C B ...
  (E = symbol, ! = special flag, 6 = number of occurrences)

Entropy Coding: Huffman
Assumption:
  - Symbols occur with different frequencies, e.g., character frequencies of the English language
Fundamental principle:
  - Frequently occurring symbols are coded with shorter bit strings
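The run-length example can be reproduced with a small encoder. The sketch assumes that the flag symbol `!` never occurs in the data and that only runs of at least four symbols are replaced (a threshold chosen here, since shorter runs would not shrink).

```python
FLAG = "!"       # special flag, as in the example
MIN_RUN = 4      # assumption: shorter runs are left untouched

def rle_encode(symbols):
    """Replace each long run by: symbol, flag, number of occurrences."""
    out = []
    i = 0
    while i < len(symbols):
        j = i
        while j < len(symbols) and symbols[j] == symbols[i]:
            j += 1
        run = j - i
        if run >= MIN_RUN:
            out += [symbols[i], FLAG, str(run)]
        else:
            out += symbols[i:j]
        i = j
    return out

encoded = rle_encode("A B C E E E E E E D A C B".split())
print(" ".join(encoded))           # A B C E ! 6 D A C B
```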
Given probabilities of occurrence:
  p(A)=0.3, p(B)=0.3, p(C)=0.1, p(D)=0.15, p(E)=0.15

symbol  probability  code
A       30%          11
B       30%          10
C       10%          011
D       15%          010
E       15%          00

Coding tree:
  - C (1) and D (0) merge into a node of 25%
  - this node (1) and E (0) merge into a node of 40%
  - A (1) and B (0) merge into a node of 60%
  - the 60% node (1) and the 40% node (0) merge into the root (100%)

Example:
  B  A  C   D   A  B  E  B  A  E
  10 11 011 010 11 10 00 10 11 00
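The tree construction (repeatedly merging the two least probable nodes) can be sketched with a priority queue. The function name and the tie-breaking rule are choices made for this example; because several merges are ties, the individual bit patterns may differ from the table above, while the code lengths and the mean length stay the same.

```python
import heapq
from itertools import count

def huffman_code(weights):
    """Build a prefix code from {symbol: weight} by merging the two lightest nodes."""
    tie = count()                  # tie-breaker so the heap never compares dicts
    heap = [(w, next(tie), {s: ""}) for s, w in weights.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        w0, _, codes0 = heapq.heappop(heap)
        w1, _, codes1 = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in codes0.items()}
        merged.update({s: "1" + c for s, c in codes1.items()})
        heapq.heappush(heap, (w0 + w1, next(tie), merged))
    return heap[0][2]

weights = {"A": 30, "B": 30, "C": 10, "D": 15, "E": 15}    # percentages from the example
code = huffman_code(weights)
mean_length = sum(weights[s] * len(code[s]) for s in weights) / 100
bitstring = "".join(code[s] for s in "BACDABEBAE")
print(code)
print(mean_length, len(bitstring))     # 2.25 bits/symbol, 22 bits for the message
```

A fixed-length code for five symbols would need 3 bits per symbol, i.e. 30 bits for the same ten-symbol message.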
6. Hybrid Coding
Basic Encoding Steps:
  source data --> data preparation --> data processing --> quantization --> entropy encoding --> compressed data
  - data preparation: e.g. resolution, frame rate
  - data processing: e.g. DCT, sub-band coding
  - quantization: e.g. linear; DC, AC values
  - entropy encoding: e.g. runlength, Huffman
  [Figure annotations, assignment to the individual steps not recoverable: "video: lossy", "audio: lossy", "lossless", "(sometimes lossless)", "lossless"]

7. JPEG
JPEG: Joint Photographic Experts Group
International Standard:
  - For digital compression and coding of continuous-tone still images:
    - Gray-scale
    - Color
  - Since 1992
Joint effort of:
  - ISO/IEC JTC1/SC2/WG10
  - Commission Q.16 of CCITT SGVIII
Compression rate of 1:10 yields reasonable results
Independence of:
  - Image resolution
  - Image and pixel aspect ratio
  - Color representation
  - Image complexity and statistical characteristics
Well-defined interchange format of encoded data
Implementation in:
  - Software only
  - Software and hardware

[Figure: JPEG processing chain: source image --> image preparation (pixel, block, MCU) --> image processing (predictor, FDCT) --> quantization --> entropy encoding (runlength, Huffman, arithmetic) --> compressed image]
MCU: Minimum Coded Unit
FDCT: Forward Discrete Cosine Transformation
[Figure: image consisting of planes C1, C2, C3, ..., CN; Xi, Yi: resolution of plane i]
Planes:
  - 1 <= N <= 255 components Ci (e.g., one plane per color)
  - Contain data units
    - Pixels in lossless mode, 8*8 blocks in lossy mode
  - Different planes may have different resolutions

Brightness (luminance):
  - sampling frequency 13.5 MHz
Chrominance (U, V):
  - color differences
  - sampling frequency 6.75 MHz
[Figure: processing order of the data units within one plane; recoverable labels: left, right, bottom]
Interleaved encoding:
  [Figure: data units taken from planes C1 + C2 + C3 together form one MCU]

Expanded lossy DCT-based mode
  - Progressive image display
  - I.e. from coarse to fine resolution
Lossless mode
  - Lossless compression, error-free decompression
Hierarchical mode
  - Compression with multiple resolutions
[Figure: baseline processing chain: image --> 8x8 blocks --> FDCT --> table-driven quantization and entropy encoding --> compressed image]
Baseline mode is mandatory for all JPEG implementations:
  - Often restricted to certain resolution
  - Often only three planes with predefined color set-up
Image preparation:
  - Step 1a: pixel resolution p = 8 bit; image is split into 8x8 pixel blocks (data units)
  - Step 1b: unsigned --> signed integer (prepare for "oscillation" --> sin/cos)
  - ... other steps see below
  - Step 4a: zigzag linearization (see below)
  - Steps 4b, c, ...: several entropy coding algorithms applied

Fourier idea:
  - interpret samples as periodic (infinite) oscillating waveform
  - represent as sum of sin/cos waves a_i sin(i w t), i = 0 ... (N-1); same for cos
  - a_i: coefficients; a_0 = DC (direct current = shift wrt. 0-axis), others: how much of the respective sin or cos wave is part of the waveform
  - increasing i: increasing frequencies (usually N = no. of samples in block)
DCT in JPEG etc.:
  - same idea, but 2-dimensional cos waves
  - cut out square blocks from picture (NxN)
  - cos waves all have independent frequencies in horizontal/vertical direction
    - comparable to smooth hills, # of valleys may differ horiz./vert.
  - again: interpret sample as periodic (2D) waveform
    --> represent as sum of (2D) cos wave "hill areas"
  - why only cos?
    - trick: picture mirrored around axes --> 4-fold size --> picture symmetric to axes --> sin parts become zero
    - 4-fold size no problem: 3 parts redundant
    - axes have double "weight" (pixel row/col. "0") --> factor Cu/Cv in formula
JPEG Baseline Mode: Image Processing
FDCT of each 8x8 block (maps pixel values into frequency coefficients, not pixels into pixels):

$$S_{vu} = \frac{1}{4}\, C_u C_v \sum_{x=0}^{7} \sum_{y=0}^{7} s_{yx} \cos\frac{(2x+1)\,u\,\pi}{16} \cos\frac{(2y+1)\,v\,\pi}{16}$$

with: $C_u, C_v = \frac{1}{\sqrt{2}}$ for $u, v = 0$; else $C_u, C_v = 1$

Example: Calculation of S00

JPEG Baseline Mode: Image Processing
Quantization of DCT-coefficients:
  - Map interval of real numbers to one integer number
  - Especially: small values are mapped to 0, yielding long zero sequences
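The FDCT formula and the quantization step can be tried out directly. The sketch below evaluates the double sum literally (no fast algorithm) on the 8x8 sample block that appears later in these slides, after shifting the samples by -128 (Step 1b). The step sizes in `first_row_steps` are an assumption: they are the first row of the example luminance table commonly used with JPEG; the slides do not say which table was applied.

```python
from math import cos, pi, sqrt

def fdct_8x8(block):
    """2-D forward DCT of an 8x8 block of level-shifted samples block[y][x]."""
    def c(k):
        return 1 / sqrt(2) if k == 0 else 1.0
    S = [[0.0] * 8 for _ in range(8)]
    for v in range(8):
        for u in range(8):
            total = 0.0
            for y in range(8):
                for x in range(8):
                    total += (block[y][x]
                              * cos((2 * x + 1) * u * pi / 16)
                              * cos((2 * y + 1) * v * pi / 16))
            S[v][u] = 0.25 * c(u) * c(v) * total
    return S

def quantize(coefficients, steps):
    """Map each real coefficient to the nearest integer multiple of its step size."""
    return [round(s / q) for s, q in zip(coefficients, steps)]

pixels = [
    [139, 144, 149, 153, 155, 155, 155, 155],
    [144, 151, 153, 156, 159, 156, 156, 156],
    [150, 155, 160, 163, 158, 156, 156, 156],
    [159, 161, 162, 160, 160, 159, 159, 159],
    [159, 160, 161, 162, 162, 155, 155, 155],
    [161, 161, 161, 161, 160, 157, 157, 157],
    [162, 162, 161, 163, 162, 157, 157, 157],
    [162, 162, 161, 161, 163, 158, 158, 158],
]
shifted = [[p - 128 for p in row] for row in pixels]       # unsigned --> signed
S = fdct_8x8(shifted)
print(round(S[0][0], 1))                                   # DC coefficient S00

first_row_steps = [16, 11, 10, 16, 24, 40, 51, 61]         # assumed step sizes
quantized_row = quantize(S[0], first_row_steps)
dequantized_row = [q * s for q, s in zip(quantized_row, first_row_steps)]
print(quantized_row, dequantized_row)
```

S00 is simply one eighth of the sum of all 64 shifted samples, which is why it represents the average brightness of the block. Most of the first-row coefficients are smaller than half their step size and therefore quantize to 0.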
JPEG Baseline Mode: Entropy Encoding
DC-coefficients:
  - Compute the differences: DIFF = DC_i - DC_(i-1)
    [Figure: consecutive blocks i-1 and i with their DC coefficients DC_(i-1) and DC_i]
  - Encode differences instead of the DC_i values
  - Reason: DC values of adjacent blocks are often similar

JPEG Baseline Mode: Entropy Coding
63 AC coefficients:
  - Ordering in zig-zag form
    [Figure: 8x8 coefficient block with DC in the upper left corner, AC01 ... AC07 in the top row, AC70 ... AC77 in the bottom row, traversed in zig-zag order]
  - reason: coefficients in lower right corner are likely to be zero
Huffman coding of all coefficients:
  - Transformation into a code where amount of bits depends on frequency of respective value
  - Subsequent runlength coding of zeros
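The zig-zag ordering and the resulting zero runs can be sketched as follows. The quantized block is a hypothetical example with only a few non-zero values in the upper left corner. As a simplification, each pair holds the zero run and the value itself (JPEG codes the run together with the bit size of the value), and the special handling of runs longer than 15 is left out.

```python
def zigzag_order(n=8):
    """Index pairs (row, col) of an n x n block in zig-zag order."""
    return sorted(((r, c) for r in range(n) for c in range(n)),
                  key=lambda rc: (rc[0] + rc[1],
                                  rc[0] if (rc[0] + rc[1]) % 2 else -rc[0]))

def run_value_pairs(block):
    """Turn the AC values into (zero_run, value) pairs, closed by the end marker (0, 0)."""
    ac = [block[r][c] for r, c in zigzag_order(len(block))][1:]    # skip the DC value
    while ac and ac[-1] == 0:
        ac.pop()                     # trailing zeros collapse into the end marker
    pairs, run = [], 0
    for value in ac:
        if value == 0:
            run += 1
        else:
            pairs.append((run, value))
            run = 0
    pairs.append((0, 0))
    return pairs

quantized = [[0] * 8 for _ in range(8)]    # hypothetical quantized block
quantized[0][:3] = [15, 0, -1]
quantized[1][:2] = [-2, -1]
quantized[2][:2] = [-1, -1]

print(zigzag_order()[:6])    # [(0, 0), (0, 1), (1, 0), (2, 0), (1, 1), (0, 2)]
print(run_value_pairs(quantized))
```

Because the non-zero values cluster in the upper left corner, the zig-zag scan puts them first and leaves one long zero run at the end, which costs only the end marker.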
JPEG: Details of (one possible) Entropy coding
  - assumption: there will rarely be two non-zero AC values in sequence
    --> regard sequence as iteration of non-zero AC values and zero-runlengths
    --> sometimes, the zero-runlength will have "length zero"
  - code non-zero AC values as VLIs (variable length integers)
    --> need to transmit VLI lengths
    (difference to Huffman: end of code not found by decoder)
  - assume: last DC value was 12 and the current quantized DC value is 15 --> encoded difference is 3
    --> only 3, -2, -1 occur as non-zero values

JPEG: Sample Compression of 1 Block: 8x8 Matrices
Source samples:
  139 144 149 153 155 155 155 155
  144 151 153 156 159 156 156 156
  150 155 160 163 158 156 156 156
  159 161 162 160 160 159 159 159
  159 160 161 162 162 155 155 155
  161 161 161 161 160 157 157 157
  162 162 161 163 162 157 157 157
  162 162 161 161 163 158 158 158
DCT coefficients:
  235.6  -1.0 -12.1  -5.2   2.1  -1.7  -2.7   1.3
  -22.6 -17.5  -6.2  -3.2  -2.9  -0.1   0.4  -1.2
  -10.9  -9.3  -1.6   1.5   0.2  -0.9  -0.6  -0.1
   -7.1  -1.9   0.2   1.5   0.9  -0.1   0.0   0.3
   -0.6  -0.8   1.5   1.6  -0.1  -0.7   0.6   1.3
    1.8  -0.2   1.6  -0.3  -0.8   1.5   1.0  -1.0
   -1.3  -0.4  -0.3  -1.5  -0.5   1.7   1.1  -0.8
   -2.6   1.6  -3.8  -1.8   1.9   1.2  -0.6  -0.4

On the following slides: picture of Yosemite Valley
Source: pico.phys.chemie.tu-muenchen.de/people/krempl/JMT
Their VLI-encoding is as follows:
   3   11
  -2   01
  -1   0
This makes the iteration look as follows (VLIs still represented as integers):
  (2)(3), (1,2)(-2), (0,1)(-1), (0,1)(-1), (0,1)(-1), (2,1)(-1), (0,0)
  ((0,0) is the abbreviation for "til end")
The following Huffman encoding is defined:
  (2)    011
  (0,0)  1010
  (0,1)  00
  (1,2)  11011
  (2,1)  11100
...so that the bitstream finally consists of

Coefficients after quantization and dequantization (rows as far as recoverable):
  240   0 -10   0   0   0   0   0
  -24 -12   0   0   0   0   0   0
  -14 -13   0   0   0   0   0   0
    0   0   0   0   0   0   0   0
    0   0   0   0   0   0   0   0

Picture of Yosemite Valley with various degrees of compression:
Bitmap: no compression
  - 1024 * 671 pixels
  - 3 bytes / pixel
  - 2014 KByte file size
JPEG:
  1:31    63 KByte
  1:50    40 KByte
  1:100   21 KByte
  1:155   13 KByte
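Putting the VLI table and the Huffman table together yields the bit string for this block. The sketch below only combines the codes quoted above; the helper `vli_bits` follows the usual JPEG convention (positive values in plain binary, negative values offset by 2^size - 1), which reproduces the three VLI codes listed. The DC difference 3 and the AC pairs are taken from the iteration shown on the slide.

```python
def vli_bits(value):
    """Variable length integer: returns (size in bits, bit string)."""
    size = abs(value).bit_length()
    if value < 0:
        value += (1 << size) - 1       # negative values: offset into the lower half
    return size, (format(value, "b").zfill(size) if size else "")

huffman = {                            # codes quoted on the slide
    (2,): "011",                       # DC difference of size 2
    (0, 0): "1010",                    # "til end"
    (0, 1): "00",
    (1, 2): "11011",
    (2, 1): "11100",
}

dc_diff = 3
ac_pairs = [(1, -2), (0, -1), (0, -1), (0, -1), (2, -1)]   # (zero run, value)

size, bits = vli_bits(dc_diff)
stream = huffman[(size,)] + bits
for run, value in ac_pairs:
    size, bits = vli_bits(value)
    stream += huffman[(run, size)] + bits
stream += huffman[(0, 0)]
print(stream, len(stream))
```

Under these tables the block needs 31 bits instead of 8 x 8 x 8 = 512 bits for the raw samples.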
JPEG: 1:155
[Image: picture compressed at 1:155]

JPEG: 4 Modes of Compression
  - lossy sequential DCT-based mode (baseline mode)
  - expanded lossy DCT-based mode
  - lossless mode
  - hierarchical mode
JPEG Extended Lossy DCT-Based Mode
Sequential encoding:
  - Top to bottom
  - Good for small images and fast processing
Progressive encoding:
  - Good for large and complicated images
JPEG Extended Lossy DCT-Based Mode
Order of pixel/block processing changed
By spectral selection:
  - Selection according to importance of DC, AC value
  - All DC values of whole image first
  - All AC values in order of importance subsequently
By successive approximation:
  - Selection according to position of bits
  - First the most significant bit of all blocks
  - Then the second significant bit of all blocks
  - Until the least significant bit of all blocks

JPEG Lossless Mode
Image processing:
  - Selection of a predictor for each pixel x from its neighbours a (left), b (above), c (above left):
      c b
      a x
    code  prediction
    0     no prediction
    1     x = A
    2     x = B
    3     x = C
    4     x = A + B - C
    5     x = A + ((B - C) / 2)
    6     x = B + ((A - C) / 2)
    7     x = (A + B) / 2
Entropy coding:
  - Same as lossy mode
  - Code of chosen predictor and its difference to the actual value
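The predictor table of the lossless mode translates almost literally into code. In this sketch, selection value 4 is taken as A + B - C, as defined in the JPEG standard, and the divisions by 2 are integer divisions; the neighbour values in the example are made up.

```python
PREDICTORS = {
    0: lambda a, b, c: 0,                    # no prediction
    1: lambda a, b, c: a,
    2: lambda a, b, c: b,
    3: lambda a, b, c: c,
    4: lambda a, b, c: a + b - c,
    5: lambda a, b, c: a + (b - c) // 2,
    6: lambda a, b, c: b + (a - c) // 2,
    7: lambda a, b, c: (a + b) // 2,
}

def residuals(x, a, b, c):
    """Difference between the actual pixel x and each possible prediction."""
    return {code: x - predict(a, b, c) for code, predict in PREDICTORS.items()}

# hypothetical neighbourhood:   c=100  b=102
#                               a=101  x=104
diffs = residuals(x=104, a=101, b=102, c=100)
best = min((code for code in diffs if code != 0), key=lambda code: abs(diffs[code]))
print(diffs)
print(best, diffs[best])       # predictor code and the difference to be entropy coded
```

Only the predictor code and the (small) difference are handed to the entropy coder, so decoding reproduces every pixel exactly.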
JPEG Hierarchical Mode
Coding of each image with several resolutions:
  - Image scaling
  - Differential encoding
    - First, coded with lowest resolution --> image A
    - Coded with increased horizontal & vertical resolution --> image A'
    - Difference between both images is computed: B = A' - A (*)
    - Iteration for higher resolutions
(*) note for all scalable approaches:
  relate higher-res version B (or B') to the receiver's decoded lower-res version A (to avoid accumulation of quantization errors)

JPEG 2000
Goal: Establish a follow-on standard to JPEG
  - Started Feb. 1996
  - Call for Proposals March 1997
  - Standardization Dec. 2000 (target date)
Features:
  - Compression based on Wavelet technology
    - See Section 11
  - Increased capacity for metadata
    - I.e. information about the image
8. H.261 and related ITU Standards
H.261:
  - CCITT standard from 1990
  - For ISDN
  - At p x 64 kbit/s, with p = 1, ..., 30
Technical issues:
  - Real-time encoding/decoding
  - Max. signal delay of 150 ms
  - Constant data rate
  - Implementation in hardware (main goal) and software

H.261 Image Preparation
  - Luminance signal (Y)
  - Two color difference signals (Cb, Cr)
  - Subsampling according to CCIR 601 (4:1:1)
Quarter Common Intermediate Format (QCIF) resolution:
  - Mandatory
  - Y: 176 x 144 pixel ("pruning" 180 --> 176)
  - At 29.97 frames/s approx. 9.115 Mbit/s (uncompressed)
  - but: encoder may leave out up to 3 frames (--> ~8 fps)
Common Intermediate Format (CIF) resolution:
  - Optional
  - Y: 352 x 288 pixel
  - At 29.97 frames/s approx. 36.46 Mbit/s (uncompressed), i.e. ~ 570 * 64 kbit/s
[Figure: size comparison CIF (360*288) vs. QCIF]
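The uncompressed QCIF and CIF rates can be checked with a short calculation. The sketch assumes 8 bit per sample and chrominance planes at half the luminance resolution in both directions (e.g. 88 x 72 for QCIF); this assumption is not spelled out on the slide, but it reproduces the quoted numbers. The function name is made up for the example.

```python
def raw_bitrate(width, height, fps, bits_per_sample=8):
    """Uncompressed bit rate for Y plus two chroma planes of half width and half height."""
    luma = width * height
    chroma = 2 * (width // 2) * (height // 2)
    return (luma + chroma) * bits_per_sample * fps

qcif = raw_bitrate(176, 144, 29.97)
cif = raw_bitrate(352, 288, 29.97)
print(round(qcif / 1e6, 3), "Mbit/s")        # QCIF
print(round(cif / 1e6, 2), "Mbit/s")         # CIF
print(round(cif / 64_000), "x 64 kbit/s")    # CIF relative to one ISDN B channel
```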
Macroblock of: 4 Y blocks, 1 Cr block, 1 Cb block
Group of blocks (GOBs) of 3 x 11 macroblocks
Picture:
  - QCIF picture: 3 GOBs
  - CIF picture: 12 GOBs

Intraframe coding:
  - basically DCT as in JPEG baseline mode
  - DCT w/ same quantization factor for all AC values
  - this factor may be adjusted by loopback filter (see below)
H.261: Image Compression Interframe
[Figure: Frame 1 and Frame 2 with a motion vector between matching macroblocks]
  - interframes: f1, f2, f3, ... relative to f0 (differential encoding)
  - in H.261: intraframes rare (bandwidth!, main application videophone)
  - Search for similar macroblock (16x16) in previous image
  - Position of this macroblock defines motion vector
  - Search range for similar block is implementation-dependent:
    - max. 15 pixel
    - but: motion vector may also always be 0 ("bad" software encoder)

H.261: Image Compression
Encode:
  - Motion vectors between macroblock pairs
    - Components are encoded yielding code words of variable length
  - Differences between macroblock pairs
    - DCT if value higher than a specific threshold
    - No further processing if value less than this threshold
Quantization:
  - Linear
  - Adaptation of step size (loopback filter) --> constant data rate
    - Coarse quantization if many values to be transmitted
    - Fine quantization if few values to be transmitted
  ("leaky bucket": constant 64 kbit/s "drop out"; loopback filter: adjust quantization factor if bucket filled above threshold 1 or below threshold 2, respectively)
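How an encoder finds a motion vector is left to the implementation. One straightforward possibility is an exhaustive block-matching search that minimizes the sum of absolute differences (SAD); the sketch below shows that approach on a tiny hypothetical frame pair (2x2 blocks instead of 16x16 macroblocks) and is not meant as the method of any particular encoder.

```python
def sad(prev, cur, top, left, dy, dx, size):
    """Sum of absolute differences between a block in cur and a shifted block in prev."""
    return sum(abs(cur[top + r][left + c] - prev[top + dy + r][left + dx + c])
               for r in range(size) for c in range(size))

def motion_vector(prev, cur, top, left, size=16, search=15):
    """Exhaustive search for the best matching block within +/- search pixels."""
    height, width = len(prev), len(prev[0])
    best, best_cost = (0, 0), sad(prev, cur, top, left, 0, 0, size)
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            y, x = top + dy, left + dx
            if 0 <= y <= height - size and 0 <= x <= width - size:
                cost = sad(prev, cur, top, left, dy, dx, size)
                if cost < best_cost:
                    best, best_cost = (dy, dx), cost
    return best, best_cost

# hypothetical 8x8 frames: a 2x2 pattern moves from (1, 1) to (3, 4)
prev = [[0] * 8 for _ in range(8)]
cur = [[0] * 8 for _ in range(8)]
for r in range(2):
    for c in range(2):
        prev[1 + r][1 + c] = 10 * (2 * r + c + 1)
        cur[3 + r][4 + c] = 10 * (2 * r + c + 1)

vector, cost = motion_vector(prev, cur, top=3, left=4, size=2, search=3)
print(vector, cost)            # (-2, -3) 0
```

The cost of this search grows with the square of the search range, which is why a slow encoder may restrict it or fall back to the zero vector.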
H.263:
  - max. bitrate: H.263 approx. 2.5 x H.261; lowest bitrates suitable for modem

Source Image Formats
format   size          H.261        H.263
SQCIF    128 x 96      optional     required
QCIF     176 x 144     required     required
CIF      352 x 288     optional     optional
4CIF     704 x 576     not defined  optional
16CIF    1408 x 1152   not defined  optional

  - optional PB-frames (2 combined pictures: 1 B- & 1 P-frame)
  - optional overlapped block motion compensation
  - optional motion vector pointing outside image
  - half pel motion compensation (instead of full pel)
  - unlimited search space for motion vector
    --> fast encoder can do better
  - ..
H.320, H.32x Family
H.310
  - adapt MPEG 2 for communication over B-ISDN (ATM)
H.321
  - define videoconferencing terminal for B-ISDN (instead of N-ISDN)
H.322
  - adapts H.320 for guaranteed QoS LANs (like ISO-Ethernet)
H.324
  - Terminal for low bit rate communication (over V.34 modems)

9. MPEG-1
MPEG: ISO/IEC working group(s)
  - ISO/IEC JTC1/SC29/WG11
  - ISO IS 11172 since 3/93
Starting point: MPEG-1
  - Audio/video at about 1.5 Mbit/s
  - Based on experiences with JPEG and H.261
Later:
  - MPEG-7: support for content-based search and retrieval
  - MPEG-21: future framework
MPEG
[Figure: audio coding --> audio data stream, video coding --> video data stream; system part: combined stream, common buffer management]
Consideration of other standards:
  - JPEG
  - H.261
Symmetric and asymmetric compression
Constant data rate, should be < 1856 kbit/s
Original target rate ~ 1.2 Mbit/s incl. audio (= 1x CD-ROM: 150 KByte/s)

Color model: Y Cb Cr
  - 4:2:0 subsampling
  - Y value for each pixel
  - Cb and Cr in every fourth pixel only
Resolution:
  - At most 768 x 576 pixel / image
  - 8 bit/pixel in each layer (i.e., for Y, Cr, Cb)
14 pixel aspect ratios
  - horizontal : vertical = 1:1 or 16:9 or 4:3 or ...
8 frame rates
  - 23.976 Hz, 24 Hz, 25 Hz, 29.97 Hz, 30 Hz, 50 Hz, 59.94 Hz, 60 Hz
  - Lower rates not allowed!
No user defined MCU like JPEG
No progressive mode like JPEG
MPEG Video: Processing Step
I-frames (intra coded frames):
  - Coding independent of other frames
P-frames (predictive coded frames):
  - Coding depends on previous I- or P-frames
  - Based on motion vector
B-frames (bi-directional predictive coded frames):
  - Coding depends on previous and subsequent I- and P-frames
[Figure: frame sequence I B B P B B P ... over time t]

MPEG: Video - Processing Step
Motion vector:
  - MPEG does not define how to determine the motion vectors
  - I.e. specifies only the format to describe them but no algorithm to find them
  - Programmer is free to implement any algorithm

Display order: I1 B1 B2 P1 B3 B4 P2 I2
  - at P-frames: i.e. decode previous I-frame first
  - at B-frames: i.e. decode I- and P-frames first
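Because a B-frame needs both its preceding and its following reference frame, the frames are typically transmitted in a different order than they are displayed. The following sketch reorders a display sequence so that every I- or P-frame precedes the B-frames that depend on it; the function name is chosen for this illustration.

```python
def transmission_order(display_order):
    """Move every reference frame (I or P) ahead of the B-frames that precede it in display order."""
    out, pending_b = [], []
    for frame in display_order:
        if frame.startswith("B"):
            pending_b.append(frame)      # must wait for the next reference frame
        else:
            out.append(frame)
            out.extend(pending_b)
            pending_b = []
    return out + pending_b

display = ["I1", "B1", "B2", "P1", "B3", "B4", "P2", "I2"]
order = transmission_order(display)
print(order)     # ['I1', 'P1', 'B1', 'B2', 'P2', 'B3', 'B4', 'I2']
```

The decoder has to buffer the reference frames and restore the display order, which adds delay.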
[Figure: psychoacoustic masking. Left: sound pressure level (dB) with masker, masking patterns and absolute threshold of hearing. Right: masking over time (ms) before and after the masker]
[Figure: MPEG audio encoder; recoverable labels: psychoacoustical model (controls how many bits are reserved for which sub-band), frame packing]
Audio channel:
  - Between 32 and 448 kbit/s
  - In steps of 16 kbit/s
Definition of 3 layers of quality:
  - Layer 1: max. 448 Kbit/s (approx. 1:4 compression)
  - Layer 2: max. 384 Kbit/s (approx. 1:6-1:8, common, e.g. as MUSICAM in DAB)
  - Layer 3: max. 320 Kbit/s
MP3 files: compression up to 1:12 / 1:14 with no hearable losses

Audio channels:
  - Mono (single, 1 channel)
  - Stereo (2 channels)
  - dual channel mode (independent, e.g., bilingual)
  - optional: joint stereo (exploits redundancy and irrelevancy)
Application Example: DAB Digital Audio Broadcasting
  - uses MPEG layer 2 (compression also known as MUSICAM = Masking pattern adapted Universal Subband Integrated Coding And Multiplexing)
  - delays, for VLSI implementation:
    - max. 30 ms encoding
    - max. 10 ms decoding
  - SW codec delays vary for different layers, implementations, computers (rule of thumb may be 50/100/150 ms for layer 1/2/3, which makes MP3 rather inappropriate for real-time conversation)
MPEG Audio and Video Data Streams
Audio Data Stream Layers:
  2. Audio access units
  3. Slots
Video Data Stream Layers:
  1. Video sequence layer
  2. Group of pictures layer
  3. Single picture layer
  4. Slice layer
  5. Macroblock layer
  6. Block layer

Follow-Up MPEG Standards
MPEG-2:
  - Multiple layers and profiles with different degrees of compression and quality
MPEG-3:
  - Initially HDTV, but MPEG-2 scaled up to subsume MPEG-3
MPEG-4:
  - Initially, lower data rates for e.g. mobile communication
  - Then, coding and additional functionalities based on image contents
MPEG-7:
  - Content description
  - Basis for search and retrieval
MPEG-21 (upcoming):
  - Framework for multimedia business, delivery... what's missing?
  - maybe eCommerce focus --> e.g., security, watermarking?
From MPEG-1 to MPEG-2
Improvement in quality:
  - from VCR to TV to HDTV
No CD-ROM based constraints --> higher data rates:
  - MPEG-1: about 1.5 Mbit/s
  - MPEG-2: 2-100 Mbit/s
Evolution:
  - 1994: International Standard

Motivation
  - analog: continuous decrease in quality if errors occur
  - digital: need for tolerance whenever errors occur, i.e. scaling
Option: Spatial scaling
  - reduction of resolution
  - approach:
    - image sampled with half resolution, then MPEG algorithms applied, output processed with better FEC (base layer)
    - image decoded and subtracted from original; MPEG algorithms applied to the difference, output processed with worse FEC (enhanced layer)
Option: SNR scaling
  - approach:
    - Base layer: DCT output, more significant bits encoded with better FEC
    - Enhanced layer: DCT output, less significant bits encoded with worse FEC
MPEG-2 Video Profiles and Levels
Level (pixels/line x lines)   Simple      Main        SNR Scalable  Spatial Scalable  High
                              Profile     Profile     Profile       Profile           Profile
High (1920 x 1152)            -           80 Mbit/s   -             -                 100 Mbit/s
High-1440 (1440 x 1152)       -           60 Mbit/s   -             60 Mbit/s         80 Mbit/s
Main (720 x 576)              15 Mbit/s   15 Mbit/s   15 Mbit/s     -                 20 Mbit/s
Low (352 x 288)               -           4 Mbit/s    4 Mbit/s      -                 -
B-frames                      no          yes         yes           yes               yes
Chroma format                 4:2:0       4:2:0       4:2:0         4:2:0             4:2:0 or 4:2:2
Scalability                   none        none        SNR           SNR or spatial    SNR or spatial

MPEG-2 Audio
(two modest) extensions to MPEG-1 audio:
1) "low sample rate extension" LSE:
  - 1/2 of all MPEG-1 rates: 16, 22.05, 24 kHz
  - quantization down to 8 bits/sample
2) "multichannel extension": more channels, i.e. up to 5 full bandwidth channels (surround system)
  - left and right front
  - center (in front)
  - left and right back
  - "matrixing": rule for backward compatible conversion --> stereo (x, y = 0.71)
    Left for Stereo = Left_f + x * Center + y * Left_b
    Right for Stereo = Right_f + x * Center + y * Right_b
  - option: +1 "low freq. extension" (LFE) channel for subwoofer
  - "multilingual extension": 7 more, i.e. up to 12 channels (multiple languages, commentary)
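The matrixing rule for the backward compatible stereo signal is a plain weighted sum. A minimal sketch with the weights x = y = 0.71 from the slide; the sample values are made up:

```python
X = Y = 0.71     # weights quoted for center and back channels

def stereo_downmix(left_f, right_f, center, left_b, right_b):
    """Combine five surround samples into a compatible stereo pair."""
    left = left_f + X * center + Y * left_b
    right = right_f + X * center + Y * right_b
    return left, right

left, right = stereo_downmix(left_f=1.0, right_f=0.5, center=0.2, left_b=0.1, right_b=0.0)
print(round(left, 3), round(right, 3))   # 1.213 0.642
```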
Backward compatibility:
  - all MPEG-1 audio formats can be processed by MPEG-2
  - only 3 MPEG-2 audio codecs will not provide backward compatibility (in the range between 256 and 448 Kbit/s)

2. PES(es) combined to Program Stream or Transport Stream
Program stream:
  - Error-free environment
  - Packets of variable length
  - One single stream with one timing reference
Transport stream:
  - Designed for noisy (lossy) media channels
  - Multiplex of various programs with one or more time bases
  - Packets of 188 byte length
11. MPEG-4
MPEG-4 (ISO 14496) originally:
  - Targeted at systems with very scarce resources
  - To support applications like
    - Mobile communication
    - Videophone and E-mail
  - Max. data rates and dimensions (roughly):
    - Between 4800 and 64000 bits/s
    - 176 columns x 144 lines x 10 frames/s

MPEG-4: Timeline
  - 1997: Committee Draft
  - 1998: Final Committee Draft
  - 1998: Draft International Standard
  - 1999-2000: International Standard
"audio/visual objects" or AVOs
  - object coding independent of other objects, surroundings and background
  - natural and synthetic objects
[Figure: scene composed of video objects and two audio objects ("Rhubarb")]

Specification for decoder implementations:
  - Description language
  - binary syntax of an AV object's bitstream representation
  - scene description information
Interact with the audiovisual scene generated at the decoder's site
MPEG-4: Scope (cont.)
Very Low Bit-rate Video
  - 5 - 64 Kbit/s
  - image sequences with CIF resolution and up to 15 frames/s
Higher-quality video
  - 64 Kbit/s - 4 Mbit/s
  - quality like digital TV
Natural audio coding
  - 2 - 64 Kbit/s
Encoder
  - Must generate timing information
    - speed of the encoder clock = time base
    - desired decoding times and/or expiration times by using time stamps attached to the stream
  - Can specify the minimum buffer resources needed for decoding

MPEG-4: Video and Image Encoding
coding similar to MPEG-1/2
  - motion prediction
  - texture coding
Images and video of arbitrary shape
  - as done in conventional approach
    - 8x8 DCT or shape-adaptive DCT
  - plus coding of shape and transparency information
[Figure: scene composition from primitive AVOs and compound objects ("Rhubarb")]
Interaction with scenes
  - e.g. change viewing point, drag object, start/stop streams, select language
MPEG-4: Scaling
Spatial scalability
  - decoder displays textures and visual objects at a reduced spatial resolution by decoding only a subset of the total bit stream
  - 32 levels max. for textures and still images
  - 3 levels max. for video sequences
Temporal scalability
  - decoder displays video at a reduced temporal resolution by decoding only a subset of the total bit stream
  - 3 levels max.
Quality scalability
  - bitstream is parsed into a number of bit stream layers of different bit-rates
  - either during transmission or in the decoder
  - subset of the layers still yields a meaningful signal

MPEG-4: Synthetic Objects
Face animation:
  - start object: neutral-expression face
  - animated via FDPs and/or FAPs
    - FAP (facial animation parameter): animate current display
    - FDP (facial definition parameter): alternative shape/texture
Mesh + texture mapping: for 2D & 3D meshes
  - 2D mesh may also be used for human face animation, see above
  - only triangular 2D meshes, vertices may be moved (mv!), texture is warped
  - e.g. virtual background
Texture coding for view-dependent applications
  - texture, e.g. virtual background; decoder/encoder loop for "minimal" transmission
Text-to-speech:
  - speech generation from given text and prosodic parameters
  - face animation control
Score driven synthesis
  - music generation from a score or scene description commands
  - more general than MIDI
Special effects

[Figure: layer model; recoverable labels:
  - CoDec: Coding / Decoding
  - Access Units: e.g. video or audio frames
  - Adaptation Layer
  - A/V object data
  - Elementary Streams + stream type info, sync. info, QoS req., ...]
Controls:
  - interaction with
    - remote interactive peers
    - broadcast systems
    - storage systems
  - establishment of channels with specific QoS and bandwidths
FlexMux layer
TransMux layer

Error-prone environments
MPEG-4 concepts for error handling:
  - Resynchronization
    - enables receiver to tune in again
    - based on markers within bitstream
  - Data recovery
    - enables receiver to reconstruct lost data
    - encode data in an error-resilient manner
  - Error concealment
    - enables receiver to bridge gaps in data
    - e.g. by repeating parts of old frames
Motivation
JPEG / DCT problems:

Compressor
  Forward Wavelet Transformation --> Quantizer --> Encoder
Wavelets: Fundamental Idea
  - Image is transformed into the frequency domain (as in JPEG)
  - But: based on Wavelet functions instead of cosine functions
[Figure: a cosine wave next to an example wavelet]
  - Advantage: Wavelets yield zero value outside a limited interval
    - Wavelet is confined to a part of the image
    - Image need not be split into blocks

Wavelets: Transformation Steps
  - "Discrete Wavelet Transformation" (Mallat, 1989)
  - Split image recursively by using high and low pass filters (L: Low Pass, H: High Pass)
[Figure: the image is filtered line by line and then column by column with L and H; the L/L output is the transformed image c1 with reduced size and lower frequencies, which is processed further; the other three outputs d11, d12, d13 hold the higher frequencies]
Result of transformation step i:
Three images d_1^i, d_2^i, d_3^i:
  - containing the high frequency parts of the image
  - representing "details" of the image
  - submitted to Wavelet transformation or thrown away in case of scaling
One image c^i:
  - containing the lower frequency parts of the image
  - representing the original image with less details / at a lower resolution
  - submitted to step i+1
Afterwards:
  - Quantization
  - Entropy encoding
  - as with DCT

Advantages:
  - Inherent scaling
    - based on the d_x^i for i = 1, 2, 3, ...
  - Lower time complexity for the transformation
    - DCT: O(n * log n)
    - DWT: O(n) (n = number of values to be transformed)
  - Higher flexibility: Wavelet function can be freely chosen
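The recursive split into a low-pass part c and high-pass details d can be illustrated in one dimension with the Haar wavelet, the simplest choice of wavelet function. This is a sketch under stated assumptions: the low pass is the pairwise average, the high pass is half the pairwise difference, and the signal length is a power of two. A 2D image would be filtered line by line and then column by column in the same way.

```python
def haar_step(signal):
    """One transformation step: c = pairwise averages, d = pairwise half-differences."""
    pairs = list(zip(signal[0::2], signal[1::2]))
    c = [(a + b) / 2 for a, b in pairs]
    d = [(a - b) / 2 for a, b in pairs]
    return c, d

def haar_inverse(c, d):
    """Undo one step: each (average, half-difference) pair yields two samples."""
    out = []
    for low, high in zip(c, d):
        out += [low + high, low - high]
    return out

def haar_transform(signal):
    """Apply haar_step recursively to the low-pass part; details are returned coarsest first."""
    c, details = list(signal), []
    while len(c) > 1:
        c, d = haar_step(c)
        details.insert(0, d)
    return c, details

row = [8, 6, 7, 5, 2, 2, 4, 0]
c1, d1 = haar_step(row)
print(c1, d1)                      # [7.0, 6.0, 2.0, 2.0] [1.0, 1.0, 0.0, 2.0]
print(haar_transform(row))
```

Each step touches half as many values as the previous one (n/2 + n/4 + ...), which is where the O(n) complexity comes from. Dropping the finest detail list gives the same row at half resolution, i.e. the inherent scaling.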
Wavelets: Further Issues
  - Then apply wavelets to such a filtered image
Application to video:
[Figure: over time t, differences of consecutive images (I_(n-1) - I_(n-2), I_n - I_(n-1), ...) are computed and fed into the Wavelet compressor]

13. Fractal Image Compression
Mandelbrot:
  - recursive construction of images
  - infinite granularity
  - self-similarities in images
  - Z_i = RealConst. * Z_(i-1) + ComplexConst
Use of Fractals for Compression??? Overview (1)
  - idea: can natural images be described w/ fractal geometry?
  - first published by Barnsley & Sloan (88), first implementation 89 by Arnaud Jacquin
Key #1: Iterated Function Systems (IFS):
  - input (sub-)picture subject to math. transformation of type

    $$\begin{pmatrix} a & b \\ c & d \end{pmatrix} \begin{pmatrix} x \\ y \end{pmatrix} + \begin{pmatrix} e \\ f \end{pmatrix}$$

  - picture moved, rotated / mirrored, and contracted
    --> all transformations are "contractions"
Key #2: Banach's Fixed Point Theorem:
  - apply a set Wimg = {Wi} of contractions to an image
  - after infinitely many applications, a specific image appears
    ... called "attractor" or "fractal"
  - this process is independent of initial "start" image!
  - human perception: iteration can stop "pretty soon" (finite no. of iterations)

Use of Fractals for Compression??? Overview (2)
  - image is (almost) transformed into itself!
First algorithm published (Jacquin):
  - partition image into (small, non-overlapping) "range blocks"
  - search (larger, overlapping) "domain blocks" which can be "contracted" into range blocks
  - for each range block, find domain block and contraction (lots of possibilities!)
Sierpinski triangle:
  - recursive construction of images to produce self-similar structures
  - infinite steps applied to different source images lead to same result
  - example see right
  - limit ("Grenzwert") known as Sierpinski triangle, also known as attractor

Transformations:
  - rotation
  - scaling
  - brightness adaptation
IFS: Iterated Function System
  - ideally completely self-similar
PIFS: Partitioned Iterated Function System
  - real images are not completely self-similar
Wimg?
Banach's Fixed Point Theorem
Let $W: F \to F$ be a contractive mapping,
  i.e. there exists an $s$, $0 < s < 1$, with $|W(x) - W(y)| \le s\,|x - y|$ for all $x, y \in F$
Then $W$ has exactly one fixed point $x_f$,
  i.e. $W(x_f) = x_f$
$x_f$ can be computed as $x_f = \lim_{n \to \infty} W^n(x)$ with any $x \in F$

Stop when error falls below some bound
Error can be calculated by "Collage Theorem"
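The statement of the fixed point theorem can be tried out on a one-dimensional toy mapping. The contraction W(x) = 0.5 * x + 1 (with s = 0.5 and fixed point 2) is a hypothetical example chosen here; in fractal decompression, x would be a whole image and W the set of block transformations.

```python
def fixed_point(w, x0, bound=1e-9, max_steps=1000):
    """Iterate x <- w(x) until two successive values differ by less than bound."""
    x = x0
    for step in range(1, max_steps + 1):
        nxt = w(x)
        if abs(nxt - x) < bound:
            return nxt, step
        x = nxt
    raise RuntimeError("no convergence within max_steps")

def w(x):
    return 0.5 * x + 1       # contraction with s = 0.5

from_zero, steps_zero = fixed_point(w, 0.0)
from_far, steps_far = fixed_point(w, 1000.0)
print(from_zero, steps_zero)
print(from_far, steps_far)
```

Both start values end up at the same fixed point, which mirrors the observation that the decoded image does not depend on the initial "start" image.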
How to Find Wimg?
Systematic search based on "Partitioned Iterated Function System (PIFS)"
  - Partition image into "range blocks" Ri
    - 8*8 pixel blocks
    - non-overlapping
  - Consider all "domain blocks" Dj of double size
    - 16*16 pixel blocks
    - overlapping
  - Find for each Ri the most similar Dj
    - consider rotations (0°/90°/180°/270°) and mirroring
    - adapt brightness and contrast of Dj to that of Ri
    - translation, rotation, mirroring, brightness adaptation define a (partial) affine function
  - Combine partial functions to Wimg

Compression rate?
Example: for each (8*8) range block:
  - contraction factor fixed
  - 3 bit for transformation
  - 16 bit for domain block coordinates
  - 12 bit for brightness/contrast adaptation
--> factor is 8x8x8 : 31 = 512 : 31 (cf. JPEG example)
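The bit budget per range block can be re-derived in a few lines. The sketch assumes 8 bit per pixel for the uncompressed range block and reads the 3 transformation bits as the 4 rotations times 2 mirroring states; both readings are assumptions that fit the numbers on the slide.

```python
from math import ceil, log2

rotations, mirror_states = 4, 2
transformation_bits = ceil(log2(rotations * mirror_states))   # 8 combinations
coordinate_bits = 16                                           # domain block position
adaptation_bits = 12                                           # brightness/contrast

code_bits = transformation_bits + coordinate_bits + adaptation_bits
raw_bits = 8 * 8 * 8                                           # 8x8 pixels, 8 bit each (assumed)
ratio = raw_bits / code_bits
print(code_bits, raw_bits, round(ratio, 1))                    # 31 512 16.5
```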
Here: better than JPEG ("cross-over point" at about 1:10 to 1:30)
+ Scalability
  - decompression steps yield iteratively improving image

JPEG:
H.261 / H.263:
  - Established standard by telecom world
  - Preferable hardware realization