Lempel Ziv


Lempel-Ziv coding

The Lempel-Ziv (LZ) coding algorithm uses strings of characters,
rather than single characters, as the basis of coding.
For the compression of text, a table containing all the possible
character strings (words) that occur in the text to be transferred
is held by both the encoder and the decoder.
As each word occurs in the text, instead of sending the word as
a set of individual (say, ASCII) codewords, the encoder sends
only the index of where the word is stored in the table.
On receipt of each index, the decoder uses it to access the
corresponding word from the table and reconstruct the text.

Thus the table is used as a dictionary.


The LZ algorithm is known as a dictionary-based
compression algorithm.
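
A minimal sketch of this static-dictionary scheme, assuming a tiny hypothetical word table (WORDS) shared by both sides and text made only of space-separated words from that table. The names encode_words and decode_indices are illustrative and not part of any library.

```python
# Hypothetical shared word table; both encoder and decoder hold the same copy.
WORDS = ["compression", "dictionary", "index", "multimedia", "table", "text"]
INDEX_OF = {word: i for i, word in enumerate(WORDS)}


def encode_words(text):
    """Replace each word by its index in the shared table."""
    return [INDEX_OF[word] for word in text.split()]


def decode_indices(indices):
    """Look each index up in the same table to rebuild the text."""
    return " ".join(WORDS[i] for i in indices)


codes = encode_words("multimedia text compression")
print(codes)                  # [3, 5, 0]
print(decode_indices(codes))  # multimedia text compression
```

A word that is missing from the table raises KeyError in this sketch; a practical scheme would need some fallback for such words.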
Most word processing packages have an associated
dictionary that is used for
both spell checking and compression.
Typically such a dictionary contains in the region of 25000
words, so 15 bits are required to encode
the index (2^15 = 32768).
For example, to send the word "multimedia", just 15 bits are
required, rather than the 70 bits needed for ten 7-bit ASCII codewords.
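
The arithmetic behind the 15-bit figure can be checked with a short sketch. The helper names index_bits and ascii_bits are hypothetical, and the 7 bits per character is an assumption about the ASCII codewords being compared against.

```python
def index_bits(dictionary_size):
    """Smallest number of bits that can address every entry in the table."""
    return (dictionary_size - 1).bit_length()


def ascii_bits(word, bits_per_char=7):
    """Bits needed to send the word one character codeword at a time."""
    return len(word) * bits_per_char


print(index_bits(25000))         # 15
print(ascii_bits("multimedia"))  # 70
```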

This is efficient for the transmission of text
created by standard word processing
packages.
However, it becomes inefficient if the text to be
transmitted uses only a small subset of the
words in the dictionary.
Hence a variation of the LZ algorithm was developed
that allows the dictionary to be built up
dynamically by the encoder and decoder as
the compressed text is being transferred.

Lempel-Ziv-Welch Coding
LZW starts out with a dictionary of 256 characters (in
the case of 8-bit characters) and uses those as the "standard"
character set.
It then reads data 8 bits at a time (e.g., 't', 'r', etc.) and
encodes the data as the number that represents its
index in the dictionary.
Every time it comes across a new substring (say, "tr"), it
adds it to the dictionary; every time it comes across a
substring it has already seen, it just reads in a new
character and concatenates it with the current string to
get a new substring.
The next time LZW revisits a substring, it will be
encoded using a single number.
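
The encoding loop described above might be sketched as follows. This is a simplified illustration, not a production codec: the function name lzw_encode is hypothetical, the dictionary is allowed to grow without limit, and the output is a list of integers rather than a packed bit stream.

```python
def lzw_encode(data):
    """Encode a bytes object as a list of LZW dictionary indices."""
    # Initial dictionary: one entry for each of the 256 possible byte values.
    dictionary = {bytes([i]): i for i in range(256)}
    next_code = 256
    current = b""
    codes = []
    for value in data:
        candidate = current + bytes([value])
        if candidate in dictionary:
            # Seen before: keep extending the current string.
            current = candidate
        else:
            # New substring: emit the index of the known prefix,
            # add the new substring, and restart from the last byte read.
            codes.append(dictionary[current])
            dictionary[candidate] = next_code
            next_code += 1
            current = bytes([value])
    if current:
        # Flush whatever is left once the input ends.
        codes.append(dictionary[current])
    return codes


print(lzw_encode(b"trtr"))  # [116, 114, 256]
```

Here 116 and 114 are the byte values of 't' and 'r', and 256 is the first new entry, "tr", which is reused on its second occurrence.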

Now, let's suppose the input stream we wish
to compress is "banana_bandana", and that
we are only using the following initial dictionary:
Index Entry
0 a
1 b
2 d
3 n
4 _ (space)

Input            Current String   Seen this Before?   Encoded Output         New Dictionary Entry/Index
b                b                yes                 nothing                none
ba               ba               no                  1                      ba / 5
ban              an               no                  1,0                    an / 6
bana             na               no                  1,0,3                  na / 7
banan            an               yes                 no change              none
banana           ana              no                  1,0,3,6                ana / 8
banana_          a_               no                  1,0,3,6,0              a_ / 9
banana_b         _b               no                  1,0,3,6,0,4            _b / 10
banana_ba        ba               yes                 no change              none
banana_ban       ban              no                  1,0,3,6,0,4,5          ban / 11
banana_band      nd               no                  1,0,3,6,0,4,5,3        nd / 12
banana_banda     da               no                  1,0,3,6,0,4,5,3,2      da / 13
banana_bandan    an               yes                 no change              none
banana_bandana   ana              yes                 1,0,3,6,0,4,5,3,2,8    none
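
The "banana_bandana" walk-through can be reproduced with a small tracing encoder. This is a sketch under stated assumptions: lzw_trace is a hypothetical helper, the initial dictionary is passed in as the string "abdn_" so that indices 0 to 4 match the table, and the string still held when the input ends is flushed as one final code.

```python
def lzw_trace(text, alphabet):
    """Return the emitted codes and the (entry, index) pairs added on the way."""
    dictionary = {ch: i for i, ch in enumerate(alphabet)}
    codes = []
    added = []
    current = ""
    for ch in text:
        candidate = current + ch
        if candidate in dictionary:
            current = candidate  # seen before: no output, no new entry
        else:
            codes.append(dictionary[current])
            dictionary[candidate] = len(dictionary)  # next free index
            added.append((candidate, dictionary[candidate]))
            current = ch
    if current:
        codes.append(dictionary[current])  # flush at end of input
    return codes, added


codes, added = lzw_trace("banana_bandana", "abdn_")
print(codes)  # [1, 0, 3, 6, 0, 4, 5, 3, 2, 8]
print(added)  # [('ba', 5), ('an', 6), ('na', 7), ('ana', 8), ('a_', 9),
              #  ('_b', 10), ('ban', 11), ('nd', 12), ('da', 13)]
```

The last code, 8, comes from the final flush: "ana" is already in the dictionary when the input runs out, so it is sent as a single index.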

Now, let's suppose the input stream
we wish to compress is "abababab",
and that we are only using the initial
dictionary:
Index Entry
0 a
1 b

The encoding process begins:

Input      Current String   Seen this Before?   Encoded Output   New Dictionary Entry/Index
a          a                yes                 nothing          none
ab         ab               no                  0                ab / 2
aba        ba               no                  0,1              ba / 3
abab       ab               yes                 no change        none
ababa      aba              no                  0,1,2            aba / 4
ababab     ab               yes                 no change        none
abababa    aba              yes                 no change        none
abababab   abab             no                  0,1,2,4          abab / 5

Uncompression
The uncompression process for LZW is also
straightforward. In addition, it has an advantage
over static compression methods because no
dictionary or other overhead information is
necessary for the decoding algorithm.
A dictionary identical to the one created during
compression is reconstructed during the process.
Both the encoding and decoding programs
must start with the same initial dictionary
(in the standard case, all 256 possible 8-bit characters).
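
A minimal sketch of how the decoder can rebuild the dictionary from the codes alone, assuming the same small initial dictionary as the "banana_bandana" example. The name lzw_decode_basic is hypothetical, and this version deliberately rejects any code that is not yet in its dictionary.

```python
def lzw_decode_basic(codes, alphabet):
    """Decode LZW codes, rebuilding the dictionary while reading them."""
    dictionary = {i: ch for i, ch in enumerate(alphabet)}
    if not codes:
        return ""
    previous = dictionary[codes[0]]
    output = [previous]
    for code in codes[1:]:
        if code not in dictionary:
            raise ValueError(f"code {code} is not in the dictionary yet")
        entry = dictionary[code]
        # The encoder added (previous string + next character) at this point,
        # and that next character is the first character of the current entry.
        dictionary[len(dictionary)] = previous + entry[0]
        output.append(entry)
        previous = entry
    return "".join(output)


print(lzw_decode_basic([1, 0, 3, 6, 0, 4, 5, 3, 2, 8], "abdn_"))
# banana_bandana
```

No dictionary is transmitted: entries 5 to 13 are recreated in the same order in which the encoder created them.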

There is one exception where the
algorithm fails: when the
code calls for an index which has not
yet been entered into the dictionary.

Encoded Input   Dictionary Translation   Decoded Output   Current String   New Dictionary Entry/Index
0               0 = a                    a                none             none
0,1             1 = b                    ab               a                ab / 2
0,1,2           2 = ab                   abab             b                ba / 3
0,1,2,4         4 = ???                  abab???          ab               ???

As you can see, the decoder comes across an index of 4 before that entry has been added to its dictionary.
To understand why this happens, take a look at the encoding
table. Immediately after "aba" (with an index of 4) is entered
into the dictionary, the next substring that is encoded is
"aba" itself.
Thus, the only situation in which this special case can occur is when
the substring begins and ends with the same character ("aba" is
of the form <char><string><char>).
So, to deal with this exception, you simply take the substring
you have so far, "ab", and append its own first character to
it, "ab" + "a" = "aba", instead of following the normal
procedure.
The basic decoding procedure must therefore be altered a bit
in order to handle all cases.
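
One way to fold this fix into a decoder is sketched below. The name lzw_decode is hypothetical, and the initial dictionary is passed in as a string so that "ab" gives a = 0 and b = 1, as in the example. When the incoming code equals the next index the decoder is about to assign, the entry is built as the previous string plus its own first character.

```python
def lzw_decode(codes, alphabet):
    """Decode LZW codes, including a code that is not in the dictionary yet."""
    dictionary = {i: ch for i, ch in enumerate(alphabet)}
    if not codes:
        return ""
    previous = dictionary[codes[0]]
    output = [previous]
    for code in codes[1:]:
        next_code = len(dictionary)
        if code in dictionary:
            entry = dictionary[code]
        elif code == next_code:
            # Special case: <char><string><char>, e.g. "ab" + "a" = "aba".
            entry = previous + previous[0]
        else:
            raise ValueError(f"invalid code {code}")
        dictionary[next_code] = previous + entry[0]
        output.append(entry)
        previous = entry
    return "".join(output)


print(lzw_decode([0, 1, 2, 4], "ab"))     # abababa
print(lzw_decode([0, 1, 2, 4, 1], "ab"))  # abababab
```

The codes 0,1,2,4 from the encoding table cover only the first seven characters. The trailing 1 in the second call is an addition here: it stands for the final "b" that the encoder is still holding when the input ends, and which it would have to flush as one more code.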
