Lempel Ziv

Lempel-Ziv coding
The Lempel-Ziv coding algorithm uses strings of characters,
rather than single characters, as the basis of coding.
For the compression of text, a table containing all the
character strings (words) that can occur in the text to be
transferred is held by both the encoder and the decoder.
As each word occurs in the text, instead of sending the word
as a set of individual (say, ASCII) codewords, the encoder
sends only the index of where the word is stored in the table.
On receipt of each index, the decoder uses it to access the
corresponding word from the table and reconstruct the text.
Thus the table is used as a dictionary.

The LZ algorithm is therefore known as a dictionary-based
compression algorithm.
Most word processing packages have a dictionary associated
with them, which is used for both spell checking and
compression.
Typically such a dictionary contains in the region of 25,000
words, and hence 15 bits are required to encode the index.
For example, to send the word "multimedia", just 15 bits are
required.
This is efficient for the transmission of text created by
standard word processing packages.
However, it becomes inefficient if the text to be transmitted
uses only a small subset of the words in the dictionary,
because every index still costs the full 15 bits.
Hence a variation of the LZ algorithm was developed which
allows the dictionary to be built up dynamically by the
encoder and decoder as the compressed text is being
transferred.
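The 15-bit figure follows from the dictionary size: the index must distinguish about 25,000 entries, and 2^15 = 32,768 is the smallest power of two that covers them. A minimal Python sketch of that arithmetic, assuming a fixed-length index and, for comparison, 7-bit ASCII codewords (the helper name `index_bits` is illustrative and not part of the original material):

```python
def index_bits(dictionary_size: int) -> int:
    """Bits needed for a fixed-length index into a table of the given size."""
    if dictionary_size < 1:
        raise ValueError("dictionary must contain at least one entry")
    # (n - 1).bit_length() is the smallest k with 2**k >= n; send at least 1 bit.
    return max(1, (dictionary_size - 1).bit_length())


word = "multimedia"
bits_with_dictionary = index_bits(25_000)  # one index per word
bits_with_ascii = 7 * len(word)            # assumption: 7-bit ASCII codewords

print(bits_with_dictionary, bits_with_ascii)  # prints: 15 70
```

Under these assumptions the single index replaces ten character codewords, which is where the saving for ordinary text comes from.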
Lempel-Ziv-Welch Coding
LZW starts out with a dictionary of 256 entries (in the case
of 8-bit characters), one for each possible character value,
and uses those as the "standard" character set.
It then reads data 8 bits at a time (e.g., 't', 'r', etc.) and
encodes the data as the number that represents its index in
the dictionary.
Every time it comes across a substring it has already seen, it
just reads in a new character and concatenates it with the
current string to get a new substring.
Every time it comes across a new substring (say, "tr"), it
outputs the index of the longest prefix already in the
dictionary ("t"), adds the new substring to the dictionary,
and starts again from the last character read ("r").
The next time LZW meets that substring, it will be encoded
using a single number.
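The procedure just described can be sketched in a few lines of Python. This is a minimal illustration, not a production encoder: it assumes byte input, returns the indices as a plain list of integers (no packing into fixed-width codes), and lets the dictionary grow without limit. The function name `lzw_encode` is chosen here for illustration.

```python
def lzw_encode(data: bytes) -> list[int]:
    """Encode bytes as a list of LZW dictionary indices (no bit packing)."""
    # Initial dictionary: every single byte value maps to its own index 0..255.
    dictionary = {bytes([i]): i for i in range(256)}
    next_index = 256
    current = b""
    output = []
    for value in data:
        candidate = current + bytes([value])
        if candidate in dictionary:
            # Seen before: keep extending the current string.
            current = candidate
        else:
            # New substring: emit the index of the longest known prefix,
            # register the new substring, and restart from the last byte.
            output.append(dictionary[current])
            dictionary[candidate] = next_index
            next_index += 1
            current = bytes([value])
    if current:
        # Flush whatever is still pending at the end of the input.
        output.append(dictionary[current])
    return output


print(lzw_encode(b"trtrtr"))  # prints: [116, 114, 256, 256]
```

In the printed result, 116 and 114 are the byte values of 't' and 'r', and 256 is the index given to the new entry "tr", which is then reused for the rest of the input.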
Now, let's suppose the input stream we wish to compress is
"banana_bandana", and that we start with only the initial
dictionary:
Index Entry
0 a
1 b
2 d
3 n
4 _ (space)

Input            Current   Seen this   Encoded                New Dictionary
                 String    Before?     Output                 Entry / Index
b                b         yes         nothing                none
ba               ba        no          1                      ba / 5
ban              an        no          1,0                    an / 6
bana             na        no          1,0,3                  na / 7
banan            an        yes         no change              none
banana           ana       no          1,0,3,6                ana / 8
banana_          a_        no          1,0,3,6,0              a_ / 9
banana_b         _b        no          1,0,3,6,0,4            _b / 10
banana_ba        ba        yes         no change              none
banana_ban       ban       no          1,0,3,6,0,4,5          ban / 11
banana_band      nd        no          1,0,3,6,0,4,5,3        nd / 12
banana_banda     da        no          1,0,3,6,0,4,5,3,2      da / 13
banana_bandan    an        yes         no change              none
banana_bandana   ana       yes         1,0,3,6,0,4,5,3,2,8    none

At the end of the input, the index of the pending string "ana" (8)
is output.
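The trace above can be reproduced mechanically. The following Python sketch assumes the five-entry initial dictionary of the example (a, b, d, n, _) and that every input character is in it; the function name `lzw_trace` and the row layout are illustrative choices. It returns the emitted indices and one row per input character, in the same column order as the table.

```python
def lzw_trace(text, initial_entries):
    """Return (codes, rows); each row mirrors one line of the trace table."""
    dictionary = {entry: index for index, entry in enumerate(initial_entries)}
    codes, rows, current = [], [], ""
    for position, char in enumerate(text, start=1):
        candidate = current + char
        seen = candidate in dictionary
        new_entry = None
        if seen:
            current = candidate  # keep extending the current string
        else:
            codes.append(dictionary[current])  # emit the known prefix
            index = len(dictionary)            # next free index
            dictionary[candidate] = index
            new_entry = (candidate, index)
            current = char
        # (input so far, current string, seen before?, output so far, new entry)
        rows.append((text[:position], candidate, seen, list(codes), new_entry))
    if current:
        codes.append(dictionary[current])  # flush the pending string
    return codes, rows


codes, rows = lzw_trace("banana_bandana", ["a", "b", "d", "n", "_"])
print(codes)  # prints: [1, 0, 3, 6, 0, 4, 5, 3, 2, 8]
```

The final 8 is the index of "ana", flushed when the input ends.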
Now, let's suppose the input stream we wish to compress is
"abababab", and that we start with only the initial
dictionary:
Index Entry
0 a
1 b
The encoding process begins:
Input      Current   Seen this   Encoded     New Dictionary
           String    Before?     Output      Entry / Index
a          a         yes         nothing     none
ab         ab        no          0           ab / 2
aba        ba        no          0,1         ba / 3
abab       ab        yes         no change   none
ababa      aba       no          0,1,2       aba / 4
ababab     ab        yes         no change   none
abababa    aba       yes         no change   none
abababab   abab      no          0,1,2,4     abab / 5
Uncompression
The uncompression process for LZW is also straightforward.
In addition, it has an advantage over static compression
methods: no dictionary or other overhead information has to
be sent to the decoder.
A dictionary identical to the one created during compression
is reconstructed during the decoding process.
Both the encoding and decoding programs must start with the
same initial dictionary (all 256 8-bit character values in the
standard case).
There is an exception where the algorithm as described fails,
and that is when the code calls for an index which has not
yet been entered in the decoder's dictionary.

Encoded    Translation   Decoded    Current   New Dictionary
Input                    Output     String    Entry / Index
0          0=a           a          none      none
0,1        1=b           ab         a         ab / 2
0,1,2      2=ab          abab       b         ba / 3
0,1,2,4    4=???         abab???    ab        ???
As you can see, the decoder comes across an index of 4 before
it has an entry 4 in its own dictionary.
To understand why this happens, take a look at the encoding
table. Immediately after "aba" (with an index of 4) is entered
into the dictionary, the next substring that is encoded is an
"aba".
Thus, this special case can occur only if the substring begins
and ends with the same character ("aba" is of the form
<char><string><char>).
So, to deal with this exception, you simply take the substring
you have so far, "ab", and append its own first character to
it, "ab"+"a" = "aba", instead of following the normal
procedure.
Therefore the decoding procedure described above must be
altered a bit in order to handle all cases.
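A decoder that includes this adjustment might look like the following Python sketch. It assumes the indices arrive as a list of integers and that the decoder is given the same initial dictionary as the encoder; the name `lzw_decode` and the error handling are illustrative, not taken from the original material.

```python
def lzw_decode(codes, initial_entries):
    """Decode LZW indices, rebuilding the dictionary as the encoder did."""
    dictionary = dict(enumerate(initial_entries))
    if not codes:
        return ""
    previous = dictionary[codes[0]]
    pieces = [previous]
    for code in codes[1:]:
        if code in dictionary:
            entry = dictionary[code]
        elif code == len(dictionary):
            # Special case: the index refers to the entry that is about to be
            # created, so it must be the previous string plus its first character.
            entry = previous + previous[0]
        else:
            raise ValueError(f"invalid LZW code: {code}")
        # Mirror the encoder: previous string + first character of this one.
        dictionary[len(dictionary)] = previous + entry[0]
        pieces.append(entry)
        previous = entry
    return "".join(pieces)


print(lzw_decode([0, 1, 2, 4], ["a", "b"]))  # prints: abababa
```

With the four indices from the example, index 4 is resolved as "ab"+"a" = "aba" and the output is "abababa". The last "b" of "abababab" is only recovered once the encoder flushes its pending string, which adds a final index 1 to the stream.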
