
1204

IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 46, NO. 4, JULY 2000

Network Information Flow


Rudolf Ahlswede, Ning Cai, Shuo-Yen Robert Li, Senior Member, IEEE, and
Raymond W. Yeung, Senior Member, IEEE

Abstract: We introduce a new class of problems called network information flow which is inspired by computer network applications. Consider a point-to-point communication network on which a number of information sources are to be multicast to certain sets of destinations. We assume that the information sources are mutually independent. The problem is to characterize the admissible coding rate region. This model subsumes all previously studied models along the same line. In this paper, we study the problem with one information source, and we have obtained a simple characterization of the admissible coding rate region. Our result can be regarded as the Max-flow Min-cut Theorem for network information flow. Contrary to one's intuition, our work reveals that it is in general not optimal to regard the information to be multicast as a "fluid" which can simply be routed or replicated. Rather, by employing coding at the nodes, which we refer to as network coding, bandwidth can in general be saved. This finding may have significant impact on future design of switching systems.

Index Terms: Diversity coding, multicast, network coding, switching, multiterminal source coding.

I. INTRODUCTION

Let $V$ be the set of nodes of a point-to-point communication network. Such a network is represented by a directed graph $G = (V, E)$, where $E$ is the set of edges, such that information can be sent noiselessly from node $i$ to node $j$ for all $(i, j) \in E$. An example of this type of networks is the Internet backbone, where with proper data link protocols information can be sent between nodes essentially free of noise.

Let $X(1), X(2), \cdots, X(m)$ be mutually independent information sources. The information rate (in bits per unit time) of $X(l)$ is denoted by $\tau_l$, and let $\boldsymbol{\tau} = (\tau_1, \cdots, \tau_m)$. Let $a \colon \{1, \cdots, m\} \to V$ and $b \colon \{1, \cdots, m\} \to 2^V$ be arbitrary mappings. The source $X(l)$ is generated at node $a(l)$, and it is multicast to node $i$ for all $i \in b(l)$. The mappings $a$ and $b$ and the vector $\boldsymbol{\tau}$ specify a set of multicast requirements.
In this model, the graph $G$ may represent a physical network, while the set of multicast requirements may represent the aggregated traffic pattern the network needs to support. In other situations, the graph $G$ may represent a subnetwork in a physical network, while the set of multicast requirements may pertain to a specific application on this subnetwork, e.g., a video-conference call.

Manuscript received February 25, 1998; revised March 6, 2000. This work was supported in part under Grants CUHK95E/480 and CUHK332/96E from the Research Grant Council of the Hong Kong Special Administrative Region, China. The material in this paper was presented in part at the IEEE International Symposium on Information Theory, MIT, Cambridge, MA, August 16-21, 1998.
R. Ahlswede is with Fakultät für Mathematik, Universität Bielefeld, 33501 Bielefeld, Germany (e-mail: [email protected]).
N. Cai was with Fakultät für Mathematik, Universität Bielefeld, 33501 Bielefeld, Germany. He is now with the Department of Information Engineering, The Chinese University of Hong Kong, N.T., Hong Kong (e-mail: [email protected]).
S.-Y. R. Li and R. W. Yeung are with the Department of Information Engineering, The Chinese University of Hong Kong, N.T., Hong Kong (e-mail: [email protected]; [email protected]).
Communicated by R. L. Cruz, Associate Editor for Communication Networks.
Publisher Item Identifier S 0018-9448(00)05297-4.
In existing computer networks, each node functions as a
switch in the sense that it either relays information from an
input link to an output link, or it replicates information received
from an input link and sends it to a certain set of output links.
From the information-theoretic point of view, there is no
reason to restrict the function of a node to that of a switch.
Rather, a node can function as an encoder in the sense that
it receives information from all the input links, encodes, and
sends information to all the output links. From this point of
view, a switch is a special case of an encoder. In the sequel, we
will refer to coding at a node in a network as network coding.
Let $R_{ij}$ be a nonnegative real number associated with the edge $(i, j) \in E$, and let $\boldsymbol{R} = (R_{ij} \colon (i, j) \in E)$. For a fixed set of multicast requirements, a vector $\boldsymbol{R}$ is admissible if and only if there exists a coding scheme satisfying the set of multicast requirements such that the coding rate from node $i$ to node $j$ (i.e., the average number of bits sent from node $i$ to node $j$ per unit time) is less than or equal to $R_{ij}$ for all $(i, j) \in E$. (At this point we leave the details of a coding scheme open because it is extremely difficult to define the most general form of a coding scheme. A class of coding schemes called $\alpha$-codes will be studied in Section III.) In graph theory, $R_{ij}$ is called the capacity of the edge $(i, j)$. Our goal is to characterize the admissible coding rate region $\mathcal{R}$, i.e., the set of all admissible $\boldsymbol{R}$, for any graph $G$ and multicast requirements $a$, $b$, and $\boldsymbol{\tau}$.
The model we have described includes both multilevel diversity coding (without distortion) [12], [8], [13] and distributed source coding [14] as special cases. As an illustration, let us show how the multilevel diversity coding system in Fig. 1 can be formulated as a special case of our model. In this system, there are two sources, $X(1)$ and $X(2)$. Decoder 1 reconstructs $X(1)$ only, while all other decoders reconstruct both $X(1)$ and $X(2)$. Let $R_l$ be the coding rate of Encoder $l$, $l = 1, 2, 3$. In our model, the system is represented by the graph $G$ in Fig. 2. In this graph, node 1 represents the source, nodes 2, 3, and 4 represent the inputs of Encoders 1, 2, and 3, respectively, nodes 5, 6, and 7 represent the outputs of Encoders 1, 2, and 3, respectively, while nodes 8, 9, 10, and 11 represent the inputs of Decoders 1, 2, 3, and 4, respectively. The mappings $a$ and $b$ are specified as
$$a(1) = a(2) = 1$$
and
$$b(1) = \{8, 9, 10, 11\}, \qquad b(2) = \{9, 10, 11\},$$

0018-9448/00$10.00 © 2000 IEEE


Fig. 3. A single-level diversity coding system.


Fig. 1. A multilevel diversity coding system.

Fig. 2. The graph $G$ representing the coding system in Fig. 1.

and the vector $\boldsymbol{\tau} = (\tau_1, \tau_2)$ represents the information rates of $X(1)$ and $X(2)$. Now all the edges in $E$ except for $(2, 5)$, $(3, 6)$, and $(4, 7)$ correspond to straight connections in Fig. 1, so there is no constraint on the coding rate in these edges. Therefore, in order to determine $\mathcal{R}$, the set of all admissible $\boldsymbol{R}$ for the graph $G$ (with the set of multicast requirements specified by $a$, $b$, and $\boldsymbol{\tau}$), we set $R_{ij}$ to $\infty$ for all edges in $E$ except for $(2, 5)$, $(3, 6)$, and $(4, 7)$, and so obtain the admissible coding rate region of the problem in Fig. 1.
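To make the formulation concrete, the graph of Fig. 2 can be sketched in code. This is our own illustration: the node numbering follows the description above, but the wiring from encoder outputs to decoder inputs is a hypothetical assumption, since Fig. 1 itself is not reproduced here.

```python
# Model the diversity coding system as a capacitated graph: only the three
# encoder edges (2,5), (3,6), (4,7) carry finite capacities R1, R2, R3;
# every "straight connection" gets infinite capacity.
INF = float("inf")

def diversity_graph(R1, R2, R3):
    cap = {}
    for enc in (1, 2, 3):
        cap[(1, 1 + enc)] = INF                            # source -> encoder input
        cap[(1 + enc, 4 + enc)] = [R1, R2, R3][enc - 1]    # the encoder edge itself
    # hypothetical decoder wiring (our assumption): decoder node d sees a
    # subset of the encoder-output nodes 5, 6, 7
    wiring = {8: [5], 9: [5, 6], 10: [5, 7], 11: [6, 7]}
    for d, outs in wiring.items():
        for o in outs:
            cap[(o, d)] = INF                              # output -> decoder input
    return cap

cap = diversity_graph(2.0, 1.5, 1.0)
finite = {e for e, c in cap.items() if c < INF}
assert finite == {(2, 5), (3, 6), (4, 7)}   # only encoder edges constrain rates
```

Only the three encoder edges end up rate-constrained, exactly as in the reduction described above.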
A major finding in this paper is that, contrary to one's intuition, it is in general not optimal to consider the information to be multicast in a network as a fluid which can simply be routed or replicated at the intermediate nodes. Rather, network coding has to be employed to achieve optimality. This fact is illustrated by examples in the next section.
In the rest of the paper, we focus our discussion on problems with $m = 1$, which we collectively refer to as the single-source problem. For problems with $m \ge 2$, we refer to them collectively as the multisource problem. The rest of the paper is organized as follows. In Section II, we propose a Max-flow Min-cut theorem which characterizes the admissible coding rate region of the single-source problem. In Section III, we formally state the main result in this paper. The proof is presented in Sections IV and V. In Section VI, we show that very simple optimal codes do exist for certain networks. In Section VII, we use our results for the single-source problem to solve a special case of the multisource problem which has application in video conferencing. In this section, we also show that the multisource problem is extremely difficult in general. Concluding remarks are in Section VIII.

Fig. 4. The graph representing the coding system in Fig. 3.

II. A MAX-FLOW MIN-CUT THEOREM

In this section, we propose a theorem which characterizes the admissible coding rate region for the single-source problem. For this problem, we let $s = a(1)$ and $b(1) = \{t_1, \cdots, t_L\}$. In other words, the information source $X(1)$ is generated at node $s$ and is multicast to nodes $t_1, \cdots, t_L$. We will call $s$ the source and $t_1, \cdots, t_L$ the sinks of the graph $G$. For a specific $L$, the problem will be referred to as the one-source $L$-sink problem.

Let us first define some notations and terminology which will be used in the rest of the paper. Let $G = (V, E)$ be a graph with source $s$ and sinks $t_1, \cdots, t_L$. The capacity of an edge $(i, j) \in E$ is given by $R_{ij}$, and let $\boldsymbol{R} = (R_{ij} \colon (i, j) \in E)$. The subgraph of $G$ from $s$ to $t_l$, denoted by $G_l = (V_l, E_l)$, refers to the graph obtained from $G$ by keeping exactly those nodes and edges which are on a directed path from $s$ to $t_l$. A vector $F = (F_{ij} \colon (i, j) \in E)$ is a flow in $G$ from $s$ to $t_l$ if for all $(i, j) \in E$
$$0 \le F_{ij} \le R_{ij}$$
such that for all $i \in V$ except for $s$ and $t_l$
$$\sum_{j \colon (j, i) \in E} F_{ji} = \sum_{j \colon (i, j) \in E} F_{ij}$$
i.e., the total flow into node $i$ is equal to the total flow out of node $i$. $F_{ij}$ is referred to as the value of $F$ in the edge $(i, j)$. The value of $F$ is defined as
$$\sum_{j \colon (s, j) \in E} F_{sj}$$
which is equal to
$$\sum_{j \colon (j, t_l) \in E} F_{j t_l}.$$
$F$ is a max-flow from $s$ to $t_l$ in $G$ if $F$ is a flow from $s$ to $t_l$ whose value is greater than or equal to the value of any other flow from $s$ to $t_l$.
Fig. 5. A one-source one-sink network.

Evidently, a max-flow from $s$ to $t_l$ in $G_l$ is also a max-flow from $s$ to $t_l$ in $G$. For a graph with one source and one sink (for example, the graph $G_l$), the value of a max-flow from the source to the sink is called the capacity of the graph.
We begin our discussion by first reviewing a basic result of diversity coding by considering the single-level diversity system in Fig. 3. In this system, $X$ is the only information source (with rate $\tau$), and it is reconstructed by all the decoders. Henceforth, we will drop the subscripts of $X(1)$ and $\tau_1$ when there is only one information source. Let $R_l$ be the coding rate of Encoder $l$, $l = 1, 2, 3$. In order for a decoder to reconstruct $X$, it is necessary that the sum of the coding rates of the encoders accessible by this decoder is at least $\tau$. Thus the conditions
$$\sum_{l \in A_d} R_l \ge \tau, \qquad d = 1, 2, 3, 4 \qquad (1)\text{-}(4)$$
where $A_d$ denotes the set of encoders accessible by Decoder $d$, are necessary for $(R_1, R_2, R_3)$ to be admissible. On the other hand, these conditions are seen to be sufficient by the work of Singleton [10] (also cf. [12]).
We now give a graph-theoretic interpretation of the above result. The graph $G$ corresponding to the system in Fig. 3 is given in Fig. 4, where we use $s$ to label the source and $t_1, \cdots, t_4$ to label the sinks. Now $R_1$, $R_2$, and $R_3$ correspond to the coding rates of Encoders 1, 2, and 3, so the edges representing the encoders are labeled accordingly. The quantities $R_1$, $R_2$, and $R_3$ are interpreted as the capacity (in the sense of graph theory) of the corresponding edges. For the other edges in the graph, each one of them corresponds to a straight connection in the system in Fig. 3. Since there is no constraint on the coding rate in these edges, we interpret the capacity of each of them as infinity. To keep the graph simple, we do not label these edges. By considering the subgraph from $s$ to $t_1$ in Fig. 4, the condition in (1) can be interpreted as the value of the max-flow from $s$ to $t_1$ being greater than or equal to $\tau$, the information rate of the source. Similar interpretations can be made for the conditions in (2)-(4).
Based on the graph-theoretic interpretation of the above diversity coding problem (which is a one-source four-sink problem), we make the following conjecture.

Conjecture 1: Let $G = (V, E)$ be a graph with source $s$ and sinks $t_1, \cdots, t_L$, and let the capacity of an edge $(i, j)$ be denoted by $R_{ij}$. Then $(\boldsymbol{R}, \tau)$ is admissible if and only if the values of a max-flow from $s$ to $t_l$, $l = 1, \cdots, L$, are greater than or equal to $\tau$, the rate of the information source.
The spirit of our conjecture resembles that of the celebrated Max-flow Min-cut Theorem in graph theory [1]. Before we end this section, we give a few examples to illustrate our conjecture. We first illustrate by the example in Fig. 5 that the conjecture is true for $L = 1$. Fig. 5(a) shows the capacity of each edge. By the Max-flow Min-cut Theorem [1], the value of a max-flow from $s$ to $t$ is 3, so the flow in Fig. 5(b) is a max-flow. In Fig. 5(c), we show how we can send three bits $b_1$, $b_2$, $b_3$ from $s$ to $t$ based on the max-flow in Fig. 5(b). The conjecture is trivially seen to be true for $L = 1$, because when there is only one sink, we only need to treat the raw information bits as physical entities. The bits are routed at the intermediate nodes according to any fixed routing scheme, and they will all eventually arrive at the sink. Since the routing scheme is fixed, the sink knows which bit is coming in from which edge, and the information can be recovered accordingly.
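The max-flow values invoked in these examples can be computed with the standard Edmonds-Karp refinement of the Ford-Fulkerson method (BFS augmenting paths). The small network below is a hypothetical stand-in with max-flow value 3, in the spirit of Fig. 5; it is not taken from the paper's figure.

```python
# Edmonds-Karp max-flow: repeatedly find a shortest augmenting path in the
# residual graph by BFS and push the bottleneck capacity along it.
from collections import deque

def max_flow(n, cap, s, t):
    """Max-flow value on an n-node graph; cap is a dict {(i, j): capacity}."""
    residual = [[0] * n for _ in range(n)]
    for (i, j), c in cap.items():
        residual[i][j] += c
    total = 0
    while True:
        parent = [-1] * n
        parent[s] = s
        q = deque([s])
        while q and parent[t] == -1:      # BFS for an augmenting path
            i = q.popleft()
            for j in range(n):
                if residual[i][j] > 0 and parent[j] == -1:
                    parent[j] = i
                    q.append(j)
        if parent[t] == -1:               # no augmenting path: done
            return total
        path, j = [], t
        while j != s:                     # recover the path s -> ... -> t
            path.append((parent[j], j))
            j = parent[j]
        bottleneck = min(residual[i][j] for (i, j) in path)
        for (i, j) in path:               # augment along the path
            residual[i][j] -= bottleneck
            residual[j][i] += bottleneck
        total += bottleneck

# Nodes: 0 = s, 1 and 2 intermediate, 3 = t. The cut {s} has capacity 2 + 1 = 3.
cap = {(0, 1): 2, (0, 2): 1, (1, 2): 1, (1, 3): 1, (2, 3): 2}
assert max_flow(4, cap, 0, 3) == 3
```

The returned value coincides with the minimum cut capacity, as the Max-flow Min-cut Theorem guarantees.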
Next we illustrate by the example in Fig. 6 that the conjecture is true for $L = 2$. Fig. 6(a) shows the capacity of each edge. It is easy to check that the values of a max-flow from $s$ to $t_1$ and from $s$ to $t_2$ are both at least 5. So the conjecture asserts that we can send 5 bits $b_1, \cdots, b_5$ to $t_1$ and $t_2$ simultaneously, and Fig. 6(b) shows such a scheme. Note that in this scheme, bits only need to be replicated at the nodes to achieve optimality.
We now show another example in Fig. 7 to illustrate that the conjecture is true for $L = 2$. Fig. 7(a) shows the capacity of each edge. It is easy to check that the value of a max-flow from $s$ to $t_1$ is 2, and so is the value of a max-flow from $s$ to $t_2$. So the conjecture asserts that we can send 2 bits $b_1$ and $b_2$ to $t_1$ and $t_2$ simultaneously, and Fig. 7(b) shows such a scheme, where $\oplus$ denotes modulo 2 addition. At $t_1$, $b_2$ can be recovered from $b_1$ and $b_1 \oplus b_2$. Similarly, $b_1$ can be recovered at $t_2$. Note that when there is more than one sink, we can no longer think of information as a real entity, because information needs to be replicated or transformed at the nodes. In this example, information is coded at the node 3, which is unavoidable. For $L \ge 2$, network coding is in general necessary in an optimal multicast scheme.
Finally, we illustrate by the example in Fig. 8 that the conjecture is true for $L = 3$. Fig. 8(a) shows the capacity of each edge. It is easy to check that the values of a max-flow from $s$ to all the sinks are 2. In Fig. 8(b), we show how we can multicast 2 bits $b_1$ and $b_2$ to all the sinks.


Fig. 6. A one-source two-sink network without coding.

Fig. 7. A one-source two-sink network with coding.

The advantage of network coding can be seen from the examples in Figs. 7 and 8. As an illustration, we will quantify this advantage for the example in Fig. 8 in two ways. First, we investigate the saving in bandwidth when network coding is allowed. For the scheme in Fig. 8(b), a total of 9 bits are sent. If network coding is not allowed, then it is easy to see that at least one more bit has to be sent in order that all the sinks can recover both $b_1$ and $b_2$. Thus we see that a very simple network code can save 10% in bandwidth. Second, we investigate the increase in throughput when network coding is allowed. Using the scheme in Fig. 8(b), if 2 bits are sent in each edge, then 4 bits can be multicast to all the sinks. If network coding is not allowed (and 2 bits are sent in each edge), we now show that only 3 bits can be multicast to all the sinks. Let $B$ be the set of bits to be multicast to all the sinks. Let the set of bits sent in the edge $(i, j)$ be $B_{ij}$, where $|B_{ij}| = 2$. At each node, the received bits are duplicated and sent in the two outgoing edges, so 2 bits are sent in each edge in the network, and since network coding is not allowed, every $B_{ij}$ is a subset of $B$. Counting the distinct bits that can reach all three sinks through such subsets then shows that $|B| \le 3$. In Fig. 8(c), we show how 3 bits $b_1$, $b_2$, and $b_3$ can be multicast to all the sinks by sending 2 bits in each edge. Therefore, the throughput of the network can be increased by one-third using a very simple network code.
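The necessity of coding can also be checked by brute force. On the butterfly-like topology commonly used to draw the example of Fig. 7 (an assumption on our part), we let every edge carry a single uncoded bit that must already be available at the edge's tail, enumerate all such routings, and observe that none delivers both bits to both sinks:

```python
# Exhaustive search over routings: each edge is labeled with bit 1 or bit 2,
# and a label is feasible only if that bit is already available at the tail
# (routing/replication, no coding). The edge list is topologically sorted.
from itertools import product

edges = [("s", "n1"), ("s", "n2"), ("n1", "t1"), ("n2", "t2"),
         ("n1", "n3"), ("n2", "n3"), ("n3", "n4"), ("n4", "t1"), ("n4", "t2")]

def feasible(labels):
    """Propagate availability; return per-node bit sets, or None if infeasible."""
    assignment = dict(zip(edges, labels))
    avail = {"s": {1, 2}, "n1": set(), "n2": set(), "n3": set(),
             "n4": set(), "t1": set(), "t2": set()}
    for (i, j) in edges:
        if assignment[(i, j)] not in avail[i]:   # bit not present at tail
            return None
        avail[j].add(assignment[(i, j)])
    return avail

best = 0
for labels in product([1, 2], repeat=len(edges)):
    avail = feasible(labels)
    if avail is not None:
        best = max(best, sum(avail[t] == {1, 2} for t in ("t1", "t2")))

assert best == 1   # some routing serves one sink fully, but no routing serves both
```

The search confirms the cut argument: the middle edge can carry only one of the two bits without coding, so one sink is always left short.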
III. MAIN RESULT

In this section, we formally present the main result in this paper. Let $G = (V, E)$ be a directed graph with source $s$ and sinks $t_1, \cdots, t_L$, and let $R_{ij}$ be the capacity of an edge $(i, j)$ in $E$. Since our conjecture concerns only the values of max-flows from the source to the sinks, we assume without loss of generality that there is no edge in $E$ from a node (other than $s$) to $s$, because such an edge does not increase the value of a max-flow

Fig. 8. A one-source three-sink network.

from $s$ to a sink. Further, we assume $(i, i) \notin E$ for all $i \in V$ for the same reason.

Let us consider a block code of length $n$. We assume that $x$, the value assumed by $X$, is obtained by selecting an index from a set $\mathcal{X} = \{1, \cdots, \lceil 2^{n\tau} \rceil\}$ with uniform distribution. The elements in $\mathcal{X}$ are called messages. For $(i, j) \in E$, node $i$ can send information to node $j$ which depends only on the information previously received by node $i$. Since the graph $G$ is arbitrary and may contain (directed) cycles, a network code can in general be very complicated. In this paper, we confine our discussion to a class of block codes, called the $\alpha$-code, which is defined in the next paragraph.
An $(n, (\eta_{ij} \colon (i, j) \in E), \tau)$-$\alpha$-code on a graph $G$ is defined by the following components (the construction of an $\alpha$-code from these components will be described after their definitions are given):

1) a positive integer $K$, the number of transactions;

2) mappings $u \colon \{1, \cdots, K\} \to V$ and $v \colon \{1, \cdots, K\} \to V$, such that $(u(k), v(k)) \in E$ for $1 \le k \le K$;

3) index sets $A_k$, $1 \le k \le K$, such that for $(i, j) \in E$
$$\prod_{k \in T_{ij}} |A_k| = \eta_{ij}$$
where $T_{ij} = \{k \colon (u(k), v(k)) = (i, j)\}$;

4) encoding functions $f_k$, $1 \le k \le K$: if $u(k) = s$, then $f_k \colon \mathcal{X} \to A_k$; otherwise
$$f_k \colon \prod_{k' \in Q_k} A_{k'} \to A_k$$
where $Q_k = \{k' < k \colon v(k') = u(k)\}$;

5) decoding functions
$$g_l \colon \prod_{k \in W_l} A_k \to \mathcal{X}, \qquad 1 \le l \le L$$
where $W_l = \{k \colon v(k) = t_l\}$, such that for all $x \in \mathcal{X}$
$$g_l(\tilde{v}_{W_l}(x)) = x$$
where $\tilde{v}_{W_l}(x)$ denotes the value of $(f_k \colon k \in W_l)$ as a function of $x$.

The $(n, (\eta_{ij} \colon (i, j) \in E), \tau)$-$\alpha$-code is constructed from these components as follows. At the beginning of the coding session, the value of $X$ is available to node $s$. In the coding session, there are $K$ transactions which take place in chronological order, where each transaction refers to a node sending information to another node. In the $k$th transaction, node $u(k)$ encodes according to $f_k$ and sends an index in $A_k$ to node $v(k)$. The domain of $f_k$ is the information received by node $u(k)$ so far, and we distinguish two cases. If $u(k) = s$, the domain of $f_k$ is $\mathcal{X}$. If $u(k) \neq s$, $Q_k$ gives the indices of all previous transactions for which information was sent to node $u(k)$, so the domain of $f_k$ is $\prod_{k' \in Q_k} A_{k'}$. The set $T_{ij}$ gives the indices of all transactions for which information is sent from node $i$ to node $j$, so $\eta_{ij}$ is the number of possible index-tuples that can be sent from node $i$ to node $j$ during the coding session. Finally, $W_l$ gives the indices of all transactions for which information is sent to $t_l$, and $g_l$ is the decoding function at $t_l$.
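The transaction formalism above can be sketched as a small simulator (our own toy, not from the paper): a schedule of triples $(u(k), v(k), f_k)$, where $f_k$ sees the message $x$ when $u(k)$ is the source and otherwise only the indices $u(k)$ has received in earlier transactions. We use it to express the two-bit coded multicast of Fig. 7, with assumed node names.

```python
# Simulate an alpha-code as an ordered list of transactions. received[i]
# accumulates (k, index) pairs at node i; got(i) is the information a node
# may use when it is the transmitter of a later transaction.

def run_alpha_code(x):
    received = {i: [] for i in ("n1", "n2", "n3", "n4", "t1", "t2")}
    def got(i):
        return [idx for (_, idx) in received[i]]
    # schedule of (u(k), v(k), f_k); only transaction k = 6 actually codes.
    schedule = [
        ("s",  "n1", lambda _: x[0]),
        ("s",  "n2", lambda _: x[1]),
        ("n1", "t1", lambda r: r[0]),
        ("n1", "n3", lambda r: r[0]),
        ("n2", "t2", lambda r: r[0]),
        ("n2", "n3", lambda r: r[0]),
        ("n3", "n4", lambda r: r[0] ^ r[1]),   # the coded transaction
        ("n4", "t1", lambda r: r[0]),
        ("n4", "t2", lambda r: r[0]),
    ]
    for k, (u, v, f) in enumerate(schedule):
        idx = f(x) if u == "s" else f(got(u))  # domain rule for f_k
        received[v].append((k, idx))
    # decoding functions g_l act on everything the sink has received
    g_t1 = lambda r: (r[0], r[0] ^ r[1])
    g_t2 = lambda r: (r[0] ^ r[1], r[0])
    return g_t1(got("t1")), g_t2(got("t2"))

for x in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    assert run_alpha_code(x) == (x, x)
```

Each sink's decoder depends only on the transactions directed to it, matching the role of the sets $W_l$ above.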


We remark that the $\alpha$-code is not the most general possible definition of a block code. For example, the order of transactions can depend on the value of $X$. Also, coding can be done probabilistically. (However, we prove in the Appendix that probabilistic coding does not improve performance.) Instead of a block code, it is also possible to use a variable-length code.

Let $\boldsymbol{R} = (R_{ij} \colon (i, j) \in E)$. A tuple $(\boldsymbol{R}, \tau)$ is $\alpha$-admissible if for any $\epsilon > 0$ there exists, for sufficiently large $n$, an $(n, (\eta_{ij} \colon (i, j) \in E), \tau)$-$\alpha$-code on $G$ such that
$$\frac{1}{n} \log_2 \eta_{ij} \le R_{ij} + \epsilon \qquad \text{for all } (i, j) \in E.$$
(Note that $\alpha$-admissibility implies admissibility.) Define
$$\mathcal{R}_\alpha = \{(\boldsymbol{R}, \tau) \colon (\boldsymbol{R}, \tau) \text{ is } \alpha\text{-admissible}\}.$$
Thus $\mathcal{R}_\alpha \subset \mathcal{R}$. The problem is to characterize $\mathcal{R}_\alpha$ for any $G$ and $\boldsymbol{R}$.
For a directed graph $G$ with source $s$, sinks $t_1, \cdots, t_L$, and the capacity of an edge $(i, j)$ equal to $R_{ij}$, let $\mathcal{R}^*$ be the set consisting of all $(\boldsymbol{R}, \tau)$ such that the values of a max-flow from $s$ to $t_l$, $1 \le l \le L$, are greater than or equal to $\tau$. The following theorem is the main result in this paper.

Theorem 1: $\mathcal{R}_\alpha = \mathcal{R}^*$.

IV. THE CONVERSE

In this section, we prove that $\mathcal{R}_\alpha \subset \mathcal{R}^*$, i.e., if for any $\epsilon > 0$ there exists for sufficiently large $n$ an $(n, (\eta_{ij} \colon (i, j) \in E), \tau)$-$\alpha$-code on $G$ such that
$$\frac{1}{n} \log_2 \eta_{ij} \le R_{ij} + \epsilon \qquad \text{for all } (i, j) \in E$$
then the values of a max-flow from $s$ to $t_l$, $1 \le l \le L$, are greater than or equal to $\tau$.

Consider any $l$ and any $U \subset V$ such that $s \in U$ and $t_l \notin U$. Let
$$\tilde{v}_l(x) = (\tilde{v}_k(x) \colon k \in W_l)$$
where $\tilde{v}_k(x)$ denotes the value of $f_k$ as a function of $x$; $\tilde{v}_l(x)$ is all the information known by $t_l$ during the whole coding session when the message is $x$. Since for an $\alpha$-code, $f_k$ is a function of the information previously received by node $u(k)$, we see inductively that $\tilde{v}_l(x)$ is a function of the indices sent in the transactions across the cut, i.e., of
$$(\tilde{v}_k(x) \colon u(k) \in U,\ v(k) \notin U).$$
Since $g_l(\tilde{v}_l(x)) = x$ for every $x \in \mathcal{X}$, distinct messages must produce distinct index-tuples across the cut, so that
$$\lceil 2^{n\tau} \rceil \le \prod_{(i, j) \colon i \in U,\ j \notin U} \eta_{ij}$$
and hence
$$\tau \le \sum_{(i, j) \colon i \in U,\ j \notin U} \frac{1}{n} \log_2 \eta_{ij} \le \sum_{(i, j) \colon i \in U,\ j \notin U} (R_{ij} + \epsilon).$$
Minimizing the right-hand side over all such $U$, we have
$$\tau \le \min_U \sum_{(i, j) \colon i \in U,\ j \notin U} R_{ij} + |E|\,\epsilon.$$
By the Max-flow Min-cut Theorem [1], the first term on the right-hand side is equal to the value of a max-flow from $s$ to $t_l$. Letting $\epsilon \to 0$, we obtain the desired conclusion.

As a remark, even if we allow an arbitrarily small probability of decoding error in the usual Shannon sense, by modifying our proof by means of a standard application of Fano's inequality [2], it can be seen that it is still necessary for the value of a max-flow from $s$ to $t_l$, $1 \le l \le L$, to be greater than or equal to $\tau$. The details are omitted here.

V. ADMISSIBILITY

In this section, we prove that $\mathcal{R}^* \subset \mathcal{R}_\alpha$. In Section V-A, we first prove the result when the graph $G$ is acyclic. Then this result will be used to prove the general case in Section V-B.
A. Acyclic Networks

Assume the graph $G$ is acyclic. Let the vertices in $V$ be labeled by $0, 1, \cdots, |V| - 1$ in the following way. The source $s$ has the label $0$. The other vertices are labeled in a way such that $(i, j) \in E$ implies $i < j$. Such a labeling is possible because $G$ is acyclic. We regard $0, 1, \cdots, |V| - 1$ as aliases of the corresponding vertices.

We will consider an $(n, (\eta_{ij} \colon (i, j) \in E), \tau)$-$\beta$-code on the graph $G$ defined by encoding functions
$$f_{sj} \colon \mathcal{X} \to \{1, \cdots, \eta_{sj}\}$$
for all $j$ such that $(s, j) \in E$ (since $X$ can be determined at node $s$),
$$f_{ij} \colon \prod_{j' \colon (j', i) \in E} \{1, \cdots, \eta_{j' i}\} \to \{1, \cdots, \eta_{ij}\}$$
for all $(i, j) \in E$ such that $i \neq s$, and decoding functions
$$g_l \colon \prod_{j \colon (j, t_l) \in E} \{1, \cdots, \eta_{j t_l}\} \to \mathcal{X}$$
for all $1 \le l \le L$ such that
$$g_l(\tilde{v}_l(x)) = x \qquad \text{for all } x \in \mathcal{X}$$
(recall that $\tilde{v}_{ij}(x)$ denotes the value of $f_{ij}$ as a function of $x$, and $\tilde{v}_l(x) = (\tilde{v}_{j t_l}(x) \colon (j, t_l) \in E)$). In the above, $f_{ij}$ is the encoding function for the edge $(i, j)$, while $g_l$ is the decoding function for the sink $t_l$. In the coding session, $f_{ij}$ is applied before $f_{i'j'}$ if $i < i'$, and $f_{ij}$ is applied before $f_{ij'}$ if $j < j'$. This defines the order in which the encoding functions are applied. Since $(i, j) \in E$ implies $i < j$, all the necessary information is available when encoding at node $i$ is done. If the set $\{j' \colon (j', i) \in E\}$ is empty, we adopt the convention that $f_{ij}$ is an arbitrary constant taken from the set $\{1, \cdots, \eta_{ij}\}$. An $(n, (\eta_{ij} \colon (i, j) \in E), \tau)$-$\beta$-code is a special case of an $(n, (\eta_{ij} \colon (i, j) \in E), \tau)$-$\alpha$-code defined in Section III.
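The ancestral labeling that makes this sequential encoding possible is a topological order, computable by Kahn's algorithm; a sketch on a made-up acyclic graph:

```python
# Assign label 0 to the source and topological labels to the rest, so that
# (i, j) in E implies label(i) < label(j).
from collections import deque

def ancestral_labels(nodes, edges, source):
    indeg = {v: 0 for v in nodes}
    for (_, j) in edges:
        indeg[j] += 1
    q = deque([source])        # the source has no incoming edges
    label, next_label = {}, 0
    while q:
        i = q.popleft()
        label[i] = next_label
        next_label += 1
        for (a, b) in edges:   # "remove" i's outgoing edges
            if a == i:
                indeg[b] -= 1
                if indeg[b] == 0:
                    q.append(b)
    return label

nodes = ["s", "a", "b", "c", "t"]
edges = [("s", "a"), ("s", "b"), ("a", "c"), ("b", "c"), ("c", "t"), ("a", "t")]
lab = ancestral_labels(nodes, edges, "s")
assert lab["s"] == 0
assert all(lab[i] < lab[j] for (i, j) in edges)
```

With such labels in hand, applying the encoding functions in increasing order of the tail label guarantees every input is available when needed.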
Now assume that the vector $\boldsymbol{R}$ is such that, with $R_{ij}$ being the capacity of the edge $(i, j)$ in $G$, for all $1 \le l \le L$, the value of a max-flow from $s$ to $t_l$ is greater than or equal to $\tau$. It suffices for us to show that for any $\epsilon > 0$, there exists for sufficiently large $n$ an $(n, (\eta_{ij} \colon (i, j) \in E), \tau)$-$\beta$-code on $G$ such that
$$\frac{1}{n} \log_2 \eta_{ij} \le R_{ij} + \epsilon \qquad \text{for all } (i, j) \in E.$$
Instead, we will show the existence of an $(n, (\eta_{ij} \colon (i, j) \in E), \tau')$-$\beta$-code satisfying the same set of conditions, and this will be done by a random procedure. For the time being, let us replace $\tau$ by $\tau'$, where $\tau'$ is any constant greater than $\tau$. Thus the domain of the encoding functions at the source is expanded from $\mathcal{X}$ to $\mathcal{X}' = \{1, \cdots, \lceil 2^{n\tau'} \rceil\}$.

We now construct the encoding functions as follows. For all $j$ such that $(s, j) \in E$ and for all $x \in \mathcal{X}'$, $f_{sj}(x)$ is defined to be a value selected independently from the set $\{1, \cdots, \eta_{sj}\}$ with uniform distribution. Further, for all $(i, j) \in E$ such that $i \neq s$ and for every possible tuple of indices received by node $i$, the value of $f_{ij}$ at that tuple is defined to be a value selected independently from the set $\{1, \cdots, \eta_{ij}\}$ with uniform distribution.

Let $\epsilon'$ be any fixed positive real number, and take
$$\eta_{ij} = \lceil 2^{n(R_{ij} + \epsilon')} \rceil \qquad \text{for all } (i, j) \in E.$$
For $1 \le l \le L$ and $x \in \mathcal{X}'$, let $\tilde{v}_l(x) = (\tilde{v}_{j t_l}(x) \colon (j, t_l) \in E)$, where $\tilde{v}_{ij}(x)$ denotes the value of $f_{ij}$ as a function of $x$; $\tilde{v}_l(x)$ is all the information received by node $t_l$ during the coding session when the message is $x$. For distinct $x, x' \in \mathcal{X}'$, $x$ and $x'$ are indistinguishable at the sink $t_l$ if and only if $\tilde{v}_l(x) = \tilde{v}_l(x')$.

Fix distinct $x, x' \in \mathcal{X}'$ and a sink $t_l$, and classify the ways in which $x$ and $x'$ can become indistinguishable at $t_l$ according to the subset $U$ of nodes at which the received information differs under $x$ and under $x'$; note that $s \in U$ and, for indistinguishability, $t_l \notin U$. For the edges across $U$, the independently and uniformly chosen encoding functions must produce coinciding indices, which happens with probability at most
$$\prod_{(i, j) \colon i \in U,\ j \notin U} \eta_{ij}^{-1} \le 2^{-n(\tau + \epsilon')}$$
where the inequality holds because, by the Max-flow Min-cut Theorem [1], the capacity of every cut between $s$ and $t_l$ is at least the value of a max-flow from $s$ to $t_l$, which is at least $\tau$. Note that this upper bound does not depend on the particular $U$. Since $V$ has $2^{|V|}$ subsets, the probability that $x$ and $x'$ are indistinguishable at $t_l$ is at most $2^{|V|} 2^{-n(\tau + \epsilon')}$.

Now choose $\tau'$ such that $\tau < \tau' < \tau + \epsilon'$. For all $x \in \mathcal{X}'$, let
$$Z_x = \begin{cases} 1 & \text{if } x \text{ cannot be uniquely determined at at least one of the sinks} \\ 0 & \text{otherwise} \end{cases}$$
and let $Z = \sum_{x \in \mathcal{X}'} Z_x$. By the union bound over the sinks and over the messages $x' \neq x$,
$$\frac{E[Z]}{|\mathcal{X}'|} \le |\mathcal{X}'|\, L\, 2^{|V|}\, 2^{-n(\tau + \epsilon')} \le 2 L\, 2^{|V|}\, 2^{-n(\tau + \epsilon' - \tau')} \to 0 \qquad \text{as } n \to \infty.$$
Hence, there exists a deterministic code for which the number of messages which can be uniquely determined at all the sinks is at least
$$|\mathcal{X}'|\left(1 - \frac{E[Z]}{|\mathcal{X}'|}\right)$$
which is greater than $\lceil 2^{n\tau} \rceil$ for sufficiently large $n$. Let $\mathcal{X}$ be any set of $\lceil 2^{n\tau} \rceil$ such messages in $\mathcal{X}'$. Upon restricting the domain of the encoding functions at the source to $\mathcal{X}$, and defining the decoding function $g_l$ to map $\tilde{v}_l(x)$ to $x$ for each $x \in \mathcal{X}$, we have obtained a desired $(n, (\eta_{ij} \colon (i, j) \in E), \tau)$-$\beta$-code. The theorem is proved.
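The bookkeeping behind this random-coding step can be sketched numerically. Assume, as the proof establishes in a more refined form, that a fixed pair of distinct messages is indistinguishable at a given sink with probability at most 2^(-nC), where C is the smallest max-flow value over the sinks; the union bound then gives an expected fraction of undecodable messages that vanishes as n grows whenever the source rate stays below C. The numbers below are illustrative only.

```python
# Union-bound arithmetic for the random-coding argument (assumed pairwise
# collision bound 2**(-n*C)): with |X'| = 2**(n*tau') messages and L sinks,
# the expected fraction of "bad" messages is at most |X'| * L * 2**(-n*C).

def bad_fraction_bound(n, tau_prime, C, L):
    """Upper bound on the expected fraction of undecodable messages."""
    num_messages = 2 ** (n * tau_prime)
    pairwise = 2.0 ** (-n * C)        # assumed pairwise collision bound
    return num_messages * L * pairwise

tau_prime, C, L = 2, 3, 3             # rate 2 < smallest max-flow 3, three sinks
bounds = [bad_fraction_bound(n, tau_prime, C, L) for n in (10, 20, 30)]
assert all(b2 < b1 for b1, b2 in zip(bounds, bounds[1:]))   # shrinks with n
assert bounds[-1] < 1e-6   # so a deterministic code with few bad messages exists
```

Once the bad fraction is below any target, discarding the bad messages leaves a deterministic code on a message set still of essentially full rate, which is the extraction step in the proof above.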

B. Cyclic Networks
For cyclic networks, there is no natural ordering of the nodes which allows coding in a sequential manner as in our discussion on acyclic networks in the last section. In this section, we will prove our result in full generality, which involves the construction of a more elaborate code.

Consider any graph $G = (V, E)$ (acyclic or cyclic) with source $s$ and sinks $t_1, \cdots, t_L$, and the capacity of an edge $(i, j)$ given by $R_{ij}$. Assume for all $1 \le l \le L$, the value of a max-flow from $s$ to $t_l$ is greater than or equal to $\tau$. We will prove that $(\boldsymbol{R}, \tau)$ is $\alpha$-admissible.

We first construct a time-parametrized graph $G^* = (V^*, E^*)$ from the graph $G$. The set $V^*$ consists of $N + 1$ layers of nodes, each of which is a copy of $V$. Specifically,
$$V^* = \{i_t \colon i \in V,\ 0 \le t \le N\}$$
where $N$ is at least the maximum length of a simple path from $s$ to a sink in $G$. As we will see later, $t$ is interpreted as the time parameter. The set $E^*$ consists of the following types of edges, for $0 \le t < N$:
1) $(i_t, i_{t+1})$ for all $i \in V$ (a node remembers all the information it has received);
2) $(i_t, j_{t+1})$ for all $(i, j) \in E$.
For $1 \le l \le L$, let $s_0$ be the source, and let $(t_l)_N$ be the sink in $G^*$ which corresponds to the sink $t_l$ in $G$. Clearly, $G^*$ is acyclic because each edge in $E^*$ ends at a vertex in a layer with a larger index.

Let the capacities of the edges in $E^*$ be given by $\boldsymbol{R}^* = (R^*_{i_t j_{t+1}})$, where
$$R^*_{i_t j_{t+1}} = \begin{cases} \infty & \text{if } i = j \\ R_{ij} & \text{if } (i, j) \in E. \end{cases} \tag{5}$$

Lemma 1: Let $s$ and $t$ be the source and the sink of a graph $G$, respectively. Then there exists a max-flow in $G$ expressible as the sum of a number of flows for which each of them consists of a simple path (i.e., a directed path without cycle) from $s$ to $t$ only.

Proof: Let $F$ be a max-flow from $s$ to $t$ in $G$ which does not contain a positive directed cycle (cf. [1, p. 45]). Let $P$ be any positive path from $s$ to $t$ in $F$ (evidently $P$ is simple), and let $f$ be the minimum value of $F$ in an edge along $P$. Let $F'$ be the flow from $s$ to $t$ along $P$ with value $f$. Subtracting $F'$ from $F$, $F$ is reduced to $F - F'$, a flow from $s$ to $t$ which does not contain a positive directed cycle. Apply the same procedure repeatedly until $F$ is reduced to the zero flow. The lemma is proved.

Lemma 2: For $1 \le l \le L$, if the value of a max-flow from $s$ to $t_l$ in $G$ is greater than or equal to $\tau$, then the value of a max-flow from $s_0$ to $(t_l)_N$ in $G^*$ is greater than or equal to $\tau$.

Proof: Let $l$ be fixed. Let $F$ be a max-flow from $s$ to $t_l$ in $G$ with value $\tau$ such that $F$ does not contain a positive directed cycle. Using the last lemma, we can write $F = \sum_{m=1}^{M} F^{(m)}$, where $F^{(m)}$ contains a positive simple path $P_m$ from $s$ to $t_l$ only. Specifically,
$$F^{(m)}_{ij} = \begin{cases} f_m & \text{if } (i, j) \text{ is on } P_m \\ 0 & \text{otherwise} \end{cases} \tag{6}$$
where
$$\sum_{m=1}^{M} f_m = \tau. \tag{7}$$
Let $L_m$ be the length of $P_m$. For an edge $(i, j)$ on $P_m$, let $d_m(i)$ be the distance of node $i$ from $s$ along $P_m$. Clearly, $L_m \le N$. Now for $0 \le t < N$, define
$$F^{*(m)}_{i_t j_{t+1}} = \begin{cases} f_m & \text{if } (i, j) \text{ is on } P_m \text{ and } t = d_m(i) \\ f_m & \text{if } i = j = t_l \text{ and } t \ge L_m \\ 0 & \text{otherwise.} \end{cases} \tag{8}$$
Since $L_m \le N$, the second case in (8), and hence $F^{*(m)}$, is well defined for $0 \le t < N$. $F^{*(m)}$ is a flow from $s_0$ to $(t_l)_N$ in $G^*$ derived from the flow $F^{(m)}$ in $G$ as follows. A flow of $f_m$ is generated at $s_0$ and enters the first layer of nodes. Then the flow traverses consecutive layers of nodes by emulating the path $P_m$ in $G$ until it eventually reaches $(t_l)_{L_m}$, and it finally moves along the memory edges to leave $G^*$ at the sink $(t_l)_N$. Based on $F^{*(m)}$, we construct
$$F^* = \sum_{m=1}^{M} F^{*(m)}.$$
We will prove that $F^* \le \boldsymbol{R}^*$ componentwise. Then $F^*$ is a flow from $s_0$ to $(t_l)_N$ in $G^*$, and from (7), its value is given by $\sum_m f_m = \tau$. This implies that the value of a max-flow from $s_0$ to $(t_l)_N$ in $G^*$ is at least $\tau$, and the lemma is proved.

Toward proving that $F^* \le \boldsymbol{R}^*$, we only need to consider edges $(i_t, j_{t+1})$ such that $F^{*(m)}_{i_t j_{t+1}} > 0$ for some $m$ and $i \neq j$, because $R^*_{i_t j_{t+1}}$ is infinite otherwise (cf. (5)).


For notational convenience, we will adopt the convention that $F^{*(m)}_{i_t j_{t+1}} = 0$ for $t < 0$ or $t \ge N$. Now for $(i, j) \in E$ with $i \neq j$ and $0 \le t < N$,
$$F^*_{i_t j_{t+1}} = \sum_{m=1}^{M} F^{*(m)}_{i_t j_{t+1}} = \sum_{m \colon (i, j) \text{ on } P_m,\ t = d_m(i)} f_m \le \sum_{m \colon (i, j) \text{ on } P_m} f_m = F_{ij} \le R_{ij} = R^*_{i_t j_{t+1}}.$$
In the above, the second equality follows from (8), the second-to-last equality follows from (6), and the first inequality is justified because the sum is taken over a subset of the paths containing $(i, j)$. Thus the inequality is justified for all cases. Hence we conclude that $F^* \le \boldsymbol{R}^*$, and the lemma is proved.

From Lemma 2 and the result in Section V-A, we see that $(\boldsymbol{R}^*, \tau)$ is $\alpha$-admissible on the acyclic graph $G^*$. Thus for every $\epsilon > 0$, there exists for sufficiently large $n$ a $\beta$-code on $G^*$ whose coding rate in each edge of $E^*$ is within $\epsilon$ of what is required. For this $\beta$-code on $G^*$, let us use $f_{i_t j_{t+1}}$ to denote the encoding function for an edge $(i_t, j_{t+1})$, and use $g_l$ to denote the decoding function at the sink $(t_l)_N$, $1 \le l \le L$. Without loss of generality, we assume that the encoding functions for the memory edges $(i_t, i_{t+1})$ are the identity (a node simply retains all the information it has received), that an encoding function whose domain is empty is an arbitrary constant, and that each $f_{i_t j_{t+1}}$ depends only on the information received by the copies of node $i$ in layers $0, \cdots, t$. Note that if the $\beta$-code does not satisfy these assumptions, it can readily be converted into one which does.

Using the $\beta$-code on $G^*$, we now construct an $\alpha$-code on $G$ whose coding process consists of $N + 1$ phases:
1) In Phase 1, for all $(i, j) \in E$ such that $i = s$, node $s$ sends $f_{s_0 j_1}(x)$ to node $j$, and for all $(i, j) \in E$ such that $i \neq s$, node $i$ sends a constant index to node $j$.
2) In Phase $t$, $2 \le t \le N$, for all $(i, j) \in E$, node $i$ sends $\tilde{v}_{i_{t-1} j_t}(x)$ to node $j$, where $\tilde{v}_{i_{t-1} j_t}(x)$ denotes the value of $f_{i_{t-1} j_t}$ as a function of $x$; it depends only on the information received by node $i$ during Phases $1, \cdots, t - 1$.
3) In Phase $N + 1$, for $1 \le l \le L$, the sink $t_l$ uses $g_l$ to decode $x$.
From the definitions, we see that the code so constructed on $G$ is a special case of an $\alpha$-code as defined in Section III, with the transactions of the $N$ phases arranged in chronological order. For the $\alpha$-code we have constructed, the coding rate from node $i$ to node $j$ accounts for the indices sent over the edge $(i, j)$ in all $N$ phases. Finally, for any $\epsilon > 0$, by taking a sufficiently large $n$, the coding rate from node $i$ to node $j$ can be kept within $\epsilon$ of $R_{ij}$ for all $(i, j) \in E$. Hence, we conclude that $(\boldsymbol{R}, \tau)$ is $\alpha$-admissible.
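The time-parametrized graph $G^*$ can be sketched in a few lines (our own illustration on a made-up 3-node cyclic graph): $N + 1$ layers, memory edges within a node's copies, and communication edges following $E$, so every edge increases the layer index and $G^*$ is acyclic.

```python
# Unroll a (possibly cyclic) graph G into the layered acyclic graph G*.

def layered_graph(nodes, edges, N):
    v_star = [(i, t) for t in range(N + 1) for i in nodes]
    e_star = []
    for t in range(N):
        e_star += [((i, t), (i, t + 1)) for i in nodes]        # memory edges
        e_star += [((i, t), (j, t + 1)) for (i, j) in edges]   # communication edges
    return v_star, e_star

nodes = ["a", "b", "c"]
edges = [("a", "b"), ("b", "c"), ("c", "a")]   # a directed cycle in G
v_star, e_star = layered_graph(nodes, edges, N=4)

assert len(v_star) == 3 * 5
# acyclic: every edge strictly increases the layer index
assert all(t2 == t1 + 1 for ((_, t1), (_, t2)) in e_star)
```

Because every edge of $G^*$ moves to the next layer, the ancestral labeling of Section V-A applies to it even though $G$ itself contains a cycle.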

VI. AN EXAMPLE

Despite the complexity of our proof of Theorem 1 in the last two sections, we will show in this section that very simple optimal codes do exist for certain cyclic networks. Therefore, there is much room for further research on how to design simple optimal codes for (single-source) network information flow. The code we construct in this section can be regarded as a kind of convolutional code, which possesses many desirable properties of a practical code.

Fig. 9. An example of a cyclic network.

Consider the graph $G = (V, E)$ in Fig. 9. Here $s$ is the source and $t_1$, $t_2$, and $t_3$ are the sinks, and three of the edges in $G$ form a directed cycle. In this example, we let the information rate $\tau$ be 3. Without loss of generality, we assume that the information source generates three symbols at each time $t$, where the symbols are elements of some finite field GF$(q)$. For the purpose of our discussion, we can regard the sequence of symbols as deterministic. Consider the rate tuple $(\boldsymbol{R}, 3)$, where $\boldsymbol{R}$ is the vector whose components are all equal to 1. Then the value of a max-flow from $s$ to each sink is 3, and Theorem 1 asserts that $(\boldsymbol{R}, 3)$ is admissible.

We now show that $(\boldsymbol{R}, 3)$ is admissible by presenting a coding scheme which can multicast the three symbols generated at each time from the source to all the sinks. At each time $t$, eleven information transactions T1-T11 occur in the following order: in each of T1-T8, a node sends to a neighbor either a source symbol or a sum of symbols it has received, where $+$ denotes addition in GF$(q)$; in T9-T11, the sinks $t_1$, $t_2$, and $t_3$ decode. Note that the coding rate in each edge is equal to 1, since exactly one symbol is sent in each edge in one unit time.

We now show that these information transactions can actually be performed. Let us start at $t = 1$. T1 and T2 can obviously be performed, and each of T3-T8 can be performed because the symbols it requires have been sent to the transmitting node in the preceding transactions. T9-T11 can then be performed because every symbol needed for decoding has been sent to the corresponding sink. Now assume that T1-T11 can be performed up to time $t - 1$ for some $t \ge 2$, and we will show that they can be performed at time $t$. T1 and T2 can obviously be performed. Just before each of T3-T8 is performed, the symbols required by the transmitting node have been sent to it either earlier at time $t$ or at time $t - 1$; therefore, at time $t$, the node can determine the symbol to be sent, and the transaction can be performed. By similar arguments, we see that T9-T11 can be performed.

With T9-T11, at time $t$, two of the sinks can recover all three symbols generated at times $t' \le t$, while the remaining sink can recover all three symbols generated at times $t' \le t - 1$; note the unit time delay at this sink. Thus our coding scheme can multicast the three symbols generated at each time to all the sinks, and hence $(\boldsymbol{R}, 3)$ is admissible.

It is also possible to design convolutional codes for an acyclic network. Compared with the block code we used in proving Theorem 1, it seems that a convolutional code has the advantage that the code can be very simple, and both the memory at each node and the end-to-end decoding delay can be very small.
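The point about small memory and small delay can be illustrated with a toy discrete-time sketch (our own, not the network of Fig. 9): a relay node stores only the last symbol it received and forwards it one time unit later, so the sink recovers the source stream with a fixed small delay.

```python
# Store-and-forward with unit delay on the line s -> a -> sink: at each time t
# the relay a emits the symbol it stored at time t-1 and stores the new one,
# so the sink sees x_{t-1} at time t.

def run_pipeline(stream):
    mem = None                 # the single symbol the relay remembers
    out = []
    for x in stream:
        out.append(mem)        # forward last time unit's symbol to the sink
        mem = x                # store the symbol just received from s
    return out

out = run_pipeline([3, 1, 4, 1, 5])
assert out == [None, 3, 1, 4, 1]   # the stream arrives intact, delayed one unit
```

Memory per node is one symbol and the end-to-end delay is one unit per store-and-forward hop, in the spirit of the convolutional code above.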
VII. MULTIPLE SOURCES
In the classical information theory for point-to-point communication, if two information sources are independent, optimality can be achieved (asymptotically) by coding the sources
separately. This coding method is referred to as coding by superposition [12]. If this coding method is always optimal for
multisource network information flow problems, then in order
to solve the problem, we only need to solve the subproblems
for the individual information sources separately, where each of
these subproblems is a single-source problem. However, as we
will see shortly, the multisource problem is not a trivial extension of the single-source problem, and it is extremely difficult
in general.
Let us consider the multilevel diversity coding system in Fig. 1, and assume that the two sources are independent. If coding by superposition is always optimal, then every admissible coding rate triple can be written as a sum of subrates, one set associated with each source. As discussed below, however, such a decomposition does not always exist. Therefore, coding by superposition is not optimal in general, even when the two information sources are generated at the same node.
In [5],1 it was found that coding by superposition is optimal
for 86 out of all 100 configurations of multilevel diversity
coding systems with three encoders. In [8] and [13] it was
shown that coding by superposition is optimal for all symmetrical multilevel diversity coding systems. However, how
to characterize multilevel diversity coding systems for which
coding by superposition is always optimal is still an open
problem.
Although the multisource problem in general is extremely difficult, there exist special cases which can be readily solved by the results for the single-source problem. Consider a network information flow problem in which each information source is generated at its own node and is multicast to a common set of sinks. It turns out that this problem can be reduced to a single-source problem by adding the following components to the graph:

1) a new node s;
2) an edge from s to the node generating each information source, with capacity equal to the rate of that source.

Call this augmented graph G'. Then we can regard all the information sources as one information source generated at node s, whose rate is the sum of the individual rates, with each component delivered to the node generating the corresponding original source through the new edge at that source's rate. The problem then becomes a one-source, multiple-sink problem on G' with source s and the original set of sinks.
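A minimal sketch of this reduction, in our own notation (the supernode name "s" and the dictionary layout are illustrative assumptions, not the paper's): add a supernode and, for each source, one edge whose capacity equals that source's rate.

```python
def augment(edges, source_rates):
    """edges: {(u, v): capacity}. source_rates: {node: rate of the source
    generated there}. Returns the edge set of the augmented graph, with a
    new supernode 's' feeding each source node at that source's rate."""
    augmented = dict(edges)
    for node, rate in source_rates.items():
        augmented[("s", node)] = rate
    return augmented

g = {("a", "b"): 3, ("b", "c"): 2}
print(augment(g, {"a": 1, "b": 2}))
# {('a', 'b'): 3, ('b', 'c'): 2, ('s', 'a'): 1, ('s', 'b'): 2}
```

Running the single-source result on the augmented graph then answers the original multisource question for this special case.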
In video-conferencing, the information generated by each
participant is multicast to all other participants on the network.
This is a special case of the situation described in the last
paragraph.
In the decomposition considered above, the two sets of subrates are associated with the first and the second source, respectively. Since the first source is multicast to all the decoders, from the discussion in Section II we have a set of constraints on its subrates. Similarly, since the second source is multicast to Decoders 2, 3, and 4, we have a second set of constraints on its subrates. However, it was shown in [12] that there is an admissible rate triple which cannot be decomposed into two sets of subrates as prescribed above.

VIII. DISCUSSION

In this paper, we have proposed a new class of problems called network information flow which is inspired by computer network applications. This class of problems consolidates all previous work along this line [12], [8], [14] into a new direction in multiterminal source coding.

In the past, most results in multiterminal source coding are generalizations of either the Slepian–Wolf problem [9] or the multiple descriptions problem [3]. The class of problems we have proposed are generalizations of neither of these problems. Further, they distinguish themselves from most classical multiterminal source coding problems in the following ways:

1) there is no rate-distortion consideration;
2) the sources are mutually independent;
3) the network configuration, described by a graph, is arbitrary;
4) the reconstruction requirements are arbitrary.

Our formulation covers a large class of problems instead of one particular problem. For most classical multiterminal source coding problems, the problem degenerates if there is no rate-distortion consideration and the sources are mutually independent. For our class of problems, neither of these assumptions is made. Yet they are highly nontrivial problems.

¹The reader can contact Raymond Yeung for a copy of this reference.
In this paper, we have characterized the admissible coding rate region of the single-source problem. Our result can be regarded as the Max-flow Min-cut Theorem for network information flow. We point out that our discussion is based on a class of block codes called α-codes. Therefore, it is possible, though not likely, that our result can be enhanced by considering more general coding schemes. Nevertheless, we prove in the Appendix that probabilistic coding does not improve performance.
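The max-flow characterization can be checked mechanically on small examples. The sketch below is ours (the topology and node numbering are illustrative, not a figure from this paper): it computes the max-flow from the source to each sink with the Edmonds–Karp algorithm and takes the minimum over the sinks, which by the characterization above is the largest admissible multicast rate.

```python
from collections import deque

def max_flow(cap, s, t):
    """Edmonds-Karp on an adjacency-matrix capacity graph."""
    n = len(cap)
    res = [row[:] for row in cap]   # residual capacities
    flow = 0
    while True:
        # BFS for a shortest augmenting path in the residual graph
        parent = [-1] * n
        parent[s] = s
        q = deque([s])
        while q:
            u = q.popleft()
            for v in range(n):
                if parent[v] == -1 and res[u][v] > 0:
                    parent[v] = u
                    q.append(v)
        if parent[t] == -1:
            return flow             # no augmenting path left
        # bottleneck along the path, then augment
        aug, v = float("inf"), t
        while v != s:
            aug = min(aug, res[parent[v]][v])
            v = parent[v]
        v = t
        while v != s:
            res[parent[v]][v] -= aug
            res[v][parent[v]] += aug
            v = parent[v]
        flow += aug

# A 7-node example (our numbering): source 0, sinks 5 and 6, unit capacities.
n = 7
cap = [[0] * n for _ in range(n)]
for u, v in [(0, 1), (0, 2), (1, 3), (2, 3), (1, 5), (2, 6),
             (3, 4), (4, 5), (4, 6)]:
    cap[u][v] = 1

print(min(max_flow(cap, 0, t) for t in (5, 6)))  # 2
```

Here both sinks have max-flow 2 from the source, so rate 2 is admissible, even though, as discussed below, no routing-only scheme attains it on this topology.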
In analog telephony, when a point-to-point call is established,
there is a physical connection between the two parties. When
a conference call is established, there is a physical connection
among all the parties involved. In computer communication
(which is digital), we used to think that for multicasting, there
must be a logical connection among all the parties involved
such that raw information bits are sent to the destinations
via such a connection. The notion of a logical connection
in computer communication is analogous to the notion of a
physical connection in analog telephony. As a result, multicasting in a computer network has traditionally been thought of as replicating bits at the nodes, so that each sink eventually receives a copy of all the bits. The most important contribution
of the current paper is to show that the traditional technique for
multicasting in a computer network in general is not optimal.
Rather, we should think of information as being diffused
through the network from the source to the sinks by means
of network coding. This is a new concept in multicasting in a
point-to-point network which may have significant impact on
future design of switching systems.
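The gain from coding over replication can be seen on the now-standard butterfly example; the rendition below is ours, not a figure reproduced from this paper. Two bits are multicast to two sinks over unit-capacity edges; the shared bottleneck sends the XOR of the two bits instead of forwarding one of them, and each sink resolves the bit it did not receive directly.

```python
def multicast(b1, b2):
    """Network-coded multicast of two bits over the butterfly topology."""
    left, right = b1, b2       # source pushes b1 left, b2 right
    coded = left ^ right       # the shared middle edge carries b1 XOR b2
    sink_a = (left, coded ^ left)    # sink A: b1 directly, b2 = (b1^b2)^b1
    sink_b = (coded ^ right, right)  # sink B: b1 = (b1^b2)^b2, b2 directly
    return sink_a, sink_b

for b1 in (0, 1):
    for b2 in (0, 1):
        assert multicast(b1, b2) == ((b1, b2), (b1, b2))
print("both sinks recover (b1, b2) at rate 2")
```

Without the XOR, the unit-capacity middle edge could carry only one of the two bits per unit time, so one sink would fall short; this is precisely the sense in which routing and replication alone are suboptimal.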
In classical information theory for point-to-point communication, we can think of information as a fluid or some kind of
physical entity. For network information flow with one source,
this analogy continues to hold when there is one sink, because
information flow is conserved at all the intermediate nodes in an
optimal scheme. However, the analogy fails for multicasting because information needs to be replicated or coded at the nodes.
The problem becomes more complicated when there is more than one source. In the classical information theory for
point-to-point communication, if two sources are independent,
optimality can be achieved (asymptotically) by coding the
sources separately. However, it has been shown by a simple
example in [12] that for simultaneous multicast of two sources,
it may be necessary to code the sources jointly in order to
achieve optimality. A special case of the multisource multisink
problem which finds application in satellite communication
has been studied in [14], in which inner and outer bounds on the admissible coding rate region were obtained.
For future research, the multisource multisink problem is a
challenging problem. For the single-source problem, there are
still many unresolved issues which are worth further investigation. In proving our result for acyclic graphs, we have used a
random block code. Recently, Li and Yeung [4] have devised a
systematic procedure to construct linear codes for acyclic networks. Along another line, the example in Section V shows
that convolutional codes are good alternatives to block codes.
It seems that convolutional codes have the advantage that the
code can be very simple, and the memory at each node and the


end-to-end decoding delay can be very small. These are all desirable features for practical codes.
Finally, by imposing the constraint that network coding is not allowed, i.e., each node functions as a switch as in existing computer networks, we can ask whether a given rate tuple is admissible. Also, we can ask under what condition optimality can be achieved without network coding. These are interesting problems for further research.
Recently, there has been a lot of interest in factor graphs [7], a graphical model which subsumes Markov random fields, Bayesian networks, and Tanner graphs. In particular, the problem
of representing codes in graphs [11], [6] has received much
attention. The codes we construct for a given network in this
paper can be regarded as a special type of codes in a graph.
APPENDIX
PROBABILISTIC CODING DOES NOT IMPROVE PERFORMANCE

For an α-code, each transaction of the coding process is specified by a mapping. Suppose that, instead of a mapping, each transaction is specified by a transition matrix from the domain of that mapping to its range, and that decoding at each sink is likewise specified by a transition matrix from the domain of the decoding mapping to its range. Then the code becomes a probabilistic code, and we refer to such a code as a probabilistic α-code. With a slight abuse of notation, we continue to use the same symbols to denote the (now random) encoding and decoding functions.
In general, one can use probabilistic coding schemes instead of deterministic coding schemes. By using probabilistic schemes, it may be possible to multicast information from the source to the sinks at a rate higher than that permitted by deterministic schemes. Before showing that this is impossible, however, we first discuss a subtlety of probabilistic coding.
For a probabilistic α-code on a graph, it seems intuitively correct that for any sink and any cut separating the source from that sink, the information source X, the messages crossing the cut, and the output of the decoder at the sink form a Markov chain, because all the information sent from the source to the sink has to go through the nodes on the cut. If this were the case, then by the Data Processing Theorem [2] the information carried across the cut would be at least H(X), where the last step holds because X can be recovered at the sink. However, we show next by an example that the Markov chain asserted is not valid in general.

Consider the graph in Fig. 10 with three nodes. Let X be uniformly distributed on GF(2), and let Z be independent of X and uniformly distributed on GF(2). Consider the following probabilistic α-code with five transactions:

Fig. 10. A three-node network.

Note that the fourth transaction is possible because, upon knowing the two symbols it has received, the node in question can determine the symbol to be sent; the fifth transaction then specifies the decoding at the sink.

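The symbols of the five transactions are tied to Fig. 10, but the mechanism can be sketched with a one-time pad; this is our own reconstruction of the idea, with the names X and Z assumed. The only symbol crossing the middle node is X ⊕ Z, which is statistically independent of X, yet a sink already holding Z recovers X exactly, so the intuitive Markov chain fails.

```python
import itertools

# (X, Z) uniform and independent on GF(2) x GF(2)
pairs = list(itertools.product((0, 1), repeat=2))

for x, z in pairs:
    crossing = x ^ z              # the one symbol the cut ever carries
    assert crossing ^ z == x      # a sink holding the pad Z recovers X

# The cut symbol alone is independent of X: P(X = 1 | crossing = c) = 1/2.
for c in (0, 1):
    xs = [x for x, z in pairs if x ^ z == c]
    assert sum(xs) / len(xs) == 0.5
print("X recoverable at the sink, yet the cut symbol reveals nothing about X")
```

The failure is exactly that the sink's own side randomness Z is not accounted for when only the messages crossing the cut are conditioned on.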
Now, from the pair of received symbols, both X and Z can be recovered. However, from the symbol crossing the intermediate node alone, it is impossible to recover X. Therefore, the Markov chain asserted in the last paragraph is invalid.

We now show that the use of probabilistic coding cannot reduce coding rates. Consider any probabilistic coding scheme, and let Z be the random parameter (assumed to be real) of the scheme, with distribution function F. Without loss of generality, we assume that Z is independent of X. This assumption can be justified by showing that if Z is not independent of X, then we can construct an equivalent probabilistic coding scheme whose random parameter is independent of X: define a random vector indexed by the alphabet set of X, whose components are mutually independent and each have marginal distribution function F, and use the component selected by the message as the random parameter of the coding scheme. This coding scheme is equivalent to the original scheme using Z as the random parameter.

Let the coding rate tuple incurred when the message is x and the random parameter takes the value z be given. (Here the coding scheme can be variable-length, so the rate tuple may depend on x and z.) Since X and Z are independent, the average coding rate tuple of the coding scheme is the expectation of this rate tuple over X and Z.

Now observe that for a fixed z, the coding scheme becomes deterministic. Therefore, the probabilistic coding scheme is actually a mixture of deterministic coding schemes. By time-sharing these deterministic coding schemes according to the distribution of Z (using approximation if necessary), we obtain a deterministic coding scheme. Hence, any coding rate tuple achievable by a probabilistic coding scheme can be achieved asymptotically by a sequence of deterministic coding schemes.

ACKNOWLEDGMENT

Raymond Yeung would like to thank Ho-leung Chan and Lihua Song for their useful inputs.

REFERENCES
[1] B. Bollobás, Graph Theory: An Introductory Course. New York: Springer-Verlag, 1979.
[2] T. M. Cover and J. A. Thomas, Elements of Information Theory. New York: Wiley, 1991.
[3] A. El Gamal and T. M. Cover, "Achievable rates for multiple descriptions," IEEE Trans. Inform. Theory, vol. IT-28, pp. 851–857, Nov. 1982.
[4] S.-Y. R. Li and R. W. Yeung, "Linear network coding," IEEE Trans. Inform. Theory, submitted for publication.
[5] K. P. Hau, "Multilevel diversity coding with independent data streams," M.Phil. thesis, The Chinese Univ. Hong Kong, June 1995.
[6] R. Koetter and A. Vardy, "Factor graphs, trellis formations, and generalized state realizations," presented at the Institute for Mathematics and Its Applications, Aug. 6, 1999, available at http://www.ima.umn.edu/talks/workshops/aug12-13.99/koetter.html.
[7] F. R. Kschischang, B. J. Frey, and H.-A. Loeliger, "Factor graphs and the sum-product algorithm," IEEE Trans. Inform. Theory, submitted for publication.
[8] J. R. Roche, R. W. Yeung, and K. P. Hau, "Symmetrical multilevel diversity coding," IEEE Trans. Inform. Theory, vol. 43, pp. 1059–1064, May 1997.
[9] D. Slepian and J. K. Wolf, "Noiseless coding of correlated information sources," IEEE Trans. Inform. Theory, vol. IT-19, pp. 471–480, July 1973.
[10] R. C. Singleton, "Maximum distance q-nary codes," IEEE Trans. Inform. Theory, vol. IT-10, pp. 116–118, Apr. 1964.
[11] N. Wiberg, H.-A. Loeliger, and R. Koetter, "Codes and iterative decoding on general graphs," Euro. Trans. Telecommun., vol. 6, pp. 513–526, Sept. 1995.
[12] R. W. Yeung, "Multilevel diversity coding with distortion," IEEE Trans. Inform. Theory, vol. 41, pp. 412–422, Mar. 1995.
[13] R. W. Yeung and Z. Zhang, "On symmetrical multilevel diversity coding," IEEE Trans. Inform. Theory, vol. 45, pp. 609–621, Mar. 1999.
[14] ——, "Distributed source coding for satellite communications," IEEE Trans. Inform. Theory, vol. 45, pp. 1111–1120, May 1999.
