Ahl Swede 00
I. INTRODUCTION
Let $V$ be the set of nodes of a point-to-point communication network. Such a network is represented by a directed graph $G = (V, E)$, where $E$ is the set of edges, such that information can be sent noiselessly from node $i$ to node $j$ for all $(i, j) \in E$. An example of this type of network is the Internet backbone, where with proper data link protocols information can be sent between nodes essentially free of noise.
Let $X_1, \ldots, X_m$ be mutually independent information sources. The information rate (in bits per unit time) of $X_s$ is denoted by $h_s$, and let $\mathbf{h} = [h_1 \cdots h_m]$. Let $a : \{1, \ldots, m\} \to V$ and $b : \{1, \ldots, m\} \to 2^V$ be arbitrary mappings. The source $X_s$ is generated at node $a(s)$, and it is multicast to node $i$ for all $i \in b(s)$. The mappings $a$, $b$ and the vector $\mathbf{h}$ specify a set of multicast requirements.
In this model, the graph may represent a physical network, while the set of multicast requirements may represent the aggregated traffic pattern the network needs to support. In other situations, the graph may represent a subnetwork in a physical network, while the set of multicast requirements may pertain to a specific application on this subnetwork, e.g., a video-conference call.
Manuscript received February 25, 1998; revised March 6, 2000. This work was supported in part under Grants CUHK95E/480 and CUHK332/96E from the Research Grant Council of the Hong Kong Special Administrative Region, China. The material in this paper was presented in part at the IEEE International Symposium on Information Theory, MIT, Cambridge, MA, August 16-21, 1998.
R. Ahlswede is with Fakultät für Mathematik, Universität Bielefeld, 33501 Bielefeld, Germany (e-mail: [email protected]).
N. Cai was with Fakultät für Mathematik, Universität Bielefeld, 33501 Bielefeld, Germany. He is now with the Department of Information Engineering, The Chinese University of Hong Kong, N.T., Hong Kong (e-mail: [email protected]).
S.-Y. R. Li and R. W. Yeung are with the Department of Information Engineering, The Chinese University of Hong Kong, N.T., Hong Kong (e-mail: [email protected]; [email protected]).
Communicated by R. L. Cruz, Associate Editor for Communication Networks.
Publisher Item Identifier S 0018-9448(00)05297-4.
In existing computer networks, each node functions as a
switch in the sense that it either relays information from an
input link to an output link, or it replicates information received
from an input link and sends it to a certain set of output links.
From the information-theoretic point of view, there is no
reason to restrict the function of a node to that of a switch.
Rather, a node can function as an encoder in the sense that
it receives information from all the input links, encodes, and
sends information to all the output links. From this point of
view, a switch is a special case of an encoder. In the sequel, we
will refer to coding at a node in a network as network coding.
Let $R_{ij}$ be a nonnegative real number associated with the edge $(i, j)$, and let $\mathbf{R} = [R_{ij}, (i, j) \in E]$. For a fixed set of multicast requirements, a vector $\mathbf{R}$ is admissible if and only if there exists a coding scheme satisfying the set of multicast requirements such that the coding rate from node $i$ to node $j$ (i.e., the average number of bits sent from node $i$ to node $j$ per unit time) is less than or equal to $R_{ij}$ for all $(i, j) \in E$. (At this point we leave the details of a coding scheme open because it is extremely difficult to define the most general form of a coding scheme. A class of coding schemes called $\alpha$-codes will be studied in Section III.) In graph theory, $R_{ij}$ is called the capacity of the edge $(i, j)$. Our goal is to characterize the admissible coding rate region $\mathcal{R}$, i.e., the set of all admissible $\mathbf{R}$, for any graph $G$ and multicast requirements $a$, $b$, and $\mathbf{h}$.
The model we have described includes both multilevel diversity coding (without distortion) [12], [8], [13] and distributed source coding [14] as special cases. As an illustration, let us show how the multilevel diversity coding system in Fig. 1 can be formulated as a special case of our model. In this system, there are two sources, $X_1$ and $X_2$. Decoder 1 reconstructs $X_1$ only, while all other decoders reconstruct both $X_1$ and $X_2$. Let $r_i$ be the coding rate of Encoder $i$, $i = 1, 2, 3$. In our model, the system is represented by the graph $G$ in Fig. 2. In this graph, node 1 represents the source, nodes 2, 3, and 4 represent the inputs of Encoders 1, 2, and 3, respectively, nodes 5, 6, and 7 represent the outputs of Encoders 1, 2, and 3, respectively, while nodes 8, 9, 10, and 11 represent the inputs of Decoders 1, 2, 3, and 4, respectively. The mappings $a$ and $b$ are specified as
$$a(1) = 1, \quad a(2) = 1$$
and
$$b(1) = \{8, 9, 10, 11\}, \quad b(2) = \{9, 10, 11\}$$
and $\mathbf{h} = [h_1 \; h_2]$ represents the information rates of $X_1$ and $X_2$.
Now all the edges in $E$ except for $(2, 5)$, $(3, 6)$, and $(4, 7)$ correspond to straight connections in Fig. 1, so there is no constraint on the coding rate in these edges. Therefore, in order to determine $\mathcal{R}$, the set of all admissible $\mathbf{R}$ for the graph $G$ (with the set of multicast requirements specified by $a$, $b$, and $\mathbf{h}$), we set $R_{ij} = \infty$ for all edges in $E$ except for $(2, 5)$, $(3, 6)$, and $(4, 7)$ to obtain the admissible coding rate region of the problem in Fig. 1.
A major finding in this paper is that, contrary to one's intuition, it is in general not optimal to consider the information
to be multicast in a network as a fluid which can simply be
routed or replicated at the intermediate nodes. Rather, network
coding has to be employed to achieve optimality. This fact is illustrated by examples in the next section.
In the rest of the paper, we focus our discussion on problems with $m = 1$, which we collectively refer to as the single-source problem. For problems with $m \geq 2$, we refer to them collectively as the multisource problem. The rest of the paper is organized as follows. In Section II, we propose a Max-flow Min-cut
theorem which characterizes the admissible coding rate region
of the single-source problem. In Section III, we formally state
the main result in this paper. The proof is presented in Sections
IV and V. In Section VI, we show that very simple optimal codes
do exist for certain networks. In Section VII, we use our results
for the single-source problem to solve a special case of the multisource problem which has application in video conferencing.
In this section, we also show that the multisource problem is
extremely difficult in general. Concluding remarks are in Section VIII.
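Fig. 4.
II. A MAX-FLOW MIN-CUT THEOREM
In this section, we propose a theorem which characterizes the admissible coding rate region for the single-source problem. For this problem, we let $a(1) = s$ and $b(1) = \{t_1, \ldots, t_L\}$. In other words, the information source is generated at node $s$ and is multicast to nodes $t_1, \ldots, t_L$.
$F = [F_{ij}, (i, j) \in E]$ is a flow in $G$ from $s$ to $t_l$ if for all $(i, j) \in E$
$$0 \leq F_{ij} \leq R_{ij}$$
and for all $i \in V$ except for $s$ and $t_l$
$$\sum_{i' : (i', i) \in E} F_{i'i} = \sum_{j : (i, j) \in E} F_{ij}$$
i.e., the total flow into node $i$ is equal to the total flow out of node $i$. $F_{ij}$ is referred to as the value of $F$ in the edge $(i, j)$.
The value of $F$ is defined as
$$\sum_{j : (s, j) \in E} F_{sj} - \sum_{i : (i, s) \in E} F_{is}$$
which is equal to
$$\sum_{i : (i, t_l) \in E} F_{i t_l} - \sum_{j : (t_l, j) \in E} F_{t_l j}.$$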
a max-flow from $s$ to $t_l$, $l = 1, \ldots, L$, are greater than or equal to $h$, the rate of the information source.
The spirit of our conjecture resembles that of the celebrated
Max-flow Min-cut Theorem in graph theory [1]. Before we end
this section, we give a few examples to illustrate our conjecture.
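As a minimal sketch of how the condition in the conjecture could be checked numerically, the following Python fragment computes max-flow values with the Edmonds-Karp algorithm and compares them with the source rate. The function names, the edge-dictionary representation, and the nine-edge butterfly-shaped test network with unit capacities are illustrative assumptions, not taken from the figures of this paper.

```python
from collections import deque


def max_flow(capacity, source, sink):
    """Edmonds-Karp max-flow; capacity maps (u, v) -> nonnegative capacity."""
    residual, adj = {}, {}
    for (u, v), c in capacity.items():
        residual[(u, v)] = residual.get((u, v), 0) + c
        residual.setdefault((v, u), 0)          # reverse edge for the residual graph
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    total = 0
    while True:
        # Breadth-first search for a shortest augmenting path.
        parent = {source: None}
        queue = deque([source])
        while queue and sink not in parent:
            u = queue.popleft()
            for v in adj.get(u, ()):
                if v not in parent and residual[(u, v)] > 0:
                    parent[v] = u
                    queue.append(v)
        if sink not in parent:
            return total
        # Walk back from the sink, find the bottleneck, and augment.
        path, v = [], sink
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        bottleneck = min(residual[e] for e in path)
        for (u, v) in path:
            residual[(u, v)] -= bottleneck
            residual[(v, u)] += bottleneck
        total += bottleneck


def satisfies_maxflow_condition(capacity, source, sinks, h):
    """True if the max-flow from the source to every sink is at least h."""
    return all(max_flow(capacity, source, t) >= h for t in sinks)


# Hypothetical butterfly-shaped network with unit capacities (an assumption).
butterfly = {
    ("s", 1): 1, ("s", 2): 1,
    (1, "t1"): 1, (2, "t2"): 1,
    (1, 3): 1, (2, 3): 1,
    (3, 4): 1,
    (4, "t1"): 1, (4, "t2"): 1,
}
```

Under these assumptions, `satisfies_maxflow_condition(butterfly, "s", ["t1", "t2"], 2)` evaluates the condition sink by sink; the check says nothing about how a code achieving the rate is built.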
We first illustrate by the example in Fig. 5 that the conjecture is true for $L = 1$. Fig. 5(a) shows the capacity of each edge. By the Max-flow Min-cut Theorem [1], the value of a max-flow from $s$ to $t_1$ is 3, so the flow in Fig. 5(b) is a max-flow. In Fig. 5(c), we show how we can send three bits $b_1, b_2, b_3$ from $s$ to $t_1$ based on the max-flow in Fig. 5(b). The conjecture is trivially seen to be true for $L = 1$, because when there is only one sink, we only need to treat the raw information bits as physical entities.
The bits are routed at the intermediate nodes according to any
fixed routing scheme, and they will all eventually arrive at the
sink. Since the routing scheme is fixed, the sink knows which
bit is coming in from which edge, and the information can be
recovered accordingly.
Next we illustrate by the example in Fig. 6 that the conjecture is true for $L = 2$. Fig. 6(a) shows the capacity of each edge. It is easy to check that the values of a max-flow from $s$ to $t_1$ and from $s$ to $t_2$ are 5 and 6, respectively. So the conjecture asserts that we can send 5 bits $b_1, \ldots, b_5$ to $t_1$ and $t_2$ simultaneously, and Fig. 6(b) shows such a scheme. Note that in this scheme, bits only need to be replicated at the nodes to achieve optimality.
We now show another example in Fig. 7 to illustrate that the conjecture is true for $L = 2$. Fig. 7(a) shows the capacity of each edge. It is easy to check that the value of a max-flow from $s$ to $t_l$ is 2, $l = 1, 2$. So the conjecture asserts that we can send 2 bits $b_1, b_2$ to $t_1$ and $t_2$ simultaneously, and Fig. 7(b) shows such a scheme, where "$+$" denotes modulo 2 addition. At $t_1$, $b_2$ can be recovered from $b_1$ and $b_1 + b_2$. Similarly, $b_1$ can be recovered at $t_2$. Note that when there is more than one sink, we can no longer think of information as a real entity, because information needs to be replicated or transformed at the nodes. In this example, information is coded at node 3, which is unavoidable. For $L \geq 2$, network coding is in general necessary in an optimal multicast scheme.
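The coding step in this example can be sketched in a few lines of Python. The sketch assumes the usual butterfly layout (the source feeds nodes 1 and 2, node 3 combines what it receives from them, and node 4 replicates the coded bit to both sinks); the function name and this wiring are assumptions made for illustration, and bitwise XOR stands in for modulo 2 addition.

```python
from itertools import product


def butterfly_multicast(b1, b2):
    """Multicast bits b1, b2 to two sinks, sending one bit per edge."""
    # The source sends b1 towards node 1 and b2 towards node 2.
    at_node1, at_node2 = b1, b2
    # Node 3 receives both bits and encodes them (modulo 2 addition).
    coded = at_node1 ^ at_node2
    # Node 4 replicates the coded bit to both sinks.
    # Sink t1 sees b1 directly and recovers b2 from b1 and b1 + b2.
    t1 = (at_node1, at_node1 ^ coded)
    # Sink t2 sees b2 directly and recovers b1 from b2 and b1 + b2.
    t2 = (at_node2 ^ coded, at_node2)
    return t1, t2


# Every pair of input bits is recovered at both sinks.
for b1, b2 in product((0, 1), repeat=2):
    assert butterfly_multicast(b1, b2) == ((b1, b2), (b1, b2))
```

Without the XOR at the bottleneck node, the single shared edge could carry only one of the two bits, so one sink would miss a bit.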
Finally, we illustrate by the example in Fig. 8 that the conjecture is true for $L = 3$. Fig. 8(a) shows the capacity of each edge. It is easy to check that the values of a max-flow from $s$ to all the sinks are 2. In Fig. 8(b), we show how we can multicast 2 bits $b_1$ and $b_2$ to all the sinks.
Fig. 6.
Fig. 7.
The advantage of network coding can be seen from the examples in Figs. 7 and 8. As an illustration, we will quantify this advantage for the example in Fig. 8 in two ways. First, we investigate the saving in bandwidth when network coding is allowed. For the scheme in Fig. 8(b), a total of 9 bits are sent. If network coding is not allowed, then it is easy to see that at least one more bit has to be sent in order for all the sinks to recover both $b_1$ and $b_2$. Thus we see that a very simple network code can save 10% in bandwidth. Second, we investigate the increase in throughput when network coding is allowed. Using the scheme in Fig. 8(b), if 2 bits are sent in each edge, then 4 bits can be multicast to all the sinks. If network coding is not allowed (and 2 bits are sent in each edge), we now show that only 3 bits can be multicast to all the sinks. Let $B = \{b_1, \ldots, b_\kappa\}$ be the set of bits to be multicast to all the sinks. Let the set of bits sent in the edge $(s, i)$ be $B_i$, where $|B_i| = 2$, $i = 1, 2, 3$. At node $i$, the received bits are duplicated and sent in the two out-going edges. Thus 2 bits are sent in each edge in the network. Since network coding is not allowed, $B = B_i \cup B_j$ for any $1 \leq i < j \leq 3$. Then we have
$$B_3 \cup (B_1 \cap B_2) = (B_3 \cup B_1) \cap (B_3 \cup B_2) = B.$$
Therefore,
$$\kappa = |B_3 \cup (B_1 \cap B_2)| \leq |B_3| + |B_1 \cap B_2| = |B_3| + |B_1| + |B_2| - |B_1 \cup B_2| = 6 - \kappa$$
which implies $\kappa \leq 3$. In Fig. 8(c), we show how 3 bits $b_1$, $b_2$, and $b_3$ can be multicast to all the sinks by sending 2 bits in each edge. Therefore, the throughput of the network can be increased by one-third using a very simple network code.
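The routing-only bound in this paragraph can be checked by exhaustive search. The sketch below assumes the structure suggested by the argument: the source sends a 2-bit set to each of three intermediate nodes, each node only replicates what it receives, and each sink hears from a distinct pair of intermediate nodes, so every pairwise union of the three sets must contain all the bits. The function names and the search limit are illustrative assumptions.

```python
from fractions import Fraction
from itertools import combinations, product


def routing_feasible(kappa, per_edge=2):
    """True if kappa bits can reach all three sinks using replication only."""
    bits = frozenset(range(kappa))
    size = min(per_edge, kappa)
    choices = [frozenset(c) for c in combinations(range(kappa), size)]
    for b1, b2, b3 in product(choices, repeat=3):
        # Each sink hears from one pair of nodes and must see every bit.
        if b1 | b2 == bits and b1 | b3 == bits and b2 | b3 == bits:
            return True
    return False


def max_routing_throughput(per_edge=2, limit=8):
    """Largest kappa (up to limit) that replication alone can multicast."""
    return max(k for k in range(1, limit + 1) if routing_feasible(k, per_edge))


# Figures quoted in the text: 9 bits sent with coding versus 10 without,
# and 4 bits multicast with coding versus 3 without.
bandwidth_saving = Fraction(10 - 9, 10)
throughput_gain = Fraction(4 - 3, 3)
```

Under these assumptions `max_routing_throughput()` agrees with the counting argument, and the two fractions reproduce the 10% and one-third figures.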
III. MAIN RESULT
In this section, we formally present the main result in this paper. Let $G = (V, E)$ be a directed graph with source $s$ and sinks $t_1, \ldots, t_L$, and $R_{ij}$ be the capacity of an edge $(i, j)$ in $E$. Since our conjecture concerns only the values of max-flows from the source to the sinks, we assume without loss of generality that there is no edge in $E$ from a node (other than $s$) to $s$, because such an edge does not increase the value of a max-flow from the source to a sink.
1) a positive integer
2)
, such that
.
,
3)
where
4) If
, then
, otherwise
, such that
where
and
5)
, where
for all
,
as a function of .
Thus
, we have
for all
to
and
Let
where
and
denotes the value of as a function of
.
is all the information known by during the whole
coding session when the message is . Since for an -code,
is a function of the information previously received by
, we see inductively that
is a function of
node
for all
Since
such that
, where
such that
, and
for all
such that
for all
(redenotes the value of
as a function of ). In
call that
is the encoding function for the edge
, while
the above,
is the decoding function for the sink . In the coding sesis applied before
if
, and
is applied
sion,
before
if
. This defines the order in which the enif
, all
coding functions are applied. Since
the necessary information is available when encoding at node
is done. If the set
is empty, we adopt the
is an arbitrary constant taken from the set
convention that
. An
-code is a special
-code defined in Section III.
case of an
being the
Now assume that the vector is such that, with
in , for all
, the values of
capacity of the edge
a max-flow from to is greater than or equal to . It suffices
, there exists for sufficiently
for us to show that for any
-code on such that
large an
. Instead, we will show the existence of an
-code satisfying the same set of conditions, and this will be done by a random procedure. For the
, where
time being, let us replace by
is any constant greater than . Thus the domain of
is exfor
.
panded from to
We now construct the encoding functions as follows. For all
such that
, for all
,
is defined
to be a value selected independently from the set
with uniform distribution. For all
, and for all
where
Let
take
for some
. Then
for all
Further,
for some
is defined to be a value selected independently from the
with uniform distribution.
, and for
, let
, where
and
denotes the
as a function of .
is all the information
value of
received by node during the coding session when the message
, and
are indistinguishable at
is . For distinct
. For all
, define
the sink if and only if
set
Let
if
for some
otherwise.
Let
. Then
as
.
Hence, there exists a deterministic code for which the number
of messages which can be uniquely determined at all sinks is at
least
and
Obviously,
Now suppose
, where
Therefore,
and
and
.
. Then
. For any
Therefore,
for some
where
such that
to be
and
-code. The
B. Cyclic Networks
For cyclic networks, there is no natural ordering of the nodes
which allows coding in a sequential manner as in our discussion
on acyclic networks in the last section. In this section, we will
prove our result in full generality which involves the construction of a more elaborate code.
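Consider any graph $G = (V, E)$ (acyclic or cyclic) with source $s$ and sinks $t_1, \ldots, t_L$, and the capacity of an edge $(i, j)$ given by $R_{ij}$. Assume for all $l = 1, \ldots, L$, the value of a max-flow from $s$ to $t_l$ is greater than or equal to $h$. We will prove that $(\mathbf{R}, h, G)$ is $\alpha$-admissible.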
We first construct a time-parametrized graph
from the graph . The set
consists of
layers of nodes,
each of which is a copy of . Specifically,
(6)
where
(7)
, let
Let be the length of . For an edge
be the distance of node from along . Clearly,
where
Now for
, define
1)
2)
3)
, let
be the source, and let
be a sink
For
. Clearly,
which corresponds to the sink in ,
is acyclic because each edge in
ends at a vertex in a layer
with a larger index.
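The reason a layered graph is acyclic can be illustrated with a simplified time expansion. The sketch below is only an assumption-laden illustration: it copies the node set into layers and turns each edge $(i, j)$ into edges from layer $k$ to layer $k + 1$; it omits the extra source and sink edges and the capacity assignments of the construction in this section. The function names and the small cyclic test graph are hypothetical.

```python
def time_expand(edges, num_layers):
    """Layered copy of a graph: (i, k) -> (j, k + 1) for every edge (i, j)."""
    return [((i, k), (j, k + 1))
            for k in range(num_layers - 1)
            for (i, j) in edges]


def is_acyclic(edges):
    """Kahn's algorithm: True if the directed graph has no cycle."""
    indegree, successors = {}, {}
    for u, v in edges:
        indegree.setdefault(u, 0)
        indegree[v] = indegree.get(v, 0) + 1
        successors.setdefault(u, []).append(v)
    ready = [n for n, d in indegree.items() if d == 0]
    removed = 0
    while ready:
        u = ready.pop()
        removed += 1
        for v in successors.get(u, ()):
            indegree[v] -= 1
            if indegree[v] == 0:
                ready.append(v)
    # Every node is removed exactly when no cycle blocks the process.
    return removed == len(indegree)


# Hypothetical cyclic graph: nodes a and b form a cycle.
cyclic = [("s", "a"), ("a", "b"), ("b", "a"), ("b", "t")]
layered = time_expand(cyclic, num_layers=4)
```

Because every edge of `layered` goes from layer $k$ to layer $k + 1$, no directed path can return to an earlier layer, which is the property the text relies on.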
be given by
Let the capacities of the edges in
. Let
be the capacity of an
, where
edge
if
for some
otherwise
and let
if
if
where
if
otherwise.
and
(5)
(8)
Since
and
such that
, a constant
taken from
2) for
for all
In the above, the third equality follows from (8), and the first
inequality is justified as follows. First, the inequality is justified
since
for
. For
,
for
, we
we distinguish two cases. From (8) and (6), if
have
If
, we have
such that
, where
and for
for all
such that
(if the set
is empty, we adopt the convention that
;
stant taken from
is a con-
3) for
for all
. For this -code on
, let us use
to
, and use
denote the encoding function for an edge
to denote the decoding function at the sink ,
.
Without loss of generality, we assume that for
for all
such that
denotes the value of
for all
(recall that
as a function of );
where
such that
,
1) for
is an arbitrary constant in
is empty);
, for all
,
and all
for all
2) for
and for
(
since
such that
,
,
in
in
3) for
for all
in
and
for all
in
Note that if the -code does not satisfy these assumptions, it can
readily be converted into one.
for all
such that
, i.e., the information
.
received by node during Phase
, for
, the sink uses to
3) In Phase
decode .
From the definitions, we see that an
-code on is a special case of an
-code on . For the -code we have constructed
Fig. 9.
for all
. Finally, for any
large , we have
, by taking a sufficiently
We now show that is admissible by presenting a coding
from the
scheme which can multicast
source to all the sinks. To simplify notation, we adopt the confor
. At time
, information
vention that
transactions occur in the following order:
is -admissible.
VI. AN EXAMPLE
Despite the complexity of our proof of Theorem 1 in the last
two sections, we will show in this section that very simple optimal codes do exist for certain cyclic networks. Therefore, there
is much room for further research on how to design simple optimal codes for (single-source) network information flow. The
code we construct in this section can be regarded as a kind of
convolutional code, which possesses many desirable properties
of a practical code.
in Fig. 9, where
Consider the graph
and
1)
2)
3)
4)
5)
;
;
and
;
;
T1.
T2.
T3.
T4.
T5.
T6.
T7.
T8.
T9.
T10.
T11.
sends
sends
sends
sends
sends
sends
sends
sends
decodes
decodes
decodes
to ,
to ,
, and
,
to
to
to
to
to
to
Similarly, since
is multicast to Decoders 2, 3, and 4, we have
:
the following constraints for
1The
coding problems, the problem degenerates if there is no rate-distortion consideration and the sources are mutually independent.
For our class of problems, neither of these assumptions is made.
Yet they are highly nontrivial problems.
In this paper, we have characterized the admissible coding
rate region of the single-source problem. Our result can be regarded as the Max-flow Min-cut Theorem for network information flow. We point out that our discussion is based on a class of
block codes called $\alpha$-codes. Therefore, it is possible, though not
likely, that our result can be enhanced by considering more general coding schemes. Nevertheless, we prove in the Appendix
that probabilistic coding does not improve performance.
In analog telephony, when a point-to-point call is established,
there is a physical connection between the two parties. When
a conference call is established, there is a physical connection
among all the parties involved. In computer communication
(which is digital), we used to think that for multicasting, there
must be a logical connection among all the parties involved
such that raw information bits are sent to the destinations
via such a connection. The notion of a logical connection
in computer communication is analogous to the notion of a
physical connection in analog telephony. As a result, multicasting in a computer network has traditionally been thought of as replicating bits at the nodes, so that each sink eventually receives a copy of all the bits. The most important contribution
of the current paper is to show that the traditional technique for
multicasting in a computer network in general is not optimal.
Rather, we should think of information as being diffused
through the network from the source to the sinks by means
of network coding. This is a new concept in multicasting in a
point-to-point network which may have significant impact on
future design of switching systems.
In classical information theory for point-to-point communication, we can think of information as a fluid or some kind of
physical entity. For network information flow with one source,
this analogy continues to hold when there is one sink, because
information flow conserves at all the intermediate nodes in an
optimal scheme. However, the analogy fails for multicasting because information needs to be replicated or coded at the nodes.
The problem becomes more complicated when there is more than one source. In the classical information theory for
point-to-point communication, if two sources are independent,
optimality can be achieved (asymptotically) by coding the
sources separately. However, it has been shown by a simple
example in [12] that for simultaneous multicast of two sources,
it may be necessary to code the sources jointly in order to
achieve optimality. A special case of the multisource multisink
problem which finds application in satellite communication
has been studied in [14], where inner and outer bounds on the admissible coding rate region were obtained.
For future research, the multisource multisink problem is a
challenging problem. For the single-source problem, there are
still many unresolved issues which are worth further investigation. In proving our result for acyclic graphs, we have used a
random block code. Recently, Li and Yeung [4] have devised a
systematic procedure to construct linear codes for acyclic networks. Along another line, the example in Section VI shows
that convolutional codes are good alternatives to block codes.
It seems that convolutional codes have the advantage that the
code can be very simple, and the memory at each node and the
end-to-end decoding delay can be very small. These are all desirable features for practical codes.
Finally, by imposing the constraint that network coding is not
allowed, i.e., each node functions as a switch in existing computer networks, we can ask whether a rate tuple $\mathbf{R}$ is admissible. Also, we can ask under what conditions optimality can be
achieved without network coding. These are interesting problems for further research.
Recently, there has been a lot of interest in factor graphs [7], a graphical model which subsumes Markov random fields, Bayesian networks, and Tanner graphs. In particular, the problem
of representing codes in graphs [11], [6] has received much
attention. The codes we construct for a given network in this
paper can be regarded as a special type of codes in a graph.
APPENDIX
PROBABILISTIC CODING DOES NOT IMPROVE PERFORMANCE
For an $\alpha$-code, the $k$th transaction of the coding process is specified by a mapping $f_k$. Suppose instead of the mapping $f_k$, the $k$th transaction is specified by a transition matrix from the domain of $f_k$ to the range of $f_k$. Also, instead of the mapping $g_l$, decoding at sink $t_l$ is specified by a transition matrix from the domain of $g_l$ to the range of $g_l$, $l = 1, \ldots, L$. Then the code becomes a probabilistic code, and we refer to such a code as a probabilistic $\alpha$-code. With a slight abuse of notation, we continue to
to denote the code in the th transaction (where
is a
use
to denote
.
random variable), and we use
In general, one can use probabilistic coding schemes instead of deterministic coding schemes. By using probabilistic schemes, it might seem possible to multicast information from $s$ to $t_l$, $l = 1, \ldots, L$, at a rate higher than that permitted by deterministic schemes. Before showing that this is impossible, however, we first discuss a subtlety of probabilistic coding.
For a probabilistic -code on a graph , it seems intuitively
and any
such that
correct that for any
and
, the information source ,
,
form a Markov chain because all the information sent
and
from to has to go through the set of nodes
. If this is the case, then by the Data Processing Theorem
[2], we have
Fig. 10.
A three-node network.
and
Now observe that for a fixed value of the random parameter, the coding scheme becomes deterministic. Therefore, the probabilistic coding scheme is actually a mixture of deterministic coding schemes. By time-sharing these deterministic coding schemes according to the distribution of the random parameter (using an approximation if necessary), we obtain a deterministic coding scheme. Hence, any coding rate tuple achievable by a probabilistic coding scheme can be achieved asymptotically by a sequence of deterministic coding schemes.
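The time-sharing argument can be illustrated numerically. The sketch below makes a simplifying assumption that the random parameter takes finitely many values with rational probabilities (the text allows a real-valued parameter), and it represents each deterministic scheme only by its coding rate tuple. The function names and the two-scheme example are hypothetical.

```python
from fractions import Fraction


def mixture_rate(schemes):
    """Average rate tuple of a mixture given as [(probability, rate_tuple), ...]."""
    n = len(schemes[0][1])
    return tuple(sum(p * r[e] for p, r in schemes) for e in range(n))


def time_shared_rate(schemes, blocks):
    """Rate tuple when scheme k is run for round(p_k * blocks) blocks."""
    # Rounding is the approximation step: block counts must be integers.
    counts = [round(p * blocks) for p, _ in schemes]
    used = sum(counts)
    n = len(schemes[0][1])
    return tuple(
        Fraction(sum(c * r[e] for c, (_, r) in zip(counts, schemes)), used)
        for e in range(n)
    )


# Two hypothetical deterministic schemes on a two-edge network.
schemes = [(Fraction(1, 3), (3, 0)), (Fraction(2, 3), (0, 3))]
```

With a block count that is a multiple of 3, the time-shared rate tuple equals the mixture average exactly; for other block counts the gap shrinks as the number of blocks grows, which mirrors the asymptotic statement in the text.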
ACKNOWLEDGMENT
Raymond Yeung would like to thank Ho-leung Chan and
Lihua Song for their useful inputs.
REFERENCES
Now from , both
and
can be recovered. However,
, it is impossible to recover
.
from
Therefore, the Markov chain asserted in the last paragraph is
invalid.
We now show that the use of probabilistic coding cannot reduce coding rates. Consider any probabilistic coding scheme,
and let be the random parameter (assumed to be real) of the
. Without loss of genscheme with distribution function
erality, we assume that is independent of . This assumption
can be justified by showing that if is not independent of ,
then we can construct an equivalent probabilistic coding scheme
is independent of . Define an inwhose random parameter
, where denotes
dependent random vector
,
are mutually independent;
the alphabet set of ;
has marginal distribution function
. We then
and
as the random parameter of the coding scheme when the
use
message is . This coding scheme, which uses as the random
parameter, is equivalent to the original scheme using as the
random parameter.
be the coding rate tuple
Let
incurred when the message is and the random parameter takes
the value . (Here the coding scheme can be variable-length, so
may depend on .) Since and are independent, the average
coding rate tuple of this coding scheme is given by