
IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN OF INTEGRATED CIRCUITS AND SYSTEMS, VOL. 26, NO. 8, AUGUST 2007 1417

A Network-Flow-Based RDL Routing Algorithm for Flip-Chip Design
Jia-Wei Fang, Student Member, IEEE, I-Jye Lin, Student Member, IEEE,
Yao-Wen Chang, Member, IEEE, and Jyh-Herng Wang, Member, IEEE

Abstract—The flip-chip package gives the highest chip density
of any packaging method to support the pad-limited application-specific
integrated circuit designs. In this paper, we propose the
first router for the flip-chip package in the literature. The router
can redistribute nets from wire-bonding pads to bump pads and
then route each of them. The router adopts a two-stage technique
of global routing followed by detailed routing. In global routing,
we use the network flow algorithm to solve the assignment problem
from the wire-bonding pads to the bump pads and then create
the global path for each net. The detailed routing consists of
three stages, namely: 1) cross-point assignment; 2) net ordering
determination; and 3) track assignment, to complete the routing.
Experimental results based on seven real designs from the industry
demonstrate that the router can reduce the total wirelength by
10.2%, the critical wirelength by 13.4%, and the signal skews by
13.9%, as compared with a heuristic algorithm currently used in
industry.
Index Terms—Detailed routing, global routing, physical design.

Manuscript received January 4, 2006; revised September 17, 2006. This work was supported in part by Faraday Technology Corporation of Taiwan and in part by the National Science Council of Taiwan under Grants NSC 93-2215-E-002-009, NSC 93-2215-E-002-029, and NSC 93-2752-E-002-008-PAE. This paper was presented in part at the 2005 IEEE/ACM International Conference on Computer-Aided Design, San Jose, CA, November 2005. This paper was recommended by Associate Editor J. Hu.
J.-W. Fang and I-J. Lin are with the Graduate Institute of Electronics Engineering, National Taiwan University, Taipei 106, Taiwan, R.O.C. (e-mail: [email protected]; [email protected]).
Y.-W. Chang is with the Department of Electrical Engineering and Graduate Institute of Electronics Engineering, National Taiwan University, Taipei 106, Taiwan, R.O.C. (e-mail: [email protected]).
J.-H. Wang is with Apache Design Solutions, Hsinchu 300, Taiwan, R.O.C. (e-mail: [email protected]).
Digital Object Identifier 10.1109/TCAD.2007.891364
0278-0070/$25.00 © 2007 IEEE

I. INTRODUCTION
A. Flip-Chip Design
DUE TO THE increasing complexity and decreasing feature size of very large scale integration (VLSI) designs, the demand of more I/O pads has become a significant problem of package technologies. A relatively new packaging technology, i.e., the flip-chip package, as shown in Fig. 1, is created for higher integration density and rising power consumption. Flip-chip bonding was first developed by IBM in the 1960s. It gives the highest chip density of any packaging method to support the pad-limited application-specific integrated circuit designs.

Fig. 1. (a) Flip-chip. (b) Flip-chip package.

Flip-chip is not a specific package, or even a package type, e.g., pin grid array (PGA) or ball grid array (BGA). Flip-chip describes the method of electrically connecting the die to the package carrier. The package carrier, which is either a substrate or a lead frame, provides the connection from the die to the outside devices of the package. The die of a PGA/BGA package is attached to the carrier face up, and later, a wire is bonded first to the die, then looped and bonded to the carrier. In contrast, the interconnection between the die and carrier in the flip-chip package is made through a conductive bump ball that is placed directly on the die surface. Finally, the bumped die is flipped over and placed face down, with the bump balls connecting to the carrier directly. The flip-chip technology is the choice in high-speed applications because of the following advantages: reduced signal inductance (high speed), reduced power/ground inductance (low power), reduced package footprint, smaller die size, higher signal density, and lower thermal effect. However, in recent integrated circuit designs, the I/O pads are still placed along the boundary of the die. This placement does not suit the flip-chip package. As a result, we use the top metal or an extra metal layer, which is called a redistribution layer (RDL), as shown in Fig. 2, to redistribute the wire-bonding pads to the bump pads without changing the placement of the I/O pads. Since the RDL is the top metal layer of the die, the routing angle in an RDL cannot be any angle as in the PGA/BGA packages. Bump balls are placed on the RDL and use the RDL to connect to wire-bonding pads by bump pads.
The flip-chip package is generally classified into two types, namely: 1) the peripheral array, as shown in Fig. 3(a), and 2) the area array, as shown in Fig. 3(b). In the peripheral array, the bump balls are placed along the boundary of the flip-chip package. The disadvantage of the peripheral array is that we only have a limited number of bump balls. In the area array,
the bump balls are placed in the whole area of the flip-chip package. The advantage of the area array is that the number of bump balls is much more than that of the peripheral array; thus, it is more suitable for modern VLSI designs. Since the flip-chip design is for high-speed circuits, the issue of signal skews is also important. Thus, a special router, i.e., the RDL router [11], is needed to reroute the peripheral wire-bonding pads to the bump pads and then connect the bump pads to the bump balls. Consider that the routing of multipin nets and the minimization of the total wirelength and the signal skews are also needed for an RDL router. Fig. 3(c) shows one RDL routing result for an area-array flip-chip.

Fig. 2. Cross section of RDL.

Fig. 3. (a) Peripheral array. (b) Area array. (c) RDL routing result.

B. Previous Work
To the best knowledge of the authors, there is no previous work in the literature on the routing problem for flip-chip designs. Similar works are the routing for PGA packages, BGA packages, and planar graphs, including [1]–[4], [8]–[10], and [12]–[15]. Yu and Dai [14] used the geometric and symmetric attributes of the pin positions in the BGA packages to assign pins of the BGA packages. However, in flip-chip designs, the positions of wire-bonding pads and bump pads do not always have these geometric and symmetric attributes. PGA routers are presented in [3] and [10], whereas a BGA router is provided in [4]. These three routers are any-angle multilayer routers without considering the pin assignment problem, single-layer routing, and total wirelength minimization. Wang et al. [12] and Yu et al. [15] applied the minimum-cost network flow algorithm to solve the I/O pin routing problems. All these routers focused only on routability and did not consider multipin nets and signal skews. Wang et al. [12] also did not consider the routing congestion problem. Furthermore, they assumed that wires can be any angle; thus, their methods are not suitable for the RDL routing, typically with a 90° angle routing. For the previous works on the planar routing [1], [2], and [5], since the pins can be placed anywhere in the chip, it is an NP-complete problem, and thus, most likely, there exists no efficient optimal algorithm for the planar routing. In the flip-chip routing, since wire-bonding pads and bump pads are placed in arrays, we can take the advantage of the regular structure to find an efficient algorithm for the RDL routing. Thus, the flip-chip routing problem is also different from the planar routing one.

C. Our Contributions
To our best knowledge, this paper is the first work in the literature to propose an RDL router to handle the routing problem of flip-chip designs with real industry applications. We present a unified network flow formulation to simultaneously consider the concurrent assignment of the wire-bonding pads to the bump pads and the routing between them. Our algorithm consists of two phases. The first phase is the global routing that assigns each wire-bonding pad to a unique bump pad. By formulating the assignment as a maximum-flow problem and applying the minimum-cost maximum-flow (MCMF) algorithm, we can guarantee 100% detailed routing completion after the assignment. The second phase is the detailed routing that efficiently distributes the routing points between two adjacent wire-bonding (bump) pads and assigns wires into tracks. In addition to the traditional single-layer routing with only routability optimization, our RDL router also tries to optimize the total wirelength and the signal skews between a pair of signal nets under the 100% routing completion constraint. Experimental results based on seven real designs from the industry demonstrate that the router can reduce the total wirelength by 10.2%, the critical wirelength by 13.4%, and the signal skews by 13.9%, as compared with a heuristic algorithm currently used in industry.
The rest of this paper is organized as follows: Section II gives the formulation of the RDL routing problem. Section III details our global and detailed routing algorithms. Section IV shows the experimental results. Finally, conclusions are given in Section V.

II. PROBLEM FORMULATION
We introduce the notations used in this paper and formally define the routing problem for flip-chip packages. Fig. 4 shows the modeling of the routing structure of the flip-chip package. Let $P$ be the set of wire-bonding pads, and let $B$ be the set of bump pads. For practical applications, the number of bump pads is larger than or equal to the number of wire-bonding pads, i.e., $|B| \ge |P|$, and each bump pad can be assigned to more than one wire-bonding pad. Let $R^b = \{r^b_1, r^b_2, \ldots, r^b_m\}$ be a set of $m$ bump pad rings in the center of the package, and let $R^p = \{r^p_1, r^p_2, \ldots, r^p_k\}$ be a set of $k$ wire-bonding pad rings at the boundary of the package. Each bump pad ring $r^b_i$ consists of a set of $q$ bump pads $\{b^i_1, b^i_2, \ldots, b^i_q\}$, and each wire-bonding pad ring $r^p_j$ consists of $l$ wire-bonding pads $\{p^j_1, p^j_2, \ldots, p^j_l\}$. Let $N$ be the set of nets (could be two-pin or multipin nets) for routing. Each multipin net $n$ in $N$ is defined by a set of wire-bonding pads and a set of bump pads that should be connected. Each two-pin net can
be assigned to a bump pad not included in the sets of bump pads for the multipin nets. Since the RDL routing for current technology is typically on a single layer, it does not allow wire crossings, for which two wires intersect each other in the routing layer. As shown in Fig. 4, based on the two diagonals of the flip-chip package, we partition the whole package into four sectors, namely: 1) $North = \{P_N, B_N, R^p_N, R^b_N\}$; 2) $East = \{P_E, B_E, R^p_E, R^b_E\}$; 3) $South = \{P_S, B_S, R^p_S, R^b_S\}$; and 4) $West = \{P_W, B_W, R^p_W, R^b_W\}$, where $P_i$ ($B_i$) and $R^p_i$ ($R^b_i$), $i \in \{N, E, S, W\}$, are the set of the wire-bonding (bump) pads and the set of the wire-bonding (bump) pad rings in the $i$ sector, respectively. For practical applications, the wire-bonding pads in one sector only connect to the bump pads in the same sector. We define an interval to be the segment between two adjacent bump pads in the same ring $r^b_i$ or the segment between two adjacent wire-bonding pads in the same ring $r^p_j$. Given a flip-chip routing instance, there are two types of routing, namely: 1) the monotonic routing and 2) the nonmonotonic routing. A monotonic routing can be formally defined as follows.

Fig. 4. Four sectors in a flip-chip package.

Definition 1: A monotonic routing is a routing such that for each net $n$ connecting from a wire-bonding pad $p$ to a bump pad $b$, $n$ intersects exactly one interval in each ring $r^b_i$ and exactly one interval in each ring $r^p_j$.
As shown in Fig. 5(a), the nets $n_2$ and $n_4$ are monotonic routes. If we exchange the positions of two bump pads $b_2$ and $b_4$, the routings of $n_2$ and $n_4$ are nonmonotonic, as shown in Fig. 5(b). The wirelengths of the nets $n_2$ and $n_4$ are increased. This shows a drawback of the nonmonotonic routing. Since the nonmonotonic routing occupies more routing resource, it causes significant problems for the single-layer routing. Thus, a good flip-chip package routing should be a monotonic routing without detours, as shown in Fig. 6, because the monotonic routing results in smaller total wirelength and higher routing completion, as compared to the nonmonotonic routing. Furthermore, the signal skew, i.e., the difference of wirelength between the longest net and the shortest one, should also be considered for routing on the flip-chip package.

Fig. 5. (a) Monotonic routing. (b) Nonmonotonic routing.

Fig. 6. Monotonic routing with and without detours.

Based on the aforementioned definition, the routing problem can be formally defined as follows.
Problem 1: The single-layer flip-chip routing problem is to connect a set of $p \in P$ and a set of $b \in B$ so that no wire crosses each other, the routing is monotonic, and the total wirelength and the signal skew are minimized.

III. ROUTING ALGORITHM
In this section, we present our routing algorithm. First, we give the overview of our algorithm. Then, we detail the methods used in each phase.

A. Algorithm Overview
According to the routing flow shown in Fig. 7, our algorithm consists of two phases, namely: 1) global routing based on the MCMF algorithm [5] and 2) detailed routing based on the cross-point assignment, the net ordering determination, and the track assignment.
In the first phase, we construct four flow networks, namely: 1) $G_N$; 2) $G_E$; 3) $G_S$; and 4) $G_W$, one for each sector, to solve the assignment of the wire-bonding pads to the bump pads. Since we have only one layer for routing, the assignment should not create any wire crossings. We avoid the wire crossings by restricting the edges in the networks not to intersect each other. We first consider two-pin nets and then multipin nets. The reason is that multipin nets allow more than one wire-bonding pad to connect to one bump pad. Thus, the multipin nets may block the two-pin nets. Under this condition, a wire-bonding pad may not find a global path. Thus, the two-pin nets need to be considered first. We will detail the reason in Section III-B4. After applying MCMF, we obtain the flows representing the routes from wire-bonding pads to bump pads for the nets. Those flows give the global paths for the nets.
In the second phase, we use the cross-point assignment, the net ordering determination, and the track assignment to determine detailed routes. A cross point is the point for a net to pass through an interval. First, we find the cross points for
all nets passing through the same interval. For all nets that pass through the same interval, we evenly distribute these cross points. Second, we use the net ordering determination technique presented in [7] to create the routing sequence between two adjacent rings so that we can guarantee to route all nets. Finally, we assign at least one track to each net based on the routing sequence obtained from the net ordering determination algorithm. Fig. 8 summarizes our routing algorithm.

Fig. 7. RDL routing flow.

Fig. 8. Overview of the RDL routing algorithm.

B. Global Routing
In this section, we first show the basic flow network formulation. Then, we detail the capacity of each edge, the intermediate nodes, the tile nodes, and the cost of each edge. Finally, we discuss how to handle the multipin nets.
1) Basic Network Formulation: We describe how to construct the flow network $G_S$ to perform the concurrent assignment for the South sector. The other three sectors can be processed similarly. As shown in Fig. 9(a), we define $D_S = \{d^S_1, d^S_2, \ldots, d^S_h\}$ to be a set of $h$ intermediate nodes. Each intermediate node represents an interval $(p^j_y, p^j_{y+1})$ ($(b^i_x, b^i_{x+1})$) in a wire-bonding (bump) pad ring. $T_S = \{t^S_1, t^S_2, \ldots, t^S_u\}$ is a set of $u$ tile nodes. Each tile node represents a tile $(p^j_y, p^j_{y+1}, p^{j+1}_{y'}, p^{j+1}_{y'+1})$ ($(b^i_x, b^i_{x+1}, b^{i+1}_{x'}, b^{i+1}_{x'+1})$) between two adjacent wire-bonding (bump) pad rings. We construct a graph $G_S = (P_S \cup D_S \cup B_S \cup T_S, E)$ and add a source node $s$ and a sink node $t$ to $G_S$. Each intermediate node $d$ has a capacity of $K_d$, where $K_d$ represents the maximum number of nets that are allowed to pass through an interval $d$. Each tile node $t$ has a capacity of $L_t$, where $L_t$ represents the maximum number of nets that are allowed to pass through a tile $t$. We will detail how to handle the capacity of the intermediate nodes and the tile nodes so that MCMF can be applied in Section III-B2. There are 11 types of edges:
1) edges from a wire-bonding pad to a bump pad;
2) edges from a wire-bonding pad to an intermediate node;
3) edges from a wire-bonding pad to a tile node;
4) edges from an intermediate node to a bump pad;
5) edges from an intermediate node to another intermediate node;
6) edges from an intermediate node to a tile node;
7) edges from a tile node to a bump pad;
8) edges from a tile node to an intermediate node;
9) edges from a tile node to another tile node;
10) edges from the source node to a wire-bonding pad;
11) edges from a bump pad to the sink node.
Each edge is associated with a (cost, capacity) tuple to be described in the following sections. Recall that we do not allow wire crossings for all wires. Since $E$ represents the possible global paths for all nets, we can guarantee that no wire crossings will occur if there are no crossings in edges. Thus, we construct all the edges and avoid crossings of all edges at the same time. Fig. 9(b) shows an example flow network $G_S$ for the South sector. The last two types of edges are not shown here. Furthermore, we do not construct edges between the two tile nodes in the center of the two wire-bonding pad rings because the placement of these tile nodes is symmetric. We can solve MCMF in time $O(|V|^2|E|)$ based on the network flow algorithm presented in [5], where $V$ is the vertex set in the flow network.
Theorem 1: Given a flow network with the vertex set $V$ and edge set $E$, the global routing problem can be solved in $O(|V|^2|E|)$ time.
Proof: Immediate from the aforementioned discussions. ∎
2) Capacity Assignment and Node Construction: Now, we introduce the capacity of each edge, the intermediate nodes, and the tile nodes. Fig. 10 shows the capacity and cost for all 11 types of edges in the complete flow network. For an edge $e$, if $e$ is from a wire-bonding pad to a bump pad, an intermediate node, or a tile node, the capacity of $e$ is set to one. If $e$ is from an intermediate node or a tile node to a bump pad $b$, then the capacity of $e$ is set to $M_b$, where $M_b$ is the maximum number
of nets that are allowed to connect to the bump pad $b$. Recall that an intermediate node $d$ has a capacity of $K_d$, where $K_d$ is the maximum number of nets that are allowed to pass through this intermediate node $d$. This means that the capacity of each incoming edge of an intermediate node $d$ is equal to $K_d$. If $e$ is an incoming edge of a tile node $t$, then the capacity of $e$ is set to $L_t$, where $L_t$ is the maximum number of nets that are allowed to pass through the tile node $t$. As shown in Fig. 11, in order to model this situation, we decompose each intermediate node $d$ into two intermediate nodes $d'$ and $d''$, and an edge is connected from $d'$ to $d''$, with a capacity of $K_d$. All outgoing edges of $d$ are now connected from $d''$, with a capacity of $\bar{K}_d$, and all incoming edges of $d$ are now connected to $d'$, with a capacity of $K_d$.

Fig. 9. (a) Intermediate nodes and tile nodes. (b) Flow network for the South sector.

Fig. 10. Capacity and cost on edges.

Fig. 11. (a) Capacity and cost on intermediate nodes. (b) Capacity and cost on tile nodes.

Each tile node $t$ is also decomposed into two tile nodes $t'$ and $t''$, and the capacity of a tile node $t$ is set to $L_t$, where $L_t$ is the maximum number of nets that are allowed to pass through this tile node $t$. The capacity of the edges from the source node to the wire-bonding pads is set to one, and the capacity of the edges from each bump pad $b$ to the sink node is set to $M_b$. There are three worst cases of congestion in a tile, as shown in Fig. 12. The four nodes in the three figures are all bump pads. In Fig. 12(a) and (c), the maximum number of nets passing through the tile is $2K$. In Fig. 12(b), the maximum number of nets passing through the tile is $3K$. If we do not use the tile node $t$, the maximum number of nets in Fig. 12(a)–(c)
could exceed the capacity of a tile ($2K > L_t$ or $3K > L_t$). Since the capacity of each tile node is well modeled in our flow network, we can totally avoid this congestion problem.

Fig. 12. Three kinds of congestion in a tile.

3) Cost of Edges: The cost function of each edge is defined by the following equation:

Cost = α × WL    (1)

where WL denotes the Manhattan distance between two terminals of an edge, and α is an adaptive parameter to adjust the cost of different types of edges. By adjusting the value of α, we can control the wirelength of each net to avoid large signal skews among different nets. As an example shown in Fig. 13(a), we assign the smallest α to the dashed (red) edge that connects an intermediate node to a bump pad to assign the intermediate node to the bump pad first. By doing so, the routing for a net starting from a preceding ring can be completed earlier to reduce its routing length (and, thus, signal skew). As an example shown in Fig. 13(b), the dashed (red) edge that connects one tile node to another tile node is also assigned the smallest α to assure that fewer bump pad rings are used. Since the wirelength between the tile node $t$ and the bump pad 1 is the same as that between $t$ and the bump pad 2, we have to assign the smallest α to the dashed (red) edge to make $t$ connect the bump pad 2 first. Thus, we can reduce the number of long nets to reduce the signal skew by using fewer bump pad rings. If a wire-bonding pad is assigned to a bump pad directly, it might generate a very short net. Hence, we assign the largest α to the dotted (blue) edge that connects a wire-bonding pad to a bump pad to avoid too short connections between the two types of pads to reduce the signal skew. Finally, the solid (black) edge that connects two intermediate nodes, a tile node to a bump pad, or an intermediate node to a tile node is assigned a medium α. Since the solid (black) edge does not influence the signal skew, the medium α is set to one. The costs of the edges from the source node to the wire-bonding pads and the costs of the edges from the bump pads to the sink node are both set to zero.

Fig. 13. Adjustment of α values on edges in the South sector.

4) Multipin Net Handling: For practical flip-chip routing, if a bump pad can be assigned to any two or even more wire-bonding pads, we can just increase the capacity of the bump pad to connect more wire-bonding pads. Since we construct the edges for the two-pin nets and the multipin nets simultaneously, the global routing result is optimal. However, for other practical flip-chip routing, a net may connect multiple wire-bonding pads (which are assigned the same signal such as power or ground pads) to a bump pad. This bump pad cannot be assigned to other nets. As stated before, we first assign two-pin nets and then multipin nets. We only construct the edges associated with the two-pin nets and apply MCMF for the assignment. After the assignment, we delete all edges from the source node $s$ and all edges to the sink node $t$. (However, the flows of the edge $e$, the intermediate node $d$, and the tile node $t$ for each assigned two-pin net will be kept in the flow network.) The global paths of the assigned two-pin nets are not deleted and considered as blockages $F$ during the construction of the edges for the multipin nets. Recall that if there are no edge crossings in the flow network, then there are no wire crossings in the final routing solution. When we construct the edges for the multipin nets, an edge $e$ exists only if $e$ does not intersect any blockages or never crosses the assigned two-pin nets. Then, we add the edges from the source node to the wire-bonding pads associated with the multipin nets and the edges from the bump pads associated with the multipin nets to the sink node. Fig. 14(a) illustrates an example. We assume that a multipin net $n$ consists of $((p_2, p_4, p_5), (b_3, b_9))$, which means that three wire-bonding pads 2, 4, and 5 are only free to be assigned to one of the two bump pads 3 and 9. No other wire-bonding pads can be assigned to these two bump pads. Redundant edges are deleted by the blockage $f_i$. For example, the edge from $p_2$ to the intermediate node between $b_8$ and $b_9$ is deleted because it intersects the blockage $(p_3, b_8)$. By using MCMF, the wire-bonding pads and bump pads are grouped into two sets: $\{p_2, b_3\}$ and $\{p_4, p_5, b_9\}$. Fig. 14(b) illustrates why we handle two-pin nets first. In this example, we assume that only the bump pad 1 for two-pin nets and the bump pad 2 for multipin nets can be assigned to wire-bonding pads. If we handle the multipin net 2 first, then the two-pin net 1 cannot be assigned to the bump pad 1 to find a global path. The reason is that the multipin net 2 divides the region into two subregions and blocks the wire-bonding pad 1. In order to avoid this situation, we shall handle two-pin nets first. The similar idea is applied in the planar routing. As shown in Fig. 15(a), in a planar routing, if a net such as net 1 or net 2 is routed to divide the region into two subregions, it should be routed later. Otherwise, as shown in Fig. 15(b), other nets such as net 3 or net 4 may cross the net. Based on the global routing algorithm, we have the following theorem.
Theorem 2: Given a set of wire-bonding pads, a set of bump pads, and a set of nets, if there exists a feasible solution computed by the MCMF algorithm, we can guarantee 100% detailed routing completion.
Proof: In our global routing model, MCMF is optimal for two-pin nets and suboptimal for multipin nets. Since we
consider the routing resource in the global routing stage and will never assign nets to exceed the capacity of an interval or a tile, we will never violate the design rules. Also, because we do not allow edge crossings during the flow network construction, the final routing solution will not generate wire crossings. Thus, after the assignment, all global paths are routable in the detailed routing stage. ∎

Fig. 14. (a) Assign multipin nets. (b) Handle multipin nets first.

Fig. 15. (a) Routing sequence: {5, 3, 4, 1, 2}. (b) Routing sequence: {1, 2, 3, 4, 5}.

Fig. 16. Redefined global paths.
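To make the global routing formulation of Section III-B concrete, the following Python fragment is a minimal sketch, not the implementation evaluated in this paper. It builds a toy flow network with one split intermediate node (d_in to d_out with capacity K_d), costs of the form α × WL with WL the Manhattan distance, and solves it with a simple successive-shortest-path MCMF rather than the O(|V|^2 |E|) algorithm of [5]. The coordinates, the three α values, the class name MinCostMaxFlow, the choice of the medium α for the pad-to-intermediate edges, and all node names are assumptions made for illustration. Tile nodes and the crossing-free edge filtering are omitted.

```python
from collections import deque


class MinCostMaxFlow:
    """Successive-shortest-path MCMF on a residual graph (illustrative only)."""

    def __init__(self):
        self.adj = {}    # node -> indices into self.edges
        self.edges = []  # [head, residual capacity, cost]; edge i pairs with i ^ 1

    def add_edge(self, u, v, cap, cost):
        self.adj.setdefault(u, []).append(len(self.edges))
        self.edges.append([v, cap, cost])
        self.adj.setdefault(v, []).append(len(self.edges))
        self.edges.append([u, 0, -cost])  # reverse (residual) edge

    def solve(self, s, t):
        flow, cost = 0, 0.0
        while True:
            # Queue-based Bellman-Ford: cheapest augmenting path from s.
            dist = {n: float("inf") for n in self.adj}
            dist[s], prev = 0.0, {}
            queue, queued = deque([s]), {s}
            while queue:
                u = queue.popleft()
                queued.discard(u)
                for i in self.adj[u]:
                    v, cap, c = self.edges[i]
                    if cap > 0 and dist[u] + c < dist[v]:
                        dist[v], prev[v] = dist[u] + c, i
                        if v not in queued:
                            queue.append(v)
                            queued.add(v)
            if dist[t] == float("inf"):
                return flow, cost
            # Walk back from t to collect the path, then augment by its bottleneck.
            path, v = [], t
            while v != s:
                path.append(prev[v])
                v = self.edges[prev[v] ^ 1][0]
            push = min(self.edges[i][1] for i in path)
            for i in path:
                self.edges[i][1] -= push
                self.edges[i ^ 1][1] += push
            flow += push
            cost += push * dist[t]


def manhattan(a, b):
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


# Hypothetical layout: two wire-bonding pads, one bump ring (b1, b2) with the
# interval d between them, and one inner bump pad b3 reached through d.
XY = {"p1": (0, 0), "p2": (4, 0), "b1": (1, 3), "b2": (3, 3), "d": (2, 3), "b3": (2, 5)}
ALPHA_SMALL, ALPHA_MEDIUM, ALPHA_LARGE = 0.5, 1.0, 2.0  # assumed values
K_D, M_B = 1, 1  # interval capacity and per-bump capacity (assumed)

net = MinCostMaxFlow()
for p in ("p1", "p2"):
    net.add_edge("s", p, 1, 0.0)  # source edges: capacity one, cost zero
    net.add_edge(p, "d_in", 1, ALPHA_MEDIUM * manhattan(XY[p], XY["d"]))
net.add_edge("p1", "b1", 1, ALPHA_LARGE * manhattan(XY["p1"], XY["b1"]))
net.add_edge("p2", "b2", 1, ALPHA_LARGE * manhattan(XY["p2"], XY["b2"]))
net.add_edge("d_in", "d_out", K_D, 0.0)  # node splitting enforces K_d
net.add_edge("d_out", "b3", M_B, ALPHA_SMALL * manhattan(XY["d"], XY["b3"]))
for b in ("b1", "b2", "b3"):
    net.add_edge(b, "t", M_B, 0.0)  # sink edges: capacity M_b, cost zero

flow, cost = net.solve("s", "t")
print(flow, cost)
```

In this toy instance only one net can use the interval d because K_d = 1, so the other pad falls back to its direct, more expensive pad-to-bump edge. A two-pass use of the same solver (two-pin nets first, then multipin nets with the earlier paths treated as blockages) would follow the order described in Section III-B4.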
C. Detailed Routing
In this section, we explain the three methods used in our detailed routing. As shown in Fig. 16, after the global routing, each global path contains only wire-bonding pads, intermediate nodes, and bump pads. The two global paths $\langle d_k, t, d_l \rangle$ and $\langle d_y, t, b_x \rangle$, which pass through the tile node $t$, are remodeled as $\langle d_k, d_l \rangle$ and $\langle d_y, b_x \rangle$. Tile nodes are not needed for the final representations of the global paths because a tile node is just used to avoid the congestion overflow.
1) Cross-Point Assignment: Based on the global routing result (discussed in Section III-B), we use the cross-point assignment algorithm to evenly distribute nets that pass through the same interval (see Fig. 17 for an example). As shown in Fig. 17, the two nets from wire-bonding pads $p_2$ and $p_3$ pass through the same intermediate node. Thus, we split the intermediate node into two cross points. Since the maximum number of intermediate nodes is $(|B| - |R^b|) + (|P| - |R^p|)$, we have the following theorem.

Fig. 17. Cross-point assignment.

Theorem 3: The cross-point assignment problem can be solved in $O(|B| + |P|)$ time.
Proof: If there are $q_i$ bump pads in the bump pad ring $r^b_i$ in the South sector, there will be $(q_i - 1)$ intervals of the bump pad ring $r^b_i$. Hence, there will be $(q_i - 1)$ intermediate nodes. Since the number of bump pad rings is $|R^b_S|$, the maximum number of intermediate nodes is $\sum_{i=1}^{|R^b_S|} (q_i - 1) = |B_S| - |R^b_S|$. As for the wire-bonding pads, the condition is the same as that of the bump pads, and the conditions of the remaining three sectors are the same as those of the South sector. Thus, the maximum number of intermediate nodes is $(|B| - |R^b|) + (|P| - |R^p|)$, and the time complexity is $O(|B| + |P|)$ (the upper bound of $(|B| - |R^b|) + (|P| - |R^p|)$) to assign cross points to each intermediate node. ∎
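The even distribution of cross points and the interval count used in the proof of Theorem 3 can be illustrated with a short Python sketch. It rests on assumptions: an interval is modeled as a one-dimensional span between two adjacent pad coordinates, the nets are assumed to be listed already in left-to-right order, and the function names assign_cross_points and count_intervals are hypothetical.

```python
from fractions import Fraction


def assign_cross_points(left, right, nets):
    """Place one cross point per net, evenly spaced strictly between two adjacent pads."""
    step = Fraction(right - left, len(nets) + 1)
    return {net: left + step * (i + 1) for i, net in enumerate(nets)}


def count_intervals(ring_sizes):
    """A ring with q pads in one sector contributes q - 1 intervals (intermediate nodes)."""
    return sum(q - 1 for q in ring_sizes)


# Two nets share the interval between pads at x = 0 and x = 6.
points = assign_cross_points(0, 6, ["n2", "n3"])

# Three rings with 3, 4, and 5 pads: (3 + 4 + 5) - 3 intervals in total.
intervals = count_intervals([3, 4, 5])
```

Each interval is visited once and each net receives one position, which is consistent with the linear bound stated in Theorem 3.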
2) Net Ordering Determination: After the assignment of cross points, each net has its path to cross each interval. For two adjacent rings, we can treat the routing between the two rings as a channel routing. Thus, we can use the net ordering determination algorithm presented in [7] to generate a routing sequence $S = \langle (n^s_1, n^d_1), (n^s_2, n^d_2), \ldots, (n^s_k, n^d_k) \rangle$, with $k$ net segments. Each net segment $n_i(j, j')$ is represented by a source–destination pair/tuple $(n^s_i, n^d_i)$. We first determine the source and destination for each net based on the counterclockwise traversing distance along the leftmost and rightmost boundaries. If the counterclockwise traversing distance along the leftmost boundary is shorter than the counterclockwise traversing distance along the rightmost boundary, the terminal $j$ is a source, and the terminal $j'$ is a destination. Otherwise, the terminal $j$ is a destination, and the terminal $j'$ is a source. For example, given the net 1 shown in Fig. 18(a), since the counterclockwise traversing distance along the leftmost boundary is shorter than the counterclockwise traversing distance along the rightmost boundary, we make the terminal $1$ a source and the terminal $1'$ a destination. For the net 10, however, since the counterclockwise traversing distance along the leftmost boundary is longer than the counterclockwise traversing distance along the rightmost boundary, we make the terminal $10'$ a source and the terminal $10$ a destination. Starting from an arbitrary terminal, we then generate a circular list for all terminals ordered counterclockwise according to their positions on the boundaries. A stack is used to check if there exist crossovers among the net segments. For each terminal of net segment $n_i$, if it is a source, then we push it into the stack. If this terminal is a destination and the top element of the stack belongs to the same net segment, then net segment $n_i$ is matched, and the top element is popped. Otherwise, if the stack is empty, or this terminal is a destination and the top element of the stack does not belong to the same net segment, then we search the circular list for the next terminal. We keep searching the circular list until all nets are matched. As shown in Fig. 18(b), we start with the terminal $1$. Since the terminal $1$ is a source, we push it into the stack. Then, we search each terminal on the boundary counterclockwise. The terminal $1'$ is searched, and it is a destination; thus, we compare it with the top element of the stack. Because these two terminals belong to the same net segment, we pop the top element and determine the routing sequence of the net segment $n_1$. Keeping on searching, since the terminals $2'$, $3'$, and $4'$ are all destinations, we do not push them into the stack. Since the terminals $5'$, $6'$, $7'$, $8'$, $9'$, and $10'$ are all sources, we push them into the stack. Then, we process the terminal $10$, which is a destination and matches the top element in the stack. Thus, we pop the net segment $n_{10}$ and add it into the routing sequence. Repeating this step, we can get the resulting routing sequence. With this sequence $S$, we can guarantee that each net segment between two adjacent rings can be routed without intersecting each other. For example, given an instance shown in Fig. 18(a), according to the above-described net ordering determination algorithm, we can obtain the sequence $S = \langle (n_1, n_{1'}), (n_{10'}, n_{10}), (n_{9'}, n_9), (n_{8'}, n_8), (n_{7'}, n_7), (n_{6'}, n_6), (n_{5'}, n_5), (n_2, n_{2'}), (n_3, n_{3'}), (n_4, n_{4'}) \rangle$. According to the net ordering determination algorithm, we have the following theorem.

Fig. 18. (a) Net segments between two adjacent rings. (b) Stack for net ordering determination.

Theorem 4: Given a set $N$ of nets, the net ordering determination problem can be solved in $O(|N|^2)$ time.
Proof: According to the net ordering determination algorithm, the worst case happens when only one net is matched during each searching cycle. In this case, the total number of terminal searches is $\sum_{i=0}^{|N|-1} 2(|N| - i) = |N|^2 + |N|$. Hence, the time complexity is $O(|N|^2)$. ∎
stack. If this terminal is a destination and the top element of 3) Track Assignment: With the net ordering, we can use
the stack belongs to the same net segment, then net segment maze routing to route all nets for any two adjacent rings.
ni is matched, and the top element is popped. Otherwise, if However, maze routing is quite slow and generates too many
the stack is empty, or this terminal is a destination and the bends. (For example, for a small circuit with 513 nets, we need
top element of the stack does not belong to the same net 25 min on a 1.2-GHz SUN Blade 2000 workstation with 8-GB

Fig. 19. (a) Example for track assignment. (b) Blocking point.

memory to complete the detailed routing.) Thus, we propose a track assignment algorithm to assign tracks to each net segment of any two adjacent rings. For each net segment $n_i$ in $S$, according to the relative locations of $n_i^s$ and $n_i^d$, we search a track to be assigned to $n_i$ from the top to the bottom or from the bottom to the top. We search the tracks from the top to the bottom if $n_i^s$ is on the top-right side of $n_i^d$ or $n_i^s$ is on the bottom-right side of $n_i^d$. Otherwise, we search the tracks from the bottom to the top. If we find a track $h$ and it does not create any overlap with other wires, then we assign $h$ to $n_i$. As shown in Fig. 19(a), we assign net segment $n_1$ first. Since the terminal 1 is a source and the net ordering determination algorithm makes each net routed counterclockwise from the source to the destination along the boundary, we search from track 1 to track 6. Thus, $n_1$ is assigned to track 1 first. Since the terminal 5 is a source, we search from track 6 to track 1. Thus, $n_5$ is assigned to track 6 first. Also, we record the blocking points $Q$ for $n_i$. A blocking segment is a wire on track $h + 1$ (if we search from the top to the bottom) or $h - 1$ (if we search from the bottom to the top) to stop $n_i$ from being assigned to $h + 1$ or $h - 1$ without creating any overlap with it. A blocking point $q_i$ is a terminal of the blocking segment whose projection on $h$ overlaps with $n_i$.
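The basic track search described so far (before any rip-up and reroute) can be sketched as follows. This is only an illustrative sketch under stated assumptions, not the authors' implementation: each wire is modeled as a closed horizontal interval, the caller supplies the track indices in the desired search direction, and the names `assign_track`, `overlaps`, and `search_order` are hypothetical. The wires returned alongside the track are those on the most recently rejected track that overlap the new segment, which loosely corresponds to the blocking segments defined above.

```python
from typing import Dict, Iterable, List, Optional, Tuple

Interval = Tuple[int, int]  # assumed model: closed horizontal extent of a wire


def overlaps(a: Interval, b: Interval) -> bool:
    """Two closed intervals overlap if they share at least one position."""
    return a[0] <= b[1] and b[0] <= a[1]


def assign_track(
    tracks: Dict[int, List[Interval]],
    span: Interval,
    search_order: Iterable[int],
) -> Tuple[Optional[int], List[Interval]]:
    """Assign `span` to the first track in `search_order` with no overlap.

    Returns (track, blockers). `blockers` holds the overlapping wires on the
    last rejected track. The track is None if every track is occupied, which
    is where the paper's algorithm would rip up and reroute.
    """
    blockers: List[Interval] = []
    for h in search_order:
        clash = [w for w in tracks.get(h, []) if overlaps(span, w)]
        if not clash:
            tracks.setdefault(h, []).append(span)
            return h, blockers
        blockers = clash
    return None, blockers


# Usage with six tracks, searched in either direction.
tracks: Dict[int, List[Interval]] = {}
first = assign_track(tracks, (0, 4), range(1, 7))       # (1, [])
second = assign_track(tracks, (6, 9), range(6, 0, -1))  # (6, [])
third = assign_track(tracks, (3, 7), range(1, 7))       # (2, [(0, 4)])
```

Passing the search order explicitly avoids committing to whether track 1 is the top or the bottom track, which the text does not state.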
As shown in Fig. 19(b), the point $q_3$ on track $h_2$ is the blocking point for net $n_3$. If we cannot find such $h$, we rip up and reroute all net segments $n_1$ to $n_{i-1}$. For each net segment $n_k$ to be rerouted, we use the concept of the dogleg in the channel routing to break a segment into two segments based on the blocking point $q_k$, such as $q_3$ in Fig. 19(b). Then, we assign the segment that will not overlap with $q_k$ on the lowest possible track (if we search from the top to the bottom) or on the highest possible track (if we search from the bottom to the top). After assigning tracks, we record the new blocking points for $n_k$. Note that since, now, each net segment may be assigned with more than one track, we may have more than one blocking point for each net. Fig. 20 summarizes the track assignment algorithm. According to this algorithm, we have the following theorem.

Fig. 20. Algorithm for track assignment.
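The routing sequence $S$ consumed by the track assignment comes from the stack-based net ordering determination of step 2). A minimal sketch of that matching loop is given here for reference. It rests on several assumptions: terminals arrive as a counterclockwise circular list of hypothetical `(net, is_source)` pairs, the stack persists across searching cycles, and the example ordering is inferred from the walk-through of Fig. 18 rather than read from the figure itself. The function name `net_ordering` is likewise hypothetical.

```python
from typing import List, Set, Tuple

Terminal = Tuple[int, bool]  # (net id, True for a source, False for a destination)


def net_ordering(circular: List[Terminal]) -> List[int]:
    """Return net ids in the order in which their segments are matched."""
    nets = {net for net, _ in circular}
    stack: List[int] = []     # sources still waiting for their destination
    pushed: Set[int] = set()  # nets whose source has already been pushed
    matched: Set[int] = set()
    sequence: List[int] = []
    while len(matched) < len(nets):
        progress = False
        for net, is_source in circular:  # one searching cycle
            if net in matched:
                continue
            if is_source:
                if net not in pushed:
                    stack.append(net)
                    pushed.add(net)
                    progress = True
            elif stack and stack[-1] == net:
                stack.pop()  # destination matches the top of the stack
                matched.add(net)
                sequence.append(net)
                progress = True
            # Otherwise the destination does not match; keep searching.
        if not progress:
            # Guards against malformed input, e.g. a net without a destination.
            raise ValueError("unmatched terminals remain")
    return sequence


# Terminal order assumed for the Fig. 18 example: 1, 1', 2'..4' (destinations),
# 5'..10' (sources), then 10 down to 5 (destinations), then 4, 3, 2 (sources).
example = (
    [(1, True), (1, False)]
    + [(n, False) for n in (2, 3, 4)]
    + [(n, True) for n in range(5, 11)]
    + [(n, False) for n in range(10, 4, -1)]
    + [(n, True) for n in (4, 3, 2)]
)
order = net_ordering(example)  # [1, 10, 9, 8, 7, 6, 5, 2, 3, 4]
```

Under this assumed ordering, the sketch reproduces the sequence quoted in the text. Each searching cycle visits every terminal once, so the loop stays within the quadratic bound of Theorem 4.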

Theorem 5: Given a set $N$ of nets and the number of tracks $H$, the track assignment problem can be solved in $O(|N|^2 H(|R_b| + |R_p|))$ time.

Proof: The worst case of the track assignment algorithm is that we have to rip up and reroute every time while assigning the next net. Thus, the total number of times to assign nets to tracks is $\sum_{i=1}^{|N|} i = (|N|^2 + |N|)/2$. When assigning a net, the maximum number of track searches is $H$, and the number of channels is $(|R_p| + |R_b| - 1)$. Hence, the time complexity is $O(|N|^2 H(|R_b| + |R_p|))$.

D. Complexity Analysis

If there are $|B|$ bump pads, $|P|$ wire-bonding pads, $|D|$ intermediate nodes, $|T|$ tile nodes, $|R_b|$ bump pad rings, and $|R_p|$ wire-bonding pad rings, we can construct a flow network composed of $|V|$ vertices and $|E|$ edges for the global routing, where $|V| = |B| + |P| + |D| + |T|$ and $|E|$ is the number of edges among these vertices. In the detailed routing, there are $|N|$ global paths, and each channel is divided into $H$ tracks. Hence, the time complexity is as follows: $O(|V|^2 \sqrt{|E|})$ (global routing) $+\ O(|B| + |P|)$ (cross-point assignment) $+\ O(|N|^2)$ (net ordering determination) $+\ O(|N|^2 H(|R_b| + |R_p|))$ (track assignment) $= O(|V|^2 (\sqrt{|E|} + H(|R_b| + |R_p|)))$. The space complexity is $O(|E|)$ since we have $O(|B| + |P| + |D| + |T|)$ nodes and $O(|E|)$ edges. Thus, we can solve the RDL routing problem in polynomial time.

Theorem 6: Given a set $P$ of wire-bonding pads, a set $B$ of bump pads, and a set $N$ of nets, if there exists a feasible solution computed by the RDL routing algorithm, the RDL routing problem can be solved in $O(|V|^2 (\sqrt{|E|} + H(|R_b| + |R_p|)))$ time and $O(|E|)$ space.

Proof: The time complexity analysis is immediate from Theorems 1, 3, 4, and 5. First, since $|V| = |B| + |P| + |D| + |T|$, the time complexity of the global routing $O(|V|^2 \sqrt{|E|})$ dominates that of the cross-point assignment $O(|B| + |P|)$. Second, the time complexity of the track assignment $O(|N|^2 H(|R_b| + |R_p|))$ dominates that of the net ordering determination $O(|N|^2)$. Finally, since $|N| = |P|$, the time complexity $O(|V|^2)$ dominates $O(|N|^2)$. Thus, the time complexity of the RDL routing algorithm is given by $O(|V|^2 (\sqrt{|E|} + H(|R_b| + |R_p|)))$. The space complexity of pads $O(|B| + |P|)$ dominates that of nodes $O(|D| + |T|)$. Since there is an edge from the source node $s$ to every wire-bonding pad of $P$, and there is an edge from every bump pad of $B$ to the sink node $t$, the space complexity of edges $O(|E|)$ dominates that of pads $O(|B| + |P|)$. Thus, we can reduce the space complexity from $O(|B| + |P| + |D| + |T| + |E|)$ to $O(|E|)$.

IV. EXPERIMENTAL RESULTS

We implemented our algorithm in the C++ programming language on a 1.2-GHz SUN Blade 2000 workstation with 8-GB memory. The benchmark circuits, which are listed in Table I, are real industry designs. In Table I, “Circuits” denotes the names of circuits; “#Nets” denotes the number of nets; “#$R_p$” denotes the number of wire-bonding pad rings; “#p” denotes the number of wire-bonding pads; “#$R_b$” denotes the number of bump pad rings; and “#b” denotes the number of bump pads. In each of fs900, fs2116, and fs4096, the number of wire-bonding pads equals the number of bump pads. Thus, each wire-bonding pad needs to be assigned to exactly one bump pad. Hence, these three cases are more difficult for routing than the other four cases.

In Table II, we show how to calculate the values of $M_b$, $K_d$, $H_i$, and $L_t$. As defined in the previous sections, $M_b$ is the maximum number of nets allowed to connect to a bump pad $b$, $K_d$ is the maximum number of nets allowed to pass through an intermediate node $d$, $H_i$ is the maximum number of tracks between two adjacent pad rings $i$ and $i + 1$, and $L_t$ is the maximum number of nets allowed to pass through a tile node $t$. All these variables can be expressed by the equations shown in the table. We calculate the values of these variables during the RDL routing process. The parameters used to calculate the $M_b$, $K_d$, $H_i$, and $L_t$ variables are listed in Table III. They are all structure and design-rule related parameters.

Since there are no flip-chip routing algorithms in the literature, we compared our algorithm with the following heuristic currently used in industry. This heuristic is called the nearest node connection (NNC) algorithm. In NNC, the wires are routed sequentially. If a wire-bonding pad $p$ can find a free bump pad $b$ in a restricted area of the nearest bump pad ring $r_m^b$, then it connects $p$ to $b$. If there are no free bump pads in $r_m^b$, then we search for a free bump pad in the next bump pad ring $r_{m+1}^b$. This process is repeated until we find a free bump pad.

The experimental results are shown in Table IV. We report the total wirelength, the critical wirelength (the wirelength of the longest net), the maximum signal skews, and the CPU times. Since the routability is guaranteed to be 100%, we do not report it. As compared with NNC, the experimental results show that our network-flow-based algorithm reduces the total wirelength by 10.2%, the critical wirelength by 13.4%, and the signal skews by 13.9%, at the cost of a reasonably longer running time. Note that for fs2116 and fs4096, NNC fails to find a routing solution. In Fig. 21, the running time of our algorithm is plotted as a function of the number of nets. Empirically, the running time of our RDL routing algorithm is close to quadratic (about $N^{2.17}$) in the number of nets $N$, according to a least-squares analysis of the log–log plot of the function. In Table V, we report the memory usage (in kilobytes) for each circuit for the RDL routing. In Fig. 22, the memory requirement of our algorithm is plotted as a function of the number of nets. The empirical memory complexity of our RDL routing algorithm is between linear and quadratic (about $N^{1.47}$) in the number of nets $N$, again according to a least-squares analysis of the log–log plot of the function. The experimental results show that our network-flow-based RDL algorithm is effective and efficient for flip-chip designs. Fig. 23 shows the RDL routing result of fs900.

We also explore the effects of different α on wirelength and skew. In Table II, we give the equations for the computation of the upper bound of the smallest α and the lower bound of the largest α. The two bounds come from the geometric relation of the pad placement. Nets that are composed of the wire-bonding pads of the inner wire-bonding pad ring and the bump pads of the outer bump pad ring are often short. Thus, by modeling $d_{PB}$ and $d_{BB}$ into the equation of the lower bound of the largest α,

TABLE I
BENCHMARK CIRCUITS FOR RDL ROUTING

TABLE II
EXPERIMENTAL VARIABLES

TABLE III
STRUCTURE AND DESIGN-RULE RELATED PARAMETERS

we can avoid short nets. Furthermore, we want to make the nets of the outer wire-bonding pad rings be assigned to the outer bump pad rings; thus, we model $d_{PP}$ and $d_{PB}$ into the equation of the upper bound of the smallest α. In order to minimize the signal skew, the largest α has to be larger than the lower bound, and the smallest α has to be smaller than the upper bound. Furthermore, we also observe that when the largest (smallest) α is scaled up (down), the critical wirelength

and the signal skew may be further improved at the cost of larger total wirelength. Thus, we use this property to minimize the critical wirelength and the signal skew without increasing the total wirelength too much. We conducted an experiment to explore the effects of different α, and the results are listed in Table VI. In this experiment, we tested three pairs of the smallest α and the largest α values on fs90b740. We first set the largest (smallest) α to 1 and then scaled it up (down) to see the effects of different α on the total wirelength, the critical wirelength, and the signal skew. The percentages listed in the parentheses give the normalized ratios to that with the smallest and the largest α being set to 1. From the experimental results in Table VI, as the largest (smallest) α scales up (down), the total wirelength increases while the critical wirelength and the signal skew decrease.

TABLE IV
RDL ROUTING RESULTS (N/A: NOT AVAILABLE)

Fig. 21. Running time for the RDL routing.

TABLE V
MEMORY USAGE FOR EACH CIRCUIT

Fig. 22. Memory usage for the RDL routing.

Fig. 23. RDL routing result for fs900.

V. CONCLUSION

In this paper, we have developed an RDL router for the flip-chip package. The RDL router consists of the two stages of global routing followed by detailed routing. The global routing applies the network flow algorithm to solve the assignment problem from the wire-bonding pads to the bump pads and then creates the global path for each net. The detailed routing applies the three-stage technique of cross-point assignment, net ordering determination, and track assignment to complete the routing. Experimental results demonstrate that our router can

TABLE VI
EFFECTS OF DIFFERENT α ON WIRELENGTH AND SKEW

achieve much better results in routability, wirelength, critical wirelength, and signal skews, as compared with a heuristic algorithm currently used in industry.

REFERENCES

[1] H. Cai, “Multi-pads, single layer power net routing in VLSI circuits,” in Proc. ACM/IEEE Des. Autom. Conf., Jun. 1998, pp. 183–188.
[2] D.-S. Chen and M. Sarrafzadeh, “A wire-length minimization algorithm for single-layer layouts,” in Proc. IEEE/ACM Int. Conf. Comput.-Aided Des., Nov. 1992, pp. 390–393.
[3] S.-S. Chen, J.-J. Chen, S.-J. Chen, and C.-C. Tsai, “An automatic router for the pin grid array package,” in Proc. ACM/IEEE Asia and South Pac. Des. Autom. Conf., Jan. 1999, pp. 133–136.
[4] S.-S. Chen, J.-J. Chen, C.-C. Tsai, and S.-J. Chen, “An even wiring approach to the ball grid array package routing,” in Proc. IEEE Int. Conf. Comput. Des., Oct. 1999, pp. 303–306.
[5] B. V. Cherkassky, “Efficient algorithms for the maximum flow problem,” Math. Methods Solut. Econ. Probl., vol. 7, pp. 117–126, 1977.
[6] J.-W. Fang, I.-J. Lin, P.-H. Yuh, Y.-W. Chang, and J.-H. Wang, “A routing algorithm for flip-chip design,” in Proc. IEEE/ACM Int. Conf. Comput.-Aided Des., Nov. 2005, pp. 753–758.
[7] C.-P. Hsu, “General river routing algorithm,” in Proc. ACM/IEEE Des. Autom. Conf., Jun. 1983, pp. 578–583.
[8] K.-F. Liao, M. Sarrafzadeh, and C. K. Wong, “Single-layer global routing,” IEEE Trans. Comput.-Aided Design Integr. Circuits Syst., vol. 13, no. 1, pp. 38–47, Jan. 1994.
[9] A. Titus, B. Jaiswal, T. J. Dishongh, and A. N. Cartwright, “Innovative circuit board level routing designs for BGA packages,” IEEE Trans. Adv. Packag., vol. 27, no. 4, pp. 630–639, Nov. 2004.
[10] C.-C. Tsai, C.-M. Wang, and S.-J. Chen, “News: A net-even-wiring system for the routing on a multilayer PGA package,” IEEE Trans. Comput.-Aided Design Integr. Circuits Syst., vol. 17, no. 2, pp. 182–189, Feb. 1998.
[11] UMC, 0.13 µm Flip-Chip Layout Guideline, p. 6, 2004.
[12] D. Wang, P. Zhang, C.-K. Cheng, and A. Sen, “A performance-driven I/O pin routing algorithm,” in Proc. ACM/IEEE Asia and South Pac. Des. Autom. Conf., Jan. 1999, pp. 129–132.
[13] M.-F. Yu and W. W.-M. Dai, “Pin assignment and routing on a single-layer pin grid array,” in Proc. ACM/IEEE Asia and South Pac. Des. Autom. Conf., Sep. 1995, pp. 203–208.
[14] M.-F. Yu and W. W.-M. Dai, “Single-layer fanout routing and routability analysis for ball grid arrays,” in Proc. IEEE/ACM Int. Conf. Comput.-Aided Des., Nov. 1995, pp. 581–586.
[15] M.-F. Yu, J. Darnauer, and W. W.-M. Dai, “Interchangeable pin routing with application to package layout,” in Proc. IEEE/ACM Int. Conf. Comput.-Aided Des., Nov. 1996, pp. 668–673.

Jia-Wei Fang (S’05) received the B.S. degree in electrical engineering from the National Cheng Kung University, Tainan, Taiwan, R.O.C., in 2003 and the M.S. degree in electronics engineering from the National Taiwan University, Taipei, Taiwan, in 2005. He is currently working toward the Ph.D. degree at the Graduate Institute of Electronics Engineering, National Taiwan University.
His current research interests include package routing and other package technologies.

I-Jye Lin (S’05) received the B.S. degree in computer science from the National Tsing-Hua University, Hsinchu, Taiwan, R.O.C., in 2004. She is currently working toward the Ph.D. degree at the Graduate Institute of Electronics Engineering, National Taiwan University, Taipei, Taiwan.
Her current research interests include package routing and design for manufacturing.

Yao-Wen Chang (S’94–M’96) received the B.S. degree from National Taiwan University, Taipei, Taiwan, in 1988, and the M.S. and Ph.D. degrees from the University of Texas at Austin in 1993 and 1996, respectively, all in computer science.
He is a Professor in the Department of Electrical Engineering and the Graduate Institute of Electronics Engineering, National Taiwan University. He is currently also a Visiting Professor at Waseda University, Japan. He was with the IBM T. J. Watson Research Center, Yorktown Heights, NY, in the summer of 1994. From 1996 to 2001, he was on the faculty of National Chiao Tung University, Taiwan. His current research interests lie in VLSI physical design, design for manufacturing, and FPGA. He has been working closely with industry on projects in these areas. He has coauthored one book on routing and over 120 ACM/IEEE conference/journal papers in these areas.
Dr. Chang received an award at the 2006 ACM ISPD Placement Contest, Best Paper Award at ICCD-1995, and nine Best Paper Award Nominations from DAC-2007, ISPD-2007 (two), DAC-2005, 2004 ACM TODAES, ASP-DAC-2003, ICCAD-2002, ICCD-2001, and DAC-2000. He has received many awards for research performance, such as the inaugural First-Class Principal Investigator Awards and the 2004 Mr. Wu Ta You Memorial Award from the National Science Council of Taiwan, the 2004 MXIC Young Chair Professorship from the MXIC Corp, and for excellent teaching from National Taiwan University and National Chiao Tung University. He is an editor of the Journal of Computer and Information Science. He has served on the ACM/SIGDA Physical Design Technical Committee and the technical program committees of ASP-DAC (topic chair), DAC, DATE, FPT (program co-chair), GLSVLSI, ICCAD, ICCD, IECON (topic chair), ISPD, SOCC (topic chair), TENCON, and VLSI-DAT (topic chair). He is currently an independent board member of Genesys Logic, Inc, the chair of the Design Automation and Testing (DAT) Consortium of the Ministry of Education, Taiwan, a member of the board of governors of the Taiwan IC Design Society, and a member of the IEEE Circuits and Systems Society, ACM, and ACM/SIGDA.

Jyh-Herng Wang (M’98) received the B.S. and Ph.D. degrees in electrical engineering from National Taiwan University, Taipei, Taiwan, R.O.C., in 1987 and 1994, respectively.
From 1994 to 1999, he was an Associate Scientist with the electrical design group, National Center for High-Performance Computing, HsinChu, Taiwan. He then joined Faraday Technology Corporation, where he was a Staff Engineer, CAD Manager, and Director of the Design Development Division, in 1999, 2000, and 2003, respectively. He joined Apache Design Solutions, Hsinchu, Taiwan, as Director of the Taiwan R&D team in 2007. His research interests include timing analysis, SI analysis, and VLSI design flow.