Algorithms
Jacob G. Martin
University of Georgia
Computer Science
Athens, GA, 30601, USA
ABSTRACT
36]. Although these domains are quite different in some aspects, each can be reduced to the problem of ascertaining
or ranking relevance in data. Intuitively, the concept of relevance depends critically on the nature of the problem at
hand. SVD provides a method for mathematically discovering correlations within data. The focus of this work is to
investigate several possible methods of using SVD in a genetic algorithm to better solve the minimum graph bisection
problem.
SVD is useful when bisecting certain types of graphs. To
obtain a bisection of a graph, SVD is performed directly on
the 0,1 adjacency matrix of the graph to be bisected. Next,
an eigenvector is chosen and its components are partitioned
about the median of all of the components. Because
each component of an eigenvector represents a vertex of the
graph, this yields a partitioning of the graph. The process
of using eigenvectors to bisect graphs is called spectral
bisection. The technique's roots stem from the works of
Fiedler [19], who studied the properties of the eigenvector associated with the second smallest eigenvalue of the Laplacian of a graph, and Donath and
Hoffman [15], who proved a lower bound on the size of the
minimum bisection of the graph.
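The median split described above can be illustrated with a small, self-contained sketch. It is not the implementation used in this work: to avoid depending on a linear algebra library, it approximates the Laplacian eigenvector studied by Fiedler with a shifted power iteration rather than taking a singular vector of the adjacency matrix. The function names and the iteration count are assumptions made for the example.

```python
import math

def fiedler_vector(adj, iters=500):
    """Approximate the Laplacian eigenvector of the second smallest eigenvalue.

    Runs power iteration on (shift*I - L), where L = D - A, and removes the
    constant-vector component at every step (assumed stand-in for a full SVD).
    """
    n = len(adj)
    deg = [sum(row) for row in adj]
    shift = 2 * max(deg) + 1  # keeps shift*I - L positive definite
    x = [i - (n - 1) / 2 for i in range(n)]  # starts orthogonal to the constant vector
    for _ in range(iters):
        y = [(shift - deg[i]) * x[i] + sum(adj[i][j] * x[j] for j in range(n))
             for i in range(n)]
        mean = sum(y) / n
        y = [v - mean for v in y]  # deflate the constant eigenvector
        norm = math.sqrt(sum(v * v for v in y))
        x = [v / norm for v in y]
    return x

def median_bisection(vec):
    """Put the half of the vertices with the largest components on side 1."""
    order = sorted(range(len(vec)), key=lambda i: vec[i])
    side = [0] * len(vec)
    for i in order[len(vec) // 2:]:
        side[i] = 1
    return side

def cut_size(adj, side):
    """Count the edges whose endpoints fall on different sides."""
    n = len(adj)
    return sum(adj[i][j] for i in range(n) for j in range(i + 1, n)
               if side[i] != side[j])
```

On a graph made of two triangles joined by a single edge, `median_bisection(fiedler_vector(adj))` places each triangle on its own side, so only the bridging edge is cut.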
In addition to being applied directly to graphs, SVD is also
used in several ways to guide the search process of a Genetic Algorithm (GA). SVD guides the search of
the GA by identifying the most striking similarities between
genes in the most highly fit individuals of the optimization
history. The GA's mutation operator is then restricted to
modify only the loci of the genes corresponding to these
striking similarities. In addition, individuals are engineered
out of the discovered similarities between genes across highly
fit individuals. The genes are also reordered on the chromosome so that similar genes lie closer together. The heuristics show remarkable performance improvements, and the improvement is magnified when the heuristics are combined with each other. As
further evidence for the applicability of these new heuristics,
several world record minimum bisections have been obtained
from the genetic algorithm described in this paper.
The first section gives background information on the graph
bisection problem, genetic algorithms, and SVD. The second
section discusses the implementation details for the genetic
algorithm. Section three describes the spectral heuristics
that augment the standard GA. The fourth section gives
experimental evidence for the applicability of the operators
described. The last two sections provide future research
ideas and a summary of the results.
General Terms
Algorithms, Experimentation, Performance
Keywords
Genetic algorithm, singular value decomposition, graph bisection, graph partitioning, spectral bisection, genetic engineering, reduced rank approximation
1. INTRODUCTION
Permission to make digital or hard copies of all or part of this work for
personal or classroom use is granted without fee provided that copies are
not made or distributed for profit or commercial advantage and that copies
bear this notice and the full citation on the first page. To copy otherwise, to
republish, to post on servers or to redistribute to lists, requires prior specific
permission and/or a fee.
GECCO'06, July 8-12, 2006, Seattle, Washington, USA.
Copyright 2006 ACM 1-59593-186-4/06/0007 ...$5.00.
2. BACKGROUND
2.1
2.1.1
2.1.2
Literature Review
2.2
2.2.1
2.3
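\[ \Sigma = \begin{bmatrix} D & 0 \\ 0 & 0 \end{bmatrix} \tag{1} \]
where the diagonal entries of $D$ are the first $r$ singular values
of $A$, $\sigma_1 \ge \sigma_2 \ge \cdots \ge \sigma_r > 0$, and there exist an $m \times m$
orthogonal matrix $U$ and an $n \times n$ orthogonal matrix $V$ such
that
\[ A = U \Sigma V^T \tag{2} \]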
2.3.1
Summary
Genetic Algorithms
Background and Terminology
\[ A = \sum_{i=1}^{r} \sigma_i u_i v_i^T, \qquad r = \operatorname{rank}(A) \tag{3} \]
spectral_injection;
do {
    reorder_schema;
    restrict_space;
    for i from 1 to 100 do
        choose parent1 and parent2 from population;
        child = crossover(parent1, parent2);
        mutate(child);
        modified_kernighan_lin(child);
        children.add(child);
    end for;
    replace(population, children);
    engineered = engineer(population);
    replace(population, engineered);
} until (stopping condition)
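The outer loop of the pseudocode above can be made concrete with a stripped-down, runnable sketch. It is only a skeleton under stated assumptions: the spectral hooks (`spectral_injection`, `reorder_schema`, `restrict_space`, `engineer`) and the Kernighan-Lin step are omitted, uniform crossover and single-bit mutation stand in for the operators of the paper, a hypothetical `rebalance` helper keeps the two sides equal in size, and replacement is elitist. None of these choices are claimed to match the implementation evaluated here.

```python
import random

def cut_size(adj, side):
    """Count the edges whose endpoints fall on different sides."""
    n = len(adj)
    return sum(adj[i][j] for i in range(n) for j in range(i + 1, n)
               if side[i] != side[j])

def rebalance(child, rng):
    """Flip randomly chosen bits until half of them (rounded down) are ones."""
    half = len(child) // 2
    ones = [i for i, bit in enumerate(child) if bit == 1]
    zeros = [i for i, bit in enumerate(child) if bit == 0]
    rng.shuffle(ones)
    rng.shuffle(zeros)
    while len(ones) > half:
        child[ones.pop()] = 0
    while len(ones) < half:
        i = zeros.pop()
        child[i] = 1
        ones.append(i)
    return child

def run_ga(adj, pop_size=20, generations=30, seed=0):
    """Plain GA loop; returns the best individual and the best cut per generation."""
    rng = random.Random(seed)
    n = len(adj)
    population = [rebalance([rng.randint(0, 1) for _ in range(n)], rng)
                  for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        children = []
        for _ in range(pop_size):
            parent1, parent2 = rng.sample(population, 2)
            # Uniform crossover (assumed operator): pick each gene from either parent.
            child = [rng.choice(pair) for pair in zip(parent1, parent2)]
            child[rng.randrange(n)] ^= 1  # single-bit mutation (assumed operator)
            children.append(rebalance(child, rng))
        # Elitist replacement (assumption): keep the best of parents and children.
        population = sorted(population + children,
                            key=lambda s: cut_size(adj, s))[:pop_size]
        history.append(cut_size(adj, population[0]))
    return population[0], history
```

Because the replacement step keeps the best individuals of the merged pool, the recorded best cut size can never increase from one generation to the next.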
2.3.2
\[ A_k = \sum_{i=1}^{k} \sigma_i u_i v_i^T \tag{4} \]
3. IMPLEMENTATION DETAILS
Individuals are represented in binary in the following manner. If the ith component of an individual is one, then the ith vertex is placed in the set V1. Otherwise, if the ith component of an individual is zero, then the ith vertex is placed in the set V2. Notice that individuals are symmetrical in this representation: flipping every bit in a solution gives exactly the same bisection. Before the GA starts, the ordering of the vertices is permuted to prevent the results from being biased by the ordering of the input. Tests are performed using a custom GA, implemented entirely in Java™. The SVD is computed using LAPACK routines and the Matrix Toolkits for Java™ (MTJ).
3.1
3.2
Genetic Operators
3.3 Local Improvements
Hybrid GAs are those that incorporate a local search procedure during each generation on the new offspring. Local searches are almost always problem specific. Their goal is to improve a candidate solution to a problem by exploring locally around the solution's values. A hybrid GA thus combines a genetic algorithm with a local search heuristic that is tailored specifically to a certain problem. Generally, when creating a hybrid GA, the quality of the local improvement heuristic is compromised in exchange for a lower time complexity. This ensures that the local improvement heuristic does not dominate the overall running time of the GA.
The implemented GA uses a trimmed-down variant of the Kernighan-Lin [32] optimization algorithm. The traditional Kernighan-Lin heuristic has a time complexity of O(n^3) and is not guaranteed to find the minimum bisection. The algorithm's time complexity is reduced in exactly the way described in Bui and Moon's paper on graph partitioning with a GA [12].
Additionally, the data structures of the algorithm follow the methods of Fiduccia and Mattheyses [18], which allow gain updates to be performed in constant time. Fiduccia and Mattheyses gave a simplification of the Kernighan-Lin heuristic that has time complexity $\Theta(E)$ [18]. The efficiency is gained by sorting vertex gains using a method called the bucket sort. The addition of the Fiduccia-Mattheyses technique grants the ability to perform a limited, low-cost local search when solving various graph bisection problems.
3.4 SVD Incorporation
3.4.1 Spectral Injection
The technique of spectral bisection provides initial population seedings for the genetic algorithm. Initially, the SVD of the adjacency matrix of the graph to be bisected is computed. All bisections are created using the algorithm in Figure 2. The best spectrally found bisections are then injected into the initial population to influence the GA towards good bisections. Experiments with this method show that spectral injection gives the GA a tremendous head start in comparison to not using it at all. The motivation for using spectral partitioning is that the eigenvalues and eigenvectors of many types of adjacency matrices have been shown to have many relationships to properties of graphs. Moreover, every eigenvalue and eigenvector of a matrix can be computed efficiently in polynomial time. Therefore, eigenvalues and eigenvectors are prime candidates for construct-
3.4.2 Schema Reordering
3.4.3 Restricted Mutation
The mutation operator is restricted to a strategically chosen subset of the genes. This confines the search process to the genes shared by highly fit solutions, facilitating the determination of the local optimum. The candidate subsets are the sets of highly correlated genes identified by an SVD process described in a previous paper by Martin [36], and the subset used is chosen randomly from among them. The restriction is applied only every other generation, so that in the remaining generations the mutation operator has full access to the entire space of possible chromosomes.
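The alternation between restricted and unrestricted mutation can be sketched as follows. This is an illustration only: the sets of correlated genes are assumed to be supplied by the SVD step, the choice of even generations for the restriction and the per-gene flip probability `rate` are assumptions, and the sketch does not rebalance the two sides after flipping.

```python
import random

def restricted_mutation(individual, gene_groups, generation, rate, rng):
    """Flip bits with probability `rate`, restricted on even generations.

    gene_groups: list of lists of gene indices (assumed output of the SVD step).
    On even generations one group is picked at random and only its loci may
    change; on odd generations every locus may change.
    """
    child = list(individual)
    if generation % 2 == 0 and gene_groups:
        loci = rng.choice(gene_groups)
    else:
        loci = range(len(child))
    for i in loci:
        if rng.random() < rate:
            child[i] ^= 1
    return child
```

With `rate=1.0` and a single group `[1, 3]`, an even generation flips exactly genes 1 and 3, while an odd generation flips every gene.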
3.4.4 Genetic Engineering
A genetic engineering approach is tested at every generation. First, the rank-2 SVD of 5 to 10 proportionally selected individuals is computed. The number of individuals is chosen uniformly at random. Then, using a process similar to that described in a previous paper [36], a new graph of correlated genes is generated. Specifically, the magnitude of the (i, j) entry in the unit-scaled matrix $A_2 A_2^T$ determines whether an edge appears between vertex i and vertex j in the new graph: if the entry is greater than 0.9, an edge is created. The vertices in the new graph represent the original graph's vertices, but each is connected to those vertices that the top 5 to 10 best individuals collectively believe should be clustered into the same side of the bisection. Ironically, a minimum bisection of the new graph gives a good approximation of the combination of the best individual minimum bisections in the original graph. To keep the problem from becoming self-referentially intractable, an approximate minimum bisection of the new graph is found by running only one iteration of full Kernighan-Lin on a randomly generated individual. If better, the newly generated individual replaces the worst individual in the current population.
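The thresholding step that builds the graph of correlated genes can be sketched as below. The sketch assumes that the rank-2 approximation has already been computed elsewhere and is passed in as one row per gene, with one column per selected individual. The function name and the handling of all-zero rows are assumptions made for the example.

```python
import math

def correlated_gene_graph(rows, threshold=0.9):
    """Return the edge set {(i, j)} of the new graph of correlated genes.

    rows: gene vectors, assumed to be the rows of the rank-2 approximation.
    Each row is scaled to unit length, so the (i, j) entry of the product
    with its transpose is the cosine between genes i and j.
    """
    unit = []
    for r in rows:
        norm = math.sqrt(sum(v * v for v in r))
        unit.append([v / norm for v in r] if norm > 0 else list(r))
    edges = set()
    for i in range(len(unit)):
        for j in range(i + 1, len(unit)):
            cosine = sum(a * b for a, b in zip(unit[i], unit[j]))
            if abs(cosine) > threshold:  # magnitude of the entry, as in the text
                edges.add((i, j))
    return edges
```

Two genes that take the same value in every selected individual have a cosine of one and are therefore linked, while genes whose values agree only part of the time fall below the 0.9 threshold.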
3.4.5
4.1
4.1.1
Graph Types
Two forms of the SVD are tested. The first is the full-rank version of the SVD. The second is the reduced-rank version, where all but the k largest singular values are set to zero, giving $A_k$. As expected, the reduced-rank strategies generally discover the subproblems more efficiently than the full-rank versions. This is due in part to the theoretical results in the probabilistic analysis of reduced-rank spectral clustering in a well-known paper by Papadimitriou et al. [42]. The performance may also have improved because, in the application domains tested, the GA is seeking only one block in the solution space. Reduction to a lower rank directs the search towards the correct block because a lower value of k in $A_k$ increases the cosines of the angles between vectors of similar types [9]. Another reason may be that lower-rank reductions are less restrictive than higher-rank reductions and identify larger subsets of related genes as the rank is reduced. Therefore, lower-rank reductions allow the restricted mutation and crossover operators more freedom during exploration. However, lowering the rank too much may not always increase performance, because all genes will then be seen as similar to all other genes.
4.1.2
Discussion
4. EMPIRICAL RESULTS
Intuitively, the number of generations it takes to find a solution is the most telling measure of a genetic algorithm's performance. It is also illuminating to compare the average best individual at every generation. This allows one to discover the convergence properties of a particular configuration of the GA. The results are based on the average of the best or average individual fitness at each generation over 100 independent runs of the GA.
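The averaging used for the reported curves amounts to a per-generation mean across runs. A minimal sketch, assuming each run is stored as a list of best (or average) fitness values indexed by generation; the function name is hypothetical:

```python
def average_curve(runs):
    """Average the fitness recorded at each generation over independent runs.

    runs[r][g] is the fitness recorded by run r at generation g. Runs of
    unequal length are truncated to the shortest one (an assumption).
    """
    generations = min(len(run) for run in runs)
    return [sum(run[g] for run in runs) / len(runs) for g in range(generations)]
```

For example, two runs recording `[-9, -8]` and `[-7, -6]` average to `[-8.0, -7.0]`.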
The GA is compared with various combinations of genetic operators, local search functions, and techniques used for solving the minimum graph bisection problem. To assess the amount of benefit achieved using the SVD heuristics, comparisons are made to a plain genetic algorithm that does not use the SVD heuristics. The plain GA serves as a strawman
[Figure: plot for U1000.05. Axes: Cut Size (0 to 1200) vs. Generation (0 to 100). Series: "SVD Engineer (rank=2) engineered cut size", "SVD Engineer (rank=2) injection engineered cut size", "Voting Engineer engineered cut size", "Voting Engineer injection engineered cut size".]
[Figure: plot for U1500.0.079. Axes: Fitness (-9.5 to -7) vs. Generation (0 to 500). Series: "Engineering (rank=2) spectral best", "Engineering Subproblem Rotation (rank=2) schema spectral best", "Engineering Subproblem Rotation (rank=2) spectral best", "Plain schema spectral best", "Plain spectral best".]
shared approximate vote is validated by the similarities between the optimization curves for voting and SVD.
Figure 4 depicts the results from an experiment that compares most of the described heuristics. In addition, local searches are performed at each generation. Spectral injection, subproblem restriction and rotation [36], engineering, and schema reordering are each verified to positively influence the performance of the genetic algorithm for this graph. Figure 5 shows that the performance increase is much more dramatic when the local search operator is not performed. However, Figure 6 shows that when the Kernighan-Lin local improvement is used with graphs for which KL does not perform well (caterpillars), the SVD techniques outperform the plain GA by a more significant margin. This indicates that SVD may be a viable alternative to KL and that it can be successfully paired with KL to provide additional performance.
In addition to the previous experiments, several record-size minimum bisections for real-world graphs are found using the techniques described in this paper. The three graphs for which record bisections are achieved are named data, add20 (a 20-bit adder), and bcsstk33 (a statics module of a pin boss). These results are listed in Chris Walshaw's graph partitioning archive located at http://staffweb.cms.gre.ac.uk/~c.walshaw/partition/ [48].
[Figure: plot for U1000.20. Axes: Cut Size (0 to 5000) vs. Generation (0 to 100). Series: "Engineering Subproblem Rotation (rank=2) cut size", "Engineering Subproblem Rotation (rank=2) schema cut size", "Plain schema cut size".]
5. FUTURE WORK
The benefits of adding SVD to KL-based algorithms have been explored in this paper. Analysis of variance (ANOVA) tests should be conducted to establish more rigorously that the presented methods work well in combination with each other. ANOVA tests should also be used to better isolate the benefits of each operator.
Figure 5: Cut size results corresponding to no spectral injection and no local improvements for Bui's U1000.20
A graph bisection technique called Lock-Gain (LG) partitioning was recently introduced by Kim and Moon [34]. LG partitioning extends KL by using a new tie-breaking strategy that intelligently selects the best highest-gain vertex to exchange during one pass of KL. In addition, the gain of a vertex is calculated in a manner that takes into account vertices that have already been moved. The various combinations of the SVD operators and techniques described herein should be investigated in conjunction with the lock-gain partitioning method and other metaheuristics for minimum graph bisection.
Additional operators and procedures based on spectral information should be considered. For example, a spectral crossover operator can be used to give a linkage probability to chromosomes that is related to the information provided by the spectral decomposition of the adjacency matrix of the graph to be bisected. This type of operator is justified because many of the eigenvectors of several adjacency matrix representations tend to group together vertices that should be placed in the same partition. The distance between the valuations for the vertices in the eigenvectors can be used to determine the probability that two genes travel together during crossover. Another example is the possibility of using spectral information to enhance tie-breaking strategies in LG and KL. Finally, the possible benefits of starting with a spectral population should be examined in detail.
[Figure: plot for CaterpillarBisectionProblem128. Axes: Fitness (-4.5 to -1.5) vs. Generations (0 to 200). Series: "Engineering (rank=2) average.graph", "Engineering (rank=2) schema average.graph", "Plain average.graph", "Plain schema average.graph".]
Figure 6: Average population fitness per generation using a modified KL local improvement on CAT128
6. CONCLUSION
mutation, and to define schema reorderings based on approximations of highly fit individuals. The new operators and techniques are investigated with respect to their consequences on performance in conjunction with a hybridized genetic algorithm employing the Kernighan-Lin local search operator and other operators described in previous research papers [12, 36]. All of the introduced techniques are shown to be beneficial to the genetic algorithm. Empirical results obtained from the combination and application of these new heuristics are encouraging.
7. REFERENCES
17:420-425, 1973.
[16] P. Drineas, A. M. Frieze, R. Kannan, S. Vempala, and
V. Vinay. Clustering large graphs via the singular
value decomposition. Machine Learning, 56(1-3):9-33,
2004.
[17] R. O. Duda and P. E. Hart. Pattern Classification and
Scene Analysis. Wiley, New York, 1972.
[18] C. M. Fiduccia and R. M. Mattheyses. A linear-time
heuristic for improving network partitions. In DAC
'82: Proceedings of the 19th conference on Design
automation, pages 175-181, Piscataway, NJ, USA,
1982. IEEE Press.
[19] M. Fiedler. Algebraic connectivity of graphs.
Czechoslovak Math. J., 23(98):298-305, 1973.
[20] M. Flickner, H. Sawhney, W. Niblack, J. Ashley,
Q. Huang, B. Dom, M. Gorkani, J. Hafner, D. Lee,
D. Petkovic, D. Steele, and P. Yanker. Query by image
and video content: The QBIC system. IEEE
Computer, 28(9):23-32, Sept. 1995.
[21] A. Frieze and C. McDiarmid. Algorithmic theory of
random graphs. Random Structures Algorithms,
10(1-2):5-42, 1997.
[22] M. R. Garey, D. S. Johnson, and L. Stockmeyer. Some
simplified NP-complete graph problems. Theoret.
Comput. Sci., 1(3):237-267, 1976.
[23] G. Golub and C. Reinsch. Handbook for Matrix
Computation II, Linear Algebra. Springer-Verlag,
1971.
[24] D. E. Goldberg. Genetic Algorithms in Search,
Optimization, and Machine Learning.
Addison-Wesley, Reading, Mass., 1989.
[25] G. H. Golub and C. F. Van Loan. Matrix
computations. Johns Hopkins Studies in the
Mathematical Sciences. Johns Hopkins University
Press, Baltimore, MD, 1996.
[26] G. R. Harik and D. E. Goldberg. Learning linkage. In
Foundations of Genetic Algorithms, pages 247-262,
1996.
[27] B. Hendrickson and R. W. Leland. A multi-level
algorithm for partitioning graphs. In Supercomputing
'95: Proceedings of the 1995 ACM/IEEE conference
on Supercomputing, 1995.
[28] J. H. Holland. Adaptation in natural and artificial
systems. University of Michigan Press, Ann Arbor,
Mich., 1975.
[29] D. S. Johnson, C. R. Aragon, L. A. McGeoch, and
C. Schevon. Optimization by simulated annealing: an
experimental evaluation. Part I, graph partitioning.
Oper. Res., 37(6):865-892, 1989.
[30] C. A. Jones. Vertex and edge partitions of graphs. PhD
thesis, Pennsylvania State University, University Park,
PA, USA, 1992.
[31] R. Kannan, S. Vempala, and A. Vetta. On clusterings:
good, bad and spectral. J. ACM, 51(3):497-515
(electronic), 2004.
[32] B. Kernighan and S. Lin. An Efficient Heuristic
Procedure for Partitioning Graphs. Bell Systems
Journal, 49:291-307, 1972.
[33] J.-P. Kim and B.-R. Moon. A hybrid genetic search
for multi-way graph partitioning based on direct
partitioning. In L. S. et al., editor, Proceedings of the
[34]
[35]
[36]
[37]
[38]
[39]
[40]
[41]
[42]
[43]
[44]
[45]
[46]
[47]
[48]
[49]
[50]