
Optimum Design of Mechanical Elements:

Class notes for AME60661


Andrés Tovar and John E. Renaud

May 21, 2010


Contents

I Fundamentals

1 Introduction
1.1 Definition
1.2 History
1.3 Components and formulation
1.3.1 Design variables
1.3.2 Objective function
1.3.3 Design constraints
1.3.4 Formulation
1.4 Classification
1.4.1 Based on the objective function and the constraints
1.4.2 Based on the feasible space
1.4.3 Based on the design variables
1.4.4 Based on the uncertainty
1.4.5 Based on the field of application
1.5 Solution methods

2 Mathematical foundation
2.1 Vector Algebra
2.1.1 Norm of a vector
2.1.2 Dot product
2.1.3 Cross product
2.1.4 Tensor product
2.2 Linear Dependence
2.3 Systems of linear equations
2.4 Eigenvalue problem
2.5 Quadratic forms
2.6 Positive definite matrices
2.6.1 Sylvester's criterion
2.6.2 Eigenvalue criterion
2.7 Sets in Rn
2.8 Functions
2.9 Gradient and Jacobian
2.10 Hessian
2.11 Taylor's theorem
2.11.1 Single-variable formulation
2.11.2 Multivariate formulation
2.12 Matrix calculus
Exercises

3 Numerical foundation
3.1 Numeric differentiation
3.2 Numeric evaluation of gradient
3.3 Numeric evaluation of Hessian
3.4 Matlab programs
3.4.1 Derivative
3.4.2 Gradient
3.4.3 Hessian
Exercises

II Single-variable optimization

4 Analytical elements
4.1 Problem formulation
4.2 Classification of optimal points
4.3 Optimality conditions
4.3.1 Minimum-value theorem
4.3.2 First order necessary condition
4.3.3 Second order sufficient conditions
4.3.4 Higher order conditions
4.4 Convexity
4.4.1 Definition
4.4.2 Properties
4.5 Unimodality
Exercises

5 Basic numerical methods
5.1 Bracketing a three-point pattern
5.1.1 Description
5.1.2 Algorithm
5.2 Fibonacci's method
5.2.1 Description
5.2.2 Algorithm
5.3 Golden section method
5.3.1 Description
5.3.2 Algorithm

6 Curve fitting methods
6.1 Powell's method
6.1.1 Description
6.1.2 Algorithm
6.2 Brent's method*
6.2.1 Description
6.2.2 Algorithm
6.3 Newton's method
6.3.1 Description
6.3.2 Algorithm
6.3.3 Extension
6.4 Secant method
6.4.1 Description
6.4.2 Algorithm
6.4.3 Extension
6.5 Bisection method
6.5.1 Description
6.5.2 Algorithm
6.5.3 Extension
Exercises

7 Numerical Analysis
7.1 Convergence
7.2 Fixed Point Iteration
7.3 Contraction mapping theorem
7.4 Error analysis and order of convergence
Exercises

III Unconstrained multivariate optimization

8 Analytical elements
8.1 Problem formulation
8.2 Optimality conditions
8.2.1 First order necessary condition
8.2.2 Second order sufficient conditions
8.2.3 Higher order conditions
8.3 Convexity
8.3.1 Definition
8.3.2 Properties
Exercises

9 Numerical methods
9.1 Principles
9.1.1 Descent direction
9.1.2 Line search
9.1.3 Termination criteria
9.2 Steepest descent method
9.2.1 Definition
9.2.2 Algorithm
9.2.3 Scaling*
9.3 Conjugate gradient method
9.3.1 Definition
9.3.2 Algorithm
9.4 Newton's method
9.4.1 Definition
9.4.2 Modified Newton's methods
9.4.3 Algorithm
9.5 Quasi-Newton methods
9.5.1 Davidon-Fletcher-Powell (DFP) method
9.5.2 Broyden-Fletcher-Goldfarb-Shanno (BFGS) method
9.6 Trust region methods
9.6.1 Definition
9.6.2 Reliability index
9.6.3 Algorithm
9.7 Least square problems
9.8 Nelder-Mead simplex method*
9.8.1 Description
9.8.2 Algorithm
Exercises

10 Numerical Analysis
10.1 Convergence
10.2 Fixed Point Iteration
10.3 Contraction mapping theorem
10.4 Error analysis and order of convergence
Exercises

IV Constrained multivariate optimization

11 Analytical elements
11.1 Problem formulation
11.2 First order necessary conditions
11.2.1 Equality constraints
11.2.2 Inequality constraints
11.2.3 Equality and inequality constraints
11.3 Second order sufficient conditions
11.4 Convexity
11.5 Postoptimality analysis*
11.5.1 Effect of a perturbation
11.5.2 Effect of changing constraint limits
11.5.3 Effect of objective function scaling on Lagrange multipliers
Exercises

12 Linear Programming
12.1 Standard form
12.2 Basic solutions
12.3 The Simplex method
12.4 Derivation of the Simplex method
12.4.1 Basic solution
12.4.2 Choice of nonbasic variable to become basic
12.4.3 Choice of basic variable to become nonbasic
12.4.4 Lagrange multipliers
12.5 The two phases of the Simplex method
12.6 The Big M method
12.7 Duality
Exercises

13 Nonlinear programming
13.1 Quadratic programming
13.2 Zoutendijk's method of feasible directions
13.2.1 Search direction
13.2.2 Standard form
13.2.3 Step size
13.2.4 Initial feasible point
13.2.5 Equality constraints
13.2.6 Algorithm
13.3 Generalized reduced gradient method
13.3.1 Search direction
13.3.2 Step size
13.3.3 Correction
13.3.4 Algorithm
13.4 Sequential Linear Programming
13.4.1 Description
13.4.2 Algorithm
13.5 Sequential quadratic programming
13.5.1 Equivalence with Newton's method
13.5.2 Search direction
13.5.3 Practical implementation
13.5.4 Line search

V Integer Programming

14 Numerical methods
14.1 Implicit enumeration method
14.2 Branch and Bound method
14.2.1 BIP problems
14.2.2 MIP problems

15 Modeling
15.1 Binary approximation of integer variables
15.2 Binary polynomial programming

16 Applications
16.1 Classic problems
16.1.1 Suitcase problem
16.1.2 Class scheduling problem
16.1.3 Traveling salesman problem
16.2 Transportation and networks
16.2.1 Terminology
16.2.2 Transportation problem
16.2.3 Assignment problem
16.2.4 Minimum distance problem

VI Global optimization

17 Genetic algorithms
17.1 Description
17.2 Components
17.3 Algorithm
17.4 Implementation
17.4.1 Test problem
17.4.2 Initial population
17.4.3 Selection
17.4.4 Crossover
17.4.5 Mutation
17.4.6 Matlab code

18 Simulated Annealing
18.1 Description
18.2 Components
18.3 Algorithm

19 More global optimization methods*
19.1 Other stochastic methods*
19.2 Deterministic methods*

VII Multiobjective Optimization

20 Pareto Optimality
20.1 Problem statement
20.2 Concepts
20.2.1 Efficiency
20.2.2 Dominance
20.2.3 Pareto optimal
20.3 Generation of the Pareto frontier
20.4 Single best compromise Pareto solution
20.4.1 Utopia point
20.4.2 The minimax method
Part I

Fundamentals

Chapter 1

Introduction

This manuscript contains the class notes for Optimum Design of Mechanical Elements (AME60661), an introductory, graduate-level course in applied optimization offered to engineering students at the University of Notre Dame, Indiana. The course is divided into five main sections: mathematical foundations, gradient-based unconstrained optimization, gradient-based constrained optimization, binary and integer programming, and methods for global optimization (stochastic and deterministic). The objective is to present the most relevant optimization strategies applicable to a large variety of problems in engineering. The notes also provide the elements necessary to understand the relevant literature in the field and to develop new optimum design methodologies.

1.1 Definition
The word optimization comes from the Latin optimum, which means the best. Design optimization is the process of finding the design that provides the best possible value of an objective function within the available means. Optimization occurs at all times in nature. It has been observed that ants, for example, minimize the energy required to obtain nutrients by communicating through pheromones. Predators look for easy prey, such as the young or ill, to minimize the energy required to obtain their food. The equilibrium configuration of any solid is the one that minimizes its total potential energy. Bones are thought to minimize the use of calcium while maximizing their mechanical stability. Plants and trees maximize their sun-exposed area and roots while maintaining their mechanical performance against the forces of the environment.

Humankind has performed optimization since its first days on Earth and continues to do so throughout life. The invention of the wheel is an optimum solution to minimize drag during transportation. The process of learning how to walk or how to speak is an optimization process that facilitates mobility and communication while minimizing energy.
Optimization methods in engineering are also referred to as mathematical programming techniques.
These methods comprise a large area of applied mathematics. These notes deal with the theory and applica-
tions of mathematical programming techniques suitable for the solution of design problems in engineering.

In a broad sense, optimization methods can be applied to any engineering problem. Some remarkable exam-
ples include the design of aerospace structures for minimum weight, civil structures for maximum reliability,
automotive structures for maximum occupant protection, electrical networks for minimum power loss, con-
sumer products for minimum cost, trajectories for minimum distance, control for maximum stability, and
machinery for maximum efficiency, among many others.

1.2 History
The history of optimization comes along with the history of mathematics and can be traced to the days
of Isaac Newton (1643–1727), Joseph-Louis Lagrange (1736–1813), Augustin Louis Cauchy (1789–1857),
and Gottfried Leibniz (1646 – 1716). The development of differential calculus for optimization is usually
credited to the contributions of Newton and Leibniz. The foundations of calculus of variations were laid
out by Bernoulli, Euler, Lagrange, and Weierstrass. The use of gradient-based methods were first presented
by Cauchy (Cauchy, 1847) and Gauss. Lagrange proposed the method to solve constrained optimization
problems with the addition of unknown variables.
Modern optimization methods were introduced in the 1940s during War World II. During those years, the
term mathematical programming was used instead of optimization. Programming made reference to military
logistics and not to computers (Dantzig, 1951). ENIAC or Electronic Numerical Integrator And Computer
was the first Turing-complete, digital computer capable of being reprogrammed (Goldstine, 1972). It was
developed at about the same year that the Simplex method was presented by George Dantzig (1914–2005)
in 1947. William Karush (1917–1997), Harold William Kuhn (1925–), and Albert William Tucker (1905 –
1995) are known for the Karush-Kuhn-Tucker (KKT) conditions, a basic result in non-linear programming
presented in 1939 by Karush and in 1951 by Kuhn and Tucker. In 1960, Land and Doig presented branch and
bound methods. In 1964, Fletcher and Reeves presented the conjugate gradient method. In 1959 and 1963,
Davidon, Fletcher and Powell presented the conjugate direction (or DFP) method. In 1960, constrained
optimization methods were pioneered by Rose’s gradient projection method and Zoutendijk’s method of
feasible directions. In 1966, Abadie, Carpentier and Hensgen presented the generalized reduced gradient
(GRG) method. In 1971, Brent and others introduced the polynomial methods that were an improvement
to the ones using the Fibonacci numbers and the golden ratio. Sequential quadratic programming (SQP)
methods were developed in the 1970s by Biggs (1975), Han (1977), and Powell (1978) among others.
A new class of probabilistic methods for global optimization has been developed in the last decades.
In 1975, Holland introduced the genetic algorithms (GAs). Simulated annealing (SA) was independently
introduced by Kirkpatrick et al. (1983) and C̆erný (1985). Later, Dorigo (1992) presented the Ant Colony
Optimization (ACO) method. More recently, Kennedy & Eberhart (1995) introduced the particle swarm
optimization (PSO).

1.3 Components and formulation
A design optimization problem has three basic components: an objective or merit function, a set of design variables, and a set of constraints.

1.3.1 Design variables

An engineering system is defined by a set of quantities (e.g., material, dimensions, strength). The fixed quantities are referred to as design parameters, while the independent ones are called design variables. The design variables are represented by the vector $\mathbf{x}$, where
$$\mathbf{x} = \begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{pmatrix}. \tag{1.1}$$
The set of all design variables defines the design space.

1.3.2 Objective function

The criterion used to decide whether or not a design is optimum is known as the merit, cost, or objective function. This function is represented as $f(\mathbf{x})$. During the optimization process, the objective function is to be maximized or minimized according to the requirements of the engineering system.

1.3.3 Design constraints

The conditions to be satisfied by the design variables are expressed as design constraints in the optimization problem. These constraints can be of two types: equality constraints and inequality constraints. Inequality constraints can be expressed as $\mathbf{g}(\mathbf{x}) \le \mathbf{0}$, where
$$\mathbf{g}(\mathbf{x}) = \begin{pmatrix} g_1(\mathbf{x}) \\ g_2(\mathbf{x}) \\ \vdots \\ g_r(\mathbf{x}) \end{pmatrix}. \tag{1.2}$$
In the same way, equality constraints can be expressed as $\mathbf{h}(\mathbf{x}) = \mathbf{0}$, where
$$\mathbf{h}(\mathbf{x}) = \begin{pmatrix} h_1(\mathbf{x}) \\ h_2(\mathbf{x}) \\ \vdots \\ h_m(\mathbf{x}) \end{pmatrix}. \tag{1.3}$$
Constraints that represent limitations on the performance of the system are called functional constraints. Constraints that represent limitations on the design variables themselves are called geometric constraints. In general, geometric constraints can be written as
$$\mathbf{x}^L \le \mathbf{x} \le \mathbf{x}^U. \tag{1.4}$$
The set of all constraints defines the feasible space.

1.3.4 Formulation

The design optimization process seeks to maximize or minimize the objective function by systematically finding solutions for the design variables in the feasible space. This optimization problem can be written as
$$\text{find } \mathbf{x} \text{ that minimizes } f(\mathbf{x}) \quad \text{subject to} \quad \mathbf{g}(\mathbf{x}) \le \mathbf{0}, \;\; \mathbf{h}(\mathbf{x}) = \mathbf{0}, \;\; \mathbf{x}^L \le \mathbf{x} \le \mathbf{x}^U, \tag{1.5}$$
or simply
$$\begin{aligned} \min_{\mathbf{x}} \;\; & f(\mathbf{x}) \\ \text{s.t.} \;\; & \mathbf{g}(\mathbf{x}) \le \mathbf{0} \\ & \mathbf{h}(\mathbf{x}) = \mathbf{0} \\ & \mathbf{x}^L \le \mathbf{x} \le \mathbf{x}^U. \end{aligned} \tag{1.6}$$
In some contexts, this problem is also formulated as
$$\min_{\mathbf{x}} f(\mathbf{x}) \quad \text{s.t.} \quad \mathbf{x} \in \Omega, \tag{1.7}$$
where $\Omega$ represents the feasible space. A simpler notation refers to the optimum point $\mathbf{x}^*$ as the argument that minimizes the objective function $f(\mathbf{x})$, this is
$$\mathbf{x}^* = \arg\min_{\mathbf{x}} f(\mathbf{x}) \quad \text{s.t.} \quad \mathbf{x} \in \Omega, \tag{1.8}$$
or simply,
$$\mathbf{x}^* = \arg\min_{\mathbf{x} \in \Omega} f(\mathbf{x}). \tag{1.9}$$
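As a practical note, formulation (1.6) maps directly onto general-purpose solvers. The following MATLAB lines are a minimal sketch using fmincon from the Optimization Toolbox; the particular objective, constraints, bounds, and starting point are illustrative placeholders, not a problem taken from these notes:

f  = @(x) x(1)^2 + x(2)^2;           % objective f(x) to be minimized (placeholder)
g  = @(x) 1 - x(1) - x(2);           % inequality constraint g(x) <= 0 (placeholder)
h  = @(x) x(1) - 2*x(2);             % equality constraint h(x) = 0 (placeholder)
nonlcon = @(x) deal(g(x), h(x));     % fmincon expects [g, h] from one function
xL = [-5; -5];  xU = [5; 5];         % geometric constraints xL <= x <= xU
x0 = [1; 1];                         % initial design
xstar = fmincon(f, x0, [], [], [], [], xL, xU, nonlcon);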

1.4 Classification
Optimization problems might be classified according to different criteria.

1.4.1 Based on the objective function and the constraints

According to the nature of the objective and constraint functions, optimization problems can be classified into linear programming (LP) and non-linear programming (NLP) problems. For instance, if the objective and the constraints are linear, the problem is referred to as a linear programming (LP) problem. When the objective is quadratic and the constraints are linear, the problem is a quadratic programming (QP) problem.

An example of an LP problem is
$$\begin{aligned} \min_{\mathbf{x}} \;\; & f(\mathbf{x}) = x_1 + 2x_2 + 3x_3 \\ \text{s.t.} \;\; & h_1(\mathbf{x}) = x_1 - 3x_2 + 2 = 0 \\ & g_1(\mathbf{x}) = x_1 - 2x_3 \le 0 \\ & g_2(\mathbf{x}) = 2x_2 + x_3 - 6 \le 0. \end{aligned}$$
An example of a QP problem is
$$\begin{aligned} \min_{\mathbf{x}} \;\; & f(\mathbf{x}) = x_1^2 - x_2 x_3 + 2x_1 + 3 \\ \text{s.t.} \;\; & h_1(\mathbf{x}) = x_1 - x_2 = 0 \\ & g_1(\mathbf{x}) = 2x_2 + x_3 - 5 \le 0. \end{aligned}$$
1.4.2 Based on the feasible space
According to the definition of the feasible space, optimization problems can be constrained or unconstrained. Unconstrained problems do not have any design constraints, so the feasible space is the whole design space.

1.4.3 Based on the design variables


According to the design variables: if they are required to be integers, the problem is called integer programming (IP). If they are required to be binary, the problem is called binary integer programming (BIP) or, simply, binary programming (BP). When only some of the design variables are required to be integers, the problem is called mixed integer programming (MIP). In many engineering applications, the design variables are to be chosen from a discrete finite set; in this case the problem is called discrete programming (DP).

When the design variables are functions of an independent variable, the problem is referred to as a dynamic optimization problem. Otherwise, the problem is a static optimization problem (a term that is not frequently used).

1.4.4 Based on the uncertainty


Most applications consider cases in which the problem is described by deterministic variables and functions; such a problem is referred to as a deterministic optimization problem. However, in engineering applications uncertainty is always present. Nondeterministic or probabilistic problems can be classified into two main groups: reliability-based design optimization (RBDO) problems and robust design optimization (RDO) problems. The combination of these methods is referred to as reliability-based robust design optimization (RBRDO).

1.4.5 Based on the field of application


Special applications of optimization methods in engineering generate specialized fields of study. Multidisciplinary design optimization (MDO) and collaborative optimization (CO), for example, refer to problems that incorporate different disciplines in the solution of an optimization problem. Structural optimization (SO) refers to the design of structures, and some of these techniques involve the use of finite-element (FE) analysis. Within FE-based structural optimization there are three types of problems: parameter optimization, shape optimization, and topology optimization. These problems are beyond the scope of this course but are covered in an intermediate graduate-level course.

1.5 Solution methods


Optimization problems can be solved analytically or numerically. Analytical methods refer to formal mathe-
matical procedures which lead to exact solutions. Numerical methods refer to the use of computer algorithms
that lead to approximate solutions.
According to the nature of the numerical algorithm, optimization techniques are typically classified
into gradient-based and direct methods. Gradient-based methods make use of first and/or second-order
derivatives (i.e., gradients and Hessians). The process of finding gradients or first-order derivatives is called
sensitivity analysis. The gradient information shows the direction to find an optimum solution. On the
other hand, direct methods do not require sensitivity analysis. These methods explore the design space in a
systematic way. Gradient-based methods lead to local optima that might vary according to the initial design.
Direct methods are usually employed when searching for global optima; however, the number of iterations
required makes them prohibitive for large-scale problems.
Methods that incorporate the optimality conditions, also referred to as Karush-Kuhn-Tucker (KKT) conditions, into their algorithm are named optimality criterion (OC) methods. These methods are gradient-free in the sense that no numeric sensitivity analysis is required. OC methods are very popular in structural optimization.

Methods based on the experience of the designer are referred to as heuristic methods. Usually, these methods lack a mathematical foundation and rely on educated guesses, intuitive judgments, or common sense. Meta-heuristic methods are mixed approaches that combine heuristics and mathematical analyses.

Exercises
1. Answer true (T) or false (F):

(a) In Latin, the word optimum means minimum.


(b) Optimization process is also known as engineering design process.
(c) A constraint is a merit function that has to be minimized or driven to zero.
(d) The feasible space is defined by functional and geometric constraints.
(e) Optimization problems are also referred to as mathematical programming problems.
(f) Geometric constraints are related to the performance of the system.
(g) The feasible space is always contained in the design space.
(h) An integer programming problem is the same as a discrete programming problem.
(i) A direct method does not require derivatives.
(j) In a quadratic programming problem the functions that define the objective and the functional
constraints are quadratic.

2. You want to find the length and height of a rectangle of perimeter P0 that has the maximum area.

(a) Identify the design variables.


(b) State the objective function.
(c) State the functional and geometric constraints.

3. Consider an engineering problem in which you have to design a rectangular box of volume V0 with
the smallest surface area.

(a) Identify the design variables.


(b) State the objective function.
(c) State the functional and geometric constraints.

4. Consider an engineering problem in which you have to design a cylindrical can (including bottom and
top) of volume V0 with the smallest surface area.

(a) Identify the design variables.
(b) State the objective function.
(c) State the functional and geometric constraints.

5. You want to find the minimum distance between the points on the line $y = x - 1$ and the points on the curve $y = x^2 + 1$.

(a) Identify the design variables.


(b) State the objective function.
(c) State the functional and geometric constraints.

Chapter 2

Mathematical foundation

This chapter reviews selected topics in real and numerical analysis. This material is covered in detail in texts
such as Atkinson (1978), Bartle (1976), and Greenberg (1998).

2.1 Vector Algebra


2.1.1 Norm of a vector
The norm of a vector $\mathbf{x} \in \mathbb{R}^n$, denoted $\|\mathbf{x}\|$, satisfies the following conditions:

• $\|\mathbf{x}\| > 0$ for all $\mathbf{x} \ne \mathbf{0}$, and $\|\mathbf{x}\| = 0$ if and only if $\mathbf{x} = \mathbf{0}$.

• $\|\alpha \mathbf{x}\| = |\alpha| \, \|\mathbf{x}\|$ for all $\alpha \in \mathbb{R}$.

• $\|\mathbf{x}_1 + \mathbf{x}_2\| \le \|\mathbf{x}_1\| + \|\mathbf{x}_2\|$ (triangle inequality).

The natural norm or Euclidean norm of $\mathbf{x}$ is defined as
$$\|\mathbf{x}\|_2 = \sqrt{x_1^2 + x_2^2 + \cdots + x_n^2}, \tag{2.1}$$
where the subindex 2 denotes Euclidean space. In most cases this is well understood, so the subindex is not used. This norm is referred to as the $L^2$ norm. The norm in non-Euclidean spaces, referred to as the $L^p$ norm, is defined as
$$\|\mathbf{x}\|_p = \left(|x_1|^p + |x_2|^p + \cdots + |x_n|^p\right)^{1/p}, \quad p \ge 1. \tag{2.2}$$
The $L^1$ norm, also known as the taxicab norm or Manhattan norm, is defined as
$$\|\mathbf{x}\|_1 = |x_1| + |x_2| + \cdots + |x_n|. \tag{2.3}$$
The $L^\infty$ norm is defined as
$$\|\mathbf{x}\|_\infty = \max_i \{|x_i|\}, \quad i = 1, \ldots, n. \tag{2.4}$$
All norms are equivalent in the sense that each one is bounded by a multiple of another. In fact,
$$\|\mathbf{x}\|_\infty \le \|\mathbf{x}\|_1 \le n \|\mathbf{x}\|_\infty, \tag{2.5}$$
$$\|\mathbf{x}\|_\infty \le \|\mathbf{x}\|_2 \le \sqrt{n} \, \|\mathbf{x}\|_\infty, \tag{2.6}$$
$$\|\mathbf{x}\|_\infty \le \|\mathbf{x}\|_3 \le n^{1/3} \|\mathbf{x}\|_\infty, \tag{2.7}$$
and so on. Notice that for every norm $\|\mathbf{x}\| = |x|$ when $\mathbf{x} \in \mathbb{R}^1$. The concepts presented in these notes refer to Euclidean vector spaces.

Example. The $L^1$, $L^2$, $L^3$, and $L^\infty$ norms of $\mathbf{x}^T = (1, 2, -3)$ are

• $\|\mathbf{x}\|_1 = |1| + |2| + |-3| = 6$

• $\|\mathbf{x}\|_2 = \sqrt{1^2 + 2^2 + (-3)^2} = \sqrt{14} \approx 3.7416$

• $\|\mathbf{x}\|_3 = \left(|1|^3 + |2|^3 + |-3|^3\right)^{1/3} = 36^{1/3} = 6^{2/3} \approx 3.3019$

• $\|\mathbf{x}\|_\infty = \max\{|1|, |2|, |-3|\} = 3$
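These values can be verified numerically. A minimal MATLAB check of the example above (the built-in norm function accepts any $p \ge 1$):

x = [1; 2; -3];
norm(x, 1)      % L1 norm:   6
norm(x, 2)      % L2 norm:   sqrt(14), approximately 3.7416
norm(x, 3)      % L3 norm:   36^(1/3), approximately 3.3019
norm(x, Inf)    % Linf norm: 3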

2.1.2 Dot product


The dot product (or scalar product or inner product) of two vectors $\mathbf{x}_1, \mathbf{x}_2 \in \mathbb{R}^n$ is defined as the sum of the products of their components, this is
$$\mathbf{x}_1 \cdot \mathbf{x}_2 = \mathbf{x}_1^T \mathbf{x}_2 = x_{11} x_{21} + x_{12} x_{22} + \cdots + x_{1n} x_{2n}. \tag{2.8}$$
On some occasions, the notation $\mathbf{x}_1^T \mathbf{x}_2$ is used to represent the dot product, where the superindex $T$ denotes transpose. Notice that the dot product is commutative,
$$\mathbf{x}_1^T \mathbf{x}_2 = \mathbf{x}_2^T \mathbf{x}_1. \tag{2.9}$$
Also notice that
$$\|\mathbf{x}\|_2^2 = \mathbf{x} \cdot \mathbf{x} = \mathbf{x}^T \mathbf{x}. \tag{2.10}$$

Example. The dot product of $\mathbf{x}_1 = (1, -2, 3)^T$ and $\mathbf{x}_2 = (-4, 5, -6)^T$ is
$$\mathbf{x}_1^T \mathbf{x}_2 = (1)(-4) + (-2)(5) + (3)(-6) = -32.$$

It is possible to prove that
$$|\mathbf{x}_1^T \mathbf{x}_2| \le \|\mathbf{x}_1\| \, \|\mathbf{x}_2\|. \tag{2.11}$$
This is referred to as the Cauchy-Schwarz inequality.
2.1.3 Cross product

The cross product (or vector product or exterior product) of two vectors $\mathbf{x}_1, \mathbf{x}_2 \in \mathbb{R}^3$ is defined as
$$\begin{pmatrix} x_{11} \\ x_{12} \\ x_{13} \end{pmatrix} \times \begin{pmatrix} x_{21} \\ x_{22} \\ x_{23} \end{pmatrix} = \begin{pmatrix} x_{12} x_{23} - x_{13} x_{22} \\ x_{13} x_{21} - x_{11} x_{23} \\ x_{11} x_{22} - x_{12} x_{21} \end{pmatrix}. \tag{2.12}$$
In a Euclidean space this results in another vector, which is perpendicular to the plane containing the two input vectors.

Example. The cross product of $\mathbf{x}_1 = (1, -2, 3)^T$ and $\mathbf{x}_2 = (-4, 5, -6)^T$ is $\mathbf{x}_3 = (-3, -6, -3)^T$.

2.1.4 Tensor product

The tensor product (or matrix product or outer product) of two vectors $\mathbf{x}_1 \in \mathbb{R}^n$ and $\mathbf{x}_2 \in \mathbb{R}^m$ results in a matrix in $\mathbb{R}^{n \times m}$ defined as
$$\mathbf{x}_1 \otimes \mathbf{x}_2 = \mathbf{x}_1 \mathbf{x}_2^T = \begin{pmatrix} x_{11} x_{21} & x_{11} x_{22} & \cdots & x_{11} x_{2m} \\ x_{12} x_{21} & x_{12} x_{22} & \cdots & x_{12} x_{2m} \\ \vdots & \vdots & \ddots & \vdots \\ x_{1n} x_{21} & x_{1n} x_{22} & \cdots & x_{1n} x_{2m} \end{pmatrix}. \tag{2.13}$$

Example. The outer product of $\mathbf{x}_1^T = (1, -2, 3)$ and $\mathbf{x}_2^T = (-4, 5)$ is
$$\mathbf{A} = \begin{pmatrix} -4 & 5 \\ 8 & -10 \\ -12 & 15 \end{pmatrix}.$$
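All three products can be reproduced in MATLAB; a short sketch using the vectors from the examples above:

x1 = [1; -2; 3];
x2 = [-4; 5; -6];
x1' * x2            % dot product: -32 (equivalently dot(x1, x2))
cross(x1, x2)       % cross product: [-3; -6; -3]
y = [-4; 5];
x1 * y'             % outer (tensor) product: the 3-by-2 matrix A above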

2.2 Linear Dependence


A set of vectors $\{\mathbf{x}_1, \ldots, \mathbf{x}_m\}$ is said to be linearly dependent (LD) if at least one of them can be expressed as a linear combination of the others. If none can be so expressed, then the set is linearly independent (LI). From this, the set of vectors is LI if and only if the linear combination
$$\alpha_1 \mathbf{x}_1 + \cdots + \alpha_m \mathbf{x}_m = \mathbf{0} \tag{2.14}$$
implies that all $\alpha_i$ are zero. To prove this, assume that some $\alpha_k \ne 0$. Then one can divide by $\alpha_k$ and express $\mathbf{x}_k$ in terms of the other vectors, this is
$$\mathbf{x}_k = -\frac{\alpha_1}{\alpha_k} \mathbf{x}_1 - \cdots - \frac{\alpha_m}{\alpha_k} \mathbf{x}_m, \tag{2.15}$$
where the term in $\mathbf{x}_k$ is excluded from the right-hand side. Clearly, this condition precludes linear independence. When two vectors are LD, (2.11) holds with equality.

Example. Determine if the following vectors are LI:
$$\mathbf{x}_1 = \begin{pmatrix} 1 \\ 0 \end{pmatrix}, \quad \mathbf{x}_2 = \begin{pmatrix} 1 \\ 1 \end{pmatrix}, \quad \mathbf{x}_3 = \begin{pmatrix} 5 \\ 4 \end{pmatrix}.$$
By inspection, $\mathbf{x}_3 = 4\mathbf{x}_2 + \mathbf{x}_1$. Therefore the vectors are LD.

Example. Determine if the following vectors are LI:
$$\mathbf{x}_1 = \begin{pmatrix} 2 \\ 0 \\ 1 \\ -3 \end{pmatrix}, \quad \mathbf{x}_2 = \begin{pmatrix} 0 \\ 1 \\ 1 \\ 1 \end{pmatrix}, \quad \mathbf{x}_3 = \begin{pmatrix} 2 \\ 2 \\ 3 \\ 0 \end{pmatrix}.$$
LI requires that if $\alpha_1 \mathbf{x}_1 + \alpha_2 \mathbf{x}_2 + \alpha_3 \mathbf{x}_3 = \mathbf{0}$, then all $\alpha_k = 0$. In matrix form this is
$$\begin{pmatrix} 2 & 0 & 2 \\ 0 & 1 & 2 \\ 1 & 1 & 3 \\ -3 & 1 & 0 \end{pmatrix} \begin{pmatrix} \alpha_1 \\ \alpha_2 \\ \alpha_3 \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \\ 0 \\ 0 \end{pmatrix},$$
or, as an augmented matrix,
$$\left(\begin{array}{ccc|c} 2 & 0 & 2 & 0 \\ 0 & 1 & 2 & 0 \\ 1 & 1 & 3 & 0 \\ -3 & 1 & 0 & 0 \end{array}\right).$$
Using row operations,
$$\left(\begin{array}{ccc|c} 1 & 0 & 1 & 0 \\ 0 & 1 & 2 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 0 \end{array}\right).$$
Therefore $\alpha_1 = 0$, $\alpha_2 = 0$, and $\alpha_3 = 0$. In conclusion, the set of vectors is LI.

An alternative solution is obtained by pre-multiplying the equation $\mathbf{A}\boldsymbol{\alpha} = \mathbf{0}$ by $\mathbf{A}^T$. In this way,
$$\mathbf{A}^T \mathbf{A} \boldsymbol{\alpha} = \mathbf{A}^T \mathbf{0} = \mathbf{0}.$$
If $\det(\mathbf{A}^T \mathbf{A}) \ne 0$ then $\boldsymbol{\alpha} = \mathbf{0}$ and the set is LI; otherwise the set is LD.

For any matrix $\mathbf{A}$, the number of LI row vectors is equal to the number of LI column vectors, and these in turn equal the rank of $\mathbf{A}$. Thus, if one wishes to determine how many vectors in a given set $\mathbf{x}_1, \ldots, \mathbf{x}_k$ are LI, one can form a matrix $\mathbf{A}$ with $\mathbf{x}_1, \ldots, \mathbf{x}_k$ as the rows (or columns) and then use elementary row operations to determine the rank of $\mathbf{A}$, or $\operatorname{rank}(\mathbf{A})$.

Example. How many LI vectors are contained in the following set of vectors?
$$\mathbf{x}_1 = \begin{pmatrix} 2 \\ 2 \\ 0 \\ 2 \end{pmatrix}, \quad \mathbf{x}_2 = \begin{pmatrix} 1 \\ 4 \\ 3 \\ 1 \end{pmatrix}, \quad \mathbf{x}_3 = \begin{pmatrix} -3 \\ -2 \\ 1 \\ -3 \end{pmatrix}, \quad \mathbf{x}_4 = \begin{pmatrix} 4 \\ 5 \\ 3 \\ -2 \end{pmatrix}.$$
Constructing the matrix $\mathbf{A} = (\mathbf{x}_1 \; \mathbf{x}_2 \; \mathbf{x}_3 \; \mathbf{x}_4)$, this is
$$\mathbf{A} = \begin{pmatrix} 2 & 1 & -3 & 4 \\ 2 & 4 & -2 & 5 \\ 0 & 3 & 1 & 3 \\ 2 & 1 & -3 & -2 \end{pmatrix};$$
after elementary row operations one obtains
$$\mathbf{A} \to \begin{pmatrix} 1 & 1/2 & -3/2 & 2 \\ 0 & 1 & 1/3 & 1/3 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & 0 \end{pmatrix},$$
so $\operatorname{rank}(\mathbf{A}) = 3$. Hence, there are three LI vectors in the set. Another way to put it is $\dim[\operatorname{span}\{\mathbf{x}_1, \mathbf{x}_2, \mathbf{x}_3, \mathbf{x}_4\}] = 3$.
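In practice, the rank computation of this example can be delegated to MATLAB; a brief sketch:

A = [2 1 -3  4;
     2 4 -2  5;
     0 3  1  3;
     2 1 -3 -2];   % columns are x1, x2, x3, x4
rank(A)            % returns 3: only three of the four vectors are LI
rref(A)            % the reduced row echelon form exposes the dependency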

2.3 Systems of linear equations


A system of linear equations can be expressed as
$$\mathbf{A}\mathbf{x} = \mathbf{b}, \tag{2.16}$$
where the matrix $\mathbf{A} \in \mathbb{R}^{m \times n}$ maps the vector $\mathbf{x} \in \mathbb{R}^n$ into $\mathbf{b} \in \mathbb{R}^m$. The system has at least one solution when $\mathbf{b}$ is a linear combination of the column vectors of $\mathbf{A}$, in other words, when $\mathbf{b}$ belongs to the span of $\mathbf{A}$.

The system is consistent if it has at least one solution. In this case
$$\operatorname{rank}(\mathbf{A}) = \operatorname{rank}(\mathbf{A}, \mathbf{b}).$$
The system is inconsistent if it has no solution. In this case
$$\operatorname{rank}(\mathbf{A}) < \operatorname{rank}(\mathbf{A}, \mathbf{b}).$$
The system is underdetermined if it has an infinite number of solutions. In this case
$$\operatorname{rank}(\mathbf{A}) = \operatorname{rank}(\mathbf{A}, \mathbf{b}) < n.$$
The system is uniquely determined if it has a unique solution. In this case
$$\operatorname{rank}(\mathbf{A}) = \operatorname{rank}(\mathbf{A}, \mathbf{b}) = n.$$
In the special case in which $m = n = \operatorname{rank}(\mathbf{A}) = \operatorname{rank}(\mathbf{A}, \mathbf{b})$, the unique solution can be expressed as
$$\mathbf{x} = \mathbf{A}^{-1} \mathbf{b}. \tag{2.17}$$
The system is said to be overdetermined when it has more equations than unknowns, this is $m > n$. An overdetermined system can be consistent or inconsistent.

Example. Solve for $\mathbf{x}$ in the following systems of linear equations:

(a) $\begin{pmatrix} 1 & 2 \\ 3 & 4 \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \end{pmatrix} = \begin{pmatrix} 1 \\ 1 \end{pmatrix}$

(b) $\begin{pmatrix} 1 & 2 \\ 2 & 4 \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \end{pmatrix} = \begin{pmatrix} 1 \\ 2 \end{pmatrix}$

(c) $\begin{pmatrix} 1 & 2 \\ 2 & 4 \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \end{pmatrix} = \begin{pmatrix} 2 \\ 2 \end{pmatrix}$

(d) $\begin{pmatrix} 1 & 2 \\ 3 & 4 \\ 5 & 6 \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \end{pmatrix} = \begin{pmatrix} 1 \\ 2 \\ 3 \end{pmatrix}$

(e) $\begin{pmatrix} 1 & 2 \\ 3 & 4 \\ 5 & 6 \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \end{pmatrix} = \begin{pmatrix} 1 \\ 2 \\ 5 \end{pmatrix}$

In every case, let us compare $\operatorname{rank}(\mathbf{A})$ and $\operatorname{rank}(\mathbf{A}, \mathbf{b})$ to determine whether the system is consistent or inconsistent.

(a) $\operatorname{rank}(\mathbf{A}) = \operatorname{rank}(\mathbf{A}, \mathbf{b}) = 2$, so the system is consistent. In fact, since $n = 2$ the system is uniquely determined and the solution is given by
$$\begin{pmatrix} x_1 \\ x_2 \end{pmatrix} = \begin{pmatrix} 1 & 2 \\ 3 & 4 \end{pmatrix}^{-1} \begin{pmatrix} 1 \\ 1 \end{pmatrix} = \begin{pmatrix} -1 \\ 1 \end{pmatrix}.$$

(b) $\operatorname{rank}(\mathbf{A}) = \operatorname{rank}(\mathbf{A}, \mathbf{b}) = 1$, so the system is consistent. However, $n = m = 2 > 1$, so $\mathbf{A}$ is not invertible, the system is underdetermined, and it has an infinite number of solutions. After row operations are performed on the extended matrix $(\mathbf{A}, \mathbf{b})$, these solutions can be expressed as
$$\begin{pmatrix} x_1 \\ x_2 \end{pmatrix} = \begin{pmatrix} 1 \\ 0 \end{pmatrix} + \alpha \begin{pmatrix} -2 \\ 1 \end{pmatrix}.$$

(c) $\operatorname{rank}(\mathbf{A}) = 1$ and $\operatorname{rank}(\mathbf{A}, \mathbf{b}) = 2$, so the system is inconsistent and has no solution. This can also be observed from row operations on the extended matrix $(\mathbf{A}, \mathbf{b})$,
$$\left(\begin{array}{cc|c} 1 & 2 & 2 \\ 2 & 4 & 2 \end{array}\right) \to \left(\begin{array}{cc|c} 1 & 2 & 2 \\ 0 & 0 & -1 \end{array}\right),$$
where the row $0 = -1$ is inconsistent.

(d) $\operatorname{rank}(\mathbf{A}) = \operatorname{rank}(\mathbf{A}, \mathbf{b}) = 2$, so the system is consistent. Since $n = 2$ the system is uniquely determined, and the solution is found by row operations on the extended matrix $(\mathbf{A}, \mathbf{b})$,
$$\left(\begin{array}{cc|c} 1 & 2 & 1 \\ 3 & 4 & 2 \\ 5 & 6 & 3 \end{array}\right) \to \left(\begin{array}{cc|c} 1 & 2 & 1 \\ 0 & 2 & 1 \\ 0 & 0 & 0 \end{array}\right).$$
The unique solution is given by
$$\begin{pmatrix} x_1 \\ x_2 \end{pmatrix} = \begin{pmatrix} 0 \\ 0.5 \end{pmatrix}.$$

(e) $\operatorname{rank}(\mathbf{A}) = 2$ and $\operatorname{rank}(\mathbf{A}, \mathbf{b}) = 3$, so the system is inconsistent and has no solution.
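The same rank tests can be carried out numerically. A minimal MATLAB sketch for cases (a) and (c) above:

% case (a): square and uniquely determined
A = [1 2; 3 4];  b = [1; 1];
x = A \ b                    % returns [-1; 1]
% case (c): rank(A) < rank([A b]) flags an inconsistent system
A = [1 2; 2 4];  b = [2; 2];
rank(A), rank([A b])         % returns 1 and 2: no solution exists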

2.4 Eigenvalue problem


In linear algebra, given a matrix $\mathbf{A} \in \mathbb{R}^{n \times n}$, the eigenvalue problem is to find a vector $\mathbf{e}$ such that $\mathbf{A}\mathbf{e}$ is a multiple of $\mathbf{e}$. This is
$$\mathbf{A}\mathbf{e} = \lambda \mathbf{e}, \tag{2.18}$$
where $\mathbf{e}$ is referred to as an eigenvector and $\lambda$ as the eigenvalue. Premultiplying the right-hand side of (2.18) by the identity matrix $\mathbf{I}$ yields
$$\mathbf{A}\mathbf{e} = \lambda \mathbf{I} \mathbf{e}, \tag{2.19}$$
therefore
$$(\mathbf{A} - \lambda \mathbf{I})\mathbf{e} = \mathbf{0}. \tag{2.20}$$
The eigenvalue problem has only the trivial solution $\mathbf{e} = \mathbf{0}$ if $\det(\mathbf{A} - \lambda \mathbf{I}) \ne 0$, and non-trivial solutions if and only if
$$\det(\mathbf{A} - \lambda \mathbf{I}) = 0. \tag{2.21}$$
This algebraic equation in $\lambda$ is known as the characteristic equation corresponding to the matrix $\mathbf{A}$, and its roots are the eigenvalues of $\mathbf{A}$. The algebraic multiplicity of an eigenvalue $\lambda$, represented as $\operatorname{mul}_{\mathbf{A}} \lambda$, corresponds to the multiplicity of the corresponding root of the characteristic equation. The geometric multiplicity of an eigenvalue $\lambda$, represented as $\operatorname{gmul}_{\mathbf{A}} \lambda$, is defined as the dimension of the associated eigenspace, i.e., the number of linearly independent eigenvectors with that eigenvalue. One can observe that the algebraic multiplicity of an eigenvalue is greater than or equal to its geometric multiplicity, this is
$$\operatorname{mul}_{\mathbf{A}} \lambda \ge \operatorname{gmul}_{\mathbf{A}} \lambda.$$

Example. Determine all eigenvalues and eigenvectors of
$$\mathbf{A} = \begin{pmatrix} 2 & -4 \\ -1 & -1 \end{pmatrix}.$$
The characteristic equation is
$$\begin{vmatrix} 2 - \lambda & -4 \\ -1 & -1 - \lambda \end{vmatrix} = 0,$$
this is
$$(2 - \lambda)(-1 - \lambda) - 4 = 0.$$
So the eigenvalues of $\mathbf{A}$ are $\lambda_1 = 3$ and $\lambda_2 = -2$ (or vice versa, since the order is immaterial). To find the eigenvectors, replace $\lambda_1$ in (2.20), so
$$\begin{pmatrix} 2 - 3 & -4 \\ -1 & -1 - 3 \end{pmatrix} \begin{pmatrix} e_{11} \\ e_{12} \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \end{pmatrix}.$$
The two equations of this system are LD. The only equation to be solved can be expressed as
$$-e_{11} - 4 e_{12} = 0.$$
The solution is $e_{11} = \alpha$ (arbitrary) and $e_{12} = -\alpha/4$, or
$$\mathbf{e}_1 = \alpha \begin{pmatrix} 1 \\ -\tfrac{1}{4} \end{pmatrix}.$$
Repeating the same procedure for $\lambda_2$,
$$\mathbf{e}_2 = \beta \begin{pmatrix} 1 \\ 1 \end{pmatrix}.$$

Example. Determine all eigenvalues and eigenvectors of
$$\mathbf{A} = \begin{pmatrix} 2 & 2 & 1 \\ 1 & 3 & 1 \\ 1 & 2 & 2 \end{pmatrix}.$$
The characteristic equation is
$$(\lambda - 5)(\lambda - 1)^2 = 0,$$
so the eigenvalues are $\lambda_1 = 5$ and $\lambda_2 = 1$, with $\lambda_2 = 1$ called a repeated eigenvalue, specifically, an eigenvalue of algebraic multiplicity 2. The eigenvector corresponding to $\lambda_1$ is
$$\mathbf{e}_1 = \alpha \begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix}.$$
The eigenvectors for $\lambda_2$ are
$$\mathbf{e}_2 = \beta \begin{pmatrix} -1 \\ 0 \\ 1 \end{pmatrix} + \gamma \begin{pmatrix} -2 \\ 1 \\ 0 \end{pmatrix}.$$
Since two independent eigenvectors are obtained, one says that the geometric multiplicity of this eigenvalue is 2.
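Both examples can be verified with MATLAB's eig function; a quick sketch for the second one:

A = [2 2 1; 1 3 1; 1 2 2];
[Q, L] = eig(A);    % columns of Q are eigenvectors, diag(L) the eigenvalues
diag(L)             % returns the eigenvalues 1, 1, and 5 (in some order)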

The collection $\sigma_{\mathbf{A}}$ of all eigenvalues (possibly complex-valued) of $\mathbf{A}$ is the spectrum of $\mathbf{A}$. The number
$$\rho(\mathbf{A}) = \max_{\lambda \in \sigma_{\mathbf{A}}} |\lambda| \tag{2.22}$$
is the spectral radius of $\mathbf{A}$. If $\mathbf{A}$ is symmetric, then all its eigenvalues are real numbers.

If $\mathbf{A} \in \mathbb{R}^{n \times n}$, then the matrix norm $\| \cdot \| : \mathbb{R}^{n \times n} \to \mathbb{R}$ is defined as
$$\|\mathbf{A}\| = \max_{\|\mathbf{x}\| \le 1} \frac{\|\mathbf{A}\mathbf{x}\|}{\|\mathbf{x}\|} \tag{2.23}$$
or
$$\|\mathbf{A}\| = \max_{\mathbf{x} \ne \mathbf{0}} \frac{\|\mathbf{A}\mathbf{x}\|}{\|\mathbf{x}\|}. \tag{2.24}$$
From this definition one observes that
$$\|\mathbf{A}\mathbf{x}\| \le \|\mathbf{A}\| \, \|\mathbf{x}\| \tag{2.25}$$
for any $L^p$ norm. Since $\mathbf{A}\mathbf{e} = \lambda \mathbf{e}$, one has $\|\mathbf{A}\mathbf{e}\| = |\lambda| \, \|\mathbf{e}\|$. From (2.25) it follows that $\|\mathbf{A}\| \ge |\lambda|$. Therefore,
$$\|\mathbf{A}\| \ge \rho(\mathbf{A}). \tag{2.26}$$
Each $L^p$ norm has a useful matrix expression (Allen & Isaacson, 1998). For instance,
$$\|\mathbf{A}\|_\infty = \max_{1 \le i \le n} \sum_{j=1}^n |a_{ij}| \quad \text{(maximum row sum)},$$
$$\|\mathbf{A}\|_1 = \max_{1 \le j \le n} \sum_{i=1}^n |a_{ij}| \quad \text{(maximum column sum)},$$
$$\|\mathbf{A}\|_2 = \sqrt{\rho(\mathbf{A}^T \mathbf{A})}.$$
When $\mathbf{A}$ is symmetric, $\mathbf{A}^T \mathbf{A} = \mathbf{A}^2$. The eigenvalues of $\mathbf{A}^2$ are simply the squares of the eigenvalues of $\mathbf{A}$. Therefore, when $\mathbf{A}$ is symmetric,
$$\|\mathbf{A}\|_2 = \sqrt{\rho(\mathbf{A}^2)} = \rho(\mathbf{A}). \tag{2.27}$$

Example. Consider the matrix
$$\mathbf{A} = \begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix};$$
then $\|\mathbf{A}\|_2 = 1 = \|\mathbf{A}\|_1 = \|\mathbf{A}\|_\infty$.

2.5 Quadratic forms


A function $f(\mathbf{x})$ is called a quadratic form if it can be written as
$$f(\mathbf{x}) = \mathbf{x}^T \mathbf{A} \mathbf{x}, \tag{2.28}$$
where $\mathbf{A}$ is symmetric, this is
$$\mathbf{A} = \frac{\mathbf{A} + \mathbf{A}^T}{2}. \tag{2.29}$$

Example. The quadratic function $f(\mathbf{x}) = x_1 x_2 + 2x_1 + 2$ is not a quadratic form.
Example. Determine if the following quadratic function is a quadratic form:
$$f(\mathbf{x}) = x_2^2 + 5x_3^2 + 6x_1 x_3 - x_2 x_3 + 10 x_3 x_4.$$
One can observe that
$$f(\mathbf{x}) = \begin{pmatrix} x_1 & x_2 & x_3 & x_4 \end{pmatrix} \begin{pmatrix} 0 & 0 & 3 & 0 \\ 0 & 1 & -\tfrac{1}{2} & 0 \\ 3 & -\tfrac{1}{2} & 5 & 5 \\ 0 & 0 & 5 & 0 \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{pmatrix}.$$
Therefore $f(\mathbf{x})$ is a quadratic form.
e
A quadratic form is said to be canonical if all cross-product terms (i.e., $x_i x_j$, $i \ne j$) are absent. For example, $f(\mathbf{x}) = a_{11} x_1^2 + \cdots + a_{nn} x_n^2$ is canonical, and its associated matrix
$$\mathbf{A} = \begin{pmatrix} a_{11} & & 0 \\ & \ddots & \\ 0 & & a_{nn} \end{pmatrix}$$
is diagonal.

By a linear change of variables, a quadratic form can be reduced to its canonical form. Consider the following transformation,
$$\mathbf{x} = \mathbf{Q} \hat{\mathbf{x}}. \tag{2.30}$$
Replacing into the quadratic form (2.28),
$$f(\mathbf{x}) = (\mathbf{Q}\hat{\mathbf{x}})^T \mathbf{A} (\mathbf{Q}\hat{\mathbf{x}}),$$
which is equal to
$$f(\mathbf{x}) = \hat{\mathbf{x}}^T \underbrace{\mathbf{Q}^T \mathbf{A} \mathbf{Q}}_{\text{diagonal}} \hat{\mathbf{x}} = \hat{\mathbf{x}}^T \boldsymbol{\Lambda} \hat{\mathbf{x}}. \tag{2.31}$$
It can be shown that the matrix $\boldsymbol{\Lambda} = \mathbf{Q}^T \mathbf{A} \mathbf{Q}$ is diagonal and its (diagonal) entries are the eigenvalues of $\mathbf{A}$. The columns of the matrix $\mathbf{Q}$ correspond to the LI normalized eigenvectors of $\mathbf{A}$. This matrix $\mathbf{Q}$ is referred to as the orthogonal matrix corresponding to $\mathbf{A}$. The above explains that
$$\mathbf{Q}^T \mathbf{A} \mathbf{Q} = \mathbf{Q}^{-1} \mathbf{A} \mathbf{Q} = \boldsymbol{\Lambda} = \begin{pmatrix} \lambda_1 & & & 0 \\ & \lambda_2 & & \\ & & \ddots & \\ 0 & & & \lambda_n \end{pmatrix}. \tag{2.32}$$
In this way, a quadratic form $f(\mathbf{x})$ can be expressed as
$$f(\mathbf{x}) = \lambda_1 \hat{x}_1^2 + \lambda_2 \hat{x}_2^2 + \cdots + \lambda_n \hat{x}_n^2.$$
Example. Find the canonical form of the following quadratic form:
$$f(\mathbf{x}) = 3x_1^2 + 3x_2^2 + 2x_1 x_2.$$
One observes that
$$\mathbf{A} = \begin{pmatrix} 3 & 1 \\ 1 & 3 \end{pmatrix}.$$
The eigenvalues and normalized eigenvectors of $\mathbf{A}$ are
$$\lambda_1 = 4, \quad \mathbf{e}_1 = \frac{1}{\sqrt{2}} \begin{pmatrix} 1 \\ 1 \end{pmatrix}$$
and
$$\lambda_2 = 2, \quad \mathbf{e}_2 = \frac{1}{\sqrt{2}} \begin{pmatrix} 1 \\ -1 \end{pmatrix}.$$
Therefore,
$$f(\mathbf{x}) = 4\hat{x}_1^2 + 2\hat{x}_2^2,$$
where
$$\begin{pmatrix} x_1 \\ x_2 \end{pmatrix} = \frac{1}{\sqrt{2}} \begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix} \begin{pmatrix} \hat{x}_1 \\ \hat{x}_2 \end{pmatrix}.$$
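The reduction to canonical form is, in matrix terms, the orthogonal diagonalization of $\mathbf{A}$; a minimal MATLAB check of this example:

A = [3 1; 1 3];
[Q, L] = eig(A);   % Q holds the normalized eigenvectors, L = diag(2, 4)
Q' * A * Q         % recovers the diagonal matrix Lambda, cf. (2.32)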

2.6 Positive definite matrices


A quadratic form $\mathbf{x}^T \mathbf{A} \mathbf{x}$ is positive definite (i.e., definitively positive) if $\mathbf{x}^T \mathbf{A} \mathbf{x} > 0$ for all $\mathbf{x} \ne \mathbf{0}$ and $\mathbf{x}^T \mathbf{A} \mathbf{x} = 0$ if and only if $\mathbf{x} = \mathbf{0}$. In this case, $\mathbf{A}$ is also said to be positive definite. Similarly, the quadratic form and the matrix are negative definite if $\mathbf{x}^T \mathbf{A} \mathbf{x} < 0$ for all $\mathbf{x} \ne \mathbf{0}$. It can also be proved that $\mathbf{A}$ is negative definite if $-\mathbf{A}$ is positive definite. $\mathbf{A}$ is positive semi-definite if $\mathbf{x}^T \mathbf{A} \mathbf{x} \ge 0$ for all $\mathbf{x} \ne \mathbf{0}$, and negative semi-definite if $\mathbf{x}^T \mathbf{A} \mathbf{x} \le 0$. In any other case, $\mathbf{A}$ is referred to as indefinite.
2.6.1 Sylvester’s criterion
One can verify that $\mathbf{A}$ and its quadratic form $\mathbf{x}^T \mathbf{A} \mathbf{x}$ are positive definite using the criterion proposed by James Joseph Sylvester (09/03/1814 – 03/15/1897). According to this criterion, if all the principal minors of $\mathbf{A}$ are positive, then $\mathbf{A}$ and its quadratic form $\mathbf{x}^T \mathbf{A} \mathbf{x}$ are positive definite. The principal minors are the determinants of the sub-matrices $\mathbf{A}_i$ of $\mathbf{A}$. The $i$-th principal minor $\det(\mathbf{A}_i)$ corresponds to the determinant of the $i$-by-$i$ sub-matrix obtained from the upper left corner of $\mathbf{A}$.

For positive semi-definite matrices, all principal minors have to be non-negative. Using this criterion, $\mathbf{A}$ and its quadratic form $\mathbf{x}^T \mathbf{A} \mathbf{x}$ are negative definite if $\det(\mathbf{A}_i) < 0$ for $i$ odd and $\det(\mathbf{A}_i) > 0$ for $i$ even. For negative semi-definite matrices, the criterion can be relaxed for the inequalities to include zero. $\mathbf{A}$ and its quadratic form $\mathbf{x}^T \mathbf{A} \mathbf{x}$ are indefinite if none of the previous conditions is satisfied.
Example. Using Sylvester's criterion, determine if the following matrix is positive definite:
$$\mathbf{A} = \begin{pmatrix} -5 & 2 & 3 \\ 2 & -2 & 0 \\ 3 & 0 & -4 \end{pmatrix}.$$
The principal minors are
$$\det(\mathbf{A}_1) = \det(-5) = -5,$$
$$\det(\mathbf{A}_2) = \det \begin{pmatrix} -5 & 2 \\ 2 & -2 \end{pmatrix} = 6,$$
$$\det(\mathbf{A}_3) = \det(\mathbf{A}) = -6.$$
Clearly, $\mathbf{A}$ is negative definite. We can also say that the nature of $\mathbf{A}$ is negative definite. It can be verified that $-\mathbf{A}$ is positive definite.
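Both the minors and the resulting sign pattern are easy to check numerically. A short MATLAB sketch for this example:

A = [-5 2 3; 2 -2 0; 3 0 -4];
for i = 1:3
    m(i) = det(A(1:i, 1:i));   % i-th leading principal minor
end
m            % returns [-5 6 -6]: alternating signs, so A is negative definite
eig(-A)      % all eigenvalues positive confirms that -A is positive definite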
2.6.2 Eigenvalue criterion
It is also possible to verify that $\mathbf{A}$ and its quadratic form $\mathbf{x}^T \mathbf{A} \mathbf{x}$ are positive (or negative) definite from the eigenvalues of $\mathbf{A}$, $\lambda_1, \ldots, \lambda_n$. If $\lambda_i > 0$ (or $\lambda_i < 0$) for all $i = 1, \ldots, n$, then $\mathbf{A}$ and its quadratic form $\mathbf{x}^T \mathbf{A} \mathbf{x}$ are positive (or negative) definite. If $\lambda_i \ge 0$ (or $\lambda_i \le 0$) for $i = 1, \ldots, n$, then $\mathbf{A}$ and its quadratic form $\mathbf{x}^T \mathbf{A} \mathbf{x}$ are positive (or negative) semi-definite.
Example. Determine if the following quadratic form is positive definite:
$$f(\mathbf{x}) = 2x_1^2 + 4x_2^2 + 4x_3^2 + 2x_1 x_2 - 2x_1 x_3 + 6x_2 x_3.$$
The symmetric matrix $\mathbf{A}$ is defined as
$$\mathbf{A} = \begin{pmatrix} 2 & 1 & -1 \\ 1 & 4 & 3 \\ -1 & 3 & 4 \end{pmatrix},$$
and its eigenvalues are $\lambda_1 = 7$, $\lambda_2 = 3$, and $\lambda_3 = 0$. Therefore, the nature of the quadratic form is positive semi-definite. It can be observed that $f(\mathbf{x}) = 7\hat{x}_1^2 + 3\hat{x}_2^2 + 0\hat{x}_3^2 = 0$ for $\hat{\mathbf{x}} = (0, 0, \alpha)^T \ne \mathbf{0}$. Notice that the same result is obtained using Sylvester's criterion.

2.7 Sets in Rn
Let us consider a subset $\Omega \subset \mathbb{R}^n$. Then:

• $\Omega$ is open if every $\mathbf{x}_0 \in \Omega$ has a neighborhood $N$ of radius $r$,
$$N = \{\mathbf{x} \in \mathbb{R}^n : \|\mathbf{x} - \mathbf{x}_0\| < r\},$$
such that $N \subset \Omega$. For example, $\Omega = \{\mathbf{x} \in \mathbb{R}^n : \|\mathbf{x}\| < 1\}$ is an open subset of $\mathbb{R}^n$.

• $\Omega$ is closed if its complement $\bar{\Omega}$, with $\Omega \cup \bar{\Omega} = \mathbb{R}^n$, is an open set. For example, $\Omega = \{\mathbf{x} \in \mathbb{R}^n : \|\mathbf{x}\| \le 1\}$ is a closed subset of $\mathbb{R}^n$.

• $\Omega$ is neither closed nor open if it does not satisfy either of the two previous conditions. For example, $\Omega = \{(x_1, x_2)^T \in \mathbb{R}^2 : -1 < x_1 \le 1 \text{ and } -1 < x_2 \le 1\}$ is neither closed nor open.

• $\Omega$ is bounded if it is contained within a hypersphere of finite radius; in other words, there is a real number $M$ such that $\|\mathbf{x}\| \le M$ for all $\mathbf{x} \in \Omega$. For example, $\Omega = \{(x_1, x_2)^T \in \mathbb{R}^2 : -1 < x_1 \le 1 \text{ and } -1 < x_2 \le 1\}$ is bounded, and $\Omega = \{(x_1, x_2)^T \in \mathbb{R}^2 : x_1 \ge 0 \text{ and } x_2 \ge 0\}$ is unbounded.

• $\Omega$ is compact if it is both closed and bounded. For example, $\Omega = \{(x_1, x_2)^T \in \mathbb{R}^2 : -1 \le x_1 \le 1 \text{ and } -1 \le x_2 \le 1\}$ is a compact subset of $\mathbb{R}^2$.

• $\Omega$ is connected if it cannot be represented as the disjoint union of two or more nonempty open subsets in the relative topology induced on the set. In Euclidean space, $\Omega$ is connected if every two points in $\Omega$ can be connected by a finite number of straight segments and every point within these segments belongs to $\Omega$. For example, $\Omega = \mathbb{R}^n \setminus \{\mathbf{0}\}$ is disconnected for $n = 1$ and connected for $n > 1$.

• In Euclidean space, $\Omega$ is convex if every two points in $\Omega$ can be connected with a straight line segment and every point on this segment also belongs to $\Omega$. For example, $\Omega = \mathbb{R}^n$ is convex and $\Omega = \mathbb{R}^n \setminus \{\mathbf{0}\}$ is non-convex.
2.8 Functions

A function f maps every element of its domain D(f) to an element in its range R(f). This is

    f : D(f) → R(f)
      : x ↦ f(x).                                              (2.33)

The domain is the set of all input elements and the range is the set of all output elements. If the function maps a vector to a scalar, i.e., D(f) ⊂ R^n and R(f) ⊂ R^1, then it is referred to as a real or scalar function. If the function maps a vector to another vector, i.e., D(f) ⊂ R^n and R(f) ⊂ R^m, then it is referred to as a vector function. In this text, scalar and vector functions will also be expressed as f(x) and f(x), respectively. For example, f(x) = x_1 + x_2 is a scalar function while f(x) = (x_1^2, x_1 + x_2)^T is a vector function.
A scalar function f is continuous at x_0 ∈ D(f) if and only if

    lim_{x→x_0} f(x) = f(x_0).

A function that is continuous at every point of its domain is called a continuous function. More generally, a function that is continuous on some subset Ω ⊂ D(f) is said to be continuous in Ω. A function that is not continuous is called discontinuous.

Let Ω ⊆ R^1 and f : Ω → R^1. If the derivatives df/dx, d^2f/dx^2, ..., d^k f/dx^k exist and are continuous in Ω, then it is said that f is of class C^k, is C^k continuous, or simply f ∈ C^k in Ω. The function is smooth if f ∈ C^∞. The function is analytic, of class C^ω, if it is smooth and it equals its Taylor series expansion around any point in its domain.

Let Ω ⊆ R^n and f : Ω → R^1. If f is continuous then f ∈ C^0 in Ω. If every partial derivative ∂f/∂x_i, i = 1, ..., n, is continuous then f ∈ C^1 in Ω. If every second partial derivative ∂^2f/∂x_i∂x_j, i, j = 1, ..., n, is continuous then f ∈ C^2 in Ω. In general, f ∈ C^k if all of the partial derivatives ∂^k f/∂x_{i_1}···∂x_{i_k}, i_1, ..., i_k = 1, ..., n, exist and are continuous. Classes C^∞ and C^ω are defined as before.

Example. Determine the continuity of the following function in R:

    f(x) = 1/(x - 2).

The function is not defined at x = 2; therefore, it is discontinuous in R.

Example. Determine the continuity of the following function in R:

    f(x) = |x|.

One can see that

    f(x) = {  x   for x > 0
              0   for x = 0
             -x   for x < 0,

which is a continuous function. Now,

    df/dx = {  1            for x > 0
               not defined  for x = 0
              -1            for x < 0,

which is discontinuous at x = 0. Therefore, f ∈ C^0.

Example. Determine the continuity of the following function in R:

    f(x) = [max(0, x - 3)]^4.

f(x) is continuous. df/dx = 4[max(0, x - 3)]^3 is continuous. d^2f/dx^2 = 12[max(0, x - 3)]^2 is continuous. d^3f/dx^3 = 24 max(0, x - 3) is continuous. However, d^4f/dx^4 = 0 for x < 3 and d^4f/dx^4 = 24 for x > 3, so the fourth derivative is discontinuous at x = 3. Therefore, f ∈ C^3.

Example. Determine the continuity of the following function:

    f(x) = x_1^2 + x_2^2.

∂f/∂x_1 = 2x_1 and ∂f/∂x_2 = 2x_2 are continuous functions in R^2, then f ∈ C^1. Now, ∂^2f/∂x_1^2 = 2, ∂^2f/∂x_2^2 = 2, and ∂^2f/∂x_1∂x_2 = 0 are also continuous, then f ∈ C^2. All higher-order derivatives are continuous as well, so f is smooth, f ∈ C^∞.

2.9 Gradient and Jacobian

Given a scalar function f : R^n → R^1, such that f ∈ C^1, the gradient vector (or simply, gradient) of f evaluated at x is defined as

    ∇f(x) = ( ∂f/∂x_1
              ∂f/∂x_2
                ...
              ∂f/∂x_n ).                                       (2.34)

Example. The gradient of f(x) = x_1^2 - 2x_2 is

    ∇f(x) = ( 2x_1
               -2  ).
The differential of a vector function f : R^n → R^m is defined by the Jacobian matrix

    Df(x) = ∂f(x)/∂x = [ ∂f_1/∂x_1  ∂f_1/∂x_2  ...  ∂f_1/∂x_n
                         ∂f_2/∂x_1  ∂f_2/∂x_2  ...  ∂f_2/∂x_n
                           ...        ...      ...    ...
                         ∂f_m/∂x_1  ∂f_m/∂x_2  ...  ∂f_m/∂x_n ].     (2.35)

Example. Consider a vector function f : R^3 → R^4 such that

    f(x) = ( f_1(x)     ( x_1
             f_2(x)  =    5x_3
             f_3(x)       4x_2^2 - 2x_3
             f_4(x) )     x_3 sin(x_1) ).

Then, its Jacobian matrix is

    Df(x) = [ 1             0     0
              0             0     5
              0             8x_2  -2
              x_3 cos(x_1)  0     sin(x_1) ].

2.10 Hessian

Given a scalar function f : R^n → R^1, such that f ∈ C^2, the Hessian matrix (or simply, Hessian) evaluated at x is defined as

    ∇^2 f(x) = [ ∂^2f/∂x_1^2   ∂^2f/∂x_1∂x_2  ...  ∂^2f/∂x_1∂x_n
                               ∂^2f/∂x_2^2    ...  ∂^2f/∂x_2∂x_n
                                              ...    ...
                 symm.                             ∂^2f/∂x_n^2   ].   (2.36)

Observe that the Hessian of a function f corresponds to the Jacobian of its gradient. This can be expressed as

    ∇^2 f(x) = D∇f(x).                                         (2.37)
Example. Determine the Hessian of f(x) = x_1^3 + x_1 x_2 - 2x_2^2. Let us determine the first derivatives of f with respect to x_i,

    ∂f(x)/∂x_1 = 3x_1^2 + x_2,    ∂f(x)/∂x_2 = x_1 - 4x_2.

Therefore, the gradient of f is given by

    ∇f(x) = ( 3x_1^2 + x_2
              x_1 - 4x_2  ).

The Jacobian of the gradient, or the Hessian of f, is obtained from the second derivatives,

    ∂^2f(x)/∂x_1^2 = 6x_1,   ∂^2f(x)/∂x_2^2 = -4,   ∂^2f(x)/∂x_1∂x_2 = ∂^2f(x)/∂x_2∂x_1 = 1.

Finally, the Hessian of f is

    ∇^2 f(x) = [ 6x_1   1
                  1    -4 ].
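
This calculation can be verified with a short MATLAB sketch, assuming the Symbolic Math Toolbox is available (jacobian is a toolbox function; the computation below follows (2.37)):

>> syms x1 x2
>> f = x1^3 + x1*x2 - 2*x2^2;
>> g = jacobian(f, [x1 x2]).'   % gradient as a column vector
>> H = jacobian(g, [x1 x2])     % Hessian = Jacobian of the gradient

The result is g = (3*x1^2 + x2; x1 - 4*x2) and H = [6*x1, 1; 1, -4], as obtained by hand.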

2.11 Taylor's theorem

2.11.1 Single-variable formulation

If f ∈ C^m in the closed interval [x_0, x] and f ∈ C^{m+1} in the open interval (x_0, x), then the Taylor expansion of f about the point x_0 is

    f(x) = P_m(x) + R_m(x),                                    (2.38)

where P_m(x) is a polynomial of degree m given by

    P_m(x) = Σ_{k=0}^{m} (1/k!) (d^k f(x_0)/dx^k) (x - x_0)^k                          (2.39)
           = f(x_0) + (f'(x_0)/1!)(x - x_0) + (f''(x_0)/2!)(x - x_0)^2 + ... + (f^{(m)}(x_0)/m!)(x - x_0)^m,   (2.40)

and R_m(x) is the remainder term, which is smaller in magnitude than the previous terms if x is close enough to x_0. Commonly the remainder is given by the Cauchy formulation, in which

    R_m(x) = ∫_{x_0}^{x} (f^{(m+1)}(τ)/m!) (x - τ)^m dτ.       (2.41)

An alternative expression corresponds to the Lagrange formulation,

    R_m(x) = (f^{(m+1)}(ζ)/(m+1)!) (x - x_0)^{m+1},            (2.42)

where ζ ∈ [x_0, x].

Example. Determine the Taylor series expansion of f(x) = exp(x) about x = 0.

    f(x) = 1 + x + x^2/2 + x^3/3! + x^4/4! + ... = Σ_{k=0}^{∞} x^k/k!.
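
The quality of the truncated expansion can be examined numerically. A minimal MATLAB sketch (the truncation order m = 4 and the test point are illustrative choices):

>> m = 4; x = 0.5;
>> Pm = sum(x.^(0:m)./factorial(0:m))   % partial sum of the series
Pm = 1.6484
>> exp(x)
ans = 1.6487

Already with four terms the polynomial reproduces exp(0.5) to three decimal places.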

2.11.2 Multivariate formulation

If f ∈ C^m in an open subset of R^n that contains x and x_0, then the Taylor expansion of f about the point x_0 is

    f(x) = P_m(x) + R_m(x),                                    (2.43)

where P_m(x) is a polynomial of degree m given by

    P_m(x) = Σ_{k=0}^{m} (1/k!) Σ_{i_1,i_2,...,i_k} (∂^k f(x_0)/∂x_{i_1}∂x_{i_2}···∂x_{i_k}) (x_{i_1} - x_{0i_1})(x_{i_2} - x_{0i_2})···(x_{i_k} - x_{0i_k}),   (2.44)

and R_m(x) is the remainder term given by

    R_m(x) = Σ_{i_1,...,i_{m+1}} (x_{i_1} - x_{0i_1})···(x_{i_{m+1}} - x_{0i_{m+1}}) ∫_0^1 ((1 - τ)^m/m!) (∂^{m+1} f(c(τ))/∂x_{i_1}···∂x_{i_{m+1}}) dτ,   (2.45)

where c(τ) = (1 - τ)x_0 + τx defines the line through x and x_0.
If f ∈ C^1, then its linear approximation about the point x_0 is

    f_L(x) = P_1(x)                                            (2.46)
           = f(x_0) + ∇f(x_0)^T (x - x_0).                     (2.47)

If f ∈ C^2, then its quadratic approximation about the point x_0 is

    f_Q(x) = P_2(x)                                            (2.48)
           = f(x_0) + ∇f(x_0)^T (x - x_0) + (1/2)(x - x_0)^T ∇^2 f(x_0) (x - x_0).   (2.49)
Example. Consider the following scalar function, f(x) = exp(x_1 x_2). Find its linear and quadratic approximations about x_0 = (1, 2)^T.

The linear approximation is

    f_L(x) = exp(2) + 2 exp(2)(x_1 - 1) + exp(2)(x_2 - 2).

The quadratic approximation is

    f_Q(x) = f_L(x) + 2 exp(2)(x_1 - 1)^2 + 3 exp(2)(x_1 - 1)(x_2 - 2) + (exp(2)/2)(x_2 - 2)^2.
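
The accuracy of these approximations can be checked numerically. A minimal MATLAB sketch (the anonymous function handles and the test point are illustrative):

>> f  = @(x) exp(x(1)*x(2));
>> fL = @(x) exp(2) + 2*exp(2)*(x(1)-1) + exp(2)*(x(2)-2);
>> fQ = @(x) fL(x) + 2*exp(2)*(x(1)-1)^2 + 3*exp(2)*(x(1)-1)*(x(2)-2) + exp(2)/2*(x(2)-2)^2;
>> x = [1.1; 2.1];
>> [f(x), fL(x), fQ(x)]

Near x_0 the quadratic approximation is noticeably closer to f than the linear one (here f ≈ 10.07, f_L ≈ 9.61, and f_Q ≈ 10.01).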

2.12 Matrix calculus

Matrix calculus is a widely used notation for doing multivariable calculus over spaces of matrices. This notation is well suited to taking derivatives of matrix-valued functions (e.g., linear and quadratic forms). For instance, the derivative of a linear form is

    d(A^T x)/dx = d(x^T A)/dx = A^T.                           (2.50)

The derivative of a quadratic form is

    d(x^T A x)/dx = x^T (A^T + A).                             (2.51)

The derivative of a function f(x, x^T) is

    df(x, x^T)/dx = ∂f(x, x^T)/∂x + (∂f(x, x^T)/∂x^T)^T       (2.52)

and, consequently, the gradient is

    ∇f(x, x^T) = (∂f(x, x^T)/∂x)^T + ∂f(x, x^T)/∂x^T.         (2.53)
Example. Let the internal energy U be expressed as

    U(x) = (1/2) x^T K x,

where x is the nodal displacement vector and K is the stiffness matrix. The derivative of U with respect to x is given by

    dU/dx = (1/2) x^T (K^T + K).

Since K is symmetric, K^T + K = 2K and

    dU/dx = x^T K,

or

    ∇U(x) = K x.

Example. In finite element analysis, the equilibrium condition can be obtained by differentiating the potential energy Π with respect to x, where

    Π = (1/2) x^T K x - x^T f,

and f is the nodal load vector. Then,

    ∂Π/∂x = x^T K - f^T.

When this expression is equal to zero, then

    x^T K = f^T

or

    K x = f.
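
In MATLAB, the resulting equilibrium system is solved with the backslash operator rather than an explicit inverse. A minimal sketch (the 2-by-2 stiffness matrix and load vector are illustrative values, not taken from a particular structure):

>> K = [3 -1; -1 2];   % symmetric positive definite stiffness matrix
>> f = [1; 4];         % nodal load vector
>> x = K\f             % solves K*x = f
x =
    1.2000
    2.6000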

Exercises

1. For x_1^T = (1, 8, 3) and x_2^T = (3, -3, 1) determine the dot, cross, and tensor products. Show all your work.

2. Show that the triangle inequality is satisfied by x_1^T = (1, -2, 2, 4) and x_2^T = (-1, 7, -3, -5). Use the L3 norm.

3. Show that the following vectors are LD by expressing one of the vectors as a linear combination of the others:

    x_1^T = (1, 0, 0),  x_2^T = (0, 1, 0),  x_3^T = (1, 1, 0),  x_4^T = (1, -1, 0)

4. Determine if the following vectors are LI or LD. If they are LD, then give a linear relation among the vectors:

    x_1^T = (2, 0, 1, 1, 0),  x_2^T = (1, 2, 0, 3, 1),  x_3^T = (4, -4, 3, -9, -2)
5. Find the eigenvalues and the eigenvectors of the following matrices. Computations must be done by hand (step by step).

    A = [ 1  0  1        B = [ -1   -2   -1        C = [  1   -4   -4
          1  1  0                8  -11   -8               8  -11   -8
          0  0  1 ]           -10   11    7 ]            -8    8    5 ]

6. Determine if the following functions are quadratic forms.

   • f(x) = cos(x_1 x_2) + sin(x_1^2) - 2x_2^2
   • f(x) = (1/2)(2x_1^2 + 2x_1 x_2 + 4x_1 x_3 - 6x_2^2 - 4x_2 x_3 + 5x_3^2)
   • f(x) = x_1^2 + x_2^2 + x_3^2 + 1

7. (Arora, 2005; Problems 4.10, 4.12, 4.18) Determine the nature of the following quadratic forms:

   • f(x) = 2x_1^2 + 2x_2^2 - 5x_1 x_2
   • f(x) = 3x_1^2 + x_2^2 - x_1 x_2
   • f(x) = 2x_1^2 + x_1 x_2 + 2x_2^2 + 3x_3^2 - 2x_1 x_3
   • f(x) = 2x_1^2 + 4x_2^2 + 4x_3^2 + 2x_1 x_2 - 2x_1 x_3 + 6x_2 x_3

8. (Arora, 2005; Problems 4.2, 4.4, 4.6, 4.8) Write the Taylor series expansion for the following functions up to quadratic terms:

   • f(x) = cos(x) about the point x = π/4
   • f(x) = sin(x) about the point x = π/6
   • f(x) = exp(x) about the point x = 0
   • f(x) = 10x_1^4 - 20x_1^2 x_2 + 10x_2^2 + x_1^2 - 2x_1 + 5 about the point x^T = (1, 1); compare approximate and exact values of the function at the point (1.2, 0.8).
   • f(x) = (1 - x_1)^2 + 100(x_2 - x_1^2)^2 about the point (2, 2)

9. Determine the continuity of the following functions in the corresponding space R^n:

   • f(x) = 1/x
   • f(x) = x_1 x_2 + |x_3|
   • f(x) = x_1/x_2

10. Calculate the gradient and the Hessian of the following functions:

   • f(x) = (x_1 - 10x_2)^2 + 5(x_3 - x_4)^3 + (x_2 - 2x_3)^4 + 10(x_1 - x_4)^4
   • f(x) = x_1 sin(x_1 x_2)
   • f(x) = (1/2)(2x_1^2 + 2x_1 x_2 + 4x_1 x_3 - 6x_2^2 - 4x_2 x_3 + 5x_3^2)
Chapter 3

Numerical foundation

3.1 Numeric differentiation

The derivative of a function f with respect to x and evaluated at x_0 can be expressed as

    df(x_0)/dx = lim_{h→0} [f(x_0 + h) - f(x_0)]/h.            (3.1)

From the Taylor series, one observes that

    f(x_0 + h) = f(x_0) + (df(x_0)/dx) h + R_1(x).

Disregarding the remainder, one obtains

    df(x_0)/dx ≈ [f(x_0 + h) - f(x_0)]/h.                      (3.2)

Equation (3.2) is known as Newton's forward difference formula. The parameter h is referred to as the difference parameter or perturbation. As a rule of thumb,

    h = max{10^{-6}, 0.01 x_0}.

Alternatives to the forward difference formulation are the backward difference formula,

    df(x_0)/dx ≈ [f(x_0) - f(x_0 - h)]/h,                      (3.3)

and the central difference formula,

    df(x_0)/dx ≈ [f(x_0 + h/2) - f(x_0 - h/2)]/h.              (3.4)

Using the forward difference formula (3.2), the second derivative of f with respect to x evaluated at x_0 can be expressed as

    f''(x_0) ≈ [f'(x_0 + h) - f'(x_0)]/h,                      (3.5)

where

    f'(x_0) ≈ [f(x_0 + h) - f(x_0)]/h

and

    f'(x_0 + h) ≈ [f(x_0 + 2h) - f(x_0 + h)]/h.

Then, (3.5) can be written as

    f''(x_0) ≈ [f(x_0 + 2h) - 2f(x_0 + h) + f(x_0)]/h^2.       (3.6)

In general, for forward difference,

    f^{(m)}(x_0) ≈ [f^{(m-1)}(x_0 + h) - f^{(m-1)}(x_0)]/h.    (3.7)

Corresponding formulations can also be obtained for backward and central differences.

Example. Evaluate the numerical derivative of f(x) = x^2 with respect to x at x_0 = 10 using forward difference with h = 0.1. Using (3.2),

    f'(10) ≈ [(10.1)^2 - (10)^2]/0.1 = 20.1,

which is close to the analytical solution f'(10) = 2(10) = 20.
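
The three difference formulas (3.2)-(3.4) are easy to compare in MATLAB. A minimal sketch for the example above (the function handle and step size are the ones just used):

>> f = @(x) x.^2;  x0 = 10;  h = 0.1;
>> dff = (f(x0+h) - f(x0))/h          % forward:  20.1000
>> dfb = (f(x0) - f(x0-h))/h          % backward: 19.9000
>> dfc = (f(x0+h/2) - f(x0-h/2))/h    % central:  20.0000

For this quadratic function the central difference is exact; in general its error is O(h^2), while the one-sided formulas are O(h).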

3.2 Numeric evaluation of gradient

Using Newton's forward difference formula, the derivative of a function f(x) with respect to x_i evaluated at x_0 can be approximated by

    ∂f(x_0)/∂x_i ≈ [f(x_01, x_02, ..., x_0i + h, ..., x_0n) - f(x_0)]/h,    (3.8)

where h is the difference parameter or perturbation. In (3.8) each variable is perturbed one at a time. As a rule of thumb,

    h = max{10^{-6}, 0.01 x_0i}.

In this case, the backward difference formula is

    ∂f(x_0)/∂x_i ≈ [f(x_0) - f(x_01, x_02, ..., x_0i - h, ..., x_0n)]/h,    (3.9)

and the central difference formula is

    ∂f(x_0)/∂x_i ≈ [f(x_01, ..., x_0i + h/2, ..., x_0n) - f(x_01, ..., x_0i - h/2, ..., x_0n)]/h.   (3.10)

Example. Numerically evaluate the gradient of f(x) = x_1^2 - x_1 x_2 at x_0^T = (10, 20) using forward difference. From (3.8), with h = 0.1 for x_1 and h = 0.2 for x_2,

    ∂f(x_0)/∂x_1 ≈ [f(10.1, 20) - f(10, 20)]/0.1
                 = [(10.1)^2 - (10.1)(20) - (10)^2 + (10)(20)]/0.1 = 0.1,

    ∂f(x_0)/∂x_2 ≈ [f(10, 20.2) - f(10, 20)]/0.2
                 = [(10)^2 - (10)(20.2) - (10)^2 + (10)(20)]/0.2 = -10.0.

Then,

    ∇f(x_0) ≈ (  0.1
               -10.0 ),

which is close to the analytical solution (0, -10)^T.

3.3 Numeric evaluation of Hessian

Using forward difference, the Hessian can be expressed as

    ∂^2f(x_0)/∂x_i∂x_j ≈ (1/h_j) { [f(..., x_0i + h_i, ..., x_0j + h_j, ...) - f(..., x_0j + h_j, ...)]/h_i
                                 - [f(..., x_0i + h_i, ...) - f(x_0)]/h_i }.   (3.11)

Example. Numerically evaluate the Hessian of f(x) = x_1^2 - x_1 x_2 at x_0^T = (10, 20) using forward difference. From (3.11), with h_1 = 0.1 and h_2 = 0.2,

    ∂^2f(x_0)/∂x_1^2 ≈ (1/0.1) { [f(10.2, 20) - f(10.1, 20)]/0.1 - [f(10.1, 20) - f(10, 20)]/0.1 }
                     = (1/0.1) (0.3 - 0.1) = 2.0,

    ∂^2f(x_0)/∂x_2^2 ≈ (1/0.2) { [f(10, 20.4) - f(10, 20.2)]/0.2 - [f(10, 20.2) - f(10, 20)]/0.2 }
                     = (1/0.2) (-10.0 - (-10.0)) = 0.0,

    ∂^2f(x_0)/∂x_1∂x_2 ≈ (1/0.2) { [f(10.1, 20.2) - f(10, 20.2)]/0.1 - [f(10.1, 20) - f(10, 20)]/0.1 }
                       = (1/0.2) (-0.1 - 0.1) = -1.0.

One can show that ∂^2f(x_0)/∂x_1∂x_2 = ∂^2f(x_0)/∂x_2∂x_1. Then,

    ∇^2 f(x_0) ≈ [  2.0  -1.0
                   -1.0   0.0 ],

which matches the analytical solution.

3.4 Matlab programs

3.4.1 Derivative

Numerically obtain the first and second derivatives of f(x) = sin(x) and evaluate them on -2π ≤ x ≤ 2π. Using MATLAB, a function to determine the first and second derivatives can be written as

function [f,df,ddf]=ddfun(fun,x)
h=max(10^(-6),10^(-3)*x);
f=feval(fun,x);
df=Df(fun,x,h);
ddf=(Df(fun,x+h,h)-Df(fun,x,h))/h;
end

function [df]=Df(fun,x,h)
df=(feval(fun,(x+h))-feval(fun,x))/h;
end

This function returns the value of the function, the value of the first derivative, and the value of the second derivative. Now, in the workspace,

>> x=-2*pi:0.01:2*pi;
>> [f,df,ddf]=ddfun(@(x) sin(x),x);
>> plot(x,f,'b'); hold on
>> plot(x,df,'r')
>> plot(x,ddf,'g')

The plots will show the function in blue, the first derivative in red, and the second derivative in green.

3.4.2 Gradient

The gradient of f(x) = x_1^2 - 2x_2 evaluated at x_0^T = (1, 1) can be obtained analytically as

    ∇f(x) = ( 2x_1
               -2  ),

so ∇^T f(x_0) = (2, -2). Numerically, a MATLAB function to approximate the solution using central difference could be written as

function df=gfun(fun,x)
% central difference
for i=1:length(x)
    h=max(0.001,0.01*x(i));
    xp=x; xp(i)=xp(i)+h/2;
    xn=x; xn(i)=xn(i)-h/2;
    df(i)=(feval(fun,xp)-feval(fun,xn))/h;
end
end

function fx=myfun(x)
fx=x(1)^2-2*x(2);
end

>> df=gfun('myfun',[1,1])
df = 2.0000   -2.0000

Using MATLAB's built-in gradient function, the gradient over a grid can be determined as


>> h=0.5;
>> [x1,x2]=meshgrid(-1:h:1);
>> f=x1.^2-2*x2;
>> [fx1,fx2]=gradient(f,h)
fx1 =
-1.5000 -1.0000 0 1.0000 1.5000
-1.5000 -1.0000 0 1.0000 1.5000
-1.5000 -1.0000 0 1.0000 1.5000
-1.5000 -1.0000 0 1.0000 1.5000
-1.5000 -1.0000 0 1.0000 1.5000
fx2 =
-2 -2 -2 -2 -2
-2 -2 -2 -2 -2
-2 -2 -2 -2 -2
-2 -2 -2 -2 -2
-2 -2 -2 -2 -2
>> contour(x1,x2,f)
>> hold on
>> quiver(x1,x2,fx1,fx2)

Smaller values of h give a better approximation. The command contour is used to represent the contour
plots of the function in the design space.

3.4.3 Hessian

A function to numerically approximate the Hessian matrix, in the spirit of gfun above, is sketched next.
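
The following is a minimal sketch, assuming forward differences as in (3.11) and the same calling convention as gfun (the function name and perturbation rule are illustrative choices):

function H=hfun(fun,x)
% forward-difference Hessian, cf. (3.11)
n=length(x);
H=zeros(n,n);
f0=feval(fun,x);
h=max(0.001,0.01*abs(x));              % perturbation for each variable
for i=1:n
    for j=1:n
        xi=x;   xi(i)=xi(i)+h(i);      % perturb x_i
        xj=x;   xj(j)=xj(j)+h(j);      % perturb x_j
        xij=xi; xij(j)=xij(j)+h(j);    % perturb x_i and x_j
        H(i,j)=((feval(fun,xij)-feval(fun,xj))/h(i) ...
               -(feval(fun,xi)-f0)/h(i))/h(j);
    end
end
end

For f(x) = x_1^2 - x_1 x_2 at x_0^T = (10, 20), the call hfun(@(x) x(1)^2-x(1)*x(2),[10,20]) reproduces the result of the example in Section 3.3, approximately [2.0 -1.0; -1.0 0.0].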

Exercises

1. Develop a MATLAB function that returns a prescribed Lp norm, from L1 to L∞. Hint: Use the following syntax:

   normx=myfun(x,p)

2. Develop a MATLAB function to determine if two input vectors are LI. Hint: Use the following syntax:

   lout=myfun(x1,x2)

3. Develop a function in MATLAB to determine the gradient of a function f(x_1, x_2) using forward difference, backward difference, and central difference derivatives. Compare your results with the ones obtained by the exact (analytical) solution for f(x) = 2x_1 x_2 + sin(x_1 x_2). Use 10 discrete points in the interval 0 ≤ x_i ≤ π/2. Hint: Use the following syntax:

   function gout=gfun(ffun,x)

   where

   function fx=ffun(x)
   fx=2*x(1)*x(2)+sin(x(1)*x(2))

4. Develop a function in MATLAB to evaluate the Hessian matrix of a function f(x_1, x_2).

5. Develop a function in MATLAB to evaluate the first n derivatives of a function f(x).

Part II

Single-variable optimization

Chapter 4

Analytical elements

4.1 Problem formulation

In single-variable optimization, the problem is to find x* ∈ R that minimizes f : R → R. In general, the domain of f is a connected interval of R. The optimization problem can be stated as

    min_x  f(x)
    s.t.   x_L ≤ x ≤ x_U,                                      (4.1)

where x_L and x_U define the interval that contains x*.

Example. Design a minimum-cost cylindrical refrigeration tank of volume 50 m^3. The circular ends cost $10 per m^2. The cylindrical wall costs $6 per m^2. The cost to refrigerate is $80 per m^2 over the useful life. The total cost f is given by

    f(x, L) = (10)(2)(πx^2/4) + (6)(πxL) + (80)(2(πx^2/4) + πxL)
            = 45πx^2 + 86πxL,

where x is the diameter of the tank and L is the length. Since the volume is a constant, then

    L = (50)(4)/(πx^2) = 200/(πx^2).

Finally, the objective function to be minimized is

    f(x) = 45πx^2 + 17200/x,

where x > 0. The minimum cost of f(x*) = 6560 is obtained for x* = 3.9329. Using MATLAB, this can be solved by
>> [xopt,fopt] = fminbnd(@(x) 45*pi*x^2+17200/x,0,1000)
xopt = 3.9329
fopt = 6.5601e+03

Using MATHEMATICA, the problem can be solved by

In[01]:= NMinimize[{45*\[Pi]*x^2 + 17200/x, x > 0}, x]

Out[01]= {6560.06, {x -> 3.93289}}

4.2 Classification of optimal points


Let f : R → R and x∗ an extreme point of f , then f (x∗ ) corresponds to the extreme value of f . Two
types of extreme points can be defined: minimum point (or minimizer) and maximum point (or maximizer).
These points have corresponding minimum and maximum values of f . Extreme points can be classified as
weak, strict, local, and global. The following classification applies for minimum points but it can be easily
adjusted to maximum points:

• x* is a weak local minimum point if f(x*) ≤ f(x* ± h) for all sufficiently small values of h.

• x* is a strict or strong local minimum point if f(x*) < f(x* ± h) for all sufficiently small values of h.

• x∗ is a weak or non-unique global minimum point if f (x∗ ) ≤ f (x) for all x in the domain of f .

• x∗ is a strict or unique global minimum point if f (x∗ ) < f (x) for all x in the domain of f .

4.3 Optimality conditions


4.3.1 Minimum-value theorem
Let f : R → R be continuous in the closed interval [x_L, x_U]. Then there is at least one point x* ∈ [x_L, x_U] at which f has a minimum value. That is,

    f(x*) ≤ f(x) for all x ∈ [x_L, x_U].

The following example shows the importance of the closed interval [x_L, x_U] in the minimum-value theorem.

Example. Let f(x) = -1/x be defined in the interval Ω = {x ∈ R : x > 0}. This function is continuous throughout Ω, but there is no point at which f takes a minimum value.
4.3.2 First order necessary condition
Let f ∈ C^0 in the closed interval [x_L, x_U] and f ∈ C^1 in the open interval (x_L, x_U). Then the necessary condition for x* ∈ (x_L, x_U) to be a local minimum of f is given by

    df(x*)/dx = 0.                                             (4.2)

This condition implies that the slope of f at x* is zero. The points that satisfy this condition are referred to as stationary points. A stationary point can be a minimum, maximum, or inflection point. If the minimum point is one of the bounds x_L or x_U, then (4.2) does not have to be satisfied.
The necessary condition in (4.2) can be demonstrated using a Taylor series expansion about x*, this is

    f(x) = f(x*) + f'(x*)(x - x*) + R_1(x).

Evaluating at x* ± h and disregarding the residual term,

    f(x* + h) = f(x*) + f'(x*)h

and

    f(x* - h) = f(x*) - f'(x*)h.

If x* is a local minimum, then f(x*) ≤ f(x* ± h); therefore,

    f(x* + h) - f(x*) ≥ 0  ⇒  f'(x*) ≥ 0

and

    f(x* - h) - f(x*) ≥ 0  ⇒  f'(x*) ≤ 0.

These conditions are satisfied only when f'(x*) = 0.

Example. Determine the stationary points of f(x) = x^3 + 3x^2 - 4.

The derivative of f(x) is

    f'(x) = 3x^2 + 6x.

Solving f'(x) = 0 we obtain

    3x(x + 2) = 0.

Therefore, the stationary points are x_1* = 0 and x_2* = -2.

4.3.3 Second order sufficient conditions

Let f ∈ C^2 in the open interval (x_L, x_U). Then the sufficient conditions for x* ∈ (x_L, x_U) to be a strict local minimum of f are given by (4.2) and

    d^2f(x*)/dx^2 > 0.                                         (4.3)

This condition can be demonstrated using a Taylor series expansion about x*, this is

    f(x) = f(x*) + f'(x*)(x - x*) + (1/2) f''(x*)(x - x*)^2 + R_2(x).

Evaluating at x* ± h and disregarding the residual term,

    f(x* ± h) = f(x*) ± f'(x*)h + (1/2) f''(x*)h^2.

Since (4.2) is satisfied, then

    f(x* ± h) = f(x*) + (1/2) f''(x*)h^2.

If x* is a strict local minimum, then f(x*) < f(x* ± h); therefore,

    f(x* ± h) - f(x*) > 0,

which leads to

    (1/2) f''(x*)h^2 > 0,

or simply, f''(x*) > 0. If f''(x*) = 0, no conclusion can be made about the type of stationary point.

4.3.4 Higher order conditions

If f ∈ C^3 and f'(x*) = 0 and f''(x*) = 0, then

    f(x* ± h) = f(x*) ± 0 + 0 ± (1/3!) f^{(3)}(x*) h^3.

If f^{(3)}(x*) ≠ 0, then x* corresponds to an inflection point. If x* is a local minimum, then f^{(3)}(x*) = 0.

If f ∈ C^4, then

    f(x* ± h) = f(x*) ± 0 + 0 ± 0 + (1/4!) f^{(4)}(x*) h^4,

where the condition for x* to be a strict local minimum is satisfied if f^{(4)}(x*) > 0. It can be observed that the sufficient optimality condition is satisfied for f ∈ C^{k+1} when f^{(k)}(x*) = 0 for k odd (necessary condition) and f^{(k+1)}(x*) > 0. When f^{(k)}(x*) ≠ 0 for k odd and k > 1, then x* corresponds to an inflection point. In a weak maximum or minimum all derivatives of f vanish.

Example. Determine if the stationary points of f(x) = x^3 + 3x^2 - 4 are minima, maxima, or inflection points.

The stationary points of the function are x_1* = 0 and x_2* = -2. The second derivative of f is

    f''(x) = 6x + 6.

Evaluating at the stationary points we see that f''(x_1*) = 6 > 0 and f''(x_2*) = -6 < 0. In conclusion, x_1* is a strict local minimizer and x_2* is a strict local maximizer.

4.4 Convexity

4.4.1 Definition

A function f defined in a convex interval Ω ⊂ R is called a convex function if for all x_1, x_2 ∈ Ω the following condition is satisfied:

    f(αx_1 + (1 - α)x_2) ≤ αf(x_1) + (1 - α)f(x_2), for 0 ≤ α ≤ 1.   (4.4)

Geometrically, this means that the graph of the function f(x) lies below the straight line joining any two points of the curve, f(x_1) and f(x_2). The function f is strictly convex if

    f(αx_1 + (1 - α)x_2) < αf(x_1) + (1 - α)f(x_2), for 0 < α < 1.    (4.5)

A function f is said to be concave if -f is convex. Some convex functions in R are:

• f (x) = ax + b in R, for all a, b ∈ R.

• f (x) = exp(ax) in R, for all a ∈ R.

• f (x) = xα in R+ , for all α ≥ 1 or α ≤ 0.

• f (x) = |x|p in R, for p ≥ 1.

• f (x) = x log x in R+ .

Some concave functions in R are:

• f (x) = ax + b in R, for all a, b ∈ R.

• f (x) = xα in R+ , for 0 ≤ α ≤ 1.

• f (x) = log x in R+ .

4.4.2 Properties

1. If f ∈ C^1, then f is convex over a convex set Ω if and only if, ∀x_1, x_2 ∈ Ω,

       f(x_2) ≥ f(x_1) + f'(x_1)(x_2 - x_1).                   (4.6)

   This means that the function evaluated at x_2 lies above the linear approximation of the function about x_1 evaluated at x_2.

2. If f ∈ C^2, then f is convex over a convex set Ω if and only if, ∀x ∈ Ω,

       f''(x) ≥ 0.                                             (4.7)

3. If x* is a local minimum for a convex function f on a convex set Ω, then it is also a global minimum. Replacing x_1 = x* in item 1, you can see that f(x_2) ≥ f(x*), ∀x_2 ∈ Ω, which shows that the local minimum is a global minimum. Even more, a point that satisfies the necessary condition f'(x*) = 0 is a global minimum.

4.5 Unimodality

A function f is said to be unimodal in an interval Ω ⊂ R if there exists one and only one valley point x*, referred to as the mode, and for all x_1, x_2 ∈ Ω with x_1 < x_2 the following conditions are satisfied:

• if x_2 < x* then f(x_1) > f(x_2), and

• if x_1 > x* then f(x_1) < f(x_2).

In other words, f strictly decreases for all x < x* and strictly increases for all x > x* in Ω. By extension, a multimodal function is one that contains more than one mode or valley.

Observe that a unimodal function is not always convex, and a convex function is not always unimodal. However, a strictly convex function that contains the minimum point in the open interval Ω is also unimodal.

Example. The function f(x) = |x|^{1/2} defined in R is unimodal with a single valley at x* = 0. This function is not convex.

In this case, since f ∈ C^1 for all x > 0 and for all x < 0, the unimodality can be proven from f'(x) > 0 for all x > 0 and f'(x) < 0 for all x < 0. Furthermore, since f ∈ C^2 for all x > 0 and all x < 0, the concavity on each side can be proven from f''(x) < 0 for all x > 0 and all x < 0.

Exercises

1. Show that the strict global maximizer of f(x) = x^{1/x} is x* = e.

2. Show that the strict global minimizer of f(x) = x^x, for x > 0, is x* = 1/e.

3. Prove that a twice-differentiable function f on a convex set Ω is a convex function if and only if f''(x) ≥ 0 for all x ∈ Ω.

4. Prove that in a convex function defined in a convex set a local minimum is a global minimum. Is it a strict global minimum?

5. Consider the function f(x) = x^2 in the closed interval [3, 5]. Is this function convex? Is it unimodal?

Chapter 5

Basic numerical methods

Numerical methods to locate the minimum point of a function of a single variable comprise two stages: bracketing and interval reduction. Bracketing consists of establishing a unimodal interval that contains the minimum point. In general, three points are required to establish the unimodality of the function. These three points form what is referred to as a three-point pattern.

In the second stage, the interval is reduced so the minimum is located with a desired tolerance. Methods for interval reduction define intervals around the minimum in which the function remains unimodal. These methods may use the Fibonacci sequence, the golden ratio, or polynomial approximations. The most common polynomial methods make use of quadratic approximations. Among these, the two best-known approaches are Brent's and Powell's methods.

A different approach to single-variable optimization is to find the point at which the function's derivative vanishes. These methods include Newton's, secant, and bisection methods.

5.1 Bracketing a three-point pattern


5.1.1 Description
The first step in the process of locating a minimum is to bracket it in an interval. This interval must contain
three points, x1 , x2 , and x3 such that for x1 < x2 < x3 , f (x1 ) ≥ f (x2 ) < f (x3 ) or f (x1 ) > f (x2 ) ≤
f (x3 ). This is referred to as a three-point pattern. To find it, let us start with two points, x1 and x2 = x1 +h,
where h is the step size. Then, the third point is located such that x3 = x2 +γh, where γ ≥ 1 is an expansion
parameter. γ = 1 corresponds to uniform spacing. A common practice is to choose γ = 2 or γ = φ, where
φ = 1.6180339887498948482045 . . . , also known as the golden ratio.

5.1.2 Algorithm
The bracketing step algorithm is as follows:

Step 1. Select x1 and x2 = x1 + h.

Step 2. Evaluate f1 = f (x1 ) and f2 = f (x2 ).

Step 3. If f2 ≤ f1 , go to Step 5.

Step 4. Else, interchange f1 ↔ f2 , x1 ↔ x2 , and change the search direction h = −h.

Step 5. Increase the step size, h = γh, find the third point, x3 = x2 + h and obtain f3 = f (x3 ).

Step 6. If f3 > f2 go to Step 8.

Step 7. Else, make f1 = f2 , f2 = f3 , x1 = x2 , x2 = x3 and go to Step 5.

Step 8. The points x1, x2, and x3 satisfy the three-point pattern condition, f1 ≥ f2 < f3 or f1 > f2 ≤ f3.

The second stage is to reduce the interval to locate the minimum. Classic interval reduction methods are
Fibonacci’s method, Golden section method, and polynomial methods.

Example. Bracket a three-point pattern for f(x) = |x - 5| starting at x = 10 and using h = 0.5 and γ = φ.

• Initially, x1 = 10.0, x2 = 10.5, f1 = 5.0, and f2 = 5.5.

• Since f2 > f1, interchange the points: f1 = 5.5, f2 = 5.0, x1 = 10.5, x2 = 10.0, and h = -0.5.

• In the first iteration, h = φh = -0.8090, x3 = 9.1910, and f3 = 4.1910.

• In the second iteration, f1 = 5.0, f2 = 4.1910, x1 = 10.0, x2 = 9.1910, h = -1.3090, x3 = 7.8820, and f3 = 2.8820.

• In the third iteration, f1 = 4.1910, f2 = 2.8820, x1 = 9.1910, x2 = 7.8820, h = -2.1179, x3 = 5.7641, and f3 = 0.7641.

• In the fourth iteration, f1 = 2.8820, f2 = 0.7641, x1 = 7.8820, x2 = 5.7641, h = -3.4268, x3 = 2.3374, and f3 = 2.6626.

• Since f3 > f2, the points x1 = 7.8820, x2 = 5.7641, and x3 = 2.3374 satisfy the three-point pattern condition.

This iterative process can be conveniently presented as in Table 5.1.

Table 5.1: Bracketing f(x) = |x - 5| from x = 10 using h = 0.5 and γ = φ.

    k     h          x         f
    0               10.5000    5.5000
    1    -0.5000    10.0000    5.0000
    2    -0.8090     9.1910    4.1910
    3    -1.3090     7.8820    2.8820
    4    -2.1179     5.7641    0.7641
    5    -3.4268     2.3374    2.6626
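
The bracketing algorithm translates directly into a short MATLAB function. The following is a minimal sketch (the function name and interface are illustrative):

function [x1,x2,x3]=bracket(fun,x1,h,gamma)
% three-point pattern bracketing (Steps 1-8 above)
x2=x1+h;
f1=feval(fun,x1); f2=feval(fun,x2);
if f2>f1    % interchange points and reverse the search direction
    [x1,x2]=deal(x2,x1); [f1,f2]=deal(f2,f1); h=-h;
end
h=gamma*h; x3=x2+h; f3=feval(fun,x3);
while f3<=f2    % expand until the function increases
    x1=x2; f1=f2; x2=x3; f2=f3;
    h=gamma*h; x3=x2+h; f3=feval(fun,x3);
end
end

For the example above, bracket(@(x) abs(x-5),10,0.5,(1+sqrt(5))/2) returns x1 = 7.8820, x2 = 5.7641, and x3 = 2.3374.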

5.2 Fibonacci’s method


5.2.1 Description
Leonardo Pisano Bogollo (1170–1250), better known by his nickname Fibonacci, developed the Fibonacci
sequence in the study of rabbit reproduction. The problem to be solved was: A certain man put a pair
of rabbits in a place surrounded on all sides by a wall. How many pairs of rabbits can be produced from
that pair in a year if it is supposed that every month each pair begets a new pair which from the second
month on becomes productive? Every month a new pair of rabbit offspring matures. Each mature couple
delivers two offspring in a month and the couple remains mature. Assuming that the rabbits never die and
do not lose their fertility, and the offspring are always couples (male and female), one obtains the Fibonacci
sequence: 1, 1, 2, 3, 5, 8, 13, 21, .... In this sequence the numbers represent the number of couples at the end
of every month. The Fibonacci numbers are denoted as F0 , F1 , F2 , . . . , Fk , . . . , and the sequence can be
generated using the following formula

Fk = Fk−1 + Fk−2 , k ≥ 2, (5.1)

where F0 = 1 and F1 = 1.
Let us assume an interval of length I1 defined by three points, x1 , x2 and x4 that satisfy the three-point
pattern, f1 ≥ f2 < f4 . Now, introduce a fourth point, x3 , such that the distance between x1 and x3 is the
same as the distance between x2 and x4 . In other words, x2 and x3 are symmetric with respect to the center
of the interval [x1 , x4 ]. Evaluate f3 and compare with f2 . If f2 < f3 then x1 , x2 and x3 define the new
interval. If f2 > f3 then the new interval will be defined by x2 , x3 and x4 . Now, if f2 = f3 you define the
interval in either way. The length of the new interval will be reduced to I2 .
Adding a fourth point to the new interval and repeating the whole procedure, one would obtain an
interval of length In after n − 1 iterations. In the last iteration, the two possible intervals will overlap by a
small distance ε. This distance defines the required precision in the location of the minimum. The interval relations that follow from this procedure can be expressed as

    I_1 = I_2 + I_3
    I_2 = I_3 + I_4
    ...
    I_k = I_{k+1} + I_{k+2}                                    (5.2)
    ...
    I_{n-2} = I_{n-1} + I_n
    I_{n-1} = 2I_n - ε.

Disregarding ε and replacing from bottom to top, one can see that

    I_{n-1} = 2I_n
    I_{n-2} = 3I_n
    I_{n-3} = 5I_n                                             (5.3)
    I_{n-4} = 8I_n
    ...
    I_1 = F_n I_n,

where the coefficients are the Fibonacci numbers. Using (5.1), one obtains

    I_{n-k} = F_{k+1} I_n,   k = 1, ..., n - 1;                (5.4)

this is,

    I_1 = F_n I_n
    I_2 = F_{n-1} I_n
    ...                                                        (5.5)
    I_{n-1} = F_2 I_n.

From (5.5) it can be observed that the length I_n of the final interval after n - 1 iterations is

    I_n = I_1/F_n.                                             (5.6)

The relation between the lengths I_1 and I_2 is given by

    I_2 = (F_{n-1}/F_n) I_1.                                   (5.7)

From (5.2) one observes that

    I_3 = I_1 - I_2
    I_4 = I_2 - I_3
    ...                                                        (5.8)
    I_n = I_{n-2} - I_{n-1}.

Example. Consider the interval [0, 10]. Determine n such that I_n ≤ 0.1.

From (5.6) one obtains

    1/F_n < 0.1/10;

this is, F_n > 100. Using the Fibonacci sequence, F_0 = 1, F_1 = 1, F_2 = 2, F_3 = 3, F_4 = 5, F_5 = 8, F_6 = 13, F_7 = 21, F_8 = 34, F_9 = 55, F_10 = 89, F_11 = 144, one obtains that n = 11. In other words, 10 iterations (k = 10) will be required to obtain the required final interval.

Example. Reduce the initial interval I_1 defined in [0, 1] with the Fibonacci method using n = 5.

• The first interval reduction (k = 1) is I_2 = (F_4/F_5) I_1 = 5/8, with points {0, 3/8, 5/8, 1}.

• For the second interval reduction (k = 2), let us pick the subinterval {0, 3/8, 5/8}. Now, I_3 = (F_3/F_4) I_2 = I_1 - I_2 = 3/8 defines {0, 2/8, 3/8, 5/8}.

• For the third interval reduction (k = 3), let us pick {0, 2/8, 3/8}. Now, I_4 = (F_2/F_3) I_3 = I_2 - I_3 = 2/8 defines {0, 1/8, 2/8, 3/8}.

• For the fourth and final interval reduction (k = 4), let us pick {0, 1/8, 2/8}. Now, I_5 = (F_1/F_2) I_4 = I_3 - I_4 = 1/8 defines {0, 1/8, 1/8, 2/8}. Since the two middle points are repeated, one of them is shifted using a small perturbation parameter ε, for example

    ε = max{10^{-6}, 0.001 I_1}.

  Then, the final interval could be [0, 1/8 + ε] or [1/8, 2/8].

For a large number of iterations, it might be useful to use Binet's formula (derived in 1843) for the Fibonacci numbers,

    F_k = (1/√5) [ ((1 + √5)/2)^{k+1} - ((1 - √5)/2)^{k+1} ].  (5.9)

Example. A MATLAB function that returns the k-th Fibonacci number can be written as

function Fk=fib(k)
% round guards against floating-point round-off in Binet's formula
Fk=round((1/sqrt(5))*(((1+sqrt(5))/2)^(k+1)-((1-sqrt(5))/2)^(k+1)));

In the workspace it is possible to identify the number of iterations required to locate a minimum with a tolerance of ε = 10^{-6}:

>> f=1; k=0;
>> while f>10^(-6)
f=1/fib(k);
k=k+1;
end
>> f = 7.4279e-07
>> k = 31

5.2.2 Algorithm
Fibonacci algorithm can be written like that:

Step 1. Specify the interval [x1 , x4 ], where I1 = d(x1 , x4 ).

Step 2. Specify the size of the final interval In . If In is given, find the smallest number n such that I1 /Fn ≤
In .

Step 3. Set k = 1. Determine I2 = (Fn−1 /Fn )I1 and α = I2 /I1 . Introduce the point x2 , such that
x2 = αx1 + (1 − α)x4 , and evaluate f2 .

Step 4. Introduce the point x3 , such that x3 = αx4 + (1 − α)x1 , and evaluate f3 .

Step 5. If f2 < f3 , set (x4 , f4 ) = (x1 , f1 ) and (x1 , f1 ) = (x3 , f3 ). Go to Step 7.

Step 6. Set (x1 , f1 ) = (x2 , f2 ) and (x2 , f2 ) = (x3 , f3 ).

Step 7. Set k = k + 1. If k = n, go to Step 9.

Step 8. Determine Ik+1 = Ik−1 − Ik . Update α = Ik+1 /Ik . Go to Step 4.

Step 9. Since (x2, f2) = (x3, f3), set α = α + ε, x3 = αx4 + (1 − α)x1, and evaluate f3. If f2 < f3 the final interval is [x1, x3]; otherwise the final interval is [x2, x4]. The shifting parameter ε is a fraction of the final interval, e.g., ε = 0.1 In.

Example. Minimize the function f(x) = (x - 7)^2 in the interval [0, 10] by the Fibonacci method using n = 5.

The initial interval is defined by I_1 = 10 between x_1 = 0 and x_4 = 10, with corresponding f_1 = 49 and f_4 = 9. The final interval corresponds to I_5 = I_1/F_5 = 10/8 = 1.25. Let us review the interval reduction procedure:

• For k = 1, I_2 = (F_4/F_5) I_1 = 6.25 and α = I_2/I_1 = 6.25/10 = 0.6250. Then x_2 = αx_1 + (1 - α)x_4 = 3.75 with corresponding f_2 = 10.5625. Now, let us introduce x_3 = αx_4 + (1 - α)x_1 = 6.25 with corresponding f_3 = 0.5625. Since f_2 > f_3, the new interval will be defined by (x_1, f_1) = (3.75, 10.5625), (x_2, f_2) = (6.25, 0.5625), and (x_4, f_4) = (10, 9).

• For k = 2, I_3 = I_1 - I_2 = 3.75 and α = I_3/I_2 = 3.75/6.25 = 0.6. Then, (x_3, f_3) = (7.5, 0.25). Since f_3 < f_2, then (x_1, f_1) = (6.25, 0.5625), (x_2, f_2) = (7.5, 0.25), and (x_4, f_4) = (10, 9).

• For k = 3, I_4 = I_2 - I_3 = 2.5 and α = I_4/I_3 = 0.6667. Then (x_3, f_3) = (8.75, 3.0625). Since f_2 < f_3, then (x_4, f_4) = (6.25, 0.5625), (x_1, f_1) = (8.75, 3.0625), and (x_2, f_2) = (7.5, 0.25).

• For k = 4, I_5 = I_3 - I_4 = 1.25 and α = I_5/I_4 = 0.5. Then (x_3, f_3) = (7.5, 0.25). Observe that (x_2, f_2) = (x_3, f_3).

• For k = 5, with ε = 0.01, α = 0.51, x_3 = 7.4750, and f_3 = 0.2256. Since f_2 > f_3, the interval that contains the minimum point is [6.25, 7.50].

This iterative process can be conveniently presented as in Table 5.2.

Table 5.2: Minimization of f (x) = (x − 7)2 using the Fibonacci method for n = 5.

k Ik α x1 , f1 x2 , f2 x3 , f3 x4 , f4
1 10.0000 0.6250 0.0000, 49.0000 3.7500, 10.5625 6.2500, 0.5625 10.0000, 9.0000
2 6.2500 0.6000 3.7500, 10.5625 6.2500, 0.5625 7.5000, 0.2500 10.0000, 9.0000
3 3.7500 0.6667 6.2500, 0.5625 7.5000, 0.2500 8.7500, 3.0625 10.0000, 9.0000
4 2.5000 0.5000 8.7500, 3.0625 7.5000, 0.2500 7.5000, 0.2500 6.2500, 0.5625
5 1.2500 0.5100 8.7500, 3.0625 7.5000, 0.2500 7.4750, 0.2256 6.2500, 0.5625

5.3 Golden section method


5.3.1 Description
The term “golden section” seems to first have been used by Martin Ohm in the 1835 2nd edition of his
textbook Die Reine Elementar-Mathematik (Livio, 2002). However, this concept has been used by the
Egyptians in the design of pyramids, by Greek architects, and by Renaissance artists. The golden section is
also known as the golden ratio, golden mean, Divine Proportion, or by its Latin name sectio aurea.
The golden section can be defined by a special geometric construction. Consider an interval I_1 that is divided into two intervals I_2 and I_3, where I_1 = I_2 + I_3 and I_2 > I_3. Then, the ratio of I_1 to I_2 is the same as the ratio of I_2 to I_3, which is also equal to the golden ratio. This is

    I_1/I_2 = I_2/I_3 = φ,

where φ is the golden ratio. Substituting I_1 = φI_2 and I_2 = φI_3 into I_1 = I_2 + I_3 yields

    φ^2 - φ - 1 = 0.                                           (5.10)

The positive root of this polynomial is

    φ = (1 + √5)/2 = 1.6180339887498948482045...,              (5.11)

which is the irrational number that defines the golden ratio. Note that 1/φ = φ - 1. Customarily, the inverse of φ is denoted as Φ,

    Φ = 1/φ = 0.6180339887498948482045....
In the interval reduction strategy using this approach, the ratio between two consecutive intervals is always constant and, therefore, equal to the golden ratio. This relation can be described by

    I_1 = I_2 + I_3
    ...
    I_i = I_{i+1} + I_{i+2}                                    (5.12)
    ...
    I_{n-2} = I_{n-1} + I_n

and

    I_1/I_2 = I_2/I_3 = ··· = I_k/I_{k+1} = ··· = I_{n-1}/I_n = φ.   (5.13)

Observe that

    I_2 = Φ I_1
    I_3 = Φ^2 I_1
    I_4 = Φ^3 I_1                                              (5.14)
    ...
    I_n = Φ^{n-1} I_1,

which describes the size of the final interval I_n. For given initial and final interval sizes, I_1 and I_n, the number of interval reductions n can be expressed as

    n = int( (ln I_n - ln I_1)/ln Φ + 1.5 ),                   (5.15)

where int() denotes the integer part.

5.3.2 Algorithm
The golden section algorithm can be written like that:

Step 1. Specify the interval [x1 , x4 ], where I1 = d(x1 , x4 ).

Step 2. Specify the accuracy ε, or number of interval reductions n.

Step 3. Set k = 1. Introduce the point x2 , such that x2 = Φx1 + (1 − Φ)x4 , and evaluate f2 .

Step 4. Introduce the point x3 , such that x3 = Φx4 + (1 − Φ)x1 , and evaluate f3 .

Step 5. If f2 < f3 , set (x4 , f4 ) = (x1 , f1 ) and (x1 , f1 ) = (x3 , f3 ). Go to Step 7.

Step 6. Set (x1 , f1 ) = (x2 , f2 ) and (x2 , f2 ) = (x3 , f3 ).

Step 7. Update k = k + 1. If k = n or In < ε, go to Step 9.

Step 8. Update Ik+1 = ΦIk . Go to Step 4.

Step 9. End. The final interval is [x1 , x4 ].

Example. Minimize the function f(x) = 2 - 4x + exp(x) in the interval [0.5, 2.6180] by the golden section method. Find the minimum point within a tolerance of ε = 0.001.

Using (5.15), the number of interval reductions to achieve I_n ≤ 0.001 from I_1 = 2.1180 is n = 17. The results are presented in Table 5.3. For illustration, let us show the first three iterations.

• For k = 1, I_1 = 2.1180, (x_1, f_1) = (0.5000, 1.6487) and (x_4, f_4) = (2.6180, 5.2366). Introducing x_2 = Φx_1 + (1 - Φ)x_4, then (x_2, f_2) = (1.3090, 0.4665). Introducing x_3 = Φx_4 + (1 - Φ)x_1, then (x_3, f_3) = (1.8090, 0.8684). Since f_2 < f_3, (x_4, f_4) = (0.5000, 1.6487) and (x_1, f_1) = (1.8090, 0.8684).

• For k = 2, I_2 = 1.3090 and (x_3, f_3) = (1.0000, 0.7183). Since f_2 < f_3, (x_4, f_4) = (1.8090, 0.8684) and (x_1, f_1) = (1.0000, 0.7183).

• For k = 3, I_3 = 0.8090 and (x_3, f_3) = (1.5000, 0.4817). Since f_2 < f_3, (x_4, f_4) = (1.0000, 0.7183) and (x_1, f_1) = (1.5000, 0.4817).

Table 5.3: Interval reduction using the golden section method for I_n ≤ 0.001, n = 17.

    k    I_k       x_1, f_1          x_2, f_2          x_3, f_3          x_4, f_4
    1    2.1180    0.5000, 1.6487    1.3090, 0.4665    1.8090, 0.8684    2.6180, 5.2366
    2    1.3090    1.8090, 0.8684    1.3090, 0.4665    1.0000, 0.7183    0.5000, 1.6487
    3    0.8090    1.0000, 0.7183    1.3090, 0.4665    1.5000, 0.4817    1.8090, 0.8684
    ...
    17   0.0009    1.3860, 0.4582    1.3864, 0.4548    1.3870, 0.4548

The minimum point can be located in the interval [1.3860, 1.3870].
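
The golden section algorithm can be implemented in a few lines of MATLAB. The following is a minimal sketch (the interface and tolerance handling are illustrative):

function [x1,x4]=golden(fun,x1,x4,tol)
% golden section interval reduction (Steps 1-9 above)
PHI=(sqrt(5)-1)/2;                    % Phi = 1/phi = 0.6180...
x2=PHI*x1+(1-PHI)*x4; f2=feval(fun,x2);
while abs(x4-x1)>tol
    x3=PHI*x4+(1-PHI)*x1; f3=feval(fun,x3);
    if f2<f3                          % Step 5: swap the interval ends
        [x1,x4]=deal(x3,x1);
    else                              % Step 6: shift the points
        x1=x2; x2=x3; f2=f3;
    end
end
end

For the example above, [a,b]=golden(@(x) 2-4*x+exp(x),0.5,2.6180,0.001) brackets the minimizer x* = ln 4 ≈ 1.3863.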

Chapter 6

Curve fitting methods

6.1 Powell’s method


6.1.1 Description
In a sufficiently small interval, a smooth function f can be approximated by a quadratic polynomial

    f_Q(x) = a_0 + a_1 x + a_2 x^2,                            (6.1)

where a_0, a_1, and a_2 are unknown coefficients. Given three points, x_1, x_2, and x_3, one can determine the three unknown coefficients from

    f(x_1) = a_0 + a_1 x_1 + a_2 x_1^2
    f(x_2) = a_0 + a_1 x_2 + a_2 x_2^2
    f(x_3) = a_0 + a_1 x_3 + a_2 x_3^2.

In matrix form,

    [ 1  x_1  x_1^2     [ a_0     [ f(x_1)
      1  x_2  x_2^2       a_1   =   f(x_2)
      1  x_3  x_3^2 ]     a_2 ]     f(x_3) ].

Solving the system of linear equations,

    a_2 = (1/(x_3 - x_2)) [ (f(x_3) - f(x_1))/(x_3 - x_1) - (f(x_2) - f(x_1))/(x_2 - x_1) ]   (6.2)
    a_1 = (f(x_2) - f(x_1))/(x_2 - x_1) - a_2 (x_1 + x_2)                                     (6.3)
    a_0 = f(x_1) - a_1 x_1 - a_2 x_1^2.                                                       (6.4)

The minimum x_Q* of the quadratic approximation f_Q is a good approximation of the minimum x* of the function f. This minimum satisfies the necessary condition f_Q'(x_Q*) = 0 and the sufficient condition f_Q''(x_Q*) > 0. This point is defined as

    x_Q* = -a_1/(2a_2),                                        (6.5)

where f_Q''(x_Q*) = 2a_2 > 0. Evaluating f(x_Q*) will establish a criterion to select a reduced interval that satisfies the three-point pattern.

6.1.2 Algorithm
The algorithm to reduce the interval using Powell’s method is the following:

Step 1. Set k = 1. Specify the three-point pattern interval defined by x1 , x2 , and x3 with respective function
values f (x1 ), f (x2 ), and f (x3 ).

Step 2. Determine a1 and a2 using (6.3) and (6.2). Then introduce x∗Q using (6.5) and evaluate f (x∗Q ).

Step 3. If x2 < x∗Q then

(a) If f (x2 ) < f (x∗Q ) then x1 < x∗ < x∗Q and the new interval is defined as x1 = x1 , x2 = x2
and x3 = x∗Q .
(b) If f (x2 ) > f (x∗Q ) then x2 < x∗ < x3 and the new interval is defined as x1 = x2 , x2 = x∗Q
and x3 = x3 .

Step 4. If x2 > x∗Q then

(a) If f (x2 ) < f (x∗Q ) then x∗Q < x∗ < x3 and the new interval is defined as x1 = x∗Q , x2 = x2
and x3 = x3 .
(b) If f (x2 ) > f (x∗Q ) then x1 < x∗ < x2 and the new interval is defined as x1 = x1 , x3 = x2
and x2 = x∗Q .

Step 5. If termination criteria are not satisfied, set k = k + 1 and go to Step 2.

Step 6. End. The final interval is given by [x1 , x3 ].

In this method, useful termination criteria are

|∆x∗Q | ≤ εx |x∗Q |, (6.6)


|∆f (x∗Q )| ≤ εf |f (x∗Q )|, (6.7)

where ∆x∗Q and ∆f (x∗Q ) represent the variations of the minimum and its image in two consecutive itera-
tions.

Example. Minimize f(x) = 2 - 4x + exp(x) in the interval [0.5000, 2.6180] by Powell's method. For illustration, let us show the first two iterations.

1. For k = 1, the initial points correspond to (x_1, f_1) = (0.5000, 1.6487) and (x_3, f_3) = (2.6180, 5.2366). Let us introduce (x_2, f_2) = (1.3090, 0.4665). The coefficients of the quadratic approximation are a_2 = 2.4100 and a_1 = -5.8210. Therefore, x_Q* = 1.2077 and f(x_Q*) = 0.5149. The new interval will be defined by (x_1, f_1) = (1.2077, 0.5149), (x_2, f_2) = (1.3090, 0.4665), and (x_3, f_3) = (2.6180, 5.2366).

2. For k = 2, the updated coefficients of the quadratic approximation are a_2 = 2.9228 and a_1 = -7.8339. The new solution is x_Q* = 1.3401 with f(x_Q*) = 0.4590. The final interval is [1.2077, 1.3401].

For this problem, one can prove that five iterations of Powell's method give a solution as precise as the one obtained from 17 iterations of the golden section method.

6.2 Brent's method*

6.2.1 Description

The basic idea of Brent's method is to fit a quadratic polynomial when applicable and to accept the quadratic minimum if certain criteria are met; otherwise, golden sectioning is carried out. This method makes use of five points, x_1, x_2, x_3, x_4, x_5, which may not be all distinct. Points x_1 and x_5 bracket the minimum, and x_2 is the point with the least function value. In each iteration a new point x_Q* is introduced; this point is updated as x_2 in the next iteration. x_4 is the point with the second least function value, and x_3 is the previous value of x_4.

6.2.2 Algorithm

The algorithm can be written as

Step 1. Specify the three-point pattern interval {x_1, x_2, x_5} with images {f_1, f_2, f_5}, f_1 ≥ f_2 < f_5 or f_1 > f_2 ≤ f_5.

Step 2. Initialize x_3 = x_2 and x_4 = x_2.

Step 3. If the points x_2, x_3, and x_4 are all distinct, then go to Step 5.

Step 4. Estimate x_Q* by a golden section step in the larger of the two intervals [x_1, x_2] or [x_2, x_5]. Go to Step 7.

Step 5. Try a quadratic fit for x_2, x_3, and x_4. If the quadratic minimum is likely to fall inside [x_1, x_5], then determine the minimum of this approximation,

    x_Q* = x_2 - (1/2) [ (x_2 - x_4)^2 (f_2 - f_3) - (x_2 - x_3)^2 (f_2 - f_4) ] / [ (x_2 - x_4)(f_2 - f_3) - (x_2 - x_3)(f_2 - f_4) ].   (6.8)

Step 6. If the point x_Q* is too close to x_1, x_2, or x_5, then adjust x_Q* into the larger of [x_2, x_1] or [x_2, x_5] such that it is away from x_2 by a minimum distance ε chosen on the basis of machine tolerance.

Step 7. Evaluate the function at x_Q*.

Step 8. From among x_1, x_2, x_3, x_4, x_5, and x_Q*, determine the new x_1, x_2, x_3, x_4, x_5.

Step 9. If the lengths of the intervals [x_1, x_2] and [x_2, x_5] are less than 2ε, then termination is satisfied. Otherwise, go to Step 3.

6.3 Newton's method

Another family of approaches finds the minimum of a function of a single real variable by using an initial point and the derivative information of the objective function. The two most common derivative-based optimization algorithms are Newton's and the secant methods. This chapter also introduces the application of the bisection method in optimization.

6.3.1 Description

Suppose that we are confronted with the problem of minimizing f. Assuming that f ∈ C^2, the quadratic approximation of the function about a given point x_k has the form

    f_Q(x) = f(x_k) + f'(x_k)(x - x_k) + (1/2) f''(x_k)(x - x_k)^2.

Instead of minimizing f we minimize its quadratic approximation f_Q. The necessary condition yields

    f_Q'(x) = f'(x_k) + f''(x_k)(x - x_k) = 0.

Letting x = x_{k+1}, we obtain

    x_{k+1} = x_k - f'(x_k)/f''(x_k).                          (6.9)

The sufficient condition states that

    f_Q''(x) = f''(x_k) > 0.

If f''(x_k) ≤ 0, this algorithm does not converge to a minimum point.

6.3.2 Algorithm

Step 1. Given x_1, set k = 1.

Step 2. Determine f'(x_k) and f''(x_k).

Step 3. Obtain x_{k+1} = x_k - f'(x_k)/f''(x_k).

Step 4. If termination criteria are not satisfied, set k = k + 1 and go to Step 2.

Step 5. End. The minimum point is x_k.

In this method, useful termination criteria are

|∆xk | ≤ εx |xk |, (6.10)


|f 0 (xk )| ≤ εg , (6.11)

where ∆xk = xk+1 − xk , and εx and εg are very small numbers.

Example. Minimize f(x) = (1/2)x^2 - sin(x) by Newton's method using x_1 = 0.5. For illustration, let us show the first three iterations.

1. For k = 1, f'(x_1) = -0.3776 and f''(x_1) = 1.4794. The new point is x_2 = 0.7552.

2. For k = 2, f'(x_2) = 0.0271 and f''(x_2) = 1.6854. The new point is x_3 = 0.7391.

3. For k = 3, f'(x_3) = 0.0001 and f''(x_3) = 1.6734. The new point is x_4 = 0.7391.

Observe that it only took three iterations to find the minimum within a reasonable tolerance.
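
The iteration (6.9) is only a few lines of MATLAB. A minimal sketch for the example above (handles for f' and f'' are supplied directly; the names are illustrative):

>> df  = @(x) x - cos(x);      % f'(x)  for f(x) = x^2/2 - sin(x)
>> ddf = @(x) 1 + sin(x);      % f''(x)
>> x = 0.5;
>> for k = 1:3
       x = x - df(x)/ddf(x);   % Newton update (6.9)
   end
>> x
x = 0.7391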

6.3.3 Extension

Newton's method can also be used as a zero-finder algorithm. Indeed, if we set g(x) = f'(x), then we obtain the formula relative to g(x) = 0,

    x_{k+1} = x_k - g(x_k)/g'(x_k).                            (6.12)

Observe that this expression can also be obtained from a linear approximation of g about x_k evaluated at x_{k+1}. For this reason, this method is also known as Newton's method of tangents.

Example. Find the root of the following function using Newton's method from the point x_0 = 1.0,

    g(x) = 2x/3 - sin x = 0.

The derivative of this function is given by

    g'(x) = 2/3 - cos x.

Table 6.1 shows the results of the iterations.

Table 6.1: Iterations using Newton's method, g(x) = 2x/3 - sin x, ε = 0.001.

    k    x_k      g_k      g'_k
    0    1.000    -0.175   0.126
    1    2.383     0.900   1.392
    2    1.737     0.172   0.832
    3    1.530     0.021   0.626
    4    1.496     0.000

Newton's method of tangents may fail if the first approximation to the root is such that the ratio g(x_k)/g'(x_k) is not small enough. Thus, a good initial approximation to the root is very important.

6.4 Secant method

6.4.1 Description

In the secant method, the second derivative is explicitly approximated using first derivative information as

    f''(x_k) ≈ [f'(x_k) - f'(x_{k-1})]/(x_k - x_{k-1}).        (6.13)

If f''(x_k) ≤ 0, this algorithm does not converge to a minimum. Using (6.13) in (6.9) yields

    x_{k+1} = x_k - [(x_k - x_{k-1})/(f'(x_k) - f'(x_{k-1}))] f'(x_k),   (6.14)

which defines the secant method. Note that this method requires two initial points, x_0 and x_1. Also observe that, like Newton's method, the secant method does not directly involve values of f. Instead, it tries to drive the derivative f' to zero.

6.4.2 Algorithm

Step 1. Given x_0 and x_1, set k = 1.

Step 2. Determine f'(x_k).

Step 3. Obtain x_{k+1} from (6.14).

Step 4. If termination criteria are not satisfied, set k = k + 1 and go to Step 2.

Step 5. End. The minimum point is x_k.

In this method, useful termination criteria are

    |Δx_k| ≤ ε_x |x_k|,                                        (6.15)

    |f'(x_k)| ≤ ε_g,                                           (6.16)

where Δx_k = x_{k+1} - x_k, and ε_x and ε_g are very small numbers.

Example. Minimize f(x) = (1/2)x^2 - sin(x) by the secant method using x_0 = 0.5 and x_1 = 0.75. For illustration, let us show the first three iterations.

1. For k = 1, f'(x_0) = -0.3776 and f'(x_1) = 0.0183; then the new point is

    x_2 = 0.75 - [(0.75 - 0.5)/(0.0183 - (-0.3776))](0.0183) = 0.7384.

2. For k = 2, f'(x_2) = -0.0011; then the new point is

    x_3 = 0.7384 - [(0.7384 - 0.75)/(-0.0011 - 0.0183)](-0.0011) = 0.7391.

3. For k = 3, f'(x_3) = -2 × 10^{-6}; then the new point is x_4 = 0.7391.

Observe that it only took three iterations to find the minimum within a small tolerance.

6.4.3 Extension

The secant method can be used as a zero-finder algorithm to solve equations of the form g(x) = 0. Approximating the derivative of this function,

    g'_k ≈ [g(x_k) - g(x_{k-1})]/(x_k - x_{k-1}),

and replacing in (6.12), one obtains

    x_{k+1} = x_k - [(x_k - x_{k-1})/(g(x_k) - g(x_{k-1}))] g(x_k).   (6.17)

Example. Perform two iterations of the secant method to find the root of the function

    g(x) = x^3 - 12.2x^2 + 7.45x + 42 = 0,

starting from x_{-1} = 13 and x_0 = 12. Using (6.17) we obtain x_1 = 11.40 and x_2 = 11.25.

6.5 Bisection method

6.5.1 Description

Assuming that f'(x) is available for every x, the bisection method may be used to locate the root of f'(x) = 0. This method makes use of an interval [x_1, x_3] such that f'(x_1) and f'(x_3) are opposite in sign. Then, the algorithm introduces a third point

    x_2 = 0.5(x_1 + x_3)

right in the middle of [x_1, x_3], and evaluates f'(x_2). If the sign of f'(x_2) is opposite to the sign of f'(x_1), then the interval is reduced to [x_1, x_2]. If the sign of f'(x_2) is opposite to the sign of f'(x_3), then the interval is reduced to [x_2, x_3]. The process continues until the termination criteria are satisfied.

6.5.2 Algorithm
Step 1. Given x1 and x3 , such that x1 < x3 , determine f 0 (x1 ) and f 0 (x3 ) and verify that f 0 (x1 ) < 0 and
f 0 (x3 ) > 0. Set k = 1.

Step 2. Introduce x2 = 0.5(x1 + x3 ) and evaluate f 0 (x2 ).

• If f 0 (x2 ) < 0, make x1 = x2 and f 0 (x1 ) = f 0 (x2 ).


• If f 0 (x2 ) > 0, make x3 = x2 and f 0 (x3 ) = f 0 (x2 ).

Step 3. If termination criteria are not satisfied, set k = k + 1 and go to Step 2.

Step 4. End. The minimum point is x2 .

In this method, useful termination criteria are

|∆xk | ≤ εx |xk |, (6.18)


|f 0 (xk )| ≤ εg , (6.19)

where ∆xk = xk+1 − xk , and εx and εg are very small numbers.

Example. Minimize f(x) = (1/2)x^2 - sin(x) by the bisection method using x_1 = 0.5 and x_3 = 5.5. For illustration, let us show the first seven iterations.

1. For k = 1, the initial interval is x_1 = 0.5 and x_3 = 5.5 with f'(x_1) = -0.3776 < 0 and f'(x_3) = 4.7913 > 0. Let us introduce x_2 = 3.0 and evaluate f'(x_2) = 4.0000.

2. For k = 2, x_1 = 0.5000 and x_3 = 3.0000. Then, x_2 = 1.7500 with f'(x_2) = 1.9283.

3. For k = 3, x_1 = 0.5000 and x_3 = 1.7500. Then, x_2 = 1.1250 with f'(x_2) = 0.6938.

4. For k = 4, x_1 = 0.5000 and x_3 = 1.1250. Then, x_2 = 0.8125 with f'(x_2) = 0.1248.

5. For k = 5, x_1 = 0.5000 and x_3 = 0.8125. Then, x_2 = 0.6563 with f'(x_2) = -0.1360.

6. For k = 6, x_1 = 0.6563 and x_3 = 0.8125. Then, x_2 = 0.7344 with f'(x_2) = -0.0079.

7. For k = 7, x_1 = 0.7344 and x_3 = 0.8125. Then, x_2 = 0.7734 with f'(x_2) = 0.0579.

Observe that the rate of convergence of this method is considerably lower than that of the previous derivative-based methods.

6.5.3 Extension
The bisection method is widely used as a zero-finder algorithm to solve for equations of the form g(x) = 0.

Example. Using the bisection method, determine the root of the following function

g(x) = x^2 - 4 = 0,

starting from the interval x1 = 0 and x2 = 10.


Analytically, one can prove that the zeroes of this function are ±2. The results of the bisection method
are shown in Table 6.2.

Table 6.2: Bisection method, g(x) = x^2 - 4 = 0.


k x1 g1 x2 g2 ∆x
0 0.0000 -4.0000 10.000 96.000 10.000
1 0.0000 -4.0000 5.0000 21.000 5.0000
2 0.0000 -4.0000 2.5000 2.2500 2.5000
3 1.2500 -2.4375 2.5000 2.2500 1.2500
4 1.8750 -0.4843 2.5000 2.2500 0.6250
..
.

Exercises

1. Prove that the ratio between two Fibonacci numbers can be written as

    F_{i-1}/F_i = ((√5 - 1)/2) (1 - s^i)/(1 - s^{i+1}),   i = 2, 3, ..., n,

   where

    s = (1 - √5)/(1 + √5).

2. A string of length 1 m is used to make a rectangle. Determine the size of the rectangle with maximum
area.

3. A string of length 1 m is used to delimit an area by shaping it as a polygon in which all sides have the
same length. Determine the number of sides of the polygon of maximum area.

4. Two discs, of diameters 1 m and 2 m, are to be placed inside a rectangle. Determine: (i) the size of
the rectangle with minimum perimeter, (ii) the size of the rectangle with minimum area.

5. Three discs, each of diameter 1 m are to be placed inside a rectangle. Determine: (i) the size of the
rectangle with minimum perimeter, (ii) the size of the rectangle with minimum area.

6. An open box is made by cutting out equal squares from the corners, bending the flaps, and welding
the edges of a sheet of metal of size H × L. Determine the box dimension of maximum volume.

7. An open box is made by cutting out equal squares from the corners, bending the flaps, and welding
the edges of a sheet of metal of size 8.5 in × 10 in. Every cubic inch of volume of the open box brings in
a profit of $ 0.10. Every square inch of corner waste results in a cost of $ 0.04. Every inch of welding
length costs $ 0.02. Determine the box dimensions of maximum profit.

8. A part is produced on a lathe in a machine shop. The cost of the part includes machining cost, tool-
related cost, and cost of the idle time. The cost for the machining time is inversely proportional to the
cutting speed V m/min. The tool-related costs are proportional to V 3/2 . The cost c in dollars is given
by

    c(V) = 240/V + 10^{-4} V^{1.5} + 0.45.

Determine the cutting speed for minimum cost and the corresponding minimum cost.

9. In a solution of potassium (K) and chlorine (Cl), the equilibrium distance r in nanometers between
two atoms is obtained as the minimum of the total energy E. The total energy is given by the sum of
the energy of attraction and the energy of repulsion. The total energy in electron volts (eV) is given
by

    E(r) = -1.44/r + (5.9 × 10^{-6})/r^9.

Determine the equilibrium spacing and the corresponding energy.

Chapter 7

Numerical Analysis

7.1 Convergence
From Binmore (1982), a sequence of real numbers

{xk }∞
k=0 = {x0 , x1 , . . . , x∞ }

is said to converge to the limit x∗ if and only if the following criterion is satisfied: Given any ε > 0, there is
a (natural) number N such that for any number k > N , |xk − x∗ | < ε.

Example. Consider the sequence {xk } where

1
xk = 1 + .
k

Show that this sequence converges to x* = 1. This is

lim_{k→∞} (1 + 1/k) = 1.

Evidently this limit can be evaluated directly; however, let us use the definition of convergence. From this
definition, there should be a number N such that for any k > N,

|(1 + 1/k) − 1| < ε.

Since

|(1 + 1/k) − 1| = |1/k| = 1/k < ε,

then

k > 1/ε = N.

For each ε > 0 one obtains a different N . For example, if ε = 0.1 then N = 10.

7.2 Fixed Point Iteration


Consider a sequence of real numbers {x_k}_{k=0}^∞ obtained after the successive application of an iterative
method. Given an initial guess x₀, a function g computes the successive terms. The sequence of values
{x_k}_{k=0}^∞ is obtained using the iterative rule

xk+1 = g(xk ). (7.1)

The pattern of this sequence is

x0
x1 = g(x0 )
..
.
xm = g(xm−1 )
xm+1 = g(xm )
..
.

If all goes well, the sequence {x_k} converges to the optimum point x∞ = x* where

x∗ = g(x∗ ). (7.2)

In this case, x∗ is referred to as a fixed point of the function g and (7.1) is called a fixed point iteration.
A fixed point is a solution of the equation x = g(x). Geometrically, the fixed points of a function g are the
points of intersection of the curve y = g(x) and y = x.

Example. Find the fixed point(s) of

g(x) = 1 + x − x²/3.

The fixed points of g are the roots of g(x) − x = 0. This is

1 + x − x²/3 − x = 0,

or x* = ±√3 ≈ ±1.73205.

Example. Consider the minimization of f (x) = 12 x2 − sin(x) by Newton’s method in the interval [−3, 3].
Determine the fixed point of the corresponding fixed point iteration.

The fixed point iteration of Newton's method is

g(x) = x − f′(x)/f″(x).

This is

g(x) = x − (x − cos(x))/(1 + sin(x)).

The fixed point of g satisfies

x = x − (x − cos(x))/(1 + sin(x)).

Solving numerically, the solution of this equation is x* ≈ 0.739085, which also corresponds to the minimum
point of the function f.
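A short MATLAB sketch of this fixed point iteration (the initial guess is assumed) is:

g = @(x) x - (x - cos(x))./(1 + sin(x));  % Newton iteration function
x = 1;                                    % assumed initial guess in [-3, 3]
for k = 1:10
    x = g(x);                             % successive substitution
end
x                                         % converges to 0.739085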

7.3 Contraction mapping theorem


Successive substitutions do not always converge. When small changes in the initial data produce corre-
spondingly small changes in the final result the algorithm is called stable; otherwise it is unstable. An
algorithm that is stable only for certain conditions is called conditionally stable.
From Allen & Isaacson (1998), let Ω ⊂ R. A function g : Ω → R satisfies a Lipschitz condition on Ω if
there is a constant L > 0 such that, for any two points x1 , x2 ∈ Ω

|g(x1 ) − g(x2 )| ≤ L|x1 − x2 |. (7.3)

The greatest lower bound for L is called the Lipschitz constant for g on Ω. If g has a Lipschitz constant
L < 1, then g is a contraction on Ω. Any function that satisfies a Lipschitz condition is continuous.
Lipschitz condition (7.3) can be expressed as

|g(x₁) − g(x₂)| / |x₁ − x₂| ≤ L.

If g ∈ C 1 in the closed interval [x1 , x2 ], then the Mean Value Theorem states that there is a point x ∈
[x1 , x2 ], i.e., x = αx1 + (1 − α)x2 for some value α ∈ [0, 1], such that

|g(x₁) − g(x₂)| / |x₁ − x₂| = |g′(x)|.

In particular, if |g 0 (x)| < 1 for all x ∈ Ω then g is a contraction on Ω.


If g is a contraction on some neighborhood (x* − h, x* + h) of the fixed point x*, then the iteration
x_{k+1} = g(x_k) starting at x₀ ∈ (x* − h, x* + h) decreases the distance between the iterates and the fixed
point. In this case x* is said to be an attractive fixed point and the iteration exhibits local convergence.

On the other hand, if |g 0 (x)| > 1 for all x ∈ (x∗ − h, x∗ + h), then the iteration xk+1 = g(xk ) will not
converge to x∗ . In this case x∗ is said to be a repelling fixed point and the iteration exhibits local divergence.

Example. Determine if the fixed points of

g(x) = 1 + x − x²/3

are attractive or repelling.

From the previous example, the fixed points of g are x₁* = √3 and x₂* = −√3. The derivative of g is

g′(x) = 1 − 2x/3.

Then, |g′(x)| < 1 implies that

−1 < 1 − 2x/3 < 1,

or 0 < x < 3. Since x₁* ∈ (0, 3), it is an attractive fixed point. On the other hand, x₂* is a repelling fixed
point.

7.4 Error analysis and order of convergence


Let g be a contraction on some interval [a, b] and let us examine the rate at which x_k approaches the fixed
point x*. Equations (7.3) and (7.2) allow us to estimate the error ε_{k+1} = x* − x_{k+1} in terms of the previous
value:

|ε_{k+1}| = |x* − x_{k+1}| = |g(x*) − g(x_k)| ≤ L|x* − x_k| = L|ε_k|.   (7.4)

Therefore, at every iteration the magnitude of the error is reduced at least by a factor of L.
The sequence {xk } converges with order p to x∗ if there is a constant C and a natural number N ≥ 0
such that
|x∗ − xk+1 | ≤ C |x∗ − xk |p (7.5)

whenever k ≥ N . Larger values of p imply faster convergence, at least when k ≥ N . If p = 1 then


convergence occurs when 0 < C < 1. This condition is referred to as linear convergence. If p = 2, the
convergence is quadratic. If
|x∗ − xk+1 |
lim p = C,
k→∞ |x∗ − xk |

then C is referred to as the asymptotic error constant.


If there is a sequence {Ck } such that
lim Ck = 0
k→∞

and
|x∗ − xk+1 | ≤ Ck |x∗ − xk | ,

then it is said that the sequence {xk } converges superlinearly to the fixed point x∗ .
Under some circumstances one can construct iteration functions g for which successive substitutions
converge with order p ≥ 2. In a trivial case, if for all p > 0,

lim_{k→∞} |x_{k+1} − x*| / |x_k − x*|^p = 0,

then we say that the order of convergence is ∞. This is the case in which all the values of the sequence are
the same.
The order of convergence p can be interpreted in terms of decimal places of accuracy. If |x* − x_k| =
10^{−q}, then |x* − x_{k+1}| ≤ C × 10^{−pq}. For example, if |ε_k| = 10⁻³ and p = 2, then |ε_{k+1}| ≤ C × 10⁻⁶.
The order of convergence of an algorithm can be checked using a convergence plot. If the fixed point x∗
is known, one can compute the sequence of errors εk = x∗ − xk . From (7.5),

log |εk+1 | ≤ p log |εk | + log C.

Then the order of convergence can be obtained from the slope of the curve log |εk+1 | versus log |εk |. If the
function g is a contraction on some interval, then the convergence is at least linear.
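As an illustration, the slope can be estimated in MATLAB from the Newton iterates of the earlier fixed point example; the fixed point is known, and the slope of the fitted line is close to p = 2:

g  = @(x) x - (x - cos(x))./(1 + sin(x));
xs = 0.739085133215161;              % known fixed point
x = 1; e = zeros(1,4);
for k = 1:4
    e(k) = abs(x - xs);              % error of the current iterate
    x = g(x);
end
P = polyfit(log(e(1:end-1)), log(e(2:end)), 1);
p = P(1)                             % slope close to 2 (quadratic convergence)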

Example. Suppose that x_k = 1/k, and thus x_k → 0. Then

|x_{k+1}| / |x_k|^p = (1/(k + 1)) / (1/k^p) = k^p/(k + 1).

If p < 1, the above sequence converges to 0, whereas if p > 1, it grows to ∞. If p = 1, the sequence
converges to 1. Hence, the order of convergence is 1.

Example. Suppose that x_k = 1 for all k, and thus x_k → 1. Then |x_{k+1} − x*| = 0 for all k, so (7.5) holds
with any constant C for all p. Hence, the order of convergence is ∞.

Example. Suppose that x_k = α^k, where 0 < α < 1, and thus x_k → 0. Then,

|x_{k+1}| / |x_k|^p = α^{k+1}/(α^k)^p = α^{k+1−kp}.

If p < 1, the above sequence converges to 0, whereas if p > 1 it grows to ∞. If p = 1 the sequence
converges to α. Hence the order of convergence is 1.

Exercises
1. (From K.G. Binmore, Mathematical Analysis: A Straightforward Approach) Prove that

lim_{k→∞} (k² − 1)/(k² + 1) = 1.

2. Let x > 0. Prove that

lim_{k→∞} x^{1/k} = 1.

3. Let p be any positive rational number. Prove that

lim_{k→∞} 1/k^p = 0.

4. Let α be any real number. If

lim_{k→∞} x_k = x*,

prove that

lim_{k→∞} αx_k = αx*.

5. (From J.H. Mathews, Numerical Methods for Mathematics, Science and Engineering) Find the fixed
points and determine local convergence or divergence for the following fixed point iterations:

• g(x) = 2x
• g(x) = 8/(x + 2)
• g(x) = x²/4 + x/2
• g(x) = −x²/4 − x/2 + 4

Part III

Unconstrained multivariate optimization

Chapter 8

Analytical elements

8.1 Problem formulation


A multivariable optimization problem can be expressed as

min_x f(x)   s.t. x ∈ Ω,   (8.1)

where f is a scalar function f : Rⁿ → R of n variables x = (x₁, . . . , xₙ)ᵀ, and Ω ⊂ Rⁿ is the feasible space.
It is said that x* minimizes f if f(x*) ≤ f(x) for all x ∈ Ω. A completely unconstrained optimization problem
corresponds to the case in which Ω = Rⁿ. However, the theory presented in this chapter is applicable
whenever Ω is a convex, open subset of Rⁿ. Let us consider an example of an unconstrained optimization
problem in engineering.

Example. Two frictionless bodies A and B with unidimensional displacement are connected by three linear
springs. The first spring with spring constant k1 connects body B to the wall. The second spring with spring
constant k2 connects body A to the wall, and the third spring with spring constant k3 connects body A
and body B. Consider that the springs are in their natural position. Find the displacements x1 and x2 of
the bodies A and B under a static force P applied on body B. State the optimization problem using the
principle of minimum potential energy.
The potential energy of the system is given by Π = U − W, where U is the internal (strain) energy and
W is the work of the external force. The optimization problem can be stated as

min_x Π(x) = U(x) − W(x),

where

U(x) = ½k₂x₁² + ½k₃(x₂ − x₁)² + ½k₁x₂²

and

W(x) = P x₂.
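Since the stationarity condition ∇Π(x) = 0 is linear in x, the equilibrium displacements can be computed directly. A minimal MATLAB sketch follows; the numerical values of k1, k2, k3, and P are assumed for illustration:

k1 = 100; k2 = 200; k3 = 150; P = 10;   % assumed spring constants and load
K = [k2+k3, -k3; -k3, k1+k3];           % Hessian of U (stiffness matrix)
F = [0; P];                             % gradient of W
x = K\F                                 % equilibrium displacements x1, x2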

8.2 Optimality conditions


8.2.1 First order necessary condition
If f ∈ C¹, then the necessary condition for x* to be a local minimum of f is

∇f(x*) = 0.   (8.2)

This equation states that ∂f(x*)/∂xᵢ = 0 for all i = 1, . . . , n. The points that satisfy this condition are
referred to as stationary points. A stationary point can be a minimum, maximum, or saddle point.
The condition in (8.2), also called the first order necessary condition, can be demonstrated using the
Taylor series expansion of f about x*,

f(x) = f(x*) + ∇f(x*)ᵀ(x − x*) + R₁(x).

Evaluating this function at y within an h-neighborhood of x* yields

f(x* + hy) − f(x*) = h∇f(x*)ᵀy + R₁(hy) ≥ 0,

where h is a small positive number. For small enough h the residual R₁(hy) can be disregarded. The above
equation can be written as

h Σᵢ₌₁ⁿ (∂f(x*)/∂xᵢ) yᵢ ≥ 0.   (8.3)

Let us assume that every component of the vector y is zero except at the k-th position, i.e., y = (0, . . . , y_k, . . . , 0)ᵀ.
Then (8.3) can be simplified as

h (∂f(x*)/∂x_k) y_k ≥ 0.

Since h is assumed positive and y_k is unrestricted in sign, the only solution to this equation is

∂f(x*)/∂x_k = 0,

which applies to any k. This condition yields ∇f(x*) = 0.

Example. Determine the stationary points of

f(x) = x₁ + 1/(x₁x₂) + 2x₂.

The gradient of f(x) is

∇f(x) = ( 1 − 1/(x₁²x₂),  2 − 1/(x₁x₂²) )ᵀ.

Equating ∇f(x) = 0 yields x₁* = 1.2599 and x₂* = 0.6300. This solution can be proven using MATHEMATICA:

In[1]:=f := x1 + 1/(x1 x2) + 2 x2


In[2]:=gf := D[f, {{x1, x2}, 1}]
In[3]:=NSolve[gf == 0, {x1, x2}]
Out[3]={{x1 -> 1.25992, x2 -> 0.629961},
{x1 -> -0.629961 - 1.09112 I, x2 -> -0.31498 - 0.545562 I},
{x1 -> -0.629961 + 1.09112 I, x2 -> -0.31498 + 0.545562 I}}

8.2.2 Second order sufficient conditions


If f ∈ C², then the sufficient conditions for x* to be a strict local minimum of f are given by (8.2) and

∇²f(x*) positive definite.   (8.4)

This can be demonstrated using a quadratic approximation from the Taylor series expansion of f about x*,

f(x) = f(x*) + ∇f(x*)ᵀ(x − x*) + ½(x − x*)ᵀ∇²f(x*)(x − x*) + R₂(x).

Disregarding the residual R₂ and evaluating at y in the h-neighborhood of x* yields

f(x* + hy) − f(x*) = ½h²yᵀ∇²f(x*)y.

If f(x* + hy) − f(x*) > 0, then

yᵀ∇²f(x*)y > 0.

Therefore, ∇²f(x*) must be positive definite.
Example. Determine all the stationary points of the following function,

f(x) = x₁ + 1/(x₁x₂) + 2x₂.

Also, determine if they are minima, maxima, or saddle points.
The Hessian of f(x) is

∇²f(x) = (1/(x₁²x₂²)) [ 2x₂/x₁  1 ;  1  2x₁/x₂ ].

Evaluating at x* = (1.2599, 0.6300)ᵀ, one observes that ∇²f(x*) is positive definite. This can be observed
using Sylvester's test or the eigenvalue test. Therefore, x* is a strict local minimum. This solution can be
proven using MATHEMATICA:

In[1]:= f = x1 + 1/(x1 x2) + 2 x2;


In[2]:= gf = D[f, {{x1, x2}, 1}];
In[3]:= Hf = D[f, {{x1, x2}, 2}];
In[4]:= NSolve[gf == 0, {x1, x2}]
In[5]:= x1 = 1.2599210498948736; x2 = 0.6299605249474368;
In[6]:= PositiveDefiniteMatrixQ[Hf]
Out[6]= True

8.2.3 Higher order conditions


Let us consider the sufficient conditions for the case in which the Hessian matrix of the objective function
f is semi-definite. As in the case of single-variable optimization, this is investigated using higher-order
derivatives of the Taylor series expansion.
Let the partial derivatives of f be continuous in the neighborhood of a stationary point x*, and let
∇^(k) f(x*), of order k ≥ 2, be the first non-vanishing higher-order differential of f evaluated at x*.
• If k is even, then
   – if ∇^(k) f(x*) is positive definite, then x* is a local minimum point.
   – if ∇^(k) f(x*) is negative definite, then x* is a local maximum point.
   – if ∇^(k) f(x*) is semi-definite, then nothing can be concluded.

• If k is odd, then x* is a saddle point.

8.3 Convexity
8.3.1 Definition
A set Ω ⊂ Rⁿ is convex if ∀x₁, x₂ ∈ Ω and ∀α ∈ R, 0 ≤ α ≤ 1, the point αx₁ + (1 − α)x₂ ∈ Ω. In other
words, Ω is convex if every point on the line segment joining any two points in Ω is also in Ω.
A function f defined in a convex set Ω is convex if for all x₁, x₂ ∈ Ω, and for all α ∈ R, 0 ≤ α ≤ 1,
there holds

f(αx₁ + (1 − α)x₂) ≤ αf(x₁) + (1 − α)f(x₂).   (8.5)

8.3.2 Properties
1. If f ∈ C¹, then f is convex in Ω if and only if

f(x₂) ≥ f(x₁) + ∇f(x₁)ᵀ(x₂ − x₁),

for all x₁, x₂ ∈ Ω.

2. If f ∈ C², then f is convex in Ω if and only if ∇²f(x) is positive semi-definite for all x ∈ Ω.

3. If x* is a local minimum of a convex function f, then x* is also a global minimum.

Replacing x₁ = x* in Property 1, one observes that f(x₂) ≥ f(x*) for all x₂ ∈ Ω, which demonstrates
Property 3. Even more, a point that satisfies the condition ∇f(x*) = 0 is a global minimum.
Example. Determine if the following function is convex,

f(x) = x₁x₂

in S = {(x₁, x₂)ᵀ ∈ R² : x₁ > 0, x₂ > 0}. Since f ∈ C², one observes that

∇²f(x) = [ 0 1 ; 1 0 ],

which is not positive semi-definite. Therefore, f is not convex.

Exercises
1. Consider the following functions:

• f(x) = x₁² + x₁(2x₂ − 3) − 2x₂²
• f(x) = (x₁ − 1)² + (x₂² − 1)²
• f(x) = x₁ + 1/(x₁x₂) + x₂
• f(x) = (x₁ + x₂)/(1 + x₁² + x₂² + x₁x₂)
• f(x) = (x₁ − 1)² + x₁x₃ + x₂x₃

Determine (a) all stationary points and (b) check if they are strict local minima using the sufficient
conditions.

2. Consider the function

f(x) = x₁² + x₁x₂ + x₂² + x₂x₃ + x₃² − 6x₁ − 7x₂ + 9.
(a) Using the first order necessary conditions, find the minimum point.
(b) Verify that the point is a local minimum point by verifying that the second order sufficient
conditions hold.
(c) Prove that the point is a global minimum.

3. Suppose that through an experiment the value of a function g is observed at m points, x₁, . . . , x_m.
Thus, values g(x₁), . . . , g(x_m) are known. Let us now consider a polynomial approximation of
degree n of the form

h(x) = aₙxⁿ + aₙ₋₁xⁿ⁻¹ + · · · + a₀,

where n < m. Corresponding to any choice of the approximating polynomial, there will be a set of
errors given by eᵢ = g(xᵢ) − h(xᵢ). In order to find the optimum values for the n + 1 coefficients of
the polynomial approximation, let us define the following unconstrained optimization problem:

min_a f(a) = Σᵢ₌₁ᵐ eᵢ²,

where a = (a₀, . . . , aₙ)ᵀ. Show that the objective function can be expressed as

f(a) = aᵀQa − 2bᵀa + c.

Show that, using the first order necessary conditions, the optimum point satisfies

Qa = b.

Chapter 9

Numerical methods

Most numerical gradient-based methods require an initial design x₀, a search direction d₀, and a step size
α₀. In this way, the improved design can be defined as x₁ = x₀ + α₀d₀. Iteratively, these algorithms may
be written as

x_{k+1} = x_k + α_k d_k.   (9.1)

Gradient-based numerical methods are characterized by the different ways they determine the search direction
vector d_k. The step size α_k is determined by single-variable optimization methods, also referred to as
line search methods. Therefore, α_k can be expressed as

α_k = arg min_{α≥0} f(x_k + αd_k).   (9.2)

9.1 Principles
9.1.1 Descent direction
A current design x_k is driven to an improved design x_{k+1} such that

f(x_{k+1}) < f(x_k);   (9.3)

this is,

f(x_k + α_k d_k) < f(x_k).   (9.4)

Using the Taylor series expansion about x_k, the objective function f can be written as

f(x) = f(x_k) + ∇f(x_k)ᵀ(x − x_k) + R₁(x).   (9.5)

Evaluating at x_{k+1} = x_k + α_k d_k, one obtains

f(x_{k+1}) = f(x_k) + ∇f(x_k)ᵀ(α_k d_k) + R₁(α_k²).   (9.6)

For α_k sufficiently small, the residual term can be disregarded. Since f(x_{k+1}) − f(x_k) < 0, see (9.3), one
observes that

α_k ∇f(x_k)ᵀ d_k < 0.   (9.7)

For any α_k > 0,

∇f(x_k)ᵀ d_k < 0,   (9.8)

where d_k is a descent direction and (9.8) is referred to as the descent condition. The unit direction of most
rapid decrease is the solution to the problem

min_d ∇f(x_k)ᵀd   s.t. ||d|| = 1,   (9.9)

where ∇f(x_k)ᵀd = ||∇f(x_k)|| ||d|| cos(θ), and θ is the angle between ∇f(x_k) and d. Since ||d|| = 1, then
∇f(x_k)ᵀd = ||∇f(x_k)|| cos(θ). Therefore, the objective in (9.9) is minimized when cos(θ) = −1, at θ = π
radians. In other words, the solution to (9.9) is

d_k = −∇f(x_k)/||∇f(x_k)||.

Observe that if d_k is opposite to the gradient ∇f(x_k), this is

d_k = −∇f(x_k),   (9.10)

then

∇f(x_k)ᵀ d_k = −||∇f(x_k)||² < 0.   (9.11)

The search direction in (9.10) is referred to as the steepest descent direction. In Rⁿ the gradient ∇f(x_k) is
a normal vector to the tangent hyperplane of the function isovalue at x_k. The gradient vector is aligned with
the direction of maximum growth (steepest ascent) of the function f at x_k.
e

9.1.2 Line search


Once the search direction d_k at the point x_k is identified, the algorithm has to determine the step size α_k. This
problem can be stated as

min_α f(α) = f(x_k + αd_k).   (9.12)

The necessary and sufficient condition for optimality is obtained from

df(α)/dα = 0.   (9.13)

This is

df(x_{k+1})/dα = ∇f(x_{k+1})ᵀ d_k = 0.   (9.14)

In other words, the directional derivative of f(x_{k+1}) along d_k must be zero.
Now, let us consider the particular problem of minimizing a quadratic function

min_x f(x) = ½xᵀAx + bᵀx + c,   (9.15)

where A is symmetric and positive definite. In order to find α_k, let us consider the following optimization
problem,

f(x_k + αd_k) = ½(x_k + αd_k)ᵀA(x_k + αd_k) + bᵀ(x_k + αd_k).   (9.16)

The derivative of f with respect to α is given by

df(x_k + αd_k)/dα = ½(x_k + αd_k)ᵀAd_k + ½d_kᵀA(x_k + αd_k) + bᵀd_k = (x_k + αd_k)ᵀAd_k + bᵀd_k.   (9.17)

The necessary and sufficient conditions for optimality are satisfied when (9.17) is equal to zero. This condition
can be written as

x_kᵀAd_k + αd_kᵀAd_k + bᵀd_k = 0,   (9.18)

or

αd_kᵀAd_k = −(x_kᵀA + bᵀ)d_k.   (9.19)

Since ∇f(x_k) = Ax_k + b for the quadratic in (9.15), one observes that

αd_kᵀAd_k = −∇f(x_k)ᵀd_k,   (9.20)

or

αd_kᵀAd_k = −d_kᵀ∇f_k,   (9.21)

where ∇f_k = ∇f(x_k). Finally, solving for α yields

α_k = −(d_kᵀ∇f_k)/(d_kᵀAd_k).   (9.22)

This equation is applicable only when the objective function is quadratic. In the general case of a nonlinear
function, (9.12) should be used.
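For reference, a minimal MATLAB sketch of the closed-form step size (9.22) for an assumed quadratic is:

A = [2 0; 0 4]; b = [-2; -4];     % assumed quadratic data for illustration
x = [0; 0];
g = A*x + b;                      % gradient at x
d = -g;                           % steepest descent direction
alpha = -(d'*g)/(d'*A*d);         % closed-form step size (9.22)
x = x + alpha*d;                  % updated design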

9.1.3 Termination criteria
The necessary condition for optimality is satisfied at a particular point x_k if

||∇f(x_k)|| ≤ ε_G,   (9.23)

where ε_G is a tolerance on the gradient supplied by the user. Notice that the gradient can vanish at any
stationary point (maximum, minimum, or saddle point). However, the chances of finding a maximum or a
saddle point with a steepest descent algorithm are remote.
The algorithm should also check the successive reduction in the function value, for example,

|f(x_{k+1}) − f(x_k)| ≤ ε_A + ε_R|f(x_k)|,   (9.24)

where ε_A is the absolute tolerance on the change in the function value while ε_R is the relative tolerance.
Suggested values might be ε_G = 10⁻⁴, ε_A = 10⁻⁶, and ε_R = 10⁻².

9.2 Steepest descent method


9.2.1 Definition
This method, presented by Cauchy in 1847, defines the search direction as

d_k = −∇f(x_k),   (9.25)

which corresponds to the steepest descent direction. Using (9.25), one can observe that

d_{k+1} = −∇f(x_{k+1}),   (9.26)

therefore,

df(x_{k+1})/dα = −d_{k+1}ᵀ d_k = 0.   (9.27)

Interestingly, since two consecutive steepest descent directions are orthogonal, this method approaches the
minimum in a "zig-zag" pattern.

Example. Consider the function

f(x) = 2x₁² + x₂².

Determine if the search direction d₀ = (0.5, 0.8)ᵀ is a descent direction at x₀ = (1, 2)ᵀ. Determine the
steepest descent direction at this point.
The gradient of the function is

∇f(x) = (4x₁, 2x₂)ᵀ.

Evaluating at x₀ = (1, 2)ᵀ yields

∇f(x₀) = (4, 4)ᵀ;

therefore, d_s = −(4, 4)ᵀ is the steepest descent direction of f at x₀. The descent condition for d₀ =
(0.5, 0.8)ᵀ can be verified with the sign of the directional derivative of f along d₀. This is

∇f(x₀)ᵀd₀ = (4)(0.5) + (4)(0.8) = 5.2 ≮ 0;

then, d₀ is not a descent direction.

9.2.2 Algorithm
Step 1. Estimate a reasonable initial point x₀ and termination parameters ε_A, ε_G, and ε_R, where ε_A is the
absolute tolerance, and ε_G and ε_R are the tolerances for the gradient and the function.

Step 2. Determine ∇f(x_k). Stop if

||∇f(x_k)|| ≤ ε_G.

Otherwise, define the search direction d_k = −∇f(x_k).

Step 3. Obtain α_k minimizing f(α) = f(x_k + αd_k), for α > 0. Update x_{k+1} = x_k + α_k d_k.

Step 4. Evaluate f(x_{k+1}). Stop if

|f(x_{k+1}) − f(x_k)| ≤ ε_A + ε_R|f(x_k)|

is satisfied in two consecutive iterations. Otherwise, update k = k + 1 and x_k = x_{k+1} and go to
Step 2.

Example. Minimize the function

f(x) = x₁² + x₂² − 2x₁x₂

using the steepest descent method from the point x₀ = (1, 0)ᵀ.
The search direction is d₀ = (−2, 2)ᵀ. The step size minimizes f(α) = 16α² − 8α + 1, this is,
α₀ = 0.25. The improved design is x₁ = (0.5, 0.5)ᵀ. At this point, the gradient of the function is 0, which
satisfies the optimality condition and the termination criterion. Therefore, this is the optimum point.
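A minimal MATLAB sketch of the algorithm applied to this example (using fminbnd for the line search, and only the gradient-based termination test for brevity) is:

f = @(x) x(1)^2 + x(2)^2 - 2*x(1)*x(2);
gradf = @(x) [2*x(1) - 2*x(2); 2*x(2) - 2*x(1)];
x = [1; 0]; epsG = 1e-4;
for k = 1:100
    d = -gradf(x);                        % steepest descent direction
    if norm(d) <= epsG, break, end        % termination criterion (9.23)
    a = fminbnd(@(a) f(x + a*d), 0, 10);  % line search for the step size
    x = x + a*d;                          % design update
end
x                                         % reaches (0.5, 0.5)'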

The steepest descent method is simple and robust. However, it has some drawbacks.

1. Even though this method is convergent, it is slow even for positive definite quadratic forms.

2. The information from previous iterations is not used in the next steps.

3. In practice, the function is greatly reduced in the first iterations but it decreases more slowly as the
iterations continue.

4. The steepest descent direction makes sense from a local perspective (current point) but it can be
improved in a more global sense.

9.2.3 Scaling∗
Scaling techniques are used to increase the order (or speed) of convergence of numerical methods. The
underlying idea is to transform the set of design variables to reduce the number of iterations required to
reach the minimum. Let us present some scaling procedures in the following examples.

Example. Minimize the function

f(x) = 25x₁² + x₂²

using the steepest descent method from the point x₀ = (1, 1)ᵀ.
The gradient and the steepest descent direction are determined in every iteration. The line search is
performed using a one-dimensional technique (e.g., golden section, polynomial approximations). Using the
function fminbnd in MATLAB, one iteration can be done as

function [xopt,fopt,aopt,ngradf]=afun(x0)
% One steepest-descent step with an exact line search via fminbnd.
% gfun is assumed to be a helper that returns the gradient of 'fun' at x0.
d0=-gfun('fun',x0);                            % steepest descent direction
[aopt,fopt] = fminbnd(@(a) fun(x0,d0,a),0,10); % line search for the step size
xopt=x0+d0*aopt;                               % improved design
ngradf=norm(d0);                               % norm of the gradient

function f=fun(x0,d0,a)
if nargin==1                                   % allow calling fun(x) directly
d0=zeros(size(x0));
a=0;
end
x=x0+a*d0;
f=25*x(1)^2+x(2)^2;

On the workspace,

>> [x,f,a,n]=afun([1;1])
x =
-0.0010
0.9600
f =
0.9215
a =
0.0200
n =
50.0400

Table 9.1 shows the iterative process.

Table 9.1: Steepest descent, f(x) = 25x₁² + x₂²

k    x1           x2           f(x)         α            ||∇f(x)||
001  +1.0000E+00  +1.0000E+00  +2.6000E+01  +2.0033E-02  +5.0040E+01
002  -1.6444E-03  +9.5992E-01  +9.2154E-01  +4.7898E-02  +1.9216E+00
...
111  -2.3545E-06  +1.3752E-03  +1.8916E-06               +2.7531E-03

Analytically, the Hessian is given by

∇²f(x) = [ 50 0 ; 0 2 ].

Introducing new variables x̂ = (x̂₁, x̂₂)ᵀ such that

x = Dx̂,   (9.28)

where

D = [ 1/√50 0 ; 0 1/√2 ],

then x₁ = x̂₁/√50, x₂ = x̂₂/√2, and

f(x̂) = ½(x̂₁² + x̂₂²).

The minimum of f(x̂) can be located in one iteration using the steepest descent method. The point that
minimizes f is x̂* = (0, 0)ᵀ in the scaled space. Then,

x₁* = x̂₁*/√50 = 0

and

x₂* = x̂₂*/√2 = 0.

Example. Minimize the function

f(x) = 6x₁² − 6x₁x₂ + 2x₂² − 5x₁ + 4x₂ + 2

using the steepest descent method from the point x₀ = (−1, −2)ᵀ. Perform an appropriate scaling to
improve the convergence rate of this method.

The Hessian of the function is given by

∇²f(x) = [ 12 −6 ; −6 4 ].

The eigenvalues are λ₁ = 0.7889 and λ₂ = 15.211 with eigenvectors e₁ = (0.4718, 0.8817)ᵀ and e₂ =
(−0.8817, 0.4718)ᵀ. The scaling is defined as

x = Qx̂,   (9.29)

where Q = (e₁, e₂), this is

Q = [ 0.4718 −0.8817 ; 0.8817 0.4718 ].

Note that the Hessian of the scaled function is not the identity matrix yet. To accomplish this condition one
needs a new change of variables

x̂ = Dx̂̂,   (9.30)

where

D = [ 1/√0.7889 0 ; 0 1/√15.211 ].

The Hessian of the new function f(x̂̂) is, in fact, the identity matrix. Using the steepest descent method one
reaches the optimum in one single iteration. The minimum is x̂̂* = (−1.3158, −1.6142)ᵀ. Applying the
linear transformations, the minimum in the original space is

x = QDx̂̂,   (9.31)

and the solution is x* = (−1/3, −3/2)ᵀ.

9.3 Conjugate gradient method


9.3.1 Definition
It is said that two vectors, dᵢ and dⱼ, are conjugate with respect to A if

dᵢᵀAdⱼ = 0, i ≠ j,   (9.32)

where A is symmetric and positive definite. The conjugate gradient method was presented by Fletcher &
Reeves (1964) as a quadratically convergent gradient method for locating an unconstrained local minimum
of a function of several variables. With this method it is possible to locate the minimum of a quadratic
function of n variables in n iterations. In this method the search direction d_{k+1} has the form

d_{k+1} = −∇f_{k+1} + β_k d_k,   (9.33)

where β_k d_k represents the deflection of the search direction with respect to the steepest descent direction. If
two consecutive search directions are conjugate with respect to A, then

d_{k+1}ᵀAd_k = 0.   (9.34)

Consider an initial design x₀ and a set of conjugate directions d₀, d₁, . . . , d_{n−1}. Minimizing f(x) along
d₀ leads to x₁. Then, from the point x₁, f(x) is minimized along d₁ so one obtains x₂. The process
continues until one reaches x_n along d_{n−1}. Finally, the point x_n minimizes (9.15). This iterative process
can be expressed as

x_{k+1} = x_k + α_k d_k,   (9.35)

where d_k is a descent direction and

α_k = arg min_{α≥0} f(x_k + αd_k).   (9.36)
Using (9.33), the conjugacy condition (9.34) can be written as

(−∇f_{k+1} + β_k d_k)ᵀAd_k = 0,   (9.37)

this is

−∇f_{k+1}ᵀAd_k + β_k d_kᵀAd_k = 0.   (9.38)

Solving for β_k yields

β_k = (∇f_{k+1}ᵀAd_k)/(d_kᵀAd_k).   (9.39)

From (9.35) one observes that

d_k = (x_{k+1} − x_k)/α_k.

Multiplying by A one obtains

Ad_k = (Ax_{k+1} − Ax_k)/α_k.   (9.40)

Since

∇f_k = Ax_k + b

and

∇f_{k+1} = Ax_{k+1} + b,

then (9.40) can be written as

Ad_k = (∇f_{k+1} − ∇f_k)/α_k.   (9.41)

Replacing (9.41) into (9.39) yields

β_k = ∇f_{k+1}ᵀ(∇f_{k+1} − ∇f_k)/(α_k d_kᵀAd_k).   (9.42)

From (9.33) one observes that

d_k = −∇f_k + β_{k−1}d_{k−1}.   (9.43)

Multiplying (9.43) by ∇f_k,

d_kᵀ∇f_k = −∇f_kᵀ∇f_k + β_{k−1}d_{k−1}ᵀ∇f_k.   (9.44)

The condition df/dα = 0 makes d_k and ∇f_{k+1} orthogonal, this is

d_kᵀ∇f_{k+1} = 0,   (9.45)

or

d_{k−1}ᵀ∇f_k = 0.

In this way, (9.44) can be expressed as

d_kᵀ∇f_k = −∇f_kᵀ∇f_k.   (9.46)

In consequence, the step size described by (9.22) can be written in the following closed form:

α_k = (∇f_kᵀ∇f_k)/(d_kᵀAd_k).   (9.47)

Substituting (9.47) into (9.42) allows one to express the deflection factor as

β_k = ∇f_{k+1}ᵀ(∇f_{k+1} − ∇f_k)/(∇f_kᵀ∇f_k).   (9.48)

This expression is used in the Polak-Ribière algorithm (Polak & Ribière, 1969). On the other hand, if one
considers

∇f_{k+1}ᵀ∇f_k = 0,   (9.49)

then the deflection factor takes the form

β_k = (∇f_{k+1}ᵀ∇f_{k+1})/(∇f_kᵀ∇f_k).   (9.50)

This expression is used in the Fletcher-Reeves algorithm (Fletcher & Reeves, 1964).

9.3.2 Algorithm
Step 1. Estimate the initial design x₀. Select a termination parameter ε. The first search direction is defined
as the steepest descent direction,

d₀ = −∇f₀.

If ||∇f₀|| < ε, the algorithm cannot improve the current design; otherwise go to Step 5.

Step 2. Compute the gradient ∇f_k. If ||∇f_k|| < ε the algorithm cannot improve the current design; otherwise
go to the next step.

Step 3. Determine the deflection factor using the Polak-Ribière equation (9.48) or the Fletcher-Reeves
equation (9.50). In the second case, the deflection factor can be expressed as

β_k = (∇f_{k+1}ᵀ∇f_{k+1})/(∇f_kᵀ∇f_k).

Step 4. Update the search direction

d_{k+1} = −∇f_{k+1} + β_k d_k.

Step 5. Determine the step size by performing a line search. For a quadratic function, the step size is given
by

α_k = (∇f_kᵀ∇f_k)/(d_kᵀAd_k).

If the function is not quadratic, then α_k = arg min f(α).

Step 6. Update the design

x_{k+1} = x_k + α_k d_k

and go to Step 2.

If the minimum is not found after n + 1 iterations, it is recommended to restart the algorithm using the
steepest descent direction.

Example. Consider the minimization of the function

f(x) = x₁² + 2x₂² + 2x₃² + 2x₁x₂ + 2x₂x₃

from the initial design x₀ = (2, 4, 10)ᵀ. Using the conjugate gradient method, how many iterations would
it take to find the minimum? Perform that many iterations and compare your numerical solution with the
analytical solution.
Since this is a quadratic problem, we can use the Hessian to update the step size. The algorithm should
converge in three iterations. The gradient of the function is given by

∇f(x) = (2x₁ + 2x₂, 2x₁ + 4x₂ + 2x₃, 2x₂ + 4x₃)ᵀ

and the Hessian is

A = ∇²f(x) = [ 2 2 0 ; 2 4 2 ; 0 2 4 ],

which is symmetric and positive definite.

Iteration 1. The first improved point is x₁ = x₀ + α₀d₀ where

d₀ = −∇f₀ = (−12, −40, −48)ᵀ.

Since f is quadratic, the step size is

α₀ = (∇f₀ᵀ∇f₀)/(d₀ᵀAd₀) = 253/1594 = 0.1587.

Then,

x₁ = x₀ + α₀d₀ = (0.0954, −2.3488, 2.3814)ᵀ.

Iteration 2. The new gradient is

∇f₁ = (−4.5069, −4.4417, 4.8281)ᵀ.

Using the Fletcher-Reeves equation,

β₀ = (∇f₁ᵀ∇f₁)/(∇f₀ᵀ∇f₀) = 0.01565.

The new search direction is

d₁ = −∇f₁ + β₀d₀ = (4.3191, 3.8157, −5.5793)ᵀ.

The step size is

α₁ = (∇f₁ᵀ∇f₁)/(d₁ᵀAd₁) = 0.3155.

The second improved point is

x₂ = x₁ + α₁d₁ = (1.4578, −1.1452, 0.6214)ᵀ.

Iteration 3. The third (hopefully last) design is determined in the same way:

∇f₂ = (0.6253, −0.4221, 0.1954)ᵀ,

β₁ = (∇f₂ᵀ∇f₂)/(∇f₁ᵀ∇f₁) = 0.0096,

d₂ = −∇f₂ + β₁d₁ = (−0.5839, 0.4587, −0.2489)ᵀ,

α₂ = (∇f₂ᵀ∇f₂)/(d₂ᵀAd₂) = 2.4966,

and

x* = x₃ = x₂ + α₂d₂ = (0.0000, 0.0000, 0.0000)ᵀ.

The minimum value of the function is f(x*) = 0. Notice that the termination criterion is satisfied, this is

∇f₃ = (0.0000, 0.0000, 0.0000)ᵀ.

The reader can easily prove that this corresponds to the analytical solution.

The conjugate gradient method is in essence the steepest descent method with a deflected search direction. This
simple modification does not require a considerable amount of additional computational work; however, it
substantially improves the order of convergence.
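A minimal MATLAB sketch of the Fletcher-Reeves method for the quadratic example above, using the closed-form step size (9.47), is:

A = [2 2 0; 2 4 2; 0 2 4];
gradf = @(x) A*x;               % for this f, grad f(x) = A*x
x = [2; 4; 10];
g = gradf(x); d = -g;
for k = 1:3                     % n = 3 iterations suffice for a quadratic
    a = (g'*g)/(d'*A*d);        % step size (9.47)
    x = x + a*d;
    gnew = gradf(x);
    beta = (gnew'*gnew)/(g'*g); % Fletcher-Reeves deflection (9.50)
    d = -gnew + beta*d;
    g = gnew;
end
x                               % converges to (0, 0, 0)'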

9.4 Newton’s method


9.4.1 Definition
When second-order derivatives are available, the objective function can be better approximated. In this way,
better search directions can be obtained and the convergence rate can be also improved. Newton’s method
makes use of the Hessian of the objective function and has quadratic convergence. When the method is
applied to a quadratic function that is positive definite, it locates the minimum in one iteration.
Newton’s method relies on a quadratic approximation of the function about a current design xk ,
e
1
fQ (x) = f (xk ) + ∇f (xk )T (x − xk ) + (x − xk )T ∇2 f (xk )(x − xk ) (9.51)
e e e e e 2 e e e e e
and its gradient
∇fQ (x) = ∇f (xk ) + ∇2 f (xk )(x − xk ). (9.52)
e e e e e

The minimum of fQ satisfies that ∇fQ (xQ ) = 0, this is
e e
∇fQ (x∗Q ) = ∇f (xk ) + ∇2 f (xk )(x∗Q − xk ) = 0. (9.53)
e e e e e e
Assuming that ∇2 f (xk ) is not singular, and therefore invertible, one observes that
e
x∗Q = xk − (∇2 f (xk ))−1 ∇f (xk ), (9.54)
e e e e
where x∗Q is assumed to be a better approximation to the minimum of f . The updating rule of the Newton’s
e
method can be expressed as
xk+1 = xk − (∇2 f (xk ))−1 ∇f (xk ), (9.55)
e e e e
2 −1
where αk dk = −(∇ fk ) ∇fk . Notice that the descent condition (9.8) is satisfied if
e
−∇fk T (∇2 fk )−1 ∇fk < 0, (9.56)

which is accomplished if and only if ∇2 f (xk ) is positive definite.


e
Example. Consider the minimization of the function

f(x) = x₁² + 2x₂² + 2x₃² + 2x₁x₂ + 2x₂x₃

from the initial design x₀ = (2, 4, 10)ᵀ. Since this is a quadratic function, only one iteration of Newton's
method is required to find the minimum point. The gradient and the Hessian are given by

∇f(x) = (2x₁ + 2x₂, 2x₁ + 4x₂ + 2x₃, 2x₂ + 4x₃)ᵀ

and

∇²f(x) = [ 2 2 0 ; 2 4 2 ; 0 2 4 ],

which is positive definite. Starting from x₀ = (2, 4, 10)ᵀ and using (9.55), the next point is

x₁ = (2, 4, 10)ᵀ − [ 2 2 0 ; 2 4 2 ; 0 2 4 ]⁻¹ (12, 40, 48)ᵀ = (0, 0, 0)ᵀ,

which corresponds to the minimum point of f.
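The same step can be computed in MATLAB using backslash instead of an explicit inverse:

A  = [2 2 0; 2 4 2; 0 2 4];     % Hessian
x0 = [2; 4; 10];
g0 = A*x0;                      % gradient at x0
x1 = x0 - A\g0                  % Newton step (9.55); lands at (0, 0, 0)'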

Some drawbacks of Newton's method include:

1. Each iteration of Newton's method requires the Hessian of the objective function. For some functions
this may be impossible at certain points; even when it is possible, the number of function calls
increases substantially with respect to the previous first-order methods. For instance, if f : Rⁿ → R,
then the algorithm requires n(n + 1)/2 second-order derivatives.

2. The Hessian might be singular (or close to singular) at some points, which makes it impossible to
determine a search direction.

3. The method does not use information collected in previous iterations.

4. The convergence of this method is not guaranteed.

9.4.2 Modified Newton’s methods


When the function is highly nonlinear, a quadratic approximation may not be good enough. In fact, it
might happen that f(x_{k+1}) > f(x_k) and convergence cannot be achieved. A modified Newton's method
incorporates a line search to find a step size α_k over the search direction d_k. This approach is referred to as
the modified Newton's method.
Marquardt (1964) suggested a modification of Newton's method by proposing a new way to find search
directions. In this approach, Marquardt combines the advantages of Newton's method with the steepest
descent method. The proposed search direction is defined as

d_k = −(∇²f(x_k) + γI)⁻¹∇f(x_k),   (9.57)

where I is the identity matrix and γ is a weighting value. When γ is large, the direction approaches the
steepest descent direction. When γ is small, it approaches the Newton direction. This approach
is referred to as Marquardt's modification.
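A one-line MATLAB sketch of the direction (9.57), using the Hessian and gradient of the example that follows and an assumed value of γ, is:

H = [122 -20; -20 20];          % Hessian at x0 = (-1, 3)'
g = [-104; 80];                 % gradient at x0
gamma = 10;                     % assumed weighting value
d = -((H + gamma*eye(2))\g);    % Marquardt's search direction (9.57)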

9.4.3 Algorithm
The algorithm for the modified Newton’s method can be described as follows:

Step 1. Estimate an initial design x₀ and a termination parameter ε.

Step 2. Determine ∇f(x_k). If ||∇f(x_k)|| < ε, then the algorithm cannot improve the current design;
otherwise go to the next step.

Step 3. Determine the Hessian ∇²f(x_k).

Step 4. Determine the search direction using (9.57).

Step 5. Find the optimum step size α_k by minimizing f(x_k + αd_k) and update the design x_{k+1} = x_k + α_k d_k.
Then go to Step 2.

Example. Given the function

f(x) = 10x₁⁴ − 20x₁x₂ + 10x₂² + x₁² − 2x₁ + 5,

perform two iterations using the modified Newton's method from the point x₀ = (−1, 3)ᵀ.
This is not a quadratic function, therefore one cannot expect to find the minimum in one single iteration.
The gradient of the function is given by

∇f(x) = (−2 + 2x₁ + 40x₁³ − 20x₂, −20x₁ + 20x₂)ᵀ,

and the Hessian is given by

∇²f(x) = [ 2 + 120x₁² −20 ; −20 20 ].

Iteration 1. The first improved point is x₁ = x₀ + α₀d₀ where

d₀ = −(∇²f(x₀))⁻¹∇f(x₀) = (0.2353, −3.7647)ᵀ.

Minimizing f(α) = f(x₀ + αd₀) one obtains

α₀ = 1.0045.

Then the improved point is

x₁ = x₀ + α₀d₀ = (−0.7637, −0.7815)ᵀ.

Iteration 2. Following the same procedure,

d₁ = −(∇²f(x₁))⁻¹∇f(x₁) = (0.1167, 0.1346)ᵀ,

α₁ = 1.3424,

x₂ = x₁ + α₁d₁ = (−0.6070, −0.6008)ᵀ.

The exact solution is given by x* = (0.7207, 0.7207)ᵀ and f(x*) = 1.5818.

9.5 Quasi-Newton methods


Quasi-Newton methods approximate the inverse of the Hessian using first-order derivatives. For positive
definite quadratic functions of n variables, these methods find the minimum in n iterations. If the minimum
is not found after n + 1 iterations, then the algorithm restarts. In these methods the search direction is given
by

d_k = −H_k∇f(x_k),   (9.58)

where H_k is an approximation of (∇²f(x_k))⁻¹. Initially, this approximation is the identity matrix, H₀ = I.

9.5.1 Davidon-Fletcher-Powell (DFP) method
This method was proposed by Davidon (1959) and modified by Fletcher & Powell (1963). The DFP method
generates an approximation of the inverse of the Hessian given by

H_{k+1} = H_k − (H_k y_k y_kᵀ H_k)/(y_kᵀH_k y_k) + (s_k s_kᵀ)/(s_kᵀy_k),   (9.59)

where s_k = x_{k+1} − x_k represents the change in the design and y_k = ∇f(x_{k+1}) − ∇f(x_k) represents the
change in the gradient.

Example. Consider the function

f(x) = 5x₁² + 2x₁x₂ + x₂² + 7

from x₀ = (1, 2)ᵀ. Perform two iterations of the DFP method in order to improve your design.
Since H₀ = I, then

d₀ = −∇f₀ = −(14, 6)ᵀ.

The norm ||∇f₀|| = 15.232 > ε, so there is no convergence and the algorithm continues. The updated
design is given by

x₁ = x₀ + α₀d₀,

where α₀ = arg min (1184α² − 232α + 20) = 0.0980. Therefore

x₁ = (−0.3716, 1.4122)ᵀ.

In the second iteration, ||∇f₁|| = 2.264 > ε, therefore there is no convergence. The algorithm determines
s₀ = x₁ − x₀ = (−1.3716, −0.5878)ᵀ and y₀ = ∇f₁ − ∇f₀ = (−14.8916, −3.9188)ᵀ. Then,

H₁ = [ 0.148 −0.211 ; −0.211 0.950 ].

The new search direction is

d₁ = (0.5703, −2.1656)ᵀ.

The new step size is α₁ = 0.652 and the updated design is

x₂ = x₁ + α₁d₁ = (−1.6 × 10⁻⁴, 2.88 × 10⁻⁵)ᵀ.

9.5.2 Broyden-Fletcher-Goldfarb-Shanno (BFGS) method


In the BFGS method, the approximation of the inverse of the Hessian is given by

H_{k+1} = H_k − (s_k y_kᵀH_k + H_k y_k s_kᵀ)/(s_kᵀy_k) + (1 + (y_kᵀH_k y_k)/(s_kᵀy_k)) (s_k s_kᵀ)/(s_kᵀy_k),   (9.60)

where s_k = x_{k+1} − x_k represents the change in the design variables and y_k = ∇f_{k+1} − ∇f_k represents the
change in the gradient.

Example. Consider the function

f(x) = 5x₁² + 2x₁x₂ + x₂² + 7

from x₀ = (1, 2)ᵀ. Perform two iterations of the BFGS method to improve your design.
The first iteration is just like in the previous example, this is H₀ = I and

x₁ = (−0.3716, 1.4122)ᵀ.

In the second iteration, one obtains

H₁ = [ 0.1490 −0.2161 ; −0.2161 0.9711 ]

and the new search direction is

d₁ = (0.5829, −2.2135)ᵀ.

The new step size is α₁ = 0.6380. Therefore, the updated design is

x₂ = x₁ + α₁d₁ = (−1.098 × 10⁻⁴, −2.13 × 10⁻⁴)ᵀ.
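For reference, both inverse-Hessian updates can be evaluated in MATLAB as below; s and y are taken from the examples above, so the results should reproduce H₁ for each method:

H = eye(2);                                       % H0 = I
s = [-1.3716; -0.5878];                           % s0 = x1 - x0
y = [-14.8916; -3.9188];                          % y0 = grad f1 - grad f0
sy = s'*y;
H_dfp  = H - (H*(y*y')*H)/(y'*H*y) + (s*s')/sy    % DFP update (9.59)
H_bfgs = H - (s*(y'*H) + (H*y)*s')/sy ...
           + (1 + (y'*H*y)/sy)*(s*s')/sy          % BFGS update (9.60)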

9.6 Trust region methods

9.6.1 Definition
These methods are applicable to iterative schemes of the form

x_{k+1} = x_k + d_k   (9.61)

in which the step size α_k is not present. By contrast, the search direction vector d_k is restricted to a prescribed
region referred to as the trust region. These methods are also known as restricted step methods.
The key idea is to approximate the objective function f with a simpler expression f_A that reasonably
reflects its behavior in the neighborhood of a current design x_k. If f is highly nonlinear, the approximation
f_A will be valid in a trust region Γ_k around x_k. This region can be described as

Γ_k = {x : ||x − x_k|| ≤ γ_k},   (9.62)

where γ_k is the size of the trust region, which is dynamically adjusted. Usually the Euclidean norm is used.
When the L∞ norm is used this method is also known as the box-step or hypercube method (Bazaraa et al.,
2006). Using (9.62) for x = x_k + d, the sub-optimization problem for the search direction can be stated as

d_k = arg min_d f_A(d)   s.t. ||d|| ≤ γ_k.   (9.63)

Using a linear approximation of the objective function,

f_A(d) = f(x_k) + ∇f(x_k)ᵀd,

and the Euclidean norm, the solution to (9.63) can be written in closed form as

d_k = −γ_k ∇f(x_k)/||∇f(x_k)||.   (9.64)

A more interesting trust-region algorithm is obtained by choosing a quadratic approximation

f_A(d) = f(x_k) + ∇f(x_k)ᵀd + ½dᵀ∇²f(x_k)d.

Because of the trust-region restriction ||d|| ≤ γ_k, there is no need for ∇²f(x_k) to be positive definite, since
(9.63) is guaranteed to have a solution d_k. In this case the method is called the trust-region Newton method. If
an approximation of the Hessian is used, the method is called the trust-region quasi-Newton method (Nocedal
& Wright, 1999).
9.6.2 Reliability index

The reliability index is defined as

r_k = Δf(x_{k+1})/Δf_A(x_{k+1}),   (9.65)

which can also be expressed as

r_k = (f(x_k) − f(x_k + d_k))/(f(x_k) − f_A(x_k + d_k)),   (9.66)

where f is the objective function to be minimized and f_A is its approximation. If the numerator and the
denominator are greater than zero and r_k ≈ 1, then it is said that there is good agreement between the
function and its approximation. On the other hand, if the numerator is negative, then r_k is also negative,
which is not desired. In that case, the new point x_{k+1} = x_k + d_k does not decrease the function, so the size
of the trust region γ_k has to be adjusted according to the value of r_k.
9.6.3 Algorithm
Given x_k and an initial trial value γ_k, a trust-region algorithm determines x_{k+1} and γ_{k+1} by using two
threshold values σ₁ and σ₂ with 0 ≤ σ₁ ≤ σ₂ ≤ 1 and two factors β₁ and β₂ with 0 < β₁ < 1 < β₂.
Typical values are σ₁ = 0.2, σ₂ = 0.8, β₁ = 0.25, β₂ = 2.

Step 1. Given x_k and γ_k, obtain d_k by solving the sub-optimization problem (9.63). For a linear approximation
of the objective function, the solution is given by (9.64). If f_A(d_k) = f(x_k), stop (x_k is a local
minimum); else go to Step 2.

Step 2. If f(x_k + d_k) < f(x_k), set

x_{k+1} = x_k + d_k,

calculate the reliability index r_k from (9.66) and go to Step 3; else set γ_k = β₁||d_k|| and go to Step
1.

Step 3. Set

γ_{k+1} = β₁||d_k||   if r_k < σ₁,
γ_{k+1} = β₂γ_k      if σ₂ ≤ r_k and ||d_k|| = γ_k,   (9.67)
γ_{k+1} = γ_k        otherwise.

Go to the next iteration.
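A minimal MATLAB sketch of this algorithm with the linear model (9.64), applied to the example that follows, is:

f = @(x) (x(1)-1)^2 + (x(2)-2)^4;
gradf = @(x) [2*(x(1)-1); 4*(x(2)-2)^3];
x = [0; 0]; gam = 1;
s1 = 0.2; s2 = 0.8; b1 = 0.25; b2 = 2;        % typical threshold values
for k = 1:50
    g = gradf(x);
    if norm(g) <= 1e-4, break, end
    d = -gam*g/norm(g);                       % linear-model solution (9.64)
    if f(x + d) < f(x)
        r = (f(x) - f(x + d))/(-g'*d);        % reliability index (9.66)
        x = x + d;
        if r < s1,      gam = b1*norm(d);     % shrink the trust region
        elseif r >= s2, gam = b2*gam;         % expand (here ||d|| = gam)
        end
    else
        gam = b1*norm(d);                     % reject the step and shrink
    end
end
x                                             % approaches (1, 2)'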

Example. Consider the function

f(x) = (x₁ − 1)² + (x₂ − 2)⁴

and the initial point x₀ = (0, 0)ᵀ. Solve using a linear approximation and the trust region method with an
initial size γ₀ = 1.
The linear approximation of f about x_k can be written as

f_L(x) = f(x_k) + ∇f(x_k)ᵀd,

where

∇f(x) = (2(x₁ − 1), 4(x₂ − 2)³)ᵀ.

The initial value of the function is f(x₀) = 17.

Iteration 1. For k = 0, at the initial point x₀, the gradient is ∇f(x₀) = (−2, −32)ᵀ and the direction is
d₀ = (0.0624, 0.9981)ᵀ. Since f(x₀ + d₀) = 14.9745, which represents an improvement in the
function value, then x₁ = x₀ + d₀ = (0.0624, 0.9981)ᵀ. The reliability index is

r₀ = (f(x₀) − f(x₁))/(f(x₀) − f_L(x₁)) = 0.0631.

Since r₀ < σ₁, then γ₁ = β₁||d₀|| = 0.25.

Iteration 2. For k = 1, at the point x₁, the gradient is ∇f(x₁) = (−1.8752, −4.0234)ᵀ and the direction
is d₁ = (0.1056, 0.2266)ᵀ. Since f(x₁ + d₁) = 1.3377, which represents an improvement in
the function value, then x₂ = (0.1680, 1.2247)ᵀ. The reliability index is r₁ = 0.9605. Since
r₁ ≥ 0.8 and ||d₁|| = 0.25 = γ₁, then γ₂ = 2γ₁ = 0.5.

Iteration 3. For k = 2, at the point x₂, the gradient is ∇f(x₂) = (−1.6640, −1.8645)ᵀ and the direction
is d₂ = (0.3329, 0.3730)ᵀ. Since f(x₂ + d₂) = 0.2874, which represents an improvement in
the function value, then x₃ = (0.5009, 1.5977)ᵀ. The reliability index is r₂ = 0.6849, therefore
γ₃ = γ₂ = 0.5.

Iteration 4. The process continues until ||∇f(x_k)|| ≤ ε for some small value. The optimum point is x* =
(1, 2)ᵀ with f(x*) = 0.

9.7 Least squares problems

Consider the system of linear equations

Ax = b,   (9.68)

where the matrix A ∈ R^{m×n} maps the vector x ∈ Rⁿ into b ∈ Rᵐ. When the system is inconsistent, this
is rank(A) < rank(A, b), it does not have a solution. However, one can attempt to provide an approximate
point that minimizes ||Ax − b||. This is,

min_x f(x) = (Ax − b)ᵀ(Ax − b).   (9.69)

The necessary condition for optimality can be expressed as ∇f(x*) = 0, where

∇f(x) = AᵀAx − Aᵀb + AᵀAx − Aᵀb = 2AᵀAx − 2Aᵀb.

At the optimal point,

AᵀAx* = Aᵀb.   (9.70)

Solving for x* yields

x* = (AᵀA)⁻¹Aᵀb.   (9.71)

This closed-form solution is referred to as the least squares solution. Notice that this is a convex problem,
therefore x* is a global minimum.

Example. Solve for x in the following system of linear equations,

[ 1 2 ; 3 4 ; −1 1 ] (x₁, x₂)ᵀ = (1, 1, 1)ᵀ.

Since rank(A) = 2 and rank(A, b) = 3, the system is inconsistent and does not have a solution.
However, the solution that minimizes the quadratic error is obtained using (9.71), this is

(x₁*, x₂*)ᵀ = ( [ 1 3 −1 ; 2 4 1 ] [ 1 2 ; 3 4 ; −1 1 ] )⁻¹ [ 1 3 −1 ; 2 4 1 ] (1, 1, 1)ᵀ   (9.72)
           = (−0.4516, 0.6129)ᵀ.   (9.73)

One of the most important applications of this method is the identification of the coefficients of a function
that best matches a cloud of points.
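In MATLAB, the least squares solution of the example above can be obtained as:

A = [1 2; 3 4; -1 1];
b = [1; 1; 1];
x = (A'*A)\(A'*b)     % normal equations (9.70): x = (-0.4516, 0.6129)'
x = A\b;              % backslash returns the same least squares solution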

Example. An experiment yields the data in the following table.

t x
0.00 1.001
0.10 1.089
0.23 1.240
0.70 1.604
0.90 1.738
1.50 2.020
2.65 1.412
3.00 1.241

Find the least square best fit coefficients a, b, and c if the assumed functional form is

(a) x = a + bt + ct²,

(b) x = a + b sin(t) + c sin(2t).

Which best fit estimate has the smallest least square error?

(a) The system of linear equations can be expressed as

[ 1.000 0.000 0.000 ;
  1.000 0.100 0.010 ;
  1.000 0.230 0.053 ;
  1.000 0.700 0.490 ;
  1.000 0.900 0.810 ;
  1.000 1.500 2.250 ;
  1.000 2.650 7.023 ;
  1.000 3.000 9.000 ] (a, b, c)ᵀ = (1.001, 1.089, 1.240, 1.604, 1.738, 2.020, 1.412, 1.241)ᵀ.

Using (9.71), the solution can be expressed as

(a, b, c)ᵀ = (0.982, 1.205, −0.379)ᵀ.

The function is x(t) = 0.982 + 1.205t − 0.379t². The quadratic error is 0.0224.

(b) In this case, the system can be expressed as

[ 1.000 0.000 0.000 ;
  1.000 0.100 0.199 ;
  1.000 0.228 0.444 ;
  1.000 0.644 0.985 ;
  1.000 0.783 0.974 ;
  1.000 0.997 0.141 ;
  1.000 0.472 −0.832 ;
  1.000 0.141 −0.279 ] (a, b, c)ᵀ = (1.001, 1.089, 1.240, 1.604, 1.738, 2.020, 1.412, 1.241)ᵀ.

Using (9.71), the solution can be expressed as

(a, b, c)ᵀ = (1.017, 0.960, −0.011)ᵀ.

The function is x(t) = 1.017 + 0.960 sin(t) − 0.011 sin(2t). The quadratic error is 0.0157 and, therefore,
it is a better approximation.

In some occasions, the data are weighted according to their reliability. In this case, one makes use of a
weighting matrix W, such that

WAx = Wb.   (9.74)

In this case, the optimum solution is given by

(WA)ᵀWAx* = (WA)ᵀWb   (9.75)

x* = [(WA)ᵀWA]⁻¹(WA)ᵀWb.   (9.76)

This method is referred to as weighted least squares.
The application of least squares is not limited to linear systems. In fact, one can minimize the sum
of squared deviations between a set of given values and predicted values. In general, the least squares
problem is given by

min_x f(x) = Σᵢ (bᵢ − pᵢ(x))²,   (9.77)

where bᵢ represent the given data and pᵢ are the predicted data. To illustrate this concept, let us consider the
following example.

Example. An experiment yields the data in the following table.

f x1 x2
2.2 5.0 10.0
9.5 3.0 1.0
23.6 0.6 0.6
74.3 0.1 2.0
6.3 3.0 1.8

Find the least square best fit coefficients a, b, and c if the assumed functional form is

(a) f(x) = ax₁ + bx₂ + c,

(b) f(x) = a·x₁ᵇ·x₂ᶜ.
(a) In this case we have a system of linear equations given by

[ 5.0 10.0 1.0 ;
  3.0  1.0 1.0 ;
  0.6  0.6 1.0 ;
  0.1  2.0 1.0 ;
  3.0  1.8 1.0 ] (a, b, c)ᵀ = (2.2, 9.5, 23.6, 74.3, 6.3)ᵀ.

Using (9.71), the solution can be expressed as

(a, b, c)ᵀ = (−18.17, 4.34, 52.35)ᵀ.

The least squares error is 707.2.

(b) In this case, the function is nonlinear and (9.71) cannot be used; the general approach in
(9.77) is utilized. The optimization problem is given by

min_{a,b,c} (2.2 − a·5.0ᵇ·10.0ᶜ)² + · · · + (6.3 − a·3.0ᵇ·1.8ᶜ)².

The solution of this nonlinear, unconstrained problem is given by

(a, b, c)ᵀ = (15.72, −0.72, −0.14)ᵀ.

The least squares error is 8.08, which indicates that it is a better fit than the previous functional form.
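A sketch of this nonlinear fit using fminsearch (a reasonable starting guess is assumed; see Section 9.8 for the underlying method) is:

fdat = [2.2; 9.5; 23.6; 74.3; 6.3];
x1 = [5.0; 3.0; 0.6; 0.1; 3.0];
x2 = [10.0; 1.0; 0.6; 2.0; 1.8];
sse = @(p) sum((fdat - p(1)*x1.^p(2).*x2.^p(3)).^2);  % least squares error (9.77)
p = fminsearch(sse, [1; 0; 0])   % approaches (15.72, -0.72, -0.14)'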

9.8 Nelder-Mead simplex method∗


9.8.1 Description
Presented by Nelder & Mead (1965), this method is one of the most widely used for unconstrained optimization.
The Nelder-Mead method attempts to minimize a scalar function f(x) for x ∈ Rⁿ using only function
values, without any derivative information. This algorithm, as explained by Lagarias et al. (1998), makes
use of four scalar parameters: coefficients of reflection (ρ), expansion (χ), contraction (γ), and shrinkage (σ).
According to the original Nelder-Mead paper, these parameters should satisfy

ρ > 0,  χ > 1,  χ > ρ,  0 < γ < 1,  and  0 < σ < 1.   (9.78)

The nearly universal choices used in the standard Nelder-Mead algorithm are

ρ = 1,  χ = 2,  γ = 1/2,  and  σ = 1/2.   (9.79)

This method starts with a simplex of n + 1 vertices, each of which is a point in Rⁿ. These vertices are
labeled x^(1), x^(2), . . . , x^(n+1), such that

f(x^(1)) ≤ f(x^(2)) ≤ · · · ≤ f(x^(n+1)).   (9.80)

Since f is to be minimized, x^(1) is the best point and x^(n+1) is the worst point. At each step in the iteration,
the current worst point x^(n+1) is discarded, and another point is accepted into the simplex. This process
continues until termination conditions are satisfied.

9.8.2 Algorithm
Given an initial point x₀, the algorithm forms a simplex x₀^(1), . . . , x₀^(n+1) by adding 5% of each component
x₀ᵢ to x₀, and using these n vectors as elements of the simplex in addition to x₀. If x₀ᵢ = 0, then it uses
0.00025 as component i. Then the algorithm modifies the simplex repeatedly according to the following
steps.

Step 1. Order. Order the n + 1 vertices of the simplex using (9.80).

Step 2. Reflect. Generate the reflected point

x^(r) = x̄ + ρ(x̄ − x^(n+1)),   (9.81)

where x̄ = Σᵢ₌₁ⁿ x^(i)/n defines the centroid of the n best points. Then, evaluate f(x^(r)). If
f(x^(1)) ≤ f(x^(r)) < f(x^(n)), then accept the reflected point x^(r) and terminate the iteration.

Step 3. Expand. If f(x^(r)) < f(x^(1)), then calculate the expansion point

x^(e) = x̄ + χ(x^(r) − x̄),   (9.82)

and evaluate f(x^(e)). If f(x^(e)) < f(x^(r)), then accept x^(e) and terminate the iteration; otherwise
accept x^(r) and terminate the iteration.

Step 4. Contract. If f(x^(r)) ≥ f(x^(n)), perform a contraction between x̄ and the better of x^(n+1) and x^(r).

(a) Outside contraction. If f(x^(n)) ≤ f(x^(r)) < f(x^(n+1)), i.e., x^(r) is strictly better than x^(n+1),
then perform an outside contraction: calculate

x^(c) = x̄ + γ(x^(r) − x̄),   (9.83)

and evaluate f(x^(c)). If f(x^(c)) ≤ f(x^(r)), then accept x^(c) and terminate the iteration; otherwise
go to Step 5 (Shrink).

(b) Inside contraction. If f(x^(r)) ≥ f(x^(n+1)), then perform an inside contraction: calculate

x^(cc) = x̄ − γ(x̄ − x^(n+1)),   (9.84)

and evaluate f(x^(cc)). If f(x^(cc)) < f(x^(n+1)), then accept x^(cc) and terminate the iteration;
otherwise go to Step 5 (Shrink).

Step 5. Shrink. Calculate the n points

v^(i) = x^(1) + σ(x^(i) − x^(1))

and calculate f(v^(i)), for i = 2, . . . , n + 1. The vertices of the simplex at the next iteration are
x^(1), v^(2), . . . , v^(n+1).
This procedure has been implemented in M ATLAB’s optimization toolbox under the function fminsearch.
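For instance, the function of the modified Newton example in Section 9.4 can be minimized as:

f = @(x) 10*x(1)^4 - 20*x(1)*x(2) + 10*x(2)^2 + x(1)^2 - 2*x(1) + 5;
[xopt, fopt] = fminsearch(f, [-1; 3])   % xopt near (0.7207, 0.7207)'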

Exercises
1. Determine if the given direction at the point is that of descent for the following functions (show all
the calculations). In each case, determine the steepest descent direction.

• f(x) = 10(x₂ − x₁²)² + (1 − x₁)²; d = (162, −40)ᵀ at x = (2, 2)ᵀ.
• f(x) = (x₁ − 1)² + (x₂ − 2)² + (x₃ − 3)² + (x₄ − 4)²; d = (2, −2, 2, −2)ᵀ at x = (2, 1, 4, 3)ᵀ.
• f(x) = Σᵢ₌₁⁴ [(1 − xᵢ)² + 100(x_{i+1} − xᵢ²)²]; d = (−1, 1, 2, 1, −1)ᵀ at x = (−1, 1, 2, 1, −1)ᵀ.
• f(x) = x₁ sin(x₂x₃); d = (1, −1, −1)ᵀ at x = (1, 2, 1)ᵀ.
• f(x) = log_{x₁}(x₂); d = (−1, 2)ᵀ at x = (10, 10)ᵀ.
e e e
2. Consider the following functions, search directions, and current points:

• f(x) = 4x₁² + x₂² − 4x₁x₂; d = (−1, 2)ᵀ at x = (−1, 0)ᵀ.
• f(x) = x₁² + x₂² + x₃²; d = (−2, −4, 2)ᵀ at x = (1, 2, −1)ᵀ.
• f(x) = (x₁ − 1)² + (x₂ − 2)² + (x₃ − 3)² + (x₄ − 4)²; d = (−2, 2, −2, 2)ᵀ at x = (2, 1, 4, 3)ᵀ.

In each case, (a) derive the function of one variable (line search function) and (b) determine the optimum
step size (show all calculations).

3. Consider the function f(x) = x₁x₂² at the point x₀ = (1, 2)ᵀ. (a) Obtain the expression for the
quadratic approximation f_Q and make a plot of f and f_Q. (b) Analytically, obtain the minimum of
f_Q. This is equivalent to one iteration of Newton's method. (c) Improve this minimum by performing
a line search. This is equivalent to one iteration of a modified Newton's method.

4. For the function f(x) = x₁x₂, determine the expression for f(α) along the line x₁ = x₂ and along
the line joining (0, 1)ᵀ to (1, 0)ᵀ.

5. Perform two iterations (by hand) using the steepest descent, Fletcher-Reeves, Newton's, and BFGS
methods for the following functions:

• f(x) = (x₁ + x₂ − 6)² + (2x₁ + x₂ − 5)² at x₀ = (0, 0)ᵀ.
• f(x) = (x₁ − 1)² + 2x₂² + 2x₃² + 2x₁x₂ + 2x₂x₃ at x₀ = (0, 0, 0)ᵀ.

6. A cardboard box is to be designed to have a volume of 1 m³. Determine the optimal values of length,
width, and height to minimize the amount of cardboard material. Hint: The problem can be formulated
as unconstrained in terms of two design variables.

7. Consider the function

f(x) = cos(x₁² − 3x₂) + sin(x₁² + x₂²)

and the initial point x₀ = (0, 0)ᵀ. Perform three iterations using the trust region method and a linear
approximation of the function. Use h₀ = 1 as your initial trust region size and ε = 0.0001 as
your termination criterion for the norm of the gradient. The gradient can be evaluated analytically or
numerically. In a contour plot show the initial point and the points obtained after every iteration.

8. An experiment yields the data in the following table. Find the least square best fit coefficients a, b,

x y
0 0
1 8
2 12
3 15
4 16
5 16
6 15
7 10
8 0
9 −10

and c if the assumed functional form is

(a) y = a + b exp(−x) + c exp(x),


(b) y = ax sin(b + cx).

Which best fit estimate has the smallest least square error? Plot your results and the data provided.
Compare your results with the ones obtained using the function lsqnonlin from the optimization
toolbox in M ATLAB.

9. Develop a M ATLAB function to solve unconstrained optimization problems using:

• Steepest descent method.


• Fletcher-Reeves.
• Polak-Ribiere.
• Modified Newton’s method (line search).
• BFGS
• DFP
• Trust region method with linear approximation.

Makes sure that your algorithm works for two and three dimensional problems.

Chapter 10

Numerical Analysis

The theory concerning convergence in multivariate optimization follows the principles established in Chapter
7. The main change is to convert absolute values of scalar quantities into norms of vectors. Since all
norms on Rⁿ are equivalent, the conditions that guarantee convergence of a sequence {x_k} in one norm
suffice to guarantee convergence in any norm.

10.1 Convergence
A sequence of vectors
{xk }∞
k=0 = {x0 , x1 , . . . , x∞ }
e e e e

is said to converge to the limit x if and only if the following criterion is satisfied: Given any ε > 0, there is
a (natural) number N such that for any number k > N , ||xk − x∗ || < ε.
e
e e

10.2 Fixed Point Iteration

A sequence of vectors can be obtained by the successive application of an iterative method such as

   x_{k+1} = g(xk).   (10.1)

If this method is convergent, the sequence has a fixed point x∞ = x∗, where

   x∗ = g(x∗).   (10.2)

In this case (10.1) is a fixed point iteration.
Example. Let g(x) = x − ∇f(x) and f(x) = 0.5(x1 − 2)^2 + 0.5(x2 − 3)^2. Find the fixed points of g.

   ∇f(x) = (x1 − 2, x2 − 3)^T.

The fixed points of g are the roots of g(x) − x = 0. This is

   (x1, x2)^T − (x1 − 2, x2 − 3)^T − (x1, x2)^T = (0, 0)^T,

or x∗ = (2, 3)^T.
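This result can be checked numerically. The following is a minimal MATLAB sketch (added here for illustration, not part of the original derivation) of the successive substitution (10.1) applied to this g:

% Fixed point iteration x_{k+1} = g(x_k) with g(x) = x - grad f(x).
% Here grad f(x) = (x1-2, x2-3)', so g(x) = (2,3)' for every x and the
% iteration reaches the fixed point x* = (2,3)' in a single step.
gradf = @(x) [x(1)-2; x(2)-3];
g = @(x) x - gradf(x);
x = [0; 0];                % arbitrary starting point
for k = 1:5
    x = g(x);              % successive substitution (10.1)
end
disp(x')                   % displays 2 3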

10.3 Contraction mapping theorem

From Allen & Isaacson (1998), let Ω ⊂ Rn. A function g: Ω → Rn satisfies a Lipschitz condition on Ω (with respect to the norm || · ||) if there is a constant L > 0 such that, for any two points x1, x2 ∈ Ω,

   ||g(x1) − g(x2)|| ≤ L ||x1 − x2||.   (10.3)

The greatest lower bound for L is called the Lipschitz constant for g on Ω. If g has a Lipschitz constant L < 1, then g is a contraction on Ω. Any function that satisfies a Lipschitz condition is continuous.
   Let Ω ⊂ Rn be open and convex, and let g ∈ C1 on Ω containing x1 and x2. The Mean Value Theorem states that there is a point x on the line segment joining x1 and x2, i.e., x = αx1 + (1 − α)x2 for some value α ∈ [0, 1], such that

   ||g(x1) − g(x2)|| = ||Dg(x)(x1 − x2)||
                     ≤ ||Dg(x)|| ||x1 − x2||,   (10.4)

where Dg(x) is the Jacobian of g(x). Comparing with (10.3), one observes that if

   ||Dg(x)|| ≤ L < 1

for all x ∈ Ω, then g is a contraction on Ω. Furthermore, g(x) has a unique fixed point x∗. For each ε > 0 there is a vector norm || · || such that

   ρ(Dg(x)) ≤ ||Dg(x)|| < ρ(Dg(x)) + ε,   (10.5)

where ρ(Dg(x)) is the spectral radius of Dg(x), defined as

   ρ(Dg(x)) = max_{λ∈σ(Dg)} |λ|,

and σ(Dg) = {λ1, . . . , λs} is the spectrum of Dg(x) containing all the eigenvalues of Dg(x). Hence, if

   ρ(Dg(x)) < 1   (10.6)

for all x ∈ Ω, then g is a contraction on Ω. If g is a contraction for all x in some neighborhood of the fixed point x∗, then x∗ is said to be an attractive fixed point and the iteration exhibits local convergence. On the other hand, if ρ(Dg(x)) > 1 for all x in some neighborhood of the fixed point x∗, then x∗ is said to be a repelling fixed point and the iteration exhibits local divergence. Notice that for the Euclidean norm and Dg(x) symmetric,

   ||Dg(x)|| = ρ(Dg(x)).
Example. Let g(x) = x − α∇f(x) and f(x) = (x1 − 2)^2 − x1x2 + x2^2. Determine the condition on α under which the fixed point(s) of g are attractive.

   ∇f(x) = (2x1 − x2 − 4, −x1 + 2x2)^T.

The fixed points of g are the roots of g(x) − x = 0. This is x∗ = (8/3, 4/3)^T. The Jacobian of g is

   Dg(x) = [1 − 2α     α   ]
           [  α      1 − 2α].

The eigenvalues of the Jacobian are λ1 = 1 − 3α and λ2 = 1 − α. Solving max{|λ1|, |λ2|} < 1 yields 0 < α < 2/3.
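As a quick numerical check (a sketch added here, not part of the original example), one can run the iteration for several values of α and observe convergence inside (0, 2/3) and divergence outside:

gradf = @(x) [2*x(1) - x(2) - 4; -x(1) + 2*x(2)];
for alpha = [0.1 0.5 0.7]
    x = [0; 0];
    for k = 1:200
        x = x - alpha*gradf(x);    % g(x) = x - alpha*grad f(x)
    end
    fprintf('alpha = %.1f: x = (%g, %g)\n', alpha, x(1), x(2));
end
% alpha = 0.1 and 0.5 approach (8/3, 4/3); alpha = 0.7 blows up,
% consistent with the condition 0 < alpha < 2/3.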

10.4 Error analysis and order of convergence

Let p > 1. An iterative scheme that produces a sequence {xk} of approximations to a fixed point x∗ ∈ Rn converges with order p if there exists a constant C and an integer N ≥ 0 such that

   ||x∗ − x_{k+1}|| ≤ C ||x∗ − xk||^p   (10.7)

whenever k ≥ N. If p = 1, then 0 ≤ C < 1, and it is said that the iterative scheme converges linearly. If p = 2, the scheme converges quadratically. If

   lim_{k→∞} ||x∗ − x_{k+1}|| / ||x∗ − xk||^p = C,

then C is referred to as the asymptotic error constant.
   If there is a sequence {Ck} such that

   lim_{k→∞} Ck = 0

and

   ||x∗ − x_{k+1}|| ≤ Ck ||x∗ − xk||,

then it is said that the sequence {xk} converges superlinearly to the fixed point x∗.
   If the function g is a contraction on some interval, then the convergence is at least linear. Under some circumstances one can construct iteration functions g for which successive substitution converges with order p ≥ 2. In a trivial case, if for all p > 0,

   lim_{k→∞} ||x_{k+1} − x∗|| / ||xk − x∗||^p = 0,

then we say that the order of convergence is ∞. This is the case in which all the values of the sequence are the same.
   The order of convergence of an algorithm can be checked using a convergence plot. If the fixed point x∗ is known, one can compute the sequence of errors εk = ||x∗ − xk||. From (10.7) one observes that

   log ε_{k+1} ≤ p log εk + log C.   (10.8)

Then the order of convergence can be obtained from the slope of the curve log ε_{k+1} versus log εk.

Example. Consider the following sequence

   {ε} = {0.5, 0.25, 0.0625, 0.00391, 0.0000152},

which converges to 0. The order of convergence of this sequence can be determined from (10.8). Let us determine the natural logarithm of the above sequence:

   {log ε} = {−0.693147, −1.38629, −2.77259, −5.54422, −11.0942}.

Defining the following two series:

   {log εk} = {−0.693147, −1.38629, −2.77259, −5.54422}

and

   {log ε_{k+1}} = {−1.38629, −2.77259, −5.54422, −11.0942},

one determines the coefficients p and log C by a linear curve fit. In matrix form, this problem can be represented by

   [ −1.38629]   [−0.693147  1]
   [ −2.77259] = [−1.38629   1] [  p  ]
   [ −5.54422]   [−2.77259   1] [log C].
   [ −11.0942]   [−5.54422   1]

Using (9.71), the least-squares solution can be expressed as

   (p, log C)^T = (2.0012, 0.0019)^T.

Consequently, the order of convergence is quadratic as p ≈ 2, and the asymptotic error constant is C ≈ 1.
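The linear curve fit above can be reproduced with a few lines of MATLAB (a minimal sketch using the backslash operator for the least-squares solution):

eps_seq = [0.5 0.25 0.0625 0.00391 0.0000152];
x = log(eps_seq(1:end-1))';      % log eps_k
y = log(eps_seq(2:end))';        % log eps_{k+1}
A = [x ones(size(x))];
sol = A \ y;                     % least-squares fit of (10.8)
fprintf('p = %.4f, C = %.4f\n', sol(1), exp(sol(2)));
% prints p close to 2 and C close to 1, i.e., quadratic convergence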

Exercises

1. Consider the function

      f(x) = 3(x1^2 + x2^2) + 4x1x2 + 5x1 + 6x2 + 7.

   Suppose we use a fixed step size gradient-based algorithm to find the minimum point of f:

      x_{k+1} = xk − α∇f(xk).

   Find the largest range of values of α for which the algorithm is globally convergent.

2. Consider the function

      f(x) = (3/2)(x1^2 + x2^2) + (1 + a)x1x2 − (x1 + x2) + b,

   where a and b are some unknown real-valued parameters.

   (a) Find the largest set of values of a and b such that the unique global minimum point of f exists. Express the minimum point in terms of a and b.

   (b) Consider the following algorithm:

          x_{k+1} = xk − (2/5)∇f(xk).

       Find the largest set of values of a and b for which the above algorithm converges to the global minimum point of f for any initial point x0.
Part IV

Constrained multivariate optimization

Chapter 11

Analytical elements

11.1 Problem formulation

Most optimization problems in engineering are defined with functional and geometric constraints. A constrained optimization problem may be formulated as

   min_x  f(x)
   s.t.   gi(x) ≤ 0,  i = 1, . . . , r   (11.1)
          hj(x) = 0,  j = 1, . . . , m,

where x ∈ Rn are the design variables, f: Rn → R is the objective function, gi: Rn → R are the inequality constraints, hj: Rn → R are the equality constraints, and m ≤ n. In vector notation, the aforementioned problem can be represented in the following standard form:

   min_x  f(x)
   s.t.   g(x) ≤ 0   (11.2)
          h(x) = 0,

where g: Rn → Rr and h: Rn → Rm. In some cases it is required to modify the formulation of the problem so it can be expressed as in (11.1). Let us consider some typical cases (Belegundu & Chandrupatla, 1999):

(a) max f(x) is equivalent to min −f(x).

(b) g(x) ≥ 0 is equivalent to −g(x) ≤ 0.

(c) A "minimax" problem,

      min_x max{f1(x), . . . , fs(x)},

    can be written as

      min_{x,α}  α
      s.t.       fi(x) ≤ α,  i = 1, . . . , s.

(d) Functions that depend on a parameter, for example 0 ≤ θ ≤ 1, can be handled by discretizing the parameter as θi, where i = 1, . . . , d and d is the number of discrete points. For example, the inequality constraint

      max_{0≤θ≤1} {g(x, θ)} ≤ 0

    can be discretized as

      max_{1≤i≤d} {g(x, θi)} ≤ 0,

    and, finally, expressed as

      g(x, θi) ≤ 0,  i = 1, . . . , d.

(e) An objective function expressed in terms of an absolute value,

      min_x |f(x)|,

    can be written as

      min_{x,p,n}  p + n
      s.t.         f(x) = p − n
                   p ≥ 0
                   n ≥ 0.

(f) In regression analysis, one can minimize the error between given and predicted data. This error can be defined with various norms. Consider

      min_x  f(x) = Σ_{i=1}^{d} |ei(x)|,

    where ei(x) represents the error between the model and the data. This problem can be written as

      min_{x,p,n}  Σ_{i=1}^{d} (pi + ni)
      s.t.         ei(x) = pi − ni
                   pi ≥ 0
                   ni ≥ 0.

(g) A minimax error in the objective function,

      min_x max_i {|fi(x)|},

    can be written as

      min_x  f = α
      s.t.   fi(x) ≤ α
             fi(x) ≥ −α.
Example. Express the following nonlinear unconstrained optimization problem as an LP problem.

   min_x  f(x) = |x1 − 4| + |x2 − 2|

Using the preceding equivalences, this problem can be written as

   min_{x,p,n}  p1 + n1 + p2 + n2
   s.t.         x1 − 4 − p1 + n1 = 0
                x2 − 2 − p2 + n2 = 0
                pi ≥ 0
                ni ≥ 0,  i = 1, 2.
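As a sketch of how this LP can be solved numerically, the equivalent problem can be passed to linprog with the variable vector z = (x1, x2, p1, n1, p2, n2)^T; note that x1 and x2 must be left unbounded below:

f   = [0 0 1 1 1 1]';             % cost: p1 + n1 + p2 + n2
Aeq = [1 0 -1 1  0 0;             % x1 - 4 - p1 + n1 = 0
       0 1  0 0 -1 1];            % x2 - 2 - p2 + n2 = 0
beq = [4; 2];
lb  = [-Inf -Inf 0 0 0 0]';       % only the p's and n's are nonnegative
z   = linprog(f, [], [], Aeq, beq, lb, []);
% z(1:2) should return (4, 2) with objective value 0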

11.2 First order necessary conditions

In constrained optimization, the necessary conditions for optimality can be obtained using the method of Lagrange multipliers. Even though this method can be used for any type of constrained problem, its application is limited to regular points in the feasible space.
   A point xk that satisfies all the constraints (a feasible point) is a regular point if the gradients of all active constraints are linearly independent. This condition is referred to as constraint qualification (Reklaitis et al., 1983). A regular point is also called a Lagrange regular point. An active constraint at xk is one whose value is zero. Notice that all equality constraints are active. Now, let us analyze the cases of equality and inequality constraints separately.
11.2.1 Equality constraints

Consider the problem of one single equality constraint,

   min_x  f(x)
   s.t.   h(x) = 0.   (11.3)

Sometimes this problem can be converted to an unconstrained formulation by eliminating one of the design variables using the equality constraint and substituting it into the objective function. For example,

   xn = xn(x1, . . . , xi, . . . , x_{n−1}).   (11.4)

In some cases this is the right way to solve the problem, but in other cases eliminating a design variable might be cumbersome or simply impossible. Joseph-Louis Lagrange (1736–1813) proposed a different method that avoids this elimination and maintains all the physical meaning of the design variables.
   Using the chain rule, a stationary point (minimum, maximum, or saddle point) requires that

   df/dxi = (∂f/∂x1)(dx1/dxi) + · · · + ∂f/∂xi + · · · + (∂f/∂xn)(dxn/dxi) = 0   (11.5)

for all xi and i = 1, . . . , n. In this way,

   df = (∂f/∂x1)dx1 + · · · + (∂f/∂xi)dxi + · · · + (∂f/∂xn)dxn = 0.   (11.6)

Using the chain rule on h(x) = 0,

   dh = (∂h/∂x1)dx1 + · · · + (∂h/∂xi)dxi + · · · + (∂h/∂xn)dxn = 0.   (11.7)

Lagrange suggested multiplying (11.7) by a factor λ and then adding it to (11.6). That results in

   Σ_{i=1}^{n} (∂f/∂xi + λ ∂h/∂xi) dxi = 0.   (11.8)

Assuming that xn will be eliminated according to (11.4), let us choose λ such that the term in dxn vanishes, yielding

   ∂f/∂xn + λ ∂h/∂xn = 0.   (11.9)

Using this factor λ, (11.8) can be reduced to

   Σ_{i=1}^{n−1} (∂f/∂xi + λ ∂h/∂xi) dxi = 0.   (11.10)

Since every dxi in (11.10) is independent,

   ∂f/∂xi + λ ∂h/∂xi = 0,  i = 1, . . . , n − 1.   (11.11)

From (11.9) and (11.11), one concludes that each coefficient in (11.8) vanishes. This is

   ∂f/∂xi + λ ∂h/∂xi = 0,  i = 1, . . . , n,   (11.12)

just as if every dxi were independent. The method of Lagrange multipliers can be described using the following function,

   L(x, λ) = f(x) + λh(x).   (11.13)

This function is known as the Lagrangian. The first order necessary conditions for optimality can be expressed in terms of the Lagrangian as

   ∂L(x∗, λ∗)/∂xi = ∂f(x∗)/∂xi + λ∗ ∂h(x∗)/∂xi = 0,  i = 1, . . . , n,   (11.14)
   h(x∗) = 0,   (11.15)

or, in vector notation,

   ∇x L(x∗, λ∗) = ∇x f(x∗) + λ∗ ∇x h(x∗) = 0   (11.16)
   ∇λ L(x∗) = h(x∗) = 0.   (11.17)

Therefore, one has n + 1 equations to solve for x1, . . . , xn, λ.

Example. Solve

   min_x  f(x) = (x1 − 1.5)^2 + (x2 − 1.5)^2
   s.t.   h(x) = x1 + x2 = 0.

One possible approach is to eliminate one of the design variables using the equality constraint, for example, x2 = −x1. This can be substituted into the objective function and the resulting unconstrained optimization problem can be solved. The second approach makes use of the Lagrangian

   L(x1, x2, λ) = (x1 − 1.5)^2 + (x2 − 1.5)^2 + λ(x1 + x2)

and the necessary conditions for optimality

   2(x1 − 1.5) + λ = 0
   2(x2 − 1.5) + λ = 0
   x1 + x2 = 0.

Solving this system of linear equations yields

   (x1∗, x2∗, λ∗)^T = (0, 0, 3)^T.

This result can be verified in MATLAB using the function fmincon as follows:

>> f = @(x) (x(1)-1.5)^2 + (x(2)-1.5)^2;
>> [x,fval,exitflag,output,lambda] = fmincon(f,[1,1],[],[],[1,1],[0]);
>> x
x =
   1.0e-007 *
    0.3529   -0.3529
>> fval
fval =
    4.5000
>> lambda.eqlin
ans =
    3.0000

To understand the geometric meaning of the Lagrange multipliers, consider the Lagrangian (11.13) and the optimality condition (11.16) expressed as

   ∇f + λ∇h = 0   (11.18)

or as

   ∇f = −λ∇h.   (11.19)

Geometrically, the above expression means that at the stationary point the gradients of the objective function and the equality constraint are aligned. Furthermore, the Lagrange multiplier λ is their scaling factor.
Now, let us consider a problem with multiple equality constraints,

   min_x  f(x)
   s.t.   h1(x) = 0
          h2(x) = 0   (11.20)
          ...
          hm(x) = 0,

or, in vector form,

   min_x  f(x)
   s.t.   h(x) = 0.   (11.21)

In this case, the Lagrangian can be written as

   L(x, λ) = f(x) + Σ_{j=1}^{m} λj hj(x),   (11.22)

or, in vector form,

   L(x, λ) = f(x) + λ^T h(x).   (11.23)

The first order necessary conditions for optimality are

   ∂L(x∗, λ∗)/∂xi = ∂f(x∗)/∂xi + Σ_{j=1}^{m} λj∗ ∂hj(x∗)/∂xi = 0,  i = 1, . . . , n,   (11.24)
   hj(x∗) = 0,  j = 1, . . . , m,   (11.25)

or, in vector form,

   Df(x∗) + λ∗T Dh(x∗) = 0^T   (11.26)
   h(x∗) = 0,   (11.27)

where D represents the Jacobian operator with respect to x. The first condition is referred to as the optimality condition and the second one as the feasibility condition.

11.2.2 Inequality constraints

In a problem with multiple inequality constraints,

   min_x  f(x)
   s.t.   g1(x) ≤ 0
          g2(x) ≤ 0   (11.28)
          ...
          gr(x) ≤ 0,

or, in vector form,

   min_x  f(x)
   s.t.   g(x) ≤ 0.   (11.29)

Let x satisfy g(x) ≤ 0 and let J(x) be the index set of active inequality constraints, this is,

   J(x) = {j : gj(x) = 0, j = 1, . . . , r}.   (11.30)

The point x is a regular point if the vectors ∇gj(x), j ∈ J(x), are linearly independent.
   For a regular point x, a search direction d is a feasible search direction if

   ∇gj(x)^T d ≤ 0 for all j ∈ J(x).   (11.31)

This condition ensures that gj(x + αd) ≤ 0 for a sufficiently small, positive value α. Notice that the equality condition in (11.31), ∇gj(x)^T d = 0, is permissible only for linear constraints. For any point, a direction d is a descent direction if

   ∇f(x)^T d < 0.   (11.32)

No direction d that is both feasible and descent exists when the gradient of f is a nonnegative linear combination of the negative gradients of the active constraints gj. In other words,

   ∇f(x) = − Σ_{j∈J(x)} µj ∇gj(x), where µj ≥ 0.   (11.33)

Hence, at a stationary point it is not possible to find a direction that is both feasible and descent. This condition can also be proved using Farkas' Lemma (Belegundu & Chandrupatla, 1999).
   Let us define the Lagrangian of (11.28) as

   L(x, µ) = f(x) + Σ_{j=1}^{r} µj gj(x),   (11.34)

or, in vector form,

   L(x, µ) = f(x) + µ^T g(x).   (11.35)

The first order necessary conditions for optimality are

   ∂L(x∗, µ∗)/∂xi = ∂f(x∗)/∂xi + Σ_{j=1}^{r} µj∗ ∂gj(x∗)/∂xi = 0,  i = 1, . . . , n,   (11.36)
   µj∗ gj(x∗) = 0,  j = 1, . . . , r,   (11.37)
   µj∗ ≥ 0,   (11.38)
   gj(x∗) ≤ 0.   (11.39)

The first condition (11.36) is referred to as the optimality condition, the second condition (11.37) is the complementarity condition, the third condition (11.38) is the non-negativity condition, and the fourth (11.39) is the feasibility condition. In vector form, these are

   Df(x∗) + µ∗T Dg(x∗) = 0^T,   (11.40)
   µ∗T g(x∗) = 0,   (11.41)
   µ∗ ≥ 0,   (11.42)
   g(x∗) ≤ 0.   (11.43)

These conditions are the Karush-Kuhn-Tucker (KKT) conditions. A point satisfying these conditions is called a KKT point.
Example. Solve

   min_x  f(x) = (x1 − 1)^2 + (x2 − 1)^2
   s.t.   g1(x) = −2x1 − x2 + 4 ≤ 0   (11.44)
          g2(x) = −x1 − 2x2 + 4 ≤ 0.

The Lagrangian is

   L = (x1 − 1)^2 + (x2 − 1)^2 + µ1(−2x1 − x2 + 4) + µ2(−x1 − 2x2 + 4)

and the necessary optimality conditions are

   ∂L/∂x1 = 2x1 − 2 − 2µ1 − µ2 = 0   (11.45)
   ∂L/∂x2 = 2x2 − 2 − µ1 − 2µ2 = 0   (11.46)
   g1 = −2x1 − x2 + 4 ≤ 0   (11.47)
   g2 = −x1 − 2x2 + 4 ≤ 0   (11.48)
   µ1 g1 = µ1(−2x1 − x2 + 4) = 0   (11.49)
   µ2 g2 = µ2(−x1 − 2x2 + 4) = 0   (11.50)
   µ1 ≥ 0   (11.51)
   µ2 ≥ 0.   (11.52)

There are four possible cases:

Case 1. If µ1 = 0 and µ2 = 0, then x1 = 1 and x2 = 1, but g1 = 1 and g2 = 1, which is not feasible.

Case 2. If µ1 = 0 and g2 = 0, then x1 = 1.2, x2 = 1.4, and µ2 = 0.4; however, g1 = 0.2, which is not feasible.

Case 3. If g1 = 0 and µ2 = 0, then x1 = 1.4, x2 = 1.2, and µ1 = 0.4; however, g2 = 0.2, which is not feasible.

Case 4. If g1 = 0 and g2 = 0, then x1 = 4/3, x2 = 4/3, µ1 = 2/9, and µ2 = 2/9. The value of the function is f = 2/9.
The problem can be solved graphically using MATLAB in this way:

>> [x1,x2]=meshgrid(0:0.05:2);
>> f=(x1-1).^2+(x2-1).^2;
>> g1=-2*x1-x2+4;
>> g2=-x1-2*x2+4;
>> contour(x1,x2,f,0:0.2:1)
>> axis equal; hold on;
>> contour(x1,x2,g1,[0 0])
>> contour(x1,x2,g2,[0 0])

The result can also be obtained using fmincon. The test problem is

function f=tprob2(x)
f=(x(1)-1)^2+(x(2)-1)^2;

In the workspace,

>> options = optimset('LargeScale','off');
>> [x,fval,exitflag,output,lambda]=...
fmincon(@tprob2,[2;2],[-2,-1;-1,-2],[-4;-4],[],[],[],[],[],options);
Optimization terminated: first-order optimality measure less
than options.TolFun and maximum constraint violation is less
than options.TolCon.
Active inequalities (to within options.TolCon = 1e-006):
  lower   upper   ineqlin   ineqnonlin
                     1
                     2
>> x
x =
    1.3333
    1.3333
>> fval
fval =
    0.2222
>> lambda.ineqlin
ans =
    0.2222
    0.2222

11.2.3 Equality and inequality constraints

For a problem

   min_x  f(x)
   s.t.   hi(x) = 0,  i = 1, . . . , m,   (11.53)
          gj(x) ≤ 0,  j = 1, . . . , r,

the Lagrangian is

   L(x, λ, µ) = f(x) + Σ_{i=1}^{m} λi hi(x) + Σ_{j=1}^{r} µj gj(x),   (11.54)

or, in vector form,

   L(x, λ, µ) = f(x) + λ^T h(x) + µ^T g(x).   (11.55)

Let x∗ be a regular point. That is, h(x∗) = 0, g(x∗) ≤ 0, and the gradients ∇hi(x∗), i = 1, . . . , m, and ∇gj(x∗), j ∈ J(x∗), are linearly independent. Then the first order necessary conditions or KKT conditions for optimality can be expressed as

   ∂L(x∗, λ∗, µ∗)/∂xk = ∂f(x∗)/∂xk + Σ_{i=1}^{m} λi∗ ∂hi(x∗)/∂xk + Σ_{j=1}^{r} µj∗ ∂gj(x∗)/∂xk = 0,  k = 1, . . . , n,   (11.56)
   hi(x∗) = 0,  i = 1, . . . , m,   (11.57)
   gj(x∗) ≤ 0,  j = 1, . . . , r,   (11.58)
   µj∗ gj(x∗) = 0,   (11.59)
   µj∗ ≥ 0,   (11.60)

or, in vector form,

   Df(x∗) + λ∗T Dh(x∗) + µ∗T Dg(x∗) = 0^T,   (11.61)
   h(x∗) = 0,   (11.62)
   g(x∗) ≤ 0,   (11.63)
   µ∗T g(x∗) = 0,   (11.64)
   µ∗ ≥ 0.   (11.65)
e e
Example. Solve

   min_x  f(x) = x1^2 + x2^2 + x3^2
   s.t.   h(x) = x1 + x2 + x3 − 1 = 0   (11.66)
          g(x) = x1 − x2 − 2 ≤ 0.

The Lagrangian is

   L = x1^2 + x2^2 + x3^2 + λ(x1 + x2 + x3 − 1) + µ(x1 − x2 − 2)

and the necessary optimality conditions are

   ∂L/∂x1 = 2x1 + λ + µ = 0
   ∂L/∂x2 = 2x2 + λ − µ = 0
   ∂L/∂x3 = 2x3 + λ = 0
   h = x1 + x2 + x3 − 1 = 0
   g = x1 − x2 − 2 ≤ 0
   µg = µ(x1 − x2 − 2) = 0
   µ ≥ 0.

There are two possible cases:

Case 1. If g = 0, then x1 = 4/3, x2 = −2/3, and x3 = 1/3, but µ = −2, which is not allowed.

Case 2. If µ = 0, then x1 = 1/3, x2 = 1/3, x3 = 1/3, and g = −2. This is the solution to the problem. In this case, λ = −2/3 and f = 1/3.
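Since both constraints are linear, this example can also be checked with fmincon; the following sketch assumes the same sign conventions used in the earlier fmincon example:

f = @(x) x(1)^2 + x(2)^2 + x(3)^2;
A = [1 -1 0];   b = 2;             % g: x1 - x2 - 2 <= 0
Aeq = [1 1 1];  beq = 1;           % h: x1 + x2 + x3 - 1 = 0
[x,fval,exitflag,output,lam] = fmincon(f,[0;0;0],A,b,Aeq,beq);
% Expect x = (1/3, 1/3, 1/3) and fval = 1/3, with lam.ineqlin = 0
% (inactive constraint) and lam.eqlin consistent with lambda = -2/3.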

11.3 Second order sufficient conditions

Let x∗ be a regular point and let f, g, and h be C2 functions. Then x∗ is a strict local minimum if there are Lagrange multipliers such that:

• The KKT conditions (11.56) to (11.60) are satisfied.

• The Hessian

     ∇²L(x∗, λ∗, µ∗) = ∇²f(x∗) + Σ_{i=1}^{m} λi∗ ∇²hi(x∗) + Σ_{j=1}^{r} µj∗ ∇²gj(x∗)   (11.67)

  is positive definite on a subspace of Rn as defined by the condition

     d^T ∇²L(x∗, λ∗, µ∗) d > 0   (11.68)

  for all d ∈ Rn such that:

  (a) ∇hi(x∗)^T d = 0, for i = 1, . . . , m, and
  (b) ∇gj(x∗)^T d = 0, for j ∈ J(x∗) and µj > 0.
In other words, the Hessian of the Lagrangian must be positive definite for all d lying in the constraint tangent hyperplane.

Example. Consider the optimization problem

   min_x  f(x) = x1^2 + x2^2 − 3x1x2
   s.t.   g(x) = x1^2 + x2^2 − 6 ≤ 0.   (11.69)

Check the sufficient conditions for the candidate minimum points.
   The points that satisfy the KKT conditions are:

• x∗ = (0, 0)^T, µ∗ = 0

• x∗ = (√3, √3)^T, µ∗ = 1/2

• x∗ = (−√3, −√3)^T, µ∗ = 1/2

The Hessian matrix of the Lagrangian is

   ∇²L(x, µ) = [2 + 2µ    −3  ]
               [ −3     2 + 2µ].

For x∗ = (0, 0)^T, µ∗ = 0, the constraint is inactive, so the sufficient condition requires that d^T ∇²L(x∗, µ∗) d > 0 for all d. Notice that this is equivalent to requiring that ∇²f(x∗) be positive definite (unconstrained problem). This condition is not satisfied and x∗ is not a local minimum point.
   For x∗ = (√3, √3)^T, µ∗ = 1/2 and x∗ = (−√3, −√3)^T, µ∗ = 1/2, the constraint is active. Observe that ∇²L(x∗, µ∗) is not positive definite, so we have to show that d^T ∇²L(x∗, µ∗) d > 0 for all d such that ∇g(x∗)^T d = 0, where

   ∇g(x∗) = ±2√3 (1, 1)^T.

For d = (d1, d2)^T, we obtain

   ±2√3 (1  1) (d1, d2)^T = 0,

or d1 + d2 = 0. Thus d1 = −d2 = α, where α ≠ 0 is an arbitrary constant. Therefore, d = α(1, −1)^T. The sufficient condition gives

   d^T ∇²L(x∗, µ∗) d = α² (1  −1) [ 3  −3] ( 1) = 12α² > 0.
                                  [−3   3] (−1)

Therefore, the points x∗ = (√3, √3)^T and x∗ = (−√3, −√3)^T satisfy the sufficient conditions and they are local minima.
If the problem is convex, then the necessary KKT conditions are also sufficient conditions for optimality, and the positive definiteness of the Hessian of the Lagrangian is not required. Let us discuss this in the next section.

11.4 Convexity

According to the Weierstrass theorem on the existence of a global minimum, if f(x) is continuous on a nonempty feasible set Ω that is closed and bounded, then f(x) has a global minimum in Ω (Arora, 2004). The KKT conditions and the second order conditions are used to determine strict local minimum points. A stronger condition, such as the convexity of the problem, is used to determine whether the minimum point is a global one. Let us define this concept for multivariate constrained optimization.
   A convex optimization problem is one defined by a convex objective function on a convex feasible space. If all the inequality constraints are convex and all the equality constraints are linear, then the feasible space is convex. However, the feasible space can be convex even if the inequality constraint functions are not convex; these constraints only need to be convex in the feasible space. The same applies to the objective function.
   If the optimization problem is convex, then we have what is known as a convex programming problem, the KKT conditions are sufficient conditions for optimality, and any local minimum is also a global minimum. On the other hand, several local minima can be found in a non-convex space.

Example. Determine if the following optimization problem is convex:

   min_x  f(x) = 2x1 + 3x2 − x1^3 + 2x2^2
   s.t.   g1(x) ≡ x1 + 3x2 − 6 ≤ 0
          g2(x) ≡ 5x1 + 2x2 − 10 ≤ 0
          g3(x) ≡ −x1 ≤ 0
          g4(x) ≡ −x2 ≤ 0.

The feasible space is convex since all the constraints are linear and, therefore, convex. Let us now consider the Hessian of the objective function,

   ∇²f(x) = [−6x1  0]
            [  0   4].

This Hessian is positive semi-definite only for x1 ≤ 0. Since the feasible space requires x1 ≥ 0, the Hessian is indefinite at every feasible point with x1 > 0; the objective function is therefore not convex on the feasible space, and the problem cannot be classified as a convex programming problem.
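A quick numerical check of this conclusion (a sketch, not part of the original notes) is to evaluate the eigenvalues of the Hessian at an interior feasible point:

H = @(x) [-6*x(1) 0; 0 4];   % Hessian of the objective function
eig(H([1 1]))                % returns -6 and 4: indefinite at x = (1,1)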

11.5 Postoptimality analysis∗

The study of variations in the optimum solution as some of the original problem parameters are changed is known as postoptimality analysis or sensitivity analysis. The investigation of this question leads to a physical interpretation of the Lagrange multipliers. The multipliers show the benefit of relaxing a constraint or the penalty associated with tightening it.

11.5.1 Effect of a perturbation

Let us consider

   min_x  f(x, p)
   s.t.   hi(x, p) = 0,  i = 1, . . . , m   (11.70)
          gj(x, p) ≤ 0,  j = 1, . . . , r,

where x∗ = x∗(p = p0). Assuming x∗ is a regular point, x∗(p) varies continuously with p. Differentiating f(x, p) with respect to p,

   df(x, p)/dp = (∂f(x, p)/∂x)(∂x/∂p) + ∂f(x, p)/∂p.   (11.71)

Differentiating the active inequality constraints,

   dg(x, p)/dp = (∂g(x, p)/∂x)(∂x/∂p) + ∂g(x, p)/∂p = 0,   (11.72)

and the equality constraints,

   dh(x, p)/dp = (∂h(x, p)/∂x)(∂x/∂p) + ∂h(x, p)/∂p = 0.   (11.73)

Using the KKT conditions,

   ∇f = − Σ_{i=1}^{m} λi ∇hi − Σ_{j∈J} µj ∇gj,   (11.74)

which is

   ∂f/∂x = −µ^T ∂g/∂x − λ^T ∂h/∂x.   (11.75)

Substituting (11.75) into (11.71),

   df/dp = (−µ^T ∂g/∂x − λ^T ∂h/∂x)(∂x/∂p) + ∂f/∂p   (11.76)
         = −µ^T (∂g/∂x)(∂x/∂p) − λ^T (∂h/∂x)(∂x/∂p) + ∂f/∂p.   (11.77)

Using (11.72) and (11.73),

   df/dp = ∂f/∂p + µ^T ∂g/∂p + λ^T ∂h/∂p   (11.78)
         = ∂f/∂p + Σ_{j∈J} µj ∂gj/∂p + Σ_{i=1}^{m} λi ∂hi/∂p.   (11.79)

11.5.2 Effect of changing constraint limits

Let us consider

   min_x  f(x)
   s.t.   gi(x) ≤ ai,  i = 1, . . . , m   (11.80)
          hj(x) = bj,  j = 1, . . . , l,

where ai and bj are small variations around 0. The new optimum design depends on the perturbations a and b, this is, x∗ = x∗(a, b). Also, the value of the objective function depends on these perturbations, f = f(a, b). Substituting p = −ai in (11.78),

   df(0, 0)/dai = −µi,   (11.81)

and p = −bj,

   df(0, 0)/dbj = −λj.   (11.82)

Using a Taylor series about (ai, bj) = (0, 0),

   f(ai, bj) = f(0, 0) + (∂f(0, 0)/∂ai) ai + (∂f(0, 0)/∂bj) bj.   (11.83)

Then,

   ∆f = f(ai, bj) − f(0, 0) = −µi ai − λj bj.   (11.84)

This is,

   ∆f = − Σ_{i∈I} µi ai − Σ_{j=1}^{l} λj bj.   (11.85)

Example. Consider the problem

   min_x  f(x) = x1^2 + x2^2 − 3x1x2
   s.t.   g(x) = x1^2 + x2^2 − 6 ≤ 0.   (11.86)

The solution is x1∗ = x2∗ = √3, µ = 1/2, and f∗ = −3. Determine the change in the objective function when a is added to the right-hand side of the constraint.
   From (11.85),

   ∆f = −µa = −(1/2)a.   (11.87)

If a = 1, for example, f ≈ −3.5. If a = −1, then f ≈ −2.5.
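The estimate (11.87) can be verified by re-solving the perturbed problem. The sketch below uses fmincon with an anonymous nonlinear constraint function (the deal idiom is one way to return the required [c, ceq] pair):

f = @(x) x(1)^2 + x(2)^2 - 3*x(1)*x(2);
for a = [0 1 -1]
    nlcon = @(x) deal(x(1)^2 + x(2)^2 - 6 - a, []);  % g(x) - a <= 0
    [xs,fval] = fmincon(f, [1;1], [], [], [], [], [], [], nlcon);
    fprintf('a = %2d: f* = %.4f\n', a, fval);
end
% Expect f* = -3, -3.5, and -2.5, in agreement with Delta f = -a/2.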

11.5.3 Effect of objective function scaling on Lagrange multipliers

Consider

   min_x  f(x)
   s.t.   g(x) ≤ 0.   (11.88)

Using the KKT conditions,

   ∂L/∂x = ∂f/∂x + µ ∂g/∂x = 0,   (11.89)

this is,

   µ = −f′/g′.   (11.90)

Scaling f by K and g by M,

   min_x  f̂ = K f(x)
   s.t.   ĝ = M g(x) ≤ 0.   (11.91)

Using the KKT conditions,

   µ̂ = −f̂′/ĝ′ = (K/M) µ.   (11.92)

Exercises

1. Consider

      min_x  2x1 + x2
      s.t.   x1^2 + x2^2 − 1 = 0.

   Solve for the minimum point of f using the optimality conditions.

2. A box with a square base and open top is to hold 50 cm^3. Find the dimensions that require the least material (assume uniform thickness of material) to construct the box. Solve the problem using optimality conditions.

3. A can is to be designed to hold 315 cm^3. You are asked to find the dimensions of the can that require the least material (of uniform thickness). Assume that the can is completely closed.

4. An engineering design problem is formulated as

      min_x  f(x) = x1^2 + 320x1x2
      s.t.   x1/(60x2) − 1 ≤ 0
             1 − x1(x1 − x2)/3600 ≤ 0
             x1 ≥ 0
             x2 ≥ 0.

   Write the KKT necessary conditions and solve for the candidate minimum designs. Verify the solutions graphically. Interpret the KKT conditions on the graph for the problem.
Chapter 12

Linear Programming

The term linear programming was introduced by George Bernard Dantzig (1914–2005) in 1947 after World War II (Dantzig, 1959). Programming or scheduling refers to the construction of a statement of actions which will permit the system to move from a given status toward a defined objective. Linear programming (LP) is a special area of the broader field of mathematical programming (MP). According to Dantzig (1959), "if the system exhibits a structure which can be represented by a mathematical equivalent, called a mathematical model, and if the objective can also be quantified, then some computational method may be evolved for choosing the best schedule of actions among alternatives. Such use of mathematical models is termed mathematical programming."
   When the problem can be represented by a system of linear equations or inequalities, then it is an LP problem. Any linear function f: Rn → R can be written as

   f(x) = c1x1 + c2x2 + · · · + cnxn = Σ_{i=1}^{n} ci xi = c^T x,

where ci for i = 1, . . . , n are constants (Arora, 2004). In this way, a set of linear equality constraints can be expressed as

   a11x1 + a12x2 + · · · + a1nxn = b1   (12.1)
   a21x1 + a22x2 + · · · + a2nxn = b2   (12.2)
   ...   (12.3)
   am1x1 + am2x2 + · · · + amnxn = bm,   (12.4)

where aji and bj, for j = 1, . . . , m and i = 1, . . . , n, are constants. The standard form of the problem is defined for bj ≥ 0 and xi ≥ 0. The set of linear constraints can also be expressed as

   Σ_{i=1}^{n} aji xi = bj, for j = 1, . . . , m,

or simply, Ax = b. Any linear inequality constraint can be expressed as a linear equality constraint with the addition or subtraction of a nonnegative quantity. With this procedure, the condition bj ≥ 0 is maintained. Let us review this in the next section.

12.1 Standard form

An LP problem can be formulated as minimizing a linear objective function, subject to nonnegative linear equality constraints, with nonnegative design variables. This is referred to as the standard form of an LP problem. Mathematically, it can be expressed as

   min_x  f(x) = Σ_{i=1}^{n} ci xi   (12.5)
   s.t.   Σ_{i=1}^{n} aji xi = bj,  j = 1, . . . , m   (12.6)
          xi ≥ 0,  i = 1, . . . , n,   (12.7)

where bj ≥ 0. In matrix notation,

   min_x  f(x) = c^T x
   s.t.   Ax = b   (12.8)
          x ≥ 0,

where c ∈ Rn, x ∈ Rn, A ∈ Rm×n, b ∈ Rm, and b ≥ 0.
   Any less-or-equal (LE) inequality constraint can be converted into an equality constraint by adding a nonnegative slack variable. For example,

   aj1x1 + aj2x2 + · · · + ajnxn ≤ bj

can be transformed into

   aj1x1 + aj2x2 + · · · + ajnxn + x_{n+1} = bj,

where x_{n+1} ≥ 0 is the slack variable. In the same way, any greater-or-equal (GE) inequality constraint can be converted into an equality constraint by subtracting a nonnegative surplus variable. For example,

   aj1x1 + aj2x2 + · · · + ajnxn ≥ bj

can be transformed into

   aj1x1 + aj2x2 + · · · + ajnxn − x_{n+1} = bj,

where x_{n+1} ≥ 0 is the surplus variable. Finally, an unrestricted design variable (i.e., one with no restriction in sign) can be expressed in standard form as the difference between two nonnegative variables. For example, if xn is unrestricted, then it can be expressed as

   xn = x_{n+1} − x_{n+2},

where x_{n+1} ≥ 0 and x_{n+2} ≥ 0. Let us see this procedure in the following example.

Example. Convert the following LP problem to standard form:

   min_x  f(x) = 2x1 + 5x2
   s.t.   3x1 + 2x2 ≤ 12
          2x1 + 2x2 ≥ 6
          x1 ≥ 0
          x2 is unrestricted.

1. Since x2 is unrestricted, then x2 = x3 − x4, where x3 ≥ 0 and x4 ≥ 0. Substituting,

      min_x  f(x) = 2x1 + 5x3 − 5x4
      s.t.   3x1 + 2x3 − 2x4 ≤ 12
             2x1 + 2x3 − 2x4 ≥ 6
             x1 ≥ 0, x3 ≥ 0, x4 ≥ 0.

2. The inequality constraints are converted into equality constraints by adding a nonnegative slack variable x5 and subtracting a nonnegative surplus variable x6. This is

      min_x  f(x) = 2x1 + 5x3 − 5x4
      s.t.   3x1 + 2x3 − 2x4 + x5 = 12
             2x1 + 2x3 − 2x4 − x6 = 6
             x1 ≥ 0, x3 ≥ 0, x4 ≥ 0, x5 ≥ 0, x6 ≥ 0.

Since all the functions are linear, the problem is convex. Therefore, if the problem has a local minimum, this minimum is also a global minimum. However, the solution of an LP problem is only guaranteed when the feasible space is bounded, and this solution might not be unique.
12.2 Basic solutions

The feasible space of an LP problem is defined by the set of geometric constraints x ≥ 0 and the set of functional constraints Ax = b. Assuming that the geometric constraints are satisfied, a solution can be obtained by reducing the system of functional constraints to some canonical form with pivotal variables x1, x2, . . . , xm. This is

   x1 + â_{1,m+1} x_{m+1} + â_{1,m+2} x_{m+2} + · · · + â_{1,n} xn = b̂1
   x2 + â_{2,m+1} x_{m+1} + â_{2,m+2} x_{m+2} + · · · + â_{2,n} xn = b̂2
   ...   (12.9)
   xm + â_{m,m+1} x_{m+1} + â_{m,m+2} x_{m+2} + · · · + â_{m,n} xn = b̂m,

where â_{ij} and b̂j are the reduced coefficients resulting after pivoting. The m pivotal variables are referred to as basic variables while the remaining n − m variables are named nonbasic variables. In matrix form, (12.9) can be written as

   I xB + Â xN = b̂,   (12.10)

where I is the identity matrix of size m × m, xB contains the m basic variables, and xN contains the n − m nonbasic variables. Solving for xB yields

   xB = b̂ − Â xN.   (12.11)

Thus, the basic variables xB are dependent variables while the nonbasic variables xN are independent. One particular solution is obtained when the nonbasic variables are all zero, xN = 0. In this case the basic variables can be expressed as

   xB = b̂.   (12.12)

This solution is referred to as a basic solution. Since basic and nonbasic variables can be exchanged, several basic solutions can be obtained. The number of basic solutions is at most

   C(n, m) = C(n, n − m) = n! / (m!(n − m)!).   (12.13)

It can be proven that the optimum solution is one of the basic solutions.

Example. Determine all the basic solutions of the following problem:

   min_x  f(x) = 2x1 + 5x3 − 5x4
   s.t.   3x1 + 2x3 − 2x4 + x5 = 12
          2x1 + 2x3 − 2x4 − x6 = 6
          xi ≥ 0,  i = 1, 3, 4, 5, 6.

As observed, n = 5 and m = 2; therefore, the number of candidate basic solutions is 10. The following table shows all the basic solutions to this problem.

                     Table 12.1: Basic solutions

   Num.   x1   x3   x4   x5   x6    f    Location
   1      0    0    0    12   −6    0    Infeasible
   2      0    0    −6   0    6     30   Infeasible
   3      0    6    0    0    6     30   Feasible
   4      4    0    0    0    2     8    Feasible
   5      0    0    −3   6    0     15   Infeasible
   6      0    3    0    6    0     15   Feasible
   7      3    0    0    3    0     6    Feasible
   8      0    —    —    0    0     —    Singular basis (no solution)
   9      6    0    3    0    0     −3   Feasible
   10     6    −3   0    0    0     −3   Infeasible

The solution is basic solution number 9, x1 = 6 and x4 = 3 (in terms of the original variables, x2 = x3 − x4 = −3), with f = −3. Note that the basis of solution number 8 (columns x3 and x4) is singular, so no basic solution exists for that choice.
   If one or more of the basic variables in a basic solution has a value of zero, that solution is said to be a degenerate basic solution (Luenberger & Ye, 2008).
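The enumeration in Table 12.1 can be reproduced by brute force in MATLAB; the sketch below solves the 2×2 system for every choice of two basic columns and skips the singular basis (solution 8):

A = [3 2 -2 1  0;                 % columns: x1, x3, x4, x5, x6
     2 2 -2 0 -1];
b = [12; 6];
c = [2 5 -5 0 0]';
for cols = nchoosek(1:5, 2)'      % all 10 candidate bases
    B = A(:, cols);
    if abs(det(B)) > 1e-12        % skip the singular basis
        x = zeros(5,1);
        x(cols) = B \ b;          % basic variables; nonbasic are zero
        if all(x >= 0), tag = 'feasible'; else, tag = 'infeasible'; end
        fprintf('basis {%d,%d}: f = %6.2f (%s)\n', cols, c'*x, tag);
    end
end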

12.3 The Simplex method

The Simplex method proposed by Dantzig looks for the minimum among all the basic solutions by iteratively reducing the value of the objective function. In his book, Dantzig (1959) distinguishes between the simplex method, which starts with an LP problem in standard form, and the simplex algorithm, which starts with a canonical form and consists of a sequence of pivot operations. The simplex algorithm forms the main subroutine of the simplex method.
   The first step of the simplex method is to express the LP problem in standard form as in (12.8). This step involves the introduction of slack, surplus, and artificial variables. The resulting problem is in canonical form and the simplex algorithm can be employed. The LP optimization problem consists of finding x1, x2, . . . , xn satisfying the simultaneous system of equations

   a11x1 + a12x2 + · · · + a1nxn = b1
   a21x1 + a22x2 + · · · + a2nxn = b2
   ...
   am1x1 + am2x2 + · · · + amnxn = bm

and minimizing an objective of the form

   c1x1 + c2x2 + · · · + cnxn = f,

where all xi are restricted to be nonnegative, i.e.,

   xi ≥ 0 for i = 1, . . . , n.

Initially, the objective function is expressed in terms of nonbasic variables. Therefore, for the initial point, the objective function will have a zero value. In LP problems with LE constraints only, one is required to add slack variables. In this case the auxiliary problem is canonical with respect to the slack variables; therefore, they will form the first set of basic variables. The steps involved in the simplex algorithm include the following:

Step 1. Initial basic solution. Create an initial table or initial tableau with m rows and n columns and the corresponding coefficients of A. Name each column with the corresponding variable xi for i = 1, . . . , n. Add a column with the coefficients of b and add an extra row with the coefficients of c. Select m basic variables and add an extra column with their names. Naturally, these correspond to the slack variables.

Step 2. Canonical form. Perform row operations in order to convert this problem into the canonical form as in (12.9). The m pivotal variables correspond to the basic variables. The remaining n − m are the nonbasic variables. In problems with LE constraints, the initial tableau is already in canonical form.

Step 3. Test for optimality. Scan the relative cost factors ĉ, which should have nonzero entries only in the nonbasic columns. If all nonzero entries are positive, then the current solution is an optimum. The reason behind this is that every nonzero coefficient ĉi multiplies a nonbasic variable (which is set to zero). If ĉi is negative, then xi can become a basic variable, so its value will be larger than zero and, consequently, reduce the value of the function. If ĉi is positive, then the best scenario is that xi remains nonbasic, i.e., zero.

Step 4. Choice of nonbasic variable to become basic. If ĉi < 0, then xi should become a basic variable. If there are multiple negative ĉi values, then one should select the minimum (most negative) value so the function reduces faster.

Step 5. Choice of basic variable to become nonbasic. Select a basic variable to become nonbasic. For this purpose, evaluate bj/aji for j = 1, . . . , m, where i is the index of the nonbasic variable that will become basic. The basic variable xj to become nonbasic corresponds to the one with the minimum value of bj/aji for aji > 0. This condition prevents the remaining basic variables from reaching a negative value. Section 12.4.3 explains this condition in detail.

Step 6. With the new set of basic variables, go to Step 2 and create the next tableau.

Let us illustrate this method with the following example.
Example. Solve the following LP problem:

   max_x  2x1 + x2
   s.t.   2x1 − x2 ≤ 8
          x1 + 2x2 ≤ 14
          −x1 + x2 ≤ 4
          x1 ≥ 0, x2 ≥ 0.

The simplex method requires an LP problem in standard form. Using appropriate slack variables, the auxiliary problem can be written as

   min_x  −2x1 − x2
   s.t.   2x1 − x2 + x3 = 8
          x1 + 2x2 + x4 = 14
          −x1 + x2 + x5 = 4
          x1 ≥ 0, x2 ≥ 0, x3 ≥ 0, x4 ≥ 0, x5 ≥ 0.

Now, let us apply the simplex algorithm.

Step 1. Initial basic solution. The initial basic solution is given by x1 = 0 and x2 = 0, which are the variables in the objective function. This solution can be written as a tableau as expressed by Table 12.2. In this initial basic solution, x1 = 0, x2 = 0, x3 = 8, x4 = 14, x5 = 4, and f = 0.

                     Table 12.2: Initial tableau

   xB    x1   x2   x3   x4   x5   bj   bj/aji
   x3    2    −1   1    0    0    8
   x4    1    2    0    1    0    14
   x5    −1   1    0    0    1    4
   −f    −2   −1   0    0    0    0

Step 2. Canonical form. The problem is already in canonical form.

Step 3. Test for optimality. We scan the row of the objective function, which should have nonzero entries only in the nonbasic columns, i.e., x1 and x2. If all the nonzero entries are positive, then we have an optimum solution because the objective function cannot be reduced any further, and the Simplex method is terminated. In this problem we have negative entries in the nonbasic columns, so the solution is not an optimum and the objective function can be reduced.

Step 4. Choice of nonbasic variable to become basic. Two nonbasic columns have negative numbers. By convention, one selects the most negative value. In this case, this value corresponds to x1, so this
nonbasic variable will become basic. See Table 12.3.

                     Table 12.3: Initial tableau: entering variable

   xB    x1   x2   x3   x4   x5   bj   bj/aj1
   x3    2    −1   1    0    0    8
   x4    1    2    0    1    0    14
   x5    −1   1    0    0    1    4
   −f    −2   −1   0    0    0    0

Step 5. Choice of basic variable to become nonbasic. To identify which basic variable should become nonbasic, we compare the ratios bj/aj1 for all aj1 > 0. The row having the smallest ratio corresponds to the current basic variable that will become nonbasic, in this case, x3. Then we say that the new pivot will be a11. See Table 12.4.

                     Table 12.4: Initial tableau: minimum ratio test

   xB    x1   x2   x3   x4   x5   bj   bj/aj1
   x3    2    −1   1    0    0    8    8/2 = 4
   x4    1    2    0    1    0    14   14/1 = 14
   x5    −1   1    0    0    1    4
   −f    −2   −1   0    0    0    0

Step 6. Pivoting and second tableau. Performing row operations about the pivot, we obtain the second tableau. The new canonical form is written in terms of the new basic variables x1, x4, and x5. The cost function row reads 0 = f + 8, or f = −8. See Table 12.5.

                     Table 12.5: Second tableau

   xB    x1   x2     x3     x4   x5   bj   bj/aji
   x1    1    −1/2   1/2    0    0    4
   x4    0    5/2    −1/2   1    0    10
   x5    0    1/2    1/2    0    1    8
   −f    0    −2     1      0    0    8

Step 7. Variables in second tableau. At this point, the algorithm returns to Step 3. Following that procedure, let us identify the basic and nonbasic variables to be exchanged and the new pivot. See Table 12.6.

Step 8. Pivoting and third tableau. Using row operations, we obtain the third tableau. See Table 12.7. The optimality condition, according to Step 3, is satisfied: the objective row has nonzero entries only in the nonbasic columns, i.e., x3 and x4, and these entries are positive. Therefore, the optimal solution to this problem is x1 = 6, x2 = 4, x3 = 0, x4 = 0, x5 = 6, and f = −16.

                     Table 12.6: Second tableau: new pivot

   xB    x1   x2     x3     x4   x5   bj   bj/aj2
   x1    1    −1/2   1/2    0    0    4
   x4    0    5/2    −1/2   1    0    10   10/(5/2) = 4
   x5    0    1/2    1/2    0    1    8    8/(1/2) = 16
   −f    0    −2     1      0    0    8

                     Table 12.7: Third tableau

   xB    x1   x2   x3     x4     x5   bj   bj/aji
   x1    1    0    2/5    1/5    0    6
   x2    0    1    −1/5   2/5    0    4
   x5    0    0    3/5    −1/5   1    6
   −f    0    0    3/5    4/5    0    16
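The loop of Steps 3–6 is compact enough to sketch in MATLAB. The script below (an illustrative sketch only: it assumes a canonical initial tableau from LE constraints and does not handle degeneracy or unboundedness) reproduces the three tableaus above:

T = [ 2 -1 1 0 0  8;              % tableau [A b; c' 0] of the example
      1  2 0 1 0 14;
     -1  1 0 0 1  4;
     -2 -1 0 0 0  0];             % bottom row: relative costs and -f
[mrows, n] = size(T); m = mrows - 1;
while min(T(end,1:n-1)) < -1e-12
    [cmin, J] = min(T(end,1:n-1));  % Step 4: most negative cost enters
    r = T(1:m,end) ./ T(1:m,J);
    r(T(1:m,J) <= 0) = Inf;         % Step 5: minimum ratio test
    [rmin, I] = min(r);
    T(I,:) = T(I,:) / T(I,J);       % Step 6: pivot row operations
    for i = [1:I-1, I+1:m+1]
        T(i,:) = T(i,:) - T(i,J)*T(I,:);
    end
end
disp(T(end,end))                  % displays 16, i.e., f = -16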

12.4 Derivation of the Simplex method

The basic steps of the Simplex method were illustrated with an example problem in Section 12.3. The underlying principles for these steps are presented by Dantzig (1959) and summarized in this section.

12.4.1 Basic solution

When the problem involves LE inequality constraints, the slack variables are selected as basic and the real design variables are selected as nonbasic, i.e., equal to zero. That ensures a feasible solution in which the objective function, already expressed in terms of the nonbasic variables, is also zero. Dealing with GE inequality constraints and equality constraints is described in Section 12.5.

12.4.2 Choice of nonbasic variable to become basic

If at least one relative cost factor cj in the canonical form (or tableau) is negative, it is possible, assuming non-degeneracy (all bi > 0), to construct a new basic feasible solution that reduces the value of the objective function. To illustrate this, let us express the canonical system as

   x1 + a_{1,m+1} x_{m+1} + a_{1,m+2} x_{m+2} + · · · + a_{1,n} xn = b1
   x2 + a_{2,m+1} x_{m+1} + a_{2,m+2} x_{m+2} + · · · + a_{2,n} xn = b2
   ...   (12.14)
   xm + a_{m,m+1} x_{m+1} + a_{m,m+2} x_{m+2} + · · · + a_{m,n} xn = bm,

and the objective function, which is written only in terms of nonbasic variables,

   c_{m+1} x_{m+1} + c_{m+2} x_{m+2} + · · · + cn xn = f,   (12.15)
since all the nonbasic variables are zero. If all the function coefficients cj are positive, then there is no possible improvement since all design variables have to be nonnegative. However, a lower value of f can be obtained by increasing the value of one of the nonbasic variables xJ and adjusting the values of the basic variables accordingly, where xJ is any nonbasic variable whose relative cost factor cJ is negative. In particular, the index J can be selected as that of the minimum (most negative) value. That practice usually leads to fewer iterations.

12.4.3 Choice of basic variable to become nonbasic

Assuming cJ is negative, the value of xJ should be made as large as possible in order to minimize the objective function. The only consideration that prevents setting xJ infinitely large is the fact that one of the basic variables would become negative. Let us illustrate this by translating all the terms containing xJ to the right-hand side of the canonical system,

   x1 + a_{1,m+1} x_{m+1} + a_{1,m+2} x_{m+2} + · · · + a_{1,n} xn = b1 − a_{1,J} xJ
   x2 + a_{2,m+1} x_{m+1} + a_{2,m+2} x_{m+2} + · · · + a_{2,n} xn = b2 − a_{2,J} xJ
   ...   (12.16)
   xm + a_{m,m+1} x_{m+1} + a_{m,m+2} x_{m+2} + · · · + a_{m,n} xn = bm − a_{m,J} xJ.

Since xJ will be a basic variable, its value will be, in a non-degenerate case, greater than zero. However, its maximum value will be limited by the condition bi − a_{i,J} xJ ≥ 0 for all i = 1, . . . , m. Notice that if all a_{i,J} ≤ 0, then xJ is unbounded and can be arbitrarily large. On the other hand, if at least one a_{i,J} > 0, then the maximum value of xJ will be bi/a_{i,J}. Any larger value will lead to xi < 0 for some i. If more than one a_{i,J} is positive, then the smallest ratio bi/a_{i,J} will determine the maximum value of xJ. This is

   xJ = bI/a_{I,J} = min{ bi/a_{i,J} : a_{i,J} > 0, i = 1, . . . , m }.   (12.17)

In this case, xI = 0 and will become a nonbasic variable. Notice that if the basic solution is degenerate (bI = 0), then no improvement in the objective function will be possible.

12.4.4 Lagrange multipliers

The Lagrange multipliers can be recovered from the coefficients of the corresponding slack variables in the objective function row of the final tableau. In the previous example, the three Lagrange multipliers associated with the inequality constraints are given by µ1 = 3/5, µ2 = 4/5, and µ3 = 0.

12.5 The two phases of the Simplex method

The LP standard form for the Simplex involves only equality constraints. However, the basic methodology presented in Section 12.3 only works for LE inequality constraints by adding slack variables. For a GE inequality constraint, this methodology would give an infeasible solution, i.e., a negative surplus variable. For equality constraints, this method would be inconsistent, e.g., zero equal to a nonzero number. To avoid these difficulties, Dantzig (1959) stated the two phases of his Simplex method.
   In the first phase, artificial variables are added to the inconsistent equations and an artificial objective function (expressed only in terms of artificial variables) is minimized. Artificial variables differ from slack or surplus variables since they do not have physical meaning. At the end of this phase, the artificial cost function should be zero; otherwise the problem does not have a feasible solution.
   The second phase consists of the solution of the original objective function. In more detail, the steps involved in the two phases can be described as follows:

Step 1. Standard LP form: Arrange the original system of equations so that all constant terms bj are nonnegative.

Step 2. Add artificial variables: Augment the system to include a basic set of artificial variables x_{n+1}, . . . , x_{n+m}. In practice, artificial variables are only required for GE and equality (EQ) constraints.

Step 3. Phase I: Use the simplex algorithm to find a solution to the set of linear equations (equality constraints) that minimizes the sum of all artificial variables, denoted by w and given by

   x_{n+1} + x_{n+2} + · · · + x_{n+m} = w.   (12.18)

Equation (12.18) is called the infeasibility form (Dantzig, 1959) or artificial cost or artificial objective function. After pivoting, the coefficients of (12.18) will be referred to as reduced artificial cost coefficients. If min w > 0, no feasible solution exists and the procedure is terminated. On the other hand, if min w = 0, Phase II initiates by: (i) dropping from further consideration all nonbasic variables xi whose corresponding reduced artificial cost coefficients are positive (not zero), and (ii) replacing the linear form w by the linear form f.

Step 4. Phase II: Apply the simplex algorithm to the adjusted feasible canonical form to obtain a solution that minimizes f.

To illustrate this method, let us consider the following example.

Example. Solve the following LP problem:

   min_x  −x1 − x2 − 2x3
   s.t.   2x1 + x2 + 2x3 ≤ 8
          x1 + x2 + x3 ≥ 2
          −x1 + x2 + 2x3 = 1
          x1 ≥ 0, x2 ≥ 0, x3 ≥ 0.

Step 1. Standard LP form. Adding the slack variable x4 and the surplus variable x5, this problem can be written in standard form as

   min_x  −x1 − x2 − 2x3
   s.t.   2x1 + x2 + 2x3 + x4 = 8
          x1 + x2 + x3 − x5 = 2
          −x1 + x2 + 2x3 = 1
          x1 ≥ 0, x2 ≥ 0, x3 ≥ 0, x4 ≥ 0, x5 ≥ 0.

Step 2. Add artificial variables. Artificial variables x6 and x7 are added to the GE and EQ constraints, respectively:

   min_x  −x1 − x2 − 2x3
   s.t.   2x1 + x2 + 2x3 + x4 = 8
          x1 + x2 + x3 − x5 + x6 = 2
          −x1 + x2 + 2x3 + x7 = 1
          x1 ≥ 0, x2 ≥ 0, x3 ≥ 0, x4 ≥ 0, x5 ≥ 0, x6 ≥ 0, x7 ≥ 0.

Step 3. Phase I. The artificial cost function can be stated as

   x6 + x7 = w.

   Initial tableau. This problem can be written as a tableau as in the first sub-table of Table 12.8. The three basic variables in this problem are the slack variable x4 and the artificial variables x6 and x7. However, the objective needs to be expressed in terms of nonbasic variables only.
   Canonical form. Using row operations, one obtains the canonical form of the system and, as a result, the artificial objective is expressed only in terms of nonbasic variables. See the second sub-table in Table 12.8.
   Pivoting. As before, the pivots are selected using the minimum negative function coefficient cj and the minimum positive ratio bj/aji. This process continues until all coefficients in the artificial objective row are nonnegative. When this condition is accomplished and the problem has a feasible solution, the value of the artificial objective is zero. See Table 12.8.

Step 4. Phase II. The original problem can now be solved.
   Initial tableau. The original objective function takes the place of the artificial objective. This forms the initial tableau of the second phase. See the first sub-table in Table 12.9.
   Canonical form. The objective function has to be expressed in terms of nonbasic variables only. This is done using row operations on the initial tableau. See the second sub-table in Table 12.9.
                     Table 12.8: Phase I

   xB    x1     x2     x3    x4   x5     x6     x7     bj     bj/aji
   x4    2      1      2     1    0      0      0      8
   x6    1      1      1     0    −1     1      0      2
   x7    −1     1      2     0    0      0      1      1
   −f    −1     −1     −2    0    0      0      0      0
   −w    0      0      0     0    0      1      1      0

   x4    2      1      2     1    0      0      0      8      4
   x6    1      1      1     0    −1     1      0      2      2
   x7    −1     1      2     0    0      0      1      1      1/2
   −f    −1     −1     −2    0    0      0      0      0
   −w    0      −2     −3    0    1      0      0      −3

   x4    3      0      0     1    0      0      −1     7      7/3
   x6    3/2    1/2    0     0    −1     1      −1/2   3/2    1
   x3    −1/2   1/2    1     0    0      0      1/2    1/2
   −f    −2     0      0     0    0      0      1      1
   −w    −3/2   −1/2   0     0    1      0      3/2    −3/2

   x4    0      −1     0     1    2      −2     0      4
   x1    1      1/3    0     0    −2/3   2/3    −1/3   1
   x3    0      2/3    1     0    −1/3   1/3    1/3    1
   −f    0      2/3    0     0    −4/3   4/3    1/3    3
   −w    0      0      0     0    0      1      1      0

   Pivoting. The pivots are selected in the usual way and the problem is solved. However, the artificial variables should remain nonbasic during the second phase. The artificial variables are used to expand the dimension of the design domain and make the problem feasible in the first phase, but then they become nonbasic, reducing the dimension of the space to the original (standard) size. Therefore, these variables should not be used as pivots in the second phase. The solution to this problem is found at x1 = 7/3, x2 = 0, x3 = 5/3, and f = −17/3.
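This result can be checked directly with linprog (a sketch; the GE constraint is flipped into LE form):

f = [-1 -1 -2]';
A = [ 2  1  2;                    % 2x1 + x2 + 2x3 <= 8
     -1 -1 -1];                   % x1 + x2 + x3 >= 2, flipped to LE
b = [8; -2];
Aeq = [-1 1 2]; beq = 1;          % -x1 + x2 + 2x3 = 1
[x,fval] = linprog(f, A, b, Aeq, beq, zeros(3,1), []);
% Expect x = (7/3, 0, 5/3) and fval = -17/3.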

                     Table 12.9: Phase II

   xB    x1    x2     x3   x4    x5     x6    x7     bj     bj/aji
   x4    0     −1     0    1     2      −2    0      4      2
   x1    1     1/3    0    0     −2/3   2/3   −1/3   1
   x3    0     2/3    1    0     −1/3   1/3   1/3    1
   −f    0     2/3    0    0     −4/3   4/3   1/3    3

   x5    0     −1/2   0    1/2   1      −1    0      2
   x1    1     0      0    1/3   0      0     −1/3   7/3
   x3    0     1/2    1    1/6   0      0     1/3    5/3
   −f    0     0      0    2/3   0      0     1/3    17/3

12.6 The Big M method

The Big M method is a penalization approach that combines the two phases of the Simplex method. In this method, the objective function is augmented by adding all the artificial variables multiplied by a large constant M. This is

   min_x  F(x) = f(x) + M Σ_a x_a,   (12.19)

where x_a represents the artificial variables. Since these variables are basic, they need to be eliminated from the objective function. To illustrate this method, let us consider the following example.

Example. Solve the LP problem from the previous example using the Big M method:

   min_x  f(x) = −x1 − x2 − 2x3
   s.t.   2x1 + x2 + 2x3 ≤ 8
          x1 + x2 + x3 ≥ 2
          −x1 + x2 + 2x3 = 1
          x1 ≥ 0, x2 ≥ 0, x3 ≥ 0.

Step 1. Penalized standard LP problem. The standard LP problem with the penalized objective can be written as

   min_x  F(x) = −x1 − x2 − 2x3 + M(x6 + x7)
   s.t.   2x1 + x2 + 2x3 + x4 = 8
          x1 + x2 + x3 − x5 + x6 = 2   (12.20)
          −x1 + x2 + 2x3 + x7 = 1
          xi ≥ 0,  i = 1, . . . , 7.

Step 2. Solve using Simplex. Let us take a large enough value for M. If M is too small, the result might be inconsistent. If it is too large, rounding off will result in large errors. As a rule of thumb, let M be ten times the sum of the absolute values of the objective function coefficients, this is, M = 40. Then, let us solve the LP problem as shown in Table 12.10. Notice that the number of iterations is not decreased.

                     Table 12.10: Big M method

   xB    x1     x2     x3     x4    x5     x6      x7      bj
   x4    2      1      2      1     0      0       0       8
   x6    1      1      1      0     −1     1       0       2
   x7    −1     1      2      0     0      0       1       1
   −F    −1     −1     −2     0     0      40      40      0

   x4    2      1      2      1     0      0       0       8
   x6    1      1      1      0     −1     1       0       2
   x7    −1     1      2      0     0      0       1       1
   −F    −1     −81    −122   0     40     0       0       −120

   x4    3      0      0      1     0      0       −1      7
   x6    3/2    1/2    0      0     −1     1       −1/2    3/2
   x3    −1/2   1/2    1      0     0      0       1/2     1/2
   −F    −62    −20    0      0     40     0       61      −59

   x4    0      −1     0      1     2      −2      0       4
   x1    1      1/3    0      0     −2/3   2/3     −1/3    1
   x3    0      2/3    1      0     −1/3   1/3     1/3     1
   −F    0      2/3    0      0     −4/3   124/3   121/3   3

   x5    0      −1/2   0      1/2   1      −1      0       2
   x1    1      0      0      1/3   0      0       −1/3    7/3
   x3    0      1/2    1      1/6   0      0       1/3     5/3
   −F    0      0      0      2/3   0      40      121/3   17/3

12.7 Duality

Every linear programming problem has a corresponding dual linear programming problem. This dual problem is constructed with the cost and constraint coefficients of the initial or primal problem. The solution of the dual problem can be obtained from the solution of the primal and vice versa. On certain occasions it might be simpler to solve the dual formulation. Duality can be used to improve the performance of the simplex algorithm, and also to develop non-simplex methods to solve LP problems. Examples of non-simplex methods include Khachiyan's algorithm and Karmarkar's algorithm. These algorithms are not presented in these notes. For an in-depth discussion of non-simplex methods and advanced aspects of duality, see Luenberger (1989).
   Consider a primal LP problem of the form

   min_x  c^T x
   s.t.   Ax ≥ b
          x ≥ 0.

Then the corresponding dual LP problem is defined as

   max_λ  λ^T b
   s.t.   λ^T A ≤ c^T
          λ ≥ 0,
where λ ∈ Rm is the vector of dual variables. This form of duality is referred to as the symmetric form of duality.
   Now, consider a primal LP problem of the form

   min_x  c^T x
   s.t.   Ax = b
          x ≥ 0.

Notice that Ax = b can be expressed as

   Ax ≥ b
   −Ax ≥ −b.

Then the primal problem can be written as

   min_x  c^T x
   s.t.   [A; −A] x ≥ [b; −b]
          x ≥ 0.

This problem is in the form of the primal problem in the symmetric form of duality. The corresponding dual is

   max_{u,v}  (u^T  v^T) [b; −b]
   s.t.       (u^T  v^T) [A; −A] ≤ c^T
              u ≥ 0, v ≥ 0.

After a simple manipulation, the above can be expressed as

   max_{u,v}  (u − v)^T b
   s.t.       (u − v)^T A ≤ c^T
              u ≥ 0, v ≥ 0.

Let λ = u − v. Then the dual problem becomes

   max_λ  λ^T b
   s.t.   λ^T A ≤ c^T.

Notice that the vector of dual variables λ is unrestricted in sign. The above form of duality is referred to as the asymmetric form of duality.

Example. Consider the following LP problem:

   max_x  2x1 + 5x2 + x3
   s.t.   2x1 − x2 + 7x3 ≤ 6
          3x1 + 6x2 + x3 ≤ 3
          x1, x2, x3 ≥ 0.

The dual form of this problem is

   min_λ  6λ1 + 3λ2
   s.t.   2λ1 + 3λ2 ≥ 2
          −λ1 + 6λ2 ≥ 5
          7λ1 + λ2 ≥ 1
          λ1, λ2 ≥ 0.

According to the complementary slackness condition, the feasible solutions x and λ to a dual pair of problems (either in symmetric or asymmetric form) are optimal if and only if the following conditions are satisfied:

   (c^T − λ^T A) x = 0, and   (12.21)
   λ^T (Ax − b) = 0.   (12.22)

Let us show part of the proof for the asymmetric case, from which the proof for the symmetric form can be easily derived. One can observe that, at optimality, c^T x − λ^T b = 0. Since Ax − b ≥ 0 and λ ≥ 0, one has that

   (c^T − λ^T A) x = λ^T (b − Ax) ≤ 0.

On the other hand, since λ^T A − c^T ≤ 0 and x ≥ 0, one has that (c^T − λ^T A) x ≥ 0. Hence (c^T − λ^T A) x = 0.
   For further reference see Chong & Żak (2001).

Example. Consider the following LP problem:

   min_x  −2x1 − x2 − 7x3 − 4x4
   s.t.   x1 + x2 + x3 + x4 = 26
          x1, x2, x3, x4 ≥ 0.

The dual problem is

   max_λ  26λ
   s.t.   λ ≤ −2
          λ ≤ −1
          λ ≤ −7
          λ ≤ −4.

The solution is λ = −7. By the complementary slackness condition, (−(2 1 7 4) − (−7)(1 1 1 1)) x = 0. Then the solution is obtained for

   (1 1 1 1) x = 26,  x ≥ 0,  (5 6 0 3) x = 0,

which leads to x∗ = (0, 0, 26, 0)^T.
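A quick check of this primal solution with linprog (a sketch):

c = [-2 -1 -7 -4]';
x = linprog(c, [], [], [1 1 1 1], 26, zeros(4,1), []);
% Returns x = (0, 0, 26, 0)', consistent with complementary slackness.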

Exercises
1. Convert the following problem to a linear program in standard form:

   min_x   |x1| + |x2| + |x3|
   s.t.    x1 + x2 ≤ 1
           2x1 + x3 = 3.

2. Design a diet of bread and milk to get at least 5 units of vitamin A and 4 units of vitamin B each day.
   The amounts of vitamins A and B in 1 kg of each food and the cost per kilogram of food are given
   in the following table. Formulate the design optimization problem so that at least the basic
   requirements of vitamins are met at minimum cost. Solve the problem graphically and show the gradients of
   the cost function and the active constraints on the graph. Solve the problem analytically, verifying the
   KKT conditions at all possible solution points (e.g., interior point, one active constraint, two active
   constraints). Solve the problem using the simplex method. Compare your result with the one obtained
   using linprog from MATLAB.

   Vitamin    Bread   Milk
   A            1       2
   B            3       2
   Cost/kg      2       1
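As a hint for the last part, a minimal linprog sketch (assuming the Optimization Toolbox, with x(1) and x(2) the kilograms of bread and milk):

f  = [2; 1];              % cost per kg of bread and milk
A  = -[1 2; 3 2];         % vitamin requirements A*x >= b written as -A*x <= -b
b  = -[5; 4];
lb = [0; 0];
[x, cost] = linprog(f, A, b, [], [], lb, []);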

Chapter 13

Nonlinear programming

13.1 Quadratic programming

Let us consider the following quadratic programming (QP) problem:

min_x   c^T x + (1/2) x^T Q x
s.t.    Ax ≤ b
        x ≥ 0,

where Q is a symmetric, positive definite matrix. Adding slack variables y, the problem above can be written
as

min_x   c^T x + (1/2) x^T Q x
s.t.    Ax + y = b                                          (13.1)
        x ≥ 0
        y ≥ 0.
The Lagrangian of problem (13.1) is

L(x, y, u, v, w) = c^T x + (1/2) x^T Q x + u^T (Ax + y − b) + v^T (−x) + w^T (−y),

where u, v, and w are vectors of Lagrange multipliers associated with the constraints. The corresponding
KKT conditions can be expressed as

c + Qx + A^T u − v = 0                                      (13.2)
u − w = 0                                                   (13.3)
Ax + y − b = 0                                              (13.4)
x ≥ 0                                                       (13.5)
y ≥ 0                                                       (13.6)
v ≥ 0                                                       (13.7)
w ≥ 0                                                       (13.8)
v^T x + w^T y = 0.                                          (13.9)
Using (13.3) and (13.8) to eliminate w (since w = u), this set of equations can be stated as

Qx + A^T u − v = −c                                         (13.10)
Ax + y = b                                                  (13.11)
x ≥ 0                                                       (13.12)
y ≥ 0                                                       (13.13)
v ≥ 0                                                       (13.14)
u ≥ 0                                                       (13.15)
v^T x + u^T y = 0.                                          (13.16)

Notice that the conditions (13.10) to (13.15) are linear in the variables x, y, u, and v. The only nonlinear
condition is the complementarity condition (13.16). Thus the solution of the original QP problem corresponds
to the nonnegative solution of (13.10) and (13.11) that also satisfies (13.16).

Since Q is positive definite, the objective function in (13.1) is strictly convex. Furthermore, the constraints
are all linear, so the problem is strictly convex. Therefore, any local minimum of (13.1) is the
strict global minimum, and the solution of the system (13.10) to (13.16) must be unique.
Following the procedure presented by Wolfe (1959) and referenced by Rao (2009), phase I of the simplex
method can be used to solve the aforementioned problem. This procedure involves the introduction of
nonnegative artificial variables z into (13.10) so that

Qx + A^T u − v + z = −c,                                    (13.17)

where the coefficients on the right-hand side are all positive. A change of sign in some rows may be required
before the addition of the artificial variables. The artificial cost function to be minimized is the sum of the
n artificial variables zi. The LP problem to be solved can be stated as

min_{x,y,z,u,v}   Σ_{i=1}^n zi
s.t.    Qx + A^T u − v + z = −c
        Ax + y = b                                          (13.18)
        x, y, z, u, v ≥ 0.

During the solution procedure, one verifies that the nonlinear condition (13.16) is satisfied. This additional
check is a rather simple task performed when deciding which non-basic variable will become basic. Let us
illustrate this with the following example.

Example. Consider the following QP problem:

min_x   x1^2 + x2^2 − x1x2 − 6x1 + 5x2
s.t.    x1 + x2 ≤ 4
        3x1 + 6x2 ≤ 20
        x1, x2 ≥ 0.

The above can be formulated as

min_{x,y}   (−6  5) x + (1/2) x^T [ 2  −1 ; −1  2 ] x
s.t.        [ 1  1 ; 3  6 ] x + y = (4, 20)^T
            x, y ≥ 0.

Using (13.18), the LP problem to be solved is

min_{x,y,z,u,v}   z1 + z2
s.t.    [ 2  −1 ; −1  2 ] x + [ 1  3 ; 1  6 ] u − v + (z1, −z2)^T = (6, −5)^T
        [ 1  1 ; 3  6 ] x + y = (4, 20)^T
        x, y, z, u, v ≥ 0,

verifying that

v1x1 + v2x2 + u1y1 + u2y2 = 0.                              (13.19)
The above problem is solved using the simplex method as shown in Table 13.1. In the initial tableau all the
coefficients on the right-hand side are positive and the artificial variables z1 and z2 are basic. Any other
variables can become basic as long as (13.19) remains satisfied; in other words, avoiding that (v1, x1), or
(v2, x2), or (u1, y1), or (u2, y2) are simultaneously basic (non-zero) variables. The solution of this problem is
x1* = 3, x2* = 0, and f* = −9. The Lagrange multipliers associated with the original inequality constraints
are zero, that is, u1 = u2 = 0.
Table 13.1: Quadratic programming using simplex

xB    x1     x2   y1   y2    z1   z2    u1    u2    v1   v2    bj   bj/aji
z1     2     −1    0    0     1    0     1     3    −1    0     6
z2     1     −2    0    0     0    1    −1    −6     0    1     5
y1     1      1    1    0     0    0     0     0     0    0     4
y2     3      6    0    1     0    0     0     0     0    0    20
−w     0      0    0    0     1    1     0     0     0    0     0

z1     2     −1    0    0     1    0     1     3    −1    0     6    3
z2     1     −2    0    0     0    1    −1    −6     0    1     5    5
y1     1      1    1    0     0    0     0     0     0    0     4    4
y2     3      6    0    1     0    0     0     0     0    0    20    6.7
−w    −3      3    0    0     0    0     0     3     1   −1   −11

x1     1   −0.5    0    0   0.5    0   0.5   1.5  −0.5    0     3
z2     0   −1.5    0    0  −0.5    1  −1.5  −7.5   0.5    1     2
y1     0    1.5    1    0  −0.5    0  −0.5  −1.5   0.5    0     1
y2     0    7.5    0    1  −1.5    0  −1.5  −4.5   1.5    0    11
−w     0    1.5    0    0   1.5    0   1.5   7.5  −0.5   −1    −2

x1     1   −0.5    0    0   0.5    0   0.5   1.5  −0.5    0     3
v2     0   −1.5    0    0  −0.5    1  −1.5  −7.5   0.5    1     2
y1     0    1.5    1    0  −0.5    0  −0.5  −1.5   0.5    0     1
y2     0    7.5    0    1  −1.5    0  −1.5  −4.5   1.5    0    11
−w     0      0    0    0     1    1     0     0     0    0     0
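For verification, the same QP can be solved directly with MATLAB's quadprog; a minimal sketch, assuming the Optimization Toolbox is available:

Q  = [2 -1; -1 2];          % so that (1/2)x'Qx = x1^2 + x2^2 - x1*x2
c  = [-6; 5];
A  = [1 1; 3 6];  b = [4; 20];
lb = [0; 0];
[x, fval] = quadprog(Q, c, A, b, [], [], lb, []);
% expected: x = (3, 0) and fval = -9, in agreement with the tableau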

This formulation works for LE inequality constraints. If there is an EQ constraint h1 = b1, then it can
be replaced by two inequality constraints, h1 ≤ b1 and h1 ≥ b1. One can check that a system of m EQ
constraints

h1 = b1
h2 = b2
...
hm = bm

is equivalent to the following system of m + 1 inequality constraints:

h1 ≤ b1
h2 ≤ b2
...
hm ≤ bm
h1 + h2 + · · · + hm ≥ b1 + b2 + · · · + bm.

13.2 Zoutendijk's method of feasible directions

13.2.1 Search direction

Zoutendijk developed the method of feasible directions in 1960. This method is very robust and still popular.
It was proposed to solve problems with inequality constraints only, of the form

min_x   f(x)
s.t.    gj(x) ≤ 0,  j = 1, . . . , r.                       (13.20)

The strategy is to find a search direction d that is descent, ∇f^T d < 0, and feasible, ∇gj^T d ≤ 0 for j ∈ J.
In this formulation, J is the active set defined as

J = {j : gj(x) + ε ≥ 0, j = 1, . . . , r},                  (13.21)

where ε is a parameter that defines the "thickness" of the active constraints. The problem of finding a
descent and feasible direction can be formulated as the following subproblem:

min_d   max{∇f^T d, ∇gj^T d for j ∈ J}
s.t.    ∇f^T d < 0                                          (13.22)
        ∇gj^T d ≤ 0,  j ∈ J.

This can be written as

min_{β,d}   β
s.t.    ∇f^T d ≤ β
        ∇gj^T d ≤ θj β,  j ∈ J                              (13.23)
        −1 ≤ di ≤ 1,  i = 1, . . . , n
        β ≤ 0.

This problem imposes a constraint on the magnitude of di to ensure a bounded solution. If β* < 0, then
d* can improve the current point. If β* = 0, then d* does not exist and the current design is a KKT point.
In (13.23), θj is a "push-off" factor used by Zoutendijk to control the angle between the optimum search
direction d* and the tangent of the active constraint gj. If θj = 0, then it is possible for d* to be tangent to
the constraint, that is, ∇gj^T d* = 0. This is permissible only for linear constraints. In general, θj > 0, and by
default θj = 1.

13.2.2 Standard form

The problem (13.23) can be expressed in LP standard form using the change of variables d̂ = d + 1
and β = −β̂. Assuming the default push-off factors θj = 1, the constraint ∇f^T d ≤ β can be written as

∇f^T (d̂ − 1) ≤ −β̂
∇f^T d̂ + β̂ ≤ ∇f^T 1
∇f^T d̂ + β̂ ≤ Σ_{i=1}^n ∂f/∂xi.

In the same way, the constraint ∇gj^T d ≤ β can be written as

∇gj^T d̂ + β̂ ≤ Σ_{i=1}^n ∂gj/∂xi,  j ∈ J.

In LP standard form, (13.23) is

min_{β̂,d̂}   −β̂
s.t.    ∇f^T d̂ + β̂ ≤ Σ_{i=1}^n ∂f/∂xi
        ∇gj^T d̂ + β̂ ≤ Σ_{i=1}^n ∂gj/∂xi,  j ∈ J            (13.24)
        0 ≤ d̂i ≤ 2,  i = 1, . . . , n
        β̂ ≥ 0.

13.2.3 Step size

The next stage is to determine the step size α. This one-dimensional optimization problem can be stated as

min_α   f(α) = f(xk + α dk)                                 (13.25)
s.t.    gj(α) = gj(xk + α dk) ≤ 0,  j = 1, . . . , r,

where dk is the solution of (13.24). One approach is to determine the upper limit αU for the step size. A
first approximation to this value, α1U, is obtained from the limits of the design variables,

xL ≤ xk + α1U dk ≤ xU.

Then, one evaluates all constraints gj, j = 1, . . . , r, at xk + α1U dk. If the point is feasible, then αU = α1U.
Otherwise, one has to find the upper limit αU of the step size using the bisection method or another zero-finding
procedure.

Having determined the upper limit αU, the step size αk is the one that drives the slope of
f to zero. If f′(αU) < 0, then αk = αU. If f′(αU) > 0, then the minimum is located in the interval [0, αU]
and can be found using the bisection method.

13.2.4 Initial feasible point

This method needs a feasible starting point. A simple and effective approach to obtaining such a point x0
is to solve the following optimization subproblem:

min_x   f̂(x) = Σ_{j=1}^r (max{0, gj(x) + ε})^2             (13.26)
s.t.    xL ≤ x ≤ xU.
13.2.5 Equality constraints

To incorporate equality constraints, Vanderplaats suggests the use of penalty functions. The problem is
handled as

min_x   f(x) − r Σ_{j=1}^l hj(x)
s.t.    gi(x) ≤ 0,  i = 1, . . . , m                        (13.27)
        hj(x) ≤ 0,  j = 1, . . . , l.

Another approach is to convert an equality constraint into two inequality constraints. For example,

h(x) = b

can be approximated by

(9/10) b ≤ h(x) ≤ (10/9) b.

13.2.6 Algorithm

Step 1. Determine a feasible starting point x0. Use (13.26) if necessary.

Step 2. Determine the active set J of active constraints.

Step 3. Solve the LP problem (13.24) for β* and dk = d*.

(a) If β* = 0, the current point is a KKT point; stop.
(b) Otherwise, perform a line search: determine αU and hence αk. Update the current point

xk+1 = xk + αk dk

and return to Step 2.

Example. Solve

max_x   −x1^2 + 3x1x2 − 4.5x2^2 + 10x1 + 6x2
s.t.    x1 − x2 ≤ 3
        x1 + 2x2 ≤ 12
        x1, x2 ≥ 0

from the initial point x0 = (4, 4)^T. This problem can be expressed as

min_x   f(x) = x1^2 − 3x1x2 + 4.5x2^2 − 10x1 − 6x2
s.t.    g1(x) = x1 − x2 − 3 ≤ 0
        g2(x) = x1 + 2x2 − 12 ≤ 0
        x1, x2 ≥ 0.

At the initial point, f(x0) = −24, g1(x0) = −3, and g2(x0) = 0. This point is feasible and g2 is
active. The gradients of the objective function and the active constraint evaluated at x0 are

∇f(x0) = (−14, 18)^T,   ∇g2(x0) = (1, 2)^T.

A descent and feasible search direction can be obtained by solving the LP problem

min_{β̂,d̂}   (0  0  −1) (d̂1, d̂2, β̂)^T
s.t.    [ −14  18  1 ; 1  2  1 ] (d̂1, d̂2, β̂)^T ≤ (4, 3)^T    (13.28)
        (0, 0, 0)^T ≤ (d̂1, d̂2, β̂)^T ≤ (2, 2, ∞)^T.

The solution of this problem is d̂1 = 0, d̂2 = 0, and β̂ = 3; that is, d1 = −1, d2 = −1, and β = −3.
The step size is found as

α0 = arg min_α  f(α) = 2.5α^2 − 4α − 24,

which gives α0 = 0.8. The candidate next point is (3.2, 3.2)^T. One can check that this point is feasible,
therefore x1 = (3.2, 3.2)^T. At the new point there is no active constraint, so one can use the steepest
descent direction. The solution of this problem is x1* = 6, x2* = 3, and f* = −55.5 with the two active constraints,
g1 = 0 and g2 = 0.
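The direction-finding LP (13.28) can also be solved numerically; a minimal linprog sketch (assuming the Optimization Toolbox), with variables ordered as (d̂1, d̂2, β̂):

f  = [0; 0; -1];                   % minimize -betahat
A  = [-14 18 1;                    % row for grad f
        1  2 1];                   % row for grad g2 (the active constraint)
b  = [4; 3];                       % sums of the gradient components
lb = [0; 0; 0];  ub = [2; 2; Inf];
sol = linprog(f, A, b, [], [], lb, ub);
d = sol(1:2) - 1;                  % undo dhat = d + 1; expected d = (-1,-1)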

13.3 Generalized reduced gradient method

13.3.1 Search direction

The generalized reduced gradient (GRG) method was proposed by Wolfe in 1967. The method is stated
for nonlinear optimization problems with equality constraints,

min_x   f(x)
s.t.    hi(x) = 0,  i = 1, . . . , m                        (13.29)
        xL ≤ x ≤ xU.

If there is an inequality constraint gj(x) ≤ 0, then it can be converted to an equality through the addition
of a slack variable,

gj(x) + xn+1 = 0,   0 ≤ xn+1 ≤ ∞ (a large number, in practice).     (13.30)
The first step in GRG is to partition x into m dependent variables y and (n − m) independent variables
z,

x = (y, z).                                                 (13.31)

Let x be a regular point. Then the m × n Jacobian matrix of h, defined as

∂h/∂x = [ ∂h1/∂x1  ∂h1/∂x2  ...  ∂h1/∂xn
          ∂h2/∂x1  ∂h2/∂x2  ...  ∂h2/∂xn
             ...      ...    ...    ...
          ∂hm/∂x1  ∂hm/∂x2  ...  ∂hm/∂xn ],                 (13.32)

has m linearly independent rows; in other words, rank(Dh) = m. Now, let us partition this matrix as

∂h/∂x = ( ∂h/∂y   ∂h/∂z ) = ( B   C ),                      (13.33)

where B is an m × m nonsingular matrix and C is an m × (n − m) matrix. The nonsingularity of B allows
us to make use of the implicit function theorem. The theorem states that there is a small neighborhood of
x such that for z in this neighborhood, y = y(z) is a differentiable function of z. In other words, B can be
inverted.
Conveniently, the objective function can be expressed as f(x) = f(y, z) and the constraints can be
expressed as h(x) = h(y, z) = 0. Since y is a function of z, both f and h depend only on z. The gradient of
f in the (n − m)-dimensional z-space is called the reduced gradient. The reduced gradient can be expressed
as

R^T = df/dz = (∂f/∂y)(dy/dz) + ∂f/∂z.                       (13.34)

For the equality constraints, h = 0, one observes that

dh/dz = (∂h/∂y)(dy/dz) + ∂h/∂z = B (dy/dz) + C = 0.         (13.35)

Solving for dy/dz yields

dy/dz = −B^{−1} C.                                          (13.36)

Substituting (13.36) into (13.34), the reduced gradient can be expressed as

R^T = −(∂f/∂y) B^{−1} C + ∂f/∂z.                            (13.37)

The search direction d is partitioned as

d = (dy, dz),                                               (13.38)

where dy and dz represent the components of d in the y- and z-spaces, respectively. In particular, the direction
dz is chosen to be the steepest descent direction in z-space, that is, dz = −R. However, since each
component zi is bounded between ziL and ziU for i = 1, . . . , n − m, one applies the following rule:

dzi = 0      if zi = ziL and Ri > 0
dzi = 0      if zi = ziU and Ri < 0                         (13.39)
dzi = −Ri    otherwise.

In practice, one can implement the checks zi < ziL + εz and zi > ziU − εz rather than zi = ziL and zi = ziU,
respectively, where εz is a small value.

Given dz, dy can be obtained from (13.36) as follows:

dy = −B^{−1} C dz.                                          (13.40)

If d = 0, then the current point is a KKT point and the iterations are terminated. Otherwise, the algorithm
performs a line search to determine the step size along the search direction d.

13.3.2 Step size

Given a search direction dk, the step size αk is determined from the following optimization problem:

min_α   f(xk + α dk)                                        (13.41)
s.t.    xL ≤ xk + α dk ≤ xU.

One procedure is to solve the unconstrained optimization problem and then determine whether the new point
xk+1 = xk + αk dk satisfies the geometric constraints. If the new point violates these constraints, then the step
size is obtained from

xL ≤ xk + αk dk ≤ xU.

Now, one has to verify that the new point satisfies the equality constraints of the original optimization
problem. If it does not, then a correction step is required.

13.3.3 Correction

If all the constraints h were linear, then xk+1 would be a new improved point. However, with nonlinear constraints,
xk+1 may not be a feasible point, that is,

max_{i=1,...,m} {|hi(xk+1)|} > εh,

where εh is a tolerance that determines the violation of the constraints. When this occurs, one has to adjust
the dependent variables y to return to the feasible region while keeping the independent variables fixed. The
problem to be solved can be written as

h(y, zk+1) = 0.                                             (13.42)

To this end, the algorithm may use any zero-finding technique such as the Newton-Raphson method.
Starting from y^(0)_{k+1}, then

y^(r+1)_{k+1} = y^(r)_{k+1} + ∆y^(r)_{k+1},                 (13.43)

where

h(y^(r+1)_{k+1}, z_{k+1}) ≈ h(y^(r)_{k+1}, z_{k+1}) + (∂h(y^(r)_{k+1}, z_{k+1})/∂y) ∆y^(r)_{k+1} = 0     (13.44)

or

∆y^(r)_{k+1} = −[∂h^(r)_{k+1}/∂y]^{−1} h^(r)_{k+1}.         (13.45)
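A minimal sketch of this correction loop, assuming function handles h(y,z) for the constraints and dhdy(y,z) for their Jacobian with respect to the dependent variables (the names are illustrative):

function y = grg_correct(h, dhdy, y, z, epsh, maxit)
% Newton-Raphson correction of the dependent variables, eqs. (13.43)-(13.45)
for r = 1:maxit
    hv = h(y, z);
    if max(abs(hv)) <= epsh, break; end   % feasible again: stop
    y = y - dhdy(y, z) \ hv;              % Delta y = -[dh/dy]^{-1} h
end
end

For the example below one would pass, e.g., h = @(y,z) [2*y(1)-y(2)^2-1-z(1); 0.8*y(1)^2+2*y(2)-9+z(2)].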
13.3.4 Algorithm

Step 1. Choose a feasible starting point x0.

Step 2. Determine the basic and nonbasic variables. In general, the variables in the objective function
might be selected as basic variables. On some occasions these variables should be determined by
performing pivoted Gaussian elimination on the matrix ∇h^T. Determine B and C and evaluate R.

Step 3. Determine the direction vector dk from (13.38). If dk = 0, then stop since the current point is a
KKT point.

Step 4. Determine the step size αk from (13.41) and determine a candidate design xk+1 = xk + αk dk.

Step 5. If xk+1 is feasible, then set xk = xk+1 and go to Step 2. Otherwise, perform the Newton-Raphson
correction to return to feasibility. If f(xk+1) < f(xk), then set xk = xk+1 and go to Step 2. If
f(xk+1) > f(xk), then set αk = αk/2 to obtain a new candidate point and go to the beginning of this step.
Example. Solve the following NLP problem:

min_x   f(x) = (x1 − 3)^2 − (x2 − 3)^2
s.t.    h1(x) = 2x1 − x2^2 − 1 − x3 = 0
        h2(x) = 0.8x1^2 + 2x2 − 9 + x4 = 0
        x3 ≥ 0,  x4 ≥ 0,

from the initial point x1 = 1, x2 = 1, x3 = 0, and x4 = 6.2.

This initial point is feasible. Let us select x1 and x2 as basic or dependent variables; then x3 and x4 will
be independent or nonbasic variables. Then

B = ∂h/∂y = ∂h/∂(x1, x2) = [ 2      −2x2
                             1.6x1   2  ]

and

C = ∂h/∂z = ∂h/∂(x3, x4) = [ −1  0
                              0  1 ].

Also,

∂f/∂y = ∂f/∂(x1, x2) = ( 2(x1 − 3),  −2(x2 − 3) )

and

∂f/∂z = ∂f/∂(x3, x4) = ( 0  0 ).

The reduced gradient at the initial point is

R^T = ∂f/∂z − (∂f/∂y) B^{−1} C = ( −2  0 ).

The search direction in the reduced z-space is

dz = (2, 0)^T,

and

dy = −B^{−1} C dz = (0.5556, −0.4444)^T.

For α = 1,

x1 = (1, 1, 0, 6.2)^T + (0.5556, −0.4444, 2, 0)^T = (1.5556, 0.5556, 2, 6.2)^T.

Since h1 = −0.197531 and h2 = 0.246914, this point is not feasible. Using Newton-Raphson from y2^(0) =
(1.5556, 0.5556)^T, one obtains y2^(1) = (1.5734, 0.409895)^T, for which h = (−0.0212171, 0.000254688)^T.
In the next Newton-Raphson iteration, y2^(2) = (1.58036, 0.401002)^T and h = (0, 0)^T. This is the
end of the first iteration, so the corrected point is

x1 = (1.58036, 0.401002, 2, 6.2)^T.

13.4 Sequential Linear Programming

13.4.1 Description

Sequential linear programming (SLP) methods solve an LP problem at each iteration. An NLP problem can
be converted into an LP problem by linearizing the objective function and the constraints about the current
point xk. Consider the NLP problem

min_x   f(x)
s.t.    hi(x) = 0,  i = 1, . . . , m                        (13.46)
        gj(x) ≤ 0,  j = 1, . . . , r.

The linearized problem about xk can be written as

min_x   fL(x) = f(xk) + ∇f(xk)^T (x − xk)
s.t.    hLi(x) = hi(xk) + ∇hi(xk)^T (x − xk) = 0,  i = 1, . . . , m     (13.47)
        gLj(x) = gj(xk) + ∇gj(xk)^T (x − xk) ≤ 0,  j = 1, . . . , r.

The first step of the SLP algorithm is to find the search direction dk. From the problem above, one can
state dk as the solution of

min_d   fL(d) = f(xk) + ∇f(xk)^T d
s.t.    hLi(d) = hi(xk) + ∇hi(xk)^T d = 0,  i = 1, . . . , m
        gLj(d) = gj(xk) + ∇gj(xk)^T d ≤ 0,  j = 1, . . . , r            (13.48)
        −dkL ≤ d ≤ dkU,

where the bounds −dkL and dkU are referred to as move limits. Without the move limits, dk
may be unbounded, or its value may be so large that it invalidates the linear approximation.

Usually the move limits are selected as some fraction of the current design variable values. If the
resulting LP problem turns out to be infeasible, the move limits have to be relaxed (i.e., larger
changes in the design allowed) and the subproblem solved again. As in trust-region methods, the move
limits might be adjusted at every iteration.

13.4.2 Algorithm

Step 1. Set k = 0. Define an initial feasible point x0 and tolerances εc and εd for the constraint violation and
the changes in the design variables.

Step 2. Evaluate the function, the constraints, and their gradients at the current design point xk.

Step 3. Define the LP subproblem as in (13.48). Select the proper move limits.

Step 4. Solve for dk and obtain xk+1 = xk + dk.

Step 5. Check for convergence: if gj(xk+1) ≤ εc for j = 1, . . . , r, |hi(xk+1)| ≤ εc for i = 1, . . . , m, and
‖dk‖ ≤ εd, then stop. Otherwise, set k = k + 1 and go to Step 2.
Example. Consider the following NLP problem:

min_x   f(x) = −x1^2 + x2^2 − 6x1 − 8x2 + 10
s.t.    g1(x) = 4x1^2 + x2^2 ≤ 0
        g2(x) = 3x1 + 5x2 ≤ 0
        g3(x) = −x1 ≤ 0
        g4(x) = −x2 ≤ 0,

and the initial point x0 = (1, 1)^T. Complete one iteration of the SLP method using ±2 as the move limits.

At x0, ∇f(x0) = (−8, −6)^T, g1(x0) = 5, and g2(x0) = 8, so the LP subproblem (13.48) is

min_d   (−8  −6) (d1, d2)^T
s.t.    [ 8  2 ; 3  5 ; −1  0 ; 0  −1 ] (d1, d2)^T ≤ (−5, −8, 1, 1)^T
        (−2, −2)^T ≤ (d1, d2)^T ≤ (2, 2)^T.

Using the simplex method, the new direction is d0 = (−1, −1)^T and the new point is x1 = (0, 0)^T.
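This LP subproblem can be checked with linprog; a sketch, assuming the Optimization Toolbox:

gradf = [-8; -6];                 % grad f(x0)
A  = [ 8  2;                      % grad g1(x0)'
       3  5;                      % grad g2(x0)'
      -1  0;                      % grad g3(x0)'
       0 -1];                     % grad g4(x0)'
b  = [-5; -8; 1; 1];              % -g(x0)
lb = [-2; -2];  ub = [2; 2];      % move limits
d0 = linprog(gradf, A, b, [], [], lb, ub);   % expected d0 = (-1,-1)
x1 = [1; 1] + d0;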

13.5 Sequential quadratic programming

13.5.1 Equivalence with Newton's method

Sequential quadratic programming (SQP) methods generate design improvements by solving QP subproblems.
This approach can be used both in line search and trust-region frameworks (Nocedal & Wright, 1999).
SQP methods deliver a high order of convergence and are suitable for solving problems with significant
nonlinearities. Consider the equality-constrained problem

min_x   f(x)
s.t.    hi(x) = 0,  i = 1, . . . , m.                       (13.49)

One of the simplest derivations of SQP methods states them as an application of Newton's method to
the KKT conditions of the NLP stated above. The Lagrangian of this problem is

L(x, λ) = f(x) + λ^T h(x),

and the KKT conditions can be stated as

∂L(x, λ)/∂x = ∂f(x)/∂x + λ^T ∂h(x)/∂x = 0^T
∂L(x, λ)/∂λ = h(x)^T = 0^T,

or

F(x, λ) = ( ∂L(x, λ)/∂(x, λ) )^T = ( ∇f(x) + A(x)^T λ ; h(x) ) = 0,     (13.50)

where

A(x)^T = ( ∂h(x)/∂x )^T = ∇h(x) = ( ∇h1(x), . . . , ∇hm(x) ).
One approach to solving this problem is Newton's method. The Jacobian of (13.50) is given by

∂F(x, λ)/∂(x, λ) = [ W(x, λ)   A(x)^T
                     A(x)      0     ],                     (13.51)

where W denotes the Hessian of the Lagrangian with respect to x,

W(x, λ) = ∇x^2 L(x, λ) = ∇^2 f(x) + Σ_{i=1}^m λi ∇^2 hi(x).

The Newton step from the point (xk, λk) is thus given by

( xk+1 ; λk+1 ) = ( xk ; λk ) + ( dx ; dλ ),                (13.52)

where dx and dλ solve the KKT system

[ Wk   Ak^T ; Ak   0 ] ( dx ; dλ ) = ( −∇fk − Ak^T λk ; −hk ).     (13.53)

This iteration is sometimes referred to as the Newton-Lagrange method (Nocedal & Wright, 1999). It is
well defined when the KKT matrix is nonsingular. The first assumption when using this method is that the
constraint Jacobian Ak has full row rank; in other words, all constraint gradients are linearly independent. The
second assumption is that the matrix Wk is positive definite on the tangent space of the constraints, i.e.,
d^T Wk d > 0 for all d ≠ 0 such that Ak d = 0.

An alternative way to view the Newton iteration defined by (13.52) and (13.53) is to define the following
QP problem:

min_d   (1/2) d^T Wk d + ∇fk^T d
s.t.    Ak d + hk = 0.                                      (13.54)

With the assumptions about Ak and Wk made above, this problem has a unique solution (dk, µk) that satisfies
the KKT conditions

Wk dk + ∇fk + Ak^T µk = 0,                                  (13.55)
Ak dk + hk = 0.                                             (13.56)

Reorganizing terms, these conditions can be written as

[ Wk   Ak^T ; Ak   0 ] ( dk ; µk ) = ( −∇fk ; −hk ).        (13.57)

Adding the term Ak^T λk to the first set of equations in (13.53), one obtains

Wk dx + Ak^T (λk + dλ) = −∇fk.

By comparison with the first set of equations in (13.57), one observes that these two expressions are the
same when dx = dk and λk+1 = µk. This relationship is referred to as the equivalence between QP and
Newton's method.

13.5.2 Search direction

From the KKT conditions one can solve for dk and µk. In the particular case in which Ak is square and can
be inverted, (13.56) gives

dk = −Ak^{−1} hk,                                           (13.58)

which depends only on the constraints. In the same way, the Lagrange multipliers can be obtained from
(13.55) as

µk = −Ak^{−T} ( Wk dk + ∇fk ).                              (13.59)

However, Ak is in general not invertible since m < n. One can instead solve (13.55) for dk. If Wk is
invertible, then

dk = −Wk^{−1} ( ∇fk + Ak^T µk ).                            (13.60)

Replacing the above equation into (13.56) yields

−Ak Wk^{−1} ( ∇fk + Ak^T µk ) + hk = 0,
−Ak Wk^{−1} ∇fk − Ak Wk^{−1} Ak^T µk + hk = 0.

Solving for µk,

µk = [ Ak Wk^{−1} Ak^T ]^{−1} ( hk − Ak Wk^{−1} ∇fk ),
µk = −[ Ak Wk^{−1} Ak^T ]^{−1} Ak Wk^{−1} ∇fk + [ Ak Wk^{−1} Ak^T ]^{−1} hk.

Replacing into (13.60),

dk = −Wk^{−1} ∇fk − Wk^{−1} Ak^T µk,
dk = −Wk^{−1} ∇fk + Wk^{−1} Ak^T [ Ak Wk^{−1} Ak^T ]^{−1} Ak Wk^{−1} ∇fk − Wk^{−1} Ak^T [ Ak Wk^{−1} Ak^T ]^{−1} hk.

In general, Wk is not invertible; however, one can approximate Wk^{−1} using the BFGS or DFP methods and
make it invertible and also positive definite. In the case in which Wk^{−1} = I, the above equation simplifies
to

dk = −( I − Ak^T [ Ak Ak^T ]^{−1} Ak ) ∇fk − Ak^T [ Ak Ak^T ]^{−1} hk,

or simply dk = d1 + d2, where

d1 = −( I − Ak^T [ Ak Ak^T ]^{−1} Ak ) ∇fk                  (13.61)

and

d2 = −Ak^T [ Ak Ak^T ]^{−1} hk.                             (13.62)

Geometrically, d1 represents the projection of the steepest descent direction onto the tangent plane of the
active constraints at xk, and d2 is a correction direction that points back towards the feasible region. One can
prove that d1^T d2 = 0.
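A small sketch of (13.61) and (13.62) with Wk = I (the function name is illustrative; Ak is assumed to have full row rank):

function [d1, d2] = sqp_direction(A, gradf, h)
% projection of steepest descent onto the constraint tangent plane (13.61)
% and correction towards feasibility (13.62), assuming W = I
M  = A' / (A*A');                        % A'*(A*A')^{-1}
d1 = -(eye(size(A,2)) - M*A) * gradf;
d2 = -M * h;
end

One can verify numerically that d1'*d2 vanishes to round-off, in agreement with d1^T d2 = 0.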
Example. Consider the following NLP problem:

min_x   f(x) = 2x1^3 + 15x2^2 − 8x1x2 − 4x1
s.t.    h1(x) = x1 + x2 + 1 = 0
        h2(x) = x1 − (1/4)x2^2 + x3 − 1 = 0,

and the initial point x0 = (−2, 1, 13/4)^T. The gradient of the objective function is

∇f(x) = ( −4 + 6x1^2 − 8x2,  −8x1 + 30x2,  0 )^T.

The constraints can be expressed as

h(x) = ( 1 + x1 + x2,  −1 + x1 − (1/4)x2^2 + x3 )^T,

and the constraint Jacobian is

A(x) = [ 1   1          0
         1   −(1/2)x2   1 ].

Assuming W = I, the two components of the search direction can be obtained from (13.61) and
(13.62). One obtains

d0 = (−40, 40, 0)^T + (3/2, 3/2, −9/4)^T = (−77/2, 83/2, −9/4)^T.

13.5.3 Practical implementation

The SQP framework can be extended to general NLP problems such as

min_x   f(x)
s.t.    hi(x) = 0,  i = 1, . . . , m                        (13.63)
        gj(x) ≤ 0,  j = 1, . . . , r
        xL ≤ x ≤ xU.

Given a point xk and using a line search approach, a new improved design is obtained as xk+1 =
xk + αk dk. The search direction vector dk can be found from the solution of the following QP subproblem:

min_d   ∇f(xk)^T d + (1/2) d^T ∇^2 f(xk) d
s.t.    hi(xk) + ∇hi(xk)^T d = 0,  i ∈ J1                   (13.64)
        gj(xk) + ∇gj(xk)^T d ≤ 0,  j ∈ J2
        xL ≤ xk + d ≤ xU.

The Hessian of the objective function can be approximated using the DFP or BFGS formulae. The active
sets J1 and J2 in (13.64) are defined as

J1 = {i : |hi(xk)| ≥ V(xk) − δ, i = 1, . . . , m}            (13.65)
J2 = {j : gj(xk) ≥ V(xk) − δ, j = 1, . . . , r},             (13.66)

where δ is a small number specified by the user. In the above definitions, V(xk) represents the maximum
violation, defined as

V(xk) = max_{i,j} { |hi(xk)|, gj(xk) },  i = 1, . . . , m,  j = 1, . . . , r.     (13.67)

For example, if g(xk) = (−1.5, −0.3, 0.3, 1.3, 1.6, 1.7)^T, then V(xk) = 1.7. If δ = 0.1, then
V − δ = 1.6 and the active set is J2 = {5, 6}.

Example. Consider the following NLP problem:

min_x   f(x) = 2x1^3 + 15x2^2 − 8x1x2 − 4x1
s.t.    g1(x) = x1 − (1/4)x2^2 − 1 ≤ 0
        h1(x) = x1^2 + x1x2 + 1 = 0,

and the initial point x0 = (1, 1)^T. State the QP subproblem assuming δ = 0.1.

Evaluating, f(x0) = 5, g1(x0) = −0.25 (inactive), and h1(x0) = 3 (violated). Then V(x0) = 3 and the
active sets are J1 = {1} and J2 = ∅. Taking the initial Hessian approximation as the identity, the QP
subproblem can be stated as

min_d   fQ(d) = −6d1 + 22d2 + (1/2)(d1^2 + d2^2)
s.t.    hL1(d) = 3d1 + d2 + 3 = 0.

13.5.4 Line search

Usually, the step size is found by minimizing the function f(α) along the search direction dk. However,
in SQP methods this approach cannot be used because, while the d1 component reduces f, the d2 component
will in general increase f (Belegundu & Chandrupatla, 1999). Therefore,

αk = arg min_α  f(α) + R V(α),                              (13.68)

where V is defined as in (13.67). The value of the penalty parameter R is chosen such that

R ≥ Σi |µi|,  i ∈ J1 ∪ J2.                                  (13.69)
Part V

Integer Programming

Chapter 14

Numerical methods

Many problems in engineering involve the use of discrete variables, for example, binary decision variables
(e.g., off or on) or the selection from a discrete set of values (e.g., values from a sheet-metal
thickness chart). Other discrete variables take integer values (e.g., 0, 1, 2, 3, . . .). In some
cases it is possible to round a real number to an integer: 4,567.8 manufactured parts
could be rounded to a nearby integer with no meaningful change in the result. However, when the variable
values are small, rounding can significantly change the result; e.g., 3.4 airplanes cannot be rounded to
3 or 4 without a significant change in the budget.

Most industrial applications involve the use of some integer variables. These problems are referred to
as mixed-integer programming (MIP) problems. They can be linear, i.e., mixed-integer linear
programming (MILP), or nonlinear, i.e., mixed-integer nonlinear programming (MINLP). When the problems
involve only integer variables they are called integer programming (IP) problems. In the same way,
problems involving only discrete variables are discrete programming (DP) problems. A special class of
IP problems is the binary integer programming (BIP) problems, which involve only binary variables.

Two common methods to solve problems involving binary, integer, and discrete variables are the implicit
enumeration (IE) method and the branch and bound (BB) method. These two methods are presented by
Belegundu & Chandrupatla (1999) and reviewed in this chapter.

14.1 Implicit enumeration method

Several IP or DP problems can be brought into a form where each variable is binary, i.e., takes the value 0
or 1. Implicit enumeration (IE) is a popular method for solving these BIP problems. Let us consider these
problems in the standard form

min_x   f(x) = c^T x
s.t.    Ax − b ≥ 0                                          (14.1)
        xi ∈ {0, 1},  i = 1, . . . , n,

where each coefficient ci is non-negative (ci ≥ 0). The initial design is selected as x = 0, which gives the
minimum possible value of the objective function. If that design is feasible, the search is over. If it is not
feasible, then one makes use of the IE technique described in this section.

Writing the problem in IE standard form is rather simple. A LE constraint can be multiplied by −1
to make it a GE constraint. An equality constraint h(x) = b can be replaced by two inequality constraints,
h(x) ≤ b and h(x) ≥ b. A set of EQ constraints

l1(x) = b1
l2(x) = b2                                                  (14.2)
l3(x) = b3

is equivalent to

l1(x) ≤ b1
l2(x) ≤ b2                                                  (14.3)
l3(x) ≤ b3
l1 + l2 + l3 ≥ b1 + b2 + b3.

Finally, if a coefficient ci is negative, then xi is replaced by the new variable yi = 1 − xi.

Example. Convert the following BP problem to the standard form (14.1):

min_x   4x1 − 2x2
s.t.    x1 + x2 = 1
        −2x1 + x2 ≤ −1
        x1, x2 ∈ {0, 1}.

The equality constraint can be expressed as two inequality constraints:

x1 + x2 ≥ 1,
x1 + x2 ≤ 1.

All the LE constraints are multiplied by −1, so the problem can be written as

min_x   4x1 − 2x2
s.t.    x1 + x2 − 1 ≥ 0
        −x1 − x2 + 1 ≥ 0
        2x1 − x2 − 1 ≥ 0
        x1, x2 ∈ {0, 1}.

Finally, the negative coefficient in the cost function can be turned positive with the change of variable
x3 = 1 − x2. Then, the problem can be written as

min_x   4x1 + 2x3 − 2
s.t.    x1 − x3 ≥ 0
        −x1 + x3 ≥ 0
        2x1 + x3 − 2 ≥ 0
        x1, x3 ∈ {0, 1}.

The constant −2 does not modify the optimum point.

The main steps of the IE process involve fathoming and backtracking. The search process can be illustrated
with nodes and branches. From node k we choose a variable xi to be fixed at 1, which takes us to node i.
Then the following possibilities exist:

1. The point is feasible.

2. Feasibility is impossible for this branch.

3. The function cannot be improved over a previous feasible value.

4. Possibilities for feasibility and improvement exist.

If the node is in one of the first three states, then the branch is fathomed and we must attempt to backtrack.
Case 4 is the default when 1, 2, and 3 are not satisfied; in this case, further search is necessary and a
new variable to be raised is chosen from the free variables. Backtracking refers to the step in which
one fixes xi = 0 after having explored xi = 1. Fathoming is the step where a new variable to be brought in is
chosen. Let us illustrate this with the following example.

Example. Consider the following optimization problem:

min_x   f(x) = 4x1 + 5x2 + 3x3
s.t.    g1(x) = x1 − 3x2 − 6x3 + 6 ≥ 0
        g2(x) = 2x1 + 3x2 + 3x3 − 2 ≥ 0                     (14.4)
        g3(x) = x1 + x3 − 1 ≥ 0
        xi ∈ {0, 1},  i = 1, 2, 3.

The explicit enumeration is shown in Table 14.1, where the optimum is located at (0, 0, 1).

Using implicit enumeration, the initial point is (0, 0, 0). Since this point is not feasible, one performs
fathoming, a systematic procedure to determine whether improvements could be obtained by changing
the levels of the variables in succession. Raising x1 = 1, one obtains a feasible solution with f = 4.
Adding an extra variable, e.g., x2 = 1 or x3 = 1, would only increase the value of f. Then, the search along
the branch with x1 = 1 can be terminated.

Table 14.1: Explicit enumeration for (14.4)

x1  x2  x3    g1    g2    g3     f   Feasible
 0   0   0     6    −2    −1     0   No
 0   0   1     0     1     0     3   Yes
 0   1   0     3     1    −1     5   No
 0   1   1    −3     4     0     8   No
 1   0   0     7     0     0     4   Yes
 1   0   1     1     3     1     7   Yes
 1   1   0     4     3     0     9   Yes
 1   1   1    −2     6     1    12   No

The next step is backtracking with x1 = 0. Raising x3 = 1 brings feasibility and f = 3, which is lower
than before; therefore, the solution is updated. Raising another variable would increase f, so the branch
with x3 = 1 can be terminated.

For x1 = 0 and x3 = 0 it is not possible to achieve feasibility by raising x2, since g3 = −1 < 0. The
search is then complete, as there are no more free nodes, and the solution is (0, 0, 1) with f = 3.
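The explicit enumeration of Table 14.1 is easy to reproduce; a brute-force MATLAB sketch:

best = inf;  xbest = [];
for k = 0:7
    x = double(dec2bin(k,3)) - '0';          % k -> [x1 x2 x3]
    g = [ x(1)-3*x(2)-6*x(3)+6;
          2*x(1)+3*x(2)+3*x(3)-2;
          x(1)+x(3)-1 ];
    f = 4*x(1)+5*x(2)+3*x(3);
    if all(g >= 0) && f < best, best = f; xbest = x; end
end
% expected: xbest = [0 0 1], best = 3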

14.2 Branch and Bound method

The branch and bound (BB) method is commonly used for MILP and MINLP problems. With this method
one solves a relaxed problem at each step. The method can be applied to BIP and MIP problems; let us
consider these two cases in the following sections.

14.2.1 BIP problems

Consider a MILP problem in which all the design variables are binary, i.e., xi ∈ {0, 1}. The relaxed LP
problem states that 0 ≤ xi ≤ 1. One says that problem P′ is a relaxation of problem P if:

1. The feasible space of P′ contains the feasible space of P.

2. The optimal value of the objective function of P′ is not worse than that of P.

The BB method starts by solving the relaxed LP problem. If at the end of this stage all variables are
integer, then that is the solution of the MILP problem. Otherwise, the BB algorithm performs branching by
selecting one of the design variables xi with a non-integer value and creating two LP subproblems: one
assigns xi = 0 and the other assigns xi = 1. If one of these two subproblems has an integer solution, there is
no need to further explore that branch. Otherwise, one selects another non-integer design variable and
repeats the procedure. Let us illustrate the BB method through the following example.
Example. Consider the following optimization problem:

max_x   f(x) = 8x1 + 4x2 + 6x3
s.t.    g1(x) = 5x1 + 3x2 + 4x3 ≤ 9                         (14.5)
        g2(x) = 6x1 + 2x2 + 3x3 ≤ 8
        xi ∈ {0, 1},  i = 1, 2, 3.

The first step is to solve the relaxed LP subproblem (LP1) defined by

max_x   f(x) = 8x1 + 4x2 + 6x3
s.t.    g1(x) = 5x1 + 3x2 + 4x3 ≤ 9                         (14.6)
        g2(x) = 6x1 + 2x2 + 3x3 ≤ 8
        0 ≤ xi ≤ 1,  i = 1, 2, 3.

The solution of this LP problem is x1* = (0.625, 0.625, 1)^T with f1* = 13.5, which is an upper bound for the
IP problem, fU = 13.5. For now, fL = −∞. In this node there are two non-integer variables,
x1 and x2. We select to branch on x1 by forcing it to be integer. Then we have two new LP problems, one
with x1 = 0 (LP2) and one with x1 = 1 (LP3); the other two variables remain relaxed. This process is
referred to as branching. LP2 is

max_x   f(x) = 8x1 + 4x2 + 6x3
s.t.    g1(x) = 5x1 + 3x2 + 4x3 ≤ 9
        g2(x) = 6x1 + 2x2 + 3x3 ≤ 8                         (14.7)
        x1 = 0
        0 ≤ xi ≤ 1,  i = 2, 3,

and the solution is x2* = (0, 1, 1)^T with f2* = 10, which is the new lower bound, fL = 10. The difference
between the upper and lower bounds is known as the gap, ∆. In this case, ∆ = 13.5 − 10 = 3.5. Convergence
is declared when

∆ / (1.0 + |fL|) ≤ ε,                                       (14.8)

where ε is a proper tolerance value. The value 1.0 ensures that this condition can be evaluated when fL = 0.
For now, ∆/(1.0 + |fL|) = 0.318. If this value were less than the tolerance, the search would be over.

LP3 is given by

max_x   f(x) = 8x1 + 4x2 + 6x3
s.t.    g1(x) = 5x1 + 3x2 + 4x3 ≤ 9
        g2(x) = 6x1 + 2x2 + 3x3 ≤ 8                         (14.9)
        x1 = 1
        0 ≤ xi ≤ 1,  i = 2, 3,

and x3* = (1, 0.52, 0.327)^T with f3* = 12; then fU = 12, ∆ = 2, and ∆/(1.0 + |fL|) = 0.1818. If there is
no convergence, then we branch on x2.

LP4 is defined by

max_x   f(x) = 8x1 + 4x2 + 6x3
s.t.    g1(x) = 5x1 + 3x2 + 4x3 ≤ 9
        g2(x) = 6x1 + 2x2 + 3x3 ≤ 8                         (14.10)
        x1 = 1
        x2 = 0
        0 ≤ x3 ≤ 1,

and x4* = (1, 0, 0.667)^T, f4* = 12. There is no improvement, so this branch is considered explored. Now, LP5
is

max_x   f(x) = 8x1 + 4x2 + 6x3
s.t.    g1(x) = 5x1 + 3x2 + 4x3 ≤ 9
        g2(x) = 6x1 + 2x2 + 3x3 ≤ 8                         (14.11)
        x1 = 1
        x2 = 1
        0 ≤ x3 ≤ 1,

and x5* = (1, 1, 0)^T, f5* = 12. No better solution can be achieved; this is the solution of the problem.
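The same BIP can be handed directly to MATLAB's intlinprog (Optimization Toolbox, R2014a or later), which performs branch and bound internally; a minimal sketch:

f = -[8; 4; 6];                  % maximization as minimization of -f
intcon = 1:3;                    % all variables integer
A = [5 3 4; 6 2 3];  b = [9; 8];
lb = zeros(3,1);  ub = ones(3,1);
x = intlinprog(f, intcon, A, b, [], [], lb, ub);
% expected: x = (1,1,0) with objective value 12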

14.2.2 MIP problems

If xi is constrained to be an integer variable, then it can be expressed as

xi = Ii + αi,                                               (14.12)

where Ii = ⌊xi⌋ and 0 ≤ αi ≤ 1. Branching introduces one of two constraints on xi,

xi ≥ Ii + 1                                                 (14.13)
xi ≤ Ii,                                                    (14.14)

each defining one LP subproblem. These constraints exclude the region where the non-integer answer lies.
This dichotomy is the key to the BB method. To illustrate the method, let us consider the following example.

Example. Consider the following optimization problem:

max_x   f(x) = 2x1 + 3x2
s.t.    g1(x) = 4x1 + 10x2 ≤ 45                             (14.15)
        g2(x) = 4x1 + 4x2 ≤ 23
        x1, x2 ≥ 0 and integer.

The solution of the relaxed problem LP1 (no integer constraint) is x1* = (2.0833, 3.6667)^T with f1* =
15.167. From the two variables, one can select the variable with the minimum integer infeasibility, that is,
the variable whose fractional part is closest to 0 or 1 (but not equal to 0 or 1); or one can select the
variable with the maximum integer infeasibility, the variable whose fractional part is closest to 0.5. Let
us use the latter criterion, so x2 is constrained.

In LP2, let us impose the constraint x2 ≥ 4. The solution of this relaxed problem is x2* = (1.25, 4)^T
with f2* = 14.5. In LP3, x2 ≥ 4 and x1 ≥ 2, but this point is not feasible: g1 = 48 and g2 = 24. Then,
for LP4, x2 ≥ 4 and x1 ≤ 1. The solution is x4* = (1, 4.1)^T with f4* = 14.3.

For LP5, x1 ≤ 1 and x2 ≥ 5, and the problem is infeasible. For LP6, x1 ≤ 1 and x2 ≤ 4, and the solution
is x6* = (1, 4)^T with f6* = 14.0. This is an integer and feasible point; it provides the lower bound
fL = 14, and this branch is completely explored.

For LP7, x2 ≤ 3. Exploring this branch yields no improvement, so the final solution is x* = (1, 4)^T with
f* = 14.0, corresponding to LP6.
Chapter 15

Modeling

15.1 Binary approximation of integer variables

An integer variable can be represented by binary variables. For example, a non-negative integer variable

0 ≤ z ≤ 11

can be expressed as

z = 2^0 x1 + 2^1 x2 + 2^2 x3 + 2^3 x4,   xi ∈ {0, 1},  i = 1, . . . , 4.     (15.1)

The bound then becomes the constraint

x1 + 2x2 + 4x3 + 8x4 ≤ 11.                                  (15.2)
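In MATLAB, this encoding and its inverse can be sketched with bitget (values are illustrative):

z = 11;
x = bitget(z, 1:4);              % z -> [x1 x2 x3 x4] = [1 1 0 1]
zback = sum(x .* 2.^(0:3));      % decode: 1 + 2 + 8 = 11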

15.2 Binary polynomial programming

A binary variable xi raised to a power p ≥ 1 satisfies xi^p = xi. In the same way, x1^{p1} x2^{p2} = x1x2. In
general,

x1^{p1} x2^{p2} · · · xk^{pk} = x1 x2 · · · xk.             (15.3)

The product of binary variables can be substituted by a new variable y and two constraints. One of these
constraints makes y = 0 if any xi = 0; the other makes y = 1 if all xi are 1. This can be expressed as

x1 x2 · · · xk = y
x1 + x2 + · · · + xk − y ≤ k − 1                            (15.4)
x1 + x2 + · · · + xk ≥ ky
xi, y ∈ {0, 1},  i = 1, . . . , k.

Let us illustrate this procedure with the following example.

Example. Using the polynomial properties presented above, state the corresponding LP expression of the
following NLP problem:

max_x   3x1^2 x2 x3^3
s.t.    4x1 + 7x2^2 x3 ≤ 12                                 (15.5)
        x1, x2, x3 ∈ {0, 1}.

Using (15.3), the problem can be written as

max_x   3x1x2x3
s.t.    4x1 + 7x2x3 ≤ 12                                    (15.6)
        x1, x2, x3 ∈ {0, 1}.

Substituting y1 = x1x2x3 and y2 = x2x3, and using (15.4), one obtains

max_{x,y}   3y1
s.t.    4x1 + 7y2 ≤ 12
        x1 + x2 + x3 − y1 ≤ 2
        x1 + x2 + x3 ≥ 3y1                                  (15.7)
        x2 + x3 − y2 ≤ 1
        x2 + x3 ≥ 2y2
        x1, x2, x3, y1, y2 ∈ {0, 1}.

Chapter 16

Applications

16.1 Classic problems

16.1.1 Suitcase problem

There are n objects. The weight of the i-th object is wi and its value is vi. Select the most valuable subset
of objects such that the total weight does not exceed W:

max_x   Σ_{i=1}^n vi xi
s.t.    Σ_{i=1}^n wi xi ≤ W                                 (16.1)
        xi ∈ {0, 1},  i = 1, . . . , n.

Example. You would like to pack four items with values of $9, $12, $5, and $4, respectively. The weights
of these items are 3, 4, 2, and 1 kg, respectively. The capacity of your bag is 7 kg. Determine which items
to carry in order to maximize the value of the contents of your bag:

max_x   9x1 + 12x2 + 5x3 + 4x4
s.t.    3x1 + 4x2 + 2x3 + x4 ≤ 7
        xi ∈ {0, 1},  i = 1, . . . , 4.

The answer to this problem is x1* = 0, x2* = 1, x3* = 1, and x4* = 1, for a value of $21.

16.1.2 Class scheduling problem

Consider a student who has to register for a number of courses or credits. The student has a preference for
certain sections. The objective is to maximize the student's preference, subject to constraints on the minimum
and maximum number of credits or courses and on the schedule overlaps among them. Let us consider the
following example to illustrate this problem.

Example. A student must register for four courses next semester: Chemistry, Physics, Mathematics, and
Design. There are three sections of Physics and two sections of each of the other courses. The following
combinations of sections are not possible because of time overlaps:

• Physics-01, Chemistry-01 and Mathematics-01.

• Physics-02 and Mathematics-02.

• Chemistry-01 and Design-02.

The student preference weights, with values from 1 to 5 for the various sections, and the corresponding binary
design variables are shown in Table 16.1.

Table 16.1: Course preference and related design variable.

Section       Var.  Pref.    Section          Var.  Pref.    Section          Var.  Pref.
Physics-01    x1    5        Chemistry-01     x4    3        Mathematics-02   x7    5
Physics-02    x2    3        Chemistry-02     x5    5        Design-01        x8    5
Physics-03    x3    1        Mathematics-01   x6    2        Design-02        x9    2

In the optimization problem, xi = 1 implies that section i is chosen and xi = 0 implies that it is not. The
BIP problem can be written as

max_x   5x1 + 3x2 + x3 + 3x4 + 5x5 + 2x6 + 5x7 + 5x8 + 2x9
s.t.    x1 + x4 + x6 ≤ 1
        x2 + x7 ≤ 1
        x4 + x9 ≤ 1
        x1 + x2 + x3 = 1                                    (16.2)
        x4 + x5 = 1
        x6 + x7 = 1
        x8 + x9 = 1
        xi ∈ {0, 1},  i = 1, . . . , 9.

The solution of this problem is x1 = 1, x2 = 0, x3 = 0, x4 = 0, x5 = 1, x6 = 0, x7 = 1, x8 = 1, x9 = 0.

16.1.3 Traveling salesman problem

Consider a traveling salesperson who must visit each of n cities before returning home. The salesperson
knows the distance between each pair of cities and wishes to minimize the total distance traveled while
visiting all of the cities. In what order should he or she visit the cities?

Let there be n cities numbered from 1 up to n. For each pair of cities (i, j), let cij be the cost of going
from city i to city j or from city j to city i, and let xij be 1 if the person travels between cities i and j (in
either direction). This problem is known as the symmetric TSP. In the asymmetric TSP, the
cost of traveling in one direction may differ from the cost of traveling in the other, and the decision variables
must distinguish between the two directions. Clearly the asymmetric problem is the more general one.

The constraints of this problem include that the salesperson must start and finish in the same city and
must visit each city exactly once. The optimization problem can be expressed as

min_x   f(x) = Σ_{i=1}^n Σ_{j=1}^n cij xij
s.t.    Σ_{i=1}^n xij = 1
        Σ_{j=1}^n xij = 1                                   (16.3)
        xij = 0 for i = j
        xij ∈ {0, 1},  i, j = 1, . . . , n.

Note that a complete formulation also requires subtour elimination constraints, which exclude solutions
consisting of disconnected loops; these are omitted here.

16.2 Transportation and networks

16.2.1 Terminology

In general, a network consists of a set of nodes connected by arcs. An arc is an ordered pair of nodes, often
indicating the direction of motion between the nodes. A chain between two nodes is a sequence of arcs
connecting them. A path is a chain in which the terminal node of each arc is the initial node of the next
arc. A node j is reachable from node i if there is a path from i to j. A cycle or loop is a chain starting from
a node and ending at the same node. A network is said to be connected if there is a chain between any two
nodes. A network or graph is a tree if it is connected and has no cycles. A spanning tree is a connected
network that includes all the nodes in the network with no loops. A directed arc is one in which there is a
positive flow in one direction and a zero flow in the opposite direction.

16.2.2 Transportation problem

Consider m source nodes with s1, . . . , sm products in storage, respectively, and n destination nodes
with demands d1, . . . , dn. The transportation cost between source i and destination j is cij. The problem is
to determine the quantity of product xij to be transported from i to j. This problem can be stated as

min_x   Σ_{i=1}^m Σ_{j=1}^n cij xij
s.t.    Σ_{j=1}^n xij = si,  i = 1, . . . , m   (constraints on supply)     (16.4)
        Σ_{i=1}^m xij = dj,  j = 1, . . . , n   (constraints on demand)
        xij ≥ 0 and integer.

This is an example of a balanced transportation problem, in which the total demand is equal to the total
supply. If the problem is not balanced, then one can add artificial source and/or destination nodes. The costs
associated with these artificial nodes can be made equal to zero.

Example. Goods manufactured at three stations are to be delivered to four locations. The supply at the
stations is s = (5, 8, 9)^T while the demand at the locations is d = (3, 9, 4, 6)^T. The corresponding
transportation costs are

c = [ 3  2  6  5
      8  4  2  1
      2  7  4  2 ]^T.

Obtain the optimal allocations for minimum cost.
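A linprog sketch for this balanced problem (a sketch; the printed cost array is interpreted so that rows index sources and columns index destinations, consistent with (16.4)):

C = [3 2 6 5; 8 4 2 1; 2 7 4 2];       % 3 sources x 4 destinations
s = [5; 8; 9];  d = [3; 9; 4; 6];
[m, n] = size(C);
Aeq = [kron(ones(1,n), eye(m));        % supply: sum_j x_ij = s_i
       kron(eye(n), ones(1,m))];       % demand: sum_i x_ij = d_j
beq = [s; d];
x = linprog(C(:), [], [], Aeq, beq, zeros(m*n,1), []);
X = reshape(x, m, n);                  % optimal allocations

Because the constraint matrix of a balanced transportation problem is totally unimodular, the LP relaxation returns an integer allocation automatically.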

These problems can be solved using the conventional simplex method; however, the size of the coefficient
matrix can be very large, i.e., (m + n) × mn. In this case, the dual formulation is a more efficient approach
(Belegundu & Chandrupatla, 1999). The dual problem can be stated as

max_{u,v}   Σ_{i=1}^m si ui + Σ_{j=1}^n dj vj               (16.5)
s.t.        ui + vj ≤ cij,  i = 1, . . . , m,  j = 1, . . . , n.

16.2.3 Assignment problem

A special case of the transportation problem is the assignment problem, which occurs when each supply is
1 and each demand is 1. In this case, integrality implies that every supplier will be assigned one destination
and every destination will have one supplier. The costs give the charge for assigning a supplier and
a destination to each other. Such problems include assigning n jobs to n machines, lecturers to classrooms,
or drivers to trucks, among others.

In these problems there is a cost cij associated with job i being assigned to machine j. This problem can
be expressed as

min_x   f(x) = Σ_{i=1}^n Σ_{j=1}^n cij xij
s.t.    Σ_{j=1}^n xij = 1
        Σ_{i=1}^n xij = 1                                   (16.6)
        xij ∈ {0, 1},  i, j = 1, . . . , n.

If the number of source and destination nodes is not the same, artificial nodes can be added.

16.2.4 Minimum distance problem

Consider a phone network. At any given time, a message may take a certain amount of time to traverse each
line (due to congestion effects, switching delays, and so on). This time can vary greatly minute by minute,
and telecommunication companies spend a lot of time and money tracking these delays and communicating
them throughout the system. Assuming a centralized switcher knows these delays, there remains the
problem of routing a call so as to minimize the total delay.

Part VI

Global optimization

Chapter 17

Genetic algorithms

17.1 Description

Genetic algorithms (GAs) are stochastic search techniques based on the principles of natural selection and
genetic recombination as observed in nature. These techniques were formally introduced in the United States
in the 1970s by John Holland at the University of Michigan (Holland, 1975). Although this search approach is
inherently an optimization process, since it searches for the fittest or optimal solution to a particular problem,
it has been used in a wide variety of applications (Chambers, 1995). In particular, genetic algorithms work
very well on mixed (continuous and discrete), combinatorial problems. They are less susceptible to getting
"stuck" at local optima than gradient search methods, but they tend to be computationally expensive. The
continuing price/performance improvements of computational systems have made them attractive for certain
applications.

This section presents an illustrative example of the use of genetic algorithms in an optimization problem,
along with the code for its implementation in MATLAB.

17.2 Components
The three most important aspects of using genetic algorithms are:

1. Definition of the objective function.

2. Definition and implementation of the genetic representation.

3. Definition and implementation of the genetic operators.

Once these three have been defined, the generic genetic algorithm should work fairly well. Beyond
that, you can try many different variations to improve performance, find multiple optima (if they exist), or
parallelize the algorithm.

17.3 Algorithm
A traditional implementation of a genetic algorithm is as follows:

1. Initialize an initial population of individuals.

2. Evaluate the fitness value of each member in the population.

3. Select parent individuals according to some selection scheme.

4. Create new individuals by mating parents by applying crossover and mutation.

5. Evaluate the new individuals and insert them into the population.

6. If the convergence condition is reached, stop and return the best individuals; if not, go to 3.

The binary alphabet (0, 1) is often used to represent the members of a population, although depending
on the application integers or real numbers are used. In fact, almost any representation can be used that
enables a solution to be encoded as a finite length string.

17.4 Implementation

17.4.1 Test problem

For the specific purpose of this section, let us explain the use of a genetic algorithm through an example.
Consider the following optimization problem presented by Reeves & Rowe (2003):

max_x   f(x) = x^3 − 60x^2 + 900x + 100                     (17.1)
s.t.    0 ≤ x ≤ 31.

The conventional use of a genetic algorithm requires the representation of the design variables as a genome
or chromosome, which here is nothing but a binary representation. For the optimization problem given by (17.1),
the encoding function is such that

Table 17.1: Encoding function

00000 = 0
00001 = 1
  ...
11111 = 31

Genetic algorithms are stochastic in nature and require the use of a random number source. For this
problem let us use the following array of random numbers obtained from MATLAB.
Table 17.2: Random numbers

0.6038  0.0153  0.9318  0.8462  0.6721  0.6813  0.5028  0.3046  0.6822
0.2722  0.7468  0.4660  0.5252  0.8381  0.3795  0.7095  0.1897  0.3028
0.1988  0.4451  ...
0.9501  0.2311  0.6068  0.4860  0.8913  0.7621  0.4565  0.0185  ...

17.4.2 Initial population

The first task is to select an initial population of design variables, i.e., a set of strings. According to Reeves
& Rowe (2003), the minimum population size should guarantee that, at the very least, every point in the
search space is reachable from the initial population by crossover only. This requirement can only
be satisfied if there is at least one instance of every allele at each locus in the whole population of strings. On
the assumption that the initial population is generated by a random sample with replacement, the probability
P that at least one instance of every allele is present at each locus can be found. For binary strings of length L,
this is

P = (1 − (1/2)^{N−1})^L,                                    (17.2)

where N is the number of individuals or chromosomes. In other words,

N = 1 − ln(1 − P^{1/L}) / ln(2).                            (17.3)

For example, a population of 10 is enough to ensure that the required probability exceeds 99% for strings
of length 5. Initial random populations do not necessarily cover the search space uniformly, and there may
be advantages in terms of coverage if more sophisticated statistical methods are used. For instance, take genes
of 5 alleles, labeled 0, . . . , 4; choose the population size N to be a multiple of 5, and generate the alleles in each
"column" as an independent random permutation of 0, . . . , N − 1, which is then taken modulo
5. For our example, let us assume an initial population of 4 individuals. If we generate a random
number r, then we assign each allele of the chromosome the value 0 if r < 0.5, and 1 otherwise. The first
five random numbers in Table 17.2 generate the chromosome 10111. Applying the decoding function from
Table 17.1, this string corresponds to

x = 1·2^4 + 0·2^3 + 1·2^2 + 1·2^1 + 1·2^0 = 23,

and its function value is f(23) = 1227. Following the same procedure with 3 more sets of random numbers
produces the first generation. This initial population is shown in Table 17.3.
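As an aside, the population-size rule (17.3) and a random binary initialization can be sketched in a few lines of MATLAB (values are those of this example):

L = 5;  P = 0.99;
N = ceil(1 - log(1 - P^(1/L))/log(2))          % gives N = 10
pop = char((rand(4, L) >= 0.5) + '0');         % 4 random strings, e.g. '10111'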

Table 17.3: Initial population

No.   String   x    f(x)    s(x)
1     10111    23   1227    0.1891
2     11010    26    516    0.0796
3     10110    22   1508    0.2325
4     10000    16   3236    0.4988
Total fitness        6487
Average fitness      1622
17.4.3 Selection

The final column in Table 17.3 is the fitness-proportional selection probability. This is the fitness value
divided by the total fitness,

s(x) = f(x) / Σ f(x).                                       (17.4)

From this column we can derive the following cumulative distribution for roulette-wheel selection:

Table 17.4: Initial cumulative distribution

Cum. prob.   0.1891   0.2687   0.5012   1.0000
String No.        1        2        3        4
17.4.4 Crossover

The next two random numbers, starting the fourth row of Table 17.2, are 0.9501 and 0.2311, which implies
that we select strings 4 and 2 as parents of the next generation. Now we perform crossover and mutation of
these strings. Crossover is an operation that swaps genetic material between the parent chromosomes after
some selected crosspoint. In single one-point crossover, we choose this crosspoint with equal probability
from the numbers 1, 2, 3, and 4, so we can use the following distribution:

Table 17.5: Crosspoint cumulative distribution

Cum. prob.   0.2500   0.5000   0.7500   1.0000
Crosspoint        1        2        3        4

As the next random number is 0.6068, the selected crosspoint is 3. Crossing the strings (10000) and
(11010) at the 3rd crosspoint yields the strings (10010) and (11000).

From these two children we select one with a probability of 0.5. Since the next random number is
0.4860, we choose the first one, i.e., (10010). Another criterion for choosing the child is so-called elitism,
which is nothing but selecting the child with the best fitness.

17.4.5 Mutation
The concept of mutation is even simpler than crossover: a gene (or subset of genes) is chosen randomly and
the allele value of the chosen genes is changed. We shall suppose that the mutation probability pm is 0.10
for each locus of the string. Since the next random number is 0.8913 (> 0.10), there is no change to the allele
value at locus 1. Similarly, loci 2, 3 and 4 remain the same, but at locus 5 the random number is 0.0185, so
the allele value for this gene changes from 0 to 1. The final string is (10011), which decodes to x = 19 with
fitness f(19) = 2399. This procedure can be repeated to produce a second generation, as seen in Table 17.6
(a compact sketch of this bit-flip step follows the table).

Table 17.6: Second generation

        First    Second   Crossover              Offspring
Step    parent   parent   point       Mutation?  String    f(x)
  1       4        2        3         nnnny      10011     2399
  2       3        1        2         nnnyn      10100     2100
  3       4        1        4         nnnny      10000     3236
  4       2        4        2         nnnnn      10010     2692
Total fitness                                              10427
Average fitness                                             2607
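
The bit-flip mutation of Section 17.4.5 admits a compact vectorized sketch. The middle three random
draws below are placeholders greater than pm (the actual values come from Table 17.2); they leave loci 2–4
unchanged, so only locus 5 flips:

pm = 0.10;                            % mutation probability per locus
b = '10010';                          % child selected after crossover
r = [0.8913 0.5 0.5 0.5 0.0185];      % per-locus draws (middle three are placeholders)
flip = r <= pm;                       % logical mask of mutated loci
b(flip) = char('0' + '1' - b(flip));  % flip '0' <-> '1', giving '10011' (x = 19)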

The search for the maximum (i.e., x = 10) will continue until the optimum point is found. This can
happen after several generations, when all the individuals converge to the same value. Alternatively, the
genetic algorithm can run until it reaches a maximum number of generations. The next section presents the
MATLAB code used to implement the example above.
Finally, as an optimization technique, genetic algorithms possess an implicit parallelism: different partially
effective gene combinations, or schemata, are searched simultaneously and in parallel. Genetic algorithms
are attractive because they are suited to global optimization, where the absolute optimum of a search space
is sought; this is a result of the entire design space being sampled. Because of the parallel nature of the GA
approach, the performance is much less sensitive to initial conditions. Most gradient-based algorithms use
local gradient information, which confines the optimization to a neighborhood of the starting point and leads
to locally optimum solutions. Though gradient-based optimizers are effective in unimodal search spaces,
where there exists only one optimum, they are less capable of locating the global optimum in multimodal
search spaces with multiple optima.

17.4.6 MATLAB code

function gafun
% Implementation of a maximization problem
% A. Tovar
clear all

% 1) Initial population
% 1.1) Size of the initial population
% L=5                                       % length of the string
% P=.999                                    % desired probability
% N=round(real(1+log(-L/log(P))/log(2)))    % population size, cf. (17.3)
% or
L=5                                         % length of the string
N=5                                         % number of individuals
P=(1-(0.5)^(N-1))^L                         % probability from (17.2)
% 1.2) Initial population of random binary strings (stored as chars)
for n=1:N
    binx(n,:)=sprintf('%d',rand(1,L)<0.5);
end
it=0;
while it<50
    it=it+1;
    % 2) Evaluation
    x=decoding(binx);
    fx=maxfun(x);
    tfx=sum(fx);
    % 3) Fitness-proportional selection probability and
    %    cumulative distribution
    sx=fx./tfx;
    csx=cumsum(sx);
    % 4) Table of the generation
    disp(sprintf('\n'))
    disp(['generation ',num2str(it)])
    disp(sprintf('%s','   x      fx     csx'))
    disp(num2str([x,fx,csx]))
    % 5) Parents and children (the new generation is assembled in binxnew
    %    so that parents are always drawn from the current generation)
    for k=1:N
        parents(k,:)=getparents(rand,rand,csx);
        bparents=binx(parents(k,:),:);
        [cpt(k,1),binxc(k,:),binxnew(k,:)]=getbchild(bparents,rand);
    end
    binx=binxnew;
    pause    % press any key to advance one generation
end

%--------------------------------------------------------------------------
function [cpt,bchild,bchildm]=getbchild(bparents,pc)
j=0;
cpt=0;
szbpar=size(bparents,2);
pt=(1:(szbpar-1));
ptn=pt/max(pt);
% Crossover point: first threshold of the cumulative distribution
% that exceeds pc
while j<length(ptn) && cpt==0
    j=j+1;
    if pc<ptn(j)
        cpt=j;
    end
end
% a) Crossover
bchild1=[bparents(1,1:cpt),bparents(2,cpt+1:szbpar)];
bchild2=[bparents(2,1:cpt),bparents(1,cpt+1:szbpar)];
% Selection of the child: random choice ...
% if rand<0.5
%     bchild=bchild1;
% else
%     bchild=bchild2;
% end
% ... or elitism (keep the child with the better fitness)
child1=decoding(bchild1);
fchild1=maxfun(child1);
child2=decoding(bchild2);
fchild2=maxfun(child2);
if fchild1>fchild2
    bchild=bchild1;
else
    bchild=bchild2;
end
% b) Mutation: flip each locus independently with probability pm
i=0;pm=0.1;
bchildm=bchild;
while i<szbpar
    i=i+1;
    if rand<=pm
        bchildm(i)=num2str(1-str2double(bchildm(i)));   % flip the bit
    end
end

%--------------------------------------------------------------------------
function parents=getparents(p1,p2,csx)
% Roulette-wheel selection of two parents from the cumulative
% distribution csx
i=0;
P1=0;P2=0;
while i<size(csx,1) && (P1==0 || P2==0)
    i=i+1;
    if p1<csx(i) && P1==0
        P1=i;
    end
    if p2<csx(i) && P2==0
        P2=i;
    end
end
parents=[P1,P2];

%--------------------------------------------------------------------------
function d=decoding(b)
% Decode binary strings (rows of b) into decimal values
d=bin2dec(num2str(b));

%--------------------------------------------------------------------------
function y=maxfun(x)
% Objective function to maximize
y=x.^3-60*x.^2+900*x+100;

Chapter 18

Simulated Annealing

18.1 Description
Simulated Annealing (SA) is a probabilistic method for global optimization. It was independently proposed
by Kirkpatrick et al. (1983) and Černý (1985). The inspiration for this technique comes from the annealing
process in metallurgy. This technique involves heating and controlled cooling of a material to relieve internal
stresses from a previous hardening process. Heat causes the atoms to leave their initial positions (a local
minimum of the internal energy) and wander randomly through states of higher energy. The controlled
cooling gives them a chance of finding configurations with lower internal energy than the initial one.
Starting off at the maximum temperature, the annealing process can be described as follows (van
Laarhoven & Aarts, 1988). At a given temperature T, the material can reach thermal equilibrium. This
state is characterized by a probability P(E = E0) for the internal energy E to be in a state E0 given by the
Boltzmann distribution:

P(E = E0) = (1/Z(T)) exp(−E0/(kB T)),   (18.1)

where Z(T) is a normalization factor referred to as the partition function, which depends on the temperature
T, and kB is the Boltzmann constant, kB = 1.3806505 × 10^−23 Joule/Kelvin. The factor exp(−E/(kB T)) is
known as the Boltzmann factor. As the temperature approaches zero, only the minimum energy states
have a non-zero probability of occurrence.
To simulate the evolution to thermal equilibrium, Metropolis et al. (1953) proposed a Monte Carlo
method, which generates sequences of states of the material in the following way. Given the current state
of the solid, characterized by the positions of its atoms, a small, randomly generated perturbation is applied
as a small displacement of a randomly chosen atom. If the difference in energy ∆E1 = E1 − E0 between
the original state E0 and the slightly perturbed state E1 is negative (i.e., the perturbed state has lower
energy), then the process is continued with the new state. If ∆E1 > 0, then the probability of acceptance of

the perturbed state is given by the Boltzmann factor, or

P(∆E1 > 0) = exp(−∆E1/(kB T)).   (18.2)

This acceptance rule for new states is referred to as the Metropolis criterion. Following this criterion, the
system eventually evolves into thermal equilibrium. This property applied to an optimization problem is
referred to as the Metropolis algorithm.

18.2 Components
Let us consider an optimization problem of the form

min_x f(x)
s.t.  xL ≤ x ≤ xU.   (18.3)
As explained by Belegundu & Chandrupatla (1999), the solution of this optimization problem starts
at a feasible point x0 with a function value f0. Let us set xmin = x0 and fmin = f0. A vector
of step sizes s is also defined. Initially, each step size is selected equal to a value sT (e.g., sT = 1). A
vector of acceptance ratios a is defined such that each element ai = 1. A starting temperature T is chosen
(e.g., T = 10). A reduction factor rT for the temperature is selected (e.g., rT = 0.5). For each reduced
temperature rT T there is an associated step size rs sT (rs being the corresponding step-size reduction
factor). At each temperature the algorithm performs NT iterations (e.g.,
NT = 5). Each iteration consists of NC cycles (e.g., NC = 5). A cycle involves taking a random step in
each of the n coordinate directions.
A step in direction i is taken in the following manner. A random number r in the range −1 to 1 is
generated. A new point is evaluated using

xs = x + r si ei,   (18.4)

where ei is the i-th coordinate direction. If the new point xs is outside the bounds, the i-th component
of xs is adjusted to be a random point in the interval xLi to xUi. The function value fs is then evaluated. If
fs ≤ f, then the point is accepted by setting x = xs. If fs < fmin, then fmin and xmin are updated. If fs > f,
then the new point is accepted with a probability of
 
p = exp((f − fs)/T).   (18.5)

To this end, a new random number r is generated. If r < p, then xs is accepted. This is referred to as the
Metropolis criterion. Whenever a rejection takes place, the acceptance ratio ai is updated. The acceptance
ratio ai is the ratio of the number of acceptances to the total number of trials for direction i. At the
end of NC cycles, the value of the acceptance ratio ai is used to update the step size si for direction i.
A low value implies that there are more rejections, suggesting that the step size should be reduced. A high
value indicates more acceptances, which may be due to a small step size; in this case, the step size is to be
increased. If ai = 0.5, the current step size is adequate, with the number of acceptances at the same level as
that of rejections. Once again, drawing from the work of Metropolis on Monte Carlo simulations of fluids,
the idea is to adjust the steps so that the ratio of acceptances to rejections approaches 1.
To update the step size, a step multiplication factor is introduced such that

si = si g(ai),   (18.6)

where

g(ai) = 1 + c (ai − 0.6)/0.4          for ai > 0.6,
g(ai) = 1 / (1 + c (0.4 − ai)/0.4)    for ai < 0.4,   (18.7)
g(ai) = 1                              otherwise,

where c = 2 (Corana et al., 1987). Sometimes the step size may take low values and may not permit a step
into a higher-energy state even at moderate temperatures. A resetting scheme is sometimes introduced,
in which, at the end of each temperature step, each step size is reset to the value sT (e.g.,
sT = 1).
For convergence one can use the variation between two consecutive minimum values of the function fmin,
such that

|fmin^(k−1) − fmin^(k)| ≤ εa + εr |fmin^(k)|,   (18.8)

where εa and εr are absolute and relative tolerances. Let us incorporate these elements into an algorithm,
which is presented in the next section.

18.3 Algorithm
The procedure presented in the previous section can be summarized as follows:

Step 1. Select T = 10 and a feasible design x(0) with f(0). Select NC, NT and rT. Initialize xmin = x(0),
        fmin = f(0), ai = 1, sT = 1, si = sT, K = 1, k = 1, and i = 1.

Step 2. Generate a new point xs(k) using (18.4). Adjust the point if it is infeasible by generating a random
        number between xLi and xUi. Compute fs(k).

Step 3. If fs(k) ≤ f, accept the point. If fs(k) < fmin, update xmin = xs(k) and fmin = fs(k). Otherwise,
        accept the point with probability (18.5). If the point is rejected, then update the acceptance ratio
        ai.

Step 4. If i < n, then set i = i + 1 and go to Step 2. Otherwise, continue.

Step 5. Set i = 1. If k < NC, then set k = k + 1 and go to Step 2. Otherwise, continue.

Step 6. Set k = 1. If K < NT, then set K = K + 1 and update si according to (18.6), then go to Step 2.
        Otherwise, continue.

Step 7. If convergence is achieved as in (18.8), then stop. Otherwise, set K = 1, T(K) = rT T(K−1), and
        si = sT, then go to Step 2.
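
The steps above can be transcribed into a minimal MATLAB sketch for a one-dimensional problem. A
function handle f and bounds xL and xU are assumed, with the parameter values suggested above; this is a
sketch of the procedure, not a production implementation:

function [xmin,fmin]=sa_sketch(f,xL,xU)
T=10; rT=0.5;                        % temperature and reduction factor
NC=5; NT=5;                          % cycles and iterations per temperature
sT=1; s=sT; c=2;                     % step sizes and Corana constant
ea=1e-6; er=1e-6;                    % tolerances for (18.8)
x=xL+rand*(xU-xL); fx=f(x);          % feasible starting point
xmin=x; fmin=fx; fold=fx;
for outer=1:50                       % temperature loop (capped)
    for K=1:NT
        nacc=0; ntr=0;
        for k=1:NC
            r=2*rand-1;              % random number in [-1,1]
            xs=x+r*s;                % trial point, Eq. (18.4)
            if xs<xL || xs>xU        % bound correction
                xs=xL+rand*(xU-xL);
            end
            fs=f(xs); ntr=ntr+1;
            if fs<=fx || rand<exp((fx-fs)/T)   % Metropolis criterion, Eq. (18.5)
                x=xs; fx=fs; nacc=nacc+1;
                if fs<fmin, xmin=xs; fmin=fs; end
            end
        end
        a=nacc/ntr;                  % acceptance ratio
        if a>0.6                     % step-size update, Eqs. (18.6)-(18.7)
            s=s*(1+c*(a-0.6)/0.4);
        elseif a<0.4
            s=s/(1+c*(0.4-a)/0.4);
        end
    end
    if abs(fold-fmin)<=ea+er*abs(fmin), break; end   % convergence, Eq. (18.8)
    fold=fmin; T=rT*T; s=sT;         % cool down and reset the step size
end

For the example below, a call such as [x,fv] = sa_sketch(@(x) cos(x)+cos(0.5*x^2), 0, 10) typically
returns a point near one of the low minima; the exact trajectory depends on the random seed.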

Example. Let us consider the following optimization problem

min_x f(x) = cos(x) + cos(0.5x^2)
s.t.  0 ≤ x ≤ 10

and the initial point x = 5 with f = 1.282. Using an interval reduction technique, i.e., function fminbnd
in MATLAB (no initial point required), one obtains the local optimum x∗ = 2.587 with f∗ = −1.829. A
gradient-based technique, e.g., function fmincon in MATLAB (interior point), will find x∗ = 4.292 with
f∗ = −1.385. Using a genetic algorithm, i.e., function ga in MATLAB (initial population of 50 individuals
and 1000 iterations), will find x∗ = 2.587 with f∗ = −1.829. Using the simulated annealing method (i.e.,
function simulannealbnd in MATLAB), the solution x∗ = 9.705 with f∗ = −1.961 is found after 1033
iterations. Let us perform the first couple of iterations.
The initial minimum values will be xmin = 5 and fmin = 1.2815. Let us assume NC = 3 (number of
cycles per temperature), NT = 2 (number of iterations per temperature), and rT = 0.5. Let the step size be
s = 1 and the initial temperature T = 10.

• For K = 1 at T = 10 and s = 1 we have:

  – For k = 1, a random number could be r(1) = −0.7929. Then, the first point is xs(1) = 4.2071
    (feasible) with fs(1) = −1.3233. This point improves the current result and updates xmin =
    4.2071 and fmin = −1.3233.
  – For k = 2, r(2) = 0.7471, xs(2) = 5.7471 (feasible), and fs(2) = 0.1678. This point improves f
    so it is accepted.
  – For k = 3, r(3) = 0.3436, xs(3) = 5.3436 (feasible), and fs(3) = 0.4505. This point improves f
    so it is accepted.

With no rejections, the step size is increased to

s = (1 + 2(1 − 0.6)/0.4) × 1 = 3.

• For K = 2 at T = 10 and s = 3 we have:

  – For k = 1, r(1) = 0.5322, xs(1) = 6.5965 (feasible), and fs(1) = −0.0213. This point improves
    f so it is accepted.
  – For k = 2, r(2) = −0.0579, xs(2) = 4.8262 (feasible), and fs(2) = 0.7191. This point improves
    f so it is accepted.
  – For k = 3, r(3) = −0.6330, xs(3) = 3.1009 (feasible), and fs(3) = −0.9039. This point improves
    f so it is accepted.

Convergence cannot be checked yet. Now, the temperature is reduced by a factor of rT = 0.5 to T = 5.
Since there were no rejections, the step size is updated to s = 1 + 2(3 − 0.6)/0.4 = 13. The new point is
x = 4.2071 and f = −1.3233.

• For K = 1 at T = 5 and s = 13 we have:

  – For k = 1, r(1) = −0.0165, xs(1) = 4.7860 (feasible), and fs(1) = 0.5149. This point does not
    improve f. The probability of acceptance is p(1) = exp((−1.3258 − 0.5149)/5) = 0.6920. A
    new random number rp(1) = 0.5805 is drawn and the point is rejected. The acceptance ratio is
    updated to a = 1 − 1/3 = 2/3.
  – For k = 2, r(2) = 0.7997 and xs(2) = 15.3958, which is infeasible and needs correction. Then a
    random number is generated between 0 and 10, so xs(2) = 3.7371 and fs(2) = −0.0629. This
    point does not improve f. Then, p(2) = 0.7768 and rp(2) = 0.9779, and the point is accepted.
  – For k = 3, r(3) = 0.3338, xs(3) = 9.3391 (feasible), and fs(3) = −0.0651. This point does not
    improve f. Then, p(3) = 0.7771 and rp(3) = 0.1280, so the point is rejected and a = 2/3 − 1/3 =
    1/3.

The new acceptance ratio is a = 1/3, so the step size is updated to

s = 13 / (1 + 2(0.4 − 1/3)/0.4) = 9.75.

• For K = 2 at T = 5 and s = 9.75 the process continues in the same fashion until convergence is
  achieved.
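
The solver comparison above can be reproduced along the following lines (Optimization and Global
Optimization Toolboxes assumed; the exact solutions returned depend on options and random seeds):

f = @(x) cos(x) + cos(0.5*x.^2);
x1 = fminbnd(f,0,10);                        % interval reduction
x2 = fmincon(f,5,[],[],[],[],0,10);          % gradient-based, x0 = 5
x3 = ga(f,1,[],[],[],[],0,10);               % genetic algorithm
x4 = simulannealbnd(f,5,0,10);               % simulated annealing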

Chapter 19

More global optimization methods∗

19.1 Other stochastic methods∗


1. Ant colony optimization (ACO). Proposed by Dorigo (1992).

2. Cross-entropy method (CE). Proposed by Rubinstein (1997).

3. Evolution strategies (ES)

4. Extremal optimization (EO)

5. Genetic programming (GP)

6. Interactive genetic algorithms (IGA)

7. Memetic algorithm (MA)

8. Tabu search (TS)

19.2 Deterministic methods∗


Not all global optimization methods are stochastic or based on Monte Carlo techniques. Some methods
are deterministic. Exhaustive search over the feasible space is an example. Other deterministic methods
include:

1. Covering. Exhaustive search is a covering method.

2. Zooming

3. Generalized descent methods

• Trajectory

• Penalty methods
• Golf methods

4. Tunneling

Part VII

Multiobjective Optimization

Chapter 20

Pareto Optimality

20.1 Problem statement


Engineering design optimization often involves minimizing or maximizing multiple conflicting objective
functions. For example, in structural optimization one would like to design a structure of minimum mass and
maximum stiffness; in manufacturing one would like to maximize the production volume while minimizing
the production cost. These problems are referred to as multiobjective optimization and can be expressed as

min_x f(x) = [f1(x), f2(x), . . . , fm(x)]^T   (20.1)
s.t.  x ∈ Ω,
or alternatively as

min_x f1(x)
min_x f2(x)
   ...           (20.2)
min_x fm(x)
s.t.  x ∈ Ω.
If the objective functions fi(x) do not conflict with one another, the problem might have a single optimum
x∗ that minimizes all the objectives simultaneously. However, it is more common to find conflicting objec-
tives. In this case, there might be several good points that represent a compromise among the objective
functions. Out of all possible points, some of them will be efficient, and their corresponding function vectors
will be nondominated.
Vilfredo Pareto (1848–1923), an Italian sociologist, economist, and philosopher, made important contributions
in the study of income distribution and in the analysis of individuals' choices. In his work (Pareto,
1906) he introduced the concept of Pareto optimality. Let us define the concepts of efficiency, dominance
and Pareto optimality in the following section.

20.2 Concepts
20.2.1 Efficiency
A feasible point x∗ is efficient for (20.1) if there is no other feasible point that reduces at least one objective
function without increasing another one. This can be defined more precisely as follows:
A point x∗ ∈ Ω is efficient if and only if there is no other point x ∈ Ω such that

fi(x) ≤ fi(x∗) for all i = 1, . . . , m

with at least one fi(x) < fi(x∗). Otherwise, x∗ is inefficient. The set of all efficient points is referred to as
the efficient frontier.

20.2.2 Dominance
All the function vectors f can be represented as points in a space defined by the objective functions fi. This
space is referred to as the criterion space or the cost space. The projection of all feasible points Ω into the
criterion space forms the feasible criterion space Ω′. The dominance of the different solutions can be
defined in that space as follows:
A function vector f∗ ∈ Ω′ is nondominated if and only if there is no other function vector f ∈ Ω′ such
that

fi ≤ fi∗ for all i = 1, . . . , m

with at least one fi < fi∗. Otherwise, f∗ is dominated.
In other words, the projection of an efficient point in the design space to the criterion space is a nondominated
point. The set of all nondominated points is referred to as the Pareto frontier.
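
These definitions translate directly into a pairwise comparison. A minimal MATLAB sketch of a non-
dominance filter follows (the function name is illustrative); each row of F is a function vector in the
criterion space:

function nd=nondominated(F)
% nd(i) is true if no other row of F dominates row i
p=size(F,1);
nd=true(p,1);
for i=1:p
    for j=1:p
        if j~=i && all(F(j,:)<=F(i,:)) && any(F(j,:)<F(i,:))
            nd(i)=false;             % row i is dominated by row j
            break
        end
    end
end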

20.2.3 Pareto optimal


An efficient point is also referred to as a Pareto efficient or Pareto optimal point. One can also define a
weakly Pareto optimal point as one for which there is no point that improves all the objective functions
simultaneously; however, there may be points that improve some of the objectives while keeping the others
unchanged. More precisely:
A point x∗ ∈ Ω is weakly Pareto optimal if and only if there is no other point x ∈ Ω such that

fi(x) < fi(x∗) for all i = 1, . . . , m.

To illustrate the main concepts involved in multi-objective optimization, let us consider the following
problem from Belegundu & Chandrupatla (1999).

Example. Each unit of product Y requires two hours of machining in the first cell and one hour in the
second cell. Each unit of product Z requires three hours of machining in the first cell and four hours in the
second cell. There are 12 available machining hours in each cell. Each unit of Y yields a profit of $ 0.80
and each unit of Z yields $ 2.00. It is desired to determine the number of units of Y and Z to manufacture to
maximize both:

1. total profit, and

2. consumer satisfaction, by producing as many units of the superior-quality product Y.

If x1 and x2 denote the number of units of Y and Z, respectively, then the problem can be written as

min_x (f1, f2) = (−0.8x1 − 2x2, −x1)
s.t. 2x1 + 3x2 ≤ 12        (20.3)
     x1 + 4x2 ≤ 12
     x1, x2 ≥ 0.

In a contour plot, the feasible space is defined by the vertices (0, 0), (0, 3), (2.4, 2.4), and (6, 0). The
solution for f1 only is (2.4, 2.4), while the one for f2 only is (6, 0). Every point in the design space (x1, x2)
has a corresponding point in the objective space (f1(x), f2(x)), which is also called the criterion or cost
space. In particular, the feasible criterion space is defined by the vertices (0, 0), (−6, 0), (−6.72, −2.4), and
(−4.8, −6). Observe that no point on the segment between the two optima is “better” than any other point
on the segment with respect to both objectives. These points are referred to as Pareto points and satisfy
Pareto optimality.

20.3 Generation of the Pareto frontier


Two classic approaches to generate the Pareto frontier are the weighted sum method and the constrained
objectives method. In the first case, the problem can be expressed using weighting parameters ωi such that

min_x f(x) = ω1 f1(x) + ω2 f2(x) + · · · + ωm fm(x)   (20.4)
s.t.  x ∈ Ω,

where ωi ≥ 0 and Σ_{i=1}^m ωi = 1. The other popular approach is to designate one of the objectives as the

primary objective and constrain the values of the other ones. This is

min_x f1(x)
s.t.  f2(x) ≤ c2
      ...           (20.5)
      fm(x) ≤ cm
      x ∈ Ω,
where ci are arbitrary values. A specific set of ωi or ci results in a single solution x∗, while the global picture
of the optimum solutions is obtained over a wide range of values for ωi or ci. That global picture will help
us to find the single best compromise solution to the problem. However, the weighting approach will not
generate the entire curve for nonconvex problems (Koski, 1985). Also, for certain choices of constraint
limits there may be no feasible solution. Several other techniques to generate the Pareto frontier have
been proposed. Some of the evolutionary techniques presented by Arora (2004) include: Vector Evaluated
Genetic Algorithm, Ranking, Pareto Fitness Function, Pareto-Set Filter, Elitist Strategy, and Niche Tech-
niques. Other techniques include: weighted minimax, weighted global criterion, lexicographic, bounded
objective function, goal programming, homotopy methods, Normal-Boundary Intersection, and Multilevel
Programming.
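
Since problem (20.3) is linear, the weighted sum method (20.4) reduces to a sequence of linear programs.
A minimal sketch using linprog (Optimization Toolbox assumed), sweeping ω1 from 0 to 1:

A=[2 3; 1 4]; b=[12; 12];            % machining-hour constraints
pareto=[];
for w1=0:0.05:1
    c=w1*[-0.8; -2]+(1-w1)*[-1; 0];  % weighted objective coefficients
    x=linprog(c,A,b,[],[],[0;0],[]);
    pareto=[pareto; x'];             % each weight yields one efficient point
end

Consistent with the remark above, for this linear problem the sweep returns only the extreme points of the
efficient set, such as (2.4, 2.4) and (6, 0); the edge connecting them is optimal only at the critical weight.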

20.4 Single best compromise Pareto solution


20.4.1 Utopia point
This is a unique point f^o in the criterion space such that fi^o = min{fi(x)} for i = 1, . . . , m over all x ∈ Ω.
It is also called the ideal point. The utopia point exists only in the criterion space and, in general, it is not
attainable.

20.4.2 The minimax method


There are quantitative criteria to select a single best compromise Pareto design. The most popular criterion
is the minimax method. Consider a Pareto point x with corresponding (f1, . . . , fm) as its coordinates in the
criterion space. Let (f1^o, . . . , fm^o) be the coordinates of the utopia point. Then one can define the
deviations zi = |fi − fi^o| for i = 1, . . . , m. The minimax approach seeks a single Pareto point x∗ that
minimizes the largest deviation. The problem is to determine x∗ from

min_x max{z1, . . . , zm}   (20.6)
s.t.  x ∈ Ω.
e
This can be reformulated with an auxiliary variable xn+1 as

min xn+1   (20.7)
s.t. zi − xn+1 ≤ 0,  i = 1, . . . , m
     x ∈ Ω.
e
Example. For the previous example, the best compromise Pareto solution can be obtained from

min_x max{ |0.8x1 + 2x2 − 6.72|/6.72, |x1 − 6|/6 }
s.t. 2x1 + 3x2 ≤ 12        (20.8)
     x1 + 4x2 ≤ 12
     x1, x2 ≥ 0.

In this case the deviations are normalized. This gives x∗ = (4.84, 0.77)^T with f∗ = (5.412, 4.84)^T.
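
Problem (20.8) can be checked numerically with fminimax (Optimization Toolbox assumed), which
minimizes the maximum entry of the vector of normalized deviations subject to the linear constraints:

z = @(x) [abs(0.8*x(1)+2*x(2)-6.72)/6.72;    % deviation from the f1 utopia value
          abs(x(1)-6)/6];                    % deviation from the f2 utopia value
A = [2 3; 1 4]; b = [12; 12];
x = fminimax(z,[1;1],A,b,[],[],[0;0],[]);    % returns approximately (4.84, 0.77)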
e e

Bibliography

Allen, M.B. & E.L. Isaacson. 1998. Numerical Analysis for Applied Science. United States of America: John
Wiley & Sons, Inc.

Arora, J.S. (ed.) . 2004. Introduction to Optimum Design. Elsevier Academic Press, second edn.

Atkinson, K.E. 1978. An Introduction to Numerical Analysis. New York: Wiley.

Bartle, R.G. 1976. The Elements of Real Analysis. New York: Wiley, second edn.

Bazaraa, M.S., H.D. Sherali & C.M. Shetty. 2006. Nonlinear Programming. United States of America:
Wiley, third edn.

Belegundu, A.D. & T.R. Chandrupatla (eds.) . 1999. Optimization Concepts and Applications in Engineer-
ing. Prentice Hall.

Bertsekas, D.P. 2008. Nonlinear Programming. United States of America: Athena Scientific, second edn.

Biggs, M.C. 1975. Constrained Minimization Using Recursive Quadratic Programming. In L.C.W. Dixon &
G.P. Szegő (eds.), Towards Global Optimization. Amsterdam: North-Holland, 341–349.

Binmore, K.G. 1982. Numerical Analysis: A Straightforward Approach. Cambridge, UK: Cambridge Uni-
versity Press, second edn.

Burden, R.L. & J.D. Faires. 2005. Numerical Analysis. United States of America: Thomson, eighth edn.

Cauchy, A. 1847. Méthode générale pour la résolution des systèmes d'équations simultanées. Comptes rendus,
Ac. Sci. Paris 25, 536–538.

Chambers, L. (ed.) . 1995. Practical handbook of genetic algorithms. CRC Press.

Chong, E.K.P. & S.H. Żak. 2001. An Introduction to Optimization. John Wiley & Sons, Inc., second edn.

Corana, A.M., M. Marchesi, C. Martini & S. Ridella. 1987. Minimizing multimodal functions of continuous
variables with simulated annealing algorithm. ACM Transactions on Mathematical Software 13, 262–280.

Dantzig, G.B. 1951. Maximization of a linear function of variables subject to linear inequalities. In T.C.
Koopmans (ed.) Activity Analysis of Production and Allocation, chap. 21, New York: Wiley.

Dantzig, G.B. 1959. Linear Programming and Extensions. Princeton University Press.

Dorigo, M. 1992. Learning and Natural Algorithms. PhD thesis, Politecnico di Milano, Italy.

Fletcher, R. & C.M. Reeves. 1964. Function minimization by conjugate-gradients. The Computer Journal 7,
149–154.

Goldstine, H.H. 1972. The Computer: from Pascal to von Neumann. Princeton. New Jersey: Princeton
University Press.

Greenberg, M.D. 1998. Advanced Engineering Mathematics. New Jersey: Prentice Hall, second edn.

Han, S.P. 1977. A globally convergent method for nonlinear programming. Optimization Theory and Appli-
cations 22, 297–309.

Holland, J. H. 1975. Adaptation in Natural and Artificial Systems. The University of Michigan Press.

Kennedy, J. & R. Eberhart. 1995. Particle swarm optimization. In Proceedings of the IEEE International
Conference on Neural Networks, 1942–1948.

Kirkpatrick, S., C.D. Gelatt & M. P. Vecchi. 1983. Optimization by simulated annealing. Science 220, 671–
680.

Koski, J. 1985. Defectiveness of weighting method in multicriterion optimization of structures. Communi-
cations in Applied Numerical Methods 1, 333–337.

Lagarias, J.C., J.A. Reeds, M.H. Wright & P.E. Wright. 1998. Convergence properties of the Nelder–Mead
simplex method in low dimensions. SIAM Journal on Optimization 9, 112–147.

Livio, M. 2002. The Golden Ratio: The Story of Phi, the World’s Most Astonishing Number. New York:
Broadway Books.

Luenberger, D.G. 1989. Linear and Nonlinear Programming. Addison-Wesley Publishing Company, second
edn.

Luenberger, D.G. & Y. Ye. 2008. Linear and Nonlinear Programming. Springer, third edn.

Marquardt, D. 1964. An algorithm for least-squares estimation of nonlinear parameters. SIAM Journal on
Applied Mathematics 11, 431–441.

Metropolis, N., A.W. Rosenbluth, M.N. Rosenbluth, A.H. Teller & E. Teller. 1953. Equation of state calcula-
tions by fast computing machines. Journal of Chemical Physics 21, 1087–1092.

Nelder, J.A. & R. Mead. 1965. A simplex method for function minimization. Computer Journal 7, 308–313.

Nocedal, J. & S.J. Wright. 1999. Numerical Optimization. United States of America: Springer.

Pareto, V. (ed.) . 1906. Manuale di economia politica. Milan: Società editrice libraria. Revised and translated
into French as Manuel d'économie politique. Paris: Giard et Brière, 1909. English translation: Manual of
Political Economy. New York: Kelley, 1971.

Polak, E. & G. Ribière. 1969. Note sur la convergence de méthodes de directions conjuguées. Revue Française
d'Informatique et de Recherche Opérationnelle 16, 35–43.

Powell, M.J.D. 1978. The Convergence of Variable Metric Methods for Nonlinearly Constrained Optimiza-
tion Calculations. Academic Press.

Rao, S.S. 2009. Engineering Optimization: Theory and Practice. United States of America: John Wiley &
Sons, fourth edn.

Reeves, C.R. & J.E. Rowe. 2003. Genetic Algorithms—Principles and Perspectives. Kluwer Academic Pub-
lishers.

Reklaitis, G.V., A. Ravindran & K.M. Ragsdell. 1983. Engineering Optimization: Methods and Applications.
John Wiley and Sons.

Rubinstein, R.Y. 1997. Optimization of computer simulation models with rare events. European Journal of
Operations Research 99, 89–112.

Černý, V. 1985. A thermodynamical approach to the travelling salesman problem: an efficient simulation
algorithm. Journal of Optimization Theory and Applications 45, 41–45.

van Laarhoven, P.J.M. & E.H.L. Aarts. 1988. Simulated Annealing: Theory and Applications. The Nether-
lands: Kluwer Academic Publishers Inc.

Wolfe, P. 1959. The simplex method for quadratic programming. Econometrica 27, 382–398.

