Matrix Decomposition and Its Application in Statistics
Nishith Kumar
Lecturer
Department of Statistics
Begum Rokeya University, Rangpur.
Email: [email protected]
Overview
• Introduction
• LU decomposition
• QR decomposition
• Cholesky decomposition
• Jordan Decomposition
• Spectral decomposition
• Singular value decomposition
• Applications
Introduction
This lecture covers relevant matrix decompositions, basic numerical methods, their computation, and some of their applications. Decompositions provide a numerically stable way to solve a system of linear equations, as shown already in [Wampler, 1970], and to invert a matrix. Additionally, they provide an important tool for analyzing the numerical stability of a system.
Easy to solve system (Cont.)
Some linear systems can be solved easily. If $A$ is diagonal, the system $Ax = b$ decouples, and the solution is
$$x = \left( b_1/a_{11},\; b_2/a_{22},\; \ldots,\; b_n/a_{nn} \right)^T.$$
Easy to solve system (Cont.)
Lower triangular matrix: a system $Lx = b$ with $L$ lower triangular is solved by forward substitution, computing $x_1$ first and working downward.
Easy to solve system (Cont.)
Upper triangular matrix: a system $Ux = b$ with $U$ upper triangular is solved by backward substitution, computing $x_n$ first and working upward.
LU Decomposition
LU decomposition was originally derived as a decomposition of quadratic and bilinear forms. Lagrange, in the very first paper in his collected works (1759), derived the algorithm we now call Gaussian elimination. Later, Turing introduced the LU decomposition of a matrix in 1948, which is used to solve systems of linear equations.
Reduce A to U (upper triangular) by elementary row operations: in general $U = E_k \cdots E_1 A$, hence $A = E_1^{-1} \cdots E_k^{-1} U$. For our example,
$$E_1 A = \begin{pmatrix} 1 & 0 & 0 \\ -2 & 1 & 0 \\ -1/2 & 0 & 1 \end{pmatrix} \begin{pmatrix} 6 & 2 & 2 \\ 12 & 8 & 6 \\ 3 & 13 & 2 \end{pmatrix} = \begin{pmatrix} 6 & 2 & 2 \\ 0 & 4 & 2 \\ 0 & 12 & 1 \end{pmatrix}$$
$$U = E_2 E_1 A = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & -3 & 1 \end{pmatrix} \begin{pmatrix} 6 & 2 & 2 \\ 0 & 4 & 2 \\ 0 & 12 & 1 \end{pmatrix} = \begin{pmatrix} 6 & 2 & 2 \\ 0 & 4 & 2 \\ 0 & 0 & -5 \end{pmatrix}$$
Calculation of L and U (cont.)
Now
$$E_1^{-1} E_2^{-1} = \begin{pmatrix} 1 & 0 & 0 \\ 2 & 1 & 0 \\ 1/2 & 0 & 1 \end{pmatrix} \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 3 & 1 \end{pmatrix} = \begin{pmatrix} 1 & 0 & 0 \\ 2 & 1 & 0 \\ 1/2 & 3 & 1 \end{pmatrix} = L$$
Therefore,
$$A = \begin{pmatrix} 6 & 2 & 2 \\ 12 & 8 & 6 \\ 3 & 13 & 2 \end{pmatrix} = \begin{pmatrix} 1 & 0 & 0 \\ 2 & 1 & 0 \\ 1/2 & 3 & 1 \end{pmatrix} \begin{pmatrix} 6 & 2 & 2 \\ 0 & 4 & 2 \\ 0 & 0 & -5 \end{pmatrix} = LU$$
If A is a nonsingular matrix, then for each L (unit lower triangular matrix) the upper triangular matrix U is unique, but an LU decomposition is not unique; there can be more than one LU decomposition of the same matrix. For example,
$$A = \begin{pmatrix} 6 & 2 & 2 \\ 12 & 8 & 6 \\ 3 & 13 & 2 \end{pmatrix} = \begin{pmatrix} 6 & 0 & 0 \\ 12 & 4 & 0 \\ 3 & 12 & -5 \end{pmatrix} \begin{pmatrix} 1 & 2/6 & 2/6 \\ 0 & 1 & 1/2 \\ 0 & 0 & 1 \end{pmatrix} = LU$$
Calculation of L and U (cont.)
Thus the LU decomposition is not unique. Since we compute the LU decomposition by elementary transformations, if we change L then U changes so that A = LU still holds.
LU Decomposition in R:
library(Matrix)
x <- matrix(c(3,2,1, 9,3,4, 4,2,5), ncol=3, nrow=3)
expand(lu(x))   # returns the factors of the decomposition (P, L, U)
Calculation of L and U (cont.)
• Note: there are also generalizations of LU to non-square and singular
matrices, such as rank revealing LU factorization.
• [Pan, C.T. (2000). On the existence and computation of rank revealing LU
factorizations. Linear Algebra and its Applications, 316: 199-222.
• Miranian, L. and Gu, M. (2003). Strong rank revealing LU factorizations.
Linear Algebra and its Applications, 367: 1-16.]
Solving system of linear equations using LU decomposition
Suppose we would like to solve an m×m system AX = b. If we can find an LU decomposition of A, then to solve AX = b it is enough to solve the two triangular systems LY = b (forward substitution) and UX = Y (backward substitution). Take
$$A = \begin{pmatrix} 6 & 2 & 2 \\ 12 & 8 & 6 \\ 3 & 13 & 2 \end{pmatrix} \quad \text{and} \quad b = \begin{pmatrix} 8 \\ 14 \\ -17 \end{pmatrix}$$
Solving system of linear equations using LU decomposition
We have seen A = LU, where
$$L = \begin{pmatrix} 1 & 0 & 0 \\ 2 & 1 & 0 \\ 1/2 & 3 & 1 \end{pmatrix}, \qquad U = \begin{pmatrix} 6 & 2 & 2 \\ 0 & 4 & 2 \\ 0 & 0 & -5 \end{pmatrix}$$
Then, solving LY = b by forward substitution,
$$Y = \begin{pmatrix} y_1 \\ y_2 \\ y_3 \end{pmatrix} = \begin{pmatrix} 8 \\ -2 \\ -15 \end{pmatrix}$$
Solving system of linear equations using LU decomposition
Now, we solve UX = Y by backward substitution:
$$\begin{pmatrix} 6 & 2 & 2 \\ 0 & 4 & 2 \\ 0 & 0 & -5 \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix} = \begin{pmatrix} 8 \\ -2 \\ -15 \end{pmatrix} \quad \text{then} \quad \begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix} = \begin{pmatrix} 1 \\ -2 \\ 3 \end{pmatrix}$$
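The same two-step solve can be reproduced with base R's forwardsolve() and backsolve(); a minimal sketch, entering the L and U factors computed above by hand:
L <- matrix(c(1, 2, 1/2,  0, 1, 3,  0, 0, 1), ncol = 3)   # unit lower triangular factor
U <- matrix(c(6, 0, 0,  2, 4, 0,  2, 2, -5), ncol = 3)    # upper triangular factor
b <- c(8, 14, -17)
y <- forwardsolve(L, b)   # solves LY = b  ->  y = (8, -2, -15)
x <- backsolve(U, y)      # solves UX = Y  ->  x = (1, -2, 3)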
QR Decomposition
The QR decomposition originated with Gram (1883). Later, Erhard Schmidt (1907) proved the QR decomposition theorem.
[Portraits: Jørgen Pedersen Gram (1850–1916) and Erhard Schmidt (1876–1959)]
QR-Decomposition (Cont.)
Theorem: If A is an m×n matrix with linearly independent columns, then A can be decomposed as A = QR, where Q is an m×n matrix whose columns form an orthonormal basis for the column space of A and R is a nonsingular upper triangular matrix.
Writing the columns of A as $u_1, \ldots, u_n$, the Gram–Schmidt vectors as $q_1, \ldots, q_n$, and $v_i = \|\hat q_i\|$:
$$u_1 = v_1 q_1$$
$$u_2 = v_2 q_2 + \langle u_2, q_1 \rangle q_1$$
$$u_3 = v_3 q_3 + \langle u_3, q_1 \rangle q_1 + \langle u_3, q_2 \rangle q_2$$
$$\vdots$$
$$u_n = v_n q_n + \langle u_n, q_1 \rangle q_1 + \langle u_n, q_2 \rangle q_2 + \cdots + \langle u_n, q_{n-1} \rangle q_{n-1}$$
QR-Decomposition (Cont.)
Let Q = [q1 q2 . . . qn], so Q is an m×n matrix whose columns form an orthonormal basis for the column space of A. Now,
$$A = (u_1 \; u_2 \; \cdots \; u_n) = (q_1 \; q_2 \; \cdots \; q_n) \begin{pmatrix} v_1 & \langle u_2,q_1\rangle & \langle u_3,q_1\rangle & \cdots & \langle u_n,q_1\rangle \\ 0 & v_2 & \langle u_3,q_2\rangle & \cdots & \langle u_n,q_2\rangle \\ 0 & 0 & v_3 & \cdots & \langle u_n,q_3\rangle \\ \vdots & & & \ddots & \vdots \\ 0 & 0 & 0 & \cdots & v_n \end{pmatrix}$$
i.e., A = QR, where R is the upper triangular matrix above.
Calculation of QR Decomposition
Example: let $A = (a_1, a_2, a_3)$ with $a_1 = (1,1,1,0)^T$, $a_2 = (1,0,1,0)^T$, $a_3 = (1,0,0,1)^T$; Step 1 gives $r_{11} = \|a_1\| = \sqrt{3}$ and $q_1 = a_1/\sqrt{3}$.
2nd Step: $r_{12} = q_1^T a_2 = 2/\sqrt{3}$
3rd Step:
$$\hat q_2 = a_2 - q_1 q_1^T a_2 = a_2 - r_{12} q_1 = \begin{pmatrix} 1 \\ 0 \\ 1 \\ 0 \end{pmatrix} - \frac{2}{3}\begin{pmatrix} 1 \\ 1 \\ 1 \\ 0 \end{pmatrix} = \begin{pmatrix} 1/3 \\ -2/3 \\ 1/3 \\ 0 \end{pmatrix}$$
$$r_{22} = \|\hat q_2\| = \sqrt{2/3}, \qquad q_2 = \frac{\hat q_2}{\|\hat q_2\|} = \begin{pmatrix} 1/\sqrt{6} \\ -2/\sqrt{6} \\ 1/\sqrt{6} \\ 0 \end{pmatrix}$$
Calculation of QR Decomposition
4th Step: $r_{13} = q_1^T a_3 = 1/\sqrt{3}$
5th Step: $r_{23} = q_2^T a_3 = 1/\sqrt{6}$
6th Step:
$$\hat q_3 = a_3 - q_1 q_1^T a_3 - q_2 q_2^T a_3 = a_3 - r_{13} q_1 - r_{23} q_2 = \begin{pmatrix} 1/2 \\ 0 \\ -1/2 \\ 1 \end{pmatrix}$$
$$r_{33} = \|\hat q_3\| = \sqrt{6}/2, \qquad q_3 = \frac{\hat q_3}{\|\hat q_3\|} = \begin{pmatrix} 1/\sqrt{6} \\ 0 \\ -1/\sqrt{6} \\ 2/\sqrt{6} \end{pmatrix}$$
Calculation of QR Decomposition
Therefore, A = QR:
$$\begin{pmatrix} 1 & 1 & 1 \\ 1 & 0 & 0 \\ 1 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix} = \begin{pmatrix} 1/\sqrt{3} & 1/\sqrt{6} & 1/\sqrt{6} \\ 1/\sqrt{3} & -2/\sqrt{6} & 0 \\ 1/\sqrt{3} & 1/\sqrt{6} & -1/\sqrt{6} \\ 0 & 0 & 2/\sqrt{6} \end{pmatrix} \begin{pmatrix} \sqrt{3} & 2/\sqrt{3} & 1/\sqrt{3} \\ 0 & 2/\sqrt{6} & 1/\sqrt{6} \\ 0 & 0 & \sqrt{6}/2 \end{pmatrix}$$
R code for QR Decomposition:
x <- matrix(c(1,2,3, 2,5,4, 3,4,9), ncol=3, nrow=3)
qrstr <- qr(x)
Q <- qr.Q(qrstr)   # orthonormal factor
R <- qr.R(qrstr)   # upper triangular factor, x = Q %*% R
Application of QR: least squares. In regression, the normal equations are $X^T X b = X^T Y$. Substituting $X = QR$:
$$X^T X b = X^T Y \;\Rightarrow\; R^T Q^T Q R\, b = R^T Q^T Y \;\Rightarrow\; R^T R\, b = R^T Q^T Y$$
Therefore,
$$(R^T)^{-1} R^T R\, b = (R^T)^{-1} R^T Q^T Y \;\Rightarrow\; R\, b = Q^T Y = Z,$$
which is solved for b by backward substitution.
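A short R sketch of this least-squares computation (the simulated X and y are purely illustrative); the result matches lm():
set.seed(1)
X <- matrix(rnorm(20), ncol = 2)   # hypothetical design matrix
y <- rnorm(10)
qrstr <- qr(X)
b <- backsolve(qr.R(qrstr), crossprod(qr.Q(qrstr), y))   # solves R b = Q^T y
as.vector(b)
coef(lm(y ~ X - 1))   # same coefficients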
Cholesky Decomposition
Cholesky died from wounds received on the battlefield on 31 August 1918, at 5 o'clock in the morning, in the north of France. After his death, one of his fellow officers, Commandant Benoît, published Cholesky's method of computing solutions to the normal equations for some least squares data-fitting problems in the Bulletin géodésique in 1924. It is now known as the Cholesky decomposition.
[Portrait: André-Louis Cholesky (1875–1918)]
Cholesky Decomposition
Theorem: If A is an n×n real, symmetric and positive definite matrix, then there exists a unique lower triangular matrix G with positive diagonal elements such that $A = GG^T$.
Example of Cholesky Decomposition
Suppose
$$A = \begin{pmatrix} 4 & 2 & -2 \\ 2 & 10 & 2 \\ -2 & 2 & 5 \end{pmatrix}$$
The algorithm: for k from 1 to n,
$$l_{kk} = \left( a_{kk} - \sum_{s=1}^{k-1} l_{ks}^2 \right)^{1/2},$$
and for j from k+1 to n,
$$l_{jk} = \left( a_{jk} - \sum_{s=1}^{k-1} l_{js}\, l_{ks} \right) \Big/ \, l_{kk}.$$
The Cholesky factor is then
$$L = \begin{pmatrix} 2 & 0 & 0 \\ 1 & 3 & 0 \\ -1 & 1 & \sqrt{3} \end{pmatrix}$$
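The recursion above translates directly into R; chol_lower below is a minimal sketch (not a library routine), checked against base R's chol(), which returns the upper factor:
chol_lower <- function(A) {
  n <- nrow(A)
  L <- matrix(0, n, n)
  for (k in 1:n) {
    L[k, k] <- sqrt(A[k, k] - sum(L[k, seq_len(k - 1)]^2))
    for (j in seq_len(n)[-seq_len(k)]) {   # j = k+1, ..., n
      L[j, k] <- (A[j, k] - sum(L[j, seq_len(k - 1)] * L[k, seq_len(k - 1)])) / L[k, k]
    }
  }
  L
}
A <- matrix(c(4,2,-2, 2,10,2, -2,2,5), ncol = 3)
chol_lower(A)   # the lower factor G
t(chol(A))      # base R agrees (chol() returns the upper factor)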
R code for Cholesky Decomposition
x <- matrix(c(4,2,-2, 2,10,2, -2,2,5), ncol=3, nrow=3)
cl <- chol(x)   # upper triangular factor, so x = t(cl) %*% cl
The related square-root-free factorization $A = LDL^T$ of the same matrix has
$$L = \begin{pmatrix} 1 & 0 & 0 \\ 1/2 & 1 & 0 \\ -1/2 & 1/3 & 1 \end{pmatrix} \quad \text{and} \quad D = \begin{pmatrix} 4 & 0 & 0 \\ 0 & 9 & 0 \\ 0 & 0 & 3 \end{pmatrix}$$
Application of Cholesky Decomposition
Cholesky decomposition is used to solve the system of linear equations Ax = b, where A is real, symmetric and positive definite.
In regression analysis it can be used to estimate the parameters when $X^T X$ is positive definite.
Characteristic Roots and Characteristic Vectors
Any nonzero vector x is said to be a characteristic vector of a matrix A if there exists a number λ such that Ax = λx; λ is then called a characteristic root of A corresponding to x.
Jordan Decomposition
Camille Jordan (1870)
Let A be any n×n matrix. Then there exists a nonsingular matrix P such that
$$P^{-1} A P = \begin{pmatrix} J_{k_1}(\lambda_1) & 0 & \cdots & 0 \\ 0 & J_{k_2}(\lambda_2) & \cdots & 0 \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & J_{k_r}(\lambda_r) \end{pmatrix}$$
where $J_k(\lambda)$ is the k×k Jordan block
$$J_k(\lambda) = \begin{pmatrix} \lambda & 1 & 0 & \cdots & 0 \\ 0 & \lambda & 1 & \cdots & 0 \\ \vdots & & \ddots & \ddots & \vdots \\ 0 & 0 & 0 & \cdots & \lambda \end{pmatrix}$$
and $k_1 + k_2 + \cdots + k_r = n$. The $\lambda_i$, i = 1, 2, . . ., r, are the characteristic roots, and the $k_i$ are the algebraic multiplicities of the $\lambda_i$.
The Jordan decomposition is used in differential equations and time series analysis.
[Portrait: Camille Jordan (1838–1921)]
Spectral Decomposition
[Portrait: A. L. Cauchy (1789–1857)]
Let A be an m × m real symmetric matrix. Then there exists an orthogonal matrix P such that $P^T A P = \Lambda$, or $A = P \Lambda P^T$, where Λ is a diagonal matrix.
Spectral Decomposition and Principal Component Analysis (Cont.)
By using the spectral decomposition we can write $A = P \Lambda P^T$. For the principal components $Y = P^T X$, the variance of the i-th component is the i-th characteristic root, so
$$\sum_{i} V(Y_i) = \sum_{i} \lambda_i.$$
R code for Spectral Decomposition
x <- matrix(c(1,2,3, 2,5,4, 3,4,9), ncol=3, nrow=3)
eigen(x)   # characteristic roots ($values) and characteristic vectors ($vectors)
Applications (see the sketch following this list):
• Data reduction
• Image processing and compression
• K selection for K-means clustering
• Multivariate outlier detection
• Noise filtering
• Trend detection in the observations
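As a sketch of the data-reduction application, PCA can be run through the spectral decomposition of a covariance matrix; the iris data used here is just an illustrative choice:
X <- scale(as.matrix(iris[, 1:4]), center = TRUE, scale = FALSE)   # centered data
S <- cov(X)                      # real symmetric, so S = P Lambda P^T
eig <- eigen(S)
Y <- X %*% eig$vectors           # principal component scores
eig$values / sum(eig$values)     # proportion of variance explained by each PC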
Historical background of SVD
There are five mathematicians who were responsible for establishing the existence of the singular value decomposition and developing its theory.
[Portraits: Eugenio Beltrami (1835–1899), Camille Jordan (1838–1921), James Joseph Sylvester (1814–1897), Erhard Schmidt (1876–1959), Hermann Weyl (1885–1955)]
The singular value decomposition was originally developed by two mathematicians in the mid-to-late 1800s: Eugenio Beltrami and Camille Jordan. Several other mathematicians took part in its final development, including James Joseph Sylvester, Erhard Schmidt and Hermann Weyl, who studied the SVD into the mid-1900s.
Singular Value Decomposition (Cont.)
Theorem (Singular Value Decomposition): Let X be m×n of rank r, r ≤ n ≤ m. Then there exist matrices U, V and a diagonal matrix Λ with positive diagonal elements such that $X = U \Lambda V^T$.
Proof: Since X is m×n of rank r, r ≤ n ≤ m, both $XX^T$ and $X^TX$ are of rank r (by using the concept of the Grammian matrix) and of dimension m×m and n×n respectively. Since $XX^T$ is a real symmetric matrix, by spectral decomposition we can write
$$XX^T = Q D Q^T$$
where Q and D are, respectively, the matrices of characteristic vectors and corresponding characteristic roots of $XX^T$. Again, since $X^TX$ is a real symmetric matrix, by spectral decomposition we can write
$$X^T X = R M R^T$$
Singular Value Decomposition (Cont.)
where R is the (orthogonal) matrix of characteristic vectors and M is the diagonal matrix of the corresponding characteristic roots.
Since $XX^T$ and $X^TX$ are both of rank r, only r of their characteristic roots are positive, the remaining being zero. Hence we can write
$$D = \begin{pmatrix} D_r & 0 \\ 0 & 0 \end{pmatrix}, \qquad M = \begin{pmatrix} M_r & 0 \\ 0 & 0 \end{pmatrix}.$$
Singular Value Decomposition (Cont.)
We know that the nonzero characteristic roots of $XX^T$ and $X^TX$ are equal, so $D_r = M_r$.
Partition Q, R conformably with D and M, respectively, i.e., $Q = (Q_r, Q_*)$, $R = (R_r, R_*)$, such that $Q_r$ is m × r, $R_r$ is n × r, and they correspond respectively to the nonzero characteristic roots of $XX^T$ and $X^TX$. Now take
$$U = Q_r, \qquad V = R_r, \qquad \Lambda = D_r^{1/2} = \mathrm{diag}(d_1^{1/2}, d_2^{1/2}, \ldots, d_r^{1/2})$$
where $d_i$, i = 1, 2, . . ., r, are the positive characteristic roots of $XX^T$ and hence those of $X^TX$ as well (by using the concept of the Grammian matrix).
Singular Value Decomposition (Cont.)
Now define $S = Q_r D_r^{1/2} R_r^T$. Then $S^T S = X^T X$; similarly, $SS^T = XX^T$.
From the first relation above we conclude that for an arbitrary orthogonal matrix, say $P_1$, $S = P_1 X$, while from the second we conclude that for an arbitrary orthogonal matrix, say $P_2$, we must have $S = X P_2$.
R Code for Singular Value Decomposition
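A minimal version in base R (a sketch, using the same matrix as in the earlier examples):
x <- matrix(c(1,2,3, 2,5,4, 3,4,9), ncol=3, nrow=3)
s <- svd(x)                     # s$u, s$d (singular values), s$v
s$u %*% diag(s$d) %*% t(s$v)    # reconstructs x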
Decomposition in Diagram
[Flowchart: a matrix A is classified as rectangular or square. Rectangular: LU decomposition (not always unique); with full column rank, QR decomposition. Square, symmetric and positive definite: Cholesky decomposition. Square with AM = GM: similar diagonalization $P^{-1}AP = \Lambda$ / spectral decomposition; with AM > GM: Jordan decomposition. SVD applies to any matrix.]
Properties of SVD
Low rank Approximation
Theorem: If $A = U \Lambda V^T$ is the SVD of A and the singular values are sorted as $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_n$, then for any l < r, the best rank-l approximation to A is
$$\tilde A = \sum_{i=1}^{l} \lambda_i u_i v_i^T, \qquad \|A - \tilde A\|^2 = \sum_{i=l+1}^{r} \lambda_i^2.$$
Low-rank Approximation
• SVD can be used to compute optimal low-rank approximations.
• The approximation of A is à of rank k such that
$$\tilde A = \arg\min_{X : \mathrm{rank}(X) = k} \|A - X\|_F \qquad \text{(Frobenius norm)}$$
where
$$\|A\|_F = \sqrt{ \sum_{i=1}^{m} \sum_{j=1}^{n} a_{ij}^2 }.$$
If $d_1, d_2, \ldots, d_n$ are the characteristic roots of $A^T A$, then $\|A\|_F^2 = \sum_{i=1}^{n} d_i$, and
$$\min_{X : \mathrm{rank}(X) = k} \|A - X\|_F^2 = \|A - \tilde A\|_F^2 = \sum_{i=k+1}^{n} d_i.$$
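A short R sketch of the truncated-SVD approximation (the random matrix is only an illustration); the squared Frobenius error equals the sum of the discarded squared singular values:
low_rank <- function(A, k) {
  s <- svd(A)
  s$u[, 1:k, drop = FALSE] %*% diag(s$d[1:k], k, k) %*% t(s$v[, 1:k, drop = FALSE])
}
A <- matrix(rnorm(30), nrow = 6)
A2 <- low_rank(A, 2)
norm(A - A2, "F")^2       # equals...
sum(svd(A)$d[-(1:2)]^2)   # ...the sum of the discarded squared singular values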
Row approximation and column approximation
Suppose $R_i$ and $C_j$ represent the i-th row and j-th column of A. The SVDs of A and $\tilde A$ are
$$A = U \Lambda V^T = \sum_{k=1}^{r} \lambda_k u_k v_k^T, \qquad \tilde A = U_l \Lambda_l V_l^T = \sum_{k=1}^{l} \lambda_k u_k v_k^T.$$
We can approximate the i-th row by $R_i^l = \sum_{k=1}^{l} u_{ik} \lambda_k v_k^T$; we can also approximate $C_j$ by
$$C_j^l = \sum_{k=1}^{l} v_{jk} \lambda_k u_k, \qquad l < r.$$
Least square solution in an inconsistent system
By using the SVD we can solve an inconsistent system; this gives the least squares solution
$$\min_{x} \|Ax - b\|^2.$$
The least squares solution is $x = A^g b$, where $A^g$ is the generalized inverse of A. The SVD of $A^g$ is $A^g = V \Lambda^{-1} U^T$, where U, Λ and V are as in the SVD $A = U \Lambda V^T$.
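A sketch of this SVD-based least squares solve in R (the small inconsistent system is made up for illustration); it agrees with qr.solve():
svd_lsq <- function(A, b) {
  s <- svd(A)
  pos <- s$d > 1e-10 * max(s$d)   # keep only the numerically nonzero singular values
  s$v[, pos, drop = FALSE] %*% ((1 / s$d[pos]) * crossprod(s$u[, pos, drop = FALSE], b))
}
A <- matrix(c(1, 1, 1, 1, 2, 3), ncol = 2)   # overdetermined, inconsistent system
b <- c(1, 2, 2)
svd_lsq(A, b)
qr.solve(A, b)   # same least squares solution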
SVD based PCA
If $X = U \Lambda V^T$, we can write $XV = U\Lambda$. Suppose $Y = XV = U\Lambda$. Then the first column of Y contains the first principal component scores, and so on.
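A minimal R sketch of SVD-based PCA on centered data (again using iris purely as an illustration); the scores XV match UΛ:
X <- scale(as.matrix(iris[, 1:4]), center = TRUE, scale = FALSE)
s <- svd(X)
Y <- X %*% s$v                              # principal component scores
all.equal(unname(Y), s$u %*% diag(s$d))     # identical to U Lambda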
Origin of biplot
• Gabriel (1971): one of the most important advances in data analysis in recent decades.
• Currently: more than 50,000 web pages; numerous academic publications; included in most statistical analysis packages; still a very new technique to most scientists.
[Portrait: Prof. Ruben Gabriel, "the founder of the biplot". Courtesy of Prof. Purificación Galindo, University of Salamanca, Spain]
What is a biplot?
• “Biplot” = “bi” + “plot”
– “plot”
• scatter plot of two rows OR of two columns, or
• scatter plot summarizing the rows OR the columns
– “bi”
• BOTH rows AND columns
• 1 biplot >> 2 plots
Practical definition of a biplot
"Any two-way table can be analyzed using a 2D-biplot as soon as it can be sufficiently approximated by a rank-2 matrix." (Gabriel, 1971)
(Now 3D-biplots are also possible…)
Matrix decomposition: P(4×3) = G(4×2) E(2×3). For the G-by-E table P below, the genotype scores G and environment scores E are:

P (G-by-E table):
        e1   e2   e3
  g1    20    9    6
  g2    -6   12   15
  g3   -10    6    9
  g4     8   12   12

G (genotype scores):         E (environment scores):
        x    y                      e1   e2   e3
  g1    4    3                x      2    3    3
  g2    3   -3                y      4   -1   -2
  g3    1   -3
  g4    4    0

[Scatter plot: genotypes G1–G4 and environments E1–E3 plotted on the two axes (x, y)]
Singular Value Decomposition (SVD) & Singular Value Partitioning (SVP)
The "rank" of Y is the minimum number of PCs required to fully represent Y. The SVD factors Y into a matrix characterising the rows, the "singular values", and a matrix characterising the columns; the SVP allocates the singular values between the two:
$$y_{ij} = \sum_{k=1}^{r} (u_{ik} \lambda_k^{f}) (\lambda_k^{1-f} v_{kj})$$
with row scores $u_{ik}\lambda_k^f$ and column scores $\lambda_k^{1-f} v_{kj}$; f = 1 emphasises the rows, f = 0 the columns, and f = 1/2 treats them symmetrically.
The simplest biplot shows the first two PCs together with the projections of the axes of the original variables:
• the x-axis represents the scores for the first principal component, the y-axis the scores for the second;
• the original variables are represented by arrows which graphically indicate the proportion of the original variance explained by the first two principal components;
• the direction of the arrows indicates the relative loadings on the first and second principal components.
Biplot of Iris Data
[Biplot of the iris data: component scores on Comp. 1 (x-axis) and Comp. 2 (y-axis), with arrows for Sepal L., Sepal W., Petal L. and Petal W.; 1 = Setosa, 2 = Versicolor, 3 = Virginica]
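A biplot like this one can be reproduced in base R (a sketch; species are coded 1–3 as in the figure):
pc <- princomp(iris[, 1:4])
biplot(pc, xlabs = as.character(as.numeric(iris$Species)))   # 1 = Setosa, 2 = Versicolor, 3 = Virginica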
Image Compression Example
[Pansy flower image, collected from http://www.ats.ucla.edu/stat/r/code/pansy.jpg]
Singular values of flowers image
[Plot: singular values of the flower image]
Low rank Approximation to flowers image
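A sketch of how such low-rank reconstructions can be computed, assuming the jpeg package and a local copy of the image (the file name is hypothetical):
library(jpeg)
img <- readJPEG("pansy.jpg")   # hypothetical local copy of the flower image
red <- img[, , 1]              # compress one colour channel for illustration
s <- svd(red)
k <- 20                        # keep the 20 largest singular values
red_k <- s$u[, 1:k] %*% diag(s$d[1:k]) %*% t(s$v[, 1:k])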
[Scatter plots of the Hawkins, Bradu and Kass data: (a) first two PCs and (b) first and third PC]
Outlier Detection Using SVD (Cont.)
MODIFIED BROWN DATA
Data set given by Brown (1980). Ryan (1997) pointed out that the original data on the 53 patients contain one outlier (observation number 24).
Climatic Variables
The climatic variables are:
1. Rainfall (RF), mm
2. Daily mean temperature (T-MEAN), °C
3. Maximum temperature (T-MAX), °C
4. Minimum temperature (T-MIN), °C
5. Day-time temperature (T-DAY), °C
6. Night-time temperature (T-NIGHT), °C
7. Daily mean water vapour pressure (VP), mbar
8. Daily mean wind speed (WS), m/sec
9. Hours of bright sunshine as percentage of maximum possible sunshine hours (MPS), %
10. Solar radiation (SR), cal/cm²/day
Consequences of SVD
Generally, many missing values may be present in the data. It may also contain unusual observations. Classical singular value decomposition cannot handle either type of problem.
The Alternating L1 Regression Algorithm for Robust Singular Value Decomposition
For the first layer of the SVD: starting from an initial right vector $v_1$, estimate each $d_i$ by minimizing $\sum_j \lvert x_{ij} - d_i v_{j1} \rvert$, i = 1, 2, …, n; then, holding d fixed, estimate the $v_{j1}$ by the analogous L1 regressions, and iterate until convergence.
For the second and subsequent layers of the SVD, we replace X by a deflated matrix obtained by subtracting the most recently found layer: $X \leftarrow X - \lambda_k u_k v_k^T$.
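A minimal R sketch of one layer of this alternating L1 scheme (the initialisation, iteration count and weighted-median L1 solver are assumed details, not the authors' exact algorithm):
# Weighted (lower) median: minimises sum(w * |x - m|) over m
wmedian <- function(x, w) {
  o <- order(x)
  x[o][which(cumsum(w[o]) >= sum(w) / 2)[1]]
}
# argmin over d of sum_j |y_j - d * v_j|, assuming all v_j are nonzero
l1_coef <- function(y, v) wmedian(y / v, abs(v))

robust_svd_layer <- function(X, iters = 25) {
  v <- rep(1 / sqrt(ncol(X)), ncol(X))   # initial unit right vector (assumed)
  for (it in 1:iters) {
    d <- apply(X, 1, l1_coef, v = v)     # L1 regressions for the left vector
    v <- apply(X, 2, l1_coef, v = d)     # L1 regressions for the right vector
    v <- v / sqrt(sum(v^2))
  }
  list(u = d / sqrt(sum(d^2)), lambda = sqrt(sum(d^2)), v = v)
}
# Deflation for the next layer: X <- X - fit$lambda * fit$u %*% t(fit$v)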
Clustering weather stations on Map Using RSVD
[Map: weather stations clustered using robust SVD]
References
• Brown, B.W., Jr. (1980). Prediction analysis for binary data. In Biostatistics Casebook, R.G. Miller, Jr., B. Efron, B.W. Brown, Jr., L.E. Moses (Eds.), New York: Wiley.
• Dhrymes, Phoebus J. (1984). Mathematics for Econometrics, 2nd ed. Springer-Verlag, New York.
• Gabriel, K.R. (1971). The biplot graphic display of matrices with application to principal component analysis. Biometrika, 58, 453–467.
• Hawkins, D.M., Bradu, D. and Kass, G.V. (1984). Location of several outliers in multiple regression data using elemental sets. Technometrics, 26, 197–208.
• Imon, A.H.M.R. (2005). Identifying multiple influential observations in linear regression. Journal of Applied Statistics, 32, 73–90.
• Kumar, N., Nasser, M. and Sarker, S.C. (2011). A new singular value decomposition based robust graphical clustering technique and its application in climatic data. Journal of Geography and Geology, Canadian Center of Science and Education, Vol. 3, No. 1, 227–238.
• Ryan, T.P. (1997). Modern Regression Methods. Wiley, New York.
• Stewart, G.W. (1998). Matrix Algorithms, Vol. 1: Basic Decompositions. SIAM, Philadelphia.
• Matrix Decomposition. http://fedc.wiwi.hu-berlin.de/xplore/ebooks/html/csa/node36.html