
Article
Spectral Kurtosis of Choi–Williams Distribution and
Hidden Markov Model for Gearbox Fault Diagnosis
Yufei Li 1 , Wanqing Song 1 , Fei Wu 1, *, Enrico Zio 2 and Yujin Zhang 1
1 School of Electronic & Electrical Engineering, Shanghai University of Science Engineering, Shanghai 201620,
China; [email protected] (Y.L.); [email protected] (W.S.); [email protected] (Y.Z.)
2 Energy Department, Politecnico di Milano, Via La Masa 34/3, 20156 Milan, Italy; [email protected]
* Correspondence: [email protected]

Received: 12 January 2020; Accepted: 8 February 2020; Published: 15 February 2020

Abstract: A combination of spectral kurtosis (SK) based on the Choi–Williams distribution (CWD) and
hidden Markov models (HMMs) accurately identifies initial gearbox failures and diagnoses gearbox
fault types. First, five types of gearbox vibration signals are collected and decomposed by the local
mean decomposition (LMD) algorithm into several product function (PF) components, so that the
multicomponent signals are reduced to single-component signals. Then, the kurtosis value of each
component is calculated, and the component with the largest kurtosis value is selected for the
CWD-SK analysis. From the calculated CWD-SK value, the characteristics of the initial failure of the
gearbox are extracted. This method not only avoids the difficulty of selecting the window function,
but also provides original eigenvalues for fault feature classification. Finally, from the CWD-SK
characteristic parameters at each characteristic frequency, a characteristic sequence based on CWD-SK
is obtained and used for HMM training and diagnosis. The experimental results show that this method
can effectively identify the initial fault characteristics of the gearbox and accurately classify
faults of different degrees.

Keywords: Choi–Williams distribution; spectral kurtosis; HMM; gearbox fault classification

1. Introduction
In rotating machinery, gearboxes are widely used in various industries as a universal component
for changing speed and transmitting power. In the early stage of a gearbox failure, the vibration
signal usually exhibits nonlinear and nonstationary characteristics [1–4]. If early weak faults are
found and their features are extracted in time, equipment maintenance can be performed to reduce the
danger [5]. When the gearbox has local faults such as pitting, spalling, scratching, or broken teeth in
a gear, or spalling of a bearing, a series of transient shock responses is produced. In the early stage of
the fault, the shock signal is faint and often obscured by strong background noise, so it is especially
important to correctly distinguish the working status of the gearbox [6–9]. In recent years, the concept
of spectral kurtosis has been proposed and solves the above problems to some extent. Therefore, it is
of academic significance to study the application of the spectral kurtosis method in gearbox fault
diagnosis [10–12].
Currently, the traditional methods of gearbox vibration signal processing and analysis mainly
include time-domain waveform probability statistical analysis, correlation analysis,
coherence analysis, time-domain synchronous averaging, frequency-domain probability statistical
analysis, detailed spectrum analysis, etc. The concept of spectral kurtosis (SK) was first proposed by
Dwyer [13], who used it to detect transient components in noisy signals. The core idea of spectral
kurtosis is to calculate the kurtosis value of each spectral line in the frequency spectrum.
To further define SK in detail, a short-time Fourier transform of spectral kurtosis (STFT-SK) calculation

Symmetry 2020, 12, 285; doi:10.3390/sym12020285 www.mdpi.com/journal/symmetry



method was proposed, which helped to link theoretical concepts with practical applications [14].
It was then verified that the method detected transient signals in nonstationary signals with additive
noise. By locating transients in the frequency domain under heavy noise, an improved SK was proposed
for early fault detection of bearings. Kurtosis computed from the temporal signal is effective under
some conditions, but its performance is poor at low signal-to-noise ratios and in non-Gaussian
noise [15].
Then, SK was combined with a filter to extract early fault signals. An optimal denoising filter was
proposed, which enhanced the transient component of the gear vibration signal [16]. Then, an SK-based
filtered residual signal was proposed, and the local power was defined as the smoothed squared
envelope [17]. In 2011, a real-time gear fault feature extraction method was proposed, which combined
a one-dimensional map and a band-pass filter; however, they were inherently slow and not suitable for
real-time applications [18]. A gear tooth fault detection method based on maximum correlated
kurtosis deconvolution was proposed. The experimental data were from a gearbox with a gear
chip fault, and the results were compared between healthy and faulty vibrations [19]. As an alternative
to SK-based gearbox fault feature extraction, a sparse signal decomposition based on the tunable
Q-factor wavelet transform (TQWT) was proposed. It was shown that this method outperformed
empirical mode decomposition and SK in extracting fault features of gearboxes [20].
Many algorithms have been combined with SK. Several SK calculation methods have been
summarized, such as those based on the STFT, the wavelet transform (WT), and the Wigner–Ville
distribution (WVD) [21]. Among them, STFT-SK is limited by the choice of window function and has
poor time-frequency resolution in strong noise. WT-SK has better time-frequency resolution, but the
wavelet basis and decomposition scale are difficult to determine, and therefore it cannot guarantee
an optimal diagnosis. WVD-SK has good time-frequency resolution, but it contains
cross-interference terms that cannot be eliminated.
Current failure mode recognition methods mainly include Bayesian classification, the Fisher
criterion, the nearest neighbor method, fuzzy classification algorithms, neural network algorithms,
kernel-based classification algorithms, etc. [22–24]. However, these methods are based on static
analysis and ignore the dynamic information of gearbox failure changes. Among them, the neural
network model is the most widely used, but it mainly deals with static classification and is not
suitable for the dynamic signal processing of gearbox failures [25,26]. In addition, the neural
network model requires large samples. When the samples are limited, it shows poor generalization
ability, and the optimization process is easily trapped in local extrema [27].
The hidden Markov model (HMM) is a statistical model used to describe a Markov process with
hidden unknown parameters. It comprises two stochastic processes, the hidden state sequence and the
observations generated from it, and its strong feature classification capability is based on these two
processes. It is especially suitable for statistical analysis of nonstationary dynamic signals with
poor repeatability. According to the pattern matching principle, a reliable HMM can be trained with
a small number of samples, and the pattern most similar to the detected signal is taken as the
recognition result [28]. The HMM has been applied successfully in speech recognition, traffic
monitoring systems, and medical image recognition [29–31]. Compared with neural networks, it retains
more statistical information of the training data and has a higher recognition rate and
robustness [32].
Therefore, in this paper, LMD, SK based on the Choi–Williams distribution (CWD-SK), and HMM are
combined to identify the initial failure of the gearbox and accurately distinguish the four
types of failure. Then, the extracted CWD-SK is used as the feature vector and input to both the HMM
and a back propagation (BP) neural network to compare the accuracy of fault diagnosis.
This paper is organized as follows: In Section 2, the main calculation steps of CWD-SK and the
impact of window functions are described; in Section 3 the basic principles of HMM are outlined; in
Section 4, the application of this method to gearbox fault diagnosis is introduced; and in Section 5,
the summary is stated.

2. SK Based on CWD

2.1. Definition of SK
SK detects non-Gaussian components in a signal and also determines the frequency of the excited
component. Nonstationary signals are difficult to describe directly, so we use the Cramer–Wold
decomposition to describe the random process in the time domain. We define the signal Y(t) as the
response of a system with time-varying impulse response h(t, s), excited by a signal X(t).
Then, Y(t) is presented as

$$Y(t) = \int_{-\infty}^{+\infty} e^{j2\pi f t}\, H(t,f)\, dX(f) \tag{1}$$

where $H(t,f)$ is the time-varying transfer function of the considered system, interpreted as the
complex envelope of the signal Y(t) at frequency f. SK is based on the fourth-order spectral cumulant
of a conditionally nonstationary (CNS) process:

$$C_{4Y}(f) = S_{4Y}(f) - 2S_{2Y}^{2}(f), \quad f \neq 0 \tag{2}$$

where $S_{2nY}(f)$ is the $2n$-th order spectral moment, which is a measure of the energy of the
complex envelope. It can be expressed as Equation (3):

$$S_{2nY}(t,f) = \frac{E\left\{\left|H(t,f)\,dX(f)\right|^{2n}\right\}}{df} = E\left\{\left|H(t,f)\right|^{2n}\right\} \cdot S_{2nX} \tag{3}$$

Therefore, SK is defined as the energy-normalized fourth-order cumulant, which is a measure of the
peakedness of the probability density function of $H$:

$$K_Y(f) = \frac{C_{4Y}(f)}{S_{2Y}^{2}(f)} = \frac{S_{4Y}(f)}{S_{2Y}^{2}(f)} - 2, \quad f \neq 0 \tag{4}$$

2.2. Algorithm of CWD-SK


For a nonstationary signal x(t), the instantaneous autocorrelation function is expressed as:

$$R(t,\tau) = x\!\left(t+\frac{\tau}{2}\right) x^{*}\!\left(t-\frac{\tau}{2}\right) \tag{5}$$

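The Fourier transform of $R(t,\tau)$ with respect to $\tau$ is the WVD of the signal x(t):

$$X_{WVD}(t,f) = \int_{-\infty}^{+\infty} x\!\left(t+\frac{\tau}{2}\right) x^{*}\!\left(t-\frac{\tau}{2}\right) e^{-j2\pi f\tau}\, d\tau \tag{6}$$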

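In order to suppress the WVD cross-interference terms, the following window function (also called
the exponential kernel function) is introduced:

$$g(\tau,\nu) = \exp(-a\tau^{2}\nu^{2}) \tag{7}$$

With $a = 1/\sigma$, the inverse Fourier transform of the exponential kernel function with respect to
$\nu$ is expressed as:

$$G(t,\tau) = \frac{1}{2\pi}\int_{-\infty}^{+\infty} g(\tau,\nu)\, e^{j\nu t}\, d\nu = \sqrt{\frac{\sigma}{4\pi\tau^{2}}}\, \exp\!\left(-\frac{\sigma t^{2}}{4\tau^{2}}\right) \tag{8}$$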

Here, $g(0,\nu) = g(\tau,0) = 1$, $g(0,0) = 1$, and $g(\tau,\nu) < 1$ when $\tau \neq 0$ and
$\nu \neq 0$, where $\tau$ is the time shift parameter. The CWD of x(t), $C_x(t,f)$, is expressed as:

$$C_x(t,f) = \iint \sqrt{\frac{\sigma}{4\pi\tau^{2}}}\, \exp\!\left(-\frac{\sigma (t-\mu)^{2}}{4\tau^{2}}\right) x\!\left(\mu+\frac{\tau}{2}\right) x^{*}\!\left(\mu-\frac{\tau}{2}\right) e^{-j2\pi f\tau}\, d\mu\, d\tau \tag{9}$$

where $\nu$ is the frequency offset parameter and $\sigma$ is the scale factor, usually a constant.
A large $\sigma$ gives higher auto-term resolution but weaker cross-term suppression, whereas a small
$\sigma$ gives stronger cross-term suppression at the cost of auto-term resolution. In general, the
choice of $\sigma$ must balance auto-term resolution against cross-term suppression. With a suitable
$\sigma$, the kernel function maintains high time-frequency resolution and also effectively
suppresses the cross-terms between two components with different frequency and time centers.
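The trade-off controlled by the scale factor can be checked numerically. The following is a minimal sketch, assuming the kernel of Equation (7) with a = 1/σ (the standard Choi–Williams form); the function name `cw_kernel` and the grid values are illustrative and not taken from the paper.

```python
import numpy as np

def cw_kernel(tau, nu, sigma):
    """Exponential (Choi-Williams) kernel g(tau, nu) = exp(-tau^2 nu^2 / sigma).

    Assumes a = 1/sigma in Equation (7). tau and nu may be scalars or arrays.
    """
    tau = np.asarray(tau, dtype=float)
    nu = np.asarray(nu, dtype=float)
    return np.exp(-(tau ** 2) * (nu ** 2) / sigma)

# Illustrative grid: time lag tau along axis 0, frequency offset nu along axis 1.
lags = np.linspace(-2.0, 2.0, 5)
tau_grid, nu_grid = np.meshgrid(lags, lags, indexing="ij")

g_small_sigma = cw_kernel(tau_grid, nu_grid, sigma=0.5)   # strong cross-term suppression
g_large_sigma = cw_kernel(tau_grid, nu_grid, sigma=50.0)  # kernel close to 1, i.e. close to the WVD
```

On both axes (τ = 0 or ν = 0) the kernel equals 1, so components located there pass unchanged. Away from the axes, where cross-terms mostly lie, a small σ attenuates strongly, while a large σ leaves the distribution close to the WVD.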
According to the definition of CWD, the second-order and fourth-order instantaneous spectral
moments of x(t) are obtained as follows:

$$\begin{cases} \hat{S}_{2x}(f) = E\left\{\left|C_x(t,f)\right|^{2}\right\}_k \\ \hat{S}_{4x}(f) = E\left\{\left|C_x(t,f)\right|^{4}\right\}_k \end{cases} \tag{10}$$

where $E\{\cdot\}_k$ denotes the average over the $k$ time samples, and $k$ is the number of sampling
points. Finally, substituting Equation (10) into Equation (4), the CWD-SK algorithm is defined as follows:

$$k_x(f) = \frac{C_{4x}(f)}{\hat{S}_{2x}^{2}(f)} = \frac{\hat{S}_{4x}(f) - 2\hat{S}_{2x}^{2}(f)}{\hat{S}_{2x}^{2}(f)} = \frac{\hat{S}_{4x}(f)}{\hat{S}_{2x}^{2}(f)} - 2, \quad f \neq 0 \tag{11}$$
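Equations (10) and (11) reduce to two time averages and one ratio per frequency bin. The sketch below assumes the time-frequency distribution has already been computed (for example a CWD) and is stored as a 2-D array with one row per frequency; the function name `spectral_kurtosis` and the array layout are assumptions for illustration.

```python
import numpy as np

def spectral_kurtosis(tfd):
    """Estimate SK per frequency bin from a time-frequency array.

    tfd: 2-D array of shape (n_freq, n_time); each row holds the values of
         the distribution at one frequency over the time samples.
    Returns a 1-D array of length n_freq, following Equation (11).
    """
    mag = np.abs(np.asarray(tfd))
    s2 = np.mean(mag ** 2, axis=1)  # second-order moment over time, Equation (10)
    s4 = np.mean(mag ** 4, axis=1)  # fourth-order moment over time, Equation (10)
    return s4 / s2 ** 2 - 2.0       # energy-normalized cumulant, Equation (11)

# Toy input: row 0 has constant magnitude, row 1 has a single transient.
toy = np.array([
    [1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0],
    [0.0, 0.0, 0.0, 4.0, 0.0, 0.0, 0.0, 0.0],
])
sk = spectral_kurtosis(toy)
```

A row of constant magnitude gives the lowest value, while a row dominated by one transient gives a large positive value, which is the property used to locate impulsive fault components. Rows that are identically zero would divide by zero and need separate handling.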

2.3. Impact of Window Functions


SK is a time-frequency analysis algorithm whose time-frequency resolution is closely related to the
window function [14]. Examples of window functions include the rectangular, Hanning, Hamming,
Blackman, and Kaiser windows. When the SK curve fluctuates too much, the value of k is generally
considered to increase, which affects the end result. Therefore, it is necessary to compare the
smoothness of the curves.
In this paper, we use curvature to intuitively compare the smoothness of CWD-SK for different
window functions. For a curve y = f(x), the curvature is calculated by Equation (12):

$$K = \frac{f''(x)}{\left(1 + f'^{2}(x)\right)^{3/2}} \tag{12}$$

where $f'(x)$ and $f''(x)$ are the first and second derivatives of $f$ with respect to $x$.
In order to evaluate the complexity of the curve shape, a curvature deviation is proposed [33].
Here, the smoothness of the curve is determined by Equation (13):

$$\mathrm{Smoothness} = \sum_{n=1}^{m} \left|k_n - k_{mean}\right| \tag{13}$$

where n is the index of the sampling point, m is the length of the data, $k_n$ is the curvature at the
n-th discrete point, and $k_{mean}$ is the average of the estimated curvature.
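A discrete version of Equations (12) and (13) can be sketched as follows. The derivatives are approximated with finite differences via `numpy.gradient`, which is an assumption of this sketch; the paper does not state how the derivatives are estimated. The function names are illustrative.

```python
import numpy as np

def curvature(y, dx=1.0):
    """Discrete curvature of a sampled curve, following Equation (12)."""
    y = np.asarray(y, dtype=float)
    d1 = np.gradient(y, dx)   # finite-difference estimate of f'(x)
    d2 = np.gradient(d1, dx)  # finite-difference estimate of f''(x)
    return d2 / (1.0 + d1 ** 2) ** 1.5

def smoothness(y, dx=1.0):
    """Sum of absolute curvature deviations from the mean, Equation (13)."""
    k = curvature(y, dx)
    return float(np.sum(np.abs(k - k.mean())))

# A straight line has zero curvature everywhere; a zigzag does not.
line_score = smoothness(2.0 * np.arange(10) + 1.0)
zigzag_score = smoothness([0.0, 1.0, 0.0, 1.0, 0.0, 1.0])
```

A smaller score means a smoother curve, so SK curves obtained with different window functions can be ranked by this single number.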

3. Diagnosis Flow Based on HMM


For the fault characteristic indexes that have been extracted for the different fault degrees, first,
the characteristic indexes are normalized and quantized. Second, to establish the HMM, the obtained
observation sequence is mapped to a finite set of discrete values. Finally, the discretized values
are used as the quantized feature values for model training.
Hidden Markov models are based on Markov chains. There are N states in the system,
$S = \{S_1, S_2, \ldots, S_N\}$, and the state at time t is denoted $q_t$. The transition matrix
between states is $A = \{a_{ij}\}$, where

$$a_{ij} = P\left[q_{t+1} = S_j \mid q_t = S_i\right], \quad 1 \le i, j \le N \tag{14}$$

For some HMMs, any state can reach any other state in one transition; in other HMMs, transitions
can occur only between certain states, namely those with $a_{ij} > 0$.
The difference between an HMM and a Markov chain is that, for each state, the outside world can only
make one observation and obtain an n-dimensional observation vector $V_k$. This vector is related to
the state of the system and is discrete or continuous.
For continuously distributed observations, the probability distribution of the observation vector in
state j is $B = \{b_j(v_i)\}$, where

$$b_j(v_i) = P\left[v_i \mid q_t = S_j\right], \quad 1 \le j \le N \tag{15}$$
The probability distribution is generally taken as a mixed Gaussian distribution, evaluated at the
observation vector $o_t$:

$$b_j(o_t) = \sum_{m=1}^{M} w_{j,m}\, N\!\left(o_t, \mu_{j,m}, \Sigma_{j,m}\right) \tag{16}$$

where M is the number of mixed Gaussian distributions, $w_{j,m}$ is the positive mixing weight, the
weights sum to 1, and $N(o_t, \mu_{j,m}, \Sigma_{j,m})$ is an n-dimensional Gaussian distribution.
The initial state distribution is $\pi = \{\pi_i\}$, where

$$\pi_i = P\left[q_1 = S_i\right], \quad 1 \le i \le N \tag{17}$$

So far, the parameters of the HMM are summarized as the triple $\lambda = (A, B, \pi)$. The
observation sequence generated by this model is $O = o_1 o_2 \ldots o_T$, where $o_t$ is the
observation vector at time t, and T is the total observation length.
Figure 1 shows the diagnosis process of HMM. The Baum–Welch algorithm [34,35] is suitable for
training, adjusting, and optimizing the model parameters from observation sequences, so that the
sequence of observation probability values produced by the model is similar to the sequence of
observed values. One HMM is established for each fault level. The data of the unknown fault state are
then entered into each model in turn, and the output probabilities are calculated and compared.
Finally, the failure type of the unknown signal is given by the model with the maximum output
probability. The most probable state path through the sequence is obtained by the Viterbi algorithm.
The above is the HMM diagnosis process.

[Figure 1 blocks: Extract PF1; Feature vector; HMM training; HMM model library is established;
Comparison, $P_i = \ln P(O \mid \lambda_i)$; Fault recognition; Sample to be tested.]

Figure 1. Diagnosis flow based on hidden Markov models (HMMs).
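The comparison step of Figure 1, $P_i = \ln P(O \mid \lambda_i)$, can be sketched with the scaled forward algorithm for a discrete-observation HMM. This is a minimal illustration, assuming the parameters (π, A, B) of each model have already been trained (Baum–Welch training is not shown); the function names and the two toy models are hypothetical and not taken from the paper.

```python
import numpy as np

def log_likelihood(obs, pi, A, B):
    """Scaled forward algorithm: returns ln P(O | lambda) for a discrete HMM.

    obs: sequence of integer observation symbols
    pi:  initial state distribution, shape (N,)
    A:   state transition matrix, shape (N, N)
    B:   observation probability matrix, shape (N, K)
    """
    alpha = pi * B[:, obs[0]]
    scale = alpha.sum()
    log_p = np.log(scale)
    alpha = alpha / scale
    for symbol in obs[1:]:
        alpha = (alpha @ A) * B[:, symbol]  # propagate one step, then weight by emission
        scale = alpha.sum()
        log_p += np.log(scale)              # accumulate log of the scaling factors
        alpha = alpha / scale
    return float(log_p)

def classify(obs, models):
    """Score obs under every model and return the best label with all scores."""
    scores = {name: log_likelihood(obs, *lam) for name, lam in models.items()}
    return max(scores, key=scores.get), scores

# Two hypothetical two-state models over a binary symbol alphabet.
pi = np.array([0.5, 0.5])
A = np.array([[0.9, 0.1], [0.1, 0.9]])
models = {
    "model_a": (pi, A, np.array([[0.9, 0.1], [0.8, 0.2]])),  # mostly emits symbol 0
    "model_b": (pi, A, np.array([[0.1, 0.9], [0.2, 0.8]])),  # mostly emits symbol 1
}
label, scores = classify([1, 1, 0, 1, 1], models)
```

Rescaling `alpha` at every step keeps the recursion numerically stable for long sequences, and the sum of the log scaling factors equals the log-likelihood that is compared across models.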

4. Experimental Data Analysis

4.1. Experiment Platform and Data Preprocessing
All the experimental data are obtained from the MFD310 system, as shown in Figure 2. This gearbox
(reducer) is a three-stage transmission consisting of four shafts, four pairs of bearings, seven
straight-tooth gears, and a box. The power source of this system is the motor. The input shaft of the
first gear box is connected first to the first speed torque sensor and then to the motor. After
deceleration by the first gear box, its output shaft is connected to the input shaft of the second
gear box. After deceleration by the second gear box, the output shaft is connected to the second
speed torque sensor and, finally, to the eddy current brake. With the fault location relatively
close, so that the required data can be collected accurately and sensitively, the vibration signals
are collected in five states of the gearbox: normal, slight fault, moderate fault, severe fault, and
flaking. The frequency of shaft II is 7 Hz, the frequency of shaft I is 10 Hz, the number of sampling
points is 4096, and the sampling frequency is 3387.77 Hz. The gearbox meshing frequency is 307 Hz.

Figure 2. MFD310 gearbox experimental platform for fault diagnosis under various working conditions.
Five kinds of vibration signals are collected from the same gearbox, and time-domain and fast
Fourier transform (FFT) analyses are performed. The resulting waveforms and spectra are chaotic and
complicated and appear randomly distributed, so it is almost impossible to judge the running state of
the gearbox from them. Therefore, this paper uses adaptive time-domain analysis, FFT analysis, and
local mean decomposition (LMD) to preprocess the data for the five states of the gearbox (as shown in
Figure 3). LMD preprocesses the vibration signals in the five states of the gearbox to remove
interference factors such as noise.

[Figure 3 panels: (a) normal condition; (b) slight fault; (c) moderate fault; (d) severe fault;
(e) flaking condition.]

Figure 3. Time domain, FFT, and LMD plots in the five states.

For each PF component, the corresponding kurtosis value is calculated (Table 1), and the PF with the
larger kurtosis value is selected for the CWD-SK processing (for the convenience of the subsequent
CWD-SK analysis, usually the PF with the highest kurtosis is selected).
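The selection rule can be sketched in a few lines. The sketch assumes the PF components are already available as arrays (the LMD step itself is not shown) and uses the non-excess kurtosis, for which a Gaussian signal gives 3; this convention is an assumption, chosen because the normal-condition values in Table 1 are close to 3. The function names and the toy signals are illustrative.

```python
import numpy as np

def kurtosis(x):
    """Non-excess kurtosis: fourth central moment over squared variance."""
    x = np.asarray(x, dtype=float)
    d = x - x.mean()
    return float(np.mean(d ** 4) / np.mean(d ** 2) ** 2)

def select_pf(pfs):
    """Return the index of the component with the highest kurtosis and all values."""
    values = [kurtosis(pf) for pf in pfs]
    return int(np.argmax(values)), values

# Toy components: a pure sine and the same sine with one added impulse.
t = np.arange(1000) / 1000.0
smooth = np.sin(2 * np.pi * 5 * t)
impulsive = smooth.copy()
impulsive[500] += 20.0
best, values = select_pf([smooth, impulsive])
```

Impulsive content raises the kurtosis sharply, which is why the component with the highest value is the natural candidate for the subsequent CWD-SK analysis.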
Table 1. PF kurtosis values of the five gearbox signals.

PF                          1       2       3
Kurtosis (normal)           3.730   3.212   3.098
Kurtosis (slight fault)     3.980   3.402   3.196
Kurtosis (moderate fault)   4.645   4.320   4.089
Kurtosis (severe fault)     4.966   4.781   4.342
Kurtosis (flaking)          5.905   5.455   5.403

4.2. Initial Fault Feature Extraction Based on CWD-SK

SK is a statistical tool that detects the non-Gaussian components in a signal and indicates the
existence of a transient and its position in the frequency domain. Once the gearbox fails, the
frequency component of the faulty gear increases compared with the normal state, so the amplitude of
SK at that frequency increases accordingly.
The CWD-SK of PF1 is calculated according to Equation (11) and is denoted as k. The value of CWD-SK
in normal conditions is shown in Figure 4a; its mean in the frequency domain, K1mean, is 1.75. When
the gearbox fails early, K2mean becomes 2.04. When the gearbox fails, the K value is generally greater
than two, so a mean above two indicates a slight failure of the gearbox (Figure 4b), which is in line
with the actual situation. In the case of moderate and severe gearbox failures, K3mean and K4mean are
almost the same and cannot be distinguished by conventional methods; the HMM described below is
required for further training and classification. Thus, CWD-SK can identify initial gearbox failures
accurately.
[Figure 4 panels: (a) normal; (b) slight fault; (c) moderate fault; (d) severe fault; (e) flaking.]

Figure 4. Value of Choi–Williams distribution and spectral kurtosis (CWD-SK) in the five cases.

4.3. Selection of Window Function
As shown in Table 2, in order to avoid the effect of sudden increases and decreases of the SK curve,
the average value of CWD-SK is selected as the characteristic value. Under normal conditions, the
average CWD-SK of the gearbox is 1.75, which is in line with the analysis that SK does not exceed two
in the normal condition. When the gearbox has a slight failure, the average value of CWD-SK clearly
increases to 2.023. This indicates that CWD-SK detects the initial failure well, which is consistent
with the actual situation.

Table 2. The mean of CWD-SK in five cases.

Condition   Mean of CWD-SK
Normal      1.756
Slight      2.023
Moderate    2.187
Severe      2.019
Flaking     5.746

From Equation (13), a large smoothness value indicates that the curve is poorly smoothed, and a small
value indicates that it is well smoothed. Four typical window functions are selected and compared in
Table 3. The smoothness values are similar for all four, which shows that CWD-SK is not sensitive to
the selection of the window function. This reduces the difficulty of selecting the window function.
Table 3. Average smoothness of SK with different window functions.

Window function   Smoothness
Rectangular       0.447
Hanning           0.449
Hamming           0.508
Blackman          0.458

4.4. Five Types of Gear Fault Characteristics Classification


We can see that K2mean is larger than K1mean , so 8we can identify initial gearbox failure and the slight
faults and the normal are well distinguished. When the gearbox begins to appear moderate and severe
functions and, then, go to zero and normalize to a 10 × 4 matrix.
In order to prevent situations where the model is caught in a wireless loop or training fails, the
Lloyd algorithm is used to scalar quantize the CWD-SK sequence, and input the quantized sequence
into the HMM. The training algorithm uses the Baum–Welch algorithm. As the number of iterations
increases,
Symmetry 2020,the
12, maximum
fault, the average value of K cannot be clearly distinguished. Therefore, it is necessary to use HMM for fault status identification.
The training curve is shown in Figure 5. In HMM modeling and training, slight fault, moderate fault, severe fault, and flaking represent four kinds of hidden states of the gearbox, recorded as λ2, λ3, λ4, and λ5, respectively. The initial probability distribution vector Bλ, the initial state transition matrix πλ, and the initial observation probability matrix are all obtained by random functions and are then zeroed and normalized to a 10 × 4 matrix.

Figure 5. Log-probability outputs of the five fault signals for the various HMM models.

In order to prevent situations where the model is caught in an infinite loop or training fails, the Lloyd algorithm is used to scalar-quantize the CWD-SK sequence, and the quantized sequence is input into the HMM. Training uses the Baum–Welch algorithm. As the number of iterations increases, the maximum log-likelihood estimate increases until convergence is reached. After training, four HMM recognition models are obtained.
A total of 20 groups of CWD-SK feature vectors (five groups for each state) are selected as training samples. Before training, the Lloyd algorithm is used to scalar-quantize the feature vectors, and the Baum–Welch algorithm is then used for training. During HMM training, as the number of iterations increases, the maximum log-likelihood estimate continues to increase until it converges. After training, the HMM recognition models corresponding to the four hidden states are obtained. All states converge by 20 iterations, so convergence is fast.
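The training behaviour described above can be illustrated for a discrete-observation HMM. The following is a minimal NumPy sketch under stated assumptions, not the authors' implementation: the function names `forward_backward` and `baum_welch`, the random initialization, the numbers of hidden states and symbols, and the synthetic symbol sequences are all hypothetical. It only illustrates that the total log-likelihood of the training sequences does not decrease from one Baum–Welch iteration to the next.

```python
import numpy as np

def forward_backward(obs, pi, A, B):
    """Scaled forward-backward pass for one discrete observation sequence."""
    T, N = len(obs), len(pi)
    alpha, beta, c = np.zeros((T, N)), np.ones((T, N)), np.zeros(T)
    alpha[0] = pi * B[:, obs[0]]
    c[0] = alpha[0].sum()
    alpha[0] /= c[0]
    for t in range(1, T):
        alpha[t] = (alpha[t - 1] @ A) * B[:, obs[t]]
        c[t] = alpha[t].sum()          # scaling factor; log P(obs) = sum(log c)
        alpha[t] /= c[t]
    for t in range(T - 2, -1, -1):
        beta[t] = (A @ (B[:, obs[t + 1]] * beta[t + 1])) / c[t + 1]
    return alpha, beta, c

def baum_welch(sequences, n_states, n_symbols, n_iter=20, seed=0):
    """Train one discrete HMM; returns (pi, A, B) and the log-likelihood history."""
    rng = np.random.default_rng(seed)
    # Random row-stochastic initial parameters (an assumed initialization).
    pi = rng.random(n_states)
    pi /= pi.sum()
    A = rng.random((n_states, n_states))
    A /= A.sum(axis=1, keepdims=True)
    B = rng.random((n_states, n_symbols))
    B /= B.sum(axis=1, keepdims=True)
    history = []
    for _ in range(n_iter):
        pi_acc = np.zeros(n_states)
        A_acc = np.zeros((n_states, n_states))
        B_acc = np.zeros((n_states, n_symbols))
        total_ll = 0.0
        for obs in sequences:
            obs = np.asarray(obs)
            alpha, beta, c = forward_backward(obs, pi, A, B)
            total_ll += np.log(c).sum()
            gamma = alpha * beta       # state posteriors; each row sums to 1
            # xi[t, i, j]: posterior of a transition i -> j between t and t + 1
            xi = (alpha[:-1, :, None] * A[None, :, :]
                  * (B[:, obs[1:]].T * beta[1:])[:, None, :]) / c[1:, None, None]
            pi_acc += gamma[0]
            A_acc += xi.sum(axis=0)
            for k in range(n_symbols):
                B_acc[:, k] += gamma[obs == k].sum(axis=0)
        history.append(total_ll)       # likelihood under the pre-update parameters
        pi = pi_acc / pi_acc.sum()
        A = A_acc / A_acc.sum(axis=1, keepdims=True)
        B = B_acc / B_acc.sum(axis=1, keepdims=True)
    return (pi, A, B), history

# Hypothetical data: five quantized training sequences for one fault state.
data_rng = np.random.default_rng(1)
train = [data_rng.integers(0, 4, size=30) for _ in range(5)]
(pi, A, B), history = baum_welch(train, n_states=3, n_symbols=4, n_iter=20)
```

Plotting `history` against the iteration index would give a training curve of the same kind as the one the text describes, one curve per fault-state model.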
After HMM training is completed, a classifier based on the fault state of the gearbox is established. As can be seen, the four fault states of the gearbox converge rapidly, within 20 iterations. The remaining 20 groups of CWD-SK feature vectors (five groups for each state) are input as test samples to the trained HMM models (λ2, λ3, λ4, and λ5). Before testing, the Lloyd algorithm is likewise used to scalar-quantize each CWD-SK sequence, and the quantized test sequence is input into the model of each state.
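The scalar quantization applied before training and testing can be sketched as a one-dimensional Lloyd iteration, which alternates between assigning samples to the nearest level and moving each level to the mean of its samples. This is an illustrative sketch only: the function names `lloyd_codebook` and `quantize`, the quantile-based initialization, the number of levels, and the toy feature values are assumptions and do not come from the paper.

```python
import numpy as np

def lloyd_codebook(samples, n_levels, n_iter=50):
    """Fit a 1-D codebook by Lloyd iteration (nearest-level assignment, centroid update)."""
    x = np.sort(np.asarray(samples, dtype=float))
    # Assumed initialization: evenly spaced quantiles of the training samples.
    levels = np.quantile(x, (np.arange(n_levels) + 0.5) / n_levels)
    for _ in range(n_iter):
        thresholds = (levels[:-1] + levels[1:]) / 2    # decision boundaries
        idx = np.searchsorted(thresholds, x)           # cell index of each sample
        new_levels = levels.copy()
        for k in range(n_levels):
            members = x[idx == k]
            if members.size:                           # keep the old level for empty cells
                new_levels[k] = members.mean()
        if np.allclose(new_levels, levels):
            break
        levels = new_levels
    return levels

def quantize(sequence, levels):
    """Map each real-valued feature to the index of its nearest codebook level."""
    thresholds = (levels[:-1] + levels[1:]) / 2
    return np.searchsorted(thresholds, np.asarray(sequence, dtype=float))

# Hypothetical feature values forming three clusters.
features = [0.0, 0.2, 0.4, 5.0, 5.2, 5.4, 10.0, 10.2, 10.4]
levels = lloyd_codebook(features, n_levels=3)
symbols = quantize([0.1, 5.3, 9.9], levels)
```

Fitting the codebook on the training features and reusing the same `levels` for the test sequences keeps the symbol alphabet consistent between training and testing.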
According to Table 4, for the same type of failure, the higher the degree of failure, the larger the magnitude of the log-likelihood estimate. At the same time, the output log-likelihood of each state is the maximum under the model of that state. For the four failure states of the gearbox, the diagnosis results show that this method can accurately classify faults. The experimental data show that the HMM fault diagnosis model can successfully identify the four fault states of the gearbox with high accuracy and a small amount of data.

Table 4. HMM recognition result.

                    Log-Likelihood Probabilities of the Input Sample for Each Model
Fault Case          λ2          λ3          λ4          λ5          Recognition Result
Slight fault        −15.114     −∞          −∞          −∞          λ2
Moderate fault      −54.124     −55.964     −∞          −∞          λ3
Severe fault        −∞          −62.14      −76.44      −134.82     λ4
Flaking             −∞          −132.67     −199.211    −211.342    λ5
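The way a quantized test sequence is scored against several trained models, and assigned to the model with the largest log-likelihood as is commonly done in HMM classification, can be sketched as follows. The two toy models, their parameter values, and the names `log_likelihood` and `classify` are hypothetical and are not taken from the paper. The sketch also shows how a log-likelihood of −∞ arises when a model assigns zero probability to an observed symbol.

```python
import numpy as np

def log_likelihood(obs, pi, A, B):
    """Scaled forward algorithm: log P(obs | model), or -inf if obs is impossible."""
    log_p = 0.0
    alpha = None
    for t, symbol in enumerate(obs):
        alpha = pi * B[:, symbol] if t == 0 else (alpha @ A) * B[:, symbol]
        scale = alpha.sum()
        if scale == 0.0:               # the model cannot emit this sequence
            return -np.inf
        log_p += np.log(scale)
        alpha = alpha / scale
    return log_p

def classify(obs, models):
    """Score obs under every model and return the best model name with all scores."""
    scores = {name: log_likelihood(obs, *params) for name, params in models.items()}
    best = max(scores, key=scores.get)
    return best, scores

# Hypothetical two-state models over a three-symbol alphabet.
pi = np.array([0.5, 0.5])
A = np.array([[0.9, 0.1],
              [0.1, 0.9]])
B_lambda2 = np.array([[0.7, 0.3, 0.0],    # never emits symbol 2
                      [0.4, 0.6, 0.0]])
B_lambda3 = np.array([[0.0, 0.2, 0.8],    # never emits symbol 0
                      [0.0, 0.5, 0.5]])
models = {"lambda2": (pi, A, B_lambda2), "lambda3": (pi, A, B_lambda3)}

best, scores = classify([0, 1, 0, 0], models)
```

In this toy setup the sequence contains symbol 0, which the hypothetical `lambda3` model cannot emit, so its score is −∞ and the sequence is assigned to `lambda2`.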

To verify the accuracy of CWD-SK as the feature vector, it is input into the HMM for classification and recognition, and the results are shown in Table 5. The HMM results are compared with those of a BP neural network, for which we set the following parameters: trainParam_Show = 10, trainParam_Epochs = 1000, trainParam_mc = 0.75, trainParam_Lr = 0.05, trainParam_lrinc = 1.5, and trainParam_Goal = 0.1. Compared with SK as the input feature vector, the overall recognition accuracy with CWD-SK is higher.

Table 5. Comparison of CWD-SK for fault recognition.

Feature    Recognition Model    Slight Fault    Moderate Fault    Severe Fault    Flaking    Recognition Rate
CWD-SK     HMM                  5               4                 5               5          95%
CWD-SK     BP                   4               4                 5               5          90%
SK         HMM                  4               5                 4               5          90%
SK         BP                   4               4                 4               5          85%
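The recognition rates in Table 5 follow directly from the per-state counts, given five test samples per state (20 in total). A short check, with the dictionary layout and variable names chosen here purely for illustration:

```python
# Correctly recognized test samples per state, in the column order of Table 5:
# slight fault, moderate fault, severe fault, flaking.
correct_counts = {
    ("CWD-SK", "HMM"): [5, 4, 5, 5],
    ("CWD-SK", "BP"): [4, 4, 5, 5],
    ("SK", "HMM"): [4, 5, 4, 5],
    ("SK", "BP"): [4, 4, 4, 5],
}
samples_per_state = 5  # five test groups for each of the four fault states

recognition_rate = {
    key: 100 * sum(counts) / (samples_per_state * len(counts))
    for key, counts in correct_counts.items()
}

for (feature, model), rate in recognition_rate.items():
    print(f"{feature:7s} {model:4s} {rate:.0f}%")
```

For example, CWD-SK with HMM recognizes 19 of the 20 test samples, which gives 95%.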

5. Conclusions
In this study, we focus on the nonlinear and nonstationary characteristics of gearbox vibration signals in five states (normal, slight fault, moderate fault, severe fault, and flaking), and propose a gear fault extraction and classification method that combines CWD-SK with HMM. CWD-SK is insensitive to the window function type and resistant to noise, which avoids the difficulty of selecting the window function. After LMD decomposition, the PF component with the largest kurtosis value is selected according to the maximum kurtosis criterion. Then, the average value of the CWD-SK is calculated, and the characteristic frequencies of the gearbox in normal and initial slight fault conditions are extracted. Finally, after the feature vectors are extracted and normalized, the HMM is used to identify the fault pattern of the characteristic signal. The experimental results show that the method can identify initial gearbox failures and accurately classify the different fault states of the signals.

Author Contributions: Conceptualization, Y.L. and W.S.; Data curation, Y.L., W.S. and F.W.; Formal analysis, Y.L.,
W.S. and F.W.; Funding acquisition, F.W., Y.Z. and W.S.; Investigation, Y.L., W.S. and F.W.; Methodology, Y.L. and
W.S.; Project administration, W.S., F.W. and Y.Z.; Resources, W.S., F.W. and E.Z.; Visualization, W.S., F.W. and Y.Z.;
Writing–Original draft, Y.L.; Writing–Review & editing, Y.L., W.S., F.W., E.Z. and Y.Z. All authors have read and
agreed to the published version of the manuscript.
Funding: This project was funded by the Key Project of Science and Technology Commission of Shanghai Municipality (Grant No. 18511101600) and the Natural Science Foundation of Shanghai (Grant Nos. 17ZR1411900 and 14ZR1418500).
Conflicts of Interest: We declare that we have no financial and personal relationships with other people or
organizations that can inappropriately influence our work, there is no professional or other personal interest of
any nature or kind in any product, service and/or company that could be construed as influencing the position
presented in, or the review of, the manuscript entitled.

References
1. Zhao, B.; Zhang, S.; Man, J.; Zhang, Q.; Chen, Y. A modified normal contact stiffness model considering
effect of surface topography. Proc. Inst. Mech. Eng. Part J. J. Eng. Tribol. 2014, 229, 677–688. [CrossRef]
Symmetry 2020, 12, 285 11 of 12

2. Wu, C.; Li, B.; Pang, J.; Steven, Y. High Speed Grinding of HIP-SiC Ceramics on Transformation of Microscopic
Features. Int. J. Adv. Manuf. Technol. 2019, 102, 1913–1921. [CrossRef]
3. Song, W.; Carlo, C.; Chi, C. Fractional Brownian Motion and Quantum-Behaved Particle Swarm Optimization
for Short Term Power Load Forecasting: An Integrated Approach. Energy 2019. [CrossRef]
4. Song, W.; Chen, X.; Carlo, C. Multi-Fractional Brownian Motion and Quantum-Behaved Partial Swarm
Optimization for Bearing Degradation Forecasting. Complexity 2020. [CrossRef]
5. Liu, H.; Song, W.; Li, M.; Aleksey, K.; Enrico, Z. Fractional Lévy stable motion: Finite difference iterative
forecasting model. Chaos Solitons Fractals 2020. [CrossRef]
6. Hao, Y.; Song, L.; Cui, L.; Wang, H. A three-dimensional geometric features-based SCA algorithm for
compound faults diagnosis. Measurement 2018, 134, 480–491. [CrossRef]
7. Shen, C.; Yang, J.; Tang, J.; Liu, J.; Cao, H. Parallel processing algorithm of temperature and noise error
for micro-electro-mechanical system gyroscope based on variational mode decomposition and augmented
nonlinear differentiator. Rev. Sci. Instrum. 2018, 89, 076107. [CrossRef]
8. Wang, Z.; Zheng, L.; Wang, J.; Du, W. Research of novel bearing fault diagnosis method based on improved
krill herd algorithm and kernel Extreme Learning Machine. Complexity 2019. [CrossRef]
9. Wang, Z.; Zheng, L.; Du, W. A novel method for intelligent fault diagnosis of bearing based on capsule
neural network. Complexity 2019. [CrossRef]
10. Gao, Y.; Villecco, F.; Li, M.; Song, W. Multi-Scale Permutation Entropy Based on Improved LMD and HMM
for Rolling Bearing Diagnosis. Entropy 2017, 19, 176. [CrossRef]
11. Ding, Z.; Sun, G.; Jiang, X.; Guo, M.; Liang, S. Predictive Modeling of Microgrinding Force Incorporating
Phase Transformation Effects. J. Manuf. Sci. Eng. 2019. [CrossRef]
12. Wang, Z.; Wang, J.; Cai, W.; Zhou, J.; Du, W.; Wang, J.; He, G.; He, H. Application of an Improved Ensemble
Local Mean Decomposition Method for Gearbox Composite Fault Diagnosis. Complexity 2019, 2019, 1564243.
[CrossRef]
13. Dwyer, R. Detection of non-Gaussian signals by frequency domain Kurtosis estimation. In Proceedings of
the ICASSP’83. IEEE International Conference on Acoustics, Speech, and Signal Processing, Boston, MA,
USA, 14–16 April 1983; Volume 2, pp. 607–610.
14. Antoni, J. The spectral kurtosis: A useful tool for characterising non-stationary signals. Mech. Syst. Signal
Process. 2006, 20, 282–307. [CrossRef]
15. Wang, D.; Tse, P.; Tsui, K.-L. An enhanced Kurtogram method for fault diagnosis of rolling element bearings.
Mech. Syst. Signal Process. 2013, 35, 176–199. [CrossRef]
16. Jia, F.; Lei, Y.; Shan, H.; Lin, J. Early Fault Diagnosis of Bearings Using an Improved Spectral Kurtosis by
Maximum Correlated Kurtosis Deconvolution. Sensors 2015, 15, 29363–29377. [CrossRef]
17. Combet, F.; Gelman, L. Optimal filtering of gear signals for early damage detection based on the spectral
kurtosis. Mech. Syst. Signal Process. 2009, 23, 652–668. [CrossRef]
18. Hussain, S.; Gabbar, H.A. A novel method for real time gear fault detection based on pulse shape analysis.
Mech. Syst. Signal Process. 2011, 25, 1287–1298. [CrossRef]
19. McDonald, G.; Zhao, Q.; Zuo, M. Maximum correlated Kurtosis deconvolution and application on gear tooth
chip fault detection. Mech. Syst. Signal Process. 2012, 33, 237–255. [CrossRef]
20. Cai, G.; Chen, X.; He, Z. Sparsity-enabled signal decomposition using tunable Q-factor wavelet transform for
fault feature extraction of gearbox. Mech. Syst. Signal Process. 2013, 41, 34–53. [CrossRef]
21. Liu, Z.; Zhang, Q. An Approach to Recognize the Transient Disturbances with Spectral Kurtosis.
Instrumentation and Measurement. IEEE Trans. Instrum. Meas. 2014, 63, 46–55. [CrossRef]
22. Lüscher, P.; Weibel, R.; Burghardt, D. Integrating ontological modelling and Bayesian inference for pattern
classification in topographic vector data. Comput. Environ. Urban Syst. 2009, 33, 363–374. [CrossRef]
23. Zhang, X.-Y.; Liu, C.-L. Evaluation of weighted Fisher criteria for large category dimensionality reduction in
application to Chinese handwriting recognition. Pattern Recognit. 2013, 46, 2599–2611. [CrossRef]
24. Binsen, P.; Hong, X. A mixed intelligent condition monitoring method for nuclear power plant. Ann. Nucl.
Energy 2020, 140, 107307.
25. Melin, P.; Castillo, O. A review on the applications of type-2 fuzzy logic in classification and pattern
recognition. Expert Syst. Appl. 2013, 40, 5413–5423. [CrossRef]
26. Sever, A. Neural network algorithm to pattern recognition in inverse problems: Applied Mathematics and
Computation. Crossmark 2013, 221, 484–490.
Symmetry 2020, 12, 285 12 of 12

27. Wang, G.; Li, Y.; Qing, X. TVAR-HMM-based Rolling Bearing Fault Diagnosis. J. Tianjin Univ. 2010, 43,
168–173.
28. Dong, M.; He, D.; Banerjee, P. Equipment health diagnosis and prognosis using hidden semi Markov models.
Int. J. Adv. Manuf. Technol. 2006, 30, 738–749. [CrossRef]
29. Najkar, N.; Razzazi, F.; Sameti, H. A novel approach to HMM-based speech recognition system using particle
swarm optimization. Math. Comput. Model. 2010, 52, 1910–1920. [CrossRef]
30. Zhou, J.; Cheng, L.; Zhou, L.; Chu, Z. Traffic Incident Prediction on Intersections Based on HMM. J. Transp.
Syst. Eng. Inf. Technol. 2013, 13, 52–59. [CrossRef]
31. Chen, T.; Xue, Z.; Wang, C. Motion correction for cellular-resolution multi-photon fluorescence microscopy
imaging of awake head-restrained mice using speed embedded HMM. Comput. Med Imaging Graph. 2012, 36,
171–182. [CrossRef]
32. Tai, A.; Ching, W.-K.; Chan, L. Detection of machine failure: Hidden Markov Model approach. Comput. Ind.
Eng. 2009, 57, 608–619. [CrossRef]
33. Liu, Z. A Classification Method for Complex Power Quality Disturbances Using EEMD and Rank Wavelet
SVM. IEEE Trans. Smart Grid 2016, 6, 1678–1685. [CrossRef]
34. Reddy, K.; Rao, K. Excitation modelling using epoch features for statistical parametric speech synthesis.
Comput. Speech Lang. 2019, 60, 101029. [CrossRef]
35. Xu, M.; Piero, B.; Sameer, A.; Enrico, Z. Fault prognostics by an ensemble of Echo State Networks in presence
of event based measurements. Eng. Appl. Artif. Intell. 2020, 87, 103346. [CrossRef]

© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access
article distributed under the terms and conditions of the Creative Commons Attribution
(CC BY) license (http://creativecommons.org/licenses/by/4.0/).
