
Difference Histograms: A new tool for time series analysis applied to bearing fault diagnosis

Barend J. van Wyk a,*, Michaël A. van Wyk b, Guoyuan Qi a

a French South African Technical Institute in Electronics (F'SATIE) at the Tshwane University of Technology, Private Bag X680, Pretoria 0001, South Africa
b School of Electrical and Information Engineering, University of the Witwatersrand, Johannesburg, South Africa

Article history: Received 3 May 2007; received in revised form 24 December 2008; available online 9 January 2009. Communicated by R.C. Guido.

Keywords: Time series classification; Feature extraction; Bearing fault diagnosis; Pattern spectra; Vibration analysis; Difference Histograms

Abstract

A powerful tool for bearing time series feature extraction and classification is introduced that is computationally inexpensive, easy to implement and suitable for real-time applications. In this paper the proposed technique is applied to two rolling element bearing time series classification problems, and it is shown that in some cases no data pre-processing, artificial neural network or nearest neighbour approaches are required. From the results obtained it is clear that for the specific applications considered, the proposed method performed as well as or better than alternative approaches based on conventional feature extraction.

© 2009 Elsevier B.V. All rights reserved.
1. Introduction
The concept of a Difference Histogram, a new tool for time series feature extraction, is introduced in this paper and applied to two rolling element bearing time series classification problems. Since rolling element bearing failures are one of the foremost causes of failures in rotating machinery, condition monitoring is important for system maintenance and process automation. In many cases the simplest approach is to directly measure the vibration of the rotating machine using an accelerometer. The presence of noise and the wide variety of possible faults complicate diagnostic procedures. Very often fault diagnosis relies on expert experience, statistical analysis, or the use of classical time and frequency domain analysis techniques.

During the past decade various signal processing and pattern recognition approaches were added to the arsenal of available diagnostic tools: Nikolaou and Antoniadis (2002) introduced an effective demodulation method based on the use of complex shifted Morlet wavelets; Chen and Mo (2004) used wavelet transform techniques in combination with a function approximation approach to extract fault features which were used with a neural network; Lou and Loparo (2004) introduced a scheme based on the wavelet transform and a neuro-fuzzy classification strategy; Altman and Mathew (2001) used discrete wavelet packet analysis to enhance the detection and diagnostics of low-speed rolling element bearing faults; Zhang et al. (2005) introduced an approach based on localised wavelet packet bases of vibration signals; Sun and Tang (2002) applied the wavelet transform to detect abrupt changes in vibration signals; and Prabhakar et al. (2002) also showed that the discrete wavelet transform can be used for improved detection of bearing faults.

Subrahmanyam and Sujatha (1997) demonstrated that a multi-layered feedforward network and an ART-2 network can be used for the automatic detection and diagnosis of localised ball bearing defects; Kowalski and Orlowska-Kowalska (2003) showed that Kohonen networks can be used as an introductory step before a neural detector for initial classification; Spoerre (1997) applied the cascade correlation algorithm to bearing fault classification problems; Gelle and Colas (2001) used blind source separation as a pre-processing step to rotating machinery fault detection and diagnosis; Zhang et al. (2005) used a genetic programming approach; and Samanta et al. (2003) used a support vector machine in conjunction with a genetic algorithm.

Results obtained using the proposed Difference Histograms for two rolling element bearing time series feature extraction and classification problems are compared to the work of Samanta and Al-Balushi (2003) and Kith et al. (2006), both based on conventional time domain feature extraction and supervised learning. The feature extraction methodology described in (Lou and Loparo, 2004) is also explored for purposes of comparison. The Difference Histogram algorithm is introduced in Section 2, and the data sets, feature extraction and classification methodologies and results are described in Sections 3 and 4. Section 5 concludes the paper.

0167-8655/$ - see front matter © 2009 Elsevier B.V. All rights reserved. doi:10.1016/j.patrec.2008.12.012
* Corresponding author. Tel.: +27 12 382 4191; fax: +27 12 382 5294. E-mail address: vanwykb@gmail.com (B.J. van Wyk).
Pattern Recognition Letters 30 (2009) 595–599. Journal homepage: www.elsevier.com/locate/patrec
2. Difference Histograms
The idea of a Difference Histogram is summarised by the following three definitions:

Definition 1. A Difference Histogram, Ω, is defined as a scaled representation of the number of occurrences of the lengths of Segments of Increase in a block of N samples of a discrete time series, Φ(n).

Definition 2. A Segment of Increase is a group of consecutive samples in a discrete time series Φ(n) such that Φ(n+1) − Φ(n) > Φ(n) − Φ(n−1) − ε, where ε is a Tolerance Parameter.

Definition 3. The Tolerance Parameter, ε, is defined as a positive real number chosen to maximise some distance measure between the Ωi, where Ωi, i = 1, …, C, are the Difference Histograms obtained from time series belonging to C different classes.

As is evident from Algorithm 1, which operates directly on consecutive blocks of N samples of a discrete time series Φ, a Difference Histogram is extremely easy to implement and has a complexity of only O(N), which makes it ideal for real-time applications:
Algorithm 1

1: initialise: ε, k = 0, Δ̄ = 0, Ω = 0
2: for n = 2 : N
3:   Δ ← Φ(n) − Φ(n−1)
4:   if Δ > Δ̄ − ε
5:     increment k
6:   else
7:     increment Ω(k)
8:     k ← 0
9:   end if
10:  Δ̄ ← Δ
11: end for
12: scale Ω
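The loop above can be sketched in Python as follows. This is an illustrative implementation, not the authors' code: the function name `difference_histogram` and the `n_bins`/`scale` parameters are our own additions (the paper's scaling divides each bin by N/100).

```python
def difference_histogram(phi, eps, n_bins=10, scale=100.0):
    """Difference Histogram of one block of N samples (cf. Algorithm 1).

    phi    : block of N samples of a discrete time series
    eps    : Tolerance Parameter (epsilon)
    n_bins : largest Segment-of-Increase length tracked separately;
             longer segments are lumped into the last bin
    scale  : each bin is divided by N/scale (the paper uses 100)
    """
    N = len(phi)
    k = 0             # length of the current Segment of Increase
    prev_delta = 0.0  # the difference at iteration n-1 (Delta-bar)
    omega = [0.0] * (n_bins + 1)
    for n in range(1, N):
        delta = phi[n] - phi[n - 1]
        if delta > prev_delta - eps:   # still inside a Segment of Increase
            k += 1
        else:                          # segment ended: record its length k
            omega[min(k, n_bins)] += 1
            k = 0
        prev_delta = delta
    # scaling keeps the histogram independent of the block size N
    return [v / (N / scale) for v in omega]
```

Note that, as in the algorithm, a "segment" of length k = 0 also increments Ω(0); in practice only a few low-index bins are usually of interest.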
Scaling is needed to keep Ω independent of the block size N. For our implementation we have divided each histogram bin, Ω(k), by N/100, where 100 was simply chosen for convenience to have the values of Ω(k) in a convenient range. Since in many cases only selected histogram bins indexed by k are calculated instead of the full difference histogram, conventional normalisation is not recommended. It should be noted that the algorithm processes blocks of data where the size of each block is defined by N, and therefore even if Ω(k) is increased each time during the N − 1 loop, the maximum value Ω(k) can have is N − 1. Singularity will therefore only be an issue if N tends to infinity. It is obvious that a Difference Histogram can also be defined using Segments of Decrease. A Segment of Decrease is defined as a group of consecutive samples in a discrete time series Φ, indexed by n, such that Φ(n) − Φ(n−1) < Φ(n+1) − Φ(n) + ε. It should be observed that ε creates a tolerance region around Δ̄ (where Δ̄ is the value of Δ at iteration n − 1). How to select ε for a specific application will be illustrated in Section 3.

In Sections 3 and 4 the simplicity and power of the Difference Histogram for time series classification will be illustrated using two well-known datasets. In this paper we only consider the two-class case, for which the procedure is summarised by Algorithm 2:
Algorithm 2

1: Use training data and Algorithm 1 to compute ‖Ω1 − Ω2‖ for increasing values of ε, with N set sufficiently large or equal to the total number of training samples available.
2: Determine the optimal value for ε, i.e. the value that maximises the separation between Ω1 and Ω2.
3: Given an optimal ε, calculate Ω using training data to repeatedly train a classifier such as a neural network (described in Section 4) or some distance-based method (described in Section 3) for increasing values of N, starting with N sufficiently small.
4: Choose the smallest N giving acceptable training results.
5: Perform classification using the chosen ε and N.

In the multi-class case an approach can be adopted reminiscent of the linear machine approach described in (Duda et al., 2000): for C different classes this means that C two-class classifiers are trained, with the kth classifier classifying Ωk against the Difference Histograms of the remaining classes. Finally, the multi-class classifier is implemented to yield the class associated with that two-class classifier whose output is the maximum.
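Steps 1 and 2 of the two-class procedure can be sketched as follows. This is a minimal illustration under our own naming: `difference_histogram` re-implements Algorithm 1, `select_tolerance` is a hypothetical helper, and the Euclidean norm is used for the separation ‖Ω1 − Ω2‖.

```python
def difference_histogram(phi, eps, n_bins=10, scale=100.0):
    # Algorithm 1: count lengths of Segments of Increase, then scale by N/100.
    N, k, prev, omega = len(phi), 0, 0.0, [0.0] * (n_bins + 1)
    for n in range(1, N):
        delta = phi[n] - phi[n - 1]
        if delta > prev - eps:
            k += 1
        else:
            omega[min(k, n_bins)] += 1
            k = 0
        prev = delta
    return [v / (N / scale) for v in omega]

def select_tolerance(train1, train2, candidates, n_bins=10):
    """Steps 1-2 of Algorithm 2: pick the epsilon maximising ||Omega1 - Omega2||.

    train1, train2 : full training time series of the two classes
    candidates     : increasing epsilon values to evaluate
    """
    def separation(eps):
        o1 = difference_histogram(train1, eps, n_bins)
        o2 = difference_histogram(train2, eps, n_bins)
        return sum((a - b) ** 2 for a, b in zip(o1, o2)) ** 0.5
    return max(candidates, key=separation)
```

Steps 3-5 would then retrain (or re-threshold) a classifier on histograms computed with the chosen ε for growing block sizes N, keeping the smallest N that classifies the training blocks acceptably.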
3. Landustrie dataset
The data used in this case study are measurements from accelerometers on a submersible pump driven by an electric motor, acquired in the Delft Machine Diagnostics by Neural Networks project with the assistance of Landustrie B.V., The Netherlands, and can be freely downloaded from http://www.aypma.nl/PhD/pump_sets.html. Separate measurements were obtained for a normal bearing and a bearing with an outer race defect at the upper end. The sensors were placed at five different positions and sampled at 51.2 kHz while the pump was rotating at 1123 rpm. A total of 20,480 samples were recorded for each sensor, under normal conditions and when the bearing had an outer race defect. For each sensor the first 10,240 samples were used for training and the remaining samples for testing. Fig. 1 shows the scaled histogram obtained using the training time series from sensor 4 with ε = 0.018. The difference between a normal and a faulty bearing is clearly visible.

[Fig. 1. Difference Histograms extracted from the training sets for normal and faulty bearings for sensor 4.]

In general, if time series data (whether of a defective device or not) is stationary, then the variance of the histogram, as a feature, will decrease asymptotically as the block size N (number of samples) used to calculate it increases, and in the limit, will converge. However, if the data is non-stationary then the histogram, as a feature, may or may not converge, in the limit, as the block size N used to calculate it increases to infinity.

Steps 1 and 2 of Algorithm 2: for each sensor, the best choice for ε, the Tolerance Parameter, can be determined by calculating Ω1, the Difference Histogram obtained using the 10,240 training samples recorded under normal conditions, and Ω2, the Difference Histogram obtained using the 10,240 training samples recorded under bearing fault conditions, and computing ‖Ω1 − Ω2‖2 for increasing values of ε. Fig. 2 illustrates the result and shows that the optimal Tolerance Parameter for sensor 4 is 0.018 (corresponding to the maximum histogram separation value in Fig. 2). Table 1 has been obtained by repeating the process for all five sensors.

Step 3 of Algorithm 2: once the optimal Tolerance Parameters for each sensor have been determined, the associated Difference Histograms (or selected bins from suitable sensors) can be used as the input to a classifier such as a neural network for training and classification. However, for the two-class application considered in this section, it was found that using the l1 norm as a distance measure, i.e. using only the most discriminative histogram bin, together with a hard threshold, proved more than sufficient. From Table 1 it is clear that recordings from sensor 4 are the most suitable for classification since this sensor has the largest histogram separation, followed by sensor 1 as a second choice. The next step is now to find the most discriminative histogram bin associated with sensor 4. By calculating the differences between corresponding bins of the Difference Histograms Ω1 and Ω2, derived from the training data from the normal and faulty bearings, it is possible to determine the most discriminative bins, which are listed in Table 2 (histogram bins 1, 2 and 3, cf. Fig. 1). For the most discriminative bin k a Bin Threshold given by (Ω1(k) + Ω2(k))/2 can be calculated. Classifying a signal as belonging to a normal or faulty bearing then boils down to comparing the value of the most discriminative bin from the Difference Histogram obtained from the testing set to the Bin Threshold of the most discriminative bin. As shown in Table 2, for sensor 4 the most discriminative bin is k = 1 with an associated Bin Threshold of 13.07. As shown in Fig. 1, the value of bin 1 for the faulty bearing class exceeds the value of bin 1 for the normal bearing class. Therefore, if Ω(1) exceeds 13.07 the signal is classified as belonging to the faulty bearing class, otherwise it is classified as belonging to the normal bearing class.

Step 4 of Algorithm 2: the optimal block size, N, must now be determined. The results of the Difference Histogram classification experiment using training data and the most discriminative histogram bin associated with sensor 4 are shown in Fig. 3. For comparison the results from sensors 1 and 5 are also shown in Figs. 4 and 5. These figures show the percentage of correct classifications, from a total of ⌊10,240/N⌋ classifications, using the test time series for the normal and faulty bearing classes, respectively, where the block size N is the size of the batch of samples processed before a classification decision is made. Fig. 3 shows that for sensor 4 a block size of N > 400 is sufficient for a 100% correct classification rate. Fig. 5 shows that sensor 5 is not suitable. Sensor 1 gave similar results to sensor 4 for N = 600, and, although not optimal, sensors 2 and 3 can also be used provided that N is chosen large enough.

Step 5 of Algorithm 2: using the test data from sensor 4 and a block size of N = 600 gave a 100% correct classification rate. Alternatively, using test data from sensor 1 and a block size of N = 600 also gave a 100% correct classification rate.

For this dataset the Difference Histogram method was compared to nearest neighbour approaches using the same features proposed by Samanta and Al-Balushi (2003), who experimented with the same dataset: root mean square
rms = √((1/N) Σ Φ²(n)), variance σ² = E{Ū²(n)}, normalised third central moment γ3 = E{Ū³(n)}/σ³, normalised fourth central moment γ4 = E{Ū⁴(n)}/σ⁴, and the normalised sixth central moment γ6 = E{Ū⁶(n)}/σ⁶, where Ū(n) = Φ(n) − μ and μ = E{Φ(n)}. As in (Samanta and Al-Balushi, 2003), the testing and training time series were respectively divided into 20 non-overlapping blocks of 1024 samples. Each block was processed to extract these five features.
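These five time-domain features can be computed per block as sketched below. This is our own illustrative routine, not the reference implementation; we assume the rms is taken on the raw signal and the normalised moments on the centred series, per the definitions above.

```python
import math

def time_domain_features(phi):
    """rms, variance, and normalised 3rd/4th/6th central moments of a block."""
    N = len(phi)
    mu = sum(phi) / N                       # mu = E{Phi(n)}
    u = [x - mu for x in phi]               # centred series U(n) = Phi(n) - mu
    rms = math.sqrt(sum(x * x for x in phi) / N)
    var = sum(x * x for x in u) / N         # sigma^2 = E{U^2(n)}
    sigma = math.sqrt(var)
    g3 = sum(x ** 3 for x in u) / N / sigma ** 3
    g4 = sum(x ** 4 for x in u) / N / sigma ** 4
    g6 = sum(x ** 6 for x in u) / N / sigma ** 6
    return rms, var, g3, g4, g6
```

Applied to each of the 1024-sample blocks, this yields one five-dimensional feature vector per block.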
[Fig. 2. Influence of the Tolerance Parameter on the separation of histograms for data from sensor 4.]
Table 1
Determining the optimal Tolerance Parameter.

Sensor                          1      2      3      4      5
Optimal Tolerance Parameter     0.011  0.011  0.007  0.018  0.000
Maximum histogram separation    8.42   2.97   2.35   10.83  2.85
Table 2
Determining the optimal histogram bins for sensor 4.

Histogram bin    1      2     3
Bin difference   10.62  1.63  0.76
Bin threshold    13.07  3.06  3.54
[Fig. 3. Classification results for sensor 4.]
Since both a normal bearing and a faulty bearing recording are available for each sensor, there were 40 feature vectors available per sensor (i.e. 20 feature vectors for normal bearing data and 20 feature vectors for faulty bearing data). As in (Samanta and Al-Balushi, 2003), we divided these feature vectors, for each sensor, into two groups: one group for training (consisting of the first 12 feature vectors for normal bearing data and the first 12 feature vectors for faulty bearing data) and one group for testing (consisting of the 16 remaining feature vectors).

All five features were used to represent the sensor signals. All sensor signals were tested both individually and in groups. The objective of this experiment was to demonstrate the diagnostic capability of the Nearest Neighbour (NN) and the Variable-kernel Similarity Metric (VSM) learning approaches for different sensor signals. The results of the diagnostic capability of NN compared against the VSM for training and testing are reported. The VSM approach introduced by Lowe (1995) attaches more importance to closer neighbours by determining the weight assigned to each neighbour by learning the optimal parameters of a kernel function.

Table 3 (for training) and Table 4 (for testing) report the results of the experiment. These results show that the VSM performed better than the NN, as expected. From these two tables we observe that whenever sensor 1 or sensor 5 was used as input (individually or grouped), the success rate for training and testing is worse than when using other input signals. The effects of using different signal feature combinations for training and testing were also investigated, but no significant improvement in performance was observed. The reader may consult Kith et al. (2006) for more detail. The results in Tables 3 and 4 are similar to those obtained by Samanta and Al-Balushi (2003) using their artificial neural network approach without pre-filtering the sensor signals. The reader is referred to their work for more information on the structure and details of the feedforward neural network used. Samanta and Al-Balushi (2003) also studied the effects of various pre-processing techniques like band-pass and high-pass filtering, envelope detection and wavelet transform processing, achieving a 100% training and test success in some cases where more than one feature or more than one sensor signal were used for training and testing. It is therefore significant to note that a 100% success rate can be achieved using only a single Difference Histogram bin from an individual sensor, without using a nearest neighbour or artificial neural network approach.
4. Case Western dataset
The dataset used in this section was acquired by the Case Western Reserve University (CWRU) Bearing Data Center with help from Rockwell Science, CVX, and the Office of Naval Research, and can be freely downloaded from http://www.eecs.case.edu/laboratory/bearing. The test setup consisted of a motor in which bearings supported the motor shaft, a torque transducer and a dynamometer. Single point faults of sizes 7 mils, 14 mils and 21 mils were introduced to the outer raceway, ball and inner raceway of the front end and drive end bearings respectively. For each fault
[Fig. 4. Classification results for sensor 1.]
[Fig. 5. Classification results for sensor 5.]
Table 3
Effects of input signals from different sensors on identification of machine condition with five features (rms, σ², γ3, γ4, γ6). Results for training.

Sensor(s)    Training success
             NN (%)   VSM (%)
1            29       70
2            95       100
3            91       100
4            100      100
5            29       62
2,3          97       100
2,3,4        97       100
1,2,3,4      83       89
1,2,3,4,5    75       81
Table 4
Effects of input signals from different sensors on identification of machine condition with five features (rms, σ², γ3, γ4, γ6). Results for testing.

Sensor(s)    Test success
             NN (%)   VSM (%)
1            43       62
2            87       100
3            87       100
4            100      100
5            43       50
2,3          93       93
2,3,4        91       95
1,2,3,4      81       89
1,2,3,4,5    81       78
