Difference Histograms: A new tool for time series analysis applied to bearing fault diagnosis

Barend J. van Wyk a,*, Michaël A. van Wyk b, Guoyuan Qi a

a French South African Technical Institute in Electronics (F'SATIE) at the Tshwane University of Technology, Private Bag X680, Pretoria 0001, South Africa
b School of Electrical and Information Engineering, University of the Witwatersrand, Johannesburg, South Africa

Article info

Article history:
Received 3 May 2007
Received in revised form 24 December 2008
Available online 9 January 2009
Communicated by R.C. Guido

Keywords: Time series classification; Feature extraction; Bearing fault diagnosis; Pattern spectra; Vibration analysis; Difference Histograms

Abstract

A powerful tool for bearing time series feature extraction and classification is introduced that is computationally inexpensive, easy to implement and suitable for real-time applications. In this paper the proposed technique is applied to two rolling element bearing time series classification problems and it is shown that in some cases no data pre-processing, artificial neural network or nearest neighbour approaches are required. From the results obtained it is clear that for the specific applications considered, the proposed method performed as well as or better than alternative approaches based on conventional feature extraction.

© 2009 Elsevier B.V. All rights reserved.

1. Introduction

The concept of a Difference Histogram, a new tool for time series feature extraction, is introduced in this paper and applied to two rolling element bearing time series classification problems. Since rolling element bearing failures are one of the foremost causes of failures in rotating machinery, condition monitoring is important for system maintenance and process automation. In many cases the simplest approach is to directly measure vibration of the rotating machine using an accelerometer.
The presence of noise and the wide variety of possible faults complicate diagnostic procedures. Very often fault diagnosis relies on expert experience, statistical analysis, or the use of classical time and frequency domain analysis techniques.

During the past decade various signal processing and pattern recognition approaches were added to the arsenal of available diagnostic tools: Nikolaou and Antoniadis (2002) introduced an effective demodulation method based on the use of complex shifted Morlet wavelets; Chen and Mo (2004) used wavelet transform techniques in combination with a function approximation approach to extract fault features which were used with a neural network; Lou and Loparo (2004) introduced a scheme based on the wavelet transform and a neuro-fuzzy classification strategy; Altman and Mathew (2001) used discrete wavelet packet analysis to enhance the detection and diagnostics of low-speed rolling element bearing faults; Zhang et al. (2005) introduced an approach based on localised wavelet packet bases of vibration signals; Sun and Tang (2002) applied the wavelet transform to detect abrupt changes in vibration signals; and Prabhakar et al. (2002) also showed that the discrete wavelet transform can be used for improved detection of bearing faults.

Subrahmanyam and Sujatha (1997) demonstrated that a multilayered feedforward network and an ART-2 network can be used for the automatic detection and diagnosis of localised ball bearing defects; Kowalski and Orlowska-Kowalska (2003) showed that Kohonen networks can be used as an introductory step before a neural detector for initial classification; Spoerre (1997) applied the cascade correlation algorithm to bearing fault classification problems; Gelle and Colas (2001) used blind source separation as a pre-processing step to rotating machinery fault detection and diagnosis; Zhang et al. (2005) used a genetic programming approach; and Samanta et al. (2003) used a support vector machine in conjunction with a genetic algorithm.

Results obtained using the proposed Difference Histograms for two rolling element bearing time series feature extraction and classification problems are compared to the work of Samanta and Al-Balushi (2003) and Kith et al. (2006), both based on conventional time domain feature extraction and supervised learning. The feature extraction methodology described in (Lou and Loparo, 2004) is also explored for purposes of comparison. The Difference Histogram algorithm is introduced in Section 2 and the data sets, feature extraction and classification methodologies and results are described in Sections 3 and 4. Section 5 concludes the paper.

* Corresponding author. Tel.: +27 12 382 4191; fax: +27 12 382 5294. E-mail address: vanwykb@gmail.com (B.J. van Wyk).
Pattern Recognition Letters 30 (2009) 595–599. Journal homepage: www.elsevier.com/locate/patrec
0167-8655/$ - see front matter © 2009 Elsevier B.V. All rights reserved. doi:10.1016/j.patrec.2008.12.012

2. Difference Histograms

The idea of a Difference Histogram is summarised by the following three definitions:

Definition 1. A Difference Histogram, $\Omega$, is defined as a scaled representation of the number of occurrences of the lengths of Segments of Increase in a block of $N$ samples of a discrete time series, $\Phi(n)$.

Definition 2. A Segment of Increase is a group of consecutive samples in a discrete time series $\Phi(n)$ such that $\Phi(n+1) - \Phi(n) > \Phi(n) - \Phi(n-1) - \epsilon$, where $\epsilon$ is a Tolerance Parameter.

Definition 3. The Tolerance Parameter, $\epsilon$, is defined as a positive real number chosen to maximise some distance measure between the $\Omega_i$, where $\Omega_i$, $i = 1, \ldots, C$, are the Difference Histograms obtained from time series belonging to $C$ different classes.

As is evident from Algorithm 1, which operates directly on consecutive blocks of $N$ samples of a discrete time series $\Phi$, a Difference Histogram is extremely easy to implement and has a complexity of only $O(N)$, which makes it ideal for real-time applications:

Algorithm 1
  initialise: $\epsilon$, $k = 0$, $\bar{\Delta} = 0$, $\Omega = 0$
  for $n = 2 : N$
    $\Delta \leftarrow \Phi(n) - \Phi(n-1)$
    if $\Delta > \bar{\Delta} - \epsilon$
      increment $k$
    else
      increment $\Omega(k)$
      $k \leftarrow 0$
    end if
    $\bar{\Delta} \leftarrow \Delta$
  end for
  scale $\Omega$

Scaling is needed to keep $\Omega$ independent of the block size $N$. For our implementation we have divided each histogram bin, $\Omega(k)$, by $N/100$, where 100 was simply chosen for convenience to keep the values of $\Omega(k)$ in a convenient range. Since in many cases only selected histogram bins indexed by $k$ are calculated instead of the full Difference Histogram, conventional normalisation is not recommended. It should be noted that the algorithm processes blocks of data where the size of each block is defined by $N$, and therefore even if $\Omega(k)$ is increased each time during the $N - 1$ loop iterations, the maximum value $\Omega(k)$ can have is $N - 1$. Singularity will therefore only be an issue if $N$ tends to infinity. It is obvious that a Difference Histogram can also be defined using Segments of Decrease. A Segment of Decrease is defined as a group of consecutive samples in a discrete time series $\Phi$, indexed by $n$, such that $\Phi(n) - \Phi(n-1) < \Phi(n+1) - \Phi(n) + \epsilon$. It should be observed that $\epsilon$ creates a tolerance region around $\bar{\Delta}$ (where $\bar{\Delta}$ is the value of $\Delta$ at iteration $n - 1$). How to select $\epsilon$ for a specific application will be illustrated in Section 3.

In Sections 3 and 4 the simplicity and power of the Difference Histogram for time series classification will be illustrated using two well-known datasets.
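
Algorithm 1 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `difference_histogram`, the `n_bins` and `scale` parameters, the zero-based sample indexing, the retention of a bin for runs of length zero, and the choice to clip runs longer than `n_bins` into the last bin are all introduced here for illustration. As in Algorithm 1, a run that is still open when the block ends is not counted.

```python
def difference_histogram(block, eps, n_bins=10, scale=100.0):
    """Scaled Difference Histogram of one block of samples (sketch of Algorithm 1).

    block  : sequence of N samples
    eps    : Tolerance Parameter
    n_bins : largest run length given its own bin (assumption; longer runs are clipped)
    scale  : each bin is divided by N / scale; the paper uses N / 100
    """
    n = len(block)
    hist = [0.0] * (n_bins + 1)  # hist[k] counts completed runs of length k
    k = 0                        # length of the current Segment of Increase
    prev_delta = 0.0             # difference from the previous iteration
    for i in range(1, n):
        delta = block[i] - block[i - 1]
        if delta > prev_delta - eps:
            k += 1                       # the segment continues
        else:
            hist[min(k, n_bins)] += 1    # the segment ended: record its length
            k = 0
        prev_delta = delta
    # Scale so that bin values do not depend on the block size N.
    return [h / (n / scale) for h in hist]
```

For example, `difference_histogram([0, 1, 3, 2, 2, 5], eps=0.0, n_bins=4, scale=6.0)` records a single completed run of length 2, so only the bin at index 2 is non-zero. The loop touches each sample once, which matches the $O(N)$ complexity stated above.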
In this paper we only consider the two-class case, for which the procedure is summarised by Algorithm 2:

Algorithm 2
1: Use training data and Algorithm 1 to compute $\|\Omega_1 - \Omega_2\|$ for increasing values of $\epsilon$, with $N$ set sufficiently large or equal to the total number of training samples available.
2: Determine the optimal value for $\epsilon$, i.e. the value that maximises the separation between $\Omega_1$ and $\Omega_2$.
3: Given an optimal $\epsilon$, calculate $\Omega$ using training data to repeatedly train a classifier such as a neural network (described in Section 4) or some distance-based method (described in Section 3) for increasing values of $N$, starting with $N$ sufficiently small.
4: Choose the smallest $N$ giving acceptable training results.
5: Perform classification using the chosen $\epsilon$ and $N$.

In the multi-class case an approach can be adopted reminiscent of the linear machine approach described in (Duda et al., 2000): for $C$ different classes this means that $C$ two-class classifiers are trained, with the $k$th classifier separating $\Omega_k$ from the histograms of the remaining classes. Finally, the multi-class classifier is implemented to yield the class associated with that two-class classifier whose output is the maximum.

3. Landustrie dataset

The data used in this case study are measurements from accelerometers on a submersible pump driven by an electric motor, acquired in the Delft Machine Diagnostics by Neural Networks project with the assistance of Landustrie B.V., The Netherlands, and can be freely downloaded from http://www.aypma.nl/PhD/pump_sets.html. Separate measurements were obtained for a normal bearing and a bearing with an outer race defect at the upper end. The sensors were placed at five different positions and sampled at 51.2 kHz while the pump was rotating at 1123 rpm. A total of 20,480 samples were recorded for each sensor, under normal conditions and when the bearing had an outer race defect. For each sensor the first 10,240 samples were used for training and the remaining samples for testing. Fig. 1 shows the scaled histogram obtained using the training time series from sensor 4 with $\epsilon = 0.018$. The difference between a normal and faulty bearing is clearly visible. In general, if time series data (whether of a defective device or not) is stationary, then the variance of the histogram, as a feature, will decrease asymptotically as the block size $N$ (number of samples) used to calculate it increases, and in the limit, will converge. However, if the data is non-stationary then the histogram, as a feature, may or may not converge, in the limit, as the block size $N$ used to calculate it increases to infinity.

[Fig. 1. Difference Histograms extracted from the training sets for normal and faulty bearings for sensor 4. Axes: bin number (horizontal), scaled magnitude (vertical); series: normal bearing, faulty bearing.]

Steps 1 and 2 of Algorithm 2: for each sensor, the best choice for $\epsilon$, the Tolerance Parameter, can be determined by calculating $\Omega_1$, the Difference Histogram obtained using the 10,240 training samples recorded under normal conditions, and $\Omega_2$, the Difference Histogram obtained using the 10,240 training samples recorded under bearing fault conditions, and computing $\|\Omega_1 - \Omega_2\|_2$ for increasing values of $\epsilon$. Fig. 2 illustrates the result and shows that the optimal Tolerance Parameter for sensor 4 is 0.018 (corresponding to the maximum histogram separation value in Fig. 2). Table 1 has been obtained by repeating the process for all five sensors.

Step 3 of Algorithm 2: once the optimal Tolerance Parameters for each sensor have been determined, the associated Difference Histograms (or selected bins from suitable sensors) can be used as the input to a classifier such as a neural network for training and classification. However, for the two-class application considered in this section, it was found that using the $l_\infty$ norm as a distance measure, i.e. using only the most discriminative histogram bin, together with a hard threshold proved more than sufficient. From Table 1 it is clear that recordings from sensor 4 are the most suitable for classification since this sensor has the largest histogram separation, followed by sensor 1 as a second choice. The next step is to find the most discriminative histogram bin associated with sensor 4. By calculating the differences between corresponding bins of the Difference Histograms $\Omega_1$ and $\Omega_2$, derived from the training data from the normal and faulty bearings, it is possible to determine the most discriminative bins, which are listed in Table 2 (histogram bins 1, 2 and 3, cf. Fig. 1). For the most discriminative bin $k$ a Bin Threshold given by $(\Omega_1(k) + \Omega_2(k))/2$ can be calculated. Classifying a signal as belonging to a normal or faulty bearing then boils down to comparing the value of the most discriminative bin from the Difference Histogram obtained from the testing set to the Bin Threshold of the most discriminative bin. As shown in Table 2, for sensor 4 the most discriminative bin is $k = 1$ with an associated Bin Threshold of 13.07. As shown in Fig. 1, the value of bin 1 for the faulty bearing class exceeds the value of bin 1 for the normal bearing class. Therefore, if $\Omega(1)$ exceeds 13.07 the block is classified as belonging to the faulty bearing class, otherwise it is classified as belonging to the normal bearing class.

Step 4 of Algorithm 2: the optimal block size, $N$, must now be determined. The results of the Difference Histogram classification experiment using training data and the most discriminative histogram bin associated with sensor 4 are shown in Fig. 3. For comparison the results from sensors 1 and 5 are also shown in Figs. 4 and 5.
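
Steps 1 to 3 of Algorithm 2, as applied in this section, might be sketched as follows. This is a hedged illustration, not the authors' code: all function names (`difference_histogram`, `separation`, `select_tolerance`, `bin_threshold`, `classify_block`) and the grid of candidate $\epsilon$ values are assumptions introduced here, and the histogram routine repeats the same illustrative assumptions as any sketch of Algorithm 1 (zero-based indexing, clipping of long runs into the last bin).

```python
import math

def difference_histogram(block, eps, n_bins=10, scale=100.0):
    """Scaled Difference Histogram of one block (sketch of Algorithm 1)."""
    n = len(block)
    hist = [0.0] * (n_bins + 1)
    k, prev_delta = 0, 0.0
    for i in range(1, n):
        delta = block[i] - block[i - 1]
        if delta > prev_delta - eps:
            k += 1
        else:
            hist[min(k, n_bins)] += 1  # clip long runs (assumption)
            k = 0
        prev_delta = delta
    return [h / (n / scale) for h in hist]

def separation(h1, h2):
    """Euclidean (l2) distance between two histograms."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(h1, h2)))

def select_tolerance(train_normal, train_faulty, eps_grid):
    """Steps 1 and 2: return the eps in eps_grid with the largest separation."""
    best_eps, best_sep = None, -1.0
    for eps in eps_grid:
        sep = separation(difference_histogram(train_normal, eps),
                         difference_histogram(train_faulty, eps))
        if sep > best_sep:
            best_eps, best_sep = eps, sep
    return best_eps, best_sep

def bin_threshold(h_normal, h_faulty):
    """Most discriminative bin, its midpoint threshold, and whether the
    faulty class lies above that threshold in the training histograms."""
    diffs = [abs(a - b) for a, b in zip(h_normal, h_faulty)]
    k = diffs.index(max(diffs))
    threshold = (h_normal[k] + h_faulty[k]) / 2.0
    return k, threshold, h_faulty[k] > h_normal[k]

def classify_block(block, eps, k, threshold, faulty_is_higher):
    """Hard-threshold decision on a single histogram bin."""
    value = difference_histogram(block, eps)[k]
    return "faulty" if (value > threshold) == faulty_is_higher else "normal"
```

Picking the bin with the largest absolute difference is what makes this an $l_\infty$-style decision: only one bin needs to be computed at classification time, which is why the text advises against conventional normalisation over the full histogram.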
These figures show the percentage of correct classifications, from a total of $\lfloor 10{,}240/N \rfloor$ classifications, using the test time series for the normal and faulty bearing classes, respectively, where the block size $N$ is the size of the batch of samples processed before a classification decision is made. Fig. 3 shows that for sensor 4 a block size of $N > 400$ is sufficient for a 100% correct classification rate. Fig. 5 shows that sensor 5 is not suitable. Sensor 1 gave similar results to sensor 4 for $N = 600$, and although not optimal, sensors 2 and 3 can also be used provided that $N$ is chosen large enough.

Step 5 of Algorithm 2: using the test data from sensor 4 and a block size of $N = 600$ gave a 100% correct classification rate. Alternatively, using test data from sensor 1 and a block size of $N = 600$ also gave a 100% correct classification rate.

[Fig. 2. Influence of Tolerance Parameter on separation of histograms for data from sensor 4. Axes: Tolerance Parameter, 0 to 0.05 (horizontal), histogram separation (vertical).]

Table 1
Determining the optimal Tolerance Parameter.

Sensor                          1      2      3      4      5
Optimal Tolerance Parameter     0.011  0.011  0.007  0.018  0.000
Maximum histogram separation    8.42   2.97   2.35   10.83  2.85

Table 2
Determining the optimal histogram bins for sensor 4.

Histogram bin     1      2      3
Bin difference    10.62  1.63   0.76
Bin threshold     13.07  3.06   3.54

[Fig. 3. Classification results for sensor 4. Axes: block size (horizontal), percentage correct block classifications (vertical); series: normal bearing, faulty bearing.]

For this dataset the Difference Histogram method was compared to nearest neighbour approaches using the same features proposed by Samanta and Al-Balushi (2003), who experimented with the same dataset: root mean square $\mathrm{rms} = \sqrt{\sum \Phi^2(n) / N}$, variance ($\sigma^2 = E\{\tilde{\Phi}^2(n)\}$), normalised third central moment $\gamma_3 = E\{\tilde{\Phi}^3(n)\}/\sigma^3$, normalised fourth central moment $\gamma_4 = E\{\tilde{\Phi}^4(n)\}/\sigma^4$ and the normalised sixth central moment $\gamma_6 = E\{\tilde{\Phi}^6(n)\}/\sigma^6$, where $\tilde{\Phi}(n) = \Phi(n) - \mu$ and $\mu = E\{\Phi(n)\}$. As in (Samanta and Al-Balushi, 2003), each 20,480-sample recording (the training and testing portions together) was divided into 20 non-overlapping blocks of 1024 samples. Each block was processed to extract these five features.

Since both a normal bearing and a faulty bearing recording are available for each sensor, there were 40 feature vectors available per sensor (i.e. 20 feature vectors for normal bearing data and 20 feature vectors for faulty bearing data). As in (Samanta and Al-Balushi, 2003), we divided these feature vectors, for each sensor, into two groups: one group for training (consisting of the first 12 feature vectors for normal bearing data and the first 12 feature vectors for faulty bearing data) and one group for testing (consisting of the 16 remaining feature vectors).

All five features were used to represent the sensor signals. All sensor signals were tested both individually and in groups. The objective of this experiment was to demonstrate the diagnostic capability of the Nearest Neighbour (NN) and the Variable-kernel Similarity Metric (VSM) learning approaches for different sensor signals. The diagnostic capability of the NN approach compared against the VSM approach is reported for both training and testing.
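
The conventional feature extraction used for comparison can be sketched as follows. This is an illustrative sketch only: the function names `block_features`, `split_blocks` and `nearest_neighbour` are hypothetical, expectations are approximated by sample averages over each block, the rms is computed on the raw (uncentred) samples, and a plain Euclidean 1-nearest-neighbour rule stands in for the NN classifier. The VSM approach, which learns kernel parameters, is not reproduced here.

```python
import math

def block_features(block):
    """Return [rms, variance, gamma_3, gamma_4, gamma_6] for one block.

    Assumes the block is not constant, otherwise sigma is zero and the
    normalised moments are undefined.
    """
    n = len(block)
    mu = sum(block) / n
    centred = [x - mu for x in block]
    rms = math.sqrt(sum(x * x for x in block) / n)
    var = sum(c * c for c in centred) / n
    sigma = math.sqrt(var)

    def normalised_moment(p):
        # p-th central moment divided by sigma to the power p
        return sum(c ** p for c in centred) / n / sigma ** p

    return [rms, var, normalised_moment(3), normalised_moment(4),
            normalised_moment(6)]

def split_blocks(series, block_size=1024):
    """Non-overlapping blocks; a trailing partial block is discarded."""
    last_start = len(series) - block_size
    return [series[i:i + block_size]
            for i in range(0, last_start + 1, block_size)]

def nearest_neighbour(train_vectors, train_labels, query):
    """Label of the training vector closest to the query (Euclidean)."""
    dists = [math.dist(v, query) for v in train_vectors]
    return train_labels[dists.index(min(dists))]
```

A 20,480-sample recording passed to `split_blocks` yields 20 blocks, and hence 20 five-element feature vectors per condition, which matches the 40 feature vectors per sensor described above. In practice the features would usually be rescaled before computing distances, since the variance and the higher moments can differ by orders of magnitude; whether and how this was done in the experiments reported here is not stated.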
The VSM approach introduced by Lowe (1995) attaches more importance to closer neighbours by determining the weight assigned to each neighbour by learning the optimal parameters of a kernel function.

Table 3 (for training) and Table 4 (for testing) report the results of the experiment. These results show that the VSM performed better than the NN, as expected. From these two tables we observed that whenever sensor 1 or sensor 5 was used as input (individually or grouped), the success rate for training and testing was worse than when using other input signals. The effects of using different signal feature combinations for training and testing were also investigated, but no significant improvement in performance was observed. The reader may consult Kith et al. (2006) for more detail. The results in Tables 3 and 4 are similar to those obtained by Samanta and Al-Balushi (2003) using their artificial neural network approach without pre-filtering the sensor signals. The reader is referred to their work for more information on the structure and details of the feedforward neural network used. Samanta and Al-Balushi (2003) also studied the effects of various pre-processing techniques like band-pass and high-pass filtering, envelope detection and wavelet transform processing, achieving 100% training and test success in some cases where more than one feature or more than one sensor signal was used for training and testing. It is therefore significant to note that a 100% success rate can be achieved using only a single Difference Histogram bin from an individual sensor without using a nearest neighbour or artificial neural network approach.

[Fig. 4. Classification results for sensor 1. Axes: block size (horizontal), percentage correct block classifications (vertical); series: normal bearing, faulty bearing.]

[Fig. 5. Classification results for sensor 5. Axes: block size (horizontal), percentage correct block classifications (vertical); series: normal bearing, faulty bearing.]

Table 3
Effects of input signals from different sensors on identification of machine condition with five features (rms, $\sigma^2$, $\gamma_3$, $\gamma_4$, $\gamma_6$). Results for training.

Sensor(s)    Training success
             NN (%)   VSM (%)
1            29       70
2            95       100
3            91       100
4            100      100
5            29       62
2,3          97       100
2,3,4        97       100
1,2,3,4      83       89
1,2,3,4,5    75       81

Table 4
Effects of input signals from different sensors on identification of machine condition with five features (rms, $\sigma^2$, $\gamma_3$, $\gamma_4$, $\gamma_6$). Results for testing.

Sensor(s)    Test success
             NN (%)   VSM (%)
1            43       62
2            87       100
3            87       100
4            100      100
5            43       50
2,3          93       93
2,3,4        91       95
1,2,3,4      81       89
1,2,3,4,5    81       78

4. Case Western dataset

The dataset used in this section was acquired by the Case Western Reserve University (CWRU) Bearing Data Center with help from Rockwell Science, CVX, and the Office of Naval Research and can be freely downloaded from http://www.eecs.case.edu/laboratory/bearing. The test setup consisted of a motor where bearings supported the motor shaft, a torque transducer and a dynamometer. Single point faults of sizes 7 mils, 14 mils and 21 mils were introduced to the outer raceway, ball and inner raceway of the front end and drive end bearings respectively. For each fault