Journal of Archaeological Science (1997) 24, 347–354

Some Archaeological Applications of Kernel Density Estimates

M. J. Baxter and C. C. Beardah
Department of Mathematics, Statistics and Operational Research, The Nottingham Trent University, Nottingham NG11 8NS, U.K.

R. V. S. Wright
Prehistoric and Historical Archaeology, University of Sydney, NSW 2006, Australia

(Received 10 November 1995, manuscript accepted 11 March 1996)

Kernel density estimates, which at their simplest can be viewed as a smoothed form of histogram, have been widely studied in the statistical literature in recent years but used hardly at all within archaeology. They provide an effective method of data presentation for univariate and particularly bivariate data and this is illustrated with a range of examples. The methodology can be used as an informal approach to spatial cluster analysis, and one example suggests that it is competitive with other approaches in this area. A reason for the lack of use of kernel density estimates by archaeologists may be the lack of accessible software. The analyses described here were undertaken in the MATLAB package using routines developed by the second author, which are available on request.

© 1997 Academic Press Limited

Keywords: KERNEL DENSITY ESTIMATES, BIVARIATE DATA, CONTOURING, SPATIAL CLUSTERING, MATLAB.

Introduction

Kernel density estimates (KDEs) at their simplest can be thought of as an alternative to the histogram. They typically provide a smoother representation of the data and, unlike the histogram, their appearance does not depend on a choice of starting point. In this sense KDEs alleviate problems with the histogram that have been perceived by some archaeologists (Whallon, 1987).

The smoothness of the KDE means that it is aesthetically more pleasing than the histogram.
It also facilitates the presentation of several data sets in a single figure, and makes it easier to compare data sets. This has been argued and illustrated in Baxter & Beardah (1995b).

It might be argued that, with univariate data, the advantages of using a KDE as opposed to a histogram for data representation are not so great as to cause them to be preferred on a routine basis. For bivariate data the case for using KDEs is much stronger, and the purpose of this paper is to illustrate this by example. Two-dimensional histograms require large amounts of data, are unwieldy, may be difficult to interpret, and cannot easily be used as the basis for other methods of data representation such as contouring. This paper will illustrate how KDEs readily overcome these problems.

Although the possibility of using KDEs for archaeological data presentation is implicit in Orton's (1988) comments on Whallon's (1987) paper, we are not aware of any such uses outside our own work. An example of an application to bivariate data is given in Baxter & Beardah (1995a). This arose when one of us (MJB) wished to explore the potential of the methodology for representing results from a principal component analysis of archaeometric compositional data and asked the second author (CCB) if it was possible to do this in the MATLAB package. Subsequent collaboration, described in Beardah & Baxter (1995) and Baxter & Beardah (1995b), has led to the development of a set of MATLAB routines that include many of the approaches described in the recent book by Wand & Jones (1995). That book, the earlier text of Silverman (1986), and the paper by Bowman & Foster (1993) may be referred to for the technical developments that underpin the work described here.

The main ideas of kernel density estimation necessary for this paper are presented in the next section, with more technical detail and discussion of computational matters in the appendix.
The main section of the paper illustrates applications of the methodology, and the concluding section summarizes what we think are its merits.

Kernel density estimation

Histograms are among the most common methods of data presentation in archaeology. Anyone who has drawn a histogram by hand will know that its appearance may be crucially affected both by the point at which the histogram is started (the origin) and by the width of the intervals used, or "bin-width". Good computer software packages will make automatic and sensible choices for the origin and bin-width, but it should be possible to vary these and this will affect the results obtained.

Let the origin of the histogram be $m_0$, with subsequent interval boundaries at $m_1$, $m_2$, etc., and assume that $(m_j - m_{j-1}) = c$ for some constant $c$ for $j = 1, 2, \ldots$ (i.e. intervals are of equal width). Let $\delta$ and $q$ be values such that $\delta$ is small and $q\delta = c$. It is then possible to imagine the construction of successive histograms with origins at $(m_0 + i\delta)$ for $i = 0, 1, \ldots, q - 1$. If the $q$ histograms so obtained are averaged then an average shifted histogram (ASH) (Scott, 1992) is obtained. The appearance of the ASH will not be dependent on the choice of $m_0$. Its smoothness will depend on $c$, and increases as $c$ increases. The limiting form of the ASH, as $\delta \to 0$, is a kernel density estimate. An example is given in Baxter & Beardah (1995b).

Another way to think of KDEs is as follows. Given $n$ points $X_1, X_2, \ldots, X_n$ situated on a line a KDE can be obtained by placing a "bump" at each point and then summing the height of each bump at each point on the X-axis. The shape of the bump is defined by a mathematical function, the kernel $K(x)$, that integrates to 1. The spread of the bump is determined by a window- or band-width, $h$, that is analogous to the bin-width, $c$, of a histogram.
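
As an illustration only, the average shifted histogram construction described above can be sketched in Python with NumPy. The analyses in the paper were carried out with MATLAB routines that are not reproduced here, so the function name `average_shifted_histogram`, its arguments and the sample values are all hypothetical. The sketch builds $q$ histograms of bin-width $c$ with origins $m_0 + i\delta$, where $q\delta = c$, and averages them on the density scale.

```python
import numpy as np


def average_shifted_histogram(data, grid, c, q, m0=0.0):
    """Average q histograms of bin-width c whose origins are m0 + i*delta."""
    data = np.asarray(data, dtype=float)
    grid = np.asarray(grid, dtype=float)
    delta = c / q                      # shift between origins, so q * delta = c
    estimate = np.zeros(grid.shape)
    for i in range(q):
        origin = m0 + i * delta
        # Index of the bin [origin + k*c, origin + (k+1)*c) holding each value.
        data_bins = np.floor((data - origin) / c)
        grid_bins = np.floor((grid - origin) / c)
        # Number of observations that share a bin with each grid point.
        counts = (grid_bins[:, None] == data_bins[None, :]).sum(axis=1)
        # Histogram on the density scale: count / (n * bin-width).
        estimate += counts / (data.size * c)
    return estimate / q


# Illustrative values only (not data from the paper).
sample = np.array([1.2, 1.9, 2.1, 2.4, 3.7, 5.0])
grid = np.arange(-2.0, 9.0, 0.01)
ash = average_shifted_histogram(sample, grid, c=1.0, q=20)
```

Increasing `q` (that is, decreasing $\delta$) with `c` held fixed gives a progressively smoother curve, which is the sense in which the ASH approaches a kernel density estimate.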
The kernel is usually a symmetric probability density function. The shape of the resulting KDE does not depend on a choice of origin and is relatively insensitive to the exact form of $K(x)$, which is taken to be a normal density function in the rest of the paper. The choice of $h$ is more critical and will be considered shortly.

We have presented two simple ways of conceptualising what a KDE is. Mathematically, the latter approach gives the KDE as

$$\hat{f}(x) = \frac{1}{nh} \sum_{i=1}^{n} K\left(\frac{x - X_i}{h}\right),$$

where $\hat{f}(x)$ is an estimate of the density underlying the data.

Large values of $h$ over-smooth, while small values under-smooth the data. A variety of approaches can be used to select $h$, including subjective choice, and it may often be sensible to look at KDEs for several values of $h$.

More objective or data-driven choices of $h$ can be made, and a wide range of methods has been proposed for this. These are described in detail in Wand & Jones (1995) and in summary form in Baxter & Beardah (1995b). An outline of a subset of these methods is given here.

The data can be thought of as a sample of $n$ from an underlying and unknown true density, $f(x)$. It is possible to define a measure of "closeness" between the KDE and the true density, leading to an estimate of $h$ that "maximizes" the closeness. If it is assumed that the true density is normal then it can be shown that an optimal choice of $h$ is

$$h = 1.06\, n^{-1/5}\, \hat{\sigma},$$

where $\hat{\sigma}$ is an estimate (possibly robust) of $\sigma$, the S.D. of the normal distribution. This is the normal scale rule and will typically over-smooth the data if the underlying density is not normal.

The estimate of $h$ depends, in general, on properties of the true density that are unknown, and in particular on a quantity that may be interpreted as the "roughness" of the density. A family of direct plug-in (DPI) estimates can be defined in which an estimate of $h$ can be obtained by "plugging-in" an estimate of roughness into the equation that defines $h$.
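
The "bump" construction with a normal kernel, together with the normal scale rule, can be sketched in a few lines of Python with NumPy. This is a minimal illustration under stated assumptions, not the MATLAB routines used for the paper: the names `normal_scale_bandwidth` and `gaussian_kde_1d` are hypothetical, the ordinary sample S.D. is assumed for the estimate of $\sigma$ (a robust estimate could be substituted), and the sample is simulated.

```python
import numpy as np


def normal_scale_bandwidth(data):
    """Normal scale rule: h = 1.06 * n**(-1/5) * sigma_hat."""
    data = np.asarray(data, dtype=float)
    sigma_hat = data.std(ddof=1)       # assumption: sample S.D., not a robust estimate
    return 1.06 * data.size ** (-1.0 / 5.0) * sigma_hat


def gaussian_kde_1d(data, grid, h):
    """Sum a normal bump of spread h centred at each data point."""
    data = np.asarray(data, dtype=float)
    grid = np.asarray(grid, dtype=float)
    u = (grid[:, None] - data[None, :]) / h          # shape (len(grid), n)
    bumps = np.exp(-0.5 * u ** 2) / np.sqrt(2.0 * np.pi)
    return bumps.sum(axis=1) / (data.size * h)


# Simulated, illustrative sample (not data from the paper).
rng = np.random.default_rng(0)
sample = rng.normal(loc=0.0, scale=2.0, size=200)
h = normal_scale_bandwidth(sample)
grid = np.linspace(-10.0, 10.0, 401)
density = gaussian_kde_1d(sample, grid, h)
```

Evaluating `gaussian_kde_1d` for several values of `h` on the same grid is a simple way to see the over-smoothing and under-smoothing described above.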
More details of the DPI estimates are given in the Appendix.

A related approach is the "solve the equation" (STE) method, in which an equation that relates $h$ to a function of the unknown density is defined. In essence, an initial estimate of $h$ leads to an estimate of the density, that in turn leads to a new value for $h$ and a new density estimate. The process continues until the estimate of $h$ converges. Wand & Jones (1995: 96) suggest that a suitable data analytic strategy is to look at several different estimates of $h$, but that if a single value is required DPI and STE estimates appear to be among the more suitable.

The prime purpose of the paper is to illustrate the use of bivariate KDEs and the generalization to these is relatively straightforward. By analogy with the previous discussion of univariate KDEs we may think in terms of $n$ points in a plane defined by co-ordinates $X_{(i)} = (X_i, Y_i)$, for $i = 1, 2, \ldots, n$. Locating a "bump" at each point corresponds in this case to centering a three-dimensional bump or "hill" at each point and then, at each point in the plane, summing the height of the bumps. The bump, or kernel, is taken in this paper to be a bivariate normal distribution.

For two variables, $X$ and $Y$, a bivariate normal distribution is defined by the means of $X$ and $Y$, taken to be zero; their S.D.; and their correlation, which determines the orientation of the bump. If this correlation is taken to be zero, as we do here, then smoothing will be in the direction of the coordinate axes and the degree of smoothing is determined by the S.D. One will often not lose much by taking the correlation to be zero, whereas smoothing equally in both directions, by using the same window-widths, is not generally to be recommended (Wand & Jones, 1995: 108).

The theory underlying the optimal choice of window-widths is not as well developed for the bivariate as for the univariate case.
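
The bivariate construction described above, a zero-correlation normal "hill" centred at each point with separate window-widths in the two coordinate directions, can be sketched as follows. This is a hedged Python and NumPy illustration rather than the MATLAB code used for the paper; the function name `gaussian_kde_2d`, the grids, the window-widths and the points are hypothetical. With zero correlation the bivariate normal kernel factorizes into a product of two univariate normal kernels, which is what the matrix product below exploits.

```python
import numpy as np


def gaussian_kde_2d(x, y, grid_x, grid_y, h1, h2):
    """Bivariate KDE with a zero-correlation normal kernel.

    Returns an array f with f[j, i] the estimate at (grid_x[i], grid_y[j]).
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    grid_x = np.asarray(grid_x, dtype=float)
    grid_y = np.asarray(grid_y, dtype=float)
    ux = (grid_x[:, None] - x[None, :]) / h1         # shape (len(grid_x), n)
    uy = (grid_y[:, None] - y[None, :]) / h2         # shape (len(grid_y), n)
    kx = np.exp(-0.5 * ux ** 2) / np.sqrt(2.0 * np.pi)
    ky = np.exp(-0.5 * uy ** 2) / np.sqrt(2.0 * np.pi)
    # Sum over data points of the product of the two univariate bumps.
    return (ky @ kx.T) / (x.size * h1 * h2)


# Illustrative points and window-widths only (not data from the paper).
px = np.array([0.0, 1.0, 3.0, 3.5])
py = np.array([0.0, -1.0, 2.0, 2.5])
gx = np.linspace(-5.0, 8.0, 131)
gy = np.linspace(-6.0, 7.0, 131)
surface = gaussian_kde_2d(px, py, gx, gy, h1=0.8, h2=1.2)
```

Passing different values for `h1` and `h2` smooths by different amounts along the two axes, in line with the recommendation not to force equal window-widths.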
The examples in this paper use window-widths for the $X$ and $Y$ directions determined as for the univariate case, using either STE estimates or the normal scale estimates.

With the assumption of zero correlation the representation of the bivariate KDE, $\hat{f}(x, y)$, is given by

$$\hat{f}(x, y) = \frac{1}{n h_1 h_2} \sum_{i=1}^{n} K\left(\frac{x - X_i}{h_1}, \frac{y - Y_i}{h_2}\right),$$

where $h_1$ and $h_2$ are the window-widths in the $X$ and $Y$ directions.

An attraction of using KDEs is that they can be used as a basis for producing contour plots of the data and this leads to graphical representations of data of a kind that archaeologists should find familiar. The following discussion of how contouring can be used is based on the paper by Bowman & Foster (1993).

After a bivariate KDE has been obtained each (two-dimensional) data point is associated with a density height that may be ranked from largest to smallest. The first 50% ranked observations, for example, may be used to define contours that enclose the densest 50% of the data. The level of contouring can be varied to contain any specified proportion of the data, and several contours can be superimposed on a plot, with the original data if this is helpful. Bowman & Foster (1993: 173) note that in some ways this provides a two-dimensional analogy to the one-dimensional boxplot, and also that the approach is useful for looking for modes or clusters in the data.

A further extension, noted in the same paper, occurs when the data points can be classified, by period or context for example. In this case a particular contour level such as 75% might be selected and then contours at this level drawn for each group separately, to reveal how similar or distinct they are. This will also be illustrated in the next section.

Examples

There are many ways in which univariate KDEs might be used in archaeology, and several of these have been illustrated in our previous work. Data presentation for a single data set and comparison between the distributions of different data sets are obvious uses.
It is worth remarking that the boxplot, another good way of looking at and comparing univariate data, does not work well with multi-modal data. Bounded data, in the sense that certain values are impossible, and data affected by outliers can be handled using boundary kernels and adaptive estimates respectively, and this is discussed and illustrated in Beardah & Baxter (1995).

For practical purposes a distinction may be drawn between kernel density estimation as applied to simple, or simply transformed, variables, and as applied to composite variables such as those derived in principal component and other forms of multivariate analysis. This latter greatly extends the potential for the use of KDEs and is illustrated in Examples 1, 3 and 4.

Example 1

Principal component analysis is one of the more commonly used multivariate methods in archaeology and a detailed account and bibliography is given in Baxter (1994). Typically, data are standardized and an analysis results in new, linear combinations of the original variables, called principal components, that can be inspected for structure using plots (usually) based on the first two or three components. If there is structure in the data it will often show in the first component and it can be useful to examine this using a KDE.

The data used for the first example are 105 specimens of Roman waste glass, with a principal component analysis based on their chemical composition with respect to 11 oxides. The data are given, and extensively analysed, in Baxter (1994). The specimens come from two sites and the statistical analyses suggest that there are perhaps three clusters in the data that are related to, but do not exactly coincide with, the site classification.

As a first illustration of kernel density estimation Figure 1 shows two KDEs for the principal component scores, based on the normal scale estimate of $h$ and an STE estimate of $h$.
The normal scale estimate over-smooths the data, as expected, and misses the central and smaller mode suggested by the STE approach.

The usual bivariate component plot can be represented by a KDE in various ways. Figure 2 shows a scatter plot of the scores on the first two components and Figure 3 shows a KDE using the STE estimate of $h$. Three main concentrations are evident. For this example inspection of the scatterplot has led one of us (Baxter, 1994) to the same conclusion, so that a KDE is not essential. In Examples 3 and 4 much larger data sets are used for which the scatterplot is a less useful tool.

Figure 1. Two univariate kernel density estimates for scores on the first principal component of an analysis of the chemical composition of 105 specimens of Romano-British waste glass. Solid line: STE rule; dashed line: normal scale rule. (Horizontal axis: First component; vertical axis: Relative frequency.)

Example 2

An obvious use for bivariate KDEs is in the presentation and interpretation of spatial data in the form of coordinates of find spots, for example. To illustrate this an ethnoarchaeological data set, Binford's (1978) Mask Site data, is used. The data are taken from appendix A of Blankholm (1991), who uses them to test a variety of approaches to intrasite spatial analysis. The data, as presented by Blankholm, consist of the spatial coordinates of five classes of find that might occur in the archaeological record, such as artefacts, large bones and bone splinters. We use the subset based on the coordinates of the locations of 276 bone splinters.

Figures 4 and 5 show analyses in which the normal scale rule and STE estimates have been used to determine window widths separately for the two coordinate directions. Both analyses show 25, 50, 75 and 100% contours superimposed on the distribution of the bone splinters. Once again the normal scale analysis produces a smoother picture.
There are clearly three main concentrations in the data with the STE analysis suggesting a subdivision of one of these, in the bottom right of the graph, into two groups and a fifth group in the upper left of the figure.

It is instructive to compare our results with those obtained by a variety of methods in Blankholm (1991). His figure 9, using contouring at equal heights (rather than encompassing specified proportions of the data), is less revelatory of structure than our figures, while a k-means cluster analysis (his figure 17) suggests a three cluster distribution. Contour maps or clustering arising from local density analysis (his figure 32) and nearest neighbour analysis (his figure 39) are also given. We think that our figures, and particularly that for the STE analysis, suggest structure as well as, or more clearly than, the analyses in Blankholm (1991).

Figure 2. Principal component plot for the first two components from an analysis of the chemical composition of 105 specimens of Romano-British waste glass. (Axes: Component 1, Component 2.)

Figure 3. A KDE estimate, based on an STE rule for the selection of $h$, for the data. (Axes: Component 1, Component 2, Relative frequency.)

Figure 4. A KDE of the Mask Site data using the normal scale rule. The contours are for 25, 50, 75 and 100% inclusion levels. (Panel title: Normal scale rule; axes labelled Component 1 and Component 2.)

Figure 5. As for Figure 4 but using an STE estimate. (Panel title: STE rule; axes labelled Component 1 and Component 2.)
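
The inclusion-level contouring used in Figures 4 and 5, following Bowman & Foster (1993), rests on ranking the density heights at the data points. The Python and NumPy sketch below shows one way this ranking might be turned into contour heights; it is an illustration under assumptions and not the authors' MATLAB implementation. The names `density_at_points` and `inclusion_levels` are hypothetical, the points are simulated, and the window-widths are arbitrary.

```python
import numpy as np


def density_at_points(x, y, h1, h2):
    """Height of a zero-correlation normal-kernel KDE at each data point."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    ux = (x[:, None] - x[None, :]) / h1
    uy = (y[:, None] - y[None, :]) / h2
    bumps = np.exp(-0.5 * (ux ** 2 + uy ** 2)) / (2.0 * np.pi)
    return bumps.sum(axis=1) / (x.size * h1 * h2)


def inclusion_levels(heights, proportions):
    """Contour height enclosing the densest fraction p of the points, per p."""
    ranked = np.sort(np.asarray(heights, dtype=float))[::-1]   # largest first
    n = ranked.size
    levels = []
    for p in proportions:
        # Number of top-ranked points the contour must contain (at least one).
        k = min(n, max(1, int(np.ceil(p * n - 1e-9))))
        levels.append(float(ranked[k - 1]))
    return levels


# Simulated find-spot coordinates (illustrative only, not the Mask Site data).
rng = np.random.default_rng(1)
east = rng.normal(loc=5.0, scale=1.5, size=60)
north = rng.normal(loc=8.0, scale=1.0, size=60)
heights = density_at_points(east, north, h1=0.6, h2=0.4)
levels = inclusion_levels(heights, [0.25, 0.50, 0.75, 1.00])
```

The resulting heights could then be handed, sorted into increasing order, to a contouring routine evaluated on a KDE grid (for example the `levels` argument of matplotlib's `contour`). For classified data, the same two functions would be applied to each group separately with a single proportion such as 0.75.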