Critical values for 22 discordancy test variants for outliers in normal samples up to sizes 100, and applications in science and engineering

Surendra P. Verma* and Alfredo Quiroz-Ruiz
Centro de Investigación en Energía, Universidad Nacional Autónoma de México, Priv. Xochicalco s/no., Col Centro, Apartado Postal 34, 62580 Temixco, Morelos, Mexico
* spv@cie.unam.mx
Revista Mexicana de Ciencias Geológicas, v. 23, núm. 3, 2006, p. 302-319

ABSTRACT

This paper reports modifications of the simulation procedure, together with new, precise, and accurate critical values or percentage points (for the majority of data with four decimal places; respective average standard error of the mean ~0.0001–0.0025) of nine discordancy tests with 22 test variants, each at seven significance levels α = 0.30, 0.20, 0.10, 0.05, 0.02, 0.01, and 0.005, for normal samples of sizes n up to 100. Prior to our work, only less precise critical values were available for most of these tests, viz., with one (for n < 20) and three decimal places (for greater n) for test N14; two decimal places for tests N2, N3–k=2,3,4, N6, and N15; and three decimal places for N1, N4–k=3,4, N5, and N8; but all of them with unknown errors. In fact, the critical values were available for n only up to 20 for test N2, up to 30 for test N8, and up to 50 for N4–k=1,3,4, whereas for most other tests, in spite of the availability for n up to 100 (or more), interpolations were required because tabulated values were not reported for all n in the range 3–100. Therefore, the applicability of these discordancy tests is now extended up to 100 observations of a particular parameter in a statistical sample, without any need for interpolation. The new, more precise and accurate critical values will result in a more reliable application of these discordancy tests than has so far been possible. Thus, we envision that these new critical values will result in wider applications of these tests in a variety of scientific and engineering fields such as agriculture, astronomy, biology, biomedicine, biotechnology, chemistry, electronics, environmental and pollution research, food science and technology, geochemistry, geochronology, isotope geology, meteorology, nuclear science, paleontology, petroleum research, quality assurance and assessment programs, soil science, structural geology, water research, and zoology. The multiple-test method with new critical values proposed in this work was shown to perform better than the box-and-whisker plot method used by some researchers. Finally, the so-called “two standard deviation” method frequently used for processing inter-laboratory databases was shown to be statistically erroneous, and should therefore be abandoned. Instead, the multiple-test method with 15 tests and 33 test variants, all of which are now readily applicable to sample sizes up to 100, should be used. To process inter-laboratory databases, our present multiple-test approach is also shown to perform better than the “two standard deviation” method.

Keywords: outlier methods, normal sample, two standard deviation method, 2s, reference materials, Monte Carlo simulations, critical value tables, Dixon Q-test, skewness, kurtosis, petroleum hydrocarbon.

RESUMEN

This work reports modifications of the simulation procedure, as well as new, more precise and accurate critical values or percentage points (for most of the data with four decimal places; standard error of the mean ~0.0001–0.0025) for nine discordancy tests with 22 variants, each at seven significance levels α = 0.30, 0.20, 0.10, 0.05, 0.02, 0.01, and 0.005, for normal samples of size n up to 100. Before our work, only less precise critical values were available for most of these tests, viz., with one (for n < 20) and three decimal places (for larger n) for test N14, two decimal places for tests N2, N3–k=2,3,4, N6, and N15, and three decimal places for N1, N4–k=3,4, N5, and N8, but all of them with unknown errors. In fact, critical values were available only for n up to 20 for test N2, up to 30 for test N8, and up to 50 for N4–k=1,3,4, whereas for many other tests, despite the availability for n up to 100 (or more), interpolations were required because tabulated values were not reported for all n in the interval 3 to 100. Consequently, the applicability of the discordancy tests is extended to 100 observations of a given parameter in a statistical sample, without the need to interpolate critical values. The new, more precise and accurate critical values will result in a more reliable application of the discordancy tests than has been possible until now. We therefore consider that these new critical values will lead to wider applications of these tests in a variety of fields of science and engineering, such as agriculture, astronomy, biology, biomedicine, biotechnology, soil science, nuclear science, food science and technology, environmental pollution, electronics, geochronology, structural geology, isotope geology, geochemistry, water research, petroleum research, meteorology, paleontology, quality assurance programs, chemistry, and zoology. The multiple-test method with new critical values proposed here gives better results than the box-and-whisker plot method used by some researchers. Finally, the so-called “two standard deviation” method, frequently used to process inter-laboratory databases, was shown to be erroneous and should therefore be abandoned. The multiple-test method with 15 tests and 33 variants, all of them now readily applicable to sample sizes up to 100, should be used instead. Our multiple-test procedure appears to perform better than the “two standard deviation” method for processing geochemical data from many laboratories.

Keywords: outlier methods, normal sample, two standard deviation test, 2s, reference materials, Monte Carlo simulations, critical value tables, Dixon Q-test, skewness, kurtosis, petroleum hydrocarbons.

INTRODUCTION

In a recent paper (Verma and Quiroz-Ruiz, 2006), we covered the following points: (1) explained the need for new critical values or percentage points of statistical tests for normal univariate samples; (2) developed and reported a highly precise and accurate Monte Carlo type simulation procedure for N(0,1) random normal variates; (3) presented new, precise, and accurate critical values for seven significance levels α = 0.30, 0.20, 0.10, 0.05, 0.02, 0.01, and 0.005, and for sample sizes n up to 100 for six Dixon discordancy tests (with 11 test variants) for normal univariate samples; and (4) highlighted the use of these new values in very diverse fields of science and engineering, including the Earth Sciences. We had included all six frequently used discordancy tests (N7 and N9-N13; see pp. 218-236 of Barnett and Lewis, 1994), initially proposed by Dixon (1950, 1951, 1953), for simulating new, precise, and accurate critical values for n up to 100 (number of data in a given statistical sample, n = 3(1)100 for test N7, i.e., for all values of n between 3 and 100; n = 4(1)100 for tests N9 and N11; n = 5(1)100 for tests N10 and N12; and n = 6(1)100 for test N13).

It is pertinent to mention that researchers (e.g., Dybczyński et al., 1979; Dybczyński, 1980; Barnett and Lewis, 1994; Verma, 1997, 1998, 2005; Verma et al., 1998; Velasco et al., 2000; Guevara et al., 2001; Velasco-Tapia et al., 2001) have recommended that most, if not all, of the available discordancy tests should be applied to a given data set to detect discordant outliers. Although the critical values for the six Dixon tests have now been significantly improved and extended (Verma and Quiroz-Ruiz, 2006), the existing critical values still need to be improved, and new ones simulated, for the remaining tests for normal samples listed by Barnett and Lewis (1994). It is also important to note that we are dealing with tests for normal univariate samples, i.e., the statistical sample is assumed to be drawn from a normal population. Obviously, different sets of critical values would be required for other types of distributions such as exponential or Poisson distributions (see Barnett and Lewis, 1994, or Zhang, 1998, for more details).

In the present work, for simulating new, precise, and accurate critical values for the same seven significance levels (α = 0.30 to 0.005) and for n up to 100, we have included most of the remaining tests for normal univariate samples (nine tests with 22 test variants): N1 (upper or lower version), N2 (two-sided), N3–k=2,3,4 (upper or lower), N4–k=1,2,3,4 (upper or lower), N5–k=2 (upper-lower pair), N6–k=2 (upper-lower pair), N8 (two-sided; also known as Dixon Q-test), N14 (skewness or third moment test), and N15 (kurtosis or fourth moment test); see pp. 218-236 of Barnett and Lewis (1994), or pp. 89-97 of Verma (2005). The new critical values are compared with the literature values and are shown to be more precise and accurate, thus enabling statistically more reliable applications in many science and engineering fields. We also present a few examples for the application of all normal univariate tests for which we have reported new critical values in this as well as in our earlier paper (Verma and Quiroz-Ruiz, 2006).
DISCORDANCY TESTS

We will not repeat the explanation of discordancy tests; the reader is referred to Barnett and Lewis (1994), Verma (2005), or our earlier paper (Verma and Quiroz-Ruiz, 2006). The nine tests with their 22 variants for which critical values were simulated are listed in Table 1. We note that critical values were available in the literature only for n up to 20 for test N2, up to 30 for N8, and up to 50 for N4–k=1,3,4, whereas for most other tests, in spite of the availability for n up to 100 (or more), interpolations were occasionally required because tabulated values were not reported for all n (see Barnett and Lewis, 1994, or tables A4 to A18 in Verma, 2005).

Tests N1 and N2

We will briefly describe discordancy tests N1 and N2, which are, respectively, the upper (or lower) and extreme outlier tests in a normal sample with both population mean (μ) and population variance (σ²) unknown (Barnett and Lewis, 1994), because their statistically “erroneous” version has been used as a popular, so-called “two standard deviation” method by numerous workers (e.g., Stoch and Steele, 1978; Ando et al., 1987, 1989; Gladney and Roelandts, 1988, 1990; Gladney et al., 1991, 1992; Itoh et al., 1993; Imai et al., 1995, 1996) to process inter-laboratory data for international geochemical reference materials.

The test statistic of N1 for an upper or lower outlier is, respectively:

$$TN1_{(u)} = \frac{x_{(n)} - \bar{x}}{s} \qquad (1)$$

or

$$TN1_{(l)} = \frac{\bar{x} - x_{(1)}}{s} \qquad (2)$$

whereas that of N2 is

$$TN2 = \mathrm{Max}\left[\frac{x_{(n)} - \bar{x}}{s},\ \frac{\bar{x} - x_{(1)}}{s}\right] \qquad (3)$$

Here, for an ordered array $x_{(1)}, x_{(2)}, x_{(3)}, \ldots, x_{(n-2)}, x_{(n-1)}, x_{(n)}$ of n observations, $x_{(1)}$ is the lowest observation and $x_{(n)}$ the highest one; $\bar{x}$ is the sample mean; and s is the sample standard deviation.

Two standard deviation method: a statistically erroneous and outdated version of tests N1 and N2

For the standard normal distribution, the total probability of observations at a distance greater than 1.96σ (i.e., about 2σ) from the mean μ, i.e., outside the range (μ ± 1.96σ), is 0.05; in other words, about 95% of the area of the density curve is contained within this range (e.g., Otto, 1999). These considerations of the normal density curve have been used by some workers to fix the limit of 2σ, or more appropriately 2s (because the population parameters μ and σ are not known for most experimental data), for testing samples of finite size for discordant outliers, i.e., observations falling within ($\bar{x}$ ± 2s) are retained and those outside this range are rejected, irrespective of the actual value of n (where n, $\bar{x}$, and s are, respectively, the number of observations in the sample, the location parameter (sample mean), and the scale parameter (sample standard deviation)). This kind of outlier test belongs to a group of old, outdated test procedures (pre-1925!) characterized by two general defects (for more details see pages 30-31, 108-116, and 222-223 in Barnett and Lewis, 1994): they fail to distinguish between population variance (σ²) and sample variance (s²), and, more importantly, they are erroneously based on the distributional behavior of a random sample value rather than on an appropriate sample extreme value. Barnett and Lewis (1994, p. 31) go on to state that an even more serious shortcoming of such an outdated procedure is “the failure to recognize that it is an extreme $x_{(1)}$ or $x_{(n)}$ which (by the very nature of outlier study) should figure in the test statistic, rather than an arbitrary sample value $x_j$”. This shortcoming is certainly overcome by tests N1 or N2 above (see equations 1 to 3), in which an extreme value ($x_{(1)}$ or $x_{(n)}$) is tested by the respective test statistic. Thus, an ad hoc procedure of a “fixed multiple of standard deviation” leads to rejection of any observation $x_j$ for which ($|x_j - \bar{x}|/s$) is sufficiently large, in fact, >2 for the 2s method or >3 for the 3s method, but with no regard to the effect of sample size n on the distribution form of the statistic. We will further comment on the shortcomings of this procedure after we have presented the new critical values for tests N1 and N2.

Masking and swamping effects and different types of test statistics

We will briefly point out the reasons for our recommendation to apply all outlier tests to a given data set instead of only a few of them (see Verma, 1997, 1998; Verma et al., 1998 for more details).

One problem in testing for a single outlier in a normally distributed sample is the sensitivity to the phenomenon of masking (Bendre and Kale, 1987; Barnett and Lewis, 1994). A discordancy test of the most extreme observation (e.g., $x_n$) is rendered insensitive by the proximity of the next most extreme observation ($x_{n-1}$), in which case the presence of the latter would have masked the first. Dixon test N7 is especially susceptible to such masking effects (see Verma and Quiroz-Ruiz, 2006, for the test statistic TN7), although the N1 statistic (Grubbs, 1950) is probably not much better. One solution to this problem is to use test statistics that are less sensitive to masking. There are a number of Dixon-like statistics for this purpose (N11-N13; Barnett and Lewis, 1994; Verma and Quiroz-Ruiz, 2006). In the case of test N12, for example, the numerator is the difference
between the outlier ($x_n$ or $x_1$) being tested and its second-nearest neighbor ($x_{n-2}$ or $x_3$). The measurement value $x_{n-1}$ or $x_2$ therefore exerts no masking effect. Outlier masking occurs because there are actually two (or more) outliers, and some statistics, such as N1, work best when testing data sets with a single outlier (e.g., Prescott, 1978). If a data set contains more than one outlier, it is necessary to modify the statistical approach, considering the following outlier assemblages: (1) two or more upper outliers, (2) two or more lower outliers, and (3) a combination of one (or more) upper outlier(s) and one or more lower outlier(s). Caution is, however, required if one is dealing with chemical data obtained from different analytical techniques, in which case other statistical tests, such as F-test, Student t, or ANOVA, should, in fact, be applied prior to the application of discordancy tests (see, e.g., Verma, 1998, 2005). This topic will be dealt with in more detail in a separate paper.

In general, two testing approaches have been applied for these cases (Barnett and Lewis, 1994): (1) the consecutive testing approach, where a test statistic such as N1 or N7 is applied repeatedly to a data set (one outlier at a time); or (2) the block testing approach, where a statistic simultaneously tests k (= 2, 3, 4, or more) observations in the data set. In consecutive testing, the most extreme outlier is evaluated; if it gives a positive test result, i.e., if this outlier is declared to be discordant, it is removed from the data set, and then the most extreme remaining outlier is tested. This procedure is repeated until all the outliers are tested, or until an outlier gives a negative test result, i.e., it is not discordant. However, the disadvantage is the susceptibility of this procedure to masking effects. Tests N1 and N7 are certainly poor candidates for this type of testing procedure, although tests N14 and N15 (high-order moment tests) should be applied consecutively when more than one outlier is present in a statistical sample (see Barnett and Lewis, 1994). Test N14 was recommended for one-sided detection, although Iglewicz and Martinez (1982) reported that this test is greatly affected by the masking effect, and thus should not be used when more than one outlier is suspected. Both tests N14 and N15 are used for testing an extreme value (two-sided tests). The poor efficiency of those consecutive tests that are based on the standard deviation (s), such as N1 or N2, can be attributed to the fact that the value of s is greatly influenced by the presence of discordant outliers. As a consequence, the presence of several outliers may cause a sufficiently large increase in the standard deviation, with the result that no outliers are detected (Barnett and Lewis, 1994).

An alternative to consecutive testing is block testing, where a statistic is used to test all k outliers at once (e.g., test N3 or N4; Table 1). Also, some differences in performance have been reported for block-testing statistics (Prescott, 1978; Hayes and Kinsella, 2003). For example, McMillan (1971) suggested that test N4–k=2 is more robust than test N3–k=2. It is important to point out that, assuming that all outliers have been identified, statistics intended for block tests are not susceptible to outlier masking. However, the application of this procedure could generate another problem, known as outlier swamping (Barnett and Lewis, 1994). A block test could be insensitive if the second observation $x_{n-1}$ is close to the next neighbor ($x_{n-2}$) and is not outlying or discordant. The pair $x_n$, $x_{n-1}$ might not reach discordancy level when tested jointly, even though $x_n$ on its own is discordant in relation to the other n−1 observations (i.e., swamping effect of $x_{n-1}$). In block testing, all k outliers are labeled as contaminants. What this means in practice is that contaminants belong to a different probability distribution than the rest of the data, or they are accepted as “normal” deviated measurements drawn from a different normal distribution. There is no middle ground, as there is in consecutive testing, where it is possible to establish when some outliers are contaminants and when some are not. Thus, there is the possibility that a marginal outlier might be falsely declared a contaminant because it is “carried along” in the block testing procedure by other, more extreme outliers. Or perhaps a few marginal outliers cause the block testing procedure to fail, which means that the contaminants that are in the block will not be identified. However, some procedures have been proposed to establish k (i.e., the number of contaminants in the sample), free from the masking and swamping effects, when testing upper or lower outliers (Zhang and Wang, 1998). The block testing procedure can also be applied consecutively if more than k outliers are suspected in a given data set.

In summary, more work is needed to better understand the relative merits and usefulness of different test procedures. As pointed out in our earlier paper (Verma and Quiroz-Ruiz, 2006), these evaluations can also be performed empirically or through computer simulations, which we plan to carry out in a future study. From the practical point of view, however, we can evaluate an experimental data set using both types of outlier tests, single-outlier (consecutive procedure) as well as multiple-outlier (block procedure) tests, termed here the “multiple-test” method.
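
The consecutive procedure described in this section can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: it uses the two-sided statistic TN2 of equation (3), it relies on NumPy, and the function names as well as the `critical_value` argument (a callable returning the critical value for the current sample size at the chosen significance level, to be read from a published table) are hypothetical.

```python
import numpy as np


def tn1_upper(x):
    """Equation (1): (largest observation - sample mean) / s."""
    x = np.asarray(x, dtype=float)
    return (x.max() - x.mean()) / x.std(ddof=1)


def tn1_lower(x):
    """Equation (2): (sample mean - smallest observation) / s."""
    x = np.asarray(x, dtype=float)
    return (x.mean() - x.min()) / x.std(ddof=1)


def tn2(x):
    """Equation (3): the larger of the upper and lower N1 statistics."""
    return max(tn1_upper(x), tn1_lower(x))


def consecutive_n2(x, critical_value, min_n=3):
    """Apply test N2 one outlier at a time.

    critical_value is a hypothetical callable n -> critical value of TN2
    for sample size n at the chosen significance level.
    Returns (retained observations, rejected observations).
    """
    data = sorted(float(v) for v in x)
    rejected = []
    while len(data) >= min_n:
        if data[0] == data[-1]:
            break  # zero spread: the statistic is undefined
        if tn2(data) <= critical_value(len(data)):
            break  # most extreme value is not discordant, so stop
        mean = sum(data) / len(data)
        # Remove whichever extreme lies farther from the mean.
        drop_low = (mean - data[0]) > (data[-1] - mean)
        rejected.append(data.pop(0 if drop_low else -1))
    return data, rejected
```

Because s is recomputed after every removal, the sketch also makes the masking problem easy to reproduce: with two similarly extreme values in the sample, the inflated s can keep TN2 below the critical value, and the loop stops without rejecting either of them.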
SIMULATION PROCEDURE FOR MORE PRECISE AND ACCURATE CRITICAL VALUES
Our highly precise and accurate Monte Carlo type simulation procedure has already been described in detail (Verma and Quiroz-Ruiz, 2006) and, therefore, will not be repeated here. However, some required changes will be mentioned. As in our earlier work, our simulations were of size 100,000 and were repeated 10 times (each using a different set of 10,000,000 random normal variates) for obtaining the final mean critical value or percentage point and its standard error. However, for test N1, two independent test statistics (one for upper and the other for lower outlier) were simulated and thus 20 independent results could be obtained from
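
A Monte Carlo estimate of this general kind can be illustrated with a short Python sketch. It is written under clearly stated assumptions and is not the procedure of Verma and Quiroz-Ruiz (2006): the N(0,1) variates come from NumPy's default generator, the critical value at significance level α is taken as the empirical (1 − α) quantile of the simulated statistic, and the function name and parameters are hypothetical. The defaults mirror the simulation size (100,000) and the number of repetitions (10) quoted above.

```python
import numpy as np


def simulate_critical_values(n, alphas, n_stat=100_000, n_rep=10, seed=0):
    """Estimate critical values of TN1 (upper) and TN2 for sample size n.

    Each repetition draws n_stat samples of size n from N(0, 1), computes
    both statistics, and takes the (1 - alpha) empirical quantiles.
    Returns {name: (mean over repetitions, standard error of that mean)},
    with arrays aligned with `alphas`.
    """
    rng = np.random.default_rng(seed)
    probs = 1.0 - np.asarray(alphas, dtype=float)
    estimates = {"N1_upper": [], "N2": []}
    for _ in range(n_rep):
        x = rng.standard_normal((n_stat, n))
        mean = x.mean(axis=1)
        s = x.std(axis=1, ddof=1)          # sample standard deviation
        tn1_u = (x.max(axis=1) - mean) / s  # equation (1)
        tn1_l = (mean - x.min(axis=1)) / s  # equation (2)
        tn2 = np.maximum(tn1_u, tn1_l)      # equation (3)
        estimates["N1_upper"].append(np.quantile(tn1_u, probs))
        estimates["N2"].append(np.quantile(tn2, probs))
    result = {}
    for name, reps in estimates.items():
        reps = np.vstack(reps)
        std_err = reps.std(axis=0, ddof=1) / np.sqrt(n_rep)
        result[name] = (reps.mean(axis=0), std_err)
    return result
```

Running the function for several values of n shows the estimated critical values increasing with sample size, which is why a fixed cutoff such as 2s cannot correspond to one significance level for all n.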
