Chapter 11 Language of Descriptive Statistics
Section 11.3 Statistical Measures
11.3.2 Robust Measures
The measures presented in this section are robust with respect to outliers: large deviations of single data values do not affect these measures, or affect them only slightly.
Consider an original list $(x_1, x_2, \ldots, x_n)$ for a sample of size $n$. Let the data $x_1, \ldots, x_n$ be the property values of a quantitative property $X$.
Info 11.3.7
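The list $(x_{(1)}, x_{(2)}, \ldots, x_{(n)})$ with $x_{(1)} \le x_{(2)} \le \cdots \le x_{(n)}$, obtained by sorting the original list $(x_1, \ldots, x_n)$ in ascending order, is called an ordered list or ordered sample (of the original list). The $i$-th entry $x_{(i)}$ of the ordered list is the $i$-th smallest value in the original list.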
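In code, obtaining the ordered sample amounts to a single ascending sort. The following Python lines are only an illustrative sketch. The helper names `ordered_sample` and `order_statistic` and the data values are hypothetical and were chosen here for demonstration.

```python
def ordered_sample(values):
    """Return the ordered sample x_(1) <= x_(2) <= ... <= x_(n) as a new list."""
    # sorted() returns a new ascending list and leaves the original list untouched
    return sorted(values)


def order_statistic(values, i):
    """Return x_(i), the i-th smallest value (1-based index, as in the text)."""
    if not 1 <= i <= len(values):
        raise ValueError("i must satisfy 1 <= i <= n")
    return ordered_sample(values)[i - 1]


data = [7, 3, 9, 3, 5]            # hypothetical original list, n = 5
print(ordered_sample(data))       # [3, 3, 5, 7, 9]
print(order_statistic(data, 2))   # 3
```

Equal values such as the two 3s both appear in the ordered sample. Sorting does not remove duplicates.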
Example 11.3.8
Let us again consider the original list for the sample of size $n$ from the examples above. Sorting in ascending order results in the following ordered sample:
In contrast to the arithmetic mean, the (empirical) median is not sensitive to outliers. For example, provided $n \ge 3$, the largest value $x_{(n)}$ in the ordered list can be enlarged arbitrarily without changing the median.
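The robustness of the median can be checked numerically. The sketch below applies Python's standard `statistics` module to a small hypothetical sample of even size, not the data set of this section. It then replaces the largest value by a much larger one and compares how the median and the arithmetic mean react.

```python
from statistics import mean, median

# Hypothetical sample of even size n = 6 (not the data from the text)
data = [5, 2, 9, 4, 7, 4]

# statistics.median sorts internally; for even n it averages the two middle values
print(median(data))        # 4.5
print(mean(data))

# Enlarge the largest value arbitrarily: 9 -> 900
distorted = [900 if x == 9 else x for x in data]
print(median(distorted))   # still 4.5
print(mean(distorted))     # pulled far upwards by the single outlier
```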
Example 11.3.10
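In the example above, the sample size $n$ is even. Thus, the median is the average of the two middle values of the ordered sample: $\tilde{x} = \tfrac{1}{2}\left(x_{(n/2)} + x_{(n/2+1)}\right)$.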
Approximately half of the values in the original list are less than or equal to the median $\tilde{x}$, and approximately half are greater than or equal to $\tilde{x}$. This principle can be generalised to define quantiles. For this purpose, take an original list $(x_1, \ldots, x_n)$ for a sample of size $n$ of a quantitative property $X$.
Info 11.3.11
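Let $x_{(1)} \le x_{(2)} \le \cdots \le x_{(n)}$ be the corresponding ordered sample and $0 < p < 1$. Then

$$\tilde{x}_p = \begin{cases} x_{(\lfloor np \rfloor + 1)} & \text{if } np \text{ is not an integer,} \\ \tfrac{1}{2}\left(x_{(np)} + x_{(np+1)}\right) & \text{if } np \text{ is an integer,} \end{cases}$$

is called a sample $p$-quantile or simply $p$-quantile of the sample.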
The $0.25$-quantile $\tilde{x}_{0.25}$ is also called the lower quartile. It separates approximately the lowest 25 % of the data values from the highest 75 %. Accordingly, the $0.75$-quantile $\tilde{x}_{0.75}$ is called the upper quartile. For $p = 0.5$ we obtain the median, i.e. $\tilde{x}_{0.5} = \tilde{x}$. In general, for $0 < p < 1$ the ordered list is split so that approximately $100p\,\%$ of the data values are less than or equal to $\tilde{x}_p$ and approximately $100(1-p)\,\%$ of the data values are greater than or equal to $\tilde{x}_p$.
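The case distinction in the quantile definition translates directly into code. The following Python sketch assumes the definition $\tilde{x}_p = x_{(\lfloor np \rfloor + 1)}$ if $np$ is not an integer, and $\tilde{x}_p = \tfrac{1}{2}(x_{(np)} + x_{(np+1)})$ if $np$ is an integer. The function name `sample_quantile` and the data are hypothetical. Other textbooks and statistics packages use slightly different quantile conventions, so their results can differ from this sketch.

```python
import math
from fractions import Fraction


def sample_quantile(values, p):
    """Sample p-quantile via the case distinction on n*p (assumed definition)."""
    # Exact rational arithmetic, so that the test "is n*p an integer?"
    # is not spoiled by floating-point rounding (e.g. 3 * 0.7).
    p = Fraction(str(p))
    if not 0 < p < 1:
        raise ValueError("p must satisfy 0 < p < 1")
    xs = sorted(values)            # ordered sample x_(1) <= ... <= x_(n)
    n = len(xs)
    np_ = n * p
    if np_.denominator == 1:       # n*p is an integer k with 1 <= k <= n-1
        k = int(np_)
        # average of x_(k) and x_(k+1); shift 1-based to 0-based indices
        return (xs[k - 1] + xs[k]) / 2
    # n*p is not an integer: take x_(floor(n*p) + 1), i.e. index floor(n*p)
    return xs[math.floor(np_)]


data = [5, 2, 9, 4, 7, 4, 1, 8]      # hypothetical sample, n = 8
print(sample_quantile(data, 0.25))  # lower quartile: 3.0
print(sample_quantile(data, 0.5))   # median: 4.5
print(sample_quantile(data, 0.75))  # upper quartile: 7.5
```

For $p = 0.5$ the two cases reproduce the usual median rule: an odd $n$ picks the middle value, and an even $n$ averages the two middle values.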
Example 11.3.12
Consider again the original list for the sample of size $n$ from the examples above, together with the ordered sample
For $p = 0.25$, the $p$-quantile is determined by the value of $np$ as in the definition, i.e. for the lower quartile we have
For the upper quartile, we set $p = 0.75$ and determine $np$ in the same way, hence
Again, let a sample of size $n$ of a quantitative property $X$ be given, with the corresponding ordered sample $x_{(1)} \le x_{(2)} \le \cdots \le x_{(n)}$, and let $0 \le \alpha < 0.5$.
Info 11.3.13
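The $\alpha$-trimmed (or $\alpha$-truncated) sample mean is defined as

$$\bar{x}_\alpha = \frac{1}{n - 2k} \sum_{i=k+1}^{n-k} x_{(i)} = \frac{1}{n - 2k}\left(x_{(k+1)} + \cdots + x_{(n-k)}\right), \qquad k = \lfloor n\alpha \rfloor.$$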
The $\alpha$-trimmed mean is the arithmetic mean of the data that remain after the $\lfloor n\alpha \rfloor$ smallest and the $\lfloor n\alpha \rfloor$ largest values have been discarded. It therefore offers flexible protection against outliers at the edges of the data range. Keep in mind, however, that this measure no longer takes all data into account.
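The following is a minimal Python sketch of the trimmed mean. It assumes that $k = \lfloor n\alpha \rfloor$ values are cut off at each end. The function name `trimmed_mean` and the sample with one large outlier are hypothetical and do not come from the data set of this section.

```python
import math
from fractions import Fraction


def trimmed_mean(values, alpha):
    """alpha-trimmed mean: drop the k smallest and the k largest values,
    with k = floor(n * alpha), and average the rest (assumed convention)."""
    alpha = Fraction(str(alpha))   # exact arithmetic avoids rounding in n * alpha
    if not 0 <= alpha < Fraction(1, 2):
        raise ValueError("alpha must satisfy 0 <= alpha < 0.5")
    if not values:
        raise ValueError("the sample must not be empty")
    xs = sorted(values)            # ordered sample x_(1) <= ... <= x_(n)
    n = len(xs)
    k = math.floor(n * alpha)      # number of values removed at EACH end
    kept = xs[k:n - k]             # x_(k+1), ..., x_(n-k); never empty since 2k < n
    return sum(kept) / len(kept)


# Hypothetical sample with one large outlier (42), n = 10
data = [6, 4, 42, 5, 3, 8, 5, 7, 4, 6]
print(sum(data) / len(data))     # arithmetic mean: 9.0
print(trimmed_mean(data, 0.1))   # k = 1, drops 3 and 42: 5.625
print(trimmed_mean(data, 0))     # alpha = 0 keeps all values: 9.0
```

With $\alpha = 0$ nothing is discarded and the trimmed mean coincides with the arithmetic mean. With $\alpha = 0.1$ the single outlier no longer pulls the result upwards.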
Example 11.3.14
In the data set considered throughout this section, the ordered sample is given by
and for the given values of $n$ and $\alpha$ we obtain the $\alpha$-trimmed mean of the sample
It is smaller than the arithmetic mean because outliers at the upper end of the data range were discarded.