Abstract

A trimming mean eliminates extreme observations by removing observations from each end of the ordered sample. In this paper, we adopt Hogg's and Brys's tail weight measures. In addition, we propose a new algorithm that yields linear estimators based on the quartiles: the quartiles are used to divide the data into three and into four groups, which gives two new estimators. These classes of linear estimators were examined by simulation over a variety of asymmetric distributions. Samples of sizes 50, 100, 150, and 200 were generated using R. Only the results for sample size 50 are tabulated, since the other sizes gave similar results. The results are tabulated for 7 asymmetric distributions with total trimmed proportions of 0.10 and 0.20 on both sides, respectively. The estimators are ordered by their relative efficiency.

1. Introduction

The trimming mean is a statistical measure of central tendency, much like the mean and the median. It is the mean computed after discarding given parts of a probability distribution or a sample at the beginning and the end of the ordered data, typically an equal amount at both ends [1]. For most statistical applications, 5 to 25 percent of the ends are discarded. The trimming mean is a useful estimator because it is less sensitive to outliers than the mean but still gives a reasonable estimate of central tendency for many statistical models. In this sense it is referred to as a robust estimator. Under normality, the best possible amount of trimming is zero. Under even very small departures from normality, however, the mean is no longer optimal and can perform rather poorly [2]. The best known study of robust estimates of location is the Princeton Robust Study [3], which recommended the α-trimmed mean with the trimming proportion chosen to minimize the estimated variance. Numerous papers on robust estimators of location have appeared since the Princeton Robust Study, and their conclusions are varied. Most of these studies centered on symmetric and contaminated symmetric distributions. In addition, these estimators classically use one or more additional statistics, such as ancillary or selector statistics, to adjust the value of the estimator so that it adapts to the sample distribution [4].

In practical applications, however, there is no guarantee that the observed sources are symmetric. In particular, when the sample is small, the influence of outliers increases significantly. Therefore, proper asymmetric truncations need to be made. In this paper, asymmetric trimming algorithms are suggested for estimating the asymmetric trimming mean. They use a classification scheme based, in the first step, on tail length and skewness and, in the second step, on a ratio involving the left and right truncations derived from the quartile values. Additionally, the total truncations are assumed to be 0.10 and 0.20 of the whole data.
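As a point of reference for the asymmetric variants studied later, the ordinary (symmetric) trimming mean described above can be sketched in a few lines. The paper's simulations were run in R; Python is used here purely for illustration. The function name `trimmed_mean` and the toy sample are hypothetical, and the sketch assumes that `proportion` is the fraction removed from each end and that the number of discarded observations is rounded down.

```python
import math


def trimmed_mean(data, proportion):
    """Mean after dropping floor(n * proportion) values from each end.

    Assumption: `proportion` is the fraction removed per tail, so the
    total amount of trimming is twice this value.
    """
    if not 0 <= proportion < 0.5:
        raise ValueError("proportion must lie in [0, 0.5)")
    ordered = sorted(data)
    n = len(ordered)
    k = math.floor(n * proportion)  # observations removed per tail
    kept = ordered[k:n - k]
    return sum(kept) / len(kept)


# Toy sample with one gross outlier (illustrative data, not from the paper).
sample = [2, 3, 4, 5, 6, 7, 8, 9, 10, 100]
plain_mean = sum(sample) / len(sample)
robust_mean = trimmed_mean(sample, 0.10)
print(plain_mean, robust_mean)
```

With one observation removed from each end, the outlier no longer dominates the estimate, which is the robustness property the text refers to.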

2. Robust Location Measures

Many location estimators can be written in a common form by ordering the sample values as $x_{(1)} \le x_{(2)} \le \cdots \le x_{(n)}$ and applying a weight function [5], $T = \sum_{i=1}^{n} w_i x_{(i)}$, where $w_i$ is a weight chosen to reduce the influence of certain observations and $x_{(i)}$ represents the ordered data. For the sample mean, the value is $w_i = 1/n$. To make the comparison between different estimators easier, we present Hogg’s and Brys’s tail weight measures and the new estimators based on the quartile in the next sections.
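The weighted form of a location estimator described above can be illustrated with a short sketch: the sample is ordered, and each ordered value is multiplied by a weight. The helper names (`l_estimate`, `mean_weights`, `trimmed_weights`) are hypothetical and chosen only for this example; the trimmed weights assume that trimmed observations receive weight zero and the remaining ones share the weight equally.

```python
def l_estimate(data, weights):
    """Weighted sum of the ordered sample (a linear estimator of location)."""
    ordered = sorted(data)
    if len(weights) != len(ordered):
        raise ValueError("one weight per observation is required")
    return sum(w * x for w, x in zip(weights, ordered))


def mean_weights(n):
    """Weights that reproduce the sample mean: 1/n for every observation."""
    return [1.0 / n] * n


def trimmed_weights(n, g, h):
    """Zero weight on the g smallest and h largest values, equal weight elsewhere."""
    kept = n - g - h
    if kept <= 0:
        raise ValueError("trimming removes every observation")
    return [0.0] * g + [1.0 / kept] * kept + [0.0] * h


sample = [1, 2, 3, 4, 50]
mean_value = l_estimate(sample, mean_weights(len(sample)))
trimmed_value = l_estimate(sample, trimmed_weights(len(sample), 1, 1))
print(mean_value, trimmed_value)
```

Writing the estimators this way makes the comparison direct: the mean and the trimming mean differ only in the weight vector.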

2.1. Hogg’s and Brys’s Tail Weight Measures

Hogg and Lenth [6] proposed and defined the tail measure, , and the peakedness measure, , while Schmid and Trede [7] have studied these estimators. In addition, Brys et al. [8] proposed and discussed the tail weight measure; these estimators are defined as follows, respectively, where are the corresponding distribution percentiles, and the percentiles and should be used to preserve the robustness property of this measure according to Brys et al. [8]. The procedures of Reed and Stark [9] are adopted to define sets of adaptive linear estimators. We use these estimators to compute asymmetric trimming mean. The general proposal for their approach is presented as follows.

(i) Let be the value for the total amount of trimming from the sample.
(ii) Then, the proportion to be trimmed from the lower end of the sample can be determined by the proportion , where and are the numerator and denominator portions of the previously defined , , and in equations (2.2), (2.3), and (2.4), respectively.
(iii) The upper trimming proportion can be defined by .

Based on this general scheme, the hinge estimators are used to identify the lower and upper observations that should be trimmed; they are defined as follows:

Then, we computed the greatest integer number of and . After that, we assume that and and the -trimmed mean is defined as where is the th ordered observation.
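The trimming step itself can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `asymmetric_trimmed_mean` is hypothetical, the lower trimming proportion is taken as an input rather than derived from a tail weight measure, the upper proportion is assumed to be the total proportion minus the lower one, and the number of observations removed at each end is assumed to be the greatest integer not exceeding the sample size times the corresponding proportion.

```python
import math


def asymmetric_trimmed_mean(data, alpha, alpha_lower):
    """Mean of the ordered sample after unequal trimming at the two ends.

    Assumptions (not taken verbatim from the paper):
      * alpha is the total trimming proportion,
      * alpha_lower is the share removed from the lower end,
      * the upper share is alpha - alpha_lower.
    """
    if not 0 <= alpha_lower <= alpha < 1:
        raise ValueError("need 0 <= alpha_lower <= alpha < 1")
    ordered = sorted(data)
    n = len(ordered)
    alpha_upper = alpha - alpha_lower
    eps = 1e-9  # guards against floating-point results such as 2.9999999
    g = math.floor(n * alpha_lower + eps)  # removed from the lower end
    h = math.floor(n * alpha_upper + eps)  # removed from the upper end
    kept = ordered[g:n - h]
    if not kept:
        raise ValueError("trimming removes every observation")
    return sum(kept) / len(kept)


sample = list(range(1, 21))  # illustrative data: 1, 2, ..., 20
estimate = asymmetric_trimmed_mean(sample, alpha=0.20, alpha_lower=0.05)
print(estimate)
```

In this toy call one observation is removed from the lower end and three from the upper end, so the estimate is the mean of the values 2 through 17.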

2.2. New Estimators Based on the Quartile

A quartile is any of the three values that divide the sorted data set into four equal groups, so that each group contains one fourth of the whole sample. We now define the new estimators based on the quartiles to determine the asymmetric trimming mean as follows: assume that are ordered data, and let , , and be the quartiles. Then, two estimators are defined based on these values. The first estimator is computed when the data are divided into three groups, while the second estimator is computed by dividing the data into four groups. The general proposal for these approaches is as follows. Let and be the first and last values of the ordered data, respectively. Then, we assume that the interval is the first group, the interval is the second group, the interval is the third group, and the interval is the fourth and last group. This permits the new estimators to be defined.

2.2.1. Three-Group Estimator

The three-group estimator method is defined as follows.
(i) L3G is the mean of the observations in the first interval , and it can be defined as where signifies the number of observations in the first group.
(ii) is the mean of the observations in the interval , and it can be defined as where and are the numbers of observations in the second and third groups, respectively.
(iii) is the mean of the observations in the third group , and it can be defined as where is the number of observations in the whole data set.

Then, the hinge estimator for the three-group method is defined as follows: To determine the proportion to be trimmed from the lower end of the sample , we use the following proportion: where UH3G and LH3G are the numerator and denominator of the estimator H3G, respectively, is the total trimming proportion to be trimmed from the sample, and the upper trimming proportion can be defined by . Let and ; then the asymmetric trimming mean is calculated as defined by (2.8).
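The grouping behind the three-group method can be sketched as below. The sketch covers only the three group means; it does not reproduce the hinge estimator H3G. Several details are assumptions made for illustration: the function name `three_group_means` is hypothetical, the quartiles come from Python's `statistics.quantiles` with its default method (the paper does not state which quartile definition was used in R), and an observation equal to a quartile is assigned to the lower of the two neighbouring groups.

```python
from statistics import mean, quantiles


def three_group_means(data):
    """Means of the lower, middle, and upper groups cut at Q1 and Q3.

    Assumed boundary rule: values equal to a quartile go to the lower group.
    Raises statistics.StatisticsError if a group turns out to be empty.
    """
    ordered = sorted(data)
    q1, _q2, q3 = quantiles(ordered, n=4)  # three quartile cut points
    lower = [x for x in ordered if x <= q1]
    middle = [x for x in ordered if q1 < x <= q3]  # second and third quarters
    upper = [x for x in ordered if x > q3]
    return mean(lower), mean(middle), mean(upper)


sample = list(range(1, 13))  # illustrative data: 1, 2, ..., 12
low, mid, up = three_group_means(sample)
print(low, mid, up)
```

The first value corresponds to L3G in the text; the middle group pools the second and third quarters of the data, and the last value is the mean of the top quarter.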

2.2.2. Four-Group Estimator

The four-group estimator method is defined as follows.
(i) Let be the mean of the observations in the first interval; this mean is the same as in equation (2.9).
(ii) M4G1 is the mean of the observations in the interval , and it can be defined as
(iii) M4G2 is the mean of the observations in the interval and it can be defined as
(iv) is the mean of the observations in the fourth group, and it is the same as .

Then, the hinge estimator for the four-group method can be described as follows: To determine the proportion to be trimmed from the lower end of the sample , we use the following proportion: where UH4G and LH4G are the numerator and denominator of the estimator , respectively, is the total trimming proportion to be trimmed from the sample, and the upper and lower trimming proportions are defined as above.
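The four group means used by the four-group method can be sketched in the same spirit. As before, this is an illustration under assumptions rather than the authors' code: `four_group_means` is a hypothetical name, the quartile definition is the default of `statistics.quantiles`, values equal to a quartile fall into the lower neighbouring group, and the hinge estimator built from these means is not reproduced.

```python
from bisect import bisect_left
from statistics import mean, quantiles


def four_group_means(data):
    """Means of the four groups delimited by the three quartiles.

    bisect_left sends a value equal to a cut point to the group below it,
    which is the boundary rule assumed in this sketch.
    Raises statistics.StatisticsError if a group turns out to be empty.
    """
    ordered = sorted(data)
    cuts = quantiles(ordered, n=4)  # [Q1, Q2, Q3]
    groups = [[], [], [], []]
    for x in ordered:
        groups[bisect_left(cuts, x)].append(x)
    return [mean(group) for group in groups]


sample = list(range(1, 13))  # illustrative data: 1, 2, ..., 12
group_means = four_group_means(sample)
print(group_means)
```

The second and third entries correspond to M4G1 and M4G2 in the text, and the first and last entries are the means of the lowest and highest groups.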

3. Simulation Methods

In the Princeton Robust Study [3], efficiency was introduced to provide a basis for comparing two estimators. The relative efficiency of two procedures is the ratio of their efficiencies, although the phrase is frequently used whenever two procedures are compared. The natural logarithm of the relative efficiency of an estimator was introduced in [9]. In this paper, we compute the relative efficiency (RE) of the proposed quartile-based estimators and of the comparative Hogg’s and Brys’s estimators. The smallest variance among the proposed methods is taken as the base, and the relative efficiency of each estimator is this base divided by the variance of that estimator. The total trimming proportions to be trimmed from the sample are 0.10 and 0.20. Random numbers were generated for each of the seven asymmetric distributions using R. The selected distributions were Beta (2,4), Gamma (3,2), Chi-square, Burr (3,1), Pareto (3,1), Weibull (3,1), and skewed-normal; samples of sizes 50, 100, 150, and 200 were generated for 1000, 2500, and 5000 iterations. Since similar results were found for all iteration values, we tabulated the results of the iterations with fit of the mean for each distribution with . Then, we tabulated the results of the simulation study for sample size 50 for these distributions. The estimators are ordered within each table, by type of distribution, according to minimum relative efficiency.
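The simulation design can be illustrated with a small Monte Carlo sketch. It is not the authors' R program: the estimators compared here (mean, median, and a symmetric trimmed mean) are stand-ins for the hinge and quartile-based estimators, the base variance is simply the smallest variance among the estimators supplied, and the Gamma (3,2) sample is drawn with shape 3 and scale 2, which is an assumption about the parameterization. All function names are hypothetical.

```python
import math
import random
from statistics import mean, median, pvariance


def trimmed_mean(data, proportion):
    """Symmetric trimmed mean, used here only as a stand-in estimator."""
    ordered = sorted(data)
    k = math.floor(len(ordered) * proportion)
    return mean(ordered[k:len(ordered) - k])


def simulate_variances(draw, estimators, n=50, iterations=1000, seed=1):
    """Variance of each estimator over repeated samples of size n."""
    rng = random.Random(seed)  # fixed seed keeps the run reproducible
    estimates = {name: [] for name in estimators}
    for _ in range(iterations):
        sample = [draw(rng) for _ in range(n)]
        for name, estimator in estimators.items():
            estimates[name].append(estimator(sample))
    return {name: pvariance(values) for name, values in estimates.items()}


def relative_efficiency(variances):
    """Smallest variance divided by each estimator's variance (at most 1)."""
    base = min(variances.values())
    return {name: base / value for name, value in variances.items()}


estimators = {
    "mean": mean,
    "median": median,
    "trimmed_0.10": lambda s: trimmed_mean(s, 0.10),
}
variances = simulate_variances(lambda rng: rng.gammavariate(3, 2), estimators)
efficiencies = relative_efficiency(variances)
for name, value in sorted(efficiencies.items(), key=lambda item: -item[1]):
    print(f"{name}: {value:.3f}")
```

With this convention the estimator with the smallest variance has relative efficiency exactly one and every other estimator falls below one, which matches the ordering used in the tables.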

4. Results

Within these classes of estimators, we are concerned with the behavior of the new estimators 3G0.10, 3G0.20, 4G0.10, and 4G0.20. The 3G0.10 and 3G0.20 estimators are two of the top four estimators in all the distributions studied (Table 1), while the results for 4G0.10 and 4G0.20 are mixed. Additionally, T0.10 is one of the top four estimators in five of the seven distributions studied. To investigate the performance of the new estimators, four properties of estimators are considered: bias, efficiency, mean square error, and consistency [10]. In this study, the relative mean square errors are computed, since the values of the MSE for different sample sizes are not directly comparable. An estimator is chosen as the minimum of 3G0.10, 3G0.20, 4G0.10, and 4G0.20, and the relative of the estimators can be calculated by the following equation: while Table 2 shows the relative mean square errors based on the seven asymmetric distributions. The methods 3G, 4G, and T were consistently among the top three estimators.

5. Conclusion

Most previous studies on the problem of estimating the mean of data sets ignored the type of distribution of the data. However, statistics such as skewness and tail length describe the characteristics of the distribution. In this paper, we proposed new estimators obtained by dividing the whole data set into groups with respect to the quartile values. The boundaries of the groups were derived from the quartiles in order to determine the proportions of trimming on both sides of the data set. These boundaries therefore depend on the type of the probability density function through the values of the first, second, and third quartiles. The proposed method was tested in a simulation study against the adaptive estimators proposed by Brys et al. [8] for seven asymmetric distributions. The relative efficiency (RE) is defined as the ratio of the minimum variance among the proposed methods (three and four groups with proportions 0.10 and 0.20) to the variance of the other hinge estimators proposed by Brys et al. [8] with the same proportions. The variances of the proposed methods are lower than the variances obtained with the Brys et al. [8] method, and all the relative efficiencies for these estimators are less than or equal to one. Similarly, most of the values of the RMSE for the asymmetric distributions are also less than one, except for the estimator P0.20 under the Burr distribution. The methods 3G, 4G, and T were consistently among the top three estimators for the distributions in this study.

Acknowledgment

The authors greatly appreciate the constructive remarks and suggestions made by the referees, which led to improvement of the paper.