Abstract

This article develops an estimator for the finite population mean under probability-proportional-to-size sampling in the presence of extreme values. Theoretical properties such as bias, variance, and consistency are derived. Monte Carlo simulations were performed to assess the consistency and efficiency of the proposed estimator. The proposed estimator is found to be more efficient than the competing estimators for all values of c between 0 and 1, and its gain in precision over its competitors is largest for small values of c. Empirical applications of the proposed estimator are illustrated using three real data sets, and the results show that the proposed estimator performs better than the conventional and Sarndal (1972) estimators.

1. Introduction

Over the years, many researchers have sought to improve estimates of population parameters such as the mean, total, and median so that they have optimal statistical properties [1]. For instance, recent studies [2, 3], among many others, proposed estimators for various parameters with the aim of describing data with extreme values well. However, the mean is either under- or overestimated when extreme values are present in a study variable such as expenditure, taxes, income, production, or consumption. These extreme values inflate the estimation variance and can distort the resulting estimate. Conventional estimation methods are unable to provide realistic and precise estimates in such cases. Specialized techniques based on nonparametric, semiparametric, and bias-reduction densities are employed to increase the precision of such estimates [4, 5]. An alternative approach to developing estimators for finite population means is the use of regression-based estimators; see, for instance, [6]. Though significant gains in precision can be obtained with these methods, they are generally computationally laborious and time-consuming, especially in large samples [7, 8]. These limitations, among others, call for alternative methods that are reasonably user friendly without sacrificing precision.

To overcome this challenge, Sarndal [9] proposed an unbiased estimator for a finite population mean in the presence of extreme values under simple random sampling. Other works [2, 10, 11] proposed improved ratio-type estimators of the finite population mean when minimum or maximum values are present. Moreover, ratio-, product-, and regression-type estimators of the finite population mean in the presence of extreme values have been proposed [1, 10, 12]. Other procedures proposed in recent studies aim at increasing the precision of mean estimates when variability in the study population is high [1, 6, 13].

Although these approaches have achieved significant improvement in the precision of estimates of population parameters, the gain in precision and the computational efficiency still leave much to be desired. This study therefore seeks to develop an efficient estimator of the finite population mean in the presence of extreme values.

The rest of the paper is organized as follows. Section 2 presents a literature review of existing mean estimators. Section 3 contains the proposed estimator and the derivations of its theoretical properties. In Section 4, the theoretical properties of the proposed estimator are compared with those of the competing estimators. The simulation and empirical studies are contained in Sections 5 and 6, respectively, and the conclusion is in Section 7.

2. Review of Existing Mean Estimators

Consider selecting a random sample of size n from a population of size N, where the probability of selection is associated with the size of the primary units. An unbiased estimator of the population mean and its variance under the probability-proportional-to-size sampling scheme are given in equations (1) and (2), respectively. Here y is the study variable, y_i is the value of the study variable for population unit i, and p_i is the selection probability of unit i in the population at any given draw. The estimator is written in terms of a defined variate.
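
As an illustration of the sampling scheme described above, the following minimal Python sketch draws a with-replacement sample with probability proportional to a size measure and computes the textbook Hansen-Hurwitz estimate of the mean together with its design variance. The function names, the `sizes` argument, and the choice of the Hansen-Hurwitz form are assumptions made for illustration; the expressions used in this section may differ in notation or detail.

```python
import random


def pps_sample(sizes, n, rng):
    """Draw n unit indices with replacement, with probability proportional to size."""
    return rng.choices(range(len(sizes)), weights=sizes, k=n)


def hansen_hurwitz_mean(y, sizes, sample):
    """Hansen-Hurwitz estimate of the population mean from a PPS sample.

    Uses z_i = y_i / (N * p_i) with p_i = sizes[i] / sum(sizes) (assumed form).
    """
    N = len(y)
    total_size = sum(sizes)
    z = [y[i] * total_size / (N * sizes[i]) for i in sample]
    return sum(z) / len(z)


def hansen_hurwitz_variance(y, sizes, n):
    """Design variance of the estimator above: (1/n) * sum_i p_i * (z_i - Ybar)^2."""
    N = len(y)
    total_size = sum(sizes)
    ybar = sum(y) / N
    return sum(
        (s / total_size) * (yi * total_size / (N * s) - ybar) ** 2
        for yi, s in zip(y, sizes)
    ) / n


# Hypothetical toy population: sizes exactly proportional to y.
y = [2, 4, 6, 8]
sizes = [1, 2, 3, 4]
rng = random.Random(0)
sample = pps_sample(sizes, 3, rng)
print(hansen_hurwitz_mean(y, sizes, sample), hansen_hurwitz_variance(y, sizes, 3))
```

When the size measure is exactly proportional to the study variable, every z_i equals the population mean and the variance is zero; with equal sizes the scheme reduces to simple random sampling with replacement.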

To avoid overestimating or underestimating the population mean when observations in actual data contain unexpectedly large or small values, Sarndal [9] suggested an unbiased estimator whose variance is given in equation (5). The variance expression involves a constant c and the conventional variance.
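
To make the idea concrete, here is a minimal Python sketch of the Sarndal-type adjustment as it is commonly stated in the literature for simple random sampling: the sample mean is shifted up by c when the sample contains the population minimum but not the maximum, shifted down by c in the opposite case, and left unchanged otherwise. The function names are hypothetical, the sketch assumes the population minimum and maximum are known and unique, and the value of c returned by `optimal_c` is the one usually quoted for this estimator rather than something taken from this section.

```python
def sarndal_mean(sample, y_min, y_max, c):
    """Sample mean adjusted by c according to which population extreme was drawn."""
    ybar = sum(sample) / len(sample)
    has_min = y_min in sample
    has_max = y_max in sample
    if has_min and not has_max:
        return ybar + c  # minimum drawn without the maximum: shift upward
    if has_max and not has_min:
        return ybar - c  # maximum drawn without the minimum: shift downward
    return ybar  # neither or both extremes drawn: no adjustment


def optimal_c(y_min, y_max, n):
    """Commonly quoted variance-minimizing constant, (y_max - y_min) / (2n)."""
    return (y_max - y_min) / (2 * n)


# Hypothetical values for illustration only.
print(sarndal_mean([1, 5, 6], y_min=1, y_max=10, c=0.5))
print(optimal_c(10, 4900, 100))
```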

The minimum mean variance is given by the following equation:

The major drawback of this estimator is its slow convergence for small values of c, which reduces the precision of the mean estimates. To address this challenge, Ahmad and Shabbir [1] proposed a product ratio estimator using an auxiliary variable. This led to a complex estimator without a significant gain in precision. The square root transformation of c provides the needed stability of the variance and hence improves precision remarkably without much computational effort.

3. Proposed Estimator

The proposed estimator is a modification of the Sarndal estimator [9] for the finite population mean, designed for the case where extreme values are present under the probability-proportional-to-size sampling scheme.

Let the sample observations be independent and identically distributed with a common mean under the probability-proportional-to-size sampling scheme, and let c be a non-negative constant. The proposed estimator is formulated as follows:

The bias of the proposed estimator is given in equations (9) and (10).

The variance of the proposed estimator is given in equation (11), which involves the constant c and the conventional variance.

Consistency of the proposed estimator follows from the limiting behaviour of its bias, and the result is trivial.

4. Comparison of Estimators

In this section, the proposed estimator is compared with the conventional and Sarndal [9] estimators under the probability-proportional-to-size sampling scheme.

4.1. Condition (i)

From equations (2) and (11),

Suppose , and is positive.

Consequently,

4.2. Condition (ii)

From equations (5) and (11),

Suppose , and is positive.

Consequently,

Thus, the proposed estimator performs better than the conventional and Sarndal [9] estimators when conditions (i) and (ii) are satisfied.

5. Simulation

Monte Carlo simulations with 500 replications were performed under the probability-proportional-to-size sampling scheme for a finite population of 5000 units with an extreme minimum value of 10 and an extreme maximum value of 4900. The variances of the proposed estimator and the Sarndal estimator [9] were determined for different sample sizes n and values of c. The variance of the conventional estimator was assumed to be constant for each sample size n. Tables 1 to 6 show the variances of the estimators for each sample size at different values of c. As seen in Table 1, the variance of the proposed estimator is smaller than that of the Sarndal estimator [9] for all values of c between 0 and 1. Furthermore, as the sample size increases, lower variances are observed for both estimators. This suggests that the estimated mean approaches the population mean, which demonstrates the consistency of the proposed estimator.
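
The simulation design described above can be sketched in Python roughly as follows. This is a minimal illustration under stated assumptions, not the code used for the reported tables: the lognormal population, the size measure `x`, the seed, and the function names are all hypothetical, and only the conventional (Hansen-Hurwitz type) estimator is simulated, since the sketch does not attempt to reproduce the proposed or Sarndal estimators.

```python
import itertools
import random
import statistics


def make_population(N, y_min, y_max, rng):
    """Hypothetical skewed population with planted extreme values."""
    y = [
        min(max(rng.lognormvariate(6.0, 0.5), y_min + 1.0), y_max - 1.0)
        for _ in range(N)
    ]
    y[0], y[1] = float(y_min), float(y_max)  # plant the two extremes
    # Assumed size measure: roughly proportional to y, with noise.
    x = [v * rng.uniform(0.5, 1.5) for v in y]
    return y, x


def mc_variance(y, x, n, reps, rng):
    """Monte Carlo variance of the PPS (with replacement) mean estimator."""
    N, X = len(y), sum(x)
    z = [y[i] * X / (N * x[i]) for i in range(N)]  # z_i = y_i / (N * p_i)
    cum = list(itertools.accumulate(x))  # cumulative sizes, reused across draws
    estimates = []
    for _ in range(reps):
        draw = rng.choices(z, cum_weights=cum, k=n)
        estimates.append(sum(draw) / n)
    return statistics.variance(estimates)


rng = random.Random(2021)
y, x = make_population(5000, 10, 4900, rng)
for n in (100, 150, 200, 250, 300):
    print(n, mc_variance(y, x, n, 500, rng))
```

With this setup the printed variances should shrink as n grows, which is the pattern the consistency argument relies on.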

The variances of the estimators for sample size 100 are presented in Table 2.

For sample size 150, the results are shown in Table 3.

The variances of the estimators for sample size 200 are shown in Table 4.

The variances of the estimators for sample size 250 are shown in Table 5.

Finally, the variances of the estimators for sample size 300 are shown in Table 6.

6. Empirical Applications

To determine the performance of the proposed estimator relative to some existing estimators, three data sets from three different populations were used. Two data sets were obtained from the literature [14, 15], and the third data set was extracted from Ghana Living Standard Survey Round 7 data [16]. The details of these populations are given below.

Population 1. (see [15])
Y: area under wheat crop in 1964.
N = 34, n = 12.

Population 2. (see [14])
Y: population size in 1930 (in 1000).
N = 49, n = 20.

Population 3. (see [16])
Y: total amount on house expenses.
N = 9594, n = 500.

Table 7 shows the variance associated with each of the estimators in the different populations. The variance of the proposed estimator is smaller in each population than the variances of the conventional and Sarndal [9] estimators. The proposed estimator is a better estimator of the mean than the existing ones, especially for large sample sizes [7, 8].

Efficiency is compared using the percent relative efficiency, and the percent relative efficiencies are summarized in Table 8.

Clearly, the proposed estimator is consistently better than its competitors in both the simulation and the applications, especially when the value of c is less than unity.
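
For readers who want to reproduce the efficiency comparison, percent relative efficiency is usually computed as the ratio of a reference estimator's variance to the variance of the estimator of interest, multiplied by 100. The short Python sketch below assumes that definition, with the conventional estimator as the reference; the function name and the numeric values are hypothetical and are not taken from the tables.

```python
def percent_relative_efficiency(var_reference, var_estimator):
    """PRE of an estimator relative to a reference: 100 * V(reference) / V(estimator)."""
    if var_estimator <= 0:
        raise ValueError("var_estimator must be positive")
    return 100.0 * var_reference / var_estimator


# Hypothetical variances for illustration only.
var_conventional = 2.0
var_candidate = 1.6
print(percent_relative_efficiency(var_conventional, var_candidate))
```

Under this convention a value above 100 means the estimator of interest has a smaller variance than the reference.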

7. Conclusion

A new estimator for the finite population mean under probability-proportional-to-size sampling in the presence of extreme values is proposed. Theoretical properties such as bias and variance were derived. Simulation studies and empirical studies on real-life data were performed, and the proposed estimator was compared with existing estimators. The empirical results confirmed that the proposed estimator has a smaller variance than the conventional and Sarndal [9] estimators. The proposed mean estimator was found to be more efficient than the existing estimators, particularly for small values of c.

Data Availability

All the data used in this study are published data and hence publicly available.

Conflicts of Interest

The authors declare that they have no conflicts of interest.