Abstract

Recently, different distributions have been generalized using the T-R{Y} framework, but the possibility of using the Dagum distribution has not been assessed. The T-R{Y} framework combines three distributions, one of which is the baseline, so that the strengths of each contribute to the newly generated distribution. The generated distributions have more parameters but are highly flexible in handling bimodality in datasets, and each is a weighted hazard function of the baseline distribution. This paper therefore generalizes the Dagum distribution using the quantile function of the Lomax distribution. A member of the T-Dagum class of distributions, called the exponentiated-exponential-Dagum{Lomax} (EEDL) distribution, is proposed. The distribution will be useful in survival analysis and reliability studies. Different characterizations of the distribution are derived, such as the asymptotes, stochastic ordering, stress-strength analysis, moments, Shannon entropy, and quantile function. The distribution is applied to simulated and real data, where it compares favourably with existing distributions in the literature.

1. Introduction

The quality of the results of a statistical model depends heavily on how well the assumed probability distribution fits the data. Thus, significant effort has been made in developing different families of probability distributions along with their relevant statistical methods [1].

However, there are still many important real problems where none of the existing standard or newly developed distributions fits the data adequately, especially in the areas of finance, engineering, medicine, and environmental hazards. The Dagum distribution is one of the most important distributions for modelling income and wealth, especially personal income, and is mostly associated with the study of income distribution [2]. It is related to the Gini index (see [3]); it is not the only three-parameter distribution used to model income distribution, but it is often the most appropriate [4].

The following Dagum distributions have been proposed: beta-Dagum distribution [5], Mc-Dagum distribution [6], weighted Dagum distribution [7], gamma-Dagum distribution [8], exponentiated Kumaraswamy–Dagum distribution [9], extended Dagum distribution [10], transmuted Dagum distribution [11], Dagum–Poisson distribution [12], exponentiated generalized exponential Dagum distribution [13], and power log-Dagum distribution [14].

Johnson et al. [15] asserted that four-parameter distributions should be sufficient for most practical purposes and that at least three parameters are needed to model any real data, but they doubted that including a fifth or sixth parameter would bring any noticeable improvement. However, we were motivated by the work of [13], whose six-parameter distribution performed better than its submodels with fewer parameters. Other authors have also shown that distributions with more parameters can model reliability and survival data more flexibly than their submodels with fewer parameters, which runs contrary to the statement of Johnson et al. [15]. Aljarrah et al. [16] mentioned that adding the fifth parameter to the Normal-Weibull{Cauchy} distribution improved the fit of the model to the data, with an increase of more than 22 points in the log-likelihood value. Several works, by Paranaíba et al. [17]; Cordeiro and Lemonte [18]; Domma and Condino [5]; Oluyede et al. [8]; Silva et al. [10]; and Bakouch et al. [14], further support the work of Nasiru et al. [13].

The motivation of this work is that the Dagum distribution, despite being one of the most important and appropriate distributions for modelling income and wealth and despite its relationship with the Gini index, has not been generalized via the T-R{Y} framework. The T-R{Y} framework is a combination of three distributions, T, R, and Y, where the quantile function of Y is used as a frame to hold the cdf of R, which is transformed by T, with some parameters of each distribution having an effect on the newly formed distribution. One major benefit of constructing a new distribution through the quantile function of an existing distribution is that the new distribution tends to be more flexible in handling bimodality in datasets, and it is a weighted hazard function of the baseline distribution (the Dagum distribution in this case). For more detailed information on the importance of the T-R{Y} method, see Aljarrah et al. [16]; Alzaatreh et al. [19, 20]; Zubair et al. [21]; and Famoye et al. [22]. For detailed knowledge of the Dagum distribution, see Bandourian et al. [4]; Kleiber and Kotz [2]; Kleiber [3]; Domma and Condino [5]; Oluyede and Rajasooriya [6]; Oluyede and Ye [7]; Oluyede et al. [8]; Huang and Oluyede [9]; Silva et al. [10]; Shahzad and Asghar [11]; Oluyede et al. [12]; Nasiru et al. [13]; and Bakouch et al. [14].

Thus, in this study, a new generalization of the Dagum distribution, named the T-Dagum family, is generated, and a member of this family, the exponentiated-exponential-Dagum{Lomax} (EEDL) distribution with six parameters, is proposed and its properties are studied. The proposed distribution not only offers high flexibility in its shape and scale parameters but also accommodates skewness (right and left), kurtosis, and tail variation, and can be stable for some parameter values.

The rest of the article is organized as follows. In Section 2, the proposed class of distributions is derived along with some of its characterizations. In Section 3, the EEDL distribution is derived and its parameters are estimated using maximum likelihood estimation (MLE). A simulation study is carried out to assess the stability and performance of the parameter estimates. This is followed by an application of the new model to two real datasets, and finally a conclusion is given based on the simulation study and the real applications.

2. Proposed T-Dagum{Y} Class

The beta-generated family, defined as the T-X family by [23], was extended by [24], and a further extension was made by [16], who used the quantile function $Q_Y(\cdot)$ of a random variable $Y$ and defined the T-X{Y} family as

$$F_X(x) = \int_a^{Q_Y(F(x))} f_T(t)\,dt. \tag{1}$$

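The T-X{Y} family in (1) was redefined by Alzaatreh et al. [19] as T-R{Y}. They gave the unified definition of the T-R{Y} family. The cdf of the T-R{Y} family is defined by

$$F_X(x) = \int_a^{Q_Y(F_R(x))} f_T(t)\,dt = F_T\big(Q_Y(F_R(x))\big), \tag{2}$$

where $f_T(t)$ is the pdf of a random variable $T$, $Q_Y(\cdot)$ is the quantile function of a random variable $Y$, and $F_R(x)$ is the cdf of a random variable $R$. $Q_Y(F_R(x))$ is differentiable and monotonically nondecreasing. It is necessary that $T$ and $Y$ have the same support.

The pdf corresponding to the cdf in (2) is given by

$$f_X(x) = f_R(x)\,\frac{f_T\big(Q_Y(F_R(x))\big)}{f_Y\big(Q_Y(F_R(x))\big)}. \tag{3}$$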
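The composition in (2) and (3) can be illustrated numerically. The following is a minimal sketch, not the authors' implementation: the helper `make_tr_y`, the exponential choices for T and Y, and the Dagum parametrization $F_R(x) = (1 + s x^{-\delta})^{-\beta}$ with the values shown are all assumptions made for illustration only.

```python
import math

def make_tr_y(F_T, f_T, Q_Y, f_Y, F_R, f_R):
    """Build the cdf and pdf of X from the three ingredient distributions."""
    def cdf(x):
        # F_X(x) = F_T(Q_Y(F_R(x)))
        return F_T(Q_Y(F_R(x)))

    def pdf(x):
        # f_X(x) = f_R(x) * f_T(q) / f_Y(q), with q = Q_Y(F_R(x))
        q = Q_Y(F_R(x))
        return f_R(x) * f_T(q) / f_Y(q)

    return cdf, pdf

# Illustrative ingredients (assumptions of this sketch):
# T ~ exponential(rate 2), Y ~ exponential(rate 1),
# R ~ Dagum with F_R(x) = (1 + S * x**(-DELTA))**(-BETA).
S, DELTA, BETA = 1.0, 2.0, 0.9

def F_T(t): return 1.0 - math.exp(-2.0 * t)
def f_T(t): return 2.0 * math.exp(-2.0 * t)
def Q_Y(p): return -math.log(1.0 - p)
def f_Y(y): return math.exp(-y)
def F_R(x): return (1.0 + S * x ** (-DELTA)) ** (-BETA)
def f_R(x):
    return (BETA * S * DELTA * x ** (-DELTA - 1.0)
            * (1.0 + S * x ** (-DELTA)) ** (-BETA - 1.0))

cdf_X, pdf_X = make_tr_y(F_T, f_T, Q_Y, f_Y, F_R, f_R)
```

With these particular choices the cdf simplifies to $1 - (1 - F_R(x))^2$, which gives a convenient check that the pdf returned by the helper is the derivative of the cdf.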

In the literature, many authors have used the T-R{Y} framework to develop probability distributions, such as Aljarrah et al. [16]; Alzaatreh et al. [19, 20]; Nasir et al. [25, 26]; Jamal et al. [27, 28]; Zubair et al. [21]; Famoye et al. [22]; and Jamal and Nasir [29]. None of these authors has generalized the Dagum distribution using this framework.

In this research, we let $R$ be a random variable that follows the Dagum distribution, with cdf given by

The cdf of the T-Dagum{Y} family is thus defined by putting equation (4) in (2) to have

Equation (5) is the cdf of the proposed T-Dagum{Y} (or simply T-D{Y}) class of distributions. Let the pdf of the Dagum distribution be given as

From equation (3), the corresponding pdf to equation (5) is given by

Remark 1. If $X$ follows the T-D{Y} class of distributions, it is easy to see that
(i) $X \overset{d}{=} Q_R(F_Y(T))$
(ii) $Q_X(p) = Q_R(F_Y(Q_T(p)))$
(iii) If $T \overset{d}{=} Y$, then $X \overset{d}{=}$ Dagum
(iv) If $Y \overset{d}{=}$ Dagum, then $X \overset{d}{=} T$
where $Q_R$ is the quantile function of the Dagum random variable $R$, $F_Y$ is the cdf of $Y$, and $Q_T$ is the quantile function of $T$; $p$ is generated from a standard uniform distribution. Remark 1(i) is a random variable, while Remark 1(ii) is a quantile function (see [21]).
The cdf in (5) can be used to generate many distributions, which are members of the T-D{Y} class of distributions.
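Remark 1(i) and (ii) translate directly into a recipe for simulating from a T-D{Y} distribution and for computing its quantiles. The sketch below is hedged: the exponential choices for T and Y and the Dagum parametrization $F_R(x) = (1 + s x^{-\delta})^{-\beta}$ are illustrative assumptions, and the function names are hypothetical.

```python
import math
import random

# Assumed Dagum parametrization: F_R(x) = (1 + S * x**(-DELTA))**(-BETA)
S, DELTA, BETA = 1.0, 2.0, 0.9

def F_R(x):
    return (1.0 + S * x ** (-DELTA)) ** (-BETA)

def Q_R(p):
    # Inverse of the assumed Dagum cdf
    return ((p ** (-1.0 / BETA) - 1.0) / S) ** (-1.0 / DELTA)

def F_Y(y): return 1.0 - math.exp(-y)          # Y ~ exponential(1), assumed
def Q_Y(p): return -math.log(1.0 - p)
def F_T(t): return 1.0 - math.exp(-2.0 * t)    # T ~ exponential(2), assumed
def Q_T(p): return -math.log(1.0 - p) / 2.0

def F_X(x):
    # cdf of the composed distribution: F_T(Q_Y(F_R(x)))
    return F_T(Q_Y(F_R(x)))

def Q_X(p):
    # Remark 1(ii): quantile function of X
    return Q_R(F_Y(Q_T(p)))

def sample_X(n, rng):
    # Remark 1(i): draw T, then push it through F_Y and Q_R
    return [Q_R(F_Y(rng.expovariate(2.0))) for _ in range(n)]
```

A quick sanity check is that `F_X(Q_X(p))` returns `p`, and that roughly half of a simulated sample falls below `Q_X(0.5)`.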

2.1. The T-D{Lomax} Family

Let $Y$ be a Lomax random variable with pdf given by

The cdf and quantile function of the Lomax distribution are given by and , respectively. From equations (5) and (7), the cdf and pdf of the T-D{Lomax} distribution are, respectively, given as

The cdf of the T-Dagum{Lomax} class of distributions, using the Lomax quantile function, is defined in (9). Equation (9) is a new way of generalizing the Dagum distribution. Here, $T$ can be any univariate probability distribution with support $[0, \infty)$.
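As a rough illustration of how the Lomax quantile function enters the construction, the sketch below writes out a Lomax pdf, cdf, and quantile function and plugs the quantile into the T-Dagum{Lomax} cdf. The shape/scale parametrization (`k`, `theta`) of the Lomax distribution and the Dagum form $(1 + s x^{-\delta})^{-\beta}$ are assumptions of this sketch and may differ from the symbols used in the paper.

```python
import math

# Assumed Lomax parametrization: shape k > 0, scale theta > 0.
def lomax_pdf(y, k, theta):
    return (k / theta) * (1.0 + y / theta) ** (-(k + 1.0))

def lomax_cdf(y, k, theta):
    return 1.0 - (1.0 + y / theta) ** (-k)

def lomax_quantile(p, k, theta):
    # Solves lomax_cdf(y) = p for y
    return theta * ((1.0 - p) ** (-1.0 / k) - 1.0)

def dagum_cdf(x, s, delta, beta):
    # Assumed Dagum parametrization
    return (1.0 + s * x ** (-delta)) ** (-beta)

def t_dagum_lomax_cdf(x, F_T, k, theta, s, delta, beta):
    # F_X(x) = F_T(Q_Y(F_R(x))) with Y ~ Lomax and R ~ Dagum
    return F_T(lomax_quantile(dagum_cdf(x, s, delta, beta), k, theta))
```

If T is itself taken to be Lomax with the same `k` and `theta`, the composed cdf collapses back to the Dagum cdf, in line with Remark 1(iii).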

2.2. Some Properties of the T-D{Lomax} Class of Distributions

Some general properties of the T-D{Lomax} class are discussed in this section.

Lemma 1. Given any random variable $T$ with pdf $f_T(t)$, the random variable follows the T-Dagum{Lomax} distribution in (9).

Proof. It is easy to see the result from Remark 1(i). Lemma 1 shows the relationship between the random variables $X$ and $T$. The random variable $X$ can be generated from the random variable $T$ using this relationship. If, for instance, $T$ is a standard random variable whose quantile function is known, then $X$ random variates can be simulated by first simulating $T$ values.

Lemma 2. The quantile function of the T-D{Lomax} distribution is given by

Proof. It is easy to see the result from Remark 1(ii).

Theorem 1. Shannon’s entropy of the T-D{Lomax} class of distributions can be expressed as

Proof. Since , it follows that . Hence, based on the pdf in equation (3), we can write

This implies that

For the T-D{Lomax} class of distributions, we have

So, the result in Theorem 1 follows from (14) and (15).

3. Proposed Six-Parameter Exponentiated-Exponential-Dagum{Lomax} Distribution

The six-parameter exponentiated-exponential-Dagum{Lomax} (EEDL) distribution is proposed, and some of its characterizations are derived.

3.1. Cumulative Distribution Function (cdf) of EEDL Distribution

Gupta and Kundu [30] defined the pdf of the exponentiated exponential distribution as

where is the scale parameter and is the shape parameter, and the cdf is given by
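For reference, the exponentiated exponential distribution can be coded in a few lines. This sketch assumes the commonly used form with cdf $(1 - e^{-\lambda x})^{\alpha}$; the names `alpha` (shape) and `lam` (scale) are chosen here for illustration and are not taken from the paper.

```python
import math

def ee_pdf(x, alpha, lam):
    # f(x) = alpha * lam * exp(-lam x) * (1 - exp(-lam x))**(alpha - 1), x > 0
    return (alpha * lam * math.exp(-lam * x)
            * (1.0 - math.exp(-lam * x)) ** (alpha - 1.0))

def ee_cdf(x, alpha, lam):
    # F(x) = (1 - exp(-lam x))**alpha, x > 0
    return (1.0 - math.exp(-lam * x)) ** alpha
```

Setting `alpha = 1` recovers the ordinary exponential distribution, which is a simple way to check the two functions against each other.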

Substituting equations (16) and (17) into equation (9), we have

From (18), it is notable that is a constant and can be replaced with , and is also a constant and can be replaced with , without any loss of generality, so that (18) is reduced to

where , , , and are shape parameters, which define the shape (skewness, kurtosis, and mode) of the distribution, while and are scale parameters, which define the spread of the distribution. Thus, (19) is the cdf of the new probability distribution called the exponentiated-exponential-Dagum{Lomax} (EEDL) distribution.

3.2. Probability Density Function (pdf) of EEDL Distribution

The pdf of the new probability distribution, that is, the EEDL distribution, can be obtained by differentiating equation (19) with respect to $x$ or by substituting equations (8) and (16) directly into equation (10) to have

where . Equation (20) is the pdf of the new probability distribution, EEDL. The six-parameter distribution is expected to be useful for modelling environmental hazard, survival, and time-to-failure data.

If

then

and the cdf becomes

where is a function of .

3.3. Quantile Function of EEDL Distribution

In probability theory, a random variable can be characterized by its quantile function. The quantile function is important in deriving measures of partition such as the median, quartiles, octiles, deciles, and percentiles.

Lemma 3. The quantile function of the EEDL distribution, for a random variable uniformly distributed on the interval [0, 1], is given by

Proof. Let , by making the subject of the formula in equation (19), and the inverse function of is given by

This quantile function in (25) will be used to generate random variates in the simulation study. The median and the 1st and 3rd quartiles can be obtained by evaluating it at 0.5, 0.25, and 0.75, respectively. Other measures of partition can be obtained by choosing the argument appropriately.
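To make the inversion concrete, the following sketch builds a six-parameter cdf by composing an exponentiated-exponential cdf, a Lomax-type quantile, and a Dagum cdf, and then inverts it step by step. The parametrization is an assumption of this sketch, not necessarily that of equation (19): `alpha` and `c` belong to the exponentiated-exponential part (with the Lomax scale absorbed into `c`), `b` stands for the reciprocal Lomax shape, and `s`, `delta`, `beta` are assumed Dagum parameters.

```python
import math

def eedl_cdf(x, alpha, c, b, s, delta, beta):
    g = (1.0 + s * x ** (-delta)) ** (-beta)      # assumed Dagum cdf
    w = (1.0 - g) ** (-b) - 1.0                   # Lomax-type quantile of g (scale absorbed in c)
    return (1.0 - math.exp(-c * w)) ** alpha      # exponentiated-exponential cdf

def eedl_quantile(p, alpha, c, b, s, delta, beta):
    # Undo each layer of eedl_cdf in reverse order; requires 0 < p < 1
    u = p ** (1.0 / alpha)
    w = 1.0 - math.log(1.0 - u) / c               # equals (1 - g)**(-b)
    g = 1.0 - w ** (-1.0 / b)                     # recovered Dagum cdf value
    return ((g ** (-1.0 / beta) - 1.0) / s) ** (-1.0 / delta)

# Median and quartiles for one illustrative parameter set
params = (1.5, 0.5, 2.0, 1.0, 2.0, 0.9)
q1, med, q3 = (eedl_quantile(p, *params) for p in (0.25, 0.5, 0.75))
```

Random variates could then be produced by evaluating `eedl_quantile` at uniform draws strictly inside (0, 1).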

3.4. Survival Function of EEDL Distribution

If $X$ follows an EEDL distribution and denotes the lifetime of a device of interest, then the survival function $S(x) = P(X > x)$ gives the probability that the device will survive beyond a given point in time $x$.

Suppose that is the cdf of the EEDL distribution supported on the interval as proposed in equation (19); then, the survival function of EEDL is given by

and in terms of , equation (26) becomes

3.5. Hazard Function of EEDL Distribution

Let be a random variable that follows an EEDL distribution, with pdf and survival function given in (22) and (27), respectively; then, its hazard function is given by

where is a function of as defined in equation (21).

3.6. Cumulative Hazard Function of EEDL Distribution

Let be a random variable that follows an EEDL distribution, with survival function given in (27); then, its cumulative hazard function is given by

where is a function of as defined in equation (21).
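The survival, hazard, and cumulative hazard functions are all derived from the cdf and pdf by the standard relations $S(x) = 1 - F(x)$, $h(x) = f(x)/S(x)$, and $H(x) = -\ln S(x)$. The sketch below wraps these relations around arbitrary `pdf` and `cdf` callables; the exponential example is an assumption used only because its hazard is known to be constant.

```python
import math

def survival(cdf):
    # S(x) = 1 - F(x)
    return lambda x: 1.0 - cdf(x)

def hazard(pdf, cdf):
    # h(x) = f(x) / S(x)
    return lambda x: pdf(x) / (1.0 - cdf(x))

def cumulative_hazard(cdf):
    # H(x) = -ln S(x)
    return lambda x: -math.log(1.0 - cdf(x))

# Illustrative check with an exponential(rate) distribution
RATE = 0.7
exp_pdf = lambda x: RATE * math.exp(-RATE * x)
exp_cdf = lambda x: 1.0 - math.exp(-RATE * x)

S = survival(exp_cdf)
h = hazard(exp_pdf, exp_cdf)
H = cumulative_hazard(exp_cdf)
```

The same three wrappers could be applied to any implementation of the EEDL pdf and cdf.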

Figure 1 depicts the pdfs of the EEDL distribution for various values of the parameters and shows that the distribution can be stable (symmetric), positively skewed, or negatively skewed. Figure 2 shows that the EEDL distribution can be stable (symmetric), positively or negatively skewed, unimodal, or bimodal. This behaviour suggests that the EEDL distribution can help to model environmental hazard data or other data with a high degree of variability. The four shape parameters allow it to capture a wide range of features and variation in a dataset.

A distribution can also be characterized by its asymptotes, stochastic ordering, and stress-strength properties.

3.7. Asymptotes of EEDL Distribution
3.7.1. Vertical Asymptotes of EEDL Distribution

Theorem 2. If and are the pdf and hazard rate of the EEDL distribution, then the line , if it exists, is a vertical asymptote of the graph of the functions and if the following statements hold:

The horizontal asymptote for is given by

Proof. The denominator of the pdf of the EEDL distribution in (22) is equated to zero to have

and to solve for x, we have

Also, the denominator of the hazard rate of the EEDL distribution in (28) is equated to zero to have

but

The proof is complete. Note that if is an even real number, then is a complex number. Also, if is an odd real number, then will be negative because . Take note that the limit is one-sided, since . This implies that the vertical asymptote of the EEDL distribution does not exist for all real values of .

3.7.2. Horizontal Asymptotes of EEDL Distribution

Let and be the pdf and hazard functions of EEDL distribution.

The horizontal asymptotes are horizontal lines that the function approaches as .

The horizontal asymptote for is given by

The horizontal asymptote for is given by

Take note that the limit is one-sided, since .

3.8. Stochastic Ordering of EEDL Distribution
3.8.1. General Order Statistics of EEDL Distribution

Theorem 3. The pdf of the general order statistics of EEDL distribution exists and it is given by

Proof. Let denote the order statistics of a random sample that follows the EEDL distribution, from a continuous population with cdf and pdf . Then, the pdf is

Substitute (22) and (23) into (39) to have (40):

Thus, equation (41) completes the proof.
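The general form of an order-statistic density used in this kind of proof is $f_{r:n}(x) = \frac{n!}{(r-1)!\,(n-r)!}\,F(x)^{r-1}\,[1-F(x)]^{n-r}\,f(x)$. The sketch below codes this textbook formula for arbitrary `pdf` and `cdf` callables; the index names `r` and `n` are assumptions of this sketch, and the uniform distribution is used only as a check case.

```python
import math

def order_stat_pdf(x, r, n, pdf, cdf):
    """Density of the r-th smallest of n i.i.d. draws, evaluated at x."""
    coef = math.factorial(n) / (math.factorial(r - 1) * math.factorial(n - r))
    F = cdf(x)
    return coef * F ** (r - 1) * (1.0 - F) ** (n - r) * pdf(x)

# Check case: standard uniform on (0, 1)
unif_pdf = lambda x: 1.0
unif_cdf = lambda x: x
```

For the uniform case the result is a beta density, for example `order_stat_pdf(0.5, 2, 3, unif_pdf, unif_cdf)` evaluates to 1.5.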
Let and be two random variables that follow the EEDL distribution. is less than if , where P (.) denotes the probability of an event.

Theorem 4. Let and be two random variables that follow the EEDL distribution. If and , then ( is equal to in distribution).

Proof. From the pdf of the general order statistic of the EEDL distribution given in (41), set the index to 1 to arrive at the order statistic given by

Also, from (41), set the index to 2 to arrive at the order statistic given by

For , we can show that :

Using series expansion, we have the inequality as

Inequality (45) can be reduced to

Take the expectation of both sides to have

Note that the expectation of a constant is a constant.
To make equality prevail, we test when to arrive at

Thus, the proof is complete. Hence, and are random samples from the EEDL distribution.

3.9. Stress-Strength Reliability Analysis of EEDL Distribution

Theorem 5. Let and be two independent random variables that follow the EEDL distribution with pdfs and , respectively. If represents the stress and represents the strength, then the stress-strength reliability of EEDL distribution with and does not depend on or .

Proof. The probability that is the reliability of the stress-strength of the EEDL distribution, and it is given by

where

Put equations (51) and (52) in (50) to have

By using linear expansion, we have

The integral in (54) using the gamma function becomes

where , , and . Equation (55) is the stress-strength reliability function of the EEDL distribution with and .
However, if , equation (55) is reduced to

Thus, equation (56) completes the proof. Hence, the stress-strength reliability, , does not depend on .
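Stress-strength reliability is the probability that the stress is smaller than the strength, which for independent continuous variables can be written as $\int f_{\text{strength}}(x)\,F_{\text{stress}}(x)\,dx$. The sketch below evaluates this integral numerically with a midpoint rule. It is a generic illustration rather than the closed form in (55); the exponential check case and the truncation point `upper` are assumptions.

```python
import math

def stress_strength(pdf_strength, cdf_stress, upper, n=20000):
    """Midpoint-rule estimate of P(stress < strength) on (0, upper)."""
    h = upper / n
    total = 0.0
    for i in range(n):
        x = (i + 0.5) * h
        total += pdf_strength(x) * cdf_stress(x)
    return total * h

# Check case: strength ~ exponential(a), stress ~ exponential(b),
# for which P(stress < strength) = b / (a + b).
a, b = 1.0, 3.0
rel = stress_strength(lambda x: a * math.exp(-a * x),
                      lambda x: 1.0 - math.exp(-b * x),
                      upper=40.0)

# Identically distributed stress and strength give reliability 1/2.
rel_equal = stress_strength(lambda x: a * math.exp(-a * x),
                            lambda x: 1.0 - math.exp(-a * x),
                            upper=40.0)
```

The second call illustrates why reliability stops depending on the shared parameters when stress and strength have the same distribution.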
The probability density, cumulative distribution, quantile, survival, hazard, cumulative hazard functions, asymptotes, stochastic ordering, and stress-strength analysis are some of the different ways of characterizing a random variable.

3.10. Related Distributions

Most generalized distributions are related to their base distributions through particular values of one or more of their parameters. The EEDL distribution is likewise related to its parent distribution.

3.10.1. Transformations

(1) Exponentiated Exponential Distribution

Lemma 4. If , then the random variable has an exponentiated exponential distribution with parameters and .

Proof. By using the transformation method, the result is shown as follows:

(2) Ratio of Exponential and Lomax Distributions

Lemma 5. If , then the pdf of random variable is the ratio of two pdfs of random variables that have an exponential distribution with parameter and Lomax distribution with parameter , respectively.

Proof. By using the transformation method, the result is shown as follows:

3.11. Moments of EEDL Distribution

The moments of a probability distribution are very important in describing the distribution. The mean, variance, standard deviation, measures of skewness and kurtosis, and other parametric measures that describe the distribution can be derived from them.

Theorem 6. Let follow an EEDL distribution; then the moment of can be expressed in terms of the gamma function with parameters and , and it is given by

where

Proof. Recall that and so that the pdf of the EEDL distribution is reduced to

See the complete proof of equation (62) in Appendix I.

Substitute (64) into (63) to have

Equation (66) completes the proof, where

Recall the gamma function:

Thus, the th moment of the EEDL distribution is given by

If the subscripts , then equation (69) reduces to

If , we have the mean of the EEDL distribution, which does not depend on variables and , and it is given by

and the variance is
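When a closed-form moment is awkward to evaluate, a raw moment can also be approximated from the quantile function, since $E[X^r] = \int_0^1 Q(p)^r\,dp$. The sketch below uses a midpoint rule for this integral. It is a numerical cross-check under stated assumptions, not the series expression of Theorem 6; the exponential quantile function serves only as a test case.

```python
import math

def raw_moment(quantile, r, n=100000):
    """Midpoint-rule estimate of E[X**r] = integral over (0, 1) of Q(p)**r."""
    h = 1.0 / n
    return h * sum(quantile((i + 0.5) * h) ** r for i in range(n))

# Check case: exponential(rate) has mean 1/rate and variance 1/rate**2
RATE = 2.0
exp_quantile = lambda p: -math.log(1.0 - p) / RATE

m1 = raw_moment(exp_quantile, 1)
m2 = raw_moment(exp_quantile, 2)
variance = m2 - m1 ** 2
```

Any quantile function with finite moments, including an implementation of the EEDL quantile, could be passed in place of `exp_quantile`.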

3.12. Shannon Entropy of EEDL Distribution

The Shannon entropy of a random variable is a measure of variation of uncertainty. It is defined by [31] as $E\{-\log f(X)\}$ for a random variable $X$ with pdf $f(x)$. Let be a random variable that follows the EEDL distribution with pdf as given in (62):

The Shannon entropy of the EEDL distribution is

where is a function of . It can be written as
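The definition $E\{-\log f(X)\} = -\int f(x)\,\ln f(x)\,dx$ can be evaluated numerically for any density. The sketch below is a plain midpoint-rule approximation over a truncated range; the truncation limits and the exponential test density are assumptions, and the function is not the closed-form expression derived in this section.

```python
import math

def shannon_entropy(pdf, lower, upper, n=20000):
    """Midpoint-rule estimate of -integral f(x) ln f(x) dx on (lower, upper)."""
    h = (upper - lower) / n
    total = 0.0
    for i in range(n):
        fx = pdf(lower + (i + 0.5) * h)
        if fx > 0.0:                 # skip points where the density underflows to zero
            total -= fx * math.log(fx)
    return total * h

# Check case: exponential(rate) has entropy 1 - ln(rate)
RATE = 2.0
entropy = shannon_entropy(lambda x: RATE * math.exp(-RATE * x), 0.0, 50.0)
```

For the exponential check case the estimate should be close to `1 - math.log(2.0)`.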

3.13. Maximum Likelihood Estimation of EEDL Distribution Parameters

Recall the pdf of the EEDL distribution from (20), from which the log-likelihood function is derived as

Remember that and

We can differentiate (75) easily with respect to and .

Differentiate (75) partially with respect to , and equate the result to zero and solve for :

Differentiate (75) partially with respect to , equate the result to zero and solve for :

Divide through by

The estimate of is not in closed form. From (75), we also differentiate partially with respect to the other parameters, equate the results to zero, and solve for each of them. Their solutions are not in closed form either, so the estimates were obtained numerically using the R package “maxLik.”
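The idea of maximizing a log-likelihood numerically when no closed form exists can be shown on a deliberately simple case. The sketch below is not the maxLik procedure used in the paper: it substitutes a golden-section search (a swapped-in technique) and a one-parameter exponential model, chosen because its MLE, $n / \sum x_i$, is known and can be used to check the search. All names and the data are hypothetical.

```python
import math

def log_likelihood(pdf, data, *params):
    # Sum of log densities over the sample
    return sum(math.log(pdf(x, *params)) for x in data)

def golden_section_max(g, lo, hi, tol=1e-8):
    """Maximize a unimodal function g on [lo, hi]."""
    invphi = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    while b - a > tol:
        c = b - invphi * (b - a)
        d = a + invphi * (b - a)
        if g(c) > g(d):
            b = d          # maximum lies in [a, d]
        else:
            a = c          # maximum lies in [c, b]
    return (a + b) / 2.0

def exp_pdf(x, rate):
    return rate * math.exp(-rate * x)

data = [0.5, 1.2, 0.3, 2.0, 1.0]   # hypothetical sample
rate_hat = golden_section_max(lambda r: log_likelihood(exp_pdf, data, r), 0.01, 10.0)
```

For a six-parameter model such as EEDL a multivariate optimizer would be needed; the one-dimensional search here only illustrates the principle.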

3.14. Simulation

A simulation study is carried out to investigate the performance of the estimators. The standard error of estimate (SE), average absolute bias (AAB), and root mean square error (RMSE) of the maximum likelihood estimators of the parameters of the EEDL distribution were examined. The simulation was repeated 1000 times for each of the sample sizes 20, 50, 100, 250, and 500, with the six true parameter values set to 1.5, 0.1, 2.0, 0.3, 0.01, and 0.9.

Table 1 presents the MLE estimates, standard errors of estimate, and AAB and RMSE values of the six parameters for different sample sizes. The results show that as the sample size increases, the AAB and RMSE decrease toward zero, which is the behaviour expected of consistent estimators: the estimates converge to the true parameter values as the number of observations becomes larger and the error shrinks toward zero. See Figure 3 for a pictorial view.
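The two accuracy measures reported in Table 1 can be computed as follows. The definitions used here, $\text{AAB} = \frac{1}{N}\sum|\hat\theta_i - \theta|$ and $\text{RMSE} = \sqrt{\frac{1}{N}\sum(\hat\theta_i - \theta)^2}$, are the usual ones and are assumed rather than quoted from the paper. The small simulation uses an exponential rate as a stand-in model, since its MLE has a closed form; it is not the EEDL simulation itself.

```python
import math
import random

def average_absolute_bias(estimates, true_value):
    return sum(abs(e - true_value) for e in estimates) / len(estimates)

def rmse(estimates, true_value):
    return math.sqrt(sum((e - true_value) ** 2 for e in estimates) / len(estimates))

def simulate_rate_mle(n, reps, rate, rng):
    # MLE of an exponential rate is n / sum(x); used here as a stand-in model
    return [n / sum(rng.expovariate(rate) for _ in range(n)) for _ in range(reps)]

rng = random.Random(2024)
true_rate = 1.5
small = simulate_rate_mle(20, 200, true_rate, rng)
large = simulate_rate_mle(500, 200, true_rate, rng)
rmse_small, rmse_large = rmse(small, true_rate), rmse(large, true_rate)
```

In this stand-in example the RMSE for samples of size 500 comes out well below that for samples of size 20, mirroring the pattern described for Table 1.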

3.15. Applications

The EEDL distribution is fitted to two real datasets retrieved from [13]. The first dataset consists of failure times for 36 appliances subjected to an automatic life test, while the second consists of failure times of 100 cm yarn subjected to a 2.3% strain level. The EEDL distribution is compared with the exponentiated generalized exponential Dagum distribution (EGEDD), the exponentiated Kumaraswamy Dagum (EKD) distribution, and the Mc-Dagum (McD) distribution using the log-likelihood, the Akaike information criterion (AIC), and the Kolmogorov–Smirnov (K-S) statistic.
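Two of the comparison criteria are simple to compute once a model has been fitted. The sketch below uses the standard definitions $\text{AIC} = 2k - 2\ell$, where $k$ is the number of parameters and $\ell$ the maximized log-likelihood, and the one-sample K-S statistic $D = \sup_x |F_n(x) - F(x)|$. The function names and the toy inputs are hypothetical.

```python
def aic(loglik, n_params):
    # Akaike information criterion: 2k - 2*loglik
    return 2.0 * n_params - 2.0 * loglik

def ks_statistic(data, cdf):
    """Largest gap between the empirical cdf and a fitted cdf."""
    xs = sorted(data)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs, start=1):
        F = cdf(x)
        # Compare the fitted cdf with the empirical cdf just after and just before x
        d = max(d, i / n - F, F - (i - 1) / n)
    return d

# Toy inputs: a six-parameter model with log-likelihood -100,
# and three points tested against the uniform(0, 1) cdf
example_aic = aic(-100.0, 6)
example_ks = ks_statistic([0.75, 0.25, 0.5], lambda x: x)
```

Smaller AIC and K-S values indicate a better fit among the candidate models.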

3.15.1. Application 1: Appliances Data

The appliances data in Table 2 were obtained from [13, 32]. The dataset consists of failure times for 36 appliances subjected to an automatic life test. The dataset is depicted in Figure 4, which shows a gap in the histogram; the data are positively skewed (skewness 2.279) and highly leptokurtic (kurtosis 9.669). Table 3 displays the maximum likelihood estimates of the parameters of the EEDL distribution and the other distributions, with their corresponding standard errors in brackets.

Table 4 shows that the EEDL distribution provides a better fit to the appliances data than the other models. Its log-likelihood and AIC values are closer to zero than those of the other distributions, and it has the smallest K-S statistic among the distributions considered in this research. Thus, the EEDL distribution is a better fit to the appliances data.

3.15.2. Application 2: Yarn Data

Table 5 presents the data on time to failure of a 100 cm polyester/viscose yarn subjected to a 2.3% strain level in a textile experiment carried out to assess the tensile fatigue characteristics of the yarn. The dataset can be found in [13, 33, 34]. The skewness and kurtosis of the data are 1.336119 and 5.802452, respectively. The data are positively skewed and very peaked, as depicted in Figure 5.

The maximum likelihood estimates of the parameters of the fitted models, with their corresponding standard errors in brackets, are given in Table 6. All the parameters of the EEDL distribution are significant at the 5% significance level. The EEDL distribution provides a better fit to the yarn data than the EGEDD, EKD, and McD distributions, as shown in Table 7.

Table 7 shows that the log-likelihood and AIC of the EEDL distribution are closer to zero than those of the other models and that it has the smallest K-S statistic. Thus, the EEDL distribution is a better fit.

The kernel density graphs of the data and of the EEDL distribution are superimposed on the histograms in Figures 4 and 5. These show that the newly proposed EEDL distribution fits the two datasets.

Figures 6 to 9 show that the EEDL distribution fits the two datasets well, and that it fits the yarn data better than the appliances data. The QQ plots in Figures 10 and 11 show that most of the data points, especially those in the middle, fall on the theoretical line, but the few points whose theoretical quantiles lie outside the interval (−1, 1) fall off the line. This suggests that the fit of the EEDL distribution would be even closer if the extremes of the data were trimmed. Nevertheless, Tables 4 and 7 confirm, via the K-S statistic and its p value, that the EEDL distribution is a good fit to both the appliances and yarn data.

4. Concluding Remarks

This research proposed a new univariate continuous probability distribution called the exponentiated-exponential-Dagum{Lomax} (EEDL) distribution, which is a member of the T-Dagum{Lomax} family, and presented results on its statistical properties, such as the cumulative distribution function, density function, quantile function, survival function, hazard function, cumulative hazard function, asymptotes, stochastic ordering, stress-strength analysis, moments, and Shannon entropy. The maximum likelihood estimation of the parameters of the model was derived. The newly proposed EEDL distribution was applied to two datasets that are positively skewed and very peaked, and its performance compared favourably with that of EGEDD, EKD, and McD. Further research can be carried out on submodels of EEDL.

See Table 8 for seven of its submodels with fewer parameters. These submodels will be investigated in subsequent work. The exponential-Dagum{Lomax} distribution is one of its submodels. The more significant a parameter of a distribution is, the more likely the distribution is to fit some datasets better (see [8, 10, 13] and [14]).

Appendix

Linear Expansion of EEDL Distribution

Let where

Data Availability

The two datasets used are appliances data and yarn data. Both datasets are provided in the body of the article.

Conflicts of Interest

The authors declare that they have no conflicts of interest.