Abstract

Background. Yin deficiency (YD) is a pathological condition characterized by emaciation, afternoon fever, dry mouth, and night sweats. The incidence of YD is 23.3%. A 27-item Yin Deficiency Scale (YDS) was developed to estimate the clinical severity of YD. This study aimed to develop three short-form YDS versions to reduce the burden of response time, using three item-reduction approaches: Rasch, equidiscriminatory item-total correlation (EITC), and factor-based analyses. Methods. Two datasets were analyzed from previous studies (169 outpatients from May to June 2009 and 237 healthy college students from January to April 2016). The optimal response category was examined using Rasch analysis. Items with higher item-total correlations were determined using the EITC. Using a factor-based approach, the items were reduced, while maintaining the original YDS construct. Reliability was estimated using the person separation index (PSI) and Cronbach’s α values. The predictive accuracy was examined using the area under the curve (AUC). Finally, the relationship between YD and dysfunctional breathing (DB) was examined using factor scores from the YDS and the Korean version of the Nijmegen Questionnaire (KNQ). Results. We developed two 14-item YDS versions using the Rasch and EITC approaches, and a 16-item YDS version using a factor-based approach. Rasch analysis suggested an optimal response category of five points. The PSI of Rasch and Cronbach’s α of the EITC and factor-based versions were 2.19, 0.855, and 0.827. The AUCs of the three short-form YDS were 0.812, 0.811, and 0.818. The sensitivity of the EITC-YDS was 0.632, which was lower than its specificity of 0.875. The fatigue-related scores of the factor-based YDS were fairly correlated with the factor scores of the KNQ estimating the DB (r = 0.349–0.499). Conclusion. The 14-item Rasch- and 16-item factor-based YDS may replace the original YDS during YD’s primary screening, epidemiological surveys, and health checkups.

1. Introduction

Yin deficiency (YD) is a pathological condition with diverse symptoms and signs, including emaciation, fatigue, pain, and weakness, especially in the lower limbs; afternoon or night coughs; dry mouth; night sweating; and frequent urination [1]. A previous study reported that the incidence of YD was 23.3% in the elderly group [2]. YD is induced by insufficient yin fluid, including intra- and extracellular fluid, lymphatics, blood, and synovial fluid, and thus, the diminished moisturizing function may secondarily result in heat- or dryness-related symptoms and signs [3]. Some studies have reported that YD is a subtype of the pathological patterns of climacteric syndrome [4], tuberculosis [5], diabetes mellitus [6], and psychiatric disorders [7]. Increased YD was associated with the survival rate in late-stage cancer [8]. Considering the incidence of YD and its broad physiological and pathological spectrum, a questionnaire that can initially screen for the presence or absence of YD will be helpful for clinical trials, epidemiological surveys, and health checkups.

Park et al. developed the 27-item Yin Deficiency Scale (YDS) [9]. The YDS consists of eight factors with a Cronbach’s α of 0.885 [9]. Based on receiver operating characteristic (ROC) curve analysis, the predictive accuracy of the YDS estimated by the area under the curve (AUC) was 0.875, and its cutoff value was determined as ten points. Since its development, the 27-item YDS has been widely used to evaluate the clinical severity of YD. YD scores estimated by the YDS were associated with worsened quality of life [9]. Increased YDS scores were associated with decreased blueness of the face and tongue tip [10, 11]. Regarding vocal quality, increased YDS scores were associated with decreased modulation of the fundamental frequency [12]. The YDS scores of young people with dysfunctional breathing (DB) were higher than those of young people without DB [13]. Although the YDS has been broadly utilized to estimate the clinical severity of YD, its 27 items may demand considerable response time and the ability to complete them [14]. In particular, patients who have difficulty with handwriting or cognition may be affected by the length of a questionnaire [14]. Therefore, this study aimed to reduce the number of items in the original YDS using three item-reduction methods: Rasch, equidiscriminatory item-total correlation (EITC), and factor analyses.

Rasch analysis is based on item response theory, in which each item response in the questionnaire is taken as an outcome of the independent interaction between the respondents’ abilities and item difficulty [15]. To overcome the limitations of classical test theory, Rasch analysis includes an examination of item hierarchy, fitting error, and differential item functioning (DIF) by sex and age [14, 16]. EITC is a modified version of item-total correlation (ITC) [17]. In the EITC, items are discriminated at three percentile points (25%, 50%, and 75%) of the total scores, and correlations between the item scores and the dichotomized total scores of the YDS are calculated within the three percentile categories [18, 19]. The third approach is factor-based item reduction. The Korean version of the Nijmegen Questionnaire (KNQ), which assesses DB-related symptoms, comprises four etiological factors [20]. If the number of YDS items is reduced while maintaining the factor construct and reliability levels, examining the correlations between the short-form YDS and the factors of the KNQ will help clarify the relationship between the etiology of YD and DB.

In summary, Rasch analysis minimizes bias due to item hierarchy and DIF by sex and age, whereas EITC and factor analyses reduce item numbers while maintaining the reliability and construct validity of the original questionnaire. By comparing the advantages of these three item-reduction approaches, researchers and clinicians may be able to relieve the burden of time or handwriting of respondents through a short-form questionnaire that suits their purposes. Finally, we calculated the AUC of the three short-form YDS versions using receiver operating characteristic (ROC) curve analysis and compared their predictive accuracies with the 27-item YDS.

2. Methods

2.1. Data Sources

Two datasets were used in the study. One dataset was previously used to develop the original version of YDS [1]. In the previous study, 169 outpatients (39 men aged 42.1 ± 14.7 years; 130 women aged 43.5 ± 14.9 years) from 12 Korean medical clinics completed the 27-item YDS from May to June 2009 [1]. Twelve Korean medical doctors with clinical experience, blinded to the YDS scores, determined the presence or absence of the YD pattern for each outpatient. Another dataset was collected from 237 college students (130 men aged 21.4 ± 1.9 years; 107 women aged 21.4 ± 3.0 years) who had no impediments to daily life caused by psychological or respiratory problems from January to April 2016, and they were asked to complete the KNQ, together with the 27-item YDS [13]. In the two datasets, the YDS items were rated on a 7-point Likert scale: 1 = disagree very strongly; 2 = disagree strongly; 3 = disagree; 4 = neither agree nor disagree; 5 = agree; 6 = agree strongly; 7 = agree very strongly. The items of the KNQ are rated on a 5-point Likert scale: 0 = never; 1 = rarely; 2 = sometimes; 3 = often; 4 = very often. The second dataset did not include information on the clinicians’ determination of YD and was used only to examine the relationship between the factor scores of the short-form YDS and the KNQ. The study protocol was approved by the Institutional Review Board of Kyung Hee University Oriental Medical Hospital at Gangdong (approval number: KHNMCOH 2021-02-001).

2.2. Rasch Analysis

Rasch analysis used the partial credit model because the 27-item YDS was answered on a polytomous 7-point Likert scale [15]. The first step in Rasch analysis was to evaluate the appropriateness of the response category. Category probability curves and the ordering of the response categories were examined. If the peak of one curve overlapped with another peak, the response category was considered excessive, and one of the two response categories was removed [15]. Along with examining category probability curves, the ordering of response categories was examined using step calibration values. Even when the peaks of the probability curves were well separated, a disordered step calibration value (one that decreased where all others increased) indicated an excessive response category, which then needed to be fused with the adjacent category. Therefore, the examination of the optimal response category was repeated until the separation of probability curves and the ordering of calibration values were satisfied. The second step in Rasch analysis was to examine the DIF. DIF analysis is a measurement of bias and refers to the difference in the probability of providing a certain response between groups [16]. In most cases, age and sex differences result in DIF [16, 21]. Therefore, differences in the item responses of the YDS between the sexes and between the older and younger age groups were examined. Rasch modeling assumes that items are weighted according to their difficulty along a linear logistic function, and the mean square (MnSq) levels and chi-square statistics divided by the degrees of freedom are calculated to examine whether the difficulty of each item fits the linear function [14]. Therefore, the third step was to evaluate the MnSq levels of infit and outfit for each item [18]. 
Through the evaluation of MnSq levels, misfitting items were deleted, and iterations of fit evaluations were conducted until all items were free of fitting errors [18]. Finally, the unidimensionality and reliability of Rasch YDS were examined [22, 23].
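For reference, the partial credit model and the two fit statistics mentioned above can be written in their standard textbook form. The notation here is an assumption introduced for illustration and is not taken from the original YDS studies: θ_n is the measure of person n, δ_ik is the k-th step calibration of item i, m_i is the highest category of item i (categories are numbered from 0), and E_ni and W_ni are the model-expected score and its variance.

```latex
% Partial credit model: probability that person n responds in category x of item i
P(X_{ni} = x) =
  \frac{\exp \sum_{k=0}^{x} (\theta_n - \delta_{ik})}
       {\sum_{h=0}^{m_i} \exp \sum_{k=0}^{h} (\theta_n - \delta_{ik})},
  \qquad x = 0, 1, \dots, m_i,
  \qquad \sum_{k=0}^{0} (\theta_n - \delta_{ik}) \equiv 0 .

% Standardized residual of an observed response x_{ni}
z_{ni} = \frac{x_{ni} - E_{ni}}{\sqrt{W_{ni}}}

% Outfit (unweighted) and infit (information-weighted) mean squares for item i
\mathrm{Outfit\ MnSq}_i = \frac{1}{N} \sum_{n=1}^{N} z_{ni}^{2},
\qquad
\mathrm{Infit\ MnSq}_i = \frac{\sum_{n=1}^{N} W_{ni}\, z_{ni}^{2}}{\sum_{n=1}^{N} W_{ni}} .
```

Both mean squares have an expected value of 1.0 when the data fit the model, which is why fit criteria are expressed as a band around 1.0.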

2.3. EITC

EITC, a modified ITC, was used to reduce items in the questionnaire [24]. ITC focuses on the correlations between each item’s scores and the questionnaire’s total scores. The EITC set three cutoff points according to three percentile levels of the total scores (25%, 50%, and 75%) and transformed the total scores into dichotomous values [18, 19]. For example, total scores below and above the cutoff point of 25% were transformed into scores of 0 and 1, respectively. Similarly, other dichotomous total scores were determined according to the 50% and 75% cutoff points. Thereafter, the EITC was calculated as the correlation between the three sets of dichotomized total scores and the questionnaire item scores. Three tables, one for each percentile category, were arranged in descending order of EITC score. Four or five items with the top-ranked EITC were extracted from the 25% percentile category. The same number of items with the top-ranked EITC as those in the 25% percentile category were extracted from the 50% and 75% percentile categories. If the same item was on the top list for both the 25% and 50% categories, it was dropped from the list of the 50% group, and the next-ranked item from that group was substituted into the 50% list. Likewise, an item that appeared in both the 50% and 75% lists was dropped from the 75% list, and the next-ranked item was substituted [18, 19]. As it was reported that Cronbach’s α > 0.800 is preferable [25], we calculated the minimal item numbers to guarantee a Cronbach’s α of 0.800 using the Spearman–Brown prophecy formula [26]. If the total number of items to satisfy Cronbach’s α level is a multiple of three, all top-ranked items may be extracted from the three percentile groups. 
However, if the total number is not a multiple of three, the multiple of three items exceeding the minimal numbers suggested by the Spearman–Brown prophecy was primarily extracted from the three percentile groups, and the items with the lowest EITC were removed until the minimum item numbers satisfying Cronbach’s α of 0.800 were reached.
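The selection procedure described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, the percentile uses a nearest-rank convention, the correlation is a plain Pearson coefficient against the dichotomized total, and any item already chosen at a lower percentile is skipped (a slight generalization of the substitution rule in the text).

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy) if sxx > 0 and syy > 0 else 0.0

def percentile_cutoff(values, pct):
    """Nearest-rank percentile (an assumed convention; other definitions exist)."""
    ordered = sorted(values)
    rank = max(1, -(-len(ordered) * pct // 100))  # ceil(n * pct / 100)
    return ordered[rank - 1]

def eitc(responses, pct):
    """Correlate each item with the total score dichotomized at one percentile."""
    totals = [sum(row) for row in responses]
    cut = percentile_cutoff(totals, pct)
    high = [1 if t > cut else 0 for t in totals]  # 0 = at or below cutoff, 1 = above
    n_items = len(responses[0])
    return [pearson([row[j] for row in responses], high) for j in range(n_items)]

def select_items(responses, per_group, pcts=(25, 50, 75)):
    """Take the top-ranked items per percentile, skipping items already chosen."""
    chosen = []
    for pct in pcts:
        scores = eitc(responses, pct)
        ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
        fresh = [j for j in ranked if j not in chosen][:per_group]
        chosen.extend(fresh)
    return chosen
```

Calling `select_items(responses, per_group=5)` on a respondents-by-items table would return up to 15 distinct item indices, from which the lowest-ranked items could then be trimmed to reach the target item count.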

2.4. Factor Analysis

The items of the original YDS were previously determined using the contribution scores to YD by 50 Korean medical clinicians who were asked to rate forty-three items on a 7-point Likert scale (1 = no contribution to YD; 7 = greatest contribution to YD) using the Delphi method [27]. Through two iterations of feedback, 30 items with a contribution score over 4.00 were extracted, and the following study finally determined the 27-item YDS satisfying reliability and construct validity [1]. Table 1 lists the final 27 items and mean contribution scores for YD estimated by clinicians [27], and Supplementary Table S1 lists eight factors of the YDS extracted from the 27 items using principal component analysis (PCA) [1]. As shown in Supplementary Table S1, eight factors were associated with the symptoms, lesions, and subtypes of YD. For example, cough, fever, pain, and fatigue were named according to the symptoms of YD, whereas urine and skin factors were named according to the lesions affected by YD. Kidney liver deficiency is one of the most frequently observed subtypes of YD in clinical cases.

As mentioned earlier, we speculated that a factor-based approach may help examine the relationship between the symptoms, lesions, and subtypes of YD and the clinical severity of the disease. A factor-based approach was implemented using the four-step item-reduction procedure proposed by Smith et al. [28]. This procedure had the advantage of minimizing the loss of reliability level while maintaining the construct of factors. In Step 1, Cronbach’s α values of all factors and whether the values may increase when an item is removed from each factor were examined. In Step 2, we examined whether the decrease in Cronbach’s α values may be minimized when removing an item. In Step 3, we examined whether face or content validity was maintained after the items were removed or retained in Steps 1 and 2. If face or content validity collapses, returning to Steps 1 and 2, some items may be retained despite their low reliability. In Step 4, a principal component analysis was conducted to examine whether there were remarkable changes in the construct for the reduced items. However, we omitted step 3 from the development of the short-form YDS because all items of the original YDS satisfied content validity via experts’ contribution scores for the YD. In other words, it was inappropriate to add dropped items to maintain or increase the validity level. Therefore, we conducted factor-based item reduction, where Cronbach’s α values of the eight factors of the original YDS were examined, and the items contributing to an increase in overall Cronbach’s α value (step 1) or contributing to a minimal decrease in the value (step 2), were removed until two items remained with each factor. Thereafter, a PCA was conducted on the short-form YDS to examine whether there were any changes to the construct of the original YDS (Step 4). Finally, the overall Cronbach’s α coefficient of the short-form YDS was compared to that of the 27-item YDS.

2.5. Statistical Analysis

2.5.1. Rasch Analysis

In examining the response category, the ordering of the item responses was acceptable when the count for each response category exceeded 10, the average measure and step calibration showed an ascending order, and the outfit level of each category was lower than 2.0 [29]. If there was any violation among the item response categories, the category was unified with an adjacent category, and examination of the ordering of all categories was repeated until the violations were corrected. Together with the numerical examination, the overlap of a category curve peak with other curves was examined, and the fusion of two adjacent categories was repeated until all the peaks of the category curves were well separated [30]. DIF was assessed by both sex and age group. The median value of the participants’ age was 41.0 years; thus, those >41 years were assigned to the older group and those <41 years to the younger group. Differences in logits between men and women and between the older and younger groups were examined using the chi-square test [16]. Infit and outfit were assessed, and items with MnSq values of infit or outfit <0.70 or ≥1.40 were removed sequentially [18]. From the item response perspective, unidimensionality denotes that, among the short-form YDS, the second factor may comprise only one item, which helps avoid scoring unrelated dimensions within the reduced items [22]. Unidimensionality was determined when the unexplained variance in the first contrast extracted from the PCA was <2.0 [22]. Person separation and reliability indexes were calculated to examine the reliability level of the reduced items, where a separation index ≥2.0 or a reliability index ≥0.8 was considered to indicate high reliability [31].

2.5.2. EITC Analysis

The Spearman–Brown prophecy test was used to determine which item numbers could predict a Cronbach’s coefficient >0.800, as this value is considered a preferable level of reliability [25, 26]. After determining the minimal item numbers, each item and the total score were transformed into dichotomous variables according to the three percentile levels. EITCs were calculated using Spearman’s rho correlations, and the top-ranked items were rearranged in descending order of EITCs among the three percentile categories. As mentioned, one or two items with lower EITC levels were removed from the item pool with the top-ranked EITC, according to the total number of items calculated using the Spearman–Brown prophecy formula.
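As a worked illustration of the Spearman–Brown prophecy formula, the sketch below computes the test length needed to reach a target reliability. Using the 27 items and the Cronbach's α of 0.885 reported for the original YDS as inputs is an assumption made here for illustration; the function names are hypothetical.

```python
def spearman_brown_predicted(alpha, length_factor):
    """Predicted reliability when test length is multiplied by length_factor."""
    return length_factor * alpha / (1 + (length_factor - 1) * alpha)

def spearman_brown_length_factor(alpha, target):
    """Length factor needed to move reliability from alpha to target."""
    return target * (1 - alpha) / (alpha * (1 - target))

# Assumed inputs: 27 items, alpha = 0.885 for the full scale, target alpha = 0.800.
n_items, alpha_full, target = 27, 0.885, 0.800
factor = spearman_brown_length_factor(alpha_full, target)
items_needed = n_items * factor

print(round(items_needed, 2))
print(round(spearman_brown_predicted(alpha_full, 14 / n_items), 3))
```

With these inputs the required length comes out at roughly 14 items, and the predicted α for a 14-item form rounds to 0.800, which is consistent with the item count reported in the Results.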

2.5.3. Factor-Based Item Reduction and PCA

For the items within the eight factors, we examined which items resulted in a slight increase or minimal decrease in Cronbach’s α of each factor [21]. Sequential removal of items was maintained until only two items remained for each factor. Since the “kidney-liver deficiency” factor of the 27-item YDS comprised only two items, the item removal procedure was not conducted for this factor. Through item reduction using the factor-maintaining method, 16 items, including two items in the eight factors, were determined for the short-form YDS. A PCA was conducted for the factor-based YDS comprising 16 items to examine whether there are any changes in the construct of the eight factors in the short-form YDS compared with that of the 27-item YDS. Only factors with eigenvalues greater than 1.0 were retained in PCA using the Kaiser criterion. Along with construct changes, we examined whether there were any changes in the overall Cronbach’s α level for the 16-item YDS compared to that of the 27-item YDS.
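The within-factor reduction described above amounts to a greedy "α if item deleted" loop. The following is a minimal sketch, assuming a respondents-by-items table for a single factor and sample variances with an n − 1 denominator; the function names are hypothetical and the code is not the SPSS procedure used in the study.

```python
def variance(xs):
    """Sample variance (n - 1 denominator)."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

def cronbach_alpha(rows):
    """Cronbach's alpha for a respondents-by-items score table."""
    k = len(rows[0])
    item_vars = sum(variance([r[j] for r in rows]) for j in range(k))
    total_var = variance([sum(r) for r in rows])
    return k / (k - 1) * (1 - item_vars / total_var)

def reduce_factor(rows, keep=2):
    """Drop, one at a time, the item whose removal leaves alpha highest."""
    items = list(range(len(rows[0])))
    while len(items) > keep:
        def alpha_without(j):
            rest = [i for i in items if i != j]
            return cronbach_alpha([[r[i] for i in rest] for r in rows])
        # Removing this item yields the largest increase (or smallest decrease) in alpha.
        items.remove(max(items, key=alpha_without))
    return items
```

Applied to each of the eight factors in turn with `keep=2`, this loop would leave 16 items in total; a factor that already has two items is returned unchanged.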

2.5.4. ROC Curve Analysis

After examining the reliability, construct validity, and dimensionality of the three short-form YDS using Rasch, EITC, and factor-based approaches, ROC curve analyses were conducted to compare their predictive accuracy for YD. In the three ROC curve analyses, the total scores of the short-form YDS served as test variables, and the presence or absence of YD, as determined by 12 clinicians in the previous study, served as the gold standard [1]. The predictive accuracy levels of the three short-form YDS were independently calculated using AUC. It is generally accepted that AUC values >0.9, 0.7–0.9, and 0.5–0.7 indicate high, moderate, and low accuracies, respectively [32]. An optimum cut-off point corresponded to the maximal Youden index (Youden index = sensitivity + specificity −1) [32].
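The cutoff selection by maximal Youden index can be illustrated with a short sketch. It assumes that a respondent screens positive when the total score is at or above the cutoff and that ties are resolved in favor of the lowest cutoff; both conventions, and the function names, are assumptions for illustration rather than details taken from the study.

```python
def youden_table(scores, has_condition):
    """Sensitivity, specificity, and Youden index at every candidate cutoff."""
    positives = sum(has_condition)
    negatives = len(has_condition) - positives
    table = []
    for cut in sorted(set(scores)):
        # A respondent screens positive when the score is at or above the cutoff.
        tp = sum(1 for s, y in zip(scores, has_condition) if y and s >= cut)
        tn = sum(1 for s, y in zip(scores, has_condition) if not y and s < cut)
        sens, spec = tp / positives, tn / negatives
        table.append((cut, sens, spec, sens + spec - 1))
    return table

def best_cutoff(scores, has_condition):
    """Row (cutoff, sensitivity, specificity, Youden) with the maximal Youden index."""
    return max(youden_table(scores, has_condition), key=lambda row: row[3])

# Arithmetic check against values reported in the Results for the EITC version.
youden_eitc = 0.632 + 0.875 - 1
```

Here `youden_eitc` reproduces the reported maximal Youden index of 0.507 from the reported sensitivity (0.632) and specificity (0.875).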

2.5.5. Correlation Analysis

A previous study reported that the 16-item KNQ consisted of four factors: neuropsychological, respiratory, neurogastrointestinal, and neuromuscular [20]. Correlations between the total and factor scores of the KNQ and factor-based YDS were examined using Pearson’s correlation coefficient (r). A correlation coefficient ≥0.8 was considered a “very strong correlation,” that of 0.6–0.7 was considered a “moderate correlation,” that of 0.3–0.5 was considered a “fair correlation,” and that of 0.1–0.2 was considered a “poor correlation” [33]. Correlation and factor analyses, reliability tests, and ROC curve analyses were performed using SPSS version 21 (SPSS Inc., Chicago, IL, USA), while Rasch analysis, including category probability, DIF, fitting error, unidimensionality, and person reliability index, was performed using Winsteps 4.8. Statistical significance was set at .

3. Results

3.1. Rasch Analysis

The category characteristics of the 27 items are summarized in Table 2. The 7-point category responses were arranged in ascending order, and all outfit MnSq levels were below 2.0. The counts for the seven categories exceeded 10. However, the step calibration value for category 4 was lower than that for category 3, indicating that these two categories were disordered. Figure 1(a) shows the probability curve of Question 1 (Q1: night cough), according to a 7-point Likert scale, where the peak of category 3 was fused with that of category 4. This and the step calibration results indicated that categories 3 and 4 were disordered. Furthermore, the peak of category 5 sank under that of category 6, although the step calibration was slightly increased (0.04). This disordering between categories 3 and 4 and between categories 5 and 6 for Q1 “night cough” was also found in the probability curves of the other 26 items. Categories 3 and 4 were fused first because their degree of disordering was greater than that of categories 5 and 6. The fusion of categories 3 and 4 reduced seven categories to six (Table 2). The probability curve and step calibration analyses were repeated for the six categories. Consequently, the disordering between categories 3 and 4 was corrected. However, categories 4 and 5 among the six categories remained disordered, corresponding to the disordering of categories 5 and 6 among the seven categories (Figure 1(b)). Because disorder persisted among the six categories, categories 4 and 5 were unified, and a third round of step calibration and probability curve analysis was conducted. Finally, the five categories showed an ascending order of step calibration throughout, and all peaks of the probability curve were well separated (Figure 1(c)). 
This indicated that five response categories (disagree very strongly, disagree strongly, disagree, agree strongly, and agree very strongly) were suitable for respondents’ answers to the YDS.
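The two fusions described above (original categories 3 with 4, then original categories 5 with 6) correspond to a simple recoding of the raw 7-point responses. The sketch below shows one way to express it; numbering the collapsed categories 1 to 5 is an assumption, and the names are hypothetical.

```python
# Map each original 7-point response to the collapsed 5-point category:
# original categories 3 and 4 share one category, as do 5 and 6.
RECODE_7_TO_5 = {1: 1, 2: 2, 3: 3, 4: 3, 5: 4, 6: 4, 7: 5}

def collapse_responses(responses):
    """Recode a list of 7-point item responses onto the 5-point scale."""
    return [RECODE_7_TO_5[r] for r in responses]
```

A recoding of this kind lets existing 7-point data be rescored on the 5-point scale without re-administering the questionnaire.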

Table 3 presents the DIF results based on sex and age. The logit values for “afternoon fever (Q4),” “night fever (Q7),” “morning fatigue (Q15),” “susceptibility of heat and cold (Q16),” “night hot soles (Q22),” and “sweating during sleep (Q23)” in the older group were higher than those in the younger group, while the logit values for “persistent cough (Q2),” “residual urine (Q13),” “difficulty in containing the urine (Q14),” and “bone steaming (Q21)” in the younger group were higher than those in the older group. Regarding sex differences, the logit value only for “dark yellow urine (Q27)” in women was higher than in men. Therefore, 11 items showing DIF by age or sex were removed, and the remaining 16 were analyzed using Rasch analysis.

Table 4 lists the fit levels of the sixteen items without DIF. In the first analysis, “afternoon cough (Q3)” and “wake due to night urination (Q11)” showed fitting errors [18]. Therefore, the second analysis was conducted after removing the two items from the item pool. As a result, the remaining 14 items were free of fitting error, with infit and outfit values ranging from 0.70 to 1.39, and additional fitting analysis was not considered [18]. The raw or overall scores denoting the frequency of responses ranged from 341 points for “night cough (Q1)” to 650 points for “fatigue (Q17)”. In the reliability test, the person separation index was 2.19, and the person reliability was 0.83, indicating that the 14 items retained by Rasch analysis showed a high level of reliability [31]. Table 5 lists the dimensionality results of the 14 items. Unexplained variance in the first contrast was 1.994 (<2.0), implying that the 14 items retained by Rasch analysis were unidimensional [22]. Based on the category response, DIF, fitting error, and dimensionality analyses, the 14-item YDS rated on five categories was finally determined.

3.2. EITC Analysis

Table 6 lists the EITC results at the three percentile points (25%, 50%, and 75%). In the Spearman–Brown prophecy analysis, 14 items were suggested as the minimal number for guaranteeing a Cronbach’s α of 0.800. Therefore, the five items with the top-ranked EITC values were extracted from each of the three percentile categories, yielding 15 items. Among the 15 items, “hair loss (Q25),” which had the lowest EITC value (r = 0.368), was removed because the purpose of item shortening by EITC was to remove as many items as possible while retaining a Cronbach’s α of 0.800. Finally, 14 items were determined as the short-form YDS by EITC. Cronbach’s α for the 14-item YDS by EITC was 0.855.

3.3. Factor-Based Analysis

As mentioned earlier, items that contributed to a slight increase or minimal decrease in Cronbach’s α values within a factor were removed item by item until two items remained within each factor. PCA was then conducted to examine the changes in the construct and Cronbach’s α of the factors. Table 7 lists the factor loadings of the 16 items and the Cronbach’s α values of the factors. The eight factors in the 27-item YDS were reduced to five in the 16-item YDS. The “cough” and “fever” factors of the 27-item YDS were preserved in the 16-item YDS, while the “pain-weakness” and “fatigue” factors of the 27-item YDS were unified into one in the 16-item YDS. Similarly, the “urine factor” and “skin-hair factor” were unified into one. The Cronbach’s α values of the eight factors, each of which consisted of two items, ranged from 0.282 to 0.818 (Supplementary Table S1). However, the Cronbach’s α values for the five factors in the short-form YDS ranged from 0.492 to 0.818. The total variance explained by the 16-item factor-based YDS was 61.61%, and the overall Cronbach’s α of the 16 items was 0.828.

3.4. ROC Curve Analysis

Table 8 lists the ROC curve analyses of the three short-form YDS versions by Rasch, EITC, and factor analyses. Supplementary Figure S1 shows the maximal Youden points on the ROC curves of the three short-form YDS versions. The previous study reported that the sensitivity, specificity, AUC, and cut-off point of the 27-item YDS were 78.7%, 84.8%, 0.875, and 10 points, respectively [1]. The AUC reflects how well the test distinguishes between YD and non-YD groups [32]. The AUC serves as a single measure summarizing the discriminative ability of a test across the full range of cut-offs, independently of the prevalence of the disease or pathological pattern [34]. In this study, the AUC levels of the 14-item Rasch, 14-item EITC, and 16-item factor-based YDS were 0.812, 0.811, and 0.818, respectively, indicating that the three short-form YDS versions had moderate accuracy for determining YD.

In examining sensitivity and specificity levels using the maximal Youden index, the Rasch and factor-based models revealed similar sensitivity and specificity levels ranging from 0.737 to 0.789. However, for the EITC model, the sensitivity level (0.632) at the maximal Youden index (0.507) was lower than the specificity level (0.875), while the Youden index at the point with similar sensitivity and specificity levels (0.719 and 0.723, respectively) was 0.443, lower than the maximal Youden value of 0.507. Figure 2 shows which items were shared or unique across the three short-form YDS versions. For example, “frequent urination (Q12)” and “dry and cracked heel (Q18)” were included only in the Rasch model, while “dry mouth (Q5),” “weakness of the lower limbs (Q8),” “night itch (Q19),” and “rough skin (Q26)” were shared by all three short-form YDS versions.

3.5. Correlation Analysis

In the examination of the incidence of YD among the 237 college students, 51 students showed a total score on the 27-item YDS over 10 points, and the incidence of YD was 21.5%. Table 9 lists Pearson’s correlations between the total and factor scores of the KNQ and the three short-form YDS versions. The total KNQ scores were positively correlated with the three short-form YDS versions by Rasch (r = 0.564), EITC (r = 0.498), and factor analysis (r = 0.517). The four factor scores of the KNQ also showed fairly positive correlations with the total scores of the three short-form YDS versions (r = 0.352–0.489). Among the five factor scores of the factor-based version, “fever,” “cough,” “sweating-feet,” and “urine-hair” had poor or fairly positive correlations with the total and the factor scores of the KNQ (r = 0.128–0.499).

4. Discussion

In this study, we developed three short-form YDS versions using the Rasch-, EITC-, and factor-based approaches. The main finding of this study was that the reliability and predictive accuracy of the three short-form YDS versions were comparable to those of the original 27-item YDS. This indicates that the 14-item Rasch and EITC YDS and the 16-item factor-based YDS can be utilized to estimate the severity of YD or determine the presence or absence of YD in clinical cases. However, our results also suggest that caution should be exercised when prioritizing the short-form YDS according to the characteristics of each approach, clinical situation, and study purpose.

Regarding the brevity of the three short-form YDS versions, the item reduction ratio of the Rasch and EITC approaches (13/27, both) was higher than that of the factor-based approach (11/27). Therefore, the short-form YDS by the Rasch and EITC approaches may be prioritized because the two questionnaires may shorten the completion time compared to the short-form YDS by a factor-based approach. In examining the reliability of the three short-form YDS versions, reliability levels estimated by Cronbach’s α were preferable or higher. Interestingly, the final Cronbach’s α of the EITC YDS was 0.855, which was higher than the value initially predicted by the Spearman–Brown prophecy formula (0.800). Although it was possible to reduce some items with lower EITC until Cronbach’s α reached 0.800, we did not conduct additional item reduction by EITC because it might lower the predictive accuracy of the EITC YDS. Therefore, according to the Spearman–Brown prophecy formula, we determined the item number of the EITC YDS to be 14. Factor-based approaches are known to reduce the number of items while maintaining factor constructs [28]. By reducing items using the factor approach, an undesirable decrease in reliability within each factor was minimized because the items contributing to a slight increase or decrease in intrafactor reliability were primarily removed from the factor. The final Cronbach’s α for the factor-based YDS was 0.827, indicating a preferable level of reliability [25]. In the Rasch approach, a higher person separation index denotes higher sensitivity in distinguishing between high and low respondents [14, 31]. This study’s person separation index was 2.19, indicating a high-reliability level [31]. In summary, the three approaches to item reduction may not have significantly decreased the reliability of the original YDS.

In examining the predictive accuracy of the three short-form YDS versions, the AUC values of the Rasch, EITC, and factor-based approaches were 0.812, 0.811, and 0.818, respectively. These values were considered to indicate “moderate accuracy” [32], and were similar to the AUC of 0.875 for the original YDS. Therefore, predictive accuracy equivalent to the original YDS may be expected when utilizing the 14-item Rasch and EITC YDS versions and the 16-item factor-based YDS version. However, it should be noted that among the three short-form YDS versions, the EITC YDS showed lower sensitivity (0.632) than specificity (0.875) at the maximal Youden point. One possibility is that the total scores of the 14 items in the EITC had a nonparametric distribution, which may have produced a jagged ROC curve. On a jagged curve, the increases or decreases in sensitivity and specificity tend to become irregular as the Youden index increases [35]. This means that for the short-form YDS determined by the EITC, lower sensitivity, that is, a higher false-negative rate, may have been a barrier to the determination of YD using ROC curve analysis. Therefore, considering reliability, predictive accuracy, sensitivity, and specificity simultaneously, this study suggests using the short-form YDS versions derived from the Rasch and factor-based approaches rather than the EITC YDS.

Although both the Rasch and factor-based versions showed satisfactory reliability and predictive accuracy, it should be emphasized that the short-form YDS from the Rasch approach had a few advantages over the short-form YDS from the factor-based approach. Rasch analysis clarified the response category of the 27-item YDS by reducing the 7-point response scale of the original YDS to five points. The 5-point response category of the short-form YDS had fewer points than that of the short-form Phlegm Pattern Questionnaire, for which a 6-point category was finally determined using Rasch analysis [36]. After modifying the response category, 11 items with DIF regarding sex and age distribution and two items with infit or outfit errors were removed from the 27 items of the YDS, leaving 14 items. Among the fit indices, the outfit index is more sensitive to unexpected responses to items far from the person measure, whereas the infit index is more sensitive to unexpected responses to items close to the person measure [37]. Therefore, the short-form YDS from Rasch analysis may be broadly used in settings such as health checkups and epidemiological surveys, to minimize bias due to sex, aging, or unexpected responses.
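The different sensitivities of the infit and outfit indices can be illustrated with a small Python sketch. For simplicity it assumes the dichotomous Rasch model, whereas the YDS uses polytomous response categories, and it treats person measures and item difficulty as known; the function names and all numerical values are hypothetical. Outfit is the unweighted mean of squared standardized residuals, and infit is the information-weighted mean square (sum of squared residuals divided by the sum of model variances).

```python
import math


def rasch_probability(theta, difficulty):
    """Probability of endorsing an item under the dichotomous Rasch model."""
    return 1.0 / (1.0 + math.exp(-(theta - difficulty)))


def item_fit(thetas, responses, difficulty):
    """Return (infit, outfit) mean squares for a single item.

    thetas: person measures in logits (assumed known here).
    responses: 0/1 responses to the item, one per person.
    """
    squared_residuals = []
    variances = []
    for theta, x in zip(thetas, responses):
        p = rasch_probability(theta, difficulty)
        squared_residuals.append((x - p) ** 2)
        variances.append(p * (1.0 - p))
    # Outfit: each squared residual is standardized by its own variance,
    # so a surprise from a person far from the item weighs heavily.
    outfit = sum(r / w for r, w in zip(squared_residuals, variances)) / len(thetas)
    # Infit: residuals are pooled before dividing by total information,
    # so persons close to the item dominate the index.
    infit = sum(squared_residuals) / sum(variances)
    return infit, outfit


# Hypothetical person measures around an item of difficulty 0 logits.
thetas = [-3.0, -0.5, 0.0, 0.5, 3.0]
far_surprise = [0, 0, 1, 1, 0]   # the most able person unexpectedly scores 0
near_surprise = [0, 0, 1, 0, 1]  # a person near the item unexpectedly scores 0
infit_far, outfit_far = item_fit(thetas, far_surprise, 0.0)
infit_near, outfit_near = item_fit(thetas, near_surprise, 0.0)
```

In this toy example the unexpected response far from the item inflates outfit more than infit, whereas the unexpected response near the item leaves infit larger than outfit.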

In addition to the advantages of the Rasch approach, the advantages of the factor-based approach must also be described. This study showed “weak” positive correlations between the “cough” factor scores of the 16-item YDS and the “neuropsychological,” “respiratory,” and “neurogastrointestinal” factors of the KNQ. This suggests that the etiology of cough in YD may not be closely related to the etiology of dysfunctional breathing. Rather, the “fatigue” factor scores of the YDS had “fair” positive correlations with the scores of the “neuropsychological” and “neurogastrointestinal” factors. This result does not establish a causal relationship between YD-related fatigue and neuropsychological or neurogastrointestinal symptoms [13, 20]. Nevertheless, the correlations between the factor scores of the factor-based YDS and the KNQ suggest that fatigue due to YD may need to be monitored and treated more intensively than other etiological or symptomatic factors of YD in patients with dysfunctional breathing. Therefore, the factor-based YDS may be particularly suited to examining YD’s etiological, regional, and symptomatic characteristics in diverse diseases and syndromes.

This study had some limitations. Item reduction by DIF in Rasch analysis is affected by sample characteristics, including environmental or racial differences. Therefore, item reduction of the YDS by Rasch analysis should be repeated in other samples to examine whether the resulting item set agrees with the 14-item YDS obtained in this study. It should also be mentioned that the dataset used for item reduction of the original YDS was collected from outpatients who visited Korean medical clinics, whereas the dataset used to examine the correlation between YD and dysfunctional breathing was collected from a healthy young population. Therefore, it is necessary to examine the correlation between YD and dysfunctional breathing in a patient group. In the first dataset, there were more women (130 outpatients) than men (39 outpatients), which may have affected the results of the Rasch analysis. Further studies are needed to overcome these limitations regarding sample characteristics, the use of a healthy population, and the imbalance between the numbers of women and men.

5. Conclusions

This study aimed to develop three short-form YDS versions using Rasch, EITC, and factor-based approaches. Two datasets from previous studies (169 outpatients and 237 healthy college students) were analyzed. As a result, two 14-item YDS versions were determined by the Rasch and EITC analyses. A factor-based analysis suggested a 16-item YDS consisting of eight factors. The Rasch analysis suggested a 5-point response category to correct for the disordering of responses. The three item-reduction methods showed moderate predictive accuracy in the ROC curve analysis. However, the sensitivity of the EITC method was lower than that of the other item-reduction methods. Factor scores of the short-form YDS were either weakly or fairly correlated with those of the KNQ. In conclusion, the 14-item Rasch YDS may be utilized to estimate YD’s clinical severity or to screen for YD in health checkups, primary care, or epidemiological surveys. In contrast, the 16-item factor-based YDS may be utilized to examine the relationship between the etiological factors of YD and other diseases.

Data Availability

The data used to support the findings of this study have not been made available because they include personal information.

Conflicts of Interest

The authors declare no conflicts of interest associated with this work.

Supplementary Materials

Supplementary Table S1: eight factors of the 27-item Yin Deficiency Scale. Supplementary Figure S1: ROC curves of the three short-form YDS versions and maximal Youden points. ROC, receiver operating characteristic; YDS, Yin Deficiency Scale; EITC, equidiscriminatory item-total correlation. A: ROC curve of the 14-item YDS using the Rasch approach; B: ROC curve of the 14-item YDS using the EITC; C: ROC curve of the 16-item YDS using factor analysis. In each ROC curve, the red dot corresponds to the point of maximal Youden index.