Abstract

This paper draws on data from a quantitative study of upper secondary students’ general mathematical self-efficacy, anxiety towards mathematics, and their relationship to achievement in mathematics. The main objective of this article is to discuss the type of information that may be lost if potential problems of validity and extreme multicollinearity in exploratory factor analysis were addressed simply by removing variables, without a more profound analysis. We also describe a method that treats the Likert items in the questionnaire as ordinal variables that may represent underlying continuous variables. Our study shows, for example, that removing problematic variables without a profound analysis leads to a loss of significant information about test anxiety. Our qualitative analysis of problematic variables also led to an unexpected finding regarding the relationship between general mathematical self-efficacy and motivational values in mathematics.

1. Introduction

In social cognitive theory, self-efficacy beliefs play an important role in students’ learning, and mathematics education research has repeatedly shown a positive correlation between perceived mathematics self-efficacy and mathematics achievement (e.g., [1–3]).

Most quantitative studies of students’ mathematical self-beliefs are analyzed using exploratory factor analysis (EFA) or similar methods, which are often based on correlations. Despite this, it is difficult to find research articles that include a discussion of the “factorability” of the correlations. For example, extreme multicollinearity is considered problematic and is particularly common in data coming from social science fields such as education and didactics. Yet studies often lack an analysis of the severity of multicollinearity and of the possible consequences of removing variables. Sufficient intercorrelation between variables is essential for conducting EFA. But too highly correlated variables (extreme multicollinearity) are a problem in EFA, since such correlations make it impossible to determine the unique contribution of each variable to a factor [4]. If data show indications of extreme multicollinearity, one approach is to use a quantitative criterion such as r > 0.80 and remove one or both of the highly intercorrelating variables. Another issue relating to the “factorability” of the correlations is variables with many intercorrelations below 0.30. The removal of these types of variables is equally important because they will not be able to cluster in any of the factors [4, 5].
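As an illustration only, the sketch below shows how such a purely quantitative criterion could be applied in R; it is not the authors' code, and the data frame name `items` is a hypothetical placeholder for numerically coded Likert responses.

```r
# Sketch (not the authors' code): a purely quantitative removal criterion.
# caret::findCorrelation flags, for each pair with |r| above the cutoff, the
# variable with the larger mean absolute correlation. `items` is a hypothetical
# data frame of Likert responses coded 1-5.
library(caret)

R <- cor(items, use = "pairwise.complete.obs")

to_drop <- findCorrelation(R, cutoff = 0.80, names = TRUE)
to_drop   # candidates for removal; this paper argues that such a criterion alone is not enough

# Variables with many weak intercorrelations (|r| < 0.30) are also suspect
sort(rowSums(abs(R) < 0.30), decreasing = TRUE)
```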

Research articles on studies using EFA that also discuss the preparation of data for EFA are rare, especially in research areas relating to self-efficacy beliefs. However, some research articles highlight the value of dealing with extreme multicollinearity. For example, Marsh [6] present and discuss a previously conducted study [7] that had examined the relationships between self-efficacy, self-concept, and achievement in mathematics. In that study, it was claimed that self-efficacy predicted performance best. Marsh [6] explain why this interpretation is incorrect by pointing out that self-efficacy and self-concept were very strongly correlated, the standard errors were large, and the confidence intervals were wide. Thus, there were no good grounds for claiming that either self-concept or self-efficacy was the best predictor of achievement. They concluded that researchers should be very cautious in interpreting results when extreme multicollinearity is present in the data.

This paper draws on data from a pilot study investigating general mathematics self-efficacy and anxiety towards mathematics among upper secondary students in a municipal school in Sweden. Data were collected using a questionnaire consisting of a set of Likert scales measuring the abovementioned latent variables. The participants’ responses (N = 79) were analyzed using, among other methods, EFA.

With this paper, we want to describe and discuss the kinds of challenges related to preparing data—especially data of a size that is typical for many doctoral studies in didactics—for a successful EFA, including, among other things, confronting possible problems with multicollinearity and construct validity. More specifically, we want to investigate the advantages of including an in-depth analysis of identified problematic variables in EFA, instead of relying only on a quantitative criterion. Our research questions for this article are as follows:

(1) What kind of information regarding students’ mathematics self-beliefs is obtained with the help of an in-depth analysis of variables with many low intercorrelation coefficients?

(2) What kind of information regarding students’ mathematics self-beliefs is obtained with the help of an in-depth analysis of variables with high intercorrelations?

To answer these questions properly, we first recall the most important prerequisites for using EFA successfully and briefly describe a theoretical framework for studying self-beliefs, to show what kind of challenges a researcher typically meets within such a framework. Thereafter, we present the method for the pilot study in more detail. However, since this paper is not a report of the final findings from the pilot study, we only briefly discuss our final factor model. This discussion and the final factor model are presented in a section following the results of the present study.

1.1. Prerequisites for EFA

In the literature, there are many good introductory texts on how to conduct EFA successfully. Examples that are also suitable for small data sets include [8–10]. Additionally, for issues related to the “factorability” of the correlation matrix, such as the treatment of extreme multicollinearity and redundancy, recommended procedures can be found in, e.g., [5] and in multivariate textbooks such as [11]. However, we found no complete guidelines on how to conduct the pretreatment of multicollinear data, possibly because many of the criteria involved are subjective. For example, one heuristic is to look for correlations greater than 0.80 and remove one or several of the highly correlating variables [4, 5]. However, a problem with that kind of heuristic is that two correlations of 0.60 may have the same effect as one of 0.80 [12]. Additionally, the remaining explanatory variables must always be theoretically grounded, which makes the decision to remove interdependent variables especially important and worth examining thoroughly.

The determinant of the correlation matrix can be used to assess multicollinearity in the data. If the determinant is less than the heuristic threshold of 0.00001, extreme multicollinearity is evident in the data, and one of the highly correlating variables should therefore be removed. Further, Bartlett’s test can be used to test whether the overall correlations are too small. But even if the test is statistically significant, it is still essential to identify and remove variables that have many intercorrelations below 0.30. The motivation for removing those variables is that they may not be focused enough to contribute to the underlying factors [4, 5].
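Both checks can be computed directly from the correlation matrix; a minimal sketch in R is given below, again assuming a hypothetical data frame `items` of numerically coded Likert responses.

```r
# Sketch: factorability checks on the correlation matrix. `items` is a
# hypothetical data frame of Likert responses coded 1-5.
library(psych)

R <- cor(items, use = "pairwise.complete.obs")

det(R)                                # < 0.00001 indicates extreme multicollinearity
cortest.bartlett(R, n = nrow(items))  # significant result: correlations are not all too small
```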

The basis for EFA and similar methods is correlations. One of the most commonly used correlation methods is the Pearson correlation, which also happens to be the one most often used incorrectly in social and behavioral research [13]. Pearson correlations assume that the variables (often Likert items) are continuous (and metric). However, this assumption is debatable, since Likert items are by nature ordered categorical variables, which makes the metric assumption quite controversial.

One way of dealing with the assumption of metric variables is to use the polychoric correlation, which estimates the correlation between two normally distributed continuous underlying variables from two observed ordinal variables [14]. The calculation of polychoric correlations is based on frequency tables, which in some cases include zero-frequency cells. This problem can, however, be dealt with by applying a continuity correction, i.e., the zeros are replaced by a small number such as 0.5. But the presence of many such corrections leads to unstable correlations, a problem that is particularly common in data with small sample sizes [15]. Therefore, the continuity correction should be used with caution with small samples. Likert items with more than six response categories have been shown to yield almost the same correlation coefficients whether Pearson or polychoric correlation is used [16]. Thus, the use of Pearson correlations with ordinal data can be debated (cf. [17]). Still, the methodological issue of conducting EFA on ordinal data with Pearson correlations remains [13].
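A minimal sketch of how such polychoric correlations can be obtained with the psych package is shown below; the data frame name `items` and the use of the default continuity correction are assumptions for illustration.

```r
# Sketch: polychoric correlations for ordered Likert items. `items` is a
# hypothetical data frame of ordinal responses coded 1-5.
library(psych)

pc <- polychoric(items, correct = 0.5)  # correct = continuity correction for zero-frequency cells
pc$rho                                  # estimated correlation matrix of the underlying variables
pc$tau                                  # estimated thresholds
```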

Concerning studies with small sample sizes, for example, in the field of education and didactics, there is an ongoing debate on the sample size necessary to conduct an adequate EFA. A common “rule of thumb” is that a minimum of 10 participants per variable is required [4, 5]. However, EFA can still be possible and reasonable with small sample sizes, because whether a factor model is reliable, i.e., a good recovery of the population solution, mainly depends on the communalities of the variables and the number of variables per factor [18–22].

1.2. Theoretical Framework

Perceived self-efficacy is defined as a personal judgment of “how well one thinks one can execute courses of action required to deal with prospective situations” [23]. Further, mathematics self-efficacy is defined as an individual’s belief about his or her ability in mathematics, i.e., the strength of the confidence in one’s ability to accomplish a mathematical task. Self-efficacy shares both similarities and differences with another psychological construct, self-concept. Self-concept is a perception of oneself, a self-description of the physical, psychological, and social self. Whereas self-efficacy focuses on an individual’s ability to complete a specific task and concerns future-oriented conceptions, self-concept has a stronger relation to the social environment and involves past-oriented judgments of the self [24, 25]. Both constructs are an important part of the overall perception of self [2].

2. Method

The data collection for this study was conducted at a municipal upper secondary school in Sweden with a total of about 400 students. The self-report questionnaire was administered to students online during four regular mathematics lessons. The questionnaire included altogether 29 five-step Likert items assessing a respondent’s general mathematics self-efficacy and anxiety beliefs. The participating students were sixteen or seventeen years old and were studying in a university entrance qualification programme. Participation was voluntary. Seventy-nine first- and second-year students chose to participate in the study: 57 girls and 22 boys.

2.1. Measures

The underlying constructs of students’ general mathematical self-efficacy and anxiety beliefs were assessed using a Swedish adaptation of a previous questionnaire, the Mathematical Self-Efficacy and Anxiety Questionnaire (MSEAQ).

The MSEAQ was developed by May [26]. The pilot version of the original questionnaire consisted of items based on the research literature and adapted items from previous questionnaires designed to measure mathematics self-efficacy and anxiety towards mathematics. The items that ended up in the final version were designed acknowledging the feedback received at a conference and from a group of college students in a series of interviews conducted during the pilot. Finally, May [26] conducted an EFA based on data from 109 college students. Thirteen students were interviewed to help with the interpretation of the retained factors. This led, for instance, to the removal of the item “I believe I can think like a mathematician” from the data before the final EFA, since the interviews revealed that the students had not interpreted this item consistently. In this study, however, we chose to include that item in the questionnaire. The other 28 items in the original MSEAQ comprise thirteen self-efficacy items (MSEAQ-SE) and fifteen anxiety items (MSEAQ-A). The original MSEAQ was considered highly reliable in terms of internal consistency: for MSEAQ-SE, α = 0.90, and for MSEAQ-A, α = 0.91. These reliability estimates are based on the responses of 61 students.

According to May [26], the conducted EFA resulted in a five-factor model: mathematics self-efficacy, grade anxiety, future, in-class anxiety, and assignment factor. The first factor represents a general mathematics self-efficacy (e.g., “I believe I can get an “A” when I am in a mathematics course”), factor 2 relates to anxieties toward grades in mathematics (e.g., “I worry that I will not be able to do well on mathematics tests”), factor 3 relates to anxieties regarding future career and mathematics courses (e.g., “I get nervous when I have to use mathematics outside of school”), factor 4 represents in-class anxiety (e.g., “I am afraid to give an incorrect answer during my mathematics class”), and finally, factor 5 represents abilities and anxieties to complete assignments (e.g., “I believe I can complete all of the assignments in a mathematics course”).

Table 1 presents the translation of the MSEAQ with the original wordings in italics. Each statement was followed by a five-point Likert scale (English wordings in italics): 1 = aldrig (never), 2 = sällan (seldom), 3 = ibland (sometimes), 4 = ofta (often), and 5 = vanligtvis (usually). In the original MSEAQ, a “no response” option was also available. However, in our study, that option was not included because students were assumed to be able to assess statements concerning their own mathematics self-beliefs.

When a questionnaire is translated into another language, there is always a risk of decreased reliability [27], which may even raise a need to change the factor model. This risk can, however, be analyzed, and the possible problems can be addressed, using confirmatory factor analysis (CFA). In the present study, a relevant question is whether the five-factor model of May [26] fits our data. Thus, a CFA was conducted using the lavaan package version 0.6-7 [28] in R version 4.0.2 [29]. In general, Likert items are considered ordered categorical. Consequently, diagonally weighted least squares (DWLS) with polychoric correlations and a Satorra–Bentler correction was used to estimate the model parameters. The five-factor model fit the data reasonably well (see [30] for cutoff values), with a TLI of 0.989 (robust TLI 0.978), although the RMSEA index was only almost within an acceptable range at 0.079 (robust 0.114), 90% CI (0.064, 0.094), and robust CI (0.106, 0.122). If we aimed to report a final factor model, the factor model of May [26] would hence need some improvements to be classified as having a good fit. However, for this study, the above indices are sufficient to show that the items used in this study are adequate. Besides, the translated mathematics self-efficacy subscale (se) and mathematics anxiety subscale (an) were considered highly reliable in terms of internal consistency: for self-efficacy, α = 0.92, and for anxiety, α = 0.91.
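A minimal sketch of this CFA setup in lavaan is shown below; the item-to-factor assignments are illustrative placeholders rather than the actual MSEAQ mapping, and the data frame name `items` is assumed.

```r
# Sketch of the CFA setup: DWLS on polychoric correlations with robust (scaled)
# corrections. The item-to-factor assignments below are hypothetical placeholders.
library(lavaan)

model <- '
  selfeff  =~ se9 + se13 + se19 + se21 + se28
  gradeanx =~ an6 + an8 + an24 + an25
  future   =~ an17 + an22 + an26
  inclass  =~ an2 + an3 + an11
  assign   =~ se1 + se4 + se7
'

fit <- cfa(model,
           data      = items,            # hypothetical data frame of Likert items
           ordered   = colnames(items),  # treat items as ordered categorical (polychoric)
           estimator = "WLSMV")          # DWLS with mean-and-variance adjusted test statistic

fitMeasures(fit, c("tli", "tli.robust", "rmsea", "rmsea.robust",
                   "rmsea.ci.lower", "rmsea.ci.upper"))
```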

2.2. Data Analysis

The data contained 9% missing values, which were assumed to be missing completely at random. One often-used strategy is to remove missing values completely (listwise/pairwise) if no more than 10% of the values are missing. However, this kind of missing data strategy can often lead to a loss of statistical power and biased standard errors, especially if a large part of the sample is removed. Another frequently used method is single imputation (mean or regression), but this method often results in an underestimation of variances and an overestimation of correlations [31]. A recommended strategy for dealing with missing data is, however, multiple imputation, which was applied to these data using mice version 3.8.0 [32] with a multinomial logit model in R version 4.0.2 [29].
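A minimal sketch of this imputation step is given below; the data frame name `items`, the number of imputations, and the random seed are assumptions, and “polyreg” is mice’s polytomous (multinomial logistic) regression method, corresponding to the multinomial logit model mentioned above.

```r
# Sketch of the multiple-imputation step. `items` is a hypothetical data frame
# whose columns are the Likert items coded as factors; m and seed are assumed.
library(mice)

imp <- mice(items,
            m         = 5,          # number of imputed data sets (assumption)
            method    = "polyreg",  # multinomial (polytomous) logistic regression
            seed      = 1,
            printFlag = FALSE)

completed <- complete(imp)          # one completed data set for the subsequent analyses
```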

All statistical calculations were performed using the statistical software R version 4.0.2 [29]. Variables with high and low polychoric intercorrelations were selected. More precisely, polychoric correlations were calculated using the psych package version 2.0.8 [33] in R version 4.0.2 [29], and the correlation matrix was scanned for intercorrelations |r| > 0.60 and |r| < 0.30. These items were then analyzed to identify which of them had validity or multicollinearity issues. Further, a qualitative analysis was conducted to explore what kind of information might be lost if the identified problematic items were removed from the data. More precisely, the qualitative analyses focused on the content of each item and on which aspects of, or relations between, the latent variables would be covered or lost depending on which items were included in or excluded from the factor model.
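The scanning step can be sketched as follows; the object `completed` refers to the hypothetical imputed data set from the previous sketch.

```r
# Sketch of the scanning step: polychoric correlations and the |r| > 0.60 and
# |r| < 0.30 criteria. `completed` is the hypothetical imputed data set above.
library(psych)

# Convert factor-coded items back to numeric scores (1-5) for the correlations
completed_num <- data.frame(lapply(completed, as.numeric))

rho <- polychoric(completed_num)$rho

# Item pairs with very high intercorrelations
idx <- which(abs(rho) > 0.60 & upper.tri(rho), arr.ind = TRUE)
high_pairs <- data.frame(item1 = rownames(rho)[idx[, 1]],
                         item2 = colnames(rho)[idx[, 2]],
                         r     = round(rho[idx], 2))
high_pairs[order(-abs(high_pairs$r)), ]

# Items with many weak intercorrelations
sort(rowSums(abs(rho) < 0.30), decreasing = TRUE)
```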

3. Results and Discussion

In the following sections, we present the results by first discussing the low and high intercorrelations between variables and answering the research questions, and then describing the final factor model.

3.1. Low Intercorrelations between Variables

Table 2 presents the correlation matrix, and it shows that variable an24 has many low intercorrelations (|r| < 0.30) with the other variables. However, variable an24 has a few moderate, statistically significant intercorrelations, e.g., with variable an6 (r = 0.46), variable an8 (r = 0.36), and variable an25 (r = 0.45). When students assessed statement an24, more than 50% of the students reported that they never or seldom worry that they will not be able to get an “A” in the mathematics course. In contrast to an24, more than 50% of the students reported that they often or usually worry about getting a good grade or about doing well on tests. Additionally, concerning variable an25, more than 50% of the students responded sometimes or more frequently to the statement “I worry that I will not be able to learn well in my mathematics course.” This result shows that variable an24 is interpreted differently depending on a student’s ambition regarding the course grade. However, it may also be an indicator of students’ general self-belief in mathematics, since students worry about getting a good grade but not as much about getting an “A,” the highest grade. Consequently, removing variable an24 from EFA without an in-depth analysis would result in a significant loss of information about students’ mathematics self-beliefs.

Table 2 also contains other low or nonexistent correlations worth mentioning. For example, one might expect the correlation between variable an24 and variable se13, “I believe I can get an “A” when I am in a mathematics course” (r = 0.16), to be negative. Variable an24 is related to mathematics anxiety, variable se13 is related to mathematics self-efficacy, and previous research has shown that the constructs are strongly correlated. However, the correlation is not statistically significant in these data. This can be considered a significant finding, which, however, would be lost if variable an24 were automatically removed based only on the |r| < 0.30 criterion.

However, variable an24 is somewhat problematic, because the statement can be interpreted in many different ways: some students do not pay much attention to their grades and are only interested in passing the course, while other students think that they cannot get the highest grade even if they wanted to. Thus, most students worry about getting a grade that is good enough for them, but not necessarily an “A”. Consequently, the interpretation of the results for variable an24 is ambiguous. Further, the nonsignificant correlation between variables an24 and se13 may indicate ambiguity in the students’ assessment of the statement “I worry that I will not be able to get an “A” in my mathematics course”. Therefore, we suggest the removal of variable an24 from EFA, but the observation of students’ vague interpretation of this statement contributes to an understanding of students’ mathematical self-beliefs. This finding would not have been possible without an in-depth analysis of the variables.

3.2. High Intercorrelations between Variables

The determinant of the correlation matrix is much smaller than the critical value of the abovementioned heuristic (0.00001), which indicates severe problems with multicollinearity. Therefore, the correlation matrix was scanned for very high intercorrelations. Several problematic variables were identified, especially an24, se21, an22, an17, an2, an8, and an26.

Table 3 shows the correlation matrix with only the correlations higher than 0.60. Several pairs of variables were identified as having very high intercorrelation coefficients, e.g., an26 and an2 (r = 0.85), an22 and se21 (r = −0.82), an22 and an17 (r = 0.79), and an2 and an8 (r = 0.75). However, making adequate decisions about the removal of variables required an in-depth analysis.

Variable an26 correlates highly with both an2 and an8, and therefore, the straightforward procedure would be to remove all but one of the three variables. But since all these variables are associated with a specific dimension of mathematics anxiety, namely test anxiety, removing two of them might result in too scarce a description of mathematics anxiety. Concerning variables an2 and an8, the students seem to have interpreted them in the same way, as asking for the same thing, which is an indication of redundant variables. When students are tense when preparing for mathematics tests, it is plausible that they are also worried about taking the tests. Besides, the high intercorrelation between an2 and an26 shows that if a student is nervous or tense when preparing for tests, he or she is probably also worried about taking the test. These issues in our data indicate a problem with multicollinearity. Although we risk too scarce a description of test anxiety, we suggest removing two of the three variables (an2, an8, and an26), namely an2 and an8.

Our data show that students who are tense when preparing for a mathematics test are more likely to be worried about taking the test, which shows an association between the mathematics anxiety perceived during preparation for a test and that perceived during the actual test. Besides, this might give some clues for future studies about the sources of students’ test anxiety. Consequently, removing highly correlating variables without a profound analysis would result in a loss of information about students’ test anxiety.

Further, concerning variable an22, more than half of the students responded that they sometimes or more often worry that they will not be able to understand mathematics. In light of this, an interpretation of the collinearity between variables an22 and an17 (r = 0.79) might be that if a student worries about being able to understand mathematics, the student is probably also worried about whether he or she knows enough mathematics to succeed in future courses. This observation might indicate that variables an22 and an17 are redundant. Moreover, students could report being less worried about doing well in future courses if they are studying their last mandatory course in mathematics and have decided not to take any further courses. This interpretation can be an indication of the low validity of variable an17. Therefore, together with the fact that an17 correlates highly with an22, we suggest the removal of variable an17 from EFA.

A similar validity problem concerns variable se21, “I feel that I will be able to do well in future mathematics courses,” which also has a high intercorrelation coefficient with variable se19 (r = 0.76). Less than half of the students reported that they sometimes or more often feel that they can do well in future mathematics courses. Besides, variables se21 and an22 have a high intercorrelation coefficient (r = −0.82), which also supports the interpretation of variables an17 and an22 as redundant. Hence, we suggest that variable se21 should also be removed from EFA.

However, even when a correlation is high, we sometimes conclude that no variables should be removed. Consider, for instance, se28 and se9, whose correlation is high and positive (r = 0.75). Most of the students reported that they never or seldom believe they can think like a mathematician, and that they never or seldom consider themselves a person who is good at mathematics. These two items represent different dimensions of an underlying construct, because believing oneself to be good at mathematics does not automatically mean that one also thinks one can think like a mathematician. Variables se9 and se28 are both related to general mathematics self-efficacy and measure different dimensions of the underlying construct. Although the intercorrelation coefficient is high, we conclude that the variables should not be removed from EFA. This high correlation rather indicates that students believe that a person who is good at mathematics is a person who can also think like a mathematician. Consequently, students possibly have a preconception about what it takes to be good at mathematics; this may be an indication of static beliefs about intelligence [34] and could thus be a symptom of helplessness towards getting a high grade in the mathematics course.

3.3. The Final Factor Model

For the reader’s interest, we also report briefly which factor model our EFA resulted in, and how this model was found, given the data collected for the pilot study.

First, the identified problematic variables an24, an17, se21, an2, and an8 were removed from EFA. After that, the Kaiser–Meyer–Olkin (KMO) measure verified the sampling adequacy for factor analysis, KMO = 0.86, which is “great” according to the literature. All the KMO values for the individual items were over 0.74, well above the acceptable minimum of 0.50 [35, 36]. Further, Bartlett’s test of sphericity showed that the correlations were large enough for EFA. For this study, parallel analysis and a scree plot were used for factor retention.
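These adequacy checks and the retention step can be sketched as follows, with `reduced` as a hypothetical name for the numeric-coded data set after removing the five problematic variables.

```r
# Sketch of the adequacy checks and factor retention. `reduced` is a
# hypothetical data frame without an24, an17, se21, an2, and an8.
library(psych)

rho <- polychoric(reduced)$rho
n   <- nrow(reduced)

KMO(rho)                      # overall measure of sampling adequacy and per-item values
cortest.bartlett(rho, n = n)  # Bartlett's test of sphericity

# Parallel analysis (with scree plot) on the polychoric correlation matrix
fa.parallel(rho, n.obs = n, fm = "minres", fa = "fa")
```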

The scree plot (see Figure 1) shows some ambiguity in how many factors to retain; however, based on the parallel analysis (see Table 4), four factors were retained, since at the fourth simulation the eigenvalue calculated from the actual data falls below the eigenvalue calculated from the simulated data. Further, since the factors are very likely to be correlated because of the strong relationship between mathematics self-efficacy and anxiety towards mathematics, an oblique rotation was used.

Furthermore, during the factor analysis, two variables, an11 and an3, were excluded because of low communality (0.40) (cf. [8]). The cutoff for a significant loading was 0.55 (cf. [4, 10]). The final factor model presented in Table 5 was produced using the minimum residual method with oblique rotation (oblimin). The corresponding structure matrix is presented in Table 6. The four retained factors in the final factor model explain 65% of the variance (see Table 5).
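A minimal sketch of this extraction step is shown below, reusing the hypothetical `rho` and `n` from the previous sketch; the oblimin rotation requires the GPArotation package.

```r
# Sketch of the final extraction: minimum residual factoring, oblimin rotation,
# four factors, on the polychoric correlation matrix `rho` (hypothetical object).
library(psych)
library(GPArotation)

efa <- fa(rho, nfactors = 4, n.obs = n, fm = "minres", rotate = "oblimin")

print(efa$loadings, cutoff = 0.55)  # pattern matrix, suppressing loadings below the cutoff
efa$communality                     # communalities (low-communality items were dropped)
efa$Vaccounted                      # proportion of variance explained per factor
```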

A measure of factor model fit is one minus the ratio of the sum of the squared residuals to the sum of the squared correlations, which for this factor model is 0.99 (values over 0.95 indicate a good fit [4]). Besides, fewer than 50% of the factor residuals have absolute values greater than 0.05, and the factor residuals are approximately normally distributed (see Figure 2). All these measures indicate a good factor model fit [4].

The items that load on the same factor suggest that factor 1 represents general mathematics self-efficacy, factor 2 abilities and anxieties related to completing assignments, factor 3 anxiety towards mathematics in classroom settings, and factor 4 anxiety towards evaluations, which we label test anxiety. The first two subscales had high reliabilities, with α = 0.89 and α = 0.84, and the in-class anxiety and test anxiety subscales had α = 0.77 and α = 0.74, respectively.
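The subscale reliabilities can be computed as sketched below; the character vectors naming the items of each retained factor are hypothetical placeholders, and the item columns of `reduced` are assumed to be numeric (1–5).

```r
# Sketch of the reliability check for the four retained subscales. The item-name
# vectors are hypothetical placeholders for the items loading on each factor.
library(psych)

se_items      <- c("se9", "se13", "se28")   # placeholder item lists
assign_items  <- c("se1", "se4", "se7")
inclass_items <- c("an5", "an10", "an12")
test_items    <- c("an6", "an25", "an26")

alpha(reduced[, se_items])$total$raw_alpha       # general self-efficacy subscale
alpha(reduced[, assign_items])$total$raw_alpha   # assignment subscale
alpha(reduced[, inclass_items])$total$raw_alpha  # in-class anxiety subscale
alpha(reduced[, test_items])$total$raw_alpha     # test anxiety subscale
```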

4. Conclusions

The qualitative criteria that we used above for removing items from EFA were based on indications of redundancy or low validity. Redundancy concerns variables that are interpreted in the same way by the respondents, and a variable with low validity does not seem to measure what it was intended to measure. For example, variable an24, “I worry that I will not be able to get an “A” in my mathematics course,” was removed from EFA because students might assess the item differently depending on which grade they have set as a goal for themselves. Hence, we concluded that the item has low validity. Further, variable an24 has many correlation coefficients below 0.30. Therefore, if we had instead decided to keep variable an24, it would probably have had trouble loading onto any of the latent factors.

Our findings show that significant information can be reliably measured with a smaller number of items. For example, although two problematic variables related to test anxiety were removed from EFA, test anxiety appeared as a meaningful factor with some unexpected but interpretable loadings in the final factor model. The item “I get nervous when taking a mathematics test” has the strongest loading on the test anxiety factor, but the item “I worry that I will not be able to use mathematics in my future career when needed” also loads on the same factor and concerns situations where students’ mathematics knowledge is evaluated. Also, the item “working on mathematics homework is stressful for me” loaded on this factor, which may be interpreted as indicating that students mainly work on mathematics homework when preparing for an exam.

Although in this study we have presented an adequate EFA based on a small sample size, we acknowledge that large sample sizes most often contribute to better generalizability [8]. Concerning the present data, a more relevant question is what makes a sample size adequate for analysis. The answer seems to depend on how many variables represent each factor and on the strength of the variable loadings and communalities. According to the literature we have referred to, data must be strong, i.e., every factor has to be related to at least four items with high loadings and high communalities, and there should not be any cross-loadings [8, 22]. All extracted factors in the final factor model above have items with high loadings. However, only the first two factors have at least four items with high loadings and communalities. Although there are some nonsignificant “cross-loadings” in the factor model, the data can be considered strong in light of typically acceptable factor models.

Multicollinearity is common in data collected for studies in the social sciences. In most cases, it is not taken as a big issue; researchers simply have to cope with some degree of multicollinearity. One way of decreasing the degree of multicollinearity is to increase the sample size, which in most cases decreases the standard errors. Still, to increase the validity of discovered latent variables, it is important to assess the severity of the multicollinearity and to make a profound analysis of problematic variables. Regardless of sample size, extreme multicollinearity is problematic and could in the worst case result in an ambiguous factor model (cf. [6]).

A thorough quantitative study often includes the use of EFA or another method suitable for constructing sum variables that represent the latent variables. However, before conducting an EFA, a pretreatment of the data is required to make them more adequate for EFA. That includes a profound analysis of the correlations between variables and the possible removal of problematic variables. This can be a complex process and, in many cases, requires decisions based on subjective criteria, for example: What is an adequate sample size for EFA? What quantitative criteria are suitable for identifying variables that are the source of extreme multicollinearity? We showed in this study that there are no general objective answers to these questions, and we emphasize the importance of analyzing data qualitatively when making this type of subjective decision.

Data Availability

The anonymised version of the data used in this study is available upon request.

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.