Abstract

Saving is very important for the life of a senior citizen. Artificial neural network (ANN) and multiple linear regression (MLR) analyses have been successfully used to predict and analyze factors affecting the savings of people in several regions of the world. Many studies concluded that ANN is more efficient than MLR. However, some studies concluded that MLR is more efficient. To investigate this issue further, this study directly compared the efficiencies of unoptimized ANN and MLR in predicting and analyzing factors affecting the savings of people in the central region of Thailand in 2019, based on secondary data from a household socioeconomic survey, i.e., the Household Socio-Economic Survey (Household Income Survey) of the National Statistical Office. The data were collected from January 2019 to December 2019 from questionnaires distributed to samples of households. The savings of people in the 25 provinces of the central region of Thailand were investigated with MLR and unoptimized ANN. Their prediction efficiencies were compared in terms of root mean square error (RMSE), mean absolute error (MAE), coefficient of determination (R2), and processing time. The results showed that for all categories of savings (savings of low-, middle-, and high-income households), MLR was faster in processing time. It also provided a lower RMSE and a higher R2 than the unoptimized ANN. Nevertheless, unoptimized ANN provided a lower MAE than MLR for the savings of low- and high-income household data. The most important factor affecting the savings of low-, middle-, and high-income households was the factor of deposit interest, bond, share dividends, and other types of investment.

1. Introduction

The Royal Institute dictionary defines savings as conserving, economizing, and preserving [1]. Saving means the careful use of property and money, which are two major factors of living. Saving money is important because it ensures security for the savers and contributes to economic stability. Savings are important to the economy because they support the wellbeing of people, families, and communities. Management of finance and savings is essential in building stability and security in the livelihood of people, families, communities, and nations. In addition to the economic benefits to the country and the livelihood of individuals, especially the elderly, savings also have social, cultural, and educational benefits. In education on savings, member groups must learn the principles and practices as well as the potential outcomes of savings in order to be motivated to save. The saving process is itself educational: people train themselves to know how to save [1].

Saving is important for future needs because one may never know for certain whether the source of income would be as secure as it has been. Regular income may disappear, but living expenses still remain. The main objective of saving campaigns is to make people aware of the importance of saving and learn how to save money constantly to maintain life stability [2].

After Thailand’s economic crisis in 1997, the situation of Thai people’s savings slowly improved. Thailand was able to gradually repay the loan to the International Monetary Fund and developed from just surviving to becoming sustainable [3].

Savings are closely related to the actual earned income and household consumption. Such income is considered household income and can be used for actual expenses. Households allocate this income for consumption. The remaining money is then kept as savings. Savings are closely related to the theory of consumption. It is believed that households with certain consumption expenditures and income levels find it difficult to reduce their consumption expenditures when their income levels are reduced. The amount of consumption expenditure over a period of time was based on the past, present, and future income expectations over a lifespan [4].

Several statistical procedures exist for finding the relationship between the factors affecting the savings of people in a region and the actual savings of those people. One of the popular methods is MLR analysis. There are also several machine learning procedures for predicting a quantitative variable, of which the ANN method is a popular one [5]. MLR and optimized ANN have been used to predict outcomes in several fields of application, and many studies have reported that ANN performed better than MLR. Some studies, however, reported the contrary. Since an optimized ANN has to undergo many processing rounds, which takes time and effort, we decided to compare unoptimized ANN and MLR to see which one would be suitable for rapid screening purposes, i.e., to see the power of raw ANN and MLR.

Therefore, our research objectives were to compare the prediction efficiencies of MLR and unoptimized ANN and to analyze factors affecting the savings of people in the central region of Thailand. The comparison was in terms of RMSE, MAE, R2, and processing time. The analysis used secondary data from the National Statistical Office on the Household Socio-Economic Survey 2019. The survey was conducted every two years. Data were collected from January 2019 to December 2019 with questionnaires and interviews of sampled households.

Our contributions to the field of statistics and economics include the following points: (1) MLR analysis was found to be superior (faster preparation and processing time, lower RMSE, and higher R2) to the unoptimized ANN method in processing data on the savings of people in the central region of Thailand and (2) deposit interest, bond, share dividends, and other types of investments were the major factors affecting the savings of people in the central region of Thailand in 2019.

2. Literature Review

In 2019, factors affecting the savings behavior of people in Songkhla, Thailand, were studied. The aim of that study was to use MLR to determine factors affecting the savings behavior of people. The research results demonstrated that the factors affecting savings were personal factors, including gender and age. In addition, macroeconomic factors in monetary policy significantly affected the savings behavior of the people [6]. In this same year, economic factors affecting the household savings of people in Thailand were studied based on MLR analysis. The objective was to determine factors affecting household savings. The study showed that the economic factors affecting the household sector savings included inflation, long-term stock funds, and national saving fund [7].

Two years later in 2021, a study was conducted to identify and estimate the main determinants of household saving behavior in rural Ethiopia. The authors analyzed the data using MLR. Their findings suggested that household disposable income, education of household head, number of income earners in the family, and livestock ownership had a statistically significant positive effect on household savings. Similarly, family size, participation in off-farm activities, and distance from the data collection center had a statistically significant negative effect on household savings [8]. Another study based on MLR analysis aimed to investigate factors affecting Thailand’s household savings and saving behavior, using data collected from the Household Socio-Economic Survey, 2016, of Thailand’s National Statistical Office. The results indicated that household savings were affected by regional factors. Positive factors affecting the cumulative savings were the age of the head of the family, average monthly income, digital expenses, records of income and expenses, and retirement saving plans. Average monthly debt payment and number of family members negatively impacted household savings [9]. In the following year, another study used MLR to investigate the determinants of household savings in a model. Fixed-effect least squares and two-stage least squares estimation procedures in MLR were applied to data from 14 countries spanning the period 2000–2018. The analysis presented some evidence that social security affected the savings significantly but not the interest rate or old-age dependency ratio [10].

In the current year, 2023, household savings and negative interest rates in many countries were investigated. The objective of that study was to analyze the determinants of household savings in a model. An MLR’s fixed-effect least squares estimation procedure was used to analyze a set of data from 20 countries in the period of 2000–2020. The analysis provided evidence that the negative interest rates led to a statistically and economically significant increase in savings. The positive effect of income uncertainty and lagged saving rates was smaller with negative interest rates [11].

ANN has been used in many fields of application. Examples of recent studies are prediction of the outbreak of coronavirus disease (2019), prediction of air quality, and prediction of air pollution. An optimized ANN model was developed to predict confirmed cases and deaths from COVID-19. The best prediction performance, in terms of RMSE, R, and MAE, was realized using past 7 days’ cases as input variables in the training and testing dataset. The ANN model would be suitable for predicting confirmed cases and deaths of COVID-19 in the time afterward. The predicted confirmed cases and deaths of COVID-19 were very close to the actual confirmed cases and deaths [12]. Another example is the prediction of air quality. ANN was a significant method for protecting public health because it could provide early warning of harmful air pollutants. The objective of that example was to use ANN and wavelet ANN (WANN) to identify the linear and nonlinear associations between the air pollution index (API) and meteorological variables. The research results demonstrated that WANN (R = 0.8846 for Xi’an and R = 0.8906 for Lanzhou) performed better than the ANN (R = 0.8037 for Xi’an and R = 0.7742 for Lanzhou) during the forecasting stage. WANN was effective in short-term API forecasting because it could recognize historic patterns and thereby identify nonlinear relationships between the input and output variables [13].

The final example is the prediction of particulate matter air pollution (PM2.5 and PM10). Fine particulate matter (PM2.5) affects climate change and human health. A study was conducted to use an optimized ANN to predict monthly PM2.5 concentration in Liaocheng, China, from 2014 to 2021. The ANN employed in the study contained a hidden layer with 6 neurons, an input layer with 11 parameters, and an output layer. The ANN achieved a high prediction performance in terms of R (0.9570), MAE (4.6 µg/m3), and RMSE (6.6 µg/m3) [14]. In the year after, two similar studies were conducted. ANN and WANN were used to predict daily PM2.5 concentration in Shanghai, China. The results show that the optimal input elements for daily PM2.5 concentration-predicting models were the PM2.5 from the previous 3 days and fourteen meteorological elements. It was emphasized that the WANN model obtained optimal prediction results in terms of R (0.9316) [15]. Finally, accurate prediction of air pollution is a difficult problem to be solved in the field of atmospheric research. ANN was exploited to predict hourly PM2.5 and PM10 concentrations in Chongqing, China. Thirteen kinds of training functions to obtain the optimal function were compared. The ANN model exhibited good performance in predicting hourly PM2.5 and PM10 concentrations. The forecast results would support fine management and help improve the ability to prevent and control air pollution in advance, accurately and scientifically [16].

Regarding papers comparing MLR and optimized ANN, a paper in 2020 reported that MLR and ANN models were applied to public spending execution in Peru. The aim of that research was to use MLR and a multilayer perceptron ANN model to determine the influence of spending execution on the regional government’s public budget. The coefficient of determination R2 was 95.9% for the MLR model, which was slightly better than the 95.3% for the ANN model. The ANN and MLR models thus obtained very similar results, and both fitted the data well [17]. Another study, conducted in India, compared MLR analysis and the backpropagation ANN method for predicting the financial strength of banks. The main objective of that study was to forecast the performance of Indian banks. The two methods were compared in terms of their prediction accuracy. Financial data spanning 10 years from 2010 to 2019 were collected from 19 Indian public sector banks. The data consisted of 17 financial ratios collected from financial statements and other publications of the sampled organizations. Ratios that were significant determinants of the Capital Adequacy Ratio (CAR) were identified by MLR; these ratios were then used as the input for the ANN model. MLR analysis identified 7 financial ratios that had a positive relationship with the dependent variable (CAR). These 7 independent variables were used to predict the financial strength (CAR) of the banks. Then, a feed-forward backpropagation ANN model was developed with these 7 independent variables to predict the CAR. Finally, the performances of these two methods were compared in terms of MSE, RMSE, and MAPE. The ANN model improved on the MLR model by 55.67% in MSE, 33.425% in RMSE, and 99.32% in MAPE [18].

Another study in 2021 compared MLR and ANN in bank performance prediction: a study of 11 Botswanan banks. Return on assets was used as the dependent variable, while management quality, credit risk, liquidity, financial leverage, and capital adequacy were the independent variables. When using MLR, the cost-to-income ratio and the loan loss provision to total loan ratio were found to be the two most significant drivers of bank performance. ANN achieved an R2 value of 84.37% which was significantly higher than the R2 value of 70.00% for MLR. ANN also showed a better predictive ability in terms of MAE and MSE [19].

In the last year, 2022, a study compared MLR and ANN in predicting the stock price of BYD, a leading company in the new energy industry. That paper concluded that the backpropagation ANN had a better explanation ability than the MLR model [20]. Finally, a study investigating factors affecting the savings of Gen Y in Bangkok, Thailand, was conducted. One of the study’s objectives was to investigate the factors affecting the savings of Gen Y in Bangkok, using quantitative research and survey research methods with a questionnaire given to a sample of 400 people selected by simple random sampling. The inferential statistic used was MLR. The results showed that Gen Y saved 4,378.03 baht per month, with the major purpose of saving being to spend in an emergency situation [21].

Most of the abovementioned works that compared MLR and optimized ANN concluded that ANN was better than MLR in terms of error measures such as MAE, MSE, and RMSE, although one study found MLR to be slightly better in terms of R2 [17]. However, for rapid deployment, ANN might not be better since it needed more time to run through many processing rounds to optimize its parameters. Therefore, we wished to compare these two methods in terms of processing time in addition to various prediction error measures.

3. Materials and Methods

3.1. Data Collection

This research is based on secondary data from the Household Socio-Economic Survey 2019 (Household Income Survey), which is carried out every 2 years by the National Statistical Office, Government Complex Commemorating His Majesty the King’s 80th Birthday Anniversary, Ratthaprasasanabhakdi Building, 2nd Floor, Chaeng Watthana Road, Lak Si, Bangkok, Thailand. We used these secondary data because the research funding that we were able to procure was quite limited, so we could not collect primary data nationwide in Thailand ourselves. Therefore, the research team had to rely on the publicly available secondary data from the National Statistical Office. The previous research under this project was conducted according to current research standards, which had been updated every 2 years.

Data were collected from January 2019 to December 2019 from a questionnaire distributed to a sample of households. The National Statistical Office had already collected data every two years to track the trend of savings in the country, whether they were increasing or decreasing. The data collected before that were from January 2017 to December 2017, and the data collected after that were from January 2021 to December 2021, which were similar in nature.

The questionnaire had 40 questions of many types, including quantitative and qualitative questions. Initially, more questions on topics that were expected to affect savings were considered for analysis. However, some questions, such as those about occupation, were answered in a variety of ways that could not be categorized into a small number of groups. Some questions, such as the number of rooms in the household, were not answered completely, causing a large number of missing values. The research team, therefore, ignored the responses to those questions. The selected questions were suitable for limiting bias because they do not have much influence on the respondents, such as the number of people gaining income from work, the number of household members, and the number of household members who were not working. The other questions were similar in nature.

The data were collected from January 2019 to December 2019 with a questionnaire used to interview people in the 25 provinces of the central region of Thailand, which were selected as sample units. The National Statistical Office used stratified two-stage sampling. The provinces in the central region were divided into 25 strata, each of which was divided into 2 substrata according to the characteristics of municipal and nonmunicipal areas. In each substratum, the enumeration area (EA) was the sampling unit for the first stage, and the private household was the sampling unit for the second stage. In this study, the authors were interested only in the central region, whose population comprised 1,902 EA, divided into 900 municipal EA and 1,002 nonmunicipal EA.

The samples were 1,256 EA in the central region, where 616 EA were municipal areas, and 640 EA were nonmunicipal areas. The total number of private household samples was 15,640, including 9,240 municipal and 6,400 nonmunicipal households. The researchers selected only 11,586 sample households with complete information, divided into 2,191 low-income households savings (Y1: the first quintile), 6,994 middle-income households savings (Y2: the second quintile to the fourth quintile), and 2,401 high-income households savings (Y3: the fifth quintile).

3.2. Variables

The independent variables were selected based on a review of previous literature on variables that affected household savings. Most of the variables in the questionnaire were selected for analysis, except for some variables, such as occupation. This is because there were more than 10 occupations in Thailand. If occupation was converted to dummy variables, it would have many levels, making it difficult to analyze and interpret the results. Therefore, the research team excluded the occupation variable. A limitation of multiple regression analysis is that the number of independent variables, especially qualitative variables, should not be too large, as this can make the interpretation difficult. The research team therefore selected three important qualitative independent variables, excluding occupation. In addition, some of the independent variables, such as the number of rooms in the household, were not fully answered by respondents, resulting in a large number of missing values. In conclusion, 26 variables were chosen and analyzed based on significance and practicality in order to gain the most accurate information from respondents.

The dependent variables in this research consist of three levels: savings of low-income households (the first quintile) in the range of 638–11,037 baht (Y1); savings of middle-income households (the second quintile to the fourth quintile) in the range of 11,043–37,767 baht (Y2); and savings of high-income households (the fifth quintile) in the range of 37,771–683,347 baht (Y3). The twenty-six independent variables comprised twenty-three quantitative variables and three qualitative variables, as shown in Table 1.

3.3. Data Partition and Data Analysis

The authors used the SPSS Statistical Package to divide the dataset into low-, middle-, and high-income levels. The complete dataset consisted of 11,586 households, divided into 2,191 low-income households, 6,994 middle-income households, and 2,401 high-income households. For low-income households, 1,557 households (70%) were selected as the training dataset for modeling and 634 households (30%) were used as the testing dataset for the prediction. Similarly, for middle-income households, 4,887 households (70%) were selected as the training dataset and 2,107 households (30%) were used as the testing dataset. For high-income households, 1,695 households (70%) were selected as the training dataset and 706 households (30%) were used as the testing dataset. The multiple linear regression analysis was carried out using the SPSS Statistical Package, and the unoptimized artificial neural network method was carried out using the WEKA package. The authors had intended to use the WEKA package for both methods, but WEKA was not able to verify some assumptions, such as multicollinearity, autocorrelation, and homoscedasticity. Therefore, we chose to use SPSS instead of WEKA to examine the assumptions and analyze the results of multiple linear regression analysis.
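The 70/30 partition described above can be illustrated with a minimal sketch. It is not the SPSS procedure used in the study: the function name `split_train_test`, the fixed seed, and the use of row indices in place of household records are assumptions made for illustration. An exact 70% cut also gives counts slightly different from those reported (for example, 1,557 and 634 for low-income households), so the sketch does not reproduce the study's partition.

```python
import random

def split_train_test(records, train_fraction=0.7, seed=0):
    """Shuffle a copy of `records` and cut it into training and testing parts."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)      # reproducible shuffle
    n_train = round(train_fraction * len(shuffled))
    return shuffled[:n_train], shuffled[n_train:]

# Household counts per income group, as reported in the text.
group_sizes = {"low": 2191, "middle": 6994, "high": 2401}

for name, size in group_sizes.items():
    # Row indices stand in for the household records (illustrative only).
    train, test = split_train_test(range(size))
    print(name, len(train), len(test))
```

Each income group is split separately, so every group keeps roughly the same training-to-testing ratio.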

3.3.1. Multiple Linear Regression Analysis

MLR is a relationship analysis of multiple variables. It consists of a quantitative dependent variable and k independent variables. The independent variables may all be quantitative variables or may be both quantitative and qualitative variables. The relationship between the dependent variable and the independent variables is linear [22].

Operationally, MLR is used here as a method for selecting the independent variables that significantly affect a dependent variable, which yields a smaller set of independent variables. A stepwise regression was used to select the independent variables because it is a popular method. The capability of MLR featured in this research is that it can find the factors that affect people’s savings in the central region of Thailand. The qualitative independent variables had to be converted to dummy variables before the data were imported into the software package. The quantitative independent variables and the dependent variable could be imported directly into it.
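As a minimal sketch of the dummy-variable conversion mentioned above, the following code turns a qualitative variable with three levels into two 0/1 columns by dropping a reference level. The level labels, the choice of `"primary"` as the reference, and the function name `dummy_code` are hypothetical; the study's actual coding is the one given in its tables.

```python
def dummy_code(values, reference):
    """Return the dummy column names and one 0/1 row per observation.

    The `reference` level is dropped, so a variable with m levels
    produces m - 1 dummy columns.
    """
    levels = sorted(set(values) - {reference})
    rows = [[1 if value == level else 0 for level in levels] for value in values]
    return levels, rows

# Hypothetical education labels with three levels (two dummy columns).
education = ["primary", "secondary", "higher", "secondary", "primary"]
levels, rows = dummy_code(education, reference="primary")

print(levels)
for row in rows:
    print(row)
```

A household at the reference level has zeros in every dummy column, which is why a variable with many levels, such as occupation, would add many columns to the model.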

A flowchart of the overall MLR investigation and analysis steps is shown in Figure 1. The first step in the flowchart is the assumption checking before constructing the regression. The things that needed to be checked were the correlation among independent variables and the correlation between an independent variable and a dependent variable [22]. The dependent variable was checked for normal distribution using the Lilliefors significance correction [22]. In the case that the dependent variable did not have a normal distribution, it could be corrected by transforming the data using the Box–Cox transformation method [23]. The independent variables were checked for multicollinearity using the variance inflation factor (VIF). If VIF was greater than 10, then the independent variables in the model had multicollinearity [24]. In the case that the independent variables had high multicollinearity, this could be solved by a factor analysis method [22].
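The VIF check can be sketched as follows, assuming NumPy is available. For each independent variable, the sketch regresses that variable on the remaining ones and computes 1 / (1 - R2). The synthetic data and the function name `variance_inflation_factors` are assumptions for illustration; the study itself obtained VIF values from SPSS.

```python
import numpy as np

def variance_inflation_factors(X):
    """VIF_j = 1 / (1 - R_j^2), with R_j^2 from regressing column j on the others."""
    X = np.asarray(X, dtype=float)
    n, k = X.shape
    vifs = []
    for j in range(k):
        target = X[:, j]
        # Design matrix: intercept plus all other independent variables.
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(others, target, rcond=None)
        residuals = target - others @ coef
        sse = residuals @ residuals
        sst = ((target - target.mean()) ** 2).sum()
        r2 = 1.0 - sse / sst
        vifs.append(1.0 / (1.0 - r2))
    return np.array(vifs)

# Synthetic example: x3 is nearly a copy of x1, x2 is unrelated.
rng = np.random.default_rng(0)
x1 = rng.normal(size=500)
x2 = rng.normal(size=500)
x3 = x1 + 0.1 * rng.normal(size=500)

vifs = variance_inflation_factors(np.column_stack([x1, x2, x3]))
print(vifs > 10)   # flags variables exceeding the threshold of 10
```

In this synthetic example, the two nearly identical variables exceed the threshold of 10, while the unrelated variable stays close to 1.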

Second, data were used to construct a regression equation with a stepwise method [25]. Third, prediction efficiency was compared in terms of root mean square error (RMSE), mean absolute error (MAE), coefficient of determination (R2), and processing time.

Finally, the assumptions were checked after constructing the regression. The residuals were checked for normal distribution using the Lilliefors significance correction [22]. If the residuals did not have a normal distribution, then the data could be transformed by the Box–Cox transformation method [26]. The residuals were checked for independence using the Durbin–Watson method. If the Durbin–Watson statistic was in the range of 1.5–2.5, the residuals were considered independent [22]. If the residuals were not independent, this could be corrected by adjusting the regression model so that there would be no relationship between successive residuals [27]. The residuals were checked for equal variance using the graph between the standardized residual and predicted value [28]. In the case that the residuals did not have equal variance, this could be corrected by a weighted least squares (WLS) method [22].
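The independence check can be illustrated with a short sketch of the Durbin–Watson statistic, which is the sum of squared differences between successive residuals divided by the sum of squared residuals. The 1.5–2.5 range is the one stated above; the residual sequences and function names are made up for illustration.

```python
def durbin_watson(residuals):
    """Durbin-Watson statistic: sum((e_t - e_{t-1})^2) / sum(e_t^2)."""
    numerator = sum(
        (residuals[t] - residuals[t - 1]) ** 2 for t in range(1, len(residuals))
    )
    denominator = sum(e ** 2 for e in residuals)
    return numerator / denominator

def residuals_look_independent(residuals, low=1.5, high=2.5):
    """Apply the 1.5-2.5 rule of thumb described in the text."""
    return low <= durbin_watson(residuals) <= high

trending = [1.0, 2.0, 3.0, 4.0]        # successive residuals move together
alternating = [1.0, -1.0, 1.0, -1.0]   # successive residuals flip sign
mixed = [1.0, 0.0, -1.0, 0.0, 1.0, 0.0, -1.0, 0.0]

for name, res in [("trending", trending), ("alternating", alternating), ("mixed", mixed)]:
    print(name, durbin_watson(res), residuals_look_independent(res))
```

Values well below 2 point to positive autocorrelation and values well above 2 to negative autocorrelation, which is why the accepted range is centered on 2.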

3.3.2. Artificial Neural Network Method

ANN is a computing technique that simulates the human brain, with the ability to learn, recognize, and classify things. ANN’s processing mimics the function of the brain and transmits information among neurons, with many neuron connections and parallel processing. A backpropagation algorithm is used to train a multilayer perceptron ANN. The backpropagation algorithm is popular because it can solve both linear and nonlinear problems [5]. The ANN used in this study worked with a quantitative dependent variable. Many studies have compared ANN to multiple linear regression analysis (MLR) and found that ANN produced good results.

Operationally, ANN is a method of multiplying the input data by the weights of each input data path. The results from the nodes in each input data path will be passed on to a neuron to combine the data values using a combination function. The neuron will then send the output data to an activation function to adjust the values to the desired range. After that, the output data will be sent out as the input data for neurons connected in the next layer of the neural network. The output node contains the processed output. The importance of ANN in this research is that it had the capability to find factors that affect people’s savings in the central region of Thailand. Both the qualitative and quantitative independent variables and the quantitative dependent variable could be imported directly into the software package.

MLR and ANN are generally used as predictive techniques in many fields of application, such as engineering, science, economics, education, social science, and medicine. Specific examples include predictions of the following: the outbreak of coronavirus disease 2019, bank performance, financial strength of banks, stock price, air quality, air pollution, public spending execution, savings of Gen Y, and household savings behavior. The household savings behavior and factors affecting savings observed in this study would be cross-checked with those observed in several previous studies to find out whether they support this study or not.

A flowchart of the overall ANN steps is shown in Figure 2. First, the structure of ANN was established: the number of hidden layers, the number of nodes in the input layer, hidden layers, output layer, and the type of activation function [29]. Our unoptimized ANN had 27 input nodes according to the number of independent variables, 23 hidden nodes, and 1 output node for predicting the savings of low-income households. It had 27 input nodes, 25 hidden nodes, and 1 output node for predicting the savings of middle-income households and high-income households. Learning rate, momentum, and number of training iterations were set to 0.2, 0.8, and 10,000, respectively [30].

Second, the data were separated into 2 sets. The first dataset was the training dataset, which was 70 percent of the total dataset. It was used to train the given network. The second dataset was the testing dataset, which made up 30 percent of the total dataset. It was used to predict the outcome and evaluate the performance of the network [5, 30]. Then, the data were imported into the nodes of the input layer and passed from there to the nodes of the hidden layer. In each hidden node, the values of the nodes in the input layer were multiplied by the weight of each connection line and summed [31]. The sum in each hidden node was adjusted with a sigmoid activation function [31, 32]. After that, the sum in the output node was computed using the sum function, which multiplies the values of the nodes in the hidden layer by the weight of each connection line [31]. The sum in the output layer was then adjusted with a linear activation function. The errors in the output layer were calculated by comparing the output values with the target values [31]. The weights of the connection lines were then adjusted so that the training dataset had the minimum RMSE [30] and MAE [33]. Then, the process was repeated until the errors in the output layer reached the specified minimum threshold or the assigned number of iterations was reached.
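The training steps described above can be sketched with NumPy as a small multilayer perceptron with a sigmoid hidden layer and a linear output node, trained by backpropagation. The learning rate (0.2) and momentum (0.8) are taken from the text. Everything else is an assumption made for illustration: the toy data, the network size (2 inputs and 4 hidden nodes rather than 27 inputs and 23 or 25 hidden nodes), the number of epochs, the weight initialization, and the use of full-batch updates. The sketch is not claimed to match WEKA's internal procedure.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(X, params):
    W1, b1, W2, b2 = params
    hidden = sigmoid(X @ W1 + b1)   # weighted sum, then sigmoid activation
    output = hidden @ W2 + b2       # weighted sum, linear activation
    return hidden, output

def rmse(params, X, y):
    _, output = forward(X, params)
    return float(np.sqrt(np.mean((output - y) ** 2)))

def init_params(n_inputs, n_hidden, seed=0):
    rng = np.random.default_rng(seed)
    return [rng.normal(scale=0.5, size=(n_inputs, n_hidden)), np.zeros(n_hidden),
            rng.normal(scale=0.5, size=(n_hidden, 1)), np.zeros(1)]

def train(params, X, y, learning_rate=0.2, momentum=0.8, epochs=2000):
    params = [p.copy() for p in params]
    velocity = [np.zeros_like(p) for p in params]
    n = len(X)
    for _ in range(epochs):
        hidden, output = forward(X, params)
        error = output - y                      # output minus target
        # Propagate the error back through the sigmoid hidden layer.
        delta_hidden = (error @ params[2].T) * hidden * (1.0 - hidden)
        grads = [X.T @ delta_hidden / n, delta_hidden.mean(axis=0),
                 hidden.T @ error / n, error.mean(axis=0)]
        for i, grad in enumerate(grads):
            # Momentum update: keep part of the previous weight change.
            velocity[i] = momentum * velocity[i] - learning_rate * grad
            params[i] = params[i] + velocity[i]
    return params

# Toy regression problem (illustrative data, not the survey data).
rng = np.random.default_rng(1)
X = rng.uniform(-1.0, 1.0, size=(200, 2))
y = (np.sin(2.0 * X[:, 0]) + 0.5 * X[:, 1]).reshape(-1, 1)

initial = init_params(n_inputs=2, n_hidden=4)
trained = train(initial, X, y)
rmse_before = rmse(initial, X, y)
rmse_after = rmse(trained, X, y)
print(rmse_before, rmse_after)
```

The loop stops after a fixed number of epochs; a minimum-error threshold, as mentioned in the text, could be added as a second stopping condition.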

The RMSE and MAE were calculated for each iteration. Then, the validity of the neural network was verified by applying the weights obtained from the network training to validate the testing dataset. The RMSE and MAE obtained on the training dataset were compared with those obtained on the testing dataset. If the data values were very different, this can be corrected by trying a new weight setting or redesigning the neural network [30]. Prediction efficiency was compared in terms of RMSE, MAE, R2, and processing time. Finally, the obtained network was used on the testing data to find the predicted value [30].

3.4. Prediction Efficiency Comparison

The prediction efficiencies of multiple linear regression analysis and the unoptimized artificial neural network method were compared in terms of root mean square error, mean absolute error, coefficient of determination, and processing time.

3.4.1. Root Mean Square Error (RMSE)

Root mean square error is based on the same principle as statistical variance. Because each error is squared before the errors are summed and averaged, large errors are weighted heavily, so this measure tends to yield a relatively high value. At the end, the square root is taken. A smaller RMSE indicates a more accurate prediction. The formula is as follows:

$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2},$$

where $y_i$ is the actual value, $\hat{y}_i$ is the predicted value, and $n$ is the sample size [30]. Many studies used RMSE for prediction efficiency comparison because the dependent variable was quantitative [34–36]. Therefore, the researchers chose to use RMSE. A closely related measure is the mean square error (MSE), which is the square of RMSE. The simplest method for estimating the accuracy of a model is using MSE. The lower the MSE, the better the model. MSE is a good measure because it consists of both bias and variance [37]. It is a function of estimation error and model complexity (i.e., degrees of freedom) [38].
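As a small worked example of the RMSE and MSE definitions above, the following sketch computes both for four made-up actual and predicted values. The numbers are illustrative only and are not taken from the survey data.

```python
import math

def mse(actual, predicted):
    """Mean of the squared differences between actual and predicted values."""
    n = len(actual)
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / n

def rmse(actual, predicted):
    """Square root of the mean square error."""
    return math.sqrt(mse(actual, predicted))

actual = [3.0, 5.0, 7.0, 9.0]
predicted = [2.0, 5.0, 8.0, 11.0]   # errors: 1, 0, -1, -2

print(mse(actual, predicted))    # (1 + 0 + 1 + 4) / 4 = 1.5
print(rmse(actual, predicted))   # square root of 1.5
```

RMSE is expressed in the same unit as the dependent variable (baht in this study), whereas MSE is in squared units.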

3.4.2. Mean Absolute Error (MAE)

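Mean absolute error is the mean of the absolute errors $|y_i - \hat{y}_i|$. Mean absolute error measures how close the predicted value is to the actual value. The formula is as follows:

$$\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left|y_i - \hat{y}_i\right|,$$

where $y_i$ is the actual value, $\hat{y}_i$ is the predicted value, and $n$ is the sample size [33].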
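MAE and RMSE can rank two sets of predictions differently, because RMSE penalizes a single large error more heavily than several small ones. The sketch below shows this with made-up numbers for two hypothetical models, labelled A and B. It illustrates the two measures only and does not reproduce the study's results.

```python
import math

def mae(actual, predicted):
    """Mean of the absolute differences between actual and predicted values."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def rmse(actual, predicted):
    """Square root of the mean of the squared differences."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

actual = [10.0, 10.0, 10.0, 10.0]
pred_a = [11.0, 9.0, 11.0, 9.0]     # four small errors of size 1
pred_b = [10.0, 10.0, 10.0, 13.0]   # three exact predictions, one error of size 3

print("A:", mae(actual, pred_a), rmse(actual, pred_a))
print("B:", mae(actual, pred_b), rmse(actual, pred_b))
```

Model B has the lower MAE (0.75 versus 1.0) but the higher RMSE (1.5 versus 1.0), so the two measures need not agree on which model predicts better.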

3.4.3. Coefficient of Determination (R2)

The coefficient of determination is the proportion (or percentage) of the variation in the dependent variable that is explained by the independent variables. The formula is as follows:

$$R^2 = \frac{\mathrm{SSR}}{\mathrm{SST}},$$

where SSR is the variation of Y due to the influence of X1, X2, …, Xk, and SST is the total variation [39].
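The ratio SSR/SST can be illustrated with a least-squares fit on synthetic data, assuming NumPy is available. For a linear regression with an intercept evaluated on the data used to fit it, SST = SSR + SSE, so SSR/SST equals 1 - SSE/SST. This identity need not hold for other models or for a separate testing dataset. The data, coefficients, and function name below are assumptions for illustration.

```python
import numpy as np

def sums_of_squares(X, y):
    """Fit least squares with an intercept and return (SSR, SSE, SST)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    design = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    fitted = design @ coef
    sst = ((y - y.mean()) ** 2).sum()        # total variation
    ssr = ((fitted - y.mean()) ** 2).sum()   # variation explained by the regression
    sse = ((y - fitted) ** 2).sum()          # unexplained (residual) variation
    return ssr, sse, sst

# Synthetic data: two independent variables plus noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
y = 1.0 + 2.0 * X[:, 0] - 1.0 * X[:, 1] + rng.normal(scale=0.5, size=100)

ssr, sse, sst = sums_of_squares(X, y)
r2 = ssr / sst
print(r2, 1.0 - sse / sst)   # the two expressions agree for this fit
```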

3.4.4. Processing Time

Processing time is the time period (in seconds) from the start of processing to the end of processing either ANN or MLR. The simulation running process was manually timed with a digital watch (Solvil et Titus, Switzerland) by the author from start to finish and the measured time interval was taken as the processing time. A shorter processing time is considered more efficient in predicting the savings of people in the central region [40].
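The study timed each run manually. For comparison, a programmatic measurement of elapsed time can be sketched with Python's standard `time.perf_counter`. This is an alternative illustration, not the procedure used by the authors; the helper name `timed` and the stand-in workload are hypothetical.

```python
import time

def timed(func, *args, **kwargs):
    """Run `func` and return its result together with the elapsed seconds."""
    start = time.perf_counter()
    result = func(*args, **kwargs)
    elapsed = time.perf_counter() - start
    return result, elapsed

# Stand-in workload; in practice this would be the model-fitting call.
result, seconds = timed(sum, range(1_000_000))
print(f"result={result}, elapsed={seconds:.4f} s")
```

A timer of this kind covers only the computation itself, whereas manual timing also includes the time needed to start the run and read the watch.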

3.5. Investigational Steps

A flowchart of the overall investigation steps is shown in Figure 3: data collection, data partition, data analysis, and prediction efficiency comparison. In data collection, the data were gathered and divided into 3 groups of low-, middle-, and high-income household savings, to differentiate among households of different incomes. In data partition, each group was split into two datasets. The first dataset was the training dataset (70 percent of the entire dataset), which was used for training the given method. The second dataset was the testing dataset (30 percent), which was used for prediction in order to compare the performance of the methods. Data analysis consisted of applying MLR and unoptimized ANN to the data. Prediction efficiency was compared in terms of lower root mean square error (RMSE), lower mean absolute error (MAE), higher coefficient of determination (R2), and shorter processing time.
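The data partition step can be sketched as follows. The source does not state how rows were assigned to the two datasets, so the random shuffle, the fixed seed, and the function name `split_70_30` are assumptions made only for illustration.

```python
import random

def split_70_30(rows, seed=0):
    """Shuffle rows reproducibly and split them 70/30 into (train, test)."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)   # assumed: random assignment
    cut = round(0.7 * len(rows))
    return rows[:cut], rows[cut:]

# Hypothetical group of 1,000 households identified by row number.
train, test = split_70_30(range(1000))
print(len(train), len(test))  # 700 300
```

In the study, this split would be applied separately to each of the three income groups.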

4. Results

4.1. Results of the Statistical Data Analysis of the Underlying Variables

The household saving data were collected from the secondary data of the central region of Thailand. Descriptive statistics for the savings of low-, middle-, and high-income households, categorized by the independent variables and the dependent variable, are shown in Tables 2 and 3.

Table 2 shows the mean and standard deviation of low-, middle-, and high-income household savings classified by independent variables.

Table 3 shows that sex (X15), marital status (X17), secondary school and high school education levels (X18-1), and vocational, diploma, bachelor’s, and master’s degree education levels (X18-2) were the qualitative variables. Both X15 and X17 were nominal scale, while X18-1 and X18-2 were ordinal scale. The frequencies and percentages for low-, middle-, and high-income household savings of these four qualitative variables are shown in Table 3.

Histograms of the savings of low-, middle-, and high-income households are shown in Figures 4–6, respectively.

Figure 4 shows that the histogram of 2,191 low-income household savings is skewed to the right.

Figure 5 shows that the histogram of 6,994 middle-income household savings is skewed to the right.

Figure 6 shows that the histogram of 2,401 high-income household savings is only slightly skewed to the right.

4.2. Analysis and Results of Multiple Linear Regression
4.2.1. Savings of Low-Income Households

The results on savings of low-income households were obtained from the testing dataset for 634 low-income household savings (the first quintile, 30 percent).

(1) Check the multiple linear regression assumptions before constructing the regression equation.

(1.1) The dependent variable of savings of low-income households had a normal distribution. The savings of low-income households variable was tested for a normal distribution. The Lilliefors test statistic was 0.381 and the p value was ≤0.001 (below the significance level), so the variable was not normally distributed (not shown). Therefore, the variable was transformed using the Box–Cox transformation method. The estimated λ was 0, so we chose λ = 0 and used the natural logarithm transformation. The transformed variable was then tested again for a normal distribution. The Lilliefors test statistic was 0.087 and the p value was ≤0.001 (below the significance level), so the savings variable in the natural logarithm was also not normally distributed. Nevertheless, the central limit theorem states that if a population does not have a normal distribution and the random sample size is larger than or equal to 30, then the sample mean has an approximately normal distribution. Here, the sample size was 634, so it was assumed that the savings of low-income households variable had an approximately normal distribution.

(1.2) The independent variables had no multicollinearity. For the savings of low-income households, the independent variables were checked for multicollinearity based on tolerance and VIF, as shown in Table 4. Table 4 shows that the VIF of every independent variable was between 1.005 and 2.068, which was less than 10; therefore, no independent variable exhibited multicollinearity. Hence, the assumptions on the data were validated before the multiple linear regression analysis.

(2) Construct multiple linear regression analysis. With a t-test statistic of 8.317 and a p value of ≤0.001 (below the significance level), the independent variable X12 was correlated with the savings of low-income households when the other independent variables were held constant. Similarly, the independent variables X4, X6, and X18-1 were correlated with the savings of low-income households, with an estimated regression equation as follows:

Factors affecting the savings of low-income households, in order of importance, were determined by the standardized beta coefficients. The factors were deposit interest, bond, share dividends, and other types of investment (X12), household expenses (X4), food expenses, beverages and household tobacco (X6), and secondary school and high school education levels (X18-1). The root mean square error (RMSE), mean absolute error (MAE), and coefficient of determination (R2) were 91,397.9315, 243,314.6593, and 0.7152, respectively, as listed in Table 5. The scatter plot in Figure 7 shows a positive relationship between the savings of low-income households and the prediction of the multiple linear regression model.

(3) Check the multiple linear regression assumptions after constructing the regression equation.

(3.1) The residuals had a normal distribution (normality). The Lilliefors test statistic was 0.295, and the p value was ≤0.001 (below the significance level), which means the residuals were not normally distributed (not shown). Nevertheless, based on the central limit theorem (CLT) for a large sample size (n > 30), it can be assumed that the residuals had a normal distribution.

(3.2) The residuals were not correlated, or they were independent (no autocorrelation). The Durbin–Watson test statistic was 2.008, which was between 1.5 and 2.5. Therefore, the residuals were uncorrelated, that is, independent.

(3.3) The residuals had equal variance (homoscedasticity). Figure 8 shows that the standardized residuals were randomly distributed about zero and parallel to the horizontal axis. Therefore, the residuals had equal variance. Hence, the assumptions on the data were validated after multiple linear regression analysis.
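The Box–Cox step described in item (1.1) can be sketched as follows. With λ = 0 the transformation reduces to the natural logarithm, which is the case chosen in this study. The function name `box_cox` is hypothetical, and the sketch assumes strictly positive values; how zero or negative savings were handled is not stated in the source.

```python
import math

def box_cox(y, lam):
    """Box-Cox transform of a positive value y with parameter lam."""
    if y <= 0:
        raise ValueError("Box-Cox requires strictly positive values")
    if lam == 0:
        return math.log(y)          # limiting case used in the study
    return (y ** lam - 1) / lam

# Hypothetical savings amount; lam = 0 gives the natural logarithm.
print(box_cox(50_000.0, 0) == math.log(50_000.0))  # True
```

As λ approaches 0, (y^λ − 1)/λ approaches ln y, which is why an estimated λ close to 0 is commonly rounded to the logarithm.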

4.2.2. Savings of Middle-Income Households

The results on savings of middle-income households were obtained from the testing dataset for 2,107 middle-income household savings (the second to fourth quintiles, 30 percent).

(1) Check the multiple linear regression assumptions before constructing the regression equation.

(1.1) The dependent variable of savings of middle-income households had a normal distribution. The savings of middle-income households variable was tested for a normal distribution. The Lilliefors test statistic was 0.386 and the p value was ≤0.001 (below the significance level), so the variable was not normally distributed (not shown). Therefore, the variable was transformed using the Box–Cox transformation method. The estimated λ was −0.01, which was close to 0, so we chose λ = 0 and used the natural logarithm transformation. The transformed variable was then tested again for a normal distribution. The Lilliefors test statistic was 0.052 and the p value was ≤0.001 (below the significance level), so the savings variable in the natural logarithm was also not normally distributed. Nevertheless, the central limit theorem states that if a population does not have a normal distribution and the random sample size is larger than or equal to 30, then the sample mean has an approximately normal distribution. Here, the sample size was 2,107. Therefore, it was assumed that the savings of middle-income households variable had an approximately normal distribution.

(1.2) The independent variables had no multicollinearity. For the savings of middle-income households, the independent variables were checked for multicollinearity based on tolerance and VIF, as shown in Table 6. Table 6 shows that the VIF of every independent variable was between 1.010 and 1.974, which was less than 10; therefore, no independent variable exhibited multicollinearity. Hence, the assumptions on the data were validated before the multiple linear regression analysis.

(2) Construct multiple linear regression analysis. With a t-test statistic of 11.93 and a p value of ≤0.001 (below the significance level), the independent variable X12 was correlated with the savings of middle-income households when the other independent variables were held constant. Similarly, the independent variables X5, X8, X2, X16, X11, X20, X14, and X24 were correlated with the savings of middle-income households, with an estimated regression equation as follows:

Factors affecting the savings of middle-income households, in order of importance, were determined by the standardized beta coefficients. The factors were deposit interest, bond, share dividends, and other types of investment (X12), household consumption expenditures (X5), number of household members (X2), pension and allowance (X8), number of household members aged 60 and over (X20), age (X16), income from renting rooms/land and other assets (X11), household debt (X14), and number of members that had a card to certify the right to medical treatment (X24). The root mean square error (RMSE), mean absolute error (MAE), and coefficient of determination (R2) were 259,150.5120, 58,097.1327, and 0.7608, respectively, as listed in Table 5. The scatter plot in Figure 9 shows a positive relationship between the savings of middle-income households and the prediction of the multiple linear regression model.

(3) Check the multiple linear regression assumptions after constructing the regression equation.

(3.1) The residuals had a normal distribution. The Lilliefors test statistic was 0.282, and the p value was ≤0.001 (below the significance level), which means the residuals were not normally distributed (not shown). Nevertheless, based on the central limit theorem (CLT) for a large sample size, it can be assumed that the residuals had a normal distribution.

(3.2) The residuals were not correlated, or they were independent. The Durbin–Watson test statistic was 2.001, which was between 1.5 and 2.5. Therefore, the residuals were uncorrelated, that is, independent.

(3.3) The residuals had equal variance. Figure 10 shows that the standardized residuals were randomly distributed about zero and parallel to the horizontal axis. Therefore, the residuals had equal variance. Hence, the assumptions on the data were validated after multiple linear regression analysis.
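The independence check in item (3.2) relies on the Durbin–Watson statistic, which is the sum of squared differences between consecutive residuals divided by the sum of squared residuals. The sketch below is illustrative only: the function names and toy residuals are hypothetical, and the 1.5 to 2.5 band is the rule of thumb quoted in this section.

```python
def durbin_watson(residuals):
    """Durbin-Watson statistic for residuals in their observed order."""
    num = sum((residuals[t] - residuals[t - 1]) ** 2
              for t in range(1, len(residuals)))
    den = sum(e ** 2 for e in residuals)
    return num / den

def within_rule_of_thumb(dw, low=1.5, high=2.5):
    """True when dw lies inside the band used in this section."""
    return low < dw < high

# Toy residuals (hypothetical): alternating signs give a value above 2.
print(durbin_watson([1.0, -1.0, 1.0, -1.0]))  # 3.0
print(within_rule_of_thumb(2.001))            # True
```

Values near 2 suggest no first-order autocorrelation, values toward 0 suggest positive autocorrelation, and values toward 4 suggest negative autocorrelation.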

4.2.3. Savings of High-Income Households

The results on savings of high-income households were obtained from the testing dataset for 706 high-income household savings (the fifth quintile, 30 percent).

(1) Check the multiple linear regression assumptions before constructing the regression equation.

(1.1) The dependent variable of savings of high-income households had a normal distribution. The savings of high-income households variable was tested for a normal distribution. The Lilliefors test statistic was 0.337 and the p value was ≤0.001 (below the significance level), so the variable was not normally distributed (not shown). Therefore, the variable was transformed using the Box–Cox transformation method. The estimated λ was 0.04, which was close to 0, so we chose λ = 0 and used the natural logarithm transformation. The transformed variable was then tested again for a normal distribution. The Lilliefors test statistic was 0.058 and the p value was ≤0.001 (below the significance level), so the savings variable in the natural logarithm was also not normally distributed. Nevertheless, the central limit theorem states that if a population does not have a normal distribution and the random sample size is larger than or equal to 30, then the sample mean has an approximately normal distribution. Here, the sample size was 706, so it was assumed that the savings of high-income households variable had an approximately normal distribution.

(1.2) The independent variables had no multicollinearity. For the savings of high-income households, the independent variables were checked for multicollinearity based on tolerance and VIF, as shown in Table 7. Table 7 shows that the VIF of every independent variable was between 1.036 and 2.378, which was less than 10; therefore, no independent variable exhibited multicollinearity. Hence, the assumptions on the data were validated before the multiple linear regression analysis.

(2) Construct multiple linear regression analysis. With a t-test statistic of 7.215 and a p value of ≤0.001 (below the significance level), the independent variable X12 was correlated with the savings of high-income households when the other independent variables were held constant. Similarly, the independent variables X24, X20, X7, X9, X14, X26, and X3 were correlated with the savings of high-income households, with an estimated regression equation as follows:

Factors affecting the savings of high-income households, in order of importance, were determined by the standardized beta coefficients. The factors were deposit interest, bond, share dividends, and other types of investment (X12), number of household members aged 60 years and over (X20), total household income (X7), number of members receiving subsistence allowances for the elderly (X26), grants received from other people (X9), household debt (X14), number of members that have a card to certify the right to medical treatment (X24), and number of household members not working (X3). The root mean square error (RMSE), mean absolute error (MAE), and coefficient of determination (R2) were 851,552.7515, 1,926,924.5832, and 0.7895, respectively, as listed in Table 5. The scatter plot in Figure 11 shows a positive relationship between the savings of high-income households and the prediction of the multiple linear regression model.

(3) Check the multiple linear regression assumptions after constructing the regression equation.

(3.1) The residuals had a normal distribution. The Lilliefors test statistic was 0.247 and the p value was ≤0.001 (below the significance level), which means the residuals were not normally distributed (not shown). Nevertheless, based on the central limit theorem (CLT) for a large sample size, it can be assumed that the residuals had a normal distribution.

(3.2) The residuals were not correlated, or they were independent. The Durbin–Watson test statistic was 1.822, which was between 1.5 and 2.5. Therefore, the residuals were uncorrelated, that is, independent.

(3.3) The residuals had equal variance. Figure 12 shows that the standardized residuals were randomly distributed about zero and parallel to the horizontal axis. Therefore, the residuals had equal variance. Hence, the assumptions on the data were validated after the multiple linear regression analysis.
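The multicollinearity check in item (1.2) uses tolerance and VIF. For predictor j, VIF is 1/(1 − R_j²), where R_j² comes from regressing that predictor on the other predictors, and tolerance is the reciprocal of VIF. The sketch below covers only the two-predictor case, where R_j² is the squared Pearson correlation; the function names and toy data are hypothetical.

```python
import math

def pearson_r(a, b):
    """Pearson correlation between two equal-length sequences."""
    n = len(a)
    a_bar, b_bar = sum(a) / n, sum(b) / n
    sab = sum((x - a_bar) * (y - b_bar) for x, y in zip(a, b))
    saa = sum((x - a_bar) ** 2 for x in a)
    sbb = sum((y - b_bar) ** 2 for y in b)
    return sab / math.sqrt(saa * sbb)

def vif_two_predictors(x1, x2):
    """VIF shared by both predictors when there are exactly two of them."""
    r2 = pearson_r(x1, x2) ** 2
    return 1.0 / (1.0 - r2)

# Toy predictors (hypothetical) with correlation 0.8.
vif = vif_two_predictors([1, 2, 3, 4, 5], [2, 1, 4, 3, 5])
print(round(vif, 3), round(1 / vif, 3))  # 2.778 0.36
```

Under the threshold used in this section, a VIF below 10 (a tolerance above 0.1) is treated as no multicollinearity.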

4.3. Analysis and Results of Unoptimized Artificial Neural Network
4.3.1. Savings of Low-Income Households

The prediction efficiencies of the unoptimized ANN for the savings of low-income households over many runs are shown in Table 8.

Table 8 shows that the testing dataset for 634 low-income household savings (the first quintile) was 30 percent of the total dataset. For this testing dataset, the unoptimized artificial neural network using the multilayer perceptron algorithm achieved an MAE, RMSE, and R2 of 54,384.4958, 127,910.8087, and 0.6807, respectively.

4.3.2. Savings of Middle-Income Households

The prediction efficiencies of the unoptimized ANN for the savings of middle-income households over many runs are shown in Table 9.

Table 9 shows that the testing dataset for 2,107 middle-income household savings (the second quintile to the fourth quintile) was 30 percent of the total dataset. For this testing dataset, the unoptimized artificial neural network using the multilayer perceptron algorithm achieved an MAE, RMSE, and R2 of 170,355.9166, 326,001.4820, and 0.6934, respectively.

4.3.3. Savings of High-Income Households

The prediction efficiencies of the unoptimized ANN for the savings of high-income households over many runs are shown in Table 10.

Table 10 shows that the testing dataset for 706 high-income household savings (the fifth quintile) was 30 percent of the total dataset. For this testing dataset, the unoptimized artificial neural network using the multilayer perceptron algorithm achieved an MAE, RMSE, and R2 of 877,077.2266, 2,779,219.6003, and 0.6277, respectively.

The developed artificial neural network methods for savings of low-, middle-, and high-income households are shown in Figure 13.

Figure 13 shows an input layer and an output layer consisting of 27 input nodes and 1 output node. Figure 13 shows a hidden layer with 23 hidden nodes for low-income households. For middle- and high-income households, the number of hidden nodes was 25. The weights of the nodes were placed on the connection lines between input nodes, hidden nodes, and output node.
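To give a sense of the size of the networks described above, the following sketch counts the trainable weights of a fully connected network with one hidden layer. It assumes one bias term per hidden and output node, which the source does not state; the function name `mlp_parameter_count` is hypothetical.

```python
def mlp_parameter_count(n_inputs, n_hidden, n_outputs=1, bias=True):
    """Number of weights in a one-hidden-layer, fully connected network."""
    b = 1 if bias else 0  # assumed: one bias per hidden/output node
    return (n_inputs + b) * n_hidden + (n_hidden + b) * n_outputs

print(mlp_parameter_count(27, 23))  # low-income network: 668
print(mlp_parameter_count(27, 25))  # middle- and high-income networks: 726
```

By comparison, a linear regression on the same 27 inputs would estimate at most 28 coefficients, which is one reason the ANN takes longer to train.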

4.4. Results of Efficiency Comparison in Predicting Savings of People in the Central Region

The prediction efficiencies of MLR and unoptimized ANN on savings of low-, middle-, and high-income households were compared in terms of RMSE, MAE, R2, and processing time and are shown in Table 5.

Table 5 shows the efficiency comparison of MLR and unoptimized ANN in predicting the savings of low-, middle-, and high-income households. The testing dataset was used to predict the outcomes. MLR had a lower RMSE, a shorter processing time, and a higher R2 than unoptimized ANN for the savings of low-, middle-, and high-income households alike. Nevertheless, unoptimized ANN achieved a lower MAE than MLR for the savings of low- and high-income households.

5. Discussion

In this study, the efficiencies of two methods for predicting the household savings of people in the central region of Thailand were compared, and the factors affecting those savings were analyzed, using secondary data from the 2019 Household Socio-Economic Survey, the National Statistical Staff’s Household Income Survey. The investigation involved MLR and unoptimized ANN. Their efficiency comparison was based on RMSE, MAE, R2, and processing time. Three main topics are discussed as follows:

(1) The results of this study demonstrated that MLR provided a lower RMSE, a shorter processing time, and a higher R2 than unoptimized ANN on all saving categories of low-, middle-, and high-income households. Nevertheless, the unoptimized ANN provided a lower MAE than MLR for the savings of low- and high-income households. Another study partly confirms this conclusion. Using R2 as the metric, Morales and Huanca [17] applied MLR and ANN to public spending execution in Peru and concluded that MLR was better than ANN. The coefficients of determination (R2) achieved were 95.9% for the MLR model and 95.3% for the ANN model. Their conclusion is in agreement with ours but in contrast to several other studies [18–20].

(2) The ANN model relies on multiple internal parameters, which is both a strength and a weakness. The strength is that these parameters can be evaluated and adjusted to achieve the most accurate prediction. The weakness is that it takes many processing rounds to optimize these parameters. Therefore, for rapid use such as screening, an unoptimized ANN may not be as accurate as an optimized ANN or an MLR. In this research, the ANN’s internal parameters, the learning rate and momentum, were set to default values of 0.2 and 0.8, respectively [30]. With those default values, the unoptimized ANN did not perform as well as the MLR.

(3) The most important factor affecting savings of low-, middle-, and high-income households was the factor of deposit interest, bond, share dividends, and other types of investment. The factors investigated in this study were similar to those in a 2018 study [41] that investigated the saving behavior of people in Bangkok and the factors affecting it. The results showed that return, risk, and promotion affected the behavior, the amount of savings, savings objectives, and savings patterns, which is consistent with our conclusion involving deposit interest. In 2019, factors affecting the saving behavior of people in Songkhla Province were investigated [6]. The results of that study demonstrated that the macroeconomic factors in monetary policy had an effect on the saving behavior of people in Songkhla Province, which is consistent with our conclusion involving the factor of bond, share dividends, and investments. Another study [7] investigated the economic factors affecting household saving of people in Thailand. The authors showed that the economic factors affecting household sector saving included inflation, long-term stock funds, and the national saving fund, which is consistent with our conclusion. In 2021, a study based on MLR analysis [9] examined factors affecting Thai household savings and saving behavior. The results indicated that household savings were affected by retirement saving plans, which is consistent with our conclusion involving deposit interest. In the following year, another study [10] used MLR to investigate the determinants of household savings. The study concluded that savings were not affected by the interest rate, which differs from our conclusion that savings were affected by deposit interest. Finally, in 2023, a study [11] investigated household savings and negative interest rates in many countries. The results demonstrated that a negative interest rate led to a statistically and economically significant increase in savings. This is an interesting apparent conflict with our conclusion that a positive deposit rate increased savings.
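To illustrate how the two ANN parameters mentioned in item (2) enter training, the sketch below applies the classic gradient descent update with momentum, using the learning rate 0.2 and momentum 0.8 reported above. The source does not give the exact update rule of the software used, so this form, the function name `momentum_step`, and the toy objective 0.5·w² are assumptions.

```python
def momentum_step(weight, velocity, gradient, learning_rate=0.2, momentum=0.8):
    """One gradient descent step with momentum (assumed classic form)."""
    velocity = momentum * velocity - learning_rate * gradient
    return weight + velocity, velocity

# Toy objective (hypothetical): minimize 0.5 * w**2, whose gradient is w.
w, v = 1.0, 0.0
for step in range(2):
    w, v = momentum_step(w, v, gradient=w)
    print(step + 1, round(w, 4), round(v, 4))
```

The learning rate scales each correction, while momentum carries part of the previous correction forward. Optimizing an ANN means repeating the full training run for many such parameter combinations, which is the source of the extra processing rounds discussed above.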

6. Conclusions

In this paper, the authors compared the prediction efficiency of two predictive methods, multiple linear regression (MLR) and unoptimized artificial neural network (unoptimized ANN), and investigated the factors affecting the savings of people in the central region of Thailand. The comparison of MLR and unoptimized ANN was in terms of RMSE, MAE, R2, and processing time. In addition, factors affecting the savings of people in the central region of Thailand were analyzed using MLR analysis. The results can be summarized as follows:

(1) For all income categories (low-, middle-, and high-income households), MLR achieved a lower RMSE and a shorter processing time as well as a higher R2 than the unoptimized ANN. However, for low- and high-income households, the unoptimized ANN provided a lower MAE than MLR did. Lower RMSE, MAE, and processing time indicate better prediction efficiency, as does a higher R2.

(2) The estimated multiple regression equation for savings of low-income households is as follows:

The estimated multiple regression equation for savings of middle-income households is as follows:

The estimated multiple regression equation for savings of high-income households is as follows:

The most influential factor affecting savings of low-, middle-, and high-income households was deposit interest, bond, share dividends, and other types of investment (X12).

As for the speed at which the two predictive methods executed, if the data were processed by an optimized ANN, several parameters such as momentum and learning rate would have to be adjusted over several processing rounds. The total time for optimizing and then running the optimized ANN could be a matter of days. Therefore, we compared the processing time between the MLR and unoptimized ANN, and it was found that unoptimized ANN still took more processing time than MLR. For rapid screening purposes, then, MLR may be better than ANN. The disadvantage of MLR is that the data must conform to the assumptions, whereas the artificial neural network method does not require the data to conform to any assumptions [42, 43].

MLR is considered highly reliable since it has been used in various fields of study for a very long time. ANN, on the other hand, is a newer method. Even though ANN has recently been shown to be reliable in many fields of study, it is not as well established as MLR.

The most important factor affecting savings of low-, middle-, and high-income households was deposit interest, bond, share dividends, and other types of investment.

A limitation of this research was that in the survey of household incomes, some variables might have had missing values and some variables could not even be collected for analysis. In addition, since the survey of household incomes was carried out sporadically, the conclusion from the analysis may not reflect the current situation fully.

6.1. Future Recommendations
(1) Other default parameter values for the unoptimized ANN should be assigned. For example, learning rate, momentum, and number of training iterations should be set to 0.3, 0.7, and 20,000, respectively.

(2) There were various occupations that could not be collapsed into a small number of occupations. If they could be collapsed, then we would add an occupational variable to the data analysis as well.

(3) The research can be extended to cover the people from all regions of Thailand if there is sufficient funding for it.

(4) Other variables related to savings may need to be collected such as economic factors: inflation, long-term stock funds, and the national savings fund.

(5) New methods other than MLR and ANN should be further investigated.

(6) From the results of this study, government agencies should devise a plan to encourage savings for the people in order for them to live a better life in the future.

Data Availability

The data used to support the findings of this study are not restricted by the Ethics Board. The data were available freely from the National Statistical Office, Government Complex Commemorating Majesty the King’s 80th Birthday Anniversary, Ratthaprasasanabhakdi Building, 2nd Floor, Chaeng Watthana Road, Lak Si, Bangkok, Thailand, 10210, email address: [email protected] and [email protected], telephone: 02-141 7500−03, fax: 02-143 8132, website: https://www.nso.go.th, for researchers who meet the criteria for access to confidential data.

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.

Acknowledgments

The authors would like to thank the committee of the School of Science, King Mongkut’s Institute of Technology Ladkrabang for funding this research project (Grant no. 2564-02-05-001). The authors would also like to thank a master’s degree project student in the Department of Statistics for her help in coordinating data collection from her agency.