Abstract

Predicting the 28-day compressive strength of concrete from real-time production data is of great significance for improving the quality of engineering structures and for overcoming the long test period of traditional compressive strength tests. Current research suffers from insufficient prediction accuracy, poor matching between data characteristics and model characteristics, and redundant input information. This paper proposes a BP neural network prediction model optimized by Spearman’s correlation analysis and PCA. The model first uses Spearman’s method to reduce the number of input variables by eliminating those that have a low correlation with the compressive strength and then uses PCA to eliminate the correlation between the remaining material-related variables. The resulting uncorrelated input variables are then fed to the BP neural network to predict the compressive strength of concrete. The proposed model yielded a mean absolute percentage error (MAPE) of 2.78% and a root mean-squared error (RMSE) of 1.66 MPa, compared with 4.82% and 2.92 MPa, respectively, for the nonoptimized BP neural network. The proposed model fully exploits real-time monitoring data from the concrete mixing station, and its results are close to those of traditional tests. It is of practical significance for guiding concrete production and construction, shortening the production cycle, and reducing project cost.

1. Introduction

The compressive strength of concrete is an important parameter for determining its strength grade and an important index for comparing and evaluating its mechanical properties [1]. Once concrete has been produced at the mixing station, test pieces are formed from it, and their compressive strength is tested after 28 days of curing. The results are used to determine whether the strength grade of the concrete is suitable for the project at hand. There is therefore a time lag between production and testing, and if the concrete proves unqualified, the affected work must be demolished and redone, which increases the project cost.

To solve the management problems caused by the lag in compressive strength test results, many researchers have investigated methods to predict the compressive strength of concrete from concrete proportioning data. Zhu et al. [2] proposed a method to predict the compressive strength of recycled concrete aggregate based on gray correlation analysis and explained the influence of the water-cement ratio and the volume of the aggregate on its compressive strength. Chen et al. [3], Wang et al. [4], Jiao et al. [5], Tu et al. [6], and Pan et al. [7] used the water-binder ratio, content of silica fume, and contents of the aggregate, cement, and fly ash as inputs to an artificial neural network (ANN) and used ANN variants optimized by the dolphin partner method and by a genetic algorithm (GA-ANN) to predict the compressive strength of concrete. Ma et al. [8], Xu et al. [9], and Jovic et al. [10] used the water-binder ratio as well as the contents of cement, coal gangue powder, lithium slag, water reducer, and coarse and fine aggregates as input variables, and used the BP neural network model, the multivariate regression-based model, and the adaptive neural fuzzy inference system individually to predict the compressive strength of stone powder concrete, lithium slag concrete, and silica fume concrete. Al-Jamimi et al. [11] concluded that a hybrid model combining the support vector machine (SVM) and the genetic algorithm (GA), denoted SVM-GA, gave the best prediction of concrete compressive strength. Naser et al. [12] proposed multivariate adaptive regression splines (MARS) to predict the compressive strength of ecofriendly concrete.

The above research shows that it is feasible to predict the compressive strength of concrete by using proportioning data and the BP neural network. However, because of deviations between the feeding system and the control system of the mixing plant, the actual production data differ from the designed proportioning data in practice. How to use actual production data to predict the compressive strength, and with what accuracy, therefore requires further research. Real-time production monitoring data of concrete from mixing stations contain more than 10 material parameters, and those parameters are highly correlated. If they are used directly as inputs to the ANN, the redundancy and correlation of the data will inevitably reduce its predictive accuracy and convergence speed. Reducing the number of input variables is the most direct way to improve the predictive efficiency of the ANN. Researchers have used machine learning models [13-15], support vector machine algorithms [16], and clustering algorithms [17] to optimize and reduce the dimensionality of the model parameters to improve the accuracy of prediction. Vakharia and Gujar [18] used ten-fold cross-validation to verify the input parameters and then used feature ranking to eliminate irrelevant features. Yu et al. [19] used enhanced cat swarm optimization (ECSO) to optimize the key parameters of SVM.

The input variables can be reduced according to both the degree of correlation between each variable and the predicted outcome and the degree of correlation among the variables themselves. Spearman’s analysis is often used as a data preprocessing tool for multifactor analyses, such as those reported by Fang et al. [20], Gan et al. [21], and Li et al. [22]. They all used Spearman’s method to identify the degree of correlation between the input variables and the target variable and to screen the parameters according to its strength. PCA has been used to eliminate correlations between input variables, and thereby reduce their dimensionality, in research on the seasonal prediction of PM2.5 [23], the phosphorus content at the endpoint of the converter [24], and the throughput of cargo at ports [25]. These studies show that Spearman’s analysis and PCA have good applicability in practice.

To sum up, predicting the compressive strength of concrete is a typical multi-input, single-output nonlinear problem, which closely matches the structure of the BP neural network model among ANN methods. Compared with other machine learning algorithms such as SVM, the BP neural network has the advantages of strong fault tolerance, generalization, and adaptability. This study therefore uses real-time monitoring data on concrete mixing to propose a BP neural network model optimized by Spearman’s analysis and PCA to predict the compressive strength of concrete. The proposed model uses Spearman’s analysis and PCA to reduce the dimensionality of the input variables and then feeds the reduced variables to the BP neural network to forecast the compressive strength of concrete.

2. Proposed Model

2.1. Data Sources

The real-time production monitoring data of concrete from the mixing station were used as the input variables for prediction, and the compressive strength of concrete measured by the compressive strength test was used as the output data for training and verification. The input variables consisted of eight raw material consumption values per mixing production process and five production proportioning values. The eight raw material consumption values cover two kinds of cementitious materials (cement and fly ash), four kinds of aggregates (crushed stone 1 with particle size from 16 mm to 31.5 mm, crushed stone 2 with particle size from 10 mm to 20 mm, crushed stone 3 with particle size from 5 mm to 10 mm, and sand), water, and water reducer. The five production proportioning values are the total consumption of cementitious materials per mixing production process, the total consumption of the four aggregates per mixing production process, the water-cement ratio (ratio of water consumption per cubic meter of concrete to the consumption of cement), the water-binder ratio (ratio of water consumption per cubic meter of concrete to the consumption of cement and fly ash), and the sand ratio (ratio of sand consumption to the consumption of all aggregates).
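The five proportioning variables follow directly from the eight consumption values, which a short Python sketch can make concrete. The function name, the dictionary keys, and the example batch are illustrative assumptions rather than the authors' code or data, and all quantities are assumed to refer to the same batch so that the ratios are unit-free.

```python
def proportioning_features(batch):
    """Derive the five proportioning variables from one batch's consumptions.

    `batch` maps hypothetical material names to consumption in a common unit.
    """
    cementitious = batch["cement"] + batch["fly_ash"]
    aggregates = (batch["stone_1"] + batch["stone_2"]
                  + batch["stone_3"] + batch["sand"])
    return {
        "cementitious_total": cementitious,
        "aggregate_total": aggregates,
        "water_cement_ratio": batch["water"] / batch["cement"],
        "water_binder_ratio": batch["water"] / cementitious,
        "sand_ratio": batch["sand"] / aggregates,
    }


# Invented values for illustration only; they are not taken from Table 1.
example_batch = {
    "cement": 300.0, "fly_ash": 100.0,
    "stone_1": 450.0, "stone_2": 450.0, "stone_3": 300.0, "sand": 800.0,
    "water": 150.0, "water_reducer": 4.0,
}
features = proportioning_features(example_batch)
```

Because the five derived values are functions of the eight measured ones, strong correlation among the 13 inputs is expected by construction, which is what motivates the later dimension reduction.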

2.2. BP Neural Network Model Optimized by Spearman and PCA

The 13 original inputs and their high correlation will affect the prediction accuracy of BP neural network and its operation efficiency. Therefore, Spearman’s correlation analysis and PCA are used to optimize the input variables of BP neural network model. The model to predict the compressive strength of concrete, optimized by Spearman and PCA, is shown in Figure 1.

2.3. Optimization and Dimension Reduction of Input Variables by Spearman and PCA
2.3.1. Spearman Correlation Analysis

Spearman correlation analysis was used to examine the correlation between the 13 kinds of monitoring data on mixing-based production and the compressive strength of concrete. The input variables with significant correlations were retained while those with insignificant correlations were eliminated for dimension reduction. The steps of the calculation were as follows:

Step 1. Calculate Spearman’s correlation coefficient $\rho_j$:

$$\rho_j = \frac{\sum_{i=1}^{n}\left(R_{ij}-\bar{R}_j\right)\left(S_i-\bar{S}\right)}{\sqrt{\sum_{i=1}^{n}\left(R_{ij}-\bar{R}_j\right)^2\,\sum_{i=1}^{n}\left(S_i-\bar{S}\right)^2}},$$

where $\rho_j$ represents the coefficient of correlation between the $j$-th input variable and the output variable, $n$ represents the number of groups of samples, $R_{ij}$ and $\bar{R}_j$ represent the rank of the value of the $j$-th input variable in the $i$-th group of samples and the average rank of the $j$-th input variable, respectively, and $S_i$ and $\bar{S}$ represent the rank of the compressive strength in the $i$-th group of samples and the average rank of the compressive strength, respectively.

Step 2. Test the significance of Spearman’s correlation coefficient by looking up the table of its critical values. When the number of groups of samples is 38 and the confidence level is 95%, the critical value is 0.321. If the coefficient of an input variable exceeds this critical value, the input and output variables can be considered to be related. If not, the input variable can be removed.
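The two steps above can be sketched in plain Python as a rank correlation followed by a threshold test. This is a minimal illustration, not the authors' implementation: the function names are hypothetical, tied values are given average ranks, and the comparison uses the absolute value of the coefficient, which is an assumption. The default threshold of 0.321 is the value used in Section 3.2.1 for 38 groups of samples at the 95% confidence level.

```python
import math


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while (end + 1 < len(order)
               and values[order[end + 1]] == values[order[start]]):
            end += 1
        mean_rank = (start + end) / 2 + 1
        for pos in range(start, end + 1):
            ranks[order[pos]] = mean_rank
        start = end + 1
    return ranks


def spearman(x, y):
    """Spearman coefficient: Pearson correlation of the ranks of x and y."""
    rx, ry = average_ranks(x), average_ranks(y)
    mean_x, mean_y = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    return cov / math.sqrt(var_x * var_y)


def screen_inputs(columns, strength, critical=0.321):
    """Names of input columns whose |rho| exceeds the critical value.

    The default critical value only applies to 38 groups of samples.
    """
    return [name for name, col in columns.items()
            if abs(spearman(col, strength)) > critical]
```

A call such as `screen_inputs({"cement": cement_values, "water": water_values}, strength_values)` would return the names of the retained variables; the critical value has to be looked up again for any other sample size.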

2.3.2. Principal Component Analysis (PCA)

PCA involves mapping $m$-dimensional features to $k$-dimensional uncorrelated orthogonal features ($k \le m$). After the input variables were reduced by Spearman’s analysis, PCA was further used to remove the correlation between the remaining input variables. The PCA solution was programmed in Python, and the steps of the analysis were as follows.

(1) Step 1: Centralize the Sample Data. There are $m$ data items in $n$ groups of samples $X = (x_1, x_2, \ldots, x_n)^{\mathrm{T}}$, the $i$-th of which is $x_i = (x_{i1}, x_{i2}, \ldots, x_{im})$, ($i = 1, 2, \ldots, n$). Calculate the average value of each data item:

$$\bar{x}_j = \frac{1}{n}\sum_{i=1}^{n} x_{ij}, \quad j = 1, 2, \ldots, m,$$

where $x_{ij}$ represents the value of the $j$-th data item in the $i$-th group of samples.

Convert $X$ into an array $\tilde{X}$ with the center of the sample as the origin:

$$\tilde{x}_{ij} = x_{ij} - \bar{x}_j.$$

(2) Step 2: Calculate the Covariance Matrix of the Sample Array. The covariance matrix $C$ is used to measure the correlation between random variables and is calculated as follows:

$$C = \frac{1}{n-1}\,\tilde{X}^{\mathrm{T}}\tilde{X}.$$

(3) Step 3: Calculate the Eigenvalues $\lambda_j$ of the Covariance Matrix and Their Contribution Rates $\eta_j$. The contribution rate is the proportion of the variance of a principal component in the total variance of the original data. The higher the contribution rate is, the stronger is the explanatory power of the principal component with regard to the total variance. The eigenvalues of the covariance matrix and their corresponding eigenvectors $u_j$ are obtained by Equation (5):

$$C u_j = \lambda_j u_j, \quad j = 1, 2, \ldots, m. \tag{5}$$

The eigenvalues are arranged in descending order, and the contribution rate of each as well as the cumulative contribution rate of the first $k$ principal components is calculated:

$$\eta_j = \frac{\lambda_j}{\sum_{l=1}^{m}\lambda_l}, \qquad \eta_{\Sigma}(k) = \sum_{j=1}^{k}\eta_j.$$

(4) Step 4: Choose the Principal Components. When the value of $k$ makes the cumulative contribution rate $\eta_{\Sigma}(k)$ reach the required level, take the first $k$ principal components $F_1, F_2, \ldots, F_k$. Each principal component can be obtained from its loads, that is, the elements of the eigenvector $u_j$:

$$F_j = \tilde{X} u_j, \quad j = 1, 2, \ldots, k. \tag{9}$$
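The four PCA steps can be sketched with NumPy as follows. This is a minimal sketch under stated assumptions, not the authors' Python program: it uses the sample covariance (divisor n - 1), applies no rescaling beyond centering, and takes the required cumulative contribution rate as a caller-supplied `threshold`, since the value used in the study is not given here. The name `pca_reduce` is illustrative.

```python
import numpy as np


def pca_reduce(X, threshold):
    """Project the rows of X onto the leading principal components.

    threshold: required cumulative contribution rate, in (0, 1].
    Returns (scores, contribution_rates, k).
    """
    X = np.asarray(X, dtype=float)
    centered = X - X.mean(axis=0)               # Step 1: center each column
    cov = np.cov(centered, rowvar=False)        # Step 2: covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)      # Step 3: eigh sorts ascending
    order = np.argsort(eigvals)[::-1]           # reorder to descending
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    contribution = eigvals / eigvals.sum()
    cumulative = np.cumsum(contribution)
    # Step 4: smallest k whose cumulative contribution reaches the threshold
    k = int(np.searchsorted(cumulative, threshold)) + 1
    k = min(k, len(eigvals))                    # guard against round-off at 1.0
    scores = centered @ eigvecs[:, :k]          # principal component values
    return scores, contribution, k
```

`np.linalg.eigh` is used because the covariance matrix is symmetric. The columns of `scores` are mutually uncorrelated, which is the property the prediction model relies on. Whether the inputs should also be standardized before PCA depends on their units and is left open in this sketch.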

2.4. BP Neural Network Prediction Model

The BP neural network used to predict the compressive strength of concrete is a multilayer feed-forward network based on the error back-propagation algorithm. A flowchart of the analysis and calculation of the proposed model is shown in Figure 2, and the steps are as follows:

Step 1. Initialize the BP neural network. Set the maximum number of training iterations to 10,000, and set the weights $w_{jl}$ between the neurons in the input layer and the hidden layer, the weights $v_l$ between the hidden layer and the output layer, and the thresholds $a_l$ and $b$ to random numbers. Set the learning rate to 0.035 and the target error to 1.1e-2. Calculate the interval of the number of nodes in the hidden layer according to Equation (10), and determine the number of nodes in this layer by comparing the mean absolute error of the predictions on the samples and the coefficient of fit of the samples:

$$n_{\mathrm{hid}} = \sqrt{n_{\mathrm{in}} + n_{\mathrm{out}}} + \alpha, \tag{10}$$

where $n_{\mathrm{in}}$, $n_{\mathrm{hid}}$, and $n_{\mathrm{out}}$ represent the numbers of nodes in the input layer, hidden layer, and output layer, respectively, and $\alpha$ is a constant, $\alpha \in [1, 10]$.

Step 2. It is assumed that $n$ groups of real-time samples of monitoring data for the mixing-based production of concrete are used as inputs. Of them, the sample of the $i$-th group has $m$ data items $x_i = (x_{i1}, x_{i2}, \ldots, x_{im})$. Take the data of the $i$-th sample and the corresponding compressive strength $y_i$ as the training set $\{(x_i, y_i)\}$. Then, the input vector and the expected output response based on sequential assignment can be given as $x_i$ and $y_i$. When the number of training samples is not sufficiently large, input them cyclically.

Step 3. In the process of forward-propagation, calculate the output $H_l$ of the $i$-th group of input samples in the hidden layer and the output $O_i$ of the output layer, followed by the error $e_i$ between $O_i$ and the expected value $y_i$ of the network:

$$H_l = f\!\left(\sum_{j=1}^{m} w_{jl}\,x_{ij} - a_l\right), \qquad O_i = \sum_{l=1}^{n_{\mathrm{hid}}} v_l H_l - b, \qquad e_i = y_i - O_i,$$

where $f$ is an activation function. The sigmoid function ($f(x) = 1/(1 + e^{-x})$) is commonly used as the activation function.

Step 4. In the back-propagation process, adjust the weights $w_{jl}$ and $v_l$ between the neurons of each layer by using the error $e_i$. Update the thresholds $a_l$ and $b$ of the nodes of the network according to $e_i$.

Step 5. Judge whether the network satisfies the termination conditions. If not, return to Step 3.
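The five steps can be illustrated with a compact NumPy sketch of a one-hidden-layer network trained by per-sample gradient descent. Several details here are assumptions rather than facts from the study: the hidden layer uses a sigmoid and the output node is linear, thresholds are subtracted from the weighted sums, the initial values are drawn uniformly from [-1, 1], the termination test compares the mean squared error on the training set with the target error, and the inputs and target are presumed to be scaled to a comparable range. The function names are illustrative.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def predict_bp(X, W, v, a, b):
    """Forward pass: sigmoid hidden layer, linear output node."""
    return sigmoid(X @ W - a) @ v - b


def train_bp(X, y, n_hidden, lr=0.035, max_epochs=10_000,
             target_mse=1.1e-2, seed=0):
    """Train with sequential (per-sample) back-propagation."""
    rng = np.random.default_rng(seed)
    W = rng.uniform(-1, 1, size=(X.shape[1], n_hidden))  # input -> hidden
    v = rng.uniform(-1, 1, size=n_hidden)                # hidden -> output
    a = rng.uniform(-1, 1, size=n_hidden)                # hidden thresholds
    b = rng.uniform(-1, 1)                               # output threshold
    for _ in range(max_epochs):
        for x_i, y_i in zip(X, y):
            H = sigmoid(x_i @ W - a)          # Step 3: hidden-layer output
            e = y_i - (H @ v - b)             # Step 3: prediction error
            delta = e * v * H * (1.0 - H)     # error seen by each hidden node
            # Step 4: gradient-descent updates on the squared error e**2 / 2
            W += lr * np.outer(x_i, delta)
            a -= lr * delta
            v += lr * e * H
            b -= lr * e
        # Step 5: stop once the training error reaches the target
        if np.mean((y - predict_bp(X, W, v, a, b)) ** 2) <= target_mse:
            break
    return W, v, a, b
```

The thresholds move in the opposite direction to the weights because they enter the weighted sums with a minus sign. With unscaled strengths in MPa, the default target error would rarely be reached, so in practice the target has to be interpreted on whatever scale the outputs are normalized to.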

3. Verification of Cases and Analysis of Data

3.1. Data Processing and Verification of the Cases

The examples were taken from 380 days of real-time production monitoring data of C40 concrete with the same mixing proportions, recorded from March 18, 2021 to April 2, 2022, together with the compressive strength measured by the compressive strength test. Because of the large number of records, we averaged the data with the same mixing ratio over every 10 days to obtain one group of sample data. After processing, 38 groups of sample data were obtained, as shown in Table 1.
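The 10-day averaging can be sketched as a block mean over consecutive daily records. The layout assumed here, one row of numeric values per day in chronological order, is an illustration and not the format of the monitoring system, and the function name is hypothetical.

```python
def block_average(daily_rows, block_size=10):
    """Average consecutive blocks of daily records, column by column.

    A trailing block shorter than block_size is dropped.
    """
    groups = []
    for start in range(0, len(daily_rows) - block_size + 1, block_size):
        block = daily_rows[start:start + block_size]
        groups.append([sum(col) / block_size for col in zip(*block)])
    return groups
```

With 380 daily rows and the default block size, this yields 38 groups, matching the number of samples reported in Table 1.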

The samples were divided into training, validation, and test sets at a ratio of approximately 6 : 2 : 2. The data of the first 20 groups (groups 1 to 20) in Table 1 were taken as the training data. The data of the 21st to 29th groups and the 30th to 38th groups were used as the validation group and the prediction group, respectively. To ensure that the grouping was reasonable, the numerical distribution of each variable was examined with half-violin plots, so that outliers concentrated in one group would not affect the accuracy of the model.

3.2. Dimension Reduction of the Input Variables
3.2.1. Spearman’s Correlation Analysis of Input Variables

Spearman’s correlation analysis was used to calculate the correlation between the input variables and the output variables. The results are shown in Table 2.

The results showed that the Spearman correlation coefficients of cement, crushed stone 1, crushed stone 2, crushed stone 3, sand, fly ash, cementitious materials, four aggregates, and water-cement ratio were all higher than the critical value of 0.321 at the 95% confidence level, so these variables were considered to be related to the compressive strength of concrete. The coefficients of water, sand ratio, water-binder ratio, and water reducer were all less than 0.321, so these variables were considered to be unrelated to the compressive strength of concrete and were eliminated from the model. This reduced the number of input variables from 13 to 9.

3.2.2. PCA Analysis of Input Variables

First, a correlation analysis was carried out among the 9 inputs retained by Spearman’s analysis to confirm whether PCA was needed. The results are shown in Table 3.

The results show that, except for the water-cement ratio, the input variables are highly correlated with one another. It was therefore necessary to further reduce the dimension of the input variables through PCA. The results are shown in Table 4. The cumulative contribution rate of the first three principal components was 92.8461%, indicating that they represented most of the information in the input variables and could be used as optimized inputs to the BP neural network to predict the compressive strength of concrete.

According to Equation (9), the expressions of the first three principal components are as follows:

3.3. Verification of Two BP Neural Network-Based Prediction Models
3.3.1. Determining the Number of Nodes in Hidden Layer

The input and output layers of the Spearman and PCA optimized BP neural network model had 3 nodes and 1 node, respectively, and the interval of the number of nodes in its hidden layer was [3, 12]. The input and output layers of the nonoptimized BP neural network model had 13 nodes and 1 node, respectively, and the interval of the number of nodes in its hidden layer was [5, 14]. The first 20 groups of data were selected as training samples and the following 9 groups as testing samples. By predicting the compressive strength of concrete with different numbers of nodes in the hidden layer, we chose the number of nodes that yielded the smallest prediction error as the optimal number for the hidden layer of each model. These values were 6 and 5 for the Spearman and PCA optimized BP neural network model and the nonoptimized model, respectively.
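The two intervals can be checked with a few lines of Python. The sketch assumes that Equation (10) has the common empirical form of the square root of the sum of the input and output node counts plus a constant between 1 and 10, and that the bounds are rounded up to whole nodes. Both points are assumptions, although together they reproduce the intervals quoted above. `validation_error` stands for any hypothetical function that trains a network with a given number of hidden nodes and returns its prediction error.

```python
import math


def hidden_node_interval(n_in, n_out, a_min=1, a_max=10):
    """Bounds on the hidden-layer size from the assumed form of Equation (10)."""
    base = math.sqrt(n_in + n_out)
    return math.ceil(base + a_min), math.ceil(base + a_max)


def select_hidden_nodes(interval, validation_error):
    """Pick the node count in the interval with the smallest error."""
    low, high = interval
    return min(range(low, high + 1), key=validation_error)


optimized_interval = hidden_node_interval(3, 1)      # (3, 12)
nonoptimized_interval = hidden_node_interval(13, 1)  # (5, 14)
```

Each interval contains only ten candidates, so an exhaustive search over the interval is inexpensive compared with training a single network for 10,000 iterations.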

3.3.2. Verifying the Predictions of the Models

The data of the validation group and the prediction group (the 21st to the 38th groups) were used to verify and compare the trained prediction models. The results are shown in Table 5. The mean absolute percentage errors (MAPE) incurred by the Spearman and PCA optimized BP neural network and the nonoptimized network were 2.78% and 4.82%, respectively. Therefore, the optimized BP neural network prediction model was more accurate.

3.4. Analysis
3.4.1. Predictive Accuracy and Applicability

Figure 3 shows a comparison of the prediction results of the nonoptimized and the Spearman and PCA optimized BP neural networks. The results of the latter were more consistent with the empirically acquired values, and, thus, its predictions of the compressive strength of concrete were more representative.

Figure 4 shows the relative errors incurred by the models. Table 6 compares the prediction errors of the two models. The mean absolute errors (MAE) incurred by the Spearman and PCA optimized BP neural network and the nonoptimized network were 1.30 MPa and 2.30 MPa, respectively, and their RMSE values were 1.66 MPa and 2.92 MPa, respectively. The proposed Spearman and PCA optimized BP neural network model thus had clearly smaller prediction errors than the nonoptimized model.
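For reference, the three error measures used in the comparison can be computed as in the following sketch, which uses their standard definitions. The function names are illustrative, and the strengths are assumed to be positive values in MPa so that the percentage error is well defined.

```python
import math


def mape(actual, predicted):
    """Mean absolute percentage error, in percent."""
    return 100.0 * sum(abs((a - p) / a)
                       for a, p in zip(actual, predicted)) / len(actual)


def mae(actual, predicted):
    """Mean absolute error, in the units of the data."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root mean-squared error, in the units of the data."""
    return math.sqrt(sum((a - p) ** 2
                         for a, p in zip(actual, predicted)) / len(actual))
```

MAE and RMSE are expressed in MPa, while MAPE is relative to the measured strength. RMSE weights large individual errors more heavily than MAE, so it is never smaller than MAE on the same data.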

3.4.2. Analysis of Convergence Rate

Figure 5 compares the iteration processes of the two models. The optimized model converged more quickly, and its iteration curve stabilized after about 2,500 iterations. This shows that the optimized model was computationally more efficient than the nonoptimized one.

4. Conclusion

(1) This paper proposes a Spearman and PCA optimized BP neural network model that predicts the compressive strength of concrete from real-time production monitoring data, which addresses the long test period, low engineering efficiency, and high cost of traditional concrete testing.

(2) Spearman’s analysis and PCA are used to reduce the number and dimension of the input variables. This effectively addresses the defects of the traditional BP neural network, such as low calculation efficiency and insufficient prediction accuracy, that are caused by original sample data with many material variables and strong correlations between them.

(3) Three principal components with a cumulative contribution rate of 92.8461% are selected to establish a BP model to predict the compressive strength. Compared against the actual test data, the Spearman and PCA optimized BP model has an MAPE of 2.78%, an RMSE of 1.66 MPa, and an MAE of 1.30 MPa, which are clearly better than the corresponding values of the nonoptimized BP neural network: 4.82%, 2.92 MPa, and 2.30 MPa. By comparison, the MAE and RMSE of the machine learning model in reference [12] are 3.6 MPa and 4.13 MPa, respectively. This verifies that the predicted values of the proposed model are more consistent with the actual compressive strength.

Data Availability

The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.

Authors’ Contributions

K.Z. and H.W. were responsible for the methodology. X.Z. and Y.Z. were involved in the validation. K.Z. was responsible for data curation, writing, and original draft preparation. Y.Z. and H.W. were responsible for writing, reviewing, and editing. H.W. was in charge of supervision. Y.Z. and X.Z. were responsible for project administration. All authors have read and agreed to the published version of the manuscript.

Acknowledgments

This research was funded by the Traffic Research Project of the Department of Transport of Shaanxi Province (grant nos. 18-33X and 21-04X).