Abstract

In this paper, we propose an algorithm to identify and solve systems of high-order equations. We rely on traditional solution methods to build algorithms that solve equations automatically based on deep learning. The proposed method includes two main steps. In the first step, we use YOLOv4 (Kumar et al. 2020; Canu, 2020) to detect equations and characters, together with the VGG-16 network (Simonyan and Zisserman, 2015) to classify the characters. In the second step, we use the SymPy library to solve the equations. The data are images of systems of equations that we typed and designed ourselves or that were handwritten and collected from other sources. We also built a web-based application that lets users select an image from their devices. The results show that the proposed algorithm achieves 95% accuracy, which makes it suitable for smart-education applications.

1. Introduction

Today, artificial intelligence (AI) has revolutionized society, driven by the remarkable and rapid development of science and technology. AI is applied to many aspects of life and makes human life more and more comfortable. Applications of AI and deep learning include the Google Translate tool, facial recognition systems, cancer detection from X-ray images, Tesla's self-driving cars, and smart-education applications. These applications are of great practical value.

Solving equations is a fundamental problem with many applications in learning and research. We aim to solve not only simple equations but also higher-order equations with complex forms. For such equations, it is difficult to determine the type of problem and how it should be solved.

Manual computation with simple equations, such as cubic, quadratic, and first-degree equations in one unknown or systems of first-order equations, is usually not too difficult. However, it is difficult to find a solution when a system of equations has many unknowns. In that case, we have to try many different values and, once a candidate solution is found, carry out the next steps. Testing values in this way is difficult to perform by hand. Therefore, we address this problem with computer vision, building on the current development of deep learning (DL).

The goal of the paper is to identify and solve an equation or a system of high-order equations. Our proposed algorithm makes four contributions, as follows.

Firstly, we rely on traditional solution methods to build algorithms that solve equations automatically using DL.

Secondly, the authors of [1] use the YOLOv4 model to optimize the speed and accuracy of object detection. The results show that the model detects objects at a real-time speed of 65 frames per second (FPS) on the Tesla V100, although its accuracy is limited to 43.5% AP and 65.7% AP50 on the MS COCO dataset. The authors of [2] use simple online and real-time tracking (SORT) to track objects through longer periods of occlusion, which reduces the number of identity switches. The algorithm runs at 20 Hz, and ID switches are reduced from 1423 to 781 (45%). Therefore, we use YOLOv4, an algorithm that has been evaluated to be highly feasible [3, 4], to identify equations in the identification block.

Thirdly, the accuracy of VGG-16 is similar to that of VGG-19 while its number of parameters is smaller, as shown in Table 1. Therefore, we use VGG-16 [5] in the final block to increase the accuracy of character recognition.

Finally, we built the datasets ourselves from different sources.

The rest of the paper is organized as follows. In Section 2, we present related work. In Sections 3 and 4, we present the proposed model and evaluate its effectiveness, respectively. Finally, we conclude the paper in Section 5.

2. Related Work

Identifying and solving equations and systems of high-order equations is one of many practical applications of DL [6–20]. Such solutions are deployed in many applications, such as Quanda, Photomath, and Mathway, with millions of users on iOS and Android.

Many solutions are based on handwriting recognition [6, 7]. A scheme for recognizing simple mathematical expressions based on SVM is proposed in [8]. The paper focuses on techniques for feature extraction and identification. The authors of [21] transform technical and scientific literature into electronic form by recognizing mathematical symbols and expressions. Features are calculated using the suggested image center and bounding box techniques. Classification is performed by a neural network with an accuracy of up to 90%. The authors of [22] propose a diagonal feature extraction technique for handwritten characters using a feed-forward neural network algorithm. The technique uses vertical, horizontal, and diagonal features for character classification.

Recognition of mathematical symbols using convolutional neural networks (CNNs) is also proposed [9, 10, 13, 16, 18, 20]. In recent years, CNNs have produced a series of research results in image classification. DL, an offshoot of artificial neural networks, has shown good potential in solving classification problems. DL began to evolve rapidly in 2012, when Krizhevsky et al. demonstrated outstanding image classification results with their network architecture, AlexNet [10].

In [20], the authors extract characters by equation segmentation and character segmentation based on image processing techniques. The extracted characters are fed into a CNN model, and the classified string is then processed and solved, which gives better results. The authors of [23] propose a system of model equations for solving a wide range of algebra word problems. The authors of [24] handle arithmetic problems with multiple steps and operations without depending on annotations or predefined patterns. The authors of [25] present a new way of learning how to use formulas to solve simple addition problems in arithmetic.

In [18], the authors propose a method for solving algebraic equations to identify parameters in heat conduction equations, alongside classic Tikhonov regularization algorithms. The results show that the regularization method based on the dynamic system method is more effective than Tikhonov regularization. In [13], the authors propose a method for identifying mathematical formulas in PDF documents. The results show that the proposed method improves accuracy by up to 80%. In [16], the authors study the pre-algebraic problem solving of fifth-year students before and after applying the math model. The results show that the method is able to solve many challenging arithmetic word problems. In [17], the authors propose a regularization strategy, the mollification method, to analyze the stability of a problem. The results show that the proposed method is effective and stable.

In [20], the authors recognize and solve handwritten quadratic equations. They use the NIST dataset, which is similar to the well-known MNIST dataset of digit characters, with 2000 images per character for training. They propose line and character segmentation to separate characters from the image for classification. Since line segmentation and character segmentation are purely image processing methods, they can fail when handwritten characters touch each other or when the writing is slanted. Besides, the scope of the problem is confined to quadratic equations.

Based on the above analysis, we propose to identify equations and characters by combining YOLOv4 and VGG-16. The scope of our problem also extends to solving equations or systems of high-order equations that are typed, handwritten, or taken from other sources. To the best of our knowledge, using CNN for solving mathematical equations has not been considered in the literature.

3. Proposed Solution

3.1. System Overview

The proposed system is described in Figure 1.

In the first step, we identify equations with a YOLOv4 model trained on an image dataset gathered from multiple sources. In this step, if an image is skewed, the bounding box of an equation partially overlaps the equations above or below it. When an image is tilted, its characters are also skewed. Therefore, recognizing and classifying each character will not be accurate.

To solve this problem, we rotate the image based on the bounding boxes identified in the first step. After rotating the image, we detect the equations again. We then use YOLOv4 to detect characters and the VGG-16 model to classify them based on their bounding boxes. The result is a raw string, which we process before passing it to the SymPy module. SymPy then returns the solution. To make the system accessible to users, we deploy the model with the Flask framework.

3.2. Implementation Steps
3.2.1. Equation Recognition

Among YOLO [26], single shot detector (SSD) [27], R-CNN [28], and faster R-CNN [29], YOLO performs best in comparison. The algorithm uses a CNN (Darknet, whose original version is based on GoogLeNet) that divides the input into a grid of cells. Each cell directly predicts bounding boxes and classifies objects. YOLO is highly capable in object recognition, compact, and able to run in real time at high speed when deployed on devices. Therefore, we use YOLOv4 to recognize equations [3, 4].

The implementation includes four steps as follows:
(i) Step 1: preparing data. We first prepare data from online sources (such as Google and the Internet), data that we create ourselves, and data from textbooks, as shown in Figure 2. In this case, the number of images used to train YOLOv4 is 1046.
(ii) Step 2: labeling the data. In this step, we define the bounding boxes of equations by manual labeling with the LabelImg tool. This process essentially consists of drawing boxes around the equations in an image. Figure 3 shows an example in which the LabelImg tool automatically generates an “a.txt” file. The file describes the position of each equation in the image.
(iii) Step 3: running the model on Google Colab. To train on our dataset, we use a server provided by Google Colab, which can run Python code and libraries and take full advantage of GPUs.
There are several implementations of the YOLO algorithm. In this paper, we use Darknet, an open-source framework written in C and CUDA that serves as the foundation of YOLO. It has the advantage of supporting both CPU and GPU computation. Darknet is used as the framework for YOLO training. We upload the folder containing the images and the corresponding “.txt” label files to Google Drive.
Next, we edit the “yolov4.cfg” file in the Darknet folder as follows:
(1) Changing subdivisions to 64
(2) Changing classes from 80 to 1 at lines 610, 696, and 783
(3) Changing the filters value from 255 to 24 at lines 603, 689, and 776
(4) Changing batch to 64
(5) Setting max_batches to $2000 \times N$, where $N$ is the number of classes (here $N = 1$)
(6) Setting width and height to a multiple of 32
As a final step, we download the pretrained weights “yolov4.conv.137” from AlexeyAB and perform training. Once the data and the “.txt” files generated by LabelImg are available, we upload them to Google Colab for training. The results are shown in Figure 4.
(iv) Step 4: identifying equations with the trained model. After 2000 training iterations, we obtain the weights in the “yolov4_training_2000.weights” file. We load these weights and the configuration parameters with the OpenCV library to identify equations. The results are shown in Figure 5.
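
The label files from Step 2 can be read back with a few lines of Python. The following is a minimal sketch that assumes LabelImg was set to its YOLO output format, in which each line holds a class index followed by the box center, width, and height, all normalized to the image size. The function name `yolo_label_to_box` and the sample values are illustrative and are not taken from the dataset.

```python
def yolo_label_to_box(line, img_w, img_h):
    """Convert one YOLO-format label line to pixel coordinates.

    Assumed line layout: "<class> <cx> <cy> <w> <h>", with the last four
    values normalized to [0, 1] relative to the image width and height.
    Returns (class_id, x_min, y_min, x_max, y_max).
    """
    parts = line.split()
    class_id = int(parts[0])
    cx, cy, w, h = (float(v) for v in parts[1:5])
    x_min = round((cx - w / 2) * img_w)
    y_min = round((cy - h / 2) * img_h)
    x_max = round((cx + w / 2) * img_w)
    y_max = round((cy + h / 2) * img_h)
    return class_id, x_min, y_min, x_max, y_max


# Hypothetical label for a single equation (class 0) in a 1000 x 400 image.
box = yolo_label_to_box("0 0.5 0.25 0.8 0.1", img_w=1000, img_h=400)
print(box)  # (0, 100, 80, 900, 120)
```

Converting labels back to pixel boxes in this way is a convenient check that the manual annotations enclose the equations before training starts.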

3.2.2. Image Rotation

Since not all of our images are taken straight, recognition can be inaccurate. Therefore, we rotate the images so that equations or systems of equations are recognized more accurately [30–32].

The implementation steps are as follows:
(i) Step 1: identifying the equations and their area. This step includes two substeps. First, we identify the area that contains the equations. We then identify each equation individually. The results are shown in Figures 6 and 7.
(ii) Step 2: processing the equation area. We convert the equation area to white characters on a black background. Any region outside the bounding box is filled with black. We use OpenCV functions such as “cv2.cvtColor”, “cv2.bitwise_not”, and “cv2.threshold” to perform these tasks. The results are shown in Figures 8 and 9.
(iii) Step 3: rotating the image. The steps are performed as follows:
(1) Determining the rectangle with the smallest area surrounding the characters and the rotation angle
(2) Finding the matrix to rotate the image
(3) Rotating the image
To define the rectangle with the smallest area around the characters and the rotation angle, we first collect all pixels with values greater than zero. We then use the function “cv2.minAreaRect” to determine the rotation angle ($\theta$), as depicted in Figures 10 and 11.

After obtaining the rotation angle of the equation, we calculate the center coordinates of the original image. We then calculate the matrix that rotates the image according to [33]. The matrix is obtained from the rotation angle ($\theta$) and the center coordinates of the image.

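If we have an image of width $w$ and height $h$, the center coordinate of the image is as follows:

$$(c_x, c_y) = \left(\frac{w}{2}, \frac{h}{2}\right).$$

We can represent the transformation matrix as follows:

$$M = \begin{bmatrix} \alpha & \beta & (1 - \alpha)\,c_x - \beta\,c_y \\ -\beta & \alpha & \beta\,c_x + (1 - \alpha)\,c_y \end{bmatrix},$$

where $\alpha = \text{scale} \cdot \cos\theta$, $\beta = \text{scale} \cdot \sin\theta$, and $\theta$ is the rotation angle. In the paper, we choose $\text{scale} = 1$.

The image transformation can be represented by an affine transformation. The matrix can be used to represent rotation, translation, and scaling operations. The usual way to represent an affine transformation is to use a $2 \times 3$ matrix. We have the matrix calculated as follows:

$$A = \begin{bmatrix} a_{00} & a_{01} \\ a_{10} & a_{11} \end{bmatrix}, \quad B = \begin{bmatrix} b_{00} \\ b_{10} \end{bmatrix}, \quad M = \begin{bmatrix} A & B \end{bmatrix} = \begin{bmatrix} a_{00} & a_{01} & b_{00} \\ a_{10} & a_{11} & b_{10} \end{bmatrix}.$$

The 2D image vector is as follows:

$$X = \begin{bmatrix} x \\ y \end{bmatrix}.$$

The 2D image vector after transformation is as follows:

$$T = A \cdot \begin{bmatrix} x \\ y \end{bmatrix} + B = M \cdot \begin{bmatrix} x \\ y \\ 1 \end{bmatrix}.$$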

To save time, we perform the matrix calculation steps with the two functions “cv2.getRotationMatrix2D” and “cv2.warpAffine”. The results are shown in Figure 12.
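
To make the matrix calculation concrete, the sketch below rebuilds the $2 \times 3$ rotation matrix in plain Python, following the formula documented for “cv2.getRotationMatrix2D” with $\alpha = \text{scale} \cdot \cos\theta$ and $\beta = \text{scale} \cdot \sin\theta$. The helper names `rotation_matrix_2d` and `apply_affine`, the image size, the use of $(w/2, h/2)$ as the center, and the 5-degree angle are assumptions made for illustration only.

```python
import math


def rotation_matrix_2d(center, angle_deg, scale=1.0):
    """Build the 2x3 affine matrix for a rotation about `center`.

    A positive angle rotates counter-clockwise when the origin is the
    top-left image corner, as in OpenCV's convention.
    """
    cx, cy = center
    theta = math.radians(angle_deg)
    alpha = scale * math.cos(theta)
    beta = scale * math.sin(theta)
    return [
        [alpha, beta, (1 - alpha) * cx - beta * cy],
        [-beta, alpha, beta * cx + (1 - alpha) * cy],
    ]


def apply_affine(m, point):
    """Map a point (x, y) through the 2x3 matrix m, i.e. m . [x, y, 1]^T."""
    x, y = point
    return (
        m[0][0] * x + m[0][1] * y + m[0][2],
        m[1][0] * x + m[1][1] * y + m[1][2],
    )


# Illustrative values: a 640 x 480 image rotated by 5 degrees about its center.
w, h = 640, 480
center = (w / 2, h / 2)
M = rotation_matrix_2d(center, angle_deg=5.0)

# The rotation center is a fixed point of the transformation.
print(apply_affine(M, center))
```

In practice the same matrix would be passed to “cv2.warpAffine”, which applies it to every pixel of the image rather than to a single point.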

3.2.3. Equation Recognition

After rotating the image, we will detect the equation again. The purpose of this step is to identify equations more accurately. Results are shown in Figure 13.

3.2.4. Character Recognition

In this step, we follow the same procedure as in the equation recognition step (Section 3.2.1), this time to detect characters. The number of training images is 285.

3.2.5. Character Classification

The implementation steps are as follows:
(i) Step 1: preparing the data. We prepare data for 20 characters, as shown in Figure 14 and Table 2. The images in the dataset are black-and-white images in which the foreground is white and the background is black.
The model used for training is VGG-16, proposed in [5]. It has 13 Convolution2D layers and 3 fully connected layers, as shown in Figure 15, and takes an input image of fixed size.
The features of the VGG-16 architecture are used as follows:
(1) Input layer: the VGG-16 architecture receives an input image with three color channels (red, green, and blue).
(2) Convolution layers: the image passes through 13 convolution layers with $3 \times 3$ kernels and a stride of 1, so the spatial size of the output of each convolution layer equals that of its input.
(3) Batch normalization: after each convolution layer, we apply batch normalization to the data.
(4) Activation function: ReLU is used as the activation function.
(5) Pooling layers: MaxPooling is used with size $2 \times 2$ and an equal stride. A MaxPooling layer follows every 2 or 3 consecutive convolution layers.
(6) Fully connected layers: the first layer has 512 nodes, and the last one has 20 nodes, corresponding to the 20 character labels that we want to classify.
(7) The total number of parameters is 14,932,692, of which 14,924,244 are trainable and 9,448 are non-trainable.
(ii) Step 2: performing training. Next, we train the VGG-16 model. The results are shown in Figure 16.
(iii) Step 3: performing classification. In this step, we classify the data. The results are shown in Figure 17.
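
As a rough check on where the parameters of the network sit, the sketch below counts the weights and biases of the 13 convolution layers alone, assuming the standard VGG-16 channel widths, $3 \times 3$ kernels, and a three-channel input. Batch normalization and the fully connected layers are deliberately left out, so the figure is only the convolutional share of the total reported above. The names `VGG16_BLOCKS`, `conv_params`, and `count_conv_params` are illustrative.

```python
# Output channels of the 13 convolution layers, grouped into the five
# VGG-16 blocks; a MaxPooling layer follows each block.
VGG16_BLOCKS = [
    [64, 64],
    [128, 128],
    [256, 256, 256],
    [512, 512, 512],
    [512, 512, 512],
]


def conv_params(in_ch, out_ch, k=3):
    """Weights (k*k*in_ch*out_ch) plus one bias per output channel."""
    return k * k * in_ch * out_ch + out_ch


def count_conv_params(blocks, in_channels=3):
    """Return (number of conv layers, total conv parameters)."""
    n_layers, total, ch = 0, 0, in_channels
    for block in blocks:
        for out_ch in block:
            total += conv_params(ch, out_ch)
            ch = out_ch
            n_layers += 1
    return n_layers, total


n_layers, n_params = count_conv_params(VGG16_BLOCKS)
print(n_layers, n_params)  # 13 14714688
```

Under these assumptions the convolution layers hold about 14.7 million parameters, so nearly all of the model's capacity lies in the convolutional part rather than in the small 512-node and 20-node fully connected layers.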

3.2.6. Processing Text and Solving Equation

Combining equation recognition and character recognition gives the results shown in Figure 18.
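
One simple way to combine the two detection results into raw strings is sketched below. It assumes that every box is given as (x_min, y_min, x_max, y_max) in pixels, that a character belongs to the equation whose box contains the character's center, and that reading order is left to right. The function name `boxes_to_strings` and the sample boxes are hypothetical, and the sketch ignores vertical offsets such as raised exponents.

```python
def boxes_to_strings(equation_boxes, char_detections):
    """Group classified characters by equation and read them left to right.

    equation_boxes: list of (x_min, y_min, x_max, y_max).
    char_detections: list of (label, (x_min, y_min, x_max, y_max)).
    Returns one string per equation, ordered from top to bottom.
    """
    def center(box):
        x0, y0, x1, y1 = box
        return ((x0 + x1) / 2, (y0 + y1) / 2)

    def contains(box, point):
        x0, y0, x1, y1 = box
        px, py = point
        return x0 <= px <= x1 and y0 <= py <= y1

    strings = []
    for eq in sorted(equation_boxes, key=lambda b: b[1]):  # top to bottom
        chars = [
            (center(box)[0], label)
            for label, box in char_detections
            if contains(eq, center(box))
        ]
        chars.sort(key=lambda item: item[0])  # left to right
        strings.append("".join(label for _, label in chars))
    return strings


# Hypothetical detections for the system x + y = 3, x - y = 1, in arbitrary order.
equation_boxes = [(10, 60, 200, 100), (10, 10, 200, 50)]
char_detections = [
    ("=", (95, 20, 115, 40)), ("x", (20, 20, 40, 40)), ("1", (120, 70, 140, 90)),
    ("y", (70, 20, 90, 40)), ("-", (45, 70, 65, 90)), ("3", (120, 20, 140, 40)),
    ("+", (45, 20, 65, 40)), ("x", (20, 70, 40, 90)), ("=", (95, 70, 115, 90)),
    ("y", (70, 70, 90, 90)),
]
lines = boxes_to_strings(equation_boxes, char_detections)
print(lines)  # ['x+y=3', 'x-y=1']
```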

After extracting the text from the image, we use the SymPy module to solve the equation. SymPy aims to be a fully featured computer algebra system while keeping its code as simple as possible so that it is easy to understand and extend.
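
The text processing and solving step can be sketched as follows. The sketch assumes that the raw string uses lowercase letters for unknowns, “^” for powers, and implicit multiplication such as “2x”; these conventions, and the helper names `normalize_equation` and `solve_system`, are assumptions for illustration rather than the exact processing used in the system. Only `sympy.sympify` and `sympy.solve` are SymPy calls.

```python
import re


def normalize_equation(raw):
    """Rewrite a raw string such as '2x^2-3x+1=0' as 'lhs-(rhs)'.

    The result is an expression string that equals zero at the solutions
    and uses explicit '*' and '**' operators.
    """
    s = raw.replace(" ", "").replace("^", "**")
    s = re.sub(r"(\d)([a-z])", r"\1*\2", s)     # 2x -> 2*x
    s = re.sub(r"([a-z])(?=[a-z])", r"\1*", s)  # xy -> x*y
    lhs, rhs = s.split("=")
    return f"{lhs}-({rhs})"


def solve_system(raw_equations):
    """Solve one or more recognized equations with SymPy.

    SymPy is imported here so that the text processing above can be used
    and tested without it. Note that sympify evaluates its input, so it
    should only receive strings produced by the recognizer.
    """
    import sympy

    exprs = [sympy.sympify(normalize_equation(r)) for r in raw_equations]
    unknowns = sorted(set().union(*(e.free_symbols for e in exprs)), key=str)
    return sympy.solve(exprs, unknowns, dict=True)


print(normalize_equation("2x^2-3x+1=0"))  # 2*x**2-3*x+1-(0)
```

For example, `solve_system(["2x^2-3x+1=0"])` passes the normalized expression to SymPy, whose solutions correspond to the roots $x = 1/2$ and $x = 1$, and a call such as `solve_system(["x+y=3", "x-y=1"])` handles a system in the same way.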

3.2.7. Deploying on Website

In the final step, we deploy the program as a website application. We use the Flask framework, which is easy to implement.

4. Simulation and Result

4.1. Setup

In our simulation, we set up the parameters as follows. The VGG-16 architecture receives an input image with three color channels (red, green, and blue). There are 13 convolution layers with $3 \times 3$ kernels and a stride of 1. We use ReLU as the activation function. MaxPooling is used in the VGG-16 architecture with size $2 \times 2$. The first fully connected layer has 512 nodes, and the last fully connected layer has 20 nodes. The total number of parameters is 14,932,692, of which 14,924,244 are trainable.

We use 23 images for testing and evaluate two metrics: the recognition accuracy for equations and characters, and the processing time. We use the Tesla P100-PCIE-16 GB GPU for training and testing.

4.2. Result

Our system recognizes equations with an accuracy of up to 91.3% on a test set of 23 images. For character recognition and classification, the accuracy of our system is up to 97%. The results are shown in Figures 19–22.

In Figure 19, the results show an accuracy of up to 100% for both first- and second-order equations. In Figure 20, we evaluate the loss function of the YOLOv4 model. The results show that the loss decreases as the number of iterations increases. When the number of iterations reaches 2000, the average loss is 3.5%, with a reported time of 0.16 hours.

Figures 21 and 22 show the loss and accuracy curves of the VGG-16 model. Both curves saturate after about 10 epochs. The results show that the proposed model is highly feasible, especially for teaching and training.

Table 3 shows the accuracy results. In Table 3, we see that the VGG-16 model achieves an accuracy 2% higher than the YOLOv4 model.

Table 4 shows the accuracy results when we evaluate 23 different equations. The recognition rate is 96%, with 22 correct cases and only 1 incorrect recognition.
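
The recognition rate quoted for Table 4 follows directly from the two counts, as the short calculation below shows:

$$\text{recognition rate} = \frac{22}{22 + 1} = \frac{22}{23} \approx 0.957 \approx 96\%.$$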

We see that the processing time per equation is 14 seconds on Google Colab with a Tesla P100-PCIE-16 GB GPU.

By comparison, the traditional method of calculating by hand is easy for simple problems. However, for high-order equations or problems with many parameters, it is difficult to find a solution by hand. Therefore, the proposed system, with a processing time of up to 14 seconds, helps students and researchers quickly find a solution, which is useful for smart applications.

Besides, we compare the accuracy of the proposed method with other methods. The results in Table 5 show that the proposed method has the best results in terms of handwriting recognition and equation solving, with an accuracy of up to 95%.

5. Conclusion

The paper focuses on the study of neural networks for identifying and solving systems of equations. In this paper, we have identified equations with an accuracy of 97% and recognized and classified characters with an accuracy of up to 95%. However, the system has the disadvantage that its processing is not fast, since each image has to pass through several models.

Therefore, in future work, we will simplify the system by reducing the number of models and collect more data on equations and characters to improve its accuracy.

Data Availability

The authors confirm that all the data in the study were built by themselves. They used 23 images for training and testing. They also used the Tesla P100-PCIE-16GB GPU to perform simulation.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

This research was carried out in the framework of the project funded by the Ministry of Education and Training (MOET), Vietnam, under the grant B2020-BKA-06. The authors would like to thank the MOET for their financial support.