Abstract

This study aims to develop a robust metalearning system for rapid classification across a large number of tasks. Model-agnostic metalearning (MAML) with the CACTUs method (clustering to automatically construct tasks for unsupervised metalearning) is improved into EW-CACTUs-MAML by integrating the entropy weight (EW) method. Few-shot mechanisms are introduced into the deep network for efficient learning over a large number of tasks. The implementation process is theoretically interpreted as “gene intelligence.” Validation of EW-CACTUs-MAML on a typical dataset (Omniglot) yields an accuracy of 97.42%, outperforming CACTUs-MAML (validation accuracy = 97.22%). At the end of this paper, the applicability of our approach to improving another metalearning system (EW-CACTUs-ProtoNets) is also preliminarily discussed, based on a cross-validation on another typical dataset (Miniimagenet).

1. Introduction

Generally, a learning algorithm $\mathcal{A}$ is defined as a procedure for processing the data $D$ to make predictions for every input $x$ [1]. That is, $\mathcal{A}$ is a particular function that maps $D$ to a predictor $\hat{f}$. In this sense, the goal of machine learning is to recover a function from data, including learning classifiers, regressors, and policies [2]. Consequently, the learning algorithm is said to be consistent if
$$\hat{f} = \mathcal{A}(D) \longrightarrow f \quad \text{as} \quad |D| \longrightarrow \infty. \tag{1}$$

Differing from traditional machine learning, metalearning is interpreted as “learning to learn,” which can achieve (1), where the mapping from $D$ to $\hat{f}$ can itself be represented by a universal metalearner [3]. The main research directions of metalearning include metalearning based on the metric space (e.g., prototypical networks), metalearning based on parameter optimization (e.g., model-agnostic metalearning), and model-based metalearning (e.g., reinforcement metalearning) [15]. The datasets for metalearning are very large, and hence the automatic classification of learning tasks has always been a great challenge [6]. Because of this challenge, few engineering applications of metalearning have been reported [7, 8].

The objectives of this study are (1) to analyze the major reasons for the challenge, (2) to develop a method for tackling the challenge, and (3) to propose a scheme for engineering applications of metalearning. The remainder of the paper is organized as follows. In Section 2, we formulate the problem as a challenge in large-scale matrix operations; in Section 3, we theoretically analyze how to further improve classification accuracy and efficiency. Experiments and discussion are presented in Section 4, where the room for improvement in parameter optimization is also highlighted.

2. Problem Formulation

2.1. Representation of the Model

We utilize the entropy weight method to improve the metalearning process, with model-agnostic metalearning (MAML) employed as the base learner [9–12].

Let $\theta$ be the vector of initial parameters for the model $f$ and $\theta'$ denote the updated parameters. Let $\alpha > 0$ be the learning rate. For $K$-shot learning, we use a 5-way 5-shot setting to build the prediction model [13–15].

According to the universal function approximation theorem [16–20], $f_{\theta}$ can also be represented as an approximator for continuous functions on compact subsets of $\mathbb{R}^n$.

2.2. Interpreting the Learning Process

Let $l$ index the $l$th task and $x$ represent the input feature values, which are evaluated with parameters $\theta$, bias $b$, and transformation variable $z$. Let $W$ represent the weight matrices, which comprise a set of linear layers with nonnegative inputs and activations. Let $\hat{y} = f_{\theta}(x)$ be the output function. Let $\theta'$ be the learned parameters.

We improve the traditional gradient descent used to update the weights of the learner $f$, which can be represented as
$$\theta' = \theta - \alpha \nabla_{\theta} \mathcal{L}_{\mathcal{T}_l}(f_{\theta}).$$
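As an illustration, consider the following minimal NumPy sketch of one such inner-loop step. It assumes a hypothetical linear classifier with softmax outputs; the names and the model are illustrative, not the exact network used in our experiments.

import numpy as np

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)      # subtract row max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def inner_update(W, b, x, y_onehot, alpha=0.01):
    # One gradient-descent step theta' = theta - alpha * grad L on a task's
    # support set, for a linear classifier f(x) = softmax(x W + b).
    p = softmax(x @ W + b)                     # predicted class probabilities
    g = (p - y_onehot) / len(x)                # gradient of mean cross-entropy w.r.t. logits
    return W - alpha * (x.T @ g), b - alpha * g.sum(axis=0)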

Choose $W$ and $b$ such that the updated learner $f_{\theta'}$ fits the support-set labels.

Let $\operatorname{discr}(\cdot)$ denote a function that produces a $K$-shot discretization of its inputs. Select $W$ and $b$ such that the post-update output $f_{\theta'}(x)$ matches the target values on the discretized inputs.

The loss in classification is calculated with a cross-entropy function
$$\mathcal{L}_{\mathcal{T}_l}(f_{\theta}) = -\sum_{(x, y) \in \mathcal{T}_l} y \log f_{\theta}(x).$$
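A minimal sketch of this loss (assuming predicted class probabilities p and one-hot labels; a small constant guards against log 0) is:

import numpy as np

def cross_entropy(p, y_onehot, eps=1e-12):
    # mean cross-entropy between predictions and one-hot labels
    return -np.mean(np.sum(y_onehot * np.log(p + eps), axis=1))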

A simplified interpretation of metalearning processes is shown in Figure 1.

3. Theoretical Analyses

3.1. Construction of Tasks

Suppose there is an embedding learning algorithm on $D$; then, we can obtain the mapped data $\{z_i\}$ in the embedding space $\mathcal{Z}$. For the cluster $C_k$, the centroid $c_k$ of cluster $C_k$ is calculated from
$$c_k = \frac{1}{|C_k|} \sum_{z \in C_k} z.$$

Given a source matrix $R = (r_{ij})_{m \times n}$, where $r_{ij}$ is the value of the $j$th indicator for the $i$th sample, the weight vector calculated from $R$ with the entropy weight method is
$$w_j = \frac{1 - e_j}{\sum_{k=1}^{n} (1 - e_k)}, \qquad e_j = -\frac{1}{\ln m} \sum_{i=1}^{m} p_{ij} \ln p_{ij}, \qquad p_{ij} = \frac{r_{ij}}{\sum_{i=1}^{m} r_{ij}}.$$
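The calculation can be sketched in a few lines of NumPy (assuming a nonnegative source matrix; function and variable names are illustrative):

import numpy as np

def entropy_weights(R):
    # R: (m samples x n indicators), nonnegative
    m, n = R.shape
    P = R / R.sum(axis=0, keepdims=True)       # proportions p_ij per indicator
    logP = np.zeros_like(P)
    mask = P > 0
    logP[mask] = np.log(P[mask])               # convention: 0 * log 0 = 0
    e = -(P * logP).sum(axis=0) / np.log(m)    # normalized entropy e_j
    d = 1.0 - e                                # degree of diversification
    return d / d.sum()                         # weights w_j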

The prototype $c_k$ of the $k$th class is generated from
$$c_k = \frac{1}{|S_k|} \sum_{(x_i, y_i) \in S_k} f_{\theta}(x_i).$$

Hence, the set of examples labeled with class $k$ is
$$S_k = \{(x_i, y_i) \in D \mid y_i = k\}.$$
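A minimal sketch of the prototype computation (assuming precomputed embeddings and integer labels; names are illustrative):

import numpy as np

def class_prototypes(embeddings, labels, num_classes):
    # c_k = mean embedding of the examples labeled with class k
    return np.stack([embeddings[labels == k].mean(axis=0)
                     for k in range(num_classes)])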

We utilize $k$-means clustering to partition the embedded data, obtaining a set of partitions $\{P_p\}$ [21–28]. Let $N$ be a support set of one-shot labels and $Q$ be a query set. Each task can then be sampled by applying a random permutation to the one-shot labels obtained from CACTUs.
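The sketch below illustrates this CACTUs-style task construction: the embeddings are clustered with k-means, cluster indices serve as pseudo-labels, and an N-way task is sampled with permuted labels. It assumes scikit-learn is available and that enough clusters contain at least k_shot + q_query examples; all names are illustrative.

import numpy as np
from sklearn.cluster import KMeans

def sample_task(Z, n_way=5, k_shot=5, q_query=5, n_clusters=50, seed=0):
    rng = np.random.default_rng(seed)
    pseudo = KMeans(n_clusters=n_clusters, n_init=10,
                    random_state=seed).fit_predict(Z)   # pseudo-labels
    # keep clusters large enough to supply support and query examples
    ok = [c for c in range(n_clusters)
          if (pseudo == c).sum() >= k_shot + q_query]
    chosen = rng.choice(ok, size=n_way, replace=False)
    support, query = [], []
    for new_label, c in enumerate(rng.permutation(chosen)):  # permuted labels
        idx = rng.choice(np.where(pseudo == c)[0],
                         size=k_shot + q_query, replace=False)
        support += [(i, new_label) for i in idx[:k_shot]]
        query += [(i, new_label) for i in idx[k_shot:]]
    return support, query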

3.2. Parameters’ Optimization

The entropy weight method is utilized to compute relative weights for the data of tasks $D_J$ and for adapting to new tasks $D_{J_i}$; these weights also influence the parameters of the model through the gradient-descent calculations [29, 30]. Let $\beta$ be the global learning rate (a fixed metalearning parameter). Then,
$$\theta \leftarrow \theta - \beta \nabla_{\theta} \sum_{J_i \sim P(J)} w_{J_i}\, \mathcal{L}_{D_{J_i}}\big(f_{\theta'_{J_i}}\big),$$
where $w_{J_i}$ are the entropy weights of the tasks.

Parameters are optimized by sampling tasks $J_i$ from $P(J)$, each associated with its adapted parameters $\theta'_{J_i}$.

The goal of the optimization process is to use the updated parameters to compute the outer-layer updates. Let $\alpha$ be the learning rate in the inner layer. The parameters’ heredity during the optimization process is
$$\theta'_{J_i} = \theta - \alpha \nabla_{\theta} \mathcal{L}_{D_{J_i}}(f_{\theta}).$$
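As a sketch, the outer-loop update below combines the post-adaptation task gradients with the entropy weights; it is a first-order illustration that ignores the second-order terms of full MAML, and all names are hypothetical.

import numpy as np

def meta_update(theta, task_grads, task_weights, beta=0.001):
    # theta:        meta-parameter vector (the inherited 'gene')
    # task_grads:   gradients of each task loss at its adapted parameters
    # task_weights: entropy weights w_J for the tasks (summing to 1)
    g = sum(w * gj for w, gj in zip(task_weights, task_grads))
    return theta - beta * g                    # theta <- theta - beta * weighted gradient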

The relationship between the total loss and the task loss during the parameters’ optimization process is shown in Figure 2.

3.3. Theoretical Implementation

The implementation of EW-CACTUs-MAML includes two steps, which can be theoretically interpreted as “gene intelligence” (to highlight parameters’ heredity).

First, in order to implement multistep gradient updates, define an initial gene (that is, the initialization parameter $\theta_0$). The multistep gradient updates can be implemented by computing updated genes from the input training tasks. Second, continue to join the training data for each task and update the genes. The optimal genes are obtained through multiple gradient descents. Naturally, the parameters for a certain task may need to be updated several times to reach the optimal result, as shown in Figure 3.
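A minimal sketch of this multistep adaptation (grad_fn is a hypothetical callable returning the gradient of one task's loss at the given parameters):

import numpy as np

def adapt_gene(theta0, grad_fn, alpha=0.01, n_steps=5):
    # start from the initial gene theta0 and apply several gradient updates
    theta = np.asarray(theta0, dtype=float).copy()
    for _ in range(n_steps):
        theta = theta - alpha * grad_fn(theta)  # one gradient update per step
    return theta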

To simplify the genetic process, the ideal future case is that a single update suffices to find a good gene, and that only a small amount of few-shot data is needed throughout the learning process, as shown in Figure 4.

4. Experiments and Discussion

4.1. Performance of the Model

Two typical datasets, the Omniglot dataset and the Miniimagenet dataset, are employed in this section. The Miniimagenet dataset has been widely used in the fields of metalearning and few-shot learning [31–37]. The well-known original reference for the dataset is [37], where matching networks for one-shot learning were presented to tackle a key challenge in machine learning: learning from a few examples. Miniimagenet has since become a benchmark dataset in the field of metalearning and few-shot learning [38–40]. The dataset contains 60,000 color images of size 84 × 84 in 100 categories, with 600 samples in each category [41]. The Omniglot dataset contains 1,623 handwritten characters from 50 different alphabets, drawn online by 20 different people via Amazon’s Mechanical Turk [42]. Each image is paired with stroke data; for the coordinate sequence [x, y, t] of each stroke, the time t is in milliseconds [43]. Omniglot is a benchmark dataset in the field of one-shot and few-shot learning [40, 44–49]. We utilize 60% of the Omniglot dataset as the training set and 40% as the validation set, as shown in Figure 5.
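The split itself is straightforward; a minimal index-level sketch (names are illustrative) is:

import numpy as np

def split_dataset(num_items, train_frac=0.6, seed=0):
    # shuffle indices and split 60/40 into training and validation sets
    idx = np.random.default_rng(seed).permutation(num_items)
    cut = int(train_frac * num_items)
    return idx[:cut], idx[cut:]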

Over 300 iterations of training and testing with deep clustering on the Omniglot dataset, the average validation accuracy is 97.42%, which indicates that EW-CACTUs-MAML is robust on the Omniglot dataset.

4.2. Competitiveness and Practicability

The performance of CACTUs-MAML on the Omniglot dataset is shown in Figure 6, including details of the training and validation processes. Over 300 iterations of training and testing with deep clustering on the Omniglot dataset, the average validation accuracy is 97.22%.

Comparing the dynamic curves of training loss, training accuracy, validation loss, and validation accuracy of CACTUs-MAML with those of EW-CACTUs-MAML in Figures 5 and 6, we conclude that the proposed model is competitive with CACTUs-MAML. The performance of EW-CACTUs-MAML and CACTUs-MAML on the Omniglot dataset is compared in Table 1.

The validation loss of EW-CACTUs-MAML is 0.20578947, less than that of CACTUs-MAML, and its validation accuracy is 97.42%, higher than that of CACTUs-MAML. It must be noted that CACTUs-MAML is among the most competitive models on the Omniglot dataset [50]. Consequently, these results demonstrate that the proposed model is competitive and practicable.

4.3. Uncertainty Analysis and Discussion

We attempted to validate EW-CACTUs-MAML on another typical dataset, Miniimagenet, but this dataset is so large that our computing resources were exhausted before EW-CACTUs-MAML could complete. Since we also wished to validate the applicability of the EW method to improving other metalearning systems, we then improved another competitive metalearning system, CACTUs-ProtoNets [50], into EW-CACTUs-ProtoNets. Fortunately, the computing resources were sufficient for running this alternative model on the Miniimagenet dataset. Details of the training and validation processes of CACTUs-ProtoNets and EW-CACTUs-ProtoNets on the Miniimagenet dataset are shown in Figure 7.

It must be pointed out that we utilized 80% of the Miniimagenet dataset as the training set and 20% as the validation set for training/testing EW-CACTUs-ProtoNets and CACTUs-ProtoNets, similar to our strategy for training/testing EW-CACTUs-MAML and CACTUs-MAML. We explicitly compared the performance of EW-CACTUs-ProtoNets and CACTUs-ProtoNets on the Miniimagenet dataset, as shown in Table 2.

The Miniimagenet dataset is genuinely challenging. The CACTUs-ProtoNets model is already among the most competitive on the Miniimagenet dataset, yet its validation accuracy is still less than 50% [50]. This low validation accuracy was not improved after integration with the EW method. As a cross-validation, the performance of EW-CACTUs-ProtoNets on the Miniimagenet dataset reveals a challenge for practical applications to complicated datasets [51–54]. The EW method can improve CACTUs-MAML, but it cannot improve CACTUs-ProtoNets.

One possible explanation is that CACTUs-MAML is a parameter-based model, while CACTUs-ProtoNets is a metric-based model. An unresolved issue is how to improve the performance of CACTUs-ProtoNets on the Miniimagenet dataset and other complicated datasets. Although the performance of our method on the Omniglot dataset implies the feasibility of practical applications in optical character recognition (OCR), further validation on other engineering datasets is still necessary [55–59]. These should be the next research priorities.

5. Conclusion

We apply few-shot mechanisms to task construction and propose a new method to optimize a previous algorithm that is itself a competitive metalearning system. The entropy weight method is utilized to improve the metalearning system, and the traditional gradient descent is in turn improved to update the weights of the base learner. The implementation of the proposed method is interpreted as “gene intelligence” to highlight parameters’ heredity. The performance of EW-CACTUs-MAML indicates robust prediction and is competitive with CACTUs-MAML. The next research priorities are to further improve the performance of CACTUs-ProtoNets on the Miniimagenet dataset and to further validate the model on more complicated engineering datasets.

Data Availability

All the data utilized to support the theory and models of the present study are available from the corresponding author upon request.

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this article.

Acknowledgments

This research was funded by the National Natural Science Foundation of China (41571299) and the High-Level Base-Building Project for Industrial Technology Innovation (1021GN204005-A06).