Abstract

Aspect-level sentiment classification aims to identify the sentiment polarity of a review expressed toward a target. In recent years, neural network-based methods have achieved success in aspect-level sentiment classification, and these methods fall into two types: the first takes the target information into account for context modelling, and the second models the context without considering the target information. It has been shown that the former outperforms the latter. However, most target-related models focus only on the impact of the target on context modelling, while ignoring the role of the context in target modelling. In this study, we introduce an interactive neural network model named LT-T-TR, which divides a review into three parts: the left context with the target phrase, the target phrase, and the right context with the target phrase. The interaction between the left/right context and the target phrase is exploited by an attention mechanism to learn the representations of the left/right context and the target phrase separately. As a result, the most important words in the left/right context and in the target phrase are captured, and results on laptop and restaurant datasets demonstrate that our model outperforms state-of-the-art methods.

1. Introduction

Aspect-level sentiment classification is a fine-grained task in sentiment analysis [1]. Given a review and a target occurring in it, the task is to identify the sentiment polarity (e.g., negative, neutral, or positive) expressed toward the target in its context. For example, in the review “the voice quality of this phone is amazing, but the price is ridiculous,” there are two targets (“voice quality” and “price”) with completely opposite polarities: the sentiment expressed toward “voice quality” is positive, whereas the sentiment toward “price” is negative. Jiang et al. [2] introduced a target-dependent Twitter sentiment classifier and showed that ignoring the target information discussed in the review causes 40% of sentiment classification errors. Therefore, aspect-level sentiment classification can be viewed as predicting a sentiment category for each review-target pair.

Different from sentence- and document-level sentiment analysis [3–6], in aspect-level sentiment classification a review may contain multiple review-target pairs, so separating the different contexts of different targets is a challenge. Many neural network-based methods have been proposed for aspect-level sentiment classification. For example, Dong et al. [7] used an adaptive recursive neural network to evaluate the sentiments of specific targets in context words. Vo and Zhang [8] separated the whole review into three sections (the target, its left context, and its right context) and used neural pooling functions and a sentiment lexicon to extract the feature vector for a given target. Tang et al. [9] divided the review into the left part with the target and the right part with the target and then used two long short-term memory (LSTM) networks to encode the two parts, respectively. Zhang et al. [10] used a gated neural network to capture the interaction between the target and its surrounding contexts. To further focus on the important words of a sentence that modulate the sentiment of the targets, Wang et al. [11] introduced LSTM networks and an attention mechanism and concatenated word representations with target embeddings to generate the final sentiment representations.

Although the previous approaches have recognized the importance of targets in sentiment classification, they only focus on the impact of targets on context modeling. How to use the interaction between contexts and the target phrase to model contexts and targets separately has become a new research issue. Ma et al. [12] proposed an interactive attention network (IAN) that uses two LSTM networks to model the contexts and the target phrase, respectively, and then uses the hidden states of the contexts to generate an attention vector for the target phrase, and vice versa. Based on [12], Huang et al. [13] proposed an attention-over-attention (AOA) neural network, which models targets and reviews simultaneously using two LSTMs and then lets the target representation and the review representation interact through the AOA module. Zheng and Xia [14] designed a left-center-right separated neural network to model the left context, target phrase, and right context, respectively, and modeled the relation between the target and the left/right context with a rotatory attention mechanism.

To further improve the representations of targets and contexts, we propose an interactive neural network model named LT-T-TR. Firstly, it divides a review into three parts: the left context with the target phrase, the target phrase, and the right context with the target phrase. Three bidirectional long short-term memory networks (Bi-LSTMs) are used to model these parts, respectively. Secondly, because different words in a review contribute differently to the final representation, and because contexts and targets influence each other, attention weights for the target phrase and the left/right context are computed by interactive attention between the target phrase and the left/right context. The process consists of two parts: the first is target-to-context attention, which includes target-to-left context attention and target-to-right context attention, to obtain better representations of the left/right contexts; the second is context-to-target attention, which includes left context-to-target attention and right context-to-target attention. After computing these attention weights, we obtain the representations of the target phrase and the left/right contexts. Next, these representations are concatenated to generate the final classification vector. Experimental results on laptop and restaurant datasets show that our method achieves clear improvements. The main contributions of this study can be summarized as follows:
(a) Dividing a review into three parts: the left context with the target phrase, the target phrase, and the right context with the target phrase, and modeling these parts with three Bi-LSTMs, respectively.
(b) Computing attention weights of the left/right context and the target phrase interactively and using these weights to obtain the representations of the target phrase and the left/right context.
(c) Concatenating these representations to form the final classification vector and evaluating our model on laptop and restaurant datasets.

2. Model

In this section, we first give the task definition of aspect-level sentiment classification. Afterward, we introduce the different components of our model as displayed in Figure 1.

2.1. Task Definition

Given a review $s = \{w_1, w_2, \ldots, w_n\}$ consisting of $n$ words, $\{w_1, \ldots, w_l\}$ are the preceding context words, $\{w_{l+1}, \ldots, w_{l+m}\}$ are the target words, and $\{w_{l+m+1}, \ldots, w_n\}$ are the following context words. We divide the review into three parts: the left context $s_L$ consisting of $\{w_1, \ldots, w_l\}$ and $\{w_{l+1}, \ldots, w_{l+m}\}$, the target phrase $s_T$ consisting of $\{w_{l+1}, \ldots, w_{l+m}\}$, and the right context $s_R$ consisting of $\{w_{l+1}, \ldots, w_{l+m}\}$ and $\{w_{l+m+1}, \ldots, w_n\}$. Aspect-level sentiment classification aims at determining the sentiment polarity of the review $s$ toward the target $s_T$. For example, the sentiment polarity of the review “the voice quality of this phone is amazing, but the price is ridiculous” toward the target “voice quality” is positive, but the polarity toward the target “price” is negative.
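To make the segmentation concrete, the following minimal Python sketch splits a tokenized review into the three overlapping parts; the whitespace tokenization and the target span indices are illustrative assumptions, not part of the model itself.

```python
# Minimal sketch of the three-way split described above (illustrative only).
def split_review(tokens, target_start, target_end):
    """Split a tokenized review into the left context with the target (s_L),
    the target phrase (s_T), and the right context with the target (s_R)."""
    s_T = tokens[target_start:target_end]   # target phrase
    s_L = tokens[:target_end]               # preceding words + target
    s_R = tokens[target_start:]             # target + following words
    return s_L, s_T, s_R

tokens = ("the voice quality of this phone is amazing , "
          "but the price is ridiculous").split()
# the target "voice quality" occupies token positions 1..2 (end index exclusive)
s_L, s_T, s_R = split_review(tokens, 1, 3)
print(s_T)   # ['voice', 'quality']
```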

2.2. Bi-LSTMs

First, we represent each word in the review as a word embedding [15] and obtain word vectors $x^L$, $x^T$, and $x^R$ for $s_L$, $s_T$, and $s_R$, where each word vector lies in $\mathbb{R}^{d}$ and $d$ is the embedding dimension. Then, we feed these three-part word vectors into three Bi-LSTMs [16], respectively, to learn the hidden word semantics. Each Bi-LSTM is obtained by stacking a forward LSTM and a backward LSTM, which are good at learning long-term dependencies [17]. In the LSTM architecture, there are three gates (input gate $i_t$, forget gate $f_t$, and output gate $o_t$) and a cell memory state $c_t$. Each cell can be updated as follows:

$$
\begin{aligned}
i_t &= \sigma\left(W_i \cdot [h_{t-1}; x_t] + b_i\right),\\
f_t &= \sigma\left(W_f \cdot [h_{t-1}; x_t] + b_f\right),\\
o_t &= \sigma\left(W_o \cdot [h_{t-1}; x_t] + b_o\right),\\
\tilde{c}_t &= \tanh\left(W_c \cdot [h_{t-1}; x_t] + b_c\right),\\
c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t,\\
h_t &= o_t \odot \tanh(c_t),
\end{aligned}
\tag{1}
$$

where $\sigma$ is the sigmoid function, $\odot$ denotes elementwise multiplication, and $\cdot$ stands for matrix multiplication; $W_i$, $W_f$, $W_o$, $W_c$ and $b_i$, $b_f$, $b_o$, $b_c$ denote the weight matrices and biases, respectively; $x_t$ is the input word vector, and $h_{t-1}$ is the previous hidden state.
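As a concrete illustration of the gate equations in (1), the following NumPy sketch performs a single LSTM cell update; the packed weight layout and the concatenation order $[h_{t-1}; x_t]$ are implementation assumptions made for brevity.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_cell_step(x_t, h_prev, c_prev, W, b):
    """One LSTM update. W maps [h_prev; x_t] to the four gate pre-activations;
    shapes: W is (4*d_h, d_h + d_x), b is (4*d_h,)."""
    z = W @ np.concatenate([h_prev, x_t]) + b
    d = h_prev.shape[0]
    i = sigmoid(z[:d])           # input gate
    f = sigmoid(z[d:2 * d])      # forget gate
    o = sigmoid(z[2 * d:3 * d])  # output gate
    g = np.tanh(z[3 * d:])       # candidate cell state
    c = f * c_prev + i * g       # new cell memory
    h = o * np.tanh(c)           # new hidden state
    return h, c

d_h, d_x = 4, 3
W = np.random.uniform(-0.1, 0.1, (4 * d_h, d_h + d_x))
b = np.zeros(4 * d_h)
h, c = lstm_cell_step(np.random.randn(d_x), np.zeros(d_h), np.zeros(d_h), W, b)
```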

For the left context $s_L$, the input of the Bi-LSTM is $x^L$ and we get the hidden states as follows:

$$h_t^L = \left[\overrightarrow{h_t^L}; \overleftarrow{h_t^L}\right],\tag{2}$$

where the output $h_t^L$ is obtained by concatenating the corresponding states of the forward and backward LSTMs. Similarly, we can get the hidden semantic states $h^T$ for the target $s_T$ and the hidden states $h^R$ for the right context $s_R$ in the same way.

Then, through an average pooling operation, we can obtain the initial representations of $s_L$, $s_T$, and $s_R$ as follows:

$$r_L = \frac{1}{n_L}\sum_{t=1}^{n_L} h_t^L,\tag{3}$$

$$r_T = \frac{1}{n_T}\sum_{t=1}^{n_T} h_t^T,\tag{4}$$

$$r_R = \frac{1}{n_R}\sum_{t=1}^{n_R} h_t^R,\tag{5}$$

where $n_L$, $n_T$, and $n_R$ are the lengths of the left context, the target phrase, and the right context, respectively.
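A compact PyTorch-style sketch of this encoding step (one Bi-LSTM per segment followed by average pooling over time) is given below; the layer names, toy dimensions, and batch handling are assumptions for illustration rather than the authors' implementation.

```python
import torch
import torch.nn as nn

d_emb, d_hid = 300, 300

# One Bi-LSTM per segment, as described above.
bilstm_L = nn.LSTM(d_emb, d_hid, batch_first=True, bidirectional=True)
bilstm_T = nn.LSTM(d_emb, d_hid, batch_first=True, bidirectional=True)
bilstm_R = nn.LSTM(d_emb, d_hid, batch_first=True, bidirectional=True)

def encode(segment_emb, bilstm):
    """segment_emb: (batch, seq_len, d_emb) -> hidden states H and pooled vector r."""
    H, _ = bilstm(segment_emb)   # (batch, seq_len, 2 * d_hid)
    r = H.mean(dim=1)            # average pooling over time, equations (3)-(5)
    return H, r

x_L = torch.randn(2, 12, d_emb)  # toy left-context embeddings
H_L, r_L = encode(x_L, bilstm_L)
```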

2.3. Attention Layer

After obtaining the hidden representations of the contexts and the target phrase generated by the three Bi-LSTMs, we use the attention mechanism to calculate the importance of different words in the left/right context and in the target phrase.

2.3.1. Target-to-Context Attention

Given the hidden representations $h_i^L$ of the left context and the average representation $r_T$ of the target, we first get the target-to-left context attention representation by

$$r^{TL} = \sum_{i=1}^{n_L} \alpha_i h_i^L,\tag{6}$$

where $\alpha_i$ is the weight of $h_i^L$ that we can obtain from a softmax function:

$$\alpha_i = \frac{\exp\left(f\left(h_i^L, r_T\right)\right)}{\sum_{j=1}^{n_L} \exp\left(f\left(h_j^L, r_T\right)\right)}.\tag{7}$$

Here, $f$ is a score function that indicates the importance of words in the left context influenced by the target:

$$f\left(h_i^L, r_T\right) = \tanh\left(h_i^L \cdot W_a \cdot r_T^{\mathsf{T}} + b_a\right),\tag{8}$$

where $\tanh$ is a nonlinear function, $W_a$ is the weight matrix, $b_a$ is the bias, and $r_T^{\mathsf{T}}$ is the transpose of $r_T$.

Similar to equations (6)–(8), we can also obtain the target-to-right context attention representation $r^{TR}$ by using the average representation $r_T$ of the target and the hidden representations $h_i^R$ of the right context.
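Target-to-context attention as in equations (6)–(8) can be sketched as follows; the tanh-bilinear score and the parameter names (W_a, b_a) follow the formulas above and are assumptions for illustration, not the authors' released code.

```python
import torch
import torch.nn as nn

d = 600  # 2 * d_hid for a Bi-LSTM

W_a = nn.Parameter(torch.empty(d, d).uniform_(-0.1, 0.1))  # illustrative init
b_a = nn.Parameter(torch.zeros(1))

def target_to_context(H_ctx, r_T):
    """H_ctx: (batch, n, d) context hidden states; r_T: (batch, d) pooled target.
    Returns the attention-weighted context representation of shape (batch, d)."""
    scores = torch.tanh(torch.einsum('bnd,de,be->bn', H_ctx, W_a, r_T) + b_a)
    alpha = torch.softmax(scores, dim=1)                 # weights over context words
    return torch.bmm(alpha.unsqueeze(1), H_ctx).squeeze(1)

H_L, r_T = torch.randn(2, 12, d), torch.randn(2, d)      # toy inputs
r_TL = target_to_context(H_L, r_T)                       # target-to-left context
```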

2.3.2. Context-to-Target Attention

For the hidden representations $h_i^T$ of the target, we first compute the attention weights with respect to the left context as follows:

$$\beta_i = \frac{\exp\left(f\left(h_i^T, r_L\right)\right)}{\sum_{j=1}^{n_T} \exp\left(f\left(h_j^T, r_L\right)\right)},\tag{9}$$

$$f\left(h_i^T, r_L\right) = \tanh\left(h_i^T \cdot W_b \cdot r_L^{\mathsf{T}} + b_b\right),\tag{10}$$

where $W_b$ and $b_b$ are the weight matrix and bias, respectively.

Then, through calculating the weighted combination of the hidden states of the target phrase, we can obtain the left context-to-target representation as follows:

$$r^{LT} = \sum_{i=1}^{n_T} \beta_i h_i^T.\tag{11}$$

Similar to equations (9)–(11), we can obtain the right context-to-target representation $r^{RT}$ by using $r_R$ and the hidden representations of the target.

After getting $r^{LT}$ and $r^{RT}$, we get the final representation $\tilde{r}_T$ of the target phrase through concatenating $r^{LT}$ and $r^{RT}$:

$$\tilde{r}_T = \left[r^{LT}; r^{RT}\right].\tag{12}$$
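Context-to-target attention mirrors the previous step with the roles of context and target swapped. A hedged sketch follows; the score form, parameter names, and toy tensors are hypothetical, and in practice separate parameters would likely be used for the left and right directions.

```python
import torch
import torch.nn as nn

d = 600
W_b = nn.Parameter(torch.empty(d, d).uniform_(-0.1, 0.1))
b_b = nn.Parameter(torch.zeros(1))

def context_to_target(H_T, r_ctx, W, b):
    """H_T: (batch, m, d) target hidden states; r_ctx: (batch, d) pooled context.
    Returns a context-conditioned target representation of shape (batch, d)."""
    scores = torch.tanh(torch.einsum('bmd,de,be->bm', H_T, W, r_ctx) + b)
    beta = torch.softmax(scores, dim=1)              # weights over target words
    return torch.bmm(beta.unsqueeze(1), H_T).squeeze(1)

H_T = torch.randn(2, 4, d)                           # toy target hidden states
r_L, r_R = torch.randn(2, d), torch.randn(2, d)      # pooled left/right contexts
r_LT = context_to_target(H_T, r_L, W_b, b_b)         # left context -> target
r_RT = context_to_target(H_T, r_R, W_b, b_b)         # right context -> target
target_repr = torch.cat([r_LT, r_RT], dim=-1)        # equation (12)
```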

2.4. Final Classification

Then, we concatenate $r^{TL}$, $\tilde{r}_T$, and $r^{TR}$ as the final representation $r$ of the review $s$:

$$r = \left[r^{TL}; \tilde{r}_T; r^{TR}\right].\tag{13}$$

We project $r$ into the space of the $C$ targeted classes through a nonlinear function:

$$x = \tanh\left(W_s \cdot r + b_s\right),\tag{14}$$

where $W_s$ and $b_s$ are the parameters. Finally, the probability of the review having sentiment polarity $c$ toward the target is calculated with a softmax function:

$$y_c = \frac{\exp\left(x_c\right)}{\sum_{k=1}^{C} \exp\left(x_k\right)}.\tag{15}$$
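A minimal sketch of this classification layer, assuming three sentiment classes (positive, negative, and neutral) and the concatenation order described above:

```python
import torch
import torch.nn as nn

d, num_classes = 600, 3
proj = nn.Linear(4 * d, num_classes)   # [r_TL; r_LT; r_RT; r_TR] has size 4 * d

def classify(r_TL, target_repr, r_TR):
    """r_TL, r_TR: (batch, d); target_repr: (batch, 2 * d)."""
    r = torch.cat([r_TL, target_repr, r_TR], dim=-1)  # final review representation
    x = torch.tanh(proj(r))                           # nonlinear projection, eq. (14)
    return torch.softmax(x, dim=-1)                   # class probabilities, eq. (15)

probs = classify(torch.randn(2, d), torch.randn(2, 2 * d), torch.randn(2, d))
```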

2.5. Model Training

The model is trained in an end-to-end way. The loss function is the cross-entropy error:

$$L = -\sum_{(s, t) \in D} \sum_{c=1}^{C} g_c(s, t) \log y_c(s, t),\tag{16}$$

where $D$ denotes all training data, $(s, t)$ denotes a review-target pair, $C$ is the number of sentiment categories, $y_c(s, t)$ is the probability of predicting $(s, t)$ as class $c$ given by the softmax function, and $g_c(s, t)$ indicates whether class $c$ is the correct sentiment category.
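For illustration, the loss in equation (16) over predicted class probabilities can be written directly as below; the small epsilon inside the logarithm is a numerical-stability assumption, and in a real training loop the result would be backpropagated through the whole model.

```python
import torch

def cross_entropy_loss(probs, labels):
    """probs: (batch, C) predicted class probabilities; labels: (batch,) gold indices.
    Sums -log p(correct class) over the batch, as in equation (16)."""
    picked = probs[torch.arange(probs.size(0)), labels]
    return -torch.log(picked + 1e-12).sum()

probs = torch.tensor([[0.7, 0.2, 0.1], [0.1, 0.8, 0.1]])
labels = torch.tensor([0, 1])
print(cross_entropy_loss(probs, labels))   # -(log 0.7 + log 0.8)
```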

3. Experiment

3.1. Experimental Settings
3.1.1. Datasets

We conduct our experiments using the dataset for SemEval 2014 Task 4 [18]. This dataset contains customer reviews on restaurants and laptops. Each review has one or more targets with their corresponding polarities. The polarity of targets can be positive, negative, neutral, or conflict. However, we only consider the first three labels for classification. The statistics of the datasets are shown in Table 1.

3.1.2. Parameters and Evaluation Metric

In our experiments, the dimensions of the word embeddings, attention vectors, and LSTM hidden states are set to 300. All word embeddings are initialized with GloVe [19], and out-of-vocabulary words are randomly initialized from a uniform distribution. All weight matrices are randomly initialized from a uniform distribution, and all bias terms are set to zero. The dropout rate is set to 0.5.

We adopt accuracy to evaluate the performance of our model, which is defined as follows:

$$\mathrm{Acc} = \frac{T}{N},\tag{17}$$

where $T$ is the number of correctly predicted samples and $N$ is the total number of samples.
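Equation (17) amounts to the following small helper (illustrative only):

```python
def accuracy(predictions, gold):
    """Fraction of samples whose predicted label matches the gold label."""
    correct = sum(p == g for p, g in zip(predictions, gold))
    return correct / len(gold)

print(accuracy([0, 1, 2, 1], [0, 1, 1, 1]))   # 0.75
```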

3.2. Model Comparisons

We compare our model with the following baseline approaches:
Majority: the largest sentiment polarity in the training set is assigned as the classification result for each sample in the test set.
LSTM: a standard LSTM that models the review as a whole and uses the last hidden state of the LSTM as the final review representation [9].
TD-LSTM: obtains the final sentiment representation by concatenating two LSTM networks that model the preceding and following contexts surrounding the target, respectively [9].
AE-LSTM: concatenates the target vector with each word in the review as the input of the LSTM [11].
ATAE-LSTM: appends the aspect embedding to each word vector to strengthen the importance of the target [11].
IAN: two LSTM networks are used to model the review and the target phrase, respectively. It uses the hidden states of the review to generate an attention vector for the aspect, and vice versa. Based on these two attention vectors, it outputs a review representation and an aspect representation for classification [12].

The experimental results are shown in Table 2.

First, the worst method is Majority, demonstrating that a powerful feature representation is important for aspect-level sentiment classification. Then, among all the other methods based on LSTM, the basic LSTM approach has the worst performance because it models the whole review and ignores the target information. TD-LSTM improves over LSTM by 1% on the restaurant dataset and 2% on the laptop dataset once target information is taken into consideration. Because the attention mechanism is introduced, AE-LSTM and ATAE-LSTM perform better than TD-LSTM. IAN obtains better results on the restaurant and laptop datasets than the above LSTM-based methods because it explores separate representations of targets and interactive learning between the context and the target. Our LT-T-TR model significantly surpasses IAN and all other baseline approaches. This reinforces our hypothesis that a model capable of capturing target-context dependencies interactively indeed performs better. We will conduct a more detailed analysis in the following sections.

3.3. Model Analysis: The Effect of Different Pooling Functions

In this section, we analyze the contribution of various pooling functions (see equations (3)–(5)) by using the LT-T-TR model. The results are shown in Table 3. It can be seen that the accuracy (77.5%) is the lowest when using min pooling alone to extract hidden features. By using max and avg pooling, the model has a significantly improved accuracy (79.3% and 79.6%, respectively). Finally, we obtain the best accuracy (80.6%) by combining max and avg pooling.
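For reference, the pooling variants compared in Table 3 could be implemented as follows. The text does not specify how max and avg pooling are combined, so the concatenation used here is only one plausible choice.

```python
import torch

def pool(H, mode="avg"):
    """H: (batch, seq_len, d) Bi-LSTM hidden states -> pooled segment vector."""
    if mode == "avg":
        return H.mean(dim=1)
    if mode == "max":
        return H.max(dim=1).values
    if mode == "min":
        return H.min(dim=1).values
    if mode == "max+avg":   # assumed combination: concatenate the two
        return torch.cat([H.max(dim=1).values, H.mean(dim=1)], dim=-1)
    raise ValueError(f"unknown pooling mode: {mode}")
```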

3.4. Model Analysis: The Effect of Different Sequence Models

We analyze the effect of different sequence models, namely, recurrent neural networks (RNNs), LSTM, and the gated recurrent unit (GRU), to verify the effectiveness of our model. The experimental comparison results are shown in Table 4. We can see that LSTM performs better than RNN because LSTM has more complicated hidden units and offers better computational capability than RNN. Meanwhile, GRU has fewer parameters to train than LSTM and thus achieves better accuracy. Bi-LSTM performs slightly better than GRU and LSTM because it can capture more contextual semantic information.

3.5. Model Analysis

To validate the effectiveness of the LT-T-TR model, we design several variant models in this section. We first input the review as a whole (rather than as three segments) into a Bi-LSTM for modeling and then use the attention mechanism to calculate the importance of each word toward the sentiment categories; we refer to this model as No-Separation. Second, we simplify the LT-T-TR model by using the average of the initial target word vectors to represent the target phrase; we refer to this model as No-Target-Learned.

Furthermore, we compare the effect of interactive attention modeling between the target and the left/right context. First, we build a model (named No-Interaction) without interactive information by removing the attention interaction operation between the left/right context and the target phrase and learning the attention weights from their own Bi-LSTM hidden states only. Then, we build the Target-to-Context model by removing context-to-target attention and keeping only target-to-context attention, as in [12]. Finally, we create an L-T-R model by dividing a review into the preceding context (without the target), the target, and the following context (without the target) and then modeling these three parts in the same way as in the LT-T-TR model.

Table 5 shows the experimental results. It can be seen that the No-Separation model achieves the worst performance among all approaches, and the No-Target-Learned model performs worse than the No-Interaction and Target-to-Context models. This verifies that the target representation is important for judging the final sentiment categories and that the target should be modeled separately.

Both L-T-R and LT-T-TR perform better than the No-Interaction and Target-to-Context models, which shows that the interaction between the target phrase and the left/right context is important for the final sentiment classification. Moreover, L-T-R gives slightly worse results than the LT-T-TR model because its left/right contexts do not contain the target phrase.

3.6. Qualitative Analysis

In this section, we select three examples from the restaurant dataset to analyze which words contribute most to the final classification. We obtain the attention weights and then visualize them using the visualization tool Heml [11]. The results are shown in Figure 2, in which color depth represents the importance of a word: the darker the color, the more important the word.

The review in Figures 2(a) and 2(b) is “The people with carts of food don’t understand you because they don’t speak English, their job is to give you the delicious food you point at.” The corresponding targets are “food” and “people with carts of food,” respectively. It can be seen that when a review contains two targets, our model computes the correct sentiment category for each target; that is, the attention mechanism dynamically picks out the important words from the whole review. In Figure 2(b), we can see that “people” is the most important word in the target phrase “people with carts of food.” In Figure 2(c), the target is a multiword phrase, “fried mini buns with the condensed milk and the assorted fruits on beancurd,” in which “buns” and “fruits” are more important than the other words, so our model pays more attention to “buns” and “fruits.” This also shows that simply averaging the vectors of the words in the target phrase to represent the target is not sufficient. Therefore, modeling the target phrase and context interactively is important for aspect-level sentiment classification.

3.7. Error Analysis

We conducted an error analysis of the experimental results. The first type of error is caused by noncompositional sentiment expressions [20]. For instance, in the review “not only was the look of the food fabulous, but also the taste was to die for,” “taste” is a target and “to die for” is the relevant sentiment expression, whose meaning should not be understood literally. The second type of error comes from complex sentiment expressions such as double negatives, assumptions, and comparisons, as in “even though the price of this camera is unacceptable, I love its lens.” Our model fails to handle the complex sentiment expression in this case. Furthermore, in the review “the movie was really on point—I was surprised,” “movie” is the target word and the idiom “on point” is the relevant sentiment expression, which is difficult for our model to identify.

4. Related Work

4.1. Aspect-Level Sentiment Classification

Sentiment analysis, also known as opinion mining [1, 21], has attracted widespread attention from both industry and the academic community. As a fine-grained task in the field of sentiment analysis [1], aspect-level sentiment classification has drawn a lot of attention and can also be considered a kind of text classification problem. Traditional text classification methods depend greatly on the effectiveness of feature engineering [22], which generalizes poorly and makes it difficult to discover the potential explanatory or discriminative factors of the data. In recent years, distributed word representations and neural network methods have been proposed and have shown promising performance on this task [7, 8]. Dong et al. [7] used an adaptive recursive neural network to evaluate the sentiments of specific targets in context words. Vo and Zhang [8] separated the whole review into three sections (the target, its left context, and its right context) and used neural pooling functions and a sentiment lexicon to extract the feature vector for a given target.

4.2. Neural Network for Aspect-Level Sentiment Classification

Neural network approaches are now widely used for many natural language processing tasks, and sentiment classification is no exception. Many sentence/document-level sentiment classification tasks are dominated by neural network architectures [23–25]. To further incorporate context information with target information, several models have been proposed, such as the target-dependent LSTM [9], which models each sentence toward the aspect. ATAE-LSTM and AT-LSTM [11] are attentional models inspired by [26]. AT-LSTM can be considered a modification of the neural attention proposed in [26] for entailment detection, swapping the premise’s last hidden state for the aspect embedding. Han et al. [27] proposed a novel neural network based on LSTM and the attention mechanism for word context extraction and document representation. Chen et al. [28] combined a regional long short-term memory network and a convolutional neural network for target-based sentiment classification. Zhang et al. [29] introduced dynamic memory networks based on a multiple attention mechanism and LSTM, which showed significant performance in aspect-level sentiment classification. Yang et al. [30] designed a coattention-LSTM network based on a coattention mechanism for aspect-based sentiment analysis, combining the target and context attention vectors of sentences. The work most relevant to ours is IAN [12], which models the sentence and the aspect term using two LSTM networks, respectively. It uses the hidden states of the sentence to generate an attention vector for the aspect, and vice versa. Based on these two attention vectors, it outputs a sentence representation and an aspect representation for classification.

Although the aforementioned methods are effective, discriminating different sentiment polarities for different targets remains a challenging issue. Therefore, it is necessary to design a powerful neural network for aspect-level sentiment classification.

5. Conclusions

In this study, we have proposed an interactive neural network for aspect-level sentiment classification. The approach uses Bi-LSTMs and an attention mechanism to interactively learn the important words in the target and the context and generates the review representation for the final sentiment classification. Experimental results on the SemEval 2014 datasets show that our method achieves significant improvements. Our model analysis also shows that the model can discriminatively learn the important words in the context and in the target. However, our model still cannot handle several error cases effectively, such as noncompositional and idiomatic sentiment expressions.

Data Availability

The data used to support the findings of this study are included within the article.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

This work was partially supported by the National Social Science Foundation of China under grant no. 17BXW071, the National Natural Science Foundation of China under grant no. 61562057, and the Technology Program of Gansu Science and Technology Department under grant no. 18JR3RA104.