Abstract

Some texts that are challenging to recognize on their own may become more understandable in a neighborhood of related texts with similar contexts. Motivated by this intuition, a novel deep text sentiment classification (DTSC) model is proposed to improve the model’s performance by incorporating the neighborhood of related texts. Our framework uses a nonparametric approach to construct neighborhoods of related texts based on Jaccard similarities. Then, a new deep recurrent neural network architecture is proposed, comprising two distinct modules: bidirectional long short-term memory (Bi-LSTM) and gated recurrent unit (GRU). The proposed model aims to effectively capture informative features from the input text and its neighbors. The result of each module is processed through the maximum operation, which selects the most pertinent data. Finally, the extracted features are concatenated and subjected to classification to achieve accurate sentiment prediction. Previous studies have commonly employed a parametric approach to represent textual metadata. However, our approach utilizes a nonparametric approach, enabling our model to perform strongly even when the text vocabulary varies between training and testing. The proposed DTSC model has been evaluated on five real-world sentiment datasets, achieving 99.60% accuracy on the Binary_Getty (BG) dataset, 98.32% accuracy on the Binary_iStock (BIS) dataset, 96.13% accuracy on Twitter, 82.19% accuracy on the multi-view sentiment analysis (MVSA) dataset, and 87.60% accuracy on the IMDB dataset. These findings demonstrate that the proposed model outperforms established baseline techniques in terms of model evaluation criteria for text sentiment classification.

1. Introduction

The present generation of widely available and affordable web technologies generates significant social big data with perspectives that aid decision-making. Sentiment analysis (SA) is the computational study of individuals’ perspectives, opinions, and emotions toward a particular entity; it can track individuals’ moods and viewpoints by analyzing unstructured, multimodal, informal, noisy, and high-dimensional social data [1]. SA is a subset of natural language processing (NLP) that can be used in various real-world applications, including financial and stock price forecasting [2], politics [3], and medicine [4, 5]. Many researchers have dedicated significant efforts to investigating textual SA [6–10] through various methodologies, resulting in notable advancements on social media platforms. One significant constraint of current sentiment classification systems is their predominant dependence on a post or tweet’s textual content.

However, on social platforms such as Twitter, Flickr, and Instagram, there’s a wealth of metadata available alongside the text in posts. People attach metadata to their shared content, including user-generated labels, as a means of engaging with others. This metadata, which includes information about the attributes of social media posts and their authors, serves as a valuable source of context for dissecting the expressed sentiment within the text. Moreover, these metadata elements can function as an additional asset alongside the text-based features when it comes to successfully executing sentiment classification. For instance, the sentence “Nowruz Mubarak!” displayed in the green bounding box of Figure 1 carries an ambiguous meaning, and it is challenging to discern the true sentiment behind this sentence. Now, we consider the sentences in the red bounding box: “Happy Iranian New Year (Nowruz), Happy Nowruz, …” These sentences can help determine the true sentiment expressed by the input sentence, which can be positive. As a result, SA can greatly benefit from the metadata or neighbors of input sentences. Based on this intuition, we enhance SA by augmenting each input text with the neighborhood of related texts.

Previous studies have utilized metadata extracted from tweets, including metrics such as URL counts, hashtags, and mentions, as features for sentiment analysis. This approach assumes that texts sharing similar contents or features can convey similar sentiments [11, 12]. However, these studies relied on parametric models for text analysis, assuming consistent vocabularies between training and testing datasets. In the real world, metadata vocabularies can change as new tags emerge. Therefore, it is crucial to investigate the incorporation of metadata, or neighboring data, to adapt to these evolving vocabularies and enhance sentiment classification. In conclusion, a compelling rationale exists for utilizing both tags and metadata, facilitating the seamless integration of additional data sources to enhance the representation of the original input data.

The primary technical contribution of our study involves the nonparametric approach for generating text neighborhoods, as depicted in Figure 1. Subsequently, we employ a novel parametric model to learn the degree of informative representation derived from an initial input and its corresponding neighbors. This strategy enables our model to perform tasks that are complex or impractical with existing techniques and to achieve state-of-the-art performance in text classification and SA. We specifically demonstrate the following capabilities of our model.

1.1. Adapt to Changing Vocabularies

Our model can handle different vocabularies during training and testing thanks to our nonparametric approach to finding near neighbors. Even when the training and testing vocabularies are entirely unrelated, our model still performs well. In other words, during training, the model learns the specific words in the training data and their relationships and similarities. However, it can still effectively classify sentiments in text even when it encounters entirely different words during the testing phase. This adaptability is attributed to the model’s nonparametric approach, allowing it to generalize to new categories of text metadata and adjust to changes in that metadata over time. So, despite variations in the text vocabulary, the model’s performance remains robust due to its unique design and training process.

1.2. Handle Different Text Types with and without Metadata

The proposed model can calculate neighbors from either the input data or the associated metadata. This feature ensures that our approach is not restricted to using either metadata or input data exclusively, showcasing the model’s adaptability across various text data types. It is essential to highlight the significance of both the presence and integration of metadata, as these aspects can significantly influence the model’s performance in sentiment analysis tasks. This reinforces the pivotal role of metadata, underscoring its potential to enhance the model’s accuracy, and stresses the need to consider it a crucial factor in data analysis.

1.3. Efficient Deep Model

An efficient model is designed to jointly learn representations from samples and their neighbors to generate meaningful, intraclass-oriented representations.

The main objective of this study is to identify substantial correlations between the input text and the neighborhood of related texts within various sentiment datasets. It seeks to explore whether this metadata or neighboring texts can indicate potential biases in sentiment labeling. Lastly, we demonstrate how leveraging this metadata as features in a classifier might aid in enhancing sentiment classifiers and sentiment quantification.

The paper’s remaining sections are structured as follows: Section 2 presents the literature review. Section 3 describes the proposed model in depth. Section 4 illustrates the findings of the experiments. Section 5 concludes the study.

2. Literature Review

The significance of sentiment analysis (SA) increases in the context of natural language processing (NLP) when dealing with a substantial volume of user-generated textual content. Many supervised traditional machine learning (ML) classifiers are used to classify sentiment based on various features [13]. Ahuja et al. [14] examined the effects of TF-IDF word level and N-gram on the SS-Tweet sentiment analysis dataset. The TF-IDF word level approach in sentiment analysis using six machine learning classification algorithms outperforms N-gram features by 3-4%. Mee et al. [15] examined the relationship between textual qualities and Twitter user characteristics using regression and sentiment analysis, specifically the TF-IDF approach. Kunal et al. [16] recommended combining Tweepy and TextBlob, a Python framework, to analyze and classify tweets using the Naive Bayes (NB) classifier. Htet and Myint [17] developed a system for analyzing social media data from Twitter to assess individuals’ health, education, and business status. This system employs the maximum entropy (ME) classifier to identify specific requirements. Obiedat et al. [7] presented a hybrid approach that combines support vector machine (SVM), particle swarm optimization, and various oversampling techniques to address the issue of imbalanced data. The SVM has enhanced sentiment prediction using the restaurant reviews’ dataset.

Recently, text analysis has experienced advantageous outcomes by utilizing diverse deep learning (DL) models [18, 19], which have been extensively implemented in numerous research investigations. The categorization of short text was performed by Wang et al. [20] through the use of a convolutional neural network (CNN). Basiri et al. [21] proposed a CNN-recurrent neural network (RNN) model that utilizes attention mechanisms to capture past and future contexts in text sentiment analysis. The incorporation of bidirectional temporal information flow enhances the accuracy of classification in this approach. The present trend in the field of sentiment analysis involves the development of innovative text classification methods utilizing deep learning techniques such as CNN [22, 23] and long short-term memory (LSTM) [24]. While CNNs can collect and analyze local data, they may be less effective in capturing long-range dependencies. The LSTM technique overcomes this limitation by modeling texts sequentially across sentences; however, its performance in collecting local information is suboptimal. The integration of CNN and LSTM therefore becomes essential for enhancing the efficacy of text sentiment classification [25–27]. Li et al. [28] presented a new padding methodology that enhanced consistency in the dimensions of input data instances, thereby augmenting the amount of sentiment-oriented information incorporated in every review. Their deep learning-based sentiment analysis model integrates a sentiment lexicon and combines two-channel CNN-LSTM/Bi-LSTM family models in parallel. Abid et al. [29] presented a unified procedure for SA on the Twitter platform. This approach incorporates an RNN architecture to capture long-term dependencies efficiently and utilizes a CNN, with Global Vectors for Word Representation (GloVe) as the word embedding technique. In experiments on Twitter corpora, the approach outperformed the baseline model. Dang et al. [30] proposed the integration of LSTM networks, CNNs, and SVM in hybrid deep SA learning models. The proposed model was evaluated utilizing eight textual datasets comprising tweets and reviews from various domains. The findings indicate that the hybrid models outperformed the single models in sentiment analysis across all datasets.

Salur and Aydin [31] presented a new hybrid deep learning model that combines different types of word embeddings, namely, Word2Vec, FastText, and character-level embeddings, in combination with various deep learning models such as LSTM, GRU, Bi-LSTM, and CNN. The model under consideration amalgamates features from various deep learning word embedding methods and classifies textual information based on its emotional content. Zulqarnain et al. [32] proposed a novel methodology for SA by utilizing an encoder approach with a two-state GRU named E-TGRU. This framework was designed to enhance the effectiveness of SA. The study’s results indicate that, with adequate training data, the GRU model can proficiently acquire the vocabulary utilized in user opinions. The findings indicate that E-TGRU exhibited superior performance compared to GRU, LSTM, and Bi-LSTM. Li et al. [33] proposed a sentiment classification model for analyzing online restaurant reviews, integrating Word2Vec, bidirectional GRU, and the attention technique. The results show that the model’s performance surpassed established sentiment analysis models. Kamyab et al. [34] presented a novel approach to sentiment analysis utilizing attention mechanisms in conjunction with CNNs and two distinct bidirectional recurrent neural networks. The proposed method aims to enhance the understanding of sentiment in textual data. Initially, a preprocessor was utilized to improve the quality of the data. Then, max-pooling was used in conjunction with a CNN layer to reduce the dimensionality of features and retrieve contextual information. In addition, the study employed two autonomous bidirectional recurrent neural networks, namely, LSTM and GRU, to effectively capture long-term dependencies. Ultimately, the attention mechanism was implemented to highlight the degree of attention attributed to each word.

Mishra et al. [11] developed a sentiment analysis tool that is both cost-free and open-source, featuring a graphical user interface. This tool enables users to perform two key functions: firstly, to retrain the weights of a given model by relabeling predictions and/or adding labeled instances and secondly, to tailor lexical resources to address errors in sentiment lexicons, such as false positives and false negatives. The proposed approach has the potential to offer advantages in iteratively improving or augmenting models in a readily available manner while disregarding the expenses associated with training a new model from the beginning and reducing predictive precision over time. Mishra and Diesner [12] analyzed the metadata characteristics at the user and tweet levels, identifying associations and relationships between these characteristics and the log odds for sentiment categories. The reliability of this analysis is strengthened by replicating the experiments on current tweets obtained from the user population present in our datasets. The results suggest that most patterns identified in this analysis exhibit high consistency. The metadata characteristics that have been identified are ultimately employed as features for a sentiment classification algorithm, resulting in an improved outcome for sentiment classification.

Deep learning-based sentiment analysis and the BERT technique have recently piqued the interest of researchers. Chiorrini et al. [35] suggested two BERT-based text classification approaches: BERT-base and cased BERT-base. Their investigation used two independent datasets for sentiment analysis and emotion recognition. They noted that BERT offers positive text classification results. Huang et al. [36] proposed an innovative DCNN-Bi-GRU (deep convolutional neural network and bidirectional gated recurrent unit) text categorization model. The word semantic representation language model is trained using BERT, and the DCNN-Bi-GRU hybrid model receives the semantic vectors dynamically generated from the word context. The model was validated in experiments on the CCERT Chinese e-mail sample set and a movie comment dataset. Bello et al. [37] proposed utilizing bidirectional encoder representations from transformers (BERT), alone and in several variants, for text classification in NLP. The experimental results indicate that the integration of BERT with CNN, BERT with RNN, and BERT with Bi-LSTM yields favorable outcomes in terms of accuracy rate, precision rate, recall rate, and F1-score when compared to the outcomes achieved by employing these deep learning models with Word2Vec or without any variation.

The earlier research has exhibited satisfactory results using different ML and DL models. However, achieving high accuracy in sentiment categorization remains a formidable challenge, particularly when dealing with data from social media platforms. One of the limitations of current sentiment classification systems is their heavy reliance on traditional techniques, such as bag-of-words (BOW) and N-gram approaches, which use term frequency as a feature. While these methods are straightforward and effective, they generate feature vectors that are often sparse and high-dimensional. This can lead to scalability issues and potential overfitting, even with regularization techniques. Furthermore, these models operate under the implicit assumption that sentiment expression in a post is solely conveyed through the text data, without considering contextual factors. However, social media platforms like Twitter offer an opportunity to access rich metadata alongside the textual content of posts. This metadata includes information about the characteristics of social network posts and their authors, which can provide valuable contextual cues for analyzing sentiment in a tweet. Additionally, leveraging this metadata can complement the textual features in the sentiment classification task.

Although a few studies have relied on tweet metadata, they assume consistent language patterns in training and testing data; in reality, metadata vocabularies can change as new categories and trends emerge. Therefore, finding methods to incorporate this changing metadata or neighboring data is crucial to improve sentiment analysis accuracy. Motivated by these observations, we introduce an innovative deep text sentiment classification (DTSC) model. Our model features a nonparametric approach to generating text neighborhoods, making it adaptable to a wide range of signals and capable of generalizing to new categories of text vocabularies. Then, the input texts and their neighbors are converted into embedding vectors of lower dimensions, allowing neural networks to capture semantic word relationships. Importantly, this dense vector representation maintains a fixed size (embedding dimension), reducing model parameters and improving computational efficiency.

Additionally, we employ a novel parametric model to gauge the level of informative representation obtained from the input text and its associated neighbors. This unique approach equips our model with the ability to tackle complex tasks that may have been challenging with conventional techniques. As a result, our model demonstrates state-of-the-art performance in text sentiment classification.

3. The Proposed Model

A novel deep text sentiment classification (DTSC) model is proposed, as shown in Figure 2. The model considers the neighborhoods in which input text features are embedded. Our model uses a nonparametric approach to construct neighborhoods of related texts based on Jaccard similarities, yielding a system that can handle a wide range of signals, generalize to new categories of text metadata, and adjust to changes in that metadata over time. The input texts and their neighbors are transformed into embedding vectors to efficiently capture the semantic relationships between words and texts, making them more appropriate for in-depth analysis. Then, a new deep recurrent neural network architecture is proposed. Specifically, two distinct modules, Bi-LSTM and GRU, extract valuable representations from an input text and its neighbors. The outputs of each module (extracted features of a text and its neighbors) are fed through the maximum operation, which selects the most pertinent data. Finally, the extracted features are concatenated and deeply fused using multiple fully connected layers with a classifier layer to perform sentiment classification. The weights and biases are shared among the input text and its neighbors. In other words, the input text and its neighbors are passed through a common architecture.

3.1. Candidate Neighborhoods

In the nonparametric approach, we assume that integrating neighboring data alongside the primary input during network training will yield extracted features that demonstrate similarity among samples belonging to the same class. In other words, it is assumed that if two texts are related or share similar content, their feature representations should also be similar after training. The key challenge lies in the selection of the nearest neighbors. Our approach leverages the Jaccard measure between words to calculate text similarity, allowing for the nonparametric creation of candidate neighborhoods. In particular, Jaccard similarity is employed to assess how similar or dissimilar individual words are between different texts. This similarity measurement involves the following steps.

3.1.1. Jaccard Similarity Calculation

In this context, it measures how similar the words in one text are to those in another. The Jaccard similarity is a nonparametric measure that quantifies the similarity between two sets by comparing the intersection (common words) of the word sets in both texts to their union (all unique words in both texts), and it ranges between 0 (no similarity) and 1 (perfect similarity). It does not assume any specific probability distribution for the data but calculates the proportion of common elements. Given two sets $A$ and $B$, the Jaccard similarity $J(A, B)$ is defined as

$$J(A, B) = \frac{|A \cap B|}{|A \cup B|} = \frac{|A \cap B|}{|A| + |B| - |A \cap B|},$$

where
(1) $|A|$ is the cardinality (number of elements) of set $A$.
(2) $|B|$ is the cardinality (number of elements) of set $B$.
(3) $|A \cap B|$ is the number of elements in the intersection of sets $A$ and $B$ (i.e., the number of elements common to both sets).
(4) $|A \cup B|$ is the number of elements in the union of sets $A$ and $B$ (i.e., the total number of distinct elements in both sets).

Concretely, for each pair of texts $i$ and $j$, we compute

$$s_{ij} = J(T_i, T_j) = \frac{|T_i \cap T_j|}{|T_i \cup T_j|},$$

where $T_i$ and $T_j$ represent the sets of words for the $i$-th sample and its $j$-th candidate neighbor. We set $s_{ii} = 0$ for all $i$ to prevent a text from appearing in its own neighborhood. If the Jaccard similarity is high, the texts share many common words and are more similar. Conversely, if the Jaccard similarity is low, it suggests that the texts have fewer words in common and are more dissimilar.
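The word-level Jaccard computation described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `jaccard_similarity` and the lowercase, whitespace-based tokenization are assumptions, since the tokenizer is not specified here.

```python
def jaccard_similarity(text_a: str, text_b: str) -> float:
    """Jaccard similarity between the word sets of two texts.

    Assumption: words are obtained by lowercasing and splitting on whitespace.
    """
    words_a = set(text_a.lower().split())
    words_b = set(text_b.lower().split())
    union = words_a | words_b
    if not union:
        # Two empty texts share no words; treat them as dissimilar.
        return 0.0
    return len(words_a & words_b) / len(union)


# 2 shared words ("happy", "nowruz") out of 5 distinct words overall.
print(jaccard_similarity("Happy Nowruz", "Happy Iranian New Year Nowruz"))  # 0.4
```

Because the measure depends only on set overlap, it needs no fitted vocabulary, which is what allows it to be applied to words never seen during training.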

3.1.2. Creating Candidate Neighborhoods

In each batch of data, candidate neighbors for each sample are computed using the Jaccard similarity. The text’s calculated similarities are then grouped into “neighborhoods” or clusters. Texts with a high Jaccard similarity between their words are placed in the same neighborhood because they are considered more similar. These neighborhoods are groups of related texts that share common features or themes. The only limitation that can be considered is that the number of neighbors should be smaller than the batch size. This is because the neighbors are selected from among the closest samples within a batch.
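A minimal sketch of per-batch neighbor selection is given below. It assumes that each sample's neighborhood consists of the `k` most similar other texts in the same batch, with ties broken by batch position; the function names, the tokenization, and the tie-breaking rule are illustrative assumptions rather than details confirmed by the text.

```python
from typing import List, Set


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard similarity of two word sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def batch_neighbors(batch: List[str], k: int) -> List[List[int]]:
    """For each text in the batch, return the indices of its k nearest neighbors."""
    if k >= len(batch):
        # Neighbors are drawn from the same batch, so k must be smaller than it.
        raise ValueError("number of neighbors must be smaller than the batch size")
    word_sets = [set(text.lower().split()) for text in batch]
    neighborhoods = []
    for i, words_i in enumerate(word_sets):
        # Skip j == i so that a text never appears in its own neighborhood.
        scored = [(jaccard(words_i, words_j), j)
                  for j, words_j in enumerate(word_sets) if j != i]
        # Highest similarity first; ties resolved by batch position.
        scored.sort(key=lambda pair: (-pair[0], pair[1]))
        neighborhoods.append([j for _, j in scored[:k]])
    return neighborhoods


batch = [
    "happy nowruz",
    "happy iranian new year nowruz",
    "terrible traffic today",
    "nowruz mubarak",
]
print(batch_neighbors(batch, k=2))
```

In this example, the ambiguous text "nowruz mubarak" is paired with the two greetings that mention Nowruz, which mirrors the intuition illustrated in Figure 1.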

In summary, this method employs Jaccard similarity calculations to evaluate word overlap between texts, facilitating their organization into potential neighborhoods or clusters. However, a critical transformation is applied to enhance the suitability of these textual representations for further analysis. The input texts and their neighbors (with the number of neighbors defined by the user) undergo a crucial conversion process, and they are transformed into embedding vectors.

This conversion is pivotal because it translates textual information into numerical vectors, enabling the model to effectively understand and work with the text. These embedding vectors capture the semantic relationships between words and texts, making them more appropriate for in-depth analysis. The input vectors and their corresponding candidate neighbors are introduced into a network featuring two distinct modules: Bi-LSTM and GRU. These modules leverage the embedded representations to extract valuable patterns and insights between the input samples and their neighbors, enabling efficient comprehension and classification of the sentiments expressed in the texts.

The rationale for implementing a nonparametric approach over parametric models lies in several key advantages:
(1) Adaptability to changing vocabularies: nonparametric approaches can adapt seamlessly to varying vocabularies. This adaptability is crucial when working with text data from different sources or domains, where the vocabulary can be entirely unrelated between training and testing datasets. In contrast, parametric models often assume a fixed vocabulary, limiting their ability to handle such dynamic language usage.
(2) Complex and diverse data handling: nonparametric approaches excel at managing complex and diverse data, which is common in real-world text applications. They can accommodate different word choices, expressions, and language styles, making them suitable for texts from various sources. Parametric models may struggle when confronted with data heterogeneity and nonlinear relationships.
(3) Enhanced performance in text classification and sentiment analysis (SA): the combination of nonparametric neighborhood generation and a novel parametric model brings the best of both worlds. The nonparametric approach captures contextual information from neighbors, while the parametric model effectively learns informative representations. This synergy enables the model to perform complex tasks and achieve a cutting-edge level of performance in text classification and SA, surpassing the capabilities of traditional parametric models.
(4) Robustness in exploring correlations and bias detection: beyond classification, the model’s nonparametric foundation allows it to investigate correlations between the input text and related neighborhood texts. Furthermore, it explores the potential for metadata to indicate biases in sentiment labeling. This capability is essential for understanding and addressing bias in sentiment analysis applications, ensuring more accurate and fair results.

In summary, the decision to employ a nonparametric model is justified by its adaptability to varying vocabularies, ability to handle complex and diverse data, improved performance, and robustness in exploring correlations and detecting biases. This hybrid approach, combining nonparametric and parametric techniques, is well suited for addressing the challenges posed by dynamic and diverse text data in the context of sentiment analysis and classification.

3.2. Bi-LSTM

The RNN [38] model has gained significant attention in NLP due to its architecture, which facilitates effective feature extraction. The model is proficient at processing short data sequences, but its single memory state renders it unable to handle long-term dependencies. As a result, the LSTM architecture is used as an extension of the RNN model to address the problem of long-term dependency in text SA. For every term in the text sentiment data, the LSTM model leverages the present word embedding and the preceding hidden state to compute the next hidden state. The hidden state $h_t \in \mathbb{R}^d$ ($d$ is the feature vector dimension) at time $t$ is updated as follows:

$$\begin{aligned}
i_t &= \sigma\left(W_i [h_{t-1}, x_t] + b_i\right), \\
f_t &= \sigma\left(W_f [h_{t-1}, x_t] + b_f\right), \\
o_t &= \sigma\left(W_o [h_{t-1}, x_t] + b_o\right), \\
c_t &= f_t \odot c_{t-1} + i_t \odot \tanh\left(W_c [h_{t-1}, x_t] + b_c\right), \\
h_t &= o_t \odot \tanh(c_t),
\end{aligned}$$

where $\odot$ is the element-wise product symbol, $\sigma$ is the sigmoid activation function, $x_t$ represents the lower layer input at time step $t$, and $\tanh$ is the hyperbolic tangent activation function. $i_t$, $f_t$, and $o_t$ are the input, forget, and output gates, respectively, and $c_t$ is the memory cell. The parameters of the LSTM are $W$ and $b$.
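For illustration, one LSTM update step can be sketched in plain Python. The sketch assumes the standard LSTM formulation in which each gate applies a weight matrix and bias to the concatenation of the previous hidden state and the current input; the parameter names (`W_i`, `b_i`, and so on) and the dictionary layout are hypothetical.

```python
import math
from typing import Dict, List, Tuple


def sigmoid(v: float) -> float:
    return 1.0 / (1.0 + math.exp(-v))


def affine(W: List[List[float]], b: List[float], v: List[float]) -> List[float]:
    """Compute W v + b for a matrix stored as a list of rows."""
    return [sum(w * x for w, x in zip(row, v)) + bias for row, bias in zip(W, b)]


def lstm_step(x_t: List[float], h_prev: List[float], c_prev: List[float],
              params: Dict[str, list]) -> Tuple[List[float], List[float]]:
    """One LSTM time step; returns the new hidden state and memory cell."""
    z = h_prev + x_t  # list concatenation: [h_{t-1}, x_t]
    i = [sigmoid(a) for a in affine(params["W_i"], params["b_i"], z)]  # input gate
    f = [sigmoid(a) for a in affine(params["W_f"], params["b_f"], z)]  # forget gate
    o = [sigmoid(a) for a in affine(params["W_o"], params["b_o"], z)]  # output gate
    g = [math.tanh(a) for a in affine(params["W_c"], params["b_c"], z)]  # candidate
    c = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c_prev, i, g)]
    h = [ok * math.tanh(ck) for ok, ck in zip(o, c)]
    return h, c


# Toy parameters: hidden size 1, input size 2, all weights and biases zero.
zero = {name: [[0.0, 0.0, 0.0]] for name in ("W_i", "W_f", "W_o", "W_c")}
zero.update({name: [0.0] for name in ("b_i", "b_f", "b_o", "b_c")})
h, c = lstm_step([2.0, 3.0], [0.0], [1.0], zero)
print(h, c)
```

With all-zero parameters every gate evaluates to 0.5 and the candidate to 0, so the memory cell is simply halved, which makes the gating arithmetic easy to check by hand.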

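In sequence modeling tasks, it is beneficial to understand past and future contexts. Adding a second hidden layer in the unidirectional LSTM model expands the architecture, giving rise to the Bi-LSTM [39], which incorporates hidden connections that propagate in the reverse temporal sequence. The Bi-LSTM model consists of two sequences, which are as follows:

$$\overrightarrow{h_t} = \overrightarrow{\mathrm{LSTM}}\left(x_t, \overrightarrow{h}_{t-1}\right), \qquad \overleftarrow{h_t} = \overleftarrow{\mathrm{LSTM}}\left(x_t, \overleftarrow{h}_{t+1}\right).$$

The output is shown in equation (4):

$$h_t = \overrightarrow{h_t} \oplus \overleftarrow{h_t}, \tag{4}$$

where $\oplus$ is the element-wise sum operation.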

3.3. GRU

GRU is a distinctive variant within the family of RNNs [40]. The internal unit of the GRU is analogous to the LSTM internal unit [36], except that the GRU combines the forget and input gates into a single update gate. Although its structure is simpler than that of the LSTM unit, the GRU retains the LSTM’s capacity to overcome the vanishing gradient issue. The simplified internal architecture of GRUs facilitates their training by reducing the computational complexity involved in updating the internal states. The hidden state $h_t \in \mathbb{R}^d$ ($d$ is the feature vector dimension) at time $t$ is updated as follows:

$$\begin{aligned}
z_t &= \sigma\left(W_z [h_{t-1}, x_t] + b_z\right), \\
r_t &= \sigma\left(W_r [h_{t-1}, x_t] + b_r\right), \\
\tilde{h}_t &= \tanh\left(W_h [r_t \odot h_{t-1}, x_t] + b_h\right), \\
h_t &= (1 - z_t) \odot h_{t-1} + z_t \odot \tilde{h}_t,
\end{aligned}$$

where $\sigma$ is the sigmoid activation function and $\tanh$ is the hyperbolic tangent function. The operation $\odot$ denotes the element-wise vector product, $x_t$ denotes an input vector, $h_t$ indicates the output vector, $\tilde{h}_t$ denotes the candidate activation vector, $z_t$ is the update gate vector, $r_t$ is the reset gate vector, and the learned parameters are $W$ and $b$.
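A single GRU update can be sketched in the same way. The sketch assumes one common convention, in which the update gate weights the candidate activation and its complement weights the previous state (some formulations swap the two roles); the parameter names are hypothetical.

```python
import math
from typing import Dict, List


def sigmoid(v: float) -> float:
    return 1.0 / (1.0 + math.exp(-v))


def affine(W: List[List[float]], b: List[float], v: List[float]) -> List[float]:
    """Compute W v + b for a matrix stored as a list of rows."""
    return [sum(w * x for w, x in zip(row, v)) + bias for row, bias in zip(W, b)]


def gru_step(x_t: List[float], h_prev: List[float],
             params: Dict[str, list]) -> List[float]:
    """One GRU time step; returns the new hidden state."""
    concat = h_prev + x_t  # list concatenation: [h_{t-1}, x_t]
    z = [sigmoid(a) for a in affine(params["W_z"], params["b_z"], concat)]  # update gate
    r = [sigmoid(a) for a in affine(params["W_r"], params["b_r"], concat)]  # reset gate
    # The reset gate scales the previous state before the candidate is computed.
    gated = [rk * hk for rk, hk in zip(r, h_prev)]
    h_cand = [math.tanh(a) for a in affine(params["W_h"], params["b_h"], gated + x_t)]
    # Interpolate between the previous state and the candidate activation.
    return [(1.0 - zk) * hk + zk * ck for zk, hk, ck in zip(z, h_prev, h_cand)]


# Toy parameters: hidden size 1, input size 2, all weights and biases zero.
zero = {name: [[0.0, 0.0, 0.0]] for name in ("W_z", "W_r", "W_h")}
zero.update({name: [0.0] for name in ("b_z", "b_r", "b_h")})
print(gru_step([2.0, 3.0], [0.8], zero))
```

Compared with the LSTM, the GRU carries a single state vector and uses three weight matrices instead of four, which is the source of the reduced training cost mentioned above.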

In the proposed architecture, the $i$-th input text ($x_i$) and its $j$-th neighbor ($x_{ij}$) go through RNN-based layers, and new representations for the input sample ($h^{B}_{i}$, $h^{G}_{i}$) and its $j$-th neighbor ($h^{B}_{ij}$, $h^{G}_{ij}$) are generated (equations (7) and (8)):

$$h^{B}_{i} = \mathrm{BiLSTM}(x_i), \qquad h^{B}_{ij} = \mathrm{BiLSTM}(x_{ij}), \tag{7}$$

$$h^{G}_{i} = \mathrm{GRU}(x_i), \qquad h^{G}_{ij} = \mathrm{GRU}(x_{ij}). \tag{8}$$

For Bi-LSTM-based representation,
(1) $h^{B}_{i}$ represents the new representation of the input text $x_i$, generated by applying the Bi-LSTM model to $x_i$. This captures both the backward and forward dependencies of a word in the $i$-th input text ($x_i$).
(2) Similarly, $h^{B}_{ij}$ represents the new representation of the neighbor text $x_{ij}$, generated using Bi-LSTM to capture the backward and forward dependencies of a word in the $j$-th neighbor ($x_{ij}$).

For GRU-based representation,
(1) $h^{G}_{i}$ represents the new representation of the input text $x_i$, generated by applying the GRU model to $x_i$. GRU also extracts contextual and high-level textual information with long-term dependencies of the $i$-th input text ($x_i$).
(2) Similarly, $h^{G}_{ij}$ represents the new representation of the neighbor text $x_{ij}$, generated using GRU to extract contextual and high-level textual information with long-term dependencies of the $j$-th neighbor ($x_{ij}$).

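The representations that were generated from the sample ($h^{B}_{i}$, $h^{G}_{i}$) and its neighbors ($h^{B}_{ij}$, $h^{G}_{ij}$) are merged after applying the maximum operation that selects the most pertinent data, as shown in the following equation:

$$F_i = \max\left(h^{B}_{i}, h^{B}_{i1}, \ldots, h^{B}_{ik}\right) + \max\left(h^{G}_{i}, h^{G}_{i1}, \ldots, h^{G}_{ik}\right), \tag{9}$$

where “+” represents the concatenation operation, $k$ is the number of neighbors, and $F_i$ is the output feature generated by combining the features from the text and its neighbors.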
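The merging step can be illustrated with a short sketch. It follows one reading of the description: within each module, an element-wise maximum is taken across the feature vectors of the text and its neighbors, and the two resulting vectors are then concatenated. Both this interpretation and the function names are assumptions.

```python
from typing import List


def elementwise_max(vectors: List[List[float]]) -> List[float]:
    """Element-wise maximum over a list of equally sized feature vectors."""
    return [max(values) for values in zip(*vectors)]


def merge_features(bilstm_feats: List[List[float]],
                   gru_feats: List[List[float]]) -> List[float]:
    """Max over text and neighbors per module, then concatenate the two results.

    Each argument holds the feature vector of the input text first,
    followed by the feature vectors of its neighbors.
    """
    return elementwise_max(bilstm_feats) + elementwise_max(gru_feats)


bilstm_feats = [[0.1, 0.9], [0.4, 0.2]]  # input text, one neighbor
gru_feats = [[0.3, 0.3], [0.5, 0.1]]     # input text, one neighbor
print(merge_features(bilstm_feats, gru_feats))  # [0.4, 0.9, 0.5, 0.3]
```

Under this reading, the size of the merged feature does not grow with the number of neighbors, so the same fully connected layers can be used regardless of how many neighbors are chosen.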

Finally, the output feature is deeply fused using multiple fully connected layers, as shown in equation (10), with a classifier layer for sentiment classification. The weights and biases are shared among the input text and its neighbors. In other words, the input text and its neighbors are passed through a common architecture.

$$\hat{y}_i = \varphi\left(W F_i + b\right), \tag{10}$$

where $\hat{y}_i$ is the label prediction of the $i$-th input text, $\varphi$ is the activation function, $b$ and $W$ represent the biases and weights, and $F_i$ is the output feature of the RNN’s blocks.

The proposed method uses the cross-entropy objective function to train the network. The objective function calculates the loss between the predicted labels (equation (10)) and the true labels, as shown in equation (11):

$$\mathcal{L} = -\sum_{c=1}^{C} y_{i,c} \log \hat{y}_{i,c}, \tag{11}$$

where $C$ is the number of classes and $y_{i,c}$ indicates the true label of the $i$-th text.
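As a worked example of the cross-entropy objective, the sketch below computes the loss for a single text with a one-hot true label. It assumes that class probabilities are obtained with a softmax over the classifier outputs, which is a common choice but is not stated explicitly in the text; the function names are illustrative.

```python
import math
from typing import List


def softmax(logits: List[float]) -> List[float]:
    """Convert raw classifier outputs into class probabilities."""
    peak = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(probs: List[float], true_class: int, eps: float = 1e-12) -> float:
    """Cross-entropy loss for one sample with a one-hot true label.

    With a one-hot label, the sum over classes reduces to the negative
    log-probability assigned to the true class.
    """
    return -math.log(max(probs[true_class], eps))


probs = softmax([2.0, 0.5, -1.0])
print(cross_entropy(probs, true_class=0))  # small loss: class 0 is favored
print(cross_entropy(probs, true_class=2))  # larger loss: class 2 is unlikely
```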

4. Experiment

The effectiveness of the proposed methodology is assessed and compared with several established baseline techniques, including Bi-LSTM, bidirectional-GRU (Bi-GRU), LSTM, GRU, CNN, and LSTM-CNN. The parameters and their respective values for the proposed method and other baseline approaches are displayed in Table 1.

The architectures of Bi-LSTM, LSTM, Bi-GRU, and GRU employ three layers of 128, 64, and 64 units. Furthermore, the classifier employs three fully connected layers with dimensions of 128, 64, 32, and 1 (or more based on class count). Additionally, a 128-dimensional word embedding is utilized.

In the 1D CNN model, three layers are employed, each consisting of 32 filters and a kernel size 5. A 1D max pooling layer is employed with a pooling size of three. A 1D global max pooling layer is employed in the convolutional-based model’s last layer. The classifier employs two layers, with the first consisting of 16 units and the second consisting of 1 unit (or more based on class count). In addition, a 128-dimensional word embedding is employed.

The LSTM-CNN model incorporates two convolutional layers with 32 filters and a kernel size of 5, together with a 1D max pooling layer with a pool size of 3. At the end of the LSTM-CNN model, two LSTM layers with 32 units are employed. The classifier employs two layers, with 16 units and 1 unit (or more, depending on the number of classes). In addition, a 128-dimensional word embedding is used.

4.1. Evaluation Criteria

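To assess the effectiveness of the proposed model and conduct a comparative analysis with prior research, we employ the following evaluation metrics: precision, recall, F1-score, and accuracy. These metrics are defined in equations (12)–(15). In these equations, TP stands for true positive, FP represents false positive, TN stands for true negative, and FN represents false negative [41, 42].

$$\text{Precision} = \frac{TP}{TP + FP} \tag{12}$$

$$\text{Recall} = \frac{TP}{TP + FN} \tag{13}$$

$$\text{F1-score} = \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}} \tag{14}$$

$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \tag{15}$$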
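The four metrics can be computed directly from the confusion counts, as in the following sketch. The function name `classification_metrics` and the example counts are illustrative and are not taken from the reported experiments.

```python
from typing import Dict


def classification_metrics(tp: int, fp: int, tn: int, fn: int) -> Dict[str, float]:
    """Precision, recall, F1-score, and accuracy from confusion counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total if total else 0.0
    return {"precision": precision, "recall": recall, "f1": f1, "accuracy": accuracy}


# Illustrative counts only.
metrics = classification_metrics(tp=90, fp=10, tn=85, fn=15)
print(metrics)
```

For multiclass datasets, these quantities are usually computed per class and then averaged; the averaging scheme is an assumption, since it is not specified here.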

4.2. Dataset

Five social media datasets are used to evaluate the efficacy of the proposed DTSC model. The datasets are collected from various social media platforms with two (positive, negative) and three (positive, negative, and neutral) classes. These datasets have been partitioned into training and testing sets with an 80:20 ratio. Table 2 presents the complete statistical information for each dataset.
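An 80:20 partition of this kind can be sketched as a seeded random shuffle followed by a cut. The seed, the absence of stratification, and the function name `train_test_split` are assumptions made for illustration.

```python
import random
from typing import List, Sequence, Tuple


def train_test_split(samples: Sequence, test_ratio: float = 0.2,
                     seed: int = 42) -> Tuple[List, List]:
    """Shuffle a copy of the samples and split it into train and test parts."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)  # local RNG keeps the split reproducible
    n_test = round(len(shuffled) * test_ratio)
    return shuffled[n_test:], shuffled[:n_test]


train, test = train_test_split(range(100))
print(len(train), len(test))  # 80 20
```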

4.2.1. Getty Images

Getty Images (https://www.gettyimages.com/ (accessed Mar. 16, 2023)) offers a vast collection of creative resources, including photographs with detailed textual descriptions, videos, and audio content, catering to businesses and consumers, with over 477 million resources in its collection. The main advantage of Getty Images is its user-friendly, effective query-based search engine with formal yet informative image descriptions. In particular, 3244 adjective-noun pairs (ANPs) from the visual sentiment ontology [43] are used as keywords to collect 20,127 sentiment sentences with two classes (positive, negative). The resulting dataset, named “Binary_Getty” (BG), includes textual descriptions and labels.

The initial labeling is accomplished using the sentiment scores associated with ANP keywords. We further employ the Valence-Aware Dictionary and sentiment Reasoner (VADER) [44], a lexicon and rule-based SA tool (https://github.com/chute/vaderSentiment (accessed Mar. 16, 2023)), to label the preprocessed textual description. Then, we select only the text samples for which ANP and VADER sentiment scores are the same. Finally, three volunteers were chosen to assess the quality of our datasets. Each sample is graded 1 (suitable) or 0 (unsuitable). The results show that 95% of the samples are suitable and 5% are unsuitable; we only consider the samples with grade 1 (suitable) and ignore the others.
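The agreement filter between the two labeling sources can be sketched as follows. The sketch assumes that "the same" means that the two scores have the same polarity (sign), and it uses precomputed, made-up scores in place of real ANP and VADER outputs; the names `polarity` and `keep_agreeing` are hypothetical.

```python
from typing import List, Optional, Tuple


def polarity(score: float) -> Optional[str]:
    """Map a signed sentiment score to a binary label (None when the score is 0)."""
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return None


def keep_agreeing(samples: List[Tuple[str, float, float]]) -> List[Tuple[str, str]]:
    """Keep only texts whose ANP-based and VADER-based polarities agree."""
    kept = []
    for text, anp_score, vader_score in samples:
        label = polarity(anp_score)
        if label is not None and label == polarity(vader_score):
            kept.append((text, label))
    return kept


# (text, ANP score, VADER score); all values are invented for illustration.
samples = [
    ("happy child playing in the park", 1.8, 0.62),
    ("abandoned house at night", -1.2, -0.40),
    ("broken toy on a sunny beach", -0.9, 0.35),  # sources disagree, dropped
]
labeled = keep_agreeing(samples)
print(labeled)
```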

4.2.2. iStock Images

iStock (https://www.istockphoto.com/ (accessed Mar. 16, 2023)) is an online platform that offers a wide range of international royalty-free microstock content, including images with accompanying textual descriptions, graphics, clipart, videos, and audio tracks. The same procedure used for Getty Images is followed: 3244 ANPs are used as keywords to retrieve 19,279 sentiment sentences with two classes (positive, negative), named the “Binary_iStock” (BIS) dataset. The dataset includes textual descriptions and labels. The labeling procedure described for the BG dataset is used to establish the final labels of the BIS dataset.

4.2.3. Twitter Dataset

Additionally, we gathered a new dataset from Twitter. English tweets were collected using the Twitter streaming application programming interface (API) (https://developer.twitter.com/en (accessed Mar. 16, 2023)), with user-generated hashtags as keywords. We filtered out all texts that were too short (fewer than five words) or too long (more than 100 words). To speed up the labeling process, we used VADER, a lexicon and rule-based SA tool, to predict the sentiment polarity of each text. Based on the predicted sentiment polarity, the tweets were manually categorized as neutral, negative, or positive. Finally, 17,073 high-quality tweets were obtained.
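The length filter described above can be sketched in a few lines. Counting words by whitespace splitting is an assumption, and the function name `within_length` is hypothetical.

```python
from typing import List


def within_length(text: str, min_words: int = 5, max_words: int = 100) -> bool:
    """True when the text has between min_words and max_words words, inclusive."""
    n_words = len(text.split())  # whitespace tokenization (assumed)
    return min_words <= n_words <= max_words


tweets: List[str] = [
    "so good",                                    # 2 words, too short
    "what a wonderful day at the beach today",   # 8 words, kept
    " ".join(["word"] * 101),                     # 101 words, too long
]
kept = [t for t in tweets if within_length(t)]
print(len(kept))  # 1
```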

4.2.4. Multiview Sentiment Analysis (MVSA) Dataset

The MVSA-Single dataset [45] comprises 5129 image-text pairs extracted from Twitter. Each pair was shown to a single annotator, who assigned one of three polarities (neutral, negative, or positive) to the text and to the image. Following [46], we first delete tweets with contradicting textual and visual labels. In cases where one modality is labeled as neutral while the other is labeled as positive (or negative), the final polarity assigned to the multimodal data is positive (or negative). Thus, we obtain a new MVSA-Single dataset with 4511 text-image pairs. Here, we use only the textual data from this dataset and treat it as a benchmark dataset, collected mainly from Twitter, to demonstrate the performance of our proposed model on different social datasets.
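The label-merging rule for the two modalities can be written as a small function. This is a sketch of the rule as described above; the function name `merge_labels` and the use of `None` to mark a discarded pair are illustrative choices.

```python
from typing import Optional


def merge_labels(text_label: str, image_label: str) -> Optional[str]:
    """Combine text and image polarities; None marks a contradictory pair."""
    if text_label == image_label:
        return text_label
    if text_label == "neutral":
        return image_label  # the non-neutral modality decides
    if image_label == "neutral":
        return text_label
    return None  # positive vs. negative: the pair is removed


pairs = [("positive", "positive"), ("neutral", "negative"),
         ("positive", "neutral"), ("positive", "negative")]
merged = [merge_labels(t, i) for t, i in pairs]
print(merged)  # ['positive', 'negative', 'positive', None]
```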

4.2.5. IMDB Dataset

IMDB (https://www.kaggle.com/datasets/lakshmi25npathi/imdb-dataset-of-50k-movie-reviews) is an online database of information related to films, television series, podcasts, home videos, video games, and critical reviews. The dataset used here includes 50,000 IMDB movie reviews, each labeled with a binary sentiment: positive or negative.

Before sentiment analysis, the text data of all datasets undergo the following preprocessing steps: (1) lowercasing, which converts all text to lowercase; (2) removal of irrelevant information, including punctuation, special characters, hashtags, multiple spaces, URL references, stop words, and numbers; and (3) emoticon translation, which replaces each emoticon with its corresponding term.
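The preprocessing steps can be sketched with the standard library as follows. Several details are assumptions rather than the authors' settings: the tiny `EMOTICONS` and `STOP_WORDS` tables are placeholders, hashtags are removed together with their tag text, and emoticons are translated first so that punctuation removal does not destroy them.

```python
import re

# Placeholder lookup tables; a real pipeline would use much larger resources.
EMOTICONS = {":)": "smile", ":(": "sad", ":D": "laugh"}
STOP_WORDS = {"a", "an", "the", "is", "to", "and", "of"}


def preprocess(text: str) -> str:
    """Emoticon translation, lowercasing, and removal of irrelevant tokens."""
    # Translate emoticons before punctuation is stripped (ordering is assumed).
    for emoticon, term in EMOTICONS.items():
        text = text.replace(emoticon, f" {term} ")
    text = text.lower()
    text = re.sub(r"https?://\S+|www\.\S+", " ", text)  # URL references
    text = re.sub(r"#\w+", " ", text)                   # hashtags (whole tag)
    text = re.sub(r"[^a-z\s]", " ", text)               # punctuation, digits, symbols
    tokens = [tok for tok in text.split() if tok not in STOP_WORDS]
    return " ".join(tokens)                             # also collapses multiple spaces


cleaned = preprocess("Loving the NEW phone :) #happy https://t.co/xyz 100%")
print(cleaned)  # loving new phone smile
```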

4.3. Experiment 1: The Effect of the Neighborhood Technique

In this experiment, the performance of the proposed method is compared with and without the neighborhood technique to ascertain the impact of neighbors on our approach. In the first part of the experiment, the network is trained on the original input only, and neighbors are ignored. In the second part, the number of neighbors is set to 2. The outcomes are shown in Table 3.

The proposed method with the neighborhood technique outperforms the method without it, because using the neighbors of each sample enables the network to learn better representations. With neighbors, the accuracy, precision, recall, and F1-score of the proposed model improve by an average of 1.20%, 1.52%, 1.42%, and 1.40%, respectively, compared to the model without neighbors. The effect is largest on the MVSA-Single dataset, where the accuracy, precision, recall, and F1-score improve by an average of 1.72%, 2.73%, 1.70%, and 2.63%, respectively. Overall, using neighbors improves performance.

4.4. Experiment 2: The Effect of Neighborhood Size

In this experiment, to determine the effect of the neighborhood size on our approach, the performance of the proposed method is evaluated with different neighborhood sizes and batches. Our method is executed ten times for each dataset, and the average result is reported. The results are shown in Table 4.

As demonstrated in Table 4, the efficacy of the proposed method increases with the neighborhood size. For sizes 2, 4, 8, 16, and 32, the average of all criteria increases by 0.72%, 1.37%, 1.48%, 1.53%, and 1.54%, respectively, compared to size 0. All criteria improve as the number of neighbors increases; however, the improvement diminishes as the neighborhood grows. For example, there is only a 0.03% difference between sizes 16 and 32, whereas there is a 0.61% difference between sizes 2 and 4. The reason is that, as the neighborhood grows, more distant neighbors are also selected. Although these distant neighbors are in the same class as the input sample, their text may differ from that of the input.
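The effect of distant neighbors can be illustrated with a small sketch of Jaccard-based neighbor selection. It assumes that the Jaccard similarity is computed over sets of whitespace-separated words and that the neighborhood consists of the k most similar texts; the corpus, the function names, and these choices are illustrative, not the authors' exact procedure.

```python
from typing import List, Set, Tuple


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard similarity |A ∩ B| / |A ∪ B| (0.0 when both sets are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def nearest_neighbors(query: str, corpus: List[str], k: int) -> List[Tuple[float, str]]:
    """Return the k corpus texts most similar to the query, best first."""
    query_words = set(query.split())
    scored = [(jaccard(query_words, set(text.split())), text) for text in corpus]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


corpus = [
    "happy child playing",
    "child playing outside park",
    "sad child crying",
    "stormy dark sky",
]
query = "happy child playing outside"

small = nearest_neighbors(query, corpus, k=2)
large = nearest_neighbors(query, corpus, k=4)
print([round(s, 2) for s, _ in small])  # [0.75, 0.6]
print([round(s, 2) for s, _ in large])  # [0.75, 0.6, 0.17, 0.0]
```

In this toy corpus, the neighbors added when k grows from 2 to 4 share little or no vocabulary with the query, which mirrors the diminishing benefit of larger neighborhoods.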

4.5. Experiment 3: The Proposed Method vs. Baseline Models

In this section, an evaluation of the proposed method and other established baseline methods is conducted across all datasets. The outcomes of the proposed method and other approaches are presented in Table 5.

Table 5 shows that the LSTM model outperforms CNN and GRU, which exhibit comparatively inferior performance. The hybrid LSTM-CNN model outperforms the LSTM-only model, while the bidirectional methods (Bi-LSTM and Bi-GRU) also perform relatively well. The proposed method’s average accuracy, precision, recall, and F1-score are 92.77%, 93.73%, 91.82%, and 92.73%, respectively. For the second-best method (Bi-GRU), the average accuracy, precision, recall, and F1-score are 92.37%, 93.32%, 91.42%, and 92.33%, respectively. The 1D CNN’s average recall of 90.54% indicates that it misses more samples of a given class, whereas the proposed method’s average recall of 91.82% indicates that fewer samples are misclassified. As can be seen, the proposed model performs better than the strongest baseline on all evaluation metrics. Specifically, it achieves its highest accuracy, 99.60%, on the BG dataset. This result provides strong evidence for the effectiveness of the DTSC model in enhancing the classifier’s performance.

4.6. Experiment 4: Generalization

A benefit of our model is that it easily handles scenarios in which various types of metadata are accessible during training and testing. Additionally, our model handles circumstances in which the words in use change over time. In other words, there may be some differences between the words in the training and test sets.

In the real world, the vocabulary or tags may change as new words become popular and older words fall out of favor. Any method that relies on user metadata should be able to handle these conditions. Ideally, to test our model’s resilience to changes in user words over time, we should train it with texts from one time and test it with texts from another.

Instead of randomly dividing a dataset into training and test sets, in this experiment we train the model on the BG dataset, generating neighborhoods from it in the training phase, and test the model on the BIS dataset, generating neighborhoods from it in the testing phase. The results are reported in Table 6.
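One simple way to quantify the vocabulary shift between a training corpus and a test corpus is the out-of-vocabulary (OOV) rate, sketched below. This measure is offered only as an illustration and is not reported in the experiments; the function name `oov_rate`, the whitespace tokenization, and the toy texts are assumptions.

```python
from typing import List


def oov_rate(train_texts: List[str], test_texts: List[str]) -> float:
    """Fraction of test tokens that never occur in the training texts."""
    train_vocab = {word for text in train_texts for word in text.split()}
    test_tokens = [word for text in test_texts for word in text.split()]
    if not test_tokens:
        return 0.0
    unseen = sum(1 for word in test_tokens if word not in train_vocab)
    return unseen / len(test_tokens)


# Toy stand-ins for a training corpus and a test corpus.
train_texts = ["happy child", "sad dog"]
test_texts = ["happy dog", "joyful kitten"]
rate = oov_rate(train_texts, test_texts)
print(rate)  # 0.5
```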

Table 6 presents a comparative analysis between the proposed method and the baseline algorithms LSTM and GRU. The evaluation uses BG for training and BIS for testing in two distinct scenarios. The findings indicate that the proposed method improves performance, with average accuracy improvements of 1.14% and 1.09% for group one and of 1.89% and 2.23% for group two.

This suggests that leveraging additional metadata generates enhanced representations even when different vocabularies are used during training and testing. Utilizing neighbors plays a crucial role in achieving these improvements. During the network training phase, this approach generates superior features characterized by a high level of generalizability. Even during the testing phase, when the input sample contains words not present in the training dictionary, the model produces superior representations with the assistance of its neighbors.

The results obtained from Tables 3 to 6 demonstrate the great benefits of the proposed model, which can be summarized as follows:
(1) Improved accuracy: the model is designed to enhance accuracy in sentiment categorization, a task known to be challenging, especially in the context of social media data. By leveraging both textual content and extensive metadata, it aims to provide more accurate sentiment analysis.
(2) Contextual understanding: unlike models solely relying on text data, this model considers the contextual aspects of social media posts. This contextual understanding is crucial for interpreting sentiment accurately, given the nuances and informal language often used in social media.
(3) Utilizing metadata: the model harnesses the wealth of metadata, or neighboring data, available on platforms like Twitter, which includes information about posts and their authors. This additional context can significantly improve the accuracy of sentiment analysis.
(4) Flexibility: the model’s nonparametric generation of text neighborhoods and subsequent use of a novel parametric model make it adaptable to various tasks. It can handle complex sentiment analysis tasks that may be impractical for other techniques.
(5) Cutting-edge performance: the model demonstrates a state-of-the-art level of performance in text sentiment classification. It outperforms traditional methods by effectively combining contextual information, thereby offering a valuable advancement in sentiment analysis technology.

In summary, the model’s benefits include improved accuracy, enhanced contextual understanding, effective utilization of metadata, flexibility in handling diverse tasks, and cutting-edge performance, all of which contribute to more precise sentiment categorization, especially in the challenging domain of social media analysis.

5. Conclusion

Some texts that are difficult to recognize on their own may become more understandable in a neighborhood of related texts with similar contexts. Motivated by this theory, a novel deep text sentiment classification (DTSC) model was proposed to improve the classifier’s performance by integrating the neighborhood of related texts. Our model uses the nonparametric approach to construct neighborhoods of related texts based on Jaccard similarities.

Moreover, two distinct deep learning-based recurrent neural networks (Bi-LSTM and GRU) were integrated to extract sophisticated features, capture temporal relationships, and generate SA insights. The result of each module was further processed through the maximum operation, which selects the most pertinent data. Finally, the extracted features were concatenated and subjected to classification to achieve accurate sentiment prediction. In contrast to previous studies, our approach is nonparametric, enabling it to perform strongly even when the text vocabulary varies between training and testing. The effectiveness of the proposed model was evaluated on five real-world sentiment datasets: four of short English text and one of lengthy movie reviews. The DTSC model performs more accurately and efficiently in identifying and understanding the semantics of both short and lengthy texts when compared to baseline approaches. Specifically, it achieved 99.60% accuracy on the Binary_Getty (BG) dataset, 98.32% accuracy on the Binary_iStock (BIS) dataset, 96.13% accuracy on Twitter, 82.19% accuracy on the multiview sentiment analysis (MVSA) dataset, and 87.60% accuracy on the IMDB dataset. These findings indicate that the proposed model performs better than the established baseline techniques in terms of the model evaluation criteria for text sentiment classification. Future work will focus on (1) broadening the model’s scope to encompass additional languages, such as Persian, and (2) leveraging transformer-based language models to produce more resilient embedding representations.

Data Availability

The following information was supplied regarding data availability: (1) the code and datasets are available on GitHub: https://github.com/israakhalafsalman/LNN_SANNECDM and (2) the IMDB dataset of 50K movie reviews is available at https://www.kaggle.com/lakshmi25npathi/imdb-dataset-of-50k-movie-reviews

Conflicts of Interest

The authors declare that they have no conflicts of interest.