Abstract

Online legal consultation plays an increasingly important role in the modern rule-of-law society. This study aims to understand the legal consultation intentions of users with different language expressions and legal knowledge backgrounds. A critical issue is how users' legal consultation data are classified and how the features of each category are extracted. Traditional classification methods rely considerably on lexical and syntactic features and frequently require strictly formatted sentences, which consumes substantial effort and may not be universally applicable. We aim to extract the patterns of users' consultations in different categories, which depend minimally on lexis, syntax, and sentence formatting. However, this problem has rarely been addressed in previous legal advisory service studies. In this study, a classification approach for multiclass user intentions based on pattern-oriented tensor decomposition and Bi-LSTM is proposed, and each user's legal consulting statement is expressed as a tensor. Moreover, we propose a pattern-oriented tensor decomposition method that obtains a core tensor approximating the patterns of users' consultations. These patterns improve the accuracy of classifying users' legal consultation intentions. We use Bi-LSTM to automatically learn and optimize these patterns. Evidently, Bi-LSTM with a pattern-oriented tensor decomposition layer performs better than a recurrent neural network alone. Results show that our method is more accurate than previous work and that the factor matrices and core tensors calculated by the pattern-oriented tensor decomposition are interpretable.

1. Introduction

With the increase in demand for online legal consultation [1], understanding the consultation intentions of different users is a problem that must be solved [2]. Different users have various language expressions and levels of legal knowledge [3]; for example, User A inquired: “He sneaked into my house and stole three thousand dollars, how to judge?”, and User B asked: “Burglary $2500, how many sentences and fines should be sentenced?” Users A and B described similar burglary cases, expressed the same legal consultation intention, and can thus be assigned to the same category. Hence, a crucial step in understanding users' legal counseling intentions is classifying users' legal consulting statements. Traditional intention classification methods extract sentence features that rely heavily on lexical and syntactic characteristics and generally require sentences to follow a strict format. However, users' legal consultation data are typically colloquial, disordered, and unprofessional, which creates numerous difficulties for traditional intent classification methods.

In previous works, understanding users' legal consultation intentions has rarely been addressed, especially classifying user intent over colloquial, unprofessional, and disordered legal consultation datasets [4]. Figure 1 illustrates the framework of the intent classification model for users' legal consulting statements. Evidently, the problems to be solved mainly include modeling and classifying user legal consulting statements. Traditional intention classification methods are dedicated to feature extraction at the lexical and syntactic layers and typically require regular datasets with a professional knowledge background to achieve high classification accuracy [5]. Obtaining such datasets requires expert knowledge and consumes substantial human effort.

Definition 1 (the intention of the users' legal consulting statement). The intention of a user's legal consulting statement is the category of consultation it involves, such as process, assistance, crimes, and judgments on legal cases.

Problem 2. We represent a user's legal consulting statement as a tensor $\mathcal{X}$; the category of this statement is expressed as the scalar $y$. Given the dataset of users' legal consulting statements $D = \{(\mathcal{X}_i, y_i)\}_{i=1}^{m}$, where $\mathcal{X}_i$ represents the $i$th statement in $D$ and $y_i$ indicates the corresponding category of $\mathcal{X}_i$, train a model $f$, which is used to classify and predict the category of a new legal consulting statement $\mathcal{X}_{\mathrm{new}}$.

We define the intention of users' legal consulting statements in Definition 1. This article formalizes the problem of understanding users' legal intentions as Problem 2. This study proposes a new method for understanding users' legal consultation intentions. In terms of modeling methods for user legal consulting statements, we propose a pattern-oriented tensor decomposition method. We focus on extracting the patterns of user legal consultation rather than features at the lexical and syntactic levels for different categories. These patterns can be regarded as a kind of data structure and are less dependent on vocabulary, grammar, and sentence formatting than traditional intention classification methods. The pattern-oriented tensor decomposition method is used to extract structured information from a user's legal consulting statement, and this structured information approximates the user's legal consulting patterns. For example, we denote the user's legal consulting statement as the tensor $\mathcal{X}$ and the user consultation pattern derived by Bi-LSTM as $\mathcal{P}$. Then, we use $\mathcal{X}$ and $\mathcal{P}$ as inputs of the pattern-oriented tensor decomposition method and obtain a core tensor $\mathcal{G}$, which is construed as the structured information of tensor $\mathcal{X}$ and approximates pattern $\mathcal{P}$. $\mathcal{G}$ carries not only the vocabulary and syntax data but also the structured information of $\mathcal{X}$.

In terms of classification model optimization, this study proposes a user legal consulting intent classification method on the basis of Bi-LSTM and pattern-oriented tensor decomposition. We use Bi-LSTM to automatically learn and optimize users' legal consultation patterns and obtain patterns that are highly favorable for classifying users' legal consultation intentions. Moreover, Bi-LSTM preceded by the pattern-oriented tensor decomposition layer classifies users' consultation intent more accurately, and places fewer demands on the dataset, than directly feeding the users' legal consulting statement tensor into Bi-LSTM. Furthermore, the core tensor obtained by the pattern-oriented tensor decomposition method contains structured information that approximates the legal consultation patterns of different categories. Simultaneously, the core tensor dimension is considerably lower than that of the original tensor. The core tensor can therefore be regarded as the main structured information of the original tensor for user intention classification.

The main contributions of this study are summarized as follows:

(i) This study proposes a new feature extraction model for users' legal consultation data. In contrast to traditional feature extraction methods based on vocabulary and syntax, this study proposes a new pattern-oriented tensor decomposition method, which extracts a core tensor representing the main structured information of users' legal consultation data from the original tensor. The core tensor approximates users' consultation patterns of different categories, and these consultation patterns are beneficial for classifying users' legal consultation intentions. In particular, in comparison with the original tensor, the core tensor obtained by the pattern-oriented tensor decomposition method improves the accuracy of the Bi-LSTM classification algorithm.

(ii) This study proposes a new intent classification approach that combines Bi-LSTM with the pattern-oriented tensor decomposition method. We use the pattern-oriented tensor decomposition layer to extract core tensors from the original ones in accordance with the user consultation patterns of each category, and we use Bi-LSTM to autonomously learn and optimize these patterns. We thereby obtain legal consultation patterns and core tensors that are beneficial for classifying users' legal consultation intentions. Our proposed method depends less on the lexis, syntax, and sentence formatting of datasets and is more universal than traditional intent classification schemes that rely heavily on datasets.

(iii) This study proposes a new optimization method for pattern tensors on the basis of Bi-LSTM. We derive the partial derivative of the error function with respect to the pattern tensor in accordance with the pattern-oriented tensor decomposition method and the propagation equations of Bi-LSTM. Furthermore, we use the hyperparameter optimization strategy in Bi-LSTM to continuously update the value of the pattern tensor. This approach guides the core tensors of the training samples toward improving the classification accuracy of the model.

This paper is organized as follows: Section 2 describes recent research on classifying texts in the legal field and related works on intent recognition. Section 3 introduces the related background knowledge, including relevant methods, definitions, and notations. Section 4 details the method proposed in this study for user legal consultation intent, that is, Bi-LSTM with the pattern-oriented tensor decomposition method. Section 5 presents the comparison experiments and result analysis. Section 6 concludes the paper.

2. Related Work

In recent years, research on understanding user intent based on deep neural networks and tensor layers in the field of legal services has rarely been conducted. In the past ten years, researchers at the intersection of law and computer science have concentrated on the classification of legal documents [6]. Text classification in the legal field includes the classification and understanding of legal cases, judgment documents, entities involved in legal cases, and laws and regulations.

In [7], Sulea, Zampieri, and Malmasi studied the application of text classification in the legal field for professionals. The authors proposed a method for predicting the judgments of the Supreme Court of France on the basis of machine learning algorithms and statistics and suggested a similar-case retrieval technique and a weight-fluctuation algorithm for case influence over time. The SVM algorithm was mainly used to classify the relevant documents in legal cases and to realize judgment prediction for legal cases. In [8], Sarwar, Karim, and Naeem studied software copyright disputes between users and software owners through a semisupervised machine learning algorithm; the authors predicted which software copyright provisions a user may violate after obtaining a software license. Copyright disputes over software licenses are a common problem: after users obtain a license, they may use the software for a period in accordance with the software usage rules.

In [9], Galgani and Hoffmann proposed a method for classifying legal references through incremental knowledge acquisition. This method automatically extracts the main objectives from legal text summaries. The authors created large training and test corpora for legal citation classification from Australian court judgment reports, which are considered of high quality under Australian law. A specialized legal knowledge base, combined with machine learning algorithms, is used to classify legal references. In [10], Xiong studied automatic classification systems for Chinese legal texts. Traditional character-level representations cannot be used to model Chinese legal documents directly; otherwise, dimension explosion and high computational complexity ensue. Xiong proposed a legal document clustering method on the basis of latent semantic analysis to reduce the dimensionality of legal text features and established an automatic classification system for Chinese legal texts in accordance with a second dimensionality reduction built on the foundation of latent semantic analysis.

In [11], Maat, Krabben, and Winkels used machine learning algorithms to classify sentences in the Dutch legal library and compared the results with legal sentence classification based on traditional pattern classifiers. The machine-learning-based legal sentence classifier achieves higher accuracy than the pattern-based classifier owing to accurate modeling of legal sentences and feature extraction. In [12], Bartolini proposed a management labeling system for Italian law. The method clusters full texts by representing redundant long documents in vector form to achieve document classification. It uses treaties and articles as clustering units and presents the clustering results in a tree diagram.

In the general text classification field, researchers have conducted substantial research [13]. Both traditional machine learning algorithms and deep neural networks are used in text classification. From the machine learning perspective, Nigam, McCallum, and Thrun improved the accuracy of learned text classifiers by using a large number of unlabeled documents to augment a small number of labeled documents [14]. This approach matters because obtaining text labels is costly in practice, whereas unlabeled documents are particularly easy to obtain. Their article uses an EM-based approach to learn from and label unlabeled documents: the algorithm first trains a Bayesian classifier on the small labeled set, probabilistically labels the unlabeled documents, counts the expected statistics of the labeled documents, rebuilds the classifier on all documents, and iterates until convergence. From the deep neural network perspective, Kim proposed TextCNN, which is based on convolutional neural networks, for text classification and prediction [15]. Donahue proposed a structure based on recurrent neural networks for text classification, that is, TextRNN [16]. Bahdanau proposed an attention structure for deep neural networks; the attention layer discovers the association between input and output by adding weight parameters [17].

3. Preliminaries

In this section, we introduce several related methods, definitions, and notations. Section 3.1 presents the basic definitions and notations involved in this study. Section 3.2 provides a detailed explanation of the tensor decomposition operation.

3.1. Definitions and Notations

In this study, we represent user legal consulting statements as tensors. The pattern-oriented tensor decomposition method is used to decompose these tensors, and the obtained core tensors are used in the subsequent deep neural network classification model. A tensor is a data structure similar to a vector or matrix [18, 19]. Tensor decomposition is a dimensionality reduction operation on a tensor [20, 21]. Similar to principal component analysis and singular value decomposition, tensor decomposition methods are devoted to extracting the main structural and compositional information in the original tensor [22, 23].

A tensor is actually a multidimensional array [24], and we use Euler script letters ($\mathcal{X}$) to represent tensors. We refer to the tensor dimensions and the number of tensor dimensions as modes and order, respectively [22]. Scalars, vectors, and matrices are denoted by lowercase ($a$), bold lowercase ($\mathbf{a}$), and uppercase ($A$) letters, correspondingly; the transpose of matrix $A$ is denoted by $A^{T}$; and the identity matrix by $I$. A square diagonal matrix with diagonal elements $d_1, \dots, d_n$ is represented by $\mathrm{diag}(d_1, \dots, d_n)$.

Definition 3 (outer product). The outer product of two vectors $\mathbf{a} \in \mathbb{R}^{I}$ and $\mathbf{b} \in \mathbb{R}^{J}$ is expressed as $\mathbf{a} \circ \mathbf{b} \in \mathbb{R}^{I \times J}$, that is, $(\mathbf{a} \circ \mathbf{b})_{ij} = a_i b_j$.

Definition 4 (Kronecker product). The Kronecker product of two matrices $A \in \mathbb{R}^{I \times J}$ and $B \in \mathbb{R}^{K \times L}$ is denoted as $A \otimes B$, which is an $IK \times JL$ matrix.

Definition 5 (Khatri-Rao product). The Khatri-Rao product of $A \in \mathbb{R}^{I \times K}$ and $B \in \mathbb{R}^{J \times K}$ is denoted as $A \odot B$, which is an $IJ \times K$ matrix, that is, $A \odot B = [\mathbf{a}_1 \otimes \mathbf{b}_1 \;\; \mathbf{a}_2 \otimes \mathbf{b}_2 \;\; \cdots \;\; \mathbf{a}_K \otimes \mathbf{b}_K]$.

Definition 6 ($n$-mode product). Given an $N$-mode tensor $\mathcal{X} \in \mathbb{R}^{I_1 \times I_2 \times \cdots \times I_N}$ and a matrix $U \in \mathbb{R}^{J \times I_n}$, the $n$-mode product is denoted as $\mathcal{X} \times_n U \in \mathbb{R}^{I_1 \times \cdots \times I_{n-1} \times J \times I_{n+1} \times \cdots \times I_N}$, that is, $(\mathcal{X} \times_n U)_{i_1 \cdots i_{n-1}\, j\, i_{n+1} \cdots i_N} = \sum_{i_n=1}^{I_n} x_{i_1 i_2 \cdots i_N}\, u_{j i_n}$.

Definition 7 ($n$-mode matricization). Given an $N$-mode tensor $\mathcal{X} \in \mathbb{R}^{I_1 \times \cdots \times I_N}$, the $n$-mode matricization of $\mathcal{X}$ is denoted as $X_{(n)} \in \mathbb{R}^{I_n \times (I_1 \cdots I_{n-1} I_{n+1} \cdots I_N)}$. The calculation fixes the $n$th mode and arranges the elements of the other modes into a long matrix.

Definition 8 (Frobenius norm of a tensor). Given an $N$-mode tensor $\mathcal{X} \in \mathbb{R}^{I_1 \times \cdots \times I_N}$, the Frobenius norm of $\mathcal{X}$ is denoted as $\|\mathcal{X}\|_F$, that is, $\|\mathcal{X}\|_F = \sqrt{\sum_{i_1=1}^{I_1} \sum_{i_2=1}^{I_2} \cdots \sum_{i_N=1}^{I_N} x_{i_1 i_2 \cdots i_N}^{2}}$.
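The following NumPy lines illustrate Definitions 3, 4, 5, and 8 with toy shapes (Definitions 6 and 7 are exercised in the decomposition sketch of Section 3.2 below); the shapes are arbitrary examples, not the paper's actual dimensions:

```python
import numpy as np

a, b = np.array([1., 2., 3.]), np.array([4., 5.])
outer = np.outer(a, b)            # Definition 3: (a o b)_ij = a_i * b_j, 3 x 2

A, B = np.random.rand(2, 3), np.random.rand(4, 5)
kron = np.kron(A, B)              # Definition 4: an 8 x 15 (= IK x JL) matrix

A2, B2 = np.random.rand(2, 4), np.random.rand(3, 4)
khatri_rao = np.stack(            # Definition 5: column-wise Kronecker, 6 x 4
    [np.kron(A2[:, k], B2[:, k]) for k in range(4)], axis=1)

X = np.random.rand(3, 4, 5)
fro = np.sqrt((X ** 2).sum())     # Definition 8: equals np.linalg.norm(X)
```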

3.2. Tensor Decomposition

Tensor decomposition is a process of approximating a tensor by a core tensor and several factor matrices [25]. In Figure 2, given an $N$-mode tensor $\mathcal{X} \in \mathbb{R}^{I_1 \times \cdots \times I_N}$, the formal expression of tensor decomposition on $\mathcal{X}$ is

$$\mathcal{X} \approx \mathcal{G} \times_1 U^{(1)} \times_2 U^{(2)} \cdots \times_N U^{(N)},$$

where $\{U^{(n)}\}_{n=1}^{N}$ is a set of factor matrices, $U^{(n)} \in \mathbb{R}^{I_n \times R_n}$. The factor matrices are all column orthogonal [19]. Furthermore, $\mathcal{G} \in \mathbb{R}^{R_1 \times \cdots \times R_N}$ is the core tensor. Tensor decomposition methods minimize the objective function [26]

$$\min_{\mathcal{G},\, \{U^{(n)}\}} \left\| \mathcal{X} - \mathcal{G} \times_1 U^{(1)} \times_2 U^{(2)} \cdots \times_N U^{(N)} \right\|_F^{2}.$$
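As an illustration of this generic decomposition, the following sketch implements a truncated higher-order SVD, one standard way to compute a Tucker-style core tensor and column-orthogonal factors; it is a textbook baseline, not the pattern-oriented method of Section 4, and the helper names `unfold` and `mode_n_product` are our own:

```python
import numpy as np

def unfold(X, n):
    """n-mode matricization (Definition 7): fix mode n, unfold the rest."""
    return np.moveaxis(X, n, 0).reshape(X.shape[n], -1)

def mode_n_product(X, U, n):
    """n-mode product X x_n U (Definition 6)."""
    rest = [s for i, s in enumerate(X.shape) if i != n]
    return np.moveaxis((U @ unfold(X, n)).reshape([U.shape[0]] + rest), 0, n)

def hosvd(X, ranks):
    """Truncated higher-order SVD: column-orthogonal factors and a core tensor."""
    factors = [np.linalg.svd(unfold(X, n), full_matrices=False)[0][:, :r]
               for n, r in enumerate(ranks)]
    G = X
    for n, Un in enumerate(factors):
        G = mode_n_product(G, Un.T, n)     # project X onto the factor subspaces
    return G, factors

X = np.random.rand(5, 40, 100)
G, factors = hosvd(X, ranks=(3, 10, 20))
X_hat = G
for n, Un in enumerate(factors):
    X_hat = mode_n_product(X_hat, Un, n)   # reconstruct the approximation
print(G.shape, np.linalg.norm(X - X_hat) / np.linalg.norm(X))
```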

4. Our Approach

This paper proposes Bi-LSTM with a pattern-oriented tensor decomposition method for the intention classification of users' legal consulting statements. In Section 4.1, the pattern-oriented tensor decomposition method extracts the core tensor $\mathcal{G}$ from the original tensor $\mathcal{X}$ under the guidance of the pattern tensor $\mathcal{P}$, in order to make $\mathcal{G}$ approximate $\mathcal{P}$. In Section 4.2, Bi-LSTM continually optimizes the pattern tensor $\mathcal{P}$ so that $\mathcal{G}$ carries the specific structural and elemental information in $\mathcal{X}$ that is most conducive to improving the accuracy of the intent classification model for users' legal consulting statements.

As shown in Figure 3, Bi-LSTM controls the process of pattern-oriented tensor decomposition by optimizing the pattern tensor $\mathcal{P}$. Bi-LSTM continually optimizes $\mathcal{P}$, while the core tensor $\mathcal{G}$ continues to approach $\mathcal{P}$ through the pattern-oriented tensor decomposition method. Finally, $\mathcal{P}$ becomes the pattern tensor that enables the Bi-LSTM model to reach high accuracy, and $\mathcal{G}$ is the core tensor that, under the guidance of pattern $\mathcal{P}$, is beneficial for improving the accuracy of the subsequent classification model.

4.1. Pattern-Oriented Tensor Decomposition Method

The pattern-oriented tensor decomposition method decomposes the tensor $\mathcal{X}$ into a core tensor $\mathcal{G}$ and factor matrix sets $U$ and $V$, thus making the core tensor approach the users' legal consultation pattern $\mathcal{P}$; that is, the core tensor $\mathcal{G}$ and the users' legal consultation pattern $\mathcal{P}$ demonstrate a similar tensor structure. The subsequent Bi-LSTM classification model controls the pattern-oriented tensor decomposition by continuously optimizing the pattern tensor $\mathcal{P}$. Consequently, the core tensor $\mathcal{G}$ is more advantageous than the users' legal consultation data tensor $\mathcal{X}$ in enhancing the accuracy of classifying users' legal consultation intentions. Simultaneously, $\mathcal{G}$ achieves a dimension reduction effect, which considerably reduces computation time and space.

The framework of the pattern-oriented tensor decomposition method is depicted in Figure 4. In this study, the problems to be solved by the pattern-oriented tensor decomposition method are defined as Problems 9 and 10.

Problem 9. Given a tensor $\mathcal{X} \in \mathbb{R}^{I_1 \times I_2 \times I_3}$ and a pattern tensor $\mathcal{P} \in \mathbb{R}^{R_1 \times R_2 \times R_3}$, find two factor matrix sets $U = \{U^{(n)}\}_{n=1}^{3}$ and $V = \{V^{(n)}\}_{n=1}^{3}$, where $U$ and $V$ satisfy Conditions 1 and 2 simultaneously.

Condition 1. The factor matrix sets $U$ and $V$ minimize the target function given in (8).

Condition 2. The factor matrices in $U$ and $V$ are orthogonal matrices; that is, they satisfy (9).

Problem 10. Given the tensor $\mathcal{X}$ and the two factor matrix sets $U$ and $V$, where $U$ and $V$ satisfy Conditions 1 and 2 simultaneously, find a core tensor $\mathcal{G}$ that minimizes the target function given in (10).

In Problem 10, we can calculate the value of $\mathcal{G}$ by setting the partial differential of the target function with respect to $\mathcal{G}$ to 0. The specific conclusion is presented in Theorem 12. In Appendix A, Proof A.0.1 provides the proof of Lemma 11, and Proof A.0.2 provides the solution (11) used in Theorem 12.

Lemma 11. Given the Frobenius function $f$ whose factor matrices satisfy (9), the partial differential of $f$ with respect to the core tensor $\mathcal{G}$ has the closed form derived in Proof A.0.1, in which the term independent of $\mathcal{G}$ is a constant.

Theorem 12. Given $U$ and $V$ from Problem 9, we can obtain the optimal solution (11) of $\mathcal{G}$ that minimizes the target function in Problem 10.
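For intuition, consider a standard simplification with a single set of column-orthogonal factor matrices (the paper's (11) involves both sets $U$ and $V$; this special case is only meant to convey the flavor of the closed form):

$$\min_{\mathcal{G}} \left\| \mathcal{X} - \mathcal{G} \times_1 U^{(1)} \times_2 U^{(2)} \times_3 U^{(3)} \right\|_F^{2} \;\Longrightarrow\; \mathcal{G}^{\ast} = \mathcal{X} \times_1 U^{(1)T} \times_2 U^{(2)T} \times_3 U^{(3)T},$$

which follows by setting the partial derivative with respect to $\mathcal{G}$ to zero and using the column orthogonality $U^{(n)T} U^{(n)} = I$.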

The following part presents the process of calculating the factor matrix sets $U$ and $V$ in Problem 9. Under the constraints of Conditions 1 and 2, we can calculate the optimal solution of the target function by using alternating least squares (ALS), Lemma 13, Proof B.0.3, and Proof B.0.4. In Appendix B, Proof B.0.3 elaborates the proof of Lemma 13, and Proof B.0.4 gives the solution of Problem 9.

Lemma 13. Given the function $f$ whose factor matrices satisfy (9), the partial differential of $f$ with respect to each factor matrix $U^{(n)}$ (and, analogously, $V^{(n)}$) has the closed form derived in Proof B.0.3, in which the term independent of the factor matrix is a constant.

The ALS algorithm fixes all variables except one, takes the partial derivative of the objective with respect to the remaining variable, and finds the value at which this partial derivative is zero. This value is then substituted into the original objective function, and the values of the other variables are calculated by the same process. ALS iterates continuously until the calculation error is tolerable. The process of calculating the optimal solutions $U$ and $V$ that minimize target function (8) under Conditions 1 and 2 is demonstrated in Proof B.0.4.

Algorithm 1 demonstrates the pattern-oriented tensor decomposition method. The present study uses the ALS algorithm to optimize the parameters involved in Algorithm 1. The inputs of Algorithm 1 are the tensor $\mathcal{X}$ that represents a user's legal consulting statement and the user legal consultation pattern $\mathcal{P}$, which is beneficial for classifying users' legal consultation intentions. The outputs of Algorithm 1 are the core tensor $\mathcal{G}$ and the corresponding factor matrix sets $U$ and $V$. In addition, $\mathcal{G}$ can be interpreted as a feature map of the original tensor $\mathcal{X}$ in the space determined by the decomposition; that is, the original users' legal consulting statement is mapped to a feature space that is beneficial for classifying users' legal consultation intentions. Then, we can accurately understand users' legal consultation intentions.

Input: Tensor $\mathcal{X}$, which represents a user's legal consulting statement, $\mathcal{X} \in \mathbb{R}^{I_1 \times I_2 \times I_3}$, and tensor $\mathcal{P}$, which is the users' legal
   consultation pattern, $\mathcal{P} \in \mathbb{R}^{R_1 \times R_2 \times R_3}$.
Output: The core tensor $\mathcal{G}$, which is close to the users' legal consultation pattern $\mathcal{P}$ at the level of tensor structure,
   $\mathcal{G} \in \mathbb{R}^{R_1 \times R_2 \times R_3}$, and its corresponding factor matrices $U^{(n)}$ and $V^{(n)}$, $n = 1, 2, 3$.
1 Initialize the factor matrix sets $U$ and $V$;
2 for $t \leftarrow 1$ to thresholdvalue do
3   for $n \leftarrow 1$ to $3$ do
4     compute the transition variable for $U^{(n)}$ via (B.3);
5     decompose the transition variable by SVD as in (B.4);
6     update $U^{(n)}$ so as to minimize (8) with the other factor matrices fixed;
7    end
8    for $n \leftarrow 1$ to $3$ do
9     compute the transition variable for $V^{(n)}$ via (B.3);
10     decompose the transition variable by SVD as in (B.4);
11      update $V^{(n)}$ so as to minimize (8) with the other factor matrices fixed;
12  end
13 end
14  compute the core tensor $\mathcal{G}$ from $\mathcal{X}$, $U$, and $V$ using (11);
15 return $\mathcal{G}$, $U$, $V$;

In Line 1 of Algorithm 1, we initialize the factor matrix sets related to the core tensor $\mathcal{G}$. Then we use the ALS algorithm to calculate the optimal solutions of $U$ and $V$ that minimize the value of (8) under Condition 2. The threshold in Line 2 represents the number of iterations we set for the ALS algorithm. Line 4 embodies the calculation process of (B.3) in Proof B.0.4 of Appendix B. Line 5 completes the SVD of the transition variable, which corresponds to (B.4) in Proof B.0.4 of Appendix B. Moreover, Line 6 presents the method for calculating $U^{(n)}$ that minimizes (8) while the other factor matrices are fixed. Similarly, Lines 8 to 11 demonstrate the process of calculating $V^{(n)}$ to minimize (8) while the other factor matrices are fixed. In Line 14, we calculate $\mathcal{G}$, the core tensor of the users' legal consulting statement $\mathcal{X}$; $\mathcal{G}$ is interpreted as the result of the tensor decomposition of $\mathcal{X}$ directed toward the users' legal consultation pattern $\mathcal{P}$. A runnable sketch of this procedure is given below.
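The following NumPy sketch is a simplified reading of Algorithm 1. It assumes the regularized objective $\|\mathcal{X} - \mathcal{G} \times_1 U^{(1)} \times_2 U^{(2)} \times_3 U^{(3)}\|_F^2 + \lambda \|\mathcal{G} - \mathcal{P}\|_F^2$ with a single factor set; the paper's exact target functions (8) and (10) use the two sets $U$ and $V$, so this is an interpretation rather than the authors' implementation (the helpers are repeated from the Section 3.2 sketch for self-containment):

```python
import numpy as np

def unfold(X, n):
    """Mode-n matricization (Definition 7)."""
    return np.moveaxis(X, n, 0).reshape(X.shape[n], -1)

def mode_n_product(X, U, n):
    """n-mode product X x_n U (Definition 6)."""
    rest = [s for i, s in enumerate(X.shape) if i != n]
    return np.moveaxis((U @ unfold(X, n)).reshape([U.shape[0]] + rest), 0, n)

def pattern_oriented_decomposition(X, P, lam=1.0, iters=20):
    """ALS sketch: orthogonal factor updates via SVD (cf. Lines 4-5 and
    (B.3)/(B.4)) and a core tensor pulled toward the pattern tensor P."""
    # Line 1: initialize factors with truncated SVDs of the unfoldings
    U = [np.linalg.svd(unfold(X, n), full_matrices=False)[0][:, :r]
         for n, r in enumerate(P.shape)]
    G = P.copy()
    for _ in range(iters):                       # Line 2: ALS iterations
        for n in range(X.ndim):                  # Lines 3-7: factor updates
            Y = X                                # project all modes except n
            for i in range(X.ndim):
                if i != n:
                    Y = mode_n_product(Y, U[i].T, i)
            M = unfold(Y, n) @ unfold(G, n).T    # transition variable
            Pm, _, Qt = np.linalg.svd(M, full_matrices=False)
            U[n] = Pm @ Qt                       # orthogonal Procrustes update
        proj = X                                 # Line 14: core update
        for n in range(X.ndim):
            proj = mode_n_product(proj, U[n].T, n)
        G = (proj + lam * P) / (1.0 + lam)       # closed form for our objective
    return G, U

X = np.random.rand(5, 40, 100)                   # statement tensor (Section 5.3)
P = np.random.rand(3, 10, 20)                    # pattern tensor (hypothetical sizes)
G, U = pattern_oriented_decomposition(X, P)
print(G.shape)                                   # (3, 10, 20), same shape as P
```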

4.2. Optimization Method of Users’ Legal Consulting Pattern

In this study, we use Bi-LSTM [27] to optimize the users' legal consultation pattern $\mathcal{P}$ and ensure that the final calculated $\mathcal{P}$ is a favorable users' legal consultation pattern for the classification model of users' legal consulting intentions. Notably, the optimization function of Bi-LSTM is RMSProp. The training process of Bi-LSTM with RMSProp proceeds as follows (a minimal code sketch follows this list):

(i) We use the initial user legal consultation pattern $\mathcal{P}$ and the core tensor set $\{\mathcal{G}_i\}$, which represents users' legal consulting statements, as the input of Bi-LSTM. Each core tensor $\mathcal{G}_i$ in the set is the result of the pattern-oriented tensor decomposition method with the corresponding original tensor $\mathcal{X}_i$ and the user legal consultation pattern $\mathcal{P}$ as inputs. $\mathcal{G}_i$ approaches $\mathcal{P}$ at the level of tensor structure.

(ii) The output of Bi-LSTM is used as the input of the softmax layer to realize the mapping of output vectors to the categories of users' legal consulting statements. Moreover, the cross entropy is used as the loss function for calculating the error.

(iii) By propagating forward and backward between LSTM units, using the formulas of error backpropagation through time in Bi-LSTM and of error backpropagation between the hidden layers of Bi-LSTM, we calculate the partial derivatives of the loss function with respect to the weight matrices, the bias terms, and the users' legal consulting pattern $\mathcal{P}$.

(iv) The RMSProp optimization function is used to continuously optimize and iterate the abovementioned parameters using these partial derivatives. Finally, we obtain the values of the weight matrices, the bias terms, and the users' legal consultation pattern $\mathcal{P}$. These parameters are favorable for the Bi-LSTM-based users' legal consultation intention classification model.
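A minimal Keras sketch of steps (i), (ii), and (iv) follows. The tensor shapes and the random stand-in data are hypothetical, the core tensor's first mode is treated as the time axis (an assumption), and the pattern-gradient step (iii) is omitted because it requires backpropagating through the decomposition itself:

```python
import numpy as np
import tensorflow as tf

# Assumptions: core tensors G_i of shape (R1, R2, R3) are flattened to
# sequences of R1 steps with R2*R3 features; 28 categories as in Section 5.1.
R1, R2, R3, n_classes = 5, 10, 20, 28
G_train = np.random.rand(1000, R1, R2 * R3).astype("float32")  # stand-in cores
y_train = np.random.randint(0, n_classes, size=1000)

model = tf.keras.Sequential([
    tf.keras.layers.Bidirectional(
        tf.keras.layers.LSTM(512, return_sequences=True),
        input_shape=(R1, R2 * R3)),
    tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(512)),
    tf.keras.layers.Dense(n_classes, activation="softmax"),   # softmax layer (ii)
])
model.compile(
    optimizer=tf.keras.optimizers.RMSprop(learning_rate=1e-3),  # RMSProp (iv)
    loss="sparse_categorical_crossentropy",                     # cross entropy (ii)
    metrics=["accuracy"],
)
model.fit(G_train, y_train, batch_size=60, epochs=10)           # Section 5.4 settings
```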

4.2.1. Method for Calculating the Partial Derivative of the Loss Function with Respect to $\mathcal{P}$

This study proposes a method for solving the partial derivative of the loss function with respect to the users' legal counseling pattern $\mathcal{P}$. Directly calculating this partial derivative is difficult. However, we can determine it indirectly by using the total derivative and the chain rule.

In this study, we use the tensor $\mathcal{X}$ that represents a user's legal consulting statement, the factor matrix sets $U$ and $V$, and the core tensor $\mathcal{G}$ that approaches the users' legal consultation pattern $\mathcal{P}$ at the level of tensor structure as transition variables. $U$, $V$, and $\mathcal{G}$ are calculated through the pattern-oriented tensor decomposition method with $\mathcal{X}$ and $\mathcal{P}$ as its inputs. These transition variables transform the derivative problem into a Sylvester problem; we use the Hessenberg-Schur algorithm to solve the Sylvester matrix equation, and finally the partial derivative of the loss function with respect to the users' legal consulting pattern $\mathcal{P}$ is obtained.

Lemma 14. Given the tensors $\mathcal{X}_i$ that represent users' legal consulting statements, the users' legal consultation pattern $\mathcal{P}$, the corresponding factor matrix sets $U$ and $V$, and the core tensors $\mathcal{G}_i$ obtained through the pattern-oriented tensor decomposition method with $\mathcal{X}_i$ and $\mathcal{P}$ as its inputs, the partial derivative of the loss function with respect to $\mathcal{P}$ is obtained using (12), in which the auxiliary functions satisfy the limitations given in (13) and (14).

Proof C.0.5 in Appendix C elucidates the proof of Lemma 14. Using (B.2), (B.3), and (B.4) in Proof B.0.4 of Appendix B, we can express the factor matrices as functions of $\mathcal{P}$. The next part is the method for solving the required matrix derivatives, which are all calculated similarly. Taking the partial derivative with respect to $\mathcal{P}$ on both sides of the abovementioned equation yields a matrix equation of the classic Sylvester form, which can be solved using the Hessenberg-Schur algorithm.
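For the Sylvester step, a solver such as SciPy's can serve as a stand-in check; note that SciPy implements the closely related Bartels-Stewart algorithm rather than Hessenberg-Schur, and the coefficient matrices below are hypothetical stand-ins for those arising from (B.2)-(B.4):

```python
import numpy as np
from scipy.linalg import solve_sylvester

A = np.random.rand(4, 4)            # hypothetical left coefficient matrix
B = np.random.rand(3, 3)            # hypothetical right coefficient matrix
Q = np.random.rand(4, 3)            # hypothetical right-hand side

X = solve_sylvester(A, B, Q)        # solves A @ X + X @ B = Q
assert np.allclose(A @ X + X @ B, Q)
```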

Algorithm 2 demonstrates the optimization method for the users' legal consulting pattern in this study. In Line 2, the loop bound represents the number of training epochs of the Bi-LSTM. Lines 4 to 12 are the training steps of the Bi-LSTM model. In Line 6, the loop bound represents the number of samples per mini-batch. The function call in Line 7 denotes the pattern-oriented tensor decomposition method. Line 8 presents the forward propagation in Bi-LSTM on the forward and backward layers. Line 9 elucidates the backpropagation of errors through time and through the neural network layers of Bi-LSTM. We use RMSProp in Line 11 as the optimization function for the parameters of Bi-LSTM.

Input: , where is the user’s legal consulting statement and represents category that corresponds to
   , and the size of , , , , , , , and , where , ,
    and
Output: The optimal users’ legal consultation pattern , parameters of Bi-LSTM , , , , , ,
    and .
1 Initialize users’ legal consultation pattern , parameters of Bi-LSTM , , , , , , and ;
2 for   to   do
3   Set to zero;
4   while    do
5    Set , , , , , , ,
     , to zero;
6    for   to   do
7       ;
8       ;
9       , , , , , , , ,
       ;
10   end
11   , , , , , , , , ,,
    , , , , , , , , ;
12   end
13 end
14   return , , , , , , , and ;
4.2.2. Loss Function and Softmax Layer

In Algorithm 2, we use the softmax function to calculate the probability that a statement belongs to each category. For the intent classification of users' legal consulting statements, this function is defined as follows.

Definition 15. Given a set of users' legal consulting statement samples $\{\mathcal{X}_i\}$ and their corresponding Bi-LSTM outputs $\{\mathbf{h}_i\}$, the probability that $\mathcal{X}_i$ belongs to category $j$ is calculated using

$$P(y_i = j \mid \mathbf{h}_i) = \frac{\exp(h_{ij})}{\sum_{k=1}^{K} \exp(h_{ik})},$$

where $h_{ij}$ represents the $j$th element of $\mathbf{h}_i$ and $K$ is the number of categories.

In this study, the cross entropy is used as the loss function to calculate the error of Bi-LSTM. We define the loss function as follows.

Definition 16. Given a set of users' legal consulting statement samples $\{\mathcal{X}_i\}$ and their corresponding categories $\{y_i\}$, let $\hat{y}_i$ be the estimated category distribution of $\mathcal{X}_i$ calculated using Bi-LSTM; then

$$L = -\frac{1}{m} \sum_{i=1}^{m} \sum_{j=1}^{K} y_{ij} \log \hat{y}_{ij},$$

where $m$ represents the number of samples of users' legal consulting statements and $K$ denotes the dimension of $y_i$ and $\hat{y}_i$, that is, the number of categories.
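A minimal NumPy rendering of Definitions 15 and 16 follows; the batch of logits is a random stand-in:

```python
import numpy as np

def softmax(h):
    """Definition 15: probability over categories from Bi-LSTM outputs."""
    e = np.exp(h - h.max(axis=1, keepdims=True))   # shift for numerical stability
    return e / e.sum(axis=1, keepdims=True)

def cross_entropy(y_true, h, n_classes):
    """Definition 16: mean cross entropy between one-hot labels and softmax."""
    p = softmax(h)
    onehot = np.eye(n_classes)[y_true]
    return -(onehot * np.log(p + 1e-12)).sum(axis=1).mean()

h = np.random.randn(4, 28)           # 4 samples, 28 categories (Section 5.1)
y = np.array([0, 5, 27, 3])
print(cross_entropy(y, h, 28))
```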

5. Empirical Results

We report the results of the deep learning model with pattern-oriented tensor decomposition proposed in this study on actual datasets. The experiments verify that the Bi-LSTM model with pattern-oriented tensor decomposition can accurately classify users' legal consulting sentences and comprehensively understand their intentions. Bi-LSTM with a pattern-oriented tensor decomposition layer is more efficient, interpretable, and instructive than traditional recurrent neural networks.

5.1. Data Description

The data used in this study are mainly online users' legal consulting statements. Our main data sources include the China Legal Business Consulting website and various local public legal consulting service platforms. Approximately 150,000 legal consultations from all over China, spanning 2008 to 2018, were obtained. These data have been manually labeled by annotators with professional legal backgrounds and divided into 28 categories covering common legal disputes, such as divorce and contract disputes, property transfer, and loan compensation.

Moreover, this study conducts a rigorous statistical analysis of the collected datasets and discovers several interesting facts. From 2008 to 2018, public online legal consultation issues were mainly concentrated on labor and personnel disputes; civil, contract, and property disputes; marriage relationships; and creditor-debtor disputes.

Figure 5(a) displays the distribution of different categories over part of the users' legal consulting statements. Evidently, the legal disputes that people aim to solve through online legal consultation show clear tendencies, concentrating on civil matters such as marriage and loan disputes and property division. By contrast, major or extraordinarily serious criminal offenses are extremely rare.

5.2. Baseline Approaches

To understand the intent of users' legal consulting statements, we propose Bi-LSTM with the pattern-oriented tensor decomposition method. However, previous studies on understanding users' legal consulting intentions using deep neural networks and tensors are rare. We therefore establish the experimental comparisons of this study on the following two points:

(i) From the perspective of deep neural networks, we compare Bi-LSTM against recent neural networks, including TextCNN [15] and TextCNN attention [17], which are based on convolutional neural networks [28]; TextRNN [16] and TextRNN attention [17], which are based on recurrent neural networks; and LSTM, GRU [29], and Bi-GRU [30].

(ii) From the perspective of the tensor decomposition layer, we compare the pattern-oriented tensor decomposition method against common tensor decomposition algorithms, namely, the Tucker and CP decomposition algorithms.

Through the first point, we show the performance of Bi-LSTM relative to other deep neural networks on the intention classification of users' legal consulting statements. Through the second point, we show the superiority of the pattern-oriented tensor decomposition method over unsupervised tensor decomposition algorithms.

5.3. Feature Extraction

In this study, numerous preprocessing operations are performed on the obtained users' legal consulting statements. The preprocessing can be divided into two main steps:

(i) Module definition of users' legal consulting statements. Each users' legal consulting statement is represented as a three-dimensional tensor. We divide a user's legal consultation into five modules, namely, the subject, object, motivation, behavior, and consequences of the consultation. Each module contains multiple vocabularies and is represented by a matrix of predetermined dimensions. The vertical dimension of the matrix indexes the vocabularies contained in the module, and the horizontal dimension holds the embedding vectors of these vocabularies.

(ii) Quantitative representation of users' legal consulting statements. Because the collected data are all Chinese users' legal consulting statements, this study first performs word segmentation and removes Chinese punctuation marks, stop words, redundant vocabularies, and other noise. Furthermore, each users' consulting statement in the dataset is divided into the five modules defined in the previous step. The vocabularies in users' legal consulting statements are represented as embeddings, that is, through a word-embedding operation. On this basis, each module of a users' legal consulting statement is instantiated, each vocabulary is vectorized, and each user's legal consulting statement is represented as a tensor.

The users’ legal consulting statement is represented by a three-dimensional tensor. In Figure 6, the first dimension of the tensor represents modules in the statement. The second dimension represents meaningful vocabulary contained in each module. The vocabularies are derived from the original statement through removing redundant, meaningless, and repeated words. The third dimension represents the word embedding corresponding to each word.

We divide each user’s legal consulting statement into five modules, namely, subject, object, motivation, behavior, and consequences of consultation. Each module in the users’ consulting statement exhibits multiple entity objects. For example, in the users’ legal consulting statement: “Xiao Wang repeatedly threatened me with a knife and took me more than 30,000 yuan. What crime should he sue?”, the subjects of the consultation module are “me” and “Xiao Wang”. Then, the object of the consultation module is “30,000 yuan”. The motivation, behavior, and consequence of the counseling module correspond to “crime”, “threatened”, and “knife”.

The word vector generation model is trained on a large Chinese corpus. The Chinese Wikipedia and news from multiple websites, such as Tencent, Sohu, and Sogou, are used as corpora for Chinese word vector training [31]. Then, we use the word2vec tool proposed by Google to train Chinese word vectors [32]. Word2vec converts the one-hot vectors in the corpus into low-dimensional dense vectors. The word-embedding operation ensures that users' legal consulting statements can be processed by the Bi-LSTM model with pattern-oriented tensor decomposition presented in this study [33]. We fix users' legal consulting statements to the same length because the tensors representing them must have the same dimensions.
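As one common way to realize this step (the paper only specifies Google's word2vec tool; using gensim is our assumption), a sketch with a tiny stand-in corpus:

```python
from gensim.models import Word2Vec

# `sentences` stands in for the segmented Wikipedia/news corpus
sentences = [["小王", "持刀", "威胁", "我"], ["合同", "纠纷", "赔偿"]]
model = Word2Vec(sentences, vector_size=100, window=5, min_count=1, sg=1)
vector = model.wv["合同"]     # a 100-dimensional dense vector for one word
```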

The length of the users’ legal consulting statement in the database is illustrated in Figure 5(b). Evidently, the number of vocabularies for most users’ consulting statements is between 15 and 500, except for a particularly small number of users’ legal consulting statements in which the number of vocabularies is higher than 2000 or less than 10. In this study, we set a vocabulary baseline and run users’ legal consulting statements with more vocabularies than the baseline. Then, we fill in users’ legal consulting statements with fewer vocabularies than the baseline.

5.4. Parameter Adjustment and Experimental Settings

We implement a tensor representation of each user's legal consulting statement in the database on the basis of the abovementioned operations. This study uses the TensorFlow development kit to implement the proposed method. Then, the parameters of the Bi-LSTM model with the pattern-oriented tensor decomposition method are set.

In addition to the parameters of traditional deep neural network algorithms, namely, the batch size used while training the neural network, the number of layers, the number of neurons in each layer, and the number of iterations of the overall algorithm, this model includes the users' legal consultation pattern $\mathcal{P}$. The setting of $\mathcal{P}$ strongly affects the convergence speed and accuracy of the model. Our experiments show that the classification accuracy of users' legal consulting statements is difficult to increase when the structure of $\mathcal{P}$ is degenerate, that is, when the column vectors in $\mathcal{P}$ exhibit a linear relationship.

For all neural networks involved in the experiments of this article, including TextCNN, TextCNN attention, TextRNN, TextRNN attention, LSTM, Bi-LSTM, GRU, and Bi-GRU mentioned in Section 5.2, we trained each for 10 epochs with a batch size of 60, a hidden layer size of 512, 3 hidden layers, and a learning rate of 0.001. We used the TensorFlow development kit to implement the neural networks and ran the programs on Graphics Processing Units (GPUs) for fast computation.

5.5. Experimental Results and Analysis

This section reports experiments on the baselines of Section 5.2 and on our approach for the intention classification of users' legal consulting statements. We also provide a detailed explanation of the superiority of Bi-LSTM and the necessity of pattern-oriented tensor decomposition.

Figure 7 indicates that neural networks with a tensor decomposition layer converge faster and reach higher accuracy than those without one. This phenomenon is determined by the characteristics of tensor decomposition methods. Tensor decomposition algorithms extract the main structural and element information from the original tensor while removing redundant information. That is, tensor decomposition weakens the influence on the classification model of vocabularies weakly related to the intention of users' legal consulting statements and enhances the influence of strongly related vocabularies. Moreover, the tensor decomposition layer reduces the dimension of the original tensors, which greatly reduces the computational complexity of the subsequent deep neural networks. Therefore, the tensor decomposition layer makes neural networks converge faster and achieve higher accuracy.

As seen from the pink and cyan curves in Figure 7, Tucker and CP tensor decomposition have essentially the same optimization effect on neural networks because CP decomposition is a special case of Tucker decomposition. Tucker decomposition is effectively a higher-order singular value decomposition (SVD): it uses the SVD algorithm to iteratively extract the main components of each mode of the original tensor and finally yields a core tensor and its corresponding factor matrix set. When the core tensor is diagonal, Tucker decomposition reduces to CP decomposition. The core tensors obtained by Tucker and CP decomposition are weakly interpretable. Both are unsupervised tensor decomposition methods; from the standpoint of neural network algorithms, Tucker and CP decomposition are neither steerable nor capable of autonomous learning.

According to the red curves in Figure 7, the pattern-oriented tensor decomposition layer optimizes neural networks far more than Tucker and CP decomposition: it allows the networks to converge faster while achieving higher accuracy. The pattern-oriented tensor decomposition algorithm controls the tensor decomposition process through pattern tensors, making the core tensor extracted from the original one approximate the pattern tensor in structure and element information. On this basis, the neural network classification algorithm affects the tensor decomposition process by continuously optimizing the pattern tensor. These operations ultimately make the core tensors carry the information most conducive to improving the accuracy of the classification model. Therefore, compared with Tucker and CP decomposition, the pattern-oriented tensor decomposition method is more instructive and capable of autonomous learning, and the resulting core tensors are more interpretable. In general, the pattern tensor is a bridge between tensor decomposition and neural networks.

Figure 7(c) demonstrates that TextCNN has lower accuracy than TextRNN, LSTM, and Bi-LSTM in classifying the intention of users' legal consulting statements. This is because the convolution kernel is more concerned with the spatial relationships in the input data: convolutional neural networks consider only the current input, while recurrent neural networks consider both the current and previous inputs. Users' legal consulting statements are sequence data, and the meaning of a word is related to the words before and after it. Compared with convolutional neural networks, recurrent neural networks can capture the lexical sequence relationships, such as transition, emphasis, and juxtaposition, included in the statements. Therefore, recurrent neural networks analyze users' legal consulting statements more comprehensively and achieve higher accuracy.

As seen from Figures 7(a), 7(b), and 7(d), LSTM and Bi-LSTM achieve higher accuracy than TextRNN. Plain recurrent neural networks have difficulty handling long-distance dependencies: when the input users' legal consulting statement is long, they may suffer from vanishing or exploding gradients. LSTM-based neural networks solve this problem by adding cell states and gating mechanisms. Bi-LSTM comprehensively considers the outputs of the forward and backward LSTM units; compared with unidirectional LSTM, Bi-LSTM can thus achieve higher accuracy.

Tables 1 and 2 provide the accuracy and Micro-F1 scores of a variety of neural networks for the intention classification of users' legal consulting statements; TP stands for tensor decomposition. Bi-LSTM with the pattern-oriented tensor decomposition layer attains the highest accuracy among all the algorithms. From the perspective of sequence coding, the attention layer can break the limit of fixed-length inputs and calculate the relationship between input and output sequences. However, although the attention layer adds a series of weight parameters and learns the weight of each element from the input and output sequences, it does not change the structure inside the original neural networks. For the intention classification of users' legal consulting statements, the attention layer can hardly compensate for the missing sequence information of TextCNN or the vanishing and exploding gradient problems of TextRNN.

Tables 1 and 2 also demonstrate that the pattern-oriented tensor decomposition layer has a greater optimization effect on LSTM and Bi-LSTM than on GRU and Bi-GRU. GRU is a simplification of LSTM: LSTM controls the outputs of neural units through the output gate, while GRU passes outputs directly to the next units; therefore, GRU converges faster than LSTM. For the optimization of pattern tensors, however, LSTM is better than GRU. The main reason is that GRU is more tightly integrated and has fewer adjustable parameters than LSTM; that is, GRU provides relatively limited optimization of pattern tensors.

6. Conclusion

In this study, we propose a new method (i.e., Bi-LSTM with pattern-oriented tensor decomposition) to solve the problem of understanding users' legal intentions in the field of legal services. Our method combines a deep neural network with a tensor decomposition method to classify and deeply understand users' legal consulting statements, and it is more instructive and interpretable than traditional deep neural networks. We propose a new tensor decomposition method that is driven by users' legal consultation patterns and continuously guides the training and update process of the deep neural network.

Experiments show that, by applying the tensor decomposition layer, our technique demonstrates faster convergence and higher accuracy than traditional deep learning methods. In future work, we will extend our algorithm and enlarge the dataset to enrich the model parameters and make them more universal.

Appendix

A.

We provide the proof of Lemma 11 in Proof A.0.1 and the proof of Theorem 12 in Proof A.0.2. Through these proofs, Theorem 12 and Lemma 11 are shown to solve Problem 10.

Proof A.0.1. In the target function $f$, we introduce a transition variable involving the identity tensor and the factor matrices so that $f$ can be written as the Frobenius norm of an expression that is linear in $\mathcal{G}$. By applying the derivative formula of the Frobenius norm, we obtain the partial derivative of $f$ with respect to $\mathcal{G}$ stated in Lemma 11, up to a constant.

Proof A.0.2. Using (10), we expand the target function with transition variables and declare an auxiliary function whose value is invariant under the orthogonal factor matrices. Since the factor matrices satisfy (9), applying Lemma 11 and setting the partial derivative to zero yields (11), the optimal solution of (10) in Problem 10 while the factor matrices $U$ and $V$ satisfy Conditions 1 and 2.

B.

Proof B.0.3 gives the proof of Lemma 13. Combining Lemma 13 and the ALS algorithm, Proof B.0.4 gives the solution of Problem 9.

Proof B.0.3. We expand the target function and substitute a transition variable so that the function can be written as a Frobenius norm that is quadratic in the factor matrix under consideration. By using the derivative formula of the Frobenius norm, we obtain the partial derivative stated in Lemma 13, up to a constant.

Proof B.0.4. We declare an auxiliary function using (8); if $U$ and $V$ minimize this auxiliary function, then they also minimize (8). We calculate the partial derivative of the auxiliary function with respect to one factor matrix while fixing the others. Using Lemma 13, we set this partial derivative to zero. Since the factor matrix satisfies (9), we declare a temporary variable as in (B.2) and obtain the transformation in (B.3). We decompose the resulting matrix using SVD as in (B.4), where the left and right factors are orthogonal matrices and the middle factor is a diagonal matrix with nonnegative elements. Eventually, we can determine $U^{(n)}$, and $V^{(n)}$ is calculated similarly to $U^{(n)}$.

C.

Proof C.0.5 provides the proof of Lemma 14.

Proof C.0.5. On the basis of the total derivative and the chain rule, we can decompose the desired derivative through the transition variables. Furthermore, we can obtain the derivative of the core tensor using (11). In accordance with (B.2), (B.3), and (B.4), we determine that the factor matrices are functions of $\mathcal{P}$. We then obtain (13) and (14) by calculating the corresponding differentials. Finally, we derive (12), which shows the method for calculating the partial derivative of the loss function with respect to $\mathcal{P}$.

Data Availability

The users’ legal consulting statement data after processing used to support the findings of this study are currently under embargo while the research findings are commercialized. Requests for data, 6/12 months] after publication of this article, will be considered by the corresponding author.

Conflicts of Interest

The authors declare no conflicts of interest.

Acknowledgments

We would like to thank the teachers and students of our lab team (Network and Information Security Research Center, Harbin Institute of Technology) for their selfless help during the research and writing of this article. This work was supported by the National Key Research and Development Program of China (2018YFC0830602 and 2016QY03D0501).