Abstract

In seismic data processing, data recovery including reconstruction of the missing trace and removal of noise from the recorded data are the key steps in improving the signal-to-noise ratio (SNR). The reconstruction of seismic data and removal of noise becomes a sparse optimization problem that can be solved by using sparse regularization. Sparse regularization is a key tool in the solution of inverse problems. They are used to introduce prior knowledge and make the approximation of ill-posed inverses feasible. It deals with ill-posedness by replacing an ill-posed inverse problem with a well-posed problem that has a solution close to the true solution. In the last 2 decades, interest has shifted from linear toward nonlinear regularization methods even for linear inverse problems. In inverse problems, regularizations serve as stabilizing the solution of ill-posed inverse problems and give a solution that adequately fits measurements without producing unjustifiably complex artifacts. In this paper, we present a novel sparse regularization based on a tensor-based dictionary method for inverse problems (seismic data interpolation and denoising). This regularization avoids the vectorization step for sparse representation of seismic data during the reconstruction process. The key step in sparsifying signals is the choice of sparsity-promoting dictionary learning. The learning-based approach can adaptively sparsify datasets but has high computational complexity and involves no prior-constraint pattern information for the dataset. Many existing dictionary learning methods would be computationally infeasible for the high dimensional seismic data processing. These methods also destroy the essential information as well as it reduces the discriminability and expressibility of the signal, since they deal with vectorization. In this paper, the orthogonal tensor dictionary learning that learns a dictionary from the input data by employing orthogonality and separability is proposed as sparse regularization for the inverse problems. The performance of the proposed method was validated in seismic data interpolation and denoising individually as well as simultaneously. Numerical examples of synthetic and real seismic datasets demonstrate the validity of the proposed method. The SNR of the recovered data confirms that the proposed method is the most effective method than K-singular value decomposition and orthogonal dictionary learning methods.

1. Introduction

In exploration geophysics, seismic data processing is an essential task in processing various properties of the earth’s subsurface. Due to the physical and budget constraint, only a subset of seismic data is acquired for this process. The acquired seismic data may be subjected to various noise contamination, and some important significant traces may also distort. To increase the resolution of the seismic records, random noise attenuation and interpolation are the two critical steps in seismic data processing. Improving the quality of seismic data by removing random noises and reconstructing the missing irregular seismic traces play a vital role in providing high-resolution processing, oil, and gas exploration, multiple suppression, and wave-equation migration [1]. Seismic event detection, migration, and inversion demand a high quality of seismic data [2]. Different methods have been proposed for seismic noise attenuation and interpolation [35]. Most methods in seismic noise suppression are applied in the transform domain in which the signal and noise are distinguishable [6]. Even though reconstructing the missing seismic traces (interpolation) and attenuating seismic noise (denoising) methods are different in their output, they are nearly identical operations in the process of obtaining clear seismic data at the final [7].

Based on the prediction method in the time–spatial tx domain [8, 9], the frequency–spatial fx domain [10, 11] based on the predictability of linear events without the prior knowledge of lateral coherence of the events has been proposed. In the frequency–wavenumber, fx domain [7, 12] has been proposed by utilizing a convolutional prediction filter computed from the low-frequency parts to predict the high-frequency components. These methods are only applied to the seismic data with linear events and are applied for only regularly sampled seismic data.

Based on the rank-reduction Cadzow method such as truncated singular value decomposition (SVD)-based matrix rank reduction of constant frequency slices for trace interpolation [13], multichannel singular spectrum analysis [3], adaptive rank-reduction based on the energy entropy [14] in solving rank reduction were proposed. Low-rank matrix completion [15, 16], tensor higher order SVD [17], and nuclear-norm minimization-based matrix completion [18] have been proposed in seismic data restoration as an extension of Cadzow method.

Based on the wave-equation, seismic data restoration utilizes the inherent constraint of seismic data from wave equation to reconstruct seismic data [1]. Different methods have been proposed based on the wave-equation method [19]. This method attains good performance in the restoration of seismic data. However, it depends on only the known velocity model.

The fourth method is based on dictionary transform, promoting sparsity in seismic data processing. In seismic data processing, interpolation and denoising are the two inverse problems in which we can employ sparsity constraints [20]. The sparse dictionary learning (DL) algorithm has been also proposed to attenuate random noise in a transform-based denoising framework [21]. Sparse representation represents input data as the linear combination of basis elements such that most of the representation coefficients are zero [22]. For a given signal , the sparsest representation is to find a sparse vector such that , where is the dictionary and each of its columns is an atom. The idea of sparsity in seismic data processing has been used in two different ways: fixed (analytic) basis transform and learning (adaptive) dictionaries.

Fixed basis transforms have been applied in seismic data processing by using a known set of basis functions to estimate representation coefficients of the input data. Wavelet multiresolution analysis [23], physical wavelet transforms [24], Radon transforms [25, 26], Fourier transform [27, 28], curvelet transforms [29, 30], seislet transforms [31, 32], and shearlet transform [33] have been proposed. In the non-subsampled contourlet transform domain, threshold shrink method [34] for denoising seismic data has been also proposed. In analytic DL approaches, the analytic construction is employed, and the mathematical model of the signal is formulated to represent the model. These methods are simple and have fast computation in creating data features for different applications in seismic data processing.

Adaptive/learned DL uses the learned basis functions from the input data to estimate the missing data and restore the degraded data by a sparsity-promoting model. Based on adaptive/learned DL, different methods have been proposed in image processing. The adaptive DL algorithm has also been proposed to attenuate automatic coherent noise [35]. This approach can learn the features of both signals and coherent noise and leave obvious morphological differences in the dictionary atoms. Principal component analysis (PCA) [36], generalized PCA [37], method of optimal directions (MOD) [38, 39], and K-singular value decomposition (K-SVD) [40] are adaptive/learned dictionary proposed in image science. PCA is widely used in statistics for multivariate analysis. It reduces the dimensionality of the data by preserving the variability of data variables. K-SVD learns an overcomplete dictionary from the noisy image patches and uses the learned dictionary to sparsely represent model and image denoising [41]. K-SVD has been proposed for signal processing applications [42] and seismic data denoising [38]. However, the input data are considered as a vector in K-SVD, and then at each iteration step, SVD is used to transform the matrix to update each column of the dictionary [20]. It leads to the redundancy of dictionaries in which most of the atoms are similar or contribute little role in the representation of input data. K-SVD-based sparse representation has also been used in seismic data denoising and compression [43]. In this case, K-SVD is used to sparsely represent the 4D seismic dataset to find a representation with as few coefficients as possible while preserving the main characteristics of the data. In the learning dictionary for sparsity-based seismic data restoration, high redundancy dictionary leads to a poor sparse approximation of data. This approach is computationally infeasible when dealing with seismic data and requires high computational cost [44]. Data-driven tight frame (DDTF) [45], which learns dictionary with a prescribed block Toeplitz structure and satisfies the perfect reconstruction property for denoising, has been proposed to reduce the computational complexity of K-SVD. The high performance of DDTF in interpolation and denoising of high dimensional seismic data was analyzed [46, 47].

The hybrid method, which combines the advantage of fixed basis and learned-based DL, has been applied in seismic data processing. A double sparsity method which is based on seislets and DDTF [48] was proposed. However, most of these methods deal with vectors leading to loss of seismic data structure and poor sparse representation. To overcome this problem, tensor-based DL methods [49, 50] have been proposed for various sparsity-based restoration. In the tensor DL method, the input data are treated as tensors instead of vectors and the dictionary is learned by tensor decomposition [44]. The DDTF of Kronecker type (KronTF) [20], tensor decomposition [17], and a Kronecker-based dictionary in dynamic computed tomography [51] have been proposed to avoid the vectorization of input data and applying a learned dictionary based on tensor decomposition.

Many existing DL methods would be computationally infeasible when dealing with high-dimensional seismic data. Moreover, the vectorization method which likely destroys the input data’s important information and reduces the discriminability and expressibility of the obtained representation are applied to those methods. In this paper, we have proposed the orthogonal tensor DL that learns a dictionary from the input data by employing orthogonality and separability. We use this method for seismic data denoising and interpolation. The separability of the dictionary makes the proposed method highly scalable, and the orthogonality among dictionary atoms leads to a very efficient sparse coding computation. The scalability and computational efficiency make the proposed method suitable for processing seismic data in the tensor domain [44]. The rest part of this paper is organized as follows. In Section 2, we discussed the basic idea of orthogonal DL and its construction which greatly simplify the computation of both dictionary updating and sparse coding for the sparse representation of input data. In Section 3, orthogonal tensor-based DL and its property are verified. The orthogonal tensor DL is applied to seismic data denoising and interpolation problems. The numerical results are also presented in Section 4. Finally, the conclusion of this paper is addressed in Section 5.

2. Orthogonal Dictionary Learning

Sparse models in representing input data are an active research area in natural image processing. In a sparse model, the local input data patches are sparsely approximated by the linear combination of atoms in which the collection of these atoms forms a dictionary. The main challenge in sparse representation is the construction of a dictionary which is computationally feasible in seismic data representation. The orthogonal dictionary is the one which can greatly simplify the computation of both dictionary updating and sparse coding for the sparse representation of input data. In orthogonal DL, we aim at finding the orthogonal dictionary whose columns are dictionary atoms such that and to sparsely approximate the given data , where n and m are the size of orthogonal matrices. contains input orthogonal atoms and is the set of atoms to be learned from the input data. The corresponding minimization problem to learn orthogonal dictionary is formulated as follows [52]:such that and , where is the Frobenius norm which is defined as , where are the vectors or entries of the matrix and denotes the number of nonzero coefficients of . is the parameter that balances the trade-off between the approximation term and the sparsity term. The solution of Equation (1) is disposed to increase the elements of such that to become as small as possible. To solve Equation (1), we can take the alternating iterative scheme by decomposing the minimization problem into two steps. In this case, the K-SVD algorithm can solve Equation (1) by employing an iterative strategy that alternates between the two steps such that Equation (1) is reduced into only one unknown by fixing one as known.

Step 1 (sparse coding): For a given orthogonal dictionary , we need to find the sparse code by solving the following minimization problem:

To solve the minimization problem in step 1, we formulate the following remark which is equivalent to Equation (2).

Remark 1. Assume that , then has a unique solution which is given by

Based on Remark 1 and for , Equation (2) has a unique solution which is given by , where is the hard threshold which is defined as follows:

Step 2 (dictionary updating): For a given sparse code , we want to update the dictionary by solving the minimization problem:such that and . The minimization problem in Equation (4) can be solved by employing the following remark which is equivalent to step 2.

Remark 2. The optimization problem in step 2 is equivalent to the following equation:such that and . This minimization problem has a unique solution which is given by the following equation:where and are the orthogonal matrices defined by the following SVD:where is the singular value matrix of , and defined from to the space spanned by the columns of the dictionary is an orthogonal projection operator given by: , for all . Finally, we adopt the following orthogonal DL algorithm.

Input: Training dataset , Input orthogonal atoms .
  Output: Learned dictionary .
 1 Initialize the dictionary
 2 For
 3 Compute the sparse coding by using hard threshold
       ,
 4 Run the SVD for the following matrix
       
 5 Update the dictionary
       
 6 end for

3. Orthogonal Tensor Dictionary Learning

In this section, we extend the orthogonal DL that we have discussed in Section 2 by introducing tensor and learn a tensor dictionary with separability and orthogonality. In the conventional DL approach, the extracted image patches are first transferred into vectors to form training data and then the K-SVD method is used to learn a vector-based dictionary [48]. In this strategy, the inherent spatial constraints of the input data structures can be lost or distorted when the dimension of input data is very high. To overcome this problem, the tensor-based method has been investigated for DL to achieve better performance in signal/image processing [5355]. Tensor is a multidimensional array [56]. An order tensor is denoted as whose elements is , , . Let be the set of orthogonal square matrices. Then, we aim to learn the set of orthogonal dictionaries to be learned from the input data simultaneously such that , for i = 1,2 and in the tensor domain. In this paper, is considered as a tensor of two dimensions which can be written as follows:where and are orthogonal matrices from the set with the sparse coding tensor and is the product of sparse coding tensor with a matrix , are the tensor mode product operators. We formulate the following minimization problem to learn these dictionaries:where denotes the nonzero elements in the tensor domain, and is the Frobenius norm of the tensor which is defined as the square roots of the sum of the absolute squares of its elements. We decompose the problem in Equation (9) into two subproblems and solve them separately.

Step 1: For a given dictionaries and , we need to find the sparse tensor by solving the minimization problem:

To solve for the sparse coding , we consider the vectorized version of the tensor representation in Equation (8) in terms of Kronecker dictionaries which is formulated as follows:where represents the Kronecker product. As we discussed in Section 2, we can compute the corresponding sparse coding by applying a component-wise hard thresholding. Then, the solution for the sparse coding is given by the following equation:where is an operator that keeps the coefficients larger than and setting the other coefficients to be zero.

Step 2: For a given sparse coding , we need to update the dictionaries and . The tensor representation in Equation (8) can be unfolded in order to update the two dictionaries and as follows:

Then, and can be updated by solving the following minimization problem:

The dictionary (n = 1, 2) can be updated by using an alternating least squares method while fixing the sparse coding . To solve for the above two dictionaries and , we formulate the theorem which is related to the minimization problems in Equations (14) and ((15). The proof of this theorem is given in the appendix section.

3.1. Application to Seismic Data Interpolation and Denoising

To evaluate the effect of the proposed method on seismic data processing in terms of quality and computational efficiency, we applied for interpolation (reconstruction of missing traces) and denoising (attenuation of random noise). Let be the complete data for recovery, be the observed data, be the trace sampling matrix which contains only or , and denotes the amount of noise to be added on the input data. Let these data be related by the following model:

When , the model in Equation (16) is the interpolation problem that can reconstruct the missing traces and if is identity matrix and , it corresponds to the denoising problem. After the orthogonal tensor dictionary learned from the input seismic data simultaneously, we used it to propose sparse regularization which promotes sparsity in the seismic data restoration problem. Based on the learned orthogonal tensor dictionary model in Section 3, we formulate the seismic data restoration problem by employing a new sparse regularization. Let be an operator which can extract features of seismic data , then the following minimization problem is formulated to reconstruct the missing traces and denoising seismic data:with the parameters and need to be determined. The first term in Equation (17) is the regularization term that corresponds to tensor-based orthogonal DL and the second term is the fidelity data in the projection domain with the sampling matrix and the degraded data . Basically, we can write as . Now we can represent by and then Equation (17) becomes:

We employed the alternating direction method of multipliers (ADMM) to solve Equation (18) based on the augmented Lagrangian and the separable paraboloid surrogate method. ADMM method has been widely applied to solve different minimization problems in seismic data processing [57]. The surrogate method makes the cost function separable so that all traces of seismic data can be updated simultaneously. Let and be the two auxiliary variables used to transform the nonconvex optimization problem into a convex optimization problem, then we can rewrite Equation (18) as follows:

We split the minimization problem in Equation (19) into three subproblems, where , , and are the regularization parameters. Subproblem 1: updating DL and sparse coding:

Subproblem 2: c-subproblem. The optimization problem for the auxiliary variable c is given by the following equation:

Subproblem 3: -subproblem. The corresponding minimization problem for is as follows:

We have discussed the solution for the one that corresponds to updating DL and sparse coding for subproblem 1 in Section 2. To solve the second subproblem, we apply the optimality condition for Equation (22):

Then,

After the dictionary updated in the iteration process, the seismic data should be updated with and . To do so, the separable paraboloid surrogate method [58] is used to solve for as follows:

By using separable paraboloid surrogate method, is given by the following equation:where

Then,

The denominator of Equation (28) is the curvature of the paraboloid which can be obtained as follows:

Therefore,

Hence,

Finally, we formulate the following algorithm for the proposed method.

Input: Training data set , regularization parameters, patch size.
Output: Learning dictionaries
 1 Initialize , and .
 2 For
 3 Update the sparse coding using hard thresholding,
         
 4 For , Update the dictionaries by running the SVD in Equations (A.18) and (A.20).
 5 End for
 6 Update, using Equation (20)
 7 Update, using Equation (26)
 8 Solve for using Equation (33)
 9 end for

4. Numerical Experiments

The numerical examples and the graphs used in this paper were simulated by MATLAB and the machine we used to run the code is DESKTOP-HKHFG0P: Processor: Intel(R) Core(TM) i7-7500U CPU @ 2.70 GHz 2.90 GHz and RAM: 8.00 GB.

We test the validity of the proposed method to noise free and noisy synthetic and real seismic datasets. The proposed method tested on both synthetic and real datasets by the different percentages of missing traces based on Jittered undersampled strategy which can help to obtain random properties as well as control the maximum gap between adjacent traces to meet the requirements of the compressive sensing theory. Throughout the numerical experiments, the patch size was and the sparsity level was 4. The algorithm in Nazzal et al.’s [59] study is used to obtain the sparsity level. The regularization parameters are tuned based on the convergence condition. Accordingly, the choice of the parameters and is trivial. We empirically search for the optimal value of parameter from the smallest set and search for the parameter from the range of . To evaluate the quality of restored seismic data quantitatively, we introduced the signal-to-noise ratio (SNR) which is defined as follows:where denotes the clean seismic data and denotes the reconstructed seismic data. We simulate seismic data using synthetic datasets containing four layers with linear and curved events and synthetic datasets with six layers containing parabolic events. The 50% of missing traces is used based on the random sampling property as shown in Figures 1 and 2. We apply the proposed method to 2D synthetic data reconstruction with linear and curved events as shown in Figures 1 and 2, respectively. The result of the proposed method is compared with the results by K-SVD and orthogonal DL. In Figure 1, both K-SVD and orthogonal DL provide promising results in the reconstruction of missing traces when the seismic data are with the linear events. However, the reconstruction results by the K-SVD and orthogonal DL are not satisfactory when the seismic data are with the nonlinear events because of the elimination of some important features of the reconstructed data around the boundary, as shown in Figures 2(c) and 2(d). The effectiveness of the proposed method in the reconstruction of missing seismic traces is presented in Figure 1(e). Furthermore, the residual which is defined as the difference between the interpolated and original seismic data is presented in Figure 3 to show the performance of K-SVD, orthogonal DL, and proposed methods in the reconstruction of missing traces. For the results by K-SVD and orthogonal DL, there are some important features of seismic signals left in the residual sections, as observed in Figures 3(a) and 3(b), respectively. There is no important information about seismic signals left in the residual sections as indicated in Figure 3(c). The minor residual indicates that the important features of the seismic signal are well preserved, and better interpolated seismic data can also be achieved. For further comparison, the SNR of the proposed method, K-SVD, and orthogonal DL are given in Table 1. The proposed method clearly shows much better signal preservation and SNR enhancement while reconstructing the missing seismic traces. For additional comparison and to see the detail of the interpolated seismic data by the three methods, we plot a single seismic trace from the interpolated data in Figures 4 and 5.

In the next example, we focused on the real seismic data to verify the performance of the proposed method in simultaneous interpolation and denoising and compare it with the K-SVD and orthogonal DL methods. In this case, we randomly remove 50% seismic traces and apply the same method with synthetic data. The real data with 50% missing trace and covered by strong real noise in which detecting the useful seismic signals are hard, and hard to see some events of seismic data are presented in Figure 6(a). The interpolated and denoised results are shown in Figure 6(c)6(e). The proposed method shows promising performance in removing strong random noise and interpolating the missing seismic traces. Because of the special information contained in the tensor DL, the proposed method achieves better results in the interpolation and denoising of seismic data in terms of visual quality and preservation of seismic features. As shown in Figures 6(c) and 6(d), some important parts of the interpolated and denoised data are deteriorated. The noise sections from the original data (with strong noise) and the recovered data are presented in Figure 7. As presented in Figures 7(a) and 7(b), there are some useful seismic signals left in the noise sections. As we observe from Figure 7(c), there are no parts of seismic signals left in the noise sections.

The effectiveness of the proposed method is also verified by interpolating aliasing seismic data containing different features. The original seismic data, observed data with 50% missing traces, and the interpolated results are presented in Figure 8. The reconstructed results by K-SVD and orthogonal DL are unsatisfactory because of the insertion of artifacts on some parts of the interpolated results as shown in Figures 8(c) and 8(d), respectively. The performance of the proposed method in the reconstruction of missing seismic traces is presented in Figure 8(e). The single trace is extracted from the reconstructed data by K-SVD, orthogonal DL, and the proposed method as presented in Figure 9 to compare the details of the obtained results.

For the next example, we discussed the validity of the proposed method for removing random noise and compared it with K-SVD and orthogonal DL. Both synthetic and real seismic datasets are considered for random noise attenuation. Both synthetic and real seismic datasets are contaminated by band-limited Gaussian noise which is spatially uncorrelated random noise. The synthetic seismic data with random noise are presented in Figure 10. The denoised results by K-SVD and orthogonal DL are not satisfactory because of the elimination of some primary features of seismic data around the boundary and they introduce some artifacts as presented in Figures 10(c) and 10(d). Since K-SVD uses the redundancy of the seismic features over the datasets to attenuate random noise, it is computationally more expensive than orthogonal DL and proposed methods. Moreover, the denoising performance of K-SVD, orthogonal DL, and proposed methods are assessed by using the residuals obtained by subtraction of the denoised results from the noisy seismic data. In Figures 11(a) and 11(b), K-SVD and orthogonal DL residuals show some removed important seismic signals. In Figure 11(c), we observe that almost there were no important seismic signals left in the residual part. For further comparison, a single seismic trace extracted from the denoised results in Figure 10 is displayed in Figure 12, and the comparison of SNR is presented in Table 2 with the different noise levels.

Finally, we test the denoising of 2D real data with band-limited Gaussian noise. The denoised result by the proposed method is shown in Figure 13(e), and the result is clear and the primary features of the data are well preserved. The other two methods cause some damage to the useful seismic signals and some part of the seismic data left in the residual parts as shown in Figures 14(a) and 14(b). In Table 2, the SNR values of the denoised seismic data for different noise levels are presented. Furthermore, a single seismic trace comparison is displayed in Figure 15 to demonstrate the performance and validity of the proposed method.

5. Conclusion

In this paper, we exploited sparse regularization which promotes sparsity for both seismic data interpolation and denoising. We have proposed a novel method to interpolate and attenuate random noise of seismic data based on orthogonal tensor DL. Due to the separability, orthogonality, and imposed tensor decomposition, the proposed method is computationally efficient and fast for learning the dictionary. The effectiveness of the proposed method is compared with both K-SVD and orthogonal DL methods in denoising and interpolation of seismic data. The numerical experiments demonstrate that the proposed method shows promising results in the denoising and interpolation of seismic data. Our approach uses the dictionary which can extract the local features and adapt to seismic data, with no introduced artifacts on the interpolated and denoised results. Compared with K-SVD and orthogonal DL methods, the proposed method is computationally effective. The experimental results and the performance of the methods in seismic data interpolation and denoising show the effective performance of the proposed method on synthetic and real seismic datasets.

Appendix

Theorem A.1. Let be the set of orthogonal matrices and for the given , the minimization problem:has a unique solution which is given by the following equation:with orthogonal matrices and such thatwhere is the SVD of .

Proof. From Theorem 2.1 in Chrétien and Wei’s [60] study, every tensor can be written as follows:where each is an orthogonal matrix and is tensor of the same size with . Let be the standard Kronecker product for matrices, then:where is the mode matricization of . For convenience we can use , , and to represent , , and , respectively. Then, by using the mode unfolding and length preservation property of orthogonal transform, we can reformulate the minimization problem of this theorem as follows:such that . We can write the objective function in Equation (A.6) as follows:such that . Since the first two terms in Equation (A.12) are constant, the minimization problem in Equation (A.12) becomes:such that . By considering the SVD of , we can rewrite Equation (A.13) as follows:where .such that , since is orthogonal. Since is diagonal matrix, then the maximization problem in Equation (A.16) is achieved when the diagonal of is positive and maximum. By Cauchy–Schwartz inequality, this is true when such that the diagonal elements are all 1. Therefore, is the explicit solution for the minimization problem in Equation (A.6).

Based on this theorem, we can solve for the dictionaries and in the same manner. Accordingly, for the minimization problem in Equation (14), the dictionary can be solved by fixing the dictionary and which is given by the following equation:with the SVD

Similarly, for a fixed and is given by the following equation:with the SVD

Data Availability

Every data used for this paper is free and can be accessed upon the request through the corresponding author.

Conflicts of Interest

The author declares that there is no conflict of interests about the publication of this paper.

Acknowledgments

This research was funded by Mattu University Research (Community Service, Technology Transfer, and Collaboration Office).