Abstract

The emerging large-scale (massive) multiple-input multiple-output (MIMO) system combined with orthogonal frequency division multiplexing (OFDM) is considered a key technology because it improves spectral efficiency. In this paper, we introduce an iterative detection algorithm for uplink large-scale multiuser MIMO-OFDM communication systems. We design a Main-Branch structure iterative turbo detector that uses the Approximate Message Passing algorithm simplified by linear approximation (AMP-LA) and applies the Mean Square Error (MSE) criterion to calculate the correlation coefficients between the main detector and the branch detector for a given iteration. The complexity of our method is compared with that of other detection algorithms. The simulation results show that our scheme achieves better performance than conventional detection methods with acceptable complexity.

1. Introduction

Multiple-input multiple-output (MIMO) systems, with multiple antennas employed at both the transmitter and the receiver, have attracted considerable attention because their multiplexing and diversity capabilities offer much higher data rates and enhance the system capacity [1]. OFDM has been shown to be an attractive scheme for mitigating the effects of intersymbol interference (ISI) caused by broadband wireless channels with long response memory [2]. In recent years, large-scale (or massive) multiuser MIMO systems, which equip the base station with tens to hundreds of antennas as shown in Figure 1, have become a hot topic for next-generation 5G wireless communication. The large-scale multiuser MIMO technology combined with OFDM promises significant improvements in spectral efficiency, link reliability, and coverage compared to conventional small-scale MIMO systems [3]. Unfortunately, the promised benefits of large-scale MIMO come at the cost of significantly increased computational complexity at the base station (BS), especially for the antenna array design and signal processing. A major challenge in uplink communications of broadband multiuser MIMO systems is to create a receiver algorithm that can efficiently detect the multiple signals transmitted from the antennas of different uplink users [4].

Turbo detection, which performs detection/equalization and decoding iteratively for coded data transmission over ISI channels, has been widely studied [5]. The maximum a posteriori probability (MAP) detector can achieve optimal performance, but its complexity is exponential in the number of transmit antennas [6]. To reduce the complexity, detectors using soft interference cancellation (SIC) based on the Minimum Mean Square Error (MMSE) criterion were proposed for MIMO systems in [7, 8]. MMSE-SIC has lower complexity than the MAP algorithm, but its accuracy is poor because its structure lacks a feedback design. Moreover, the complexity of these suboptimal detectors remains prohibitive for large-scale antenna arrays. Over the past decade, message passing based on factor graph models [9, 10] has been studied for detection/equalization on ISI channels. In [11], Kaynak et al. proposed a belief propagation (BP) algorithm that performs better than MMSE-SIC; however, its computational complexity is still very large because of the marginalization operations over discrete symbols. In [12], a variant of the BP algorithm based on Gaussian tree approximation was proposed, which approximates the dense factor graph of the MIMO system by a tree and passes exact messages over the resultant tree. More recently, Wu et al. proposed a relatively low-complexity iterative detection algorithm for large-scale multiuser MIMO-OFDM systems using Approximate Message Passing in [13], but its performance is lower than that of the conventional MMSE detection algorithm when the number of iterations is small. Besides those algorithms derived from factor graphs, an iterative BP algorithm based on a Markov random field (MRF) was also investigated in [14].

In this paper, we propose a new iterative detection algorithm for uplink multiuser MIMO systems that combines several signal processing methods. First, we design a Main-Branch structure for iterative signal detection. Soft information is exchanged between the main detector and the branch detector in parallel with the log likelihood ratio (LLR) exchange between the main detector and the decoders. The detectors in the Main-Branch structure exchange soft information without interleavers, which improves the performance of the suboptimal constituent detectors. Second, the principle of expectation propagation and a linear approximation method are applied to obtain the symbol belief and to approximate the Gaussian messages, reducing the computational complexity. Finally, we employ the AMP-LA algorithm as the constituent detector of the Main-Branch structure and use the Mean Square Error (MSE) criterion to calculate the correlation coefficients between the main detector and the branch detector for a given iteration; we also give the computation of the LLRs exchanged between the Main-Branch structure detector and the decoders. The proposed scheme not only gains better performance by canceling the residual ISI and cochannel interference (CCI) but also reduces the computational complexity of the detectors by avoiding the matrix inversion on each frequency bin required by the conventional MMSE detection algorithm. Simulation results show that the proposed algorithm achieves a better tradeoff between performance and complexity than existing turbo detection algorithms and approaches the optimal performance with a small number of iterations.

The rest of the paper is organized as follows. The system model is described in Section 2, and the Main-Branch iterative structure with the AMP algorithm is discussed in Section 3. In Section 4, the novel iterative detection algorithm combining the Main-Branch structure with AMP-LA is proposed and its complexity is analyzed. Simulation results are presented and discussed in Section 5, and Section 6 concludes the paper.

Notation. The operators $(\cdot)^T$, $(\cdot)^*$, and $(\cdot)^H$ denote transpose, complex conjugate, and complex conjugate transpose, respectively. $\mathbf{I}_N$ represents the identity matrix of size $N \times N$, and $\ln(\cdot)$ denotes the natural logarithm; $E_p[\cdot]$ is the expectation operation with respect to the probability distribution $p$. The operation $f \propto g$ denotes the relation $f = cg$ for some positive constant $c$. $\mathbf{x} \setminus x_i$ denotes the symbols in $\mathbf{x}$ with $x_i$ excluded; $\mathcal{CN}(x; m, v)$ denotes the circular Gaussian pdf with mean $m$ and variance $v$.

2. System Model and Iterative Receiver

We consider the uplink of a large-scale multiuser MIMO system with single-antenna users and a base station (BS) receiver equipped with multiple antennas. At the transmitter, the information bits are first encoded into a code sequence and interleaved; the interleaved coded bits are then mapped into symbols. Each user transmits a vector of frequency-domain symbols, one per subcarrier of the OFDM system, with each symbol drawn from the modulation constellation set. The large-scale multiuser MIMO system uses OFDM modulation at the transmitters. An IFFT is applied to each user's symbol sequence, and a cyclic prefix (CP) covering the maximum number of taps of the MIMO intersymbol interference (ISI) channel is inserted at the beginning of the OFDM signal to remove the interblock interference and to turn the linear convolution into a circular convolution, as shown in Figure 2. The OFDM-modulated signals are then sent through the MIMO-ISI channel. After the CP is removed, the FFT transforms the received signal from the time domain to the frequency domain at the receiver. The frequency-domain channel estimation algorithm then provides the current channel state knowledge to the detector. As a core part of this large-scale multiuser MIMO system, the proposed iterative detector at the receiver can suppress cochannel interference (CCI).
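The role of the cyclic prefix described above can be checked numerically. The following is a minimal single-link sketch, not the multiuser system of this paper: the subcarrier count, tap count, QPSK mapping, and the function names `ofdm_modulate` and `ofdm_demodulate` are illustrative assumptions. It shows that, with a CP at least as long as the channel memory, the noise-free received symbol on each subcarrier equals the transmitted symbol multiplied by the channel frequency response.

```python
import numpy as np

def ofdm_modulate(freq_symbols, cp_len):
    """IFFT to the time domain, then prepend the last cp_len samples as the CP."""
    time_signal = np.fft.ifft(freq_symbols)
    return np.concatenate([time_signal[-cp_len:], time_signal])

def ofdm_demodulate(received, n_sub, cp_len):
    """Drop the CP and return to the frequency domain with an FFT."""
    return np.fft.fft(received[cp_len:cp_len + n_sub])

rng = np.random.default_rng(1)
n_sub, n_taps = 64, 16           # assumed sizes, for illustration only
cp_len = n_taps - 1              # CP covers the channel memory

# Random unit-energy QPSK symbols on every subcarrier (assumed mapping).
bits = rng.integers(0, 2, size=(n_sub, 2))
X = ((1 - 2 * bits[:, 0]) + 1j * (1 - 2 * bits[:, 1])) / np.sqrt(2)

# Random complex channel impulse response with n_taps taps.
h = (rng.standard_normal(n_taps) + 1j * rng.standard_normal(n_taps)) / np.sqrt(2 * n_taps)

tx = ofdm_modulate(X, cp_len)
rx = np.convolve(tx, h)          # linear convolution with the channel, no noise
Y = ofdm_demodulate(rx, n_sub, cp_len)

H = np.fft.fft(h, n_sub)         # channel frequency response per subcarrier
max_err = np.max(np.abs(Y - H * X))
```

Because the CP makes the linear convolution look circular over the retained samples, `max_err` is at the level of floating-point rounding, which is what allows detection to proceed independently on each subcarrier.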

Hence, at each subcarrier, the received baseband signal vector in the frequency domain is the product of the MIMO channel frequency-response matrix at that subcarrier and the vector of all frequency-domain symbols transmitted on that subcarrier, plus a stationary zero-mean additive white Gaussian noise (AWGN) vector. The channel frequency-response matrix is obtained from the time-domain block-circulant channel matrix, whose entries are the channel impulse response taps from each user's transmit antenna to each receive antenna.

The matrix form of can be denoted as

3. Iterative Detection Receiver and Approximate Message Passing Algorithm

3.1. Turbo Main-Branch Iterating Detection Structure

The iterative process of the Main-Branch structure soft detector is shown in Figure 2. Turbo MIMO detection exchanges extrinsic log likelihood ratios (LLRs) of the coded bits between the Main-Branch iterative structure and a bank of channel decoders. The Main-Branch iterative part consists of one soft-input soft-output (SISO) main detector and one SISO branch detector; soft information is exchanged between the main detector and the branch detector in parallel with the LLR exchange between the main detector and the decoders. We omit the subcarrier index for notational simplicity.

Unlike the extrinsic information exchanged between main detector and channel decoders, the extrinsic information between the main detector and the branch detector has significant correlation because no interleaving can be applied. On the other hand, detection algorithm can perform iteratively by Main-Branch structure soft detector without the decoder, called “self-iteration,” so it can be used in uncoded systems also. For the coded systems, the proposed turbo detection algorithm is established in Figure 2 to adapt common circumstances. At the th iteration, the received signal is detected by the main detector and its extrinsic LLR of each coded bit corresponding to the symbol is passed to the branch detector, to be used to generate its a priori information . At the same time the extrinsic information generated by the main detector is passed to channel decoders as their a priori information directly. The branch detector in turn generates its own extrinsic information with the given a priori information . For the significant correlation between and , they combine together to generates the a priori information . combined with the a priori information fed back from the decoders generate extrinsic LLRs of the main detector for the next detection process. The extrinsic LLRs of the main detector are given by where is the a posteriori LLR at the output of the main detector and represents the total a priori information at the input of the main detector, given aswhere represents the a priori LLRs passed down from the decoders, and represents the a priori LLRs passed from the branch detector.

The extrinsic LLRs of the branch detector are given by where is the a posteriori LLR at the output of the branch detector and represents the a priori information at the input of the branch detector.
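The relation between a posteriori, a priori, and extrinsic LLRs used for both detectors can be illustrated with a small numerical sketch. It follows the usual turbo convention that the extrinsic LLR is the a posteriori LLR minus the a priori LLR; treating the total a priori information of the main detector as the plain sum of the decoder and branch-detector contributions is an assumption made here for illustration, and the function names and numbers are hypothetical.

```python
import numpy as np

def total_prior(llr_from_decoder, llr_from_branch):
    """Total a priori LLR at the main detector input (assumed to be a plain sum)."""
    return np.asarray(llr_from_decoder, dtype=float) + np.asarray(llr_from_branch, dtype=float)

def extrinsic_llr(llr_posterior, llr_prior):
    """Extrinsic LLR: what the detector learned beyond its a priori input."""
    return np.asarray(llr_posterior, dtype=float) - np.asarray(llr_prior, dtype=float)

# Illustrative values for three coded bits.
prior = total_prior([1.0, -0.5, 0.0], [0.5, 0.25, -0.2])
ext = extrinsic_llr([2.5, -1.0, 0.3], prior)
```

Subtracting the a priori part keeps a detector from feeding its own input back to the next stage as if it were new evidence.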

The problem is how to use the extrinsic information to generate and to use to generate . Following the method of [15], we can regard the a priori LLRs and the extrinsic LLRs as the output of an equivalent AWGN channel. The unbiased versions of these LLRs associated with the branch detector are modeled as the transmitted symbol corrupted by AWGN, where and are assumed to be zero-mean Gaussian random vectors with covariance matrices and , which are independent of the transmitted data but correlated with each other with correlation coefficient . The a priori information corresponding to the symbol passed to the main detector can be defined as

We assume that the variance , and then we can construct the a priori LLRs for the main equalizer based on two correlated LLRs and as

In order to accurately measure the correlation of the a priori LLRs and the extrinsic information , we force a linear relationship , where is a positive scaling factor, and another linear relationship is also forced to measure the correlation of the a priori LLRs and the extrinsic information .
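As a generic illustration of fitting such a forced linear relationship under an MSE criterion, the sketch below computes the least-squares scaling factor between two LLR sequences by setting the derivative of the squared error to zero. This is the textbook scalar least-squares solution, not necessarily the closed form derived in this paper; the function name and the sample values are hypothetical.

```python
import numpy as np

def mse_scaling_factor(target_llr, source_llr):
    """Scalar beta minimizing sum((target - beta * source)**2).

    Setting the derivative with respect to beta to zero gives
    beta = <target, source> / <source, source>.
    """
    target = np.asarray(target_llr, dtype=float)
    source = np.asarray(source_llr, dtype=float)
    return float(np.dot(target, source) / np.dot(source, source))

def mse(target_llr, source_llr, beta):
    """Mean square error of the linear fit target ~ beta * source."""
    diff = np.asarray(target_llr, dtype=float) - beta * np.asarray(source_llr, dtype=float)
    return float(np.mean(diff ** 2))

source = np.array([1.0, -2.0, 3.0, -0.5])
target = 0.5 * source + np.array([0.1, -0.1, 0.05, 0.0])   # noisy scaled copy
beta = mse_scaling_factor(target, source)
```

Any other value of the scaling factor gives a larger MSE for the same pair of sequences, which can be checked by evaluating `mse` at `beta + 0.1` or `beta - 0.1`.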

The aim of coefficients computing is to minimize the error between and . So we can denote the Mean Square Error (MSE) function as

We also can get the MSE between and like (10), evaluating the derivative of Mean Square Error function with respect to the coefficients and setting it to zero; according to the theory in [16], we can get and , and two linear correlation evaluation equations are given as where is the noise correlation parameter between and and represents the other noise correlation parameter between and , so the noise correlation parameters and can be estimated as [17, 18]where ,  ,  ,  , and is the Sign function, which can be defined as

3.2. Factor Graph and Approximate Message Passing Algorithm

The detector generates the extrinsic LLRs of based on the received signal and the a priori LLRs can be written as

For the presentation of factor and message passing algorithm in [9, 10], we obtain the joint distribution probability aswhere denotes the channel transition function, which can be defined aswhere is the component of in the row and column.

Let us investigate the message passing algorithm on the factor graph model in Figure 3, where represents the mapping constraint function and is the Kronecker delta function. “=” denotes the cloning node of variable, and we must note that the code constraints cross all the subcarriers. We assume that the pass messages are from the top to the bottom of Figure 3 and back immediately, so we can avoid inner iteration. When extrinsic information is updated and passed downward, the new iteration will start.

Using the sum-product updating rule, for the th turbo iteration, the message passed from the channel transition node to the cloning node of is given by

In the opposite direction, the message passed from the cloning node to the channel transition node is given by where is the message passed from the mapping node to the cloning node, which can be defined aswhere is the a priori probability.
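The a priori symbol probability carried by the mapping-node message, and the Gaussian approximation of a discrete symbol distribution used in the following derivation, can be sketched as below. The Gray-mapped unit-energy QPSK constellation, the LLR sign convention ln P(b=0)/P(b=1), and the function names are assumptions made for illustration. The Gaussian parameters are obtained by moment matching, which is the solution of the minimum KL divergence projection onto a Gaussian.

```python
import numpy as np

# Assumed Gray-mapped, unit-energy QPSK: bit 0 -> +1, bit 1 -> -1 on each axis.
QPSK_MAP = {
    (0, 0): (1 + 1j) / np.sqrt(2),
    (0, 1): (1 - 1j) / np.sqrt(2),
    (1, 0): (-1 + 1j) / np.sqrt(2),
    (1, 1): (-1 - 1j) / np.sqrt(2),
}

def symbol_prior(bit_llrs):
    """A priori symbol probabilities from two bit LLRs, L = ln P(b=0)/P(b=1)."""
    p_zero = 1.0 / (1.0 + np.exp(-np.asarray(bit_llrs, dtype=float)))
    symbols, probs = [], []
    for bits, symbol in QPSK_MAP.items():
        p = 1.0
        for k, b in enumerate(bits):
            p *= p_zero[k] if b == 0 else 1.0 - p_zero[k]
        symbols.append(symbol)
        probs.append(p)
    return np.array(symbols), np.array(probs)

def moment_match(symbols, probs):
    """Mean and variance of the Gaussian matching the discrete distribution."""
    mean = np.sum(probs * symbols)
    var = np.sum(probs * np.abs(symbols - mean) ** 2)
    return mean, var

# No prior knowledge: uniform symbols, zero mean, unit variance.
symbols_u, probs_u = symbol_prior([0.0, 0.0])
mean_u, var_u = moment_match(symbols_u, probs_u)

# Confident prior: the mean moves toward one constellation point, the variance shrinks.
symbols_c, probs_c = symbol_prior([4.0, -4.0])
mean_c, var_c = moment_match(symbols_c, probs_c)
```

As the decoders become more confident over the turbo iterations, the matched variance shrinks, so the symbol contributes less uncertainty to the interference seen by the other users.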

Let us assume that is continuous random variables, and approximating the message by the estimation of a complex Gaussian probability density function , the integration form of (19) is defined aswhere

And the parameters ,  are obtained by the criterion of minimum KL divergence in [19] as

In practice, we can use the symbol belief obtained by the principle of expectation propagation in [20–22] at the th iteration to get the approximate message . From factor graph theory, the symbol belief is the a posteriori probability of the symbol , so it can be approximately given by where is the a posteriori probability of the coded bit fed back from the decoders. can be regarded as the product of all the incoming messages, where

By the expectation propagation (EP) principle and the canonical form of Gaussian PDF by [23], the approximate message can be given aswhere where denotes the mean and denotes the variance; they are defined as

In order to reduce the complexity, we use the linear approximation (LA) in [13] to simplify the parameters; if the number of users is large, we can get

Now, by using the first-order linear approximation and Taylor formula in (27), (33) and after some mathematical simplifications, we can get

Finally, the LLRs of coded bits of signal in frequency domain can be expressed in form aswhere

The detailed processing procedure of AMP algorithm simplified by linear approximation (AMP-LA) is illustrated in Algorithm 1.

Algorithm 1: AMP algorithm simplified by linear approximation (AMP-LA).

(1) Initialization: Input , , .
(2) for    do computation at the cloning nodes.
(3)   Calculate by (25).
(4)   Calculate and by (30).
(5) end for
(6) for    do computation at the channel transition nodes.
(7)   Calculate by (32).
(8)   Calculate by (34).
(9) end for
(10) for    do computation at the cloning nodes.
(11)   Calculate and by (35) and (36).
(12)   Calculate by (38).
(13)   for    do computation of LLRs
(14)     Calculate by (37).
(15)   end for
(16) end for
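To make the alternation between channel transition (observation) nodes and variable nodes concrete, the following is a simplified stand-in, not the AMP-LA algorithm of Algorithm 1: a real-valued BPSK detector that treats the interference from the other users as Gaussian at every observation node. The real-valued channel, BPSK alphabet, damping factor, and the function name `gaussian_mp_detect` are all assumptions made for illustration.

```python
import numpy as np

def gaussian_mp_detect(y, H, noise_var, n_iter=10, damping=0.5):
    """Detect BPSK symbols x in y = H x + w by Gaussian-approximated message passing."""
    n_rx, n_tx = H.shape
    # soft_mean[j, i]: mean of x_i in the message from variable i to observation j.
    soft_mean = np.zeros((n_rx, n_tx))
    for _ in range(n_iter):
        # Observation nodes: interference from all other users is treated as Gaussian.
        contrib = H * soft_mean
        interf_mean = contrib.sum(axis=1, keepdims=True) - contrib
        var_terms = H ** 2 * (1.0 - soft_mean ** 2)
        interf_var = var_terms.sum(axis=1, keepdims=True) - var_terms + noise_var
        # LLR of x_i = +1 versus -1 sent from observation j to variable i.
        llr_obs = 2.0 * H * (y[:, None] - interf_mean) / interf_var
        # Variable nodes: sum the incoming LLRs, excluding the destination edge.
        llr_var = llr_obs.sum(axis=0, keepdims=True) - llr_obs
        new_mean = np.tanh(0.5 * llr_var)
        soft_mean = damping * soft_mean + (1.0 - damping) * new_mean
    # Final decision uses all incoming LLRs of each variable.
    return np.where(llr_obs.sum(axis=0) >= 0, 1, -1)

rng = np.random.default_rng(0)
n_rx, n_tx, noise_std = 64, 8, 0.1      # assumed sizes, for illustration only
H = rng.standard_normal((n_rx, n_tx))
x = rng.choice([-1, 1], size=n_tx)
y = H @ x + noise_std * rng.standard_normal(n_rx)
x_hat = gaussian_mp_detect(y, H, noise_std ** 2)
bit_errors = int(np.sum(x_hat != x))
```

Like the AMP-type detectors discussed in this section, the sketch needs only sums and element-wise products per iteration and no matrix inversion, which is the source of the complexity advantage over MMSE-based detection.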

4. Proposed Iterative Detection Algorithm and Complexity Analysis

4.1. Iterating Soft Turbo Detection Using Approximate Message Passing

We apply the AMP-LA algorithm in both the main detector and the branch detector, which keeps the processing procedure simple. The iteration between the main detector and the branch detector is called the Main-Branch self-iteration, while the turbo iteration between the main detector and the decoders is the outer iteration. The total number of iterations of the proposed algorithm equals the number of outer iterations. The Main-Branch self-iteration and the outer iteration are performed in parallel. At the specified number of outer iterations i, combining with the a priori LLR from the decoders and the branch detector for the main detector, we have . When the main detector generates the extrinsic LLRs by Algorithm 1, for (16). We pass the extrinsic information and the correlation-compensated extrinsic information of the main detector to the decoders and to the branch detector, respectively. Likewise, when the branch detector generates the extrinsic LLRs by Algorithm 1, and for (16). The main steps of the proposed iterative detection structure combined with Approximate Message Passing are described in Algorithm 2.

Algorithm 2: Proposed iterative detection with the Main-Branch structure and AMP-LA.

(1) Initialization: if  , input the initial a priori LLRs of the main detector and the branch detector, set , , .
(2) If  , combining with the a priori LLR from the decoders and the branch detector, let , and compute the extrinsic LLRs of the main detector by Algorithm 1.
(3) Pass to the decoders and to the branch detector in parallel, respectively.
(4) The decoders calculate by the decoding algorithm.
(5) Calculate between and by (14).
(6) Generate by (12).
(7) Given the a priori LLRs , compute the extrinsic LLRs of the branch detector by Algorithm 1.
(8) Calculate between and by (13).
(9) Set , and compute the a priori LLRs by (11) for the next iteration.

4.2. Complexity Analysis

The computational complexity of the different algorithms is measured mainly by the number of floating-point operations (FLOPs) used for multiplications. We can find that the point multiplications of a complex number and a real number need point FLOPs, and the point multiplications of two complex numbers require point FLOPs. In this paper, the complexity of channel estimation and decoding is not considered because all the detection schemes compared use the same amount of computation for these steps.

We compare the average complexity of the proposed detection algorithm with that of other algorithms in Table 1, where $O(\cdot)$ is the Big-O notation expressing the complexity of an algorithm as a function of a given input.

The traditional MMSE-SIC frequency domain algorithm in [7] has to estimate the extrinsic mean and variances of . We let be the number of users and be the number of receive antennas. The MMSE-SIC method has the largest computational complexity among MMSE-SIC, AMP using Gaussian approximation (AMP-G), AMP-EP, and AMP-LA, but it also achieves better performance than these AMP variants. All the message passing algorithms require FLOPs to compute in the preprocessing stage, but the proposed algorithm needs this preprocessing only once for the main and branch detectors. At the observation nodes, AMP-G and AMP-EP both require FLOPs, the AMP-LA algorithm needs FLOPs, and the AMP-LA algorithm needs FLOPs. Computing the extrinsic LLRs requires FLOPs. Figure 4 gives the normalized computational complexity versus the number of antennas for different detection algorithms in multiuser MIMO-OFDM systems with QPSK modulation. For the antennas number , both the BP algorithm based on Markov random field (MRF) in [14] and the MMSE-SIC algorithm have the same order of complexity , so we only consider the normalized computational complexity of the MMSE-SIC algorithm. Figure 4 shows that the proposed algorithm requires fewer floating-point operations than the conventional MMSE-SIC algorithm and the AMP-G algorithm, especially when the number of antennas is large. As the number of antennas increases, the normalized computational complexity of AMP-EP comes very close to that of the proposed detection method. AMP-LA has the lowest complexity. A similar normalized complexity comparison can be obtained for other types of modulation.

5. Simulation Results and Performance Analysis

We use convolutional coding to test the performance of the different detection methods in MIMO systems. The channels are a 16-tap Rayleigh fading channel with the equal-tap exponential power delay profile model and a 9-tap ITU-EVA channel with an unequal-tap power delay profile, so that frequency-selective fading is simulated adequately. When the number of taps is large and the paths have equal energy, the MIMO-ISI channel has severe delay spread, so we can simulate the same scenario as in [13, 14]. The ITU-EVA channel lets us simulate a realistic 5G application scenario.

The simulation parameters of the large-scale MIMO-OFDM system are given in Table 2. Synchronization is assumed to be ideal in the iteration process. The BER versus SNR performance of our algorithm and of other conventional algorithms under the channel models given above is shown in Figures 5–9.

We first consider an RSC-coded 64 × 64 MIMO system with QPSK and 16QAM modulation over the 16-tap Rayleigh fading channel with the equal-tap exponential power delay profile model. Figure 5 shows that the proposed method has better BER performance than the conventional detector using soft interference cancellation (SIC) based on the Minimum Mean Square Error (MMSE) criterion when the number of iterations is in QPSK and in 16QAM. Our method also performs better than the Approximate Message Passing detection by expectation propagation (AMP-EP) scheme and the AMP with linear approximation (AMP-LA) scheme in [13]. For instance, at a BER of $10^{-3}$, the proposed method gains about 0.3 dB over MMSE-SIC when the number of iterations is with QPSK. At the same BER level with 12 iterations, the proposed method gains about 1.1 dB over MMSE-SIC in 16QAM. At this BER level with 6 iterations, the proposed iterative equalization scheme also achieves better BER performance than MMSE-SIC, while AMP-EP performs worse than MMSE-SIC. Note that, in the low-SNR region, the performance curve of the proposed algorithm is the first to reach the matched filter bound (MFB). It should be stressed that the proposed method is particularly accurate in the first iteration.

From Figures 6 and 7 we can observe that the proposed method also has better BER performance than the AMP-EP algorithm for different numbers of iterations in multiuser 64 × 64 MIMO systems. As the number of iterations increases, the performance improvement of our algorithm becomes more pronounced. Figure 6 shows that 6 and 12 iterations are enough for AMP-EP to approach the MFB for QPSK and 16QAM modulation at BER = $10^{-5}$, while the BER performance of the proposed algorithm approaches the MFB after just 5 and 10 iterations at BER = $10^{-4}$, as shown in Figure 7. The performance gap between the AMP-EP algorithm and the MFB is clearly larger than that between the proposed method and the MFB at the same BER level in the low-SNR region. Because AMP-LA is a simplification of AMP-EP, its performance is worse than that of AMP-EP, so we compare only against AMP-EP.

It should also be noted that the performance of our scheme for large-scale MIMO systems can be seriously affected by error propagation in the approximations made in the message updating by (34) and (35). For the performance curves of our scheme to approach the MFB, more iterations are needed in low-dimensional MIMO systems. Figures 8 and 9 present the BER performance of the different methods in 16 × 16 and 4 × 4 MIMO systems with QPSK and 16QAM modulation. Compared with the MFB at BER = $10^{-2}$, the performance loss of the proposed algorithm with 6 iterations is about 1.3 dB and 1.5 dB in a 16 × 16 and a 4 × 4 MIMO system, respectively, with the 16QAM constellation, while the loss in a 64 × 64 MIMO system is about 0.7 dB for the same number of iterations at the same BER level with 16QAM. This is very important for large-dimensional systems, because the error propagation effects are smaller. On the other hand, the performance gap between the proposed method and the AMP-EP algorithm is clearly larger in MIMO systems with more antennas than in a small-scale 4 × 4 MIMO system at the same BER level. The proposed algorithm can also be applied in uncoded systems, because the Main-Branch iterative soft detection structure does not use interleaving. In fact, in uncoded systems our algorithm can be seen as a turbo detection implemented in the frequency domain that does not employ channel decoders in the feedback loop. It therefore behaves similarly to turbo detection, with the BER performance curves of our scheme approaching the MFB in the higher-SNR region.

Figure 9 shows that the performance of the proposed algorithm is even lower than that of MMSE-SIC when the number of iterations is for QPSK and for 16QAM, and Figure 8 shows that the performance of the proposed algorithm is close to that of the other algorithms when the number of iterations is small. We can therefore conclude that our proposed algorithm is best used in large-scale MIMO systems such as .

In Figures 10 and 11, the performance of the proposed algorithm is compared with that of the other methods over the ITU-EVA channel for QPSK and 16QAM modulation, respectively. The performance gap between the proposed method and the other detection algorithms is larger over the ITU-EVA channel than over the 16-tap Rayleigh fading channel with the equal-tap power delay profile model.

6. Conclusions

In this paper, we proposed an iterative detection algorithm to combat intersymbol interference and improve the spectral efficiency of uplink large-scale MIMO communication systems. The iterative detection algorithm can achieve near-optimal performance by iteratively exchanging probabilistic information about the coded bits between a soft-input soft-output (SISO) Main-Branch structure detector and a SISO channel decoder. Our method calculates the correlation coefficients between the iterative main detector and the branch detector by computing the gradient that minimizes the MSE. By applying AMP-LA, the detection accuracy is improved while the complexity remains acceptable. The simulation results show that the proposed method performs better than traditional MMSE-SIC detection and AMP-type detection with a small number of iterations. Moreover, the complexity of the proposed algorithm is lower than that of the MMSE-SIC algorithm and close to that of AMP-EP when the number of system antennas is large.

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

Acknowledgments

This study is funded by the National Natural Science Foundation of China (Grant nos. 61571162 and 61101122) and Science and Technology Project of Ministry of Public Security Foundation (Grant nos. 2015GABJC38 and 2015GABJC37), Major National Science and Technology Project (Grant nos. 2014ZX03004003 and 2011ZX03004-004), Municipal Exceptional Academic Leaders Foundation (Grant no. 2014RFXXJ002), and China Postdoctoral Science Foundation (Grant no. 2014M561347).