Abstract

High dynamic range (HDR) imaging, which aims to increase the dynamic range of an image by merging multiexposure images, has attracted much attention. Ghosts are often observed in the resultant image because of camera motion and object motion in the scene. Low-rank matrix completion (LRMC) provides an effective tool to remove ghosts. However, existing LRMC-based methods require the user to specify the included or excluded regions. In this paper, we propose a novel HDR imaging method based on bidirectional structural similarities and weighted low-rank matrix completion. We first propose the bidirectional structural similarities, consisting of forward-projection structural similarity (FPSS) and backward-projection structural similarity (BPSS), to divide each image into four groups: motion regions, saturated regions in the source image, saturated regions in the reference image, and static and unsaturated regions. Then, the weight maps and the motion maps constructed from FPSS and BPSS are introduced into the weighted LRMC model to reconstruct the background irradiance maps. Experiments are conducted on several challenging image sets with complex scenes, and the results show that the proposed method outperforms three current state-of-the-art methods and Photoshop CS6 and is robust to the choice of reference image.

1. Introduction

Typical digital cameras capture images with 8 bits per pixel for each color channel, a range much lower than the dynamic range of real-world scenes. Thus, details of the dark or bright parts of a scene are missing in a single image. This problem can be addressed by merging images captured under different exposure settings, because each exposure captures the information of different regions [1].

Some methods generate a high dynamic range (HDR) image as the weighted sum of the estimated irradiance images, after recovering the camera response function [2, 3], while others directly generate an HDR-like low dynamic range (LDR) image as the weighted sum of the input LDR images by appropriately adjusting weights [4–6]. These methods perform well if the scene is static. However, ghosts are often observed in the resultant image, because motion is hard to avoid in practical applications. Thus, ghost removal is essential in HDR imaging [7–9].

Recently, most studies have focused on correcting object motion in the scene, because camera motion can be avoided by fixing the camera or applying global registration methods [10–12]. Existing object motion removal methods fall mainly into two categories: selection-based methods and correction-based methods. Selection-based methods [13–16] generate the resultant image as the weighted sum of all input images or of all gradient images, where a zero or small weight is assigned to motion pixels. These methods perform well in some cases; however, they usually rely heavily on the accurate detection of motion pixels.

Correction-based methods reconstruct the motion regions and the saturated regions based on the correlation between images. For example, Sen et al. [17] and Hu et al. [18] exploited patch matching to search for the closest patches for each pixel, which were used to correct the motion regions and the saturated regions. However, mismatches appear in the saturated regions and blurring exists in the fused image. Zimmer et al. [19] proposed using optical flow to find the dense correspondence, based on which an HDR image is reconstructed. However, the correspondence fails for large displacements.

Rank minimization provides an effective tool in image recovery [20–22]. Based on the assumption that image intensity is linear in the irradiance of the scene, Oh et al. [23] first proposed introducing rank minimization into HDR imaging to detect motion and using the estimated sparse error to determine the weight maps. Bhardwaj and Raman [24] modified the soft thresholding function in the original robust principal component analysis (RPCA) algorithm to recover the low-rank matrix, whose columns were combined by applying the pyramid-based method [4] to obtain the resultant background irradiance map. Lee et al. [25] improved the model by introducing low-rank matrix completion (LRMC). However, these methods also suffer from the problems of the selection-based methods. To handle this problem, Oh et al. [26] introduced the rank-1 constraint into the LRMC and replaced the nuclear norm with the partial sum of singular values. The estimated low-rank matrix is the background irradiance map. Lee and Lam [27] employed truncated nuclear norm minimization to accelerate the algorithm. However, the performance of these methods relies heavily on the selection of the missing regions. In [26, 27], part of the missing regions must be specified by the user, which is hard to do when the scene is complex.

To address the limitations of the LRMC-based HDR imaging methods, we present a novel HDR imaging method based on the bidirectional structural similarities and the weighted LRMC model. First, we propose the bidirectional structural similarities to segment an image into four groups: motion regions, saturated regions in the source image, saturated regions in the reference image, and static and unsaturated regions. Similarity measurements that are insensitive to luminance variation, such as local entropy [28], zero-mean normalized cross-correlation [29, 30], interconsistency and intraconsistency [31], and the direction of the signal structure component [32], have been employed to detect motion regions. These methods perform well in many cases but are prone to mistaking the well-exposed regions of the source image that correspond to the saturated regions in the reference image for object motion. Considering that the images need to be transformed to the same luminance level prior to the similarity check, we observe that structural variation is bidirectional in the motion regions and unidirectional in the saturated regions. To facilitate the later discussion, the projection from the reference image to the source image is termed the forward-projection (FP) and the reverse projection is termed the backward-projection (BP). The structure in a motion region changes in both FP and BP; the structure in a saturated region of the source image changes in BP and remains unchanged in FP; the structure in a saturated region of the reference image remains unchanged in BP and changes in FP. Therefore, we propose bidirectional structural similarities, including FP structural similarity (FPSS) and BP structural similarity (BPSS), to detect the motion regions and the saturated regions more accurately. Then, we construct the motion maps and the weight maps based on FPSS and BPSS and introduce them into the weighted LRMC-based method. The proposed method requires no user specification of the missing regions and is robust to the choice of reference image.

The rest of the paper is organized as follows. The proposed method based on the bidirectional structural similarities and the weighted LRMC-based method is described in Section 2. Section 3 discusses the experiments and results, followed by conclusions in Section 4.

2. HDR Imaging Based on Bidirectional Structural Similarities and Weighted LRMC

In this section, a novel HDR imaging method based on bidirectional structural similarities and weighted LRMC is described. Figure 1 illustrates an overview of the proposed method. Given n LDR images {I1, I2, …, In} captured with different exposures, each containing m pixels, one image Ir (1 ≤ r ≤ n) is chosen as the reference image and the others are the source images. We assume that the n images are globally aligned by applying a global registration method [12]. First, for each source image, we measure the bidirectional structural similarities, FPSS and BPSS. Then, all pixels are classified into four groups: motion regions, saturated regions in the source image, saturated regions in the reference image, and static and unsaturated regions. Because noise could lead to incorrect region detection, we introduce FPSS and BPSS into graph cuts to generate the final motion maps and weight maps, which are integrated into the weighted LRMC model. The low-rank matrix of the weighted LRMC model corresponds to the background irradiance.

2.1. Bidirectional Structural Similarities

In previous methods, only FP is usually applied before measuring the similarity between two images with different exposures. As a result, an unsaturated region in the source image that corresponds to a saturated region in the reference image is mistaken for a motion region. An unsaturated region has richer details than a saturated region. When unsaturated intensities are projected to saturated intensities, the compression of intensity differences removes detail, so the projected region resembles the saturated region. By contrast, when saturated intensities are projected to unsaturated intensities, the compressed intensity differences cannot be recovered, so the structural similarity is low. Therefore, pixels in the saturated regions of the reference image have small FPSS and large BPSS, while pixels in the saturated regions of the source image have small BPSS and large FPSS. Pixels in the motion regions have both small FPSS and small BPSS.

Figure 2 illustrates the structural similarities between two images in FP and BP, where Figure 2(a) is the reference image and Figure 2(b) is the source image. Figure 2(c) is the backward-projected image of (b), and Figure 2(d) is the forward-projected image of (a). Figures 2(e) and 2(f) are the BPSS and FPSS between (a) and (b). The three regions marked with red boxes represent the saturated region in the source image, the saturated region in the reference image, and the motion region, respectively. From Figures 2(e) and 2(f), we can see that region 1 has small BPSS and large FPSS, region 2 has large BPSS and small FPSS, and region 3 has both small BPSS and small FPSS. Based on this observation, each image can be segmented into four groups: motion regions, saturated regions in the source image, saturated regions in the reference image, and static and unsaturated regions.

As stated in [32], patch-based structural similarity is expected to best represent the structural similarity. Unlike the previous method, all color channels are considered jointly. We use the color channel that has the largest structural change to determine the structural similarity. Therefore, the desired structural similarity between two images Ii and Ij with different exposures is determined by the smallest structural similarity over all color channels, as given in equation (1), where R, G, and B represent the red, green, and blue channels of a color image, respectively, vark,p(∙) is the variance in the window of size 9 × 9 around pixel p of channel k, covk,p(∙, ∙) is the corresponding covariance, and c1 is a small constant that prevents the numerator and denominator from being zero and is set to 0.03.
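The windowed computation described above can be sketched as follows. This is only an illustration under stated assumptions: the SSIM-like ratio (2·cov + c1)/(var_i + var_j + c1) is an assumed form, since the paper's equation is not reproduced in the text, and the helper names `box_mean` and `structural_similarity` are hypothetical. What is taken from the text is the 9 × 9 window, c1 = 0.03, and the minimum over the R, G, and B channels.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def box_mean(x, size=9):
    """Mean of x over a size x size window centred on each pixel (edge-padded)."""
    pad = size // 2
    padded = np.pad(x, pad, mode="edge")
    return sliding_window_view(padded, (size, size)).mean(axis=(-2, -1))


def structural_similarity(img_i, img_j, size=9, c1=0.03):
    """Per-pixel structural similarity, minimised over colour channels.

    img_i, img_j: float arrays of shape (H, W, 3) with values in [0, 1].
    The ratio used here is an ASSUMED form, not the paper's equation.
    """
    per_channel = []
    for k in range(img_i.shape[2]):
        a, b = img_i[..., k], img_j[..., k]
        mu_a, mu_b = box_mean(a, size), box_mean(b, size)
        var_a = box_mean(a * a, size) - mu_a ** 2
        var_b = box_mean(b * b, size) - mu_b ** 2
        cov_ab = box_mean(a * b, size) - mu_a * mu_b
        per_channel.append((2.0 * cov_ab + c1) / (var_a + var_b + c1))
    # The channel with the largest structural change decides the score.
    return np.min(per_channel, axis=0)
```

Taking the minimum across channels means that a structural change visible in only one channel is still enough to lower the similarity at that pixel.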

For smooth regions, the intensity similarity of all color channels reflects the structural similarity. A straightforward form of this relationship is employed in equation (2), where μk,p(∙) represents the mean value in the window around pixel p of channel k, and c2 is a small constant that prevents the numerator and denominator from being zero and is set to 0.01.

For saturated regions, the intensity difference is compressed. When a saturated region and an unsaturated region have the same intensity difference, the similarity of the saturated region should be lower than that of the unsaturated region. Thus, we introduce the well-exposedness [4], which measures how far the intensity is from the saturated intensity, to represent similarity. We define the well-exposedness similarity in equation (3), where f(∙) is the well-exposedness measurement function defined in equation (4).
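As a concrete reference point, the well-exposedness measure of Mertens et al. [4] is a Gaussian centred on mid-gray. The sketch below assumes that form, with intensities normalized to [0, 1] and σ = 0.2 as in [4]. Whether the paper's f(∙) uses exactly these constants is not stated in the text, so treat the parameters and the function name `well_exposedness` as assumptions.

```python
import numpy as np


def well_exposedness(intensity, sigma=0.2):
    """Gaussian well-exposedness weight, following Mertens et al. [4].

    intensity: scalar or array with values normalised to [0, 1].
    Returns values close to 1 near mid-gray and close to 0 near saturation.
    The constants 0.5 and sigma=0.2 are assumed to carry over to this paper.
    """
    intensity = np.asarray(intensity, dtype=float)
    return np.exp(-((intensity - 0.5) ** 2) / (2.0 * sigma ** 2))
```

Under this form, a pixel at intensity 1.0 scores far lower than one at 0.8, which is the behaviour the paragraph relies on to penalize saturated pixels.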

For each source image Ii (1 ≤ i ≤ n, i ≠ r), we define FPSS and BPSS in equation (5), where FP(∙) and BP(∙) represent the FP and the BP based on the histogram projection algorithm, respectively.
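The text states only that FP and BP rely on a histogram projection algorithm. One common reading is histogram matching, in which the intensities of one image are remapped so that its cumulative histogram follows that of the other. The sketch below assumes that reading; the function name `histogram_project`, the bin count, and the per-channel usage are illustrative choices, not details from the paper.

```python
import numpy as np


def histogram_project(src, target, bins=256):
    """Remap src intensities so their histogram follows that of target.

    src, target: float arrays with values in [0, 1] (one colour channel).
    This is plain histogram matching, ASSUMED to stand in for the paper's
    histogram projection algorithm.
    """
    src = np.asarray(src, dtype=float)
    target = np.asarray(target, dtype=float)
    edges = np.linspace(0.0, 1.0, bins + 1)
    centers = 0.5 * (edges[:-1] + edges[1:])
    src_cdf = np.cumsum(np.histogram(src, bins=edges)[0]) / src.size
    tgt_cdf = np.cumsum(np.histogram(target, bins=edges)[0]) / target.size
    # For each source bin, pick the first target bin reaching the same CDF.
    tgt_bin = np.clip(np.searchsorted(tgt_cdf, src_cdf, side="left"), 0, bins - 1)
    mapping = centers[tgt_bin]
    src_bin = np.clip(np.digitize(src, edges) - 1, 0, bins - 1)
    return mapping[src_bin]
```

With this helper, FP(Ir) would correspond to `histogram_project(ref, src)` and BP(Ii) to `histogram_project(src, ref)`, applied to each channel. Because the mapping is monotone, intensities that were clipped to the same saturated value stay identical after projection, which is why the lost structure cannot reappear.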

2.2. Motion Map and Weight Map Construction

With FPSS and BPSS, the source image is divided into four groups. Let 0, 1, 2, and 3 denote the labels of motion regions, saturated regions in the source image, saturated regions in the reference image, and static and unsaturated regions, respectively. For pixel p, the probability of belonging to the l-th group is defined in equation (6), where Cl is the center of the l-th group. As stated in Section 2.1, the centers of motion regions, saturated regions in the source image, saturated regions in the reference image, and static and unsaturated regions are [0,0], [1,0], [0,1], and [1,1], respectively. Then, the initial segmentation is determined by assigning each pixel to the group with the highest probability.
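The classification step can be sketched as a nearest-center assignment in the (FPSS, BPSS) plane. The four centers and their labels come from the text. The probability model is not reproduced there, so the softmax over negative squared distances and the `temperature` parameter below are assumptions chosen only to give a well-defined probability.

```python
import numpy as np

# Group centres in the (FPSS, BPSS) plane, indexed by label.
CENTERS = np.array([
    [0.0, 0.0],  # 0: motion region
    [1.0, 0.0],  # 1: saturated region in the source image
    [0.0, 1.0],  # 2: saturated region in the reference image
    [1.0, 1.0],  # 3: static and unsaturated region
])


def group_probabilities(fpss, bpss, temperature=0.1):
    """Per-pixel probabilities of the four groups, shape (H, W, 4).

    ASSUMED model: softmax of negative squared distance to each centre.
    """
    feats = np.stack([fpss, bpss], axis=-1)                      # (H, W, 2)
    dist2 = ((feats[..., None, :] - CENTERS) ** 2).sum(axis=-1)  # (H, W, 4)
    logits = -dist2 / temperature
    logits -= logits.max(axis=-1, keepdims=True)  # numerical stability
    probs = np.exp(logits)
    return probs / probs.sum(axis=-1, keepdims=True)


def initial_segmentation(fpss, bpss):
    """Label each pixel with its most probable group."""
    return group_probabilities(fpss, bpss).argmax(axis=-1)
```

For any probability that decreases monotonically with distance to the center, the argmax reduces to picking the nearest center, so the initial labels do not depend on the assumed temperature.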

Because noise could make the segmentation unreliable, we employ the graph cuts algorithm [33, 34], with the energy function defined in equation (7).

The first term of the energy function is the data term, and the second term is the smoothness term, in which σ is the variance of the whole image.
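To make the two-term structure concrete, the sketch below evaluates one possible energy for a given labeling. The exact data and smoothness expressions are not reproduced in the text, so the choices here are assumptions: a negative log-probability data term and a contrast-weighted Potts smoothness term on a 4-connected grid, with σ taken as the variance of the whole image as the text states. The function name `segmentation_energy` and the balance parameter `lam` are hypothetical. The code only scores a labeling; a graph cuts solver [33, 34] would search for the labeling that minimizes it.

```python
import numpy as np


def segmentation_energy(labels, probs, image, lam=1.0, eps=1e-12):
    """Energy of a labelling under an ASSUMED data + smoothness model.

    labels: (H, W) integer labels in {0, 1, 2, 3}.
    probs:  (H, W, 4) per-pixel group probabilities.
    image:  (H, W) grayscale image used for the contrast weights.
    """
    h, w = labels.shape
    rows, cols = np.indices((h, w))
    # Data term: unlikely labels are expensive.
    data = -np.log(probs[rows, cols, labels] + eps).sum()

    sigma = image.var() + eps  # variance of the whole image
    smooth = 0.0
    for axis in (0, 1):
        differs = np.diff(labels, axis=axis) != 0
        contrast = np.exp(-np.diff(image, axis=axis) ** 2 / sigma)
        # Label changes are cheap across strong edges, costly in flat areas.
        smooth += (differs * contrast).sum()
    return data + lam * smooth
```

Under this model, an isolated mislabeled pixel inside a flat region pays for four label discontinuities, which is how the smoothness term suppresses noise in the initial segmentation.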

We use the segmentation results of equation (7) to define the motion maps and the weight maps. First, pixels labeled 0 in the source images are included in the motion regions. Saturated regions may be connected with motion regions; thus, we also treat the regions labeled 1 or 2 that are connected with motion regions as motion regions. The remaining regions are regarded as static regions, where the weight maps are based on FPSS and BPSS. For each source image, the weight value for each pixel in the static regions labeled 2 and 3 is set to 1, while the weight value for each pixel in the static regions labeled 1 is proportional to BPSSp(Ir, BP(Ii)). For the reference image, the weight values for all pixels are set to 1, except that the weight values for the saturated region are proportional to the minimum of FPSSp(FP(Ir), Ii) over all source images. We define the weight map for each pixel in equation (8).
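The rules above for a source image can be sketched as a flood fill followed by a weight assignment. Several details are assumptions made for illustration: 4-connectivity, a proportionality constant of 1 for the BPSS-based weights, and a weight of 0 to mark motion pixels, which the method excludes from the effective region. The function names are hypothetical.

```python
from collections import deque

import numpy as np


def build_motion_map(labels):
    """Motion map: label-0 pixels plus label-1/2 regions connected to them.

    labels: (H, W) integer labels in {0, 1, 2, 3}. Uses 4-connectivity (assumed).
    """
    h, w = labels.shape
    motion = labels == 0
    saturated = (labels == 1) | (labels == 2)
    queue = deque(zip(*np.nonzero(motion)))
    while queue:
        r, c = queue.popleft()
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            rr, cc = r + dr, c + dc
            inside = 0 <= rr < h and 0 <= cc < w
            if inside and saturated[rr, cc] and not motion[rr, cc]:
                motion[rr, cc] = True  # grow the motion region
                queue.append((rr, cc))
    return motion


def source_weight_map(labels, motion, bpss):
    """Weights for a source image: 1 for static labels 2 and 3, BPSS for
    static label 1, and 0 for motion pixels (excluded; marker is assumed)."""
    weights = np.ones(labels.shape, dtype=float)
    static_saturated_src = (labels == 1) & ~motion
    weights[static_saturated_src] = bpss[static_saturated_src]
    weights[motion] = 0.0
    return weights
```

The flood fill starts from every label-0 pixel and spreads only through saturated pixels, so a saturated region that never touches a motion region keeps its static status and its similarity-based weight.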

2.3. Weighted LRMC-Based HDR Imaging

Let I = [vec(I1), vec(I2), …, vec(In)] ∈ R^(m×n) be a matrix, where vec(∙) reshapes a matrix into a vector. For each image, the corresponding irradiance image Di (i = 1, 2, …, n) is estimated using the camera response function. The irradiance matrix D is represented as the sum of a background matrix L of rank 1 and a sparse error matrix E. The effective region is the static region constructed by the method discussed in Section 2.2. In the effective region, the information in the unsaturated regions is more reliable than that in the saturated regions. Therefore, we weight the sparse error with small weights in the saturated regions and large weights in the unsaturated regions. We propose the weighted LRMC-based HDR imaging model in equation (9), where ‖L‖* is the nuclear norm of L, ‖E‖0 is the l0-norm of the matrix E, the dot product is taken element-wise, and P(∙) is defined in equation (10).

Inspired by Oh et al. [26], the partial sum of the singular values of matrix L is used to replace the nuclear norm, and the l0-norm is replaced by the l1-norm. Then, equation (9) is rewritten as equation (11).

The optimization problem in equation (11) can be solved by the augmented Lagrange multiplier method with alternating direction minimization. Finally, the resulting HDR image is the average of the columns of the recovered low-rank matrix L.
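A minimal sketch of such a solver is given below, assuming the objective is the partial sum of singular values of L (leaving the largest one unpenalized, for a rank-1 target) plus λ times the weighted l1-norm of E, subject to D = L + E on the effective region. The update order, the default λ, the initial μ, and the growth factor ρ follow common inexact augmented Lagrange multiplier practice and are assumptions, as are all function and parameter names. It is not the authors' implementation.

```python
import numpy as np


def soft_threshold(x, tau):
    """Element-wise shrinkage; tau may be a scalar or an array."""
    return np.sign(x) * np.maximum(np.abs(x) - tau, 0.0)


def partial_svt(y, tau, rank=1):
    """Keep the largest `rank` singular values, soft-threshold the rest."""
    u, s, vt = np.linalg.svd(y, full_matrices=False)
    s[rank:] = np.maximum(s[rank:] - tau, 0.0)
    return (u * s) @ vt


def weighted_lrmc(d, weights, mask, lam=None, rho=1.5, tol=1e-7, max_iter=200):
    """Split d (m x n irradiance matrix) into low-rank L and sparse E.

    weights: (m, n) per-entry weights on the sparse error.
    mask:    (m, n) boolean, True inside the effective (static) region.
    """
    m, n = d.shape
    lam = 1.0 / np.sqrt(max(m, n)) if lam is None else lam
    mu = 1.25 / np.linalg.norm(d, 2)
    low_rank = np.zeros_like(d)
    sparse = np.zeros_like(d)
    dual = np.zeros_like(d)
    for _ in range(max_iter):
        low_rank = partial_svt(d - sparse + dual / mu, 1.0 / mu)
        resid = d - low_rank + dual / mu
        # Inside the effective region the error is shrunk by its weight;
        # outside it the error is free and absorbs the whole residual.
        sparse = np.where(mask, soft_threshold(resid, lam * weights / mu), resid)
        gap = d - low_rank - sparse
        dual = dual + mu * gap
        mu *= rho
        if np.linalg.norm(gap) <= tol * np.linalg.norm(d):
            break
    return low_rank, sparse
```

The HDR irradiance estimate is then `low_rank.mean(axis=1)`, reshaped back to the image size. A larger weight raises the shrinkage threshold for that entry, so the error there is pushed toward zero and L is forced to follow the data more closely in reliable regions.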

3. Experimental Results

In this section, the performance of the proposed method is evaluated both subjectively and objectively by comparing it with Oh et al. [26], Hu et al. [18], Ma et al. [32], and Photoshop CS6 on challenging image sets with complex scenes (downloaded from http://user.ceng.metu.edu.tr/~akyuz/files/eg2015/), where each image set contains five LDR images and the third image is chosen as the reference image for all methods. Oh et al. [26] is the state-of-the-art LRMC-based HDR imaging method. Hu et al. [18] is the most competitive state-of-the-art correction-based HDR imaging method. Ma et al. [32] is the state-of-the-art exposure fusion method. Photoshop CS6 is commercial software.

To subjectively evaluate the experimental results, Reinhard’s tone-mapping method [35] is employed to display the HDR images generated by the proposed method and Oh et al. [26]. The results generated by Hu et al. [18] are a set of latent images, which are merged by the method of Mertens et al. [4]. Ma et al. [32] and Photoshop CS6 directly generate displayable images. All experiments are carried out in Matlab R2016b (64-bit) on Windows 7.

Figure 3 shows the results generated by the proposed method, Oh et al. [26], Hu et al. [18], Ma et al. [32], and Photoshop CS6. Hu et al. [18] and Ma et al. [32] perform well in ghost removal. However, the details of the man inside the cafe are missing, as shown in Figures 3(h) and 3(i). Oh et al. [26] and Photoshop CS6 preserve the details in the dark and bright regions but have problems in removing ghosts, as shown in Figures 3(g) and 3(j). This is because Oh et al. [26] cannot handle the large overlapped region of the man and the sitting women. By contrast, the proposed method successfully removes the ghosts and preserves the details in both dark and bright regions.

Figure 4 gives another comparison of the results generated by the proposed method, Oh et al. [26], Hu et al. [18], Ma et al. [32], and Photoshop CS6. This scene is very complex, with large and irregular overlapped regions across images. Thus, the missing region is difficult for the user to specify. Ghosts are very obvious in the results of Oh et al. [26] and Photoshop CS6. Hu et al. [18] and Ma et al. [32] generate results that are free of ghosts but look unnatural. For example, Figure 4(c) shows that the color around the light is distorted, and Figure 4(d) shows that halos appear around the edges. The proposed method provides the best performance in ghost removal and detail preservation and introduces no additional artifacts. Similar results can also be seen in Figure 5.

Because no ground-truth HDR image is available, we applied a blind image quality assessment index, the HDR image gradient-based evaluator (HIGRADE) [36], to objectively evaluate the performance. For the HIGRADE index, a higher value represents a higher visual quality. Table 1 shows the HIGRADE scores of the proposed method and three state-of-the-art methods. In most cases, the proposed method achieves the highest HIGRADE score, which indicates that the proposed method can achieve a natural appearance and preserve rich details.

Figure 6 shows the results of the proposed method based on different reference images. The performance of the proposed method relies on the motion map and the weight map. As shown in Figures 6(f)–6(i), ghosts are removed successfully and details of dark and bright regions are preserved. Detail loss on the road appears in Figure 6(j), because the saturated regions in the reference image (Figure 6(e)) are too large.

4. Conclusions and Discussion

In this paper, a novel HDR imaging method based on the bidirectional structural similarities and the weighted LRMC is proposed. We observe that structural variation is bidirectional in the motion regions and unidirectional in the saturated regions. Therefore, we propose bidirectional structural similarities, including FPSS and BPSS, to segment an image into four groups: motion regions, saturated regions in the source image, saturated regions in the reference image, and static and unsaturated regions. Then, the graph cuts algorithm is employed to suppress the effect of noise on the segmentation. Finally, the motion maps and the weight maps based on FPSS and BPSS are introduced into the weighted LRMC-based method. Unlike the previous LRMC-based methods, the proposed method requires no user specification of the missing regions.

Experiments are conducted on several challenging image sets with complex scenes, and the proposed method is compared with three current state-of-the-art algorithms and Photoshop CS6. The results show that the proposed method can preserve more details in the dark and bright regions and simultaneously remove ghosts. In particular, the proposed method is robust to the chosen reference image.

Data Availability

All data used to support the findings of this study were downloaded from http://user.ceng.metu.edu.tr/~akyuz/files/eg2015/, which is cited within the article (on page 5).

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

This work has been supported in part by the National Natural Science Foundation of China (Grant nos. 61562047, 61562048, 61462048, and 61862044), Science and Technology Project of the Education Department of Jiangxi Province (no. 151084), and Science Foundation of Jiujiang University (nos. 2014KJYB029 and 2015LGY831).