Abstract

Fast image search with efficient additive kernels and kernelized locality-sensitive hashing has been proposed. To accommodate kernel functions, recent work has explored ways of constructing locality-sensitive hashing schemes with sublinear query time; however, existing locality-sensitive hashing (LSH) methods sacrifice accuracy of the search results in order to allow fast queries. To improve search accuracy, we show how to apply explicit feature maps of homogeneous kernels, which perform the feature transformation, and combine them with kernelized locality-sensitive hashing. We evaluate our method on several large datasets and show that it improves accuracy relative to commonly used methods, making object classification and content-based retrieval faster and more accurate.

1. Introduction

In the Web 2.0 era, we are experiencing rapid growth of information and are confronted with large amounts of user-generated content from the Internet. Since anyone can publish and upload information, handling the data produced by these users has become urgent, and organizing and making sense of this visual data has attracted considerable attention. Fast search and indexing for large image or video databases is therefore important and urgent for multimedia information retrieval, especially now that big data has emerged in domains such as travel photos from the web, social network images, and other image archives.

With the growth of visual data, we focus on two important problems: nearest neighbor search and similarity metric learning. For metric learning, researchers have proposed algorithms such as information-theoretic metric learning [1]. For nearest neighbor search, the most common task is to locate the most similar image in an image database. Given a similarity measure between a query and the database items, the naive method is to compare the query against every item in the database and then sort the results. However, the time complexity of this approach is too large to be practical. For image or video data in particular, the cost becomes prohibitive, because computing distances between items in a high-dimensional space is expensive and visual data are sparse, so an exhaustive comparison cannot be completed within a reasonable time.

Linear scanning can in principle solve this problem; although it is a common approach, it is not suitable for large-scale datasets, which has motivated the development of approximate nearest neighbor (ANN) methods. LSH is widely used in ANN algorithms to obtain fast query responses for high-dimensional input vectors [15], at the cost of some accuracy. To ensure a high probability of collision for similar objects, randomized hash functions must be computed; this is the principle behind many notable locality-sensitive hashing algorithms [6, 7].

Although LSH has played an important role in object similarity search, some issues that are very common in image retrieval, recognition, and search tasks have been neglected. (A) Traditional LSH approaches only guarantee a relatively high collision probability for items that are nearby in the sample feature space. Many vision datasets contain much richer information: category tags are attached to YouTube and Flickr data, and class labels are attached to Caltech-101 images. However, there is a large gap between the low-level features and the high-level semantic information of visual samples. To address this, we intend to exploit such additional side information when constructing the hash tables. (B) To handle nonlinear data that are not linearly separable, kernel methods are commonly used in vision tasks. For instance, objects are often modeled as bags of features (BOF), and the kernel trick is an important tool for classifying such data by mapping it from a low-dimensional space to a high-dimensional one. However, constructing hash tables in kernel spaces is a difficult problem.

To verify our idea, we conducted several experiments on object search tasks. For example, we report results on the Caltech-101 [8] dataset and demonstrate that our proposed algorithm is superior to existing hashing methods.

In order to test our algorithm's performance, we design experiments on visual tasks such as Caltech-101 [8] and demonstrate that it outperforms traditional LSH approaches on this dataset, since hash functions can be computed over many different kernels. Our scheme supports ANN search with arbitrary kernels; as a result, a wide range of similarity hash functions becomes accessible for content-based vision search tasks.

2. Homogeneous Kernel

In this paper, we mainly focus on a family of related kernels: the intersection, Jensen-Shannon, Hellinger's, and χ² kernels, which are frequently used as learning kernels in machine learning and vision search. These kernels share two properties: they are homogeneous and additive. In this section, the idea of a kernel signature is connected to these kernels, so that each kernel can be represented by a scalar signature function. These properties will be used in Section 3 to obtain kernel feature maps, through which we can derive approximate expressions for the kernels.

Homogeneous Kernels. A kernel $k(x, y)$ is $\gamma$-homogeneous if

k(cx, cy) = c^{\gamma}\, k(x, y), \quad \forall c \ge 0.    (1)

When $\gamma = 1$, we say that $k$ is homogeneous. Let $c = 1/\sqrt{xy}$; we can then rewrite a $\gamma$-homogeneous kernel as

k(x, y) = c^{-\gamma}\, k(cx, cy) = (xy)^{\gamma/2}\, \mathcal{K}(\log y - \log x).    (2)

Here the scalar function

\mathcal{K}(\lambda) = k\bigl(e^{\lambda/2}, e^{-\lambda/2}\bigr)    (3)

is called the kernel signature.

Stationary Kernels. A kernel $k(x, y)$ is called stationary (shift invariant) if

k(x + c, y + c) = k(x, y), \quad \forall c.    (4)

Let $c = -(x + y)/2$; then $k$ can be represented as

k(x, y) = \mathcal{K}(y - x),    (5)

where

\mathcal{K}(\lambda) = k\bigl(-\lambda/2, \lambda/2\bigr).    (6)

Here we call formula (6) the kernel signature.

In machine learning and computer vision, the most commonly used homogeneous kernels are the Jensen-Shannon, intersection, χ², and Hellinger's kernels; all of them are also additive. Below we review these kernels; their kernel feature maps are derived in Section 3. Table 1 shows the details [9].

χ² Kernel. We define $k_{\chi^2}(x, y) = 2xy/(x + y)$ as the χ² kernel [10, 11]. The associated χ² distance is then defined as $D_{\chi^2}^2(x, y) = \sum_i (x_i - y_i)^2/(x_i + y_i)$.

Jensen-Shannon (JS) Kernel. We define $k_{JS}(x, y) = \frac{x}{2}\log_2\frac{x+y}{x} + \frac{y}{2}\log_2\frac{x+y}{y}$ as the JS kernel. The JS distance can be obtained as $D_{JS}^2(x, y) = KL\bigl(x \,\|\, \frac{x+y}{2}\bigr) + KL\bigl(y \,\|\, \frac{x+y}{2}\bigr)$, where we use the Kullback-Leibler divergence computed by $KL(x \,\|\, y) = \sum_i x_i \log_2 (x_i / y_i)$.

Intersection Kernel. We define $k_{\cap}(x, y) = \min\{x, y\}$ as the intersection kernel [12]. The associated distance metric is the $\ell^1$ distance between $x$ and $y$.

Hellinger's Kernel. We define $k_{H}(x, y) = \sqrt{xy}$ as Hellinger's kernel, with Hellinger's distance between $x$ and $y$ as the associated metric. The signature of this kernel is $\mathcal{K}(\lambda) = 1$, which is constant.
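
As a short worked illustration (not part of the original derivation in [9], but following directly from formula (3)), the signatures of Hellinger's and the χ² kernels can be computed explicitly:

\mathcal{K}_{H}(\lambda) = \sqrt{e^{\lambda/2}\, e^{-\lambda/2}} = 1 \quad \text{(constant signature)},

\mathcal{K}_{\chi^2}(\lambda) = \frac{2\, e^{\lambda/2}\, e^{-\lambda/2}}{e^{\lambda/2} + e^{-\lambda/2}} = \frac{2}{2\cosh(\lambda/2)} = \operatorname{sech}\!\left(\frac{\lambda}{2}\right).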

Homogeneous Parameters. In previous work, the homogeneous kernels are generalized by the homogeneity parameter $\gamma$; when $\gamma = 1$, each kernel reduces to the standard homogeneous form given above. In this paper, we derive the $\gamma$-homogeneous kernel from its signature through formula (2).

3. Homogeneous Kernel Map

When handling low-dimensional data that are not linearly separable, we construct a kernel feature map $\Psi$ so that the input data in the low-dimensional space are mapped into a higher-dimensional (Hilbert) space $\mathcal{H}$ with

k(x, y) = \langle \Psi(x), \Psi(y) \rangle_{\mathcal{H}}.    (7)

In order to compute approximate feature maps for the homogeneous kernels, we use Bochner's theorem together with the $\gamma$-homogeneous decomposition in formula (2). Note that if a homogeneous kernel is positive definite [13], its signature is a positive definite function as well; the analogous statement holds for stationary kernels. Bochner's theorem then states that the signature can be written as

\mathcal{K}(\lambda) = \int_{-\infty}^{+\infty} \kappa(\omega)\, e^{-i\omega\lambda}\, d\omega,    (8)

where $\kappa(\omega) \ge 0$ is the spectrum of the kernel. Combining formula (2) with Bochner's theorem, we can derive the spectrum $\kappa$ and a closed-form feature map.
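
To make this reasoning step explicit, the following short derivation (a sketch following [9], for the case $\gamma = 1$; the symbols $\kappa$ and $\Psi_\omega$ are as introduced above) shows how the closed-form feature map follows from formulae (2) and (8):

k(x, y) = \sqrt{xy}\, \mathcal{K}(\log y - \log x)
        = \sqrt{xy} \int_{-\infty}^{+\infty} \kappa(\omega)\, e^{-i\omega(\log y - \log x)}\, d\omega
        = \int_{-\infty}^{+\infty} \Bigl( e^{-i\omega \log x} \sqrt{x\,\kappa(\omega)} \Bigr)^{*} \Bigl( e^{-i\omega \log y} \sqrt{y\,\kappa(\omega)} \Bigr) d\omega,

so the (complex) closed-form feature map is $\Psi_\omega(x) = e^{-i\omega \log x} \sqrt{x\,\kappa(\omega)}$.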

The spectrum $\kappa(\omega)$ and the closed-form feature map can be computed for most machine learning kernels [9]; Table 1 illustrates the results. In practice, the continuous feature map is approximated by sampling the spectrum at a finite number of frequencies, which yields a low-dimensional explicit feature map.
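
As a minimal illustrative sketch (our own code, not the authors' implementation; the function name and the sampling parameters n and L are our choices, following the sampled approximation described in [9]), the approximate explicit feature map of the χ² kernel, whose spectrum is $\kappa(\omega) = \operatorname{sech}(\pi\omega)$, could be computed as follows:

import numpy as np

def chi2_feature_map(x, n=2, L=0.65):
    """Approximate explicit feature map for the chi-squared kernel.

    Maps each nonnegative component of x to 2n+1 dimensions by sampling
    the spectrum kappa(omega) = sech(pi*omega) at steps of L, so that
    np.dot(Psi(x), Psi(y)) approximates sum_i 2*x_i*y_i / (x_i + y_i).
    """
    x = np.asarray(x, dtype=float)
    eps = 1e-12                                  # guard against log(0)
    logx = np.log(x + eps)
    kappa = lambda w: 1.0 / np.cosh(np.pi * w)   # spectrum of the chi2 kernel
    feats = [np.sqrt(L * x * kappa(0.0))]        # omega = 0 component
    for j in range(1, n + 1):
        scale = np.sqrt(2.0 * L * x * kappa(j * L))
        feats.append(scale * np.cos(j * L * logx))
        feats.append(scale * np.sin(j * L * logx))
    return np.concatenate(feats)

# Example: the inner product of mapped features approximates the chi2 kernel.
x = np.array([0.2, 0.5, 0.3])
y = np.array([0.3, 0.3, 0.4])
approx = np.dot(chi2_feature_map(x), chi2_feature_map(y))
exact = np.sum(2 * x * y / (x + y))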

4. Kernelized Locality-Sensitive Hashing

To create and conduct the data association, we adopt kernelized LSH (KLSH) [14], which is also a hash table-based algorithm. KLSH builds on the LSH algorithm and is more efficient and accurate for query search and matching. Given an input query, KLSH can quickly locate the likely nearest neighbor items in the hash table and match against them. In addition, KLSH has another useful property: while traditional LSH methods can only realize a limited family of hash functions, KLSH can construct hash functions for arbitrary kernels. Moreover, KLSH has been applied to vision search tasks on large-scale datasets such as Tiny Images [14].

As in LSH, constructing the hash functions is the key problem for KLSH. To compute the collision probabilities between an input query and the database points, we must be able to compute their similarity, as proposed by [15].

KLSH Principle. Any locality-sensitive hashing algorithm is based on the distribution of a family of hash functions. We therefore require the collision probability of a pair of points, for example $x_i$ and $x_j$, to satisfy

\Pr\bigl[h(x_i) = h(x_j)\bigr] = \operatorname{sim}(x_i, x_j).

We can view this as computing the similarity between the objects $x_i$ and $x_j$. Here $\operatorname{sim}(\cdot, \cdot)$ is the function measuring similarity, while the hash functions $h$ are selected at random from the hash function family $\mathcal{H}$. The intuition behind this is that $x_i$ and $x_j$ should fall into the same hash bucket with a probability proportional to their similarity, so objects that are highly similar are more likely to collide in the hash table [1].
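
For intuition, a minimal sketch of the standard random-hyperplane hashing family of [15, 16] (our own illustrative code, for explicit input vectors only; the kernelized construction is derived next) looks as follows:

import numpy as np

def make_hash_bucket(X, b, rng=None):
    """Hash each row of X into a b-bit code using random hyperplanes.

    Each bit is h_r(x) = 1 if r.x >= 0 else 0, with r drawn from a
    zero-mean Gaussian of the same dimensionality as x. Similar vectors
    (small angle between them) agree on each bit with high probability.
    """
    rng = np.random.default_rng(rng)
    R = rng.standard_normal((b, X.shape[1]))   # b random hyperplane normals
    return (X @ R.T >= 0).astype(np.uint8)     # shape: (num_points, b)

# Two nearby vectors tend to receive identical b-bit codes.
codes = make_hash_bucket(np.array([[1.0, 0.2, 0.1],
                                   [0.9, 0.25, 0.12]]), b=8, rng=0)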

We can express the similarity function via the vector inner product:

\operatorname{sim}(x_i, x_j) = x_i^{\top} x_j.

In [15, 16], the LSH function for this similarity is defined as

h_r(x) = \begin{cases} 1, & r^{\top} x \ge 0, \\ 0, & \text{otherwise}, \end{cases}

where $r$ is a random hyperplane vector drawn from a zero-mean multivariate Gaussian distribution with the same dimensionality as the input vector $x$. Each hash function is thus matched to the statistical characteristics of the input vectors; this locality-sensitive property is analyzed in detail in [17]. When we project a point $x$ onto $r$, the sign we obtain is one hash bit; repeating this $b$ times with independently drawn vectors $r_1, \dots, r_b$ produces a group of hashes, which we call a hash key:

g(x) = \bigl[h_{r_1}(x), h_{r_2}(x), \dots, h_{r_b}(x)\bigr].

Applying $g$ to every database point fills the hash table, and all points sharing the same key $g(x)$ fall into the same bucket. Because we want to compute the similarity measure in a high-dimensional kernel space, the similarity function can also be extended and written as

\operatorname{sim}(x_i, x_j) = \kappa(x_i, x_j) = \phi(x_i)^{\top} \phi(x_j).

Here the kernel function $\kappa$ implicitly performs the mapping $\phi$ for the points $x_i$ and $x_j$, and the hash function becomes a projection in the $\phi$ space, $h_r(\phi(x)) = \operatorname{sign}(r^{\top}\phi(x))$, with $r$ now living in the kernel-induced feature space. The problem is that nothing is known about the data distribution in this space, so $r$ cannot be generated directly from $\phi$. Therefore, to construct the hash function, $r$ must be built so that $h$ can be computed using kernel evaluations only. As in the ordinary case, we approximate a random Gaussian direction from the data themselves: we select a subset of the database to construct $r$. By the central limit theorem, if we choose $t$ items at random from a set of $p$ sampled database points to form the subset $S$, the sample average of their feature-space images approaches a Gaussian distribution with mean $\mu$ and covariance $\Sigma$. This average can be written as

z_t = \frac{1}{t} \sum_{i \in S} \phi(x_i).

As $t$ grows, the central limit theorem tells us that the vector $z_t$ approaches a Gaussian distribution.

We use a whitening transform to obtain the random direction

r = \Sigma^{-1/2} (z_t - \mu),

which yields the LSH function

h\bigl(\phi(x)\bigr) = \operatorname{sign}\bigl(\phi(x)^{\top} \Sigma^{-1/2} (z_t - \mu)\bigr).

As analyzed above, the database items are represented only through the kernel function, so the statistics $\mu$ and $\Sigma$ are not directly available. To estimate them, we sample $p$ data points from the database and use KPCA and eigendecomposition as in [18]; assuming the sampled data are zero centered in feature space, we let $\mu = 0$ and $\Sigma = \sum_{i=1}^{p} \phi(x_i)\phi(x_i)^{\top}$, and therefore we obtain the hash function

h\bigl(\phi(x)\bigr) = \operatorname{sign}\bigl(\phi(x)^{\top} \Sigma^{-1/2} z_t\bigr).

From the above we can see how to construct the hash function from kernel evaluations alone. Let $\Phi = [\phi(x_1), \dots, \phi(x_p)]$ and let the kernel matrix over the sampled points be $K = \Phi^{\top}\Phi$, with entries $K_{ij} = \kappa(x_i, x_j)$. Here $K$ and $\Sigma = \Phi\Phi^{\top}$ have the same nonzero eigenvalues, so $K$ can be viewed as another form of the covariance. Following [18], we compute the projection

v_k^{\top} \phi(x) = \sum_{i=1}^{p} \frac{1}{\sqrt{\theta_k}}\, u_k(i)\, \kappa(x, x_i).

Here $u_k$ and $v_k$ are, respectively, the $k$th eigenvector of the kernel matrix and of the covariance matrix, and $\theta_k$ is the shared eigenvalue.

As mentioned before, we choose $t$ data points from the $p$ sampled items to form the subset $S$. Expanding $\Sigma^{-1/2} = \sum_{k} \theta_k^{-1/2}\, v_k v_k^{\top}$ over all the eigenvectors and applying it to $z_t$ yields

r = \Sigma^{-1/2} z_t = \sum_{k=1}^{p} \frac{1}{\sqrt{\theta_k}}\, v_k \bigl(v_k^{\top} z_t\bigr).

Substituting the KPCA expansion $v_k = \frac{1}{\sqrt{\theta_k}} \sum_{j=1}^{p} u_k(j)\, \phi(x_j)$ into this expression yields

r = \sum_{k=1}^{p} \frac{1}{\theta_k} \left(\sum_{j=1}^{p} u_k(j)\, \phi(x_j)\right) \bigl(v_k^{\top} z_t\bigr).

Collecting the coefficient of each $\phi(x_j)$ simplifies this to

r = \sum_{j=1}^{p} w(j)\, \phi(x_j),

so the random direction $r$ is a weighted sum of the sampled database points in feature space, with weights $w(j)$ that can be computed from the kernel matrix alone.

Through the formula derived above we obtain an $r$ that approximately obeys a random Gaussian distribution. Substituting the definition of $z_t$ gives $v_k^{\top} z_t = \frac{\sqrt{\theta_k}}{t}\, u_k^{\top} e_S$; we neglect the constant factor $1/t$, since it does not affect the sign of the hash function, and the simplified weight vector finally becomes

w = K^{-1/2} e_S.

Here $e_S$ represents the indicator vector for the subset $S$, with $e_S(i) = 1$ if $i \in S$ and $0$ otherwise.

Therefore the hash function for a kernelized input will finally be

h\bigl(\phi(x)\bigr) = \operatorname{sign}\left(\sum_{i=1}^{p} w(i)\, \kappa(x_i, x)\right),

where $\kappa(x_i, x)$ is the kernel value between the sampled point $x_i$ and the query $x$ in the $\phi$ space. Repeating this construction several times, with a freshly drawn subset $S$ each time, the resulting hash functions form a hash bucket.

To obtain suitable parameters for this process, we run the query matching for several iterations. The detailed algorithm is shown in Algorithm 1.

(1) Process Data
(a) Obtain the kernel matrix $K$ over $p$ points sampled from the database.
(b) Obtain $e_S$ by randomly sampling a subset $S$ of $t$ indices from the $p$ points.
(c) Compute $K^{-1/2}$ via eigendecomposition and apply it to the subset indicator.
(d) Obtain $w = K^{-1/2} e_S$.
(e) Project onto the database points in kernel space: $h(\phi(x)) = \operatorname{sign}\bigl(\sum_{i=1}^{p} w(i)\, \kappa(x_i, x)\bigr)$, repeated for $b$ hash functions.
(f) Obtain the hash bucket of every database point and build the hash table.
(2) Query Processing
(a) Obtain the hash bucket of the query with the same hash functions and retrieve the matching bucket from the database.
(b) Use ANN search within the retrieved bucket for query matching.
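
The following Python sketch (our own illustrative code, not the authors' implementation; it follows the KLSH construction of [14] with kernel matrix K, subset size t, and b hash bits as above) shows how the hash weights and hash keys could be computed:

import numpy as np

def klsh_weights(K, t, b, rng=None):
    """Compute a (b x p) weight matrix; row j defines one kernelized hash bit.

    K: (p x p) kernel matrix over the sampled database points.
    Each row is w = K^{-1/2} e_S for a freshly drawn subset S of size t,
    so that h(x) = sign(sum_i w[i] * kappa(x_i, x)).
    """
    rng = np.random.default_rng(rng)
    p = K.shape[0]
    # Inverse square root of K via eigendecomposition (small ridge for stability).
    vals, vecs = np.linalg.eigh(K + 1e-8 * np.eye(p))
    K_inv_sqrt = vecs @ np.diag(1.0 / np.sqrt(np.maximum(vals, 1e-12))) @ vecs.T
    W = np.zeros((b, p))
    for j in range(b):
        e_S = np.zeros(p)
        e_S[rng.choice(p, size=t, replace=False)] = 1.0   # indicator of subset S
        W[j] = K_inv_sqrt @ e_S
    return W

def klsh_hash(W, k_x):
    """Hash a point into b bits, given k_x = [kappa(x_1, x), ..., kappa(x_p, x)]."""
    return (W @ k_x >= 0).astype(np.uint8)

At query time the same weight matrix is applied to the query's kernel vector against the sampled points, and the candidates sharing the query's hash key are reranked by exact kernel similarity.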

5. Experimental Results

In the experiments, we evaluate the proposed homogeneous kernel hashing algorithm and verify its efficiency on the dataset. In our scheme, combining the homogeneous kernel map with KLSH makes it possible to hash with feature embeddings that are otherwise only implicitly defined. We use these features for the visual search task of locating the most similar items in the database, and the retrieved neighbors vote on the tags of the query. The method proves to be more effective and accurate than linear scan search.

In this part, we evaluate our algorithm on the Caltech-101 dataset (Figure 1). Caltech-101 is a benchmark for image recognition and classification with 101 object categories and about 100 images per category, roughly 10,000 images in total. In recent years, researchers have done useful work on this dataset, such as proposing important image representation kernels [19]. Many published papers have focused on this dataset, some of which are very influential. For example, papers [20–22] each state their contribution on this dataset: the authors of [21] proposed the pyramid match kernel over image histograms, while Berg [20] created the CORR kernel, which matches local image features using geometric blur.

In our paper, we apply our algorithm to visual classification and similarity search tasks. Our benchmark platform is an Intel 4-core 3.6 GHz CPU with 16 GB of memory and a 2 TB hard disk.

We used the χ² kernel for the $\gamma$-homogeneous kernel maps and applied the nonlinear RBF kernel designed in [19, 23] on top of SIFT-based local features, learning the homogeneous kernel map over it. Compared with the unlearnt kernel, our learnt kernel is more accurate. We use a KNN classifier, for both KLSH and linear scan, to compute the classification accuracy. We also compare with CORR [24], and our result proves to be better; here we use 15 images per class for training.
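
As a minimal sketch of how such an evaluation could be wired up (our own illustrative code, not the authors' pipeline; the random arrays are placeholders for the Caltech-101 bag-of-features histograms, and the number of neighbors is an arbitrary choice), the homogeneous feature map can simply be composed with a KNN classifier; scikit-learn's AdditiveChi2Sampler implements the sampled χ² feature map of [9]:

import numpy as np
from sklearn.kernel_approximation import AdditiveChi2Sampler
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)
# Placeholder data standing in for nonnegative bag-of-features histograms
# of Caltech-101 images; labels are the 101 object categories.
X_train = rng.random((1515, 300)); y_train = rng.integers(0, 101, 1515)
X_test = rng.random((500, 300))

# Explicit (approximate) feature map for the additive chi2 kernel,
# followed by a nearest-neighbor classifier on the mapped features.
chi2_map = AdditiveChi2Sampler(sample_steps=2)
Phi_train = chi2_map.fit_transform(X_train)
Phi_test = chi2_map.transform(X_test)

knn = KNeighborsClassifier(n_neighbors=5).fit(Phi_train, y_train)
predictions = knn.predict(Phi_test)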

From Figure 2 we can see that the choice of the KLSH parameters is closely related to accuracy. As shown, the accuracy increases as one of the parameters grows, while it depends only weakly on the other two. The best parameter combination was chosen through a series of experiments.

We find that this combination of parameters yields good performance on the large-scale dataset. Meanwhile, it can be seen that our approach with the homogeneous kernel map achieves higher accuracy than CORR-KLSH with metric learning [25].

Figure 3 illustrates that our method is superior to other existing approaches [25–28] tested on this dataset. Compared with other kernel classifiers, our classifier with the RBF-χ² kernel over local features performs better. In Table 2 we can see that our result achieves higher accuracy under both training-set sizes than other published results, including [24], which reports 61% and 69.6% for the two settings; this amounts to an improvement of roughly 16% over results from several years ago.

To find the best parameters for NN search in our scheme, we must balance performance against CPU time. We therefore analyzed the performance and CPU time for different parameter values of the NN search. Figure 4 illustrates the accuracy and CPU time for each value on our dataset.

The authors of [26] proposed a method combining KPCA with standard LSH, that is, computing the hashes on top of the KPCA projection. However, this method has an apparent disadvantage: although KPCA reduces the dimensionality, it loses part of the input information, whereas KLSH computes the LSH directly in the kernel space and preserves the input information. Accordingly, we found that our method achieves higher accuracy and better performance than the algorithm in [26].

6. Conclusions

In this paper, we use homogeneous kernel maps to approximate the kernels commonly used in machine learning, such as the χ², JS, Hellinger's, and intersection kernels. Combined with the KLSH scheme, this gives us access to arbitrary kernel functions when building hash functions. Our approach is much faster than linear scan search while the search accuracy is not noticeably affected. Moreover, we do not need to make assumptions about the distribution of the input data, so the method is, to some extent, applicable to many other databases such as Flickr and Tiny Images. Experimental results demonstrate that it is superior to the standard KLSH algorithm.

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

Acknowledgments

The authors would like to thank Brian Kulis from UC Berkeley for helpful comments on the experiments and Professor Yan Shuicheng for advice on the paper. The work was supported in part by the National Program on Key Basic Research Project (973 Program) 2010CB731403/2010CB731406.