Abstract

Object tracking is an important procedure in the computer vision field, as it estimates the position, size, and state of an object along a video's timeline. Although many algorithms with high accuracy have been proposed, object tracking in diverse contexts remains a challenging problem. This paper presents methods to track the movement of two types of objects: arbitrary objects and humans. Both problems estimate the state density function of an object using particle filters. For videos from a static or relatively static camera, we adjusted the state transition model by integrating the movement direction of the object. We also propose partitioning the object that needs tracking: to track a human, we partitioned the body into N parts and then tracked each part. During tracking, if a part deviated from the object, it was corrected by a centering rotation and then combined with the other parts.

1. Introduction

Object tracking in videos is a technique with applications in many fields. For example, in the biomedical field [1, 2], object tracking is applied to automatically follow cells as they are born, duplicate, move, and die. Other examples are autopilot systems, where it is used to observe and track the vehicles around the driving car [3, 4], and footballer tracking [5–7]. A highly accurate vehicle-tracking program is indispensable for safety. Moreover, tracking technology is usually combined with identification and recognition systems to create a complete pipeline for real-life applications.

Tracking objects in video is difficult because of several challenges that all need to be considered and solved. The first challenge is that we do not know, in advance, the object that we need to track; there may be no information about that object. In the absence of information, the object description given to the program must be highly general. Another challenge is that the tracked object may be heterogeneous in color, varying from part to part. For example, when tracking human movement, the head is characterized by the hair color (e.g., black or yellow), while the body and legs are described by the color of the shirt and pants the person is wearing. Because of these challenges and difficulties, no single tracking algorithm can be adopted for all problems.

In this paper, we present a method to modify the state transition model according to the direction of motion, on the prediction that an object appears along its current direction of motion with higher probability. In addition, we explore the effectiveness of tracking partially obscured objects by tracking their visible sections. To do this, we divide the object into multiple sections and track these sections independently; when some parts of the object are obscured, our approach should still successfully track the object's movement. We also present the experimental particle filter model together with two proposals: integrating information on the direction of the object's movement, and the N-particle-filter model that tracks each part and then combines the parts.

The rest of the paper is organized as follows: the most relevant work that motivated this paper is reviewed in Section 2. Section 3 describes, in detail, our method of multiple particle filters for multiple parts combined with moving-direction information. Section 4 summarizes the results of our method. Section 5 concludes the paper.

2. Related Work

The correlation filter approach is a powerful tool in digital signal processing [8, 9]. This algorithm class utilizes the property of the Fourier transform that turns convolution in the spatial domain into pointwise multiplication in the Fourier domain [10–13]. The original idea of the correlation filter was to solve the problem of locating an object in an image: if the object of interest appears in the image, its position, including the axis coordinates, is determined. The tool to solve this problem is the Average of Synthetic Exact Filters (ASEF) [10]. The next correlation filter, Minimum Output Sum of Squared Error (MOSSE), was studied by Bolme et al. [13]. This tracking method is very powerful and can cope with situations such as changing light and changes in the size and shape of objects.

Avidan [14] at Mitsubishi Electric Research Labs considered object tracking as a binary classification problem distinguishing background pixels from object pixels using the AdaBoost technique. The idea was to train weak classification functions to separate the background and the object and then combine them into a strong classifier with the AdaBoost mechanism. However, the author noted that if an object is not rectangular, pixels inside the bounding rectangle but outside the object are still labeled as belonging to the object. These pixels act as outliers, and AdaBoost is sensitive to outliers [15]. Some other limitations of the approach are as follows: it does not handle an object that is completely occluded for a long time, and the feature space used in the algorithm does not exploit the spatial information of the image.

The approach based on filtering of random processes has been studied for a long time in mathematical statistics, and many impressive results have been discovered [16–18]. Most algorithms following this approach build on the Bayes-optimal solution of the hidden Markov filtering problem [19–21]. That means building the hidden Markov model plays a key role: the more accurate the model, the more accurately the Bayesian solution estimates the state of the object. The work in [20] uses a color histogram feature to construct a particle filter for tracking objects. The work in [22] uses gentle AdaBoost to construct an observation model that is updated over time.

Recently, Siamese network-based trackers have received significant attention for their well-balanced tracking accuracy and efficiency. These trackers formulate visual tracking as a cross-correlation problem and are expected to better leverage the merits of deep networks through end-to-end learning [23–30]. Bhat et al. [31] developed a discriminative model-prediction architecture (DiMP) for tracking. Li et al. [32] proposed a gradient-guided method (GradNet) to update the template.

3. Methodology

Problem 1. Given the coordinates of the object in the first frame, infer the object's coordinates in the subsequent frames.
To filter the state of the object in the subsequent frames, we rely on hidden Markov model theory, constructing two models: the state transition model and the observation model. The state transition models in previous studies are quite similar (all use Gaussian motion); the main difference between algorithms lies in the observation model. Using particle filters allows us to better handle color clutter in the background, as well as to track completely obstructed objects.
The principle of the particle filter, following Figure 1, includes three steps:
(i) Measurement: calculate the samples' weights based on the observation at time n
(ii) Resampling: resample or fine-tune the samples (based on a threshold on the returned weights) to remove or adjust samples whose object positions are strongly mismatched at the current time
(iii) Prediction: predict the object state at time n + 1 based on the likelihoods from time n and the previous states
Based on the particle filter operating mechanism in Figure 1, we present the approximation of the posterior density function at time k.
From the posterior distribution at time k − 1, \(p(x_{k-1} \mid z_{1:k-1})\), we calculate the prior distribution for time k (before observing \(z_k\)) by the Chapman–Kolmogorov equality [16]:

\[p(x_k \mid z_{1:k-1}) = \int p(x_k \mid x_{k-1})\, p(x_{k-1} \mid z_{1:k-1})\, dx_{k-1},\]

where \(p(x_k \mid x_{k-1})\) is given by the state transition model and \(p(x_{k-1} \mid z_{1:k-1})\) is the posterior of step k − 1.

After observing \(z_k\), we update the prior density function from the prediction step at time k:

\[p(x_k \mid z_{1:k}) \propto p(z_k \mid x_k)\, p(x_k \mid z_{1:k-1}),\]

where \(p(z_k \mid x_k)\) is given by the observation model and \(p(x_k \mid z_{1:k-1})\) is the prior at time k calculated in the previous step.

As a result, we obtain a weighted sample set \(\{(x_k^{(i)}, w_k^{(i)})\}_{i=1}^{N}\) representing the posterior density function at time k: \(p(x_k \mid z_{1:k}) \approx \sum_{i=1}^{N} w_k^{(i)}\, \delta(x_k - x_k^{(i)})\).
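To make these steps concrete, the following is a minimal sketch of one particle filter cycle in Python (the paper's experiments were run in Matlab). The Gaussian motion model, the generic observe_likelihood callback, and the resampling threshold of N/2 are illustrative assumptions, not the paper's exact choices.

```python
import numpy as np

def particle_filter_step(particles, weights, observe_likelihood, motion_std=5.0):
    """One measurement/resampling/prediction cycle of a bootstrap particle filter.

    particles: (N, 2) array of (x, y) state hypotheses
    weights:   (N,) normalized weights from the previous step
    observe_likelihood: function mapping an (x, y) state to p(z_k | x_k)
    """
    n = len(particles)

    # Prediction: propagate each particle through the Gaussian motion model.
    particles = particles + np.random.normal(0.0, motion_std, particles.shape)

    # Measurement: reweight particles by the observation likelihood.
    weights = weights * np.array([observe_likelihood(p) for p in particles])
    weights /= weights.sum()

    # Resampling: draw a fresh, equally weighted sample set when the
    # effective sample size drops below half the particle count.
    n_eff = 1.0 / np.sum(weights ** 2)
    if n_eff < n / 2:
        idx = np.random.choice(n, size=n, p=weights)
        particles, weights = particles[idx], np.full(n, 1.0 / n)

    # State estimate: weighted mean of the particle set.
    estimate = np.average(particles, axis=0, weights=weights)
    return particles, weights, estimate
```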
When an object is in motion, it usually moves along a specific trajectory. Therefore, to predict the object's location, we propose integrating the direction of motion, discussed in detail in Section 3.1. In addition, different parts of an object carry distinctive characteristics of shape, color, and light absorption and reflection. Using a single particle filter over such an object can yield false tracking results. Furthermore, if a part of the object has the same color and brightness as another object in the frame, the tracking may be distorted. To fix this problem, we propose dividing the object into several parts, each of which has homogeneous properties. We then track the movement of each part under the constraints that the parts move in the same direction and maintain similar area and shape.

3.1. Moving Direction Information

For the videos filtered out of the dataset in which the camera was relatively stable, we modified the state transition model of the hidden Markov model by integrating the direction of object movement. Instead of using a purely Gaussian motion model, we project these Gaussian functions onto several different directions with different ratios before drawing a new sample. Because each object moves along a specific trajectory, the direction of the object's motion remains roughly constant for a certain period of time. Specifically, we treat the direction of motion as a separate component and update it at each estimate. We use this direction of motion in the prediction step of the particle filter, so that the filter predicts the object appearing along the same direction with higher probability.

A vector \(v\) is rotated by an angle θ by multiplying the rotation matrix \(R_\theta\) by \(v\) as follows: \(v' = R_\theta v\), where

\[R_\theta = \begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix}.\]

Figure 2 depicts the probability distribution over the directions of the object's appearance, in which the red direction (the direction of \(v_0\)) is the predicted movement direction of the object, \(\alpha_0, \ldots, \alpha_7\) are real numbers in [0, 1], and \(\sum_{i=0}^{7} \alpha_i = 1\). In the prediction step, we translate \(\alpha_0 N\) particles by \(v_0\), \(\alpha_1 N\) particles by \(v_1\), …, and \(\alpha_7 N\) particles by \(v_7\), where \(v_i\) is \(v_0\) rotated by \(i\pi/4\).
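A sketch of this direction-weighted prediction under the reconstruction above: the estimated motion vector v0 is rotated into eight directions at multiples of 45°, and a fraction α_i of the particles is translated along each direction. The concrete α values in the example are made up for illustration.

```python
import numpy as np

def rotate(v, theta):
    """Rotate a 2D vector v by angle theta (radians) with the rotation matrix."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s], [s, c]]) @ v

def directional_predict(particles, v0, alphas):
    """Translate groups of particles along the eight rotations of v0.

    particles: (N, 2) array of (x, y) positions
    v0:        predicted motion direction (2,), the highest-probability direction
    alphas:    eight fractions summing to 1; alphas[i] of the particles
               move along v0 rotated by i * 45 degrees
    """
    n = len(particles)
    counts = (np.asarray(alphas) * n).astype(int)
    counts[0] += n - counts.sum()          # give rounding leftovers to v0
    out, start = particles.copy(), 0
    for i, cnt in enumerate(counts):
        vi = rotate(v0, i * np.pi / 4)
        out[start:start + cnt] += vi
        start += cnt
    return out

# Example: most particles follow v0, the rest spread over the other directions.
alphas = [0.5, 0.15, 0.05, 0.025, 0.05, 0.025, 0.05, 0.15]
```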

The particle filter algorithm integrating the direction of motion is presented as Algorithm 1.

Input: Particle set pf, observation image, motion direction \(v_0\)
Output: New particle set representing \(p(x_k \mid z_{1:k})\), estimating the state of the object in the observed image.
Step 1:
 Translate \(\alpha_0 N\) particles in pf by \(v_0\)
 Translate \(\alpha_1 N\) particles in pf by \(v_1\)
 Translate \(\alpha_2 N\) particles in pf by \(v_2\)
 Translate \(\alpha_3 N\) particles in pf by \(v_3\)
 Translate \(\alpha_4 N\) particles in pf by \(v_4\)
 Translate \(\alpha_5 N\) particles in pf by \(v_5\)
 Translate \(\alpha_6 N\) particles in pf by \(v_6\)
 Translate \(\alpha_7 N\) particles in pf by \(v_7\)
Step 2:
 for i = 1 to N do
 Beginfor
  /∗ \(x_k^{(i)}\) is the i-th particle ∗/
  Get the image patch \(h_i\) in the rectangle defined by \(x_k^{(i)}\)
  Get the HOG feature vector of \(h_i\)
  Get the Haar-like feature vector of \(h_i\)
  Calculate the classification point \(conf_i\) by Algorithm 7
  Calculate the likelihood \(L_i\) by Algorithm 8
  Update weight of the i-th particle: \(w_k^{(i)} = w_{k-1}^{(i)} L_i\)
 Endfor
Step 3:
 Calculate the total \(W = \sum_{i=1}^{N} w_k^{(i)}\)
 for i = 1 to N do
 Beginfor
  Standardize weight \(w_k^{(i)} = w_k^{(i)} / W\)
 Endfor
Step 4: Calculate the effective sample size \(N_{eff} = 1 / \sum_{i=1}^{N} (w_k^{(i)})^2\)
Step 5:
 if \(N_{eff} < N_T\) then
  resample the particle set (bootstrap)
 Endif
Step 6: Estimate the state of the object in the k-th image by calculating the weighted average of the new particle set
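Read together with the sketches above, a compact Python rendering of Algorithm 1's control flow might look as follows; it reuses directional_predict from the previous sketch, and the likelihood callback stands in for Algorithm 8.

```python
import numpy as np

def algorithm1_step(particles, weights, v0, alphas, likelihood, n_threshold):
    """One tracking step with direction-aware prediction (Algorithm 1 sketch)."""
    # Step 1: translate particle groups along the eight rotations of v0.
    particles = directional_predict(particles, v0, alphas)

    # Step 2: weight each particle by the observation likelihood; in the
    # paper this is Algorithm 8, scoring the particle's image patch with
    # the strong classifier over HOG and Haar-like features.
    weights = weights * np.array([likelihood(p) for p in particles])

    # Step 3: normalize the weights.
    weights /= weights.sum()

    # Steps 4-5: bootstrap resampling when the effective sample size
    # falls below the threshold N_T.
    n = len(particles)
    if 1.0 / np.sum(weights ** 2) < n_threshold:
        idx = np.random.choice(n, size=n, p=weights)
        particles, weights = particles[idx], np.full(n, 1.0 / n)

    # Step 6: the state estimate is the weighted average of the particles.
    return particles, weights, np.average(particles, axis=0, weights=weights)
```
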
3.2. Multiple Particle Filters Model
3.2.1. Multiparts of an Object

While considering the problem where the object's shape changes little, if the object includes many parts with dissimilar colors and contrast, using one particle filter for tracking will lead to incorrect tracking. Therefore, the object should be separated into n parts, each covering an area of similar color, grayscale, and contrast, with each part tracked separately. In this way, parts affected by the environment or by artifacts of other objects will cause incorrect position identification and will need to be adjusted. For example, a human can typically be represented by a 3-partition structure as illustrated in Figure 3. This structure divides the human based on the gray-level changes among the black head, the white shirt of the body, and the black pants of the legs. The human is thus divided into 3 parts, each bordered at a different gray level, with each part tracked by its own particle filter and then combined based on the best-matching part.
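As an illustration, a small helper that splits a bounding box vertically by a given ratio, such as the 1:5:3 head/body/legs partition used in the experiments; the (x, y, w, h) box format is an assumption.

```python
def split_box(box, ratios):
    """Split a bounding box (x, y, w, h) vertically into parts with given ratios.

    E.g. ratios = (1, 5, 3) yields head, body, and legs sub-boxes whose
    heights are proportional to 1 : 5 : 3.
    """
    x, y, w, h = box
    total = sum(ratios)
    parts, top = [], y
    for r in ratios:
        ph = h * r / total
        parts.append((x, top, w, ph))
        top += ph
    return parts

# Example: a 40 x 90 person box split into head / body / legs.
head, body, legs = split_box((100, 50, 40, 90), (1, 5, 3))
```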

3.2.2. Build Model

The adjustment of deviated areas takes two steps:
(i) The center of a similar area is moved, which allows the incorrect position of the area to be adjusted to the correct position on the object
(ii) The size ratio of a similar area allows its height to be rescaled to the height of the original object

Thus, when one part of the object is obscured or resembles another object, we can still restore it and keep tracking.

3.2.3. Fine-Tuning Parts

Once the initial position of the object is defined, the object is divided into n parts in a structure H, as shown in Figure 3. We then used one particle filter to track each part \(P_i\). At each time point, the particle filters tracking the different areas can diverge from each other, so a correction model is needed. We used the assessment data collected for each part to evaluate which tracked part behaves best; we kept this best-tracked part fixed and applied the rotation algorithm to the other n − 1 parts using the fixed part as the origin. The adjustment of the N-particle filters when tracking an object at frame k is described below.
Step 1: we calculate the rotation angles from the center of each part to the remaining n − 1 parts based on frame 0. For example, the angle of rotation from the center of \(P_1\) to the other parts is the angle \(\theta_{1j}\) in Figure 4(a).
Step 2: we suppose \(\hat{x}_1, \ldots, \hat{x}_n\) are the estimates and \(O_1, \ldots, O_n\) are the parts of the object in the original image, with \(d_i\) the distance between \(\hat{x}_i\) and \(O_i\). Because the division into parts may not be equal, to determine which part is tracked best, we multiply each distance by a coefficient \(\gamma_i\) and compare \(\gamma_i d_i\); the smallest value is considered the best estimate, and the corresponding part at the k-th frame is denoted \(P_{best}\).
Step 3: after the best part is selected, we perform the center rotation of the remaining n − 1 parts relative to the best part with the rotation angles defined in Step 1, which gives the new center coordinates of the n − 1 parts. The new center coordinates of part i are found by rotating its center around the center of \(P_{best}\) by the angle \(\theta_i\):
\[c_i' = R_{\theta_i}(c_i - c_{best}) + c_{best},\]
where \(c_i\) is the center coordinate of part i and \(c_i'\) is its new center coordinate after the rotation. For example, in Figure 4(b), the new center coordinates of the other parts are found relative to the best part's center using the recorded rotation angles.
Step 4: the particles of each part \(P_i\), \(i \neq best\), are translated to the new part positions created in Step 3.
Step 5: the sizes of the parts are scaled according to the ratios of the structure H; that is, we recalculate the width \(w_i\) and the height \(h_i\) of each part from the size of \(P_{best}\) and the ratios in H.
Step 6: the particles of each part \(P_i\), \(i \neq best\), are translated toward the best part by the corresponding distance \(d_i\) defined by the structure H.

For example, as shown in Figure 4(b), the particles of the remaining parts are translated toward the best part by their corresponding distances.
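A sketch of Steps 1–4 under the reconstruction above: the best part is chosen by the weighted distance criterion, and the other centers are re-placed from the frame-0 geometry of structure H. Here the fixed angle-plus-distance of each part relative to the best part is stored as a frame-0 offset vector, which is an equivalent formulation.

```python
import numpy as np

def correct_parts(centers, ref_centers, distances, gammas):
    """Re-place deviated part centers around the best-tracked part.

    centers:     (n, 2) estimated part centers at frame k
    ref_centers: (n, 2) part centers from frame 0 (the structure H)
    distances:   (n,) appearance distance of each estimate (Step 2)
    gammas:      (n,) coefficients compensating for unequal part sizes
    """
    centers = np.asarray(centers, dtype=float)
    ref = np.asarray(ref_centers, dtype=float)

    # Step 2: the part with the smallest weighted distance is the best.
    best = int(np.argmin(np.asarray(gammas) * np.asarray(distances)))

    # Steps 1 and 3: recover every other center from the frame-0 geometry,
    # i.e. the fixed angle and distance of each part relative to the best
    # part, stored here as the frame-0 offset vector.
    corrected = centers[best] + (ref - ref[best])

    # Step 4 would translate each part's particle set to these centers.
    return best, corrected
```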

We propose the multiparticle filter algorithm in Algorithms 2 and 3.

Input: pf particle sample set (based on Algorithm 4), k-th image (where k starts from the second image)
Output: The new set of particles representing \(p(x_k \mid z_{1:k})\), estimating the state of the object in the k-th image.
Step 1:
 for i = 1 to N do
 Beginfor
  /∗ \(x_k^{(i)}\) is the i-th particle ∗/
  Get the image patch \(h_i\) in the rectangle defined by \(x_k^{(i)}\)
  Get the HOG feature vector of \(h_i\)
  Get the Haar-like feature vector of \(h_i\)
  Calculate the classification point \(conf_i\) by Algorithm 7
  Calculate the likelihood \(L_i\) by Algorithm 8
  Update weight for the i-th particle: \(w_k^{(i)} = w_{k-1}^{(i)} L_i\)
 Endfor
Step 2:
 Calculate the sum \(W = \sum_{i=1}^{N} w_k^{(i)}\)
 for i = 1 to N do
 Beginfor
  Standardize weight: \(w_k^{(i)} = w_k^{(i)} / W\)
 Endfor
Step 3: Calculate the effective sample size \(N_{eff} = 1 / \sum_{i=1}^{N} (w_k^{(i)})^2\)
Step 4:
 if \(N_{eff} < N_T\) then
   use bootstrap resampling
 Endif
Step 5: Estimate the state of the object in the k-th image by calculating the weighted average of the new particle set

Step 1: Initialize the N particle sets \(pf_1, \ldots, pf_n\), one per part
Step 2:
 for i = 1 to n do
 Beginfor
  Take samples for part i according to Algorithm 4.
  Train the strong classification \(F_i\) according to Algorithm 5.
 Endfor
Step 3:
 while the video is not over do
 Beginwhile
  Get the observation image obs
  for i = 1 to n do
  Beginfor
   Use the particle filter to estimate the i-th state according to Algorithm 1 or Algorithm 2
   Measure the distance \(d_i\)
  Endfor
  Choose the best part \(P_{best}\)
  Using the center of \(P_{best}\) and the structure H, perform the rotation of the remaining n − 1 centers.
  Translate the n − 1 other particle sets to the n − 1 centers just computed.
  Scale the particle sets according to the ratios of the structure H.
  Translate the n − 1 other particle sets toward the part \(P_{best}\).
  for i = 1 to n do
  Beginfor
   Take a new sample based on the new section rotated from \(P_{best}\).
   Update the strong classification \(F_i\) according to Algorithm 6
  Endfor
 Endwhile
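For orientation, Algorithm 3's outer loop can be sketched as a higher-order function; track_part, correct, and update_classifier are hypothetical callbacks standing in for Algorithms 1/2, the Section 3.2.3 correction, and Algorithm 6, respectively.

```python
def track_video(frames, n_parts, track_part, correct, update_classifier):
    """Outer loop of the multipart tracker (a sketch of Algorithm 3).

    track_part(i, frame) -> (center, distance): one particle filter step
        for part i (Algorithm 1 or Algorithm 2).
    correct(centers, distances) -> (best, corrected_centers): the part
        correction of Section 3.2.3 (e.g. correct_parts above).
    update_classifier(i, frame, center): retrain part i's strong
        classifier on fresh samples (Algorithm 6).
    """
    trajectory = []
    for frame in frames:
        # Track every part independently with its own particle filter.
        results = [track_part(i, frame) for i in range(n_parts)]
        centers = [c for c, _ in results]
        distances = [d for _, d in results]

        # Choose the best part and re-place the other n - 1 centers;
        # the caller then moves/rescales the particle sets accordingly.
        best, corrected = correct(centers, distances)

        # Refresh each part's classifier around its corrected position.
        for i in range(n_parts):
            update_classifier(i, frame, corrected[i])
        trajectory.append(corrected)
    return trajectory
```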

4. Experiment

4.1. Environment

Installation environment: we ran the experiments on a computer with Windows 10 Pro 64-bit, 8 GB of RAM, and an Intel Core(TM) i5-3210M CPU @ 2.50 GHz, using Matlab R2016a.

4.2. Data Set

In 2013, Wu et al. [33] gathered many video sources related to tracking and created ground truth for these videos to form the TB-100 dataset. Because TB-100 is a compilation of data from many sources, the contexts of the videos are very different and diverse in attributes such as the type of object to track, color or black-and-white video, and still or moving cameras. The video datasets used to support the findings of this study have been deposited at http://www.visual-tracking.net.

Challenges in the dataset include the following:
(i) IV, illumination variation: the brightness of the subject varies significantly
(ii) SV, scale variation: the ratio of the rectangle containing the object in the first image to that in the current image is out of range
(iii) OCC, occlusion: the object is partially or completely obscured
(iv) DEF, deformation: nonrigid objects that change shape
(v) MB, motion blur: the subject is blurry due to camera movement
(vi) FM, fast motion: the ground-truth motion is greater than tm pixels (tm = 20)
(vii) IPR, in-plane rotation: the object rotates in the image plane
(viii) OPR, out-of-plane rotation: the object rotates out of the image plane
(ix) OV, out of view: part of the object leaves the image domain
(x) BC, background clutter: the background near the object has colors or lines similar to the object
(xi) LR, low resolution: the number of pixels in the ground-truth rectangle containing the object is less than tr (tr = 400)

The abovementioned challenges are distributed in the data set, which is shown in Figure 5.

4.3. Evaluation

We use the evaluation criteria presented at the site of [33] to evaluate the tracking algorithm.
Method 1 (R1), evaluation based on Euclidean distance (precision plot): we measure the Euclidean distance d from the center estimated by the algorithm to the actual center of the object (ground truth); the frame is considered successful if d is less than or equal to a threshold t0, as in Figure 6(a).
Method 2 (R2), evaluation based on the level of overlap (success plot): the overlap score is defined as \(S = |r_t \cap r_a| / |r_t \cup r_a|\), in which \(r_t\) is the bounding rectangle determined by the algorithm and \(r_a\) is the ground-truth rectangle, as in Figure 6(b); the frame is successful if S exceeds a threshold.

We calculate the success ratio as (the number of successful frames) / (the total number of frames).
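Both criteria are straightforward to compute; a minimal sketch, assuming axis-aligned (x, y, w, h) boxes:

```python
import numpy as np

def center_error(box_t, box_a):
    """Euclidean distance between predicted and ground-truth box centers (R1)."""
    (x1, y1, w1, h1), (x2, y2, w2, h2) = box_t, box_a
    c1 = np.array([x1 + w1 / 2, y1 + h1 / 2])
    c2 = np.array([x2 + w2 / 2, y2 + h2 / 2])
    return float(np.linalg.norm(c1 - c2))

def overlap(box_t, box_a):
    """Intersection-over-union of the tracked and ground-truth rectangles (R2)."""
    (x1, y1, w1, h1), (x2, y2, w2, h2) = box_t, box_a
    iw = max(0.0, min(x1 + w1, x2 + w2) - max(x1, x2))
    ih = max(0.0, min(y1 + h1, y2 + h2) - max(y1, y2))
    inter = iw * ih
    union = w1 * h1 + w2 * h2 - inter
    return inter / union if union > 0 else 0.0

# A frame is "successful" when center_error <= t0 (R1) or overlap >= s0 (R2);
# the reported ratio is successful frames / total frames.
```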

4.4. Results

We report results for tracking people in videos where the camera does not fluctuate much, the rotation angle is conserved, and the proportions of the person in the frame are not too small.

We named Program 1 MultiPart3, which uses the MultiPart algorithm dividing the object into 3 parts in a ratio of 1 : 5 : 3; Program 2 is MultiPart3_direction, the MultiPart algorithm that also estimates the moving direction of the object, dividing the object into 3 parts in a ratio of 1 : 5 : 3; Program 3 is MultiPart2, the MultiPart algorithm dividing the object into 2 parts in a ratio of 5 : 3; and Program 4 is MultiPart2_direction, the MultiPart algorithm that also estimates the direction of object movement, dividing the object into 2 parts in a ratio of 5 : 3. The abovementioned four programs, compared with the DiMP [31] and GradNet [32] algorithms, are shown in Table 1 and Figure 7.

The MultiPart2 algorithm uses 2 particle filters in a ratio of 5 : 3, letting the larger upper portion (head and body), which carries more information, changes less over time, and is "denser" than the legs, drive the tracking. Its average accuracy (R1 = 92.2%, R2 = 87.9%) is slightly higher than that of the GradNet algorithm (R1 = 85.9%, R2 = 86.3%). From these results, we can see that tracking the part carrying the most information gives good average results compared with tracking the whole object. However, for data (Dancer and Dancer2) in which the tracked person wears a long skirt covering the feet, tracking with 2 particle filters at a ratio of 5 : 3 gives a lower result than tracking the whole object; tracking the object intact is best in this case.

For the videos mentioned above, the MultiPart3 algorithm divides the object into 3 parts in a 1 : 5 : 3 ratio; after each image, the parts are adjusted by the rotation technique based on the structure H to bring wrongly placed parts back to the ground-truth area. The results of the MultiPart3 algorithm (R1 = 94.6%, R2 = 92.2%) are better than those of DiMP (R1 = 89.1%, R2 = 89.9%) and GradNet (R1 = 85.9%, R2 = 86.3%). In addition, the MultiPart3_direction algorithm, which adds the object's movement direction, achieves average accuracy of R1 = 93.3% and R2 = 93.2%, also better than the DiMP and GradNet results. MultiPart3_direction only approximates the average accuracy of MultiPart3 because, in the Dancer data, the person jumps up and down suddenly and the direction estimate needs a number of frames to adapt, so the determined motion direction can be wrong.

5. Conclusions

This paper presented several methods for object tracking in videos, mainly based on particle filters. To solve the general problem, we built a hidden Markov model and applied particle filters. For tracking humans in videos under normal conditions where the human scale is preserved, we used 3 particle filters to track the parts of the body, or tracked the part of the body containing the most information and inferred the whole body from it. Experimental results show that dividing the object into (n + m) parts lets tracking continue even when n parts of the object are partially obscured: the remaining m parts are tracked normally, and the occlusion does not affect the tracking of the subject in the video.

A direction for future development of this work is to change the observation model. We found that the gentle AdaBoost training process is time-consuming, whereas algorithms using correlation filters have the advantage of being fast and highly accurate. For future studies, we suggest integrating correlation filters into the observation model to shorten the execution time. In addition, we plan to process the tracked object's parts in parallel, also to shorten the execution time.

Appendix

Table 2 describes the notations.

A. Sample to Train Gentle Adaboost

Figure 8 illustrates the random positive sampling, with a bounding error of ±5 pixels, and the negative sampling, which does not include the tracked object.

Input: Image at the initial time, position and size of the object
Output: Data set D of size \(N_+ + N_-\), including positive samples and negative samples.
Step 1: Initialize a feature stack to hold feature vectors.
Step 2: Get the Haar-like features [34] of the object area, v = getHaarlike(object area)
   Push them onto the feature stack, features.push(v)
Step 3:
 for i = 1 to \(N_+\) − 1 do
 Beginfor
  Randomly draw a rectangular area S from the image at a position within ±5 pixels of the object
  Get the Haar-like feature vector over the region S, v = getHaarlike(S)
  Push the vector onto the feature stack, features.push(v)
 Endfor
Step 4:
 for i = 1 to \(N_-\) do
 Beginfor
  Randomly take a rectangular area S in the green region of Figure 8
  Get the Haar-like feature vector over the region S, v = getHaarlike(S)
  Push the vector onto the feature stack, features.push(v)
 Endfor
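A sketch of this sampling scheme, with the feature extraction (the Haar-like step [34]) omitted; the ±5-pixel jitter for positives follows the algorithm above, while the rejection rule keeping negatives away from the object is an assumption.

```python
import random

def sample_boxes(obj_box, img_size, n_pos, n_neg, jitter=5):
    """Draw positive boxes near the object and negative boxes away from it."""
    x, y, w, h = obj_box
    img_w, img_h = img_size
    positives = [obj_box]
    for _ in range(n_pos - 1):
        # Positive: the object rectangle shifted by at most +/-jitter pixels.
        dx = random.randint(-jitter, jitter)
        dy = random.randint(-jitter, jitter)
        positives.append((x + dx, y + dy, w, h))
    negatives = []
    while len(negatives) < n_neg:
        nx = random.randint(0, max(0, img_w - w))
        ny = random.randint(0, max(0, img_h - h))
        # Negative: keep only boxes far enough from the object.
        if abs(nx - x) > jitter * 2 or abs(ny - y) > jitter * 2:
            negatives.append((nx, ny, w, h))
    return positives, negatives
```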

B. Create Gentle Adaboost

Input: Training set \(D = \{(v_i, y_i)\}_{i=1}^{N}\), where \(y_i \in \{-1, +1\}\) is the label of \(v_i\)
Output: Strong classification function F
Step 1: Initialize the coefficients for the training set, \(w_i = 1/N\)
Step 2:
 for t = 1 to s do
 Beginfor
  for j = 1 to m do
  Beginfor
   Train the weak classification \(f_j\), that is, calculate its error according to the formula \(e_j = \sum_i w_i (y_i - f_j(v_i))^2\)
  Endfor
  Select the weak classification with the lowest error and set it to \(f_t\)
  Update the strong classification \(F \leftarrow F + f_t\)
  Update the weights \(w_i \leftarrow w_i e^{-y_i f_t(v_i)}\) and normalize them
 Endfor
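A minimal sketch of gentle AdaBoost with regression stumps as the weak classifiers (the stump form is an assumption; the paper trains weak classifiers over Haar-like features). classify_point at the end corresponds to Algorithm 7's classification point.

```python
import numpy as np

def fit_stump(X, y, w, j, theta):
    """Weighted-least-squares regression stump on feature j at threshold theta."""
    mask = X[:, j] > theta
    # Gentle AdaBoost fits f(x) = a*[x_j > theta] + b by weighted least squares.
    b = np.average(y[~mask], weights=w[~mask]) if (~mask).any() else 0.0
    a = (np.average(y[mask], weights=w[mask]) if mask.any() else 0.0) - b
    return a, b

def gentle_adaboost(X, y, n_rounds=10):
    """Train a gentle AdaBoost classifier F(x) = sum_t f_t(x) with stumps.

    X: (N, m) feature matrix; y: (N,) labels in {-1, +1}.
    """
    n, m = X.shape
    w = np.full(n, 1.0 / n)
    ensemble = []
    for _ in range(n_rounds):
        best = None
        for j in range(m):                      # scan features ...
            for theta in np.unique(X[:, j]):    # ... and candidate thresholds
                a, b = fit_stump(X, y, w, j, theta)
                pred = a * (X[:, j] > theta) + b
                err = np.sum(w * (y - pred) ** 2)   # weighted squared error
                if best is None or err < best[0]:
                    best = (err, j, theta, a, b, pred)
        _, j, theta, a, b, pred = best
        ensemble.append((j, theta, a, b))
        w = w * np.exp(-y * pred)               # gentle AdaBoost weight update
        w /= w.sum()
    return ensemble

def classify_point(ensemble, x):
    """Classification point: the sum of weak responses (Algorithm 7's conf)."""
    return sum(a * (x[j] > theta) + b for j, theta, a, b in ensemble)
```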

C. Update Gentle Adaboost

Input: Old strong classification F, stored as a matrix of s weak classifications; new data set D
Output: A new strong classification F′
Step 1:
 Initialize the weight set consisting of N equal numbers 1/N, where N is the number of elements in the data set.
 Initialize a matrix to contain the weak classifications
 Initialize the chosen stack to save the positions of the weak classifications that have been selected
Step 2:
 for i = 1 to s − T do
 Beginfor
  Initialize the loss stack to store error values
  for j = 1 to s do
  Beginfor
   if j is not in chosen then
    Calculate the error of classification j according to the formula \(e_j = \sum_i w_i (y_i - f_j(v_i))^2\)
    Put \(e_j\) into loss, loss.push(\(e_j\))
   Endif
  Endfor
  Select the classification \(j^*\) with the smallest error value
  Put \(j^*\) into chosen, chosen.push(\(j^*\))
  for each element \((v_i, y_i)\) of the data set do
  Beginfor
   Update the weight \(w_i \leftarrow w_i e^{-y_i f_{j^*}(v_i)}\)
  Endfor
 Endfor
Step 3: freshly train T weak classifications on the data set D by Algorithm 5
Step 4: combine the s − T weak classifications from Step 2 and the T weak classifications from Step 3 to create the new strong classification F′.

D. Calculate the Classification Point

Input: Strong classification F, stored as a matrix (trained by Algorithm 5); Haar-like feature vector x of the area to be scored
Output: Classification point conf
Step 1: Assign conf = 0
Step 2:
 for i = 1 to s do
 Beginfor
  Get \(f_i\), the i-th weak classification
  Update conf = conf + \(f_i(x)\)
 Endfor

E. Calculate the Likelihood of the Image on the Hypothesis of the State of the Object

Input: Observation image, hypothesis \(x_k\), feature vector of the object in the first image, strong classification F.
Output: Likelihood L
Step 1: Extract the image area h in the rectangle given by the hypothesis
Step 2: Resize the image h to the size of the object in the first image
Step 3:
 Extract the HOG feature on h
 Extract the Haar-like feature on h
Step 4: Calculate the classification point conf by Algorithm 7
Step 5: Calculate the likelihood L from the classification point conf and the distance between the HOG feature of h and that of the object in the first image
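The exact formula in Step 5 is not recoverable from the extracted text; one plausible form, stated purely as an assumption, gives a larger likelihood for a higher classification point and a smaller HOG distance:

```python
import numpy as np

def likelihood(conf, hog_dist, lam=1.0):
    """Hypothetical p(z_k | x_k): exponentiate the classification point,
    penalized by the HOG distance to the first-frame template."""
    return float(np.exp(conf - lam * hog_dist))
```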

Data Availability

Publicly available data were used in this research. The TB-100 data used to support the findings of this study are included within the article.

Conflicts of Interest

The authors declare that they have no conflicts of interest.