Keywords
Object Tracking, Mean Shift Tracking, Scale and Orientation, MMST
INTRODUCTION
The objective of object tracking is to associate target objects across consecutive video frames. The association can be especially difficult when the objects move fast relative to the frame rate. Another situation that increases the complexity of the problem is when the tracked object changes scale and orientation over time. For these situations, object tracking systems usually employ a motion model which describes how the image of the target might change under different possible motions of the object.
In the classical mean shift tracking algorithm [9], the estimation of scale and orientation changes of the target is not addressed. The CAMSHIFT algorithm [6], the earliest mean shift based tracking scheme, can actually deal with various types of movement of the object, although it is not robust. In CAMSHIFT, the moments of the weight image determined by the target model are used to estimate the scale (also called the area) and orientation of the object being tracked. Based on Comaniciu et al.'s work in [9], many tracking schemes [10, 11, 17, 18, 23] were proposed to solve the problem of target scale and/or orientation estimation. Collins [10] adopted Lindeberg et al.'s scale space theory [19, 20] for kernel scale selection in mean shift based blob tracking; however, this approach cannot handle rotation changes of the target. An EM-shift algorithm was proposed by Zivkovic and Krose [11], which simultaneously estimates the position of the local mode and the covariance matrix that approximately describes the shape of the local mode. In [23], a distance transform based asymmetric kernel is used to fit the object shape through a scale adaptation followed by a segmentation process. Hu et al. [17] developed a scheme to estimate the scale and orientation changes of the object by using spatial-color features and a novel similarity measure function [12, 16]. In this paper, a modified mean shift tracking (MMST) algorithm is proposed under the mean shift framework.
MEAN SHIFT TRACKING ALGORITHM
A. Target Representation
In object tracking, a target is usually defined as a rectangular or ellipsoidal region in the image. Currently, a widely used target representation is the color histogram because of its independence of scale and rotation and its robustness to partial occlusions [9, 21]. Denote by {x_i*}, i = 1, …, n, the normalized pixel positions in the target region, which is assumed to be centered at the origin and to contain n pixels. The probability of the feature u (u = 1, 2, …, m) in the target model is computed as [9]

$$q_u = C \sum_{i=1}^{n} k\big(\lVert x_i^{*} \rVert^{2}\big)\, \delta\big[b(x_i^{*}) - u\big] \qquad (2.1)$$

where k(x) is an isotropic kernel profile, b(x_i*) maps the pixel at x_i* to its color histogram bin, δ is the Kronecker delta function and the normalization constant is

$$C = \frac{1}{\sum_{i=1}^{n} k\big(\lVert x_i^{*} \rVert^{2}\big)}. \qquad (2.2)$$
Similarly, the probability of the feature u in the target candidate model, computed from the candidate region centered at position y, is given by

$$p_u(y) = C_h \sum_{i=1}^{n_h} k\!\left(\left\lVert \frac{y - x_i}{h} \right\rVert^{2}\right) \delta\big[b(x_i) - u\big], \qquad C_h = \frac{1}{\sum_{i=1}^{n_h} k\big(\lVert (y - x_i)/h \rVert^{2}\big)} \qquad (2.3,\ 2.4)$$

where {x_i}, i = 1, …, n_h, are the pixel positions in the candidate region and h is the bandwidth. The similarity between the target model q and the candidate model p(y) is measured by the Bhattacharyya coefficient

$$\rho(y) \equiv \rho\big[p(y), q\big] = \sum_{u=1}^{m} \sqrt{p_u(y)\, q_u} \qquad (2.5)$$

and the distance between the two models is defined as

$$d(y) = \sqrt{1 - \rho\big[p(y), q\big]}. \qquad (2.6)$$

A Taylor expansion of Eq. (2.5) around p(y_0), the candidate model at the current position y_0, gives

$$\rho\big[p(y), q\big] \approx \frac{1}{2} \sum_{u=1}^{m} \sqrt{p_u(y_0)\, q_u} + \frac{C_h}{2} \sum_{i=1}^{n_h} w_i\, k\!\left(\left\lVert \frac{y - x_i}{h} \right\rVert^{2}\right) \qquad (2.7)$$

where the weights are

$$w_i = \sum_{u=1}^{m} \sqrt{\frac{q_u}{p_u(y_0)}}\, \delta\big[b(x_i) - u\big]. \qquad (2.8)$$
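To make the model concrete, the following Python/NumPy fragment is a minimal sketch (not the authors' implementation) of Eqs. (2.1)-(2.5): it builds an Epanechnikov-weighted 16×16×16 RGB histogram for a rectangular patch and compares two histograms with the Bhattacharyya coefficient. The function names, the bin count and the rectangular (rather than ellipsoidal) support are illustrative assumptions.

```python
import numpy as np

def epanechnikov_profile(x):
    """Kernel profile k(x) = 1 - x for x <= 1 and 0 otherwise (up to a constant)."""
    return np.clip(1.0 - x, 0.0, None)

def color_histogram(patch, n_bins=16):
    """Kernel-weighted joint RGB histogram of an (H, W, 3) uint8 patch (Eqs. (2.1)-(2.4)).

    Pixels near the patch centre receive larger kernel weights than pixels
    near the border; the histogram is normalized so that it sums to one.
    """
    h, w = patch.shape[:2]
    ys, xs = np.mgrid[0:h, 0:w]
    ny = (ys - (h - 1) / 2.0) / (h / 2.0)         # normalized row coordinate
    nx = (xs - (w - 1) / 2.0) / (w / 2.0)         # normalized column coordinate
    k = epanechnikov_profile(nx**2 + ny**2)       # kernel weight per pixel
    q = (patch.astype(np.int64) * n_bins) // 256  # quantize each channel
    bin_idx = (q[..., 0] * n_bins + q[..., 1]) * n_bins + q[..., 2]  # b(x_i)
    hist = np.bincount(bin_idx.ravel(), weights=k.ravel(), minlength=n_bins**3)
    return hist / hist.sum()

def bhattacharyya(p, q):
    """Bhattacharyya coefficient rho = sum_u sqrt(p_u * q_u) (Eq. (2.5))."""
    return float(np.sum(np.sqrt(p * q)))
```

Applying color_histogram to the patch around the initial target gives the target model q, and applying it to the patch around a candidate position y gives the candidate model p(y).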
Since the first term in Eq. (2.7) is independent of y, minimizing the distance in Eq. (2.6) is equivalent to maximizing the second term in Eq. (2.7). In the mean shift iteration, the estimated target moves from the current position y_0 to a new position y_1, which is defined as

$$y_1 = \frac{\sum_{i=1}^{n_h} x_i\, w_i\, g\big(\lVert (y_0 - x_i)/h \rVert^{2}\big)}{\sum_{i=1}^{n_h} w_i\, g\big(\lVert (y_0 - x_i)/h \rVert^{2}\big)} \qquad (2.9)$$

where g(x) = −k′(x).
When we choose the kernel k(x) with the Epanechnikov profile, g(x) = −k′(x) = 1 and Eq. (2.9) reduces to [9]

$$y_1 = \frac{\sum_{i=1}^{n_h} x_i\, w_i}{\sum_{i=1}^{n_h} w_i}. \qquad (2.10)$$
By using Eq. (2.10), the mean shift tracking algorithm finds, in the new frame, the region most similar to the object. From Eq. (2.10) it can be observed that the key parameters in the mean shift tracking algorithm are the weights w_i. In this paper we focus on the analysis of w_i, with which the scale and orientation of the tracked target can be well estimated, so that a scale and orientation adaptive mean shift tracking algorithm can be developed.
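Under the Epanechnikov profile, one iteration of Eq. (2.10) is just a weighted centroid of the candidate pixels, with the weights of Eq. (2.8). The sketch below illustrates this single update step; the patch layout, the helper pixel_bins and the variable names are assumptions for illustration, and boundary handling is omitted.

```python
import numpy as np

def pixel_bins(patch, n_bins=16):
    """Joint RGB bin index b(x_i) for every pixel of an (H, W, 3) uint8 patch."""
    q = (patch.astype(np.int64) * n_bins) // 256
    return (q[..., 0] * n_bins + q[..., 1]) * n_bins + q[..., 2]

def mean_shift_step(frame, center, size, q_model, p_candidate, n_bins=16):
    """One location update y0 -> y1 under the Epanechnikov profile.

    `center` is the (row, col) centre y0 of the current candidate window,
    `size` its (height, width), and `q_model` / `p_candidate` the histograms
    of the target and candidate models. Returns the new centre y1 and the
    weight image {w_i}. Boundary handling is omitted for brevity.
    """
    h, w = size
    r0, c0 = int(center[0]) - h // 2, int(center[1]) - w // 2
    patch = frame[r0:r0 + h, c0:c0 + w]
    bins = pixel_bins(patch, n_bins)
    # w_i = sqrt(q_u / p_u(y0)) evaluated at u = b(x_i)        (Eq. (2.8))
    weights = np.sqrt(q_model / np.maximum(p_candidate, 1e-12))[bins]
    ys, xs = np.mgrid[0:h, 0:w]
    # y1 = sum_i x_i w_i / sum_i w_i                           (Eq. (2.10))
    centroid = np.array([np.sum(ys * weights), np.sum(xs * weights)]) / weights.sum()
    return centroid + np.array([r0, c0]), weights
```

The returned weight image is exactly the quantity analyzed in the following sub-sections to estimate the scale and orientation of the target.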
B. Modified Mean Shift Tracking For Scale And Orientation Of Target
In this section, we first analyze how to adaptively calculate the scale and orientation of the target in sub-sections II.B.1 ~ II.B.5; then, in sub-section II.B.6, a modified mean shift tracking (MMST) algorithm for the scale and orientation of the target is presented.
The enlarging or shrinking of the target is usually a gradual process in consecutive frames. Thus we can assume that the scale change of the target is smooth, and this assumption holds reasonably well in most video sequences. If the scale of the target changes abruptly between adjacent frames, no general tracking algorithm can track it effectively. With this assumption, we can make a small modification to the original mean shift tracking algorithm. Suppose that we have estimated the area of the target in the previous frame (the area estimation will be discussed in sub-section II.B.2); in the current frame we let the window size, i.e., the area of the target candidate region, be a little bigger than the estimated area of the target. Therefore, no matter how the scale and orientation of the target change, the target should still lie within this bigger candidate region in the current frame. Now the problem becomes how to estimate the real area and orientation of the target from the target candidate region.
1. The Weight Images For Target Scale Changing
In the CAMSHIFT and mean shift tracking algorithms, the estimation of the target location is actually obtained by using a weight image [10, 24]. In CAMSHIFT, the weight image is determined by a hue-based object histogram, where the weight of a pixel is the probability of its hue in the object model. In the mean shift tracking algorithm, the weight image is defined by Eq. (2.8), where the weight of a pixel is the square root of the ratio of its color probability in the target model to its color probability in the target candidate model. However, the weight image used by CAMSHIFT is less accurate for estimating the location of the target, and the mean shift tracking algorithm achieves better estimation results. That is to say, the weight image in the mean shift tracking algorithm is more reliable than that in the CAMSHIFT algorithm.
2. Estimating The Target Area
Since the weight value of a pixel in the target candidate region represents the probability that it belongs to the target, the sum of the weights of all pixels, i.e., the zeroth order moment, can be considered as the weighted area of the target in the target candidate region:

$$M_{00} = \sum_{i=1}^{n_h} w_i. \qquad (2.11)$$
In mean shift tracking, the target usually lies inside a bigger target candidate region. Due to the presence of background features in the candidate region, the probabilities of the target features are smaller than those in the target model. Hence Eq. (2.8) enlarges the weights of target pixels and suppresses the weights of background pixels, so that the pixels from the target contribute more to the area estimation while the pixels from the background contribute less. On the other hand, the Bhattacharyya coefficient (Eq. (2.5)) is an indicator of the similarity between the target model q and the target candidate model p(y). A smaller Bhattacharyya coefficient means that there are more features from the background and fewer features from the target in the candidate region, and vice versa. If we take M00 as the estimate of the target area, then according to Eq. (2.11), the more the target weights are enlarged, the bigger the error of taking M00 as the target area becomes, and vice versa. Therefore, the Bhattacharyya coefficient is a good indicator of how reliable it is to take M00 as the target area. We propose the following equation to estimate the target area A:

$$A = c(\rho)\, M_{00} \qquad (2.12)$$
where c(ρ) is a monotonically increasing function of the Bhattacharyya coefficient ρ (0 ≤ ρ ≤ 1). Here we choose an exponential function as c(ρ) based on our experimental experience:

$$c(\rho) = \exp\!\left(\frac{\rho - 1}{\sigma}\right) \qquad (2.13)$$

where σ is a positive constant.
From Eqs. (2.12) and (2.13) we can see that when ρ approaches the upper bound 1, i.e., when the target candidate model approaches the target model, c(ρ) approaches 1, and in this case it is more reliable to use M00 as the estimate of the target area. When ρ decreases, i.e., the candidate model deviates from the target model, M00 will be much bigger than the target area, but c(ρ) is less than 1, so that A avoids being biased too far from the real target area. When ρ approaches 0, i.e., the tracked target gets lost, c(ρ) becomes very small, so that A is close to zero.
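A direct transcription of Eqs. (2.11)-(2.13) is sketched below. The exponential form of c(ρ) follows the description above (monotonically increasing, equal to 1 at ρ = 1 and close to 0 as ρ approaches 0); the constant sigma is an assumed tuning parameter rather than a value prescribed by the text.

```python
import numpy as np

def estimate_area(weights, rho, sigma=1.0):
    """Target area from the weight image and the Bhattacharyya coefficient.

    `weights` is the weight image {w_i} over the candidate region and `rho`
    the Bhattacharyya coefficient between target and candidate models;
    `sigma` is an assumed tuning constant for the exponential c(rho).
    """
    m00 = float(weights.sum())           # zeroth order moment, Eq. (2.11)
    c = np.exp((rho - 1.0) / sigma)      # c(1) = 1 and c(rho) -> 0 as rho -> 0
    return c * m00                       # A = c(rho) * M00, Eq. (2.12)
```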
3. The Moment Features In Mean Shift Tracking
In this sub-section, we analyze the moment features in mean shift tracking; in the next sub-section they are combined with the estimated target area to further estimate the width, height and orientation of the target. As in CAMSHIFT, we can easily calculate the moments of the weight image as follows:

$$M_{10} = \sum_{i=1}^{n_h} x_{i,1} w_i, \quad M_{01} = \sum_{i=1}^{n_h} x_{i,2} w_i, \quad M_{20} = \sum_{i=1}^{n_h} x_{i,1}^{2} w_i, \quad M_{02} = \sum_{i=1}^{n_h} x_{i,2}^{2} w_i, \quad M_{11} = \sum_{i=1}^{n_h} x_{i,1} x_{i,2} w_i \qquad (2.14)$$

where the pair (x_{i,1}, x_{i,2}) is the coordinate of pixel i in the candidate region. Comparing Eq. (2.10) with Eqs. (2.11) and (2.14), we can find that y_1 is actually the ratio of the first order moments to the zeroth order moment:

$$y_1 = \left(\frac{M_{10}}{M_{00}},\ \frac{M_{01}}{M_{00}}\right) \qquad (2.15)$$

$$(\bar{x}_1, \bar{x}_2) = \left(\frac{M_{10}}{M_{00}},\ \frac{M_{01}}{M_{00}}\right) \qquad (2.16)$$
where (x̄1, x̄2) represents the centroid of the target candidate region. The second order central moments describe the shape and orientation of an object. By using Eqs. (2.11), (2.14), (2.15) and (2.16), the second order central moments are computed as follows:

$$\mu_{20} = \frac{M_{20}}{M_{00}} - \bar{x}_1^{2}, \qquad \mu_{02} = \frac{M_{02}}{M_{00}} - \bar{x}_2^{2}, \qquad \mu_{11} = \frac{M_{11}}{M_{00}} - \bar{x}_1 \bar{x}_2. \qquad (2.17)$$
Eq. (2.17) can be rewritten as the following covariance matrix in order to estimate the width, height and orientation of the target:

$$\mathrm{Cov} = \begin{bmatrix} \mu_{20} & \mu_{11} \\ \mu_{11} & \mu_{02} \end{bmatrix}. \qquad (2.18)$$
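The moment computations of Eqs. (2.14)-(2.18) reduce to a few weighted sums over the weight image. The sketch below is a minimal illustration with assumed function and variable names; coordinates are taken in the local patch frame.

```python
import numpy as np

def weight_moments(weights):
    """Moments of the weight image and the covariance matrix of Eq. (2.18).

    Returns (M00, (x1_bar, x2_bar), Cov), where the centroid is expressed in
    local patch coordinates and Cov collects the second order central
    moments mu20, mu11 and mu02 of Eq. (2.17).
    """
    h, w = weights.shape
    x1, x2 = np.mgrid[0:h, 0:w]                      # pixel coordinates
    m00 = weights.sum()                              # Eq. (2.11)
    m10, m01 = (x1 * weights).sum(), (x2 * weights).sum()
    m20, m02 = (x1 * x1 * weights).sum(), (x2 * x2 * weights).sum()
    m11 = (x1 * x2 * weights).sum()                  # Eq. (2.14)
    x1b, x2b = m10 / m00, m01 / m00                  # centroid, Eq. (2.16)
    mu20 = m20 / m00 - x1b**2                        # Eq. (2.17)
    mu02 = m02 / m00 - x2b**2
    mu11 = m11 / m00 - x1b * x2b
    cov = np.array([[mu20, mu11], [mu11, mu02]])     # Eq. (2.18)
    return m00, (x1b, x2b), cov
```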
4. Estimating The Width, Height And Orientation Of The Target
By using the estimated area (sub-section II.B.2) and the moment features (sub-section II.B.3), the width, height and orientation of the target can be well estimated. The covariance matrix in Eq. (2.18) can be decomposed by using the singular value decomposition (SVD) [22] as follows:

$$\mathrm{Cov} = U \times S \times U^{T} = \begin{bmatrix} u_{11} & u_{12} \\ u_{21} & u_{22} \end{bmatrix} \begin{bmatrix} \lambda_1^{2} & 0 \\ 0 & \lambda_2^{2} \end{bmatrix} \begin{bmatrix} u_{11} & u_{12} \\ u_{21} & u_{22} \end{bmatrix}^{T} \qquad (2.19)$$

where λ1² and λ2² are the singular values of Cov and the columns of U give the orientations of the two principal axes of the target.
Because the weight image is a reliable density distribution function, the orientation estimate of the target provided by the matrix U is more reliable than that of CAMSHIFT. Moreover, in the CAMSHIFT algorithm, λ1 and λ2 are used directly as the width and height of the target, which is actually improper. Next, we present a new scheme to estimate the width and height of the target more accurately.
Suppose that the target is represented by an ellipse whose semi-major and semi-minor axis lengths are denoted by a and b, respectively. Instead of using λ1 and λ2 directly as the width a and height b, it has been shown that the ratio of λ1 to λ2 well approximates the ratio of a to b, i.e., λ1/λ2 ≈ a/b. Thus we can set a = kλ1 and b = kλ2, where k is a scale factor. Since we have estimated the target area A, there is πab = π(kλ1)(kλ2) = A. It can then easily be derived that

$$k = \sqrt{\frac{A}{\pi \lambda_1 \lambda_2}} \qquad (2.20)$$

$$a = \lambda_1 \sqrt{\frac{A}{\pi \lambda_1 \lambda_2}}, \qquad b = \lambda_2 \sqrt{\frac{A}{\pi \lambda_1 \lambda_2}}. \qquad (2.21)$$
Now the covariance matrix becomes

$$\mathrm{Cov} = U \begin{bmatrix} a^{2} & 0 \\ 0 & b^{2} \end{bmatrix} U^{T}. \qquad (2.22)$$
The adjustment of the covariance matrix Cov in Eq. (2.22) is a key step of the proposed algorithm. It should be noted that the EM-like algorithm by Zivkovic and Krose [11] iteratively estimates the covariance matrix in each frame based on the mean shift tracking algorithm. Unlike the EM-like algorithm, our algorithm combines the area of the target, i.e., A, with the covariance matrix to estimate the width, height and orientation of the target.
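The width, height and orientation estimation of Eqs. (2.19)-(2.22) can be sketched as follows. Since Cov is symmetric and positive semi-definite, its SVD coincides with its eigen-decomposition, so numpy.linalg.svd gives U and the singular values λ1², λ2² directly; the function name and return convention are illustrative assumptions.

```python
import numpy as np

def scale_and_orientation(cov, area):
    """Semi-axes, orientation and adjusted covariance (Eqs. (2.19)-(2.22)).

    `cov` is the covariance matrix of Eq. (2.18) and `area` the estimated
    target area A. Returns the semi-axes (a, b), the orientation matrix U
    and the adjusted covariance matrix of Eq. (2.22).
    """
    u_mat, s, _ = np.linalg.svd(cov)                 # Cov = U diag(l1^2, l2^2) U^T
    l1, l2 = np.sqrt(np.maximum(s, 1e-12))           # principal standard deviations
    k = np.sqrt(area / (np.pi * l1 * l2))            # pi * (k*l1) * (k*l2) = A, Eq. (2.20)
    a, b = k * l1, k * l2                            # Eq. (2.21)
    cov_adj = u_mat @ np.diag([a**2, b**2]) @ u_mat.T  # Eq. (2.22)
    return (a, b), u_mat, cov_adj
```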
5. Determining The Candidate Region In Next Frame
Once the location, scale and orientation of the target have been estimated in the current frame, we need to determine the target candidate region in the next frame. Based on Eq. (2.22), we define the following covariance matrix to represent the size of the target candidate region in the next frame:

$$\mathrm{Cov}_2 = U \begin{bmatrix} (a + \Delta d)^{2} & 0 \\ 0 & (b + \Delta d)^{2} \end{bmatrix} U^{T} \qquad (2.23)$$
where Δd is the increment of the target candidate region in the next frame. The initial target candidate region in the next frame is then defined by the following ellipse region:

$$\big\{\, x : (x - y_1)^{T}\, \mathrm{Cov}_2^{-1}\, (x - y_1) \le 1 \,\big\}$$

where y_1 is the estimated target location in the current frame.
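A sketch of the candidate-region construction of Eq. (2.23) and of the ellipse membership test is given below; the default increment delta_d is an assumed value, not one prescribed by the text.

```python
import numpy as np

def next_candidate_region(u_mat, a, b, delta_d=5.0):
    """Covariance matrix of the candidate region in the next frame (Eq. (2.23)).

    The estimated ellipse is enlarged by `delta_d` pixels along both
    principal axes; the default value of `delta_d` is an assumption.
    """
    return u_mat @ np.diag([(a + delta_d)**2, (b + delta_d)**2]) @ u_mat.T

def in_candidate_region(x, y1, cov2):
    """True if pixel x lies inside the ellipse (x - y1)^T Cov2^{-1} (x - y1) <= 1."""
    d = np.asarray(x, dtype=float) - np.asarray(y1, dtype=float)
    return float(d @ np.linalg.inv(cov2) @ d) <= 1.0
```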
6. Implementation Of The MMST Algorithm
Based on the above analyses in sub-sections II.B.1 ~ II.B.5, the scale and orientation of the target can be estimated, and a scale and orientation adaptive mean shift tracking algorithm, i.e., the MMST algorithm, can then be developed. The implementation of the whole algorithm is summarized as follows.
Algorithm of Modified Mean Shift Tracking (MMST)
1) Initialization: calculate the target model q and initialize the position y0 of the target candidate model in the previous frame.
2) Initialize the iteration number k ← 0.
3) Calculate the target candidate model p(y0) in the current frame.
4) Calculate the weights {wi} according to Eq. (2.8).
5) Calculate the new position y1 of the target candidate region according to Eq. (2.10).
6) Let d ← ‖y1 − y0‖ and y0 ← y1. Given the error threshold ε and the maximum iteration number N, if d < ε or k ≥ N, go to step 7; otherwise set k ← k + 1 and go to step 3.
7) Estimate the target area A by Eqs. (2.11)-(2.13), compute the covariance matrix by Eqs. (2.14)-(2.18), and estimate the width, height and orientation of the target by Eqs. (2.19)-(2.22).
8) Determine the initial target candidate region in the next frame by Eq. (2.23), read the next frame and go to step 2.
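As a compact illustration of the summarized procedure, the following self-contained Python/NumPy sketch performs one MMST frame update: it iterates the mean shift location update to convergence and then estimates the area, width, height and orientation from the weight image of the converged window. It is a simplified sketch rather than the authors' code: the elliptical candidate region of Eq. (2.23) is approximated by an axis-aligned window, boundary handling is omitted, and the constants sigma, delta_d, eps and max_iter are assumed values.

```python
import numpy as np

def histogram(patch, n_bins=16):
    """Epanechnikov-weighted joint RGB histogram and per-pixel bin indices."""
    h, w = patch.shape[:2]
    y, x = np.mgrid[0:h, 0:w]
    k = np.clip(1.0 - ((y - h / 2.0) / (h / 2.0))**2
                    - ((x - w / 2.0) / (w / 2.0))**2, 0.0, None)
    q = (patch.astype(np.int64) * n_bins) // 256
    bins = (q[..., 0] * n_bins + q[..., 1]) * n_bins + q[..., 2]
    hist = np.bincount(bins.ravel(), weights=k.ravel(), minlength=n_bins**3)
    return hist / hist.sum(), bins

def mmst_frame(frame, q_model, center, half_size, sigma=1.0, delta_d=5.0,
               eps=0.5, max_iter=20, n_bins=16):
    """One MMST frame update: mean shift location, then scale and orientation.

    `half_size` = (half-height, half-width) of an axis-aligned candidate
    window (a simplification of the ellipse of Eq. (2.23)). Returns the new
    centre, the semi-axes (a, b), the orientation matrix U and the half-size
    of the candidate window for the next frame.
    """
    center = np.asarray(center, dtype=float)
    hh, hw = half_size
    for _ in range(max_iter):
        # candidate window around the current centre (boundary handling omitted)
        r0, c0 = int(center[0]) - hh, int(center[1]) - hw
        patch = frame[r0:r0 + 2 * hh, c0:c0 + 2 * hw]
        p_cand, bins = histogram(patch, n_bins)
        w = np.sqrt(q_model / np.maximum(p_cand, 1e-12))[bins]      # Eq. (2.8)
        y, x = np.mgrid[0:patch.shape[0], 0:patch.shape[1]]
        m00 = w.sum()                                               # Eq. (2.11)
        new_center = (np.array([(y * w).sum(), (x * w).sum()]) / m00
                      + np.array([r0, c0]))                         # Eq. (2.10)
        done = np.linalg.norm(new_center - center) < eps
        center = new_center
        if done:
            break
    # scale and orientation from the weight image of the last evaluated window
    rho = np.sum(np.sqrt(p_cand * q_model))                         # Eq. (2.5)
    area = np.exp((rho - 1.0) / sigma) * m00                        # Eqs. (2.12)-(2.13)
    x1b, x2b = (y * w).sum() / m00, (x * w).sum() / m00
    mu20 = (y * y * w).sum() / m00 - x1b**2                         # Eq. (2.17)
    mu02 = (x * x * w).sum() / m00 - x2b**2
    mu11 = (y * x * w).sum() / m00 - x1b * x2b
    cov = np.array([[mu20, mu11], [mu11, mu02]])                    # Eq. (2.18)
    u_mat, s, _ = np.linalg.svd(cov)                                # Eq. (2.19)
    l1, l2 = np.sqrt(np.maximum(s, 1e-12))
    a, b = np.sqrt(area / (np.pi * l1 * l2)) * np.array([l1, l2])   # Eqs. (2.20)-(2.21)
    next_half = (int(np.ceil(a + delta_d)), int(np.ceil(b + delta_d)))
    return center, (a, b), u_mat, next_half
```

In a full tracker this function would be called once per frame, feeding the returned centre and window half-size into the call for the next frame.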
EXPERIMENTAL RESULTS AND DISCUSSIONS
This section evaluates the developed MMST algorithm in comparison with the original mean shift algorithm (i.e., mean shift tracking with a fixed scale), the adaptive scale algorithm [9] and the EM-shift algorithm [11, 25]. The adaptive scale algorithm and the EM-shift algorithm are two representative schemes for addressing the scale and orientation changes of the target under the mean shift framework. Because the weight image estimated by CAMSHIFT is not reliable, it is prone to errors in estimating the scale and orientation of the object, so CAMSHIFT is not included in the experiments.
We selected the RGB color space as the feature space, and it was quantized into 16×16×16 bins for a fair comparison between the different algorithms. It should be noted that other color spaces, such as the HSV color space, can also be used in MMST. Four real video sequences are used in the experiments.
A. Experiments On Real Video Sequences
The developed MMST algorithm is tested on four real video sequences. The first video is a torch sequence recorded indoors (Figure 3.1), in which the object clearly undergoes scale and orientation changes. To show the efficiency of the developed MMST algorithm, Figure 3.1 shows frames 20, 40 and 80. The second video is a palm sequence (Figure 3.2) of 26 frames, in which the object again clearly changes scale and orientation; the target scale and orientation estimated by the MMST algorithm are accurate.
|
|
|
Fig. 3.3: Tracking results on the car sequence by different tracking algorithms. Frames 15, 40, 60 and 75 are displayed.
The last experiment is on a card reader sequence, which is challenging because the object is small and undergoes scale and orientation changes. The object exhibits large scale changes with partial occlusion. The MMST scheme works much better in estimating the scale and orientation of the target.
|
|
Table 1 lists the average numbers of iterations of the different schemes on the four video sequences. The average number of iterations of the developed MMST is approximately equal to that of the original mean shift algorithm with fixed scale. The iteration number of the adaptive scale algorithm is the highest because it runs the mean shift algorithm three times. The main factor affecting the convergence speed of the EM-shift and MMST algorithms is the computation of the covariance matrix: EM-shift estimates it in each iteration, while MMST estimates it only once per frame, so MMST is faster than EM-shift. In general, the developed MMST algorithm, which is motivated by the CAMSHIFT algorithm [6], extends the mean shift algorithm to the case where the target has large scale and orientation variations. It inherits the simplicity and effectiveness of the original mean shift algorithm while being adaptive to the scale and orientation changes of the target.
CONCLUSIONS
By analyzing the moment features of the weight image of the target candidate region and the Bhattacharyya coefficient, we developed a modified mean shift tracking (MMST) algorithm that adapts to scale and orientation. It effectively solves the problem of robustly estimating the scale and orientation changes of the target under the mean shift tracking framework.
The weight of a pixel in the candidate region represents its probability of belonging to the target, while the zeroth order moment of the weight image can represent the weighted area of the candidate region. By using the zeroth order moment and the Bhattacharyya coefficient between the target model and the candidate model, a simple and effective method to estimate the target area was proposed. A new approach, based on the area of the target and the corrected second order central moments, was then proposed to adaptively estimate the width, height and orientation changes of the target.
The developed MMST method inherits the merits of mean shift tracking, such as simplicity, efficiency and robustness. Extensive experiments were performed, and the results showed that MMST can reliably track objects with scale and orientation changes, which is difficult for other state-of-the-art schemes. In future research, we will focus on how to detect and use the true shape of the target, instead of an ellipse or rectangle model, for more robust tracking.
References
- Kailath T.: ‘The Divergence and Bhattacharyya Distance Measures in Signal Selection’, IEEE Trans. Communication Technology, 1967, 15, (1), pp. 52-60.
- Fukunaga K., Hostetler L. D.: ‘The Estimation of the Gradient of a Density Function, with Applications in Pattern Recognition’, IEEE Trans. Information Theory, 1975, 21, (1), pp. 32-40.
- Cheng Y.: ‘Mean Shift, Mode Seeking, and Clustering’, IEEE Trans. Pattern Anal. Machine Intell., 1995, 17, (8), pp. 790-799.
- Mukundan R., Ramakrishnan K. R.: ‘Moment Functions in Image Analysis: Theory and Applications’, World Scientific, Singapore, 1996.
- Wren C., Azarbayejani A., Darrell T., Pentland A.: ‘Pfinder: Real-Time Tracking of the Human Body’, IEEE Trans. Pattern Anal. Machine Intell., 1997, 19, (7), pp. 780-785.
- Bradski G.: ‘Computer Vision Face Tracking for Use in a Perceptual User Interface’, Intel Technology Journal, 1998, 2(Q2), pp. 1-15.
- Comaniciu D., Ramesh V., Meer P.: ‘Real-Time Tracking of Non-Rigid Objects Using Mean Shift’. Proc. IEEE Conf. on Computer Vision and Pattern Recognition, Hilton Head, SC, June, 2000, vol. 2, pp. 142-149.
- Comaniciu D., Meer P.: ‘Mean Shift: a Robust Approach toward Feature Space Analysis’, IEEE Trans. Pattern Anal. Machine Intell., 2002, 24, (5), pp. 603-619.
- Comaniciu D., Ramesh V., Meer P.: ‘Kernel-Based Object Tracking’, IEEE Trans. Pattern Anal. Machine Intell., 2003, 25, (5), pp. 564-577.
- Collins R.: ‘Mean-Shift Blob Tracking through Scale Space’, Proc. IEEE Conf. Computer Vision and Pattern Recognition, Wisconsin, USA, 2003, pp. 234-240.
- Zivkovic Z., Krose B.: ‘An EM-like Algorithm for Color-Histogram-Based Object Tracking’, Proc. IEEE Conf. Computer Vision and Pattern Recognition, Washington, DC, USA, 2004, vol.1, pp. 798-803.
- Yang C., Ramani D., Davis L.: ‘Efficient Mean-Shift Tracking via a New Similarity Measure’, Proc. IEEE Conf. Computer Vision and Pattern Recognition, San Diego, CA, 2005, vol. 1, pp.176-183.
- Fashing M., Tomasi C.: ‘Mean Shift is a Bound Optimization’, IEEE Trans. Pattern Anal. Machine Intell., 2005, 27, (3), pp. 471-474.
- Yilmaz A., Javed O., Shah M.: ‘Object Tracking: a Survey’, ACM Computing Surveys, 2006, 38, (4), Article 13.
- Carreira-Perpinan M. A.: ‘Gaussian Mean-Shift is an EM Algorithm’, IEEE Trans. Pattern Anal. Machine Intell., 2007, 29, (5), pp. 767-776.
- Birchfield S., Rangarajan S.: ‘Spatiograms versus Histograms for Region-Based Tracking’, Proc. IEEE Conf. on Computer Vision and Pattern Recognition, 2005, vol. 2, pp. 1158-1163.
- Hu J., Juan C., Wang J.: ‘A spatial-color mean-shift object tracking algorithm with scale and orientation estimation’, Pattern Recognition Letters, 2008, 29, (16), pp. 2165-2173.
- Srikrishnan V., Nagaraj T., Chaudhuri S.: ‘Fragment Based Tracking for Scale and Orientation Adaption’, Proc. Indian Conf. on Computer Vision, Graphics & Image Processing, 2008, pp. 328-335.
- Lindeberg T.: ‘Feature Detection with Automatic Scale Selection’, International Journal of Computer Vision, 1998, 30, (2), pp. 79-116.
- Bretzner L., Lindeberg T.: ‘Qualitative Multi-Scale Feature Hierarchies for Object Tracking’, Journal of Visual Communication and Image Representation, 2000, 11, (2), pp.115-129.
- Nummiaro K., Koller-Meier E., Gool L. V.: ‘An Adaptive Color-Based Particle Filter’, Image and Vision Computing, 2003, 21, (1), pp. 99-110.
- Horn R. A., Johnson C. R.: ‘Topics in Matrix Analysis’, Cambridge University Press, U.K., 1991.
- Quast K., Kaup A.: ‘Scale and Shape adaptive Mean Shift Object Tracking in Video Sequences’, Proc. European Signal Processing Conference, Glasgow, Scotland, 2009, pp. 1513-1517.