Video Analysis using Absorbing Markov Chain
- Date of degree conferral: August 2017
- Department and major: Department of Computer Science and Engineering, Graduate School
- Copyright: Theses of Pohang University of Science and Technology (POSTECH) are protected by copyright.
- Abstract
- Nowadays, a tremendous number of videos are captured, and consequently the demand for automatic video analysis is increasing. Video analysis refers to computer vision techniques that detect and recognize interesting activities or objects in videos. In this thesis, we focus on three challenging problems of video analysis: 1) co-activity detection, 2) salient object detection, and 3) visual object tracking and segmentation. Co-activity detection is the task of extracting, from each of multiple videos, one or more streaks of frames containing a common activity, without a separate training procedure. Salient object detection refers to the task of identifying regions that stand out from their neighborhood and draw the attention of the human visual system. Object tracking and segmentation aims to segment a target object, defined in the first frame, sequentially in the succeeding frames.
The Absorbing Markov Chain (AMC) has been applied successfully to several computer vision problems, including image matching, image segmentation, and image saliency detection. However, it is not straightforward to extend the idea to video analysis, since image-level models typically have trouble handling the various spatio-temporal challenges that videos involve, and there are several critical design issues related to AMC. In this thesis, we propose simple but effective algorithms that use AMC to tackle video analysis problems.
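For reference, the standard absorbing-chain quantities on which all three algorithms rely can be written in textbook form. The notation below is ours, since the abstract gives no equations: the transition matrix P is put in canonical form with the transient states ordered before the absorbing ones.

```latex
P = \begin{pmatrix} Q & R \\ \mathbf{0} & I \end{pmatrix}, \qquad
N = (I - Q)^{-1} = \sum_{k=0}^{\infty} Q^{k}, \qquad
\mathbf{t} = N \mathbf{1}, \qquad
B = N R
```

Here Q holds the transient-to-transient and R the transient-to-absorbing transition probabilities, N_ij is the expected number of visits to transient state j when starting from transient state i, t_i is the expected number of steps before absorption (the absorption time), and B_ij is the probability of ending in absorbing state j. The series for N converges whenever every transient state can reach some absorbing state.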
First, we propose an unsupervised learning algorithm, formulated in a principled way using AMC, to detect a common activity (co-activity) in a set of videos. Our algorithm first constructs a complete multipartite graph, where vertices represent subsequences extracted from the videos with a temporal sliding window and edges connect pairs of vertices originating from different videos; the weight of an edge is proportional to the similarity between the features of its two end vertices. Then, we extend the graph structure by adding edges between temporally overlapping subsequences within a video, which exploits temporal locality to handle co-activities of variable length, and we create an absorbing vertex connected from all other vertices. The proposed algorithm identifies a subset of subsequences as the co-activity by efficiently estimating absorption times in the constructed graph. The main advantages of our algorithm are that it naturally handles more than two videos and can identify multiple instances of a co-activity, with variable lengths, within a single video. Our algorithm is evaluated extensively on challenging datasets and shows outstanding performance both quantitatively and qualitatively.
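The co-activity pipeline described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the thesis implementation: the Gaussian kernel used as the similarity, reusing that kernel for the temporal-overlap edges, the constant weight `absorb_weight` on every edge into the absorbing vertex, and the rule that keeps subsequences whose absorption time exceeds the mean are all our own choices, and every function name is hypothetical. A small Gauss-Jordan solver stands in for a linear-algebra library so the sketch has no dependencies.

```python
import math
from typing import List, Sequence, Tuple

# One sliding-window subsequence: (video_id, start_frame, end_frame, feature_vector).
Subsequence = Tuple[int, int, int, Sequence[float]]


def absorption_times(weights: List[List[float]], absorb_weights: List[float]) -> List[float]:
    """Expected steps before absorption, t = (I - Q)^{-1} 1, for each transient vertex.

    weights[i][j]: edge weight between transient vertices i and j.
    absorb_weights[i]: total edge weight from vertex i into the absorbing vertex.
    Assumes every row has positive total weight and absorption is reachable from every vertex.
    """
    n = len(weights)
    # Row-normalise into transition probabilities; Q keeps only transient-to-transient moves.
    q = []
    for i, row in enumerate(weights):
        total = sum(row) + absorb_weights[i]
        q.append([w / total for w in row])
    # Solve (I - Q) t = 1 by Gauss-Jordan elimination with partial pivoting.
    m = [[(1.0 if i == j else 0.0) - q[i][j] for j in range(n)] + [1.0] for i in range(n)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(n):
            if r != col and m[r][col] != 0.0:
                f = m[r][col] / m[col][col]
                m[r] = [a - f * b for a, b in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def similarity(f: Sequence[float], g: Sequence[float], sigma: float) -> float:
    # Hypothetical Gaussian kernel on squared Euclidean feature distance.
    d2 = sum((a - b) ** 2 for a, b in zip(f, g))
    return math.exp(-d2 / sigma ** 2)


def coactivity_scores(subseqs: List[Subsequence], sigma: float = 1.0,
                      absorb_weight: float = 0.1) -> List[float]:
    n = len(subseqs)
    w = [[0.0] * n for _ in range(n)]
    for i, (vi, si, ei, fi) in enumerate(subseqs):
        for j, (vj, sj, ej, fj) in enumerate(subseqs):
            if i == j:
                continue
            cross_video = vi != vj                        # complete multipartite edges
            overlaps = vi == vj and si < ej and sj < ei   # temporal-locality edges
            if cross_video or overlaps:
                w[i][j] = similarity(fi, fj, sigma)
    # A single absorbing vertex, connected from every vertex with an assumed constant weight.
    return absorption_times(w, [absorb_weight] * n)


def select_coactivity(subseqs: List[Subsequence], **kwargs) -> List[int]:
    # Assumed selection rule: keep subsequences absorbed more slowly than average.
    t = coactivity_scores(subseqs, **kwargs)
    mean_t = sum(t) / len(t)
    return [i for i, ti in enumerate(t) if ti > mean_t]


if __name__ == "__main__":
    toy = [(0, 0, 10, [1.0, 0.0]), (0, 5, 15, [1.0, 0.1]), (0, 20, 30, [5.0, 5.0]),
           (1, 0, 10, [0.9, 0.0]), (1, 10, 20, [-5.0, 3.0]), (1, 30, 40, [1.0, 0.05])]
    print(select_coactivity(toy))  # [0, 1, 3, 5]
```

Intuitively, a subsequence that resembles many subsequences in other videos spreads its outgoing probability over transient vertices, so a random walk leaks into the absorbing vertex more slowly and its absorption time grows. In the toy example, the two outliers (indices 2 and 4) are absorbed almost immediately, with absorption time close to 1.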
Second, we propose an algorithm to detect salient objects in videos that propagates spatio-temporal visual saliency using AMC, as a natural extension of existing techniques in the image domain. The proposed algorithm consists of two steps, 1) motion saliency estimation and 2) appearance saliency estimation: the absorbing vertices of the Markov chain at the previous frame are first determined from motion information, and the salient superpixels are then identified using an AMC defined in the spatio-temporal domain. Our algorithm is evaluated on four independent datasets and achieves outstanding performance compared to state-of-the-art techniques at a lower computational cost.
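A minimal sketch of one propagation step follows. It assumes, as in image-domain AMC saliency, that a superpixel's absorption time serves as its saliency score. Further assumptions not stated in the abstract: appearance is reduced to a scalar colour per superpixel, previous-frame superpixels whose motion magnitude is below `motion_thresh` become the absorbing (background) vertices, every current-frame superpixel is linked to every such background vertex, edge weights use a Gaussian kernel with bandwidth `sigma`, and absorption times are min-max normalised to [0, 1]. All names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def absorption_times(weights: List[List[float]], absorb_weights: List[float]) -> List[float]:
    """t = (I - Q)^{-1} 1; assumes every row has positive total weight."""
    n = len(weights)
    q = []
    for i, row in enumerate(weights):
        total = sum(row) + absorb_weights[i]
        q.append([w / total for w in row])
    # Gauss-Jordan elimination with partial pivoting on [I - Q | 1].
    m = [[(1.0 if i == j else 0.0) - q[i][j] for j in range(n)] + [1.0] for i in range(n)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(n):
            if r != col and m[r][col] != 0.0:
                f = m[r][col] / m[col][col]
                m[r] = [a - f * b for a, b in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def frame_saliency(cur_colors: Sequence[float], cur_edges: Sequence[Tuple[int, int]],
                   prev_colors: Sequence[float], prev_motion: Sequence[float],
                   motion_thresh: float = 0.5, sigma: float = 0.5) -> List[float]:
    """Hypothetical sketch: saliency of current-frame superpixels via a spatio-temporal AMC."""
    def sim(a: float, b: float) -> float:
        return math.exp(-((a - b) ** 2) / sigma ** 2)

    n = len(cur_colors)
    w = [[0.0] * n for _ in range(n)]
    # Spatial edges between adjacent superpixels of the current frame (transient vertices).
    for i, j in cur_edges:
        w[i][j] = w[j][i] = sim(cur_colors[i], cur_colors[j])
    # Motion step: nearly static previous-frame superpixels become absorbing vertices.
    # Assumes at least one such superpixel exists.
    absorbing = [c for c, mv in zip(prev_colors, prev_motion) if mv < motion_thresh]
    absorb_w = [sum(sim(c, a) for a in absorbing) for c in cur_colors]
    # Appearance step: superpixels unlike the background take longer to be absorbed.
    t = absorption_times(w, absorb_w)
    lo, hi = min(t), max(t)
    return [(x - lo) / (hi - lo) if hi > lo else 0.0 for x in t]


if __name__ == "__main__":
    s = frame_saliency([0.1, 0.15, 0.9, 0.85], [(0, 1), (1, 2), (2, 3)],
                       [0.1, 0.12, 0.9], [0.0, 0.05, 1.0])
    print([round(x, 2) for x in s])
```

On this toy frame, the two superpixels whose colour differs from the static previous-frame background score close to 1, and the two background-coloured ones stay near 0. A practical system would likely restrict temporal edges to spatially nearby superpixels and use richer colour and motion features.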
Lastly, we propose a simple but effective superpixel-based tracking-by-segmentation algorithm using AMC, where the target state is estimated by a combination of bottom-up and top-down approaches, and the target segmentation is propagated to subsequent frames recursively. In this framework, we employ AMC in a supervised manner, whereas the algorithms proposed for the previous two video analysis tasks are unsupervised. Our algorithm constructs a graph for AMC from the superpixels extracted in two consecutive frames, where background superpixels in the previous frame correspond to absorbing vertices and all other superpixels form transient ones. The weight of each edge depends on the similarity between the scores of its two end superpixels, which are given by support vector regression. Once the graph is constructed, the target segmentation is estimated from the absorption time of each superpixel. The proposed tracking algorithm achieves substantially better performance than state-of-the-art segmentation-based tracking techniques on multiple challenging datasets.
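One frame of the tracking-by-segmentation step can be sketched as follows. No SVR is trained here: the per-superpixel scores are passed in directly, standing in for the support vector regression outputs mentioned above. The Gaussian kernel on score differences, the externally supplied neighbour list `edges`, and the 0.5 threshold on min-max-normalised absorption time are assumptions, and all function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def absorption_times(weights: List[List[float]], absorb_weights: List[float]) -> List[float]:
    """t = (I - Q)^{-1} 1; assumes every transient vertex has at least one edge."""
    n = len(weights)
    q = []
    for i, row in enumerate(weights):
        total = sum(row) + absorb_weights[i]
        q.append([w / total for w in row])
    # Gauss-Jordan elimination with partial pivoting on [I - Q | 1].
    m = [[(1.0 if i == j else 0.0) - q[i][j] for j in range(n)] + [1.0] for i in range(n)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(n):
            if r != col and m[r][col] != 0.0:
                f = m[r][col] / m[col][col]
                m[r] = [a - f * b for a, b in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def segment_target(prev_scores: Sequence[float], prev_is_bg: Sequence[bool],
                   cur_scores: Sequence[float], edges: Sequence[Tuple[int, int]],
                   sigma: float = 1.0, thresh: float = 0.5) -> List[bool]:
    """Hypothetical sketch: foreground mask for current-frame superpixels.

    Vertices 0..P-1 are previous-frame superpixels, P..P+C-1 current-frame ones.
    Scores stand in for SVR outputs; edges are assumed neighbour pairs within and across frames.
    """
    scores = list(prev_scores) + list(cur_scores)
    p = len(prev_scores)
    absorbing = {i for i in range(p) if prev_is_bg[i]}   # previous-frame background
    transient = [i for i in range(len(scores)) if i not in absorbing]
    idx = {node: k for k, node in enumerate(transient)}
    n = len(transient)
    w = [[0.0] * n for _ in range(n)]
    absorb_w = [0.0] * n
    for u, v in edges:
        weight = math.exp(-((scores[u] - scores[v]) ** 2) / sigma ** 2)
        for a, b in ((u, v), (v, u)):
            if a in absorbing:
                continue  # absorbing vertices have no outgoing transitions
            if b in absorbing:
                absorb_w[idx[a]] += weight
            else:
                w[idx[a]][idx[b]] += weight
    t = absorption_times(w, absorb_w)
    cur_t = [t[idx[p + k]] for k in range(len(cur_scores))]
    lo, hi = min(cur_t), max(cur_t)
    # Slowly absorbed superpixels are labelled as target.
    return [hi > lo and (x - lo) / (hi - lo) > thresh for x in cur_t]


if __name__ == "__main__":
    mask = segment_target([-1.0, -0.9, 1.0], [True, True, False], [-0.8, 0.9, 0.95],
                          [(0, 3), (1, 3), (2, 4), (2, 5), (3, 4), (4, 5), (1, 2)])
    print(mask)  # [False, True, True]
```

Because previous-frame background superpixels are absorbing, a current superpixel whose score resembles the background is absorbed within a few steps, while superpixels tied to the previous target keep the walk among transient vertices much longer and end up in the mask.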