Fiducial Marker Tracking Using Machine Vision
1 .Fiducial Marker Tracking Using Machine Vision • Saurabh Ghanekar, Kavi Global • Kazutaka Takahashi, University of Chicago #AISAIS14
2 .Outline • Motivation & Goals • Approach • Results • Next Steps
3 .Motivation • Feeding is a highly complex, life-sustaining behavior, essential for survival in all species • Certain neurological conditions, such as Parkinson’s disease, ALS, and stroke, can cause difficulty in chewing and swallowing, known as dysphagia • Dysphagia affects quality of life • Dysphagia can lead to malnutrition, dehydration, and aspiration
4 .End Goal • To characterize feeding dynamics and gain insights into the changes in feeding behavior caused by certain neurological conditions and by changes in the oral environment.
5 .Current State • Study focused on rodents • X-ROMM videos of rodents feeding on kibble • Videos recorded from 2 camera angles simultaneously • Radio-opaque markers implanted in skull, mandible, tongue • Movement of markers needs to be tracked and quantified • Marker tracking process is extremely tedious as it is done using manual, frame-by-frame methods [1,2] • Consumes valuable time, thus delaying further research
6 .Immediate Goal • A near-automated, deep learning-based solution for detecting and tracking markers, resulting in a more efficient and robust process (Figure (c) Bunyak et al., 2017 [1])
7 .Approach: Key Steps • Data In: Read in videos frame by frame for left and right cameras in 2D (x,y) • Head and Marker Detection: Utilize a neural network to identify the bounding box of the head and also pinpoint unlabeled markers inside the bounding box • Marker Tracking: Employ Kalman filters along with the Hungarian algorithm to keep track of markers from frame to frame • Sequence Matching: Match sequence tracks from left and right cameras • 2D to 3D Conversion: Feed 2D left and right coordinates, along with rotation matrices and translation vectors, to get final 3D coordinates (x,y,z)
8 .Data Description • 13 pairs of videos (left & right camera) available for training • 720px by 1260px videos, recorded at 250 fps, ~10 seconds each • Head and marker coordinates per frame used for model training & evaluation • 18-20 markers to be tracked in each video (Figure: sample frames from Camera 1 and Camera 2)
9 .Head and Marker Detection • TensorFlow Object Detection API • Single Shot MultiBox Detector (SSD) with MobileNet, using transfer learning from the MS COCO dataset • Key model parameters: – Initial Learning Rate: 0.0004 – Feature Extractor Type: ssd_mobilenet_v1 – Minimum Depth: 16 – Depth Multiplier: 1.0 – conv_hyperparams: activation: RELU_6; regularizer: l2_regularizer; weight: 0.00004
10 .Head and Marker Detection
11 .Marker Tracking • Multi-object tracking involves three key components: • Predicting the object location in the next frame • Associating predictions with existing objects • Track management (Figure (c) Howe and Holcombe, 2012 [3])
12 .Prediction • A Kalman filter is used to predict each marker's location in the next frame • The position is estimated recursively in each frame, based on previous frames • The filter uses Bayesian inference to estimate a joint probability distribution over the state variables • It starts from an initial velocity estimate and covariance matrix
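The predict/update cycle described on this slide can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes a constant-velocity state `[x, y, vx, vy]`, a one-frame time step, and made-up noise values; the function names `make_cv_model`, `predict`, and `update` are hypothetical.

```python
import numpy as np

def make_cv_model(dt=1.0):
    """Constant-velocity model (an assumption). State is [x, y, vx, vy]."""
    F = np.array([[1, 0, dt, 0],
                  [0, 1, 0, dt],
                  [0, 0, 1, 0],
                  [0, 0, 0, 1]], dtype=float)   # state transition
    H = np.array([[1, 0, 0, 0],
                  [0, 1, 0, 0]], dtype=float)   # only (x, y) is observed
    return F, H

def predict(x, P, F, Q):
    """Propagate state mean and covariance one frame ahead."""
    return F @ x, F @ P @ F.T + Q

def update(x_pred, P_pred, z, H, R):
    """Correct the prediction with a detected marker position z."""
    innovation = z - H @ x_pred
    S = H @ P_pred @ H.T + R                    # innovation covariance
    K = P_pred @ H.T @ np.linalg.inv(S)         # Kalman gain
    x_new = x_pred + K @ innovation
    P_new = (np.eye(len(x_pred)) - K @ H) @ P_pred
    return x_new, P_new

# Illustrative values only: initial position, velocity, and covariances.
F, H = make_cv_model(dt=1.0)
Q = 0.01 * np.eye(4)                            # process noise
R = 1.0 * np.eye(2)                             # measurement noise
x0 = np.array([100.0, 200.0, 2.0, -1.0])
P0 = 10.0 * np.eye(4)

x_pred, P_pred = predict(x0, P0, F, Q)
x_est, P_est = update(x_pred, P_pred, np.array([102.5, 198.7]), H, R)
```

The corrected position lands between the prediction and the detection, weighted by their relative uncertainties, which is why the initial covariance matters for the first few frames.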
13 .Association • After prediction, an assignment cost matrix is computed from the bounding-box intersection-over-union (IoU) • The Hungarian algorithm is used to optimally associate markers • Example cost matrix (rows: predicted marker positions; columns: true marker positions):
               True 0   True 1   True 2   ...
Predicted 0      518      101      312
Predicted 1       24      963      225
Predicted 2      872       20      220
...
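A hedged sketch of the association step: it builds a cost matrix as `1 - IoU` (an assumed cost definition) and solves the assignment with SciPy's `linear_sum_assignment`, which solves the same linear assignment problem that the Hungarian algorithm solves. The box format `(x1, y1, x2, y2)`, the helper names, and the threshold value are assumptions for illustration.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

def associate(predicted, detected, iou_threshold=0.3):
    """Return (predicted_index, detected_index) pairs with IoU >= threshold."""
    if not predicted or not detected:
        return []
    cost = np.array([[1.0 - iou(p, d) for d in detected] for p in predicted])
    rows, cols = linear_sum_assignment(cost)    # minimizes total cost
    # Pairs whose overlap is too small are left unassigned.
    return [(int(r), int(c)) for r, c in zip(rows, cols)
            if 1.0 - cost[r, c] >= iou_threshold]

# Illustrative boxes: two markers moved slightly, one has no real counterpart.
predicted = [(10, 10, 20, 20), (50, 50, 60, 60), (100, 100, 110, 110)]
detected = [(51, 50, 61, 60), (11, 10, 21, 20), (300, 300, 310, 310)]
pairs = associate(predicted, detected)
```

Here the third predicted box is paired by the solver with the far-away detection, but the IoU gate drops that pair, so that track receives no detection in this frame.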
14 .Track Management • If IoU is below a set threshold, there is no assignment • Also, not all potential tracks become actual tracks • As a result, tracks may die and new ones are born • The output of Kalman filter and Hungarian algorithm can result in a large number of discontinuous tracks • These are “stitched” together by looking forward and backward a number of frames to find the best match based on closest Euclidean distance • At the end, we get one track per marker
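The stitching idea can be illustrated with a simplified greedy sketch. It is an assumption-laden stand-in, not the authors' algorithm: each track fragment is a list of `(frame, x, y)` tuples, a fragment is appended to the earlier fragment whose last point is closest in Euclidean distance, and only fragments that start within `max_gap` frames after that end are considered. The parameters `max_gap` and `max_dist` are hypothetical.

```python
import math

def stitch_tracks(tracks, max_gap=5, max_dist=15.0):
    """Greedily join track fragments; each fragment is a list of (frame, x, y)."""
    # Copy and process fragments in order of their first frame.
    fragments = sorted((list(t) for t in tracks), key=lambda t: t[0][0])
    merged = []
    for frag in fragments:
        start_frame, sx, sy = frag[0]
        best, best_dist = None, max_dist
        for cand in merged:
            end_frame, ex, ey = cand[-1]
            gap = start_frame - end_frame
            if 0 < gap <= max_gap:              # candidate ended shortly before
                dist = math.hypot(sx - ex, sy - ey)
                if dist <= best_dist:
                    best, best_dist = cand, dist
        if best is None:
            merged.append(frag)                 # starts a new stitched track
        else:
            best.extend(frag)                   # continues an existing track
    return merged

# Illustrative fragments: a/c belong to one marker, b/d to another.
a = [(0, 10, 10), (1, 11, 10), (2, 12, 10)]
b = [(0, 50, 50), (1, 50, 51)]
c = [(4, 14, 10), (5, 15, 10)]
d = [(3, 50, 53), (4, 50, 54)]
stitched = stitch_tracks([a, b, c, d])
```

This version only looks backward from each fragment's start; a fuller version would also compare against later fragments and resolve conflicts when two fragments compete for the same predecessor.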
15 .Sequence Matching • After generating marker tracks separately for each camera, corresponding tracks from the two cameras must be matched • Several approaches were tried, including time series clustering and various correlation measures • Spearman correlation on frame-to-frame changes in Y-coordinate values gave the best results (100% accuracy on manually tracked data) (Figure: predicted (x,y) for the left camera and for the right camera)
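A minimal sketch of matching by Spearman correlation on frame-to-frame changes in Y. It assumes both cameras' tracks cover the same frames and are given as plain lists of Y values keyed by hypothetical track ids; Spearman correlation is computed as the Pearson correlation of average ranks. Each left track simply takes its best-scoring right track, which is a simplification (two left tracks could pick the same right track).

```python
import math

def average_ranks(values):
    """Ranks starting at 1, with ties given their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2.0 + 1.0
        i = j + 1
    return ranks

def pearson(a, b):
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb) if va > 0 and vb > 0 else 0.0

def spearman(a, b):
    return pearson(average_ranks(a), average_ranks(b))

def y_deltas(ys):
    """Frame-to-frame changes in the Y coordinate."""
    return [b - a for a, b in zip(ys, ys[1:])]

def match_tracks(left, right):
    """Map each left track id to the right track id with the highest correlation."""
    return {lid: max(right, key=lambda rid: spearman(y_deltas(lys), y_deltas(right[rid])))
            for lid, lys in left.items()}

# Illustrative Y coordinates per frame (made-up values).
left = {"L0": [10, 12, 15, 14, 13, 16], "L1": [40, 39, 37, 38, 41, 40]}
right = {"R0": [80, 78, 74, 76, 82, 80], "R1": [20, 24, 30, 28, 26, 32]}
matches = match_tracks(left, right)
```

Working on frame-to-frame changes rather than raw coordinates removes the offset between the two views, and rank correlation tolerates the scale difference between cameras.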
16 .2D to 3D Conversion • P = K [R | T] is the camera projection matrix (3x4) for each camera – K = camera matrix (3x3) – R = rotation matrix (3x3) – T = translation vector (3x1) • Results are a good match with actual 3D coordinates
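One common way to recover a 3D point from two projection matrices of the form P = K [R | T] is linear (DLT) triangulation, sketched below. The slide does not state which triangulation method was used, so this is an assumption, and the camera parameters and the test point are invented for illustration.

```python
import numpy as np

def projection_matrix(K, R, T):
    """Build the 3x4 projection matrix P = K [R | T]."""
    return K @ np.hstack([R, T.reshape(3, 1)])

def project(P, X):
    """Project a 3D point X into pixel coordinates (u, v)."""
    x = P @ np.append(X, 1.0)
    return x[:2] / x[2]

def triangulate(P_left, P_right, pt_left, pt_right):
    """Linear (DLT) triangulation of one point seen by two cameras."""
    A = np.array([
        pt_left[0] * P_left[2] - P_left[0],
        pt_left[1] * P_left[2] - P_left[1],
        pt_right[0] * P_right[2] - P_right[0],
        pt_right[1] * P_right[2] - P_right[1],
    ])
    _, _, vt = np.linalg.svd(A)
    X = vt[-1]                                  # null-space direction of A
    return X[:3] / X[3]                         # de-homogenize

# Invented calibration: shared intrinsics, right camera rotated about Y and shifted.
K = np.array([[1000.0, 0.0, 360.0],
              [0.0, 1000.0, 630.0],
              [0.0, 0.0, 1.0]])
theta = 0.2
R_right = np.array([[np.cos(theta), 0.0, np.sin(theta)],
                    [0.0, 1.0, 0.0],
                    [-np.sin(theta), 0.0, np.cos(theta)]])
P_left = projection_matrix(K, np.eye(3), np.zeros(3))
P_right = projection_matrix(K, R_right, np.array([-0.5, 0.0, 0.1]))

X_true = np.array([0.1, -0.05, 2.0])
X_est = triangulate(P_left, P_right, project(P_left, X_true), project(P_right, X_true))
```

With noise-free projections the estimate reproduces the original point; with real detections, the matched left/right pixel coordinates would come from the sequence-matching step.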
17 .Evaluation 1: % IoU Difference • Step 1: Calculate a perfect overlap by summing the area of the boxes over each frame for each true stream; stream area = (box area) * (number of frames) • Step 2: Calculate the percent difference between the perfect area and the actual IoU • % IoU Difference = (formula shown as an image on the slide) (Figure: predicted streams overlaid on true streams)
18 .Evaluation 2: % Correctly Labeled • Step 1: Determine the best matching marker label for each frame in the predicted streams, using the maximum IoU in each frame • Step 2: Calculate the percent of frames labeled with the same label as the overall label given in the tracking phase (Figure: per-frame labels of a predicted stream, mostly marker 2, compared against the true streams)
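The two steps of this metric can be sketched as below. The data layout is an assumption: a predicted stream is a list of boxes `(x1, y1, x2, y2)`, one per frame, and the true streams are a dict from marker label to a list of boxes over the same frames. Function names and box values are hypothetical.

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

def best_labels(pred_stream, true_streams):
    """Step 1: per frame, the true label whose box has the maximum IoU."""
    return [max(true_streams, key=lambda lab: iou(box, true_streams[lab][f]))
            for f, box in enumerate(pred_stream)]

def percent_correctly_labeled(frame_labels, track_label):
    """Step 2: percent of frames whose best label equals the track's overall label."""
    if not frame_labels:
        return 0.0
    hits = sum(1 for lab in frame_labels if lab == track_label)
    return 100.0 * hits / len(frame_labels)

# Made-up example: the predicted stream follows marker 2 except in one frame.
true_streams = {0: [(0, 0, 10, 10)] * 4, 2: [(20, 0, 30, 10)] * 4}
pred_stream = [(20, 0, 30, 10), (21, 0, 31, 10), (1, 0, 11, 10), (20, 1, 30, 11)]

labels = best_labels(pred_stream, true_streams)
score = percent_correctly_labeled(labels, track_label=2)
```

In this toy case three of four frames agree with the track's overall label, so the stream scores 75%.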
19 .Results
20 .Challenges (Figure panels: ideal scenario; off-screen markers; occluded markers) • Even if the marker detection and tracking models perform well, off-screen and occluded markers may negatively impact results, since at some point a marker may be assigned to the incorrect track
21 .Next Steps • Detection – Tune marker detection thresholds and marker assignment thresholds • Kalman Filter – Tune initial velocities, accelerations, and covariance matrices – Better initialization is known to produce better predictions – Consider non-linear methods (extended Kalman filters, particle filters) • Marker Detection Assignment to Kalman Tracks – Currently using Hungarian assignment; other options include probabilistic assignment and Markov chain Monte Carlo methods • Stitching – Tune parameters and algorithm to better match disparate tracks
22 .References [1] Bunyak F, Shiraishi N, Palaniappan K, Lever TE, Avivi-Arber L, Takahashi K. Development of semi-automatic procedure for detection and tracking of fiducial markers for orofacial kinematics during natural feeding. Annual International Conference of the IEEE Engineering in Medicine and Biology Society. 2017;2017:580-583. doi:10.1109/EMBC.2017.8036891. [2] Best MD, Nakamura Y, Kijak NA, et al. Semiautomatic marker tracking of tongue positions captured by videofluoroscopy during primate feeding. Annual International Conference of the IEEE Engineering in Medicine and Biology Society. 2015;2015:5347-5350. doi:10.1109/EMBC.2015.7319599. [3] Howe PDL, Holcombe AO. The effect of visual distinctiveness on multiple object tracking performance. Frontiers in Psychology. 2012;3:307. doi:10.3389/fpsyg.2012.00307.
23 .Thank You! • Saurabh Ghanekar, Principal Consultant, Kavi Global, saurabh@kaviglobal.com • Kazutaka Takahashi, Ph.D., Research Assistant Professor, Department of Organismal Biology and Anatomy, University of Chicago, kazutaka@uchicago.edu • Funding Information: – National Center for Advancing Translational Sciences of the National Institutes of Health (UL1 TR000430) – JSPS The Strategic Young Researcher Overseas Visits Program for Accelerating Brain Circulation (S2504) – JSPS KAKENHI (JP16K11589) • Acknowledgements: – Dr. Naru Shiraishi, Niigata University, Japan (for experimental procedure development and data collection) – Animal Research Center (ARC) staff at the University of Chicago