Abstract
Background: Building on the seminal work of Zhou et al., much of the recent progress in learning monocular visual odometry, i.e., estimating depth and camera motion from monocular videos, can be attributed to tricks in the training procedure, such as data augmentation and the choice of learning objectives.

Methods: We categorize a collection of such tricks by examining them theoretically and evaluating empirically their effects on the final accuracy of visual odometry.

Results/Conclusions: By combining these tricks, we were able to significantly improve a baseline model adapted from SfMLearner without additional inference cost. We also analyzed the principles behind these tricks and the reasons for their success. Practical guidelines for future research are presented.