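Abstract
Depth estimation is key to three-dimensional scene reconstruction and object localization in the vision systems of intelligent agricultural machinery. This paper proposes a self-supervised depth estimation network for tomato plant images that takes binocular images directly as input and estimates the depth of every pixel. Three channel-wise group convolution modules were designed and used to build a convolutional auto-encoder that serves as the main body of the depth estimation network. Because hand-crafted features are inadequate for measuring the similarity of two images, a convolutional feature similarity loss was introduced as a component of the loss function. The results show that the convolutional auto-encoder built from group convolution modules effectively improves the accuracy of the disparity maps produced by the depth estimation network. The convolutional feature similarity loss markedly improves the accuracy of depth estimation for tomato plant images; accuracy rises as more convolution module layers take part in the loss computation, but beyond 4 layers the additional gain is no longer evident. When the binocular images are captured within 9.0 m, the root mean square error and mean absolute error of the checkerboard corner distances estimated by the proposed method are below 2.5 cm and 1.8 cm, respectively; within 3.0 m they are below 0.7 cm and 0.5 cm, respectively. The model runs at 28.0 frames/s. Compared with existing work, the two errors are reduced by 33.1% and 35.6%, respectively, and computation speed is increased by 52.2%. This study can serve as a reference for the design of vision systems for intelligent agricultural machinery.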
Depth estimation is critical to 3D reconstruction and object localization in the vision systems of intelligent agricultural machinery, and stereo matching is a common way to achieve it. Traditional stereo matching methods rely on manually designed low-level image features. Because color and texture are nonuniform in images of field plants, such hand-crafted features are poorly distinguishable and mismatches can occur, which compromises the accuracy of the depth map. Although a convolutional neural network (CNN) trained by supervised learning can estimate the depth of each pixel in a plant image directly, annotating depth data is expensive. In this paper, we present a depth estimation model based on self-supervised learning for phenotyping the tomato canopy. The network is trained on an image reconstruction task: dense disparity maps are estimated indirectly from the rectified stereo image pair given as network input, and bilinear interpolation is then used to sample the input images at the positions given by the disparities to reconstruct the warped images. We developed three channel-wise group convolutional (CWGC) modules, namely a dimension-invariant convolution module, a down-sampling convolution module and an up-sampling convolution module, and used them to construct the convolutional auto-encoder, which is the main structure of the depth estimation network. Because hand-crafted features are inadequate for comparing image similarity, we used a loss on image convolutional feature similarity as one objective of network training. A CWGC-based CNN classification network (CWGCNet) was developed to extract the low-level features automatically. In addition to the convolutional feature similarity loss, the total training loss includes an image appearance matching loss, a disparity smoothness loss and a left-right disparity consistency loss. Stereo image pairs of tomato plants were captured with a binocular camera in a greenhouse.
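The image reconstruction step described above can be illustrated with a minimal sketch. It assumes a rectified pair, so that a left-image pixel at column x corresponds to the right-image column x - d; under that assumption bilinear interpolation reduces to linear interpolation along each scanline. The function names and the plain L1 difference are illustrative assumptions, not the paper's implementation or its exact appearance matching loss.

```python
import math

def warp_row(src_row, disp_row):
    """Reconstruct one scanline by sampling src_row at x - d (linear interpolation)."""
    w = len(src_row)
    out = []
    for x, d in enumerate(disp_row):
        xs = min(max(x - d, 0.0), w - 1.0)  # clamp the sampling position to the row
        x0 = int(math.floor(xs))
        x1 = min(x0 + 1, w - 1)
        t = xs - x0                          # fractional offset between x0 and x1
        out.append((1.0 - t) * src_row[x0] + t * src_row[x1])
    return out

def reconstruct_left(right_img, left_disp):
    """Warp the right image into the left view using the left disparity map."""
    return [warp_row(row, disp) for row, disp in zip(right_img, left_disp)]

def l1_difference(img_a, img_b):
    """Mean absolute intensity difference, a simple stand-in for an appearance loss."""
    total, count = 0.0, 0
    for row_a, row_b in zip(img_a, img_b):
        for a, b in zip(row_a, row_b):
            total += abs(a - b)
            count += 1
    return total / count

# Hypothetical one-row grayscale image and a constant half-pixel disparity.
right = [[0.0, 10.0, 20.0, 30.0]]
disp = [[0.5, 0.5, 0.5, 0.5]]
left_hat = reconstruct_left(right, disp)
```

Because the sampling is a weighted average of neighbouring pixels, the reconstruction is differentiable with respect to the disparity, which is what allows a reconstruction error to train the network without depth annotations.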
After epipolar rectification, the image pairs were assembled into a dataset for training and testing the depth estimation model. The CWGCNet and the depth estimation network for tomato images were implemented in Python using the Microsoft Cognitive Toolkit (CNTK). Both training and testing experiments were conducted on a computer with a Tesla K40c GPU (graphics processing unit). The results showed that the shallow convolutional layers of the CWGCNet successfully extracted diverse low-level image features for calculating the convolutional feature similarity loss. The convolutional auto-encoder developed in this paper significantly improved the disparity maps estimated by the depth estimation model. The convolutional feature similarity loss had a marked effect on the accuracy of image depth estimation. The accuracy of the disparity map estimated by the model increased with the number of convolution modules used to calculate the convolutional feature similarity loss. When images were sampled within 9.0 m, the root mean square error (RMSE) and the mean absolute error (MAE) of the corner distances estimated by the model were less than 2.5 cm and 1.8 cm, respectively; when sampled within 3.0 m, the errors were less than 0.7 cm and 0.5 cm, respectively. The coefficient of determination (R²) of the proposed model was 0.8081, and the test speed was 28 fps (frames per second). Compared with existing models, the proposed model reduced the RMSE and MAE by 33.1% and 35.6%, respectively, and increased calculation speed by 52.2%.
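For reference, the two error metrics reported above, and the standard conversion from disparity to depth for a rectified stereo rig (Z = f·B/d), can be sketched as follows. The focal length, baseline and sample values are hypothetical and are not taken from the paper.

```python
import math

def rmse(estimated, reference):
    """Root mean square error between two equal-length sequences."""
    n = len(estimated)
    return math.sqrt(sum((e - r) ** 2 for e, r in zip(estimated, reference)) / n)

def mae(estimated, reference):
    """Mean absolute error between two equal-length sequences."""
    n = len(estimated)
    return sum(abs(e - r) for e, r in zip(estimated, reference)) / n

def depth_from_disparity(disparity_px, focal_px, baseline_m):
    """Depth in metres for a rectified stereo pair: Z = f * B / d."""
    if disparity_px <= 0:
        raise ValueError("disparity must be positive")
    return focal_px * baseline_m / disparity_px

# Hypothetical estimated vs. measured corner distances, in cm.
est = [10.0, 12.0]
ref = [11.0, 10.0]
print(rmse(est, ref), mae(est, ref))

# Hypothetical camera: 1000 px focal length, 0.1 m baseline, 50 px disparity.
print(depth_from_disparity(50.0, 1000.0, 0.1))
```

RMSE penalizes large errors more heavily than MAE, so reporting both gives a sense of how much the error is driven by outliers.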
Authors
周云成
许童羽
邓寒冰
苗腾
吴琼
Zhou Yuncheng;Xu Tongyu;Deng Hanbing;Miao Teng;Wu Qiong(College of Information and Electrical Engineering,Shenyang Agricultural University,Shenyang 110866,China)
Source
《农业工程学报》
EI
CAS
CSCD
PKU Core (Peking University Core Journals)
2019, Issue 24, pp. 173-182 (10 pages)
Transactions of the Chinese Society of Agricultural Engineering
Funding
Natural Science Foundation of Liaoning Province (20180551102)
National Natural Science Foundation of China (31601218)
Keywords
image processing
convolution neural network
algorithms
self-supervised learning
depth estimation
disparity
deep learning
tomato