Abstract
In recent years, the bag of visual words (BOVW) model based on spatial-temporal interest points (STIPs) has been widely used in action recognition research. However, the model ignores the weight of each visual word and does not consider the spatial-temporal distribution of the interest points, which limits its recognition accuracy. Two algorithms are proposed to address these problems. First, the term frequency-inverse document frequency (TF-IDF) method is used to optimize the traditional BOVW histogram, weighing the importance of each visual word according to its proportion in the word bag and in the BOVW histogram. Second, a spatial-temporal interest points mutual information (STIPsMI) algorithm based on a three-dimensional co-occurrence matrix is proposed to describe the spatial-temporal relationships between the interest points of different visual words. The STIPsMI descriptor is then concatenated with the optimized BOVW histogram to form the final descriptor of a video sequence. The proposed method is evaluated on two mainstream human action datasets, KTH and UCF Sports. Experimental results show that the proposed spatial-temporal descriptor outperforms the BOVW model and other mainstream methods in action recognition accuracy.
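The TF-IDF re-weighting of the BOVW histogram described in the abstract can be sketched as follows. This is a minimal illustration under standard TF-IDF conventions, not the authors' implementation; the function name, the smoothing constants, and the final L2 normalization are assumptions:

```python
import numpy as np

def tfidf_weight_bovw(histograms):
    """Re-weight bag-of-visual-words histograms with TF-IDF.

    histograms: (n_videos, n_words) array of raw visual-word counts.
    Returns L2-normalized TF-IDF weighted histograms.
    """
    counts = np.asarray(histograms, dtype=float)
    n_videos = counts.shape[0]
    # Term frequency: each word's count relative to the total words in a video.
    tf = counts / np.maximum(counts.sum(axis=1, keepdims=True), 1.0)
    # Document frequency: number of videos in which each visual word occurs.
    df = np.count_nonzero(counts > 0, axis=0)
    # Inverse document frequency, smoothed to avoid division by zero;
    # words that occur in every video receive the smallest weight.
    idf = np.log((1.0 + n_videos) / (1.0 + df)) + 1.0
    weighted = tf * idf
    # L2-normalize so descriptors are comparable across videos.
    norms = np.linalg.norm(weighted, axis=1, keepdims=True)
    return weighted / np.maximum(norms, 1e-12)
```

In this scheme a visual word that appears in nearly every video contributes little to the final descriptor, while a word concentrated in few videos is emphasized, which is the weighting effect the abstract attributes to TF-IDF.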
Authors
LIU Yun (刘云); YANG Jian-bin (杨建滨); WANG Chuan-xu (王传旭)
Institute of Informatics, Qingdao University of Science and Technology, Qingdao 266061, China
Source
Science Technology and Engineering (《科学技术与工程》), a Peking University core journal
2018, Issue 21, pp. 69-75 (7 pages)
Funding
Supported by the National Natural Science Foundation of China (61472196, 61672305) and the Natural Science Foundation of Shandong Province (ZR2015FM012)