Abstract
The vast number of images available on the Web calls for an effective and efficient search service to help users find relevant images. The prevalent approach is to provide a keyword interface through which users submit queries. However, the number of images without any tags or annotations is far beyond what manual effort can handle. To overcome this, automatic image annotation techniques have emerged; they generally select a suitable set of tags for a given image without user intervention. Web-scale image annotation, however, faces three main challenges: scalability, noise-resistance and diversity. Scalability has a twofold meaning: first, an automatic image annotation system should scale to the billions of images on the Web; second, it should be able to identify several relevant tags from a huge tag set for a given image within seconds or even faster. Noise-resistance means that the system should be robust against typos and ambiguous terms used in tags. Diversity means that image content may include both scenes and objects, which are further described by multiple different image features that constitute different facets of an annotation. In this paper, we propose a unified framework that tackles these three challenges for automatic Web image annotation. It involves two main components: tag candidate retrieval and multi-facet annotation. In the former, content-based indexing and a concept-based codebook are leveraged to address scalability and noise-resistance. In the latter, a joint feature map is designed to describe the different facets of tags in an annotation and the relations between these facets. A tag graph is adopted to represent the tags in the entire annotation, and a structured learning technique is employed to build a learning model on top of the tag graph based on the generated joint feature map. Millions of images from Flickr are used in our evaluation.
Experimental results show that our approach achieves 33% performance improvements over single-facet approaches in terms of three metrics: precision, recall and F1 score.
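The three evaluation metrics can be made concrete for tag annotation, where each image has a predicted tag set and a ground-truth tag set. The sketch below is an illustration only: the abstract does not say how scores are aggregated over images, so the per-image (macro) averaging in the hypothetical helper mean_prf is an assumption, as are the example tag names.

```python
from typing import Iterable, List, Tuple

def tag_prf(predicted: Iterable[str], truth: Iterable[str]) -> Tuple[float, float, float]:
    """Precision, recall and F1 of one image's predicted tags against its ground-truth tags."""
    pred, gold = set(predicted), set(truth)
    hits = len(pred & gold)  # number of correctly predicted tags
    precision = hits / len(pred) if pred else 0.0
    recall = hits / len(gold) if gold else 0.0
    # F1 is the harmonic mean of precision and recall; define it as 0 when both are 0.
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

def mean_prf(pairs: List[Tuple[Iterable[str], Iterable[str]]]) -> Tuple[float, float, float]:
    """Hypothetical helper: macro-average per-image scores (averaging scheme is an assumption)."""
    scores = [tag_prf(pred, gold) for pred, gold in pairs]
    n = len(scores)
    if n == 0:
        return 0.0, 0.0, 0.0
    p, r, f = (sum(s[i] for s in scores) / n for i in range(3))
    return p, r, f
```

For example, tag_prf(["beach", "sea", "dog"], ["beach", "sea", "sunset", "sky"]) gives precision 2/3, recall 1/2 and F1 4/7. Micro-averaging (pooling hits over all images before dividing) is an equally common alternative and can yield different numbers.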
Funding
Supported by the National Natural Science Foundation of China under Grant No. 60931160445