Yuan Ying
DOI:10.16644/j.cnki.cn33-1094/tp.2016.09.005
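Abstract: Images contain global features such as color, shape, and texture as well as local features such as LBP and SIFT, and these heterogeneous features carry clear structural information. Different visual features differ in importance when representing a particular high-level semantic concept, so correct feature selection is of great importance for image annotation. To make full use of the structural grouping effect among heterogeneous features, this paper proposes a high-dimensional feature selection algorithm based on group sparsity and describes its application to image annotation. A performance comparison with three other algorithms on image annotation shows that the proposed algorithm yields better annotation results.
Key words: heterogeneous features; group sparsity; feature selection; image annotation
CLC number: TP311 Document code: A Article ID: 1006-8228(2016)09-17-04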
Image annotation based on structured grouping sparsity
Yuan Ying
(Department of Computer and Information Technology, ZheJiang Police College, Hangzhou, Zhejiang 310058, China)
Abstract: Heterogeneous features describe different aspects of the visual characteristics of images, such as global features (color, shape, and texture) and local features (SIFT and LBP). These heterogeneous features carry clear structural information, and different groups of features differ in their intrinsic power to characterize the semantics of an image. Selecting the right features is therefore of great significance for image annotation. To make effective use of the structural grouping effect among heterogeneous visual features, a high-dimensional feature selection method based on structured grouping sparsity is proposed, and its application to image annotation is introduced. A performance comparison with three other algorithms on image annotation shows that the proposed algorithm yields better annotation results.
Key words: heterogeneous features; group sparsity; feature selection; image annotation
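0 Introduction
With the rapid development of digital photography, network technology, and storage technology, image data on the Internet has grown explosively. Many websites, such as Flickr and Wikipedia, let users upload, store, and share photos for free and tag images to support browsing and search. These sites generate and consume massive amounts of image data every day, accompanied by large amounts of text such as annotations. However, these annotations are often disorganized and contain many errors. While this image data brings convenience to daily life, it also makes quickly and accurately finding the needed information in it an urgent problem. Correct image annotation is therefore of great importance.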
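1 Overview of sparse representation
In recent years, compressed sensing (CS), which grew out of statistical signal processing, has attracted increasing attention. Compressed sensing reconstructs signals using the prior knowledge that the data is sparse and compressible. Combining the theory and methods of compressed sensing and feature selection to form more effective sparse representations of images has become a hot research topic in computer vision, machine learning, and related fields. Tibshirani of Stanford University and Breiman of the University of California, Berkeley, proposed almost simultaneously the idea of the lasso (least absolute shrinkage and selection operator), which imposes an ℓ1-norm constraint on the feature coefficients [1-2]. The constraint drives the selected features to be as sparse as possible, which keeps the results stable and makes the data processing more interpretable. However, lasso-based feature selection methods do not take into account the grouping effect among features, that is, the strong correlation between one feature (or class of features) and other features (or classes of features).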
To overcome this shortcoming, this paper exploits the group sparsity of heterogeneous features to select the important features for a given semantic concept, and proposes a high-dimensional feature selection algorithm based on structured grouping sparsity (SGS).
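For reference, the contrast between a plain ℓ1 penalty and a group-wise penalty can be written in standard notation. The symbols below (design matrix $X$ with $n$ rows, response $y$, coefficients $\beta$, regularization weight $\lambda$, and $G$ groups of sizes $p_g$) are assumptions introduced here for illustration. The second formula is the group lasso of [7], shown only as a representative group-sparse penalty; it is not necessarily the exact SGS objective.

```latex
% Lasso [1]: sparsity over individual coefficients
\hat{\beta}^{\mathrm{lasso}}
  = \arg\min_{\beta} \; \frac{1}{2n}\,\lVert y - X\beta \rVert_2^2
    + \lambda \,\lVert \beta \rVert_1

% Group lasso [7]: sparsity over whole groups of coefficients
\hat{\beta}^{\mathrm{group}}
  = \arg\min_{\beta} \; \frac{1}{2n}\,\Bigl\lVert y - \sum_{g=1}^{G} X_g \beta_g \Bigr\rVert_2^2
    + \lambda \sum_{g=1}^{G} \sqrt{p_g}\,\lVert \beta_g \rVert_2
```

Because the ℓ2 norm of a group is not squared, the penalty tends to set an entire group $\beta_g$ to zero at once, which is what allows a whole feature type to be kept or discarded together.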
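4 Conclusion
This paper assigns visual features of the same type to one group (for example, SIFT features form one group and color histograms form another), so that the representation of heterogeneous image features can fully exploit this structural grouping effect. In addition, to overcome the linear inseparability caused by high-dimensional heterogeneous features, this paper proposes SGS, an image annotation algorithm with high-dimensional feature selection based on structured grouping sparsity. A performance comparison with three other algorithms on image annotation shows that SGS yields better annotation results.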
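The idea of treating each feature type as one group can be illustrated with a minimal sketch. It is not the SGS algorithm itself: it is a plain linear group lasso in the sense of [7], solved by proximal gradient descent on synthetic data. The group names, dimensions, `group_lasso` function, and regularization value are all hypothetical choices made for this example.

```python
import numpy as np

def group_lasso(X, y, groups, lam, n_iter=500):
    """Proximal gradient for
    (1 / (2n)) * ||y - X w||^2 + lam * sum_g sqrt(p_g) * ||w_g||_2.
    `groups` is a list of integer index arrays, one per feature group."""
    n, p = X.shape
    step = n / (np.linalg.norm(X, 2) ** 2)  # 1 / Lipschitz constant of the gradient
    w = np.zeros(p)
    for _ in range(n_iter):
        grad = X.T @ (X @ w - y) / n
        z = w - step * grad
        # Group-wise soft thresholding: shrink each group, or zero it entirely.
        for idx in groups:
            tau = step * lam * np.sqrt(len(idx))
            norm = np.linalg.norm(z[idx])
            w[idx] = 0.0 if norm <= tau else (1.0 - tau / norm) * z[idx]
    return w

# Synthetic stand-in for heterogeneous features (hypothetical layout):
# 4 "color" dimensions, 4 "texture" dimensions, 4 "sift" dimensions.
groups = {
    "color": np.arange(0, 4),
    "texture": np.arange(4, 8),
    "sift": np.arange(8, 12),
}
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 12))
w_true = np.zeros(12)
w_true[:4] = [1.0, -2.0, 1.5, 0.5]  # only the "color" group carries signal
y = X @ w_true + 0.1 * rng.standard_normal(200)

w = group_lasso(X, y, list(groups.values()), lam=0.1)
selected = [name for name, idx in groups.items() if np.linalg.norm(w[idx]) > 0]
print("selected groups:", selected)
```

In this toy setting the penalty acts on whole groups, so a feature type is either retained with all of its dimensions or removed entirely, which mirrors the grouping of SIFT and color-histogram features described above.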
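However, kernel-learning algorithms on high-dimensional features must map the data into a new feature space through a kernel function, and the size of the resulting kernel matrix depends only on the number of samples. For large-scale image data, kernel learning therefore runs slowly and cannot cope with image collections that keep growing. How to build learning models for large-scale image data and how to handle image data that grows in real time are important open problems in image annotation.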
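The scaling issue can be made concrete with a small sketch. The RBF kernel, the helper names `rbf_kernel_matrix` and `kernel_bytes`, and the sample counts are assumptions chosen for illustration; the paper does not state which kernel it uses.

```python
import numpy as np

def rbf_kernel_matrix(X, gamma=1.0):
    """Dense RBF Gram matrix K[i, j] = exp(-gamma * ||x_i - x_j||^2)."""
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)
    return np.exp(-gamma * np.maximum(d2, 0.0))  # clip tiny negative round-off

def kernel_bytes(n_samples, bytes_per_entry=8):
    """Memory needed to store a dense n x n kernel matrix of float64 values."""
    return n_samples * n_samples * bytes_per_entry

rng = np.random.default_rng(0)
shapes = {}
for d in (10, 1000):
    X = rng.standard_normal((50, d))       # 50 samples, d feature dimensions
    K = rbf_kernel_matrix(X, gamma=1.0 / d)
    shapes[d] = K.shape                    # n x n regardless of d
    print(f"d={d}: kernel matrix shape {K.shape}")

for n in (1_000, 100_000):
    print(f"n={n}: {kernel_bytes(n) / 1e9:.3f} GB for a dense kernel matrix")
```

The kernel matrix stays 50 x 50 whether the features have 10 or 1000 dimensions, but its storage grows quadratically with the number of images, and adding one new image requires computing a new row and column against every existing sample.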
References:
[1] Robert Tibshirani. Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society. Series B (Methodological), 1996: 267-288
[2] Leo Breiman. Heuristics of instability and stabilization in model selection. The Annals of Statistics, 1996, 24(6): 2350-2383
[3] F. Wu, Y. Yuan, Y. Rui, S. Yan, Y. Zhuang. Annotating web images using NOVA: Non-convex group sparsity. In Proceedings of the 20th ACM international conference on Multimedia, 2012: 509-518
[4] Alexander Loui, Jiebo Luo, Shih-Fu Chang, Dan Ellis, Wei Jiang, Lyndon Kennedy, Keansub Lee, Akira Yanagawa. Kodak's consumer video benchmark data set: concept definition and annotation. In Proceedings of the international workshop on Workshop on multimedia information retrieval, 2007: 245-254
[5] Hao Li, Meng Wang, Xian-Sheng Hua. MSRA-MM 2.0: A large-scale web multimedia dataset. In Data Mining Workshops, 2009. ICDMW'09. IEEE International Conference on, 2009: 164-169
[6] Tat-Seng Chua, Jinhui Tang, Richang Hong, Haojie Li, Zhiping Luo, Yantao Zheng. NUS-WIDE: a real-world web image database from National University of Singapore. In Proceedings of the ACM International Conference on Image and Video Retrieval, 2009: 48
[7] M. Yuan and Y. Lin. Model selection and estimation in regression with grouped variables. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 2006, 68(1): 49-67
[8] Ying Yuan, Jian Shao, Fei Wu, Yue-Ting Zhuang. Image annotation by the multiple kernel learning with group sparsity effect. Ruanjian Xuebao/Journal of Software, 2012, 23(9): 2500-2509