
[1] Zhang Hongbin, Ji Donghong, Yin Lan, et al. Product image sentence annotation based on kernel descriptors and tag-rank [J]. Journal of Southeast University (English Edition), 2016, 32(2): 170-176. [doi:10.3969/j.issn.1003-7985.2016.02.007]

Product image sentence annotation based on kernel descriptors and tag-rank

Journal of Southeast University (English Edition)[ISSN:1003-7985/CN:32-1325/N]

Volume:
32
Issue:
2
Page:
170-176
Research Field:
Computer Science and Engineering
Publishing date:
2016-06-20

Info

Title:
Product image sentence annotation based on kernel descriptors and tag-rank
Author(s):
Zhang Hongbin1 2 Ji Donghong1 Yin Lan1 Ren Yafeng1 Yin Yi2
1Computer School, Wuhan University, Wuhan 430072, China
2School of Software, East China Jiaotong University, Nanchang 330013, China
Keywords:
product image sentence annotation; kernel descriptors; tag-rank; word sequence blocks building (WSBB); N-gram word sequences
PACS:
TP391
DOI:
10.3969/j.issn.1003-7985.2016.02.007
Abstract:
To deal with issues such as overly simple image features and word noise interference in product image sentence annotation, a product image sentence annotation model focusing on image feature learning and keyword summarization is described. Three kernel descriptors, namely gradient, shape, and color, are extracted, respectively. Late feature fusion is then performed by a multiple kernel learning model to obtain more discriminative image features. The absolute rank and relative rank of the tag-rank model are used to boost the keywords’ weights. A new word integration algorithm named word sequence blocks building (WSBB) is designed to create N-gram word sequences. Sentences are generated according to the N-gram word sequences and predefined templates. Experimental results show that both the BLEU-1 and BLEU-2 scores of the generated sentences are superior to those of the state-of-the-art baselines.
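The late-fusion step described in the abstract combines per-descriptor kernels into one kernel matrix. The paper's multiple kernel learning model learns the mixing weights; as a minimal sketch, the fixed weights and toy matrices below are assumptions standing in for learned values and real gradient/shape/color kernels:

```python
import numpy as np

def late_fuse_kernels(kernels, weights):
    """Combine precomputed kernel matrices by a convex combination.

    Each matrix in `kernels` would come from one kernel descriptor
    (gradient, shape, or color); `weights` stand in for the mixing
    coefficients a multiple kernel learning model would learn.
    """
    weights = np.asarray(weights, dtype=float)
    weights = weights / weights.sum()  # normalize to a convex combination
    return sum(w * K for w, K in zip(weights, kernels))

# Toy 3x3 symmetric PSD matrices standing in for the three kernels.
rng = np.random.default_rng(0)
Ks = []
for _ in range(3):
    A = rng.random((3, 3))
    Ks.append(A @ A.T)

K = late_fuse_kernels(Ks, [0.5, 0.3, 0.2])
print(K.shape)  # fused kernel, usable by a downstream kernel classifier
```

A convex combination of positive semidefinite kernels remains a valid kernel, which is why the fused matrix can be fed directly to a kernel-based classifier.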
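The WSBB algorithm assembles tag-rank-weighted keywords into N-gram word sequences that feed the sentence templates. Its exact procedure is given in the paper body, not the abstract; the sketch below is an assumed greedy variant, with the keyword list and weights purely hypothetical:

```python
def wsbb(keywords, n=2):
    """Word sequence blocks building (sketch): chain the top-weighted
    keywords into N-gram word sequences.

    keywords: list of (word, weight) pairs, where weight is assumed to
    be a tag-rank score combining absolute and relative rank.
    Returns the N-grams as tuples.
    """
    # Sort keywords by descending tag-rank weight so that the most
    # salient words anchor the word sequence blocks.
    ranked = sorted(keywords, key=lambda kw: kw[1], reverse=True)
    words = [w for w, _ in ranked]
    # Slide a window of size n over the ranked words to build blocks.
    return [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]

# Hypothetical tag-rank-weighted keywords for a shoe product image.
kws = [("leather", 0.9), ("shoe", 0.95), ("brown", 0.6), ("lace", 0.4)]
blocks = wsbb(kws, n=2)
print(blocks)  # [('shoe', 'leather'), ('leather', 'brown'), ('brown', 'lace')]
```

Each resulting bigram could then be slotted into a predefined template (e.g. "a photo of a ⟨block⟩") to produce the final sentence.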

References:

[1] Farhadi A, Hejrati M, Sadeghi M A, et al. Every picture tells a story: Generating sentences from images[C]//European Conference on Computer Vision. Berlin: Springer-Verlag, 2010: 15-29.
[2] Hodosh M, Young P, Hockenmaier J. Framing image description as a ranking task: Data, models and evaluation metrics [J]. Journal of Artificial Intelligence Research, 2013, 47(1): 853-899.
[3] Yang Y, Teo C L, Daume H, et al. Corpus-guided sentence generation of natural images[C]//Proceedings of the Conference on Empirical Methods in Natural Language Processing. Edinburgh, Scotland, UK, 2011:444-454.
[4] Kulkarni G, Premraj V, Dhar S, et al. Baby talk: Understanding and generating simple image descriptions[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2013, 35(12): 2891-2903. DOI:10.1109/TPAMI.2012.162.
[5] Ushiku Y, Harada T, Kuniyoshi Y. Automatic sentence generation from images[C]//Proceedings of the 19th ACM International Conference on Multimedia. New York: ACM, 2011:1533-1536.
[6] Feng F, Lapata M. Automatic caption generation for news images [J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2013, 35(4):797-812. DOI:10.1109/TPAMI.2012.118.
[7] Gupta A, Verma Y, Jawahar C V, et al. Choosing linguistics over vision to describe images[C]//American Association for Artificial Intelligence. Palo Alto, CA, USA: Association for the Advancement of Artificial Intelligence, 2012:606-611.
[8] Berg T L, Berg A C, Shih J. Automatic attribute discovery and characterization from noisy web data[C]//European Conference on Computer Vision. Berlin: Springer, 2010: 663-676.
[9] Kiapour H, Yamaguchi K, Berg A C, et al. Hipster Wars: Discovering elements of fashion styles[C]//European Conference on Computer Vision. Zurich, Switzerland, 2014: 472-488.
[10] Mason R. Domain-independent captioning of domain-specific images[C]//Proceedings of the Conference of the North American Chapter of the Association for Computational Linguistics. Stroudsburg, PA, USA: Association for Computational Linguistics, 2013: 69-76.
[11] Kiros R, Salakhutdinov R, Zemel R. Multimodal neural language models[C]//International Conference on Machine Learning. Beijing, China, 2014: 595-603.
[12] Bo L, Ren X, Fox D. Kernel descriptors for visual recognition[C]//Advances in Neural Information Processing Systems. Vancouver, Canada, 2010:1734-1742.
[13] Hwang S, Grauman K. Learning the relative importance of objects from tagged images for retrieval and cross-modal search [J]. International Journal of Computer Vision, 2012, 100(2): 134-153. DOI:10.1007/s11263-011-0494-3.
[14] Su Y, Jurie F. Visual word disambiguation by semantic contexts[C]//IEEE International Conference on Computer Vision. Barcelona, Spain, 2011: 311-318.

Memo

Biography: Zhang Hongbin (1979—), male, Ph.D., associate professor, zhanghongbin@whu.edu.cn.
Foundation items: The National Natural Science Foundation of China(No.61133012), the Humanity and Social Science Foundation of the Ministry of Education(No.12YJCZH274), the Humanity and Social Science Foundation of Jiangxi Province(No.XW1502, TQ1503), the Science and Technology Project of Jiangxi Science and Technology Department(No.20121BBG70050, 20142BBG70011).
Citation: Zhang Hongbin, Ji Donghong, Yin Lan, et al. Product image sentence annotation based on kernel descriptors and tag-rank[J].Journal of Southeast University(English Edition), 2016, 32(2):170-176.doi:10.3969/j.issn.1003-7985.2016.02.007.
Last Update: 2016-06-20