
[1] Zhang Hongbin, Ji Donghong, Yin Lan, et al. Product image sentence annotation based on kernel descriptors and tag-rank [J]. Journal of Southeast University (English Edition), 2016, 32(2): 170-176. [doi:10.3969/j.issn.1003-7985.2016.02.007]


Journal of Southeast University (English Edition)[ISSN:1003-7985/CN:32-1325/N]

2016, Issue 2
Research Field: Computer Science and Engineering


Product image sentence annotation based on kernel descriptors and tag-rank
Zhang Hongbin1,2, Ji Donghong1, Yin Lan1, Ren Yafeng1, Yin Yi2
1Computer School, Wuhan University, Wuhan 430072, China
2School of Software, East China Jiaotong University, Nanchang 330013, China
Key words: product image; sentence annotation; kernel descriptors; tag-rank; word sequence blocks building (WSBB); N-gram word sequences
To address overly simple image features and word-noise interference in product image sentence annotation, a product image sentence annotation model focusing on image feature learning and key word summarization is described. Three kernel descriptors (gradient, shape, and color) are extracted. Late fusion of these features is then performed by a multiple kernel learning model to obtain more discriminative image features. The absolute rank and relative rank from the tag-rank model are used to boost the weights of key words. A new word integration algorithm named word sequence blocks building (WSBB) is designed to create N-gram word sequences. Sentences are generated from the N-gram word sequences and predefined templates. Experimental results show that both the BLEU-1 and BLEU-2 scores of the generated sentences are superior to those of the state-of-the-art baselines.
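
The abstract reports BLEU-1 and BLEU-2 scores but does not say how they were computed. As a minimal sketch of the standard BLEU definition (clipped n-gram precision, geometric mean up to order N, and a brevity penalty), the following may help in reading those numbers. The function names, the whitespace tokenization, and the example sentences are assumptions made for illustration and are not taken from the paper.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return the list of n-grams (as tuples) in a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def modified_precision(candidate, references, n):
    """Clipped n-gram precision of a candidate against one or more references."""
    cand_counts = Counter(ngrams(candidate, n))
    if not cand_counts:
        return 0.0
    # For each n-gram, the largest count observed in any single reference.
    max_ref = Counter()
    for ref in references:
        for gram, count in Counter(ngrams(ref, n)).items():
            max_ref[gram] = max(max_ref[gram], count)
    clipped = sum(min(count, max_ref[gram]) for gram, count in cand_counts.items())
    return clipped / sum(cand_counts.values())


def bleu(candidate, references, max_n):
    """Sentence-level BLEU-N with uniform weights and a brevity penalty."""
    precisions = [modified_precision(candidate, references, n)
                  for n in range(1, max_n + 1)]
    if min(precisions) == 0.0:
        return 0.0
    c = len(candidate)
    # Reference length closest to the candidate length (ties: shorter one).
    r = min((len(ref) for ref in references),
            key=lambda length: (abs(length - c), length))
    brevity_penalty = 1.0 if c > r else math.exp(1.0 - r / c)
    return brevity_penalty * math.exp(sum(math.log(p) for p in precisions) / max_n)


# Hypothetical product-description sentences, tokenized on whitespace.
candidate = "a red leather bag with a strap".split()
reference = "a red leather bag with a long strap".split()
bleu_1 = bleu(candidate, [reference], 1)
bleu_2 = bleu(candidate, [reference], 2)
```

Here BLEU-1 only rewards matching words, while BLEU-2 also rewards matching word pairs, so it is more sensitive to word order in the generated sentence.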




Biography: Zhang Hongbin(1979—), male, doctor, associate professor, zhanghongbin@whu.edu.cn.
Foundation items: The National Natural Science Foundation of China(No.61133012), the Humanity and Social Science Foundation of the Ministry of Education(No.12YJCZH274), the Humanity and Social Science Foundation of Jiangxi Province(No.XW1502, TQ1503), the Science and Technology Project of Jiangxi Science and Technology Department(No.20121BBG70050, 20142BBG70011).
Citation: Zhang Hongbin, Ji Donghong, Yin Lan, et al. Product image sentence annotation based on kernel descriptors and tag-rank [J]. Journal of Southeast University (English Edition), 2016, 32(2): 170-176. doi:10.3969/j.issn.1003-7985.2016.02.007.
Last Update: 2016-06-20