Lei Wang
Improving bag-of-visual-words model with spatial-temporal correlation for video retrieval.
Wang, Lei; Song, Dawei; Elyan, Eyad
Abstract
Most state-of-the-art approaches to Query-by-Example (QBE) video retrieval are based on the Bag-of-visual-Words (BovW) representation of visual content. This representation, however, ignores spatial-temporal information, which is important for measuring similarity between videos. Incorporating such information directly into the video data representation of a large-scale data set is computationally expensive in terms of both storage and similarity measurement. Such a representation is also static: it does not adapt to changes in the discriminative power of visual words across different queries. To tackle these limitations, we propose in this paper to discover the Spatial-Temporal Correlations (STC) imposed by the query example in order to improve the BovW model for video retrieval. The STC, expressed in terms of spatial proximity and relative motion coherence between different visual words, is crucial for identifying the discriminative power of the visual words. We develop a novel technique that emphasizes the most discriminative visual words during similarity measurement, and we incorporate this STC-based approach into the standard inverted index architecture. Our approach is evaluated on the TRECVID2002 and CC-WEB-VIDEO datasets for two typical QBE video retrieval tasks, respectively. The experimental results demonstrate that it substantially improves on both the BovW model and a state-of-the-art method that also utilizes spatial-temporal information for QBE video retrieval.
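The abstract does not give the paper's formulas, so the following is only a rough illustration of the general idea: a standard BovW inverted index with tf-idf scoring, where each query visual word is additionally multiplied by a query-dependent weight. The weight used here (one plus the number of query features of a *different* visual word lying within a pixel radius) is a hypothetical stand-in for spatial proximity only. It is an assumption, not the STC measure from the paper, and it ignores the relative motion coherence component entirely. All function names and the `radius` parameter are hypothetical.

```python
import math
from collections import Counter, defaultdict


def build_inverted_index(videos):
    """Map each visual word to {video_id: term frequency}.

    `videos` maps a video id to the list of visual-word ids of its local features.
    """
    index = defaultdict(dict)
    for vid, words in videos.items():
        for word, tf in Counter(words).items():
            index[word][vid] = tf
    return index


def compute_idf(index, n_videos):
    """Standard inverse document frequency over videos."""
    return {word: math.log(n_videos / len(postings)) for word, postings in index.items()}


def query_word_weights(query_points, radius):
    """Hypothetical proximity boost (not the paper's STC measure).

    `query_points` is a list of (word, x, y). A word's weight is 1 plus the number
    of (feature, neighbour) pairs where the neighbour has a different visual word
    and lies within `radius` of the feature.
    """
    neighbours = Counter()
    for i, (w1, x1, y1) in enumerate(query_points):
        for j, (w2, x2, y2) in enumerate(query_points):
            if i != j and w1 != w2 and math.hypot(x1 - x2, y1 - y2) <= radius:
                neighbours[w1] += 1
    # Counter returns 0 for words with no close neighbours, giving weight 1.0.
    return {w: 1.0 + neighbours[w] for w, _, _ in query_points}


def search(index, idf, query_points, radius=10.0):
    """Score only videos in the posting lists of the query's visual words."""
    query_tf = Counter(w for w, _, _ in query_points)
    boost = query_word_weights(query_points, radius)
    scores = defaultdict(float)
    for word, qtf in query_tf.items():
        if word not in index:  # word never seen in the collection
            continue
        query_weight = qtf * idf[word] * boost[word]
        for vid, tf in index[word].items():
            scores[vid] += query_weight * tf * idf[word]
    # Unnormalised dot product, highest score first.
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
```

Because scoring walks only the posting lists of words present in the query, the query-dependent weight is applied at search time without changing what is stored in the index, which is the property the abstract highlights for integrating STC into an inverted index. A realistic system would also normalise scores (e.g. cosine similarity) and model temporal structure across frames.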
Citation
WANG, L., SONG, D. and ELYAN, E. 2012. Improving bag-of-visual-words model with spatial-temporal correlation for video retrieval. In Proceedings of the 21st Association for Computing Machinery (ACM) international conference on information and knowledge management (CIKM'12), 29 October - 2 November 2012, Maui, USA. New York: ACM [online], pages 1303-1312. Available from: https://doi.org/10.1145/2396761.2398433
Presentation Conference Type | Conference Paper (published) |
---|---|
Conference Name | 21st Association for Computing Machinery (ACM) international conference on information and knowledge management (CIKM'12) |
Start Date | Oct 29, 2012 |
End Date | Nov 2, 2012 |
Acceptance Date | Oct 31, 2012 |
Online Publication Date | Oct 31, 2012 |
Publication Date | Dec 31, 2012 |
Deposit Date | Jan 21, 2015 |
Publicly Available Date | Jan 21, 2015 |
Publisher | Association for Computing Machinery (ACM) |
Peer Reviewed | Peer Reviewed |
Pages | 1303-1312 |
DOI | https://doi.org/10.1145/2396761.2398433 |
Keywords | Spatial; Temporal; Correlation; Discriminative visual word; Content based; Video; Retrieval; Query by Example; Bag of visual; Word |
Public URL | http://hdl.handle.net/10059/1129 |
Contract Date | Jan 21, 2015 |
Files
WANG 2012 Improving bag-of-visual-words (PDF, 2.1 Mb)
Publisher Licence URL: https://creativecommons.org/licenses/by-nc-nd/4.0/
You might also like
A multimodel-based screening framework for C-19 using deep learning-inspired data fusion.
(2024)
Journal Article