Leszek Kaliciak
Hybrid models for combination of visual and textual features in context-based image retrieval.
Kaliciak, Leszek
Authors
Contributors
Dawei Song
Supervisor
Professor Nirmalie Wiratunga n.wiratunga@rgu.ac.uk
Supervisor
Jeff Pan
Supervisor
Abstract
Visual Information Retrieval poses a challenge to intelligent information search systems. This is due to the semantic gap, the difference between human perception (information needs) and the machine representation of multimedia objects. Most existing image retrieval systems are monomodal, as they utilize only visual or only textual information about images. The semantic gap can be reduced by improving existing visual representations, making them suitable for large-scale generic image retrieval. The best current candidates for large-scale Content-based Image Retrieval are models based on the Bag of Visual Words framework. Existing approaches, however, produce high-dimensional representations that are expensive to store and compute. Because the standard Bag of Visual Words framework disregards the relationships between the histogram bins, the model can be further enhanced by exploiting the correlations between the visual words. Even the improved visual features will find it hard to capture the abstract semantic meaning of some queries, e.g. "straight road in the USA". Textual features, on the other hand, would struggle with queries such as "church with more than two towers", as in many cases the information about the number of towers would be missing. Thus, visual and textual features represent complementary yet correlated aspects of the same information object, an image. Existing hybrid approaches for the combination of visual and textual features do not take these inherent relationships into account, and thus the combination's performance improvement is limited. Visual and textual features can also be combined in the context of relevance feedback. Relevance feedback can help us narrow down and correct the search. The feedback mechanism would produce subsets of visual query and visual feedback representations as well as subsets of textual query and textual feedback representations.
A meaningful feature combination in the context of relevance feedback should take the inherent inter (visual-textual) and intra (visual-visual, textual-textual) relationships into account. In this work, we propose a principled framework for semantic gap reduction in large-scale generic image retrieval. The proposed framework comprises the development and enhancement of novel visual features, a hybrid model for the combination of visual and textual features, and a hybrid model for the combination of features in the context of relevance feedback, with both fixed and adaptive weighting schemes (importance of a query and its context). Apart from the experimental evaluation of our models, theoretical validations of some interesting discoveries on feature fusion strategies were also performed. The proposed models were incorporated into our prototype system with an interactive user interface.
Citation
KALICIAK, L. 2013. Hybrid models for combination of visual and textual features in context-based image retrieval. Robert Gordon University, PhD thesis.
Field | Value |
---|---|
Thesis Type | Thesis |
Deposit Date | Jan 16, 2014 |
Publicly Available Date | Jan 16, 2014 |
Public URL | http://hdl.handle.net/10059/924 |
Contract Date | Jan 16, 2014 |
Award Date | Jul 31, 2013 |
Files
KALICIAK 2013 Hybrid models for combination
(2.3 Mb)
PDF
Publisher Licence URL
https://creativecommons.org/licenses/by-nc-nd/4.0/
Copyright Statement
© The Author.