MVVA-net: a video aesthetic quality assessment network with cognitive fusion of multi-type feature–based strong generalization.

Li, Min; Wang, Zheng; Ren, Jinchang; Sun, Meijun

doi:10.1007/s12559-021-09947-1

Authors

Min Li

Zheng Wang

Professor Jinchang Ren j.ren@rgu.ac.uk
Professor of Computing Science

Meijun Sun

Abstract

With the increasing popularity of short videos on various social media platforms, evaluating the aesthetic quality of these videos has become a major challenge. In this paper, we first construct a large-scale, properly annotated short video aesthetics (SVA) dataset. We then propose a cognitive multi-type feature fusion network (MVVA-Net) for video aesthetic quality assessment. MVVA-Net consists of two branches, an intra-frame aesthetics branch and an inter-frame aesthetics branch, which take different types of video frames as input. The inter-frame aesthetics branch extracts inter-frame aesthetic features from sequential frames sampled at fixed intervals, and the intra-frame aesthetics branch extracts intra-frame aesthetic features from key frames selected by the inter-frame difference method. Adaptive fusion of the two kinds of features allows the video aesthetic quality to be evaluated effectively. In addition, MVVA-Net does not require a fixed number of input frames, which greatly enhances the generalization ability of the model. We performed quantitative comparisons and ablation studies. The experimental results show that the two branches of MVVA-Net can effectively extract the intra-frame and inter-frame aesthetic features of different videos, and that, with adaptive fusion of these features, MVVA-Net achieves better classification performance and stronger generalization ability than other methods. In summary, we construct a dataset of 6900 video shots and propose a video aesthetic quality assessment method based on a non-fixed model input strategy and multi-type features. Experimental results show that the model has strong generalization ability and achieves good performance on different datasets.
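
The two frame-selection strategies named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the flattened grayscale frame representation, the mean-absolute-difference measure, and the threshold are all assumptions made for illustration, and the record does not state the actual sampling interval or difference criterion used in MVVA-Net.

```python
from typing import List, Sequence

# Assumption: a frame is a flat sequence of grayscale pixel values.
Frame = Sequence[float]


def sample_fixed_interval(num_frames: int, interval: int) -> List[int]:
    """Return indices of sequential frames taken every `interval` frames."""
    if interval <= 0:
        raise ValueError("interval must be positive")
    return list(range(0, num_frames, interval))


def mean_abs_diff(a: Frame, b: Frame) -> float:
    """Mean absolute pixel difference between two equally sized frames."""
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("frames must be non-empty and of equal size")
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def select_key_frames(frames: Sequence[Frame], threshold: float) -> List[int]:
    """Simple inter-frame difference rule (hypothetical criterion):
    keep frame 0, then keep every frame whose difference from the
    most recently kept key frame exceeds `threshold`."""
    if not frames:
        return []
    keys = [0]
    for i in range(1, len(frames)):
        if mean_abs_diff(frames[keys[-1]], frames[i]) > threshold:
            keys.append(i)
    return keys


if __name__ == "__main__":
    # Toy "video" of five 2-pixel frames.
    video = [[0, 0], [0, 0], [10, 10], [10, 12], [50, 50]]
    print(sample_fixed_interval(len(video), 2))
    print(select_key_frames(video, threshold=5.0))
```

Both routines return a number of frames that depends on the length and content of the video, which is consistent with the abstract's statement that the model does not require a fixed number of input frames.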

Citation

LI, M., WANG, Z., REN, J. and SUN, M. 2022. MVVA-net: a video aesthetic quality assessment network with cognitive fusion of multi‑type feature–based strong generalization. Cognitive computation [online], 14(4), pages 1435-1445. Available from: https://doi.org/10.1007/s12559-021-09947-1

Journal Article Type	Article
Acceptance Date	Sep 29, 2021
Online Publication Date	Mar 12, 2022
Publication Date	Jul 31, 2022
Deposit Date	Jun 30, 2022
Publicly Available Date	Mar 13, 2023
Journal	Cognitive Computation
Print ISSN	1866-9956
Electronic ISSN	1866-9964
Publisher	Springer
Peer Reviewed	Peer Reviewed
Volume	14
Issue	4
Pages	1435-1445
DOI	https://doi.org/10.1007/s12559-021-09947-1
Keywords	Videos; Social media platforms; Aesthetic quality; Short video aesthetics (SVA); Multi-type feature fusion network (MVVA-Net)
Public URL	https://rgu-repository.worktribe.com/output/1628644
Related Public URLs	https://rgu-repository.worktribe.com/output/1628665

Files

LI 2022 MVVA-net (AAM) (859 Kb)
PDF

Copyright Statement
This version of the article has been accepted for publication, after peer review (when applicable) and is subject to Springer Nature’s AM terms of use (https://www.springernature.com/gp/open-research/policies/accepted-manuscript-terms), but is not the Version of Record and does not reflect post-acceptance improvements, or any corrections. The Version of Record is available online at: https://doi.org/10.1007/s12559-021-09947-1