MVVA-net: a video aesthetic quality assessment network with cognitive fusion of multi-type feature–based strong generalization.

Li, Min; Wang, Zheng; Ren, Jinchang; Sun, Meijun

Authors

Min Li

Zheng Wang

Jinchang Ren

Meijun Sun



Abstract

With the increasing popularity of short videos on social media platforms, evaluating the aesthetic quality of these videos has become a significant challenge. In this paper, we first construct a large-scale, properly annotated short video aesthetics (SVA) dataset of 6900 video shots. We then propose a cognitive multi-type feature fusion network (MVVA-Net) for video aesthetic quality assessment. MVVA-Net consists of two branches, an intra-frame aesthetics branch and an inter-frame aesthetics branch, which take different types of video frames as input. The inter-frame aesthetics branch extracts inter-frame aesthetic features from sequential frames sampled at fixed intervals, and the intra-frame aesthetics branch extracts intra-frame aesthetic features from key frames selected by the inter-frame difference method. The two sets of features are adaptively fused to evaluate video aesthetic quality. MVVA-Net also does not require a fixed number of input frames, which greatly enhances the generalization ability of the model. We performed quantitative comparisons and ablation studies. The experimental results show that the two branches effectively extract the intra-frame and inter-frame aesthetic features of different videos and that, through adaptive fusion of these features, MVVA-Net achieves better classification performance and stronger generalization ability than other methods, performing well on different datasets.
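The two frame-selection strategies named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not state the exact key-frame criterion, so the sketch assumes a simple top-k selection by mean absolute difference between consecutive frames. The frame representation (a flattened list of grayscale values) and the function names `sample_fixed_interval`, `mean_abs_diff`, and `select_key_frames` are hypothetical and chosen only for illustration.

```python
from typing import List, Sequence

# Hypothetical representation: one frame is a flattened list of grayscale values.
Frame = Sequence[float]


def sample_fixed_interval(frames: Sequence[Frame], interval: int) -> List[int]:
    """Return the indices of frames taken every `interval` frames."""
    if interval < 1:
        raise ValueError("interval must be >= 1")
    return list(range(0, len(frames), interval))


def mean_abs_diff(a: Frame, b: Frame) -> float:
    """Mean absolute pixel difference between two equally sized, non-empty frames."""
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("frames must be non-empty and have the same size")
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def select_key_frames(frames: Sequence[Frame], k: int) -> List[int]:
    """Return the indices of the k frames that differ most from their predecessor.

    Assumption: top-k selection by inter-frame difference. The criterion
    used in the paper may differ (for example, a threshold or local maxima).
    """
    if k < 0:
        raise ValueError("k must be >= 0")
    diffs = [
        (mean_abs_diff(frames[i - 1], frames[i]), i)
        for i in range(1, len(frames))
    ]
    # Largest difference first; ties are resolved in favor of the earlier frame.
    diffs.sort(key=lambda d: (-d[0], d[1]))
    # Return the chosen indices in temporal order.
    return sorted(i for _, i in diffs[:k])


if __name__ == "__main__":
    # Six tiny two-pixel "frames" with scene changes at indices 2 and 4.
    video = [[0, 0], [0, 0], [10, 10], [10, 10], [0, 5], [0, 5]]
    print(sample_fixed_interval(video, 2))  # input for an inter-frame branch
    print(select_key_frames(video, 2))      # input for an intra-frame branch
```

Because both functions return index lists whose length depends on the video and on the chosen `interval` or `k`, neither one forces a fixed number of input frames, which is consistent with the non-fixed input strategy described in the abstract.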

Citation

LI, M., WANG, Z., REN, J. and SUN, M. 2022. MVVA-net: a video aesthetic quality assessment network with cognitive fusion of multi‑type feature–based strong generalization. Cognitive computation [online], 14(4), pages 1435-1445. Available from: https://doi.org/10.1007/s12559-021-09947-1

Journal Article Type: Article
Acceptance Date: Sep 29, 2021
Online Publication Date: Mar 12, 2022
Publication Date: Jul 31, 2022
Deposit Date: Jun 30, 2022
Publicly Available Date: Mar 13, 2023
Journal: Cognitive Computation
Print ISSN: 1866-9956
Electronic ISSN: 1866-9964
Publisher: Springer
Peer Reviewed: Peer Reviewed
Volume: 14
Issue: 4
Pages: 1435-1445
DOI: https://doi.org/10.1007/s12559-021-09947-1
Keywords: Videos; Social media platforms; Aesthetic quality; Short video aesthetics (SVA); Multi-type feature fusion network (MVVA-Net)
Public URL: https://rgu-repository.worktribe.com/output/1628644
Related Public URLs: https://rgu-repository.worktribe.com/output/1628665