MVVA-net: a video aesthetic quality assessment network with cognitive fusion of multi-type feature–based strong generalization. [Dataset]
Contributors
Min Li (Data Collector)
Zheng Wang (Data Collector)
Professor Jinchang Ren, j.ren@rgu.ac.uk (Data Collector)
Meijun Sun (Data Collector)
Abstract
Most existing video aesthetic quality assessment datasets (see Table 1) are not public; some are too small, which causes trained deep models to perform poorly; and some use the professionalism of the video shooting, or the ratings of video-website users, as the criterion for evaluating aesthetic quality. To address these problems, the authors built a large-scale short video aesthetics (SVA) dataset with a scientific annotation method. SVA comprises 6900 edited videos from YouTube and AVAQ6000, each lasting 10 to 30 s. The labeling process involved 15 viewers of different genders and ages. Before labeling, each viewer watched a set of indicative videos of high and low aesthetic quality. During labeling, each viewer watched a video and assigned it an aesthetic quality score from 1 to 10, where 1 to 5 indicates low aesthetic quality and 6 to 10 indicates high aesthetic quality. After labeling, the final decimal aesthetic score of each video is the average of the remaining scores after the highest and lowest scores are removed. If a video's decimal aesthetic score is greater than σ, the video is considered to be of high aesthetic quality; otherwise, it is considered to be of low aesthetic quality. In this paper, σ is set to 5. In SVA, 3735 videos are labeled as high aesthetic quality and 3165 as low aesthetic quality.
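The aggregation rule described above (trim the highest and lowest of the 15 viewer ratings, average the rest, then threshold at σ = 5) can be sketched as follows; the function name and signature are illustrative, not taken from the released code:

```python
def aesthetic_label(scores, sigma=5.0):
    """Aggregate viewer ratings into a decimal aesthetic score and a binary label.

    Drops the single highest and single lowest score, averages the rest,
    then thresholds at sigma (the paper uses sigma = 5).
    """
    if len(scores) < 3:
        raise ValueError("need at least 3 ratings to trim the extremes")
    trimmed = sorted(scores)[1:-1]       # remove highest and lowest score
    final = sum(trimmed) / len(trimmed)  # decimal aesthetic score
    label = "high" if final > sigma else "low"
    return final, label


# Example: five ratings; trimmed mean of [5, 6, 7] is 6.0 -> high quality
print(aesthetic_label([1, 5, 6, 7, 10]))  # (6.0, 'high')
```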
Citation
LI, M., WANG, Z., REN, J. and SUN, M. 2022. MVVA-net: a video aesthetic quality assessment network with cognitive fusion of multi-type feature–based strong generalization. [Dataset]. Hosted on GitHub [online]. Available from: https://github.com/Lm0324/MVVA-Net
| Acceptance Date | Sep 29, 2021 |
|---|---|
| Online Publication Date | Mar 12, 2022 |
| Publication Date | Jul 31, 2022 |
| Deposit Date | Jun 16, 2022 |
| Publicly Available Date | Jul 1, 2022 |
| Keywords | Videos; Social media platforms; Aesthetic quality; Short video aesthetics (SVA); Multi-type feature fusion network (MVVA-Net) |
| Public URL | https://rgu-repository.worktribe.com/output/1628665 |
| Publisher URL | https://github.com/Lm0324/MVVA-Net |
| Related Public URLs | https://rgu-repository.worktribe.com/output/1628644 |
| Type of Data | SVA files |
| Collection Date | Mar 12, 2022 |
| Collection Method | The method designs two branches to extract the intra-frame and inter-frame aesthetic features of a video, respectively. The branches take different types of video frames as input: the intra-frame aesthetic branch takes key frames, while the inter-frame aesthetic branch takes sequential frames. Sequential frames are extracted from the video at a fixed interval; because they capture the changing relationship between frames, they serve as input to the inter-frame aesthetic branch. Key frames are obtained by the frame-difference method; because they represent the distinct pictures in the video, they serve as input to the intra-frame aesthetic branch. The multi-type features extracted by the two branches are adaptively fused to evaluate the aesthetic quality of the video, and both branches accept videos of different durations with different frame counts as input. In this study a dataset of 6900 short videos was constructed. To jointly consider intra-frame and inter-frame aesthetics and improve the generalization ability of the model, the authors propose a multi-type feature-fusion method for video aesthetic quality assessment based on a non-fixed model-input strategy. Experimental results show that the model performs well on different datasets and demonstrates strong generalization ability. |
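The two frame-selection schemes described in the collection method (fixed-interval sampling for sequential frames, and frame differencing for key frames) can be sketched as below. This is a minimal illustration on in-memory frame arrays; the function names and the mean-absolute-difference threshold are assumptions, not the authors' released implementation:

```python
import numpy as np


def sequential_frames(frames, interval):
    """Sequential frames: sample every `interval`-th frame (inter-frame branch input)."""
    return frames[::interval]


def key_frames(frames, threshold):
    """Key frames via a simple frame-difference rule (intra-frame branch input):
    keep a frame when its mean absolute difference from the last kept frame
    exceeds `threshold`, i.e. when the picture has visibly changed."""
    kept = [frames[0]]
    for f in frames[1:]:
        diff = np.mean(np.abs(f.astype(float) - kept[-1].astype(float)))
        if diff > threshold:
            kept.append(f)
    return kept


# Toy example: four 2x2 grayscale frames, with one scene change in the middle.
frames = [
    np.zeros((2, 2), dtype=np.uint8),
    np.zeros((2, 2), dtype=np.uint8),
    np.full((2, 2), 100, dtype=np.uint8),
    np.full((2, 2), 100, dtype=np.uint8),
]
print(len(sequential_frames(frames, 2)))  # 2 frames at a fixed interval
print(len(key_frames(frames, 10)))        # 2 key frames (one per distinct picture)
```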
Files
LIM 2022 MVVA-Net (LINK ONLY)
(2 Kb)
Other