Semantic-Conditional Network for Micro-Video Summarization

Xiaowei Gu

Abstract

The goal of video summarization is to extract key information from a raw video so that long videos can be interpreted in a short time without losing much semantic information. Previous methods primarily consider the diversity and representativeness of the obtained summary without paying sufficient attention to the semantic information of the resulting frame set, especially when summaries are generated in response to user queries. In this paper, we break with convention in conditional video summarization and propose a new model, the Semantic-Conditional Network (SC-Net), that accepts user queries semantically. Technically, for each video we first retrieve the semantically relevant frames via a cross-modal retrieval model so as to capture the full semantic content of the user query. These rich semantics then serve as a semantic prior that guides the optimization of the summarization network, which produces summaries that are both diverse and representative. Furthermore, a novel one-stage training strategy reduces the training time complexity from polynomial to linear. Extensive experiments on publicly available datasets demonstrate promising results compared with state-of-the-art methods.
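The retrieval step described in the abstract — scoring video frames against a user query in a shared embedding space and keeping the most relevant ones — can be illustrated with a minimal toy sketch. This is not the paper's SC-Net implementation: the random vectors below stand in for frame and query embeddings that a real cross-modal encoder would produce, and `retrieve_relevant_frames` is a hypothetical helper, not a function from the paper.

```python
import numpy as np

def retrieve_relevant_frames(frame_embs, query_emb, top_k=3):
    """Rank frames by cosine similarity to a query embedding.

    frame_embs: (n_frames, dim) array of frame embeddings
    query_emb:  (dim,) array embedding the user query
    Returns the indices and scores of the top_k most similar frames.
    """
    # normalize so the dot product equals cosine similarity
    f = frame_embs / np.linalg.norm(frame_embs, axis=1, keepdims=True)
    q = query_emb / np.linalg.norm(query_emb)
    scores = f @ q
    top = np.argsort(-scores)[:top_k]  # highest-similarity frames first
    return top, scores[top]

# toy example: 10 "frames" and one "query" from a pretend shared encoder
rng = np.random.default_rng(0)
frames = rng.normal(size=(10, 4))
query = rng.normal(size=4)
idx, sc = retrieve_relevant_frames(frames, query)
```

In the paper's pipeline, the frames selected this way would then act as the semantic prior conditioning the summarization network, rather than being the summary themselves.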
