High-fidelity video frame interpolation through context-aware temporal aggregation and recurrent propagation


P., Mohana Priya and K, Ulagapriya (2026) High-fidelity video frame interpolation through context-aware temporal aggregation and recurrent propagation. Systems and Soft Computing, 8. p. 200428. ISSN 27729419


Official URL: https://doi.org/10.1016/j.sasc.2025.200428

Abstract

Accurate inpainting of missing middle frames in video sequences is vital for applications such as video
restoration, enhancement and compression. This study introduces a sophisticated deep learning-based framework
designed to address this challenge by utilizing adjoining sequences of preceding and following frames. Our
approach integrates temporal aggregation and recurrent propagation to effectively perform frame inpainting.
Temporal aggregation leverages visible content from adjacent frames to recreate missing frames, ensuring high
spatial fidelity and feature conservation. Optical flow estimation, utilizing methods such as Farneback Optical
Flow, estimates displacement between frames and provides motion vectors that guide the interpolation process,
enabling accurate alignment and blending of frames. Recurrent propagation is accomplished through Long
Short-Term Memory (LSTM) networks that maintain temporal coherence by embedding and propagating information
from preceding frames, thus ensuring smooth transitions and consistency across the video sequence. To further
enhance performance, our model includes a context-aware feature extraction mechanism that adapts to various
motion patterns and occlusions, optimizing the reconstruction quality. The framework has been evaluated on the MSU
Video Frame Interpolation (VFI) Benchmark Dataset, which provides diverse and challenging scenarios for
interpolation, as well as the YouTube-8M dataset, which contains a wide range of real-world video content. The
experimental results demonstrate the robustness of the proposed model: a PSNR of 32.00 and an SSIM score of
0.905 indicate its superior reconstruction quality and structural similarity compared to baseline models. These
results underscore the framework’s effectiveness in handling complex motion dynamics and occlusions, making
it well suited for advanced video restoration, enhancement and compression tasks.
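
The abstract reports reconstruction quality as PSNR. As a rough illustration of how such a figure is typically computed, the sketch below evaluates PSNR between a ground-truth frame and an interpolated frame. It is not the authors' evaluation code: the function name `psnr`, the representation of a frame as a list of rows of grayscale values, and the peak value of 255 (8-bit frames) are all assumptions made for this example.

```python
import math


def psnr(reference, reconstructed, max_value=255.0):
    """Return PSNR in dB between two equally sized grayscale frames.

    Hypothetical helper for illustration: frames are lists of rows of
    pixel intensities, and max_value is the assumed peak intensity.
    """
    if len(reference) != len(reconstructed):
        raise ValueError("frames must have the same number of rows")

    squared_error = 0.0
    pixel_count = 0
    for ref_row, rec_row in zip(reference, reconstructed):
        if len(ref_row) != len(rec_row):
            raise ValueError("rows must have the same length")
        for ref_px, rec_px in zip(ref_row, rec_row):
            squared_error += (ref_px - rec_px) ** 2
            pixel_count += 1

    if pixel_count == 0:
        raise ValueError("frames must not be empty")

    mse = squared_error / pixel_count  # mean squared error over all pixels
    if mse == 0:
        return math.inf  # identical frames: PSNR is unbounded
    return 10.0 * math.log10(max_value ** 2 / mse)


# Toy frames: every pixel of the interpolated frame is off by 10 levels.
ground_truth = [[100, 100], [100, 100]]
interpolated = [[110, 110], [110, 110]]
print(round(psnr(ground_truth, interpolated), 2))  # prints 28.13
```

Higher values indicate a closer match to the ground-truth frame. For video, a per-frame score like this is usually averaged over all interpolated frames, though the averaging protocol used in the paper is not stated in the abstract.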

Item Type:	Article
Subjects:	Computer Science Engineering > Computer Vision
Domains:	Computer Science Engineering
Depositing User:	user 12 12
Date Deposited:	09 Mar 2026 10:01
Last Modified:	09 Mar 2026 10:01
URI:	https://ir.vistas.ac.in/id/eprint/13097
