High-fidelity video frame interpolation through context-aware temporal aggregation and recurrent propagation

Mohana Priya, P. and Ulagapriya, K. High-fidelity video frame interpolation through context-aware temporal aggregation and recurrent propagation. Systems and Soft Computing, 8: 200428.

Restricted to Registered users only until 10 February 2027.


Abstract

Accurate inpainting of missing middle frames in video sequences is vital for applications such as video
restoration, enhancement, and compression. This study introduces a deep learning-based framework that
addresses this challenge by exploiting adjoining sequences of preceding and following frames. Our
approach integrates temporal aggregation and recurrent propagation to perform frame inpainting effectively.
Temporal aggregation leverages visible content from adjacent frames to recreate missing frames, ensuring high
spatial fidelity and feature conservation. Optical flow estimation, utilizing methods such as Farneback Optical
Flow, estimates displacement between frames and provides motion vectors that guide the interpolation process,
enabling accurate alignment and blending of frames. Recurrent propagation is accomplished through Long Short-
Term Memory (LSTM) networks that maintain temporal coherence by embedding and propagating information
from preceding frames, thus ensuring smooth transitions and consistency across the video sequence. To further
enhance performance, our model includes a context-aware feature extraction mechanism that adapts to various
motion patterns and occlusions, optimizing reconstruction quality. The framework has been evaluated on the
MSU Video Frame Interpolation (VFI) Benchmark Dataset, which provides diverse and challenging scenarios for
interpolation, and on the YouTube-8M dataset, which contains a wide range of real-world video content. The
experimental results demonstrate the robustness of the proposed model: a PSNR of 32.00 dB and an SSIM of
0.905 indicate superior reconstruction quality and structural similarity compared with baseline models. These
results underscore the framework’s effectiveness in handling complex motion dynamics and occlusions, making
it well suited for advanced video restoration, enhancement and compression tasks.

Item Type: Article
Subjects: Computer Science Engineering > Deep Learning
Domains: Computer Science Engineering
Depositing User: User 8 8
Date Deposited: 13 Mar 2026 06:09
Last Modified: 13 Mar 2026 06:09
URI: https://ir.vistas.ac.in/id/eprint/13130
