Multi-Modal Deepfake Detection Using Cross-Frequency Patterns

Muthuselvan, C and Yamini, B. (2025) Multi-Modal Deepfake Detection Using Cross-Frequency Patterns. INTERNATIONAL JOURNAL OF RECENT TRENDS IN TECHNOLOGY AND ENGINEERING (IJRTTE), 04. ISSN 2832-4277

Official URL: https://ijrtte.com/v4-i4.html

Abstract

With the rapid development of deep generative models, it is now possible to
produce extremely realistic synthetic audio-visual content, often referred to as deepfakes,
which poses significant risks to digital trust, security, and media authenticity. Despite
significant recent progress in deepfake detection, most existing systems rely heavily on
spatial or temporal features and fail to generalize to state-of-the-art generative models,
particularly diffusion-based methods. In addition, existing multimodal models pay little
attention to the frequency-domain inconsistencies and inter-modal spectral associations that
arise during media manipulation. To address these limitations, this paper proposes a new
cross-frequency multi-modal deepfake detector that learns jointly from audio and visual cues
in the frequency domain. The proposed approach decomposes both modalities into multi-band
spectral features and learns the cross-frequency associations between corresponding audio and
visual components. By modelling inter-modal spectral alignment with a cross-frequency
correlation and attention mechanism, the framework captures subtle manipulation artifacts
that are normally invisible in the spatial domain. Extensive experiments on several benchmark
datasets (FF++, Celeb-DF, DFDC, FakeAVCeleb, and WaveFake) show that the proposed approach
outperforms existing unimodal and multimodal methods in accuracy, robustness, and
generalization. The findings confirm that cross-frequency reasoning offers a robust cue for
next-generation deepfake detection, especially under compression, noise, and imperceptible
manipulations.
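
The pipeline summarized in the abstract (multi-band spectral decomposition of each modality, followed by cross-frequency correlation and attention) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the abstract does not specify the band layout, the feature extractor, or the form of the attention, so the equal-width audio bands, the radial visual bands, the Pearson correlation, the softmax temperature, and all function names below are assumptions chosen for illustration only.

```python
import numpy as np


def audio_band_energies(audio, n_frames, n_bands):
    """Split a 1-D waveform into n_frames chunks; return log band energies (n_frames, n_bands)."""
    feats = np.empty((n_frames, n_bands))
    for t, chunk in enumerate(np.array_split(audio, n_frames)):
        power = np.abs(np.fft.rfft(chunk)) ** 2
        # Assumption: equal-width frequency bands over the one-sided spectrum.
        feats[t] = [np.log1p(band.sum()) for band in np.array_split(power, n_bands)]
    return feats


def video_band_energies(frames, n_bands):
    """frames: (T, H, W) grayscale. Return log energies in radial frequency bands, (T, n_bands)."""
    n_t, height, width = frames.shape
    fy = np.fft.fftfreq(height)[:, None]
    fx = np.fft.fftfreq(width)[None, :]
    radius = np.sqrt(fy ** 2 + fx ** 2)
    # Assumption: concentric rings of equal width in normalized spatial frequency.
    edges = np.linspace(0.0, radius.max(), n_bands + 1)
    band_idx = np.clip(np.digitize(radius, edges) - 1, 0, n_bands - 1)
    power = np.abs(np.fft.fft2(frames, axes=(-2, -1))) ** 2
    feats = np.empty((n_t, n_bands))
    for b in range(n_bands):
        feats[:, b] = np.log1p(power[:, band_idx == b].sum(axis=1))
    return feats


def cross_frequency_correlation(a, v, eps=1e-8):
    """Pearson correlation over time between each audio band and each visual band.

    a: (T, Ba), v: (T, Bv) -> (Ba, Bv)
    """
    a = (a - a.mean(axis=0)) / (a.std(axis=0) + eps)
    v = (v - v.mean(axis=0)) / (v.std(axis=0) + eps)
    return a.T @ v / a.shape[0]


def attention_weights(corr, temperature=0.1):
    """Row-wise softmax: for each audio band, a distribution over visual bands."""
    logits = corr / temperature
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    weights = np.exp(logits)
    return weights / weights.sum(axis=1, keepdims=True)
```

Under these assumptions, a clip with `T` aligned audio chunks and video frames yields a `(Ba, Bv)` correlation matrix, and the attention weights indicate which visual bands co-vary most strongly with each audio band. In a trained detector these hand-crafted statistics would presumably be replaced by learned features; the sketch only shows the shape of the computation.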

Item Type:	Article
Subjects:	Computer Science Engineering > Deep Learning
Domains:	Computer Science Engineering
Depositing User:	Mr IR Admin
Last Modified:	11 May 2026 06:08
URI:	https://ir.vistas.ac.in/id/eprint/15974
