MULTIMODAL BIOMETRIC AUTHENTICATION USING FACE AND VOICE WITH DEEP TRANSFORMERS

Shanthi, P and Yamini, B and Vidhya, K and Varadharajan, S and Soundarya Rajan, D S (2026) MULTIMODAL BIOMETRIC AUTHENTICATION USING FACE AND VOICE WITH DEEP TRANSFORMERS. In: INTERNATIONAL CONFERENCE ON CONTEMPORARY ENGINEERING AND TECHNOLOGY, 22nd-23rd March 2026, PRINCE SHRI VENKATESHWARA PADMAVATHY ENGINEERING COLLEGE, Chennai.

Full text: Published Version (PDF, 3 MB)

Abstract

Unimodal biometric systems that rely on face or voice recognition alone are vulnerable to spoofing and degrade under illumination changes and background noise. To address these weaknesses, this paper introduces a multimodal biometric authentication framework that combines facial and voice features using deep transformer models. Face embeddings are extracted with a Vision Transformer (ViT), and voice representations are derived with a speech transformer, the Audio Spectrogram Transformer (AST). A cross-attention-based fusion network then combines the two modalities, producing a stronger and more discriminative representation for authentication. Experiments on publicly available multimodal datasets show improved accuracy, robustness, and resistance to spoofing attacks compared with unimodal systems. The proposed multimodal model achieved 98.2% recognition accuracy and an Equal Error Rate (EER) of 1.6%, improving on the state-of-the-art CNN- and LSTM-based methods it was compared against.
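The abstract does not describe the internals of the fusion network, so the following is only a minimal sketch under stated assumptions, not the authors' implementation. It assumes single-head scaled dot-product cross-attention in which face tokens (for example, ViT patch embeddings) attend to voice tokens (for example, AST embeddings) and vice versa, followed by mean pooling and concatenation. Real transformer fusion layers would also learn query, key and value projections and use multiple heads; those are omitted here. It also shows one common way to estimate an EER such as the reported 1.6%: sweep a decision threshold and take the point where the false accept rate (FAR) and false reject rate (FRR) are closest, assuming higher scores mean a more likely genuine match. The function names `softmax`, `cross_attention`, `fuse` and `eer` are hypothetical.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, keys, values):
    """Single-head scaled dot-product attention (no learned projections,
    a simplifying assumption). Each query token attends over all key tokens
    and returns a weighted sum of the corresponding value tokens."""
    d = len(keys[0])
    out = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        weights = softmax(scores)
        out.append([sum(w * v[c] for w, v in zip(weights, values))
                    for c in range(len(values[0]))])
    return out


def fuse(face_tokens, voice_tokens):
    """Hypothetical bidirectional fusion: face attends to voice, voice attends
    to face, each result is mean-pooled, and the two vectors are concatenated."""
    face_to_voice = cross_attention(face_tokens, voice_tokens, voice_tokens)
    voice_to_face = cross_attention(voice_tokens, face_tokens, face_tokens)

    def mean_pool(tokens):
        return [sum(col) / len(tokens) for col in zip(*tokens)]

    return mean_pool(face_to_voice) + mean_pool(voice_to_face)


def eer(genuine, impostor):
    """Approximate Equal Error Rate by sweeping thresholds over the observed
    scores. A score >= threshold is accepted; returns the mean of FAR and FRR
    at the threshold where they are closest."""
    best_gap, best_rate = None, None
    for t in sorted(set(genuine) | set(impostor)):
        far = sum(s >= t for s in impostor) / len(impostor)  # impostors accepted
        frr = sum(s < t for s in genuine) / len(genuine)     # genuine users rejected
        gap = abs(far - frr)
        if best_gap is None or gap < best_gap:
            best_gap, best_rate = gap, (far + frr) / 2
    return best_rate


if __name__ == "__main__":
    face = [[0.1, 0.2, 0.3, 0.4], [0.0, 1.0, 0.0, 1.0], [1.0, 0.0, 1.0, 0.0]]
    voice = [[0.5, 0.5, 0.5, 0.5], [1.0, -1.0, 1.0, -1.0]]
    print(len(fuse(face, voice)))  # 8: a 4-d face-to-voice vector plus a 4-d voice-to-face vector
    print(eer([0.9, 0.8, 0.7, 0.4], [0.1, 0.2, 0.3, 0.6]))  # 0.25
```

With toy scores where one genuine attempt (0.4) falls below one impostor (0.6), FAR and FRR meet at 25%. Production evaluations usually interpolate along the ROC curve rather than picking the nearest observed threshold, which matters when score sets are small.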

Item Type: Conference or Workshop Item (Paper)
Subjects: Computer Science Engineering > Deep Learning
Domains: Computer Science Engineering
Depositing User: Mr IR Admin
Date Deposited: 19 May 2026 09:59
Last Modified: 19 May 2026 10:02
URI: https://ir.vistas.ac.in/id/eprint/16799