A REAL-TIME MULTIMODAL AI ASSISTANT USING HAND GESTURES AND SPEECH TOGETHER

Mohana Priya, P. and Meyyarivu, J. and Pradeesh, S. (2026) A REAL-TIME MULTIMODAL AI ASSISTANT USING HAND GESTURES AND SPEECH TOGETHER. INTERNATIONAL CONFERENCE ON RECENT ADVANCES IN SCIENCE, ENGINEERING AND MANAGEMENT, TAGORE ENGINEERING COLLEGE, CHENNAI. ISBN 978-81-69050-45-6

Full text:	Icrasem 18-4-26 fn.pdf - Published Version (4MB)

Abstract

This project develops a high-performance multimodal Human-Computer Interaction (HCI) framework for low-latency, system-wide desktop control using Python, JavaScript/TypeScript, and C#. The system uses MediaPipe and OpenCV to detect 21-point hand landmarks in real time (60 FPS) and applies a weighted smoothing algorithm for stable cursor movement. Voice commands are processed using an offline Vosk-based natural language understanding engine that converts speech into system actions. A Node.js System Bridge enables fast bidirectional communication through WebSockets between the detection system and the operating system. To bypass browser security restrictions and allow global system control, a C# Interop and PowerShell backend interacts with user32.dll for native mouse and keyboard simulation. The interface is built with a Next.js/React Heads-Up Display (HUD) that provides real-time system feedback. By integrating these technologies, the framework enables responsive, touchless, and hardware-independent desktop interaction across applications.
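
The abstract mentions a weighted smoothing algorithm for cursor stability but does not give its form. The following is a minimal sketch of one common choice, an exponentially weighted moving average over successive fingertip positions. The class name CursorSmoother, the helper to_screen, and the weight alpha are hypothetical and are not taken from the paper. The sketch assumes the landmark coordinates arrive normalized to the range [0, 1], as hand-tracking libraries typically report them.

```python
from dataclasses import dataclass, field
from typing import Optional, Tuple


@dataclass
class CursorSmoother:
    """Exponentially weighted smoothing of 2D positions (assumed scheme, not the paper's)."""

    alpha: float = 0.5  # hypothetical weight on the newest sample, 0 < alpha <= 1
    _state: Optional[Tuple[float, float]] = field(default=None, init=False, repr=False)

    def update(self, x: float, y: float) -> Tuple[float, float]:
        """Blend the new sample with the previous smoothed position."""
        if self._state is None:
            # First sample: nothing to blend with yet.
            self._state = (x, y)
        else:
            px, py = self._state
            self._state = (
                self.alpha * x + (1.0 - self.alpha) * px,
                self.alpha * y + (1.0 - self.alpha) * py,
            )
        return self._state


def to_screen(norm_x: float, norm_y: float, width: int, height: int) -> Tuple[float, float]:
    """Map normalized [0, 1] coordinates to pixel coordinates, clamping out-of-range input."""
    cx = min(max(norm_x, 0.0), 1.0)
    cy = min(max(norm_y, 0.0), 1.0)
    return cx * (width - 1), cy * (height - 1)


if __name__ == "__main__":
    smoother = CursorSmoother(alpha=0.5)
    # Simulated noisy fingertip samples in normalized coordinates.
    for nx, ny in [(0.50, 0.50), (0.52, 0.49), (0.51, 0.51)]:
        sx, sy = smoother.update(*to_screen(nx, ny, 1920, 1080))
        print(f"{sx:.1f}, {sy:.1f}")
```

A smaller alpha gives a steadier cursor at the cost of more lag behind the hand; a larger alpha follows the hand more closely but passes through more jitter. Which trade-off the authors chose is not stated in the abstract.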

Item Type:	Book
Additional Information:	INTERNATIONAL CONFERENCE ON RECENT ADVANCES IN SCIENCE, ENGINEERING AND MANAGEMENT
Subjects:	Computer Science Engineering > Computer Vision
Domains:	Computer Science Engineering
Depositing User:	Mr IR Admin
Date Deposited:	12 May 2026 04:55
Last Modified:	12 May 2026 04:55
URI:	https://ir.vistas.ac.in/id/eprint/18489
