A REAL-TIME MULTIMODAL AI ASSISTANT USING HAND GESTURES AND SPEECH TOGETHER

Mohana Priya, P. and Meyyarivu, J. and Pradeesh, S. (2026) A REAL-TIME MULTIMODAL AI ASSISTANT USING HAND GESTURES AND SPEECH TOGETHER. INTERNATIONAL CONFERENCE ON RECENT ADVANCES IN SCIENCE, ENGINEERING AND MANAGEMENT, TAGORE ENGINEERING COLLEGE, CHENNAI. ISBN 978-81-69050-45-6

Icrasem 18-4-26 fn.pdf - Published Version

Abstract

This project develops a high-performance multimodal Human-Computer Interaction (HCI) framework for low-latency, system-wide desktop control, implemented in Python, JavaScript/TypeScript, and C#. The system uses MediaPipe and OpenCV to detect 21 hand landmarks in real time (60 FPS) and applies a weighted smoothing algorithm to stabilize cursor movement. Voice commands are processed by an offline, Vosk-based natural language understanding engine that converts speech into system actions. A Node.js System Bridge provides fast bidirectional WebSocket communication between the detection system and the operating system. Because browser security restrictions prevent a web page from controlling the desktop directly, a C# Interop and PowerShell backend calls user32.dll to simulate native mouse and keyboard input, which allows global system control. The interface is a Next.js/React Heads-Up Display (HUD) that provides real-time system feedback. Together, these components enable responsive, touchless, and hardware-independent desktop interaction across applications.
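
The abstract does not specify the weighted smoothing algorithm, so the following is only an illustrative sketch of one common choice: an exponentially weighted moving average over successive fingertip positions. It assumes the tracker reports normalized coordinates in the range 0 to 1, as MediaPipe hand landmarks do. The names CursorSmoother, alpha, and to_screen are hypothetical and do not come from the paper.

```python
from dataclasses import dataclass, field
from typing import Optional, Tuple


@dataclass
class CursorSmoother:
    """Exponentially weighted smoothing of 2D positions (assumed technique)."""

    # Hypothetical weight on the newest sample, in (0, 1].
    # Higher values follow the hand more closely; lower values damp jitter more.
    alpha: float = 0.5
    _state: Optional[Tuple[float, float]] = field(default=None, init=False, repr=False)

    def update(self, x: float, y: float) -> Tuple[float, float]:
        """Blend the new sample with the previous smoothed position."""
        if self._state is None:
            # First sample: nothing to blend with yet.
            self._state = (x, y)
        else:
            px, py = self._state
            self._state = (
                self.alpha * x + (1.0 - self.alpha) * px,
                self.alpha * y + (1.0 - self.alpha) * py,
            )
        return self._state


def to_screen(nx: float, ny: float, width: int, height: int) -> Tuple[int, int]:
    """Map normalized coordinates to pixel coordinates, clamped to the screen."""
    cx = min(max(nx, 0.0), 1.0)
    cy = min(max(ny, 0.0), 1.0)
    return round(cx * (width - 1)), round(cy * (height - 1))


if __name__ == "__main__":
    smoother = CursorSmoother(alpha=0.5)
    # Simulated fingertip samples with a sudden jump in the second frame.
    for sample in [(0.0, 0.0), (1.0, 1.0), (1.0, 1.0)]:
        sx, sy = smoother.update(*sample)
        print(to_screen(sx, sy, 1921, 1081))
```

In a sketch like this, the single weight trades responsiveness against stability: a small alpha suppresses landmark jitter but makes the cursor lag behind fast hand movements.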

Item Type: Book
Additional Information: INTERNATIONAL CONFERENCE ON RECENT ADVANCES IN SCIENCE, ENGINEERING AND MANAGEMENT
Subjects: Computer Science Engineering > Computer Vision
Domains: Computer Science Engineering
Depositing User: Mr IR Admin
Date Deposited: 12 May 2026 04:55
Last Modified: 12 May 2026 04:55
URI: https://ir.vistas.ac.in/id/eprint/18489