A REAL-TIME MULTIMODAL AI ASSISTANT USING HAND GESTURES AND SPEECH TOGETHER
Mohana Priya, P. and Meyyarivu, J. and Pradeesh, S. (2026) A REAL-TIME MULTIMODAL AI ASSISTANT USING HAND GESTURES AND SPEECH TOGETHER. INTERNATIONAL CONFERENCE ON RECENT ADVANCES IN SCIENCE, ENGINEERING AND MANAGEMENT, TAGORE ENGINEERING COLLEGE, CHENNAI. ISBN 978-81-69050-45-6
Abstract
This project develops a multimodal Human-Computer Interaction (HCI) framework for low-latency, system-wide desktop control, implemented in Python, JavaScript/TypeScript, and C#. The system uses MediaPipe and OpenCV to detect 21 hand landmarks in real time at 60 FPS and applies a weighted smoothing algorithm to stabilize cursor movement. Voice commands are processed offline by a Vosk-based natural language understanding engine that converts speech into system actions. A Node.js System Bridge provides fast bidirectional communication over WebSockets between the detection system and the operating system. Because browser security restrictions prevent a web page from controlling the system globally, a C# Interop and PowerShell backend calls user32.dll to simulate native mouse and keyboard input. The interface is a Heads-Up Display (HUD) built with Next.js/React that provides real-time system feedback. Together, these components enable responsive, touchless, and hardware-independent desktop interaction across applications.
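The abstract does not specify which weighted smoothing algorithm is used. As an illustration only, the following minimal Python sketch assumes an exponentially weighted moving average, one common way to damp jitter in landmark-driven cursor positions. The class name `CursorSmoother`, the `alpha` parameter, and its default value are hypothetical and do not come from the paper.

```python
from dataclasses import dataclass, field
from typing import Optional, Tuple


@dataclass
class CursorSmoother:
    """Exponentially weighted smoothing of 2D cursor positions.

    Assumption: the paper's "weighted smoothing" is approximated here by an
    exponential moving average; the actual method may differ.
    """

    # Weight given to the newest sample (hypothetical default).
    # Values near 1 track the hand closely; values near 0 smooth more heavily.
    alpha: float = 0.3
    _state: Optional[Tuple[float, float]] = field(default=None, init=False, repr=False)

    def __post_init__(self) -> None:
        if not 0.0 < self.alpha <= 1.0:
            raise ValueError("alpha must be in the interval (0, 1]")

    def update(self, x: float, y: float) -> Tuple[float, float]:
        """Blend a new raw position into the running estimate and return it."""
        if self._state is None:
            # The first sample initializes the estimate directly.
            self._state = (float(x), float(y))
        else:
            prev_x, prev_y = self._state
            self._state = (
                self.alpha * x + (1.0 - self.alpha) * prev_x,
                self.alpha * y + (1.0 - self.alpha) * prev_y,
            )
        return self._state
```

In such a setup, `update` would be called once per video frame with the raw screen coordinates derived from a fingertip landmark, and the returned pair would be used as the cursor position. A smaller `alpha` gives a steadier cursor at the cost of added lag.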
| Item Type: | Book |
|---|---|
| Additional Information: | INTERNATIONAL CONFERENCE ON RECENT ADVANCES IN SCIENCE, ENGINEERING AND MANAGEMENT |
| Subjects: | Computer Science Engineering > Computer Vision |
| Domains: | Computer Science Engineering |
| Depositing User: | Mr IR Admin |
| Date Deposited: | 12 May 2026 04:55 |
| Last Modified: | 12 May 2026 04:55 |
| URI: | https://ir.vistas.ac.in/id/eprint/18489 |
