SIGN2SPEAK: ASL TO VIDEO AND SPEECH GENERATOR USING DEEP LEARNING

Snow Varshini, V and Swetha, M and Benasir Begam, F (2026) SIGN2SPEAK: ASL TO VIDEO AND SPEECH GENERATOR USING DEEP LEARNING. In: 8ᵗʰ INTERNATIONAL CONFERENCE ON RECENT ADVANCES IN SCIENCE, ENGINEERING AND MANAGEMENT, TAGORE ENGINEERING COLLEGE.

Text
Icrasem 18-4-26 fn.pdf - Published Version
Restricted to Registered users only until 19 December 2027.

Abstract

People with hearing impairments often find it difficult to communicate with the rest of the population because few people know sign language. This communication gap can cause social isolation and limit access to important services. The proposed project is a vision-based system that converts sign language gestures into text, speech, and an animated avatar output in real time to provide more natural and effective interaction. The system uses a webcam for live video input, so it requires no special hardware and is economical and easy to deploy. MediaPipe is used for efficient hand detection and extraction of hand landmarks, which give a structured representation of each hand gesture. The resulting features are fed into a Convolutional Neural Network (CNN) trained to classify sign language gestures with a high level of accuracy. The use of deep learning techniques enhances the system's ability to handle variations in hand shapes, orientations, and environmental conditions. Recognized gestures are translated into text, which is then converted to speech using Text-to-Speech (TTS) technology. In addition to the text and audio output, the system produces a video output in the form of an animated avatar that visually depicts the recognized speech, making communication more interactive and human-like. The proposed system operates in real time, giving immediate feedback and supporting seamless communication. It combines ideas from computer vision, deep learning, and human-computer interaction to provide an efficient and user-friendly solution. Overall, the project contributes to assistive technologies that facilitate inclusive communication and help bridge the gap between the hearing-impaired population and society.
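
The abstract does not say how landmarks are prepared for the CNN or how per-frame predictions become text, so the following is only a minimal sketch under stated assumptions, not the authors' implementation. It assumes 21 (x, y, z) landmarks per hand, as MediaPipe Hands reports, normalizes them relative to the wrist, and applies a majority vote over recent frames before a label is emitted. The helper names `normalize_landmarks` and `stable_labels`, the window size, and the voting rule are all hypothetical.

```python
from collections import Counter, deque
from typing import Deque, Iterable, List, Optional, Sequence, Tuple

Landmark = Tuple[float, float, float]
NUM_LANDMARKS = 21  # MediaPipe Hands reports 21 landmarks per detected hand


def normalize_landmarks(landmarks: Sequence[Landmark]) -> List[float]:
    """Flatten landmarks into 63 features that ignore hand position and size.

    Assumption: points are translated so the wrist (index 0) is the origin,
    then scaled so the farthest landmark lies at distance 1.
    """
    if len(landmarks) != NUM_LANDMARKS:
        raise ValueError(f"expected {NUM_LANDMARKS} landmarks, got {len(landmarks)}")
    wx, wy, wz = landmarks[0]
    shifted = [(x - wx, y - wy, z - wz) for x, y, z in landmarks]
    # Fall back to 1.0 if every point coincides with the wrist.
    scale = max((x * x + y * y + z * z) ** 0.5 for x, y, z in shifted) or 1.0
    return [coord / scale for point in shifted for coord in point]


def stable_labels(frame_labels: Iterable[Optional[str]], window: int = 5) -> List[str]:
    """Turn noisy per-frame classifier labels into a sequence of emitted signs.

    A label is emitted once it holds a strict majority of the last `window`
    frames. `None` means no hand was detected; a majority of `None` frames
    resets the state so the same sign can be emitted again afterwards.
    """
    recent: Deque[Optional[str]] = deque(maxlen=window)
    emitted: List[str] = []
    last: Optional[str] = None
    for label in frame_labels:
        recent.append(label)
        if len(recent) < window:
            continue  # not enough frames to vote yet
        winner, count = Counter(recent).most_common(1)[0]
        if count <= window // 2:
            continue  # no strict majority in this window
        if winner is None:
            last = None
        elif winner != last:
            emitted.append(winner)
            last = winner
    return emitted
```

In a full pipeline, the feature vector from `normalize_landmarks` would be passed to the trained classifier for each frame, and the string built from `stable_labels` would be handed to the TTS and avatar stages. Voting over a window trades a few frames of latency for fewer spurious letters.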

Item Type:	Conference or Workshop Item (Paper)
Subjects:	Computer Science Engineering > Artificial Intelligence
Domains:	Computer Science Engineering Computer Science
Depositing User:	Mr IR Admin
Date Deposited:	19 May 2026 07:51
Last Modified:	19 May 2026 09:33
URI:	https://ir.vistas.ac.in/id/eprint/20262
