Smart Tamil: A Dialect-Aware Small Language Model for Tamil NLP

Gokul, K and Kishore Kumar, R and Tholkappiyan, R and Padma, R. Smart Tamil: A Dialect-Aware Small Language Model for Tamil NLP. International Journal of Science, Strategic Management and Technology, 2.

Smart-Tamil-A-Dialect-Aware-Small-Language-Model-for-Tamil-NLP.pdf
Restricted to Registered users only until 5 May 2027.

Abstract

Smart Tamil is a dialect-sensitive language system designed to embrace the diversity and richness of Tamil as spoken across the dialects of Tamil Nadu. Most language systems do not explicitly account for dialect differences in large-scale deployments, so their output is grammatically correct but non-idiomatic. To address this, the Small Language Model (SLM) of Smart Tamil will be trained on small-corpus spoken data, heterogeneous written data, and video data to capture the dialect variations and spoken styles of the five major dialect zones of Tamil Nadu: Kongu Tamil (Coimbatore/Erode), Nellai Tamil (Tirunelveli/Thoothukudi), Kanyakumari Tamil, the Central dialect of Trichy/Thanjavur, and the Urban Tamil of Chennai. The Smart Tamil system has been built as a full-stack React + Flask application with built-in speech synthesis and speech recognition through the Web Speech API.
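
As a rough illustration of how the five dialect zones named in the abstract might be represented on the backend side, the following minimal Python sketch builds a validated request payload. It is not the authors' implementation: the zone codes, the `DialectZone` class, the `build_request` helper, and the payload fields are all hypothetical. The only external fact relied on is that the Web Speech API accepts BCP 47 language tags such as `ta-IN` through its `lang` property; whether a Tamil voice is actually available depends on the browser and device.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DialectZone:
    code: str       # hypothetical short identifier, not taken from the paper
    name: str       # dialect name as given in the abstract
    regions: tuple  # places the abstract associates with the zone


# The five zones listed in the abstract; the codes are illustrative assumptions.
DIALECT_ZONES = {
    zone.code: zone
    for zone in (
        DialectZone("kongu", "Kongu Tamil", ("Coimbatore", "Erode")),
        DialectZone("nellai", "Nellai Tamil", ("Tirunelveli", "Thoothukudi")),
        DialectZone("kanyakumari", "Kanyakumari Tamil", ("Kanyakumari",)),
        DialectZone("central", "Central", ("Trichy", "Thanjavur")),
        DialectZone("chennai", "Urban Tamil", ("Chennai",)),
    )
}


def build_request(text: str, dialect_code: str) -> dict:
    """Validate the input and build a payload a backend endpoint might accept."""
    cleaned = text.strip()
    if not cleaned:
        raise ValueError("text must not be empty")
    zone = DIALECT_ZONES.get(dialect_code)
    if zone is None:
        raise ValueError(f"unknown dialect code: {dialect_code!r}")
    return {
        "text": cleaned,
        "dialect": zone.code,
        "dialect_name": zone.name,
        # BCP 47 tag for Tamil (India), usable as the `lang` value
        # for speech synthesis and recognition in the browser.
        "speech_lang": "ta-IN",
    }
```

A front end could send such a payload as JSON and reuse `speech_lang` when configuring speech synthesis or recognition. Keeping the zone list in one place means an unknown dialect code is rejected before any model call is made.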

Item Type: Article
Subjects: Computer Science Engineering > Natural Language Processing
Domains: Computer Science
Depositing User: Mr IR Admin
Date Deposited: 15 May 2026 12:52
Last Modified: 15 May 2026 12:52
URI: https://ir.vistas.ac.in/id/eprint/19731
