Smart Tamil: A Dialect-Aware Small Language Model for Tamil NLP
Gokul, K and Kishore Kumar, R and Tholkappiyan, R and Padma, R. Smart Tamil: A Dialect-Aware Small Language Model for Tamil NLP. International Journal of Science, Strategic Management and Technology, 2.
Restricted to Registered users only until 5 May 2027.
Abstract
Smart Tamil is a dialect-sensitive language system designed to capture the diversity and richness of Tamil as spoken across the dialects of Tamil Nadu. Most language systems do not explicitly account for dialect differences in large-scale deployments, so their output, although grammatically correct, reads as non-idiomatic. To address this, the Small Language Model (SLM) behind Smart Tamil will be trained on a small corpus of spoken data, heterogeneous written data, and video data. The goal is to capture the linguistic variation and speaking styles of the five major dialect zones of Tamil Nadu: Kongu Tamil (Coimbatore/Erode), Nellai Tamil (Tirunelveli/Thoothukudi), Kanyakumari Tamil, Central Tamil (Trichy/Thanjavur), and Chennai urban Tamil. The Smart Tamil system is built as a full-stack React + Flask application with built-in speech synthesis and speech recognition through the Web Speech API.
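The abstract names five dialect zones and a React + Flask stack but does not describe how a request selects a dialect. The following is only a minimal sketch under stated assumptions, not the authors' implementation. It shows one way a backend might register the five zones and validate a dialect-tagged request before passing it to the model. The zone keys, the `handle_generate` function, and the JSON fields `dialect` and `text` are hypothetical. The model call is stubbed out, and plain standard-library Python stands in for a Flask route so that the example runs without dependencies.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class DialectZone:
    key: str                  # hypothetical identifier a client would send
    name: str                 # dialect name as listed in the abstract
    regions: tuple            # districts/cities named in the abstract


# The five dialect zones named in the abstract; the keys are illustrative only.
DIALECT_ZONES = {
    zone.key: zone
    for zone in (
        DialectZone("kongu", "Kongu Tamil", ("Coimbatore", "Erode")),
        DialectZone("nellai", "Nellai Tamil", ("Tirunelveli", "Thoothukudi")),
        DialectZone("kanyakumari", "Kanyakumari Tamil", ("Kanyakumari",)),
        DialectZone("central", "Central Tamil", ("Trichy", "Thanjavur")),
        DialectZone("chennai", "Chennai urban Tamil", ("Chennai",)),
    )
}


def error(message):
    """Build a (status, JSON body) pair for a rejected request."""
    return 400, json.dumps({"error": message})


def handle_generate(body):
    """Hypothetical handler for a JSON body such as {"dialect": "kongu", "text": "..."}.

    Returns (HTTP status code, JSON response body). The SLM call is a stub:
    this sketch only tags the input text with the selected dialect key.
    """
    try:
        payload = json.loads(body)
    except json.JSONDecodeError:
        return error("invalid JSON")
    if not isinstance(payload, dict):
        return error("expected a JSON object")

    dialect = payload.get("dialect")
    text = payload.get("text")
    if not isinstance(dialect, str) or dialect not in DIALECT_ZONES:
        return error("unknown dialect")
    if not isinstance(text, str) or not text.strip():
        return error("missing text")

    zone = DIALECT_ZONES[dialect]
    # Placeholder for the dialect-conditioned model call (assumption).
    output = f"[{zone.key}] {text}"
    return 200, json.dumps({"dialect": zone.name, "output": output}, ensure_ascii=False)


if __name__ == "__main__":
    print(handle_generate('{"dialect": "nellai", "text": "vanakkam"}'))
```

Validating the dialect against a fixed registry keeps the set of supported zones in one place. In a real Flask app, the same check would sit inside a route that reads the request JSON and returns the response.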
| Item Type: | Article |
|---|---|
| Subjects: | Computer Science Engineering > Natural Language Processing |
| Domains: | Computer Science |
| Depositing User: | Mr IR Admin |
| Date Deposited: | 15 May 2026 12:52 |
| Last Modified: | 15 May 2026 12:52 |
| URI: | https://ir.vistas.ac.in/id/eprint/19731 |