A Hybrid Deep Learning Framework Combining Swin Transformer and Mask R-CNN for Dismantled Weapon Part Detection

Shanthi, P and Yamini, B and Vishnupriya, A and Hariprabha, S and Vidhya, K and Panneer Selvi, R (2026) A Hybrid Deep Learning Framework Combining Swin Transformer and Mask R-CNN for Dismantled Weapon Part Detection. In: International Conference on Data Science, Agents & Artificial Intelligence 2026, 26th March to 28th March 2026, Chennai Institute of Technology, Chennai.

CIT CONFERENCE PROCEEDING (2).pdf - Published Version

Abstract

The growing misuse of dangerous weapons in public spaces poses a major threat to public safety, creating a need for intelligent surveillance systems that can identify threats early. Dismantled weapon parts are difficult to detect because they are small, easily concealed, and similar in appearance to non-threatening objects. Current weapon detection methods focus primarily on assembled weapons and are constrained by conventional CNN-based architectures. As a result, they lack contextual understanding of the surrounding environment and perform poorly when threats are occluded or located in cluttered scenes. This paper proposes a framework for identifying dismantled weapon parts that uses a Mask R-CNN architecture with a Swin Transformer as its backbone. The Swin Transformer extracts hierarchical multi-scale features using shifted window self-attention. The Mask R-CNN architecture provides instance-level classification, bounding box regression, and accurate segmentation of each weapon component. In experiments, the system achieved 96.8% accuracy, 95.9% precision, and 96.3% recall. The system also supports proactive threat identification, allowing police to take more effective steps toward keeping the public safe.
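
The shifted window idea mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes a hypothetical 4x4 grid of token ids and a window size of 2, and the helper names `cyclic_shift` and `window_partition` are introduced here only for illustration. It shows how rolling the grid by half a window before partitioning produces windows that mix tokens from several of the original windows. The attention computation itself, and the masking that Swin Transformer applies to tokens wrapped around from opposite edges, are omitted.

```python
def cyclic_shift(grid, shift):
    """Roll a 2D grid up and left by `shift` positions, wrapping around."""
    h, w = len(grid), len(grid[0])
    return [[grid[(r + shift) % h][(c + shift) % w] for c in range(w)]
            for r in range(h)]


def window_partition(grid, window):
    """Split a 2D grid into non-overlapping window x window blocks.

    Each block is returned as a flat list of its entries in row-major order.
    """
    h, w = len(grid), len(grid[0])
    assert h % window == 0 and w % window == 0, "grid must tile evenly"
    blocks = []
    for r0 in range(0, h, window):
        for c0 in range(0, w, window):
            blocks.append([grid[r][c]
                           for r in range(r0, r0 + window)
                           for c in range(c0, c0 + window)])
    return blocks


# Hypothetical 4x4 feature map: each entry is a token id (row * 4 + col).
tokens = [[r * 4 + c for c in range(4)] for r in range(4)]
window = 2  # assumed window size, chosen only to keep the example small

# Regular layer: attention would be computed inside each of these windows.
regular = window_partition(tokens, window)

# Shifted layer: roll by half a window first, then partition again.
shifted = window_partition(cyclic_shift(tokens, window // 2), window)

print(regular[0])  # [0, 1, 4, 5]
print(shifted[0])  # [5, 6, 9, 10]
```

In this sketch the first shifted window holds tokens 5, 6, 9, and 10, each of which belongs to a different regular window. Alternating the two layouts is what lets information pass between neighbouring windows while attention stays local to a window.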

Item Type: Conference or Workshop Item (Paper)
Subjects: Computer Science Engineering > Deep Learning
Domains: Computer Science Engineering
Depositing User: Mr IR Admin
Last Modified: 11 May 2026 08:35
URI: https://ir.vistas.ac.in/id/eprint/16586
