Md Tanjil Islam Aronno
Military Institute of Science and Technology
I am a Lecturer in Electrical, Electronic, and Communication Engineering at the Military Institute of Science and Technology (MIST). My research interests center on machine learning (ML) and deep learning (DL) applications in visual and multimodal speech recognition, clinical speech anomaly detection, and smart grid physical-domain attack identification.
My primary research contribution is the development of LipBengal, the first Bengali visual speech recognition (VSR) dataset designed to address the lack of resources for low-resource languages. This work involved large-scale data collection, manual annotation, and structured dataset design using Python, OpenCV, and MediaPipe. Building on this dataset, I developed a CNN–BiLSTM model with CTC loss for visual-only speech recognition, achieving an accuracy of 84.6%. This work has been published in a Q2 journal (Elsevier Data in Brief).
Beyond speech research, I have co-authored IEEE conference papers in smart grid electricity theft detection and grid stability analysis, where I applied RNN-based feature extraction and optimized ensemble machine learning models. These experiences reflect my broader interest in data-driven modeling of complex cyber-physical systems.
I am currently pursuing an M.Sc. in Electrical Engineering, where I have focused on digital speech processing, clinical speech anomalies, and multimodal learning. My recent work includes fine-tuning Whisper-based models for Bengali and extending LipBengal toward audio-visual speech recognition (AVSR).
My long-term goal is to pursue doctoral research and an academic career, developing ML/DL frameworks for VSR, AVSR, or smart grid theft and attack identification that may help address problems with national and social impact.