Degree
Bachelor of Science (Computer Science)
Department
Department of Computer Science
School
School of Mathematics and Computer Science (SMCS)
Advisor
Muhammad Zain Uddin, Lecturer, Computer Science-SMCS
Co-Advisor
Adil Saleem - PhD Scholar
Keywords
Urdu, Transcription, AI, Whisper, Flutter
Abstract
Conversational speech is normal, everyday spoken language. Representing such audio as text requires transcription and speaker diarization, which together make conversations navigable and let users draw insight from them. While products for English and other high-resource languages are abundant, Urdu users lack integrated systems that perform these tasks. This gap limits accessibility and productivity for millions of Urdu speakers, especially professionals, students, and the hearing-impaired. Using publicly available, fine-tuned models, which we evaluate on public and locally sourced datasets, we develop a complete product: an accurate, intuitive, and user-friendly package that addresses Pakistan’s growing need for Urdu transcription and speaker diarization.
Tools and Technologies Used
Python, FastAPI, Whisper, Firebase, Flutter
Methodology
We use Whisper for transcription and pyannote for diarization. The audio is cleaned and then diarized, and each resulting segment is transcribed. The output is a JSON file that records each speaker and what they said. Firebase maintains the database, and the Flutter app accesses the model output via FastAPI. The model runs on Google Colab and is exposed through ngrok; the ngrok link is stored in Firebase Remote Config and read by the Flutter application.
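The diarize-then-transcribe flow described above can be illustrated with a minimal sketch. It is not the project's actual code: the `Turn` record, the `build_transcript` function, the `transcribe_segment` callback, and the JSON field names (`speaker`, `start`, `end`, `text`) are all hypothetical names chosen for illustration. The sketch assumes diarization has already produced speaker turns with start and end times in seconds, and that some function can transcribe a given time span of the cleaned audio.

```python
import json
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass
class Turn:
    """One diarized speaker turn (hypothetical record, times in seconds)."""
    speaker: str
    start: float
    end: float


def build_transcript(
    turns: Iterable[Turn],
    transcribe_segment: Callable[[float, float], str],
) -> str:
    """Transcribe each diarized turn and return a JSON string.

    `transcribe_segment(start, end)` is an assumed callback that returns
    the text spoken in that time span; in a real system it would wrap
    the speech-recognition model.
    """
    records = []
    # Process turns in chronological order so the output reads as a dialogue.
    for turn in sorted(turns, key=lambda t: t.start):
        text = transcribe_segment(turn.start, turn.end).strip()
        if not text:
            continue  # skip turns where nothing was recognized
        records.append(
            {
                "speaker": turn.speaker,
                "start": turn.start,
                "end": turn.end,
                "text": text,
            }
        )
    # ensure_ascii=False keeps Urdu script readable in the JSON output.
    return json.dumps(records, ensure_ascii=False, indent=2)
```

In the system described here, the turns would come from the diarization step and `transcribe_segment` would call the transcription model on the matching slice of audio; the resulting JSON is the kind of payload an API endpoint could return to the mobile app.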
Document Type
Restricted Access
Submission Type
BSCS Final Year Project
Recommended Citation
Ahmed, M., Lakhani, M., Khan, T., & Humayun, N. (2025). Harf Ba Harf - Urdu Transcription and Diarization. Retrieved from https://ir.iba.edu.pk/fyp-bscs/16