Degree

Bachelor of Science (Computer Science)

Department

Department of Computer Science

School

School of Mathematics and Computer Science (SMCS)

Advisor

Muhammad Zain Uddin, Lecturer, Computer Science-SMCS

Co-Advisor

Adil Saleem, PhD Scholar

Keywords

Urdu, Transcription, AI, Whisper, Flutter

Abstract

Conversational speech is ordinary, everyday spoken language. Representing this audio as text requires transcription and speaker diarization, which make conversations navigable and allow valuable insight to be drawn from them. While such products are abundant for English and other high-resource languages, Urdu users lack integrated systems that perform these tasks. This gap limits accessibility and productivity for millions of speakers of the language, especially professionals, students, and the hearing-impaired. Using publicly available, fine-tuned models evaluated on public and locally sourced datasets, we develop a complete product: an accurate, intuitive, and user-friendly package that addresses Pakistan's growing need for Urdu transcription and speaker diarization.

Tools and Technologies Used

Python, FastAPI, Whisper, Firebase, Flutter

Methodology

We use Whisper for transcription and pyannote for speaker diarization. The audio is first cleaned, then diarized, and each resulting segment is transcribed. The output is a JSON file that records each speaker and what they said. Firebase maintains the database, and the Flutter app retrieves the model output through a FastAPI service. The model runs on Google Colab and is exposed via ngrok; the ngrok link is stored in Firebase Remote Config and read by the Flutter application.
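
The following is a minimal sketch of that diarize-then-transcribe pipeline, assuming the open-source whisper and pyannote.audio packages. The model checkpoints ("large-v2", "pyannote/speaker-diarization-3.1"), the access-token placeholder, and the file names are illustrative assumptions rather than the project's exact configuration.

import json

import whisper
from pyannote.audio import Pipeline

SAMPLE_RATE = 16000  # whisper.load_audio resamples everything to 16 kHz

# Illustrative model choices; the project may use different checkpoints.
diarization_pipeline = Pipeline.from_pretrained(
    "pyannote/speaker-diarization-3.1",
    use_auth_token="HF_TOKEN",  # hypothetical access-token placeholder
)
asr_model = whisper.load_model("large-v2")


def diarize_and_transcribe(audio_path: str) -> list[dict]:
    """Diarize the recording, then transcribe each speaker turn with Whisper."""
    audio = whisper.load_audio(audio_path)          # mono float32 array at 16 kHz
    diarization = diarization_pipeline(audio_path)  # who spoke, and when

    segments = []
    for turn, _, speaker in diarization.itertracks(yield_label=True):
        start = int(turn.start * SAMPLE_RATE)
        end = int(turn.end * SAMPLE_RATE)
        result = asr_model.transcribe(audio[start:end], language="ur")
        segments.append({
            "speaker": speaker,
            "start": round(turn.start, 2),
            "end": round(turn.end, 2),
            "text": result["text"].strip(),
        })
    return segments


if __name__ == "__main__":
    output = diarize_and_transcribe("meeting.wav")  # example input file
    with open("transcript.json", "w", encoding="utf-8") as f:
        json.dump(output, f, ensure_ascii=False, indent=2)

A FastAPI wrapper along the lines below could expose this pipeline to the Flutter client, reusing diarize_and_transcribe from the sketch above; the route name and upload handling are assumptions, not the project's actual API. When the server runs on Colab, its local port is tunnelled through ngrok, and the resulting public URL is what the Flutter app reads from Firebase Remote Config.

import shutil
import tempfile

from fastapi import FastAPI, UploadFile

app = FastAPI()


@app.post("/transcribe")
async def transcribe(file: UploadFile):
    # Save the uploaded audio to a temporary file, then run the pipeline on it.
    with tempfile.NamedTemporaryFile(suffix=".wav", delete=False) as tmp:
        shutil.copyfileobj(file.file, tmp)
        tmp_path = tmp.name
    return {"segments": diarize_and_transcribe(tmp_path)}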

Document Type

Restricted Access

Submission Type

BSCS Final Year Project
