Modeling POS Tagging for the Urdu Language
Faculty / School
Faculty of Computer Sciences (FCS)
Department
Department of Computer Science
Was this content written or created while at IBA?
Yes
Document Type
Conference Paper
Publication Date
3-1-2020
Conference Name
2020 International Conference on Emerging Trends in Smart Technologies, ICETST 2020
Conference Location
Karachi, Pakistan
Conference Dates
26-27 March 2020
ISBN/ISSN
85084959816 (Scopus)
Publisher
Institute of Electrical and Electronics Engineers (IEEE)
Abstract / Description
This paper presents part-of-speech (POS) taggers for Urdu, a low-resource language. POS tagging is a primary preprocessing step in many natural language processing tasks, such as sentiment classification, syntactic parsing, and named-entity recognition. The proposed taggers use two state-of-the-art models widely applied to sequence tagging: the Conditional Random Field (CRF) and the Bidirectional Long Short-Term Memory CRF (BiLSTM-CRF). This work is the first application of a BiLSTM-CRF model to POS tagging in Urdu. Both models achieved an F1 score of 96% on the test data, outperforming existing Urdu POS taggers by a significant margin.
DOI
https://doi.org/10.1109/ICETST49965.2020.9080721
Recommended Citation
Nasim, Z., Abidi, S., & Haider, S. (2020). Modeling POS Tagging for the Urdu Language. 2020 International Conference on Emerging Trends in Smart Technologies (ICETST 2020). https://doi.org/10.1109/ICETST49965.2020.9080721