Modeling POS Tagging for the Urdu Language

Faculty / School

Faculty of Computer Sciences (FCS)

Department

Department of Computer Science

Was this content written or created while at IBA?

Yes

Document Type

Conference Paper

Publication Date

3-1-2020

Conference Name

2020 International Conference on Emerging Trends in Smart Technologies, ICETST 2020

Conference Location

Karachi, Pakistan

Conference Dates

26-27 March 2020

ISBN/ISSN

85084959816 (Scopus)

Publisher

Institute of Electrical and Electronics Engineers (IEEE)

Abstract / Description

This paper presents Part-of-Speech (POS) taggers for Urdu, a low-resource language. POS tagging is a primary preprocessing step in many natural language processing tasks, such as sentiment classification, syntactic parsing, and named-entity recognition. The proposed taggers use two state-of-the-art models widely used for sequential tagging: the Conditional Random Field (CRF) and the Bidirectional Long Short-Term Memory CRF (BiLSTM-CRF). This work is the first instance of applying a BiLSTM-CRF model to POS tagging in the Urdu language. Both models achieved an F1 score of 96% on the test data, outperforming existing Urdu POS taggers by a significant margin.
