Faculty Research - Book Chapters and Conference Papers

On building an interpretable topic modeling approach for the Urdu language

Zarmeen Nasim, Institute of Business Administration Karachi

Faculty / School

Faculty of Computer Sciences (FCS)

Department

Department of Computer Science

Was this content written or created while at IBA?

Yes

Document Type

Conference Paper

Publication Date

1-1-2020

Conference Name

The 29th International Joint Conference on Artificial Intelligence and the 17th Pacific Rim International Conference on Artificial Intelligence (IJCAI-PRICAI 2020)

Conference Location

Yokohama, Japan

Conference Dates

7-15 January 2021

ISBN/ISSN

85097355737 (Scopus)

First Page

5200

Last Page

5201

Publisher

IJCAI International Joint Conference on Artificial Intelligence

Keywords

Natural language processing, NLP applications and tools, Embeddings, Natural language summarization

Abstract / Description

This research combines deep-learning-based language modeling with classical topic modeling techniques to produce interpretable topics for a given set of documents in Urdu, a low-resource language. Existing topic modeling techniques suggest topics as collections of words. These collections are often uninterpretable because the words are not integrated into a semantically coherent phrase or sentence. The proposed approach would first build an accurate Part-of-Speech (POS) tagger for Urdu using a publicly available corpus of millions of sentences. In the next step, it would use semantically rich feature extraction approaches, including Word2Vec and BERT, to experiment with different clustering and topic modeling techniques and produce a list of candidate topics for a given set of documents. Finally, this list of topics would be passed to a labeler module that produces syntactically correct phrases representing interpretable topics.
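The abstract does not say how the labeler turns a list of topic words into a phrase. The following is a minimal, hypothetical sketch of one way a POS-driven labeler could work, and it rests on several assumptions:

- A toy dictionary (`TOY_POS`) stands in for the trained Urdu POS tagger.
- The function `label_topic` and its fixed ADJ + NOUN template are illustrative only.

Neither is taken from the paper.

```python
# Hypothetical sketch of a POS-driven topic labeler (not the author's method).
# Input: topic words ranked by a topic model; output: a short Urdu phrase.
from typing import Dict, List, Optional

# Toy POS lookup standing in for a trained Urdu POS tagger (assumption).
TOY_POS: Dict[str, str] = {
    "سیاسی": "ADJ",      # siyasi, "political"
    "جماعت": "NOUN",     # jamaat, "party"
    "انتخابات": "NOUN",  # intikhabat, "elections"
    "ووٹ": "NOUN",       # vote
}


def label_topic(top_words: List[str], pos: Dict[str, str]) -> Optional[str]:
    """Build an 'ADJ NOUN' phrase from the highest-ranked adjective and the
    highest-ranked noun; fall back to the noun alone; return None if the
    topic contains no noun."""
    adj = next((w for w in top_words if pos.get(w) == "ADJ"), None)
    noun = next((w for w in top_words if pos.get(w) == "NOUN"), None)
    if noun is None:
        return None
    # Urdu places attributive adjectives before the noun they modify.
    return f"{adj} {noun}" if adj else noun
```

For example, `label_topic(["سیاسی", "جماعت", "انتخابات", "ووٹ"], TOY_POS)` yields `"سیاسی جماعت"` ("political party"). This is more readable than the raw word list. A real labeler would likely need richer templates (for example, genitive constructions with کی/کا) and a statistical tagger rather than a lookup table.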

DOI

https://doi.org/10.24963/ijcai.2020/740

Recommended Citation

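Nasim, Z. (2020). On building an interpretable topic modeling approach for the Urdu language. In Proceedings of the 29th International Joint Conference on Artificial Intelligence and the 17th Pacific Rim International Conference on Artificial Intelligence (IJCAI-PRICAI 2020), 5200-5201. https://doi.org/10.24963/ijcai.2020/740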
