Faculty Research - Journal Articles

Cluster analysis of Urdu tweets

Zarmeen Nasim, Institute of Business Administration, Karachi
Sajjad Haider, Institute of Business Administration, Karachi

Author Affiliation

Zarmeen Nasim is a Lecturer at the Institute of Business Administration (IBA), Karachi.

Sajjad Haider is a Professor at the Institute of Business Administration (IBA), Karachi.

Faculty / School

Faculty of Computer Sciences (FCS)

Department

Department of Computer Science

Was this content written or created while at IBA?

Yes

Document Type

Article

Source Publication

Journal of King Saud University - Computer and Information Sciences

ISSN

1319-1578

Keywords

Document clustering, Document embeddings, Feature extraction methods, Topic modelling, Unsupervised learning, Urdu language processing

Disciplines

Computer Sciences

Abstract

Document clustering allows a user to group semantically similar documents. It has been an active research area for many years, and various methods and techniques have been developed. However, the research has primarily been limited to English and other high-resource languages. For low-resource languages, such as Urdu, the area of document clustering is open to contributions. This work presents an experimental evaluation of clustering techniques on Urdu tweets. Semantically clustering tweets is challenging because they are very short. In this paper, various features, including sentence- and phrase-level embeddings, TF-IDF features, and document embeddings, are extracted from tweets, and clustering is performed using three algorithms: K-Means, Bisecting K-Means, and Affinity Propagation. Furthermore, a comparison is performed with the traditional topic modeling approach. The results indicate that TF-IDF features combined with the K-Means clustering algorithm outperformed the other adopted clustering techniques.
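
As a rough illustration of the best-performing combination named in the abstract (TF-IDF features with K-Means), the following is a minimal sketch, not the authors' implementation. It assumes whitespace-tokenised toy documents with English placeholder tokens rather than real Urdu tweets. The function names `tfidf_vectors`, `sq_dist`, and `kmeans`, the smoothed IDF formula, and the deterministic farthest-point initialisation are illustrative assumptions that do not come from the paper.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return (vocabulary, L2-normalised TF-IDF vectors) for tokenised docs."""
    vocab = sorted({tok for doc in docs for tok in doc})
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(tok for doc in docs for tok in set(doc))
    # Smoothed IDF (an assumed variant; the paper does not specify one).
    idf = {t: math.log((1 + n_docs) / (1 + df[t])) + 1.0 for t in vocab}
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vec = [tf[t] * idf[t] for t in vocab]
        norm = math.sqrt(sum(v * v for v in vec)) or 1.0
        vectors.append([v / norm for v in vec])
    return vocab, vectors


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(vectors, k, max_iters=50):
    """Plain K-Means with a deterministic farthest-point initialisation."""
    # Assumed initialisation: start from the first vector, then repeatedly
    # add the vector that is farthest from all centroids chosen so far.
    centroids = [list(vectors[0])]
    while len(centroids) < k:
        farthest = max(vectors, key=lambda v: min(sq_dist(v, c) for c in centroids))
        centroids.append(list(farthest))

    labels = None
    for _ in range(max_iters):
        # Assignment step: each vector goes to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: sq_dist(v, centroids[c])) for v in vectors
        ]
        if new_labels == labels:  # assignments stopped changing
            break
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Hypothetical toy corpus: placeholder tokens standing in for short tweets.
docs = [
    "cricket match team win".split(),
    "team win cricket final".split(),
    "cricket team final score".split(),
    "rain weather storm city".split(),
    "weather rain flood city".split(),
    "storm flood rain weather".split(),
]

vocab, vectors = tfidf_vectors(docs)
labels, centroids = kmeans(vectors, k=2)
print(labels)
```

On this toy corpus the two topics share no vocabulary, so the sports-themed and weather-themed documents should fall into separate clusters. Real tweets are far noisier, and the number of clusters would have to be chosen or tuned rather than fixed at 2.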

Indexing Information

HJRS - W Category, Scopus, Web of Science - Science Citation Index Expanded (SCI)

Journal Quality Ranking

Impact Factor: 13.473

Recommended Citation

Nasim, Z., & Haider, S. (2020). Cluster analysis of Urdu tweets. Journal of King Saud University - Computer and Information Sciences. Retrieved from https://ir.iba.edu.pk/faculty-research-articles/95

Publication Status

Published
