Type

Conference Proceedings

Authors

Gareth J.F. Jones
Debasis Ganguly
Piyush Arora

Subjects

Mathematics

Topics

question quality prediction
text categorization
document embedding
question answering
neighbourhood based transformation
text classification
naive bayes
nearest neighbour

Nearest neighbour based transformation functions for text classification: a case study with StackOverflow (2016)

Abstract

The significant growth in the number of questions in question answering forums has led to increasing interest in text categorization methods for classifying newly posted questions as good (suitable) or bad (otherwise) for the forum. Standard text categorization approaches, e.g. multinomial Naive Bayes, are likely to be unsuitable for this classification task because of: i) the lack of sufficient informative content in the questions due to their relatively short length; and ii) considerable vocabulary overlap between the classes. To increase the robustness of this classification task, we propose to use the neighbourhood of existing questions which are similar to the newly asked question. Instead of learning the classification boundary from the questions alone, we transform each question vector into a different one in the feature space. We explore two different neighbourhood functions, using: i) the discrete term space; and ii) the continuous vector space of real numbers obtained from vector embeddings of documents. Experiments conducted on StackOverflow data show that our approach of using this neighbourhood transformation can improve classification accuracy by up to about 8% as compared to using just unigram textual features.
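
The idea of transforming a question vector using its neighbourhood of similar existing questions can be illustrated with a minimal sketch in the discrete term space. The abstract does not specify the transformation function, so everything here is an assumption made for illustration: the function names, the cosine similarity ranking, and the interpolation weight `alpha` between a question's own term counts and the mean of its `k` nearest neighbours. It should not be read as the authors' method.

```python
import math
from collections import Counter


def term_vector(text):
    """Bag-of-words term counts for a question (discrete term space)."""
    return Counter(text.lower().split())


def cosine(u, v):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(u[t] * v[t] for t in u if t in v)
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def neighbourhood_transform(question, existing, k=2, alpha=0.5):
    """Interpolate a question vector with the mean of its k nearest neighbours.

    Hypothetical transformation: alpha and the interpolation itself are
    illustrative assumptions, not the function defined in the paper.
    """
    q = term_vector(question)
    vectors = [term_vector(text) for text in existing]
    # Rank existing questions by similarity to the new question.
    ranked = sorted(vectors, key=lambda v: cosine(q, v), reverse=True)
    neighbours = ranked[:k]
    # Start from the question's own term counts, scaled by alpha.
    transformed = {t: alpha * c for t, c in q.items()}
    # Add the mean neighbour vector, scaled by (1 - alpha).
    for v in neighbours:
        for t, c in v.items():
            share = (1 - alpha) * c / len(neighbours)
            transformed[t] = transformed.get(t, 0.0) + share
    return transformed


existing_questions = [
    "how to sort a list in python",
    "python list sort by key",
    "what is the best laptop",
]
new_vector = neighbourhood_transform("sort python list", existing_questions, k=2)
```

The transformed vector picks up terms such as `key` and `how` from similar questions, which is one way a short question could gain informative content before being passed to a classifier. For the continuous case, the same ranking could in principle be applied to document embeddings instead of term counts.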
Collections

Ireland -> Dublin City University -> Publication Type = Conference or Workshop Item
Ireland -> Dublin City University -> DCU Faculties and Centres = DCU Faculties and Schools: Faculty of Engineering and Computing: School of Computing
Ireland -> Dublin City University -> DCU Faculties and Centres = Research Initiatives and Centres: ADAPT
Ireland -> Dublin City University -> Status = Published

Experts in our system

1
Gareth J. F. Jones
Dublin City University
Total Publications: 297
 
2
Debasis Ganguly
Dublin City University
Total Publications: 40
 
3
Piyush Arora
Dublin City University
Total Publications: 15