Representation and learning schemes for sentiment analysis.

Mukras, Rahman

Authors

Rahman Mukras



Contributors

Robert Lothian
Supervisor

Abstract

This thesis identifies four novel techniques for improving the performance of text sentiment analysis systems. These include feature extraction and selection, enrichment of the document representation, and exploitation of the ordinal structure of rating classes. The techniques were evaluated on four sentiment-rich corpora using two well-known classifiers: Support Vector Machines and Naïve Bayes. This thesis proposes the Part-of-Speech Pattern Selector (PPS), a novel technique for automatically selecting Part-of-Speech (PoS) patterns. The PPS selects its patterns from a background dataset using a number of measures, including Document Frequency, Information Gain, and the Chi-Squared Score. Extensive empirical results show that these patterns perform just as well as manually selected ones. This has important implications for both the cost and the time spent on manual pattern construction. The position of a phrase within a document is shown to influence its sentiment orientation, and document classification performance is shown to improve when phrases are weighted by position. It is, however, also shown to be necessary to sample the distribution of sentiment-rich phrases within documents of a given domain before adopting a phrase weighting criterion. A key factor in choosing a classifier for an Ordinal Sentiment Classification (OSC) problem is its ability to address ordinal inter-class similarities. Two types of classifiers are investigated: those that can inherently solve multi-class problems, and those that decompose a multi-class problem into a sequence of binary problems. Empirical results showed the former to be more effective with regard to both mean squared error and classification time. Important features in an OSC problem are shown to distribute themselves across similar classes. Most feature selection techniques ignore inter-class similarities and hence easily overlook such features. The Ordinal Smoothing Procedure (OSP), which incorporates inter-class similarities into the feature selection process, is introduced in this thesis. Empirical results show the OSP to have a positive effect on mean squared error performance.
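
As a rough illustration of one of the selection measures named in the abstract, the sketch below ranks candidate PoS patterns by the standard chi-squared statistic for a 2x2 contingency table. It is not the thesis's PPS implementation. It assumes each background document has already been reduced to the set of PoS patterns it contains and carries a binary sentiment label; the function names (`chi_squared`, `rank_patterns`) and the toy data are hypothetical.

```python
from collections import Counter


def chi_squared(n11, n10, n01, n00):
    """Chi-squared statistic for a 2x2 pattern/class contingency table.

    n11: positive documents containing the pattern
    n10: negative documents containing the pattern
    n01: positive documents without the pattern
    n00: negative documents without the pattern
    """
    n = n11 + n10 + n01 + n00
    denom = (n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00)
    if denom == 0:
        # A row or column is empty, so the pattern carries no class signal.
        return 0.0
    return n * (n11 * n00 - n10 * n01) ** 2 / denom


def rank_patterns(docs, top_k=2):
    """Return the top_k patterns by chi-squared score.

    docs: list of (set_of_patterns, label) pairs, label in {"pos", "neg"}.
    Ties are broken alphabetically so the ranking is deterministic.
    """
    n_pos = sum(1 for _, label in docs if label == "pos")
    n_neg = len(docs) - n_pos
    pos_df, neg_df = Counter(), Counter()
    for patterns, label in docs:
        # Document frequency: each pattern counts once per document.
        (pos_df if label == "pos" else neg_df).update(patterns)
    scores = {}
    for p in set(pos_df) | set(neg_df):
        n11, n10 = pos_df[p], neg_df[p]
        scores[p] = chi_squared(n11, n10, n_pos - n11, n_neg - n10)
    return sorted(scores, key=lambda p: (-scores[p], p))[:top_k]


# Hypothetical background data: PoS patterns found in four labelled documents.
background = [
    ({"JJ NN", "DT NN"}, "pos"),
    ({"JJ NN", "RB JJ"}, "pos"),
    ({"DT NN", "NN NN"}, "neg"),
    ({"DT NN"}, "neg"),
]
best = rank_patterns(background, top_k=1)  # ["JJ NN"]
```

In this toy data "JJ NN" appears in every positive document and no negative one, so it receives the highest score. Document Frequency or Information Gain could be swapped in for `chi_squared` in the same ranking loop.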

Thesis Type: Thesis
Institution Citation: MUKRAS, R. 2009. Representation and learning schemes for sentiment analysis. Robert Gordon University, PhD thesis.

Files

MUKRAS 2009 Representation and learning schemes (853 Kb)
PDF

Copyright Statement
Copyright: the author and Robert Gordon University
