Rahman Mukras
Representation and learning schemes for sentiment analysis.
Mukras, Rahman
Authors
Contributors
Professor Nirmalie Wiratunga n.wiratunga@rgu.ac.uk
Supervisor
Dr Robert Lothian r.m.lothian@rgu.ac.uk
Supervisor
Abstract
This thesis identifies four novel techniques for improving the performance of text sentiment analysis systems. These include feature extraction and selection, enrichment of the document representation, and exploitation of the ordinal structure of rating classes. The techniques were evaluated on four sentiment-rich corpora using two well-known classifiers: Support Vector Machines and Naïve Bayes.
This thesis proposes the Part-of-Speech Pattern Selector (PPS), a novel technique for automatically selecting Part-of-Speech (PoS) patterns. The PPS selects its patterns from a background dataset using a number of measures, including Document Frequency, Information Gain, and the Chi-Squared Score. Extensive empirical results show that these patterns perform just as well as manually selected ones. This has important implications for both the cost and the time spent on manual pattern construction.
The position of a phrase within a document is shown to influence its sentiment orientation, and document classification performance is shown to improve when phrases are weighted accordingly. It is, however, also shown to be necessary to sample the distribution of sentiment-rich phrases within documents of a given domain before adopting a phrase weighting criterion.
A key factor in choosing a classifier for an Ordinal Sentiment Classification (OSC) problem is its ability to address ordinal inter-class similarities. Two types of classifiers are investigated: those that can inherently solve multi-class problems, and those that decompose a multi-class problem into a sequence of binary problems. Empirical results showed the former to be more effective with regard to both mean squared error and classification time.
Important features in an OSC problem are shown to distribute themselves across similar classes. Most feature selection techniques ignore inter-class similarities and hence easily overlook such features. The Ordinal Smoothing Procedure (OSP), which incorporates inter-class similarities into the feature selection process, is introduced in this thesis. Empirical results show the OSP to have a positive effect on mean squared error performance.
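As a rough illustration of one of the selection measures named in the abstract, the sketch below ranks candidate PoS patterns by the standard chi-squared score over a two-class background set. It is not the PPS implementation from the thesis: the function names, the binary `pos`/`neg` labels, and the toy pattern sets are all assumptions made for the example.

```python
from collections import Counter


def chi_squared(n11, n10, n01, n00):
    """Chi-squared score for a 2x2 pattern/class contingency table.

    n11: positive documents containing the pattern
    n10: negative documents containing the pattern
    n01: positive documents without the pattern
    n00: negative documents without the pattern
    """
    n = n11 + n10 + n01 + n00
    denom = (n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00)
    if denom == 0:
        return 0.0  # a degenerate table carries no evidence either way
    return n * (n11 * n00 - n10 * n01) ** 2 / denom


def rank_patterns(docs, labels, top_k=2):
    """Return the top_k patterns by chi-squared score (ties broken by name).

    docs: one set of PoS patterns per document (hypothetical representation).
    labels: "pos" or "neg" per document (an assumption, not from the thesis).
    """
    n_pos = sum(1 for y in labels if y == "pos")
    n_neg = len(labels) - n_pos
    in_pos, in_neg = Counter(), Counter()
    for patterns, y in zip(docs, labels):
        (in_pos if y == "pos" else in_neg).update(set(patterns))
    scores = {
        p: chi_squared(in_pos[p], in_neg[p], n_pos - in_pos[p], n_neg - in_neg[p])
        for p in set(in_pos) | set(in_neg)
    }
    return sorted(scores, key=lambda p: (-scores[p], p))[:top_k]


# Toy background set: each document is reduced to the PoS patterns it contains.
toy_docs = [{"JJ NN", "RB JJ"}, {"JJ NN"}, {"NN NN"}, {"NN NN", "RB JJ"}]
toy_labels = ["pos", "pos", "neg", "neg"]
selected = rank_patterns(toy_docs, toy_labels)
print(selected)  # ['JJ NN', 'NN NN']
```

In this toy set, "RB JJ" appears equally often in both classes and scores zero, so it is dropped, while the two class-specific patterns are kept.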
Citation
MUKRAS, R. 2009. Representation and learning schemes for sentiment analysis. Robert Gordon University, PhD thesis.
Thesis Type | Thesis |
---|---|
Deposit Date | Jul 8, 2009 |
Publicly Available Date | Jul 8, 2009 |
Public URL | http://hdl.handle.net/10059/379 |
Contract Date | Jul 8, 2009 |
Award Date | Jan 31, 2009 |
Files
MUKRAS 2009 Representation and learning schemes
(853 Kb)
PDF
Publisher Licence URL
https://creativecommons.org/licenses/by-nc-nd/4.0/
Copyright Statement
© The Author.