Resource and data intense methods for robust fine grained sentiment analysis

In this research project, we address shortcomings in expression-level sentiment analysis. This fine-grained level has received little attention in previous work, even though it is essential for practical applications such as opinion question answering or summarization.

For the predominant type of expression in this task, i.e. polar expressions such as nice or terrible that convey positive or negative sentiment, we will focus on the problem of unknown words. We plan to investigate the use of morphological analysis for both decomposing and synthesizing words. Moreover, we will address the issue of polar intensity: we plan to systematically compare different automatic ordering methods with one another and with human ratings.

We will also create lexicons that contain different types of valence shifters. Shifters are essential for contextual classification, as they modify or even fully switch the polarity conveyed by polar expressions. Since valence shifting has so far mostly been reduced to handling common negation, this task requires a more thorough investigation of the nature of shifting.
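To make the role of shifters concrete, the following is a minimal sketch of how a shifter lexicon could modify or switch the polarity of a nearby polar expression. It is not the project's method: the lexicon entries, the numeric factors, the function name contextual_polarity and the fixed look-back window are all assumptions chosen for illustration.

```python
# Toy polarity lexicon: positive values are positive sentiment.
# All entries are illustrative assumptions, not project resources.
POLAR = {"nice": 1.0, "good": 1.0, "terrible": -1.0}

# Toy shifter lexicon: factor applied to the polarity of a following
# polar expression. -1.0 fully switches polarity; -0.5 switches and
# weakens it. The factors are assumptions made for this sketch.
SHIFTERS = {"not": -1.0, "lack": -1.0, "hardly": -0.5}


def contextual_polarity(tokens, window=2):
    """Return (token, polarity) pairs for each polar expression.

    Every shifter found in the `window` tokens directly before a polar
    expression multiplies that expression's lexicon polarity.
    """
    results = []
    for i, token in enumerate(tokens):
        if token not in POLAR:
            continue
        score = POLAR[token]
        for previous in tokens[max(0, i - window):i]:
            score *= SHIFTERS.get(previous, 1.0)
        results.append((token, score))
    return results


print(contextual_polarity("the food was not nice".split()))
print(contextual_polarity("they lack good ideas".split()))
```

A fixed token window is the simplest possible scope model. It treats the verb lack exactly like the negation word not, which hints at why shifters beyond common negation need a lexicon of their own, and a syntactic notion of scope would likely be more accurate than counting tokens.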

With regard to the entity extraction tasks in expression-level sentiment analysis, i.e. opinion holder and opinion target extraction, we aim to create novel lexicons that can serve as the backbone of rule-based extraction systems. Such systems are usually fairly domain-independent and easy to create in the absence of labeled textual data.

In order to tackle the aforementioned tasks, we will employ both resource-intensive methods, i.e. rule-based methods that make use of very deep semantic representations, and data-intensive methods, i.e. corpus-based methods that may also employ standard NLP tools.

We will examine these tasks for two languages, English and German. Since the majority of previous research in natural language processing has focused on English, sophisticated resources are already available that allow the investigation of deep(er) linguistic methods. By contrast, such resources are not available for German. Accordingly, shallower methods, typically data-intensive ones, need to be applied. One additional contribution of this project is that new resources, such as lexical resources and processing tools for sentiment analysis, will be created, in particular for German.

In connection with the comparison of resource-intensive and data-intensive methods, we also want to answer the question of which type of representation is best suited for the different classification and extraction tasks in fine-grained sentiment analysis. In this context, we will also critically assess the suitability of traditional lemma-based representations and contrast them with other potential levels, such as the sense level.

Finally, we plan to review established evaluation methods and examine whether they make sufficiently transparent which kinds of phenomena an analysis system handles well and which it does not.

Marc Schulder
Research Associate in Computational Linguistics

My research interests include natural language processing, sign languages and sentiment analysis.