Centrum für Informations- und Sprachverarbeitung

Breadcrumb Navigation



Key aspects of today’s dominant clustering algorithms are heuristic and have poorly understood statistical properties. We address this problem by developing and analyzing clustering algorithms that choose their parameters, such as the number of clusters and other algorithm-specific hyper-parameters, in a fully automatic way such that, for large sample sizes, statistical guarantees in view of a mathematically precise and practically meaningful clustering goal can be given. The new clustering algorithms will be applied to sentiment analysis where determining the set of distinct opinions is essentially a clustering task.  The biggest challenge in this respect are polarity modifiers, e.g., words like "not" that reverse the polarity of a clause. We will develop methods for automatically learning and representing polarity modifiers in a way that allows us to define an accurate similarity measure for clauses that can then be used for effective clustering.