A key research question for this project is: “What features can we use to characterize two communicative acts (speech utterances, blog posts, web pages, or tweets) as expressions of the same idea?” In analyzing the wide range of web productions returned by our internet searches, we have found that several factors are relevant to classifying two productions as similar in topic, sentiment, and orientation: group allegiance, author identity, social network role, and a host of linguistic features. Our early work has focused on identifying a set of linguistic features diagnostic of group identification, using the example of white militant groups.
We have also been exploring finer-grained models that characterize the usage profiles of individual words, following work on distributionally based representations of meaning (the word-as-vector paradigm). These models have been used to construct semantic clusters diagnostic of particular ideological orientations, as well as in attempts to refine our model of linguistic markers of group membership.
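
As a rough illustration of the word-as-vector idea, and not a description of the project's actual models, the sketch below represents each word by counts of the words that occur near it and compares words by cosine similarity. The toy corpus, the window size, and the function names `cooccurrence_vectors` and `cosine` are all assumptions made for this example.

```python
from collections import Counter, defaultdict
import math


def cooccurrence_vectors(sentences, window=2):
    """Map each word to a Counter of the words seen within `window` tokens of it."""
    vectors = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.lower().split()
        for i, word in enumerate(tokens):
            lo = max(0, i - window)
            hi = min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vectors[word][tokens[j]] += 1
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse count vectors stored as Counters."""
    dot = sum(count * v[key] for key, count in u.items() if key in v)
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical toy corpus, chosen only to make the contrast visible.
toy_corpus = [
    "the cat chased the mouse",
    "the dog chased the ball",
    "the senator proposed the bill",
]

vectors = cooccurrence_vectors(toy_corpus)
sim_cat_dog = cosine(vectors["cat"], vectors["dog"])
sim_cat_senator = cosine(vectors["cat"], vectors["senator"])
print(f"cat/dog: {sim_cat_dog:.2f}  cat/senator: {sim_cat_senator:.2f}")
```

In this toy corpus "cat" and "dog" occur in the same contexts, so they score as more similar to each other than "cat" does to "senator". Grouping words by such similarity scores is one simple way that semantic clusters could be formed; real usage profiles would need far larger corpora and some weighting of the raw counts.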