The unifying theme of our lab is fine-grained modeling and empirical investigation of how people are able to acquire and use language. Here are a few of the topics and projects currently under way:

Probabilistic expectations in human sentence comprehension

Language comprehension is harder than it seems -- sentences are ambiguous, human memory is limited, and other aspects of our environment compete for our attention. Although misunderstandings are not infrequent, it's remarkable that most of the time we understand the sentences we hear to mean approximately what our interlocutors intend them to mean. There is now a great deal of evidence that a major factor accounting for our success is the use of probabilistic knowledge to form expectations about what sentences mean, and also about what our interlocutors are likely to say next. These expectations help us respond to our input efficiently and understand it accurately. A major project in our lab is a detailed explication of these ideas. One of the central concepts in this project is surprisal -- the idea that our probabilistic expectations guide our allocation of resources in a way that optimally prepares us to deal with linguistic input. We study both the empirical coverage of surprisal theory and its theoretical underpinnings in optimality-based analyses of linguistic cognition.
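
The core quantity is easy to state: the surprisal of a word is the negative log of its in-context probability, -log2 P(word | context), and surprisal theory predicts that processing difficulty at a word scales with this quantity. Here is a minimal sketch of the computation, using made-up continuation probabilities rather than output from any actual language model:

```python
import math

def surprisal(p: float) -> float:
    """Surprisal in bits: -log2 of a word's in-context probability."""
    return -math.log2(p)

# Hypothetical conditional probabilities for continuations of
# "The horse raced past the ..." (illustrative numbers only)
continuations = {"barn": 0.20, "fence": 0.10, "fell": 0.001}

for word, p in continuations.items():
    print(f"{word:>5}  P={p:<6}  surprisal={surprisal(p):6.2f} bits")
```

On illustrative numbers like these, a low-probability continuation such as "fell" (as in the classic garden-path sentence "The horse raced past the barn fell") carries far more surprisal, and is therefore predicted to be far harder to process, than an expected continuation like "barn".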

Optimization in human language production

One consequence of the expressive richness of natural languages is that there is usually more than one way to express the same (or approximately the same) message. As a result, speakers are often confronted with choices about how to structure their intended message into an utterance. If language users are rational, they might choose to structure their utterances so as to optimize communicative properties. We have focused on two major types of optimization in natural language production. (1) Information-theoretic and psycholinguistic considerations suggest that speakers may maximize the uniformity of information density in an utterance. We have found evidence for this principle of Uniform Information Density in the relative clauses of spontaneous spoken American English, using a combination of computational models, corpus analysis, and behavioral experimentation, and we are investigating other grammatical choice phenomena, such as word order variation, for further evidence. (2) Psycholinguistic evidence suggests that memory constraints are an important bottleneck in language processing, and that minimizing word-word dependency distances may be an important factor in utterance optimality. We are working on efficiently computing minimal-distance linearizations of dependency trees subject to the universal structural constraints observed in natural language, and on evaluating how closely structures observed in natural language corpora adhere to these optimal linearizations.
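
As a rough illustration of the Uniform Information Density idea, one can compare the per-word surprisal profile of an utterance with and without an optional function word such as the relativizer "that": inserting it spreads the same information over more words, lowering the peak. The surprisal values below are invented for illustration, not drawn from our corpus work:

```python
from statistics import variance

def uid_cost(surprisals):
    """One simple operationalization: variance of per-word surprisal.
    Lower variance = more uniform information density."""
    return variance(surprisals)

# Hypothetical per-word surprisals (bits) around a relative-clause onset
without_that = [2.0, 9.0, 3.0, 2.5]        # RC onset carries a large spike
with_that    = [2.0, 4.0, 5.5, 3.0, 2.5]   # "that" smooths the spike out

print(uid_cost(without_that))   # higher: information arrives in a burst
print(uid_cost(with_that))      # lower: information is spread more evenly
```

Under the UID hypothesis, speakers tend to choose the variant with the flatter profile precisely where the relative-clause onset would otherwise be highly surprising.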
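
For the second line of work, the objective being optimized is simple to compute even though finding constrained optimal linearizations efficiently is not. The sketch below brute-forces the minimum-total-dependency-length ordering of a toy tree; our actual work uses efficient algorithms and respects universal constraints such as projectivity, which this illustration ignores:

```python
from itertools import permutations

def total_dependency_length(order, edges):
    """Sum of linear head-dependent distances for one linearization."""
    pos = {w: i for i, w in enumerate(order)}
    return sum(abs(pos[h] - pos[d]) for h, d in edges)

# Toy dependency tree (assumed example): "dog" governs "the" and "big";
# "barked" governs "dog" and "loudly"
words = ("the", "big", "dog", "barked", "loudly")
edges = [("dog", "the"), ("dog", "big"), ("barked", "dog"), ("barked", "loudly")]

# Exhaustive search over all 120 orders of this five-word example
best = min(permutations(words), key=lambda o: total_dependency_length(o, edges))
print(best, total_dependency_length(best, edges))
```

For this toy tree the familiar English order "the big dog barked loudly" already achieves the minimum total dependency length of 5.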

Environmental and cognitive limitations in language comprehension

Although language comprehension may be best characterized in the abstract as pure computational-level probabilistic inference, a number of limitations on the inferences that can actually be carried out must be taken into account. First, language comprehension, like all other extraction of meaningful structure from perceptual input, takes place under noisy conditions. Second, the incremental processing algorithms that permit exact probabilistic inference have superlinear run time in sentence length and impose strict locality conditions on the probabilistic dependencies between events at different levels of structure, whereas humans seem to be able to make use of arbitrary features of (extra-)linguistic context in forming incremental expectations. In recent work, we have explored the consequences of imposing limitations in comprehension models on (a) the veridicality of the surface input and (b) the amount of memory available to the comprehender. It turns out that introducing these limitations can actually lead to elegant solutions to several outstanding problems for models of rational language comprehension.
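
A toy example of the first limitation: once the surface input is treated as potentially non-veridical, comprehension becomes Bayesian inference over what the speaker intended, P(s | w) proportional to P(w | s) P(s). The numbers below are invented for illustration; they are not fitted model parameters:

```python
# Toy noisy-channel inference: infer the intended sentence s from a noisy
# percept w via Bayes' rule. All probabilities are illustrative assumptions.

prior = {
    "the ball was thrown": 0.95,   # P(s): plausible event
    "the ball was drawn":  0.05,   # P(s): implausible event
}

def likelihood(percept: str, sentence: str) -> float:
    """P(percept | sentence): chance the sentence surfaced as this percept,
    allowing a modest probability of a one-word corruption."""
    return 0.80 if percept == sentence else 0.20

percept = "the ball was drawn"
posterior = {s: likelihood(percept, s) * p for s, p in prior.items()}
z = sum(posterior.values())
for s, p in sorted(posterior.items(), key=lambda kv: -kv[1]):
    print(f"P({s!r} | percept) = {p / z:.3f}")
```

With a sufficiently strong prior favoring the plausible reading, the inferred sentence differs from the literal percept -- the comprehender effectively "corrects" the input, which is the kind of behavior these limited models turn out to predict.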

Phonological generalizations in phonotactic learning

Experimental research shows that speakers are capable not only of recognizing whether a nonce form is an acceptable word of their language, but also of assigning gradient levels of acceptability both to sound sequences that are attested in the language and to those that are not. Modeling these judgments poses a significant challenge because speaker judgments of attested patterns are usually modeled as distributions over the segmental sequences represented in the lexicon, whereas judgments of unattested sequences are more effectively modeled as distributions over lexically under-represented natural class sequences. One of the main challenges in understanding generalization in phonotactic learning is the large number of natural class representations that can describe a given data set, leading to a huge space of possible hypotheses the learner could entertain. Understanding how speakers learn phonological generalizations given this hypothesis space is an important step toward understanding psychological salience in terms of phonological features and frequency, with broad implications for phonological learning and processing in general.
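
As a baseline for what gradient acceptability looks like at the segmental level, here is a minimal smoothed bigram scorer over a toy lexicon (all forms and smoothing parameters are invented for illustration). Its shortcoming is exactly the point above: because it generalizes only over segments, not natural classes, it can separate unattested forms only through accidental segment overlap with the lexicon:

```python
import math
from collections import Counter

# Toy lexicon, invented for illustration
lexicon = ["blik", "brik", "blast", "trik", "drip"]

def bigrams(word):
    padded = f"#{word}#"            # '#' marks word edges
    return [padded[i:i + 2] for i in range(len(padded) - 1)]

counts = Counter(bg for w in lexicon for bg in bigrams(w))
total = sum(counts.values())

def log_score(nonce, alpha=0.1, n_types=27 ** 2):
    """Add-alpha smoothed segment-bigram log-probability: a crude
    gradient acceptability score for a nonce form."""
    return sum(math.log((counts[bg] + alpha) / (total + alpha * n_types))
               for bg in bigrams(nonce))

# Neither nonce form is an attested word, but "blin" reuses attested
# bigrams (#b, bl, li) while "bnin" contains unattested ones (bn, ni),
# so the model assigns "blin" a higher (less negative) score.
for nonce in ["blin", "bnin"]:
    print(nonce, round(log_score(nonce), 2))
```

A learner that instead generalized over natural classes (e.g., penalizing stop+nasal onsets as a class) could assign a form like "bnin" a low score for a principled reason rather than through accidental gaps in segment bigrams, which is the kind of generalization this project aims to understand.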