The unifying theme of our lab is fine-grained modeling and empirical investigation of how people are able to acquire and use language. Here are a few of the topics and projects currently under way:
Probabilistic expectations in human sentence comprehension
Language comprehension is harder than it seems --
sentences are ambiguous, human memory is limited,
and other aspects of our environment compete for
our attention. Although misunderstandings are not
infrequent, it's remarkable that most of the time,
we understand the sentences we hear to mean
approximately what our interlocutors intend them
to mean. There's now a great deal of evidence
that a major factor accounting for our success is
the use of probabilistic knowledge to form
expectations about what sentences mean and also
about what our interlocutors are likely to say in
the future. This helps us to respond to our input
efficiently and to understand it accurately. A
major project in our lab is a detailed explication
of these ideas. One of the central concepts in
this project is the idea of surprisal --
that our probabilistic expectations guide our
allocation of resources in a way that optimally
prepares us to deal with linguistic input. We
study both the empirical coverage of the surprisal
theory and its theoretical underpinnings in
optimality-based analyses of linguistic cognition.
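To make the notion concrete, the surprisal of a word is its negative log-probability in context: surprisal(w_i) = -log P(w_i | w_1 ... w_{i-1}). The toy Python sketch below (a hypothetical illustration, not one of our lab's actual models) estimates surprisal from bigram counts over a tiny corpus:

    import math
    from collections import Counter

    # Tiny toy corpus; a real model would be trained on far more data.
    corpus = "the dog barked . the dog slept . the cat slept .".split()

    # Relative-frequency estimates of P(word | previous word).
    bigrams = Counter(zip(corpus, corpus[1:]))
    unigrams = Counter(corpus)

    def surprisal(prev, word):
        # Surprisal in bits: -log2 P(word | prev).
        return -math.log2(bigrams[(prev, word)] / unigrams[prev])

    print(surprisal("the", "dog"))  # frequent continuation: ~0.58 bits
    print(surprisal("the", "cat"))  # rarer continuation: ~1.58 bits

Under surprisal theory, per-word processing difficulty (for example, reading time) is predicted to grow with this quantity.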
Optimization in human language production
One consequence of the expressive richness of natural languages is
that there is usually more than one way to express the same (or
approximately the same) message. As a result, speakers are often
confronted with choices as to how to structure their intended message
into an utterance. If language users are rational, they might choose
to structure their utterances so as to optimize communicative
properties. We have focused on two major types of optimization in
natural language production: (1) information-theoretic and
psycholinguistic considerations suggest that this may include
maximizing the uniformity of information density in an utterance
(see the first sketch below). We
have shown evidence for this principle of Uniform Information Density
in the relative clauses of spontaneous spoken American English, using
a combination of computational models, corpus analysis, and behavioral
experimentation. We are also investigating other types of grammatical
choice phenomena, such as word order variation, for evidence of
Uniform Information Density. (2) Psycholinguistic evidence suggests
that memory constraints are an important bottleneck in language
processing, and the minimization of word-word dependency distances may
be an important factor in utterance optimality. We are engaged in work on efficiently computing minimal-distance linearizations of dependency trees subject to the universal structural constraints observed in natural language, and on evaluating how closely observed structures in natural language corpora adhere to these optimal linearizations (see the second sketch below).
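To make point (1) concrete, here is a minimal sketch of one way uniformity of information density can be quantified; the per-word surprisal values and the variance-based score are purely illustrative, not our published methodology:

    import statistics

    def uid_score(surprisals):
        # Lower variance = a more uniform information profile.
        # 'surprisals' holds per-word surprisal values in bits, obtained
        # from a language model such as the one sketched above.
        return statistics.variance(surprisals)

    # Hypothetical profiles for two paraphrases of the same message,
    # e.g. a relative clause with vs. without an optional "that".
    with_that = [2.1, 1.0, 3.2, 2.8]   # illustrative numbers only
    without_that = [2.1, 5.9, 2.8]     # same content, packed less evenly

    # UID predicts a preference for the more uniform variant.
    print(uid_score(with_that), uid_score(without_that))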
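And for point (2), a minimal sketch of what "total dependency length" means for a linearization of a dependency tree; the tree and the brute-force search are toy illustrations (our actual work concerns efficient algorithms under the structural constraints, such as projectivity, observed in natural language):

    from itertools import permutations

    def total_dependency_length(order, heads):
        # Sum of linear distances between each word and its head.
        # order: a tuple of word ids; heads: word id -> head id (root: None).
        pos = {w: i for i, w in enumerate(order)}
        return sum(abs(pos[w] - pos[h])
                   for w, h in heads.items() if h is not None)

    # Hypothetical 4-word tree: word 2 is the root, words 1 and 3 attach
    # to 2, and word 0 attaches to 1.
    heads = {0: 1, 1: 2, 2: None, 3: 2}

    # Exhaustive search is exponential and ignores structural constraints;
    # it is shown only to make "minimal-distance linearization" concrete.
    best = min(permutations(heads),
               key=lambda o: total_dependency_length(o, heads))
    print(best, total_dependency_length(best, heads))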
Environmental and cognitive limitations in language comprehension
Although language comprehension may be abstractly best characterized as a case of pure computational-level probabilistic inference, there are a number of limitations on the inferential computations that real comprehenders can perform, and these limitations should be taken into account. First, language comprehension, as with all other
cases of the extraction of meaningful structure
from perceptual input, takes place under
noisy conditions. Second, the incremental processing
algorithms that permit exact probabilistic inference have superlinear run time in sentence length and impose strict locality conditions on the probabilistic dependence between events
at different levels of structure, whereas humans seem to be able to make use of arbitrary features of
(extra-)linguistic context in forming incremental expectations. In recent work, we have explored the consequences of imposing limitations in comprehension models on (a) the veridicality of surface input, and (b) the amount of memory available to the comprehender. It turns out that introducing these limitations can actually lead to elegant solutions to several outstanding problems for models of rational language comprehension.
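As a minimal sketch of the noisy-channel intuition behind point (a): the comprehender can be modeled as inferring the intended sentence from a possibly corrupted percept by Bayes' rule, P(intended | perceived) ∝ P(perceived | intended) P(intended). The priors and noise model below are toy assumptions for illustration only:

    # Hypothetical prior over intended sentences.
    prior = {"the ball was kicked": 0.7,
             "the ball was licked": 0.3}

    def likelihood(perceived, intended, noise=0.01):
        # Toy noise model: each character is independently corrupted with
        # probability 'noise' (strings assumed to be of equal length).
        flips = sum(a != b for a, b in zip(perceived, intended))
        return noise ** flips * (1 - noise) ** (len(intended) - flips)

    def posterior(perceived):
        scores = {s: prior[s] * likelihood(perceived, s) for s in prior}
        z = sum(scores.values())
        return {s: v / z for s, v in scores.items()}

    # The posterior trades fidelity to the percept against prior
    # plausibility; with enough noise, or a strong enough prior, the
    # comprehender may "correct" the input toward the likelier reading.
    print(posterior("the ball was licked"))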
Phonological generalizations in phonotactic learning
Experimental research shows that speakers are capable not only of recognizing whether a nonce form is an acceptable word of the language, but also of assigning a gradient level of acceptability both to sound sequences that are attested in the language and to those that are not. Modeling these judgments poses a significant challenge because speaker judgments of attested patterns are usually modeled as distributions over the segmental sequences represented in the lexicon, whereas judgments of unattested sequences are more effectively modeled as distributions over lexically under-represented natural class sequences. One of the main challenges in understanding generalization in phonotactic learning is the large number of natural class representations that can be used to describe a given data set, leading to a huge space of possible hypotheses that the learner could entertain. Understanding how speakers learn phonological generalizations given this hypothesis space is an important step towards understanding psychological salience in terms of phonological features and frequency, with broad implications for phonological learning and processing in general.
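As a rough sketch of what a gradient phonotactic score looks like, here is a smoothed segment-bigram model over a toy lexicon; note that it operates over raw segments only, whereas the models discussed above generalize over natural class sequences defined by phonological features:

    import math
    from collections import Counter

    # Toy lexicon of attested forms; '#' marks word edges.
    lexicon = ["blik", "brik", "blam", "trap", "drip"]

    bigrams, contexts = Counter(), Counter()
    for w in lexicon:
        padded = "#" + w + "#"
        for a, b in zip(padded, padded[1:]):
            bigrams[(a, b)] += 1
            contexts[a] += 1

    inventory = set("".join(lexicon)) | {"#"}

    def log_score(form, alpha=0.1):
        # Add-alpha-smoothed bigram log-probability: a gradient
        # acceptability score that is higher for forms built from
        # sound sequences well attested in the lexicon.
        padded = "#" + form + "#"
        return sum(math.log((bigrams[(a, b)] + alpha) /
                            (contexts[a] + alpha * len(inventory)))
                   for a, b in zip(padded, padded[1:]))

    # Gradience: 'blip' reuses attested sequences, while 'bnik' contains
    # the unattested onset 'bn' and scores far lower without being ruled
    # out categorically.
    print(log_score("blip"), log_score("bnik"))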