Testing gradient measures of relevance in discourse

Alex Warstadt & Omar Agha, NYU

Abstract

The influential Question Under Discussion theory (Roberts 1996/2012) provides a categorical definition of relevance relative to the QUD. Specifically, an assertion with content p is predicted to be relevant to a question Q just in case p is a partial answer to Q. In this talk, we explore to what extent the categorical theory of relevance is an idealization of an underlying gradient notion of relevance. On a gradient theory, a proposition can be partly relevant without being a partial answer to the QUD.
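
For concreteness, on one standard formulation of this definition (cf. Roberts 2012, building on alternative-set semantics for questions), a question Q denotes a set of alternative propositions, and a partial answer is any proposition that settles the truth value of at least one alternative. In our notation (a sketch, not the talk's own formalism):

\[
p \text{ is a partial answer to } Q \iff \exists q \in Q : \; p \models q \ \text{ or } \ p \models \neg q
\]

A complete answer settles every alternative in Q; the categorical theory counts an assertion as relevant just in case its content is at least a partial answer in this sense.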

We evaluate two gradient measures of relevance drawn from Bayesian theories of question utility (Nelson 2005). The first measure, entropy reduction, relates the relevance of a response to the degree to which it decreases the listener’s uncertainty about the possible answers to the QUD (van Rooij 2004; Rothe, Lake, & Gureckis 2018). The second measure, KL divergence, relates the relevance of a response to the divergence between the prior and posterior distributions over the possible answers, regardless of whether uncertainty increases or decreases (Hawkins et al. 2015).
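
As a sketch of how these measures can be stated (our notation and conventions, not necessarily those used in the talk): let P(a) be the listener's prior probability of each possible answer a to the QUD, and P(a | r) the posterior after updating on a response r. Then

\[
\mathrm{ER}(r) \;=\; H\big(P(\cdot)\big) - H\big(P(\cdot \mid r)\big), \qquad H(P) \;=\; -\sum_{a} P(a)\,\log P(a),
\]
\[
\mathrm{KL}(r) \;=\; D_{\mathrm{KL}}\big(P(\cdot \mid r)\,\big\|\,P(\cdot)\big) \;=\; \sum_{a} P(a \mid r)\,\log \frac{P(a \mid r)}{P(a)}.
\]

On the first measure, a response that increases the listener's uncertainty receives a negative score; on the second, any shift between prior and posterior yields a positive score, whether or not uncertainty decreases.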

We test these two measures of relevance experimentally and present preliminary results. In the task, we manipulate two variables: (i) the prior probabilities of the possible answers and (ii) the degree to which the response shifts those probabilities. Both gradient measures capture a substantial amount of variance in helpfulness judgments that the categorical theory cannot. KL divergence provides a better fit than entropy reduction. However, both measures diverge from human judgments in systematic ways.
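
As an illustration of how the two measures can come apart, the following minimal Python sketch (toy numbers and our own variable names, not the experimental items) scores a response that shifts the listener's beliefs while increasing overall uncertainty:

    import numpy as np

    def entropy(p):
        # Shannon entropy (bits) of a distribution over possible answers.
        p = np.asarray(p, dtype=float)
        p = p[p > 0]
        return -np.sum(p * np.log2(p))

    def entropy_reduction(prior, posterior):
        # Relevance as reduction of uncertainty about the QUD's answers.
        return entropy(prior) - entropy(posterior)

    def kl_divergence(prior, posterior):
        # Relevance as belief shift: D_KL(posterior || prior).
        prior = np.asarray(prior, dtype=float)
        posterior = np.asarray(posterior, dtype=float)
        mask = posterior > 0
        return np.sum(posterior[mask] * np.log2(posterior[mask] / prior[mask]))

    # Toy QUD with three possible answers; a response shifts the prior.
    prior = [0.6, 0.3, 0.1]
    posterior = [0.2, 0.5, 0.3]

    print(entropy_reduction(prior, posterior))  # negative: uncertainty went up
    print(kl_divergence(prior, posterior))      # positive: beliefs shifted

On toy numbers like these, entropy reduction is negative while KL divergence is positive; responses of this kind are where the two gradient measures make distinct predictions.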