Towards Precision Characterization of Communication Disorders

ICASSP 2025

Nigel G. Ward, Andres Segura, Georgina Bugarini, Heike Lehnert-LeHouillier, Dancheng Liu, Jinjun Xiong, Olac Fuentes

Abstract: The diagnosis and treatment of individuals with communication disorders offer many opportunities for the application of speech technology, but research so far has not adequately considered the diversity of conditions, the challenges of limited data, or the role of pragmatic deficits. This paper explores how a general-purpose model of perceived pragmatic similarity may overcome these limitations. It shows that a simple model can capture utterance aspects that are relevant to the diagnosis of autism and of specific language impairment, outlines how it might support several use cases for clinicians and clients, and analyzes its performance and limitations.

Full Paper (erratum: Table 2 is transposed)

Overview Video

Audio Pairs Illustrating Weaknesses of our Best Model

Audio Pair 1

A common reason for differences between the human and system similarity ratings was the presence of sounds that are rare in the training data. For example, this pair includes an ingressive exclamation.

Reference:

Reference and Clip #1:

Metric	Value
Average Judge Rating	1.85
System Cosine Similarity	0.96
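
For readers unfamiliar with the metric, the "System Cosine Similarity" values in these tables compare two utterance representations by the angle between them. The following is a minimal sketch of that computation only, not the paper's model: the function name `cosine_similarity` and the example vectors are hypothetical, and how the actual system derives its feature vectors from audio is not described on this page.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length feature vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)

# Hypothetical feature vectors for a reference utterance and a clip
# (illustrative values only, not taken from the paper's data).
reference = [1.0, 2.0, 3.0]
clip = [2.0, 4.0, 6.0]
similarity = cosine_similarity(reference, clip)
```

Because the measure depends only on direction, two vectors that differ merely in scale, as in this toy example, come out as maximally similar, while orthogonal vectors score zero.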

Audio Pair 2

Another reason for the differences may have been that the model picked up on pragmatic details that the judges didn’t consider important. For instance, the model may have been over-sensitive to differences in how speakers intended to take turns (whether they were holding or yielding the conversation).

Reference:

Reference and Clip #1:

Metric	Value
Average Judge Rating	2.07
System Cosine Similarity	0.88

Audio Pair 3

This pair is another case where the model may have been over-sensitive to aspects of similarity that the judges missed or didn’t prioritize. Both clips seek confirmation of an amount or quantity and are very similar in that respect, but the model judged them different, possibly because one was a question and the other a statement, a distinction that the human judges likely felt was unimportant.

Reference:

Reference and Clip #2:

Metric	Value
Average Judge Rating	2.08
System Cosine Similarity	0.88