IEEE Workshop on Spoken Language Technology (SLT), 2018, pages 827-831.
Abstract:
Going beyond turn-taking models built to solve specific tasks, such as
predicting whether a user will hold their turn after a pause, there is
growing interest in more general turn-taking models that subsume
many such tasks, and very good results have recently been obtained
(Skantze 2017). Here we present an improved recurrent network model
that outperforms previous work and does so without requiring
lexical annotation. Further, we show that this model can be trained
for different languages with no modifications, providing good results
in turn-taking prediction for English, Spanish, Japanese, Mandarin and
French. We also show that our model performs well across genres,
including task-oriented dialog and general conversation.