Dialogs Re-enacted Across Languages, Version 2

Technical Report UTEP-CS-23-27

Nigel G. Ward, Jonathan E. Avila, Emilia Rivas, Divette Marco

July 2023

Abstract: To support machine learning of cross-language prosodic mappings and other ways to improve speech-to-speech translation, we present a protocol for collecting closely matched pairs of utterances across languages, a description of the resulting data collection, and some observations and musings. This report is intended for (1) people using this corpus, (2) people extending this corpus, and (3) people designing similar collections of bilingual dialog data.
