MoTT: A Speech Dataset for Modular Composition of Turn-Taking Conversations

Conference proceedings contribution

Publication date:

2025

Citation:

MoTT: A Speech Dataset for Modular Composition of Turn-Taking Conversations / G. Salada, D. Fantini, F. Avanzini, G. Presti - In: 2025 Immersive and 3D Audio: from Architecture to Automotive (I3DA) [s.l] : IEEE, 2025. - ISBN 979-8-3315-5828-4. - pp. 1-8 (International Conference on Immersive and 3D Audio, held in Bologna in 2025) [10.1109/i3da65421.2025.11202114].

Abstract:

Among the numerous speech datasets in the literature, only a minority concerns conversational data, and even fewer datasets isolate the elements occurring in turn-taking conversations. To address this gap, this paper presents MoTT, an English speech dataset composed of questions, answers, reciprocal questions, and backchannel responses recorded by eight participants. The questions and answers pertain to ten topics and were recorded in two takes. The voice directivity pattern was simultaneously captured at frontal and lateral positions by two microphones. The MoTT dataset was designed to provide interchangeable conversational elements and enable their modular composition to obtain fictional but plausible and convincing conversations. As a result, multiple virtual speakers engage in a turn-taking conversation that emulates real-world interactions, with spatial audio techniques employed to enhance realism by arranging the speakers in the auditory scene. This dataset offers a valuable resource for studies in immersive spatial audio, human-computer interaction, and auditory scene analysis. The dataset is therefore well-suited for experiments that require the simulation of ecologically valid conversations, such as the one described in the use case reported in this paper.

IRIS type:

03 - Contribution in volume

Keywords:

Dataset; speech; audio recording; turn-taking

List of authors: