MoTT: A Speech Dataset for Modular Composition of Turn-Taking Conversations

Conference proceedings contribution
Publication date:
2025
Citation:
MoTT: A Speech Dataset for Modular Composition of Turn-Taking Conversations / G. Salada, D. Fantini, F. Avanzini, G. Presti - In: 2025 Immersive and 3D Audio: from Architecture to Automotive (I3DA) [s.l] : IEEE, 2025. - ISBN 979-8-3315-5828-4. - pp. 1-8 (International Conference on Immersive and 3D Audio, held in Bologna in 2025) [10.1109/i3da65421.2025.11202114].
Abstract:
Among the numerous speech datasets in the literature, only a minority concerns conversational data, and even fewer datasets isolate the elements occurring in turn-taking conversations. To address this gap, this paper presents MoTT, an English speech dataset composed of questions, answers, reciprocal questions, and backchannel responses recorded by eight participants. The questions and answers pertain to ten topics and were recorded in two takes. The voice directivity pattern was simultaneously captured at frontal and lateral positions by two microphones. The MoTT dataset was designed to provide interchangeable conversational elements and enable their modular composition to obtain fictional but plausible and convincing conversations. As a result, multiple virtual speakers engage in a turn-taking conversation that emulates real-world interactions, with spatial audio techniques employed to enhance realism by arranging the speakers in the auditory scene. This dataset offers a valuable resource for studies in immersive spatial audio, human-computer interaction, and auditory scene analysis. The dataset is therefore well-suited for experiments that necessitate the simulation of ecologically valid conversations, such as the one described in the use case reported in this paper.
IRIS type:
03 - Book contribution
Keywords:
Dataset; speech; audio recording; turn-taking
Author list:
G. Salada, D. Fantini, F. Avanzini, G. Presti
University authors:
AVANZINI FEDERICO (author)
FANTINI DAVIDE (author)
PRESTI GIORGIO (author)
Link to full record:
https://air.unimi.it/handle/2434/1190266
Book title:
2025 Immersive and 3D Audio: from Architecture to Automotive (I3DA)
Project:
Transforming auditory-based social interaction and communication in AR/VR (SONICOM)
Research areas

Sectors

Sector INFO-01/A - Computer Science