Speech-Based Depression Recognition in Hikikomori Patients Undergoing Cognitive Behavioral Therapy
Article
Publication date:
2025
Citation:
Speech-Based Depression Recognition in Hikikomori Patients Undergoing Cognitive Behavioral Therapy / S.S. Leal, S. Ntalampiras, M.G. Rossetti, A. Trabacca, M. Bellani, R. Sassi. - In: APPLIED SCIENCES. - ISSN 2076-3417. - 15:21(2025 Nov), pp. 11750.1-11750.18. [10.3390/app152111750]
Abstract:
Major depressive disorder (MDD) affects approximately 4.4% of the global population. Its prevalence is increasing among adolescents and has led to the psychosocial condition known as hikikomori. MDD is typically assessed by self-report questionnaires, which, although informative, are subject to evaluator bias and subjectivity. To address these limitations, recent studies have explored machine learning (ML) for automated MDD detection. Among the input data used, speech signals stand out due to their low cost and minimal intrusiveness. However, many speech-based approaches lack integration with cognitive behavioral therapy (CBT) and adherence to evidence-based, patient-centered care, often aiming to replace rather than support clinical monitoring. In this context, we propose ML models to assess MDD in hikikomori patients using speech data from a real-world clinical trial. The trial is conducted in Italy, supervised by physicians, and comprises an eight-session CBT plan that is grounded in clinical evidence and follows patient-centered practices. Patients' speech is recorded during therapy, and Mel-Frequency Cepstral Coefficients (MFCCs) and wav2vec 2.0 embeddings are extracted to train the models. The results show that the Multi-Layer Perceptron (MLP) predicted depression outcomes with a Root Mean Squared Error (RMSE) of 0.064 using only MFCCs from the first session, suggesting that early-session speech may be valuable for outcome prediction. When considering the entire CBT treatment (i.e., all sessions), the MLP achieved an RMSE of 0.063 using MFCCs and a lower RMSE of 0.057 with wav2vec 2.0, a relative improvement of approximately 9.5%. To aid the interpretability of the treatment outcomes, a binary task was conducted, where Logistic Regression (LR) achieved 70% recall in predicting depression improvement among young adults using wav2vec 2.0.
These findings position speech as a valuable predictive tool in clinical informatics, potentially supporting clinicians in anticipating treatment response.
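As a quick check on the figures quoted in the abstract, the roughly 9.5% improvement can be reproduced as the relative reduction in RMSE between the two all-session MLP results. The sketch below is illustrative only: the function name `relative_improvement` is hypothetical and not taken from the paper, and the abstract does not state that this is exactly how the authors derived the percentage.

```python
def relative_improvement(baseline_rmse: float, new_rmse: float) -> float:
    """Return the relative RMSE reduction, in percent, of new_rmse over baseline_rmse."""
    return (baseline_rmse - new_rmse) / baseline_rmse * 100.0


# RMSE values quoted in the abstract (MLP, all CBT sessions).
mfcc_rmse = 0.063      # MFCC features
wav2vec_rmse = 0.057   # wav2vec 2.0 embeddings

improvement = relative_improvement(mfcc_rmse, wav2vec_rmse)
print(f"Relative RMSE reduction: {improvement:.1f}%")
```

Under this reading, the reduction of 0.006 is measured against the MFCC baseline of 0.063, which agrees with the approximate figure reported.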
IRIS type:
01 - Journal article
Keywords:
machine learning; speech depression recognition; wav2vec2
Author list:
S.S. Leal, S. Ntalampiras, M.G. Rossetti, A. Trabacca, M. Bellani, R. Sassi