Synthetic Data for Identifying Inclusive Language (Case Study: Job Descriptions in Italian)

Conference proceedings contribution
Publication date:
2024
Citation:
Synthetic Data for Identifying Inclusive Language (Case Study: Job Descriptions in Italian) / T. Romano, F. Mohammadi, P. Ceravolo (PROCEEDINGS OF THE ... IEEE INTERNATIONAL CONFERENCE ON CYBER SECURITY AND RESILIENCE (CSR)). - In: Proceedings of the 2024 IEEE International Conference on Cyber Security and Resilience (CSR) / edited by S. Shiaeles, N. Kolokotronis, E. Bellini. - [s.l] : IEEE, 2024 Sep. - ISBN 979-8-3503-7536-7. - pp. 737-742 (conference: IEEE International Conference on Cyber Security and Resilience, CSR, held in London in 2024) [10.1109/csr61664.2024.10679398].
Abstract:
Using a comprehensive list of job titles, we propose a framework to automatically generate job descriptions in Italian. This synthetic data is then used in a Large Language Model to detect inclusive language in job postings. Finally, we compare the results of this synthetic dataset with real data. Our study demonstrates that the data format and prompting method significantly impact performance. Additionally, we identify limitations and key considerations for unifying synthetic data with real data for fine-tuning purposes. We also propose improvements to the framework and provide guidelines for effectively integrating these two types of data. The novelty of our work is generating and integrating synthetic data due to the scarcity of annotated Italian job descriptions, thereby improving the training of Large Language Models (LLMs) tailored specifically for Italian.
IRIS type:
03 - Contribution in volume
Author list:
T. Romano, F. Mohammadi, P. Ceravolo
University authors:
CERAVOLO PAOLO (author)
MOHAMMADI FATEMEH (author)
Link to full record:
https://air.unimi.it/handle/2434/1119051
Book title:
Proceedings of the 2024 IEEE International Conference on Cyber Security and Resilience (CSR)
Project:
MUSA - Multilayered Urban Sustainability Action
Research areas

Sectors

Sector INFO-01/A - Computer Science