Expertise & Skills

Leveraging RAG for Privacy Violation Detection and Explainability

Contribution in conference proceedings
Publication date:
2025
Citation:
Leveraging RAG for Privacy Violation Detection and Explainability / S. Locci, D. Audrito, G. Livraga, M. Viviani, L. Di Caro. - In: IJCNN2025. - [s.l.] : Institute of Electrical and Electronics Engineers (IEEE), 2025 Nov. - ISBN 979-8-3315-1042-8. (International Joint Conference on Neural Networks: June 30 - July 5, 2025, Rome.) DOI: 10.1109/IJCNN64981.2025.11228403.
Abstract:
In today’s digital landscape, users frequently share vast amounts of information, including confidential data, often without full awareness of the associated privacy risks. This scenario highlights the need for automated methods to identify sensitive information and alert users to such risks. Existing algorithmic solutions for detecting sensitive content typically require either human intervention (rule-based approaches) or labeled data (supervised learning), both of which can be costly and limiting. In this paper, we propose a framework based on Retrieval-Augmented Generation (RAG) to classify privacy-sensitive content while providing contextual explanations. We employ the state-of-the-art generative Large Language Model (LLM) GPT-4o together with the Information Retrieval models BM25 and FAISS, enhancing both detection accuracy and explainability. Our method uses a curated Knowledge Base of scientific literature on privacy and confidentiality to retrieve contextually relevant information, which is then used to guide the classification process and generate explanations. Experimental evaluations on a real-world dataset (Enron Email Dataset) demonstrate that RAG-based approaches significantly outperform the zero-shot baseline, with BM25 showing the highest performance. This tool is designed to serve end-users by mitigating risks before data sharing and by enabling proactive monitoring of privacy violations.
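The retrieve-then-classify flow described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes a standard Okapi BM25 scoring variant with common default parameters (`k1=1.5`, `b=0.75`), a three-passage placeholder knowledge base invented for illustration, and a hypothetical prompt wording. The call to the LLM itself is left out, since the record does not describe how the model is invoked.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Lowercase and keep alphanumeric runs only (a simplifying assumption).
    return re.findall(r"[a-z0-9]+", text.lower())


class BM25:
    """Plain Okapi BM25 over a small in-memory list of passages."""

    def __init__(self, passages, k1=1.5, b=0.75):
        self.passages = passages
        self.docs = [tokenize(p) for p in passages]
        self.k1, self.b = k1, b
        self.avgdl = sum(len(d) for d in self.docs) / len(self.docs)
        n = len(self.docs)
        # Document frequency: number of passages containing each term.
        df = Counter(term for d in self.docs for term in set(d))
        self.idf = {t: math.log(1 + (n - f + 0.5) / (f + 0.5)) for t, f in df.items()}

    def score(self, query, idx):
        doc = self.docs[idx]
        tf = Counter(doc)
        norm = self.k1 * (1 - self.b + self.b * len(doc) / self.avgdl)
        total = 0.0
        for term in tokenize(query):
            if term in tf:
                total += self.idf[term] * tf[term] * (self.k1 + 1) / (tf[term] + norm)
        return total

    def top_k(self, query, k=2):
        # Indices of the k highest-scoring passages, best first.
        order = sorted(range(len(self.docs)), key=lambda i: self.score(query, i), reverse=True)
        return order[:k]


def build_prompt(message, passages):
    # Hypothetical prompt template: retrieved passages are given as context,
    # and the model is asked for a label plus an explanation.
    context = "\n".join(f"[{i + 1}] {p}" for i, p in enumerate(passages))
    return (
        "Context from the privacy knowledge base:\n"
        f"{context}\n\n"
        f"Message:\n{message}\n\n"
        "Is the message privacy-sensitive? Answer SENSITIVE or NOT_SENSITIVE "
        "and explain the decision with reference to the context."
    )


# Placeholder knowledge base (invented passages, not from the paper).
KB = [
    "Health data such as a medical diagnosis or treatment records is commonly treated as sensitive personal information.",
    "Financial identifiers, including bank account and credit card numbers, can enable fraud when disclosed.",
    "Meeting schedules and public announcements are generally not considered confidential.",
]
retriever = BM25(KB)

if __name__ == "__main__":
    email = "Attached are the medical records with her diagnosis."
    hits = retriever.top_k(email, k=2)
    print(build_prompt(email, [KB[i] for i in hits]))
```

In a full pipeline the resulting prompt would be sent to the generative model, and a dense retriever (for example a FAISS index over passage embeddings) could replace the `BM25` class without changing `build_prompt`.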
IRIS type:
03 - Contribution in volume
Keywords:
Privacy; Retrieval-Augmented Generation (RAG); Large Language Models (LLMs); Information Retrieval (IR); Knowledge Bases (KBs)
Author list:
S. Locci, D. Audrito, G. Livraga, M. Viviani, L. Di Caro
University authors:
LIVRAGA GIOVANNI (author)
Link to the full record:
https://air.unimi.it/handle/2434/1224157
Book title:
IJCNN2025
Project:
Green responsibLe privACy preservIng dAta operaTIONs
Research areas

Sectors

Sector INFO-01/A - Informatica (Computer Science)