Causal Mediation Analysis for Interpreting Large Language Models

Conference Paper

Publication Date:

2024

Citation:

Causal Mediation Analysis for Interpreting Large Language Models / E. Rocchetti, A. Ferrara (CEUR WORKSHOP PROCEEDINGS). - In: SEBD 2024 : Symposium on Advanced Database Systems 2024 / [edited by] M. Atzori, P. Ciaccia, M. Ceci, F. Mandreoli, D. Malerba, M. Sanguinetti, A. Pellicani, F. Motta. - [s.l] : CEUR-WS, 2024. - pp. 585-594 (conference: SEBD 2024 Symposium on Advanced Database Systems 2024, held in Villasimius in 2024).

Abstract:

Being able to understand the inner workings of Large Language Models (LLMs) is crucial for ensuring
safer development practices and fostering trust in their predictions, particularly in sensitive applications.
Causal Mediation Analysis (CMA) is a causality framework well suited to this scenario: it provides
a mechanistic interpretation of the behaviour of LLM components and assesses whether a specific type of
knowledge is present in the model (e.g. gender bias). This study discusses the challenges and potential
pathways in applying CMA to open LLMs’ black boxes. Through three exemplary case studies from the
literature, we show the unique insights CMA can provide. We elaborate on the inherent challenges and
opportunities this approach presents. These challenges range from the influence of model architecture
on prompt viability to the complexities of ensuring metric comparability across studies. Conversely, the
opportunities lie in the dissection of LLMs’ knowledge through the extraction of the specific domains
of knowledge activated during processing. Our discussion aims to provide a comprehensive insight
into CMA, focusing on essential aspects to equip researchers with the knowledge necessary for crafting
effective CMA experiments tailored towards interpretability objectives.

IRIS type:

03 - Contribution in volume

Keywords:

LLM; interpretability; causality; causal mediation analysis

List of contributors: