
Expertise & Skills

PSR Project (2025) Line 8 - Sub-measure A - Dr. Matteo PAPINI - Reinforcement Learning in Large Action Spaces

Project
Reinforcement Learning (RL) is the branch of machine learning that deals with decision and control problems. It is a promising approach to some of the biggest challenges of artificial intelligence, such as agentic AI and robotics. A key feature of RL algorithms is efficient exploration: learning agents must continually experiment with diverse behaviors to quickly find the best strategy for the given task rather than settling for suboptimal solutions. However, most interesting applications are characterized by exceptionally large or continuous action spaces (e.g., thousands of tokens in large language models, continuous control variables in robotics). This abundance of available options makes efficient exploration particularly challenging. Existing theories and algorithmic solutions are mostly designed for small action spaces and are therefore ill-equipped for the challenge. The purpose of this project is to investigate the fundamental aspects of exploration in the large-action-space regime. The methodology will comprise algorithmic design, theoretical analysis, and preliminary medium-scale experiments intended to test the feasibility of the proposed solutions. These solutions should be general and application-agnostic, but they will be tested on representative examples of decision and control problems with large action spaces (e.g., fine-tuning of moderately sized LLMs and training of simulated robots). The budget will be allocated to computational resources for the numerical experiments (including a graphics card for parallel processing) and to travel to top-tier machine learning conferences (e.g., ICML, NeurIPS) to present findings and engage with the research community. This is intended as a seed project to better understand the problem and explore possible solutions, and hence as a starting point for subsequent applications for larger competitive research grants.

General Information

Participants (2)

BRUSCHI DANILO MAURO   Scientific lead
PAPINI MATTEO   Scientific lead

Departments involved

Dipartimento di Informatica Giovanni Degli Antoni   Lead department

Type

PSR_LINEA8A_/ Research Development Plan - Early Career Development - Line 8 - Sub-measure A - RTT endowment (Dote RTT)

Funder

UNIVERSITA' DEGLI STUDI DI MILANO
External Organization, Funding Body

Activity period

January 26, 2026 - January 25, 2028

Project duration

24 months

Research Areas

Sectors

Sector INFO-01/A - Computer Science

Built with VIVO | Designed by Cineca | 26.2.4.0