PSR Project (2025) Line 8 - Sub-measure A - Dr. Matteo PAPINI - Reinforcement Learning in Large Action Spaces
Reinforcement Learning (RL) is the branch of machine learning that deals with decision and control problems. It is a promising approach to some of the biggest challenges of artificial intelligence, such as agentic AI and robotics.
A key feature of RL algorithms is efficient exploration: learning agents must continually experiment with diverse behaviors to quickly find the best strategy for the given task rather than settling for suboptimal solutions. However, most interesting applications are characterized by exceptionally large or continuous action spaces (e.g., thousands of tokens in large language models, continuous control variables in robotics). This abundance of available options makes efficient exploration particularly challenging. Existing theories and algorithmic solutions are mostly designed for small action spaces and are therefore ill-equipped for the challenge.
The purpose of this project is to investigate the fundamental aspects of exploration in the large-action-space regime. The methodology will comprise algorithmic design, theoretical analysis, and preliminary medium-scale experiments intended to test the feasibility of the proposed solutions. These solutions should be general and application-agnostic, but will be tested on representative examples of decision and control problems with large action spaces (e.g., fine-tuning of moderately sized LLMs and training of simulated robots).
The budget will be allocated to computational resources for the numerical experiments (including a graphics card for parallel processing) and to travel to top-tier machine learning conferences (e.g., ICML, NeurIPS) to present findings and engage with the research community.
This is intended as a seed project to better understand the problem and explore possible solutions, and hence as a starting point for subsequent applications for larger competitive research grants.