Mathematical programming for simultaneous feature selection and outlier detection under l1 norm

Article
Publication date:
2024
Citation:
Mathematical programming for simultaneous feature selection and outlier detection under l1 norm / M. Barbato, A. Ceselli. - In: EUROPEAN JOURNAL OF OPERATIONAL RESEARCH. - ISSN 0377-2217. - 316:3(2024 Aug 01), pp. 1070-1084. [10.1016/j.ejor.2024.03.035]
Abstract:
The goal of simultaneous feature selection and outlier detection is to determine a sparse linear regression vector by fitting a dataset possibly affected by the presence of outliers. The problem is well-known in the literature. In its basic version it covers a wide range of tasks in data analysis. Simultaneously performing feature selection and outlier detection strongly improves the application potential of regression models in more general settings, where data governance is a concern. To trigger this potential, flexible training models are needed, with more parameters under control of decision makers. The use of mathematical programming, although pertinent, is scarce in this context and mostly focusing on the least-squares setting. Instead we consider the least absolute deviation criterion, proposing two mixed-integer linear programs, one adapted from existing studies, and the other obtained from a disjunctive programming argument. We show theoretically and computationally that the disjunctive-based formulation is better in terms of both continuous relaxation quality and integer optimality convergence. We experimentally benchmark against existing methodologies from the literature. We identify the characteristics of contamination patterns, in which mathematical programming is better than state-of-the-art algorithms in combining prediction quality, sparsity and robustness against outliers. Additionally, the mathematical programming approaches allow the decision maker to directly control parameters like the number of features or outliers to tolerate, those based on least absolute deviations performing best. On real world datasets, where privacy is a concern, our approach compares well to state-of-the-art methods in terms of accuracy, being at the same time more flexible.
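As a rough illustration of the kind of model the abstract refers to, the following is a generic big-M mixed-integer linear program for least absolute deviation regression with a cap on selected features and on discarded outliers. It is a sketch under stated assumptions, not necessarily either of the two formulations proposed in the article. All symbols are assumptions introduced here: data points $(x_i, y_i)$ for $i = 1,\dots,n$ with $x_i \in \mathbb{R}^p$, regression vector $\beta$, residual variables $e_i$, feature indicators $z_j$, outlier indicators $s_i$, budgets $k_f$ and $k_o$, and constants $M$ and $B$ assumed large enough to be valid bounds.

```latex
\begin{align*}
\min_{\beta, e, z, s} \quad & \sum_{i=1}^{n} e_i \\
\text{s.t.} \quad & e_i \ge y_i - x_i^{\top}\beta - M s_i, && i = 1,\dots,n, \\
& e_i \ge x_i^{\top}\beta - y_i - M s_i, && i = 1,\dots,n, \\
& -B z_j \le \beta_j \le B z_j, && j = 1,\dots,p, \\
& \sum_{j=1}^{p} z_j \le k_f, \qquad \sum_{i=1}^{n} s_i \le k_o, \\
& e_i \ge 0, \quad z_j \in \{0,1\}, \quad s_i \in \{0,1\}.
\end{align*}
```

In this sketch, setting $s_i = 1$ relaxes both residual constraints for point $i$, so that point contributes nothing to the objective and is treated as an outlier, while $z_j = 0$ forces $\beta_j = 0$. The budgets $k_f$ and $k_o$ correspond to the parameters the abstract describes as being under the decision maker's direct control.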
IRIS type:
01 - Journal article
Keywords:
Data science; Outlier detection; Feature selection; Least absolute deviation; Mathematical programming
Author list:
M. Barbato, A. Ceselli
University authors:
BARBATO MICHELE (author)
CESELLI ALBERTO (author)
Link to full record:
https://air.unimi.it/handle/2434/1047989
Link to full text:
https://air.unimi.it/retrieve/handle/2434/1047989/2405675/16_articolo_math_prog_for_sfsod_EJOR_Michele_Barbato.pdf
Project:
SEcurity and RIghts in the CyberSpace (SERICS)
Research areas

Sectors (4)

Sector INF/01 - Computer Science

Sector MAT/09 - Operations Research

Sector INFO-01/A - Computer Science

Sector MATH-06/A - Operations Research