Publication date:
2023
Citation:
Variational inference for semiparametric Bayesian novelty detection in large datasets / L. Benedetti, E. Boniardi, L. Chiani, J. Ghirri, M. Mastropietro, A. Cappozzo, F. Denti. - In: ADVANCES IN DATA ANALYSIS AND CLASSIFICATION. - ISSN 1862-5347. - (2023), pp. 1-23. [Epub ahead of print] [10.1007/s11634-023-00569-z]
Abstract:
After being trained on a fully-labeled training set, where the observations are grouped
into a certain number of known classes, novelty detection methods aim to classify the
instances of an unlabeled test set while allowing for the presence of previously unseen
classes. These models are valuable in many areas, ranging from social network and
food adulteration analyses to biology, where an evolving population may be present.
In this paper, we focus on a two-stage Bayesian semiparametric novelty detector, also
known as Brand, recently introduced in the literature. Leveraging a model-based
mixture representation, Brand allows clustering the test observations into known
training terms or a single novelty term. Furthermore, the novelty term is modeled with a
Dirichlet Process mixture model to flexibly capture any departure from the known
patterns. Brand was originally estimated using MCMC schemes, which are prohibitively
costly when applied to high-dimensional data. To scale Brand to large
datasets, we propose a variational Bayes approach, providing an efficient
algorithm for posterior approximation. We demonstrate a significant gain in efficiency
and excellent classification performance with thorough simulation studies. Finally, to
showcase its applicability, we perform a novelty detection analysis using the openly
available Statlog dataset, a large collection of satellite imaging spectra, to search
for novel soil types.
IRIS type:
01 - Journal article
Keywords:
Novelty detection; Dirichlet process; Variational inference; Large datasets; Nested mixtures; Bayesian modeling
Authors:
L. Benedetti, E. Boniardi, L. Chiani, J. Ghirri, M. Mastropietro, A. Cappozzo, F. Denti
Link to the full record: