Publication date:
2025
Citation:
Maximizing data quality while ensuring data protection in service-based data pipelines / A. Polimeno, C. Braghin, M. Anisetti, C.A. Ardagna. - In: JOURNAL OF BIG DATA. - ISSN 2196-1115. - 12:1(2025 Dec), pp. 62.1-62.34. [10.1186/s40537-025-01118-5]
Abstract:
The growing capacity to handle vast amounts of data, combined with a shift in service delivery models, has improved scalability and efficiency in data analytics, particularly in multi-tenant environments. Data are treated as digital products and processed through orchestrated service-based data pipelines. However, advancements in data analytics do not find a counterpart in data governance techniques, leaving a gap in the effective management of data throughout the pipeline lifecycle. This gap highlights the need for innovative service-based data pipeline management solutions that prioritize balancing data quality and data protection. The framework proposed in this paper optimizes service selection and composition within service-based data pipelines to maximize data quality while ensuring compliance with data protection requirements, expressed as access control policies. Given the NP-hard nature of the problem, a sliding-window heuristic is defined and evaluated against the exhaustive approach and a baseline modeling the state of the art. Our results demonstrate a significant reduction in computational overhead, while maintaining high data quality.
IRIS type:
01 - Journal article
Keywords:
Access control; Big data; Data protection; Data quality; Privacy; Service-based data pipelines
Author list:
A. Polimeno, C. Braghin, M. Anisetti, C.A. Ardagna
Link to full record: