Data di Pubblicazione:
2014
Citazione:
NOVEL TECHNIQUES FOR INTRINSIC DIMENSION ESTIMATION / C. Ceruti ; relatore: P. Campadelli ; correlatore: E. Casiraghi ; coordinatore: G. Naldi. DIPARTIMENTO DI INFORMATICA, 2014 Dec 16. 27. ciclo, Anno Accademico 2014. [10.13130/ceruti-claudio_phd2014-12-16].
Abstract:
Since the 1950s, the rapid pace of technological advances allows to measure
and record increasing amounts of data, motivating the urgent need to develop
dimensionality reduction systems to be applied on datasets comprising high-
dimensional points.
To this aim, a fundamental information is provided by the intrinsic di-
mension (id) defined by Bennet [1] as the minimum number of parameters
needed to generate a data description by maintaining the “intrinsic” structure
characterizing the dataset, so that the information loss is minimized.
More recently, a quite intuitive definition employed by several authors in the
past has been reported by Bishop in [2] where the author writes that “a set in
D dimensions is said to have an id equal to d if the data lies entirely within a
d-dimensional subspace of D ”.
Though more specific and different id definitions have been proposed in dif-
ferent research fieldsthroughout the pattern recognition literature the presently
prevailing id definition views a point set as a sample set uniformly drawn from
an unknown smooth (or locally smooth) manifold structure, eventually embed-
ded in an higher dimensional space through a non-linear smooth mapping; in
this case, the id to be estimated is the manifold’s topological dimension.
Due to the importance of id in several theoretical and practical application
fields, in the last two decades a great deal of research effort has been devoted
to the development of effective id estimators. Though several techniques have
been proposed in literature, the problem is still open for the following main
reasons.
1At first, it must be highlighted that though Lebesgue’s definition of topo-
logical dimension (reported by [5]) is quite clear, in practice its estimation is
difficult if only a finite set of points is available. Therefore, id estimation tech-
niques proposed in literature are either founded on different notions of dimen-
sion (e.g. fractal dimensions) approximating the topological one, or on various
techniques aimed at preserving the characteristics of data-neighborhood distri-
butions, which reflect the topology of the underlying manifold. Besides, the
estimated id value markedly changes as the scale used to analyze the input
dataset changes, and being the number of available points practically limited,
several methods underestimate id when its value is sufficiently high (namely id
10). Other serious problems arise when the dataset is embedded in higher
dimensional spaces through a non-linear map. Finally, the too high computa-
tional complexity of most estimators makes them unpractical when the need is
to process datasets comprising huge amounts of high-dimensional data.
The main subject of this thesis work is the development of efficient and ef-
fective id estimators. Precisely, two novel estimators, named MiND (Minimum
Neighbor Distance estimators of intrinsic dimension, [6]) and DANCo (Dimension-
ality from Angle and Norm Concentration, [4]) are described. The aforemen-
tioned techniques are based on the exploitation of statistics characterizing the
hidden structure of high dimensional spaces, such as the distribution of norms
and angles, which are informative of the id and could therefore be exploited for
its estimation. A simple practical example to show the informatory power of
these features, is the clustering system proposed in [3]; based on the assumption
that each class is represented by one manifold, the clustering procedure codes
the input data by means of local id estimates and features related to them. This
coding allows to obtain reliable results by applying classic and basic clustering
algorithms.
To evaluate the proposed estimators by objectively comparing them with
relevant stat
Tipologia IRIS:
Tesi di dottorato
Elenco autori:
C. Ceruti
Link alla scheda completa:
Link al Full Text: