TY - JOUR
T1 - Fuzzy c-means clustering for uncertain data using quadratic penalty-vector regularization
AU - Endo, Yasunori
AU - Hasegawa, Yasushi
AU - Hamasuna, Yukihiro
AU - Kanzawa, Yuchi
PY - 2011/1
Y1 - 2011/1
N2 - Clustering - defined as an unsupervised data-analysis classification transforming real-space information into data in pattern space and analyzing it - may require that data be represented by a set, rather than points, due to data uncertainty, e.g., measurement error margin, data regarded as one point, or missing values. These data uncertainties have been represented as interval ranges for which many clustering algorithms are constructed, but the lack of guidelines in selecting available distances in individual cases has made selection difficult and raised the need for ways to calculate dissimilarity between uncertain data without introducing a nearest-neighbor or other distance. The tolerance concept we propose represents uncertain data as a point with a tolerance vector, not as an interval. While this is convenient for handling uncertain data, tolerance-vector constraints make mathematical development difficult. We attempt to remove the tolerance-vector constraints using quadratic penalty-vector regularization similar to the tolerance vector. We also propose clustering algorithms for uncertain data considering optimization and obtaining an optimal solution to handle uncertainty appropriately.
AB - Clustering - defined as an unsupervised data-analysis classification transforming real-space information into data in pattern space and analyzing it - may require that data be represented by a set, rather than points, due to data uncertainty, e.g., measurement error margin, data regarded as one point, or missing values. These data uncertainties have been represented as interval ranges for which many clustering algorithms are constructed, but the lack of guidelines in selecting available distances in individual cases has made selection difficult and raised the need for ways to calculate dissimilarity between uncertain data without introducing a nearest-neighbor or other distance. The tolerance concept we propose represents uncertain data as a point with a tolerance vector, not as an interval. While this is convenient for handling uncertain data, tolerance-vector constraints make mathematical development difficult. We attempt to remove the tolerance-vector constraints using quadratic penalty-vector regularization similar to the tolerance vector. We also propose clustering algorithms for uncertain data considering optimization and obtaining an optimal solution to handle uncertainty appropriately.
KW - Clustering
KW - Fuzzy c-means
KW - Optimization
KW - Penalty vector
KW - Uncertain data
UR - http://www.scopus.com/inward/record.url?scp=78751630281&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=78751630281&partnerID=8YFLogxK
U2 - 10.20965/jaciii.2011.p0076
DO - 10.20965/jaciii.2011.p0076
M3 - Article
AN - SCOPUS:78751630281
SN - 1343-0130
VL - 15
SP - 76
EP - 82
JO - Journal of Advanced Computational Intelligence and Intelligent Informatics
JF - Journal of Advanced Computational Intelligence and Intelligent Informatics
IS - 1
ER -