Abstract:
Clustering is the organization of patterns, which are usually represented as vec- tors in multidimensional spaces, into a given number of groups with similar character- istics. It can be formulated as a mathematical optimization model whose objective is to locate cluster representatives so that the sum of dissimilarities with the representa- tive and given data vectors is minimized. In this work, we first formulate clustering problem for the minimization of the total expected distances between cluster repre- sentatives and data vectors, assuming that dissimilarities are measured using a metric separable with respect to the coordinates. We then propose a Lagrangian relaxation scheme for the solution of the resulting mixed integer linear programming problem. Our second formulation is for any distance metric and is analogous to the formulation of the multi-facilityWeber problem in many dimensions. The resulting nonconvex opti- mization problem is solved using column generation and d.c. programming techniques. According to our computational experiments we say that although the first one has low accuracy, the second one overperforms k-means algorithm.