This code is an implementation of massively distributed clustering for multivariate and functional data (code available on GitHub).
High Dimensional Data Clustering by means of Distributed Dirichlet Process Mixture Models: Khadidja Meguelati, Bénédicte Fontez, Nadine Hilgert, Florent Masseglia. IEEE International Conference on Big Data (IEEE BigData), Dec 2019, Los Angeles, United States.
Dirichlet Process Mixture Models made Scalable and Effective by means of Massive Distribution: Khadidja Meguelati, Bénédicte Fontez, Nadine Hilgert, Florent Masseglia. SAC 2019 – 34th Symposium On Applied Computing, Apr 2019, Limassol, Cyprus, pp. 502-509.
DPM clustering is illustrated by the Chinese restaurant process.
Correspondence of terms with statistical language:

- a table = a cluster
- a client = an observation linked to a cluster label
- a dish = parameters of a cluster
- menu = space of all possible clusters

| | DC-DPM | HD4C |
|---|---|---|
| Dressed table | Likelihood | Likelihood GP |
| New table | Predictive | TD approximation of the predictive |
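
To make the restaurant metaphor concrete, here is a minimal Python sketch of the Chinese restaurant process prior alone. It is not the code of this repository: the function name `crp_assignments` and the parameters `n_clients`, `alpha` and `seed` are hypothetical and chosen for illustration. Each client sits at an occupied table with weight equal to the number of clients already seated there, or opens a new table with weight `alpha`. In a full DPM sampler these prior weights would also be multiplied by the terms listed in the table above (the likelihood for a dressed table, the predictive for a new table), which this sketch omits.

```python
import random


def crp_assignments(n_clients, alpha, seed=0):
    """Seat n_clients one by one following a Chinese restaurant process prior.

    alpha (> 0) is the concentration parameter: larger values open more tables.
    Returns the table index of each client and the final table sizes.
    """
    rng = random.Random(seed)
    table_sizes = []  # table_sizes[k] = number of clients at table k
    labels = []       # labels[i] = table (cluster) index of client i
    for _ in range(n_clients):
        # Occupied table k has weight n_k; a new table has weight alpha.
        weights = table_sizes + [alpha]
        k = rng.choices(range(len(weights)), weights=weights)[0]
        if k == len(table_sizes):
            table_sizes.append(1)   # the client opens a new table
        else:
            table_sizes[k] += 1     # the client joins an occupied table
        labels.append(k)
    return labels, table_sizes


labels, table_sizes = crp_assignments(n_clients=100, alpha=1.0)
print(len(table_sizes), "tables with sizes", table_sizes)
```

The number of tables is not fixed in advance: it grows with the number of clients and with `alpha`, which is why the model does not need the number of clusters as an input.
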
The workflow of our DC-DPM approach consists of four steps: