Fully-Scalable MPC Algorithms for Clustering in High Dimension

The result's identifiers

Result code in IS VaVaI
RIV/00216208:11320/24:10493446 (https://www.isvavai.cz/riv?ss=detail&h=RIV%2F00216208%3A11320%2F24%3A10493446)
Result on the web
https://doi.org/10.4230/LIPIcs.ICALP.2024.50
DOI - Digital Object Identifier
10.4230/LIPIcs.ICALP.2024.50

Alternative languages

Result language
English
Original language name
Fully-Scalable MPC Algorithms for Clustering in High Dimension
Original language description
We design new parallel algorithms for clustering in high-dimensional Euclidean spaces. These algorithms run in the Massively Parallel Computation (MPC) model and are fully scalable, meaning that the local memory in each machine may be n^σ for arbitrarily small fixed σ > 0. Importantly, the local memory may be substantially smaller than the number of clusters k, yet all our algorithms are fast, i.e., they run in O(1) rounds.
We first devise a fast MPC algorithm for O(1)-approximation of uniform Facility Location. This is the first fully-scalable MPC algorithm that achieves O(1)-approximation for any clustering problem in a general geometric setting; previous algorithms only provide poly(log n)-approximation or apply to restricted inputs, like low dimension or a small number of clusters k; e.g. [Bhaskara and Wijewardena, ICML'18; Cohen-Addad et al., NeurIPS'21; Cohen-Addad et al., ICML'22]. We then build on this Facility Location result and devise a fast MPC algorithm that achieves O(1)-bicriteria approximation for k-Median and for k-Means, namely, it computes (1+ε)k clusters of cost within an O(1/ε^2)-factor of the optimum for k clusters.
A primary technical tool that we introduce, and that may be of independent interest, is a new MPC primitive for geometric aggregation, namely, computing for every data point a statistic of its approximate neighborhood, for statistics like range counting and nearest-neighbor search. Our implementation of this primitive works in high dimension and is based on consistent hashing (aka sparse partition), a technique that was recently used for streaming algorithms [Czumaj et al., FOCS'22].
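The geometric aggregation primitive described above (bucket points by a hash, aggregate a statistic per bucket, hand each point its bucket's statistic) can be illustrated, very loosely, with the following Python sketch. It is not the authors' construction: as an assumption, it replaces consistent hashing with a randomly shifted grid, which scales poorly with dimension because a grid cell of side ℓ in R^d has diameter ℓ√d, and it simulates the MPC shuffle (in practice a sort/group-by) sequentially with a dictionary. The functions `shifted_grid_hash` and `approx_neighborhood_counts` and the parameter `side` are hypothetical names chosen for illustration.

```python
import random
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, ...]
BucketId = Tuple[int, ...]


def shifted_grid_hash(point: Sequence[float], side: float,
                      shift: Sequence[float]) -> BucketId:
    """Map a point to the integer coordinates of its cell in a shifted grid.

    Hypothetical stand-in for consistent hashing: cell coordinate i is
    floor((x_i + s_i) / side). Unlike consistent hashing, a cell's diameter
    grows as side * sqrt(d), so this degrades in high dimension.
    """
    return tuple(int((x + s) // side) for x, s in zip(point, shift))


def approx_neighborhood_counts(points: List[Point], side: float,
                               seed: int = 0) -> List[int]:
    """For each point, count the points that share its bucket.

    This is a crude approximate range count: points in the same bucket are
    "neighbors", points in different buckets are not.
    """
    rng = random.Random(seed)
    dim = len(points[0])
    # One random shift per coordinate, shared by all points.
    shift = [rng.uniform(0.0, side) for _ in range(dim)]

    # "Map" step: every point computes its bucket id.
    keys = [shifted_grid_hash(p, side, shift) for p in points]

    # "Shuffle + reduce" step: aggregate a statistic (here, size) per bucket.
    bucket_size: Dict[BucketId, int] = defaultdict(int)
    for key in keys:
        bucket_size[key] += 1

    # "Broadcast back" step: each point reads its own bucket's statistic.
    return [bucket_size[key] for key in keys]
```

For example, `approx_neighborhood_counts([(0.0, 0.0), (0.0, 0.0), (0.0, 0.0), (10.0, 10.0)], side=1.0)` returns `[3, 3, 3, 1]` for every seed: identical points always share a cell, and points whose coordinates differ by at least `side` never do. Other statistics (a bucket representative as a crude nearest-neighbor answer, a sum of weights) fit the same map/reduce/broadcast pattern by changing only the per-bucket aggregation.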
Czech name
—
Czech description
—

Classification

Type
D - Article in proceedings
CEP classification
—
OECD FORD branch
10201 - Computer sciences, information science, bioinformatics (hardware development to be 2.2, social aspect to be 5.8)

Result continuities

Project
GA22-22997S: Efficient and Realistic Models in Computational Social Choice
Continuities
P - Research and development project financed from public sources (with a link to CEP)

Others

Publication year
2024
Confidentiality
S - Complete and truthful data about the project are not subject to protection under special legal regulations

Data specific for result type

Article name in the collection
Leibniz International Proceedings in Informatics, LIPIcs
ISBN
978-3-95977-322-5
ISSN
1868-8969
e-ISSN
—
Number of pages
20
Pages from-to
1-20
Publisher name
Schloss Dagstuhl - Leibniz-Zentrum für Informatik GmbH, Dagstuhl Publishing
Place of publication
Dagstuhl, Germany
Event location
Tallinn, Estonia
Event date
Jul 8, 2024
Type of event by nationality
WRD - Worldwide event
UT code for WoS article
—

Similar results (10)

The Euclidean k-Supplier problem in R^2
2D Bitwise Memory Matrix: A Tool for Optimal Parallel Approximate Pattern Matching
Polynomial time approximation schemes for clustering in low highway dimension graphs
