
Index Clustering: A Map-reduce Clustering Approach using Numba


Authors: Xinyu Chen and Trilce Estrada

Affiliation: University of New Mexico, United States

Keyword(s): Scalable Clustering, Privacy Preserving, Big Data.

Related Ontology Subjects/Areas/Topics: Artificial Intelligence; Big Data; Business Analytics; Data Analytics; Data Engineering; Data Management and Quality; Data Management for Analytics; Data Structures and Data Management Algorithms; Knowledge Discovery and Information Retrieval; Knowledge-Based Systems; Symbolic Systems

Abstract: Clustering high-dimensional data is a crucial step in many applications. However, the so-called "curse of dimensionality" is a challenge for most clustering algorithms: in high-dimensional spaces, distances between points become less meaningful and the space becomes sparse. Because of this sparsity, more data points are needed to characterize similarities, so more distance comparisons must be computed. Many approaches have been proposed to reduce dimensionality, such as subspace clustering, random projection clustering, and feature selection techniques. However, such approaches become infeasible when data is geographically distributed or cannot be openly shared across sites. To address these location and privacy issues, and to mitigate expensive distance computations, we propose an index-based clustering algorithm that generates a spatial "key" for each data point across all dimensions without requiring explicit knowledge of the other data points. It then performs a conceptual Map-Reduce procedure in the index space to form a final cluster assignment. Our results show that this algorithm is linear and can be parallelized and executed independently across points and dimensions. We present a Numba implementation and a preliminary study of the algorithm's capabilities and limitations.
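The abstract does not say how keys are built or how the reduce step combines them, so the following is only a minimal sketch of the general idea under loudly stated assumptions. It is not the authors' Numba implementation. It assumes every site agrees in advance on per-dimension bounds (`lows`, `highs`) and a bin count (`bins`). Under that assumption, each point's key is a tuple of uniform-grid cell indices computed from that point's own coordinates alone. A map step emits `(key, point_id)` pairs, and a reduce step groups points by key and treats each occupied cell as one cluster. All function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Key = Tuple[int, ...]


def point_key(point: Sequence[float], lows: Sequence[float],
              highs: Sequence[float], bins: int) -> Key:
    """Hypothetical key: uniform-grid cell index per dimension.

    Uses only this point's coordinates plus shared bounds (an assumption),
    so it needs no knowledge of any other data point.
    """
    key = []
    for x, lo, hi in zip(point, lows, highs):
        width = (hi - lo) / bins
        idx = int((x - lo) // width) if width > 0 else 0
        key.append(min(max(idx, 0), bins - 1))  # clamp out-of-range values
    return tuple(key)


def map_phase(points, lows, highs, bins) -> List[Tuple[Key, int]]:
    # Map: each point independently emits (key, point_id).
    return [(point_key(p, lows, highs, bins), i) for i, p in enumerate(points)]


def reduce_phase(pairs: List[Tuple[Key, int]]) -> List[int]:
    # Shuffle/reduce: group point ids that share the same key.
    groups: Dict[Key, List[int]] = defaultdict(list)
    for key, pid in pairs:
        groups[key].append(pid)
    # One label per occupied cell, in sorted key order for determinism.
    labels: Dict[int, int] = {}
    for label, key in enumerate(sorted(groups)):
        for pid in groups[key]:
            labels[pid] = label
    return [labels[i] for i in range(len(pairs))]


def index_cluster(points, lows, highs, bins) -> List[int]:
    return reduce_phase(map_phase(points, lows, highs, bins))


if __name__ == "__main__":
    pts = [(0.1, 0.1), (0.15, 0.12), (0.9, 0.85), (0.95, 0.9), (0.1, 0.9)]
    print(index_cluster(pts, (0.0, 0.0), (1.0, 1.0), 4))  # [0, 0, 2, 2, 1]
```

Each key is computed per point, with one loop over dimensions, so the map step is a single pass and can be split across points or dimensions. This matches the independence the abstract describes. A real method would likely also need to merge neighboring cells, which this sketch deliberately omits.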

CC BY-NC-ND 4.0


Paper citation in several formats:
Chen, X. and Estrada, T. (2017). Index Clustering: A Map-reduce Clustering Approach using Numba. In Proceedings of the 6th International Conference on Data Science, Technology and Applications - DATA; ISBN 978-989-758-255-4; ISSN 2184-285X, SciTePress, pages 233-240. DOI: 10.5220/0006437402330240

@conference{data17,
author={Xinyu Chen and Trilce Estrada},
title={Index Clustering: A Map-reduce Clustering Approach using Numba},
booktitle={Proceedings of the 6th International Conference on Data Science, Technology and Applications - DATA},
year={2017},
pages={233-240},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0006437402330240},
isbn={978-989-758-255-4},
issn={2184-285X},
}

TY  - CONF
JO  - Proceedings of the 6th International Conference on Data Science, Technology and Applications - DATA
TI  - Index Clustering: A Map-reduce Clustering Approach using Numba
SN  - 978-989-758-255-4
IS  - 2184-285X
AU  - Chen, X.
AU  - Estrada, T.
PY  - 2017
SP  - 233
EP  - 240
DO  - 10.5220/0006437402330240
PB  - SciTePress
ER  - 