Index Clustering: A Map-reduce Clustering Approach using Numba

Xinyu Chen; Trilce Estrada

Topics: Big Data; Big Data Search and Mining; Data Analytics; Data Management for Analytics; Data Science; Data Structures and Data Management Algorithms

In Proceedings of the 6th International Conference on Data Science, Technology and Applications (DATA) - Volume 1, pages 233-240, 2017, Madrid, Spain

Authors: Xinyu Chen and Trilce Estrada

Affiliation: University of New Mexico, United States

Keyword(s): Scalable Clustering, Privacy Preserving, Big Data.

Related Ontology Subjects/Areas/Topics: Artificial Intelligence; Big Data; Business Analytics; Data Analytics; Data Engineering; Data Management and Quality; Data Management for Analytics; Data Structures and Data Management Algorithms; Knowledge Discovery and Information Retrieval; Knowledge-Based Systems; Symbolic Systems

Abstract: Clustering high-dimensional data is often a crucial step in many applications. However, the so-called "curse of dimensionality" is a challenge for most clustering algorithms. In such high-dimensional spaces, distances between points tend to be less meaningful and the spaces become sparse. Such sparsity requires more data points to characterize similarities, so more distance comparisons must be computed. Many approaches have been proposed for dimensionality reduction, such as subspace clustering, random projection clustering, and feature selection techniques. However, approaches like these become infeasible in scenarios where data is geographically distributed or cannot be openly shared across sites. To deal with the location and privacy issues, as well as to mitigate the expensive distance computation, we propose an index-based clustering algorithm that generates a spatial key for each data point across all dimensions without needing explicit knowledge of the other data points. It then performs a conceptual Map-Reduce procedure in the index space to form a final clustering assignment. Our results show that this algorithm is linear and can be parallelized and executed independently across points and dimensions. We present a Numba implementation and a preliminary study of this algorithm's capabilities and limitations.
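
The two stages named in the abstract (a per-point spatial key, then a Map-Reduce pass over the index space) can be illustrated with a minimal sketch. The abstract does not say how the key is computed, so this sketch assumes a uniform grid: each coordinate is quantized to a cell index, and the tuple of cell indices serves as the key. The names `spatial_key`, `index_cluster`, `cell_width`, and `min_count` are hypothetical, and the code is plain Python rather than the Numba implementation the paper presents.

```python
from collections import defaultdict
from math import floor


def spatial_key(point, cell_width):
    """Assumed keying scheme: quantize each coordinate to a grid-cell index.

    The key depends only on the point's own coordinates, so it can be
    computed without access to any other data point.
    """
    return tuple(floor(x / cell_width) for x in point)


def index_cluster(points, cell_width=1.0, min_count=2):
    """Group points by key and label cells holding at least min_count points."""
    # Map step: every point independently emits (key, point index).
    emitted = [(spatial_key(p, cell_width), i) for i, p in enumerate(points)]

    # Reduce step: collect the point indices that share a key.
    cells = defaultdict(list)
    for key, i in emitted:
        cells[key].append(i)

    # Assignment (an assumption of this sketch): each sufficiently populated
    # cell becomes one cluster; points in sparse cells are labeled -1.
    labels = [-1] * len(points)
    next_label = 0
    for key in sorted(cells):
        members = cells[key]
        if len(members) >= min_count:
            for i in members:
                labels[i] = next_label
            next_label += 1
    return labels


points = [(0.1, 0.2), (0.4, 0.9), (5.2, 5.1), (5.8, 5.5), (9.0, 0.3)]
labels = index_cluster(points, cell_width=1.0, min_count=2)
print(labels)  # [0, 0, 1, 1, -1]
```

Each point is visited once in the map step and once in the reduce step, and no pairwise distance is computed, which is consistent with the linear behavior the abstract reports. Because only keys need to be exchanged in this sketch, raw coordinates could stay at the site that holds them.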

CC BY-NC-ND 4.0

Paper citation in several formats:

Chen, X. and Estrada, T. (2017). Index Clustering: A Map-reduce Clustering Approach using Numba. In Proceedings of the 6th International Conference on Data Science, Technology and Applications - DATA; ISBN 978-989-758-255-4; ISSN 2184-285X, SciTePress, pages 233-240. DOI: 10.5220/0006437402330240

@conference{data17,
author={Xinyu Chen and Trilce Estrada},
title={Index Clustering: A Map-reduce Clustering Approach using Numba},
booktitle={Proceedings of the 6th International Conference on Data Science, Technology and Applications - DATA},
year={2017},
pages={233-240},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0006437402330240},
isbn={978-989-758-255-4},
issn={2184-285X},
}

TY - CONF

JO - Proceedings of the 6th International Conference on Data Science, Technology and Applications - DATA
TI - Index Clustering: A Map-reduce Clustering Approach using Numba
SN - 978-989-758-255-4
IS - 2184-285X
AU - Chen, X.
AU - Estrada, T.
PY - 2017
SP - 233
EP - 240
DO - 10.5220/0006437402330240
PB - SciTePress