This is a tool for nearest-neighbor retrieval and clustering of large categorical data sets represented in transactional form.
Clustering relies on locality-sensitive hashing (LSH) of the categorical data for speed and scalability.
The LSH method implemented is described in the video lectures at www.mmds.org (Chapter 3).
Information needed for LSH, such as shingles/tokens, MinHash signatures, and the band hashes that map objects to buckets, is stored in several database tables.
Information needed for clustering, such as the most significant pairwise object similarities and density-based similarities, is also stored in tables.
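
The LSH steps named above (tokens, MinHash signatures, band hashes to buckets, tables) can be illustrated with a minimal Python sketch. It is not HIERDENC's actual code or schema: the table name band_bucket, the function names, the signature length of 20, the 5 bands, and the use of an in-memory SQLite database are all assumptions made for illustration.

```python
import hashlib
import sqlite3

NUM_HASHES = 20  # signature length (assumed value)
BANDS = 5        # number of bands (assumed value); 4 rows per band


def token_hash(token, seed):
    """Deterministic 64-bit hash of a token; the seed selects the hash function."""
    digest = hashlib.blake2b(
        token.encode("utf-8"), digest_size=8, salt=seed.to_bytes(8, "little")
    ).digest()
    return int.from_bytes(digest, "little")


def minhash_signature(tokens, num_hashes=NUM_HASHES):
    """MinHash signature: the minimum hash over the token set, per hash function."""
    return [min(token_hash(t, seed) for t in tokens) for seed in range(num_hashes)]


def band_keys(signature, bands=BANDS):
    """Split the signature into bands and hash each band to a bucket key."""
    rows = len(signature) // bands
    keys = []
    for b in range(bands):
        chunk = repr(signature[b * rows:(b + 1) * rows]).encode("utf-8")
        keys.append((b, hashlib.sha1(chunk).hexdigest()))
    return keys


def build_index(transactions):
    """Store (object, band, bucket) rows in a hypothetical band_bucket table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE band_bucket (object_id TEXT, band INTEGER, bucket TEXT)"
    )
    for object_id, tokens in transactions.items():
        signature = minhash_signature(tokens)
        conn.executemany(
            "INSERT INTO band_bucket VALUES (?, ?, ?)",
            [(object_id, band, bucket) for band, bucket in band_keys(signature)],
        )
    return conn


def candidate_pairs(conn):
    """Objects sharing a bucket in at least one band are candidate neighbors."""
    rows = conn.execute(
        "SELECT DISTINCT a.object_id, b.object_id "
        "FROM band_bucket a JOIN band_bucket b "
        "ON a.band = b.band AND a.bucket = b.bucket "
        "AND a.object_id < b.object_id"
    )
    return set(rows.fetchall())


def jaccard(a, b):
    """Exact Jaccard similarity, used to verify a candidate pair."""
    return len(a & b) / len(a | b)


# Toy transactional data: each object is a set of attribute=value tokens.
transactions = {
    "t1": {"color=red", "shape=round", "size=small"},
    "t2": {"color=red", "shape=round", "size=small"},
    "t3": {"color=blue", "shape=square", "size=large"},
}
pairs = candidate_pairs(build_index(transactions))
```

In this toy data set, t1 and t2 have identical token sets, so their signatures match in every band and the self-join returns them as a candidate pair. Only candidate pairs then need an exact similarity check (here, jaccard), which is what keeps nearest-neighbor retrieval from comparing every pair of objects.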

An early version of this database-backed approach to fast nearest-neighbor retrieval and clustering in large categorical data sets was published in:
Bill Andreopoulos, Aijun An, Xiaogang Wang, Dirk Labudde. Efficient Layered Density-based Clustering of Categorical Data. Elsevier Journal of Biomedical Informatics, 2009.

Additional Project Details

Registered: 2016-11-16