AudioDB @ Acoustical Society of America
This talk describes new approximate nearest-neighbor methods employed in a scalable audio-feature database system called “AudioDB.” This open-source system is designed to scale to storing and searching hundreds of millions of feature vectors on standard UNIX workstation platforms. A radius-bounded nearest-neighbor vector-sequence search algorithm, based on locality-sensitive hashing (LSH), achieves sublinear retrieval times at this scale. The performance of the LSH-based algorithm depends critically on the radius bound supplied: a poorly chosen value degrades either retrieval accuracy or retrieval time. An optimal radius estimator is derived by modeling the minimum-value distribution of a random sample of a data set’s pairwise distance distribution.
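
The idea of choosing a radius from the minima of sampled pairwise distances can be illustrated with a minimal Python sketch. It is not AudioDB's estimator or API: the function names, the Euclidean metric, the use of a quantile of the empirical minima as the radius, and the synthetic Gaussian data are all assumptions made for illustration, and a brute-force scan stands in for the LSH index.

```python
import math
import random


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def sample_minimum_distances(vectors, sample_size, trials, rng):
    """For each trial, draw `sample_size` random pairs and keep the smallest distance.

    The returned list is an empirical sample of the minimum-value
    distribution of the pairwise distances (hypothetical helper).
    """
    minima = []
    for _ in range(trials):
        best = math.inf
        for _ in range(sample_size):
            # Two distinct indices, so a vector is never paired with itself.
            i, j = rng.sample(range(len(vectors)), 2)
            best = min(best, euclidean(vectors[i], vectors[j]))
        minima.append(best)
    return minima


def estimate_radius(minima, quantile=0.5):
    """Pick a radius as a quantile of the sampled minima (an assumed heuristic)."""
    ordered = sorted(minima)
    index = min(len(ordered) - 1, int(quantile * len(ordered)))
    return ordered[index]


def radius_search(query, vectors, radius):
    """Brute-force radius-bounded search; a stand-in for the LSH lookup."""
    return [i for i, v in enumerate(vectors) if euclidean(query, v) <= radius]


# Synthetic stand-in for audio feature vectors (assumption: 200 vectors, 8 dims).
rng = random.Random(0)
vectors = [[rng.gauss(0, 1) for _ in range(8)] for _ in range(200)]

minima = sample_minimum_distances(vectors, sample_size=50, trials=40, rng=rng)
radius = estimate_radius(minima)
hits = radius_search(vectors[0], vectors, radius)
```

In this sketch, raising `quantile` yields a larger radius and therefore more candidates per query, while lowering it yields fewer; this mirrors the trade-off between retrieval accuracy and retrieval time that the radius bound controls.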




