Approximate Nearest Neighbors (ANN)
Approximate Nearest Neighbors (ANN) is a class of algorithms designed to efficiently retrieve vectors that are close to a query vector in high-dimensional space. It’s widely used in vector search when exact precision isn’t critical but speed and scalability are.
Why ANN?
- Exact Nearest Neighbor (NN) search is computationally expensive in high-dimensional spaces (curse of dimensionality).
- ANN trades perfect accuracy for massive speed improvements, enabling real-time applications like semantic search or recommendations.
Core Concepts
- Nearest Neighbor Search (NNS): Find the most similar vector to a given query vector.
- Approximate Nearest Neighbor (ANN): Return one of the closest neighbors, not necessarily the exact closest.
- Top-K Retrieval: Return the top k most similar vectors.
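The exact (non-approximate) version of top-k retrieval is just a brute-force scan: score every database vector against the query and keep the k best. A minimal numpy sketch (the function name `top_k_exact` and the random data are illustrative, not from any particular library):

```python
import numpy as np

rng = np.random.default_rng(0)
db = rng.standard_normal((1000, 64)).astype(np.float32)   # database vectors
query = rng.standard_normal(64).astype(np.float32)

def top_k_exact(db, query, k=5):
    """Exact top-k by cosine similarity: score everything, keep the k best."""
    # Cosine similarity = dot product of L2-normalized vectors.
    db_n = db / np.linalg.norm(db, axis=1, keepdims=True)
    q_n = query / np.linalg.norm(query)
    scores = db_n @ q_n
    # argpartition finds the k largest in O(n); then sort just those k.
    idx = np.argpartition(-scores, k)[:k]
    return idx[np.argsort(-scores[idx])]

print(top_k_exact(db, query))
```

This O(n·d) scan per query is exactly what becomes too slow at scale, which is what motivates the ANN algorithms below.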
Popular ANN Algorithms
- HNSW (Hierarchical Navigable Small World): Graph-based; excellent speed/accuracy trade-off.
- IVF (Inverted File Index): Clusters data into centroids and narrows search to nearby clusters.
- PQ (Product Quantization): Compresses vectors for memory-efficient indexing.
- Annoy: Tree-based (used by Spotify); good for memory-constrained use cases.
- FAISS: Meta's (formerly Facebook's) similarity-search library; supports flat (exact), IVF, PQ, and HNSW indexes.
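To make the IVF idea concrete, here is a toy, numpy-only sketch (all names such as `build_ivf` and the cluster counts are illustrative assumptions, not a real library API): cluster the database with k-means, then at query time scan only the `nprobe` lists whose centroids are closest to the query.

```python
import numpy as np

rng = np.random.default_rng(1)
db = rng.standard_normal((5000, 32)).astype(np.float32)

def build_ivf(db, n_clusters=16, iters=10):
    """Toy IVF index: k-means centroids plus one inverted list per cluster."""
    centroids = db[rng.choice(len(db), n_clusters, replace=False)].copy()
    for _ in range(iters):                    # plain Lloyd's k-means
        assign = np.argmin(((db[:, None] - centroids) ** 2).sum(-1), axis=1)
        for c in range(n_clusters):
            members = db[assign == c]
            if len(members):
                centroids[c] = members.mean(axis=0)
    # Final assignment with the final centroids, so lists match them.
    assign = np.argmin(((db[:, None] - centroids) ** 2).sum(-1), axis=1)
    lists = [np.where(assign == c)[0] for c in range(n_clusters)]
    return centroids, lists

def search_ivf(db, centroids, lists, query, k=5, nprobe=4):
    """Approximate search: scan only the nprobe lists nearest the query."""
    order = np.argsort(((centroids - query) ** 2).sum(-1))[:nprobe]
    cand = np.concatenate([lists[c] for c in order])
    d = ((db[cand] - query) ** 2).sum(-1)
    return cand[np.argsort(d)[:k]]
```

The approximation comes entirely from `nprobe`: scanning more lists raises recall and cost; scanning fewer does the opposite. Real IVF implementations (e.g. in FAISS) follow the same shape with far more engineering.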
Use Cases
- Semantic search in vector databases
- Recommendation engines
- Image and video similarity
- RAG pipelines for LLMs
- Fraud/anomaly detection
Key Trade-offs
| Factor | Exact NN | ANN |
|---|---|---|
| Accuracy | 100% | ~95–99% (configurable) |
| Speed | Slow | Fast |
| Memory Usage | Often high | Often optimized |
| Scalability | Limited | Excellent |
Example
Suppose you have 10 million embeddings and need the top 5 most similar vectors to a query within 50 ms. An exact scan must score all 10 million vectors per query; an ANN index answers in milliseconds with minimal loss in precision.
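The "minimal loss in precision" is usually quantified as recall@k: the fraction of the true top-k that the approximate search returns. A small self-contained sketch, using crude int8 scalar quantization as a stand-in for a real ANN index (an assumption for illustration, not one of the algorithms above):

```python
import numpy as np

rng = np.random.default_rng(2)
db = rng.standard_normal((2000, 64)).astype(np.float32)
queries = rng.standard_normal((20, 64)).astype(np.float32)

# Compress vectors to int8 (a toy proxy for an ANN index's approximation).
scale = np.abs(db).max()
db_q = np.round(db / scale * 127).astype(np.int8)
db_approx = db_q.astype(np.float32) * scale / 127   # dequantized copy

def top_k(mat, q, k):
    """Indices of the k nearest vectors in mat by squared L2 distance."""
    d = ((mat - q) ** 2).sum(-1)
    return set(np.argsort(d)[:k].tolist())

# recall@5: overlap between exact top-5 and approximate top-5, per query.
recalls = [len(top_k(db, q, 5) & top_k(db_approx, q, 5)) / 5 for q in queries]
mean_recall = sum(recalls) / len(recalls)
print(mean_recall)
```

The same measurement applies unchanged to any real index: run exact search once offline to get ground truth, then report the averaged overlap for the ANN configuration you deploy.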
Related Notes
- Top-K Retrieval
- Vector Databases