1. Nearest Neighbor Search
📌 Features:
* Finds the data points closest to a query vector.
* It is very expensive, especially on large data sets, because it must compute the distance between the query and every point.
* Approximate Nearest Neighbor (ANN) techniques have been developed for scalability.
🔥 Subspecies and Their Differences:
✅ K-Nearest Neighbor (k-NN):
* Finds the k points nearest to the query.
* Advantage: It gives exact results.
* Disadvantage: It is slow on large data sets.
✅ Approximate Nearest Neighbor (ANN):
* Returns much faster results by sacrificing a little bit of accuracy.
* Examples: FAISS, Annoy, and HNSW-based libraries.
* Advantage: It is fast on large data sets.
* Disadvantage: It may not always find the exact right result.
🎯 Sample Usage:
* Netflix recommendation system: Finds the movies closest to the embedding vector of the movie you are watching.
* Spotify recommendations: Brings the most similar songs with k-NN.
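The exact k-NN idea above can be sketched in a few lines: compute the distance from the query to every stored vector and keep the k smallest. The toy 2-D "embedding" database below is invented for illustration; real systems work with hundreds of dimensions.

```python
import numpy as np

def knn_search(query, vectors, k=3):
    """Exact (brute-force) k-NN: measure the distance to every stored vector."""
    dists = np.linalg.norm(vectors - query, axis=1)  # Euclidean distance to all points
    idx = np.argsort(dists)[:k]                      # indices of the k closest vectors
    return idx, dists[idx]

# Toy "embedding" database: 5 vectors in 2-D (hypothetical values)
db = np.array([[0.0, 0.0], [1.0, 1.0], [0.1, 0.2], [5.0, 5.0], [0.9, 1.1]])
idx, dists = knn_search(np.array([1.0, 1.0]), db, k=2)  # nearest two points
```

This linear scan is exactly why plain k-NN is slow at scale: the cost grows with the number of stored vectors, which is what ANN structures avoid.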
2. Dense and Sparse Vector Search
📌 Features:
* Dense Vectors → vectors produced by machine learning (ML) and deep learning (DL) models, where every dimension carries information.
* Sparse Vectors → vectors produced by classical statistical methods (such as TF-IDF and BM25), where most values are zero.
🔥 Subspecies and Their Differences:
✅ Dense Vector Search:
* Uses vectors produced with deep learning models (such as BERT, GPT).
* Advantage: It is smarter; it understands context.
* Disadvantage: Computation cost is high because it works with high-dimensional vectors.
✅ Sparse Vector Search:
* Uses keyword-based approaches from classic search engines (TF-IDF, BM25).
* Advantage: It is lighter and faster.
* Disadvantage: It matches the words, not the meaning.
🎯 Sample Usage:
* Google Search: Uses sparse search (BM25) but supplements it with dense vector search (BERT) for semantic queries.
* Chatbots: Convert the user’s message to embedding with BERT and find the most similar response.
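The sparse-vs-dense contrast can be made concrete with cosine similarity. Below, the sparse vectors are bag-of-words counts over a tiny made-up vocabulary, and the dense vectors are hand-made stand-ins for model embeddings (a real system would get them from BERT or similar):

```python
import numpy as np

def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Sparse: bag-of-words counts over the vocabulary ["car", "buy", "vehicle", "new"]
q_sparse = np.array([1, 1, 0, 0])   # "buy car"
d_sparse = np.array([0, 1, 1, 1])   # "buy new vehicle" -- shares only "buy"

# Dense: toy 3-D embeddings (hand-made here, not from a real model)
q_dense = np.array([0.9, 0.8, 0.1])
d_dense = np.array([0.85, 0.75, 0.15])

sparse_sim = cosine(q_sparse, d_sparse)  # low: only one word overlaps
dense_sim  = cosine(q_dense, d_dense)    # high: the meanings are close
```

The sparse score stays low because the two phrases share only one token, while the dense score is high because the (assumed) embeddings sit close together: exactly the "words vs meaning" trade-off described above.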
⸻
3. Semantic Similarity Search
📌 Features:
* Understands the meaning; it does not depend on exact word matches.
* It is usually combined with Dense Vector Search.
* Also known as Embedding Similarity Search.
🔥 Differences:
* Understands the context instead of the keyword.
* For example, “buying a car” and “buying a new vehicle” mean the same thing even though they contain different words.
🎯 Sample Usage:
* AI-supported search: For the query “Which car should I buy?”, Google can return a page titled “The best 2024 model cars”.
* Customer support chatbots: Can understand that “I want to return this” and “Can I send it back?” mean the same thing.
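The chatbot example above can be sketched as nearest-intent matching in embedding space. The intent names and 3-D vectors here are invented placeholders; a real system would encode each phrase with an embedding model:

```python
import numpy as np

# Toy intent embeddings (hand-made; a real system would encode text with a model)
intents = {
    "return_item":  np.array([0.9, 0.1, 0.0]),
    "track_order":  np.array([0.1, 0.9, 0.2]),
    "cancel_order": np.array([0.2, 0.3, 0.9]),
}

def best_intent(query_vec):
    """Pick the intent whose embedding is most similar to the query vector."""
    def cos(a, b):
        return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
    return max(intents, key=lambda name: cos(query_vec, intents[name]))

# Embedding of "can I send it back?" (assumed to land near return_item)
matched = best_intent(np.array([0.85, 0.2, 0.05]))
```

Because the comparison happens on embeddings rather than tokens, “can I send it back?” matches the return intent even though it shares no words with “I want to return this”.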
⸻
4. Multi-Modal (Multimodal) Search
📌 Features:
* It is not limited to a single data type (e.g. text + image + audio).
* Embeds information from different modalities (data types) into the same vector space.
🔥 Subspecies and Their Differences:
✅ Text-Image Vector Search:
* Systems such as OpenAI’s CLIP model place images and their descriptions in the same vector space.
* For example: You can upload an image and find similar images.
✅ Audio-Text Search:
* Systems such as OpenAI’s Whisper model match audio with text.
* For example: Finding a moment in a podcast with a text search.
🎯 Sample Usage:
* Google Lens: Scans an object and finds similar ones.
* Podcast platforms: When the word “artificial intelligence” is mentioned in a podcast, it finds it.
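The key property of a CLIP-style shared space is that a *text* query can be compared directly against *image* vectors. The vectors and labels below are hand-made stand-ins for what two trained encoders would produce:

```python
import numpy as np

# In a CLIP-style system, an image encoder and a text encoder map into the
# SAME vector space. These vectors are hypothetical stand-ins for encoder output.
image_vecs = np.array([
    [0.9, 0.1, 0.0],   # (assumed) embedding of a photo of a dog
    [0.1, 0.9, 0.1],   # (assumed) embedding of a photo of a car
    [0.0, 0.2, 0.9],   # (assumed) embedding of a photo of a beach
])
image_labels = ["dog", "car", "beach"]

text_query = np.array([0.15, 0.85, 0.05])  # (assumed) embedding of "a red car"

# Cosine similarity between the text query and every image vector
sims = image_vecs @ text_query / (
    np.linalg.norm(image_vecs, axis=1) * np.linalg.norm(text_query)
)
best = image_labels[int(np.argmax(sims))]
```

Once everything lives in one space, cross-modal search is just the same nearest-neighbor problem as before: the text vector for “a red car” lands closest to the car photo.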
⸻
5. Hierarchical and Quantized Search
📌 Features:
* Speeds up the search using hierarchical or compressed (quantized) structures in large datasets.
🔥 Subspecies and Their Differences:
✅ Hierarchical Navigable Small World (HNSW):
* It performs a fast nearest neighbor search by converting vectors into a graph structure.
* Advantage: It is very fast in large-scale data.
✅ Product Quantization (PQ):
* Splits vectors into sub-vectors and compresses them into short codes to make search faster.
* Advantage: It is memory friendly and can work on disk.
🎯 Sample Usage:
* Facebook recommendation system: finds similar users using HNSW.
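Product Quantization can be sketched as follows: split each vector into m sub-vectors and replace each sub-vector with the index of its nearest codebook centroid. The codebooks below are random for illustration; real PQ trains them with k-means on the data:

```python
import numpy as np

rng = np.random.default_rng(0)

# PQ sketch: d-dimensional vectors, m sub-spaces, k centroids per sub-space.
# Random codebooks here; real PQ learns them with k-means.
d, m, k = 8, 4, 16
sub = d // m
codebooks = rng.normal(size=(m, k, sub))

def pq_encode(vec):
    """Replace each sub-vector with the index of its nearest centroid."""
    codes = []
    for i in range(m):
        part = vec[i * sub:(i + 1) * sub]
        dists = np.linalg.norm(codebooks[i] - part, axis=1)
        codes.append(int(np.argmin(dists)))   # one small int instead of `sub` floats
    return codes

def pq_decode(codes):
    """Lossy reconstruction: concatenate the chosen centroids."""
    return np.concatenate([codebooks[i][c] for i, c in enumerate(codes)])

v = rng.normal(size=d)
codes = pq_encode(v)       # 4 small integers instead of 8 floats
approx = pq_decode(codes)  # approximate version of the original vector
```

The memory win is the point: each vector shrinks from d floats to m small integers, which is why PQ indexes can stay in RAM or even work from disk.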
⸻
6. Time Series and Streaming Data Search
📌 Features:
* Detects similar events or anomalies by monitoring real-time data streams.
🎯 Sample Usage:
* Stock market analysis: To find similar price movements.
* Cyber security: To detect suspicious network traffic.
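One minimal form of streaming anomaly detection is to compare each new point against a sliding window of recent values and flag large deviations. This z-score sketch (window size and threshold are arbitrary choices) illustrates the idea on a made-up signal:

```python
import numpy as np

def detect_anomalies(stream, window=5, threshold=3.0):
    """Flag points that deviate strongly from the recent sliding window."""
    anomalies = []
    for i in range(window, len(stream)):
        recent = stream[i - window:i]
        mu, sigma = np.mean(recent), np.std(recent)
        # Flag the point if it is more than `threshold` std-devs from the window mean
        if sigma > 0 and abs(stream[i] - mu) > threshold * sigma:
            anomalies.append(i)
    return anomalies

# Steady made-up signal with one spike at index 10
data = np.array([1.0, 1.1, 0.9, 1.0, 1.05, 0.95, 1.0, 1.1, 0.9, 1.0, 9.0, 1.0])
spikes = detect_anomalies(data)
```

Real streaming systems apply the same windowed-comparison idea to embedding vectors of events (e.g. network flows) rather than raw scalars, but the structure is identical.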
⸻
7. Geolocation Vector Search Decryption
📌 Features:
* Finds similar places by converting the locations on the map into vectors.
🎯 Sample Usage:
* Google Maps: “Show me the nearest coffee shops.”
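A nearest-place query reduces to nearest-neighbor search over coordinates, with the haversine formula as the distance metric. The shop names and coordinates below are invented for illustration:

```python
import math

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two (lat, lon) points in kilometres."""
    r = 6371.0  # mean Earth radius in km
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

# Hypothetical coffee shops: (name, lat, lon)
shops = [
    ("Shop A", 41.010, 28.970),
    ("Shop B", 41.050, 29.020),
    ("Shop C", 40.995, 28.955),
]
user = (41.000, 28.960)
nearest = min(shops, key=lambda s: haversine_km(user[0], user[1], s[1], s[2]))
```

This is brute-force k-NN again, just with a geographic distance function; production systems add spatial indexes (geohashes, R-trees) to avoid scanning every location.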
⸻
Which Vector Searches Does MongoDB Support?
✅ Proximity Search (k-NN, ANN)
✅ Dense Vector Search (Dense Vectors)
✅ Semantic Similarity Search
✅ Fast Search with HNSW
❌ Sparse Vector Search (TF-IDF, BM25) → Does not Support
❌ Multimodal Search (Text-Image or Audio-Text) → Does not Support
❌ Streaming Vector Search (Real-time streaming data) → Does not Support
❌ Geo-Spatial Vector Search → Does not Support (but location-based search is available with GeoJSON)
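For the supported cases, a MongoDB Atlas vector query is expressed as a `$vectorSearch` aggregation stage. The sketch below shows the stage as a Python dict; the index name `vector_index`, the field name `embedding`, and the query vector are assumptions, and actually running it requires an Atlas cluster with a vector search index defined:

```python
# Minimal Atlas Vector Search pipeline sketch (names are illustrative assumptions)
query_vector = [0.12, -0.54, 0.33]  # would come from an embedding model

pipeline = [
    {
        "$vectorSearch": {
            "index": "vector_index",    # name of the Atlas vector search index
            "path": "embedding",        # document field that stores the vector
            "queryVector": query_vector,
            "numCandidates": 100,       # ANN candidate pool explored by HNSW
            "limit": 5,                 # top-k documents to return
        }
    },
    # Project the title plus the similarity score of each match
    {"$project": {"title": 1, "score": {"$meta": "vectorSearchScore"}}},
]
# With pymongo this would run as: db.movies.aggregate(pipeline)
```

`numCandidates` controls the speed/accuracy trade-off of the approximate search: larger pools are slower but closer to exact k-NN results.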