Vector Databases
Vector database: how it's built and applied in AI systems — embeddings, ANN indexes, the difference from a relational DB, role in RAG and recommendations, common mistakes.
A vector database stores embeddings (numerical representations of text, images or other objects) and answers the question “what is semantically similar?”. It is a component underneath knowledge retrieval, recommendations and some AI agent scenarios, but it is not a complete system on its own. We design vector retrieval as a layer of an AI system, not as “a box you drop embeddings into”.
How a vector database works
A text or object is passed through an embedding model and becomes a fixed-dimension vector. Semantic similarity is vector proximity under a chosen metric (cosine, Euclidean, dot product). Brute-force search compares the query against every stored vector, so it works on small collections and becomes too slow on large ones. Production systems use approximate nearest-neighbor (ANN) techniques: HNSW and IVF indexes, often combined with PQ compression. They give much faster retrieval in exchange for a small, tunable loss in recall.
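As a minimal sketch of the three metrics and of exact (brute-force) search, the snippet below uses plain Python and toy 3-dimensional vectors. The document names and vector values are made up for illustration; real embedding models produce hundreds or thousands of dimensions, and an ANN index would replace the full scan.

```python
import math

def dot(a, b):
    """Dot product: grows with both alignment and vector length."""
    return sum(x * y for x, y in zip(a, b))

def euclidean(a, b):
    """Euclidean distance: smaller means closer."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def cosine(a, b):
    """Cosine similarity: depends only on the angle between the vectors."""
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))

def brute_force_search(query, vectors, k=2):
    """Exact top-k by cosine similarity; scores every stored vector."""
    scored = [(cosine(query, v), doc_id) for doc_id, v in vectors.items()]
    scored.sort(reverse=True)
    return [doc_id for _, doc_id in scored[:k]]

# Hypothetical toy "embeddings" (assumption: values chosen by hand).
vectors = {
    "refund policy": [0.9, 0.1, 0.0],
    "return an item": [0.8, 0.3, 0.1],
    "office address": [0.0, 0.2, 0.9],
}
query = [1.0, 0.2, 0.0]
top = brute_force_search(query, vectors, k=2)
print(top)  # ['refund policy', 'return an item']
```

The scan costs one similarity computation per stored vector for every query, which is why it stops being practical as the collection grows.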
A vector DB and a relational DB do different jobs
A relational DB answers precise questions and runs transactions; a vector DB answers “what is similar” and does not replace the first. In practice they live together: relational holds entities and links, vector holds representations for semantic search. Metadata (author, date, project, access control) often lives in both, and that is exactly what filters the vector results. Without those filters, “similar” easily lands in the wrong collection, on stale content or in front of the wrong user.
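A rough illustration of metadata filtering: candidates are first narrowed by project and access rights, and only then ranked by similarity. The record layout, field names and the `filtered_search` helper are hypothetical; real engines usually apply the filter inside the index traversal instead of scanning a list.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical records: a vector plus the metadata the filter runs on.
records = [
    {"id": "doc-1", "project": "billing", "allowed": {"alice", "bob"}, "vector": [0.90, 0.10]},
    {"id": "doc-2", "project": "billing", "allowed": {"bob"}, "vector": [0.95, 0.05]},
    {"id": "doc-3", "project": "hr", "allowed": {"alice"}, "vector": [0.92, 0.08]},
]

def filtered_search(query, records, user, project, k=5):
    """Keep only records the user may see in the given project, then rank them."""
    candidates = [
        r for r in records
        if r["project"] == project and user in r["allowed"]
    ]
    candidates.sort(key=lambda r: cosine(query, r["vector"]), reverse=True)
    return [r["id"] for r in candidates[:k]]

hits = filtered_search([1.0, 0.0], records, user="alice", project="billing")
print(hits)  # ['doc-1']
```

Without the filter, the closest vector to this query is doc-2, which alice is not allowed to read, and doc-3 from another project would also rank above doc-1.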
Where vector retrieval really works
- RAG: retrieving relevant fragments to feed into the model. The vector DB is one step here, not the whole RAG system.
- Recommendations and similar-item search: products, documents, requests, tickets.
- Deduplication and entity linking: finding the same thing written in different words.
- Clustering: grouping inbound requests, topic tagging, “what are people even asking about”.
- Anti-fraud and anomalies: items that resemble nothing normal stand out in vector space.
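To make one of the scenarios above concrete, here is a small sketch of near-duplicate detection: two items count as duplicates when their cosine similarity reaches a threshold. The ticket ids, vectors and the 0.95 threshold are assumptions for illustration; in practice the threshold is tuned on labeled pairs, and the pairwise loop is replaced by index lookups.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def find_near_duplicates(items, threshold=0.95):
    """Return id pairs whose vectors are at least `threshold` similar."""
    ids = list(items)
    pairs = []
    for i, first in enumerate(ids):
        for second in ids[i + 1:]:
            if cosine(items[first], items[second]) >= threshold:
                pairs.append((first, second))
    return pairs

# Hypothetical ticket embeddings (hand-picked toy values).
tickets = {
    "T-1": [0.80, 0.60, 0.00],  # "cannot log in"
    "T-2": [0.78, 0.62, 0.05],  # "login fails"
    "T-3": [0.10, 0.20, 0.97],  # "invoice missing"
}
duplicates = find_near_duplicates(tickets)
print(duplicates)  # [('T-1', 'T-2')]
```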
A working engineer’s view of popular systems
Systems commonly seen in production: Qdrant (Rust, metadata filtering as a first-class citizen), Weaviate (modular embedders, GraphQL), Milvus (scale, sharding), Chroma (minimal dependencies, convenient for prototypes), pgvector (a PostgreSQL extension that keeps vectors next to your existing data), Faiss (a Meta library, not a server). There is no universal “best”: the choice depends on your existing stack, collection size, filter types and deployment constraints. Often the right start is pgvector next to your PostgreSQL; moving to a specialized DB is the answer when you hit a performance or query-type wall.
Common mistakes
- “Pick the DB before picking the embeddings”: the most expensive step backward. Embedding quality and metadata schema come first, then the database.
- “Use dot product as if it were cosine on unnormalized vectors”: vector length then dominates the score and the ranking stops reflecting similarity. Normalize the vectors or use a true cosine metric.
- “Ignore metadata filters”: the result list turns into mush and no re-ranker fixes it.
- “Mistake a vector DB for RAG”: the most common strategic error. A vector DB answers “what’s similar”; a RAG system answers “what to put into the model so the answer is grounded”.
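The normalization pitfall can be shown with two hand-made vectors: a raw dot product rewards vector length, while the same dot product on L2-normalized vectors equals cosine similarity. The document names and values below are hypothetical.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def normalize(v):
    """Scale a vector to unit length (L2 norm of 1)."""
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v]

query = [1.0, 0.0]
docs = {
    "on_topic_short": [0.9, 0.1],  # points almost exactly along the query
    "off_topic_long": [3.0, 4.0],  # points elsewhere but has a large norm
}

# Raw dot product: the long vector wins on magnitude alone.
raw_best = max(docs, key=lambda d: dot(query, docs[d]))

# Dot product on unit vectors is cosine similarity: direction decides.
unit_query = normalize(query)
normalized_best = max(docs, key=lambda d: dot(unit_query, normalize(docs[d])))

print(raw_best)         # off_topic_long
print(normalized_best)  # on_topic_short
```

A common design choice is to normalize once at write time and once per query, so the index can use the cheaper dot product while still ranking by cosine.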
How we do it
Vector retrieval is designed for the specific scenario: what is being searched, which fields filter the result, how relevance is measured, how the index is refreshed. The embedding model is chosen for the language and domain, the index for volume and latency, and quality is checked against regression sets, not by eye. It’s a step in a larger AI system, not a standalone product.
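One possible shape for a regression set is a list of queries with the document ids a correct retrieval must return, scored with recall@k. The sketch below assumes that shape; `fake_search`, the queries and the ids are stand-ins for a real retrieval call and real labeled data.

```python
def recall_at_k(retrieved, relevant, k):
    """Share of the relevant ids that appear in the top-k retrieved ids."""
    found = set(retrieved[:k]) & set(relevant)
    return len(found) / len(relevant)

# Hypothetical regression cases: query plus ids that must be retrieved.
regression_set = [
    {"query": "reset password", "relevant": ["kb-12", "kb-40"]},
    {"query": "refund deadline", "relevant": ["kb-7"]},
]

def fake_search(query):
    """Stand-in for the real retrieval call; returns canned ranked ids."""
    canned = {
        "reset password": ["kb-12", "kb-3", "kb-40"],
        "refund deadline": ["kb-9", "kb-2", "kb-5"],
    }
    return canned[query]

def evaluate(search_fn, cases, k=3):
    """Mean recall@k over all regression cases."""
    scores = [recall_at_k(search_fn(c["query"]), c["relevant"], k) for c in cases]
    return sum(scores) / len(scores)

mean_recall = evaluate(fake_search, regression_set, k=3)
print(mean_recall)  # 0.5
```

Running the same set after every change to the embedding model, index parameters or filters turns “it looks fine” into a number that can be compared between versions.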