Loading notes...
Loading notes...
RAG • Chapter 5
RAG engineering module on Semantic Search & Distance Metrics.
Semantic search relies on mathematical distance metrics to determine how similar two vectors are. The retrieval engine compares the query vector against all document vectors in the database.
Advanced System Mechanics
The three most common metrics are Cosine Similarity (measures the angle between vectors, ignoring magnitude), Euclidean Distance (L2 norm, measures straight-line distance), and Dot Product (measures both angle and magnitude). In modern embedding models where vectors are normalized to a length of 1, Cosine Similarity and Dot Product yield effectively identical rankings.
Implementation Blueprint
import numpy as np
def calculate_metrics(v1, v2):
# Dot Product
dot = np.dot(v1, v2)
# Euclidean Distance
l2 = np.linalg.norm(v1 - v2)
# Cosine Similarity
cosine = dot / (np.linalg.norm(v1) * np.linalg.norm(v2))
return dot, l2, cosine
q = np.array([0.1, 0.8, 0.3])
d = np.array([0.2, 0.7, 0.4])
print("Metrics:", calculate_metrics(q, d))