Tim Kellogg’s Post

RAG Trick: Cosine similarity is the same as dot product

Most embedding models normalize their output to length 1.0, e.g. the models from OpenAI, Cohere, etc., but check the documentation for yours. In other words, the length of the vector is always 1.

For unit-length vectors, these all give the same ranking:
- Dot product
- Cosine similarity
- Euclidean distance (range 0 to 2 instead of -1 to 1, and smaller means closer)

Just do dot product. There are far fewer mathematical operations to get the same result.

💡 It’s a (hyper)sphere!

Have a hard time wrapping your brain around distances in embedding space? Think about it as a sphere. All embedding points lie on the surface of that sphere. Not sure about you, but that simplifies a lot for me.

Two ways to classify embeddings that come out much the same:
- Distance from centroid (average of vectors)
- Logistic regression (scikit-learn)

The difference between these is the paradigm. With the centroid, you find the center by averaging. With a logistic regression, you find the outside edge by training a model to draw a hyperplane. Since all points are on a hypersphere, they tend to get you much the same result. Choose whichever makes more sense to you.

IMO,
- centroids are easier to update & delete from, but you have to come up with the distance cutoff yourself
- logistic regression is more obviously a classifier, so it's easier to wrap your head around and makes code clearer

#RAG #LLMs #LLM #AI #embeddings #vectordb #vectordatabase
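The claim about unit vectors can be checked numerically. This is a minimal sketch, assuming plain Python lists as stand-ins for embeddings; the two vectors and the helper names (`dot`, `normalize`, `cosine_similarity`, `euclidean_distance`) are made up for illustration and do not come from any embedding library. It shows that after normalization the dot product equals cosine similarity, and that squared Euclidean distance equals 2 - 2 * dot, which is why all three rank neighbors the same way.

```python
import math

def dot(a, b):
    # Sum of element-wise products
    return sum(x * y for x, y in zip(a, b))

def normalize(v):
    # Scale v so that its length (L2 norm) is 1
    length = math.sqrt(dot(v, v))
    return [x / length for x in v]

def cosine_similarity(a, b):
    # Dot product divided by the product of the two lengths
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))

def euclidean_distance(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Hypothetical "embeddings", normalized the way many models do it
a = normalize([3.0, 4.0, 0.0])
b = normalize([1.0, 2.0, 2.0])

d = dot(a, b)
c = cosine_similarity(a, b)
e = euclidean_distance(a, b)

print("dot product:       ", d)
print("cosine similarity: ", c)
print("euclidean squared: ", e ** 2)
print("2 - 2 * dot:       ", 2 - 2 * d)
```

If the vectors are not normalized first, the first two numbers generally differ, so it is worth checking the length of a few real embeddings before dropping the cosine computation.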

Chris A.

Software Engineer at PNNL (Center for AI, rapid prototyping)

5mo

I think you know this, but for other folks who read just the first line: the dot product is only the same as cosine similarity if the vectors are unit length, so if the embedding model doesn't normalize its output, they won't be the same.

Euclidean distance is not the same as the other two. In three dimensions, it's the length of a direct line drawn from one vector to another.

And I'm not sure thinking of it as a sphere in "high" dimensions is right. The range (of distances between random unit vectors) seems to narrow as the dimensions increase. For 1000 random vectors:

[Image: no alternative text available]
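The narrowing described in the comment can be tried out with a small simulation. This is only a sketch of that kind of experiment, not a reproduction of the commenter's image: it assumes 100 random unit vectors per dimension (rather than 1000, to keep pure Python fast), a fixed seed, and the dimensions 3, 30, and 300, all chosen here for illustration. It measures the standard deviation of the pairwise dot products, which on unit vectors is the same as cosine similarity.

```python
import math
import random
import statistics

def random_unit_vector(dim, rng):
    # Gaussian components give a direction that is uniform on the sphere
    v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    length = math.sqrt(sum(x * x for x in v))
    return [x / length for x in v]

def pairwise_dot_spread(dim, n_vectors=100, seed=0):
    # Standard deviation of the dot products over all distinct pairs
    rng = random.Random(seed)
    vectors = [random_unit_vector(dim, rng) for _ in range(n_vectors)]
    dots = [
        sum(x * y for x, y in zip(vectors[i], vectors[j]))
        for i in range(n_vectors)
        for j in range(i + 1, n_vectors)
    ]
    return statistics.pstdev(dots)

spreads = {dim: pairwise_dot_spread(dim) for dim in (3, 30, 300)}
for dim, spread in spreads.items():
    print(f"dim={dim:4d}  std of pairwise dot products = {spread:.3f}")
```

The spread shrinks as the dimension grows, so random directions in a high-dimensional space end up close to orthogonal to each other. Real embeddings are not random, so how much of this carries over to a given model is something to measure rather than assume.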
