# Rank by Vector Similarity

### Description

Ranks the objects in `SOURCE [OBJ,STRING]` by the similarity between the vector embedding of each `STRING` and the vector embeddings of the terms in `QTERMS [STRING]`. The vector embeddings are created using the selected embedding model. The similarity score is computed using the Euclidean distance between vectors.

### Input

- `SOURCE [OBJ,STRING]`: a two-column input of object-string pairs. Typically obtained with the `Extract string` block.
- `QTERMS [STRING]`: a list of keywords to rank `SOURCE` objects against

### Output

- `RETRIEVE [OBJ]`: a list of ranked objects

### Parameters

- `Embedding model`: the embedding model used for creating the vector embeddings
  - [all-MiniLM-L6-v2](https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2)
  - [snowflake-arctic-embed-l-v2.0](https://huggingface.co/Snowflake/snowflake-arctic-embed-l-v2.0)
- `Pooling mode`: how the embedded tokens of each input string are combined into one vector
  - `MEAN`: the average value of each dimension across all tokens is taken; captures the overall meaning
  - `MAX`: the highest value of each dimension across all tokens is taken; highlights the most prominent features
- `Chunk size`: the maximum number of characters that will be embedded as one vector. Strings longer than the chunk size are split into multiple chunks.
- `Chunk overlap`: the number of characters shared by consecutive chunks. This is intended to prevent information from being siloed into separate chunks.
- `Search type`: the method used for vector similarity search
  - `EXACT`: computes the exact distance between each source and query vector; recommended only for a small number of source vectors (about 100,000 or fewer)
  - `HNSW`: computes the approximate distance between each source and query vector, based on the [Hierarchical Navigable Small World](https://en.wikipedia.org/wiki/Hierarchical_navigable_small_world) algorithm
- `K value`: the number of objects to retrieve when using an approximate `Search type`; greatly affects search time
- `Index name`: the name under which the graph-based indices used during approximate search are stored; must be unique per source data

Output scores can be [normalised](docs://score_normalisation).

Note: When using `HNSW`, if the `SOURCE` vectors are changed or updated, the index is not updated automatically. Change `Index name` to create a new index and see the changes.
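
To make the `Chunk size`, `Chunk overlap`, `Pooling mode` and `EXACT` search settings more concrete, here is a minimal Python sketch. It is not the block's actual implementation: the function names `chunk_text`, `pool` and `rank_exact` are hypothetical, and the small hand-written vectors stand in for the output of an embedding model. How the block combines several chunks or several query terms, and how it turns a distance into a score, is not specified here, so the sketch simply sorts by Euclidean distance to a single query vector.

```python
import math


def chunk_text(text, chunk_size, chunk_overlap):
    """Split text into chunks of at most chunk_size characters.

    Consecutive chunks share chunk_overlap characters (assumed behaviour).
    """
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    if not text:
        return []
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last chunk already reaches the end of the text
    return chunks


def pool(token_vectors, mode="MEAN"):
    """Combine per-token vectors into one vector, dimension by dimension."""
    columns = list(zip(*token_vectors))  # one tuple per dimension
    if mode == "MEAN":
        return [sum(col) / len(col) for col in columns]
    if mode == "MAX":
        return [max(col) for col in columns]
    raise ValueError(f"unknown pooling mode: {mode}")


def rank_exact(source_vectors, query_vector):
    """Exact search: compute the Euclidean distance to every source vector.

    Returns (object, distance) pairs, closest first.
    """
    scored = [
        (obj, math.dist(vec, query_vector))
        for obj, vec in source_vectors.items()
    ]
    return sorted(scored, key=lambda pair: pair[1])


if __name__ == "__main__":
    # Chunk size 4, overlap 1: each chunk starts 3 characters after the last.
    print(chunk_text("abcdefghij", chunk_size=4, chunk_overlap=1))

    # Two toy token vectors pooled into one vector per mode.
    tokens = [[1.0, 4.0], [3.0, 2.0]]
    print(pool(tokens, "MEAN"), pool(tokens, "MAX"))

    # Toy 2-dimensional "embeddings" for three objects and one query.
    source = {"a": [0.0, 0.0], "b": [3.0, 4.0], "c": [1.0, 0.0]}
    print(rank_exact(source, [0.0, 0.0]))
```

Because `rank_exact` compares the query with every source vector, its cost grows linearly with the number of source vectors, which is consistent with the recommendation to switch to `HNSW` for larger inputs.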