# Rank by Vector Similarity

### Description
Ranks the objects in `SOURCE [OBJ,STRING]` by the similarity between the vector embedding of each `STRING` and the vector embeddings of the `QTERMS [STRING]` keywords.
The vector embeddings are created using the selected embedding model.
The similarity score is computed using the Euclidean distance between vectors.
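
As a rough illustration of distance-based ranking, the sketch below orders objects by the Euclidean distance between their vector and the nearest query vector. The function name `rank_by_distance`, the toy 2-dimensional vectors, and the choice to take the minimum distance across query terms are assumptions made for this example; how the block aggregates over several query terms and converts distances into scores is not specified here.

```python
import math

def rank_by_distance(source, query_vectors):
    """Rank (obj, vector) pairs by Euclidean distance to the nearest query vector.

    Assumption: each object is scored by its smallest distance to any query vector.
    """
    scored = []
    for obj, vec in source:
        best = min(math.dist(vec, q) for q in query_vectors)
        scored.append((obj, best))
    # A smaller distance means a closer match, so sort in ascending order.
    scored.sort(key=lambda pair: pair[1])
    return scored

# Hypothetical pre-computed embeddings (real embeddings have hundreds of dimensions).
source = [("doc_a", [0.0, 1.0]), ("doc_b", [1.0, 0.0]), ("doc_c", [0.9, 0.1])]
queries = [[1.0, 0.0]]

ranking = rank_by_distance(source, queries)
print([obj for obj, _ in ranking])  # ['doc_b', 'doc_c', 'doc_a']
```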

### Input
- `SOURCE [OBJ,STRING]`: a two-column input of object-string pairs, typically obtained with the `Extract string` block
- `QTERMS [STRING]`: a list of keywords to rank `SOURCE` objects against

### Output
- `RETRIEVE [OBJ]`: a list of ranked objects

### Parameters
- `Embedding model`: the embedding model used for creating the vector embeddings
  - [all-MiniLM-L6-v2](https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2)
  - [snowflake-arctic-embed-l-v2.0](https://huggingface.co/Snowflake/snowflake-arctic-embed-l-v2.0)
- `Pooling mode`: how the embedded tokens of each input string are combined into one vector
  - `MEAN`: the average value of each dimension across all tokens is taken; captures the overall meaning
  - `MAX`: the highest value for each dimension across all tokens is taken; highlights the most prominent features
- `Chunk size`: maximum number of characters that will be embedded as one vector. Strings longer than the chunk size will be split into multiple chunks.
- `Chunk overlap`: number of characters shared between consecutive chunks. Overlap is intended to prevent information from being split across chunk boundaries.
  
- `Search type`: the method used for vector similarity search
  - `EXACT`: computes the exact distance between each source and query vector. Only recommended for a small number of source vectors (about 100,000 or fewer).
  - `HNSW`: finds approximate nearest neighbours of each query vector among the source vectors, based on the [Hierarchical Navigable Small World](https://en.wikipedia.org/wiki/Hierarchical_navigable_small_world) algorithm.
- `K value`: the number of objects to retrieve when using an approximate `Search type`. This value strongly affects search time.
- `Index name`: the name under which the graph-based index used for approximate search is stored. It must be unique per source data set.
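
The `Chunk size`, `Chunk overlap`, and `Pooling mode` parameters can be pictured with the minimal sketch below. The helper names `split_into_chunks` and `pool`, and the toy token vectors, are hypothetical and only illustrate the ideas described above; the block's actual splitting and pooling code may differ.

```python
def split_into_chunks(text, chunk_size, chunk_overlap):
    """Split text into chunks of at most chunk_size characters.

    Consecutive chunks share chunk_overlap characters.
    """
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once the current chunk reaches the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks

def pool(token_vectors, mode):
    """Combine per-token vectors into one vector, dimension by dimension."""
    columns = list(zip(*token_vectors))  # one tuple of values per dimension
    if mode == "MEAN":
        return [sum(col) / len(col) for col in columns]
    if mode == "MAX":
        return [max(col) for col in columns]
    raise ValueError(f"unknown pooling mode: {mode}")

print(split_into_chunks("abcdefghij", chunk_size=4, chunk_overlap=1))
# ['abcd', 'defg', 'ghij']

token_vectors = [[1.0, 4.0], [3.0, 2.0]]  # two tokens, two dimensions
print(pool(token_vectors, "MEAN"))  # [2.0, 3.0]
print(pool(token_vectors, "MAX"))   # [3.0, 4.0]
```

In this sketch, each chunk after the first begins with the last `chunk_overlap` characters of the previous chunk, so a phrase that straddles a boundary still appears whole in at least one chunk as long as it is no longer than the overlap.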

Output scores can be [normalised](docs://score_normalisation).

Note: When using `HNSW`, the index is not updated automatically when the `SOURCE` vectors change. Change `Index name` to build a new index that reflects the changes.