-
Notifications
You must be signed in to change notification settings - Fork 137
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add cosine similarity support for faiss engine #2376
Conversation
d6f16a1
to
6671e64
Compare
Adding additional unit and integration test for radial search. Will mark it as ready once i add those tests |
eee45d5
to
4658bee
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @VijayanB - completed a first pass review
src/main/java/org/opensearch/knn/index/engine/AbstractKNNMethod.java
Outdated
Show resolved
Hide resolved
src/main/java/org/opensearch/knn/index/engine/AbstractKNNMethod.java
Outdated
Show resolved
Hide resolved
src/main/java/org/opensearch/knn/index/engine/AbstractKNNMethod.java
Outdated
Show resolved
Hide resolved
src/main/java/org/opensearch/knn/index/engine/KNNLibraryIndexingContext.java
Show resolved
Hide resolved
src/main/java/org/opensearch/knn/index/mapper/ModelFieldMapper.java
Outdated
Show resolved
Hide resolved
src/main/java/org/opensearch/knn/index/mapper/VectorTransformer.java
Outdated
Show resolved
Hide resolved
src/main/java/org/opensearch/knn/index/query/KNNQueryBuilder.java
Outdated
Show resolved
Hide resolved
src/main/java/org/opensearch/knn/index/mapper/KNNVectorFieldMapper.java
Outdated
Show resolved
Hide resolved
src/main/java/org/opensearch/knn/index/mapper/KNNVectorFieldMapper.java
Outdated
Show resolved
Hide resolved
8764557
to
7829347
Compare
ec16cf5
to
29be76c
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall I think we should try making this whole transformation process simpler , by simplifying the class and interfaces.
@@ -106,6 +108,10 @@ protected PerDimensionProcessor doGetPerDimensionProcessor( | |||
return PerDimensionProcessor.NOOP_PROCESSOR; | |||
} | |||
|
|||
protected VectorTransformer getVectorTransformer(KNNMethodContext knnMethodContext) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are we planning to override this method in future? What is intension behind keeping this method in this class, When we have abstract class which gives some common functionality to child classes , I assume that common functionality is part of abstract class , which is not the case here. In this case we are calling another factory to get you transformer , which is already abstraction and can be directly called from child classes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
AbstractKNNMethod builds KNNLibraryIndexingContext that is why the method is added here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Simple ,
If the getTransformer method is not relevant to all child classes, it breaks the principle of abstraction. Abstract classes should only include methods that are meaningful to all subclasses.
I don't see this case here , and if I have to give common functionality I would give like this and let child classes give meaningful definition
public VectorTransformer getTransformer(KNNMethodContext context) {
return VectorTransformer.NOOP_VECTOR_TRANSFORMER; // Default no-op
}
src/main/java/org/opensearch/knn/index/engine/KNNLibraryIndexingContext.java
Show resolved
Hide resolved
src/main/java/org/opensearch/knn/index/mapper/FlatVectorFieldMapper.java
Outdated
Show resolved
Hide resolved
* Implementations can modify vectors while preserving their dimensional properties | ||
* for specific use cases such as normalization, scaling, or other transformations. | ||
*/ | ||
public interface VectorTransformer { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please make this interface generic.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you add more context on what are you thinking about generic mean here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
public interface VectorTransformer<T> {
default void transform(final T vector) {
if (vector == null) {
throw new IllegalArgumentException("Input vector cannot be null");
}
}
}
/** | ||
* A no-operation transformer that returns vector values unchanged. | ||
*/ | ||
VectorTransformer NOOP_VECTOR_TRANSFORMER = new VectorTransformer() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Geting Instance of VectorTransformer is not job of Contract basesd Interface, It's responsibility of Factory, Please have factory to handle this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point. I updated it.
* This factory determines whether vectors need transformation based on the engine type and space type. | ||
*/ | ||
@NoArgsConstructor(access = AccessLevel.PRIVATE) | ||
public final class VectorTransformerFactory { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please have one method which takes , engine and space type, how to get engine and space type should not be responsibility of Factory. Example below factory only acts on params becuase that's the deciding factor , how to create final object. I can not have mutiple methods from which I need to get param to get final Object
/**
* Retrieves a quantizer instance based on the provided quantization parameters.
*
* @param params the quantization parameters used to determine the appropriate quantizer
* @param <P> the type of quantization parameters, extending {@link QuantizationParams}
* @param <T> the type of the input vector to be quantized
* @param <R> the type of the output after quantization
* @return an instance of {@link Quantizer} corresponding to the provided parameters
*/
public static <P extends QuantizationParams, T, R> Quantizer<T, R> getQuantizer(final P params) {
if (params == null) {
throw new IllegalArgumentException("Quantization parameters must not be null.");
}
// Lazy Registration instead of static block as class level;
ensureRegistered();
return (Quantizer<T, R>) QuantizerRegistry.getQuantizer(params);
}
Another example is KnnQueryFactory which have just one method to get Object from CreateRequestMethod.
164e02f
to
633e50d
Compare
FAISS engine doesn't support cosine similarity natively. However we can use inner product to achieve the same, because, when vectors are normalized then inner product will be same as cosine similarity. Hence, before ingestion and perform search, normalize the input vector and add it to faiss index with type as inner product. Since we will be storing normalized vector in segments, to get actual vectors, source can be used. By saving as normalized vector, we don't have to normalize whenever segments are merged. This will keep force merge time and search at competitive, provided we will face additional latency during indexing (one time where we normalize). We also support radial search for cosine similarity. Signed-off-by: Vijayan Balasubramanian <[email protected]>
Signed-off-by: Balasubramanian <[email protected]>
Signed-off-by: Balasubramanian <[email protected]>
Signed-off-by: Balasubramanian <[email protected]>
633e50d
to
1dc2ec9
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for addressing comments. Looks good to me.
The backport to
To backport manually, run these commands in your terminal: # Fetch latest updates from GitHub
git fetch
# Create a new working tree
git worktree add .worktrees/backport-2.x 2.x
# Navigate to the new working tree
cd .worktrees/backport-2.x
# Create a new branch
git switch --create backport/backport-2376-to-2.x
# Cherry-pick the merged commit of this pull request and resolve the conflicts
git cherry-pick -x --mainline 1 f5cf255286239dbf48742951f38ac68d80e5f7cb
# Push it to GitHub
git push --set-upstream origin backport/backport-2376-to-2.x
# Go back to the original working tree
cd ../..
# Delete the working tree
git worktree remove .worktrees/backport-2.x Then, create a pull request where the |
* Add cosine similarity support for faiss engine FAISS engine doesn't support cosine similarity natively. However we can use inner product to achieve the same, because, when vectors are normalized then inner product will be same as cosine similarity. Hence, before ingestion and perform search, normalize the input vector and add it to faiss index with type as inner product. Since we will be storing normalized vector in segments, to get actual vectors, source can be used. By saving as normalized vector, we don't have to normalize whenever segments are merged. This will keep force merge time and search at competitive, provided we will face additional latency during indexing (one time where we normalize). We also support radial search for cosine similarity. Signed-off-by: Vijayan Balasubramanian <[email protected]>
* Add cosine similarity support for faiss engine FAISS engine doesn't support cosine similarity natively. However we can use inner product to achieve the same, because, when vectors are normalized then inner product will be same as cosine similarity. Hence, before ingestion and perform search, normalize the input vector and add it to faiss index with type as inner product. Since we will be storing normalized vector in segments, to get actual vectors, source can be used. By saving as normalized vector, we don't have to normalize whenever segments are merged. This will keep force merge time and search at competitive, provided we will face additional latency during indexing (one time where we normalize). We also support radial search for cosine similarity. Signed-off-by: Vijayan Balasubramanian <[email protected]>
* Add cosine similarity support for faiss engine FAISS engine doesn't support cosine similarity natively. However we can use inner product to achieve the same, because, when vectors are normalized then inner product will be same as cosine similarity. Hence, before ingestion and perform search, normalize the input vector and add it to faiss index with type as inner product. Since we will be storing normalized vector in segments, to get actual vectors, source can be used. By saving as normalized vector, we don't have to normalize whenever segments are merged. This will keep force merge time and search at competitive, provided we will face additional latency during indexing (one time where we normalize). We also support radial search for cosine similarity. Signed-off-by: Vijayan Balasubramanian <[email protected]>
Description
FAISS engine doesn't support cosine similarity natively. However we can use inner product to achieve the same, because, when vectors are normalized then inner product will be same as cosine similarity. Hence, before ingestion, normalize the input vector, and add it to faiss index with type as inner product, and, before search, normalize query vector if space type is cosine and engine is faiss.
Since we will be storing normalized vector in segments, we don't have to normalize whenever segments are merged. This will keep force merge time and search at competitive, provided we will face additional latency during indexing (one time where we normalize). To avoid this additional latency, customers can normalize their data set and create inner product.
This also adds support to radial search, for both max distance and min score.
Related Issues
#2242
Check List
--signoff
.By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.