Add cosine similarity support for faiss engine #2376

Merged (4 commits) on Jan 21, 2025

VijayanB
Member

@VijayanB VijayanB commented Jan 9, 2025

Description

The FAISS engine doesn't support cosine similarity natively. However, the inner product of two normalized (unit-length) vectors is the same as their cosine similarity, so we can use inner product to achieve the same result. When the space type is cosine and the engine is faiss, we normalize the input vector before ingestion and add it to a faiss index of type inner product, and we normalize the query vector before search.
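The equivalence described above can be illustrated with a small standalone sketch. It is not the code from this PR: the class name `CosineViaInnerProduct` and its methods are hypothetical, and rejecting zero vectors is an assumption made only to keep the example well defined.

```java
// Illustrative only: shows that the inner product of L2-normalized vectors
// equals the cosine similarity of the original vectors.
final class CosineViaInnerProduct {

    // Returns a unit-length copy of v (hypothetical helper, not the plugin's API).
    static float[] normalize(float[] v) {
        double sumOfSquares = 0.0;
        for (float x : v) {
            sumOfSquares += (double) x * x;
        }
        if (sumOfSquares == 0.0) {
            // Assumption: a zero vector has no direction, so it is rejected here.
            throw new IllegalArgumentException("Cannot normalize a zero vector");
        }
        float norm = (float) Math.sqrt(sumOfSquares);
        float[] out = new float[v.length];
        for (int i = 0; i < v.length; i++) {
            out[i] = v[i] / norm;
        }
        return out;
    }

    // Plain dot product, accumulated in double precision.
    static double innerProduct(float[] a, float[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            sum += (double) a[i] * b[i];
        }
        return sum;
    }

    // Textbook cosine similarity on the raw (non-normalized) vectors.
    static double cosine(float[] a, float[] b) {
        return innerProduct(a, b) / (Math.sqrt(innerProduct(a, a)) * Math.sqrt(innerProduct(b, b)));
    }

    public static void main(String[] args) {
        float[] doc = {3f, 4f};
        float[] query = {4f, 3f};
        // Normalize once at ingestion (doc) and once at search time (query).
        double viaInnerProduct = innerProduct(normalize(doc), normalize(query));
        System.out.println("inner product of normalized vectors: " + viaInnerProduct);
        System.out.println("cosine similarity of raw vectors:    " + cosine(doc, query));
    }
}
```

The two printed values should agree up to floating-point rounding, which is why an inner-product index over normalized vectors can stand in for a cosine index.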

Since we store normalized vectors in segments, we don't have to normalize again whenever segments are merged. This keeps force merge time and search latency competitive, at the cost of additional latency during indexing (a one-time normalization per vector). To avoid this additional latency, customers can normalize their data set themselves and create an inner product index.

This also adds support for radial search, for both max distance and min score.
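As a rough sketch of how radial thresholds could be mapped onto an inner-product index, assume that cosine distance is defined as `d = 1 - cos` and that the score is `(2 - d) / 2`. Both formulas are assumptions for illustration, and the class and method names below are hypothetical; the conversion actually used by this PR may differ.

```java
// Illustrative only: converts radial-search thresholds for cosine similarity
// into a minimum inner product over normalized vectors.
final class CosineRadialThresholds {

    private CosineRadialThresholds() {}

    // Assumption: distance d = 1 - cos, so "d <= maxDistance" means "cos >= 1 - maxDistance".
    static float maxDistanceToMinInnerProduct(float maxDistance) {
        if (maxDistance < 0f || maxDistance > 2f) {
            throw new IllegalArgumentException("cosine distance must be in [0, 2]");
        }
        return 1f - maxDistance;
    }

    // Assumption: score = (2 - d) / 2 = (1 + cos) / 2, so "score >= minScore"
    // means "cos >= 2 * minScore - 1".
    static float minScoreToMinInnerProduct(float minScore) {
        if (minScore < 0f || minScore > 1f) {
            throw new IllegalArgumentException("cosine score must be in [0, 1]");
        }
        return 2f * minScore - 1f;
    }

    public static void main(String[] args) {
        System.out.println(maxDistanceToMinInnerProduct(0.25f));
        System.out.println(minScoreToMinInnerProduct(0.875f));
    }
}
```

Under these assumptions, a max distance of 0.25 and a min score of 0.875 describe the same radius, since both correspond to a cosine (inner product) of at least 0.75.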

Related Issues

#2242

Check List

  • New functionality includes testing.
  • New functionality has been documented.
  • API changes companion pull request created.
  • Commits are signed per the DCO using --signoff.
  • Public documentation issue/PR created.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

@VijayanB VijayanB force-pushed the faiss-cosine branch 3 times, most recently from d6f16a1 to 6671e64 Compare January 9, 2025 07:43
@VijayanB
Member Author

VijayanB commented Jan 9, 2025

Adding additional unit and integration tests for radial search. I will mark this as ready once I add those tests.

Member

@jmazanec15 jmazanec15 left a comment

Thanks @VijayanB - completed a first pass review

@VijayanB VijayanB force-pushed the faiss-cosine branch 7 times, most recently from 8764557 to 7829347 Compare January 14, 2025 22:40
jmazanec15
jmazanec15 previously approved these changes Jan 16, 2025
@VijayanB VijayanB force-pushed the faiss-cosine branch 2 times, most recently from ec16cf5 to 29be76c Compare January 16, 2025 17:40
jmazanec15
jmazanec15 previously approved these changes Jan 16, 2025
Contributor

@Vikasht34 Vikasht34 left a comment

Overall, I think we should try to make this whole transformation process simpler by simplifying the classes and interfaces.

@@ -106,6 +108,10 @@ protected PerDimensionProcessor doGetPerDimensionProcessor(
return PerDimensionProcessor.NOOP_PROCESSOR;
}

protected VectorTransformer getVectorTransformer(KNNMethodContext knnMethodContext) {
Contributor

Are we planning to override this method in the future? What is the intention behind keeping this method in this class? When an abstract class gives common functionality to its child classes, I assume that functionality is part of the abstract class itself, which is not the case here. Here we are calling another factory to get the transformer; that factory is already an abstraction and can be called directly from the child classes.

Member Author

AbstractKNNMethod builds the KNNLibraryIndexingContext, which is why the method is added here.

Contributor

@Vikasht34 Vikasht34 Jan 16, 2025

Simple:

If the getTransformer method is not relevant to all child classes, it breaks the principle of abstraction. Abstract classes should only include methods that are meaningful to all subclasses.

I don't see that case here. If I had to provide common functionality, I would provide it like this and let child classes give a meaningful definition:

public VectorTransformer getTransformer(KNNMethodContext context) {
    return VectorTransformer.NOOP_VECTOR_TRANSFORMER; // Default no-op
}

* Implementations can modify vectors while preserving their dimensional properties
* for specific use cases such as normalization, scaling, or other transformations.
*/
public interface VectorTransformer {
Contributor

Please make this interface generic.

Member Author

Can you add more context on what you mean by generic here?

Contributor

public interface VectorTransformer<T> {
    default void transform(final T vector) {
        if (vector == null) {
            throw new IllegalArgumentException("Input vector cannot be null");
        }
    }
}

/**
* A no-operation transformer that returns vector values unchanged.
*/
VectorTransformer NOOP_VECTOR_TRANSFORMER = new VectorTransformer() {
Contributor

Getting an instance of VectorTransformer is not the job of a contract-based interface; it is the responsibility of a factory. Please have a factory handle this.

Member Author

Good point. I updated it.

* This factory determines whether vectors need transformation based on the engine type and space type.
*/
@NoArgsConstructor(access = AccessLevel.PRIVATE)
public final class VectorTransformerFactory {
Contributor

@Vikasht34 Vikasht34 Jan 16, 2025

Please have one method that takes the engine and space type. How to get the engine and space type should not be the responsibility of the factory. In the example below, the factory acts only on params, because they are the deciding factor in how to create the final object. There should not be multiple methods from which the params are extracted to get the final object.

/**
 * Retrieves a quantizer instance based on the provided quantization parameters.
 *
 * @param params the quantization parameters used to determine the appropriate quantizer
 * @param <P>    the type of quantization parameters, extending {@link QuantizationParams}
 * @param <T>    the type of the input vector to be quantized
 * @param <R>    the type of the output after quantization
 * @return an instance of {@link Quantizer} corresponding to the provided parameters
 */
public static <P extends QuantizationParams, T, R> Quantizer<T, R> getQuantizer(final P params) {
    if (params == null) {
        throw new IllegalArgumentException("Quantization parameters must not be null.");
    }
    // Lazy Registration instead of static block as class level;
    ensureRegistered();
    return (Quantizer<T, R>) QuantizerRegistry.getQuantizer(params);
}

Another example is KnnQueryFactory, which has just one method to get the object from CreateRequestMethod.
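For illustration, a factory with a single entry point keyed on engine and space type might look roughly like the sketch below. Every type here (`SketchEngine`, `SketchSpaceType`, `SketchVectorTransformer`, `SketchVectorTransformerFactory`) is a hypothetical stand-in rather than the plugin's real classes, and the in-place normalization that rejects zero vectors is an assumption, not necessarily what the merged change does.

```java
// Illustrative only: one factory method that decides purely on (engine, space type).
enum SketchEngine { FAISS, LUCENE, NMSLIB }

enum SketchSpaceType { COSINESIMIL, INNER_PRODUCT, L2 }

interface SketchVectorTransformer {
    // Transforms the vector in place.
    void transform(float[] vector);
}

final class SketchVectorTransformerFactory {

    // Shared no-op instance: leaves vectors untouched.
    static final SketchVectorTransformer NOOP = vector -> { };

    // Shared normalizing instance, used only for faiss + cosine similarity.
    private static final SketchVectorTransformer NORMALIZE = SketchVectorTransformerFactory::normalizeInPlace;

    private SketchVectorTransformerFactory() {}

    // Single entry point: callers resolve engine and space type themselves.
    static SketchVectorTransformer getVectorTransformer(SketchEngine engine, SketchSpaceType spaceType) {
        boolean needsNormalization = engine == SketchEngine.FAISS && spaceType == SketchSpaceType.COSINESIMIL;
        return needsNormalization ? NORMALIZE : NOOP;
    }

    private static void normalizeInPlace(float[] vector) {
        if (vector == null) {
            throw new IllegalArgumentException("Input vector cannot be null");
        }
        double sumOfSquares = 0.0;
        for (float x : vector) {
            sumOfSquares += (double) x * x;
        }
        if (sumOfSquares == 0.0) {
            // Assumption: zero vectors are rejected because they cannot be scaled to unit length.
            throw new IllegalArgumentException("Cannot normalize a zero vector");
        }
        float norm = (float) Math.sqrt(sumOfSquares);
        for (int i = 0; i < vector.length; i++) {
            vector[i] /= norm;
        }
    }
}
```

A caller would then write something like `SketchVectorTransformerFactory.getVectorTransformer(engine, spaceType).transform(vector)` and never needs to know how the decision is made.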

@VijayanB VijayanB force-pushed the faiss-cosine branch 6 times, most recently from 164e02f to 633e50d Compare January 17, 2025 21:46
FAISS engine doesn't support cosine similarity natively.
However, we can use inner product to achieve the same result, because
when vectors are normalized, inner product is the same as cosine
similarity. Hence, before ingestion and before search, normalize the
input vector, and add it to the faiss index with type inner product.

Since we store normalized vectors in segments, the source can be used
to get the actual vectors. By saving normalized vectors, we don't have
to normalize whenever segments are merged. This keeps force merge time
and search latency competitive, at the cost of additional latency
during indexing (a one-time normalization).

We also support radial search for cosine similarity.

Signed-off-by: Vijayan Balasubramanian <[email protected]>
Signed-off-by: Balasubramanian <[email protected]>
Contributor

@Vikasht34 Vikasht34 left a comment

Thanks for addressing comments. Looks good to me.

@VijayanB VijayanB merged commit f5cf255 into opensearch-project:main Jan 21, 2025
31 checks passed
@opensearch-trigger-bot
Contributor

The backport to 2.x failed:

The process '/usr/bin/git' failed with exit code 1

To backport manually, run these commands in your terminal:

# Fetch latest updates from GitHub
git fetch
# Create a new working tree
git worktree add .worktrees/backport-2.x 2.x
# Navigate to the new working tree
cd .worktrees/backport-2.x
# Create a new branch
git switch --create backport/backport-2376-to-2.x
# Cherry-pick the merged commit of this pull request and resolve the conflicts
git cherry-pick -x --mainline 1 f5cf255286239dbf48742951f38ac68d80e5f7cb
# Push it to GitHub
git push --set-upstream origin backport/backport-2376-to-2.x
# Go back to the original working tree
cd ../..
# Delete the working tree
git worktree remove .worktrees/backport-2.x

Then, create a pull request where the base branch is 2.x and the compare/head branch is backport/backport-2376-to-2.x.

VijayanB added a commit to VijayanB/k-NN-2 that referenced this pull request Jan 21, 2025
@VijayanB VijayanB mentioned this pull request Jan 21, 2025
VijayanB added a commit to VijayanB/k-NN-2 that referenced this pull request Jan 22, 2025
VijayanB added a commit that referenced this pull request Jan 22, 2025