[RFC]Reduce Disk Usage By Reusing NativeEngine Files #2266

Open
luyuncheng opened this issue Nov 12, 2024 · 8 comments

@luyuncheng
Collaborator

luyuncheng commented Nov 12, 2024

Description

Previously, in issue #1572 and PR #1571, we found that we can reuse the doc values field (e.g. KNNVectorFieldData) and synthesize the vector in the _source field, which saves about 1/3 of disk usage. @jmazanec15 also suggested a good way to implement this with stored fields, as #1571 (comment) says.

In this RFC, I propose a new way to reduce disk usage: read the native engine files directly to create the DocValues. This saves disk space because we can skip writing the vectors through flatFieldVectorsWriter or BinaryDocValues.

I read the Faiss code in faiss/impl/index_write.cpp. It shows that a Faiss HNSW32,Flat file is structured as follows:


|-typeIDMap   -|-id_header-|
|-typeHnsw    -|-hnsw_header-|-hnswGraph-|
|-typeStorage -|-storage_Header-|-storageVector-|
|-idmap_vector-|-FOOTER_MAGIC+CHECKSUM-|

I implemented a FaissEngineFlatVectorValues that reads the _0_2011_target_field.faissc file directly and wraps a DocIdSetIterator, instead of using FlatVectorsReader. In the POC code, this cuts disk usage by almost 50% because we skip writing the flat vectors. Write performance also improves slightly for the same reason.

in the next:

@navneet1v
Collaborator

> In the POC code, this cuts disk usage by almost 50% because we skip writing the flat vectors. Write performance also improves slightly for the same reason.

This is an interesting gain. When you say 50% gain in disk space, I am wondering whether it will happen only when source is not enabled for vectors. Cutting down the flat vectors and just reading them via the Faiss index has been discussed a couple of times. My only concern is: will reading flat vectors from the Faiss file be as efficient as reading them from the .vec file?

Also, did we explore this option where we don't store/serialize flat vectors in Faiss and use the .vec file instead. It can also help this feature: #1693

@0ctopus13prime , @jmazanec15

@jmazanec15
Member

This would be good savings! Like @navneet1v, I'm wondering if it'll be easier to leverage .vec in Faiss as opposed to simulating .vec with Faiss.

Also, for this plan, how will quantized vectors be handled, where we don't store the full-precision vectors in the Faiss files?

@luyuncheng
Collaborator Author

luyuncheng commented Nov 13, 2024

> Also, did we explore this option where we don't store/serialize flat vectors in Faiss and use the .vec file instead. It can also help this feature: #1693

After #1693, we discussed that the goal is to consolidate the vectors into a single storage. We considered the following 2 options that @jmazanec15 and @navneet1v mentioned:

  • Option 1: use the Lucene .vec file in Faiss
  • Option 2: use the Faiss .faiss file in Lucene

I think that with either option, ANN search in the native engine would have the same latency, because the vectors are all in memory. The only operations whose latency is affected are exact search and merge.

I chose option 2 because:

  1. I think we will introduce new engines in the future, and we cannot hack every native engine's storage format and I/O chain.
  2. With option 2, we only need to know the file format and read it directly, which decouples this from the native engines' code.

@luyuncheng
Collaborator Author

luyuncheng commented Nov 13, 2024

> Also, for this plan, how will quantized vectors be handled, where we don't store the full-precision vectors in the Faiss files?

@jmazanec15 As a first step, I skip using the Faiss file as doc values when the index is quantized, because we cannot get full-precision vectors for exact search. But I think we can still use it for merge and save the Faiss computation in sa_encode and sa_decode.

@luyuncheng
Collaborator Author

> When you say 50% gain in disk space, I am wondering whether it will happen only when source is not enabled for vectors. Cutting down the flat vectors and just reading them via the Faiss index has been discussed a couple of times. My only concern is: will reading flat vectors from the Faiss file be as efficient as reading them from the .vec file?

@navneet1v Good question. I will run some benchmarks for different types.

@luyuncheng
Collaborator Author

luyuncheng commented Nov 21, 2024

> When you say 50% gain in disk space, I am wondering whether it will happen only when source is not enabled for vectors. Cutting down the flat vectors and just reading them via the Faiss index has been discussed a couple of times. My only concern is: will reading flat vectors from the Faiss file be as efficient as reading them from the .vec file?

> @navneet1v Good question. I will run some benchmarks for different types.

I ran a small benchmark of file size and of iterating over all docs, with the following results:

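| Test case | Engine | File size | Latency to iterate all docs | Latency change |
|---|---|---|---|---|
| 100000 docs, 128 dims | Lucene (.vec + .vemf) | 50000 KB | 53 ms | |
| 100000 docs, 128 dims | Faiss (.faiss) | 50000 KB (excluding graph) | 33 ms | -37% |
| 100000 docs, 768 dims | Lucene (.vec + .vemf) | 300000 KB | 165 ms | |
| 100000 docs, 768 dims | Faiss (.faiss) | 300000 KB | 82 ms | -50% |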

@navneet1v @jmazanec15 In Lucene99HnswVectorsFormat, dense vectors are stored as flat vectors, so the file size equals that of Faiss flat. With either option (use the Lucene .vec file in Faiss, or use the Faiss .faiss file in Lucene), we can save 50% of the disk usage of the vector files.
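
As a rough sanity check of the file sizes in the benchmark table above, raw float32 vector storage is docs × dims × 4 bytes. The sketch below is only that arithmetic: the class and method names are hypothetical, and it ignores file headers, the HNSW graph, and the ID map.

```java
// Hypothetical helper: back-of-the-envelope size of raw float32 vector data.
final class FlatVectorSizeEstimate {
    static final int FLOAT_BYTES = Float.BYTES; // 4 bytes per float32 component

    // Bytes needed for numDocs vectors of the given dimension, with no headers or graph.
    static long rawVectorBytes(long numDocs, int dims) {
        return numDocs * dims * FLOAT_BYTES;
    }

    public static void main(String[] args) {
        // Same shapes as the two benchmark cases.
        System.out.println(rawVectorBytes(100_000, 128) / 1024 + " KB"); // prints "50000 KB"
        System.out.println(rawVectorBytes(100_000, 768) / 1024 + " KB"); // prints "300000 KB"
    }
}
```

Storing this payload once instead of twice (once in .vec and once in .faiss) is where the roughly 50% saving on vector files comes from.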

I also traced the iterator latency. I think the Faiss file format is simple enough that we can iterate over it faster than over the Lucene files. Also, because I keep the IDMap in memory, FaissEngineFlatKnnVectorsReader can read faster, using sequential I/O during iteration and fewer IOPS.

@navneet1v
Collaborator

> I think the Faiss file format is simple enough that we can iterate over it faster than over the Lucene files. Also, because I keep the IDMap in memory, FaissEngineFlatKnnVectorsReader can read faster, using sequential I/O during iteration and fewer IOPS.

Since the IDMap is always in memory, the latency is expected to be better. Can you tell me how many iterations were performed for these iterators?

Also, one benefit of the Lucene file was that we don't need to load everything into memory, so we can work in a memory-constrained environment. What happens in that case with Faiss?

@luyuncheng
Collaborator Author

> I think the Faiss file format is simple enough that we can iterate over it faster than over the Lucene files. Also, because I keep the IDMap in memory, FaissEngineFlatKnnVectorsReader can read faster, using sequential I/O during iteration and fewer IOPS.

> Since the IDMap is always in memory, the latency is expected to be better. Can you tell me how many iterations were performed for these iterators?

@navneet1v See the following code:

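    public int advance(int target) throws IOException {
        // Binary-search only the ids after the current position.
        ord = Arrays.binarySearch(ids, ord + 1, ids.length, target);
        if (ord < 0) {
            // Not found: decode the insertion point, i.e. the index of the next larger id.
            ord = -(ord + 1);
        }
        assert ord <= ids.length;
        if (ord == ids.length) {
            docId = NO_MORE_DOCS;
        } else {
            docId = (int) ids[ord];
        }
        return docId;
    }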
Every time we advance to find a doc, we do a binary search, so each call is O(log N). I also read the vectors in blocks, so when a merge occurs we only need about N/BUCKET_VECTORS I/O operations.
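
To make the advance logic easy to try without Lucene, here is a minimal self-contained sketch. The class SortedIdIterator and its ord() accessor are hypothetical stand-ins, not the POC classes. NO_MORE_DOCS is defined locally with the same value Lucene uses, and the guard for calls made after exhaustion is an addition of this sketch.

```java
import java.util.Arrays;

// Hypothetical stand-in for the POC iterator; it has no Lucene dependency.
final class SortedIdIterator {
    static final int NO_MORE_DOCS = Integer.MAX_VALUE; // same sentinel value as Lucene

    private final long[] ids; // sorted doc ids, as held in the in-memory ID map
    private int ord = -1;     // position of the current doc in ids
    private int docId = -1;

    SortedIdIterator(long[] sortedIds) {
        this.ids = sortedIds;
    }

    // Moves to the first doc id >= target, searching only past the current position.
    int advance(int target) {
        int from = ord + 1;
        if (from >= ids.length) { // already at or past the last id
            ord = ids.length;
            docId = NO_MORE_DOCS;
            return docId;
        }
        int pos = Arrays.binarySearch(ids, from, ids.length, target);
        ord = pos >= 0 ? pos : -(pos + 1); // a negative result encodes the insertion point
        docId = ord == ids.length ? NO_MORE_DOCS : (int) ids[ord];
        return docId;
    }

    // Ordinal of the current doc, which is also the index of its vector in the flat storage.
    int ord() {
        return ord;
    }

    public static void main(String[] args) {
        SortedIdIterator it = new SortedIdIterator(new long[] {2, 5, 9});
        System.out.println(it.advance(0));  // prints 2
        System.out.println(it.advance(3));  // prints 5
        System.out.println(it.advance(10)); // prints 2147483647 (NO_MORE_DOCS)
    }
}
```

Because each search starts at ord + 1, a forward-only scan such as a merge never revisits earlier ids.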

> Also, one benefit of the Lucene file was that we don't need to load everything into memory, so we can work in a memory-constrained environment. What happens in that case with Faiss?

For the vector field, as this code shows:

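    protected void readBucketVectors() throws IOException {
        assert ord >= 0;
        assert ord <= metaInfo.ntotal;
        int bucketIndex = ord / BUCKET_VECTORS;
        // Ordinal of the first vector in this bucket; long arithmetic avoids int overflow on large files.
        long firstOrd = (long) bucketIndex * BUCKET_VECTORS;
        // Skip the SIZET_SIZE-byte prefix before the vector data, then seek to the start of the bucket.
        slice.seek(metaInfo.vectorSeek + SIZET_SIZE + firstOrd * FLOAT_SIZE * metaInfo.d);
        // Count from the bucket start (not from ord) so that a partial last bucket is read completely.
        for (int i = 0; i < BUCKET_VECTORS && firstOrd + i < metaInfo.ntotal; i++) {
            slice.readFloats(value, i * metaInfo.d, metaInfo.d);
        }
    }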
I read the vectors in buckets, so, as the code shows, we do not load the whole vector file into memory.
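
The bucket read can be tried in isolation with the sketch below, where a ByteBuffer stands in for the file slice. Everything here is an assumption for illustration: the class BucketedVectorReader, the encode helper, and the simplified layout (an 8-byte element count at vectorSeek followed by little-endian float32 values) are hypothetical and not the POC code or a complete description of the Faiss format.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.Arrays;

// Hypothetical, simplified reader: keeps at most one bucket of vectors in memory.
final class BucketedVectorReader {
    static final int SIZET_SIZE = Long.BYTES;  // assumed 8-byte element count before the floats
    static final int FLOAT_SIZE = Float.BYTES;

    private final ByteBuffer data;   // stands in for the file slice
    private final long vectorSeek;   // assumed offset of the vector section
    private final int d, ntotal, bucketVectors;
    private final float[] bucket;
    private int loadedBucket = -1;

    BucketedVectorReader(ByteBuffer data, long vectorSeek, int d, int ntotal, int bucketVectors) {
        this.data = data.duplicate().order(ByteOrder.LITTLE_ENDIAN);
        this.vectorSeek = vectorSeek;
        this.d = d;
        this.ntotal = ntotal;
        this.bucketVectors = bucketVectors;
        this.bucket = new float[bucketVectors * d];
    }

    // Returns a copy of vector ord, loading its bucket only if it is not the cached one.
    float[] vector(int ord) {
        if (ord < 0 || ord >= ntotal) throw new IndexOutOfBoundsException("ord=" + ord);
        int bucketIndex = ord / bucketVectors;
        if (bucketIndex != loadedBucket) {
            int first = bucketIndex * bucketVectors;
            int count = Math.min(bucketVectors, ntotal - first); // last bucket may be partial
            long offset = vectorSeek + SIZET_SIZE + (long) first * FLOAT_SIZE * d;
            data.position(Math.toIntExact(offset));
            data.asFloatBuffer().get(bucket, 0, count * d);
            loadedBucket = bucketIndex;
        }
        int start = (ord % bucketVectors) * d;
        return Arrays.copyOfRange(bucket, start, start + d);
    }

    // Test helper: writes vectors in the simplified layout assumed above.
    static ByteBuffer encode(float[][] vectors, int vectorSeek) {
        int d = vectors[0].length;
        ByteBuffer buf = ByteBuffer.allocate(vectorSeek + SIZET_SIZE + vectors.length * d * FLOAT_SIZE)
                .order(ByteOrder.LITTLE_ENDIAN);
        buf.position(vectorSeek);
        buf.putLong((long) vectors.length * d);
        for (float[] v : vectors) for (float x : v) buf.putFloat(x);
        buf.rewind();
        return buf;
    }

    public static void main(String[] args) {
        float[][] vs = {{0f, 1f}, {2f, 3f}, {4f, 5f}, {6f, 7f}, {8f, 9f}};
        BucketedVectorReader r = new BucketedVectorReader(encode(vs, 16), 16, 2, 5, 2);
        System.out.println(Arrays.toString(r.vector(3))); // prints [6.0, 7.0]
    }
}
```

Memory use is bounded by one bucket of bucketVectors × d floats, and a sequential scan touches the underlying storage once per bucket rather than once per vector.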
