Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reusing Hashes for faster lookups #60

Open
pubkey opened this issue Jun 14, 2023 · 4 comments
Open

Reusing Hashes for faster lookups #60

pubkey opened this issue Jun 14, 2023 · 4 comments

Comments

@pubkey
Copy link

pubkey commented Jun 14, 2023

Is your feature request related to a problem? Please describe.

My Problem: I replicate bloom filters from many clients (up to 1000) to my server.
The server then has to lookup a given key "foobar" and check which of the replicated filters contains the key.
My bloom filters are pretty big and need about 13 hashes.
To check which replicated filter has the key, the server would then have to do 13*1000 hash operations.

Describe the solution you'd like

A way to precalculate the hash once and then lookup the existence in multiple bloom filters without having to hash again.

import { hash } from 'bloom-filters';
const lookupHash = hash(13, 'foobar');

const foundInFilters = bloomFilters.filter(bf => bf.hasHash(lookupHash));

Acceptation criterias

For testing we could run a test that ensures that lookups for pre-calculated hashes are exactly equal to normal key lookups.

Additional context

Also a question: Is it possible to use a different hash function? My plan is to run the hashing inside of WebAssembly for better performance, so maybe even an async hash function would be useful.

@pubkey pubkey changed the title Reusing Hashes for lookups Reusing Hashes for faster lookups Jun 14, 2023
@folkvir
Copy link
Collaborator

folkvir commented Jun 19, 2023

Hello ✋ Thank for using our library! First, we won't add any new function not related to the associated papers of the filters.
But it does not mean you can't do it 👍
You can achieve what your looking for by following the rules for checking existence of a key in the filter, aka the indexes

  public filter_has_in(element: HashableInput, bfs: BloomFilter[]): BloomFilter[] {
    // compute once the indexes, be aware that all your filters must have same _size, _nbhashes and seed
    // You may have undesirable side-effects if it is not the case
    const indexes = bfs[0]._hashing.getIndexes(element, bfs[0]._size, bfs[0]._nbHashes, bfs[0].seed);
    const filtered = [];
    for (let i = 0; i < bfs.length; i++) {
      let ok = true;
      // similar to bfs[i].has() we need to know if all indexes are in the internal structure
      for (let j = 0; j < indexes.length; j++) {
        if (!bfs[i]._filter.has(indexes[j])) {
          ok = false;
          break; // no need to continue, one false index means it is definitely not there, save us many loops
        }
      }
      if (ok) {
        filtered.push(bfs[i]);
      }
    }
    // return the filters which may have the key.
    return filtered;
  }

I did not test it locally but it should work. Give it a try and let us know if it fits your needs. (CC @Callidon )

@pubkey
Copy link
Author

pubkey commented Jun 20, 2023

Hi @folkvir
Thank you for your detailed answer. Uppon further research I found out that bloom filters with known seeds are vulnerable towards pollution attacks link

So I will now switch to XOR-filters so that I can rebuild the filter with a different seed each time and the server only has to do 3 hashes per lookup.

Also I found out there is this new type of XOR-filter called "binary fuse filter" which has a smaller space size and is faster to be build. link. Is this something that could be added to this library?

@folkvir
Copy link
Collaborator

folkvir commented Jun 20, 2023

Hi! Yes the binary fuse filter is definitely something we can add in the library. (CC @Callidon )
For more info: https://github.com/hexops/fastfilter

Yes this is dangerous to keep the seed in your filters if those are public so you need to hide it someway.
The XOR filter is very useful if you don't need to add/remove items but if you live with a few items only (< 1m entries). Then yes you can rebuuld it completely from scratch everytime.

@pubkey
Copy link
Author

pubkey commented Jun 30, 2023

Does anyone know if XOR-Filters are also prune to polution attacks, like bloom filters are?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants