Bloom Filter

Wikipedia describes a Bloom filter as a space-efficient probabilistic data structure that is used to test whether an element is a member of a set. It uses multiple hash functions to determine membership in a set.

When an element is added to a filter, a subsequent membership test will definitely return True. However as more elements that are added to the set, the probability of false positives will increase. Elements are always added to the set, never removed. No Delete operation

Implementations

A nice interactive tutorial on the bloom filter Java implementation Another Java implementation but with less functionality Ruby implementation

Usage

Performance when accessing data. Good to check if data exists before accesing physical disk, so there is less disk I/O
Safe browsing chromium

How to implement one in Ruby

n = Number of items in the filter p = Probability of false positives, float between 0 and 1 or a number indicating 1-in-p m = Number of bits in the filter k = Number of hash functions

We need a hash function or multiple hash functions. We can use cryptographic hash md5 or sha1 from Ruby. I did not use Ruby's built-in hash because it already seeds a value and generates different hashes for each process, not that it would matter, since this experimental, but just in-case. We need a bitarray to set the values generated by the hash. The bit indices are calculated by generating the hash for ( string + i) count times. The hash is probably not a random distribution but still manages to set k bits.

The fp_rate is $$(1 - (1 - \frac{1}{f})^{n})^{k}$$. I simply calculate and don't use it much.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Readme.md

Readme.md

Bloom Filter

Implementations

Usage

How to implement one in Ruby

Files

Readme.md

Latest commit

History

Readme.md

File metadata and controls

Bloom Filter

Implementations

Usage

How to implement one in Ruby