Skip to main content

Bloom Filters

A Bloom filter is a space-efficient, probabilistic data structure used to test whether an element is part of a set. Unlike traditional data structures, it can say with certainty if an element is not in the set, but it can only say with some probability if an element is in the set. This means Bloom filters can have false positives but never false negatives.

How Does a Bloom Filter Work?

A Bloom filter consists of a bit array and a set of hash functions.

  1. Initialization: Start with an empty bit array of m bits, all set to 0.
  2. Hash Functions: Choose k different hash functions.
  3. Adding an Element: To add an element, pass it through all k hash functions to get k array positions. Set the bits at these positions to 1.
  4. Checking Membership: To check if an element is in the set, pass it through all k hash functions to get k array positions. If all bits at these positions are 1, the element might be in the set. If any bit is 0, the element is definitely not in the set.

Implementation

class BloomFilter {
constructor(size, hashFunctions) {
this.size = size;
this.bitArray = new Array(size).fill(0);
this.hashFunctions = hashFunctions;
}

add(element) {
this.hashFunctions.forEach((hashFunc) => {
const index = hashFunc(element) % this.size;
this.bitArray[index] = 1;
});
}

contains(element) {
return this.hashFunctions.every((hashFunc) => {
const index = hashFunc(element) % this.size;
return this.bitArray[index] === 1;
});
}
}

// Example hash functions
const hash1 = (str) =>
[...str].reduce((acc, char) => acc + char.charCodeAt(0), 0);
const hash2 = (str) =>
[...str].reduce((acc, char) => acc * char.charCodeAt(0), 1);

// Create a Bloom filter with a bit array of size 20 and two hash functions
const bloomFilter = new BloomFilter(20, [hash1, hash2]);

// Add elements to the filter
bloomFilter.add("apple");
bloomFilter.add("banana");

// Check if elements are in the filter
console.log(bloomFilter.contains("apple")); // true
console.log(bloomFilter.contains("banana")); // true
console.log(bloomFilter.contains("cherry")); // false

What makes them special ?

1. Space Efficiency

Bloom filters are extremely space-efficient. They use a bit array and a few hash functions, making them much more compact, especially when dealing with large datasets. This is crucial when you need to store a large number of elements in memory-constrained environments.

2. Speed

Bloom filters are fast. Both insertion and membership tests are O(k) operations, where k is the number of hash functions.

3. Avoiding Expensive Operations

Bloom filters can quickly rule out elements that are definitely not in the set, allowing you to avoid expensive database queries or disk I/O operations. This makes them ideal for caching mechanisms or preliminary checks before performing more costly operations.

4. False Positives Tolerable

In many applications, false positives are acceptable while false negatives are not. For example, in a password breach detection system, it is better to mistakenly think a safe password is breached (false positive) than to miss identifying a breached password (false negative). Bloom filters guarantee no false negatives, which makes them useful in such scenarios.

Medium uses Bloom filters to recommend posts to users by filtering out posts that have been already seen by the user while Quora has implemented a shared Bloom filter in the feed backend to filter out stories that people have seen before.

Conclusion

Bloom filters are a powerful tool for efficiently checking membership in a set, trading off a small probability of false positives for significant space savings. With their practical applications in caching, database optimization, and security, Bloom filters are an essential concept in computer science and software development.