Bloom filter
Bloom filters are cool!
November 12,2022
Bloom filters have nothing to do with flowers (despite the blog emoji 🥸). It was conceived by Burton howard Bloom. This is a Space-efficient probablistic data structure. In Bloom filter false positives are very much possible, but false-negatives are not possible (i.e.) a query either returns, “possibly in set” or “definitely not”. So, when a bloom filter returns “no” you can believe it, if it returns “maybe” you cannot. And it is very very efficient.
For example in a DB, if we want to check whether a data is present, instead of making a full db check we could use bloom filter as a preliminary test to verify whether some data could be in the database
A simple example of a bloom filter would be:
package main
import (
"bytes"
"crypto/sha1"
"encoding/binary"
"fmt"
)
type Bloom struct {
bits [1000]bool
}
func (b *Bloom) Set(s string) (int, error) {
i, err := b.getLocation(s) // Get location in filter to be inserted
if err != nil {
return 0, err
}
b.bits[i] = true // set true to the location
return i, nil
}
func (b *Bloom) isPresent(s string) bool {
i, err := b.getLocation(s) // Get location in filter to be verified
if err != nil {
return false
}
return b.bits[i]
}
func (b *Bloom) getLocation(value string) (int, error) {
// hash the string and convert to integer (to be used in binary array)
h := sha1.New()
h.Write([]byte(value))
bits := h.Sum(nil)
buf := bytes.NewBuffer(bits)
result, err := binary.ReadUvarint(buf)
if err != nil {
return 0, err
}
// Ensure all numbers are > 0
if result < 0 {
result = -result
}
return int(result) % len(b.bits), nil //make sure int is within the array length by doing a modulo
}
func main() {
var b Bloom
s := "ThisIsString"
set, err := b.Set(s)
if err != nil {
return
}
fmt.Println(s, " set in location", set)
fmt.Println(s, "in filter : ", b.isPresent(s))
}
This is used in chromium to check for malicious sites.