🌺

Bloom filter

Bloom filters are cool!

Bloom filters have nothing to do with flowers (despite the blog emoji 🥸). It was conceived by Burton howard Bloom. This is a Space-efficient probablistic data structure. In Bloom filter false positives are very much possible, but false-negatives are not possible (i.e.) a query either returns, “possibly in set” or “definitely not”. So, when a bloom filter returns “no” you can believe it, if it returns “maybe” you cannot. And it is very very efficient.

For example in a DB, if we want to check whether a data is present, instead of making a full db check we could use bloom filter as a preliminary test to verify whether some data could be in the database

A simple example of a bloom filter would be:

package main

import (
	"bytes"
	"crypto/sha1"
	"encoding/binary"
	"fmt"
)

type Bloom struct {
	bits [1000]bool
}

func (b *Bloom) Set(s string) (int, error) {

	i, err := b.getLocation(s) // Get location in filter to be inserted
	if err != nil {
		return 0, err
	}

	b.bits[i] = true // set true to the location
	return i, nil
}

func (b *Bloom) isPresent(s string) bool {

	i, err := b.getLocation(s) // Get location in filter to be verified
	if err != nil {
		return false
	}

	return b.bits[i]
}

func (b *Bloom) getLocation(value string) (int, error) {

	// hash the string and convert to integer (to be used in binary array)
	h := sha1.New()
	h.Write([]byte(value))
	bits := h.Sum(nil)
	buf := bytes.NewBuffer(bits)
	result, err := binary.ReadUvarint(buf)
	if err != nil {
		return 0, err
	}

	// Ensure all numbers are > 0
	if result < 0 {
		result = -result
	}
	return int(result) % len(b.bits), nil //make sure int is within the array length by doing a modulo
}

func main() {

	var b Bloom
	s := "ThisIsString"
	set, err := b.Set(s)
	if err != nil {
		return
	}
	fmt.Println(s, " set in location", set)
	fmt.Println(s, "in filter : ", b.isPresent(s))

}

This is used in chromium to check for malicious sites.

Refs: