New Clojure Libraries: Bloom Filter and LFSR
I created two new clojure libraries as part of my continued study of all things computing related. I was introduced to the Bloom Filter through Hacker News and to the Linear Feedback Shift Register [LFSR] through Toby DiPasquale. It turns out that each of these will have practical application for me in the near future. They are the kind of thing I wish I had learned about sooner.
Bloom Filter
You can download clj-bloom from Gitub.com/kyleburton/clj-bloom The filter is “a probabilistic data structure for testing set membership”, they “sacrifice determinism in favor of significantly lower memory usage”. They make this trade off at the cost of false positives – but you can tune the filter’s false positive probability. They are useful in situations when you need to test for membership in a very large set of data and can’t (or don’t want to) hold the set members themselves in memory. A common use case is with a large corpus of documents – being able to quickly check if a new document is not in the set, storing it if it’s not, or pulling the existing copy and comparing it if you do. In this type of use, a false positive has little impact as you’d pull the document and compare it anyway. You can save the cost of making the initial query by using the bloom filter.
Linear Feedback Shift Register [LFSR]
LFSRs are related to Pseudo Random Number Generators [PRNGs]. There is a very interesting special class of LFSRs that have maximal period length – these LFSRs can be used as binary numbering systems since they iterate through all (2^n) – 1 possible bit combinations. Unlike counting up from 1 in binary (0, 1, 10, 11, 100, 101, 110, 111, 1000, …) they are much less deterministic looking in how they iterate through the values. I am currently looking at using them in ID generation, so that the IDs generated are not adjacent numeric values, eg: look pseudo-random, but are still guaranteed to be unique. LFSRs are also easily serializable, which makes it possible for you to use them as a kind of pseudo random unique sequence. You can download the code from Gitub.com/kyleburton/clj-lfsr
I found these two constructs fascinating, I hope you find the implementations useful.