Peter Vandenabeele
Karnaugh maps and database performance
Previous weekend, I was drawing some Karnaugh maps. This was the first time, 22 years after the course of Prof. Emeritus Hugo De Man (4th year civil engineering K.U.Leuven), that I actually used this technique, learned in the lessons of ASIC design. I still appreciate the abstract beauty of the concept and the "completeness theorem" that I have distilled from it (don't just look for the obvious or easy cases ... consider all possible cases).
So what is the relation between Karnaugh maps, databases and social networks? Well, we figured out we will need some "categories" and I personally detest those systems where you are limited to choosing just 0 or 1 option out of list. I prefer to design systems where any random combination of options (0, 1 or more) is possible. That choice can be nicely represented by a bitmap (if a small and fixed number of options must be handled). Now, if you go searching in a list of such objects, you can also choose 0, 1 or more options that need to match with the objects in the list. So, the question was, which combinations of "object options bitmap" and "search options bitmap" should yield a match? Because the answer was not immediately clear to me (in hint side, the answer is obvious of course), I drew up e.g. this 8*8 Karnaud map with in the left column the "object options" and on the top row, the "search options". The answer simply is: "if no choice is made (no bit is set) on the object side or the search side, all case match; else, at least one bit must match".
For fun, I added the simple electronics schema that could easily do this match at 1E9 compares per second. If I was still at Mind, it would even have been conceivable to design a dedicated processor that would do this at e.g. 4E8 matches per second (extending a "slow" 400 MHz processor with a few dedicated cells of an FPGA to do one match per clockcycle).
Now into databases ... I see a few ways to do this kind of matching for categories in a database query:
- using a tags table and an "connection" table between the objects and the tags table (the cleanly normalized way).
- adding boolean columns for all the separate options
- having a real bitmap field with boolean operators in the database :-)
Turns out mySQL actually has such a bitmap datatype: the SET datatype. I presume this will be much faster than the other options in a select operation on a large number of objects. I expect the first option to be slower, because expensive joins are required (from this presentation on the Flickr architecture: "JOINS's are slow: normalised data is for sissies"). I presume the second option to be slower because complex ... where (option1 = "t") or (option3 = "t") ... statements will be required. With the SET datatype, real logic AND (&) operations can be executed on the object versus search criteria bitmaps.
We are now running performance tests on mySQL to figure out what is the real behaviour. Of course we will use indexing on the relevant columns and joins and maybe this will make the database use bitmaps anyway internally, so turning this into a non-issue?
One disadvantage of the SET datatype is that it is database specific (mySQL only, as far as I found) and is probably not natively supported in the Rails ORM.
If a reader has any hints on optimizing performance for queries in large sets of data, with small sets of pre-defined tags attached to them, I am all ears :-)
| Attachment | Size |
|---|---|
| Karnaugh.png | 40.25 KB |
| electronics.png | 19.11 KB |