Random notes on database features, code, practice, optimization and techniques.
I've been working on databases for about last 8 years. The beginning was accidental. I was working with a team developing a "product" for big stock brokers in India. One of the major problems was the size and rate of the arriving data "packets" from stock exchange(s) . Every second upto 20 packets each containing upto 20 records of 20 tickers could arrive. This whole dataset was then supposed to go to database. Simple "insert"s into Oracle 7.3 database. In reality this was a big scalability issue - the product simply could not deal with that much volume. Even if it could insert efficiently, the reads will be slow and so on.
I 'designed' a shadow-table logic - something really simple that would restrict the maximum volume a table can have and thus ease the "reads". This worked. I will talk about the model in a later instalment.
There are two types of work - sometimes converging into one. One, that you do well, and thus the other people would love you to do. Two, one that you yourself love to do. I was lucky that what I did well in my first job was something I also loved doing. From then on, it's mostly been performance, database, and more of these two together!
It's been either the voodoo of putting an obscure hint like,
cardinality; changing the data model significantly; 'lighten' the question asked and thereby reduce work; rewrite the question (query!); adding faster processors; adding more memory (works much less often!); liposuction of really obese code; patching; using newer features of a particular product line; doing "simple" maintenance to reap benefits and what not! But the central issue was always performance. Except when the central issue was really
availability!
My forecast for next 5 years is - a performance analyst will occupy a very significant position in any software organization. Two reasons behind that - (a) absolute data volume is growing every day - every year you file just another round of taxes, right? and, (b) the number and complexity of the 'questions' we're asking have been growing enormously - "Are all modules SOX compliant?" was never asked 2 years back. These two factors, combined with almost a methodic practice of ignoring performance at design level will cause severe bottlenecks.
This whole series is mostly not about how to "solve" those. Of course, the examples will be from the solutions. But I rather want to focus on what we could have done to not have those problems at all! Hey, it's not always the database optimizer goofing up. Sometimes it's just because the next-guy 'forgot' to join one 10-million record table but put it in the WHERE clause nevertheless!