The creation of a database system comprising a million lines of code begins with a single line:

int main(int argc, char *argv[])

Yeah, real systems are written in C. What can I say? Sure, implementation in Java would go much faster, and coding/prototyping/testing/debugging would be so much more pleasant, but when it comes down to complete control of memory and code, for me, it's got to be C.

From there, we can go anywhere. Where do we start? How much design up front? How much design iteration and evolution? There are many opinions on these issues. The lessons learned from applying classical engineering project management (i.e., "waterfall") to software projects tell us that "Big Design Up Front" (BDUF) is usually a huge mistake. Although the benefits of discovering design bugs before implementation - and especially before deployment - are obviously attractive, the truth is that the product requirements that drove the initial design will not match the actual version 1 requirements by the time of the first release. Thus, much of the design work ends up being redone repeatedly as the requirements change.

The other end of the spectrum, where coding starts with minimal design and the system evolves organically (usually using agile methods), has a different set of pitfalls. Systems that evolve this way often end up missing critical sub-components because they were not designed in up front. Then the engineering team learns (and re-learns) the pain of adding integral components to a fully operational system.

Many of the NoSQL products - the newcomers in the database market - appear to have learned that painful lesson. I'm not naming products, but several NoSQL products have minimal or missing components that are normally associated with complete database systems: authorization, concurrency control, transaction management (with full recovery capability), and disaster recovery come to mind. These are things that are not easily added to an existing system or, even worse, to a released product. I've experienced the pain of adding full transaction management and recovery after the fact. I was lucky, as my system was still fairly young and not many ad hoc changes had been implemented; specifically, I mean changes that would have made the transaction and recovery implementation more difficult because of special handling of data types or unusual storage semantics.

So, what lessons have we learned? Neither extreme (waterfall or agile) in its pure form is the right approach for something as complex as a database system. There has to be a mix. A minimum set of components and capabilities has to be included in the initial design, and then regular cycles of review and refactoring are needed as the system evolves.

My intention is to use this blog to track the evolution of the new Clean Slate Systems DB, a database system that doesn't fit any of the existing database categories.

Stay tuned.

tjl