Don Haderle, IBM Fellow and self-proclaimed "Mother of DB2", made an excellent observation about the evolution of database systems. He said that a database architecture begins as an index over a data file (which is fast and efficient), and then accumulates function and complexity to meet growing customer demands. IBM's IMS and DB2 both followed that trajectory, as did most other relational database products. At the point where the ever-increasing complexity slows down simple transactions, such as a single-record insert or update, a breakaway faction of the market declares a new approach (a clean slate, if you will) that is intended to revolutionize the simple CRUD (Create, Read, Update, Delete) operations. The original design, an index over a file, is reborn. Key-value database systems are a perfect example.
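To make the "index over a data file" idea concrete, here is a minimal Python sketch of a key-value store that supports the four CRUD operations. It is only an illustration, not how IMS, DB2, or any particular key-value product is built. The class name `IndexedFile`, its method names, and the append-only file layout are all assumptions of mine.

```python
import os
import tempfile


class IndexedFile:
    """Hypothetical toy store: an in-memory index over an append-only data file."""

    def __init__(self, path):
        self.index = {}                  # key -> (offset, length) of the latest value
        self.file = open(path, "a+b")    # append mode: every write lands at end of file

    def create(self, key, value):
        if key in self.index:
            raise KeyError(f"{key!r} already exists")
        self._append(key, value)

    def read(self, key):
        offset, length = self.index[key]  # one index lookup ...
        self.file.seek(offset)
        return self.file.read(length)     # ... then one read from the data file

    def update(self, key, value):
        if key not in self.index:
            raise KeyError(key)
        self._append(key, value)          # old bytes stay in the file as garbage

    def delete(self, key):
        del self.index[key]               # only the index entry is removed

    def _append(self, key, value):
        self.file.seek(0, os.SEEK_END)
        offset = self.file.tell()
        self.file.write(value)
        self.file.flush()
        self.index[key] = (offset, len(value))

    def close(self):
        self.file.close()


if __name__ == "__main__":
    store = IndexedFile(os.path.join(tempfile.mkdtemp(), "data.bin"))
    store.create("user:1", b"alice")
    store.update("user:1", b"alice smith")
    print(store.read("user:1"))  # prints b'alice smith'
    store.delete("user:1")
    store.close()
```

Everything a full database adds on top of this (recovery, locking, space reclamation, query processing) is exactly the function and complexity the observation is about. This sketch deliberately has none of it.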
The NoSQL movement coupled this simplified architecture (an index over a file) with the power of parallelism: horizontally distributed database nodes that could loosely support a single large database, provided that no insert/update/delete transaction affected more than a single record (and thus did not trigger a distributed, multi-node transaction commit). However, just as the first database systems were hit with increasing customer demands for more function, so were the NoSQL products. Customers like the fast performance and the horizontal scaling capability, but then they want more function. "Where are the joins?" is a question that rings in the NoSQL development hallways, much to the annoyance of the designers who thought they could ride the simplicity wave a lot longer.
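The single-record restriction is easier to see in code. The sketch below assumes simple hash partitioning, which is only one of several ways a NoSQL product might place records. The `Cluster` class and its methods are hypothetical names, and each "node" is just a Python dictionary standing in for a separate server.

```python
import hashlib


class Cluster:
    """Hypothetical hash-partitioned store; each dict stands in for one node."""

    def __init__(self, node_count):
        self.nodes = [{} for _ in range(node_count)]

    def node_for(self, key):
        # A stable hash of the key picks exactly one owning node.
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big") % len(self.nodes)

    def put(self, key, value):
        self.nodes[self.node_for(key)][key] = value   # touches one node only

    def get(self, key):
        return self.nodes[self.node_for(key)][key]

    def delete(self, key):
        del self.nodes[self.node_for(key)][key]

    def nodes_touched(self, keys):
        # The set of nodes a transaction over these keys would involve.
        return {self.node_for(k) for k in keys}
```

A transaction on one key always yields a single node from `nodes_touched`, so it can commit locally. A transaction over several keys may yield more than one node, and that is the case that would need a distributed commit.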
Looking at the two kinds of systems -- the very large, complex, and slow relational systems and the simple, fast NoSQL systems -- I started to wonder: why can't we have both? My own history is that I created an in-memory storage manager prototype in the IBM Research Starburst system, the codebase that became the DB2 Version 2 database product. That storage manager was not included in the product (which employed only the disk-based storage manager), but I did learn a lot about what goes into a storage manager used by a relational database system.
That really got me thinking. Why can't we have both?
Let me know if you have a comment or question.