Twelve years ago, when I wrote the first articles for “Cracking the Code: Breaking Down the Software Development Roles,” I made a conscious and perhaps controversial decision not to include the database administrator or the database architect among the roles. I made that decision because few organizations dealt with data at a scale that required a dedicated data role in the software development process. The solution architect could design the organization’s data structures as part of the overall role. However, the world of data has gotten bigger since then.
Today, we’re facing more volume, greater velocity, and a more dynamic variety in the data sources that we’re processing. We’re not talking about the typical relational databases that have been popular for decades. This expansion of data requires techniques and skills that are unlike the approaches to data we have historically used.
Multithreaded data processing improves on the single-threaded approaches that popularized data processing in the 1980s; however, even multithreaded approaches, which rely on a single computer with multiple threads of execution, break down when the amount of processing necessary to extract meaning exceeds the capacity of a single machine.