I began my career as a software developer. I wrote code. I was only responsible for the code that I wrote, not for the complete system and not for the work of others. As I advanced in my career, I became a team leader and then a software development manager. I became responsible for the work of others. That is when I first started to notice the difference in output from different developers. Some developers produced code 10 times or even 100 times as fast as others. I do not mean that they produced 10 times as much code in the same time period. I mean they produced 10 times as much functionality, sometimes with one-tenth as many lines of code. Some people’s code was consistently the root cause of software crashes, and other people’s code rarely was. And strangely, the people who produced the most functionality the fastest also tended to be the ones seldom responsible for system crashes. And when they were the ones responsible, the underlying cause was often found and fixed quickly.
As I advanced further in my career, not only was I responsible for the output of a reasonably sized team of software developers, but I was also the individual representing that team in larger company meetings. That was when I noticed something operating at the team level similar to what I had noticed at the individual level. Some teams’ systems easily adapted to new business requirements and some did not, and this did not seem to depend on the type of requirement. It seemed to be related to the way the teams operated and to the architecture of the system(s).
Eventually, I moved out of software development into architecture. My responsibility shifted from delivering a single component of a system to making the entire system work better, faster, more efficiently, and more effectively. I evolved my thinking around why some systems just seemed of higher quality and were more adaptable to change than others. I also noticed for the first time how much duplicate work was being done. So much money was being wasted producing the same component multiple times and fixing the bugs multiple times. I was fairly educated about object-oriented methodologies and the benefits of reuse, so facilitating reuse across groups made a ton of sense. Why would a team want to build something new, costing valuable time and money, if another team had already solved that problem?
This is when I began thinking about organizational dynamics, human behavior, and the often irrational belief under which many people operated: as long as they had ownership and control, everything would be fine – even if another group was clearly more proficient than they were.
This was when I learned of something called the optimism bias, which can be summarized as follows.
Take any arbitrary group of people and ask them an arbitrary set of questions where each person must rate themselves: below average, average, or above average. Invariably, no matter the group and no matter the questions, most of the group will rate themselves above average most of the time. Of course, this cannot be true for everyone; everyone cannot be above average.
I began to ask myself: “How can we increase reuse if everybody thinks they are doing it better than the next person?” The conclusion I reached is that there needed to be some unbiased assessment to prove a reusable component was of high quality. For people to feel comfortable reusing the work of somebody else, it would help to have some sort of good programming “seal of approval.” There also had to be a mentality that gave groups incentives to help one another, as well as an organizational environment where each group would realize they could not succeed by building everything alone.
Around this same time, I first read about the concept of technical debt, as espoused by Ward Cunningham in 1992. My initial reaction was that it was absolutely brilliant. But the more I read on the subject, the more I realized that – as currently understood – technical debt was about poor coding practices or systems that were poorly maintained. It was not as much about bad architecture or design, nor was there a way to put a real dollar amount to the technical debt associated with any system in a larger platform architecture.
My experience with software development and systems architecture told me that the hardest problems to fix were often the initial architecture and design issues, not the outputs of a poor programmer. Sometimes these decisions were set in stone in the first week of a year-long project. There had to be some way of objectively guiding these early decisions. It was imperative to quantify bad architecture and design decisions with dollar amounts, since they require real investments of money to fix.
I realized that if people were clear on terminology, definitions, and problem statements, and followed a few good rules, many bad architecture and design decisions could be avoided.
But life isn’t perfect. The realities of software delivery (including the need to meet short-term delivery dates, and a dependence on others) made some non-optimal decisions inevitable – even when everybody agreed in principle that with a bit more time and money, a different decision would have been made. Life is compromise, and so are architecture and software development.
We may not be able to avoid compromise, but at the very least, we can meticulously track our compromises. The concept of technical debt seemed like an efficient way to track not only what compromises had been made, but also how big each compromise was. The concept that debt incurs interest the longer it remains unpaid also perfectly fits this analogy. The more compromises made in the design and implementation of any software-based system, the more “brittle” (or inflexible) that system becomes, and the more expensive it is to maintain and operate. The additional maintenance and operational cost required for a brittle system was like paying interest on the original debt. The longer the principal remained unpaid (i.e., the technical debt kept accruing), the more money was spent on interest over the lifetime of the system.
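To make the analogy concrete, here is a minimal sketch, in Python, of a single compromise tracked as principal plus interest. The function name and the dollar figures are purely hypothetical illustrations, not part of the methodology described in this book; the point is simply that the longer a compromise stays in the system, the more is spent in total.

```python
# A minimal sketch of the debt/interest analogy, using hypothetical numbers.
# "principal" is the one-time cost to fix a compromise today; "annual_interest"
# is the extra maintenance and operational cost the compromise adds each year
# it remains in place. Both figures are illustrative, not drawn from the book.

def total_cost_of_compromise(principal: float,
                             annual_interest: float,
                             years_unpaid: int) -> float:
    """Total spend if the compromise is only fixed after `years_unpaid` years."""
    return principal + annual_interest * years_unpaid

# Example: a shortcut that would cost $50,000 to fix now, but adds $20,000 a
# year in extra maintenance while it remains in the system.
fix_now = total_cost_of_compromise(50_000, 20_000, 0)    # $50,000
fix_in_3 = total_cost_of_compromise(50_000, 20_000, 3)   # $110,000

print(f"Fix immediately:   ${fix_now:,.0f}")
print(f"Fix after 3 years: ${fix_in_3:,.0f}")
```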
The first time I explained my methodology (of defining a set of agreed-upon rules and measuring architecture and technical debt) to a technology audience, it went extremely well. There were lots of questions about who got to make which decision and how to ensure the process didn’t “get in the way” of getting the job done. But, in general, people understood.
The first time I explained my methodology to an executive business audience, it didn’t go so well. They understood that some systems and groups could deliver new features quickly and others could not. They liked the idea of measuring technical debt in dollars, but the first question I was asked was, “Is that real money?” I assured them that it was, but I didn’t have a very compelling way of explaining why or how. Most of the audience members had had the experience of technology groups demanding more time or more money to “do it right,” only to deliver disappointing results.
For the tool to be useful, tech people had to use it to explain to their less tech-savvy counterparts (who often controlled the budget) why one technology decision might be better than another for a valid business reason. That reason had to tie to a measurable impact on company reputation, revenue, cost, or the customer. Merely being a “great technology idea – trust me, I’m an expert” was not a winning argument.
I started to do some research. I sought real-world examples of businesses being significantly impacted by some technology problems that everybody was aware of, but nobody could explain well enough to prioritize and fix. At the time, I was working for an organization in the legal industry, so I used the term body of evidence to describe these evidentiary stories behind the “golden rules.”
The next time I explained the methodology to an executive business audience, it went much better. I was able to use one of the key systems that everybody knew was important as an example. I pointed out a set of high-priority corrective actions that were needed, how much they would cost, and why delaying them would potentially lead to revenue loss, increased cost, or unhappy customers. People understood that.
The Principle Based Enterprise Architecture is the methodology for enterprise architecture and architecture governance that I developed over the course of my thirty-five-year career. I did not do this alone. I stood on the shoulders of others who came before me, and had the assistance of many smart, dedicated people with whom I have worked. I hope this book will provide you with some useful tools and guidelines in your effort to be part of a great technology organization.