On Architecture and Modules

Another long Quora Answer

Why is it important to agree on software architecture principles?

In a sense, some of this is an update on my thinking on “modularity” (eg. ThoughtStorms:DecompositionByLanguageIsProbablyAModularityMistake).

Well, possibly it’s only important in an “organizational” sense: people in your team or project need to be aligned in their conception of the architectural principles, or they’ll end up working against each other or fighting.

OTOH, are absolute / global architectural principles important?

I’m inclined to think it’s the same as in “real” architecture: the architecture of buildings and cities.

In architecture I’m a big fan of Christopher Alexander (inventor of “Pattern Languages”). The point about Alexander’s “Timeless Way of Building” is that it IS universal, but part of that universalism is an intense awareness of local conditions. It champions the traditional and the “vernacular”. It’s small-c “conservative”, in the Burkean sense, believing that “what people have been doing a lot of around here for the last few hundred years is probably a good idea”.

OTOH, taking a bunch of ideas that have become successful in France and dumbly re-applying them in Brazil in the name of some bogus “universality” can be, at best, ridiculous and, at worst, disastrous.

So there are many heuristics and patterns that are useful, but they have to be seen in context. How much abstraction and indirection you need can depend on the kind of project, the language you are using, the type of user, the type of interaction, the platform the software is hosted on, etc.

I increasingly believe that writing good software is like being a good butcher.

Butchery is about knowing where the natural joints of the animal are and carving along them. This lets you produce the maximum number of useful pieces of meat with the minimum number of cuts and the least effort.

The same is true of software. It’s about getting a feel for exactly where the natural module boundaries of your application are: which things need to be decoupled, and how much decoupling / indirection is needed at each boundary.

Getting that wrong is what leads to expensive problems. And being “wrong” can mean too close a coupling between things that should be more loosely coupled, with extra layers of indirection between them. But it can ALSO mean too much unnecessary indirection / abstraction between things that naturally should be more cohesive.

We tend to teach the importance of putting in abstractions / layers of indirection, but ignore the cost of doing it when we don’t need it.

People sometimes cite Richard Gabriel’s Worse Is Better as a rather vague principle. But read it carefully and a LOT of it is about exactly this problem: how much a function should expose its caller to its own failures.

The intuition we are all taught to cultivate, “Do the Right Thing”, is that the module should protect the caller as much as possible. “Worse is better” argues that, in the case Gabriel discusses (an operating-system call interrupted partway through), the module incurs too high a cost in complexity by trying to protect the caller from that failure. And that it’s both “worse” and yet, in fact, “better” to let the caller deal with the failure.

The important point here is that the general principle is not “never protect the caller from your failure”. Nor is it “always protect the caller from your failure”. The important lesson is that “better” means recognising the right answer for this particular situation.
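To make the trade-off concrete, here is a toy Python sketch. It is not Gabriel’s own example or any real OS interface: `Interrupted`, `do_write` and the `interrupt_on` set are hypothetical stand-ins for an operation that can fail partway through. The first wrapper hides the failure from its caller by retrying; the second reports it and leaves the retry to the caller.

```python
# Hypothetical toy model of the "Do the Right Thing" vs "Worse is better"
# trade-off. None of these names correspond to a real OS API.


class Interrupted(Exception):
    """Signals that the toy operation was interrupted before completing."""


def do_write(buffer: list, data: str, attempt: int, interrupt_on: set) -> None:
    """Append `data` to `buffer` unless this attempt number is 'interrupted'."""
    if attempt in interrupt_on:
        raise Interrupted(f"attempt {attempt} interrupted")
    buffer.append(data)


def write_right_thing(buffer: list, data: str, interrupt_on: set,
                      max_attempts: int = 5) -> None:
    """'Do the Right Thing': hide the failure from the caller.

    The caller never sees an interruption, but this module now owns a
    retry policy (how many attempts? what if it never succeeds?).
    """
    for attempt in range(max_attempts):
        try:
            do_write(buffer, data, attempt, interrupt_on)
            return
        except Interrupted:
            continue  # swallow the failure and try again
    raise RuntimeError("gave up after repeated interruptions")


def write_worse_is_better(buffer: list, data: str, attempt: int,
                          interrupt_on: set) -> bool:
    """'Worse is better': report the failure and stay simple.

    Returns False on interruption, loosely like a Unix call failing with
    EINTR; the caller decides whether and when to retry.
    """
    try:
        do_write(buffer, data, attempt, interrupt_on)
        return True
    except Interrupted:
        return False


if __name__ == "__main__":
    buf = []
    write_right_thing(buf, "hello", interrupt_on={0, 1})
    print(buf)  # ['hello']

    buf2, attempt = [], 0
    while not write_worse_is_better(buf2, "world", attempt, interrupt_on={0}):
        attempt += 1  # the retry loop lives in the caller
    print(buf2, attempt)  # ['world'] 1
```

Neither version is right in general. The first is nicer to call but has quietly taken on a retry policy; the second is simpler but pushes that policy onto every caller.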

If “agreed architectural principles” is taken to mean the dumb application of heuristics, such as “always use an extra abstraction layer”, then they are worse than useless. They’re positively dangerous.

If the “agreed architectural principle” is “look at what people have been doing here for a decade, understand why, and follow that” then we’re on to something.

There are patterns that are nearly essential in Java but pointless in Python.

Java and C++ are extremely similar languages in many ways. BUT if you write Java in C++, or C++ in Java, you are doing things very wrong.

They are not substitutes.

Java IS suitable for writing huge systems in a way that C++ just isn’t. If you try to write the kind of mega-application that Java is used for in C++, it’s going to be horrible: juggling that much memory allocation by hand is intractable.

Use C / C++ for small, independent low-level programs, and glue them together in something else (eg. write small independent tools orchestrated by the operating system, or libraries called from a Python script).

OTOH, trying to write in Java the small programs that C / C++ are good for is overkill. You end up adding too many abstract boundaries, plus the cost of garbage collection, to something that should be smaller, simpler and faster.

One thing which is particularly egregious about Java (and the C++ heritage it comes from) is that these languages have an impoverished vocabulary for talking about “boundaries” between modules. They see classes and objects as a universal solution for everything.

In fact almost all programming languages have fairly poor vocabularies.

When we think of software in terms of architecture, or in terms of “natural joints”, we start to see many kinds of joints / membranes between the parts of our systems, with many degrees of permeability. Many features of languages are about this : the inlineability of functions, hygiene in macros, scope rules, “referential transparency”, data-hiding, lazy vs. eager evaluation of function arguments, the access rules for classes, synchronous messages to objects vs. asynchronous messages to actors, goroutines, sockets, the Unix pipe, internet protocols, microservices, integration “at the glass”, integration in the database, centralization / decentralization of databases.

All these are issues to do with “what kind of boundary is there between THIS and THAT?” How permeable is the boundary? How much do dependencies leak through? What obligations does it incur? What timing commitments does it need? How is the communicated data represented? How are errors checked and controlled?
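Even a single item from that list, lazy vs. eager evaluation of function arguments, changes what crosses a boundary and who pays for it. A minimal Python sketch: because Python always evaluates arguments eagerly, laziness is simulated here by passing a zero-argument callable (a thunk), and `expensive_summary`, `log_eager` and `log_lazy` are hypothetical names.

```python
from typing import Callable

# Records each time the (hypothetical) expensive value is actually built.
evaluations: list = []


def expensive_summary() -> str:
    """Stand-in for costly work, e.g. formatting a large data structure."""
    evaluations.append("computed")
    return "summary of a large data structure"


def log_eager(enabled: bool, message: str) -> None:
    # Eager boundary: the caller has already paid for `message`
    # before this function can decide whether it needs it.
    if enabled:
        print(message)


def log_lazy(enabled: bool, message: Callable[[], str]) -> None:
    # Lazy boundary: the caller hands over a thunk, and the callee
    # decides whether the cost is ever paid.
    if enabled:
        print(message())


if __name__ == "__main__":
    log_eager(False, expensive_summary())  # summary is built, then discarded
    log_lazy(False, expensive_summary)     # summary is never built
    print(evaluations)  # ['computed']
```

The two functions do the same job; what differs is how much of the caller’s computation leaks across the boundary before the callee gets a say.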

We recognise this huge variety. But our languages often don’t. Most languages try to hide the variety behind a single principle: everything is a function. Or everything is message-passing between objects defined by classes. Or everything is an actor exchanging asynchronous messages.

While there is something very attractive about this simplicity and uniformity, sooner or later you find yourself somewhere where the kinds of boundaries you want between the parts DON’T correspond well to the kinds of boundaries your simple principle defines. You think actors and immutability are way cooler than mutable objects. And then you try to write a photo-editing program that applies filters to huge bitmaps, where copying the whole image for every message or every update is ruinously expensive.
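A small Python sketch of that mismatch, using the built-in `bytes` (immutable) and `bytearray` (mutable) types as stand-ins for an 8-bit greyscale image. It does not model actors or message passing, and the function names are hypothetical. In the immutable style every filter pass allocates a complete new image; in the mutable style the filter rewrites one existing buffer.

```python
def brighten_copy(pixels: bytes, amount: int) -> bytes:
    """Immutable style: each filter pass allocates a whole new image.

    Assumes amount >= 0; values are clamped at 255.
    """
    return bytes(min(255, p + amount) for p in pixels)


def brighten_in_place(pixels: bytearray, amount: int) -> None:
    """Mutable style: the filter rewrites the one existing buffer.

    Assumes amount >= 0; values are clamped at 255.
    """
    for i, p in enumerate(pixels):
        pixels[i] = min(255, p + amount)


if __name__ == "__main__":
    # A tiny stand-in for a multi-megabyte image.
    image = bytes([10, 128, 250])
    brighter = brighten_copy(image, 10)
    print(list(brighter), brighter is image)  # [20, 138, 255] False

    canvas = bytearray([10, 128, 250])
    brighten_in_place(canvas, 10)
    print(list(canvas))  # [20, 138, 255]
```

With three pixels the difference is invisible; with a chain of filters over a large bitmap, the immutable style allocates and copies the full image once per pass.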

So one architectural issue is that our languages try to enforce a single type of boundary when we want many.

The other is arbitrary and unnecessary boundaries. The biggest culprit is the difference between what is inside the program and what is outside it. Inside the program we have function calls, messages, possibly goroutines and internal queues / async channels. Outside we have asynchronous pipes from the OS. And synchronous socket communication. And a separation of database engine and cache engine. And search engine. And front-end web-server. And client. And server. And microservices. And XML-RPC and SOAP. And AWS Lambda and similar “function as a service” offerings, etc.

Our program is divided into files. And various languages and frameworks insist that they should be divided within the file system in a particular way.

Our systems are arbitrarily divided by the underlying architecture of our platforms, regardless of the natural joints of our applications.

I believe that one task for the next generation of languages is to be able to describe a range of boundary / membrane types and their permeability, timing and protocol requirements, all within a coherent and simple vocabulary: one where questions like “is this communication synchronous or asynchronous?” or “lazy or eager?” are explicit “parameters” or alternatives within the language itself.
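No mainstream language works this way today, but the idea can be roughly approximated as a library. The following Python sketch is purely illustrative: `Boundary`, `Timing` and `cross` are hypothetical names, and only two properties (sync vs. async, eager vs. lazy) are modelled. The point is that the kind of boundary is declared as data at the call site rather than being fixed by the language. It relies on `asyncio.to_thread`, which requires Python 3.9 or later.

```python
import asyncio
from dataclasses import dataclass
from enum import Enum
from typing import Any, Callable


class Timing(Enum):
    SYNC = "sync"
    ASYNC = "async"


@dataclass(frozen=True)
class Boundary:
    """Hypothetical description of one membrane between two parts of a system."""
    timing: Timing = Timing.SYNC
    lazy: bool = False


def cross(boundary: Boundary, fn: Callable[..., Any], *args: Any) -> Any:
    """Call `fn(*args)` with the semantics that `boundary` declares.

    SYNC  -> run now and return the result.
    ASYNC -> return an awaitable that runs `fn` in a worker thread.
    lazy  -> wrap either of the above in a zero-argument thunk instead.
    """
    def invoke() -> Any:
        if boundary.timing is Timing.ASYNC:
            return asyncio.to_thread(fn, *args)  # Python 3.9+
        return fn(*args)

    return invoke if boundary.lazy else invoke()


def area(width: int, height: int) -> int:
    return width * height


if __name__ == "__main__":
    print(cross(Boundary(), area, 3, 4))  # 12
    thunk = cross(Boundary(lazy=True), area, 3, 4)
    print(thunk())  # 12
    pending = cross(Boundary(Timing.ASYNC), area, 3, 4)
    print(asyncio.run(pending))  # 12
```

A language-level version could go further, for example checking statically that a caller never treats an asynchronous boundary as a synchronous one, which this runtime sketch cannot do.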

The other thing these languages must do is transcend the “inside the program” / “outside the program” distinction. Today we have applications with maybe a couple of lines of code doing calculations and other “business” logic, and a huge external “extro-structure” (often dozens of config files and MVC-separated directories) to represent the routing, caching, permissioning and other architecture.

We need programming languages where the large-scale architecture is a first-class citizen of the program, not something that has to be laboriously built around it. And when the program compiles, it should compile not only to object code, but to automated architecture: Puppet manifests or Ansible playbooks, Continuous Integration, containerization, orchestration of Kubernetes pods. This is where the DevOps revolution needs to end up: languages that are fluent in talking about all the parts of our systems, and all the kinds of communication between them.

So why am I talking about this in a question about “principles”?

Because software as an art / discipline / profession / culture advances when we take informal ideas and turn them into formal code that can be executed. Heuristics and “good practice” become patterns, become libraries, and ultimately become language features.

Architectural principles will ultimately be recognised and “agreed upon” parts of our practice when they finally become part of the languages we use every day.
