Programming Language Features for Large Scale Software

My Quora Answer to the question : What characteristics of a programming language makes it capable of building very large-scale software?

The de facto thinking on this is that the language should make it easy to compartmentalize programming into well segregated components (modules / frameworks) and offers some kind of “contract” idea which can be checked at compile-time.

That’s the thinking behind, not only Java, but Modula 2, Ada, Eiffel etc.

Personally, I suspect that, in the long run, we may move away from this thinking. The largest-scale software almost certainly runs on multiple computers. Won’t be written in a single language, or written or compiled at one time. Won’t even be owned or executed by a single organization.

Instead, the largest software will be like, say, Facebook. Written, deployed on clouds and clusters, upgraded while running, with supplementary services being continually added.

The web is the largest software environment of all. And at the heart of the web is HTML. HTML is a great language for large-scale computing. It scales to billions of pages running in hundreds of millions of browsers. Its secret is NOT rigour. Or contracts. It’s fault-tolerance. You can write really bad HTML and browsers will still make a valiant effort to render it. Increasingly, web-pages collaborate (one page will embed services from multiple servers via AJAX etc.) And even these can fail without bringing down the page as a whole.

Much of the architecture of the modern web is built of queues and caches. Almost certainly we’ll see very high-level cloud-automation / configuration / scripting / data-flow languages to orchestrate these queues and caches. And HADOOP-like map-reduce. I believe we’ll see the same kind of fault-tolerance that we expect in HTML appearing in those languages.

Erlang is a language designed for orchestrating many independent processes in a critical environment. It has a standard pattern for handling many kinds of faults. The process that encounters a problem just kills itself. And sooner or later a supervisor process restarts it and it picks up from there. (Other processes start to pass messages to it.)

I’m pretty sure we’ll see more of this pattern. Nodes or entire virtual machines that are quick to kill themselves at the first sign of trouble, and supervisors that bring them back. Or dynamically re-orchestrate the dataflow around trouble-spots.

Many languages are experimenting with Functional Reactive Programming : a higher-level abstraction that makes it easy to set up implicit data-flows and event-driven processing. We’ll see more languages that approach complex processing by allowing the declaration of data-flow networks, and which simplify exception / error handling in those flows with things like Haskell’s “Maybe Monad”.

Update : Another thing I’m reminded of. Jaron Lanier used to have this idea of “Phenotropic Programming” (WHY GORDIAN SOFTWARE HAS CONVINCED ME TO BELIEVE IN THE REALITY OF CATS AND APPLES) Which is a bit far out, but I think it’s plausible that fault-tolerant web APIs and the rest of the things I’m describing here, may move us closer.

Comments

Leave a Reply