Programming Language Features for Large Scale Software

My Quora Answer to the question : What characteristics of a programming language makes it capable of building very large-scale software?

The de facto thinking on this is that the language should make it easy to compartmentalize programming into well segregated components (modules / frameworks) and offers some kind of “contract” idea which can be checked at compile-time.

That’s the thinking behind, not only Java, but Modula 2, Ada, Eiffel etc.

Personally, I suspect that, in the long run, we may move away from this thinking. The largest-scale software almost certainly runs on multiple computers. Won’t be written in a single language, or written or compiled at one time. Won’t even be owned or executed by a single organization.

Instead, the largest software will be like, say, Facebook. Written, deployed on clouds and clusters, upgraded while running, with supplementary services being continually added.

The web is the largest software environment of all. And at the heart of the web is HTML. HTML is a great language for large-scale computing. It scales to billions of pages running in hundreds of millions of browsers. Its secret is NOT rigour. Or contracts. It’s fault-tolerance. You can write really bad HTML and browsers will still make a valiant effort to render it. Increasingly, web-pages collaborate (one page will embed services from multiple servers via AJAX etc.) And even these can fail without bringing down the page as a whole.

Much of the architecture of the modern web is built of queues and caches. Almost certainly we’ll see very high-level cloud-automation / configuration / scripting / data-flow languages to orchestrate these queues and caches. And HADOOP-like map-reduce. I believe we’ll see the same kind of fault-tolerance that we expect in HTML appearing in those languages.

Erlang is a language designed for orchestrating many independent processes in a critical environment. It has a standard pattern for handling many kinds of faults. The process that encounters a problem just kills itself. And sooner or later a supervisor process restarts it and it picks up from there. (Other processes start to pass messages to it.)

I’m pretty sure we’ll see more of this pattern. Nodes or entire virtual machines that are quick to kill themselves at the first sign of trouble, and supervisors that bring them back. Or dynamically re-orchestrate the dataflow around trouble-spots.

Many languages are experimenting with Functional Reactive Programming : a higher-level abstraction that makes it easy to set up implicit data-flows and event-driven processing. We’ll see more languages that approach complex processing by allowing the declaration of data-flow networks, and which simplify exception / error handling in those flows with things like Haskell’s “Maybe Monad”.

Update : Another thing I’m reminded of. Jaron Lanier used to have this idea of “Phenotropic Programming” (WHY GORDIAN SOFTWARE HAS CONVINCED ME TO BELIEVE IN THE REALITY OF CATS AND APPLES) Which is a bit far out, but I think it’s plausible that fault-tolerant web APIs and the rest of the things I’m describing here, may move us closer.

Wow! Playing with Prolog at the moment. It’s awesome.

I seem to have finally achieved some kind of fluency instead of fumbling around without knowing how to drive the thing. It probably helps that Erlang has accustomed me to the syntax and other conventions.

Interesting. Reia’s Tony Arcieri debunks Erlang’s “single assignment” propaganda.

I guess someone could argue that once you have multiple assignment you’re going to be more tempted to write a longer chain of actions as a sequence of statements rather than composing it out of multiple functions … and this may be a bad thing.

But I’ve ranted often enough against languages which think its their job to constrain programmers that it better not be me who makes that argument.

Update : I’d like to see Frederik’s question about closures (in the comments of that blog-post) answered though.

Update 2: Ulf Wiger points out that it got answered by Robert Virding later in the comments.

Prediction I made :

I predict that in five years time, the hot language will either be a version of Python which has adopted ideas (Pattern matching arguments, Monads, immutability etc.) from Haskell; or a Haskell-like language which has borrowed some syntactic sugar from Python.

Don’t know if I entirely believe that, but it’s a good conversational gambit. And there’s a certain logic to it if you look at the languages visually.

I’m personally, more into investigating Erlang at the moment, mainly because it seems more practical. The fast, parallel virtual machines are here and ready to go. But I could see myself looking back at Haskell too if it seemed to be gaining momentum.

I guess I also like the Erlang story on concurrency through message passing rather than transaction memories. And perhaps the whole typing thing is overdone in Haskell, although I can certainly see that it buys you more in that context than in, say, an OO language.

But I’d really love to see someone come up with mashup of the best bits of Python, Erlang and Haskell. It would *look* more or less like Python, although would find some-way to do multi-line lambdas etc. Like Haskell and Erlang it would have pattern matching arguments, functions defined over several definition statements, and immutable state. Like Erlang there’d be processes and messages as standard. Like Haskell, monads. Classes might be more like Haskell.

How might a language like this look?

class Tree(val,left,right)
def size(None) : 0
def size(Tree(_,left,right)) :
1 + size(left) + size(right)

Note that what’s in brackets after the word Tree is neither a super-class (as in Python) nor a list of types as in Haskell nor a definitive list of field names. It’s something more like information that a Tree is a tuple of three elements. And that by default they’re called val, left and right.

These default names for the elements of the tuple can be over-ridden in functions which deconstruct a Tree using patterns. For example, the size function doesn’t care about the initial val element and so leaves it blank.

I just thought of trying to make size a method of the class Tree. The syntax might then look something like this :

class Tree(val,left,right) :
def size(Tree(_,left,right)) :
1 + left.size() + right.size()

Where, obviously, the object itself is still (as in Python) explicitly passed as the first argument to the method, but instead of being called “self” we pick it up with the pattern of another Tree constructor and so it gets broken into _,left and right.

But where do we put the None / empty tree clause? My first thought was something like this :

class Tree(val,left,right) :
def size(None) : 0
def size(Tree(val,left,right)) :
1 + left.size() + right.size()

But that’s not quite going to work. If this really were a more statically typed language like Haskell, and we knew that left and right were always Trees, this would be OK. But as we’re trying to go for Pythonic duck-typing we have a problem. We would still like left.size() to mean, “pass the method selector size() to the object left” and then let the class of left decide how to handle it. But where left is None we have no idea which class to use to call. We could have a multitude of classes which define size(None); which would we pick?

Note that this is not the pattern-matching’s fault. We’d have the same problem here :

class Tree(val,left,right) :
def size(self) :
if self is None : 0
else :
1 + self.left.size() + self.right.size()

We could get around the problem by disallowing methods which take a single None argument, effectively forcing the programmer to write something like this :

class Tree(val,left,right) :
def size(Tree(_,None,None)) : 0
def size(Tree(_,None,right)) : 1+right.size()
def size(Tree(_,left,None)) : 1+left.size()
def size(Tree(val,left,right)) :
1 + left.size() + right.size()

But this starts looking somewhat clunky.

We might make it a bit less awkward by providing conditional evaluation to the left and right subtrees. Based on Python’s current conditional expression syntax that would look something like this :

class Tree(val,left,right) :
def size(Tree(_,left,right)) :
1 + (left.size() if left else 0) +
(right.size() if right else 0)

Or maybe we have to accept a static type system closer to Haskell’s. Perhaps the types, in this case, contribute to Haskell’s terseness.