NoSQL and the Tar Pit

In a Quora answer I went back to a theme that I mentioned when discussing Bret Victor a few months ago.

Here’s something that struck me yesterday when reading Out of the Tar Pit, a very good essay that seems to signal the direction in which many smart people think software development should be evolving. That direction is to give up as much explicit state and control flow as possible and move towards a declarative style of saying just what your program should produce, without worrying about how it does it.
I can’t overemphasize how big this idea is. Most important and smart people thinking about software will sign up to the idea that we need to move towards more functional languages, more declarative style, abandon more state and explicit control flow. Perhaps even separate the essential logic of what you want done from the “accidental” hints that can enhance performance into separate languages / parts of the system.
And yet …
And yet, the most widely adopted and commonly used example of this separation (saying what the program should do in one language and giving performance hints in another), as the paper acknowledges, was the good old-fashioned relational database queried using SQL. It did, indeed, allow programmers to declare what they wanted their queries to deliver without worrying about access paths, control flow or performance. Database admins then worked behind the scenes, profiling, creating special indexes etc. to improve performance.
Now, since this important paper was written, there’s been an absolute revolution in database circles called the NoSQL movement: a wholesale rejection of the relational database model and its replacement by systems that hark back to the hierarchical and network databases of the late 1960s. Although NoSQL was adopted by people working on enormous systems across hundreds of thousands of machines, its popularity is so great that a new generation of programmers reaches for NoSQL database solutions (and with them explicit modelling of data-structures, responsibility for traversing access-paths etc.) more or less by default, even for small prototypes.
So, I’d say that NoSQL is one of the most successful “contrarian” movements. It’s massively popular and “trendy” while going against everything that many smart programmers think and say they want, and against what many people had foreseen as the future of software development.
It signals one of two things. Either the argument in Out of the Tar Pit is wrong, because performance is so important that programmers never want to give up explicitly modelling state and defining control-flow, or people’s intuitions are badly broken.
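
To make the contrast concrete, here is a minimal sketch in Python using the standard sqlite3 module. The table, the index name and the nested "documents" structure are all invented for illustration. The SQL query only says what result is wanted; the index is a separate performance hint that leaves the answer unchanged; the document-style version has to own the data layout and walk the access path itself.

```python
import sqlite3

# Hypothetical sample data: (customer, order total).
ORDERS = [("alice", 30.0), ("bob", 12.5), ("alice", 7.5)]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, total REAL)"
)
conn.executemany("INSERT INTO orders (customer, total) VALUES (?, ?)", ORDERS)

# Declarative: states WHAT is wanted, nothing about how to find it.
QUERY = (
    "SELECT customer, SUM(total) FROM orders "
    "GROUP BY customer ORDER BY customer"
)
before = conn.execute(QUERY).fetchall()

# The "accidental" part lives elsewhere: an index is a performance hint
# that a DBA can add later without touching the query or its result.
conn.execute("CREATE INDEX idx_orders_customer ON orders (customer)")
after = conn.execute(QUERY).fetchall()

# Document-store style: the programmer models the structure explicitly
# and is responsible for traversing it.
documents = {
    "alice": {"orders": [{"total": 30.0}, {"total": 7.5}]},
    "bob": {"orders": [{"total": 12.5}]},
}


def totals_by_traversal(docs):
    """Walk the nested structure by hand to compute per-customer totals."""
    return [
        (name, sum(order["total"] for order in doc["orders"]))
        for name, doc in sorted(docs.items())
    ]


by_hand = totals_by_traversal(documents)
```

All three results are the same here. What differs is who carries the knowledge of the access path: the database engine in the first two cases, the application code in the third.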

Programming Language Features for Large Scale Software

My Quora Answer to the question : What characteristics of a programming language makes it capable of building very large-scale software?

The de facto thinking on this is that the language should make it easy to compartmentalize programming into well segregated components (modules / frameworks) and offer some kind of “contract” idea which can be checked at compile-time.

That’s the thinking behind not only Java but also Modula-2, Ada, Eiffel etc.

Personally, I suspect that, in the long run, we may move away from this thinking. The largest-scale software almost certainly runs on multiple computers. It won’t be written in a single language, or written or compiled at one time. It won’t even be owned or executed by a single organization.

Instead, the largest software will be like, say, Facebook. Written, deployed on clouds and clusters, upgraded while running, with supplementary services being continually added.

The web is the largest software environment of all. And at the heart of the web is HTML. HTML is a great language for large-scale computing. It scales to billions of pages running in hundreds of millions of browsers. Its secret is NOT rigour. Or contracts. It’s fault-tolerance. You can write really bad HTML and browsers will still make a valiant effort to render it. Increasingly, web-pages collaborate (one page will embed services from multiple servers via AJAX etc.) And even these can fail without bringing down the page as a whole.
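
As a rough analogy for that leniency, here is a small Python sketch using the standard library’s html.parser, which (like a browser, though far less cleverly) keeps going when the markup is broken. The TextCollector class and the sample markup are made up for this example; real browsers follow the much more elaborate HTML parsing rules.

```python
from html.parser import HTMLParser


class TextCollector(HTMLParser):
    """Records start tags and text without caring whether tags are balanced."""

    def __init__(self):
        super().__init__()
        self.tags = []
        self.text = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)

    def handle_data(self, data):
        self.text.append(data)


def extract(markup):
    """Parse possibly malformed markup and return (start tags, text)."""
    parser = TextCollector()
    parser.feed(markup)
    parser.close()  # flush anything still buffered
    return parser.tags, "".join(parser.text)


# Unclosed <p> and <b>, plus a stray </i> that was never opened.
BAD_HTML = "<p>Hello <b>world<p>Second paragraph</i>"
tags, text = extract(BAD_HTML)
```

Nothing raises an exception: the parser reports what it can see and moves on, which is the property the paragraph above is pointing at.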

Much of the architecture of the modern web is built from queues and caches. Almost certainly we’ll see very high-level cloud-automation / configuration / scripting / data-flow languages to orchestrate these queues and caches, and Hadoop-like MapReduce jobs. I believe we’ll see the same kind of fault-tolerance that we expect in HTML appearing in those languages.

Erlang is a language designed for orchestrating many independent processes in a critical environment. It has a standard pattern for handling many kinds of faults. The process that encounters a problem just kills itself. Sooner or later a supervisor process restarts it, and it picks up from there. (Other processes then go back to passing messages to it.)
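
The shape of that pattern can be sketched in a few lines of Python. This is only an illustration of the idea, not Erlang’s actual supervisor behaviour or API: the worker, the inbox list and the restart limit are all assumptions made for the example, and a plain function call stands in for a separate process.

```python
def worker(inbox, results):
    """Process messages until the inbox is empty.

    The worker makes no attempt to recover from bad input: it simply
    crashes (raises) and leaves recovery to whoever is supervising it.
    """
    while inbox:
        message = inbox.pop(0)
        if not isinstance(message, int):
            raise ValueError(f"cannot handle {message!r}")
        # 'results' stands in for messages already sent downstream,
        # so it survives a crash of the worker itself.
        results.append(message * 2)


def supervise(inbox, max_restarts=3):
    """Run the worker, restarting it after each crash (hypothetical policy)."""
    results = []
    restarts = 0
    while True:
        try:
            worker(inbox, results)
            return results, restarts
        except ValueError:
            restarts += 1
            if restarts > max_restarts:
                raise  # give up: too many crashes in a row


# Two poison messages; each one kills the worker once.
outcome = supervise([1, "oops", 2, None, 3])
```

The point of the design is that the error-handling policy lives in one place (the supervisor) while the worker stays simple and only knows the happy path.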

I’m pretty sure we’ll see more of this pattern. Nodes or entire virtual machines that are quick to kill themselves at the first sign of trouble, and supervisors that bring them back. Or dynamically re-orchestrate the dataflow around trouble-spots.

Many languages are experimenting with Functional Reactive Programming : a higher-level abstraction that makes it easy to set up implicit data-flows and event-driven processing. We’ll see more languages that approach complex processing by allowing the declaration of data-flow networks, and which simplify exception / error handling in those flows with things like Haskell’s “Maybe Monad”.
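
For readers who haven’t met the “Maybe Monad”, here is a loose Python imitation of the idea, using None where Haskell would use Nothing. The bind helper and the two pipeline steps are invented for this sketch, and unlike Haskell’s Maybe it cannot distinguish “no value” from a legitimate None.

```python
def bind(value, step):
    """Apply 'step' only if there is a value; otherwise pass the failure on."""
    return None if value is None else step(value)


def parse_int(text):
    """Return an int, or None if the text is not a number."""
    try:
        return int(text)
    except ValueError:
        return None


def reciprocal(number):
    """Return 1/number, or None when the division is undefined."""
    return None if number == 0 else 1 / number


def pipeline(text):
    """A tiny data-flow: each stage is skipped once an earlier one has failed."""
    result = text
    for step in (parse_int, reciprocal):
        result = bind(result, step)
    return result
```

With this, pipeline("4") gives 0.25, while pipeline("0") and pipeline("abc") both give None, and none of the stages needs its own error-handling code for failures that happened upstream.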

Update : Another thing I’m reminded of. Jaron Lanier used to have this idea of “Phenotropic Programming” (see “Why Gordian Software Has Convinced Me to Believe in the Reality of Cats and Apples”). It’s a bit far out, but I think it’s plausible that fault-tolerant web APIs, and the rest of the things I’m describing here, may move us closer to it.

Elm Lang

I must confess, I’m very intrigued by Elm-Lang.

For me there are four virtues :

1) FRP. All the attempts I’ve seen to graft FRP onto existing languages have looked clunky to me – ahem … Trellis? – requiring the explicit definition of special types of fields. This is the kind of thing that I think needs a new language feature, not a new library.

Elm-lang’s “lift” looks a much cleaner way of going about it.

2) It’s in the browser. That’s where code has to run.

3) I like the way that it reunifies the document / graphics structure back into the same file. The problem is not so much that style and content shouldn’t be separated. It’s that there are more serious divisions of modularity to respect and forcing HTML and JS into different trees of the filing system has typically pushed highly interdependent data-structure and logic too far apart. I like the ability to bring them back together for small programs.

4) Perhaps it’s a way to get familiar with Haskell. Obviously it’s not full Haskell. But it seems like a way to get more into that mind-set while doing some practical work.
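
To give a feel for what “lift” in point 1 does, here is a toy Python model of the idea: a function is lifted over a time-varying value so that the result is itself a time-varying value that updates automatically. The Signal class and this lift function are my own minimal stand-ins, not Elm’s implementation, and ironically they show exactly the library-style grafting that point 1 complains about.

```python
class Signal:
    """A value that changes over time and notifies dependants when it does."""

    def __init__(self, value):
        self.value = value
        self._listeners = []

    def subscribe(self, listener):
        self._listeners.append(listener)

    def set(self, value):
        self.value = value
        for listener in self._listeners:
            listener(value)


def lift(function, source):
    """Return a new Signal whose value is always function(source.value)."""
    derived = Signal(function(source.value))
    source.subscribe(lambda value: derived.set(function(value)))
    return derived


# Hypothetical input signal, e.g. a mouse x-coordinate.
mouse_x = Signal(10)
label = lift(lambda x: f"x = {x}", mouse_x)
initial_label = label.value

mouse_x.set(42)  # 'label' follows without any explicit update code
updated_label = label.value
```

Nobody ever writes “when the mouse moves, recompute the label”; the dependency is declared once and the data-flow does the rest.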

Of course, the proof of the pudding is in the eating. I’d better go and try something …  🙂