Monday, November 24, 2014

Programming Struggles

Software is hard. Lots of programmers out there are constantly searching for new solutions to old problems, and sometimes they find something that works well. It seems that many recent languages (e.g. Dart, Scala, Groovy, Vala, Hack, Swift) are simply syntax sugar for existing programming languages. Are we ever going to make any progress? I know this question has been rehashed a million times, but I'm going to ask it again anyway.

What makes a good programming language?

  1. Algebraic Data Types - includes sum types (open and closed), and product types (open and closed); too many languages miss these.
  2. Generics, Parameterized Types - includes arrays, vectors, lists, hash tables, trees, parsers, I/O streams, pointers
  3. Reflection, Metaprogramming - includes macros, annotations, doc parsers, syntax highlighting, coverage tools, static analysis tools, serialization, etc.
  4. Memory Pool Management - this is different from memory management, but it refers to the ability to manage the pools themselves, such as the ability to switch between Gc (garbage collected) and Rc (reference counted) memory pool models.

There are probably countless other conditions that I can't think of right now, but if you can think of them for me and leave your comments below, perhaps I will make this article longer.

Algebraic Data Types

What is an algebraic data type? Fundamentally it is a way of combining existing types. For two types, there are two fundamental options for a new type, the new type can store both types (ProductType), or either type (SumType). However, it can get more complicated than that. The way that algebraic data types are implemented in most languages differ by how extensible the resulting types are. I call them open if they can be extended after definition, and closed if they cannot. So to give some examples from existing languages:

  1. closed product types - usually called structures, classes, records, tuples, etc.
  2. closed sum types - usually called enumerations, or tagged unions. Most languages get these wrong because C got them wrong by making them equivalent to integers (which are an example of an open sum type).
  3. open product types - usually called arrays, vectors, or lists. These can always store more data by appending. Adding fields in a subclass can also be an example of this kind of construction.
  4. open sum types - usually called objects, dynamics, variants, etc. These can always be extended by inheritance or by subclassing this type.

There are usually too many ways of performing one of these type definitions. For example Haskell has a Dynamic type, which can be used as a variant type for every type in the language, but it also provides type classes, which also allow open sum type definition.

In Java, and most other OOP programming languages, classes perform most of the above kinds of definitions. Final class fields are an example of closed product type definition. Abstract classes can be used for open sum type definition. Class fields added to a subclass that inherits from a superclass is an example of open product type definition.


Common generic types include Optional (Maybe in Haskell), List, Array, Pointer, GcPointer (Gc in Rust, ^ in C++/CLI), RcPointer (Rc in Rust, shared_ptr in C++), Box (Box in Rust, unique_ptr in C++), HashTable, etc. Some language don't have generics, but end up expressing concepts with special syntax, for example, * for pointers (in C and Go), and [ ] for arrays (in C and Go).


One prerequisite for reflection is that the language must expose the parser used in the compiler (or interpreter, although I cannot think of any language that still uses an interpreter these days). If it does not expose the same parser the compiler uses, then the language must be easy to parse. Too many languages miss this. If people are going to write tools for your language (and they will), then they might not use a fancy parsing framework to help, they might only look for documentation strings or lines that start with an @ symbol. C/C++ is one of the most difficult languages to parse, and yet it makes up most desktop software in use today.

Memory Pool Management

Most languages pick one memory pool model and stick with it. C does not provide any, except for malloc/free, and so reference counting (Rc) emerged as a common solution to this problem. Many scripting languages have opted for garbage collection (Gc) for every single object, which makes it impossible to opt-out. For a general-purpose programming language, it is unacceptable to require that users are forced into one model or another.


Rust has all of these features.

Update: Since the time of this writing, #rust has since informed the author that Gc has been removed from the roadmap, and the standard library. Gc was also not implemented as a garbage-collected pointer, it was more of an auto-reference-counted pointer type.