35. OCaml's Compare

Programming languages often have features which are not necessary but which are a boon to working with them. For example, Haskell compilers can automagically derive pretty-printers for user-defined datatypes which is a great help when debugging. Unfortunately, things don’t always go the way the language authors intended, and such features end up being more confusing than helpful. Take, for instance, the following implementation of a doubly-linked list in OCaml.

(** A circular doubly-linked list in OCaml.  See:
module Doubly_linked_list = struct
  type 'a doubly_linked_node = {
    data         : 'a;
    mutable prev : 'a doubly_linked_node;
    mutable next : 'a doubly_linked_node;

  type 'a t = {
    mutable head : 'a doubly_linked_node option;

  (** [create ()] makes a new doubly-linked list. *)
  let create () =
    { head = None; }

  (** [cons t data] prepends [data] to the doubly-linked
      list [t]. *)
  let cons t data =
    let rec new_node =
      { data; prev = new_node; next = new_node; }
    match t.head with
    | None ->
      t.head <- Some new_node
    | Some node -> <- node;
      new_node.prev <- node.prev; <- new_node; <- new_node;
      t.head <- Some new_node

  (** [iter t ~f] executes [f] on each of the values
      stored in the doubly-linked list [t]. *)
  let iter t ~f =
    let rec go ~start node =
      if <> start then
        go ~start
    match t.head with
    | None      -> ()
    | Some node -> go ~start:node node

(* Create a doubly-linked list and print its contents. *)
let main () =
  let list = Doubly_linked_list.create () in
  Doubly_linked_list.cons list "3";
  Doubly_linked_list.cons list "2";
  Doubly_linked_list.cons list "1";

  print_endline "Count-up timer go!";
  Doubly_linked_list.iter list ~f:print_endline;
  print_endline "Done!";

let () = main ();;

Seems innocuous, doesn’t it? But, running it fails with Fatal error: exception Out_of_memory; we’re actually causing the runtime to error out here. And it could be worse: I’ve seen code like this put the runtime into an infinite loop, and if the code is any more complicated, it’s a pain to debug.

The problem is our choice of comparison operator in iter: we used (<>), but we meant to use (!=).

val (<>) : 'a -> 'a -> bool
val (!=) : 'a -> 'a -> bool

Looking at the types of these comparison functions, we see something strange: since they’re polymorphic, they can’t be written in pure OCaml as anything other than constant functions. In fact, they’re both written in C, and they both on rely on information not normally available to OCaml code: (!=) does a pointer comparison of the given OCaml values, and (<>) does a structural comparison by recursively walking the pointers of the given OCaml values and comparing any binary blobs encountered.

The reasoning behind this is that we almost always want comparison functions for new data types and writing them is usually automatic. The problem here is that that OCaml allows mutable data, so it’s possible to create a circular data-structure which causes structural comparison to recurse infinitely.

What I find particularly annoying about all this is that everything happens automatically and the programmer is at no point forced to pause and think about what they’re doing; had we written the above code in Haskell, we would’ve had to “opt-in” to the automatic comparison generation and we would’ve probably noticed that structural comparison doesn’t really make sense for doubly_linked_node values.

Luckily, we can now do the above in OCaml too. By opening something like Core’s no_polymorphic_compare.mli in each module, we guarantee that we’ll get a type-error whenever we try to use polymorphic comparison. Then, we can generate comparison functions on a type-by-type basis with a syntax extension like comparelib.