2007-10-15

## Generating the Partitions of a Set

Posted by scvalexWikipedia defines the **partition of a set** as:

In mathematics, a partition of a set X is a division of X into non-overlapping “parts” or “blocks” or “cells” that cover all of X. More formally, these “cells” are both collectively exhaustive and mutually exclusive with respect to the set being partitioned.

A more succinct definition is given by Mathworld:

A set partition of a set S is a collection of disjoint subsets of S whose union is S.

Simply put, the **partitions of a set S** are all the ways in which you can choose disjoint, non-empty subsets of S that unioned result in S.

From now on, when I say a set of *n* elements, I mean *{1, 2, …, n}*. So, what are the subsets of *{1, 2, 3}*?

*{1, 2, 3}
{2, 3} {1}
{1, 3} {2}
{3} {1, 2}
{3} {2} {1}*

It’s obvious that these verify the definition: *{1}*, *{2}*, *{3}*, *{1, 2}*, *{1, 3}*, *{2, 3}* and *{1, 2, 3}* are all subsets of *{1, 2, 3}*. They’re all non-empty and, in any partition, the same element never appears twice. Finally, in a partitioning, the union of the partitions is the original set.

In how many ways can you partition a set of *n* elements? There are many ways to calculate this, but as far as I can tell, the easiest is using Catalan numbers:

If you check the formula for *3* you’ll see that it does give the correct answer: *5*.

Ok. We know what a partitioning is, we know how many there are, but how do you generate them? This is the first algorithm I could think of. It may not be clear from the explanation why it works but try it on a piece of paper for *n=3* and it will become obvious. Here’s how I came up with it:

First of all, how do you represent a partitioning of a set of *n* elements? The straight-forward way would be using a vector of n integers, each integer representing the number of the subset in which the corresponding element is in. If the corresponding element of *3* is *2*, that means that *3* is in the *2 ^{nd}* subset. So, given the set {1, 2, 3}:

`Partitioning -> Encoding`

{1, 2, 3} -> (1, 1, 1)

{1} {2, 3} -> (2, 1, 1)

{2} {1, 3} -> (1, 2, 1)

{1, 2} {3} -> (2, 2, 1)

{1} {2} {3} -> (3, 2, 1)

Notice that the encodings, written backwards are: *111*, *112*, *121*, *122* and *123*. From this you can guess how the generator works: more or less, **generate all the numbers between 111 and 123 using only the digits 1, 2 and 3**:

111

112

113

121

122

123

That’s almost right. The encodings *(1, 1, 2)* and *(1, 1, 3)* translate into the same partitioning: *{1} {2, 3}*. If you do the same thing for a larger *n* you’ll notice this happening again and again. Fortunately, there’s an easy solution: **never use a digit that’s more than 1 larger than any other digit in the encoding**. i.e. You can’t use

*(1, 1, 3)*because

*3*is larger by

*2*than the other digits in the encoding (

*1*and

*1*).

To do this, I use another vector *m* with the following significance: *m[i]* is the largest of the first *i* elements in the encoding. This makes it very easy not to generate any duplicate partitionings.

Here’s the code in C (part.c):

#include <stdio.h> /* printp - print out the partitioning scheme s of n elements as: {1, 2, 4} {3} */ void printp(int *s, int n) { /* Get the total number of partitions. In the exemple above, 2.*/ int part_num = 1; int i; for (i = 0; i < n; ++i) if (s[i] > part_num) part_num = s[i]; /* Print the p partitions. */ int p; for (p = part_num; p >= 1; --p) { printf("{"); /* If s[i] == p, then i + 1 is part of the pth partition. */ for (i = 0; i < n; ++i) if (s[i] == p) printf("%d, ", i + 1); printf("\\b\\b} "); } printf("\\n"); } /* next - given the partitioning scheme represented by s and m, generate the next Returns: 1, if a valid partitioning was found 0, otherwise */ int next(int *s, int *m, int n) { /* Update s: 1 1 1 1 -> 2 1 1 1 -> 1 2 1 1 -> 2 2 1 1 -> 3 2 1 1 -> 1 1 2 1 ... */ /*int j; printf(" -> ("); for (j = 0; j < n; ++j) printf("%d, ", s[j]); printf("\\b\\b)\\n");*/ int i = 0; ++s[i]; while ((i < n - 1) && (s[i] > m[i] + 1)) { s[i] = 1; ++i; ++s[i]; } /* If i is has reached n-1 th element, then the last unique partitiong has been found*/ if (i == n - 1) return 0; /* Because all the first i elements are now 1, s[i] (i + 1 th element) is the largest. So we update max by copying it to all the first i positions in m.*/ int max = s[i]; for (i = i - 1; i >= 0; --i) m[i] = max; /* for (i = 0; i < n; ++i) printf("%d ", m[i]); getchar();*/ return 1; } int main(int argc, char *argv[]) { int s[16]; /* s[i] is the number of the set in which the ith element should go */ int m[16]; /* m[i] is the largest of the first i elements in s*/ int n = 3; int i; /* The first way to partition a set is to put all the elements in the same subset. */ for (i = 0; i < n; ++i) { s[i] = 1; m[i] = 1; } /* Print the first partitioning. */ printp(s, n); /* Print the other partitioning schemes. */ while (next(s, m, n)) printp(s, n); return 0; }

The code is heavily commented, but I’ll happily respond to any questions. This is also what I used to generate all the above listings. Try decommenting some of the code to see how the programme works. Good luck!

P.S. Every encoding after *(3, 2, 1)* yields a duplicate partitioning. For fun, try proving this mathematically.