2007-10-15
Generating the Partitions of a Set
Posted by scvalexWikipedia defines the partition of a set as:
In mathematics, a partition of a set X is a division of X into non-overlapping “parts” or “blocks” or “cells” that cover all of X. More formally, these “cells” are both collectively exhaustive and mutually exclusive with respect to the set being partitioned.
A more succinct definition is given by Mathworld:
A set partition of a set S is a collection of disjoint subsets of S whose union is S.
Simply put, the partitions of a set S are all the ways in which you can choose disjoint, non-empty subsets of S that unioned result in S.
From now on, when I say a set of n elements, I mean {1, 2, …, n}. So, what are the subsets of {1, 2, 3}?
{1, 2, 3}
{2, 3} {1}
{1, 3} {2}
{3} {1, 2}
{3} {2} {1}
It’s obvious that these verify the definition: {1}, {2}, {3}, {1, 2}, {1, 3}, {2, 3} and {1, 2, 3} are all subsets of {1, 2, 3}. They’re all non-empty and, in any partition, the same element never appears twice. Finally, in a partitioning, the union of the partitions is the original set.
In how many ways can you partition a set of n elements? There are many ways to calculate this, but as far as I can tell, the easiest is using Catalan numbers:
If you check the formula for 3 you’ll see that it does give the correct answer: 5.
Ok. We know what a partitioning is, we know how many there are, but how do you generate them? This is the first algorithm I could think of. It may not be clear from the explanation why it works but try it on a piece of paper for n=3 and it will become obvious. Here’s how I came up with it:
First of all, how do you represent a partitioning of a set of n elements? The straight-forward way would be using a vector of n integers, each integer representing the number of the subset in which the corresponding element is in. If the corresponding element of 3 is 2, that means that 3 is in the 2nd subset. So, given the set {1, 2, 3}:
Partitioning -> Encoding
{1, 2, 3} -> (1, 1, 1)
{1} {2, 3} -> (2, 1, 1)
{2} {1, 3} -> (1, 2, 1)
{1, 2} {3} -> (2, 2, 1)
{1} {2} {3} -> (3, 2, 1)
Notice that the encodings, written backwards are: 111, 112, 121, 122 and 123. From this you can guess how the generator works: more or less, generate all the numbers between 111 and 123 using only the digits 1, 2 and 3:
111
112
113
121
122
123
That’s almost right. The encodings (1, 1, 2) and (1, 1, 3) translate into the same partitioning: {1} {2, 3}. If you do the same thing for a larger n you’ll notice this happening again and again. Fortunately, there’s an easy solution: never use a digit that’s more than 1 larger than any other digit in the encoding. i.e. You can’t use (1, 1, 3) because 3 is larger by 2 than the other digits in the encoding (1 and 1).
To do this, I use another vector m with the following significance: m[i] is the largest of the first i elements in the encoding. This makes it very easy not to generate any duplicate partitionings.
Here’s the code in C (part.c):
#include <stdio.h> /* printp - print out the partitioning scheme s of n elements as: {1, 2, 4} {3} */ void printp(int *s, int n) { /* Get the total number of partitions. In the exemple above, 2.*/ int part_num = 1; int i; for (i = 0; i < n; ++i) if (s[i] > part_num) part_num = s[i]; /* Print the p partitions. */ int p; for (p = part_num; p >= 1; --p) { printf("{"); /* If s[i] == p, then i + 1 is part of the pth partition. */ for (i = 0; i < n; ++i) if (s[i] == p) printf("%d, ", i + 1); printf("\\b\\b} "); } printf("\\n"); } /* next - given the partitioning scheme represented by s and m, generate the next Returns: 1, if a valid partitioning was found 0, otherwise */ int next(int *s, int *m, int n) { /* Update s: 1 1 1 1 -> 2 1 1 1 -> 1 2 1 1 -> 2 2 1 1 -> 3 2 1 1 -> 1 1 2 1 ... */ /*int j; printf(" -> ("); for (j = 0; j < n; ++j) printf("%d, ", s[j]); printf("\\b\\b)\\n");*/ int i = 0; ++s[i]; while ((i < n - 1) && (s[i] > m[i] + 1)) { s[i] = 1; ++i; ++s[i]; } /* If i is has reached n-1 th element, then the last unique partitiong has been found*/ if (i == n - 1) return 0; /* Because all the first i elements are now 1, s[i] (i + 1 th element) is the largest. So we update max by copying it to all the first i positions in m.*/ int max = s[i]; for (i = i - 1; i >= 0; --i) m[i] = max; /* for (i = 0; i < n; ++i) printf("%d ", m[i]); getchar();*/ return 1; } int main(int argc, char *argv[]) { int s[16]; /* s[i] is the number of the set in which the ith element should go */ int m[16]; /* m[i] is the largest of the first i elements in s*/ int n = 3; int i; /* The first way to partition a set is to put all the elements in the same subset. */ for (i = 0; i < n; ++i) { s[i] = 1; m[i] = 1; } /* Print the first partitioning. */ printp(s, n); /* Print the other partitioning schemes. */ while (next(s, m, n)) printp(s, n); return 0; }
The code is heavily commented, but I’ll happily respond to any questions. This is also what I used to generate all the above listings. Try decommenting some of the code to see how the programme works. Good luck!
P.S. Every encoding after (3, 2, 1) yields a duplicate partitioning. For fun, try proving this mathematically.