2007-10-10
Generating Subsets
Posted by scvalexThere quite a few definitions of what a set is, but it all boils down to this:
A set defined as a collection of distinct elements, in which order is not important.
So {1, 2, 3}, {3, 4}, {} and {5, 99, -1} are all sets. Because the order of the elements is ignored, {1, 2, 3} and {3, 2, 1} is the same set. In case you’re wandering, there are exactly n! diffrent ways to write a set of n elements.
For the rest of the discussion, I’ll use sets of the form {1, 2, …, n}, so when I say a set of 3 elements, I mean {1, 2, 3}. Just remember that is not a property of sets. They can contain anything as elements, not necessarily consecutive numbers.
The set S1 is said to be the subset of the set S2, if all the elements of S1 also belong to S2.
Knowing this, it’s easy to figure out the subsets of {1, 2, 3}:
{ }
{ 1 }
{ 2 }
{ 1, 2 }
{ 3 }
{ 1, 3 }
{ 2, 3 }
{ 1, 2, 3 }
How many subsets are there? For a set of one element, there are 2 subsets: {} and {1}. For a set of 2 elements, there are 4 subsets: {}, {1}, {2}, {1, 2}. For a set of 3 elements, there are 8 subsets. Notice the pattern?
n = 1: 21
n = 2: 22
n = 3: 23
For a set of n there are 2n subsets. This is easily proved: Any subset of the set can either contain or not contain an element; so, for a subset, there are 2 states for the first element, 2 for the second element, …, 2 for the nth element; so, there are 2 states for the first element, 2 * 2 = 22 states for the first two, 2 * 2 * 2= 23 states for the first three, …, 2 * 2 * 2 * … * 2 = 2n states for all the n elements.
The problem here is how to generate all the subsets of a given set. There are a few algorithms for doing this, but in the end, only two are worth considering.
The first is this: given all the subsets of S and the element y, you can generate all the subsets of S U {y} by taking each subset of S, once adding to it y and once leaving it as it is. i.e. Knowing that {1, 3} is a subset of S, you obtain the following two subsets of S U {y}: {1, 3, y} and {1, 3}.
This does what it’s supposed to - it generates all the subsets of S, and it wastes no time. It can also be used as another way to prove that there are 2n subsets for any set of n elements. The only problem is that you need the subsets from the previous step to generate those of this step. This means that just before the end, you must have 2n - 1 subsets in memory. Considering how much memory computers have this days, it’s not particularly wasteful, but still, there’s a better way.
The better way involves using a mask. If you have the a set of n elements, a valid mask would be an array of n boolean (true/false; 1/0) elements. When you apply a mask to a set, you check each element (e) in the set and the corresponding one in the mask (m): if m is true(1), you add e to the result, otherwise, you ignore it. After applying the mask (0, 1, 0, 0, 1) to {1, 2, 3, 4, 5}, you get {2, 5}.
So, to generate all the subsets of a set of n elements, you first have to generate all the possible 2n masks of the set and then apply them.
Generating the masks is a simple problem. Basically, you just have to implement a binary counter, i.e. something that generates:
000
001
010
011
100
101
110
111
Here’s the code in C (sub.c):
#include <stdio.h> /* Applies the mask to a set like {1, 2, ..., n} and prints it */ void printv(int mask[], int n) { int i; printf(”{ “); for (i = 0; i < n; ++i) if (mask[i]) printf(”%d “, i + 1); /*i+1 is part of the subset*/ printf(”\\b }\\n”); } /* Generates the next mask*/ int next(int mask[], int n) { int i; for (i = 0; (i < n) && mask[i]; ++i) mask[i] = 0; if (i < n) { mask[i] = 1; return 1; } return 0; } int main(int argc, char *argv[]) { int n = 3; int mask[16]; /* Guess what this is */ int i; for (i = 0; i < n; ++i) mask[i] = 0; /* Print the first set */ printv(mask, n); /* Print all the others */ while (next(mask, n)) printv(mask, n); return 0; }
Note: The next()
function generates the bits in reverse order.
Always open to comments.