2007-10-10

## Generating Subsets

Posted by scvalexThere quite a few definitions of what a **set** is, but it all boils down to this:

A **set** defined as a collection of *distinct* elements, in which order is not important.

So *{1, 2, 3}*, *{3, 4}*, *{}* and *{5, 99, -1}* are all sets. Because the order of the elements is ignored, *{1, 2, 3}* and *{3, 2, 1}* is the same set. In case you’re wandering, there are exactly *n!* diffrent ways to write a set of *n* elements.

For the rest of the discussion, I’ll use sets of the form *{1, 2, …, n}*, so when I say a set of *3* elements, I mean *{1, 2, 3}*. Just remember that is **not** a property of sets. They can contain anything as elements, not necessarily consecutive numbers.

The set *S _{1}* is said to be the

**subset**of the set

*S*, if all the elements of

_{2}*S*also belong to

_{1}*S*.

_{2}Knowing this, it’s easy to figure out the subsets of *{1, 2, 3}*:

*{ }
{ 1 }
{ 2 }
{ 1, 2 }
{ 3 }
{ 1, 3 }
{ 2, 3 }
{ 1, 2, 3 }*

How many subsets are there? For a set of *one* element, there are *2* subsets: *{}* and *{1}*. For a set of *2* elements, there are *4* subsets: *{}*, *{1}*, *{2}*, *{1, 2}*. For a set of *3* elements, there are *8* subsets. Notice the pattern?

`n = 1: 2`

^{1}

n = 2: 2^{2}

n = 3: 2^{3}

For a set of *n* there are *2 ^{n}* subsets. This is easily proved: Any subset of the set can either contain or not contain an element; so, for a subset, there are

*2*states for the first element,

*2*for the second element, …,

*2*for the

*n*element; so, there are

^{th}*2*states for the first element,

*2 * 2 = 2*states for the first two,

^{2}*2 * 2 * 2= 2*states for the first three, …,

^{3}*2 * 2 * 2 * … * 2 = 2*states for all the

^{n}*n*elements.

The problem here is how to generate all the subsets of a given set. There are a few algorithms for doing this, but in the end, only two are worth considering.

The first is this: given all the subsets of *S* and the element *y*, you can generate all the subsets of *S U {y}* by taking each subset of *S*, once adding to it *y* and once leaving it as it is. **i.e.** Knowing that *{1, 3}* is a subset of *S*, you obtain the following two subsets of *S U {y}*: *{1, 3, y}* and *{1, 3}*.

This does what it’s supposed to - it generates all the subsets of *S*, and it wastes no time. It can also be used as another way to prove that there are *2 ^{n}* subsets for any set of

*n*elements. The only problem is that you need the subsets from the previous step to generate those of this step. This means that just before the end, you must have

*2*subsets in memory. Considering how much memory computers have this days, it’s not particularly wasteful, but still, there’s a better way.

^{n - 1}The better way involves using a **mask**. If you have the a set of *n* elements, a valid mask would be an array of *n* boolean (true/false; 1/0) elements. When you apply a mask to a set, you check each element *(e)* in the set and the corresponding one in the mask *(m)*: if *m* is true(1), you add *e* to the result, otherwise, you ignore it. After applying the mask *(0, 1, 0, 0, 1)* to *{1, 2, 3, 4, 5}*, you get *{2, 5}*.

So, to generate all the subsets of a set of *n* elements, you first have to generate all the possible *2 ^{n}* masks of the set and then apply them.

Generating the masks is a simple problem. Basically, you just have to implement a **binary counter**, i.e. something that generates:

`000`

001

010

011

100

101

110

111

Here’s the code in C (sub.c):

#include <stdio.h> /* Applies the mask to a set like {1, 2, ..., n} and prints it */ void printv(int mask[], int n) { int i; printf(”{ “); for (i = 0; i < n; ++i) if (mask[i]) printf(”%d “, i + 1); /*i+1 is part of the subset*/ printf(”\\b }\\n”); } /* Generates the next mask*/ int next(int mask[], int n) { int i; for (i = 0; (i < n) && mask[i]; ++i) mask[i] = 0; if (i < n) { mask[i] = 1; return 1; } return 0; } int main(int argc, char *argv[]) { int n = 3; int mask[16]; /* Guess what this is */ int i; for (i = 0; i < n; ++i) mask[i] = 0; /* Print the first set */ printv(mask, n); /* Print all the others */ while (next(mask, n)) printv(mask, n); return 0; }

Note: The `next()`

function generates the bits in **reverse** order.

Always open to comments.