Some clarification on the counting of the unique addresses

Image: eMail by Esparta Palma | License: CC BY

From the comments on Slashdot I understand that some of the counting needs clarification. Here’s my try.

When you just count the number of addresses on the CD (by counting the number of @’s or, with one address on every line, the number of lines), you do not know how many addresses appear more than once. That is worth knowing, as you don’t want to send your message to one individual, say, fourteen times. So what I did was count how many times each distinct address appears on the CD.
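For those who prefer code, the idea can be sketched roughly as follows. This is only an illustration, not the script I actually used: the function name `appearance_table` and the sample addresses are made up, and it assumes two addresses are the same only when they match exactly as strings (no case folding or other clean-up).

```python
from collections import Counter

def appearance_table(addresses):
    """Return rows (times appearing, number of unique addresses, product),
    with the highest number of appearances first."""
    # Step 1: how many times does each distinct address appear?
    per_address = Counter(addresses)
    # Step 2: how many distinct addresses share each appearance count?
    histogram = Counter(per_address.values())
    return [(times, n, times * n)
            for times, n in sorted(histogram.items(), reverse=True)]

# Hypothetical sample: one address three times, one twice, one once.
sample = [
    "a@example.com", "b@example.com", "a@example.com",
    "c@example.com", "a@example.com", "b@example.com",
]
rows = appearance_table(sample)
for times, n, product in rows:
    print(times, n, product)
```

On this small sample the second column adds up to 3 (the distinct addresses) and the third column adds up to 6 (the number of lines in the sample).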

This counting resulted in the following table:

times appearing | number of unique addresses | number of addresses × number of appearances
14 | 2 | 28
13 | 2 | 26
12 | 2 | 24
11 | 9 | 99
10 | 4 | 40
9 | 9 | 81
8 | 47 | 376
7 | 97 | 679
6 | 697 | 4,182
5 | 1,830 | 9,150
4 | 27,191 | 108,764
3 | 287,685 | 863,055
2 | 4,107,246 | 8,214,492
1 | 1,795,633 | 1,795,633
total | 6,220,454 | 10,996,629


The second column tells you how many addresses appeared on the CD exactly as many times as the first column says. For example, it tells you that 1,830 addresses appeared 5 times on the CD and that 4,107,246 addresses appeared twice on the CD. The third column is the product of the first two columns.

Now, when you add up the second column, you have the number of addresses on the CD, each address counted only once (which is why I called them unique addresses): 6,220,454. When you add up the third column, you have the number of addresses on the CD, each counted once for every time it appears: 10,996,629. This is the total number of addresses on the CD.
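If you want to check the two totals yourself, a few lines of Python will do. The numbers are copied from the table; the variable names are just my choice for this sketch.

```python
# times appearing -> number of unique addresses (copied from the table)
table = {
    14: 2, 13: 2, 12: 2, 11: 9, 10: 4, 9: 9, 8: 47, 7: 97,
    6: 697, 5: 1_830, 4: 27_191, 3: 287_685,
    2: 4_107_246, 1: 1_795_633,
}

# Sum of the second column: every address counted once.
unique_total = sum(table.values())

# Sum of the third column: every address counted once per appearance.
overall_total = sum(times * n for times, n in table.items())

print(f"{unique_total:,}")   # 6,220,454
print(f"{overall_total:,}")  # 10,996,629
```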

Please get in touch if this is still explained badly… :)