As one of the people behind the anti-spam foundation spamvrij.nl, I am frequently asked whether these CDs (with millions of e-mail addresses) breach the Personal Data Protection Act (see Dutch Data Protection Authority). That is one reason to investigate; the other is that I am curious myself whether, and to what extent, spammers break these laws.
I have tried to classify a random selection of the addresses on the CD into personal and non-personal data. This proved to be very difficult, and someone else would most likely arrive at a different outcome with the very same CD.
Personal data and the classification of addresses
In the Netherlands, special precautions are needed as soon as you start to gather and record personal data. This applies to all information that can be considered personal data. It is personal data if it can be linked, directly or indirectly, to a specific person (for the Dutch readers among you: “gegevens die herleidbaar zijn tot een natuurlijk persoon”). This includes e-mail addresses.
It is difficult to tell whether an address counts as “personal data”. One problem is that the answer to the question of whether an address can be linked to a specific person depends on whom you ask. You, me, the spammer: we will all give a different answer than the provider of that address would. And even then, it is hard for you and me to tell whether an address counts as personal data, and whether a judge, or the DPA for that matter, would agree.
In order to have something to hold on to, I followed these requirements:
- the username must clearly be a first name, a last name, or a combination of the two; or
- the username must be a combination of initials and a last name; or
- the username must look like the initials of a first and last name, within a non-company domain.
Addresses with a username of just three letters that are identical have not been considered initials (so avdk@ was classified as personal data and eeee@ was not). Addresses clearly identifiable as role accounts or nicknames are of course considered non-personal. If in doubt, the address is classified as non-personal.
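To make the requirements above a little more concrete, here is a minimal Python sketch of how they might be mechanised. It is not how the classification was actually done: that was manual work by people who can recognise names. The lookup sets (ROLE_ACCOUNTS, KNOWN_NAMES, COMPANY_DOMAINS), the function name is_personal, and the length limit of two to four letters for initials are all assumptions made for illustration only.

```python
import re

# Hypothetical lookup tables. In the article a human reader made these calls;
# the tiny sets below only stand in for that judgement.
ROLE_ACCOUNTS = {"info", "sales", "webmaster", "postmaster", "abuse"}
KNOWN_NAMES = {"jan", "anna", "jansen", "bakker"}
COMPANY_DOMAINS = {"company.example"}


def is_personal(address):
    """Return True only when one of the three requirements clearly applies."""
    username, _, domain = address.lower().partition("@")
    # Malformed addresses and role accounts are non-personal.
    if not username or not domain or username in ROLE_ACCOUNTS:
        return False

    parts = [p for p in re.split(r"[._-]", username) if p]

    # Requirement 1: a first name, a last name, or a combination of both.
    if parts and all(p in KNOWN_NAMES for p in parts):
        return True

    # Requirement 2: initials separated from a last name by '.', '_' or '-'.
    if (len(parts) >= 2 and parts[-1] in KNOWN_NAMES
            and all(p.isalpha() and len(p) <= 2 for p in parts[:-1])):
        return True

    # Requirement 3: looks like initials (assumed here: 2 to 4 letters, not
    # all identical), and only within a non-company domain.
    looks_like_initials = (username.isalpha() and 2 <= len(username) <= 4
                           and len(set(username)) > 1)
    if looks_like_initials and domain not in COMPANY_DOMAINS:
        return True

    # If in doubt, the address is classified as non-personal.
    return False
```

With these assumptions, is_personal("avdk@isp.example") is true and is_personal("eeee@isp.example") is false, matching the two examples given above. Nicknames are not detected at all here; anything that matches none of the rules simply falls through to non-personal.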
The following procedure was followed to get a random sample:
- all 6,218,344 addresses have been given an ID
- 900 random numbers have been generated using perl’s rand() function
- corresponding addresses have been picked out
- a group of four people each classified one or more batches of 100 addresses
- a second round of checking was done by me, only changing classifications to non-personal
- a third round of checking was done by someone else, again only changing classifications to non-personal
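The sampling steps of this procedure could be sketched roughly as follows. The original used perl’s rand() function; this version uses Python’s random.sample instead, which draws without replacement, so no ID can be picked twice. The function names and the seed value are hypothetical, and the actual picking of addresses by ID is left out.

```python
import random

TOTAL_ADDRESSES = 6_218_344  # number of IDs handed out, from the article
SAMPLE_SIZE = 900
BATCH_SIZE = 100


def draw_sample_ids(total, sample_size, seed=None):
    """Draw distinct IDs in the range 1..total, returned in ascending order."""
    rng = random.Random(seed)
    return sorted(rng.sample(range(1, total + 1), sample_size))


def split_into_batches(ids, batch_size):
    """Cut the list of IDs into consecutive batches for the reviewers."""
    return [ids[i:i + batch_size] for i in range(0, len(ids), batch_size)]


# The seed is an arbitrary illustrative value; it only makes the draw repeatable.
sample_ids = draw_sample_ids(TOTAL_ADDRESSES, SAMPLE_SIZE, seed=2003)
batches = split_into_batches(sample_ids, BATCH_SIZE)
print(len(sample_ids), "IDs in", len(batches), "batches")
```

Sorting the IDs is just a convenience: it allows the matching addresses to be picked out in a single pass over the numbered list.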
This resulted in 120 out of 900 addresses that were classified as personal data.
In other words: 13.33% of the addresses can be considered personal data, which extrapolates to 829,112 of the 6,218,344 valid unique addresses on the entire CD. Even if the Dutch DPA were to rule that only half of these addresses are actually personal data, we are still talking about 414,556 addresses!
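The arithmetic behind these figures can be reproduced with a few lines of Python. The confidence interval at the end is an addition of mine and was not part of the original investigation: it uses the standard normal approximation for a proportion and covers sampling error only, not disagreement about how to classify an address.

```python
import math

SAMPLE_SIZE = 900
PERSONAL_IN_SAMPLE = 120
TOTAL_ADDRESSES = 6_218_344

# Share of the sample classified as personal data.
proportion = PERSONAL_IN_SAMPLE / SAMPLE_SIZE

# Extrapolate to the whole CD, rounding down to whole addresses.
estimate = PERSONAL_IN_SAMPLE * TOTAL_ADDRESSES // SAMPLE_SIZE
half = estimate // 2

# Assumption: 95% interval via the normal approximation (z = 1.96).
std_error = math.sqrt(proportion * (1 - proportion) / SAMPLE_SIZE)
low = proportion - 1.96 * std_error
high = proportion + 1.96 * std_error

print(f"share in sample:  {proportion:.2%}")
print(f"extrapolated:     {estimate:,}")
print(f"half of that:     {half:,}")
print(f"95% interval:     {low:.1%} to {high:.1%}")
```

Under that approximation the share of personal addresses on the CD would lie somewhere between roughly 11% and 16%, so the sample size is not the weak point here; the classification criteria are.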
Of course I contacted the Dutch DPA to ask about the precise criteria for classification. At the time, however, the DPA was aware of these CDs but had not yet started to put together a policy, let alone criteria for classifying addresses (for my situation). To assist them I sent them a copy of one of these CDs; they did not have one back then. Some months later they still had not settled on a policy or criteria. This makes it rather hard to tell whether my classification even comes close to theirs.
And with the DPA lacking a clear policy, I don’t believe they will be very helpful in solving (a part of) the spam problem. There are a couple of articles on this, focusing on the role the Dutch DPA could play in the fight against spam. Dutch readers may be interested in De moeizame strijd tegen spam by mr A.R. Lodder and mr J.P.R. Bergfeld and Het einde van Spam? Regulering van ongevraagde e-mail by mr Ch. Alberdingk Thijm.