Fuck you, spam
2008-07-09 Wed – 13:20:07
And this, folks, is why spam sucks so fucking much:
Junk Filter Statistics
The junk mail filter has been trained by 46868 messages, whereof 28127 (60%) have been rated as solicited and 18741 (40%) as junk. This resulted in a total of 699360 tokens read, 349780 (50%) rated as good and 349580 (50%) as evil; the number of different tokens is 622268.
The following table will show the 12 most common tokens, hiding 622256 tokens below the threshold of 24060 appearances.
(Processing this training.dat of 24060562 bytes took 780.35 seconds.)
 #  Token                          Good   Evil  Junk probability
 1  mime-version:1.0              13818  17248  65.20 %
 2  for                           16579   9394  45.96 %
 3  the                           21511  11836  45.23 %
 4  envelope-to:pgl@yoyo.org      22742  16294  51.81 %
 5  content-type/type:text/plain  22016   6247  29.87 %
 6  you                           19875   9279  41.20 %
 7  with                          17209   8710  43.17 %
 8  this                          17078   9004  44.17 %
 9  x-mozilla-status2:00000000    10897  13810  65.54 %
10  that                          17419   7984  40.76 %
11  x-mozilla-status:0001         10846  13810  65.65 %
12  and                           20086  10530  44.03 %
That's right, there's a greater than 40% chance any mail I receive is going to be spam if it contains any of these words: "with", "that", "and", "for", "the", or "you". 40% for any of them. Sigh.
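For the curious, here is a rough sketch of where the junk probability column seems to come from. The filter's actual formula isn't shown anywhere in the stats, so this is a guess: it assumes each token's good and evil counts are divided by the number of solicited (28127) and junk (18741) training messages before being compared, which is a common Bayesian-filter convention. The function name `junk_probability` and the constants are made up for illustration. A plain evil / (good + evil) ratio does not match the table, while this normalised version does for the rows I checked.

```python
# Assumption: the filter normalises each token count by the number of
# training messages of that kind before comparing them. This is inferred
# from the figures in the table, not taken from the filter's source.
GOOD_MESSAGES = 28127  # messages rated as solicited
JUNK_MESSAGES = 18741  # messages rated as junk


def junk_probability(good_count, evil_count,
                     good_messages=GOOD_MESSAGES,
                     junk_messages=JUNK_MESSAGES):
    """Return the estimated probability (0..1) that a token indicates junk."""
    good_rate = good_count / good_messages
    evil_rate = evil_count / junk_messages
    return evil_rate / (good_rate + evil_rate)


# (good, evil) counts copied from the table above.
TOKENS = {
    "mime-version:1.0": (13818, 17248),
    "for": (16579, 9394),
    "the": (21511, 11836),
    "you": (19875, 9279),
}

for token, (good, evil) in TOKENS.items():
    print(f"{token:20s} {junk_probability(good, evil) * 100:.2f} %")
```

Run as is, this gives 65.20 %, 45.96 %, 45.23 % and 41.20 % for those four tokens, the same figures as in the table.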
Tags: email, junk filter, spam, stats