Hello,

We have been noticing an increasing amount of spam that is not being tagged as spam and automatically moved to the Junk folder. I started researching and found some things that don't quite make sense to me.

1. I noticed that the Zimbra daily cron job for zmtrainsa and zmtrainsa --cleanup does not process our entire spam account. When I log into the spam account, I see messages from the previous day that were not processed and cleaned up.

If I then run zmtrainsa manually, I start to see different results from run to run, even though I'm only waiting about a minute between each run of zmtrainsa and zmtrainsa --cleanup:

20090513115445 Starting spam/ham extraction from system accounts.
[] INFO: Total messages processed: 139
[] INFO: Total messages processed: 0
20090513115449 Finished extracting spam/ham from system accounts.
20090513115449 Starting spamassassin training.
netset: cannot include 127.0.0.0/8 as it has already been included
Learned tokens from 0 message(s) (130 message(s) examined)
netset: cannot include 127.0.0.0/8 as it has already been included
Learned tokens from 0 message(s) (0 message(s) examined)
netset: cannot include 127.0.0.0/8 as it has already been included
bayes: synced databases from journal in 0 seconds: 399 unique entries (412 total entries)
20090513115452 Finished spamassassin training.
[zimbra@ bin]$ ./zmtrainsa
20090513115500 Starting spam/ham extraction from system accounts.
[] INFO: Total messages processed: 139
[] INFO: Total messages processed: 0
20090513115504 Finished extracting spam/ham from system accounts.
20090513115504 Starting spamassassin training.
netset: cannot include 127.0.0.0/8 as it has already been included
Learned tokens from 0 message(s) (130 message(s) examined)
netset: cannot include 127.0.0.0/8 as it has already been included
Learned tokens from 0 message(s) (0 message(s) examined)
netset: cannot include 127.0.0.0/8 as it has already been included
20090513115507 Finished spamassassin training.
[zimbra@ bin]$ ./zmtrainsa
20090513115531 Starting spam/ham extraction from system accounts.
[] INFO: Total messages processed: 139
[] INFO: Total messages processed: 0
20090513115535 Finished extracting spam/ham from system accounts.
20090513115535 Starting spamassassin training.
netset: cannot include 127.0.0.0/8 as it has already been included
Learned tokens from 0 message(s) (130 message(s) examined)
netset: cannot include 127.0.0.0/8 as it has already been included
Learned tokens from 0 message(s) (0 message(s) examined)
netset: cannot include 127.0.0.0/8 as it has already been included
bayes: synced databases from journal in 0 seconds: 196 unique entries (225 total entries)
20090513115539 Finished spamassassin training.

Notice that the unique entries and total entries counts change each time; sometimes they go up and sometimes they go down.
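To compare the runs side by side, something like the following could tabulate the numbers from the zmtrainsa output. This is only a rough sketch: the function name summarize_runs and the SAMPLE_LOG variable are my own, the sample is the first run pasted above, and the patterns assume the output always looks like it does in this post.

```python
import re

# Patterns assume the exact wording seen in the zmtrainsa output above.
START = re.compile(r"^\d{14} Starting spam/ham extraction")
LEARNED = re.compile(
    r"Learned tokens from (\d+) message\(s\) \((\d+) message\(s\) examined\)"
)
SYNCED = re.compile(
    r"synced databases from journal in \d+ seconds: "
    r"(\d+) unique entries \((\d+) total entries\)"
)


def summarize_runs(text):
    """Return one dict per zmtrainsa run found in the pasted output."""
    runs = []
    current = None
    for line in text.splitlines():
        if START.match(line):
            # A new run begins; "journal" stays None if no sync line appears.
            current = {"learned": 0, "examined": 0, "journal": None}
            runs.append(current)
            continue
        if current is None:
            continue
        match = LEARNED.search(line)
        if match:
            current["learned"] += int(match.group(1))
            current["examined"] += int(match.group(2))
            continue
        match = SYNCED.search(line)
        if match:
            current["journal"] = (int(match.group(1)), int(match.group(2)))
    return runs


# First run from this post, copied verbatim.
SAMPLE_LOG = """\
20090513115445 Starting spam/ham extraction from system accounts.
[] INFO: Total messages processed: 139
[] INFO: Total messages processed: 0
20090513115449 Finished extracting spam/ham from system accounts.
20090513115449 Starting spamassassin training.
netset: cannot include 127.0.0.0/8 as it has already been included
Learned tokens from 0 message(s) (130 message(s) examined)
netset: cannot include 127.0.0.0/8 as it has already been included
Learned tokens from 0 message(s) (0 message(s) examined)
netset: cannot include 127.0.0.0/8 as it has already been included
bayes: synced databases from journal in 0 seconds: 399 unique entries (412 total entries)
20090513115452 Finished spamassassin training.
"""

for number, run in enumerate(summarize_runs(SAMPLE_LOG), start=1):
    print(number, run)
```

Feeding it all three runs would make it easy to see that the examined count stays at 130 while the learned count stays at 0, and that the journal sync line is missing from the second run.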

Then, when I run zmtrainsa --cleanup, it only cleans up a batch of messages at a time (between 30 and 79 in the runs below):

[zimbra@ bin]$ ./zmtrainsa --cleanup
20090513115619 Starting spam/ham cleanup
[] INFO: Total messages processed: 79
[] INFO: Total messages processed: 0
20090513115622 Finished spam/ham cleanup
[zimbra@ bin]$ ./zmtrainsa --cleanup
20090513115659 Starting spam/ham cleanup
[] INFO: Total messages processed: 30
[] INFO: Total messages processed: 0
20090513115702 Finished spam/ham cleanup
[zimbra@ bin]$ ./zmtrainsa --cleanup
20090513115714 Starting spam/ham cleanup
[] INFO: Total messages processed: 30
[] INFO: Total messages processed: 0
20090513115717 Finished spam/ham cleanup

I do not understand why either of these problems is happening, or whether SpamAssassin is being trained properly.

2. How exactly does the tagging work? I thought it builds a database from all the messages in the spam account and then scores each incoming message according to how closely it matches that database. When I was checking headers, I found one user who received a spam mail containing the word "orgasm" that only got a score of -1. We have been feeding the trainer for about 2 weeks now with 50 users, so I would expect the database to have information on an obvious spam mail such as this. Why would this email only get a score of -1?
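To see which rules added up to a score like that, the X-Spam-Status header can be broken apart. The sketch below assumes the bracketed tests=[RULE=points, ...] layout; the header in SAMPLE_HEADER is a made-up example, not the one from the message in question, and parse_spam_status is just a name I picked.

```python
import re


def parse_spam_status(header):
    """Split an X-Spam-Status header into (total score, {rule: points}).

    Assumes the bracketed form tests=[RULE=points, ...]; other layouts
    will simply yield an empty rule dict.
    """
    # Unfold continuation lines into a single space-separated string.
    value = " ".join(header.split())
    score = re.search(r"score=(-?\d+(?:\.\d+)?)", value)
    tests = re.search(r"tests=\[([^\]]*)\]", value)
    rules = {}
    if tests:
        for item in tests.group(1).split(","):
            name, sep, points = item.strip().partition("=")
            if name and sep:
                rules[name] = float(points)
    total = float(score.group(1)) if score else None
    return total, rules


# Hypothetical header for illustration only.
SAMPLE_HEADER = (
    "X-Spam-Status: No, score=-1.0 tagged_above=-10 required=6.6\n"
    "\ttests=[BAYES_00=-2.599, HTML_MESSAGE=0.001, SUBJ_ALL_CAPS=1.598]"
)

total, rules = parse_spam_status(SAMPLE_HEADER)
# List the rules from most negative to most positive contribution.
for name in sorted(rules, key=rules.get):
    print(name, rules[name])
```

If a BAYES_ rule with a negative value shows up in the real header, that would suggest the Bayes database considers the message ham, which would tie back to the "Learned tokens from 0 message(s)" lines in the training output.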

Thanks