How to tell if Spam Assassin is using my custom rules?

Postby copowpow » Fri Jun 14, 2019 6:00 pm

Hey guys, I have made some custom rules for SA and I can't tell if they are actually being used.

version: Release 8.8.8.GA.2009.UBUNTU16.64 UBUNTU16_64 FOSS edition, Patch 8.8.8_P10.

I created sauser.cf under /opt/zimbra/data/spamassassin/localrules/ and added some rules to look at header subject lines as a test, like this:

Code: Select all

header LOCAL_SUB1 Subject =~ /test/i
describe LOCAL_SUB1 Subject line spam keyword detected
score LOCAL_SUB1   10


The file has the correct permissions:

Code: Select all

-rw-r----- 1 zimbra zimbra  14K Jun 12 15:59 sauser.cf



To test, I copied an email and ran:

Code: Select all

spamassassin -D < /tmp/test.mail > /dev/null 2> /tmp/test.output


When I look at the test.output, I see:

Code: Select all

Jun 12 16:04:22.878 [10233] dbg: config: read file /opt/zimbra/data/spamassassin/localrules/sauser.cf



But when I scroll further down, I don't see my rule "LOCAL_SUB1" running by name.

I see lots of other rules firing like:

Code: Select all

Jun 12 16:04:24.260 [10233] dbg: rules: ran header rule __HAS_TO ======> got hit: "<YES>"


But I cannot tell if my custom rules are firing. Do custom rules show up by name in the "spamassassin -D" output? Does my formatting look correct? Any help would be greatly appreciated!


Re: How to tell if Spam Assassin is using my custom rules?

Postby copowpow » Fri Jun 14, 2019 6:08 pm

Let me also add the output of SA lint:

Code: Select all

z@test: /opt/zimbra/common/bin/spamassassin --lint
Jun 14 12:07:54.722 [22741] warn: netset: cannot include 127.0.0.0/8 as it has already been included
Jun 14 12:07:54.723 [22741] warn: netset: cannot include 0:0:0:0:0:0:0:1/128 as it has already been included


I don't think those warnings have anything to do with the custom rules.

Re: How to tell if Spam Assassin is using my custom rules?

Postby JDunphy » Fri Jun 14, 2019 8:07 pm

Those warnings are normal, and custom rules do show up in debug mode.

At the bottom of the debug output you can also see two lines: "check: is spam?", which lists the score, and "check: subtests=", which lists the tests that fired.
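
For anyone who wants to pull those lines out of a saved debug log such as /tmp/test.output, here is a minimal sketch. The function names sa_hits and sa_summary are made up for this example, and the "check: tests=" pattern assumes your SA version prints that line next to "check: subtests=", as 3.4.x appears to do.

```shell
#!/bin/sh
# Helpers for reading a saved "spamassassin -D" debug log.
# sa_hits and sa_summary are hypothetical names, not part of SpamAssassin.

# Print the names of rules that reported a hit, one per line.
sa_hits() {
    sed -n 's/.*dbg: rules: ran [a-z_]* rule \([A-Za-z0-9_]*\) ======> got hit.*/\1/p' "$1"
}

# Print the closing summary lines (score and the lists of tests that fired).
sa_summary() {
    grep -E 'dbg: check: (is spam\?|tests=|subtests=)' "$1"
}
```

Something like `sa_hits /tmp/test.output | grep '^LOCAL_'` should then list only the custom rules that hit, and `sa_summary /tmp/test.output` the final verdict.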

Other things to check:

Verify you are running this as the zimbra user, so that the environment is set up for debug mode and SA can find its config files. Example:

Code: Select all

# su - zimbra
% which spamassassin
~/common/bin/spamassassin

There is an order in which SA searches for its files. You might have 3.004002 or 3.00400. I am running 8.7.11 and updated mine to version 3.4.2:

Code: Select all

Default configuration data is loaded from the first existing directory in:
#
#    /opt/zimbra/data/spamassassin/state/3.004002
#    /opt/zimbra/data/spamassassin/rules
   ....
   ...

# Site-specific configuration data is used to override any values which
#    had already been set. This is loaded from the first existing directory in:
#
#    /opt/zimbra/data/spamassassin/localrules
#    /opt/zimbra/common/etc/mail/spamassassin
  ...
  ...

That is how it finds its files.

Re: How to tell if Spam Assassin is using my custom rules?

Postby copowpow » Fri Jun 21, 2019 8:06 pm

JDunphy, thank you for the reply. I really appreciate all you do here, man. You are extremely helpful to tons of people on here! Thank you!

I ended up constructing a test email that was guaranteed to set off my test rules, and lo and behold, I got some hits:

Code: Select all

Jun 21 13:37:52.538 [714] dbg: rules: ran header rule LOCAL_FROM9 ======> got hit: "Skin"
Jun 21 13:37:52.539 [714] dbg: rules: ran header rule LOCAL_SUB2 ======> got hit: "dental"
Jun 21 13:37:52.539 [714] dbg: rules: ran header rule LOCAL_SUB1 ======> got hit: "amazon"


I was expecting to see all of the custom rules listed in that output, but I guess a rule has to actually hit before it shows up in the SA debug output. So I am now positive my rules are being used.
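
A guaranteed-hit test message like the one described here can be generated rather than copied from a real mailbox. This is only a sketch: make_test_mail is a made-up helper and the example.com addresses are placeholders. The message is deliberately minimal, so stock rules about missing headers may fire as well.

```shell
#!/bin/sh
# Print a minimal test message whose Subject is the first argument.
# make_test_mail and the example.com addresses are placeholders for this sketch.
make_test_mail() {
    printf 'From: sender@example.com\n'
    printf 'To: recipient@example.com\n'
    printf 'Subject: %s\n' "$1"
    printf '\n'
    printf 'This is a test message body.\n'
}

# Example (run as the zimbra user): keep only the debug lines for LOCAL_ rules.
# make_test_mail "this is a test" | spamassassin -D 2>&1 >/dev/null | grep 'rule LOCAL_'
```

With the LOCAL_SUB1 rule from the first post, a subject containing "test" should produce a "got hit" line; no output would suggest the rule is not being evaluated.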

One more question for you: we get a lot of spam emails with the subject line (when "viewing original" in Zimbra) obscured like this:

Code: Select all

Subject: =?UTF-8?B?U2VlIHJlbW9kZWwgaWRlYXMgZm9yIHlvdXIgc21hbGwgYmF0aA==?=


My rule isn't catching it:

Code: Select all

header LOCAL_SUB38 Subject =~ /UTF/i
describe LOCAL_SUB38 Subject line spam keyword detected
score LOCAL_SUB38   10


Is there a way to deal with this type of thing? Or is my rule's regex just bad? I see lots of references to UTF encoding in the existing SA rules in the SA output, like:

Code: Select all

Jun 21 13:58:18.765 [13202] dbg: rules: ran header rule __SUBJECT_ENCODED_B64 ======> got hit: "=?UTF-8?B?"


Is it worth it to try and catch the obscured "Subject" field in a rule? What's your approach to that?

Re: How to tell if Spam Assassin is using my custom rules?

Postby JDunphy » Sat Jun 22, 2019 3:12 pm

You are very welcome. UTF-8 is normal; things like emojis or accented characters would result in this. If that isn't normal for your mail, then adjust the score for the rules that fire. Perhaps even better, create a meta rule with CHARSET_FARAWAY if you are attempting to stop foreign character sets. You can also decode the subject first, as SA does: while this subject looks odd to a human, the UTF-8 encoding isn't fooling any of the rules.

Code: Select all

echo 'Subject: =?UTF-8?B?U2VlIHJlbW9kZWwgaWRlYXMgZm9yIHlvdXIgc21hbGwgYmF0aA==?=' | ./k.pl
Subject: See remodel ideas for your small bath

Where k.pl is this:

Code: Select all

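#!/usr/bin/env perl
# Decode MIME encoded-words (e.g. =?UTF-8?B?...?=) in header lines read from STDIN.
use strict;
use warnings;
use open qw(:std :utf8);
use Encode qw(decode);

while (my $line = <STDIN>) {
        print decode("MIME-Header", $line);
}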
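
On why /UTF/i never matches: as far as I know, SA decodes MIME encoded-words before it runs ordinary header rules, so LOCAL_SUB38 is tested against "See remodel ideas for your small bath", which contains no "UTF". Appending :raw to the header name makes a rule see the undecoded header instead, which is how the stock __SUBJECT_ENCODED_B64 hit quoted earlier can report "=?UTF-8?B?". A minimal sketch follows. The rule names LOCAL_SUB39 and LOCAL_SUB40 are made up, the scores are low placeholders, and print_subject_rules is a hypothetical helper that only prints the rules so they can be reviewed first.

```shell
#!/bin/sh
# Print two demonstration rules; LOCAL_SUB39 and LOCAL_SUB40 are made-up names.
print_subject_rules() {
    cat <<'EOF'
# Matches the decoded subject, i.e. the text the recipient actually reads.
header   LOCAL_SUB39  Subject =~ /remodel ideas/i
describe LOCAL_SUB39  Subject keyword detected after MIME decoding
score    LOCAL_SUB39  0.1

# Matches the raw header, before MIME decoding, so "=?UTF-8?B?" is visible.
header   LOCAL_SUB40  Subject:raw =~ /=\?UTF-8\?B\?/i
describe LOCAL_SUB40  Subject is a base64 UTF-8 encoded-word
score    LOCAL_SUB40  0.1
EOF
}
```

Something like `print_subject_rules >> /opt/zimbra/data/spamassassin/localrules/sauser.cf` followed by `spamassassin --lint`, both as the zimbra user, would install and check them. Since base64 UTF-8 subjects are common in legitimate mail, the raw rule probably belongs in a meta rule rather than carrying a high score on its own.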

Note on CHARSET_FARAWAY: if it did fire, you can redefine the score for that rule in your salocal.cf. The rule is defined under this directory, so you can see how they do it:

Code: Select all

% pwd
/opt/zimbra/data/spamassassin/state/3.004001/updates_spamassassin_org

Another improvement that helps with foreign characters that may end up as UTF-8 encoded would be to enable TextCat. Here is what I have in my notes to do that:

Code: Select all

#---------------------------------------------------
Note: must modify /opt/zimbra/data/spamassassin/localrules/init.pre so the following is present

# TextCat for language detection support
#
loadplugin Mail::SpamAssassin::Plugin::TextCat

Then you add this to your salocal.cf to bring in a vast number of new rules to help.

Code: Select all

#------------------------------------------------
# Foreign Languages in Body
#------------------------------------------------
#
# Note: loadplugin Mail::SpamAssassin::Plugin::TextCat
#       has to be loaded in /opt/zimbra/data/spamassassin/localrules/init.pre
# setting up for english & Western character sets
ok_languages en es fr la
ok_locales en
textcat_max_languages   6
score UNWANTED_LANGUAGE_BODY    5.0
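
Since the ok_languages settings above only take effect when the plugin is actually loaded, a quick check of init.pre can save some head scratching. A minimal sketch, where textcat_enabled is a made-up helper name; it only looks for an uncommented loadplugin line and does not prove the plugin loads cleanly.

```shell
#!/bin/sh
# Succeed if the given .pre file contains an uncommented TextCat loadplugin line.
# textcat_enabled is a hypothetical helper name for this sketch.
textcat_enabled() {
    grep -Eq '^[[:space:]]*loadplugin[[:space:]]+Mail::SpamAssassin::Plugin::TextCat' "$1"
}

# Example:
# textcat_enabled /opt/zimbra/data/spamassassin/localrules/init.pre && echo "TextCat is loaded"
```

Running `spamassassin --lint` as the zimbra user afterwards should complain if ok_languages is not recognised.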

I know that a colleague ported a newer version of TextCat to SA 3.4.1, which apparently has made it into SA 3.4.2, but I think these steps would work for the installed but not activated version of TextCat that comes with Zimbra's SA 3.4.1. This allows you to set the languages you are OK with. We use that ability to fine-tune some of our custom rule sets.

SA is a remarkable engine and very simple to customize once you know a few key concepts. I keep a few aliases (csh) handy to pop around:

Code: Select all

alias vi-spam 'sudo vi /opt/zimbra/data/spamassassin/localrules/sauser.cf'
alias g-spam 'pushd /opt/zimbra/data/spamassassin/state/3.004001/updates_spamassassin_org'
alias g-spamassasin 'pushd /opt/zimbra/common/lib/perl5/Ma...'
alias vi-amvisd 'sudo vi /opt/zimbra/conf/amavisd.conf'
alias show-spam-stats '/opt/zimbra/common/bin/amavis-logwatch /var/log/zimbra.log'

Note: One key thing to keep in mind is that amavis (written in Perl) calls the SpamAssassin Perl modules directly. The spamassassin command-line tool that you run with the -D flag when testing your rules calls those same modules. Sometimes your rules will fire during testing but not in production. The reason is that amavisd has some extra checks that may change the logic so that it never calls the SpamAssassin modules, e.g. message size, whitelists, etc.

Finally, if you do this:

Code: Select all

# Zimbra Assumptions:
# Amavis at level 3 logging to see spam_scan lines in /var/log/zimbra.log to parse:
#   % zmprov ms `zmhostname` zimbraAmavisLogLevel 3
#   % zmantispamctl restart

Your zimbra.log will show additional information about your rules firing in production and the amavis-logwatch tool (which I learned from user dualcore in these forums) can tell you the frequency of your top rules for spam/ham, etc. I also have a script that uses this information so I can see how we are doing in production:

Code: Select all

#!/usr/bin/perl

#
# Zimbra Assumptions:
# Amavis at level 3 logging to see spam_scan lines in /var/log/zimbra.log to parse:
#   % zmprov ms `zmhostname` zimbraAmavisLogLevel 3
#   % zmantispamctl restart
#

use Data::Dumper qw(Dumper);
use Getopt::Long;

%Email_list = ();  #ip list
%SA_Rules_list = ();   #failed ip list
$audit_log = 0;   #todays logging

sub usage {

print <<"END";
usage: % check_spam.pl
      [--user=<username>]
      [--ham|h ]
      [--spam|s ]
      [--discard|d ]
      [--rules|r ]
      [--option|o ]
    requires one of
       --ham | --spam | --discard
    where
       --ham will display only ham
       --spam will display only spam
       --discard will display not delivered email due to scoring
       --rules DO NOT display SA rules that fired
       --user will display only email destined for that user
END
  exit 0;
}

#defaults
my $srchuser = '@';
my $ham = 0;
my $spam = 0;
my $discard = 0;
my $rules = 0;
my $help = 0;
my $dcount=0;
my $scount=0;
my $hcount=0;
my $tcount=0;

&GetOptions( "user=s" => \$srchuser,
              "ham" => \$ham,     # display ham
              "discard" => \$discard,  # display discarded not delivered email
              "rules" => \$rules, # do NOT display the SA rules that fired
              "spam" => \$spam,   # display spam
              "options" => \$help);    # print usage and exit

print "user is $srchuser rules[$rules] ham[$ham] spam[$spam] discard[$discard]\n";
my $nodisplayoptions=$ham + $spam + $discard;
usage() if($help || !$nodisplayoptions);

chdir "/var/log";

#for (glob 'zimbra.log*') {
for (glob 'zimbra.log') {

  # zimbra.log is always today's stuff; rotated logs are read through zcat
  #print "***** Opening file $_","\n";
  if ($_ eq 'zimbra.log')
  {
     $audit_log = 1;
     open (IN, sprintf("cat %s |", $_))
       or die("Can't open pipe from command 'cat $_' : $!\n");
  }
  else
  {
     $audit_log = 0;
     open (IN, sprintf("zcat %s |", $_))
       or die("Can't open pipe from command 'zcat $_' : $!\n");
  }

my $score=0;
my $tests="";
my $flag=0;

  while (<IN>)
  {
   # Available when in level 3 logging
   if (m#spam_scan#)
   {
      #print $_;
      ($score,$tests) = m#\s+score=(-?\d+\.?\d*).*tests=\[(.*)\]\s*#i;
      #print " - score is $score, tests is $tests \n";

      # %%% spam_scan lines can be consecutive, given these are multi-threaded writes from the amavisd's,
                #  resulting in lost records.
      #if ($flag) {print " - score is $score, tests is $tests \n";}
      $flag=1;

   }
   # Always available
        # Discarded spam
   elsif (m#DiscardedInbound# && ($flag == 1) && (m#Blocked#))
   {
      #print " - score is $score, tests is $tests \n";
      my($from,$to,$hits,$size) = m#[^<]+<([^>]+)>[^<]+<([^>]+)\>.*Hits:\s*(\d+\.?\d*),\s*size:\s+(.*)$#i;

                #by user
                next if(index($to,$srchuser) == -1);
      next if (!$discard);

      # Sanity check for working on same record
      if ($hits != $score) { next; }

      printf ("Score [%6s] To: %s From: %s\n", $score, $to, $from);
      printf ("      %s\n\n", $tests) if (!$rules);

      # reset, and look for next spam_scan line
      $score=0;
      $tests="";
      $flag=0;
      $dcount++;
   }
        # Ham
   elsif (m#spam-tag# && ($flag == 1) && (m#No#))
   {
      #print " - score is $score, tests is $tests \n";
      my($from,$to,$hits) = m#spam-tag,\s+\<+([^>]+)\>+\s+-\>\s+\<+([^>]+)\>+,\s+No,\s+score=(-?\d+\.?\d*)\s+.*$#i;

                #by user
                next if(index($to,$srchuser) == -1);
      next if (!$ham);

      # Sanity check for working on same record
      if ($hits != $score) { next; }

      #print $_;

      printf ("Score [%6s] To: %s From: %s\n", $score, $to, $from);
      printf ("      %s\n\n", $tests) if (!$rules);

      # reset, and look for next spam_scan line
      $score=0;
      $tests="";
      $flag=0;
      $hcount++;
   }
   # Spam but not discarded
   elsif (m#spam-tag# && ($flag == 1) && (m#Yes#))
   {
      #print " - score is $score, tests is $tests \n";
      my($from,$to,$hits) = m#spam-tag,\s+\<+([^>]+)\>+\s+-\>\s+\<+([^>]+)\>+,\s+Yes,\s+score=(-?\d+\.?\d*)\s+.*$#i;

                #by user
                next if(index($to,$srchuser) == -1);
      next if (!$spam);

      # Sanity check for working on same record
      if ($hits != $score) { next; }

      #print $_;

      printf ("Score [%6s] To: %s From: %s\n", $score, $to, $from);
      printf ("      %s\n\n", $tests) if (!$rules);

      # reset, and look for next spam_scan line
      $score=0;
      $tests="";
      $flag=0;
      $scount++;
   }
  }
  close (IN);

}

$tcount += $dcount if ($discard);
$tcount += $scount if ($spam);
$tcount += $hcount if ($ham);

printf ("\nTotal counts: $tcount");
printf (" Discarded Email: $dcount") if ($discard);
printf (" Spam Email: $scount") if ($spam);
printf (" Ham Email: $hcount") if ($ham);
printf ("\n");

Usage:

Code: Select all

% check_rejected_spam.pl --discard | head -6
user is @ rules[0] ham[0] spam[0] discard[1]
Score [43.003] To: jim.dunphy@example.com From: catherinethomas@ellost.icu
      BAYES_99=4,BAYES_999=0.2,BL_ZEN_SPAMHAUS=1,FROM_FMBLA_NEWDOM=1.499,HTML_IMAGE_ONLY_24=1.618,HTML_MESSAGE=0.001,HTTP_IN_BODY=0.1,J_BL_IVMURI=4,J_BL_ZEN_SPAMHAUS=3,J_DOMAIN_SPAM_TLD=2.5,J_IMG_NO_EXTENS=0.1,J_MIME_BASE64_TEXT=1,J_RCVD_IN_TRUNCATE=2,J_SORBS_BL=0.1,J_TRACKING_SORBS=0.5,J_TRACKING_SPAM=2,J_TRACKING_SPAM0=2,J_URI_DOMAIN_BAD=0.1,J_URI_DOMAIN_TLD=2.5,MIME_BASE64_TEXT=0.01,MPART_ALT_DIFF=0.79,RCVD_IN_IVMSIP=3,RCVD_IN_IVMSIP24=2,RCVD_IN_SBL_CSS=3.335,SPF_HELO_PASS=-0.001,URIBL_ABUSE_SURBL=1.25,URIBL_BLACK=1.7,URIBL_CSS=0.1,URIBL_CSS_A=0.1,URIBL_DBL_SPAM=2.5,URIBL_IVMURI=0.001
     
% check_rejected_spam.pl --discard --rules | head -2
user is @ rules[1] ham[0] spam[0] discard[1]
Score [43.003] To: jim.dunphy@example.com From: catherinethomas@ellost.icu
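
For a quick look at which rules fire most often in production without the full script, the same spam_scan lines can be tallied with standard tools. This is a rough sketch and not a replacement for amavis-logwatch. It assumes, as the regex in the script above does, that each spam_scan line carries a tests=[NAME=score,...] list; top_rules is a made-up function name.

```shell
#!/bin/sh
# Count how often each rule name appears in the tests=[...] list of spam_scan lines.
# $1 = log file, $2 = number of rules to show (default 10).
# top_rules is a hypothetical helper name for this sketch.
top_rules() {
    grep 'spam_scan' "$1" |
        sed -n 's/.*tests=\[\(.*\)].*/\1/p' |
        tr ',' '\n' |
        sed -e 's/=.*//' -e '/^$/d' |
        sort | uniq -c | sort -rn | head -n "${2:-10}"
}

# Example: top_rules /var/log/zimbra.log 20
```

Each output line is a count followed by a rule name, so custom LOCAL_ rules that never show up in the list are worth a second look.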


Until a few years ago, we did very little with salocal.cf. Fortunately, SA is super simple to customize, very easy to extend with its plugin architecture, and a great tool in the fight against spam/malware delivery. Given its age, there is loads of documentation and examples for just about any problem. Zimbra offers an ideal platform to showcase and customize SA, with tools to pull the emails your users trained as spam/ham every night to further fine-tune your server and rules.

end of brain dump and time for coffee. :-)

Jim
