We’re getting a lot of questions about spam, so I thought I’d go over what we are doing about the problem.
Please be aware that the following information applies to @mail.usf.edu accounts only. If you have an @eng.usf.edu or @stpt.usf.edu account, you can use WebMail, but none of these anti-spam features are available to you.
The Problem
By now everyone has heard of (and received) spam, so I’m not going to explain what it is, but I want to give you some perspective on the size of the problem we are dealing with. We receive around 300,000 email messages on an average day, and we’ve had peaks of over 500,000 per day. That’s a lot of mail, and scanning each message for viruses and spam is very CPU-intensive. Spam scanning is especially hard: because spam comes in a nearly infinite number of variations, thousands of tests have to be run on each message. Up until now, scanning was done on the mail server itself, just before the message was placed in your mailbox. This was sufficient when the mail server was put into production back in 2004, but we were only receiving about 100,000 messages per day then. To let the mail server keep up with the increased workload since then, we’ve had to cut down on the number of spam tests we run on each message, which limited the effectiveness of the filters.
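To put those numbers in perspective, here is a quick back-of-the-envelope calculation in Python. It assumes mail arrives evenly over 24 hours, which it doesn’t (real traffic is bursty), so actual peak rates are higher than these averages.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def per_second(messages_per_day: int) -> float:
    """Average arrival rate, assuming mail is spread evenly over the day."""
    return messages_per_day / SECONDS_PER_DAY


# Daily volumes taken from the figures above.
for label, daily in [("2004", 100_000), ("average day", 300_000), ("peak day", 500_000)]:
    print(f"{label:>12}: {daily:>7,} msgs/day = {per_second(daily):.1f} msgs/sec")
```

Even on an average day that works out to more than three messages every second, and each one has to go through thousands of spam tests.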
The Solution
Just before the Fall semester, we moved to a different architecture: scanning is now done on a separate set of machines (called MailGate), which then hand the messages to the mail server for final delivery. The new system is working really well, and with your help (more on that later), it will get even better. However, it is not perfect. Some spam will still get through, but MailGate will make a huge difference in the amount of spam you receive. It reduces spam in several ways:
Blacklisting
The first step in combating spam happens before a message has even been transferred. When an email server contacts MailGate to send a message, MailGate checks the server against several blacklists; if the server is listed, the connection is refused and no mail is transferred. MailGate also refuses connections from badly misconfigured or non-RFC-compliant mail servers, which are usually spam zombies (compromised computers that spammers control remotely).
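This post doesn’t say which blacklists MailGate checks, but IP blacklists are commonly published as DNS blacklists (DNSBLs), so here is a minimal Python sketch assuming that kind of list. To check a connecting server, you reverse the octets of its IP address, append the list’s zone name, and do an ordinary DNS lookup; any answer means the address is listed. The zone `example-dnsbl.test` is a hypothetical name, and `dnsbl_query_name`/`is_listed` are illustrative helpers, not MailGate’s code.

```python
import ipaddress
import socket


def dnsbl_query_name(ip: str, zone: str) -> str:
    """Build the DNS name used to look up an IPv4 address in a DNS blacklist.

    The octets are reversed and the blacklist's zone is appended, e.g.
    192.0.2.99 checked against example-dnsbl.test -> 99.2.0.192.example-dnsbl.test
    """
    addr = ipaddress.IPv4Address(ip)  # raises ValueError on a malformed address
    reversed_octets = ".".join(reversed(str(addr).split(".")))
    return f"{reversed_octets}.{zone}"


def is_listed(ip: str, zone: str) -> bool:
    """Return True if the blacklist zone answers with any A record for this IP."""
    try:
        socket.gethostbyname(dnsbl_query_name(ip, zone))
        return True
    except socket.gaierror:
        # NXDOMAIN or a failed lookup: treat the server as not listed.
        return False
```

A mail gateway would loop over its configured zones and refuse the SMTP connection if `is_listed` returns True for any of them. Treating a failed lookup as “not listed” is a deliberate choice here: if a blacklist is unreachable, mail keeps flowing instead of being refused.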
Virus Scanning
At this point, MailGate looks at the message and determines which files (if any) are attached. All files that are executable on Windows (.exe, .bat, etc.) are automatically rejected, because most email-borne viruses use these file formats. If you need to send an executable file for some reason, put it into a “zip” archive to get past this check. Any file that is not an executable is sent to the virus scanner; archives are unpacked at this point and their contents are scanned as well. If all of the contents are virus-free, the message is ready for spam scanning.
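The attachment check can be sketched in Python like this. The blocklist entries beyond .exe and .bat are a hypothetical example, and `is_blocked`/`files_to_scan` are illustrative names. The sketch only unpacks one level of .zip archive; a real scanner would likely also handle nested archives and other archive formats.

```python
import io
import zipfile
from pathlib import PurePath

# Hypothetical blocklist; the post itself only names .exe and .bat.
BLOCKED_EXTENSIONS = {".exe", ".bat", ".com", ".scr", ".pif", ".cmd"}


def is_blocked(filename: str) -> bool:
    """True if the attachment's extension is on the executable blocklist."""
    return PurePath(filename).suffix.lower() in BLOCKED_EXTENSIONS


def files_to_scan(filename: str, data: bytes) -> list[tuple[str, bytes]]:
    """Return (name, bytes) pairs to hand to the virus scanner.

    A .zip passes the extension check as a whole, but its members are
    unpacked so the scanner sees every file inside.
    """
    if PurePath(filename).suffix.lower() == ".zip":
        with zipfile.ZipFile(io.BytesIO(data)) as zf:
            return [(info.filename, zf.read(info))
                    for info in zf.infolist() if not info.is_dir()]
    return [(filename, data)]
```

This mirrors the behavior described above: a .zip passes the extension check, so an executable inside it gets past that step, but its contents still go to the virus scanner.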
Rules-Based Spam Scoring
We use SpamAssassin to determine whether a message is spam. SpamAssassin (SA) uses thousands of rules and text patterns to make this determination. In addition to SA’s built-in rules, we use rule sets that are updated daily to detect the latest types of spam. We also use Razor and DCC, which are large, collaboratively maintained spam databases that messages can be checked against. Each rule has a “spam score” associated with it; once the message has been tested against all of the rules, the scores of the rules it matched are added up. If this total is greater than 5.0 (this threshold may change at some point), the message is considered spam.
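Rules-based scoring boils down to “add up the scores of every rule that matches, then compare the total against a threshold.” Here is a toy Python version. The three rules and their scores are made up for illustration and are not SpamAssassin rules; only the 5.0 threshold comes from this post. Real SA rules also look at headers, HTML structure, and network tests such as Razor and DCC, not just regular expressions over the text.

```python
import re
from dataclasses import dataclass

SPAM_THRESHOLD = 5.0  # the cutoff mentioned above; it may change


@dataclass
class Rule:
    name: str
    pattern: re.Pattern
    score: float


# Hypothetical rules and scores, for illustration only (not SpamAssassin's).
RULES = [
    Rule("FREE_MONEY", re.compile(r"free money", re.I), 2.5),
    Rule("ACT_NOW", re.compile(r"act now", re.I), 1.5),
    Rule("ALL_CAPS_SUBJECT", re.compile(r"^Subject: [A-Z0-9 !]+$", re.M), 1.6),
]


def score_message(text: str) -> tuple[float, list[str]]:
    """Sum the scores of every rule whose pattern matches the message."""
    hits = [r for r in RULES if r.pattern.search(text)]
    return sum(r.score for r in hits), [r.name for r in hits]


def is_spam(text: str) -> bool:
    """Flag the message if its total score is greater than the threshold."""
    total, _ = score_message(text)
    return total > SPAM_THRESHOLD
```

Under these made-up rules, a message with an all-caps subject that also says “free money” and “act now” scores 2.5 + 1.5 + 1.6 = 5.6 and is flagged, while either phrase on its own stays well below 5.0.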
Bayesian Filtering
In addition to the rules-based spam scoring, SA also uses Bayesian filtering to determine the spam score. I’m not going to go into all the details, but basically a Bayesian filter “learns” what you consider spam and non-spam (“ham” in SA terms). For a Bayesian filter to work, however, you must train it. Here’s where you come in. You may not have noticed, but there is a new link in WebMail when you are reading a message: “Mark as Spam”. This link sends the message to MailGate’s Bayesian filter to train it to recognize that message as spam. There is a similar link (“Mark as Non-Spam”) on every message in your SPAM folder, which trains the filter to recognize legitimate mail. Whenever the spam filter misses a spam message, make sure to mark it as spam, and whenever it mistakenly marks valid mail as spam, make sure to mark it as non-spam.
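To give a feel for what “training” means, here is a simplified naive Bayes classifier in Python. It is not SpamAssassin’s actual Bayes implementation (SA also tokenizes headers and combines token probabilities differently); it only shows the core idea: every message you mark as spam or non-spam updates per-word counts, and new messages are scored by how much more often their words appeared in spam than in ham. `BayesFilter` and its methods are hypothetical names.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9$']+", text.lower())


class BayesFilter:
    """Minimal naive Bayes spam/ham classifier (illustrative, not SA's algorithm)."""

    def __init__(self) -> None:
        self.counts = {"spam": Counter(), "ham": Counter()}  # messages containing each token
        self.docs = {"spam": 0, "ham": 0}                    # messages trained per class

    def train(self, text: str, label: str) -> None:
        """What a 'Mark as Spam' ("spam") or 'Mark as Non-Spam' ("ham") click would feed in."""
        self.counts[label].update(set(tokenize(text)))
        self.docs[label] += 1

    def spam_probability(self, text: str) -> float:
        """Estimated probability (0..1) that the message is spam."""
        vocab = set(self.counts["spam"]) | set(self.counts["ham"])
        # Smoothed prior odds from how many spam vs. ham messages were trained.
        log_odds = math.log((self.docs["spam"] + 1) / (self.docs["ham"] + 1))
        for tok in set(tokenize(text)):
            if tok not in vocab:
                continue  # never-seen words carry no evidence
            # Laplace-smoothed P(token appears | class)
            p_spam = (self.counts["spam"][tok] + 1) / (self.docs["spam"] + 2)
            p_ham = (self.counts["ham"][tok] + 1) / (self.docs["ham"] + 2)
            log_odds += math.log(p_spam / p_ham)
        # Numerically stable logistic function: log-odds -> probability.
        if log_odds >= 0:
            return 1 / (1 + math.exp(-log_odds))
        e = math.exp(log_odds)
        return e / (1 + e)
```

With no training at all, every message scores 0.5 (no evidence either way), which is why the “Mark as Spam” and “Mark as Non-Spam” links matter. SpamAssassin likewise ignores its Bayes results until a minimum number of spam and ham messages have been learned (the `bayes_min_spam_num` and `bayes_min_ham_num` settings).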
Delivery
Once all of these filters have run, the message is finally sent to mail.usf.edu for delivery. If you have spam filtering enabled, messages marked as spam by MailGate are moved into your SPAM folder; if not, they are delivered to your mailbox as usual. Again, this will NOT catch every spam! For me, it’s catching about 97% right now, and with more training it should catch over 99%.
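The delivery step is a simple routing decision, and the catch rates translate directly into what you’ll see in your inbox. In this Python sketch, `destination` and `missed_spam` are illustrative helpers; “INBOX” is the standard IMAP name for the main mailbox, and `spam_folder` stands in for whatever destination you choose in the spam filtering options.

```python
def destination(is_spam: bool, filtering_enabled: bool, spam_folder: str = "SPAM") -> str:
    """Pick the folder for a message arriving from MailGate."""
    if filtering_enabled and is_spam:
        return spam_folder
    return "INBOX"  # filtering off, or the message wasn't flagged


def missed_spam(spam_received: int, catch_rate: float) -> float:
    """Expected number of spam messages that still reach the inbox."""
    return spam_received * (1 - catch_rate)
```

For example, if 100 spams are sent to you, a 97% catch rate still lets about 3 into your inbox; at 99% it’s about 1.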
To make sure that you have the spam filtering enabled:
- Log in to WebMail
- Click on Options
- Click on Spam Filtering
- Choose the destination for your spam
- Click on Update Spam Filter Action