Dockerized Spam Filtering With Rspamd
If you are running a basic mailserver, either from my guide on running a dockerized mailserver or some setup of your own, you will soon be bombarded with spam messages.
In this post we explore a solution based on a dockerized instance of the mail filter system Rspamd. Our Postfix (or your MTA of choice) will use a milter, a mail filter, to send all incoming mail to Rspamd. Rspamd then scans the mail for spam and viruses. When Rspamd is confident enough that the mail in question is spam or contains a virus, it will signal Postfix to reject the mail before passing it on. If Rspamd is not sure that the mail should be rejected but suspects it is spam, it will attach a header to the mail marking it as spam. When the mail storage, such as the IMAP server Dovecot, later receives the mail, it can decide based on the spam header to move the mail into the spam folder so it is not lost outright. Legitimate mail from other people (ham) is not modified.
Running Rspamd
While you can run Rspamd as a system service and install it via the system package management on most distributions, we run it in a Docker container. That way we don’t clutter the host system with dependencies while still being able to upgrade to newer versions as required. There is a guide on how to build an Rspamd Docker container.
We run a single instance of Rspamd. This instance does all processing and scanning, and it exposes a web interface with detailed information and the scan history. If you want to expose the web interface, check out the Traefik installation guide. This post includes the necessary labels to use the Rspamd web interface with Traefik.
Rspamd uses an external virus scanner to scan mail for viruses. We use ClamAV, running in a separate Docker container.
Deploying Rspamd Docker Container
You can of course deploy the Rspamd Docker container as you like but to give you some ideas this section includes Ansible tasks to prepare your host for the Rspamd Docker container. You can integrate them for instance with the Azure VM deployment with Ansible guide or check out the other Ansible posts. Otherwise the tasks will fit into a standard Ansible role directory structure. The paths to the config files also adhere to that structure.
The subtask that ensures the Rspamd container also sets the labels that make a Traefik reverse proxy expose the Rspamd stats page over HTTPS at https://example.com/rspamd. If you don’t want or need that, just remove the labels.
Rspamd uses Redis as key-value storage for several modules, so we have to set up a Redis container as well. If you already have a Redis instance that you want to use, that’s fine, but make sure the Redis data is persisted. Otherwise all Rspamd training data is lost when Redis restarts.
We use the Rspamd Docker image from our how to build an Rspamd Docker image post.
|
Rspamd Config
In the previous section we just copied all config files to the server. Here we will look at them in detail. All filenames and paths are relative to the same Ansible roles directory schema we use in the previous section. The Ansible script takes care of copying them to the right directory on the server.
Rspamd uses a set of config files with a default config that you can override in various ways. We will place our config in separate files and move them to the local.d directory of the Rspamd config. Rspamd will override its default settings with the settings in these files. If you want to revert to the default, just remove your custom config file.
The first and a really helpful config file is the logging config. Until Rspamd works perfectly we will set the level to notice, which outputs most actions Rspamd takes.
type = console |
Rspamd needs to know where to find the Redis server. In our setup this is straightforward, as the Redis container can be resolved by its name redis and we can just type that into the config file.
write_servers = "redis"; |
As mentioned above we want Rspamd to run with only one worker so we disable the normal mail scan worker.
enabled = false; |
All the work is done by the proxy worker. Here we have to enable self scan and the milter protocol that we use to send mail from Postfix to Rspamd.
milter = yes; # Enable milter mode |
If you want to enable the Rspamd web interface with its statistics, add the controller worker config file and specify a socket where the web interface shall be reachable.
# note: our docker container uses an internal nginx. |
Scanning for Viruses
Rspamd can scan incoming mail not only for spam but also for viruses. It uses an external virus scanner for this, in our case ClamAV. Luckily there is an official ClamAV Docker container. We add the following task to our above Ansible script:
- name: Ensure clamav container |
And then we use the following config file to point Rspamd to the ClamAV docker container.
clamav { |
We also have to add the config file to the copy task from the script above so that it ends up on the server.
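Before relying on the virus scan, it can help to check that the ClamAV daemon actually answers. The following Python sketch sends clamd's PING command and waits for PONG. The function name clamd_ping and the host name clamav in the usage comment are assumptions for illustration, and port 3310 is clamd's default TCP port; use whatever name and port your container exposes.

```python
import socket


def clamd_ping(host: str, port: int = 3310, timeout: float = 3.0) -> bool:
    """Return True if a clamd instance at host:port answers PING with PONG."""
    try:
        with socket.create_connection((host, port), timeout=timeout) as conn:
            # The "z" prefix tells clamd that the command is NUL-terminated.
            conn.sendall(b"zPING\0")
            reply = conn.recv(16)
    except OSError:
        # Covers DNS failures, refused connections and timeouts.
        return False
    return reply.rstrip(b"\0") == b"PONG"


# Hypothetical usage from a container on the same Docker network:
#   clamd_ping("clamav")
```

If this returns False from inside the Rspamd container's network, the antivirus module will not be able to reach the scanner either.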
Routing Mails Through Rspamd
We use the Postfix milter to send incoming mail to Rspamd before we process it further in Postfix. We point Postfix to the Rspamd Docker container by setting smtpd_milters and set the default action to accept so we won’t lose mail if Rspamd malfunctions.
smtpd_milters = inet:rspamd:11332 |
The only issue now is that the milter will check any mail that arrives via smtpd, including mail from the submission port sent by our own users. For the most part this is not an issue, as Rspamd handles the situation gracefully. But just to be sure, we deactivate the milter on the submission port. In the Postfix master.cf we add an empty smtpd_milters option:
submission inet n - n - - smtpd |
Now Rspamd will only have to scan mail coming in on port 25.
Rspamd Workflow
So now Rspamd should work quietly in the background, scanning incoming mail for spam and viruses. When it deems a mail to be clearly spam it will signal Postfix to reject the mail. If Rspamd is unsure whether a mail is spam or not, it will add a header to the mail and pass it on. To improve the detection rate we make Rspamd learn from the mail we receive, and we move mail that Rspamd is unsure about into a junk folder where we can sort it by hand.
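The decision described here can be pictured as a comparison of a mail's score against two thresholds. The Python sketch below is only an illustration of that idea, not Rspamd's implementation: the names Thresholds and decide are made up, the numbers are example values rather than the ones configured in this setup, and Rspamd knows further actions (such as greylisting) that are left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Thresholds:
    # Example values only; the real ones come from Rspamd's actions config.
    reject: float = 15.0
    add_header: float = 6.0


def decide(score: float, limits: Thresholds = Thresholds()) -> str:
    """Map a spam score to one of the three outcomes described above."""
    if score >= limits.reject:
        return "reject"      # Postfix refuses the mail
    if score >= limits.add_header:
        return "add header"  # mail is delivered, but marked as spam
    return "no action"       # ham is passed on unmodified


for score in (20.0, 7.5, 1.0):
    print(score, decide(score))
```

The gap between the two thresholds is the "unsure" band: mail in it is delivered with a spam header, so a human can still rescue it.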
Learning Spam
Rspamd has a module that can take ham and spam and learn from both to improve the spam detection rate. That module is called the statistical module. It needs some initial mails to learn from and works best if you continue to feed it new spam messages and also tell it about mails that were categorized as spam but aren’t.
To train the statistical module we use Dovecot in conjunction with Sieve filters. We set up Dovecot so that it calls our sieve filters when we move mail around in a certain way:
- If we move mail to the junk folder we want Dovecot to feed it to Rspamd as spam
- If we move mail out of the junk folder we want Dovecot to feed it to Rspamd as ham
The statistical module does not rate scanned mail until it has learned a minimum number of mails. This threshold is configurable and defaults to 200. To speed up the process of learning ham, we additionally configure Dovecot to make Rspamd learn all mail as ham that we move to a dedicated Ham folder.
The Bayes Filter (statistical module) will only work after you trained it with 200 ham and 200 spam messages.
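The important detail is that both counters have to reach the threshold, not just their sum. A tiny Python sketch of that rule, with the hypothetical helper name bayes_ready and the default of 200 taken from the text above:

```python
def bayes_ready(ham_learned: int, spam_learned: int, min_learns: int = 200) -> bool:
    """The classifier only rates mail once BOTH classes reached min_learns."""
    return ham_learned >= min_learns and spam_learned >= min_learns


print(bayes_ready(350, 40))   # plenty of ham, too little spam
print(bayes_ready(200, 200))  # both classes at the threshold
```

This is why the dedicated Ham folder helps: spam tends to accumulate on its own, while ham is only learned when we feed it in deliberately.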
Add the following config section to your Dovecot config:
plugin { |
The Dovecot config now references some Sieve scripts that we also have to place into the Dovecot config directory on the server. The first one pipes mail to Rspamd and tells it to learn the mail as ham.
require ["vnd.dovecot.pipe", "copy", "imapsieve"]; |
The second one does the same thing for spam.
require ["vnd.dovecot.pipe", "copy", "imapsieve"]; |
Remember to replace the Rspamd password in both scripts.
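To make clearer what "learning" means on the wire: the Rspamd controller accepts a raw message via HTTP on its /learnspam and /learnham endpoints, authenticated with a Password header. The Python sketch below only builds such a request. The helper name build_learn_request and the base URL http://rspamd:11334 (Rspamd's default controller port) are assumptions; in the setup from this post the controller may be reachable under a different address.

```python
import urllib.request


def build_learn_request(raw_message: bytes, kind: str, password: str,
                        base_url: str = "http://rspamd:11334") -> urllib.request.Request:
    """Build (but do not send) a learn request for the Rspamd controller."""
    if kind not in ("spam", "ham"):
        raise ValueError("kind must be 'spam' or 'ham'")
    return urllib.request.Request(
        url=f"{base_url}/learn{kind}",   # /learnspam or /learnham
        data=raw_message,                # the complete mail, headers included
        headers={"Password": password},  # the controller password
        method="POST",
    )


# Hypothetical usage; urlopen() would actually submit the mail:
#   req = build_learn_request(open("mail.eml", "rb").read(), "spam", "secret")
#   urllib.request.urlopen(req)
```

The Sieve scripts do the equivalent for every mail moved between folders, which is why they need the password.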
Moving Suspected Spam to the Junk Folder
When Rspamd is not really sure whether a mail is spam or not, it will add a spam header to the mail. We use that header to move suspicious mail to a junk folder where we can sort it manually. We again use Dovecot to run the sorting script. Add the following line to the above plugin section. It calls a Sieve script that moves mail with a specific header to the junk folder.
plugin { |
The Sieve script checks for the X-Spam header in a mail and moves the mail to the Junk folder if the header says Yes. Be aware that the Yes is case sensitive.
require "fileinto"; |
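For readers who don't speak Sieve, the same rule can be written in a few lines of Python. This is just an illustration of the logic, with the made-up function name target_folder; it is not something Dovecot runs. It mirrors the case-sensitive comparison mentioned above.

```python
from email import message_from_string


def target_folder(raw_mail: str) -> str:
    """Return the folder a mail should be filed into, based on X-Spam."""
    msg = message_from_string(raw_mail)
    # Header names are matched case-insensitively, the value is not.
    values = msg.get_all("X-Spam", [])
    if any(value.strip() == "Yes" for value in values):
        return "Junk"
    return "INBOX"


print(target_folder("X-Spam: Yes\nSubject: cheap pills\n\nbuy now"))
print(target_folder("Subject: lunch?\n\nsee you at noon"))
```

Mail without the header, or with any value other than exactly Yes, stays in the inbox.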
Conclusion
Rspamd is a capable spam and virus filter system that needs little configuration to work. Unfortunately, this is not really clear when you start reading the documentation. I hope this post makes it easier to get a grasp on the steps needed to get Rspamd up and running.