Dockerized Spam Filtering With Rspamd

If you run a basic mail server, whether from my guide on running a dockerized mailserver or a setup of your own, you will soon be bombarded with spam.

In this post we explore a solution based on a dockerized instance of the mail filter system Rspamd. Our Postfix (or your MTA of choice) uses a milter (mail filter) to hand all incoming mail to Rspamd, which scans it for spam and viruses. When Rspamd is confident enough that a mail is spam or contains a virus, it signals Postfix to reject the mail instead of passing it on. If Rspamd only suspects that the mail is spam, it adds a header marking the mail as spam. When the mail storage, for example the IMAP server Dovecot, later receives the mail, it can use that header to move the mail into the spam folder, so a false positive is not lost outright. Legitimate mail (ham) is not modified.

Running Rspamd

While you can install Rspamd via the package management of most distributions and run it as a system service, we run it in a Docker container. That way we don’t clutter the host system with dependencies while still being able to upgrade to newer versions as required. There is a guide on how to build an Rspamd Docker container.

We run a single instance of Rspamd. This instance does all the processing and scanning, and it exposes a web interface with statistics and the scan history. If you want to expose the web interface, check out the traefik installation guide. This post includes the necessary labels to use the Rspamd web interface with traefik.

Rspamd uses an external virus scanner to scan mail for viruses. We use ClamAV, running in a separate Docker container.

Deploying Rspamd Docker Container

You can of course deploy the Rspamd Docker container any way you like, but to give you some ideas this section includes Ansible tasks that prepare your host for it. You can integrate them, for instance, with the Azure VM deployment with Ansible guide or check out the other Ansible posts. Otherwise the tasks fit into a standard Ansible role directory structure, and the paths to the config files adhere to that structure.
The task that ensures the Rspamd container also sets the labels that make a Traefik reverse proxy expose the Rspamd stats page over HTTPS at https://example.com/rspamd. If you don’t want or need that, just remove the labels.
Rspamd uses Redis as key-value storage for several modules, so we have to set up a Redis container as well. If you already have a Redis instance that you want to use, that’s fine, but make sure the Redis data is persisted. Otherwise all Rspamd training data is lost when Redis restarts.
We use the Rspamd docker image from our how to build an Rspamd Docker image post.

---
# You can place the config files wherever you like, but I prefer /srv/data
- name: Ensure directories
  file:
    name: "{{ item }}"
    state: directory
  loop:
    - "/srv/data/rspamd/config/local.d"
    - "/srv/data/clamav/config"
    - "/srv/data/redis/data"

# We will explore the config files in the next section
- name: Copy plain rspamd config files
  copy:
    src: "files/rspamd/local.d/{{ item }}"
    dest: "/srv/data/rspamd/config/local.d/{{ item }}"
  loop:
    - logging.inc
    - redis.conf
    - worker-proxy.inc
    - worker-normal.inc
  register: rspamd_static_config

- name: Render rspamd config files
  template:
    src: "files/rspamd/local.d/{{ item }}"
    dest: "/srv/data/rspamd/config/local.d/{{ item }}"
  loop:
    - worker-controller.inc
  register: rspamd_dynamic_config

# Rspamd persists some data in Redis
- name: Ensure redis container
  docker_container:
    name: redis
    image: redis:7.0.5-alpine
    # we have to persist the data otherwise the trained Rspamd data gets lost
    command: --maxmemory 512mb --save 60 1 --loglevel warning
    volumes:
      - "/srv/data/redis/data:/data"
    networks:
      - name: internal
    labels:
      traefik.enable: "false"

- name: Ensure rspamd container
  docker_container:
    name: rspamd
    image: gevattergaul/rspamd:3.3-r0-0
    recreate: "{{ rspamd_static_config.changed or rspamd_dynamic_config.changed }}"
    networks:
      - name: internal
    volumes:
      - "/srv/data/rspamd/config/local.d:/etc/rspamd/local.d"
    labels:
      # Traefik will read these labels and expose the web interface at https://example.com/rspamd
      traefik.http.routers.rspamd.rule: "Host(`example.com`) && PathPrefix(`/rspamd`)"
      traefik.http.routers.rspamd.tls: "true"
      traefik.http.routers.rspamd.entrypoints: "websecure"
      traefik.http.routers.rspamd.service: "rspamd"
      traefik.http.routers.rspamd.tls.certresolver: letsEncryptResolver
      traefik.http.routers.rspamd.middlewares: rspamd-prefix
      traefik.http.middlewares.rspamd-prefix.stripprefix.prefixes: "/rspamd"
      traefik.http.services.rspamd.loadbalancer.server.port: "80"
...

Rspamd Config

In the previous section we just copied all config files to the server. Here we will look at them in detail. All filenames and paths are relative to the same Ansible roles directory schema we use in the previous section. The Ansible script takes care of copying them to the right directory on the server.

Rspamd ships a default config that you can override in various ways. We place our settings in separate files in the local.d directory of the Rspamd config. Rspamd overrides its default settings with the settings in these files. If you want to revert to the defaults, just remove your custom config file.
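
The override behaviour described above can be pictured as a simple key-by-key merge. The following Python snippet is only a rough mental model, not how Rspamd actually parses its files, and the keys and default values in it are hypothetical stand-ins:

```python
def effective_config(defaults: dict, local: dict) -> dict:
    """Return the settings in force: local.d values win, the rest stay default."""
    merged = dict(defaults)   # start from the shipped defaults
    merged.update(local)      # keys set in local.d replace the defaults
    return merged


# Hypothetical defaults and overrides, for illustration only.
shipped_defaults = {"type": "file", "level": "info"}
local_d = {"type": "console", "level": "notice"}

print(effective_config(shipped_defaults, local_d))
# Removing the custom file corresponds to an empty override set.
print(effective_config(shipped_defaults, {}))
```

Real Rspamd config files can contain nested sections, which this flat sketch ignores.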

The first, and a really helpful, config file is the logging config. Until Rspamd works perfectly we set the level to notice, which logs most of the actions Rspamd takes.

files/rspamd/local.d/logging.inc

type = "console";
# maybe increase to "info" for debugging
level = "notice";

Rspamd needs to know where to find the Redis server. In our setup this is straightforward: the Redis container can be resolved by its name, redis, so we can just put that name into the config file.

files/rspamd/local.d/redis.conf

write_servers = "redis";
read_servers = "redis";

As mentioned above, a single Rspamd instance does all the scanning, so we disable the normal mail scan worker and let the proxy worker scan the mail itself.

files/rspamd/local.d/worker-normal.inc

enabled = false;

All the work is done by the proxy worker. Here we have to enable self scan and the milter protocol that we use to send mail from Postfix to Rspamd.

files/rspamd/local.d/worker-proxy.inc

milter = yes; # Enable milter mode

bind_socket = "*:11332";

timeout = 120s; # Needed for Milter usually
upstream "local" {
  default = yes; # Self-scan upstreams are always default
  self_scan = yes; # Enable self-scan
}
count = 4; # Spawn more processes in self-scan mode
max_retries = 5; # How many times master is queried in case of failure
discard_on_reject = false; # Discard message instead of rejection
quarantine_on_reject = false; # Tell MTA to quarantine rejected messages
spam_header = "X-Spam"; # Use the specific spam header
reject_message = "Spam message rejected"; # Use custom rejection message

If you want to enable the Rspamd web interface with its statistics, add the controller worker config file and specify the socket on which the web interface shall be reachable.

files/rspamd/local.d/worker-controller.inc

# note: our docker container uses an internal nginx.
# This is not the port that the docker container exposes the web interface on
bind_socket = "*:11334";

# This is a password hash. Use 'rspamadm pw' to generate it.
password = "{{ rspamd_encoded_password }}";

Scanning for Viruses

Rspamd can scan incoming mail not only for spam but also for viruses. It uses an external virus scanner for this, in our case ClamAV. Luckily there is an official ClamAV Docker image. We add the following task to our Ansible script above:

- name: Ensure clamav container
  docker_container:
    name: clamav
    image: clamav/clamav:0.105.1
    networks:
      - name: internal
    labels:
      traefik.enable: "false"

And then we use the following config file to point Rspamd to the ClamAV docker container.

files/rspamd/local.d/antivirus.conf

clamav {
    action = "reject";
    message = '${SCANNER}: virus found: "${VIRUS}"';

    log_clean = true;
    type = "clamav";
    servers = "clamav:3310";
    symbol = "CLAM_VIRUS";
}
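
Before relying on the antivirus module it can help to check that the ClamAV daemon actually answers on the address from the config above. The following Python sketch assumes that clamd speaks its usual plain-text protocol on TCP port 3310 and replies PONG to a PING command; the function name is my own, and the defaults simply mirror the servers line above:

```python
import socket


def clamd_ping(host: str = "clamav", port: int = 3310, timeout: float = 5.0) -> bool:
    """Send clamd's PING command and report whether it answered PONG."""
    try:
        with socket.create_connection((host, port), timeout=timeout) as conn:
            # The "n" prefix asks clamd for a newline-terminated reply.
            conn.sendall(b"nPING\n")
            reply = conn.recv(16)
    except OSError:
        # Covers refused connections, DNS failures and timeouts.
        return False
    return reply.strip() == b"PONG"
```

Run from a container on the same Docker network, clamd_ping() should return True once ClamAV has finished loading its signature database, which can take a while after the container starts.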

We also have to add antivirus.conf to the loop of the copy task in the script above so that it ends up on the server.

Routing Mails Through Rspamd

We use the Postfix milter to send incoming mail to Rspamd before we process it further in Postfix. We point Postfix to the Rspamd Docker container by setting smtpd_milters and set the default action to accept so we won’t lose mail if Rspamd malfunctions.

/etc/postfix/main.cf

smtpd_milters = inet:rspamd:11332
milter_default_action = accept
milter_protocol = 6
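
Because the default action is accept, Postfix keeps delivering mail unscanned when it cannot reach Rspamd, so a broken connection is easy to miss. A minimal Python sketch for checking the TCP connection from the Postfix side could look like this; the function name is hypothetical, and host and port are whatever you configured in smtpd_milters:

```python
import socket


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused connection, unknown host name or timeout.
        return False
```

With the settings above you would call can_connect("rspamd", 11332) from within the Postfix container. This only proves that the port is open, not that the milter protocol works end to end.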

The only issue now is that the milter will check any mail that arrives via smtpd including mails from the submission port, sent by our own users. For the most part this is not an issue. Rspamd will handle the situation gracefully. But just to be sure we deactivate the milter on the submission port. In the Postfix master.cf we add an empty smtpd_milters option:

/etc/postfix/master.cf

submission inet n       -       n       -       -       smtpd
  -o smtpd_tls_security_level=encrypt
  -o milter_macro_daemon_name=ORIGINATING
  -o smtpd_milters=

Now Rspamd only has to scan mail arriving on port 25.

Rspamd Workflow

So now Rspamd should work quietly in the background, scanning incoming mail for spam and viruses. When it deems a mail to be clearly spam, it signals Postfix to reject the mail. If Rspamd is unsure whether a mail is spam or not, it adds a header to the mail and passes it on. To improve the detection rate we make Rspamd learn from the mail we receive, and we move mail that Rspamd is unsure about into a junk folder from where we can sort it by hand.
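
The workflow above boils down to comparing a spam score against two thresholds. The following Python sketch illustrates that idea only; the threshold values are assumptions chosen for the example, and the values actually in force come from your Rspamd actions configuration:

```python
def milter_outcome(score: float, reject_at: float = 15.0, add_header_at: float = 6.0) -> str:
    """Map a spam score to one of the three outcomes described above."""
    if score >= reject_at:
        return "reject"        # Postfix refuses the mail
    if score >= add_header_at:
        return "add header"    # mail is delivered with the spam header
    return "no action"         # ham is passed on untouched


for example_score in (1.2, 7.5, 20.0):
    print(example_score, "->", milter_outcome(example_score))
```

Rspamd knows further actions, such as greylisting, that this sketch leaves out.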

Learning Spam

Rspamd has a module that can take ham and spam and learn from both to improve the spam detection rate. That module is called the statistical module. It needs some initial mails to learn from and works best if you continue to feed it new spam messages and also tell it about mails that were categorized as spam but aren’t.
To train the statistical module we use Dovecot in conjunction with Sieve filters. We set up Dovecot so that it calls our Sieve scripts when we move mail around in a certain way:

If we move mail into the Junk folder we want Dovecot to feed it to Rspamd as spam
If we move mail out of the Junk folder we want Dovecot to feed it to Rspamd as ham

The statistical module has a configurable minimum number of learned mails, 200 by default, below which it will not rate scanned mail. To speed up the process of learning ham we additionally configure Dovecot to make Rspamd learn as ham all mail that we move to a dedicated Ham folder.

The Bayes filter (statistical module) only starts working once you have trained it with at least 200 ham and 200 spam messages.
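
Expressed as code, the rule above is a simple check on two counters. This is a sketch of the condition as described in this post, not Rspamd's implementation; the parameter name min_learns is my assumption for the name of the corresponding classifier option:

```python
def bayes_ready(ham_learned: int, spam_learned: int, min_learns: int = 200) -> bool:
    """Return True once both classes have reached the learning threshold."""
    return ham_learned >= min_learns and spam_learned >= min_learns


print(bayes_ready(ham_learned=350, spam_learned=120))
print(bayes_ready(ham_learned=350, spam_learned=200))
```

This is why a dedicated Ham folder helps: a mailbox usually collects spam samples quickly, while ham only counts once you explicitly feed it to Rspamd.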

Add the following config section to your Dovecot config:

/etc/dovecot/dovecot.conf

plugin {
    sieve_plugins = sieve_imapsieve sieve_extprograms

    # From elsewhere to Junk folder
    imapsieve_mailbox1_name = Junk
    imapsieve_mailbox1_from = *
    imapsieve_mailbox1_causes = COPY
    imapsieve_mailbox1_before = file:/etc/dovecot/learn-spam.sieve

    # From somewhere to Ham folder
    imapsieve_mailbox2_name = Ham
    imapsieve_mailbox2_from = *
    imapsieve_mailbox2_causes = COPY
    imapsieve_mailbox2_before = file:/etc/dovecot/learn-ham.sieve

    # From Junk to elsewhere
    imapsieve_mailbox3_name = *
    imapsieve_mailbox3_from = Junk
    imapsieve_mailbox3_causes = COPY
    imapsieve_mailbox3_before = file:/etc/dovecot/learn-ham.sieve

    sieve_pipe_bin_dir = /usr/bin
    sieve_global_extensions = +vnd.dovecot.pipe
}
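
To see which of the three rules above applies to a given move, the following Python sketch models them as (destination, source, script) patterns. It is only an approximation under my own assumptions: fnmatch uses shell-style wildcards rather than IMAP mailbox patterns, and the sketch says nothing about the order in which Dovecot runs multiple matching scripts:

```python
from fnmatch import fnmatchcase

# (destination pattern, source pattern, script), mirroring the plugin section.
RULES = [
    ("Junk", "*", "learn-spam.sieve"),
    ("Ham", "*", "learn-ham.sieve"),
    ("*", "Junk", "learn-ham.sieve"),
]


def matching_scripts(source: str, destination: str) -> list[str]:
    """List the scripts whose rule matches a move from source to destination."""
    return [
        script
        for dst_pattern, src_pattern, script in RULES
        if fnmatchcase(destination, dst_pattern) and fnmatchcase(source, src_pattern)
    ]


print(matching_scripts("INBOX", "Junk"))
print(matching_scripts("Junk", "INBOX"))
print(matching_scripts("Junk", "Trash"))
```

Under this reading, deleting a message from Junk into Trash also matches the third rule, so you may want to keep that in mind when emptying the Junk folder.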

The Dovecot config now references some sieve scripts that we also have to place into the dovecot config directory on the server. The first one pipes mail to Rspamd and tells it to learn the mail as ham.

learn-ham.sieve

require ["vnd.dovecot.pipe", "copy", "imapsieve"];
pipe :copy "rspamc" ["-h", "rspamd:11334", "-P", "{{ rspamd_password }}", "learn_ham"];

The second one does the same thing for spam.

learn-spam.sieve

require ["vnd.dovecot.pipe", "copy", "imapsieve"];
pipe :copy "rspamc" ["-h", "rspamd:11334", "-P", "{{ rspamd_password }}", "learn_spam"];

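Remember to replace the {{ rspamd_password }} placeholder in both scripts with your Rspamd controller password. rspamc expects the plain-text password here, not the hash from the controller config.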
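
For testing the learning setup without Dovecot, the same request can be built by hand. As far as I know, rspamc talks HTTP to the controller and the controller accepts a raw message on the paths /learnspam and /learnham with the password in a Password header; treat those details as assumptions and verify them against your Rspamd version. The function name and default URL below are hypothetical:

```python
import urllib.request


def build_learn_request(message: bytes, kind: str, password: str,
                        base_url: str = "http://rspamd:11334") -> urllib.request.Request:
    """Build (but do not send) a controller request that learns one message."""
    if kind not in ("spam", "ham"):
        raise ValueError("kind must be 'spam' or 'ham'")
    return urllib.request.Request(
        url=f"{base_url}/learn{kind}",
        data=message,                      # the raw RFC 5322 message
        headers={"Password": password},    # plain-text controller password
        method="POST",
    )


request = build_learn_request(b"Subject: test\r\n\r\nHello", "spam", "secret")
print(request.get_method(), request.full_url)
```

Passing the request to urllib.request.urlopen() from a container on the internal network would actually submit the message.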

Moving Suspected Spam to the Junk Folder

When Rspamd is not really sure whether a mail is spam or not, it adds a spam header to the mail. We use that header to move suspicious mail to a junk folder from where we can sort it out manually. We again use Dovecot to run the sorting script. Add the following line to the plugin section above. It calls a Sieve script that moves mail with a specific header to the junk folder.

/etc/dovecot/dovecot.conf

plugin {

    # ... config from above

    sieve_before = file:/etc/dovecot/spam-to-junk.sieve
}

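The Sieve script checks for the X-Spam header in a mail and moves the mail to the Junk folder if the header contains Yes. Be aware that Sieve’s default comparator (i;ascii-casemap) ignores ASCII case, so this test also matches yes or YES; add :comparator "i;octet" if you need a case-sensitive match.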

spam-to-junk.sieve

require "fileinto";
if header :contains "X-Spam" "Yes" {
    fileinto "Junk";
}

Conclusion

Rspamd is a capable spam and virus filter system that needs little configuration to work. Unfortunately this is not really clear when you start reading the documentation. I hope this post makes it easier to get a grasp on the steps needed to get Rspamd up and running.