Since the introduction of the GDPR in 2016, almost every website greets you with an extensive cookie banner. With complicated text, deliberately unusable controls and persistence, it tries to get your consent to collect lots of private data about you and, most importantly, to keep the owner of the website from getting sued.
Apart from hating these cookie banners, I was wondering whether such a banner would really benefit me and the visitors of this blog, or whether it would just drive away most of the assumed audience: people interested in software development or DevOps, or apparently people who want to be Ansible consultants.
So I started to look around for tools that would let me collect some data about how this blog is being used, while not requiring a cookie banner and staying GDPR compliant.

Why Plausible Analytics?

I like the idea of hosting my web analytics myself. This way I keep full control over the data, and I might learn something in the process. I also didn’t want to risk needing a “Vertrag zur Auftragsdatenverarbeitung” (a data processing agreement). The term alone is off-putting. If you don’t want to host Plausible Analytics yourself, they offer a hosted version starting at $4 per month at the time of writing this blog post.

If you are looking for a self-hosted Google Analytics alternative, there is already Matomo, formerly known as Piwik. It is designed as a direct competitor to Google Analytics and therefore has an incredible amount of functionality. Matomo also offers instructions on how to set it up for GDPR compliance, but it is a bit unclear on the most important point: cookies, and whether it still uses them with the altered settings. Generally, though, Matomo is not designed to work without cookies.

I finally settled on Plausible Analytics. Plausible is a quite minimalist web analytics tool, but it is designed to be GDPR compliant. It is also relatively easy to host yourself (see the Ansible scripts below).

Functionality


In contrast to Google Analytics and Matomo, the feature set of Plausible is quite minimal. You still get the basic usage information for your website, though. That includes the number of page views, sources, top pages and so on. You also get views on source country and devices used. The views are not configurable, as they are in Google Analytics.

Plausible doesn’t use cookies or similar technologies to track users. Instead, a number of data points such as IP address, date and browser are hashed to create a unique user id. Because these inputs are tied to one device and one day, this user id cannot track users across devices or across different days.
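
To make the idea more concrete, here is a minimal sketch of how such a cookie-less id could be derived. This is not Plausible’s actual code: the function name visitorId, the choice of SHA-256 and the exact input fields are assumptions I made for illustration, based only on the description above.

```javascript
// Sketch only: derives a cookie-less visitor id by hashing request data.
// Field choice and SHA-256 are assumptions, not Plausible's implementation.
const { createHash } = require('crypto');

function visitorId({ ip, userAgent, day }) {
  // `day` is a calendar date such as '2021-03-01'; a new day yields a new id.
  return createHash('sha256')
    .update([day, ip, userAgent].join('|'))
    .digest('hex');
}

const visitor = { ip: '203.0.113.7', userAgent: 'Firefox/86' };
const monday = visitorId({ ...visitor, day: '2021-03-01' });
const mondayAgain = visitorId({ ...visitor, day: '2021-03-01' });
const tuesday = visitorId({ ...visitor, day: '2021-03-02' });

console.log(monday === mondayAgain); // same inputs give the same id
console.log(monday === tuesday);     // a different day gives a different id
```

The same visitor gets the same id within a day, so page views can be grouped into visits, but nothing is stored in the browser and the id changes as soon as one of the inputs changes.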

Plausible can also integrate Google Search Console and Twitter data. This way you can, for instance, see which search terms users entered to get to your page through Google.

You can even do rudimentary campaign tracking through the campaign source view. If you want to see Plausible in action, they provide a live demo page.

Deployment

Plausible has good documentation on how to deploy your own installation. There is also a repository with a docker-compose script. I just want to add my experiences and a set of Ansible scripts.

If you want to use Plausible for more than one domain, you can run Plausible itself on a third domain, like yourdatacollectiondomain.com, and then point CNAME records for stats.yourdomain1.com and stats.yourdomain2.com at the Plausible domain. This way, the Plausible script is served as a first-party script, but the data collection goes to a single Plausible installation. If I had known this earlier, I wouldn’t have had to install Plausible twice. To enable the alternative domain for a site you collect data from, use the custom domain feature in Plausible.
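
As a rough illustration of that setup, the following sketch generates the tracking tag for each site. The helper trackingSnippet is hypothetical, and it assumes that every site has a stats. subdomain that is a CNAME to the Plausible host and that the script is served under /js/index.js, as in the snippet Plausible shows in the site settings.

```javascript
// Hypothetical helper: builds the tracking tag for one site.
// Assumes stats.<site> is a CNAME pointing at the single Plausible host.
function trackingSnippet(siteDomain) {
  const src = `https://stats.${siteDomain}/js/index.js`;
  return `<script async defer data-domain="${siteDomain}" src="${src}"></script>`;
}

const snippets = ['yourdomain1.com', 'yourdomain2.com'].map(trackingSnippet);
console.log(snippets.join('\n'));
```

Each site loads the script from its own subdomain, while all requests end up at the same Plausible installation.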

MaxMind GeoIP Support

Plausible can sort page views by country. To do that, it uses an IP database by MaxMind. The free version, GeoLite2, is sufficient most of the time. You will need a (free) license though, for which you can register on the GeoLite2 page.

There is also a neat Docker image that keeps the database up to date. I use it in the script below.

Google Search Console Integration

Several years ago Google stopped passing on search terms when linking from the Google search engine. But you can access the search terms users used to view your page through the Google Search Console. Plausible can access these search terms through the Google Search Console API. For this to work, you have to enable the API first and set up OAuth authentication. This will provide you with a client id and secret that you can then use in the setup later.

I found the setup of the API and the authentication to be a little bit confusing. This might be the result of my limited experience with the Google API Console, but I still wanted to provide you with the steps that worked for me.

First, visit the Google API Console and add the Google Search Console API in the Library tab. Then you can create an OAuth consent screen. The terminology is a little bit weird here: they sometimes seem to call the screen an app. It gets worse in the German translation. In the OAuth consent screen tab, you then enter the domain name of the domain that runs your Plausible installation and also the top level domain as an authorized domain.


In step 2, click on Add or Remove Scopes and add the scopes for the Google Search Console API that you added before.


Then you have to add some users that may use your app. This seems to be necessary because it runs as a test app. You could also have it verified and published, but I didn’t need that for my use case.


Finally, you can go to the Credentials tab and add a new OAuth 2.0 Client ID. The following settings should work:


When you save your settings and access them again, the client id and secret should be listed.

Ansible Script

Here are my scripts to deploy Plausible via Ansible. I shortened some of them for brevity. For instance, I use an Ansible vault to store the secrets and a Traefik container to provide access to the Plausible container, but including them would have gone too far. The code is an Ansible role that you can integrate into your own Ansible workspace.

tasks/main.yml
---
- name: Ensure plausible network
  docker_network:
    name: plausible
    appends: yes

- name: Ensure plausible directories
  file:
    name: "{{ item }}"
    state: directory
  loop:
    - "{{ plausible_db_data_directory }}"
    - "{{ plausible_events_db_config_directory }}"
    - "{{ plausible_events_db_data_directory }}"
    - "{{ maxmind_db_data_directory }}"
    - "{{ maxmind_db_config_directory }}"

- name: Copy plausible events db config
  copy:
    src: "{{ item }}"
    dest: "{{ plausible_events_db_config_directory }}/{{ item }}"
  loop:
    - config.xml
    - user-config.xml

- name: Copy maxmind auth data
  template:
    src: files/geoip.conf.j2
    dest: "{{ maxmind_db_config_directory }}/geoip.conf"
  register: maxmind_auth_data

- name: Ensure maxmind db download container
  docker_container:
    name: maxminddb
    image: maxmindinc/geoipupdate
    recreate: "{{ maxmind_auth_data.changed }}"
    volumes:
      - "{{ maxmind_db_data_directory }}:/usr/share/GeoIP"
    env:
      GEOIPUPDATE_EDITION_IDS: GeoLite2-Country
      GEOIPUPDATE_FREQUENCY: "168"
    env_file: "{{ maxmind_db_config_directory }}/geoip.conf"

- name: Ensure plausible db container
  docker_container:
    name: plausible_db
    image: postgres:12
    volumes:
      - "{{ plausible_db_data_directory }}:/var/lib/postgresql/data"
    networks:
      - name: plausible
    networks_cli_compatible: yes
    env:
      POSTGRES_PASSWORD: "{{ plausible_db_password }}"
      POSTGRES_DB: "{{ plausible_db_name }}"
      POSTGRES_USER: "{{ plausible_db_user }}"
    labels:
      traefik.enable: "false"

- name: Ensure plausible events db container
  docker_container:
    name: plausible_events_db
    image: yandex/clickhouse-server:latest
    volumes:
      - "{{ plausible_events_db_data_directory }}:/var/lib/clickhouse"
      - "{{ plausible_events_db_config_directory }}/config.xml:/etc/clickhouse-server/config.d/logging.xml:ro"
      - "{{ plausible_events_db_config_directory }}/user-config.xml:/etc/clickhouse-server/users.d/user-config.xml:ro"
    networks:
      - name: plausible
    networks_cli_compatible: yes
    container_default_behavior: no_defaults
    ulimits:
      - nofile:262144:262144
    env:
      CLICKHOUSE_USER: "{{ plausible_events_db_user }}"
      CLICKHOUSE_PASSWORD: "{{ plausible_events_db_password }}"
      CLICKHOUSE_DB: "{{ plausible_events_db_name }}"

- name: Ensure plausible container
  docker_container:
    name: plausible_analytics
    image: plausible/analytics:latest
    command: sh -c "sleep 10 && /entrypoint.sh db createdb && /entrypoint.sh db migrate && /entrypoint.sh db init-admin && /entrypoint.sh run"
    networks:
      - name: plausible
      - name: traefik
    networks_cli_compatible: yes
    container_default_behavior: no_defaults
    volumes:
      - "{{ maxmind_db_data_directory }}/GeoLite2-Country.mmdb:/usr/local/lib/GeoLite2-Country.mmdb:ro"
    env:
      ADMIN_USER_NAME: "{{ plausible_admin_user }}"
      ADMIN_USER_EMAIL: "{{ plausible_admin_email }}"
      ADMIN_USER_PWD: "{{ plausible_admin_pwd }}"
      BASE_URL: https://tnglab.example.com
      SECRET_KEY_BASE: "{{ plausible_secret_key_base }}"
      DATABASE_URL: "postgres://{{ plausible_db_user }}:{{ plausible_db_password }}@plausible_db:5432/{{ plausible_db_name }}"
      CLICKHOUSE_DATABASE_URL: "http://{{ plausible_events_db_user }}:{{ plausible_events_db_password }}@plausible_events_db:8123/{{ plausible_events_db_name }}"
      GOOGLE_CLIENT_ID: "{{ plausible_google_client_id }}"
      GOOGLE_CLIENT_SECRET: "{{ vault_plausible_google_client_secret }}"
      GEOLITE2_COUNTRY_DB: "/usr/local/lib/GeoLite2-Country.mmdb"
    labels:
      traefik.http.routers.plausible.rule: "Host(`tnglab.example.com`) || Host(`stats.example.com`)"
      traefik.http.routers.plausible.tls: "true"
      traefik.http.routers.plausible.entrypoints: "websecure"
      traefik.http.routers.plausible.tls.certresolver: letsEncryptResolver
      traefik.http.services.plausible.loadbalancer.server.port: "8000"
...

Fill in your secrets here:

vars/main.yml
---
maxmind_db_data_directory: /srv/data/maxmind/data
maxmind_db_config_directory: /srv/data/maxmind/config
maxmind_db_license_key: "..."
maxmind_db_account_id: "..."
plausible_db_data_directory: /srv/data/plausible/db/data
plausible_db_password: "..."
plausible_db_name: plausible
plausible_db_user: plausible
plausible_events_db_data_directory: /srv/data/plausible/events-db/data
plausible_events_db_config_directory: /srv/data/plausible/events-db/config
plausible_events_db_name: plausible
plausible_events_db_user: plausible
plausible_events_db_password: "..."
plausible_admin_pwd: "..."
plausible_admin_user: admin
plausible_admin_email: admin@tnglab.example.com
plausible_secret_key_base: "..."
plausible_google_client_id: "..."
vault_plausible_google_client_secret: "..."
...

files/user-config.xml
<yandex>
  <profiles>
    <default>
      <log_queries>0</log_queries>
      <log_query_threads>0</log_query_threads>
    </default>
  </profiles>
</yandex>

files/geoip.conf.j2
GEOIPUPDATE_ACCOUNT_ID={{ maxmind_db_account_id }}
GEOIPUPDATE_LICENSE_KEY={{ maxmind_db_license_key }}

Integrating Plausible

Once you have set up your Plausible installation and added a domain for your website, the settings for that website in Plausible show a line of JS code that you can integrate into your website code. Something like this:

<script async defer data-domain="example.com" src="https://stats.example.com/js/index.js"></script>

Unfortunately, the self-hosted version of Plausible does not honor the disable switch in the browser’s local storage, as the hosted version does.
You can easily add this feature by using a JS snippet like the following instead of the one above.

<script>
  // Only load the tracker if the visitor has not opted out via local storage.
  if (String(localStorage.plausible_ignore).toLowerCase() !== 'true') {
    var s = document.createElement('script');
    s.async = true;
    s.defer = true;
    s.src = 'https://stats.example.com/js/plausible.js';
    s.setAttribute('data-domain', 'example.com');

    // Insert the tracker before the first script tag on the page.
    var firstScript = document.getElementsByTagName('script')[0];
    firstScript.parentNode.insertBefore(s, firstScript);
  }
</script>

And don’t forget to configure a custom domain for the data collection, so that plausible.js is served as a first-party script (see the note above).
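
The opt-out check from the snippet above can also be pulled out into small functions, which makes it easier to see how a visitor would switch tracking off and on again. This is only a sketch: shouldTrack, optOut and optIn are hypothetical names, and the plain object fakeStorage stands in for window.localStorage so that the code runs outside a browser.

```javascript
// Sketch: the opt-out check as a function that can be tried outside a browser.
// `storage` is anything shaped like window.localStorage (an assumption).
function shouldTrack(storage) {
  return String(storage.plausible_ignore).toLowerCase() !== 'true';
}

// Hypothetical helpers that an opt-out link on the page could call.
function optOut(storage) { storage.plausible_ignore = 'true'; }
function optIn(storage) { delete storage.plausible_ignore; }

const fakeStorage = {};
const before = shouldTrack(fakeStorage);      // nothing set yet
optOut(fakeStorage);
const afterOptOut = shouldTrack(fakeStorage); // flag is 'true'
optIn(fakeStorage);
const afterOptIn = shouldTrack(fakeStorage);  // flag removed again

console.log(before, afterOptOut, afterOptIn); // true false true
```

In a real browser, a visitor can achieve the same thing by running localStorage.plausible_ignore = 'true' in the developer console.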

Conclusion

The feature set of Plausible is sufficient for my needs. In particular, I don’t have to scare my visitors away with crazy cookie banners. If you need more user data or an integration into, say, Google AdWords, you won’t be happy with Plausible.

Plausible seems to be quite production ready at this point. The features it has seem to be working and the setup process is quite easy. I prefer that to an exhaustive feature set where nothing works properly. I haven’t tested Plausible under heavy load, though.

The option to host everything myself is a big plus for me. This way I can play around with the tool and don’t have to worry about the data being in third-party hands.