Collecting logs in the cloud with Grafana Loki
In the good old days you had one server running your services. When something failed, you logged in via SSH and checked the corresponding log file. Today, a single server rarely runs all your services, so the log files are spread across multiple machines and multiple ways of accessing them. Between journald, docker logs, syslog and plain files there are simply too many places to check efficiently, especially if you use scale sets on Azure or something equivalent to dynamically adjust the number of VMs to the workload.
A common way to solve this problem is to introduce an Elasticsearch, Logstash and Kibana (ELK) stack that gathers the logs and makes them searchable. That’s a nice solution, albeit a resource-intensive one.
We want to look at a more lightweight alternative: the log aggregator Grafana Loki. Like Elasticsearch, it stores logs that are gathered by log shippers such as Promtail. You can then display the logs using Grafana.
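To get a feel for what a shipper like Promtail actually sends, here is a minimal sketch that pushes a single log line to Loki's HTTP push API. The endpoint address and the label set are assumptions for illustration; in practice Promtail (or another shipper) does this for you.

```python
import json
import time
import urllib.request

# Assumption: a Loki instance is listening on localhost:3100.
LOKI_PUSH_URL = "http://localhost:3100/loki/api/v1/push"

def push_log_line(line: str, labels: dict) -> None:
    """Send one log line to Loki's push API, roughly what a shipper does."""
    payload = {
        "streams": [
            {
                "stream": labels,  # the label set this line is stored under
                "values": [[str(time.time_ns()), line]],  # [ns timestamp, log text]
            }
        ]
    }
    req = urllib.request.Request(
        LOKI_PUSH_URL,
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        resp.read()  # Loki replies with 204 No Content on success

# Hypothetical labels, pick whatever fits your setup.
push_log_line("service started", {"job": "demo", "host": "raspi"})
```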
Loki is lighter than Elasticsearch mostly because it omits Elasticsearch’s main feature: full-text indexing. Instead, much like Prometheus, Loki stores log lines annotated with labels that you can later filter on. The log text itself is not indexed, so there is no indexed full-text search; you select streams by label and can only grep through their text at query time.
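To illustrate that model, here is a rough sketch that queries Loki's query_range API with a LogQL expression: the `{job="..."}` selector picks streams by label, and the `|= "error"` filter then greps through the matching lines at query time. The endpoint, the job label and the time window are assumptions for illustration.

```python
import json
import time
import urllib.parse
import urllib.request

# Assumption: the same Loki instance on localhost:3100.
LOKI_QUERY_URL = "http://localhost:3100/loki/api/v1/query_range"

def fetch_errors(job: str, last_seconds: int = 3600) -> None:
    """Select streams by label, then filter the log text at query time."""
    params = urllib.parse.urlencode({
        # LogQL: streams with label job=<job>, keep only lines containing "error"
        "query": f'{{job="{job}"}} |= "error"',
        "start": str(time.time_ns() - last_seconds * 1_000_000_000),
        "end": str(time.time_ns()),
        "limit": "100",
    })
    with urllib.request.urlopen(f"{LOKI_QUERY_URL}?{params}") as resp:
        data = json.loads(resp.read())
    # Each result entry is one stream (label set) with its matching lines.
    for stream in data["data"]["result"]:
        for timestamp, line in stream["values"]:
            print(timestamp, line)

fetch_errors("demo")
```

The important point is that only the labels are indexed; the text filter is a linear scan over the selected streams, which is exactly why Loki can stay so small.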
The upside of this design is low hardware requirements. I run Loki comfortably on a Raspi 3B, where it collects logs from several systems while staying below 1% CPU at all times. An ELK stack would struggle to even run on the Raspi 3B, mostly because of its 1GB of system memory.