Log Quad: Domesticate Your Logging Bill

Photo showing many things made of logs (wood).
Semantically rich with high cardinality, logs are exceptionally powerful to serve a wide range of purposes, even unforeseen ones.

TL;DR: To safely tune your logging bill, you must first understand your log emissions. Evaluate them through the Log Quad: (intents, severity, frequency, code-marker).

No one writes log statements because they are aesthetically pleasing. In fact, they are impure, imperative side-effects that look like mould all over the codebase. We write logs because we know we need high-cardinality glimpses into our systems when the time comes. When a distributed system inevitably goes sideways, they are the only lifeline we can afford; most of us cannot stomach the price tag attached to the promises of platforms like Honeycomb, and many cannot even justify DataDog.

But then there is the distributed systems multiplier effect; a single user interaction cascades through a dozen microservices, turning our humble observability tool into a sizable infrastructure bill. Leadership inevitably panics, and the knee-jerk mandates drop: "Sample at 5%" or "Only ingest ERROR".

Trading minutes of infrastructure savings for hours of blindfolded incident investigation is plain false economy. To tune our logging investment, we need a better heuristic than just a $$$ amount on a dashboard. We need to evaluate our telemetry through the Log Quad.

The Semantic Foundation

Every single line in the codebase that emits a log entry can be classified by a powerful semantic quad:

  • intents
  • severity
  • frequency
  • code-marker

Severity and Frequency are orthogonal dimensions describing the mechanics of the operations, while Intents are heuristics that surface the semantics.

Image of the type of each element of the quad and its owner
The four elementals! Constituents of a Log Quad.
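To make the quad concrete, here is a minimal sketch of how it could be modeled. The names (`LogQuad`, `Severity`, `Frequency`) and the sample values are illustrative, not part of any existing tooling:

```python
from dataclasses import dataclass
from enum import Enum

class Severity(Enum):
    ERROR = "error"
    WARN = "warn"
    INFO = "info"
    DEBUG = "debug"

class Frequency(Enum):
    RARE = "rare"
    OCCASIONAL = "occasional"
    HIGH = "high"
    EXTREME = "extreme"

@dataclass(frozen=True)
class LogQuad:
    intents: tuple[str, ...]   # e.g. ("bookkeeping",)
    severity: Severity
    frequency: Frequency
    code_marker: str           # e.g. "myapp/billing/service.py:42"

# One quad describing a single log statement in the codebase.
quad = LogQuad(("bookkeeping",), Severity.INFO, Frequency.HIGH,
               "myapp/billing/service.py:42")
```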

Severity

Severity is a proxy for impact. That means when logging a situation, you should ask "What's the potential impact of the situation on SLOs and data integrity if left unmitigated?" Or in less precise terms, "What is the on-call impact?"

Since severity is all about impact, only the team that owns the service can tell what the severity of a given log emission is.

The most common severities and the typical meanings associated with them are as follows:

ERROR
  • A request/task failed.
  • Data may be incorrect.
  • Potentially user/client visible.
WARN
  • Something is abnormal/off but the system is coping somehow.
  • Likely needs attention if it keeps happening.
INFO
  • Expected lifecycle.
  • Coarse-grained state transition.
  • Important but business-as-usual.
DEBUG
  • Detailed diagnostics.

Frequency

When looking at a log statement, you should ask "What is the impact of it on the observability/logging infrastructure bill?" Or in less precise terms, "How often is this emitted?"

In other words, frequency is all about occurrence rate. That means it's a shared responsibility between the team owning the service and SRE/Infrastructure.

Common frequencies, along with some examples, are as follows:

Rare
  • The database is inaccessible.
  • Cannot acquire a token from STS.
Occasional
  • A circuit-breaker has been tripped.
High
  • Consuming OrderCreated event for user=123.
  • Updating billing address for user=123.
Extreme
  • Logs emitted in a for-loop.
  • Protocol chatter.
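Extreme-frequency emissions, like logs inside a for-loop, are exactly where the bill explodes. One way to tame them at the source is a per-call-site rate limiter. The sketch below is illustrative, built on Python's standard `logging.Filter`; the class name and thresholds are assumptions, not a reference to any particular library:

```python
import logging
import time

class RateLimitFilter(logging.Filter):
    """Drop records from the same call site beyond N per time window."""

    def __init__(self, max_per_window: int = 10, window: float = 1.0):
        super().__init__()
        self.max_per_window = max_per_window
        self.window = window
        # (pathname, lineno) -> (window_start, count)
        self._buckets: dict[tuple[str, int], tuple[float, int]] = {}

    def filter(self, record: logging.LogRecord) -> bool:
        key = (record.pathname, record.lineno)
        now = time.monotonic()
        start, count = self._buckets.get(key, (now, 0))
        if now - start > self.window:
            start, count = now, 0  # new window, reset the counter
        self._buckets[key] = (start, count + 1)
        return count < self.max_per_window

# Allow at most 100 records per second from each call site on this logger.
logger = logging.getLogger("hot_path")
logger.addFilter(RateLimitFilter(max_per_window=100, window=1.0))
```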

Remember: Severity ≠ Frequency

  • Severity and Frequency are independent.
  • Sometimes correlated but often not.
  • They measure different aspects of operations.

Despite the simplicity of the above, we tend to mix the two dimensions quite frequently. All because of a historical tooling limitation: up until a few years ago, most logging frameworks supported only one dimension: Severity. In the absence of frequency, it became common to treat ERROR as "rare and important" and INFO as "frequent and noisy".

Intent

While severity and frequency provide a solid understanding of the mechanical aspects of our logging practices, they tell us nothing about the semantics: "Why is this log emitted?" Or "What is this log statement helping with?"

In other words, the intent heuristic.

Allow me to introduce you to a highly opinionated list of intents that I've compiled over the years.

Service Hops

Approximate Description

Captures how state evolves as the information moves through the system and across (micro)services, e.g. "User 123 added item 456 to cart".

Where does it apply?

Usually, but not necessarily, at the boundaries of (micro)services, such as:

  • HTTP/gRPC handlers
  • Asynchronous message handlers

Bookkeeping

Approximate Description

Tracking scenarios, states, conditions, or actions within a service, e.g. "Processed order #123".

Where does it apply?

Within the confines of a single service.

Inspection

Approximate Description

A detailed trail of requests moving through the internal layers of a single service.

Where does it apply?

Within the confines of a single service.

Notification

Approximate Description

A specialized form of inspection conveying that a specific action happened, e.g. "Email sent".

Where does it apply?

Within the confines of a single service.

Rollout Leftover

Approximate Description

Debug logs added for a feature rollout that finished, say, six months ago.

Where does it apply?

Within the confines of a single service.

Nonsense

Approximate Description

Logs that add zero context or value.

Where does it apply?

Anywhere.

Code Marker

It's just a pointer back to the location in the codebase where the log is emitted, e.g. in the form path-to-file:line-no.

Nothing fancy, indeed. But despite its simplicity, this is the key elemental in whipping the machines to work for you, as you will see in the next section.
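Most logging frameworks can emit the code marker for free. In Python's standard `logging` module, for instance, the `%(pathname)s` and `%(lineno)d` format attributes produce exactly the path-to-file:line-no shape:

```python
import logging

# %(pathname)s:%(lineno)d yields the code marker for every emission.
logging.basicConfig(
    format="%(asctime)s %(levelname)s %(pathname)s:%(lineno)d %(message)s",
    level=logging.INFO,
)
logging.getLogger(__name__).info("order processed")
```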

Execution

Executing the Log Quad framework is almost impossible if not done through the lens of the Brain-Work Brain-Toil framework. It all boils down to four stages.

Image outlining the execution flow of Log Quad framework
Executing the log quad framework follows the new division of labour

Stage 1: The Analysis Prompt

Type
Brain-Work
Actor
You
Input
What you've found in this essay and in your research.
Output
Analysis Prompt for the machine.
Description
You've read the Log Quad framework. You've done your own research. You've considered the realities of your codebase. Time to encode your synthesis in a prompt to guide the machine during the semantic analysis.

Stage 2: Compilation

Type
Brain-Toil
Actor
The Machine
Input
Codebase plus the Analysis Prompt from stage 1.
Output
Collection of log quads covering 100% of the input, for instance in a CSV file.
Description
Whip the machine to work! Instruct it to follow the Analysis Prompt, semantically analyse the log statements in your codebase, and produce the much-desired collection of log quads.
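The compiled output might look something like this; the columns follow the quad, and the paths and values are purely illustrative:

```csv
code_marker,severity,frequency,intents
myapp/api/orders.py:88,info,high,service-hop
myapp/billing/worker.py:41,info,high,bookkeeping
myapp/billing/worker.py:197,debug,extreme,inspection
myapp/auth/sts.py:23,error,rare,notification
```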

Stage 3: Defining the Transformations

Type
Brain-Work
Actor
You
Input
Collection of log quads from stage 2.
Output
A set of semi-deterministic transformations in the form of Predicate => Action. For instance, if (severity=info, frequency=any, intents contains "bookkeeping", path-contains="myapp/*/persistence/"), then (action="convert to metric")
Description
The first order of business is to verify what the machine has produced. Spot-check some random quads, and go back to stage 1 or 2 if you're not happy with the results. Otherwise, author a tactical set of transformations that guide the machine's understanding of the log quads and tell it how to modify them where appropriate.

Some Useful Transformations

Here are some non-exhaustive and high-level transformations that you may use as a starting point:

Service Hops
Definitely high value. They allow you to trace a specific dimension, like a user, across boundaries. Keep them, but exercise care at the edge (web endpoints).
Bookkeeping
These are often lazy metrics and are usually summed up or aggregated later. You can safely convert them to proper metrics (such as counters, or gauges) and pair them with highly sampled logs.
Inspection
Despite being the most voluminous intent, it is the primary (and only?) tool for incident response and retrospective investigations. Dynamic filtering may be a good answer here: enable this intent only for specific userIds or feature flags when debugging.
Notification
Convert to Metrics or move to a dedicated Auditable Event Stream if compliance is required.
Rollout Leftover
Garbage. Just remove it.
Nonsense
Garbage. Just remove it.

Stage 4: Apply the Transformations

Type
Brain-Toil
Actor
The Machine
Input
Collection of log quads along with the transformations from stage 3.
Output
Rather a side-effect: the updated codebase.
Description
Whip the machine to work! Instruct it to take the transformations and apply them to all, or a subset, of the log statements across your codebase.

Up Your Logs

After executing the Log Quad framework, it's quite likely that you'll find no easy/cheap alternative for a sizable portion of the emissions. For instance, it may turn out to be more cost-effective to keep logging than to use non-sampled traces when your intent is Service Hop.

Luckily, for such scenarios, the modern observability world offers strategies to help keep your logging investment focused. Let's look at a few pointers for your research.

Structured Logging

Besides cheaper indexing/searchability, emitting (JSON) objects instead of strings allows for low-tech and accurate routing of the emissions.

You can configure your shipper to send tenant=VIP logs to expensive, high-retention hot storage, while sending tenant=free-tier logs to cheap cold storage, say S3.

In other words, structured logging helps you get to a low-tech but effective setup of keeping your spending focused.
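As a minimal sketch of the idea (the `JsonFormatter` class and the `tenant` field are illustrative assumptions, not a specific library's API), each record becomes a JSON object carrying the fields a shipper can route on:

```python
import json
import logging
import sys

class JsonFormatter(logging.Formatter):
    """Emit each record as a JSON object so downstream shippers can route on fields."""
    def format(self, record: logging.LogRecord) -> str:
        return json.dumps({
            "level": record.levelname,
            "message": record.getMessage(),
            "tenant": getattr(record, "tenant", "unknown"),
            "marker": f"{record.pathname}:{record.lineno}",
        })

logger = logging.getLogger("app")
handler = logging.StreamHandler(sys.stdout)
handler.setFormatter(JsonFormatter())
logger.addHandler(handler)
logger.setLevel(logging.INFO)

# The shipper (or a second handler) can now route on the tenant field:
# tenant=VIP to hot storage, tenant=free-tier to S3.
logger.info("invoice generated", extra={"tenant": "VIP"})
```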

Application-Side Dynamic Filtering

If Plato were a software engineer, I'm quite certain he would have described his utopia as some variation of this!

Instead of hardcoding severity filters and ingestion rules, your application listens for configuration changes at runtime. During an incident, you can toggle a specific module, or a specific set of user IDs, even down to the dreaded DEBUG level. And do it instantly! You get 100% fidelity for the problem case and 0% overhead for the rest of the traffic.
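A toy version of the idea, again using Python's standard `logging.Filter` (in production the mutable set would be refreshed from a config service or feature-flag system rather than a module-level variable; those names are my assumptions):

```python
import logging

# Mutable runtime config: user IDs currently under investigation.
DEBUG_USER_IDS: set[str] = set()

class DynamicUserFilter(logging.Filter):
    """Pass DEBUG records only for users toggled on at runtime; pass INFO+ always."""
    def filter(self, record: logging.LogRecord) -> bool:
        if record.levelno >= logging.INFO:
            return True
        return getattr(record, "user_id", None) in DEBUG_USER_IDS

logger = logging.getLogger("checkout")
logger.setLevel(logging.DEBUG)          # the filter, not the level, does the gating
logger.addFilter(DynamicUserFilter())

# During an incident, flip the switch at runtime -- no redeploy:
DEBUG_USER_IDS.add("user-123")
logger.debug("cart state: %s", {"items": 3}, extra={"user_id": "user-123"})
```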

Application-Side Sampling

Sampling at the log aggregator should be treated as a safety valve: since it's far away from your application, it is a blunt instrument that doesn't understand context.

To preserve context, sampling at the application-side should be treated as a design feature, i.e. a non-functional requirement. Besides allowing you to be selective and intentional about what to sample in general, it enables head sampling where you make the decision to sample a request/message right when you receive it, e.g. based on userId. This ensures that if a request is sampled, you get all the logs for that trace, preserving the causal chain, rather than a random scattering of disconnected log lines.
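Head sampling can be as simple as hashing the dimension you sample on. The sketch below is an illustrative implementation (function name and parameters are mine): hashing the userId makes the keep/drop decision deterministic, so every hop that sees the same id makes the same choice and the causal chain survives intact.

```python
import hashlib

def head_sample(user_id: str, rate: float = 0.05) -> bool:
    """Decide once, at the edge, whether this request's logs are kept.

    Deterministic: the same user_id always maps to the same decision,
    so all services in the call chain agree without coordination.
    """
    digest = hashlib.sha256(user_id.encode()).digest()
    # Map the first 8 bytes of the hash onto [0, 1) and compare to the rate.
    bucket = int.from_bytes(digest[:8], "big") / 2**64
    return bucket < rate

# Decided when the request is received; propagate the flag downstream
# (e.g. via a header) so every hop keeps or drops the same request.
keep_logs = head_sample("user-123", rate=0.05)
```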

Final Words

You can only cost-cut your reliability so much.

If you treat logs as liabilities and bills to be reduced or removed, you will blind yourself; you will pay less for logging, but at the price of customer churn, atrophied incident-response muscle, wasted engineering debugging hours, and lost observability insight.

Instead, view your logs through the lens of the Quad: understand what is useful, what can be more efficiently emitted, and maintain your operations sanity and bills without losing the insights that matter.

Though calls to action have become a cliché, I cannot help but ask: "Do you know how much you're paying for 'Bookkeeping' logs right now!?" 😁
