Log Levels Explained and How to Use Them

Log levels are labels that indicate the severity or urgency of a log entry. Their primary purpose is to separate messages that are merely informational (meaning that the system is working normally) from those that describe a problem or potential problem, such as when recurring errors are detected in the system. Log levels also provide a way to dynamically control your application's volume of log output (more on this later).

In this article, we will discuss the following topics, which should help you understand what log levels are and how to use them to log more effectively:

What log levels are and how they work.
The history of log levels.
Common log levels and how to use them.
Using log levels for filtering purposes.
Configuring alerts based on log levels.

The history of log levels

Syslog, a logging solution initially developed for the Sendmail project, first introduced the concept of log levels in the 1980s. It defined eight severity levels, one of which is attached to each log entry to describe the severity of the event in question:

Emergency (emerg): system is unusable.
Alert (alert): immediate action required.
Critical (crit): critical conditions.
Error (err): error conditions.
Warning (warning): warning conditions.
Notice (notice): normal but significant conditions.
Informational (info): informational messages.
Debug (debug): messages helpful for debugging.

In the following years, Syslog was adopted by various software applications and eventually became a standard for message logging on Unix-like systems. Its severity levels were also adapted and refined by various application logging frameworks such as log4net and log4j, evolving into the various log levels that are commonplace today.

Common log levels and their use cases

The log levels available to you will vary depending on the programming language, framework, or service in use. Still, most will include some or all of the following levels: FATAL, ERROR, WARN, INFO, DEBUG, and TRACE. You can usually override the defaults in your logging framework of choice with custom log levels, but we recommend sticking to the ones discussed below. They are arranged in decreasing order of urgency:

FATAL

The FATAL log level annotates messages with the greatest severity. It usually means that something critical is broken, and the application cannot continue to do any useful work without the intervention of an engineer. Typically, such entries are logged just before the application shuts down (usually with a non-zero exit code) to prevent further data corruption. If you use a log management service, you can configure it to send instant alerts when such entries are logged so that someone can react to them as quickly as possible.

Examples of situations that may be logged as FATAL errors include the following:

Crucial configuration information is missing without fallback defaults.
Unable to connect to a service crucial to the application's primary function (such as the database).
Running out of disk space on the server.
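
As a minimal sketch of the first situation, the Python snippet below logs a FATAL-level entry and returns a non-zero exit code when a required setting is absent. The setting name DATABASE_URL and the function start_application are hypothetical and chosen only for illustration. Note that Python's standard logging module names its most severe level CRITICAL, with FATAL as an alias.

```python
import logging
import os

logger = logging.getLogger("app.startup")

# Hypothetical setting name, used only for illustration.
REQUIRED_SETTING = "DATABASE_URL"


def start_application(environ=os.environ) -> int:
    """Return a process exit code: 0 on success, 1 on a fatal startup error."""
    database_url = environ.get(REQUIRED_SETTING)
    if not database_url:
        # logging.CRITICAL is Python's name for the FATAL level;
        # logging.FATAL is an alias for the same numeric value.
        logger.critical(
            "%s is not set and has no fallback; shutting down", REQUIRED_SETTING
        )
        return 1
    logger.info("configuration loaded")
    return 0
```

An entry point would typically pass the return value to sys.exit() so that the process terminates with that code.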

ERROR

The ERROR log level is used to represent error conditions in an application that prevent a specific operation from running, but the application itself can continue working even if it is at a reduced level of functionality or performance. Generally, ERROR logs should be investigated as soon as possible but they don't carry the same urgency as FATAL messages since the application can continue working.

The occurrence of an error condition in the application does not necessarily mean that it should be logged at the ERROR level. For example, if an exception is expected behavior and does not indicate degradation in application functionality or performance, it can be logged as INFO. Also, errors with a possibility of recovery (such as network connectivity errors) can be labeled as INFO if an automatic recovery strategy is in place (e.g., retries). Such conditions can be promoted to the ERROR level if recovery isn't possible after a predetermined time.

Logging significant error conditions is also useful for generating metrics such as Mean Time Between Failures (MTBF) which can be used to assess the quality of the application or to compare different systems or designs. Examples of situations that are typically logged at the ERROR level include the following:

A persistent connection failure to some external resource (after automated recovery attempts have failed).
Failure to create or update a resource in the system.
An unexpected error (e.g., a failure to decode a JSON object).
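
The promotion from INFO to ERROR described above can be sketched as follows. This is one possible shape rather than a prescribed implementation: the helper name call_with_retries, the attempt count, and the choice of ConnectionError as the recoverable error are all assumptions made for the example.

```python
import logging
import time

logger = logging.getLogger("app.retry")


def call_with_retries(operation, attempts=3, delay_seconds=0.0):
    """Run operation, retrying on ConnectionError up to `attempts` times."""
    for attempt in range(1, attempts + 1):
        try:
            return operation()
        except ConnectionError as exc:
            if attempt == attempts:
                # Recovery failed: promote the condition to ERROR and re-raise.
                logger.error("giving up after %d attempts: %s", attempts, exc)
                raise
            # A retry is still pending, so this is not yet an ERROR.
            logger.info("attempt %d failed (%s); retrying", attempt, exc)
            time.sleep(delay_seconds)
```

Each failed attempt that will be retried is recorded at INFO, and only the final, unrecovered failure is recorded at ERROR.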

WARN

Messages logged at the WARN level typically indicate that something unexpected happened, but the application can recover and continue to function normally. This level is mainly used to draw attention to situations that should be addressed soon, before they pose a problem for the application.

Events that may be logged at the WARN level include the following:

The disk usage on the server is above a configured threshold.
Memory usage is above a configured threshold.
The application is taking longer than usual to complete some important tasks (degraded performance).
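
A threshold check like the first example in the list might look like the sketch below. The function name check_disk_usage and the 80% default threshold are assumptions for illustration. The usage parameter exists only so the check can be exercised without touching the real filesystem.

```python
import logging
import shutil

logger = logging.getLogger("app.health")


def check_disk_usage(path="/", warn_ratio=0.8, usage=None):
    """Log a WARNING when used disk space exceeds warn_ratio; return the ratio."""
    if usage is None:
        # shutil.disk_usage returns a named tuple with total, used, and free.
        usage = shutil.disk_usage(path)
    ratio = usage.used / usage.total
    if ratio > warn_ratio:
        logger.warning(
            "disk usage at %.0f%% on %s (threshold %.0f%%)",
            ratio * 100, path, warn_ratio * 100,
        )
    return ratio
```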

INFO

INFO-level messages indicate events in the system that are significant to the business purpose of the application. Such events are logged to show that the system is operating normally. For example, a service was started or stopped, some resource was created, accessed, updated, or deleted in the database, and so on. Production systems typically default to logging at this level so that a summary of the application's normal behavior is visible to anyone reading the logs.

Other events that are typically logged at the INFO level include the following:

The state of an operation has changed (e.g., from "PENDING" to "IN PROGRESS").
The application is listening on a specific port.
A scheduled job was completed successfully.

DEBUG

The DEBUG level is used for logging messages that help developers find out what went wrong during a debugging session. While the specifics of what to log at the DEBUG level depend on your application, you generally want to include detailed information that can help developers troubleshoot an issue quickly. This can include variable state in the surrounding scope or relevant error codes. Unlike TRACE (below), DEBUG-level logging can be turned on in production without making the application unusable, but it should not be left on indefinitely because the extra output can degrade the performance of the system.
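
To illustrate, here is a small sketch of DEBUG logging that records variable state while keeping the cost low when DEBUG is disabled. The function apply_discount and its parameters are hypothetical. The sketch relies on two features of Python's logging module: message arguments are only formatted if the entry is actually recorded, and isEnabledFor() lets you skip work that exists purely for the log message.

```python
import logging

logger = logging.getLogger("app.orders")


def apply_discount(order_total, discount_code, rates):
    """Return the discounted total, logging intermediate state at DEBUG."""
    rate = rates.get(discount_code, 0.0)
    # The arguments are interpolated only if DEBUG is enabled for this logger.
    logger.debug(
        "discount_code=%r rate=%.2f order_total=%.2f",
        discount_code, rate, order_total,
    )
    if logger.isEnabledFor(logging.DEBUG):
        # Guard work that is done solely to build a log message.
        logger.debug("known codes: %s", sorted(rates))
    return round(order_total * (1 - rate), 2)
```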

TRACE

The TRACE level is used for tracing the path of code execution in a program. For example, you may use it to trace the processing of an incoming request or an algorithm's steps to solve a problem. Generally, TRACE is used for showing the flow of the program and for providing a detailed breakdown of the sequence of events that led to a crash, a silent failure, an error, or some other event logged at a different level. Concrete examples of messages that should be logged at the TRACE level include the following:

Entered or exited a function or method, perhaps with the processing duration.
Calculation x + y produced output z.
Starting or ending an operation and any intermediate state changes.

As you can see, the information logged at this level generally tries to capture every possible detail about the program's execution. Therefore, TRACE logging should only be enabled for short periods due to the significant performance degradation that it often causes. You will typically enable it only in development and testing environments.
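
Not every logging library ships a TRACE level. Python's standard logging module, for instance, does not, so the sketch below registers one with a numeric value of 5 (an assumed value, chosen only because it sits below DEBUG's 10). The traced decorator and the add function are likewise illustrative names. The decorator records function entry, exit, and processing duration, as in the first example in the list.

```python
import functools
import logging
import time

# Assumed value: anything below logging.DEBUG (10) would work.
TRACE = 5
logging.addLevelName(TRACE, "TRACE")

logger = logging.getLogger("app.trace")


def traced(func):
    """Log entry, exit, and duration of func at the TRACE level."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        logger.log(TRACE, "enter %s args=%r kwargs=%r", func.__name__, args, kwargs)
        start = time.perf_counter()
        try:
            return func(*args, **kwargs)
        finally:
            elapsed_ms = (time.perf_counter() - start) * 1000
            logger.log(TRACE, "exit %s after %.3f ms", func.__name__, elapsed_ms)
    return wrapper


@traced
def add(x, y):
    return x + y
```

Nothing is recorded unless the logger's level is lowered to TRACE, for example with logger.setLevel(TRACE).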

Controlling your application's log volume

Log levels are the primary way to control your application's volume of log entries. Once you select your default level, all log entries that are labeled with a severity lower than the default will not be recorded. For example, logging at the WARN level will cause INFO, DEBUG and TRACE messages to be ignored.

As you go down in default severity, the number of entries that are produced will increase, so it's a good idea to turn on only what is necessary to avoid being flooded with too much information. A typical default for production environments is INFO, which records messages logged at the INFO level or higher priority (WARN, ERROR and FATAL). You can change this to WARN if you only want to record events that indicate problems or potential problems.
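
The effect of the default level can be demonstrated with a short sketch using Python's standard logging module, which has no TRACE level and names its highest level CRITICAL rather than FATAL. The function emit_all_levels and the logger name are assumptions for the example. It logs one message at every built-in level and reports which ones were recorded.

```python
import io
import logging


def emit_all_levels(threshold):
    """Log one message per level and return the level names that were recorded."""
    stream = io.StringIO()
    handler = logging.StreamHandler(stream)
    handler.setFormatter(logging.Formatter("%(levelname)s"))

    logger = logging.getLogger("app.volume_demo")
    logger.propagate = False  # keep the demo output out of the root logger
    logger.handlers.clear()
    logger.addHandler(handler)
    logger.setLevel(threshold)

    logger.debug("debug message")
    logger.info("info message")
    logger.warning("warning message")
    logger.error("error message")
    logger.critical("critical message")
    return stream.getvalue().split()


print(emit_all_levels(logging.WARNING))  # ['WARNING', 'ERROR', 'CRITICAL']
print(emit_all_levels(logging.INFO))     # ['INFO', 'WARNING', 'ERROR', 'CRITICAL']
```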

When troubleshooting a problem in production, you might want to reduce the default severity of recorded messages to DEBUG. This level will typically produce voluminous output with enough context to help developers debug the issue, but it should be turned off afterward to prevent flooding the system with irrelevant log entries during normal operation of the application.

The TRACE level produces even more logs than DEBUG so it shouldn't be used in production for sustained periods. It's better utilized in a development or testing environment where system performance degradation isn't a critical consideration.

Controlling your default log level is best done through an environment variable so that you can change it without modifying the code. However, you might need to restart the application each time the log level needs to be updated. There are also several ways to update the log level at runtime, but the specific technique will depend on the application environment and framework used. Be sure to investigate the options available if this is something that interests you.
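
One way to read the level from the environment in Python is sketched below. The variable name LOG_LEVEL and the function level_from_env are assumptions, not a convention required by any library. Unknown or empty values fall back to INFO.

```python
import logging
import os


def level_from_env(environ=os.environ, variable="LOG_LEVEL", default="INFO"):
    """Resolve a level name such as "debug" or "WARNING" to its numeric value."""
    name = environ.get(variable, default).strip().upper()
    level = logging.getLevelName(name)
    # For unrecognized names, getLevelName returns a string like "Level FOO"
    # instead of an integer, so fall back to the default in that case.
    if not isinstance(level, int):
        level = logging.getLevelName(default)
    return level
```

At startup, the result can be passed to logging.basicConfig(level=level_from_env()). In a running Python process, calling setLevel() on a logger changes its threshold without a restart.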

How to use log levels for monitoring and analysis

After you've configured your application to produce logs with the severity levels included, you might be wondering how to use the recorded labels to make sense of the log messages. The three main ways to use log levels for post-logging analysis are discussed below:

1. Filtering

Log levels allow you to quickly sift your logs such that only the relevant ones are displayed. If you use a cloud log management service like Logtail, it's easy to specify filters that display only the ERROR level entries that occurred in a time period.
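
If your logs are stored as structured records, the same kind of filter is easy to sketch locally. The example below assumes one JSON object per line with timestamp, level, and message fields. That layout is an assumption for illustration, not the format of any particular service.

```python
import json
from datetime import datetime

# Hypothetical sample data in an assumed JSON-lines layout.
SAMPLE_LINES = [
    '{"timestamp": "2024-05-01T10:00:00", "level": "INFO", "message": "started"}',
    '{"timestamp": "2024-05-01T10:05:00", "level": "ERROR", "message": "update failed"}',
    '{"timestamp": "2024-05-01T12:00:00", "level": "ERROR", "message": "decode failed"}',
]


def filter_entries(lines, level, start, end):
    """Yield parsed entries at `level` whose timestamp falls in [start, end)."""
    for line in lines:
        entry = json.loads(line)
        timestamp = datetime.fromisoformat(entry["timestamp"])
        if entry["level"] == level and start <= timestamp < end:
            yield entry


errors = filter_entries(
    SAMPLE_LINES, "ERROR", datetime(2024, 5, 1, 10), datetime(2024, 5, 1, 11)
)
for entry in errors:
    print(entry["timestamp"], entry["message"])
```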

2. Alerting

Another useful way to use log levels is for creating alerts in various scenarios. You can notify relevant members of your team if a notable event occurs on the system, or if an expected event didn't occur within a specified time frame. For example, you could send an alert to configured email addresses when more than five ERROR entries are logged within a 30-second period.

Aside from sending alerts to email addresses, you can configure various integrations so that you can receive alerts in Slack or other services in your stack.
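
The condition behind a rule such as "more than five ERROR entries within 30 seconds" can be sketched as a sliding window over entry timestamps. The class below is a simplified illustration with assumed names. It only decides when the rule fires and leaves delivery (email, Slack, and so on) to whatever service you use.

```python
from collections import deque


class ErrorRateAlert:
    """Fire when more than `threshold` ERROR entries fall within the window."""

    def __init__(self, threshold=5, window_seconds=30.0):
        self.threshold = threshold
        self.window_seconds = window_seconds
        self._timestamps = deque()

    def observe(self, level, timestamp):
        """Record an entry (timestamp in seconds); return True if the rule fires."""
        if level != "ERROR":
            return False
        self._timestamps.append(timestamp)
        # Drop ERROR entries that are older than the window.
        while timestamp - self._timestamps[0] > self.window_seconds:
            self._timestamps.popleft()
        return len(self._timestamps) > self.threshold
```

Entries are assumed to arrive in timestamp order. The sixth ERROR inside any 30-second span is the first one that makes observe() return True.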

3. Calculating various metrics

Log levels are also a useful tool for generating various metrics about the application, especially those that help gauge its reliability. For example, the number of ERROR or FATAL entries recorded in a specific period is valuable data that could help inform if some sort of "bug squashing sprint" should be next up on the calendar.
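
As a rough sketch of such a metric, the snippet below counts entries per level and computes the share of ERROR and FATAL entries. The entry layout (a dictionary with a level key) and the function names are assumptions for the example.

```python
from collections import Counter


def count_by_level(entries):
    """Count log entries per level name."""
    return Counter(entry["level"] for entry in entries)


def problem_ratio(counts):
    """Return the fraction of entries logged at ERROR or FATAL."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return (counts["ERROR"] + counts["FATAL"]) / total


# Hypothetical entries for illustration.
entries = [
    {"level": "INFO"}, {"level": "INFO"}, {"level": "WARN"},
    {"level": "ERROR"}, {"level": "INFO"}, {"level": "FATAL"},
    {"level": "INFO"}, {"level": "INFO"},
]
counts = count_by_level(entries)
print(counts["ERROR"], counts["FATAL"], problem_ratio(counts))  # 1 1 0.25
```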

Final thoughts

Using the right log level is a crucial step for effective log management. If your log levels are sound, it will be easy to filter your logs by priority, and you can create alerts for notable events. We hope this article has provided enough information to help you understand log levels and when to use them. For more details on logging techniques and practices to follow, check out the other articles in our logging guide.

Thanks for reading, and happy logging!

Article by

Ayooluwa Isaiah

Ayo is a technical content manager at Better Stack. His passion is simplifying and communicating complex technical ideas effectively. His work has been featured in several esteemed publications including LWN.net, Digital Ocean, and CSS-Tricks. When he's not writing or coding, he loves to travel, bike, and play tennis.

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.