Log Levels Explained and How to Use Them
Log levels are labels that indicate the severity or urgency of a log entry. Their primary purpose is to separate messages that are merely informational (meaning that the system is working normally) from those that describe a problem or potential problem, such as when recurring errors are detected in the system. Log levels also provide a way to dynamically control your application's volume of log output (more on this later).
In this article, we will discuss the following concepts that should help you get a handle on what log levels are and how to use them to log more effectively:
- What log levels are and how they work.
- The history of log levels.
- Common log levels and how to use them.
- Using log levels for filtering purposes.
- Configuring alerts based on log levels.
The history of log levels
Syslog, a logging solution initially developed for the Sendmail project, first introduced the concept of log levels in the 1980s. It came with severity levels that are attached to each log entry to describe the severity of the event in question:
- Emergency (emerg): system is unusable.
- Alert (alert): immediate action required.
- Critical (crit): critical conditions.
- Error (error): error conditions.
- Warning (warn): warning conditions.
- Notice (notice): normal but significant conditions.
- Informational (info): informational messages.
- Debug (debug): messages helpful for debugging.
In the following years, Syslog was adopted by various software applications and eventually became a standard for message logging on Unix-like systems. Its severity levels were also adapted and refined by various application logging frameworks such as log4net and log4j, evolving into the various log levels that are commonplace today.
Common log levels and their use cases
The log levels available to you will vary depending on the programming language, framework, or service in use. Still, most will include some or all of the following levels: FATAL, ERROR, WARN, INFO, DEBUG, and TRACE. You can usually override the defaults in your logging framework of choice with custom log levels, but we recommend sticking to the ones discussed below. They are arranged in decreasing order of urgency:
FATAL
The FATAL log level annotates messages with the greatest severity. It usually means that something critical is broken, and the application cannot continue to do any more useful work without the intervention of an engineer. Typically, such entries are logged before the application is shut down (with exit code 1) to prevent further data corruption. If you use a log management service, you can configure it such that you get instant alerts when such entries are logged so that someone can react to them as quickly as possible.
Examples of situations that may be logged as FATAL errors include the following:
- Crucial configuration information is missing without fallback defaults.
- Unable to connect to a service crucial to the application's primary function (such as the database).
- Running out of disk space on the server.
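As a rough illustration of the first situation, the sketch below logs at the highest severity and exits with code 1 when a required setting is missing. It uses Python's standard logging module, where FATAL is an alias for CRITICAL. The setting name DATABASE_URL and the load_config helper are hypothetical and chosen only for this example.

```python
import logging
import sys

logger = logging.getLogger("app")

# Hypothetical setting name, used only for illustration.
REQUIRED_SETTINGS = ("DATABASE_URL",)


def load_config(env):
    """Return the required settings, or log at FATAL and exit with code 1."""
    missing = [name for name in REQUIRED_SETTINGS if name not in env]
    if missing:
        # logging.FATAL is an alias for logging.CRITICAL in the standard library.
        logger.log(logging.FATAL, "missing required configuration: %s", ", ".join(missing))
        sys.exit(1)
    return {name: env[name] for name in REQUIRED_SETTINGS}
```

In a real application you would typically call load_config(os.environ) once at startup, before any work that depends on the settings begins.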
ERROR
The ERROR log level is used to represent error conditions in an application that prevent a specific operation from running, but the application itself can continue working even if it is at a reduced level of functionality or performance. Generally, ERROR logs should be investigated as soon as possible, but they don't carry the same urgency as FATAL messages since the application can continue working.
The occurrence of an error condition in the application does not necessarily mean that it should be logged at the ERROR level. For example, if an exception is expected behavior and does not indicate degradation in application functionality or performance, it can be logged as INFO. Also, errors with a possibility of recovery (such as network connectivity errors) can be labeled as INFO if an automatic recovery strategy is in place (e.g., retries). Such conditions can be promoted to the ERROR level if recovery isn't possible after a predetermined time.
Logging significant error conditions is also useful for generating metrics such as Mean Time Between Failures (MTBF), which can be used to assess the quality of the application or to compare different systems or designs. Examples of situations that are typically logged at the ERROR level include the following:
- A persistent connection failure to some external resource (after automated recovery attempts have failed).
- Failure to create or update a resource in the system.
- An unexpected error (e.g., failed to decode a JSON object).
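The idea of logging recoverable failures at INFO and promoting them to ERROR once recovery has failed could be sketched as follows. This is a minimal example using Python's standard logging module. The call_with_retries helper is hypothetical, and it promotes after a fixed number of attempts rather than after a time limit, which is a simplifying assumption.

```python
import logging
import time

logger = logging.getLogger("app")


def call_with_retries(operation, attempts=3, delay_seconds=0.5):
    """Run operation, logging recoverable failures at INFO and the final one at ERROR."""
    for attempt in range(1, attempts + 1):
        try:
            return operation()
        except ConnectionError as exc:
            if attempt < attempts:
                # Recovery is still possible, so this is not yet an ERROR.
                logger.info("attempt %d/%d failed (%s); retrying", attempt, attempts, exc)
                time.sleep(delay_seconds)
            else:
                # Retries are exhausted: promote to ERROR and let the caller decide.
                logger.error("operation failed after %d attempts: %s", attempts, exc)
                raise
```

Only the final failure reaches the ERROR level, so ERROR counts stay meaningful for anyone alerting on them.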
WARN
Messages logged at the WARN level typically indicate that something unexpected happened, but the application can recover and continue to function normally. It is mainly used to draw attention to situations that should be addressed soon before they pose a problem for the application.
Events that may be logged at the WARN level include the following:
- The disk usage on the server is above a configured threshold.
- Memory usage is above a configured threshold.
- The application is taking longer than usual to complete some important tasks (degraded performance).
INFO
INFO-level messages indicate events in the system that are significant to the business purpose of the application. Such events are logged to show that the system is operating normally. For example, a service was started or stopped, some resource was created, accessed, updated, or deleted in the database, and so on. Production systems typically default to logging at this level so that a summary of the application's normal behavior is visible to anyone reading the logs.
Other events that are typically logged at the INFO level include the following:
- The state of an operation has changed (e.g., from "PENDING" to "IN PROGRESS").
- The application is listening on a specific port.
- A scheduled job was completed successfully.
DEBUG
The DEBUG level is used for logging messages that help developers find out what went wrong during a debugging session. While the specifics of what to log at the DEBUG level depend on your application, you generally want to include detailed information that can help developers troubleshoot an issue quickly. This can include variable state in the surrounding scope or relevant error codes. Unlike TRACE (below), DEBUG-level logging can be turned on in production without making the application unusable, but it should not be left on indefinitely because the extra output can degrade the system's performance.
TRACE
The TRACE level is used for tracing the path of code execution in a program. For example, you may use it to trace the processing of an incoming request or an algorithm's steps to solve a problem. Generally, TRACE is used for showing the flow of the program and for providing a detailed breakdown of the sequence of events that led to a crash, a silent failure, an error, or some other event logged at a different level. Concrete examples of messages that should be logged at the TRACE level include the following:
- Entered or exited a function or method, perhaps with the processing duration.
- Calculation x + y produced output z.
- Starting or ending an operation and any intermediate state changes.
As you can see, the information logged at this level generally tries to capture every possible detail about the program's execution. Therefore, TRACE logging should only be enabled for short periods due to the significant performance degradation that it often causes. You will typically enable it only in development and testing environments.
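To make the "entered or exited a function" example concrete, here is one possible sketch in Python. Python's standard logging module does not define a TRACE level, so the numeric value 5 (below DEBUG, which is 10) is an assumption made for this example, as are the traced decorator and the add function.

```python
import functools
import logging
import time

# Assumption: Python's logging module has no TRACE level, so one is
# registered here with a value below logging.DEBUG (10).
TRACE = 5
logging.addLevelName(TRACE, "TRACE")

logger = logging.getLogger("app")


def traced(func):
    """Log entry to and exit from func at the TRACE level, with the duration."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        logger.log(TRACE, "entering %s", func.__name__)
        start = time.perf_counter()
        try:
            return func(*args, **kwargs)
        finally:
            elapsed_ms = (time.perf_counter() - start) * 1000
            logger.log(TRACE, "exiting %s after %.2f ms", func.__name__, elapsed_ms)
    return wrapper


@traced
def add(x, y):
    return x + y
```

These messages only appear when the logger's level is set to TRACE or lower, so the decorator produces no output at a typical production level such as INFO.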
Controlling your application's log volume
Log levels are the primary way to control your application's volume of log entries. Once you select your default level, all log entries that are labeled with a severity lower than the default will not be recorded. For example, logging at the WARN level will cause INFO, DEBUG, and TRACE messages to be ignored.
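The threshold behavior described above can be seen in a small sketch using Python's standard logging module, which spells the WARN level as WARNING. The logger name and the messages are made up for this example.

```python
import io
import logging

# Capture output in memory so the recorded entries can be inspected.
stream = io.StringIO()
handler = logging.StreamHandler(stream)
handler.setFormatter(logging.Formatter("%(levelname)s %(message)s"))

logger = logging.getLogger("volume_demo")
logger.addHandler(handler)
logger.propagate = False
logger.setLevel(logging.WARNING)  # the minimum level that will be recorded

logger.debug("cache miss for key user:42")    # dropped
logger.info("service started")                # dropped
logger.warning("disk usage above threshold")  # recorded
logger.error("failed to update resource")     # recorded

recorded = stream.getvalue().splitlines()
```

After this runs, recorded holds only the WARNING and ERROR entries. The two lower-severity calls never reach the handler.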
As you go down in default severity, the number of entries that are produced will increase, so it's a good idea to turn on only what is necessary to avoid being flooded with too much information. A typical default for production environments is INFO, which records messages logged at the INFO level or higher priority (WARN, ERROR, and FATAL). You can change this to WARN if you only want to record events that indicate problems or potential problems.
When troubleshooting a problem in production, you might want to reduce the default severity of recorded messages to DEBUG. This level will typically produce voluminous output with enough context to help developers debug the issue, but it should be turned off afterward to prevent flooding the system with irrelevant log entries during normal operation of the application.
The TRACE level produces even more logs than DEBUG, so it shouldn't be used in production for sustained periods. It's better utilized in a development or testing environment where system performance degradation isn't a critical consideration.
Controlling your default log level is best done through an environment variable so that you can change it without modifying the code. However, you might need to restart the application each time the log level needs to be updated. There are also several ways to update the log level at runtime, but the specific technique will depend on the application environment and framework used. Be sure to investigate the options available if this is something that interests you.
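One way to read the level from the environment is sketched below with Python's standard logging module. The variable name LOG_LEVEL and the level_from_env helper are assumptions for this example, not a convention that every framework follows.

```python
import logging
import os


def level_from_env(env=None):
    """Resolve LOG_LEVEL (a hypothetical variable name) to a numeric level."""
    env = os.environ if env is None else env
    name = env.get("LOG_LEVEL", "INFO").upper()
    # getLevelName returns an int for a known level name and a string otherwise.
    level = logging.getLevelName(name)
    return level if isinstance(level, int) else logging.INFO


# Apply the resolved level to the root logger at startup.
logging.basicConfig(level=level_from_env())
```

With this in place, starting the application with LOG_LEVEL=DEBUG set in its environment would enable DEBUG output, while an unset or unrecognized value falls back to INFO.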
How to use log levels for monitoring and analysis
After you've configured your application to produce logs with the severity levels included, you might be wondering how to use the recorded labels to make sense of the log messages. The three main ways to use log levels for post-logging analysis are discussed below:
1. Filtering
Log levels allow you to quickly sift your logs such that only the relevant ones are displayed. If you use a cloud log management service like Logtail, it's easy to specify filters that display only the ERROR level entries that occurred in a time period.
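If you are working with raw log files rather than a log management service, the same kind of filter can be approximated in a few lines of Python. The sketch assumes a simple line format of an ISO timestamp, a level, and a message separated by spaces. That format, the sample lines, and the filter_entries helper are all hypothetical.

```python
from datetime import datetime

# Hypothetical sample entries in the assumed "<timestamp> <LEVEL> <message>" format.
SAMPLE_LINES = [
    "2024-05-01T12:00:05 INFO service started",
    "2024-05-01T12:00:10 ERROR failed to update resource",
    "2024-05-01T13:30:00 ERROR failed to decode JSON object",
]


def filter_entries(lines, level, start, end):
    """Yield the lines at the given level whose timestamp falls in [start, end)."""
    for line in lines:
        timestamp, entry_level, _message = line.split(" ", 2)
        if entry_level == level and start <= datetime.fromisoformat(timestamp) < end:
            yield line


errors_in_window = list(
    filter_entries(
        SAMPLE_LINES,
        "ERROR",
        datetime(2024, 5, 1, 12, 0, 0),
        datetime(2024, 5, 1, 13, 0, 0),
    )
)
```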
2. Alerting
Another useful way to use log levels is for creating alerts in various scenarios. You can notify relevant members of your team if a notable event occurs on the system, or if an expected event didn't occur within a specified time frame. For example, an alert rule could send a message to configured email addresses when more than five ERROR entries are logged within a 30-second period.
Aside from sending alerts to email addresses, you can configure various integrations so that you can receive alerts in Slack or other services in your stack.
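The rule of more than five ERROR entries within 30 seconds can be approximated with a sliding window. The sketch below is not how any particular log management service implements alerting. The ErrorRateAlert class and its notify callback are hypothetical, and timestamps are assumed to be plain numbers of seconds that arrive in order.

```python
from collections import deque


class ErrorRateAlert:
    """Send a notification when too many ERROR entries arrive within a time window."""

    def __init__(self, max_errors=5, window_seconds=30, notify=print):
        self.max_errors = max_errors
        self.window_seconds = window_seconds
        self.notify = notify  # hypothetical callback, e.g. one that sends an email
        self._timestamps = deque()

    def record(self, level, timestamp):
        """Record one entry (timestamp in seconds); return True if an alert was sent."""
        if level != "ERROR":
            return False
        self._timestamps.append(timestamp)
        # Drop ERROR timestamps that have fallen out of the window.
        while timestamp - self._timestamps[0] >= self.window_seconds:
            self._timestamps.popleft()
        if len(self._timestamps) > self.max_errors:
            self.notify(
                f"{len(self._timestamps)} ERROR entries in the last {self.window_seconds} seconds"
            )
            self._timestamps.clear()  # avoid alerting again for the same burst
            return True
        return False
```

Passing a different notify callable is one way to route the same alert to email, Slack, or another service.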
3. Calculating various metrics
Log levels are also a useful tool for generating various metrics about the application, especially those that help gauge its reliability. For example, the number of ERROR or FATAL entries recorded in a specific period is valuable data that could help inform if some sort of "bug squashing sprint" should be next up on the calendar.
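A simple version of such a metric is a per-day count of ERROR and FATAL entries. The sketch below assumes log lines made up of an ISO timestamp, a level, and a message separated by spaces. That format, the sample lines, and the failures_per_day helper are hypothetical.

```python
from collections import Counter

# Hypothetical sample entries in the assumed "<timestamp> <LEVEL> <message>" format.
SAMPLE_LOG = [
    "2024-05-01T09:15:00 ERROR failed to update resource",
    "2024-05-01T09:16:30 INFO scheduled job completed",
    "2024-05-01T18:40:12 FATAL unable to connect to database",
    "2024-05-02T02:05:44 ERROR failed to decode JSON object",
]


def failures_per_day(lines, levels=("ERROR", "FATAL")):
    """Count entries at the given levels, keyed by the date part of the timestamp."""
    counts = Counter()
    for line in lines:
        timestamp, level, _message = line.split(" ", 2)
        if level in levels:
            counts[timestamp[:10]] += 1  # 'YYYY-MM-DD' prefix of the ISO timestamp
    return counts


daily_failures = failures_per_day(SAMPLE_LOG)
```

Tracking how this count changes from one day or release to the next gives a rough reliability signal.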
Final thoughts
Using the right log level is a crucial step for effective log management. If your log levels are sound, it will be easy to filter your logs by priority, and you can create alerts for notable events. We hope this article has provided enough information to help you understand log levels and when to use them. For more details on logging techniques and practices to follow, check out the other articles in our logging guide.
Thanks for reading, and happy logging!