Want a dashboard that works out of the box?
Use one of our ready-to-use templates to create a dashboard.
Gain insights into your infrastructure and applications by visualizing your metrics. Create tailored charts by writing your own custom SQL queries.
Use one of our ready-to-use templates to create a dashboard.
Chart free memory for host web-01
using metric memory_free_bytes
:
SELECT {{time}} AS time,
avgMerge(value_avg) AS value,
'Free RAM' AS series
FROM {{source}}
WHERE time BETWEEN {{start_time}} AND {{end_time}}
AND name = 'memory_free_bytes'
AND metricTag('hostname') = 'web-01'
GROUP BY time
Dashboard queries use ClickHouse SQL, which is largely similar to ANSI SQL you’re likely familiar with. In SQL query above, we use avgMerge
ClickHouse function.
On top of SQL, dashboard queries feature variables like {{source}}
or {{time}}
. Variables conveniently insert selected source, current time range, and other selected values into your queries. Read more about [variables][dashboard-variables].
Chart queries don't query pure metrics directly. Metric values are aggregated into efficient metrics. You can use name
and metricTag()
to access metric name and tags in your queries.
We automatically make relevant columns available for metrics.
You can find the columns below in Sources → your source → Configure → Advanced settings → Logs to metrics section:
Rows is the number of underlying metric data points.
Value metric represents the immediate value of the metric. Use it with gauge metrics or anytime you want to chart the metric value directly.
Rate represents how much the metric value has increased per second. Common use for rates is to calculate the number of requests per second. Use rate with counter metrics.
Rows, value and rate are aggregated metrics. Check out the examples below to see how to use the aggregated metrics.
Gauge is a metric that can go up and down, used for, e.g., CPU or memory usage.
Counter is a metric that can only increase, commonly used for number of requests.
Read more about metric types. TODO: link
Name column is the name of your metric. Use it to select, filter, or anywhere else in your SQL queries.
Tags column contains all metric tags, commonly called labels. Use metricTag(‘tag_name’)
to get value for any tag. Use the resulting tag value to select, group by, or anywhere else in your SQL queries.
Series_id column is a unique value for each metric series. Use GROUP BY series_id
when aggregating your metrics.
Metric series represent individual series of values that make up your metric. For example, a metric with hostname
tag will have one series for each hostname. Read more about metrics series. TODO: link
Chart CPU load using cpu_load
gauge metric:
WITH grouped_values AS
-- get maximum value for each unique series within the time interval
(
SELECT
{{time}} AS time,
maxMerge(value_max) as max_value
FROM {{source}}
WHERE name = 'cpu_load'
AND time BETWEEN {{start_time}} AND {{end_time}}
GROUP BY series_id, time
)
-- SUM all series to get the total value
SELECT time, SUM(max_value) as value
FROM grouped_values
GROUP BY time
We use maxMerge(value_max)
to get the maximal CPU load in each time interval.
We use maximum to get a more accurate visualisation of CPU load peaks.
Using avgMerge(value_avg)
is a good alternative if care about average values more than peaks.
We use WHERE name = 'cpu_load’
to only select cpu_load
metric.
We use grouped_values
subquery to first calculate maximum for each metric series. Using GROUP BY series_id
ensures that we get correct maximum for each series.
Then we SUM()
all metrics series together in second query.
Chart the average number of served requests using requests_total
counter metric:
WITH
-- get average rate for each unique series within the time interval
grouped_rate AS (
SELECT
{{time}} AS time,
series_id,
avgMerge(rate_avg) AS avg_rate
FROM {{source}}
WHERE name = 'requests_total'
AND time BETWEEN {{start_time}} AND {{end_time}}
GROUP BY time, series_id
)
-- SUM all series to get total rate
SELECT time, 'Requests per second' AS series, SUM(avg_rate) as value
FROM grouped_rate
GROUP BY time
avgMerge(rate_avg)
to get the average number of severed requests.
We sum the averaged in the second query to get the total average number of served requests.
ALT (?): combine?
The counter metric requests_total
represents the total number of requests. We use rate to calculate the average number of requests per second that are hitting our server.
SELECT 'Requests per second' as series
to set the name of series. It will be used in tooltips and the chart legend.
E.g. number of pods Or list of pods
Get the number of running servers by getting all different values for hostname
tag:
TODO:
We count distinct values for hostname
tag to get the number of severs.
We select the up
metric by name
to only get hostname
values for running servers. Other metrics could have hostname
tag values for servers that are down. Selecting only metrics we need also makes our queries faster.