Writing metrics SQL queries

Gain insights into your infrastructure and applications by visualizing your metrics. Create tailored charts by writing your own custom SQL queries.

Want a dashboard that works out of the box?

Use one of our ready-to-use templates to create a dashboard.

Getting started

Chart free memory for host web-01 using metric memory_free_bytes:

Metrics SQL query
SELECT {{time}} AS time,
  avgMerge(value_avg) AS value,
  'Free RAM' AS series
FROM {{source}}
WHERE time BETWEEN {{start_time}} AND {{end_time}}
  AND name = 'memory_free_bytes'
  AND metricTag('hostname') = 'web-01'
GROUP BY time

Chart preview

  • Dashboard queries use ClickHouse SQL, which is largely similar to ANSI SQL you’re likely familiar with. In SQL query above, we use avgMerge ClickHouse function.

  • On top of SQL, dashboard queries feature variables like {{source}} or {{time}}. Variables conveniently insert selected source, current time range, and other selected values into your queries. Read more about [variables][dashboard-variables].

  • Chart queries don't query pure metrics directly. Metric values are aggregated into efficient metrics. You can use name and metricTag() to access metric name and tags in your queries.

Which columns can be used in metrics queries?


We automatically make relevant columns available for metrics.

You can find the columns below in Sourcesyour sourceConfigureAdvanced settingsLogs to metrics section:

Screenshot of Logs to metrics tables

Metrics

  • Rows is the number of underlying metric data points.

  • Value metric represents the immediate value of the metric. Use it with gauge metrics or anytime you want to chart the metric value directly.

  • Rate represents how much the metric value has increased per second. Common use for rates is to calculate the number of requests per second. Use rate with counter metrics.


Rows, value and rate are aggregated metrics. Check out the examples below to see how to use the aggregated metrics.

What are gauges and counter metrics?

Gauge is a metric that can go up and down, used for, e.g., CPU or memory usage.
Counter is a metric that can only increase, commonly used for number of requests.
Read more about metric types. TODO: link

Group by

  • Name column is the name of your metric. Use it to select, filter, or anywhere else in your SQL queries.

  • Tags column contains all metric tags, commonly called labels. Use metricTag(‘tag_name’) to get value for any tag. Use the resulting tag value to select, group by, or anywhere else in your SQL queries.

  • Series_id column is a unique value for each metric series. Use GROUP BY series_id when aggregating your metrics.

What is a metric series?

Metric series represent individual series of values that make up your metric. For example, a metric with hostname tag will have one series for each hostname. Read more about metrics series. TODO: link

Metrics query examples

CPU load

Chart CPU load using cpu_load gauge metric:


CPU load example
WITH grouped_values AS
-- get maximum value for each unique series within the time interval
(
  SELECT
    {{time}} AS time, 
    maxMerge(value_max) as max_value
  FROM {{source}}
  WHERE name = 'cpu_load'
    AND time BETWEEN {{start_time}} AND {{end_time}}
  GROUP BY series_id, time 
)
-- SUM all series to get the total value
SELECT time, SUM(max_value) as value
FROM grouped_values
GROUP BY time
  • We use maxMerge(value_max) to get the maximal CPU load in each time interval.

  • We use maximum to get a more accurate visualisation of CPU load peaks. Using avgMerge(value_avg) is a good alternative if care about average values more than peaks.

  • We use WHERE name = 'cpu_load’ to only select cpu_load metric.

  • We use grouped_values subquery to first calculate maximum for each metric series. Using GROUP BY series_id ensures that we get correct maximum for each series.
Then we SUM() all metrics series together in second query.

Number of requests

Chart the average number of served requests using requests_total counter metric:

Number of requests example
WITH 
-- get average rate for each unique series within the time interval
grouped_rate AS (
  SELECT
    {{time}} AS time, 
    series_id,
    avgMerge(rate_avg) AS avg_rate
  FROM {{source}}
  WHERE name = 'requests_total' 
     AND time BETWEEN {{start_time}} AND {{end_time}}
  GROUP BY time, series_id
)
-- SUM all series to get total rate
SELECT time, 'Requests per second' AS series, SUM(avg_rate) as value
FROM grouped_rate
GROUP BY time
  • We use avgMerge(rate_avg) to get the average number of severed requests. 
We sum the averaged in the second query to get the total average number of served requests.


 ALT (?): combine?
The counter metric requests_total represents the total number of requests. We use rate to calculate the average number of requests per second that are hitting our server.

  • We SELECT 'Requests per second' as series to set the name of series. It will be used in tooltips and the chart legend.



Enumerate tag values

E.g. number of pods Or list of pods

Get the number of running servers by getting all different values for hostname tag:

 

TODO:



  • We count distinct values for hostname tag to get the number of severs.

  • We select the up metric by name to only get hostname values for running servers. Other metrics could have hostname tag values for servers that are down. Selecting only metrics we need also makes our queries faster.