Monitoring cache stats using OpenTelemetry Metrics

This article explains how to use the opentelemetry-go Metrics API to collect metrics, using go-redis/cache stats as an example.

Getting started with OpenTelemetry Metrics

To get started with metrics, you need a MeterProvider which provides access to Meters:

import "go.opentelemetry.io/otel/metric/global"

// Meter can be a global/package variable.
var Meter = global.MeterProvider().Meter("app_or_package_name")

Using the meter, you can create instruments and use them to measure operations. The simplest Counter instrument looks like this:

import (
	"go.opentelemetry.io/otel/attribute"
	"go.opentelemetry.io/otel/metric/instrument"
)

// Counter returns both the instrument and an error.
counter, err := Meter.SyncInt64().Counter(
	"test.my_counter",
	instrument.WithUnit("1"),
	instrument.WithDescription("Just a test counter"),
)
if err != nil {
	panic(err)
}

// Increment the counter.
counter.Add(ctx, 1, attribute.String("foo", "bar"))
counter.Add(ctx, 10, attribute.String("hello", "world"))
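
Each distinct attribute set passed to Add is tracked as its own series. The stdlib-only sketch below imitates that behavior with a map. The attrCounter type and its methods are hypothetical and exist only for illustration; they are not part of opentelemetry-go.

```go
package main

import (
	"fmt"
	"sync"
)

// attrCounter is a hypothetical stand-in for a synchronous counter.
// It keeps one running total per attribute pair, which is roughly how
// a metrics SDK separates series that differ only in their attributes.
type attrCounter struct {
	mu     sync.Mutex
	series map[string]int64
}

func newAttrCounter() *attrCounter {
	return &attrCounter{series: make(map[string]int64)}
}

// Add increments the series identified by the key/value attribute.
func (c *attrCounter) Add(incr int64, key, value string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.series[key+"="+value] += incr
}

// Value returns the running total for one attribute pair.
func (c *attrCounter) Value(key, value string) int64 {
	c.mu.Lock()
	defer c.mu.Unlock()
	return c.series[key+"="+value]
}

func main() {
	counter := newAttrCounter()
	counter.Add(1, "foo", "bar")
	counter.Add(10, "hello", "world")
	counter.Add(2, "foo", "bar")

	fmt.Println("foo=bar:", counter.Value("foo", "bar"))
	fmt.Println("hello=world:", counter.Value("hello", "world"))
}
```

The two Add calls with foo=bar accumulate into one total, while hello=world is counted separately.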

You can find more examples on GitHub.

Cache stats

Our Redis-based cache keeps stats about hits and misses in the following struct:

type Stats struct {
	Hits   uint64
	Misses uint64
}

You can get the current stats with:

stats := cache.Stats()
fmt.Println("hits", stats.Hits)
fmt.Println("misses", stats.Misses)
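
Counters like these are typically cumulative totals that only grow over the life of the cache. As a rough sketch of how such totals could be maintained safely across goroutines, the example below uses sync/atomic. The recorder type and its Record method are hypothetical and are not the library's actual implementation.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// Stats mirrors the struct shown above.
type Stats struct {
	Hits   uint64
	Misses uint64
}

// recorder is a hypothetical holder of cumulative hit/miss totals.
type recorder struct {
	hits   uint64
	misses uint64
}

// Record counts one cache lookup as a hit or a miss.
func (r *recorder) Record(hit bool) {
	if hit {
		atomic.AddUint64(&r.hits, 1)
	} else {
		atomic.AddUint64(&r.misses, 1)
	}
}

// Stats returns a snapshot of the totals recorded so far.
func (r *recorder) Stats() Stats {
	return Stats{
		Hits:   atomic.LoadUint64(&r.hits),
		Misses: atomic.LoadUint64(&r.misses),
	}
}

func main() {
	r := &recorder{}

	// Four goroutines each record three hits and one miss.
	var wg sync.WaitGroup
	for i := 0; i < 4; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for j := 0; j < 3; j++ {
				r.Record(true)
			}
			r.Record(false)
		}()
	}
	wg.Wait()

	stats := r.Stats()
	fmt.Println("hits", stats.Hits)
	fmt.Println("misses", stats.Misses)
}
```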

Monitoring cache stats

To start monitoring our cache, we can create a separate instrument for each struct field. Here we use a CounterObserver instrument (an asynchronous counter), which periodically calls a function to gather stats.

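import (
	"context"

	"go.opentelemetry.io/otel/metric"
	"go.opentelemetry.io/otel/metric/instrument"
)

func MonitorCache(cache *cache.Cache, meter metric.Meter) {
	// Check the errors returned when creating the instruments.
	hits, err := meter.AsyncInt64().Counter("cache.hits")
	if err != nil {
		panic(err)
	}
	misses, err := meter.AsyncInt64().Counter("cache.misses")
	if err != nil {
		panic(err)
	}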

	if err := meter.RegisterCallback(
		[]instrument.Asynchronous{
			hits,
			misses,
		},
		// SDK periodically calls this function to collect data.
		func(ctx context.Context) {
			stats := cache.Stats()

			hits.Observe(ctx, int64(stats.Hits))
			misses.Observe(ctx, int64(stats.Misses))
		},
	); err != nil {
		panic(err)
	}
}

Using the instruments above we get access to the following metrics:

  • cache.hits - number of cache hits.
  • cache.misses - number of cache misses.
  • cache.hits + cache.misses - number of cache requests.
  • cache.hits / (cache.hits + cache.misses) - cache hit rate.
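
The derived metrics in the list above can be checked with a few lines of Go. This is only an illustrative sketch: the Requests and HitRate methods are hypothetical helpers added here, and in practice this arithmetic is done in the metrics backend rather than in application code.

```go
package main

import "fmt"

// Stats mirrors the hits/misses struct shown earlier.
type Stats struct {
	Hits   uint64
	Misses uint64
}

// Requests returns cache.hits + cache.misses.
func (s Stats) Requests() uint64 {
	return s.Hits + s.Misses
}

// HitRate returns cache.hits / (cache.hits + cache.misses).
// It returns 0 when there were no requests, to avoid dividing by zero.
func (s Stats) HitRate() float64 {
	total := s.Requests()
	if total == 0 {
		return 0
	}
	return float64(s.Hits) / float64(total)
}

func main() {
	s := Stats{Hits: 75, Misses: 25}
	fmt.Printf("requests=%d hit_rate=%.2f\n", s.Requests(), s.HitRate())
}
```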

Metric attributes

The code above works well enough, but what if we want to add another metric:

type Stats struct {
    Hits   uint64
    Misses uint64
+    Errors uint64
}

We could add another instrument to observe the Errors field, but then we would also need to update our math:

  • cache.hits + cache.misses + cache.errors - number of cache requests.
  • cache.hits / (cache.hits + cache.misses + cache.errors) - cache hit rate.

Can we do better? Yes, using a single instrument and metric attributes:

import (
	"context"

	"go.opentelemetry.io/otel/attribute"
	"go.opentelemetry.io/otel/metric"
	"go.opentelemetry.io/otel/metric/instrument"
)

func MonitorCache(cache *cache.Cache, meter metric.Meter) {
	cacheCounter, err := meter.AsyncInt64().Counter("cache.stats")
	if err != nil {
		panic(err)
	}

	hits := []attribute.KeyValue{attribute.String("type", "hits")}
	misses := []attribute.KeyValue{attribute.String("type", "misses")}
	errors := []attribute.KeyValue{attribute.String("type", "errors")}

	if err := meter.RegisterCallback(
		[]instrument.Asynchronous{
			cacheCounter,
		},
		// SDK periodically calls this function to collect data.
		func(ctx context.Context) {
			stats := cache.Stats()

			cacheCounter.Observe(ctx, int64(stats.Hits), hits...)
			cacheCounter.Observe(ctx, int64(stats.Misses), misses...)
			cacheCounter.Observe(ctx, int64(stats.Errors), errors...)
		},
	); err != nil {
		panic(err)
	}
}

Our new math looks like this and does not require changes when you add new stats:

  • cache.stats - number of cache requests.
  • filter(cache.stats, type = "hits") - number of cache hits.
  • filter(cache.stats, type = "misses") - number of cache misses.
  • filter(cache.stats, type = "hits") / cache.stats - cache hit rate.
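
To see why this math does not change when a new stat is added, here is a small sketch that models the observed values as points tagged with a type attribute. The Point type and the Sum, Filter, and HitRate functions are hypothetical and only mimic what a metrics backend does with the filter expressions above.

```go
package main

import "fmt"

// Point is one observed value of cache.stats with its "type" attribute.
type Point struct {
	Type  string
	Value int64
}

// Sum adds up every point, like plain cache.stats.
func Sum(points []Point) int64 {
	var total int64
	for _, p := range points {
		total += p.Value
	}
	return total
}

// Filter adds up only the points with the given type,
// like filter(cache.stats, type = "...").
func Filter(points []Point, typ string) int64 {
	var total int64
	for _, p := range points {
		if p.Type == typ {
			total += p.Value
		}
	}
	return total
}

// HitRate computes filter(cache.stats, type = "hits") / cache.stats.
func HitRate(points []Point) float64 {
	total := Sum(points)
	if total == 0 {
		return 0
	}
	return float64(Filter(points, "hits")) / float64(total)
}

func main() {
	points := []Point{
		{Type: "hits", Value: 6},
		{Type: "misses", Value: 1},
		{Type: "errors", Value: 1},
	}
	fmt.Println("requests:", Sum(points))
	fmt.Println("hits:", Filter(points, "hits"))
	fmt.Println("hit rate:", HitRate(points))
}
```

Adding a fourth type to the points slice changes the totals but none of the functions, which is the property the attribute-based design is after.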

As a bonus, you can easily visualize all available metrics by grouping on the type attribute, for example, in Uptrace:

cache.stats | group by type

[Figure: Cache metrics]

Prometheus

To export metrics to Prometheus, see Exporting OpenTelemetry Metrics to Prometheus.

What's next

Next, you can learn about the available metric instruments and try to instrument your code.
