Sampling

Sampling is essential when handling large volumes of traces. Ideally, sampling discards less interesting data while keeping the data needed to diagnose issues. Other capabilities, such as calculating metrics from traces, can also help reduce the volume of spans collected and stored.

Jaeger supports several types of sampling strategies, which build on OpenTelemetry. The jaeger binary supports the specific configurations described below.

Head Based Sampling

Head-based sampling means the sampling decision is made when the first spans of a trace are seen by jaeger. Support for head-based sampling comes from OpenTelemetry, and the topic is covered in the OpenTelemetry documentation on head-based sampling.
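The following minimal Python sketch shows a trace-ID-ratio style head sampler and assumes integer trace IDs. It is not Jaeger's or OpenTelemetry's code. The function name head_sample and the use of the low 64 bits of the trace ID are illustrative choices. The verdict depends only on the trace ID, so every component that sees a span of the same trace reaches the same decision without coordination.

```python
# Hypothetical helper for illustration; not an OpenTelemetry or Jaeger API.
LOW_64_BITS = (1 << 64) - 1


def head_sample(trace_id: int, probability: float) -> bool:
    """Return True if the trace should be kept, decided from the trace ID alone."""
    if probability <= 0.0:
        return False
    if probability >= 1.0:
        return True
    # Keep the trace when its low 64 bits fall below probability * 2**64,
    # so roughly `probability` of uniformly random trace IDs are kept.
    threshold = int(probability * (1 << 64))
    return (trace_id & LOW_64_BITS) < threshold
```

For example, head_sample(trace_id, 0.1) keeps about one trace in ten, and repeated calls with the same trace_id always return the same answer.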

See the example configuration for various head-based sampling strategies.

Tail Based Sampling

Tail-based sampling defers the sampling decision until the trace is complete and all of its spans have been collected. This gives more granular control over which traces are kept and which are discarded. The downside is that jaeger binaries use more memory and processing power, because spans must be buffered until the decision is made.
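The following minimal Python sketch shows the buffering that tail-based sampling involves. It is a sketch under stated assumptions. The Span fields, the TailSampler class, and its keep policy are all hypothetical. The policy keeps a trace if any span errored, touched a service of interest, or was slow. The sketch shows where the memory cost comes from: spans sit in a buffer until the whole trace can be judged.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import DefaultDict, List, Optional


@dataclass
class Span:
    trace_id: str
    service: str
    duration_ms: float
    error: bool = False


class TailSampler:
    """Hypothetical tail sampler: buffers spans and judges each trace as a whole."""

    def __init__(self, service_of_interest: str, latency_threshold_ms: float):
        self.service_of_interest = service_of_interest
        self.latency_threshold_ms = latency_threshold_ms
        # Spans stay in memory until their trace is complete.
        self.buffer: DefaultDict[str, List[Span]] = defaultdict(list)

    def add(self, span: Span) -> None:
        self.buffer[span.trace_id].append(span)

    def complete(self, trace_id: str) -> Optional[List[Span]]:
        """Decide once the trace is complete; return its spans if kept, else None.

        How completeness is detected (for example, waiting a fixed time after
        the first span) is outside the scope of this sketch.
        """
        spans = self.buffer.pop(trace_id, [])
        keep = any(
            s.error
            or s.service == self.service_of_interest
            or s.duration_ms > self.latency_threshold_ms
            for s in spans
        )
        return spans if keep else None
```

A real policy engine decides per trace in the same spirit, but with configurable policies rather than the three hard-coded checks used here.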

One example configuration samples everything observed by jaeger, with debug logging enabled to help diagnose problems.

A second example configuration samples traces for a specific service name.

Remote Sampling

If your client SDKs are configured to use remote sampling (see Remote Sampling API), sampling rates can be controlled centrally. In this setup, the client SDK is served a sampling strategy configuration that describes endpoints and their sampling probabilities. jaeger can generate this configuration in two ways: periodically loaded from a file, or dynamically calculated based on traffic. The method is controlled by the environment variable SAMPLING_CONFIG_TYPE, which can be set to either file (default) or adaptive.

File-based Sampling Configuration

jaeger can be started with the --sampling.strategies-file option, which points to a file containing the sampling strategies to be served to Jaeger clients. The option's value can be either a path to a JSON file, which is automatically reloaded when its contents change, or an HTTP URL from which the file is periodically retrieved. The reload frequency is controlled by the --sampling.strategies-reload-interval option.
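The following Python sketch illustrates reload-on-change for a local file and assumes the file is polled once per reload interval. StrategiesFileReloader is a hypothetical name. Comparing raw bytes is one simple way to detect that the contents changed, and it is not necessarily how jaeger does it. If the new contents are not valid JSON, the previously loaded strategies are kept.

```python
import json
from pathlib import Path
from typing import Optional


class StrategiesFileReloader:
    """Hypothetical helper: re-reads a strategies file and reparses it only
    when its contents differ from the last successfully loaded version."""

    def __init__(self, path: str):
        self.path = Path(path)
        self._last_raw: Optional[bytes] = None
        self.strategies: Optional[dict] = None

    def poll(self) -> bool:
        """Return True if the file's contents changed and were reloaded."""
        raw = self.path.read_bytes()
        if raw == self._last_raw:
            return False
        # json.loads raises on invalid JSON before any state is updated,
        # so the previously loaded strategies remain in effect.
        self.strategies = json.loads(raw)
        self._last_raw = raw
        return True
```

A caller would invoke poll() in a loop and sleep for the configured reload interval between calls.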

If no configuration is provided, jaeger-collectors will return the default probabilistic sampling policy with probability 0.001 (0.1%) for all services.

Example strategies.json:

{
  "service_strategies": [
    {
      "service": "foo",
      "type": "probabilistic",
      "param": 0.8,
      "operation_strategies": [
        {
          "operation": "op1",
          "type": "probabilistic",
          "param": 0.2
        },
        {
          "operation": "op2",
          "type": "probabilistic",
          "param": 0.4
        }
      ]
    },
    {
      "service": "bar",
      "type": "ratelimiting",
      "param": 5
    }
  ],
  "default_strategy": {
    "type": "probabilistic",
    "param": 0.5,
    "operation_strategies": [
      {
        "operation": "/health",
        "type": "probabilistic",
        "param": 0.0
      },
      {
        "operation": "/metrics",
        "type": "probabilistic",
        "param": 0.0
      }
    ]
  }
}

The service_strategies element defines service-specific sampling strategies, and operation_strategies defines operation-specific sampling strategies. Two types of strategy are possible: probabilistic, which samples each trace with the probability given by param, and ratelimiting, which samples up to param traces per second (NOTE: ratelimiting is not supported for operation_strategies). The default_strategy defines the catch-all sampling strategy that is propagated if a service is not listed in service_strategies.
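The ratelimiting type caps how many traces are sampled per second. As an illustration only, the following Python sketch shows a credit-based (token bucket) limiter that a client could use for this type. The class name, the injectable clock, the full initial balance, and the burst size of max(rate, 1) are assumptions, not Jaeger's code.

```python
import time
from typing import Callable


class RateLimitingSampler:
    """Hypothetical sampler: keeps at most `traces_per_second` traces per
    second on average, using a credit balance that refills over time."""

    def __init__(self, traces_per_second: float,
                 clock: Callable[[], float] = time.monotonic):
        self.rate = traces_per_second
        # Allow a burst of up to one second's worth of traces (assumption),
        # but at least one credit so rates below 1/s can still sample.
        self.max_balance = max(traces_per_second, 1.0)
        self.balance = self.max_balance
        self.clock = clock
        self.last_tick = clock()

    def is_sampled(self) -> bool:
        now = self.clock()
        # Accrue credits for the time elapsed since the previous call.
        self.balance = min(self.max_balance,
                           self.balance + (now - self.last_tick) * self.rate)
        self.last_tick = now
        if self.balance >= 1.0:
            self.balance -= 1.0  # spend one credit per sampled trace
            return True
        return False
```

With traces_per_second=5, as for service bar in the example, a burst of calls is sampled five times and then refused until the balance refills.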

In the above example:

  • All operations of service foo are sampled with probability 0.8, except for operations op1 and op2, which are sampled with probabilities 0.2 and 0.4 respectively.
  • All operations for service bar are rate-limited at 5 traces per second.
  • Any other service will be sampled with probability 0.5 defined by the default_strategy.
  • The default_strategy also includes shared per-operation strategies. In this example we disable tracing on /health and /metrics endpoints for all services by using probability 0. These per-operation strategies will apply to any new service not listed in the config, as well as to the foo and bar services unless they define their own strategies for these two operations.
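The lookup rules in the bullets above can be sketched in Python as follows. This is a simplified, hypothetical resolver that works on the parsed strategies.json (for example, the result of json.load). It is not the logic Jaeger actually runs. The names resolve and NO_CONFIG_DEFAULT are illustrative. The 0.001 fallback mirrors the default described for the case where no configuration is provided.

```python
from typing import Optional, Tuple

# Fallback when no strategies file is configured (0.1%), per the text above.
NO_CONFIG_DEFAULT = {"type": "probabilistic", "param": 0.001}


def _find_op(strategy: dict, operation: str) -> Optional[dict]:
    """Return the per-operation entry for `operation`, if the strategy has one."""
    for op in strategy.get("operation_strategies", []):
        if op["operation"] == operation:
            return op
    return None


def resolve(strategies: Optional[dict], service: str,
            operation: str) -> Tuple[str, float]:
    """Return (type, param) for a service/operation pair (hypothetical helper)."""
    if strategies is None:
        return NO_CONFIG_DEFAULT["type"], NO_CONFIG_DEFAULT["param"]
    default = strategies.get("default_strategy", NO_CONFIG_DEFAULT)
    service_strategy = next(
        (s for s in strategies.get("service_strategies", [])
         if s["service"] == service),
        None,
    )
    # Lookup order: the service's own operation entry, then the shared
    # default operation entry, then the service-level or catch-all strategy.
    candidates = []
    if service_strategy is not None:
        candidates.append(_find_op(service_strategy, operation))
    candidates.append(_find_op(default, operation))
    for op in candidates:
        if op is not None:
            return op["type"], op["param"]
    chosen = service_strategy if service_strategy is not None else default
    return chosen["type"], chosen["param"]
```

With the example file loaded, resolve(cfg, "foo", "op1") gives ("probabilistic", 0.2), and an unlisted service falls through to the 0.5 default.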

Adaptive Sampling

Since Jaeger v1.27.

Adaptive sampling works in jaeger-collector by observing the spans received from services and recalculating sampling probabilities for each service/endpoint combination to ensure that the volume of collected traces matches --sampling.target-samples-per-second. When a new service or endpoint is detected, it is initially sampled with --sampling.initial-sampling-probability until enough data is collected to calculate the rate appropriate for the traffic going through the endpoint.
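The following Python sketch gives a rough illustration of the feedback loop. It rescales an endpoint's probability in proportion to how far its observed sampled rate is from the target. It is a simplified proportional controller, not Jaeger's adaptive sampling algorithm, which is more involved. AdaptiveSketch, its parameters, and the minimum probability floor are assumptions. target_samples_per_second and initial_probability stand in for the two flags above.

```python
from typing import Dict, Tuple

Endpoint = Tuple[str, str]  # (service, operation)


class AdaptiveSketch:
    """Hypothetical per-endpoint probability adjuster (proportional update)."""

    def __init__(self, target_samples_per_second: float,
                 initial_probability: float, min_probability: float = 1e-5):
        self.target = target_samples_per_second
        self.initial = initial_probability
        self.min_probability = min_probability
        self.probabilities: Dict[Endpoint, float] = {}

    def probability(self, endpoint: Endpoint) -> float:
        # New endpoints use the initial probability until data is collected.
        return self.probabilities.get(endpoint, self.initial)

    def recalculate(self, endpoint: Endpoint,
                    observed_samples_per_second: float) -> float:
        """Scale the probability so the sampled rate moves toward the target."""
        current = self.probability(endpoint)
        if observed_samples_per_second <= 0:
            new = current  # no observations, so no signal: keep as is
        else:
            new = current * self.target / observed_samples_per_second
        # Clamp to a valid probability, with a floor to avoid stalling at 0.
        new = min(1.0, max(self.min_probability, new))
        self.probabilities[endpoint] = new
        return new
```

Suppose an endpoint sampled at 0.001 yields 4 traces per second against a target of 1. This update lowers its probability to 0.00025.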

Adaptive sampling requires a storage backend to store the observed traffic data and computed probabilities. Currently, memory (for all-in-one deployments), cassandra, badger, elasticsearch and opensearch are supported as sampling storage backends.

By default, adaptive sampling attempts to use the backend specified by SPAN_STORAGE_TYPE to store its data. However, a different backend can be specified with SAMPLING_STORAGE_TYPE. For instance, SPAN_STORAGE_TYPE=elasticsearch SAMPLING_STORAGE_TYPE=cassandra ./jaeger-collector runs jaeger-collector in a mode where it attempts to store its span data in the configured elasticsearch cluster and its adaptive sampling data in the configured cassandra cluster. Note that this feature cannot be used to store span and adaptive sampling data in two different backends of the same type.
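The backend-selection rule above can be expressed as a small Python function. It is a hypothetical helper, not part of jaeger. It reads the two environment variables from a mapping such as os.environ and checks the result against the sampling storage backends listed above. Raising an error when neither variable is set is a simplification, since the real binaries may apply their own defaults.

```python
from typing import Mapping

# Sampling storage backends named in the text above.
SUPPORTED_SAMPLING_BACKENDS = {
    "memory", "cassandra", "badger", "elasticsearch", "opensearch",
}


def sampling_storage_type(env: Mapping[str, str]) -> str:
    """Pick the adaptive sampling storage backend (hypothetical helper).

    SAMPLING_STORAGE_TYPE wins when set; otherwise fall back to SPAN_STORAGE_TYPE.
    """
    backend = env.get("SAMPLING_STORAGE_TYPE") or env.get("SPAN_STORAGE_TYPE")
    if not backend:
        raise ValueError("neither SAMPLING_STORAGE_TYPE nor SPAN_STORAGE_TYPE is set")
    if backend not in SUPPORTED_SAMPLING_BACKENDS:
        raise ValueError(f"{backend!r} is not supported as a sampling storage backend")
    return backend
```

For the example command above, the function returns "cassandra" for sampling data, while span data stays in elasticsearch.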

Read this blog post for more details on the adaptive sampling engine.