OpenTelemetry with Jaeger in Kubernetes
This guide explains how to configure OpenTelemetry tracing with Jaeger for a Spacelift installation running in Kubernetes.
OpenTelemetry is a set of open standards and tools for collecting, processing, and exporting telemetry data such as traces and metrics. Jaeger is an open-source distributed tracing platform that serves as a backend for receiving, storing, and visualizing trace data. Unlike some tracing solutions that require separate components for storage and visualization, Jaeger combines both capabilities in a single platform with a built-in UI.
Architecture overview
The telemetry pipeline consists of three main components:
- Spacelift applications (server, drain, scheduler) - Emit trace data
- OpenTelemetry Collector - Receives, processes, and forwards telemetry data
- Jaeger - Stores traces and provides a web UI for visualization
The OpenTelemetry Collector acts as a relay that receives traces from Spacelift applications and forwards them to Jaeger. It doesn't store any data itself - it's purely a middleware component that allows you to decouple your applications from specific backend implementations.
flowchart LR
A[Spacelift Server] --> D[OpenTelemetry Collector]
B[Spacelift Drain] --> D
C[Spacelift Scheduler] --> D
D --> E[Jaeger<br/>Storage + UI]
Info
For more information about telemetry configuration options and supported backends, see the telemetry reference documentation.
Prerequisites
Before proceeding with the installation, ensure you have:
- A running Spacelift installation in Kubernetes.
- kubectl configured to access your cluster.
- Helm 3 or later installed.
Install Jaeger
Jaeger provides an "all-in-one" deployment mode that packages all components (collector, storage, query service, and UI) into a single binary. This guide covers both in-memory storage (suitable for development, testing, and demo environments) and optional Badger persistent storage for environments requiring trace retention.
Add Helm repository
First, add the Jaeger Helm repository and update it:
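A minimal sketch of this step, assuming the chart is consumed from the public jaegertracing Helm repository; the alias `jaegertracing` is only a local name and can be anything.

```shell
# Register the Jaeger chart repository under a local alias (the alias is an assumption).
helm repo add jaegertracing https://jaegertracing.github.io/helm-charts
# Refresh the local chart index so the latest chart versions are visible.
helm repo update
```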
Create monitoring namespace
Create a dedicated namespace to hold all monitoring-related resources:
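As a sketch, assuming the namespace is called `monitoring` (the name used in the rest of this guide):

```shell
# Create the namespace that will hold Jaeger and the OpenTelemetry Collector.
kubectl create namespace monitoring
# Confirm that it exists.
kubectl get namespace monitoring
```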
Tip
Using a dedicated namespace makes it easy to manage monitoring components. You can remove everything by simply deleting the namespace, which automatically cleans up all resources inside it.
Install Jaeger
Install Jaeger with in-memory storage, using a values file named jaeger-values.yaml:
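The values file could look roughly like the sketch below. The key names are assumptions taken from the Jaeger v1-era releases of the jaegertracing/jaeger chart, not from this guide, so check them against the values.yaml of the chart version you install.

```shell
# Write a values file for the all-in-one deployment with in-memory storage.
# All keys below are assumptions about the chart's schema; verify before use.
cat > jaeger-values.yaml <<'EOF'
provisionDataStore:
  cassandra: false   # do not provision a Cassandra cluster
allInOne:
  enabled: true      # one pod with collector, query service, and UI
storage:
  type: memory       # ephemeral in-memory trace storage
agent:
  enabled: false     # separate components are not needed in all-in-one mode
collector:
  enabled: false
query:
  enabled: false
EOF
```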
Install the Jaeger chart:
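One possible invocation, assuming the release is named `jaeger` (which matches the `jaeger-collector` service name used later in this guide) and the repository alias is `jaegertracing`:

```shell
# Install the chart into the monitoring namespace using the values file.
# Release name and repository alias are assumptions.
helm install jaeger jaegertracing/jaeger \
  --namespace monitoring \
  --values jaeger-values.yaml
```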
Info
This guide uses in-memory storage, which is ephemeral - all traces are lost when the Jaeger pod restarts. This is suitable for development, testing, and demo purposes. For production environments requiring persistent trace storage, Jaeger supports backends such as Elasticsearch, OpenSearch, and Cassandra, and can use Kafka as an intermediate buffer in front of storage. See the official Jaeger storage documentation for more information.
Optional: Enable Badger for persistent storage
If you need persistent trace storage without the complexity of Elasticsearch or Cassandra, you can use Badger, an embedded database that stores traces locally on disk. Badger is suitable for single-node deployments and provides persistence with configurable retention policies.
Create a PersistentVolumeClaim
First, create a PersistentVolumeClaim for Badger data storage. Save the following as jaeger-pvc.yaml:
- Adjust the storage size based on your trace retention needs and expected trace volume.
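A minimal sketch of such a claim follows. The claim name `jaeger-badger-data` and the 10Gi size are hypothetical placeholders, and the cluster's default storage class is assumed.

```shell
# Write a PersistentVolumeClaim manifest for Badger data.
# The name and size are placeholders; the default storage class is assumed.
cat > jaeger-pvc.yaml <<'EOF'
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: jaeger-badger-data   # hypothetical claim name
  namespace: monitoring
spec:
  accessModes:
    - ReadWriteOnce          # Badger runs in a single pod
  resources:
    requests:
      storage: 10Gi          # adjust to retention needs and trace volume
EOF
```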
Apply the PVC:
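For example, assuming the manifest was saved as jaeger-pvc.yaml in the current directory:

```shell
# Create the claim in the cluster.
kubectl apply -f jaeger-pvc.yaml
# Check its status; depending on the storage class it may stay Pending until a pod mounts it.
kubectl get pvc --namespace monitoring
```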
Configure Jaeger to use Badger
Create a values file for Jaeger with Badger storage. Save the following as jaeger-values.yaml:
- Set ephemeral to false to enable persistent storage on disk.
- Set useExistingPvcName to the name of the PersistentVolumeClaim created in the previous step.
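A sketch of a Badger-backed values file is shown below. The parameter names `ephemeral` and `useExistingPvcName` come from this guide; their nesting under `storage.badger`, the other keys, and the claim name `jaeger-badger-data` are assumptions to verify against the chart's values.yaml.

```shell
# Write a values file that switches storage to Badger on an existing PVC.
# Key nesting is an assumption about the chart's schema; verify before use.
cat > jaeger-values.yaml <<'EOF'
provisionDataStore:
  cassandra: false
allInOne:
  enabled: true
storage:
  type: badger
  badger:
    ephemeral: false                          # persist traces on disk
    persistence:
      useExistingPvcName: jaeger-badger-data  # hypothetical claim name
agent:
  enabled: false
collector:
  enabled: false
query:
  enabled: false
EOF
```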
Install or upgrade Jaeger with Badger storage:
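One way to cover both the fresh-install and upgrade cases in a single command, assuming the release name `jaeger` and the repository alias `jaegertracing`:

```shell
# Install the release if it does not exist yet, otherwise upgrade it in place.
helm upgrade --install jaeger jaegertracing/jaeger \
  --namespace monitoring \
  --values jaeger-values.yaml \
  --wait
```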
Note
Badger stores data on the PersistentVolume, ensuring traces persist across pod restarts and rescheduling. The useExistingPvcName parameter tells Jaeger to mount the PVC you created for storing Badger data.
By default, Badger retains traces for 72 hours (3 days). Older traces are automatically deleted. This retention period helps manage disk space usage while keeping recent traces available for debugging.
Verify Jaeger installation
Check that the Jaeger pod is running:
Check the logs for any errors:
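The two checks in this section might look like the following. The label selector assumes the chart applies the conventional `app.kubernetes.io/instance` label with the release name `jaeger`; if it does not match, list all pods in the namespace instead.

```shell
# List the Jaeger pods; the STATUS column should read Running.
kubectl get pods --namespace monitoring \
  --selector app.kubernetes.io/instance=jaeger
# Show the most recent log lines from those pods.
kubectl logs --namespace monitoring \
  --selector app.kubernetes.io/instance=jaeger --tail=100
```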
Maintenance and upgrades
To upgrade Jaeger to a newer version:
Then upgrade to the desired version:
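A sketch of an upgrade flow under the same assumptions as above (release `jaeger`, alias `jaegertracing`). `CHART_VERSION` is a placeholder to fill in from the search output, not a recommended version.

```shell
# Refresh the chart index and list the available chart versions.
helm repo update
helm search repo jaegertracing/jaeger --versions | head -n 10

# Placeholder: set this to a version taken from the list above.
CHART_VERSION="<chart-version>"

# Upgrade the existing release to that chart version, reusing the same values file.
helm upgrade jaeger jaegertracing/jaeger \
  --namespace monitoring \
  --values jaeger-values.yaml \
  --version "$CHART_VERSION"
```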
Install OpenTelemetry Collector
The OpenTelemetry Collector functions as a relay that receives telemetry data from your applications and forwards it to backend systems. It acts as middleware between your Spacelift services and Jaeger, handling tasks like batching, retries, and data transformation. The Collector doesn't store any data itself - it simply processes and routes telemetry to the appropriate backends.
Add Helm repository
If you haven't already, add the OpenTelemetry Helm repository:
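As a sketch, assuming the charts are consumed from the public OpenTelemetry Helm repository under the local alias `open-telemetry`:

```shell
# Register the OpenTelemetry chart repository (the alias is an assumption).
helm repo add open-telemetry https://open-telemetry.github.io/opentelemetry-helm-charts
# Refresh the local chart index.
helm repo update
```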
Configure the collector
Create a values file for the OpenTelemetry Collector. Save the following as otel-collector-values.yaml:
This configuration:
- Disables all receivers except OTLP (which Spacelift uses via gRPC on port 4317)
- Configures the OTLP exporter to forward traces to Jaeger
- Disables unnecessary listener ports for protocols we don't use
Traces and Metrics
OpenTelemetry supports multiple types of telemetry data: traces (request flows across services), metrics (numerical measurements like queue size, visible messages in the queue, request counts, error rates, latency percentiles), and logs (event records). Jaeger is a tracing backend and ingests only trace data over the OTLP protocol, so this configuration forwards traces to Jaeger. Collecting metrics requires a separate metrics backend.
The exporters.otlp.endpoint value points to the Jaeger collector service. Since both the OTEL Collector and Jaeger are in the same namespace (monitoring), we can use the short service name jaeger-collector with port 4317. If they were in different namespaces, you would need the fully qualified DNS name: {service-name}.{namespace}.svc.cluster.local:{port}.
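The points above can be sketched as a values file like the one below. It is not the original file from this guide: the key names are assumptions about the open-telemetry/opentelemetry-collector chart's schema, the contrib image is an assumed choice, and the sketch defines a traces-only pipeline.

```shell
# Write a collector values file: OTLP in, OTLP out to Jaeger, traces only.
# All keys are assumptions about the chart's schema; verify before use.
cat > otel-collector-values.yaml <<'EOF'
mode: deployment
image:
  repository: otel/opentelemetry-collector-contrib  # assumed image choice
config:
  receivers:
    jaeger: null            # drop receivers that Spacelift does not use
    prometheus: null
    zipkin: null
    otlp:
      protocols:
        grpc:
          endpoint: 0.0.0.0:4317
  exporters:
    otlp:
      endpoint: jaeger-collector:4317  # same-namespace service name
      tls:
        insecure: true                 # plaintext gRPC inside the cluster
  service:
    pipelines:
      traces:
        receivers: [otlp]
        exporters: [otlp]
      metrics: null         # no metrics pipeline in this sketch
      logs: null
ports:
  jaeger-compact:
    enabled: false          # close listener ports for unused protocols
  jaeger-thrift:
    enabled: false
  jaeger-grpc:
    enabled: false
  zipkin:
    enabled: false
EOF
```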
Install the collector
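A possible install command, assuming the release is named `otel-collector` and the repository alias is `open-telemetry`:

```shell
# Install the collector chart into the monitoring namespace.
# Release name and repository alias are assumptions.
helm install otel-collector open-telemetry/opentelemetry-collector \
  --namespace monitoring \
  --values otel-collector-values.yaml
```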
Get the collector endpoint
To configure Spacelift to send traces to the collector, you need the collector's endpoint. First, verify the service name:
Since Spacelift applications run in a different namespace (spacelift), you need to use the fully qualified DNS name. The endpoint will be something like:
Note
Even though the collector uses gRPC protocol, the URL scheme must still be http://.
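To make the naming pattern concrete, the sketch below assembles the endpoint from its parts. The service name is an assumption: it is what Helm's usual release-plus-chart naming would produce for a release called `otel-collector`, so replace it with the name your cluster actually reports.

```shell
# Parts of the collector endpoint; the service name is an assumption.
SERVICE="otel-collector-opentelemetry-collector"
NAMESPACE="monitoring"
PORT="4317"   # OTLP over gRPC

# Fully qualified in-cluster DNS name, with the http:// scheme required above.
ENDPOINT="http://${SERVICE}.${NAMESPACE}.svc.cluster.local:${PORT}"
echo "$ENDPOINT"
```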
Maintenance and upgrades
Similar to Jaeger, the Helm chart version corresponds to the OTEL Collector version. To upgrade:
Then upgrade to the desired version:
Warning
Spacelift applications maintain long-lived gRPC connections to the OTEL Collector. During a collector upgrade or configuration change, these connections may be lost. After upgrading the collector, restart the Spacelift deployments to re-establish connections:
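A restart could be sketched as follows. The deployment names `spacelift-server`, `spacelift-drain`, and `spacelift-scheduler` are hypothetical; list the deployments in the `spacelift` namespace to find the real ones.

```shell
# Find the actual deployment names first.
kubectl get deployments --namespace spacelift

# Restart each Spacelift deployment (names are hypothetical).
kubectl rollout restart deployment spacelift-server --namespace spacelift
kubectl rollout restart deployment spacelift-drain --namespace spacelift
kubectl rollout restart deployment spacelift-scheduler --namespace spacelift
```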
Configure Spacelift
Now that the tracing infrastructure is ready, configure Spacelift to send traces to the OpenTelemetry Collector.
Update the Spacelift environment variables
The Spacelift services need two environment variables:
- OBSERVABILITY_VENDOR - Set to OpenTelemetry to enable the OpenTelemetry backend.
- OTEL_EXPORTER_OTLP_ENDPOINT - The fully qualified domain name of the OTEL Collector service.
Update the spacelift-shared secret with these variables:
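One way to set both variables is a merge patch using `stringData`, which lets Kubernetes handle the base64 encoding. This is a sketch: the endpoint value assumes the service name and namespace used earlier, and the secret is assumed to live in the `spacelift` namespace.

```shell
# Collector endpoint; the service name is an assumption.
ENDPOINT="http://otel-collector-opentelemetry-collector.monitoring.svc.cluster.local:4317"

# Merge the two variables into the existing secret.
kubectl patch secret spacelift-shared --namespace spacelift --type merge \
  --patch "{\"stringData\": {\"OBSERVABILITY_VENDOR\": \"OpenTelemetry\", \"OTEL_EXPORTER_OTLP_ENDPOINT\": \"${ENDPOINT}\"}}"
```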
Note
Make sure to adjust the endpoint URL if you used different service names or namespaces.
Restart the Spacelift deployments to apply the new configuration:
Verify the setup
Check the Spacelift logs to ensure there are no errors:
If configured correctly, traces should start flowing to the OTEL Collector, which relays them to Jaeger.
Info
Even if there are errors related to the OTEL Collector, Spacelift will continue to function normally - it just won't emit traces.
Access Jaeger UI
Jaeger comes with a built-in web UI for visualizing and analyzing traces.
Port forwarding (testing)
For initial testing, use port-forwarding to access the Jaeger UI:
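For example, assuming the chart exposes the UI through a service named `jaeger-query` (an assumption based on the release name `jaeger`):

```shell
# Forward local port 16686 to the Jaeger UI; runs until interrupted with Ctrl+C.
kubectl port-forward --namespace monitoring svc/jaeger-query 16686:16686
```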
Open your browser and navigate to http://localhost:16686. You should see the Jaeger UI with traces from your Spacelift services.
Production Access
Port-forwarding is suitable for testing and development only. For production environments, expose the Jaeger UI using a Kubernetes Ingress, LoadBalancer service, or reverse proxy (such as nginx or Caddy) with proper TLS termination and authentication.
Viewing traces
In the Jaeger UI:
- Select a Service from the dropdown (e.g., server, drain, or scheduler)
- Click Find Traces to see recent traces
- Click on any trace to see the detailed span timeline and associated metadata
The UI provides powerful filtering options to search traces by operation, tags, duration, and more.
Troubleshooting
If traces aren't appearing in Jaeger, here are the most common causes:
- Traces aren't reaching Jaeger
- The Jaeger pod restarted and lost traces (when using in-memory storage)
- Traces were deleted by Jaeger's retention policy
Follow the steps below to diagnose where the issue is occurring:
Check Spacelift logs
Spacelift applications log errors if they can't connect to the OTEL Collector. Check the logs:
If both environment variables are set correctly and the logs show no tracing-related errors, Spacelift is most likely sending traces to the OTEL Collector successfully.
Note
Spacelift continues to function normally even if telemetry collection fails - it just won't emit traces.
Check OTEL Collector logs
If Spacelift isn't reporting errors, check the OTEL Collector logs:
Look for errors related to exporting traces to Jaeger.
Check Jaeger logs
Finally, check the Jaeger logs to see if it's receiving and storing traces:
Verify Jaeger health
You can also check Jaeger's health endpoint:
A successful response indicates that Jaeger is running properly.
Enable debug logging
If you need more detailed information, you can enable debug logging in the OTEL Collector by adding the following to your otel-collector-values.yaml:
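A sketch of the debug setting follows, written to a separate overlay file rather than edited into the main values file. The file name otel-collector-debug-values.yaml is hypothetical, and the `config` wrapper assumes the chart passes that section through to the collector configuration.

```shell
# Write a small overlay that raises the collector's own log level to debug.
# File name is hypothetical; the config wrapper is an assumption about the chart.
cat > otel-collector-debug-values.yaml <<'EOF'
config:
  service:
    telemetry:
      logs:
        level: debug   # collector's internal log verbosity
EOF
```

Helm merges multiple values files in order, so the overlay can be passed as a second `--values` argument after otel-collector-values.yaml and dropped again once troubleshooting is done.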
Then upgrade the OTEL Collector:
Warning
Debug logging generates significant log volume and should only be enabled temporarily for troubleshooting purposes.
Cleanup and uninstallation
If you need to remove the telemetry stack from your cluster, follow these steps to completely clean up all resources.
First, verify which releases are installed:
Based on the output, uninstall the relevant releases:
Finally, delete the monitoring namespace to remove all remaining resources:
Note
If you're using persistent storage for Jaeger with Badger, deleting the namespace removes the PersistentVolumeClaim, but the underlying PersistentVolume may need to be deleted separately, depending on your storage class's reclaim policy.
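The cleanup steps above might look like this, assuming the release names `jaeger` and `otel-collector` in the `monitoring` namespace; adjust them to whatever `helm list` reports.

```shell
# See which releases exist in the namespace.
helm list --namespace monitoring

# Uninstall the releases (names are assumptions).
helm uninstall otel-collector --namespace monitoring
helm uninstall jaeger --namespace monitoring

# Remove the namespace and everything left inside it.
kubectl delete namespace monitoring

# Check for leftover PersistentVolumes in case the reclaim policy retained them.
kubectl get pv
```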
Disable Spacelift telemetry configuration
If you have already configured telemetry in Spacelift, the applications will now log errors because the OTEL Collector is no longer available. To fully disable telemetry in Spacelift, patch the spacelift-shared secret to clear the environment variables:
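One way to clear the variables is a JSON patch that removes both keys, followed by a restart. This is a sketch: the patch fails if either key is already absent, and the deployment names are hypothetical.

```shell
# Remove both telemetry variables from the secret.
kubectl patch secret spacelift-shared --namespace spacelift --type json \
  --patch '[{"op": "remove", "path": "/data/OBSERVABILITY_VENDOR"}, {"op": "remove", "path": "/data/OTEL_EXPORTER_OTLP_ENDPOINT"}]'

# Restart the Spacelift deployments so they pick up the change (names are hypothetical).
kubectl rollout restart deployment spacelift-server --namespace spacelift
kubectl rollout restart deployment spacelift-drain --namespace spacelift
kubectl rollout restart deployment spacelift-scheduler --namespace spacelift
```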