OpenTelemetry (OTel) is the closest thing our industry has to a shared language for observability: traces, metrics, and logs, generated in your app, exported in a standard way (OTLP), and correlated so you can answer the only question production ever asks:
"What's happening... and why... right now?"
OTel isn't "more logging." It's a system for turning running software into evidence.
I'm writing this because I'm currently setting up observability for a new project, and it reminded me of a past project where we had... let's call it "minimal telemetry." I'm still not a master of OTel, but I'm working on it. I've spent way too many hours grepping through log files trying to reconstruct what happened during an incident. OTel changes that.
Why OpenTelemetry
Debugging distributed systems without telemetry is archaeology. You're sifting through fragments, hoping to find something that makes sense.
One user action can hop through APIs, queues, databases, and third parties. Traces give you the story; metrics show system health; logs carry the details. .NET actually aligns pretty well with OTel because the tracing/metrics/logging primitives (ActivitySource, Meter, ILogger) are already part of the platform; OTel just wires them into a consistent pipeline.
Vendor-neutral instrumentation buys you options
You instrument once, export anywhere. Less lock-in, fewer rewrites when your observability backend changes. (And it will change. I've been through three different backends in the last few years alone.)
OTel's OTLP exporter is designed exactly for this portability.
Correlation is the multiplier
Here's the thing that actually sold me on OTel: when logs include TraceId/SpanId, "one error log line" becomes a jump link to the exact trace and dependency timings. That's the difference between guessing and knowing. Between "I think the database was slow" and "the database call took 4.2 seconds, here's the query."
Minimal implementation (ASP.NET Core)
This is the "basic but production-shaped" setup: traces + metrics + logs exported via OTLP. Nothing fancy, but it'll get you started.
Program.cs
using OpenTelemetry.Logs;
using OpenTelemetry.Metrics;
using OpenTelemetry.Resources;
using OpenTelemetry.Trace;

var builder = WebApplication.CreateBuilder(args);

var serviceName = builder.Environment.ApplicationName;
var serviceVersion = typeof(Program).Assembly.GetName().Version?.ToString() ?? "unknown";

var resourceBuilder = ResourceBuilder.CreateDefault()
    .AddService(serviceName: serviceName, serviceVersion: serviceVersion)
    .AddAttributes(new[]
    {
        new KeyValuePair<string, object>("deployment.environment.name", builder.Environment.EnvironmentName),
    });

builder.Services.AddOpenTelemetry()
    .ConfigureResource(r => r.AddService(serviceName: serviceName, serviceVersion: serviceVersion))
    .WithTracing(tracing => tracing
        // For custom spans later (ActivitySource):
        .AddSource(serviceName)
        .SetResourceBuilder(resourceBuilder)
        .AddAspNetCoreInstrumentation()
        .AddHttpClientInstrumentation()
        // Add DB instrumentation depending on your stack (SqlClient, EF Core, etc.)
        .AddOtlpExporter()
    )
    .WithMetrics(metrics => metrics
        // For custom metrics later (Meter):
        .AddMeter(serviceName)
        .SetResourceBuilder(resourceBuilder)
        .AddAspNetCoreInstrumentation()
        .AddHttpClientInstrumentation()
        .AddRuntimeInstrumentation()
        .AddOtlpExporter()
    );

// Logs via ILogger, exported via OTLP (and correlatable with traces)
builder.Logging.AddOpenTelemetry(logging =>
{
    logging.SetResourceBuilder(resourceBuilder);
    logging.IncludeFormattedMessage = true;
    logging.IncludeScopes = true;
    logging.ParseStateValues = true;
    logging.AddOtlpExporter();
});

var app = builder.Build();

app.MapGet("/ping", () => "pong");

app.Run();
This matches the official "initialize the SDK early" guidance. The key calls are AddService (names the service in the resource), AddSource (registers your ActivitySource for custom spans), and AddMeter (registers your Meter for custom metrics).
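To make the AddSource hook concrete, here's a minimal sketch of a custom span. The "MyApp" source name and the /orders route are placeholders; the source name just has to match whatever you pass to AddSource (builder.Environment.ApplicationName in the setup above).

using System.Diagnostics;

// The source name must match the one registered with AddSource above.
internal static class Telemetry
{
    public static readonly ActivitySource Source = new("MyApp");
}

// In an endpoint or service, start a child span under the incoming request span:
app.MapGet("/orders/{id}", (string id) =>
{
    // StartActivity returns null when nothing is listening, hence the null-conditionals.
    using var activity = Telemetry.Source.StartActivity("LookupOrder");
    activity?.SetTag("order.id", id);

    // ... do the actual work; the span ends when 'activity' is disposed
    return Results.Ok(new { id });
});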
Production-friendly config: use environment variables
Don't hardcode endpoints and keys into your code. Seriously. Let ops/environment decide. Your future self will thank you when you need to point at a different collector in staging vs production.
The key env vars
- OTEL_EXPORTER_OTLP_ENDPOINT - base endpoint; defaults to http://localhost:4317 for gRPC, http://localhost:4318 for HTTP
- OTEL_EXPORTER_OTLP_HEADERS - API keys and such
- OTEL_EXPORTER_OTLP_PROTOCOL - grpc, http/protobuf, etc.
- OTEL_RESOURCE_ATTRIBUTES - extra labels like region, cluster, namespace
- OTEL_SERVICE_NAME - if you want to override your app name
These are all documented in the OTel OTLP exporter spec and the SDK environment variable spec.
Example:
export OTEL_EXPORTER_OTLP_ENDPOINT="http://otel-collector:4317"
export OTEL_EXPORTER_OTLP_PROTOCOL="grpc"
export OTEL_EXPORTER_OTLP_HEADERS="api-key=REDACTED"
export OTEL_RESOURCE_ATTRIBUTES="service.namespace=mycompany,deployment.environment.name=prod"
First steps to production (don't hurt yourself)
1) Put a Collector in front of your backend
In production, don't send telemetry directly to your observability backend. Use an OpenTelemetry Collector. It gives you batching, retries, routing, filtering, and a clean separation between your app and wherever the data ends up.
The Collector exposes OTLP receivers on:
- 4317 (gRPC)
- 4318 (HTTP)
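For orientation, here's a minimal Collector config sketch: OTLP in, batching, one exporter out. The receiver, processor, and exporter names are real Collector components; the backend endpoint is a placeholder you'd swap for your own.

receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
      http:
        endpoint: 0.0.0.0:4318

processors:
  batch:

exporters:
  otlphttp:
    endpoint: https://your-backend.example.com   # placeholder

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [batch]
      exporters: [otlphttp]
    metrics:
      receivers: [otlp]
      processors: [batch]
      exporters: [otlphttp]
    logs:
      receivers: [otlp]
      processors: [batch]
      exporters: [otlphttp]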
2) Decide your "telemetry contract" early
This one bit me. Before you scale across services, align on naming:
- service.name (and maybe service.namespace)
- deployment.environment.name
- attribute rules: what's allowed, what's PII, what's going to blow up your cardinality budget
Semantic conventions exist specifically to prevent the "everyone invents their own naming" problem. Use them.
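One lightweight way to enforce the contract is a small shared constants class. A sketch (the domain attribute names below are illustrative; prefer official semantic conventions wherever one exists):

// Shared across services so everyone spells attributes the same way.
internal static class TelemetryConventions
{
    // From the OTel semantic conventions:
    public const string DeploymentEnvironment = "deployment.environment.name";

    // Your own domain attributes (illustrative names):
    public const string TenantId = "app.tenant.id";
    public const string OrderId = "app.order.id";
}

// Usage: activity?.SetTag(TelemetryConventions.TenantId, tenantId);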
3) Correlate logs and traces on day 1
Don't wait on this. It turns debugging from "search and pray" into "click and see." The setup cost is minimal compared to the time you'll save during your first real incident.
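With the logging setup from Program.cs above, any ILogger call made during a request is automatically stamped with the current TraceId/SpanId. A quick way to see it (the /work route is just an example):

using System.Diagnostics;

app.MapGet("/work", (ILogger<Program> logger) =>
{
    // This log record carries the TraceId/SpanId of the current request span,
    // so the backend can link it straight to the trace.
    logger.LogInformation("Doing some work");

    // Handy for manual verification: return the trace id to the caller.
    return Results.Ok(new { traceId = Activity.Current?.TraceId.ToString() });
});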
4) Start small: don't instrument the universe
I've seen teams go overboard. Start with inbound HTTP, outbound HTTP, runtime metrics, and maybe a few business counters. Grow based on what actually helps during on-call. If you're not using it, you're just paying for storage.
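A "few business counters" can be as small as this sketch (the meter name must match AddMeter from the setup above; the counter name is an example):

using System.Diagnostics.Metrics;

// Meter name must match the one registered with AddMeter above.
internal static class AppMetrics
{
    private static readonly Meter Meter = new("MyApp");

    public static readonly Counter<long> OrdersPlaced =
        Meter.CreateCounter<long>("orders.placed", unit: "{order}",
            description: "Orders successfully placed");
}

// Call it where the business event actually happens:
// AppMetrics.OrdersPlaced.Add(1);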
5) Plan sampling early (so you can afford production)
A sane evolution:
- Dev: AlwaysOn. Learn, iterate, see everything.
- Prod: head sampling for baseline volume
- Later: tail sampling policies like "keep errors + slow traces" (usually via Collector)
Sampling feels scary at first, but 100% of traces at scale is... expensive. And honestly, you don't need every single successful health check.
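Head sampling is one line in the WithTracing block from the setup above. A sketch keeping roughly 10% of new traces:

.WithTracing(tracing => tracing
    .AddSource(serviceName)
    .AddAspNetCoreInstrumentation()
    .AddHttpClientInstrumentation()
    // Sample ~10% of new root traces; ParentBased makes child spans follow
    // the parent's decision so a distributed trace isn't half-sampled.
    .SetSampler(new ParentBasedSampler(new TraceIdRatioBasedSampler(0.10)))
    .AddOtlpExporter()
)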
6) Secure the telemetry pipeline
Treat telemetry as sensitive. It often contains user IDs, request paths, maybe even query parameters you forgot to scrub. Secure your Collector configs, don't expose receivers publicly, and store config safely.
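One option for scrubbing at the source is a span processor that drops attributes before they're exported. A sketch (the attribute names are examples, not an exhaustive list):

using System.Diagnostics;
using OpenTelemetry;

internal sealed class RedactingProcessor : BaseProcessor<Activity>
{
    private static readonly string[] SensitiveKeys = { "url.query", "user.email" };

    public override void OnEnd(Activity activity)
    {
        foreach (var key in SensitiveKeys)
        {
            // Setting a tag to null removes it from the span.
            activity.SetTag(key, null);
        }
    }
}

// Register it before the exporter, inside WithTracing:
//   .AddProcessor(new RedactingProcessor())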
Resources
Official OpenTelemetry
- Getting Started (.NET)
- Instrumentation
- Exporters
- Collector Quick Start
- Collector Configuration
- OTLP Exporter SDK Config
- SDK Environment Variables
- Semantic Conventions
- Resource Conventions
- Deployment Attributes
- Sampling
- Security Best Practices
Microsoft (.NET)
