Modern observability tooling makes big promises. Deploy your .NET services into Azure App Services, tick the right boxes, and the platform conveniently hands you metrics, logs, and traces. Azure Monitor, Application Insights, Log Analytics – they all integrate neatly as long as you never leave the hyperscaler's garden.
Move outside that comfort zone and things become less magical. Some organizations run on-prem hardware. Others prioritize cost efficiency. Some want vendor neutrality. And some simply don't want a telemetry bill that spikes every time their application gets busy (or misbehaves).
The good news: the open-source ecosystem has matured to the point where you can build a full, cloud-ready observability pipeline without surrendering half your operating margin. This post walks through the architecture I'm currently running for production-grade monitoring of .NET/Go & other services using Alloy, Loki, Tempo, Mimir & Grafana, with Caddy in front as the secure ingestion layer and Faro providing full browser-side observability.
Only open-source components. Fully interoperable with OpenTelemetry. Low operational overhead. No vendor lock-in.
Architecture Overview
The design goal is straightforward: a single ingestion gateway for all telemetry—backend, frontend, containers, infrastructure—flowing into specialized downstream systems optimized for their respective data types.
Signal flow:
Traces:
Application → OTLP → Alloy → Tempo → Grafana
Metrics:
Application → OTLP → Alloy → Prometheus remote_write → Mimir → Grafana
Docker metrics → cAdvisor → Alloy → Prometheus → Mimir
Logs:
Application → OTLP → Alloy → Loki
Docker logs → Alloy → Loki
Faro Web SDK → Alloy → Loki → Grafana
All ingestion from public networks is terminated by Caddy, which exposes HTTPS endpoints and authentication where necessary. Alloy stays internal, which keeps the attack surface tight.
The result is full-stack observability: backend traces + metrics + logs, frontend vitals + performance events, container-level signals, and infrastructure metrics—everything correlated into one Grafana interface.
Instrumenting a .NET Application with OpenTelemetry
The .NET side is intentionally simple. OpenTelemetry SDKs do the heavy lifting; your app just needs to define a resource identity and export via OTLP/HTTP to Alloy.
I've enabled it with two extension methods:
builder.Host.UseSerilogObservability();
builder.Services.AddObservability("my-app-name", builder.Configuration);
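Both extension methods read the OTLP gateway address from configuration and fall back to a default when the key is missing. A minimal appsettings.json entry for the OpenTelemetry:Endpoint key used in the implementation below (the hostname is whatever you expose through Caddy later on):
{
  "OpenTelemetry": {
    "Endpoint": "https://otel.example.com"
  }
}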
Under the hood, this configures:
- Trace instrumentation (ASP.NET Core, HTTP client, SQL client, EF Core)
- Metrics instrumentation (runtime, ASP.NET Core, HTTP client, custom meters)
- Serilog → OTLP log forwarding
- Resource attributes (service.name, version, instance)
- OTLP HTTP exporters for traces, metrics, logs
Here's the full implementation:
using System.Reflection;
using OpenTelemetry.Exporter;
using OpenTelemetry.Metrics;
using OpenTelemetry.Resources;
using OpenTelemetry.Trace;
using OracleReports.Services.Workers;
using Serilog;
using Serilog.Events;
using Serilog.Sinks.OpenTelemetry;
namespace OracleReports.Services;
public static class ServiceCollectionExtensions
{
public static IServiceCollection AddObservability(
this IServiceCollection services,
string serviceName,
IConfiguration config)
{
var otlpEndpoint = config["OpenTelemetry:Endpoint"] ?? "https://otel.example.com";
services.AddOpenTelemetry()
.ConfigureResource(r =>
{
r.AddService(serviceName,
serviceVersion: Assembly.GetExecutingAssembly()
.GetName().Version?.ToString(),
serviceInstanceId: Environment.MachineName);
})
.WithMetrics(metrics =>
{
metrics
.AddAspNetCoreInstrumentation()
.AddRuntimeInstrumentation()
.AddHttpClientInstrumentation()
.AddMeter(nameof(DataUpdater))
.AddPrometheusExporter()
.AddOtlpExporter(o =>
{
o.Endpoint = new Uri($"{otlpEndpoint}/v1/metrics");
o.Protocol = OtlpExportProtocol.HttpProtobuf;
});
})
.WithTracing(tracing =>
{
tracing
.AddAspNetCoreInstrumentation()
.AddHttpClientInstrumentation()
.AddSqlClientInstrumentation(opt =>
opt.RecordException = true)
.AddEntityFrameworkCoreInstrumentation(opt =>
{
opt.EnrichWithIDbCommand = (activity, cmd) =>
{
activity.SetTag("db.command.timeout",
cmd.CommandTimeout);
};
})
.AddSource(serviceName)
.AddOtlpExporter(o =>
{
o.Endpoint = new Uri($"{otlpEndpoint}/v1/traces");
o.Protocol = OtlpExportProtocol.HttpProtobuf;
});
});
return services;
}
public static IHostBuilder UseSerilogObservability(
this IHostBuilder hostBuilder)
{
hostBuilder.UseSerilog((context, services, logger) =>
{
var otlpEndpoint = context.Configuration["OpenTelemetry:Endpoint"]
?? "https://otel.example.com";
logger.MinimumLevel.Information()
.MinimumLevel.Override(
"Microsoft.EntityFrameworkCore.Database.Command",
LogEventLevel.Warning)
.MinimumLevel.Override(
"Microsoft.Hosting.Lifetime",
LogEventLevel.Warning)
.MinimumLevel.Override(
"Microsoft.AspNetCore.Mvc",
LogEventLevel.Warning)
.MinimumLevel.Override(
"Microsoft.AspNetCore.Hosting.Diagnostics",
LogEventLevel.Warning)
.Enrich.FromLogContext()
.WriteTo.Console(
outputTemplate: "[{Timestamp:HH:mm:ss} {Level:u3}] " +
"[{SourceContext}] {Message:lj}{NewLine}{Exception}")
.WriteTo.OpenTelemetry(
endpoint: $"{otlpEndpoint}/v1/logs",
protocol: OtlpProtocol.HttpProtobuf,
resourceAttributes: new Dictionary<string, object>
{
["service.name"] = context.HostingEnvironment.ApplicationName,
["service.instance.id"] = Environment.MachineName
});
});
return hostBuilder;
}
}
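Because the setup registers AddSource(serviceName) and AddMeter(nameof(DataUpdater)), any spans and measurements the application creates through the standard System.Diagnostics APIs travel through the same OTLP exporters. A minimal sketch of what that looks like – the method body and metric name are illustrative, not taken from the real DataUpdater:
using System.Diagnostics;
using System.Diagnostics.Metrics;

namespace OracleReports.Services.Workers;

public class DataUpdater
{
    // Names must match what AddSource/AddMeter registered above.
    private static readonly ActivitySource Tracer = new("my-app-name");
    private static readonly Meter AppMeter = new(nameof(DataUpdater));
    private static readonly Counter<long> RowsUpdated =
        AppMeter.CreateCounter<long>("dataupdater.rows_updated");

    public async Task RunOnceAsync(CancellationToken ct)
    {
        // Becomes a span in Tempo, nested under any ambient request trace.
        using var activity = Tracer.StartActivity("DataUpdater.RunOnce");

        var rows = await UpdateRowsAsync(ct);

        RowsUpdated.Add(rows);                 // exported via OTLP as a counter
        activity?.SetTag("rows.updated", rows);
    }

    // Placeholder for the real data access logic.
    private Task<int> UpdateRowsAsync(CancellationToken ct) => Task.FromResult(0);
}
The span lands in Tempo alongside the ASP.NET Core traces, and the counter ends up in Mimir next to the runtime metrics.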
At this point the app is emitting high-quality telemetry. The next move is building the backend to receive it.
Grafana Alloy
The Telemetry Gateway
In this stack, Alloy takes the place of the classic OpenTelemetry Collector. It's easier to run, more modular, and integrates directly with the rest of the Grafana stack.

The configuration consolidates:
- OTLP receiver (traces, metrics, logs)
- Loki exporter for log aggregation
- Prometheus remote_write for metrics
- Tempo exporter for distributed tracing
- Docker log source with container discovery
- Faro web telemetry receiver
- cAdvisor for container metrics
- Metadata relabeling and extraction
logging {
level = "info"
format = "logfmt"
}
# OpenTelemetry receiver (we will expose this with Caddy using HTTPS)
otelcol.receiver.otlp "default" {
http {
endpoint = "0.0.0.0:4318"
}
output {
traces = [otelcol.exporter.otlp.tempo.input]
metrics = [otelcol.exporter.prometheus.default.input]
logs = [otelcol.exporter.loki.default.input]
}
}
otelcol.exporter.loki "default" {
forward_to = [loki.write.local.receiver]
}
otelcol.exporter.prometheus "default" {
forward_to = [prometheus.remote_write.local.receiver]
}
prometheus.exporter.cadvisor "docker_cadvisor" {
docker_host = "unix:///var/run/docker.sock"
storage_duration = "5m"
}
prometheus.scrape "scraper" {
targets = prometheus.exporter.cadvisor.docker_cadvisor.targets
forward_to = [prometheus.remote_write.local.receiver]
scrape_interval = "10s"
}
prometheus.remote_write "local" {
endpoint {
url = "http://prometheus:9090/api/v1/write"
}
}
discovery.docker "containers" {
host = "unix:///var/run/docker.sock"
}
discovery.relabel "logs_integrations_docker" {
targets = []
rule {
source_labels = ["__meta_docker_container_name"]
regex = "/(.*)"
target_label = "service_name"
}
}
loki.source.docker "default" {
host = "unix:///var/run/docker.sock"
targets = discovery.docker.containers.targets
labels = {"platform" = "docker"}
relabel_rules = discovery.relabel.logs_integrations_docker.rules
forward_to = [loki.write.local.receiver]
}
loki.write "local" {
endpoint {
url = "http://loki:3100/loki/api/v1/push"
}
}
otelcol.exporter.otlp "tempo" {
client {
endpoint = "tempo:4317"
tls {
insecure = true
}
}
}
faro.receiver "default" {
server {
listen_address = "0.0.0.0"
listen_port = 12347
}
output {
logs = [loki.process.extract_faro_labels.receiver]
traces = [otelcol.exporter.otlp.tempo.input]
}
}
loki.process "extract_faro_labels" {
stage.logfmt {
mapping = {
app_name = "app_name"
app_version = "app_version"
sdk_version = "sdk_version"
session_id = "session_id"
kind = "kind"
event_name = "event_name"
event_domain = "event_domain"
type = "type"
browser_name = "browser_name"
browser_version = "browser_version"
browser_os = "browser_os"
browser_mobile = "browser_mobile"
browser_language = "browser_language"
browser_viewportWidth = "browser_viewportWidth"
browser_viewportHeight = "browser_viewportHeight"
browser_userAgent = "browser_userAgent"
page_url = "page_url"
event_data_name = "event_data_name"
event_data_duration = "event_data_duration"
event_data_responseStatus = "event_data_responseStatus"
cls = "cls"
fcp = "fcp"
fid = "fid"
inp = "inp"
lcp = "lcp"
ttfb = "ttfb"
context_rating = "context_rating"
}
}
stage.labels {
values = {
app_name = "app_name"
app_version = "app_version"
kind = "kind"
event_name = "event_name"
browser_name = "browser_name"
browser_os = "browser_os"
context_rating = "context_rating"
}
}
stage.structured_metadata {
values = {
session_id = "session_id"
page_url = "page_url"
event_data_name = "event_data_name"
browser_version = "browser_version"
cls = "cls"
fcp = "fcp"
fid = "fid"
inp = "inp"
lcp = "lcp"
ttfb = "ttfb"
}
}
forward_to = [loki.write.local.receiver]
}
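Before pointing real services at it, the pipeline is easy to smoke-test end to end: the OTLP/HTTP endpoint also accepts JSON, so a hand-rolled log record sent through Caddy should show up in Loki a few seconds later. A sketch with curl (substitute your own hostname; the payload follows the standard OTLP JSON encoding):
curl -i https://otel.example.com/v1/logs \
  -H "Content-Type: application/json" \
  -d '{
    "resourceLogs": [{
      "resource": {
        "attributes": [
          { "key": "service.name", "value": { "stringValue": "smoke-test" } }
        ]
      },
      "scopeLogs": [{
        "logRecords": [{
          "timeUnixNano": "'"$(date +%s)"'000000000",
          "severityText": "INFO",
          "body": { "stringValue": "hello from curl" }
        }]
      }]
    }]
  }'
An HTTP 200 means Caddy and Alloy accepted the record; it should then be visible in Grafana's Explore view against the Loki datasource (the exact labels depend on how otelcol.exporter.loki maps the resource attributes).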

Tempo
Distributed Tracing Backend
Tempo stores traces cheaply and scales horizontally without needing a traditional database. The configuration enables:
- OTLP ingestion over gRPC
- Search API with configurable SLOs
- Metrics generation (service graphs + span metrics)
- Local block storage (swap in an S3-compatible backend for production durability)
stream_over_http_enabled: true
server:
http_listen_port: 3200
log_level: info
query_frontend:
search:
duration_slo: 5s
throughput_bytes_slo: 1.073741824e+09
metadata_slo:
duration_slo: 5s
throughput_bytes_slo: 1.073741824e+09
trace_by_id:
duration_slo: 5s
distributor:
receivers:
otlp:
protocols:
grpc:
endpoint: "0.0.0.0:4317"
ingester:
max_block_duration: 5m
compactor:
compaction:
block_retention: 1h
metrics_generator:
registry:
external_labels:
source: tempo
cluster: docker-compose
storage:
path: /var/tempo/generator/wal
remote_write:
- url: http://prometheus:9090/api/v1/write
send_exemplars: true
traces_storage:
path: /var/tempo/generator/traces
storage:
trace:
backend: local
wal:
path: /var/tempo/wal
local:
path: /var/tempo/blocks
overrides:
defaults:
metrics_generator:
processors: [service-graphs, span-metrics, local-blocks]
generate_native_histograms: both
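With the service-graphs and span-metrics processors enabled, Grafana's service map populates itself, and individual traces are searchable with TraceQL. For example, slow spans from the .NET service registered earlier can be pulled up with:
{ resource.service.name = "my-app-name" && duration > 500ms }
Swapping the duration clause for status = error surfaces failing spans instead, and with trace-to-logs configured on the Tempo datasource you can jump straight to the correlated log lines in Loki.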
Loki – Log Aggregation with Object Storage
Loki stays efficient because it indexes only labels, not log content. The configuration:
- Enables TSDB mode for better performance
- Ships indexes and chunks to S3-compatible object storage (Hetzner in my case)
- Defines a compactor for long-term retention optimization
- Supports ingestion from Alloy and Caddy-exposed endpoints
auth_enabled: false
server:
http_listen_port: 3100
common:
instance_addr: 127.0.0.1
path_prefix: /loki
storage:
filesystem:
chunks_directory: /loki/chunks
rules_directory: /loki/rules
replication_factor: 1
ring:
kvstore:
store: inmemory
schema_config:
configs:
- from: 2020-10-24
store: tsdb
object_store: s3
schema: v13
index:
prefix: index_
period: 24h
storage_config:
tsdb_shipper:
active_index_directory: /loki/tsdb-shipper-active
cache_location: /loki/tsdb-shipper-cache
aws:
bucketnames: logs-bucket-name
endpoint: https://nbg1.your-objectstorage.com
region: nbg1
s3forcepathstyle: true
insecure: false
compactor:
working_directory: /loki/compactor
ruler:
alertmanager_url: http://localhost:9093
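Because only the labels are indexed, queries stay cheap as long as the stream selector is selective. Two examples against the streams the Alloy pipeline above produces (label names come straight from that config; container names from the Compose file further down):
{platform="docker", service_name="alloy"} |= "error"

sum by (service_name) (count_over_time({platform="docker"} |= "error" [5m]))
The first tails one container's error lines; the second turns error volume per container into a metric you can graph or alert on.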
Mimir – Horizontally Scalable Metrics Backend
Prometheus is excellent for scraping, but not designed for long-term storage or scale-out. Mimir fills that gap with a distributed architecture.
The deployment uses:
- Three-node Mimir cluster with memberlist gossip protocol
- Nginx load balancer for request distribution
- S3-compatible storage (Hetzner Object Storage)
- Integrated Alertmanager
Mimir configuration:
target: all,alertmanager,overrides-exporter
common:
storage:
backend: s3
s3:
endpoint: nbg1.your-objectstorage.com
access_key_id: YOUR_ACCESS_KEY
secret_access_key: YOUR_SECRET_KEY
bucket_name: YOUR_LOGS_BUCKET_NAME
blocks_storage:
storage_prefix: blocks
tsdb:
dir: /data/ingester
memberlist:
join_members: [mimir-1, mimir-2, mimir-3]
ruler:
rule_path: /data/ruler
alertmanager_url: http://127.0.0.1:8080/alertmanager
ring:
heartbeat_period: 2s
heartbeat_timeout: 10s
alertmanager:
data_dir: /data/alertmanager
fallback_config_file: /etc/alertmanager-fallback-config.yaml
external_url: http://localhost:9009/alertmanager
server:
log_level: warn
Nginx load balancer:
events {
worker_connections 1024;
}
http {
upstream backend {
server mimir-1:8080 max_fails=1 fail_timeout=1s;
server mimir-2:8080 max_fails=1 fail_timeout=1s;
server mimir-3:8080 max_fails=1 fail_timeout=1s backup;
}
server {
listen 9009;
access_log /dev/null;
location / {
proxy_pass http://backend;
}
}
}
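One file referenced in the Compose setup below but not shown here is prometheus.yml. In this design Alloy and Tempo's metrics generator remote_write into Prometheus (hence the --web.enable-remote-write-receiver flag), and Prometheus forwards everything on to Mimir through the Nginx load balancer. A minimal sketch, assuming the service names from the Compose file:
global:
  scrape_interval: 15s

# Forward everything this instance receives or scrapes on to Mimir.
remote_write:
  - url: http://mimir-lb:9009/api/v1/push
    # With multi-tenancy enabled (Mimir's default), a tenant header is required:
    # headers:
    #   X-Scope-OrgID: demo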
Docker Compose – Unified Deployment
A single Compose file ties everything together, creating a complete distributed observability stack that fits into a single VM:
services:
alloy:
image: grafana/alloy:latest
container_name: alloy
restart: unless-stopped
volumes:
- ./config.alloy:/etc/alloy/config.alloy
- /var/run/docker.sock:/var/run/docker.sock
- alloy_data:/var/lib/alloy/data
command: >
run
--stability.level=experimental
--server.http.listen-addr=0.0.0.0:12345
--storage.path=/var/lib/alloy/data
/etc/alloy/config.alloy
tempo:
image: grafana/tempo:latest
container_name: tempo
restart: unless-stopped
volumes:
- ./tempo.yaml:/etc/tempo.yaml
command: ["-config.file=/etc/tempo.yaml"]
mimir-lb:
image: nginx:latest
container_name: mimir-lb
volumes:
- ./configs/services/mimir/nginx.conf:/etc/nginx/nginx.conf:ro
depends_on:
- mimir-1
- mimir-2
- mimir-3
ports:
- 9009:9009
mimir-1:
image: grafana/mimir:latest
command: ["-config.file=/etc/mimir.yaml"]
hostname: mimir-1
volumes:
- ./configs/services/mimir/mimir.yaml:/etc/mimir.yaml
- ./configs/services/mimir/alertmanager-fallback-config.yaml:/etc/alertmanager-fallback-config.yaml
- mimir-1-data:/data
mimir-2:
image: grafana/mimir:latest
command: ["-config.file=/etc/mimir.yaml"]
hostname: mimir-2
volumes:
- ./configs/services/mimir/mimir.yaml:/etc/mimir.yaml
- ./configs/services/mimir/alertmanager-fallback-config.yaml:/etc/alertmanager-fallback-config.yaml
- mimir-2-data:/data
mimir-3:
image: grafana/mimir:latest
command: ["-config.file=/etc/mimir.yaml"]
hostname: mimir-3
volumes:
- ./configs/services/mimir/mimir.yaml:/etc/mimir.yaml
- ./configs/services/mimir/alertmanager-fallback-config.yaml:/etc/alertmanager-fallback-config.yaml
- mimir-3-data:/data
grafana:
image: grafana/grafana
restart: always
volumes:
- /var/lib/grafana:/var/lib/grafana
loki:
container_name: loki
image: grafana/loki:3
command: "-config.file=/etc/loki/config.yaml -config.expand-env=true -log.level=warn"
volumes:
- ./configs/loki/config.yaml:/etc/loki/config.yaml:ro
- loki_data:/loki:rw
environment:
AWS_ACCESS_KEY_ID: YOUR_ACCESS_KEY
AWS_SECRET_ACCESS_KEY: YOUR_SECRET_KEY
restart: always
prometheus:
image: prom/prometheus
restart: always
volumes:
- ./configs/services/monitoring/prometheus.yml:/etc/prometheus/prometheus.yml
- /data/prometheus:/prometheus/data
command:
- --config.file=/etc/prometheus/prometheus.yml
- --storage.tsdb.retention.time=7d
- --log.format=json
- --web.enable-admin-api
- --enable-feature=native-histograms
- --web.enable-remote-write-receiver
caddy:
image: caddy:2
restart: always
ports:
- 80:80
- 443:443
- 443:443/udp
volumes:
- caddy_data:/data/caddy
- ./configs/caddy:/etc/caddy
- /var/www:/var/www
volumes:
caddy_data:
alloy_data:
mimir-1-data:
mimir-2-data:
mimir-3-data:
loki_data:
driver: local
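The Grafana container above starts with no datasources; they can be clicked together in the UI or provisioned declaratively by mounting a YAML file under /etc/grafana/provisioning/datasources/. A sketch of what that file might contain for this stack (names are arbitrary; the Mimir URL uses its Prometheus-compatible API prefix):
apiVersion: 1
datasources:
  - name: Mimir
    type: prometheus
    access: proxy
    url: http://mimir-lb:9009/prometheus
    isDefault: true
  - name: Loki
    type: loki
    access: proxy
    url: http://loki:3100
  - name: Tempo
    type: tempo
    access: proxy
    url: http://tempo:3200
From there, trace-to-logs and exemplar links between the three datasources can be wired up in Grafana itself.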
Caddy – The Secure Ingestion Layer
Caddy serves as the public-facing edge, handling:
- TLS termination with automatic certificate management
- PROXY protocol support for real client IPs behind load balancers
- Reverse proxying to internal services
- CORS configuration for Faro browser telemetry
- HTTP Basic Authentication for ingestion endpoints
Main Caddyfile:
{
servers :443 {
listener_wrappers {
proxy_protocol {
allow 10.0.0.0/8
allow 172.16.0.0/12
allow 192.168.0.0/16
allow 203.0.113.0/24
timeout 5s
}
tls
}
protocols h1 h2 h3
}
servers :80 {
listener_wrappers {
proxy_protocol {
allow 10.0.0.0/8
allow 172.16.0.0/12
allow 192.168.0.0/16
allow 203.0.113.0/24
allow 167.235.107.217/32
timeout 5s
}
http_redirect
}
protocols h1 h2c
}
servers :8080 {
protocols h1 h2c
}
log {
output stdout
format json
}
}
(cors_any) {
header {
Access-Control-Allow-Origin "{http.request.header.Origin}"
Access-Control-Allow-Methods "GET, POST, PUT, PATCH, DELETE, OPTIONS"
Access-Control-Allow-Headers "Authorization, Content-Type, X-Requested-With, x-faro-session-id"
Access-Control-Allow-Credentials "true"
Access-Control-Max-Age "3600"
Vary "Origin"
}
@preflight method OPTIONS
respond @preflight 204
}
https:// {
tls internal {
on_demand
}
root * /var/www/default
file_server
}
import sites-enabled/*.conf
Site configurations:
# Expose Faro receiver with CORS
faro.example.com {
import cors_any
reverse_proxy /* alloy:12347
}
# Main Grafana interface
grafana.example.com {
reverse_proxy /* grafana:3000
}
# OTLP HTTP receiver endpoint
otel.example.com {
reverse_proxy /* alloy:4318
}
# Prometheus with authentication
prom.example.com {
basic_auth {
ingest $2y$12...
}
reverse_proxy /* prometheus:9090
}
# Loki with authentication
loki.example.com {
basic_auth {
ingest $2y$12...
}
reverse_proxy /* loki:3100
}
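The $2y$12... values are bcrypt hashes rather than plaintext passwords. Caddy can generate one for you; for example, against the Compose service defined earlier:
docker compose exec caddy caddy hash-password --plaintext 'my-ingest-secret'
Paste the output after the ingest user in the basic_auth blocks; clients then authenticate with a standard Authorization: Basic header.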
Frontend Observability with Grafana Faro
This stack doesn't stop at backend telemetry. Faro sends browser events, Web Vitals, errors, performance metrics, and navigation data to Alloy → Loki → Grafana, with browser-side traces routed on to Tempo.
(function () {
  // Create a script tag for loading the library
  var script = document.createElement('script');
  script.onload = () => {
    window.GrafanaFaroWebSdk.initializeFaro({
      // Points at the Faro receiver exposed through Caddy (alloy:12347)
      url: 'https://faro.example.com/collect',
      app: {
        name: 'my-frontend-app',
        version: '1.0.0',
      },
    });
  };
  script.src = 'https://unpkg.com/@grafana/faro-web-sdk@^1.19.0/dist/bundle/faro-web-sdk.iife.js';
  document.head.appendChild(script);
})();
Faro provides:
- Core Web Vitals: LCP (Largest Contentful Paint), CLS (Cumulative Layout Shift), FID (First Input Delay), INP (Interaction to Next Paint), TTFB (Time to First Byte)
- Resource timing: DNS lookup, fetch time, transfer size, cache status
- SPA navigation events: Page transitions, route changes
- Error tracking: JavaScript exceptions with stack traces
- Session metadata: Browser, OS, viewport dimensions
- Custom events: Application-specific telemetry
The Alloy pipeline extracts low-cardinality fields as labels for efficient filtering, while keeping high-cardinality data (URLs, session IDs, user agents) as structured metadata. This design keeps Loki queries fast while preserving full context for debugging.
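In LogQL that split looks like this: the indexed labels narrow the stream, and everything else rides along per line. For example, poorly rated Web Vitals measurements for the demo app, and event volume per event name (label names come from the Alloy stages above):
{app_name="my-frontend-app", kind="measurement", context_rating="poor"}

sum by (event_name) (count_over_time({app_name="my-frontend-app", kind="event"}[1h]))
Session IDs, page URLs, and the raw vitals values stay out of the index but remain attached to each matching line as structured metadata, so they appear in Grafana's log details panel when you need them.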
Final Thoughts
This setup is intentionally portable: a single observability pipeline you can deploy anywhere – Hetzner, your own servers, Kubernetes, or hybrid cloud environments – without reinventing the wheel each time.
What you get:
- Managed-level features without managed pricing
- Predictable costs based on storage, not ingestion volume
- Vendor neutrality with full control over data
- Deep insight into backend + frontend behavior
- A single debugging interface for all production incidents
- Production-ready architecture that scales horizontally
Everything is built on open industry standards (OpenTelemetry, OTLP, PromQL, LogQL) rather than proprietary agents and exporters. The components are interchangeable – swap Tempo for Jaeger, or Loki for Elasticsearch – without rewriting application code.
If the cloud-native vendors are the hotel buffet of observability, this stack is the à la carte kitchen – more control, cleaner ingredients, and no surprise charges when traffic spikes or your application actually works as intended.
The total cost? A $20/month VM on Hetzner, some object storage at pennies per GB, and the satisfaction of knowing exactly where your telemetry goes and what it costs you.