telemetry-kit

Self-Hosting

Deploy and run your own telemetry-kit server

Overview

telemetry-kit includes a reference server implementation that you can self-host. This gives you complete control over your telemetry data.

Recommended: Use Our Managed Service

This server is provided as a reference implementation and starting point for custom deployments. For production use, we recommend our managed service with a free tier:

  • ✨ Free Tier: 10,000 events/month
  • πŸš€ 5-Minute Setup: No infrastructure required
  • πŸ“Š Built-in Analytics: Dashboard and insights
  • πŸ”’ Fully Managed: Updates, security, and scaling handled

Get Started with Managed Service β†’

The server reference implementation is located in the server/ directory of the repository.

Quick Start

Clone the repository

git clone https://github.com/ibrahimcesar/telemetry-kit
cd telemetry-kit/server

Start with Docker Compose

docker compose up -d

This starts:

  • Axum ingestion API (port 3000)
  • PostgreSQL database (port 5432)

Verify it's running

curl http://localhost:3000/health

Should return:

{"status":"healthy","version":"0.1.0"}

With header:

X-Clacks-Overhead: GNU Terry Pratchett

All responses include the X-Clacks-Overhead: GNU Terry Pratchett header as a tribute to Sir Terry Pratchett. Learn more at gnuterrypratchett.com.

Architecture

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚  Client Applications                            β”‚
β”‚  (Your Rust apps with telemetry-kit SDK)        β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                   β”‚ HTTPS + HMAC-SHA256
                   β–Ό
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚  Ingestion API (Axum)                           β”‚
β”‚  - HMAC authentication                          β”‚
β”‚  - Timestamp validation                         β”‚
β”‚  - Batch processing                             β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                   β”‚
                   β–Ό
              β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
              β”‚PostgreSQLβ”‚
              β”‚(Events)  β”‚
              β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

Configuration

Environment Variables

Create a .env file:

# Database
DATABASE_URL=postgresql://telemetry:password@postgres:5432/telemetry
 
# Server
HOST=0.0.0.0
PORT=3000
LOG_LEVEL=info

Docker Compose

The default docker-compose.yml:

version: '3.8'
 
services:
  api:
    build: .
    ports:
      - "3000:3000"
    environment:
      DATABASE_URL: postgresql://telemetry:password@postgres:5432/telemetry
    depends_on:
      - postgres
 
  postgres:
    image: postgres:16
    environment:
      POSTGRES_DB: telemetry
      POSTGRES_USER: telemetry
      POSTGRES_PASSWORD: password
    volumes:
      - postgres_data:/var/lib/postgresql/data
      - ./migrations:/docker-entrypoint-initdb.d
 
volumes:
  postgres_data:

Database Setup

Migrations

Migrations in server/migrations/ are applied automatically the first time the postgres container initializes its data directory (they are mounted into docker-entrypoint-initdb.d, which only runs against an empty data volume).

001_initial_schema.sql:

CREATE TABLE IF NOT EXISTS events (
    event_id UUID PRIMARY KEY,
    org_id VARCHAR(255) NOT NULL,
    app_id VARCHAR(255) NOT NULL,
    user_id VARCHAR(255) NOT NULL,
    session_id VARCHAR(255),
    event_type VARCHAR(255) NOT NULL,
    event_category VARCHAR(255),
    event_data JSONB NOT NULL,
    timestamp TIMESTAMPTZ NOT NULL,
    created_at TIMESTAMPTZ DEFAULT NOW()
);
 
-- PostgreSQL has no inline INDEX clause; create indexes separately
CREATE INDEX IF NOT EXISTS idx_org_app ON events (org_id, app_id);
CREATE INDEX IF NOT EXISTS idx_timestamp ON events (timestamp);
CREATE INDEX IF NOT EXISTS idx_event_type ON events (event_type);
 
CREATE TABLE IF NOT EXISTS organizations (
    org_id VARCHAR(255) PRIMARY KEY,
    name VARCHAR(255) NOT NULL,
    tier VARCHAR(50) DEFAULT 'free',
    created_at TIMESTAMPTZ DEFAULT NOW()
);
 
CREATE TABLE IF NOT EXISTS applications (
    app_id VARCHAR(255) PRIMARY KEY,
    org_id VARCHAR(255) REFERENCES organizations(org_id),
    name VARCHAR(255) NOT NULL,
    hmac_secret VARCHAR(255) NOT NULL,
    created_at TIMESTAMPTZ DEFAULT NOW()
);

Manual Migration

To run migrations manually:

# Enter postgres container
docker compose exec postgres psql -U telemetry -d telemetry
 
# Run migration files
\i /docker-entrypoint-initdb.d/001_initial_schema.sql

Client Configuration

Once your server is running, configure your Rust applications to sync telemetry:

Getting Credentials

When using the managed service at telemetry-kit.dev, you'll receive:

  • App ID: Unique identifier for your application (e.g. app_e963188b)
  • API Token: Authentication token for API requests (e.g. tk_8008db0dc4dd41eca94f58a08b4c95d5)
  • HMAC Secret: Secret for signing requests (e.g. kIRV9eC/2+Dvqc4E9ubP9Cjzd0LG2/Dg0OVEfknKBPQ=)

For self-hosted deployments, generate these in your database:

-- Create an organization
INSERT INTO organizations (org_id, name, tier)
VALUES ('org_your_org', 'Your Organization', 'self-hosted');
 
-- Create an application with credentials
-- (gen_random_bytes requires the pgcrypto extension)
CREATE EXTENSION IF NOT EXISTS pgcrypto;
 
INSERT INTO applications (app_id, org_id, name, hmac_secret)
VALUES (
    'app_' || substr(md5(random()::text), 1, 8),
    'org_your_org',
    'My App',
    encode(gen_random_bytes(32), 'base64')
);
 
-- Get your credentials
SELECT app_id, hmac_secret FROM applications WHERE name = 'My App';
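If you prefer to mint credentials outside the database, the same shapes can be produced locally with openssl (a sketch: `app_` plus 8 hex characters mirrors the SQL above, and the secret is 32 random bytes, base64-encoded):

```shell
# Generate an app_id and HMAC secret locally, matching the formats
# produced by the SQL above (app_ + 8 hex chars; 32 random bytes, base64)
APP_ID="app_$(openssl rand -hex 4)"
HMAC_SECRET="$(openssl rand -base64 32)"

echo "app_id:      $APP_ID"
echo "hmac_secret: $HMAC_SECRET"
```

Insert the printed values with the INSERT statement above instead of generating them in SQL.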

Configuring the SDK

use telemetry_kit::prelude::*;
 
let telemetry = TelemetryKit::builder()
    .service_name("my-app")?
    .service_version(env!("CARGO_PKG_VERSION"))
    .with_sync_credentials(
        "app_e963188b",                              // App ID
        "my-app",                                     // Service name
        "tk_8008db0dc4dd41eca94f58a08b4c95d5",       // API Token
        "kIRV9eC/2+Dvqc4E9ubP9Cjzd0LG2/Dg0OVEfknKBPQ=", // HMAC Secret
    )?
    .auto_sync(true)  // Enable automatic syncing
    .build()?;

Never hardcode credentials in source code for production! Use environment variables:

let telemetry = TelemetryKit::builder()
    .service_name("my-app")?
    .with_sync_credentials(
        // Fail fast if a variable is missing rather than silently
        // sending empty credentials
        &std::env::var("TK_APP_ID").expect("TK_APP_ID not set"),
        "my-app",
        &std::env::var("TK_TOKEN").expect("TK_TOKEN not set"),
        &std::env::var("TK_SECRET").expect("TK_SECRET not set"),
    )?
    .auto_sync(true)
    .build()?;

Custom Server Endpoint

For self-hosted deployments, specify your server URL:

let telemetry = TelemetryKit::builder()
    .service_name("my-app")?
    .endpoint("https://telemetry.yourcompany.com")?  // Your self-hosted server
    .with_sync_credentials(app_id, service, token, secret)?
    .auto_sync(true)
    .build()?;

Sync Behavior

  • auto_sync(true): Enables automatic background syncing (default: false)
  • sync_interval(seconds): Time between sync attempts (default: 300, i.e. 5 minutes)
  • sync_on_shutdown(true): Flush events on graceful shutdown (default: true)

let telemetry = TelemetryKit::builder()
    .service_name("my-app")?
    .with_sync_credentials(app_id, service, token, secret)?
    .auto_sync(true)
    .sync_interval(60)        // Sync every minute
    .sync_on_shutdown(true)   // Flush on exit
    .build()?;

Security

HMAC Authentication

All requests must be signed with HMAC-SHA256:

Authorization: Bearer <api_token>
X-Signature: <hmac_sha256_signature>
X-Timestamp: <unix_timestamp>

Signature Format: HMAC-SHA256(secret, timestamp:body)

The SDK handles this automatically.
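For debugging outside the SDK, the signature can be reproduced by hand with openssl. This sketch assumes a hex-encoded signature over `timestamp:body` using the example credentials above; check the server code for the exact encoding before relying on it:

```shell
# Reproduce the request signature by hand (assumed hex encoding)
SECRET='kIRV9eC/2+Dvqc4E9ubP9Cjzd0LG2/Dg0OVEfknKBPQ='
BODY='[{"event_type":"app_started"}]'
TS=$(date +%s)

# HMAC-SHA256(secret, timestamp:body)
SIG=$(printf '%s:%s' "$TS" "$BODY" \
    | openssl dgst -sha256 -hmac "$SECRET" \
    | awk '{print $NF}')

echo "X-Timestamp: $TS"
echo "X-Signature: $SIG"
```

Attach the printed headers, plus Authorization: Bearer <api_token>, to a curl POST against your server to test ingestion manually.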

Timestamp Validation

Prevents replay attacks:

  • Timestamp must be within Β±10 minutes of server time
  • Provides reasonable window for clock drift
  • No additional storage required
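The check itself is simple; a shell sketch of the server-side logic (shown passing here, since the example timestamp is the current time):

```shell
# Reject requests whose X-Timestamp drifts more than 10 minutes
# from server time (sketch of the server-side check)
TS=$(date +%s)    # would come from the X-Timestamp header
NOW=$(date +%s)

DRIFT=$(( NOW - TS ))
DRIFT=${DRIFT#-}  # absolute value

if [ "$DRIFT" -le 600 ]; then
    echo "timestamp ok"
else
    echo "timestamp stale: reject"
fi
```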

Production Deployment

Using a Reverse Proxy

Put the API behind nginx or Caddy:

nginx example:

upstream telemetry_api {
    server localhost:3000;
}
 
server {
    listen 443 ssl http2;
    server_name telemetry.example.com;
 
    ssl_certificate /path/to/cert.pem;
    ssl_certificate_key /path/to/key.pem;
 
    location / {
        proxy_pass http://telemetry_api;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;
    }
}

Health Checks

Monitor the /health endpoint:

curl https://telemetry.example.com/health

Scaling

For high-volume deployments:

  1. Horizontal Scaling - Run multiple API instances behind a load balancer
  2. Database - Use managed PostgreSQL (RDS, Cloud SQL, etc.)
  3. CDN - Add CloudFlare or similar
  4. Connection Pooling - Configure appropriate database connection limits

Monitoring

Key metrics to monitor:

  • Request rate (requests/second)
  • Error rate (5xx responses)
  • Latency (p50, p95, p99)
  • Database connections
  • Disk space (PostgreSQL)
  • Event ingestion rate

Backup and Recovery

Database Backups

# Backup
docker compose exec postgres pg_dump -U telemetry telemetry > backup.sql
 
# Restore
docker compose exec -T postgres psql -U telemetry telemetry < backup.sql

Automated Backups

Add to crontab:

# Daily backup at 2 AM
0 2 * * * cd /path/to/server && docker compose exec postgres pg_dump -U telemetry telemetry | gzip > backups/telemetry-$(date +\%Y\%m\%d).sql.gz

Troubleshooting

API Not Starting

Check logs:

docker compose logs api

Common issues:

  • Database not ready (wait for postgres to start)
  • Port 3000 already in use
  • Missing environment variables

Database Connection Errors

# Test database connectivity (run psql from the postgres container;
# the API image may not ship psql)
docker compose exec postgres psql -U telemetry -d telemetry -c 'SELECT 1'
 
# Check postgres logs
docker compose logs postgres

Production Best Practices

Database Optimization

Connection Pooling

Configure connection pool in your server:

// Example configuration
use sqlx::postgres::PgPoolOptions;
 
let pool = PgPoolOptions::new()
    .max_connections(20)           // Adjust based on load
    .min_connections(5)
    .acquire_timeout(Duration::from_secs(3))
    .idle_timeout(Duration::from_secs(600))
    .connect(&database_url)
    .await?;

Indexing Strategy

Add indexes for common queries:

-- Performance indexes
CREATE INDEX CONCURRENTLY idx_events_org_timestamp
    ON events(org_id, timestamp DESC);
 
CREATE INDEX CONCURRENTLY idx_events_app_timestamp
    ON events(app_id, timestamp DESC);
 
CREATE INDEX CONCURRENTLY idx_events_category
    ON events(event_category)
    WHERE event_category IS NOT NULL;
 
-- JSONB indexes for event_data queries
CREATE INDEX CONCURRENTLY idx_events_data_gin
    ON events USING GIN (event_data);
 
-- Analyze tables
ANALYZE events;
ANALYZE organizations;
ANALYZE applications;

Partitioning

For high-volume deployments, partition by time:

-- Create partitioned table
-- (a primary key on a partitioned table must include the partition key,
--  so copy the schema without indexes and re-add the key explicitly)
CREATE TABLE events_partitioned (
    LIKE events INCLUDING DEFAULTS
) PARTITION BY RANGE (timestamp);
 
ALTER TABLE events_partitioned
    ADD PRIMARY KEY (event_id, timestamp);
 
-- Create monthly partitions
CREATE TABLE events_2025_01 PARTITION OF events_partitioned
    FOR VALUES FROM ('2025-01-01') TO ('2025-02-01');
 
CREATE TABLE events_2025_02 PARTITION OF events_partitioned
    FOR VALUES FROM ('2025-02-01') TO ('2025-03-01');
 
-- Automate partition creation with pg_cron or external scripts
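As an external-script alternative to pg_cron, a small helper (assumes GNU date) can print next month's partition DDL following the events_YYYY_MM naming above; pipe its output into psql to apply:

```shell
# Print the DDL for next month's partition of events_partitioned
# (assumes GNU date; pipe the output into psql to apply it)
FROM=$(date -d "$(date +%Y-%m-01) +1 month" +%Y-%m-01)
TO=$(date -d "$FROM +1 month" +%Y-%m-01)
PART="events_$(date -d "$FROM" +%Y_%m)"

cat <<SQL
CREATE TABLE IF NOT EXISTS $PART PARTITION OF events_partitioned
    FOR VALUES FROM ('$FROM') TO ('$TO');
SQL
```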

Query Optimization

-- Vacuum regularly
VACUUM ANALYZE events;
 
-- Update statistics
ANALYZE events;
 
-- Monitor slow queries (requires the pg_stat_statements extension,
-- preloaded via shared_preload_libraries; columns are PostgreSQL 13+)
SELECT
    query,
    calls,
    total_exec_time,
    mean_exec_time,
    max_exec_time
FROM pg_stat_statements
ORDER BY mean_exec_time DESC
LIMIT 10;

Security Hardening

TLS/SSL Configuration

Use TLS for database connections:

# In .env
DATABASE_URL=postgresql://user:pass@host:5432/db?sslmode=require
 
# For self-signed certificates
DATABASE_URL=postgresql://user:pass@host:5432/db?sslmode=verify-full&sslrootcert=/path/to/ca.crt

API Rate Limiting

Implement rate limiting with Tower middleware:

use std::time::Duration;
 
use axum::{error_handling::HandleErrorLayer, http::StatusCode, BoxError};
use tower::{buffer::BufferLayer, limit::RateLimitLayer, ServiceBuilder};
 
let app = Router::new()
    .route("/events/batch", post(handle_batch))
    .layer(
        ServiceBuilder::new()
            // Convert rate-limit errors to 429 and buffer so the
            // non-Clone RateLimit service works inside axum's Router
            .layer(HandleErrorLayer::new(|_: BoxError| async {
                StatusCode::TOO_MANY_REQUESTS
            }))
            .layer(BufferLayer::new(1024))
            // Global limit: 100 requests per minute; per-client limiting
            // needs something like tower_governor instead
            .layer(RateLimitLayer::new(100, Duration::from_secs(60))),
    );

IP Allowlisting

Restrict access to known IPs:

# nginx
geo $allowed_ip {
    default 0;
    10.0.0.0/8 1;        # Your VPC
    192.168.1.0/24 1;    # Office network
}
 
server {
    if ($allowed_ip = 0) {
        return 403;
    }
    # ... rest of config
}

CORS Configuration

Configure CORS appropriately:

use axum::http::{header, HeaderValue, Method};
use tower_http::cors::CorsLayer;
 
let cors = CorsLayer::new()
    .allow_origin("https://yourdomain.com".parse::<HeaderValue>().unwrap())
    .allow_methods([Method::POST, Method::GET])
    .allow_headers([header::CONTENT_TYPE, header::AUTHORIZATION]);
 
let app = Router::new()
    .route("/events/batch", post(handle_batch))
    .layer(cors);

Monitoring & Observability

Structured Logging

use tracing::{info, instrument};
 
#[tokio::main]
async fn main() {
    tracing_subscriber::fmt()
        .with_target(false)
        .with_level(true)
        .json()  // JSON output (requires tracing-subscriber's "json" feature)
        .init();
 
    info!("Server starting");
}
 
#[instrument(skip(pool))]
async fn handle_batch(
    State(pool): State<PgPool>,
    Json(events): Json<Vec<Event>>,
) -> Result<Json<BatchResponse>, StatusCode> {
    info!(event_count = events.len(), "Processing batch");
    // ...
}

Prometheus Metrics

Add metrics endpoint:

use axum_prometheus::PrometheusMetricLayer;
 
let (prometheus_layer, metric_handle) = PrometheusMetricLayer::pair();
 
let app = Router::new()
    .route("/events/batch", post(handle_batch))
    .route("/metrics", get(|| async move { metric_handle.render() }))
    .layer(prometheus_layer);

Monitor these metrics:

  • http_requests_total - Total requests
  • http_request_duration_seconds - Request latency
  • db_connections_active - Active connections
  • events_ingested_total - Events processed
  • events_failed_total - Failed events

Grafana Dashboard

Example queries:

# Request rate
rate(http_requests_total[5m])

# Error rate
rate(http_requests_total{status=~"5.."}[5m])

# P95 latency
histogram_quantile(0.95, rate(http_request_duration_seconds_bucket[5m]))

# Event ingestion rate
rate(events_ingested_total[5m])

High Availability

Multi-Instance Deployment

# docker-compose.prod.yml
version: '3.8'
 
services:
  api-1:
    build: .
    deploy:
      replicas: 3
      resources:
        limits:
          cpus: '1'
          memory: 512M
    environment:
      DATABASE_URL: ${DATABASE_URL}
      INSTANCE_ID: api-1
 
  postgres-primary:
    image: postgres:16
    environment:
      POSTGRES_DB: telemetry
      POSTGRES_USER: ${DB_USER}
      POSTGRES_PASSWORD: ${DB_PASSWORD}
    volumes:
      - pgdata-primary:/var/lib/postgresql/data
    command: |
      postgres
      -c wal_level=replica
      -c max_wal_senders=3
      -c max_replication_slots=3
 
  postgres-replica:
    image: postgres:16
    environment:
      PGUSER: ${DB_USER}
      POSTGRES_PASSWORD: ${DB_PASSWORD}
    volumes:
      - pgdata-replica:/var/lib/postgresql/data
    command: |
      bash -c "
      until pg_basebackup --pgdata=/var/lib/postgresql/data -R --slot=replication_slot --host=postgres-primary --port=5432
      do
        echo 'Waiting for primary...'
        sleep 1s
      done
      postgres
      "
 
  loadbalancer:
    image: nginx:alpine
    ports:
      - "443:443"
    volumes:
      - ./nginx.conf:/etc/nginx/nginx.conf
      - ./ssl:/etc/nginx/ssl
    depends_on:
      - api-1
 
volumes:
  pgdata-primary:
  pgdata-replica:

Load Balancer Configuration

# nginx.conf
upstream api_servers {
    least_conn;
    server api-1:3000 max_fails=3 fail_timeout=30s;
    server api-2:3000 max_fails=3 fail_timeout=30s;
    server api-3:3000 max_fails=3 fail_timeout=30s;
}
 
server {
    listen 443 ssl http2;
    server_name telemetry.example.com;
 
    ssl_certificate /etc/nginx/ssl/cert.pem;
    ssl_certificate_key /etc/nginx/ssl/key.pem;
    ssl_protocols TLSv1.2 TLSv1.3;
    ssl_ciphers HIGH:!aNULL:!MD5;
 
    # Health check endpoint
    location /health {
        access_log off;
        proxy_pass http://api_servers;
        proxy_next_upstream error timeout http_500 http_502 http_503;
    }
 
    location / {
        proxy_pass http://api_servers;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;
 
        # Timeouts
        proxy_connect_timeout 5s;
        proxy_send_timeout 60s;
        proxy_read_timeout 60s;
 
        # Retry failed requests
        proxy_next_upstream error timeout http_500 http_502 http_503;
        proxy_next_upstream_tries 3;
    }
}

Data Retention & Archival

Automated Cleanup

Create a cleanup job:

-- Archive old events to cold storage
CREATE TABLE events_archive (
    LIKE events INCLUDING ALL
);
 
-- Function to archive and delete old events
CREATE OR REPLACE FUNCTION archive_old_events(days_to_keep INTEGER)
RETURNS INTEGER AS $$
DECLARE
    archived_count INTEGER;
BEGIN
    -- Archive to cold storage
    WITH archived AS (
        INSERT INTO events_archive
        SELECT * FROM events
        WHERE timestamp < NOW() - (days_to_keep || ' days')::INTERVAL
        RETURNING *
    )
    SELECT COUNT(*) INTO archived_count FROM archived;
 
    -- Delete archived events
    DELETE FROM events
    WHERE timestamp < NOW() - (days_to_keep || ' days')::INTERVAL;
 
    RETURN archived_count;
END;
$$ LANGUAGE plpgsql;
 
-- Schedule with pg_cron (requires pg_cron extension)
SELECT cron.schedule('cleanup-old-events', '0 2 * * *',
    'SELECT archive_old_events(90)');

Export to S3

#!/bin/bash
# export-to-s3.sh
 
# Export events older than 90 days
docker compose exec -T postgres psql -U telemetry -d telemetry -c "
    COPY (
        SELECT * FROM events
        WHERE timestamp < NOW() - INTERVAL '90 days'
    ) TO STDOUT WITH CSV HEADER
" | gzip | aws s3 cp - "s3://telemetry-archive/events-$(date +%Y%m%d).csv.gz"
 
# Delete exported events
docker compose exec -T postgres psql -U telemetry -d telemetry -c "
    DELETE FROM events
    WHERE timestamp < NOW() - INTERVAL '90 days'
"

Disaster Recovery

Backup Strategy

3-2-1 Rule: 3 copies, 2 different media, 1 offsite

#!/bin/bash
# backup.sh - Daily backup script
 
DATE=$(date +%Y%m%d_%H%M%S)
BACKUP_DIR="/backups"
S3_BUCKET="s3://telemetry-backups"
 
# 1. PostgreSQL dump
docker compose exec -T postgres pg_dump -U telemetry telemetry | gzip > "$BACKUP_DIR/db-$DATE.sql.gz"
 
# 2. Copy to S3
aws s3 cp "$BACKUP_DIR/db-$DATE.sql.gz" "$S3_BUCKET/"
 
# 3. Keep local backups for 7 days
find $BACKUP_DIR -name "db-*.sql.gz" -mtime +7 -delete
 
# 4. Verify backup integrity
gunzip -t "$BACKUP_DIR/db-$DATE.sql.gz" && echo "βœ“ Backup verified"

Point-in-Time Recovery

Enable WAL archiving:

# docker-compose.yml
postgres:
  image: postgres:16
  environment:
    POSTGRES_DB: telemetry
    POSTGRES_USER: telemetry
    POSTGRES_PASSWORD: password
  volumes:
    - postgres_data:/var/lib/postgresql/data
    - ./wal_archive:/var/lib/postgresql/wal_archive
  command: |
    postgres
    -c wal_level=replica
    -c archive_mode=on
    -c archive_command='test ! -f /var/lib/postgresql/wal_archive/%f && cp %p /var/lib/postgresql/wal_archive/%f'
    -c max_wal_senders=3

Performance Tuning

PostgreSQL Configuration

# postgresql.conf optimizations
 
# Memory
shared_buffers = 256MB              # 25% of RAM
effective_cache_size = 1GB          # 50-75% of RAM
maintenance_work_mem = 64MB
work_mem = 16MB
 
# Checkpoints
checkpoint_completion_target = 0.9
wal_buffers = 16MB
max_wal_size = 1GB
min_wal_size = 80MB
 
# Query planner
random_page_cost = 1.1              # For SSD
effective_io_concurrency = 200      # For SSD
 
# Connections
max_connections = 100
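The memory values above scale with available RAM; a quick helper (Linux only, reads /proc/meminfo) to compute them for the current host:

```shell
# Compute suggested postgresql.conf memory settings from total RAM
# (Linux only; mirrors the 25% and upper-end 75% guidance above)
RAM_KB=$(awk '/^MemTotal/ {print $2}' /proc/meminfo)

echo "shared_buffers = $(( RAM_KB / 4 / 1024 ))MB"
echo "effective_cache_size = $(( RAM_KB * 3 / 4 / 1024 ))MB"
```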

Batch Size Optimization

// Process events in optimal batch sizes
const OPTIMAL_BATCH_SIZE: usize = 500;
 
async fn handle_batch(
    State(pool): State<PgPool>,
    Json(events): Json<Vec<Event>>,
) -> Result<Json<BatchResponse>, StatusCode> {
    let mut total_inserted = 0;
 
    // Process in chunks
    for chunk in events.chunks(OPTIMAL_BATCH_SIZE) {
        let inserted = insert_events(&pool, chunk).await
            .map_err(|_| StatusCode::INTERNAL_SERVER_ERROR)?;
        total_inserted += inserted;
    }
 
    Ok(Json(BatchResponse {
        accepted: total_inserted,
        rejected: events.len() - total_inserted,
    }))
}

Operational Playbooks

Incident Response

High Latency:

  1. Check database connections: SELECT count(*) FROM pg_stat_activity;
  2. Check slow queries: SELECT * FROM pg_stat_activity WHERE state != 'idle' AND query_start < NOW() - INTERVAL '1 minute';
  3. Check disk I/O: iostat -x 1
  4. Scale horizontally if needed

Database Full:

  1. Check disk space: df -h
  2. Archive old events: SELECT archive_old_events(30);
  3. Vacuum full (note: takes an exclusive lock on the table): VACUUM FULL events;
  4. Increase disk if needed

High Memory Usage:

  1. Check connections: Reduce max_connections
  2. Tune shared_buffers and work_mem
  3. Restart PostgreSQL: docker compose restart postgres

Runbook: Deploy New Version

#!/bin/bash
# deploy.sh - Zero-downtime deployment
 
# 1. Pull latest code
git pull origin main
 
# 2. Build new image
docker compose build api
 
# 3. Run database migrations
docker compose exec postgres psql -U telemetry -d telemetry -f /docker-entrypoint-initdb.d/new_migration.sql
 
# 4. Rolling restart
for i in {1..3}; do
    docker compose up -d --no-deps --scale api=$i api
    sleep 30  # Wait for health checks
done
 
# 5. Verify
curl -f https://telemetry.example.com/health || { echo "Health check failed"; exit 1; }
 
echo "βœ“ Deployment successful"

Cost Optimization

Database Storage

-- Trim old JSONB payloads by stripping null fields
UPDATE events
SET event_data = jsonb_strip_nulls(event_data)
WHERE timestamp < NOW() - INTERVAL '30 days';
 
-- Remove unused columns from archive
CREATE TABLE events_archive_compressed AS
SELECT event_id, org_id, app_id, event_type, timestamp
FROM events_archive;
 
DROP TABLE events_archive;
ALTER TABLE events_archive_compressed RENAME TO events_archive;

Caching Strategy

use moka::future::Cache;
 
// Cache aggregated results
let cache: Cache<String, QueryResult> = Cache::builder()
    .max_capacity(10_000)
    .time_to_live(Duration::from_secs(3600))  // 1 hour
    .build();
 
async fn get_aggregated_stats(
    org_id: &str,
    cache: &Cache<String, QueryResult>,
    pool: &PgPool,
) -> Result<QueryResult> {
    let cache_key = format!("stats:{}", org_id);
 
    // Note: try_get_with returns Result<V, Arc<E>>; adapt the Arc'd
    // error to your Result alias as needed
    cache.try_get_with(cache_key, async {
        // Expensive database query
        query_database(pool, org_id).await
    }).await
}
