All posts
-
ML Model Monitoring Best Practices for Production Systems
A practitioner guide to the metrics, drift detection methods, alerting thresholds, and tooling that keep production ML reliable — without drowning your on-call in noise.
-
How to Detect Data Drift: Statistical Tests, Thresholds, and Production Wiring
A practitioner's guide to detecting data drift: PSI, KS, Wasserstein, and Jensen-Shannon compared, with Evidently code, threshold guidance, and real production caveats.
-
How to Monitor LLM in Production: Metrics, Drift, and Alerting
A practitioner's guide to production LLM monitoring — covering TTFT, token throughput, output quality drift, hallucination signals, and alerting with
-
Weights & Biases vs MLflow vs Comet (2026): Choosing by Constraint, Not Hype
Three tools that look interchangeable in their marketing solve subtly different problems. An honest breakdown of W&B, MLflow, and Comet — what each owns
-
Alerting for ML Model Drift: A Practical Setup
Drift alerting fails in one of two ways — it never fires, or it fires constantly until everyone mutes it. A concrete setup for alerts that fire when
-
LLM Cost & Latency Observability with OpenTelemetry
Token spend and tail latency are the two metrics that decide whether an LLM feature ships or gets killed. How to instrument both with OpenTelemetry so you
-
The Open-Source ML Observability Stack: Evidently to Phoenix
An honest breakdown of the three open-source tools most teams reach for — what problem each was built for, where they overlap, where they don't, and how
-
Closing the Eval-Prod Gap: Online Evaluation as Observability
Offline eval scores are green and production is worse. The gap is not a measurement error — it is structural. Here is how to instrument online evaluation
-
Embedding and Vector-Store Observability: The Unwatched Layer
RAG systems fail at the embedding and index layer long before the LLM does. Here is what to actually monitor: embedding drift, index staleness, recall
-
End-to-End Tracing for LLM Applications: What Belongs in a Span
Production LLM apps span multiple model calls, tool invocations, retrieval steps, and retries. A complete trace makes them debuggable; a sparse one