All posts

ML Model Monitoring Best Practices for Production Systems

A practitioner's guide to the metrics, drift detection methods, alerting thresholds, and tooling that keep production ML reliable without drowning your on-call in noise.
June 21, 2026
How to Detect Data Drift: Statistical Tests, Thresholds, and Production Wiring

A practitioner's guide to detecting data drift: PSI, KS, Wasserstein, and Jensen-Shannon compared, with Evidently code, threshold guidance, and real production caveats.
June 20, 2026
How to Monitor LLMs in Production: Metrics, Drift, and Alerting

A practitioner's guide to production LLM monitoring — covering TTFT, token throughput, output quality drift, hallucination signals, and alerting with
June 12, 2026
Weights & Biases vs MLflow vs Comet (2026): Choosing by Constraint, Not Hype

Three tools that look interchangeable in their marketing solve subtly different problems. An honest breakdown of W&B, MLflow, and Comet — what each owns
May 22, 2026
Alerting for ML Model Drift: A Practical Setup

Drift alerting fails in one of two ways — it never fires, or it fires constantly until everyone mutes it. A concrete setup for alerts that fire when
May 22, 2026
LLM Cost & Latency Observability with OpenTelemetry

Token spend and tail latency are the two metrics that decide whether an LLM feature ships or gets killed. How to instrument both with OpenTelemetry so you
May 22, 2026
The Open-Source ML Observability Stack: Evidently to Phoenix

An honest breakdown of the three open-source tools most teams reach for — what problem each was built for, where they overlap, where they don't, and how
May 10, 2026
Closing the Eval-Prod Gap: Online Evaluation as Observability

Offline eval scores are green and production is worse. The gap is not a measurement error — it is structural. Here is how to instrument online evaluation
May 9, 2026
Embedding and Vector-Store Observability: The Unwatched Layer

RAG systems fail at the embedding and index layer long before the LLM does. Here is what to actually monitor: embedding drift, index staleness, recall
May 8, 2026
End-to-End Tracing for LLM Applications: What Belongs in a Span

Production LLM apps span multiple model calls, tool invocations, retrieval steps, and retries. A complete trace makes them debuggable; a sparse one
May 6, 2026