Tag
#observability
7 posts tagged observability.
- tooling
Weights & Biases vs MLflow vs Comet (2026): Choosing by Constraint, Not Hype
Three tools that look interchangeable in their marketing solve subtly different problems. An honest breakdown of W&B, MLflow, and Comet — what each owns, where the real trade-off is, and how to pick by your actual constraint.
- ops
Alerting for ML Model Drift: A Practical Setup
Drift alerting fails in one of two ways — it never fires, or it fires constantly until everyone mutes it. A concrete setup for alerts that fire when performance is actually at risk, and stay quiet when it isn't.
- ops
LLM Cost & Latency Observability with OpenTelemetry
Token spend and tail latency are the two metrics that decide whether an LLM feature ships or gets killed. How to instrument both with OpenTelemetry so you can answer 'why did this cost double?' in a query, not a war room.
- tooling
The Open-Source ML Observability Stack: Evidently to Phoenix
An honest breakdown of the three open-source tools most teams reach for — what problem each was built for, where they overlap, where they don't, and how to assemble them without buying a platform you don't need yet.
- ops
Closing the Eval-Prod Gap: Online Evaluation as Observability
Offline eval scores are green and production is worse. The gap is not a measurement error — it is structural. Here is how to instrument online evaluation so production quality becomes observable.
- ops
Embedding and Vector-Store Observability: The Unwatched Layer
RAG systems fail at the embedding and index layer long before the LLM does. Here is what to actually monitor: embedding drift, index staleness, recall decay, and retrieval quality in production.
- ops
End-to-End Tracing for LLM Applications: What Belongs in a Span
Production LLM apps span multiple model calls, tool invocations, retrieval steps, and retries. A complete trace makes them debuggable; a sparse one leaves you guessing.