JMX + Prometheus + alerts
The metrics that matter
Kafka exposes broker, producer, consumer and Streams internals via JMX. In a modern deployment you scrape them with the JMX Exporter for Prometheus and alert on:
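- Consumer lag per group, per partition: kafka_consumergroup_lag (the name used by the separate kafka_exporter; consumer clients themselves report records-lag-max over JMX).
- Under-replicated partitions: kafka_server_replicamanager_underreplicatedpartitions.
- Request latency p99 per API.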
- Disk usage per log dir.
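As a rough illustration of the first two alert conditions, the sketch below parses a Prometheus text-format scrape and flags high lag and under-replicated partitions. In practice these checks would normally live in Prometheus alerting rules; the Python only makes the logic concrete. The threshold of 1000, the helper names parse_samples and find_alerts, and the label names consumergroup, topic and partition are assumptions for illustration and should be checked against what your exporter actually emits.

```python
import re
from typing import Dict, List, Tuple

# One sample line of the Prometheus text format: name{labels} value
_SAMPLE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)')
_LABEL = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')

Sample = Tuple[str, Dict[str, str], float]

LAG_THRESHOLD = 1000  # hypothetical threshold; tune per consumer group


def parse_samples(text: str) -> List[Sample]:
    """Parse metric samples, skipping blank lines and # HELP / # TYPE comments."""
    samples: List[Sample] = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        match = _SAMPLE.match(line)
        if not match:
            continue
        name, raw_labels, value = match.groups()
        labels = dict(_LABEL.findall(raw_labels or ""))
        samples.append((name, labels, float(value)))
    return samples


def find_alerts(text: str, lag_threshold: float = LAG_THRESHOLD) -> List[str]:
    """Return one message per sample that violates an alert condition."""
    alerts: List[str] = []
    for name, labels, value in parse_samples(text):
        if name == "kafka_consumergroup_lag" and value > lag_threshold:
            # Label names below are assumed, not guaranteed by every exporter.
            alerts.append(
                "lag %d above %d for group=%s topic=%s partition=%s"
                % (value, lag_threshold, labels.get("consumergroup"),
                   labels.get("topic"), labels.get("partition"))
            )
        elif (name == "kafka_server_replicamanager_underreplicatedpartitions"
              and value > 0):
            alerts.append("%d under-replicated partition(s)" % value)
    return alerts
```

Calling find_alerts on the body of a scrape returns an empty list when everything is healthy, which makes it easy to wire into a smoke test for a staging cluster.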
Tools like Burrow and Cruise Control build on these signals: Burrow turns raw consumer lag into per-group health statuses, and Cruise Control automates partition rebalancing based on broker load.
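To show what turning raw lag into a health call can mean, here is a minimal sketch of a sliding-window check in the spirit of what a tool like Burrow does. The status names, the rules, and the classify_partition helper are simplified assumptions for illustration, not Burrow's actual implementation.

```python
from typing import List, Tuple


def classify_partition(window: List[Tuple[int, int]]) -> str:
    """Classify one partition from (committed_offset, lag) samples, oldest first.

    Simplified, assumed rules:
      - fewer than two samples: not enough data to judge
      - lag reached zero at any point: the consumer caught up, so OK
      - committed offset never moved while lag stayed nonzero: STALLED
      - lag never decreased between consecutive samples: WARN (falling behind)
      - anything else: OK
    """
    if len(window) < 2:
        return "UNKNOWN"
    offsets = [offset for offset, _ in window]
    lags = [lag for _, lag in window]
    if any(lag == 0 for lag in lags):
        return "OK"
    if all(offset == offsets[0] for offset in offsets):
        return "STALLED"
    if all(later >= earlier for earlier, later in zip(lags, lags[1:])):
        return "WARN"
    return "OK"
```

The point of the window is that a single large lag value is not alarming on its own; what matters is whether the consumer is still committing and whether lag is trending down.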