Ops Briefing Surface

Production Reliability Dashboard

Generated 2026-04-01 17:24 for 2026-03-16 00:00 to 2026-03-23 00:00 from Pingdom checks, Slack #_alerts_prod, and AWS SNS alerts.

Email-confirmed customer incidents: 0 (Pingdom down/slow events confirmed by inbox alerts; observed in Pingdom: 2)
Impacted services: 15 (mapped from Slack and Pingdom evidence)
AWS alarms in ALARM: 1 (still alarming at window end)
Latest observed signal: 2026-03-23 00:00 (most recent cross-source activity)
Executive summary

What needs attention

No dominant issue stood out in this window.

Pingdom customer impact

External signal
No critical; Active: 0; Total seen: 0

No strong signal in this lane.

No active issue listed in this category.

Slack impacted services

Application signal
No critical; Active: 0; Total seen: 0

No strong signal in this lane.

No active issue listed in this category.

AWS alarms

Infrastructure signal
No critical; Active: 0; Total seen: 0

No strong signal in this lane.

No active issue listed in this category.

What to do next

  1. Now: Review the evidence categories below

    No high-priority action was pre-ranked in this window.

    Dashboard overview
Global Evidence Explorer

Report-wide charts and tables stay here, separate from the active investigation scope.

Application + Infrastructure Alerts by Day

Pingdom latency + downtime by source

Customer View

Pingdom Checks

| Pingdom Check | Status | Events | Downtime | Last Seen | Likely Services | Correlated Evidence |
| https://www.adservio.ro/api/v2/status | No recent customer-visible issue | 2 | 2m | 2026-03-23 00:00 | adservio-ro-api-v2-status | |
| Adservio Ro | No recent customer-visible issue | 0 | 0m | 2026-03-23 00:00 | adservio-ro | |

Pingdom rows show externally visible signal first. The correlated evidence column helps tie the failing check back to services, Slack alert families, or AWS alarms when those links exist.

Application View

Slack Impacted Service / Resource View

This view attributes alerts to the workload or resource named in the alert text. Grafana, Loki, and Tempo are treated as observability components and are excluded when a more specific impacted target is also present.
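
The exclusion rule described above can be sketched as follows. This is a minimal illustration, not the dashboard's actual implementation: the function name `impacted_targets` and the constant `OBSERVABILITY_COMPONENTS` are hypothetical, and the sketch assumes the targets named in the alert text have already been extracted into a list.

```python
# Hypothetical sketch: names and matching rules are assumptions, not the
# dashboard's real code.
OBSERVABILITY_COMPONENTS = {"grafana", "loki", "tempo"}


def impacted_targets(named_targets):
    """Return the targets an alert should be attributed to.

    Observability components are dropped only when the alert also names
    a more specific target; otherwise they are kept as the impacted target.
    """
    specific = [
        target for target in named_targets
        if target.lower() not in OBSERVABILITY_COMPONENTS
    ]
    return specific if specific else list(named_targets)


print(impacted_targets(["grafana", "uni-api-svc-4000"]))
print(impacted_targets(["grafana"]))
```

Under this rule an alert that names only an observability component is still attributed to it, which would be consistent with grafana appearing as its own row in the table below.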

| Impacted Service / Resource | Highest Severity | Count | Last Seen | Status | Top Alert Types | Discussion Signal | Latest Thread Note |
| uni-api-svc-4000 | Critical | 12 | 2026-03-20 21:47 | No recent signal | TraefikServiceHighErrorRate (12) | Observability storage, General investigation | |
    Note: SQL Integrity Constraint Violation: (conn=3570102) Cannot add or update a child row: a foreign key constraint fails (`ums_uni_catalog`.`dis… | Andrei Alexandru
| accommodations-api-svc-4100 | Critical | 5 | 2026-03-18 13:24 | No recent signal | TraefikServiceHighErrorRate (5) | General investigation | |
    Note: Ionut Ciolan | why do those stack traces show up? They don't seem useful
| admission-end-session-29561780 | Warning | 14 | 2026-03-19 07:46 | No recent signal | KubeJobFailed (14) | None | |
| docgen2-api | Warning | 6 | 2026-03-19 13:11 | No recent signal | KubeHpaMaxedOut (6) | None | |
| ai-api-svc-3900 | Warning | 4 | 2026-03-20 16:35 | No recent signal | TraefikServiceHighLatency (4) | None | |
| grafana | Warning | 4 | 2026-03-19 12:27 | No recent signal | NodeSystemSaturation (3), NodeCPUHighUsage (1) | None | |
| subscriptions-api | Warning | 3 | 2026-03-20 11:11 | No recent signal | KubeHpaMaxedOut (3) | None | |
| admission-end-session-29556020 | Warning | 3 | 2026-03-16 10:52 | No recent signal | KubeJobFailed (3) | None | |
| admission-end-session-29557460 | Warning | 3 | 2026-03-16 10:52 | No recent signal | KubeJobFailed (3) | None | |
| admission-end-session-29558900 | Warning | 3 | 2026-03-16 10:52 | No recent signal | KubeJobFailed (3) | None | |
| admission-end-session-29560340 | Warning | 3 | 2026-03-16 10:52 | No recent signal | KubeJobFailed (3) | None | |
| update-recurenta-29554565 | Warning | 3 | 2026-03-16 10:52 | No recent signal | KubeJobFailed (3) | None | |
| web-80 | Warning | 1 | 2026-03-17 04:13 | No recent signal | TraefikServiceHighLatency (1) | None | |

Evidence

Slack Alert Families

| Alert | Severity | Count | Last Seen | Status | Threads | Top Impacted Services | Discussion Signal | Latest Thread Note |
| TraefikServiceHighErrorRate | Critical | 17 | 2026-03-20 21:47 | No recent signal | 3 | uni-api-svc-4000 (12), accommodations-api-svc-4100 (5) | General investigation, Observability storage | |
    Note: SQL Integrity Constraint Violation: (conn=3570102) Cannot add or update a child row: a foreign key constraint fails (`ums_uni_catalog`.`dis… | Andrei Alexandru
| KubeJobFailed | Warning | 17 | 2026-03-19 07:46 | No recent signal | 0 | admission-end-session-29561780 (14), admission-end-session-29556020 (3), admission-end-session-29557460 (3), admission-end-session-29558900 (3), admission-end-session-29560340 (3) | None | |
| KubeHpaMaxedOut | Warning | 9 | 2026-03-20 11:11 | No recent signal | 0 | docgen2-api (6), subscriptions-api (3) | None | |
| TraefikServiceHighLatency | Warning | 5 | 2026-03-20 16:35 | No recent signal | 0 | ai-api-svc-3900 (4), web-80 (1) | None | |
| NodeSystemSaturation | Warning | 3 | 2026-03-19 12:27 | No recent signal | 0 | grafana (3) | None | |
| NodeCPUHighUsage | Warning | 1 | 2026-03-18 14:53 | No recent signal | 0 | grafana (1) | None | |

Status is heuristic. Slack rarely posts explicit resolutions, so “Seen today” or “Recent” means the alert family still appeared in production recently, not that it is definitely unresolved.
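
A recency-based status of this kind could be computed along the following lines. This is only a sketch under stated assumptions: the function name `family_status`, the 24-hour and 48-hour cutoffs, and the choice of the window end as the reference time are all hypothetical, since the report does not state how the labels are derived. Only the label strings themselves come from this report.

```python
from datetime import datetime, timedelta


def family_status(last_seen, reference, today_hours=24, recent_hours=48):
    """Map an alert family's last-seen time to a heuristic status label.

    The cutoffs are illustrative assumptions, not the dashboard's values.
    """
    age = reference - last_seen
    if age <= timedelta(hours=today_hours):
        return "Seen today"
    if age <= timedelta(hours=recent_hours):
        return "Recent"
    return "No recent signal"


window_end = datetime(2026, 3, 23, 0, 0)
print(family_status(datetime(2026, 3, 20, 21, 47), window_end))
```

Because the label depends only on when the family last appeared, it says nothing about whether the underlying problem was resolved.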

AWS Email Alarm Families

| AWS Alarm | Emails | ALARM | OK | State Flips | First Seen | Last Seen | Latest State | Status |
| adservio-rds-mysql-master-memory-low | 28 | 14 | 14 | 27 | 2026-03-16 00:55 | 2026-03-18 07:09 | ALARM | Still alarming |
| adservio-rds-mysql-master-write-latency-high | 2 | 1 | 1 | 1 | 2026-03-16 23:08 | 2026-03-16 23:13 | OK | Latest OK |

“Flapping, latest OK” means the most recent email was an OK, but the alarm toggled repeatedly and is still a reliability concern.
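
The table columns and status labels above can be derived from the ordered sequence of alarm emails, as in the sketch below. It is illustrative rather than the dashboard's real logic: `summarize_alarm` and the `flap_threshold` of 3 flips are hypothetical, and the example input assumes the memory-low emails strictly alternated starting with an OK, which the report does not confirm.

```python
def summarize_alarm(states, flap_threshold=3):
    """Summarize an ordered list of "ALARM"/"OK" email states.

    flap_threshold is an assumed cutoff for "toggled repeatedly".
    """
    if not states:
        raise ValueError("at least one alarm email is required")
    # A flip is any change of state between consecutive emails.
    flips = sum(1 for prev, cur in zip(states, states[1:]) if prev != cur)
    latest = states[-1]
    if latest == "ALARM":
        status = "Still alarming"
    elif flips >= flap_threshold:
        status = "Flapping, latest OK"
    else:
        status = "Latest OK"
    return {
        "emails": len(states),
        "alarm": states.count("ALARM"),
        "ok": states.count("OK"),
        "flips": flips,
        "latest": latest,
        "status": status,
    }


# Assumed email order for illustration only.
print(summarize_alarm(["OK", "ALARM"] * 14))
print(summarize_alarm(["ALARM", "OK"]))
```

With a strictly alternating sequence the flip count is always one less than the number of emails, which matches the 28 emails and 27 state flips reported for the memory-low alarm.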

Global Discussion-Derived Signal

| Thread Date | Alert | Severity | Services | Signal | Key Notes |
| 2026-03-20 21:47 | TraefikServiceHighErrorRate | Critical | uni-api-svc-4000 | General investigation | |
    Note: SQL Integrity Constraint Violation: (conn=3570102) Cannot add or update a child row: a foreign key constraint fails (`ums_uni_catalog`.`dis… | Andrei Alexandru
| 2026-03-18 12:48 | TraefikServiceHighErrorRate | Critical | accommodations-api-svc-4100 | General investigation | |
    Note: Ionut Ciolan | why do those stack traces show up? They don't seem useful
| 2026-03-17 10:45 | TraefikServiceHighErrorRate | Critical | uni-api-svc-4000 | Observability storage | |
    Note: errors.group-already-exists