Engineering Council Test Reliability Report

Scope aligned with Slack channel #dezvoltare, covering 2026-03-28 07:00 to 2026-04-04 07:00. Metrics and timings are sourced from GitLab pipelines, jobs, and test-report artifacts for the daily 6 PM regression suite and the production smoke suite. Trend charts use daily buckets across this window.

Executive Snapshot

  • Daily Runs: 7
  • Daily Green: 4/7
  • Avg Daily Runtime: 19m 58s
  • Smoke Attempts: 7
  • Smoke Green: 4/7
  • Avg Smoke Runtime: 1m 54s
  • Median Passing Smoke Time: 2m 50s
  • Current Green Streak: 2

Executive Analysis

Bottom line: release confidence is unstable in both the broad regression path and the deploy smoke path. The immediate job is to separate real product regressions from execution noise, then burn down the concentrated failure clusters.

What Matters

  • Daily regression passed 4 of 7 runs (57.1%), with a current green streak of 2 and a best streak of 2 in this window.
  • Smoke passed 4 of 7 attempts (57.1%) across 5 production pipelines. Two pipelines recovered on rerun, which is useful for continuity but also a sign that first-pass deploy signal is noisier than it should be.
  • Failure concentration is not random: Frontend has both the highest strict failure ratio and the broadest non-pass footprint, each at 0.39%.
  • Frontend is the weakest smoke surface in this window at 2/5 green (40.0%).
  • Daily-suite runtime averaged 19m 58s, while observed daily test volume moved from 1,189 to 1,233.

Engineering Analysis

  • A release gate should fail loudly for product regressions and quietly for infrastructure noise. Rerun recoveries and incomplete smoke attempts suggest those two failure modes are still partially mixed together (a triage sketch follows this list).
  • The failure profile is concentrated enough to act on. Frontend is carrying the strongest signal, which means reliability work should be assigned by category ownership instead of treating the suite as one undifferentiated problem.
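
The separation can be made mechanical. Below is a minimal triage sketch in Python, assuming a hypothetical run_smoke callable that executes one smoke attempt and returns True on green; it illustrates a rerun-to-classify policy, not the gate logic this project currently runs.

    from enum import Enum

    class Verdict(Enum):
        GREEN = "green"                # first attempt passed
        INFRA_NOISE = "infra-noise"    # failed once, passed on rerun
        REGRESSION = "regression"      # failed twice in a row

    def classify_smoke_outcome(run_smoke) -> Verdict:
        """Run the smoke suite, rerun once on failure, and label the outcome.

        A pass on rerun is treated as execution noise (logged, but not
        release-blocking); a second consecutive failure is treated as a
        product regression and should fail the gate loudly.
        """
        if run_smoke():
            return Verdict.GREEN
        if run_smoke():
            return Verdict.INFRA_NOISE
        return Verdict.REGRESSION

Recording these verdicts would also make the rerun-recovery count in the snapshot a first-class metric instead of something reconstructed from pipeline history.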

Recommended Actions

  • Assign one owner to Frontend for the next cycle and expect a short written burn-down plan: the top failing tests, suspected root causes, a flake-versus-regression breakdown, and what gets fixed or quarantined first.
  • Treat the daily regression suite like an operations queue until it is calm again: triage failures after each red run, close known-noise items fast, and avoid letting multiple unrelated red signals pile up between runs.
  • Put Frontend smoke under closer guardrails for the next release cycle. It is the best place to improve first-pass deploy confidence quickly.

Improvement Ideas

  • Introduce a small reliability budget for tests: every flaky or quarantined case needs an owner and an expiry, and the team should review that budget weekly the same way it reviews bugs or incidents (a manifest-check sketch follows this list).
  • Track first-fail to root-cause time as a core metric. Fast diagnosis is as important as raw pass rate because the practical value of a test gate depends on how quickly it helps the team recover.
  • Define a runtime budget per suite and require justification when test count or duration grows. Reliable feedback systems stay trusted when they remain both stable and proportionate.
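
As one way to enforce the reliability budget from the first bullet, the sketch below validates a hypothetical quarantine.json manifest; the file name, schema, and budget size are assumptions for illustration, not an existing convention in this project.

    import json
    import sys
    from datetime import date

    # Illustrative manifest shape:
    # [{"test": "frontend/checkout_spec", "owner": "alice", "expires": "2026-04-10"}]

    def check_quarantine_budget(path: str, max_entries: int = 10) -> int:
        with open(path) as f:
            entries = json.load(f)
        errors = []
        if len(entries) > max_entries:
            errors.append(f"{len(entries)} quarantined tests exceed the budget of {max_entries}")
        for entry in entries:
            if not entry.get("owner"):
                errors.append(f"{entry['test']} has no owner")
            if date.fromisoformat(entry["expires"]) < date.today():
                errors.append(f"{entry['test']} expired on {entry['expires']} and must be fixed or deleted")
        for message in errors:
            print(f"reliability budget violation: {message}")
        return 1 if errors else 0

    if __name__ == "__main__":
        sys.exit(check_quarantine_budget("quarantine.json"))

Run as a scheduled CI job, this turns the weekly budget review into a red/green signal rather than a standing agenda item.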

Category Execution Ratios

How computed

Category total executions means the sum of that category's observed test executions across every daily-suite run in the selected window.

Strict Failure Ratio = failed executions for that category divided by total executions for that category across the window.

Non-pass Ratio = (failed + pending + skipped) executions for that category divided by total executions for that category across the window.

Example: if Billing executed 800 times across the week and 2 of those executions failed, Billing strict failure ratio is 0.25%. That does not mean 0.25% of pipelines failed; it means 0.25% of observed Billing executions ended in failed.
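
For concreteness, the two definitions reduce to the short Python sketch below, with per-category counts assumed to be pre-aggregated across the window (the real report derives them from GitLab test-report JSON artifacts).

    def strict_failure_ratio(failed: int, total: int) -> float:
        """Failed executions / total executions for one category."""
        return failed / total if total else 0.0

    def non_pass_ratio(failed: int, pending: int, skipped: int, total: int) -> float:
        """(Failed + pending + skipped) executions / total executions."""
        return (failed + pending + skipped) / total if total else 0.0

    # The worked Billing example: 2 failed of 800 executions -> 0.25%.
    assert round(strict_failure_ratio(2, 800) * 100, 2) == 0.25
    # The Frontend row from the aggregate table: 7 failed of 1,785 -> 0.39%.
    assert round(strict_failure_ratio(7, 1785) * 100, 2) == 0.39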

[Chart: Daily Suite Status, daily buckets, 03-28 to 04-03]
[Chart: Daily Smoke Attempts, daily buckets, 03-28 to 04-03]
[Chart: Average Daily Suite Runtime, daily buckets, 03-28 to 04-03]
[Chart: Average Smoke Runtime, daily buckets, 03-28 to 04-03]
[Chart: Daily Suite Total Test Growth (Recent 7 Runs), from 1,189 to 1,233 tests]
[Chart: Smoke Suite Total Test Growth (Latest Run Per Day): Frontend at 110 tests on 04-01 through 04-03; University at 10 tests on 04-01 and 04-02]

Category Aggregate Table

Category | Total | Failed | Pending | Skipped | Failure Ratio | Non-pass Ratio | Runs With Failures
Billing  |   756 |      0 |       0 |       0 |         0.00% |          0.00% |                  0
Web      |  5238 |      0 |       0 |       0 |         0.00% |          0.00% |                  0
Frontend |  1785 |      7 |       0 |       0 |         0.39% |          0.39% |                  3
Library  |   602 |      0 |       0 |       0 |         0.00% |          0.00% |                  0

Recent Runs

Recent Daily Suite Runs

Date             | Pipeline | Suites                          | Status | Summary
2026-03-28 18:23 | 150332   | Billing, Web, Frontend, Library | PASSED | Total 1189, Passed 1189, Failed 0, Pending 0
2026-03-29 18:23 | 150337   | Billing, Web, Frontend, Library | PASSED | Total 1189, Passed 1189, Failed 0, Pending 0
2026-03-30 18:22 | 150455   | Billing, Web, Frontend, Library | FAILED | Total 1189, Passed 1188, Failed 1, Pending 0
2026-03-31 18:22 | 150645   | Billing, Web, Frontend, Library | FAILED | Total 1189, Passed 1187, Failed 2, Pending 0
2026-04-01 18:24 | 150842   | Billing, Web, Frontend, Library | FAILED | Total 1196, Passed 1192, Failed 4, Pending 0
2026-04-02 18:23 | 151093   | Billing, Web, Frontend, Library | PASSED | Total 1196, Passed 1196, Failed 0, Pending 0
2026-04-03 18:23 | 151309   | Billing, Web, Frontend, Library | PASSED | Total 1233, Passed 1233, Failed 0, Pending 0

Recent Smoke Attempts

Date             | Suite      | Pipeline | Job              | Status | Passed | Failed | Duration
2026-04-01 13:00 | University | 150721   | University smoke | PASSED |     10 |      0 | 2m 32s
2026-04-01 13:06 | Frontend   | 150721   | Frontend smoke   | FAILED |      0 |      4 | 0m 45s
2026-04-02 16:45 | University | 151068   | University smoke | PASSED |     10 |      0 | 2m 11s
2026-04-02 16:46 | Frontend   | 151068   | Frontend smoke   | FAILED |      0 |      4 | 0m 38s
2026-04-02 17:46 | Frontend   | 151088   | Frontend smoke   | FAILED |      0 |      4 | 0m 43s
2026-04-02 23:17 | Frontend   | 151131   | Frontend smoke   | PASSED |    110 |      0 | 3m 07s
2026-04-03 11:32 | Frontend   | 151173   | Frontend smoke   | PASSED |    110 |      0 | 3m 22s

Smoke Suite Breakdown

Frontend: 5 attempts across 5 pipelines (40.0% green)
  • Passed: 2
  • Failed: 3
  • Incomplete: 0
  • Avg runtime: 1m 43s
  • Median passing runtime: 3m 14s
  • Pipelines: 5

University: 2 attempts across 2 pipelines (100.0% green)
  • Passed: 2
  • Failed: 0
  • Incomplete: 0
  • Avg runtime: 2m 22s
  • Median passing runtime: 2m 22s
  • Pipelines: 2

Generated from GitLab project adservio/helm2. Times are shown in Europe/Bucharest. Daily-suite runtime is measured from GitLab pipeline and job timestamps. Category counts come from GitLab test-report JSON artifacts, with job-trace fallback when older artifacts have expired.