The Codebase Assurance Index, explained

What we measure — the Codebase Assurance Index

The Codebase Assurance Index (CAI) is one reproducible 0–100 score for a whole C#/.NET codebase. It rolls up ten lenses — most dimensions measured by deterministic tools, a few given an advisory, tolerance-banded LLM read. A measurement, not an opinion.

A quick orientation

What makes the Codebase Assurance Index different.

Reproducible

Every scored dimension is computed by a deterministic tool reading your code. Commit + frozen rubric + advisory data → exactly one score: same commit, same rubric, same advisory data, same number.

Depth is never gated

Every plan — including the free plan — computes the full CAI: all dimensions, all lenses. You never pay for depth; you pay for breadth (portfolio roll-up, team analytics) and cadence.

Verifiable

Read the exact rule each dimension was scored by, and run the same measurement on your own code — the algorithm and rubric are open.

How we measure

Graded by the open CAI standard — across ten lenses.

Five are always on; five light up with your architecture. We don't just score them — we locate every finding to file:line, trend each lens scan over scan, and hand you what to fix. The standard is open: each lens links to its exact dimensions on cai.canine.dev.

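Always on: Code Health, Architecture, Maturity, Production-Readiness, Security & Compliance.

Light up with your architecture: Domain Modelling, Event-Driven, Event Sourcing, Accessibility, Performance.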

The full vocabulary — every dimension, its evaluator and rubric version — lives on the open standard. Browse the catalog →

  • Not a CI scanner or linter: Never scores a line or blocks a merge.
  • Not a SAST / dataflow engine: Reads their signal; doesn't out-depth one.
  • Not a coding agent: Never edits, commits, pushes or opens a PR.
  • Not a certifier: Records the evidence; a named human signs.
  • An independent surveyor: One altitude above your scanners.
  • One reproducible CAI: Signed, commit-pinned, and re-runs to the same number.
  • A read-only oracle: Serves every finding to your agent over MCP.
  • A whole-system survey: Architecture, maturity, compliance & risk in one report.

The method is an open standard.

Watchdog doesn't grade by house style — it measures against CAI, the Codebase Assurance Index: an open, reproducible 0–100 standard. The full algorithm and the worst-first fold, the firewall between the deterministic score and the advisory layer, the four git-history-mined dimensions (hotspots, bus factor, knowledge freshness, change coupling), and exactly what raises or lowers a score all live on the standard — open to read, cite, or recompute.

The firewall

Nothing moves the number but the code.

The deterministic score sits on one side of a firewall; an advisory LLM read sits on the other — and it can never cross. That's the difference between a measurement and asking an LLM, which answers differently every time.

The AI only ever advises ◐

A few findings get an advisory, tolerance-banded LLM read that can never, by construction, move the headline number. It explains in plain English; it never scores. The measurement stays pure.

Your inputs never score

Your compliance declarations, a suppressed finding, your contract profile — they change what the artifact says, never the CAI. A declaration is presentation; the score is measurement. Neither party to a contract can tilt the number — only the code changing moves it (or a disclosed advisory refresh like a new CVE).

The full firewall is drawn out on the standard: cai.canine.dev/spec.
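One way to picture a firewall that holds "by construction" is a scoring function whose signature simply has no parameter for advisory or declared inputs. The sketch below is illustrative only: the type names, field names and the plain-mean placeholder fold are assumptions, not the published CAI algorithm.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass(frozen=True)
class Measurements:
    """Deterministic, code-derived lens scores on a 0-100 scale (hypothetical shape)."""
    lens_scores: Dict[str, float]


@dataclass(frozen=True)
class Presentation:
    """Inputs that may change what the report says, never the score (hypothetical shape)."""
    advisory_notes: List[str] = field(default_factory=list)
    declarations: List[str] = field(default_factory=list)
    suppressed: List[str] = field(default_factory=list)


def score(m: Measurements) -> float:
    # Placeholder fold: a plain mean stands in for the real roll-up.
    # Note the signature: there is no way to pass a Presentation in.
    return sum(m.lens_scores.values()) / len(m.lens_scores)


def render(m: Measurements, p: Presentation) -> str:
    # Presentation data is appended to the report text after scoring.
    lines = [f"CAI: {score(m):.1f}"]
    lines += [f"advisory: {note}" for note in p.advisory_notes]
    lines += [f"declared: {item}" for item in p.declarations]
    lines += [f"suppressed: {item}" for item in p.suppressed]
    return "\n".join(lines)


measured = Measurements({"Code Health": 80.0, "Architecture": 60.0})
plain = render(measured, Presentation())
annotated = render(
    measured,
    Presentation(advisory_notes=["explains a finding"], declarations=["ISO 27001"]),
)
```

In this sketch `plain` and `annotated` differ in their trailing lines but share the same first line, because the number is computed before any presentation input is read.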

From the lenses to one number

How the lenses roll up.

The CAI is a weighted roll-up of the lens scores under the frozen rubric — not an average you can't see inside.

Core always counts; conditional lenses only when they apply

The five core lenses (Code Health, Architecture, Maturity, Production-Readiness, Security & Compliance) always contribute. The conditional lenses (Domain Modelling, Event-Driven, Event Sourcing, Accessibility, Performance) contribute only when the code calls for them, and the weights re-normalise — so a repo is never penalised for a lens that doesn't apply.
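The re-normalisation described above can be sketched as a weighted mean taken only over the lenses that apply. The weights here are invented for illustration (the real ones live in the published rubric), and representing a lens that does not apply as `None` is an assumption of this sketch.

```python
from typing import Dict, Optional

# Hypothetical weights, for illustration only.
WEIGHTS: Dict[str, int] = {
    "Code Health": 4,
    "Architecture": 4,
    "Maturity": 3,
    "Production-Readiness": 3,
    "Security & Compliance": 3,
    "Domain Modelling": 1,
    "Event-Driven": 1,
    "Event Sourcing": 1,
    "Accessibility": 1,
    "Performance": 1,
}


def roll_up(lens_scores: Dict[str, Optional[float]]) -> float:
    """Weighted mean over applicable lenses; None means the lens does not apply."""
    applicable = {name: s for name, s in lens_scores.items() if s is not None}
    total_weight = sum(WEIGHTS[name] for name in applicable)
    if total_weight == 0:
        raise ValueError("no applicable lenses to roll up")
    # Dividing by the weight of the applicable lenses only is the re-normalisation.
    return sum(WEIGHTS[name] * s for name, s in applicable.items()) / total_weight


core_only = {
    "Code Health": 90,
    "Architecture": 80,
    "Maturity": 70,
    "Production-Readiness": 85,
    "Security & Compliance": 75,
    "Performance": None,  # lens does not apply to this repo
}
with_performance = dict(core_only, Performance=40)

print(f"{roll_up(core_only):.1f}")         # prints 80.6
print(f"{roll_up(with_performance):.1f}")  # prints 78.3
```

Because the divisor shrinks along with the set of applicable lenses, a lens marked `None` has no effect on the result, which is the "never penalised" property in miniature.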

A critical lens caps the headline

The roll-up can't read Strong while a lens reads Critical: a single critical-band lens caps the CAI, so the one number can't hide a serious failure in one dimension behind strong scores elsewhere.

So a contract floor of CAI ≥ 80 means every always-on lens is Strong or better with no lens Critical — decomposable, not opaque.
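A minimal sketch of the cap, assuming a Critical band below 40 and a ceiling of 59 for any capped headline. Both thresholds are hypothetical placeholders; the actual band edges and cap are defined on the standard.

```python
from typing import Dict

CRITICAL_BELOW = 40.0     # hypothetical edge of the Critical band
CAP_WHEN_CRITICAL = 59.0  # hypothetical ceiling applied to the headline


def apply_critical_cap(rolled_up: float, lens_scores: Dict[str, float]) -> float:
    """Clamp the headline if any single lens falls in the Critical band."""
    if any(s < CRITICAL_BELOW for s in lens_scores.values()):
        return min(rolled_up, CAP_WHEN_CRITICAL)
    return rolled_up


healthy = {"Code Health": 88.0, "Architecture": 84.0, "Security & Compliance": 81.0}
one_critical = {"Code Health": 95.0, "Architecture": 95.0, "Security & Compliance": 30.0}

print(apply_critical_cap(84.0, healthy))       # prints 84.0
print(apply_critical_cap(82.0, one_critical))  # prints 59.0
```

The second call shows the point of the rule: two excellent lenses cannot lift the headline past the ceiling while one lens is Critical.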

Calibrated, not noisy

A reading you can act on — not a thousand findings to triage.

A measurement is only useful if you can trust what it surfaces. Watchdog is calibrated against a corpus of real .NET codebases, so the idioms a line-level checker trips over (a repository that coheres through a base class, a test that asserts through a harness, an interface a façade is obliged to implement) don't read as defects. Zero setup, no rule-tuning weekend: the false positives are calibrated out before you ever see them.

Tuned on real code

Every detector is tested, against a public reference corpus, to confirm that it fires on the real defect and stays quiet on the idiom. On that corpus the typical repository's findings are over 95% real, and reference clean-architecture codebases exceed 99%.

Disagree, and it learns

Any scored finding can be disputed in one click. The dispute is routed to human triage, and a confirmed false positive becomes a detector test so it can't recur. The instrument sharpens with use; the score never bends to the dispute.

Quiet by design

Findings are ranked by what moves the grade and folded so one stray stub barely registers, and a lens that can't be measured returns not-measured with the reason, rather than a phantom zero. Volume is never mistaken for rigour.

A worked example — cohesion

Cohesion metrics like LCOM4 count how a class's methods share internal state: a class whose methods touch unrelated fields reads as low cohesion — split it up. But a well-designed domain aggregate — an Order that exposes AddItem, ChangeShipping, Cancel — is cohesive by its invariant: it's the single place those rules are enforced, even though those methods touch different fields. To a raw metric, that clean aggregate looks identical to a genuine god-class. Most tools just report the number and leave you to sort the false alarms. We don't: Watchdog recognises the shapes LCOM4 provably mis-measures — domain aggregates, data-access repositories, source-generated view-models, contract-mandated plumbing — and exempts them, while still flagging the real god-object: the service that injects a dozen collaborators and bolts unrelated concerns together. The result is a cohesion signal you can act on instead of a list you have to triage.
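To make the mis-measurement concrete, here is a simplified LCOM4 calculation: methods are nodes, two methods are linked when they touch a common field, and the metric is the number of connected components. The full definition also links methods that call one another, which this sketch omits. The field names on `Order`, the shape labels and the `flag_low_cohesion` helper are hypothetical; how Watchdog actually recognises an exempt shape is not described here.

```python
from typing import Dict, Set


def lcom4(method_fields: Dict[str, Set[str]]) -> int:
    """Count connected components among methods linked by shared fields."""
    parent = {m: m for m in method_fields}

    def find(m: str) -> str:
        # Union-find lookup with path halving.
        while parent[m] != m:
            parent[m] = parent[parent[m]]
            m = parent[m]
        return m

    methods = list(method_fields)
    for i, a in enumerate(methods):
        for b in methods[i + 1:]:
            if method_fields[a] & method_fields[b]:  # at least one shared field
                parent[find(a)] = find(b)
    return len({find(m) for m in methods})


# Hypothetical field usage for the Order aggregate named in the text.
order_aggregate = {
    "AddItem": {"items"},
    "ChangeShipping": {"shipping_address"},
    "Cancel": {"status"},
}
# Hypothetical god-object: unrelated concerns bolted together.
god_service = {
    "SendEmail": {"smtp_client"},
    "ChargeCard": {"payment_gateway"},
    "RenderPdf": {"template_engine"},
}

# Hypothetical labels for the exempt shapes listed in the text.
EXEMPT_SHAPES = {"domain-aggregate", "repository", "generated-view-model", "contract-plumbing"}


def flag_low_cohesion(shape: str, method_fields: Dict[str, Set[str]]) -> bool:
    """Flag only classes that score badly and are not a recognised exempt shape."""
    return shape not in EXEMPT_SHAPES and lcom4(method_fields) > 1


print(lcom4(order_aggregate), lcom4(god_service))              # prints 3 3
print(flag_low_cohesion("domain-aggregate", order_aggregate))  # prints False
print(flag_low_cohesion("service", god_service))               # prints True
```

Both classes score 3 on the raw metric, so the number alone cannot separate them; only the shape information does.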

Calibration is an ongoing programme — idiom-heavy codebases still surface residual noise we keep tuning down, and every dispute feeds the next round.

Rubric versioning

Freeze the rubric, keep the score constant.

Watchdog scores with a versioned rubric. Any change that can move a score for unchanged code bumps the rubric version — and the rubric is contestable: a scoring change that isn't reflected in the published spec fails our CI, so every number stays re-derivable from a rule you can read. Engagements can pin a repository to a frozen rubric for the duration of a contract — so the CAI you underwrite at LOI and the number at close are directly comparable.

Contract rubrics

Pin a repository to a frozen rubric and the ruler stops moving under you — the same commit re-scores to the same number under that rubric, so any movement you see is the asset changing, never the ruler. (Advisory data still refreshes, so a new CVE can legitimately move a security finding — a real signal, disclosed in the changelog.) The current version and full history live on the open standard.
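Under a pinned rubric, any movement between two scans has only two possible sources, which a small comparison can make explicit. This is a sketch under assumed names: the `Scan` record, its fields and the sample identifiers are hypothetical, not a documented Watchdog data model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Scan:
    commit: str             # commit SHA that was scored
    rubric_version: str     # frozen rubric the scan was scored under
    advisory_snapshot: str  # identifier of the advisory data used (e.g. CVE feed date)
    cai: float


def explain_movement(before: Scan, after: Scan) -> str:
    """Attribute a score change between two scans taken under the same pinned rubric."""
    if before.rubric_version != after.rubric_version:
        raise ValueError("scans use different rubric versions; scores are not comparable")
    causes = []
    if before.commit != after.commit:
        causes.append("code change")
    if before.advisory_snapshot != after.advisory_snapshot:
        causes.append("advisory refresh")
    if not causes:
        # Same commit, same rubric, same advisory data: the number must match.
        if before.cai != after.cai:
            raise AssertionError("reproducibility violated")
        return "no change"
    return " + ".join(causes)


at_loi = Scan("abc123", "rubric-1", "advisories-2024-01", 82.0)
at_close = Scan("def456", "rubric-1", "advisories-2024-01", 85.0)
new_cve = Scan("abc123", "rubric-1", "advisories-2024-02", 80.0)

print(explain_movement(at_loi, at_close))  # prints code change
print(explain_movement(at_loi, new_cve))   # prints advisory refresh
```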

Get the measurement. No depth is ever gated.

Sign in with GitHub · no card · C#/.NET · the first full report is €0.