What we measure — the Codebase Assurance Index
The Codebase Assurance Index (CAI) is one reproducible 0–100 score for a whole C#/.NET codebase. It rolls up ten lenses. Every scored dimension is measured by a deterministic tool; a few findings also get an advisory, tolerance-banded LLM read that never moves the score. A measurement, not an opinion.
What makes the Codebase Assurance Index different.
Every dimension is computed by a deterministic tool reading your code. Commit + frozen rubric → exactly one score. Same commit, same rubric, same advisory data — same number.
Every plan — including the free plan — computes the full CAI: all dimensions, all lenses. You never pay for depth; you pay for breadth (portfolio roll-up, team analytics) and cadence.
Read the exact rule each dimension was scored by, and run the same measurement on your own code — the algorithm and rubric are open.
Graded by the open CAI standard — across ten lenses.
Five are always on; five light up with your architecture. We don't just score them — we locate every finding to file:line, trend each lens scan over scan, and hand you what to fix. The standard is open: each lens links to its exact dimensions on cai.canine.dev.
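Always on: Code Health, Architecture, Maturity, Production-Readiness, Security & Compliance.
Light up with your architecture: Domain Modelling, Event-Driven, Event Sourcing, Accessibility, Performance.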
The full vocabulary (every dimension, its evaluator and its rubric version) lives in the catalog on the open standard.
- Not a CI scanner or linter: never scores a line or blocks a merge.
- Not a SAST / dataflow engine: reads their signal; doesn't try to match their depth.
- Not a coding agent: never edits, commits, pushes or opens a PR.
- Not a certifier: records the evidence; a named human signs.
- An independent surveyor: one altitude above your scanners.
- One reproducible CAI: signed and commit-pinned, so it re-runs to the same number.
- A read-only oracle: serves every finding to your agent over MCP.
- A whole-system survey: architecture, maturity, compliance and risk in one report.
The method is an open standard.
Watchdog doesn't grade by house style; it measures against CAI, the Codebase Assurance Index: an open, reproducible 0–100 standard. Everything that produces the number lives on the standard, open to read, cite or recompute: the full algorithm and its worst-first fold, the firewall between the deterministic score and the advisory layer, the four dimensions mined from git history (hotspots, bus factor, knowledge freshness, change coupling), and exactly what raises or lowers a score.
Nothing moves the number but the code.
The deterministic score sits on one side of a firewall; an advisory LLM read sits on the other, and the advisory read can never cross. That's the difference between a measurement and asking an LLM, which can answer differently each time you ask.
A few findings get an advisory, tolerance-banded LLM read that can never, by construction, move the headline number. It explains in plain English; it never scores. The measurement stays pure.
Your compliance declarations, a suppressed finding, your contract profile — they change what the artifact says, never the CAI. A declaration is presentation; the score is measurement. Neither party to a contract can tilt the number — only the code changing moves it (or a disclosed advisory refresh like a new CVE).
The full firewall is drawn out on the standard: cai.canine.dev/spec.
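One way to make "by construction" concrete: in the minimal Python sketch below, the score is a pure function of the deterministic findings. Advisory notes, declarations and suppressions sit on the report with no path into that function. The types (`Finding`, `AdvisoryNote`, `Report`) and the toy penalty scale are hypothetical illustrations, not Watchdog's data model.

```python
from dataclasses import dataclass, replace


# Hypothetical types for illustration only; not Watchdog's data model.
@dataclass(frozen=True)
class Finding:
    rule: str
    location: str   # file:line
    penalty: float  # deterministic penalty on a toy 0-100 scale


@dataclass(frozen=True)
class AdvisoryNote:
    rule: str
    text: str  # plain-English LLM explanation; carries no number


def score(findings: tuple[Finding, ...]) -> float:
    """Deterministic side of the firewall: a pure function of the findings."""
    return max(0.0, 100.0 - sum(f.penalty for f in findings))


@dataclass(frozen=True)
class Report:
    commit: str
    findings: tuple[Finding, ...]
    advisory: tuple[AdvisoryNote, ...] = ()
    declarations: tuple[str, ...] = ()        # compliance declarations: presentation only
    suppressed: frozenset[str] = frozenset()  # rules hidden from the listing, still scored

    @property
    def cai(self) -> float:
        # Only the deterministic findings are passed to score(); nothing else can reach it.
        return score(self.findings)

    def visible_findings(self) -> tuple[Finding, ...]:
        # Suppression changes what the artifact shows, not what is scored.
        return tuple(f for f in self.findings if f.rule not in self.suppressed)


base = Report("abc123", (
    Finding("god-class", "Orders/OrderService.cs:12", 4.0),
    Finding("stub", "Util/Stub.cs:3", 1.5),
))
annotated = replace(
    base,
    advisory=(AdvisoryNote("god-class", "OrderService mixes mailing and persistence."),),
    declarations=("ISO 27001 in scope",),
    suppressed=frozenset({"stub"}),
)
print(base.cai, annotated.cai, len(annotated.visible_findings()))  # 94.5 94.5 1
```

The annotated report shows one finding instead of two, but its CAI is identical, because nothing added on the presentation side is an input to `score`.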
How the lenses roll up.
The CAI is a weighted roll-up of the lens scores under the frozen rubric — not an average you can't see inside.
The five core lenses (Code Health, Architecture, Maturity, Production-Readiness, Security & Compliance) always contribute. The conditional lenses (Domain Modelling, Event-Driven, Event Sourcing, Accessibility, Performance) contribute only when the code calls for them, and the weights re-normalise — so a repo is never penalised for a lens that doesn't apply.
The roll-up can't read Strong while a lens reads Critical: a single critical-band lens caps the CAI, so the one number can't hide a serious failure in one dimension behind strong scores elsewhere.
So a contract floor of CAI ≥ 80 means every always-on lens is Strong or better with no lens Critical — decomposable, not opaque.
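The Python sketch below illustrates the two roll-up mechanisms: weights are re-normalised over the lenses that apply, and a single Critical lens caps the index. The lens names come from this page. The integer weights, the Critical band ("lens score under 40") and the cap of 49 are assumptions invented for illustration. The real values are fixed by the frozen rubric on the standard.

```python
from typing import Dict, Optional

CORE = {"code_health", "architecture", "maturity",
        "production_readiness", "security_compliance"}

# Hypothetical integer weights; the real ones are fixed by the frozen rubric.
WEIGHTS = {
    "code_health": 20, "architecture": 20, "maturity": 15,
    "production_readiness": 15, "security_compliance": 20,
    "domain_modelling": 10, "event_driven": 10, "event_sourcing": 10,
    "accessibility": 10, "performance": 10,
}
CRITICAL_BELOW = 40.0  # hypothetical Critical band: lens score under 40
CRITICAL_CAP = 49.0    # hypothetical ceiling on the CAI while any lens is Critical


def roll_up(lens_scores: Dict[str, Optional[float]]) -> float:
    """Weighted roll-up over the lenses that apply (None = lens does not apply)."""
    applicable = {k: v for k, v in lens_scores.items() if v is not None}
    missing = CORE - set(applicable)
    if missing:
        raise ValueError(f"core lenses always contribute: {sorted(missing)}")
    # Re-normalise: divide by the total weight of the applicable lenses only.
    total_weight = sum(WEIGHTS[k] for k in applicable)
    cai = sum(WEIGHTS[k] * v for k, v in applicable.items()) / total_weight
    # One Critical lens caps the whole index, however strong the others are.
    if any(v < CRITICAL_BELOW for v in applicable.values()):
        cai = min(cai, CRITICAL_CAP)
    return round(cai, 1)


core = {"code_health": 90, "architecture": 80, "maturity": 70,
        "production_readiness": 80, "security_compliance": 90}
print(roll_up(core), roll_up({**core, "event_sourcing": None}))  # 82.8 82.8
print(roll_up({**core, "performance": 95}))                      # 84.0
print(roll_up({**core, "security_compliance": 30}))              # 49.0
```

A lens that doesn't apply gives the same CAI as one that was never mentioned, which is the re-normalisation at work. The last call shows the cap: without it, that repository would roll up to about 69.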
A reading you can act on — not a thousand findings to triage.
A measurement is only useful if you can trust what it surfaces. Watchdog is calibrated against a corpus of real .NET codebases, so the idioms a line-level checker trips over don't read as defects: a repository that coheres through a base class, a test that asserts through a harness, an interface a façade is obliged to implement. Zero setup, no rule-tuning weekend: the known false positives are calibrated out before you see them.
Every detector is tested against a public reference corpus to confirm that it fires on the real defect and stays quiet on the idiom. On that corpus, over 95% of the typical repository's findings are true positives, and on reference clean-architecture codebases the figure exceeds 99%.
Any scored finding can be disputed in one click. The dispute is routed to human triage, and a confirmed false positive becomes a detector test so it can't recur. The instrument sharpens with use; the score never bends to the dispute.
Findings are ranked by what moves the grade and folded so that one stray stub barely registers. A lens that can't be measured returns not-measured, with the reason, rather than a phantom zero. Volume is never mistaken for rigour.
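A worst-first fold can be sketched as follows. This sketch assumes a geometric decay (the hypothetical `DECAY = 0.5`); it is not the published fold. Findings are sorted worst first and each later one counts half as much as the one before. A lens with no evaluable input returns `None` plus a reason. `rank_by_impact` orders findings by the points that fixing each one would recover. All names and constants are illustrative.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

DECAY = 0.5  # hypothetical: each further finding counts half as much as the one before


def fold(penalties: List[float]) -> float:
    """Worst-first fold: the worst finding counts in full, the tail geometrically less.

    With DECAY = 0.5 and non-negative penalties, the total never exceeds twice
    the worst penalty, so a pile of minor findings stays bounded by the worst one.
    """
    ordered = sorted(penalties, reverse=True)
    return sum(p * DECAY ** i for i, p in enumerate(ordered))


@dataclass(frozen=True)
class LensResult:
    score: Optional[float]  # None = not measured
    reason: str = ""


def score_lens(penalties: Optional[List[float]], reason: str = "") -> LensResult:
    if penalties is None:
        # Report why the lens wasn't measured, instead of a phantom zero.
        return LensResult(None, reason or "not measured")
    return LensResult(max(0.0, 100.0 - fold(penalties)))


def rank_by_impact(penalties: List[float]) -> List[Tuple[int, float]]:
    """(index, points recovered if that finding alone were fixed), biggest first."""
    base = fold(penalties)
    gains = [(i, base - fold(penalties[:i] + penalties[i + 1:]))
             for i in range(len(penalties))]
    return sorted(gains, key=lambda g: g[1], reverse=True)


print(score_lens([1.0]).score)       # 99.0: one stray stub barely registers
print(rank_by_impact([1.0, 8.0]))    # [(1, 7.5), (0, 0.5)]
print(score_lens(None, "no UI project found"))
```

Ten findings of equal weight cost less than twice what one does, so a noisy repository isn't ranked below one with a single serious defect just because it has more lines flagged.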
Cohesion metrics like LCOM4 count how a class's methods share internal state: LCOM4 links methods that touch a common field or call each other, then counts the separate groups that result. A class whose methods touch unrelated fields reads as low cohesion, which is a signal to split it up. But a well-designed domain aggregate, such as an Order that exposes AddItem, ChangeShipping and Cancel, is cohesive by its invariant: it's the single place those rules are enforced, even though its methods touch different fields. To a raw metric, that clean aggregate looks identical to a genuine god-class. Most tools just report the number and leave you to sort out the false alarms. We don't: Watchdog recognises the shapes LCOM4 provably mis-measures (domain aggregates, data-access repositories, source-generated view-models, contract-mandated plumbing) and exempts them. It still flags the real god-object: the service that injects a dozen collaborators and bolts unrelated concerns together. The result is a cohesion signal you can act on instead of a list you have to triage.
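For reference, the raw metric itself is small. This Python sketch computes LCOM4 with a union-find over a method-to-fields map. The map is assumed to have been extracted from the C# source by a separate analysis pass, and the class shapes are hypothetical. The sketch deliberately omits Watchdog's exemptions, to show why the raw number alone can't separate an aggregate from a god-class.

```python
from typing import Dict, Optional, Set


def lcom4(uses: Dict[str, Set[str]],
          calls: Optional[Dict[str, Set[str]]] = None) -> int:
    """LCOM4: number of connected groups of methods, where two methods are linked
    if they touch a common field or one calls the other. 1 means cohesive."""
    parent = {m: m for m in uses}

    def find(m: str) -> str:
        while parent[m] != m:
            parent[m] = parent[parent[m]]  # path halving keeps the trees shallow
            m = parent[m]
        return m

    def union(a: str, b: str) -> None:
        parent[find(a)] = find(b)

    methods = list(uses)
    for i, a in enumerate(methods):
        for b in methods[i + 1:]:
            if uses[a] & uses[b]:  # the two methods share at least one field
                union(a, b)
    for caller, callees in (calls or {}).items():
        for callee in callees:
            if caller in parent and callee in parent:
                union(caller, callee)
    return len({find(m) for m in methods})


# Hypothetical clean Order aggregate: methods touch different fields,
# yet together they enforce the aggregate's rules.
order = {"AddItem": {"_lines", "_total"},
         "ChangeShipping": {"_shippingAddress"},
         "Cancel": {"_status"}}
# Hypothetical god-class bolting unrelated concerns together.
order_service = {"SendReceipt": {"_mailer"},
                 "SaveOrder": {"_repository"},
                 "RenderInvoice": {"_pdfRenderer"}}
print(lcom4(order), lcom4(order_service))  # 3 3: the raw metric can't tell them apart
```

Telling the two apart needs information the metric doesn't see, such as whether the class is an aggregate root or a service with many injected collaborators. That is what the exemptions described above add.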
Calibration is an ongoing programme — idiom-heavy codebases still surface residual noise we keep tuning down, and every dispute feeds the next round.
Freeze the rubric, keep the score constant.
Watchdog scores with a versioned rubric. Any change that can move a score for unchanged code bumps the rubric version, and the rubric stays open to challenge: a scoring change that isn't reflected in the published spec fails our CI, so every number remains re-derivable from a rule you can read. Engagements can pin a repository to a frozen rubric for the duration of a contract, so the CAI you underwrite at LOI and the number at close are directly comparable.
Pin a repository to a frozen rubric and the ruler stops moving under you — the same commit re-scores to the same number under that rubric, so any movement you see is the asset changing, never the ruler. (Advisory data still refreshes, so a new CVE can legitimately move a security finding — a real signal, disclosed in the changelog.) The current version and full history live on the open standard.
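The sketch below shows why pinning makes movement attributable. If every score is recorded with its commit, rubric version and advisory snapshot, any change between two records under the same rubric can only come from the code or from a disclosed advisory refresh. `ScoreRecord`, `explain_movement` and the version strings are hypothetical; they are not part of Watchdog or the CAI spec.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScoreRecord:
    commit: str
    rubric: str    # pinned rubric version (hypothetical strings below)
    advisory: str  # identifier of the advisory-data snapshot, e.g. the CVE feed used
    cai: float


def explain_movement(before: ScoreRecord, after: ScoreRecord) -> str:
    """Attribute a change in CAI to its cause; both runs must share one rubric."""
    if before.rubric != after.rubric:
        raise ValueError("rubric versions differ: re-score both under the pinned rubric")
    delta = after.cai - before.cai
    causes = []
    if before.commit != after.commit:
        causes.append("code changed")
    if before.advisory != after.advisory:
        causes.append("advisory refresh")
    if not causes:
        # Same commit, rubric and advisory data must give the same number.
        if delta != 0:
            raise RuntimeError("reproducibility violated: identical inputs, different CAI")
        return "unchanged"
    return f"{' + '.join(causes)}: {delta:+.1f}"


loi = ScoreRecord("a1b2c3d", "rubric-1.4", "advisories-A", 81.0)
close = ScoreRecord("e4f5a6b", "rubric-1.4", "advisories-A", 84.5)
print(explain_movement(loi, close))  # code changed: +3.5
```

Refusing to compare across rubric versions is the point of the design: a difference is only meaningful once the ruler is held fixed.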
Get the measurement. No depth is ever gated.
Sign in with GitHub · no card · C#/.NET · the first full report is €0.