agents-govern in this project
Language: Finnish version (Suomeksi) → governance-study.md
This page describes what the agents-govern framework is, what
problems it tries to solve, what "learning" means inside the
framework, and how this project (blue-marlin) uses it.
The structural side (agents, communication channels, gates) lives in its own document: Agents map.
What is agents-govern?
Source, lightly adapted: agents-govern README (CC-BY-SA-4.0).
agents-govern is a governance framework for multi-agent AI systems in software development. When multiple AI agents work together on a codebase — planning, coding, testing, reviewing, deploying — they need boundaries, quality gates, and accountability. Without these, you get authority conflicts, capability drift, accountability gaps, and knowledge decay. The framework defines the structure to prevent those failures.
What this is: Governance of multi-agent collaboration in software development workflows — the boundaries, gates, and accountability needed when AI agents (and humans) collaborate to plan, code, review, test, and deploy software.
What this is not:
- Not a general AI governance platform. The framework does not govern model behavior, prompt content, or AI products as artifacts.
- Not an ML governance framework. It does not handle experiment tracking, model cards, dataset lineage, evaluation sign-off, or drift monitoring.
- Not an agent runtime or orchestrator. It defines roles, gates, and artifacts; it does not execute agents.
The framework is open source (CC-BY-SA-4.0). This project uses v0.34.0 (installed from the release tarball on 2026-04-26).
Five problems the framework addresses
Source, lightly adapted: framework.md §1 (CC-BY-SA-4.0).
Before designing agents, understand what goes wrong without governance. These problems were identified empirically in production multi-agent systems:
- Authority without boundaries. Two agents both believe they own a technical decision. The Planner scopes a feature one way; the Architect redesigns it. Neither knows the other acted, and the result is incoherent oscillation between competing visions.
- Capability drift. An agent asked to "improve the documentation" decides that means refactoring the codebase. A "review this PR" agent starts making its own commits. Without constraints, agents expand their scope to match their capabilities, not their mandate.
- The accountability gap. Agent A delegates to Agent B, which calls Agent C, which modifies a shared resource. When something breaks, there is no trace of the delegation chain. You see the symptom but not the cause.
- Local optimization, global misalignment. Each agent optimizes its local objective. The coder writes elegant code, the tester achieves high coverage, the deployer ships fast. Each is right within its own scope, but the system-level outcome can still be wrong.
- Knowledge decay. What the framework has previously learned evaporates. The same bug is rediscovered repeatedly because no prior solution (or attempted solution) is recorded in any searchable form.
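agents-govern is not a runtime and does not execute agents, but two of these problems are easy to picture in code. The sketch below is purely illustrative: `Role`, `DelegationTrace`, `authorize`, and `delegate` are hypothetical names, not part of the framework. It pairs a mandate check (against capability drift) with an append-only delegation log (against the accountability gap).

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Role:
    """Hypothetical agent role: a name plus the actions its mandate allows."""
    name: str
    allowed_actions: frozenset[str]


@dataclass
class DelegationTrace:
    """Hypothetical append-only log of (delegator, delegate, action) hops."""
    steps: list[tuple[str, str, str]] = field(default_factory=list)

    def record(self, delegator: str, delegate: str, action: str) -> None:
        self.steps.append((delegator, delegate, action))

    def chain(self) -> list[str]:
        # One readable line per hop, so a symptom can be traced back to its cause.
        return [f"{a} -> {b}: {act}" for a, b, act in self.steps]


def authorize(role: Role, action: str) -> bool:
    """True only if the action is inside the role's mandate (not merely possible)."""
    return action in role.allowed_actions


def delegate(trace: DelegationTrace, delegator: str, role: Role, action: str) -> None:
    """Log the hop first, then refuse it if it falls outside the delegate's mandate."""
    trace.record(delegator, role.name, action)
    if not authorize(role, action):
        raise PermissionError(f"{role.name} is not mandated to {action!r}")
```

With a hypothetical `reviewer = Role("Reviewer", frozenset({"comment", "approve"}))`, calling `delegate(trace, "Planner", reviewer, "commit")` raises `PermissionError`, yet the refused hop still appears in `trace.chain()`. Logging before checking is a deliberate choice in this sketch: attempted overreach is itself evidence worth keeping.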
"Learning" in this context
agents-govern is an evidence-driven framework. That phrase has concrete structural meaning:
What "learning" is NOT
- It is not AI model training — the framework does not touch model weights or fine-tuning datasets.
- It is not prompt tuning for a single task.
- It is not code refactoring.
What "learning" IS
A learning record in the framework is a YAML structure that captures one concrete observation from running the project's governed pipeline. Each entry contains, at minimum:
- Category: `gap` (missing check), `validation` (a model confirmed to work), `adaptation` (a project-specific tweak), `tension` (a conflict between rules)
- Severity: `informational`, `minor`, `significant`, `critical`
- What happened: a prose description of the situation
- Which agents / gates / rules were involved
- Which framework section the observation touches (when relevant)
- Business impact (e.g. `prevented_loss`, `escaped_to_production`)
Records live in `learnings/<codename>.yaml`; for this project, that is `blue-marlin.yaml`.
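The field list above maps naturally onto a typed structure. A minimal sketch, assuming the YAML has already been parsed into a plain dict and assuming snake_case key names such as `what_happened` and `involved`; the real schema's key names may differ. The point is that the two closed vocabularies (category, severity) can be enforced at load time.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional


class Category(Enum):
    GAP = "gap"                # missing check
    VALIDATION = "validation"  # a model confirmed to work
    ADAPTATION = "adaptation"  # a project-specific tweak
    TENSION = "tension"        # a conflict between rules


class Severity(Enum):
    INFORMATIONAL = "informational"
    MINOR = "minor"
    SIGNIFICANT = "significant"
    CRITICAL = "critical"


@dataclass
class LearningEntry:
    """One learning record; the field names are assumptions, not the real schema."""
    category: Category
    severity: Severity
    what_happened: str
    involved: list[str] = field(default_factory=list)  # agents / gates / rules
    framework_section: Optional[str] = None             # only when relevant
    business_impact: Optional[str] = None               # e.g. "prevented_loss"

    def __post_init__(self) -> None:
        if not self.what_happened.strip():
            raise ValueError("what_happened must not be empty")

    @classmethod
    def from_dict(cls, raw: dict) -> "LearningEntry":
        """Build an entry from a parsed YAML mapping.

        Category(...) and Severity(...) raise ValueError for unknown values,
        so a typo such as "critcal" is rejected instead of silently stored.
        """
        return cls(
            category=Category(raw["category"]),
            severity=Severity(raw["severity"]),
            what_happened=raw["what_happened"].strip(),
            involved=list(raw.get("involved", [])),
            framework_section=raw.get("framework_section"),
            business_impact=raw.get("business_impact"),
        )
```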
Where learning leads
Learning is a feedback loop into the framework's own evolution:
- An adopter project hits a gap, validates an assumption, or adapts a rule → records a learning entry
- The entry is submitted upstream (issue / MR)
- The InfoSec Sentinel and Contribution Auditor agents review the entry (does it leak information? is it manipulative?)
- Once an observation has corroboration from multiple projects, the framework is revised in a new version, turning it into a rule, a new gate check, or a tier promotion
- Single-adopter evidence stays provisional until a second adopter hits the same thing
This is why even a single entry is valuable: it is raw evidence from which the framework evolves, and it does not need a "solution" at submission time.
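The provisional-until-corroborated rule comes down to counting distinct adopters per observation. A minimal sketch under that reading; the observation IDs are hypothetical, and the InfoSec Sentinel and Contribution Auditor reviews are judgments this code does not model.

```python
from collections import defaultdict


def evidence_status(observations: list[tuple[str, str]]) -> dict[str, str]:
    """Classify each observation as 'provisional' or 'corroborated'.

    `observations` is a list of (observation_id, adopter_codename) pairs.
    An observation counts as corroborated only when at least two *distinct*
    adopters reported it; repeat reports from one adopter do not count twice.
    """
    adopters: dict[str, set[str]] = defaultdict(set)
    for obs_id, codename in observations:
        adopters[obs_id].add(codename)
    return {
        obs_id: "corroborated" if len(codenames) >= 2 else "provisional"
        for obs_id, codenames in adopters.items()
    }
```

Two reports of the same observation from `blue-marlin` alone stay provisional; one more report from a second (hypothetical) codename flips it to corroborated. Using a set per observation is what keeps a single adopter from corroborating itself.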
This project's adoption
| Setting | Value |
|---|---|
| Adoption layout | Layout B: framework vendored under agents-govern/ |
| Codename | blue-marlin (anonymous identifier in upstream learnings) |
| Framework version | v0.34.0 |
| Adoption started | 2026-04-26 |
| Active agents | 6 (Agents map) |
| Active gates | 2 (Gate 1 + Gate 2) |
| Human Governor | Jani Päijänen |
| LLM driver | Claude AI (via Claude Code) |
What this project has surfaced so far
The project has captured 17 learning entries in blue-marlin.yaml. Distribution:
| Category | Count | Severity | Count |
|---|---|---|---|
| gap | 6 | critical | 1 |
| adaptation | 6 | significant | 5 |
| validation | 5 | minor | 8 |
| | | informational | 3 |
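Both columns are tallies over the same 17 entries, so each must sum to 17 (6 + 6 + 5 and 1 + 5 + 8 + 3), which also means no entry is in the `tension` category. A small sketch that produces such a distribution from the raw `category` or `severity` values, keeping zero rows visible and rejecting anything outside the vocabulary; `tally` is a hypothetical helper, not a framework command.

```python
from collections import Counter

CATEGORIES = ("gap", "validation", "adaptation", "tension")
SEVERITIES = ("informational", "minor", "significant", "critical")


def tally(values: list[str], vocabulary: tuple[str, ...]) -> dict[str, int]:
    """Count values against a fixed vocabulary, keeping zero rows and rejecting unknowns."""
    counts = Counter(values)
    unknown = set(counts) - set(vocabulary)
    if unknown:
        raise ValueError(f"unknown values: {sorted(unknown)}")
    # Counter returns 0 for missing keys, so unused categories still get a row.
    return {v: counts[v] for v in vocabulary}
```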
From the framework's perspective, the most valuable entries are the gap-class ones (situations the framework did not cover; three of these became upstream issues and one became a feature proposal) and the single critical-severity entry, which is a meaningful demonstration in its own right:
- Iter 13: A top-waling beam was placed on top of the deck (a 148 mm trip hazard at the deck edge). All 11 pytest invariants in place at the time passed with the change. The Human Governor caught it in the Gate 2 visual review. → Iter 15 relocated the beam below the deck and added `test_top_waling_below_deck_SAFETY` as a new invariant.
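The project's actual `test_top_waling_below_deck_SAFETY` is not reproduced here. The sketch below only illustrates the shape of such an invariant: it checks a relation between two parts of the generated geometry (beam versus deck surface) rather than a property of either part alone, the kind of relation the earlier invariants evidently did not check. `Beam`, its millimetre z-coordinates, and the single deck-top elevation are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Beam:
    """Hypothetical beam geometry: bottom and top elevation in millimetres."""
    name: str
    z_bottom_mm: float
    z_top_mm: float


def protrusion_above_deck_mm(beam: Beam, deck_top_z_mm: float) -> float:
    """How far the beam sticks up above the walking surface (0 if it does not)."""
    return max(0.0, beam.z_top_mm - deck_top_z_mm)


def assert_below_deck(beam: Beam, deck_top_z_mm: float) -> None:
    """Safety invariant: no structural beam may protrude above the deck surface."""
    excess = protrusion_above_deck_mm(beam, deck_top_z_mm)
    assert excess == 0.0, f"{beam.name} protrudes {excess:.0f} mm above the deck (trip hazard)"
```

With a deck top at a hypothetical 1000 mm, a beam spanning 1000 to 1148 mm reports 148 mm of protrusion and fails the assertion; a beam spanning 800 to 948 mm passes.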
Upstream proposals
| ID | Topic | Status |
|---|---|---|
| C1 | Output-level invariants (Iter 7 gap) | Submitted (issue #39) |
| C2 | Explicit visual acceptance gate (Iter 13 gap) | Submitted (issue #40) |
| C3 | Lowest-common-denominator output (Iter 9–10 gap) | Submitted (issue #41) |
| D1–D4 | Documentation batch (4 minor items) | Draft ready |
| E1 | agov-render-agents-map (new framework command + prototype) | Draft ready |
What the gates have caught
Concrete examples where Gate 2 review produced value (Gate 1 has mostly been fast-tracked in this project for small tasks):
| Iter | What the gate caught | Severity |
|---|---|---|
| 13→15 | Beam placed above deck (trip hazard) | Critical |
| 7 | X-cross brace rendered horizontal due to rotation bug | Significant |
| 7 | Lower-waling z-formula placed it ABOVE the "upper" waling | Significant |
| 6 | Pytest invariants didn't catch the visual bug | Significant |
| 9–10 | DXF $INSUNITS header variable missing, so CAD tools misinterpreted the scale | Minor |
| 14a | DXFs missing unit suffix on dimension labels | Minor |
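An output-level check on the generated file itself would flag the Iter 9–10 omission before a CAD tool ever opens it. A minimal stdlib sketch, assuming ASCII (not binary) DXF output and assuming the project's drawings are meant to be in millimetres; `read_insunits` and `check_units_mm` are hypothetical helpers. In an ASCII DXF, the HEADER section stores `$INSUNITS` under group code 9, followed by its integer value under group code 70, where 4 means millimetres.

```python
from typing import Optional


def read_insunits(dxf_text: str) -> Optional[int]:
    """Return the $INSUNITS value from an ASCII DXF header, or None if absent."""
    lines = [ln.strip() for ln in dxf_text.splitlines()]
    # ASCII DXF is a flat sequence of (group code, value) line pairs.
    pairs = list(zip(lines[0::2], lines[1::2]))
    for i, (code, value) in enumerate(pairs):
        if code == "9" and value == "$INSUNITS":
            # The variable's value follows as group code 70 (an integer).
            if i + 1 < len(pairs) and pairs[i + 1][0] == "70":
                return int(pairs[i + 1][1])
            return None
        if code == "0" and value == "ENDSEC":
            break  # HEADER comes first when present; stop at its end
    return None


def check_units_mm(dxf_text: str) -> None:
    """Invariant: the drawing must declare millimetres ($INSUNITS = 4)."""
    units = read_insunits(dxf_text)
    if units != 4:
        raise ValueError(f"expected $INSUNITS = 4 (millimetres), got {units!r}")
```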
What the study shows so far
Observation. The governance process surfaces incidents that would otherwise have shipped unnoticed:
- The Iter 13 trip hazard would have shipped without the Gate 2 visual review.
- The pre-existing geometric bugs (Iter 7) would have stayed in `laituri_3d.py` indefinitely without review.
- The discrepancy between the framework's own MANIFEST file and the project's learning record only surfaced when the Agents map prototype tried to consume both as if they were the same.
Caveats. This is illustrative, not statistical:
- Sample size = 1 project. Upstream evidence requires corroboration from more than one project (N > 1).
- No formal control group. REPLICATION-BRIEF.md defines a baseline task that could serve as a comparison point if anyone runs it.
- The Human Governor (Jani) is also the person scheduling the work. "Does the governance process catch more than a careful human alone would?" remains an open question.
Deeper links
- Upstream agents-govern repo
- Agents map — agents, gates, communication channels
- Learning record source
- Upstream submissions
- Replication brief (control group)
Disclaimer
The concept sections (What is agents-govern, Five problems) are adapted from the framework's own README and framework.md, both licensed CC-BY-SA-4.0. The remaining sections are this project's own content.