Governed Incident Response

Governed Incident Response is a GPT-4o agent that automates safety procedure lookup, severity-1 notifications, and procedure update drafting in regulated industries, with every action governed by a real-time authorization, evidence, and audit pipeline.

Boston AI Tinkerers Generative UI Global Hackathon, May 5 to 9, 2026, Solo build

Why I built it

Agentic systems in regulated industries face a credibility problem. The agent can take actions: queue a notification, draft an update, query a system. But "the agent took an action" is not a defensible answer in a regulated environment. The defensible answer is: who authorized the action, what evidence supported it, what role attempted it, and whether the audit trail is tamper-evident.

The hackathon was a chance to extend the governed-RAG work I've been doing on Keystone AI into agentic territory. The same patterns (RBAC, evidence thresholding, hash-chained audit, fail-closed defaults) apply, but the consequence space now includes tool execution, not just text generation.

What the agent does

Three governed tools, exposed through CopilotKit's useCopilotAction hook:

  • lookup_procedure: retrieve emergency procedures (confined space entry, atmospheric testing, evacuation, decontamination) from a fixed corpus
  • queue_notification: queue a severity-1 incident notification to specific roles (supervisor, attending, regional manager)
  • draft_procedure_update: draft a proposed change to an existing procedure for human review

Each tool runs through the governance pipeline before it executes. The pipeline produces one of four outcomes:

  • SERVE: all checks pass, action executes, evidence cited
  • BLOCK: a check fails (typically RBAC), action does not execute, refusal is logged
  • ROUTE: action is outside the current role's authority but legitimate, routed to an approval queue for the appropriate role
  • REFUSE: evidence is below threshold, agent declines to answer rather than fabricate

The four outcomes are not error states. They are first-class behaviors that the right-side instrument panel surfaces in real time.
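One way to make the four outcomes first-class is a discriminated union that the UI must handle case by case. The following is a minimal sketch, not the code in the repository: the names `Role`, `ToolName`, `PERMISSIONS`, `ROUTE_TO`, and `decide` are hypothetical, and the permission table is an assumption chosen for illustration. When an approver exists for a blocked action, the sketch reports ROUTE as the terminal outcome.

```typescript
// Hypothetical types and tables; all names here are illustrative assumptions.
type Role = "operator" | "supervisor";
type ToolName = "lookup_procedure" | "queue_notification" | "draft_procedure_update";

type Outcome =
  | { kind: "SERVE"; citations: string[] }
  | { kind: "BLOCK"; reason: string }
  | { kind: "ROUTE"; approver: Role; reason: string }
  | { kind: "REFUSE"; score: number; threshold: number };

const EVIDENCE_THRESHOLD = 0.7;

// Assumed permission table: which tools each role may execute directly.
const PERMISSIONS: Record<Role, ToolName[]> = {
  operator: ["lookup_procedure"],
  supervisor: ["lookup_procedure", "queue_notification", "draft_procedure_update"],
};

// Assumed escalation table: blocked tools that may be routed, and to whom.
const ROUTE_TO: Partial<Record<ToolName, Role>> = {
  queue_notification: "supervisor",
};

function decide(role: Role, tool: ToolName, evidenceScore: number, sources: string[]): Outcome {
  if (!PERMISSIONS[role].includes(tool)) {
    const approver = ROUTE_TO[tool];
    const reason = `${role} is not authorized for ${tool}`;
    // Legitimate but out of authority: route. Otherwise: block.
    return approver ? { kind: "ROUTE", approver, reason } : { kind: "BLOCK", reason };
  }
  // Written as a negated >= so that NaN also fails closed.
  if (!(evidenceScore >= EVIDENCE_THRESHOLD)) {
    return { kind: "REFUSE", score: evidenceScore, threshold: EVIDENCE_THRESHOLD };
  }
  return { kind: "SERVE", citations: sources };
}
```

Because `Outcome` is a union keyed on `kind`, a `switch` over it in the rendering code lets the compiler flag any outcome that has no UI treatment.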

Architecture

A two-column layout, split sixty-forty.

The two-column generative UI: chat on the left, governance instrument panel on the right.

Left column: a CopilotKit chat interface. The operator asks plain-language questions. The agent calls one or more of the three governed tools as needed.

Right column: a ControlFeedback instrument panel. Every step of the governance pipeline streams into this panel as the agent executes: RBAC check, retrieval scoring, ACL filtering, evidence gating, and HHEM-2.1 hallucination scoring. Each step shows PENDING, then resolves to PASS, BLOCK, or FAIL with a visible state transition.
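The PENDING-then-resolved behavior can be sketched as a runner that emits two events per step and stops at the first step that does not pass. This is an illustrative sketch under stated assumptions: `Step`, `StepEvent`, and `runPipeline` are hypothetical names, the steps are synchronous stubs (real checks would be asynchronous), and a plain array stands in for the Zustand event store.

```typescript
type StepStatus = "PENDING" | "PASS" | "BLOCK" | "FAIL";
type StepResult = { status: "PASS" | "BLOCK" | "FAIL"; detail?: string };

interface StepEvent { step: string; status: StepStatus; detail?: string }
interface Step { name: string; run: () => StepResult }

// Runs steps in order, emitting PENDING and then the resolved status for each.
// Returns true only if every step passed.
function runPipeline(steps: Step[], emit: (event: StepEvent) => void): boolean {
  for (const step of steps) {
    emit({ step: step.name, status: "PENDING" });
    let result: StepResult;
    try {
      result = step.run();
    } catch (err) {
      // Fail closed: a check that throws is treated as a failed check.
      result = { status: "FAIL", detail: String(err) };
    }
    emit({ step: step.name, status: result.status, detail: result.detail });
    if (result.status !== "PASS") return false; // later steps never run
  }
  return true;
}

// Stubbed run: the evidence gate fails, so the hallucination check is skipped.
const panelEvents: StepEvent[] = [];
const pipelineOk = runPipeline(
  [
    { name: "rbac", run: () => ({ status: "PASS" }) },
    { name: "retrieval", run: () => ({ status: "PASS" }) },
    { name: "acl", run: () => ({ status: "PASS" }) },
    { name: "evidence", run: () => ({ status: "FAIL", detail: "0.41 < 0.70" }) },
    { name: "hhem", run: () => ({ status: "PASS" }) },
  ],
  (event) => panelEvents.push(event),
);
```

Emitting PENDING before each check runs is what gives the panel a visible state transition, rather than only a final verdict.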

The panel is the differentiator. Most agent demos show what the agent can do. This shows what the agent cannot do without authorization, and at which step the authorization or evidence check would block it.

Stack:

  • CopilotKit 1.56.5 (pinned, no upgrade during the build) for the agentic UI and tool wiring
  • Next.js 16 with TypeScript for the application shell
  • Tailwind for styling
  • Zustand for the event store (in-memory for the demo; in Keystone proper, this is append-only PostgreSQL with hash chaining)
  • OpenAI GPT-4o via the CopilotKit OpenAI adapter
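The hash chaining mentioned for the Keystone event store can be illustrated in a few lines: each entry's hash covers its sequence number, its payload, and the previous entry's hash, so editing any earlier entry invalidates everything after it. This is a sketch, not the Keystone implementation. `AuditEntry`, `appendEntry`, and `verifyChain` are hypothetical names, and the default `fnv1a` function is a non-cryptographic stand-in used only to keep the example dependency-free; a real log would pass in SHA-256 or similar.

```typescript
type HashFn = (input: string) => string;

// Toy FNV-1a-style hash over UTF-16 code units. NOT cryptographic; assumption
// for illustration only. Swap in SHA-256 for any real tamper-evidence claim.
function fnv1a(input: string): string {
  let h = 0x811c9dc5;
  for (let i = 0; i < input.length; i++) {
    h ^= input.charCodeAt(i);
    h = Math.imul(h, 0x01000193);
  }
  return ("00000000" + (h >>> 0).toString(16)).slice(-8);
}

interface AuditEntry { seq: number; payload: string; prevHash: string; hash: string }

const GENESIS = "00000000";

// Returns a new log with the entry appended; the input log is not mutated.
function appendEntry(log: readonly AuditEntry[], payload: string, hash: HashFn = fnv1a): AuditEntry[] {
  const seq = log.length;
  const prevHash = seq > 0 ? log[seq - 1].hash : GENESIS;
  return [...log, { seq, payload, prevHash, hash: hash(`${seq}|${prevHash}|${payload}`) }];
}

// Returns the index of the first entry that breaks the chain, or -1 if intact.
function verifyChain(log: readonly AuditEntry[], hash: HashFn = fnv1a): number {
  let prev = GENESIS;
  for (let i = 0; i < log.length; i++) {
    const entry = log[i];
    const expected = hash(`${i}|${prev}|${entry.payload}`);
    if (entry.seq !== i || entry.prevHash !== prev || entry.hash !== expected) return i;
    prev = entry.hash;
  }
  return -1;
}
```

Verification recomputes every hash from the start, so a modified payload is detected at the entry where it occurs, even if the attacker leaves the stored hashes untouched.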

Source: github.com/arnaldosepulveda/governed-incident-agent.

Three demo scenarios

The demo runs three queries and exposes all four governance outcomes.

Scenario 1: SERVE

Operator asks: "What atmospheric testing is required before entering a confined space?"

The pipeline runs in sequence: RBAC passes (operator is authorized to retrieve safety procedures), retrieval finds the matching procedure, ACL filter clears the document for the operator's role, evidence gate passes (high retrieval score), HHEM passes (answer is supported by the source), audit log captures the chain. The agent renders a procedure card citing the source.

This is the happy path. The point of showing it is to establish the baseline before the more interesting scenarios.

Scenario 2: SERVE + BLOCK + ROUTE in one query

Operator asks: "There's been a confined space collapse at the petroleum facility. What's the response?"

Two procedures are retrieved and SERVED to the operator. The agent then attempts a severity-1 notification on the operator's behalf, because the response procedure calls for it.

The notification attempt hits the RBAC check. Operators are not authorized to send severity-1 notifications. BLOCKED.

But the underlying request is legitimate. So instead of dropping it, the system ROUTES the notification to a supervisor approval queue, where it sits until a supervisor signs off. The operator sees all three outcomes in the instrument panel: the SERVE on the procedure lookup, the BLOCK on the notification attempt, and the ROUTE to the approval queue.
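The routing behavior amounts to a queue in which an action stays PENDING and does not execute until someone holding the approver role resolves it. A minimal sketch, assuming a hypothetical `ApprovalQueue` class and string role names; the real queue presumably carries more context, such as the incident details and the audit reference.

```typescript
type ApprovalStatus = "PENDING" | "APPROVED" | "REJECTED";

interface RoutedAction {
  id: number;
  tool: string;
  requestedBy: string;
  approverRole: string;
  status: ApprovalStatus;
}

// Hypothetical in-memory approval queue; names are illustrative assumptions.
class ApprovalQueue {
  private items: RoutedAction[] = [];

  // Park an action that the requester was not authorized to execute directly.
  route(tool: string, requestedBy: string, approverRole: string): RoutedAction {
    const item: RoutedAction = {
      id: this.items.length + 1,
      tool,
      requestedBy,
      approverRole,
      status: "PENDING",
    };
    this.items.push(item);
    return item;
  }

  // Only the designated approver role may resolve, and only once.
  // The action executes only on approval.
  resolve(id: number, actorRole: string, approve: boolean, execute: (action: RoutedAction) => void): RoutedAction {
    const item = this.items.find((candidate) => candidate.id === id);
    if (!item) throw new Error(`no routed action with id ${id}`);
    if (item.status !== "PENDING") throw new Error(`action ${id} is already ${item.status}`);
    if (actorRole !== item.approverRole) {
      throw new Error(`${actorRole} cannot resolve an action routed to ${item.approverRole}`);
    }
    item.status = approve ? "APPROVED" : "REJECTED";
    if (approve) execute(item);
    return item;
  }
}
```

Keeping `execute` as a callback passed at resolution time means the tool side effect cannot fire from the operator's original call path.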

Three governance outcomes in one query. This is the scenario that most directly demonstrates why "governance" is more than just access control.

Scenario 3: REFUSE

Operator asks: "What are the TIER greenhouse gas reporting requirements?"

Retrieval runs. Some documents come back, but the evidence score is 0.41 against a threshold of 0.70. The system declines to answer.

Notably: the system does not hallucinate an answer to be helpful. It does not summarize what little evidence it has. It says it cannot confidently answer, surfaces the evidence score, and stops.

The fail-closed default is a deliberate choice. In a regulated environment, a confident wrong answer is worse than an explicit "I cannot answer." This scenario demonstrates that pattern in an agentic context.
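In code, fail-closed means the gate refuses not only when the score is low but also when the score is missing or malformed. A minimal sketch using the 0.70 threshold from this scenario; `evidenceGate` and `GateResult` are hypothetical names, and how the evidence score itself is computed is outside the sketch.

```typescript
const EVIDENCE_THRESHOLD = 0.7;

interface GateResult { pass: boolean; message: string }

// Fail-closed evidence gate: anything other than a finite score at or above
// the threshold results in a refusal, with the score surfaced to the user.
function evidenceGate(score: number | undefined, threshold: number = EVIDENCE_THRESHOLD): GateResult {
  if (typeof score !== "number" || !Number.isFinite(score)) {
    return { pass: false, message: "No usable evidence score; declining to answer." };
  }
  if (score < threshold) {
    return {
      pass: false,
      message: `Evidence score ${score.toFixed(2)} is below threshold ${threshold.toFixed(2)}; declining to answer.`,
    };
  }
  return {
    pass: true,
    message: `Evidence score ${score.toFixed(2)} meets threshold ${threshold.toFixed(2)}.`,
  };
}
```

With the scenario's numbers, `evidenceGate(0.41)` returns `pass: false` and a message that quotes both 0.41 and 0.70.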

A framing shift: managed response, not control

I went into the hackathon thinking of this as a closed-loop control system. RBAC is the actuator authorization check. Evidence gating is the feedback signal. The audit log is the system trace. Standard control-theory pattern applied to an AI agent.

Coming out of the hackathon, I think that framing is wrong. Or at least incomplete.

A control system has a setpoint and a feedback path that drives the system toward that setpoint. The system is dynamic. The state changes continuously. The control loop runs continuously.

What Governed Incident Response actually does is more like managed response. There is no setpoint. There is no continuous loop. Each query is a discrete event. The governance pipeline evaluates each event against a set of policies (who is asking, what evidence supports it, what action is implied, who is authorized to take it) and produces one of four outcomes: SERVE, BLOCK, ROUTE, or REFUSE.

The pipeline is closer to a regulatory compliance check than a feedback controller. The work is in the policy evaluation, not in the dynamic response.

This matters for how I describe the work going forward. "Closed-loop control over AI agents" sounds great in a pitch but isn't quite accurate. "Managed response with auditable governance" is less catchy but is what's actually happening.

The architectural pattern is the same. The framing was off.

Cross-domain validation: Provana AcuteCare

The morning of the hackathon I met Sai Gopal Jarabana, who was building Provana AcuteCare, a clinical copilot for acute care medicine. Sai was working on dynamic clinical UI generation: protocol cards for sepsis bundle, stroke code, pediatric fever, with patient context driving live UI updates. He needed governance. I had a governance layer.

In six hours I adapted the governed-incident-agent scaffold (governance.ts, eventStore.ts, RBAC patterns, audit trail) to the clinical context. I also wrote the ProtocolRenderer bridge that passes governance hooks through to each card type (SepsisCard, StrokeCard, PediatricFeverCard), so a single integration point handles authorization, action gating, and audit logging across every protocol Sai's runtime generates.

Why governance matters in clinical AI specifically

Healthcare AI agents face three problems regulators and clinicians actually care about:

  • Scope of practice. A nurse cannot legally write certain orders. An AI agent that ignores role authority creates liability and patient harm. RBAC enforcement at the moment of action, not at login, is the only defensible pattern.
  • Allergy and contraindication safety. An AI suggesting penicillin to a documented penicillin allergy is a sentinel event. The governance layer must modify protocol recommendations when patient data requires it, not just authorize or deny them wholesale.
  • Auditability. Every clinical decision and override must be traceable for Joint Commission and HIPAA compliance. Append-only, tamper-evident logs are not optional.

Provana addressed all three within the six-hour build window because the governance scaffold from the OHS demo already encoded the patterns. The clinical context required different role names, different permission tables, and different domain data. The architecture transferred unchanged.

Adapting the four outcomes, and adding a fifth

The OHS demo had four governance outcomes: SERVE, BLOCK, ROUTE, REFUSE. The clinical context added a fifth that didn't exist in industrial safety: MUTATE.

  • SERVE: protocol renders, all checks passed
  • BLOCK: clinician attempts action outside their scope, denied inline. A nurse clicks "Order Antibiotics," the button shows BLOCKED with the role reason visible.
  • ROUTE: a nurse escalates an action to the attending via an approval pathway. The action does not execute until the attending signs off.
  • REFUSE: not used in the Provana demo, but available; mirrors the OHS evidence-gate behavior.
  • MUTATE: patient data modifies the protocol recommendation automatically. A documented penicillin allergy causes the sepsis card to cross out Piperacillin-Tazobactam and substitute Meropenem, with cross-reactivity reasoning visible. The governance layer is not just gating actions; it is reshaping the action set based on patient-specific constraints.

MUTATE is interesting because it changes what governance can do. In the OHS context, governance was binary on each action: allowed or not. In the clinical context, governance can modify the action itself before authorization runs. The architectural pattern generalizes; the outcome space expands when the domain demands it.
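The shape of a MUTATE step can be sketched as a pure function that rewrites the recommendation list before any authorization check sees it. This is not Sai's mutation engine: `MutationRule`, `Recommendation`, and `mutate` are hypothetical names, and the single rule below is copied from the penicillin example above purely for illustration.

```typescript
interface Recommendation { drug: string; struckOut: boolean; note?: string }
interface MutationRule { allergy: string; avoid: string; substitute: string }

// Illustrative rule table containing only the example from the text.
const RULES: MutationRule[] = [
  { allergy: "penicillin", avoid: "Piperacillin-Tazobactam", substitute: "Meropenem" },
];

// Returns a new recommendation list: a drug matched by a rule stays visible but
// struck out, and its substitute is inserted directly after it.
function mutate(drugs: string[], allergies: string[], rules: MutationRule[] = RULES): Recommendation[] {
  const documented = allergies.map((allergy) => allergy.toLowerCase());
  const out: Recommendation[] = [];
  for (const drug of drugs) {
    const rule = rules.find((r) => r.avoid === drug && documented.indexOf(r.allergy) !== -1);
    if (!rule) {
      out.push({ drug, struckOut: false });
      continue;
    }
    out.push({ drug, struckOut: true, note: `documented ${rule.allergy} allergy` });
    out.push({ drug: rule.substitute, struckOut: false, note: `substituted for ${drug}` });
  }
  return out;
}
```

Keeping the struck-out entry in the output, rather than silently dropping it, is what lets the card show the clinician both the original recommendation and the reason it changed.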

Provana AcuteCare: the governance layer adapted to clinical context, with allergy-driven protocol mutation (MUTATE outcome).

What this validated

Sai brought the dynamic UI generation, the clinical protocol detection, the card components, the allergy mutation engine, and the overall product concept. I brought the governance layer.

The interesting result: the governance pattern was domain-portable with minimal adaptation. Industrial safety and acute care medicine sound dissimilar. From a governance-pattern perspective they are not. Both need authorization checks before actions, evidence-backed decisions, and tamper-evident logs. The RBAC model, action gating logic, and audit trail required zero structural changes to move between domains. Only the role names, permission tables, and domain data changed.
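The portability claim can be made concrete with a gate that is generic over role and action names, so each domain supplies only a configuration object. A sketch under stated assumptions: `DomainConfig`, `makeGate`, and both permission tables are hypothetical, and the clinical action `view_protocol` is invented for the example.

```typescript
interface DomainConfig<R extends string, A extends string> {
  permissions: Record<R, readonly A[]>; // what each role may do directly
  escalation: Partial<Record<A, R>>;    // who can approve a denied action
}

interface Decision<R> { allowed: boolean; escalateTo: R | undefined }

// Domain-agnostic gate: the logic never mentions a specific role or action.
function makeGate<R extends string, A extends string>(config: DomainConfig<R, A>) {
  return (role: R, action: A): Decision<R> => {
    if (config.permissions[role].indexOf(action) !== -1) {
      return { allowed: true, escalateTo: undefined };
    }
    return { allowed: false, escalateTo: config.escalation[action] };
  };
}

// Industrial safety configuration (assumed table).
type OhsRole = "operator" | "supervisor";
type OhsAction = "lookup_procedure" | "queue_notification";
const ohsGate = makeGate<OhsRole, OhsAction>({
  permissions: {
    operator: ["lookup_procedure"],
    supervisor: ["lookup_procedure", "queue_notification"],
  },
  escalation: { queue_notification: "supervisor" },
});

// Clinical configuration (assumed table): same gate, different names.
type ClinicalRole = "nurse" | "attending";
type ClinicalAction = "view_protocol" | "order_antibiotics";
const clinicalGate = makeGate<ClinicalRole, ClinicalAction>({
  permissions: {
    nurse: ["view_protocol"],
    attending: ["view_protocol", "order_antibiotics"],
  },
  escalation: { order_antibiotics: "attending" },
});
```

In this form, moving between domains changes only the two configuration objects, which is the "role names, permission tables, and domain data" split described above.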

Sai's repo: github.com/saigopaljarabana/provana-acutecare. His work on the dynamic clinical UI is worth looking at independently of the governance layer I contributed.

What's next

Two threads pull forward from this build:

KDAT-002. Extending the Keystone evaluation harness to cover governed agentic actions. The current eval (KDAT-001B) measures retrieval quality, ACL enforcement, and fail-closed behavior on a fixed RAG corpus. KDAT-002 adds metrics for tool authorization accuracy, action audit completeness, and human-in-the-loop (HITL) approval gate behavior across multi-step reasoning.

Generalizing the pattern. The cross-domain validation at the hackathon was a six-hour proof point. The longer-term work is to make the governance layer a reusable module, not a per-project rewrite. The OHS demo and Provana share roughly 80% of the governance code today. The target is closer to 95%, with domain-specific configuration handling the rest.

Code and credits

Sai Gopal Jarabana built the clinical UI in Provana. Six hours from introduction to working demo.