AI Governance Lab

A Governed AI Workforce

Purpose-built AI agents trained to perform IT risk and audit functions — from control assessment to compliance scanning to evidence validation. A three-tier architecture provides the infrastructure: local models for private execution, cloud for scale, and a frontier model managing the workforce. Every agent is measured, monitored, and continuously improved.

Architecture

Three tiers — local private, cloud private, and public — each with a distinct role and privacy boundary.

L1 — Local Private

Mac Studio (M4 Max, 36GB)

On-device execution engine. Nothing leaves the machine. Handles assessments, classification, document extraction, and compliance scanning.

Models: Qwen2.5 14B (text), Qwen2.5-VL 7B (vision)
Privacy boundary: Fully offline
Cost: $0

L2 — Cloud Private

AWS Bedrock + NVIDIA NIM

Private cloud inference within a controlled AWS boundary. Managed services with full audit trails. No infrastructure to maintain.

Models: Claude Sonnet 4.6 (Bedrock), Llama 3.3 70B (NIM), 121 models via NIM
Privacy boundary: Private account
Cost: ~$15-50/mo

L3 — Public

Claude Code (VS Code)

Operations manager — strategy, orchestration, and workforce optimization. Designs work packages, measures outcomes, and continuously improves the system.

Models: Claude Opus / Sonnet
Privacy boundary: Third-party API
Cost: Subscription

Agent Registry

Each agent is trained to perform a specific IT risk or audit function. Like employees, they are hired one at a time, measured against acceptance criteria, supervised through observability, and required to earn trust before reaching production.

CRI Coverage Classification

Tier: L1

Pilot

Executive Observation Writing

Tier: L1

Pilot

Control Design Assessment

Tier: L1/L2

Experimental

Control-to-RG Mapping

Tier: L1/L2

Experimental

MITRE Technique-to-CRI Mapping

Tier: L1/L2

Experimental

Gap Assessment Narrative

Tier: L1/L2

Experimental

Evidence Validation

Tier: L1/L2

Experimental

Finding Generation

Tier: L1/L2

Experimental

CIS Benchmark Scan & Report

Tier: L1

Experimental

Workpaper Generation

Tier: L1/L2

Experimental

KPI & KRI Definition and Implementation

Tier: L1/L2

Experimental

Data Analytics & Executive Dashboards

Tier: L1/L2

Experimental

Governance Principles

The same questions an auditor asks about IT controls apply to AI capabilities: Is it designed appropriately? Is it operating effectively? How do we know?

01

Every Capability Earns Trust

New AI capabilities follow a probation model with five statuses: Experimental, Pilot, Production, Review Required, and Retired. A capability must complete 50 successful executions before it reaches Production status.
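The promotion gate can be sketched in a few lines of Python. This is a minimal illustration, not the lab's actual implementation: the class and method names are hypothetical, and it assumes that successes are counted cumulatively and that only a capability in Pilot can be promoted. The text states the 50-execution threshold but not those details.

```python
from dataclasses import dataclass
from enum import Enum


class Status(Enum):
    EXPERIMENTAL = "Experimental"
    PILOT = "Pilot"
    PRODUCTION = "Production"
    REVIEW_REQUIRED = "Review Required"
    RETIRED = "Retired"


# Threshold stated in the principle above.
PRODUCTION_THRESHOLD = 50


@dataclass
class Capability:
    """Hypothetical record for one agent capability."""
    name: str
    status: Status = Status.EXPERIMENTAL
    successes: int = 0

    def record_success(self) -> None:
        # Assumption: successes accumulate and are never reset.
        self.successes += 1

    def eligible_for_production(self) -> bool:
        # Assumption: promotion is only possible from Pilot.
        return (
            self.status is Status.PILOT
            and self.successes >= PRODUCTION_THRESHOLD
        )

    def promote(self) -> None:
        if not self.eligible_for_production():
            raise ValueError(
                f"{self.name}: {self.successes}/{PRODUCTION_THRESHOLD} "
                f"successes in status {self.status.value}"
            )
        self.status = Status.PRODUCTION
```

Calling `promote()` on a capability that has not met the threshold raises an error rather than silently changing status, which keeps the gate explicit.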

02

No Observability, No Production

Every capability requires an observability contract before reaching production — job ID, acceptance metric, escalation threshold, and Splunk dashboard.
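One way to make the contract checkable is to model it as a small record and refuse promotion when any field is missing. The sketch below assumes the four fields named above; the field types and the function names are illustrative choices, not taken from the lab's code.

```python
from dataclasses import dataclass, fields
from typing import List, Optional


@dataclass(frozen=True)
class ObservabilityContract:
    """The four items the principle above requires (types are assumed)."""
    job_id: str
    acceptance_metric: str
    escalation_threshold: Optional[float]
    splunk_dashboard: str


def missing_fields(contract: ObservabilityContract) -> List[str]:
    # A field counts as missing when it is None or an empty string.
    return [
        f.name
        for f in fields(contract)
        if getattr(contract, f.name) in ("", None)
    ]


def production_ready(contract: Optional[ObservabilityContract]) -> bool:
    # No contract at all, or an incomplete one, blocks production.
    return contract is not None and not missing_fields(contract)
```

A release check could call `production_ready` before any status change and report `missing_fields` to whoever owns the capability.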

03

Privacy by Architecture

Three tiers with distinct data boundaries. L1 stays on-device. L2 stays within a private cloud account. L3 handles strategy only. Privacy is structural, not policy.
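As an illustration of how the tier boundaries could be enforced in code, the sketch below rejects any attempt to send work to a tier outside its allowed boundary. The sensitivity labels and the `check_route` function are hypothetical; the text defines the three tiers but not a data classification scheme.

```python
from enum import Enum


class Tier(Enum):
    L1 = "Local Private"
    L2 = "Cloud Private"
    L3 = "Public"


class Sensitivity(Enum):
    # Hypothetical labels, introduced only for this example.
    ON_DEVICE_ONLY = "on-device only"
    PRIVATE_CLOUD_OK = "private cloud allowed"
    STRATEGY_ONLY = "strategy and orchestration, no audit data"


# Which tiers each kind of work may run on (an assumed mapping that
# follows the boundaries described above).
ALLOWED_TIERS = {
    Sensitivity.ON_DEVICE_ONLY: {Tier.L1},
    Sensitivity.PRIVATE_CLOUD_OK: {Tier.L1, Tier.L2},
    Sensitivity.STRATEGY_ONLY: {Tier.L1, Tier.L2, Tier.L3},
}


def check_route(sensitivity: Sensitivity, tier: Tier) -> Tier:
    """Return the tier if the route is allowed, otherwise raise."""
    if tier not in ALLOWED_TIERS[sensitivity]:
        raise PermissionError(
            f"{sensitivity.value!r} work may not run on {tier.name}"
        )
    return tier
```

A guard like this complements the architecture rather than replacing it: the structural boundary is that L1 work never has a network path off the machine in the first place.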

04

Measure Like an Auditor

Don't ask 'does this prompt work?' Ask 'what evidence demonstrates this capability is reliable enough to trust with production work?'

Observability Stack

Every inference event is logged, sanitized, and shipped to Splunk. Operational telemetry only — content is hashed, not stored.

Splunk

Centralized log ingestion, search, and alerting for all inference activity

JSONL Pipeline

Structured event logging with sanitization — only operational telemetry is stored

CloudTrail

AWS audit trail for all Bedrock API calls

Canary Tests

Automated leak detection — planted sensitive strings must never appear in logs
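The sanitization and canary ideas above fit together naturally, and a minimal Python sketch shows how. The event field names, the canary string, and the function names are all assumptions made for illustration; the only details taken from the text are that content is hashed rather than stored, that events are written as JSONL, and that planted strings must never appear in logs.

```python
import hashlib
import json
from typing import Dict, Iterable, List

# Hypothetical planted string; a real canary would be unique and secret.
CANARY = "CANARY-DO-NOT-LOG-0001"


def sha256_hex(text: str) -> str:
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def sanitize_event(job_id: str, tier: str, model: str,
                   prompt: str, latency_ms: int) -> Dict[str, object]:
    """Keep operational telemetry; replace content with a hash and a length."""
    return {
        "job_id": job_id,
        "tier": tier,
        "model": model,
        "latency_ms": latency_ms,
        "prompt_sha256": sha256_hex(prompt),
        "prompt_chars": len(prompt),
    }


def to_jsonl(event: Dict[str, object]) -> str:
    # One JSON object per line, with stable key order for easier diffing.
    return json.dumps(event, sort_keys=True)


def find_leaks(log_lines: Iterable[str], canaries: Iterable[str]) -> List[int]:
    """Return the 1-based numbers of log lines that contain any canary."""
    planted = list(canaries)
    return [
        number
        for number, line in enumerate(log_lines, start=1)
        if any(canary in line for canary in planted)
    ]
```

In a test run, a prompt containing `CANARY` would be passed through `sanitize_event` and `to_jsonl`, and `find_leaks` over the resulting log lines should return an empty list. Any non-empty result means raw content reached the log.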

Toolset

The platforms and services powering the governance lab — from risk assessment to observability to automated compliance.

Amazon Web Services (AWS)

Cloud infrastructure, Bedrock AI, CloudTrail audit

Splunk

SIEM, log ingestion, AI inference observability

ServiceNow

IT service management — incident, change, asset

Microsoft Power BI

Executive dashboards and audit analytics

Microsoft Lists

Control registers, risk logs, finding trackers

Power Automate

Workflow automation — findings routing, alerts

Google Cloud

Secondary cloud platform and CLI tooling

CIS-CAT

Automated CIS Benchmark compliance scanning

Ollama

Local inference engine — private model execution

NVIDIA NIM

121 cloud models, with free-tier inference for research

Claude Code

L3 operations manager — strategy and orchestration

GitHub

Source control, CI/CD, portfolio hosting