A Governed AI Workforce
Purpose-built AI agents trained to perform IT risk and audit functions — from control assessment to compliance scanning to evidence validation. A three-tier architecture provides the infrastructure: local models for private execution, cloud for scale, and a frontier model managing the workforce. Every agent is measured, monitored, and continuously improved.
Architecture
Three tiers, local private (L1), cloud private (L2), and public (L3), each with a distinct role and privacy boundary.
Agent Registry
Each agent is trained to perform a specific IT risk or audit function. Like employees, they are hired one at a time, measured against acceptance criteria, supervised through observability, and required to earn trust before reaching production.
CRI Coverage Classification
Tier: L1
Executive Observation Writing
Tier: L1
Control Design Assessment
Tier: L1/L2
Control-to-RG Mapping
Tier: L1/L2
MITRE Technique-to-CRI Mapping
Tier: L1/L2
Gap Assessment Narrative
Tier: L1/L2
Evidence Validation
Tier: L1/L2
Finding Generation
Tier: L1/L2
CIS Benchmark Scan & Report
Tier: L1
Workpaper Generation
Tier: L1/L2
KPI & KRI Definition and Implementation
Tier: L1/L2
Data Analytics & Executive Dashboards
Tier: L1/L2
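The registry above can be sketched as a small data structure. This is a minimal illustration, not the lab's actual implementation: the `AgentEntry` type, the `REGISTRY` list, and the `agents_for_tier` helper are hypothetical names, and only three of the listed agents are shown.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AgentEntry:
    """One registry row: an agent's function and the tiers it may run on."""
    name: str
    tiers: frozenset


# Entries copied from the registry list; the structure itself is an assumption.
REGISTRY = [
    AgentEntry("CRI Coverage Classification", frozenset({"L1"})),
    AgentEntry("Control Design Assessment", frozenset({"L1", "L2"})),
    AgentEntry("CIS Benchmark Scan & Report", frozenset({"L1"})),
]


def agents_for_tier(tier: str) -> list:
    """Return the names of agents allowed to run on the given tier, in registry order."""
    return [entry.name for entry in REGISTRY if tier in entry.tiers]
```

Calling `agents_for_tier("L2")` on this sample returns only the agents registered as L1/L2.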
Governance Principles
The same questions an auditor asks about IT controls apply to AI capabilities: Is it designed appropriately? Is it operating effectively? How do we know?
Every Capability Earns Trust
New AI capabilities follow a probation model with five statuses: Experimental, Pilot, Production, Review Required, and Retired. A capability must complete 50 successful executions before it reaches Production status.
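A minimal sketch of the promotion gate, assuming successes are counted cumulatively across the Experimental and Pilot stages (the source does not say whether the 50 executions must be consecutive). The `Capability` class and its methods are hypothetical names used for illustration.

```python
from dataclasses import dataclass
from enum import Enum


class Status(Enum):
    EXPERIMENTAL = "Experimental"
    PILOT = "Pilot"
    PRODUCTION = "Production"
    REVIEW_REQUIRED = "Review Required"
    RETIRED = "Retired"


# Threshold stated in the probation model.
REQUIRED_SUCCESSES = 50


@dataclass
class Capability:
    name: str
    status: Status = Status.EXPERIMENTAL
    successes: int = 0

    def record_success(self) -> None:
        """Count one successful execution (assumption: cumulative, not consecutive)."""
        self.successes += 1

    def promote(self) -> Status:
        """Advance one stage; Pilot -> Production is gated on the success count."""
        if self.status is Status.EXPERIMENTAL:
            self.status = Status.PILOT
        elif self.status is Status.PILOT:
            if self.successes < REQUIRED_SUCCESSES:
                raise ValueError(
                    f"{self.name}: {self.successes}/{REQUIRED_SUCCESSES} successful executions"
                )
            self.status = Status.PRODUCTION
        else:
            raise ValueError(f"{self.name}: cannot promote from {self.status.value}")
        return self.status
```

Transitions into Review Required and Retired are left out because the source does not describe what triggers them.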
No Observability, No Production
Every capability requires an observability contract before reaching production — job ID, acceptance metric, escalation threshold, and Splunk dashboard.
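The contract can be sketched as a record with the four required elements and a completeness check. The field names, the numeric type of the threshold, and the `production_ready` helper are assumptions for illustration; the source names the four elements but not their format.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass(frozen=True)
class ObservabilityContract:
    job_id: str
    acceptance_metric: str
    escalation_threshold: float  # assumption: a numeric threshold; units not given in the source
    splunk_dashboard: str        # assumption: a dashboard name or identifier

    def missing_fields(self) -> list:
        """Return the names of fields that are None or blank strings."""
        missing = []
        for f in fields(self):
            value = getattr(self, f.name)
            if value is None or (isinstance(value, str) and not value.strip()):
                missing.append(f.name)
        return missing


def production_ready(contract: Optional[ObservabilityContract]) -> bool:
    """No contract, or an incomplete one, means no production."""
    return contract is not None and not contract.missing_fields()
```

A capability with no contract, or one with a blank acceptance metric, fails the check and stays out of production.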
Privacy by Architecture
Three tiers with distinct data boundaries. L1 stays on-device. L2 stays within a private cloud account. L3 handles strategy only. Privacy is structural, not policy.
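One way to make the tier boundaries explicit in code is a guard at the dispatch point. The sketch below assumes three hypothetical data classes (`RESTRICTED`, `INTERNAL`, `STRATEGY`) that are not defined in the source; only the tier descriptions come from the text above.

```python
from enum import Enum


class Tier(Enum):
    L1 = "on-device"
    L2 = "private cloud account"
    L3 = "strategy only"


class DataClass(Enum):
    # Hypothetical labels, introduced only for this example.
    RESTRICTED = "must stay on-device"
    INTERNAL = "may enter the private cloud account"
    STRATEGY = "planning content with no sensitive data"


# Assumed mapping of which data classes each tier may receive.
ALLOWED = {
    Tier.L1: frozenset({DataClass.RESTRICTED, DataClass.INTERNAL, DataClass.STRATEGY}),
    Tier.L2: frozenset({DataClass.INTERNAL, DataClass.STRATEGY}),
    Tier.L3: frozenset({DataClass.STRATEGY}),
}


def check_dispatch(tier: Tier, data_class: DataClass) -> None:
    """Raise PermissionError if the data class may not cross into the tier."""
    if data_class not in ALLOWED[tier]:
        raise PermissionError(
            f"{data_class.name} data may not be sent to {tier.name} ({tier.value})"
        )
```

A check like this complements the structural separation described above; it does not replace it.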
Measure Like an Auditor
Don't ask 'does this prompt work?' Ask 'what evidence demonstrates this capability is reliable enough to trust with production work?'
Observability Stack
Every inference event is logged, sanitized, and shipped to Splunk. Operational telemetry only — content is hashed, not stored.
Splunk
Centralized log ingestion, search, and alerting for all inference activity
JSONL Pipeline
Structured event logging with sanitization — only operational telemetry is stored
CloudTrail
AWS audit trail for all Bedrock API calls
Canary Tests
Automated leak detection — planted sensitive strings must never appear in logs
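The sanitize-then-verify flow can be sketched as follows. This is an illustration under stated assumptions: the event field names, the `CANARIES` value, and both function names are hypothetical, and shipping the line to Splunk is out of scope here.

```python
import hashlib
import json
import time

# Hypothetical planted string; a real canary set would be managed separately.
CANARIES = ("CANARY-7f3a-DO-NOT-LOG",)


def sanitize_event(job_id: str, model: str, prompt: str, response: str, latency_ms: float) -> str:
    """Build one JSONL line holding operational telemetry and content hashes only."""
    event = {
        "ts": time.time(),
        "job_id": job_id,
        "model": model,
        "latency_ms": latency_ms,
        # Content is hashed, never stored.
        "prompt_sha256": hashlib.sha256(prompt.encode("utf-8")).hexdigest(),
        "response_sha256": hashlib.sha256(response.encode("utf-8")).hexdigest(),
        "prompt_chars": len(prompt),
        "response_chars": len(response),
    }
    return json.dumps(event, sort_keys=True)


def find_leaks(log_lines, canaries=CANARIES) -> list:
    """Return every log line that contains a planted canary string."""
    return [line for line in log_lines if any(c in line for c in canaries)]
```

A canary test then sends a prompt containing the planted string through the pipeline and asserts that `find_leaks` returns an empty list for the resulting log lines. Note that a hash of short or predictable content can be guessed by brute force, so hashing is a safeguard against accidental storage, not a guarantee of secrecy.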
Toolset
The platforms and services powering the governance lab — from risk assessment to observability to automated compliance.
Amazon Web Services (AWS)
Cloud infrastructure, Bedrock AI, CloudTrail audit
Splunk
SIEM, log ingestion, AI inference observability
ServiceNow
IT service management — incident, change, asset
Microsoft Power BI
Executive dashboards and audit analytics
Microsoft Lists
Control registers, risk logs, finding trackers
Power Automate
Workflow automation — findings routing, alerts
Google Cloud
Secondary cloud platform and CLI tooling
CIS-CAT
Automated CIS Benchmark compliance scanning
Ollama
Local inference engine — private model execution
NVIDIA NIM
Free-tier research inference across 121 cloud-hosted models
Claude Code
L3 operations manager — strategy and orchestration
GitHub
Source control, CI/CD, portfolio hosting