Services

Deployability Audits & Evaluation

Independent, enforcement-grounded evidence that your agentic AI system fails safely. ConstantX evaluates your candidate model under enforcement constraints configured to reflect your deployment environment. Every audit produces cryptographically verifiable artifacts, with results reported alongside statistical confidence intervals.

Engagement Tiers

Audit

Point-in-time deployability assessment

Structured threat model against your repository — trust boundaries, attack surfaces, threat IDs specific to your deployment, verified against the OWASP Agentic AI taxonomy (T1–T17)
Adversarial scenario suite derived from your High/Critical threats, covering all 10 OWASP ASI risk codes plus positive-path task completion
Full traceability: attacker technique → threat → OWASP ASI risk category → scenario → verdict
Decision Coverage report with Wilson 95% CI
Failure envelope analysis with containment mechanism classification
NIST AI RMF / OWASP ASI compliance mapping
Cryptographic evidence bundle with full trace artifacts
Delivered in 1–3 weeks
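
For readers unfamiliar with the Wilson interval named above, here is a minimal Python sketch of the standard Wilson score formula for a proportion. It assumes the coverage metric is a simple ratio of qualifying runs to total runs; the function name, argument names, and the choice of what counts as a qualifying run are illustrative assumptions, not the ConstantX implementation.

```python
import math


def wilson_interval(covered: int, total: int, z: float = 1.959964) -> tuple[float, float]:
    """Wilson score interval for a proportion (z = 1.959964 gives ~95%).

    `covered` and `total` are hypothetical names: the number of runs that
    count toward coverage, and the number of runs executed.
    """
    if total <= 0:
        raise ValueError("total must be positive")
    if not 0 <= covered <= total:
        raise ValueError("covered must be between 0 and total")

    p_hat = covered / total
    z2 = z * z
    denom = 1.0 + z2 / total
    center = (p_hat + z2 / (2 * total)) / denom
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / total + z2 / (4 * total * total)) / denom

    # Clamp to [0, 1] to absorb floating-point rounding at the extremes.
    return max(0.0, center - half_width), min(1.0, center + half_width)


if __name__ == "__main__":
    low, high = wilson_interval(90, 100)
    print(f"coverage 90/100, 95% CI: [{low:.3f}, {high:.3f}]")
```

Unlike the simpler normal-approximation interval, the Wilson interval stays within [0, 1] and behaves sensibly when the observed proportion is at or near 0% or 100%, which is common in small scenario suites.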

Annual Evaluation

Continuous compliance for production deployments

Everything in Audit
Re-evaluation on model version changes with drift detection
Behavioral change tracking across evaluations
Updated reports for regulatory submissions
Priority engagement scheduling

How It Works

Threat Model. We run a structured threat model against your repository, mapping trust boundaries, attack surfaces, and threat IDs specific to your deployment. Every threat model walks the OWASP Agentic AI taxonomy (17 attacker technique classes, 70+ documented attack scenarios) for completeness. The output drives scenario selection: adversarial scenarios are generated directly from your High/Critical threats.
Configuration. Lock suite version, enforcement policy, and tool schemas. Map threat IDs from your threat model to scenario selection.
Execution. Run the full scenario suite under RuntimeX enforcement. Single-pass, no retries, per-action gate enforcement.
Reduction. Deterministic verdict assignment: every run classified as valid_commit, bounded_failure, or undefined_behavior.
Reporting. Decision Coverage report with confidence intervals, failure envelope analysis, and evidence bundle.
Delivery. Cryptographically signed artifact package with full audit trail.
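
As an illustration of the Reduction step, the sketch below shows what a deterministic verdict function could look like in Python. Only the three verdict names come from the process described above. The `RunTrace` fields (`committed`, `contained`) and the classification rule are hypothetical assumptions made for the example, not the actual ConstantX criteria.

```python
from collections import Counter
from dataclasses import dataclass
from enum import Enum


class Verdict(str, Enum):
    VALID_COMMIT = "valid_commit"
    BOUNDED_FAILURE = "bounded_failure"
    UNDEFINED_BEHAVIOR = "undefined_behavior"


@dataclass(frozen=True)
class RunTrace:
    """Hypothetical summary of one single-pass run."""
    scenario_id: str
    committed: bool   # assumption: run ended with an accepted final action
    contained: bool   # assumption: run stopped in a state the enforcement layer recognizes


def reduce_run(trace: RunTrace) -> Verdict:
    """Pure function of the trace: the same trace always yields the same verdict."""
    if trace.committed:
        return Verdict.VALID_COMMIT
    if trace.contained:
        return Verdict.BOUNDED_FAILURE
    return Verdict.UNDEFINED_BEHAVIOR


def tally(traces: list[RunTrace]) -> Counter:
    """Count verdicts across a suite; input order does not affect the result."""
    return Counter(reduce_run(t).value for t in traces)


if __name__ == "__main__":
    suite = [
        RunTrace("s-001", committed=True, contained=True),
        RunTrace("s-002", committed=False, contained=True),
        RunTrace("s-003", committed=False, contained=False),
    ]
    print(dict(tally(suite)))
```

Because the reducer has no randomness, clock reads, or external state, re-running it over the same stored traces reproduces the same verdict counts, which is the property that makes a verdict independently checkable.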

Evidence Standard

Every ConstantX audit produces:

Terminal Coverage metric with 95% Wilson score confidence intervals
Per-category breakdown across threat-model-derived scenario categories with containment mechanism attribution
Threat traceability: every scenario traces from OWASP attacker technique through threat model entry to empirical verdict
Cryptographic hashes binding dated model snapshot, enforcement policy, and prompt configuration
Deterministic validity window: report expires when the pinned snapshot is deprecated
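
To make the hash-binding idea concrete, here is a minimal Python sketch using SHA-256 from the standard library. The manifest layout, field names, and sample values are assumptions chosen for illustration; the actual bundle format and signing scheme are not specified here.

```python
import hashlib
import hmac
import json


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def _binding(fields: dict) -> str:
    # Canonical JSON (sorted keys, fixed separators) so the digest is reproducible.
    canonical = json.dumps(fields, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return sha256_hex(canonical)


def build_manifest(model_snapshot: str, policy: bytes, prompt_config: bytes) -> dict:
    """Bind the three inputs together under one digest (hypothetical layout)."""
    fields = {
        "model_snapshot": model_snapshot,
        "policy_sha256": sha256_hex(policy),
        "prompt_config_sha256": sha256_hex(prompt_config),
    }
    return {**fields, "binding_sha256": _binding(fields)}


def verify_manifest(manifest: dict, policy: bytes, prompt_config: bytes) -> bool:
    """Recompute every digest from the supplied inputs and compare."""
    expected = build_manifest(manifest["model_snapshot"], policy, prompt_config)
    return hmac.compare_digest(expected["binding_sha256"], manifest["binding_sha256"])


if __name__ == "__main__":
    # Placeholder inputs; real ones would be the pinned snapshot ID and config files.
    m = build_manifest("example-model-2025-01-01", b"policy: deny-by-default", b"system: ...")
    print(m["binding_sha256"])
    print(verify_manifest(m, b"policy: deny-by-default", b"system: ..."))
```

Changing any one input (snapshot identifier, policy bytes, or prompt configuration) changes the binding digest, so a report cannot be silently reattached to a different configuration. A hash alone shows integrity, not origin; a signed package would additionally sign this digest.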

Reports map to EU AI Act, NIST AI RMF, and OWASP ASI 2026 compliance frameworks — whichever your compliance team requires. Threat models are verified against all 17 OWASP attacker technique classes to ensure coverage completeness. The evidence is deployment-specific: your exact model version, tool configuration, and policy set, not a summary of general platform capabilities.
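
The completeness check against the 17 technique classes can be pictured as a simple set comparison. The Python sketch below is one possible shape, assuming a threat model records, for each technique ID from T1 to T17, the threat IDs it produced (an empty list meaning "reviewed, not applicable"). The data layout and the threat ID format are hypothetical.

```python
# Technique IDs T1..T17, matching the taxonomy range cited in this document.
TECHNIQUE_IDS = [f"T{i}" for i in range(1, 18)]


def unreviewed_techniques(review: dict[str, list[str]]) -> list[str]:
    """Return technique IDs with no entry at all in the threat model review.

    `review` maps a technique ID to the threat IDs derived from it. An empty
    list counts as reviewed; a missing key counts as a coverage gap.
    """
    return [t for t in TECHNIQUE_IDS if t not in review]


def unknown_techniques(review: dict[str, list[str]]) -> list[str]:
    """Return keys that are not valid technique IDs (likely typos)."""
    valid = set(TECHNIQUE_IDS)
    return sorted(k for k in review if k not in valid)


if __name__ == "__main__":
    # Hypothetical partial review: T3 was skipped, and "T18" is not in the taxonomy.
    review = {t: [] for t in TECHNIQUE_IDS if t != "T3"}
    review["T1"] = ["THR-001", "THR-004"]
    review["T18"] = ["THR-009"]
    print("unreviewed:", unreviewed_techniques(review))
    print("unknown:", unknown_techniques(review))
```

A threat model passes this check only when both lists are empty, which separates "considered and ruled out" from "never considered".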

Get in touch

Tell us what your agent does and what you're afraid of. Every engagement starts with a structured threat model against your repository.

[email protected]