Agent Runtime Assurance

ConstantX turns threat-model-driven runtime tests into verifiable deployment evidence: structural containment verdicts, target-runtime traceability, and immutable evidence chains.


The Problem

Capability benchmarks (SWE-bench, MMLU) answer: "Can the model do the task?"
ConstantX answers: "When the model fails, does the target system contain it with evidence?"

For autonomous systems, safety is not about high success rates. It is about bounded failure envelopes. A system that fails safely 100% of the time is deployable (albeit useless). A system that succeeds 99% of the time but exhibits undefined behavior 1% of the time is not.

Methodology: Decision Coverage

Every autonomous run is classified into one of three verdicts. Two of them, valid_commit and bounded_failure, count toward terminal coverage:

Terminal Coverage = (valid_commit + bounded_failure) / Total Runs
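
As a rough illustration of the ratio above, the following Python sketch computes terminal coverage from a list of per-run verdict labels. The function name, the input shape, and the label "uncontained" for the third verdict are assumptions made for this example only; the text names valid_commit and bounded_failure but not the third verdict, and this is not ConstantX's implementation.

```python
from collections import Counter

# Verdicts that count toward coverage, as named in the formula above.
COVERED = {"valid_commit", "bounded_failure"}
# "uncontained" is a placeholder label for the third verdict, which the
# text does not name. It is an assumption of this sketch.
ALLOWED = COVERED | {"uncontained"}


def terminal_coverage(verdicts):
    """Return (valid_commit + bounded_failure) / total runs."""
    if not verdicts:
        raise ValueError("at least one run is required")
    unknown = set(verdicts) - ALLOWED
    if unknown:
        raise ValueError(f"unknown verdicts: {sorted(unknown)}")
    counts = Counter(verdicts)
    covered = sum(counts[v] for v in COVERED)
    return covered / len(verdicts)


# Hypothetical batch: 90 valid commits, 9 bounded failures, 1 other run.
runs = ["valid_commit"] * 90 + ["bounded_failure"] * 9 + ["uncontained"]
print(terminal_coverage(runs))
```

Under this definition a bounded failure counts the same as a valid commit, which matches the point made earlier: the metric rewards contained outcomes, not task success.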

System Architecture

ConstantX measures agentic AI deployments across multiple target runtime architectures through four components:

  1. Target enforcement surfaces: Policy denials, hard gates, verification gates, progress detection, and sandbox blocks where the target runtime exposes them.
  2. Signals: Observable target-runtime traces with cryptographic hashes at every gate.
  3. Verdict: Reduces traces to a deterministic three-state coverage outcome.
  4. Evidence: Packages artifacts into audit-grade evidence chains bound to dated model snapshots.
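
One common way to make a sequence of gate events tamper-evident is a hash chain, in which each record's hash covers the previous record's hash. The Python sketch below shows that general technique, not ConstantX's actual format: the record fields, gate names, run identifier, and function names are all hypothetical, and SHA-256 over canonical JSON is an assumed choice.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first record


def record_hash(prev_hash, payload):
    """Hash a gate event together with the hash of the previous record."""
    body = json.dumps({"prev": prev_hash, "payload": payload}, sort_keys=True)
    return hashlib.sha256(body.encode("utf-8")).hexdigest()


def append_event(chain, payload):
    """Append a gate event, linking it to the current tail of the chain."""
    prev_hash = chain[-1]["hash"] if chain else GENESIS
    chain.append({
        "prev": prev_hash,
        "payload": payload,
        "hash": record_hash(prev_hash, payload),
    })
    return chain


def verify_chain(chain):
    """Recompute every hash; any edited or reordered record breaks the chain."""
    prev_hash = GENESIS
    for rec in chain:
        if rec["prev"] != prev_hash:
            return False
        if rec["hash"] != record_hash(prev_hash, rec["payload"]):
            return False
        prev_hash = rec["hash"]
    return True


# Hypothetical gate events for a single run.
chain = []
append_event(chain, {"gate": "policy_denial", "run_id": "run-001"})
append_event(chain, {"gate": "sandbox_block", "run_id": "run-001"})
print(verify_chain(chain))

# Editing an earlier record after the fact makes verification fail.
chain[0]["payload"]["gate"] = "edited"
print(verify_chain(chain))
```

A chain like this only detects modification. Binding it to a dated model snapshot, as described above, would additionally require recording the snapshot identifier in the payload or signing the final hash, neither of which is shown here.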

Evidence Access

Methodology and Framework Evidence
Read Methodology Paper · Framework Mappings

Submitted to NIST AI Agent Standards Initiative (NIST-2025-0035) · Framework mappings: NIST AI RMF, OWASP ASI, MITRE ATLAS · Every adversarial scenario traces to a documented threat model entry