Introducing Agent Harness Testing in Cisco AI Defense

At the moment, we’re excited to introduce Agent Validation as a brand new analysis functionality in AI Protection: Explorer Version, the free self-service model of Cisco AI Protection, that’s constructed particularly for agentic AI techniques. Agent Validation builds on the agentic safety enhancements to Cisco AI Protection introduced at Cisco Reside, which launched adaptive purple teaming, Coverage Studio guardrails, and provide chain discovery for brokers. Agent Validation joins the prevailing suite of purple teaming options, extending Explorer Version’s protection to the surfaces which might be distinctive to agent harnesses: device routes, oblique content material channels, and chronic state throughout classes.

Agent Validation is the primary functionality in what is going to change into a broader portfolio of agent harness testing in Cisco AI Protection. We are going to proceed increasing protection as new agent patterns, frameworks, and assault courses emerge within the menace panorama.

Why Brokers Want Their Personal Purple Teaming

Chat-based purple teaming is important for evaluating how a mannequin handles adversarial prompts, jailbreaks, and multi-turn manipulation. It assessments the conversational floor completely, as a result of it’s how most customers work together with most fashions. When a mannequin is wrapped in an agent harness, the scaffolding of instruments, reminiscence, retrieval, and orchestration logic that turns a standalone mannequin into an agent, new assault surfaces seem {that a} conversational evaluator was by no means designed to observe or exploit.

Brokers learn assist tickets, fetch documentation, set up abilities, and write to recordsdata. They could name instruments with arguments the consumer by no means typed or run multi-step workflows that span throughout a number of classes. An attacker who understands agent harnesses could deal with plant directions in content material the agent will retrieve, form device arguments in methods the consumer by no means typed, or coerce the agent into modifying persistent state that survives the present session.

A conversational analysis is not going to observe any of this. The chat transcript seems to be clear. In the meantime, the precise exploit exists exterior the chat interplay itself.

We constructed Agent Validation to check the surfaces that matter for agentic techniques:

Device routes: what the agent does when its personal legit instruments are invoked with malicious arguments
Oblique channels: directions hidden in retrieved paperwork, device outputs, assist tickets, and different content material the agent treats as knowledge
Persistent state: modifications to coverage recordsdata, workflow definitions, approval state, and put in capabilities that survive previous the present session

These threats map again to the Cisco AI Safety and Security Framework taxonomy, masking attacker targets like OB-001 Objective Hijacking, OB-007 Sabotage / Integrity Degradation, and OB-009 Provide Chain Compromise, alongside agent-specific methods like oblique immediate injection, device parameter abuse, and untrusted ability set up. The framework provides us a shared vocabulary for what we’re testing and why it issues.

What Makes Our Method Completely different

Each agent deployment has completely different instruments, content material sources, and coverage artifacts; the assault floor is formed by what’s wired into the harness itself. Agent Validation runs an autonomous attacker that performs dwell reconnaissance towards your particular agent, builds a structured profile of the assault floor, and adapts if preliminary assaults have been unsuccessful.

A troublesome downside in agent purple teaming is figuring out whether or not an assault truly succeeded. If the agent says “I put in the ability” or “I fetched that URL,” that’s a declare, not proof. Agent Validation solves this with a verification strategy that produces impartial floor fact by correlating the agent’s response with what the framework truly noticed and with out-of-band telemetry the agent has no motive to deal with as important. A discovering is barely marked confirmed when these impartial alerts agree.

The Agent Validation UX is three simple steps: join an agentic goal, decide Agent Validation because the validation kind, and click on Run. No goal picker, price range slider, or purpose textual content field. Determine 1 reveals this intimately.

Determine 1. Beginning an Agent Validation Run

Each run executes a pre-defined protection matrix curated by Cisco’s AI Menace Intelligence & Safety Analysis crew—the identical crew that maintains the Cisco AI Safety and Security Framework. The targets cowl oblique immediate injection, system-prompt integrity, device argument abuse, exfiltration, persistence and coverage mutation, functionality chaining, untrusted code paths, and sensitive-data solicitation.

What the Report Delivers

Determine 2. Protection matrix and overview seen after run completion

Each Agent Validation run produces a report organized round what a safety chief must act on:

Protection transparency: targets whole versus targets exercised, so clients can see actually what was executed for any given run (Determine 2)
Findings sorted by severity: every with the originating try, the agent’s response, the device calls noticed, the canary sign if any, the benign-control replay end result, and a remediation word (Determine 3)
Found, attacked, and skipped instruments: what reconnaissance enumerated, what the attacker exercised, and what it skipped and why
A full proof path: the immediate, the response, the baseline habits on a impartial floor, the management replay, and the generated “malicious” artifact

Determine 3. Findings overview of an Agent Validation run

Trying Forward

As agent frameworks, device ecosystems, and ability codecs evolve, the assault surfaces will evolve with them. The menace panorama will drive what we construct subsequent: new targets, new attacker techniques, and broader protection as agent patterns shift in actual deployments.

To see Agent Validation in motion, go to Cisco AI Protection: Explorer Version at the moment.

Disclaimer: Agent Validation analysis outcomes mirror agent habits towards the described methodology on the time of testing and don’t represent an endorsement, certification, or assure that any agent is secure, safe, or match for a selected use case. Prospects are chargeable for conducting their very own assessments and for layering applicable runtime protections on high of validation outcomes. Cisco AI Protection: Explorer Version is supplied as-is with out warranties of any variety.

Source link