Skip to content

Research Note

Red teaming RAG systems for poisoned context and over-disclosure

Testing approaches for retrieval abuse, source trust, sensitive context leakage, and model response drift.

Back to resources
Research Note10 min readSecurity Engineering, AI Product Teams

RAG Expands the Attack Surface

Retrieval-augmented generation connects models to enterprise knowledge, but retrieved content can also carry malicious instructions, outdated policy, or over-permissive context. Red teaming must test the retrieval path as carefully as the model response.

Prompt Attacks in Documents

Indirect prompt injection often hides inside tickets, webpages, PDFs, emails, and knowledge articles. A strong test program plants adversarial instructions in realistic sources and verifies whether the application follows retrieval content over system policy.

Over-Disclosure Tests

Teams should test whether the assistant reveals restricted source text, cross-tenant content, secrets, or confidential summaries. Access control drift and weak source filtering can turn a helpful assistant into a data exposure channel.

Remediation Signals

Useful findings identify the source, retrieval path, user role, model response, exploitability, and business impact. Retesting should confirm that source filtering, context inspection, and response controls actually reduced risk.

Request a Demo

Secure the AI your enterprise runs on.

See how Kavalan helps security and AI teams govern workforce AI, protect agentic systems, and continuously validate GenAI risk.