FAQ
What is prompt injection?
Prompt injection is a security vulnerability where an attacker manipulates an AI system by injecting malicious instructions into user inputs, causing the system to ignore its original instructions.
Is my data safe?
Yes. AgentShieldScan operates 100% locally in your browser. No data is transmitted to any external server.
What risks does this tool detect?
The tool detects Prompt Injection, Privilege Escalation attempts, Data Exfiltration patterns, and System Manipulation commands.
How accurate is the scan?
Accuracy depends on scanning intensity. Paranoid mode provides maximum coverage but may produce more false positives.
What is agent security?
Agent security focuses on protecting AI agents from unauthorized access, data leaks, and malicious manipulation of their tool-calling capabilities.
Agent Security Ontology
Indirect Prompt Injection
A type of attack where malicious instructions are embedded within seemingly harmless user inputs, causing the AI to execute unintended actions or reveal sensitive information.
Confused Deputy Problem
A security vulnerability where an agent with limited permissions is tricked into performing actions on behalf of a malicious actor, effectively bypassing access controls.
Data Exfiltration
The unauthorized transfer of sensitive data from a secure system to an external location, often through covert channels like URLs, APIs, or encoded data.
Jailbreak Attack
An attempt to bypass an AI system's safety constraints and content policies by using specially crafted prompts that encourage the model to ignore its instructions.
Local Execution Guarantee
AgentShieldScan performs all security scanning operations locally in the browser using JavaScript regex patterns. No network requests are made, and no user data is transmitted to external servers. The scanning engine operates entirely within the client environment, ensuring complete privacy and data sovereignty.
Regex-Based Security Engine
A pattern-matching system that uses regular expressions to identify potential security threats in text inputs. The engine compares input against a database of known attack patterns and generates risk scores based on match severity and frequency.