AIR Research: Fake AI Agent Skill Bypasses Scanners, Hits 26k Users
I just reviewed the report from security firm AIR regarding their experiment with a fake AI agent skill. For those who missed it, they managed to push a malicious-looking skill through a major marketplace and Instagram ads, reportedly reaching roughly 26,000 agents, including some on corporate accounts.
While the payload was harmless (just collecting an email address), the implications are terrifying. Every single skill security scanner they tested marked it as safe. This highlights a critical gap where signature-based scanning fails to detect logic abuse in agent workflows. We're essentially seeing a rehash of the supply chain problems we faced with npm and PyPI, but now moving into the AI execution layer.
Since there are no CVEs assigned yet for this specific bypass, we have to rely on behavioral monitoring. Static analysis of the manifest is clearly insufficient.
I've started auditing our internal agent definitions. If you are using agents that load external skills, you should be checking for broad permission scopes in the skill manifests. Here is a basic Python snippet that audits a local skill YAML manifest against a denylist of risky tool APIs, such as outbound HTTP POST or blanket filesystem and email access:
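import yaml

def audit_skill_permissions(manifest_path):
    """Return findings for tools in a skill manifest that use denylisted APIs."""
    with open(manifest_path, 'r') as f:
        # An empty manifest parses to None, so fall back to an empty dict
        config = yaml.safe_load(f) or {}

    # Define a denylist of high-risk actions
    risky_actions = ['http.post', 'filesystem.write_all', 'email.send_all']
    findings = []

    for tool in config.get('tools') or []:
        # Use .get() so a tool entry without an 'api' key does not raise KeyError
        if tool.get('api') in risky_actions:
            findings.append(f"Risky Tool Found: {tool.get('name', '<unnamed>')} ({tool['api']})")

    return findings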
On the SOC side, we are treating AI agent traffic as 'Shadow IT'. We've set up KQL queries to alert on agents communicating with unknown endpoints outside our approved internal tool registry.
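For anyone without KQL handy, the same registry check can be roughed out in Python against exported logs. This is only a sketch of the idea, not our production query: the event fields (`agent`, `url`), the host names, and the `find_unapproved_calls` helper are all made up for illustration.

```python
from urllib.parse import urlsplit

# Hypothetical registry of approved internal tool hosts (assumption, not a real registry).
APPROVED_HOSTS = {"tools.internal.example", "search.internal.example"}

def find_unapproved_calls(events, approved_hosts=APPROVED_HOSTS):
    """Return one record per event whose destination host is not in the registry."""
    flagged = []
    for event in events:
        # urlsplit().hostname is lowercased, or None if the URL has no host part.
        host = urlsplit(event["url"]).hostname or ""
        if host not in approved_hosts:
            flagged.append({"agent": event["agent"], "host": host})
    return flagged

# Illustrative log records; the field names are assumptions.
sample_events = [
    {"agent": "agent-01", "url": "https://tools.internal.example/v1/lookup"},
    {"agent": "agent-02", "url": "https://collector.unknown.example/submit"},
]

flagged = find_unapproved_calls(sample_events)
print(flagged)
```

Matching on exact host names keeps the check simple. A real deployment would probably also need to handle wildcard subdomains and raw IP destinations.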
How are your teams handling the vetting of third-party agent skills? Are you restricting usage to strictly private repositories, or do you have a vetting process in place?
This is exactly why we blocked access to public skill repositories on our corporate network last quarter. The risk of 'prompt injection' via a compromised skill is too high. We force all devs to use a curated internal registry that goes through manual code review.
Good post. The Instagram ad vector is interesting: it shows that users are treating agent skills like mobile apps, blindly installing them for convenience. We implemented a strict 'allow-list' policy for agent API calls. If a skill tries to access an API not on the list, the runtime terminates the session immediately.
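Roughly, the allow-list behavior looks like the sketch below. It is a simplified illustration rather than our actual runtime: `AgentSession`, `SessionTerminated`, and the API names are hypothetical.

```python
class SessionTerminated(Exception):
    """Raised when a skill requests an API outside the allow-list."""

class AgentSession:
    """Hypothetical session wrapper that gates every API call through an allow-list."""

    def __init__(self, allowed_apis):
        self.allowed_apis = frozenset(allowed_apis)
        self.active = True

    def call_api(self, api_name, handler, *args, **kwargs):
        if not self.active:
            raise SessionTerminated("session already terminated")
        if api_name not in self.allowed_apis:
            # Fail closed: the first unlisted call ends the whole session.
            self.active = False
            raise SessionTerminated(f"blocked call to {api_name!r}")
        return handler(*args, **kwargs)

session = AgentSession(allowed_apis={"calendar.read"})
result = session.call_api("calendar.read", lambda: "ok")

try:
    session.call_api("http.post", lambda: "should not run")
except SessionTerminated as exc:
    print(exc)
```

Once the session is terminated, even allow-listed calls are refused, so a skill cannot probe for a blocked API and then carry on as if nothing happened.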
We've seen similar behavior in our red team exercises. It's trivial to hide data exfiltration inside 'natural language' responses that scanners don't parse. We moved to using a 'sandboxed' execution environment for all unvetted skills that intercepts outbound network requests.
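To give a feel for the interception part, here is a minimal in-process sketch that blocks and records outbound connection attempts by patching `socket.socket.connect`. This is an illustration only and an assumption about one possible approach: a real sandbox needs OS-level or container-level isolation, since code running in the same interpreter can undo a patch like this. The `block_outbound` helper is hypothetical.

```python
import socket
from contextlib import contextmanager

@contextmanager
def block_outbound(log):
    """Block outbound socket connections inside the with-block and record each attempt."""
    original_connect = socket.socket.connect

    def guarded_connect(sock, address):
        # Record the destination, then refuse the connection.
        log.append(address)
        raise PermissionError(f"outbound connection to {address!r} blocked")

    socket.socket.connect = guarded_connect
    try:
        yield
    finally:
        # Always restore the original method, even if the skill code raised.
        socket.socket.connect = original_connect

attempts = []
with block_outbound(attempts):
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        # 192.0.2.1 is a reserved documentation address; no packet is ever sent.
        sock.connect(("192.0.2.1", 443))
    except PermissionError as exc:
        print(exc)
    finally:
        sock.close()

print(attempts)
```

The recorded attempts are the useful part for vetting: they show where an unvetted skill tried to send data before it was ever allowed near a real network.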