Poisoned Models: Digging into SGLang CVE-2026-5760 and the GGUF RCE Vector
Just saw the disclosure on CVE-2026-5760 in SGLang. With a CVSS score of 9.8, this is about as critical as it gets for anyone running inference workloads. The gist: a maliciously crafted GGUF model file can trigger command injection during model loading, leading to full remote code execution (RCE).
Since SGLang is often used to serve open-source LLMs, the attack surface here is pretty wide if you're pulling models from unverified repositories. The vulnerability stems from insufficient sanitization of model metadata or filenames, which allows an attacker to break out of the loading context.
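This CVE aside, it's worth at least sanity-checking that a downloaded file really starts with the 4-byte GGUF magic ("GGUF") before it gets anywhere near a loader. A minimal sketch, with the model path as a placeholder for your environment:

```shell
# Verify the file begins with the GGUF magic bytes ("GGUF").
# MODEL is a placeholder path; point it at your downloaded file.
MODEL="./model.gguf"
magic=$(head -c 4 "$MODEL" 2>/dev/null)
if [ "$magic" = "GGUF" ]; then
    echo "magic OK: $MODEL"
else
    echo "not a GGUF file: $MODEL" >&2
fi
```

This obviously doesn't catch a malicious file that has a valid header, but it cheaply rejects mislabeled or truncated downloads before deeper scanning.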
For those running SGLang in production, I’d recommend immediately restricting where models can be pulled from, locking down filesystem permissions on your model directories, and validating files before they are loaded. If you’re looking to detect potential exploitation in your environment, watch for the SGLang worker process spawning unexpected shells. Here is a quick check you can run on your Linux servers:
ps aux | grep '[s]glang' | awk '{print $2}' | xargs -I {} pstree -p {} | grep -E '(bash|sh|curl|wget|python)'
If you see shells or curl/wget hanging off the SGLang PID, you might already be compromised.
Treating model files like untrusted executables seems to be the new norm. How is everyone else handling model verification in their CI/CD pipelines right now? Sandboxing everything or actually scanning the GGUF structures?
Solid write-up. We've moved all our inference endpoints to gVisor-based sandboxes specifically because of these supply chain risks. It adds a bit of latency, but the containment is worth it.
Also, I highly recommend adding a YARA rule to your scanning pipeline to catch known malicious GGUF headers before they even hit the SGLang runtime.
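Alongside YARA, a plain SHA-256 allowlist gate is cheap insurance in the pipeline. A sketch, with the allowlist and model paths as placeholders for your CI setup:

```shell
# Reject any model whose SHA-256 digest is not in a pinned allowlist.
# ALLOWLIST holds lines in sha256sum format: "<digest>  <path>".
# Both paths are placeholders for your pipeline.
ALLOWLIST="./allowlist.sha256"
MODEL="./models/model.gguf"
digest=$(sha256sum "$MODEL" 2>/dev/null | awk '{print $1}')
if grep -q "^${digest} " "$ALLOWLIST" 2>/dev/null; then
    echo "allowlisted: $MODEL"
else
    echo "BLOCK: $MODEL not in allowlist" >&2
    # exit 1 here in a real pipeline to fail the build
fi
```

Hash pinning only proves the file matches what you reviewed once, so it pairs well with structural scanning rather than replacing it.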
From a SOC perspective, this is a nightmare to catch with traditional signatures because the 'malware' looks like a legitimate AI model file.
We are focusing on behavioral detection—specifically looking for Python child processes of the SGLang parent that make network connections outbound. If SGLang is serving text, it shouldn't be phoning home to a C2 server.
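A rough version of that check can be scripted with `pgrep` and `ss`. This is a sketch: the `sglang.launch_server` pattern is an assumption about how your workers are started, so adjust it for your deployment:

```shell
# Flag established TCP connections owned by SGLang worker processes.
# The pgrep pattern assumes workers were launched via sglang.launch_server;
# adjust to match your own process tree.
for pid in $(pgrep -f 'sglang.launch_server'); do
    conns=$(ss -tnp state established 2>/dev/null | grep "pid=${pid},")
    [ -n "$conns" ] && printf 'PID %s has outbound connections:\n%s\n' "$pid" "$conns"
done
```

In practice you'd diff this against a baseline of expected destinations (other nodes in the serving cluster, metrics endpoints) and alert on anything new.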
I tested this in a lab environment earlier today. It's scary how smooth the execution is.
If you can't patch immediately, consider dropping Linux capabilities from the SGLang worker container and running it as a non-root user. This won't stop the injection itself, but it limits what the resulting RCE can do: no privilege escalation and no access to sensitive system binaries. You can audit file capabilities on the worker's interpreter with:
getcap -v /usr/local/bin/python3
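In container terms, the lockdown above looks roughly like this. It's a sketch of a hardening baseline, not an SGLang-specific recipe; the image name, UID, and port are placeholders:

```shell
# Run the worker container with all capabilities dropped, no privilege
# escalation, a read-only root filesystem, and a non-root user.
# Image name, user, and port are placeholders for your deployment.
docker run --rm \
  --cap-drop=ALL \
  --security-opt no-new-privileges \
  --read-only \
  --user 1000:1000 \
  -p 30000:30000 \
  my-sglang-image
```

You may need to mount a writable tmpfs for model caches when using `--read-only`, so test this against your actual workload before rolling it out.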