Python Forensics
Source: https://en.wikipedia.org/wiki/Daubert_standard
Daubert Standard and Python
In 2003, Brian Carrier [Carrier] published a paper that examined rules-of-evidence
standards, including Daubert, and compared open source and closed source
forensic tools. One of his key conclusions was, “Using the guidelines of
the Daubert tests, we have shown that open source tools may more clearly and
comprehensively meet the guideline requirements than would closed source tools.”
A tool does not meet these guidelines automatically just because its source is open.
Rather, specific steps must be followed regarding design, development, and validation.
Can the program or algorithm be explained? The explanation should be given in words,
not only in code.
Has enough information been provided such that thorough tests can be developed to test the
program?
Have error rates been calculated and validated independently?
Has the program been studied and peer reviewed?
Has the program been generally accepted by the community?
Installation (Operating System)
Right version of Python
Graphical vs Shell
Built-in Functions and Modules
dir(__builtins__)
Source: https://stackoverflow.com/questions/45528559/retrieve-all-builtin-functions
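A minimal sketch of listing the built-in functions. It imports the standard builtins module rather than relying on __builtins__, because __builtins__ can be a plain dict inside an imported module. The variable names (names, functions) are illustrative only.

```python
import builtins

# Collect the public names exposed by the builtins module.
names = [name for name in dir(builtins) if not name.startswith("_")]

# Keep callables that are not classes: functions such as len, open, and print.
# Types (int, dict) and exception classes (ValueError) are filtered out.
functions = sorted(
    name
    for name in names
    if callable(getattr(builtins, name))
    and not isinstance(getattr(builtins, name), type)
)

print(len(functions), "built-in functions, starting with:", functions[:5])
```

The exact count varies between Python versions and between interactive and script sessions, so it is best treated as informational.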
Forensic Indexing and Searching
You can use a simple file search built on the string index() method.
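One way the simple approach might look, using only the standard library. The helper names find_all and search_file are hypothetical, and the file is assumed to be text that can be decoded as UTF-8 (undecodable bytes are replaced).

```python
def find_all(text, keyword):
    """Return every offset at which keyword occurs in text, using str.index()."""
    offsets = []
    start = 0
    while True:
        try:
            pos = text.index(keyword, start)
        except ValueError:
            # index() raises ValueError when there are no further matches.
            break
        offsets.append(pos)
        start = pos + 1  # step by one so overlapping matches are also found
    return offsets


def search_file(path, keyword):
    """Return (line_number, column) pairs for each hit of keyword in a text file."""
    hits = []
    with open(path, "r", encoding="utf-8", errors="replace") as handle:
        for line_number, line in enumerate(handle, start=1):
            for column in find_all(line, keyword):
                hits.append((line_number, column))
    return hits
```

This scans the whole file on every query, which is acceptable for a handful of files but is the reason a real index is preferable for large evidence sets.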
Whoosh: Forensic Indexing and Searching
Whoosh was created and is maintained by Matt Chaput. It was originally
created for use in the online help system of Side Effects Software’s 3D
animation software Houdini.
Pythonic API.
Pure-Python.
Fielded indexing and search.
Fast indexing and retrieval
Pluggable scoring algorithm (including BM25F), text analysis, storage, posting
format, etc.
Powerful query language.
Pure Python spell-checker
Source: https://whoosh.readthedocs.io/en/latest/quickstart.html
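A short sketch in the spirit of the Whoosh quickstart: define a schema, index some documents, and run a query. The function name index_and_search, its parameters, and the two-field schema are assumptions made for illustration; Whoosh must be installed separately (python3 -m pip install Whoosh).

```python
import os


def index_and_search(index_dir, documents, query_text):
    """Index (path, content) pairs with Whoosh and return the paths that match."""
    # Imported here so the module still loads where Whoosh is not installed.
    from whoosh.fields import ID, TEXT, Schema
    from whoosh.index import create_in
    from whoosh.qparser import QueryParser

    # "path" is stored so it can be read back from a hit; "content" is only indexed.
    schema = Schema(path=ID(stored=True), content=TEXT)

    os.makedirs(index_dir, exist_ok=True)  # create_in() expects an existing directory
    ix = create_in(index_dir, schema)

    writer = ix.writer()
    for path, content in documents:
        writer.add_document(path=path, content=content)
    writer.commit()

    with ix.searcher() as searcher:
        query = QueryParser("content", ix.schema).parse(query_text)
        return [hit["path"] for hit in searcher.search(query)]
```

A call such as index_and_search("indexdir", [("/a", "This is the first document we've added!")], "first") would be expected to return the stored path of the matching document.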
Hash Functions for Forensics
Source: https://www.journaldev.com/16035/python-hashlib
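A minimal sketch of hashing evidence with the standard hashlib module, for example to show that a file has not changed between acquisition and analysis. The helper names hash_file and hash_bytes, the SHA-256 default, and the 64 KiB chunk size are illustrative choices, not requirements.

```python
import hashlib


def hash_file(path, algorithm="sha256", chunk_size=65536):
    """Return the hex digest of a file, reading it in fixed-size chunks."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        # Chunked reads keep memory use flat even for multi-gigabyte disk images.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def hash_bytes(data, algorithm="sha256"):
    """Return the hex digest of an in-memory bytes object."""
    return hashlib.new(algorithm, data).hexdigest()
```

Recording the digest at acquisition time and recomputing it later gives a simple integrity check: any difference between the two values means the content has changed.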
Forensic Evidence Extraction
Pillow is the friendly PIL fork by Alex Clark and Contributors. PIL is the Python
Imaging Library by Fredrik Lundh and Contributors.
The Python Imaging Library adds image processing capabilities to your
Python interpreter.
This library provides extensive file format support, an efficient internal
representation, and fairly powerful image processing capabilities.
The core image library is designed for fast access to data stored in a few
basic pixel formats. It should provide a solid foundation for a general image
processing tool.
Source: https://pillow.readthedocs.io/en/stable/
Installation
$ python3 -m pip install Pillow pyscreenshot
Source: https://github.com/ponty/pyscreenshot
Pyscreenshot – Full Screen
Pyscreenshot – Part of Screen
Pyscreenshot – Performance
Pyscreenshot – Force backend
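The full-screen, part-of-screen, and forced-backend cases can be sketched with a single hypothetical helper, capture. It assumes a desktop session and at least one working pyscreenshot backend; the backend name passed in is whatever is installed on the machine.

```python
def capture(path, bbox=None, backend=None):
    """Save a screenshot to path and return its (width, height) in pixels."""
    # Imported here because grabbing needs a desktop session and an installed backend.
    import pyscreenshot as ImageGrab

    # bbox=None grabs the full screen; a (x1, y1, x2, y2) tuple grabs a region.
    # backend=None lets pyscreenshot choose; a name forces that backend.
    image = ImageGrab.grab(bbox=bbox, backend=backend)  # returns a PIL image
    image.save(path)
    return image.size
```

For example, capture("fullscreen.png") grabs everything, capture("box.png", bbox=(10, 10, 510, 510)) grabs a 500 by 500 pixel region, and capture("forced.png", backend="scrot") would force the scrot backend if it is installed.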
Metadata Forensics
Mutagen is a Python module to handle audio metadata.
It supports ASF, FLAC, MP4, Monkey’s Audio, MP3, Musepack, Ogg Opus, Ogg FLAC,
Ogg Speex, Ogg Theora, Ogg Vorbis, True Audio, WavPack, OptimFROG, and AIFF
audio files. All versions of ID3v2 are supported, and all standard ID3v2.4 frames are
parsed. It can read Xing headers to accurately calculate the bitrate and length of
MP3s. ID3 and APEv2 tags can be edited regardless of audio format. It can also
manipulate Ogg streams on an individual packet/page level.
Mutagen works with Python 3.6+ (CPython and PyPy) on Linux, Windows and macOS,
and has no dependencies outside the Python standard library.
Source: https://mutagen.readthedocs.io/en/latest/
Installation: python3 -m pip install mutagen
The File function takes any audio file, guesses its type, and returns a FileType instance or None.
The following example gets the length and bitrate of an MP3 file.
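A sketch of reading the length and bitrate of an MP3 with mutagen. The helper names mp3_summary and format_duration and the dictionary keys are hypothetical; mutagen.mp3.MP3 and its info.length and info.bitrate attributes come from the mutagen API.

```python
def format_duration(seconds):
    """Format a length in seconds as M:SS for a report."""
    minutes, secs = divmod(int(round(seconds)), 60)
    return f"{minutes}:{secs:02d}"


def mp3_summary(path):
    """Return the length and bitrate of an MP3 file as a small dict."""
    # Imported here so the module still loads where mutagen is not installed.
    from mutagen.mp3 import MP3

    audio = MP3(path)
    return {
        "length_seconds": audio.info.length,           # float, seconds
        "length": format_duration(audio.info.length),  # human-readable M:SS
        "bitrate_bps": audio.info.bitrate,             # int, bits per second
    }
```

When the file type is not known in advance, mutagen.File(path) can be used instead of MP3(path); it returns None if the type cannot be guessed, so the result should be checked before use.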
PyPDF2 - Pure-Python library built as a PDF toolkit. It is capable of:
extracting document information (title, author, …)
splitting documents page by page
merging documents page by page
cropping pages
merging multiple pages into a single page
encrypting and decrypting PDF files
It is a useful tool for websites that manage or manipulate PDFs.
Source: https://pypi.org/project/PyPDF2/
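A sketch of extracting document information with PyPDF2. It assumes a recent PyPDF2 release that provides the PdfReader class (older releases used PdfFileReader instead); the function name pdf_metadata and the dictionary keys are hypothetical.

```python
def pdf_metadata(path):
    """Return the page count and document information of a PDF as a dict."""
    # Imported here so the module still loads where PyPDF2 is not installed.
    from PyPDF2 import PdfReader

    reader = PdfReader(path)
    info = reader.metadata  # may be None when the file has no information dictionary
    return {
        "pages": len(reader.pages),
        "title": info.title if info else None,
        "author": info.author if info else None,
        "creator": info.creator if info else None,    # application that authored it
        "producer": info.producer if info else None,  # software that wrote the PDF
    }
```

The creator and producer fields are often the most interesting forensically, since they can reveal which software generated the document.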
pefile is a multi-platform Python module to parse and work with Portable Executable
(PE) files. Most of the information contained in the PE file headers is accessible, as
well as all the sections' details and data.
The structures defined in the Windows header files will be accessible as attributes in
the PE instance. The naming of fields/attributes will try to adhere to the naming
scheme in those headers. Only shortcuts added for convenience will depart from that
convention.
pefile requires some basic understanding of the layout of a PE file — with it, it's
possible to explore nearly every single feature of the PE file format.
Source: https://github.com/erocarrera/pefile
Some of the tasks that pefile makes possible are:
Inspecting headers
Analyzing sections' data
Retrieving embedded data
Reading strings from the resources
Warnings for suspicious and malformed values
Basic butchering of PEs, like writing to some fields and other parts of the PE
This functionality won't rearrange PE file structures to make room for new fields, so use
it with care.
Overwriting fields should mostly be safe.
Packer detection with PEiD’s signatures
PEiD signature generation
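The header and section inspection tasks above might be sketched as follows. The helper names pe_summary and timestamp_to_utc and the dictionary keys are hypothetical; the attributes used (FILE_HEADER, OPTIONAL_HEADER, sections, get_warnings) are part of pefile, which must be installed separately (python3 -m pip install pefile).

```python
from datetime import datetime, timezone


def timestamp_to_utc(stamp):
    """Convert a PE TimeDateStamp (seconds since 1970-01-01 UTC) to ISO 8601."""
    return datetime.fromtimestamp(stamp, tz=timezone.utc).isoformat()


def pe_summary(path):
    """Return selected header fields, section names, and parser warnings."""
    # Imported here so the module still loads where pefile is not installed.
    import pefile

    pe = pefile.PE(path)
    return {
        "machine": hex(pe.FILE_HEADER.Machine),
        "compiled": timestamp_to_utc(pe.FILE_HEADER.TimeDateStamp),
        "entry_point": hex(pe.OPTIONAL_HEADER.AddressOfEntryPoint),
        # Section names are fixed-width byte strings padded with NUL bytes.
        "sections": [
            (s.Name.rstrip(b"\x00").decode("ascii", errors="replace"), s.SizeOfRawData)
            for s in pe.sections
        ],
        "warnings": pe.get_warnings(),  # suspicious or malformed values found while parsing
    }
```

The compile timestamp is stored in the file by the linker and can be forged, so it is better treated as a lead than as proof.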
Using Natural Language Tools
Examine the text for evidence using NLP concepts
NLTK, spaCy, Textacy
Stanza, Polyglot
inltk, indic-nlp
Summary
It is very important to follow the standard procedures laid down by law
enforcement agencies during the investigation process.
There are many open source as well as commercial tools for
digital forensics. Learning to develop your own tool is
advantageous.
Many tools written in Python are pure Python implementations.
Most importantly, Python and open source tools can meet the Daubert
standard, provided they are properly designed, developed, and validated.