DIFO 2023 Lab1 Instructions

Johannes Olegård
version: 2023-10-20 20:00

Introduction
This lab aims to introduce you to some forensic techniques and lets you try applying them. We mainly
focus on Windows artifacts, but if you get the idea, you should be able to extrapolate to other
systems.

What you will need


• Linux (e.g. Kali Linux VirtualBox VM https://www.kali.org/get-kali/#kali-virtual-machines)
• Windows 10 (e.g. brainvm.zip vm from the labfiles)
• VirtualBox with VirtualBox Extension Pack https://www.virtualbox.org/wiki/Downloads
• lab files https://extftp.cs2lab.dsv.su.se/DIFO/2023/lab1

You will need a Linux-like system to run various programs to do this assignment. Kali Linux
is recommended and is already installed directly onto computers in the lab room, but you can use
your own computer if you like.
If you run macOS, it might work without a Linux VM, but you may need to substitute some
programs/commands for macOS equivalents (so a Linux VM is probably easier). Later in the
assignment, you will use a VirtualBox-based Windows 10 VM (that we provide) to do experiments,
so you will still need VirtualBox. Note that in VirtualBox you add VBOX-files, but you use
import appliance on OVA-files.
There are no harmful files in this assignment (such as malware), so it is safe to analyze it on
your own computer.

Passwords / Credentials
extftp.cs2lab.dsv.su.se cs2lab:dsvcs2
labcomputers cs2lab:dsvcs2
kali VM kali:kali
windows VM cs2lab:dsvcs2

Handin
Hand in a single PDF in iLearn consisting of:
1. A front page that clearly states the name and email address of each group member, and
the name of the assignment the handin is for (i.e. include the text “DIFO 2023 lab1”).
2. A list of answers to all the “Qn:” questions (there is a total of 59 questions). There is no
need to explain how you got the answers unless explicitly asked.
If you have issues with a group member not participating, please state so on the front page. We
will contact all group members by email to hear both sides of the conflict. Usually, the result is
that we kick the non-contributing member out of the group (so they will have to join a new group
and submit lab1 again), and only the participating members (the ones listed on the assignment)
get a grade for the assignment. See also the DSV code of honor1 .
1 https://www.su.se/department-of-computer-and-systems-sciences/education/during-your-studies/code-of-honour-at-dsv-1.548067

Some tips, tricks and quality of life suggestions
Keyboard layout. The kali VM might not realize that you use a Swedish keyboard. You can
tell it that by clicking on the button in the top left corner of the screen (the kali logo) and typing
keyboard and clicking on the Keyboard app. Next, go to the layout tab and change the
language.
RTFM. Whenever you are asked to run a command with certain options, look up what the
command (e.g. curl ) does and what each option does (e.g. -L ). This will leave you a lot less
confused and a command used in one assignment might also be useful in another assignment (e.g.
sort , uniq -c and less ). ChatGPT can help but it is also good to know how to quickly
search a man page (since chatGPT does not know everything).
Parallelization. If you want your commands to run faster, and your Linux machine has multiple
CPU cores, then you can try parallel (GNU parallel). For example: ls -1 *.bin | parallel ./a.sh
where the file a.sh contains:

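#!/bin/bash

# NOTE: remember to: chmod u+x a.sh

# Expect exactly one argument: the file to search.
if (( $# != 1 )) ; then
    echo "ERROR: wrong num args" 1>&2
    exit 1
fi

# Print the file name if the file contains the search text.
if grep heffaklump "$1" >/dev/null ; then
    echo "$1"
fi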

ChatGPT, Aids, etc.


It is totally fine to use chatGPT and any aids you can find to generate code. However, please try
to understand what it does. For example, there are a few questions that ask you to explain what
a command line does and we expect you to understand what you have written here (so don’t just
chatGPT the whole thing).

Contents
1 Part 1: Dumpster Diving (trash.zip)
1.1 File type classification using a tool
1.2 File type classification using a hex editor
1.3 Text search
1.4 Hashing
1.5 Visualize the difference between two files
1.6 Filtering with NSRL
1.7 Fuzzy file comparison using ssdeep
1.8 Photo Metadata extraction using Exiftool
1.9 Entropy
1.10 Finding encrypted files
1.11 Password cracking
1.12 Steganography

2 Part 2: Working with disks (brain.raw, brainram.dmp)
2.1 Preservation
2.2 Viewing the Partition tables using mmls
2.3 Viewing files using sleuthkit cli tools
2.4 Carving MFT entries
2.5 Carving deleted file contents
2.6 Resident files
2.7 Directory index slack space
2.8 Windows RAM artifacts using Volatility

3 Part 3: Autopsy: Windows artifacts (brain.raw)
3.1 Running Autopsy
3.2 Installed programs
3.3 Recycle bin: actual dumpster diving
3.4 Zone.identifier
3.5 Windows registry
3.6 Windows event log
3.7 Prefetch
3.8 (Optional) Eric Zimmerman’s tools
3.9 (Optional) Bulk Extractor
3.10 (Optional) Automatic timelining with plaso

4 Part 4: Forensic Experiments (brainvm.zip)
4.1 Alternative A: Virtualbox logical Acquisition
4.2 Alternative B: Virtualbox physical Acquisition
4.3 Browser artifacts: Google chrome

1 Part 1—Dumpster Diving (trash.zip)
For Part 1 all tasks are about dataset trash.zip , or “the dataset” for short. The dataset consists
of almost 4000 files named 1.bin, 2.bin, and so on. Your job is to analyze these files from various
perspectives, using various command line tools. It is useful for forensic analysis to think of a hard
drive as a dataset of files (or file-like things) in this way.

$ curl -O https://extftp.cs2lab.dsv.su.se/DIFO/2023/lab1/trash.zip
$ unzip -d trash/ trash.zip
$ ls trash

1.1 File type classification using a tool


Just because a file has a name that ends with .txt does not mean that it is a text file (it’s
just a name). There are tools for identifying the real type (or “format”) of a file. One such tool is
confusingly named file :

$ file *.bin

Combine file with tools like sed , grep , sort and uniq to answer these questions:
Q1: How many photos/images are there in the dataset?
Q2: What does the file -command say when it does not recognize a binary file format?
Q3: How many files did the file -command not recognize?
Q4: Write a command line that answers the above question (i.e. a line of text that can be
pasted into the kali terminal).
Q5: In at most 200 characters, explain briefly how that command line works.

1.2 File type classification using a hex editor


A “hex editor” is a tool that displays the raw data in a file (using hexadecimal notation, hence
the name). Hex editors are foundational to digital forensics, since when all other tools fail you
can always use a hex editor to look at the data manually. Many common file formats start with
a unique sequence of bytes, usually called a “magic number”, to help tools recognize the format
easier. There are many file formats, but the “magic numbers” of some common file formats are
documented here2 . More broadly we might say that any pattern in a file that we use to recognize
it is a “signature”.

$ bless
$ xxd 1.bin
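
The idea of recognizing a format by its leading bytes can also be sketched in Python. The signature table below is a small illustrative subset of well-known magic numbers (not a complete list), and the names SIGNATURES, guess_type and guess_file_type are hypothetical helpers, not part of the lab files.

```python
# A tiny, incomplete table of well-known magic numbers (illustrative only).
SIGNATURES = {
    b"\xff\xd8\xff": "JPEG",
    b"\x89PNG\r\n\x1a\n": "PNG",
    b"GIF8": "GIF",
    b"%PDF": "PDF",
    b"PK\x03\x04": "ZIP",
}

def guess_type(data: bytes) -> str:
    """Return the name of the first signature that `data` starts with."""
    for magic, name in SIGNATURES.items():
        if data.startswith(magic):
            return name
    return "unknown"

def guess_file_type(path) -> str:
    """Read only the first bytes of the file and classify them."""
    with open(path, "rb") as f:
        return guess_type(f.read(16))
```

A real tool such as file uses a far larger signature database and also looks beyond the first few bytes, so treat this only as a way to understand what a “magic number” check does.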

Q6: Compare the first few bytes of the photo/image files you found. What are the three most
common “magic numbers” you see?
You might want to check the options for xxd to make this question easier.

1.3 Text search


Often it is useful to search for text in files, even if those files are binary. Below are some examples
of how to do that. See the man (ual) pages of strings and grep to see how they work.

$ strings 1.bin
$ grep -a smurf *.bin
$ strings *.bin | grep -o 'smurf'
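
The same text looks different on disk depending on its encoding, which is why one search does not find everything. The following Python sketch shows this for the example word smurf; the function name find_encoded is a hypothetical helper chosen for illustration.

```python
def find_encoded(data: bytes, text: str) -> dict:
    """Return the first byte offset of `text` in `data` for each encoding (-1 if absent)."""
    hits = {}
    for encoding in ("utf-8", "utf-16-le"):
        hits[encoding] = data.find(text.encode(encoding))
    return hits

# In little endian UTF-16, every ASCII character is followed by a zero byte.
utf8_bytes = "smurf".encode("utf-8")
utf16_bytes = "smurf".encode("utf-16-le")
print(utf8_bytes.hex(" "))
print(utf16_bytes.hex(" "))
```

A plain byte-wise search for the ASCII form will therefore miss the UTF-16 form, and vice versa.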

Q7: Which files contain the text “heffaklump” in either ASCII or UTF-8?
Q8: Which files contain the text “heffaklump” in little endian UTF-16 encoding?
Q9: Write a grep pattern to find IPv4 addresses (e.g. 193.123.156.189 ). (It does not have
to be perfect, some false positives are okay.)
2 https://www.garykessler.net/library/file_sigs.html

Q10: How many unique IPv4-addresses can you find in the dataset? Use the pattern you
described above.

1.4 Hashing
Hashing is used in at least three ways in digital forensics: preservation, file comparison and fuzzy
file comparison (including malware analysis). For preservation we hash a file before processing it
so that we later can check that we did not accidentally modify the file. Hashes can be used to
check if two files are identical or almost identical.
On Linux, here are some tools to compute hashes of (whole) files:

$ sha1sum 1.bin
$ sha256sum 1.bin
$ md5sum 1.bin
$ hashdeep 1.bin
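
For preservation, the same kind of hash can also be computed from a script, for example before and after processing a file. The following is a minimal Python sketch using only the standard hashlib module; the function name hash_file and the chunk size are illustrative choices, not part of the lab files.

```python
import hashlib

def hash_file(path, algorithm="sha256", chunk_size=1 << 20):
    """Return the hex digest of the file at `path` (hypothetical helper)."""
    h = hashlib.new(algorithm)
    with open(path, "rb") as f:
        # Read in chunks so that large files do not have to fit in RAM.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()
```

Note that hexdigest() returns lower-case hex digits, which matters when comparing against hashes stored in another case.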

Q11: Which groups of files in the dataset share the same SHA1 hash? (Tip: sort | uniq -c
and grep )
For example, files 2535.bin and 2536.bin both have the same hash and no other files in
the dataset have that hash, so these two files make up one of the groups.

$ sha1sum *.bin | grep 2995201601ab916bd24c4700749965e1bfa82ec5


2995201601ab916bd24c4700749965e1bfa82ec5 2535.bin
2995201601ab916bd24c4700749965e1bfa82ec5 2536.bin

Q12: What command line did you use to answer the above question?
Q13: Explain in at most 200 characters how that command line works.
Q14: There is something odd about the hashes of four files in the dataset. Which ones? (hint:
compare sha1 and md5 using the command line you developed in the above questions).3
File 3614.bin contains the output of the sha1sum command. Check if those hashes are still
correct. There is a neat option for sha1sum that does this:

$ sha1sum -c 3614.bin

Q15: Which files have the wrong hash, according to 3614.bin ?


Tip: It might be a good idea to hash all the files in the dataset and store that to a file. That
way, if you accidentally modify a file you can check this.

1.5 Visualize the difference between two files


There are tools that can highlight the difference between two files.

$ diff 1.bin 2.bin


$ diff -u 1.bin 2.bin
$ diff -y 1.bin 2.bin
$ vimdiff 1.bin 2.bin # to exit: press ESC then type ZQ
$ diff -u <(xxd foo1.bin) <(xxd foo2.bin)
$ dhex foo1.bin foo2.bin

Q16: Compare files 1315.bin and 2091.bin (using e.g. vimdiff ). How many bytes differ?

1.6 Filtering with NSRL


A real file system contains lots of forensically uninteresting junk. Fortunately, many uninteresting
files appear on multiple systems (e.g., static Windows OS files, common software binaries, etc.).
There is a famous list of uninteresting hashes for use in digital forensics called the “National
Software Reference Library (NSRL) Reference Data Set (RDS)” 4 . There should already be a copy
of it on the lab computers ( /home/kali/difo/nsrl ). Using this hash set we can filter out
files that are not interesting. As you can imagine, we can in a similar way build a hash set of illicit
images to single out files that are instead very interesting forensically (which we will not
demonstrate here).

3 Understanding what is odd will probably net you a question on the exam.

Here is some code to download and query NSRL RDS:

$ curl -OL THEURLOFTHEZIPFILE # https://.../RDS_2023.03.1_modern_minimal.zip


$ unzip RDS_2023.03.1_modern_minimal.zip
$ sqlite3 RDS_2023.03.1_modern_minimal/RDS_2023.03.1_modern_minimal.db
sqlite3> .schema
sqlite3> select sha256, sha1 from DISTINCT_HASH limit 10;
sqlite3> .quit

Q17: Write a python script to determine which files are in the database. Below is a stub you
can start from ( 894.bin ). Beware of upper and lower case hashes when comparing! It is probably
fastest to use SHA256.

#!/usr/bin/env python3

import hashlib
import sqlite3
import pathlib
import argparse

def sha256_hash_file(path):
    hash = 123 # TODO write code to calculate the hash of the file named by path
    return hash

def main():
    # $ python3 script.py database_file files_to_hash...
    parser = argparse.ArgumentParser()
    parser.add_argument('database_file')
    parser.add_argument('files_to_hash', nargs='+')
    args = parser.parse_args()

    #print(args.database_file, args.files_to_hash)

    # TODO hash each file in files_to_hash and check if it is in the database.
    # TODO print the paths of the files in files_to_hash that are in the database.
    for path in args.files_to_hash:
        hash = sha256_hash_file(path)
        hash_is_in_database = False # TODO do something else here
        if hash_is_in_database:
            print(path, hash)

if __name__ == '__main__':
    main()

So that I can run the script like this:

$ python3 ./myscript.py \
    ~/difo/nsrl/RDS_2023.03.1_modern_minimal/RDS_2023.03.1_modern_minimal.db \
    ~/difo/trash/*.bin
900.bin 13BC70E4D044FD383194F1FA9C7C102D0F8D2B81302E80B0E54693470AD4B6A7
... (and so on, 2756.bin should not be here!)

As an example, 900.bin is in the RDS, while 2756.bin is not in the RDS.


Q18: How many files in the lab dataset are “uninteresting” according to the
2023.03.1 Modern PC minimal version of NSRL RDS?
4 https://www.nist.gov/itl/ssd/software-quality-group/national-software-reference-library-nsrl/nsrl-download/current-rds

1.7 Fuzzy file comparison using ssdeep
Sometimes, we want to find similar (but not exactly identical) files, such as two versions of the same
MS Word document, or a (slightly) photoshopped photo and its original. Tools like ssdeep -d
can do this. They work by (intelligently) chopping the file into smaller chunks, hashing the chunks
individually and comparing files based on common chunks.
Q19: What are the 10 most similar pairs of files in the dataset, excluding pairs that are exactly
identical?

1.8 Photo Metadata extraction using Exiftool


Many photo/image file formats contain not just (heavily compressed) pixels, but also metadata
about the photo (such as where it was taken, how it was rotated, the software that produced
it, the camera that took the photo, the settings of that camera, author, etc.). Image formats like
PNG and JPEG structure their metadata similarly in a format called EXIF. On Linux, a tool
called exiftool can extract this data.

$ exiftool 1.bin

Q20: Which photo files in the dataset were taken by a Samsung S9 smartphone camera (according
to the EXIF metadata)?
Q21: Which files in the dataset contain GPS coordinates in their EXIF data?
Q22: For each of those files, which Swedish city/town is closest to where the photo was taken?
See: 5 6

1.9 Entropy
One way to detect encrypted or compressed files is by testing for high information entropy7 . There
are a few tools that can do this:

$ binwalk -N -E *.bin
$ ent 1.bin
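
To see what these tools measure, here is a minimal Python sketch of Shannon entropy in bits per byte. The function name shannon_entropy is a hypothetical helper; real tools such as binwalk typically compute entropy per block rather than over the whole file.

```python
import math
from collections import Counter

def shannon_entropy(data: bytes) -> float:
    """Return the Shannon entropy of `data` in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    counts = Counter(data)
    total = len(data)
    # Sum of -p * log2(p) over every byte value that occurs in the data.
    return -sum((c / total) * math.log2(c / total) for c in counts.values())
```

A file of a single repeated byte has entropy 0, while data in which all 256 byte values are equally common reaches the maximum of 8 bits per byte. Encrypted and compressed files tend to be close to that maximum.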

1.10 Finding encrypted files


Q23: Which ZIP-files and which PDF-files are encrypted? Try opening each file one by one.

1.11 Password cracking


Information security, as employed by a criminal, is an obstacle for forensic investigators. Often
we must break into a system, for example by figuring out a smartphone PIN code, a Facebook
password, the password for encrypted vault files and (more rarely) the login password to a PC. In
this task, we will learn how to use some password-cracking tools.
In general there is only one way to crack passwords: keep guessing until you find it. Typically,
password protection in a system works by storing and comparing hashes of passwords. So to test
a guess, you hash it and check whether the result matches the stored hash. This means
that most password cracking tools, like john (the ripper) and hashcat , are (in a sense) just
glorified hashing tools (for short strings rather than whole files). john is a password cracking tool
that uses OpenMP for multithreading (by default it uses all CPU cores on one computer; it can
also be built with MPI support so that it runs on a cluster of machines). hashcat uses a GPU
(CUDA for nvidia, openCL for other GPUs and openCL for CPUs pretending to be GPUs). This
means that if you want to do cracking on your own computer, then hashcat is probably faster
than john , but hashcat is extremely annoying to install and run. This is why we will use
john below. In this task, the passwords and techniques are chosen so that it should take at most
a few seconds or minutes to crack a password. If it takes longer than that, you are probably
running the tool incorrectly!
5 https://gps-coordinates.org/coordinate-converter.php
6 https://maps.google.com
7 https://en.wikipedia.org/wiki/Entropy_(information_theory)

Tools like john and hashcat take as input a list of hashes (and the hash algorithm that
generated them). So to crack a password-protected file, we first must convert it into a list of
hashes.
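
The “hash the guess and compare” loop can be sketched in a few lines of Python. This is a deliberately simplified illustration: it assumes an unsalted SHA-256 hash, whereas real systems (such as unix shadow files) use salted and intentionally slow schemes, which is why tools like john exist. The function name crack is a hypothetical helper.

```python
import hashlib

def crack(target_hex: str, wordlist):
    """Return the first guess whose SHA-256 hex digest equals `target_hex`, else None."""
    target_hex = target_hex.lower()
    for guess in wordlist:
        # Hash the candidate password and compare it with the stored hash.
        if hashlib.sha256(guess.encode("utf-8")).hexdigest() == target_hex:
            return guess
    return None
```

The quality of the wordlist decides everything here: the loop can only ever find passwords that appear in it.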
Q24: File 1924.bin is a unix-style “passwd” file and 651.bin is a unix-style “shadow” file
(both from the same computer). What is the password of user drsnuggles ?

$ unshadow 1924.bin 651.bin > passwd.johnformat
$ john --help |& less
$ john passwd.johnformat # crack and store successful result in ~/.john/john.pot
$ john --show passwd.johnformat # show cracked password using ~/.john/john.pot

Q25: Files 2834.bin and 1596.bin consist of the “SYSTEM” and “SAM” registry files of a
Windows 10 machine. What is the password of user inspectorgadget?

$ impacket-secretsdump -sam 2834.bin -system 1596.bin LOCAL > nt.johnformat


$ john --format=NT nt.johnformat # john needs a little help with this file
$ john --format=NT --show nt.johnformat

In digital forensics we might know something about the person whose password we are trying
to crack. People tend to reuse passwords (we’re all guilty—please start using a password manager
like BitWarden8 !). Similarly, people tend to use words and numbers from things they like (like
the names and birthyears of their kids, or names of fictional characters, sports teams, concepts
or celebrities). Or they just wrote the password down in a text-file on their computer. So if we
have a list of known passwords (from the same person or other people) or a list of strings from
that person that could be passwords, then we can use that as the “wordlist” of passwords to crack.
When we do not provide a wordlist, john will use a default list (and resort to brute forcing when
it runs out).
Q26: There is a password-protected PDF in the dataset. What is the title of that PDF (i.e.
the title text written inside that PDF)? Use file 3616.bin as a wordlist (it contains a list of all
4-character long lower-case only strings).

$ pdf2john something.pdf > pdf.johnformat # extract hash and save in esoteric file format
$ john --wordlist=3616.bin pdf.johnformat
$ john --show pdf.johnformat # show the password
$ qpdf -password=PUTTHEPASSWORDHERE -decrypt something.pdf something.versionwithoutpassword.pdf
$ evince something.pdf # the evince GUI will prompt for the password

Q27: There is a password-protected ZIP-file in the dataset. What is the total of John Doe?
The password is in a file somewhere in the file dataset. Use strings (and sort -u ) on the
dataset and use the output as a wordlist. Hint: the password is between 7 and 30 characters
long, contains only: digits, lowercase ASCII letters, uppercase ASCII letters and punctuation from
%&/()!"#$ . It has at least one character from each of these four classes.

$ zip2john something.zip > zip.johnformat


$ john --wordlist=passwordsoneperline.txt zip.johnformat
$ john --show zip.johnformat
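
A wordlist built from strings output can be very large, so it may help to keep only the candidates that match the hint for Q27. The following Python sketch is one possible reading of that hint; the function name matches_hint is a hypothetical helper, and the character classes are copied from the hint text.

```python
import string

PUNCTUATION = set('%&/()!"#$')  # punctuation characters listed in the hint
ALLOWED = set(string.digits + string.ascii_letters) | PUNCTUATION

def matches_hint(candidate: str) -> bool:
    """True if `candidate` satisfies the length and character-class rules of the hint."""
    if not 7 <= len(candidate) <= 30:
        return False
    if not set(candidate) <= ALLOWED:
        return False
    # At least one character from each of the four classes.
    return (any(c in string.digits for c in candidate)
            and any(c in string.ascii_lowercase for c in candidate)
            and any(c in string.ascii_uppercase for c in candidate)
            and any(c in PUNCTUATION for c in candidate))
```

Applied to every line of the strings output, a filter like this leaves a much shorter list of passwords for john to try.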

Note that tools like john also support generating more passwords from a wordlist (e.g. using
password to generate “ P@$sw0rD123 ”), which we did not need here.

1.12 Steganography
Steganography is about hiding data inside other data, typically in plain sight, for example by
manipulating the pixels in a photo in a way that is difficult to perceive by the human eye. Typically
steganography is difficult to detect (especially if the message is encrypted before being hidden)
and we often have to test individual techniques/tools one by one. steghide is one tool for hiding
data inside JPEG files (by manipulating pixel/color data) and there are many others.

8 https://bitwarden.com/

$ steghide embed -cf original.jpg -ef secret.txt -p password123 -sf steg.jpg


$ steghide extract -sf steg.jpg -p password123
$ stegseek -sf steg.jpg # password cracking, extracts secret.txt to steg.jpg.out

$ wordlists # if stegseek complains about /usr/share/wordlists/rockyou.txt

Q28: I have used steghide to hide a string of text in one of the JPEG files in the dataset.
Which file?
Q29: What is the “fourth step” mentioned in the embedded file?

2 Part 2—Working with disks (brain.raw, brainram.dmp)
In this part of the lab we will examine the raw disk image brain.raw , which I will refer to as “the
disk” below. Later, we will also look at a RAM dump from the same machine, brainram.dmp .

2.1 Preservation
I have already done the task of “collecting” the disk for you. As part of that process I computed
hashes of that disk. As a forensic analyst, your first task would be to recompute these hashes and
verify that they are still correct.
Q30: What is the sha1 hash of the disk?
Q31: What is the md5 hash of the disk?

2.2 Viewing the Partition tables using mmls


The first part of a disk will typically contain a data structure called the “partition table”, in GUID
Partition Table (GPT) format (not to be confused with chatGPT!). Sometimes, the older Master
Boot Record (MBR) format (also known as the “DOS partition format”) is used instead of GPT.
Most PCs will have one big Windows partition (C:) and maybe a few smaller ones for weird
windows-internals. Using multiple partitions tends to be rare among normal people. People dual
booting Windows and Linux will have one partition for Windows and one for Linux. Some Linux
distributions will (still) create a separate partition for swap by default to optimize for spinning
HDDs, but since most modern computers have SSDs this makes little sense anymore.
“The Sleuth Kit” (TSK) is a set of tools for analysing disk images. The mmls tool can be used to
list partitions:

$ mmls -B brain.raw

Note that a “sector” is 512 bytes and that mmls by default gives you offsets counted in sectors.
Q32: What is the sector-offset of the largest partition?
Q33: What is the byte-offset of the largest partition?
Q34: How many sectors long is the largest partition?
Q35: How many bytes long is the largest partition?
If you look at the .raw -file in a hex editor you should see the text NTFS near the start of
the largest partition.
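
Converting between sector offsets and byte offsets is a single multiplication. The Python sketch below shows the conversion and the NTFS check described above. It assumes 512-byte sectors (as stated above) and that the text NTFS sits at byte 3 of the partition's first sector, which is where the NTFS boot sector stores its OEM ID. The function names are hypothetical helpers.

```python
SECTOR_SIZE = 512  # bytes per sector, as stated above

def sector_to_byte_offset(sector_offset: int, sector_size: int = SECTOR_SIZE) -> int:
    """Convert an offset counted in sectors into an offset counted in bytes."""
    return sector_offset * sector_size

def has_ntfs_oem_id(image_path, sector_offset: int) -> bool:
    """Check for the ASCII text 'NTFS' at byte 3 of the partition's first sector."""
    with open(image_path, "rb") as f:
        f.seek(sector_to_byte_offset(sector_offset) + 3)
        return f.read(4) == b"NTFS"
```

For example, a partition that starts at sector 2048 (a made-up value, not necessarily the one on the lab disk) starts at byte 2048 * 512 = 1048576.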

2.3 Viewing files using sleuthkit cli tools


The files in a partition/filesystem can be listed using the fls command:

$ mmls brain.raw # show sector-offset of each partition


$ fls -r -p -o MYSECTOROFFSET brain.raw

Q36: How many (allocated) files are in the largest partition of the disk? Hint: use wc -l .

Individual (allocated) files can be quickly extracted using fcat :

$ fcat -o MYSECTOROFFSET /ProgramData/Microsoft/Diagnosis/osver.txt brain.raw

Q37: What is the contents of osver.txt ?

Note that to extract all (allocated) files it is faster to run tsk_recover . This might be useful
later.

$ tsk_recover -a -o MYSECTOROFFSET brain.raw myextractiondir/

10
2.4 Carving MFT entries
The scalpel tool can be used to do simple file carving. In /etc/scalpel/scalpel.conf you
can see an example configuration for carving various file types based on their first few bytes
(header) and last few bytes (footer).

Q38: Write a scalpel configuration file for carving only MFT entries.9

$ scalpel -c myscalpel.conf -o myoutputdir brain.raw

Note that scalpel can take a while to run, so you may want to leave this running while you do
other tasks further below.
Q39: How many MFT entries did you manage to carve?

Scalpel will output an audit.txt file that describes where each file was carved from.
Q40: Look at the MFT entry that was carved from byte-offset 3320834624 (it is easier to look at
the file you extracted, but you can also look directly in the disk file). What is the modification
timestamp of this MFT entry10 ?
In python you can parse a WINFILETIME 11 12 13 like this:

>>> import datetime
>>> # A WINFILETIME counts 100-nanosecond intervals since 1601-01-01 (little endian)
>>> x = int.from_bytes(bytes.fromhex('e132 8985 9fb6 d701'), 'little')
>>> d = datetime.datetime(1601,1,1) + datetime.timedelta(microseconds=x//10)
>>> print(d)
>>> print(d.isoformat())

Q41: Look at the MFT entry that was carved from byte-offset 3320834624. What is the
filename of this MFT entry?

Q42: Look at the MFT entry that was carved from byte-offset 3320834624. Is this a file or a
directory?
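
The fixed-size header at the start of an MFT entry can be decoded with the struct module. The sketch below follows the header layout described by Carrier (footnote 9): the signature FILE at bytes 0 to 3, the offset of the first attribute at bytes 20 to 21 and a flags field at bytes 22 to 23 (0x01 = in use, 0x02 = directory). The function name parse_mft_header is a hypothetical helper, and the sketch ignores fixup values and the attributes themselves.

```python
import struct

def parse_mft_header(entry: bytes) -> dict:
    """Decode a few header fields of a raw MFT entry (little endian)."""
    if entry[:4] != b"FILE":
        raise ValueError("not an MFT entry: missing FILE signature")
    # Two unsigned 16-bit integers starting at byte offset 20.
    first_attr_offset, flags = struct.unpack_from("<HH", entry, 20)
    return {
        "first_attr_offset": first_attr_offset,
        "in_use": bool(flags & 0x01),
        "is_directory": bool(flags & 0x02),
    }
```

The timestamps and the filename are not in this header; they live in the attributes that start at first_attr_offset.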

2.5 Carving deleted file contents


scalpel can also be used to carve JPEG files:

$ scalpel -c myscalpel.conf -o scalpel-output/ brain.raw

Check /etc/scalpel/scalpel.conf on how to carve jpg -files. Make sure your config only
carves jpg and nothing else (or scalpel will easily fill up all of your diskspace with junk).
Q43: How many JPEG-files did you find?
Note that scalpel can take a while to run, so you may want to leave this running while you do
other tasks further below.

2.6 Resident files


The metadata for a file can be extracted using istat :

$ mmls -B brain.raw # show sector-offset of each partition
$ fls -r -p -o MYSECTOROFFSET brain.raw # get the file mft entry number, e.g. "34362" from "34362-128-1"
$ istat -o MYSECTOROFFSET brain.raw MYMFTNUMBER # show attributes

9 Carrier, B. (2005). File system forensic analysis. Addison-Wesley Professional. Online: https://raw.githubusercontent.com/Urinx/Books/master/Forensic/File%20System%20Forensic%20Analysis.pdf
10 13Cubed MFT structure https://www.youtube.com/watch?v=l4IphrAjzeY
11 https://gist.github.com/kosh04/36cf6023fb75b516451ce933b9db2207
12 https://www.silisoftware.com/tools/date.php?inputdate=132775510287135457&inputformat=filetime
13 https://stackoverflow.com/a/6161842

Each MFT entry (basically a file) has a list of “attributes” 14 . An attribute is a sequence of data
and a description of where it is stored. Usually, the attribute will point somewhere on the disk
(a “non-resident” attribute), but really small amounts of data are stored inside the MFT entry
itself (a “resident” attribute). Each attribute type has a number (e.g. 48) and a name
(e.g. $FILE_NAME).
Some interesting attribute types include:

• $DATA (type number=128). Stores the contents of the file.
• $FILE_NAME (type number=48). Stores the name (e.g. “a.txt”) of the file and some
timestamps.
• $STANDARD_INFORMATION (type number=16). Stores the timestamps for the file
(creation time, modification time, etc.).
• $INDEX_ROOT (type number=144). Stores the root node of the directory index B-tree.
• $INDEX_ALLOCATION (type number=160). Stores the nodes of the directory index B-tree.

Note that the list of attributes in an MFT entry can contain multiple attributes of the same type
(which have to be combined to be interpreted). Each attribute also contains an ID that we can
use to distinguish it from other attributes of the same type. In sleuthkit we can reference files or
their attributes in the following way: MftNumber-AttributeTypeNumber-AttributeId .

$ istat -o MYSECTOROFFSET brain.raw 34362 # MFT file entry


$ icat -o MYSECTOROFFSET brain.raw 34362-128 # $DATA entries for that file
$ icat -o MYSECTOROFFSET brain.raw 34362-128-1 # only the $DATA entry with ID=1
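
The MftNumber-AttributeTypeNumber-AttributeId notation is easy to take apart in a script, for example when post-processing fls output. The Python sketch below uses the type numbers listed above; the function name parse_tsk_address and the returned dictionary layout are hypothetical choices for illustration.

```python
# Attribute type numbers as listed above.
ATTRIBUTE_TYPES = {
    16: "$STANDARD_INFORMATION",
    48: "$FILE_NAME",
    128: "$DATA",
    144: "$INDEX_ROOT",
    160: "$INDEX_ALLOCATION",
}

def parse_tsk_address(address: str) -> dict:
    """Split e.g. '34362-128-1' into MFT number, attribute type and attribute ID."""
    parts = [int(p) for p in address.split("-")]
    type_number = parts[1] if len(parts) > 1 else None
    return {
        "mft_number": parts[0],
        "type_number": type_number,
        # Unknown type numbers are reported as None rather than guessed.
        "type_name": ATTRIBUTE_TYPES.get(type_number),
        "attribute_id": parts[2] if len(parts) > 2 else None,
    }
```

The type and ID parts are optional, which mirrors how istat takes only the MFT number while icat also accepts the longer forms.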

Q44: In the folder C:\difo_resident on the disk, there are four files. Which of these use
resident $DATA attributes?

2.7 Directory index slack space


NTFS directories have to keep track of what files are inside them. They could do this with a
simple list of names, but instead they use a B-tree15 data structure called the “index” (or
sometimes “$I30”). Each internal node in the $I30 B-tree is formatted like the $FILE_NAME
attributes of files (i.e. each internal node contains a filename and timestamps). The exact format
(list, B-tree, whatever) does not matter much for forensics. The main thing is that when files are
removed from a directory and internal nodes are removed from the B-tree, data from those
internal nodes might still be around in “slack” space (since space for each $INDEX_ALLOCATION
is allocated in the same way as for files). So even if a file is completely removed from a computer
(MFT entry and contents gone) we might still be able to recover the name and timestamps from
the directory index.
Q45: In folder C:\difo_gone_small I have created a lot of files (1.txt, 2.txt, etc.) and then
deleted some of them. What is the highest numbered deleted filename you can obtain from the
directory’s $I30 metadata (aka. “index file”)? Hint: use istat and icat .

2.8 Windows RAM artifacts using Volatility


For this task you will need brainram.dmp . Volatility16 is a RAM-dump analysis tool. The tool
contains a set of “modules” that you can run individually on a ramdump:

$ vol --help |& less -S # show installed plugins


$ vol -f brainram.dmp windows.pslist.PsList # list running processes
$ vol -f brainram.dmp windows.netstat.NetStat # list open network connections

Q46: How many processes were running?


Q47: What was the name of the exe-file of process 1612?
14 Carrier, B. (2005). File system forensic analysis. Addison-Wesley Professional. Online: https://raw.githubusercontent.com/Urinx/Books/master/Forensic/File%20System%20Forensic%20Analysis.pdf
15 https://en.wikipedia.org/wiki/B-tree
16 https://www.volatilityfoundation.org/

Q48: There was a connection from ?.?.?.?:X to 193.10.9.5:443. What was port X?
Q49: How many occurrences of the password dsvcs2 can you find in the blob? Hint:
strings
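One thing to keep in mind with the strings hint: by default GNU strings looks for 7-bit ASCII text, while Windows keeps a lot of text in RAM as UTF-16LE (strings -e l searches for that encoding). The following Python sketch counts a search term in both encodings. The function name and the demo file are made up for illustration; the demo data is not from the lab files.

```python
import mmap
import tempfile
from pathlib import Path


def count_occurrences(path: Path, text: str) -> dict[str, int]:
    """Count non-overlapping occurrences of text in a file, per encoding."""
    counts = {}
    with open(path, "rb") as fh, mmap.mmap(fh.fileno(), 0, access=mmap.ACCESS_READ) as mem:
        for encoding in ("ascii", "utf-16-le"):
            needle = text.encode(encoding)
            count = 0
            pos = mem.find(needle)
            while pos != -1:
                count += 1
                pos = mem.find(needle, pos + len(needle))
            counts[encoding] = count
    return counts


# Demo on a small throwaway file (hypothetical data, not the RAM dump).
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp, "sample.bin")
    sample.write_bytes(
        b"\x00junk" + "secret".encode("ascii") + b"\xff" + "secret".encode("utf-16-le") * 2
    )
    demo_counts = count_occurrences(sample, "secret")

print(demo_counts)  # {'ascii': 1, 'utf-16-le': 2}
```

The file is memory-mapped so that a multi-gigabyte dump does not have to be read into memory in one go.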

3 Part 3—Autopsy: Windows artifacts (brain.raw)


Now that we know a bit about working with disks, we will instead focus more on plain old files. We will
look at them using Autopsy.

3.1 Running Autopsy


Autopsy is the graphical interface to The Sleuth Kit. You can use it to explore a disk image. Autopsy
also helps automate some types of analysis of the disk in the form of “ingestion modules”.
Autopsy is free and you can easily install it on your own computer. Note that in the real world,
PC forensics is more commonly done using commercial tools like EnCase, FTK, X-Ways or Magnet
AXIOM.
Unfortunately, Autopsy runs very poorly on Kali Linux (the version installed with apt is also very
old). So for this part of the lab I would suggest you boot Windows. In the cs2lab computer room
you can just reboot, and during the boot you will get a menu where you can choose Windows. Note
that you can reach the Windows partition from /mnt/winc on Kali, if you need any files accessible
from both operating systems. Alternatively (if you don’t have access to Windows), you can install
Autopsy into a snapshot of brainvm.zip, the VM you will later be using in Part 4.

3.2 Installed programs


Q50: There is a web browser installed (that is not made by Microsoft). What is its name?

3.3 Recycle bin—actual dumpster diving


Q51: There is a deleted WEBP file in the recycle bin. What was its original name?

3.4 Zone.Identifier
When you download a file from the internet in Windows 10, a special “alternate
data stream” named Zone.Identifier is added to the file to indicate that it was downloaded from
the internet.
Q52: There are two files in the C:\difo_carve_me directory. According to their respective
Zone.Identifier streams, where were the files downloaded from?
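The content of a Zone.Identifier stream is a small INI-style text, so it is easy to parse once you have extracted it. Below is a minimal Python sketch. The sample text and its URLs are placeholders (not values from the lab image), and the helper name is made up. ZoneId=3 is the “Internet” zone.

```python
import configparser

# Example stream content in the format Windows 10 typically writes.
# The URLs are placeholders, not values from the lab image.
SAMPLE = """[ZoneTransfer]
ZoneId=3
ReferrerUrl=https://example.com/downloads/
HostUrl=https://example.com/downloads/file.zip
"""

# Windows security zones by number.
ZONES = {
    0: "Local machine",
    1: "Local intranet",
    2: "Trusted sites",
    3: "Internet",
    4: "Restricted sites",
}


def parse_zone_identifier(text: str) -> dict[str, str]:
    """Return the key/value pairs of the [ZoneTransfer] section."""
    parser = configparser.ConfigParser(delimiters=("=",), interpolation=None)
    parser.optionxform = str  # keep key case (ZoneId, HostUrl, ...)
    parser.read_string(text)
    return dict(parser["ZoneTransfer"])


info = parse_zone_identifier(SAMPLE)
print(ZONES[int(info["ZoneId"])])  # Internet
print(info.get("HostUrl"))         # https://example.com/downloads/file.zip
```

Not every stream has ReferrerUrl or HostUrl (it depends on the program that saved the file), which is why the sketch uses `.get()` for those keys.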

3.5 Windows registry


Windows keeps a kind of database called the “Windows Registry”. This database lives partly in
RAM and partly on disk in so-called “hive files”. The database stores all kinds of things—everything
from system settings to hashed user passwords (maybe you remember SAM and SYSTEM from
part 1?) to lists of recently opened Microsoft Word documents for a specific user.

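In Autopsy, navigate to the \Windows\System32\Config\SYSTEM registry hive. From there,
look at the Data Content > Application tab (bottom right window) and go to
\CurrentControlSet\Control\TimeZoneInformation (see footnote 17). In an offline hive file the key
may appear under \ControlSet001 instead, since CurrentControlSet only exists on a running system.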
Q53: How many hours is the timezone off from UTC?
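The TimeZoneInformation key stores the bias in minutes, defined so that UTC = local time + bias. Registry viewers usually show the DWORD as an unsigned number, so a negative bias looks like a huge value. The Python sketch below does the conversion; the function name is made up and the example values are illustrative, not taken from the lab image.

```python
def bias_to_utc_offset_hours(raw_dword: int) -> float:
    """Convert a Bias/ActiveTimeBias DWORD into a UTC offset in hours."""
    # Reinterpret the unsigned 32-bit registry value as a signed integer.
    minutes = raw_dword - 0x1_0000_0000 if raw_dword >= 0x8000_0000 else raw_dword
    # UTC = local time + bias, so local time is UTC - bias.
    return (-minutes) / 60


print(bias_to_utc_offset_hours(0xFFFFFFC4))  # 1.0  (bias -60 min, i.e. UTC+1)
print(bias_to_utc_offset_hours(480))         # -8.0 (bias 480 min, i.e. UTC-8)
```

ActiveTimeBias is the value currently in effect (including daylight saving time), while Bias is the base value for the time zone.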

Each user has a hive-file in \Users\MYUSERNAME\NTUSER.dat. Navigate to it in Autopsy and
browse its \Software\Microsoft\Windows\CurrentVersion\Explorer\WordWheelQuery. This
is a list of Windows search queries.
Q54: List the search queries, one per line.
17 https://learn.microsoft.com/en-us/windows/win32/api/timezoneapi/ns-timezoneapi-time_zone_information
Similarly, there is \Users\MYUSERNAME\AppData\Local\Microsoft\Windows\UsrClass.dat, which
has a similar role to NTUSER.dat. Notably, it contains “shell bags” 18, which are essentially the
settings that Windows Explorer keeps for each directory (window size, big icons vs small icons,
etc.). They are interesting because they contain timestamps of when a user was looking at that directory
using Explorer, so we can use them to try to prove that a person knew about the contents of that
directory.
Now, go to Tools > run ingestion module > brain.raw and run only the Recent Activity
ingestion module. You should see it start loading in the bottom right corner of the screen. Once
it is done, go to the Data Artifacts > Shell Bags view on the left-hand side of the screen.
Q55: When was the shell bag for C:\difo_gone_small last modified in local time?

3.6 Windows event log


Windows has a system of logs called the “Windows Event Logs” in C:\Windows\System32\winevt\Logs.
In Autopsy you can export each evtx-file and then parse it using Eric Zimmerman’s EvtxECmd
tool 19. Alternatively there is the Python tool python-evtx 20 that works on Linux.
Q56: When was the last time this system booted up? Hint: the relevant log is named
Diagnostics Performance .
Q57: Which normal user was the last to log in? Hint: check the SECURITY log. The user
should also have a directory in C:\Users (so e.g. “system” is not an interesting user here).

3.7 Prefetch
If you did not already run the Recent Activity ingestion module, then do so now (only do it
once per case or weird things happen). Now go to Data Artifacts > Run programs (look for
Prefetch in the Comments field).
Q58: How many times has chrome.exe run, according to windows prefetch?

3.8 (Optional) Eric Zimmerman’s tools


A guy named Eric Zimmerman has made forensic tools21 for looking at various Windows artifacts.
His tools only run on Windows. To streamline the lab assignment I did not include all of them
(especially since Autopsy handles some of these anyway). However, if you are curious you can try
them yourself. Try for example parsing prefetch, shimcache and shell bags.

3.9 (Optional) Bulk Extractor


bulk_extractor is an automated carving and text search tool. It can do stuff like carve out
every credit card number, every email address and so on.

$ bulk_extractor -o mybulkdir brain.raw

3.10 (Optional) Automatic timelining with plaso


There is a tool called Plaso that can parse disk images, log files, file system metadata, and so
on, and translate it all into one giant timeline (CSV-file). You can try it if you want. Note that
Autopsy on Windows has an ingestion module that runs Plaso for you. Autopsy even has a graphical
viewer for Plaso timelines. On Linux, it is usually easiest to run the docker-version of Plaso.

4 Part 4—Forensic Experiments (brainvm.zip)


Most forensic artifacts are generated by software. To make conclusions about an artifact, we have
to research the behavior of the software that generated it. Often, the simplest way to do this
is to treat the software like a black box and do empirical experiments. In this part of the lab
we will explore how to use VirtualBox to do such experiments. Use the following Windows VM:
brainvm.zip, but if you have a running Windows 10 system that is fine too.

18 13Cubed Shell Bags https://www.youtube.com/watch?v=YvVemshnpKQ
19 pick the EvtxECmd for .NET 6 https://ericzimmerman.github.io/
20 https://pypi.org/project/python-evtx/
21 https://ericzimmerman.github.io/

4.1 Alternative A: Virtualbox logical Acquisition


The physical acquisition using snapshots (Alternative B, described in Section 4.2 below) is quite slow. When we are just
interested in specific files, copying those files out is faster than copying the entire VM. VirtualBox
has a feature called “shared folder” which allows you to do this. Note that this feature is part of
the “VirtualBox Extension Pack” that you can download from the VirtualBox website. You can
configure a shared folder for a VM by going to the settings of that VM. In Windows you also need
to install “VirtualBox Guest Additions” into the VM itself. In the Windows VM the shared folder
will pop up as a network share. Outside the VM the shared folder is just a directory.

4.2 Alternative B: Virtualbox physical Acquisition


VirtualBox has a feature called “snapshots” which allows you to “pause” and store the current state
of the VM (storage, RAM, cpu registers, etc.) and resume it later. Our experiments can therefore
take on the following structure:
1. Take snapshot A.
2. Use the VM to do something, pretending to be a normal user.
3. Take snapshot B.
4. Compare A and B.
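For a logical acquisition, the “Compare A and B” step can be as simple as hashing every file in two copied-out directory trees and listing what was added, removed or changed. The following Python sketch shows the idea. The function names and the demo files are made up for illustration, and the demo uses two throwaway directories in place of real acquisitions.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Hash a file in chunks so large files do not need to fit in RAM."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def manifest(root: Path) -> dict[str, str]:
    """Map each file's path (relative to root) to its SHA-256 hash."""
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def compare(before: dict[str, str], after: dict[str, str]) -> dict[str, list[str]]:
    """List files that were added, removed or changed between two manifests."""
    common = before.keys() & after.keys()
    return {
        "added": sorted(after.keys() - before.keys()),
        "removed": sorted(before.keys() - after.keys()),
        "changed": sorted(k for k in common if before[k] != after[k]),
    }


# Demo: two throwaway directories standing in for acquisitions A and B.
with tempfile.TemporaryDirectory() as tmp:
    a, b = Path(tmp, "A"), Path(tmp, "B")
    a.mkdir()
    b.mkdir()
    (a / "History").write_bytes(b"old")
    (b / "History").write_bytes(b"old + new search")
    (a / "same.txt").write_bytes(b"same")
    (b / "same.txt").write_bytes(b"same")
    (b / "Cookies").write_bytes(b"fresh")
    diff = compare(manifest(a), manifest(b))

print(diff)  # {'added': ['Cookies'], 'removed': [], 'changed': ['History']}
```

The files that show up as added or changed are good candidates to look at more closely, since they were touched by whatever you did between the two acquisitions (or by Windows doing background work at the same time).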

A VirtualBox VM consists of a .vbox text file of settings and a .vdi file (or other file format)
that represents the virtual hard disk of the VM. When you take a snapshot, VirtualBox will create
another .vdi file (disk) and a .sav file (RAM and other state). So if we take a snapshot of a VM
and then copy all .vdi and .sav files, we have our acquisitions. If we want to be prim and
proper, we would save hashes of all those files so that their integrity can be verified later, but this is a bit overkill for our
experiments. Below is an example of what it might look like:

myvm/
    myvm.vbox                                          # metadata and settings
    myvm.vdi                                           # BIG original hard disk (not diff)
    Snapshots/
        {01a51ac7-33cc-4d98-b1a1-c11d7bb15b26}.vdi     # currently running snapshot
        2023-09-07T15-42-02-310924000Z.sav             # snapshot A RAM
        {01a51ac7-33cc-4d98-b1a1-c11d7bb15b26}.vdi     # snapshot A disk (diff)
        2023-09-07T16-28-23-410287000Z.sav             # snapshot B RAM
        {f8b691e8-3f27-4c1a-88e0-82d9cfdb2d60}.vdi     # snapshot B disk (diff)

DISK. To use these files, we must convert them into something our tools can analyze. Autopsy
claims to be able to read “VM image” files but they are lying. To get Autopsy to read a snapshot
disk, we must “flatten” it. You can do this in the VirtualBox GUI by going to Tools, clicking the
≡ symbol, then going to Media. In here you should see the snapshot disks for all VMs with their
UUID-names. Right-click the snapshot and choose to copy it to a new VMDK file somewhere. This will
“flatten” the snapshot so that it contains the whole disk. Once it is done copying, convert the VMDK file
to raw:

$ qemu-img convert -f vmdk -O raw mydisk.vmdk mydisk.raw

RAM. Converting the .sav file into something that Volatility can understand
involves running the snapshot and then dumping RAM from the running VM. Since this is slow
(many extra steps), I recommend running dumpvmcore (see the full command below) just before
taking the snapshot, instead of later trying to convert the .sav file.

1. Power off the VM.
2. Restore the snapshot (the snapshot should now be chosen but the VM is still powered off).
3. Run VirtualBoxVM --start-paused --startvm "myvmname" & to start the restored VM
paused.
4. Run VBoxManage debugvm "myvmname" dumpvmcore --filename myram.dmp to dump the
RAM of the paused VM to myram.dmp.
5. Power off the VM.

If you forget the & you will need to run the second command in a separate terminal window
(alternatively, do CTRL+Z and then run bg to put the first command into the background).

4.3 Browser artifacts: Google chrome


Do a forensic experiment with Google Chrome. Start Chrome in the Windows VM and google for
something unique (easy to search for), for example "winniethepooh". Close Google Chrome.
Now perform either a logical or physical acquisition. The former is probably faster and easier.
Q59: Search through your acquisition. What files seem to contain traces of your search?
