Mount SMB - Pcap Reconstructing File
DFRWS APAC 2024 - Selected Papers from the 4th Annual Digital Forensics Research Conference APAC
ARTICLE INFO

Keywords: File systems; Network forensics; File extraction; Digital forensics; Server message block

ABSTRACT

File system and network forensics are fundamental in forensic investigations, but are often treated as distinct disciplines. This work seeks to unify these fields by introducing a novel framework capable of mounting network captures, enabling investigators to seamlessly browse data using conventional tools. Although our implementation supports various protocols such as HTTP, TLS, and FTP, this work will particularly focus on the complexities of the Server Message Block (SMB) protocol, which is fundamental for shared file system access, especially within local networks.

For this, we present a detailed methodology to extract essential file system data from SMB network traffic, aiming to reconstruct the share's file system as accurately as the original. Our approach goes beyond traditional tools like Wireshark, which typically only extract individual files from SMB transmissions. Instead, we reconstruct the entire file system hierarchy, retrieve all associated metadata, and handle multiple versions of files captured within the same network traffic. In addition, we also investigate how file operations impact SMB commands and show how these can be used to accurately recreate user activities on an SMB share based solely on network traffic. Although both methodologies and implementations can be applied independently, their combination provides investigators with a comprehensive view of the reconstructed file system along with the corresponding user activities extracted from network traffic.
1. Introduction

File system analysis, as described by Brian Carrier in 2005, is a fundamental part of any forensic investigation (Carrier, 2005). It involves the analysis of a given file system, including its structures, to recover deleted files, extract metadata such as timestamps, or harness specific features such as journals or snapshots (Kim et al., 2012; Hilgert et al., 2018). In certain scenarios, performing file system analysis may not be practical, for instance, when there is no physical access to the device or when critical files on persistent storage have already been modified or deleted. In these instances, network traffic can help bridge this gap.

In general, network forensics deals with a multitude of tasks, such as the identification of relevant IP addresses, the analysis of protocols, and, consequently, the extraction of data. Since data can be transferred over the network in arbitrary ways, there is no universal solution for file extraction, and dedicated methods must be implemented to deal with transferred files. Besides network protocols supporting file transfer, such as HTTP, SMB, or FTP, the rise of distributed file systems has resulted in more and more file systems utilizing a network for data sharing, either by building on top of existing protocols or by implementing their own. Consequently, many file system artifacts can be present within captured network traffic.

Currently, standard tools such as Wireshark¹ provide only limited possibilities to deal with and analyze these files in transit. Typically, they only support extracting them from the network capture. However, we found that in most cases these transmissions contain further information valuable for forensic investigations, such as file system hierarchies, timestamps, or other metadata, which is usually neglected.

For this reason, this work aims to close the gap between file system and network forensics. In this research, we focus on the Server Message Block (SMB) protocol, which is extensively used for file transfers on the Windows operating system. SMB is frequently used within local corporate
networks, offering clients access to shared files and directories. Therefore, analyzing SMB is critical for reconstructing events in incidents such as ransomware attacks or data exfiltration. Although Wireshark is already capable of extracting files transferred via SMB from network captures, it by no means harnesses all of the information available. To address this, we created a methodology to recreate a file system representation from SMB network traffic, including the files' content, and to reconstruct its original hierarchy, timestamps, and other metadata. Moreover, we developed a framework that implements our methodology and is capable of mounting SMB network traffic as a file system. In addition to SMB, this framework also supports other network protocols, such as FTP and HTTP.

Complementing the reconstruction of a file system from network traffic, we take a closer look at the relationship between a user's actual file operations and the resulting SMB network traffic. Understanding this relationship enables us to reconstruct user interactions from captured network traffic. In summary, the contributions of our work are as follows:

• An analysis of the steps required to reconstruct a file system from SMB network traffic.
• A framework for mounting SMB network traffic as a file system, including its original hierarchy and metadata, which also supports FTP and HTTP (Hilgert et al., 2024a).
• A novel method and implementation for SMB Command Fingerprinting used to reconstruct user file operations from SMB network traffic (Hilgert et al., 2024b).

Section 2 provides an overview of the fundamentals of the SMB protocol. In Section 3, we show the steps necessary to reconstruct the original file system of the SMB share from captured SMB network traffic, and in Section 4 we present our implementation that allows investigators to mount network captures. Afterwards, in Section 5, we explore the possibilities of reconstructing actual file operations from captured SMB commands. Section 6 presents related work in this area, before we conclude in Section 7.

* Corresponding author. Authors: J.-N. Hilgert, A. Mahr, M. Lambertz.
¹ https://www.wireshark.org.
https://doi.org/10.1016/j.fsidi.2024.301807

2. Server Message Block protocol

SMB protocol versions 2 and 3 were introduced with Windows Server 2008 and Windows Server 2012, respectively, and are described in Microsoft's specification (Microsoft Corporation, 2024), which includes information about supported commands and parameters, as well as descriptions of the network packet structures for sending requests and responses. This section provides a basic overview of the SMB protocol to aid in understanding the subsequent discussions on file system and file operation reconstruction.

Packet Structure. Every SMB request and response starts with a 64-byte SMB header that features a protocol identifier, flags (e.g., to indicate whether it is a request or a response), and two bytes that denote the SMB command type. Compound requests or responses can be used to include multiple commands linked together in a single packet. In these instances, the header contains an offset pointing to the subsequent 8-byte-aligned SMB header in the packet. Additionally, to correlate requests with their responses, each SMB header contains an 8-byte message ID.

Moreover, SMB headers include a 4-byte field that indicates the status of a response. In requests, this field is disregarded and must be zeroed out. An exhaustive list of possible status codes is available in the [MS-ERREF] document by Microsoft. A status field filled with zeros denotes a successful response.

Connection Setup. All SMB dialects, that is, versions, support direct TCP as their transport protocol, typically using port 445 on the server side. Dialect 3.1.1 also introduces support for QUIC. Initially, the client sends an SMB2 NEGOTIATE request to inform the server of the SMB dialects it supports. The server then selects its preferred dialect for subsequent communications in its SMB2 NEGOTIATE response. This is followed by SMB2 SESSION_SETUP requests and responses to establish an authenticated session, which include key details about the domain, host, and user name used within the session. To access a specific server share, the client sends TREE_CONNECT messages with the full path of the share. If successful, the TREE_CONNECT response provides the tree ID, which is used in the SMB header for subsequent requests related to this share.

Commands. In the SMB protocol specification, Microsoft lists several commands that fall under the File Access category of SMB messages. The most important ones for the upcoming sections are introduced next.

• SMB2 CREATE requests are used to request access to, or the creation of, a file or directory. They include 4 bytes to specify the desired access, given in the SMB2 Access Mask encoding. Additionally, they contain flags indicating what action the server should take if the file already exists, further options relevant for opening or creating the file, as well as file attributes as defined in Microsoft's [MS-FSCC] specification. The response to an SMB2 CREATE request contains information about the status of the operation (e.g., success) as well as the creation, last access, last write, and change timestamps of the file. It also returns a 16-byte FileId, which is used to identify the accessed or created file in subsequent requests.
• SMB2 CLOSE requests are sent by a client to close an opened file or directory by specifying its FileId.
• SMB2 READ requests contain the FileId of the file a client wants to read data from, along with the offset and the length to be read. Consequently, a successful response contains the requested data.
• SMB2 WRITE requests work in a similar way and are used to write data of a certain length to a certain offset of a file, identified by its FileId. A successful response then contains the number of bytes that have been written.
• SMB2 IOCTL commands can be used by the client to issue file system control (FSCTL) or device control (IOCTL) commands to the server over the network. A list of permitted FSCTL commands can be found in Section 2.3 of the [MS-FSCC] specification.
• SMB2 QUERY_INFO requests, known as GetInfo requests in Wireshark, are used to gather details about files, quotas, security, or the underlying storage system, based on the specified 1-byte Information Type. Additionally, the specific information requested is determined by the 1-byte File Information Class, such as FileBasicInformation for timestamps and attributes. When the information type is SMB2_0_INFO_FILESYSTEM, the response includes detailed information about the share's file system. Requesting the FileFsAttributeInformation class, for instance, provides the file system's attributes and its name.
• SMB2 SET_INFO commands are used to update specific information on files and other objects. The details to be updated are defined by the information type and information class, along with the actual data to be applied. For instance, setting FileDispositionInformation is used to mark files for deletion.
• SMB2 QUERY_DIRECTORY requests, known as FIND requests in Wireshark, are used to retrieve details about the contents of a directory. In addition to the FileId of the target directory and the specific information class to be returned, the request includes a Unicode search pattern, which can also be a wildcard. The server provides the requested information for each match to this search pattern.

In subsequent sections, we use abbreviated forms of these commands, e.g., CREATE for SMB2 CREATE.

2.1. Create context

Within a CREATE request, the client can also include Create Context structures to request additional information. Some common ones are:
J.-N. Hilgert et al. Forensic Science International: Digital Investigation 50 (2024) 301807
• Maximal Access Request (MxAc): With this context, the client requests the maximal access it has on the opened file or directory based on the current session. The response includes the corresponding access mask.
• Query On Disk ID (QFid): If this context is sent, the server responds with the corresponding 8-byte on-disk FileId as well as the ID of the volume to which the opened file belongs.
• Request Lease V2 (RqLs): The client requests a lease for the opened file.

Leases were introduced in SMB 2.1 to enhance client-side caching, effectively replacing oplocks. To utilize this feature, a client requests a lease for an opened file, specifying the desired mode, such as read, write, or handle caching. In response, the server grants the appropriate lease, allowing the client to cache reads and writes locally and thereby reduce the network traffic associated with SMB operations. When a lease is broken, for instance due to external changes in a directory for which a client has an open file handle, the server issues a Lease Break Notification to the client. The client must then act based on the lease's mode. For example, if a read caching lease is broken, the application is required to purge all cached data. More detailed information on lease breaks is available in the SMB specification.

3. File system reconstruction

In order to reconstruct a file system from network traffic, it is important to consider what data actually makes up a file system. According to Brian Carrier, the data of a file system belongs to one of the five data categories presented within his reference model (Carrier, 2005).

• Metadata Category: Metadata encompasses data describing files, such as their timestamps or access rights.
• File Name Category: File and directory names and their relationship to each other are stored in this category, which is why it essentially describes the file system hierarchy.
• Content Category: The actual content of the files within the file system belongs to this category.
• File System Category: Data in this category defines the structure of the file system itself, e.g., its size or where other data is stored.
• Application Category: This category consists of all the data that the file system does not strictly need in order to read and write data, but that is added for special features, e.g., journaling.

In the subsequent sections, we outline our approach for extracting data from SMB network traffic corresponding to the previously mentioned categories of file system data. In addition, we discuss certain peculiarities encountered during the reconstruction of a file system from SMB network traffic.

3.1. Metadata

Most metadata, such as timestamps or file size, can be obtained from the corresponding SMB2 CREATE response. While this response also includes file attributes, these do not necessarily match all attributes of the share's original file system. Instead, the file attributes used in SMB are those detailed in [MS-FSCC], as mentioned earlier. To associate information extracted from subsequent requests with a specific file or directory, we also extract the 16-byte FileId from the CREATE response, along with the corresponding TreeId, and store them in an internal mapping table.

QUERY_INFO requests and responses can provide additional metadata, as this command is used to retrieve various types of file information. Timestamps and file attributes can be obtained from the FileBasicInformation class, while the FileStandardInformation class includes details such as the allocation size and the end-of-file value, which indicates the file's first unoccupied byte, i.e., its end. Further metadata can also be found in QUERY_DIRECTORY responses, as described in the next subsection. Finally, metadata can also be extracted from SET_INFO requests targeting metadata such as timestamps.

3.2. File names

Our main method for obtaining file and directory names is through CREATE requests. These include not only the name of the requested file or directory but also its complete file path relative to the root directory of the share, which is derived from the TREE_CONNECT request, provided it is present in the network capture. This method enables us to reconstruct parts of the share hierarchy, including the parent directories of the requested file. However, this reconstruction is only performed when a corresponding successful CREATE response is received, ensuring that only existing or newly created files are reconstructed.

Another crucial command for hierarchy reconstruction is QUERY_DIRECTORY. The output of this command typically includes matches to a specified search pattern. For standard interactions with the share, this pattern is usually set to the wildcard *. Consequently, the server returns all available files in a directory, up to a specified buffer length. The details stored in the corresponding responses are then used to expand the file hierarchy. Additionally, depending on the query sent, this information contains at least the basic metadata for the files matching the pattern. Extracting files in this manner results in the creation of hollow files, as described in Section 3.5. Similar to metadata, we also use SET_INFO requests to gather information about files that have been renamed.

3.3. Content

File contents can mainly be retrieved by leveraging READ and WRITE commands. To achieve this, we first identify all such commands and correlate them with the actual files by matching their FileIds against our internal mapping table. Then, we use the offset and length fields within the commands to accurately reconstruct the file content.

3.4. File system and application data

Information about the file system of the underlying share can be extracted from QUERY_INFO responses when a File System Information Class is requested. Section 2.5 of the [MS-FSCC] specification provides a detailed overview of the available classes and their corresponding data. In our experiments, we primarily encountered requests and responses for the FileFsVolumeInformation and FileFsAttributeInformation classes. These classes provide, respectively, details about the volume on which the file system is mounted, such as its creation date or serial number, and a list of attributes describing the file system. Since each file system has a unique layout and internal structures, the file system details contained in SMB network traffic do not allow for an exact replication of the original file system. This also applies to any data that belongs to the application category. However, as shown in the previous sections, this is not necessary for reconstructing the data most critical for forensic analysis.

3.5. Hollow files

A hollow file is a file whose content does not appear in the SMB network traffic. Nevertheless, as mentioned previously, various SMB commands already contain extensive metadata, which we use to create a hollow file that includes the correct file name, path, attributes, and timestamps. This method aims to provide the most comprehensive view possible of the original file system on the share. If a corresponding READ or WRITE request for a hollow file is identified, we populate the file with its content, thereby turning it into a regular file. Fig. 1 shows an example of three SMB requests and responses and how we use their information to reconstruct the file system.
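Sections 3.1 to 3.5 can be summarized as a small state machine. The following is a minimal Python sketch under stated assumptions, not the framework's implementation: it assumes the SMB commands have already been parsed, with each request correlated to its response. All class and method names (ShareReconstructor, on_create_response, on_directory_entry, on_data) are hypothetical. Successful CREATE responses populate the (TreeId, FileId) mapping table, QUERY_DIRECTORY entries yield hollow files, and READ/WRITE data is placed at its offset. Version history (Section 3.6) is deliberately ignored.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


# Hypothetical in-memory model; names are illustrative only.
@dataclass
class ReconstructedFile:
    path: str                                  # path relative to the share root
    timestamps: Dict[str, int]                 # e.g. {"last_write": ...}
    end_of_file: Optional[int] = None          # from CREATE or directory metadata
    chunks: List[Tuple[int, bytes]] = field(default_factory=list)  # (offset, data)

    @property
    def hollow(self) -> bool:
        # A hollow file has metadata but no content observed in the traffic.
        return not self.chunks

    def content(self) -> bytes:
        if self.hollow:
            return b""
        size = max(off + len(data) for off, data in self.chunks)
        if self.end_of_file is not None:
            size = max(size, self.end_of_file)
        buf = bytearray(size)                  # bytes never seen stay zero-filled
        for off, data in self.chunks:          # later chunks overwrite earlier ones
            buf[off:off + len(data)] = data
        return bytes(buf)


class ShareReconstructor:
    def __init__(self) -> None:
        self.files: Dict[str, ReconstructedFile] = {}
        # Internal mapping table: (TreeId, 16-byte FileId) -> file path
        self.handles: Dict[Tuple[int, bytes], str] = {}

    def _upsert(self, path: str, timestamps: Dict[str, int],
                end_of_file: Optional[int]) -> ReconstructedFile:
        entry = self.files.get(path)
        if entry is None:
            entry = ReconstructedFile(path, dict(timestamps), end_of_file)
            self.files[path] = entry
        else:
            entry.timestamps.update(timestamps)
            if end_of_file is not None:
                entry.end_of_file = end_of_file
        return entry

    def on_create_response(self, tree_id: int, file_id: bytes, path: str, status: int,
                           timestamps: Dict[str, int],
                           end_of_file: Optional[int] = None) -> None:
        # Only successful CREATE responses (status 0) add files; a non-zero NT
        # status such as 0xC0000034 (STATUS_OBJECT_NAME_NOT_FOUND) is ignored.
        if status != 0:
            return
        self._upsert(path, timestamps, end_of_file)
        self.handles[(tree_id, file_id)] = path

    def on_directory_entry(self, dir_path: str, name: str,
                           timestamps: Dict[str, int],
                           end_of_file: Optional[int]) -> None:
        # QUERY_DIRECTORY results create hollow files until content is seen.
        path = f"{dir_path}\\{name}" if dir_path else name
        self._upsert(path, timestamps, end_of_file)

    def on_data(self, tree_id: int, file_id: bytes, offset: int, data: bytes) -> None:
        # READ response data or WRITE request data, placed at the command's offset.
        path = self.handles.get((tree_id, file_id))
        if path is not None:
            self.files[path].chunks.append((offset, data))
```

Keying the mapping table on both TreeId and FileId mirrors Section 3.1. In this sketch, data for an unknown handle is simply dropped. A real parser might instead buffer it until a matching CREATE response is found.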
3.6. Version history

Unlike traditional file systems, which typically provide a snapshot of files and directories at a specific point in time, network captures contain data over a continuous period. Consequently, the same file can be accessed multiple times during a capture period. If the file changes during this time, different versions, including older and more recent ones, may be present in the network capture. Since these previous versions might not be recoverable from the persistent storage of the network share itself, it is crucial to extract them when reconstructing the file system.

To facilitate this, we monitor the timestamps associated with each file in our reconstructed file system. A change in these timestamps indicates a modification to the file. In such instances, we generate a new version of the file, denoted by appending "@<version>" to its filename. It is important to note that new versions arise not only from files being read but also from write operations detected in the network traffic.

4. Mounting network traffic

Having detailed in the previous section how an original file system is reconstructed from SMB network traffic, this section outlines our implementation for mounting acquired network traffic to achieve such a reconstruction.

Our approach extends traditional network forensics, which typically focuses on packet-level or protocol-level data analysis. Instead, we enable an analysis similar to traditional storage forensics, where investigators can navigate through network data using standard forensic tools and techniques. This includes operations such as calculating file hashes, searching for YARA signatures, or employing other sophisticated tools.

Furthermore, our solution tackles a major challenge in network forensics: the performance drop caused by the size of network traffic captures, which can consist of countless packets and require lengthy loading periods in analysis tools such as Wireshark. We address this with a specialized index file that stores the layout of the reconstructed file system. This eliminates the need to repeatedly parse and examine the network capture upon mounting, thereby enhancing performance and accelerating the analysis process.

4.1. Overview

Our implementation utilizes Filesystem in Userspace (FUSE)², which facilitates the creation of customizable and mountable file systems. Unlike traditional storage forensics, where a volume is mounted, we process network captures, supporting the PCAP and PCAPNG formats. We analyze and parse the information within these captures so they can be mounted and accessed as a regular file system. For this purpose, our implementation creates virtual files for each network protocol it supports, e.g., TCP or SMB files. As outlined in the previous section, this involves extracting content, metadata, and filenames and integrating these components using the methods provided by libfuse. Naturally, our file system is read-only and thus does not allow writing or altering the data.

To enhance our implementation's modularity, we utilize a recursive approach to analyze the various network protocols within network captures. In a first run, virtual files are created for the network capture files themselves. Then, other protocols, typically TCP and UDP, are parsed within these files, and new corresponding virtual files are created. These virtual files contain a set of offsets and lengths that point directly into the lower virtual file, as depicted in Fig. 2. When accessing data, such as reading a TCP file, our implementation leverages these pointers to retrieve and assemble the data efficiently.

Similarly, pointers within an SMB file point to data in lower files, e.g., TCP files. Metadata for SMB files is extracted during an initial parsing step and then stored for each SMB file. Since this can be a time-consuming task, our implementation utilizes an index file, which stores all relevant information about the detected files, their sets of offsets, and any metadata for these files, and which is typically only a fraction of the size of the associated network capture file.

Additionally, our implementation supports arbitrary transformation steps between virtual file layers. For instance, if data is encrypted, reading a virtual file may first access the encrypted data from a lower file, decrypt it (provided that decryption keys are available), and then present the decrypted data seamlessly in the mounted file system, maintaining transparency throughout the process. This concept allows for the support of more complex network protocols such as TLS.

4.2. Structure

By default, a separate directory is automatically created within the mounted file system for each supported protocol, in which the corresponding parsed virtual files are stored, as detailed in Listing 1. File names start with the index of the source file. For UDP and TCP files, which usually directly reference the network capture, the index is uniformly '0' in our example, indicating a single capture-file.pcap. This index is followed by the offset at which the file begins. For example, TCPFILE12 starts at offset 770 within the network capture file. This naming pattern also extends to other protocols, such as the HTTP file banner.svg, which points to TCP file 31 and starts at offset 22434. All necessary offsets for file construction are initially stored in memory, but can optionally be written to a special index file on disk to facilitate faster mounting by avoiding repeated data parsing.

² https://github.com/libfuse/libfuse.
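The index file trades one full parsing pass for fast remounts. The following Python sketch illustrates that idea under explicit assumptions and does not describe the framework's actual index format. The JSON layout, the function name load_or_build_layout, and the `parse` callback are all hypothetical. The sketch also assumes that the capture's size and modification time suffice to detect a stale index.

```python
import json
import os
from typing import Callable, Dict

# Hypothetical layout: virtual path -> {"lower": ..., "extents": [[offset, length], ...], "meta": {...}}
Layout = Dict[str, dict]


def load_or_build_layout(capture_path: str, index_path: str,
                         parse: Callable[[str], Layout]) -> Layout:
    """Return the virtual file layout, re-parsing the capture only if needed."""
    st = os.stat(capture_path)
    # Assumption: size and modification time are enough to tie an index to its capture.
    fingerprint = {"size": st.st_size, "mtime_ns": st.st_mtime_ns}

    if os.path.exists(index_path):
        with open(index_path, encoding="utf-8") as fh:
            index = json.load(fh)
        if index.get("capture") == fingerprint:
            return index["files"]            # fast path: no protocol parsing

    files = parse(capture_path)               # slow path: full parsing of the capture
    with open(index_path, "w", encoding="utf-8") as fh:
        json.dump({"capture": fingerprint, "files": files}, fh)
    return files
```

Because only offsets, lengths, and metadata are stored, not payload bytes, such an index stays small relative to the capture. This matches the size relationship described in Section 4.1.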
Fig. 2. Overview of data access within our implementation for mounting network captures.
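The layered data access shown in Fig. 2 can be sketched in a few lines of Python. Each virtual file is a list of (offset, length) extents into the layer below. A read is resolved by walking these extents, and an optional transform hook sits between layers. The class names and the `transform` parameter are hypothetical. In the example, `bytes.upper` merely stands in for a real decryption step such as TLS.

```python
from typing import Callable, List, Optional, Tuple


class ByteSource:
    """Anything that can serve bytes at an offset (capture file or virtual file)."""

    def read(self, offset: int, size: int) -> bytes:
        raise NotImplementedError


class CaptureFile(ByteSource):
    def __init__(self, data: bytes) -> None:
        self._data = data

    def read(self, offset: int, size: int) -> bytes:
        return self._data[offset:offset + size]


class VirtualFile(ByteSource):
    """A virtual file made of (offset, length) extents pointing into a lower layer."""

    def __init__(self, lower: ByteSource, extents: List[Tuple[int, int]],
                 transform: Optional[Callable[[bytes], bytes]] = None) -> None:
        self.lower = lower
        self.extents = extents
        self.transform = transform          # hypothetical hook, e.g. decryption

    def _gather(self, offset: int, size: int) -> bytes:
        out = bytearray()
        pos = 0                              # logical offset where the current extent starts
        for lower_off, length in self.extents:
            start = max(offset, pos)
            end = min(offset + size, pos + length)
            if start < end:                  # requested range overlaps this extent
                out += self.lower.read(lower_off + (start - pos), end - start)
            pos += length
            if pos >= offset + size:
                break
        return bytes(out)

    def read(self, offset: int, size: int) -> bytes:
        if self.transform is None:
            return self._gather(offset, size)
        # A transformation may need the whole stream, so this sketch
        # materializes the full lower data before slicing.
        total = sum(length for _, length in self.extents)
        return self.transform(self._gather(0, total))[offset:offset + size]
```

For example, with `pcap = CaptureFile(b"HDR1GET /PKT2x HTTP")`, the "TCP file" `VirtualFile(pcap, [(4, 5), (13, 6)])` reads back as `b"GET /x HTTP"` without copying any bytes at parse time.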
To offer a more complete understanding of the reconstructed file system data, it is crucial to comprehend its origins by extracting file operations from network traffic. The subsequent section introduces a novel methodology for identifying user activities within SMB network traffic. This approach can be used either separately or in conjunction with mounting the network capture for a more comprehensive view.

5. Reconstructing file operations

File operations are any interactions an application has with a file or directory. As a result, there is a strong connection between file operations and user interactions, since every user interaction may initiate a series of file operations. This section details how analyzing captured SMB network traffic can provide insights into file operations and, by extension, the underlying user interactions.

5.1. Methodology

For this purpose, we divide the process into three steps.

1. Windows API Analysis: We begin by examining the influence of various Windows API calls on the resulting SMB commands. The Windows API offers a diverse set of functions that enable applications to interact with the Windows file system, playing a crucial role in all file operations within Windows.
2. SMB Command Fingerprinting (SCF): Building on our understanding of the Windows API, we propose a novel technique to detect the execution of a Windows API call on an SMB share exclusively through the analysis of intercepted network traffic.
3. Case Study with cmd.exe: To demonstrate the effectiveness of our approach, we employ SCF rules to reconstruct specific user interactions, starting with the widely used command line utility cmd.exe. This tool was selected for its ubiquity, simplicity, and versatile file system manipulation capabilities.

The Windows API offers a wide array of functions for various tasks, including data access, system management, and networking. Functions within the Windows API that handle character data typically appear in three forms: a variant ending in A that uses Windows code pages for text processing, a variant ending in W that supports Unicode, and a basic form without a suffix. Given that the basic form ultimately relies on one of these two variants, we focus on the more contemporary W versions of these APIs where relevant. The following subsections detail the SMB commands observed when we executed a compiled C program performing a single Windows API call.

5.2.1. CreateFile

Since many Windows API methods require a file handle, it is often necessary to first open the file using the CreateFile call. In addition to the file name, it takes the desired access, the share mode, the creation disposition, and flags or attributes as arguments. These arguments thus need to be adapted to the actual use case, e.g., reading or writing.

In our experiments, we found that the arguments given to the CreateFile API call can strongly influence the resulting SMB commands sent over the network. For this reason, we present the most crucial results from our experiments.

• Calling the CreateFile API results in at least one CREATE request.
• The specified file share access, create disposition, and file attributes are reflected in the corresponding fields of the CREATE request.
• File attributes do not influence the sequence of SMB commands sent.
• Similarly, file flags, including the security flags, did not change the SMB commands sent. Instead, most of the file flags are represented in the create options field within the CREATE request.
• The desired share mode does not have an impact on the sequence of SMB commands either.
• The API parameters OPEN_EXISTING, OPEN_ALWAYS, CREATE_NEW, and CREATE_ALWAYS for the disposition are mapped to the FILE_OPEN, FILE_OPEN_IF, FILE_CREATE, and FILE_OVERWRITE_IF dispositions in SMB commands, respectively.
• Using OPEN_ALWAYS, CREATE_NEW, or CREATE_ALWAYS as a disposition adds an additional CREATE request targeting the parent directory to the sequence of SMB commands.
• If write access is requested in an API call using the OPEN_EXISTING disposition, an additional QUERY_INFO requesting the normalized name of the file is issued.

Fig. 3. SMB commands triggered by a FindFirstFile API call. The right side illustrates the outcomes when the call is made with the prior file handle remaining open.
³ https://frida.re/docs/frida-trace/.

5.2.2. FindFirstFile

This API call is used to search a directory for a specific file name or pattern, including a wildcard, and returns a search handle for subsequent searches, as well as the file information for the first matching file. Performing this call on an explicit file name or directory name results in a CREATE command for its parent directory followed by two QUERY_INFO commands, which were sent as a compound request in our
experiments, as illustrated in Fig. 3. If the client receives a response indicating a NO_MORE_FILES status after the initial QUERY_INFO requests, it does not perform additional ones. However, if the server still returns files, the client performs additional QUERY_INFO requests with a buffer length of 1,048,612 bytes until all files have been returned. Most interestingly, regardless of the search pattern specified in the API call, the SMB commands always appear to use the wildcard *, thus returning information for all files within a directory.

Since the CREATE also requests a lease, the file handle is kept open, and the CLOSE request is only sent when the DormantDirectoryTimeout is reached, which by default is set to 10 min. Alternatively, the client sends this command as soon as it receives a Lease Break Notification for the directory. While the file handle is still open, performing this API call still generates a CREATE request, as shown in the second example in Fig. 3. However, the QUERY_INFO commands are omitted in this case, and the CLOSE request is sent immediately.

The FindFirstFileEx call additionally allows us to specify attributes that the returned file must match. In our tests, the resulting SMB commands were identical to those observed for FindFirstFile.

Subsequently, the FindNextFile API call is usually used to obtain the next file that matches the search pattern. This call requires the search handle returned by FindFirstFile. However, since FindFirstFile already requests all files through SMB, matching and non-matching alike, using FindNextFile does not trigger any additional network commands compared to using FindFirstFile alone.

5.2.3. GetFileAttributes and SetFileAttributes

The GetFileAttributes API call is a straightforward way to retrieve the attributes of a file or directory by its name, eliminating the need for a preceding CreateFile call. This operation triggers a single CREATE command for the target file, with the parameters in the SMB packet set automatically, immediately followed by a CLOSE command.

Conversely, to modify file attributes, the Windows API offers the SetFileAttributes call, which requires a file name and the new attributes to apply. Following the CREATE request, which uses parameters distinct from those for attribute retrieval, a QUERY_INFO command is issued to obtain FileNormalizedNameInfo. Subsequently, the attributes are modified using a SET_INFO command directed at FileBasicInfo. The sequence ends with a CLOSE command.

5.2.4. ReadFile

To read a file, the Windows API provides the ReadFile function, which requires a file handle and the number of bytes to read. For our experiments, we obtained the file handle by performing the CreateFile API call with standard GENERIC_READ settings, yielding the SMB commands detailed in Section 5.2.1.

Calling ReadFile causes an additional READ command. Notably, the number of bytes requested in the SMB command is always rounded up to the nearest multiple of 4096, or to the actual file size if that is smaller. For example, an API call to read only 50 bytes of a large file will actually request 4096 bytes over SMB. For read operations exceeding 2,097,152 bytes, multiple READ requests are issued, using the offset parameter to request the subsequent parts of the file. These requests are transmitted consecutively without waiting for the server's response.

While the ReadFile function lacks a parameter to set a read offset directly, this can be accomplished by adjusting the file pointer with the SetFilePointer function. This adjustment also affects the offset used in the SMB READ commands. As with the read length, the offset is aligned to 4096 bytes: any specified offset is rounded down to the nearest multiple of 4096 bytes in the respective READ command.

5.2.5. WriteFile

Writing data to a file in the Windows API is performed using the WriteFile function. This function requires three key inputs: a file handle, a pointer to the buffer containing the data, and the number of bytes to write. The file handle must be obtained first, with the correct access rights set for writing. For our experiments, we created the handle using GENERIC_WRITE and OPEN_ALWAYS.

The WriteFile operation itself triggers two additional SMB requests. First, a WRITE request containing the actual data follows the CREATE response for the target file. If the data length exceeds 3,473,408 bytes, the operation is split into multiple WRITE requests, which use the length and offset fields of the SMB commands to indicate which part of the data is sent. Second, once the write operation is complete, a QUERY_INFO command is issued to retrieve FileNetworkOpenInfo, which provides details about the file status after the write.

5.2.6. CreateDirectory and RemoveDirectory

The API calls CreateDirectory and RemoveDirectory are intended for creating and deleting directories, respectively. CreateDirectory generates a single CREATE request followed by a CLOSE request. As illustrated in Fig. 4, invoking RemoveDirectory initiates a CREATE request, which is followed by a SET_INFO request that sets FileDispositionInformation to explicitly mark the directory for deletion. Finally, a CLOSE request is issued.

5.2.7. DeleteFile

The DeleteFile API call requires the name of the file to be deleted. The resulting SMB commands are similar to those of the RemoveDirectory call. However, the parameters of the CREATE request differ, and an additional QUERY_INFO command is issued to retrieve the FileNormalizedNameInformation class. This command sequence is also illustrated in Fig. 4.

Fig. 4. SMB commands originating from RemoveDirectory and DeleteFile API calls.

5.3. SMB Command Fingerprinting

Our research has shown that each Windows API call generates a distinctive sequence of SMB commands. These sequences are
characterized by two key aspects: the specific types of SMB commands issued and the parameters set within these commands. This is because API calls that require a filename generally initiate a file operation using unique parameters, such as file attributes or desired access levels. We can use this information to associate SMB command sequences with their respective API calls.

To facilitate the analysis, we propose an SMB Command Fingerprinting (SCF) approach. This method calculates an MD5 hash for each SMB packet, simplifying the identification of distinct command sequences. To ensure that the hashes are both precise and universally applicable, i.e., independent of dynamic fields, we selectively hash values depending on the specific type of SMB command. Fig. 5 illustrates which parameters are used to calculate an SCF for an SMB CREATE request.

For compound requests containing multiple SMB requests or responses in a single SMB packet, we calculate the individual SCF for each SMB command, concatenate them, and calculate the final hash of this result. To facilitate this process, we developed a utility that automatically calculates the SCFs for SMB packets in a given network capture file. Examples of SCFs resulting from various Windows API calls can be found in Table 2.

5.4. Case study: cmd.exe

While reconstructing specific Windows API calls from SMB network traffic yields valuable insights, the true strength of our approach lies in reconstructing explicit user interactions. For this purpose, we developed a set of SCF rules, each comprising one or more SCFs. Thus, these rules consider not only the individual parameters of an SMB command but also the sequence in which these commands are sent. Listing 4 provides an example of an SCF rule. This rule identifies a sequence that includes a …, a …, a …, and a …, while considering the specific parameters for creation, as well as the information type and class through the use of SCFs.

Listing 4. An SCF rule for the detection of file creation using, e.g., echo “Data” > file in SMB network traffic.

To demonstrate the practicability of this methodology, we utilized the Windows Command Line Utility cmd.exe, a ubiquitous tool across Windows systems that can be used to perform various file operations. We executed a series of commands on a mounted SMB share, captured the corresponding network traffic, and used our SCF rules to reconstruct the file operations. Although only 18 commands were executed over a span of about 2 min, the generated SMB traffic included roughly 250 SMB packets, making manual analysis impractical and unscalable. Table 1 summarizes the events that were automatically reconstructed purely from network traffic, together with the original commands executed and their timestamps.

Our findings show that our approach successfully identified 16 of 18 commands executed using cmd.exe. The exceptions were the cd … commands, which did not generate SMB commands, likely due to caching mechanisms, and hence could not be reconstructed. All other commands, including other simple directory changes, were accurately reconstructed. Notably, the mkdir Files\Other command was reconstructed as two separate events, reflecting the recursive nature of directory creation in this scenario.

It is important to note that the timestamps of reconstructed events typically lag behind the actual execution times due to the inherent delay in capturing the corresponding network packets. Therefore, the precision of these timestamps in real-world scenarios can vary significantly depending on the network configuration.

6. Related work

Over the years, multiple research efforts have explored methods to facilitate network traffic analysis. In digital forensics, research has, for example, explored the extraction of HTTP traffic events (Gugelmann et al., 2015). Other studies range from employing neural projections for the visualization of network traffic for intrusion detection (Corchado and Herrero, 2011), through incorporating 3D representations that integrate the temporal aspect (Clark and Turnbull, 2020), to using relational graphs for enhanced data exploration (Cermak et al., 2023). A 2021 study emphasized the difficulties in using visualization techniques for anomaly detection in network traffic, highlighting the persistent challenges in this research area (Corchado and Herrero, 2011).

File extraction from network traffic is a well-established practice, with current methods capable of extracting various file types from different network protocols, similar to the functionality provided by tools such as Wireshark (Choi et al., 2015; Hansen et al., 2018). However, these methods either do not support or are inadequate at extracting and presenting all available information for protocols like SMB.

Although not initially developed with digital forensics in mind, a conceptually similar approach to our SMB Command Fingerprinting emerged as early as 1992. The researchers introduced a toolkit to approximate the activity of the file system by analyzing the network
traffic of the NFS (Network File System) (Blaze, 1992). Over the years,
various studies on NFS tracing have evaluated and refined these methods, enhancing the ability to recover file system traces from passive monitoring of network traffic (Moore, 1995; Ellard and Seltzer, 2003). However, this concept has not been extended to protocols like SMB, nor has it aimed to establish a universally applicable set of rules to detect diverse user actions across different applications, as we propose with our SCF ruleset.

Furthermore, the broader domain of network traffic fingerprinting has traditionally focused on identifying specific applications rather than user interactions (Dai et al., 2013; Taylor et al., 2016). Our research instead aims to identify precise user behaviors, thus expanding the forensic capabilities of network traffic analysis.
allowing them to navigate the data and use standard file-based tools easily. Our implementation facilitates this process by offering various options for customizing the file system hierarchy, such as using IP addresses or ports, thereby merging network and file system forensics.

Although our framework supports mounting multiple protocols, including HTTP, FTP, and TLS, we particularly emphasized the SMB protocol due to its common use for file sharing. We have outlined a methodology for extracting critical data from SMB network traffic, which can be used to accurately reconstruct the file system of the share. Unlike other analysis tools such as Wireshark, which only allow the extraction of transferred files, our approach enables the reconstruction of the file system hierarchy and metadata by leveraging all available information in the captured traffic. By additionally using hollow files that lack actual data, our method offers a more comprehensive representation of the original file system.

As an additional method for SMB network analysis, we examined the unique SMB sequences resulting from Windows API calls and proposed SMB Command Fingerprinting. This method enables the identification of Windows API call executions purely from SMB network traffic and the accurate reconstruction of user activities. For this purpose, we created SCF rules that allow the precise reconstruction of commands executed through the Windows command utility. While this was merely a case study to demonstrate the applicability of our approach, it is essential to expand on this research in the future.

For example, it is crucial to broaden the SCF ruleset to encompass other applications and to explore the feasibility of distinguishing between them, such as determining which application was responsible for creating or deleting a file. In this context, it is also vital to examine different operating systems, considering various implementations of the SMB protocol, such as Samba on Linux. Furthermore, it is necessary to investigate whether failed attempts, such as unsuccessful file access, can be accurately reconstructed from SMB network traffic. To support research in this field, our framework for mounting network traffic as well as our implementation for calculating SCFs, reconstructing file operations, and our current rule set are available in our repositories (Hilgert et al., 2024a,b).
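To make the fingerprinting and rule-matching steps summarized above more concrete, the following Python sketch restates them under simplifying assumptions. It is not the implementation in the SCF repository: it assumes SMB commands have already been parsed into dictionaries, and the names `STABLE_FIELDS`, `command_scf`, `packet_scf`, and `rule_matches`, as well as the field names used, are hypothetical. Following Section 5.3, each command is hashed with MD5 over type-specific fields only, and a compound packet's SCF is the MD5 of its concatenated per-command SCFs. A rule is assumed here to match when its SCFs occur in order.

```python
import hashlib

# Hypothetical per-command selection of "stable" fields to hash. Dynamic values
# (message IDs, session/tree IDs, file IDs, timestamps) are left out so that the
# same API call yields the same fingerprint in every capture. The actual field
# choice of SCF (cf. Fig. 5 for CREATE) may differ from this illustration.
STABLE_FIELDS = {
    "CREATE": ("desired_access", "file_attributes", "share_access",
               "create_disposition", "create_options"),
    "QUERY_INFO": ("info_type", "file_info_class"),
    "SET_INFO": ("info_type", "file_info_class"),
    "WRITE": (),
    "CLOSE": ("flags",),
}


def command_scf(cmd: dict) -> str:
    """MD5 fingerprint of one parsed SMB command, using only its stable fields."""
    # Command type and direction (request/response) are always part of the input.
    parts = [f"command={cmd['command']}", f"direction={cmd.get('direction')}"]
    for name in STABLE_FIELDS.get(cmd["command"], ()):
        parts.append(f"{name}={cmd.get(name)}")
    return hashlib.md5("|".join(parts).encode("utf-8")).hexdigest()


def packet_scf(commands: list) -> str:
    """SCF of a packet: compound packets hash the concatenated per-command SCFs."""
    if len(commands) == 1:
        return command_scf(commands[0])
    joined = "".join(command_scf(c) for c in commands)
    return hashlib.md5(joined.encode("ascii")).hexdigest()


def rule_matches(rule: list, packet_scfs: list) -> bool:
    """True if the rule's SCFs appear in this order; other packets may interleave."""
    # A shared iterator makes each search resume where the previous match ended.
    remaining = iter(packet_scfs)
    return all(any(seen == wanted for seen in remaining) for wanted in rule)
```

Allowing unrelated packets between the matched SCFs is a design assumption of this sketch, made because other SMB traffic can interleave on a share; a practical matcher would presumably also correlate commands by session, tree, or file ID, which is omitted here.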
Appendix
Table 1
Comparison of actual executed cmd.exe commands and the reconstructed commands from SMB network traffic.
Table 2
SMB Command Fingerprints for various Windows API calls (columns: WinAPI, SCF, Description).
References

Blaze, M., 1992. NFS tracing by passive network monitoring. In: Proceedings of the USENIX Winter 1992 Technical Conference, pp. 333–343.
Carrier, B., 2005. File System Forensic Analysis. Addison-Wesley Professional.
Cermak, M., Fritzová, T., Rusňák, V., Sramkova, D., 2023. Using relational graphs for exploratory analysis of network traffic data. Forensic Sci. Int.: Digit. Invest. 45, 301563.
Choi, Y., Lee, J.Y., Choi, S., Kim, J.H., Kim, I., 2015. Transmitted file extraction and reconstruction from network packets. In: 2015 World Congress on Internet Security (WorldCIS). IEEE, pp. 164–165.
Clark, D., Turnbull, B.P., 2020. Interactive 3D visualization of network traffic in time for forensic analysis. In: VISIGRAPP (3: IVAPP), pp. 177–184.
Corchado, E., Herrero, Á., 2011. Neural visualization of network traffic data for intrusion detection. Appl. Soft Comput. 11, 2042–2056.
Corporation, M., 2024. [MS-SMB2]: Server Message Block (SMB) Protocol Versions 2 and 3. https://learn.microsoft.com/en-us/openspecs/windows_protocols/ms-smb2/5606ad47-5ee0-437a-817e-70c366052962.
Dai, S., Tongaonkar, A., Wang, X., Nucci, A., Song, D., 2013. NetworkProfiler: towards automatic fingerprinting of Android apps. In: 2013 Proceedings IEEE INFOCOM. IEEE, pp. 809–817.
Ellard, D., Seltzer, M., 2003. New NFS tracing tools and techniques for system analysis. In: Proceedings of the 17th Large Installation Systems Administration Conference. USENIX Association.
Gugelmann, D., Gasser, F., Ager, B., Lenders, V., 2015. Hviz: HTTP(S) traffic aggregation and visualization for network forensics. Digit. Invest. 12, S1–S11.
Hansen, R.A., Seigfried-Spellar, K., Lee, S., Chowdhury, S., Abraham, N., Springer, J., Yang, B., Rogers, M., 2018. File toolkit for selective analysis & reconstruction (FileTSAR) for large-scale networks. In: 2018 IEEE International Conference on Big Data (Big Data). IEEE, pp. 3059–3065.
Hilgert, J.N., Lambertz, M., Yang, S., 2018. Forensic analysis of multiple device BTRFS configurations using The Sleuth Kit. Digit. Invest. 26, S21–S29.
Hilgert, J.N., Mahr, A., Lambertz, M., 2024a. pcapFS – Mounting Network Data. URL: https://github.com/fkie-cad/pcapFS/tree/main.
Hilgert, J.N., Mahr, A., Lambertz, M., 2024b. SCF - SMB Command Fingerprinting. URL: https://github.com/fkie-cad/SCF.
Kim, D., Park, J., Lee, K.g., Lee, S., 2012. Forensic analysis of Android phone using ext4 file system journal log. In: Future Information Technology, Application, and Service: FutureTech 2012, vol. 1. Springer, pp. 435–446.
Moore, A.W., 1995. Operating System and File System Monitoring: A Comparison of Passive Network Monitoring with Full Kernel Instrumentation Techniques. Ph.D. Thesis. Monash University.
Taylor, V.F., Spolaor, R., Conti, M., Martinovic, I., 2016. AppScanner: automatic fingerprinting of smartphone apps from encrypted network traffic. In: 2016 IEEE European Symposium on Security and Privacy (EuroS&P). IEEE, pp. 439–454.