Analyzing TikTok From A Digital Forensics Perspective
Patricio Domingues1,2†, Ruben Nogueira1 , José Carlos Francisco1 , and Miguel Frade1,3
1 ESTG/Polytechnic of Leiria, Leiria, Portugal
[email protected], {2171569, 2202274}@my.ipleiria.pt, [email protected]
2 Instituto de Telecomunicações, Leiria, Portugal
3 CIIC Research Centre, Leiria, Portugal
Received: January 4, 2021; Accepted: May 3, 2021; Published: September 30, 2021
Abstract
TikTok is a major hit in the digital mobile world, quickly reaching the top 10 installed applications
for the two main mobile OS, iOS and Android. This paper studies Android’s TikTok application
from a digital forensic perspective, analyzing the digital forensic artifacts that can be retrieved on
a post mortem analysis and their associations with operations performed by the user. The paper
also presents FAMA (Forensic Analysis for Mobile Apps), an extensible framework for the forensic
software Autopsy, and FAMA’s TikTok module that collects, analyzes, and reports on the main dig-
ital forensic artifacts of TikTok’s Android application. The most relevant digital artifacts of TikTok
include messages exchanged between TikTok so-called “friends”, parts of the email/phone number
of registered users, data about devices, and transactions with TikTok’s virtual currency. One of the
results of this research is the set of forensic traces left by users’ transactions with TikTok’s in-app
virtual currency. Another result is the detection of patterns that exist in TikTok’s integer IDs, which
allow any 64-bit TikTok integer ID to be quickly linked to the type of resource (user, device, video, etc.)
that it represents.
1 Introduction
TikTok is a social media platform that has taken the mobile world by storm. With less than four years
of existence in the international market, TikTok reached the top 10 of most downloaded applications for
mobile devices in the 2010-2019 decade [28], even achieving rank #2 in 2019 [19], and is credited with
around 800 million active users [31].
The concept of TikTok is simple and centered around an endless stream of short videos of all kinds,
ranging from 15 to 60 seconds in length, often with music. On top of video
sharing, TikTok has social network features, such as the concept of followers/friends, the possibility to
comment on videos, exchange messages, perform live transmissions, and donate gifts bought with
TikTok’s virtual currency.
The origins of TikTok stem from two other services: Musical.ly and Douyin. The former, Musical.ly,
was a popular mobile application released in 2014 centered around 15-second lip-syncing videos,
making available to video creators a wide library of high quality commercial music [24]. In the third
Journal of Wireless Mobile Networks, Ubiquitous Computing, and Dependable Applications (JoWUA), 12(3):87-115, Sept. 2021
DOI:10.22667/JOWUA.2021.09.30.087
∗ Extended version of “Post-mortem digital forensic artifacts of TikTok Android App” [12]
† Corresponding author: ESTG / Polytechnic of Leiria, Tel: 2411-901 Leiria, Portugal Email: patri-
[email protected]
TikTok: Digital Forensics Domingues, Nogueira, Francisco and Frade
quarter of 2016, a similar lip-sync service appeared in China under the name of Douyin. ByteDance,
the company behind Douyin, then established TikTok as the international (i.e., non-Chinese) version of
Douyin [1]. In late 2017, ByteDance acquired musical.ly and merged it with TikTok. Nowadays, both
TikTok and Douyin continue to exist in their respective markets [16], while musical.ly has been
totally integrated into TikTok. Interestingly, the URL musical.ly redirects to TikTok.com. Lately,
TikTok has been subjected to some instability due to non-technical reasons [2, 3], and has attracted its
fair share of controversy [4, 1].
TikTok appeals to and targets a young population, mostly the pre-adolescent (10-14 years old) and
adolescent (15-19 years old) groups. This is attributed not only to the short duration of videos, but also
to the empowerment that TikTok brings to young individuals, as it eases the creation of videos with
background audio and visual effects, all through the smartphone, which is a central tool in the life of the
so-called digital natives [8]. Moreover, there is the attraction that a video can reach viral status,
bringing recognition and revenue to its creator. TikTok thus seems particularly appealing to the
digital-born generation [8].
The importance of digital forensics for TikTok is manifold. First, the young demographic of TikTok
can attract crime against children, such as online enticement, sexual exploitation and sexual extortion [7].
Second, because videos are easy to create and publish, TikTok is sometimes used by extremist
groups to publish hate material [35]. The absence of a real search feature, as searches for TikTok videos
need to be made by selecting hashtags, makes it harder for law enforcement agents to detect
inappropriate content. Third, analyzing the interactions of an individual with TikTok (published videos,
liked videos, friends in the network, exchanged messages, donated gifts) might provide valuable insight
into the user’s tastes and way of thinking, besides providing evidence of his/her actions.
All of these contribute to the motivation for this work, whose main goals are to i) identify the main
forensic artifacts of TikTok application for Android and to ii) provide an open-source software tool to
extract, analyze and report on these forensic artifacts.
This paper extends our previous work [12], which studied TikTok’s digital forensic artifacts available
in version 16.0.41 (May 2020), whereas this work focuses on version 18.1.3 (December 2020). This
extended version adds the following main contributions: 1) Revision and validation of the digital forensic
artifacts of TikTok in the newer studied version; 2) Analysis of TikTok’s 64-bit integer ID scheme,
decoding the different types of identifiers, allowing for a quick detection of resource types (accounts,
videos, devices, etc.) and the associated timestamps; 3) Study of the forensic artifacts related to the usage
of TikTok’s own coins and virtual gifts, namely purchasing coins, donating and receiving virtual gifts; 4)
Presentation of our Forensic Analysis for Mobile Apps (FAMA), a new open-source framework to collect,
process and report digital forensic artifacts of Android smartphones; 5) Analysis of the TikTok
module for FAMA, namely its main outputs when dealing with TikTok artifacts.
The remainder of this paper is organized as follows. Section 2 reviews related work, while Section 3
presents the TikTok ecosystem. Section 4 analyzes the multiple forensic artifacts provided by the An-
droid TikTok app, while Section 5 studies artifacts linked to TikTok coins and virtual gifts usage. In
Section 6, the forensic framework for the Autopsy software is presented, with emphasis on the application
of the framework to the TikTok Android app. Finally, Section 7 concludes the paper and outlines some
possible avenues for future work.
2 Related Work
Khoa et al. analyze TikTok for Android from a digital forensic perspective [14]. They describe the main
artifacts for version 8.9.4 of TikTok for Android. However, contrary to our work, they do not propose
any tool to ease the work of digital forensic practitioners tasked with analyzing TikTok. Moreover, TikTok
has evolved substantially since version 8.9.4, as our work addresses version 18.1.3. In fact, as we report
in this paper, we noted differences between version 16.0.41, studied in our previous work [12], and the
version studied here.
Benson observes that the posting timestamp of a TikTok video can be determined from its URL [5, 6].
Specifically, the 32 most significant bits of the 64-bit ID of a TikTok video correspond to the date/time
at which it was posted to TikTok. The timestamp is expressed in Unix Epoch format. Additionally,
the author’s online service Unfurl¹ decomposes the data contained in the URLs of TikTok videos,
namely the account of the video, as well as the post date/time. Note that TikTok currently displays, on
the web interface, the posting date (not the hour) of a video while the video is being shown, although
the precise date/time can be found in the HTML of the page [5]. Benson also comments on the pattern of
the lowest 32 bits of the IDs of TikTok videos, hinting that they could be flags that map to an internal
lookup table. In this work, we further analyze TikTok IDs, detecting patterns in the least significant 32
bits. These patterns allow us to quickly classify the type of resource (account, video, device, etc.)
represented by a given ID.
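The ID layout described above can be illustrated with a minimal Python sketch. It assumes only what is stated in the text, namely that the 32 most significant bits of a 64-bit TikTok ID hold a Unix Epoch timestamp in seconds. The function name split_tiktok_id is hypothetical and is not part of Unfurl or of any TikTok API; the ID used is the account ID of the test account discussed in Section 3.

```python
from datetime import datetime, timezone


def split_tiktok_id(tiktok_id: int) -> tuple[datetime, int]:
    """Split a 64-bit TikTok ID into (UTC timestamp, lowest 32 bits)."""
    high32 = tiktok_id >> 32          # assumed: Unix Epoch timestamp in seconds
    low32 = tiktok_id & 0xFFFFFFFF    # remaining bits, not interpreted here
    return datetime.fromtimestamp(high32, tz=timezone.utc), low32


# Account ID of the test account used in this paper
created, low32 = split_tiktok_id(6912191700166542342)
print(created.isoformat(), hex(low32))
```

The timestamp is returned as a timezone-aware UTC value so that it is not silently shifted by the local time zone of the examiner's workstation.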
Robert analyzes network traffic generated by TikTok, namely an HTTPS POST message that contains
clear text plus an encrypted part [25]. This message is sent every five minutes by TikTok. The clear text
contains a large set of parameters regarding 1) the device (e.g., screen resolution), 2) the user and 3)
the application. In a follow-up post, the author analyzes the encrypted content by intercepting the call to
the encryption method, observing that it consists of JSON-formatted data that include information
regarding the device and event logging [26]. Robert classifies as “not really personal” the encrypted
data periodically sent by the application to TikTok servers. Likewise, Tidy [31] observes that the TikTok
mobile application collects data about the watched videos, the location, phone model and operating
system, as well as all touch activity within the application, that is, the keystroke dynamics. Our work
confirms that these data and other events are collected by the mobile app, as they are stored in its databases
and XML files, as we shall see later.
The anonymous author BTF 117 provides a detailed open source intelligence (OSINT) investigation of
TikTok through several blog posts, with the main goal of studying the possibility of gathering data about
a specific TikTok user [10]. In his research, the author sets up a so-called man-in-the-middle
infrastructure to intercept the HTTPS network traffic exchanged between the client sides of TikTok
(both the web interface and the phone app are studied) and TikTok servers. Captured traffic is mostly
JSON-formatted data that provides a significant amount of information about the targeted user, including
personal data and published videos. Contrary to BTF 117’s work, our study focuses on the content that
exists on the phone. We believe that both approaches are complementary and can provide valuable data
when combined.
In his digital forensic blog, Brignoni [9] analyzes the forensic artifacts of TikTok, focusing on message
exchange and on some XML files. Our observations confirm that Brignoni’s results regarding
messages are still valid, indicating some stability in the way the app handles and stores messages, as
Brignoni conducted the study in 2018.
Fergus et al. scrutinize TikTok and WeChat from a privacy point of view [27]. They observe that
version 17 of TikTok’s mobile application was collecting the local IP address of the device, along with
the IP addresses of the DNS servers. The work also comments on the clipboard read access
detected in the iOS version of the application, remarking that it was caused by a third-party SDK linked
to ads, and that the same behavior was observed in many other mobile applications for the same reasons.
Another issue with clipboard access was labelled as an anti-spam feature, as TikTok sought to detect users
that were repeatedly dumping the clipboard content as comments, either for self-promotion or simply for
nuisance [27]. The report also mentions that plaintiffs declared in a class action lawsuit that previous
versions of TikTok were collecting device identifiers such as the International Mobile Equipment Identity
(IMEI) and the International Mobile Subscriber Identity (IMSI) numbers. However, the authors verified
that neither the mobile applications nor the web platform were collecting these data as of August
2020 [27]. As we shall see later, our work confirms that version 18.1.3 no longer stores the IMEI and the
IMSI identifiers.
1 https://dfir.blog/unfurl/
Neyaz et al. forensically review the Android versions of FaceApp and TikTok, analyzing privacy
issues, needed OS permissions, and the network traffic [20]. The authors also used steganography in
TikTok (version 12.3.5) to inject hidden text messages into videos posted to TikTok to check whether
hidden text could be retrieved by TikTok viewers. They concluded that TikTok processing garbled the
hidden content of videos, rendering useless the attempted steganography technique.
Kaye et al. compare Douyin and TikTok [16], focusing on infrastructures, features, business models,
and governance. Wang [34] provides a study, mostly based on graphical representation, to highlight the
data dimension and wide spread of TikTok.
Pandela and Riandi study the digital forensic artifacts retrieved when several operations are performed
on a TikTok account accessed through a web browser [22]. They resort to three main tools: FTK,
Browser History Capture, and Video Cache View. They conclude that the login name, some text and photo
thumbnails, but no videos, can be retrieved from the analysis of the browser. In this work, we focus
solely on the TikTok Android application.
3 TikTok Ecosystem
In this section, we review the main elements of what we call the TikTok ecosystem: i) TikTok’s resource
identifiers; ii) ways of accessing TikTok; and iii) the main features of TikTok.
TikTok ID (decimal): 6912215720931757317
Hexadecimal: 5F ED 1F 93 61 02 0D 05
Table 1 column headers: Pattern (3rd lowest hex) | Resource | Description
published in 2017, we hypothesize that these accounts were created in musical.ly and then migrated
to TikTok when musical.ly was merged with TikTok.
Table 1 lists TikTok’s types of identifiers and the corresponding 3rd least significant hexadecimal
digit for each pattern. The name for each type of identifier, given in the left column of the table, follows
the designation found in the logs of TikTok’s mobile application. Note that account IDs (user_id in
Table 1) can have either a 0 or a 4 as pattern identifier. Furthermore, the designation wid refers to the
ID assigned by TikTok to a web browser when one accesses TikTok’s website. This ID can be seen in
the page code received by the browser with the label wid and seems linked to the browser’s session, as
clearing cookies causes a new wid to be assigned to the browser.
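As an illustration, the pattern digit can be extracted with a shift and a mask. The sketch below is not taken from FAMA; the function names are hypothetical. It encodes only the rule stated in the text (account IDs carry a 0 or a 4 in the 3rd least significant hexadecimal position) and deliberately does not reproduce the other rows of Table 1.

```python
def pattern_digit(tiktok_id: int) -> str:
    """Return the 3rd least significant hexadecimal digit of a TikTok ID."""
    return format((tiktok_id >> 8) & 0xF, "X")


def may_be_account_id(tiktok_id: int) -> bool:
    """True when the pattern digit is one of those reported for account IDs."""
    # Per the text, account IDs carry a 0 or a 4 at this position.
    return pattern_digit(tiktok_id) in {"0", "4"}


print(pattern_digit(6912191700166542342))  # account ID of the test account
print(pattern_digit(6912215720931757317))  # video ID used as an example in this paper
```

Shifting right by 8 bits discards the two lowest hexadecimal digits, and the mask 0xF keeps the next one.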
Within TikTok, a registered user has three distinct identifiers: i) a userID, also referred to as UID, which is
the 64-bit integer ID set by TikTok when the account is created; ii) an alias name set by the registered
user; and iii) a uniqueID, referred to as the username, which can be used with the @ symbol to reference
the user within the TikTok network. The uniqueID is also set by the account owner, and can be changed,
although as of December 2020, the minimum time interval between two successive changes is 30 days.
Table 2 lists the identifiers for the @nature2admire TikTok account. Note that the userID
6912191700166542342 encodes, as shown earlier, the timestamp of the account creation in the higher
32 bits, in 32-bit Unix Epoch format: “2020-12-30 23:14:02” in this case. The hexadecimal representation of
the lowest 32 bits of the identifier, 0x9BD48406, also confirms our previous observation that account
IDs have a 0 or a 4 as pattern identifier (a 4 in this case).
Type Example
"author":{"id":"6912191700166542342",...,
"uniqueId":"nature2admire",
"nickname":"user981414799249",...}
Figure 2: TikTok app displaying the main screen for the @nature2admire account (left), and displaying
a video (right)
for a given video are defined by the owner of the video. In videos with music, while the video is being
displayed on the device, a music identifier with a resemblance to a vinyl disk is shown rotating at the
bottom right of the screen. By clicking on it, TikTok opens a view with a list of videos that use the same
music. This is yet another way to access videos that share a common property.
Besides posted videos, accounts that have 1 000 or more followers can perform live shows, that is,
streaming a real-time video feed. Followers can interact with the live feed, commenting, and sending gifts
in the form of emojis and stickers. These virtual gifts need to be purchased with TikTok’s own digital
currency named coins, which users can buy from within the TikTok application, as in-app purchases.
Gifts received by a TikTok account are converted into so-called diamonds that broadcasters can later
cash out to PayPal accounts, but only after having reached a threshold amount, which at the time of this
writing is US$100 [18]. In Section 5, we analyze the forensic artifacts related to TikTok’s virtual
currency and gift donation.
At the interaction level, TikTok allows a user to like videos and comment on them. Comments can
be text or videos. Obviously, a user can read/watch existing comments. By default, the video owner
receives notifications regarding new likes and new comments on his/her videos. The video owner can
disable comments for a given video. A user can follow another user, receiving a notification when new
videos are posted in the account being followed. The top-level of interaction in TikTok is friendship.
In TikTok parlance, friendship occurs when two accounts follow each other. Friends in TikTok can
exchange messages and thus communicate, although sending messages can only be done after the phone
number has been linked with the sender’s TikTok account.
TikTok enables two main levels of privacy: i) video-level and ii) account-level. At the video level, the
account owner can, at any time, set any of his/her own published videos as private, effectively preventing
others from interacting with the video (watching, commenting, liking). Account-level privacy has a much
wider effect, as only users approved by the account owner can interact with the account. This means that
only authorized users can follow the account, watch/like/download posted videos, interact with live
performances and view the account information. In short, this effectively locks the account content away
from unauthorized users. Since January 2021, accounts of TikTok users in the age range 13-15 years old
are set as private by default [32].
The TikTok mobile application provides several features to assist and facilitate the creation of videos
with the smartphone. Examples of these features include templates, filters, and visual effects, coupled with a
vast library of music [13].
4 Forensic Artifacts
In this section, we analyze the main forensic artifacts of the Android TikTok app. First, we present ma-
terials and methods used to analyze the digital forensic artifacts. We then describe the forensic artifacts
collected from the smartphone by first looking at the public data of the TikTok app and then at the private
data of TikTok, that is, data that are solely accessible on rooted Android phones.
3 https://github.com/NickstaDB/SerializationDumper
4 https://frescolib.org
/
├── sdcard/ → /storage/self/primary/
├── mnt/
│   ├── sdcard/ → /storage/self/primary/
│   └── user/0/primary/ → /storage/emulated/0/
└── storage/
    ├── self/primary/ → /mnt/user/0/primary/
    └── emulated/0/Android/data/com.zhiliaoapp.musically/
        ├── awemeSplashCache/
        │   └── awemeJson/
        ├── bytedance/
        ├── cache/
        │   ├── awemeCache/
        │   ├── fonts/
        │   ├── hashedimages/
        │   ├── head/
        │   ├── picture/
        │   ├── prefs/
        │   ├── tmpimages/
        │   └── video/
        ├── liveSplashCache/
        │   └── awemeJson/
        └── splashCache/
Figure 3: TikTok’s app public storage directory hierarchy, including four symbolic links that can be used
as alternative paths to /storage/emulated/0/ and, therefore, reach the public storage. Directory
names end with “/”, while links are marked with “→” followed by the target directory.
itself as an image management library that complements Android’s image functionalities. The pictures
directory has a further set of directories, numerically named from 0 to 99. Each of these
directories holds image files, either JPEG or WebP. These files are images loaded during TikTok app
usage and correspond to profile pictures and banners from the videos.
The prefs directory has a single JSON-formatted file, named local_prefs.json. The file mostly
holds URL addresses related to TikTok APIs. Listing 2 shows a small extract of the local_prefs.json
file. Besides the spdy reference seen in Listing 2, which corresponds to HTTP/2, the file also has
references to the QUIC protocol (HTTP/3) and to the HTTPS interface of Google’s public DNS service,
dns.google.com.
Finally, awemeCache holds a set of files whose names correspond to TikTok video identifiers, such
as 6912215720931757317. Each file is identified by the Unix file command as Java serialization
data, version 5, and in fact the files can be decoded with the SerializationDumper tool. Within
the decoded data of a given file, one can find a URL of the form https://m.tiktok.com/v/videoID.html,
where videoID corresponds simultaneously to the file name and the ID of a TikTok video.
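A quick triage of awemeCache files can be sketched as follows. This is not a replacement for SerializationDumper: the sketch only checks the Java serialization stream header (magic 0xACED followed by version 5) and then searches the raw bytes for the URL pattern described above. It assumes that the URL is stored as plain ASCII inside the stream; the function name scan_aweme_file is hypothetical.

```python
import re
from pathlib import Path

# Java serialization stream header: magic 0xACED followed by version 0x0005
JAVA_SERIAL_MAGIC = b"\xac\xed\x00\x05"
# URL form reported in the text; the numeric group is the video ID
VIDEO_URL = re.compile(rb"https://m\.tiktok\.com/v/(\d+)\.html")


def scan_aweme_file(path: Path) -> list[str]:
    """Return video IDs found in one file, or [] if it is not Java-serialized."""
    data = path.read_bytes()
    if not data.startswith(JAVA_SERIAL_MAGIC):
        return []
    return [m.group(1).decode("ascii") for m in VIDEO_URL.finditer(data)]
```

Comparing the returned IDs against the file name gives a simple consistency check before a full decoding of the serialized objects.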
"network_qualities": {
  "CAISCyJXaXJlZFNTSUQiGAM=": "4G",
  "CAISCyJXaXJlZFNTSUQiGAQ=": "4G",
  "CAYSABiAgICA+P////8B": "Offline"
}
/
└── data/
    ├── user/0/ → /data/data/
    └── data/com.zhiliaoapp.musically/
        ├── app_librarian/
        ├── app_textures/
        ├── app_webview/
        ├── cache/
        ├── code_cache/
        ├── databases/
        ├── no_backup/
        ├── shared_prefs/
        └── system_emoji_res/
Figure 4: TikTok’s app private storage directory hierarchy for Android’s main user, including a link that
can be used as alternative path.
Private data are only accessible on rooted Android phones and correspond to the app private storage that
is kept in /data/data/com.zhiliaoapp.musically. The directory hierarchy, shown in Figure 4,
follows Android’s rules regarding directory naming for the private storage of applications. The most
interesting data from a digital forensic point of view are kept in three directories: databases, cache
and shared_prefs. We now focus on each of these directories.
4.4 Databases
In the databases directory, there are 30 SQLite 3 databases. Another one, cookies, is located in
the app_webview directory. The names shown in Table 5 correspond to databases that have relevant data
for digital forensic examinations, along with their respective user_version (when it is defined).
The remaining databases are listed in Table 6. Note that the <userID> placeholder seen in the first entry
of Table 5, <userID>_im.db, needs to be replaced by the account ID that is configured in the app,
for example, 6912191700166542342_im.db for the nature2admire account. Moreover, there are as
many <userID>_im.db databases as there are TikTok accounts configured in the Android app. Next,
we briefly describe the most relevant databases for forensic practitioners.
96
TikTok: Digital Forensics Domingues, Nogueira, Francisco and Frade
<userID> im.db 21
Cookies.db –
ss app log.db 10
db im xx 18
lib log queue.db 1
video.db 3
TIKTOK.db 1
Table 5: Forensically relevant SQLite 3 databases of TikTok existing in the database directory. Note
that there is one <userID> im.db file per configured TikTok account.
Filenames
Table 6: SQLite 3 databases of TikTok existing in the database directory that do not hold forensic
value.
4.4.1 Cookies.db
The Cookies.db database has only one meaningful table, named Cookies. This table holds the session
cookies of the app’s accesses to TikTok’s servers. The date/time fields of the table (creation_utc,
expires_utc and last_access_utc) are in a slightly modified form of the Microsoft 64-bit FILETIME
format, since they count microseconds and not the 100-nanosecond intervals of the
original Microsoft format. The fields creation_utc and last_access_utc are interesting since they
correspond to accesses by the app to TikTok’s social network. Examples of domains whose cookies are
kept in the Cookies table are byteoversea.com and musical.ly.
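The conversion of these date/time fields can be sketched as below. The sketch assumes that the values follow the Chromium/WebKit convention of microseconds elapsed since 1601-01-01 00:00:00 UTC, the same origin as the Microsoft FILETIME format; this unit is an assumption that should be validated against a known event on the examined device. The function name webkit_to_datetime is hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Origin shared with the Microsoft FILETIME format
WEBKIT_EPOCH = datetime(1601, 1, 1, tzinfo=timezone.utc)


def webkit_to_datetime(value: int) -> datetime:
    """Convert microseconds since 1601-01-01 UTC to an aware datetime."""
    return WEBKIT_EPOCH + timedelta(microseconds=value)


# 11644473600 seconds separate 1601-01-01 from the Unix Epoch (1970-01-01)
print(webkit_to_datetime(11_644_473_600 * 1_000_000))
```

A value that decodes to an implausible year is a quick hint that the unit assumption does not hold for the field at hand.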
4.4.2 ss_app_log.db
This database logs the user’s interaction with the app, collecting analytics and monitoring data (e.g.,
network speed). It has seven tables: event, misc_log, mon_log, page, queue, session and succ_rate.
The event table contains entries tagged with names such as click, enter_page, video_request,
and video_play_end, with each entry holding, in the field ext_json, a JSON string that carries data of
the event. The page table logs the user’s navigation within the views of the app. Data in the database
are organized around a common session_id, with the session descriptor kept in the session table. As
reported by Robert [26], data of the database are part of the periodic analytics sent in JSON-formatted
messages to TikTok servers. Data are deleted, possibly after being sent, as the amount of data kept in
the database is small and only covers a short timespan. However, the rollback file of the database,
ss_app_log.db-journal, holds data from the last performed transactions, providing meaningful
forensic data, as we shall see in Section 5.
4.4.3 <userID>_im.db
As stated earlier, there is one <userID>_im.db database per TikTok account configured in the smartphone,
with userID corresponding to the account ID. Moreover, the database file persists even after the
user logs out of the account. This is expected, as the user might want, later on, to log in again to the
account. The <userID>_im.db database has great value for digital forensics, as it holds the messages
exchanged between the account linked to the database and other TikTok users. This database contains 14 tables.
Data regarding conversations are kept in three tables: conversation_core, conversation_list
and conversation_settings, while messages are kept in the msg table. Specifically, the msg table
holds the messages exchanged between userID and other TikTok users. The main fields of the msg table
are shown in Table 7. Each entry in the table represents an exchanged message. Deleted messages are
kept in the table, with the field deleted set to 1. Likewise, the table keeps track of whether a message
was read or not, with the field read_status set to 1 for unread messages. Furthermore, data in msg
allow retracing the messages exchanged between two TikTok users. Each conversation is identified by the
text field conversation_id, which has the format 0:1:ID1:ID2, where ID1 and ID2 are the TikTok
IDs of the users engaged in the conversation, while the field sender keeps the TikTok ID of the message’s
sender. The field type indicates the type of the message through an integer value. Table 8 lists the values
identified in our testing environment for this field, along with their meanings. A msg record has a
self-explanatory created_time, formatted as a Unix Epoch timestamp. Interestingly, two other fields,
index_in_conversation and order_index, also use Unix Epoch values. Both hold a Unix timestamp
with millisecond resolution, but the latter also appends three index digits to the right (e.g., 001, 002, ...).
This way, the value is not only a timestamp but also an index.
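The two composite fields described above can be decoded with a few lines of Python. The sketch below relies only on the formats stated in the text (0:1:ID1:ID2 for the conversation identifier, and a millisecond Unix timestamp followed by three index digits for the order index). The function and class names are hypothetical, and the values in the usage example are made up for illustration.

```python
from datetime import datetime, timedelta, timezone
from typing import NamedTuple


class OrderIndex(NamedTuple):
    timestamp: datetime  # Unix timestamp with millisecond resolution (UTC)
    index: int           # the three trailing index digits


def parse_conversation_id(conversation_id: str) -> tuple[int, int]:
    """Extract the two account IDs from a '0:1:ID1:ID2' string."""
    parts = conversation_id.split(":")
    if len(parts) != 4:
        raise ValueError(f"unexpected conversation id: {conversation_id!r}")
    return int(parts[2]), int(parts[3])


def parse_order_index(order_index: int) -> OrderIndex:
    """Split an order index into its millisecond timestamp and 3-digit index."""
    millis, index = divmod(order_index, 1000)
    seconds, ms = divmod(millis, 1000)
    ts = datetime.fromtimestamp(seconds, tz=timezone.utc) + timedelta(milliseconds=ms)
    return OrderIndex(ts, index)


# Made-up values, for illustration only
print(parse_conversation_id("0:1:111:222"))
print(parse_order_index(1609370042123002))
```

Integer arithmetic is used throughout so that the millisecond part is not affected by floating-point rounding.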
Value Meaning
5 animated gif
7 text
8 video
15 animated gif located in a remote server (e. g., giphy.com or tenor.com)
19 for hashtag
22 audio
25 profile
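When reporting messages, the integer values listed above can be translated with a simple lookup. The mapping below only restates the values observed in our testing environment and may therefore be incomplete; the names MSG_TYPES and describe_msg_type are hypothetical.

```python
# Values observed in the testing environment; other values may exist.
MSG_TYPES = {
    5: "animated gif",
    7: "text",
    8: "video",
    15: "animated gif located in a remote server",
    19: "hashtag",
    22: "audio",
    25: "profile",
}


def describe_msg_type(value: int) -> str:
    """Return a readable label, keeping unknown values visible in reports."""
    return MSG_TYPES.get(value, f"unknown ({value})")
```

Unknown values are reported rather than dropped, so that message types not seen during testing remain visible to the examiner.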
4.4.4 db_im_xx
The database db_im_xx encompasses four tables, but only the table SIMPLE_USER provides meaningful
data. This table has one record per user with whom userID has interacted. Each user is identified
through his/her respective TikTok account ID in the field UID. Besides UID, the most relevant fields are
NICK_NAME; AVATAR_THUMB, which holds JSON-formatted content about the user’s avatar, including
a URL to the thumbnail; UNIQUE_ID, which represents the name handle; and FOLLOW_STATUS. The
FOLLOW_STATUS field stores the relationship between userID and the listed user, as follows: 0 does not
follow but can be followed by userID; 1 follows; 2 follows and is followed, that is, a friend in TikTok’s
parlance.
4.4.6 video.db
The video.db database logs the HTTPS interaction between the app and TikTok’s video repositories,
as the name of the only table of the database expresses: video_http_header_t. Each row of the table
keeps track of a video. One of the fields of the table is key, which holds unique values represented as
32 hexadecimal characters (128 bits, e.g., A76D7A943A1185919B8DAD308CC918BB), which might be an
MD5 checksum. The value kept in the key field is also used as the filename of the video if it exists in
the video cache described in Section 4.2.
Other relevant fields are 1) mime, which represents the MIME type of the video (e.g., “video/mp4”),
2) contentLength, which holds the size in bytes of the video, and 3) extra, which is a JSON-formatted
string, itself with three fields: requestUrl, requestHeaders and responseHeaders. These three
fields correspond to traditional HTTP protocol elements, respectively, the URL, the request header and
the response header. The response header can provide some useful data, namely the UTC-based
date/time of the HTTP server that has sent the response. Although the URL is not accessible outside of
the app, as the URL requires proper authentication, the video might still be present in the video cache of
the app. This can be assessed by checking, in the video cache directory, for the existence of a file whose
name matches the key field.
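The check described above can be sketched as follows. The code assumes that the table is named video_http_header_t and has a key column, as described in the text, and it matches cached files on their name without extension, which is an assumption about how the cache names its files. The function name cached_videos is hypothetical. The database is opened read-only so that the evidence copy is not modified.

```python
import sqlite3
from pathlib import Path
from typing import Optional


def cached_videos(db_path: Path, cache_dir: Path) -> dict[str, Optional[Path]]:
    """Map each key of video.db to a matching file in cache_dir, or None."""
    # Index cached files by name without extension (assumed naming scheme)
    by_name = {p.stem: p for p in cache_dir.iterdir() if p.is_file()}
    # Read-only URI connection, to avoid altering the database under examination
    con = sqlite3.connect(db_path.resolve().as_uri() + "?mode=ro", uri=True)
    try:
        rows = con.execute('SELECT "key" FROM video_http_header_t').fetchall()
    finally:
        con.close()
    return {key: by_name.get(key) for (key,) in rows}
```

Keys mapped to None correspond to videos that were requested by the app but are no longer present in the examined cache directory.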
4.4.7 TIKTOK.db
This database holds the table app open, which has only a single field: open time. This field records, in
a Unix Epoch format, the date/time whenever the TikTok app is launched.
4.5.1 cacheV2
The cachev2 directory stores videos in files whose names have the mdl extension. Video players such as
Windows Media Player and VLC fail to decode the files, reporting a format error. The names of certain
files, such as v09044a30000 h264 540p 310461.mdl and v09044190000bre h265 720p 945952.mdl,
suggest that they are H264 and H265 encoded video files [17]. This is confirmed by inspecting the files,
as they contain markers of the MP4 container format (e.g., mp41 for H264 and mp42 for H265) and
structures of the respective codec (e.g., avc for H264 and hvc1 for H265) [15]. For some of the mdl
files, there are files with the same name, but with the extension mdlnodeconf in place of mdl. These
mdlnodeconf files are no larger than 400 bytes and hold content compatible with MP4/H264 headers,
namely the isomiso2mp41 descriptor [15]. Oddly, some of the mdlnodeconf files exist but are empty (0
bytes), while some mdl files have no matching mdlnodeconf file.
Nonetheless, the files lack a proper header and thus cannot be watched in a video player, unless the
header is fixed. A quick workaround is to overwrite the leading zero bytes of the mdl files with an MP4
file header, for either an H264 or H265 video, depending on the format of the mdl file. Figure 5 shows
the 64 bytes used to patch an H265 video.
Offset(h) 0001 0203 0405 0607 0809 0A0B 0C0D 0E0F Decoded text
00000000: 0000 001c 6674 7970 6973 6f6d 0000 0200 ....ftypisom....
00000010: 6973 6f6d 6973 6f32 6d70 3431 0000 6ffa isomiso2mp41..o.
00000020: 6d6f 6f76 0000 006c 6d76 6864 0000 0000 moov...lmvhd....
00000030: 0000 0000 0000 0000 0000 03e8 0000 8127 ...............'
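The workaround can be sketched in Python as shown below. The header bytes are copied from the hex dump above. The sketch assumes that the first 64 bytes of the mdl file are zero, and it refuses to patch otherwise; it also writes to a copy, so the original file is left untouched. Some header fields (for example, the size of the moov box) may be specific to the video used in our example, so the result should be treated as a playable approximation rather than a faithful reconstruction. The function name patch_mdl is hypothetical.

```python
from pathlib import Path

# The 64 bytes shown in the hex dump above (MP4 'ftyp' box, start of 'moov')
MP4_HEADER = bytes.fromhex(
    "0000 001c 6674 7970 6973 6f6d 0000 0200"
    "6973 6f6d 6973 6f32 6d70 3431 0000 6ffa"
    "6d6f 6f76 0000 006c 6d76 6864 0000 0000"
    "0000 0000 0000 0000 0000 03e8 0000 8127"
)


def patch_mdl(src: Path, dst: Path) -> None:
    """Write to dst a copy of src whose leading zero bytes are replaced by MP4_HEADER."""
    data = src.read_bytes()
    lead = data[: len(MP4_HEADER)]
    if len(lead) < len(MP4_HEADER) or any(lead):
        raise ValueError("leading bytes are not all zero; refusing to patch")
    dst.write_bytes(MP4_HEADER + data[len(MP4_HEADER):])
```

For example, patch_mdl(Path("video.mdl"), Path("video.mp4")) produces a patched copy that can then be opened in a video player.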
The cache directory hosts two more directories with cache functions: awemeCache and feedCache.
The former has the same content as the same-named directory that exists in publicdata
(Section 4.2). The feedCache directory holds ready-to-play, that is, properly formatted, MP4 files,
whose names correspond to the TikTok videoID (e. g. 6912215720931757317). As the name of the
directory suggests, these files are part of TikTok’s feed and are the next in line to be shown to the user.
This file holds some data associated with logging. Data include some of TikTok’s identifiers (device id and
install id), plus several date/time related fields, such as send fingerprint time, last config time,
and app log last config time. All date/time fields are expressed in Unix Epoch milliseconds
format. The file also references several addresses of the pstatp.com domain (e. g., p5.pstatp.com).
Regarding network information, we found two entries with network identifiers, namely BSSID
and SSID: npth privacy detection dynamic wifi bssid error and npth privacy detection
dynamic wifi ssid error, both with the value 1. Intrigued by the presence of these
entries, we analyzed a dump of the same smartphone made when it had TikTok version 16.0.41 [12].
We found that the file applog stats.xml of the older version held two entries related to the BSSID
(Basic Service Set Identifier): last wifi bssid and last check bssid time. The first entry held
the correct BSSID of the WiFi access point, while the latter held, again in Unix Epoch milliseconds
format, the last date/time the BSSID was collected. Moreover, when we updated TikTok from version
16.0.41 to 18.1.3, the XML file kept these two entries, with the same values, adding the two error ones.
While the entries last wifi bssid and last check bssid time still exist when TikTok is upgraded
from a previous version, a fresh installation of TikTok confirms that the BSSID is no longer collected by
the TikTok app. Note that the practice of collecting network identifiers has been heavily criticized in the
past [23].
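A minimal sketch of how such entries could be read is given below. It assumes that the file follows the standard Android SharedPreferences layout (a map element holding typed children), which is not confirmed above, and the key names device_id and last_config_time are hypothetical spellings with underscores. The sample content is synthetic.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone

def read_shared_prefs(xml_text):
    """Map each entry name of a SharedPreferences file to its raw string value."""
    prefs = {}
    for entry in ET.fromstring(xml_text):
        # <string> keeps its value as element text; numeric and boolean
        # entries keep it in a "value" attribute.
        prefs[entry.get("name")] = entry.get("value", entry.text)
    return prefs

def ms_to_utc(ms):
    """Convert Unix Epoch milliseconds to an aware UTC datetime."""
    return datetime.fromtimestamp(int(ms) / 1000, tz=timezone.utc)

# Synthetic sample; real key names and types may differ.
SAMPLE = """<?xml version='1.0' encoding='utf-8' standalone='yes' ?>
<map>
    <string name="device_id">6910000000000000021</string>
    <long name="last_config_time" value="1617822494000" />
</map>"""

prefs = read_shared_prefs(SAMPLE)
print(prefs["device_id"], ms_to_utc(prefs["last_config_time"]).isoformat())
```

Values are kept as strings on purpose, so that 64-bit identifiers and timestamps are only converted when the examiner knows what a given field represents.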
This file holds data regarding the TikTok accounts which are configured in the smartphone. To save
space, a shortened list of the main data is given in Table 10. Note that contrary to previous versions [12], the
file no longer keeps the whole email address; instead, only the first and last letters of the username plus
the full domain name are kept. Moreover, only four digits plus the international phone code
are kept, and only for users that have linked their phone number to TikTok. The file contains
a set of entries for each user whose account is configured in the application. The aweme user.xml file
should definitely be consulted in a digital forensic examination. Other relevant fields include whether
the TikTok account is linked to Facebook, Twitter, Weibo, Youtube, or Instagram, and the number of
followers, friends and of followed accounts.
Name Comment
4.6.3 LoginSharePreferences.xml
This file registers data linked to the current session of the user in the TikTok network. Specifically, for
an email-based login, that is, when the user employed his/her email address to log in to TikTok, the file holds
the full email address. For a login performed with the phone number, the file keeps the full phone number.
Additionally, the file holds in the field expires the expiration date/time of the session, in human-readable
format (e.g., Jan 10, 2021 12:49:48). The file also keeps the userID.
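For timeline purposes, the expires value can be turned into a date/time object. The sketch below assumes English month abbreviations and the exact layout of the example above; since the time zone of the value is not stated, the result is deliberately left without time zone information. The function name parse_expires is hypothetical.

```python
from datetime import datetime

def parse_expires(text):
    """Parse a value such as 'Jan 10, 2021 12:49:48' into a naive datetime."""
    # %b matches the abbreviated month name of the current locale
    # (English in the default C locale).
    return datetime.strptime(text, "%b %d, %Y %H:%M:%S")

expires = parse_expires("Jan 10, 2021 12:49:48")
print(expires.isoformat())
```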
4.6.5 search.xml
As the name suggests, the XML file search.xml keeps track of the recent searches performed by the
app’s user. The history of recent searches is kept in the XML file under the entity recent history v2.
Note that the XML file solely holds the string searched: no results and no timestamps are kept, although
each entry identifies the account ID that performed the search.
Listing 4: Partial content of the psdkmon v2.db-journal file, corresponding to the in-app purchase of
70 coins.
1 {
2 "nt":4,
3 "user_id":6910000000000000021,
4 "---omitted for brevity---":...,
5 "event":"pay_callback_event",
6 "params":{
7 "result_code":0,
8 "result_detail_code":0,
9 "result_message":"pay success in QueryOrderStateCallback.",
10 "pay_type":"NOMAL",
11 "product_id":"com.zhiliaoapp.musical.iap.coins.v2.100",
12 "request_id":"10000016940000000000000065",
13 "user_id":"6910000000000000021",
14 "timestamp":1617822494000
15 },
16 "event_id":11825,
17 "tea_event_index":70,
18 "local_time_ms":1617822494000,
19 "session_id":"00000000-0000-0000-0000-000000000000",
20 "datetime":"2021-04-07 20:08:14"
21 }
Listing 5: Partial content of the ss app log.db-journal file where recharge_package corresponds
to the amount of purchased coins (line 7). Fields user_id, local_time_ms, and session_id were
anonymized by replacing original values with zeros.
1 {
2 "nt":4,
3 "user_id":6910000000000000021,
4 "---omitted for brevity---":...,
5 "event":"livesdk_recharge_pay",
6 "params":{
7 "recharge_package":"70",
8 "request_page":"my_profile",
9 "pay_method":"Google Pay"
10 },
11 "event_id":11821,
12 "tea_event_index":66,
13 "local_time_ms":1617822466000,
14 "session_id":"00000000-0000-0000-0000-000000000000",
15 "datetime":"2021-04-07 20:07:46"
16 }
some basic TikTok usage, the purchase artifacts were no longer found.
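Since such events may survive only as fragments inside -journal files, one possible approach is to carve JSON records directly from the raw bytes. The sketch below assumes that each record is a JSON object starting with the "nt" key and serialized without whitespace, as suggested by Listings 4 and 5; the function names, the 64 KiB decoding window, and the synthetic sample are illustrative assumptions.

```python
import json

def carve_events(blob: bytes, marker: bytes = b'{"nt":'):
    """Yield every JSON object that starts at an occurrence of marker in blob."""
    decoder = json.JSONDecoder()
    pos = blob.find(marker)
    while pos != -1:
        # Decode leniently: journal pages also hold arbitrary binary data.
        text = blob[pos:pos + 65536].decode("utf-8", errors="replace")
        try:
            event, _ = decoder.raw_decode(text)
            yield event
        except json.JSONDecodeError:
            pass  # truncated or partially overwritten record
        pos = blob.find(marker, pos + 1)

def find_events(blob: bytes, names):
    """Return the carved events whose "event" field is in names."""
    return [e for e in carve_events(blob) if e.get("event") in names]

def _record(event, params):
    """Build a synthetic record for the demonstration."""
    return json.dumps({"nt": 4, "event": event, "params": params},
                      separators=(",", ":")).encode()

# Synthetic stand-in for a journal file: two records plus a truncated one.
demo_blob = (b"\x00\x0d\xff" + _record("pay_callback_event", {"result_code": 0})
             + b"\x00" * 16
             + _record("livesdk_recharge_pay", {"recharge_package": "70"})
             + b"\x01\x02"
             + _record("livesdk_gift_send_click", {"gift_id": "5655"})[:25])

recharges = find_events(demo_blob, {"livesdk_recharge_pay"})
print(recharges)
```

On a real case, blob would be the content of the journal file read in binary mode, and names the set of event keywords of interest.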
Table 11: Some examples of gifts’ names, IDs and cost in TikTok coins.
Listing 6: Extract of the databases/ss app log.db-journal file. The livesdk_gift_send_click
event registers gifts sent to another user. The gift_id is 5655, which corresponds to a Rose, worth
1 TikTok coin. The anchor_id identifies the receiving account, while actual_room_id is the live
session’s ID. Fields user_id, room_id, anchor_id, and session_id were anonymized with zeros.
1 {
2 "nt":4,
3 "user_id":6910000000000000021,
4 "---omitted for brevity---":...,
5 "event":"livesdk_gift_send_click",
6 "params":{
7 "room_id":"6950000000000000038",
8 "live_window_mode":"live_big_picture",
9 "action_type":"click",
10 "---omitted for brevity---":...,
11 "live_type":"video_live",
12 "---omitted for brevity---":...,
13 "anchor_id":"6860000000000000065",
14 "gift_id":"5655",
15 "---omitted for brevity---":...,
16 "actual_room_id":"6950000000000000038",
17 "---omitted for brevity---":...,
18 "gift_type":"single_gift",
19 "---omitted for brevity---":...,
20 "gift_cnt":"1"
21 },
22 "---omitted for brevity---":...,
23 "local_time_ms":1618584189000,
24 "session_id":"00000000-0000-0000-0000-000000000000",
25 "datetime":"2021-04-16 15:43:09"
26 }
took place; iii) anchor_id, which is the ID of the live session’s performer and thus the TikTok account that
receives the gift; and iv) gift_id, which identifies the donated gift, here 5655, corresponding to a Rose.
Except for gift_id, all the others are TikTok IDs that follow the 32-bit + 32-bit format.
Listing 7: Extract of the databases/ss app log.db-journal file that shows the event triggered by
the end of a live session performed by anchor_id. IDs are filled with zeros for privacy reasons.
1 {
2 "nt":4,
3 "user_id":6910000000000000021,
4 "---omitted for brevity---":...,
5 "event":"livesdk_live_end_duration",
6 "params":{
7 "room_id":"6940000000000000078",
8 "duration":"31900",
9 "anchor_id":"6910000000000000007",
10 "room_orientation":"0",
11 "follow_status":"0",
12 "sdk_version":"1800",
13 "live_type":"video_live",
14 "---omitted for brevity---":...,
15 },
16 "event_id":13286,
17 "tea_event_index":417,
18 "local_time_ms":1617990743000,
19 "session_id":"00000000-0000-0000-0000-000000000000",
20 "datetime":"2021-04-09 18:52:23"
21 }
are worth 100 USD or more. The cash-out is performed through a PayPal account. In our experiments,
we could not assess forensic evidence of cash-out operations due to the 100 USD threshold needed to
trigger a payment.
We did not find evidence about gifts received within a live session. However, we found artifacts left
when the user performs a live session. Listing 7 shows the livesdk_live_end_duration event, which
was found in the databases/ss app log.db-journal file. The most relevant fields are the TikTok
IDs room_id and anchor_id, while duration measures the duration of the live session in hundredths
of seconds. Specifically, room_id identifies the live session, with the highest 32 bits corresponding to
the creation date/time of the live session, while anchor_id identifies the account that performed the live
session.
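The 32-bit + 32-bit layout can be illustrated with a short Python sketch. Because the IDs in the listings are anonymized, the example uses the videoID quoted earlier for the feedCache directory. The function name split_tiktok_id is hypothetical, and the sketch only separates the two halves; it does not interpret the patterns of the lower 32 bits.

```python
from datetime import datetime, timezone

def split_tiktok_id(tiktok_id: int):
    """Split a 64-bit TikTok ID into (creation date/time in UTC, lower 32 bits)."""
    # The highest 32 bits hold the creation time as Unix Epoch seconds.
    created = datetime.fromtimestamp(tiktok_id >> 32, tz=timezone.utc)
    return created, tiktok_id & 0xFFFFFFFF

created, low = split_tiktok_id(6912215720931757317)
print(created.isoformat(), hex(low))
```

The same function applies to room_id and anchor_id values taken from non-anonymized records, after converting them from strings with int().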
keyword Description
Table 12: Keywords to search for in a TikTok forensic analysis regarding user interaction, coin purchases,
and gifting.
6 FAMA
FAMA (Forensic Analysis for Mobile Apps) is a modular digital forensic framework. We developed
FAMA to help digital practitioners deal with the numerous mobile applications for Android. Indeed,
Google Play hosts millions of Android applications, each with its own specificities, namely
databases, configuration files, and logs, to name just the more relevant forensic data sources. Even
if only a small number of apps are really relevant to forensic investigations, developing software from
scratch to extract and process digital forensic artifacts is cumbersome and time-consuming.
The goal of FAMA is to facilitate both i) the extraction of data from Android smartphones and ii) the
forensic analysis of the extracted data. For this purpose, FAMA supports data collection via ADB and
the execution of tailored scripts for apps. To ease the development of those scripts, FAMA provides
a framework with several coding facilities to interact with SQLite3 databases, XML files, geolocation
coordinates, multimedia files, and so on. FAMA also supports the creation of a timeline, a handy tool for
forensic analysis.
When faced with a not yet supported app, one can quickly develop support for the app within FAMA.
FAMA has been developed by our team in Python, runs on the three major platforms – Windows, Linux,
and macOS – and is available as open source [21].
Regarding data extraction, FAMA can work with non-rooted Android phones, although extracted
data will be limited to what is publicly available, thus missing important elements, as shown earlier for
TikTok.
5 https://www.autopsy.com/
Figure 6: Messages exchanged within the TikTok network between two accounts
droid Analyzer, implementing the required classes and methods. However, functionalities are limited,
as we experienced in previous work when we developed a TikTok module fitted to Autopsy’s Android
Analyzer [12].
FAMA provides code models and a set of Python classes that ease the development of FAMA’s
modules, based on our experience of developing Autopsy and Android Analyzer modules. Moreover,
while FAMA can be run within a regular terminal, it can also be integrated within Autopsy through
a module. For this purpose, the framework offers three main components: i) Data source processor, ii)
Ingest, and iii) Report. Details about FAMA are available at FAMA’s GitHub repository6.
6 https://github.com/labcif/FAMA
7 https://github.com/sleuthkit/autopsy/pull/6027
Figure 10: FAMA report showing analyzed TikTok’s account list of Published videos
Category: Log
session_id body action time
123 {"ab_sdk_version":"50017293,50001140,50019350,1472623"} "personal_homepage_profile_status" 1607796159
123 {"duration":"1820","author_id":"","ab_sdk_version":"50017293,50001140,50019350,1472623","page_uid":""} "stay_time" 1607796159
123 {"ab_sdk_version":"50017293,50001140,50019350,1472623"} "livesdk_topview_show_failed" 1607796159
123 {"ab_sdk_version":"50017293,50001140,50019350,1472623"} "splash_ad" 1607796159
123 {"ab_sdk_version":"50017293,50001140,50019350,1472623"} "discovery_video_play" 1607796161
123 {"is_first":"0","duration":"0","ab_sdk_version":"50017293,50001140,50019350,1472623"} "video_request" 1607796161
123 {"is_first":"0","duration":"0","ab_sdk_version":"50017293,50001140,50019350,1472623"} "video_request" 1607796161
123 {} "click" 1607796163
123 {"duration":"3236","ab_sdk_version":"50017293,50001140,50019350,1472623"} "video_request_leave" 1607796164
123 {"video_duration":51662,"access":"WIFI","video_quality":14,"play_duration":0,"ab_sdk_version":"50017293,50001140,50019350,1472623"} "video_play_end" 1607796164
123 {"ab_sdk_version":"50017293,50001140,50019350,1472623"} "livesdk_topview_show_failed" 1607796164
123 {"ab_sdk_version":"50017293,50001140,50019350,1472623"} "personal_homepage_profile_status" 1607796164
123 {} "banner_show" 1607796164
123 {"ab_sdk_version":"50017293,50001140,50019350,1472623"} "banner_show" 1607796164
123 {"ab_sdk_version":"50017293,50001140,50019350,1472623"} "splash_ad" 1607796164
123 {"duration":"909","author_id":"","ab_sdk_version":"50017293,50001140,50019350,1472623","page_uid":""} "stay_time" 1607796165
123 {} "homepage" 1607796165
123 {"page":"homepage_hot","ab_sdk_version":"50017293,50001140,50019350,1472623"} "enter_page" 1607796165
123 {"ab_sdk_version":"50017293,50001140,50019350,1472623"} "homepage_hot_click" 1607796165
Figure 11: Partial view of FAMA’s generated report for TikTok example case
successful execution, data are extracted and processed from the archive(s) and then imported into Autopsy,
as shown in Figure 9. As can be observed from Figure 9, extracted and processed data are shown
with the Extracted Content tree of the Autopsy Interface. The artifacts can be browsed, with details
being shown on the right side of Autopsy interface. A note about the entry “TikTok: Deleted rows (98)”
visible in the Extracted Content tree: it encompasses data recovered from SQLite 3 database resorting
to DeGrazia’s tool to recover SQLite records [11]. This functionality of recovering SQLite data is
integrated within FAMA and can be used by other modules.
Figure 6 displays a graph highlighting the messages exchanged between the analyzed TikTok account
and another TikTok account. On the right side, one can see that 39 messages were exchanged between
the two accounts. Note that the interface – Communication Editor – is the one made available by Autopsy
for easily showing calls and messages (SMS, emails, etc.) between users in a case. This is the reason why
there is an (empty) Call Logs entry on the right side of the interface.
FAMA can also generate a set of reports in HTML format. The reports are dynamic, meaning that
the user can navigate and access several elements. The front end HTML page for the TikTok module is
shown in Figure 7, where it informs that there are 14 artifacts. The core of the HTML report is shown
in Figure 8, with the Timeline view selected. The possible views – Timeline, Map, Media and PDF –
are selected at the top of the page. In the vertical left bar, one can select the type of artifacts to be shown
in the main window. For example, by selecting Published videos, a list of the published videos is
displayed as shown in Figure 10. Finally, a partial and cropped view of the PDF report generated
for the TikTok example case through the vertical bar option PDF is shown in Figure 11.
7 Conclusion
TikTok has quickly become an important social network, installed on several million smartphones,
with a predominance in the teenage population. It is thus important to study the data and information that
can be gathered in a digital forensic analysis. For this purpose, this work studied TikTok for Android,
analyzing both the public and private parts of the mobile application.
The number of SQLite 3 databases – 31 – was surprising, although understandable from a developer
perspective, as the app integrated a previous product, musical.ly, and has evolved over 18
versions.
From the digital forensic perspective, most of the relevant data are located in the private storage
area of the app, which is only accessible in a rooted smartphone. From this private area, important
data for the configured account in the examined smartphone can be gathered. Examples include the
last watched videos, list of friends in TikTok network, followers, followed accounts, and the messages
exchanged between friends. Also of relevance is the simple yet useful algorithm employed by TikTok
to generate most of its 64-bit numerical identifiers, such as install id, user id and video id: the
most significant 32 bits correspond to the date/time timestamp, in Unix format, of the ID creation, and
as we have identified, patterns exist in the least significant 32 bits that allow one to quickly identify whether
a TikTok ID corresponds to an account, a video, a live session and so on. This way, for example, one can
easily determine when a video was posted to the network, when an account was created, or when a live
session was launched. This can be valuable for a digital forensic investigation.
Regarding personal data, it was possible to partially extract the email address or phone number, depending
on which one was used to create the account, or for the phone number, whether the user had posted
comments on the TikTok network, as posting comments requires linking the phone number to the TikTok
network. Additionally, for users logged in to TikTok with their email address, the full email address is
recorded in TikTok’s private LoginSharePreferences.xml file. The same occurs when the phone
number is used as a login identifier. Other data included phone hardware and OS, time zone, network
type (WiFi, 4G, etc.) and speed, country, and whether other social network accounts were linked to the
account. Our study also looked for artifacts left by interactions with TikTok’s coins and digital gifts.
Although some artifacts were recovered, none were kept in persistent files, as all artifacts existed in
SQLite 3 -journal files, which are, by definition, ephemeral and short-lived. Nonetheless, we provide
a set of coin-related keywords which can be looked up in TikTok’s data to detect forensic artifacts.
To ease the extraction and analysis of Android forensic artifacts, we provide the open-source FAMA
framework. Coupled with the TikTok module, FAMA analyzes the TikTok artifacts that it detects on an
Android smartphone and generates different types of reports on them.
As future work, we plan to continue to develop and adapt FAMA to the new versions of TikTok.
Additionally, we aim to introduce within FAMA the concept of versions, so that a given module will
be able to target a precise version of the app. Indeed, from our experience gathered with TikTok, we
observed that the app may change quite substantially between versions, at least from the digital forensic
point of view. Finally, we also aim to study the interaction of the TikTok app with the network.
Acknowledgments
This work was partially supported by CIIC under the FCT/MCTES project UIDB/CEC/4524/2020, and
EU funds under the project UIDB/EEA/50008/2020.
References
[1] K. E. Anderson. Getting acquainted with social networks and apps: it is time to talk about TikTok. Library
Hi Tech News, 37(4):7–12, February 2020.
[2] BBC. India bans TikTok, WeChat and dozens more Chinese apps, June 2020.
https://www.bbc.com/news/technology-53225720 [Online; accessed on September 15, 2021].
[3] BBC. TikTok and WeChat: US to ban app downloads in 48 hours, September 2020.
https://www.bbc.com/news/technology-54205231 [Online; accessed on September 15, 2021].
[4] BBC. TikTok faces legal action from 12-year-old girl in England, December 2020.
https://www.bbc.com/news/technology-55497350 [Online; accessed on September 15, 2021].
Authors Biography
Patricio Domingues holds a PhD (2009) in Informatics Engineering from the University
of Coimbra, Portugal. He is currently an associate professor at the School of
Technology and Management of the Polytechnic of Leiria, Portugal, where he teaches
in the BSc in Informatics Engineering and in the Master of Cybersecurity and Digital
Forensics. His main research interests include digital forensics, high performance,
and many-core computing.
Ruben Nogueira is currently completing his BSc degree at the School of Management
of the Polytechnic of Leiria, Portugal. He is currently working as a Full-Stack
Web Developer and his main research interests include digital forensics, pentesting
and cybersecurity.
Miguel Frade holds a PhD (2012) in Informatics Engineering from the University of
Extremadura, Spain. He is currently an adjunct professor at the School of Technology
and Management of the Polytechnic of Leiria, Portugal, where he teaches in the BSc
in Informatics Engineering, and in the Master of Cybersecurity and Digital Forensics.
His main research interests include digital forensics, cybersecurity, and cryptography.