Effectiveness of DNS

ABSTRACT
DNS cache plays a critical role in domain name resolution, providing (1) high scalability at Root and Top-level-domain (TLD) name servers with reduced workloads and (2) low response latency to clients when the resource records of the queried domains are cached. However, the pervasive misuse of domain names, e.g., domains with a "one-time-use" pattern, has a negative impact on the effectiveness of DNS caching because the cache becomes filled with entries that are highly unlikely to be retrieved. In this paper, we investigate such misuse and identify domain name-based features to characterize those one-time domains. By leveraging features that are explicitly available from the domain name itself, we build a classifier that combines these features, propose simple policy modifications on caching resolvers to improve DNS cache performance, and validate their efficacy using real traces.

CCS Concepts
• Networks → Network measurement; Network protocols;

1. INTRODUCTION
As one of the most important components of the Internet, the Domain Name System (DNS) provides a vital mapping service for Internet users by translating domain names to IP addresses. Since DNS is a globally distributed database system, caching has been widely adopted in DNS infrastructures, where the acquired mapping results (i.e., the DNS resource records, RRs) are cached locally to answer subsequent queries for a specific duration. DNS caching significantly reduces the resolution traffic along referral chains to multiple name servers, resulting in a much shorter client-perceived delay and high scalability of DNS.

Due to its fundamental role in accessing Internet services, DNS traffic is the least blocked [21] and provides both attackers and developers with an attractive channel to transmit information. Thus, the misuse of domain names (either malicious or non-malicious) is widely observed on the Internet. On the other hand, since the cached objects in DNS resolvers are typically small, in some instances caches are not size-limited [17] and memory usage is relatively stable as expired entries are evicted. However, when serving a large group of users with a heavy load (e.g., in ISP or CDN/cloud providers), although modern DNS resolvers manage memory well, they will still quickly consume memory and go into swap. Meanwhile, this may also cause CPU performance problems if a cleaning interval is enabled to periodically check for stale records. To this effect, a fixed memory allocation is a common configuration [12], and typical replacement policies (e.g., LRU and LFU) are employed to manage cache usage [13, 14]. Therefore, it is critical to ensure that the cached RRs are likely to be accessed again. Unfortunately, the pervasiveness of misused domains, e.g., disposable domains [12], causes the ineffectiveness of caching on resolvers, since the cache is filled with records that have very low or almost zero cache hit rates.

In this paper, we attempt to mitigate the negative effect on DNS caching caused by domain name misuse, especially the "one-time-use" domains. Different from previous approaches, we do not pursue accurate detection of domain misuse by employing deep inspection techniques, such as behavioral features [10], alphanumeric character-based metrics [26], or entropy-based computation [21]. Instead, our key insight is that since most misused domains, either malicious or benign, tend to transmit information over DNS query names, the domain name itself may have distinct features that are explicitly available from individual queries and can be readily exploited to improve DNS cache performance.

Based on DNS trace logs captured at the resolvers of campus networks, we extract the re-used and once-used RRs. The re-used RRs indicate that the queried domains are retrieved multiple times, while the once-used RRs appear only once in a trace. By analyzing a large number of once-used entries, we observe that several explicit domain name-based features are capable of characterizing the reusability of domains. As such, we propose modified caching behavior to enhance the effectiveness of DNS caching by preemptively excluding unreusable RRs. To validate their capability, we quantify the statistical properties of each feature and build a classifier that combines those features. The classification results demonstrate that the proposed modifications are able to prevent approximately 85% of once-used RRs from being cached, while less than 1% of reusable RRs are mistakenly kept out of the cache.

The remainder of this paper is organized as follows. We introduce the background of DNS caching and disposable domains in §2. We present the proposed domain name-based features for DNS caching in §3. We analyze the collected datasets and build a classifier to validate the features in §4, and conduct a trace-driven evaluation in §5. We survey related work in §6 and conclude the paper in §7.

2. BACKGROUND

2.1 DNS Caching
Recursive DNS resolvers retrieve name resolution results for clients and cache the received responses to answer subsequent queries. The duration for which the cached records remain valid is specified by a time-to-live (TTL) value.

In standard TTL-based caching, the TTL value is set and handed out by the administrator of the authoritative DNS record, and cached entries are expunged after their TTLs expire. The duration for caching a negative response (e.g., NXDOMAIN, NODATA, etc.) is given by the TTL value of the SOA record [4]. While TTL-aging-based behavior is well defined, violations of TTL are observed pervasively both in modern web browsers and DNS infrastructures [11].
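The TTL-driven behavior described above can be sketched in a few lines. This is a minimal illustration in Python, not any resolver's actual implementation; all names here are our own:

```python
import time

class TTLCache:
    """Minimal sketch of standard TTL-based DNS caching (illustrative only)."""

    def __init__(self):
        self._entries = {}  # qname -> (record, absolute expiry time)

    def put(self, qname, record, ttl):
        # The TTL is handed out by the authoritative side; for negative
        # responses (NXDOMAIN/NODATA), the caller would pass the SOA TTL.
        self._entries[qname] = (record, time.time() + ttl)

    def get(self, qname):
        entry = self._entries.get(qname)
        if entry is None:
            return None                  # cache miss
        record, expires = entry
        if time.time() >= expires:
            del self._entries[qname]     # expired: expunge and report a miss
            return None
        return record                    # cache hit

cache = TTLCache()
cache.put("www.example.com", "93.184.216.34", ttl=300)
print(cache.get("www.example.com"))  # hit within the TTL
```

Note that in this model an entry consumes cache memory for its whole TTL regardless of whether it is ever retrieved again, which is precisely the cost that once-used domains impose.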
2.2 Disposable Domains
The use of DNS in ways for which it was not originally designed has been observed for many years. For example, DNS is exploited as an effective covert channel for surreptitious communications [21, 25]. Moreover, Chen et al. [12] studied disposable domains, a more generic class of domain misuse in which query names are adopted to convey "one-time signals." These domains are not necessarily malicious and are observed pervasively across various types of service providers, including popular search engines, social networks, CDNs, and security companies, and they have grown to a significant portion of the domains queried on the Internet. Due to the "one-time-use" pattern and the increasing use of such domains, the DNS cache becomes filled with entries with near-zero hit rates. Our work is mainly built on the analysis of these entries for disposable domains.

3. DOMAIN NAME-BASED FEATURES
Domain names are human-readable and easy-to-remember character strings. The once-used domains, however, exploit query names as a communication channel. One of our insights is that such misused domains are encoded automatically to convey formatted information and should therefore exhibit significantly different patterns in their domain names.

Therefore, we consider the possibility of characterizing the once-used domains, and then exploit the derived features to filter out the disposable domains. The removal of such once-used domains from the DNS cache will improve its performance, because the pervasive use of misused domains causes the DNS cache to be occupied by entries that are highly unlikely to be reused. Figure 1 presents the preliminary examples that motivate our feature selection. We plot the distributions of the query name length and the subdomain depth (i.e., the number of subdomains), respectively, for all observed query names and distinct domain names from one of our trace logs (§4.1). It is evident that (1) most repeatedly appearing domains have a short name and limited subdomain depth, and (2) a significant portion of domains have a long query name and a large number of subdomains. This implies that, under limited memory, discarding entries with long names or deep subdomains would free more cache space for entries with a higher probability of being reused, thus effectively improving caching effectiveness.

[Figure 1: Example of distribution for lengths and depths. Panels: (a) Query Name Length; (b) Subdomain Depth. CDFs over all queries vs. distinct names.]

Based on the analysis of large numbers of once-used domains, we identified that domain name-based features, such as the two features above, are able to characterize the caching behavior of domains without the help of the sophisticated features used in the detection of malicious domains (e.g., the behavioral features in EXPOSURE [10] and statistical features in Notos [6]). As a result, we propose the following features and explain why they may affect caching effectiveness. All but the first have not yet been characterized in the analysis of DNS, and none of them have been studied in the context of caching performance.

• F1: Length of query name: Since most once-used domains tend to send messages over DNS queries, those domains naturally have (much) longer query names to pack in as much information as possible, and are hardly reusable.

• F2: Length of the longest subdomain name: Similar to the query name, an individual lower-level label (i.e., the string representing one subdomain) can also be longer than a legitimate subdomain name, which tends to be "easy-to-remember."

• F3-a: Subdomain depth: To report information over DNS, the meaning of the domain names must be easily recognized by the receivers. To this end, the domain separator, i.e., the period ".", is naturally employed to format the domains so that the name strings carry meaningful information, resulting in a deep subdomain level, i.e., a large number of subdomains in one query name.

• F3-b: Number of format fields: Like the period specifying the subdomain level, we also observed that the hyphen "-" is widely used as a field separator to format messages within one subdomain label.

Due to their functional similarity, in this paper we treat both periods (F3-a) and hyphens (F3-b) as equivalent format separators and use the single term "number of format fields" (F3) to represent the number of strings separated by either "." or "-".

• F4: Number of fields with unusual lengths: To represent various pre-defined types of information inserted at specified positions, the lengths of format strings vary widely, and many fields are either unusually long or unusually short. We consider such domains quite hard to reuse. Thus, we define a metric as the sum of the number of long format fields (L-FF) and short format fields (S-FF) within one query name.

Figure 2 shows samples of domains exhibiting the explicit features. Note that these features, e.g., the length of query names, do not necessarily indicate malicious purposes [21]; in fact, most of these domains are benign. However, they do indicate the use of "signaling" and thus imply a high possibility of a "one-time-use" pattern. Accordingly, there should be a strong correlation between each feature and once-used RRs in the cache. By exploiting these features, we can revise the caching policies to proactively prevent RRs that are less likely to be reused from being cached, such that the effectiveness of DNS caching can be significantly improved.

Methodology. For the proposed features, we characterize the properties of re-used and once-used domains, train a classifier to classify the entries, and conduct a trace-driven simulation to validate their efficacy in caching. In the feature validation (§4.2) and classification (§4.3), the analysis relies solely on domain names and implicitly assumes that both the cache size and TTL values are unlimited. This assumption creates an ideal scenario for caching RRs, where cache hits are not limited by the cache size or TTL values. The simulation (§5.2) runs within a resolver program that caches entries according to the classification results and the common practices of modern DNS resolvers.

4. MEASUREMENT ANALYSIS

4.1 Dataset
The datasets used in this study are trace logs of outgoing DNS queries captured at the local DNS servers of the College of William and
Mary (WM) and the University of Delaware (UD) over a period of two weeks. We summarize the datasets in Table 1.

The trace logs from campus vantage points have two limitations. First, the traces may not reflect the dynamics of domain names observed at ISPs' DNS servers. However, given the similar patterns of disposable domains reported for a large-scale ISP dataset in [12], we believe that the proposed policies would also be effective in an ISP's DNS cache, especially given the heavy load on their resolvers. Also, we removed the RR entries that retrieve local domains. Second, the length of our trace logs is limited. We believe, however, that they are still capable of demonstrating the typical cache usage patterns of local resolvers, because the origin TTL values of A/AAAA records are typically shorter than one day [15].

4.2 Feature Validation
Given the heuristics presented in §3, we validate our speculation that these explicit domain name-based features help to improve the effectiveness of DNS caching. From Table 1, we can see that 86.15% and 78.63% of queried domains appeared only once in each dataset, which is consistent with the reported results on identified disposable domains [12]. In our study, however, we do not attempt to achieve high detection accuracy in identifying such a class of domains. Instead, we focus on exploring the efficacy of the proposed features in improving overall cache performance by preventing a large number of once-used records from filling the cache.

To validate each proposed feature based on its factual caching effect, we tentatively derive a threshold to measure the fractions of excluded domains. We then leverage a learning module to train on our datasets and build a classifier that combines the proposed features.

F1: Length of Query Name. Figure 3(a) shows the distributions of the query name length, in which the two classes of domains are clearly distinguishable from one another. The majority of once-used domains have much longer names than re-used domains.

To exclude the useless RRs, we first consider a tentative threshold of 50 bytes of query name length. In the WM dataset, a threshold of 50 bytes would exclude 81.10% of once-used domains and 1.61% of re-used domains, resulting in 69.87% of overall entries being rejected from the cache. By gradually raising the threshold, at a length of 100 bytes we observe that only 0.027% of reusable domains are mistakenly dropped while 44.99% of once-used domains (38.77% of RRs in total) are discarded. Similarly, in the UD dataset, with a length of 100 bytes, 0.019% of re-used entries are mistakenly kept out of the cache while 32.58% of once-used domains (25.53% of RRs in total) are dropped. These results indicate that rejecting domains with long names would significantly reduce the waste of cache space while keeping the cache hit ratio at the same level.

F2: Length of the Longest Subdomain Name. Figure 3(b) demonstrates that long labels are widely adopted in once-used domains. Specifically, in each dataset we identify that 3.03% and 1.51% of re-used entries, as well as 73.40% and 68.05% of once-used entries, include one subdomain longer than 20 bytes, respectively. If we increase the threshold to 30 bytes, the fractions of re-used domains decline to 0.39% and 0.30%, while the fractions of once-used domains remain at 69.86% and 57.22%, respectively. Thus, a subdomain name length greater than 30 bytes strongly indicates that the domain is once-used, with little chance of being a useful entry if cached.

F3: Number of Format Fields. Figure 3(c) presents the distribution of the total number of format fields. It is easy to identify that a threshold of 10 is capable of distinguishing the reusability of each class of domains. Using this threshold, we can exclude 0.59% and 0.79% of re-used entries, and 31.17% and 42.39% of once-used domains, respectively, from each dataset; overall, around 25% and 37% of entries would be discarded from the cache, respectively.

F4: Total Number of L-FF and S-FF. To profile the total number of long format fields (L-FF) and short format fields (S-FF), we first empirically determine the specific lengths that define the L-FF and S-FF. Since most TLDs include one or two fields with two or three characters, we define an S-FF as a field with three or fewer characters.¹ Also, we investigate the distributions of F4 by varying the length of the L-FF, and observe that a length of 10 is sufficient to demonstrate the distinct statistical properties of this feature.

Figure 3(d) shows the distribution of the sum of L-FF (>10 bytes) and S-FF (≤3 bytes). With a clear threshold observed at five, we identify that 0.61% of re-used and 70.03% of once-used domains in WM's dataset (60.33% of RRs in total), and 0.40% of re-used and 74.55% of once-used domains in UD's dataset (63.97% of RRs in total), would be discarded. As a result, this would exclude the majority of the useless entries but have little negative impact on caching reusable domains.

¹A more accurate approach might exclude the TLDs (e.g., .com and .co.uk) and SLDs (second-level domains, such as msn and cnn), since they may be counted as S-FFs. However, we observe that checking the entire domain name already produces effective results, so we choose this simple approach to avoid introducing additional steps to identify the SLDs.

[Figure 3: Distribution of domain name-based features for re-used and once-used domains. Panels: (a) DNS Query-Name Length; (b) Longest-Subdomain Length; (c) Number of Format Fields; (d) Total Number of L-FF and S-FF. Each panel shows CDFs for re-used vs. once-used domains.]

Table 2: Percentage of mis-classified instances

          Decision Tree              Random Forest (ntree=5)
        D        R        O        D        R        O
WM   16.07%   0.26%   13.88%   12.93%   0.19%   11.16%
UD   13.91%   0.98%   14.13%   11.89%   0.34%   12.18%

D: Disposable, R: Re-used, O: Overall

[Figure 4: Training Results (with Decision Tree). Panels: Variable Importance and Primary Split Values for the WM and UD datasets over features F1-F4.]
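The feature definitions and per-feature cut-offs derived in §4.2 can be condensed into a short sketch. This is our illustrative Python rendering, not the authors' code; the helper names and the exact comparison directions at the thresholds are our assumptions:

```python
def domain_features(qname: str) -> dict:
    """Compute the explicit domain name-based features F1-F4 (§3).

    Field-length cut-offs follow §4.2: a field is "short" (S-FF) at
    <= 3 characters and "long" (L-FF) at > 10 characters.
    """
    labels = qname.rstrip(".").split(".")
    # F3 counts the strings separated by either "." or "-" across the name.
    fields = [f for label in labels for f in label.split("-")]
    return {
        "F1": len(qname.rstrip(".")),        # length of query name
        "F2": max(len(l) for l in labels),   # longest subdomain label
        "F3": len(fields),                   # number of format fields
        "F4": sum(1 for f in fields if len(f) <= 3 or len(f) > 10),
    }

def looks_once_used(qname: str) -> bool:
    # Single-feature cut-offs suggested by §4.2 (F1 > 100 bytes, F2 > 30
    # bytes, F3 > 10 fields, F4 at the threshold of five); a name crossing
    # any cut-off is predicted once-used.
    f = domain_features(qname)
    return f["F1"] > 100 or f["F2"] > 30 or f["F3"] > 10 or f["F4"] >= 5

print(domain_features("u-1234567890abcdef.metrics.example.com"))
```

In the paper these cut-offs are only used individually to gauge each feature; the combined decision is left to the trained classifier of §4.3.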
4.3 The Classifier
To validate the efficacy of the combination of proposed features, we train on the datasets with both decision tree and random forest models [5], using the rpart and randomForest packages in R, respectively. Note that we leveraged class weights (i.e., the parms parameter in rpart and the classwt parameter in randomForest) to handle the unbalanced class sizes in our datasets.

Ground truth. Since it is (almost) impossible to obtain ground truth that identifies the "disposable domains," we label the once-appearing domains extracted from the datasets as disposable. As such, our labels correspond to those assigned by an oracle with perfect knowledge. Although domain unpopularity may cause mis-labeling (i.e., some unpopular domains may be mis-labeled as disposable), our labels are an acceptable approximation of the ground truth in practice, especially given the thousands of users on each campus network. Moreover, mis-labeling a rarely re-used domain as disposable would have a marginal impact on practical caching performance, as such an entry is likely to be evicted before reappearing.

Evaluation of the classifiers. Each dataset is divided into mutually exclusive training and testing partitions, where 66% of the dataset is used for training and the rest for testing. With the random forest model, we observe no further benefit when the number of constructed trees exceeds five. Table 2 lists the percentage of incorrectly classified instances using the combination of all features.² Note that we aim to improve caching effectiveness in two respects: (1) effectively reject the useless entries, and (2) minimize the negative impact on the reusable ones. The results in Table 2 demonstrate that we achieve both goals in the classification process. They also indicate that, although a simple decision tree tends to overfit the training set, it is capable of producing accurate results when applied to the classifier constructed from the combination of proposed features. More specifically, 85% to 88% of once-used RRs are correctly labeled and expelled from the cache, while only 0.2% to 1% of re-used RRs are incorrectly classified in the WM and UD datasets, respectively.

The unbalanced (but expected and positive) results can be explained by the observation that the re-used entries have more consistent and concentrated feature distributions, while the features extracted from once-used entries exhibit more diffuse distributions. Figure 4 shows the variable importance and the primary split values from the decision tree training on each dataset, which illustrates that (1) all the features play important roles in the classification (the importance index varies from 21 to 31), and (2) although the primary split values are more aggressive than the thresholds derived from any single feature (§4.2), we can further lower the error rates by using the combination of features.

5. TRACE-DRIVEN SIMULATION
In §4, we demonstrated that the explicit domain name-based features are useful for inferring the reusability of RRs. In practice, however, resolvers behave slightly differently due to the presence of TTLs. Re-used entries may still cause cache misses once the cached RRs have expired. Meanwhile, some mis-classified reusable entries may not affect caching performance, since many of them have a low probability of being retrieved again within the TTL. In this section, we apply the classifier of §4.3 with the combination of proposed features to conduct a trace-based simulation³ to evaluate the effectiveness of the proposed policies.

²We explored different combinations of feature sets and found that using all features in the classification achieves the minimum error rate.
³We only perform the simulation with WM's trace, since the actual domains have been anonymized in UD's trace.
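The role of class weights in the tree training above can be illustrated with a deliberately tiny stand-in. This is not the rpart/randomForest procedure the paper uses; it is a toy weighted split search on one feature, with invented data, that only shows how weighting errors on the (rare) re-used class more heavily steers the chosen split:

```python
def best_split(values, labels, w_disposable=1.0, w_reused=1.0):
    """Pick the threshold on one feature minimizing weighted error.

    labels: 1 = disposable (once-used), 0 = re-used. Entries with
    value > threshold are predicted disposable.
    """
    best = (None, float("inf"))
    for t in sorted(set(values)):
        err = 0.0
        for v, y in zip(values, labels):
            pred = 1 if v > t else 0
            if pred != y:
                # class weights penalize errors on the two classes differently
                err += w_reused if y == 0 else w_disposable
        if err < best[1]:
            best = (t, err)
    return best

# F1 (query-name length) for six hypothetical domains:
# three re-used (short), three once-used (long)
lengths = [12, 18, 25, 110, 140, 200]
labels = [0, 0, 0, 1, 1, 1]
threshold, err = best_split(lengths, labels, w_disposable=1.0, w_reused=5.0)
print(threshold, err)
```

Raising `w_reused` makes mistakenly discarding a reusable entry costly, which is exactly the asymmetry the paper targets: rejecting useless entries while keeping the damage to reusable ones below 1%.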
[Figure 5: Distribution of cache hit rates, (a) with FIFO and (b) with LRU. The X-axis represents the number of entries read into the resolver.]

We implemented the proposed caching policies in a simulated resolver program modified from djbdns [2], in which the caching decisions employ the classification results from §4.3. Our resolver program follows the standard TTL model, i.e., it does not assign a default minimum TTL value. The duration of negative caching is subject to the TTL values of the SOA records [4]. Moreover, we do not set a cleaning interval to periodically expel stale records, in line with the sophisticated memory management of modern resolvers [3]. Only when hitting the cache limit are some entries prematurely evicted from the cache (e.g., using the LRU replacement policy).

Types of RRs. We first study the caching properties of different types of RRs. In particular, we examine whether they need discriminative consideration when cached in resolvers. Table 3 lists the breakdown of the types of queries. SOAs are treated the same as A/AAAAs, i.e., such RRs are cached according to the proposed policies in §4. Other unspecified types of RRs are not studied in detail because of their small query volumes.

• TXT records: We identify that only 0.01% of TXT records have been reused, and we indeed observe that TXTs are being used as an information channel. We observed similar distributions of the proposed features in TXTs, and thus apply the proposed policies when caching TXTs.

• Reverse lookup queries (PTRs): We do not apply the modifications to PTRs, since they are rarely misused and we do not observe the studied features taking effect on PTRs.

• Service records (SRVs): We identify that most SRVs (97.46%) involve local queries, which were removed in our study (§4.1). As with PTRs, we also observe that the studied features have no impact on SRVs' caching effectiveness, and thus we do not apply the policies to SRVs.

• NS records: Caching NS records can significantly enhance the efficiency of DNS and reduce the load on name servers [17]. Also, the number of NS records is much smaller than that of the other types of RRs above. Thus, there is no need to apply the policies to NS records. In fact, no NS records would be excluded from caching even if the proposed policies were applied.

5.2 Results
We now evaluate the effectiveness of the proposed policies given a fixed cache memory allocation. To simplify the assessment, we define the cache allocation by the number of RRs. We feed the RRs into the cache and then examine the cache hit rate, calculated as the ratio between the number of cache hits and the total number of retrieved RRs.

We need to determine how many RRs should be cached to represent a realistic scenario for the evaluation of our proposed policies. Jung et al. [17] identified that the DNS cache hit ratio is between 80% and 87%. Thus, we choose as our cache size the number of cached RRs that achieves a similar cache hit ratio. To this end, given the moderate size of our dataset and a FIFO replacement policy, i.e., simply removing the oldest entry when the cache runs out of space, we observe that a size of 100,000 entries yields a cache hit rate of about 86%. Therefore, we set the cache size to 100,000 RRs in our simulations. Note that this setting is derived from local campus networks; in ISPs' DNS servers, a larger cache size would be needed to handle the larger number of DNS queries.

FIFO. We first evaluate the proposed policies with the FIFO caching scheme, which is still widely used in popular DNS resolvers such as djbdns [2]. Figure 5(a) presents the measured cache hit rates under FIFO, with and without the proposed policies, respectively. We observe that the modified caching policies can improve the cache hit rate by about 8% with a cache size of 100,000 entries.

Pseudo-LRU. We then evaluate the proposed policies with a simplified pseudo-LRU that leverages one bit per entry to store the cache status (i.e., the MRU bit). When a cache hit occurs for an entry, its cache bit is set to 1. When the cache is full, the oldest entry with cache bit 0 is evicted. When the cache bits of all cached entries have been set to 1, all the bits are cleared to 0 except for the last one. Figure 5(b) shows the measured cache hit rates with the LRU replacement scheme. We observe that the proposed caching modifications improve the cache hit rate by about 7%. Compared with FIFO, LRU without our policies increases the cache hit rate by about 2%. With the proposed modifications, both FIFO and LRU raise the cache hit rate to 92%-93%.

5.3 Discussion
TTL values. Low TTL values have been observed in both malicious domains [10] and disposable domains [12]. However, domain owners are free to set the TTLs and have been switching to larger TTL values (most of them have a TTL of 300s [12]), resulting in a longer duration during which the useless entries are regarded as valid in the cache. Meanwhile, since modern resolvers have made the cleaning interval obsolete [3], caching the once-used domains even for a short time still degrades performance, because the cache fills with such useless entries and no space is left for caching useful ones. Thus, TTL may not be a reliable indicator of caching effectiveness, and we do not consider it a metric for quantifying caching behavior.

Counteraction. One may be concerned that domain owners could circumvent our policies by changing the structure of their domain names. However, we believe the modifications will not provoke them to seek more sophisticated approaches: those "one-time-use" domains have already accomplished their communication mission, and the developers using such approaches would not care whether their DNS responses are cached.

6. RELATED WORK
DNS caching and TTL characterization. Pang et al. [20] presented a comprehensive study of DNS, and [19] observed that a significant fraction of web clients and LDNSes do not honor DNS TTLs. Callahan et al. [11] passively monitored DNS traffic within a residential network to profile modern DNS behaviors and properties. They also observed that web browsers do not adhere to the given TTLs, and that CDNs tend to shape traffic with shorter TTLs.
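The one-bit pseudo-LRU replacement described in §5.2 can be sketched as follows. This is our illustrative Python rendering, not the simulator's code; in particular, we interpret "the last one" that keeps its bit as the most recently touched entry, which is an assumption:

```python
from collections import OrderedDict

class PseudoLRUCache:
    """One-bit pseudo-LRU in the style of §5.2 (illustrative sketch)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = OrderedDict()   # key -> MRU bit, in insertion order
        self.last_touched = None

    def get(self, key):
        if key in self.entries:
            self.entries[key] = 1      # cache hit: set the MRU bit
            self.last_touched = key
            return True
        return False                   # cache miss

    def put(self, key):
        if key in self.entries:
            return
        if len(self.entries) >= self.capacity:
            # evict the oldest entry whose MRU bit is 0
            victim = next((k for k, bit in self.entries.items() if bit == 0), None)
            if victim is None:
                # all bits set: clear every bit except the last-touched
                # entry's (our reading of "except for the last one")
                for k in self.entries:
                    if k != self.last_touched:
                        self.entries[k] = 0
                victim = next(k for k, bit in self.entries.items() if bit == 0)
            del self.entries[victim]
        self.entries[key] = 0          # new entries start with bit 0
```

The scheme approximates LRU with one bit of state per entry instead of a recency list, which matches the paper's goal of keeping the replacement machinery cheap while still favoring recently hit RRs.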