Effectiveness of DNS

ABSTRACT
DNS cache plays a critical role in domain name resolution, providing (1) high scalability at Root and Top-level-domain (TLD) name servers with reduced workloads and (2) low response latency to clients when the resource records of the queried domains are cached. However, the pervasive misuse of domain names, e.g., domains with a "one-time-use" pattern, has a negative impact on the effectiveness of DNS caching because the cache becomes filled with entries that are highly unlikely to be retrieved. In this paper, we investigate such misuse and identify domain name-based features to characterize those one-time domains. By leveraging features that are explicitly available from the domain name itself, we build a classifier that combines these features, propose simple policy modifications on caching resolvers to improve DNS cache performance, and validate their efficacy using real traces.

CCS Concepts
• Networks → Network measurement; Network protocols;

1. INTRODUCTION
As one of the most important components of the Internet, the Domain Name System (DNS) provides a vital mapping service for Internet users by translating domain names to IP addresses. Since DNS is a globally distributed database system, caching has been widely adopted in DNS infrastructures, where the acquired mapping results (i.e., the DNS resource records, RRs) are cached locally to answer subsequent queries for a specific duration. DNS caching significantly reduces the resolution traffic along referral chains to multiple name servers, resulting in a much shorter client-perceived delay and high scalability of DNS.

Due to its fundamental role in accessing Internet services, DNS traffic is the least blocked [21] and provides both attackers and developers with an attractive channel to transmit information. Thus, the misuse of domain names (either malicious or non-malicious) is widely observed on the Internet. On the other hand, since the cached objects in DNS resolvers are typically small, in some instances caches are not size-limited [17] and memory usage is relatively stable as expired entries are evicted. However, when serving a large group of users with a heavy load (e.g., in ISP or CDN/cloud providers), although modern DNS resolvers manage memory well, they will still quickly consume memory and go into swap. Meanwhile, this may also cause CPU performance problems if a cleaning interval is enabled to periodically check for stale records. To this effect, a fixed memory allocation is a common configuration [12], and typical replacement policies (e.g., LRU and LFU) are employed to manage cache usage [13, 14]. Therefore, it is critical to ensure that the cached RRs are likely to be accessed again. Unfortunately, the pervasiveness of misused domains, e.g., disposable domains [12], causes the ineffectiveness of caching on resolvers, since the cache is filled with records that have very low or almost zero cache hit rates.

In this paper, we attempt to mitigate the negative effect on DNS caching caused by domain name misuse, especially the "one-time-use" domains. Different from previous approaches, we do not pursue accurate detection of domain misuse by employing deep inspection techniques, such as behavioral features [10], alphanumeric character-based metrics [26], or entropy-based computation [21]. Instead, our key insight is that since most misused domains, either malicious or benign, tend to transmit information over DNS query names, the domain name itself may have distinct features that are explicitly available from individual queries and can be readily exploited to improve DNS cache performance.

Based on DNS trace logs captured at the resolvers of campus networks, we extract the re-used and once-used RRs. The re-used RRs indicate that the queried domains are retrieved multiple times, while the once-used RRs appear only once in a trace. By analyzing a large number of once-used entries, we observe that several explicit domain name-based features are capable of characterizing the reusability of domains. As such, we propose modified caching behavior to enhance the effectiveness of DNS caching by preemptively excluding unreusable RRs. To validate their capability, we quantify the statistical properties of each feature and build a classifier that combines those features. The classification results demonstrate that the proposed modifications are able to prevent approximately 85% of once-used RRs from being cached, while less than 1% of reusable RRs are mistakenly kept out of the cache.

The remainder of this paper is organized as follows. We introduce the background of DNS caching and disposable domains in §2. We present the proposed domain name-based features for DNS caching in §3. We analyze the collected datasets and build a classifier to validate the features in §4, and conduct a trace-driven evaluation in §5. We survey related work in §6 and conclude the paper in §7.

2. BACKGROUND

2.1 DNS Caching
Recursive DNS resolvers retrieve name resolution results for clients and cache the received responses to answer subsequent queries. The duration for which the cached records remain valid is specified by a time-to-live (TTL) value.

In standard TTL-based caching, the TTL value is set and handed out by the administrator of the authoritative DNS record, and cached entries are expunged after their TTLs expire. The duration for caching a negative response (e.g., NXDOMAIN, NODATA, etc.) is given by the TTL value of the SOA record [4]. While TTL-aging-based behavior is well defined, violations of TTL are observed pervasively both in modern web browsers and DNS infrastructures [11].
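The TTL-driven behavior described above can be sketched in a few lines. This is a minimal illustration in Python, not any resolver's actual implementation; all names here are our own:

```python
import time

class TTLCache:
    """Minimal sketch of standard TTL-based DNS caching (illustrative only)."""

    def __init__(self):
        self._entries = {}  # qname -> (record, absolute expiry time)

    def put(self, qname, record, ttl):
        # The TTL is handed out by the authoritative side; for negative
        # responses (NXDOMAIN/NODATA), the caller would pass the SOA TTL.
        self._entries[qname] = (record, time.time() + ttl)

    def get(self, qname):
        entry = self._entries.get(qname)
        if entry is None:
            return None                  # cache miss
        record, expires = entry
        if time.time() >= expires:
            del self._entries[qname]     # expired: expunge and report a miss
            return None
        return record                    # cache hit

cache = TTLCache()
cache.put("www.example.com", "93.184.216.34", ttl=300)
print(cache.get("www.example.com"))  # hit within the TTL
```

Note that in this model an entry consumes cache memory for its whole TTL regardless of whether it is ever retrieved again, which is precisely the cost that once-used domains impose.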
2.2 Disposable Domains
The use of DNS in ways for which it was not originally designed has been observed for many years. For example, DNS is exploited as an effective covert channel for surreptitious communications [21, 25]. Moreover, Chen et al. [12] studied disposable domains, a more generic class of domain misuse in which query names are adopted to convey "one-time signals." These domains are not necessarily malicious and are observed pervasively across various types of service providers, including popular search engines, social networks, CDNs, and security companies, and they have grown to a significant portion of the domains queried on the Internet. Due to the "one-time-use" pattern and the increasing use of such domains, the DNS cache becomes filled with entries with near-zero hit rates. Our work is mainly built on the analysis of these entries for disposable domains.

3. DOMAIN NAME-BASED FEATURES
Domain names are human-readable and easy-to-remember character strings. The once-used domains, however, exploit query names as a communication channel. One of our insights is that such misused domains are encoded automatically to convey formatted information and should therefore exhibit significantly different patterns in their domain names.

Therefore, we consider the possibility of characterizing the once-used domains, and then exploit the derived features to filter out the disposable domains. The removal of such once-used domains from the DNS cache will improve its performance, because the pervasive use of misused domains causes the DNS cache to be occupied by entries that are highly unlikely to be reused. Figure 1 presents the preliminary examples that motivate our feature selection. We plot the distributions of the query name length and the subdomain depth (i.e., the number of subdomains), respectively, for all observed query names and distinct domain names from one of our trace logs (§4.1). It is evident that (1) most repeatedly appearing domains have a short name and limited subdomain depth, and (2) a significant portion of domains have a long query name and a large number of subdomains. This implies that, under limited memory, discarding entries with long names or deep subdomains would free more cache space for entries with a higher probability of being reused, thus effectively improving caching effectiveness.

[Figure 1: Example of distribution for lengths and depths. Panels: (a) Query Name Length; (b) Subdomain Depth. CDFs over all queries vs. distinct names.]

Based on the analysis of large numbers of once-used domains, we identified that domain name-based features, such as the two features above, are able to characterize the caching behavior of domains without the help of the sophisticated features used in the detection of malicious domains (e.g., the behavioral features in EXPOSURE [10] and statistical features in Notos [6]). As a result, we propose the following features and explain why they may affect caching effectiveness. All but the first have not yet been characterized in the analysis of DNS, and none of them have been studied in the context of caching performance.

• F1: Length of query name: Since most once-used domains tend to send messages over DNS queries, those domains naturally have (much) longer query names to pack in as much information as possible, and are hardly reusable.

• F2: Length of the longest subdomain name: Similar to the query name, an individual lower-level label (i.e., the string representing one subdomain) can also be longer than a legitimate subdomain name, which tends to be "easy-to-remember."

• F3-a: Subdomain depth: To report information over DNS, the meaning of the domain names must be easily recognized by the receivers. To this end, the domain separator, i.e., the period ".", is naturally employed to format the domains so that the name strings carry meaningful information, resulting in a deep subdomain level, i.e., a large number of subdomains in one query name.

• F3-b: Number of format fields: Like the period specifying the subdomain level, we also observed that the hyphen "-" is widely used as a field separator to format messages within one subdomain label.

Due to their functional similarity, in this paper we treat both periods (F3-a) and hyphens (F3-b) as equivalent format separators and use the single term "number of format fields" (F3) to represent the number of strings separated by either "." or "-".

• F4: Number of fields with unusual lengths: To represent various pre-defined types of information inserted at specified positions, the lengths of format strings vary widely, and many fields are either unusually long or unusually short. We consider such domains quite hard to reuse. Thus, we define a metric as the sum of the number of long format fields (L-FF) and short format fields (S-FF) within one query name.

Figure 2 shows samples of domains exhibiting the explicit features. Note that these features, e.g., the length of query names, do not necessarily indicate malicious purposes [21]; in fact, most of these domains are benign. However, they do indicate the use of "signaling" and thus imply a high possibility of a "one-time-use" pattern. Accordingly, there should be a strong correlation between each feature and once-used RRs in the cache. By exploiting these features, we can revise the caching policies to proactively prevent RRs that are less likely to be reused from being cached, such that the effectiveness of DNS caching can be significantly improved.

Methodology. For the proposed features, we characterize the properties of re-used and once-used domains, train a classifier to classify the entries, and conduct a trace-driven simulation to validate their efficacy in caching. In the feature validation (§4.2) and classification (§4.3), the analysis relies solely on domain names and implicitly assumes that both the cache size and TTL values are unlimited. This assumption creates an ideal scenario for caching RRs, where cache hits are not limited by the cache size or TTL values. The simulation (§5.2) runs within a resolver program that caches entries according to the classification results and the common practices of modern DNS resolvers.

4. MEASUREMENT ANALYSIS

4.1 Dataset
The datasets used in this study are trace logs of outgoing DNS queries captured at the local DNS servers of the College of William and
Mary (WM) and the University of Delaware (UD) over a period of two weeks. We summarize the datasets in Table 1.

The trace logs from campus vantage points have two limitations. First, the traces may not reflect the dynamics of domain names observed at ISPs' DNS servers. However, given the similar patterns of disposable domains reported for a large-scale ISP dataset in [12], we believe that the proposed policies would also be effective in an ISP's DNS cache, especially given the heavy load on their resolvers. Also, we removed the RR entries that retrieve local domains. Second, the length of our trace logs is limited. We believe, however, that they are still capable of demonstrating the typical cache usage patterns of local resolvers, because the origin TTL values of A/AAAA records are typically shorter than one day [15].

4.2 Feature Validation
Given the heuristics presented in §3, we validate our speculation that these explicit domain name-based features help to improve the effectiveness of DNS caching. From Table 1, we can see that 86.15% and 78.63% of queried domains appeared only once in each dataset, which is consistent with the reported results on identified disposable domains [12]. In our study, however, we do not attempt to achieve high detection accuracy in identifying such a class of domains. Instead, we focus on exploring the efficacy of the proposed features in improving overall cache performance by preventing a large number of once-used records from filling the cache.

To validate each proposed feature based on its factual caching effect, we tentatively derive a threshold to measure the fractions of excluded domains. We then leverage a learning module to train on our datasets and build a classifier that combines the proposed features.

F1: Length of Query Name. Figure 3(a) shows the distributions of the query name length, in which the two classes of domains are clearly distinguishable from one another. The majority of once-used domains have much longer names than re-used domains.

To exclude the useless RRs, we first consider a tentative threshold of 50 bytes of query name length. In the WM dataset, a threshold of 50 bytes would exclude 81.10% of once-used domains and 1.61% of re-used domains, resulting in 69.87% of overall entries being rejected from the cache. By gradually raising the threshold, at a length of 100 bytes we observe that only 0.027% of reusable domains are mistakenly dropped while 44.99% of once-used domains (38.77% of RRs in total) are discarded. Similarly, in the UD dataset, with a length of 100 bytes, 0.019% of re-used entries are mistakenly kept out of the cache while 32.58% of once-used domains (25.53% of RRs in total) are dropped. These results indicate that rejecting domains with long names would significantly reduce the waste of cache space while keeping the cache hit ratio at the same level.

F2: Length of the Longest Subdomain Name. Figure 3(b) demonstrates that long labels are widely adopted in once-used domains. Specifically, in each dataset we identify that 3.03% and 1.51% of re-used entries, as well as 73.40% and 68.05% of once-used entries, include one subdomain longer than 20 bytes, respectively. If we increase the threshold to 30 bytes, the fractions of re-used domains decline to 0.39% and 0.30%, while the fractions of once-used domains remain at 69.86% and 57.22%, respectively. Thus, a subdomain name length greater than 30 bytes strongly indicates that the domain is once-used, with little chance of being a useful entry if cached.

F3: Number of Format Fields. Figure 3(c) presents the distribution of the total number of format fields. It is easy to identify that a threshold of 10 is capable of distinguishing the reusability of each class of domains. Using this threshold, we can exclude 0.59% and 0.79% of re-used entries, and 31.17% and 42.39% of once-used domains, respectively, from each dataset; overall, around 25% and 37% of entries would be discarded from the cache, respectively.

F4: Total Number of L-FF and S-FF. To profile the total number of long format fields (L-FF) and short format fields (S-FF), we first empirically determine the specific lengths that define the L-FF and S-FF. Since most TLDs include one or two fields with two or three characters, we define an S-FF as a field with three or fewer characters.¹ Also, we investigate the distributions of F4 by varying the length of the L-FF, and observe that a length of 10 is sufficient to demonstrate the distinct statistical properties of this feature.

Figure 3(d) shows the distribution of the sum of L-FF (>10 bytes) and S-FF (≤3 bytes). With a clear threshold observed at five, we identify that 0.61% of re-used and 70.03% of once-used domains in WM's dataset (60.33% of RRs in total), and 0.40% of re-used and 74.55% of once-used domains in UD's dataset (63.97% of RRs in total), would be discarded. As a result, this would exclude the majority of the useless entries but have little negative impact on caching reusable domains.

¹A more accurate approach might exclude the TLDs (e.g., .com and .co.uk) and SLDs (second-level domains, such as msn and cnn), since they may be counted as S-FFs. However, we observe that checking the entire domain name already produces effective results, so we choose this simple approach to avoid introducing additional steps to identify the SLDs.

[Figure 3: Distribution of domain name-based features for re-used and once-used domains. Panels: (a) DNS Query-Name Length; (b) Longest-Subdomain Length; (c) Number of Format Fields; (d) Total Number of L-FF and S-FF. Each panel shows CDFs for re-used vs. once-used domains.]

Table 2: Percentage of mis-classified instances

          Decision Tree              Random Forest (ntree=5)
        D        R        O        D        R        O
WM   16.07%   0.26%   13.88%   12.93%   0.19%   11.16%
UD   13.91%   0.98%   14.13%   11.89%   0.34%   12.18%

D: Disposable, R: Re-used, O: Overall

[Figure 4: Training Results (with Decision Tree). Panels: Variable Importance and Primary Split Values for the WM and UD datasets over features F1-F4.]
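The feature definitions and per-feature cut-offs derived in §4.2 can be condensed into a short sketch. This is our illustrative Python rendering, not the authors' code; the helper names and the exact comparison directions at the thresholds are our assumptions:

```python
def domain_features(qname: str) -> dict:
    """Compute the explicit domain name-based features F1-F4 (§3).

    Field-length cut-offs follow §4.2: a field is "short" (S-FF) at
    <= 3 characters and "long" (L-FF) at > 10 characters.
    """
    labels = qname.rstrip(".").split(".")
    # F3 counts the strings separated by either "." or "-" across the name.
    fields = [f for label in labels for f in label.split("-")]
    return {
        "F1": len(qname.rstrip(".")),        # length of query name
        "F2": max(len(l) for l in labels),   # longest subdomain label
        "F3": len(fields),                   # number of format fields
        "F4": sum(1 for f in fields if len(f) <= 3 or len(f) > 10),
    }

def looks_once_used(qname: str) -> bool:
    # Single-feature cut-offs suggested by §4.2 (F1 > 100 bytes, F2 > 30
    # bytes, F3 > 10 fields, F4 at the threshold of five); a name crossing
    # any cut-off is predicted once-used.
    f = domain_features(qname)
    return f["F1"] > 100 or f["F2"] > 30 or f["F3"] > 10 or f["F4"] >= 5

print(domain_features("u-1234567890abcdef.metrics.example.com"))
```

In the paper these cut-offs are only used individually to gauge each feature; the combined decision is left to the trained classifier of §4.3.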
4.3 The Classifier
To validate the efficacy of the combination of proposed features, we train on the datasets with both decision tree and random forest models [5], using the rpart and randomForest packages in R, respectively. Note that we leveraged class weights (i.e., the parms parameter in rpart and the classwt parameter in randomForest) to handle the unbalanced class sizes in our datasets.

Ground truth. Since it is (almost) impossible to obtain ground truth that identifies the "disposable domains," we label the once-appearing domains extracted from the datasets as disposable. As such, our labels correspond to those assigned by an oracle with perfect knowledge. Although domain unpopularity may cause mis-labeling (i.e., some unpopular domains may be mis-labeled as disposable), our labels are an acceptable approximation of the ground truth in practice, especially given the thousands of users on each campus network. Moreover, mis-labeling a rarely re-used domain as disposable would have a marginal impact on practical caching performance, as such an entry is likely to be evicted before reappearing.

Evaluation of the classifiers. Each dataset is divided into mutually exclusive training and testing partitions, where 66% of the dataset is used for training and the rest for testing. With the random forest model, we observe no further benefit when the number of constructed trees exceeds five. Table 2 lists the percentage of incorrectly classified instances using the combination of all features.² Note that we aim to improve caching effectiveness in two respects: (1) effectively reject the useless entries, and (2) minimize the negative impact on the reusable ones. The results in Table 2 demonstrate that we achieve both goals in the classification process. They also indicate that, although a simple decision tree tends to overfit the training set, it is capable of producing accurate results when applied to the classifier constructed from the combination of proposed features. More specifically, 85% to 88% of once-used RRs are correctly labeled and expelled from the cache, while only 0.2% to 1% of re-used RRs are incorrectly classified in the WM and UD datasets, respectively.

The unbalanced (but expected and positive) results can be explained by the observation that the re-used entries have more consistent and concentrated feature distributions, while the features extracted from once-used entries exhibit more diffuse distributions. Figure 4 shows the variable importance and the primary split values from the decision tree training on each dataset, which illustrates that (1) all the features play important roles in the classification (the importance index varies from 21 to 31), and (2) although the primary split values are more aggressive than the thresholds derived from any single feature (§4.2), we can further lower the error rates by using the combination of features.

5. TRACE-DRIVEN SIMULATION
In §4, we demonstrated that the explicit domain name-based features are useful for inferring the reusability of RRs. In practice, however, resolvers behave slightly differently due to the presence of TTLs. Re-used entries may still cause cache misses once the cached RRs have expired. Meanwhile, some mis-classified reusable entries may not affect caching performance, since many of them have a low probability of being retrieved again within the TTL. In this section, we apply the classifier of §4.3 with the combination of proposed features to conduct a trace-based simulation³ to evaluate the effectiveness of the proposed policies.

²We explored different combinations of feature sets and found that using all features in the classification achieves the minimum error rate.
³We only perform the simulation with WM's trace, since the actual domains have been anonymized in UD's trace.
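The role of class weights in the tree training above can be illustrated with a deliberately tiny stand-in. This is not the rpart/randomForest procedure the paper uses; it is a toy weighted split search on one feature, with invented data, that only shows how weighting errors on the (rare) re-used class more heavily steers the chosen split:

```python
def best_split(values, labels, w_disposable=1.0, w_reused=1.0):
    """Pick the threshold on one feature minimizing weighted error.

    labels: 1 = disposable (once-used), 0 = re-used. Entries with
    value > threshold are predicted disposable.
    """
    best = (None, float("inf"))
    for t in sorted(set(values)):
        err = 0.0
        for v, y in zip(values, labels):
            pred = 1 if v > t else 0
            if pred != y:
                # class weights penalize errors on the two classes differently
                err += w_reused if y == 0 else w_disposable
        if err < best[1]:
            best = (t, err)
    return best

# F1 (query-name length) for six hypothetical domains:
# three re-used (short), three once-used (long)
lengths = [12, 18, 25, 110, 140, 200]
labels = [0, 0, 0, 1, 1, 1]
threshold, err = best_split(lengths, labels, w_disposable=1.0, w_reused=5.0)
print(threshold, err)
```

Raising `w_reused` makes mistakenly discarding a reusable entry costly, which is exactly the asymmetry the paper targets: rejecting useless entries while keeping the damage to reusable ones below 1%.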
[Figure 5: Distribution of cache hit rates, (a) with FIFO and (b) with LRU. The X-axis represents the number of entries read into the resolver.]

We implemented the proposed caching policies in a simulated resolver program modified from djbdns [2], in which the caching decisions employ the classification results from §4.3. Our resolver program follows the standard TTL model, i.e., it does not assign a default minimum TTL value. The duration of negative caching is subject to the TTL values of the SOA records [4]. Moreover, we do not set a cleaning interval to periodically expel stale records, in line with the sophisticated memory management of modern resolvers [3]. Only when hitting the cache limit are some entries prematurely evicted from the cache (e.g., using the LRU replacement policy).

Types of RRs. We first study the caching properties of different types of RRs. In particular, we examine whether they need discriminative consideration when cached in resolvers. Table 3 lists the breakdown of the types of queries. SOAs are treated the same as A/AAAAs, i.e., such RRs are cached according to the proposed policies in §4. Other unspecified types of RRs are not studied in detail because of their small query volumes.

• TXT records: We identify that only 0.01% of TXT records have been reused, and we indeed observe that TXTs are being used as an information channel. We observed similar distributions of the proposed features in TXTs, and thus apply the proposed policies when caching TXTs.

• Reverse lookup queries (PTRs): We do not apply the modifications to PTRs, since they are rarely misused and we do not observe the studied features taking effect on PTRs.

• Service records (SRVs): We identify that most SRVs (97.46%) involve local queries, which were removed in our study (§4.1). As with PTRs, we also observe that the studied features have no impact on SRVs' caching effectiveness, and thus we do not apply the policies to SRVs.

• NS records: Caching NS records can significantly enhance the efficiency of DNS and reduce the load on name servers [17]. Also, the number of NS records is much smaller than that of the other types of RRs above. Thus, there is no need to apply the policies to NS records. In fact, no NS records would be excluded from caching even if the proposed policies were applied.

5.2 Results
We now evaluate the effectiveness of the proposed policies given a fixed cache memory allocation. To simplify the assessment, we define the cache allocation by the number of RRs. We feed the RRs into the cache and then examine the cache hit rate, calculated as the ratio between the number of cache hits and the total number of retrieved RRs.

We need to determine how many RRs should be cached to represent a realistic scenario for the evaluation of our proposed policies. Jung et al. [17] identified that the DNS cache hit ratio is between 80% and 87%. Thus, we choose as our cache size the number of cached RRs that achieves a similar cache hit ratio. To this end, given the moderate size of our dataset and a FIFO replacement policy, i.e., simply removing the oldest entry when the cache runs out of space, we observe that a size of 100,000 entries yields a cache hit rate of about 86%. Therefore, we set the cache size to 100,000 RRs in our simulations. Note that this setting is derived from local campus networks; in ISPs' DNS servers, a larger cache size would be needed to handle the larger number of DNS queries.

FIFO. We first evaluate the proposed policies with the FIFO caching scheme, which is still widely used in popular DNS resolvers such as djbdns [2]. Figure 5(a) presents the measured cache hit rates under FIFO, with and without the proposed policies, respectively. We observe that the modified caching policies can improve the cache hit rate by about 8% with a cache size of 100,000 entries.

Pseudo-LRU. We then evaluate the proposed policies with a simplified pseudo-LRU that leverages one bit per entry to store the cache status (i.e., the MRU bit). When a cache hit occurs for an entry, its cache bit is set to 1. When the cache is full, the oldest entry with cache bit 0 is evicted. When the cache bits of all cached entries have been set to 1, all the bits are cleared to 0 except for the last one. Figure 5(b) shows the measured cache hit rates with the LRU replacement scheme. We observe that the proposed caching modifications improve the cache hit rate by about 7%. Compared with FIFO, LRU without our policies increases the cache hit rate by about 2%. With the proposed modifications, both FIFO and LRU raise the cache hit rate to 92%-93%.

5.3 Discussion
TTL values. Low TTL values have been observed in both malicious domains [10] and disposable domains [12]. However, domain owners are free to set the TTLs and have been switching to larger TTL values (most of them have a TTL of 300s [12]), resulting in a longer duration during which the useless entries are regarded as valid in the cache. Meanwhile, since modern resolvers have made the cleaning interval obsolete [3], caching the once-used domains even for a short time still degrades performance, because the cache fills with such useless entries and no space is left for caching useful ones. Thus, TTL may not be a reliable indicator of caching effectiveness, and we do not consider it a metric for quantifying caching behavior.

Counteraction. One may be concerned that domain owners could circumvent our policies by changing the structure of their domain names. However, we believe the modifications will not provoke them to seek more sophisticated approaches: those "one-time-use" domains have already accomplished their communication mission, and the developers using such approaches would not care whether their DNS responses are cached.

6. RELATED WORK
DNS caching and TTL characterization. Pang et al. [20] presented a comprehensive study of DNS, and [19] observed that a significant fraction of web clients and LDNSes do not honor DNS TTLs. Callahan et al. [11] passively monitored DNS traffic within a residential network to profile modern DNS behaviors and properties. They also observed that web browsers do not adhere to the given TTLs, and that CDNs tend to shape traffic with shorter TTLs.
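The one-bit pseudo-LRU replacement described in §5.2 can be sketched as follows. This is our illustrative Python rendering, not the simulator's code; in particular, we interpret "the last one" that keeps its bit as the most recently touched entry, which is an assumption:

```python
from collections import OrderedDict

class PseudoLRUCache:
    """One-bit pseudo-LRU in the style of §5.2 (illustrative sketch)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = OrderedDict()   # key -> MRU bit, in insertion order
        self.last_touched = None

    def get(self, key):
        if key in self.entries:
            self.entries[key] = 1      # cache hit: set the MRU bit
            self.last_touched = key
            return True
        return False                   # cache miss

    def put(self, key):
        if key in self.entries:
            return
        if len(self.entries) >= self.capacity:
            # evict the oldest entry whose MRU bit is 0
            victim = next((k for k, bit in self.entries.items() if bit == 0), None)
            if victim is None:
                # all bits set: clear every bit except the last-touched
                # entry's (our reading of "except for the last one")
                for k in self.entries:
                    if k != self.last_touched:
                        self.entries[k] = 0
                victim = next(k for k, bit in self.entries.items() if bit == 0)
            del self.entries[victim]
        self.entries[key] = 0          # new entries start with bit 0
```

The scheme approximates LRU with one bit of state per entry instead of a recency list, which matches the paper's goal of keeping the replacement machinery cheap while still favoring recently hit RRs.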