On The Effectiveness of Using State-Of-The-Art Machine - 2012
ABSTRACT
Cryptographic distinguishing attacks, in which the attacker is able to extract enough “information” from an encrypted message to distinguish it from a piece of random data, allow for powerful cryptanalysis both in theory and in practice. In this paper, we report our experience of applying state-of-the-art machine learning techniques to launch cryptographic distinguishing attacks on several public datasets. We try several kinds of existing and new features on these datasets and find that the ciphers’ “modes of operation” dominate the performance of classification tasks. When CBC mode is used with a random initial vector for each plaintext, the performance is extremely bad, while the performance for certain datasets is relatively good when ECB mode is used. We conclude that, contrary to the findings of several existing works, state-of-the-art machine learning techniques cannot extract useful information from ciphertexts produced by modern ciphers operating in a reasonably secure mode such as CBC, let alone distinguish them from random data.

Categories and Subject Descriptors
D.4.6 [Security and Protection]; I.2.1 [Applications and Expert Systems]; I.5.4 [Applications]; K.4.1 [Public Policy Issues]: Abuse and crime involving computers

Keywords
Computer Forensics, Cryptographic Distinguishing Attacks, Identification of Encryption Algorithm, Machine Learning

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
AISec’12, October 19, 2012, Raleigh, North Carolina, USA.
Copyright 2012 ACM 978-1-4503-1664-4/12/10...$15.00.

1. INTRODUCTION
In cryptography, if an attacker can extract enough information from a ciphertext to distinguish it from random data, then we say that he or she succeeds in launching a distinguishing attack. Such an attack might seem innocuous at first glance, but it can actually lead to several powerful cryptanalytic attacks. For example, Mantin and Shamir gave a classical example of such amplification [1]. More recently, Albrecht, Paterson, and Watson gave another example in which they succeeded in attacking one of the most widely used Internet security software packages, OpenSSH, by turning distinguishing attacks into plaintext-recovery attacks [2]. Distinguishing attacks have therefore played an important role in modeling cryptographic ciphers, and many cryptographers believe that it is computationally infeasible to launch distinguishing attacks against reasonably secure ciphers such as DES and AES.

In this paper, we focus on an important, albeit slightly easier, task in cryptanalysis: identification of the encryption algorithm. It is easier in the sense that we don’t need to get too involved in what random data is from a technical or philosophical viewpoint. Furthermore, such a task can be important in scenarios like digital forensics, where only the evidence from computer media is available. In these cases, we don’t even know which cipher was used to encrypt the messages, whereas in textbook cryptanalysis scenarios the encryption algorithm is always given. In order to recover useful information without using any meta-data, a technique for identifying the encryption method is needed.

Overall, this problem has not been investigated much in the literature. Furthermore, the few papers that have paid some attention to it almost all use a set of similar features and claim some success for ciphers operating in simple modes. In this paper, we compare the performance of existing features in different scenarios and show that the classification accuracy can differ significantly when different modes of cipher operation are used. Without loss of generality, we only consider binary-class cases, as multi-class tasks can be handled by extending the approaches used in binary cases.

We design different scenarios by introducing different modes of operation in the encryption process. A mode of operation is a procedure that repeatedly uses a block cipher with a fixed key to encrypt a message whose length is larger than one block. The simplest one is electronic codebook (ECB) mode. In ECB mode, a message is divided into several blocks, and each block is encrypted independently. The advantage is speed, because different blocks can be encrypted in parallel. However, such a mode doesn’t provide semantic security, as the same plaintext block always encrypts to the same ciphertext block. The cipher-block chaining (CBC) mode is the most commonly used one. In CBC mode, the message is also divided into blocks, but before each block is encrypted, the plaintext is XORed with the ciphertext of the previous block. For the first block, an initialization
vector (IV) is XORed with the plaintext instead. Thus each ciphertext block depends on all plaintext blocks processed up to that point.

2. RELATED WORK
Genetic algorithm based methods are widely used to recover secret keys in encryption algorithms, such as substitution ciphers [3], transposition ciphers [4], knapsack ciphers [5], and Feistel ciphers [6], by localized searching in the key space. Neural networks are also used to break cryptosystems [7][8]. As detailed below, there are already some existing works on cipher classification based on statistical techniques.

Pooja worked on the classification of classical ciphers [9], including substitution ciphers, permutation ciphers, polyalphabetic ciphers, and a combination of permutation and substitution ciphers. Several cost functions are proposed to distinguish classical ciphers by sorted or unsorted frequency of letters. An expected frequency of letters is also required, which is drawn from common English texts.

Some early work on classifying modern ciphers was done by Chandra [10], who combined several decision logics. Dileep [11] proposed to use a support vector machine (SVM) and a bag-of-words model for identification of block ciphers, building a common or class-specific dictionary of (1) fixed-length words and (2) variable-length words. Saxena proposed to use linear programming on segments of ciphertexts to generate many test vectors [12] and to use an SVM to find good test vectors. Sharif used a number of classifiers on 8-bit histogram features for identification of encryption methods and reported that random forests outperform all other classifiers [13]. Manjula proposed to use several features such as entropy, correlation coefficient of uppercase letters, and size of files to identify encryption algorithms with a decision tree [14].

As we will demonstrate in the rest of this paper, almost none of these related works is effective against a reasonably secure cipher operating in CBC mode. We will also explain why they seemingly work in their reports and suggest what should be done in future research in this direction.

[...] JPEG format. We note that the category information is not used in the experiments below. For audio files, the MajorMiner dataset is used [17], which contains 2174 audio files in WAVE format.

The experiments are divided into two parts. In the first part, we build one instance from one ciphertext. That is, we extract the features of one instance from only one ciphertext sequence. To eliminate the effect of class imbalance, we only use 1000 ciphertexts for each class. The rule is very simple: for Reuters21578 and MajorMiner, we choose the largest 1000 documents. For the Caltech101 dataset, we choose the largest 1000 images in the “motorcycle” and “airplane” categories. In the second part, each instance is built to contain multiple ciphertexts, as we want to see if machine-learning algorithms can perform better by using more types of information, e.g., positions in ciphertext sequences. Each ciphertext is generated by randomly picking a plaintext from the dataset (with replacement); a random IV is also picked if CBC mode is used. The block ciphers used below are the Data Encryption Standard (DES) and the Advanced Encryption Standard (AES), where the 128-bit version of AES is used. Results for the stream cipher RC4 are also included. In all experiments, a fixed random key is used for each cipher. In each experiment, the dataset is divided into 5 parts, and we repeatedly use four of them as the training data and the remaining one as the testing data. We use cross-validation to find the best linear solver and parameters for each part, and the final results are the average over the 5 testing parts.

The main classifiers used are linear solvers in LIBLINEAR [18], including L2-regularized L2-loss support vector classification (dual), L2-regularized L1-loss support vector classification (dual), L1-regularized L2-loss support vector classification, and L2-regularized logistic regression (dual). The linear classifiers are very fast and suitable for the bag-of-words model. For some experiments with a small number of features, an SVM with Gaussian kernel is also used via LIBSVM [19].

We use OpenSSL as our encryption tool, which is open-source and was originally designed for the SSL/TLS protocol implementation. The random IVs are generated by the Mersenne twister, a sophisticated pseudo-random number generator [20].
[...] is not useful in our datasets because each word appears at nearly the same frequency.

Table 1. The list of features used in experiments

Feature                                                    Dimension          Notation
Entropy (1 symbol = 16 bits)                               1                  ENT1
Entropy (1 symbol = 12 bits)                               1                  ENT2
Number of 16-bit symbols                                   1                  NSYM1
Number of 12-bit symbols                                   1                  NSYM2
16-bit histogram                                           65536              HIST
XORed with previous 16 bits and build 16-bit histogram     65536              XOR1
XORed with previous 64 bits and build 16-bit histogram     65536              XOR2
XORed with previous 128 bits and build 16-bit histogram    65536              XOR3
Varying length words                                       Varies with data   VLW
Distribution of intervals between 0x00                     Varies with data   INT
Ratio of zero in i-th byte, i=1…128                        128                ZRO_RATIO
Entropy of the i-th byte, i=1…128                          128                ENT_BYTE

The tenth feature is inspired by the varying length words representation. The idea is to use only one delimiter, so that we can record the length of the interval between two delimiters.

3.3 Experiment Result
Table 2 shows the results of the entropy-related features proposed by Manjula, in which results labeled with RBF are obtained using an SVM with Gaussian kernel. Only the Reuters21578 dataset can be partially classified with just 4 features in ECB mode. We believe the main reason is that the block sizes of AES and DES are not equal, and naturally the ciphertexts produced by AES tend to have higher entropy because it uses larger blocks.

Besides, the content or size of plaintexts may implicitly affect the entropy. For example, some of the documents in Reuters21578 have similar titles (No. 15871 and No. 15875), and some of the images in Caltech101 also have the same headers because their resolution is the same. For WAVE files, the results are not as strong. Our reasoning goes as follows. Assume two plaintext messages share the same first block, but all other bits are totally different and random. Then the entropy should increase and approach its maximum as the message size increases, resulting in poorer performance in classifying larger WAVE files.

Table 3 shows the results of histogram-related features. The cipher used can be identified in all 3 datasets in ECB mode. This is consistent with the results obtained in Dileep’s and Sharif’s works. However, if CBC mode is used, and if different IVs are used to produce ciphertexts, then the resulting accuracy becomes close to 50%, i.e., no better than coin flipping. This is because CBC mode can eliminate repeated patterns in ciphertexts. Besides, in the three bottommost rows, we try all 3 datasets with the same cipher but different modes of operation as labeled. Two of them can be classified with 100% accuracy, while image data has only 67.05% accuracy. There are two possible reasons. (1) A JPEG image consists of multiple segments, each of which begins with a marker (footnote 2). Hence, the position of a given marker may vary between files. (2) JPEG is a compressed format, which has higher entropy than uncompressed formats like text files. Nevertheless, the overall results of classification based on modes of operation are still quite acceptable.

Footnote 2: http://class.ee.iastate.edu/ee528/Reading%20material/JPEG_File_Format.pdf

We also try the varying length words feature (in Table 4), originally introduced by Dileep. The dictionary is built directly from the instances we used. In total, 949540 words are found in Reuters21578 and 3449174 words are found in Caltech101, but this feature still does not help in CBC mode. As AES has passed some standard NIST randomness tests [21], we further propose several randomness-related features not included in the NIST tests. The classification results are in Table 5, which shows that the accuracy is still around 50%. Therefore, the existing features do not seem to be effective in this scenario.

The results for the case in which an instance contains multiple ciphertexts are listed in Table 6. The term “bagsize” refers to the number of ciphertexts included in one instance. The table shows that the accuracy stays around 50% as the bag size increases.

Table 2. Classification results of entropy-related features

Datasets       Ciphers       Features                Modes of operation   Accuracy   Accuracy (RBF)
Reuters21578   AES vs. DES   ENT1+ENT2+NSYM1+NSYM2   ECB                  74.10%     80.20%
Reuters21578   AES vs. DES   ENT1+ENT2+NSYM1+NSYM2   CBC                  49.3%      48.00%
Caltech101     AES vs. DES   ENT1+ENT2+NSYM1+NSYM2   ECB                  51.45%     53.94%
Caltech101     AES vs. DES   ENT1+ENT2+NSYM1+NSYM2   CBC                  50.05%     48.49%
MajorMiner     AES vs. DES   ENT1+ENT2+NSYM1+NSYM2   ECB                  50%        49.80%
MajorMiner     AES vs. DES   ENT1+ENT2+NSYM1+NSYM2   CBC                  50%        49.65%
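The entropy-related and histogram features of Table 1 (ENT1, ENT2, NSYM1, NSYM2, HIST) can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes that symbols are consecutive, non-overlapping, big-endian bit groups, that "number of symbols" means the number of distinct symbols observed, and that the histogram holds raw counts. The function names `symbols`, `entropy`, `histogram16`, and `scalar_features` are hypothetical.

```python
import math
from collections import Counter


def symbols(data: bytes, bits: int) -> list:
    """Split data into consecutive non-overlapping symbols of `bits` bits.

    Assumption: big-endian bit order; trailing bits that do not fill a
    whole symbol are dropped.
    """
    out, acc, nbits = [], 0, 0
    mask = (1 << bits) - 1
    for byte in data:
        acc = (acc << 8) | byte
        nbits += 8
        while nbits >= bits:
            nbits -= bits
            out.append((acc >> nbits) & mask)
        acc &= (1 << nbits) - 1  # keep only the bits not yet consumed
    return out


def entropy(syms: list) -> float:
    """Shannon entropy, in bits per symbol, of the empirical distribution."""
    n = len(syms)
    if n == 0:
        return 0.0
    return sum(c / n * math.log2(n / c) for c in Counter(syms).values())


def histogram16(data: bytes) -> list:
    """HIST: 65536-dimensional count vector over 16-bit symbols."""
    hist = [0] * 65536
    for s in symbols(data, 16):
        hist[s] += 1
    return hist


def scalar_features(data: bytes) -> dict:
    """ENT1, ENT2, NSYM1, NSYM2 for one ciphertext."""
    s16, s12 = symbols(data, 16), symbols(data, 12)
    return {
        "ENT1": entropy(s16),
        "ENT2": entropy(s12),
        "NSYM1": len(set(s16)),  # assumption: count of distinct symbols
        "NSYM2": len(set(s12)),
    }


if __name__ == "__main__":
    sample = bytes([0, 1, 0, 1, 0, 2, 0, 2])  # 16-bit symbols: 1, 1, 2, 2
    print(scalar_features(sample))
```

A ciphertext with many repeated blocks, as ECB produces from repetitive plaintext, yields lower entropy and fewer distinct symbols than one that looks uniformly random, which is why these features can carry some signal in ECB mode and essentially none in CBC mode.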
Table 3. Classification results of histogram-related features

Datasets       Ciphers       Modes of operation   Accuracy
Reuters21578   AES vs. DES   ECB                  100%
Reuters21578   AES vs. DES   CBC                  51.05%
Caltech101     AES vs. DES   ECB                  100%
Caltech101     AES vs. DES   CBC                  49.95%
MajorMiner     AES vs. DES   ECB                  100%
MajorMiner     AES vs. DES   CBC                  50%
Reuters21578   AES           CBC vs. ECB          100%
Caltech101     AES           CBC vs. ECB          67.05%
MajorMiner     AES           CBC vs. ECB          100%

Table 4. Classification results of varying length words features

Datasets       Ciphers       Features   Modes of operation   Accuracy
Reuters21578   AES vs. DES   VLW        CBC                  49.05%
Caltech101     AES vs. DES   VLW        CBC                  49.55%

Table 5. Classification results of histogram-based features constructed from XORed segments and intervals between the delimiter ‘0x00’

Datasets       Ciphers       Features    Modes of operation   Accuracy
Reuters21578   AES vs. DES   XOR1        CBC                  49.10%
Caltech101     AES vs. DES   XOR1        CBC                  49.15%
MajorMiner     AES vs. DES   XOR1        CBC                  50%
Reuters21578   AES vs. DES   XOR2+XOR3   CBC                  51.05%
Caltech101     AES vs. DES   XOR2+XOR3   CBC                  49.45%
MajorMiner     AES vs. DES   XOR2+XOR3   CBC                  50%
Reuters21578   AES vs. DES   INT+XOR1    CBC                  52.55%
Caltech101     AES vs. DES   INT+XOR1    CBC                  48.90%

Table 6. Classification results using multiple ciphertexts encrypted in CBC mode

Datasets       Ciphers       Features               Bagsize   Accuracy
Reuters21578   AES vs. DES   ZRO_RATIO + ENT_BYTE   100       48.10%
Reuters21578   AES vs. DES   ZRO_RATIO + ENT_BYTE   200       50%
Caltech101     AES vs. DES   ZRO_RATIO + ENT_BYTE   100       49.35%
Caltech101     AES vs. DES   ZRO_RATIO + ENT_BYTE   200       50.25%
MajorMiner     AES vs. DES   ZRO_RATIO + ENT_BYTE   100       49.55%
MajorMiner     AES vs. DES   ZRO_RATIO + ENT_BYTE   200       50%
Reuters21578   AES vs. RC4   ZRO_RATIO + ENT_BYTE   100       49.90%
Reuters21578   AES vs. RC4   ZRO_RATIO + ENT_BYTE   200       50%
Caltech101     AES vs. RC4   ZRO_RATIO + ENT_BYTE   100       49.30%
Caltech101     AES vs. RC4   ZRO_RATIO + ENT_BYTE   200       50.05%
MajorMiner     AES vs. RC4   ZRO_RATIO + ENT_BYTE   100       50.40%
MajorMiner     AES vs. RC4   ZRO_RATIO + ENT_BYTE   200       50.10%

Even for RC4, which has been shown to have biased outputs in the second byte [1], we still cannot distinguish it from AES, as is evident from the fact that the accuracy is still around 50%. This suggests that more training data or a larger bag size might be required.

4. DISCUSSION AND CONCLUSION
Our experiments show that the difficulty of this task may vary with the type of plaintexts, the size of documents, and the mode of operation used to encrypt. Several existing features are used to predict ciphers when different modes of operation, ciphers, or types of plaintexts are given. We found that the existing features are still not capable of distinguishing encryption algorithms in the scenario in which CBC mode is used with a different IV assigned to each ciphertext. In fact, the random IV is also an important factor in this problem. For example, if only one fixed IV is used for every ciphertext produced with a fixed secret key, then plaintexts with the same header are encrypted in the same manner, and the contents of the first ciphertext block will be the same as well. The classification task would therefore be a little easier. Since IVs are seldom the same in real-world applications, this task remains very hard and challenging today.

Overall, we find that state-of-the-art machine learning techniques are not yet effective for identifying the encryption algorithm used, given only a reasonably large number of sample ciphertexts. Although there have been successful reports in the literature, our experiments show that these works are flawed in the sense that they didn’t consider the CBC mode of operation with a random IV, which is the recommended configuration capable of providing a basic level of security. Perhaps more advanced machine learning techniques could be applied to this problem, but we suggest that researchers use ciphers in CBC or a similar mode with a random IV in the future.

5. ACKNOWLEDGMENTS
This work was supported in part by National Science Council, National Taiwan University and Intel Corporation under Grants NSC 100-2911-I-002-001, and 101R7501.
6. REFERENCES
[1] Itsik Mantin and Adi Shamir (2001). A Practical Attack on Broadcast RC4. FSE, pp. 152–164.
[2] Martin R. Albrecht, Kenneth G. Paterson, and Gaven J. Watson (2009). Plaintext Recovery Attacks against SSH. IEEE Symposium on Security and Privacy, pp. 16–26.
[3] R. Spillman, M. Janssen, B. Nelson, and M. Kepner (1993). Use of a genetic algorithm in the cryptanalysis of simple substitution ciphers. Cryptologia, vol. 17, no. 1, pp. 31–44.
[4] R. A. J. Matthews (1993). The use of genetic algorithms in the cryptanalysis. Cryptologia, vol. 17, no. 4, pp. 187–201.
[5] R. Spillman (1993). Cryptanalysis of knapsack ciphers using genetic algorithms. Cryptologia, vol. 17, no. 4, pp. 367–377.
[6] A. M. B. Albassal and A-M. A. Wahdan (2004). Genetic algorithm cryptanalysis of a Feistel type block cipher. In proceedings of IEEE International Conference on Electrical, Electronic and Computer Engineering (ICEEC’04), pp. 217–221.
[7] Z. Ramzan (1998). On using neural networks to break cryptosystems. Technical report, Laboratory of Computer Science, Massachusetts Institute of Technology. Cambridge, MA 02139.
[8] A. M. B. Albassal and A-M. A. Wahdan (2004). Neural network based cryptanalysis of a Feistel type block cipher. In proceedings of IEEE International Conference on Electrical, Electronic and Computer Engineering, ICEEC’04, pp. 231–237.
[9] Pooja Maheswari (2001). Classification of ciphers. M. Tech Thesis. Department of Computer Science and Engineering, Indian Institute of Technology. Kanpur.
[10] Girish Chandra. The classification of modern ciphers (2001). M. Tech Thesis. Department of Computer Science and Engineering, Indian Institute of Technology, Kanpur.
[11] A. Dileep and C. Chandra Sekhar (2006). Identification of Block Ciphers using Support Vector Machines. International Joint Conference on Neural Networks Vancouver, Canada, pp. 2696-2701.
[12] G. Saxena (2008). Classification of Ciphers using Machine Learning. Master’s thesis, Department of Computer Science and Engineering, Indian Institute of Technology. Kanpur.
[13] Suhaila O. Sharif, L.I. Kuncheva, S.P. Mansoor (2010). Classifying encryption algorithms using pattern recognition techniques. Information Theory and Information Security (ICITIS), 2010 IEEE International Conference on, pp. 1168-1172. doi: 10.1109/ICITIS.2010.5689769
[14] R. Manjula and R. Anitha (2011). Identification of Encryption Algorithm Using Decision Tree. Advanced Computing Communications in Computer and Information Science 2011 Volume 133, Part 3, 237-246.
[15] Lewis, D. D (1996). Reuters-21578 Text Categorization Test Collection Distribution. In AT&T Labs – Research.
[16] Li Fei-Fei, Rob Fergus, and Pietro Perona (2007). Learning generative visual models from few training examples: An incremental Bayesian approach tested on 101 object categories. Computer Vision and Image Understanding. 106, 1 (April 2007), 59-70. DOI=10.1016/j.cviu.2005.09.012 http://dx.doi.org/10.1016/j.cviu.2005.09.012
[17] M. I. Mandel and D. P. W. Ellis (2008). A web-based game for collecting music metadata. Journal of New Music Research, vol. 37, no. 2, pp. 151–165.
[18] R.-E. Fan, K.-W. Chang, C.-J. Hsieh, X.-R. Wang, and C.-J. Lin (2008). LIBLINEAR: A library for large linear classification. Journal of Machine Learning Research 9(2008), 1871-1874.
[19] C.-C. Chang and C.-J. Lin (2011). LIBSVM: a library for support vector machines. ACM Transactions on Intelligent Systems and Technology. 2, 3, Article 27 (May 2011), 27 pages. DOI=10.1145/1961189.1961199 http://doi.acm.org/10.1145/1961189.1961199
[20] M. Matsumoto, T. Nishimura (1998). Mersenne twister: a 623-dimensionally equidistributed uniform pseudo-random number generator. ACM Transactions on Modeling and Computer Simulation 8 (1): 3–30.
[21] Soto J. (1999). Randomness testing of the AES candidate algorithms, NIST IR 6390.