Proceedings of 2014 Zone 1 Conference of the American Society for Engineering Education (ASEE Zone 1)

Steganography in Text by Using MS Word Symbols
Ammar Odeh, Khaled Elleithy, Miad Faezipour
Computer Science & Engineering
University of Bridgeport
Bridgeport, CT 06604, USA
[email protected], [email protected], [email protected]

978-1-4799-5233-5/14/$31.00 ©2014 IEEE

Abstract – The massive amount of data transferred over the internet raises different challenges such as channel types, transmission time and data security. In this paper, we present a novel secure algorithm to hide data inside document files, where four symbols are used to embed the data inside the carrier file. The main process depends on a key to produce a symbol table and match the data to be hidden with the representative symbols. This method can be extended to any language and does not change the file format. In addition, the capacity ratio of the presented algorithm is high compared to other algorithms.

Keywords: Carrier file, Zero width character, Information Hiding, Stego Key.

I. INTRODUCTION

A. Background

Different strategies are used to protect transmitted data from eavesdroppers. Traditionally, cryptography is used, which is defined as data protection by converting a readable message into cipher form, preventing any intermediate party from reading the original message [1]. Cryptography may face brute force attacks that analyze the encrypted message and recover the secret information [2]. Alternatively, other approaches hide the secret message inside a public carrier file, manipulating the carrier to insert the message [3]. In this regard, steganography attempts to avoid any suspicion by giving users no reason to analyze the file. Thus everyone can read the carrier file but only authorized users can extract the hidden data.

Information hiding mainly consists of two branches, Digital Watermarking and Steganography. Steganography is the art of sending invisible messages. The word Steganography is derived from Greek words; “Stego” means “cover” and “Grapha” means “writing” [4]. The earliest recorded stories about steganography date back to 440 BC. One story says that the Greeks shaved a prisoner's head and wrote a secret message on his scalp. When his hair grew back, the king would send him to the other side, where no one could read that message [5]. Other famous stories indicate that wooden tablets were used to write secret messages, which were then covered by wax. The covered tablet was then sent to the receiver, who would remove the wax and read the hidden message [6].

B. Motivation

Nowadays Steganography uses digital media to cover the secret message. Stego carrier files are classified into four categories as shown in Figure 1.

Figure 1. Steganography Carrier files (Carrier file: Image, Audio, Text, Video)

Image represents a popular carrier for secret messages, especially in RGB format. The image can be changed through the least significant bit in each pixel to substitute secret data on it [7]. Other algorithms use audio files as carriers by using frequency domain control limits for upper and lower frequencies. Video represents merging image and audio properties to hide data [8].

Text represents the hardest carrier file in which to hide data, since it contains little redundant data compared to image and audio files [9, 10]. On the other hand, some text steganography algorithms depend on language properties, which could restrict their application to those specific languages. Text steganography can be divided into 3 categories:

1. Format Based: by changing the format of the carrier file, we can pass our secret message. Format strategies depend on language properties. Thus, some algorithms can be applied to specific languages and cannot be applied to other languages. Some methods are generic enough to be applied to any text regardless of the carrier file language [11].
2. Random and Statistical Generation Methods: in this strategy, a cover text is generated depending on the statistical properties of the language. Probabilistic context-free grammar (PCFG) is the most common strategy used to produce the cover file. Other strategies employ word statistics such as letter frequency and word length [10, 12].
3. Linguistic Methods: these methods can be divided into two groups. The first group is the syntactic methods, which depend on punctuation signs to hide the data. The second group creates a synonym dictionary and replaces words in the carrier file with synonyms to pass the hidden bits [12, 13].

C. Main Contribution and Paper Organization

A novel text steganography algorithm is presented in this paper. The main idea is to use Word symbols that enable us to hide 4 bits and avoid intruder suspicions.

The rest of this paper is organized as follows. In Section II we discuss previous text Steganography techniques. The symbols insertion algorithm is discussed in Section III. Section IV discusses and analyzes the presented algorithm. Finally, conclusions are offered in Section V.

II. PRIOR WORK

Word Synonym [14-16] is classified as one of the semantic steganography methods. This method focuses on replacing some of the words by their synonyms. In this technique, the hidden data is transmitted without appearing suspicious to attackers. However, the amount of hidden data is small compared to other methods, and the substitution could change the sentence meaning.

Another method uses punctuation marks like (.) and (;) to represent hidden text. For example, "NY, CT, and NJ" is similar to "NY, CT and NJ", where the presence or absence of the extra comma represents 1 or 0. The amount of hidden data in this method is very small in comparison to the amount of the cover media. Inconsistent use of punctuation might be noticeable from a Stegoanalysis perspective [16].

Line shifting involves vertically shifting a line a little to hide information, creating a unique shape of the text. Unfortunately, line shifting can be detected by character recognition programs. Moreover, retyping the document will remove all the hidden data [14].

Two other Text Steganography algorithms were introduced in [17]. In the first, space characters are added after words to encode two bits: depending on the number of letters in a word and the number of space characters after that word, one of the values in the set {00, 01, 10, 11} is passed. The second suggests a new spacing method, where single spaces are used to pass 0 and double spaces are used to pass 1. Both methods have a problem, since a word processor can highlight the additional spaces.

In [18], a new method was introduced to hide data inside Telugu text by horizontally shifting inherent vowel signs. The main advantage of this method is that a large amount of data can be hidden inside the text file. Another algorithm was introduced in [19] by merging three languages: Chinese, Arabic, and English. In this approach, the authors create two tables; the first one is used for storing Arabic Diacritics and the other is used for storing English letters. By translating Chinese text into English sentences, each English letter corresponds to two Arabic Diacritics. Then, an Arabic text is created which contains the selected Diacritics.

III. PROPOSED ALGORITHM

The algorithm presented in this paper hides data inside a Word file without introducing any changes in file properties like file size, content and format. The proposed algorithm employs some invisible symbols to hide four bits between letters, which improves the hidden capacity ratio compared to other algorithms. Moreover, no changes in the word format or letter shape are made. Furthermore, the suggested algorithm avoids suspicion and detection by a stegoanalyzer, which in turn improves the algorithm's robustness. Inserting one of the table's symbol variations after each letter enables us to hide four bits. Mainly, we use Right remark (U+200E), Left remark (U+200F), Zero width joiner (U+200D), and Zero width non-joiner (U+200C), embedding any of these symbols in the Steganography carrier file data.

TABLE I. SAMPLE OF HIDDEN BITS BY USING WORD SYMBOLS

Right Remark  Left Remark  ZWJ  ZWNJ  Hidden code
X X X X  0000
X X X  0001
X X  0101
X X  0011
X  0111
X X X X  1111
X X X  1101

In Table I we present some of the hidden codes obtained when inserting the Word symbols. For instance, if we insert all four symbols then the passed bit code is 0000. In this technique, different variations can be used to represent hidden bits, for a total of 16 different codes.

Figure 2 represents the data hiding steps, which use three inputs: the carrier file, the hidden data, and the Stego key. The main purpose of the Stego key is to change the symbols' bit representation. In other words, 0 represents bit absence while the other state represents a 1. In the next step, a symbols table is created depending on the Stego key; we insert four bits from the hidden data after each letter in the carrier file.

The capacity of the carrier file is computed as follows:

Capacity of carrier file = Number of letters × 4    (1)

So the hidden capacity ratio of our algorithm is:

Capacity Ratio = (Number of letters × 4) / carrier file size    (2)

The receiver can extract the hidden data by reading the carrier file and using the Stego key to build the symbols table. Reading the symbols after each letter and matching them with the symbols table enables the receiver to extract the hidden data.

IV. ANALYTICAL DISCUSSION

Table II shows the capacity of the new algorithm and two other algorithms when applied to different frequently visited web sites. Text steganography was applied to those pages to evaluate the hidden capacity. Figure 3 shows a comparison histogram for the three algorithms.

The algorithm has many advantages over other algorithms. For example, this algorithm can be applied to any language regardless of whether it uses Unicode or ASCII codes, whereas other algorithms such as [11, 18, 19] can be applied to only some Unicode languages. Moreover, there is no need for special software or equipment to hide the data and extract it. The algorithm does not change the file format since the symbols used do not affect the format of the letters. Consequently, this algorithm improves the transparency feature, which is one of the key Steganography objectives.
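
Equations (1) and (2) can be evaluated directly for any carrier text. The sketch below is only an illustration under stated assumptions: the paper does not say how letters are counted or in which unit the carrier file size is measured, so the helper names `capacity_bits` and `capacity_ratio`, the use of `str.isalpha` to count letters, and the choice of UTF-8 bits as the file size are all hypothetical.

```python
def capacity_bits(text: str, bits_per_letter: int = 4) -> int:
    """Equation (1): hidden capacity in bits = number of letters x 4.

    Assumption: a "letter" is any character for which str.isalpha() is true.
    """
    letters = sum(1 for ch in text if ch.isalpha())
    return letters * bits_per_letter


def capacity_ratio(text: str, encoding: str = "utf-8") -> float:
    """Equation (2): hidden capacity divided by the carrier file size.

    Assumption: the carrier file size is the encoded text length in bits.
    """
    size_bits = len(text.encode(encoding)) * 8
    return capacity_bits(text) / size_bits


carrier = "Hello, world"
print(capacity_bits(carrier), "bits can be hidden")
print(f"capacity ratio: {capacity_ratio(carrier):.3f}")
```

Measuring the file size in a different unit (for example kilobytes, as in Table II) only rescales the ratio by a constant factor.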

V. CONCLUSION
Different algorithms have been presented in the literature to hide data inside text files. Some of these methods were designed for specific languages, while others are generic and can be applied to any language. In this paper, we introduced a novel algorithm that can be used to hide data inside document files of any language by using Word symbols. Our technique employs four symbols (Right Remark, Left Remark, ZWJ, and ZWNJ), which can be used in any language and at any position in a word. These symbols enable the user to pass 4 bits between any two letters. In addition, the algorithm has been enhanced by using a Stego key to create symbols table representation

Figure 2. Data Hiding Algorithm
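
The hiding and extraction steps described in Section III can be sketched in a few lines of Python. This is a minimal illustration and not the authors' implementation: the paper does not specify how the Stego key produces the symbols table, so the sketch assumes a hypothetical key schedule in which the key seeds a shuffle of the 16 presence/absence patterns of the four symbols. The function names, the use of `str.isalpha` to detect letters, and the assumption that the receiver knows the message length in bits are likewise illustrative. In Unicode, U+200E and U+200F are named left-to-right mark and right-to-left mark.

```python
import random

# The four code points named in Section III; their order here is an assumption.
SYMBOLS = ("\u200e", "\u200f", "\u200d", "\u200c")


def build_symbol_table(key: int) -> dict[str, str]:
    """Map each 4-bit code to a distinct subset of the four symbols.

    Hypothetical key schedule: the key seeds a shuffle of the 16 possible
    presence/absence patterns, so different keys give different tables.
    """
    patterns = [
        "".join(s for i, s in enumerate(SYMBOLS) if (mask >> i) & 1)
        for mask in range(16)
    ]
    random.Random(key).shuffle(patterns)
    return {format(code, "04b"): patterns[code] for code in range(16)}


def embed(carrier: str, bits: str, key: int) -> str:
    """Insert the symbols for four hidden bits after each carrier letter."""
    if len(bits) % 4:
        raise ValueError("bit string length must be a multiple of 4")
    table = build_symbol_table(key)
    chunks = [bits[i:i + 4] for i in range(0, len(bits), 4)]
    out, used = [], 0
    for ch in carrier:
        out.append(ch)
        if ch.isalpha() and used < len(chunks):
            out.append(table[chunks[used]])
            used += 1
    if used < len(chunks):
        raise ValueError("carrier has too few letters for the message")
    return "".join(out)


def extract(stego: str, n_bits: int, key: int) -> str:
    """Rebuild the table from the key and read the symbols after each letter."""
    reverse = {symbols: code for code, symbols in build_symbol_table(key).items()}
    codes, i = [], 0
    while i < len(stego) and len(codes) * 4 < n_bits:
        ch = stego[i]
        i += 1
        if not ch.isalpha():
            continue
        j = i
        while j < len(stego) and stego[j] in SYMBOLS:
            j += 1
        codes.append(reverse[stego[i:j]])  # an empty run is also a valid code
        i = j
    return "".join(codes)


secret = "0100100001101001"
stego = embed("Hidden in plain sight", secret, key=42)
recovered = extract(stego, len(secret), key=42)
```

Because one of the 16 patterns is the empty one, a letter followed by no symbol still decodes to a code, which is why this sketch passes the message length to `extract`. It also assumes the carrier text does not already contain any of the four symbols.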

Figure 3. Comparison between three algorithms (number of hidden bits (capacity) for Word Shift, Line Shift, and the suggested algorithm on the CNN, BBC, NYPOST, and TheGuardian websites)


TABLE II. SIMULATION RESULTS: FILE SIZE AND NUMBER OF BITS THAT CAN BE INSERTED INTO CARRIER WEB PAGES

No.  Web site            Size (KB)  No. of lines  No. of words  No. of letters  Our algorithm  Line shift algorithm  Word shift algorithm
1    www.cnn.com         19.8       74            763           4592            928            4                     39
2    www.bbc.com         19.3       67            749           4065            842            3                     39
3    www.nypost.com      19.8       48            634           3532            714            2                     32
4    www.guardian.co.uk  21         64            935           5625            1071           3                     45
5    www.ctpost.com      20.5       51            640           3652            713            2                     31
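
To make the comparison in Table II concrete, the short sketch below computes how many times larger the value in the "Our algorithm" column is than the line shift and word shift values for each site. It takes the three algorithm columns at face value; the dictionary name `TABLE_II` and the helper `gain_over_baselines` are illustrative only.

```python
# Values copied from Table II: (our algorithm, line shift, word shift).
TABLE_II = {
    "www.cnn.com": (928, 4, 39),
    "www.bbc.com": (842, 3, 39),
    "www.nypost.com": (714, 2, 32),
    "www.guardian.co.uk": (1071, 3, 45),
    "www.ctpost.com": (713, 2, 31),
}


def gain_over_baselines(row: tuple[int, int, int]) -> tuple[float, float]:
    """Return (ours / line shift, ours / word shift) for one table row."""
    ours, line_shift, word_shift = row
    return ours / line_shift, ours / word_shift


for site, row in TABLE_II.items():
    vs_line, vs_word = gain_over_baselines(row)
    print(f"{site}: {vs_line:.1f}x line shift, {vs_word:.1f}x word shift")
```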
REFERENCES

[1] M. Shirali-Shahreza, "Pseudo-space Persian/Arabic text steganography," in Computers and Communications, 2008. ISCC 2008. IEEE Symposium on, 2008, pp. 864-868.
[2] W. A. Arbaugh, N. Shankar, Y. J. Wan, and K. Zhang, "Your 802.11 wireless network has no clothes," Wireless Communications, IEEE, vol. 9, pp. 44-51, 2002.
[3] M. Shirali-Shahreza and S. Shirali-Shahreza, "Steganography in TeX documents," in Intelligent System and Knowledge Engineering, 2008. ISKE 2008. 3rd International Conference on, 2008, pp. 1363-1366.
[4] R. Krenn, "Steganography and steganalysis," Retrieved September, vol. 8, p. 2007, 2004.
[5] J. Silman, "Steganography and steganalysis: an overview," SANS Institute, vol. 3, pp. 61-76, 2001.
[6] B. Dunbar, "A detailed look at Steganographic Techniques and their use in an Open-Systems Environment," SANS Institute, 2002.
[7] N. F. Johnson and S. Jajodia, "Exploring steganography: Seeing the unseen," IEEE Computer, vol. 31, pp. 26-34, 1998.
[8] F. Djebbar, B. Ayad, H. Hamam, and K. Abed-Meraim, "A view on latest audio steganography techniques," in Innovations in Information Technology (IIT), 2011 International Conference on, 2011, pp. 409-414.
[9] V. Potdar and E. Chang, "Visibly Invisible: Ciphertext as a Steganographic Carrier," in Proceedings of the 4th International Network Conference (INC2004), 2004, pp. 385-391.
[10] S. Bhattacharyya, I. Banerjee, and G. Sanyal, "A novel approach of secure text based steganography model using word mapping method (WMM)," International Journal of Computer and Information Engineering, vol. 4, p. 2, 2010.
[11] R. Prasad and K. Alla, "A new approach to Telugu text steganography," in Wireless Technology and Applications (ISWTA), 2011 IEEE Symposium on, 2011, pp. 60-65.
[12] V. N. Rao and D. D. Shulman, Classical Telugu poetry: an anthology: University of California Press, 2002.
[13] K. Bennett, "Linguistic steganography: Survey, analysis, and robustness concerns for hiding information in text," CERIAS Technical Report 3, Purdue University, pp. 1-30, 2004.
[14] M. H. Shirali-Shahreza and M. Shirali-Shahreza, "A new approach to Persian/Arabic text steganography," in Computer and Information Science, 2006 and 2006 1st IEEE/ACIS International Workshop on Component-Based Software Engineering, Software Architecture and Reuse. ICIS-COMSAR 2006. 5th IEEE/ACIS International Conference on, 2006, pp. 310-315.
[15] M. Nosrati, R. Karimi, and M. Hariri, "An introduction to steganography methods," World Applied Programming, vol. 1, pp. 191-195, 2011.
[16] M. H. Shirali-Shahreza and M. Shirali-Shahreza, "Text steganography in chat," in Internet, 2007. ICI 2007. 3rd IEEE/IFIP International Conference in Central Asia on, 2007, pp. 1-5.
[17] W. Bender, D. Gruhl, N. Morimoto, and A. Lu, "Techniques for data hiding," IBM Systems Journal, vol. 35, pp. 313-336, 1996.
[18] S. Alameti, A. Pothalaiah, and A. Babu, "A New Approach to Telugu Text Steganography by Shifting Inherent Vowel," International Journal of Engineering Science and Technology, vol. 2, pp. 7203-7214, 2010.
[19] A. C. Shakir, G. Xuemai, and J. Min, "Chinese Language Steganography using the Arabic Diacritics as a Covered Media," International Journal of Computer Applications IJCA, vol. 11, pp. 24-28, 2010.

Ammar Odeh is a Ph.D. student at the University of Bridgeport. He earned the M.S. degree in Computer Science from the King Abdullah II School for Information Technology (KASIT) at the University of Jordan in Dec. 2005 and the B.Sc. in Computer Science from the Hashemite University. He has worked as a Lab Supervisor at Philadelphia University (Jordan), as a Lecturer at Philadelphia University for the ICDL courses, and as technical support for online examinations for two years. He served as a Lecturer at the IT (ACS, CIS, CS) Department of Philadelphia University in Jordan, and also worked at the Ministry of Higher Education (Oman, Sur College of Applied Science) for two years. Ammar joined the University of Bridgeport as a Ph.D. student of Computer Science and Engineering in August 2011. His area of concentration is reverse software engineering, computer security, and wireless networks. Specifically, he is working on the enhancement of computer security for data transmission over wireless networks. He is also actively involved in the academic community, outreach activities, and student recruiting and advising.

Dr. Khaled Elleithy is the Associate Dean for Graduate Studies in the School of Engineering at the University of Bridgeport. His research interests are in the areas of network security, mobile communications, and formal approaches for design and verification. He has published more than two hundred fifty research papers in international journals and conferences in his areas of expertise. Dr. Elleithy is the co-chair of the International Joint Conferences on Computer, Information, and Systems Sciences, and Engineering (CISSE). CISSE is the first Engineering/Computing and Systems Research e-Conference in the world to be completely conducted online in real-time via the internet and was successfully running for four years. Dr. Elleithy is the editor or co-editor of 10 books published by Springer for advances on Innovations and Advanced Techniques in Systems, Computing Sciences and Software.

Dr. Miad Faezipour is an Assistant Professor in the Computer Science and Engineering program at the University of Bridgeport and has been the director of the D-BEST Lab since July 2011. Prior to joining UB, she was a Post-Doctoral Research Associate at the University of Texas at Dallas, collaborating with the Center for Integrated Circuits and Systems and the Quality of Life Technology laboratories. She received the B.Sc. in Electrical Engineering from the University of Tehran, Tehran, Iran and the M.Sc. and Ph.D. in Electrical Engineering from the University of Texas at Dallas. Her research interests lie in the broad area of biomedical signal processing and behavior analysis techniques, high-speed packet processing architectures, and digital/embedded systems. Dr. Faezipour is a member of IEEE and IEEE Women in Engineering.
