
Unit5 - Data Compression and Cryptography


Data Compression & Cryptography
Why Compress

•Conserve storage space


•Reduce time for transmission
– Faster to encode, send, then decode than to send the
original
•Progressive transmission
– Some compression techniques allow us to send the most
important bits first so we can get a low resolution version of
some data before getting the high fidelity version
•Reduce computation
– Use less data to achieve an approximate answer
Data Compression
➢Data Compression is a reduction in the number of bits
needed to represent data. Compressing data can save
storage capacity, speed file transfer, and decrease costs for
storage hardware and network bandwidth.
➢Data Compression refers to reducing the number of
bits that need to be transmitted over a communication
channel.
➢ Data Compression reduces the number of bits sent.
➢Data Compression becomes particularly important when
we send large data such as audio and video.
➢Even with very fast transmission links, we want data to
arrive in a short time, so we compress it for this purpose.
➢ Virtually all forms of data contain redundancy, i.e. wasted
"space" used in representing and transmitting the data.
➢ By using more efficient data representation methods,
redundancy can be reduced.
➢ Even 7-bit ASCII code has some redundancy in it.
The goal of data compression is to represent an information
source (e.g. a data file, a speech signal, an image, or a video
signal) as accurately as possible using the fewest number of
bits.
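As a rough illustration of that redundancy, the sketch below computes the Shannon entropy of a sample string: the average number of bits per character an ideal lossless code would need, given only single-character frequencies. The function name and sample text are illustrative assumptions; real compressors also exploit longer-range patterns, so this is only a first-order estimate.

```python
import math
from collections import Counter

def entropy_bits_per_symbol(text: str) -> float:
    """Shannon entropy of the character distribution in `text`, in bits per symbol."""
    counts = Counter(text)
    total = len(text)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

sample = "this is an example of english text with some redundancy in it"
h = entropy_bits_per_symbol(sample)
print(f"{h:.2f} bits/char needed vs 8 bits/char stored")
print(f"redundancy of the byte encoding: {1 - h / 8:.0%}")
```

A string made of one repeated character has entropy 0 (fully predictable), while two equally frequent characters need exactly 1 bit each.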
Data Compression Methods
Lossy Compression
•Lossy Compression refers to data compression techniques
in which some amount of data is lost.
•Lossy compression technologies attempt to eliminate
redundant or unnecessary information.
•If the decompressed information need not be an exact replica
of the original information but something very close, we can
use a lossy data compression method.
•Lossy compression reduces a file by permanently
eliminating certain information, especially redundant
information. When the file is uncompressed, only a part of the
original information is still there (although the user may not
notice it). Lossy compression is generally used for video and
sound, where a certain amount of information loss will not be
detected by most users.
• The JPEG image file, commonly used for photographs and
other complex still images on the Web, is an image that
has lossy compression. Using JPEG compression, the
creator can decide how much loss to introduce and make
a trade-off between file size and image quality. Most
video compression technologies, such as MPEG, use a
lossy technique.

The best example is a videoconference, where an
acceptable amount of frame loss is tolerated in order to deliver the
image in real time. People may appear jerky in their
movements, but you still have a grasp of what is
happening on the other end of the conference.
In the case of graphics files, some resolution may be lost in
order to create a smaller file. The loss may be in the form of
color depth or graphic detail. For example, high-resolution
details can be lost if a picture is going to be displayed on a
low-resolution device. Loss is also acceptable in voice and
audio compression, depending on the desired quality.
These methods are called lossy compression methods
because we will lose some of the original data in the
process.
Several methods have been developed using lossy
compression techniques. The Joint Photographic Experts Group
(JPEG) standard is used to compress pictures and graphics. The Moving
Picture Experts Group (MPEG) standard is used to compress video.
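The essential move in lossy compression, permanently throwing detail away, can be sketched with simple uniform quantization. This is an assumed stand-in, far simpler than the transforms JPEG and MPEG actually use: 8-bit samples (0 to 255) are cut down to `bits` bits, and the decoder can only return the middle of each interval, so the result is close to, but not identical to, the original.

```python
def quantize(samples, bits):
    """Keep only the top `bits` bits of each 8-bit sample (the rest is discarded for good)."""
    shift = 8 - bits
    return [s >> shift for s in samples]

def dequantize(codes, bits):
    """Map each code back to the middle of its quantization interval."""
    shift = 8 - bits
    half_step = (1 << shift) // 2
    return [(c << shift) + half_step for c in codes]

original = [12, 13, 15, 200, 201, 203, 90, 91]
codes = quantize(original, bits=4)          # 4 bits per sample instead of 8
restored = dequantize(codes, bits=4)
max_error = max(abs(a - b) for a, b in zip(original, restored))
print(restored, "max error:", max_error)
```

With 4 bits per sample the data halves in size, and every restored value lies within 8 of the original (half of one 16-wide quantization step). Keeping all 8 bits reproduces the input exactly.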
[Figure: the same image under high compression (low quality) and low compression (high quality)]

Lossless Compression
➢For most types of data, lossless compression techniques
can reduce the space needed by only about 50%.
➢For greater compression, one must use a lossy compression
technique. Note, however, that only certain types of data
(graphics, audio, and video) can tolerate lossy compression.
➢You must use a lossless compression technique when
compressing data and programs.
➢The PKZIP compression technology is an example of lossless
compression.
➢With lossless compression, every single bit of data that was
originally in the file remains after the file is uncompressed.
All of the information is completely restored
➢ This is generally the technique of choice for text or
spreadsheet files, where losing words or financial data
could pose a problem.
➢ The Graphics Interchange Format (GIF) is an image format
used on the Web that provides lossless compression.
➢ In lossless data compression, the compressing and
decompressing algorithms are usually the inverse of each
other. In other words, after decompressing we get
exactly the data we had before compressing. Nothing is
lost. The following are some techniques used in lossless
data compression:
1. Null compression
2. Run Length Encoding
1. Null Compression
Replaces a series of blank spaces with a
compression code, followed by a value that
represents the number of spaces.
Example : hello friend how is your life?
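A minimal sketch of null compression follows. The compression code (`MARKER`), the two-digit count, and the minimum run length of three are all assumptions made for illustration; the slide does not fix any of them. The marker is assumed never to occur in ordinary text.

```python
import re

MARKER = "\x1b"          # hypothetical compression code, assumed absent from the text
MIN_RUN = 3              # shorter runs are left alone: the code would not save space
RUN_PATTERN = re.compile(" {%d,99}" % MIN_RUN)   # count must fit in two digits

def null_compress(text: str) -> str:
    """Replace each run of blank spaces with MARKER + two-digit space count."""
    return RUN_PATTERN.sub(lambda m: f"{MARKER}{len(m.group()):02d}", text)

def null_decompress(data: str) -> str:
    """Expand every MARKER + count back into that many spaces."""
    out, i = [], 0
    while i < len(data):
        if data[i] == MARKER:
            out.append(" " * int(data[i + 1:i + 3]))
            i += 3
        else:
            out.append(data[i])
            i += 1
    return "".join(out)
```

A run of six spaces becomes three characters (marker plus "06"), so the saving grows with the length of each run.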
2. Run Length Encoding (RLE)
When data contain strings of repeated symbols (such as bits or
characters), the strings can be replaced by a special marker,
followed by the repeated symbol, followed by the number of
occurrences. In the example figure, a special symbol is used as the marker.
The symbol being repeated (the run symbol) follows the marker.
After the run symbol, the number of occurrences (length) is
shown by a two-digit number.
Example: This run-length encoding method can be used in
audio (silence is a run of 0s) and video (a run of picture elements
having the same brightness and color).
[Figure: Run-Length Encoding (RLE)]
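The marker / run symbol / two-digit-count scheme described above can be sketched as follows. The choice of `#` as the marker and the minimum run of five are assumptions (a run of four would gain nothing, since the encoded form is itself four characters), and the input is assumed not to contain the marker. Runs longer than 99 are split so the count always fits in two digits.

```python
MARKER = "#"     # hypothetical marker, assumed absent from the input
MIN_RUN = 5      # encode only runs where 4 output characters replace 5 or more

def rle_encode(data: str) -> str:
    out, i = [], 0
    while i < len(data):
        j = i
        # extend the run while the symbol repeats (capped at 99 for the 2-digit count)
        while j < len(data) and data[j] == data[i] and j - i < 99:
            j += 1
        run = j - i
        if run >= MIN_RUN:
            out.append(f"{MARKER}{data[i]}{run:02d}")   # marker, run symbol, length
        else:
            out.append(data[i] * run)                   # short runs stay literal
        i = j
    return "".join(out)

def rle_decode(data: str) -> str:
    out, i = [], 0
    while i < len(data):
        if data[i] == MARKER:
            symbol, count = data[i + 1], int(data[i + 2:i + 4])
            out.append(symbol * count)
            i += 4
        else:
            out.append(data[i])
            i += 1
    return "".join(out)
```

For example, `rle_encode("A" + "B" * 9 + "C")` gives `"A#B09C"`.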
Cryptography
Basics
Cryptography is the science of secret, or hidden writing
It has two main Components:
1. Encryption
– Practice of hiding messages so that they cannot be
read by anyone other than the intended recipient
2. Authentication & Integrity
– Ensuring that users of data/resources are the
persons they claim to be and that a message has not
been surreptitiously altered
Encryption
[Figure: Plain Text → Encryption Algorithm (Key A) → Cipher Text → Decryption Algorithm (Key B) → Plain Text]

• Encryption algorithms are standardized & published


• The key, which is an input to the algorithm, is secret
– Key is a string of numbers or characters
– If same key is used for encryption & decryption the algorithm is
called symmetric
– If different keys are used for encryption & decryption the algorithm
is called asymmetric
Encryption
Symmetric Algorithms
• Algorithms in which the key for encryption and decryption
are the same are Symmetric
– Example: Caesar Cipher
• Types:
1. Block Ciphers
– Encrypt data one block at a time (typically 64 bits, or
128 bits)
– Used for a single message
2. Stream Ciphers
– Encrypt data one bit or one byte at a time
– Used if data is a constant stream of information
Symmetric Encryption
Key Strength
– Strength of algorithm is determined by the size of the
key
– The longer the key the more difficult it is to crack
– Key length is expressed in bits
– Typical key sizes vary between 48 bits and 448 bits
– Set of possible keys for a cipher is called key space
– Each additional bit added to the key length doubles the
size of the key space, and so doubles the brute-force effort
Symmetric Encryption
Key Strength
– To crack the key the hacker has to use brute-force
(i.e. try all the possible keys till a key that works is found)
– A supercomputer can crack a 56-bit key in 24 hours
– It will take $2^{72}$ times longer to crack a 128-bit key
(longer than the age of the universe)
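The figures above follow from the key space doubling with every bit. The sketch below reproduces the arithmetic, taking the slide's 24 hours for a 56-bit key as given and assuming an age of the universe of about 13.8 billion years.

```python
HOURS_FOR_56_BIT = 24                 # figure quoted on the slide
HOURS_PER_YEAR = 24 * 365.25
AGE_OF_UNIVERSE_YEARS = 13.8e9        # approximate, assumed

factor = 2 ** (128 - 56)              # 72 extra bits: the key space doubles 72 times
years_for_128_bit = HOURS_FOR_56_BIT * factor / HOURS_PER_YEAR

print(f"key space grows by 2^72 = {factor:.3e}")
print(f"128-bit search: {years_for_128_bit:.2e} years, "
      f"about {years_for_128_bit / AGE_OF_UNIVERSE_YEARS:.0e} times the age of the universe")
```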
Substitution Ciphers
Caesar Cipher
• Caesar Cipher is a method in which each letter in the
alphabet is rotated by three letters, as shown

Plain:  A B C D E F G H I J K L M N O P Q R S T U V W X Y Z

Cipher: D E F G H I J K L M N O P Q R S T U V W X Y Z A B C
Substitution Ciphers
Caesar Cipher
Encryption
Plain text message "Attack at Dawn" → Caesar Cipher Algorithm (Key 3) → Cipher text message "Dwwdfn dw Gdzq"

Decryption
Cipher text message "Dwwdfn dw Gdzq" → Caesar Cipher Algorithm (Key 3) → Plain text message "Attack at Dawn"

How many different keys are possible?
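A minimal Caesar cipher sketch, assuming only the 26 unaccented Latin letters are rotated and everything else (spaces, punctuation) passes through unchanged. Encrypting "Attack at Dawn" with key 3 yields "Dwwdfn dw Gdzq" (W moves three places to Z), and decryption is the same rotation with the key negated. The last line tries every rotation, which shows how small the key space is.

```python
def caesar(text: str, key: int) -> str:
    """Rotate A-Z / a-z by `key` places; all other characters pass through unchanged."""
    out = []
    for ch in text:
        if "A" <= ch <= "Z":
            out.append(chr((ord(ch) - ord("A") + key) % 26 + ord("A")))
        elif "a" <= ch <= "z":
            out.append(chr((ord(ch) - ord("a") + key) % 26 + ord("a")))
        else:
            out.append(ch)
    return "".join(out)

cipher_text = caesar("Attack at Dawn", 3)    # encrypt with key 3
plain_text = caesar(cipher_text, -3)         # decrypt by rotating back

# Brute force: only 26 rotations exist (key 0 leaves the text unchanged)
candidates = [caesar(cipher_text, -k) for k in range(26)]
```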


Substitution Cipher
Monoalphabetic Cipher
• Any letter can be substituted for any other letter
– Each letter has to have a unique substitute

Plain:  A B C D E F G H I J K L M N O P Q R S T U V W X Y Z

Cipher: M N B V C X Z A S D F G H J K L P O I U Y T R E W Q

• There are 26! pairings of letters (about $4 \times 10^{26}$)


• Brute Force approach would be too time consuming
– Statistical Analysis would make it feasible to crack the key

Example (Key: the substitution alphabet above)
Message: "Bob, I love you. Alice" → Monoalphabetic Cipher → Encrypted message: "Nkn, s gktc wky. mgsbc"
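The monoalphabetic substitution above can be sketched with Python's `str.maketrans` and `str.translate`, using the slide's substitution alphabet as the key. This version preserves capitalization, so it produces "Nkn, S gktc wky. Mgsbc", matching the slide apart from case. The `letter_frequencies` helper is an added illustration of the statistical analysis mentioned above: counting which cipher letters are most common is the first step in matching them to frequent plain-text letters such as E and T.

```python
import string
from collections import Counter

PLAIN = string.ascii_uppercase
KEY = "MNBVCXZASDFGHJKLPOIUYTREWQ"   # substitution alphabet from the slide

ENCRYPT = str.maketrans(PLAIN + PLAIN.lower(), KEY + KEY.lower())
DECRYPT = str.maketrans(KEY + KEY.lower(), PLAIN + PLAIN.lower())

def encrypt(text: str) -> str:
    return text.translate(ENCRYPT)

def decrypt(text: str) -> str:
    return text.translate(DECRYPT)

def letter_frequencies(text: str):
    """Letters ordered from most to least common: the raw material of frequency analysis."""
    return Counter(ch for ch in text.upper() if ch in PLAIN).most_common()
```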
Substitution Cipher
Polyalphabetic Caesar Cipher
• Developed by Blaise de Vigenere
– Also called Vigenere cipher
• Uses a sequence of monoalphabetic ciphers in tandem
– e.g. C1, C2, C2, C1, C2

Plain Text   A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
C1 (k=6)     F G H I J K L M N O P Q R S T U V W X Y Z A B C D E
C2 (k=20)    T U V W X Y Z A B C D E F G H I J K L M N O P Q R S

• Example (Key: the sequence C1, C2, C2, C1, C2, repeated)
Message: "Bob, I love you. Alice" → Polyalphabetic Cipher → Encrypted message: "Ghu, n etox dhz. tenvj"
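The polyalphabetic example can be sketched as follows. Reading the table, C1 maps A to F (a shift of 5) and C2 maps A to T (a shift of 19); the sketch assumes the sequence C1, C2, C2, C1, C2 repeats and that only letters advance through the sequence, which is what the slide's example implies. Because the code preserves capitalization it outputs "Ghu, N etox dhz. Tenvj".

```python
SHIFTS = [5, 19, 19, 5, 19]   # C1 = shift 5 (A->F), C2 = shift 19 (A->T), order C1, C2, C2, C1, C2

def polyalphabetic(text: str, shifts=SHIFTS, decrypt: bool = False) -> str:
    out, i = [], 0
    for ch in text:
        if "A" <= ch <= "Z" or "a" <= ch <= "z":
            base = ord("A") if ch <= "Z" else ord("a")
            shift = shifts[i % len(shifts)]           # pick C1 or C2 for this letter
            if decrypt:
                shift = -shift
            out.append(chr((ord(ch) - base + shift) % 26 + base))
            i += 1                                    # only letters advance the sequence
        else:
            out.append(ch)
    return "".join(out)
```

Because the same plain letter (e.g. "o") can come out as different cipher letters depending on its position, simple letter-frequency counts are much less revealing than for a monoalphabetic cipher.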
Substitution Cipher
Using a key to shift alphabet
• Obtain a key for the algorithm and then shift the alphabet
– For instance, if the key is WORD, the cipher alphabet starts with W, O, R, D, and the
remaining letters follow in alphabetical order with W, O, R and D removed
• We have to ensure that the mapping is one-to-one
– no single letter in plain text can map to two different letters in cipher text
– no single letter in cipher text can map to two different letters in plain text

Plain Text      A B C D E F G H I J K L M N O P Q R S T U V W X Y Z

Cipher (WORD)   W O R D A B C E F G H I J K L M N P Q S T U V X Y Z

Example (Key: WORD)
Message: "Bob, I love you. Alice" → Cipher → Encrypted message: ??
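Building the keyed alphabet can be sketched as below. The helper names are hypothetical; duplicate letters in the key are dropped so that the mapping stays one-to-one, as required above. Running `keyword_encrypt` on the example message is left as the exercise the slide poses.

```python
import string

def keyword_alphabet(key: str) -> str:
    """Key letters first (duplicates dropped), then the rest of A-Z in order."""
    seen = []
    for ch in key.upper() + string.ascii_uppercase:
        if ch in string.ascii_uppercase and ch not in seen:
            seen.append(ch)
    return "".join(seen)

def keyword_encrypt(text: str, key: str) -> str:
    cipher = keyword_alphabet(key)
    table = str.maketrans(string.ascii_uppercase + string.ascii_lowercase,
                          cipher + cipher.lower())
    return text.translate(table)
```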
Transposition Cipher
Columnar Transposition
• This involves rearranging the characters of the plain text into columns
• The following example shows how letters are transformed: the plain text is written in rows of five
letters and the cipher text is read off column by column
– If the number of letters is not an exact multiple of the row length, the last row will be
short and can be padded with an infrequent letter such as x or z

Plain Text      Cipher Text

T H I S I       T S S O H
S A M E S       O A N I W
S A G E T       H A A S O
O S H O W       L R S T O
H O W A C       I M G H W
O L U M N       U T P I R
A R T R A       S E E O A
N S P O S       M R O O K
I T I O N       I S T W C
W O R K S       N A S N S
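The example above can be reproduced with a short sketch: letters are written in rows of five (non-letters dropped, text upper-cased) and read off column by column, with `X` as the assumed padding letter. The row width of five is read from the example; the function names are illustrative.

```python
def columnar_encrypt(plaintext: str, width: int = 5, pad: str = "X") -> str:
    letters = [c for c in plaintext.upper() if c.isalpha()]
    while len(letters) % width:
        letters.append(pad)                   # fill the short last row
    rows = [letters[i:i + width] for i in range(0, len(letters), width)]
    # read down each column in turn
    return "".join(row[col] for col in range(width) for row in rows)

def columnar_decrypt(ciphertext: str, width: int = 5) -> str:
    n_rows = len(ciphertext) // width         # assumes the text was padded to a full grid
    columns = [ciphertext[c * n_rows:(c + 1) * n_rows] for c in range(width)]
    # read across each row in turn
    return "".join(columns[col][row] for row in range(n_rows) for col in range(width))
```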
Ciphers
Shannon’s Characteristics of “Good” Ciphers
• The amount of secrecy needed should determine
the amount of labor appropriate for the encryption
and decryption.
• The set of keys and the enciphering algorithm
should be free from complexity.
• The implementation of the process should be as
simple as possible.
• Errors in ciphering should not propagate and cause
corruption of further information in the message.
• The size of the enciphered text should be no larger
than the text of the original message.
Encryption Systems
Properties of Trustworthy Systems
• It is based on sound mathematics.
– Good cryptographic algorithms are derived from
solid principles.
• It has been analyzed by competent experts and
found to be sound.
– This matters because it is hard for the designer of an
algorithm to envisage all possible attacks on it.
• It has stood the “test of time.”
– Over time people continue to review both mathematical
foundations of an algorithm and the way it builds upon
those foundations.
– The flaws in most algorithms are discovered soon after
their release.
Cryptanalysis
Techniques
• Cryptanalysis is the process of breaking an encryption code
– Tedious and difficult process
• Several techniques can be used to deduce the algorithm
– Attempt to recognize patterns in encrypted messages, to be able to
break subsequent ones by applying a straightforward decryption
algorithm
– Attempt to infer some meaning without even breaking the
encryption, such as noticing an unusual frequency of
communication or determining something by whether the
communication was short or long
– Attempt to deduce the key, in order to break subsequent messages
easily
– Attempt to find weaknesses in the implementation or environment
of use of encryption
– Attempt to find general weaknesses in an encryption algorithm,
without necessarily having intercepted any messages
Data Encryption Standard (DES) Basics
• Goal of DES is to completely scramble the data and
key so that every bit of cipher text depends on every
bit of data and every bit of key
• DES is a block cipher algorithm
– Encodes plaintext in 64-bit chunks
– Uses a 64-bit key with one parity bit in each of its 8 bytes,
which reduces the effective key length to 56 bits
• It has been the most widely used algorithm
– Standard approved by the US National Bureau of Standards
for commercial and nonclassified US government use in
1977 (reaffirmed in 1993)
Data Encryption Standard (DES) Basics
[Figure: a 64-bit input and a 56-bit key enter 16 rounds; in round i the halves Li and Ri are processed by F(Li, Ri, Ki) using a 48-bit subkey ki to produce L(i+1) and R(i+1), ending with L17 and R17]
• DES is run in reverse to decrypt
• Cracking DES
– 1997: 140 days
– 1999: 14 hours
• TripleDES uses DES 3 times in tandem
– Output from 1 DES is input to next DES
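DES decrypts by running the same rounds with the subkeys in reverse order, which works because of its Feistel structure. The toy sketch below shows only that structure: the round function (a truncated SHA-256) and the subkeys are hypothetical stand-ins, not DES's expansion, S-boxes, permutations or key schedule, so it is not DES and is not secure.

```python
import hashlib

def round_function(half: int, subkey: int) -> int:
    """Toy F-function: truncated SHA-256 of (half, subkey). NOT the real DES F-function."""
    data = half.to_bytes(4, "big") + subkey.to_bytes(6, "big")
    return int.from_bytes(hashlib.sha256(data).digest()[:4], "big")

def feistel_encrypt(block: int, subkeys) -> int:
    """Split a 64-bit block into 32-bit halves and apply one Feistel round per subkey."""
    left, right = block >> 32, block & 0xFFFFFFFF
    for k in subkeys:
        left, right = right, left ^ round_function(right, k)
    return (right << 32) | left          # swap the halves at the end, as DES does

def feistel_decrypt(block: int, subkeys) -> int:
    """Decryption is the same procedure with the subkeys in reverse order."""
    return feistel_encrypt(block, list(reversed(subkeys)))

# Hypothetical 48-bit subkeys (real DES derives its 16 subkeys from the 56-bit key)
SUBKEYS = [(0x0123456789AB * (i + 1)) % (1 << 48) for i in range(16)]
plain = 0x0123456789ABCDEF
cipher = feistel_encrypt(plain, SUBKEYS)
```

Note that the round function itself never has to be inverted; the XOR structure alone makes decryption possible.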
Encryption Algorithm
Summary
Algorithm   Type            Key Size                           Features
DES         Block cipher    56 bits                            Most common, not strong enough
TripleDES   Block cipher    168 bits (112 effective)           Modification of DES, adequate security
Blowfish    Block cipher    Variable (up to 448 bits)          Excellent security
AES         Block cipher    Variable (128, 192, or 256 bits)   Replacement for DES, excellent security
RC4         Stream cipher   Variable (40 or 128 bits)          Fast stream cipher, used in most SSL implementations
Symmetric Encryption
Limitations
• Any exposure to the secret key compromises secrecy
of ciphertext
• A key needs to be delivered to the recipient of the
coded message for it to be deciphered
– Potential for eavesdropping attack during transmission of
key
Asymmetric Encryption
Basics
• Uses a pair of keys for encryption
– Public key for encryption
– Private key for decryption
• Messages encoded using public key can only be decoded by
the private key
– Secret transmission of key for decryption is not required
– Every entity can generate a key pair and release its public key

[Figure: Plain Text → Cipher (Public Key) → Cipher Text → Cipher (Private Key) → Plain Text]


Asymmetric Encryption
Types
• Two most popular algorithms are RSA & El Gamal
– RSA
• Developed by Ron Rivest, Adi Shamir, Len Adleman
• Both public and private keys are interchangeable
• Variable key size (512, 1024, or 2048 bits)
• Most popular public key algorithm
– El Gamal
• Developed by Taher ElGamal
• Variable key size (512 or 1024 bits)
• Less common than RSA, used in protocols like PGP
Asymmetric Encryption
RSA
• Choose two large prime numbers p & q
• Compute n=pq and z=(p-1)(q-1)
• Choose number e, less than n, which has no common factor (other
than 1) with z
• Find number d, such that ed – 1 is exactly divisible by z
• Keys are generated using n, d, e
– Public key is (n,e)
– Private key is (n, d)
• Encryption: $c = m^e \bmod n$
– m is plain text
– c is cipher text
• Decryption: $m = c^d \bmod n$
• Public key is shared and the private key is hidden
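The key-generation steps above translate almost line for line into Python. This is textbook RSA for illustration only (no padding, tiny numbers); `pow(e, -1, z)` computes the modular inverse and needs Python 3.8 or later. It returns the smallest valid d: for p = 5, q = 7, e = 5 that is d = 5, while d = 29 = 5 + 24 is an equally valid choice.

```python
from math import gcd

def rsa_keys(p: int, q: int, e: int):
    """Textbook RSA key generation following the steps above (toy sizes only)."""
    n = p * q
    z = (p - 1) * (q - 1)
    if gcd(e, z) != 1:
        raise ValueError("e must have no common factor (other than 1) with z")
    d = pow(e, -1, z)             # smallest d with (e * d - 1) exactly divisible by z
    return (n, e), (n, d)         # public key, private key

def rsa_encrypt(m: int, public_key) -> int:
    n, e = public_key
    return pow(m, e, n)           # c = m^e mod n

def rsa_decrypt(c: int, private_key) -> int:
    n, d = private_key
    return pow(c, d, n)           # m = c^d mod n
```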
Asymmetric Encryption
RSA
• p = 5 & q = 7
• n = 5*7 = 35 and z = (4)*(6) = 24
• e = 5
• d = 29, since (29x5 – 1) = 144 is exactly divisible by 24
• Keys generated are
– Public key: (35, 5)
– Private key: (35, 29)
• Encrypt the word love using $c = m^e \bmod n$
– Assume that the letters are numbered from 1 (a) to 26 (z)

Plain Text   Numeric Representation (m)   $m^e$      Cipher Text ($c = m^e \bmod n$)
l            12                           248832     17
o            15                           759375     15
v            22                           5153632    22
e            5                            3125       10
Asymmetric Encryption
RSA
• Decrypt the word love using $m = c^d \bmod n$
– n = 35, d = 29

Cipher Text (c)   $c^d$                      $m = c^d \bmod n$   Plain Text
17                $\approx 4.82 \times 10^{35}$   12                  l
15                $\approx 1.28 \times 10^{34}$   15                  o
22                $\approx 8.52 \times 10^{38}$   22                  v
10                $10^{29}$                       5                   e
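The huge $c^d$ values in the decryption table never need to be computed in full. A sketch of square-and-multiply modular exponentiation, which reduces modulo n after every step, reproduces the worked example; the letter numbering a = 1 ... z = 26 follows the slide. Python's built-in three-argument `pow` does the same job and serves as a cross-check.

```python
def mod_pow(base: int, exponent: int, modulus: int) -> int:
    """Square-and-multiply: reduce after every step so values never exceed modulus**2."""
    result = 1
    base %= modulus
    while exponent > 0:
        if exponent & 1:                      # current binary digit of the exponent is 1
            result = (result * base) % modulus
        base = (base * base) % modulus
        exponent >>= 1
    return result

n, e, d = 35, 5, 29
plain = [ord(ch) - ord("a") + 1 for ch in "love"]      # a = 1 ... z = 26
cipher = [mod_pow(m, e, n) for m in plain]
decoded = "".join(chr(mod_pow(c, d, n) + ord("a") - 1) for c in cipher)
```

With these values the cipher list is [17, 15, 22, 10] and the decoded text is "love".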
Asymmetric Encryption
Weaknesses
• Efficiency is lower than Symmetric Algorithms
– A 3072-bit RSA key gives roughly the same security as a 128-bit
symmetric key (a 1024-bit RSA key corresponds to only about 80 bits)
• Potential for man-in-the-middle attack
• Generating the key pair for encryption is problematic because it
is computationally expensive
Asymmetric Encryption
Man-in-the-middle Attack
• A hacker could generate a key pair, give the public key away and
tell everybody that it belongs to somebody else. Everyone who
believes this will use the key for encryption, allowing the hacker
to read the messages. If he then re-encrypts the messages with
the public key of the real recipient, he will not easily be detected.
[Figure: Bob encrypts his message with what he believes is David's public key, but it is really the public key of Trudeau, the attacker in the middle. Trudeau decrypts the cipher, reads it, then encrypts the message (or a new message of his own) with David's real public key and forwards it to David.]
Asymmetric Encryption
Session-Key Encryption
• Used to improve efficiency
– Symmetric key is used for encrypting data
– Asymmetric key is used for encrypting the symmetric key

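[Figure: Plain text is encrypted with a symmetric cipher (DES) under a session key, and the cipher text is sent to the recipient; the session key itself is encrypted with an asymmetric cipher (RSA) under the recipient's public key and sent along as the encrypted key.]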

Asymmetric Encryption
Encryption Protocols
• Pretty Good Privacy (PGP)
– Used to encrypt e-mail using session key encryption
– Combines RSA, TripleDES, and other algorithms
• Secure/Multipurpose Internet Mail Extensions (S/MIME)
– Newer standard for securing e-mail
– Backed by Microsoft, RSA, AOL
• Secure Sockets Layer (SSL) and Transport Layer Security (TLS)
– Used for securing TCP/IP Traffic
– Mainly designed for web use
– Can be used for any kind of internet traffic
Asymmetric Encryption
Key Agreement
• Key agreement is a method to create secret key by exchanging only public
keys.
• Example
– Bob sends Alice his public key
– Alice sends Bob her public key
– Bob uses Alice’s public key and his private key to generate a session key
– Alice uses Bob’s public key and her private key to generate a session key
– Using a key agreement algorithm both will generate same key
– Bob and Alice do not need to transfer any key

[Figure: Alice combines her private key with Bob's public key, and Bob combines his private key with Alice's public key; Alice and Bob generate the same session key, which they then use with a symmetric cipher (DES).]
Asymmetric Encryption
Diffie-Hellman Key Agreement: Mathematical Analysis
• Bob & Alice agree on a non-secret prime p and value a
• Bob generates a secret random number x; Alice generates a secret random number y
• Bob computes his public key $a^x \bmod p$ and Alice computes hers, $a^y \bmod p$; they exchange public keys
• Bob computes the session key $(a^y)^x \bmod p$ and Alice computes $(a^x)^y \bmod p$
• Both equal $a^{xy} \bmod p$: an identical secret key
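The Diffie-Hellman steps above can be sketched directly with modular exponentiation. The prime p = 23 and value a = 5 are toy values chosen for readability; real deployments use far larger, standardized groups.

```python
import secrets

def dh_public_key(a: int, p: int, secret: int) -> int:
    """Public value a^secret mod p, safe to send in the open."""
    return pow(a, secret, p)

def dh_session_key(other_public: int, p: int, secret: int) -> int:
    """(a^other)^secret mod p, which equals a^(x*y) mod p on both sides."""
    return pow(other_public, secret, p)

p, a = 23, 5                         # toy non-secret prime and value
x = secrets.randbelow(p - 2) + 1     # Bob's secret, 1 <= x <= p - 2
y = secrets.randbelow(p - 2) + 1     # Alice's secret
bob_public = dh_public_key(a, p, x)
alice_public = dh_public_key(a, p, y)

bob_key = dh_session_key(alice_public, p, x)
alice_key = dh_session_key(bob_public, p, y)
assert bob_key == alice_key          # identical secret key, never transmitted
```

For instance, with secrets 4 and 3 the public keys are 4 and 10, and both sides compute the session key 18.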


Asymmetric Encryption
Key Agreement (cont.)
• Diffie-Hellman was the first published key agreement algorithm
– Invented by Whitfield Diffie & Martin Hellman
– Provided the ability for messages to be exchanged securely
without having to share some secret information
previously
– Inception of public key cryptography, which allowed keys
to be exchanged in the open
• No exchange of secret keys
– An eavesdropper who sees only the public values cannot compute the session key; the exchange is still vulnerable to a man-in-the-middle attack unless the public keys are authenticated (e.g. with digital certificates)
Authentication
Basics
• Authentication is the process of validating the
identity of a user or the integrity of a piece of data.
• There are three technologies that provide
authentication
– Message Digests / Message Authentication Codes
– Digital Signatures
– Public Key Infrastructure
• There are two types of user authentication:
– Identity presented by a remote user or application participating
in a session
– Sender’s identity is presented along with a message.
Authentication
Message Digests
• A message digest is a fingerprint for a document
• Purpose of the message digest is to provide proof that
data has not been altered
• Process of generating a message digest from data is called
hashing
• Hash functions are one way functions with following
properties
– Infeasible to reverse the function
– Infeasible to construct two messages which hash to same digest
• Commonly used hash algorithms are
– MD5 – 128 bit hashing algorithm by Ron Rivest of RSA
– SHA & SHA-1 – 160-bit hashing algorithms standardized by NIST

[Figure: Message → Message Digest Algorithm → Message Digest]
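Python's standard `hashlib` module provides the digest algorithms named above. The sketch prints each digest size and shows that changing a single character of the message changes the digest. MD5 and SHA-1 appear only because the slide names them; both have known practical collision attacks, so SHA-256 is shown alongside them. The message text is an arbitrary example.

```python
import hashlib

message = b"Transfer 100 to account 42"
tampered = b"Transfer 900 to account 42"

for name in ("md5", "sha1", "sha256"):
    original_digest = hashlib.new(name, message).hexdigest()
    tampered_digest = hashlib.new(name, tampered).hexdigest()
    size_bits = hashlib.new(name).digest_size * 8
    print(f"{name:7}{size_bits:4} bits  {original_digest}  "
          f"changed: {original_digest != tampered_digest}")
```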
Message Authentication Codes
Basics
• A message digest created with a key
• Creates security by requiring a secret key to be
possessed by both parties in order to generate and
verify the digest of the message

[Figure: Message + Secret Key → Message Digest Algorithm → Message Digest]
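One standard way to build a message authentication code from a digest algorithm is HMAC. The slide does not name a construction, so HMAC-SHA256 from Python's `hmac` module is an assumed choice here; both parties must hold the same secret key, and the helper names are illustrative.

```python
import hashlib
import hmac

def make_mac(key: bytes, message: bytes) -> bytes:
    """Keyed digest of the message; only holders of `key` can produce it."""
    return hmac.new(key, message, hashlib.sha256).digest()

def check_mac(key: bytes, message: bytes, tag: bytes) -> bool:
    """Recompute the MAC and compare in constant time."""
    return hmac.compare_digest(make_mac(key, message), tag)
```

Unlike a plain digest, an attacker who alters the message cannot simply recompute a matching tag without the key.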
Password Authentication
Basics
• A password is a secret character string known only to the user and the
server
• Message digests are commonly used for password authentication
• Storing a hash of the password is a lesser risk than storing the password
– A hacker cannot reverse the hash except by a brute-force attack
• Problems with password-based authentication
– Attacker learns the password by social engineering
– Attacker cracks the password by brute force and/or guesswork
– Attacker eavesdrops on the password if it is communicated unprotected over the
network
– Attacker replays an encrypted password back to the authentication server
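A sketch of storing and checking a password hash. Beyond what the slide describes, it adds two widely used safeguards: a random per-user salt, and deliberate slowing with PBKDF2 (`hashlib.pbkdf2_hmac`) to make brute-force guessing costlier. The iteration count of 100,000 is an assumed value, not a figure from the slide.

```python
import hashlib
import hmac
import secrets

ITERATIONS = 100_000   # assumed work factor; tune for the hardware in use

def hash_password(password: str):
    salt = secrets.token_bytes(16)          # random per-user salt defeats precomputed tables
    digest = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, ITERATIONS)
    return salt, digest                     # store both; never store the password itself

def check_password(password: str, salt: bytes, stored: bytes) -> bool:
    candidate = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, ITERATIONS)
    return hmac.compare_digest(candidate, stored)
```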
Authentication Protocols
Basics
• Set of rules that governs the communication of data related to
authentication between the server and the user
• Techniques used to build a protocol are
– Transformed password
• Password transformed using one way function before transmission
• Prevents eavesdropping but not replay
– Challenge-response
• Server sends a random value (challenge) to the client along with the authentication
request. This must be included in the response
• Protects against replay
– Time Stamp
• The authentication from the client to server must have time-stamp embedded
• Server checks if the time is reasonable
• Protects against replay
• Depends on synchronization of clocks on computers
– One-time password
• A new password is obtained by passing the user's password through a one-way function n times,
where n changes with every login
• Protects against replay as well as eavesdropping
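Two of the techniques above can be sketched with the standard library. The challenge-response part assumes a secret key already shared by client and server and uses HMAC-SHA256 as the response function (an assumed choice). The one-time-password part follows a Lamport-style hash chain: the server stores $H^n(\text{password})$, each login presents the previous link in the chain, and the count goes down by one each time, so a captured value cannot be replayed. All names are illustrative.

```python
import hashlib
import hmac

# --- Challenge-response: the response depends on a fresh server challenge ---
def respond(shared_key: bytes, challenge: bytes) -> bytes:
    return hmac.new(shared_key, challenge, hashlib.sha256).digest()

def server_check(shared_key: bytes, challenge: bytes, response: bytes) -> bool:
    return hmac.compare_digest(respond(shared_key, challenge), response)

# --- One-time passwords from a hash chain (Lamport-style) ---
def hash_n(data: bytes, n: int) -> bytes:
    """Apply the one-way function (SHA-256) n times."""
    for _ in range(n):
        data = hashlib.sha256(data).digest()
    return data

class OneTimePasswordServer:
    def __init__(self, password: bytes, n: int):
        self.stored = hash_n(password, n)          # server keeps only H^n(password)

    def verify(self, otp: bytes) -> bool:
        if hmac.compare_digest(hash_n(otp, 1), self.stored):
            self.stored = otp                      # next login must present one hash fewer
            return True
        return False
```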
Authentication Protocols
Kerberos
• Kerberos is an authentication service that uses symmetric key
encryption and a key distribution center.
• Kerberos Authentication server contains symmetric keys of all
users and also contains information on which user has access
privilege to which services on the network
Authentication
Personal Tokens
• Personal Tokens are hardware devices that generate unique
strings that are usually used in conjunction with passwords for
authentication
• Different types of tokens exist
– Storage Token: A secret value that is stored on a token and is available
after the token has been unlocked using a PIN
– Synchronous one-time password generator: Generate a new password
periodically (e.g. each minute) based on time and a secret code stored
in the token
– Challenge-response: Token computes a number based on a challenge
value sent by the server
– Digital Signature Token: Contains the digital signature private key and
computes a digital signature on a supplied data value
• A variety of different physical forms of tokens exist
– e.g. hand-held devices, Smart Cards, PCMCIA cards, USB tokens
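A synchronous one-time password generator can be sketched with the HOTP/TOTP algorithms standardized in RFC 4226 and RFC 6238, named here as one concrete, widely deployed choice (the slide does not specify an algorithm). The token and the server share `secret` and both derive the same code from the current time. RFC 6238 defaults to 30-second steps; setting `step=60` gives the once-a-minute behaviour mentioned above.

```python
import hashlib
import hmac
import struct
import time

def hotp(secret: bytes, counter: int, digits: int = 6) -> str:
    """HMAC-based one-time password (RFC 4226) for a given counter value."""
    mac = hmac.new(secret, struct.pack(">Q", counter), hashlib.sha1).digest()
    offset = mac[-1] & 0x0F                                   # dynamic truncation
    code = struct.unpack(">I", mac[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % 10 ** digits).zfill(digits)

def totp(secret: bytes, now=None, step: int = 30, digits: int = 6) -> str:
    """Time-based variant (RFC 6238): the counter is the number of `step`-second intervals."""
    if now is None:
        now = time.time()
    return hotp(secret, int(now // step), digits)
```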
Authentication
Biometrics
• Uses certain biological characteristics for
authentication
– Biometric reader measures physiological indicia and
compares them to specified values
– It is not capable of securing information over
the network
• Different techniques exist
– Fingerprint Recognition
– Voice Recognition
– Handwriting Recognition
– Face Recognition
– Retinal Scan
– Hand Geometry Recognition
Authentication
Iris Recognition
The scanning process takes advantage of the natural
patterns in people's irises, digitizing them for
identification purposes

Facts
• Probability of two irises producing exactly the same
code: 1 in 10 to the 78th power
• Independent variables (degrees of freedom)
extracted: 266
• IrisCode record size: 512 bytes
• Operating systems compatibility: DOS and
Windows (NT/95)
• Average identification speed (database of 100,000
IrisCode records): one to two seconds
Authentication
Digital Signatures
• A digital signature is a data item which accompanies or is
logically associated with a digitally encoded message.
• It has two goals
– A guarantee of the source of the data
– Proof that the data has not been tampered with

[Figure: Sender: Message → Digest Algorithm → Message Digest → Signature Algorithm (Sender's Private Key) → Digital Signature; the message and signature are sent to the receiver. Receiver: Signature → Signature Algorithm (Sender's Public Key) → Digest, and Message → Digest Algorithm → Digest; the two digests are compared (Same?).]
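The sender and receiver sides of the signature process can be sketched with textbook RSA over a SHA-256 digest. The key pair is built from two known Mersenne primes purely so the example is self-contained; such a modulus is trivially factorable, and real signatures use random primes plus a padding scheme such as RSA-PSS.

```python
import hashlib

# Toy sender key pair (2^127 - 1 and 2^521 - 1 are known primes; demo only)
P, Q = 2**127 - 1, 2**521 - 1
N, E = P * Q, 65537                       # (N, E) is the sender's public key
D = pow(E, -1, (P - 1) * (Q - 1))         # sender's private exponent

def digest(message: bytes) -> int:
    return int.from_bytes(hashlib.sha256(message).digest(), "big")

def sign(message: bytes) -> int:
    """Sender: apply the private key to the message digest."""
    return pow(digest(message), D, N)

def verify(message: bytes, signature: int) -> bool:
    """Receiver: recover the digest with the public key and compare ("Same?")."""
    return pow(signature, E, N) == digest(message)
```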
Authentication
Digital Certificates
• A digital certificate is a signed statement by a trusted party that
another party’s public key belongs to them.
– This allows one certificate authority to be authorized by a different authority
(root CA)
• The top-level certificate must be self-signed
• Anyone can start a certificate authority
– Name recognition is key to someone recognizing a certificate authority
– VeriSign is the industry-standard certificate authority

[Figure: Identity Information + Sender's Public Key → Signature Algorithm (Certificate Authority's Private Key) → Certificate]
Authentication
Certificate Chaining
• Chaining is the practice of signing a certificate with another private key
that has a certificate for its public key
– Similar to the passport having the seal of the government
• It is essentially a person’s public key & some identifying information signed
by an authority’s private key verifying the person’s identity
• The authority's public key can be used to verify the signature on the certificate
• The trusted party is called the certificate authority

[Figure: Certificate → Signature Algorithm (Certificate Authority's Private Key) → New Certificate]
Cryptanalysis
Basics
• Practice of analyzing and breaking cryptography
• Resistance to brute-force cryptanalysis grows with the key
size
– With each extra bit, the number of possible keys doubles
• Cracking Pseudo Random Number Generators
– Many encryption algorithms use PRNGs to generate
keys; if the PRNG can be cracked, so can the keys
and the algorithms that rely on them
• Variety of methods for safeguarding keys (Key Management)
– Encryption & computer access protection
– Smart Cards
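Why a weak PRNG undermines a long key can be sketched as follows. The key generator here is a hypothetical flawed design: it draws a 128-bit key from Python's non-cryptographic `random` module seeded with a small number (for example a process ID or coarse timestamp, an assumption for illustration). The attacker then searches the small seed space instead of all $2^{128}$ keys. The `secrets` module, which draws from the operating system's cryptographically secure generator, avoids this.

```python
import random
import secrets

def weak_key_from_seed(seed: int) -> int:
    """Hypothetical flawed key generator: 128 bits from a seeded, non-cryptographic PRNG."""
    return random.Random(seed).getrandbits(128)

def recover_seed(key: int, max_seed: int):
    """Attacker: search the small seed space instead of all 2**128 keys."""
    for seed in range(max_seed):
        if weak_key_from_seed(seed) == key:
            return seed
    return None

victim_seed = 4242                          # assumed small seed, e.g. a process ID
stolen_key = weak_key_from_seed(victim_seed)
found = recover_seed(stolen_key, 100_000)   # at most 100,000 guesses

strong_key = secrets.randbits(128)          # OS-backed CSPRNG: no small seed to search
```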
