
Thika Road, Ruaraka
P.O. Box 56808, Nairobi 00200
Pilot Line: +254 20 8070408/9
Mobile: +254 734 88022, 710 888022
Email: [email protected]
Website: www.kca.ac.ke

DIGITAL PHOTOGRAPHY AND VIDEO EDITING

ASSIGNMENT

NAME: MORRIS GITONGA
REG NO: 19/02548
COURSE: APPLIED COMPUTING
UNIT: DIGITAL PHOTOGRAPHY AND VIDEO EDITING
LECTURER: MR MADARA

 Describe how the following compression schemes work:


1) JPEG
JPEG is a digital image format that takes its name from the Joint Photographic Experts Group, the committee that created the standard. It is a popular file type because it combines a high standard of quality with small, easily downloadable file sizes.
The JPEG compression scheme is divided into the following stages:
a. Transform the image into an optimal color space.
b. Down sample chrominance components by averaging groups of pixels
together.
c. Apply a Discrete Cosine Transform (DCT) to blocks of pixels, thus
removing redundant image data.
d. Quantize each block of DCT coefficients using weighting functions
optimized for the human eye.
e. Encode the resulting coefficients with an entropy coder to remove the
remaining redundancies.

A) TRANSFORM THE IMAGE


The JPEG algorithm is capable of encoding images that use any type of color
space. JPEG itself encodes each component in a color model separately, and it is
completely independent of any color-space model, such as RGB, HSI, or CMY.
The best compression ratios result if a luminance/chrominance color space, such as
YUV or YCbCr, is used.
Most of the visual information to which human eyes are most sensitive is found in
the high-frequency, gray-scale, luminance component (Y) of the YCbCr color
space. The other two chrominance components (Cb and Cr) contain high-frequency
color information to which the human eye is less sensitive. Most of this
information can therefore be discarded.
In comparison, the RGB, HSI, and CMY color models spread their useful visual
image information evenly across each of their three color components, making the
selective discarding of information very difficult. All three color components
would need to be encoded at the highest quality, resulting in a poorer compression
ratio.
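
To make this first stage concrete, here is a minimal sketch (in Python with NumPy; the function name is our own) of the RGB-to-YCbCr transform using the BT.601 coefficients adopted by JFIF:

import numpy as np

def rgb_to_ycbcr(rgb):
    # Convert an 8-bit RGB image (H x W x 3) to YCbCr using the
    # BT.601 coefficients adopted by JFIF. Cb and Cr are offset by
    # 128 so that they also fit in the 0..255 range.
    rgb = rgb.astype(np.float64)
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    y  =  0.299    * r + 0.587    * g + 0.114    * b
    cb = -0.168736 * r - 0.331264 * g + 0.5      * b + 128.0
    cr =  0.5      * r - 0.418688 * g - 0.081312 * b + 128.0
    return np.stack([y, cb, cr], axis=-1)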

B) DOWNSAMPLE THE CHROMINANCE COMPONENTS


Each chrominance pixel covers the same area as a 2x2 block of luminance pixels.
We store a total of six pixel values for each 2x2 block (four luminance values,
plus one value for each of the two chrominance channels), rather than the twelve
values needed if each component is represented at full resolution. Remarkably, this 50 percent
reduction in data volume has almost no effect on the perceived quality of most
images. Equivalent savings are not possible with conventional color models such
as RGB, because in RGB each color channel carries some luminance information
and so any loss of resolution is quite visible.
When the uncompressed data is supplied in a conventional format (equal resolution
for all channels), a JPEG compressor must reduce the resolution of the
chrominance channels by down sampling, or averaging together groups of pixels.
The JPEG standard allows several different choices for the sampling ratios, or
relative sizes, of the down sampled channels. The luminance channel is always left
at full resolution (1:1 sampling). Typically, both chrominance channels are down
sampled 2:1 horizontally and either 1:1 or 2:1 vertically, meaning that a
chrominance pixel covers the same area as either a 2x1 or a 2x2 block of
luminance pixels. JPEG refers to these down sampling processes as 2h1v and 2h2v
sampling, respectively.
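
As an illustration, here is a minimal sketch of 2h2v (4:2:0) downsampling by averaging, assuming a NumPy channel whose height and width are even:

import numpy as np

def downsample_2h2v(channel):
    # Average each 2x2 block of a chrominance channel, halving the
    # resolution both horizontally and vertically (JPEG's 2h2v sampling).
    h, w = channel.shape
    return channel.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))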
C) APPLY A DISCRETE COSINE TRANSFORM
The image data is divided up into 8x8 blocks of pixels. (From this point on, each
color component is processed independently, so a "pixel" means a single value,
even in a color image.) A DCT is applied to each 8x8 block. DCT converts the
spatial image representation into a frequency map: the low-order or "DC" term
represents the average value in the block, while successive higher-order ("AC")
terms represent the strength of more and more rapid changes across the width or
height of the block. The highest AC term represents the strength of a cosine wave
alternating from maximum to minimum at adjacent pixels.
The DCT calculation is fairly complex; in fact, this is the costliest step in JPEG
compression. The point of doing it is that we have now separated out the high- and
low-frequency information present in the image. We can discard high-frequency
data easily without losing low-frequency information. The DCT step itself is
lossless except for roundoff errors.
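
A minimal sketch of this step, using SciPy's orthonormal type-II DCT (the 128 level shift and helper name are illustrative):

import numpy as np
from scipy.fft import dctn, idctn

def dct_8x8(block):
    # Map an 8x8 block of level-shifted samples to its frequency map.
    # coeffs[0, 0] (the "DC" term) is proportional to the block average;
    # higher-index "AC" terms measure faster changes across the block.
    return dctn(block, norm='ortho')

block = np.random.randint(0, 256, (8, 8)).astype(np.float64) - 128.0
coeffs = dct_8x8(block)
# The transform itself is lossless apart from floating-point roundoff:
assert np.allclose(idctn(coeffs, norm='ortho'), block)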
D) QUANTIZE EACH BLOCK
To discard an appropriate amount of information, the compressor divides each
DCT output value by a "quantization coefficient" and rounds the result to an
integer. The larger the quantization coefficient, the more data is lost, because the
actual DCT value is represented less and less accurately. Each of the 64 positions
of the DCT output block has its own quantization coefficient, with the higher-order
terms being quantized more heavily than the low-order terms (that is, the higher-
order terms have larger quantization coefficients). Furthermore, separate
quantization tables are employed for luminance and chrominance data, with the
chrominance data being quantized more heavily than the luminance data. This
allows JPEG to exploit further the eye's differing sensitivity to luminance and
chrominance.
The compressor starts from a built-in table that is appropriate for a medium-quality
setting and increases or decreases the value of each table entry in inverse
proportion to the requested quality. The complete quantization tables actually used
are recorded in the compressed file so that the decompressor will know how to
(approximately) reconstruct the DCT coefficients.
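
A sketch of the quantization step, using the example luminance table from Annex K of the JPEG standard (a typical medium-quality starting point):

import numpy as np

# Example luminance quantization table from Annex K of the JPEG standard.
# Note the larger coefficients toward the bottom-right (high frequencies).
LUMA_QTABLE = np.array([
    [16, 11, 10, 16,  24,  40,  51,  61],
    [12, 12, 14, 19,  26,  58,  60,  55],
    [14, 13, 16, 24,  40,  57,  69,  56],
    [14, 17, 22, 29,  51,  87,  80,  62],
    [18, 22, 37, 56,  68, 109, 103,  77],
    [24, 35, 55, 64,  81, 104, 113,  92],
    [49, 64, 78, 87, 103, 121, 120, 101],
    [72, 92, 95, 98, 112, 100, 103,  99],
])

def quantize(coeffs, qtable=LUMA_QTABLE):
    # Divide each DCT coefficient by its quantization step and round;
    # the rounding error here is the information that is lost.
    return np.rint(coeffs / qtable).astype(int)

def dequantize(q, qtable=LUMA_QTABLE):
    # The decompressor can only reconstruct the coefficients approximately.
    return q * qtable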
E) ENCODE THE RESULTING COEFFICIENTS
The resulting coefficients contain a significant amount of redundant data. Huffman
compression will losslessly remove the redundancies, resulting in smaller JPEG
data. An optional extension to the JPEG specification allows arithmetic encoding
to be used instead of Huffman for an even greater compression. At this point, the
JPEG data stream is ready to be transmitted across a communications channel or
encapsulated inside an image file format.
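
The sketch below shows the core idea of Huffman coding: building a code from symbol frequencies so that frequent symbols get shorter bit strings. (A real JPEG encoder codes run/size categories of zigzag-ordered coefficients rather than raw values, so this is only an illustration of the principle.)

import heapq
from collections import Counter
from itertools import count

def huffman_code(symbols):
    # Build a Huffman code (symbol -> bit string) from a symbol sequence.
    freq = Counter(symbols)
    tiebreak = count()  # keeps the heap from comparing unorderable trees
    heap = [(f, next(tiebreak), sym) for sym, f in freq.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        f1, _, left = heapq.heappop(heap)   # merge the two rarest subtrees
        f2, _, right = heapq.heappop(heap)
        heapq.heappush(heap, (f1 + f2, next(tiebreak), (left, right)))
    code = {}
    def walk(node, prefix):
        if isinstance(node, tuple):         # internal node: recurse
            walk(node[0], prefix + '0')
            walk(node[1], prefix + '1')
        else:                               # leaf: an actual symbol
            code[node] = prefix or '0'
    walk(heap[0][2], '')
    return code

# Frequent symbols (here 0, typical of quantized AC terms) get short codes:
print(huffman_code([0, 0, 0, 0, 0, 0, 3, 3, -2, 5]))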
2. MPEG
The name MPEG is an acronym for Moving Pictures Experts Group. MPEG is a
method for video compression, which involves the compression of digital images
and sound, as well as the synchronization of the two. MPEG describes a whole family
of international standards for the compression of audio-visual digital data. The
best known are MPEG-1, MPEG-2 and MPEG-4, which are also formally known
as ISO/IEC-11172, ISO/IEC-13818 and ISO/IEC-14496. The following sections
detail some of the more practical aspects of MPEG video compression:
a) Reduction of the resolution
The human eye has a lower sensitivity to color information than to dark-bright
contrasts. A conversion from the RGB color space into YUV color components helps
to exploit this effect for compression. The chrominance components U and V can be
reduced (subsampled) to half of the pixels in the horizontal direction (4:2:2), or to half
of the pixels in both the horizontal and vertical directions (4:2:0).
b) Motion Estimation
An MPEG video can be understood as a sequence of frames. Because two
successive frames of a video sequence often have small differences (except in
scene changes), the MPEG-standard offers a way of reducing this temporal
redundancy. It uses three types of frames: I-frames (intra), P-frames (predicted)
and B-frames (bidirectional). The I-frames are “key-frames”, which have no
reference to other frames and their compression is not that high. The P-frames can
be predicted from an earlier I-frame or P-frame. P-frames cannot be reconstructed
without their referencing frame, but they need less space than the I-frames, because
only the differences are stored. The B-frames are a bidirectional version of the
P-frame, referring to both directions (one forward frame and one backward frame).
B-frames cannot be referenced by other P- or B-frames, because they are
interpolated from forward and backward frames. P-frames and B-frames are called
inter coded frames, whereas I-frames are known as intra coded frames.
The usage of the particular frame types defines the quality and the compression
ratio of the compressed video. I-frames increase the quality (and size), whereas the
usage of B-frames compresses better but also produces poorer quality. The
distance between two I-frames can be seen as a measure of the quality of an
MPEG video. In practice, the following sequence has been shown to give good results for
quality and compression level: IBBPBBPBBPBBIBBP. The references between
the different types of frames are realized by a process called motion estimation or
motion compensation. The correlation between two frames in terms of motion is
represented by a motion vector. The resulting frame correlation, and therefore the
pixel arithmetic difference, depends strongly on how well the motion estimation
algorithm is implemented. Good estimation results in higher compression ratios
and better quality of the coded video sequence. However, motion estimation is a
computationally intensive operation, which is often not well suited for real-time
applications.
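
A minimal sketch of exhaustive block matching with the sum of absolute differences (SAD) criterion; real encoders use faster heuristic searches, and the search window size here is arbitrary:

import numpy as np

def best_match(prev, block, top, left, search=8):
    # Find the motion vector (dy, dx) within +/- search pixels that best
    # matches `block` (at (top, left) in the current frame) against the
    # previous frame, minimising the sum of absolute differences (SAD).
    n = block.shape[0]
    block = block.astype(np.int64)          # avoid uint8 wrap-around
    best, best_sad = (0, 0), np.inf
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            y, x = top + dy, left + dx
            if y < 0 or x < 0 or y + n > prev.shape[0] or x + n > prev.shape[1]:
                continue                    # candidate leaves the frame
            sad = np.abs(prev[y:y + n, x:x + n].astype(np.int64) - block).sum()
            if sad < best_sad:
                best_sad, best = sad, (dy, dx)
    return best, best_sad                   # vector plus residual energy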
Steps involved in motion estimation include:
i) Frame Segmentation - The actual frame is divided into non-overlapping blocks
(macroblocks), usually 8x8 or 16x16 pixels.
ii) Search Threshold - In order to minimise the number of expensive motion
estimation calculations, they are only performed if the difference between two
blocks at the same position is higher than a threshold; otherwise the whole block is
transmitted.
iii) Block Matching - In general, block matching tries to "stitch together" a
predicted frame by using snippets (blocks) from previous frames.
iv) Prediction Error Coding - After prediction, the predicted and the original frame
are compared, and their differences are coded; the MPEG stream contains a matrix
for compensating this error. Obviously, less data is needed to store only the
differences.
v) Vector Coding - After determining the motion vectors and evaluating the
correction, these can be compressed. Large parts of MPEG videos consist of B-
and P-frames, as seen before, and most of them mainly store motion vectors.
Therefore, an efficient compression of motion vector data, which usually has high
correlation, is desired.
vi) Block Coding
a) Discrete Cosine Transform
DCT allows, similar to the Fast Fourier Transform (FFT), a representation of
image data in terms of frequency components. So, the frame-blocks (8x8 or 16x16
pixels) can be represented as frequency components.
The DCT is unfortunately computationally very expensive, and its complexity
grows quadratically with the transform size (O(N²)). That is the reason why images
compressed using DCT are divided into blocks. Another disadvantage of DCT is its
inability to decompose a broad signal into high and low frequencies at the same time.
Therefore, the use of small blocks allows a description of high frequencies with
fewer cosine terms.
b) Quantization
During quantization, which is the primary source of data loss, the DCT terms are
divided by a quantization matrix which takes into account human visual
perception. The human eye is more sensitive to low frequencies than to high
ones. Many of the higher frequencies end up with a zero entry after quantization,
which reduces the amount of data significantly.
If the compression is too high, which means there are more zeros after
quantization, artefacts are visible. This happens because the blocks are compressed
individually with no correlation to each other. When dealing with video, this effect
is even more visible, as the blocks are changing (over time) individually in the
worst case.
c) Entropy Coding
The entropy coding takes two steps: Run Length Encoding (RLE) and Huffman
coding. These are well-known lossless compression methods, which can compress
data, depending on its redundancy, by an additional factor of 3 to 4.
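
A minimal sketch of the RLE half of this step; it collapses the long zero runs that quantization produces into (value, run length) pairs:

def run_length_encode(values):
    # Collapse runs of repeated values into (value, run_length) pairs.
    runs = []
    for v in values:
        if runs and runs[-1][0] == v:
            runs[-1][1] += 1                # extend the current run
        else:
            runs.append([v, 1])             # start a new run
    return [tuple(r) for r in runs]

# Quantized blocks are mostly zeros, which is where RLE pays off:
assert run_length_encode([5, 0, 0, 0, 0, 3, 0, 0]) == \
    [(5, 1), (0, 4), (3, 1), (0, 2)]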

3. ARITHMETIC CODING
Arithmetic coding is a data compression technique that encodes data (the data
string) by creating a code string which represents a fractional value on the number
line between 0 and 1. The coding algorithm is symbolwise recursive; i.e., it
operates upon and encodes (decodes) one data symbol per iteration or recursion.
On each recursion, the algorithm successively partitions an interval of the number
line between 0 and 1, and retains one of the partitions as the new interval. Thus, the
algorithm successively deals with smaller intervals, and the code string, viewed as
a magnitude, lies in each of the nested intervals. The data string is recovered by
using magnitude comparisons on the code string to recreate how the encoder must
have successively partitioned and retained each nested subinterval. Arithmetic
coding differs considerably from the more familiar compression coding techniques,
such as prefix (Huffman) codes. Also, it should not be confused with error control
coding, whose object is to detect and correct errors in computer operations.
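
The nested-interval idea can be shown with a toy encoder over a fixed two-symbol model; exact fractions stand in for the fixed-precision integer arithmetic a practical coder would use, and the symbol probabilities are arbitrary:

from fractions import Fraction

# Fixed memoryless model: each symbol owns a sub-interval of [0, 1).
INTERVALS = {'a': (Fraction(0), Fraction(3, 4)),
             'b': (Fraction(3, 4), Fraction(1))}

def encode(data):
    # Successively partition the interval, keeping each symbol's share.
    low, width = Fraction(0), Fraction(1)
    for sym in data:
        s_low, s_high = INTERVALS[sym]
        low, width = low + width * s_low, width * (s_high - s_low)
    return low, low + width    # any value in [low, high) encodes `data`

def decode(code, n):
    # Retrace which partition contains the code value, n times.
    out = []
    for _ in range(n):
        for sym, (s_low, s_high) in INTERVALS.items():
            if s_low <= code < s_high:
                out.append(sym)
                code = (code - s_low) / (s_high - s_low)   # rescale
                break
    return ''.join(out)

low, high = encode('aaba')
assert decode(low, 4) == 'aaba'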
The notion of compression systems captures the idea that data may be transformed
into something which is encoded, then transmitted to a destination, then
transformed back into the original data. Any data compression approach, whether
employing arithmetic coding, Huffman codes, or any other coding technique, has a
model which makes some assumptions about the data and the events encoded. The
code itself can be independent of the model. Some systems which compress
waveforms (e.g., digitized speech) may predict the next value and encode the error.
In this model, the error rather than the actual data is encoded. Typically, at the encoder
side of a compression system, the data to be compressed feeds a model unit. The
model determines:
1) the event(s) to be encoded, and
2) the estimate of the relative frequency (probability) of the events.
The encoder accepts the event and some indication of its relative frequency and
generates the code string. A simple model is the memoryless model, where the data
symbols themselves are encoded according to a single code. Another model is the
first-order Markov model, which uses the previous symbol as the context for the
current symbol. Consider, for example, compressing English sentences. If the data
symbol (in this case, a letter) “q” is the previous letter, we would expect the next
letter to be “u.” The first-order Markov model is a dependent model; we have a
different expectation for each symbol (or in the example, each letter), depending
on the context. The context is, in a sense, a state governed by the past sequence of
symbols. The purpose of a context is to provide a probability distribution, or
statistics, for encoding (decoding) the next symbol. Corresponding to the symbols
are statistics. To simplify the discussion, consider a single-context model, i.e., the
memoryless model. Data compression results from encoding the more frequent
symbols with short code-string length increases, and encoding the less frequent
symbols with long code-string length increases. Let c_i denote the number of
occurrences of the i-th symbol in a data string. For the memoryless model and a
given code, let l_i denote the length (in bits) of the code-string increase associated
with the i-th symbol.
The components of a compression system include:

I. The model structure for contexts and events
In practice, the model is a finite-state machine which operates successively on each
data symbol and determines the current event to be encoded and its context (i.e.,
which relative frequency distribution applies to the current event). Often, each
event is the data symbol itself, but the structure can define other events from which
the data string could be reconstructed.

II. The statistics unit for estimation of the event statistics
The estimation method computes the relative frequency distribution used for each
context. The computation may be performed beforehand, or may be performed
during the encoding process, typically by a counting technique. For Huffman
codes, the event statistics are predetermined by the length of the event's codeword.

III. The encoder
The encoder accepts the events to be encoded and generates the code string.
Merits of arithmetic coding include:
1) Arithmetic coding is very helpful for small alphabets with highly skewed
probabilities.
2) Arithmetic coding offers a clearly better compression ratio than the Huffman
method, since it is not restricted to a whole number of bits per symbol.
3) Arithmetic coding is the most efficient method to code symbols according to
the probability of their occurrence; the average code length corresponds almost
exactly to the minimum given by information theory.

Arithmetic encoding has several drawbacks:

 It is more susceptible to corruption: because the code string is a single
long fraction, a bit error can corrupt all subsequent symbols.
 A larger context length implies more conditional probabilities to update,
which may mean that the estimated distribution will take more time to
approach the true distribution.
