Video Transmission Jerkiness Measure: Deena Abdelsamad
Abstract
Acknowledgments
Completing this work gives a pleasant feeling of happiness and success. The accomplishment of this thesis gives me the confidence to move ahead and has taught me new ways of gaining knowledge and learning. I would like to convey my gratitude, praise and thanks to ALLAH (swt), the most gracious and the most merciful, who blessed me with his grace and helped me to write this thesis successfully.
Thanks to Anders Hultgren, program manager, and to Raja M. Khurram Shahzad, my supervisor, who provided me with valuable comments and suggestions for improvement. Thanks are also due to all the BTH staff members, who are always friendly and supportive. Last but not least, my sincere gratitude goes to my parents and my husband for their continuous mental, emotional and financial support. I feel lucky and proud to be part of my family.
Contents
1 Introduction
1.1 Scope
1.2 Outline
List of Figures
List of Tables
List of abbreviations
1. HD - High-definition
2. SD - Standard-definition
16. 3D - Three-dimensional
19. SECAM - Séquentiel couleur à mémoire (French for Sequential Color with Memory)
Chapter 1
Introduction
$$ J(v) = \frac{1}{T} \sum_{i} \Delta t_i \, \tau(\Delta t_i) \, \mu\big(m_{i+1}(v)\big) \qquad (1.0.1) $$
1.1 Scope
1.2 Outline
Chapter 2 presents the basic concepts and background of image and video technology and its history. Chapter 3 describes the design and implementation of the program and explains the MATLAB commands used and tested. Finally, Chapter 4 presents and discusses the results, concludes the work and outlines directions for future work.
Chapter 2
2.1 Background
Vidēre is a Latin verb meaning "to see", and the word video literally means "I see". In practice, video refers to storing moving pictures in different digital formats, such as DVD, Moving Picture Experts Group (MPEG) and Audio Video Interleave (AVI), or in analog formats such as Video Home System (VHS), and to transmitting them with different techniques, such as Phase Alternating Line (PAL) or National Television System Committee (NTSC) [1]. The quality of a transmitted video depends on different parameters, such as how the moving pictures were captured and how they were stored. Digital television (DTV), a more modern format, has become the standard because it can offer higher quality than the earlier formats [6].
The frame rate is the number of pictures per unit of time in the video; it differs according to the type of camera used to capture the pictures. The normal rate of old cameras can vary from six to eight frames per second, while modern cameras reach up to 120 frames per second. The frame rate also affects the transfer of cinematic motion pictures to video: a movie recorded at a low frame rate, such as 24 frames per second, makes the transfer more complicated. Moreover, to create the illusion of continuous motion, so that viewers do not feel they are watching a series of still pictures, a frame rate of at least 15 frames per second is suggested [7].
2.1.3 Interlacing
Video has two scan formats, progressive and interlaced. Analog television systems such as sequential color with memory (SECAM), NTSC and PAL (576i50) use interlaced scanning. Interlacing doubles the perceived frame rate of an analog television signal without consuming extra bandwidth [12]. Interlacing is indicated by the letter "i" in the format name: in 576i50, 576 is the vertical resolution in lines and 50 is the number of fields (half-frames) per second. Interlacing is mainly used to obtain the best possible video quality when bandwidth is limited. In each interlaced frame, the horizontal scan lines are numbered consecutively and divided into two fields, the upper field and the lower field. The upper field, also called the odd field, contains the odd-numbered lines; the lower field, also called the even field, contains the even-numbered lines. An interlaced stream, such as a DVD or an analog broadcast, can be converted by a method called deinterlacing for use with progressive devices such as liquid crystal display (LCD) and plasma screens. However, deinterlacing cannot give the same video quality as progressive scanning. Progressive scanning works differently: it updates all scan lines in every refresh period, which increases the resolution and reduces artifacts such as the apparent movement or flickering of static picture content [12].
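The field split described above can be illustrated with a minimal MATLAB sketch on a hypothetical 6-line test frame (all variable names are illustrative). It shows that weaving the two fields back together restores the frame exactly, which only holds when both fields were captured at the same instant, and it shows line doubling ("bob") as one simple deinterlacing approach; this is not the method of any particular device.

```matlab
% Hypothetical 6-line x 4-pixel test frame: every pixel of scan line k holds k
frame = repmat((1:6)', 1, 4);

% Split into the two fields (scan line 1 is MATLAB row 1)
upperField = frame(1:2:end, :);   % odd-numbered lines 1, 3, 5
lowerField = frame(2:2:end, :);   % even-numbered lines 2, 4, 6

% Weaving: interleave the two fields back into one progressive frame
woven = zeros(size(frame));
woven(1:2:end, :) = upperField;
woven(2:2:end, :) = lowerField;

% Line doubling ("bob"): repeat every line of one field to reach full height;
% vertical detail of the other field is lost
bobUpper = kron(upperField, ones(2, 1));
```

Weaving is lossless for static content, while line doubling avoids combing artifacts on motion at the cost of halving the vertical resolution.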
The RGB model is an additive color model in which R stands for red, G for green and B for blue; light of these three primaries is added in different proportions to reproduce a wide range of colors. The main purpose of the RGB model is to sense, represent and display images in electronic systems, for example computer screens and televisions [10]. Typical RGB input devices include video and television cameras and image scanners. Typical RGB output devices include mobile phones, computer screens, projectors and televisions based on different technologies such as cathode ray tube (CRT), LCD and plasma [6].
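Because the jerkiness measure later works on the Y (luma) component only, it may help to see how a luma value is derived from 8-bit RGB. The sketch below applies the ITU-R BT.601 studio-range equation (Y from 16 for black to 235 for white) by hand on three hypothetical pixels. To our understanding these are the coefficients documented for MATLAB's rgb2ycbcr with uint8 input, but the example does not call the toolbox function.

```matlab
% Hypothetical 1x3 RGB image (uint8): a black, a white and a pure red pixel
rgb = uint8(cat(3, [0 255 255], [0 255 0], [0 255 0]));

% Normalise each channel to [0, 1]
R = double(rgb(:,:,1)) / 255;
G = double(rgb(:,:,2)) / 255;
B = double(rgb(:,:,3)) / 255;

% ITU-R BT.601 studio-range luma (the coefficients sum to 219 = 235 - 16)
Y = 16 + 65.481*R + 128.553*G + 24.966*B;   % [16 235 81.481]
```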
2.1.5 YUV
In recent years, research has been carried out on no-reference video quality measures. One study presented a new metric to evaluate and detect the effect of frame dropping on the quality perceived by users [8]. That measure was based on a psycho-visual quality function and a temporal summation function (temporal pooling) that models the assessment mechanism of human assessors [8]. The assessment model also takes into account, as a second factor for quality estimation, the abrupt temporal variation that appears at the end of fluidity impairments [8]. Other studies addressed the same type of measure, but based it on freezes, jerky motion and rate variations [9]. Such a measure should correlate significantly with the observers' ratings, as it attempts to reproduce some of the basic perceptual processes of the human visual system involved in video quality assessment [9].
Chapter 3
3.1 Requirements
3.2 Methodology
$$ J(v) = \frac{1}{T} \sum_{i} \Delta t_i \, \tau(\Delta t_i) \, \mu\big(m_{i+1}(v)\big) \qquad (3.2.1) $$
A video sequence, formed by images displayed at given times, can be denoted by v = (f_i, t_i), i = 1, ..., n [3], where f_i is frame i of the video, t_i is the time at which frame i starts to be displayed (it stays on screen until t_{i+1}), and n is the total number of frames. The display duration of frame i is then Δt_i = t_{i+1} − t_i. Computing motion from the statistical velocity distribution of all objects moving in the video sequence is very complicated, so a simpler measure of motion intensity is used for a video sequence v = (f_i, t_i) [3]:

$$ m_{i+1}(v) = \sqrt{\sum_{x} \big(f_{i+1}(x) - f_i(x)\big)^2} \qquad (3.2.2) $$

where f_i(x) is the value of the Y component of frame i at pixel location x.
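A minimal MATLAB sketch of Eq. (3.2.2) on two hypothetical 2-by-2 luma frames; the function name motionIntensity is illustrative. Note that the program in Section 3.3 additionally divides this value by the frame width.

```matlab
% Motion intensity of Eq. (3.2.2): square root of the sum of squared
% pixel differences between two consecutive luma frames
motionIntensity = @(fPrev, fNext) sqrt(sum((double(fNext(:)) - double(fPrev(:))).^2));

% Hypothetical 2x2 luma frames
f1 = zeros(2, 2);
f2 = [3 0; 0 4];
m = motionIntensity(f1, f2);   % sqrt(3^2 + 4^2) = 5
```

Identical frames give zero motion intensity, which is the property the freezing experiments rely on.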
The jerkiness calculation in Borer's model [3] depends on the number of frames, the frame display time Δt_i and the frame motion intensity. The dependencies on Δt_i and on the motion intensity are expressed by two S-shaped (sigmoid) functions, τ and μ. Each function has three parameters: the x-position p_x and the y-position p_y of the inflection point, and the slope q at that point. For the motion-dependent part μ, these three parameters are kept fixed. For the display-time-dependent part τ, they are re-parameterized by a single parameter. The S-shaped function is defined with a = p_y / p_x^b, b = q p_x / p_y, c = 4q/d and d = 2(1 − p_y):

$$ s(x) = \begin{cases} a\,x^{b} & \text{if } x \le p_x \\ \dfrac{d}{1 + \exp\big(-c\,(x - p_x)\big)} + 1 - d & \text{otherwise} \end{cases} \qquad (3.2.3) $$
Because the algorithms are tested on QCIF-resolution videos, the parameters are (p_x, p_y, q) = (5, 0.5, 0.25) for μ and (p_x, p_y, q) = (0.12, 0.05, 1.5) for τ. The S-shaped function starts at the origin and increases polynomially until it reaches the inflection point; it then saturates exponentially towards one. Viewing angle and viewing distance are known to affect the results, so the tests are performed with the typical viewing angle and distance [5]: the viewing distance is three times the picture height (3H), and viewers are seated directly in line with the center of the display. The resolution is set to QCIF, and for larger resolutions the motion intensity is measured on sub-sampled frames.
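To make Eq. (3.2.3) concrete, the following MATLAB sketch writes s(x) as a vectorised anonymous function and instantiates it with the QCIF parameters given above. The name sshape is hypothetical (it is not the Sshape.m file of the program). The sketch computes a as p_y/p_x^b, so both branches meet at the inflection point (p_x, p_y) with slope q.

```matlab
% S-shaped function of Eq. (3.2.3) with b = q*px/py, d = 2(1-py),
% c = 4q/d = 2q/(1-py) and a = py/px^b
sshape = @(px, py, q, x) ...
    (x <= px) .* (py / px^(q*px/py)) .* x.^(q*px/py) + ...
    (x >  px) .* (2*(1-py) ./ (1 + exp(-(2*q/(1-py)) .* (x - px))) + 1 - 2*(1-py));

% QCIF parameters from the text
mu  = @(m)  sshape(5,    0.5,  0.25, m);    % motion-dependent part
tau = @(dt) sshape(0.12, 0.05, 1.5,  dt);   % display-time-dependent part
```

For example, mu(5) evaluates to 0.5 and tau(0.12) to 0.05 (up to rounding), i.e. each function passes through its inflection point, and mu approaches one for motion intensities well above 5.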
3.3 Implementation
The program for calculating video jerkiness consists of two parts. The main program, ReadVid.m, contains all the code required to read the video, slice it into frames, and compute the motion intensity and the other values needed for the jerkiness. The second part, Sshape.m, calculates the S-shaped function values required to compute τ and μ.
In Figure 3.3.1, the clear all command removes all variables from the MATLAB workspace, and close all closes any open figure windows. The code then performs the following:
Read the specified video
Slice the video into frames
Convert the RGB matrices of each frame to YCbCr matrices
clear all
close all
infile = 'clip6.avi';
readerobj = VideoReader(infile);
vidFrames = read(readerobj);        % height x width x 3 x numFrames array
numFrames = size(vidFrames,4);
for k = 1 : numFrames
    rgb = vidFrames(:,:,:,k);
    % convert the RGB data of frame k to YCbCr
    mov(k).cdata = rgb2ycbcr(rgb);
end
info = mmfileinfo(infile);
height = info.Video.Height;
width = info.Video.Width;
qcifflag = 0;
cifflag = 0;
hdflag = 0;
sdflag = 0;
c = 0;
% QCIF: 176 x 144
% CIF/SIF(625): 352 x 288
% HD: width >= 1080
if ((width==176)&&(height==144))
    qcifflag = 1;
    c = 1;
end
if ((width==352)&&(height==288))
    cifflag = 1;
    c = 1.18;
end
In Figure 3.3.2, the MATLAB function mmfileinfo is used to get the frame width and height. Each resolution (QCIF, CIF, HD, SD) has its own constants in the later calculations. Four flags (qcifflag, cifflag, hdflag, sdflag) are initialized to zero, and the appropriate flag is then set according to the width and height. For example, if width = 176 and height = 144, the video has QCIF resolution, so qcifflag is set to 1 while the other flags remain zero.
if ((height==480)||(height==576))
    sdflag = 1;
    c = 1.54;
end
if (width>=1080)
    hdflag = 1;
    c = 2.54;
end
if ((qcifflag==0)&&(cifflag==0)&&(sdflag==0)&&(hdflag==0))
    error('The video resolution is not recognized; execution terminated.');
end
Figure 3.3.3 continues to set the appropriate resolution flag. The last part of the code stops the execution of the program if the frame resolution is not recognized.
In Figure 3.3.4, the parameters of τ and μ for QCIF videos are used, because the code is written to measure the jerkiness of QCIF videos. Note that c = 1 for QCIF resolution.
% calculating the motion intensity
% for avi, Dt is uniform
dt = info.Duration/numFrames;
for k = 1:numFrames-1
    % extract the Y, Cb and Cr planes of frames F(k) and F(k+1)
    Fiy    = double(mov(k).cdata(:,:,1));
    Fip1y  = double(mov(k+1).cdata(:,:,1));
    Ficb   = double(mov(k).cdata(:,:,2));
    Fip1cb = double(mov(k+1).cdata(:,:,2));
    Ficr   = double(mov(k).cdata(:,:,3));
    Fip1cr = double(mov(k+1).cdata(:,:,3));
    % sum of squared pixel differences between consecutive frames
    dumy  = sum((Fip1y(:)  - Fiy(:)).^2);
    dumcb = sum((Fip1cb(:) - Ficb(:)).^2);
    dumcr = sum((Fip1cr(:) - Ficr(:)).^2);
    % square root of the sum, normalised by the frame width
    my(k)  = sqrt(dumy)/width;
    mcb(k) = sqrt(dumcb)/width;
    mcr(k) = sqrt(dumcr)/width;
end
figure(1)
plot(my,'r');
hold on
%plot(mcb,'g');
%hold on
%plot(mcr,'b');
title('Motion intensity');
ylabel('Motion intensity');
xlabel('Frame no')
In Figure 3.3.5, the code computes the motion intensity and plots it.
Figure 3.3.6 shows the part that calculates τ and μ using the Sshape function, which is presented next. The code uses the derived τ and μ to compute the jerkiness, while MATLAB figures 2 and 3 plot these two functions for each frame of the video.
The Sshape function in Figure 3.3.7 computes τ or μ for a frame of the video sequence. It takes the parameters of τ or μ as input, together with Δt when τ is required or the motion intensity when μ is required. The returned value (τ or μ) is the out variable in Figure 3.3.7.
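function out = Sshape(invec,x)
% S-shaped function of Eq. (3.2.3); invec = [px, py, q]
px = invec(1);
py = invec(2);
q = invec(3);
b = q*px/py;
d = 2*(1-py);
c = 4*q/d;
a = py/px^b;     % gives s(px) = py, so both branches meet at the inflection point
if x <= px
    out = a*x^b;
else
    out = d/(1+exp(-c*(x-px))) + 1 - d;
end
end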
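As a hypothetical sketch of how τ, μ and Eq. (3.2.1) fit together (not the code of Figure 3.3.6), the jerkiness can be written as a weighted sum over frames. It assumes that T is the total duration of the video, taken here as n·Δt, which is consistent with dt = info.Duration/numFrames in Figure 3.3.5; the clip length, frame rate and motion values are invented for illustration.

```matlab
% S-shaped function of Eq. (3.2.3) with a = py/px^b, b = q*px/py,
% c = 4q/d = 2q/(1-py), d = 2(1-py)
sshape = @(px, py, q, x) ...
    (x <= px) .* (py / px^(q*px/py)) .* x.^(q*px/py) + ...
    (x >  px) .* (2*(1-py) ./ (1 + exp(-(2*q/(1-py)) .* (x - px))) + 1 - 2*(1-py));

% QCIF parameters of the motion part mu and the display-time part tau
mu  = @(m)  sshape(5,    0.5,  0.25, m);
tau = @(dt) sshape(0.12, 0.05, 1.5,  dt);

% Eq. (3.2.1): dt(i) is the display time of frame i, m(i) the motion
% intensity m_{i+1} between frames i and i+1, T the total duration
jerkiness = @(dt, m, T) sum(dt .* tau(dt) .* mu(m)) / T;

% Hypothetical 300-frame clip at 25 frames per second (uniform dt, as for AVI)
n  = 300;
dt = repmat(1/25, 1, n-1);
T  = n * (1/25);
Jstill  = jerkiness(dt, zeros(1, n-1), T);       % frozen clip: m = 0 everywhere
Jmoving = jerkiness(dt, repmat(20, 1, n-1), T);  % strong motion in every frame
```

Jstill is exactly zero because μ(0) = 0, matching the all-frozen test of Section 3.4, while Jmoving is positive and is dominated by τ(Δt), since μ is practically one for a motion intensity of 20.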
In Figure 3.3.8, the parameters of the freezing code are set and the YUV video is converted to a MATLAB movie (MOV) structure [11]. This part of the code performs the following:
Clear all variables from the workspace
Close all open figure windows
Read the file clip9
Set the freeze interval to start at 2/3 of the clip
and to end at 3/3 (the last frame)
Set the YUV chroma sampling format to 4:2:0 (variable samplerate = 420)
Convert the YUV file to MOV
Set the number of frames to the number of elements of the movie structure
clear all
close all
infile = 'clip9';
clip = [2/3, 3/3];          % freeze interval as fractions of the clip
samplerate = 420;           % YUV chroma sampling format 4:2:0, passed as a string
width = 176;
height = 144;
mov = yuv2mov([infile,'.yuv'], width, height, num2str(samplerate));
numFrames = size(mov,2);
In Figure 3.3.9, a freeze (repeated frames) is added to the video so that the results for the same video with and without the freeze can be compared. This part of the code performs the following:
Compute the first frozen frame, istart, at 2/3 of the clip, counting from frame 1
Compute the last frozen frame, iend, at 3/3 of the clip
Round both indices to integers
For each frame k from istart to iend, overwrite its data with the data of frame istart
End the loop
istart = round(1+clip(1)*numFrames);
iend = round(clip(2)*numFrames);
for k = istart:iend
    mov(k).cdata = mov(istart).cdata;
end
In Figure 3.3.10, the mov2yuv function is used to convert the MOV structure back to a YUV file [11]. The code appends -freez to the name of the freeze-free file and passes the same sampling format as a string, using the MATLAB function num2str.
mov2yuv([infile,'-freez','.yuv'], mov, num2str(samplerate));
3.4 Testing
To test the code, a freezing algorithm is used to freeze certain parts of the video. A video with frozen frames is expected to give a lower jerkiness value than the same video without freezing. A simple test freezes the entire video: it contains one frame repeated 300 times, so that all frames have identical numerical data. This repetition gives zero motion intensity for all frames and therefore zero jerkiness. The implemented program gives zero jerkiness, as expected.
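The freeze check can be reproduced on synthetic data with a minimal MATLAB sketch. It assumes a hypothetical 30-frame QCIF luma clip in which frame k is filled with the value k, freezes the last third with the same index arithmetic as Figure 3.3.9 (operating on a plain 3-D array instead of a movie structure), and then evaluates Eq. (3.2.2) between consecutive frames.

```matlab
% Hypothetical synthetic QCIF luma clip: 30 frames, frame k filled with value k
numFrames = 30;
frames = zeros(144, 176, numFrames);
for k = 1:numFrames
    frames(:,:,k) = k;
end

% Freeze the last third, using the index arithmetic of Figure 3.3.9
clip   = [2/3, 3/3];
istart = round(1 + clip(1)*numFrames);   % 21
iend   = round(clip(2)*numFrames);       % 30
frames(:,:,istart:iend) = repmat(frames(:,:,istart), [1, 1, iend-istart+1]);

% Motion intensity of Eq. (3.2.2) between consecutive frames k and k+1
diffs = diff(frames, 1, 3);
m = reshape(sqrt(sum(sum(diffs.^2, 1), 2)), 1, []);
```

All motion intensities inside the frozen interval are zero, so those frames contribute nothing to the jerkiness sum, while the frames before the freeze keep their non-zero motion.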
Chapter 4
Future work
4.1 Results
Twelve short videos are used to test the program that computes the jerkiness. The first six videos are in AVI format at QCIF resolution, while the remaining videos are in YUV format. Note that in the AVI format Δt is constant, i.e., the display interval of each frame in the sequence does not change, which gives a constant value of τ(Δt) throughout the video; this is shown in panel (c) of all figures. Figures 4.1.1 to 4.1.12 show plots of the motion intensity, μ(m) and τ(Δt). Table 4.1 summarizes the details of the test cases together with the resulting jerkiness value of each case. The first six videos (cases 1 to 6) are tested without added impairments, and three videos (cases 7 to 9) are tested both before and after adding a freeze.

Figure 4.1.1 shows the results for clip1.avi. This clip has high motion intensity for frames 150 and above, which gives μ(m) close to 1 for those frames, as shown in panel (b). Because the clip is in AVI format, Δt is uniform and therefore τ(Δt) is constant.

Figure 4.1.2 shows the results for clip2.avi, which has a much larger number of frames (about 1200). This clip is also in AVI format, so Δt is uniform and τ(Δt) is constant, but at a higher level than for clip1.avi. The jerkiness of clip2.avi is higher than that of clip1.avi because it has more frames and a significant portion of them has high motion intensity and μ(m) close to 1.

Figure 4.1.3 shows the results for clip3.avi. Its number of frames is comparable to that of clip1.avi, but its time step is larger, so τ(Δt) is larger than for clip1.avi. The jerkiness of clip3 is 50% larger than that of clip1, mainly because of this higher value of τ(Δt).

Figure 4.1.4 shows the results for clip4.avi. The first 60 frames show almost zero motion intensity and zero μ(m), while most of the later frames show high motion intensity and μ(m) equal to one. Clip4 has more frames than clip1, its τ(Δt) is constant and comparable to that of clip1, and more of its frames have μ(m) close to one; together these factors make the jerkiness of clip4 higher than that of clip1.

Figures 4.1.5 and 4.1.6 show the results for clip5.avi and clip6.avi, where the motion intensity is high for frames 180 to 340. For QCIF video, a high motion intensity means a value above 5, and μ(m) is then computed with the second branch of the S-shaped function described in Chapter 3:

$$ \mu(m) = \frac{d}{1 + \exp\big(-c\,(m - p_x)\big)} + 1 - d $$

μ(m) is almost one for these frames because their motion intensity is higher than 10.

Figure 4.1.7 shows the results for clip7.yuv: most of its frames have a small motion intensity and therefore a small μ(m). This video has 300 frames, more than clip1. Although clip7 has more frames, its jerkiness is smaller than that of clip1 because its motion intensities are smaller.

Figure 4.1.8 shows the results for clip8.yuv, which has 150 frames, half as many as clip7.yuv. The same Δt is observed for all YUV clips used in this study. Figures 4.1.7(b) and 4.1.8(b) show that clip8 has more frames with motion intensity above 5 (μ(m) above 0.5) than clip7, which is why the jerkiness of clip8.yuv is larger than that of clip7.yuv.

Figure 4.1.9 shows the results for clip9.yuv, which has 300 frames, like clip7.yuv, and the same Δt as the other YUV clips. Comparing Figures 4.1.8(b) and 4.1.9(b) shows that clip9 has a higher average μ(m) than clip8, which results in a higher jerkiness value for clip9.

Figures 4.1.10 to 4.1.12 show the results for clip7, clip8 and clip9 with one third of their frames frozen. The frozen frames are produced by a freeze program that defines the freezing interval and replaces the data of every frame in that interval with the data of the first frame of the interval. Freezing gives zero motion intensity, and therefore zero μ(m), for the frames in the freezing interval. As Table 4.1 shows, the jerkiness of the videos with an intentional freeze is lower than that of the same videos without freezing (compare cases 7 and 10, 8 and 11, and 9 and 12).
[Figure 4.1.1: clip1.avi. (a) Motion intensity, (b) μ values, (c) τ values (×10^-7), each plotted against frame number.]
[Figure 4.1.2: clip2.avi. Panels (a) to (c) as in Figure 4.1.1.]
[Figure 4.1.3: clip3.avi. Panels (a) to (c) as in Figure 4.1.1.]
[Figure 4.1.4: clip4.avi. Panels (a) to (c) as in Figure 4.1.1.]
[Figure 4.1.5: clip5.avi. Panels (a) to (c) as in Figure 4.1.1.]
[Figure 4.1.6: clip6.avi. Panels (a) to (c) as in Figure 4.1.1.]
[Figure 4.1.7: clip7.yuv. Panels (a) to (c) as in Figure 4.1.1.]
[Figure 4.1.8: clip8.yuv. Panels (a) to (c) as in Figure 4.1.1.]
[Figure 4.1.9: clip9.yuv. Panels (a) to (c) as in Figure 4.1.1.]
[Figure 4.1.10: clip7.yuv with one third of its frames frozen. Panels (a) to (c) as in Figure 4.1.1.]
[Figure 4.1.11: clip8.yuv with one third of its frames frozen. Panels (a) to (c) as in Figure 4.1.1.]
[Figure 4.1.12: clip9.yuv with one third of its frames frozen. Panels (a) to (c) as in Figure 4.1.1.]
4.2 Conclusion
$$ J(v) = \frac{1}{T} \sum_{i} \Delta t_i \, \tau(\Delta t_i) \, \mu\big(m_{i+1}(v)\big) \qquad (4.2.1) $$
It is also shown that all the YUV videos used have the same Δt, while each AVI video has its own Δt, constant throughout the video. τ increases with Δt, as computed by the Sshape function. For QCIF video, a high motion intensity means a value above 5, and μ(m) is then computed with the second branch of the S-shaped function described in the previous chapters:

$$ \mu(m) = \frac{d}{1 + \exp\big(-c\,(m - p_x)\big)} + 1 - d $$

μ(m) is almost one when the motion intensity is higher than 10.
$$ Q_{transmission} = \frac{J_{after}}{J_{before}} \times 100\,\% $$
Example:
Assume that the jerkiness before transmission is represented by the jerkiness of a freeze-free video (test case 7), and the jerkiness after transmission by the jerkiness of the same video after adding a freeze (test case 7 with freeze, i.e., case 10).
$$ Q_{transmission} = \frac{1.8923 \times 10^{-8}}{3.1592 \times 10^{-8}} \times 100\,\% \approx 59.9\,\% $$
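The ratio in the example can be checked numerically with a short MATLAB snippet that uses the two jerkiness values quoted above for test case 7 (without and with freeze); the variable names are illustrative.

```matlab
% Jerkiness values of test case 7 quoted in the example
Jbefore = 3.1592e-8;   % freeze-free video
Jafter  = 1.8923e-8;   % same video with freeze

% Transmission quality ratio in percent
Q = Jafter / Jbefore * 100;
fprintf('Q_transmission = %.1f %%\n', Q);   % prints Q_transmission = 59.9 %
```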
Bibliography
[5] VQEG HDTV Group. VQEG HDTV Final Report, version 2.0: video quality models, volume 2.5, 2008. http://www.its.bldrdoc.gov/vqeg/projects.aspx.
[7] Anil K. Jain. Fundamentals of Digital Image Processing. Prentice Hall, 1989.
[10] Charles A. Poynton. Digital Video and HDTV: Algorithms and Interfaces. Morgan Kaufmann, 2003.
[12] S. Winkler. Digital Video Quality: Vision Models and Metrics. John Wiley and Sons, 2005.