Analysis of A Data Set
IMPORTANT: The teachers of this course apply a ‘zero tolerance’ policy regarding academic
dishonesty. Students who sign this document agree to submit original work. Any breach
of this commitment will result in academic penalties.
Observations:
Solve the exercises in the Assignment1.pdf file. Note: It is advisable to consult the
manual on basic MATLAB/Octave operation, available on the course website.
_______________________________________________________________
1. a) Calculate the frequency table of variable Area. The table must include the absolute,
relative, cumulative absolute and cumulative relative frequencies.
We can calculate cumulative frequencies by means of the command cumsum.
abs_acum =
178
329
478
>> rel_acum = cumsum(table(:,3)) % 3rd column: relative frequencies
rel_acum =
37.2385
68.8285
100.0000
___________________________________________________________________________
Telecommunications Engineering – Statistics – Academic year 2018/2019
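Complete frequency table
table =
1.0000 178.0000 37.2385 178.0000 37.2385
2.0000 151.0000 31.5900 329.0000 68.8285
3.0000 149.0000 31.1715 478.0000 100.0000
Columns: Area value, absolute frequency, relative frequency (percent), cumulative absolute frequency, cumulative relative frequency (percent).
Area: The school is in an 1 = urban, 2 = sub-urban, or 3 = rural area.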
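The complete table can be rebuilt step by step. The following is a minimal Octave/MATLAB sketch, assuming a stand-in Area vector reconstructed from the counts reported in the table (the real vector comes from the course data file). The name freq_table is hypothetical and is used instead of table to avoid clashing with MATLAB's built-in table type. The sketch uses accumarray rather than tabulate so that it runs without the statistics package.

```octave
% Stand-in data: 178 urban, 151 sub-urban and 149 rural schools
% (counts taken from the frequency table; not the real data file).
Area = [ones(178,1); 2*ones(151,1); 3*ones(149,1)];

values   = (1:3)';                        % 1 = urban, 2 = sub-urban, 3 = rural
abs_frq  = accumarray(Area, 1);           % absolute frequencies (count per value)
rel_frq  = 100 * abs_frq / numel(Area);   % relative frequencies in percent
abs_acum = cumsum(abs_frq);               % cumulative absolute frequencies
rel_acum = cumsum(rel_frq);               % cumulative relative frequencies

freq_table = [values abs_frq rel_frq abs_acum rel_acum];
disp(freq_table)
```

The last entry of each cumulative column gives a quick sanity check: it should equal the sample size (478) and 100 percent respectively.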
2. a) What is the proportion of boys and girls? Represent that proportion graphically with a bar
chart and a pie chart.
>> table = tabulate (Gender)
table =
1.0000 227.0000 47.4895
2.0000 251.0000 52.5105
Columns: Gender (1 = Boys, 2 = Girls), absolute frequency (count), relative frequency (percent). The relative frequency is the PROPORTION.
>> bar(table(:,3))
>> pie(table(:,3))
(Figure: bar chart and pie chart of the Gender proportions; labels: Girls, Boys.)
b) What is the proportion of boys and girls whose schools are established in urban areas?
%First, we create a matrix with 2 columns: the first one represents the Gender and the
second, the Area. In other words, we join the Gender and Area vectors in a single matrix in
order to compare the values.
%Then, we create a vector (with 478 elements in total) called “Boys_urban” that holds a 1
wherever the following conditions are met: the first column is a 1 (Boy) and the second column is also
a 1 (Urban Area).
%Now, we sum all the 1s in our Boys_urban vector (thereby counting how many boys attend
schools in urban areas)
Boys_urban =
72
Boys_urban_percent =
31.7181
Girls_urban =
106
Girls_urban_percent =
42.2311
Therefore, 31.7181% of the boys and 42.2311% of the girls attend schools established in
urban areas.
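The commands behind these results are not shown in the text, so the following is only a sketch of the steps described in the comments, written in Octave/MATLAB. The Gender and Area vectors are stand-ins built to match the reported counts (227 boys of whom 72 are urban, 251 girls of whom 106 are urban), and all non-urban schools are given code 2 because the real split is not stated. The variable names is_boy_urban, n_boys_urban and so on are hypothetical.

```octave
% Stand-in data consistent with the reported counts (not the real data file).
% Code 2 marks every non-urban school here; the true sub-urban/rural split is unknown.
Gender = [ones(227,1); 2*ones(251,1)];
Area   = [ones(72,1); 2*ones(155,1); ones(106,1); 2*ones(145,1)];

data = [Gender Area];                             % column 1: Gender, column 2: Area

% Logical vectors with 478 elements: 1 where both conditions hold.
is_boy_urban  = data(:,1) == 1 & data(:,2) == 1;
is_girl_urban = data(:,1) == 2 & data(:,2) == 1;

n_boys_urban  = sum(is_boy_urban);                % number of boys in urban schools
n_girls_urban = sum(is_girl_urban);               % number of girls in urban schools

% Percentages relative to the total number of boys and of girls.
boys_urban_percent  = 100 * n_boys_urban  / sum(data(:,1) == 1);
girls_urban_percent = 100 * n_girls_urban / sum(data(:,1) == 2);
```

Note that each percentage is conditional on gender: the denominator is the number of boys (or girls), not the 478 students in the sample.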
4.- Analyse the variables Gender and Goals in a double entry table. Calculate the absolute
frequency table with its marginal distributions and the relative frequency table with its
marginal distributions. (1.5 points)
a)
In this double entry table, the rows represent the gender and the columns represent the
goals (for example, gender 1 combined with goal 1 occurs 117 times).
b)
We know that the absolute frequency of a value is the number of times that value is
repeated, and the relative frequency is that number divided by the total number of
elements. Therefore, when tabulating both variables, the absolute frequency
of each value appears in the “count” column (for example, for Goals the value 1 is
repeated 247 times).
On the other hand, the relative frequency is located in the “Percent” column, where we can
see the share of each value with respect to the total (in this case the relative
frequency would be Percent/100).
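A double entry table with its marginal distributions can be sketched as follows in Octave/MATLAB. The data are a hypothetical toy sample of six students, not the course data, and the names abs_table and rel_table are illustrative. The sketch uses accumarray, which may differ from the command used in the original solution.

```octave
% Toy data (hypothetical, six students): Gender in {1,2}, Goals in {1,2,3}.
Gender = [1; 1; 2; 2; 2; 1];
Goals  = [1; 2; 1; 3; 1; 1];

counts = accumarray([Gender Goals], 1);   % rows: Gender, columns: Goals
n = sum(counts(:));                       % total number of observations

% Append the marginal distributions: an extra column with the row totals
% (Gender) and an extra row with the column totals (Goals).
% The bottom-right cell holds n.
abs_table = [counts sum(counts,2); sum(counts,1) n];
rel_table = abs_table / n;                % relative frequencies as fractions

disp(abs_table)
disp(rel_table)
```

In the relative table the bottom-right cell equals 1, and each marginal entry is the sum of the joint relative frequencies in its row or column.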
2. Linear Transformations
1. Change of units. Consider the matrix internet in the file internet.mat, and consider the
variable MB (“downloaded Mb”). Define a new variable, KB, as the number of downloaded
Kb; recall that “1Mb = 1024Kb”. The new variable is the result of a linear transformation of
the form y = a + bx. Using this transformation, check the following theoretical relations
with MATLAB/Octave: (2 points)
a) y = a + bx.
c) $s_y^2 = b^2 s_x^2$, where $s^2$ is the sample quasi-variance.
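Relation c) follows directly from the definition of the quasi-variance. A short derivation, assuming $n$ observations with $y_i = a + b x_i$, so that the means satisfy $\bar{y} = a + b\bar{x}$:

```latex
s_y^2 = \frac{1}{n-1}\sum_{i=1}^{n}(y_i-\bar{y})^2
      = \frac{1}{n-1}\sum_{i=1}^{n}\bigl(a+bx_i-a-b\bar{x}\bigr)^2
      = b^2\,\frac{1}{n-1}\sum_{i=1}^{n}(x_i-\bar{x})^2
      = b^2 s_x^2
```

The offset $a$ cancels inside the squared deviation, which is why only the scale factor $b$ affects the variance.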
When we open the file internet.mat we find a 95x4 matrix, where the first column
corresponds to MB and the second to the connection time in hours. So first, we create the variables
x for the downloaded MB and v for the connection time in hours:
>> x = internet(:,1)
>> v = internet(:,2)
LINEAR TRANSF. REQUIRED: y = a + bx & u = c + dv
Then, we make the required conversions and create 2 new variables, y and u:
b = 1024 d = 3600
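The numerical check of the variance relation can be sketched as below in Octave/MATLAB. The vector x is a hypothetical toy stand-in for internet(:,1), since the data file is not reproduced here, and a = 0 is an assumption (a pure change of units has no offset). The mean check is included as a companion to the variance check, and the names var_gap and mean_gap are illustrative.

```octave
% Toy stand-in for internet(:,1) (downloaded MB); the real column has 95 values.
x = [12.5; 40; 3.2; 75.1; 20];

a = 0;          % assumed: a pure change of units has no offset
b = 1024;       % 1 Mb = 1024 Kb
y = a + b*x;    % downloaded KB

% var uses the n-1 denominator by default, i.e. the sample quasi-variance.
var_gap  = abs(var(y) - b^2 * var(x));        % should be (numerically) zero
mean_gap = abs(mean(y) - (a + b*mean(x)));    % should be (numerically) zero

fprintf('variance gap: %g, mean gap: %g\n', var_gap, mean_gap);
```

Because of floating-point rounding the gaps may not be exactly zero, so it is safer to compare them against a small tolerance than to test for equality. The same check applies to u = c + dv with d = 3600.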