Analysis of A Data Set

This document provides instructions for students to complete an assignment in descriptive statistics using MATLAB. It includes instructions to analyze a dataset by calculating frequency tables and proportions, as well as performing linear transformations on variables and calculating correlations. Students are warned that any academic dishonesty will result in punishment.


Telecommunications Engineering

Universidad Carlos III de Madrid


Statistics

Assignment 1: Introduction to MATLAB and Descriptive Statistics

Group: 69

Students: Jorge Amat Zapatero, Natalia Paz García, Eva María Sánchez de Rojas Luján

Signatures:

IMPORTANT: The teachers of this course apply a ‘zero tolerance’ policy regarding academic
dishonesty. Students who sign this document agree to submit original work. Any breach
of this commitment will result in academic punishment.

Observations:

Solve the exercises in the Assignment1.pdf file. Note: It is advisable to consult the
manual for basic operation of MATLAB / Octave available on the website of the course.
_______________________________________________________________

1. Analysis of a data set

1. a) Calculate the frequency table of variable Area. The table must include the absolute,
relative, cumulative absolute and cumulative relative frequencies.

>> table = tabulate(Area)

table =

1.0000 178.0000 37.2385
2.0000 151.0000 31.5900
3.0000 149.0000 31.1715

The command tabulate calculates the absolute frequencies (Count, 2nd column) and the
relative frequencies in % (Percent, 3rd column).

We can calculate cumulative frequencies by means of the command cumsum.

>> abs_acum = cumsum(table(:,2))     % 2nd column: absolute

abs_acum =

178
329
478

>> rel_acum = cumsum(table(:,3))     % 3rd column: relative

rel_acum =

37.2385
68.8285
100.0000

___________________________________________________________________________
Telecommunications Engineering – Statistics – Academic year 2018/2019
Complete frequency table

>> table = [ table abs_acum rel_acum ]

table =

1.0000 178.0000 37.2385 178.0000 37.2385
2.0000 151.0000 31.5900 329.0000 68.8285
3.0000 149.0000 31.1715 478.0000 100.0000

Area: the school is in a 1 = urban, 2 = sub-urban, or 3 = rural area.

Value    Absolute frequency   Relative frequency   Absolute cumulative   Relative cumulative
         (Count)              (Percent)            frequency             frequency
1.0000   178.0000             37.2385              178.0000              37.2385
2.0000   151.0000             31.5900              329.0000              68.8285
3.0000   149.0000             31.1715              478.0000              100.0000
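
The same table can be built without tabulate, which may help when the statistics toolbox or package is not installed. The following is only a sketch under stated assumptions: the Area vector is a small made-up sample (not the assignment data), and the codes are assumed to be the consecutive integers 1, 2, 3 so that accumarray and unique line up.

```matlab
% Hypothetical sample of area codes (1 = urban, 2 = sub-urban, 3 = rural).
% These values are made up for illustration; they are NOT the assignment data.
Area = [1 1 2 3 1 2 3 3 1 2]';

values   = unique(Area);                    % distinct codes, sorted
abs_freq = accumarray(Area, 1);             % absolute frequency of each code
rel_freq = 100 * abs_freq / numel(Area);    % relative frequency in percent
abs_cum  = cumsum(abs_freq);                % cumulative absolute frequency
rel_cum  = cumsum(rel_freq);                % cumulative relative frequency

% Columns: value, count, percent, cumulative count, cumulative percent
freq_table = [values abs_freq rel_freq abs_cum rel_cum];
disp(freq_table)
```

The last row is a quick sanity check: the cumulative count should equal the sample size and the cumulative percentage should reach 100.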

b) In which of the three types of areas are most students concentrated?

In urban areas (Area = 1), which account for 178 of the 478 students (37.2385%).

2. a) What is the proportion of boys and girls? Represent that proportion graphically with a bar
chart and a pie chart.

>> table = tabulate (Gender)

table =

1.0000 227.0000 47.4895
2.0000 251.0000 52.5105

Gender           Absolute frequency   Relative frequency
(1 = Boys,       (count)              (percent) - PROPORTION
2 = Girls)
1.0000           227.0000             47.4895
2.0000           251.0000             52.5105

>> bar(table(:,3))
>> pie(table(:,3))

[Figures: bar chart and pie chart of the proportion of boys and girls]
b) What is the proportion of boys and girls whose schools are established in urban areas?

>> Gender_Area = [Gender Area]

%First, we create a matrix with 2 columns, the first one represents the Gender and the
second, the Area. In other words, we join the Gender and Area vectors in a single matrix in
order to compare the values

>> Boys_urban = (Gender_Area(:,1) == 1 & Gender_Area(:,2)==1)

%Then, we create a logical vector called “Boys_urban” (478 elements in total) that holds a 1
wherever both of the following conditions are met: the first column is a 1 (Boy) and the
second column is also a 1 (Urban Area).

>> Boys_urban = sum(Boys_urban,1)

Boys_urban =

72

%Now, we sum all the 1s in our Boys_urban vector, thereby counting how many boys
attend schools in urban areas.

>> Boys_urban_percent= (Boys_urban*100)/227

Boys_urban_percent =

31.7181

%Finally, as we are asked for a percentage (proportion), we compute

(nº of boys in urban areas * 100) / total nº of boys

which gives the percentage of boys whose schools are in urban areas.

We repeat the same process for the girls:

>> Girls_urban = (Gender_Area(:,1) == 2 & Gender_Area(:,2)==1)

>> Girls_urban = sum(Girls_urban,1)


Girls_urban =

106

>> Girls_urban_percent= (Girls_urban*100)/251

Girls_urban_percent =

42.2311

Therefore, the proportion of boys whose schools are established in urban areas is
31.7181% (72 of 227) and the proportion of girls is 42.2311% (106 of 251).
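
The steps above can be condensed with logical masks, taking the denominators from the data instead of typing 227 and 251 by hand. This is a minimal sketch on made-up vectors (not the assignment data). It also computes the other possible reading of the question, the share of boys among urban students, which is an assumption about the wording rather than something the assignment confirms.

```matlab
% Hypothetical data, made up for illustration (NOT the assignment data).
Gender = [1 1 1 2 2 2 2 1]';   % 1 = boy, 2 = girl
Area   = [1 2 1 1 3 1 1 3]';   % 1 = urban, 2 = sub-urban, 3 = rural

is_boy   = (Gender == 1);
is_girl  = (Gender == 2);
is_urban = (Area == 1);

% Percentage of boys (resp. girls) whose school is in an urban area
boys_urban_percent  = 100 * sum(is_boy  & is_urban) / sum(is_boy);
girls_urban_percent = 100 * sum(is_girl & is_urban) / sum(is_girl);

% Alternative reading: percentage of urban students who are boys
boys_share_of_urban = 100 * sum(is_boy & is_urban) / sum(is_urban);

fprintf('Boys in urban schools:  %.4f%%\n', boys_urban_percent);
fprintf('Girls in urban schools: %.4f%%\n', girls_urban_percent);
fprintf('Boys among urban students: %.4f%%\n', boys_share_of_urban);
```

Under the first reading the two percentages need not add up to 100, because each one is taken over a different group (all boys, all girls).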

4.- Analyse the variables Gender and Goals in a double entry table. Calculate the absolute
frequency table with its marginal distributions and the relative frequency table with its
marginal distributions. (1.5 points)

a)
In this double entry table, the rows represent the gender and the columns represent the
goals (for example, the combination Gender = 1 and Goals = 1 occurs 117 times).

b)
We know that the absolute frequency of a value is the number of times that value is
repeated, and the relative frequency is that number divided by the total number of
elements. Therefore, when tabulating both variables, the absolute frequency of each value
appears in the “Count” column (for example, for Goals, the value 1 is repeated 247 times).

On the other hand, the relative frequency is located in the “Percent” column, which gives
the proportion of each value with respect to the total as a percentage (so the relative
frequency as a fraction is Percent/100).
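
One possible way to build the double entry table together with its marginal distributions is sketched below. It is not necessarily the method used in the original solution. The vectors are made up (not the assignment data), and the Goals codes are assumed to be the consecutive integers 1 to 3.

```matlab
% Hypothetical data, made up for illustration (NOT the assignment data).
Gender = [1 1 2 2 1 2 2 1]';   % 1 = boy, 2 = girl
Goals  = [1 2 1 3 1 1 2 1]';   % assumed goal codes 1..3

% Joint absolute frequencies: rows = Gender, columns = Goals
joint_abs = accumarray([Gender Goals], 1);

row_marg = sum(joint_abs, 2);      % marginal distribution of Gender
col_marg = sum(joint_abs, 1);      % marginal distribution of Goals
n        = sum(joint_abs(:));      % total number of observations

% Absolute table with marginals in the last column and last row
abs_table = [joint_abs row_marg; col_marg n];

% Relative table with marginals (as fractions of the total)
rel_table = abs_table / n;

disp(abs_table)
disp(rel_table)
```

The bottom-right cell of the absolute table is the sample size, and the same cell of the relative table is 1.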

2. Linear Transformations
1. Change of units. Consider the matrix internet in the file internet.mat, and consider the
variable MB (“downloaded Mb”). Define a new variable, KB, as the nº of downloaded
Kb (recall that “1Mb = 1024Kb”). The new variable is the result of a linear transformation of
the form y = a + bx. For this transformation, check with MATLAB/Octave the following
theoretical relations: (2 points)

a) y = a + bx.

First, we calculate the total MB downloaded.

Then we transform the first column (the MB values) into KB.

b) y_med = a + b x_med, where med is the median.

c) s_y^2 = b^2 s_x^2, where s^2 is the sample quasi-variance.

d) s_y = |b| s_x, where s is the sample quasi-standard deviation.
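
Relations b) to d) can be checked numerically along the following lines. This is only a sketch: the vector x is made up (it is not the MB column of internet.mat), a = 0 and b = 1024 follow the MB to KB conversion stated above, and the tolerance tol is an arbitrary choice to absorb rounding error. In MATLAB/Octave, var and std divide by n - 1 by default, which corresponds to the quasi-variance and quasi-standard deviation.

```matlab
% Hypothetical sample of downloaded MB (made up; NOT the internet.mat data).
x = [12.5 3.0 48.2 7.7 20.1 15.4]';

a = 0;               % no offset in a pure change of units
b = 1024;            % 1 Mb = 1024 Kb
y = a + b * x;       % a) the linear transformation itself

tol = 1e-9;          % relative tolerance for rounding error (arbitrary choice)

% b) median:            y_med = a + b * x_med
check_median = abs(median(y) - (a + b * median(x))) < tol * abs(median(y));

% c) quasi-variance:    s_y^2 = b^2 * s_x^2   (var normalises by n - 1)
check_var = abs(var(y) - b^2 * var(x)) < tol * var(y);

% d) quasi-std. dev.:   s_y = |b| * s_x       (std normalises by n - 1)
check_std = abs(std(y) - abs(b) * std(x)) < tol * std(y);

disp([check_median check_var check_std])
```

Each check displays 1 when the relation holds within the tolerance.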

3. Correlation between linearly transformed variables

When we open the file internet.mat we find a 95x4 matrix, where the first column
corresponds to MB and the second to the connection time in hours. So first, we create the variables
x for the downloaded MB and v for the connection time in hours:

>> x = internet(:,1)
>> v = internet(:,2)

LINEAR TRANSF. REQUIRED: y = a + bx & u = c + dv

Then, we make the required conversions and create 2 new variables, y and u:

y = nº of downloaded KB, with “1Mb = 1024Kb”, so b = 1024:

>> y = x*1024

u = connection time in seconds, with “1h = 3600s”, so d = 3600:

>> u = v*3600

Finally, we calculate the correlation coefficients:

ρy,u:

>> corrcoef(y,u)

ans =

1.0000 0.7686
0.7686 1.0000

ρx,v:

>> corrcoef(x,v)

ans =

1.0000 0.7686
0.7686 1.0000

Therefore, we have checked that both coefficients coincide (ρy,u = ρx,v = 0.7686), which
indicates that the correlation coefficient between two variables does not change if a change
of units is applied.
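
The comparison can also be made programmatically instead of by eye. The sketch below uses made-up vectors (not the internet.mat data) and assumes that corrcoef returns the 2x2 matrix shown in the output above, so the coefficient is the off-diagonal entry. It also illustrates a known property that the exercise does not cover: with positive scale factors the coefficient is unchanged, while a negative factor on one variable flips its sign.

```matlab
% Hypothetical data (made up; NOT the internet.mat data).
x = [12.5 3.0 48.2 7.7 20.1 15.4]';   % downloaded MB
v = [1.2 0.4 3.9 1.1 1.5 2.0]';       % connection time in hours

y = 1024 * x;     % KB, b = 1024
u = 3600 * v;     % seconds, d = 3600

R_xv = corrcoef(x, v);
R_yu = corrcoef(y, u);
r_xv = R_xv(1,2);         % off-diagonal entry is the coefficient
r_yu = R_yu(1,2);

% With b > 0 and d > 0 the coefficient should be unchanged
same = abs(r_xv - r_yu) < 1e-12;

% With a negative factor on one variable only, the sign flips
R_neg   = corrcoef(-y, u);
flipped = abs(R_neg(1,2) + r_xv) < 1e-12;

disp([same flipped])
```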
