Task 1 Statistics
For
Statistics
Submitted by
FERNANDEZ
in
UNIVERSIDAD EUROPEA
INTRODUCTION
To get used to MATLAB for statistics, we were asked to compute and plot several quantities that we
had already seen in class. This paper presents the results of that work, including histograms and
box plots. All of the work was done in MATLAB.
1. Determine the kind of each variable: discrete or continuous, quantitative or qualitative.
2. Select the important variables: engine size, consumption, weight, type of car, and country of
origin. Determine the mean, median, mode, standard deviation, and range for these variables, and
show the results in a table.
3. Plot the histogram or bar diagram for the 5 variables, and indicate where the mean, median, and
mode are in each histogram.
As we know, a histogram is the most commonly used graph for showing frequency distributions. We
must first clarify that the data are not integers, so the values are grouped into intervals.
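As a rough illustration of how non-integer values end up in histogram intervals, the sketch below counts values per interval. It is written in Python rather than MATLAB, and the engine sizes, the interval width, and the helper name `bin_counts` are hypothetical, chosen only for illustration; they are not the data or code used in this task.

```python
import math
from collections import Counter

def bin_counts(values, width):
    """Count how many values fall in each interval [k*width, (k+1)*width)."""
    counts = Counter(math.floor(v / width) for v in values)
    return {(k * width, (k + 1) * width): counts[k] for k in sorted(counts)}

# Hypothetical engine sizes (illustrative only, not the report's data).
engine_sizes = [1.6, 2.0, 2.4, 2.5, 3.0, 3.2, 3.5, 3.8, 4.6, 5.7]

histogram = bin_counts(engine_sizes, width=1.0)

# Print a simple text histogram, one row per interval.
for (low, high), n in histogram.items():
    print(f"{low:.0f}-{high:.0f}: {'#' * n}")
```

With these made-up values, most observations land in the 2-3 and 3-4 intervals, which is the kind of shape discussed for the engine size histogram.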
For the engine size histogram, if all the values had the same frequency, the median would lie
between 4 and 5. However, since most of the observations fall in the 2-3 and 3-4 intervals (the
extreme values have very low frequency), the median shifts to the left and is therefore lower
than 4-5.
The same holds for the mean: with equal frequencies it would be around 4-5, but since the 2-3 and
3-4 intervals have higher frequency, the mean is pulled toward those values.
For the highway consumption histogram, we reach the same conclusion. As the graph shows, the most
common highway consumption is between 25 and 30; however, there is also a high frequency around
17.5-20 (approx.), which moves both the median and the mean to the left (lower values).
Histogram: Consumption
For the weight histogram, we again reach a similar conclusion. As the graph shows, the most common
weight is between 3000 and 3500, although there is also a high frequency for 3500-4000, which
moves both the median and the mean to the right (higher values).
Histogram: Weight
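The argument used for the histograms (more observations at low values pull the mean and median below the middle of the range) can be checked numerically. The following is a minimal Python sketch with a hypothetical sample; the numbers are illustrative only and are not the values from this task.

```python
import statistics

# Hypothetical sample with most values at the low end (illustrative only).
sample = [2.0, 2.0, 2.5, 3.0, 3.0, 3.0, 3.5, 4.0, 5.0, 6.0]

summary = {
    "mean": statistics.mean(sample),
    "median": statistics.median(sample),
    "mode": statistics.mode(sample),
    "std": statistics.stdev(sample),   # sample standard deviation (divides by n - 1)
    "range": max(sample) - min(sample),
}

# Middle of the range: where mean and median would sit if every value
# between the minimum and maximum were equally frequent.
midrange = (min(sample) + max(sample)) / 2

for name, value in summary.items():
    print(f"{name:>6}: {value:.3f}")
print(f"midrange: {midrange:.3f}")
```

Here both the mean and the median fall below the midrange, because the low values are more frequent than the high ones. MATLAB's `std` also divides by n - 1 by default, so the standard deviation convention matches.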
Histogram: Type
For categorical variables, only the mode is calculated. The graphs show the mode directly, since
it is the category with the highest frequency. Sedan is the mode for Type and US is the mode for
Country.
Histogram: Countries
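For a categorical variable, finding the mode amounts to counting how often each category appears and taking the most frequent one. A small Python sketch, assuming a hypothetical list of car types (not the data set used in this task):

```python
from collections import Counter

# Hypothetical car types (illustrative only, not the report's data).
car_types = ["Sedan", "SUV", "Sedan", "Wagon", "Sedan", "Minivan", "SUV"]

# Frequency of each category: these are the bar heights of a bar diagram.
frequencies = Counter(car_types)

# The mode is the category with the highest frequency.
mode_label, mode_count = frequencies.most_common(1)[0]

print(frequencies)
print(f"mode: {mode_label} ({mode_count} cars)")
```

The mean and median are not defined here because the categories have no numeric value or order.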
4. Plot a boxplot for the fuel consumption variable split by type of car.
This boxplot shows the fuel consumption for each type of car. Looking at the graph, we can see
that Minivan and Wagon have narrower boxes, which means they have less dispersion, than Sedan or
Wagon, which has the greatest disparity of the 4. The middle line of each box is the median. It is
not centered in any of the boxes, which means the distances from the median to the quartiles (Q1
and Q3) are not equal, so the distributions are not symmetric.
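The quantities a box plot draws can be computed directly: the box spans Q1 to Q3, its height is the interquartile range (IQR), and the line inside is the median. The Python sketch below uses hypothetical consumption values for two car types and a hypothetical helper `box_summary`. Quartile conventions differ between tools, so the numbers may not match MATLAB's `boxplot` exactly.

```python
import statistics

def box_summary(values):
    """Return Q1, median, Q3 and IQR, the numbers a box plot is built from."""
    q1, q2, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {"q1": q1, "median": q2, "q3": q3, "iqr": q3 - q1}

# Hypothetical consumption values per car type (illustrative only).
consumption_by_type = {
    "Minivan": [24, 25, 25, 26, 27],
    "Sedan": [20, 24, 28, 29, 31, 36],
}

boxes = {name: box_summary(vals) for name, vals in consumption_by_type.items()}

for name, box in boxes.items():
    lower = box["median"] - box["q1"]   # distance from Q1 to the median
    upper = box["q3"] - box["median"]   # distance from the median to Q3
    print(f"{name}: IQR={box['iqr']}, lower half={lower}, upper half={upper}")
```

A smaller IQR corresponds to a narrower box (less dispersion), and unequal lower and upper halves correspond to a median line that is not centered in the box.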
6. Plot a scatter diagram for the engine size versus the weight and versus the consumption. You can
do it separately in 2 plots or in the same plot.
Figure 6: Engine Size vs. Weight
Figure 7: Engine Size vs. Consumption
A scatter diagram shows the relationship between an independent variable (X axis) and a dependent
variable (Y axis). As the graphs show, the weight increases as the engine size increases; in
contrast, the consumption decreases as the engine size increases.
7. Compute the correlation coefficient for the variables EngSize and Weight, and for EngSize and
consumption.
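The Pearson correlation coefficient is the covariance of the two variables divided by the product of their standard deviations, and it ranges from -1 to 1. The Python sketch below implements that formula with a hypothetical helper `pearson_r` and made-up values for engine size, weight, and consumption; it is only an illustration and does not reproduce the coefficients obtained in this task (in MATLAB, `corrcoef` performs this computation).

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sums of products of deviations; the common 1/(n-1) factor cancels out.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical values (illustrative only, not the report's data).
engine_size = [1.6, 2.0, 2.5, 3.0, 3.5, 4.6]
weight = [2400, 2800, 3100, 3400, 3700, 4300]
consumption = [38, 34, 31, 28, 26, 22]

r_weight = pearson_r(engine_size, weight)
r_consumption = pearson_r(engine_size, consumption)

print(f"EngSize vs Weight:      r = {r_weight:.3f}")
print(f"EngSize vs consumption: r = {r_consumption:.3f}")
```

A coefficient close to 1 matches an upward trend in the scatter diagram, and a coefficient close to -1 matches a downward trend.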
With this task we learned how to compute the mean, median, mode, standard deviation, and range,
and how to produce histograms, boxplots, and scatter diagrams in MATLAB.