
Stats Lecture 1

Uploaded by

Rafan Ahmed

BE303 Quantitative Methods and Finance
Sam Astill
Room EBS 3.8
Email: [email protected]

Academic Support Hours: Thursday 3-4 and Friday 9-11

http://moodle.essex.ac.uk
Outline of the Statistics Section:
• Lecture 5: Data and Descriptive Statistics
Barrow Chapter 1
• Lecture 6: Introduction to Probability
Barrow Chapter 2
• Lecture 7: Probability Distribution
Barrow Chapter 3
• Lectures 8-9: Regression and Hypothesis Tests
Barrow Chapters 5 and 7
Textbook
Introduction to Quantitative Methods and
Finance
• Dedicated text for the whole year
• Pearson 2022
• Accessible from Pearson MyLab
• Has essential chapters from 3 different
texts:
• Maths (Jacques)
• Stats (Barrow)
• Finance (Levy)
Assessment
End of term test performed on Pearson MyLab

• No end-of-year exam for this module


• Assessed through end of term test on
Tuesday 13th December, 10am-12pm.
• Test will be 2 hours long
• Will cover material from Maths (Udichi) and
Probability and Statistics (Me)
Lecture 5: Data and Descriptive Statistics

Lecture outline:
• Graphical summary of data
Bar chart
Pie chart
Scatter plot
• Numerical measures of data
Measures of location
Measures of dispersion
Measure of symmetry

Reading: Barrow – Ch. 1


Learning Outcomes

By the end of this lecture you should be able to:


• Recognise different types of data
• Use Excel to generate graphs that summarise data
• Use numerical measures to summarise data series
• Recognise the relative strengths and limitations of the numerical methods
What is Statistics?

• It is the science of collecting, analysing and interpreting data for decision making
• It involves data gathering, data description, inference
and interpretation, and decision making
• Data are often collected by interviews, surveys, or
observations
Presenting Data

• We are often concerned with the overall picture of the data
• Hence we make use of different graphs and summary
descriptive statistics to obtain an overall picture
Data: In-class Test Results for One Module in 2010

• Test results for 187 students in December 2010
• First group them according to the mark range they belong to: fail, pass, 2:2, 2:1, first
• Count the number of marks in each group; they should sum to 187
• This produces the frequency table
• We can then plot it using a bar chart or a pie
chart
The frequency table

Mark range      Frequency (number of students)
fail (<40)      26
pass (40-49)    38
2:2 (50-59)     61
2:1 (60-69)     40
first (>=70)    22
Total           187
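
As a quick check of the table, the counts and their percentage shares can be reproduced in a few lines of Python. This is a minimal sketch and not part of the lecture's Excel workflow; the variable names are illustrative.

```python
# Frequency table from the slide: mark range -> number of students.
frequency = {
    "fail (<40)": 26,
    "pass (40-49)": 38,
    "2:2 (50-59)": 61,
    "2:1 (60-69)": 40,
    "first (>=70)": 22,
}

total = sum(frequency.values())  # should equal the 187 students in the sample

# Share of each group, rounded to the nearest whole percent.
percentages = {group: round(100 * count / total) for group, count in frequency.items()}

for group, count in frequency.items():
    print(f"{group:<14}{count:>4}{percentages[group]:>5}%")
print(f"{'Total':<14}{total:>4}")
```

The rounded shares come out as 14%, 20%, 33%, 21% and 12%, which are the figures a pie chart of this table would display.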
The bar chart and the pie chart

• In Excel:
• Highlight the data you want to plot
• Go to “Insert”, then select the type of chart you
wish to include.
• Once the chart has been generated, select it to access the “Chart Tools” tab
• From there you can add titles, axis labels and data labels to make the plot more informative and easier to read
Example of a bar chart

[Bar chart: Test results, 187 students, Dec 2010. Horizontal axis: Mark (fail, pass, 2:2, 2:1, first). Vertical axis: Number of students. Bar heights: 26, 38, 61, 40, 22.]
Example of a pie chart

[Pie chart: Test results, 187 students, Dec 2010. Slices: fail 14%, pass 20%, 2:2 33%, 2:1 21%, first 12%.]

Both charts contain exactly the SAME INFORMATION. They are just presenting it differently
The histogram

• The histogram is a special bar chart


• It is equivalent to a bar chart when all class widths
(the mark range in the example) are the same
Example of a histogram

[Histogram: Test results, 187 students, Dec 2010. Horizontal axis: Mark, in classes of width 10 (0-9, 10-19, ..., 80-89). Vertical axis: Number of students.]
The Scatter Plot

• In both of the previous charts we were only examining what is known as one VARIABLE
• i.e. information on only one characteristic is
recorded in the data
• In this case it is the test score
• What if we are interested in plotting two variables
against one another?
• I will show how this is done as we now discuss
different kinds of data you may come across in
practice
Types of Data
• Time series: a number of observations of the same
variable measured over time, e.g. stock prices for
HSBC over 2010-2020, the inflation rate in the UK
over 1970-2020
• Cross sectional: data recorded at a particular point in
time for one or more variables, e.g. units produced of
a good and cost to produce those goods for a group of
manufacturers
• Panel data: a combination of the two, i.e. the same cross section of units observed over several time periods.

HOW MIGHT THIS DATA LOOK?


Types of Data (Cross Sectional)

Units of production, x    Costs, y
1                         5
2                         10.5
3                         15.5
4                         25
5                         16
6                         20.6

[Scatter plot of this data. Horizontal axis: Units of Production '000. Vertical axis: Cost of Production £'000.]


Types of Data (Time Series)

Time Period    Price
Jan 2010       388
Feb 2010       412
Mar 2010       383
Apr 2010       386
May 2010       363
June 2010      359

[Line chart of the price series. Horizontal axis: date, 1/1/2010 to 1/1/2020. Vertical axis: Price (GBP).]
Numerical measures to describe data
• A measure of location
1. average and weighted average
2. median
3. mode
• A measure of dispersion
4. range
5. variance
6. standard deviation
• A measure of skewness
7. skewness
1. The mean (average)

• The arithmetic mean, also called the average, is the most commonly adopted measure of location

• It is like you are "flattening out" the numbers


1. The mean (average)

• Add up all the observations and divide by the number of observations
• Suppose we have a sample of N observations of a random variable x: $x_1, x_2, \ldots, x_N$
• The mean is
$$\mu = \frac{\sum_{i=1}^{N} x_i}{N}$$
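
The "add up and divide" rule can be sketched in Python as follows. The sample values and the function name `arithmetic_mean` are illustrative assumptions, not data from the lecture; the standard library's `statistics.mean` is shown alongside as a cross-check.

```python
from statistics import mean

def arithmetic_mean(observations):
    """Sum the observations and divide by how many there are."""
    return sum(observations) / len(observations)

# Hypothetical sample of N = 5 marks, chosen only for illustration.
sample = [45, 12, 33, 80, 77]

print(arithmetic_mean(sample))  # hand-rolled version of the formula
print(mean(sample))             # library version, should agree
```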
1. The mean (average)

• In Excel, the function is AVERAGE


• In the example of the test results, the mean is 53
• We also refer to it as the expected value in the sense
that if we randomly select a mark, we would ‘expect’
that it is 53 on average:

$$E(x) = \mu = 53$$
1. The weighted average
• Sometimes different observations carry different weights; we use the weighted average to reflect this:
$$\mu = \sum_i w_i x_i$$
• The weights should add up to 1:
$$\sum_i w_i = 1$$
• For example, your degree result is calculated as
Degree result = 0% × Year 1 + 40% × Year 2 + 60% × Year 3
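
A minimal sketch of the weighted average, using the 0% / 40% / 60% degree weights from the slide. The three year marks are hypothetical, and `weighted_average` is an illustrative helper, not a library function.

```python
def weighted_average(values, weights):
    """Return sum(w_i * x_i), after checking that the weights add up to 1."""
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must add up to 1")
    return sum(w * x for w, x in zip(weights, values))

# Hypothetical year averages (Year 1, Year 2, Year 3); not taken from the lecture.
year_marks = [55, 62, 68]
# Weights from the degree example: 0%, 40%, 60%.
degree_weights = [0.0, 0.4, 0.6]

degree_result = weighted_average(year_marks, degree_weights)
print(round(degree_result, 1))
```

Because Year 1 has a weight of zero, changing the first mark leaves the result unchanged.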
We have the following observations of the end-of-day
stock price for company Alpha over one week:
24, 37, 18, 24, 25.
What is the average of the stock prices?

A. 25
B. 25.6
C. 25.8
D. 26
2. The median

• The median is the "middle number" in a sorted list of numbers; it is the middle of the distribution

How to Find the Median Value:

• First rank the sample in order
• If there is an odd number of observations, the median is simply the one in the middle
• If there is an even number of observations, the median is the average of the middle two observations
2. The median

• We have a sample: 45, 12, 33, 80, 77


• Arrange in order: 12, 33, 45, 77, 80
• The median is 45
• Another sample: 12, 33, 45, 63, 77, 80
• The median is (45+63)/2=54
• The median is not affected by any increase of the maximum value or decrease of the minimum value
• Hence it is robust to extreme values
• The Excel function is “MEDIAN”
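
The ranking procedure can be sketched in Python using the two samples from this slide. The helper name `find_median` is an assumption for illustration; `statistics.median` from the standard library applies the same rule.

```python
from statistics import median

def find_median(observations):
    """Rank the sample, then take the middle value (or the mean of the middle two)."""
    ranked = sorted(observations)
    n = len(ranked)
    middle = n // 2
    if n % 2 == 1:
        return ranked[middle]
    return (ranked[middle - 1] + ranked[middle]) / 2

odd_sample = [45, 12, 33, 80, 77]
even_sample = [12, 33, 45, 63, 77, 80]

print(find_median(odd_sample))   # 45
print(find_median(even_sample))  # 54.0
print(median(odd_sample), median(even_sample))  # library cross-check
```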
We have the following observations of the end-of-
day stock price for company Alpha over one
week:
24, 37, 18, 24, 25.
What is the median of the stock prices?

A. 21
B. 22
C. 23
D. 24
Why is the median useful?

• Most people have heard of the mean even if they’ve never done any statistics before
• The median is less well known
• It is used quite often though; the best example is when reporting the average salary for workers in an economy
• When a news story reports average pay, it is usually quoting the median
• Why do they do this?
Mean vs Median

• Suppose I have a simple survey of 5 people; their salaries are £28k, £26k, £32k, £30k and £38k
• The mean of these numbers is £30.8k
• The median is £30k
• Pretty close together
• This is because none of the numbers is too far from the others
• No extremely large or small values

• What if I’d included a hedge fund manager in my survey? Let’s say they earn £2m per year
Mean vs Median (With One Extremely Large Value)

• My surveyed salaries are now £28k, £26k, £32k, £30k and £2,000k
• The mean of these numbers is about £423k
• The median is STILL £30k
• The mean is pushed upwards and is now not representative of anybody in the survey!
• The median is unchanged

• This shows why the median is used when reporting “average” pay
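
The two surveys can be compared directly in Python. This is a small sketch using the salary figures from the slides (in £'000) and the standard library's `statistics` module.

```python
from statistics import mean, median

# Salaries in £'000 from the two surveys in the slides.
salaries = [28, 26, 32, 30, 38]
salaries_with_outlier = [28, 26, 32, 30, 2000]  # £38k replaced by the £2m earner

for label, data in [("original", salaries), ("with outlier", salaries_with_outlier)]:
    print(f"{label}: mean = {mean(data):.1f}, median = {median(data)}")
```

The mean moves from 30.8 to 423.2 while the median stays at 30 in both surveys.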
2. The median

Just as the median divides the data into 2 halves, we have

• Quartiles which divide all data into 4 equal parts
• Quintiles which divide all data into 5 equal parts
• Deciles which divide all data into 10 equal parts
• Percentiles which divide all data into 100 equal parts
3. The mode

• The observation that occurs most often


• The ages of a sample of 9 students are:
18, 19, 21, 22, 22, 22, 26, 32, and 36
• The mode is 22
• Sometimes the data may be bimodal when it has two
modes as in:
18, 19, 21, 22, 22, 22, 26, 32, 32, 32
• Here the two modes are 22 and 32
• The Excel function is “MODE”
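
A short Python sketch of both cases, using the ages listed above. `statistics.multimode` (Python 3.8 and later) returns every value tied for the highest count, which is useful for bimodal data.

```python
from statistics import mode, multimode

ages = [18, 19, 21, 22, 22, 22, 26, 32, 36]
bimodal_ages = [18, 19, 21, 22, 22, 22, 26, 32, 32, 32]

print(mode(ages))               # 22
print(multimode(bimodal_ages))  # [22, 32]
```

Note that a function returning a single mode reports only one of the two values for bimodal data, so it is worth checking the frequency counts directly.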
When would we use each measure?
• A measure of location
1. average and weighted average
2. median
3. mode
• A measure of dispersion
4. range
5. variance
6. standard deviation
• A measure of skewness
7. skewness
4. The range
• The difference between the largest and smallest
observations. Example 1:
• In {4, 6, 9, 3, 7} the lowest value is 3, and the highest
is 9.

• So the range is 9-3 = 6.
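
The same calculation as a one-line Python function; `value_range` is an illustrative name.

```python
def value_range(observations):
    """Largest observation minus smallest observation."""
    return max(observations) - min(observations)

print(value_range([4, 6, 9, 3, 7]))  # 6
```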


What is the range for the test results, if
the maximum score is 85, and the
minimum score is 5?
A. 85
B. 80
C. 40
D. 5
4. The range

• The inter-quartile range is the difference between the 1st and 3rd quartiles, so it ignores extreme values
4. The range

• In our sample:
• The 1st quartile Q1 is at position 187/4 = 46.75, so we take the average of the 46th and 47th observations, which is 44
• The 3rd quartile Q3 is at position 187 × 0.75 = 140.25, so we take the average of the 140th and 141st observations, which is 64
• Half of the students achieve marks between 44
and 64
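
The positional rule used above can be sketched as follows. This is one reading of the slide's convention (average the two observations either side of position n/4 or 3n/4); other software, including Excel, may interpolate differently. The individual marks are not given in the slides, so the data here is a placeholder and the function names are assumptions.

```python
import math

def value_at_position(ranked, position):
    """Average the observations either side of a (1-based) fractional position."""
    lower = math.floor(position)
    upper = math.ceil(position)
    return (ranked[lower - 1] + ranked[upper - 1]) / 2

def interquartile_range(observations):
    """Return (Q1, Q3, Q3 - Q1), locating Q1 at n/4 and Q3 at 3n/4."""
    ranked = sorted(observations)
    n = len(ranked)
    if n < 4:
        raise ValueError("need at least 4 observations")
    q1 = value_at_position(ranked, n / 4)
    q3 = value_at_position(ranked, 3 * n / 4)
    return q1, q3, q3 - q1

# Placeholder data: 187 values where the k-th smallest observation is simply k,
# so the positions being averaged are easy to see. These are NOT the real marks.
placeholder = list(range(1, 188))
q1, q3, iqr = interquartile_range(placeholder)
print(q1, q3, iqr)  # 46.5 140.5 94.0
```

With the lecture's actual marks, Q1 = 44 and Q3 = 64 give an inter-quartile range of 20.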
5. The variance

• The average of the squared differences from the mean:
$$\sigma^2 = \frac{\sum_i (x_i - \mu)^2}{N}$$
• The larger the variance the greater the dispersion
• In finance, investors use it to estimate the risk of
holding a financial asset
• The Excel function for this population formula is “VAR.P” (“VAR” and “VAR.S” divide by N − 1, the sample version)
6. The standard deviation

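• The square root of the variance:
$$\sigma = \sqrt{\frac{\sum_i (x_i - \mu)^2}{N}}$$
• Taking the square root puts the measure back in the same units as the data
• The Excel function for this population formula is “STDEV.P” (“STDEV” and “STDEV.S” divide by N − 1, the sample version)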
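
A minimal Python sketch of the population variance and standard deviation, dividing by N. The data reuses the small set from the range example, and `population_variance` is an illustrative helper; `statistics.pvariance` and `statistics.pstdev` are the standard-library equivalents.

```python
import math
from statistics import pstdev, pvariance

def population_variance(observations):
    """Average squared deviation from the mean (divides by N)."""
    n = len(observations)
    mu = sum(observations) / n
    return sum((x - mu) ** 2 for x in observations) / n

data = [4, 6, 9, 3, 7]  # illustrative values, reused from the range example

var_pop = population_variance(data)
std_pop = math.sqrt(var_pop)

print(round(var_pop, 2), round(std_pop, 2))               # 4.56 2.14
print(round(pvariance(data), 2), round(pstdev(data), 2))  # library cross-check
```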
The variance for a sample

• We make a distinction between a sample and a population
• For the sample, the variance is
$$s^2 = \frac{\sum_i (x_i - \bar{x})^2}{n - 1}$$
• Using this equation is especially important when the sample size (n) is small
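
The effect of the sample correction is easiest to see on a small data set. The sketch below assumes the sample formula divides by n − 1 and uses illustrative data; `sample_variance` is a hypothetical helper, while `statistics.variance` and `statistics.pvariance` are the standard-library functions for the sample and population versions.

```python
from statistics import pvariance, variance

def sample_variance(observations):
    """Sum of squared deviations from the sample mean, divided by n - 1."""
    n = len(observations)
    x_bar = sum(observations) / n
    return sum((x - x_bar) ** 2 for x in observations) / (n - 1)

data = [4, 6, 9, 3, 7]  # small illustrative sample, n = 5

print(round(sample_variance(data), 2))  # 5.7
print(round(variance(data), 2))         # statistics.variance also divides by n - 1
print(round(pvariance(data), 2))        # 4.56, dividing by n instead
```

With n = 5 the two versions differ by a factor of 5/4; as n grows the gap shrinks, which is why the distinction matters most for small samples.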
7. The skewness

Data can be "skewed", meaning it tends to have a long tail on one side or the other:

[Figure: three distributions, labelled Negative Skew, No Skew and Positive Skew]


7. The skewness

• 1) If a distribution is skewed to the right, it is positively skewed, skewness>0;
• 2) if a distribution is skewed to the left, it is negatively skewed, skewness<0;
• 3) else it is symmetric, skewness=0.

$$\text{skewness} = \frac{\frac{1}{n}\sum_i (x_i - \bar{x})^3}{\left[\frac{1}{n}\sum_i (x_i - \bar{x})^2\right]^{3/2}}$$

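• In Excel, the function is “SKEW” (it applies a small-sample adjustment, so its result differs slightly from the formula above; “SKEW.P” matches the formula)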
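
The sign rules can be checked with a short Python sketch of the moment-based skewness (third central moment divided by the second central moment to the power 3/2, both averaged over n). The data sets and the function name are illustrative assumptions; the right-tailed example reuses the salary figures with one very large value.

```python
def skewness(observations):
    """Third central moment divided by the second central moment to the power 3/2."""
    n = len(observations)
    x_bar = sum(observations) / n
    m2 = sum((x - x_bar) ** 2 for x in observations) / n
    m3 = sum((x - x_bar) ** 3 for x in observations) / n
    return m3 / m2 ** 1.5

symmetric = [1, 2, 3, 4, 5]
right_tail = [28, 26, 32, 30, 2000]   # salaries in £'000 with one very large value
left_tail = [-x for x in right_tail]  # mirror image, so the long tail is on the left

print(skewness(symmetric))                                # 0.0
print(skewness(right_tail) > 0, skewness(left_tail) < 0)  # True True
```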


Asymmetric distributions

• Financial time series often exhibit negative skewness


• Returns are often characterised by a long left tail, which indicates a higher probability of large negative returns
• You may have heard of this in the context of so
called “Black Swan” events
• Very rare to see large positive returns on any
given day...
• Investors are mindful of this and would like to hold
assets which do not exhibit negative skewness
Summary
• Graphical summary of data
bar chart
pie chart
• Numerical measures of data
Measures of location
Measures of dispersion
Measure of symmetry

• Next week, we are going to talk about probability (Barrow Chapter 2)
