Stats Lecture1

The lecture introduces statistical tools for designing experiments, summarizing data, and quantifying uncertainty, with a focus on principles rather than mathematical details. It covers basic to advanced topics in statistics, including summary statistics, hypothesis testing, and data visualization through case studies, specifically analyzing the effects of different diets on rat weights. Key concepts such as standard error, normal distribution, and p-values are discussed, setting the stage for future lectures on t-tests and confidence intervals.

Uploaded by

Ali Asif
Copyright: © All Rights Reserved

Statistics & Data Analysis

Lecture 1: Introduction
Dr Stephen Sawiak

Physiology, Development and Neuroscience


Overview

Statistical tools help:

1. Design experiments to address scientific questions and test hypotheses
2. Summarise data to share with others
3. Quantify uncertainty and confidence in the results

These lectures aim to give an overview of the principles and tools for successful experiments without focusing on mathematical details

It is important to understand the basis of how the tools work and the assumptions they make, so that they are used appropriately
These six lectures

Topics range from the basics for beginners to advanced modelling

A catalogue of techniques: it is important to understand the assumptions and ideas behind each one

1: basic summary statistics, plots, outliers, Normal distribution, hypothesis testing
2: t-tests, confidence intervals
3: correlation and regression (significance, comparing correlations, controlling for nuisance variables)
4: analysis of variance (many groups, problems with multiple comparisons)
5: power calculations, sample size, linear models
6: statistics in the wild – walkthrough of real papers, extra topics (clustering, non-parametric statistics)
Software packages

Simple tasks readily accomplished in Excel

Dedicated statistics packages exist for some methods (e.g. R, Matlab), with many Web-based tutorials available

The most important thing is to know what methods are available and when each is appropriate: let software do the hard work

These lectures are not about software, but I will point out commands / methods or go through some standard output where it is helpful
This Lecture

Rat diet case study
Exploring data: the mean, median and mode
Introduce measures of spread, standard error, normal distribution
Plots: bar graphs, histograms, scatter plots, error bars
Testing for normally distributed data with QQ-plots
Hypothesis testing and p-values
Case study: rat diet

Rats fed diet A vs. diet B

12 rats fed under each regime for 3 months and weighed at the end

Does the regime affect their weight?


Describing data

Diet regime A

weightsA=[376.9,411.1,416.6,367.3,393.2,190.5,446.0,401.0,433.5,366.6,180.4,399.8];
mean(weightsA) = 365.3g
std(weightsA) = 87.4g

Diet regime B

weightsB=[354.1,406.8,372.2,392.0,379.1,396.1,378.0,399.8,404.8,424.2,386.1,347.2];
mean(weightsB) = 386.7g
std(weightsB) = 22.1g

Mean weight: A (365.3g) < B (386.7g)

‘Rats fed diet B are heavier’
Describing data: graphs
Measuring variability

Range: max-min indicates the spread of values: sensitive to outliers

Variance: ‘average of squared differences from the sample mean’

Standard deviation is the square root of the sample variance (roughly 2/3 of data within one standard deviation of the mean for normally distributed data)
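
As a rough sketch, the three measures of spread can be computed for diet A directly from their definitions. The variable names are illustrative, and dividing by n-1 is an assumption about the exact form of the 'average' on the slide; it is the default used by MATLAB's var and std.

```matlab
% Diet A weights from the case study (grams)
weightsA = [376.9,411.1,416.6,367.3,393.2,190.5,446.0,401.0,433.5,366.6,180.4,399.8];

n      = numel(weightsA);
rangeA = max(weightsA) - min(weightsA);   % spread between the two extremes
devs   = weightsA - mean(weightsA);       % differences from the sample mean
varA   = sum(devs.^2) / (n - 1);          % sample variance (assumed n-1 divisor)
sdA    = sqrt(varA);                      % standard deviation

fprintf('range = %.1f g, variance = %.1f g^2, sd = %.1f g\n', rangeA, varA, sdA);
% The built-in functions should agree with the hand calculation
fprintf('built-in: var = %.1f g^2, std = %.1f g\n', var(weightsA), std(weightsA));
```

The range is set entirely by the lightest and heaviest rat, which is why it reacts so strongly to outliers.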
Standard error

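Imagine a huge supply of rats that you can sample as much as you want, as many times as you want: the mean differs from sample to sample, and the standard error of the mean (SEM) estimates the spread of those sample means

SEM = s / √n, where s is the sample standard deviation and n is the number of rats in the sample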

Histogram: mean 200, standard deviation 20

hist(randn([1 1e6])*20+200,128);   % one million Normal samples, 128 bins
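
The 'huge supply of rats' idea can be tried numerically. This sketch repeatedly draws samples from the Normal population used in the histogram (mean 200, standard deviation 20) and compares the spread of the sample means with the SEM formula. The sample size of 12, the number of repeats and the random seed are assumptions chosen for illustration.

```matlab
rng(1);                      % fixed seed so the run is repeatable (arbitrary choice)
mu = 200; sigma = 20;        % population from the histogram example
n = 12;                      % rats per sample (assumed, as in the case study)
nRepeats = 1e5;              % number of repeated experiments (assumed)

samples     = randn(n, nRepeats) * sigma + mu;   % each column is one sample of n rats
sampleMeans = mean(samples, 1);                  % one mean per repeated experiment

observedSpread = std(sampleMeans);    % spread of the sample means
predictedSEM   = sigma / sqrt(n);     % SEM formula using the known population sd

fprintf('sd of sample means = %.2f, sigma/sqrt(n) = %.2f\n', observedSpread, predictedSEM);
```

In a real experiment only one sample is available, so s from that sample stands in for the population standard deviation.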
Error bars

“Data indicate group means, error bars indicate 1 SEM.”

Rat diet study

A: mean 365.3g; standard error 87.4/√12 ≈ 25.2g
B: mean 386.7g; standard error 22.1/√12 ≈ 6.4g
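
A minimal sketch of a bar graph with error bars for the rat diet data, assuming the weights listed earlier and SEM = s/√n. The plotting choices (black markers, axis labels) are illustrative rather than taken from the slides.

```matlab
weightsA = [376.9,411.1,416.6,367.3,393.2,190.5,446.0,401.0,433.5,366.6,180.4,399.8];
weightsB = [354.1,406.8,372.2,392.0,379.1,396.1,378.0,399.8,404.8,424.2,386.1,347.2];

groupMeans = [mean(weightsA), mean(weightsB)];
groupSEMs  = [std(weightsA)/sqrt(numel(weightsA)), ...
              std(weightsB)/sqrt(numel(weightsB))];

bar(1:2, groupMeans);                          % bars show the group means
hold on;
errorbar(1:2, groupMeans, groupSEMs, 'k.');    % error bars show +/- 1 SEM
hold off;
set(gca, 'XTick', 1:2, 'XTickLabel', {'A', 'B'});
xlabel('Diet regime');
ylabel('Weight (g)');
```

Whatever the error bars represent (SEM, standard deviation or something else) should always be stated in the figure legend, as in the quoted caption.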

Scatterplot
Median, percentiles and mode

Are there values of “average” that are less sensitive to outliers?

Median: sort the data, the median is the value in the middle (here just 9 points)

Sorted weights (g): 180 191 367 367 377 393 400 401 411

Mean: 343g, biased by the two unexpectedly light rats
Mode: most common value (367g)
Median: 377g, the 50th percentile. A value can be calculated at any position: 0% and 100% are the lowest and highest values, and here the 33rd percentile is approximately 367g.
Percentiles can be more robust to outliers
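
The averages on this slide can be checked with a short sketch. It assumes the nine points are the rounded weights 180 to 411 g, and it uses one simple percentile rule (the k-th of n sorted points sits at 100·(k-1)/(n-1) percent, with linear interpolation in between). Statistics packages use slightly different percentile conventions, so their results can differ a little.

```matlab
w = [180 191 367 367 377 393 400 401 411];   % nine rounded weights (g), assumed from the slide
n = numel(w);
wSorted = sort(w);

avgMean   = mean(w);      % pulled down by the two light rats
avgMedian = median(w);    % middle value of the sorted data
avgMode   = mode(w);      % most common value

% Simple percentile rule (an assumption): lowest point at 0%, highest at 100%
pct = 100 * (0:n-1) / (n-1);
p33 = interp1(pct, wSorted, 33);   % linear interpolation between neighbouring points

fprintf('mean = %.0f g, median = %.0f g, mode = %.0f g, 33rd percentile = %.0f g\n', ...
        avgMean, avgMedian, avgMode, p33);

% Dropping the two light rats moves the mean far more than the median
wTrim = wSorted(3:end);
fprintf('without the two light rats: mean = %.0f g, median = %.0f g\n', ...
        mean(wTrim), median(wTrim));
```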

[Figure: percentiles of the data from 0% to 100%, with the mean marked]
Boxplots

max
75th percentile
median
25th percentile
min

+ outlier

boxplot([weightsA' weightsB'])   % one column per diet, so each diet gets its own box
Normal distribution

Normal distribution is defined by two parameters:

Mean
Standard deviation

[Figure: Normal curve with the mean and standard deviation marked; areas labelled 68.3% and 13.7%]
Cumulative probability distribution

Cumulative probability    Standard deviations from the mean
90%                       +1.3
50%                        0
31%                       -0.5
10%                       -1.3
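
The cumulative probabilities above can be reproduced with the standard relation between the Normal cumulative distribution and the error function. This sketch uses only base MATLAB; the helper name Phi is illustrative.

```matlab
% Standard Normal cumulative probability written with the base-MATLAB erf function
Phi = @(z) 0.5 * (1 + erf(z / sqrt(2)));

z = [1.3 0 -0.5 -1.3];     % positions in standard deviations from the mean
p = Phi(z);                % fraction of values expected below each position

for k = 1:numel(z)
    fprintf('%+.1f sd from the mean: %.0f%% of values fall below\n', z(k), 100 * p(k));
end
```

Rounded to whole percentages this gives 90%, 50%, 31% and 10%.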
Is my data normally distributed?

Compare percentiles of the data against percentiles of the Normal distribution

e.g. 10% of the data should lie more than 1.3 standard deviations below the mean

50% should lie below the mean

Rather than do a detailed comparison, plot percentiles of the data against the expected percentiles of the Normal distribution

qqplot(weightsA)
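
To show what a QQ-plot does, here is a hand-built sketch using base MATLAB only. The plotting positions (i - 0.5)/n are one common convention and are an assumption here, not necessarily the rule qqplot itself applies.

```matlab
weightsA = [376.9,411.1,416.6,367.3,393.2,190.5,446.0,401.0,433.5,366.6,180.4,399.8];

n          = numel(weightsA);
dataSorted = sort(weightsA);                 % observed percentiles of the data
probs      = ((1:n) - 0.5) / n;              % plotting positions (assumed convention)
zExpected  = sqrt(2) * erfinv(2*probs - 1);  % matching standard Normal percentiles

plot(zExpected, dataSorted, 'o');
xlabel('Expected Normal percentile (standard deviations from the mean)');
ylabel('Observed weight (g)');
% Roughly Normal data should fall close to a straight line; the two light
% rats in diet A should sit well below the trend at the left of the plot.
```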
How bad does the plot have to be to panic?

A few hundred samples drawn from a normal distribution and assessed by QQ-plot

Considerable variation, especially with small samples
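
A quick way to calibrate your eye is to draw QQ-plots of samples that are known to be Normal. This sketch draws six samples of 12 points each; the sample size, number of panels and seed are assumptions for illustration, and the plotting positions follow the common (i - 0.5)/n convention.

```matlab
rng(2);                                      % fixed seed (arbitrary choice)
n         = 12;                              % small sample, as in the case study
probs     = ((1:n) - 0.5) / n;               % plotting positions (assumed convention)
zExpected = sqrt(2) * erfinv(2*probs - 1);   % expected standard Normal percentiles

for k = 1:6
    subplot(2, 3, k);
    x = sort(randn(1, n));                   % a genuinely Normal sample of n points
    plot(zExpected, x, 'o', zExpected, zExpected, '-');   % points and the ideal line
    title(sprintf('Normal sample %d', k));
end
```

Increasing n to a few hundred should make the points hug the line much more closely.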
Deviations from normality
Hypothesis testing

Rats fed diet A vs. diet B

Does the regime affect their weight?

Assume there is no effect of diet: how likely is it that we would see a difference that large?
Null hypothesis

If all the rats were the same, what is the probability of the mean of rat “B” weights
being greater than the mean of rat “A” weights by at least as much as observed?
Simulated null experiments

Assume we draw 12 samples from the closest normal distribution to these data and call them “rats A”.
Draw another 12 samples and call them “rats B”. Calculate the difference in their means. Repeat 1 million times.

p = 0.17
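
A sketch of such a simulation is shown below. It assumes that the 'closest normal distribution' means a Normal with the mean and standard deviation of all 24 weights pooled together, takes the group sizes from the data, and uses 100,000 repeats rather than 1 million to keep the run short. Because the result depends on those choices, it need not reproduce the quoted p = 0.17 exactly.

```matlab
rng(3);                                       % fixed seed (arbitrary choice)
weightsA = [376.9,411.1,416.6,367.3,393.2,190.5,446.0,401.0,433.5,366.6,180.4,399.8];
weightsB = [354.1,406.8,372.2,392.0,379.1,396.1,378.0,399.8,404.8,424.2,386.1,347.2];

allWeights = [weightsA weightsB];
mu    = mean(allWeights);                     % null model: one shared mean ...
sigma = std(allWeights);                      % ... and one shared spread (assumed fit)
nA = numel(weightsA);
nB = numel(weightsB);
nSims = 1e5;                                  % number of simulated null experiments

simA = randn(nA, nSims) * sigma + mu;         % each column is one simulated group A
simB = randn(nB, nSims) * sigma + mu;         % each column is one simulated group B
nullDiffs = mean(simB, 1) - mean(simA, 1);    % differences expected with no diet effect

observedDiff = mean(weightsB) - mean(weightsA);
pOneTailed = mean(nullDiffs >= observedDiff);             % B heavier by at least this much
pTwoTailed = mean(abs(nullDiffs) >= abs(observedDiff));   % difference in either direction

fprintf('observed difference = %.1f g\n', observedDiff);
fprintf('one-tailed p = %.3f, two-tailed p = %.3f\n', pOneTailed, pTwoTailed);
```

The p-value is simply the fraction of simulated null experiments whose difference is at least as large as the one observed.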
p-values

The p-value gives the probability of a result at least as extreme as the one observed, if the null hypothesis were true

One-tailed tests only consider one direction (an increase or a decrease, but not both); two-tailed tests consider that the effect could have been in either direction

Widely used (with some controversy): best used as an indicator of whether your results are “worth another look”

It is not quite what you would like to know:

it does not directly tell you how likely the null hypothesis is to be true
it does not give you the probability that your results are due to chance
p-values

For the rat data seen in this lecture, p = 0.17. These data do not offer good evidence to ‘reject the null hypothesis’ that the diet regimes are the same.

Conventionally, an arbitrary cut-off of p < 0.05 is often used: above it, the null hypothesis is treated as reasonably consistent with the data.

Lower p-values suggest less confidence in the null hypothesis as an explanation for the data.
Summary

Rat diet case study to explore data: the mean, median and mode
Introduce the standard error, normal distribution
Plots: bar graphs, histograms, scatter plots, error bars
Testing for normally distributed data with QQ-plots
Hypothesis testing and p-values

Next time: all about t-tests and confidence intervals, comparing means
