
Statistics & Data Analysis

Lecture 1: Introduction
Dr Stephen Sawiak

Physiology, Development and Neuroscience

Overview

Statistical tools help:

1. Design experiments to address scientific questions and test hypotheses

2. Summarise data to share with others
3. Quantify uncertainty and confidence in the results

These lectures aim to give an overview of the principles and tools for successful experiments, without focusing on mathematical details

It is important to understand how the tools work and what assumptions they make, so that they are used appropriately
These six lectures

Topics range from the basics for beginners to advanced modelling

A catalogue of techniques: it is important to understand the assumptions and ideas behind each one

1: basic summary statistics, plots, outliers, Normal distribution, hypothesis testing

2: t-tests, confidence intervals
3: correlation and regression (significance, comparing correlations, controlling for nuisance variables)
4: analysis of variance (many groups, problems with multiple comparisons)
5: power calculations, sample size, linear models
6: statistics in the wild – walkthrough of real papers, extra topics (clustering, non-parametric statistics)
Software packages

Simple tasks readily accomplished in Excel

Dedicated statistics packages (e.g. R, Matlab) exist for some methods, and there are many web-based tutorials

The most important thing is to know what methods are available and when each is appropriate: let software do the hard work

These lectures are not about software, but I will point out commands or methods, or go through some standard output, where it is helpful
This Lecture

Rat diet case study

Exploring data: the mean, median and mode
Introduce measures of spread, standard error, normal distribution
Plots: bar graphs, histograms, scatter plots, error bars
Testing for normally distributed data with QQ-plots
Hypothesis testing and p-values
Case study: rat diet

Rats fed diet A vs. diet B

12 rats fed under each regime for 3 months and weighed at the end

Does the regime affect their weight?

Describing data

Diet regime A

weightsA=[376.9,411.1,416.6,367.3,393.2,190.5,446.0,401.0,433.5,366.6,180.4,399.8];
mean(weightsA) = 365.3g
std(weightsA) = 87.4g (standard deviation)

Diet regime B

weightsB=[354.1,406.8,372.2,392.0,379.1,396.1,378.0,399.8,404.8,424.2,386.1,347.2];
mean(weightsB) = 386.7g
std(weightsB) = 22.1g

Comparing means, A (365.3g) < B (386.7g): 'Rats fed diet B are heavier'
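The slide's MATLAB commands map directly onto Python's standard-library `statistics` module. The sketch below is a translation for illustration, not the lecture's own code. It assumes the sample standard deviation (divisor n - 1), which is also the default for MATLAB's `std`.

```python
from statistics import mean, stdev

# Final body weights (g) after 3 months, copied from the slide
weights_a = [376.9, 411.1, 416.6, 367.3, 393.2, 190.5,
             446.0, 401.0, 433.5, 366.6, 180.4, 399.8]
weights_b = [354.1, 406.8, 372.2, 392.0, 379.1, 396.1,
             378.0, 399.8, 404.8, 424.2, 386.1, 347.2]

for name, w in (("A", weights_a), ("B", weights_b)):
    # stdev() is the sample standard deviation (divides by n - 1), like MATLAB's std
    print(f"Diet {name}: mean = {mean(w):.1f} g, SD = {stdev(w):.1f} g")
```

The twelve listed diet A values average 365.24g, so this prints 365.2g rather than the slide's 365.3g. The standard deviations agree with the slide to one decimal place.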
Describing data: graphs
Measuring variability

Range: max - min indicates the spread of values, but is sensitive to outliers

Variance: 'average of squared differences from the sample mean' (for a sample, the sum of squared differences is divided by n - 1 rather than n)

Standard deviation: the square root of the sample variance (for normally distributed data, roughly 2/3 of values lie within one standard deviation of the mean)
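These definitions can be written out directly. The following is a minimal sketch with hypothetical helper names that computes each measure for both diets. The variance divides by n - 1, the usual choice for a sample, which matches the standard deviations given earlier.

```python
import math

def sample_range(xs):
    """max - min: simple, but set entirely by the two most extreme values."""
    return max(xs) - min(xs)

def sample_variance(xs):
    """Sum of squared differences from the sample mean, divided by n - 1."""
    n = len(xs)
    m = sum(xs) / n
    return sum((x - m) ** 2 for x in xs) / (n - 1)

def sample_sd(xs):
    """Standard deviation: square root of the sample variance."""
    return math.sqrt(sample_variance(xs))

weights_a = [376.9, 411.1, 416.6, 367.3, 393.2, 190.5,
             446.0, 401.0, 433.5, 366.6, 180.4, 399.8]
weights_b = [354.1, 406.8, 372.2, 392.0, 379.1, 396.1,
             378.0, 399.8, 404.8, 424.2, 386.1, 347.2]

for name, w in (("A", weights_a), ("B", weights_b)):
    print(f"Diet {name}: range = {sample_range(w):.1f} g, "
          f"variance = {sample_variance(w):.0f} g^2, SD = {sample_sd(w):.1f} g")
```

The range of diet A (265.6g) is more than three times that of diet B (77.0g), almost entirely because of the two light rats.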
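Standard error

Imagine a huge supply of rats that you can sample as much as you want, as many times as you want. Each sample of n rats gives a slightly different mean. The standard deviation of those sample means is the standard error of the mean (SEM), which can be estimated from a single sample as

$\mathrm{SEM} = s/\sqrt{n}$

where s is the sample standard deviation and n is the number of observations.

Histogram: mean 200, st dev 20

hist(randn([1 1e6])*20+200,128);  % 1 million Normal draws (mean 200, SD 20) in 128 bins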
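The 'huge supply of rats' thought experiment can be simulated. The sketch draws many samples of n rats from a Normal population with mean 200g and SD 20g, as in the histogram. It records each sample's mean and compares the spread of those means with s/√n. It uses Python's `random` module. The sample size n = 12, the number of repeats and the seed are arbitrary choices.

```python
import math
import random
from statistics import stdev

def simulate_sample_means(mu, sigma, n, repeats, seed=0):
    """Draw `repeats` samples of size n from Normal(mu, sigma); return each sample's mean."""
    rng = random.Random(seed)
    return [sum(rng.gauss(mu, sigma) for _ in range(n)) / n for _ in range(repeats)]

mu, sigma, n = 200, 20, 12   # population as in the histogram; 12 rats per sample
means = simulate_sample_means(mu, sigma, n, repeats=20_000)

print(f"SD of simulated sample means: {stdev(means):.2f}")
print(f"sigma / sqrt(n):              {sigma / math.sqrt(n):.2f}")
```

The two printed numbers should be close. Increasing n shrinks both, in proportion to 1/√n.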
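Error bars

Example figure caption: "Data indicate group means; error bars indicate 1 SEM."

Rat diet study
[Figure: mean weight for diets A and B, each with an error bar of 1 standard error]

Scatterplot
[Figure: scatter plot of the individual rat weights]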
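Applying s/√n to the rat data gives the length of each group's error bar. The following is a short sketch in which the `sem` helper is a hypothetical name.

```python
import math
from statistics import mean, stdev

def sem(xs):
    """Standard error of the mean estimated from one sample: s / sqrt(n)."""
    return stdev(xs) / math.sqrt(len(xs))

weights_a = [376.9, 411.1, 416.6, 367.3, 393.2, 190.5,
             446.0, 401.0, 433.5, 366.6, 180.4, 399.8]
weights_b = [354.1, 406.8, 372.2, 392.0, 379.1, 396.1,
             378.0, 399.8, 404.8, 424.2, 386.1, 347.2]

for name, w in (("A", weights_a), ("B", weights_b)):
    print(f"Diet {name}: {mean(w):.1f} ± {sem(w):.1f} g (mean ± 1 SEM, n = {len(w)})")
```

Diet A's SEM (about 25.2g) is roughly four times diet B's (about 6.4g), mostly because of the two unexpectedly light rats.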
Median, percentiles and mode

Are there measures of "average" that are less sensitive to outliers?

Median: sort the data; the median is the value in the middle (here just 9 points, so the 5th value, 377g):
180 191 367 367 377 393 400 401 411

Mean: 343g, biased by the two unexpectedly light rats
Mode: the most common value (367g)
Median = 50th percentile: the value at any percentile can be calculated, e.g. the 0th and 100th percentiles are the lowest and highest values; here the 33rd percentile is approximately 367g
Percentiles can be more robust to outliers than the mean

[Figure: the sorted data with percentiles from 0% to 100% and the mean marked]
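The nine weights above can be summarised with Python's `statistics` module (3.8+). For percentiles, this sketch assumes linear interpolation between sorted values (`method="inclusive"`, which puts the minimum at 0% and the maximum at 100%). Other packages may use slightly different percentile conventions.

```python
from statistics import mean, median, mode, quantiles

# The nine sorted weights (g) shown on the slide
weights = [180, 191, 367, 367, 377, 393, 400, 401, 411]

print("mean  :", mean(weights))    # pulled down by the two light rats
print("median:", median(weights))  # middle (5th) of the 9 sorted values
print("mode  :", mode(weights))    # most common value

# 1st..99th percentiles, interpolating linearly between neighbouring values
pct = quantiles(weights, n=100, method="inclusive")
print("33rd percentile:", pct[32])
print("50th percentile:", pct[49])  # equals the median
```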
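Boxplots

A boxplot marks, from top to bottom:
max (end of the upper whisker, excluding outliers)
75th percentile
median
25th percentile
min (end of the lower whisker, excluding outliers)
+ outlier

boxplot([weightsA' weightsB'], 'Labels', {'A','B'})  % transpose to columns: one box per diet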
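The numbers behind a boxplot can be computed directly. This sketch uses a common rule, which is also the default whisker setting of MATLAB's `boxplot`: points more than 1.5 × IQR beyond the quartiles are flagged as outliers, and each whisker stops at the most extreme remaining point. Quartile conventions differ between packages, so the box edges may not match MATLAB's exactly. `box_summary` is a hypothetical helper.

```python
from statistics import median, quantiles

def box_summary(xs, whisker=1.5):
    """Numbers drawn in a boxplot: quartiles, whisker ends and outliers.

    Points more than `whisker` x IQR beyond the quartiles are flagged as
    outliers; each whisker ends at the most extreme point not flagged.
    """
    q1, _, q3 = quantiles(xs, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - whisker * iqr, q3 + whisker * iqr
    kept = [x for x in xs if low <= x <= high]
    return {
        "min": min(kept), "q1": q1, "median": median(xs), "q3": q3,
        "max": max(kept),
        "outliers": sorted(x for x in xs if x < low or x > high),
    }

weights_a = [376.9, 411.1, 416.6, 367.3, 393.2, 190.5,
             446.0, 401.0, 433.5, 366.6, 180.4, 399.8]
weights_b = [354.1, 406.8, 372.2, 392.0, 379.1, 396.1,
             378.0, 399.8, 404.8, 424.2, 386.1, 347.2]

for name, w in (("A", weights_a), ("B", weights_b)):
    print(name, box_summary(w))
```

For diet A, the two light rats (180.4g and 190.5g) are flagged as outliers. Diet B has none.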
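Normal distribution

The Normal distribution is defined by two parameters:
mean
standard deviation

[Figure: Normal curve marked in standard deviations from the mean; 68.3% of values lie within 1 standard deviation of the mean, and 13.6% between 1 and 2 standard deviations on each side]

Cumulative probability distribution (proportion of values below a given number of standard deviations from the mean):
90%: +1.3
50%: 0
31%: -0.5
10%: -1.3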
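The percentages on the curve and the cumulative values can be checked with `statistics.NormalDist` (Python 3.8+). The sketch uses a standard Normal distribution (mean 0, SD 1), so results are in units of standard deviations.

```python
from statistics import NormalDist

z = NormalDist()  # standard Normal: mean 0, standard deviation 1

# Proportion within 1 SD of the mean, and between 1 and 2 SD on one side
within_1sd = z.cdf(1) - z.cdf(-1)
between_1_and_2 = z.cdf(2) - z.cdf(1)
print(f"within 1 SD:        {within_1sd:.1%}")
print(f"between 1 and 2 SD: {between_1_and_2:.1%}")

# Inverse cumulative distribution: how many SDs from the mean each percentile lies
for p in (0.10, 0.31, 0.50, 0.90):
    print(f"{p:.0%} -> {z.inv_cdf(p):+.2f} SD")
```

For data with a given mean and SD, `NormalDist(mu, sigma)` gives the same answers in the original units.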
Is my data normally distributed?

Compare percentiles of the data against percentiles of the Normal distribution, e.g. 10% of the data should lie at least 1.3 standard deviations below the mean, and 50% below the mean

Rather than do a detailed comparison, plot percentiles of the data against the expected percentiles of the Normal distribution (a QQ-plot)

qqplot(weightsA)
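What `qqplot` draws can be sketched by pairing each sorted data value with the Normal quantile expected at its position. This sketch assumes plotting positions (i - 0.5)/n, one common convention; packages differ in the details. It prints the pairs rather than plotting them, and `qq_points` is a hypothetical helper.

```python
from statistics import NormalDist, mean, stdev

def qq_points(xs):
    """(expected Normal quantile, sorted data value) pairs for a QQ-plot.

    Uses plotting positions (i - 0.5) / n, one common convention.
    """
    n = len(xs)
    std_normal = NormalDist()  # mean 0, SD 1
    return [(std_normal.inv_cdf((i - 0.5) / n), x)
            for i, x in enumerate(sorted(xs), start=1)]

weights_a = [376.9, 411.1, 416.6, 367.3, 393.2, 190.5,
             446.0, 401.0, 433.5, 366.6, 180.4, 399.8]
m, s = mean(weights_a), stdev(weights_a)

for q, x in qq_points(weights_a):
    # For Normal data, each value should lie near the straight line m + s * q
    print(f"{q:+.2f} SD   data {x:6.1f} g   line {m + s * q:6.1f} g")
```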
How bad does the plot have to be to panic?

A few hundred samples drawn from a Normal distribution and assessed by QQ-plot show considerable variation, especially with small samples
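One way to see how much QQ-plots of genuinely Normal data vary is to summarise each plot by a single number. This sketch uses the correlation between the sorted data and the Normal quantiles, which is close to 1 for a straight line, and repeats the calculation for many simulated samples. This summary statistic is a choice made for this sketch, not something used in the lecture. The sample sizes, the number of repeats and the seed are arbitrary.

```python
import math
import random
from statistics import NormalDist

def qq_correlation(xs):
    """Correlation between sorted data and Normal quantiles at (i - 0.5)/n.

    Values near 1 mean the QQ-plot is close to a straight line.
    """
    n = len(xs)
    qs = [NormalDist().inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    ys = sorted(xs)
    mq, my = sum(qs) / n, sum(ys) / n
    cov = sum((q - mq) * (y - my) for q, y in zip(qs, ys))
    sq = math.sqrt(sum((q - mq) ** 2 for q in qs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sq * sy)

rng = random.Random(1)
results = {}
for n in (10, 300):
    # 200 samples of size n, all genuinely drawn from a Normal distribution
    results[n] = [qq_correlation([rng.gauss(0, 1) for _ in range(n)])
                  for _ in range(200)]
    print(f"n = {n:3d}: QQ correlation from {min(results[n]):.3f} "
          f"to {max(results[n]):.3f}")
```

The small samples produce a much wider range of correlations, even though every sample comes from a Normal distribution.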
Deviations from normality
Hypothesis testing

Rats fed diet A vs. diet B

Does the regime affect their weight?

Assume there is no effect of diet: how likely is it that we would see a difference at least this large?
Null hypothesis

If all the rats were the same, what is the probability of the mean weight of the diet B rats exceeding that of the diet A rats by at least as much as observed?
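Simulated null experiments

Assume there is no effect of diet. Draw 12 samples from the Normal distribution closest to these data and call them "rats A"; draw another 12 samples and call them "rats B". Calculate the difference in their means. Repeat 1 million times. The fraction of simulated differences at least as large as the observed one estimates the p-value:

p = 0.17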
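The simulated null experiment can be sketched as follows. The slide does not fully specify the procedure, so this sketch makes several assumptions. 'The closest Normal distribution' is taken to be one with the mean and SD of all 24 rats pooled. The test is one-tailed (B heavier than A). It uses 20,000 repeats instead of 1 million to keep the run quick. Because of these choices, the printed estimate may not match the slide's p = 0.17 exactly. `simulated_null_p` is a hypothetical helper.

```python
import random
from statistics import mean, stdev

def simulated_null_p(a, b, repeats=20_000, seed=0):
    """One-tailed p-value by simulating experiments in which diet has no effect.

    Both simulated groups come from a single Normal distribution fitted to
    the pooled data (an assumption). Returns the observed difference
    mean(b) - mean(a) and the fraction of simulated differences at least
    that large.
    """
    observed = mean(b) - mean(a)
    pooled = a + b
    mu, sigma = mean(pooled), stdev(pooled)
    rng = random.Random(seed)
    hits = 0
    for _ in range(repeats):
        sim_a = sum(rng.gauss(mu, sigma) for _ in range(len(a))) / len(a)
        sim_b = sum(rng.gauss(mu, sigma) for _ in range(len(b))) / len(b)
        if sim_b - sim_a >= observed:
            hits += 1
    return observed, hits / repeats

weights_a = [376.9, 411.1, 416.6, 367.3, 393.2, 190.5,
             446.0, 401.0, 433.5, 366.6, 180.4, 399.8]
weights_b = [354.1, 406.8, 372.2, 392.0, 379.1, 396.1,
             378.0, 399.8, 404.8, 424.2, 386.1, 347.2]

observed, p = simulated_null_p(weights_a, weights_b)
print(f"observed difference (B - A) = {observed:.1f} g, simulated p = {p:.2f}")
```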
p-values

The p-value gives the probability of a result at least as extreme as the one observed, if the null hypothesis were true

One-tailed tests only consider one direction (an increase or a decrease, but not both); two-tailed tests allow for the effect having been in either direction

Widely used (with some controversy): best used as an indicator of whether your results are “worth another
look”

It is not quite what you would like to know:

it does not directly tell you how likely the null hypothesis is to be true
it does not give you the probability that your results are due to chance
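Given a set of simulated null differences, one-tailed and two-tailed p-values differ only in which differences count as 'at least as extreme'. The following is a minimal sketch that uses a hypothetical toy null distribution for illustration. `p_values` is a hypothetical helper.

```python
def p_values(null_diffs, observed):
    """One- and two-tailed p-values from a simulated null distribution.

    One-tailed: fraction of null differences at least as large as the
    observed one, in the observed direction.
    Two-tailed: fraction at least as extreme in either direction.
    """
    n = len(null_diffs)
    if observed >= 0:
        one = sum(d >= observed for d in null_diffs) / n
    else:
        one = sum(d <= observed for d in null_diffs) / n
    two = sum(abs(d) >= abs(observed) for d in null_diffs) / n
    return one, two

# Toy null distribution (hypothetical values, for illustration only)
null_diffs = [-3, -2, -1, 0, 1, 2, 3]
print(p_values(null_diffs, 2))  # (2/7, 4/7): the two-tailed value is twice as large here
```

For a symmetric null distribution, the two-tailed p-value is about twice the one-tailed value.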
p-values

For the rat data seen in this lecture, p = 0.17. These data do not offer good evidence to ‘reject the null
hypothesis’ that the diet regimes are the same.

Conventionally, an arbitrary cut-off (usually p < 0.05) is used to decide whether the null hypothesis is reasonably consistent with the data.

Lower p-values suggest less confidence in the null hypothesis as an explanation for the data.
Summary

Rat diet case study to explore data: the mean, median and mode
Introduce the standard error, normal distribution
Plots: bar graphs, histograms, scatter plots, error bars
Testing for normally distributed data with QQ-plots
Hypothesis testing and p-values

Next time: all about t-tests and confidence intervals, comparing means
