
101827-FS2018-0: Programming with MATLAB: Advanced course

Felix Wichmann
Neural Information Processing Group and Bernstein Center for Computational Neuroscience, Eberhard Karls Universität Tübingen & Max Planck Institute for Intelligent Systems, Tübingen

3: fitting, regression & computational statistics
I am doing psychology and not mathematics or physics—so why should I care
about MATLAB and mathematics?
First, this course will be super-light on mathematical details.
Of course, quantitative analysis of data is everywhere in most areas of
psychological research, either openly or disguised in “cookbook-recipe-
style” statistics.
In particular, some of you may be planning to do EEG or functional imaging,
which generates lots of very high-dimensional data, for which statistical
analysis can be challenging.
Software packages (e.g. SPM or Brain Voyager) can do a lot of work for
you, but are not always applicable, or sometimes “overkill” for a simple
analysis. In addition, it is important to understand what you are doing, and
packages often hide many of the important details and assumptions from
you.

3
Using MATLAB to fit functions to your data: Linear functions

[Figure: reaction time plotted against task difficulty]

4
Using MATLAB to fit functions to your data: polynomials

[Figure: reaction time plotted against task difficulty]

5
Using MATLAB to fit functions to your data: general nonlinear functions

[Figure: lab measurement plotted against day after first treatment]

6
Plan for today, part 1
First, you download the file RegressionData.mat from ILIAS, and read about
the very convenient MATLAB function polyfit (help polyfit).
Second, try and fit a straight line to the data y and x1 in the above data
file (a minimal starting sketch follows after this list). Plot data and your fitted line—does it look “good”?
Third, try and fit a parabola (polynomial of degree 2) to the data y and x2.
Plot data and your fitted parabola—does it look “good”?
Fourth, try and fit a polynomial to the data y and x3. Plot data and your
fitted polynomial—does it look “good”? Which order of a polynomial do
you need to fit to get a “good” fit?
Finally, study the FunctionFittingDemoScript.m and make sure you
understand exactly what is going on; ideally, generate new datasets
yourself and try to fit them.
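
As a starting point, here is a minimal sketch of the straight-line fit, assuming RegressionData.mat indeed contains vectors x1 and y as described above (check the actual variable names after loading):

    load('RegressionData.mat');            % should provide x1, x2, x3 and y
    p  = polyfit(x1, y, 1);                % fit a straight line (polynomial of degree 1)
    xf = linspace(min(x1), max(x1), 100);  % fine grid for plotting the fitted line
    figure; hold on;
    plot(x1, y, 'ko');                     % the raw data
    plot(xf, polyval(p, xf), 'r-');        % the fitted line
    xlabel('x1'); ylabel('y');

For the parabola and the higher-order polynomials, only the third argument of polyfit (the degree) needs to change.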

7
Statistics
How many of you have taken a statistics class before?
How many of you like statistics?
Science is (often) about formulating and testing hypotheses.
Testing hypotheses requires statistics.
In addition, some knowledge of statistics is useful for interpreting
information in your everyday life.

8
Conventional statistical analyses can be really annoying
a) Identify the problem you want to solve.
I want to find out if attention decreases detection thresholds. I need to
test if detection thresholds were significantly lower in the attended than in
the unattended condition.
b) Decide which test you need to use. (You might have to ask a colleague/
flatmate that does statistics/friend/supervisor/use google/read a book.)
To test this, you will need a t-test/Chi-Square-test-for-variance/Q-test/F-
test/Kruskal-Wallis-test/Two-proportion-Z-test/Jarque-Bera-test/
Kolmogorov-Smirnov-test/Ansari-Bradley-test/Student-t-test/Bla-Bla-
Bla-test...
c) Find out which function in which software implements the test, and how
to use it.
You need the function ttest2 from the MATLAB statistics toolbox, and you
will need to pass your data x and y, the significance level (as probability,
not percent) and the tailedness (left/right/both).
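
For illustration only (the variable names attended and unattended are hypothetical, not from the course data), such a call could look like this:

    % Are thresholds lower in the attended condition?
    % 'Tail','left' tests the alternative that the mean of the first input is smaller.
    [h, p] = ttest2(attended, unattended, 'Alpha', 0.05, 'Tail', 'left');
    % h == 1 means the null hypothesis of equal means is rejected at the 5% level;
    % p is the corresponding p-value.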
9
Conventional statistical analyses can be really annoying (cont’d)
d) Your annoying colleague/supervisor/referee/examiner asks “Are the
assumptions of your test satisfied?”
Is the data Gaussian with equal variances? Are your residuals uncorrelated
and homoscedastic? Do you have enough data for a test that is only valid
asymptotically?
Testing whether the assumptions are truly satisfied might require another
test … thus you have to go back to b.

10
A brief glance at the history of statistical testing
Student’s t-test was developed in 1908 by William S. Gosset under the pseudonym “Student”. He was a chemist working at the Guinness Brewery in Dublin.
Statistical testing dates back to John Arbuthnot, 1694, and the first (known) lecture series on it was given by Karl Pearson in 1893.
Picture source: Wikipedia

There were no computers in 1908. Statistical tests had to be such that they
could be done with pen, paper and lists of numbers.
To make this possible, one needed to write down mathematical formulas which
describe the distribution of the test statistic. Thus, tests were tailor-made to
specific simplifying assumptions (e.g. the data are Gaussian), and often used
approximations which are valid for large N (asymptotics).

11
A brief glance at the history of statistical testing (cont’d)
Thus, for classical tests you have to …
1. know the name of the test
2. check its distributional assumptions
3. (sometimes) live with the fact that they are only approximations.

12
Good news: With computers (and MATLAB), statistics can be much easier

Key question in statistics: Could this effect just come about by chance?

If the null-hypothesis were true—i.e. there is no difference between condition a and b—would we still observe similar—or more extreme—effects just by chance?
You play dice with a friend, and he has suspiciously many ‘sixes’. Out of 60
attempts, he had 15 sixes. How can you test whether his die is fair?
If you have a fair die (and lots of time), play this game 1000 times. If you
observe 15 (or more) sixes only rarely (e.g. less than 5% of the time), then
his die is ‘significantly unfair’.

13
Good news: With computers (and MATLAB), statistics can be much easier (cont’d)

In MATLAB, we can simulate data for which the null-hypothesis is true—MATLAB is our perfectly fair die—and it is very fast! See the demo IsThisDieFair.m

Classical approach: Binomial test, p-value is about 6.5%.
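
As a rough sketch of the simulation idea (not the actual IsThisDieFair.m; the numbers below are the ones from the example):

    nGames   = 10000;                     % number of simulated games with a fair die
    nRolls   = 60;                        % rolls per game
    observed = 15;                        % sixes observed in the real game
    rolls    = randi(6, nRolls, nGames);  % each column is one game of 60 rolls
    sixes    = sum(rolls == 6);           % number of sixes per simulated game
    pSim     = mean(sixes >= observed);   % fraction of games at least as extreme
    % For comparison, the classical binomial test (Statistics Toolbox):
    pBinom   = 1 - binocdf(observed - 1, nRolls, 1/6);   % roughly 0.065

The simulated p-value pSim should come out close to the binomial-test value quoted above.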


Note: I am not saying that classical statistical methods are bad. If you
know the name of the test that is appropriate for your problem, and you
know how to use it, go ahead.
In fact, MATLAB has functions for many classical statistical tests in the
Statistics Toolbox. Type “available hypothesis tests” into the
MATLAB documentation search.

14
Machine Learning
Comparatively new sub-branch of computational statistics jointly
developed in computer science and statistics.
Machine learning is empirical inference performed by computers based on
past observations and learning algorithms: Machine learning algorithms
are mainly concerned with discovering hidden structure in data in order to
predict novel data—exploratory methods.
Machine learning—and in particular kernel methods as well as
convolutional deep neural networks—has proven successful whenever
there is an abundance of empirical data but a lack of explicit knowledge
of how the data were generated.

15
Regularisation & Cross-validation
Find a compromise between complexity and classification performance (or
goodness-of-fit in classical statistics).
Penalise complex functions via a regularisation term or regulariser.
Cross-validate the results (leave-one-out or 10-fold typically used).

16
Polynomial Curve Fitting

[Figure: N = 10 data points t plotted against x]

N = 10 datapoints (training set): x = (x_1, …, x_N) and t = (t_1, …, t_N)

Prediction game: t* for a new x*

Choose a polynomial: $y(x, \mathbf{w}) = w_0 + w_1 x + w_2 x^2 + \dots + w_M x^M = \sum_{j=0}^{M} w_j x^j$

Figure (1.2) taken from Bishop, C.M. (2006). Pattern Recognition and Machine Learning. Springer Verlag.
Error Function

[Figure: the error is measured between each data point t_n and the prediction y(x_n, w) of the fitted curve]

$E(\mathbf{w}) = \sum_{i=1}^{N} \{ y(x_i, \mathbf{w}) - t_i \}^2$

$E_{\mathrm{RMS}} = \sqrt{E(\mathbf{w})/N}$
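
As a tiny illustration (the variable names tHat and t are hypothetical, not from the course code), both quantities can be computed in one line each:

    E    = sum((tHat - t).^2);      % sum-of-squares error E(w); tHat holds the model predictions
    Erms = sqrt(E / numel(t));      % root-mean-square error, comparable across dataset sizes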
[Figure: polynomial fit with M = 0]

Figure (1.4a) taken from Bishop, C.M. (2006). Pattern Recognition and Machine Learning. Springer Verlag.
[Figure: polynomial fit with M = 1]

Figure (1.4b) taken from Bishop, C.M. (2006). Pattern Recognition and Machine Learning. Springer Verlag.
[Figure: polynomial fit with M = 3]

Figure (1.4c) taken from Bishop, C.M. (2006). Pattern Recognition and Machine Learning. Springer Verlag.
[Figure: polynomial fit with M = 9]

Note: Excellent fit to the data—error free … but a poor representation of the green curve: over-fitting

Figure (1.4d) taken from Bishop, C.M. (2006). Pattern Recognition and Machine Learning. Springer Verlag.
[Figure: training and test E_RMS as a function of the polynomial order M]

$E_{\mathrm{RMS}} = \sqrt{E(\mathbf{w})/N}$

Figure (1.5) taken from Bishop, C.M. (2006). Pattern Recognition and Machine Learning. Springer Verlag.
More data typically helps …

[Figure: the M = 9 fit for N = 10, N = 15 and N = 100 data points]

Figure (1.4d and 1.6) taken from Bishop, C.M. (2006). Pattern Recognition and Machine Learning. Springer Verlag.
… or appropriate regularisation …

[Figure: the M = 9 fit with no regulariser (λ = 0), with a reasonable regulariser (ln λ = −18), and with too large a regulariser (ln λ = 0)]

$E(\mathbf{w}) = \sum_{i=1}^{N} \{ y(x_i, \mathbf{w}) - t_i \}^2 + \lambda \|\mathbf{w}\|^2$

with $\|\mathbf{w}\|^2 = \mathbf{w}^{\mathrm{T}} \mathbf{w} = w_0^2 + w_1^2 + \dots + w_M^2$

Figure (1.4d and 1.7) taken from Bishop, C.M. (2006). Pattern Recognition and Machine Learning. Springer Verlag.
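
A minimal sketch of minimising this regularised error in closed form (the variable names, and the use of x and t as column vectors, are assumptions for illustration; this is not code from the course materials):

    M      = 9;
    lambda = exp(-18);                      % the regulariser weight from the middle panel
    Phi    = x(:) .^ (0:M);                 % N-by-(M+1) design matrix of powers of x (needs R2016b+)
    w      = (Phi' * Phi + lambda * eye(M + 1)) \ (Phi' * t(:));  % regularised least squares
    tHat   = Phi * w;                       % fitted values at the training inputs

Note that eye(M + 1) penalises all coefficients including w_0, matching the definition of ||w||^2 on the slide.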
Two basic types of statistical analyses cover many scenarios you will come across in your research
1) What are the error bars on this quantity? [Is there a significant
difference between A and B?—You can answer this once you have the
error bars for A, B]: Bootstrap Test
2) I have two models (one of which might have more parameters than the
other), which of them is a better model of the data? Cross-Validation
Today, we will discuss example situations for each of the two questions. We
will give “general recipes” for how to address each of the two questions in
MATLAB.


Caveat: Of course, this is a gross simplification. There are cases which do
not quite fit into either of the two boxes.
There are also situations which do fit into one of the boxes, but do require
a more complicated analysis than the ones described here.

26
Scenario 1: What are the error bars on this quantity?
Examples:
I have measured percentage correct of 40%—how accurate is this
measurement?
My average measurement is 30 seconds—what is a 90% confidence
region for this measurement?
The median of my data is 15.3—is this significantly bigger than 10 or not?
Is height correlated with IQ?
Strategy: Bootstrap test. Take random subsets of the data (with
replacement), calculate the quantity of interest on each subset, get a histogram
across the different subsets and derive error bars/confidence regions from the
percentiles of the histogram. See bootstrapDemoScript.m
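
A minimal bootstrap sketch (not the actual bootstrapDemoScript.m; data is a hypothetical vector, and prctile from the Statistics Toolbox plays the role of the supplied Percentile.m):

    nBoot      = 2000;                      % number of bootstrap resamples
    n          = numel(data);
    bootMedian = zeros(nBoot, 1);
    for b = 1:nBoot
        idx = randi(n, n, 1);               % resample indices with replacement
        bootMedian(b) = median(data(idx));  % quantity of interest on this resample
    end
    ci = prctile(bootMedian, [5 95]);       % 90% confidence interval from the percentiles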

27
Scenario 2: I have two models (one of which might have more parameters than the
other), which of them is a better model of the data?
Examples:
Can the dependence of reaction time on the stimulus intensity be
described using a linear function, or do I need a quadratic function?
I have a new saliency model. Is it better at predicting eye movements than
previously developed models?
How well can I decode from my fMRI/EEG what stimulus the subject saw?
Catch: When comparing models of different complexity—i.e. with different
numbers of parameters—the model with more parameters will have an
unfair advantage (better goodness-of-fit).
For example, a quadratic function will always fit the data better than a
line.

28
Scenario 2: I have two models (one of which might have more parameters than the
other), which of them is a better model of the data? (cont’d)
We are interested in generalisation ability—science is a prediction game—
not just fitting the data (‘over-fitting’).
Strategy: Cross-validation. Fit parameters on one subset of the data
(‘training set’), evaluate goodness of fit on other subset (‘test set’).
Repeat.
K-fold cross-validation. Split the data into K non-overlapping subsets. Take the
first subset as test set and the other K-1 as training set; then take the second
subset as test set, and so on, until each subset has served as the test set once.
Leave-One-Out cross-validation. If you have N data-points, take all but one
data-points for training and one data-point as test-set, repeat this
procedure N times.
Cross-validation is extremely important for high-dimensional data or
models!
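
As a minimal illustration (not the actual crossvalidationDemoScript.m; x and y are hypothetical column vectors of equal length), leave-one-out cross-validation comparing a line with a parabola could look like this:

    N   = numel(y);
    err = zeros(N, 2);                             % test errors for degrees 1 and 2
    for i = 1:N
        train = true(N, 1);  train(i) = false;     % leave out the i-th data point
        for degree = 1:2
            p = polyfit(x(train), y(train), degree);        % fit on the training set
            err(i, degree) = (polyval(p, x(i)) - y(i))^2;   % squared error on the left-out point
        end
    end
    meanErr = mean(err);                           % mean test error for each model

The model with the smaller mean test error generalises better, regardless of how many parameters it has.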

29
Plan for today, part 2
Computational statistics in MATLAB: Bootstrap and cross-validation.
First, you download the file DemoData.mat from ILIAS, and a support
function called Percentile.m, as well as the following three m-files:
IsThisDieFair.m, bootstrapDemoScript.m,
crossvalidationDemoScript.m

Second, you play with the demo scripts and try and ensure you understand both
i.) the logic behind the bootstrap and cross-validation, as well as
ii.) the MATLAB code implementing them.

30
Homework
First, read and work through all the files and demos I have supplied you
with and make sure you understand what is happening.
Second, modify the crossvalidationDemoScript.m script and
implement 2-fold as well as 10-fold cross-validation.
Third, modify the script again such that the user can set a CONSTANT at the
top of the file, choosing to do either leave-one-out, 2-fold or 10-fold cross-
validation (one possible way of assigning data points to folds is sketched below).
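
A hypothetical sketch of such a fold assignment, with a single constant controlling K (the course script may organise this differently; y is assumed to hold the data):

    K     = 10;                             % set to 2, 10, or numel(y) for leave-one-out
    N     = numel(y);
    folds = mod(randperm(N), K) + 1;        % random fold label (1..K) for every data point
    for k = 1:K
        testIdx  = (folds == k);            % the k-th fold is the test set
        trainIdx = ~testIdx;                % the remaining K-1 folds form the training set
        % ... fit on trainIdx, evaluate on testIdx ...
    end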

31
