Lecture 7 - Part 2

Download as pdf or txt
Download as pdf or txt
You are on page 1of 71

Data Analytics

for Civil & Environmental Engineers

Lecture 7
Introduction to MATLAB for Data Analysis
(Start of Part 2)

1
Part 2

Your research may require the followings

 Define a scientific problem;


 Collect scientific data;
 Apply in-depth data analysis methods;
 Synthesize the analysis results;
 Provide error or uncertainty estimates;
 Present the research results in written and oral forms.

Note: Main lecture contents of Part 2 - from Prof. Wang, Y. 2


3
4
5
Data analysis: Opinions vs. Facts
 “Everyone is entitled to his own opinion but
not his own facts” -Daniel Patrick Moynihan

 Global warming is related to the rise of Dow Jones


Industrial Averages.

 Genetically modified potatoes have 10 times more


Hg than the “normal” potatoes, and therefore unsafe
for human consumption.

 In the past 60 years, the price of potatoes never fell


in all 50 states at the same time. Buying potatoes
across the country is a safe investment.
6
Scientific method

 Formulate hypotheses;

 Observe events and gather data about natural


phenomena;

 Test hypotheses;

Observations usually play a pivotal role in formulating


and testing hypotheses.

7
Observation platforms

 Fixed in situ instruments

 e.g., ground sites (wind speed, pollution


concentrations), stationary buoy

 Mobile in situ instruments


 e.g., cars, ships, aircrafts, balloons, drifting buoy

 Remote sensing measurements

 fixed; mobile; satellites

8
Sampling

 Space

 location; coverage area

 Time

 frequency; period

 Uncertainties

 Precision; accuracy

 Concurrency

 multi-platform/multi-variable
9
Data types

 0-D measurements (1 site)


 Time series measurements
 1-D measurements
 vertical profiles
 multiple species of pollution at 1 site
 2-D measurements
 surface observation network
 3-D measurements
 2-D network with profiles
 satellites
10
Data analysis methods w/ Matlab (this course)

 Univariate methods
 Empirical distributions
 Theoretical distributions (Guassian)
 Statistical tests (e.g., t, F, and χ2)
 Bivariate methods
 Linear correlation
 Regression/prediction, and analysis of residuals
 Bootstrap and Jackknife estimates of regression uncertainties
 Reduced major axis/curvilinear regression
 Principal component regression (EOF and PC analysis)
 Nonlinear regression and weighted regression
11
Data analysis methods w/ Matlab (this course)

 Time-series analysis
 Auto covariance
 Auto spectral/cross spectral analysis
 Uneven-spaced data processing
 Nonlinear time-series analysis
 Signal processing
 “Irregular” temporal variation – signals
 Multivariate statistics
 Principal component analysis

12
Data analysis

 Confirmatory data analysis


 Confirm or falsify existing hypotheses
 Exploratory data analysis
 Suggest hypotheses about the causes of observed
phenomena
 Assess assumptions on which statistical inference will be
based
 Support the selection of appropriate statistical tools and
techniques
 Provide a basis for further data collection through surveys
or experiments
 AI/machine learning
13
Introduction to MATLAB

 Syntax
 Flow control (loops, if)
 Function
 Plots
 Plot, plot3
 Subplot
 Surf, surfc, contour
 Data handling

14
MATLAB interface
 Command window
display results; run scripts; run statements
command prompt >>
 Directory window (set path; Matlab drive)
Open files
 Workspace
 List of variables
 Double click a matrix to open “array editor”
 Editor
 Edit Matlab script
 Run script
 Help (doc command)
 Path (Set path)
15
MATLAB online version

matlab.mathworks.com; drive.matlab.com

16
Programming languages

17
Syntax: assignment

 Scalar
x = 10
x = 9^0.5
x = sqrt(9)

 Special characters

18
Syntax: Vector assignment

 1-D vector; use [ ] for assignment


 Row vector
 Column vector

 Sequence

 Evenly spaced sequence

 Assigning elements

 Operators
u+v, u-v, u.*v, u*v', u.^0.5
19
Syntax: Vector assignment

 Row vector A=[1 2 3]


 Column vector B=A'; B=[1; 2; 3]
 Sequence A=1:10; A=1:0.5:10
 Evenly spaced sequence A= linspace(1,10,9)
 Assigning elements A(2)=10; A(2:2:6)=9:11
 Remove an element A(5)=[]
 u+v, u v, u*v': matrix operations
 . Operator: applies to the elements
u.*v, u.^0.5
20
Syntax: Matrix assignment

 2-D array; table; use [ ] for assignment

 Assign by row: A=[ 1 2 3; 3 4 5; 6 7 8]

 Assign by column

B = [ [1 2 3]' [2 4 7]' [3 5 8]']

 Matrix operator: A*B

 Element operator: A.*B

21
Matrix initialization

 Initialize a m x n matrix of zeros

x=zeros(m,n);

 Initialize a m x n matrix of ones

x=ones(m,n);

 Initialize a m x n matrix with 1's on the diagonal


and 0's elsewhere

x=eye(m,n);

22
Strings: an array of characters

A = [73 32 97 109 32 108 101 97 114 110 105 110 103]


B = [32 77 65 84 76 65 66]
str1=char(A)
str2=char(B)
str=strcat(str1,str2);
disp(str)

23
Useful commands
 who
lists in alphabetical order all variables in the currently active
workspace
 whos
“who” list with along with their sizes and types
 what
list all MATLAB-related files in current directory
 clear
removes all variables from the workspace
 clc
clears all input and output from the Command Window
 clf
clear the figure window
 close all
close all figures
 diary filename
Save the commands you issued and the results you obtain can be
saved to a disk file specified by filename (end by type ‘diary off’) 24
For loops

for i = array (i.e., first : increment : last)


commands
end

Sum up an array
A=1:100;
sm=0;
for i=1:100
sm=sm+A(i);
end
sm
25
While loops

While condition
commands
end

Sum up an array
A=1:100;
sm=0;
i=1;
while i <= max(size (A))
%size (A) returns [1, 100]
%for a vector, one can use length(A)
sm=sm+A(i);
i=i+1; %very important
end
sm

To interrupt, Ctrl-C
26
Script

>> edit first_script_average


Save, run

A=1:100;
sm=0;
for i=1:100
sm=sm+A(i);
end
average=sm/length(A)

27
Golden Section (golden ratio)

 Since antiquity, many philosophers, artists, and


mathematicians have been intrigued by the golden
section, which Renaissance writers called the divine
proportion.
 A golden section is created by the point C on line
segment AB if AC:AB = CB:AC.

28
29
Golden Section (golden ratio)

Write a matlab script to compute the divine


proportion without using sqrt function

30
x(1)=1+1/2
for i=2:20
x(i)=1+1/x(i-1)
end
gsection=(sqrt(5)+1)/2
plot(x(1:10)-gsection)
xlabel('Iteration')
ylabel('Error')

31
32
How do we convert this script by using “while” statement?
What is the advantage?

x(1)=1+1/2
for i=2:20
x(i)=1+1/x(i-1)
end
gsection=(sqrt(5)+1)/2
plot(x(1:10)-gsection)
xlabel('Iteration')
ylabel('Error')

33
While loop
x(1)=1+1/2;
gsection=(sqrt(5)+1)/2;
i=1;
while abs(gsection-x(i))>= 1e-9
x(i+1)=1+1/x(i);
i=i+1;
end

plot(x(1:10)-gsection)
xlabel('Iteration')
ylabel('Error')

What do we do if we don’t know the answer?


34
Iterative calculation
x(1)=1+1/2;
gsection=(sqrt(5)+1)/2;
x(2)=1+1/x(1);
i=2;
%while abs(gsection-x(i))>= 1e-9
while abs(x(i)-x(i-1))>= 1e-9
x(i+1)=1+1/x(i);
i=i+1;
End

plot(x(1:10)-gsection)
xlabel('Iteration')
ylabel('Error')

35
IF selection

36
Score count

score=fix(100*rand(50,1));
pass=0; fail=0;
for i =1:50
if score(i) >=60
pass=pass+1;
else
fail=fail+1;
end
end
How can we simplify the calculation
pass, fail
without using loops?

37
Vector operation

Logic comparison:
true returns 1; false returns 0

score=fix(100*rand(50,1));
pass = (score >=60);
sum(pass)

%find what the scores are.


index=find(score >=60);
score(index)

38
Relation operators

Less than <


Less than or equal to <=
Greater >
Greater or equal to >=
Equal to ==
Not equal to ~=

39
Logic operators

And &

Or |
Not ~

Next question:
How do we find the scores between 60 and 80?

40
Find the scores between 60 and 80

score=fix(100*rand(50,1));
index=find (score >=60 & score <=80);
score(index)
length(index) %length(x)=max(size(x))
disp(strcat('student # =',...
num2str(length(index))))

How do we write “find” statement with “~”


index=find (~(score < 60 | score >80))

41
Estimating π

 What is the probability of hitting the blue target of


a circle with a radius of 1 within a 2x2 square?

42
Estimating π

n=10^6;
x= 2*rand (n, 1)-1;
y= 2*rand (n, 1)-1;
r= sqrt(x.^2+y.^2);
xpi=sum(r<=1)/n*4

43
Function

 Format
function [out1, out2, ...] = funname(input1, input2, ...)
statements
end
 1-Variable
function b=sort(a)
 2-Variable
function [b,j]=sort(a)
 Must use in a script file using the function name
for the file name
44
Function: Descending sort

 Sort a vector, a , in ascending mode


[b,j ]=sort(a)
b: sorted vector
j: indices of vector a that gives b(:) == a(j)

a=round(100*rand(1,5));
[b, j]=sort(a);
b, a(j)

 Write a function that sort vector a in descending


order using function, sort (do not use sort (a,
‘descend’))
45
Function: Descending sort

46
Inline function
advantage: no need to write a script file

 Useful for math expression

f = inline('sqrt(x.^2+y.^2)','x','y')

function a=mysqrt(x,y)
a=sqrt(x.^2+y.^2)
end

47
Rounding to integers

 ceil
 Round towards plus infinity
 floor
 Round towards minus infinity
 fix
 Round towards zero
 round
 Round towards nearest integer

48
Plot

plot(X,Y,S): plots vector


Y versus vector X

S is a character string
made from one element
from any or all the
following 3 columns
[‘color symbol line_style’]

49
Plot

x = [0:0.1:10];
y = sin(x).*x./(1+cos(x)) ;
plot(x,y,'r-')
plot(x,y, 'b-',x,y,'go')
plot(x,y, 'b-',x,y,'go',x,exp(x+1),'m--')
semilogy(x,abs(y), 'b-',x,abs(y),'go',...
x,exp(x+1),'m--')

50
Plot

 Subplot: divides the current figure into


rectangular panes that are numbered row wise.

subplot(m, n, p)
m: rows
n: columns
p: position 1:m*n;
count by rows

51
Subplot

x = [0:0.1:10];
y = sin(x).*x./(1+cos(x));
subplot(2,2,1), plot(x,y,'r-')
subplot(2,2,2), plot(x,y, 'b-',x,y,'go')
subplot(2,2,3), plot(x,y, 'b-',x,y,'go',x,exp(x+1),'m--')
subplot(2,2,4), semilogy(x,abs(y), 'b-',x,abs(y),'go',...
x,exp(x+1),'m--')

52
Subplot

53
Plot

axis([xmin xmax ymin ymax])


or use xlim([xmin,xmax]); ylim([ymin, ymax])
title('string')
xlabel('string')
ylabel('string')
text(x,y,'string')

54
Plot: Two circles

55
Plot

seq=0:pi/100:2*pi;
r1=4, r2=2;
x=r1*cos(seq);
y=r1*sin(seq);
x2=4-r2*cos(seq); Hold on: retains the
current plot and certain
y2=4-r2*sin(seq);
axes properties so that
fill(x,y,'b') subsequent graphing
hold on commands add to the
existing graph.
plot(x2,y2,'r')

56
sprintf: write formatted data to string
s=sprintf(format, variables)
'%g': generic number format
‘FontSize’ to change font

axis([-r1,4+r2,-r1,4+r2],'square')
xlabel('x'),ylabel('y')
title(sprintf('Graphs of x^2+y^2=%g^2 and (4-x)^2+(4-
y)^2=%g^2',r1,r2), 'FontSize',14)
%title('Graphs of x^2+y^2=4^2 and (4-x)^2+(4-y)^2=2^2')
text(-3,5, 'Two Circles')
hold off

57
3D-plot: plot3

Function
x=tcos(t); y=tsin(t); z=t
t in the range of -10 π to 10 π

t = -10*pi:pi/100:10*pi;
x = t.*cos(t);
y = t.*sin(t);
plot3(x,y,t);
title('Curve u(t) = [t*cos(t), t*sin(t), t ]')
grid
58
Loma Prieta Earthquake

 MATLAB default data earthquake.mat contains


200 Hz acceleration data from the October 17,
1989 Loma Prieta earthquake in the Santa
Cruz Mountains. (What is 200 Hz?)

 Get the file


load quake e n v
e: E-W; n: N-S; v: vertical

 Data were scaled with g


e=g*e
59
Loma Prieta Earthquake

 Data processing

load quake e n v
g = 0.0980; %km/s^2
e = g*e; n = g*n; v = g*v;

 The data were sampled at 200 Hz. How do we


set the time sequence for e, n, v?

60
Loma Prieta Earthquake
%time
delt= 1/200;
t = delt*(1:length(e));

%plot
xrange= [0,50];
yrange= [-250 250];
limits = [xrange yrange];
subplot(3,1,1), plot(t,e,'b'), axis(limits), title('East-West acceleration')
subplot(3,1,2), plot(t,n,'g'), axis(limits), title('North-South acceleration')
subplot(3,1,3), plot(t,v,'r'), axis(limits), title('Vertical acceleration')

How do we plot just between 8 and 25?


61
Loma Prieta Earthquake

%can also use xlim(ylim, zlim) function


%syntax: xlim([xmin xmax])
subplot(3,1,1), xlim([8 25])
subplot(3,1,2), xlim([8 25])
subplot(3,1,3), xlim([8 25])
%subplot(3,1,3), plot(t,v,'r'), xlim([8 25]), title('Vertical acceleration')

% Alternative
%trange=[8, 25];
%subplot(3,1,1), plot(t,e,'b'), xlim(trange), title('East-West acceleration’)
%subplot(3,1,2), plot(t,n,'g'), xlim(trange), title('North-South acceleration’)
%subplot(3,1,3), plot(t,v,'r'), xlim(trange), title('Vertical acceleration’)

62
surf: 3-D shaded surface plot

 surf(X,Y,Z)
X, Y, Z are 2-D arrays

 shading flat (constant in each mesh grid)


 shading interp (smooth-looking)

63
View angle

 view(az,el) sets the viewing angle for a three-


dimensional plot. The azimuth, az, is the
horizontal rotation about the z-axis as measured
in degrees from the negative y-axis.

64
Meshgrid

 Function
z=y2-x2

 Create X, Y arrays (meshgrid)


[X,Y] = meshgrid(x,y) transforms the domain
specified by vectors x and y into arrays X and Y

65
Surfc/surf
x = -1:0.05:1;
y = x;
[xi,yi] = meshgrid(x,y);
zi = yi.^2 -xi.^2; %notice . operator
surfc(xi,yi,zi) %with a contour map beneath
%surf(xi,yi,zi)
shading interp
view(120,30)

grid off
view (30, 30)
Or click “rotation” tool in figure menu
66
2-D surface

x = -1:0.05:1;
y = x;
[xi,yi] = meshgrid(x,y);
zi = yi.^2 -xi.^2;
surf(xi, yi, zi) %surf(x,y, zi) also works
shading interp
view(0,90)
colorbar

shading flat

67
Contour

 Contour (x, y, z) [c,h] = contourf(xi, yi, zi,':');


%dotted contour line
 Contourf (x, y, z) to fill
shading flat
color
clabel(c,h)
 Label
title('The level curves of z =
[c,h]=contour(x, y, z) y^2 -x^2.')
xlabel('x')
clabel(c,h)
ylabel('y')
 Add color bar
colorbar colorbar

68
Data handling

u=1:5; v=2:4;
save test.mat u v
clear
load test.mat

save test.txt u v -ascii


clear
load test.txt (Doesn‟t work. Why?)

69
Data handling

SampleID Percent C Percent S


101 0.3657 0.0636
102 0.2208 0.1135
103 0.5353 0.5191
104 0.5009 0.5216
105 0.5415 -999
106 0.501 -999

70
Data handling

a=load('geochem.txt');

%Remove rows with negative numbers

[r, c]=find(a<0)

a(r,:)=[]

71

You might also like