MATLAB®

Data Analysis

R2018b
How to Contact MathWorks

Latest news: www.mathworks.com

Sales and services: www.mathworks.com/sales_and_services

User community: www.mathworks.com/matlabcentral

Technical support: www.mathworks.com/support/contact_us

Phone: 508-647-7000

The MathWorks, Inc.

3 Apple Hill Drive
Natick, MA 01760-2098
MATLAB® Data Analysis
© COPYRIGHT 2005–2018 by The MathWorks, Inc.
The software described in this document is furnished under a license agreement. The software may be used
or copied only under the terms of the license agreement. No part of this manual may be photocopied or
reproduced in any form without prior written consent from The MathWorks, Inc.
FEDERAL ACQUISITION: This provision applies to all acquisitions of the Program and Documentation by,
for, or through the federal government of the United States. By accepting delivery of the Program or
Documentation, the government hereby agrees that this software or documentation qualifies as commercial
computer software or commercial computer software documentation as such terms are used or defined in
FAR 12.212, DFARS Part 227.72, and DFARS 252.227-7014. Accordingly, the terms and conditions of this
Agreement and only those rights specified in this Agreement, shall pertain to and govern the use,
modification, reproduction, release, performance, display, and disclosure of the Program and
Documentation by the federal government (or other entity acquiring for or through the federal government)
and shall supersede any conflicting contractual terms or conditions. If this License fails to meet the
government's needs or is inconsistent in any respect with federal procurement law, the government agrees
to return the Program and Documentation, unused, to The MathWorks, Inc.
Trademarks
MATLAB and Simulink are registered trademarks of The MathWorks, Inc. See
www.mathworks.com/trademarks for a list of additional trademarks. Other product or brand
names may be trademarks or registered trademarks of their respective holders.
Patents
MathWorks products are protected by one or more U.S. patents. Please see
www.mathworks.com/patents for more information.
Revision History
September 2005 Online only New for MATLAB Version 7.1 (Release 14SP3)
March 2006 Online only Revised for MATLAB Version 7.2 (Release 2006a)
September 2006 Online only Revised for MATLAB Version 7.3 (Release 2006b)
March 2007 Online only Revised for MATLAB Version 7.4 (Release 2007a)
September 2007 Online only Revised for MATLAB Version 7.5 (Release 2007b)
March 2008 Online only Revised for MATLAB Version 7.6 (Release 2008a)
October 2008 Online only Revised for MATLAB Version 7.7 (Release 2008b)
March 2009 Online only Revised for MATLAB 7.8 (Release 2009a)
September 2009 Online only Revised for MATLAB 7.9 (Release 2009b)
March 2010 Online only Revised for MATLAB 7.10 (Release 2010a)
September 2010 Online only Revised for MATLAB Version 7.11 (R2010b)
April 2011 Online only Revised for MATLAB Version 7.12 (R2011a)
September 2011 Online only Revised for MATLAB Version 7.13 (R2011b)
March 2012 Online only Revised for MATLAB Version 7.14 (R2012a)
September 2012 Online only Revised for MATLAB Version 8.0 (R2012b)
March 2013 Online only Revised for MATLAB Version 8.1 (R2013a)
September 2013 Online only Revised for MATLAB Version 8.2 (R2013b)
March 2014 Online only Revised for MATLAB Version 8.3 (R2014a)
October 2014 Online only Revised for MATLAB Version 8.4 (R2014b)
March 2015 Online only Revised for MATLAB Version 8.5 (R2015a)
September 2015 Online only Revised for MATLAB Version 8.6 (R2015b)
March 2016 Online only Revised for MATLAB Version 9.0 (R2016a)
September 2016 Online only Revised for MATLAB Version 9.1 (R2016b)
March 2017 Online only Revised for MATLAB Version 9.2 (R2017a)
September 2017 Online only Revised for MATLAB Version 9.3 (R2017b)
March 2018 Online only Revised for MATLAB Version 9.4 (R2018a)
September 2018 Online only Revised for MATLAB Version 9.5 (R2018b)
Contents

1 Data Processing
    Importing and Exporting Data
        Importing Data into the Workspace
        Exporting Data from the Workspace
    Plotting Data
        Introduction
        Load and Plot Data from Text File
    Missing Data in MATLAB
    Data Smoothing and Outlier Detection
    Inconsistent Data
    Filter Data
        Filter Difference Equation
        Moving-Average Filter of Traffic Data
        Modify Amplitude of Data
    Smooth Data with Convolution
    Detrending Data
        Introduction
        Remove Linear Trends from Data
    Descriptive Statistics
        Functions for Calculating Descriptive Statistics
        Example: Using MATLAB Data Statistics

2 Regression Analysis
    Linear Correlation
        Introduction
        Covariance
        Correlation Coefficients
    Linear Regression
        Introduction
        Simple Linear Regression
        Residuals and Goodness of Fit
        Fitting Data with Curve Fitting Toolbox Functions
    Interactive Fitting
        The Basic Fitting UI
        Preparing for Basic Fitting
        Opening the Basic Fitting UI
        Example: Using Basic Fitting UI
    Programmatic Fitting
        MATLAB Functions for Polynomial Models
        Linear Model with Nonpolynomial Terms
        Multiple Regression
        Programmatic Fitting

3 Time Series Analysis
    What Are Time Series?
    Time Series Objects
        Types of Time Series and Their Uses
        Time Series Data Sample
        Example: Time Series Objects and Methods
        Time Series Constructor
        Time Series Collection Constructor
1 Data Processing

• “Importing and Exporting Data”
• “Plotting Data”
• “Missing Data in MATLAB”
• “Data Smoothing and Outlier Detection”
• “Inconsistent Data”
• “Filter Data”
• “Smooth Data with Convolution”
• “Detrending Data”
• “Descriptive Statistics”

Importing and Exporting Data

In this section...
“Importing Data into the Workspace”
“Exporting Data from the Workspace”

Importing Data into the Workspace

The first step in analyzing data is to import it into the MATLAB workspace. See “Methods
for Importing Data” for information about importing data from specific file formats.

Exporting Data from the Workspace

When you analyze your data, you might create new variables or modify imported
variables. You can export variables from the MATLAB workspace to various file formats,
both character-based and binary. You can, for example, create HDF and Microsoft® Excel®
files containing your data. For details, see the documentation on “Supported File Formats
for Import and Export”.
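As a minimal sketch of a round trip between the workspace and a file, the following example writes a small matrix to a comma-delimited text file and reads it back. The variable names and the temporary file are illustrative assumptions, and dlmwrite and dlmread are only one of several export and import pairs you might choose.

```matlab
% Illustrative data; the variable names are assumptions for this sketch.
A = magic(4);

% Export: write the matrix to a comma-delimited text file in a temp folder.
fname = [tempname '.csv'];
dlmwrite(fname,A);

% Import: read the file back into the workspace.
B = dlmread(fname);

% Confirm that the round trip preserved the values, then clean up.
roundTripOK = isequal(A,B)
delete(fname)
```

Text formats store a limited number of digits by default, so for noninteger data you might need to request more precision or use a binary format such as a MAT-file.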


Plotting Data
In this section...
“Introduction”
“Load and Plot Data from Text File”

Introduction
After you import data into the MATLAB workspace, it is a good idea to plot the data so
that you can explore its features. An exploratory plot of your data enables you to identify
discontinuities and potential outliers, as well as the regions of interest.

The MATLAB figure window displays plots. See “Types of MATLAB Plots” for a full
description of the figure window. It also discusses the various interactive tools available
for editing and customizing MATLAB graphics.

Load and Plot Data from Text File

This example uses sample data in count.dat, a space-delimited text file. The file consists
of three sets of hourly traffic counts, recorded at three different town intersections over a
24-hour period. Each data column in the file represents data for one intersection.

Load the count.dat Data

Import data into the workspace using the load function.

load count.dat

Loading this data creates a 24-by-3 matrix called count in the MATLAB workspace.

Get the size of the data matrix.

[n,p] = size(count)

n = 24

p = 3

n represents the number of rows, and p represents the number of columns.


Plot the count.dat Data

Create a time vector, t, containing integers from 1 to n.

t = 1:n;

Plot the data as a function of time, and annotate the plot.

plot(t,count),
legend('Location 1','Location 2','Location 3','Location','NorthWest')
xlabel('Time'), ylabel('Vehicle Count')
title('Traffic Counts at Three Intersections')

See Also

More About
• “Types of MATLAB Plots”


Missing Data in MATLAB

Working with missing data is a common task in data preprocessing. Although sometimes
missing values signify a meaningful event in the data, they often represent unreliable or
unusable data points. In either case, MATLAB® has many options for handling missing
data.

Create and Organize Missing Data

The form that missing values take in MATLAB depends on the data type. For example,
numeric data types such as double use NaN (not a number) to represent missing values.

x = [NaN 1 2 3 4];

You can also use the missing value to represent missing numeric data or data of other
types, such as datetime, string, and categorical. MATLAB automatically converts
the missing value to the data's native type.

xDouble = [missing 1 2 3 4]

xDouble = 1×5

NaN 1 2 3 4

xDatetime = [missing datetime(2014,1:4,1)]

xDatetime = 1x5 datetime array

Columns 1 through 3

NaT 01-Jan-2014 00:00:00 01-Feb-2014 00:00:00

Columns 4 through 5

01-Mar-2014 00:00:00 01-Apr-2014 00:00:00

xString = [missing "a" "b" "c" "d"]

xString = 1x5 string array

<missing> "a" "b" "c" "d"

xCategorical = [missing categorical({'cat1' 'cat2' 'cat3' 'cat4'})]


xCategorical = 1x5 categorical array

<undefined> cat1 cat2 cat3 cat4

A data set might contain values that you want to treat as missing data, but that are not
standard missing values in MATLAB, such as NaN. You can use the
standardizeMissing function to convert those values to the standard missing value for
that data type. For example, treat 4 as a missing double value in addition to NaN.

xStandard = standardizeMissing(xDouble,[4 NaN])

xStandard = 1×5

NaN 1 2 3 NaN

Suppose you want to keep missing values as part of your data set but segregate them
from the rest of the data. Several MATLAB functions enable you to control the placement
of missing values before further processing. For example, use the 'MissingPlacement'
option with the sort function to move NaNs to the end of the data.

xSort = sort(xStandard,'MissingPlacement','last')

xSort = 1×5

1 2 3 NaN NaN

Find, Replace, and Ignore Missing Data

Even if you do not explicitly create missing values in MATLAB, they can appear when
importing existing data or computing with the data. If you are not aware of missing values
in your data, subsequent computation or analysis can be misleading.

For example, if you unknowingly plot a vector containing a NaN value, the NaN does not
appear because the plot function ignores it and plots the remaining points normally.

nanData = [1:9 NaN];

plot(1:10,nanData)


However, if you compute the average of the data, the result is NaN. In this case, it is more
helpful to know in advance that the data contains a NaN, and then choose to ignore or
remove it before computing the average.
meanData = mean(nanData)

meanData = NaN
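One way to ignore the NaN, sketched here, is the 'omitnan' option of the mean function. Removing the NaN with logical indexing before averaging gives the same result. The output variable names are assumptions for this example.

```matlab
nanData = [1:9 NaN];

% Ignore NaN values when averaging; this averages the values 1 through 9.
meanOmit = mean(nanData,'omitnan')

% Equivalent approach: remove the NaN values first, then average.
meanClean = mean(nanData(~isnan(nanData)))
```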

One way to find NaNs in data is by using the isnan function, which returns a logical array
indicating the location of any NaN value.
TF = isnan(nanData)

TF = 1x10 logical array


0 0 0 0 0 0 0 0 0 1

Similarly, the ismissing function returns the location of missing values in data for
multiple data types.

TFdouble = ismissing(xDouble)

TFdouble = 1x5 logical array

1 0 0 0 0

TFdatetime = ismissing(xDatetime)

TFdatetime = 1x5 logical array

1 0 0 0 0

Suppose you are working with a table or timetable made up of variables with multiple
data types. You can find all of the missing values with one call to ismissing, regardless
of their type.

xTable = table(xDouble',xDatetime',xString',xCategorical')

xTable=5×4 table
Var1 Var2 Var3 Var4
____ ____________________ _________ ___________

NaN NaT <missing> <undefined>

1 01-Jan-2014 00:00:00 "a" cat1
2 01-Feb-2014 00:00:00 "b" cat2
3 01-Mar-2014 00:00:00 "c" cat3
4 01-Apr-2014 00:00:00 "d" cat4

TF = ismissing(xTable)

TF = 5x4 logical array

1 1 1 1
0 0 0 0
0 0 0 0
0 0 0 0
0 0 0 0

Missing values can represent unusable data for processing or analysis. Use fillmissing
to replace missing values with another value, or use rmmissing to remove missing values
altogether.

xFill = fillmissing(xStandard,'constant',0)

xFill = 1×5

0 1 2 3 0

xRemove = rmmissing(xStandard)

xRemove = 1×3

1 2 3
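A constant is not the only fill option. As a sketch, the following example applies the 'linear' and 'previous' methods of fillmissing to an illustrative vector. The data and variable names are assumptions for this example.

```matlab
x = [1 NaN 3 NaN NaN 6];

% Fill by linear interpolation between neighboring nonmissing values.
xLinear = fillmissing(x,'linear')

% Fill each gap with the previous nonmissing value.
xPrevious = fillmissing(x,'previous')
```

Interpolation suits smoothly varying measurements, while carrying the previous value forward is a common choice for data that changes in steps.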

Many MATLAB functions enable you to ignore missing values, without having to explicitly
locate, fill, or remove them first. For example, if you compute the sum of a vector
containing NaN values, the result is NaN. However, you can directly ignore NaNs in the
sum by using the 'omitnan' option with the sum function.

sumNan = sum(xDouble)

sumNan = NaN

sumOmitnan = sum(xDouble,'omitnan')

sumOmitnan = 10

See Also
ismissing | fillmissing | standardizeMissing | missing

Related Examples
• “Clean Messy and Missing Data in Tables”


Data Smoothing and Outlier Detection

Data smoothing refers to techniques for eliminating unwanted noise or behaviors in data,
while outlier detection identifies data points that are significantly different from the rest
of the data.

Moving Window Methods

Moving window methods are ways to process data in smaller batches at a time, typically
in order to statistically represent a neighborhood of points in the data. The moving
average is a common data smoothing technique that slides a window along the data,
computing the mean of the points inside of each window. This can help to eliminate
insignificant variations from one data point to the next.

For example, consider wind speed measurements taken every minute for about 3 hours.
Use the movmean function with a window size of 5 minutes to smooth out high-speed wind
gusts.

load windData.mat
mins = 1:length(speed);
window = 5;
meanspeed = movmean(speed,window);
plot(mins,speed,mins,meanspeed)
axis tight
legend('Measured Wind Speed','Average Wind Speed over 5 min Window','location','best')
xlabel('Time')
ylabel('Speed')


Similarly, you can compute the median wind speed over a sliding window using the
movmedian function.

medianspeed = movmedian(speed,window);
plot(mins,speed,mins,medianspeed)
axis tight
legend('Measured Wind Speed','Median Wind Speed over 5 min Window','location','best')
xlabel('Time')
ylabel('Speed')


Not all data is suitable for smoothing with a moving window method. For example, create
a sinusoidal signal with injected random noise.

t = 1:0.2:15;
A = sin(2*pi*t) + cos(2*pi*0.5*t);
Anoise = A + 0.5*rand(1,length(t));
plot(t,A,t,Anoise)
axis tight
legend('Original Data','Noisy Data','location','best')


Use a moving mean with a window size of 3 to smooth the noisy data.

window = 3;
Amean = movmean(Anoise,window);
plot(t,A,t,Amean)
axis tight
legend('Original Data','Moving Mean - Window Size 3')


The moving mean achieves the general shape of the data, but doesn't capture the valleys
(local minima) very accurately. Since the valley points are surrounded by two larger
neighbors in each window, the mean is not a very good approximation to those points. If
you make the window size larger, the mean eliminates the shorter peaks altogether. For
this type of data, you might consider alternative smoothing techniques.

Amean = movmean(Anoise,5);
plot(t,A,t,Amean)
axis tight
legend('Original Data','Moving Mean - Window Size 5','location','best')
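The effect of a moving mean on a valley can be checked by hand with a very small vector. This sketch uses illustrative values, not the signal above.

```matlab
% A single valley (1) between two larger neighbors (4).
v = [4 1 4];

% With a window of 3, the center output is mean([4 1 4]) = 3, so the
% valley is raised from 1 to 3. At the endpoints, movmean shrinks the
% window by default, giving mean([4 1]) = 2.5 and mean([1 4]) = 2.5.
vMean = movmean(v,3)
```

The smoothed center value of 3 is much closer to the neighbors than to the original valley, which is the behavior visible in the plots.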


Common Smoothing Methods

The smoothdata function provides several smoothing options such as the Savitzky-Golay
method, which is a popular smoothing technique used in signal processing. By default,
smoothdata chooses a best-guess window size for the method depending on the data.

Use the Savitzky-Golay method to smooth the noisy signal Anoise, and output the
window size that it uses. This method provides a better valley approximation compared to
movmean.
[Asgolay,window] = smoothdata(Anoise,'sgolay');
plot(t,A,t,Asgolay)
axis tight
legend('Original Data','Savitzky-Golay','location','best')


window

window = 3

The robust Lowess method is another smoothing method that is particularly helpful when
outliers are present in the data in addition to noise. Inject an outlier into the noisy data,
and use robust Lowess to smooth the data, which eliminates the outlier.

Anoise(36) = 20;
Arlowess = smoothdata(Anoise,'rlowess',5);
plot(t,Anoise,t,Arlowess)
axis tight
legend('Noisy Data','Robust Lowess')


Detecting Outliers

Outliers in data can significantly skew data processing results and other computed
quantities. For example, if you try to smooth data containing outliers with a moving
median, you can get misleading peaks or valleys.

Amedian = smoothdata(Anoise,'movmedian');
plot(t,Anoise,t,Amedian)
axis tight
legend('Noisy Data','Moving Median')


The isoutlier function returns a logical 1 when an outlier is detected. Verify the index
and value of the outlier in Anoise.
TF = isoutlier(Anoise);
ind = find(TF)

ind = 36

Aoutlier = Anoise(ind)

Aoutlier = 20
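By default, isoutlier flags values that are more than three scaled median absolute deviations (MAD) from the median. The following sketch reproduces that rule by hand on an illustrative vector and compares it with the built-in result. The data and variable names are assumptions for this example.

```matlab
x = [1 2 3 4 5 100];

% Scaled MAD: c*median(abs(x - median(x))), where c is approximately 1.4826.
c = -1/(sqrt(2)*erfcinv(3/2));
scaledMAD = c*median(abs(x - median(x)));

% Flag values more than three scaled MADs from the median.
TFmanual = abs(x - median(x)) > 3*scaledMAD

% Compare with the default method of isoutlier.
TFbuiltin = isoutlier(x)
```

Because the median and MAD are barely affected by the extreme value, this rule is less sensitive to the outlier itself than a test based on the mean and standard deviation.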

You can use the filloutliers function to replace outliers in your data by specifying a
fill method. For example, fill the outlier in Anoise with the value of its neighbor
immediately to the right.


Afill = filloutliers(Anoise,'next');
plot(t,Anoise,t,Afill)
axis tight
legend('Noisy Data with Outlier','Noisy Data with Filled Outlier')

Nonuniform Data

Not all data consists of equally spaced points, which can affect methods for data
processing. Create a datetime vector that contains irregular sampling times for the data
in Airreg. The time vector represents samples taken every minute for the first 30
minutes, then hourly over two days.

t0 = datetime(2014,1,1,1,1,1);
timeminutes = sort(t0 + minutes(1:30));

timehours = t0 + hours(1:48);
time = [timeminutes timehours];
Airreg = rand(1,length(time));
plot(time,Airreg)
axis tight

By default, smoothdata smooths with respect to equally spaced integers, in this case,
1,2,...,78. Since integer time stamps do not coordinate with the sampling of the points
in Airreg, the first half hour of data still appears noisy after smoothing.

Adefault = smoothdata(Airreg,'movmean',3);
plot(time,Airreg,time,Adefault)
axis tight
legend('Original Data','Smoothed Data with Default Sample Points')


Many data processing functions in MATLAB®, including smoothdata, movmean, and
filloutliers, allow you to provide sample points, ensuring that data is processed
relative to its sampling units and frequencies. To remove the high-frequency variation in
the first half hour of data in Airreg, use the 'SamplePoints' option with the time
stamps in time.

Asamplepoints = smoothdata(Airreg,'movmean',hours(3),'SamplePoints',time);
plot(time,Airreg,time,Asamplepoints)
axis tight
legend('Original Data','Smoothed Data with Sample Points')


See Also
smoothdata | isoutlier | filloutliers | movmean | movmedian

Related Examples
• “Filter Data”


Inconsistent Data
When you examine a data plot, you might find that some points appear to differ
dramatically from the rest of the data. In some cases, it is reasonable to consider such
points outliers, or data values that appear to be inconsistent with the rest of the data.

The following example illustrates how to remove outliers from three data sets in the 24-
by-3 matrix count. In this case, an outlier is defined as a value that is more than three
standard deviations away from the mean.

Caution Be cautious about changing data unless you are confident that you understand
the source of the problem you want to correct. Removing an outlier has a greater effect
on the standard deviation than on the mean of the data. Deleting one such point leads to a
smaller new standard deviation, which might result in making some remaining points
appear to be outliers!
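The effect described in the caution can be reproduced with a small illustrative vector. In this sketch (the values and variable names are assumptions), only the value 100 lies more than three standard deviations from the mean at first. After that value is removed, the standard deviation shrinks, and the value 16 then exceeds the threshold.

```matlab
% Eighteen values near 10, one moderately high value, one extreme value.
x = [repmat([9 10 11],1,6) 16 100];

% First pass: flag values more than three standard deviations from the mean.
out1 = abs(x - mean(x)) > 3*std(x);
firstPass = x(out1)

% Remove the flagged value and repeat the test on the remaining data.
x2 = x(~out1);
out2 = abs(x2 - mean(x2)) > 3*std(x2);
secondPass = x2(out2)
```

Repeating the test until nothing is flagged can therefore discard points that were unremarkable in the original data.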

% Import the sample data

load count.dat;
% Calculate the mean and the standard deviation
% of each data column in the matrix
mu = mean(count)
sigma = std(count)

The Command Window displays

mu =
32.0000 46.5417 65.5833

sigma =
25.3703 41.4057 68.0281

When an outlier is considered to be more than three standard deviations away from the
mean, use the following syntax to determine the number of outliers in each column of the
count matrix:

[n,p] = size(count);
% Create a matrix of mean values by
% replicating the mu vector for n rows
MeanMat = repmat(mu,n,1);
% Create a matrix of standard deviation values by
% replicating the sigma vector for n rows

SigmaMat = repmat(sigma,n,1);
% Create a matrix of zeros and ones, where ones indicate
% the location of outliers
outliers = abs(count - MeanMat) > 3*SigmaMat;
% Calculate the number of outliers in each column
nout = sum(outliers)

The procedure returns the following number of outliers in each column:

nout =
1 0 0

There is one outlier in the first data column of count and none in the other two columns.

To remove an entire row of data containing the outlier, type

count(any(outliers,2),:) = [];

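Here, any(outliers,2) returns a column vector with one element for each row of the
outliers matrix. An element is 1 when any value in the corresponding row is nonzero.
The argument 2 specifies that any operates along the second dimension of the
outliers matrix, that is, across the columns of each row.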
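As an alternative sketch, the same three-standard-deviation test can be written without repmat, because MATLAB R2016b and later expand the row vectors mu and sigma automatically when they are combined with the matrix. The isoutlier function with the 'mean' method applies the same rule to each column. This example reloads count.dat, which is assumed to be on the MATLAB path.

```matlab
load count.dat
mu = mean(count);
sigma = std(count);

% Implicit expansion compares each column against its own mean and
% standard deviation without building MeanMat or SigmaMat.
outliers = abs(count - mu) > 3*sigma;

% The 'mean' method of isoutlier uses the same rule, column by column.
outliersBuiltin = isoutlier(count,'mean');

% Check that the two approaches agree, and count the outliers per column.
sameResult = isequal(outliers,outliersBuiltin)
nout = sum(outliers)
```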


Filter Data
Filter Difference Equation
Filters are data processing techniques that can smooth out high-frequency fluctuations in
data or remove periodic trends of a specific frequency from data. In MATLAB, the filter
function filters a vector of data x according to the following difference equation, which
describes a tapped delay-line filter.

$$a(1)\,y(n) = b(1)\,x(n) + b(2)\,x(n-1) + \dots + b(N_b)\,x(n-N_b+1) - a(2)\,y(n-1) - \dots - a(N_a)\,y(n-N_a+1)$$

In this equation, a and b are vectors of coefficients of the filter, Na is the feedback filter
order, and Nb is the feedforward filter order. n is the index of the current element of x.
The output y(n) is a linear combination of the current and previous elements of x and y.

The filter function uses specified coefficient vectors a and b to filter the input data x.
For more information on difference equations describing filters, see [1].
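To make the difference equation concrete, the following sketch evaluates it term by term in a loop and compares the result with the output of filter. The coefficients and data are illustrative assumptions, and samples before the start of the data are treated as zero, which matches the default initial conditions of filter.

```matlab
% Illustrative coefficients and data (assumptions for this sketch).
b = [0.5 0.3 0.2];      % feedforward coefficients, Nb = 3
a = [1 -0.4];           % feedback coefficients, Na = 2
x = [1 2 3 4 5 6];

% Evaluate the difference equation term by term, treating samples
% before the start of the data as zero.
y = zeros(size(x));
for n = 1:numel(x)
    acc = 0;
    for k = 1:numel(b)
        if n-k+1 >= 1
            acc = acc + b(k)*x(n-k+1);   % feedforward terms
        end
    end
    for k = 2:numel(a)
        if n-k+1 >= 1
            acc = acc - a(k)*y(n-k+1);   % feedback terms
        end
    end
    y(n) = acc/a(1);
end

% Compare with the built-in filter function.
maxDiff = max(abs(y - filter(b,a,x)))
```

The loop is only for illustration. In practice, filter is the faster and more convenient way to apply the same equation.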

Moving-Average Filter of Traffic Data

The filter function is one way to implement a moving-average filter, which is a common
data smoothing technique.

The following difference equation describes a filter that averages time-dependent data
with respect to the current hour and the three previous hours of data.

$$y(n) = \tfrac{1}{4}x(n) + \tfrac{1}{4}x(n-1) + \tfrac{1}{4}x(n-2) + \tfrac{1}{4}x(n-3)$$

Import data that describes traffic flow over time, and assign the first column of vehicle
counts to the vector x.
load count.dat
x = count(:,1);

Create the filter coefficient vectors.

a = 1;
b = [1/4 1/4 1/4 1/4];


Compute the 4-hour moving average of the data, and plot both the original data and the
filtered data.
y = filter(b,a,x);

t = 1:length(x);
plot(t,x,'--',t,y,'-')
legend('Original Data','Filtered Data')
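The first three filtered values are lower than a true four-hour average because filter treats the data before the first sample as zero. The following sketch shows this start-up effect on an illustrative vector and compares the result with a trailing moving mean from movmean, which shrinks its window at the start instead. The data values and variable names are assumptions.

```matlab
% Illustrative data (assumed values).
x = [8 4 12 16 20 4]';
b = [1/4 1/4 1/4 1/4];

% Four-point moving average with filter. The first outputs use zeros
% for the missing history: y(1) = 8/4, y(2) = (8+4)/4, y(3) = (8+4+12)/4.
y = filter(b,1,x)

% Trailing moving mean over the current and three previous samples.
yTrailing = movmean(x,[3 0])

% From the fourth sample on, both approaches average four real samples.
maxDiffAfterStartup = max(abs(y(4:end) - yTrailing(4:end)))
```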

Modify Amplitude of Data

This example shows how to modify the amplitude of a vector of data by applying a
transfer function.


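In digital signal processing, filters are often represented by a transfer function. The
Z-transform of the difference equation

$$a(1)\,y(n) = b(1)\,x(n) + b(2)\,x(n-1) + \dots + b(N_b)\,x(n-N_b+1) - a(2)\,y(n-1) - \dots - a(N_a)\,y(n-N_a+1)$$

is the following transfer function.

$$Y(z) = H(z)X(z) = \frac{b(1) + b(2)z^{-1} + \dots + b(N_b)z^{-N_b+1}}{a(1) + a(2)z^{-1} + \dots + a(N_a)z^{-N_a+1}}\,X(z)$$

Use the transfer function

$$H(z) = \frac{2 + 3z^{-1}}{1 + 0.2z^{-1}}$$

to modify the amplitude of the data in count.dat.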

Load the data and assign the first column to the vector x.

load count.dat
x = count(:,1);

Create the filter coefficient vectors according to the transfer function $H(z)$.

a = [1 0.2];
b = [2 3];

Compute the filtered data, and plot both the original data and the filtered data. This filter
primarily modifies the amplitude of the original data.

y = filter(b,a,x);

t = 1:length(x);
plot(t,x,'--',t,y,'-')
legend('Original Data','Filtered Data')
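One way to estimate how much this filter scales slowly varying data is to evaluate the transfer function at z = 1, which gives sum(b)/sum(a). The following sketch checks that value by filtering a constant signal. The constant test input and the variable names are assumptions for this example.

```matlab
a = [1 0.2];
b = [2 3];

% For a constant input, the output settles to the input scaled by
% sum(b)/sum(a), the gain of the filter at zero frequency.
dcGain = sum(b)/sum(a)

% Filter a constant signal of ones and inspect the settled value.
y = filter(b,a,ones(50,1));
settled = y(end)
```

A gain of roughly 4.17 is consistent with the filtered curve sitting well above the original data in the plot.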


References
[1] Oppenheim, Alan V., Ronald W. Schafer, and John R. Buck. Discrete-Time Signal
Processing. Upper Saddle River, NJ: Prentice-Hall, 1999.

See Also
conv | filter | filter2 | movmean | smoothdata


Related Examples
• “Smooth Data with Convolution”


Smooth Data with Convolution

You can use convolution to smooth 2-D data that contains high-frequency components.

Create 2-D data using the peaks function, and plot the data at various contour levels.
Z = peaks(100);
levels = -7:1:10;
contour(Z,levels)

Inject random noise into the data and plot the noisy contours.
Znoise = Z + rand(100) - 0.5;
contour(Znoise,levels)


The conv2 function in MATLAB® convolves 2-D data with a specified kernel whose
elements define how to remove or enhance features of the original data. Kernels do not
have to be the same size as the input data. Small-sized kernels can be sufficient to smooth
data containing only a few frequency components. Larger sized kernels can provide more
precision for tuning frequency response, resulting in smoother output.

Define a 3-by-3 kernel K and use conv2 to smooth the noisy data in Znoise. Plot the
smoothed contours. The 'same' option in conv2 makes the output the same size as the
input.

K = 0.125*ones(3);
Zsmooth1 = conv2(Znoise,K,'same');
contour(Zsmooth1, levels)


Smooth the noisy data with a 5-by-5 kernel, and plot the new contours.

K = 0.045*ones(5);
Zsmooth2 = conv2(Znoise,K,'same');
contour(Zsmooth2,levels)
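The elements of each kernel above sum to 1.125 rather than 1, so those kernels scale the data slightly in addition to smoothing it. As a sketch of one alternative, a kernel whose elements sum to 1 computes a plain local average and leaves a constant region unchanged. The variable names and the constant test surface are assumptions for this example.

```matlab
% Averaging kernel normalized so that its elements sum to 1.
Kavg = ones(3)/9;

% A constant surface is unchanged in the interior after convolution.
Zconst = 2*ones(6);
Zout = conv2(Zconst,Kavg,'same');
interior = Zout(2:end-1,2:end-1)

% Near the edges, conv2 pads with zeros, so edge values are reduced.
% At a corner, only 4 of the 9 kernel elements overlap the data.
corner = Zout(1,1)
```

The reduced edge values are a property of zero padding, so contours near the border of smoothed data should be interpreted with care.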


See Also
conv | conv2 | filter | smoothdata

Related Examples
• “Filter Data”


Detrending Data
In this section...
“Introduction”
“Remove Linear Trends from Data”

Introduction
The MATLAB function detrend subtracts the mean or a best-fit line (in the least-squares
sense) from your data. If your data contains several data columns, detrend treats each
data column separately.

Removing a trend from the data enables you to focus your analysis on the fluctuations in
the data about the trend. A linear trend typically indicates a systematic increase or
decrease in the data. A systematic shift can result from sensor drift, for example. While
trends can be meaningful, some types of analyses yield better insight once you remove
trends.

Whether it makes sense to remove trend effects in the data often depends on the
objectives of your analysis.
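As a sketch of what detrend computes, the following example removes a least-squares line by hand with polyfit and polyval and compares the result with the output of detrend. It also checks that the 'constant' option subtracts only the mean. The data and variable names are illustrative assumptions.

```matlab
% Illustrative data: a line plus a small oscillation (assumed values).
t = (0:9)';
y = 2 + 0.5*t + sin(t);

% Remove the least-squares line by hand using polyfit and polyval.
p = polyfit(t,y,1);
yManual = y - polyval(p,t);

% detrend removes the same best-fit line from equally spaced data.
yDetrend = detrend(y);
maxDiff = max(abs(yManual - yDetrend))

% The 'constant' option subtracts only the mean.
yMeanRemoved = detrend(y,'constant');
meanDiff = max(abs(yMeanRemoved - (y - mean(y))))
```

Both differences are at the level of round-off error, which indicates that the manual and built-in approaches agree for this data.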

Remove Linear Trends from Data

This example shows how to remove a linear trend from daily closing stock prices to
emphasize the price fluctuations about the overall increase. If the data does have a trend,
detrending it forces its mean to zero and reduces overall variation. The example simulates
stock price fluctuations using a distribution taken from the gallery function.

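Create a simulated data set and compute its mean. sdata represents the daily closing
prices of a stock, built by accumulating the daily fluctuations in dailyFluct and adding a
starting price and a small linear drift.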
t = 0:300;
dailyFluct = gallery('normaldata',size(t),2);
sdata = cumsum(dailyFluct) + 20 + t/100;

Find the average of the data.

mean(sdata)

ans = 39.4851


Plot and label the data. Notice the systematic increase in the stock prices that the data
displays.

figure
plot(t,sdata);
legend('Original Data','Location','northwest');
xlabel('Time (days)');
ylabel('Stock Price (dollars)');

Apply detrend, which performs a linear fit to sdata and then removes the trend from it.
Subtracting the output from the input yields the computed trend line.

detrend_sdata = detrend(sdata);
trend = sdata - detrend_sdata;


Find the average of the detrended data.

mean(detrend_sdata)

ans = 1.1425e-14

As expected, the detrended data has a mean very close to 0.

Display the results by adding the trend line, the detrended data, and its mean to the
graph.

hold on
plot(t,trend,':r')
plot(t,detrend_sdata,'m')
plot(t,zeros(size(t)),':k')
legend('Original Data','Trend','Detrended Data',...
'Mean of Detrended Data','Location','northwest')
xlabel('Time (days)');
ylabel('Stock Price (dollars)');


See Also
cumsum | detrend | gallery | plot


Descriptive Statistics
In this section...
“Functions for Calculating Descriptive Statistics”
“Example: Using MATLAB Data Statistics”

If you need more advanced statistics features, you might want to use the Statistics and
Machine Learning Toolbox™ software.

Functions for Calculating Descriptive Statistics

Use the following MATLAB functions to calculate the descriptive statistics for your data.

Note For matrix data, descriptive statistics for each column are calculated independently.

Statistics Function Summary

Function    Description
max         Maximum value
mean        Average or mean value
median      Median value
min         Smallest value
mode        Most frequent value
std         Standard deviation
var         Variance, which measures the spread or dispersion of the values

The following examples apply MATLAB functions to calculate descriptive statistics:

• “Example 1 — Calculating Maximum, Mean, and Standard Deviation” on page 1-40

• “Example 2 — Subtracting the Mean” on page 1-41


Example 1 — Calculating Maximum, Mean, and Standard Deviation

This example shows how to use MATLAB functions to calculate the maximum, mean, and
standard deviation values for a 24-by-3 matrix called count. MATLAB computes these
statistics independently for each column in the matrix.

% Load the sample data

load count.dat
% Find the maximum value in each column
mx = max(count)
% Calculate the mean of each column
mu = mean(count)
% Calculate the standard deviation of each column
sigma = std(count)

The results are

mx =
114 145 257

mu =
32.0000 46.5417 65.5833

sigma =
25.3703 41.4057 68.0281

To get the row numbers where the maximum data values occur in each data column,
specify a second output parameter indx to return the row index. For example:

[mx,indx] = max(count)

These results are

mx =
114 145 257

indx =
20 20 20

Here, the variable mx is a row vector that contains the maximum value in each of the
three data columns. The variable indx contains the row indices in each column that
correspond to the maximum values.


To find the minimum value in the entire count matrix, reshape the 24-by-3 matrix into a
72-by-1 column vector by using the syntax count(:). Then, to find the minimum value in
the single column, use the following syntax:

min(count(:))

ans =
7
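To also locate the overall minimum, the linear index returned by min can be converted back to row and column subscripts with ind2sub. This sketch uses a small hypothetical matrix A rather than count, so the result is easy to check by eye.

```matlab
% Hypothetical 3-by-2 matrix (not part of the count data)
A = [3 8; 1 9; 7 2];

% Minimum over all elements, with its linear index into A(:)
[minVal,linIdx] = min(A(:));

% Convert the linear index to row and column subscripts
[row,col] = ind2sub(size(A),linIdx);
```

For this matrix, the smallest element is 1, located in row 2 of column 1.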

Example 2 — Subtracting the Mean

Subtract the mean from each column of the matrix by using the following syntax:

% Get the size of the count matrix

[n,p] = size(count)
% Compute the mean of each column
mu = mean(count)
% Create a matrix of mean values by
% replicating the mu vector for n rows
MeanMat = repmat(mu,n,1)
% Subtract the column mean from each element
% in that column
x = count - MeanMat
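In MATLAB R2016b and later, implicit expansion lets you subtract a row vector of column means directly, without building the replicated matrix. A brief sketch, using the hypothetical matrix magic(4) in place of count:

```matlab
% Hypothetical data matrix
A = magic(4);
mu = mean(A);                           % 1-by-4 row vector of column means

% Explicit replication, as in the example above
xRepmat = A - repmat(mu,size(A,1),1);

% Implicit expansion (requires R2016b or later)
xImplicit = A - mu;

% Both approaches give the same centered data
sameResult = isequal(xRepmat,xImplicit)
```

The repmat form remains useful when the code must run in releases earlier than R2016b.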

Note Subtracting the mean from the data is also called detrending. For more information
about removing the mean or the best-fit line from the data, see “Detrending Data” on
page 1-35.

Example: Using MATLAB Data Statistics

The Data Statistics dialog box helps you calculate and plot descriptive statistics with the
data. This example shows how to use MATLAB Data Statistics to calculate and plot
statistics for a 24-by-3 matrix, called count. The data represents how many vehicles
passed by traffic counting stations on three streets.

This section contains the following topics:

• “Calculating and Plotting Descriptive Statistics” on page 1-42

• “Formatting Data Statistics on Plots” on page 1-44
• “Saving Statistics to the MATLAB Workspace” on page 1-46


• “Generating Code Files” on page 1-47

Note MATLAB Data Statistics is available for 2-D plots only.

Calculating and Plotting Descriptive Statistics

1 Load and plot the data:

load count.dat
[n,p] = size(count);

% Define the x-values

t = 1:n;

% Plot the data and annotate the graph

plot(t,count)
legend('Station 1','Station 2','Station 3','Location','northwest')
xlabel('Time')
ylabel('Vehicle Count')


Note The legend contains the name of each data set, as specified by the legend
function: Station 1, Station 2, and Station 3. A data set refers to each column
of data in the array you plotted. If you do not name the data sets, default names are
assigned: data1, data2, and so on.
2 In the Figure window, select Tools > Data Statistics.

The Data Statistics dialog box opens and displays descriptive statistics for the X- and
Y-data of the Station 1 data set.

Note The Data Statistics dialog box displays a range, which is the difference
between the minimum and maximum values in the selected data set. The dialog box
does not display the range on the plot.
3 Select a different data set in the Statistics for list: Station 2.

This displays the statistics for the X and Y data of the Station 2 data set.
4 Select the check box for each statistic you want to display on the plot, and then click
Save to workspace.

For example, to plot the mean of Station 2, select the mean check box in the Y
column.

This plots a horizontal line to represent the mean of Station 2 and updates the
legend to include this statistic.


Formatting Data Statistics on Plots

The Data Statistics dialog box uses colors and line styles to distinguish statistics from the
data on the plot. This portion of the example shows how to customize the display of
descriptive statistics on a plot, such as the color, line width, line style, or marker.

Note Do not edit display properties of statistics until you finish plotting all the statistics
with the data. If you add or remove statistics after editing plot properties, the changes to
plot properties are lost.

To modify the display of data statistics on a plot:


1 In the MATLAB Figure window, click the Edit Plot button in the toolbar.

This step enables plot editing.

2 Double-click the statistic on the plot for which you want to edit display properties.
For example, double-click the horizontal line representing the mean of Station 2.

This step opens the Property Editor below the MATLAB Figure window, where you
can modify the appearance of the line used to represent this statistic.

3 In the Property Editor, specify the Line and Marker styles, sizes, and colors.


Tip Alternatively, right-click the statistic on the plot, and select an option from the
shortcut menu.

Saving Statistics to the MATLAB Workspace

Perform these steps to save the statistics to the MATLAB workspace.

Note When your plot contains multiple data sets, save statistics for each data set
individually. To display statistics for a different data set, select it from the Statistics for
list in the Data Statistics dialog box.

1 In the Data Statistics dialog box, click the Save to workspace button.
2 In the Save Statistics to Workspace dialog box, select options to save statistics for
either X data, Y data, or both. Then, enter the corresponding variable names.

In this example, save only the Y data. Enter the variable name as Loc2countstats.

3 Click OK.

This step saves the descriptive statistics to a structure. The new variable is added to
the MATLAB workspace.

To view the new structure variable, type the variable name at the MATLAB prompt:

Loc2countstats

Loc2countstats =

min: 9
max: 145
mean: 46.5417
median: 36
mode: 9
std: 41.4057
range: 136
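The same quantities can be computed at the command line. The sketch below builds a comparable structure for the second column of count; the variable name stats is hypothetical, and this is not the code that the Data Statistics dialog box runs.

```matlab
load count.dat
y = count(:,2);                  % Station 2 data

% Collect the descriptive statistics in a structure
stats.min    = min(y);
stats.max    = max(y);
stats.mean   = mean(y);
stats.median = median(y);
stats.mode   = mode(y);
stats.std    = std(y);
stats.range  = max(y) - min(y);  % difference between maximum and minimum

stats
```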

Generating Code Files

This portion of the example shows how to generate a file containing MATLAB code that
reproduces the format of the plot and the plotted statistics with new data. Generating a
code file is not available in MATLAB Online™.

1 In the Figure window, select File > Generate Code.

This step creates a function code file and displays it in the MATLAB Editor.
2 Change the name of the function on the first line of the file from createfigure to
something more specific, like countplot. Save the file to your current folder with
the file name countplot.m.
3 Generate some new, random count data:

randcount = 300*rand(24,3);
4 Reproduce the plot with the new data and the recomputed statistics:

countplot(t,randcount)

2

Regression Analysis

• “Linear Correlation” on page 2-2

• “Linear Regression” on page 2-6
• “Interactive Fitting” on page 2-16
• “Programmatic Fitting” on page 2-35

Linear Correlation

In this section...
“Introduction” on page 2-2
“Covariance” on page 2-3
“Correlation Coefficients” on page 2-4

Introduction
Correlation quantifies the strength of a linear relationship between two variables. When
there is no correlation between two variables, then there is no tendency for the values of
the variables to increase or decrease in tandem. Two variables that are uncorrelated are
not necessarily independent, however, because they might have a nonlinear relationship.

You can use linear correlation to investigate whether a linear relationship exists between
variables without having to assume or fit a specific model to your data. Two variables that
have a small or no linear correlation might have a strong nonlinear relationship. However,
calculating linear correlation before fitting a model is a useful way to identify variables
that have a simple relationship. Another way to explore how variables are related is to
make scatter plots of your data.
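A small illustration of two variables that are uncorrelated but clearly dependent: y is an exact quadratic function of x, yet the linear correlation coefficient is effectively zero because x is symmetric about zero. The data here is hypothetical.

```matlab
% x is symmetric about zero; y depends on x exactly, but not linearly
x = -5:5;
y = x.^2;

% Off-diagonal element of the 2-by-2 correlation coefficient matrix
R = corrcoef(x,y);
r = R(1,2)
```

A scatter plot of y against x would reveal the parabola that the correlation coefficient misses.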

Covariance quantifies the strength of a linear relationship between two variables in units
relative to their variances. Correlations are standardized covariances, giving a
dimensionless quantity that measures the degree of a linear relationship, separate from
the scale of either variable.

The following three MATLAB functions compute sample correlation coefficients and
covariance. These sample coefficients are estimates of the true covariance and correlation
coefficients of the population from which the data sample is drawn.

Function                                        Description
corrcoef                                        Correlation coefficient matrix
cov                                             Covariance matrix
xcorr (a Signal Processing Toolbox™ function)   Cross-correlation sequence of a random process (includes autocorrelation)


Covariance
Use the MATLAB cov function to calculate the sample covariance matrix for a data matrix
(where each column represents a separate quantity).

The sample covariance matrix has the following properties:

• cov(X) is symmetric.
• diag(cov(X)) is a vector of variances for each data column. The variances represent
a measure of the spread or dispersion of data in the corresponding column. (The var
function calculates variance.)
• sqrt(diag(cov(X))) is a vector of standard deviations. (The std function
calculates standard deviation.)
• The off-diagonal elements of the covariance matrix represent the covariances between
the individual data columns.

Here, X can be a vector or a matrix. For an m-by-n matrix, the covariance matrix is n-by-n.
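The properties listed above can be checked numerically. This sketch uses hypothetical random data with a fixed seed; the error variables should all be at the level of floating-point round-off.

```matlab
rng(0)                         % fix the seed so the run is repeatable
X = randn(50,3);               % hypothetical data: 50 observations, 3 variables
C = cov(X);                    % 3-by-3 sample covariance matrix

% Symmetry: C should equal its transpose
symErr = max(max(abs(C - C')));

% Diagonal elements are the column variances
varErr = max(abs(diag(C)' - var(X)));

% Square roots of the diagonal are the column standard deviations
stdErr = max(abs(sqrt(diag(C))' - std(X)));

% For a vector input, cov returns the variance
vecErr = abs(cov(X(:,1)) - var(X(:,1)));
```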

For an example of calculating the covariance, load the sample data in count.dat that
contains a 24-by-3 matrix:
load count.dat

Calculate the covariance matrix for this data:

cov(count)

MATLAB responds with the following result:

ans =
1.0e+003 *
0.6437 0.9802 1.6567
0.9802 1.7144 2.6908
1.6567 2.6908 4.6278

The covariance matrix for this data has the following form:

$$
\begin{bmatrix}
s^2_{11} & s^2_{12} & s^2_{13} \\
s^2_{21} & s^2_{22} & s^2_{23} \\
s^2_{31} & s^2_{32} & s^2_{33}
\end{bmatrix},
\qquad s^2_{ij} = s^2_{ji}
$$


Here, $s^2_{ij}$ is the sample covariance between column i and column j of the data. Because
the count matrix contains three columns, the covariance matrix is 3-by-3.

Note In the special case when a vector is the argument of cov, the function returns the
variance.

Correlation Coefficients
The MATLAB function corrcoef produces a matrix of sample correlation coefficients for
a data matrix (where each column represents a separate quantity). The correlation
coefficients range from -1 to 1, where

• Values close to 1 indicate that there is a positive linear relationship between the data
columns.
• Values close to -1 indicate that one column of data has a negative linear relationship to
another column of data (anticorrelation).
• Values close to or equal to 0 suggest there is no linear relationship between the data
columns.

For an m-by-n matrix, the correlation-coefficient matrix is n-by-n. The arrangement of the
elements in the correlation coefficient matrix corresponds to the location of the elements
in the covariance matrix, as described in “Covariance” on page 2-3.

For an example of calculating correlation coefficients, load the sample data in count.dat
that contains a 24-by-3 matrix:

load count.dat

Type the following syntax to calculate the correlation coefficients:

corrcoef(count)

This results in the following 3-by-3 matrix of correlation coefficients:

ans =
1.0000 0.9331 0.9599
0.9331 1.0000 0.9553
0.9599 0.9553 1.0000


Because all correlation coefficients are close to 1, there is a strong positive correlation
between each pair of data columns in the count matrix.
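Because correlations are standardized covariances, the correlation coefficient matrix can be recovered from the covariance matrix by dividing each element by the product of the two corresponding standard deviations. A sketch with hypothetical random data; Rmanual and maxDiff are illustrative names.

```matlab
rng(1)                            % fix the seed so the run is repeatable
X = randn(40,3);                  % hypothetical data
X(:,2) = X(:,1) + 0.5*X(:,2);     % make columns 1 and 2 correlated

C = cov(X);
s = sqrt(diag(C));                % column vector of standard deviations

% Divide each covariance by the product of the two standard deviations
Rmanual = C ./ (s*s');

% Largest absolute difference from corrcoef
maxDiff = max(max(abs(Rmanual - corrcoef(X))))
```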


Linear Regression
In this section...
“Introduction” on page 2-6
“Simple Linear Regression” on page 2-7
“Residuals and Goodness of Fit” on page 2-11
“Fitting Data with Curve Fitting Toolbox Functions” on page 2-15

Introduction
A data model explicitly describes a relationship between predictor and response
variables. Linear regression fits a data model that is linear in the model coefficients. The
most common type of linear regression is a least-squares fit, which can fit both lines and
polynomials, among other linear models.

Before you model the relationship between pairs of quantities, it is a good idea to perform
correlation analysis to establish if a linear relationship exists between these quantities. Be
aware that variables can have nonlinear relationships, which correlation analysis cannot
detect. For more information, see “Linear Correlation” on page 2-2.

The MATLAB Basic Fitting UI helps you to fit your data, so you can calculate model
coefficients and plot the model on top of the data. For an example, see “Example: Using
Basic Fitting UI” on page 2-18. You also can use the MATLAB polyfit and polyval
functions to fit your data to a model that is linear in the coefficients. For an example, see
“Programmatic Fitting” on page 2-44.

If you need to fit data with a nonlinear model, transform the variables to make the
relationship linear. Alternatively, try to fit a nonlinear function directly by using the
Statistics and Machine Learning Toolbox nlinfit function, the Optimization Toolbox™
lsqcurvefit function, or functions in the Curve Fitting Toolbox™.
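As one illustration of transforming variables, an exponential model y = a*exp(b*x) becomes linear in its coefficients after taking logarithms: log(y) = log(a) + b*x. The sketch below uses hypothetical noise-free data with assumed parameters a = 2 and b = 0.5, and recovers them with polyfit.

```matlab
% Hypothetical noise-free data from y = a*exp(b*x)
x = (0:0.5:5)';
a = 2;
b = 0.5;
y = a*exp(b*x);

% Fit a straight line to log(y) versus x
p = polyfit(x,log(y),1);

% Map the line coefficients back to the original parameters
bHat = p(1);          % slope estimates b
aHat = exp(p(2));     % exp(intercept) estimates a
```

With noisy data, note that least squares on log(y) weights errors differently than least squares on y, so the two approaches can give different estimates.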

This topic explains how to:

• Perform simple linear regression using the \ operator.

• Use correlation analysis to determine whether two quantities are related to justify
fitting the data.
• Fit a linear model to the data.


• Evaluate the goodness of fit by plotting residuals and looking for patterns.
• Calculate measures of goodness of fit R² and adjusted R².

Simple Linear Regression

This example shows how to perform simple linear regression using the accidents
dataset. The example also shows you how to calculate the coefficient of determination
to evaluate the regressions. The accidents dataset contains data for fatal traffic
accidents in U.S. states.

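Linear regression models the relation between a dependent, or response, variable $y$ and
one or more independent, or predictor, variables $x_1, x_2, \ldots$. Simple linear regression
considers only one independent variable using the relation

$$y = \beta_0 + \beta_1 x + \epsilon,$$

where $\beta_0$ is the y-intercept, $\beta_1$ is the slope (or regression coefficient), and $\epsilon$ is the
error term.

Start with a set of $n$ observed values of $x$ and $y$ given by $(x_1, y_1)$, $(x_2, y_2)$, ..., $(x_n, y_n)$.

Using the simple linear regression relation, these values form a system of linear
equations. Represent these equations in matrix form as

$$
\begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix}
=
\begin{bmatrix} 1 & x_1 \\ 1 & x_2 \\ \vdots & \vdots \\ 1 & x_n \end{bmatrix}
\begin{bmatrix} \beta_0 \\ \beta_1 \end{bmatrix}.
$$

Let

$$
Y = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix}, \quad
X = \begin{bmatrix} 1 & x_1 \\ 1 & x_2 \\ \vdots & \vdots \\ 1 & x_n \end{bmatrix}, \quad
B = \begin{bmatrix} \beta_0 \\ \beta_1 \end{bmatrix}.
$$

The relation is now $Y = XB$.

In MATLAB, you can find $B$ using the mldivide operator as B = X\Y.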

From the dataset accidents, load accident data in y and state population data in x. Find
the linear regression relation between the accidents in a state and the population
of a state using the \ operator. The \ operator performs a least-squares regression.

load accidents
x = hwydata(:,14); %Population of states
y = hwydata(:,4); %Accidents per state
format long
b1 = x\y

b1 =
1.372716735564871e-04

b1 is the slope or regression coefficient. The linear relation is $y = \beta_1 x = 0.0001372x$.

Calculate the accidents per state yCalc1 from x using the relation. Visualize the
regression by plotting the actual values y and the calculated values yCalc1.

yCalc1 = b1*x;
scatter(x,y)
hold on
plot(x,yCalc1)
xlabel('Population of state')
ylabel('Fatal traffic accidents per state')
title('Linear Regression Relation Between Accidents & Population')
grid on


Improve the fit by including a y-intercept $\beta_0$ in your model as $y = \beta_0 + \beta_1 x$. Calculate
$\beta_0$ by padding x with a column of ones and using the \ operator.

X = [ones(length(x),1) x];
b = X\y

b = 2×1
10^2 ×

   1.427120171726537
   0.000001256394274


This result represents the relation $y = \beta_0 + \beta_1 x = 142.7120 + 0.0001256x$.

Visualize the relation by plotting it on the same figure.

yCalc2 = X*b;
plot(x,yCalc2,'--')
legend('Data','Slope','Slope & Intercept','Location','best');
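A slope-and-intercept fit computed with the \ operator should agree with a degree-1 polyfit, apart from the order of the coefficients: the \ solution here lists the intercept first, whereas polyfit returns the highest power first. The sketch below checks this on a small hypothetical data set rather than the accidents data.

```matlab
% Hypothetical data: a line with small deviations
x = (1:10)';
y = 3 + 2*x + [0.1 -0.2 0.05 0 -0.1 0.2 -0.05 0.1 0 -0.1]';

% Least-squares fit with the \ operator: b(1) is intercept, b(2) is slope
X = [ones(length(x),1) x];
b = X\y;

% Same fit with polyfit: p(1) is slope, p(2) is intercept
p = polyfit(x,y,1);

% Largest absolute difference after reordering the polyfit coefficients
maxDiff = max(abs(b - [p(2); p(1)]))
```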

From the figure, the two fits look similar. One method to find the better fit is to calculate
the coefficient of determination, R². R² is one measure of how well a model can predict
the data, and falls between 0 and 1. The higher the value of R², the better the model is
at predicting the data.

Where $\hat{y}$ represents the calculated values of $y$ and $\bar{y}$ is the mean of $y$, R² is defined as

$$R^2 = 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2}.$$

Find the better fit of the two fits by comparing values of R². As the values show, the
second fit that includes a y-intercept is better.
Rsq1 = 1 - sum((y - yCalc1).^2)/sum((y - mean(y)).^2)

Rsq1 =
0.822235650485566

Rsq2 = 1 - sum((y - yCalc2).^2)/sum((y - mean(y)).^2)

Rsq2 =
0.838210531103428

Residuals and Goodness of Fit

Residuals are the difference between the observed values of the response (dependent)
variable and the values that a model predicts. When you fit a model that is appropriate for
your data, the residuals approximate independent random errors. That is, the distribution
of residuals ought not to exhibit a discernible pattern.

Producing a fit using a linear model requires minimizing the sum of the squares of the
residuals. This minimization yields what is called a least-squares fit. You can gain insight
into the “goodness” of a fit by visually examining a plot of the residuals. If the residual
plot has a pattern (that is, residual data points do not appear to have a random scatter),
the pattern indicates that the model does not properly fit the data.
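To see what a patterned residual looks like, consider fitting a straight line to data that is actually quadratic. In this hypothetical sketch, the residuals are positive at both ends and negative in the middle, which is a systematic pattern rather than random scatter.

```matlab
% Hypothetical data that follows a parabola
x = (-5:5)';
y = x.^2;

% Fit a straight line, which is the wrong model for this data
p = polyfit(x,y,1);
resid = y - polyval(p,x);

% Signs of the residuals, from the smallest x to the largest
residSigns = sign(resid)'
```

Plotting the residuals, for example with stem(x,resid), makes the U-shaped pattern visible.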

Evaluate each fit you make in the context of your data. For example, if your goal of fitting
the data is to extract coefficients that have physical meaning, then it is important that
your model reflect the physics of the data. Understanding what your data represents, how
it was measured, and how it is modeled is important when evaluating the goodness of fit.


One measure of goodness of fit is the coefficient of determination, or R² (pronounced
r-square). This statistic indicates how closely values you obtain from fitting a model match
the dependent variable the model is intended to predict. Statisticians often define R²
using the residual variance from a fitted model:

$$R^2 = 1 - \frac{SS_{\text{resid}}}{SS_{\text{total}}}$$

$SS_{\text{resid}}$ is the sum of the squared residuals from the regression. $SS_{\text{total}}$ is the sum of the
squared differences from the mean of the dependent variable (total sum of squares). Both
are positive scalars.

To learn how to compute R² when you use the Basic Fitting tool, see “Derive R², the
Coefficient of Determination” on page 2-24. To learn more about calculating the R²
statistic and its multivariate generalization, continue reading here.

Example: Computing R² from Polynomial Fits

You can derive R² from the coefficients of a polynomial regression to determine how much
variance in y a linear model explains, as the following example describes:

1 Create two variables, x and y, from the first two columns of the count variable in the
data file count.dat:

load count.dat
x = count(:,1);
y = count(:,2);
2 Use polyfit to compute a linear regression that predicts y from x:

p = polyfit(x,y,1)

p =
1.5229 -2.1911

p(1) is the slope and p(2) is the intercept of the linear predictor. You can also
obtain regression coefficients using the Basic Fitting UI on page 2-16.
3 Call polyval to use p to predict y, calling the result yfit:

yfit = polyval(p,x);

Using polyval saves you from typing the fit equation yourself, which in this case
looks like:

yfit = p(1) * x + p(2);


4 Compute the residual values as a vector of signed numbers:

yresid = y - yfit;
5 Square the residuals and total them to obtain the residual sum of squares:

SSresid = sum(yresid.^2);
6 Compute the total sum of squares of y by multiplying the variance of y by the number
of observations minus 1:

SStotal = (length(y)-1) * var(y);

7 Compute R² using the formula given in the introduction of this topic:

rsq = 1 - SSresid/SStotal

rsq =
0.8707

This demonstrates that the linear equation 1.5229 * x - 2.1911 predicts 87% of
the variance in the variable y.

Computing Adjusted R² for Polynomial Regressions

You can usually reduce the residuals in a model by fitting a higher degree polynomial.
When you add more terms, you increase the coefficient of determination, R². You get a
closer fit to the data, but at the expense of a more complex model, for which R² cannot
account. However, a refinement of this statistic, adjusted R², does include a penalty for
the number of terms in a model. Adjusted R², therefore, is more appropriate for
comparing how different models fit to the same data. The adjusted R² is defined as:

$$R^2_{\text{adjusted}} = 1 - \frac{SS_{\text{resid}}}{SS_{\text{total}}} \cdot \frac{n-1}{n-d-1}$$

where n is the number of observations in your data, and d is the degree of the polynomial.
(A linear fit has a degree of 1, a quadratic fit 2, a cubic fit 3, and so on.)

The following example repeats the steps of the previous example, “Example: Computing
R² from Polynomial Fits” on page 2-12, but performs a cubic (degree 3) fit instead of a
linear (degree 1) fit. From the cubic fit, you compute both simple and adjusted R² values
to evaluate whether the extra terms improve predictive power:

1 Create two variables, x and y, from the first two columns of the count variable in the
data file count.dat:


load count.dat
x = count(:,1);
y = count(:,2);
2 Call polyfit to generate a cubic fit to predict y from x:

p = polyfit(x,y,3)

p =
-0.0003 0.0390 0.2233 6.2779

p(4) is the intercept of the cubic predictor. You can also obtain regression
coefficients using the Basic Fitting UI on page 2-16.
3 Call polyval to use the coefficients in p to predict y, naming the result yfit:

yfit = polyval(p,x);

polyval evaluates the explicit equation you could manually enter as:

yfit = p(1) * x.^3 + p(2) * x.^2 + p(3) * x + p(4);

4 Compute the residual values as a vector of signed numbers:

yresid = y - yfit;
5 Square the residuals and total them to obtain the residual sum of squares:

SSresid = sum(yresid.^2);
6 Compute the total sum of squares of y by multiplying the variance of y by the number
of observations minus 1:

SStotal = (length(y)-1) * var(y);

7 Compute simple R² for the cubic fit using the formula given in the introduction of this
topic:

rsq = 1 - SSresid/SStotal

rsq =
0.9083
8 Finally, compute adjusted R² to account for degrees of freedom:
rsq_adj = 1 - SSresid/SStotal * (length(y)-1)/(length(y)-length(p))

rsq_adj =
0.8945


The adjusted R², 0.8945, is smaller than simple R², 0.9083. It provides a more reliable
estimate of the power of your polynomial model to predict.

In many polynomial regression models, adding terms to the equation increases both R²
and adjusted R². In the preceding example, using a cubic fit increased both statistics
compared to a linear fit. (You can compute adjusted R² for the linear fit for yourself to
demonstrate that it has a lower value.) However, it is not always true that a linear fit is
worse than a higher-order fit: a more complicated fit can have a lower adjusted R² than a
simpler fit, indicating that the increased complexity is not justified. Also, while R² always
varies between 0 and 1 for the polynomial regression models that the Basic Fitting tool
generates, adjusted R² for some models can be negative, indicating a model that has
too many terms.
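The two preceding examples can be combined into a loop that reports simple and adjusted R² for several polynomial degrees at once. This is a sketch that reuses the count data and the formulas given above; the variable names degrees, rsq, and rsqAdj are illustrative.

```matlab
load count.dat
x = count(:,1);
y = count(:,2);
n = length(y);

% Total sum of squares is the same for every fit
SStotal = (n-1)*var(y);

degrees = 1:3;
rsq = zeros(size(degrees));
rsqAdj = zeros(size(degrees));

for k = 1:numel(degrees)
    d = degrees(k);
    p = polyfit(x,y,d);                      % fit a degree-d polynomial
    SSresid = sum((y - polyval(p,x)).^2);    % residual sum of squares
    rsq(k) = 1 - SSresid/SStotal;
    rsqAdj(k) = 1 - (SSresid/SStotal)*(n-1)/(n-d-1);
end

% Columns: degree, simple R-square, adjusted R-square
disp([degrees' rsq' rsqAdj'])
```

Comparing the adjusted values across rows indicates whether each added term earns its place in the model.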

Correlation does not imply causality. Always interpret coefficients of correlation and
determination cautiously. The coefficients only quantify how much variance in a
dependent variable a fitted model removes. Such measures do not describe how
appropriate your model (or the independent variables you select) is for explaining the
behavior of the variable the model predicts.

Fitting Data with Curve Fitting Toolbox Functions

The Curve Fitting Toolbox software extends core MATLAB functionality by enabling the
following data-fitting capabilities:

• Linear and nonlinear parametric fitting, including standard linear least squares,
nonlinear least squares, weighted least squares, constrained least squares, and robust
fitting procedures
• Nonparametric fitting
• Statistics for determining the goodness of fit
• Extrapolation, differentiation, and integration
• Dialog box that facilitates data sectioning and smoothing
• Saving fit results in various formats, including MATLAB code files, MAT-files, and
workspace variables

For more information, see the Curve Fitting Toolbox documentation.


Interactive Fitting
In this section...
“The Basic Fitting UI” on page 2-16
“Preparing for Basic Fitting” on page 2-16
“Opening the Basic Fitting UI” on page 2-17
“Example: Using Basic Fitting UI” on page 2-18

The Basic Fitting UI

The MATLAB Basic Fitting UI allows you to interactively:

• Model data using a spline interpolant, a shape-preserving interpolant, or a polynomial
up to the tenth degree
• Plot one or more fits together with data
• Plot the residuals of the fits
• Compute model coefficients
• Compute the norm of the residuals (a statistic you can use to analyze how well a model
fits your data)
• Use the model to interpolate or extrapolate outside of the data
• Save coefficients and computed values to the MATLAB workspace for use outside of
the dialog box
• Generate MATLAB code to recompute fits and reproduce plots with new data

Note The Basic Fitting UI is only available for 2-D plots. For more advanced fitting and
regression analysis, see the Curve Fitting Toolbox documentation and the Statistics and
Machine Learning Toolbox documentation.

Preparing for Basic Fitting

The Basic Fitting UI sorts your data in ascending order before fitting. If your data set is
large and the values are not sorted in ascending order, it will take longer for the Basic
Fitting UI to preprocess your data before fitting.


You can speed up the Basic Fitting UI by first sorting your data. To create sorted vectors
x_sorted and y_sorted from data vectors x and y, use the MATLAB sort function:

[x_sorted, i] = sort(x);
y_sorted = y(i);

Opening the Basic Fitting UI

To use the Basic Fitting UI, you must first plot your data in a figure window, using any
MATLAB plotting command that produces (only) x and y data.

To open the Basic Fitting UI, select Tools > Basic Fitting from the menus at the top of
the figure window.

When you fully expand it by twice clicking the arrow button in the lower right corner,
the window displays three panels. Use these panels to:

• Select a model and plotting options

• Examine and export model coefficients and norms of residuals
• Examine and export interpolated and extrapolated values.


To expand or collapse panels one-by-one, click the arrow button in the lower right corner
of the interface.

Example: Using Basic Fitting UI

This example shows how to use the Basic Fitting UI to fit, visualize, analyze, save, and
generate code for polynomial regressions.

• “Load and Plot Census Data” on page 2-19

• “Predict the Census Data with a Cubic Polynomial Fit” on page 2-19
• “View and Save the Cubic Fit Parameters” on page 2-23
• “Derive R², the Coefficient of Determination” on page 2-24


• “Interpolate and Extrapolate Population Values” on page 2-28

• “Generate a Code File to Reproduce the Result” on page 2-31
• “Learn How the Basic Fitting Tool Computes Fits” on page 2-32

Load and Plot Census Data

The file census.mat contains U.S. population data for the years 1790 through 1990 at
10-year intervals.

To load and plot the data, type the following commands at the MATLAB prompt:

load census
plot(cdate,pop,'ro')

The load command adds the following variables to the MATLAB workspace:

• cdate — A column vector containing the years from 1790 to 1990 in increments of 10.
It is the predictor variable.
• pop — A column vector with U.S. population for each year in cdate. It is the response
variable.

The data vectors are sorted in ascending order, by year. The plot shows the population as
a function of year.

Now you are ready to fit an equation to the data to model population growth over time.

Predict the Census Data with a Cubic Polynomial Fit

1 Open the Basic Fitting dialog box by selecting Tools > Basic Fitting in the Figure
window.
2 In the Plot fits area of the Basic Fitting dialog box, select the cubic check box to fit a
cubic polynomial to the data.

MATLAB uses your selection to fit the data, and adds the cubic regression line to the
graph as follows.


In computing the fit, MATLAB encounters problems and issues the following warning:

Polynomial is badly conditioned.

Add points with distinct X values,
select a polynomial with a lower degree,
or select "Center and scale X data."

This warning indicates that the computed coefficients for the model are sensitive to
random errors in the response (the measured population). It also suggests some
things you can do to get a better fit.


3 Continue to use a cubic fit. As you cannot add new observations to the census data,
improve the fit by transforming the values you have to z-scores before recomputing a
fit. Select the Center and scale X data check box in the dialog box to make the
Basic Fitting tool perform the transformation.

To learn how centering and scaling data works, see “Learn How the Basic Fitting Tool
Computes Fits” on page 2-32.
4 Now view the equations and display residuals. In addition to selecting the Center
and scale X data and cubic check boxes, select the following options:

• Show equations
• Plot residuals
• Show norm of residuals

Selecting Plot residuals creates a subplot of them as a bar graph. The following figure
displays the results of the Basic Fitting UI options you selected.
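A comparable centered-and-scaled cubic fit can be sketched at the command line with the three-output form of polyfit, which returns the mean and standard deviation of the predictor in mu and fits the polynomial to the resulting z-scores. This is an illustration only; it is assumed, not confirmed here, that it matches what the Basic Fitting tool does internally. The names popFit, popFitManual, and normresid are illustrative.

```matlab
load census

% Cubic fit to z-scores of cdate; mu(1) = mean(cdate), mu(2) = std(cdate)
[p,~,mu] = polyfit(cdate,pop,3);

% Evaluate the fit in original units by passing mu to polyval
popFit = polyval(p,cdate,[],mu);

% Equivalent manual route: transform to z-scores, then evaluate
z = (cdate - mu(1))/mu(2);
popFitManual = polyval(p,z);

% Norm of the residuals for the centered-and-scaled cubic fit
normresid = norm(pop - popFit)
```

Because the z-scores are of order 1, the powers of the predictor stay well scaled, which is why this transformation improves the conditioning of the fit.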


The cubic fit is a poor predictor before the year 1790, where it indicates a decreasing
population. The model seems to approximate the data reasonably well after 1790.
However, a pattern in the residuals shows that the model does not meet the assumption of
normal error, which is a basis for the least-squares fitting. The data 1 line identified in
the legend shows the observed x (cdate) and y (pop) data values. The cubic regression line
presents the fit after centering and scaling data values. Notice that the figure shows the
original data units, even though the tool computes the fit using transformed z-scores.

For comparison, try fitting another polynomial equation to the census data by selecting it
in the Plot fits area.


View and Save the Cubic Fit Parameters

In the Basic Fitting dialog box, click the arrow button to display the estimated
coefficients and the norm of the residuals in the Numerical results panel.

To view a specific fit, select it from the Fit list. This displays the coefficients in the Basic
Fitting dialog box, but does not plot the fit in the figure window.

Note If you also want to display a fit on the plot, you must select the corresponding Plot
fits check box.

Save the fit data to the MATLAB workspace by clicking the Save to workspace button on
the Numerical results panel. The Save Fit to Workspace dialog box opens.


With all check boxes selected, click OK to save the fit parameters as a MATLAB structure:
fit
fit =
type: 'polynomial degree 3'
coeff: [0.9210 25.1834 73.8598 61.7444]

Now, you can use the fit results in MATLAB programming, outside of the Basic Fitting UI.

Derive R², the Coefficient of Determination

You can get an indication of how well a polynomial regression predicts your observed data
by computing the coefficient of determination, or R-square (written as R²). The R²
statistic, which ranges from 0 to 1, measures how useful the independent variable is in
predicting values of the dependent variable:

• An R² value near 0 indicates that the fit is not much better than the model y =
constant.
• An R² value near 1 indicates that the independent variable explains most of the
variability in the dependent variable.

To compute R², first compute a fit, and then obtain residuals from it. A residual is the
signed difference between an observed dependent value and the value your fit predicts
for it.

residuals = yobserved - yfitted

The Basic Fitting tool can generate residuals for any fit it calculates. To view a graph of
residuals, select the Plot residuals check box. You can view residuals as a bar, line or
scatter plot.

After you have residual values, you can save them to the workspace, where you can
compute R². Complete the preceding part of this example to fit a cubic polynomial to the
census data, and then perform these steps:

Compute Residual Data and R² for a Cubic Fit

1 Click the arrow button at the lower right to open the Numerical results tab if it
is not already visible.
2 From the Fit drop-down menu, select cubic if it does not already show.
3 Save the fit coefficients, norm of residuals, and residuals by clicking Save to
Workspace.

The Save Fit to Workspace dialog box opens with three check boxes and three text
fields.
4 Select all three check boxes to save the fit coefficients, norm of residuals, and
residual values.
5 Identify the saved variables as belonging to a cubic fit. Change the variable names by
adding a 3 to each default name (for example, fit3, normresid3, and resids3).
The dialog box should look like this figure.

6 Click OK. Basic Fitting saves residuals as a column vector of numbers, fit coefficients
as a struct, and the norm of residuals as a scalar.

Notice that the value that Basic Fitting computes for norm of residuals is 12.2380.
This number is the square root of the sum of squared residuals of the cubic fit.
7 Optionally, you can verify the norm-of-residuals value that the Basic Fitting tool
provided. Compute the norm-of-residuals yourself from the resids3 array that you
just saved:
mynormresid3 = sum(resids3.^2)^(1/2)

mynormresid3 =
12.2380
8 Compute the total sum of squares of the dependent variable, pop, to compute R².
Total sum of squares is the sum of the squared differences of each value from the
mean of the variable. For example, use this code:
SSpop = (length(pop)-1) * var(pop)

SSpop =
1.2356e+005

var(pop) computes the variance of the population vector. Multiplying it by the
number of observations minus 1 (the degrees of freedom) recovers the total sum of
squares. Both the total sum of squares and the norm of residuals are positive scalars.
9 Now, compute R², using the square of normresid3 and SSpop:

rsqcubic = 1 - normresid3^2 / SSpop

rsqcubic =
0.9988
10 Finally, compute R² for a linear fit and compare it with the cubic R² value that you
just derived. The Basic Fitting UI also provides you with the linear fit results. To
obtain the linear results, repeat steps 2-6, modifying your actions as follows:

• To calculate least-squares linear regression coefficients and statistics, in the Fit
drop-down on the Numerical results pane, select linear instead of cubic.
• In the Save to Workspace dialog, append 1 to each variable name to identify it as
deriving from a linear fit, and click OK. The variables fit1, normresid1, and
resids1 now exist in the workspace.
• Use the variable normresid1 (98.778) to compute R² for the linear fit, as you
did in step 9 for the cubic fit:

rsqlinear = 1 - normresid1^2 / SSpop

rsqlinear =
0.9210

This result indicates that a linear least-squares fit of the population data explains
92.1% of its variance. As the cubic fit of this data explains 99.9% of that variance, the
latter seems to be a better predictor. However, because a cubic fit predicts using
three variables (x, x², and x³), a basic R² value does not fully reflect how robust the fit
is. A more appropriate measure for evaluating the goodness of multivariate fits is
adjusted R². For information about computing and using adjusted R², see “Residuals
and Goodness of Fit” on page 2-11.

Caution R² measures how well your polynomial equation predicts the dependent
variable, not how appropriate the polynomial model is for your data. When you analyze
inherently unpredictable data, a small value of R² indicates that the independent variable
does not predict the dependent variable precisely. However, it does not necessarily mean
that there is something wrong with the fit.
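
The residuals, their norm, and the coefficient of determination can also be computed at the command line, without the Basic Fitting UI. The following is a minimal sketch, assuming the census data set that ships with MATLAB and a cubic fit on centered and scaled x values. The variable names mirror the ones saved from the tool in the steps above, but they are produced here by polyfit and polyval rather than by the tool.

```matlab
load census                                  % provides cdate and pop
[p3,S3,mu3] = polyfit(cdate,pop,3);          % cubic fit on centered and scaled x
resids3 = pop - polyval(p3,cdate,[],mu3);    % residuals = observed - fitted
normresid3 = norm(resids3);                  % square root of the sum of squared residuals
SSpop = sum((pop - mean(pop)).^2);           % total sum of squares of pop
rsqcubic = 1 - normresid3^2/SSpop            % coefficient of determination
```

The structure S3 returned by polyfit also carries the norm of the residuals in its normr field, which gives a quick consistency check against normresid3.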

Compute Residual Data and R² for a Linear Fit

In this next example, use the Basic Fitting UI to perform a linear fit, save the results to
the workspace, and compute R² for the linear fit. You can then compare linear R² with the
cubic R² value that you derive in the example “Compute Residual Data and R² for a Cubic
Fit” on page 2-24.

1 Click the arrow button at the lower right to open the Numerical results tab if it
is not already visible.
2 Select the linear check box in the Plot fits area.
3 From the Fit drop-down menu, select linear if it does not already show. The
Coefficients and norm of residuals area displays statistics for the linear fit.
4 Save the fit coefficients, norm of residuals, and residuals by clicking Save to
Workspace.

The Save Fit to Workspace dialog box opens with three check boxes and three text
fields.
5 Select all three check boxes to save the fit coefficients, norm of residuals, and
residual values.
6 Identify the saved variables as belonging to a linear fit. Change the variable names by
adding a 1 to each default name (for example, fit1, normresid1, and resids1).
7 Click OK. Basic Fitting saves residuals as a column vector of numbers, fit coefficients
as a struct, and the norm of residuals as a scalar.

Notice that the value that Basic Fitting computes for norm of residuals is 98.778.
This number is the square root of the sum of squared residuals of the linear fit.
8 Optionally, you can verify the norm-of-residuals value that the Basic Fitting tool
provided. Compute the norm-of-residuals yourself from the resids1 array that you
just saved:
mynormresid1 = sum(resids1.^2)^(1/2)

mynormresid1 =
98.7783
9 Compute the total sum of squares of the dependent variable, pop, to compute R².
Total sum of squares is the sum of the squared differences of each value from the
mean of the variable. For example, use this code:
SSpop = (length(pop)-1) * var(pop)

SSpop =
1.2356e+005

var(pop) computes the variance of the population vector. Multiplying it by the
number of observations minus 1 (the degrees of freedom) recovers the total sum of
squares. Both the total sum of squares and the norm of residuals are positive scalars.
10 Now, compute R², using the square of normresid1 and SSpop:

rsqlinear = 1 - normresid1^2 / SSpop

rsqlinear =
0.9210

This result indicates that a linear least-squares fit of the population data explains
92.1% of its variance. As the cubic fit of this data explains 99.9% of that variance, the
latter seems to be a better predictor. However, a cubic fit has four coefficients (for x, x²,
x³, and a constant), while a linear fit has two coefficients (for x and a constant). A simple
R² statistic does not account for the different degrees of freedom. A more appropriate
measure for evaluating polynomial fits is adjusted R². For information about
computing and using adjusted R², see “Residuals and Goodness of Fit” on page 2-11.
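
To make the comparison concrete, the sketch below computes both the plain and the adjusted statistic for the linear and cubic fits. It assumes the usual definition of adjusted R², in which the ratio of residual to total sum of squares is scaled by (n-1)/(n-d-1) for n observations and a polynomial of degree d. The variable names are illustrative, and the fits use centered and scaled x values.

```matlab
load census                                   % provides cdate and pop
n = length(pop);
SStotal = (n-1)*var(pop);                     % total sum of squares
degrees = [1 3];                              % linear and cubic fits
rsq = zeros(size(degrees));
rsq_adj = zeros(size(degrees));
for k = 1:numel(degrees)
    d = degrees(k);
    [p,~,mu] = polyfit(cdate,pop,d);          % fit on centered and scaled x
    SSresid = sum((pop - polyval(p,cdate,[],mu)).^2);
    rsq(k) = 1 - SSresid/SStotal;             % plain R-square
    rsq_adj(k) = 1 - (SSresid/SStotal)*(n-1)/(n-d-1);   % penalizes extra coefficients
end
disp([degrees; rsq; rsq_adj])                 % rows: degree, R-square, adjusted R-square
```

Because the scaling factor is larger than 1, the adjusted value is never greater than the plain value for the same fit, and the gap grows with the polynomial degree.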

Interpolate and Extrapolate Population Values

Suppose you want to use the cubic model to interpolate the U.S. population in 1965 (a
date not provided in the original data).

1 In the Basic Fitting dialog box, click the button to specify a vector of x values at
which to evaluate the current fit.
2 In the Enter value(s)... field, type the following value:

1965

Note Use unscaled and uncentered x values. You do not need to center and scale
first, even though you selected to scale x values to obtain the coefficients in “Predict
the Census Data with a Cubic Polynomial Fit” on page 2-19. The Basic Fitting tool
makes the necessary adjustments behind the scenes.
3 Click Evaluate.

The x values and the corresponding values for f(x) are computed from the fit and
displayed in a table, as shown below:

4 Select the Plot evaluated results check box to display the interpolated value as a
diamond marker:

5 Save the interpolated population in 1965 to the MATLAB workspace by clicking Save
to workspace.

This opens the following dialog box, where you specify the variable names:

6 Click OK, but keep the Figure window open if you intend to follow the steps in the
next section, “Generate a Code File to Reproduce the Result” on page 2-31.
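
The same interpolation can be sketched at the command line. This example assumes a cubic fit on centered and scaled x values, as in the interactive session. Passing mu to polyval applies the centering and scaling to the query point, so 1965 is entered in its original units. The name pop1965 is illustrative.

```matlab
load census                              % provides cdate and pop
[p3,~,mu3] = polyfit(cdate,pop,3);       % cubic fit on centered and scaled x
pop1965 = polyval(p3,1965,[],mu3)        % evaluate the fit at the unscaled year 1965
```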

Generate a Code File to Reproduce the Result

After completing a Basic Fitting session, you can generate MATLAB code that recomputes
fits and reproduces plots with new data.

1 In the Figure window, select File > Generate Code.

This creates a function and displays it in the MATLAB Editor. The code shows you
how to programmatically reproduce what you did interactively with the Basic Fitting
dialog box.
2 Change the name of the function on the first line from createfigure to something
more specific, like censusplot. Save the code file to your current folder with the file
name censusplot.m. The function begins with:

function censusplot(X1, Y1, valuesToEvaluate1)

3 Generate some new, randomly perturbed census data:

randpop = pop + 10*randn(size(pop));

4 Reproduce the plot with the new data and recompute the fit:

censusplot(cdate,randpop,1965)

You need three input arguments: the x and y values (data 1) plotted in the original graph,
plus an x-value for a marker.

The following figure displays the plot that the generated code produces. The new plot
matches the appearance of the figure from which you generated code except for the y
data values, the equation for the cubic fit, and the residual values in the bar graph, as
expected.

Learn How the Basic Fitting Tool Computes Fits

The Basic Fitting tool calls the polyfit function to compute polynomial fits. It calls the
polyval function to evaluate the fits. polyfit analyzes its inputs to determine if the
data is well conditioned for the requested degree of fit.

When it finds badly conditioned data, polyfit computes a regression as well as it can,
but it also returns a warning that the fit could be improved. The Basic Fitting example
section “Predict the Census Data with a Cubic Polynomial Fit” on page 2-19 displays this
warning.

One way to improve model reliability is to add data points. However, adding observations
to a data set is not always feasible. An alternative strategy is to transform the predictor
variable to normalize its center and scale. (In the example, the predictor is the vector of
census dates.)

The polyfit function normalizes by computing z-scores:

$$ z = \frac{x - \mu}{\sigma} $$

where x is the predictor data, μ is the mean of x, and σ is the standard deviation of x. The
z-scores give the data a mean of 0 and a standard deviation of 1. In the Basic Fitting UI,
you transform the predictor data to z-scores by selecting the Center and scale x data
check box.

After centering and scaling, model coefficients are computed for the y data as a function
of z. These differ from (and are more robust than) the coefficients computed for y as a
function of x. The form of the model and the norm of the residuals do not change. The
Basic Fitting UI automatically rescales the z-scores so that the fit plots on the same scale
as the original x data.

To understand the way in which the centered and scaled data is used as an intermediary
to create the final plot, run the following code in the Command Window:

close
load census
x = cdate;
y = pop;
z = (x-mean(x))/std(x); % Compute z-scores of x data

plot(x,y,'ro') % Plot data as red markers

hold on % Prepare axes to accept new graph on top

zfit = linspace(z(1),z(end),100);
pz = polyfit(z,y,3); % Compute conditioned fit
yfit = polyval(pz,zfit);

xfit = linspace(x(1),x(end),100);
plot(xfit,yfit,'b-') % Plot conditioned fit vs. x data

The centered and scaled cubic polynomial plots as a blue line, as shown here:

In the code, computation of z illustrates how to normalize data. The polyfit function
performs the transformation itself if you provide three return arguments when calling it:

[p,S,mu] = polyfit(x,y,n)

The returned regression parameters, p, now are based on normalized x. The returned
vector, mu, contains the mean and standard deviation of x. For more information, see the
polyfit reference page.
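
As a quick check of this equivalence, the sketch below fits the census data both ways: once on manually computed z-scores, and once by requesting the third output from polyfit. The variable names pz and coeffDiff are illustrative. When evaluating the second fit, pass mu to polyval so that the query points are centered and scaled the same way.

```matlab
load census                                  % provides cdate and pop
z = (cdate - mean(cdate))/std(cdate);        % manual z-scores of the predictor
pz = polyfit(z,pop,3);                       % cubic fit to the z-scores
[p,S,mu] = polyfit(cdate,pop,3);             % polyfit centers and scales internally
coeffDiff = max(abs(p - pz))                 % largest coefficient difference
yfit = polyval(p,cdate,[],mu);               % evaluate at the original x values
```

The first element of mu is the mean of cdate and the second is its standard deviation, so the two sets of coefficients describe the same polynomial in z.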

Programmatic Fitting
In this section...
“MATLAB Functions for Polynomial Models” on page 2-35
“Linear Model with Nonpolynomial Terms” on page 2-41
“Multiple Regression” on page 2-42
“Programmatic Fitting” on page 2-44

MATLAB Functions for Polynomial Models

Two MATLAB functions can model your data with a polynomial.

Polynomial Fit Functions

Function   Description
polyfit    polyfit(x,y,n) finds the coefficients of a polynomial p(x) of
           degree n that fits the y data by minimizing the sum of the
           squares of the deviations of the data from the model
           (least-squares fit).
polyval    polyval(p,x) returns the value of a polynomial of degree n
           that was determined by polyfit, evaluated at x.

This example shows how to model data with a polynomial.

Measure a quantity y at several values of time t.

t = [0 0.3 0.8 1.1 1.6 2.3];

y = [0.6 0.67 1.01 1.35 1.47 1.25];
plot(t,y,'o')
title('Plot of y Versus t')

You can try modeling this data using a second-degree polynomial function,

$$ y = a_2 t^2 + a_1 t + a_0 $$

The unknown coefficients, $a_0$, $a_1$, and $a_2$, are computed by minimizing the sum of the
squares of the deviations of the data from the model (least-squares fit).

Use polyfit to find the polynomial coefficients.

p = polyfit(t,y,2)

p = 1×3

-0.2942 1.0231 0.4981

MATLAB calculates the polynomial coefficients in descending powers.

The second-degree polynomial model of the data is given by the equation

$$ y = -0.2942 t^2 + 1.0231 t + 0.4981 $$

Evaluate the polynomial at uniformly spaced times, t2. Then, plot the original data and
the model on the same plot.

t2 = 0:0.1:2.8;
y2 = polyval(p,t2);
figure
plot(t,y,'o',t2,y2)
title('Plot of Data (Points) and Model (Line)')

Evaluate the model at the data time vector.

y2 = polyval(p,t);

Calculate the residuals.

res = y - y2;

Plot the residuals.

figure, plot(t,res,'+')
title('Plot of the Residuals')

Notice that the second-degree fit roughly follows the basic shape of the data, but does not
capture the smooth curve on which the data seems to lie. There appears to be a pattern in
the residuals, which indicates that a different model might be necessary. A fifth-degree
polynomial (shown next) does a better job of following the fluctuations in the data.

Repeat the exercise, this time using a fifth-degree polynomial from polyfit.

p5 = polyfit(t,y,5)

p5 = 1×6

0.7303 -3.5892 5.4281 -2.5175 0.5910 0.6000

Evaluate the polynomial at t2 and plot the fit on top of the data in a new figure window.

y3 = polyval(p5,t2);
figure
plot(t,y,'o',t2,y3)
title('Fifth-Degree Polynomial Fit')

Note If you are trying to model a physical situation, it is always important to consider
whether a model of a specific order is meaningful in your situation.
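
One way to see why the order matters in this example: the data set has six points and a fifth-degree polynomial has six coefficients, so the fit passes through every point and its residuals vanish up to round-off, however noisy the measurements are. A short check, reusing the t and y values from this example (the names normres2 and normres5 are illustrative):

```matlab
t = [0 0.3 0.8 1.1 1.6 2.3];
y = [0.6 0.67 1.01 1.35 1.47 1.25];
p2 = polyfit(t,y,2);                     % second-degree fit: 3 coefficients
p5 = polyfit(t,y,5);                     % fifth-degree fit: 6 coefficients, 6 points
normres2 = norm(y - polyval(p2,t))       % nonzero: the quadratic smooths the data
normres5 = norm(y - polyval(p5,t))       % round-off level: the fit interpolates the data
```

A residual norm near zero here reflects the number of free coefficients, not the quality of the model.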

Linear Model with Nonpolynomial Terms

This example shows how to fit data with a linear model containing nonpolynomial terms.

When a polynomial function does not produce a satisfactory model of your data, you can
try using a linear model with nonpolynomial terms. For example, consider the following
function that is linear in the parameters $a_0$, $a_1$, and $a_2$, but nonlinear in the t data:

$$ y = a_0 + a_1 e^{-t} + a_2 t e^{-t} $$

You can compute the unknown coefficients $a_0$, $a_1$, and $a_2$ by constructing and solving a
set of simultaneous equations. The following syntax
accomplishes this by forming a design matrix, where each column represents a variable
used to predict the response (a term in the model) and each row corresponds to one
observation of those variables.

Enter t and y as column vectors.

t = [0 0.3 0.8 1.1 1.6 2.3]';
y = [0.6 0.67 1.01 1.35 1.47 1.25]';

Form the design matrix.

X = [ones(size(t)) exp(-t) t.*exp(-t)];

Calculate model coefficients.

a = X\y

a = 3×1

1.3983
-0.8860
0.3085

Therefore, the model of the data is given by

$$ y = 1.3983 - 0.8860 e^{-t} + 0.3085 t e^{-t} $$

Now evaluate the model at regularly spaced points and plot the model with the original
data.

T = (0:0.1:2.5)';
Y = [ones(size(T)) exp(-T) T.*exp(-T)]*a;
plot(T,Y,'-',t,y,'o'), grid on
title('Plot of Model and Original Data')
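
To confirm that the backslash operator returned a least-squares solution, you can check that the residual vector is orthogonal to every column of the design matrix, which is the condition expressed by the normal equations. The following sketch reuses the data from this example. The names res, normres, and orthoCheck are illustrative.

```matlab
t = [0 0.3 0.8 1.1 1.6 2.3]';
y = [0.6 0.67 1.01 1.35 1.47 1.25]';
X = [ones(size(t)) exp(-t) t.*exp(-t)];  % design matrix: one column per model term
a = X\y;                                 % least-squares coefficients
res = y - X*a;                           % residuals of the fitted model
normres = norm(res)                      % norm of the residuals
orthoCheck = norm(X'*res)                % near zero for a least-squares solution
```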

Multiple Regression
This example shows how to use multiple regression to model data that is a function of
more than one predictor variable.

When y is a function of more than one predictor variable, the matrix equations that
express the relationships among the variables must be expanded to accommodate the
additional data. This is called multiple regression.

Measure a quantity $y$ for several values of $x_1$ and $x_2$. Store these values in vectors x1,
x2, and y, respectively.

x1 = [.2 .5 .6 .8 1.0 1.1]';
x2 = [.1 .3 .4 .9 1.1 1.4]';
y = [.17 .26 .28 .23 .27 .24]';

A model of this data is of the form

$$ y = a_0 + a_1 x_1 + a_2 x_2 $$

Multiple regression solves for unknown coefficients $a_0$, $a_1$, and $a_2$ by minimizing the sum
of the squares of the deviations of the data from the model (least-squares fit).

Construct and solve the set of simultaneous equations by forming a design matrix, X.
X = [ones(size(x1)) x1 x2];

Solve for the parameters by using the backslash operator.

a = X\y

a = 3×1

0.1018
0.4844
-0.2847

The least-squares fit model of the data is

$$ y = 0.1018 + 0.4844 x_1 - 0.2847 x_2 $$

To validate the model, find the maximum of the absolute value of the deviation of the data
from the model.
Y = X*a;
MaxErr = max(abs(Y - y))

MaxErr = 0.0038

This value is much smaller than any of the data values, indicating that this model
accurately follows the data.
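
The maximum deviation is one summary of the fit. Another is the coefficient of determination described earlier in this chapter, which applies to multiple regression in the same way. A minimal sketch, reusing the data from this example (the names SSresid, SStotal, and rsq are illustrative):

```matlab
x1 = [.2 .5 .6 .8 1.0 1.1]';
x2 = [.1 .3 .4 .9 1.1 1.4]';
y  = [.17 .26 .28 .23 .27 .24]';
X = [ones(size(x1)) x1 x2];              % design matrix with a constant term
a = X\y;                                 % least-squares coefficients
SSresid = sum((y - X*a).^2);             % residual sum of squares
SStotal = sum((y - mean(y)).^2);         % total sum of squares
rsq = 1 - SSresid/SStotal                % coefficient of determination
```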

Programmatic Fitting
This example shows how to use MATLAB functions to:

• “Calculate Correlation Coefficients” on page 2-45

• “Fit a Polynomial to the Data” on page 2-46
• “Plot and Calculate Confidence Bounds” on page 2-48

Load sample census data from census.mat, which contains U.S. population data from
the years 1790 to 1990.

load census

This adds the following two variables to the MATLAB workspace.

• cdate is a column vector containing the years 1790 to 1990 in increments of 10.
• pop is a column vector with the U.S. population numbers corresponding to each year
in cdate.

Plot the data.

plot(cdate,pop,'ro')
title('U.S. Population from 1790 to 1990')

The plot shows a strong pattern, which indicates a high correlation between the variables.

Calculate Correlation Coefficients

In this portion of the example, you determine the statistical correlation between the
variables cdate and pop to justify modeling the data. For more information about
correlation coefficients, see “Linear Correlation” on page 2-2.

Calculate the correlation-coefficient matrix.

corrcoef(cdate,pop)

ans = 2×2

1.0000 0.9597
0.9597 1.0000

The diagonal matrix elements represent the perfect correlation of each variable with itself
and are equal to 1. The off-diagonal elements are very close to 1, indicating that there is a
strong statistical correlation between the variables cdate and pop.
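
For a straight-line fit, the square of the correlation coefficient equals the coefficient of determination: 0.9597 squared is approximately 0.9210, the R-square value reported for the linear fit in the Basic Fitting example. The sketch below checks this numerically. The variable names are illustrative.

```matlab
load census                                      % provides cdate and pop
R = corrcoef(cdate,pop);
r = R(1,2);                                      % correlation between cdate and pop
p1 = polyfit(cdate,pop,1);                       % least-squares straight line
SSresid = sum((pop - polyval(p1,cdate)).^2);     % residual sum of squares
SStotal = sum((pop - mean(pop)).^2);             % total sum of squares
rsqlinear = 1 - SSresid/SStotal;
comparison = [r^2 rsqlinear]                     % the two values agree
```

This identity holds only for a first-degree fit with a constant term. It does not carry over to the quadratic or cubic models.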

Fit a Polynomial to the Data

This portion of the example applies the polyfit and polyval MATLAB functions to
model the data.

Calculate fit parameters.

[p,ErrorEst] = polyfit(cdate,pop,2);

Evaluate the fit.

pop_fit = polyval(p,cdate,ErrorEst);

Plot the data and the fit.

plot(cdate,pop_fit,'-',cdate,pop,'+');
title('U.S. Population from 1790 to 1990')
legend('Polynomial Model','Data','Location','NorthWest');
xlabel('Census Year');
ylabel('Population (millions)');

The plot shows that the quadratic-polynomial fit provides a good approximation to the
data.

Calculate the residuals for this fit.

res = pop - pop_fit;

figure, plot(cdate,res,'+')
title('Residuals for the Quadratic Polynomial Model')

Notice that the plot of the residuals exhibits a pattern, which indicates that a second-
degree polynomial might not be appropriate for modeling this data.

Plot and Calculate Confidence Bounds

Confidence bounds are confidence intervals for a predicted response. The width of the
interval indicates the degree of certainty of the fit.

This portion of the example applies polyfit and polyval to the census sample data to
produce confidence bounds for a second-order polynomial model.

The following code uses an interval of ±2Δ, which corresponds to a 95% confidence
interval for large samples.

Evaluate the fit and the prediction error estimate (delta).

[pop_fit,delta] = polyval(p,cdate,ErrorEst);

Plot the data, the fit, and the confidence bounds.

plot(cdate,pop,'+',...
cdate,pop_fit,'g-',...
cdate,pop_fit+2*delta,'r:',...
cdate,pop_fit-2*delta,'r:');
xlabel('Census Year');
ylabel('Population (millions)');
title('Quadratic Polynomial Fit with Confidence Bounds')
grid on

The 95% interval indicates that you have a 95% chance that a new observation will fall
within the bounds.
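
As a rough sanity check, you can count how many of the observed values lie inside the plotted bounds. The sketch below repeats the fit from this example and reports the fraction. The names inside and fractionInside are illustrative, and because the same data is used for fitting and checking, the result is only an informal indication.

```matlab
load census                                      % provides cdate and pop
[p,ErrorEst] = polyfit(cdate,pop,2);             % quadratic fit and error structure
[pop_fit,delta] = polyval(p,cdate,ErrorEst);     % fitted values and error estimates
inside = abs(pop - pop_fit) <= 2*delta;          % true where data lies within the bounds
fractionInside = mean(inside)                    % fraction of observations inside
```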

3

Time Series Analysis

• “What Are Time Series?” on page 3-2

• “Time Series Objects” on page 3-3

What Are Time Series?

Time series are data vectors sampled over time, in order, often at regular intervals. They
are distinguished from randomly sampled data, which form the basis of many other data
analyses. Time series represent the time-evolution of a dynamic population or process.
The linear ordering of time series gives them a distinctive place in data analysis, with a
specialized set of techniques.

Time series analysis is concerned with:

• Identifying patterns
• Modeling patterns
• Forecasting values

Several dedicated MATLAB functions perform time series analysis. This section
introduces objects and interactive tools for time series analysis.

Time Series Objects

In this section...
“Types of Time Series and Their Uses” on page 3-3
“Time Series Data Sample” on page 3-3
“Example: Time Series Objects and Methods” on page 3-5
“Time Series Constructor” on page 3-17
“Time Series Collection Constructor” on page 3-17

Types of Time Series and Their Uses

MATLAB time series objects are of two types:

• timeseries — Stores data and time values, as well as the metadata information that
includes units, events, data quality, and interpolation method
• tscollection — Stores a collection of timeseries objects that share a common
time vector, convenient for performing operations on synchronized time series with
different units

This section discusses the following topics:

• Using time series constructors to instantiate time series classes

• Modifying object properties using set methods or dot notation
• Calling time series functions and methods

To get a quick overview of programming with timeseries and tscollection objects,
follow the steps in “Example: Time Series Objects and Methods” on page 3-5.

Time Series Data Sample

To properly understand the description of timeseries object properties and methods in
this documentation, it is important to clarify some terms related to storing data in a
timeseries object—the difference between a data value and a data sample.

A data value is a single, scalar value recorded at a specific time. A data sample consists of
one or more values associated with a specific time in the timeseries object. The
number of data samples in a time series is the same as the length of the time vector.

For example, consider data that consists of three sensor signals: two signals represent the
position of an object in meters, and the third represents its velocity in meters/second.

To enter the data matrix, type the following at the MATLAB prompt:

x = [-0.2 -0.3 13;
     -0.1 -0.4 15;
      NaN  2.8 17;
      0.5  0.3 NaN;
     -0.3 -0.1 15]

The NaN value represents a missing data value. MATLAB displays the following 5-by-3
matrix:

x=
-0.2000 -0.3000 13.0000
-0.1000 -0.4000 15.0000
NaN 2.8000 17.0000
0.5000 0.3000 NaN
-0.3000 -0.1000 15.0000

The first two columns of x contain quantities with the same units and you can create a
multivariate timeseries object to store these two time series. For more information
about creating timeseries objects, see “Time Series Constructor” on page 3-17. The
following command creates a timeseries object ts_pos to store the position values:

ts_pos = timeseries(x(:,1:2), 1:5, 'name', 'Position')

MATLAB responds by displaying the following properties of ts_pos:

timeseries

Common Properties:
Name: 'Position'
Time: [5x1 double]
TimeInfo: [1x1 tsdata.timemetadata]
Data: [5x2 double]
DataInfo: [1x1 tsdata.datametadata]

More properties, Methods

The length of the time vector, which is 5 in this example, equals the number of data
samples in the timeseries object. Find the size of the data sample in ts_pos by typing
the following at the MATLAB prompt:

getdatasamplesize(ts_pos)

ans =

1 2

Similarly, you can create a second timeseries object to store the velocity data:

ts_vel = timeseries(x(:,3), 1:5, 'name', 'Velocity');

Find the size of each data sample in ts_vel by typing the following:

getdatasamplesize(ts_vel)

ans =

1 1

Notice that ts_vel has one data value in each data sample and ts_pos has two data
values in each data sample.

Note In general, when the time series data is an M-by-N-by-P-by-... multidimensional
array with M samples, the size of each data sample is N-by-P-by-... .

If you want to perform operations on the ts_pos and ts_vel timeseries objects while
keeping them synchronized, group them in a time series collection. For more information,
see “Time Series Collection Constructor Syntax” on page 3-18.
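
A minimal sketch of that grouping, reusing the matrix x from this section, is shown here. The collection name 'sensors' is made up for illustration. The members are accessed with dot notation through the names given to the individual timeseries objects.

```matlab
x = [-0.2 -0.3 13; -0.1 -0.4 15; NaN 2.8 17; 0.5 0.3 NaN; -0.3 -0.1 15];
ts_pos = timeseries(x(:,1:2), 1:5, 'name', 'Position');    % two values per sample
ts_vel = timeseries(x(:,3), 1:5, 'name', 'Velocity');      % one value per sample
tsc = tscollection({ts_pos ts_vel}, 'name', 'sensors');    % 'sensors' is an illustrative name
numSamples = length(tsc.Time)                              % shared time vector
posSampleSize = getdatasamplesize(tsc.Position)            % sample size of one member
```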

Example: Time Series Objects and Methods

• “Creating Time Series Objects” on page 3-6
• “Viewing Time Series Objects” on page 3-7
• “Modifying Time Series Units and Interpolation Method” on page 3-9
• “Defining Events” on page 3-9
• “Creating Time Series Collection Objects” on page 3-10
• “Resampling a Time Series Collection Object” on page 3-11
• “Adding a Data Sample to a Time Series Collection Object” on page 3-12
• “Removing and Interpolating Missing Data” on page 3-13
• “Removing a Time Series from a Time Series Collection” on page 3-15
• “Displaying Time Vector Values as Date Strings” on page 3-15
• “Plotting Time Series Collection Members” on page 3-15

Creating Time Series Objects

This portion of the example illustrates how to create several timeseries objects from an
array. For more information about the timeseries object, see “Time Series Constructor”
on page 3-17.

Import the sample data from count.dat to the MATLAB workspace.

load count.dat

This adds the 24-by-3 matrix, count, to the workspace. Each column of count represents
hourly vehicle counts at each of three town intersections.

View the count matrix.

count

Create three timeseries objects to store the data collected at each intersection.
count1 = timeseries(count(:,1), 1:24,'name', 'intersection1');
count2 = timeseries(count(:,2), 1:24,'name', 'intersection2');
count3 = timeseries(count(:,3), 1:24,'name', 'intersection3');

Note In the above construction, timeseries objects have both a variable name (e.g.,
count1) and an internal object name (e.g., intersection1). The variable name is used
with MATLAB functions. The object name is a property of the object, accessed with object
methods. For more information on timeseries object properties and methods, see “Time
Series Properties” on page 3-17 and “Time Series Methods” on page 3-17.

By default, a time series has a time vector having units of seconds and a start time of 0
sec. The example constructs the count1, count2, and count3 time series objects with
start times of 1 sec, end times of 24 sec, and 1-sec increments. You will change the time
units to hours in “Modifying Time Series Units and Interpolation Method” on page 3-9.
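
The difference between the default time vector and an explicit one can be checked directly. The following sketch assumes count.dat is on the MATLAB path. The names ts_default and ts_explicit are illustrative.

```matlab
load count.dat                                   % adds the 24-by-3 matrix count
ts_default = timeseries(count(:,1));             % no time vector supplied
firstTime = ts_default.Time(1)                   % default start time
timeUnits = ts_default.TimeInfo.Units            % default time units
ts_explicit = timeseries(count(:,1), 1:24);      % explicit time vector, as in the example
explicitRange = [ts_explicit.Time(1) ts_explicit.Time(end)]
```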

Note If you want to create a timeseries object that groups the three data columns in
count, use the following syntax:

count_ts = timeseries(count, 1:24,'name','traffic_counts')

This is useful when all time series have the same units and you want to keep them
synchronized during calculations.

Viewing Time Series Objects

After creating a timeseries object, as described in “Creating Time Series Objects” on
page 3-6, you can view it in the Variables editor.

To view a timeseries object like count1 in the Variables editor, use either of the
following methods:

• Type open('count1') at the command prompt.

• On the Home tab, in the Variable section, click Open Variable and select count1.
This method is not available in MATLAB Online.

Modifying Time Series Units and Interpolation Method

After creating a timeseries object, as described in “Creating Time Series Objects” on
page 3-6, you can modify its units and interpolation method using dot notation.

View the current properties of count1.

get(count1)

MATLAB displays the current property values of the count1 timeseries object.

View the current DataInfo properties using dot notation.

count1.DataInfo

Change the data units for count1 to 'cars'.

count1.DataInfo.Units = 'cars';

Set the interpolation method for count1 to zero-order hold.

count1.DataInfo.Interpolation = tsdata.interpolation('zoh');

Verify that the DataInfo properties have been modified.

count1.DataInfo

Modify the time units to be 'hours' for the three time series.
count1.TimeInfo.Units = 'hours';
count2.TimeInfo.Units = 'hours';
count3.TimeInfo.Units = 'hours';
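
The interpolation method can also be changed and queried with function calls instead of dot notation. The sketch below assumes the setinterpmethod and getinterpmethod functions for timeseries objects and uses a fresh object named ts so that it does not alter count1.

```matlab
load count.dat                                   % adds the 24-by-3 matrix count
ts = timeseries(count(:,1), 1:24, 'name', 'intersection1');
ts.DataInfo.Units = 'cars';                      % data units, set with dot notation
ts.TimeInfo.Units = 'hours';                     % time units, set with dot notation
ts = setinterpmethod(ts,'zoh');                  % zero-order hold, set with a function call
method = getinterpmethod(ts)                     % query the current interpolation method
```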

Defining Events

This portion of the example illustrates how to define events for a timeseries object by
using the tsdata.event auxiliary object. Events mark the data at specific times. When
you plot the data, event markers are displayed on the plot. Events also provide a
convenient way to synchronize multiple time series.

Add two events to the data that mark the times of the AM commute and PM commute.

Construct and add the first event to all time series. The first event occurs at 8 AM.
e1 = tsdata.event('AMCommute',8);
e1.Units = 'hours'; % Specify the units for time
count1 = addevent(count1,e1); % Add the event to count1
count2 = addevent(count2,e1); % Add the event to count2
count3 = addevent(count3,e1); % Add the event to count3

Construct and add the second event to all time series. The second event occurs at 6 PM.

e2 = tsdata.event('PMCommute',18);
e2.Units = 'hours'; % Specify the units for time
count1 = addevent(count1,e2); % Add the event to count1
count2 = addevent(count2,e2); % Add the event to count2
count3 = addevent(count3,e2); % Add the event to count3

Plot the time series, count1.

figure
plot(count1)

When you plot any of the time series, the plot method defined for time series objects
displays events as markers. By default, markers are red filled circles.

The plot reflects that count1 uses zero-order-hold interpolation.

Plot count2.

plot(count2)

If you plot time series count2, it replaces the count1 display. You see its events and that
it uses linear interpolation.

Overlay time series plots by setting hold on.

hold on
plot(count3)
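
Events can also be used to select part of a time series. The following sketch assumes the gettsbetweenevents function for timeseries objects and extracts the samples recorded between the two commute events. It builds its own object ts so that it can be run independently of the steps above. The names ts_day and dayRange are illustrative.

```matlab
load count.dat                                   % adds the 24-by-3 matrix count
ts = timeseries(count(:,1), 1:24, 'name', 'intersection1');
ts.TimeInfo.Units = 'hours';
e1 = tsdata.event('AMCommute',8);
e1.Units = 'hours';
e2 = tsdata.event('PMCommute',18);
e2.Units = 'hours';
ts = addevent(ts,e1);
ts = addevent(ts,e2);
ts_day = gettsbetweenevents(ts,e1,e2);           % samples between the two events
dayRange = [ts_day.Time(1) ts_day.Time(end)]     % first and last retained times
```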

Creating Time Series Collection Objects

This portion of the example illustrates how to create a tscollection object. Each
individual time series in a collection is called a member. For more information about the
tscollection object, see “Time Series Collection Constructor” on page 3-17.

Note Typically, you use the tscollection object to group synchronized time series that
have different units. In this simple example, all time series have the same units and the
tscollection object does not provide an advantage over grouping the three time series
in a single timeseries object. For an example of how to group several time series in one
timeseries object, see “Creating Time Series Objects” on page 3-6.

Create a tscollection object in the variable tsc, with its Name property set to
count_coll, and use the constructor syntax to immediately add two of the three time
series currently in the MATLAB workspace (you will add the third time series later).

tsc = tscollection({count1 count2},'name', 'count_coll')

Note The time vectors of the timeseries objects you are adding to the tscollection
must match.

Notice that the Name property of the timeseries objects is used to name the collection
members as intersection1 and intersection2.

Add the third timeseries object in the workspace to the tscollection.

tsc = addts(tsc, count3)

All three members in the collection are listed.
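
The member-naming behavior can be sketched on synthetic data. In this hedged example, the series names north and south and the collection name demo_coll are hypothetical. It assumes only what is described above: members are named after the Name property of each timeseries object, and addts appends a member that shares the collection's time vector.

```matlab
t = (1:4)';                                   % shared time vector
north = timeseries([3; 5; 8; 6], t, 'Name', 'north');
south = timeseries([2; 4; 7; 9], t, 'Name', 'south');

% Build the collection from the first series, then append the second
coll = tscollection({north}, 'Name', 'demo_coll');
coll = addts(coll, south);

% Members are addressed by the Name of each timeseries object
memberNames = gettimeseriesnames(coll);
southData = coll.south.Data;
```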

Resampling a Time Series Collection Object

This portion of the example illustrates how to resample each member in a tscollection
using a new time vector. The resampling operation is used either to select existing data at
specific time values or to interpolate data at finer intervals. If the new time vector
contains time values that did not exist in the previous time vector, the new data values are
calculated using the default interpolation method you associated with the time series.

Resample the time series to include data values every 2 hours instead of every hour and
save it as a new tscollection object.

tsc1 = resample(tsc,1:2:24)

In some cases you might need a finer sampling of information than you currently have and
it is reasonable to obtain it by interpolating data values.

Interpolate values at each half-hour mark.

tsc1 = resample(tsc,1:0.5:24)


To add values at each half-hour mark, the default interpolation method of a time series is
used. For example, the new data points in intersection1 are calculated by using the
zero-order hold interpolation method, which holds the value of the previous sample
constant. You set the interpolation method for intersection1 as described in
“Modifying Time Series Units and Interpolation Method” on page 3-9.

The new data points in intersection2 and intersection3 are calculated using linear
interpolation, which is the default method.
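
To make the difference between the two interpolation methods concrete, here is a small sketch on synthetic data. The variables held and lin and their values are assumptions for illustration only. Both series start from the same three hourly samples, and only held is switched to zero-order hold.

```matlab
% Two hypothetical series with identical samples at hours 1, 2, and 3
held = timeseries([10; 20; 40], [1; 2; 3], 'Name', 'held');
lin  = timeseries([10; 20; 40], [1; 2; 3], 'Name', 'lin');

% Switch one series to zero-order hold; the other keeps the linear default
held.DataInfo.Interpolation = tsdata.interpolation('zoh');

% Resample both at half-hour marks
newTime = 1:0.5:3;
heldResampled = resample(held, newTime);
linResampled  = resample(lin, newTime);

heldValues = heldResampled.Data(:)';   % 10 10 20 20 40
linValues  = linResampled.Data(:)';    % 10 15 20 30 40
```

At the new half-hour marks, the zero-order-hold series repeats the previous sample, while the linear series takes the midpoint of its two neighbors.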

Plot the members of tsc1 with markers to see the results of interpolating.

hold off % Allow axes to clear before plotting

plot(tsc1.intersection1,'-xb','Displayname','Intersection 1')

You can see that data points have been interpolated at half-hour intervals, and that
Intersection 1 uses zero-order-hold interpolation, while the other two members use linear
interpolation.

Maintain the graph in the figure while you add the other two members to the plot.
Because the plot method suppresses the axis labels while hold is on, also add a legend
to describe the three series.

hold on
plot(tsc1.intersection2,'-.xm','Displayname','Intersection 2')
plot(tsc1.intersection3,':xr','Displayname','Intersection 3')
legend('show','Location','NorthWest')

Adding a Data Sample to a Time Series Collection Object

This portion of the example illustrates how to add a data sample to a tscollection.

Add a data sample to the intersection1 collection member at 3.25 hours (i.e., 15
minutes after the hour).

tsc1 = addsampletocollection(tsc1,'time',3.25,...
'intersection1',5);

There are three members in the tsc1 collection, and adding a data sample to one
member adds a data sample to the other two members at 3.25 hours. However, because
you did not specify the data values for intersection2 and intersection3 in the new
sample, the missing values are represented by NaNs for these members. To learn how to
remove or interpolate missing data values, see “Removing Missing Data” on page 3-13
and “Interpolating Missing Data” on page 3-14.


tsc1 Data from 2.0 to 3.5 Hours

Hours    Intersection 1    Intersection 2    Intersection 3

2.0      7                 13                11
2.5      7                 15                15.5
3.0      14                17                20
3.25     5                 NaN               NaN
3.5      14                15                14.5

To view all intersection1 data (including the new sample at 3.25 hours), type

tsc1.intersection1

Similarly, to view all intersection2 data (including the new sample at 3.25 hours
containing a NaN value), type

tsc1.intersection2

Removing and Interpolating Missing Data

Time series objects use NaNs to represent missing data. This portion of the example
illustrates how to either remove missing data or interpolate values for it by using the
interpolation method you specified for that time series. In “Adding a Data Sample to a
Time Series Collection Object” on page 3-12, you added a new data sample to the tsc1
collection at 3.25 hours.

As the tsc1 collection has three members, adding a data sample to one member added a
data sample to the other two members at 3.25 hours. However, because you did not
specify the data values for the intersection2 and intersection3 members at 3.25
hours, they currently contain missing values, represented by NaNs.

Removing Missing Data

Find and remove the data samples containing NaN values in the tsc1 collection.

tsc1 = delsamplefromcollection(tsc1,'index',...
find(isnan(tsc1.intersection2.Data)));

This command searches one tscollection member at a time (in this case,
intersection2). When a missing value is located in intersection2, the data at that
time is removed from all members of the tscollection.
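
The same find-and-delete pattern can be sketched on a tiny synthetic collection. The member names a and b and all data values are hypothetical. The sketch relies only on the behavior described in this section: adding a sample to one member leaves NaN in the others, and deleting by index removes that time from every member.

```matlab
t = (1:3)';
a = timeseries([1; 2; 3], t, 'Name', 'a');
b = timeseries([4; 5; 6], t, 'Name', 'b');
coll = tscollection({a b});

% Adding a sample to member a only leaves NaN in member b at t = 1.5
coll = addsampletocollection(coll, 'time', 1.5, 'a', 9);
nanIdx = find(isnan(coll.b.Data));
nanTime = coll.Time(nanIdx);

% Deleting by index removes that time from every member
coll = delsamplefromcollection(coll, 'index', nanIdx);
remainingTimes = coll.Time;
```

Checking a single member is enough here because only member b can contain NaN values. In a collection where several members might have gaps at different times, each member would need its own check.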


Note Use dot-notation syntax to access the Data property of the intersection2
member in the tsc1 collection:

tsc1.intersection2.Data

For a complete list of timeseries properties, see “Time Series Properties” on page 3-17.

Interpolating Missing Data

For the sake of this example, reintroduce NaN values in intersection2 and
intersection3.

tsc1 = addsampletocollection(tsc1,'time',3.25,...
'intersection1',5);

Interpolate the missing values in tsc1 using the current time vector (tsc1.Time).

tsc1 = resample(tsc1,tsc1.Time);

This replaces the NaN values in intersection2 and intersection3 by using linear
interpolation—the default interpolation method for these time series.
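
A hedged sketch of the same fill-in step on synthetic data follows. The member names a and b and the values are assumptions for illustration. Member b has samples 4 and 5 at times 1 and 2, so linear interpolation at time 1.5 would be expected to give 4.5.

```matlab
t = (1:3)';
a = timeseries([1; 2; 3], t, 'Name', 'a');
b = timeseries([4; 5; 6], t, 'Name', 'b');
coll = tscollection({a b});

% Introduce a missing value in member b at t = 1.5
coll = addsampletocollection(coll, 'time', 1.5, 'a', 9);
hadNaN = any(isnan(coll.b.Data));

% Resample on the existing time vector to fill the gap in member b
coll = resample(coll, coll.Time);
filled = coll.b.Data(coll.Time == 1.5);
```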

Note Dot notation tsc1.Time is used to access the Time property of the tsc1
collection. For a complete list of tscollection properties, see “Time Series Collection
Properties” on page 3-19.

To view intersection2 data after interpolation, for example, type

tsc1.intersection2


New tsc1 Data from 2.0 to 3.5 Hours

Hours    Intersection 1    Intersection 2    Intersection 3

2.0      7                 13                11
2.5      7                 15                15.5
3.0      14                17                20
3.25     5                 16                17.3
3.5      14                15                14.5

Removing a Time Series from a Time Series Collection

Remove the intersection3 time series from the tscollection object tsc1.
tsc1 = removets(tsc1,'intersection3')

The collection now lists two time series as members.

Displaying Time Vector Values as Date Strings

This portion of the example illustrates how to control the format in which numerical time
vectors display, using MATLAB date strings. For a complete list of the MATLAB date-string
formats supported for timeseries and tscollection objects, see the time vector
definition on the timeseries reference page.

To use date strings, you must set the StartDate field of the TimeInfo property. All
values in the time vector are converted to date strings using StartDate as a reference
date.

Suppose the reference date occurs on December 25, 2009.

tsc1.TimeInfo.Units = 'hours';
tsc1.TimeInfo.StartDate = '25-DEC-2009 00:00:00';
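
As a minimal sketch of what setting StartDate does, the following uses a standalone synthetic series (the name demo and its values are hypothetical) and calls getabstime to return the time vector as date strings measured from the reference date.

```matlab
% Hypothetical series sampled at hours 0, 1, and 2
demo = timeseries([5; 6; 7], [0; 1; 2], 'Name', 'demo');
demo.TimeInfo.Units = 'hours';
demo.TimeInfo.StartDate = '25-DEC-2009 00:00:00';

% getabstime returns the time vector as a cell array of date strings
dateStrings = getabstime(demo);
disp(dateStrings)
```

Each numeric time value is interpreted as an offset from StartDate in the units given by TimeInfo.Units, so the three entries fall one hour apart starting at midnight on the reference date.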

As you did with the count1, count2, and count3 time series objects, set
the data units of the tsc1 members to the string 'car count'.
tsc1.intersection1.DataInfo.Units = 'car count';
tsc1.intersection2.DataInfo.Units = 'car count';

Plotting Time Series Collection Members

To plot data in a time series collection, you plot its members one at a time.


First graph tsc1 member intersection1.

hold off
plot(tsc1.intersection1);

When you plot a member of a time series collection, its time units display on the x-axis
and its data units display on the y-axis. The plot title is displayed as 'Time Series
Plot:<member name>'.

If you use the same figure to plot a different member of the collection, no annotations
display. The time series plot method does not attempt to update labels and titles when
hold is on because the descriptors for the series can be different.

Plot intersection1 and intersection2 in the same figure. Prevent overwriting the
plot, but remove axis labels and title. Add a legend and set the DisplayName property of
the line series to label each member.

plot(tsc1.intersection1,'-xb','Displayname','Intersection 1')
hold on
plot(tsc1.intersection2,'-.xm','Displayname','Intersection 2')
legend('show','Location','NorthWest')

The plot now includes the two time series in the collection: intersection1 and
intersection2. Plotting the second graph erased the labels on the first graph.

Finally, change the date strings on the x-axis to hours and plot the two time series
collection members again with a legend.

Specify time units to be 'hours' for the collection.

tsc1.TimeInfo.Units = 'hours';

Specify the format for displaying time.

tsc1.TimeInfo.Format = 'HH:MM';

Recreate the last plot with new time units.

hold off
plot(tsc1.intersection1,'-xb','Displayname','Intersection 1')

% Prevent overwriting plot, but remove axis labels and title.

hold on
plot(tsc1.intersection2,'-.xm','Displayname','Intersection 2')


legend('show','Location','NorthWest')

% Restore the labels with the |xlabel| and |ylabel| commands and overlay a
% data grid.
xlabel('Time (hours)')
ylabel('car count')
grid on

For more information on plotting options for time series, see timeseries.

Time Series Constructor

Before implementing the various MATLAB functions and methods specifically designed to
handle time series data, you must create a timeseries object to store the data. See
timeseries for the timeseries object constructor syntax.

For an example of using the constructor, see “Creating Time Series Objects” on page 3-6.

Time Series Properties

See timeseries for a description of all the timeseries object properties. You can
specify the Data, IsTimeFirst, Name, Quality, and Time properties as input
arguments in the constructor. To assign other properties, use the set function or dot
notation.

Note To get property information from the command line, type
help timeseries/tsprops at the MATLAB prompt.

For an example of editing timeseries object properties, see “Modifying Time Series
Units and Interpolation Method” on page 3-9.

Time Series Methods

For a description of all the time series methods, see timeseries.

Time Series Collection Constructor

• “Introduction” on page 3-18
• “Time Series Collection Constructor Syntax” on page 3-18
• “Time Series Collection Properties” on page 3-19
• “Time Series Collection Methods” on page 3-20

Introduction

A tscollection object is a MATLAB variable that groups several time series with a
common time vector. The timeseries objects that you include in the tscollection
object are called members of the collection, and possess several methods for
convenient analysis and manipulation of timeseries.

Time Series Collection Constructor Syntax

Before you implement the MATLAB methods specifically designed to operate on a
collection of timeseries objects, you must create a tscollection object to store the
data.

The following table summarizes the syntax for using the tscollection constructor. For
an example of using this constructor, see “Creating Time Series Collection Objects” on
page 3-10.


Time Series Collection Syntax Descriptions

Syntax
    Description

tsc = tscollection(ts)
    Creates a tscollection object tsc that includes one or more timeseries objects.

    The ts argument can be one of the following:

    • Single timeseries object in the MATLAB workspace
    • Cell array of timeseries objects in the MATLAB workspace

    The timeseries objects share the same time vector in the tscollection.

tsc = tscollection(Time)
    Creates an empty tscollection object with the time vector Time.

    When time values are date strings, you must specify Time as a cell array of
    date strings.

tsc = tscollection(Time, TimeSeries, 'Parameter', Value, ...)
    Optionally enter the following parameter-value pairs after the Time and
    TimeSeries arguments:

    • Name (see “Time Series Collection Properties” on page 3-19)
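
The constructor forms can be sketched as follows. The series names speed and flow, the collection name demo_coll, and all data values are hypothetical. The Name parameter-value pair is passed after a cell array of timeseries objects, following the usage in “Creating Time Series Collection Objects” on page 3-10.

```matlab
t = (0:2)';
speed = timeseries([1; 2; 3], t, 'Name', 'speed');
flow  = timeseries([4; 5; 6], t, 'Name', 'flow');

% Single timeseries object
collA = tscollection(speed);

% Cell array of timeseries objects, plus a Name parameter-value pair
collB = tscollection({speed flow}, 'Name', 'demo_coll');

% Time vector only: a collection with no members yet
collC = tscollection(t);
collC = addts(collC, flow);   % flow shares the time vector t
```

Starting from a time vector alone can be convenient when the members are created later, because every member added afterward must match that time vector.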

Time Series Collection Properties

This table lists the properties of the tscollection object. You can specify the Name,
Time, and TimeInfo properties as input arguments in the tscollection constructor.


Time Series Collection Property Descriptions

Property    Description

Name        tscollection object name entered as a string. This name can
            differ from the name of the tscollection variable in the
            MATLAB workspace.

Time        A vector of time values.

            When TimeInfo.StartDate is empty, the numerical Time
            values are measured relative to 0 in specified units. When
            TimeInfo.StartDate is defined, the time values represent date
            strings measured relative to StartDate in specified units.

            The length of Time must match either the first or the last
            dimension of the Data property of each tscollection member.

TimeInfo    Uses the following fields to store contextual information about
            Time:

            • Units: Time units, with the following values: 'weeks',
              'days', 'hours', 'minutes', 'seconds',
              'milliseconds', 'microseconds', and 'nanoseconds'
            • Start: Start time
            • End: End time (read-only)
            • Increment: Interval between two subsequent time values.
              The increment is NaN when times are not uniformly sampled.
            • Length: Length of the time vector (read-only)
            • Format: String defining the date string display format. See
              the MATLAB datestr function reference page for more
              information.
            • StartDate: Date string defining the reference date. See
              the MATLAB setabstime function reference page for more
              information.
            • UserData: Stores any additional user-defined information

Time Series Collection Methods

• “General Time Series Collection Methods” on page 3-21
• “Data and Time Manipulation Methods” on page 3-21

General Time Series Collection Methods

Use the following methods to query and set object properties, and plot the data.

Methods for Querying Properties

Method     Description

get        Query tscollection object property values.
isempty    Evaluate to true for an empty tscollection object.
length     Return the length of the time vector.
plot       Plot the time series in a collection.
set        Set tscollection property values.
size       Return the size of a tscollection object.
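
A short sketch of the query methods on a synthetic collection follows. The member names a and b, the collection name demo_coll, and the data are hypothetical. It assumes that size reports the length of the time vector followed by the number of members.

```matlab
t = (1:4)';
a = timeseries([1; 2; 3; 4], t, 'Name', 'a');
b = timeseries([5; 6; 7; 8], t, 'Name', 'b');
coll = tscollection({a b}, 'Name', 'demo_coll');

collName  = get(coll, 'Name');   % query a property value
nSamples  = length(coll);        % length of the time vector
collSize  = size(coll);          % assumed: [time vector length, member count]
emptyFlag = isempty(coll);       % false for a collection that holds data
```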

Data and Time Manipulation Methods

Use the following methods to add or delete data samples, and manipulate the
tscollection object.


Methods for Manipulating Data and Time

Method                     Description

addts                      Add a timeseries object to a tscollection object.
addsampletocollection      Add data samples to a tscollection object.
delsamplefromcollection    Delete one or more data samples from a
                           tscollection object.
getabstime                 Extract a date-string time vector from a
                           tscollection object into a cell array.
getsampleusingtime         Extract data samples from an existing
                           tscollection object into a new tscollection
                           object.
gettimeseriesnames         Return a cell array of time series names in a
                           tscollection object.
horzcat                    Horizontal concatenation of tscollection objects.
                           Combines several timeseries objects with the
                           same time vector into one time series collection.
removets                   Remove one or more timeseries objects from a
                           tscollection object.
resample                   Select or interpolate data in a tscollection object
                           using a new time vector.
setabstime                 Set the time values in the time vector of a
                           tscollection object as date strings.
settimeseriesnames         Change the name of the selected timeseries
                           object in a tscollection object.
vertcat                    Vertical concatenation of tscollection objects.
                           Joins several tscollection objects along the time
                           dimension.
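
Several of these methods can be sketched together on a synthetic collection. The member names a, b, and c and the data values are hypothetical. Each call returns a modified copy, so the result is assigned back to a variable.

```matlab
t = (0:4)';
a = timeseries([10; 11; 12; 13; 14], t, 'Name', 'a');
b = timeseries([20; 21; 22; 23; 24], t, 'Name', 'b');
coll = tscollection({a b});

% Rename member b to c
coll = settimeseriesnames(coll, 'b', 'c');
namesAfterRename = gettimeseriesnames(coll);

% Keep only the samples with times between 1 and 3
subset = getsampleusingtime(coll, 1, 3);
subsetTimes = subset.Time;

% Remove member a from the subset
subset = removets(subset, 'a');
namesAfterRemove = gettimeseriesnames(subset);
```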
