Unit-1-Introduction To Statistical Analysis

Uploaded by

31240640
Copyright
© All Rights Reserved
Unit-1-Introduction to Statistical Analysis

• Introduction
• Meaning of Statistics
• The Scientific Method : Basic Steps of the Research Process
• Experimental Data and Survey Data
• Populations and Samples
• Census and Sampling Methods
• Parameter and Statistic
• Independent and Dependent Variables
• Examining Relationships
• Introduction to SPSS Statistics.
Introduction - What is Statistics ?
• The word statistics has two meanings.

• In the more common usage, statistics refers to numerical facts.

• Examples: the income of a family, the age of a student, the percentage of passes, a starting salary, and so on.

• In these examples, the word statistics refers to numbers.


Introduction - What is Statistics ?
The second meaning of statistics refers to the field or discipline of study.

In this sense of the word, statistics is defined as follows

Decisions made by using statistical methods are called educated guesses.

Theoretical or mathematical statistics deals with the development, derivation, and
proof of statistical theorems, formulas, rules, and laws.

Applied statistics involves the applications of those theorems, formulas, rules, and laws
to solve real-world problems.

Broadly speaking, applied statistics can be divided into two areas: descriptive statistics
and inferential statistics
Introduction - What is Statistics ?
Definition: Collection of methods for planning experiments, obtaining data, and then
organizing, summarizing, presenting, analyzing, interpreting, and drawing conclusions.

Statistical analysis: It’s the science of collecting, exploring and presenting large
amounts of data to discover underlying patterns and trends. Statistics are applied
every day – in research, industry and government – to become more scientific about
decisions that need to be made.
First, statisticians are guides for learning from data and navigating common problems
that can lead you to incorrect conclusions. Second, given the growing importance of
decisions and opinions based on data, it’s crucial that you can critically assess the
quality of analyses that others present to you.

● Statistics Uses Numerical Evidence to Draw Valid Conclusions

● Statisticians Know How to Avoid Common Pitfalls

● Statistics is used to Make an Impact in Your Field


The important areas of application of statistics are:

• The State
• Economics
• Business Management and Industry
• Social and Natural Sciences
• Biology and Medicine
• Research
Types of Statistics
Types of Statistics - Descriptive Statistics
• Suppose we have information on the test scores of students enrolled in a statistics class.
• The whole set of numbers that represents the scores of the students is called a data set.
• Each student is called an element, and the score of each student is called
an observation.
• It is easier to draw conclusions from summary tables and diagrams than from the
raw data set.
• So, we reduce data to a manageable size by constructing tables, drawing graphs, or
calculating summary measures such as averages.
• The portion of statistics that helps us do this type of statistical analysis is called
descriptive statistics.
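As a minimal sketch of the idea above, the following Python snippet reduces a small data set of test scores to a few summary measures. The scores are hypothetical values made up for illustration; they do not come from the slides.

```python
import statistics

# Hypothetical test scores for ten students (illustrative values only).
scores = [72, 85, 90, 66, 85, 78, 94, 85, 70, 75]

# Summary measures that describe the data set without listing every score.
summary = {
    "n": len(scores),
    "mean": statistics.mean(scores),
    "median": statistics.median(scores),
    "mode": statistics.mode(scores),
    "min": min(scores),
    "max": max(scores),
}

for name, value in summary.items():
    print(f"{name}: {value}")
```

Each line of output is one descriptive statistic; a table or graph of the same scores would serve the same purpose.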
Types of Statistics - Descriptive Statistics

The accompanying chart shows the top five companies in the United States with the most
patents.

This chart describes the data on patents as collected from these five companies and,
hence, is an example of descriptive statistics.
Types of Statistics - Inferential Statistics
• In statistics, the collection of all elements of interest is called a population.

• The selection of a few elements from this population is called a sample.

• A major portion of statistics deals with making decisions, inferences, predictions, and
forecasts about populations based on results obtained from samples.

• We may want to find the starting salary of a typical college graduate. To do so, we may
select 2000 recent college graduates, find their starting salaries, and make a decision
based on this information.

• The area of statistics that deals with such decision-making procedures is referred to as
inferential statistics.
Types of Statistics - Inferential Statistics

The chart shows the degree to which TV commercials motivate people to shop at specific
retailers.

18% - they are influenced by commercials,

30% - they are not influenced shop regularly

44% - they are not influenced at all.

The chart indicates a 1% margin of error.

Margin of error means that the percentages given in the chart can change in the plus or
minus direction by up to 1 percentage point when applied to the population.
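The arithmetic behind the margin of error can be sketched in a few lines of Python. The helper name `interval` is an assumption introduced here for illustration; the percentages and the 1-point margin are the ones quoted from the chart.

```python
def interval(sample_percent, margin):
    """Return the (low, high) range implied by a sample percentage and its margin of error."""
    return (sample_percent - margin, sample_percent + margin)

MARGIN = 1  # percentage points, as stated on the chart

for pct in (18, 30, 44):
    low, high = interval(pct, MARGIN)
    print(f"sample: {pct}%  ->  population estimate: {low}% to {high}%")
```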
Exercise
Population Vs Sample
Suppose a statistician is interested in knowing
1. The percentage of all voters in a city who will vote for a particular candidate in
an election
2. The 2009 gross sales of all companies in New York City
3. The prices of all houses in California

• In statistics, a population does not necessarily mean a collection of people.

• It can be a collection of people, houses, books, television sets, cars, and so on.

• Most of the time, decisions are made based on portions of populations.


Population : The set of data (numerical or otherwise) corresponding to the entire collection of units
about which information is sought.
Population Vs Sample
• The collection of a few elements selected from a population is called a sample.

• The collection of information from the elements of a population or a sample is called a
survey.
Population Vs Sample

•A survey that includes every element of the target population is called a census

•A census is rarely taken because it is expensive and time-consuming

• We select a sample and collect the required information from the elements included in
that sample.

• Such a survey conducted on a sample is called a sample survey.


Population Vs Sample
• It is important that results obtained from a sample survey closely match results that we
would obtain by conducting a census

• One way to select a random sample is by lottery or draw

● Biased sample: one that is not truly representative of the population about which
inferences are drawn
Inferencing about Population using Sample
Sampling with and without replacement
• A sample may be selected with or without replacement.

• In sampling with replacement, each time we select an element from the population,
we put it back in the population before we select the next element.

• Thus, in sampling with replacement, the population contains the same
number of items each time a selection is made.

• As a result, we may select the same item more than once in such a sample.

• The experiment of rolling a die many times is another example of sampling with
replacement because every roll has the same six possible outcomes.

• Sampling without replacement occurs when the selected element is not replaced in
the population.

• Thus, we cannot select the same item more than once in this type of sampling.
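The difference between the two schemes can be sketched with Python's standard `random` module. The population of ten numbered items and the fixed seed are assumptions chosen only to make the example repeatable.

```python
import random

rng = random.Random(42)           # fixed seed so the run is repeatable
population = list(range(1, 11))   # ten hypothetical items labelled 1..10

# With replacement: every draw sees the full population, so repeats are possible.
with_replacement = rng.choices(population, k=5)

# Without replacement: a drawn item is not put back, so all items are distinct.
without_replacement = rng.sample(population, k=5)

# Rolling a die repeatedly behaves like sampling with replacement from 1..6.
die_rolls = rng.choices(range(1, 7), k=10)

print("with replacement:   ", with_replacement)
print("without replacement:", without_replacement)
print("die rolls:          ", die_rolls)
```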
Sample Survey
Exercise

Q1

Q2

Q1 - P S P P S
Q2 - P S P S P
Basic Terms
Exercise
Types of Variables
Quantitative Variables

Discrete Variables

For example, the number of cars sold on a day at a car dealership is a discrete
variable because the number of cars sold must be 0, 1, 2, 3, . . . and we can count it.

Continuous Variables

The time taken to complete an examination is an example of a continuous variable
because it can assume any value, let us say, between 30 and 60 minutes.
Types of Variables
Qualitative or Categorical Variables

For example, the status of an undergraduate college student is a qualitative variable
because a student can fall into any one of four categories: freshman, sophomore,
junior, or senior.
Exercise
Cross Section Vs Time Series Data

Information on incomes of 100 families for 2009 is an example of cross-section data.


Sources of Data
• The availability of accurate and appropriate data is essential for deriving reliable results.

• Data may be obtained from internal sources, external sources, or surveys and experiments.

• Sometimes the needed data may not be available from either internal or external sources.

• In such cases, the investigator may have to conduct a survey or experiment to obtain the
required data.

[Figure: university example showing internal data sources, external data sources, and gathered data]
Exercise
Population Parameters and Sample Statistics
• A numerical measure such as the mean, median, mode, range, variance, or standard
deviation calculated for a population data set is called a population parameter, or simply a
parameter.
• A summary measure calculated for a sample data set is called a sample statistic, or simply
a statistic.

A parameter is a characteristic of a population.


A statistic is a characteristic of a sample.
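The distinction can be illustrated with a short Python sketch. The population below is simulated (normally distributed values with an assumed mean of 50 and standard deviation of 8); in practice the full population is rarely available, which is exactly why the sample statistic is used to estimate the parameter.

```python
import random
import statistics

rng = random.Random(0)  # fixed seed so the run is repeatable

# Simulated population of 10,000 values (hypothetical, for illustration only).
population = [rng.gauss(50, 8) for _ in range(10_000)]

# A sample of 200 elements drawn without replacement.
sample = rng.sample(population, 200)

parameter = statistics.mean(population)   # population mean: a parameter
statistic = statistics.mean(sample)       # sample mean: a statistic

pop_sd = statistics.pstdev(population)    # population standard deviation (divides by N)
sample_sd = statistics.stdev(sample)      # sample standard deviation (divides by n - 1)

print(f"parameter (population mean): {parameter:.2f}")
print(f"statistic (sample mean):     {statistic:.2f}")
```

Note that the standard library offers separate functions for the population (`pstdev`) and sample (`stdev`) versions of the standard deviation.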
Scientific Steps in Statistical Analysis Research Process

Step 1 Establish The Research Question & Statistical Goal
Step 2 Identify The Population
Step 3 Determine Measurement Standard
Step 4 Data Collection Method
Step 5 Data Category
Step 6 Sampling Method
Step 7 Ethical consideration
Step 8 Data Entry
Step 9 Check Errors
Step 10 Explore Data
Step 11 Plot Data
Step 12 Identify Distribution
Step 13 Test For Normality
Step 14 Statistical Inference
Scientific Steps in Statistical Analysis Research Process
Step 1 Establish The Research Question & Statistical Goal

• The goal of statistical data analysis is to answer a research question, which we state as a hypothesis

• The hypothesis we want to establish through the study is called the alternate hypothesis
• e.g. time spent watching TV is related to marks obtained
• In contrast, the null hypothesis is the one we want to disprove
• e.g. time spent watching TV is not related to marks obtained
• When the null hypothesis is rejected, we take the data as supporting the alternate hypothesis

• By rejecting the null hypothesis, we want to establish
• association (correlation) between two or more variables
• relationship (prediction) between two or more variables
• difference (comparison or variance) between two or more variables
Scientific Steps in Statistical Analysis Research Process
Step 2 Identify The Population

• A population is a class of entities that share some essential common characteristics and features

• A numerical description or characteristic of the population is called a parameter

• A sample is a subset of the population.

• A numerical characteristic or feature of the sample is called a statistic.

• We need to identify the population we must study to test our statistical hypothesis

• We need to ask how large a sample, or what proportion of the population, we need


Scientific Steps in Statistical Analysis Research Process
Step 3 Determine Measurement Standard

• After determining the population and sample size, we decide on measurement standards
for our data.

• Do we use a qualitative or quantitative data category?

• Which test statistic will be suitable for our data analysis?

• Which graphs will be suitable for data representation?

• Which units will be suitable for our study and variables?


Scientific Steps in Statistical Analysis Research Process
Step 4 Data Collection Method

• We must also decide which data collection method will be used:

• Experimental (refer LR5)
• Survey
• Fieldwork
• Archival research
• Observation of practice
• Action or hybrid

• We need to identify primary and secondary sources of data

• The pros and cons of these methods should be weighed in relation to your research
question, budget and logistical abilities.
Scientific Steps in Statistical Analysis Research Process
Step 5 Data Category

• Data tags, themes or categories are the variables for the test statistic.

• What data do we need to collect?

• We need to collect sample statistics and put them in relevant and meaningful variables.

• For qualitative research we should find the conceptual categories of the data.

• We should ask questions such as “how does a group of people feel?”, “how does a
company recruit people?”, “how do people feel about this practice?”. We collect data
which cannot be quantified. We find concepts relevant to our research question.
Scientific Steps in Statistical Analysis Research Process
Step 6 Sampling Method

• We should decide what kind of sampling to use depending on the collected data category

• For parametric data we choose from the major sampling methods
• random
• cluster
• systematic
• stratified

• For qualitative data, we need to choose from these sampling methods
• convenience sampling
• snowball sampling
• judgement sampling
• quota sampling

• We need to determine the sample size needed and how many samples should be
observed/collected.
Scientific Steps in Statistical Analysis Research Process
Step 7 Ethical consideration

Essential ethical considerations must be followed. Some important ones are highlighted:
• Invasion of privacy should be avoided.
• Causing participants to lose dignity must be avoided.
• Causing participants to think less of themselves must be avoided.
• Deception that causes resentment or hostility must be avoided.
• Unnecessary withholding of information must be avoided.
• Pain or discomfort must be avoided.
• Breaking local prohibitions (e.g. drinking alcohol, taking drugs, etc.) must be
avoided.
• Anything that may make participants feel uncomfortable must be avoided.
• Information about the research should be given as much as possible without
violating integrity and accuracy of the research.
Scientific Steps in Statistical Analysis Research Process
Step 8 Data Entry

• It is a good idea to define the variable names at this stage.

• Meaningful variable names in relation to the research question, aims and objectives of
the study should be defined and named.

• Consider also variables in relation to an efficient database design

• Enter the names in the chosen software, such as Excel or SPSS.

• Once we have established an organization of data tables we can start collecting data
considering

• Remember that statistical data in the real world are of two types: conceptual data (gender,
class etc.) and numerical data (age, height etc.).
Scientific Steps in Statistical Analysis Research Process
Step 9 Check Errors

• Once we have initially entered data we have to check data integrity.

• We will revise what we have entered.

• We will recheck variable names, actual values and units, and deal with missing (not available) values.
Scientific Steps in Statistical Analysis Research Process
Step 10 Explore Data
• Statistics measure certain attributes about data which you can explore.

• These attributes can be divided in the following categories:


• Measures of central tendency: mean, mode, median
• Measures of position: percentiles, quartiles, z-scores
• Measures of shape: distribution pattern (normal, skewed or uniform distribution)
• Examples of distributions: z, t, F, chi-square and many more!
• Measures of spread: range, IQR, standard deviation (SD), variance and coefficient
of variation
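Most of the measures listed above can be computed with Python's standard `statistics` module. The sketch below uses a small hypothetical data set, and treats it as a sample (so the standard deviation and variance divide by n - 1); both choices are assumptions made for illustration.

```python
import statistics

data = [4, 8, 15, 16, 23, 42, 8]  # hypothetical observations

# Central tendency
mean = statistics.mean(data)
median = statistics.median(data)
mode = statistics.mode(data)

# Position: quartiles (cut points that split the data into four parts) and z-scores
q1, q2, q3 = statistics.quantiles(data, n=4)

# Spread
data_range = max(data) - min(data)
iqr = q3 - q1
sd = statistics.stdev(data)          # sample standard deviation
variance = statistics.variance(data) # sample variance
cv = sd / mean                       # coefficient of variation

# z-score: how many standard deviations each value lies from the mean
z_scores = [(x - mean) / sd for x in data]

print(f"mean={mean:.2f} median={median} mode={mode}")
print(f"Q1={q1} Q2={q2} Q3={q3} IQR={iqr}")
print(f"range={data_range} sd={sd:.2f} variance={variance:.2f} cv={cv:.2f}")
```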
Scientific Steps in Statistical Analysis Research Process
Step 11 Plot Data

• Identify suitable graph relevant to your research hypothesis.

• The data can be visualized through graphs.

• You can understand patterns or trends and make decision relevant to your research.

• You can extract useful statistical data for further analysis.

• Demonstrating your hypothesis through data visualization is achieved in this stage.


Scientific Steps in Statistical Analysis Research Process
Step 12 Identify Distribution

• A distribution captures the shape of statistical data as it exists in the real world
• A distribution can be represented by both an equation and a graph
• Symmetry, skewness, kurtosis, continuity etc. of the graph define the type of distribution.
• Most of the time, a distribution is characterized by its mean and standard deviation.
• Consider the number of variables we are studying and any relationship they have.
• X and Y axes of a distribution: random variable value and its probability or frequency
Scientific Steps in Statistical Analysis Research Process
Step 13 Test For Normality

You can test for normality either by graphs or by statistical formula.
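The slides do not name a specific formula. As one possible illustration, the sketch below computes the Jarque-Bera statistic, which combines sample skewness and kurtosis and is compared against a chi-square distribution with 2 degrees of freedom. The choice of this test, the function name and the data are all assumptions; the chi-square comparison is only an approximation and is unreliable for small samples.

```python
def jarque_bera(values):
    """Return (skewness, kurtosis, JB statistic) for a list of numbers."""
    n = len(values)
    mean = sum(values) / n
    # Central moments (dividing by n, as the JB statistic requires).
    m2 = sum((x - mean) ** 2 for x in values) / n
    m3 = sum((x - mean) ** 3 for x in values) / n
    m4 = sum((x - mean) ** 4 for x in values) / n
    skewness = m3 / m2 ** 1.5
    kurtosis = m4 / m2 ** 2          # equals 3 for a normal distribution
    jb = n / 6 * (skewness ** 2 + (kurtosis - 3) ** 2 / 4)
    return skewness, kurtosis, jb

# 5% critical value of the chi-square distribution with 2 degrees of freedom.
CHI2_CRITICAL_5PCT_DF2 = 5.991

# Hypothetical, heavily right-skewed data: fifty 1s and a single 100.
skewed = [1] * 50 + [100]
s, k, jb = jarque_bera(skewed)
print(f"skewness={s:.2f} kurtosis={k:.2f} JB={jb:.1f}")
print("reject normality" if jb > CHI2_CRITICAL_5PCT_DF2 else "no evidence against normality")
```

A graphical check (a histogram or a normal Q-Q plot) is the other route mentioned above and is usually worth doing alongside any formula.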


Scientific Steps in Statistical Analysis Research Process
Step 14 Statistical Inference

• Statistical inference is the process of extending a claim about a sample characteristic to
a population characteristic.

• The claim can be a sample statistic such as the mean, or a more general characteristic such as
a hypothesis about the population.

• As an example, assume we have found that a sample responds to a treatment; we then extend
this to the population by stating that the treatment works.

• Hence the process of generalizing a claim from sample to population is statistical
inference.
Scientific Steps in Statistical Analysis Research Process
Sampling Methods
Probability Sample
● Simple random sample
● Systematic sample
● Stratified sample
● Cluster sample

Non-probability Sample
● Convenience Sampling
● Quota Sampling
Data Analysis : Data Sampling
• Sampling helps a lot in research. It is one of the most important factors determining
the accuracy of your research/survey result.
• If anything goes wrong with the sample, it will be directly reflected in the final result.

Let’s have a look at some basic terminology


Population
Sample
Sampling

A population is the collection of elements that have some characteristic in
common.
Number of elements in the population is the size of the population.
A population is the entire group that you want to draw conclusions about

Sample is the subset of the population.


A sample is the specific group that you will collect data from.
The process of selecting a sample is known as sampling.
Number of elements in the sample is the sample size.
Data Analysis : Data Sampling

There are many sampling techniques, which are grouped into two categories:
• Probability Sampling
• Non-Probability Sampling
• The difference between the two is whether sample selection is based on randomization or not.
• With randomization, every element gets an equal chance to be picked and be part of the sample

Probability Sampling

This Sampling technique uses randomization to make sure that every element of the
population gets an equal chance to be part of the selected sample.

It’s alternatively known as random sampling.


Data Analysis : Data Sampling - Probability Sampling

Simple Random Sampling:

Every element has an equal chance of being selected as part of the sample. It is used
when we don’t have any kind of prior information about the target population.

For example: random selection of 20 students from a class of 50 students. Each student has
an equal chance of being selected: the probability of being picked on the first draw is 1/50,
and the probability of ending up in the sample of 20 is 20/50.
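The classroom example can be sketched with `random.sample`, which draws without replacement and gives every element the same chance. The student labels and the seed are hypothetical.

```python
import random

rng = random.Random(7)  # fixed seed so the run is repeatable

# A hypothetical class of 50 students.
students = [f"student_{i:02d}" for i in range(1, 51)]

# Simple random sample of 20 students, drawn without replacement.
sample = rng.sample(students, k=20)

first_draw_probability = 1 / len(students)           # chance of being the first pick
inclusion_probability = len(sample) / len(students)  # chance of being in the sample at all

print(sample)
print(f"first-draw probability: {first_draw_probability:.2f}")
print(f"inclusion probability:  {inclusion_probability:.2f}")
```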
Data Analysis : Data Sampling - Probability Sampling

Stratified Sampling

•This technique divides elements of population into small subgroups (strata) based on
the similarity.

•And then the elements are randomly selected from each of these strata.

•We need to have prior information about the population to create subgroups.
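A minimal sketch of stratified sampling follows. The function name `stratified_sample`, the two strata and the 10% sampling fraction are assumptions made for illustration; drawing the same fraction from each stratum (proportional allocation) is one common choice, not the only one.

```python
import random
from collections import defaultdict

def stratified_sample(units, stratum_of, fraction, rng):
    """Group units into strata, then draw the same fraction at random from each stratum."""
    strata = defaultdict(list)
    for unit in units:
        strata[stratum_of(unit)].append(unit)

    sample = []
    for members in strata.values():
        k = max(1, round(len(members) * fraction))  # at least one unit per stratum
        sample.extend(rng.sample(members, k))
    return sample

rng = random.Random(1)

# Hypothetical population: 60 first-year and 40 second-year students.
students = [("first_year", i) for i in range(60)] + [("second_year", i) for i in range(40)]

chosen = stratified_sample(students, lambda s: s[0], 0.1, rng)
print(chosen)
```

Here the prior information needed to build the strata is each student's year, which is the first field of every tuple.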
Data Analysis : Data Sampling - Probability Sampling

Cluster Sampling

Our entire population is divided into clusters or sections and then the clusters are randomly
selected.

All the elements of the cluster are used for sampling.

Clusters are identified using details such as age, sex, location etc.
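Cluster sampling can be sketched as follows: the random step selects whole clusters, and every element of a selected cluster enters the sample. The clusters below (grouped by location) are hypothetical.

```python
import random

rng = random.Random(3)  # fixed seed so the run is repeatable

# Hypothetical population already divided into clusters by location.
clusters = {
    "north": ["n1", "n2", "n3"],
    "south": ["s1", "s2"],
    "east": ["e1", "e2", "e3", "e4"],
    "west": ["w1", "w2", "w3"],
}

# Randomly select two clusters (not individual elements).
selected_clusters = rng.sample(sorted(clusters), k=2)

# All elements of each selected cluster become part of the sample.
sample = [unit for name in selected_clusters for unit in clusters[name]]

print("selected clusters:", selected_clusters)
print("sample:", sample)
```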
Data Analysis : Data Sampling - Probability Sampling

Systematic Sampling
Here the selection of elements is systematic and not random, except for the first element.
All the elements are first put together in a sequence, a starting element is chosen at random,
and the remaining elements of the sample are chosen at regular intervals along the sequence.
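A minimal sketch of systematic selection: a random starting position followed by every k-th element. The function name and the population of 100 numbered units are assumptions; the interval is taken as the population size divided by the sample size, rounded down.

```python
import random

def systematic_sample(units, sample_size, rng):
    """Pick a random start, then take every step-th unit from the ordered sequence."""
    step = len(units) // sample_size   # sampling interval
    start = rng.randrange(step)        # the only random choice
    return [units[start + i * step] for i in range(sample_size)]

rng = random.Random(5)
population = list(range(1, 101))       # hypothetical ordered list of 100 units

sample = systematic_sample(population, 10, rng)
print(sample)
```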
Data Analysis : Data Sampling - Non - Probability Sampling

• It does not rely on randomization.


• It relies on the researcher’s ability to select elements for a sample.
• The outcome of sampling might be biased, and it is difficult for all elements of the
population to have an equal chance of being part of the sample.
• This type of sampling is also known as non-random sampling.

Convenience Sampling
Purposive Sampling
Quota Sampling
Referral /Snowball Sampling
Data Analysis : Data Sampling - Non - Probability Sampling

Convenience Sampling

Here the samples are selected based on availability.

This method is used when sample elements are hard to reach or costly to obtain,
so the sample is selected based on convenience.

For example: researchers prefer this method during the initial stages of survey research, as
it is quick and easy and delivers results fast.
Data Analysis : Data Sampling - Non - Probability Sampling

Purposive Sampling
• This is based on the intention or the purpose of the study.
• Only those elements that best suit the purpose of our study are selected from the
population.

For example: if we want to understand the thought process of people who are
interested in pursuing a master’s degree, then the selection criterion would be “Are you
interested in a Masters in..?”

All the people who respond with a “No” will be excluded from our sample.
Data Analysis : Data Sampling - Non - Probability Sampling

Quota Sampling
• This type of sampling depends on some pre-set standard.
• It selects the representative sample from the population.
• Proportion of characteristics/ trait in sample should be same as population.
• Elements are selected until the exact proportions of certain types of data are obtained or
sufficient data in the different categories has been collected.

For example: If our population has 45% females and 55% males then our sample should
reflect the same percentage of males and females.
Data Analysis : Data Sampling - Non - Probability Sampling
Referral /Snowball Sampling

• This technique is used in situations where the population is largely unknown
and rare.
• Therefore we take help from the first element we select from the
population and ask them to recommend other elements who fit the description of
the sample needed.
• This referral technique goes on, increasing the size of the sample like a snowball.

• For example: it is used for highly sensitive topics like HIV/AIDS, where people
will not openly discuss the topic or participate in surveys to share information about it.
• It helps in situations where we do not have access to sufficient people with the
characteristics we are seeking. It starts with finding people to study.
Types of Data / Variables
• Data that are expressed in numbers and summarized using statistics to give
meaningful information are referred to as quantitative data.
• Examples of quantitative data we could collect are
• heights
• weights
• ages

• If we obtain the mean of each set of measurements, we have meaningful
information about the average value for each of these characteristics.
• When we use data for description without measurement, we call
it qualitative data.

• Examples of qualitative data are student

• attitudes towards school

• attitudes towards exam

• friendliness of students to teachers.

• Such data cannot be easily summarized using statistics.


• When we obtain data directly from individuals, objects or processes, we refer to it
as primary data.

• Quantitative or qualitative data can be collected using this approach.

• Such data is usually collected solely for the research problem you will study.
• When you use data that another researcher or agency initially gathered and then
made available, you are gathering secondary data.

• Examples of secondary data are

• census data published

• stock prices data published

• salaries data published


Scales / Levels of Data
• Data type is an important concept in statistics to correctly apply statistical measurements
• We need to know data type to do proper exploratory data analysis for machine learning
Scales / Levels of Data
CATEGORICAL / QUALITATIVE DATA
•Categorical data represents characteristics.
•Therefore it can represent things like a person’s gender, language etc.
•Categorical data can also take on numerical values (Example: 1 for female and 0 for male).
•Note that those numbers don’t have mathematical meaning.

NOMINAL SCALE DATA


•Nominal values represent discrete units and are used to label variables that
have no quantitative value.
•Note that nominal data has no order.
•Therefore if you would change the order of its values, the meaning would not
change.
A feature that describes a person’s gender with two categories would be called “dichotomous”,
which is a type of nominal scale that contains only two categories.
Scales / Levels of Data
ORDINAL SCALE DATA
Ordinal values represent discrete and ordered units.
Ordinal data is therefore nearly the same as nominal data, except that its ordering matters.

Note that the difference between Elementary and High School is not the same as the difference
between High School and College.

The differences between the values are not really known.

Because of that, ordinal scales are usually used to measure non-numeric features like
happiness, customer satisfaction and so on
Scales / Levels of Data
NUMERICAL / QUANTITATIVE DATA

DISCRETE DATA

• Its values are distinct and separate.

• Data can only take on certain values.

• This type of data can’t be measured but it can be counted.

• It basically represents information that can be categorized into a classification.

•An example is the number of heads in 100 coin flips.

•You can check by asking the following two questions whether you are dealing with discrete
data or not: Can you count it and can it be divided up into smaller and smaller parts?
Scales / Levels of Data
NUMERICAL / QUANTITATIVE DATA

CONTINUOUS DATA
• Continuous data represents measurements, and therefore its values can’t be counted
but they can be measured.
• An example would be the height of a person, which you can describe by using intervals
on the real number line.

INTERVAL SCALE DATA
• Interval values represent ordered units that have the same difference.
• Therefore we speak of interval data when we have a variable that contains
numeric values that are ordered and where we know the exact differences
between the values.
• An example would be a feature that contains the temperature of a given place.
• The problem with interval data is that it doesn’t have a “true zero”.
• In regard to our example, that means there is no such thing as no temperature.
• With interval data, we can add and subtract, but we cannot multiply, divide or
calculate ratios.
• Because there is no true zero, a lot of descriptive and inferential statistics
can’t be applied.
Scales / Levels of Data
NUMERICAL / QUANTITATIVE DATA

CONTINUOUS DATA

RATIO SCALE DATA


•Ratio values are also ordered units that have the same difference.
•Ratio values are the same as interval values, with the difference that they do
have an absolute zero.
•Good examples are height, weight, length etc.
Analysis

In data science, you can use one-hot encoding to transform nominal data
into numeric features.
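One-hot encoding can be sketched in plain Python: each category becomes its own 0/1 column, so no artificial order is imposed on nominal values. The function name `one_hot` and the language labels are hypothetical; in practice a data-analysis library would normally do this step.

```python
def one_hot(values):
    """Return the sorted category list and one 0/1 row per input value."""
    categories = sorted(set(values))
    encoded = [[1 if value == c else 0 for c in categories] for value in values]
    return categories, encoded

# Hypothetical nominal feature: a person's language.
languages = ["hindi", "english", "tamil", "english"]

categories, encoded = one_hot(languages)
print(categories)
for value, row in zip(languages, encoded):
    print(value, row)
```

Every row contains exactly one 1, in the column of the category that the original value belongs to.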
Types of Data / Variables
Types of Data / Variables Revision
In statistical research, a variable is defined as an attribute of an object of study. Choosing
which variables to measure is central to good experimental design.

Example
• If you want to test whether some plant species are more salt-tolerant than others, some
key variables you might measure
• amount of salt added to water
• species of plants
• growth
• wilting.

• You need to know which types of variables you are working with in order to choose
appropriate statistical tests and interpret the results of your study.
Types of Data / Variables Revision
You can usually identify the type of variable by asking two questions:
What type of data does the variable contain?
What part of the experiment does the variable represent?
● Sample Surveys

● Interviewing Respondents

◦ In-person Interviewing
◦ Telephone Interviewing
◦ Online Interviewing
◦ Mailed Questionnaire
◦ Focus Groups
◦ Observational Data Collection
◦ Experiments
Examining Relationships Between Variables

• Let’s examine how two or more variables may be related
• We start with the relationship between 2 variables (bivariate)
• We will explore
• types of relationships
• statistical analysis of relationships
• how this can be used to make predictions
Types of relationships
• Scatter plot is a visual image of the ways in which variables may or may not be related
• Two variables can be associated in one of three ways: unrelated, linear, or nonlinear.
• Unrelated - have no systematic relationship
• Linear - relationship can be explained by a straight line on a scatter plot.
• Positive - 2 variables move, or change, in the same direction.
• Negative - 2 variables move in opposite directions
• Nonlinear - can be explained by a curved line on a scatter plot
• Curvilinear relationship is described by a polynomial equation
• Quadratic relationship – has only one curve in it
• U-shaped curvilinear - 2 variables are related negatively until a certain point
and then are related positively.
• Inverted U-shaped curvilinear - 2 variables are related positively to a certain
point and then are related negatively.
Correlations
• Statistical relationship between variables is referred to as a correlation
• Correlation between 2 variables is sometimes called a simple correlation
• It measures degree of relationship between variables

• Researchers calculate a correlation coefficient and a coefficient of determination.


• Correlation coefficient - a numerical summary of the type and strength of a
relationship between variables.
• The correlation coefficient takes the form: rab = +/-x, where a and b are the two variables
• The coefficient ranges from +1.00 (a perfect positive relationship) to -1.00 (a perfect negative relationship)
• A correlation coefficient of 0.00 means the two variables have no linear relationship
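As a sketch of how such a coefficient is computed, the function below implements the Pearson product-moment formula (sum of cross-products divided by the square root of the product of the sums of squares). The data reuse the TV-time and marks example from Step 1, but the numbers themselves are made up so that the relationship is perfectly negative.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data: hours of TV per day and marks obtained.
hours_tv = [1, 2, 3, 4, 5]
marks = [90, 80, 70, 60, 50]

r = pearson_r(hours_tv, marks)
r_squared = r ** 2  # coefficient of determination: share of variance explained

print(f"r = {r:.2f}, r^2 = {r_squared:.2f}")
```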
Correlations
• Pearson product moment correlation calculates a correlation coefficient for two
variables that are measured on a ratio or interval scale.
• Correlation matrices: A correlation matrix lists all the relevant variables across the
top and down the left side of a matrix. Where the respective rows and columns meet,
researchers indicate the bivariate correlation coefficient for those two variables and
whether it is significant by using stars

Causation and correlation


• Correlation is one of the criteria used to determine causation, but causation cannot
necessarily be inferred from a correlation coefficient
• Researchers can sometimes use sequencing of events in time to infer causation.
• Two variables may also be correlated, but their relationship is not necessarily meaningful.
Correlations
• Multiple correlation: A multiple correlation is computed when researchers want to
assess the relationship between a variable they wish to explain (criterion variable - Y) and
two or more other independent variables (X1, X2) working together:
Y = a + bX1 + cX2
• The multiple correlation coefficient is just like a correlation coefficient, except that it
tells researchers how two or more variables working together are related to the
criterion variable of interest.

• It takes the form Ra.b.c = +/-x, read as the multiple correlation of variables b & c with a
Regression Analysis

• Used to predict / explain the value of a criterion / outcome / response / target variable
on the basis of the values of other variables, called predictor variables.
• The statistical procedures used to make such predictions are referred to as regression analysis.
• Linear regression (simple regression): used to predict or explain the value of a criterion / response
variable on the basis of the value of a predictor variable and knowledge of the relationship between the
two variables.
• The regression line (line of best fit) is denoted by a straight line through the data on a
scatter plot.
Regression Analysis
• Regression analysis is accomplished by constructing a regression equation (also called
a prediction equation or regression model), which is an algebraic equation expressing
the relationship between variables.
• The typical regression equation for two variables is: y = a + bx, where y is the criterion or
response variable, a is the intercept, b is the slope, and x is the predictor variable.
• The intercept is the point on the y-axis where the regression line crosses it (the value of y when x = 0).
• The slope denotes how many units the variable Y changes for every unit increase in X; it
depends on the correlation between the two variables.
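A minimal sketch of fitting y = a + bx by ordinary least squares, which is the usual way the line of best fit is obtained (the slides do not state the fitting method, so this is an assumption). The function name `fit_line` and the data are hypothetical; the data were chosen to lie exactly on y = 1 + 2x so the result is easy to check.

```python
def fit_line(xs, ys):
    """Return (intercept a, slope b) of the least-squares line y = a + b*x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope: covariance of x and y divided by the variance of x.
    b = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    # Intercept: the line passes through the point (mean_x, mean_y).
    a = mean_y - b * mean_x
    return a, b

xs = [1, 2, 3, 4, 5]     # predictor variable
ys = [3, 5, 7, 9, 11]    # criterion (response) variable, exactly 1 + 2x

a, b = fit_line(xs, ys)
prediction = a + b * 6   # predicted y for a new x value of 6

print(f"intercept a = {a:.2f}, slope b = {b:.2f}, prediction at x=6: {prediction:.2f}")
```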
Correlations
• Regression coefficient, which is part of a regression equation, is a statistical
measure of the relationship between the variables (a bivariate regression coefficient
references two variables).

• Significance tests are applied to a regression equation to determine whether the
predicted variance is significant.

• The extent to which any model or equation, such as a regression line, summarizes or
“fits” the data is referred to as the goodness of fit.
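One common way to quantify goodness of fit for a regression line is R squared: one minus the ratio of the residual sum of squares to the total sum of squares. The slides do not name a specific measure, so this choice, the function name, the data and the candidate line y = 0 + 2x are all assumptions for illustration.

```python
def r_squared(xs, ys, a, b):
    """R^2 of the line y = a + b*x against the observed points."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))  # unexplained variation
    ss_tot = sum((y - mean_y) ** 2 for y in ys)                    # total variation
    return 1 - ss_res / ss_tot

xs = [1, 2, 3, 4]
ys = [2, 4, 5, 8]   # hypothetical observations

# Hypothetical candidate line y = 0 + 2x; it misses only the third point, by 1.
fit = r_squared(xs, ys, a=0, b=2)
print(f"R^2 = {fit:.3f}")
```

A value close to 1 means the line summarizes the data well; a value near 0 means it explains little of the variation in y.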

Y = a + bX, where b is the regression coefficient
Advanced Relationship Analysis
• There are more complex multivariate analytic procedures that assess relationships among variables
• Canonical correlation analysis (Rc) is a form of regression analysis used to examine
the relationship between multiple independent and dependent variables.
• Path analysis examines hypothesized relationships among multiple variables (usually
independent, mediating, and dependent) for the purpose of helping to establish causal
connections and inferences by showing the “paths” the causal influences take.
• Discriminant analysis is a form of regression analysis that classifies, or discriminates,
individuals on the basis of their scores on two or more ratio/interval independent
variables into the categories of a nominal dependent variable.
•Factor analysis examines whether a large number of variables can be reduced to a
smaller number of factors (a set of variables).
•Cluster analysis explains whether multiple variables or elements are similar enough to
be placed together into meaningful groups or clusters that have not been predetermined
by the researcher.
Introduction to SPSS

https://www.youtube.com/watch?v=TZPyOJ8tFcI
End
