ACCTY 312 - Lesson 1


Lesson 1:

NATURE OF STATISTICS
Learning Objectives:
At the end of the lesson, the students are expected to:
a. Define Statistics;
b. Differentiate Descriptive from Inferential Statistics;
c. Define Parametric and Non-Parametric tests;
d. Describe statistical data;
e. Determine the desired sample size using the Slovin's Formula;
f. Cite examples of the levels of measurement; and
g. Identify the different data visualization techniques.

Statistics plays a vital role, as it is used to test theories, support research findings,
and make informed decisions. The use of statistics helps researchers to identify patterns
and relationships in data, draw conclusions, and make predictions.

For example, statistical methods can be used to determine the validity and reliability of
psychological tests and measures. They can also be used to analyze the results of
experiments and determine the significance of differences between groups. Additionally,
statistical methods can be used to make predictions about future behavior based on past
data, and to determine the effectiveness of different treatments and interventions.

ACCTY 312 - Statistical Analysis with SoftwareApplications 1


STATISTICS

Statistics is a branch of mathematics that deals with the collection, organization,
analysis, interpretation, and presentation of numerical data. It's important to note that
statistical results should always be interpreted carefully, as they are based on
assumptions and can be affected by various sources of error. Misinterpretation of
statistical results can lead to incorrect conclusions and bad decision-making.

BRANCHES OF STATISTICS

a. Descriptive Statistics
Descriptive statistics summarizes the main features of a dataset, such as its central
tendency (mean, median, mode), dispersion (range, variance, standard deviation), and
shape (skewness, kurtosis). It helps to describe the data in a concise and meaningful
way, making it easier to understand and visualize. Descriptive statistics does not
involve making any assumptions about the underlying population from which the
data was collected.

b. Inferential Statistics
Inferential statistics uses sample data to make inferences about a larger population. It
involves testing hypotheses about the population parameters and making predictions
about future observations. Inferential statistics relies on probabilities and models, and
its results are subject to uncertainty. For example, in a hypothesis test, we might use
sample data to test whether the mean of a population is equal to a certain value.

i. Parametric Test – a test of significance that makes assumptions about the
parameters of the population distribution from which the sample is drawn.
This is often the assumption that the population data are normally
distributed.
ii. Non-Parametric Test – a test of significance appropriate when a parametric
assumption has been greatly violated, or when the nature of the distribution
is not known. Such tests are often based on the ranks or order of the data.
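
The two branches can be contrasted on one small sample. The sketch below is only an illustration: the ten scores and the hypothesized population mean of 5 are made-up values, and the one-sample t statistic is just one example of a parametric test statistic. It uses only Python's standard library.

```python
import math
import statistics

# Hypothetical sample of ten quiz scores (made up for illustration).
scores = [4, 8, 6, 5, 3, 8, 9, 5, 8, 4]

# Descriptive statistics: summarize the sample itself.
mean = statistics.mean(scores)           # central tendency
median = statistics.median(scores)
mode = statistics.mode(scores)
data_range = max(scores) - min(scores)   # dispersion
variance = statistics.variance(scores)   # sample variance (n - 1 denominator)
std_dev = statistics.stdev(scores)

# Inferential statistics: one-sample t statistic for the assumed null
# hypothesis that the population mean equals 5.
hypothesized_mean = 5
t_stat = (mean - hypothesized_mean) / (std_dev / math.sqrt(len(scores)))

print(f"mean={mean}, median={median}, mode={mode}, range={data_range}")
print(f"variance={variance:.2f}, std_dev={std_dev:.2f}, t={t_stat:.2f}")
```

The descriptive values describe only these ten scores. The t statistic is a statement about the population, so it would still have to be compared with a critical value from the t distribution before drawing a conclusion.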

POPULATION AND SAMPLE

In statistics, you always deal with data from either a population or a sample.

a. Population is the collection of all possible members of a set of individuals, objects,
or measurements. It refers to groups or aggregates of people, objects, or things of
any form. It is the totality of individuals that possess some observable
characteristic, called a variable.
• A finite population is one whose members can be determined or counted
immediately by delimiting the scope and coverage of the subjects of study.
• An infinite population is one whose members cannot be determined or
counted immediately.
b. Sample is a part or portion of a population. It is a number of individuals selected
from a population for a study, preferably in such a way that they represent the
larger group from which they were selected.

DETERMINING THE SAMPLE SIZE

The determination of a sample size is very important in statistics for several reasons:

1. Statistical power: The larger the sample size, the greater the statistical power of a
study. This means that with a larger sample size, it is more likely that the study
will detect a true effect if one exists. On the other hand, if the sample size is too
small, there is a greater chance that the study will fail to detect a true effect.
2. Precision of estimates: A larger sample size can provide more precise estimates of
population parameters. For example, a larger sample size can produce more
precise estimates of the mean and standard deviation of a population.
3. Generalizability: A sample that is representative of the population can provide
more generalizable results. A larger sample size increases the chances that the
sample will be representative of the population.
4. Reducing sampling error: Sampling error refers to the difference between the
sample statistics and the population parameters. The larger the sample size, the
smaller the sampling error, which can increase the accuracy of the results.
5. Cost considerations: Determining the appropriate sample size also takes into
account cost considerations. A larger sample size is often more expensive, so it is
important to balance the need for a large sample with the cost of obtaining that
sample.
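
Points 2 and 4 can be made concrete with the standard error of the sample mean, which under simple random sampling is the standard deviation divided by the square root of the sample size. The sketch below is a minimal illustration. The standard deviation of 10 is an assumed value, and the function name `standard_error` is hypothetical.

```python
import math


def standard_error(std_dev: float, n: int) -> float:
    """Standard error of the sample mean under simple random sampling."""
    if n <= 0:
        raise ValueError("sample size must be positive")
    return std_dev / math.sqrt(n)


# Assumed standard deviation of 10 (illustrative value only).
for n in (25, 100, 400):
    print(n, standard_error(10, n))
```

Quadrupling the sample size only halves the standard error, which is one reason precision has to be balanced against cost (point 5).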

Before calculating the sample size, it's important to consider several factors about
the target population and the desired level of accuracy:

a. Population size: This refers to the total number of individuals in the target
population. To determine the population size, it's important to clearly define who
falls within the group of interest. For example, if you're studying dog owners, you
would include everyone who has owned at least one dog.
b. Margin of error: Errors are inevitable in any study, so it's important to
determine the acceptable level of error. The margin of error is the maximum
difference you allow between a sample estimate (such as the sample mean) and the
corresponding population value; it is the half-width of the confidence interval.
For example, a margin of error of ±5% means that you allow a difference of up to
5% between the sample estimate and the population value.
c. Confidence level: This represents the level of confidence you have that the actual
mean falls within the margin of error. The most commonly used confidence levels
are 90%, 95%, and 99%. These represent the percentage of times that the actual
mean would fall within the margin of error if the study were repeated multiple
times.

Once you have determined these factors, you can use various formulas to
calculate the sample size required for your study such as the Slovin’s formula below.

Slovin’s Formula

$$ n = \frac{N}{1 + N e^{2}} $$

where:
n = sample size
N = population size
e = margin of error

Example: A researcher plans to study the association between emotional quotient and
intelligence quotient among freshmen students of ISU-Echague. If the campus has a total
of 8,950 freshmen students, calculate the desired sample size of the study considering a
margin of error of 0.05.

$$ n = \frac{8950}{1 + (8950)(0.05)^{2}} = \frac{8950}{23.375} = 382.89 \approx 383 $$

The desired sample size is 383 students.
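
The same computation can be sketched in Python. This is a minimal illustration, not part of the course software: the function name `slovin_sample_size` is hypothetical, and rounding up to the next whole respondent is an assumption that matches the worked example.

```python
import math


def slovin_sample_size(population_size: int, margin_of_error: float) -> int:
    """Return n = N / (1 + N * e**2), rounded up to a whole respondent."""
    if population_size <= 0:
        raise ValueError("population_size must be positive")
    if not 0 < margin_of_error < 1:
        raise ValueError("margin_of_error must be between 0 and 1")
    raw = population_size / (1 + population_size * margin_of_error ** 2)
    # Round away tiny floating-point noise before taking the ceiling.
    return math.ceil(round(raw, 9))


print(slovin_sample_size(8950, 0.05))  # the freshmen example: 383
```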

SOURCES OF DATA

There are two main sources of data: primary and secondary.

a. Primary data are data that come from an original source and are intended to
answer specific research questions. They can be gathered by interview, mail-in
questionnaire, survey, or experimentation.
b. Secondary data are data that are taken from previously recorded data, such as
information in research conducted, demographic data from the census bureau,
educational achievement data from the Department of Education, or clinical
data from electronic medical records.

CONSTANT AND VARIABLE

A characteristic of objects, people, or events is either a constant or a variable.

a. Constant. A constant is a characteristic of objects, people, or events that does
not vary. For example, the temperature at which water boils is a constant.
b. Variable. A variable is a characteristic of objects, people, or events that can take
on different values. It can vary in quantity (e.g., weight of people), or in quality
(e.g., hair color of people).

TYPES OF DATA

There are basically two types of random variables yielding two types of data:
qualitative and quantitative.
a. Qualitative data. Qualitative variables are non-numerical and describe
characteristics or qualities of a person, place, or thing. They are also called
categorical variables. Examples of qualitative variables include race, gender,
marital status, occupation, and political affiliation. These variables cannot be
easily quantified.
b. Quantitative data. Quantitative variables are numerical and represent amounts
or measurements. They are also termed numerical variables. Examples of
quantitative variables include age, height, weight, IQ score, and number of
years of education. These variables can be measured and manipulated arithmetically.

CLASSIFICATION OF VARIABLES

Variables can be classified in two ways according to purpose: experimentally or
mathematically.

Experimental Classification. A researcher may classify variables according to the function
they serve in the experiment.

1. Independent variables are variables controlled by the researcher and expected to
have an effect on the behavior of the subjects. The independent variable is also
called the explanatory variable.
2. Dependent variable is some measure of the behavior of subjects that is expected to
be influenced by the independent variable. The dependent variable is also called
the outcome variable.

Example: In a study of the effect of level of motivation on the academic performance of
students, the dependent variable is the academic performance while the independent
variable is the level of motivation.

Mathematical Classification. Variables may also be classified in terms of the mathematical
values they take on within a given interval.

1. Continuous variable is a variable which can assume any of an infinite number of
values and can be associated with points on a continuous line interval.

Examples: height (measured in inches or centimeters), weight (measured in kilograms),
age (measured in years), IQ, income, reaction time (measured in seconds).

2. Discrete variable is a variable which consists of either a finite number of values or
a countable number of values.

Examples: sex (male or female), course (BS-Biology, BS-Physics, BS-Chemistry), marital
status (single, married, divorced), educational level (high school, bachelor’s degree,
master’s degree), diagnosis (yes or no), age group (18-25, 26-35, 36-45).

METHODS OF COLLECTING DATA

a. Direct or Interview Method. It is a face-to-face encounter between the interviewer
and the interviewee. The interview may vary according to the preference of either
or both parties. However, this method is time-consuming and expensive.
b. Indirect or Questionnaire Method. This method utilizes questionnaires to obtain
information. It can be done by mail, electronic form, or hand-carried to the
intended respondents.
c. Registration Method. This method of gathering information is governed by laws.
Examples include birth certificates, death certificates, and licenses, among others.
d. Observation Method. This method is used to gather data pertaining to the behavior
of an individual or group of individuals at the time a given situation occurs.
e. Experiment Method. This is used to determine the cause-and-effect relationship of
certain phenomena under controlled conditions. This method is usually employed
by scientific researchers.

METHODS OF PRESENTING DATA

a. Textual Method. This method presents the collected data in narrative or
paragraph form.
b. Tabular Method. This method presents the collected data in tables, arranged in
orderly rows and columns for an easier and more comprehensive comparison of
figures.
c. Graphical Method. This method presents the collected data in visual or pictorial
form to give a clear view of the data.

LEVELS OF MEASUREMENT
1. Nominal variable. The nominal scale, also called the categorical variable scale, is
used for labeling variables into distinct classifications and doesn’t involve a
quantitative value or order.

Examples: sex (male and female), citizenship (Filipino, American), religion (Catholic,
INC)

2. Ordinal variable. The ordinal scale is a measurement scale used to depict the order
of values but not the size of the difference between them. These scales are
generally used to depict non-mathematical ideas such as frequency, satisfaction,
happiness, degree of pain, etc.

Examples: Student class designation (freshman, sophomore, junior, senior)
Level of satisfaction (very satisfied, satisfied, neutral, unsatisfied)
Faculty rank (instructor, assistant professor, associate professor, professor)

3. Interval variable. The interval scale is a numerical scale where the order of the
values is known and the difference between successive values is consistently the
same. However, it has no true zero (i.e., zero does not indicate the absence of
the quantity being measured).

Examples: Temperature (in degree Celsius or Fahrenheit), IQ score

4. Ratio variable. Ratio measurements, like interval measurements, are expressed in
numbers, and the differences between any two successive numbers are consistent.
The ratio scale has, however, the additional characteristic of starting from a
true zero (i.e., zero indicates the absence of the quantity being measured).

Examples: height, weight, time, age, money

                  NOMINAL        ORDINAL        INTERVAL        RATIO
Example           Eye color      Birth order    Temperature     Height
Named             Yes            Yes            Yes             Yes
Natural order     No             Yes            Yes             Yes
Equal interval    No             No             Yes             Yes
between values
Has a “true zero” No             No             No              Yes
Type of data      QUALITATIVE    QUALITATIVE    QUANTITATIVE    QUANTITATIVE
                  (Categorical)  (Categorical)  (Numerical/     (Numerical/
                                                Scale)          Scale)

Because the ratio scale has a “true zero” value, the ratio between values can be
calculated.
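
The difference between the interval and ratio levels can be checked numerically. The sketch below is an illustration only. It assumes the standard conversions of Celsius to Kelvin (add 273.15) and centimeters to inches (divide by 2.54), and the helper names are hypothetical.

```python
def celsius_to_kelvin(celsius: float) -> float:
    """Convert Celsius (interval scale) to Kelvin (ratio scale)."""
    return celsius + 273.15


def cm_to_inches(cm: float) -> float:
    """Convert centimeters to inches; both units share a true zero."""
    return cm / 2.54


# Interval scale: the "ratio" changes with the unit, so it is not meaningful.
ratio_celsius = 20 / 10
ratio_kelvin = celsius_to_kelvin(20) / celsius_to_kelvin(10)

# Ratio scale: the ratio is the same in any unit.
ratio_cm = 180 / 90
ratio_in = cm_to_inches(180) / cm_to_inches(90)

print(ratio_celsius, round(ratio_kelvin, 3))
print(ratio_cm, round(ratio_in, 3))
```

20 °C is not "twice as hot" as 10 °C, because the Celsius zero is arbitrary. A height of 180 cm is twice a height of 90 cm in any unit.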

SAMPLING TECHNIQUES

Sampling techniques refer to the methods used to select a subset of individuals
or elements from a larger population for study or analysis. The choice of a particular
sampling technique depends on the research question, the study design, and the
characteristics of the population being studied.

RANDOM SAMPLING refers to a sampling method where every unit in the population
has a known, non-zero chance of being selected for the sample. It is also called
probability sampling.

1. Simple Random Sampling. This is a basic and straightforward method, where each
unit of the population has an equal and independent chance of being selected.
2. Stratified Random Sampling. In this technique, the population is divided into
smaller homogenous subgroups, called strata, based on certain characteristics.
Then, a simple random sample is taken from each stratum.
3. Systematic Random Sampling. This method involves selecting every kth unit from
the population after randomly selecting a starting point between 1 and k.
4. Cluster Sampling. In this technique, the population is divided into several groups,
or clusters, and then a random sample of these clusters is selected. All units
within the selected clusters are included in the sample.
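
The four random sampling techniques above can be sketched with Python's standard `random` module. Everything specific here is an assumption made for illustration: the population of 100 student ID numbers, the odd/even strata, the four sections used as clusters, the equal allocation per stratum, and all function names.

```python
import random
from collections import defaultdict


def simple_random_sample(units, n, rng):
    """Each unit has an equal chance of selection (without replacement)."""
    return rng.sample(units, n)


def systematic_sample(units, k, rng):
    """Pick a random start among the first k units, then every kth unit."""
    start = rng.randrange(k)  # 0-based equivalent of "between 1 and k"
    return units[start::k]


def stratified_sample(units, stratum_of, n_per_stratum, rng):
    """Take a simple random sample of fixed size from each stratum."""
    strata = defaultdict(list)
    for unit in units:
        strata[stratum_of(unit)].append(unit)
    sample = []
    for members in strata.values():
        sample.extend(rng.sample(members, min(n_per_stratum, len(members))))
    return sample


def cluster_sample(clusters, n_clusters, rng):
    """Randomly choose whole clusters and keep every unit inside them."""
    chosen = rng.sample(sorted(clusters), n_clusters)
    return [unit for name in chosen for unit in clusters[name]]


rng = random.Random(42)          # fixed seed so the sketch is repeatable
students = list(range(1, 101))   # hypothetical ID numbers 1..100

srs = simple_random_sample(students, 10, rng)
systematic = systematic_sample(students, 10, rng)
stratified = stratified_sample(
    students, lambda s: "odd" if s % 2 else "even", 5, rng
)
sections = {
    "A": students[:25], "B": students[25:50],
    "C": students[50:75], "D": students[75:],
}
clustered = cluster_sample(sections, 2, rng)
```

In practice, the number drawn from each stratum is often made proportional to the stratum's size. The fixed number per stratum here only keeps the sketch short.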

NON-RANDOM SAMPLING is a sampling procedure where samples are selected in a
deliberate manner with little or no attention to randomization. It is also called
non-probability sampling.

1. Convenience Sampling. A process of selecting a group of individuals who are
easily accessible or available for study.
2. Purposive Sampling. A process of selecting a sample based on the researcher's
judgment and prior knowledge that its members will provide the data that the
researcher needs.
3. Quota Sampling. This involves specifying a fixed number (or "quota") of
participants that are to be selected from each subgroup within the population.
The goal of quota sampling is to ensure that the sample represents specific
characteristics of the population, such as gender, age, or ethnicity.
4. Snowball Sampling. This is a technique in which one or more members of a
population are located and used to lead the researchers to other members of the
population.
5. Voluntary Sampling. This is a technique in which the sample is composed of
respondents who self-select into the study or survey. Most of the time, these
respondents have a strong interest in the topic of the study.

DATA VISUALIZATION TECHNIQUES

It is common to think of statistical graphics and data visualization as relatively
modern developments in statistics. In fact, the graphic representation of quantitative
information has deep roots. Along the way, developments in technologies (printing,
reproduction), mathematical theory and practice, and empirical observation and
recording enabled the wider use of graphics and new advances in form and content
(Chen et al., 2008).

In 2020, when the COVID-19 pandemic emerged across the globe, statistical data
visualization became very helpful in projecting COVID-19 cases. Decisions based on
these projections were aimed at flattening the curve of cases and eventually ending
the pandemic.

Histogram
A histogram is a graphical display of data using bars of different heights. In
a histogram, each bar groups numbers into ranges. Taller bars show that more data falls
in that range. A histogram displays the shape and spread of continuous sample data.
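
The grouping step behind a histogram can be sketched in plain Python. The exam scores, the class width of 10, and the function name `bin_counts` are assumptions for illustration. The text bars stand in for the drawn bars of a real chart.

```python
def bin_counts(values, start, width, n_bins):
    """Count how many values fall in each range [low, low + width)."""
    counts = [0] * n_bins
    for v in values:
        index = int((v - start) // width)
        if 0 <= index < n_bins:
            counts[index] += 1
    return counts


# Hypothetical whole-number exam scores (made up for illustration).
exam_scores = [52, 55, 61, 64, 67, 68, 70, 71, 73, 75, 76, 78, 82, 85, 91]
counts = bin_counts(exam_scores, start=50, width=10, n_bins=5)

for i, count in enumerate(counts):
    low = 50 + i * 10
    print(f"{low}-{low + 9}: {'#' * count}")
```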

Stem-and-Leaf Plot
A stem-and-leaf display or stem-and-leaf plot is a device for presenting
quantitative data in a graphical format, similar to a histogram, to assist in visualizing the
shape of a distribution.
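
A stem-and-leaf plot is simple enough to build by hand or in a few lines of code. The sketch below assumes non-negative whole numbers, uses the tens digit as the stem and the ones digit as the leaf, and works on made-up exam scores. The function name is hypothetical.

```python
from collections import defaultdict


def stem_and_leaf(values):
    """Map each stem (tens digit) to its sorted leaves (ones digits)."""
    plot = defaultdict(list)
    for v in sorted(values):
        stem, leaf = divmod(v, 10)
        plot[stem].append(leaf)
    return dict(plot)


# Hypothetical exam scores (made up for illustration).
exam_scores = [52, 55, 61, 64, 67, 68, 70, 71, 73, 75, 76, 78, 82, 85, 91]
for stem, leaves in stem_and_leaf(exam_scores).items():
    print(f"{stem} | {' '.join(str(leaf) for leaf in leaves)}")
```

Unlike a histogram, the plot keeps every original value readable while still showing the shape of the distribution.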

Boxplot
A box plot or boxplot is a method for graphically depicting groups of numerical
data through their quartiles.
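
The quartiles that a boxplot draws can be computed with the standard library's `statistics.quantiles` (Python 3.8 or later). This is a sketch on made-up exam scores with a hypothetical function name. Quartile conventions differ between textbooks and software, so other tools may report slightly different values for the same data.

```python
import statistics


def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum) for a list of numbers."""
    q1, q2, q3 = statistics.quantiles(values, n=4)
    return min(values), q1, q2, q3, max(values)


# Hypothetical exam scores (made up for illustration).
exam_scores = [52, 55, 61, 64, 67, 68, 70, 71, 73, 75, 76, 78, 82, 85, 91]
minimum, q1, median, q3, maximum = five_number_summary(exam_scores)
print(minimum, q1, median, q3, maximum)
```

The box spans Q1 to Q3 with a line at the median. In the simplest form of the plot, the whiskers extend to the minimum and maximum.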

Scatter Diagram
A scatter diagram (also known as a scatter plot, scatter graph, or correlation chart)
is a tool for analyzing the relationship between two variables and determining how
closely they are related. One variable is plotted on the horizontal axis and the other
is plotted on the vertical axis.
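
How closely two plotted variables are related is commonly summarized by the Pearson correlation coefficient. The lesson does not name this measure, so it is brought in here as an illustration. The sketch below computes it from scratch on made-up data, and the function name is hypothetical.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of equal length (at least 2 points)")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical data: hours studied (x) and quiz score (y).
hours = [1, 2, 3, 4, 5]
quiz = [2, 4, 5, 4, 5]
print(round(pearson_r(hours, quiz), 2))
```

Values near +1 or -1 correspond to points lying close to a straight line, and values near 0 correspond to a shapeless cloud of points.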

Frequency Polygon
Frequency polygons are a graphical device for understanding the shapes of
distributions. They are formed by placing a dot at the midpoint of the top of each
rectangle of the histogram and connecting the dots. They serve the same purpose as
histograms but are especially helpful for comparing sets of data.

SUMMARY
• Statistics deals with the collection, organization, analysis, interpretation, and presentation
of data.
• Descriptive statistics summarizes the main features of a dataset, including central
tendency, dispersion, and shape, without making any assumptions about the underlying
population.
• Inferential statistics uses sample data to make inferences about a larger population. It
involves testing hypotheses and making predictions, relies on probabilities and models,
and is subject to uncertainty.
• A parametric test makes assumptions about the parameters of the population distribution,
often assuming normality.
• A non-parametric test is appropriate when a parametric assumption is violated or when
the distribution is unknown. It is based on the ranks of the data.
• Population is the collection of all possible members of a set of individuals, objects, or
measurements. It refers to groups or aggregates of people, objects, or things of any form.
• Sample is a part or portion of a population.
• Slovin’s Formula is used to determine the sample size of a study, and is given by

$$ n = \frac{N}{1 + N e^{2}} $$

• Primary data are data that come from an original source, and are intended to answer
specific research questions.
• Secondary data are data that are taken from previously recorded data, such as
information in research conducted.
• There are two types of data: qualitative, which are categorical, and quantitative, which are
numerical.
• Independent variables are variables controlled by the researcher, while the dependent
variable is some measure of the behavior of subjects that is expected to be influenced by
the independent variable.
• A continuous variable can assume any of an infinite number of values, while a discrete
variable consists of either a finite or a countable number of values.
• The nominal scale, also called the categorical variable scale, labels variables such as sex,
religion, and color.
• The ordinal scale is a measurement scale used to depict the order of values but not the
size of the difference between them.
• The interval scale is a numerical scale where the order of the values is known and the
difference between successive values is consistently the same.
• Ratio measurements are also expressed in numbers, and the differences between any two
successive numbers are consistent. The ratio scale has, however, the additional
characteristic of starting from a true zero.
• Random sampling refers to a sampling method where every unit in the population has a
known, non-zero chance of being selected for the sample.
• Non-random sampling is a sampling procedure where samples are selected in a deliberate
manner with little or no attention to randomization.
• Data visualization techniques are used to present data in graphical form. They include the
histogram, stem-and-leaf plot, boxplot, scatter plot, and frequency polygon, among
others.
