Unit 2 Basic Statistical Concepts

This document discusses basic statistical concepts including variables, data, populations, and samples. It defines a variable as something that varies or takes on different values, and identifies types of variables such as independent, dependent, categorical, and continuous variables. The document also defines data and discusses types of data including qualitative and quantitative data. It provides examples and descriptions of commonly used statistical terms.


UNIT-2

BASIC STATISTICAL CONCEPTS

Written By:
Aftab Ahmad

Reviewed By:
Dr. Rizwan Akram Rana
Introduction
In this unit you will study some basic concepts such as variable, data, population, and
sample. The types of variables, data, populations, and samples are also
discussed. The purpose of this unit is to build awareness of these commonly used
concepts.

Objectives
After reading this unit, students will be able to:
1. explain variables and their types.
2. explain data and its types.
3. explain populations and their types.
4. explain samples and their types.

2.1 Variable and Data


Variable
A variable is something that varies or is subject to variation. It has no fixed
value but can assume any of a set of values. In other words, a variable is a
characteristic that varies from one person or thing to another. It is a characteristic, number or
quantity that increases or decreases over time or takes different values in different situations;
more precisely, it is a condition or quality that can differ from one case to another. It
may also be called a data item. Put another way, a variable is an image, concept or
perception that can be measured. A concept cannot be measured directly;
it must first be converted into a measurable form, and the measurable form of a concept is called
a variable. Examples of variables are height, weight, age, number of siblings,
business income and expenses, country of birth, capital expenditure, marital status, eye color,
gender, class grades, and vehicle type.
Variable = A concept that can be measured

2.1.1 Types of Variables


Variables can be categorized in three different ways: (a) by causal relationship, (b) by the
design of the study, and (c) by the unit of measurement. Let us describe these categories in some
detail.

The Causal Relationship


In studies of causal relationships, four types of variables may operate:
i) Change variables, which are responsible for bringing about change in a phenomenon;
ii) Variables which affect the link between cause and effect variables;
iii) Outcome variables, which result from the effects of a change variable;
iv) Connecting or linking variables, which in certain situations are needed to
complete the relationship between cause and effect.

In research, change variables are referred to as independent variables, while outcome
variables are known as dependent variables. In a cause-effect relationship, there are some
unmeasured variables affecting the relationship. These are called extraneous variables.
The variables linking the cause-effect relationship are called intervening variables. A brief
summary of the above-mentioned variables is given in the following table.

Table 2.1: Types of Variables (causal relationship)

Variable               Description
Independent Variable   It is a cause that brings about changes in the situation.
Dependent Variable     It is a change that occurs due to the independent variable.
Extraneous Variable    It is a situation/factor in everyday life that influences changes in the
                       dependent variable. As these factors are not measured in the
                       research study, they can increase or decrease the magnitude of the
                       relationship between the independent and dependent variables.
Intervening Variable   It is a link between the independent and dependent variables.
                       Sometimes, without the intervention of another variable, it is
                       impossible to establish a relationship between the independent and
                       dependent variables.

Design of the Study


A study that investigates causation or association may be a controlled or contrived
experiment, a quasi-experiment, or an ex post facto (non-experimental) study. Normally,
there are two types of variables in this category.
i) Active Variables: these variables can be changed or controlled; and
ii) Attribute Variables: these variables cannot be changed or controlled and refer to
characteristics of the research study population. Demographic features like age,
gender, education, qualification and income are attribute variables.

Some common types of variables are given below.


i) Binary Variable
These variables take only two values. For example, male or female, true or false,
yes or no, improved or not improved, completed task or failed to complete task, etc.
These variables can be divided into two types: opposite binary variables and
conjunct binary variables. Opposite binary variables are polar opposites of each
other, for example success or failure, true or false, etc. There is no third or middle
value. Conjunct binary variables, on the other hand, have two poles but can also take
values in between. For example, agreeing 20% with the policies of one party and 80%
with those of the other.

ii) Categorical Variable


A variable, usually an independent variable or predictor, whose values indicate
membership in one of several possible categories. For example, gender (male or
female), marital status (married, single, divorced, widowed), or brand of a product.

iii) Confounding Variable
A variable that has a hidden effect on the experiment because it is related to both the
independent and the dependent variable.

iv) Continuous Variable


A variable that can take an infinite number of values within a range; its values are obtained by
measuring. For example, the height and weight of students in a class, the time it takes to
get to school, the distance between Lahore and Karachi, etc.

v) Dependent Variable
The outcome or response of an experiment. An independent variable has a direct or
inverse effect upon the dependent variable. In a graph it is plotted on the y-axis.

vi) Independent Variable


The variable that is manipulated by the researcher. In a graph it is plotted on the x-axis.

vii) Nominal Variable


It is another name for a categorical variable whose categories have no natural order.

viii) Ordinal Variable


Similar to a categorical variable, but its categories have a clear order. For example, income
levels of low, middle and high.

ix) Interval Variable


An interval variable is a measurement where the difference between two values is
meaningful. The difference between temperatures of 100° and 90° is the same as that between
80° and 70°.

x) Ratio Variable
Similar to an interval variable, but has a meaningful zero. For example, height, weight, or income.

xi) Qualitative Variable


A broad category for any variable that can’t be counted, i.e. has no numerical
value. Nominal and ordinal variables fall under this umbrella.

xii) Quantitative Variable


A broad category for any variable that can be counted or measured, i.e. has a numerical value
associated with it. Variables that fall in this category include discrete variables and ratio
variables.
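
The difference between interval and ratio variables described above can be checked with a little arithmetic. The sketch below assumes the temperatures in the interval example are in degrees Celsius (the text does not state the unit) and uses the Kelvin scale, which has a true zero, as the ratio-scale counterpart. The function name is only illustrative.

```python
# Interval scale (assumed Celsius): the zero point is arbitrary, so
# differences are meaningful but ratios are not.
def celsius_to_kelvin(celsius):
    """Convert a Celsius reading to Kelvin, a scale with a true zero."""
    return celsius + 273.15

# Equal differences on the interval scale, as in the text's example.
diff_high = 100 - 90
diff_low = 80 - 70

# "Twice as hot" only looks true on the Celsius scale.
ratio_celsius = 100 / 50
ratio_kelvin = celsius_to_kelvin(100) / celsius_to_kelvin(50)

print(diff_high == diff_low)    # the two differences are equal
print(ratio_celsius)            # ratio computed on the interval scale
print(round(ratio_kelvin, 3))   # ratio computed on the ratio scale
```

The two differences agree, but the ratio changes once the values are expressed on a scale with a meaningful zero. This is why statements such as "twice as much" are reserved for ratio variables.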

2.1.2 Less Common Types of Variables


Some less common types of variables are given below.
i) Attribute Variable
Another name for a categorical variable (in statistical software) or a variable that
isn’t manipulated (in design of experiments).

ii) Collider Variable
A variable represented by a node on a causal graph into which two or more arrows
point, i.e. a common effect of two or more other variables.

iii) Covariate Variable


Similar to an independent variable, it has an effect on the dependent variable but is
usually not the variable of interest.

iv) Criterion Variable


Another name for a dependent variable, when the variable is used in non-
experimental situations.

v) Dichotomous Variable
Another name for a binary variable.

vi) Dummy Variables


Used in regression analysis when you want to represent a categorical variable
numerically. For example, for the category “has dogs” you might assign a 1 to
mean “has dogs” and a 0 to mean “does not have dogs.”

vii) Endogenous Variable


Similar to dependent variables, they are affected by other variables in the system.
Used almost exclusively in econometrics.

viii) Exogenous Variable


Variables that affect others in the system.

ix) Indicator variable


Another name for a dummy variable.

x) Intervening variable
A variable that is used to explain the relationship between variables.

xi) Latent Variable


A hidden variable that can’t be measured or observed directly.

xii) Manifest variable


A variable that can be directly observed or measured.

xiii) Manipulated variable


Another name for independent variable.

xiv) Mediating variable


Variables that explain how the relationship between variables happens. For
example, a mediating variable could explain the relationship between the predictor and the criterion.

xv) Moderating variable
Changes the strength of an effect between independent and dependent variables.
For example, psychotherapy may reduce stress levels for women more than men,
so sex moderates the effect between psychotherapy and stress levels.

xvi) Nuisance Variable


An extraneous variable that increases variability overall.

xvii) Observed Variable


A measured variable (usually used in SEM).

xviii) Outcome variable


Similar in meaning to a dependent variable, but used in a non-experimental study.

xix) Polychotomous variables


Variables that can have more than two values.

xx) Predictor variable


Similar in meaning to the independent variable, but used in regression and in non-
experimental studies.

xxi) Test Variable


Another name for the Dependent Variable.

xxii) Treatment variable


Another name for independent variable.
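
The dummy (indicator) variables listed above can be illustrated with a short sketch. It assumes the common convention of leaving one category out as a reference and creating one 0/1 column for each remaining category. The function name `dummy_code` and the sample values are hypothetical and not taken from any particular statistics package.

```python
def dummy_code(values, reference):
    """Return one 0/1 indicator column per category, excluding the reference.

    `reference` is the baseline category that gets no column of its own
    (an assumed convention, common in regression analysis).
    """
    categories = sorted(set(values) - {reference})
    return {
        category: [1 if value == category else 0 for value in values]
        for category in categories
    }

# Hypothetical observations of the categorical variable "marital status".
marital_status = ["married", "single", "divorced", "single", "widowed"]
dummies = dummy_code(marital_status, reference="single")

for category, column in dummies.items():
    print(category, column)
```

Each respondent receives a 1 in at most one column, and a respondent in the reference category ("single" here) receives 0 in all of them.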

Data
The term “data” refers to the kind of information a researcher obtains to achieve the
objectives of his research. All research processes start with the collection of data, which
plays a significant role in statistical analysis. The term is used in different contexts.
In general, it indicates facts or figures from which conclusions can be drawn; it is the
raw material from which information is obtained. Data are the actual pieces of
information that you collect through your study. In other words, data can be defined as a
collection of facts and details such as text, figures, observations, symbols, or simply
descriptions of things, events or entities, gathered with a view to drawing inferences. Data are
raw facts which must be processed to yield information.

2.1.3 Types of Data


In research, different methods are used to collect data, all of which fall into two
categories, i.e. primary data and secondary data. It is a common classification based upon
who collected the data.

Primary data
As the name suggests, primary data is data collected for the first time by the researcher
himself. It is originated by the researcher for the purpose of addressing his
research problem. It is also known as first-hand or raw data. The data can be collected using
various methods such as surveys, observations, physical testing, mailed questionnaires,
questionnaires filled in and sent by enumerators, personal interviews, telephone interviews,
focus group discussions, case studies, etc.

Secondary data
Secondary data is second-hand information already collected and recorded by another
person for a purpose not related to the current research problem. It is a readily available
form of data and saves the researcher time and cost. But as the data was gathered for a
purpose other than the problem under investigation, its usefulness may be
limited in a number of ways, such as relevance and accuracy. Also, the objectives and
methods adopted to collect the data may not be suitable for the current situation. Therefore,
the researcher should be careful when using secondary data. Examples of secondary data
are census data, publications, internal records of organizations, reports, books,
journal articles, websites, etc.

2.1.4 Key Differences Between Primary and Secondary Data


Some key differences between primary and secondary data are given below.
i) Primary data refers to data originated by the researcher for the first time.
Secondary data is already existing data, collected by other researchers, agencies,
and organizations.
ii) Primary data is real-time data, whereas secondary data relates to the
past.
iii) Primary data is collected to address the problem in hand, while the purpose behind
the collection of secondary data is different from the problem in hand.
iv) Collection of primary data is a laborious process. Collection of
secondary data, on the other hand, is easy and rapid.
v) Sources of primary data are surveys, observations, physical testing, mailed
questionnaires, questionnaires filled in and sent by enumerators, personal interviews,
telephone interviews, focus group discussions, case studies, etc. Sources of
secondary data are census data, publications, internal records of
organizations, reports, books, journal articles, websites, etc.
vi) Collection of primary data requires a large amount of resources such as time, cost, and
human resources. Secondary data, on the other hand, is inexpensive and
easily available.
vii) Primary data is specific to the researcher’s needs, and he can control the quality of
the data. Secondary data, on the other hand, is neither specific to the researcher’s needs
nor under his control with regard to quality.
viii) Primary data is available in raw form, while secondary data has undergone some
statistical procedures and is a refined form of primary data.
ix) Data collected from primary sources are generally more reliable and accurate than data from
secondary sources.

[Figure: Data, classified into Primary data and Secondary data]

2.2 Population and Sample


Population
A research population is a large collection of individuals or objects to which the
researcher wants the results of the study to apply. The population is the main focus of a
research question. A research population is also known as a well-defined collection of
individuals or objects known to have similar characteristics. All individuals or objects
within a certain population usually have a common, binding characteristic or trait.
A population can also be defined as all individuals that meet a set of specifications or
specific criteria. All research is done for the benefit of the population.

2.2.1 Types of Population


In educational research, we commonly come across two types of populations.
i) The Target Population is also known as the theoretical population and refers to
the entire group of individuals or objects to which a researcher wishes to
generalize the conclusions. This type of population usually has varying
characteristics.

ii) The Accessible Population is also known as the study population. It is the
population to which a researcher can apply the conclusions of the study. This
population is a subset of the target population.

Sample
A sample is simply a subset or subgroup of a population (Frey, Botan, & Kreps, 2000). The
concept of a sample arises from the inability of researchers to test all the individuals in
a given population. Sampling is the process of selecting some individuals from the
accessible population in such a way that these individuals represent the whole accessible
population. The sample should be representative in the sense that the selected individuals should
reflect the characteristics of the whole population (Lohr, 1999). The main function of
the sample is to allow researchers to conduct the study on individuals from the
population so that the results of the study can be used to derive conclusions that will
apply to the entire population.

2.2.2 Types of Sample


Generally researchers use two major sampling techniques: probability sampling and non-
probability sampling.

Probability sampling
It is a process that utilizes some form of random selection. In probability sampling, each
individual is chosen with a known probability. This type of sampling is also known as
random sampling or representative sampling, and it depends on objective judgment.
The various types of probability sampling are as follows:
i) Simple Random sampling
In simple random sampling each member of the population has an equal chance of being
selected as a subject. Each member is selected independently of the other members of the
population. Many methods can be used to carry out random sampling. In a
commonly used method, each member of the population is assigned a unique
number. All assigned numbers are placed in a bowl and mixed thoroughly. The
researcher, blindfolded, then picks numbered tags from the bowl. The members whose
numbers are picked become the subjects of the study. Another method is to use a computer
for random selection from the population. For a smaller population the first method is
useful, and for a larger population the computer-aided method is preferred.

Advantages of Simple Random Sampling


It is an easy way of selecting a sample from a given population. This method is free from
personal bias. As each member of the population is given an equal opportunity of being
selected, it is a fair method and one can expect a representative sample.

Disadvantages of Simple Random Sampling


One of the most obvious limitations of the random sampling method is its need for a complete
list of all members of the population. For a larger population, this list is usually not
available. In such cases, it is better to use other sampling techniques.
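
The computer-aided method of simple random sampling mentioned above can be sketched with Python's standard `random` module. This is a minimal illustration, assuming a complete list of population members is available; the ID numbers, the sample size of 12 and the `seed` argument are hypothetical choices made only so the draw can be repeated.

```python
import random

def simple_random_sample(population, sample_size, seed=None):
    """Draw `sample_size` members without replacement.

    Every member of `population` has the same chance of selection.
    `seed` is optional and only makes the draw reproducible.
    """
    rng = random.Random(seed)
    return rng.sample(population, sample_size)

# Hypothetical population: members numbered 1 to 100, as in the
# "numbers in a bowl" method.
population_ids = list(range(1, 101))
sample = simple_random_sample(population_ids, 12, seed=42)
print(sorted(sample))
```

Assigning each member a unique number first, exactly as in the bowl method, is what makes the complete population list a prerequisite for this technique.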

ii) Systematic Random Sampling
In systematic random sampling, the researcher first randomly picks the first item or
subject from the population. Then he selects every nth subject from the list. The
procedure involved in this sampling is easy and can be done manually. The sample
drawn using this procedure is representative unless certain characteristics of the
population are repeated for every nth member, a situation that would bias the sample.
Suppose a researcher has a population of 100 individuals and he needs 12 subjects.
He first picks his starting number, 7. He then picks his interval, 8. The members of
his sample will be individuals 7, 15, 23, 31, 39, 47, 55, 63, 71, 79, 87, and 95.
Advantages of Systematic Random Sampling


The main advantage of using this technique is its simplicity. It allows the researcher to add a
degree of system or process to the random selection of subjects. Another advantage is
its assurance that the population will be evenly sampled.

Disadvantages of Systematic Random Sampling


Systematic sampling assumes that the size of the population is available or can be
approximated. Suppose a researcher wants to study the behavior of monkeys in a
particular area. If he does not have any idea of how many monkeys there are, he cannot
systematically select a starting point or interval size. If a population has some kind of
natural standardized pattern, the risk of accidentally choosing very common cases is
greater.

iii) Stratified Random Sampling


In this type of sampling, the whole population is divided into disjoint subgroups.
These subgroups are called strata (singular: stratum). From each stratum a sample of pre-specified
size is drawn, independently of the other strata, using simple random sampling. The
collection of these samples constitutes a stratified sample.

Advantages
This type of sampling is appropriate when the population has diversified social or ethnic
subgroups.

Disadvantages
While using this type of sampling, there is greater chance of overrepresentation of
subgroups in the sample.
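
The stratified procedure described above (disjoint strata, a pre-specified sample size for each, and an independent simple random draw within each) can be sketched as follows. The strata names, their sizes and the per-stratum sample sizes are hypothetical values chosen for illustration.

```python
import random

def stratified_sample(strata, sizes, seed=None):
    """Draw a simple random sample of pre-specified size from each stratum.

    `strata` maps a stratum name to the list of its members;
    `sizes` maps the same names to the number of members to draw.
    """
    rng = random.Random(seed)
    sample = {}
    for name, members in strata.items():
        # Each stratum is sampled separately, without replacement.
        sample[name] = rng.sample(members, sizes[name])
    return sample

# Hypothetical population of 100 split into two disjoint strata.
strata = {
    "urban": [f"U{i}" for i in range(1, 61)],   # 60 members
    "rural": [f"R{i}" for i in range(1, 41)],   # 40 members
}
sizes = {"urban": 6, "rural": 4}                # 10% from each stratum
result = stratified_sample(strata, sizes, seed=1)
print(result)
```

Here the sizes are proportional to the strata (10% of each). Choosing sizes that are not proportional is one way the overrepresentation mentioned under the disadvantages can arise.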

iv) Cluster Sampling


It is a simple random sample in which each sampling unit is a collection, or cluster, of
elements. For example, a researcher who wants to study students may first sample groups
or clusters of students, such as classes, and then select the sample of students from among
the clusters.

Advantages
This type of sampling is appropriate for larger population. It saves time and resources.

Disadvantages
In this type of sampling, there is a greater chance of selecting a sample that is not
representative of the whole population.
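
The student example for cluster sampling can be sketched in two stages: first a random draw of classes, then a random draw of students within the chosen classes. The class names, class size and the numbers drawn at each stage are hypothetical.

```python
import random

def cluster_sample(clusters, n_clusters, n_per_cluster, seed=None):
    """Two-stage cluster sample.

    Stage 1: randomly choose `n_clusters` clusters.
    Stage 2: randomly choose `n_per_cluster` members in each chosen cluster.
    """
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)
    return {
        name: rng.sample(clusters[name], n_per_cluster)
        for name in chosen
    }

# Hypothetical school: six classes of 30 students each.
classes = {
    f"class_{letter}": [f"{letter}-student{i}" for i in range(1, 31)]
    for letter in "ABCDEF"
}
result = cluster_sample(classes, n_clusters=2, n_per_cluster=5, seed=3)
print(result)
```

Only the list of classes is needed at the first stage, which is what saves time and resources; students in classes that are not drawn have no chance of appearing, which is the source of the representativeness risk noted above.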

Non-Probability Sampling or Judgmental Sampling
This technique depends on subjective judgment. It is a process in which probabilities cannot
be assigned to the individuals objectively. This means that samples are
gathered in a way that does not give all individuals in the population an equal chance of being
selected. Choosing these methods could result in biased data or a limited ability to make
general inferences based on the findings. But there are also many situations in which
this kind of sampling technique is the best choice for a particular research
question or stage of research.

There are four kinds of non-probability sampling techniques.


i) Convenience Sampling
In this technique a researcher relies on available subjects, such as stopping people
in markets or on street corners as they pass by. This method is extremely risky
and does not allow the researcher to have any control over the representativeness of
the sample. It is useful when the researcher wants to know the opinion of the
masses on a current issue, or the characteristics of people passing by on the street at a
certain point in time, or when time and resources are so limited that the
research would not be possible otherwise. Whatever the reason for selecting a
convenience sample, it is not possible to use its results
to generalize to a wider population.

ii) Purposive or Judgmental Sampling


In this technique a sample is selected on the basis of knowledge of the population
and the purpose of the study. For example, when an educational psychologist wants
to study the emotional and psychological effects of corporal punishment, he will
create a sample that includes only those students who have received
corporal punishment. In this case, the researcher uses a purposive sample because
those being selected fit a specific purpose or description that is necessary to
conduct the research.

iii) Snowball Sample
This type of sampling is appropriate when the members of the population are
difficult to locate, such as homeless people, industry workers, undocumented immigrants,
etc. A snowball sample is one in which the researcher collects data on the few
members of the target population he or she can locate, then asks those
individuals to provide the information needed to locate other members of that
population whom they know. For example, if a researcher wants to interview
undocumented immigrants from Afghanistan, he might interview a few
undocumented individuals he knows or can locate, and would then rely on those
subjects to help locate more undocumented individuals. This process continues
until the researcher has all the interviews he needs, or until all contacts have been
exhausted. This technique is useful when studying a sensitive topic that people
might not openly talk about, or when talking about the issue under investigation could
jeopardize their safety.

iv) Quota Sample


A quota sample is one in which units are selected into a sample on the basis of pre-
specified characteristics so that the total sample has the same distribution of
characteristics assumed to exist in the population. For example, if a researcher
wants a national quota sample, he might need to know what proportion of the
population is male and what proportion is female, as well as what proportion of
each gender falls into each age category and educational category. The
researcher would then collect a sample with the same proportions as the national
population.
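
The quota idea above can be sketched in two steps: convert the assumed population proportions into target counts, then accept respondents as they arrive until each quota is full. The age groups, the proportions (60% and 40%) and the respondent list are hypothetical; real quotas would come from census or similar figures.

```python
def quota_targets(proportions, sample_size):
    """Convert population proportions into target counts per group."""
    return {
        group: round(share * sample_size)
        for group, share in proportions.items()
    }

def fill_quotas(respondents, targets):
    """Accept respondents in arrival order until each group's quota is met.

    `respondents` is a list of (respondent_id, group) pairs.
    """
    counts = {group: 0 for group in targets}
    selected = []
    for respondent_id, group in respondents:
        if counts[group] < targets[group]:
            selected.append(respondent_id)
            counts[group] += 1
    return selected, counts

# Hypothetical proportions and a hypothetical stream of 20 respondents.
proportions = {"under_40": 0.6, "40_and_over": 0.4}
targets = quota_targets(proportions, sample_size=10)
respondents = [
    (i, "under_40" if i % 2 else "40_and_over") for i in range(1, 21)
]
selected, counts = fill_quotas(respondents, targets)
print(targets, selected)
```

Note that respondents are taken in the order they happen to arrive rather than at random, which is what makes this a non-probability technique even though the final proportions match the population.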

2.3 Self-Assessment Questions


Q. 1 What is a variable?
Q. 2 What are the commonly used types of variables?
Q. 3 What do you understand by the term “data”?
Q. 4 Write down the types of data.
Q. 5 What is population?
Q. 6 What do you understand by the target population?
Q. 7 What do you mean by the accessible population?
Q. 8 What do you mean by the term “sample”?
Q. 9 Write down the types of probability sampling.
Q. 10 Write down the types of non-probability sampling.

2.4 Activities
1. Suppose a scientist is conducting an experiment to test to what extent a vitamin
could extend a person’s life expectancy. Identify:
i) Independent Variable of the experiment.
ii) Dependent Variable of the experiment.

2. Suppose a Lahore-based company is launching a new product for senior citizens of
Pakistan and tests that product on senior citizens of Lahore. Identify:
i) Target Population of the company.
ii) Accessible Population of the company.

2.5 Bibliography
Bartz, A. E. (1981). Basic Statistical Concepts (2nd ed.). Minnesota: Burgess Publishing
Company.

Dietz, T., & Kalof, L. (2009). Introduction to Social Statistics. UK: Wiley-Blackwell.

Frey, L. R., Botan, C. H., & Kreps, G. L. (2000). Investigating Communication: An
Introduction to Research Methods (2nd ed.). Boston: Allyn and Bacon.

Gay, L. R., Mills, G. E., & Airasian, P. W. (2010). Educational Research: Competencies
for Analysis and Application (10th ed.). New York: Pearson.

Gravetter, F. J., & Wallnau, L. B. (2002). Essentials of Statistics for the Behavioral
Sciences (4th ed.). California: Wadsworth.

Lohr, S. L. (1999). Sampling: Design and Analysis. Albany: Duxbury Press.
