STS 311

Uploaded by

ziapark25
Copyright © All Rights Reserved
LECTURE 1.1 OVERVIEW OF STATISTICS

STATISTICS
• is the science of collecting, organizing, summarizing, and analyzing information to draw conclusions or answer questions.
• It provides procedures for data collection, presentation, organization, and interpretation so that the data yield a meaningful idea.

Basic Terminologies in Statistics

Population
• consists of all the members of the group about which you want to draw a conclusion.

Sample
• is a portion or part of the population of interest selected for analysis.

Parameter
• is a numerical index describing a characteristic of a population.

Statistic
• is a numerical index describing a characteristic of a sample.

SOURCES OF DATA

Primary Data
• are data that come from an original source and are intended to answer a specific research question.
• These can be collected by interview, mail-in questionnaire, survey, or experimentation.

Secondary Data
• are data taken from previously recorded data, such as information in previously conducted research, financial statements, business periodicals, and government reports.
• These can also be obtained electronically, for instance from internet websites.

IMPORTANCE OF STATISTICS
• It is used in sports
  ◦ for example, to help a general manager decide which player might be the best fit for a team.
• It is used in politics
  ◦ to help candidates understand how the public feels about various policies.
• It is used in medicine
  ◦ to help determine the effectiveness of new drugs.
• Statistical research in business
  ◦ enables managers to analyze past performance, predict future business practices, and lead organizations effectively.
  ◦ Statistics can describe markets, inform advertising, set prices, and respond to changes in consumer demand.
• Statistics, being quantitative tools
  ◦ widely used in the areas of economics and finance, could help to shape effective monetary and fiscal policies and to develop pricing models for financial assets such as equities, bonds, currencies, and derivative securities.

TYPES OF STATISTICS

Descriptive Statistics
• It basically consists of organizing and summarizing data.
• It describes data through numerical summaries, tables, and graphs.
• Examples:
  1. The average score of a volleyball player for the past 10 games
  2. Birth rate in rural areas in the Philippines
  3. Enrollment record of all colleges in BSU – TNEU Lipa Campus

Inferential Statistics
• It is the logical process that involves generalizing from a sample to the population from which the sample was selected and assessing the reliability of such generalizations.
• It is also called statistical inference or inductive statistics.
• Examples:
  1. A car manufacturer wishes to estimate the average lifetime of batteries by testing a sample of 50 batteries.
  2. The political views of the youth in the urban areas with respect to inflation rate in Asia
  3. A campaign manager analyzes the effect of TV ads on the promotion of a presidential candidate

Constant
• is a characteristic of objects, people, or events that does not vary.
• For example
  ◦ the temperature at which water boils at standard atmospheric pressure (100 degrees Celsius) is a constant.

Variable
• is a characteristic of objects, people, or events that can take different values.
• It can vary in quantity, like the weight of people, or in quality, like the hair color of people.

TWO TYPES OF VARIABLES

Qualitative Variables (Categorical Variables)
• are variables that yield categorical responses.
• These are words or codes that represent a class or category.
• Example:
  ◦ eye color, sex, occupation, student number, etc.

Quantitative Variables (Numerical Variables)
• are variables that take on numerical values representing an amount or quantity.
• These numerical values should answer the question "how much" or "how many".
• Examples:
  ◦ height, weight, distance, salary, etc.

• Variables can also be classified in two ways according to purpose: experimental or mathematical.

Experimental Classification

Independent Variables (Explanatory Variables)
• are variables controlled by the experimenter or researcher and expected to have an effect on the behavior of the subjects.

Dependent Variables (Outcome Variables)
• measure the behavior of subjects and are expected to be influenced by the independent variable.

Mathematical Classification

Discrete Variables
• are quantitative variables that have either a finite number of possible values or a countable number of possible values.
• These are variables that are countable.
• Examples:
  ◦ number of cars, number of siblings, etc.

Continuous Variables
• are quantitative variables that have an infinite number of possible values that are not countable.
• These are variables that are no longer countable but are measurable.
• Examples:
  ◦ height, weight, volume, etc.

Level of Measurement of Variables

Nominal Level
• is the first level of measurement and is characterized by data that consist of names, labels, or categories only.
• Data cannot be arranged in an ordering scheme.
• Nominal scales have no numerical value.
• Examples:
  ◦ Sex (male or female)
  ◦ Type of School (public or private)
  ◦ Eye Color (blue, green, brown)

Ordinal Level
• involves data that may be arranged in some order, but differences between data values either cannot be determined or are meaningless.
• An ordinal scale not only classifies subjects but also ranks them in terms of the degree to which they possess a characteristic of interest.
• Examples:
  ◦ Highest Educational Attainment (elementary, high school, bachelor, masteral, doctoral)
  ◦ Rank of military officer (lieutenant, captain, major, colonel)

Interval Level
• is a measurement level that specifies the distances between each interval on the scale.
• Variables of this level have no absolute zero. This means that a value of zero does not mean the absence of the quantity.
• Examples:
  ◦ Temperature on a Fahrenheit/Celsius thermometer
  ◦ IQ score

Ratio Level
• represents the highest, most precise level of measurement.
• Variables of this level have an absolute zero, which means that a value of zero means the absence of the quantity.
• Examples:
  ◦ Height and weight
  ◦ Time
  ◦ Distance and speed

PROCESS OF STATISTICS

1. Identify the research objective.
  • A researcher must determine the question(s) he or she wants answered. The question(s) must clearly identify the population that is to be studied.

2. Collect the information needed to answer the questions.
  • Conducting research on an entire population is often difficult and expensive, so we typically look at a sample.
  • This step is vital to the statistical process, because if the data are not collected correctly, the conclusions drawn are meaningless.
  • Do not overlook the importance of appropriate data collection.

3. Organize and summarize the information.
  • Descriptive statistics allow the researcher to obtain an overview of the data and can help determine the type of statistical methods the researcher should use.

4. Draw conclusions from the information.
  • In this step the information collected from the sample is generalized to the population. Inferential statistics uses methods that take results obtained from a sample, extend them to the population, and measure the reliability of the result.

Important Note
• If the entire population is studied, then inferential statistics is not necessary, because descriptive statistics will provide all the information that we need regarding the population.

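The process of statistics and the note on studying an entire population both rest on the difference between a parameter and a statistic. The following is a minimal Python sketch of that difference, not part of the original lecture. The population of exam scores is made-up data, and the names `population`, `mu`, and `x_bar` are chosen here only for illustration.

```python
import random
import statistics

# Hypothetical population: exam scores of 1,000 students (made-up data).
rng = random.Random(42)  # fixed seed so the sketch is repeatable
population = [rng.randint(50, 100) for _ in range(1000)]

# Parameter: a numerical index computed from the whole population.
mu = statistics.mean(population)

# Statistic: the same index computed from a sample of n = 30,
# drawn without replacement from the population.
sample = rng.sample(population, 30)
x_bar = statistics.mean(sample)

print(f"parameter (population mean): {mu:.2f}")
print(f"statistic (sample mean):     {x_bar:.2f}")
```

If all 1,000 scores are available, `mu` is known exactly and no inference is needed. When only the 30 sampled scores are available, `x_bar` serves as an estimate of `mu`, and inferential statistics is what assesses how reliable that estimate is.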
DATA COLLECTION
• is the process of gathering and measuring information on variables of interest, in an established systematic fashion that enables one to answer stated research questions, test hypotheses, and evaluate outcomes.

Consequences of Improperly Collected Data
• Inability to answer research questions accurately.
• Inability to repeat and validate the study.
• Distorted findings resulting in wasted resources.
• Misleading other researchers to pursue fruitless avenues of investigation.
• Compromising decisions for public policy.
• Causing harm to human participants and animal subjects.

Steps in Data Gathering
1. Set the objectives for collecting data.
2. Determine the data needed based on the set objectives.
3. Determine the method to be used in data gathering and define the comprehensive data collection points.
4. Design data gathering forms to be used.
5. Collect data.

Methods of Data Collection

Primary data can be collected by:

1. Direct personal interviews
  • The researcher has direct contact with the interviewee.
  • The researcher gathers information by asking questions to the interviewee.

2. Indirect/Questionnaire Method
  • involves sourcing and accessing existing data that were originally collected for the purpose of the study.
  • Questions can either be:
    ◦ Open-ended question - does not include response categories. This type of question is usually appropriate for collecting subjective data.
    ◦ Closed-ended question - includes a list of response categories from which the respondent will select his answer. This type of question is usually appropriate for collecting objective data.

3. Focus Group
  • It is a group interview of approximately six to twelve people who share similar characteristics or common interests.
  • A facilitator guides the group based on a predetermined set of topics.

4. Experiment
  • There is direct human intervention on the conditions that may affect the values of the variable of interest.

5. Observation
  • The phenomenon of interest is studied by recording the observations made about the phenomenon as it actually happens.
  • Involves collecting information without asking questions.

Secondary data can be collected by:
1. Published reports in newspapers and periodicals.
2. Financial data reported in annual reports.
3. Records maintained by the institution.
4. Internal reports of the government departments.
5. Information from official publications.

Take Note:
• Always investigate the validity and reliability of the data by examining the collection method employed by your source.
• Do not use inappropriate data for your research.

SAMPLE SIZE
• is typically denoted by n and it is always a positive integer.
• No exact sample size can be prescribed here, and it can vary in different research settings. However, all else being equal, a larger sample leads to increased precision in estimates of various properties of the population.

Take Note:
• Representativeness, not size, is the more important consideration.
• Use no fewer than 30 subjects if possible.
• If you use complex statistics, you may need a minimum of 100 or more in your sample (varies with method).

Choosing of sample size depends on:

Non-statistical Considerations
• These may include availability of resources, manpower, budget, ethics, and sampling frame.

Statistical Considerations
• These include the desired precision of the estimate.

Three criteria need to be specified to determine the appropriate sample size:

1. Level of Precision (Sampling Error)
  • The level of precision is the range in which the true value of the population is estimated to be.

2. Confidence Level
  • It is a statistical measure of the number of times out of 100 that results can be expected to be within a specified range.
  • For example:
    ◦ a confidence level of 90% means that results of an action will probably meet expectations 90% of the time.

3. Degree of Variability
  • Depending upon the target population and attributes under consideration, the degree of variability varies considerably.
  • The more heterogeneous a population is, the larger the sample size required to get an optimum level of precision.

Raosoft Calculator
• This can be used to determine the sample size.

Basic Sampling Design
• The goal in sampling is to obtain individuals for a study in such a way that accurate information about the population can be obtained.

Reason for Sampling
• It is important that the individuals included in a sample represent a cross section of individuals in the population.
• If a sample is not representative, it is biased, and you cannot generalize to the population from your statistical data.

Definitions

Observation unit
• An object on which a measurement is taken.
• This is the basic unit of observation, sometimes called an element. In studying human populations, observation units are often individuals.

Target population
• The complete collection of observations we want to study.

Sampled population
• The collection of all possible observation units that might have been chosen in a sample; the population from which the sample was taken.

Sample
• A subset of a population.

Sampling unit
• A unit that can be selected for a sample.
• We may want to study individuals, but do not have a list of all individuals in the target population. Instead, households serve as the sampling units, and the observation units are the individuals living in the households.

Sampling frame
• A list, map, or other specification of sampling units in the population from which a sample may be selected.
• For a survey using in-person interviews, the sampling frame might be a list of all street addresses.

Sampling technique/Sampling Strategies
• It is a plan you set forth to be sure that the sample you use in your research study represents the population from which you drew your sample.

Sampling Bias
• This involves problems in your sampling that make your sample unrepresentative of your population.

Advantages of Sampling
• Less Labor
• Greater Efficiency and Accuracy
• Reduced Cost
• Convenience
• Greater Speed
• Ethical Considerations
• Greater Scope

SAMPLING
• refers to the process of selecting the individuals to be included in a sample.

Two Types of Sampling

Random Sampling or Probability Sampling
• It is a process in which each member of the population has an equal chance of being selected.
• Samples are obtained using some objective chance mechanism, thus involving randomization.
• They require the use of a complete listing of the elements of the universe called the sampling frame.
• The probabilities of selection are known. They are generally referred to as random samples.
• They allow drawing of valid generalizations about the universe/population.

Simple Random Sampling
• the most basic method of drawing a probability sample, which assigns equal probabilities of selection to each possible sample.
• It is also a process of selecting a sample of size n from the population via random numbers or through a lottery.

Systematic Sampling
• obtained by selecting every kth individual from the population until the desired number of subjects or respondents is obtained.
• The first individual selected corresponds to a random number between 1 and k.

Stratified Random Sampling
• obtained by separating the population into non-overlapping groups called strata and then obtaining a simple random sample from each stratum.
• The individuals within each stratum should be homogeneous (or similar) in some way.

Cluster Sampling
• a process of selecting clusters from a population which is very large or widely spread out over a wide geographical area.

Non-random Sampling or Non-probability Sampling
• a sampling procedure where samples are selected in a deliberate manner with little or no attention to randomization.
• Samples are obtained haphazardly, selected purposively, or are taken as volunteers.
• The probabilities of selection are unknown. They should not be used for statistical inference.

Convenience Sampling
• a process of selecting a group of individuals who are conveniently available for a study.

Purposive Sampling
• a process of selecting, based on judgement, a sample which the researcher believes, based on prior information, will provide the data they need.

Quota Sampling
• applied when an investigator collects information from an assigned number, or quota, of individuals from one of several sample units fulfilling certain prescribed criteria or belonging to one stratum.

Snowball Sampling
• a technique in which one or more members of a population are located and used to lead the researchers to other members of the population.

Voluntary Sampling
• a technique in which the sample is composed of respondents who self-select (volunteer) into the study/survey.
• The respondents have a strong interest in the topic of the study.

LECTURE 2.1 MEASURES OF CENTRAL TENDENCY

MEASURE OF CENTRAL TENDENCY
• commonly referred to as an average, is a single value that represents a data set.
• Its purpose is to locate the center of a data set.

Three Different Measures of Central Tendency:

Mean (Arithmetic Mean)
• is the most frequently used measure of central tendency.
• It is the only common measure in which all values play an equal role; that is, to determine its value you need to consider all the values of the data set.
• It is appropriate for determining the central tendency of interval or ratio data.

SYMBOL FOR MEAN
• The symbol x̄, called "x bar", is used to represent the mean of a sample.
• The symbol μ, called "mu", is used to denote the mean of a population.

PROPERTIES OF MEAN
• A set of data has only one mean.
• Mean can be applied for interval and ratio data.
• All values in the data set are included in computing the mean.
• The mean is very useful in comparing two or more data sets.
• Mean is most appropriate in symmetrical data.
• Mean is affected by the extreme small or large values (outliers) in a data set.

Mean can be computed as

\[ \text{Mean} = \frac{\text{sum of all values}}{\text{number of values}} \]

Sample Mean

The sample mean is computed as

\[ \bar{x} = \frac{\sum x}{n} \]

Where:
• x̄ is the sample mean
• x is the value of any particular observation or measurement
• ∑x is the sum of all x's
• n is the total number of values in the sample

Population Mean

The population mean is computed as

\[ \mu = \frac{\sum x}{N} \]

Where:
• μ is the population mean
• x is the value of any particular observation or measurement
• ∑x is the sum of all x's
• N is the total number of values in the population

Median
• is the midpoint of the data array.
• When the data set is ordered, whether ascending or descending, it is called a data array.
• an appropriate measure of central tendency for data that are ordinal or above, but is more valuable in an ordinal type of data.

Properties of Median
• The median is unique; there is only one median for a set of data.
• The median is found by arranging the set of data from lowest to highest (or highest to lowest) and getting the value of the middle observation.
• Median is not affected by the extreme small or large values.
• Median can be applied for ordinal, interval, and ratio data.
• Median is most appropriate in skewed data.

To determine the value for median in a data set with n values, we need to consider two rules.

A. If n is odd, the median is the middle ranked value.

\[ \text{middle rank} = \frac{n+1}{2} \]

B. If n is even, the median is the average of the two middle ranked values.

\[ \text{middle ranks} = \frac{n}{2} \ \text{and} \ \frac{n}{2} + 1 \]

• STEP 1: Arrange the data in order.
• STEP 2: Select the middle rank value using the formula.
• STEP 3: Identify the median.

Mode
• is the value in a data set that appears most frequently. Like the median and unlike the mean, the extreme values in a data set do not affect the mode.
• A data set that has only one value that occurs with the greatest frequency is said to be unimodal.
• If the data has two values with the same greatest frequency, both values are considered the mode and the data set is bimodal.
• If a data set has more than two modes, the data set is said to be multimodal.
• There are also some cases when all values in a data set have the same frequency; when this occurs, the data set is said to have no mode.

Properties of Mode
• The mode is found by locating the most frequently occurring value.
• The mode is the easiest average to compute.
• There can be more than one mode or even no mode in any given data set.
• Mode is not affected by the extreme small or large values.
• Mode can be applied for nominal, ordinal, interval, and ratio data.

Types of Distribution

Symmetric Distribution
• The data values are evenly distributed on both sides of the mean.
• Also, the distribution is unimodal and the mean, median, and mode are similar and are at the center of the distribution.

Positively Skewed (right-skewed distribution)
• Most of the values in the data fall to the left of the mean and group at the lower end of the distribution; the tail is to the right.
• The mean is to the right of the median, and the mode is to the left of the median.

Negatively Skewed (left-skewed distribution)
• The mass of the data falls to the right of the mean and groups at the upper end of the distribution, with the tail to the left.
• The mean is to the left of the median, and the mode is to the right of the median.

LESSON 2.3 MEASURES OF DISPERSION

MEASURE OF DISPERSION
• Another important characteristic of a data set is how it is distributed.
• Dispersion is the difference between the actual value and the average value.

RANGE
• is the simplest and easiest way to determine a measure of dispersion.
• It is the difference between the highest value (HV) and the lowest value (LV) in the data set.

Formula of Range
• Range = Highest Value − Lowest Value

Advantages of Range
• Easy to compute
• Easy to understand

Disadvantages of Range
• Can be distorted by a single extreme value
• Only two values are used in the calculation

Standard Deviation (SD)
• One of the most widely used measures of dispersion
• is calculated as the square root of variance.
• It provides a good indication of volatility.
• It measures how widely values are dispersed from the average.

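Formula for Sample Standard Deviation

\[ s = \sqrt{\frac{n\sum x^2 - \left(\sum x\right)^2}{n(n-1)}} \]

Where:
• s is the sample standard deviation
• x is the value of any particular observation
• ∑x² is the sum of the squares of all x's
• ∑x is the sum of all x's
• n is the sample size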
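The quantities in the list above (∑x², ∑x, and n) are enough to compute a sample standard deviation by the usual computational formula. The following is a minimal Python sketch under that assumption, not the lecture's own worked example. The function name `sample_sd` and the data values are made up for illustration, and the result is compared with `statistics.stdev` from the standard library.

```python
import math
import statistics


def sample_sd(values):
    """Sample standard deviation from n, sum of x, and sum of x squared."""
    n = len(values)
    if n < 2:
        raise ValueError("at least two observations are required")
    sum_x = sum(values)                  # sum of all x's
    sum_x2 = sum(x * x for x in values)  # sum of the squares of all x's
    return math.sqrt((n * sum_x2 - sum_x ** 2) / (n * (n - 1)))


data = [2, 4, 4, 4, 5, 5, 7, 9]  # made-up sample
print(sample_sd(data))           # computational formula
print(statistics.stdev(data))    # library value, for comparison
```

Both lines should print the same value up to floating-point rounding, since the computational formula is algebraically equivalent to the definition based on squared deviations from the mean.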

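Formula for Population Standard Deviation

\[ \sigma = \sqrt{\frac{N\sum x^2 - \left(\sum x\right)^2}{N^2}} \]

Where:
• σ is the population standard deviation
• x is the value of any particular observation
• ∑x² is the sum of the squares of all x's
• ∑x is the sum of all x's
• N is the population size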
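As a worked illustration, assume the computational form of the population standard deviation, which uses only N, ∑x, and ∑x². The eight data values below are made up for this example and are treated as a complete population.

```latex
% Made-up population: 2, 4, 4, 4, 5, 5, 7, 9
\begin{align*}
N &= 8, \qquad \sum x = 40, \qquad \sum x^2 = 4 + 16 + 16 + 16 + 25 + 25 + 49 + 81 = 232 \\
\sigma &= \sqrt{\frac{N\sum x^2 - \left(\sum x\right)^2}{N^2}}
        = \sqrt{\frac{8(232) - 40^2}{8^2}}
        = \sqrt{\frac{1856 - 1600}{64}}
        = \sqrt{4} = 2
\end{align*}
```

The mean of these values is 40/8 = 5, so a standard deviation of 2 says that the values typically lie about 2 units from the mean.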

Variance
• is the average of the squared deviations from the mean; the standard deviation is its square root.

Formula for Sample Variance

\[ s^2 = \frac{n\sum x^2 - \left(\sum x\right)^2}{n(n-1)} \]

Where:
• s² is the sample variance
• x is the value of any particular observation
• ∑x² is the sum of the squares of all x's
• ∑x is the sum of all x's
• n is the sample size
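The symbols listed above (∑x², ∑x, n) point to the computational form of the sample variance. The short derivation below is a sketch added for clarity, not part of the original notes. It shows that this form agrees with the definition of variance in terms of squared deviations from the sample mean, using x̄ = ∑x / n.

```latex
\begin{align*}
\sum (x - \bar{x})^2
  &= \sum x^2 - 2\bar{x}\sum x + n\bar{x}^2 \\
  &= \sum x^2 - \frac{2\left(\sum x\right)^2}{n} + \frac{\left(\sum x\right)^2}{n}
   = \sum x^2 - \frac{\left(\sum x\right)^2}{n} \\[4pt]
s^2 = \frac{\sum (x - \bar{x})^2}{n - 1}
  &= \frac{\sum x^2 - \dfrac{\left(\sum x\right)^2}{n}}{n - 1}
   = \frac{n\sum x^2 - \left(\sum x\right)^2}{n(n-1)}
\end{align*}
```

The computational form is convenient for hand calculation because it needs only two running totals, ∑x and ∑x², and does not require computing the mean first.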

Formula for Population Variance

\[ \sigma^2 = \frac{N\sum x^2 - \left(\sum x\right)^2}{N^2} \]

Where:
• σ² is the population variance
• x is the value of any particular observation
• ∑x² is the sum of the squares of all x's
• ∑x is the sum of all x's
• N is the population size
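The sample and population variance differ only in the divisor: n(n − 1) for a sample and N² for a population, assuming the computational forms built from ∑x and ∑x². The Python sketch below puts the two side by side. The function names and the data are made up for illustration, and the results are checked against `statistics.variance` and `statistics.pvariance` from the standard library.

```python
import statistics


def _sums(values):
    """Return the count, the sum of x, and the sum of x squared."""
    return len(values), sum(values), sum(x * x for x in values)


def population_variance(values):
    """Variance when the data are the entire population (divisor N squared)."""
    big_n, sum_x, sum_x2 = _sums(values)
    return (big_n * sum_x2 - sum_x ** 2) / big_n ** 2


def sample_variance(values):
    """Variance when the data are a sample (divisor n times n minus 1)."""
    n, sum_x, sum_x2 = _sums(values)
    return (n * sum_x2 - sum_x ** 2) / (n * (n - 1))


data = [2, 4, 4, 4, 5, 5, 7, 9]  # made-up values
sigma2 = population_variance(data)
s2 = sample_variance(data)

print(sigma2, statistics.pvariance(data))  # population variance, two ways
print(s2, statistics.variance(data))       # sample variance, two ways
```

For the same data the sample variance is always the larger of the two, because dividing by n − 1 rather than n compensates for estimating the mean from the sample itself.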
